## Goal

In this example, we set up a boolean indicator of whether all contributors to a thread belong to a well-connected group of contributors. We generate two metrics that measure this concept:

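1. All of a thread's contributors are part of the [k-core](https://en.wikipedia.org/wiki/Degeneracy_\(graph_theory\)#k-Cores) of the co-contributor network. This means that every contributor has a core number of at least *k*, i.e., belongs to a subgraph in which every node has at least *k* neighbors within that subgraph. A high overall ``degree(contributor_i)`` alone is not sufficient.
2. All contributors belong to the co-contributor network's [maximum clique](https://en.wikipedia.org/wiki/Clique_problem), i.e., its largest group of contributors in which everyone is connected to everyone else.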

We would like to be able to generate different values for *k*, e.g., to fine-tune this parameter with machine learning.
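
Both concepts can be illustrated on a small toy graph. The following is a minimal sketch that uses only ``networkx``; the node names and edges are made up for illustration and are not taken from any community data. Node ``e`` has four neighbors but a core number of 1, which shows why degree alone does not determine k-core membership.

```python
import networkx as nx
from networkx.algorithms.approximation import clique

# Toy graph (made up): a, b, c, d are fully connected;
# e is a hub whose other neighbors f, g, h are leaves.
G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("a", "c"), ("a", "d"),
    ("b", "c"), ("b", "d"), ("c", "d"),
    ("d", "e"), ("e", "f"), ("e", "g"), ("e", "h"),
])

# Core number: the largest k for which the node is part of the k-core.
core = nx.core_number(G)

k = 3
in_core = {node for node, c in core.items() if c >= k}

# The same node set, obtained from the k-core subgraph itself.
k_core_nodes = set(nx.k_core(G, k=k).nodes)

# Approximation of the maximum clique; returns a set of nodes.
approx_clique = clique.max_clique(G)

print(G.degree("e"), core["e"])
print(sorted(in_core))
print(sorted(approx_clique))
```

Because ``max_clique`` is an approximation, the set it returns is a clique, but it is not guaranteed to be the largest one.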

## Defining the metrics

Every indicator is defined as a function that is decorated with the ``@metric`` decorator from ``pici.reporting``, receives a ``Community`` object as first parameter, and returns (by default) a dictionary of metric-name: metric-value items.

In [1]:
from pici.reporting import metric

# @metric(...)
# def my_indicator(community):
#   return {
#       'metric A': None
#   }

To correctly generate and combine the output of this indicator method, the decorator needs information about the indicator's **level** of measurement and about what it returns (**returntype**).

Levels and returntypes are defined as constants in ``pici.datatypes``:

In [2]:
from pici.datatypes import CommunityDataLevel, MetricReturnType

print(CommunityDataLevel._member_names_)
print(MetricReturnType._member_names_)

['POSTS', 'TOPICS', 'CONTRIBUTORS', 'COMMUNITY']
['PLAIN', 'TABLE', 'DATAFRAME']


We would like to generate a boolean value for every thread and for each of our metrics. Our indicator's level is therefore *TOPICS*, and our return type *DATAFRAME*. The indicator would have to look like this:

In [3]:
@metric(
    level=CommunityDataLevel.TOPICS,
    returntype=MetricReturnType.DATAFRAME
)
def all_well_connected(community):
    # use the community object to generate the metrics
    return {
        'all contributors in k-core': False,
        'all contributors in max clique': False
    }

To simplify the notation, there are shorthands for the common combinations of level and returntype. For example, ``@topics_metric`` would be the same as ``@metric(level=CommunityDataLevel.TOPICS, returntype=MetricReturnType.DATAFRAME)``:

In [4]:
from pici.reporting import topics_metric

@topics_metric
def all_well_connected(community):
    pass
    #...
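
The general pattern behind such a shorthand can be sketched in plain Python. This is not how ``pici`` implements its decorators; ``Level``, ``ReturnType``, ``toy_metric`` and ``toy_topics_metric`` are hypothetical stand-ins that only illustrate how a parameterized decorator can attach metadata to a function and how a shorthand can pre-fill its arguments.

```python
from enum import Enum
from functools import wraps


class Level(Enum):  # hypothetical stand-in for CommunityDataLevel
    TOPICS = "topics"
    COMMUNITY = "community"


class ReturnType(Enum):  # hypothetical stand-in for MetricReturnType
    PLAIN = "plain"
    DATAFRAME = "dataframe"


def toy_metric(level, returntype):
    """Return a decorator that tags a function with level and returntype."""
    def decorator(func):
        @wraps(func)
        def wrapper(community, **kwargs):
            return func(community, **kwargs)
        # Metadata that a reporting layer could later read.
        wrapper.level = level
        wrapper.returntype = returntype
        return wrapper
    return decorator


# A shorthand is the general decorator with its arguments pre-filled.
toy_topics_metric = toy_metric(
    level=Level.TOPICS, returntype=ReturnType.DATAFRAME
)


@toy_topics_metric
def always_false(community):
    return {"some metric": False}


print(always_false.level, always_false.returntype)
```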

We can use the community parameter to access the data that is required for measurement. In our case, we would like to access the threads (topics), the posts (to know which contributors posted in each thread), and the network. Each result should be a ``pandas.Series`` whose index matches the index of the community's original topics DataFrame.
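
The step from post-level values to a topic-indexed result can be sketched on toy data. The column names and the ``well_connected`` lookup below are made up for illustration; in an indicator, they would come from the community object.

```python
import pandas as pd

# Toy posts table; column names are assumptions for this sketch.
posts = pd.DataFrame({
    "topic_id": [1, 1, 2, 2, 3],
    "contributor": ["ann", "bob", "ann", "cy", "bob"],
})

# Hypothetical per-contributor flag, e.g. "belongs to the k-core".
well_connected = {"ann": True, "bob": True, "cy": False}

# One boolean per post ...
per_post = posts["contributor"].map(well_connected)

# ... aggregated to one boolean per topic: True only if every
# post in the topic was written by a well-connected contributor.
per_topic = per_post.groupby(posts["topic_id"]).all()

print(per_topic)
```

The resulting Series is indexed by topic, so it can be aligned with a topics DataFrame that uses the same index.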

**The complete indicator definition:**

In [5]:
import networkx as nx
from networkx.algorithms.approximation import clique
import pandas as pd
from pici.reporting import topics_metric

@topics_metric
def all_well_connected(community, k):
    """
    An indicator for whether all contributors to a thread
    are well-connected. There are two metrics for this concept:

    - ``all contributors in k-core`` - whether all contributors
       belong to the co-contributor network's k-core, and
    - ``all contributors in max clique`` - whether all
       contributors are in the co-contributor network's
       (approximate) maximum clique.

    Args:
        community: A pici.Community object
        k: parameter for k-core metric

    Returns:
        dict of (str, pandas.Series)

    """

    # the easiest way to retain the topics index
    # is to define our metric on the posts df
    # and then aggregate to the topics level:
    topic_col = community.topic_column
    contributor_col = community.contributor_column
    graph = community.co_contributor_graph

    # copy, so that adding columns does not modify community.posts
    df = community.posts[[topic_col, contributor_col]].copy()

    # a) k-cores

    # core number: largest value k of a k-core containing that contributor
    contributor_cores = nx.core_number(graph)

    # one row per post: True if the contributor belongs at least to the
    # k-core (contributors missing from the graph compare as False)
    df['in_k_core'] = df[contributor_col].map(contributor_cores) >= k

    # aggregate to boolean topic-level metric,
    # rule: all(in_k_core)==True
    all_in_k_core = df.groupby(by=topic_col)['in_k_core'].all()

    # b) max clique (approximation of the maximum clique)

    max_clique = clique.max_clique(graph)

    # one row per post: True if the contributor is in the clique
    df['in_max_clique'] = df[contributor_col].isin(max_clique)

    # rule: all(in_max_clique)==True
    all_in_max_clique = df.groupby(by=topic_col)['in_max_clique'].all()

    return {
        f'all contributors in {k}-core': all_in_k_core,
        'all contributors in max clique': all_in_max_clique
    }

## Using the indicator

To use this new indicator, it has to be registered with the ``Pici`` toolbox object. We can then generate reports that include the indicator, or use it in an ML pipeline.

Set up the toolbox:

In [6]:
from pici import Pici

p = Pici(
    cache_dir='../../cache',
    cache_nrows=3000,
    # start='2017-01-01',
    # end='2019-01-01',
)

Register the indicator with the toolbox:

In [7]:
p.add_metric(all_well_connected)

We can now use the metric in a report. We can pass a static value for *k*:

In [8]:
# p.generate_report([
#    (all_well_connected, {'k':10}),
#    (all_well_connected, {'k':30})
# ]).results
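
Since each report entry pairs the indicator with a parameter dictionary, entries for several values of *k* can also be built programmatically. This is a minimal sketch: ``all_well_connected_stub`` is a placeholder so that the snippet runs on its own, and the values of *k* are illustrative only.

```python
def all_well_connected_stub(community, k):
    """Placeholder; in the notebook, use the registered indicator instead."""
    return {}


# Candidate values for k (illustrative).
k_values = [5, 10, 20, 30]

# One (metric, parameters) pair per value of k, in the same
# shape as the entries of the commented-out report call.
report_spec = [(all_well_connected_stub, {'k': k}) for k in k_values]

for metric_func, params in report_spec:
    print(metric_func.__name__, params)
```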

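The indicator can also be used in a topics-level pipeline, where parameters are passed per metric name:

In [10]: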
pipe = p.pipelines.topics(parameters={
    'all_well_connected': {'k': 10}
})

In [None]:
pipe.transform(p.communities)