# Describing the functionality of pandas, networkx and the basic functions from the face2face library

**Authors**: Andreas Kruff, Johann Schaible, Marcos Oliveira

**Version**: 12.05.2020

**Description**: This tutorial describes the underlying pandas and networkx methods that the face2face toolbox uses to calculate the average degree of different groups and subgroups.

## Table of Contents
#### [Implement the average_degree function](#average_degree)
#### [Implement the group_list_degree function](#group_list_degree)

In [1]:
import face2face as f2f

df = f2f.Data("Synthetic")

# Explanation of the degree methods

The degree of a node (an individual) is the number of distinct other nodes (individuals) it was in contact with. Combined with the node's attributes (such as Age or Gender), the degree helps us analyze whether specific groups are more or less communicative than others, both within and outside of the communities that share the same attribute values.
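To make the definition concrete, here is a minimal sketch in plain networkx with a small, made-up contact list. It assumes the contacts are stored in an undirected simple `nx.Graph`, where repeated contacts between the same two people collapse into a single edge; whether face2face builds its network exactly this way is an assumption.

```python
import networkx as nx

# Hypothetical contact list: each tuple is one recorded face-to-face contact.
# The pair (1, 2) is recorded twice, but a simple Graph keeps only one edge.
contacts = [(1, 2), (1, 2), (1, 3), (2, 3), (3, 4)]

G = nx.Graph()
G.add_edges_from(contacts)

# Degree = number of distinct individuals each node was in contact with.
degrees = dict(G.degree())
print(degrees)  # {1: 2, 2: 2, 3: 3, 4: 1}
```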

## How to implement the average_degree function
<a name="average_degree"></a>

First, import the networkx library, because it contains functions that simplify and speed up the calculations in this tutorial.

In [2]:
import networkx

Next, set up the metadata dataframe: replace every missing (NaN) value with the string "NaN" so that these entries are easy to filter out later. Then create a networkx graph from the data with the "create_network_from_data" function of the face2face toolbox, so that you can use the graph's methods to measure the degree.

In [3]:
df_meta_nan = df.metadata.fillna("NaN")
network = f2f.create_network_from_data(df)
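The two steps above can be illustrated on toy data. This is a stand-in under stated assumptions, not the toolbox's code: the metadata columns, the contact table with the hypothetical columns `i` and `j`, and the use of `nx.from_pandas_edgelist` are chosen only to illustrate what `f2f.create_network_from_data` may do internally.

```python
import networkx as nx
import numpy as np
import pandas as pd

# Hypothetical metadata table: column names and values are made up.
metadata = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Gender": ["F", "M", np.nan, "F"],
})
# Replace missing values with the string "NaN" so they can be compared with != later.
meta_nan = metadata.fillna("NaN")

# Hypothetical contact table: one row per contact between persons "i" and "j".
contacts = pd.DataFrame({"i": [1, 1, 2, 3], "j": [2, 3, 3, 4]})
# Build an undirected simple graph from the edge list (a sketch only,
# not the actual implementation of f2f.create_network_from_data).
G = nx.from_pandas_edgelist(contacts, source="i", target="j")

print(list(meta_nan["Gender"]))  # ['F', 'M', 'NaN', 'F']
```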

To analyze the degrees based on specific attributes, you first need an overview of which attributes the metadata contains. The first column, "ID", is an identifier rather than an attribute, so it makes no sense to analyze it, and you have to exclude it.

In [4]:
parameter_list = []
for col in df_meta_nan.columns:
    if col != "ID":
        parameter_list.append(col)
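The loop above can also be written as a list comprehension or with `Index.drop`. A minimal sketch, using a hypothetical empty metadata frame with made-up column names:

```python
import pandas as pd

# Hypothetical metadata columns; only the column names matter here.
metadata = pd.DataFrame(columns=["ID", "Age", "Gender", "Class"])

# Same result as the for loop: every column name except "ID".
parameter_list = [col for col in metadata.columns if col != "ID"]

# Equivalent alternative: drop the label "ID" from the column index.
alt_parameter_list = metadata.columns.drop("ID").tolist()

print(parameter_list)  # ['Age', 'Gender', 'Class']
```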

As a next step, split the "ID" column into multiple lists based on the values of each attribute column, so that you can measure the average degree of each list in the next step. Rows where the analyzed attribute is "NaN" should not be used, so filter the dataframe by this condition before applying groupby in a loop over every attribute.

In [5]:
complete_parameter_value_list = []
for i in parameter_list:
    parameter_value_list = []
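    # Filter the NaN-filled copy; comparing a real NaN in df.metadata with the string "NaN" would not remove it
    nan_filtered_dataframe = df_meta_nan.loc[df_meta_nan[i] != "NaN"]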
    for parameter_values, grouped_by_dataframes in nan_filtered_dataframe.groupby(i):
        parameter_value_list.append([grouped_by_dataframes["ID"], parameter_values])
    complete_parameter_value_list.append([i, parameter_value_list])
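The filtering and grouping can be checked on a small, hypothetical metadata frame. The sketch also shows why the filter has to run on the NaN-filled copy: a real missing value compares unequal to the string "NaN", so the same filter on the raw metadata keeps every row.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata with one missing Gender value.
metadata = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Gender": ["F", "M", np.nan, "F", "M"],
})
meta_nan = metadata.fillna("NaN")

# On the raw frame, NaN != "NaN" is True, so no row is removed.
raw_filtered = metadata.loc[metadata["Gender"] != "NaN"]

# On the filled frame, the row with the missing value is removed.
filtered = meta_nan.loc[meta_nan["Gender"] != "NaN"]

# groupby yields (attribute value, sub-DataFrame) pairs; keep the IDs per value.
groups = {value: sub_df["ID"].tolist() for value, sub_df in filtered.groupby("Gender")}
print(groups)  # {'F': [1, 4], 'M': [2, 5]}
```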

Now that you have lists of IDs for every attribute and every attribute value, you can use network.degree to look up the degree of each ID and compute the average degree of a list by summing the degrees and dividing the sum by the length of the list.

In [6]:
avg_degree_param_list = []
for i in complete_parameter_value_list:
    value_avg_degree_pair_list = []
    for j in i[1]:
        avg_degree = 0
        for k in j[0]:
            avg_degree += network.degree[k]
        avg_degree = avg_degree / len(j[0])
        value_avg_degree_pair_list.append([j[1], avg_degree])
    avg_degree_param_list.append([i[0], value_avg_degree_pair_list])
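The nested loops compute one mean per group. Assuming the metadata is a pandas DataFrame with an "ID" column, the same averages can be sketched more compactly with `Series.map` and `groupby(...).mean()`. The graph and the attribute values below are made up.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy data: metadata with IDs and one attribute, plus a contact graph.
metadata = pd.DataFrame({"ID": [1, 2, 3, 4], "Gender": ["F", "M", "F", "M"]})
G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4)])

# Look up each person's degree, then average it within every attribute value.
metadata["degree"] = metadata["ID"].map(dict(G.degree()))
avg_degree = metadata.groupby("Gender")["degree"].mean()

result = {value: float(mean) for value, mean in avg_degree.items()}
print(result)  # {'F': 2.5, 'M': 1.5}
```

One design difference: an ID that does not appear in the graph maps to NaN and is skipped by `mean()`, whereas `network.degree[k]` in the loop raises a `KeyError` for such an ID.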

Finally, you can add an overall average degree for every attribute to the list. It is computed from the per-value averages above, as their unweighted mean.

In [7]:
for i in avg_degree_param_list:
    avg_degree_parameter = 0
    for j in i[1]:
        avg_degree_parameter += j[1]
    avg_degree_parameter = avg_degree_parameter / len(i[1])
    i[1].append(["GlobalAvG", avg_degree_parameter])

## How to implement the group_list_degree function
<a name="group_list_degree"></a>

The beginning of this implementation is very similar to the avg_degree_attr function, so you can skip most of it.

In [8]:
import networkx

In [9]:
df_meta_nan = df.metadata.fillna("NaN")
network = f2f.create_network_from_data(df)

In [10]:
parameter_list = []
for col in df_meta_nan.columns:
    if col != "ID":
        parameter_list.append(col)

In this case you only need one list per attribute and attribute value, containing the related IDs.

In [11]:
complete_parameter_value_list = []
for i in parameter_list:
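    # Filter the NaN-filled copy; comparing a real NaN in df.metadata with the string "NaN" would not remove it
    nan_filtered_dataframe = df_meta_nan.loc[df_meta_nan[i] != "NaN"]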
    for parameter_values, grouped_by_dataframes in nan_filtered_dataframe.groupby(i):
        complete_parameter_value_list.append([i, parameter_values, list(grouped_by_dataframes["ID"])])
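The same flat [attribute, value, IDs] entries can be sketched with `groupby(...)["ID"].agg(list)`, which collects the IDs of each group into a list directly. The metadata below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata with missing values in two attributes.
metadata = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Gender": ["F", "M", np.nan, "F"],
    "Class": ["1A", "1A", "1B", np.nan],
})
meta_nan = metadata.fillna("NaN")
parameter_list = [col for col in meta_nan.columns if col != "ID"]

# One flat entry per (attribute, attribute value): [attribute, value, list of IDs].
entries = []
for attr in parameter_list:
    filtered = meta_nan.loc[meta_nan[attr] != "NaN"]
    ids_per_value = filtered.groupby(attr)["ID"].agg(list)
    for value, ids in ids_per_value.items():
        entries.append([attr, value, ids])

print(len(entries))  # 4
```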

Next, replace the ID values by their degrees using network.degree.

In [12]:
for i in complete_parameter_value_list:
    parameter_value_degree_list = []
    for j in i[2]:
        parameter_value_degree_list.append(network.degree(j))
    i[2] = parameter_value_degree_list[:]
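The replacement step can also be written as a list comprehension. A minimal sketch on a hypothetical graph and hypothetical entries; called with a single node, `G.degree(node_id)` returns that node's degree as an integer.

```python
import networkx as nx

G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4)])  # hypothetical contact graph
entries = [["Gender", "F", [1, 3]], ["Gender", "M", [2, 4]]]  # hypothetical entries

# Replace each ID list by the list of the corresponding degrees.
for entry in entries:
    entry[2] = [G.degree(node_id) for node_id in entry[2]]

print(entries)  # [['Gender', 'F', [2, 3]], ['Gender', 'M', [2, 1]]]
```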

These lists can be used to compare the degree distributions, and thus the communicativeness, of the groups defined by the different attribute values.
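One simple way to start such a comparison is to summarize each degree list, for example by its mean and median. This is only a sketch on hypothetical degree lists using Python's `statistics` module, not a method provided by face2face.

```python
import statistics

# Hypothetical [attribute, value, degree list] entries, as produced above.
entries = [["Gender", "F", [2, 3, 4]], ["Gender", "M", [1, 1, 4]]]

# Mean and median degree per attribute value.
summary = {
    value: (statistics.mean(degrees), statistics.median(degrees))
    for _, value, degrees in entries
}
print(summary)
```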