# Dynamic network analyses

**Authors**: Andreas Kruff, Johann Schaible, Marcos Oliveira

**Version**: 20.04.2020

**Description**: To analyze and compare different networks from the same dataset, this package contains methods that split the dataset into multiple networks based on different concepts. It also contains functions that compute meaningful measures for the comparison.

For more information about the methods that are explained in this tutorial you can check out the online documentation of this toolbox here:

https://gesiscss.github.io/face2face/

## Table of Contents
#### [Create different networkx Graphs from a dataset](#create_network)
#### [Different representative network measurements](#measurements)

In [1]:
import face2face as f2f

df = f2f.Data("Synthetic")

## Create different networkx Graphs from a dataset 
<a name="create_network"></a>

If you want to analyze the whole dataset, you can use the "create_network_from_data" function. To replace string attributes with numeric attributes, set the replace_attr parameter to "True". You can also pass a label as an input parameter to drop the dataframe rows that have NaN values for the given label. Neither parameter is needed for normal usage, so you probably won't use them. They mainly matter for the null model functions.

In [2]:
network = f2f.create_network_from_data(df)

If you want to analyse specific time windows of a dataset, there are three different functions to do so. 
The first method is called "hopping". The function takes a period of time (in minutes) and splits the dataset into consecutive, non-overlapping time intervals like this: 0 - 10, 10 - 20, 20 - 30. 
The second method is called "sliding". The function takes one period of time (in minutes) for the length of the interval and another period of time (in minutes) for the step between intervals, so the splits overlap and look like this: 0 - 10, 1 - 11, 2 - 12.
The third option takes a list of time intervals for given events as input, so that only those specific events are analysed.
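The difference between the two windowing schemes can be sketched with a few lines of plain Python. This is only an illustration of the interval boundaries described above, not the toolbox's implementation: the helper names `hopping_windows` and `sliding_windows` are hypothetical, and dropping a trailing window that does not fit completely is an assumption.

```python
def sliding_windows(start, end, width, step):
    """Return (begin, finish) windows of equal width, shifted by `step`.

    Assumption: a trailing window that would run past `end` is dropped.
    """
    windows = []
    begin = start
    while begin + width <= end:
        windows.append((begin, begin + width))
        begin += step
    return windows


def hopping_windows(start, end, width):
    """Hopping is the special case where the step equals the window width."""
    return sliding_windows(start, end, width, step=width)


print(hopping_windows(0, 30, 10))     # [(0, 10), (10, 20), (20, 30)]
print(sliding_windows(0, 12, 10, 1))  # [(0, 10), (1, 11), (2, 12)]
```

Seen this way, hopping windows never share data, while sliding windows with a small step overlap heavily, so neighbouring sliding networks will tend to look similar.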

In [3]:
hopping_network_list, hopping_df_list = f2f.hopping_time_networks(df, minutes=10)

In [4]:
sliding_network_list, sliding_df_list = f2f.sliding_time_networks(df, slide=1, interval=10)

With the "event_time_networks" function we can create networks based on events that took place at the conference. 
For the tutorial we have a list of three fictional events, "Meeting A", "Break" and "Lecture A", so we can compare the social behaviour of the participants at these events.
You can add your own events by creating a list similar to the "events" list below. You just need to create tuples like this:
("Eventname", timestamp A, timestamp B) and add them to a list. 

In [13]:
events = [("Meeting A", 20, 1800), ("Break", 40000, 40500), ("Lecture A", 300000, 350000)]
event_network_list, event_df_list = f2f.event_time_networks(df, events)
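To make the role of the event tuples concrete, the following sketch selects contact records that fall inside each event interval. It is not how the toolbox is implemented: the helper `contacts_in_event`, the toy contact list, and the choice of inclusive boundaries are all assumptions made for illustration.

```python
def contacts_in_event(contacts, event):
    """Keep the (time, person_a, person_b) records that fall inside one event.

    Assumption: both interval boundaries are inclusive.
    """
    _name, begin, finish = event
    return [record for record in contacts if begin <= record[0] <= finish]


# Hypothetical toy contacts, not taken from the synthetic dataset
contacts = [(20, 1, 2), (100, 2, 3), (40100, 1, 3), (320000, 4, 5)]
events = [("Meeting A", 20, 1800), ("Break", 40000, 40500), ("Lecture A", 300000, 350000)]

# One list of contacts per event, keyed by the event name
per_event = {event[0]: contacts_in_event(contacts, event) for event in events}
for name, records in per_event.items():
    print(name, records)
```

Contacts outside every interval are simply ignored, which is why the event networks can contain fewer nodes than the network built from the whole dataset.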

All these functions have two outputs: one list that contains networks and one list that contains dataframes. To analyse them, you can pick single networks/dataframes from the lists by index. Keep in mind that the index starts at zero, so you access the first network/dataframe like this:

event_network_list[0] or event_df_list[0].

test = f2f.average_path_length_network(event_network_list[0])

Or, if you want to analyse and compare them, you can loop through them, for example like this:

for network in event_network_list:

    test = f2f.average_path_length_network(network)

You can then pass a single network/dataframe, or the loop variable, to a function as shown above and compare the results. The output lists are always in chronological order: the hopping and sliding networks are ordered by the timestamps they cover, and the networks from "event_time_networks" follow the order of the event input list. 
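Because the event networks come back in the same order as the event input list, results can be paired with event names by position. The helper below is a minimal sketch of that idea. `measure_by_event` is a hypothetical name, and the toy "networks" are plain sets of node labels so the example runs without the toolbox.

```python
def measure_by_event(events, networks, measure):
    """Map each event name to measure(network), relying on matching order."""
    if len(events) != len(networks):
        raise ValueError("events and networks must have the same length")
    return {event[0]: measure(network) for event, network in zip(events, networks)}


events = [("Meeting A", 20, 1800), ("Break", 40000, 40500), ("Lecture A", 300000, 350000)]

# Stand-ins for real networks: sets of node labels, measured with len()
toy_networks = [{"a", "b"}, {"a", "b", "c"}, {"a"}]

sizes = measure_by_event(events, toy_networks, len)
print(sizes)  # {'Meeting A': 2, 'Break': 3, 'Lecture A': 1}
```

With the objects from this tutorial, the call would look like `measure_by_event(events, event_network_list, f2f.average_path_length_network)`.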

The network lists can be used together with the network measurement functions that are also part of this tutorial. To use the dataframes for measurements, you have to create a Data object as described in the tutorial "import_data_set". With the tij dataframes returned by these functions and the metadata from the original dataset, you can analyse specific time windows with, for example, the null model or the contact duration functions.

## Different representative network measurements
<a name="measurements"></a>

You can use the following functions for the measurements individually, if you just want the result for specific measurements.

In [6]:
number_of_nodes, number_of_edges = f2f.calculating_number_of_edges_nodes(event_network_list[0])
k = f2f.mean_degree_network(event_network_list[0])
var, std = f2f.variance_std_network(event_network_list[0])
d = f2f.average_path_length_network(event_network_list[0])
c = f2f.clustering_coefficient(event_network_list[0])

The other option is to use "print_network_measures" to print all these measures at once, so you can compare them across different networks. Here the function is called on the networks of the three events mentioned before, so you can compare the behaviour of the participants during the different events.

In [7]:
f2f.print_network_measures(event_network_list[0])

Number of nodes = 86 , Number of edges = 59
Average Network degree <k> = 1.372093023255814
Standard deviation of the Network degree = 0.6827868956234043
Average Path Length <d> = 14.081004355155086
Clustering Coefficient C = 0.0159545700378583


In [8]:
f2f.print_network_measures(event_network_list[1])

Number of nodes = 200 , Number of edges = 1057
Average Network degree <k> = 10.57
Standard deviation of the Network degree = 3.8723507072578016
Average Path Length <d> = 2.2469350625524203
Clustering Coefficient C = 0.05285


In [15]:
f2f.print_network_measures(event_network_list[2])

Number of nodes = 200 , Number of edges = 1057
Average Network degree <k> = 10.57
Standard deviation of the Network degree = 3.8723507072578016
Average Path Length <d> = 2.2469350625524203
Clustering Coefficient C = 0.05285
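As a quick consistency check on the printed output, the mean degree of an undirected network follows directly from the node and edge counts, because every edge contributes to the degree of two nodes: `<k> = 2E / N`. The sketch below assumes the networks are undirected. The helper name `mean_degree` is illustrative and not part of the toolbox.

```python
def mean_degree(number_of_nodes, number_of_edges):
    """Mean degree of an undirected network: each edge adds to two degrees."""
    return 2 * number_of_edges / number_of_nodes


k_meeting = mean_degree(86, 59)    # counts printed for event_network_list[0]
k_break = mean_degree(200, 1057)   # counts printed for event_network_list[1]

print(k_meeting)
print(k_break)
```

Both results agree with the "Average Network degree" lines in the output above (about 1.372 and 10.57).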


The Average Path Length describes how many steps (contacts) it takes on average to get from one person to any other person in the network. For the "Break" and "Lecture A" events it is low, at about 2.25. This means that you usually need only one intermediary to reach the person you are looking for, and occasionally two. For "Meeting A" the value is much higher (about 14.1), which fits its sparse network of 59 edges among 86 nodes. 
The Clustering Coefficient shows how dense the contacts between the neighbors of a person are. A value close to 1 means that every neighbor of a person is also in contact with the other neighbors of this person. A value close to 0 means the opposite. In this case the Clustering Coefficient is low for all events (about 0.016 and 0.053).
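The printed values of `<d>` and `C` coincide with two textbook estimates for random networks that depend only on the number of nodes `N` and the mean degree `<k>`: `d ≈ ln(N) / ln(<k>)` and `C ≈ <k> / N`. That the toolbox computes the measures this way is an inference from the output shown in this tutorial, not something the tutorial states, so treat the sketch below as a cross-check under that assumption and consult the online documentation for the actual definitions. The function name `random_network_estimates` is hypothetical.

```python
import math


def random_network_estimates(number_of_nodes, mean_degree):
    """Return (path length, clustering) estimates for a random network.

    d ~ ln(N) / ln(<k>) and C ~ <k> / N. Both use only N and <k>,
    so they do not inspect the actual wiring of the network.
    """
    path_length = math.log(number_of_nodes) / math.log(mean_degree)
    clustering = mean_degree / number_of_nodes
    return path_length, clustering


# Node counts and mean degrees taken from the printed output above
d_meeting, c_meeting = random_network_estimates(86, 2 * 59 / 86)
d_break, c_break = random_network_estimates(200, 10.57)

print(d_meeting, c_meeting)
print(d_break, c_break)
```

If the assumption holds, the reported values describe what a random network of the same size and density would look like, which is worth keeping in mind when interpreting differences between events.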