# Dynamic network analyses

**Authors**: Andreas Kruff, Johann Schaible, Marcos Oliveira

**Version**: 20.04.2020

**Description**: To analyze and compare different networks from the same dataset, this package contains methods that split the dataset into multiple networks based on different concepts. It also contains functions that compute meaningful measures for the comparison.

For more information about the methods explained in this tutorial, see the online documentation of this toolbox:

https://gesiscss.github.io/face2face_public/

## Table of Contents
#### [Create different networkx Graphs from a dataset](#create_network)
#### [Different representative network measurements](#measurements)

The cell below only needs to be executed once and can be ignored afterwards. It adds the parent directory to the path so that the data and the functions of this toolbox can be accessed.

In [1]:
import os
import sys
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [2]:
from face2face.imports.load_all_data import Data

df = Data("WS16")

## Create different networkx Graphs from a dataset 
<a name="create_network"></a>

If you want to analyze the whole dataset, you can use the "create_network_from_data" function. To replace string attributes with numeric attributes, set the replace_attr parameter to "True". You can also pass a label as an input parameter to drop the dataframe rows that have NaN values for the given label. Neither parameter is needed for normal usage, so you probably won't use them. They only matter for the null model functions.

In [5]:
from face2face.imports.create_network import create_network_from_data

network = create_network_from_data(df)
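To give an idea of what turning contact data into a network involves, here is a minimal sketch with plain pandas and networkx. It is not the implementation of "create_network_from_data": the toy table, its column names (t, i, j) and the choice of using the number of observed contacts as edge weight are assumptions made for illustration.

```python
import networkx as nx
import pandas as pd

# Hypothetical contact table: one row per time step t in which
# person i and person j were observed in contact.
toy_tij = pd.DataFrame({
    "t": [0, 20, 40, 40, 60],
    "i": [1, 1, 1, 2, 2],
    "j": [2, 2, 3, 3, 3],
})

# Count how often each pair occurs and use the count as edge weight.
# This assumes every pair is always stored in the same order (i < j).
toy_edges = toy_tij.groupby(["i", "j"]).size().reset_index(name="weight")

# Build an undirected graph with one weighted edge per pair.
toy_network = nx.from_pandas_edgelist(
    toy_edges, source="i", target="j", edge_attr="weight"
)

print(toy_network.number_of_nodes(), toy_network.number_of_edges())
print(toy_network[1][2]["weight"])
```

Persons 1 and 2 appear together in two rows, so their edge gets the weight 2.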

If you want to analyse specific time windows of a dataset, there are three different functions to do so.
The first method is called "hopping". The function gets a period of time in minutes (the minutes parameter) and splits the dataset into consecutive, non-overlapping time intervals like this: 0 - 10, 10 - 20, 20 - 30.
The second method is called "sliding". The function gets a period of time in minutes for the length of the interval (the interval parameter) and another period of time in minutes for the iteration step (the slide parameter). The resulting intervals overlap and look like this: 0 - 10, 1 - 11, 2 - 12.
The third method takes a list of time intervals for given events as input, so that only these specific events are analyzed.

In [4]:
from face2face.imports.create_network import hopping_time_networks

hopping_network_list, hopping_df_list = hopping_time_networks(df, minutes=10)

In [5]:
from face2face.imports.create_network import sliding_time_networks

sliding_network_list, sliding_df_list = sliding_time_networks(df, slide=1, interval=10)
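The difference between the two splitting schemes can be sketched in a few lines of plain Python. The helper names below ("hopping_windows", "sliding_windows") are hypothetical and not part of the toolbox; they only compute the interval boundaries in minutes and do not touch any data.

```python
def sliding_windows(start, end, length, step):
    """Return (begin, end) pairs of windows of the given length that
    start every `step` minutes and fit completely into [start, end]."""
    return [(t, t + length) for t in range(start, end - length + 1, step)]


def hopping_windows(start, end, length):
    """Hopping windows are sliding windows whose step equals their length,
    so consecutive windows touch but never overlap."""
    return sliding_windows(start, end, length, step=length)


print(hopping_windows(0, 30, 10))
print(sliding_windows(0, 12, 10, 1))
```

With these inputs the first call reproduces the intervals 0 - 10, 10 - 20, 20 - 30 and the second one the intervals 0 - 10, 1 - 11, 2 - 12 from the description above.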

With the "event_time_networks" function we can create networks based on events that took place at the conference.
For the tutorial we use a list of three fictional events, "Meeting A", "Break" and "Lecture A", so we can compare the social behaviour of the participants at these events.
You can add your own events by creating a list similar to the "events" list below. You just need to create tuples like this:
("<Eventname>", timestamp A, timestamp B) and add them to a list, where timestamp A is the start and timestamp B the end of the event.

In [11]:
from face2face.imports.create_network import event_time_networks

events = [("Meeting A", 1480486100, 1480488220), ("Break", 1480488880, 1480493060), ("Lecture A", 1480505760, 1480590240)]
event_network_list, event_df_list = event_time_networks(df, events)
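If your event schedule is written down as dates and times, you first have to convert it into timestamps. The sketch below assumes that the timestamps are Unix timestamps in seconds and that the schedule is given in UTC. Both are assumptions; please check the time reference of your dataset before relying on them. The helper "to_unix" and the "schedule" list are hypothetical names.

```python
from datetime import datetime, timezone


def to_unix(text):
    """Parse 'YYYY-MM-DD HH:MM:SS' as UTC (assumption) and return
    the Unix timestamp in seconds."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(moment.replace(tzinfo=timezone.utc).timestamp())


# Hypothetical schedule entry: (event name, start, end).
schedule = [
    ("Meeting A", "2016-11-30 06:08:20", "2016-11-30 06:43:40"),
]

# Same shape as the "events" list: (name, timestamp A, timestamp B).
events_from_schedule = [(name, to_unix(a), to_unix(b)) for name, a, b in schedule]
print(events_from_schedule)
```

Under the UTC assumption, this entry yields the same two timestamps as "Meeting A" in the "events" list above.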

All these functions have two outputs: one list that contains the networks and one list that contains the corresponding dataframes. If you want to analyse them, you can pick single networks/dataframes from the lists by their index. Keep in mind that the index starts at zero, so to access the first network/dataframe you write:

event_network_list[0] or event_df_list[0].

test = average_path_length_network(event_network_list[0])

If you want to analyse and compare all of them, you can loop through the list, for example like this:

for i in event_network_list:
    test = average_path_length_network(i)

You can pass a single network/dataframe or the loop variable to a function, as shown above, and compare the results. The output lists are always ordered: the hopping and the sliding networks are ordered chronologically by the timestamps they cover, and the networks from "event_time_networks" follow the order of the event input list.
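A loop like the one above overwrites its result in every iteration. If you want to compare the networks afterwards, you can store one value per network instead. The following sketch shows the pattern with three toy graphs and the networkx function "average_shortest_path_length" as a stand-in for a toolbox measurement; the labels and graphs are made up for illustration.

```python
import networkx as nx

# Toy stand-ins for a list of networks and their labels (hypothetical).
toy_labels = ["path", "complete", "star"]
toy_network_list = [nx.path_graph(4), nx.complete_graph(4), nx.star_graph(3)]

# Keep one result per network instead of overwriting a single variable.
results = {}
for label, toy_graph in zip(toy_labels, toy_network_list):
    results[label] = nx.average_shortest_path_length(toy_graph)

# Print the networks sorted by the measure to compare them.
for label, value in sorted(results.items(), key=lambda item: item[1]):
    print(label, round(value, 3))
```

Because the output lists keep their order, you can label the results with the event names from your input list in the same way.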

The network list can be used together with the network measurement functions, which are also part of this tutorial. To use the dataframes for measurements, you have to create a Data object as described in the tutorial "import_data_set". With the tij dataframes returned by these functions and the metadata from the original dataset, you can then analyze specific time windows, for example with the null model or the contact duration functions.

## Different representative network measurements
<a name="measurements"></a>

If you only need the results of specific measurements, you can call the following functions individually.

In [7]:
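# Import only the measurement functions that are used below
from face2face.statistics.calculating_multiple_measurements_for_a_network import (
    calculating_number_of_edges_nodes,
    mean_degree_network,
    variance_std_network,
    average_path_length_network,
    clustering_coefficient,
)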

number_of_nodes, number_of_edges = calculating_number_of_edges_nodes(event_network_list[0])
k = mean_degree_network(event_network_list[0])
var, std = variance_std_network(event_network_list[0])
d = average_path_length_network(event_network_list[0])
c = clustering_coefficient(event_network_list[0])
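To see what such measures look like on a graph that you can check by hand, the sketch below computes them with plain networkx on a toy graph (a triangle with one extra node attached). It uses the textbook definitions. The toolbox functions above may define or estimate some of these measures differently, so this is an illustration and not their implementation.

```python
import math

import networkx as nx

# Toy graph: triangle 0-1-2 plus node 3 attached to node 2.
toy = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

n_nodes = toy.number_of_nodes()
n_edges = toy.number_of_edges()

# Degree statistics (population variance, i.e. divided by N).
degrees = [degree for _, degree in toy.degree()]
mean_degree = sum(degrees) / n_nodes
variance = sum((degree - mean_degree) ** 2 for degree in degrees) / n_nodes
std = math.sqrt(variance)

# Mean shortest-path length over all pairs (the graph must be connected).
avg_path = nx.average_shortest_path_length(toy)

# Mean of the local clustering coefficients of all nodes.
avg_clustering = nx.average_clustering(toy)

print(n_nodes, n_edges, mean_degree, variance)
print(round(avg_path, 4), round(avg_clustering, 4))
```

On this graph the degrees are 2, 2, 3 and 1, so the mean degree is 2 and the variance is 0.5.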

The other option is to use "print_network_measures" to print all these measures at once, which makes it easy to compare different networks. Here the function is called for the networks of the three events mentioned before, so you can compare how the participants behaved during the different events.

In [12]:
from face2face.visualization.output_network_information import print_network_measures

print_network_measures(event_network_list[0])

Number of nodes = 109 , Number of edges = 592
Average Network degree <k> = 10.862385321100918
Standard deviation of the Network degree = 8.362726268419793
Average Path Length <d> = 1.9667698876287776
Clustering Coefficient C = 0.09965491120276071


In [13]:
print_network_measures(event_network_list[1])

Number of nodes = 132 , Number of edges = 6172
Average Network degree <k> = 93.51515151515152
Standard deviation of the Network degree = 23.64974368766382
Average Path Length <d> = 1.0759517569977246
Clustering Coefficient C = 0.7084481175390266


In [14]:
print_network_measures(event_network_list[2])

Number of nodes = 35 , Number of edges = 109
Average Network degree <k> = 6.228571428571429
Standard deviation of the Network degree = 3.98630307990975
Average Path Length <d> = 1.943719153771244
Clustering Coefficient C = 0.17795918367346938


The Average Path Length describes how many steps along existing contacts are needed, on average, to get from one person to any other person in the network. The Average Path Length for these events is quite low, with values between 1 and 2. It means that you usually have to talk to just one person, who will know the person you are looking for. Occasionally you may need two.
The Clustering Coefficient shows how dense the contacts between the neighbors of a person are. A value close to 1 means that almost all neighbors of a person are also in contact with each other. A value close to 0 means the opposite. In this case the Clustering Coefficients of events 1 and 3 are quite low, while the Clustering Coefficient of the second event is quite high.
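A note on the printed numbers: this is an observation from the output above, not a documented property of the toolbox. The printed values agree with the random-network estimates ln(N) / ln(k) for the Average Path Length and k / N for the Clustering Coefficient, where N is the number of nodes and k the average degree. If they are computed this way, they describe a random network of the same size and mean degree rather than measured shortest paths or triangles. The sketch below recomputes both estimates from the printed node and edge counts, assuming the three outputs belong to the events in the order of the input list.

```python
import math

# (nodes, edges, printed <d>, printed C), copied from the outputs above.
printed = {
    "Meeting A": (109, 592, 1.9667698876287776, 0.09965491120276071),
    "Break": (132, 6172, 1.0759517569977246, 0.7084481175390266),
    "Lecture A": (35, 109, 1.943719153771244, 0.17795918367346938),
}

differences = {}
for name, (n, e, d_printed, c_printed) in printed.items():
    k = 2 * e / n                        # average degree of an undirected network
    d_est = math.log(n) / math.log(k)    # random-network path length estimate
    c_est = k / n                        # random-network clustering estimate
    differences[name] = (abs(d_est - d_printed), abs(c_est - c_printed))
    print(name, round(d_est, 6), round(c_est, 6))
```

If you need the measured quantities instead, you could compare these values with the results of the corresponding networkx functions on the same networks.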