# Prototype for the function that builds the connections
This is a prototype for the function *create_connections* found in the file **network.py**.

This is version 0.1.

The notebook does not consider the waves as different time points.

Last update: 25/01/2018

In [1]:
import json
import networkx as nx
import numpy as np
import pandas as pd
import os

# Read nominations file
Load the nominations file from the data folder. The original file was modified so that its columns are:
```
nominations.columns = ['class', 'child', 'wave', 'variable', 'class_nominated', 'nominated', 'same_class']
```


In [2]:
data_f='./data/'
nominations = pd.read_csv(data_f+'nominations.csv', sep=';', header=0)

In [3]:
nominations.head()

Unnamed: 0,class,child,wave,variable,class_nominated,nominated,same_class
0,52,643,1,PA_Impression_Management,,633,0
1,52,645,1,ME_Com_Network,,633,0
2,52,645,1,GEN_Leader,,633,0
3,52,648,1,GEN_Friendship,,633,0
4,52,648,1,GEN_Want2B,,633,0


# Variables present in the file
The variables collected in this file are presented below.

In [4]:
set(nominations.variable)

{'DI_Com_Network',
 'DI_Impression_management',
 'DI_Modelling',
 'GEN_Advice',
 'GEN_Friendship',
 'GEN_Leader',
 'GEN_Respect',
 'GEN_Social_Facilitation',
 'GEN_Want2B',
 'ME_Com_Network',
 'PA_Com_Network',
 'PA_Impression_Management',
 'PA_Modelling'}

# Read the formula for the weights of each question
The JSON file is a dictionary that specifies which questions are taken into account and how much weight each one contributes to the overall connection strength.

max_score is the maximum total weight that a connection between two children can accumulate. If one child nominates another in every weighted question, the connection strength reaches 1 after normalization.

In [5]:
with open('./settings/connections.json') as formula_file:
    formula = json.load(formula_file)

In [6]:
formula

{'DI_Com_Network': 1,
 'DI_Impression_management': 0,
 'DI_Modelling': 1,
 'GEN_Advice': 1,
 'GEN_Friendship': 1,
 'GEN_Leader': 1,
 'GEN_Respect': 1,
 'GEN_Social_Facilitation': 1,
 'GEN_Want2B': 1,
 'ME_Com_Network': 1,
 'PA_Com_Network': 1,
 'PA_Impression_Management': 1,
 'PA_Modelling': 1}

In [7]:
max_score = sum(formula.values())
max_score

12
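
The normalization described above can be illustrated with a minimal sketch. The three-question `formula` and the helper `connection_strength` are hypothetical and only stand in for the real weights loaded from the JSON file.

```python
# Hypothetical subset of the weights; the real ones come from connections.json
formula = {
    "GEN_Friendship": 1,
    "GEN_Advice": 1,
    "GEN_Leader": 1,
    "DI_Impression_management": 0,  # a weight of 0 excludes the question
}
max_score = sum(formula.values())


def connection_strength(variables, formula, max_score):
    """Sum the weights of the distinct questions in which one child
    nominated another, then normalize by the maximum possible score."""
    return sum(formula[v] for v in set(variables)) / max_score


# Nominated in two of the three weighted questions
partial = connection_strength(["GEN_Friendship", "GEN_Advice"], formula, max_score)
# Nominated in every question
full = connection_strength(list(formula), formula, max_score)
print(partial, full)
```

A nomination in every weighted question yields a strength of 1, and questions with weight 0 do not change the result.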

# Building a dictionary of nominations

The idea is to build a dictionary that maps each child to the children they nominated, together with the accumulated weight of those nominations. The final result should look like this:

```json
{
    child1: {
            nominated1: weight1,
            nominated2: weight2,
            ...
            nominatedx: weightx,
           },
    child2: {
            nominated1: weight1,
            nominated2: weight2,
            ...
            nominatedx: weightx,
           },
    ...
    childx: {
            nominated1: weight1,
            nominated2: weight2,
            ...
            nominatedx: weightx,
           }
}
```
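
As a small illustration of this structure, the sketch below accumulates a nested dictionary from a few made-up `(child, nominated, weight)` triples. The ids and weights are assumptions chosen only for the example.

```python
# Hypothetical nominations: (child, nominated, weight of the question)
rows = [
    (643, 633, 1),
    (645, 633, 1),
    (645, 633, 1),  # second question in which 645 nominated 633
    (645, 648, 1),
]

connections = {}
for child, nominated, weight in rows:
    # Create the inner dictionary on first use, then add up the weights
    inner = connections.setdefault(child, {})
    inner[nominated] = inner.get(nominated, 0) + weight

print(connections)
```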

# List of participants
*pp.csv* lists the participants and the waves in which they took part in the experiment.

In [8]:
pp = pd.read_csv(data_f+'pp.csv', sep=';', header=0)
pp.head()

Unnamed: 0,School,Primary,Secondary,Class_Y1,Class_Y2,Child_Bosse,parti_W1,parti_W2,parti_W3,parti_W4
0,22,,1.0,52.0,52.0,643,1,0,0,0
1,22,,1.0,52.0,52.0,645,1,0,0,0
2,22,,1.0,52.0,52.0,648,1,0,0,0
3,22,,1.0,52.0,55.0,649,1,1,0,1
4,22,,1.0,52.0,55.0,650,1,1,0,1


# Build an empty dictionary for all the children (953)

In [9]:
connections_dict = {}
for child in list(pp.Child_Bosse):
    connections_dict[child] = {}

# Create nomination list
This version does not consider differences between waves.

In [10]:
# Triples already counted, to avoid repetition
seen_nominations = set()
# Participant ids as a set, so that membership tests are fast
participants = set(pp.Child_Bosse)

for ch, nom, var in nominations[['child', 'nominated', 'variable']].itertuples(index=False):
    # Count the nomination only if the nominated child is a participant (pp)
    # and this exact (child, nominated, variable) triple is new
    if nom in participants and (ch, nom, var) not in seen_nominations:
        # Add the weight of the question to the connection
        connections_dict[ch][nom] = connections_dict[ch].get(nom, 0) + formula[var]
        seen_nominations.add((ch, nom, var))
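
The same aggregation could also be expressed with pandas operations instead of an explicit loop. The sketch below is an alternative under assumed toy data, not the notebook's method: the ids, the `participants` set and the three-question `formula` are made up for the example.

```python
import pandas as pd

# Hypothetical nominations, including a duplicate and an unknown child (99)
nominations = pd.DataFrame({
    "child": [1, 1, 1, 2, 2],
    "nominated": [2, 2, 2, 1, 99],
    "variable": ["GEN_Friendship", "GEN_Advice", "GEN_Friendship",
                 "GEN_Leader", "GEN_Advice"],
})
participants = {1, 2, 3}
formula = {"GEN_Friendship": 1, "GEN_Advice": 1, "GEN_Leader": 1}

# Drop repeated (child, nominated, variable) triples
unique = nominations.drop_duplicates(subset=["child", "nominated", "variable"])
# Keep only nominations of known participants
valid = unique[unique["nominated"].isin(participants)]
# Map each question to its weight and add the weights up per pair
weights = (
    valid.assign(weight=valid["variable"].map(formula))
    .groupby(["child", "nominated"])["weight"]
    .sum()
)
print(weights)
```

The result is a Series indexed by `(child, nominated)` pairs, which carries the same information as the nested dictionary.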


In [11]:
pd.DataFrame(connections_dict).values

array([[nan,  1.,  4., ..., nan, nan, nan],
       [ 2., nan,  4., ..., nan, nan, nan],
       [ 2.,  3., nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])

# Create a data frame of the connections

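The resulting data frame is 901x953: the 953 columns are the nominating children and the 901 rows are the children who received at least one nomination. This means that 52 participants were never nominated by anyone.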

Missing values are replaced with 0, and the entire data frame is divided by the maximum score so that all values lie between 0 and 1.

In [12]:
# can be skipped if the division happens when the edges are being inserted.
connections_df = pd.DataFrame(connections_dict).fillna(0)/max_score

In [13]:
connections_df.shape

(901, 953)
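
The orientation of the data frame follows from how pandas builds a frame from a dictionary of dictionaries: outer keys become columns and inner keys become rows. A minimal sketch with made-up ids and a hypothetical `max_score` of 2:

```python
import pandas as pd

# Hypothetical raw weights: {nominating child: {nominated child: weight}}
connections = {1: {2: 2}, 2: {1: 1}, 3: {1: 2}}
max_score = 2

# Outer keys become columns, inner keys become rows
df = pd.DataFrame(connections).fillna(0) / max_score

print(df.shape)      # child 3 is never nominated, so it has no row
print(df.loc[2, 1])  # row = nominated child 2, column = nominating child 1
```

Child 3 nominates someone but is never nominated, so it appears as a column and not as a row, which is the same effect that produces fewer rows than columns above.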

In [14]:
len(connections_dict.keys())

953

# Build graph
The normalized data is converted back to a dictionary.

Its items provide the endpoints of the edges and the weights of the connections.

In [15]:
connections_dict = connections_df.to_dict()

In [16]:
graph = nx.DiGraph()

In [17]:
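# Outer keys are the nominating children; inner keys are the children they nominated
for node in connections_dict.items():
    destine = node[0]
    origins = node[1]
    for peer, weight in origins.items():
        # Skip the zero entries introduced by fillna(0)
        if weight > 0:
            # The edge points from the nominated child to the nominating child
            graph.add_edge(peer, destine, weight=weight)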

In [18]:
len(graph.edges())

9042
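
The edge construction can be traced on a small example without building a graph. The sketch below uses made-up normalized weights and collects `(source, target, weight)` triples, a shape that networkx's `add_weighted_edges_from` accepts. The direction (nominated child to nominating child) mirrors the loop above.

```python
# Hypothetical normalized weights: {nominating child: {nominated child: weight}}
connections = {
    1: {2: 0.5, 3: 0.0},
    2: {1: 1.0, 3: 0.25},
}

# One edge per positive weight, pointing from the nominated child
# to the child who made the nomination
edges = [
    (nominated, nominator, weight)
    for nominator, targets in connections.items()
    for nominated, weight in targets.items()
    if weight > 0
]
print(edges)
```

Zero-weight pairs, which only exist because missing values were filled with 0, produce no edge.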

# Save information of the edges

The results are saved in the file *connections.csv* in the **results** folder.

The graph has 9,042 edges based on the data provided.

In [19]:
connections_df.to_csv('./results/connections.csv')