# First Social Computing Project - Analysis of the Twitter Social Network

## Authors

*   Emanuele Lena - 142411
*   Ilaria Fenos - 142494
*   Massimiliano Baldo - 142296
*   Simone Dalla Pietà - 141995




## General Introduction
The goal of the project is to retrieve a portion of the Twitter social network and then analyse it with some of the most common graph-analysis techniques. In detail, we intend to:

* Retrieve the public data of 5 starting profiles, of the profiles directly connected to them (followers and followed) and of additional random profiles (chosen according to criteria explained later)
* Build a graph that represents the social network, where the nodes are the downloaded profiles and the (directed) edges indicate a follower→following relation ("who follows whom")
* Apply the most common analysis techniques to the graph, such as visualising the graph, measuring distances and centrality, computing the minimum cover and estimating the "small-world-ness" of the graph
* Compute the correlations between the computed variables
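
As a minimal sketch of the graph model described above (not the project's actual data), the snippet below builds a tiny directed graph with `networkx`, where an edge `(u, v)` means "u follows v". The profile names are hypothetical and chosen only for illustration.

```python
import networkx as nx

# Hypothetical profiles and "who follows whom" pairs, for illustration only.
follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]

follow_graph = nx.DiGraph()
follow_graph.add_edges_from(follows)  # edge (u, v) means "u follows v"

# In-degree = followers inside the sample, out-degree = followed profiles.
followers_of_bob = follow_graph.in_degree("bob")
followed_by_bob = follow_graph.out_degree("bob")
print(followers_of_bob, followed_by_bob)
```

With this convention the in-degree of a node counts its followers inside the sampled network, and the out-degree counts the profiles it follows.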

## Setup
### Libraries

In [None]:
import tweepy  # to gather data from the Twitter API endpoints
import os
import json
import pprint
import csv  
import random as rn
import math
import networkx as nx # to build and analyse the graph
import pandas as pd   # to store the data before building the graph
import numpy as np
import matplotlib.pyplot as plt
from pylab import rcParams
import pickle
from scipy import stats
from google.colab import drive  # to have a sort of common filesystem for all the 
                                # group members
from datetime import datetime   # to measure execution time of some functions

### Connecting to Google Drive (to read and write files)

In [None]:
# mount google drive here to read users and relations data frame
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# folder where I have the data
data_folder = "/content/drive/My Drive/Colab Notebooks/data-project/"

### Accessing the Twitter API with Tweepy

In [None]:
# Twitter API Credentials
api_key = "insert_key_here"
api_key_secret = "insert_key_here"
access_token = "insert_key_here"
access_token_secret = "insert_key_here"
bearer_token = "insert_key_here"

In [None]:
# Tweepy authentication
auth = tweepy.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
# verify_credentials must be called: the bare method object is always truthy
if api.verify_credentials():
  print("Authentication success!")
else:
  print("Got an error")

## Data retrieval

### Retrieving the main profiles and the directly connected profiles

Each of us uses this function to retrieve the information about the assigned user(s) and all of their followers and followings. In detail, it retrieves:

* all the data of the user
* all the data of the followers (and the indication that each of them follows the user)
* all the data of the followings (and the indication that each of them is followed by the user)

The data of all the users are saved in a data frame (pd.DataFrame) df_users, where each retrieved detail corresponds to a column. The following columns can be of particular interest:

* `df_users["id"]` <- id of the profile
* `df_users["screen_name"]` <- screen name of the profile
* `df_users["followers_count"]` <- number of followers

The data on "who follows whom" are instead represented in a data frame (pd.DataFrame) df_relations with 2 columns: `df_relations["Follower"]` and `df_relations["Following"]`.

Each row of df_relations indicates that a certain user - `df_relations["Follower"]` - follows another one - `df_relations["Following"]` (users are represented by their ids).

The function `get_user_and_neighbors` returns both data frames for a user.
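
To make the two structures concrete, here is a small sketch with made-up ids (the frames and their values are hypothetical; the real `df_users` has many more columns). It shows how followers and followed profiles can be read back from the `Follower` and `Following` columns.

```python
import pandas as pd

# Toy data with hypothetical ids, for illustration only.
df_users_demo = pd.DataFrame({
    "id": [1, 2, 3],
    "screen_name": ["main_profile", "fan", "idol"],
    "followers_count": [1, 0, 1],
})
df_relations_demo = pd.DataFrame(
    [[2, 1], [1, 3]], columns=["Follower", "Following"])

# Followers of profile 1: rows where profile 1 is in the "Following" column.
follower_ids = df_relations_demo.loc[
    df_relations_demo["Following"] == 1, "Follower"].tolist()

# Profiles followed by profile 1: rows where it is in the "Follower" column.
followed_ids = df_relations_demo.loc[
    df_relations_demo["Follower"] == 1, "Following"].tolist()

print(follower_ids, followed_ids)
```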

In detail, the function uses:

* `api.get_user` to retrieve the data of the user
* `api.followers` and `api.friends` to retrieve the followers and followings (the latter two through `Cursor`, which walks the paginated results automatically; the request limits of the endpoints are handled by `wait_on_rate_limit=True`)

In [None]:
def get_user_and_neighbors(api: tweepy.API, user: str,
                           quantity: int = None):
    """
    Gather a Twitter profile's data and the data of its followers and followed

    :param api: the (initialized) tweepy Twitter API instance to use
    :param user: the screen name of the profile to gather data from
    :param quantity: how many followers/followed to fetch (default: all of them)

    return: df_users, df_relations filled with the new data
    """

    df_users: pd.DataFrame = pd.DataFrame()
    df_relations: pd.DataFrame = pd.DataFrame()

    # request of the user
    user_info = api.get_user(screen_name=user)._json
    row = pd.json_normalize(user_info)
    df_users = pd.concat([df_users, row])

    user_id = user_info["id"]

    if quantity is None:
        followers_count = user_info["followers_count"]
        friends_count = user_info["friends_count"]
    else:
        followers_count = quantity
        friends_count = quantity

    counter = 0

    # request of user's followers
    for item in tweepy.Cursor(api.followers, id=user).items(followers_count):
        data_item = item._json
        row_user = pd.json_normalize(data_item)
        # item -> user
        row_relation = pd.DataFrame([[data_item["id"], user_id]], columns=["Follower", "Following"])
        df_users = pd.concat([df_users, row_user])
        df_relations = pd.concat([df_relations, row_relation])

        # progress report (no variable named "max", which would shadow the
        # builtin; max(..., 1) avoids a division by zero)
        counter = counter + 1
        perc = math.floor(counter / max(followers_count, 1) * 100)
        print(str(perc) + "% of the followers downloaded")

    counter = 0

    # request of user's followings
    for item in tweepy.Cursor(api.friends, id=user).items(friends_count):
        data_item = item._json
        row_user = pd.json_normalize(data_item)
        # user -> item
        row_relation = pd.DataFrame([[user_id, data_item["id"]]], columns=["Follower", "Following"])
        df_users = pd.concat([df_users, row_user])
        df_relations = pd.concat([df_relations, row_relation])

        # progress report
        counter = counter + 1
        perc = math.floor(counter / max(friends_count, 1) * 100)
        print(str(perc) + "% of the friends downloaded")

    return df_users, df_relations

Data retrieval (for one of the 5 profiles):

In [None]:
# call the function defined above to get the user's info
df_users, df_relations = get_user_and_neighbors(api, "damiano10")
# df_users, df_relations = get_user_and_neighbors(api, "mizzaro")
# df_users, df_relations = get_user_and_neighbors(api, "eglu81")
# df_users, df_relations = get_user_and_neighbors(api, "KevinRoitero")
# df_users, df_relations = get_user_and_neighbors(api, "Miccighel_")

# save the datasets in csv
df_users.to_csv(data_folder + "raw_data/df_users_damiano.csv", index=False)
df_relations.to_csv(data_folder + "raw_data/df_relations_damiano.csv", index=False)

### Retrieving the additional random profiles

The functions `random_followers` and `random_followings` retrieve, respectively:

* 10 random followers for each of 5 (also random) followers of a profile
* 10 random followings for each of 5 (also random) followings of a profile

These functions retrieve:

* the data of the newly discovered users
* an indication of the profiles through which these new users were discovered (that is, which of the 5 direct followers/followed of the profile led to each of the 10*5 new profiles)

The data of the discovered users and the indications about the profiles through which they were discovered are represented by 2 data frames that follow the same structures described for `get_user_and_neighbors`.

Both functions return these data frames.

In detail, the functions use:

* `api.followers_ids` and `api.friends_ids` to retrieve the full list of followers/followed of each of the 5 random followers/followed of the main profile (10 random ones are then drawn from each list). These calls go through `Cursor`, which handles the pagination of the results automatically.
* `api.get_user` to retrieve the data of each of the 10*5 (new) random profiles
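
The two-stage sampling described above can be sketched without calling the API. The helper `two_stage_sample` and the toy `neighbors_of` dictionary are hypothetical names introduced only for this illustration; the actual functions read the neighbors from Twitter instead.

```python
import random

def two_stage_sample(neighbors_of, seed_profile, n_first=5, n_second=10, rng=None):
    """Pick n_first neighbors of seed_profile, then n_second neighbors of each."""
    rng = rng or random.Random()
    # keep only the neighbors that have enough neighbors of their own
    eligible = [p for p in neighbors_of[seed_profile]
                if len(neighbors_of.get(p, [])) >= n_second]
    pairs = []
    for first in rng.sample(eligible, n_first):
        for second in rng.sample(neighbors_of[first], n_second):
            # (discovered profile, profile it was discovered through)
            pairs.append((second, first))
    return pairs

# Toy network: a seed with 6 neighbors, each having 12 neighbors of its own.
neighbors_of = {"seed": [f"f{i}" for i in range(6)]}
for i in range(6):
    neighbors_of[f"f{i}"] = [f"f{i}_n{j}" for j in range(12)]

pairs = two_stage_sample(neighbors_of, "seed", rng=random.Random(0))
print(len(pairs))  # 5 * 10 = 50 discovered profiles
```

Filtering the first-stage candidates before sampling mirrors the "at least 10 followers" check in the functions: without it, drawing 10 items from a shorter list would raise a `ValueError`.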

In [None]:
def random_followers(
                    api: tweepy.API,
                    profile_screen_name: str, 
                    df_users: pd.DataFrame,
                    df_relations: pd.DataFrame):
  """

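  :param api: the (initialized) tweepy Twitter API instance to use
  :param profile_screen_name: the screen name of the user from which I select the 5 randoms
  :param df_users: the data frame with the data of the profile and of its direct followers/followed
  :param df_relations: the data frame with the follower -> following relations of the profile
  
  return: df_random_followers, df_random_followers_relations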
  """

  # user_id = df_users.loc[0,"id"]
  user_id = df_users.loc[df_users["screen_name"] == profile_screen_name, "id"].to_numpy()[0]

  #code to get 5 random followers

  # get the relations where user_id is the followed profile
  df_followers = df_relations.loc[df_relations["Following"] == user_id]

  # get the followers' data: match each "Follower" id with its row in df_users
  # (a plain join would match on the index instead of on the ids)
  df_followers = df_followers.merge(df_users, left_on="Follower", right_on="id")

  # select only the ones that have at least 10 followers
  df_followers = df_followers.loc[df_followers["followers_count"] >= 10]

  # get 5 followers random
  df_sample = df_followers.sample(n=5)
  random_id = df_sample['id'].to_numpy()

  # print(random_id)
  
  #code to get 10 followers for each random follower
  list_of_ten = [] #the list will have 50 id

  # prepare the dataframe where i say who follows who
  df_random_followers_relations = pd.DataFrame()

  for user in random_id:

    # followers = api.followers_ids(user)

    followers = list()
    
    for flw in tweepy.Cursor(api.followers_ids, id=user).pages():
      followers.extend(flw)

    # print(followers)
    
    ten = rn.sample(followers, 10)

    # print(ten)
    for random_follower in ten:
      list_of_ten.append(random_follower)
      # random_follower -> user
      row_relation = pd.DataFrame([[random_follower, user]], columns=["Follower", "Following"])
      df_random_followers_relations = pd.concat([df_random_followers_relations, row_relation])
  
  # print(list_of_ten)

  #code to create a dataframe with the users's info
  df_random_followers = pd.DataFrame()

  for item in list_of_ten:

    user_info = api.get_user(id=item)._json
    
    # for user in tweepy.Cursor(api.get_user, id=item).items(1):
    #  user_info = user._json
    
    row = pd.json_normalize(user_info)
    df_random_followers = pd.concat([df_random_followers, row])

  # I return the new users data + the relations df that says who follows who
  return df_random_followers, df_random_followers_relations

In [None]:
def random_followings(
                    api: tweepy.API,
                    profile_screen_name: str, 
                    df_users: pd.DataFrame,
                    df_relations: pd.DataFrame):
  """

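  :param api: the (initialized) tweepy Twitter API instance to use
  :param profile_screen_name: the screen name of the user from which I select the 5 randoms
  :param df_users: the data frame with the data of the profile and of its direct followers/followed
  :param df_relations: the data frame with the follower -> following relations of the profile
  
  return: df_random_followings, df_random_followings_relations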
  """

  # user_id = df_users.loc[0,"id"]
  user_id = df_users.loc[df_users["screen_name"] == profile_screen_name, "id"].to_numpy()[0]

  #code to get 5 random followings

  # get the relations where user_id is the follower
  df_followings = df_relations.loc[df_relations["Follower"] == user_id]

  # get the followings' data: match each "Following" id with its row in df_users
  # (a plain join would match on the index instead of on the ids)
  df_followings = df_followings.merge(df_users, left_on="Following", right_on="id")

  # select only the ones that follow at least 10 people
  df_followings = df_followings.loc[df_followings["friends_count"] >= 10]

  # get 5 followings random
  df_sample = df_followings.sample(n=5)
  random_id = df_sample['id'].to_numpy()

  # print(random_id)
  
  #code to get 10 followings for each random following
  list_of_ten = [] #the list will have 50 id

  # prepare the dataframe where i say who follows who
  df_random_followings_relations = pd.DataFrame()

  for user in random_id:

    # followings = api.friends_ids(user)

    followings = list()
    
    for flw in tweepy.Cursor(api.friends_ids, id=user).pages():
      followings.extend(flw)

    # print(followings)
    
    ten = rn.sample(followings, 10)

    # print(ten)
    for random_follow in ten:
      list_of_ten.append(random_follow)
      # user -> random_follow
      row_relation = pd.DataFrame([[user, random_follow]], columns=["Follower", "Following"])
      df_random_followings_relations = pd.concat([df_random_followings_relations, row_relation])
  
  # print(list_of_ten)

  #code to create a dataframe with the users's info
  df_random_followings = pd.DataFrame()

  for item in list_of_ten:

    user_info = api.get_user(id=item)._json
    
    # for user in tweepy.Cursor(api.get_user, id=item).items(1):
    #  user_info = user._json
    
    row = pd.json_normalize(user_info)
    df_random_followings = pd.concat([df_random_followings, row])

  # I return the new users data + the relations df that says who follows who
  return df_random_followings, df_random_followings_relations

Data retrieval (for one of the 5 profiles):

In [None]:
# data of the main profile, as saved in csv by the previous step
dam_users = pd.read_csv(data_folder + "raw_data/df_users_damiano.csv")
dam_rel = pd.read_csv(data_folder + "raw_data/df_relations_damiano.csv")

# Random's Followers and Random's Followings
random_users_followers, random_rel_followers = random_followers(api, "damiano10", dam_users, dam_rel)
random_users_following, random_rel_following = random_followings(api, "damiano10", dam_users, dam_rel)

# concat random users datasets
random_users = pd.concat([random_users_followers, random_users_following])

# concat random relations datasets
random_rel = pd.concat([random_rel_followers, random_rel_following])

# save the datasets in csv
random_users.to_csv(data_folder + "raw_data/df_users_damiano10_5random.csv", index=False)
random_rel.to_csv(data_folder + "raw_data/df_relations_damiano10_5random.csv", index=False)

### Merging the datasets

At this point, for each of the 5 main profiles, the following have been downloaded:

* a dataset `df_users_profile_name.csv` with the data of the profile and of its direct followers and followed
* a dataset `df_relations_profile_name.csv` with the indications of who the followers are and who the followed are
* a dataset `df_users_profile_name_5random.csv` with the data of the 5\*10\*2 random profiles discovered by picking 10 followers/followings of each of 5 random followers/followings
* a dataset `df_relations_profile_name_5random.csv` with the indications of whose followers/followings these random profiles are

All these datasets are merged into 2 single datasets, `df_users.csv` and `df_relations.csv`, by the functions `merge_users_datasets` and `merge_relations_datasets`, which combine all the datasets of a folder into 2 single pandas data frames (then saved as csv).
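
The merge boils down to concatenating the partial data frames and dropping the duplicated rows. A minimal sketch with two hypothetical in-memory frames (standing in for two `df_relations_*.csv` files) could look like this:

```python
import pandas as pd

# Two partial relation tables that share one row (2 -> 9); ids are made up.
part_a = pd.DataFrame({"Follower": [1, 2], "Following": [9, 9]})
part_b = pd.DataFrame({"Follower": [2, 3], "Following": [9, 9]})

merged = pd.concat([part_a, part_b], ignore_index=True)
# A relation is identified by the (Follower, Following) pair.
merged = merged.drop_duplicates(["Follower", "Following"]).reset_index(drop=True)

print(len(merged))
```

Duplicates are expected in practice because the same profile can be a neighbor of more than one of the 5 main profiles.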

In [None]:
def merge_users_datasets(data_folder: str):
  """
  Merge all the "df_users_*.csv" files from a certain data folder

  :param data_folder: the path where to search the datasets
  
  return: df_users
  """

  # Search all the files starting with "df_users_"
  frames = [pd.read_csv(data_folder + entry)
            for entry in os.listdir(data_folder)
            if os.path.basename(entry).startswith("df_users_")]

  # DataFrame.append was removed in pandas 2.0: concatenate once instead
  df_users = pd.concat(frames)

  df_users = df_users.drop_duplicates(["id"])
  # drop the saved index column, if present
  df_users = df_users.drop(["Unnamed: 0"], axis=1, errors="ignore")

  print("Merge of the df_users datasets done!")
  return df_users

In [None]:
def merge_relations_datasets(data_folder: str):
  """
  Merge all the "df_relations_*.csv" files from a certain data folder

  :param data_folder: the path where to search the datasets
  
  return: df_relations
  """

  # Search all the files starting with "df_relations_"
  frames = [pd.read_csv(data_folder + entry)
            for entry in os.listdir(data_folder)
            if os.path.basename(entry).startswith("df_relations_")]

  # DataFrame.append was removed in pandas 2.0: concatenate once instead
  df_relations = pd.concat(frames)

  df_relations = df_relations.drop_duplicates(["Follower","Following"])

  # drop the saved index column, if present
  df_relations = df_relations.drop(["Unnamed: 0"], axis=1, errors="ignore")
  print("Merge of the df_relations datasets done!")
  return df_relations

In [None]:
# Merge datasets
df_users = merge_users_datasets(data_folder + "raw_data/")
df_relations = merge_relations_datasets(data_folder + "raw_data/")

Merge of the df_users datasets done!
Merge of the df_relations datasets done!


Displaying the datasets

In [None]:
display(df_users.head())
display(df_relations.head())

Unnamed: 0,id,id_str,name,screen_name,location,profile_location,description,url,protected,followers_count,friends_count,listed_count,created_at,favourites_count,utc_offset,time_zone,geo_enabled,verified,statuses_count,lang,contributors_enabled,is_translator,is_translation_enabled,profile_background_color,profile_background_image_url,profile_background_image_url_https,profile_background_tile,profile_image_url,profile_image_url_https,profile_link_color,profile_sidebar_border_color,profile_sidebar_fill_color,profile_text_color,profile_use_background_image,has_extended_profile,default_profile,default_profile_image,following,follow_request_sent,notifications,...,status.favorite_count,status.favorited,status.retweeted,status.possibly_sensitive,status.lang,profile_banner_url,live_following,muting,blocking,blocked_by,status.quoted_status_id,status.quoted_status_id_str,status.retweeted_status.place,entities.url.urls,status.retweeted_status.quoted_status_id,status.retweeted_status.quoted_status_id_str,status.retweeted_status.scopes.followers,status.place.id,status.place.url,status.place.place_type,status.place.name,status.place.full_name,status.place.country_code,status.place.country,status.place.contained_within,status.place.bounding_box.type,status.place.bounding_box.coordinates,status.geo.type,status.geo.coordinates,status.coordinates.type,status.coordinates.coordinates,profile_location.id,profile_location.url,profile_location.place_type,profile_location.name,profile_location.full_name,profile_location.country_code,profile_location.country,profile_location.contained_within,profile_location.bounding_box
0,18932422,18932422,mizzaro,mizzaro,,,,,False,157,332,8,Tue Jan 13 07:45:49 +0000 2009,344,,,True,False,142,,False,False,False,C0DEED,http://abs.twimg.com/images/themes/theme1/bg.png,https://abs.twimg.com/images/themes/theme1/bg.png,False,http://abs.twimg.com/sticky/default_profile_im...,https://abs.twimg.com/sticky/default_profile_i...,1DA1F2,C0DEED,DDEEF6,333333,True,False,True,True,False,False,False,...,0.0,False,False,False,en,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,1972411447,1972411447,Christian Abbondo,ChriShot90,,,|1 Lan| |21| Swager since 1999,,False,46,227,0,Sat Oct 19 18:02:28 +0000 2013,1440,,,True,False,202,,False,False,False,000000,http://abs.twimg.com/images/themes/theme1/bg.png,https://abs.twimg.com/images/themes/theme1/bg.png,False,http://pbs.twimg.com/profile_images/8018434653...,https://pbs.twimg.com/profile_images/801843465...,171CA8,000000,000000,000000,False,False,False,False,False,False,False,...,1.0,False,False,False,en,https://pbs.twimg.com/profile_banners/19724114...,False,False,False,False,1.258389e+18,1.258389e+18,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,257691637,257691637,Dung Doan,dungdx34,Hanoi University of Technology,,,,False,121,3879,0,Fri Feb 25 23:56:08 +0000 2011,72,,,True,False,1,,False,False,False,0099B9,http://abs.twimg.com/images/themes/theme4/bg.gif,https://abs.twimg.com/images/themes/theme4/bg.gif,True,http://pbs.twimg.com/profile_images/3660852329...,https://pbs.twimg.com/profile_images/366085232...,0099B9,5ED4DC,95E8EC,3C3940,True,False,False,False,False,False,False,...,0.0,False,False,False,en,,False,False,False,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,4893681,4893681,Flavio Martins,flaviomartins,"Lisbon, Portugal",,Head of Data Science @vohcolab | Researcher @n...,https://t.co/4JV9v16sqe,False,603,577,29,Mon Apr 16 16:14:43 +0000 2007,3328,,,True,False,1396,,False,False,False,C0DEED,http://abs.twimg.com/images/themes/theme1/bg.png,https://abs.twimg.com/images/themes/theme1/bg.png,False,http://pbs.twimg.com/profile_images/1038536758...,https://pbs.twimg.com/profile_images/103853675...,1DA1F2,C0DEED,DDEEF6,333333,True,False,True,False,False,False,False,...,0.0,False,False,,en,https://pbs.twimg.com/profile_banners/4893681/...,False,False,False,False,,,,"[{'url': 'https://t.co/4JV9v16sqe', 'expanded_...",,,,,,,,,,,,,,,,,,,,,,,,,,
4,116897811,116897811,Robert Monarch,WWRob,San Francisco,,Private/Global Machine Learning at @Apple \nRu...,http://t.co/rNWjL2tin5,False,4407,2983,239,Tue Feb 23 22:47:08 +0000 2010,4140,,,True,False,3310,,False,False,False,000000,http://abs.twimg.com/images/themes/theme13/bg.gif,https://abs.twimg.com/images/themes/theme13/bg...,False,http://pbs.twimg.com/profile_images/1273073340...,https://pbs.twimg.com/profile_images/127307334...,1B95E0,000000,000000,000000,False,False,False,False,False,False,False,...,1.0,False,False,,en,https://pbs.twimg.com/profile_banners/11689781...,False,False,False,False,,,,"[{'url': 'http://t.co/rNWjL2tin5', 'expanded_u...",,,,,,,,,,,,,,,,,,,,,,,,,,


Unnamed: 0,Follower,Following
0,1972411447,18932422
1,257691637,18932422
2,4893681,18932422
3,116897811,18932422
4,551951577,18932422


Saving the datasets



In [None]:
df_users.to_csv(data_folder + "data/df_users.csv", index=False)
df_relations.to_csv(data_folder + "data/df_relations.csv", index=False)

### Verifying the relations between the profiles

In [None]:
def check_relations(api: tweepy.API, df_users_source: pd.DataFrame, df_users_target: pd.DataFrame):
  """
  Check the relations between users in df_users_source and users in df_users_target

  :param api: the (initialized) tweepy Twitter API instance to use
  :param df_users_source: the data frame with the users to check
  :param df_users_target: the data frame with the users to compare them with

  return: a dataframe that explains the relations between all combinations of users in 
          df_users_source and df_users_target
  """
  
  # get ids of profiles


  # (source, target, following, followed_by)
  df_accurate_relations = pd.DataFrame(columns=["source", "target", 
                                                "following", "followed_by"])
  
  n_users = df_users_source.shape[0] 
  i = 0
  n_errors = 0

  for index,user in df_users_source.iterrows():

    id = user["id"]

    for index2,compare_w_user in df_users_target.iterrows():
      
      id_cwu = compare_w_user["id"]

      check = df_accurate_relations.loc[
          ((df_accurate_relations["source"]==id) & (df_accurate_relations["target"]==id_cwu)) |
          ((df_accurate_relations["source"]==id_cwu) & (df_accurate_relations["target"]==id))  
      ]

      # (Check that the 2 profiles are different and 
      # that id and id_cwu is not yet compared)
      if id != id_cwu and check.shape[0] == 0: 
        
        try:

          # check friendship
          friendship = api.show_friendship(source_id=id, target_id=id_cwu)[0]
  
          row = pd.DataFrame([[id, id_cwu, friendship.following, friendship.followed_by]], 
                              columns=["source", "target", "following", "followed_by"])
          df_accurate_relations = pd.concat([df_accurate_relations, row])

        except tweepy.error.TweepError:
          # skip the pairs the API refuses to compare, but count them
          print("Phew... saved at the last second...")
          n_errors = n_errors+1
        
        #(end if)

    # (end internal for)

    # % complete debug print
    i = i+1
    print(math.floor(i/n_users*100), "%")

  # (end external for)

  print("number of errors: ", n_errors)

  return df_accurate_relations

In [None]:
# get users
df_users = pd.read_csv(data_folder + "data/df_users.csv")

# main 5 users
compare_w_profiles_screen_names = ["mizzaro", "damiano10", "Miccighel_", "eglu81", "KevinRoitero"]

# main 5 users details
df_compare_w_users = pd.DataFrame(
      compare_w_profiles_screen_names, columns=["screen_name"]).merge(
          df_users, on="screen_name")
      
  
df_accurate_relations_sample = check_relations(api, df_users.sample(100), df_compare_w_users)

df_accurate_relations_sample.to_csv(data_folder + "data/df_accurate_relations_sample_100.csv", index=False)

In [None]:
# check that all relations in df_accurate_relations_sample were already caught in df_relations
df_accurate_relations_sample = pd.read_csv(data_folder + "data/df_accurate_relations_sample_100.csv")

# get the relations
df_relations = pd.read_csv(data_folder + "data/df_relations.csv")

n_errors = 0

for index, row in df_accurate_relations_sample.iterrows():

  source=row["source"]
  target = row["target"]
  source_follows_target = row["following"]
  source_followed_by_target = row["followed_by"]

  # search in df_relations ... 
  source_follows_target_search = df_relations.loc[((df_relations["Follower"]==source) & (df_relations["Following"]==target))].shape[0]>0
  source_followed_by_target_search = df_relations.loc[((df_relations["Follower"]==target) & (df_relations["Following"]==source))].shape[0]>0

  if not source_follows_target == source_follows_target_search:
    print("problem: ", source, " -> ", target, " = ", source_follows_target)
    n_errors = n_errors+1

  if not source_followed_by_target == source_followed_by_target_search:
    print("problem: ", target, " -> ", source, " = ", source_followed_by_target)
    n_errors = n_errors+1

print(n_errors, "wrong or undetected relations in 100x5 total", sep=" ")

## Analysis of the social network

### Building the social network graph

In [None]:
def graph_from_df_users_relations(df_users: pd.DataFrame, 
                                  df_relations: pd.DataFrame, 
                                  directed: bool=True, 
                                  include_user_data: bool=True, 
                                  use_screen_name: bool=False):
  """
  Build the graph of twitter users follow-friend relations

  :param df_users: the data frame with all the users data
  :param df_relations: the data frame with the users relations 
  :param directed: set False to create an undirected graph (default: directed)
  :param include_user_data: set False to leave the user data out of the nodes
  :param use_screen_name: set True to use screen names as node names

  return: a networkx DiGraph (or Graph, if directed is False) with all the 
  users as nodes and follow-friend relations as edges 
  """

  if directed:
    twitter_graph = nx.DiGraph()
  else:
    twitter_graph = nx.Graph()

  # add users as nodes
  for index, row in df_users.iterrows():

    screen_name = row["screen_name"]

    # the node key is the screen name or the id, depending on use_screen_name
    node_key = screen_name if use_screen_name else row["id"]

    # create the node
    twitter_graph.add_node(node_key, label=screen_name) 

    # add also the user data and the number of followers
    # (keyed by node_key, so they are attached also when screen names are used)
    if include_user_data: 
      nx.set_node_attributes(twitter_graph, 
                             {node_key: {
                                 "details": row.to_dict(), 
                                 "followers_count": row["followers_count"]
                                 }})

  # add follow-friend relations as edges
  for index, row in df_relations.iterrows():

    # x follows y
    if use_screen_name:
      # get screen names of profiles
      x = df_users.loc[df_users["id"] == row["Follower"], "screen_name"].to_numpy()[0]
      y = df_users.loc[df_users["id"] == row["Following"], "screen_name"].to_numpy()[0]
    else: 
      x, y = row["Follower"], row["Following"]
    
    twitter_graph.add_edge(x, y, weight=1) 

  # add code authors to the graph
  twitter_graph.authors = ["Emanuele Lena - 142411", "Ilaria Fenos - 142494", 
                           "Massimiliano Baldo - 142296", "Simone Dalla Pietà - 141995"]

  # todo: complete point 5 properly (also our names as attributes 
  # + n_followers for each node)

  return twitter_graph

In [None]:
# read the files
df_users = pd.read_csv(data_folder + "data/df_users.csv")
df_relations = pd.read_csv(data_folder + "data/df_relations.csv")

In [None]:


# build the graph
twitter_graph = graph_from_df_users_relations(df_users, df_relations)

# build also a undirected version (necessary for some analysis)
twitter_graph_undirected = graph_from_df_users_relations(df_users, df_relations, directed=False)

# build also a version that uses screen names as node names 
# (for conversion in pyvis)
twitter_graph_screen_names = graph_from_df_users_relations(df_users, df_relations, 
                                                     include_user_data=False, 
                                                     use_screen_name=True)

# (save the graphs in pickle format)
# Save in pickle the graph data
with open(data_folder + 'graphs/twitter_graph.pickle', 'wb') as handle:
    pickle.dump(twitter_graph, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(data_folder + 'graphs/twitter_graph_undirected.pickle', 'wb') as handle:
    pickle.dump(twitter_graph_undirected, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(data_folder + 'graphs/twitter_graph_screen_names.pickle', 'wb') as handle:
    pickle.dump(twitter_graph_screen_names, handle, protocol=pickle.HIGHEST_PROTOCOL)

print("Graph generated!")

Graph generated!


### Variable where the analysis results are saved

In [None]:
# where I save the data
graph_data = {}
try:
  with open(data_folder + 'analysis_data/graph_data.pickle', "rb") as openfile:
      while True:
          try:
              graph_data.update(pickle.load(openfile))
          except EOFError:
              # no more pickled objects in the file
              break
  print("Found this data=", graph_data)
except FileNotFoundError:
  print("No saved data found, create new")

Found this data= {'is_connected': True, 'is_bipartite': False, 'center': [3036907250, 19659370, 132646210], 'diameter': 6, 'radius': 3, 'betweenness_centrality': {18932422: 0.054266081247212036, 1972411447: 0.0, 257691637: 0.0, 4893681: 0.0, 116897811: 0.0, 551951577: 0.0, 91697719: 0.0, 94732055: 0.0, 514667085: 0.0, 4149032843: 6.216962864662355e-06, 4188048952: 0.0018583998224644838, 14451127: 0.00029117524814474283, 2988561009: 0.0, 920700510306529280: 0.0, 950377242483343360: 0.0, 1527607790: 0.0, 923649456981135367: 0.0, 890413838578966528: 0.0, 1477918760: 0.0, 810893744593584128: 0.00012339585218781166, 125483940: 0.0, 406303880: 0.0, 4863093417: 0.0, 2924886188: 0.0, 2190533245: 0.002242678925418481, 3095317209: 0.0, 153764165: 0.0, 748315519674048512: 0.0, 3036907250: 0.05220462608474385, 386908329: 0.0, 21192987: 0.0, 2826508686: 0.0, 65915718: 0.0, 203070657: 0.0, 3916694415: 0.0, 3761306416: 0.0015044918187712914, 384127183: 0.0, 3513346942: 0.0, 115091858: 0.0, 2206179206

### General characteristics

In [None]:
print("Number of nodes: ", twitter_graph.number_of_nodes())
print("Number of edges: ", twitter_graph.number_of_edges())

Checking whether the graph is connected and bipartite

In [None]:
# Verify if graph is connected
is_connected = nx.is_connected(twitter_graph_undirected)

# Verify if graph is bipartite
is_bipartite = nx.is_bipartite(twitter_graph)

graph_data["is_connected"] = is_connected
graph_data["is_bipartite"] = is_bipartite

In [None]:
print("Connected?", is_connected)
print("Bipartite?", is_bipartite)
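
A note on the connectivity check, sketched on a toy graph with hypothetical node names: `nx.is_connected` is only defined for undirected graphs, which is why the undirected version of the graph is used above. For a directed graph, networkx offers the weak and strong variants instead.

```python
import networkx as nx

# Toy directed graph: "a" and "c" both point to "b".
directed = nx.DiGraph([("a", "b"), ("c", "b")])

weakly = nx.is_weakly_connected(directed)      # ignores the edge direction
strongly = nx.is_strongly_connected(directed)  # needs directed paths both ways
undirected_ok = nx.is_connected(directed.to_undirected())

print(weakly, strongly, undirected_ok)
```

Weak connectivity of the directed graph and plain connectivity of its undirected version answer the same question, so either check could be used here.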

Center, diameter and radius:

In [None]:
# (difficult to compute)

# center = nx.center(twitter_graph)
center = nx.center(twitter_graph_undirected)
print("center: ", center)

graph_data["center"] = center

In [None]:
# (semi-difficult to compute)

# diameter = nx.diameter(twitter_graph)
diameter = nx.diameter(twitter_graph_undirected)
print("diameter: ", diameter)

graph_data["diameter"] = diameter

In [None]:
# (semi-difficult to compute)

# radius = nx.radius(twitter_graph)
radius = nx.radius(twitter_graph_undirected)
print("radius: ", radius)

graph_data["radius"] = radius
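
To illustrate how these three measures relate, the sketch below computes them on a 5-node path graph (a toy example, unrelated to the Twitter data). The eccentricity of a node is its largest distance from any other node; the diameter and the radius are the largest and the smallest eccentricity, and the center is the set of nodes whose eccentricity equals the radius.

```python
import networkx as nx

# Toy undirected graph: 0 - 1 - 2 - 3 - 4
path = nx.path_graph(5)

demo_eccentricity = nx.eccentricity(path)  # largest distance from each node
demo_diameter = nx.diameter(path)          # maximum eccentricity
demo_radius = nx.radius(path)              # minimum eccentricity
demo_center = nx.center(path)              # nodes with eccentricity == radius

print(demo_eccentricity, demo_diameter, demo_radius, demo_center)
```

All of these rely on shortest-path distances between every pair of nodes, which is why they take a while on the full graph.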

Nodes of the center:

In [None]:
for node_id in graph_data["center"]:
  name = twitter_graph.nodes[node_id]["label"]
  print(node_id, name, sep="\t")

3036907250	KevinRoitero
19659370	eglu81
132646210	damiano10


### Visualizing the network

Visualization with draw_networkx (poor results)

In [None]:
rcParams['figure.figsize'] = 50,50
nx.draw_networkx(
    twitter_graph,
    pos = nx.spring_layout(twitter_graph),
    node_color="#A0CBE2",
    width=2,
    edge_cmap = plt.cm.Blues,
    with_labels = True
)
plt.show()
plt.close()

Visualization with pyvis

In [None]:
!pip install pyvis

In [None]:
from pyvis.network import Network

nt = Network(
    height="100%", width="100%", bgcolor="#222222", 
    font_color="white", heading="Twitter Graph Social Computing"
)

nt.barnes_hut()
nt.from_nx(twitter_graph_screen_names)
neighbor_map = nt.get_adj_list()
for node in nt.nodes:
  node["value"] = len(neighbor_map[node["id"]])

# render the interactive graph to an HTML file
nt.show(data_folder + "pyvise/twitter_graph.html")
# from google.colab import files
# files.download(data_folder + "pyvise/twitter_graph.html")

### Centrality measures

* Betweenness centrality (betweenness_centrality)
* Closeness centrality (closeness_centrality)
* Degree centrality (degree_centrality)
* In-degree centrality (in_degree_centrality)
* Out-degree centrality (out_degree_centrality)
* Page Rank (pagerank)
* HITS (hits)

In [None]:
# 9. Compute the following centrality measures on the graph
# More complicated because of the bipartite case
# https://networkx.org/documentation/stable/reference/algorithms/bipartite.html#module-networkx.algorithms.bipartite

# betweenness_centrality = nx.betweenness_centrality(twitter_graph, list_node_bipartite)
# closeness_centrality = nx.closeness_centrality(twitter_graph, list_node_bipartite)
# degree_centrality = nx.degree_centrality(twitter_graph, list_node_bipartite)

# todo: fix

# (moderately expensive to compute)

betweenness_centrality = nx.betweenness_centrality(twitter_graph)

print("Betweenness computed")
graph_data["betweenness_centrality"] = betweenness_centrality

closeness_centrality = nx.closeness_centrality(twitter_graph)

print("Closeness computed")
graph_data["closeness_centrality"] = closeness_centrality

degree_centrality = nx.degree_centrality(twitter_graph_undirected)

print("Degree centrality computed")
graph_data["degree_centrality"] = degree_centrality

in_degree_centrality = nx.in_degree_centrality(twitter_graph)
out_degree_centrality = nx.out_degree_centrality(twitter_graph)

print("In-degree and out-degree computed")
graph_data["in_degree_centrality"] = in_degree_centrality
graph_data["out_degree_centrality"] = out_degree_centrality

pagerank = nx.pagerank(twitter_graph)

print("Pagerank computed")
graph_data["pagerank"] = pagerank
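
Degree centrality above is computed on the undirected copy, while in-degree and out-degree use the directed graph. A minimal sketch on a made-up toy graph illustrating how the two choices differ when two accounts follow each other:

```python
import networkx as nx

# Toy directed graph (hypothetical): a and b follow each other, a also follows c.
toy = nx.DiGraph([("a", "b"), ("b", "a"), ("a", "c")])

deg = nx.degree_centrality(toy)
in_deg = nx.in_degree_centrality(toy)
out_deg = nx.out_degree_centrality(toy)

# On a DiGraph, degree = in-degree + out-degree, all normalised by (n - 1).
sums_match = all(
    abs(deg[node] - (in_deg[node] + out_deg[node])) < 1e-12 for node in toy
)

# Converting to undirected merges the reciprocal a<->b pair into a single edge.
deg_undirected = nx.degree_centrality(toy.to_undirected())

print(sums_match, deg["a"], deg_undirected["a"])
```

On the directed graph a reciprocal follow counts twice, so the directed value can exceed 1; on the undirected copy it counts once.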

In [None]:
start_time = datetime.now()
print('Started at: {}'.format(start_time))

hits = nx.hits(twitter_graph, max_iter=500) # todo: fix

end_time = datetime.now()
print('Ended at: {}'.format(end_time))

graph_data["hits"] = hits

print('Duration: {}'.format(end_time - start_time))
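
`nx.hits` returns a pair of dictionaries, hub scores first and authority scores second, which is why later cells index `graph_data["hits"][0]` and `graph_data["hits"][1]`. A minimal sketch on a made-up toy graph where five accounts all follow a single one:

```python
import networkx as nx

# Toy graph (hypothetical): five accounts all follow "d".
toy = nx.DiGraph([(name, "d") for name in ["a", "b", "c", "e", "f"]])

# nx.hits returns two dicts keyed by node: (hub scores, authority scores).
hubs, authorities = nx.hits(toy, max_iter=500)

top_authority = max(authorities, key=authorities.get)
print(top_authority, round(sum(authorities.values()), 6))
```

The followed account collects the authority score, while the accounts that point to it share the hub score.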

In [None]:
print("Betweenness centrality: ", betweenness_centrality)
print("Closeness centrality ", closeness_centrality)
print("Degree centrality ", degree_centrality)
print("In-degree centrality ", in_degree_centrality)
print("Out-degree centrality ", out_degree_centrality)
print("Page Rank ", pagerank)
print("HITS ", hits)

### Top node for each centrality measure

In [None]:
# Create a dataframe for each centrality measure stored in the graph_data dictionary
df_betweenness = pd.DataFrame(graph_data["betweenness_centrality"].items(), columns=['id', 'betweenness_centrality'])
df_degree = pd.DataFrame(graph_data["degree_centrality"].items(), columns=['id', 'degree_centrality'])
df_in_degree = pd.DataFrame(graph_data["in_degree_centrality"].items(), columns=['id', 'in_degree_centrality'])
df_out_degree = pd.DataFrame(graph_data["out_degree_centrality"].items(), columns=['id', 'out_degree_centrality'])
df_pagerank = pd.DataFrame(graph_data["pagerank"].items(), columns=['id', 'pagerank'])
df_hits_hubness = pd.DataFrame(graph_data["hits"][0].items(), columns=['id', 'hits_hubness'])
df_hits_autority = pd.DataFrame(graph_data["hits"][1].items(), columns=['id', 'hits_authority'])

In [None]:
# For each measure, sort the dataframe and take the first row
df_betweenness = df_betweenness.sort_values(by='betweenness_centrality', ascending=False)
df_betweenness = df_betweenness.head(1)
# To find the name for that id, join with df_users.
# Using the dataframes is cheaper here because all the labels needed
# are stored in a dataframe column; with the graph you would have to look each one up.
df_betweenness = df_betweenness.join(df_users.set_index("id"), on='id')

display(df_betweenness[["id", "name", "betweenness_centrality"]])

In [None]:
df_degree = df_degree.sort_values(by='degree_centrality', ascending=False)
df_degree = df_degree.head(1)
df_degree = df_degree.join(df_users.set_index("id"), on='id')

display(df_degree[["id", "name", "degree_centrality"]])

In [None]:
df_in_degree = df_in_degree.sort_values(by='in_degree_centrality', ascending=False)
df_in_degree = df_in_degree.head(1)
df_in_degree = df_in_degree.join(df_users.set_index("id"), on='id')

display(df_in_degree[["id", "name", "in_degree_centrality"]])

In [None]:
df_out_degree = df_out_degree.sort_values(by='out_degree_centrality', ascending=False)
df_out_degree = df_out_degree.head(1)
df_out_degree = df_out_degree.join(df_users.set_index("id"), on='id')

display(df_out_degree[["id", "name", "out_degree_centrality"]])

In [None]:
df_pagerank = df_pagerank.sort_values(by='pagerank', ascending=False)
df_pagerank = df_pagerank.head(1)
df_pagerank = df_pagerank.join(df_users.set_index("id"), on='id')

display(df_pagerank[["id", "name", "pagerank"]])

In [None]:
df_hits_hubness = df_hits_hubness.sort_values(by='hits_hubness', ascending=False)
df_hits_hubness = df_hits_hubness.head(1)
df_hits_hubness = df_hits_hubness.join(df_users.set_index("id"), on='id')

display(df_hits_hubness[["id", "name", "hits_hubness"]])

In [None]:
df_hits_autority = df_hits_autority.sort_values(by='hits_authority', ascending=False)
df_hits_autority = df_hits_autority.head(1)
df_hits_autority = df_hits_autority.join(df_users.set_index("id"), on='id')

display(df_hits_autority[["id", "name", "hits_authority"]])
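
The seven cells above repeat the same sort, head and join steps. As a sketch only, the pattern could be wrapped in a helper; `top_node`, `toy_scores` and `toy_users` are hypothetical names introduced here, with the toy data standing in for `graph_data[...]` and `df_users`:

```python
import pandas as pd

def top_node(scores, users, measure):
    """Return a one-row DataFrame with the highest-scoring node for `measure`."""
    df = pd.DataFrame(list(scores.items()), columns=["id", measure])
    best = df.nlargest(1, measure)
    # A left merge keeps the row even if the id is missing from the users table.
    return best.merge(users, on="id", how="left")[["id", "name", measure]]

# Hypothetical stand-ins for graph_data["pagerank"] and df_users.
toy_scores = {1: 0.10, 2: 0.75, 3: 0.15}
toy_users = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})

result = top_node(toy_scores, toy_users, "pagerank")
print(result)
```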

### Computing the reduced subgraph

In [None]:
# get KevinRoitero's id
id_sample_profile = df_users.loc[df_users["screen_name"] == "KevinRoitero", "id"].to_numpy()[0]

# calculate subgraph
# reduce_graph = nx.ego_graph(twitter_graph, id_sample_profile)
reduce_graph = nx.ego_graph(twitter_graph_undirected, id_sample_profile)

print("Induced subgraph computed!")
print("Nnodes: ", reduce_graph.number_of_nodes())
print("Nedges: ", reduce_graph.number_of_edges())
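
With its default `radius=1`, `nx.ego_graph` keeps the chosen node, its direct neighbours and every edge among them. A minimal sketch on a made-up toy graph:

```python
import networkx as nx

# Toy undirected graph: hub "k" with three neighbours, plus a distant node "z".
toy = nx.Graph([("k", "a"), ("k", "b"), ("k", "c"), ("a", "b"), ("c", "z")])

# Default radius=1: the centre, its neighbours and the edges among them.
ego = nx.ego_graph(toy, "k")

print(sorted(ego.nodes()), ego.number_of_edges())
```

Node `z` is two hops away from `k`, so it is left out together with the edge `c`-`z`.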

### Computing the maximum clique

In [None]:
# estimate the maximum clique (approximation algorithms)
from networkx.algorithms.approximation import clique

large_clique_size = clique.large_clique_size(reduce_graph)
print("Size: ", large_clique_size)
graph_data["large_clique_size"] = large_clique_size

In [None]:
# (expensive to compute)
max_clique = clique.max_clique(reduce_graph)
graph_data["max_clique"] = max_clique

In [None]:
print("Maximum clique: ", graph_data["max_clique"])

Maximum clique:  {132646210, 18932422, 19659370, 2273201773, 3036907250}
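
`clique.max_clique` comes from NetworkX's approximation module, so the set it returns is a clique but not necessarily the largest one. A minimal sketch on a made-up toy graph that compares it with an exact search through `nx.find_cliques`; the helper `is_clique` is a hypothetical name introduced here:

```python
import networkx as nx
from networkx.algorithms.approximation import clique

# Toy graph: a 4-clique {0, 1, 2, 3} with a tail 3-4-5.
toy = nx.complete_graph(4)
toy.add_edges_from([(3, 4), (4, 5)])

approx = clique.max_clique(toy)  # heuristic, may be smaller than the optimum

# Exact alternative: enumerate the maximal cliques and keep the largest one.
exact = max(nx.find_cliques(toy), key=len)

def is_clique(graph, nodes):
    """Check that every pair of the given nodes is joined by an edge."""
    nodes = list(nodes)
    return all(
        graph.has_edge(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
    )

print(len(approx), len(exact), is_clique(toy, approx))
```

Enumerating maximal cliques can take exponential time in the worst case, so the exact search is realistic only on small graphs such as an ego network.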


Retrieve the names of the clique members:


In [None]:
clique_ids = np.fromiter(graph_data["max_clique"], int, len(graph_data["max_clique"]))
clique_names = pd.DataFrame(clique_ids, columns=["id"])
clique_names = clique_names.merge(df_users, on="id")
clique_names[["id", "screen_name"]]

(Alternative version that retrieves the names without using the dataframes)

In [None]:
for node_id in graph_data["max_clique"]:
  name = twitter_graph.nodes[node_id]["label"]
  print(node_id, name, sep="\t")

132646210	damiano10
18932422	mizzaro
19659370	eglu81
2273201773	SIGIRForum
3036907250	KevinRoitero


Extract the subgraph induced by the clique:

In [None]:
# the clique was computed on reduce_graph, so the subgraph is taken from it
clique_graph = reduce_graph.subgraph(clique_ids)
clique_graph

rcParams['figure.figsize'] = 15,5
nx.draw_networkx(
    clique_graph,
    pos = nx.spring_layout(clique_graph),
    node_color="#A0CBE2",
    width=2,
    edge_cmap = plt.cm.Blues,
    with_labels = True
)
plt.show()
plt.close()

### Computing the minimum edge cover

In [None]:
# min_edge_cover = nx.min_edge_cover(twitter_graph)
min_edge_cover = nx.min_edge_cover(twitter_graph_undirected)
print("Minimum edge cover: ", min_edge_cover)
graph_data["min_edge_cover"] = min_edge_cover
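
An edge cover is a set of edges that touches every node, so it exists only when no node is isolated. A minimal sketch on a toy path graph that checks this property on the result of `nx.min_edge_cover`:

```python
import networkx as nx

# Toy undirected path 0-1-2-3-4.
toy = nx.path_graph(5)
cover = nx.min_edge_cover(toy)

# Every node of the graph must be an endpoint of at least one cover edge.
covered = {node for edge in cover for node in edge}
all_real_edges = all(toy.has_edge(u, v) for u, v in cover)

print(covered == set(toy.nodes()), all_real_edges)
```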

### Estimating the "small-world-ness" of the graph

Definitions from the official NetworkX documentation:

*   Omega

> "The small-world coefficient of a graph G is:

> omega = Lr/L - C/Cl

> where C and L are respectively the average clustering coefficient and average shortest path length of G. Lr is the average shortest path length of an equivalent random graph and Cl is the average clustering coefficient of an equivalent lattice graph.

> The small-world coefficient (omega) ranges between -1 and 1. Values close to 0 means the G features small-world characteristics. Values close to -1 means G has a lattice shape whereas values close to 1 means G is a random graph."

*   Sigma

> "The small-world coefficient is defined as: sigma = C/Cr / L/Lr where C and L are respectively the average clustering coefficient and average shortest path length of G. Cr and Lr are respectively the average clustering coefficient and average shortest path length of an equivalent random graph. A graph is commonly classified as small-world if sigma>1."

So, according to the omega obtained (about 0.003, close to 0), the network is a small world. According to sigma (about 0.935, just below the threshold of 1) it is not, although it comes close.


In [None]:
# (very expensive to compute)

start_time = datetime.now()
print('Started at: {}'.format(start_time))

# omega = nx.omega(twitter_graph)
omega = nx.omega(twitter_graph_undirected, niter=1, nrand=1)
# omega = nx.omega(reduce_graph)

end_time = datetime.now()
print('Ended at: {}'.format(end_time))

graph_data["omega"] = omega

print('Duration: {}'.format(end_time - start_time))

In [None]:
# (expensive to compute)

start_time = datetime.now()
print('Started at: {}'.format(start_time))

# sigma = nx.sigma(twitter_graph)
sigma = nx.sigma(twitter_graph_undirected, niter=1, nrand=1)
# sigma = nx.sigma(reduce_graph, niter=1)

end_time = datetime.now()
print('Ended at: {}'.format(end_time))

graph_data["sigma"] = sigma

print('Duration: {}'.format(end_time - start_time))

In [None]:
print("omega (reduced graph): ", graph_data["omega"] )
print("sigma (reduced graph): ", graph_data["sigma"] )

omega (reduced graph):  0.0028925104555082015
sigma (reduced graph):  0.9352890469954387
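
With `nrand=1` the comparison relies on a single random reference graph, so both coefficients are noisy estimates. To make the sigma definition concrete, here is a minimal sketch that computes it by hand on a synthetic Watts-Strogatz graph. The graph parameters, the seed and the use of transitivity as the clustering measure are assumptions of this sketch, not values taken from the Twitter data:

```python
import networkx as nx

# Toy small-world graph (assumed parameters, not the Twitter data).
toy = nx.connected_watts_strogatz_graph(30, 4, 0.1, seed=42)

# Degree-preserving rewiring yields the "equivalent random graph".
rand = nx.random_reference(toy, niter=5, seed=42)

C, Cr = nx.transitivity(toy), nx.transitivity(rand)
L = nx.average_shortest_path_length(toy)
Lr = nx.average_shortest_path_length(rand)

# sigma = (C / Cr) / (L / Lr); guard against a triangle-free random graph.
toy_sigma = (C / Cr) / (L / Lr) if Cr > 0 else float("inf")
print(round(toy_sigma, 3))
```

Averaging over several reference graphs (a larger `nrand`) reduces the variance at the price of a longer run time.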


### Correlation between the centrality measures

In [None]:
df_measure = pd.DataFrame()

# Recreate the dataframes used previously
df_betweenness = pd.DataFrame(graph_data["betweenness_centrality"].items(), columns=['id', 'betweenness_centrality'])
df_degree = pd.DataFrame(graph_data["degree_centrality"].items(), columns=['id', 'degree_centrality'])
df_in_degree = pd.DataFrame(graph_data["in_degree_centrality"].items(), columns=['id', 'in_degree_centrality'])
df_out_degree = pd.DataFrame(graph_data["out_degree_centrality"].items(), columns=['id', 'out_degree_centrality'])
df_pagerank = pd.DataFrame(graph_data["pagerank"].items(), columns=['id', 'pagerank'])
df_hits_hubness = pd.DataFrame(graph_data["hits"][0].items(), columns=['id', 'hits_hubness'])
df_hits_autority = pd.DataFrame(graph_data["hits"][1].items(), columns=['id', 'hits_authority'])

# To build a single dataframe containing all the measures, chain several
# join operations on the "id" key; each join adds one measure, so the
# operation is repeated once per measure
df_measure = df_betweenness.join(df_degree.set_index("id"), on='id')
df_measure = df_measure.join(df_in_degree.set_index("id"), on='id')
df_measure = df_measure.join(df_out_degree.set_index("id"), on='id')
df_measure = df_measure.join(df_pagerank.set_index("id"), on='id')
df_measure = df_measure.join(df_hits_hubness.set_index("id"), on='id')
df_measure = df_measure.join(df_hits_autority.set_index("id"), on='id')

display(df_measure)

| | id | betweenness_centrality | degree_centrality | in_degree_centrality | out_degree_centrality | pagerank | hits_hubness | hits_authority |
|---|---|---|---|---|---|---|---|---|
| 0 | 18932422 | 0.054266 | 0.120284 | 0.050629 | 0.107062 | 0.023400 | 0.026367 | 2.443727e-03 |
| 1 | 1972411447 | 0.000000 | 0.000645 | 0.000000 | 0.000645 | 0.000131 | 0.000368 | 0.000000e+00 |
| 2 | 257691637 | 0.000000 | 0.000645 | 0.000000 | 0.000645 | 0.000131 | 0.000892 | 0.000000e+00 |
| 3 | 4893681 | 0.000000 | 0.000645 | 0.000322 | 0.000645 | 0.000257 | 0.000892 | 7.201491e-04 |
| 4 | 116897811 | 0.000000 | 0.001290 | 0.000322 | 0.000967 | 0.000257 | 0.000802 | 7.201491e-04 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3097 | 199623886 | 0.000000 | 0.000322 | 0.000322 | 0.000000 | 0.000152 | 0.000000 | 3.891806e-07 |
| 3098 | 568953134 | 0.000000 | 0.000322 | 0.000322 | 0.000000 | 0.000152 | 0.000000 | 3.891806e-07 |
| 3099 | 323240280 | 0.000000 | 0.000322 | 0.000322 | 0.000000 | 0.000152 | 0.000000 | 3.891806e-07 |
| 3100 | 3588249436 | 0.000000 | 0.000322 | 0.000322 | 0.000000 | 0.000152 | 0.000000 | 3.891806e-07 |
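
The chain of joins can also be expressed by handing pandas a dictionary of dictionaries, which aligns the rows on the node id automatically. A minimal sketch, with `toy_data` as a hypothetical stand-in for the per-measure dictionaries kept in `graph_data`:

```python
import pandas as pd

# Hypothetical stand-in for graph_data: measure name -> {node id: score}.
toy_data = {
    "pagerank": {1: 0.5, 2: 0.3, 3: 0.2},
    "in_degree_centrality": {1: 1.0, 2: 0.5, 3: 0.0},
}

# Each inner dict becomes a column; the node ids become the index.
toy_measure = pd.DataFrame(toy_data).rename_axis("id").reset_index()
print(toy_measure)
```

The HITS result would need to be split into its two dictionaries first, since it is stored as a tuple.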


In [None]:
# Create the dataframe that will store the correlation values
df_correlation_rho = pd.DataFrame(index = ["betweenness_centrality", "degree_centrality", "in_degree_centrality", "out_degree_centrality", "pagerank", "hits_hubness", "hits_authority"],
                              columns = ["betweenness_centrality", "degree_centrality", "in_degree_centrality", "out_degree_centrality", "pagerank", "hits_hubness", "hits_authority"])

# Loop over the columns (excluding "id"), compute the Pearson
# correlation with every other column and collect the values
# in a list, which becomes a column of the final dataframe
for col1 in df_measure.columns:
  if (col1 != "id"):
    new_col = []  # Series.append was removed in pandas 2.0, so use a plain list
    for col2 in df_measure.columns:
      if (col2 != "id"):
        r, p = stats.pearsonr(df_measure[col1], df_measure[col2])
        new_col.append(r)
    df_correlation_rho[col1] = new_col

# The final result is a dataframe symmetric about the main diagonal
display(df_correlation_rho)

| | betweenness_centrality | degree_centrality | in_degree_centrality | out_degree_centrality | pagerank | hits_hubness | hits_authority |
|---|---|---|---|---|---|---|---|
| betweenness_centrality | 1.0 | 0.99454 | 0.995865 | 0.975997 | 0.996149 | 0.909978 | 0.34527 |
| degree_centrality | 0.99454 | 1.0 | 0.9909 | 0.992307 | 0.986927 | 0.937525 | 0.362714 |
| in_degree_centrality | 0.995865 | 0.9909 | 1.0 | 0.96914 | 0.998743 | 0.912377 | 0.378228 |
| out_degree_centrality | 0.975997 | 0.992307 | 0.96914 | 1.0 | 0.962099 | 0.961181 | 0.353976 |
| pagerank | 0.996149 | 0.986927 | 0.998743 | 0.962099 | 1.0 | 0.900446 | 0.36282 |
| hits_hubness | 0.909978 | 0.937525 | 0.912377 | 0.961181 | 0.900446 | 1.0 | 0.375906 |
| hits_authority | 0.34527 | 0.362714 | 0.378228 | 0.353976 | 0.36282 | 0.375906 | 1.0 |


In [None]:
df_correlation_tau = pd.DataFrame(index = ["betweenness_centrality", "degree_centrality", "in_degree_centrality", "out_degree_centrality", "pagerank", "hits_hubness", "hits_authority"],
                              columns = ["betweenness_centrality", "degree_centrality", "in_degree_centrality", "out_degree_centrality", "pagerank", "hits_hubness", "hits_authority"])

# Same loop as above, using Kendall's tau instead of Pearson's coefficient
for col1 in df_measure.columns:
  if (col1 != "id"):
    new_col = []  # Series.append was removed in pandas 2.0, so use a plain list
    for col2 in df_measure.columns:
      if (col2 != "id"):
        tau, p_value = stats.kendalltau(df_measure[col1], df_measure[col2])
        new_col.append(tau)
    df_correlation_tau[col1] = new_col

display(df_correlation_tau)

| | betweenness_centrality | degree_centrality | in_degree_centrality | out_degree_centrality | pagerank | hits_hubness | hits_authority |
|---|---|---|---|---|---|---|---|
| betweenness_centrality | 1.0 | 0.482664 | 0.270257 | 0.276269 | 0.218591 | 0.156712 | 0.158652 |
| degree_centrality | 0.482664 | 1.0 | 0.533848 | 0.2484 | 0.414677 | 0.173905 | 0.394432 |
| in_degree_centrality | 0.270257 | 0.533848 | 1.0 | -0.356314 | 0.827059 | -0.267619 | 0.808228 |
| out_degree_centrality | 0.276269 | 0.2484 | -0.356314 | 1.0 | -0.255686 | 0.813294 | -0.26817 |
| pagerank | 0.218591 | 0.414677 | 0.827059 | -0.255686 | 1.0 | -0.181745 | 0.824219 |
| hits_hubness | 0.156712 | 0.173905 | -0.267619 | 0.813294 | -0.181745 | 1.0 | -0.166611 |
| hits_authority | 0.158652 | 0.394432 | 0.808228 | -0.26817 | 0.824219 | -0.166611 | 1.0 |
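
The Pearson and Kendall values above differ sharply for several pairs. One plausible reading (an interpretation, not a result computed in this notebook) is that a few very central nodes dominate the Pearson coefficient, whereas Kendall's tau only looks at how the nodes are ranked. A minimal sketch with made-up numbers showing the effect of a single extreme value:

```python
from scipy import stats

# Made-up scores: the first five points are perfectly anti-ordered,
# the sixth is an extreme value shared by both variables.
x = [1, 2, 3, 4, 5, 100]
y = [5, 4, 3, 2, 1, 100]

r, _ = stats.pearsonr(x, y)      # driven by the magnitude of the outlier
tau, _ = stats.kendalltau(x, y)  # based only on concordant/discordant pairs

print(round(r, 3), round(tau, 3))
```

Here the linear correlation is above 0.99 while the rank correlation is negative, because ten of the fifteen pairs are ordered in opposite directions.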


In [None]:
df_correlation_rho.to_csv(data_folder + "analysis_data/df_correlation_rho.csv", index=False)
df_correlation_tau.to_csv(data_folder + "analysis_data/df_correlation_tau.csv", index=False)

### Saving the analysis results to a pickle file

In [None]:
print(graph_data)

# Save the graph data to a pickle file
with open(data_folder + 'analysis_data/graph_data.pickle', 'wb') as handle:
    pickle.dump(graph_data, handle, protocol=pickle.HIGHEST_PROTOCOL)
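
To reuse the results in a later session, the file can be read back with `pickle.load`. A minimal round-trip sketch that writes to a temporary directory instead of the Drive folder; `toy_data` is a hypothetical stand-in for `graph_data`:

```python
import os
import pickle
import tempfile

# Small stand-in for graph_data (values copied from the summary printed earlier).
toy_data = {"diameter": 6, "radius": 3, "center": [3036907250, 19659370, 132646210]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph_data.pickle")

    # Write with the same protocol used in the notebook.
    with open(path, "wb") as handle:
        pickle.dump(toy_data, handle, protocol=pickle.HIGHEST_PROTOCOL)

    # Read the dictionary back.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)

print(restored == toy_data)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.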