# Key Targets and Clusters within the Sustainable Development Goals

This notebook explains and reproduces the results of [Laumann et al. (2019)]().

We compute the findings in the following order. All computations are done per group of countries as defined in the Appendix of the paper.

1. Hilbert-Schmidt independence criterion (HSIC) as dependence measure on indicator-level
2. Mapping indicator dependence measures onto target and goal-level
3. Weighted eigenvector centrality for all targets and goals
4. Weighted and unweighted connectivity per group of countries

Running the entire notebook takes approximately 5 hours. If you are only looking for the results, you can download them directly here:

[dependence_indicators]()

[dependence_targets]()

[dependence_goals]()

[weighted eigenvector centrality_targets]()

[weighted eigenvector centrality_goals]()

[weighted connectivity]()

[unweighted connectivity]()

In [1]:
import numpy as np
import pandas as pd
import math
import os
import pickle
import copy
import itertools
import networkx as nx
import operator
import matplotlib
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set(style="white")

import warnings
warnings.filterwarnings('ignore')

import sklearn.metrics as metrics

## 1. HSIC as dependence measure on indicator-level

In [4]:
# loading standardised data set
dict_all_std = pickle.load(open('dict_all_std.pkl', 'rb'))

# check
print('Standardised values: ')
print(dict_all_std['Africa'].loc['AG_LND_FRST'])

Standardised values: 
TimePeriod
1990         NaN
1991         NaN
1992         NaN
1993         NaN
1994         NaN
1995         NaN
1996         NaN
1997         NaN
1998         NaN
1999         NaN
2000     1.35532
2001         NaN
2002         NaN
2003         NaN
2004         NaN
2005    0.448587
2006         NaN
2007         NaN
2008         NaN
2009         NaN
2010   -0.492797
2011         NaN
2012         NaN
2013         NaN
2014         NaN
2015    -1.31111
2016         NaN
2017         NaN
2018         NaN
2019         NaN
Name: AG_LND_FRST, dtype: object


In [5]:
# loading info
info = pd.read_csv('info.csv', dtype=object)
seriescodes = list(info['SeriesCode'])
info.head()

Unnamed: 0.1,Unnamed: 0,Goal,Target,Indicator,SeriesCode,SeriesDescription,Source,FootNote,Nature,Units,...,[Name of non-communicable disease],[Quantile],[Reporting Type],[Sex],[Tariff regime (status)],[Type of mobile technology],[Type of occupation],[Type of product],[Type of skill],[Type of speed]
0,0,1,1.1,1.1.1,SI_POV_DAY1,Proportion of population below international p...,"World Bank, Development Research Group. Data a...",World aggregate.,G,PERCENT,...,,,G,,,,,,,
1,1333,1,1.1,1.1.1,SI_POV_EMP1,Employed population below international povert...,"ILO estimates, November 2018, available in ILO...",,M,PERCENT,...,,,G,MALE,,,,,,
2,10504,1,1.2,1.2.1,SI_POV_NAHC,Proportion of population living below the nati...,"Source: World Development Indicators database,...",Source: Central Statistic Organization (CSO) ...,CA,PERCENT,...,,,G,,,,,,,
3,11277,1,1.3,1.3.1,SI_COV_BENFTS,[ILO] Proportion of population covered by at l...,ILO estimates based on country data compled th...,,E,PERCENT,...,,,G,BOTHSEX,,,,,,
4,11389,1,1.3,1.3.1,SI_COV_CHLD,[ILO] Proportion of children/households receiv...,ILO estimates based on country data compled th...,,E,PERCENT,...,,,G,BOTHSEX,,,,,,


We first generate a list of all unique **indicators**.

In [7]:
indicators = list(info['Indicator'].unique())
indicators

['1.1.1',
 '1.2.1',
 '1.3.1',
 '1.5.3',
 '1.5.4',
 '1.5.1',
 '1.5.2',
 '1.a.2',
 '2.1.2',
 '2.1.1',
 '2.2.2',
 '2.2.1',
 '2.5.1',
 '2.5.2',
 '2.a.1',
 '2.a.2',
 '2.b.1',
 '2.c.1',
 '3.1.2',
 '3.1.1',
 '3.2.1',
 '3.2.2',
 '3.3.4',
 '3.3.1',
 '3.3.3',
 '3.3.2',
 '3.3.5',
 '3.4.1',
 '3.4.2',
 '3.5.2',
 '3.6.1',
 '3.7.1',
 '3.7.2',
 '3.8.1',
 '3.8.2',
 '3.9.1',
 '3.9.3',
 '3.9.2',
 '3.a.1',
 '3.b.2',
 '3.b.1',
 '3.c.1',
 '3.d.1',
 '4.1.1',
 '4.2.1',
 '4.2.2',
 '4.3.1',
 '4.4.1',
 '4.5.1',
 '4.6.1',
 '4.a.1',
 '4.b.1',
 '4.c.1',
 '5.2.1',
 '5.3.2',
 '5.3.1',
 '5.4.1',
 '5.5.2',
 '5.5.1',
 '5.6.1',
 '5.b.1',
 '6.1.1',
 '6.2.1',
 '6.3.2',
 '6.3.1',
 '6.4.2',
 '6.4.1',
 '6.5.2',
 '6.5.1',
 '6.6.1',
 '6.a.1',
 '6.b.1',
 '7.1.2',
 '7.1.1',
 '7.2.1',
 '7.3.1',
 '8.1.1',
 '8.10.1',
 '8.10.2',
 '8.2.1',
 '8.3.1',
 '8.4.2',
 '8.5.1',
 '8.5.2',
 '8.6.1',
 '8.7.1',
 '8.8.1',
 '8.a.1',
 '9.1.2',
 '9.2.1',
 '9.2.2',
 '9.3.2',
 '9.3.1',
 '9.4.1',
 '9.5.2',
 '9.5.1',
 '9.a.1',
 '9.b.1',
 '9.c.1',
 '10.1.1

We also generate a dictionary that maps each indicator to the list of series codes, i.e. sub-indicators, belonging to it.

In [8]:
dict_indicators = {}

for indicator in indicators:
    i = info['SeriesCode'].where(info['Indicator'] == indicator)

    dict_indicators[indicator] = [s for s in i if str(s) != 'nan']

In [34]:
#check 
print(dict_indicators['12.1.1'])

['SG_SCP_CNTRY', 'SG_SCP_CORMEC', 'SG_SCP_MACPOL', 'SG_SCP_POLINS']
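
The same mapping can also be built without an explicit loop. The following is a minimal sketch, assuming a table with the two columns `Indicator` and `SeriesCode`; `info_toy` and `dict_indicators_toy` are illustrative names, and the three rows are copied from the first rows of `info` shown above.

```python
import pandas as pd

# miniature of `info` with only the two columns used here
info_toy = pd.DataFrame({
    'Indicator': ['1.1.1', '1.1.1', '1.2.1'],
    'SeriesCode': ['SI_POV_DAY1', 'SI_POV_EMP1', 'SI_POV_NAHC'],
})

# one list of series codes per indicator, in order of first appearance
dict_indicators_toy = (
    info_toy.dropna(subset=['SeriesCode'])
            .groupby('Indicator', sort=False)['SeriesCode']
            .apply(list)
            .to_dict()
)

print(dict_indicators_toy)
```

Grouping keeps the series codes of one indicator together even if its rows are not adjacent in the table.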


We calculate the averages of all series codes, i.e. sub-indicators, in every given year to have values for indicators:

In [13]:
# defining the years
period = ['1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019']

In [14]:
# have years as integers
years = []
for year in period:
    years.append([int(year)])

In [15]:
# average over series codes to have values for indicators
indicators_values = {}

for country in list(dict_all_std):
    
    indicators_values[country] = pd.DataFrame(columns=period, index=indicators)
    
    for year in period:
        
        for indicator in indicators:
            list_subindicators_values = []
    
            for subindicator in list(dict_indicators[indicator]):
                list_subindicators_values.append(dict_all_std[country].loc[subindicator, year])
    
            # ignoring NaNs
            indicators_values[country].loc[indicator, year] = np.nanmean(list_subindicators_values)
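
The averaging step can be checked on a small example. The sketch below is not the notebook's implementation: it uses made-up values and hypothetical series codes `A`, `B` and `C`, and it assumes the values are stored as floats (the real data frames have dtype `object` and would need `.astype(float)` first).

```python
import numpy as np
import pandas as pd

# made-up standardised values: rows are series codes, columns are years
values_toy = pd.DataFrame(
    {'2000': [1.0, 3.0, np.nan], '2001': [np.nan, np.nan, 2.0]},
    index=['A', 'B', 'C'],
)

# hypothetical mapping from series code to indicator
series_to_indicator = {'A': '1.1.1', 'B': '1.1.1', 'C': '1.2.1'}

# mean per indicator and year; NaNs are skipped, all-NaN groups stay NaN
indicator_means = values_toy.groupby(series_to_indicator, sort=False).mean()

print(indicator_means)
```

Like `np.nanmean` in the loop above, the grouped mean ignores missing values, so an indicator only stays empty in a year in which none of its sub-indicators was reported.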

In [23]:
# check
print(dict_all_std['France'].loc['SI_COV_BENFTS'])
indicators_values['France'].head()

TimePeriod
1990    NaN
1991    NaN
1992    NaN
1993    NaN
1994    NaN
1995    NaN
1996    NaN
1997    NaN
1998    NaN
1999    NaN
2000    NaN
2001    NaN
2002    NaN
2003    NaN
2004    NaN
2005    NaN
2006    NaN
2007    NaN
2008    NaN
2009    NaN
2010    NaN
2011    NaN
2012    NaN
2013    NaN
2014    NaN
2015    NaN
2016    NaN
2017    NaN
2018    NaN
2019    NaN
Name: SI_COV_BENFTS, dtype: object


Unnamed: 0,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,...,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
1.1.1,,,,,,,,,,,...,,,,,,,,,,
1.2.1,,,,,,,,,,,...,,,,,,,,,,
1.3.1,,,,,,,,,,,...,-0.477151,-0.694038,,,2.3337,,,,,
1.5.3,,,,,,,,,,,...,,,,,,,,,,
1.5.4,,,,,,,,,,,...,,,,,,,,,,


We compute the HSIC for all unique pairs of indicators, so that we can later derive a dependence measure on target-level:

In [17]:
# create list out of all unique combinations
indicatorcombinations = list(itertools.combinations_with_replacement(indicators, 2))
indicatorcombinations

[('1.1.1', '1.1.1'),
 ('1.1.1', '1.2.1'),
 ('1.1.1', '1.3.1'),
 ('1.1.1', '1.5.3'),
 ('1.1.1', '1.5.4'),
 ('1.1.1', '1.5.1'),
 ('1.1.1', '1.5.2'),
 ('1.1.1', '1.a.2'),
 ('1.1.1', '2.1.2'),
 ('1.1.1', '2.1.1'),
 ('1.1.1', '2.2.2'),
 ('1.1.1', '2.2.1'),
 ('1.1.1', '2.5.1'),
 ('1.1.1', '2.5.2'),
 ('1.1.1', '2.a.1'),
 ('1.1.1', '2.a.2'),
 ('1.1.1', '2.b.1'),
 ('1.1.1', '2.c.1'),
 ('1.1.1', '3.1.2'),
 ('1.1.1', '3.1.1'),
 ('1.1.1', '3.2.1'),
 ('1.1.1', '3.2.2'),
 ('1.1.1', '3.3.4'),
 ('1.1.1', '3.3.1'),
 ('1.1.1', '3.3.3'),
 ('1.1.1', '3.3.2'),
 ('1.1.1', '3.3.5'),
 ('1.1.1', '3.4.1'),
 ('1.1.1', '3.4.2'),
 ('1.1.1', '3.5.2'),
 ('1.1.1', '3.6.1'),
 ('1.1.1', '3.7.1'),
 ('1.1.1', '3.7.2'),
 ('1.1.1', '3.8.1'),
 ('1.1.1', '3.8.2'),
 ('1.1.1', '3.9.1'),
 ('1.1.1', '3.9.3'),
 ('1.1.1', '3.9.2'),
 ('1.1.1', '3.a.1'),
 ('1.1.1', '3.b.2'),
 ('1.1.1', '3.b.1'),
 ('1.1.1', '3.c.1'),
 ('1.1.1', '3.d.1'),
 ('1.1.1', '4.1.1'),
 ('1.1.1', '4.2.1'),
 ('1.1.1', '4.2.2'),
 ('1.1.1', '4.3.1'),
 ('1.1.1', '4
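
A short sketch of what `itertools.combinations_with_replacement` produces, using three indicator labels from the list above: every pair appears once, each label is also paired with itself, and the first element of a pair never comes later in the list than the second.

```python
import itertools

labels = ['1.1.1', '1.2.1', '1.3.1']
pairs = list(itertools.combinations_with_replacement(labels, 2))

for pair in pairs:
    print(pair)

# n labels give n * (n + 1) / 2 pairs, including the n self-pairs
n = len(labels)
print(len(pairs) == n * (n + 1) // 2)
```

A consequence for the dependence matrices filled from these pairs is that only entries `(a, b)` with `a` listed before or equal to `b` receive a value; the mirrored entries `(b, a)` are never written.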

In [None]:
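# TBA: both functions still have to be implemented

# HSIC statistic of the time series x and y, with y shifted by `lag`
# (the argument list is a placeholder)
def HSIC(x, y, lag):
    raise NotImplementedError('HSIC is not implemented yet')

# F: bootstrapped threshold at the Bonferroni-corrected level q
def F(q):
    raise NotImplementedError('F is not implemented yet')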

Recall that we want to test across multiple lags up to a maximum lag $M$. To do so, we check whether a dependence exists between $X_t$ and $Y_{t+m}$ for $-M \leq m \leq M$, where an individual hypothesis must be defined for each lag $m$. This requires a Bonferroni correction to attain the correct test level $\alpha$, which we obtain by defining $q = 1 - \frac{\alpha}{2M+1}$. Let $S_{m,n}$ denote the value of the normalised HSIC statistic of the shifted time-series $(X_t, Y_{t+m})$. Let $F_{b,n}$ be the empirical cumulative distribution function (cdf) obtained by block circular bootstrapping, so that $F_{b,n}^{-1}(q)$ is its quantile at the Bonferroni-corrected level $q$. 

We reject the null hypothesis $\mathcal{H}_0 : \mathbb{P}_{XY} = \mathbb{P}_{X} \mathbb{P}_{Y}$ if 
$$
\underset{-M \leq m \leq M}{\text{max}} S_{m,n} > F_{b,n}^{-1}(q).
$$

If $\mathcal{H}_0$ is rejected, we take this maximum of $S_{m,n}$ across all $2M + 1$ lags as the measure of dependence. The larger this maximum is, the more dependent the two time-series $X$ and $Y$ are.
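
The following sketch illustrates the ingredients of this test on synthetic data. It is not the implementation used for the paper: it assumes the standard biased HSIC estimator $\frac{1}{n^2}\mathrm{tr}(KHLH)$ with Gaussian kernels and a median-heuristic bandwidth, it works on the raw (not normalised) statistic, and it leaves out the bootstrap that yields the threshold. All function names and the synthetic series are illustrative.

```python
import numpy as np

def gaussian_gram(values):
    """Gaussian kernel Gram matrix; bandwidth set by the median heuristic."""
    values = np.asarray(values, dtype=float).reshape(-1, 1)
    sq_dists = (values - values.T) ** 2
    positive = sq_dists[sq_dists > 0]
    sigma_sq = np.median(positive) if positive.size else 1.0
    return np.exp(-sq_dists / (2.0 * sigma_sq))

def hsic(x, y):
    """Biased empirical HSIC, trace(K H L H) / n^2."""
    n = len(x)
    K, L = gaussian_gram(x), gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    return float(np.trace(K @ H @ L @ H)) / n ** 2

def lagged_pairs(x, y, lag):
    """Pair x_t with y_{t+lag}, dropping samples that fall off either end."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if lag >= 0:
        return x[:len(x) - lag], y[lag:]
    return x[-lag:], y[:len(y) + lag]

def max_hsic_over_lags(x, y, max_lag):
    """Largest HSIC value over the 2 * max_lag + 1 lags."""
    return max(hsic(*lagged_pairs(x, y, m)) for m in range(-max_lag, max_lag + 1))

def bonferroni_level(alpha, max_lag):
    """q = 1 - alpha / (2M + 1)."""
    return 1.0 - alpha / (2 * max_lag + 1)

# synthetic series: y_dependent follows x non-linearly with a delay of one step
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_dependent = np.roll(x, 1) ** 2 + 0.1 * rng.normal(size=200)
y_independent = rng.normal(size=200)

print(max_hsic_over_lags(x, y_dependent, max_lag=2))
print(max_hsic_over_lags(x, y_independent, max_lag=2))
print(bonferroni_level(alpha=0.05, max_lag=2))
```

The dependent pair is uncorrelated in the linear sense at every lag, yet its HSIC at lag 1 is clearly larger than for the independent pair, which is the reason for using a kernel-based measure. The real indicator series are much shorter and contain missing values, which would have to be handled before such an estimator is applied.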

In [None]:
# dictionary with countries as keys and dependence matrices as values
dict_dependence_indicator = {}

for country in list(dict_all_std):
    dict_dependence_indicator[country] = pd.DataFrame(columns=indicators, index=indicators)
    
    for indicatorcombination in indicatorcombinations:
        
        # time series of the two indicators
        x = indicators_values[country].loc[indicatorcombination[0]]
        y = indicators_values[country].loc[indicatorcombination[1]]
        
        # HSIC across the lags m = -M, ..., M
        # (M, q, HSIC and F must be defined before this cell is run)
        HSIC_list = []
        
        for lag in range(-M, M + 1):
            HSIC_list.append(HSIC(x, y, lag))
            
        max_HSIC = np.max(HSIC_list)
        min_HSIC = np.min(HSIC_list)
        
        # normalise HSIC statistic
        S_list = []
        
        for hsic_value in HSIC_list:
            S = (hsic_value - min_HSIC) / (max_HSIC - min_HSIC)    
            S_list.append(S)
            
        max_S = np.max(S_list)
        
        # keep max_S only if H0 is rejected, otherwise set the dependence to 0
        if not max_S > F(q):
            max_S = 0
        
        
        dict_dependence_indicator[country].loc[indicatorcombination[0], indicatorcombination[1]] = max_S

In [None]:
# check (only entries in the order of indicatorcombinations are filled)
dict_dependence_indicator['France'].loc['1.1.1', '12.1.1']

In [None]:
# saving the indicator values
with open('indicator_values.pkl', 'wb') as v:
    pickle.dump(indicators_values, v)

# saving these dependence matrices
f = open('dependence_indicator.pkl', 'wb')
pickle.dump(dict_dependence_indicator, f)
f.close()

## 2. Mapping indicator dependence measures onto target and goal-level

To obtain dependencies on target-level, we compute the average of all $S_{m,n}$ between any pair of indicators associated with two targets. 

For example, suppose we want to determine the dependence between the targets 12.2 and 15.3. Progress on the former is measured by two indicators, 12.2.1 and 12.2.2, and progress on the latter by one indicator, 15.3.1. Hence, we compute the $S_{m,n}$ between 12.2.1 and 15.3.1, and between 12.2.2 and 15.3.1. The average of these two $S_{m,n}$ is the dependence measure between the targets 12.2 and 15.3. If each of the two targets has only one indicator, the $S_{m,n}$ between these two indicators is the dependence measure between the targets.
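
The example can be written out in a few lines. This is a sketch with made-up dependence values (0.8 and 0.4); `S_toy`, `targets_toy` and `target_dependence` are illustrative names and not part of the notebook.

```python
import numpy as np
import pandas as pd

indicators_toy = ['12.2.1', '12.2.2', '15.3.1']

# indicator-level dependence matrix with two made-up entries
S_toy = pd.DataFrame(np.nan, index=indicators_toy, columns=indicators_toy)
S_toy.loc['12.2.1', '15.3.1'] = 0.8
S_toy.loc['12.2.2', '15.3.1'] = 0.4

# indicators belonging to each target
targets_toy = {'12.2': ['12.2.1', '12.2.2'], '15.3': ['15.3.1']}

def target_dependence(S, targets, target_a, target_b):
    """Average dependence over all indicator pairs of the two targets."""
    values = [S.loc[i, j] for i in targets[target_a] for j in targets[target_b]]
    return float(np.mean(values))

# average of 0.8 and 0.4
print(target_dependence(S_toy, targets_toy, '12.2', '15.3'))
```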

### 2.1 Target-level
We first generate a list of all unique **targets**.

In [30]:
targets = list(info['Target'].unique())
targets

['1.1',
 '1.2',
 '1.3',
 '1.5',
 '1.a',
 '2.1',
 '2.2',
 '2.5',
 '2.a',
 '2.b',
 '2.c',
 '3.1',
 '3.2',
 '3.3',
 '3.4',
 '3.5',
 '3.6',
 '3.7',
 '3.8',
 '3.9',
 '3.a',
 '3.b',
 '3.c',
 '3.d',
 '4.1',
 '4.2',
 '4.3',
 '4.4',
 '4.5',
 '4.6',
 '4.a',
 '4.b',
 '4.c',
 '5.2',
 '5.3',
 '5.4',
 '5.5',
 '5.6',
 '5.b',
 '6.1',
 '6.2',
 '6.3',
 '6.4',
 '6.5',
 '6.6',
 '6.a',
 '6.b',
 '7.1',
 '7.2',
 '7.3',
 '8.1',
 '8.10',
 '8.2',
 '8.3',
 '8.4',
 '8.5',
 '8.6',
 '8.7',
 '8.8',
 '8.a',
 '9.1',
 '9.2',
 '9.3',
 '9.4',
 '9.5',
 '9.a',
 '9.b',
 '9.c',
 '10.1',
 '10.4',
 '10.5',
 '10.6',
 '10.a',
 '10.b',
 '10.c',
 '11.1',
 '11.5',
 '11.6',
 '12.1',
 '12.4',
 '12.c',
 '14.4',
 '14.5',
 '14.6',
 '14.a',
 '14.b',
 '15.1',
 '15.2',
 '15.4',
 '15.5',
 '15.6',
 '15.a',
 '16.1',
 '16.10',
 '16.2',
 '16.3',
 '16.5',
 '16.6',
 '16.9',
 '16.a',
 '17.10',
 '17.11',
 '17.12',
 '17.15',
 '17.16',
 '17.18',
 '17.19',
 '17.2',
 '17.3',
 '17.4',
 '17.6',
 '17.8',
 '17.9']

We also generate a dictionary that maps each target to the list of indicators belonging to it.

In [35]:
dict_targets = {}

for target in targets:
    i = info['Indicator'].where(info['Target'] == target).unique()

    dict_targets[target] = [s for s in i if str(s) != 'nan']

In [44]:
#check 
print(dict_targets['1.5'])

['1.5.3', '1.5.4', '1.5.1', '1.5.2']


Now, we generate all possible combinations of targets:

In [37]:
# create list out of all unique combinations
targetcombinations = list(itertools.combinations_with_replacement(targets, 2))
targetcombinations

[('1.1', '1.1'),
 ('1.1', '1.2'),
 ('1.1', '1.3'),
 ('1.1', '1.5'),
 ('1.1', '1.a'),
 ('1.1', '2.1'),
 ('1.1', '2.2'),
 ('1.1', '2.5'),
 ('1.1', '2.a'),
 ('1.1', '2.b'),
 ('1.1', '2.c'),
 ('1.1', '3.1'),
 ('1.1', '3.2'),
 ('1.1', '3.3'),
 ('1.1', '3.4'),
 ('1.1', '3.5'),
 ('1.1', '3.6'),
 ('1.1', '3.7'),
 ('1.1', '3.8'),
 ('1.1', '3.9'),
 ('1.1', '3.a'),
 ('1.1', '3.b'),
 ('1.1', '3.c'),
 ('1.1', '3.d'),
 ('1.1', '4.1'),
 ('1.1', '4.2'),
 ('1.1', '4.3'),
 ('1.1', '4.4'),
 ('1.1', '4.5'),
 ('1.1', '4.6'),
 ('1.1', '4.a'),
 ('1.1', '4.b'),
 ('1.1', '4.c'),
 ('1.1', '5.2'),
 ('1.1', '5.3'),
 ('1.1', '5.4'),
 ('1.1', '5.5'),
 ('1.1', '5.6'),
 ('1.1', '5.b'),
 ('1.1', '6.1'),
 ('1.1', '6.2'),
 ('1.1', '6.3'),
 ('1.1', '6.4'),
 ('1.1', '6.5'),
 ('1.1', '6.6'),
 ('1.1', '6.a'),
 ('1.1', '6.b'),
 ('1.1', '7.1'),
 ('1.1', '7.2'),
 ('1.1', '7.3'),
 ('1.1', '8.1'),
 ('1.1', '8.10'),
 ('1.1', '8.2'),
 ('1.1', '8.3'),
 ('1.1', '8.4'),
 ('1.1', '8.5'),
 ('1.1', '8.6'),
 ('1.1', '8.7'),
 ('1.1', '8.8

We take each of these combinations and calculate the mean over all $S_{m,n}$ between the indicators associated with the two given targets.

This is most easily illustrated by an example. Here, we want to determine the dependence measure between the targets 7.1 and 13.3.

<img src="indicator_combination.png">

We can see that this example has six different $S_{m,n}$, each of which is already the maximum over the different lags. We compute the mean over these six to obtain a dependence measure on target-level.

In [None]:
# dictionary with countries as keys and dependence matrices as values
dict_dependence_target = {}

for country in list(dict_all_std):
    dict_dependence_target[country] = pd.DataFrame(columns=targets, index=targets)
    
    for targetcombination in targetcombinations:
        
        S_list = []
        
        for indicator_0 in dict_targets[targetcombination[0]]:
            for indicator_1 in dict_targets[targetcombination[1]]:
                S_list.append(dict_dependence_indicator[country].loc[indicator_0, indicator_1])
        
        # entries of the indicator matrix that were never filled (NaN) are ignored
        avg_S = np.nanmean(np.array(S_list, dtype=float))
        
        dict_dependence_target[country].loc[targetcombination[0], targetcombination[1]] = avg_S

In [None]:
# check (only entries in the order of targetcombinations are filled)
dict_dependence_target['France'].loc['1.1', '12.1']

In [None]:
# saving these dependence matrices
f = open('dependence_target.pkl', 'wb')
pickle.dump(dict_dependence_target, f)
f.close()

### 2.2 Goal-level
We first generate a list of all unique **goals**.

In [48]:
goals = list(info['Goal'].unique())
goals

['1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '12',
 '14',
 '15',
 '16',
 '17']

We also generate a dictionary that maps each goal to the list of targets belonging to it.

In [49]:
dict_goals = {}

for goal in goals:
    i = info['Target'].where(info['Goal'] == goal).unique()

    dict_goals[goal] = [s for s in i if str(s) != 'nan']

In [50]:
#check 
print(dict_goals['1'])

['1.1', '1.2', '1.3', '1.5', '1.a']


Now, we generate all possible combinations of goals:

In [51]:
# create list out of all unique combinations
goalcombinations = list(itertools.combinations_with_replacement(goals, 2))
goalcombinations

[('1', '1'),
 ('1', '2'),
 ('1', '3'),
 ('1', '4'),
 ('1', '5'),
 ('1', '6'),
 ('1', '7'),
 ('1', '8'),
 ('1', '9'),
 ('1', '10'),
 ('1', '11'),
 ('1', '12'),
 ('1', '14'),
 ('1', '15'),
 ('1', '16'),
 ('1', '17'),
 ('2', '2'),
 ('2', '3'),
 ('2', '4'),
 ('2', '5'),
 ('2', '6'),
 ('2', '7'),
 ('2', '8'),
 ('2', '9'),
 ('2', '10'),
 ('2', '11'),
 ('2', '12'),
 ('2', '14'),
 ('2', '15'),
 ('2', '16'),
 ('2', '17'),
 ('3', '3'),
 ('3', '4'),
 ('3', '5'),
 ('3', '6'),
 ('3', '7'),
 ('3', '8'),
 ('3', '9'),
 ('3', '10'),
 ('3', '11'),
 ('3', '12'),
 ('3', '14'),
 ('3', '15'),
 ('3', '16'),
 ('3', '17'),
 ('4', '4'),
 ('4', '5'),
 ('4', '6'),
 ('4', '7'),
 ('4', '8'),
 ('4', '9'),
 ('4', '10'),
 ('4', '11'),
 ('4', '12'),
 ('4', '14'),
 ('4', '15'),
 ('4', '16'),
 ('4', '17'),
 ('5', '5'),
 ('5', '6'),
 ('5', '7'),
 ('5', '8'),
 ('5', '9'),
 ('5', '10'),
 ('5', '11'),
 ('5', '12'),
 ('5', '14'),
 ('5', '15'),
 ('5', '16'),
 ('5', '17'),
 ('6', '6'),
 ('6', '7'),
 ('6', '8'),
 ('6', '9'),
 ('

As with targets, we take each of these combinations and calculate the mean over all dependence measures between the targets associated with the two given goals.

In [None]:
# dictionary with countries as keys and dependence matrices as values
dict_dependence_goal = {}

for country in list(dict_all_std):
    dict_dependence_goal[country] = pd.DataFrame(columns=goals, index=goals)
    
    for goalcombination in goalcombinations:
        
        S_list = []
        
        for target_0 in dict_goals[goalcombination[0]]:
            for target_1 in dict_goals[goalcombination[1]]:
                S_list.append(dict_dependence_target[country].loc[target_0, target_1])
        
        # entries of the target matrix that were never filled (NaN) are ignored
        avg_S = np.nanmean(np.array(S_list, dtype=float))
        
        dict_dependence_goal[country].loc[goalcombination[0], goalcombination[1]] = avg_S

In [None]:
# check (only entries in the order of goalcombinations are filled)
dict_dependence_goal['France'].loc['1', '12']

In [None]:
# saving these dependence matrices
f = open('dependence_goal.pkl', 'wb')
pickle.dump(dict_dependence_goal, f)
f.close()

## 2.3 Visualisations

The goal and target interlinkages are visualised as networks. Additionally, target interlinkages are visualised in a matrix.

### 2.3.1 Visualisation as networks on goal-level

We want to visualise the interaction networks on a goal-level and choose the width and colour of the edge connecting two goals according to the dependence measure between them. 

For the colours of the edges, we pick the yellow-to-orange-to-red colourmap `YlOrRd` and save its hex-codes in a dictionary. Since our dependence measures are normalised to lie between 0 and 1, we can pick the colour of an edge directly from its dependence measure. Therefore, the keys of our dictionary `colours` are `i/cmap.N`, where `cmap.N` is the number of colours in the colourmap (256 by default).

In [92]:
cmap = plt.get_cmap('YlOrRd')   # plt.cm.get_cmap is no longer available in recent matplotlib versions

colours = {}

for i in range(cmap.N):
    rgb = cmap(i)[:3] # will return rgba, we take only first 3 so we get rgb
    colours[i/cmap.N] = matplotlib.colors.rgb2hex(rgb)

In [93]:
# check
colours[0.00390625]

'#fffecb'

Here, we pick the hex-code of the colour whose key is closest to the dependence measure between two goals.

Adding the dependence measures as the weights of the edges is much easier. We multiply them by 5 so that differences in edge width are easier to see.
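
The lookup of the closest key can be sketched without matplotlib. In the sketch below, `colours_toy` is a stand-in for the `colours` dictionary with placeholder labels instead of hex-codes, and `nearest_colour` and `edge_width` are illustrative helper names.

```python
# stand-in for the `colours` dictionary: keys are i/N, values are placeholder labels
N = 256
colours_toy = {i / N: 'colour {}'.format(i) for i in range(N)}

def nearest_colour(dependence, colours):
    """Return the entry whose key is closest to the dependence measure."""
    key = min(colours, key=lambda k: abs(k - dependence))
    return colours[key]

def edge_width(dependence, factor=5):
    """Scale the dependence measure to a visible line width."""
    return factor * dependence

print(nearest_colour(0.3, colours_toy))   # key 77/256 is the closest one to 0.3
print(edge_width(0.3))
```

Because the largest key is 255/256, a dependence of exactly 1 is mapped to the last colour of the colourmap.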

In [None]:
# create directory to save visualisations
os.makedirs('visualisations/goal-networks', exist_ok=True)   # also creates 'visualisations' if needed

Drawing of the networks:

In [None]:
# for goal-level centrality
ranking_goal = {}

for country in list(dict_all_std):
    
    G = nx.Graph()
    
    for goalcombination in goalcombinations:
        # picking the colour in colours that is closest to dependence
        dependence = dict_dependence_goal[country].loc[goalcombination[0], goalcombination[1]]
        colour = colours[min(colours, key=lambda k: abs(k - dependence))]
        
        G.add_edge(int(goalcombination[0]), int(goalcombination[1]), weight=np.multiply(dependence, 5), color=colour)
    
    # the layout can only be computed once all nodes have been added
    pos = nx.circular_layout(G)
    
    plt.figure(figsize=(24,16))

    # nodes
    nx.draw_networkx_nodes(G, pos, node_size=1000)

    # labels
    nx.draw_networkx_labels(G, pos, font_size=46, font_family='sans-serif')

    edges = list(G.edges())
    # separate name, so that the dictionary `colours` is not overwritten
    edge_colours = [G[u][v]['color'] for u,v in edges]
    weights = [G[u][v]['weight'] for u,v in edges]

    nx.draw_networkx(G, pos, with_labels=False, edgelist=edges, edge_color=edge_colours, node_color='white', node_size=1000, width=weights)

    ax=plt.gca()
    fig=plt.gcf()
    trans = ax.transData.transform
    trans_axes = fig.transFigure.inverted().transform
    imsize = 0.08    # this is the image size
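    # colourbar for dependence measures between 0 and 1
    fig.colorbar(plt.cm.ScalarMappable(norm=plt.Normalize(vmin=0, vmax=1), cmap=cmap), ax=ax)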

    for node in G.nodes():
        (x,y) = pos[node]   
        xx,yy = trans((x,y)) # display coordinates
        xa,ya = trans_axes((xx,yy)) # figure coordinates
        a = plt.axes([xa-imsize/2.0,ya-imsize/2.0, imsize, imsize])
        a.imshow(mpimg.imread('images/WEB FILES/E SDG Web Files without UN Emblem/E SDG Icons Square/E_SDG goals_icons-individual-rgb-{}.png'.format(node)))
        a.axis('off')


    plt.axis('off')
    ax.axis('off')
    
    # weighted eigenvector centrality
    centrality = nx.eigenvector_centrality(G, weight='weight')
    ranking_goal[country] = sorted(centrality.items(), key=operator.itemgetter(1)) #[0]                       
    
    # save visualisations as separate images
    plt.savefig('visualisations/goal-networks/{}.png'.format(country), format='png')

    #plt.show()

### 2.3.2 Visualisation as networks on target-level

In [None]:
# create directory to save visualisations
os.makedirs('visualisations/target-networks', exist_ok=True)   # also creates 'visualisations' if needed

In [None]:
# for target-level centrality
ranking_target = {}

for country in list(dict_all_std):
    
    G = nx.Graph()
    
    for targetcombination in targetcombinations:
        # picking the colour in colours that is closest to dependence
        dependence = dict_dependence_target[country].loc[targetcombination[0], targetcombination[1]]
        colour = colours[min(colours, key=lambda k: abs(k - dependence))]
        
        # targets such as '1.a' are not numbers, so the nodes keep their string labels
        G.add_edge(targetcombination[0], targetcombination[1], weight=np.multiply(dependence, 5), color=colour)
    
    # the layout can only be computed once all nodes have been added
    pos = nx.circular_layout(G)
    
    plt.figure(figsize=(24,16))

    # nodes
    nx.draw_networkx_nodes(G, pos, node_size=1000)

    # labels
    nx.draw_networkx_labels(G, pos, font_size=46, font_family='sans-serif')

    edges = list(G.edges())
    # separate name, so that the dictionary `colours` is not overwritten
    edge_colours = [G[u][v]['color'] for u,v in edges]
    weights = [G[u][v]['weight'] for u,v in edges]

    nx.draw_networkx(G, pos, with_labels=False, edgelist=edges, edge_color=edge_colours, node_color='white', node_size=1000, width=weights)

    ax=plt.gca()
    fig=plt.gcf()
    trans = ax.transData.transform
    trans_axes = fig.transFigure.inverted().transform
    imsize = 0.08    # this is the image size
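    # colourbar for dependence measures between 0 and 1
    fig.colorbar(plt.cm.ScalarMappable(norm=plt.Normalize(vmin=0, vmax=1), cmap=cmap), ax=ax)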

    for node in G.nodes():
        (x,y) = pos[node]   
        xx,yy = trans((x,y)) # display coordinates
        xa,ya = trans_axes((xx,yy)) # figure coordinates
        a = plt.axes([xa-imsize/2.0,ya-imsize/2.0, imsize, imsize])
        a.axis('off')


    plt.axis('off')
    ax.axis('off')
                         
    # weighted eigenvector centrality
    centrality = nx.eigenvector_centrality(G, weight='weight')
    ranking_target[country] = sorted(centrality.items(), key=operator.itemgetter(1)) #[0]  
    
    # save visualisations as separate images
    plt.savefig('visualisations/target-networks/{}.png'.format(country), format='png')

    #plt.show()

### 2.3.3 Visualisation as matrices on target-level

In [None]:
# create directory to save visualisations
os.makedirs('visualisations/target-matrices', exist_ok=True)   # also creates 'visualisations' if needed

Drawing the matrices:

In [None]:
for country in list(dict_all_std):

    # from dictionary to data frame
    dependence_targets = pd.DataFrame(columns=targets, index=targets, dtype=float)

    for targetcombination in targetcombinations:
        # fill the lower triangle, which is the part left visible by the mask below
        dependence_targets.loc[targetcombination[1], targetcombination[0]] = float(dict_dependence_target[country].loc[targetcombination[0], targetcombination[1]])
    
    
    # generate a mask for the upper triangle (np.bool was removed from NumPy, so we use bool)
    mask = np.zeros(dependence_targets.shape, dtype=bool)
    mask[np.triu_indices_from(mask)] = True

    # set up the matplotlib figure
    fig, ax = plt.subplots(figsize=(24,16))

    # draw the heatmap with the mask and correct aspect ratio;
    # seaborn adds the colourbar for the custom colormap itself
    sns.heatmap(dependence_targets.fillna(0), mask=mask, cmap=cmap, vmax=1, center=0.5, vmin=0,
                square=True, linewidths=.5, cbar_kws={"shrink": .8})

    # save visualisations as separate images
    plt.savefig('visualisations/target-matrices/{}.png'.format(country))
    
    #plt.show()

## 3. Weighted eigenvector centrality for targets and goals

As you may have spotted, we have already computed the centrality in the previous cells, because it was convenient to do so there. See the [documentation](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.eigenvector_centrality.html#networkx.algorithms.centrality.eigenvector_centrality) of the eigenvector centrality for details.
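
To make the measure concrete: eigenvector centrality gives a node a high score when it is strongly connected to other high-scoring nodes, which corresponds to the leading eigenvector of the weighted adjacency matrix. The sketch below computes this with NumPy for a made-up network of three targets. The weights, the labels and the function name `eigenvector_centrality_dense` are illustrative assumptions; the notebook itself relies on `nx.eigenvector_centrality`.

```python
import numpy as np

def eigenvector_centrality_dense(weights):
    """Leading eigenvector of a symmetric, non-negative weight matrix."""
    weights = np.asarray(weights, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(weights)   # eigh expects a symmetric matrix
    leading = eigenvectors[:, np.argmax(eigenvalues)]
    # for a connected network all entries share one sign; make them positive
    return np.abs(leading)

targets_toy = ['1.1', '1.2', '1.3']
weights_toy = np.array([
    [0.0, 1.0, 1.0],   # target 1.1 is strongly linked to both other targets
    [1.0, 0.0, 0.1],
    [1.0, 0.1, 0.0],
])

centrality_toy = dict(zip(targets_toy, eigenvector_centrality_dense(weights_toy)))

# ascending order, as in ranking_target and ranking_goal: the last entry is the most central
ranking_toy = sorted(centrality_toy.items(), key=lambda item: item[1])
print(ranking_toy)
```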

In [None]:
# check
ranking_target['Italy']

In [None]:
# saving the target centrality measures
f = open('ranking_target.pkl', 'wb')
pickle.dump(ranking_target, f)
f.close()

In [None]:
# check
ranking_goal['Italy']

In [None]:
# saving the goal centrality measures
f = open('ranking_goal.pkl', 'wb')
pickle.dump(ranking_goal, f)
f.close()

## 4. Weighted and unweighted connectivity per country

We measure the weighted and unweighted connectivity $\mathcal{C}$ of the target-network of any country $c$, which gives insight into how strongly the SDGs are inter-linked overall. Let $N_c$ be the number of nodes and $L_c$ the sum of all edge weights (weighted connectivity) or the number of edges (unweighted connectivity). The weighted connectivity ranges from 0 to 1, where a network in which all nodes are connected by edges of weight 1 has a weighted connectivity of 1. The unweighted connectivity disregards the weights and only counts the edges. Both are calculated by
$$
\mathcal{C}_c = \frac{L_c}{N_c \times (N_c-1)} .
$$
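
The formula can be checked on a small example. The sketch below makes two assumptions that the text does not spell out: the dependence matrix is symmetric, so every edge is counted once per direction, and self-dependences on the diagonal are left out. Under these assumptions a fully connected network with weights of 1 reaches a connectivity of exactly 1. The function name `connectivity` and the toy matrices are illustrative.

```python
import numpy as np

def connectivity(weights, weighted=True):
    """L / (N * (N - 1)) for a symmetric weight matrix, ignoring the diagonal."""
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    off_diagonal = weights[~np.eye(n, dtype=bool)]
    if weighted:
        links = off_diagonal.sum()
    else:
        links = np.count_nonzero(off_diagonal > 0)   # disregard weights
    return links / (n * (n - 1))

# all three nodes connected with weight 1
full_toy = np.ones((3, 3)) - np.eye(3)

# a single edge of weight 0.5 between the first two nodes
sparse_toy = np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])

print(connectivity(full_toy))
print(connectivity(sparse_toy, weighted=True))
print(connectivity(sparse_toy, weighted=False))
```

In the sparse example one of three possible edges is present, so the unweighted connectivity is one third, and the weighted connectivity is half of that because the edge has weight 0.5.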

In [None]:
# weighted connectivity
dict_w_connectivity = {}

for country in list(dict_all_std):
    # first sum() sums over the columns, second sum() sums over these column sums
    L = dict_dependence_target[country].astype(float).sum().sum()
    N = len(targets)   # number of nodes in the target network
    
    dict_w_connectivity[country] = L/(N*(N-1))

In [None]:
# check
dict_w_connectivity['Spain']

In [None]:
# saving these connectivities
f = open('connectivity_weighted.pkl', 'wb')
pickle.dump(dict_w_connectivity, f)
f.close()

In [None]:
# unweighted connectivity
dict_u_connectivity = {}

for country in list(dict_all_std):
    
    # count the edges, i.e. all entries with a positive dependence, and disregard their weight
    L = int((dict_dependence_target[country].astype(float) > 0).sum().sum())
        
    N = len(targets)   # number of nodes in the target network
    
    dict_u_connectivity[country] = L/(N*(N-1))

In [None]:
# check
dict_u_connectivity['Spain']

In [None]:
# saving these connectivities
f = open('connectivity_unweighted.pkl', 'wb')
pickle.dump(dict_u_connectivity, f)
f.close()