# **Graph Representation of Ethnologue Language Hierarchy**

We aim to structure the data in json format such that we have a graph representation resembling the airport data in lab:

```
{
    'nodes' : [
        {
            'id': node_index,
            'name': language_name,
            'country': country_name,
            'score': EGIDS_score,
            'latitude': latitude,
            'longitude': longitude
        }, 
        ...
    ], 

    'links' : [
        {
            'source': parent_classification_index,
            'target': child_classifcation_index
        }
    ]

}
```

In [21]:
import numpy as np
import pandas as pd
import json

In [22]:
tol_df = pd.read_csv("../data/Table_of_Languages.csv")
tol_df.head()

Unnamed: 0,ISO_639,Language_Name,Uninverted_Name,Country_Code,Country_Name,Region_Code,Region_Name,Area,L1_Users,Digits,...,Latitude,Longitude,EGIDS,Is_Written,Institutional,Developing,Vigorous,In_Trouble,Dying,Extinct
0,aaa,Ghotuo,Ghotuo,NG,Nigeria,WAF,Western Africa,Africa,9000.0,4.0,...,7.1154,5.9528,6a,F,0,0,1,0,0,0
1,aab,Alumu-Tesu,Alumu-Tesu,NG,Nigeria,WAF,Western Africa,Africa,7000.0,4.0,...,9.0164,8.612,6a,F,0,0,1,0,0,0
2,aac,Ari,Ari,PG,Papua New Guinea,MEL,Melanesia,Pacific,50.0,2.0,...,-7.9172,142.3877,6b,T,0,0,0,1,0,0
3,aad,Amal,Amal,PG,Papua New Guinea,MEL,Melanesia,Pacific,830.0,3.0,...,-4.0487,141.9967,6a,F,0,0,1,0,0,0
4,aae,"Albanian, Arbëreshë",Arbëreshë Albanian,IT,Italy,SEU,Southern Europe,Europe,100000.0,6.0,...,38.8985,16.7019,6b,T,0,0,0,1,0,0


In [23]:
tol_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7457 entries, 0 to 7456
Data columns (total 24 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   ISO_639          7456 non-null   object 
 1   Language_Name    7457 non-null   object 
 2   Uninverted_Name  7457 non-null   object 
 3   Country_Code     7441 non-null   object 
 4   Country_Name     7457 non-null   object 
 5   Region_Code      7457 non-null   object 
 6   Region_Name      7457 non-null   object 
 7   Area             7457 non-null   object 
 8   L1_Users         7248 non-null   float64
 9   Digits           7248 non-null   float64
 10  All_Users        7047 non-null   float64
 11  Countries        7457 non-null   int64  
 12  Family           7457 non-null   object 
 13  Classification   7425 non-null   object 
 14  Latitude         7426 non-null   float64
 15  Longitude        7426 non-null   float64
 16  EGIDS            7457 non-null   object 
 17  Is_Written    

## **Constructing a Hierarchal Language Tree**

In [24]:
# lets only deal with non-null classifications (complete cases)
tol_df_cc = tol_df.dropna(axis = 0, how = 'any')
print(tol_df_cc.shape)

(6967, 24)


Let's also set commas in `Language_Name` to hyphens in order to not mess up our `.split` distinction.

In [25]:
tol_df_cc['Language_Name'] = tol_df_cc['Language_Name'].str.replace(', ', '-')
tol_df_cc.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tol_df_cc['Language_Name'] = tol_df_cc['Language_Name'].str.replace(', ', '-')


Unnamed: 0,ISO_639,Language_Name,Uninverted_Name,Country_Code,Country_Name,Region_Code,Region_Name,Area,L1_Users,Digits,...,Latitude,Longitude,EGIDS,Is_Written,Institutional,Developing,Vigorous,In_Trouble,Dying,Extinct
0,aaa,Ghotuo,Ghotuo,NG,Nigeria,WAF,Western Africa,Africa,9000.0,4.0,...,7.1154,5.9528,6a,F,0,0,1,0,0,0
1,aab,Alumu-Tesu,Alumu-Tesu,NG,Nigeria,WAF,Western Africa,Africa,7000.0,4.0,...,9.0164,8.612,6a,F,0,0,1,0,0,0
2,aac,Ari,Ari,PG,Papua New Guinea,MEL,Melanesia,Pacific,50.0,2.0,...,-7.9172,142.3877,6b,T,0,0,0,1,0,0
3,aad,Amal,Amal,PG,Papua New Guinea,MEL,Melanesia,Pacific,830.0,3.0,...,-4.0487,141.9967,6a,F,0,0,1,0,0,0
4,aae,Albanian-Arbëreshë,Arbëreshë Albanian,IT,Italy,SEU,Southern Europe,Europe,100000.0,6.0,...,38.8985,16.7019,6b,T,0,0,0,1,0,0


For each group of classification, we take the average longitude/latitude as the rough location (this may or may not be an accurate depiction of the true language origin).

In [26]:
tol_df_cc['Classification'].unique()

array(['Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Edoid, North-Central, Ghotuo-Uneme-Yekhee',
       'Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Plateau, Alumic',
       'Trans-New Guinea, Gogodala-Suki, Gogodala', ...,
       'Mixe-Zoquean, Zoquean, Chiapas Zoquean, Northeast Zoque',
       'Zaparoan, Záparo',
       'Niger-Congo, Atlantic-Congo, Volta-Congo, Benue-Congo, Bantoid, Southern, Narrow Bantu, Central, S, Nguni (S.42)'],
      dtype=object)

In [27]:
# build lon/lat hierarchal dictionary
hier_dict = {}

# build tree to leaf languages
node_index = 0
for row in range(tol_df_cc.shape[0]):
    curr_lang = tol_df_cc.iloc[row]
    classification = curr_lang['Classification']

    # for each row, look at each hierarchal structure
    hierarchy = classification.split(", ") + [curr_lang['Language_Name']]
    for i in range(1, len(hierarchy)+1): # ordered levels split by commas
        subgroup = ", ".join(hierarchy[:i])

        # if we haven't seen a child before, then define a new empty node
        if subgroup not in hier_dict: 
            # define child node
            hier_dict[subgroup] = {
                'subgroups': {}, 
                'name' : "",
                'country': [],
                'latitude': [],
                'longitude': [],
                'id': 0,
                'code': [],
                'score': [],
                'length': i
            }        
            hier_dict[subgroup]['id'] = node_index
            hier_dict[subgroup]['name'] = hierarchy[i-1]
            node_index += 1 # index for every subgroup
        
        # get child node lat/lon for aggregation later
        hier_dict[subgroup]['latitude'].append(curr_lang['Latitude'])
        hier_dict[subgroup]['longitude'].append(curr_lang['Longitude'])
        hier_dict[subgroup]['code'].append(curr_lang['ISO_639']) # language codes
        hier_dict[subgroup]['score'].append(curr_lang['EGIDS']) # egids scores
        hier_dict[subgroup]['country'].append(curr_lang['Country_Name']) # country names
        

print("Total Nodes: ", node_index)


Total Nodes:  10177


In [28]:
# aggregating node values
for subgroup in hier_dict:
    countries = hier_dict[subgroup]['country']
    scores = hier_dict[subgroup]['score']
    hier_dict[subgroup]['country'] = max(set(countries), key=countries.count)
    hier_dict[subgroup]['score'] = max(set(scores), key=scores.count)
    hier_dict[subgroup]['mean_latitude'] = np.mean(hier_dict[subgroup]['latitude'])
    hier_dict[subgroup]['mean_longitude'] = np.mean(hier_dict[subgroup]['longitude'])

#### **Building Links**

In [29]:
# build links
links = []
for subgroup in hier_dict:
    prev_hier_id = -1
    hierarchy = subgroup.split(', ')
    for i in range(1, len(hierarchy)+1): # ordered levels split by commas
        subgroup = ", ".join(hierarchy[:i])
        try: 
            hier_id = hier_dict[subgroup]['id'] 
        except:
            print(subgroup, "\n", hierarchy)
        links.append({
            'source': prev_hier_id,
            'target': hier_id,
            'length': i, # path length
        })
        prev_hier_id = hier_id
        

In [30]:
links = sorted(links, key=lambda x: (x['source'], x['target']))
links[:10]

[{'source': -1, 'target': 0, 'length': 1},
 {'source': -1, 'target': 0, 'length': 1},
 {'source': -1, 'target': 0, 'length': 1},
 {'source': -1, 'target': 0, 'length': 1},
 {'source': -1, 'target': 0, 'length': 1},
 {'source': -1, 'target': 0, 'length': 1},
 {'source': -1, 'target': 0, 'length': 1},
 {'source': -1, 'target': 0, 'length': 1},
 {'source': -1, 'target': 0, 'length': 1},
 {'source': -1, 'target': 0, 'length': 1}]

In [31]:
unique_links = []
for i in range(1, len(links)):
    if links[i] != links[i-1]:
        unique_links.append(links[i])
unique_links[:10]

[{'source': -1, 'target': 11, 'length': 1},
 {'source': -1, 'target': 15, 'length': 1},
 {'source': -1, 'target': 18, 'length': 1},
 {'source': -1, 'target': 22, 'length': 1},
 {'source': -1, 'target': 29, 'length': 1},
 {'source': -1, 'target': 36, 'length': 1},
 {'source': -1, 'target': 51, 'length': 1},
 {'source': -1, 'target': 59, 'length': 1},
 {'source': -1, 'target': 69, 'length': 1},
 {'source': -1, 'target': 116, 'length': 1}]

In [32]:
print(len(unique_links)) # this will probably be < total nodes since targets may be final nodes (~ 7k languages)

10176


In [33]:
a = 0
for i in unique_links:
    if i['length'] > a:
        a = i['length']
print(a)

15


#### **Building Nodes**

In [34]:
nodes = []
for subgroup in hier_dict:

    subgroup_node_data = {
        'id': hier_dict[subgroup]['id'],
        'name': hier_dict[subgroup]['name'],
        'country': hier_dict[subgroup]['country'], # get most freq country from subgroups
        'score': hier_dict[subgroup]['score'], # get most freq scores from subgroups
        'latitude': round(hier_dict[subgroup]['mean_latitude'], 2),
        'longitude': round(hier_dict[subgroup]['mean_longitude'], 2),
        'length': hier_dict[subgroup]['length']
    }
    nodes.append(subgroup_node_data)

In [35]:
nodes[:10]

[{'id': 0,
  'name': 'Niger-Congo',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 3.75,
  'longitude': 12.1,
  'length': 1},
 {'id': 1,
  'name': 'Atlantic-Congo',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 3.31,
  'longitude': 12.83,
  'length': 2},
 {'id': 2,
  'name': 'Volta-Congo',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 2.91,
  'longitude': 14.02,
  'length': 3},
 {'id': 3,
  'name': 'Benue-Congo',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 0.74,
  'longitude': 17.69,
  'length': 4},
 {'id': 4,
  'name': 'Edoid',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 6.6,
  'longitude': 6.06,
  'length': 5},
 {'id': 5,
  'name': 'North-Central',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 7.0,
  'longitude': 6.13,
  'length': 6},
 {'id': 6,
  'name': 'Ghotuo-Uneme-Yekhee',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 7.13,
  'longitude': 6.17,
  'length': 7},
 {'id': 7,
  'name': 'Ghotuo',
  'country': 'Nigeria',

In [36]:
min(nodes, key=lambda x : x['score'])

{'id': 2683,
 'name': 'Chinese-Mandarin',
 'country': 'China',
 'score': '0',
 'latitude': 39.91,
 'longitude': 116.4,
 'length': 3}

In [37]:
max(nodes, key=lambda x : x['score'])

{'id': 126,
 'name': 'Andamanese',
 'country': 'India',
 'score': 'x10',
 'latitude': 12.33,
 'longitude': 92.79,
 'length': 1}

#### **Making the JSON**

In [38]:
fin_json = {'nodes': nodes, 'links': unique_links}

In [39]:
import copy
fin_json_rename = copy.deepcopy(fin_json)

In [41]:
# renaming EGIDs to human friendly
EGIDS_dict = {
    '0': 'International use (Safe)',
    '1': 'National use (Safe)',
    '2': 'Regionally government/media (Safe_',
    '3': 'Regional trading (Safe)',
    '4': 'Taught throughout education (Safe)',
    '5': 'Written (Safe)',
    '6a': 'Spoken by all generations (Safe)',
    '6b': 'Spoken by all generations (Vulnerable)',
    '7': 'Spoken by parents (Endangered)',
    '8a': 'Spoken by grandparents (Severely Endangered)',
    '8b': 'Spoken by grandparents (Critically Endangered)',
    '9': 'Heritage itentity (Extinct)',
    '10': 'No ethnic identity remains (Extinct)',
    'x10': 'No ethnic identity remains (Extinct)'
}

for dict in fin_json_rename['nodes']:
    dict['score'] = EGIDS_dict[dict['score']]

In [42]:
with open("../data/ethno_links.json", "w") as outfile:
    json.dump(fin_json_rename, outfile)

### **Recursive Mess**

In [64]:
# build lon/lat hierarchal dictionary
hier_dict = {'Top-Level':{'subgroups': {},
                          'name': [],
                          'country': [],
                          'latitude': [],
                          'longitude': [],
                          'id': -1},
                          'code': [],
                          'score': [],
                          'mean_latitude': 0,
                          'mean_longitude': 0}

# build tree to leaf languages
node_index = 0
for row in range(tol_df_cc.shape[0]):
    curr_lang = tol_df_cc.iloc[row]
    classification = curr_lang['Classification']
    prev_hier_dict = hier_dict['Top-Level']['subgroups']

    # for each row, look at each hierarchal structure
    for subgroup in classification.split(", ") + [curr_lang['Language_Name']]: # ordered levels split by commas

        # if we haven't seen a child before, then define a new empty node
        if subgroup not in prev_hier_dict: 
            # define child node
            prev_hier_dict[subgroup] = {
                'subgroups': {}, 
                'name' : "",
                'country': [],
                'latitude': [],
                'longitude': [],
                'id': 0,
                'code': [],
                'score': [],
            }        
            prev_hier_dict[subgroup]['id'] = node_index
            prev_hier_dict[subgroup]['name'] = subgroup
            node_index += 1 # index for every subgroup
        
        # get child node lat/lon for aggregation later
        prev_hier_dict[subgroup]['latitude'].append(curr_lang['Latitude'])
        prev_hier_dict[subgroup]['longitude'].append(curr_lang['Longitude'])
        prev_hier_dict[subgroup]['code'].append(curr_lang['ISO_639']) # language codes
        prev_hier_dict[subgroup]['score'].append(curr_lang['EGIDS']) # egids scores
        prev_hier_dict[subgroup]['country'].append(curr_lang['Country_Name']) # country names

        # iterative recursion
        prev_hier_dict = prev_hier_dict[subgroup]['subgroups']

print("Total Nodes: ", node_index)


Total Nodes:  10177


In [73]:
def get_all_keys(d):
    for key, value in d.items():
        yield key
        if isinstance(value, dict):
            yield from get_all_keys(value)

i = 0
for x in get_all_keys(hier_dict):
    i += 1
print(i)

KeyError: 'subgroups'

In [41]:
sorted(list(hier_dict['Top-Level']['latitude']))

[]

In [None]:
prev_layer = hier_dict['Top-Level']['subgroups']
for subgroup in prev_layer:
    prev_layer[subgroup]['mean_latitude'] = np.mean(prev_layer[subgroup]['latitude'])
    prev_layer[subgroup]['mean_longitude'] = np.mean(prev_layer[subgroup]['longitude'])
    

In [65]:
# getting average lat/lon recursively
def aggregate_locations(hier_dict):
    """
    in-place aggregation of locations for every group (currently only mean)
    """

    if not hier_dict['subgroups']:
        # basecase
        hier_dict['mean_latitude'] = hier_dict['latitude'][0]
        hier_dict['mean_longitude'] = hier_dict['longitude'][0]
        return

    for subgroup in dict['subgroups']:
        hier_dict['subgroups'][subgroup]['mean_latitude'] = np.mean(hier_dict['subgroups'][subgroup]['latitude'])
        hier_dict['subgroups'][subgroup]['mean_longitude'] = np.mean(hier_dict['subgroups'][subgroup]['longitude'])
        return aggregate_locations(hier_dict['subgroups'][subgroup])

aggregate_locations(hier_dict['Top-Level'])

TypeError: aggregate_locations() takes 0 positional arguments but 1 was given

In [66]:
hier_dict['Top-Level']['subgroups']['Niger-Congo']['country']

['Nigeria',
 'Nigeria',
 'Côte d’Ivoire',
 'Cameroon',
 'Côte d’Ivoire',
 'Nigeria',
 'Nigeria',
 'Nigeria',
 'Ghana',
 'Côte d’Ivoire',
 'Ghana',
 'Nigeria',
 'Sudan',
 'Ghana',
 'Cameroon',
 'Togo',
 'Côte d’Ivoire',
 'Ghana',
 'Nigeria',
 'Cameroon',
 'Nigeria',
 'Nigeria',
 'Nigeria',
 'Ghana',
 'Nigeria',
 'Nigeria',
 'Democratic Republic of the Congo',
 'Cameroon',
 'Cameroon',
 'Ghana',
 'Côte d’Ivoire',
 'Togo',
 'Côte d’Ivoire',
 'Nigeria',
 'Côte d’Ivoire',
 'Nigeria',
 'Nigeria',
 'Central African Republic',
 'Benin',
 'Ghana',
 'Nigeria',
 'Nigeria',
 'Ghana',
 'Togo',
 'Cameroon',
 'Congo',
 'Nigeria',
 'Côte d’Ivoire',
 'Nigeria',
 'Nigeria',
 'Nigeria',
 'Ghana',
 'Nigeria',
 'Cameroon',
 'Nigeria',
 'Côte d’Ivoire',
 'Mali',
 'Nigeria',
 'Tanzania',
 'Nigeria',
 'Cameroon',
 'Cameroon',
 'Nigeria',
 'Côte d’Ivoire',
 'Cameroon',
 'Benin',
 'Zambia',
 'Nigeria',
 'Côte d’Ivoire',
 'Ghana',
 'Nigeria',
 'Nigeria',
 'Central African Republic',
 'Benin',
 'Nigeria',
 'Togo'

In [67]:
hier_dict['Top-Level']['subgroups']['Indo-European']['subgroups']['Germanic']['subgroups']['West']['subgroups']['English']

{'subgroups': {'English': {'subgroups': {},
   'name': 'English',
   'country': ['United Kingdom'],
   'latitude': [52.2486],
   'longitude': [-0.2397],
   'id': 3409,
   'code': ['eng'],
   'score': ['0']},
  'Scots': {'subgroups': {},
   'name': 'Scots',
   'country': ['United Kingdom'],
   'latitude': [56.5723],
   'longitude': [-3.8661],
   'id': 8032,
   'code': ['sco'],
   'score': ['5']}},
 'name': 'English',
 'country': ['United Kingdom', 'United Kingdom'],
 'latitude': [52.2486, 56.5723],
 'longitude': [-0.2397, -3.8661],
 'id': 3408,
 'code': ['eng', 'sco'],
 'score': ['0', '5']}

In [68]:
tol_df_cc.iloc[0]

ISO_639                                                          aaa
Language_Name                                                 Ghotuo
Uninverted_Name                                               Ghotuo
Country_Code                                                      NG
Country_Name                                                 Nigeria
Region_Code                                                      WAF
Region_Name                                           Western Africa
Area                                                          Africa
L1_Users                                                      9000.0
Digits                                                           4.0
All_Users                                                     9000.0
Countries                                                          1
Family                                                   Niger-Congo
Classification     Niger-Congo, Atlantic-Congo, Volta-Congo, Benu...
Latitude                          

In [69]:
hier_dict['Top-Level']['subgroups']['Niger-Congo']['subgroups']['Atlantic-Congo']['subgroups']['Volta-Congo']['id']

2

#### **Structuring JSON**

In [29]:
def build_graph(hier_dict):
    hier_graph = {
        'nodes': [],
        'links': []
    }

    def build_graph_features(hier_dict):
        """
        build graph map
        """
        # basecase
        if not hier_dict['subgroups']:
            countries = hier_dict['country']
            scores = hier_dict['score']

            subgroup_node_data = {
                'id': hier_dict['id'],
                'name': hier_dict['name'],
                'country': max(set(countries), key=countries.count), # get most freq country from subgroups
                'score': max(set(scores), key=scores.count), # get most freq scores from subgroups
                'latitude': hier_dict['mean_latitude'],
                'longitude': hier_dict['mean_longitude'] 
            }
            hier_graph['nodes'].append(subgroup_node_data)
            subgroup_link_data = {
                'source' : hier_dict['id'],
                'target' : [float("-inf")]
            }
            hier_graph['links'].append(subgroup_link_data)
            return

        for subgroup in hier_dict['subgroups']:
            countries = hier_dict['subgroups'][subgroup]['country']
            scores = hier_dict['subgroups'][subgroup]['score']

            subgroup_node_data = {
                'id': hier_dict['id'],
                'name': hier_dict['name'],
                'country': max(set(countries), key=countries.count), # get most freq country from subgroups
                'score': max(set(scores), key=scores.count), # get most freq scores from subgroups
                'latitude': hier_dict['mean_latitude'],
                'longitude': hier_dict['mean_longitude'] 
            }
            hier_graph['nodes'].append(subgroup_node_data)
            subgroup_link_data = {
                'source' : hier_dict['id'],
                'target' : [hier_dict['subgroups'][sub]['id'] for sub in hier_dict['subgroups']]
            }
            hier_graph['links'].append(subgroup_link_data)
            return build_graph_features(hier_dict['subgroups'][subgroup])
            
    build_graph_features(hier_dict['Top-Level'])

    return hier_graph

fin_json = build_graph(hier_dict)
fin_json

KeyError: 'mean_latitude'

In [None]:
fin_json['nodes'][522]['id']

6033

In [None]:
fin_json['links'][522]

{'source': 6033, 'target': [-inf]}

In [None]:
max([s['id'] for s in fin_json['nodes']])

10176

In [None]:
max([s['source'] for s in fin_json['links']])

10176

In [None]:
max([max(s['target']) for s in fin_json['links']])

10176

In [None]:
# make json unique to save space
nodes = sorted(fin_json['nodes'], key = lambda x : x['id'])
unique_nodes = []
for i in range(1, len(nodes)):
    if nodes[i] == nodes[i-1]:
        pass
    elif nodes[i]['id'] != -1:
        unique_nodes.append(nodes[i])
print(len(nodes))
print(len(unique_nodes))

17140
14855


In [None]:
unique_nodes

[{'id': 0,
  'name': 'Niger-Congo',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 3.7501451459606243,
  'longitude': 12.096057298031228},
 {'id': 0,
  'name': 'Niger-Congo',
  'country': 'Sudan',
  'score': '6b',
  'latitude': 3.7501451459606243,
  'longitude': 12.096057298031228},
 {'id': 0,
  'name': 'Niger-Congo',
  'country': 'Côte d’Ivoire',
  'score': '6a',
  'latitude': 3.7501451459606243,
  'longitude': 12.096057298031228},
 {'id': 0,
  'name': 'Niger-Congo',
  'country': 'Côte d’Ivoire',
  'score': '6b',
  'latitude': 3.7501451459606243,
  'longitude': 12.096057298031228},
 {'id': 1,
  'name': 'Atlantic-Congo',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 3.309481752353367,
  'longitude': 12.831211947863869},
 {'id': 1,
  'name': 'Atlantic-Congo',
  'country': 'Senegal',
  'score': '5',
  'latitude': 3.309481752353367,
  'longitude': 12.831211947863869},
 {'id': 2,
  'name': 'Volta-Congo',
  'country': 'Nigeria',
  'score': '6a',
  'latitude': 2.905315713196

In [None]:
unique_nodes

In [None]:
with open("../data/ethno_links.json", "w") as outfile:
    json.dump(fin_json, outfile)