# Compute Knowledge Graphs of the World

This notebook computes the hierarchy of countries and their neighbourhood relationships at various levels: countries, regions, and districts. It creates a graph representation that can be traversed by containment (an entity belonging to another entity) or by adjacency (an entity neighbouring another entity). Use cases are
* geolocation and neighbourhood relations (e.g. infection status)
* computation of the notion of a border (travel restrictions)
* air travel
* international trade
* mobility themes (e.g. weekend travel)
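
As a minimal sketch of the neighbourhood traversal behind these use cases, the following plain-Python example counts how many border crossings separate one country from the others. The adjacency data is a small hand-written sample using country codes that also appear later in this notebook, and the function name `hops_from` is hypothetical; it is not part of the notebook's helper library.

```python
from collections import deque

# Hand-written sample of land borders (ADM0_A3 codes), not the full dataset
BORDERS = {
    "IDN": ["MYS", "TLS", "PNG"],
    "MYS": ["IDN", "THA"],
    "THA": ["MYS"],
    "TLS": ["IDN"],
    "PNG": ["IDN"],
}

def hops_from(start, borders):
    """Breadth-first search: minimum number of border crossings to each reachable country."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in borders.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return distance

print(hops_from("THA", BORDERS))
```

The same idea carries over to the graph built below, where the border relation is stored as edges of kind `geoTouches`.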

This notebook requires data to be downloaded from
* https://gadm.org/download_world.html (a roughly 2 GB compressed shapefile, available [here](https://biogeo.ucdavis.edu/data/gadm3.6/gadm36_shp.zip)). Note that this file contains vector data with defects; see also https://github.com/AmericanRedCross/simplegadm
* http://www.naturalearthdata.com/downloads/10m-cultural-vectors/ (a 4 MB compressed shapefile for sovereignties, available [here](https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_sovereignty.zip))

This notebook requires an environment which has
* geopandas >= 0.7.0
* pandas
* pyarrow or fastparquet
* networkx

installed.
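
A quick way to check the environment before running the notebook is sketched below. It only tests whether the modules can be found and does not verify the `geopandas >= 0.7.0` version constraint; the function name `missing_packages` is made up for this illustration.

```python
from importlib.util import find_spec

# Module names as imported in Python; the two parquet engines are alternatives
REQUIRED = ["geopandas", "pandas", "networkx"]
PARQUET_ENGINES = ["pyarrow", "fastparquet"]

def missing_packages(required=REQUIRED, alternatives=PARQUET_ENGINES):
    """Return the required modules that cannot be found in this environment."""
    missing = [name for name in required if find_spec(name) is None]
    # At least one of the alternative parquet engines has to be present
    if all(find_spec(name) is None for name in alternatives):
        missing.append(" or ".join(alternatives))
    return missing

print(missing_packages() or "all required packages found")
```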

There is a lot to be said about what a country, a nation, or a sovereignty is. This notebook was created to support modelling of the 2020 COVID-19 outbreak, and its primary purpose is to compute neighbourhood relationships for countries that publish health data at a level of detail below the nation. Simplifications will apply.

Please ensure that your intended use complies with
* [GADM license](https://gadm.org/license.html)
* [Natural Earth Terms of Use](http://www.naturalearthdata.com/about/terms-of-use/)

In [39]:
!pip install geopandas



In [40]:
import geopandas as gpd
import pandas as pd
import fiona
import networkx as nx
from pathlib import Path
import os
from ipywidgets import Dropdown, HBox, HTML
from IPython.display import display
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import HoverTool, Text, Circle, MultiLine
try: # bokeh > 1.0.4
    from bokeh.plotting import from_networkx, NodesAndLinkedEdges, EdgesAndLinkedNodes
except ImportError: # fall back to the location used by older bokeh releases
    from bokeh.models.graphs import from_networkx, NodesAndLinkedEdges, EdgesAndLinkedNodes
#from bokeh.models import Circle, MultiLine, HoverTool, Text
#output_notebook()
try:
    import pyvis
except ImportError:
    !pip install jsonpickle pyvis
    import pyvis
from ipywidgets import IntProgress
from IPython.display import display
import requests
import zipfile
import io
import urllib
try:
    from project_lib import Project
    CLOUDPAK = True
except ImportError:
    CLOUDPAK = False

import threading
import numpy as np
from threading import Thread

try:
    import descartes
except ImportError:
    !pip install descartes
    import descartes

output_notebook()

## Helper Library

This code is used across the notebook.

In [41]:
class THEWORLD():
    def __init__(self,ROOTFOLDER):
        self.ROOT = os.path.join(ROOTFOLDER,"sun/geo/naturalearthdata.com_downloads/")
        self.dfSovereignities = gpd.read_file(os.path.join(self.ROOT,"ne_10m_admin_0_sovereignty.shp"))
        self.dfWorld = gpd.read_file(os.path.join(self.ROOT,"ne_10m_admin_0_countries.shp"))
        self.dfMapdata = gpd.read_file(os.path.join(self.ROOT,"ne_10m_admin_0_map_units.shp"))
                                       
    def search_info(self,CODE,NAME):
        if len(self.dfWorld[self.dfWorld.SOV_A3 == CODE]) == 1:
            result = self.dfWorld[self.dfWorld.SOV_A3 == CODE]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":result["ADMIN"].values[0],"name":NAME}
        elif len(self.dfWorld[self.dfWorld.ADM0_A3 == CODE]) == 1:
            result = self.dfWorld[self.dfWorld.ADM0_A3 == CODE]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":result["ADMIN"].values[0],"name":NAME}
        elif len(self.dfSovereignities[self.dfSovereignities.ADM0_A3 == CODE]) == 1:
            result = self.dfSovereignities[self.dfSovereignities.ADM0_A3 == CODE]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":result["ADMIN"].values[0],"name":NAME}
        elif len(self.dfMapdata[self.dfMapdata.ISO_A3 == CODE]) == 1:
            result = self.dfMapdata[self.dfMapdata.ISO_A3 == CODE]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":result["ADMIN"].values[0],"name":NAME}
        elif len(self.dfMapdata[self.dfMapdata.SOV_A3 == CODE]) == 1:
            result = self.dfMapdata[self.dfMapdata.SOV_A3 == CODE]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":result["ADMIN"].values[0],"name":NAME}
            
        elif CODE == "XAD": # Akrotiri and Dhekelia. two military bases on Cyprus
            result = self.dfSovereignities[self.dfSovereignities.ISO_A3 == "CYP"]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":NAME,"name":NAME}
            
        elif CODE == "XNC": # Northern Cyprus, w/o getting political, use Cyprus
            result = self.dfSovereignities[self.dfSovereignities.ISO_A3 == "CYP"]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":NAME,"name":NAME}
            
        elif CODE == "XCA": # Caspian Sea, lets use Kazakhstan
            result = self.dfWorld[self.dfWorld.ISO_A3 == "KAZ"]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":NAME,"name":NAME}
            
        elif CODE == "XKO": # Kosovo, use Serbia
            result = self.dfWorld[self.dfWorld.ISO_A3 == "SRB"]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":NAME,"name":NAME}
            
        elif CODE == "PSE": # Palestine, use Israel
            result = self.dfWorld[self.dfWorld.ISO_A3 == "ISR"]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":NAME,"name":NAME}
            
        elif CODE == "XCL": # Clipperton Island, actually not inhabited. Lets use American Samoa
            result = self.dfWorld[self.dfWorld.ISO_A3 == "ASM"]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":NAME,"name":NAME}
            
        elif CODE == "XPI" or CODE == "XSP": # Paracel Islands and Spratly Islands, South China Sea; could be China, Viet Nam, Taiwan, here the Philippines are used
            result = self.dfWorld[self.dfWorld.ISO_A3 == "PHL"]
            retval = {"CODE":CODE,"REGION_UN":result["REGION_UN"].values[0],"SUBREGION":result["SUBREGION"].values[0],"NAME":NAME,"name":NAME}
            
        else:
            retval = {"CODE":CODE,"error":True,"name":NAME}
        return retval

    
def find_node_by_attribute_value(graph,attribute,value,kind=""):
    for n,d in graph.nodes(data=True):
        if attribute not in d.keys():
            continue
        elif kind != "":
            if d[attribute] == value and d["kind"] == kind:
                retval = n
                break
        else:
            if d[attribute] == value:
                retval = n
                break
    else:
        retval = None
        
    return retval

def find_edges_of_kind(graph,kind):
    return [(u,v) for u,v,d in graph.edges(data=True) if d['kind']==kind]

def find_edges_of_key_value(graph,key,value):
    retval = []
    for edge in graph.edges:
        edge_dict = graph.edges[edge]
        #print(edge,edge_dict)
        if edge_dict.get(key) is None:
            pass
        elif edge_dict.get(key) == value:
            retval.append(edge)
            #print(edge)
    return retval

def find_nodes_of_kind(graph,kind):
    retval = []
    for u,d in graph.nodes(data=True):
        if 'kind' in d.keys():
            if d['kind'] == kind:
                retval.append(u)
    return retval

def find_nodes_of_key_value(graph,key,value):
    retval = []
    for node in graph.nodes:
        node_dict = graph.nodes[node]
        #print(node,node_dict)
        if node_dict.get(key) is None:
            pass
        elif node_dict.get(key) == value:
            retval.append(node)
            #print(node)
    return retval

def create_subgraph_of_nodes(graph,nodes,edges):
    #nodes = find_nodes_of_kind(graph,kind_nodes)
    #edges = find_edges_of_kind(graph,kind_edges)
    return nx.restricted_view(graph,set(graph.nodes).difference(list(nodes)),set(graph.edges).difference(list(edges))).copy()
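
`create_subgraph_of_nodes` relies on `nx.restricted_view`, which hides the nodes and edges passed to it; the helper therefore passes the complement of what it wants to keep. The sketch below shows the same pattern on a toy graph. The node numbers and names are made up for illustration.

```python
import networkx as nx

# Toy graph: one subregion (0) containing two countries (1, 2) that touch each other
G = nx.DiGraph()
G.add_node(0, kind="subregion", name="Western Europe")
G.add_node(1, kind="country", name="Belgium")
G.add_node(2, kind="country", name="Netherlands")
G.add_edge(0, 1, kind="containsPlace")
G.add_edge(0, 2, kind="containsPlace")
G.add_edge(1, 2, kind="geoTouches")
G.add_edge(2, 1, kind="geoTouches")

keep_nodes = [n for n, d in G.nodes(data=True) if d.get("kind") == "country"]
keep_edges = [(u, v) for u, v, d in G.edges(data=True) if d.get("kind") == "geoTouches"]

# Hide everything that is NOT in the keep lists, then copy the view into a real graph
hidden_nodes = set(G.nodes) - set(keep_nodes)
hidden_edges = set(G.edges) - set(keep_edges)
neighbours = nx.restricted_view(G, hidden_nodes, hidden_edges).copy()

print(sorted(neighbours.nodes))
print(sorted(neighbours.edges))
```

Copying the view gives an independent graph, so later changes to the full graph do not leak into the extracted subgraph.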

## Download Data

In [42]:
if CLOUDPAK:
    ROOTFOLDER = "/project_data/data_asset/"
else:
    ROOTFOLDER = "./"

In [43]:
def download_and_unzip(url,root,subfolder,filename=""):
    ROOT = os.path.join(root,subfolder)
    Path(ROOT).mkdir(exist_ok=True,parents=True)
    if filename == "":
        filename = os.path.split(urllib.parse.urlparse(url).path)[1]
    else:
        pass
    
    downloaded_file = os.path.join(ROOT,filename)
    if not os.path.exists(downloaded_file):
        r = requests.get(url)
        if r.ok:
            content = r.content
            with open(downloaded_file,"w+b") as outfile:
                outfile.write(content)
        else:
            return False
    else:
        with open(downloaded_file,"rb") as infile:
            content = infile.read()
    
    with zipfile.ZipFile(downloaded_file) as zf:
        zf.extractall(ROOT)
        
    return True

required_data = ["https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_countries.zip",
                "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_map_subunits.zip",
                "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_sovereignty.zip",
                "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces.zip",
                "https://opendata.arcgis.com/datasets/fef73aeaf13c417dadf2fc99abcf8eef_0.zip", # UK
                "https://opendata.arcgis.com/datasets/248e105774144a27aca2dfbfe080fc9d_0.zip", # Germany Kreisgrenzen 2019
                "https://opendata.arcgis.com/datasets/9f5d82911d4545c4be1da8cab89f21ae_0.zip", # Berlin
                "http://www.istat.it/storage/cartografia/confini_amministrativi/generalizzati/Limiti01012020_g.zip", # Italy, multiple levels
                #"https://www.data.gouv.fr/fr/datasets/r/eb36371a-761d-44a8-93ec-3d728bec17ce", # France, NB, need a file name
                "http://osm13.openstreetmap.fr/~cquest/openfla/export/departements-20180101-shp.zip", # France
                ]

for rd in required_data:
    filename = ""
    if "naturalearth" in rd:
        sub_root = "sun/geo/naturalearthdata.com_downloads"
    elif "arcgis" in rd:
        sub_root = "sun/geo/arcgis.com_datasets"
    elif "istat.it" in rd:
        sub_root = "sun/geo/istat.it"
    elif "gouv.fr" in rd:
        sub_root = "sun/geo/data.gouv.fr"
    elif "osm13.openstreetmap.fr" in rd:
        sub_root = "sun/geo/osm13.openstreetmap.fr"
        #filename = "departements-20180101.shp"
    if download_and_unzip(rd,ROOTFOLDER,sub_root,filename=filename):
        print(end=".")
    else:
        print("problem downloading {}".format(rd),end=" ")

.........

# Compute UN Regions, Subregions, and put Countries onto the World Map

This includes entries like Clipperton Island, whose existence was unknown to the author until recently. It is uninhabited, so it should remain unaffected by COVID-19, but it is good to have a complete dataset.

# Build Graph

This builds up the data structures as a geo graph. The hierarchy is specified by the node `kind` attribute, which takes one of the values
* continent
* subregion
* country
* state

The edges are relations whose `kind` is taken from https://schema.org/Country
* containsPlace
* geoTouches

The class hierarchy is thus

`Europe --containsPlace--> Western Europe --containsPlace--> France`

and

`Netherlands --geoTouches--> Belgium --geoTouches--> Germany`
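
The schema can be tried out on a hand-built miniature before loading any shapefiles. The following sketch assumes the same attribute names (`kind`, `name`, `label`) as the cells below; the node numbers and the helper names `containers_of` and `touching` are hypothetical.

```python
import networkx as nx

W = nx.DiGraph()
places = [
    (0, "root", "Earth"),
    (1, "continent", "Europe"),
    (2, "subregion", "Western Europe"),
    (3, "country", "France"),
    (4, "country", "Netherlands"),
    (5, "country", "Belgium"),
    (6, "country", "Germany"),
]
for node_id, kind, name in places:
    W.add_node(node_id, kind=kind, name=name, label=name)

# Containment edges point from the container to the contained place
for parent, child in [(0, 1), (1, 2), (2, 3), (2, 4), (2, 5), (2, 6)]:
    W.add_edge(parent, child, kind="containsPlace")

# Borders are stored in both directions
for a, b in [(4, 5), (5, 6), (4, 6), (3, 5), (3, 6)]:
    W.add_edge(a, b, kind="geoTouches")
    W.add_edge(b, a, kind="geoTouches")

def containers_of(graph, node):
    """Follow containsPlace edges upwards and return the names of all containing places."""
    names = []
    current = node
    while True:
        parents = [u for u, _, d in graph.in_edges(current, data=True) if d["kind"] == "containsPlace"]
        if not parents:
            return names
        current = parents[0]
        names.append(graph.nodes[current]["name"])

def touching(graph, node):
    """Names of the places that share a border with the given node."""
    return sorted(graph.nodes[v]["name"] for _, v, d in graph.out_edges(node, data=True) if d["kind"] == "geoTouches")

print(containers_of(W, 3))
print(touching(W, 5))
```

Keeping both relations in one directed graph and telling them apart by the edge attribute `kind` means a single structure answers both "where does this place belong" and "what does it border".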

In [13]:
gfWorld = gpd.read_file(os.path.join(ROOTFOLDER,"sun/geo/naturalearthdata.com_downloads/ne_10m_admin_0_countries.shp"),encoding="utf-8")

World = nx.DiGraph(name="",label="",kind="",id="", sov_a3="", iso_a3="")
labels = {}

World.add_node(0,kind="root",name="Earth",label="Earth")

index = max(list(World.nodes))+1

for c in gfWorld.REGION_UN.unique():
    World.add_node(index,kind="continent",name=c,label=c)
    World.add_edge(0,index,kind="containsPlace")
    labels[index] = c
    index += 1
    
index = max(list(World.nodes))+1

for i,row in gfWorld[["REGION_UN","SUBREGION"]].drop_duplicates().iterrows():
    region_un = find_node_by_attribute_value(World,"name",row["REGION_UN"],kind="continent")
    World.add_node(index,kind="subregion",name=row["SUBREGION"],label=row["SUBREGION"])
    World.add_edge(region_un,index,kind="containsPlace")
    labels[index] = row["SUBREGION"]
    index += 1

if not CLOUDPAK:
    nt = pyvis.network.Network("600px","600px",notebook=True)
    nt.from_nx(World)
    nt.show("figure1.html")
else:
    p = figure(title="World", x_range=(-1.1,1.1), y_range=(-1.1,1.1))
    graph = from_networkx(World, nx.spring_layout, scale=0.75, center=(0,0))
    
    graph.node_renderer.glyph=Circle(size=15)
    
    node_hover_tool = HoverTool(tooltips=[("index", "@index"), ("name", "@name")])
    p.add_tools(node_hover_tool)
    p.renderers.append(graph)
    
    show(p)

## Countries

In [14]:
for i,row in gfWorld[["NAME","ADM0_A3","SOV_A3","SUBREGION","ISO_A2","ISO_A3"]].drop_duplicates().iterrows():
    subregion = find_node_by_attribute_value(World,"name",row["SUBREGION"],kind="subregion")
    World.add_node(index,kind="country",name=row["NAME"],label=row["NAME"],sov_a3=row["SOV_A3"],id=row["ADM0_A3"],iso_a3=row["ISO_A3"],iso_a2=row["ISO_A2"])
    World.add_edge(subregion,index,kind="containsPlace")
    labels[index] = row["NAME"]
    index += 1

if not CLOUDPAK:
    nt = pyvis.network.Network("1000px","1000px",notebook=True)
    nt.from_nx(World)
    nt.show("figure2.html")
else:
    p = figure(title="World", x_range=(-1.1,1.1), y_range=(-1.1,1.1))
    graph = from_networkx(World, nx.spring_layout, scale=0.75, center=(0,0))
    
    graph.node_renderer.glyph=Circle(size=15)
    
    node_hover_tool = HoverTool(tooltips=[("index", "@index"), ("name", "@name")])
    p.add_tools(node_hover_tool)
    p.renderers.append(graph)
    
    show(p)

In [15]:
gfStates = gpd.read_file(os.path.join(ROOTFOLDER,"sun/geo/naturalearthdata.com_downloads/ne_10m_admin_1_states_provinces.shp"),encoding="utf-8")

for i,row in gfStates[["iso_a2","name","iso_3166_2","adm1_code","gu_a3","fips"]].drop_duplicates().iterrows():
    name_0 = find_node_by_attribute_value(World,"id",row["gu_a3"],kind="country")
    World.add_node(index,kind="state",name=row["name"],label=row["name"],id=row["iso_3166_2"],adm1_code=row["adm1_code"],fips=row["fips"],iso_3=row["gu_a3"])
    World.add_edge(name_0,index,kind="containsPlace")
    labels[index] = row["name"]
    index += 1

In [17]:
wD = Dropdown(options=sorted(gfWorld.NAME.unique()))
display(wD)

Dropdown(options=('Afghanistan', 'Akrotiri', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Ang…

In [18]:
"""nt = pyvis.network.Network("500px","500px",notebook=True)
nodes = find_node_by_attribute_value(World,"name",wD.value,kind="country")
nt.from_nx(nx.ego_graph(World,nodes))
nt.show("figure3.html")
"""
if not CLOUDPAK:
    nt = pyvis.network.Network("600px","600px",notebook=True)
    nt.from_nx(World)
    nt.show("figure3.html")
else:
    p = figure(title="World", x_range=(-1.1,1.1), y_range=(-1.1,1.1))
    nodes = find_node_by_attribute_value(World,"name",wD.value,kind="country")
    graph = from_networkx(nx.ego_graph(World,nodes), nx.spring_layout, scale=0.75, center=(0,0))
    
    graph.node_renderer.glyph=Circle(size=15)
    
    node_hover_tool = HoverTool(tooltips=[("index", "@index"), ("name", "@name")])
    p.add_tools(node_hover_tool)
    p.renderers.append(graph)
    
    show(p)

# Neighbours

## Country or Nation level

In [19]:
save = World.copy() # just to avoid to have to recreate during development

In [20]:
World = save.copy() # just to avoid to have to recreate during development

In [21]:
wIP = IntProgress(min=0,max=len(gfWorld.ADM0_A3.unique()))
wIPempty = IntProgress(min=0,max=len(gfWorld.ADM0_A3.unique()))
display(HBox([wIP,wIPempty]))

FIELD = "ADM0_A3"

alldata = []
neighbours_adm1_code = []
neighbours_name = []

#for adm in ["DEU-1591"]:#gf.adm1_code.unique():
for adm in gfWorld[FIELD].unique():
    me = gfWorld[gfWorld[FIELD] == adm]
    ggf = gpd.read_file(os.path.join(ROOTFOLDER,"sun/geo/naturalearthdata.com_downloads/ne_10m_admin_0_countries.shp"),bbox=list(me.bounds.values[0]),encoding="utf-8")
    ggf = ggf[ggf[FIELD] != adm]
    if len(ggf) <= 0:
        wIPempty.value += 1
        continue
        
    mask = ggf.apply(lambda row: row['geometry'].touches(me.geometry.values[0]), axis=1)
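    # Assign on every iteration (possibly empty), so that an entity without touching
    # neighbours does not silently reuse the list from the previous iteration
    neighbours = ggf[mask][FIELD]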
    
    alldata.append({"me":me[FIELD],"neighbours":neighbours})
    for n in neighbours:
        neighbours_adm1_code.append({"me":adm,"neighbour":n})
        
    for i,row in ggf[ggf[FIELD].isin(neighbours)].iterrows():
        neighbours_name.append({"me_"+FIELD:adm,"neighbour_"+FIELD:row[FIELD],"me_name":me["NAME"].values[0],
                                "me_iso_a2":me["ISO_A2"].values[0],"neighbour_iso_a2":row["ISO_A2"],
                                "me_iso_a3":me["ISO_A3"].values[0],"neighbour_iso_a3":row["ISO_A3"],
                                "neighbour_name":row["NAME"],FIELD:row[FIELD]})
    wIP.value += 1

HBox(children=(IntProgress(value=0, max=255), IntProgress(value=0, max=255)))

## Decouple Eurasia/Africa from the Americas

We need to cut French Guiana from mainland France, as we want to model infection spread across land boundaries: in this dataset French Guiana is part of France, which would otherwise make France a neighbour of Suriname and Brazil. Note that we could also tie Denmark to Sweden, but the Øresund Bridge is an entity that can easily be controlled and would have a different "flavour" than a normal land boundary.
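
Removing a border means removing it in both directions. The sketch below wraps this in a small function and applies it to a made-up sample; `drop_border` is a hypothetical helper, and the column names are assumed to match the neighbour table built in the previous cell. Unlike the `drop` calls in the next cell, it also resets the index.

```python
import pandas as pd

def drop_border(df, a, b, me_col="me_ADM0_A3", other_col="neighbour_ADM0_A3"):
    """Remove the rows describing the border between a and b, in both directions."""
    forward = (df[me_col] == a) & (df[other_col] == b)
    backward = (df[me_col] == b) & (df[other_col] == a)
    return df[~(forward | backward)].reset_index(drop=True)

# Made-up sample rows, not the computed neighbour table
sample = pd.DataFrame({
    "me_ADM0_A3":        ["FRA", "SUR", "FRA", "BRA", "FRA", "DEU"],
    "neighbour_ADM0_A3": ["SUR", "FRA", "BRA", "FRA", "DEU", "FRA"],
})

for other in ["SUR", "BRA"]:
    sample = drop_border(sample, "FRA", other)

print(sample)
```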

In [22]:
dfNeighbours = pd.DataFrame(neighbours_name)
dfNeighbours = dfNeighbours.drop(dfNeighbours[(dfNeighbours.me_ADM0_A3 == "FRA")&(dfNeighbours.neighbour_ADM0_A3 == "SUR")].index)
dfNeighbours = dfNeighbours.drop(dfNeighbours[(dfNeighbours.me_ADM0_A3 == "FRA")&(dfNeighbours.neighbour_ADM0_A3 == "BRA")].index)
dfNeighbours = dfNeighbours.drop(dfNeighbours[(dfNeighbours.me_ADM0_A3 == "SUR")&(dfNeighbours.neighbour_ADM0_A3 == "FRA")].index)
dfNeighbours = dfNeighbours.drop(dfNeighbours[(dfNeighbours.me_ADM0_A3 == "BRA")&(dfNeighbours.neighbour_ADM0_A3 == "FRA")].index)
dfNeighbours.head(5)

| | ADM0_A3 | me_ADM0_A3 | me_iso_a2 | me_iso_a3 | me_name | neighbour_ADM0_A3 | neighbour_iso_a2 | neighbour_iso_a3 | neighbour_name |
|---|---|---|---|---|---|---|---|---|---|
| 0 | MYS | IDN | ID | IDN | Indonesia | MYS | MY | MYS | Malaysia |
| 1 | TLS | IDN | ID | IDN | Indonesia | TLS | TL | TLS | Timor-Leste |
| 2 | PNG | IDN | ID | IDN | Indonesia | PNG | PG | PNG | Papua New Guinea |
| 3 | IDN | MYS | MY | MYS | Malaysia | IDN | ID | IDN | Indonesia |
| 4 | THA | MYS | MY | MYS | Malaysia | THA | TH | THA | Thailand |


In [23]:
wIP = IntProgress(min=0,max=len(dfNeighbours))
display(wIP)

for i,row in dfNeighbours.iterrows():
    #me = row["me_iso_a2"]
    me = row["me_ADM0_A3"]
    meNode = find_node_by_attribute_value(World,"id",me,kind="country")
    if meNode == None:
        print(end=".")
        continue
    #oth = row["neighbour_iso_a2"]
    oth = row["neighbour_ADM0_A3"]
    othNode = find_node_by_attribute_value(World,"id",oth,kind="country")
    #if me == "DEU":
    #    print(me,oth,meNode,othNode,end="..")
    if othNode == None:
        print("-")
        continue
    World.add_edge(meNode,othNode,kind="geoTouches")
    wIP.value += 1
    
Neighbours = create_subgraph_of_nodes(World,find_nodes_of_kind(World,"country"),find_edges_of_kind(World,"geoTouches"))

IntProgress(value=0, max=670)

In [24]:
wD2 = Dropdown(options=sorted(gfWorld.NAME.unique()))
display(wD2)

Dropdown(options=('Afghanistan', 'Akrotiri', 'Albania', 'Algeria', 'American Samoa', 'Andorra', 'Angola', 'Ang…

In [25]:
"""nt = pyvis.network.Network("400px","400px",notebook=True)
nt.from_nx(nx.ego_graph(Neighbours,find_node_by_attribute_value(Neighbours,"name",wD2.value,kind="country")))
#nt.from_nx(World)
nt.show("figure4.html")"""

if not CLOUDPAK:
    nt = pyvis.network.Network("600px","600px",notebook=True)
    nt.from_nx(World)
    nt.show("figure4.html")
else:
    p = figure(title="World", x_range=(-1.1,1.1), y_range=(-1.1,1.1))
    graph = from_networkx(nx.ego_graph(Neighbours,find_node_by_attribute_value(Neighbours,"name",wD2.value,kind="country")), nx.spring_layout, scale=0.75, center=(0,0))
    
    graph.node_renderer.glyph=Circle(size=15)
    
    node_hover_tool = HoverTool(tooltips=[("index", "@index"), ("name", "@name")])
    p.add_tools(node_hover_tool)
    p.renderers.append(graph)
    
    show(p)

In [26]:
"""nt = pyvis.network.Network("1500px","1500px",notebook=True)
nt.from_nx(Neighbours)
nt.show("figure5.html")"""

if not CLOUDPAK:
    nt = pyvis.network.Network("1500px","1500px",notebook=True)
    nt.from_nx(Neighbours)
    nt.show("figure5.html")
else:
    p = figure(title="World", x_range=(-1.1,1.1), y_range=(-1.1,1.1))
    graph = from_networkx(Neighbours, nx.spring_layout, scale=0.75, center=(0,0))
    
    graph.node_renderer.glyph=Circle(size=15)
    
    node_hover_tool = HoverTool(tooltips=[("index", "@index"), ("name", "@name")])
    p.add_tools(node_hover_tool)
    p.renderers.append(graph)
    
    show(p)

## State

In [27]:
save = World.copy()

In [28]:
World = save.copy()

In [44]:
wIP = IntProgress(min=0,max=len(gfStates.iso_3166_2.unique()))
wIPempty = IntProgress(min=0,max=len(gfStates.iso_3166_2.unique()))
display(HBox([wIP,wIPempty]))

FIELD = "iso_3166_2"

alldata = []
neighbours_adm1_code = []
neighbours_name = []

#for adm in ["DEU-1591"]:#gf.adm1_code.unique():
for adm in gfStates[FIELD].unique():
    me = gfStates[gfStates[FIELD] == adm]
    ggf = gpd.read_file(os.path.join(ROOTFOLDER,"sun/geo/naturalearthdata.com_downloads/ne_10m_admin_1_states_provinces.shp"),bbox=list(me.bounds.values[0]),encoding="utf-8")
    ggf = ggf[ggf[FIELD] != adm]
    if len(ggf) <= 0:
        wIPempty.value += 1
        continue
        
    mask = ggf.apply(lambda row: row['geometry'].touches(me.geometry.values[0]), axis=1)
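    # Assign on every iteration (possibly empty), so that an entity without touching
    # neighbours does not silently reuse the list from the previous iteration
    neighbours = ggf[mask][FIELD]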
    
    alldata.append({"me":me[FIELD],"neighbours":neighbours})
    for n in neighbours:
        neighbours_adm1_code.append({"me":adm,"neighbour":n})
        
    for i,row in ggf[ggf[FIELD].isin(neighbours)].iterrows():
        neighbours_name.append({"me_"+FIELD:adm,"neighbour_"+FIELD:row[FIELD],"me_name":me["name"].values[0],
                                #"me_iso_a2":me["ISO_A2"].values[0],"neighbour_iso_a2":row["ISO_A2"],
                                #"me_iso_a3":me["ISO_A3"].values[0],"neighbour_iso_a3":row["ISO_A3"],
                                "neighbour_name":row["name"],FIELD:row[FIELD]})
    wIP.value += 1

HBox(children=(IntProgress(value=0, max=4499), IntProgress(value=0, max=4499)))

In [45]:
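pd.DataFrame(neighbours_name).head() # preview the state-level result; dfNeighbours still holds the country-level table here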

| | iso_3166_2 | me_iso_3166_2 | me_name | neighbour_iso_3166_2 | neighbour_name |
|---|---|---|---|---|---|
| 0 | UY-PA | AR-E | Entre Ríos | UY-PA | Paysandú |
| 1 | UY-AR | AR-E | Entre Ríos | UY-AR | Artigas |
| 2 | UY-SA | AR-E | Entre Ríos | UY-SA | Salto |
| 3 | AR-W | AR-E | Entre Ríos | AR-W | Corrientes |
| 4 | AR-B | AR-E | Entre Ríos | AR-B | Buenos Aires |


In [46]:
dfNeighbours = pd.DataFrame(neighbours_name)

wIP = IntProgress(min=0,max=len(dfNeighbours)/100)
display(wIP)

counter = 0
for i,row in dfNeighbours.iterrows():
    #me = row["me_iso_a2"]
    me = row["me_iso_3166_2"]
    meNode = find_node_by_attribute_value(World,"id",me,kind="state")
    if meNode == None:
        print(end=".")
        continue
    #oth = row["neighbour_iso_a2"]
    oth = row["neighbour_iso_3166_2"]
    othNode = find_node_by_attribute_value(World,"id",oth,kind="state")
    if me == "DEU":
        print(me,oth,meNode,othNode,end="..")
    if othNode == None:
        print("-")
        continue
    World.add_edge(meNode,othNode,kind="geoTouches")
    counter += 1
    if counter % 100 == 0: # avoid IOPub message rate exceeded.
        wIP.value += 1
    
Neighbours = create_subgraph_of_nodes(World,find_nodes_of_kind(World,"country"),find_edges_of_kind(World,"geoTouches"))

IntProgress(value=0, max=215)

In [47]:
wD3 = Dropdown(options=sorted(gfStates.name.astype(str).unique()))
display(wD3)

Dropdown(options=("A'ana", 'Aargau', 'Aberdeen', 'Aberdeenshire', 'Abia', 'Abim', 'Abkhazia', 'Abra', 'Abu Dha…

In [48]:
nt = pyvis.network.Network("500px","500px",notebook=True)
nt.from_nx(nx.ego_graph(World,find_node_by_attribute_value(World,"name",wD3.value,kind="state")))
#nt.from_nx(World)
nt.show("figure6.html")

## Level below (districts)

## United Kingdom

This uses data from [arcgis](https://covid19.esriuk.com/datasets/fef73aeaf13c417dadf2fc99abcf8eef_0?geometry=-60.488%2C44.785%2C54.649%2C62.639)

In [49]:
ROOT = os.path.join(ROOTFOLDER,"sun/geo/arcgis.com_datasets")

gdf_uk = gpd.read_file(os.path.join(ROOT,"Local_Authority_Districts__December_2018__Boundaries_UK_BFC.shp"))
gdf_uk.head()

| | objectid | lad18cd | lad18nm | lad18nmw | bng_e | bng_n | long | lat | st_areasha | st_lengths | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | E06000001 | Hartlepool | | 447157 | 531476 | -1.27023 | 54.676201 | 93559510.0 | 71707.162397 | MULTIPOLYGON (((-1.26846 54.72612, -1.26822 54... |
| 1 | 2 | E06000002 | Middlesbrough | | 451141 | 516887 | -1.21099 | 54.544701 | 53888580.0 | 43840.876046 | MULTIPOLYGON (((-1.24390 54.58936, -1.24257 54... |
| 2 | 3 | E06000003 | Redcar and Cleveland | | 464359 | 519597 | -1.00611 | 54.567501 | 244820300.0 | 97993.352238 | MULTIPOLYGON (((-1.13758 54.64581, -1.13781 54... |
| 3 | 4 | E06000004 | Stockton-on-Tees | | 444937 | 518183 | -1.30669 | 54.5569 | 204962200.0 | 119581.539702 | MULTIPOLYGON (((-1.31729 54.64480, -1.31715 54... |
| 4 | 5 | E06000005 | Darlington | | 428029 | 515648 | -1.56835 | 54.535301 | 197475700.0 | 107206.28297 | POLYGON ((-1.43836 54.59508, -1.43829 54.59500... |


In [50]:
if CLOUDPAK:
    N_THREADS=32
else:
    N_THREADS=32

# This is tricky: we need each geometry twice, once in its original coordinate system and once mapped to EPSG:3857.
# The projected copy yields border lengths in meters (approximate, since Web Mercator stretches distances away from
# the equator); the original is kept so that geopandas read_file can be called with a bounding box constraint, which
# significantly reduces compute time.
class NEIGHBOURS(Thread):
    def __init__(self,cells,cells_3857):
        Thread.__init__(self)
        self.cells = cells
        self.cells_3857 = cells_3857
        self.retvals = []
        
    def compute_neighbours(self):
        FIELD = "lad18cd"
        for i,me in self.cells.iterrows():
            bbox = me.geometry.bounds
            ggf = gpd.read_file(os.path.join(ROOT,"Local_Authority_Districts__December_2018__Boundaries_UK_BFC.shp"),
                                        bbox=bbox).to_crs(epsg='3857')
            ggf = ggf[ggf[FIELD] != me[FIELD]]
            if len(ggf) <= 0:
                continue

            me2 = self.cells_3857.loc[i]
            mask = ggf.apply(lambda row: row['geometry'].touches(me2.geometry), axis=1)
            dist = ggf.apply(lambda row: row['geometry'].intersection(me2.geometry).length, axis=1)
            if len(ggf[mask])>0:
                neighbours = ggf[mask][FIELD]
                dist = np.array(dist)
                dist = dist[mask]
                self.retvals.append({"me":me[FIELD],"neighbours":neighbours.values,"border_lengths":dist})
            else:
                self.retvals.append({"me":me[FIELD],"neighbours":[""],"border_lengths":[0.]})
            #time.sleep(0.1)
            #self.retvals.append(bbox)
        
    def run(self):
        # Thread.start() invokes run(); without this override the worker thread would do nothing
        self.compute_neighbours()

    def get_results(self):
        return self.retvals

cells = gdf_uk
bits = np.array_split(cells,N_THREADS)
cells2 = gdf_uk.to_crs(epsg=3857)
bobs = np.array_split(cells2,N_THREADS)

threads = []
for i in range(len(bits)):
    worker = NEIGHBOURS(bits[i],bobs[i])
    worker.daemon=True
    threads.append(worker)
    worker.start() # runs compute_neighbours() in the worker thread

# wait for all workers; the results are collected via get_results() in the next cell
for x in threads:
    x.join()
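
The geometric tests used by the worker above can be tried in isolation. The sketch below uses shapely directly on three made-up squares: a cheap bounding-box comparison first, then `touches`, then the length of the intersection. The function names are hypothetical, and the lengths are in coordinate units, so they only represent meters when the geometries are in a projected CRS.

```python
from shapely.geometry import box

# Two unit squares sharing the edge x = 1, and a third square further away
a = box(0, 0, 1, 1)
b = box(1, 0, 2, 1)
c = box(5, 5, 6, 6)

def bboxes_overlap(g1, g2):
    """Cheap pre-filter: do the bounding boxes (minx, miny, maxx, maxy) intersect?"""
    minx1, miny1, maxx1, maxy1 = g1.bounds
    minx2, miny2, maxx2, maxy2 = g2.bounds
    return minx1 <= maxx2 and minx2 <= maxx1 and miny1 <= maxy2 and miny2 <= maxy1

def shared_border(g1, g2):
    """Length of the common boundary, or 0.0 if the geometries do not touch."""
    if not bboxes_overlap(g1, g2) or not g1.touches(g2):
        return 0.0
    return g1.intersection(g2).length

print(shared_border(a, b))
print(shared_border(a, c))
```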

In [15]:
alldata = []
for i in threads:
    results = i.get_results()
    for result in results:
        #for n in result["neighbours"]:
        if result["neighbours"][0] == '':
            continue
        elif len(result["neighbours"]) != len(result["border_lengths"]):
            print("!")
        for j in range(len(result["neighbours"])):
            n = result["neighbours"][j]
            l = result["border_lengths"][j]
            me_name = gdf_uk[gdf_uk.lad18cd == result["me"]].lad18nm.values[0]
            neighbour_name = gdf_uk[gdf_uk.lad18cd == n].lad18nm.values[0]
            alldata.append({"me":result["me"],"neighbour":n,"me_name":me_name,"neighbour_name":neighbour_name,"border":l})
            
dfNeighbours = pd.DataFrame(alldata)
#dfNeighbours.to_parquet("./tmp/UK.parquet")
#dfNeighbours.to_csv("./UK.neighbours.csv",index=False)

In [18]:
ROOT = os.path.join(ROOTFOLDER,"mercury/geo")
Path(ROOT).mkdir(parents=True,exist_ok=True) # make sure the output folder exists
dfNeighbours.to_csv(os.path.join(ROOT,"UK.neighbours.csv"),index=False)

## Germany

The data are [Kreisgrenzen 2019](https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/esri-de-content::kreisgrenzen-2019?geometry=-27.581%2C46.270%2C48.489%2C55.886) and [Bezirke - Berlin](https://opendata-esri-de.opendata.arcgis.com/datasets/9f5d82911d4545c4be1da8cab89f21ae_0). Note that the two files use different coordinate reference systems, so the Berlin Bezirke need to be converted with `.to_crs("EPSG:4326")` before the two are combined.
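
A minimal sketch of the reprojection step, using two made-up single-point GeoDataFrames instead of the real files. The source CRS `EPSG:25833` (ETRS89 / UTM zone 33N) is an assumption chosen for illustration; the actual CRS of the Berlin file should be read from its `.crs` attribute.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Assumption: a projected source CRS (UTM zone 33N); the real Berlin file may differ
berlin_like = gpd.GeoDataFrame(
    {"GEN": ["sample point in Berlin"]},
    geometry=[Point(392000, 5820000)],
    crs="EPSG:25833",
)
kreise_like = gpd.GeoDataFrame(
    {"GEN": ["sample point near Flensburg"]},
    geometry=[Point(9.4, 54.8)],
    crs="EPSG:4326",
)

# Reproject first, then concatenate, so that all geometries are in longitude/latitude
berlin_wgs84 = berlin_like.to_crs("EPSG:4326")
combined = pd.concat([kreise_like, berlin_wgs84], ignore_index=True)

print(berlin_wgs84.crs)
print(combined)
```

Concatenating without reprojecting would mix meter-based and degree-based coordinates in one geometry column, and any later `touches` or bounding-box test between the two sources would silently fail to match.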

In [27]:
ROOT = os.path.join(ROOTFOLDER,"sun/geo/arcgis.com_datasets")

gdf_d = gpd.read_file(os.path.join(ROOT,"Kreisgrenzen_2019.shp"))
gdf_d["IdLandkreis"] = gdf_d.RS.astype(int)
#geodata = geodata.merge(basedata[["IdLandkreis","EWZ"]],on="IdLandkreis")


gdf_b = gpd.read_file(os.path.join(ROOT,"Berlin_Bezirke.shp")).to_crs("EPSG:4326").rename(columns={"Land_name":"Bundesland","Gemeinde_n":"GEN"})
gdf_b["IdLandkreis"] = gdf_b.Schluessel.str[:-5]+gdf_b.Schluessel.str[-2:]
#berlindata["IdLandkreis"] = berlindata["IdLandkreis"].astype(int)
gdf_b.rename(columns={"Land_name":"Bundesland","Gemeinde_n":"GEN"},inplace=True)

gdf_d = pd.concat([gdf_d,gdf_b[["IdLandkreis","geometry","GEN"]]])
gdf_d.head()





| | AGS | BEM | BEZ | FID | FK_S3 | GEN | IBZ | IdLandkreis | NUTS | RS | ... | SHAPE_Area | SHAPE_Leng | SN_G | SN_K | SN_L | SN_R | SN_V1 | SN_V2 | WSK | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1001 | -- | Kreisfreie Stadt | 1.0 | R | Flensburg | 40.0 | 1001 | DEF01 | 1001 | ... | 0.006873 | 0.524721 | 0 | 1 | 1 | 0 | 0 | 0 | 2008-01-01 | POLYGON ((9.41266 54.82264, 9.41318 54.82124, ... |
| 1 | 1002 | -- | Kreisfreie Stadt | 2.0 | R | Kiel | 40.0 | 1002 | DEF02 | 1002 | ... | 0.015507 | 1.274684 | 0 | 2 | 1 | 0 | 0 | 0 | 2006-01-01 | POLYGON ((10.16916 54.43138, 10.16957 54.43067... |
| 2 | 1003 | -- | Kreisfreie Stadt | 3.0 | R | Lübeck | 40.0 | 1003 | DEF03 | 1003 | ... | 0.028928 | 1.834534 | 0 | 3 | 1 | 0 | 0 | 0 | 2006-02-01 | POLYGON ((10.87684 53.98737, 10.87884 53.98595... |
| 3 | 1004 | -- | Kreisfreie Stadt | 4.0 | R | Neumünster | 40.0 | 1004 | DEF04 | 1004 | ... | 0.009808 | 0.663262 | 0 | 4 | 1 | 0 | 0 | 0 | 1970-04-26 | POLYGON ((9.99545 54.14972, 9.99713 54.14806, ... |
| 4 | 1051 | -- | Kreis | 5.0 | R | Dithmarschen | 42.0 | 1051 | DEF05 | 1051 | ... | 0.196087 | 3.073353 | 0 | 51 | 1 | 0 | 0 | 0 | 2011-08-01 | MULTIPOLYGON (((9.07402 54.36277, 9.07595 54.3... |


In [28]:
gdf_d.plot()

ImportError: The descartes package is required for plotting polygons in geopandas. You can install it using 'conda install -c conda-forge descartes' or 'pip install descartes'.