# PART 3: Learning graphs with ipycytoscape.  Making use of pandas

### Objective
The objective of this article, the third in the series, is to show how to create graphs with ipycytoscape when the data is given as a table (Excel, CSV, Google Sheets, etc.).

You may need, or simply prefer, to manipulate the data exclusively in pandas and only then create the graphs.

## What do you need to know?
You should have read the previous two notebooks/articles in the series (part 1 and part 2).
You should have a basic knowledge of pandas.

## Pandas: Another approach
Imagine that the graph built in this notebook is part of a real interface, used by people who sit at a computer watching and manipulating the graph, much like air traffic controllers. The colours and appearance of the graph might change continuously depending on the number of passengers on trains, the number of trains in stations, the speed of the trains, etc.   
So if you were the coder of the project, you would have to reflect all that data in the GUI.

In Python data science, pandas is the most natural way to work with tabular data.

Note 1: The graph and interface can be deployed with voilà. Voilà is beyond the scope of this article; you only need to know that there are tools to render, i.e. display, ipycytoscape graphs in a browser.

Note 2: ipycytoscape already includes an API that accepts a pandas DataFrame when building a graph (see here: https://ipycytoscape.readthedocs.io/en/latest/examples/pandas.html). Nevertheless, my approach is slightly different; I would like to show a way to manipulate and change graphs using strictly pandas.

Imagine you are given the data in a table called "stations.xls" and another table called "railconnections.xls".

Note: I pasted the data here as a list of dictionaries so that you can follow along without reading an external file and so that the article is self-contained.

In [1]:
import ipycytoscape
import json
import ipywidgets
import pandas as pd

stations = [{'id': 'BER','country': 'Germany','classes': 'east','label': 'BER Hbf','passengers': 400000},
        {'id': 'MUN','country': 'Germany','classes': 'west','label': 'MUN Hbf','passengers': 200000},
        {'id': 'FRA','country': 'Germany','classes': 'west','label': 'HBf FRA','passengers': 200000},
        {'id': 'HAM','country': 'Germany', 'classes': 'west','label': 'HBf HAM','passengers': 150000},
        {'id': 'LEP','country': 'Germany','label': 'HBf LEP','classes': 'east','passengers': 50000},
        {'id': 'NUR','country': 'Germany','label': 'HBf NUR','classes': 'west','passengers': 50000},
        {'id': 'PAR', 'country': 'France','label': 'PAR CS', 'classes': '', 'passengers': 350000},
        {'id': 'MIL', 'country': 'Italy', 'label': 'MIL CS','classes': '', 'passengers': 250000},
        {'id': 'BAR', 'country': 'Spain', 'label': 'BAR CS','classes': '', 'passengers': 200000},
        {'id': 'LYO', 'country': 'France','label': 'LYO CS','classes': '', 'passengers': 200000}]

stations_df = pd.DataFrame(stations)
stations_df

Unnamed: 0,id,country,classes,label,passengers
0,BER,Germany,east,BER Hbf,400000
1,MUN,Germany,west,MUN Hbf,200000
2,FRA,Germany,west,HBf FRA,200000
3,HAM,Germany,west,HBf HAM,150000
4,LEP,Germany,east,HBf LEP,50000
5,NUR,Germany,west,HBf NUR,50000
6,PAR,France,,PAR CS,350000
7,MIL,Italy,,MIL CS,250000
8,BAR,Spain,,BAR CS,200000
9,LYO,France,,LYO CS,200000
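
If the station data really arrived as a file, the DataFrame could be loaded directly instead of being typed in. The sketch below is only an illustration under assumptions: it reads a few rows from an in-memory CSV string so that it runs without any external file, and the names `csv_text` and `sample_df` are made up for this example. For the hypothetical "stations.xls", the equivalent call would be `pd.read_excel`, which needs an Excel engine installed.

```python
import io
import pandas as pd

# Stand-in for an external file: three of the station rows as CSV text.
csv_text = """id,country,classes,label,passengers
BER,Germany,east,BER Hbf,400000
MUN,Germany,west,MUN Hbf,200000
PAR,France,,PAR CS,350000
"""

# keep_default_na=False keeps the empty 'classes' cell as '' instead of NaN,
# which matches the hand-written dictionaries above.
sample_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

# For a real spreadsheet the call might look like this (file name assumed):
# stations_df = pd.read_excel("stations.xls")
```

Keeping empty cells as empty strings matters here because the 'classes' column is later passed on to the graph as text.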


And here is the content of the file "railconnections.xls".

In [2]:
rail_lines = [{'id': 'line1', 'source': 'BER', 'target': 'MUN', 'speed': '200km/h'},
         {'id': 'line2', 'source': 'MUN', 'target': 'FRA', 'speed': '200km/h'},
         {'id': 'line3', 'source': 'FRA', 'target': 'BER', 'speed': '250km/h'}, 
         {'id': 'line4', 'source': 'BER', 'target': 'HAM', 'speed': '300km/h'}, 
         {'id': 'line5', 'source': 'BER', 'target': 'LEP', 'speed': '300km/h'},
         {'id': 'line6', 'source': 'NUR', 'target': 'LEP', 'speed': '150km/h'}, 
         {'id': 'line7', 'source': 'NUR', 'target': 'FRA', 'speed': '150km/h'},
         {'id': 'line8', 'source': 'BER', 'target': 'PAR', 'speed': '400km/h'}, 
         {'id': 'line9', 'source': 'PAR', 'target': 'LYO', 'speed': '400km/h'}, 
         {'id': 'line10', 'source': 'LYO', 'target': 'BAR', 'speed': '400km/h'},
         {'id': 'line11', 'source': 'LYO', 'target': 'MIL', 'speed': '200km/h'}]
rails_df = pd.DataFrame(rail_lines,columns=['id','source','target','speed'])
rails_df['label'] = rails_df['speed']
rails_df['background-color'] = 'black'
rails_df

Unnamed: 0,id,source,target,speed,label,background-color
0,line1,BER,MUN,200km/h,200km/h,black
1,line2,MUN,FRA,200km/h,200km/h,black
2,line3,FRA,BER,250km/h,250km/h,black
3,line4,BER,HAM,300km/h,300km/h,black
4,line5,BER,LEP,300km/h,300km/h,black
5,line6,NUR,LEP,150km/h,150km/h,black
6,line7,NUR,FRA,150km/h,150km/h,black
7,line8,BER,PAR,400km/h,400km/h,black
8,line9,PAR,LYO,400km/h,400km/h,black
9,line10,LYO,BAR,400km/h,400km/h,black
10,line11,LYO,MIL,200km/h,200km/h,black


Let's now add the necessary data to the table. The stations outside Germany get the class 'EU' (see parts one and two of this series for an explanation of classes).    
Every EU rail station should be displayed in orange and the rest of the stations in blue, except for the German capital, which will be yellow.    
Let's add this data to the DataFrame.

In [3]:
stations_df.loc[stations_df['country'] != 'Germany','classes'] = 'EU'
stations_df['background-color']=''
stations_df.loc[stations_df['country'] == 'Germany','background-color'] = 'blue'
stations_df.loc[stations_df['country'] != 'Germany','background-color'] = 'orange'
stations_df.loc[stations_df['id'] == 'BER','background-color']  = 'yellow'
stations_df

Unnamed: 0,id,country,classes,label,passengers,background-color
0,BER,Germany,east,BER Hbf,400000,yellow
1,MUN,Germany,west,MUN Hbf,200000,blue
2,FRA,Germany,west,HBf FRA,200000,blue
3,HAM,Germany,west,HBf HAM,150000,blue
4,LEP,Germany,east,HBf LEP,50000,blue
5,NUR,Germany,west,HBf NUR,50000,blue
6,PAR,France,EU,PAR CS,350000,orange
7,MIL,Italy,EU,MIL CS,250000,orange
8,BAR,Spain,EU,BAR CS,200000,orange
9,LYO,France,EU,LYO CS,200000,orange
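
The three `.loc` assignments above depend on their order: the later ones overwrite the earlier ones. As an alternative sketch, the same rule can be written as one ordered list of conditions with `numpy.select`. The small `demo_df` is a made-up subset of the station table, used only to keep the example short.

```python
import numpy as np
import pandas as pd

demo_df = pd.DataFrame({
    'id': ['BER', 'MUN', 'PAR'],
    'country': ['Germany', 'Germany', 'France'],
})

# Conditions are checked in order; the first one that matches wins.
conditions = [
    demo_df['id'] == 'BER',           # the German capital
    demo_df['country'] == 'Germany',  # every other German station
]
colours = ['yellow', 'blue']

# Rows that match no condition (the EU stations) get the default colour.
demo_df['background-color'] = np.select(conditions, colours, default='orange')
```

Listing the most specific condition first makes the priority explicit instead of relying on the order of the assignments.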


The tables shown above contain the attributes of the stations and rail connections.
We now need a way to pass all this data to ipycytoscape.   

The following function converts both tables into a JSON-like dictionary that ipycytoscape accepts, builds the graph and its style, and returns the ipycytoscape widget.   
Afterwards it is only necessary to display it.

In [4]:
def transform_into_ipycytoscape(nodes_df,edges_df):
    
    nodes_dict = nodes_df.to_dict('records')
    edges_dict = edges_df.to_dict('records')

    # building nodes

    data_keys = ['id','label','classes']
    position_keys = ['position_x','position_y']
    # note the comma after 'grabbed': without it the two strings are silently joined
    rest_keys = ['score','idInt','name','group','removed','selected','selectable','locked','grabbed',
                 'grabbable']
    
    nodes_graph_list=[]
    for node in nodes_dict:
        dict_node = {}
        data_sub_dict = {'data':{el:node[el] for el in data_keys}}
        rest_sub_dict = {el:node[el] for el in node.keys() if el in rest_keys}
        posi_sub_dict = {}
        if 'position_x' in node.keys() and 'position_y' in node.keys():
            #print(node.keys())
            posi_sub_dict = {'position':{el:node[el] for el in node.keys() if el in position_keys}}
        
        dict_node = {**data_sub_dict,**rest_sub_dict,**posi_sub_dict}
        nodes_graph_list.append(dict_node)
        
    
    # building edges
    
    data_keys  = ['id','source','target']
    data_keys2 = ['label','classes']
    rest_keys  = ['score','weight','group','networkId','networkGroupId','intn','rIntnId','group','removed','selected','selectable','locked','grabbed','grabbable','classes']
    position_keys = ['position_x','position_y']
    
    edges_graph_list = []
    for edge in edges_dict:
        dict_edge = {}
        data_sub_dict = {el:edge[el] for el in data_keys}
        data_sub_dict2 = {el:edge[el] for el in edge.keys() if el in data_keys2}
        rest_sub_dict = {el:edge[el] for el in edge.keys() if el in rest_keys}
        
        # merge the required keys with the optional ones (label, classes)
        dict_edge = {'data':{**data_sub_dict,**data_sub_dict2},**rest_sub_dict}
        edges_graph_list.append(dict_edge)
    
    total_graph_dict = {'nodes': nodes_graph_list, 'edges':edges_graph_list}
    
    # building the style
    all_node_style = ['background-color','background-opacity',
                     'font-family','font-size','label','width',
                     'shape']
    all_edge_style = ['background-color','background-opacity',
                     'font-family','font-size','label','width','line-color',
                     ]
    
    total_style_dict = {}
    style_elements=[]
    for node in nodes_dict:
        node_dict = {'selector': f'node[id = \"{node["id"]}\"]'}
        style_dict ={"style": { el:node[el] for el in node.keys() if el in all_node_style}}
        node_dict.update(style_dict)
        style_elements.append(node_dict)
    
    for edge in edges_dict:
        edge_dict = {'selector': f'edge[id = \"{edge["id"]}\"]'}
        style_dict ={"style": { el:edge[el] for el in edge.keys() if el in all_edge_style}}
        edge_dict.update(style_dict)
        style_elements.append(edge_dict)
    
    # the graph
    data_graph = json.dumps(total_graph_dict)
    json_to_python = json.loads(data_graph)
    result_cyto = ipycytoscape.CytoscapeWidget()
    result_cyto.graph.add_graph_from_json(json_to_python)    
    result_cyto.set_style(style_elements)    
    
    return result_cyto

G=transform_into_ipycytoscape(stations_df,rails_df)
display(G)

CytoscapeWidget(cytoscape_layout={'name': 'cola'}, cytoscape_style=[{'selector': 'node[id = "BER"]', 'style': …
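
To make the structure concrete, here is a simplified sketch of what the function above builds for a single station and a single rail line. It only mirrors the dictionary layout and does not call ipycytoscape; the variables `node`, `edge`, `elements` and `style` are illustrative names, and the edge data is reduced to the three required keys.

```python
import json

# One station and one rail line, as rows look after to_dict('records').
node = {'id': 'BER', 'label': 'BER Hbf', 'classes': 'east',
        'background-color': 'yellow'}
edge = {'id': 'line1', 'source': 'BER', 'target': 'MUN',
        'label': '200km/h', 'line-color': 'green'}

# Element part: what ends up in the graph itself.
elements = {
    'nodes': [{'data': {k: node[k] for k in ('id', 'label', 'classes')}}],
    'edges': [{'data': {k: edge[k] for k in ('id', 'source', 'target')}}],
}

# Style part: one selector per element, carrying only the style columns.
style = [
    {'selector': f'node[id = "{node["id"]}"]',
     'style': {'background-color': node['background-color']}},
    {'selector': f'edge[id = "{edge["id"]}"]',
     'style': {'line-color': edge['line-color'], 'label': edge['label']}},
]

print(json.dumps(elements, indent=2))
```

The split is the key design choice: columns that describe the graph go into `data`, while columns that describe its appearance become per-element style rules.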

At the start of this article I promised to show why working directly with tables is a good way to go.   
Now assume that you want to change the colours of the graph in the following way.  
Stations with more than 200000 passengers should be rendered red and the rest blue.  
The high-speed rail lines should be painted red and the rest green (a line counts as high speed if it runs at 300km/h or more).
We can do that by operating on the DataFrames and passing the results to the function defined above.
Which CSS attributes are affected? For the nodes it is 'background-color' (we already used it) and for the lines it is 'line-color'.
Let's see.

In [5]:
stations_df['background-color'] = stations_df['passengers'].apply(lambda x: 'red' if x>200000 else 'blue')
rails_df['line-color'] = rails_df['label'].apply(lambda x: 'red' if x in ['400km/h','300km/h'] else 'green')

With only two lines we made the necessary changes, and the DataFrames now look as follows.

In [6]:
stations_df

Unnamed: 0,id,country,classes,label,passengers,background-color
0,BER,Germany,east,BER Hbf,400000,red
1,MUN,Germany,west,MUN Hbf,200000,blue
2,FRA,Germany,west,HBf FRA,200000,blue
3,HAM,Germany,west,HBf HAM,150000,blue
4,LEP,Germany,east,HBf LEP,50000,blue
5,NUR,Germany,west,HBf NUR,50000,blue
6,PAR,France,EU,PAR CS,350000,red
7,MIL,Italy,EU,MIL CS,250000,red
8,BAR,Spain,EU,BAR CS,200000,blue
9,LYO,France,EU,LYO CS,200000,blue


In [7]:
rails_df

Unnamed: 0,id,source,target,speed,label,background-color,line-color
0,line1,BER,MUN,200km/h,200km/h,black,green
1,line2,MUN,FRA,200km/h,200km/h,black,green
2,line3,FRA,BER,250km/h,250km/h,black,green
3,line4,BER,HAM,300km/h,300km/h,black,red
4,line5,BER,LEP,300km/h,300km/h,black,red
5,line6,NUR,LEP,150km/h,150km/h,black,green
6,line7,NUR,FRA,150km/h,150km/h,black,green
7,line8,BER,PAR,400km/h,400km/h,black,red
8,line9,PAR,LYO,400km/h,400km/h,black,red
9,line10,LYO,BAR,400km/h,400km/h,black,red
10,line11,LYO,MIL,200km/h,200km/h,black,green


In [8]:
G=transform_into_ipycytoscape(stations_df,rails_df)
display(G)

CytoscapeWidget(cytoscape_layout={'name': 'cola'}, cytoscape_style=[{'selector': 'node[id = "BER"]', 'style': …
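
The line-colour rule above matches literal strings ('300km/h', '400km/h'), so a new speed such as '350km/h' would be missed. A possible variation, sketched here on a made-up `demo_rails` table, is to strip the unit and compare numbers instead. It assumes every speed is written as an integer followed by 'km/h'.

```python
import pandas as pd

demo_rails = pd.DataFrame({
    'id': ['line1', 'line3', 'line4', 'line8'],
    'speed': ['200km/h', '250km/h', '300km/h', '400km/h'],
})

# Strip the unit and convert to integers so the speeds can be compared.
speed_kmh = demo_rails['speed'].str.replace('km/h', '', regex=False).astype(int)

# 300 km/h or more counts as high speed, as defined in the text.
demo_rails['line-color'] = (speed_kmh >= 300).map({True: 'red', False: 'green'})
```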

### Changing appearance of the edges & nodes with the help of pandas
To further illustrate the power of this approach, I will add another change to the appearance.
A new safety regulation was passed in parliament: every station served by a high-speed train must have special fire-protection measures. So the GUI should show in violet all the stations served by at least one high-speed line and in yellow the rest. Note that the code below treats only the 400km/h lines as high speed, which is stricter than the 300km/h threshold used earlier.   
First, collect the stations at both ends of each high-speed line.

In [9]:
rails_df['high-speed'] = rails_df.apply(lambda x: [x.target,x.source] if x.speed in ['400km/h'] else [], axis=1)

In [10]:
rails_df

Unnamed: 0,id,source,target,speed,label,background-color,line-color,high-speed
0,line1,BER,MUN,200km/h,200km/h,black,green,[]
1,line2,MUN,FRA,200km/h,200km/h,black,green,[]
2,line3,FRA,BER,250km/h,250km/h,black,green,[]
3,line4,BER,HAM,300km/h,300km/h,black,red,[]
4,line5,BER,LEP,300km/h,300km/h,black,red,[]
5,line6,NUR,LEP,150km/h,150km/h,black,green,[]
6,line7,NUR,FRA,150km/h,150km/h,black,green,[]
7,line8,BER,PAR,400km/h,400km/h,black,red,"[PAR, BER]"
8,line9,PAR,LYO,400km/h,400km/h,black,red,"[LYO, PAR]"
9,line10,LYO,BAR,400km/h,400km/h,black,red,"[BAR, LYO]"
10,line11,LYO,MIL,200km/h,200km/h,black,green,[]


In [11]:
stations_high_speed_arriving = rails_df['high-speed'].to_list()
list_of_HS_stations =list(set([item for sublist in stations_high_speed_arriving for item in sublist]))
list_of_HS_stations

['BAR', 'LYO', 'BER', 'PAR']
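
The list above was obtained by storing a small list in every row and then flattening. A different way to reach the same set of stations, shown here as a sketch on a made-up `demo_rails` table, is to filter the rows first and then stack the two endpoint columns.

```python
import pandas as pd

demo_rails = pd.DataFrame({
    'source': ['BER', 'BER', 'PAR', 'LYO'],
    'target': ['MUN', 'PAR', 'LYO', 'BAR'],
    'speed': ['200km/h', '400km/h', '400km/h', '400km/h'],
})

# Keep only the rows that satisfy the speed rule used above.
fast = demo_rails[demo_rails['speed'] == '400km/h']

# Stack both endpoint columns and drop duplicates; sorting makes the order stable.
hs_stations = sorted(pd.concat([fast['source'], fast['target']]).unique())
print(hs_stations)
```

This avoids the helper column of lists, and sorting gives the same order on every run, which a plain `set` does not guarantee.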

### Changing color of nodes
Those are the stations at either end of at least one high-speed line.   
Let's change the color of those stations in the stations DataFrame.


In [12]:
stations_df['background-color'] = stations_df['id'].apply(lambda x: 'violet' if x in list_of_HS_stations else 'yellow')

In [13]:
G=transform_into_ipycytoscape(stations_df,rails_df)
display(G)

CytoscapeWidget(cytoscape_layout={'name': 'cola'}, cytoscape_style=[{'selector': 'node[id = "BER"]', 'style': …
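
The membership test used for the station colours can also be written without `apply`. The sketch below uses `Series.isin` together with `numpy.where` on a made-up `demo_stations` table; `hs_ids` stands in for the list of high-speed stations computed earlier.

```python
import numpy as np
import pandas as pd

demo_stations = pd.DataFrame({'id': ['BER', 'MUN', 'PAR', 'MIL']})
hs_ids = ['BAR', 'LYO', 'BER', 'PAR']

# isin builds a boolean mask; np.where picks a colour for each row from it.
is_hs = demo_stations['id'].isin(hs_ids)
demo_stations['background-color'] = np.where(is_hs, 'violet', 'yellow')
```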

### Changing shapes of nodes
Next, the stations with 350000 passengers or more are drawn as squares (the 'rectangle' shape) to differentiate them from the smaller ones.

In [17]:
stations_df['shape'] = stations_df['passengers'].apply(lambda x: 'rectangle' if x>=350000 else '') 

In [18]:
stations_df

Unnamed: 0,id,country,classes,label,passengers,background-color,shape
0,BER,Germany,east,BER Hbf,400000,violet,rectangle
1,MUN,Germany,west,MUN Hbf,200000,yellow,
2,FRA,Germany,west,HBf FRA,200000,yellow,
3,HAM,Germany,west,HBf HAM,150000,yellow,
4,LEP,Germany,east,HBf LEP,50000,yellow,
5,NUR,Germany,west,HBf NUR,50000,yellow,
6,PAR,France,EU,PAR CS,350000,violet,rectangle
7,MIL,Italy,EU,MIL CS,250000,yellow,
8,BAR,Spain,EU,BAR CS,200000,violet,
9,LYO,France,EU,LYO CS,200000,violet,


In [19]:
G=transform_into_ipycytoscape(stations_df,rails_df)
display(G)

CytoscapeWidget(cytoscape_layout={'name': 'cola'}, cytoscape_style=[{'selector': 'node[id = "BER"]', 'style': …
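
Each change so far rebuilt the whole widget. When only the appearance changes, it may be enough to rebuild the style list and apply it to the existing widget with `set_style`, the same call used inside the function above. The helper `build_node_style` below is hypothetical (it is not part of ipycytoscape) and covers nodes only. As far as I can tell `set_style` replaces the whole stylesheet, so in practice the edge entries would have to be appended to the list as well.

```python
import pandas as pd

# Style columns this sketch knows about (an assumption, extend as needed).
NODE_STYLE_KEYS = ['background-color', 'shape', 'label']

def build_node_style(nodes_df):
    """Return one style entry per node, built from the style columns."""
    style = []
    for row in nodes_df.to_dict('records'):
        # Skip empty values so that unset cells do not override defaults.
        props = {k: row[k] for k in NODE_STYLE_KEYS if k in row and row[k] != ''}
        style.append({'selector': f'node[id = "{row["id"]}"]', 'style': props})
    return style

demo_df = pd.DataFrame({
    'id': ['BER', 'MUN'],
    'background-color': ['violet', 'yellow'],
    'shape': ['rectangle', ''],
})
node_style = build_node_style(demo_df)

# With an existing widget G, the new style could then be applied in place:
# G.set_style(node_style)
```

Skipping empty cells also avoids passing an empty 'shape' value for the stations that should keep the default shape.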

1.0.4


In [77]:
!pip install ipycytoscape

