# Exercise 1.2 (solution)
## Peter, Paul and Mary network
---

1. Get familiar with the content of each of the three files listed below.
 - Are networks directed or undirected?
 - Are edges weighted or unweighted?
 - Which Python libraries would you need?


2. Load the data!
 - Load the node attributes
 - Load layer #1 as an adjacency matrix using pandas
 - Load layer #2 as an edgelist using pandas


3. Create a networkx graph from the loaded data.
 - Create an empty multiple-edge graph G.
 - Add a graph attribute 'name', with value 'Peter, Paul and Mary'.
 - Assign node attributes to nodes in G (using data extracted in step 2)
 - Populate graph G using layer #1 and layer #2 (using data extracted from step 2)
    - Make sure that edges in layer #1 have layer ID "domain1", and edges in layer #2, "domain2".
 - Print the metadata of the graph, all nodes and all edges.

   
4. Save the whole network into a single file. Make sure that:
 - all attributes (for graph, nodes, edges) are saved!
 - all layers are saved!
 

---
## Task #1
#### Get familiar with the data and the libraries!
Open in your browser (in another window or tab) the files under `data/peter_paul_mary`.
Read those files and find out which attributes the nodes have, the format of the network files, etc.

---

#### Data file paths
Check out these files from the "data" folder, and get familiar with their format:\
column names, separators, directed edges?, weighted edges?, etc.

In [1]:
fn_att = "../../data/peter_paul_mary/peter_paul_mary_attributes.csv"
fn_layer1 = "../../data/peter_paul_mary/peter_paul_mary_domain1_adjacency.csv"
fn_layer2 = "../../data/peter_paul_mary/peter_paul_mary_domain2_edgelist.csv"

#### Dependencies (libraries)
Import all necessary Python packages here.

In [2]:
import networkx as nx
import pandas as pd
import numpy as np

---
## Task #2
#### Load the data!
- Load the node attributes
- Load layer #1 as an adjacency matrix using pandas
- Load layer #2 as an edgelist using pandas

---

#### Load node attributes

In [3]:
node_attributes = pd.read_csv(fn_att, sep=';', header=0)
node_attributes

Unnamed: 0,name,age
0,Peter,44
1,Paul,22
2,Mary,33


#### Load layer 1 as an adjacency matrix using pandas

In [4]:
df_layer1 = pd.read_csv(fn_layer1, sep=';', header=0, index_col=0)
df_layer1

Unnamed: 0,Peter,Paul,Mary
Peter,0,2,1
Paul,0,0,1
Mary,0,0,0
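
The two questions from Task #1 (directed or undirected? weighted or unweighted?) can also be checked programmatically. The following is only a minimal sketch: it rebuilds the layer 1 matrix by hand from the output above (the name `adj` is illustrative, not part of the exercise) and tests whether the matrix is symmetric and whether it contains only 0/1 entries.

```python
import numpy as np
import pandas as pd

names = ["Peter", "Paul", "Mary"]
# Values copied by hand from the layer 1 output shown above
adj = pd.DataFrame([[0, 2, 1], [0, 0, 1], [0, 0, 0]], index=names, columns=names)

values = adj.to_numpy()
# An asymmetric adjacency matrix can only describe a directed network
is_symmetric = bool(np.array_equal(values, values.T))
# Entries other than 0 and 1 indicate edge weights
is_binary = bool(np.isin(values, [0, 1]).all())

print("directed:", not is_symmetric)
print("weighted:", not is_binary)
```

Note that a symmetric matrix would only be consistent with an undirected network; it would not prove that the network is undirected.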


#### Load layer 2 as an edge list using pandas

In [5]:
df_layer2 = pd.read_csv(fn_layer2, sep=';', header=0)
df_layer2

Unnamed: 0,from,to,weight
0,Paul,Peter,3
1,Mary,Peter,1
2,Mary,Paul,1


---
## Task #3
#### Create a networkx graph with the loaded data.
- Create an empty multiple-edge graph G.
- Add a graph attribute 'name', with value 'Peter, Paul and Mary'.
- Assign node attributes to nodes in G (using data extracted in step 2)
- Populate graph G using layer #1 and layer #2 (using data extracted from step 2)
   - Make sure that edges in layer #1 have layer ID "domain1", and edges in layer #2, "domain2".
- Print the metadata of the graph, all nodes and all edges.
- Check nodes' metadata. What happened? Can you fix it?

---

#### Creating a multi-edge graph
Instantiate a networkx graph with the name 'Peter, Paul and Mary'.

In [6]:
G = nx.MultiDiGraph(name='Peter, Paul and Mary')
print(nx.info(G))  # note: nx.info() was removed in networkx 3.0; there, print(G) gives a short summary

Name: Peter, Paul and Mary
Type: MultiDiGraph
Number of nodes: 0
Number of edges: 0



#### Adding node attributes
*We cannot add node attributes yet, because the network is empty!\
We first need to populate the graph with edges (or nodes), and only then assign attributes to the nodes.*
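
To see what "empty" means in practice, here is a small sketch (the names `G_empty` and `ages` are illustrative, and the ages are copied from the attributes table). It relies on the documented behaviour of `nx.set_node_attributes`, which silently ignores dictionary keys that are not nodes of the graph.

```python
import networkx as nx

G_empty = nx.MultiDiGraph(name="Peter, Paul and Mary")
ages = {"Peter": 44, "Paul": 22, "Mary": 33}  # values from the attributes table

# Keys that are not nodes of the graph are skipped without raising an error
nx.set_node_attributes(G_empty, values=ages, name="age")

print(G_empty.number_of_nodes())
print(list(G_empty.nodes(data=True)))
```

No error is raised, but no node is created either, so the attributes are simply lost.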

#### Adding first layer
Load layer 1 into G, and use "domain1" as layer (key) ID.\
*HINT 1: Create a networkx graph `g1` by using `nx.from_pandas_adjacency()`.*\
*HINT 2: Read all the edges from `g1` and create an edgelist of the form (source, target, key, weight).*\
*HINT 3: Then pass that edge list into ```G.add_edges_from()```.*

In [7]:
g1 = nx.from_pandas_adjacency(df_layer1, create_using=nx.DiGraph) # directed graph (layer 1)

el1 = [ (u,v,'domain1',w) for u,v,w in g1.edges(data=True)]       # edgelist of the form (s,t,k,w)
#for u,v,w in g1.edges(data=True):
#    el1.append((u,v,'domain1',w))
    
G.add_edges_from(el1)                                             # adding edges (with metadata) into main graph

print(nx.info(G))
print(G.nodes())
print(G.edges(data=True, keys=True))

Name: Peter, Paul and Mary
Type: MultiDiGraph
Number of nodes: 3
Number of edges: 3
Average in degree:   1.0000
Average out degree:   1.0000
['Peter', 'Paul', 'Mary']
[('Peter', 'Paul', 'domain1', {'weight': 2}), ('Peter', 'Mary', 'domain1', {'weight': 1}), ('Paul', 'Mary', 'domain1', {'weight': 1})]
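
As an alternative to the intermediate graph `g1`, the edge list can probably also be built straight from the adjacency DataFrame. This is only a sketch under the assumption that a zero entry means "no edge"; `adj`, `edges` and `H` are illustrative names, and the matrix is rebuilt by hand from the layer 1 output.

```python
import networkx as nx
import pandas as pd

names = ["Peter", "Paul", "Mary"]
adj = pd.DataFrame([[0, 2, 1], [0, 0, 1], [0, 0, 0]], index=names, columns=names)

# Rows are sources, columns are targets; zero entries are treated as "no edge"
edges = [
    (u, v, "domain1", {"weight": int(adj.loc[u, v])})
    for u in adj.index
    for v in adj.columns
    if adj.loc[u, v] != 0
]

H = nx.MultiDiGraph()
H.add_edges_from(edges)  # 4-tuples of the form (source, target, key, data)
print(list(H.edges(keys=True, data=True)))
```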


#### Adding second layer
Load layer 2 into G, and use "domain2" as layer (key) ID.\
*HINT 1: Same as before, but now you need to create the networkx graph using `nx.from_pandas_edgelist`.*

In [8]:
g2 = nx.from_pandas_edgelist(df_layer2, source='from', target='to', edge_attr='weight', create_using=nx.DiGraph)
el2 = [ (u,v,'domain2',w) for u,v,w in g2.edges(data=True)] # edgelist of the form (s,t,k,w)
G.add_edges_from(el2)                                       # adding edges (with metadata) into main graph 

print(nx.info(G))
print(G.nodes())
print(G.edges(data=True, keys=True))

Name: Peter, Paul and Mary
Type: MultiDiGraph
Number of nodes: 3
Number of edges: 6
Average in degree:   2.0000
Average out degree:   2.0000
['Peter', 'Paul', 'Mary']
[('Peter', 'Paul', 'domain1', {'weight': 2}), ('Peter', 'Mary', 'domain1', {'weight': 1}), ('Paul', 'Mary', 'domain1', {'weight': 1}), ('Paul', 'Peter', 'domain2', {'weight': 3}), ('Mary', 'Peter', 'domain2', {'weight': 1}), ('Mary', 'Paul', 'domain2', {'weight': 1})]


#### Adding node attributes
Add the attribute 'age' to all nodes.\
*Hint 1: Set 'name' as the index column of the DataFrame ```node_attributes```. Then, convert the DataFrame to a dictionary and store it in the 'tmp' variable.*\
*Hint 2: Check with ```G.nodes(data=True)``` if all nodes have been modified with the new attributes. If not, what would you do to fix it? Where would you move this piece of code?*

In [9]:
tmp = node_attributes.set_index('name').to_dict()
nx.set_node_attributes(G, values=tmp['age'], name='age')
G.nodes(data=True)

NodeDataView({'Peter': {'age': 44}, 'Paul': {'age': 22}, 'Mary': {'age': 33}})
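
One possible answer to Hint 2 is to create the nodes together with their attributes before any edge is added, so the order of the cells no longer matters. The sketch below assumes a hand-built table with the same `name` and `age` columns (`node_table` and `H` are illustrative names).

```python
import networkx as nx
import pandas as pd

# Same columns as the attributes table, rebuilt by hand for this sketch
node_table = pd.DataFrame({"name": ["Peter", "Paul", "Mary"], "age": [44, 22, 33]})

H = nx.MultiDiGraph(name="Peter, Paul and Mary")
# add_nodes_from accepts (node, attribute_dict) pairs
H.add_nodes_from(
    (row["name"], {"age": int(row["age"])})
    for row in node_table.to_dict("records")
)

# Edges added afterwards reuse the existing nodes and keep their attributes
H.add_edge("Peter", "Paul", key="domain1", weight=2)
print(list(H.nodes(data=True)))
```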

#### Printing all node metadata
Traverse all nodes, and print (node, age-value).\
*Hint: You need to pass one parameter to ```G.nodes(?=?)```*

In [10]:
for node, obj in G.nodes(data=True):
    print(node,obj['age'])

Peter 44
Paul 22
Mary 33


#### Printing all edge metadata
Traverse all edges, and print (source, target, layer id, weight-value).\
*Hint: You need to pass two parameters to ```G.edges(?=?, ?=?)```*

In [11]:
for u,v,k,w in G.edges(data=True, keys=True):
    print(u,v,k,w['weight'])

Peter Paul domain1 2
Peter Mary domain1 1
Paul Mary domain1 1
Paul Peter domain2 3
Mary Peter domain2 1
Mary Paul domain2 1
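
Because the layer ID is stored as the edge key, a single layer can be recovered by filtering on that key. A minimal sketch, with the six edges copied from the output above into an illustrative graph `H`:

```python
import networkx as nx

H = nx.MultiDiGraph()
H.add_edges_from([
    ("Peter", "Paul", "domain1", {"weight": 2}),
    ("Peter", "Mary", "domain1", {"weight": 1}),
    ("Paul", "Mary", "domain1", {"weight": 1}),
    ("Paul", "Peter", "domain2", {"weight": 3}),
    ("Mary", "Peter", "domain2", {"weight": 1}),
    ("Mary", "Paul", "domain2", {"weight": 1}),
])

# Keep only the edges whose key (layer ID) is "domain2"
layer2 = [
    (u, v, d["weight"])
    for u, v, k, d in H.edges(keys=True, data=True)
    if k == "domain2"
]
total_weight = sum(w for _, _, w in layer2)

print(layer2)
print(total_weight)
```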


---
## Task #4
#### Save the whole network into a single file.
Make sure the file you create contains:
- all attributes (for graph, nodes, edges) and
- all layers.

---

#### Save the graph into a file
*Hint: The file must contain ALL metadata.*

In [12]:
fn = "../../results/exercise_12.gpickle"

# write (gpickle or gml)
# Note that if you save the graph as ".gpickle" or ".gml", the whole structure of the graph is stored as it is.
# In this case, as a MultiDiGraph, that is:
# graph, graph attributes, nodes, node attributes, edges, edge attributes, and layers are stored!
# So, if you load the .gpickle (or .gml) file, you can access all edges' info with: g.edges(data=True, keys=True)

# write (gexf) for gephi
# However, if you store the network as a ".gexf" file (for gephi), while all info will also be saved,
# the network will be (in this case) just a DiGraph.
# The layer information is not lost, but it is handled as edge attributes, so if you load the .gexf file,
# the line g.edges(data=True, keys=True) will fail, because layers don't exist (it's a DiGraph).
# You can then access the edges' info with: g.edges(data=True)
nx.write_gpickle(G, fn)

# read
g = nx.read_gpickle(fn)
print(nx.info(g))
print('')

# check nodes' info
print(g.nodes(data=True))
print('')

# check edges' info
print(g.edges(data=True, keys=True)) # for gpickle and gml files
# g.edges(data=True)          # for gexf files

Name: Peter, Paul and Mary
Type: MultiDiGraph
Number of nodes: 3
Number of edges: 6
Average in degree:   2.0000
Average out degree:   2.0000

[('Peter', {'age': 44}), ('Paul', {'age': 22}), ('Mary', {'age': 33})]

[('Peter', 'Paul', 'domain1', {'weight': 2}), ('Peter', 'Mary', 'domain1', {'weight': 1}), ('Paul', 'Mary', 'domain1', {'weight': 1}), ('Paul', 'Peter', 'domain2', {'weight': 3}), ('Mary', 'Peter', 'domain2', {'weight': 1}), ('Mary', 'Paul', 'domain2', {'weight': 1})]
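
The cell above runs on networkx versions older than 3.0. `nx.write_gpickle` and `nx.read_gpickle` (as well as `nx.info`) were removed in networkx 3.0, so on newer versions a similar round trip can be sketched with the standard `pickle` module. The small graph `H` and the temporary path are illustrative, not the exercise's result file.

```python
import os
import pickle
import tempfile

import networkx as nx

# Small stand-in graph with a graph attribute, node attributes and two layers
H = nx.MultiDiGraph(name="Peter, Paul and Mary")
H.add_node("Peter", age=44)
H.add_node("Paul", age=22)
H.add_edge("Peter", "Paul", key="domain1", weight=2)
H.add_edge("Paul", "Peter", key="domain2", weight=3)

# Hypothetical temporary location, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "example.gpickle")

with open(path, "wb") as fh:
    pickle.dump(H, fh, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as fh:
    H2 = pickle.load(fh)

print(H2.graph["name"])
print(list(H2.nodes(data=True)))
print(list(H2.edges(keys=True, data=True)))
```

As with any pickle file, only load files from sources you trust.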
