# Creating a feature matrix from a networkx graph

In this notebook we will look at a few ways to quickly create a feature matrix from a networkx graph.

In [24]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
import pickle
#G = nx.read_gpickle('major_us_cities')
%matplotlib widget

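# Draw the graph with a force-directed (spring) layout.
# G must already exist; it is rebuilt from the pickled node and edge lists in the next section.
plt.figure(1)
pos = nx.spring_layout(G)
nx.draw_networkx(G, pos)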

## Node based features

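Note that the `major_us_cities` file (a graph pickle) loads only in networkx versions before 2.0.
One way to port it is to convert the graph to a DataFrame in networkx 1.9 and rebuild it with `from_pandas_adjacency` in 2.1. The route shown here instead pickles the node and edge lists in 1.9 and rebuilds the graph from them in 2.1.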

* In 1.9

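```python
import pickle

# Export the edge list (with attributes) from the networkx 1.9 graph
edges = G.edges(data=True)
with open("edgeslist.txt", "wb") as fp:   # Pickling
    pickle.dump(edges, fp)

# Export the node list (with attributes)
nodes = G.nodes(data=True)
with open("nodeslist.txt", "wb") as fp:   # Pickling
    pickle.dump(nodes, fp)
```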

* In 2.1

```python
import pickle
import networkx as nx

with open("data/nodeslist.txt", "rb") as fp:   # Unpickling
    nodeslist = pickle.load(fp)
with open("data/edgeslist.txt", "rb") as fp:   # Unpickling
    edgeslist = pickle.load(fp)

# Rebuild the graph from the (node, attributes) and (u, v, attributes) lists
G = nx.Graph()
G.add_nodes_from(nodeslist)
G.add_edges_from(edgeslist)
```
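
The `from_pandas_adjacency` route mentioned earlier can be sketched as a round trip inside networkx 2.x. This is only an illustration on a hypothetical three-node toy graph (the node names, weights and `population` value are made up, not taken from the data file). Note that an adjacency matrix carries edge weights but not node attributes, so attributes such as `population` would have to be transferred separately.

```python
import networkx as nx

# Hypothetical toy graph standing in for the real city graph
G_toy = nx.Graph()
G_toy.add_node("A", population=100)
G_toy.add_edge("A", "B", weight=2.5)
G_toy.add_edge("B", "C", weight=4.0)

# Graph -> adjacency DataFrame (rows and columns are nodes, cells are weights)
adj = nx.to_pandas_adjacency(G_toy, weight="weight")

# Adjacency DataFrame -> graph; nonzero cells become weighted edges
H = nx.from_pandas_adjacency(adj)

print(sorted(H.edges(data="weight")))
print(H.nodes["A"])  # node attributes do not survive the round trip
```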

In [39]:
with open("data/nodeslist.txt", "rb") as fp:   # Unpickling
    nodeslist = pickle.load(fp)
with open("data/edgeslist.txt", "rb") as fp:   # Unpickling
    edgeslist = pickle.load(fp)

In [40]:
G=nx.Graph()
G.add_nodes_from(nodeslist)
G.add_edges_from(edgeslist)

In [41]:
G.nodes(data=True)

NodeDataView({'El Paso, TX': {'population': 674433, 'location': (-106, 31)}, 'Long Beach, CA': {'population': 469428, 'location': (-118, 33)}, 'Dallas, TX': {'population': 1257676, 'location': (-96, 32)}, 'Oakland, CA': {'population': 406253, 'location': (-122, 37)}, 'Albuquerque, NM': {'population': 556495, 'location': (-106, 35)}, 'Baltimore, MD': {'population': 622104, 'location': (-76, 39)}, 'Raleigh, NC': {'population': 431746, 'location': (-78, 35)}, 'Mesa, AZ': {'population': 457587, 'location': (-111, 33)}, 'Arlington, TX': {'population': 379577, 'location': (-97, 32)}, 'Sacramento, CA': {'population': 479686, 'location': (-121, 38)}, 'Wichita, KS': {'population': 386552, 'location': (-97, 37)}, 'Tucson, AZ': {'population': 526116, 'location': (-110, 32)}, 'Cleveland, OH': {'population': 390113, 'location': (-81, 41)}, 'Louisville/Jefferson County, KY': {'population': 609893, 'location': (-85, 38)}, 'San Jose, CA': {'population': 998537, 'location': (-121, 37)}, 'Oklahoma City,

In [42]:
plt.figure(figsize=(10,7))

pos = nx.get_node_attributes(G, 'location')
nx.draw_networkx(G, pos)

In [43]:
# Initialize the dataframe, using the nodes as the index
df = pd.DataFrame(index=G.nodes())
df.head()

"El Paso, TX"
"Long Beach, CA"
"Dallas, TX"
"Oakland, CA"
"Albuquerque, NM"


### Extracting attributes

Using `nx.get_node_attributes` it's easy to extract the node attributes in the graph into DataFrame columns.

In [44]:
df['location'] = pd.Series(nx.get_node_attributes(G, 'location'))
df['population'] = pd.Series(nx.get_node_attributes(G, 'population'))

df.head()

| | location | population |
|---|---|---|
| El Paso, TX | (-106, 31) | 674433 |
| Long Beach, CA | (-118, 33) | 469428 |
| Dallas, TX | (-96, 32) | 1257676 |
| Oakland, CA | (-122, 37) | 406253 |
| Albuquerque, NM | (-106, 35) | 556495 |

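As an alternative to adding one column per attribute, all node attributes can be pulled into a DataFrame in a single step. The sketch below uses a toy graph with two of the cities shown above. The `lon` and `lat` column names are hypothetical, and the split assumes the `location` tuple is ordered as (longitude, latitude).

```python
import networkx as nx
import pandas as pd

# Toy graph with two of the cities from the data above
G_toy = nx.Graph()
G_toy.add_node("El Paso, TX", population=674433, location=(-106, 31))
G_toy.add_node("Dallas, TX", population=1257676, location=(-96, 32))

# One row per node, one column per attribute
attrs = pd.DataFrame.from_dict(dict(G_toy.nodes(data=True)), orient="index")

# Split the location tuple into numeric columns
# (assumption: the tuple is ordered as (longitude, latitude))
attrs["lon"] = [loc[0] for loc in attrs["location"]]
attrs["lat"] = [loc[1] for loc in attrs["location"]]

print(attrs)
```

Numeric columns like these are usually easier to feed into a model than a column of tuples.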

### Creating node based features

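Most networkx node-level functions return a dictionary keyed by node, which can be added to the DataFrame just as easily. In networkx 2.x, `G.degree()` returns a `DegreeView` rather than a dictionary, so it is wrapped in `dict()` first.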

In [46]:
dict(G.degree())

DegreeView({'El Paso, TX': 5, 'Long Beach, CA': 11, 'Dallas, TX': 11, 'Oakland, CA': 8, 'Albuquerque, NM': 7, 'Baltimore, MD': 10, 'Raleigh, NC': 13, 'Mesa, AZ': 8, 'Arlington, TX': 11, 'Sacramento, CA': 9, 'Wichita, KS': 10, 'Tucson, AZ': 8, 'Cleveland, OH': 14, 'Louisville/Jefferson County, KY': 13, 'San Jose, CA': 8, 'Oklahoma City, OK': 12, 'Atlanta, GA': 9, 'New Orleans, LA': 8, 'Miami, FL': 1, 'Fresno, CA': 9, 'Philadelphia, PA': 10, 'Houston, TX': 9, 'Boston, MA': 5, 'Kansas City, MO': 14, 'San Diego, CA': 11, 'Chicago, IL': 11, 'Charlotte, NC': 12, 'Washington D.C.': 12, 'San Antonio, TX': 7, 'Phoenix, AZ': 9, 'San Francisco, CA': 8, 'Memphis, TN': 14, 'Los Angeles, CA': 11, 'New York, NY': 9, 'Denver, CO': 4, 'Omaha, NE': 9, 'Seattle, WA': 1, 'Portland, OR': 2, 'Tulsa, OK': 11, 'Austin, TX': 8, 'Minneapolis, MN': 4, 'Colorado Springs, CO': 6, 'Fort Worth, TX': 11, 'Indianapolis, IN': 13, 'Las Vegas, NV': 12, 'Detroit, MI': 11, 'Nashville-Davidson, TN': 13, 'Milwaukee, WI': 10,

In [47]:
df['clustering'] = pd.Series(nx.clustering(G))
df['degree'] = pd.Series(dict(G.degree()))

df

| | location | population | clustering | degree |
|---|---|---|---|---|
| El Paso, TX | (-106, 31) | 674433 | 0.7 | 5 |
| Long Beach, CA | (-118, 33) | 469428 | 0.745455 | 11 |
| Dallas, TX | (-96, 32) | 1257676 | 0.763636 | 11 |
| Oakland, CA | (-122, 37) | 406253 | 1.0 | 8 |
| Albuquerque, NM | (-106, 35) | 556495 | 0.52381 | 7 |
| Baltimore, MD | (-76, 39) | 622104 | 0.8 | 10 |
| Raleigh, NC | (-78, 35) | 431746 | 0.615385 | 13 |
| Mesa, AZ | (-111, 33) | 457587 | 0.75 | 8 |
| Arlington, TX | (-97, 32) | 379577 | 0.763636 | 11 |
| Sacramento, CA | (-121, 38) | 479686 | 0.777778 | 9 |

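The same pattern extends to other node-level measures. The sketch below is not part of the original notebook: it adds three centrality scores on a toy path graph and then converts the numeric columns into an array, which is one plausible form for the final feature matrix. The column names are hypothetical.

```python
import networkx as nx
import pandas as pd

# Toy path graph a - b - c (not the city graph)
G_toy = nx.Graph([("a", "b"), ("b", "c")])

feat = pd.DataFrame(index=list(G_toy.nodes()))

# Each of these functions returns a dict keyed by node,
# so pd.Series aligns the values with the DataFrame index
feat["degree_centrality"] = pd.Series(nx.degree_centrality(G_toy))
feat["closeness"] = pd.Series(nx.closeness_centrality(G_toy))
feat["betweenness"] = pd.Series(nx.betweenness_centrality(G_toy))

# Numeric feature matrix: one row per node, one column per feature
X = feat.to_numpy()
print(feat)
```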

## Edge based features

In [49]:
G.edges(data=True)

EdgeDataView([('El Paso, TX', 'Albuquerque, NM', {'weight': 367.88584356108345}), ('El Paso, TX', 'Mesa, AZ', {'weight': 536.256659972679}), ('El Paso, TX', 'Tucson, AZ', {'weight': 425.41386739988224}), ('El Paso, TX', 'Phoenix, AZ', {'weight': 558.7835703774161}), ('El Paso, TX', 'Colorado Springs, CO', {'weight': 797.7517116740046}), ('Long Beach, CA', 'Oakland, CA', {'weight': 579.5829987228403}), ('Long Beach, CA', 'Mesa, AZ', {'weight': 590.156204210031}), ('Long Beach, CA', 'Sacramento, CA', {'weight': 611.0649790490104}), ('Long Beach, CA', 'Tucson, AZ', {'weight': 698.6566667728368}), ('Long Beach, CA', 'San Jose, CA', {'weight': 518.2330606219175}), ('Long Beach, CA', 'Fresno, CA', {'weight': 360.4704577972272}), ('Long Beach, CA', 'San Diego, CA', {'weight': 151.45008247402757}), ('Long Beach, CA', 'Phoenix, AZ', {'weight': 567.4125390872786}), ('Long Beach, CA', 'San Francisco, CA', {'weight': 585.6985397766858}), ('Long Beach, CA', 'Los Angeles, CA', {'weight': 31.69419563

In [50]:
# Initialize the dataframe, using the edges as the index
df = pd.DataFrame(index=G.edges())

### Extracting attributes

Using `nx.get_edge_attributes`, it's easy to extract the edge attributes in the graph into DataFrame columns.

In [51]:
df['weight'] = pd.Series(nx.get_edge_attributes(G, 'weight'))

df

| | | weight |
|---|---|---|
| El Paso, TX | Albuquerque, NM | 367.885844 |
| El Paso, TX | Mesa, AZ | 536.256660 |
| El Paso, TX | Tucson, AZ | 425.413867 |
| El Paso, TX | Phoenix, AZ | 558.783570 |
| El Paso, TX | Colorado Springs, CO | 797.751712 |
| Long Beach, CA | Oakland, CA | 579.582999 |
| Long Beach, CA | Mesa, AZ | 590.156204 |
| Long Beach, CA | Sacramento, CA | 611.064979 |
| Long Beach, CA | Tucson, AZ | 698.656667 |
| Long Beach, CA | San Jose, CA | 518.233061 |

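For comparison, networkx 2.x also provides `nx.to_pandas_edgelist`, which returns the edges and all their attributes as a DataFrame in one call. The sketch below uses a toy graph with two of the edges shown above. Setting the index to the `source` and `target` columns is a choice made here to mirror the edge-indexed DataFrame, not something the notebook does.

```python
import networkx as nx

# Toy graph with two of the edges from the data above
G_toy = nx.Graph()
G_toy.add_edge("El Paso, TX", "Albuquerque, NM", weight=367.885844)
G_toy.add_edge("El Paso, TX", "Mesa, AZ", weight=536.256660)

# Columns are "source", "target", plus one column per edge attribute
edges_df = nx.to_pandas_edgelist(G_toy)

# Index by the node pair, matching the edge-indexed layout used in this notebook
edges_df = edges_df.set_index(["source", "target"])
print(edges_df)
```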

### Creating edge based features

Many networkx edge-level functions return nested data structures. `nx.preferential_attachment`, for example, yields `(u, v, score)` tuples, so we can extract the score with a list comprehension.

In [52]:
df['preferential attachment'] = [i[2] for i in nx.preferential_attachment(G, df.index)]

df

| | | weight | preferential attachment |
|---|---|---|---|
| El Paso, TX | Albuquerque, NM | 367.885844 | 35 |
| El Paso, TX | Mesa, AZ | 536.256660 | 40 |
| El Paso, TX | Tucson, AZ | 425.413867 | 40 |
| El Paso, TX | Phoenix, AZ | 558.783570 | 45 |
| El Paso, TX | Colorado Springs, CO | 797.751712 | 30 |
| Long Beach, CA | Oakland, CA | 579.582999 | 88 |
| Long Beach, CA | Mesa, AZ | 590.156204 | 88 |
| Long Beach, CA | Sacramento, CA | 611.064979 | 99 |
| Long Beach, CA | Tucson, AZ | 698.656667 | 88 |
| Long Beach, CA | San Jose, CA | 518.233061 | 88 |

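The preferential attachment score of a pair is the product of the two node degrees: El Paso (degree 5) and Albuquerque (degree 7) give 5 × 7 = 35, which matches the first row. The list comprehension relies on the scores coming back in the same order as the index. A slightly more defensive sketch, shown here on a toy graph with hypothetical node names, keys each score by its node pair before assigning it.

```python
import networkx as nx
import pandas as pd

# Toy graph: triangle a-b-c plus a pendant node d attached to c
G_toy = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
edges = list(G_toy.edges())

# Key each score by its (u, v) pair instead of relying on iteration order
scores = {(u, v): p for u, v, p in nx.preferential_attachment(G_toy, edges)}

edge_df = pd.DataFrame(index=pd.MultiIndex.from_tuples(edges))
edge_df["preferential attachment"] = [scores[pair] for pair in edge_df.index]

# Each score equals degree(u) * degree(v)
for (u, v), p in scores.items():
    assert p == G_toy.degree(u) * G_toy.degree(v)

print(edge_df)
```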

When a function expects the two nodes as separate arguments, we can map a lambda function over the index.

In [53]:
df['Common Neighbors'] = df.index.map(lambda city: len(list(nx.common_neighbors(G, city[0], city[1]))))

df

| | | weight | preferential attachment | Common Neighbors |
|---|---|---|---|---|
| El Paso, TX | Albuquerque, NM | 367.885844 | 35 | 4 |
| El Paso, TX | Mesa, AZ | 536.256660 | 40 | 3 |
| El Paso, TX | Tucson, AZ | 425.413867 | 40 | 3 |
| El Paso, TX | Phoenix, AZ | 558.783570 | 45 | 3 |
| El Paso, TX | Colorado Springs, CO | 797.751712 | 30 | 1 |
| Long Beach, CA | Oakland, CA | 579.582999 | 88 | 7 |
| Long Beach, CA | Mesa, AZ | 590.156204 | 88 | 5 |
| Long Beach, CA | Sacramento, CA | 611.064979 | 99 | 7 |
| Long Beach, CA | Tucson, AZ | 698.656667 | 88 | 5 |
| Long Beach, CA | San Jose, CA | 518.233061 | 88 | 7 |

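Both patterns carry over to other pairwise measures. As an illustration that is not part of the original notebook, the sketch below adds `nx.jaccard_coefficient` and `nx.resource_allocation_index`, which yield `(u, v, score)` tuples like `nx.preferential_attachment`, next to a common-neighbor count on a toy graph. The node names and column names are hypothetical.

```python
import networkx as nx
import pandas as pd

# Toy graph: triangle a-b-c plus a pendant node d attached to c
G_toy = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
pairs = list(G_toy.edges())

link_df = pd.DataFrame(index=pd.MultiIndex.from_tuples(pairs))

# Functions that take a list of pairs and yield (u, v, score) tuples
link_df["jaccard"] = [s for _, _, s in nx.jaccard_coefficient(G_toy, pairs)]
link_df["resource allocation"] = [
    s for _, _, s in nx.resource_allocation_index(G_toy, pairs)
]

# A function that takes the two nodes as separate arguments
link_df["common neighbors"] = [
    len(list(nx.common_neighbors(G_toy, u, v))) for u, v in pairs
]

print(link_df)
```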