# Creating a feature matrix from a networkx graph

In this notebook we will look at a few ways to quickly create a feature matrix from a networkx graph.

In [2]:
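import pickle

import networkx as nx
import pandas as pd

# nx.read_gpickle was removed in networkx 3.0; plain pickle loads the same file
with open('major_us_cities', 'rb') as f:
    G = pickle.load(f)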

## Node based features

In [3]:
# Wrap in list() so the slice also works when nodes() returns a view (networkx 2.0+)
list(G.nodes(data=True))[:10]

[('Dallas, TX', {'location': (-96, 32), 'population': 1257676}),
 ('Tulsa, OK', {'location': (-95, 36), 'population': 398121}),
 ('Las Vegas, NV', {'location': (-115, 36), 'population': 603488}),
 ('Mesa, AZ', {'location': (-111, 33), 'population': 457587}),
 ('Virginia Beach, VA', {'location': (-75, 36), 'population': 448479}),
 ('Phoenix, AZ', {'location': (-112, 33), 'population': 1513367}),
 ('Raleigh, NC', {'location': (-78, 35), 'population': 431746}),
 ('San Jose, CA', {'location': (-121, 37), 'population': 998537}),
 ('Washington D.C.', {'location': (-77, 38), 'population': 646449}),
 ('Sacramento, CA', {'location': (-121, 38), 'population': 479686})]

In [6]:
# Initialize the dataframe, using the nodes as the index
df = pd.DataFrame(index=G.nodes())

### Extracting attributes

`nx.get_node_attributes` returns a dictionary keyed by node, so each node attribute in the graph can be turned into a DataFrame column by wrapping that dictionary in a `pd.Series`.

In [7]:
df['location'] = pd.Series(nx.get_node_attributes(G, 'location'))
df['population'] = pd.Series(nx.get_node_attributes(G, 'population'))

df.head()

| | location | population |
|---|---|---|
| Dallas, TX | (-96, 32) | 1257676 |
| Tulsa, OK | (-95, 36) | 398121 |
| Las Vegas, NV | (-115, 36) | 603488 |
| Mesa, AZ | (-111, 33) | 457587 |
| Virginia Beach, VA | (-75, 36) | 448479 |

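To see how this behaves when some nodes lack an attribute, here is a minimal sketch on a small made-up graph. The `toy` graph, its node names, and its values are illustrative assumptions, not part of the city data. Because the `pd.Series` is aligned on the DataFrame index, a node without the attribute ends up as `NaN`. The sketch also splits the `location` tuple into two numeric columns, which tends to be more convenient in a feature matrix. The source does not state the order of the tuple, so the `longitude` and `latitude` column names are an assumption based on the values shown above.

```python
import networkx as nx
import pandas as pd

# Toy graph standing in for the city graph (names and values are made up)
toy = nx.Graph()
toy.add_node('A', location=(-96, 32), population=100)
toy.add_node('B', location=(-95, 36), population=200)
toy.add_node('C', location=(-115, 36))  # deliberately has no population
toy.add_edge('A', 'B')

toy_df = pd.DataFrame(index=list(toy.nodes()))

# Assignment aligns on the index, so node C gets NaN for population
toy_df['population'] = pd.Series(nx.get_node_attributes(toy, 'population'))

# Split the (x, y) tuple into two numeric columns (column names are assumed)
loc = pd.Series(nx.get_node_attributes(toy, 'location'))
toy_df['longitude'] = loc.map(lambda xy: xy[0])
toy_df['latitude'] = loc.map(lambda xy: xy[1])

print(toy_df)
```

The `NaN` also turns the `population` column into floats, so missing attributes are worth checking for before the matrix is used for modelling.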

### Creating node based features

Most of the networkx functions that compute a value per node return a dictionary keyed by node, which can be added to our DataFrame in the same way.

In [8]:
df['clustering'] = pd.Series(nx.clustering(G))
df['degree'] = pd.Series(dict(G.degree()))  # dict() also handles the degree view returned by networkx 2.0+

df.head()

| | location | population | clustering | degree |
|---|---|---|---|---|
| Dallas, TX | (-96, 32) | 1257676 | 0.763636 | 11 |
| Tulsa, OK | (-95, 36) | 398121 | 0.727273 | 11 |
| Las Vegas, NV | (-115, 36) | 603488 | 0.666667 | 12 |
| Mesa, AZ | (-111, 33) | 457587 | 0.75 | 8 |
| Virginia Beach, VA | (-75, 36) | 448479 | 0.861111 | 9 |

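The same pattern extends to other per-node measures. The sketch below uses a small made-up graph (a triangle with one extra node attached, an assumption chosen so the values are easy to check by hand) and adds two centrality measures that were not used above, `nx.degree_centrality` and `nx.betweenness_centrality`. Both return dictionaries keyed by node.

```python
import networkx as nx
import pandas as pd

# Toy graph: triangle a-b-c plus a pendant node d attached to c (made up)
toy = nx.Graph()
toy.add_edges_from([('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')])

toy_df = pd.DataFrame(index=list(toy.nodes()))

# Each call yields one value per node; pd.Series aligns them on the index
toy_df['degree'] = pd.Series(dict(toy.degree()))
toy_df['clustering'] = pd.Series(nx.clustering(toy))
toy_df['degree_centrality'] = pd.Series(nx.degree_centrality(toy))
toy_df['betweenness'] = pd.Series(nx.betweenness_centrality(toy))

print(toy_df)
```

Node `c` touches every other node, so its degree centrality is 1.0, while only one of the three possible links among its neighbors exists, which gives it a clustering coefficient of 1/3.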

## Edge based features

In [9]:
# Wrap in list() so the slice also works when edges() returns a view (networkx 2.0+)
list(G.edges(data=True))[:10]

[('Dallas, TX', 'Fort Worth, TX', {'weight': 49.93359898977102}),
 ('Dallas, TX', 'Oklahoma City, OK', {'weight': 306.2597807397289}),
 ('Dallas, TX', 'Tulsa, OK', {'weight': 382.46410800205336}),
 ('Dallas, TX', 'San Antonio, TX', {'weight': 406.0065656782324}),
 ('Dallas, TX', 'Memphis, TN', {'weight': 675.3316242841653}),
 ('Dallas, TX', 'Wichita, KS', {'weight': 548.0572491959326}),
 ('Dallas, TX', 'Arlington, TX', {'weight': 29.425931317908415}),
 ('Dallas, TX', 'Houston, TX', {'weight': 361.54185907832755}),
 ('Dallas, TX', 'New Orleans, LA', {'weight': 711.0141469371868}),
 ('Dallas, TX', 'Kansas City, MO', {'weight': 730.377587942699})]

In [10]:
# Initialize the dataframe, using the edges as the index
df = pd.DataFrame(index=G.edges())

### Extracting attributes

Using `nx.get_edge_attributes`, it's easy to extract the edge attributes in the graph into DataFrame columns.

In [9]:
df['weight'] = pd.Series(nx.get_edge_attributes(G, 'weight'))

df.head()

| | weight |
|---|---|
| (El Paso, TX, Albuquerque, NM) | 367.885844 |
| (El Paso, TX, Mesa, AZ) | 536.25666 |
| (El Paso, TX, Tucson, AZ) | 425.413867 |
| (El Paso, TX, Phoenix, AZ) | 558.78357 |
| (El Paso, TX, Colorado Springs, CO) | 797.751712 |

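As a minimal sketch of what is going on here, the example below runs `nx.get_edge_attributes` on a two-edge made-up graph (node names and weights are illustrative assumptions). The dictionary it returns is keyed by `(u, v)` tuples, which is why it lines up with an index built from `G.edges()`. As an alternative that was not used above, networkx 2.x and later also provide `nx.to_pandas_edgelist`, which returns one row per edge with `source` and `target` columns.

```python
import networkx as nx
import pandas as pd

# Toy graph with weighted edges (names and weights are made up)
toy = nx.Graph()
toy.add_edge('A', 'B', weight=1.5)
toy.add_edge('B', 'C', weight=2.5)

# Dictionary keyed by (u, v) tuples
weights = nx.get_edge_attributes(toy, 'weight')
print(weights)

# Index the frame by edge, then align the attribute on that index
edge_df = pd.DataFrame(index=pd.MultiIndex.from_tuples(list(toy.edges())))
edge_df['weight'] = pd.Series(weights)
print(edge_df)

# Alternative: one row per edge, with source/target as ordinary columns
edge_list = nx.to_pandas_edgelist(toy)
print(edge_list)
```

For an undirected graph each edge appears once, in the orientation reported by `edges()`, so looking up `('B', 'A')` in `weights` would fail even though the edge exists.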

### Creating edge based features

Many of the networkx functions related to edges return nested data structures. We can extract the relevant data using a list comprehension.

In [11]:
df['preferential attachment'] = [i[2] for i in nx.preferential_attachment(G, df.index)]
df.head()

| | weight | preferential attachment |
|---|---|---|
| (El Paso, TX, Albuquerque, NM) | 367.885844 | 35 |
| (El Paso, TX, Mesa, AZ) | 536.25666 | 40 |
| (El Paso, TX, Tucson, AZ) | 425.413867 | 40 |
| (El Paso, TX, Phoenix, AZ) | 558.78357 | 45 |
| (El Paso, TX, Colorado Springs, CO) | 797.751712 | 30 |

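To make the nested structure concrete: `nx.preferential_attachment` yields `(u, v, score)` triples, where the score is the product of the two nodes' degrees. The sketch below, on a small made-up graph, collects the triples into a dictionary keyed by `(u, v)` so the column is aligned by edge rather than by position. This is a slightly more defensive variant of the list comprehension above, not the notebook's original approach. `nx.jaccard_coefficient`, which yields triples of the same shape, is included as a second example.

```python
import networkx as nx
import pandas as pd

# Toy graph: triangle a-b-c plus a pendant node d attached to c (made up)
toy = nx.Graph()
toy.add_edges_from([('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')])
pairs = list(toy.edges())

# Both functions yield (u, v, score) triples for the node pairs passed in
pref = {(u, v): p for u, v, p in nx.preferential_attachment(toy, pairs)}
jacc = {(u, v): p for u, v, p in nx.jaccard_coefficient(toy, pairs)}

edge_df = pd.DataFrame(index=pd.MultiIndex.from_tuples(pairs))
edge_df['preferential attachment'] = pd.Series(pref)
edge_df['jaccard'] = pd.Series(jacc)

print(edge_df)
```

For the edge `('c', 'd')` the preferential attachment score is 3 × 1 = 3, since `c` has three neighbors and `d` has one.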

When the function expects two nodes to be passed in, we can map the index through a lambda function.

In [12]:
df['Common Neighbors'] = df.index.map(lambda city: len(list(nx.common_neighbors(G, city[0], city[1]))))
df.head()

| | weight | preferential attachment | Common Neighbors |
|---|---|---|---|
| (El Paso, TX, Albuquerque, NM) | 367.885844 | 35 | 4 |
| (El Paso, TX, Mesa, AZ) | 536.25666 | 40 | 3 |
| (El Paso, TX, Tucson, AZ) | 425.413867 | 40 | 3 |
| (El Paso, TX, Phoenix, AZ) | 558.78357 | 45 | 3 |
| (El Paso, TX, Colorado Springs, CO) | 797.751712 | 30 | 1 |

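The same idea can be written with a small named helper and a list comprehension over the index, which some readers may find easier to follow than a lambda. This is a sketch on a made-up graph, and `n_common_neighbors` is a hypothetical helper introduced here for illustration, not a networkx function.

```python
import networkx as nx
import pandas as pd

# Toy graph: triangle a-b-c plus a pendant node d attached to c (made up)
toy = nx.Graph()
toy.add_edges_from([('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')])


def n_common_neighbors(graph, u, v):
    """Count the nodes adjacent to both u and v (hypothetical helper)."""
    return len(list(nx.common_neighbors(graph, u, v)))


edge_df = pd.DataFrame(index=pd.MultiIndex.from_tuples(list(toy.edges())))

# Iterating over the edge index yields (u, v) tuples
edge_df['Common Neighbors'] = [
    n_common_neighbors(toy, u, v) for u, v in edge_df.index
]

print(edge_df)
```

Each edge inside the triangle shares exactly one neighbor (the third corner), while the pendant edge `('c', 'd')` shares none.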