# Chicago Regional Traffic EDA

### Import Packages

In [40]:
import pandas as pd
import numpy as np
import plotly.express as px

In [41]:
## Read in data:
flow_df = pd.read_csv("RawData/ChicagoSketch_flow.tntp", sep = '\t')
net_df = pd.read_csv("RawData/ChicagoSketch_net.tntp", sep = '\t', skiprows = 4)
node_df = pd.read_csv("RawData/ChicagoSketch_node.tntp", sep = '\t')
trips_df = pd.read_csv("RawData/ChicagoSketch_trips.tntp", sep = '\t')

### Flow_df

In [6]:
## Explore flow_df
flow_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2950 entries, 0 to 2949
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   From     2950 non-null   int64  
 1   To       2950 non-null   int64  
 2   Volume   2950 non-null   float64
 3   Cost     2950 non-null   float64
dtypes: float64(2), int64(2)
memory usage: 92.3 KB


This tells us that there are 2950 edges in the Chicago transportation network graph. The From/To distinction suggests that these are **directed** edges.
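One way to test the "directed" reading is to ask how many From→To pairs also appear reversed (To→From). A share near 1 would mean most roads are modeled as two opposite directed edges; a share near 0 would mean mostly one-way links. The helper `reverse_edge_share` below is a hypothetical sketch. It assumes the columns are named exactly `From` and `To`, so adjust the names if the raw header carries trailing whitespace.

```python
import pandas as pd

def reverse_edge_share(edges: pd.DataFrame, src: str = "From", dst: str = "To") -> float:
    """Fraction of distinct (src, dst) pairs whose reverse (dst, src) also appears."""
    pairs = set(zip(edges[src], edges[dst]))
    if not pairs:
        return 0.0
    # Count pairs whose mirror image is also present in the edge set
    reversed_hits = sum((b, a) in pairs for a, b in pairs)
    return reversed_hits / len(pairs)

# Toy example (hypothetical data, not from the Chicago files)
toy = pd.DataFrame({"From": [1, 547, 2], "To": [547, 1, 548]})
print(reverse_edge_share(toy))  # 2 of the 3 pairs have a reverse
```

On the real data this would be `reverse_edge_share(flow_df)`.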

In [7]:
flow_df.head()

|   | From | To | Volume | Cost |
|---|------|----|--------|------|
| 0 | 1 | 547 | 4989.13 | 0.034507 |
| 1 | 2 | 548 | 6719.41 | 0.034507 |
| 2 | 3 | 549 | 10095.53 | 0.034507 |
| 3 | 4 | 550 | 9444.62 | 0.034507 |
| 4 | 5 | 551 | 17223.82 | 0.034507 |


From the initial `head()` output and the GitHub README, it looks as though From and To are nodes in the Chicago traffic network. Volume seems to describe how much traffic travels along the edge from From to To. The meaning of Cost is unclear.

In [18]:
flow_df[['Volume ','Cost ']].describe()

|       | Volume | Cost |
|-------|--------|------|
| count | 2950.0 | 2950.0 |
| mean | 2399.298662 | 3.666383 |
| std | 2661.518472 | 3.116103 |
| min | 0.0 | 0.034507 |
| 25% | 488.845 | 0.034507 |
| 50% | 1507.921298 | 3.910887 |
| 75% | 3470.007065 | 5.427562 |
| max | 22380.62 | 25.755935 |


Volume and Cost both appear to be some form of weight that can be applied to the edges, but their specific meaning is still unclear.
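The selection `flow_df[['Volume ','Cost ']]` only works with trailing spaces in the names, which suggests the raw header of the flow file has whitespace after each column name. A small sketch for normalizing the names so they can be typed plainly. `strip_column_names` is a hypothetical helper, and the toy frame only mimics that header.

```python
import pandas as pd

def strip_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with leading/trailing whitespace removed from column names."""
    out = df.copy()
    out.columns = out.columns.str.strip()
    return out

# Hypothetical frame mimicking a header with trailing spaces
raw = pd.DataFrame({"From ": [1], "To ": [547], "Volume ": [4989.13], "Cost ": [0.034507]})
clean = strip_column_names(raw)
print(list(clean.columns))
print(clean[["Volume", "Cost"]].describe())
```

After `flow_df = strip_column_names(flow_df)`, the earlier summary could be written as `flow_df[['Volume', 'Cost']].describe()`.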

### Net_df

In [24]:
net_df.head()

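|   | `<ORIGINAL HEADER>~` | tail node | head node | capacity (veh/h) | length (miles) | fftt(min) | B | Power | speed limit (mph) | toll (cents) | link type | Unnamed: 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | `<END OF METADATA>` | | | | | | | | | | | |
| 1 | ~ | init_node | term_node | capacity | length | free_flow_time | b | power | speed | toll | link_type | ; |
| 2 | | 1 | 547 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 3 | | 2 | 548 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 4 | | 3 | 549 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |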


In [44]:
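net_df = pd.read_csv("RawData/ChicagoSketch_net.tntp", sep = '\t', skiprows = 4)
# Rows 0 and 1 are leftover metadata: the <END OF METADATA> marker and the '~' header line
net_df.drop([0,1], inplace = True)
# The first column only held the '<ORIGINAL HEADER>~' / '~' markers
net_df.drop(columns = net_df.columns[0], inplace = True)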
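Hard-coding `skiprows = 4` and dropping rows by position works for this file but is brittle. A sketch of a more general reader, assuming the layout shown in the first `net_df.head()` output: metadata lines, a `<END OF METADATA>` marker, a `~`-prefixed header line, then whitespace-separated rows ending in `;`. `read_tntp_table` is a hypothetical helper, not part of pandas.

```python
import io
import pandas as pd

def read_tntp_table(lines, marker="<END OF METADATA>"):
    """Parse a TNTP-style table (hypothetical helper, assumes the layout seen above).

    Skips everything up to `marker`, uses the first line starting with '~' as the
    header, and reads the whitespace-separated rows after it. The trailing ';'
    on each line is dropped, and all values are converted to numbers.
    """
    lines = [line.rstrip("\n") for line in lines]
    start = next((i for i, line in enumerate(lines) if marker in line), None)
    if start is None:
        raise ValueError(f"marker {marker!r} not found")
    body = [line for line in lines[start + 1:] if line.strip()]
    header_idx = next((i for i, line in enumerate(body) if line.lstrip().startswith("~")), None)
    if header_idx is None:
        raise ValueError("no '~' header line after the metadata marker")
    # "~\tinit_node\tterm_node ... ;" -> ["init_node", "term_node", ...]
    columns = body[header_idx].strip().lstrip("~").rstrip(";").split()
    # "\t1\t547\t49500 ... ;" -> ["1", "547", "49500", ...]
    rows = [line.strip().rstrip(";").split() for line in body[header_idx + 1:]]
    return pd.DataFrame(rows, columns=columns).apply(pd.to_numeric)

# Toy input mimicking the layout printed above (hypothetical values)
sample = io.StringIO(
    "<NUMBER OF LINKS> 2\n"
    "<END OF METADATA>\n"
    "\n"
    "~\tinit_node\tterm_node\tcapacity\t;\n"
    "\t1\t547\t49500\t;\n"
    "\t2\t548\t49500\t;\n"
)
print(read_tntp_table(sample))
```

On the real file this would be `with open("RawData/ChicagoSketch_net.tntp") as f: net_clean = read_tntp_table(f)`. The resulting columns use the file's `~` header names (`init_node`, `term_node`, ...) rather than the `<ORIGINAL HEADER>` names, and they come back numeric instead of `object`.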

In [45]:
net_df.head(10)

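|    | tail node | head node | capacity (veh/h) | length (miles) | fftt(min) | B | Power | speed limit (mph) | toll (cents) | link type | Unnamed: 11 |
|----|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1 | 547 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 3 | 2 | 548 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 4 | 3 | 549 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 5 | 4 | 550 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 6 | 5 | 551 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 7 | 6 | 552 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 8 | 7 | 553 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 9 | 8 | 554 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 10 | 9 | 555 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |
| 11 | 10 | 556 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 3 | ; |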


In [46]:
net_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2950 entries, 2 to 2951
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   tail node          2950 non-null   object
 1   head node          2950 non-null   object
 2   capacity (veh/h)   2950 non-null   object
 3   length (miles)     2950 non-null   object
 4   fftt(min)          2950 non-null   object
 5   B                  2950 non-null   object
 6   Power              2950 non-null   object
 7   speed limit (mph)  2950 non-null   object
 8   toll (cents)       2950 non-null   object
 9   link type          2950 non-null   object
 10  Unnamed: 11        2950 non-null   object
dtypes: object(11)
memory usage: 253.6+ KB


So it looks like net_df gives more specific context for the 2950 edges in the network, including important information such as:
- Tail Node (Start)
- Head Node (End)
- Capacity - Number of vehicles per hour that the edge can support
- Length - The length of the edge in miles
- Speed limit - Speed limit of the road in miles per hour
- Toll - Whether there is a toll and its cost (in cents)

Some columns have less obvious meanings:
- fftt(min) (the file's own `~` header row labels it `free_flow_time`)
- B (labeled `b` in that header row)
- Power
- link type
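Since the `~` header row names these columns `free_flow_time`, `b` and `power`, fftt is presumably the free-flow travel time in minutes. Assuming the convention documented in the TransportationNetworks README, B and Power are the parameters of the BPR (Bureau of Public Roads) link performance function, travel time = fftt * (1 + B * (volume / capacity) ** Power). A minimal sketch of that formula under this assumption; `bpr_travel_time` is a hypothetical name, and its defaults simply mirror the single B and Power values seen in net_df (0.15 and 4).

```python
import numpy as np

def bpr_travel_time(fftt, volume, capacity, b=0.15, power=4.0):
    """Assumed BPR form: fftt * (1 + b * (volume / capacity) ** power).

    Works element-wise on scalars or array-likes. `capacity` must be nonzero.
    """
    fftt = np.asarray(fftt, dtype=float)
    ratio = np.asarray(volume, dtype=float) / np.asarray(capacity, dtype=float)
    return fftt * (1.0 + b * ratio ** power)

# With volume equal to capacity, the travel time is 15% above free flow
print(bpr_travel_time(10.0, 1000.0, 1000.0))
# Element-wise: an empty link stays at free flow; a link at twice capacity slows down sharply
print(bpr_travel_time([1.0, 2.0], [0.0, 200.0], [100.0, 100.0]))
```

Applying this to the real edges would require numeric columns (net_df currently stores everything as `object`) and joining each edge's Volume from flow_df.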

In [47]:
net_df.describe()

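|        | tail node | head node | capacity (veh/h) | length (miles) | fftt(min) | B | Power | speed limit (mph) | toll (cents) | link type | Unnamed: 11 |
|--------|---|---|---|---|---|---|---|---|---|---|---|
| count | 2950 | 2950 | 2950 | 2950.0 | 2950 | 2950.0 | 2950 | 2950 | 2950 | 2950 | 2950 |
| unique | 933 | 933 | 35 | 597.0 | 557 | 1.0 | 1 | 1 | 1 | 3 | 1 |
| top | 584 | 584 | 49500 | 0.86267 | 0 | 0.15 | 4 | 0 | 0 | 1 | ; |
| freq | 10 | 10 | 774 | 778.0 | 774 | 2950.0 | 2950 | 2950 | 2950 | 1818 | 2950 |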
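Because `info()` shows every column as `object`, `describe()` reports count/unique/top/freq rather than numeric statistics. The unique counts already hint that B, Power, speed limit (mph) and toll (cents) each hold a single value across all 2950 edges. A sketch that converts the columns to numbers and lists the constant ones; `to_numeric_and_constants` is a hypothetical helper, and the toy rows only mimic net_df's shape.

```python
import pandas as pd

def to_numeric_and_constants(df, drop=("Unnamed: 11", ";")):
    """Return (numeric copy of df, list of columns holding a single value).

    Columns listed in `drop` (the trailing ';' delimiter column) are removed first.
    Values that cannot be parsed become NaN because of errors="coerce".
    """
    out = df.drop(columns=[c for c in drop if c in df.columns])
    out = out.apply(pd.to_numeric, errors="coerce")
    constant = [c for c in out.columns if out[c].nunique(dropna=False) == 1]
    return out, constant

# Hypothetical rows shaped like net_df (all strings, as read from the file)
toy = pd.DataFrame({
    "tail node": ["1", "2"],
    "capacity (veh/h)": ["49500", "9000"],
    "B": ["0.15", "0.15"],
    "Unnamed: 11": [";", ";"],
})
num, constant = to_numeric_and_constants(toy)
print(num.dtypes)
print(constant)  # ['B']
```

On the real frame, `num, constant = to_numeric_and_constants(net_df)` followed by `num.describe()` would give means and quantiles instead of string counts.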


### Trips_df

In [56]:
trips_df = pd.read_csv("RawData/ChicagoSketch_trips.tntp", sep = '\t')
trips_df.head(10)

|   | `<NUMBER OF ZONES> 387` |
|---|---|
| 0 | `<TOTAL OD FLOW> 1260907.4400005303` |
| 1 | `<END OF METADATA>` |
| 2 | ~ Vehicle trip table for new sketch network |
| 3 | ~ Generated by CMSC1.5 |
| 4 | ~ 50 Evans iterations, 0.0001 relative gap, 0.... |
| 5 | ~ Date: June 15, 1999 |
| 6 | ~ Hillel Bar-Gera |
| 7 | Origin 1 |
| 8 | 1 : 273.18; 2 : 347.31; ... |
| 9 | 6 : 199.70; 7 : 119.69; ... |


This needs substantial cleaning, and without a data dictionary its meaning is unclear.
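The rows printed above do suggest a structure: an `Origin <id>` line followed by lines of `<destination> : <value>;` pairs. Assuming that reading (it is inferred from the output, not from a data dictionary), the file can be flattened into one row per origin/destination pair. `parse_trip_table` and the column names `origin`, `destination`, `trips` are hypothetical.

```python
import re
import pandas as pd

ORIGIN_RE = re.compile(r"^\s*Origin\s+(\d+)")
PAIR_RE = re.compile(r"(\d+)\s*:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*;")

def parse_trip_table(lines) -> pd.DataFrame:
    """Flatten 'Origin <id>' blocks of '<dest> : <value>;' pairs into a long table."""
    records = []
    origin = None
    for line in lines:
        m = ORIGIN_RE.match(line)
        if m:
            origin = int(m.group(1))
            continue
        if origin is None:
            continue  # still in the metadata / '~' comment header
        for dest, value in PAIR_RE.findall(line):
            records.append((origin, int(dest), float(value)))
    return pd.DataFrame(records, columns=["origin", "destination", "trips"])

# Toy lines mimicking the printed layout (hypothetical values)
sample = [
    "<NUMBER OF ZONES> 2",
    "<END OF METADATA>",
    "~ Date: June 15, 1999",
    "Origin 1",
    "    1 :     273.18;     2 :     347.31;",
    "Origin 2",
    "    1 :     5.0;",
]
print(parse_trip_table(sample))
```

On the real file: `with open("RawData/ChicagoSketch_trips.tntp") as f: od = parse_trip_table(f)`. Comparing `od["trips"].sum()` with the `<TOTAL OD FLOW>` value in the header would be a quick sanity check on the parsing.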

### Node_df

In [57]:
node_df.head()

|   | node | X | Y | ; |
|---|---|---|---|---|
| 0 | 1 | 690309 | 1976022 | ; |
| 1 | 2 | 683649 | 1973025 | ; |
| 2 | 3 | 693306 | 1963368 | ; |
| 3 | 4 | 686313 | 1958373 | ; |
| 4 | 5 | 696636 | 1946718 | ; |


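This dataframe gives the X and Y coordinates for the nodes in the graph, but what coordinate system are they in? They cannot be longitude and latitude in degrees, since values such as 690309 and 1976022 fall far outside the ±180/±90 ranges. They are presumably projected coordinates in some planar system, but the exact projection still needs to be confirmed.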
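One rough way to probe the units is to compute each edge's straight-line length from the node coordinates and compare it with `length (miles)` in net_df. Roads are rarely perfectly straight, so the ratio (coordinate distance / miles) should sit at or somewhat below the true number of coordinate units per mile. A value near 5,280 would point to feet and one near 1,609 to meters. This is an assumption-laden check, not a confirmed projection, and `edge_straight_line_lengths` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def edge_straight_line_lengths(edges, nodes, src="tail node", dst="head node"):
    """Euclidean distance between each edge's endpoints, in the (unknown) units of X and Y.

    `edges` needs tail/head node ids (strings are fine, as in net_df);
    `nodes` needs 'node', 'X' and 'Y' columns, as in node_df.
    """
    xy = nodes.set_index("node")[["X", "Y"]].astype(float)
    tail = xy.loc[edges[src].astype(int).to_numpy()].to_numpy()
    head = xy.loc[edges[dst].astype(int).to_numpy()].to_numpy()
    dx, dy = (head - tail).T
    return pd.Series(np.hypot(dx, dy), index=edges.index)

# Toy example (hypothetical coordinates)
nodes = pd.DataFrame({"node": [1, 2, 3], "X": [0, 3, 0], "Y": [0, 4, 10]})
edges = pd.DataFrame({"tail node": ["1", "2"], "head node": ["2", "3"]})
print(edge_straight_line_lengths(edges, nodes))
```

With the real frames, something like `dist = edge_straight_line_lengths(net_df, node_df)` and then the median of `dist / net_df["length (miles)"].astype(float)` over edges with nonzero length would give the rough units-per-mile estimate.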