## Shortest Paths with NetworkX and OpenStreetMap Data  
### Shapefile Method

This implementation takes the following approach:
* Download OpenStreetMap data in ESRI Shapefile format from Geofabrik, a third-party OSM data provider.
* Convert the downloaded data into a NetworkX graph object using the NetworkX read_shp function.
* Use a shortest path algorithm already implemented in NetworkX to find the shortest path between two nodes.
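
The third step can be sketched on a toy graph. This is a minimal illustration rather than the notebook's own code: the coordinates, the `haversine_km` helper, and the `dist_km` edge attribute are all assumptions made for the example. Nodes are keyed by `(lon, lat)` tuples, and each edge is weighted by great-circle distance so that the search minimizes distance rather than hop count.

```python
import math
import networkx as nx

def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Hypothetical (lon, lat) nodes; not taken from the Delaware shapefile.
a, b, c, d = (-75.70, 39.60), (-75.69, 39.61), (-75.60, 39.70), (-75.68, 39.62)

# Undirected toy graph with two routes from a to d: via b (short) or via c (long).
G_toy = nx.Graph()
for u, v in [(a, b), (b, d), (a, c), (c, d)]:
    G_toy.add_edge(u, v, dist_km=haversine_km(u, v))

# Weighted shortest path and its total length in km.
path = nx.shortest_path(G_toy, source=a, target=d, weight="dist_km")
length = nx.shortest_path_length(G_toy, source=a, target=d, weight="dist_km")
```

Passing `weight="dist_km"` is what makes the result a shortest route by distance; without it, NetworkX counts edges instead.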

References:
* Geofabrik: http://download.geofabrik.de/

Conclusions:
* Reading the shapefiles is easy with NetworkX, but the shapefiles don't seem to include street names, which limits the utility of a mapping application.

In [1]:
import networkx as nx
from haversine import haversine
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%pylab inline

Populating the interactive namespace from numpy and matplotlib


### Read a preexisting shapefile

This one is for Delaware, which has one of the smallest shapefiles of any US state.

In [2]:
G = nx.read_shp('/Users/jason/code/msan694/osm_data/delaware-latest.shp')

In [3]:
G.number_of_nodes(), G.number_of_edges()

(91046, 55498)
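
With 91,046 nodes and only 55,498 edges, the graph cannot be connected: a connected graph on n nodes needs at least n - 1 edges. A shortest-path query between two arbitrary nodes may therefore raise `NetworkXNoPath`. The sketch below, on a toy directed graph with made-up node names, shows one way to check reachability first. It assumes a directed graph because `read_shp` returns a `DiGraph`.

```python
import networkx as nx

# Toy directed graph: two separate pieces plus one isolated node.
D = nx.DiGraph()
D.add_edges_from([("a", "b"), ("b", "c"), ("x", "y")])
D.add_node("lonely")

# Fewer than n - 1 edges guarantees the graph is not connected.
n_nodes, n_edges = D.number_of_nodes(), D.number_of_edges()
cannot_be_connected = n_edges < n_nodes - 1

# Count the pieces, ignoring edge direction.
n_components = nx.number_weakly_connected_components(D)

# has_path respects edge direction, so check it before asking for a route.
reachable = nx.has_path(D, "a", "c")
unreachable = nx.has_path(D, "a", "y")
```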

The NetworkX read_shp function assigns node identifiers based on coordinate tuples.

In [4]:
list(G.nodes())[:2]

[(-75.129821, 38.7252019), (-75.692653, 39.614696)]

In [5]:
list(G.edges())[:2]

[((-75.692653, 39.614696), (-75.6863635, 39.6170381)),
 ((-75.7631401, 39.6801052), (-75.7635329, 39.6799814))]

In [6]:
list(G.neighbors((-75.692653, 39.614696)))

[(-75.6863635, 39.6170381)]
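
Because `read_shp` builds a directed graph, `neighbors` reports only the nodes reachable along outgoing edges. The toy example below (node names are placeholders) contrasts successors, predecessors, and the undirected view. Whether dropping direction is appropriate depends on how one-way streets should be handled, which is not examined here.

```python
import networkx as nx

# Toy directed graph: p points to q, and r points to p.
D = nx.DiGraph()
D.add_edges_from([("p", "q"), ("r", "p")])

# On a DiGraph, neighbors() is the same as successors().
out_neighbors = list(D.neighbors("p"))
in_neighbors = list(D.predecessors("p"))

# The undirected copy treats every edge as two-way.
U = D.to_undirected()
all_neighbors = sorted(U.neighbors("p"))
```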

In this shapefile, about 13% of the nodes (12,031 of 91,046) have some associated data.

In [7]:
# Keep only the (node, attributes) pairs whose attribute dict is non-empty.
has_data = [x for x in G.nodes(data=True) if x[1]]
len(has_data)

12031

In [8]:
has_data[:1]

[((-75.7767609, 39.6231959),
  {'ShpName': 'points',
   'name': None,
   'osm_id': '620659508',
   'timestamp': '2010-07-04T01:29:22Z',
   'type': 'turning_circle'})]

Unfortunately, the node data doesn't seem to include things like street names. For instance, according to Google Maps, the node with OSM ID 620659508 is on Glen Avon Drive, yet its name field is empty.
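
The cells above inspect node data only. In `read_shp`, line features become edges and their fields are stored as edge attributes, so any fields attached to lines would appear there rather than on nodes. A sketch of that check follows, using a toy graph with a hypothetical `name` attribute. Whether this shapefile's edges actually carry such a field is not verified here.

```python
import networkx as nx

# Toy stand-in for a graph built from line features; attribute values are made up.
D = nx.DiGraph()
D.add_edge((0.0, 0.0), (1.0, 0.0), name="Example Street", type="residential")
D.add_edge((1.0, 0.0), (1.0, 1.0), name=None, type="service")

# Collect edges whose data dict has a non-empty 'name'.
named_edges = [(u, v, attrs["name"])
               for u, v, attrs in D.edges(data=True) if attrs.get("name")]

# List which attribute keys appear across all edges.
keys_seen = sorted({k for _, _, attrs in D.edges(data=True) for k in attrs})
```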

In [9]:
# Pull one attribute from each node's data dict.
ShpNames = [x[1]['ShpName'] for x in has_data]
types = [x[1]['type'] for x in has_data]
names = [x[1]['name'] for x in has_data]

In [10]:
pd.Series(ShpNames).value_counts()

points    10374
places     1657
dtype: int64

In [11]:
pd.Series(types).value_counts().head(5)

turning_circle     4083
hamlet             1557
traffic_signals    1490
crossing           1466
utility_pole        778
dtype: int64

In [12]:
pd.Series(names).value_counts().head(5)

Wawa              8
Grotto Pizza      7
Walgreens         6
Royal Farms       5
Dunkin' Donuts    5
dtype: int64