## High-resolution freeway data from USDOT

- This notebook is based on the one that loads road data for San Diego (SD) County, modified to obtain USDOT road data for Santa Clara County (FIPS 06085).
- It loads a representation of the interstates and state routes covered by PEMS sensors into the AWS RDS database.
- The motivation is to provide visually more appealing representations of traffic conditions and forecasts, especially for segments with sparse sensor coverage or at high zoom levels.
- The data only provide coordinates and road names.
- Although both directions are represented by individual strands, the labels do not distinguish between them.
- Some freeways consist of multiple segments, which are not stored in consecutive order.
- This notebook manually assigns the road segments (identified by LINEARID) that belong to a given freeway and direction.
- Segments are then re-ordered from south to north or west to east, consistent with the direction of increasing mile markers.
- Relative positions are calculated using the Haversine distance between adjacent points. These are converted to absolute positions (TBD how).
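
The relative-position step in the last bullet can be illustrated with a minimal, vectorized sketch. It is not the function used later in this notebook; the name `cumulative_miles` and the constants are assumptions chosen for illustration, and a spherical Earth with a 6371 km radius is assumed.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
KM_PER_MILE = 1.609344


def cumulative_miles(lat_deg, lon_deg):
    """Cumulative Haversine distance (miles) along an ordered list of points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = np.diff(lat)
    dlon = np.diff(lon)
    # Haversine formula applied to each pair of adjacent points
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(dlon / 2) ** 2
    step_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    # The first point sits at relative position 0
    return np.concatenate([[0.0], np.cumsum(step_km)]) / KM_PER_MILE


# Three points spaced one degree of latitude apart
pm = cumulative_miles([37.0, 38.0, 39.0], [-121.9, -121.9, -121.9])
print(pm)
```

Computing all steps at once avoids a Python loop over rows, which matters for freeways with thousands of vertices.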

Data source:
https://data-usdot.opendata.arcgis.com/datasets/census-tiger-line-roads
https://www2.census.gov/geo/tiger/TIGER2019/ROADS/

In [1]:
!ls *.shp

tl_2019_01001_roads.shp  tl_2019_06073_roads.shp


In [3]:
#Download data for SC county, FIPS code 06085
#!curl -O https://www2.census.gov/geo/tiger/TIGER2019/ROADS/tl_2019_06085_roads.zip
#!unzip tl_2019_06085_roads.zip

Archive:  tl_2019_06085_roads.zip
 extracting: tl_2019_06085_roads.cpg  
  inflating: tl_2019_06085_roads.dbf  
  inflating: tl_2019_06085_roads.prj  
  inflating: tl_2019_06085_roads.shp  
  inflating: tl_2019_06085_roads.shp.ea.iso.xml  
  inflating: tl_2019_06085_roads.shp.iso.xml  
  inflating: tl_2019_06085_roads.shx  


In [5]:
fips='06085'
myshp = open("tl_2019_%s_roads.shp" % fips, "rb")
mydbf = open("tl_2019_%s_roads.dbf" % fips, "rb")

In [6]:
import shapefile
r = shapefile.Reader(shp=myshp, dbf=mydbf)

In [7]:
r.shapeRecords

<bound method Reader.shapeRecords of <shapefile.Reader object at 0x7f05d538b250>>

In [8]:
#for a in r.iterRecords():
#    print(a)
sh=[a for a in r.iterShapeRecords()]
#print(r)

In [12]:
import plotly.express as px
import plotly
plotly.io.templates.default='plotly_dark'

In [13]:
#import pandas as pd
#adf=pd.DataFrame(columns=["lat","lon","name"])

#p=0

import pandas as pd

lons=[]
lats=[]
roads=[]
lineids=[]
inf3=[]
inf4=[]

for ash in sh:
    #lineid=ash.__geo_interface__["properties"]["LINEARID"]
    #roadname=ash.__geo_interface__["properties"]["FULLNAME"]
    lineid=ash.record[0]
    roadname=str(ash.record[1])
    tmp3=str(ash.record[2])
    tmp4=str(ash.record[3])
    for pt in ash.shape.__geo_interface__["coordinates"]:
        lons.append(pt[0])
        lats.append(pt[1])
        roads.append(roadname)
        lineids.append(lineid)
        inf3.append(tmp3)
        inf4.append(tmp4)
    
adf=pd.DataFrame({"lat": lats, "lon": lons, "name": roads, "LINEARID": lineids, "inf3": inf3, "inf4": inf4})
adf.head()

,lat,lon,name,LINEARID,inf3,inf4
0,37.353499,-121.828215,Capitol Expy Rmp,1103341803137,M,S1400
1,37.353464,-121.828225,Capitol Expy Rmp,1103341803137,M,S1400
2,37.353434,-121.828216,Capitol Expy Rmp,1103341803137,M,S1400
3,37.353189,-121.828048,Capitol Expy Rmp,1103341803137,M,S1400
4,37.352796,-121.827778,Capitol Expy Rmp,1103341803137,M,S1400


In [59]:
#adf[adf["name"].str.contains('85')]

It looks like interstates are named `I- number`, e.g. I- 805, I- 5, with a space between the hyphen and number

In [62]:
def find_fwy(expr):
    return list(adf[adf["name"].str.contains(expr)]["name"].unique())

find_fwy('101')

['US Hwy 101 Bus', 'US Hwy 101']

In [28]:
def plot_fwy(fwlabel):
    #plot all segments whose name matches fwlabel exactly, one color per LINEARID
    #sel=adf[adf["name"].str.contains(fwlabel)]
    sel=adf[adf["name"] == fwlabel]
    fig=px.line_mapbox(sel, lat="lat", lon="lon", line_group="LINEARID",
                   mapbox_style="carto-darkmatter", color="LINEARID")
    
    print(sel["LINEARID"].unique())
    print(sel["name"].unique())
    
    return fig 
            
plot_fwy("State Rte 85")

['1108296281873' '1108296281823' '1108296281825' '1108296281874'
 '1108296281824' '1108296281872']
['State Rte 85']


In [34]:
#list all interstates and state routes in the road shapefile
print(adf[adf["inf3"] == "I"]["name"].unique())
print(adf[adf["inf3"] == "S"]["name"].unique())

['I- 280' 'I- 880' 'I- 680']
['State Rte 152' 'State Rte 156' 'State Rte 35' 'State Rte 237'
 'State Rte 17' 'State Rte 9' 'State Rte 85' 'State Rte 25'
 'State Rte 130' 'State Hwy 35' 'State Rte 82' 'State Rte 87'
 'State Hwy 82']


## Tasks:  

- manually / automatically identify which LINEARIDs belong to which freeway and direction in the database
- sometimes (such as in the I-15) the LINEARID changes for the same fwy and direction
- come up with a list of coordinates for each fwy-direction combination in the database
- interpolate abs_pm into those segments

In [36]:
!cp ../traffic_viz_layer/SC_fwys.pkl .

In [42]:
import pickle
fwdir=pickle.load(open("SC_fwys.pkl","rb"))

In [109]:
fwdir

['101N',
 '85S',
 '880S',
 '87S',
 '880N',
 '85N',
 '101S',
 '17S',
 '280S',
 '237W',
 '237E',
 '87N',
 '17N',
 '280N',
 '680S',
 '680N']

In [110]:
#dataframe containing segment ids from the TIGER dataset for each fwy and direction
liddf=pd.DataFrame(columns=["FWY","DIR","IDS"])
for n,fwd in enumerate(fwdir):
    liddf.loc[n,"DIR"] = fwd[-1] 
    liddf.loc[n,"FWY"] = fwd[:-1]   #everything before the trailing direction letter
    liddf.loc[n,"IDS"] = []
liddf.sort_values(by="FWY", inplace=True)
#liddf.reset_index(inplace=True, drop=True)
liddf.set_index(["FWY","DIR"], inplace=True)

In the case of SC county, this assignment could easily have been automated, as there are only two relevant strands for each FWY:
- Pick the two longest segments
- If it's an E-W route, compute the mean latitudes of both segments and assign the one further north to WB, the other to EB.
- Use the analogous procedure for N-S routes (compare mean longitudes; the one further east is NB).
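
The automated assignment outlined above could look roughly like the sketch below. It was not used in this notebook. The function name `assign_strands` is hypothetical, the number of vertices per LINEARID is used as a stand-in for segment length (an assumption), and right-hand traffic is assumed, so that the northbound strand lies east of the southbound one and the westbound strand lies north of the eastbound one.

```python
import pandas as pd


def assign_strands(points, orientation):
    """Map the two largest LINEARIDs of one freeway to travel directions.

    points: one row per vertex with columns LINEARID, lat, lon.
    orientation: "NS" or "EW".
    """
    # Assumption: vertex count is a usable proxy for segment length
    sizes = points.groupby("LINEARID").size()
    top2 = sizes.nlargest(2).index
    means = (points[points["LINEARID"].isin(top2)]
             .groupby("LINEARID")[["lat", "lon"]].mean())
    if orientation == "NS":
        ordered = means.sort_values("lon").index   # west to east
        return {"S": ordered[0], "N": ordered[1]}
    if orientation == "EW":
        ordered = means.sort_values("lat").index   # south to north
        return {"E": ordered[0], "W": ordered[1]}
    raise ValueError("orientation must be 'NS' or 'EW'")


# Synthetic example: two long strands and one short ramp
demo = pd.DataFrame({
    "LINEARID": ["a"] * 5 + ["b"] * 5 + ["c"] * 2,
    "lat": [37.0, 37.1, 37.2, 37.3, 37.4] * 2 + [37.2, 37.21],
    "lon": [-121.90] * 5 + [-121.89] * 5 + [-121.95] * 2,
})
print(assign_strands(demo, "NS"))
```

Routes whose strands are split across several LINEARIDs (as with 101 and 17 in the manual assignment) would need extra handling.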

In [175]:
liddf.loc[('101','S'),'IDS'] = ['1108296281878','11010927752689']
liddf.loc[('101','N'),'IDS'] = ['1108296283245','1104485774015']
liddf.loc[('17','S'),'IDS'] = ['1108296282682', '1104977742909']
liddf.loc[('17','N'),'IDS'] = ['1108296282710', '1104483532376']
liddf.loc[('237','W'),'IDS'] = ['11010927738162']
liddf.loc[('237','E'),'IDS'] = ['11010927734054']
liddf.loc[('280','S'),'IDS'] = ['1104493001530']
liddf.loc[('280','N'),'IDS'] = ['1104493001502']
liddf.loc[('680','S'),'IDS'] = ['1104486061767']
liddf.loc[('680','N'),'IDS'] = ['1104483532103']
liddf.loc[('85','S'),'IDS'] = ['1108296281825']
liddf.loc[('85','N'),'IDS'] = ['1108296281874']
liddf.loc[('87','S'),'IDS'] = ['11010927726196']
liddf.loc[('87','N'),'IDS'] = ['11010927727383']
liddf.loc[('880','S'),'IDS'] = ['1104483422751']
liddf.loc[('880','N'),'IDS'] = ['1104485866616']

In [171]:
liddf.tail(n=10)

FWY,DIR,IDS
280,S,[1104493001530]
280,N,[1104493001502]
680,S,[1104486061767]
680,N,[1104483532103]
85,S,[1108296281825]
85,N,[1108296281874]
87,S,[11010927726196]
87,N,[11010927727383]
880,S,[1104483422751]
880,N,[1104485866616]


In [133]:
#fwys=liddf.index.get_level_values(0).unique()

liddf.reset_index()["FWY"].unique()

array(['101', '17', '237', '280', '680', '85', '87', '880'], dtype=object)

In [168]:
find_fwy('17')

['N 17th St', 'S 17th St', 'State Rte 17']
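
The substring match also returns streets such as `N 17th St`. A stricter matcher could be sketched as follows; the function name `find_route` is hypothetical, and the list of prefixes is an assumption based only on the road names printed earlier in this notebook (`I- 280`, `US Hwy 101`, `State Rte 85`, `State Hwy 35`).

```python
import re


def find_route(names, number):
    """Return unique road names that are exactly '<prefix> <number>'."""
    pattern = re.compile(
        r"^(I-|US Hwy|State Rte|State Hwy)\s*%s$" % re.escape(str(number))
    )
    unique = dict.fromkeys(names)   # de-duplicate, keep first-seen order
    return [n for n in unique if isinstance(n, str) and pattern.match(n)]


sample = ["N 17th St", "S 17th St", "State Rte 17", "State Rte 17",
          "US Hwy 101 Bus", "US Hwy 101", "I- 280"]
print(find_route(sample, 17))
print(find_route(sample, 101))
```

With the notebook's data this would be called as `find_route(adf["name"], 17)`.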

In [169]:
#plot_fwy("I- 8")
plot_fwy("State Rte 17")

['1108296282682' '1104977742908' '1108296282710' '1104977742909'
 '1104483532376']
['State Rte 17']


In [136]:
liddf.to_pickle("TIGER_PEMS_SC_IDS.pkl")

In [137]:
list([1,2]) + list([3,4])

[1, 2, 3, 4]

In [138]:
from sklearn.metrics.pairwise import haversine_distances
from math import radians

In [139]:
#haversine_distances?

In [172]:
earth_radius=6371.      #km
km_per_mile=1.609344
def get_relative_pm(indf):
    #relative position in miles, accumulated along the ordered points
    indf["rel_pm"] = 0.
    for n in indf.index[1:]:
        cp=[radians(indf.loc[n,"latitude"]), radians(indf.loc[n,"longitude"])]
        pp=[radians(indf.loc[n-1,"latitude"]), radians(indf.loc[n-1,"longitude"])]
        #haversine_distances returns a 2x2 matrix in radians; scale to km, then miles
        result=haversine_distances([cp, pp]) * earth_radius / km_per_mile
        indf.loc[n,"rel_pm"] = indf.loc[n-1,"rel_pm"] + result[1][0]
    return indf

We need a way to organize the different segments into a coherent route.
Options:
1. Treat all points as independent (ignore LINEARID) and solve as a TSP
2. Start with the southernmost point for a SB/NB fwy and repeatedly find the closest neighbor
3. Figure out which segment is furthest south / west.  Take it as is if its first point is furthest south or west, otherwise flipud.
   Then take the next segment to the north / east.  Flip if needed.  Continue.
   
Using option number 3 here
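
For comparison, option 2 could be sketched as a greedy nearest-neighbor walk. This is not what the notebook does; the function name `greedy_order` is hypothetical, and distances use a flat-Earth (equirectangular) approximation, which is assumed to be adequate at county scale.

```python
import numpy as np


def greedy_order(lat, lon):
    """Order points by starting at the southernmost one and repeatedly
    jumping to the closest point not yet visited."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    coslat = np.cos(np.radians(lat.mean()))   # shrink longitude differences
    remaining = list(range(len(lat)))
    current = int(np.argmin(lat))             # southernmost point
    order = [current]
    remaining.remove(current)
    while remaining:
        rem = np.array(remaining)
        d2 = (lat[rem] - lat[current]) ** 2 + ((lon[rem] - lon[current]) * coslat) ** 2
        current = int(rem[np.argmin(d2)])
        order.append(current)
        remaining.remove(current)
    return order


# Four shuffled points on a north-south line
print(greedy_order([37.3, 37.1, 37.2, 37.0], [-121.9, -121.9, -121.9, -121.9]))
```

The walk is quadratic in the number of points and can jump between the two strands of a freeway where they run close together, which is one reason to prefer ordering whole segments as in option 3.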

Note: mile marker numbers always increase as you travel east or north (USDOT).  Therefore, absolute marker numbers computed for SD county should already be quite close to the PEMS values, as SD is in the southwestern corner.

In [216]:
import numpy as np

#same as add_rte_coordinates, but ordering segments
#Mile marker numbers always get larger as you travel east or north (usdot)
def add_rte_coordinates2():
    fwy_hr={}
    for rte in liddf.index:
        fwy=rte[0]
        direc=rte[1]
        fwstr=fwy+direc
        
        linearid=[lineind for lineind in liddf.loc[rte,"IDS"]]
        latmean=[adf[adf["LINEARID"] == lineind]["lat"].values.mean() for lineind in liddf.loc[rte,"IDS"]]
        lonmean=[adf[adf["LINEARID"] == lineind]["lon"].values.mean() for lineind in liddf.loc[rte,"IDS"]]
        llmean=pd.DataFrame({"lat": latmean, "lon": lonmean}, index=linearid)
        
        if direc == "S" or direc == "N":
            llmean=llmean.sort_values(by="lat", ascending=True)
        elif direc == "E" or direc == "W":
            llmean=llmean.sort_values(by="lon", ascending=True)
        else:
            print("Unknown direction.")
        
        lats=[]
        lons=[]
        
        for lineid in llmean.index:
            newlat=adf[adf["LINEARID"] == lineid]["lat"].values
            newlon=adf[adf["LINEARID"] == lineid]["lon"].values
            
            #flip segments stored north-to-south (or east-to-west) so that
            #coordinates run in the direction of increasing mile markers
            if (direc in ("S", "N") and newlat[0] > newlat[-1]) or \
               (direc in ("W", "E") and newlon[0] > newlon[-1]):
                print("flipping direction on segment %s" % lineid)
                newlat=np.flipud(newlat)
                newlon=np.flipud(newlon)
                
            lats = lats + list(newlat)
            lons = lons + list(newlon)
            
        tmpdf = pd.DataFrame({"latitude": lats, "longitude": lons})
        fwy_hr[fwstr] = get_relative_pm(tmpdf)
        
    return fwy_hr #llmean, newlat

fwy_hr = add_rte_coordinates2()

flipping direction on segment 1108296283245
flipping direction on segment 1104977742909
flipping direction on segment 1104483532376
flipping direction on segment 1108296282710
flipping direction on segment 11010927734054
flipping direction on segment 1104493001530
flipping direction on segment 1104493001502
flipping direction on segment 1104486061767
flipping direction on segment 1108296281825
flipping direction on segment 11010927726196
flipping direction on segment 1104485866616


In [217]:
#manually adjust position markers for 101.  For the others, the accuracy turns out to be within useful limits.
fwy_hr["101S"]["rel_pm"] += 349.3
fwy_hr["101N"]["rel_pm"] += 349.3
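
The 349.3 offset above was set by hand. One way such an offset might be estimated from sensor locations is sketched below; this is an assumption, not the procedure used here. The function name `estimate_pm_offset` is hypothetical, the sensor frame is assumed to have `lat`, `lon` and `abs_pm` columns, and the nearest high-resolution point is found with a flat-Earth approximation.

```python
import numpy as np
import pandas as pd


def estimate_pm_offset(hr, sensors):
    """Median of (sensor abs_pm - rel_pm of the nearest high-res point)."""
    hr_lat = hr["latitude"].to_numpy()
    hr_lon = hr["longitude"].to_numpy()
    hr_pm = hr["rel_pm"].to_numpy()
    coslat = np.cos(np.radians(hr_lat.mean()))   # shrink longitude differences
    diffs = []
    for s_lat, s_lon, s_pm in zip(sensors["lat"], sensors["lon"], sensors["abs_pm"]):
        d2 = (hr_lat - s_lat) ** 2 + ((hr_lon - s_lon) * coslat) ** 2
        diffs.append(s_pm - hr_pm[np.argmin(d2)])
    return float(np.median(diffs))


# Synthetic check: sensors sit exactly 349.3 miles ahead of rel_pm
hr_demo = pd.DataFrame({
    "latitude": np.linspace(37.0, 37.4, 41),
    "longitude": -121.9,
    "rel_pm": np.arange(41) * 0.69,
})
sensor_demo = pd.DataFrame({
    "lat": [37.1, 37.2, 37.3],
    "lon": [-121.9, -121.9, -121.9],
    "abs_pm": [349.3 + 6.9, 349.3 + 13.8, 349.3 + 20.7],
})
print(estimate_pm_offset(hr_demo, sensor_demo))
```

Using the median rather than the mean limits the influence of sensors that are matched to the wrong strand or have inaccurate coordinates.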

In [218]:
#fwy_hr["101S"]

In [219]:
#adf[adf["LINEARID"] == '1108296282470']["lat"].tolist()
import pickle
with open("TIGER_PEMS_SC_ll_rel_pm.pkl", "wb") as fid:
    pickle.dump(fwy_hr, fid)

In [181]:
!ls -l *.pkl

-rw-rw-r-- 1 daniel daniel    122 May 27 11:27 SC_fwys.pkl
-rw-rw-r-- 1 daniel daniel    200 May 24 21:41 SD_fwys.pkl
-rw-rw-r-- 1 daniel daniel   1321 May 27 11:53 TIGER_PEMS_SC_IDS.pkl
-rw-rw-r-- 1 daniel daniel 194572 May 27 12:37 TIGER_PEMS_SC_ll_rel_pm.pkl
-rw-rw-r-- 1 daniel daniel   1765 May 26 15:45 TIGER_PEMS_SD_IDS.pkl
-rw-rw-r-- 1 daniel daniel 521844 May 26 15:53 TIGER_PEMS_SD_ll_rel_pm.pkl


In [182]:
host="capstone.clihskgj8i7s.us-west-2.rds.amazonaws.com"
username="group3"
db="db1"
#pw=getpass.getpass("Enter database password")
pw=open("/home/daniel/Desktop/.awsdb","r").read().rstrip()

In [183]:
import sqlalchemy as sal
engine = sal.create_engine('postgresql://%s:%s@%s/%s' % (username, pw, host, db))

In [184]:
pd.read_sql("select * from pemslocs p limit 1", engine)

,sid,fwy,direc,district,county,city,state_pm,abs_pm,latitude,longitude,length,stype,lanes,name
0,308511,50,E,3,17,,31.627,60.162,38.761062,-120.569835,5.0,ML,2,Sly Park Rd


In [185]:
fwy_hr.keys()

dict_keys(['101N', '101S', '17S', '17N', '237W', '237E', '280S', '280N', '680S', '680N', '85S', '85N', '87S', '87N', '880S', '880N'])

In [214]:
query="""
select p.latitude as lat, p.longitude as lon, p.abs_pm, p.state_pm
from pemslocs p 
where p.fwy=101 and p.direc='N' and p.stype='ML' and p.county = 85
order by abs_pm
"""
sensors=pd.read_sql(query,  engine)

In [215]:
import plotly.graph_objects as go
sensors["color"] = 'PEMS'
fig=px.scatter_mapbox(sensors, lat="lat", lon="lon",
                 mapbox_style="carto-darkmatter",
                 hover_data=["abs_pm", "state_pm"], color="color",
                 color_discrete_map={"PEMS": "red"})

sel=fwy_hr["101N"]
sel["color"] = "USDOT"
fig2=px.scatter_mapbox(sel, lat="latitude", lon="longitude",
                      hover_data=["rel_pm"],
                      color="color",
                      color_discrete_map={"USDOT": "green"})

fig.add_trace(fig2.data[0])
#fig

In [79]:
liddf.loc[('125','S'),:]

IDS    [1108296282831, 1108296281611, 1108311437048]
Name: (125, S), dtype: object

In [81]:
adf.query("LINEARID == '1108296281611'")

,lat,lon,name,LINEARID,inf3,inf4
787825,32.747805,-117.018261,State Rte 125,1108296281611,S,S1100
787826,32.747401,-117.018427,State Rte 125,1108296281611,S,S1100
787827,32.747031,-117.018503,State Rte 125,1108296281611,S,S1100
787828,32.746460,-117.018565,State Rte 125,1108296281611,S,S1100
787829,32.746223,-117.018586,State Rte 125,1108296281611,S,S1100
...,...,...,...,...,...,...
787909,32.700404,-117.011598,State Rte 125,1108296281611,S,S1100
787910,32.699786,-117.011823,State Rte 125,1108296281611,S,S1100
787911,32.698968,-117.012150,State Rte 125,1108296281611,S,S1100
787912,32.697965,-117.012618,State Rte 125,1108296281611,S,S1100


In [220]:
#merge into single dataframe

frames=[]
for fwk in fwy_hr.keys():
    #rel_pm (offset-adjusted for 101 above) is stored as abs_pm in the database
    ndf=fwy_hr[fwk].rename(columns={"rel_pm": "abs_pm"})
    ndf["fwdir"] = fwk
    frames.append(ndf[["fwdir", "latitude", "longitude", "abs_pm"]])
#DataFrame.append was removed in pandas 2.0; concatenate once instead
fwydf=pd.concat(frames, ignore_index=True)

In [221]:
for col in ["latitude","longitude","abs_pm"]:
    fwydf[col] = fwydf[col].astype(float)

In [85]:
##with engine.connect() as con:
##    con.execute("drop table usdot")

In [224]:
create_cmd="""
create table usdot (
    usdid integer primary key, 
    fwdir varchar not null,  
    latitude float not null,  
    longitude float not null,  
    abs_pm float);
"""

#table was already created for SD area  - not creating it again
#with engine.connect() as con:
#    con.execute(create_cmd)

In [225]:
fwydf.to_sql("usdot", engine, if_exists = 'append', method='multi', index=False, index_label="usdid")

In [226]:
pd.read_sql("select * from usdot", engine, index_col="usdid").tail()

usdid,fwdir,latitude,longitude,abs_pm
29296,880N,37.454719,-121.922881,10.09526
29297,880N,37.454857,-121.922908,10.10491
29298,880N,37.454968,-121.922931,10.112682
29299,880N,37.45505,-121.922948,10.118424
29300,880N,37.455296,-121.923001,10.135668
