Data below from here: https://geohub.lacity.org/datasets/9b1bc4861f1e4277b6bd6e51f48e0f4d/explore

In [135]:
import pandas as pd
from plotly import express as px
df = pd.read_csv("Metro_Stations.csv")
#The data is from the link above.

This dataset was much cleaner and smaller than the New York Subway's hourly ridership data, but it is also considerably less informative. We could only find ridership data by line, not by station, which made building good graphics and visualizations much more difficult. (Why, LA Metro, why???)

In [136]:
df

Unnamed: 0,X,Y,OBJECTID,source,ext_id,cat1,cat2,cat3,org_name,Name,...,description,zip,link,use_type,latitude,longitude,date_updated,dis_status,POINT_X,POINT_Y
0,-118.192933,33.768076,72713,Metropolitan Transportation Authority (MTA),,Transportation,Metro Stations,,,Downtown Long Beach Station,...,Blue Line,,,publish,33.768076,-118.192933,2023/04/04 16:19:54+00,,33.768076,-118.192933
1,-118.193712,33.772263,72714,Metropolitan Transportation Authority (MTA),,Transportation,Metro Stations,,,Pacific Ave Station,...,Blue Line,,,publish,33.772263,-118.193712,2023/04/04 16:19:54+00,,33.772263,-118.193712
2,-118.189396,33.781835,72715,Metropolitan Transportation Authority (MTA),,Transportation,Metro Stations,,,Anaheim Street Station,...,Blue Line,,,publish,33.781835,-118.189396,2023/04/04 16:19:54+00,,33.781835,-118.189396
3,-118.189394,33.789095,72716,Metropolitan Transportation Authority (MTA),,Transportation,Metro Stations,,,Pacific Coast Hwy Station,...,Blue Line,,,publish,33.789095,-118.189394,2023/04/04 16:19:54+00,,33.789095,-118.189394
4,-118.189846,33.807084,72717,Metropolitan Transportation Authority (MTA),,Transportation,Metro Stations,,,Willow Street Station,...,Blue Line,,,publish,33.807084,-118.189846,2023/04/04 16:19:54+00,,33.807084,-118.189846
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
125,-118.378703,33.945678,72838,Metropolitan Transportation Authority (MTA),,Transportation,Metro Stations,,,Aviation/Century Station,...,K Line,,,publish,33.945678,-118.378703,2023/04/04 16:19:54+00,,33.945678,-118.378703
126,-118.377271,33.929635,72839,Metropolitan Transportation Authority (MTA),,Transportation,Metro Stations,,,Aviation/LAX Station,...,K Line,,,publish,33.929635,-118.377271,2023/04/04 16:19:54+00,,33.929635,-118.377271
127,-118.251208,34.054751,72840,Metropolitan Transportation Authority (MTA),,Transportation,Metro Stations,,,Grand Av Arts/Bunker Hill,...,Regional Connector,,,publish,34.054751,-118.251208,2023/04/04 16:19:54+00,,34.054751,-118.251208
128,-118.246166,34.052039,72841,Metropolitan Transportation Authority (MTA),,Transportation,Metro Stations,,,Historic Broadway,...,Regional Connector,,,publish,34.052039,-118.246166,2023/04/04 16:19:54+00,,34.052039,-118.246166


We can get rid of a lot of the extra detail columns that we don't need.

In [137]:
cols = ['OBJECTID', 'post_id', 'latitude', 'longitude', 'Name', 'description']
df = df[cols]

In [138]:
#work on an explicit copy so the assignments below don't raise SettingWithCopyWarning
df = df.copy()

#keep only the first word of the description, e.g. "Blue Line" -> "Blue"
df['description'] = df['description'].str.split().str.get(0)

#clean up the line names
df['description'] = df['description'].replace({"Regional": "Blue/Expo",
                                               "Blue/EXPO": "Blue/Expo",
                                               "EXPO": "Expo"})

#clean up the lines column
df["Line"] = df["description"].str.split('/').str.get(0)
df["Lines"] = df["description"].str.split('/')

#we dont need description anymore
cols = ['OBJECTID', 'post_id', 'latitude', 'longitude', 'Name', 'Lines', 'Line']
df = df[cols]







In [139]:
df.to_csv('Clean_Metro_Stations.csv')

At this point the station IDs are jumbled and unordered, so I went through the file by hand and reordered the rows, putting the stations on each line in consecutive order.

In [140]:
df = pd.read_csv("Clean_Metro_Stations_Manual_Clean.csv")
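The manual ordering step could probably be approximated in code. Below is a minimal sketch, not what was actually done here: it assumes a line has no branches or loops and that one terminus is known, then greedily hops to the nearest unvisited station. The function names (haversine_miles, order_stations) are made up for illustration; the coordinates are five A Line stations from the table above, deliberately shuffled.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def order_stations(stations, start_name):
    """Greedy ordering: start at a terminus and keep moving to the nearest unvisited station.

    Assumes a simple, unbranched line; loops and branches will confuse it.
    """
    remaining = {name: (lat, lon) for name, lat, lon in stations}
    ordered = [start_name]
    cur_pos = remaining.pop(start_name)
    while remaining:
        nxt = min(remaining, key=lambda n: haversine_miles(*cur_pos, *remaining[n]))
        ordered.append(nxt)
        cur_pos = remaining.pop(nxt)
    return ordered

# Five stations from the raw table, in shuffled order
stations = [
    ("Willow Street Station", 33.807084, -118.189846),
    ("Downtown Long Beach Station", 33.768076, -118.192933),
    ("Pacific Coast Hwy Station", 33.789095, -118.189394),
    ("Pacific Ave Station", 33.772263, -118.193712),
    ("Anaheim Street Station", 33.781835, -118.189396),
]

ordered = order_stations(stations, "Downtown Long Beach Station")
print(ordered)
```

A greedy walk like this would still need a manual check, especially around the Long Beach loop and wherever lines share track, so it is at best a first pass before cleaning by hand.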

So here's a rough system map showing the locations of all the stations and what line they are on:

In [141]:
#station map, lines not connected, also hover_data is kinda broken. The G-Line (BRT, not Rail) is shown as well
fig = px.scatter_mapbox(df, 
                        lat = "Latitude",
                        lon = "Longitude",
                        color = "Line",
                        hover_name = "Name",
                        #hover_data = "Lines",
                        zoom = 8.8,
                        height = 800,
                        width = 800,
                        title = "Graph Representation of LA Metro Station Locations by Line",
                        mapbox_style = "carto-positron")

fig.show()

Here's one with the lines connected:

In [142]:
#line map, stations not shown
fig = px.line_mapbox(df, 
                        lat = "Latitude",
                        lon = "Longitude",
                        color = "Line",
                        hover_name = "Name",
                        #hover_data = "Lines",
                        line_group='Line',
                        zoom = 8.8,
                        height = 800,
                        width = 800,
                        title = "Approximate Graph Representation of LA Metro Lines",
                        mapbox_style = "carto-positron")

fig.show()

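And here's one showing proximity to the nearest station. I haven't quite figured out how to construct the contours from real-world distance yet: the radius argument of density_mapbox is measured in pixels, not miles, so the "1 mile" in the title is only a rough estimate at this zoom level. That's a bit unfortunate for now, but it's a work in progress!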

In [143]:
#density mapbox: each station contributes a blob with a radius of 15 pixels. The higher the value, the more stations are close by (straight-line distance)
fig = px.density_mapbox(df,
                        lat='Latitude',
                        lon='Longitude',
                        radius=15,
                        opacity=0.3,
                        hover_name = "Name",
                        zoom = 8.8,
                        height = 800,
                        width = 800,
                        title = "Areas Within Approximately 1 Mile of a LA Metro Station",
                        mapbox_style = "carto-positron")
fig.show()
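One possible way to get real-world distances is to compute the great-circle (haversine) distance from each point of a grid to its nearest station, then threshold at one mile. The sketch below is an assumption about how that might look, not part of the original analysis: the grid bounds and the helper name haversine_miles are made up for illustration, and the three station coordinates are taken from the table above.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; accepts scalars or broadcastable arrays of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

# Three stations from the table above (Downtown Long Beach, Pacific Ave, Anaheim Street)
station_lat = np.array([33.768076, 33.772263, 33.781835])
station_lon = np.array([-118.192933, -118.193712, -118.189396])

# Hypothetical 4 x 3 grid of query points around those stations
grid_lat, grid_lon = np.meshgrid(np.linspace(33.76, 33.79, 4),
                                 np.linspace(-118.20, -118.18, 3),
                                 indexing="ij")

# Broadcast to shape (4, 3, n_stations), then keep the closest station per grid point
dist = haversine_miles(grid_lat[..., None], grid_lon[..., None], station_lat, station_lon)
nearest = dist.min(axis=-1)

# True wherever a station lies within one straight-line mile
within_one_mile = nearest <= 1.0
print(nearest.round(2))
```

With a finer grid over the whole county, the nearest array could be drawn as a contour layer, which would make the one-mile boundary independent of zoom level.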

In [144]:
Adf = pd.read_csv("gvRailBlueMarch2023.csv")
BDdf = pd.read_csv("gvRailRedMarch2023.csv")
Cdf = pd.read_csv("gvRailGreenMarch2023.csv")
Edf = pd.read_csv("gvRailExpoMarch2023.csv")

In [145]:
ARidership = int(Adf["MAR 2023"][0].replace(',', ''))
BDRidership = int(BDdf["MAR 2023"][0].replace(',', ''))
CRidership = int(Cdf["MAR 2023"][0].replace(',', ''))
ERidership = int(Edf["MAR 2023"][0].replace(',', ''))

In [146]:
df["Average Weekday Ridership"] = 1

In [147]:
#look up each station's line ridership in one vectorized step (avoids chained assignment)
#lines without a ridership file (e.g. K) keep the placeholder value 1
line_ridership = {"A": ARidership, "B": BDRidership, "D": BDRidership,
                  "C": CRidership, "E": ERidership}
df["Average Weekday Ridership"] = df["Line"].map(line_ridership).fillna(1).astype(int)







In [148]:
df["Rough Line Weight"] = (df["Average Weekday Ridership"] / BDRidership) * 25

In [149]:
df

Unnamed: 0.1,Unnamed: 0,OBJECTID,post_id,Latitude,Longitude,Name,Lines,Line,Average Weekday Ridership,Rough Line Weight
0,31.0,72744,3306158,33.773603,-118.189424,5th Street Station,['Blue'],A,31752,9.780439
1,,72743,3306157,33.768745,-118.189374,1st Street Station,['Blue'],A,31752,9.780439
2,0.0,72713,3306127,33.768076,-118.192933,Downtown Long Beach Station,['Blue'],A,31752,9.780439
3,1.0,72714,3306128,33.772263,-118.193712,Pacific Ave Station,['Blue'],A,31752,9.780439
4,2.0,72715,3306129,33.781835,-118.189396,Anaheim Street Station,['Blue'],A,31752,9.780439
...,...,...,...,...,...,...,...,...,...,...
135,122.0,72835,3306249,33.967299,-118.351393,Downtown Inglewood Station,['K'],K,1,0.000308
136,123.0,72836,3306250,33.962021,-118.374455,Westchester/Veterans Station,['K'],K,1,0.000308
137,124.0,72837,3306251,33.949625,-118.378663,LAX/Metro Transit Center Station,['K'],K,1,0.000308
138,125.0,72838,3306252,33.945678,-118.378703,Aviation/Century Station,['K'],K,1,0.000308


In [150]:
#disconnected station map with line/ridership weights
fig = px.scatter_mapbox(df, 
                        lat = "Latitude",
                        lon = "Longitude",
                        color = "Line",
                        size = "Rough Line Weight",
                        hover_name = "Name",
                        #hover_data = "Lines",
                        zoom = 8.8,
                        height = 800,
                        width = 800,
                        title = "Graph Representation of LA Metro Station Locations by Line (With Ridership Weights)",
                        mapbox_style = "carto-positron")

fig.show()

In [151]:
stops = pd.read_csv("stops.csv")
stops

Unnamed: 0.1,Unnamed: 0,stop_id,stop_code,stop_name,stop_desc,stop_lat,stop_lon,stop_url,location_type,parent_station,tpis_name
0,0,80101,80101,Downtown Long Beach Station,,33.768071,-118.192921,,0,80101S,Long Bch
1,4,80102,80102,Pacific Ave Station,,33.772258,-118.193700,,0,80102S,Pacific
2,8,80105,80105,Anaheim Street Station,,33.781830,-118.189384,,0,80105S,Anaheim
3,12,80106,80106,Pacific Coast Hwy Station,,33.789090,-118.189382,,0,80106S,PCH
4,15,80107,80107,Willow Street Station,,33.807079,-118.189834,,0,80107S,Willow
...,...,...,...,...,...,...,...,...,...,...,...
99,403,80705,80705,Fairview Heights Station,,33.975252,-118.336072,,0,80705S,Fairview
100,407,80706,80706,Hyde Park Station,,33.988187,-118.330816,,0,80706S,Hyde Park
101,411,80707,80707,Leimert Park Station,,34.003909,-118.332016,,0,80707S,Leimert
102,415,80708,80708,Martin Luther King Jr Station,,34.009563,-118.335359,,0,80708S,MLK


In [152]:
stops = stops.rename(columns={"stop_name": "Name", "stop_id": "Station ID"})
cols = ["Station ID", "Name"]
stops = stops[cols]
stops

Unnamed: 0,Station ID,Name
0,80101,Downtown Long Beach Station
1,80102,Pacific Ave Station
2,80105,Anaheim Street Station
3,80106,Pacific Coast Hwy Station
4,80107,Willow Street Station
...,...,...
99,80705,Fairview Heights Station
100,80706,Hyde Park Station
101,80707,Leimert Park Station
102,80708,Martin Luther King Jr Station


In [153]:
df = pd.merge(df, stops, on="Name")

In [154]:
df

Unnamed: 0.1,Unnamed: 0,OBJECTID,post_id,Latitude,Longitude,Name,Lines,Line,Average Weekday Ridership,Rough Line Weight,Station ID
0,31.0,72744,3306158,33.773603,-118.189424,5th Street Station,['Blue'],A,31752,9.780439,80154
1,,72743,3306157,33.768745,-118.189374,1st Street Station,['Blue'],A,31752,9.780439,80153
2,0.0,72713,3306127,33.768076,-118.192933,Downtown Long Beach Station,['Blue'],A,31752,9.780439,80101
3,1.0,72714,3306128,33.772263,-118.193712,Pacific Ave Station,['Blue'],A,31752,9.780439,80102
4,2.0,72715,3306129,33.781835,-118.189396,Anaheim Street Station,['Blue'],A,31752,9.780439,80105
...,...,...,...,...,...,...,...,...,...,...,...
89,118.0,72831,3306245,34.010167,-118.335347,Martin Luther King Jr Station,['K'],K,1,0.000308,80708
90,119.0,72832,3306246,34.004582,-118.332665,Leimert Park Station,['K'],K,1,0.000308,80707
91,120.0,72833,3306247,33.988290,-118.330836,Hyde Park Station,['K'],K,1,0.000308,80706
92,121.0,72834,3306248,33.975284,-118.336030,Fairview Heights Station,['K'],K,1,0.000308,80705


In [155]:
cols = ["Station ID", "Name", "Latitude", "Longitude", "Lines", "Line", "Average Weekday Ridership", "Rough Line Weight"]
df = df[cols]
df

Unnamed: 0,Station ID,Name,Latitude,Longitude,Lines,Line,Average Weekday Ridership,Rough Line Weight
0,80154,5th Street Station,33.773603,-118.189424,['Blue'],A,31752,9.780439
1,80153,1st Street Station,33.768745,-118.189374,['Blue'],A,31752,9.780439
2,80101,Downtown Long Beach Station,33.768076,-118.192933,['Blue'],A,31752,9.780439
3,80102,Pacific Ave Station,33.772263,-118.193712,['Blue'],A,31752,9.780439
4,80105,Anaheim Street Station,33.781835,-118.189396,['Blue'],A,31752,9.780439
...,...,...,...,...,...,...,...,...
89,80708,Martin Luther King Jr Station,34.010167,-118.335347,['K'],K,1,0.000308
90,80707,Leimert Park Station,34.004582,-118.332665,['K'],K,1,0.000308
91,80706,Hyde Park Station,33.988290,-118.330836,['K'],K,1,0.000308
92,80705,Fairview Heights Station,33.975284,-118.336030,['K'],K,1,0.000308


In [156]:
#disconnected station map with line/ridership weights
fig = px.scatter_mapbox(df, 
                        lat = "Latitude",
                        lon = "Longitude",
                        color = "Line",
                        size = "Rough Line Weight",
                        hover_name = "Name",
                        #hover_data = "Lines",
                        zoom = 8.8,
                        height = 800,
                        width = 800,
                        title = "Graph Representation of LA Metro Station Locations by Line (With Ridership Weights)",
                        mapbox_style = "carto-positron")

fig.show()

In [157]:
edges = pd.read_csv("Edge Weights.csv")
edges

Unnamed: 0.1,Unnamed: 0,Station ID,Next Station ID,Edge Weight
0,0,80101,80102,0.962959
1,1,80105,80106,0.962959
2,2,80105,80154,0.962959
3,3,80106,80107,0.962959
4,4,80107,80108,0.962959
...,...,...,...,...
96,96,80704,80705,0.906039
97,97,80705,80706,0.906039
98,98,80706,80707,0.906039
99,99,80707,80708,0.906039


In [158]:
df = pd.merge(df, edges, on="Station ID")

In [159]:
df["Edge Weight"] = (df["Edge Weight"]**(3))*20

In [160]:
df

Unnamed: 0.1,Station ID,Name,Latitude,Longitude,Lines,Line,Average Weekday Ridership,Rough Line Weight,Unnamed: 0,Next Station ID,Edge Weight
0,80153,1st Street Station,33.768745,-118.189374,['Blue'],A,31752,9.780439,39,80154,17.858822
1,80101,Downtown Long Beach Station,33.768076,-118.192933,['Blue'],A,31752,9.780439,0,80102,17.858822
2,80105,Anaheim Street Station,33.781835,-118.189396,['Blue'],A,31752,9.780439,1,80106,17.858822
3,80105,Anaheim Street Station,33.781835,-118.189396,['Blue'],A,31752,9.780439,2,80154,17.858822
4,80106,Pacific Coast Hwy Station,33.789095,-118.189394,['Blue'],A,31752,9.780439,3,80107,17.858822
...,...,...,...,...,...,...,...,...,...,...,...
83,80708,Martin Luther King Jr Station,34.010167,-118.335347,['K'],K,1,0.000308,100,80709,14.875467
84,80707,Leimert Park Station,34.004582,-118.332665,['K'],K,1,0.000308,99,80708,14.875467
85,80706,Hyde Park Station,33.988290,-118.330836,['K'],K,1,0.000308,98,80707,14.875467
86,80705,Fairview Heights Station,33.975284,-118.336030,['K'],K,1,0.000308,97,80706,14.875467


In [162]:
#This is basically the same as the plot above, so it's redundant. Some stations are also missing, probably because pd.merge defaults to an inner join and drops rows without a match. Just going to use the plot above
fig = px.scatter_mapbox(df, 
                        lat = "Latitude",
                        lon = "Longitude",
                        color = "Line",
                        size = "Edge Weight",
                        hover_name = "Name",
                        #hover_data = "Lines",
                        zoom = 8.8,
                        height = 800,
                        width = 800,
                        title = "Graph Representation of LA Metro Station Locations by Line (With NetworkX Weights)",
                        mapbox_style = "carto-positron")

fig.show()
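The missing stations are consistent with the row counts above: the displayed index ends at 138 before the merge with stops, at 92 after it, and at 86 after the merge with the edge weights. pd.merge performs an inner join by default, so any station whose Name (or Station ID) has no exact match in the other table is silently dropped. Here is a minimal sketch of how the unmatched names could be listed, using tiny made-up stand-ins for the two tables rather than the real files:

```python
import pandas as pd

# Toy stand-ins for the station table and the stops table (not the real data)
stations = pd.DataFrame({
    "Name": ["Pacific Ave Station", "Historic Broadway", "Hyde Park Station"],
    "Line": ["A", "A", "K"],
})
stops = pd.DataFrame({
    "Station ID": [80102, 80706],
    "Name": ["Pacific Ave Station", "Hyde Park Station"],
})

# A left join keeps every station; indicator=True adds a _merge column
# that says whether each row found a partner in stops
checked = pd.merge(stations, stops, on="Name", how="left", indicator=True)

# Rows marked left_only are the ones an inner join would have dropped
unmatched = checked.loc[checked["_merge"] == "left_only", "Name"].tolist()
print(unmatched)
```

Running the same check on the real tables should show which station names need to be reconciled (for example a missing "Station" suffix) before merging.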