# A6
Eli Paul
DS4200

In [1]:
import pandas as pd
import altair as alt
alt.data_transformers.disable_max_rows()

DataTransformerRegistry.enable('default')

#### Loading data and a bit of manipulation

In [2]:
mbta = pd.read_csv('MBTA_Line_and_Stop.csv', index_col = 'FID')
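# Collapse each season label to the year it contains, then make the column numeric.
years = ['2019', '2018', '2017']
for y in years:
    # Boolean mask of rows whose season label mentions this year
    mask = mbta['season'].str.contains(y)
    mbta.loc[mask, 'season'] = y

mbta['season'] = pd.to_numeric(mbta['season'])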
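
As a side note, the same year clean-up could probably be done without a loop. The sketch below is only an illustration on made-up data: the `toy` frame and its season labels are assumptions, since the real values in the CSV are not shown here. Unlike the loop above, it keeps any four-digit year, not just 2017 to 2019.

```python
import pandas as pd

# Hypothetical season labels; the real file's values may look different.
toy = pd.DataFrame({'season': ['Fall 2019', 'Fall 2018', 'Fall 2017', 'Fall 2019']})

# Pull the first four-digit run out of each label and convert it to an integer.
toy['season'] = toy['season'].str.extract(r'(\d{4})', expand=False).astype(int)

print(toy['season'].tolist())
```

If a label had no four-digit year, `str.extract` would return a missing value and the `astype(int)` step would fail, so this version assumes every row carries a year.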

#### Manually set color theme so colors and route names match

In [3]:
colors = sorted(list(mbta['route_id'].unique()))
def my_theme():
    return {
    'config': {
      'view': {'continuousHeight': 300, 'continuousWidth': 400},
      'range': {'category': colors}
    }
  }

#### Plotting

In [4]:
scat = mbta.groupby(['route_name','direction_id','day_type_name', 'stop_name']).mean(numeric_only=True).reset_index()

# selector for brushing and linking
selector = alt.selection_point(on='mouseover', nearest=False, fields=['route_name'])

# boolean to be used for conditionals and pop effects
stop_bool = ((alt.datum.stop_name == 'Northeastern University') | (alt.datum.stop_name == 'Ruggles'))

# Scatter plot
scatter = alt.Chart(scat, title = 'Average_ons vs Average_offs').mark_point(filled=True).encode(
    x=alt.X('average_ons'),
    y='average_offs',
    color='route_name',
    tooltip=['stop_name', 'direction_id', 'day_type_name', 'average_ons','average_offs'],
    size=alt.condition(stop_bool, alt.value(100), alt.value(10)),
    opacity=alt.condition(stop_bool, alt.value(1), alt.value(0.3))
).properties(
    width=675,
    height=300
).interactive().add_params(selector)


# line plot
time_series = mbta.groupby(['route_id','season','time_period_name', 'route_name'])['average_flow'].mean(numeric_only=True).reset_index()
order = ['VERY_EARLY_MORNING', 'EARLY_AM', 'AM_PEAK', 'MIDDAY_BASE', 'OFF_PEAK', 'MIDDAY_SCHOOL', 'PM_PEAK', 'EVENING', 'LATE_EVENING', 'NIGHT']
alt.themes.register('my_theme', my_theme)
alt.themes.enable('my_theme')
line = alt.Chart(time_series).mark_line(point=True).encode(
    x=alt.X('time_period_name', sort = order),
    y='average_flow:Q',
    color='route_name'
).properties(
    width=225,
    height=300
).transform_filter(
    selector
).facet(
    column=alt.Column('season:N', title='Usage Throughout The Day, per year'))

scatter

This scatter plot shows average_offs vs. average_ons. Each point represents one combination of route, station, day type (weekday, Saturday, or Sunday), and direction (1 = inbound, 0 = outbound). For example, at the Northeastern University station on Saturdays going inbound, roughly 6x as many people get on the train as get off. Most points have an inverse: if a point has low average_offs and high average_ons, there is generally a point at the same station with high average_offs and low average_ons. This makes sense, as most people take a train out and back.
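
The "roughly 6x" reading and the inverse-point pattern can be checked numerically rather than by eye. The sketch below uses a small made-up table (the stop names and counts are assumptions, not values from the MBTA file) with the same `average_ons` and `average_offs` columns as `scat`.

```python
import pandas as pd

# Made-up numbers shaped like the grouped `scat` frame; not real MBTA data.
toy = pd.DataFrame({
    'stop_name': ['Stop A', 'Stop A', 'Stop B', 'Stop B'],
    'direction_id': [1, 0, 1, 0],
    'average_ons': [600.0, 100.0, 50.0, 200.0],
    'average_offs': [100.0, 600.0, 200.0, 50.0],
})

# Ratio above 1 means more boardings than alightings at that point.
toy['on_off_ratio'] = toy['average_ons'] / toy['average_offs']

# If the two directions at a stop mirror each other, their ratios multiply to about 1.
balance = toy.groupby('stop_name')['on_off_ratio'].prod()

print(toy[['stop_name', 'direction_id', 'on_off_ratio']])
print(balance)
```

On the real data the product would only be close to 1 for stops where the two directions roughly mirror each other, so it works as a quick check of the out-and-back explanation.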

In [5]:
alt.Chart(time_series).mark_line(point=True).encode(
    x=alt.X('time_period_name', sort = order),
    y='average_flow:Q',
    color='route_name'
).properties(
    width=225,
    height=300
).facet(
    column=alt.Column('season:N', title='Usage Throughout The Day, per year'))

This is a faceted line plot that shows daily ridership trends broken down by year and route. We can see spikes in usage at AM_PEAK, when most people go to work; at OFF_PEAK, presumably lunch; and at PM_PEAK, when people are commuting home. Something of note is that usage on the Red and Orange lines is roughly the same in 2017 and 2019, but in 2018 the Red Line sees significantly higher numbers.
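
The spikes read off the plot could also be confirmed directly from a frame like `time_series`. This is a minimal sketch on invented numbers (the route names and flows are assumptions), showing one way to find the busiest time period for each route and year.

```python
import pandas as pd

# Invented flows shaped like `time_series`; not real MBTA data.
toy = pd.DataFrame({
    'route_name': ['Line A'] * 3 + ['Line B'] * 3,
    'season': [2018] * 6,
    'time_period_name': ['AM_PEAK', 'MIDDAY_BASE', 'PM_PEAK'] * 2,
    'average_flow': [900.0, 400.0, 950.0, 700.0, 300.0, 650.0],
})

# Row label of the largest flow within each (route, year) group
peak_rows = toy.groupby(['route_name', 'season'])['average_flow'].idxmax()

# Look those rows up to see which time period each peak falls in
peaks = toy.loc[peak_rows, ['route_name', 'season', 'time_period_name', 'average_flow']]

print(peaks)
```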

In [6]:
# vertical concat
scatter & line

This is the concatenation of the two graphs, with interactivity. Hovering over a point on the scatter plot narrows the line graph below to that point's route.

For step 6, the pop-out effects I used were size and opacity. I chose to use both because size or opacity alone would not be enough, given the high point density near the origin, but the two together effectively draw the user's attention to the highlighted points.