# Explore: Amtrak State Supported Michigan Service

## Intercity Passenger Rail Service Station Performance Metrics

The Amtrak [network](https://www.amtrak.com/content/dam/projects/dotcom/english/public/documents/Maps/Amtrak-System-Map-020923.pdf)
provides intercity passenger rail service in the continental United States and to select
Canadian cities. The network is operated by the
[National Railroad Passenger Corporation](https://railroads.dot.gov/passenger-rail/amtrak/amtrak),
a federally chartered corporation managed as a for-profit company. It receives federal and state
funding and offsets its operating costs by selling tickets and providing other services.

This notebook begins exploring the augmented quarterly
[Amtrak](https://www.amtrak.com/home.html) station performance metrics for trains supported by
the State of Michigan. The goal is to better understand the performance of individual Amtrak
Michigan Service trains and to identify potential areas for further analysis.

### Variable names

Many variable names in this project use the following abbreviations. The naming
strategy aims to balance brevity and readability:

* `amtk`: Amtrak (reporting mark)
* `chrt`: chart
* `cols`: columns
* `const`: constant
* `cwd`: current working directory
* `eb`: eastbound direction of travel
* `lm`: linear model
* `mi`: miles
* `mm`: minutes (ISO 8601)
* `nb`: northbound direction of travel
* `psgr`: passenger
* `qtr`: quarter
* `rte`: route
* `sb`: southbound direction of travel
* `stats`: summary statistics
* `stn`: station
* `stns`: stations
* `svc`: service
* `trn`: train
* `wb`: westbound direction of travel

In [1]:
import json
import numpy as np
import pandas as pd
import pathlib as pl
import tomllib as tl

import fra_amtrak.amtk_detrain as detrn
import fra_amtrak.amtk_frame as frm
import fra_amtrak.amtk_network as ntwk
import fra_amtrak.chart_box_preagg as boxp
import fra_amtrak.chart_hist as hst
import fra_amtrak.chart_hist_layer as hstl
import fra_amtrak.chart_line as lne
import fra_amtrak.chart_title as ttl

## 1.0 Read files

### 1.1 Resolve paths

In [2]:
parent_path = pl.Path.cwd()  # current working directory
parent_path

PosixPath('/home/jovyan/work/assignments/Course4')

### 1.2 Load constants

Load a companion [TOML](https://toml.io/en/) file named `notebook.toml` containing constants.

In [3]:
filepath = parent_path.joinpath("notebook.toml")
with open(filepath, "rb") as file_obj:
    const = tl.load(file_obj)

# Access constants
AGG = const["agg"]
CHRT_BAR = const["chart"]["bar"]
COLORS = const["colors"]
COLS = const["columns"]
DIRECTION = const["train"]["direction"]
SVC = const["services"]
SUB_SVC = const["train"]["sub_service"]
TRN = const["train"]

### 1.3 Retrieve sub service route information

The file `amtk_sub_services.json` contains miscellaneous information about Amtrak sub services (i.e., named trains).

In [4]:
filepath = parent_path.joinpath("data", "processed", "amtk_sub_services.json")
with open(filepath, "r") as file:
    amtk_sub_svcs = json.load(file)
len(amtk_sub_svcs)

48

### 1.4 Retrieve station details

The file `amtk_stations.csv` contains location-related information for all Amtrak stations.

In [5]:
filepath = parent_path.joinpath("data", "processed", "amtk_stations.csv")
stations = pd.read_csv(filepath, dtype={"ZIP Code": "str"}, low_memory=False)
stations.shape

(536, 13)

### 1.5 Retrieve performance data

In [6]:
filepath = parent_path.joinpath("data", "processed", "station_performance_metrics-v1p2.csv")
trains = pd.read_csv(
    filepath, dtype={"Address 02": "str", "ZIP Code": "str"}, low_memory=False
)  # avoid DtypeWarning
trains.shape

(68412, 24)

### 1.6 Retrieve late time predictions

In [7]:
filepath = parent_path.joinpath("data", "student", "stu-amtk-avg_min_late_predict.csv")
predictions = pd.read_csv(filepath, low_memory=False)
predictions.shape

(153, 3)

## 2.0 State Supported Michigan Service [1 pt]

Amtrak's state-supported trains are funded by state governments. These services are typically
shorter than Amtrak's long-distance routes and operate within a single state or across
neighboring states. Amtrak's
[Michigan](https://www.amtrak.com/michigan-services-train) service includes the _Pere Marquette_,
_Blue Water_, and _Wolverine_ trains with routes between Chicago and Grand Rapids, Chicago and Port
Huron, and Chicago and greater Detroit.

Retrieve the Michigan Service performance data by calling the appropriate `amtk_network`
function. Assign the return value of the function call to a variable named `mich`.

In [8]:
mich = ntwk.by_service(trains, "Michigan")
mich.shape

(1079, 24)

In [9]:
#hidden tests are within this cell

### 2.1 Michigan service: on-time performance metrics (entire period) [1 pt]

Michigan service performance data is a compilation of quarterly metrics that focus on late
detraining passengers. Detraining passengers are considered on-time if they arrive at their
destination no later than fifteen (`15`) minutes after their scheduled arrival time. All other
detraining passengers are considered late.

In [10]:
# Total train arrivals
mich_trn_arrivals = mich.shape[0]

# Detraining totals
mich_detrn = mich[COLS["total_detrn"]].sum()
mich_detrn_late = mich[COLS["late_detrn"]].sum()
mich_detrn_on_time = mich_detrn - mich_detrn_late

# Compute summary statistics
mich_stats = detrn.get_sum_stats(mich, AGG["columns"], AGG["funcs"])
mich_stats

Unnamed: 0,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,1079,1768527.0,1639.0426,431.0,3421.0704,553760.0,513.2159,136.0,1132.6699,0.3131,,1214767.0
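
The on-time rule and the late ratio reported above can be checked by hand. The following is a minimal sketch in plain Python; the helper names `is_on_time` and `late_ratio` are hypothetical and are not part of the `fra_amtrak` package. The totals are copied from the summary statistics output.

```python
ON_TIME_THRESHOLD_MIN = 15  # on-time cutoff stated in the section text


def is_on_time(minutes_late: float) -> bool:
    """Return True if an arrival is no more than 15 minutes late."""
    return minutes_late <= ON_TIME_THRESHOLD_MIN


def late_ratio(late: int, total: int) -> float:
    """Return the share of late detraining passengers, rounded to four places."""
    return round(late / total, 4) if total else float("nan")


# Totals copied from the Michigan service summary statistics
total_detrn = 1_768_527
late_detrn = 553_760

on_time_detrn = total_detrn - late_detrn
print(on_time_detrn)                        # 1214767
print(late_ratio(late_detrn, total_detrn))  # 0.3131
```

Both values agree with the `Total On Time Detraining Customers sum` and `Late to Total Detraining Customers Ratio` columns in the output.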


In [11]:
#hidden tests are within this cell

### 2.2 Michigan service: mean late arrival times [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of Michigan service trains.

In [12]:
# YOUR CODE HERE
# Drop missing values
mich_avg_mm_late = mich[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Call the custom frm.describe_numeric_column() function
mich_avg_mm_late_describe = frm.describe_numeric_column(mich_avg_mm_late)
mich_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(980),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(49.15),
  'median': 45.0,
  'mode': np.float64(40.0)},
 'position': {'min': 16.0,
  '25%': np.float64(37.0),
  '50%': np.float64(45.0),
  '75%': np.float64(58.0),
  'max': 177.0},
 'spread': {'variance': 385.6904494382011,
  'std': 19.63900326997786,
  'range': 161.0,
  'iqr': np.float64(21.0)},
 'shape': {'skewness': np.float64(1.470751142773803),
  'kurtosis': np.float64(3.6730832219434637)}}

The skewness and kurtosis values returned suggest that the distribution of mean late arrival times of Michigan service trains is positively skewed and features a sharper peak and heavier right tail than a normal distribution. Let's confirm this visually by generating a histogram.
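
To make the shape statistics less opaque, here is a minimal sketch of the bias-adjusted sample skewness and excess kurtosis formulas. It assumes `frm.describe_numeric_column()` relies on estimators of this kind (as pandas' `Series.skew()` and `Series.kurt()` are documented to do); the function names and the sample values below are illustrative only.

```python
import math


def _mean_and_std(values):
    """Return the mean and the sample (n - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, std


def sample_skewness(values):
    """Bias-adjusted skewness; positive values indicate a longer right tail."""
    n = len(values)
    mean, std = _mean_and_std(values)
    cubed = sum(((x - mean) / std) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_excess_kurtosis(values):
    """Bias-adjusted excess kurtosis; a normal distribution scores near 0."""
    n = len(values)
    mean, std = _mean_and_std(values)
    fourth = sum(((x - mean) / std) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


symmetric = [1, 2, 3, 4, 5]
right_skewed = [16, 37, 40, 45, 58, 177]  # illustrative values only

print(round(sample_excess_kurtosis(symmetric), 6))  # -1.2
print(sample_skewness(right_skewed) > 0)            # True
```

A symmetric sample yields a skewness of zero, while a single large value on the right (such as the `177` maximum reported above) pulls the skewness positive.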

In [13]:
#hidden tests are within this cell

### 2.3 Michigan service: visualize distribution of mean late arrival times [1 pt]

Visualize mean late arrival times for the entire period. The data is binned prior to plotting.

#### 2.3.1 Create the chart data

In [14]:
# Convert to DataFrame
mich_avg_mm_late = mich_avg_mm_late.to_frame(name=COLS["avg_mm_late"])

# Get mean and standard deviation
mich_mu = mich_avg_mm_late_describe["center"]["mean"]
mich_sigma = mich_avg_mm_late_describe["spread"]["std"]

# Get max value (for x-axis ticks); pad max value for chart display
mich_max_val = mich_avg_mm_late_describe["position"]["max"]
mich_max_val_ceil = (np.ceil(mich_max_val / 10) * 10).astype(int)

# Create bins
mich_mm_late, mich_bins, mich_num_bins, mich_bin_width = frm.create_bins(mich_avg_mm_late, COLS["avg_mm_late"], 5)

# Bin the data
chrt_data = frm.bin_data(mich_mm_late, COLS["avg_mm_late"], mich_bins)
# chrt_data
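
The internals of `frm.create_bins()` and `frm.bin_data()` are not shown here. As a hedged illustration of the general idea, the sketch below counts values in fixed-width bins of `5` minutes and pads the axis maximum up to the next multiple of `10`. The `bin_values` helper and the sample values are hypothetical.

```python
import math
from collections import Counter


def bin_values(values, bin_width):
    """Count values in fixed-width bins.

    Returns a list of (bin_start, bin_center, count) tuples sorted by bin start.
    """
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return [(start, start + bin_width / 2, counts[start]) for start in sorted(counts)]


# Illustrative average-minutes-late values
sample = [16.0, 37.0, 38.5, 40.0, 40.0, 45.0, 58.0]
binned = bin_values(sample, 5)
for start, center, count in binned:
    print(start, center, count)

# Pad the observed maximum (177 minutes) up to the next multiple of 10
x_axis_max = math.ceil(177.0 / 10) * 10
print(x_axis_max)  # 180
```

Each bin covers the half-open interval from its start up to, but not including, the next bin's start, so a value of `40.0` falls in the `40` bin rather than the `35` bin.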

In [15]:
#hidden tests are within this cell

#### 2.3.2 Generate the histogram

In [16]:
# Chart title
title_txt = f"Amtrak {SVC['mich']} Service Late Detraining Passengers"
title = ttl.format_title(mich_stats, title_txt)

# Tooltips
tooltip_config = [
    {"shorthand": "bin_center:Q", "title": "Average Minutes Late", "format": None},
    {"shorthand": "count:Q", "title": "Late Arrivals Count", "format": None},
]

# Create and display the histogram
chart = hst.create_histogram(
    frame=chrt_data,
    x_shorthand="bin_center:Q",
    x_title="Average Minutes Late",
    y_shorthand="count:Q",
    y_title="Late Arrivals Count",
    y_stack=False,
    line_shorthand="Avg Min Late:Q",
    mu=mich_mu,
    sigma=mich_sigma,
    num_bins=mich_num_bins,
    bin_width=mich_bin_width,
    x_tick_count_max=mich_max_val_ceil,
    bar_color=COLORS["amtk_blue"],
    mu_color=COLORS["amtk_red"],
    sigma_color=COLORS["anth_gray"],
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 3.0 Michigan sub services: on-time performance metrics (entire period) [1 pt]

Call the appropriate `amtk_detrain` function and pass it the arguments required to return Michigan
service summary statistics grouped by sub service. Assign the return value of the function call to a
variable named `mich_sub_svcs_stats`.

In [17]:
# YOUR CODE HERE
mich_sub_svcs_stats = detrn.get_sum_stats_by_group(
    mich,
    [COLS["sub_svc"]],
    AGG["columns"],
    AGG["funcs"],
)
mich_sub_svcs_stats

Unnamed: 0,Sub Service,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,Blue Water,220,436721,1985.0955,527.0,4256.8445,139527,634.2136,150.0,1592.8122,0.3195,51.7512,297194
1,Pere Marquette,88,238206,2706.8864,761.0,3651.0593,43311,492.1705,169.0,651.5801,0.1818,40.7361,194895
2,Wolverine,771,1093600,1418.4176,402.0,3084.1497,370922,481.0921,135.0,1010.5961,0.3392,49.2532,722678
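
A quick consistency check on the grouped output: the sub service totals should add up to the Michigan service totals computed earlier, and each late ratio should equal late passengers divided by total passengers. The sketch below uses values copied from the two summary tables; the dictionary layout is only a convenience for this example.

```python
# Totals copied from the grouped summary statistics
sub_svc_totals = {
    "Blue Water": {"total": 436_721, "late": 139_527},
    "Pere Marquette": {"total": 238_206, "late": 43_311},
    "Wolverine": {"total": 1_093_600, "late": 370_922},
}

# Late-to-total ratio per sub service, rounded to four places
late_ratios = {
    name: round(vals["late"] / vals["total"], 4)
    for name, vals in sub_svc_totals.items()
}

# Sub service sums should reconcile with the service-level sums
service_total = sum(vals["total"] for vals in sub_svc_totals.values())
service_late = sum(vals["late"] for vals in sub_svc_totals.values())

print(late_ratios)
print(service_total, service_late)  # 1768527 553760
```

The _Pere Marquette_ ratio is roughly half that of the other two sub services, which is worth keeping in mind when reading the layered histogram.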


In [18]:
#hidden tests are within this cell

### 3.1 Michigan sub services: visualize distribution of mean late arrival times

Visualize mean late arrival times for the entire period. The data is binned prior to plotting.

#### 3.1.1 Retrieve each sub service [3 pts]

Call the appropriate `amtk_network` function to retrieve the performance data for each
Michigan sub service (_Blue Water_, _Pere Marquette_, and _Wolverine_). Assign the return value of
each function call to the following variables:

1. _Blue Water_: `blwtr`
2. _Pere Marquette_: `prmrq`
3. _Wolverine_: `wolv`

In [19]:
# Assign Blue Water here
blwtr = ntwk.by_sub_service(trains, "Blue Water")

In [20]:
#hidden tests are within this cell

In [21]:
# Assign Pere Marquette here
prmrq = ntwk.by_sub_service(trains, "Pere Marquette")

In [22]:
#hidden tests are within this cell

In [23]:
# Assign Wolverine here
wolv = ntwk.by_sub_service(trains, "Wolverine")

In [24]:
#hidden tests are within this cell


#### 3.1.2 Create chart data

In [25]:
# List of sub-services and their mappings
sub_svcs = [
    {"sub_svc": SUB_SVC["blwtr"], "frame": blwtr, "color": COLORS["blue"], "order": 2},
    {"sub_svc": SUB_SVC["prmrq"], "frame": prmrq, "color": COLORS["amtk_red"], "order": 3},
    {"sub_svc": SUB_SVC["wolv"], "frame": wolv, "color": COLORS["amtk_blue"], "order": 1},
]

# Create a three-column DataFrame comprising average late times for each sub-service
mich_sub_svcs = pd.DataFrame({
    sub_svc["sub_svc"]: sub_svc["frame"][COLS["late_detrn_avg_mm_late"]]
    .dropna()
    .reset_index(drop=True)
    for sub_svc in sub_svcs
})

# print(mich_sub_svcs.head())

# Melt the DataFrame for charting purposes
chrt_data = pd.melt(
    mich_sub_svcs,
    var_name="Sub Service",
    value_name="Average Minutes Late",
)

# print(chrt_data.head())

# Histogram color and order mappings
hst_colors = {sub_svc["sub_svc"]: sub_svc["color"] for sub_svc in sub_svcs}
hst_order = {sub_svc["sub_svc"]: sub_svc["order"] for sub_svc in sub_svcs}

# Enforce the layering order
chrt_data["order"] = chrt_data[COLS["sub_svc"]].map(hst_order)
# chrt_data

#### 3.1.3 Generate histogram

In [26]:
# Chart title
title = ttl.format_title(mich_stats, f"Amtrak {SVC['mich']} Service Late Detraining Passengers")

# Tooltip configuration
tooltip_config = [
    {"shorthand": "Sub Service:N", "title": "Sub Service", "format": None},
    {"shorthand": "bin_range:N", "title": "Average Minutes Late (range)", "format": None},
    {"shorthand": "mean_late:Q", "title": "Average Minutes Late (mean)", "format": ".3f"},
    {"shorthand": "count:Q", "title": "Late Arrivals Count", "format": None},
]

chart = hstl.create_layered_histogram(
    frame=chrt_data,
    x_shorthand="bin_start:Q",
    x_title="Average Minutes Late",
    x_tick_count_max=mich_max_val_ceil,
    x2_shorthand="bin_end:Q",
    y_shorthand="count:Q",
    y_title="Late Arrivals Count",
    y_stack=False,
    line_shorthand="Avg Min Late:Q",
    mu=mich_mu,
    sigma=mich_sigma,
    max_bins=mich_num_bins,
    bin_step=5,
    hst_order_shorthand="order:O",
    hst_color_shorthand="Sub Service:N",
    hst_colors=hst_colors,
    mu_color=COLORS["amtk_red"],
    sigma_color=COLORS["anth_gray"],
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 4.0 Michigan _Blue Water_ service (Chicago, IL - Port Huron, MI)

The _Blue Water_ operates between [Chicago Union Station](https://www.amtrak.com/stations/chi), Chicago, IL ([CHI](https://www.amtrak.com/stations/chi))
and Port Huron, MI ([PTH](https://www.amtrak.com/stations/pth)). Intermediate stops include
Kalamazoo, MI ([KAL](https://www.amtrak.com/stations/kal)),
Battle Creek, MI ([BTL](https://www.amtrak.com/stations/btl)),
East Lansing, MI ([LNS](https://www.amtrak.com/stations/lns)),
and Flint, MI ([FLN](https://www.amtrak.com/stations/fln)), among other towns and cities.

### 4.1 _Blue Water_: on-time performance metrics (entire period) [2 pts]

_Blue Water_ performance data is a compilation of quarterly metrics that focus on late
detraining passengers. Detraining passengers are considered on-time if they arrive at their
destination no later than fifteen (`15`) minutes after their scheduled arrival time. All other
detraining passengers are considered late.

Retrieve the _Blue Water_ row from the `mich_sub_svcs_stats` DataFrame. Call the appropriate
`DataFrame` method to convert the row to a `Series`. Assign the return value to a variable named
`blwtr_stats`.

In [27]:
blwtr_stats = mich_sub_svcs_stats[mich_sub_svcs_stats["Sub Service"] == "Blue Water"].squeeze(axis=0)
blwtr_stats

Sub Service                                    Blue Water
Train Arrivals                                        220
Total Detraining Customers sum                     436721
Total Detraining Customers mean                 1985.0955
Total Detraining Customers median                   527.0
Total Detraining Customers std                  4256.8445
Late Detraining Customers sum                      139527
Late Detraining Customers mean                   634.2136
Late Detraining Customers median                    150.0
Late Detraining Customers std                   1592.8122
Late to Total Detraining Customers Ratio           0.3195
Late Detraining Customers Avg Min Late mean       51.7512
Total On Time Detraining Customers sum             297194
Name: 0, dtype: object

In [28]:
#hidden tests are within this cell

In [29]:
# Total train arrivals
blwtr_trn_arrivals = blwtr_stats["Train Arrivals"]
# blwtr_trn_arrivals

# Detraining totals
blwtr_detrn = blwtr_stats[f"{COLS['total_detrn']} sum"]
blwtr_detrn_late = blwtr_stats[f"{COLS['late_detrn']} sum"]
blwtr_detrn_on_time = blwtr_detrn - blwtr_detrn_late

print(
    f"Train Arrivals: {blwtr_trn_arrivals}",
    f"Total Detraining Customers: {blwtr_detrn}",
    f"Late Detraining Customers: {blwtr_detrn_late}",
    f"On-Time Detraining Customers: {blwtr_detrn_on_time}",
    sep="\n",
)

Train Arrivals: 220
Total Detraining Customers: 436721
Late Detraining Customers: 139527
On-Time Detraining Customers: 297194


In [30]:
#hidden tests are within this cell

### 4.2 _Blue Water_ trains [1 pt]

Each _Blue Water_ train is identified by a unique train number.

Create a `DataFrame` named `blwtr_trns` that contains one row for each train comprising the
_Blue Water_ service. Include the following columns in the `DataFrame` in the order specified:

1. "Service Line"
2. "Service"
3. "Sub Service"
4. "Route Miles"
5. "Train Number"

Reset the index (set `drop=True`) when creating the new `DataFrame`.

In [31]:
# YOUR CODE HERE
# Select the columns, keep one row per train number, then sort and reset the index
blwtr_trns = (
    blwtr[["Service Line", "Service", "Sub Service", "Route Miles", "Train Number"]]
    .drop_duplicates(subset="Train Number")
    .sort_values(by="Train Number")
    .reset_index(drop=True)
)
blwtr_trns

Unnamed: 0,Service Line,Service,Sub Service,Route Miles,Train Number
0,State Supported,Michigan,Blue Water,319,364
1,State Supported,Michigan,Blue Water,319,365


In [32]:
#hidden tests are within this cell

### 4.3 _Blue Water_: mean late arrival times [2 pts]

In an earlier notebook, a _simple_ least-squares linear regression was fitted to estimate the
linear relationship between mean late arrival times and distance traveled (i.e., route miles). The
model suggested that with every additional route mile, the average minutes late for late detraining
passengers increases by approximately `0.0338` minutes. The _R_² value indicated that around
`25.5%` of the variability in the average minutes late could be explained by the number of route
miles traveled, with the remaining variability due to other factors or random noise.

For _Blue Water_ late detraining passengers traveling the entire route, the model predicts a mean
late arrival time of approximately `39.29` minutes.

Retrieve the _Blue Water_ row from the `predictions` DataFrame. Call the appropriate `DataFrame`
method to convert the row to a `Series`. Assign the return value to a variable named
`blwtr_predicted`.

In [33]:
# YOUR CODE HERE
blwtr_predicted = predictions[predictions["Sub Service"] == "Blue Water"].squeeze(axis=0)
blwtr_predicted

Route Miles                      319
Predicted Avg Min Late         39.29
Sub Service               Blue Water
Name: 29, dtype: object

In [34]:
#hidden tests are within this cell

Contrast this prediction with the actual mean late arrival times experienced by _Blue Water_ late detraining passengers.

In [35]:
# Drop missing values
blwtr_avg_mm_late = blwtr[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Call the custom frm.describe_numeric_column() function
blwtr_avg_mm_late_describe = frm.describe_numeric_column(blwtr_avg_mm_late)
blwtr_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(205),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(51.75121951219512),
  'median': 45.0,
  'mode': np.float64(43.0)},
 'position': {'min': 19.0,
  '25%': np.float64(36.0),
  '50%': np.float64(45.0),
  '75%': np.float64(64.0),
  'max': 123.0},
 'spread': {'variance': 448.51133428981353,
  'std': 21.17808618099883,
  'range': 104.0,
  'iqr': np.float64(28.0)},
 'shape': {'skewness': np.float64(1.029988039569114),
  'kurtosis': np.float64(0.5463999664567885)}}
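
The gap between the model's prediction and the observed mean can be worked out directly. In the sketch below, the slope, route miles, prediction, and observed mean are copied from this notebook. The intercept is not reported here, so `implied_intercept` is back-calculated from the rounded slope and should be read as an approximation, not as the fitted model's actual coefficient.

```python
SLOPE_MIN_PER_MILE = 0.0338        # approximate slope from the earlier model
ROUTE_MILES = 319                  # Blue Water route miles
PREDICTED_AVG_MIN_LATE = 39.29     # from the predictions DataFrame
ACTUAL_AVG_MIN_LATE = 51.75121951219512  # observed mean from the describe output

# Back-calculated (assumed) intercept: prediction minus the slope contribution
implied_intercept = PREDICTED_AVG_MIN_LATE - SLOPE_MIN_PER_MILE * ROUTE_MILES

# How far the observed mean sits above the prediction
residual = ACTUAL_AVG_MIN_LATE - PREDICTED_AVG_MIN_LATE
pct_over = residual / PREDICTED_AVG_MIN_LATE * 100

print(f"Implied intercept: {implied_intercept:.2f} min")  # 28.51
print(f"Residual: {residual:.2f} min")                    # 12.46
print(f"Observed mean exceeds prediction by {pct_over:.1f}%")  # 31.7%
```

An underestimate of about 12 minutes is consistent with the modest _R_² reported for the model: route miles alone explain only part of the variability in late arrival times.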

In [None]:
#hidden tests are within this cell

### 4.4 _Blue Water_: eastbound and westbound routes [1 pt]

Stations served by eastbound and westbound _Blue Water_ trains.

In [50]:
# Retrieve the sub service from the Amtrak sub services list
blwtr_sub_svc = next(
    (sub_svc for sub_svc in amtk_sub_svcs if sub_svc["sub service"] == SUB_SVC["blwtr"])
)
blwtr_stn_codes = blwtr_sub_svc["station codes"]
blwtr_stns = stations[stations[COLS["station_code"]].isin(blwtr_stn_codes)].reset_index(drop=True)
blwtr_stns.sort_values(by=COLS["lon"], inplace=True)
blwtr_stns

Unnamed: 0,Arrival Station Code,Arrival Station,Arrival Station Type,City,Address 01,Address 02,ZIP Code,State,Division,Region,Country,Latitude,Longitude
1,CHI,Chicago (Union Station),Station Building (with waiting room),Chicago,255 South Clinton Street,,60606,Illinois,East North Central,Midwest,United States,41.878992,-87.641015
8,NBU,New Buffalo,Platform with Shelter,New Buffalo,226 North Whittaker Street,,49117,Michigan,East North Central,Midwest,United States,41.796656,-86.745782
9,NLS,Niles,Station Building (with waiting room),Niles,598 Dey Street,,49120,Michigan,East North Central,Midwest,United States,41.837412,-86.252372
2,DOA,Dowagiac,Station Building (with waiting room),Dowagiac,200 Depot Drive,,49047,Michigan,East North Central,Midwest,United States,41.980941,-86.109041
5,KAL,Kalamazoo,Station Building (with waiting room),Kalamazoo,459 North Burdick Street,,49007,Michigan,East North Central,Midwest,United States,42.295255,-85.584018
0,BTL,Battle Creek,Station Building (with waiting room),Battle Creek,119 McCamly Street South,,49017,Michigan,East North Central,Midwest,United States,42.318453,-85.187825
6,LNS,East Lansing,Station Building (with waiting room),East Lansing,1240 South Harrison Road,,48823,Michigan,East North Central,Midwest,United States,42.718731,-84.495962
3,DRD,Durand,Station Building (with waiting room),Durand,200 South Railroad Street,,48429,Michigan,East North Central,Midwest,United States,42.909495,-83.98232
4,FLN,Flint,Station Building (with waiting room),Flint,1407 South Dort Highway,,48503,Michigan,East North Central,Midwest,United States,43.015425,-83.651728
7,LPE,Lapeer,Station Building (with waiting room),Lapeer,73 Howard Street,,48446,Michigan,East North Central,Midwest,United States,43.04952,-83.306154
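
One caveat about the lookup pattern used above: calling `next()` on a generator without a default raises `StopIteration` when nothing matches. The sketch below shows a slightly more defensive variant that returns `None` instead. The `find_sub_service` helper is hypothetical, and the records are abbreviated stand-ins for the entries in `amtk_sub_services.json`.

```python
# Abbreviated, illustrative records; the real file holds more keys and stations
sample_sub_svcs = [
    {"sub service": "Blue Water", "station codes": ["CHI", "KAL", "PTH"]},
    {"sub service": "Wolverine", "station codes": ["CHI", "KAL"]},
]


def find_sub_service(records, name):
    """Return the first record whose 'sub service' equals name, else None."""
    return next((rec for rec in records if rec["sub service"] == name), None)


found = find_sub_service(sample_sub_svcs, "Blue Water")
missing = find_sub_service(sample_sub_svcs, "No Such Train")

print(found["station codes"])  # ['CHI', 'KAL', 'PTH']
print(missing)                 # None
```

Returning `None` lets the caller decide how to handle a missing sub service, for example by raising a clearer error message.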


Station order for eastbound and westbound _Blue Water_ trains.

In [54]:
blwtr_stn_order_eb = blwtr_sub_svc["station order"]["eastbound"]
blwtr_stn_order_wb = blwtr_sub_svc["station order"]["westbound"]
blwtr_stn_order_eb, blwtr_stn_order_wb

({'CHI': 0,
  'NBU': 1,
  'NLS': 2,
  'DOA': 3,
  'KAL': 4,
  'BTL': 5,
  'LNS': 6,
  'DRD': 7,
  'FLN': 8,
  'LPE': 9,
  'PTH': 10},
 {'PTH': 0,
  'LPE': 1,
  'FLN': 2,
  'DRD': 3,
  'LNS': 4,
  'BTL': 5,
  'KAL': 6,
  'DOA': 7,
  'NLS': 8,
  'NBU': 9,
  'CHI': 10})
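
The westbound ordering is simply the eastbound ordering reversed. As a small sketch of how such mappings can be derived and used to sort stations by direction of travel, consider the following; the `reverse_order` helper is hypothetical, and the eastbound mapping is copied from the output above.

```python
eastbound = {
    "CHI": 0, "NBU": 1, "NLS": 2, "DOA": 3, "KAL": 4, "BTL": 5,
    "LNS": 6, "DRD": 7, "FLN": 8, "LPE": 9, "PTH": 10,
}


def reverse_order(order):
    """Return a new mapping with station positions reversed."""
    last = max(order.values())
    by_position_desc = sorted(order.items(), key=lambda item: item[1], reverse=True)
    return {code: last - position for code, position in by_position_desc}


westbound = reverse_order(eastbound)

# Sort an arbitrary set of station codes in westbound travel order
stops = sorted(["KAL", "CHI", "FLN"], key=westbound.get)
print(stops)  # ['FLN', 'KAL', 'CHI']
```

Storing both orderings explicitly, as the JSON file does, avoids recomputing them, but deriving one from the other is a useful check that the two stay in sync.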

In [None]:
#hidden tests are within this cell

### 4.5 _Blue Water_: eastbound detraining passengers summary statistics

#### 4.5.1 _Blue Water_ Train 364 [1 pt]

In [56]:
# Base columns for routes
rte_cols = [
    COLS["trn"],
    COLS["station_code"],
    COLS["station"],
    COLS["state"],
    COLS["lat"],
    COLS["lon"],
]

# Train 364 eastbound
amtk_364 = ntwk.by_train_number(trains, 364)
amtk_364_rte = ntwk.create_route(amtk_364, TRN["364"]["direction"], blwtr_stn_order_eb)
amtk_364_rte_stats = detrn.get_route_sum_stats(
    amtk_364_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_364_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,364,NBU,New Buffalo,Michigan,41.796656,-86.745782,11,9288,844.3636,713.0,374.532,3755,341.3636,350.0,168.4994,0.4043,49.7273,5533
1,364,NLS,Niles,Michigan,41.837412,-86.252372,11,3967,360.6364,351.0,125.0882,1649,149.9091,168.0,50.6131,0.4157,55.0909,2318
2,364,DOA,Dowagiac,Michigan,41.980941,-86.109041,11,3051,277.3636,257.0,122.4143,1302,118.3636,110.0,66.9601,0.4267,54.2727,1749
3,364,KAL,Kalamazoo,Michigan,42.295255,-85.584018,11,23908,2173.4545,2327.0,466.517,10569,960.8182,950.0,254.7343,0.4421,55.7273,13339
4,364,BTL,Battle Creek,Michigan,42.318453,-85.187825,11,10605,964.0909,953.0,238.6191,5146,467.8182,455.0,162.7912,0.4852,53.2727,5459
5,364,LNS,East Lansing,Michigan,42.718731,-84.495962,11,85759,7796.2727,7876.0,831.2323,24850,2259.0909,2197.0,896.3483,0.2898,64.7273,60909
6,364,DRD,Durand,Michigan,42.909495,-83.98232,11,13861,1260.0909,1270.0,211.336,4052,368.3636,316.0,138.4755,0.2923,64.7273,9809
7,364,FLN,Flint,Michigan,43.015425,-83.651728,11,31196,2836.0,2888.0,465.1552,8965,815.0,637.0,307.9659,0.2874,68.9091,22231
8,364,LPE,Lapeer,Michigan,43.04952,-83.306154,11,9972,906.5455,921.0,138.3477,2551,231.9091,193.0,78.7318,0.2558,72.8182,7421
9,364,PTH,Port Huron,Michigan,42.960419,-82.443805,11,16708,1518.9091,1588.0,487.7802,3899,354.4545,275.0,169.5402,0.2334,69.0,12809


In [None]:
#hidden tests are within this cell

##### 4.5.1.1 Write to file [1 pt]

Write `amtk_364_rte_stats` to a CSV file named `stu-amtk_364_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_364_rte_stats.csv` file.
It must match line for line, character for character.

In [58]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_364_rte_stats.csv")
amtk_364_rte_stats.to_csv(filepath, index=False)
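
The comparison against the fixture file can be automated. The following is a minimal sketch of a line-by-line check using only the standard library; the `first_mismatch` helper is hypothetical, and the demonstration writes two small temporary files instead of reading the real `stu-` and `fxt-` files.

```python
import tempfile
from pathlib import Path


def first_mismatch(path_a, path_b):
    """Return the 1-based number of the first differing line, or None if identical."""
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines(keepends=True)
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines(keepends=True)
    for num, (line_a, line_b) in enumerate(zip(lines_a, lines_b), start=1):
        if line_a != line_b:
            return num
    if len(lines_a) != len(lines_b):
        # One file is a prefix of the other; the mismatch is the first extra line
        return min(len(lines_a), len(lines_b)) + 1
    return None


with tempfile.TemporaryDirectory() as tmp_dir:
    stu_path = Path(tmp_dir) / "stu.csv"
    fxt_path = Path(tmp_dir) / "fxt.csv"

    stu_path.write_text("Train Number,Arrival Station Code\n364,NBU\n", encoding="utf-8")
    fxt_path.write_text("Train Number,Arrival Station Code\n364,NBU\n", encoding="utf-8")
    same_result = first_mismatch(stu_path, fxt_path)

    fxt_path.write_text("Train Number,Arrival Station Code\n364,NLS\n", encoding="utf-8")
    diff_result = first_mismatch(stu_path, fxt_path)

print(same_result)  # None
print(diff_result)  # 2
```

Reporting the first differing line number makes it easier to spot issues such as an unintended index column or a rounding difference.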

In [None]:
#hidden tests are within this cell

### 4.6 _Blue Water_ eastbound mean late arrival times

#### 4.6.1 _Blue Water_ Train 364 [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of train 364.

In [60]:
# Drop missing values
amtk_364_avg_mm_late = amtk_364[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_364_avg_mm_late_describe = frm.describe_numeric_column(amtk_364_avg_mm_late)
amtk_364_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(110),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(60.82727272727273),
  'median': 57.0,
  'mode': np.float64(67.0)},
 'position': {'min': 27.0,
  '25%': np.float64(44.0),
  '50%': np.float64(57.0),
  '75%': np.float64(75.0),
  'max': 123.0},
 'spread': {'variance': 474.29099249374497,
  'std': 21.77822289567597,
  'range': 96.0,
  'iqr': np.float64(31.0)},
 'shape': {'skewness': np.float64(0.6697712494095664),
  'kurtosis': np.float64(-0.20750103718918478)}}
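
The overall mean of roughly `60.83` minutes can be reconciled with the per-station means in the route summary. Every station shows `11` train arrivals and the describe output reports `110` non-null values, so each station contributes equally and the overall mean equals the simple mean of the ten station means. The values below are copied from the `Late Detraining Customers Avg Min Late mean` column.

```python
from statistics import fmean

# Per-station mean of average minutes late for train 364 (eastbound order)
station_means = {
    "NBU": 49.7273, "NLS": 55.0909, "DOA": 54.2727, "KAL": 55.7273,
    "BTL": 53.2727, "LNS": 64.7273, "DRD": 64.7273, "FLN": 68.9091,
    "LPE": 72.8182, "PTH": 69.0,
}

# Equal weights are valid only because each station has the same count (11)
overall_mean = fmean(station_means.values())
print(round(overall_mean, 2))  # 60.83

# Station with the highest mean lateness
worst_station = max(station_means, key=station_means.get)
print(worst_station)  # LPE
```

If the stations had different numbers of non-missing observations, a weighted mean would be needed instead.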

In [None]:
#hidden tests are within this cell

#### 4.6.2 Generate box plots

##### 4.6.2.1 Assemble the chart data

In [61]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Chart data
chrt_data = detrn.get_qtr_avg_min_late(
    amtk_364_rte, cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
)
chrt_data

Unnamed: 0,Fiscal Year Quarter,Late Detraining Customers Avg Min Late,Color
0,2022Q1,30.0,#ef3824
1,2022Q1,39.0,#ef3824
2,2022Q1,40.0,#ef3824
3,2022Q1,32.0,#ef3824
4,2022Q1,32.0,#ef3824
...,...,...,...
105,2024Q3,43.0,#ef3824
106,2024Q3,43.0,#ef3824
107,2024Q3,45.0,#ef3824
108,2024Q3,49.0,#ef3824


##### 4.6.2.2 Preaggregate the data

In [62]:
# Base columns for aggregation statistics
cols = [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]

# Pre-aggregate the data
chrt_data = frm.aggregate_data(chrt_data, cols)
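
The implementation of `frm.aggregate_data()` is not shown in this notebook. As a rough sketch of what pre-aggregating box plot data typically involves, the function below computes quartiles, whisker ends at `1.5` times the interquartile range, and outliers for a single group. The `box_stats` helper and the sample values are hypothetical.

```python
from statistics import quantiles


def box_stats(values, whisker=1.5):
    """Return quartiles, whisker ends, and outliers for one group of values."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower": inside[0],   # smallest value within the fences
        "upper": inside[-1],  # largest value within the fences
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Illustrative average-minutes-late values for one fiscal quarter
quarter_values = [30.0, 32.0, 32.0, 39.0, 40.0, 43.0, 45.0, 49.0, 123.0]
stats = box_stats(quarter_values)
print(stats)
```

Computing these values once per quarter keeps the chart specification small, since only the summary rows and the outlier points need to be passed to the plotting library.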

##### 4.6.2.3 Generate chart

In [63]:
# Create chart title
txt = TRN["364"]
title_txt = (
    f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
    f"{txt['route']} ({txt['direction']})"
)
title = ttl.format_title(amtk_364_rte_stats, title_txt)

# Create and display the vertical boxplot
chart_vertical = boxp.create_boxplot(
    data=chrt_data,
    x_shorthand="Fiscal Year Quarter:N",
    x_title="Period",
    y_shorthand="Late Detraining Customers Avg Min Late:Q",
    y_title="Average Minutes Late",
    box_size=20,
    outlier_shorthand="outliers:Q",
    color_shorthand="Color:N",
    chart_title=title,
    orient=boxp.Orient.VERTICAL,
)
chart_vertical.display()

### 4.7 _Blue Water_: visualize eastbound mean late arrival times by station

In [64]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['blwtr']} Service Late Detraining Passengers (2022 Q1 - 2024 Q3)"
title = ttl.format_title(amtk_364_rte_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = amtk_364_rte_stats.index.tolist()

# Custom line colors
line_colors = {364: COLORS["amtk_blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean:Q",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart(
    frame=amtk_364_rte_stats,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=75,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

### 4.8 _Blue Water_: westbound detraining passengers summary statistics

#### 4.8.1 _Blue Water_ Train 365 [1 pt]

Review the code used earlier to generate summary statistics for train 364. Then use
functions available in the `amtk_network` and `amtk_detrain` modules to create three new
`DataFrame` objects named `amtk_365`, `amtk_365_rte`, and `amtk_365_rte_stats`, respectively.

In [67]:
# YOUR CODE HERE
# Base columns for routes
rte_cols = [
    COLS["trn"],
    COLS["station_code"],
    COLS["station"],
    COLS["state"],
    COLS["lat"],
    COLS["lon"],
]

# Train 365 westbound
amtk_365 = ntwk.by_train_number(trains, 365)
amtk_365_rte = ntwk.create_route(amtk_365, TRN["365"]["direction"], blwtr_stn_order_wb)
amtk_365_rte_stats = detrn.get_route_sum_stats(
    amtk_365_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_365_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,365,LPE,Lapeer,Michigan,43.04952,-83.306154,11,87,7.9091,7.0,5.1856,4,0.3636,0.0,0.9244,0.046,70.0,83
1,365,FLN,Flint,Michigan,43.015425,-83.651728,11,797,72.4545,73.0,18.8805,73,6.6364,7.0,3.5853,0.0916,38.7273,724
2,365,DRD,Durand,Michigan,42.909495,-83.98232,11,185,16.8182,15.0,8.0848,11,1.0,1.0,1.2649,0.0595,39.1667,174
3,365,LNS,East Lansing,Michigan,42.718731,-84.495962,11,3139,285.3636,279.0,39.8855,163,14.8182,14.0,10.4864,0.0519,42.3,2976
4,365,BTL,Battle Creek,Michigan,42.318453,-85.187825,11,2538,230.7273,225.0,22.874,203,18.4545,14.0,11.0215,0.08,40.2727,2335
5,365,KAL,Kalamazoo,Michigan,42.295255,-85.584018,11,8703,791.1818,762.0,156.1818,1123,102.0909,98.0,57.3018,0.129,42.4545,7580
6,365,DOA,Dowagiac,Michigan,41.980941,-86.109041,11,1449,131.7273,119.0,22.8521,268,24.3636,19.0,12.3957,0.185,40.5455,1181
7,365,NLS,Niles,Michigan,41.837412,-86.252372,11,3625,329.5455,322.0,69.4872,784,71.2727,68.0,29.2339,0.2163,42.5455,2841
8,365,NBU,New Buffalo,Michigan,41.796656,-86.745782,11,3193,290.2727,240.0,109.8345,763,69.3636,69.0,41.3262,0.239,39.0,2430
9,365,CHI,Chicago (Union Station),Illinois,41.878992,-87.641015,11,204690,18608.1818,18972.0,3427.3903,69397,6308.8182,4687.0,3377.0266,0.339,40.0909,135293


In [None]:
#hidden tests are within this cell

##### 4.8.1.1 Write to file [1 pt]

Write `amtk_365_rte_stats` to a CSV file named `stu-amtk_365_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_365_rte_stats.csv` file.
It must match line for line, character for character.
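
A line-for-line comparison can be checked programmatically. The following is a minimal sketch, not the notebook's own test: the file names, the `files_match` helper, and the three-column frame are illustrative assumptions, and the comparison ignores platform line-ending differences.

```python
import pathlib
import tempfile

import pandas as pd

# Illustrative frame (assumption): a few columns from two train 365 rows
frame = pd.DataFrame(
    {
        "Train Number": [365, 365],
        "Arrival Station Code": ["LPE", "FLN"],
        "Train Arrivals": [11, 11],
    }
)


def files_match(path_a: pathlib.Path, path_b: pathlib.Path) -> bool:
    """Return True if both text files hold identical lines in identical order."""
    lines_a = path_a.read_text(encoding="utf-8").splitlines()
    lines_b = path_b.read_text(encoding="utf-8").splitlines()
    return lines_a == lines_b


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-ins for the stu- and fxt- files
    stu_path = pathlib.Path(tmp_dir, "stu-example.csv")
    fxt_path = pathlib.Path(tmp_dir, "fxt-example.csv")

    frame.to_csv(stu_path, index=False)
    frame.to_csv(fxt_path, index=False)
    same = files_match(stu_path, fxt_path)

    # Writing the index adds a leading column, so the files no longer match
    frame.to_csv(stu_path, index=True)
    different = files_match(stu_path, fxt_path)

print(same, different)
```

The second comparison shows why `index=False` matters when the fixture file was written without an index column.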

In [68]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_365_rte_stats.csv")
amtk_365_rte_stats.to_csv(filepath, index=False)

In [None]:
#hidden tests are within this cell

### 4.9 _Blue Water_: westbound mean late arrival times

#### 4.9.1 _Blue Water_ Train 365 [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of
train 365.
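
The measures grouped under center, position, spread, and shape can be reproduced with plain pandas. This sketch only approximates what `frm.describe_numeric_column()` might compute, since its implementation is not shown here. The sample values are made up, and pandas defaults are assumed (sample variance, linear-interpolated quantiles, excess kurtosis).

```python
import pandas as pd

# Made-up sample of average minutes late (assumption, not project data)
sample = pd.Series([25.0, 31.0, 36.0, 36.0, 43.0, 50.0], name="Avg Min Late")

q1, q3 = sample.quantile(0.25), sample.quantile(0.75)

summary = {
    "center": {
        "mean": sample.mean(),
        "median": sample.median(),
        "mode": sample.mode().iloc[0],  # first mode if there are ties
    },
    "position": {"min": sample.min(), "25%": q1, "75%": q3, "max": sample.max()},
    "spread": {
        "variance": sample.var(),  # sample variance (ddof=1)
        "std": sample.std(),
        "range": sample.max() - sample.min(),
        "iqr": q3 - q1,
    },
    "shape": {"skewness": sample.skew(), "kurtosis": sample.kurt()},
}

for group, measures in summary.items():
    print(group, measures)
```

A positive skewness, as in the train 365 output, indicates a longer right tail: a few unusually late arrivals pull the mean above the median.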

In [69]:
# Drop missing values
amtk_365_avg_mm_late = amtk_365[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_365_avg_mm_late_describe = frm.describe_numeric_column(amtk_365_avg_mm_late)
amtk_365_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(95),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(41.242105263157896),
  'median': 39.0,
  'mode': np.float64(36.0)},
 'position': {'min': 19.0,
  '25%': np.float64(30.5),
  '50%': np.float64(39.0),
  '75%': np.float64(45.5),
  'max': 106.0},
 'spread': {'variance': 215.37693169092955,
  'std': 14.675725934035752,
  'range': 87.0,
  'iqr': np.float64(15.0)},
 'shape': {'skewness': np.float64(1.7006423200028027),
  'kurtosis': np.float64(4.082604454045512)}}

In [None]:
#hidden tests are within this cell

#### 4.9.2 Generate box plots

##### 4.9.2.1 Assemble the chart data

In [70]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Chart data
chrt_data = detrn.get_qtr_avg_min_late(
    amtk_365_rte, cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
)
chrt_data

Unnamed: 0,Fiscal Year Quarter,Late Detraining Customers Avg Min Late,Color
0,2022Q1,34.0,#ef3824
1,2022Q1,40.0,#ef3824
2,2022Q1,,#ef3824
3,2022Q1,29.0,#ef3824
4,2022Q1,50.0,#ef3824
...,...,...,...
105,2024Q3,43.0,#ef3824
106,2024Q3,25.0,#ef3824
107,2024Q3,31.0,#ef3824
108,2024Q3,36.0,#ef3824


##### 4.9.2.2 Preaggregate the data
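
Pre-aggregation presumably reduces each quarter's values to the statistics a box plot draws (quartiles, whiskers, outliers), so the chart receives one row per period. The sketch below shows one common way to do that with the 1.5 × IQR rule. It is an assumption about the approach, not the code inside `frm.aggregate_data()`; the `box_stats` helper and the sample values are illustrative.

```python
import pandas as pd

# Illustrative values (assumption); column names shortened for readability
data = pd.DataFrame(
    {
        "Fiscal Year Quarter": ["2022Q1"] * 5 + ["2024Q3"] * 5,
        "Avg Min Late": [34.0, 40.0, 29.0, 50.0, 106.0, 43.0, 25.0, 31.0, 36.0, 45.0],
    }
)


def box_stats(values: pd.Series) -> dict:
    """Compute box plot statistics using the 1.5 * IQR whisker rule."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "lower": inside.min(),  # whisker ends at the last point inside the fence
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper": inside.max(),
        "outliers": outliers.tolist(),
    }


rows = []
for period, group in data.groupby("Fiscal Year Quarter"):
    rows.append({"Fiscal Year Quarter": period, **box_stats(group["Avg Min Late"])})

box_data = pd.DataFrame(rows)
print(box_data)
```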

In [71]:
# Base columns for aggregation statistics
cols = [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]

# Pre-aggregate the data
chrt_data = frm.aggregate_data(chrt_data, cols)

##### 4.9.2.3 Generate chart

In [72]:
# Create chart title
txt = TRN["365"]
title_txt = (
    f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
    f"{txt['route']} ({txt['direction']})"
)
title = ttl.format_title(amtk_365_rte_stats, title_txt)

# Create and display the vertical boxplot
chart_vertical = boxp.create_boxplot(
    data=chrt_data,
    x_shorthand="Fiscal Year Quarter:N",
    x_title="Period",
    y_shorthand="Late Detraining Customers Avg Min Late:Q",
    y_title="Average Minutes Late",
    box_size=20,
    outlier_shorthand="outliers:Q",
    color_shorthand="Color:N",
    chart_title=title,
    orient=boxp.Orient.VERTICAL,
)
chart_vertical.display()

### 4.10 _Blue Water_: visualize westbound mean late arrival times by station

In [73]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['blwtr']} Service Late Detraining Passengers (2022 Q1 - 2024 Q3)"
title = ttl.format_title(amtk_365_rte_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = amtk_365_rte_stats.index.tolist()

# Custom line colors
line_colors = {365: COLORS["amtk_blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart(
    frame=amtk_365_rte_stats,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=75,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 5.0 Michigan _Pere Marquette_ service (Chicago, IL - Grand Rapids, MI)

The _Pere Marquette_ operates daily between
[Chicago Union Station](https://www.amtrak.com/stations/chi), Chicago, IL
([CHI](https://www.amtrak.com/stations/chi)) and
Grand Rapids, MI ([GRR](https://www.amtrak.com/stations/grr)). Intermediate stops include
St. Joseph-Benton Harbor, MI ([SJM](https://www.amtrak.com/stations/sjm.html)),
Bangor, MI ([BAM](https://www.amtrak.com/stations/bam.html)), and
Holland, MI ([HOM](https://www.amtrak.com/stations/hom.html)).

### 5.1 _Pere Marquette_: on-time performance metrics (entire period) [2 pts]

_Pere Marquette_ performance data is a compilation of quarterly metrics that focus on late
detraining passengers. Detraining passengers are considered on-time if they arrive at their
destination no later than fifteen (`15`) minutes after their scheduled arrival time. All other
detraining passengers are considered late.

Retrieve the _Pere Marquette_ row from the `mich_sub_svcs_stats` DataFrame. Call the appropriate
`DataFrame` method to convert the row to a `Series`. Assign the return value to a variable named
`prmrq_stats`.
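
As a brief illustration of the row-to-`Series` conversion, the sketch below filters a small frame by name and calls `DataFrame.squeeze()`. The frame is a cut-down stand-in for `mich_sub_svcs_stats` (an assumption) that holds a few values from the outputs in this notebook.

```python
import pandas as pd

# Cut-down stand-in for mich_sub_svcs_stats (assumption)
stats = pd.DataFrame(
    {
        "Sub Service": ["Pere Marquette", "Wolverine"],
        "Train Arrivals": [88, 771],
        "Late Detraining Customers sum": [43311, 370922],
    },
    index=[1, 2],
)

# Boolean filtering returns a one-row DataFrame
row = stats[stats["Sub Service"] == "Pere Marquette"]

# squeeze(axis=0) collapses the single row into a Series indexed by column name
series = row.squeeze(axis=0)
print(type(series).__name__, series.name)
print(series)
```

Filtering on the "Sub Service" value rather than a positional label keeps the lookup valid if the row order of the frame changes.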

In [101]:
# YOUR CODE HERE
prmrq_stats = mich_sub_svcs_stats.loc[1,:].squeeze(axis=0)
prmrq_stats

Sub Service                                    Pere Marquette
Train Arrivals                                             88
Total Detraining Customers sum                         238206
Total Detraining Customers mean                     2706.8864
Total Detraining Customers median                       761.0
Total Detraining Customers std                      3651.0593
Late Detraining Customers sum                           43311
Late Detraining Customers mean                       492.1705
Late Detraining Customers median                        169.0
Late Detraining Customers std                        651.5801
Late to Total Detraining Customers Ratio               0.1818
Late Detraining Customers Avg Min Late mean           40.7361
Total On Time Detraining Customers sum                 194895
Name: 1, dtype: object

In [None]:
#hidden tests are within this cell

In [102]:
# Total train arrivals
prmrq_trn_arrivals = prmrq_stats["Train Arrivals"]
# prmrq_trn_arrivals

# Detraining totals
prmrq_detrn = prmrq_stats[f"{COLS['total_detrn']} sum"]
prmrq_detrn_late = prmrq_stats[f"{COLS['late_detrn']} sum"]
prmrq_detrn_on_time = prmrq_detrn - prmrq_detrn_late

print(
    f"Train Arrivals: {prmrq_trn_arrivals}",
    f"Total Detraining Customers: {prmrq_detrn}",
    f"Late Detraining Customers: {prmrq_detrn_late}",
    f"On-Time Detraining Customers: {prmrq_detrn_on_time}",
    sep="\n",
)

Train Arrivals: 88
Total Detraining Customers: 238206
Late Detraining Customers: 43311
On-Time Detraining Customers: 194895


In [None]:
#hidden tests are within this cell

### 5.2 _Pere Marquette_ trains [1 pt]

Each _Pere Marquette_ train is identified by a unique train number.

Create a `DataFrame` named `prmrq_trns` that contains one row for each train comprising the
_Pere Marquette_ service. Include the following columns in the `DataFrame` in the order specified:

1. "Service Line"
2. "Service"
3. "Sub Service"
4. "Route Miles"
5. "Train Number"

Reset the index (set `drop=True`) when creating the new `DataFrame`.
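
One way to reduce station-level rows to one row per train is sketched below. The `arrivals` frame is a made-up stand-in for the service data (an assumption) and only three of the five requested columns are used to keep the example short.

```python
import pandas as pd

# Made-up station-level rows (assumption): each train appears more than once
arrivals = pd.DataFrame(
    {
        "Train Number": [371, 370, 371, 370],
        "Arrival Station Code": ["CHI", "GRR", "SJM", "HOM"],
        "Sub Service": ["Pere Marquette"] * 4,
        "Route Miles": [174] * 4,
    }
)

cols = ["Sub Service", "Route Miles", "Train Number"]

trains_unique = (
    arrivals[cols]  # select and order the columns
    .drop_duplicates(subset="Train Number")  # keep one row per train
    .sort_values(by="Train Number")
    .reset_index(drop=True)  # discard the old row labels
)
print(trains_unique)
```

Chaining the calls avoids in-place modification of an intermediate frame.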

In [39]:
# YOUR CODE HERE
prmrq_trns = prmrq.drop_duplicates(subset='Train Number', ignore_index=True)
prmrq_trns = prmrq_trns[["Service Line", "Service", "Sub Service", "Route Miles", "Train Number"]]
prmrq_trns.sort_values(by="Train Number", inplace=True, ignore_index=True)
prmrq_trns

Unnamed: 0,Service Line,Service,Sub Service,Route Miles,Train Number
0,State Supported,Michigan,Pere Marquette,174,370
1,State Supported,Michigan,Pere Marquette,174,371


In [None]:
#hidden tests are within this cell

### 5.3 _Pere Marquette_: mean late arrival times [2 pts]

In an earlier notebook, a _simple_ least-squares linear regression was fitted to estimate the
linear relationship between mean late arrival times and distance traveled (measured in route
miles). The model suggests that with every additional route mile, the average minutes late for
late detraining passengers increases by approximately `0.0338` minutes. The _R_&sup2; value
indicates that around `25.5%` of the variability in the average minutes late can be explained by
the number of route miles traveled, with the remaining variability due to other factors or random
noise.

For _Pere Marquette_ late detraining passengers traveling the entire route, the model predicts a mean
late arrival time of approximately `34.38` minutes.
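
The arithmetic behind the prediction can be checked by hand. The sketch below backs out the intercept implied by the rounded slope and prediction quoted above, then compares the prediction with the observed _Pere Marquette_ mean of `40.7361` minutes from `prmrq_stats`. Because the inputs are rounded, the intercept is approximate and is not the fitted model's actual parameter.

```python
# Values quoted in this notebook (rounded)
slope = 0.0338  # minutes late per route mile
route_miles = 174  # Pere Marquette route length
predicted = 34.38  # predicted average minutes late
observed_mean = 40.7361  # observed mean from prmrq_stats

# A simple linear model has the form: predicted = intercept + slope * route_miles
implied_intercept = predicted - slope * route_miles

# Residual: how far the observed mean sits above the prediction
residual = observed_mean - predicted

print(f"Implied intercept: {implied_intercept:.2f} minutes")
print(f"Observed minus predicted: {residual:.2f} minutes")
```

Under these rounded figures the distance term contributes only about six of the roughly 34 predicted minutes, which fits the modest _R_&sup2; reported for the model.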

Retrieve the _Pere Marquette_ row from the `predictions` DataFrame. Call the appropriate `DataFrame`
method to convert the row to a `Series`. Assign the return value to a variable named
`prmrq_predicted`.

In [40]:
# YOUR CODE HERE
prmrq_predicted = predictions[predictions["Sub Service"] == "Pere Marquette"].squeeze(axis=0)
prmrq_predicted

Route Miles                          174
Predicted Avg Min Late             34.38
Sub Service               Pere Marquette
Name: 13, dtype: object

In [None]:
#hidden tests are within this cell

Contrast this prediction with the actual mean late arrival times experienced by _Pere Marquette_ late detraining passengers.

In [42]:
# Drop missing values
prmrq_avg_mm_late = prmrq[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Call the custom frm.describe_numeric_column() function
prmrq_avg_mm_late_describe = frm.describe_numeric_column(prmrq_avg_mm_late)
prmrq_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(72),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(40.736111111111114),
  'median': 38.0,
  'mode': np.float64(45.0)},
 'position': {'min': 16.0,
  '25%': np.float64(31.0),
  '50%': np.float64(38.0),
  '75%': np.float64(45.0),
  'max': 93.0},
 'spread': {'variance': 236.64769170579044,
  'std': 15.383357621331905,
  'range': 77.0,
  'iqr': np.float64(14.0)},
 'shape': {'skewness': np.float64(1.3359714911625042),
  'kurtosis': np.float64(2.1135164843884393)}}

In [None]:
#hidden tests are within this cell

### 5.4 _Pere Marquette_: eastbound and westbound routes [1 pt]

Stations served by eastbound and westbound _Pere Marquette_ trains.

In [43]:
# Retrieve the sub service from the Amtrak sub services list
prmrq_sub_svc = next(
    (sub_svc for sub_svc in amtk_sub_svcs if sub_svc["sub service"] == SUB_SVC["prmrq"])
)
prmrq_stn_codes = prmrq_sub_svc["station codes"]
prmrq_stns = stations[stations[COLS["station_code"]].isin(prmrq_stn_codes)].reset_index(drop=True)
prmrq_stns.sort_values(by=COLS["lon"], inplace=True)
prmrq_stns

Unnamed: 0,Arrival Station Code,Arrival Station,Arrival Station Type,City,Address 01,Address 02,ZIP Code,State,Division,Region,Country,Latitude,Longitude
1,CHI,Chicago (Union Station),Station Building (with waiting room),Chicago,255 South Clinton Street,,60606,Illinois,East North Central,Midwest,United States,41.878992,-87.641015
4,SJM,St. Joseph,Station Building (with waiting room),St. Joseph,410 1/2 Vine Street,,49085,Michigan,East North Central,Midwest,United States,42.10908,-86.484484
0,BAM,Bangor,Station Building (with waiting room),Bangor,541 Railroad Street,,49013,Michigan,East North Central,Midwest,United States,42.314519,-86.111646
3,HOM,Holland,Station Building (with waiting room),Holland,171 Lincoln Avenue,,49423,Michigan,East North Central,Midwest,United States,42.791096,-86.096553
2,GRR,Grand Rapids,Station Building (with waiting room),Grand Rapids,440 Century Avenue SW,,49503,Michigan,East North Central,Midwest,United States,42.955788,-85.672447


In [46]:
prmrq_stn_order_eb = prmrq_sub_svc["station order"]["eastbound"]
prmrq_stn_order_wb = prmrq_sub_svc["station order"]["westbound"]
# prmrq_stn_order_eb, prmrq_stn_order_wb

In [None]:
#hidden tests are within this cell

### 5.5 _Pere Marquette_: eastbound detraining passengers summary statistics

#### 5.5.1 _Pere Marquette_ Train 370 [1 pt]

In [51]:
# Base columns for routes
rte_cols = [
    COLS["trn"],
    COLS["station_code"],
    COLS["station"],
    COLS["state"],
    COLS["lat"],
    COLS["lon"],
]

# Train 370 eastbound
amtk_370 = ntwk.by_train_number(trains, 370)
amtk_370_rte = ntwk.create_route(amtk_370, TRN["370"]["direction"], prmrq_stn_order_eb)
amtk_370_rte_stats = detrn.get_route_sum_stats(
    amtk_370_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_370_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,370,SJM,St. Joseph,Michigan,42.10908,-86.484484,11,15941,1449.1818,1533.0,327.615,3385,307.7273,286.0,187.3633,0.2123,42.8182,12556
1,370,BAM,Bangor,Michigan,42.314519,-86.111646,11,4486,407.8182,365.0,128.0311,1292,117.4545,107.0,58.9514,0.288,39.0909,3194
2,370,HOM,Holland,Michigan,42.791096,-86.096553,11,40144,3649.4545,3732.0,647.7801,12121,1101.9091,1032.0,394.9968,0.3019,37.2727,28023
3,370,GRR,Grand Rapids,Michigan,42.955788,-85.672447,11,56472,5133.8182,5186.0,747.2168,10013,910.2727,976.0,317.1426,0.1773,44.3636,46459


In [None]:
#hidden tests are within this cell

##### 5.5.1.1 Write to file [1 pt]

Write `amtk_370_rte_stats` to a CSV file named `stu-amtk_370_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_370_rte_stats.csv` file.
It must match line for line, character for character.

In [52]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_370_rte_stats.csv")
amtk_370_rte_stats.to_csv(filepath, index=False)

In [None]:
#hidden tests are within this cell

### 5.6 _Pere Marquette_: eastbound mean late arrival times

#### 5.6.1 _Pere Marquette_ Train 370 [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of train 370.

In [53]:
# Drop missing values
amtk_370_avg_mm_late = amtk_370[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_370_avg_mm_late_describe = frm.describe_numeric_column(amtk_370_avg_mm_late)
amtk_370_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(44),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(40.88636363636363),
  'median': 39.0,
  'mode': np.float64(36.0)},
 'position': {'min': 25.0,
  '25%': np.float64(34.0),
  '50%': np.float64(39.0),
  '75%': np.float64(44.25),
  'max': 75.0},
 'spread': {'variance': 116.89376321353065,
  'std': 10.81174191393462,
  'range': 50.0,
  'iqr': np.float64(10.25)},
 'shape': {'skewness': np.float64(1.1467845169929445),
  'kurtosis': np.float64(1.6933606741227574)}}

In [None]:
#hidden tests are within this cell

#### 5.6.2 Generate box plots

##### 5.6.2.1 Assemble the chart data

In [54]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Chart data
chrt_data = detrn.get_qtr_avg_min_late(
    amtk_370_rte, cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
)
# chrt_data

##### 5.6.2.2 Preaggregate the data

In [55]:
# Base columns for aggregation statistics
cols = [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]

# Pre-aggregate the data
chrt_data = frm.aggregate_data(chrt_data, cols)

##### 5.6.2.3 Generate chart

In [56]:
# Create chart title
txt = TRN["370"]
title_txt = (
    f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
    f"{txt['route']} ({txt['direction']})"
)
title = ttl.format_title(amtk_370_rte_stats, title_txt)

# Create and display the vertical boxplot
chart_vertical = boxp.create_boxplot(
    data=chrt_data,
    x_shorthand="Fiscal Year Quarter:N",
    x_title="Period",
    y_shorthand="Late Detraining Customers Avg Min Late:Q",
    y_title="Average Minutes Late",
    box_size=20,
    outlier_shorthand="outliers:Q",
    color_shorthand="Color:N",
    chart_title=title,
    orient=boxp.Orient.VERTICAL,
)
chart_vertical.display()

### 5.7 _Pere Marquette_: visualize eastbound mean late arrival times by station

In [57]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['prmrq']} Service Late Detraining Passengers (2022 Q1 - 2024 Q3)"
title = ttl.format_title(amtk_370_rte_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = amtk_370_rte_stats.index.tolist()

# Custom line colors
line_colors = {370: COLORS["amtk_blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart(
    frame=amtk_370_rte_stats,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=75,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

### 5.8 _Pere Marquette_: westbound detraining passengers summary statistics

#### 5.8.1 _Pere Marquette_ Train 371 [1 pt]

Review previous code employed to generate summary statistics for an Amtrak train. Then leverage
functions available in the `amtk_network` and `amtk_detrain` modules to create three new
`DataFrame` objects named `amtk_371`, `amtk_371_rte`, and `amtk_371_rte_stats`, respectively.
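
The per-station summary tables in this notebook follow a group-and-aggregate pattern. The sketch below reproduces that pattern with plain pandas. It is not the implementation of `detrn.get_route_sum_stats()`, and the sample rows are made up. The late-to-total ratio is computed from the summed columns, which matches the published rows (for example, Holland on train 371: 7 / 460 ≈ 0.0152).

```python
import pandas as pd

# Made-up quarterly rows for two stations (assumption)
arrivals = pd.DataFrame(
    {
        "Arrival Station Code": ["HOM", "HOM", "CHI", "CHI"],
        "Total Detraining Customers": [40, 44, 10000, 12000],
        "Late Detraining Customers": [1, 0, 1500, 1300],
    }
)

agg_cols = ["Total Detraining Customers", "Late Detraining Customers"]

# sort=False keeps stations in order of first appearance (direction of travel)
stats = arrivals.groupby("Arrival Station Code", sort=False)[agg_cols].agg(
    ["sum", "mean", "median"]
)

# Flatten the two-level column index into names such as "... sum"
stats.columns = [f"{col} {func}" for col, func in stats.columns]

stats["Late to Total Detraining Customers Ratio"] = (
    stats["Late Detraining Customers sum"] / stats["Total Detraining Customers sum"]
).round(4)

stats = stats.reset_index()
print(stats)
```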

In [60]:
# Base columns for routes
rte_cols = [
    COLS["trn"],
    COLS["station_code"],
    COLS["station"],
    COLS["state"],
    COLS["lat"],
    COLS["lon"],
]

# Train 371 westbound
amtk_371 = ntwk.by_train_number(trains, 371)
amtk_371_rte = ntwk.create_route(amtk_371, TRN["371"]["direction"], prmrq_stn_order_wb)
amtk_371_rte_stats = detrn.get_route_sum_stats(
    amtk_371_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_371_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,371,HOM,Holland,Michigan,42.791096,-86.096553,11,460,41.8182,42.0,20.9276,7,0.6364,1.0,0.6742,0.0152,41.6667,453
1,371,BAM,Bangor,Michigan,42.314519,-86.111646,11,183,16.6364,15.0,6.2012,6,0.5455,0.0,1.2136,0.0328,22.0,177
2,371,SJM,St. Joseph,Michigan,42.10908,-86.484484,11,937,85.1818,79.0,28.4282,45,4.0909,3.0,3.7803,0.048,39.875,892
3,371,CHI,Chicago (Union Station),Illinois,41.878992,-87.641015,11,119583,10871.1818,10944.0,1789.837,16442,1494.7273,1268.0,840.2458,0.1375,45.3636,103141


In [None]:
#hidden tests are within this cell

##### 5.8.1.1 Write to file [1 pt]

Write `amtk_371_rte_stats` to a CSV file named `stu-amtk_371_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_371_rte_stats.csv` file.
It must match line for line, character for character.

In [61]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_371_rte_stats.csv")
amtk_371_rte_stats.to_csv(filepath, index=False)

In [None]:
#hidden tests are within this cell

### 5.9 _Pere Marquette_: westbound mean late arrival times

#### 5.9.1 _Pere Marquette_ Train 371 [1 pt]

Review the central tendency, dispersion, and shape for the mean late arrival times of train 371.

In [62]:
# Drop missing values
amtk_371_avg_mm_late = amtk_371[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_371_avg_mm_late_describe = frm.describe_numeric_column(amtk_371_avg_mm_late)
amtk_371_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(28),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(40.5),
  'median': 34.5,
  'mode': np.float64(21.0)},
 'position': {'min': 16.0,
  '25%': np.float64(25.25),
  '50%': np.float64(34.5),
  '75%': np.float64(45.5),
  'max': 93.0},
 'spread': {'variance': 436.037037037037,
  'std': 20.88149987517748,
  'range': 77.0,
  'iqr': np.float64(20.25)},
 'shape': {'skewness': np.float64(1.2366629304117347),
  'kurtosis': np.float64(0.742640018550266)}}

In [None]:
#hidden tests are within this cell

#### 5.9.2 Generate box plots

##### 5.9.2.1 Assemble the chart data

In [63]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Chart data
chrt_data = detrn.get_qtr_avg_min_late(
    amtk_371_rte, cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
)
# chrt_data

##### 5.9.2.2 Preaggregate the data

In [64]:
# Base columns for aggregation statistics
cols = [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]

# Pre-aggregate the data
chrt_data = frm.aggregate_data(chrt_data, cols)

##### 5.9.2.3 Generate chart

In [65]:
# Create chart title
txt = TRN["371"]
title_txt = (
    f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
    f"{txt['route']} ({txt['direction']})"
)
title = ttl.format_title(amtk_371_rte_stats, title_txt)

# Create and display the vertical boxplot
chart_vertical = boxp.create_boxplot(
    data=chrt_data,
    x_shorthand="Fiscal Year Quarter:N",
    x_title="Period",
    y_shorthand="Late Detraining Customers Avg Min Late:Q",
    y_title="Average Minutes Late",
    box_size=20,
    outlier_shorthand="outliers:Q",
    color_shorthand="Color:N",
    chart_title=title,
    orient=boxp.Orient.VERTICAL,
)
chart_vertical.display()

### 5.10 _Pere Marquette_: visualize westbound mean late arrival times by station

In [66]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['prmrq']} Service Late Detraining Passengers (2022 Q1 - 2024 Q3)"
title = ttl.format_title(amtk_371_rte_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = amtk_371_rte_stats.index.tolist()

# Custom line colors
line_colors = {371: COLORS["amtk_blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart(
    frame=amtk_371_rte_stats,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=75,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 6.0 Michigan _Wolverine_ service (Chicago, IL - Pontiac, MI)

_Wolverine_ trains operate between
[Chicago Union Station](https://www.amtrak.com/stations/chi),
Chicago, IL ([CHI](https://www.amtrak.com/stations/chi)) and
Pontiac, MI ([PNT](https://www.amtrak.com/stations/pnt)). Intermediate stops include
Kalamazoo, MI ([KAL](https://www.amtrak.com/stations/kal)),
Battle Creek, MI ([BTL](https://www.amtrak.com/stations/btl)),
Jackson, MI ([JXN](https://www.amtrak.com/stations/jxn)),
Ann Arbor, MI ([ARB](https://www.amtrak.com/stations/arb)), and
Detroit, MI ([DET](https://www.amtrak.com/stations/det)), among other towns and cities.

### 6.1 _Wolverine_: on-time performance metrics (entire period) [2 pts]

_Wolverine_ performance data is a compilation of quarterly metrics that focus on late
detraining passengers. Detraining passengers are considered on-time if they arrive at their
destination no later than fifteen (`15`) minutes after their scheduled arrival time. All other
detraining passengers are considered late.
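
The on-time and late counts are related by simple arithmetic. The following check uses the _Wolverine_ totals reported in the `wolv_stats` output in this section. Variable names are illustrative.

```python
# Wolverine totals for the entire period, as reported in wolv_stats
total_detrn = 1_093_600
late_detrn = 370_922

# Every detraining passenger is either late or on time
on_time_detrn = total_detrn - late_detrn

# Share of detraining passengers arriving more than 15 minutes late
late_ratio = round(late_detrn / total_detrn, 4)
on_time_pct = round(on_time_detrn / total_detrn * 100, 2)

print(f"On-time detraining customers: {on_time_detrn}")
print(f"Late to total ratio: {late_ratio}")
print(f"On-time share: {on_time_pct}%")
```

Roughly one in three _Wolverine_ detraining passengers arrived late over the period, compared with fewer than one in five on the _Pere Marquette_ (ratio `0.1818`).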

Retrieve the _Wolverine_ row from the `mich_sub_svcs_stats` DataFrame. Call the appropriate
`DataFrame` method to convert the row to a `Series`. Assign the return value to a variable named
`wolv_stats`.

In [103]:
# YOUR CODE HERE
wolv_stats = mich_sub_svcs_stats.loc[2,:].squeeze(axis=0)
wolv_stats

Sub Service                                    Wolverine
Train Arrivals                                       771
Total Detraining Customers sum                   1093600
Total Detraining Customers mean                1418.4176
Total Detraining Customers median                  402.0
Total Detraining Customers std                 3084.1497
Late Detraining Customers sum                     370922
Late Detraining Customers mean                  481.0921
Late Detraining Customers median                   135.0
Late Detraining Customers std                  1010.5961
Late to Total Detraining Customers Ratio          0.3392
Late Detraining Customers Avg Min Late mean      49.2532
Total On Time Detraining Customers sum            722678
Name: 2, dtype: object

In [None]:
#hidden tests are within this cell

In [104]:
# Total train arrivals
wolv_trn_arrivals = wolv_stats["Train Arrivals"]
# wolv_trn_arrivals

# Detraining totals
wolv_detrn = wolv_stats[f"{COLS['total_detrn']} sum"]
wolv_detrn_late = wolv_stats[f"{COLS['late_detrn']} sum"]
wolv_detrn_on_time = wolv_detrn - wolv_detrn_late

print(
    f"Train Arrivals: {wolv_trn_arrivals}",
    f"Total Detraining Customers: {wolv_detrn}",
    f"Late Detraining Customers: {wolv_detrn_late}",
    f"On-Time Detraining Customers: {wolv_detrn_on_time}",
    sep="\n",
)


Train Arrivals: 771
Total Detraining Customers: 1093600
Late Detraining Customers: 370922
On-Time Detraining Customers: 722678


In [None]:
#hidden tests are within this cell

### 6.2 _Wolverine_ trains [1 pt]

Each _Wolverine_ train is identified by a unique train number.

Create a `DataFrame` named `wolv_trns` that contains one row for each train comprising the
_Wolverine_ service. Include the following columns in the `DataFrame` in the order specified:

1. "Service Line"
2. "Service"
3. "Sub Service"
4. "Route Miles"
5. "Train Number"

Reset the index (set `drop=True`) when creating the new `DataFrame`.

In [70]:
# YOUR CODE HERE
wolv_trns = wolv.drop_duplicates(subset='Train Number', ignore_index=True)
wolv_trns = wolv_trns[["Service Line", "Service", "Sub Service", "Route Miles", "Train Number"]]
wolv_trns.sort_values(by="Train Number", inplace=True, ignore_index=True)
wolv_trns

Unnamed: 0,Service Line,Service,Sub Service,Route Miles,Train Number
0,State Supported,Michigan,Wolverine,299,350
1,State Supported,Michigan,Wolverine,299,351
2,State Supported,Michigan,Wolverine,299,352
3,State Supported,Michigan,Wolverine,299,353
4,State Supported,Michigan,Wolverine,299,354
5,State Supported,Michigan,Wolverine,299,355


In [None]:
#hidden tests are within this cell

### 6.3 _Wolverine_: mean late arrival times [2 pts]

In an earlier notebook, a _simple_ least-squares linear regression was fitted to estimate the
linear relationship between mean late arrival times and distance traveled (measured in route
miles). The model suggests that with every additional route mile, the average minutes late for
late detraining passengers increases by approximately `0.0338` minutes. The _R_&sup2; value
indicates that around `25.5%` of the variability in the average minutes late can be explained by
the number of route miles traveled, with the remaining variability due to other factors or random
noise.

For _Wolverine_ late detraining passengers traveling the entire route, the model predicts a mean
late arrival time of approximately `38.61` minutes.

Retrieve the _Wolverine_ row from the `predictions` DataFrame. Call the appropriate `DataFrame`
method to convert the row to a `Series`. Assign the return value to a variable named
`wolv_predicted`.

In [71]:
# YOUR CODE HERE
wolv_predicted = predictions[predictions["Sub Service"] == "Wolverine"].squeeze(axis=0)
wolv_predicted

Route Miles                     299
Predicted Avg Min Late        38.61
Sub Service               Wolverine
Name: 26, dtype: object

In [None]:
#hidden tests are within this cell

Contrast this prediction with the actual mean late arrival times experienced by _Wolverine_ late detraining passengers.

In [73]:
# Drop missing values
wolv_avg_mm_late = wolv[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Call the custom frm.describe_numeric_column() function again
wolv_avg_mm_late_describe = frm.describe_numeric_column(wolv_avg_mm_late)
wolv_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(703),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(49.253200568990046),
  'median': 45.0,
  'mode': np.float64(40.0)},
 'position': {'min': 16.0,
  '25%': np.float64(37.5),
  '50%': np.float64(45.0),
  '75%': np.float64(58.0),
  'max': 177.0},
 'spread': {'variance': 374.3602995708256,
  'std': 19.348392687012158,
  'range': 161.0,
  'iqr': np.float64(20.5)},
 'shape': {'skewness': np.float64(1.6251588361072902),
  'kurtosis': np.float64(4.994281400131689)}}

In [None]:
#hidden tests are within this cell

### 6.4 _Wolverine_: eastbound and westbound routes [1 pt]

Stations served by eastbound and westbound _Wolverine_ trains.

In [74]:
# Retrieve the sub service from the Amtrak sub services list
wolv_sub_svc = next(
    (sub_svc for sub_svc in amtk_sub_svcs if sub_svc["sub service"] == SUB_SVC["wolv"])
)
wolv_stn_codes = wolv_sub_svc["station codes"]
wolv_stns = stations[stations[COLS["station_code"]].isin(wolv_stn_codes)].reset_index(drop=True)
# WARN: longitude sort does not guarantee correct station order: ROY, TRM, PNT last/first 3 stops
wolv_stns.sort_values(by=COLS["lon"], inplace=True)
wolv_stns

Unnamed: 0,Arrival Station Code,Arrival Station,Arrival Station Type,City,Address 01,Address 02,ZIP Code,State,Division,Region,Country,Latitude,Longitude
3,CHI,Chicago (Union Station),Station Building (with waiting room),Chicago,255 South Clinton Street,,60606,Illinois,East North Central,Midwest,United States,41.878992,-87.641015
7,HMI,Hammond-Whiting,Station Building (with waiting room),Hammond,1135 North Calumet Avenue,,46320,Indiana,East North Central,Midwest,United States,41.691155,-87.506511
10,MCI,Michigan City,,Michigan City,100 Washington Street,,46360,Indiana,East North Central,Midwest,United States,41.721111,-86.905556
11,NBU,New Buffalo,Platform with Shelter,New Buffalo,226 North Whittaker Street,,49117,Michigan,East North Central,Midwest,United States,41.796656,-86.745782
12,NLS,Niles,Station Building (with waiting room),Niles,598 Dey Street,,49120,Michigan,East North Central,Midwest,United States,41.837412,-86.252372
6,DOA,Dowagiac,Station Building (with waiting room),Dowagiac,200 Depot Drive,,49047,Michigan,East North Central,Midwest,United States,41.980941,-86.109041
9,KAL,Kalamazoo,Station Building (with waiting room),Kalamazoo,459 North Burdick Street,,49007,Michigan,East North Central,Midwest,United States,42.295255,-85.584018
2,BTL,Battle Creek,Station Building (with waiting room),Battle Creek,119 McCamly Street South,,49017,Michigan,East North Central,Midwest,United States,42.318453,-85.187825
0,ALI,Albion,Station Building (with waiting room),Albion,300 North Eaton Street,,49224,Michigan,East North Central,Midwest,United States,42.247196,-84.755814
8,JXN,Jackson,Station Building (with waiting room),Jackson,501 East Michigan Avenue,,49201,Michigan,East North Central,Midwest,United States,42.248113,-84.39967


Station order for eastbound and westbound _Wolverine_ trains.

In [75]:
# Correct order of stations for eastbound and westbound trains
wolv_stn_order_eb = wolv_sub_svc["station order"]["eastbound"]
wolv_stn_order_wb = wolv_sub_svc["station order"]["westbound"]
wolv_stn_order_eb, wolv_stn_order_wb

({'CHI': 0,
  'HMI': 1,
  'MCI': 2,
  'NBU': 3,
  'NLS': 4,
  'DOA': 5,
  'KAL': 6,
  'BTL': 7,
  'ALI': 8,
  'JXN': 9,
  'ARB': 10,
  'DER': 11,
  'DET': 12,
  'ROY': 13,
  'TRM': 14,
  'PNT': 15},
 {'PNT': 0,
  'TRM': 1,
  'ROY': 2,
  'DET': 3,
  'DER': 4,
  'ARB': 5,
  'JXN': 6,
  'ALI': 7,
  'BTL': 8,
  'KAL': 9,
  'DOA': 10,
  'NLS': 11,
  'NBU': 12,
  'MCI': 13,
  'HMI': 14,
  'CHI': 15})
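
The warning about the longitude sort can be made concrete with a sort keyed on the station order mapping instead. The following is a minimal sketch, not part of `fra_amtrak`: the helper `sort_by_station_order` and the toy frame are assumptions for illustration, while the coordinates and order positions are copied from the outputs in this notebook.

```python
import pandas as pd

# Toy subset of Wolverine stations (longitudes copied from the tables in this notebook)
stns = pd.DataFrame(
    {
        "Arrival Station Code": ["DET", "ROY", "TRM", "PNT", "DER", "ARB"],
        "Longitude": [-83.072397, -83.147010, -83.191026, -83.292325, -83.235311, -83.743154],
    }
)

# Subset of the eastbound station order shown above
order_eb = {"ARB": 10, "DER": 11, "DET": 12, "ROY": 13, "TRM": 14, "PNT": 15}


def sort_by_station_order(frame, code_col, order):
    """Hypothetical helper: sort rows by each station's position in the order mapping."""
    ordered = frame.sort_values(by=code_col, key=lambda codes: codes.map(order))
    return ordered.reset_index(drop=True)


by_lon = stns.sort_values(by="Longitude")["Arrival Station Code"].tolist()
by_order = sort_by_station_order(stns, "Arrival Station Code", order_eb)[
    "Arrival Station Code"
].tolist()

print(by_lon)  # west-to-east by longitude
print(by_order)  # eastbound order of travel
```

In this subset the longitude sort places PNT, TRM, and ROY before DET, whereas the mapping keeps them as the final three eastbound stops.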

In [None]:
#hidden tests are within this cell

### 6.5 _Wolverine_: eastbound detraining passengers summary statistics [2 pts]

Call the appropriate `DataFrame` method to retrieve the station data for trains 350, 352, and 354
referenced in `wolv`. Assign the new `DataFrame` returned by the method call to a variable named
`wolv_eb`.

In [86]:
# YOUR CODE HERE
wolv_eb = wolv[wolv["Train Number"].isin([350, 352, 354])]
wolv_eb

Unnamed: 0,Fiscal Year,Fiscal Quarter,Service Line,Service,Sub Service,Route Miles,Train Number,Arrival Station Code,Arrival Station,Arrival Station Type,...,State,Division,Region,Country,Latitude,Longitude,Total Detraining Customers,Late Detraining Customers,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late
0,2024,3,State Supported,Michigan,Wolverine,299,350,ARB,Ann Arbor,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,42.287692,-83.743154,2817,1097,0.38942,39.0
1,2024,3,State Supported,Michigan,Wolverine,299,350,BTL,Battle Creek,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,42.318453,-85.187825,379,95,0.25066,51.0
2,2024,3,State Supported,Michigan,Wolverine,299,350,DER,Dearborn,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,42.307167,-83.235311,1654,679,0.41052,48.0
3,2024,3,State Supported,Michigan,Wolverine,299,350,DET,Detroit,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,42.368097,-83.072397,2627,710,0.27027,59.0
4,2024,3,State Supported,Michigan,Wolverine,299,350,DOA,Dowagiac,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,41.980941,-86.109041,81,14,0.17284,42.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
752,2022,1,State Supported,Michigan,Wolverine,299,354,NBU,New Buffalo,Platform with Shelter,...,Michigan,East North Central,Midwest,United States,41.796656,-86.745782,349,159,0.45559,45.0
753,2022,1,State Supported,Michigan,Wolverine,299,354,NLS,Niles,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,41.837412,-86.252372,233,108,0.46352,58.0
754,2022,1,State Supported,Michigan,Wolverine,299,354,PNT,Pontiac,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,42.632771,-83.292325,297,102,0.34343,95.0
755,2022,1,State Supported,Michigan,Wolverine,299,354,ROY,Royal Oak,Platform with Shelter,...,Michigan,East North Central,Midwest,United States,42.488439,-83.147010,570,213,0.37368,85.0


In [None]:
#hidden tests are within this cell

Compute the summary statistics for eastbound _Wolverine_ trains. The `amtk_detrain` module includes
a function that you can call to perform the computation. Pass to it `wolv_eb` along with other
arguments required by the function. Assign the return value of the function call to a variable named
`wolv_eb_stats`.

In [135]:
# YOUR CODE HERE
wolv_eb_stats = detrn.get_sum_stats_by_group(
    wolv_eb,
    COLS["sub_svc"],
    AGG["columns"],
    AGG["funcs"],
)
wolv_eb_stats.drop(columns="Sub Service", inplace=True)
wolv_eb_stats

Unnamed: 0,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,402,550320,1368.9552,767.0,1478.8744,215231,535.4005,267.5,750.8998,0.3911,55.1307,335089
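
Several values in the row above can be cross-checked from the sums alone. The sketch below assumes that the ratio is the late sum divided by the total sum, that the on-time count is their difference, and that the mean is the total sum divided by the number of train arrivals. These relationships are inferred from the figures shown, not taken from the implementation of `detrn.get_sum_stats_by_group`.

```python
# Figures copied from the eastbound summary statistics above
train_arrivals = 402
total_sum = 550_320
late_sum = 215_231

late_ratio = round(late_sum / total_sum, 4)  # compare with 0.3911
on_time_sum = total_sum - late_sum  # compare with 335089
total_mean = round(total_sum / train_arrivals, 4)  # compare with 1368.9552

print(late_ratio, on_time_sum, total_mean)
```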


In [None]:
#hidden tests are within this cell

#### 6.5.1 _Wolverine_ Train 350 [1 pt]

In [90]:
rte_cols = [
    COLS["trn"],
    COLS["station_code"],
    COLS["station"],
    COLS["state"],
    COLS["lat"],
    COLS["lon"],
]

# Train 350 eastbound
amtk_350 = ntwk.by_train_number(trains, 350)
amtk_350_rte = ntwk.create_route(amtk_350, TRN["350"]["direction"], wolv_stn_order_eb)
amtk_350_rte_stats = detrn.get_route_sum_stats(
    amtk_350_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_350_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,350,HMI,Hammond-Whiting,Indiana,41.691155,-87.506511,11,151,13.7273,13.0,5.4789,21,1.9091,2.0,1.3751,0.1391,39.7778,130
1,350,MCI,Michigan City,Indiana,41.721111,-86.905556,3,97,32.3333,26.0,34.9333,47,15.6667,11.0,18.4481,0.4845,34.0,50
2,350,NBU,New Buffalo,Michigan,41.796656,-86.745782,11,4169,379.0,365.0,189.1587,799,72.6364,68.0,32.7209,0.1917,55.2727,3370
3,350,NLS,Niles,Michigan,41.837412,-86.252372,11,1545,140.4545,121.0,35.1208,300,27.2727,21.0,21.2042,0.1942,68.1818,1245
4,350,DOA,Dowagiac,Michigan,41.980941,-86.109041,11,939,85.3636,81.0,29.723,234,21.2727,18.0,15.0538,0.2492,47.3636,705
5,350,KAL,Kalamazoo,Michigan,42.295255,-85.584018,11,18699,1699.9091,1712.0,537.2886,4956,450.5455,395.0,178.0749,0.265,52.9091,13743
6,350,BTL,Battle Creek,Michigan,42.318453,-85.187825,11,4377,397.9091,382.0,95.257,881,80.0909,71.0,39.7529,0.2013,63.7273,3496
7,350,JXN,Jackson,Michigan,42.248113,-84.39967,11,3607,327.9091,320.0,75.5135,1534,139.4545,121.0,59.1158,0.4253,43.4545,2073
8,350,ARB,Ann Arbor,Michigan,42.287692,-83.743154,11,37278,3388.9091,3232.0,625.0639,17388,1580.7273,1641.0,498.8202,0.4664,42.2727,19890
9,350,DER,Dearborn,Michigan,42.307167,-83.235311,11,19674,1788.5455,1669.0,304.1146,8022,729.2727,762.0,187.7318,0.4077,45.5455,11652


In [None]:
#hidden tests are within this cell

##### 6.5.1.1 Write to file [1 pt]

Write `amtk_350_rte_stats` to a CSV file named `stu-amtk_350_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_350_rte_stats.csv` file.
It must match line for line, character for character.

In [91]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_350_rte_stats.csv")
amtk_350_rte_stats.to_csv(filepath, index=False)
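
The line-for-line comparison against the fixture file can be automated. The following is a minimal sketch using only the standard library; the helper name `first_mismatch` is hypothetical, and the demonstration writes two temporary files instead of touching `data/student`.

```python
import itertools
import pathlib
import tempfile


def first_mismatch(path_a, path_b):
    """Return (line_number, line_a, line_b) for the first differing line, else None."""
    with open(path_a, encoding="utf-8", newline="") as file_a, open(
        path_b, encoding="utf-8", newline=""
    ) as file_b:
        # zip_longest pads the shorter file with None so a length difference is reported
        pairs = itertools.zip_longest(file_a, file_b)
        for num, (line_a, line_b) in enumerate(pairs, start=1):
            if line_a != line_b:
                return num, line_a, line_b
    return None


with tempfile.TemporaryDirectory() as tmp:
    stu = pathlib.Path(tmp, "stu.csv")
    fxt = pathlib.Path(tmp, "fxt.csv")
    stu.write_text("Train Number,Train Arrivals\n350,11\n", encoding="utf-8")
    fxt.write_text("Train Number,Train Arrivals\n350,11.0\n", encoding="utf-8")
    same = first_mismatch(stu, stu)
    diff = first_mismatch(stu, fxt)

print(same)
print(diff)
```

A result of `None` indicates that the files are identical; otherwise the tuple gives the first line at which they diverge.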

In [None]:
#hidden tests are within this cell

#### 6.5.2 _Wolverine_ Train 352 [1 pt]

In [92]:
# Train 352 eastbound
amtk_352 = ntwk.by_train_number(wolv, 352)
amtk_352_rte = ntwk.create_route(amtk_352, TRN["352"]["direction"], wolv_stn_order_eb)
amtk_352_rte_stats = detrn.get_route_sum_stats(
    amtk_352_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_352_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,352,NBU,New Buffalo,Michigan,41.796656,-86.745782,11,8985,816.8182,807.0,352.7353,3088,280.7273,238.0,147.783,0.3437,42.1818,5897
1,352,NLS,Niles,Michigan,41.837412,-86.252372,11,3669,333.5455,329.0,96.3643,1365,124.0909,109.0,47.5046,0.372,40.5455,2304
2,352,KAL,Kalamazoo,Michigan,42.295255,-85.584018,11,24668,2242.5455,2242.0,166.7443,10283,934.8182,988.0,222.0058,0.4169,43.8182,14385
3,352,BTL,Battle Creek,Michigan,42.318453,-85.187825,11,7952,722.9091,679.0,207.841,2489,226.2727,214.0,111.7936,0.313,48.3636,5463
4,352,JXN,Jackson,Michigan,42.248113,-84.39967,11,7310,664.5455,691.0,93.5888,2417,219.7273,236.0,53.939,0.3306,50.7273,4893
5,352,ARB,Ann Arbor,Michigan,42.287692,-83.743154,11,71314,6483.0909,6448.0,678.3898,37275,3388.6364,3624.0,962.3193,0.5227,49.1818,34039
6,352,DER,Dearborn,Michigan,42.307167,-83.235311,11,31871,2897.3636,3033.0,582.1256,11176,1016.0,1047.0,261.8652,0.3507,60.5455,20695
7,352,DET,Detroit,Michigan,42.368097,-83.072397,11,31310,2846.3636,2801.0,513.9956,8810,800.9091,808.0,163.8502,0.2814,60.9091,22500
8,352,ROY,Royal Oak,Michigan,42.488439,-83.14701,11,13016,1183.2727,1214.0,242.5049,4219,383.5455,403.0,109.7892,0.3241,60.8182,8797
9,352,TRM,Troy,Michigan,42.542555,-83.191026,11,17664,1605.8182,1641.0,272.3736,5808,528.0,566.0,133.2119,0.3288,63.3636,11856


In [None]:
#hidden tests are within this cell

##### 6.5.2.1 Write to file [1 pt]

Write `amtk_352_rte_stats` to a CSV file named `stu-amtk_352_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_352_rte_stats.csv` file.
It must match line for line, character for character.

In [93]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_352_rte_stats.csv")
amtk_352_rte_stats.to_csv(filepath, index=False)

In [None]:
#hidden tests are within this cell

#### 6.5.3 _Wolverine_ Train 354 [1 pt]

In [94]:
# Train 354 eastbound
amtk_354 = ntwk.by_train_number(wolv, 354)
amtk_354_rte = ntwk.create_route(amtk_354, TRN["354"]["direction"], wolv_stn_order_eb)
amtk_354_rte_stats = detrn.get_route_sum_stats(
    amtk_354_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_354_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,354,MCI,Michigan City,Indiana,41.721111,-86.905556,3,192,64.0,86.0,49.7896,77,25.6667,31.0,23.4592,0.401,39.5,115
1,354,NBU,New Buffalo,Michigan,41.796656,-86.745782,11,6639,603.5455,616.0,283.4499,1731,157.3636,152.0,79.0927,0.2607,58.0909,4908
2,354,NLS,Niles,Michigan,41.837412,-86.252372,11,3552,322.9091,288.0,121.3379,997,90.6364,82.0,49.6996,0.2807,59.5455,2555
3,354,KAL,Kalamazoo,Michigan,42.295255,-85.584018,11,19592,1781.0909,1812.0,264.3968,7155,650.4545,662.0,171.552,0.3652,54.9091,12437
4,354,BTL,Battle Creek,Michigan,42.318453,-85.187825,11,6211,564.6364,557.0,164.4161,2213,201.1818,177.0,77.9664,0.3563,58.2727,3998
5,354,ALI,Albion,Michigan,42.247196,-84.755814,11,2050,186.3636,199.0,38.6789,887,80.6364,64.0,35.3929,0.4327,48.4545,1163
6,354,JXN,Jackson,Michigan,42.248113,-84.39967,11,6884,625.8182,659.0,118.1675,3210,291.8182,289.0,73.8428,0.4663,60.0,3674
7,354,ARB,Ann Arbor,Michigan,42.287692,-83.743154,11,62140,5649.0909,5845.0,795.1041,30524,2774.9091,2668.0,803.5014,0.4912,55.1818,31616
8,354,DER,Dearborn,Michigan,42.307167,-83.235311,11,25140,2285.4545,2356.0,514.4956,11057,1005.1818,911.0,340.1673,0.4398,62.4545,14083
9,354,DET,Detroit,Michigan,42.368097,-83.072397,11,23298,2118.0,2157.0,383.0277,6961,632.8182,549.0,241.5855,0.2988,72.4545,16337


In [None]:
#hidden tests are within this cell

##### 6.5.3.1 Write to file [1 pt]

Write `amtk_354_rte_stats` to a CSV file named `stu-amtk_354_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_354_rte_stats.csv` file.
It must match line for line, character for character.

In [95]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_354_rte_stats.csv")
amtk_354_rte_stats.to_csv(filepath, index=False)

In [None]:
#hidden tests are within this cell

### 6.6 _Wolverine_: eastbound mean late arrival times

Review the central tendency, dispersion, and shape for the mean late arrival times of trains 350, 352, and 354.
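
The sketch below shows one way these measures might be computed with pandas. `frm.describe_numeric_column` is the project's own function and its implementation is not shown here, so the helper `describe_sketch` and the toy values are assumptions for illustration only. Note that pandas `skew()` and `kurt()` return bias-corrected sample skewness and excess kurtosis.

```python
import pandas as pd


def describe_sketch(series):
    """Hypothetical stand-in: summarize center, position, spread, and shape."""
    q1, q2, q3 = series.quantile([0.25, 0.5, 0.75])
    return {
        "center": {
            "mean": series.mean(),
            "median": series.median(),
            "mode": series.mode().iloc[0],  # first mode if there are ties
        },
        "position": {"min": series.min(), "25%": q1, "50%": q2, "75%": q3, "max": series.max()},
        "spread": {
            "variance": series.var(),
            "std": series.std(),
            "range": series.max() - series.min(),
            "iqr": q3 - q1,
        },
        "shape": {"skewness": series.skew(), "kurtosis": series.kurt()},
    }


# Toy values with one long delay to produce a right-skewed distribution
mm_late = pd.Series([30.0, 37.0, 37.0, 45.0, 48.0, 56.0, 146.0], name="Avg Min Late")
summary = describe_sketch(mm_late)
print(summary["center"], summary["shape"])
```

A mean that exceeds the median together with positive skewness, as in the outputs that follow, points to a right tail of unusually long delays.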

#### 6.6.1 _Wolverine_ Train 350 [1 pt]

In [96]:
# Drop missing values
amtk_350_avg_mm_late = amtk_350[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_350_avg_mm_late_describe = frm.describe_numeric_column(amtk_350_avg_mm_late)
amtk_350_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(143),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(49.76923076923077),
  'median': 45.0,
  'mode': np.float64(37.0)},
 'position': {'min': 17.0,
  '25%': np.float64(38.5),
  '50%': np.float64(45.0),
  '75%': np.float64(56.0),
  'max': 146.0},
 'spread': {'variance': 334.8266522210184,
  'std': 18.29826910450872,
  'range': 129.0,
  'iqr': np.float64(17.5)},
 'shape': {'skewness': np.float64(2.1608662012160575),
  'kurtosis': np.float64(7.226731291864651)}}

In [None]:
#hidden tests are within this cell

#### 6.6.2 _Wolverine_ Train 352 [1 pt]

In [97]:
# Drop missing values
amtk_352_avg_mm_late = amtk_352[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_352_avg_mm_late_describe = frm.describe_numeric_column(amtk_352_avg_mm_late)
amtk_352_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(121),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(53.12396694214876),
  'median': 48.0,
  'mode': np.float64(45.0)},
 'position': {'min': 30.0,
  '25%': np.float64(44.0),
  '50%': np.float64(48.0),
  '75%': np.float64(60.0),
  'max': 96.0},
 'spread': {'variance': 212.576170798898,
  'std': 14.579992139877785,
  'range': 66.0,
  'iqr': np.float64(16.0)},
 'shape': {'skewness': np.float64(0.93595613955567),
  'kurtosis': np.float64(0.3595459501814209)}}

In [None]:
#hidden tests are within this cell

#### 6.6.3 _Wolverine_ Train 354 [1 pt]

In [98]:
# Drop missing values
amtk_354_avg_mm_late = amtk_354[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_354_avg_mm_late_describe = frm.describe_numeric_column(amtk_354_avg_mm_late)
amtk_354_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(134),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(62.66417910447761),
  'median': 58.0,
  'mode': np.float64(57.0)},
 'position': {'min': 30.0,
  '25%': np.float64(45.25),
  '50%': np.float64(58.0),
  '75%': np.float64(73.0),
  'max': 177.0},
 'spread': {'variance': 565.3375042082815,
  'std': 23.77682704248575,
  'range': 147.0,
  'iqr': np.float64(27.75)},
 'shape': {'skewness': np.float64(1.538267408682376),
  'kurtosis': np.float64(3.821607423479781)}}

In [None]:
#hidden tests are within this cell

#### 6.6.4 Generate box plots

Retrieve the chart data, preaggregate it, and generate the box plots.
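
Pre-aggregation means computing the box plot statistics per period before charting, so that the chart receives one row per box. The following is a minimal sketch of the idea under the common 1.5 × IQR whisker convention. The helper `box_stats`, the period labels, and the toy values are assumptions, and `frm.aggregate_data` may differ in its details.

```python
import pandas as pd


def box_stats(values):
    """Hypothetical helper: quartiles, 1.5 * IQR whiskers, and outliers for one group."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outside = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "lower": inside.min(),  # whisker ends at the most extreme non-outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper": inside.max(),
        "outliers": outside.tolist(),
    }


data = pd.DataFrame(
    {
        "Fiscal Year Quarter": ["2024Q2"] * 5 + ["2024Q3"] * 5,
        "Avg Min Late": [40.0, 42.0, 45.0, 48.0, 120.0, 39.0, 42.0, 48.0, 51.0, 59.0],
    }
)

# One pre-aggregated row per period
rows = []
for period, group in data.groupby("Fiscal Year Quarter"):
    rows.append({"Fiscal Year Quarter": period, **box_stats(group["Avg Min Late"])})
preagg = pd.DataFrame(rows)
print(preagg)
```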

In [99]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Data for boxplots
wolv_eb_trns = [
    {"number": 350, "route": amtk_350_rte, "stats": amtk_350_rte_stats},
    {"number": 352, "route": amtk_352_rte, "stats": amtk_352_rte_stats},
    {"number": 354, "route": amtk_354_rte, "stats": amtk_354_rte_stats},
]

# Assemble charts
for trn in wolv_eb_trns:
    chrt_data = detrn.get_qtr_avg_min_late(
        trn["route"], cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
    )

    # Pre-aggregate the data
    chrt_data = frm.aggregate_data(
        chrt_data, [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]
    )

    # Create chart title
    txt = TRN[str(trn["number"])]
    title_txt = (
        f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
        f"{txt['route']} ({txt['direction']})"
    )
    title = ttl.format_title(trn["stats"], title_txt)

    # Create and display the vertical boxplot
    chart_vertical = boxp.create_boxplot(
        data=chrt_data,
        x_shorthand="Fiscal Year Quarter:N",
        x_title="Period",
        y_shorthand="Late Detraining Customers Avg Min Late:Q",
        y_title="Average Minutes Late",
        box_size=20,
        outlier_shorthand="outliers:Q",
        color_shorthand="Color:N",
        chart_title=title,
        orient=boxp.Orient.VERTICAL,
    )
    chart_vertical.display()

### 6.7 _Wolverine_: visualize eastbound mean late arrival times by station

Visualizing mean late arrival times by station for each train in a single chart requires aligning
the station order for trains 350, 352, and 354. This requires adding "placeholder" rows to each
train `DataFrame` that represent stations not visited by all trains.

| Train | "Placeholder" Station(s) | Notes |
| :---- | :----------------------- | :---- |
| 350   | Albion, MI ([ALI](https://www.amtrak.com/stations/ali)) | &nbsp; |
| 352   | Albion, MI ([ALI](https://www.amtrak.com/stations/ali)), Dowagiac, MI ([DOA](https://www.amtrak.com/stations/doa)), Hammond-Whiting, IN ([HMI](https://www.amtrak.com/stations/hmi)), Michigan City, IN ([MCI](https://en.wikipedia.org/wiki/Michigan_City_station)) | MCI closed 4 April 2022 |
| 354   | Dowagiac, MI ([DOA](https://www.amtrak.com/stations/doa)), Hammond-Whiting, IN ([HMI](https://www.amtrak.com/stations/hmi)) | &nbsp; |
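
The placeholder idea can be sketched with a reindex over the full station order. This is an illustrative stand-in, not the implementation of `ntwk.add_stations_to_route()`: the helper `add_placeholders`, the shortened column name `Avg Min Late mean`, and the four-station order are assumptions, while the values are copied from the Train 352 statistics above.

```python
import pandas as pd

# Subset of the eastbound station order
order_eb = {"NBU": 3, "NLS": 4, "DOA": 5, "KAL": 6}

# Train 352 does not report Dowagiac (DOA)
rte_stats = pd.DataFrame(
    {
        "Train Number": [352, 352, 352],
        "Arrival Station Code": ["NBU", "NLS", "KAL"],
        "Avg Min Late mean": [42.1818, 40.5455, 43.8182],
    }
)


def add_placeholders(stats, order, code_col="Arrival Station Code", trn_col="Train Number"):
    """Hypothetical helper: insert NaN rows for stations in `order` missing from `stats`."""
    ordered_codes = sorted(order, key=order.get)
    filled = (
        stats.set_index(code_col).reindex(ordered_codes).rename_axis(code_col).reset_index()
    )
    filled[trn_col] = stats[trn_col].iloc[0]  # keep the train label on placeholder rows
    return filled


chart_rows = add_placeholders(rte_stats, order_eb)
print(chart_rows)
```

The placeholder row carries the station code and train number but leaves the statistics as missing values, so each train shares the same station axis in the chart.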

#### 6.7.1 Reindex `wolv_stns`

In [109]:
# Reindex DataFrame based on amtk_350_rte_stats columns.
wolv_stns = wolv_stns.reindex(columns=amtk_350_rte_stats.columns).reset_index(drop=True)
# wolv_stns

#### 6.7.2 Assemble chart data [1 pt]

The three `amtk_*_chrt_data` `DataFrame` objects below are created by calling the function
`ntwk.add_stations_to_route()`. Combine them into a single `DataFrame` object named `chrt_data`,
ignoring their current indices.

In [110]:
amtk_350_chrt_data = ntwk.add_stations_to_route(
    amtk_350_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]] == "ALI"],
    wolv_stn_order_eb,
)
# amtk_350_chrt_data

In [111]:
amtk_352_chrt_data = ntwk.add_stations_to_route(
    amtk_352_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]].isin(("ALI", "DOA", "HMI", "MCI"))],
    wolv_stn_order_eb,
)
# amtk_352_chrt_data

In [113]:
amtk_354_chrt_data = ntwk.add_stations_to_route(
    amtk_354_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]].isin(("DOA", "HMI"))],
    wolv_stn_order_eb,
)
amtk_354_chrt_data

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,354.0,HMI,Hammond-Whiting,Indiana,41.691155,-87.506511,,,,,,,,,,,,
1,354.0,MCI,Michigan City,Indiana,41.721111,-86.905556,3.0,192.0,64.0,86.0,49.7896,77.0,25.6667,31.0,23.4592,0.401,39.5,115.0
2,354.0,NBU,New Buffalo,Michigan,41.796656,-86.745782,11.0,6639.0,603.5455,616.0,283.4499,1731.0,157.3636,152.0,79.0927,0.2607,58.0909,4908.0
3,354.0,NLS,Niles,Michigan,41.837412,-86.252372,11.0,3552.0,322.9091,288.0,121.3379,997.0,90.6364,82.0,49.6996,0.2807,59.5455,2555.0
4,354.0,DOA,Dowagiac,Michigan,41.980941,-86.109041,,,,,,,,,,,,
5,354.0,KAL,Kalamazoo,Michigan,42.295255,-85.584018,11.0,19592.0,1781.0909,1812.0,264.3968,7155.0,650.4545,662.0,171.552,0.3652,54.9091,12437.0
6,354.0,BTL,Battle Creek,Michigan,42.318453,-85.187825,11.0,6211.0,564.6364,557.0,164.4161,2213.0,201.1818,177.0,77.9664,0.3563,58.2727,3998.0
7,354.0,ALI,Albion,Michigan,42.247196,-84.755814,11.0,2050.0,186.3636,199.0,38.6789,887.0,80.6364,64.0,35.3929,0.4327,48.4545,1163.0
8,354.0,JXN,Jackson,Michigan,42.248113,-84.39967,11.0,6884.0,625.8182,659.0,118.1675,3210.0,291.8182,289.0,73.8428,0.4663,60.0,3674.0
9,354.0,ARB,Ann Arbor,Michigan,42.287692,-83.743154,11.0,62140.0,5649.0909,5845.0,795.1041,30524.0,2774.9091,2668.0,803.5014,0.4912,55.1818,31616.0


In [114]:
# YOUR CODE HERE
chrt_data = pd.concat([amtk_350_chrt_data, amtk_352_chrt_data, amtk_354_chrt_data], ignore_index=True)
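
The effect of `ignore_index=True` is easiest to see on a toy example. The two small frames below are illustrative only:

```python
import pandas as pd

a = pd.DataFrame({"Train Number": [350, 350], "Arrival Station Code": ["NBU", "NLS"]})
b = pd.DataFrame({"Train Number": [352, 352], "Arrival Station Code": ["NBU", "NLS"]})

kept = pd.concat([a, b])  # original row labels are retained
renumbered = pd.concat([a, b], ignore_index=True)  # rows are relabeled 0..n-1

print(kept.index.tolist())  # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

A unique, sequential index matters when the index is later used to position rows, as the line chart code does with `chrt_data.index.tolist()`.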

In [None]:
#hidden tests are within this cell

#### 6.7.3 Generate line chart

In [115]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['wolv']} Service Late Detraining Passengers"
title = ttl.format_title(wolv_eb_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = chrt_data.index.tolist()

# Custom line colors
line_colors = {350: COLORS["amtk_blue"], 352: COLORS["amtk_red"], 354: COLORS["blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart_interp(
    frame=chrt_data,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=85,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

### 6.8 _Wolverine_: westbound detraining passengers summary statistics [2 pts]

Call the appropriate `DataFrame` method to retrieve the station data for trains 351, 353, and 355
referenced in `wolv`. Assign the new `DataFrame` returned by the method call to a variable named
`wolv_wb`.

In [116]:
# YOUR CODE HERE
wolv_wb = wolv[wolv["Train Number"].isin([351, 353, 355])]
wolv_wb

Unnamed: 0,Fiscal Year,Fiscal Quarter,Service Line,Service,Sub Service,Route Miles,Train Number,Arrival Station Code,Arrival Station,Arrival Station Type,...,State,Division,Region,Country,Latitude,Longitude,Total Detraining Customers,Late Detraining Customers,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late
13,2024,3,State Supported,Michigan,Wolverine,299,351,ALI,Albion,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,42.247196,-84.755814,81,26,0.32099,21.0
14,2024,3,State Supported,Michigan,Wolverine,299,351,ARB,Ann Arbor,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,42.287692,-83.743154,229,42,0.18341,20.0
15,2024,3,State Supported,Michigan,Wolverine,299,351,BTL,Battle Creek,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,42.318453,-85.187825,306,114,0.37255,21.0
16,2024,3,State Supported,Michigan,Wolverine,299,351,CHI,Chicago (Union Station),Station Building (with waiting room),...,Illinois,East North Central,Midwest,United States,41.878992,-87.641015,18987,5824,0.30674,33.0
17,2024,3,State Supported,Michigan,Wolverine,299,351,DER,Dearborn,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,42.307167,-83.235311,24,2,0.08333,23.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
766,2022,1,State Supported,Michigan,Wolverine,299,355,MCI,Michigan City,,...,Indiana,East North Central,Midwest,United States,41.721111,-86.905556,103,66,0.64078,41.0
767,2022,1,State Supported,Michigan,Wolverine,299,355,NBU,New Buffalo,Platform with Shelter,...,Michigan,East North Central,Midwest,United States,41.796656,-86.745782,56,39,0.69643,39.0
768,2022,1,State Supported,Michigan,Wolverine,299,355,NLS,Niles,Station Building (with waiting room),...,Michigan,East North Central,Midwest,United States,41.837412,-86.252372,311,185,0.59486,44.0
769,2022,1,State Supported,Michigan,Wolverine,299,355,ROY,Royal Oak,Platform with Shelter,...,Michigan,East North Central,Midwest,United States,42.488439,-83.147010,1,1,1.00000,30.0


In [None]:
#hidden tests are within this cell

Compute the summary statistics for westbound _Wolverine_ trains. The `amtk_detrain` module includes
a function that you can call to perform the computation. Pass to it `wolv_wb` along with other
arguments required by the function. Assign the return value of the function call to a variable named
`wolv_wb_stats`.

In [136]:
# YOUR CODE HERE
wolv_wb_stats = detrn.get_sum_stats_by_group(
    wolv_wb,
    COLS["sub_svc"],
    AGG["columns"],
    AGG["funcs"],
)
wolv_wb_stats.drop(columns="Sub Service", inplace=True)
wolv_wb_stats

Unnamed: 0,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,369,543280,1472.3035,164.0,4184.9739,155691,421.9268,51.0,1231.1878,0.2866,41.5836,387589


In [None]:
#hidden tests are within this cell

#### 6.8.1 _Wolverine_ Train 351 [1 pt]

In [119]:
# Train 351 westbound
amtk_351 = ntwk.by_train_number(wolv, 351)
amtk_351_rte = ntwk.create_route(amtk_351, TRN["351"]["direction"], wolv_stn_order_wb)
amtk_351_rte_stats = detrn.get_route_sum_stats(
    amtk_351_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_351_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,351,TRM,Troy,Michigan,42.542555,-83.191026,4,8,2.0,1.5,1.4142,0,0.0,0.0,0.0,0.0,,8
1,351,ROY,Royal Oak,Michigan,42.488439,-83.14701,5,18,3.6,3.0,2.51,0,0.0,0.0,0.0,0.0,,18
2,351,DET,Detroit,Michigan,42.368097,-83.072397,11,169,15.3636,16.0,7.0608,11,1.0,1.0,1.0954,0.0651,35.1667,158
3,351,DER,Dearborn,Michigan,42.307167,-83.235311,11,128,11.6364,8.0,8.4649,16,1.4545,2.0,1.1282,0.125,26.7143,112
4,351,ARB,Ann Arbor,Michigan,42.287692,-83.743154,11,1721,156.4545,151.0,41.7142,345,31.3636,32.0,15.8447,0.2005,32.0,1376
5,351,JXN,Jackson,Michigan,42.248113,-84.39967,11,776,70.5455,69.0,10.7365,256,23.2727,20.0,17.4132,0.3299,30.1818,520
6,351,ALI,Albion,Michigan,42.247196,-84.755814,11,772,70.1818,80.0,17.4403,296,26.9091,23.0,14.8893,0.3834,29.6364,476
7,351,BTL,Battle Creek,Michigan,42.318453,-85.187825,11,2235,203.1818,205.0,71.9608,934,84.9091,61.0,55.4968,0.4179,32.5455,1301
8,351,KAL,Kalamazoo,Michigan,42.295255,-85.584018,11,6076,552.3636,562.0,122.4682,2208,200.7273,135.0,133.8081,0.3634,34.7273,3868
9,351,CHI,Chicago (Union Station),Illinois,41.878992,-87.641015,11,175079,15916.2727,16921.0,3155.2069,59421,5401.9091,5649.0,2101.9358,0.3394,48.1818,115658
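
The westbound summary statistics report a mean of 1472.3 total detraining customers against a median of 164. The rows above suggest one likely reason: Chicago, the western terminus, receives far more passengers than any intermediate stop. The sketch below uses only the per-station means shown for Train 351 (the displayed rows, which may not be the complete route) to illustrate how a single large value pulls the mean well above the median.

```python
import statistics

# "Total Detraining Customers mean" per station for Train 351, copied from the rows above
station_means = {
    "TRM": 2.0,
    "ROY": 3.6,
    "DET": 15.3636,
    "DER": 11.6364,
    "ARB": 156.4545,
    "JXN": 70.5455,
    "ALI": 70.1818,
    "BTL": 203.1818,
    "KAL": 552.3636,
    "CHI": 15916.2727,
}

values = list(station_means.values())
mean_all = statistics.mean(values)
median_all = statistics.median(values)

# Same measures with the terminus excluded
without_chi = [v for code, v in station_means.items() if code != "CHI"]
mean_no_chi = statistics.mean(without_chi)

print(round(mean_all, 1), round(median_all, 1), round(mean_no_chi, 1))
```

Because the median is robust to a single extreme value, it is the more representative measure of a typical station here.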


In [None]:
#hidden tests are within this cell

##### 6.8.1.1 Write to file [1 pt]

Write `amtk_351_rte_stats` to a CSV file named `stu-amtk_351_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_351_rte_stats.csv` file.
It must match line for line, character for character.

In [120]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_351_rte_stats.csv")
amtk_351_rte_stats.to_csv(filepath, index=False)

In [None]:
#hidden tests are within this cell

#### 6.8.2 _Wolverine_ Train 353 [1 pt]

In [121]:
# Train 353 westbound
amtk_353 = ntwk.by_train_number(wolv, 353)
amtk_353_rte = ntwk.create_route(amtk_353, TRN["353"]["direction"], wolv_stn_order_wb)
amtk_353_rte_stats = detrn.get_route_sum_stats(
    amtk_353_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_353_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,353,TRM,Troy,Michigan,42.542555,-83.191026,7,29,4.1429,3.0,3.1848,0,0.0,0.0,0.0,0.0,,29
1,353,ROY,Royal Oak,Michigan,42.488439,-83.14701,10,49,4.9,4.0,3.9847,0,0.0,0.0,0.0,0.0,,49
2,353,DET,Detroit,Michigan,42.368097,-83.072397,11,303,27.5455,27.0,10.8661,11,1.0,0.0,1.3416,0.0363,33.0,292
3,353,DER,Dearborn,Michigan,42.307167,-83.235311,11,267,24.2727,24.0,8.7988,20,1.8182,1.0,2.7863,0.0749,37.4286,247
4,353,ARB,Ann Arbor,Michigan,42.287692,-83.743154,11,6291,571.9091,585.0,77.7463,1033,93.9091,77.0,67.0827,0.1642,37.1818,5258
5,353,JXN,Jackson,Michigan,42.248113,-84.39967,11,1673,152.0909,148.0,22.678,615,55.9091,47.0,38.0958,0.3676,33.0909,1058
6,353,BTL,Battle Creek,Michigan,42.318453,-85.187825,11,2309,209.9091,210.0,33.4678,571,51.9091,46.0,34.1803,0.2473,50.8182,1738
7,353,KAL,Kalamazoo,Michigan,42.295255,-85.584018,11,9240,840.0,848.0,154.2874,2131,193.7273,157.0,110.5234,0.2306,49.3636,7109
8,353,NLS,Niles,Michigan,41.837412,-86.252372,11,3654,332.1818,338.0,63.6252,1134,103.0909,93.0,59.8823,0.3103,43.9091,2520
9,353,NBU,New Buffalo,Michigan,41.796656,-86.745782,11,2520,229.0909,244.0,57.6393,969,88.0909,65.0,52.2904,0.3845,40.2727,1551


In [None]:
#hidden tests are within this cell

##### 6.8.2.1 Write to file [1 pt]

Write `amtk_353_rte_stats` to a CSV file named `stu-amtk_353_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_353_rte_stats.csv` file.
It must match line for line, character for character.

In [122]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_353_rte_stats.csv")
amtk_353_rte_stats.to_csv(filepath, index=False)

In [None]:
#hidden tests are within this cell

#### 6.8.3 _Wolverine_ Train 355 [1 pt]

In [123]:
# Train 355 westbound
amtk_355 = ntwk.by_train_number(wolv, 355)
amtk_355_rte = ntwk.create_route(amtk_355, TRN["355"]["direction"], wolv_stn_order_wb)
amtk_355_rte_stats = detrn.get_route_sum_stats(
    amtk_355_rte, COLS["station_code"], AGG["columns"], AGG["funcs"], rte_cols
)
amtk_355_rte_stats

Unnamed: 0,Train Number,Arrival Station Code,Arrival Station,State,Latitude,Longitude,Train Arrivals,Total Detraining Customers sum,Total Detraining Customers mean,Total Detraining Customers median,Total Detraining Customers std,Late Detraining Customers sum,Late Detraining Customers mean,Late Detraining Customers median,Late Detraining Customers std,Late to Total Detraining Customers Ratio,Late Detraining Customers Avg Min Late mean,Total On Time Detraining Customers sum
0,355,TRM,Troy,Michigan,42.542555,-83.191026,10,130,13.0,6.5,12.1838,4,0.4,0.0,0.8433,0.0308,48.5,126
1,355,ROY,Royal Oak,Michigan,42.488439,-83.14701,11,71,6.4545,4.0,5.7856,10,0.9091,0.0,2.1192,0.1408,44.0,61
2,355,DET,Detroit,Michigan,42.368097,-83.072397,11,436,39.6364,40.0,13.0405,46,4.1818,3.0,2.5226,0.1055,51.2727,390
3,355,DER,Dearborn,Michigan,42.307167,-83.235311,11,334,30.3636,25.0,12.5561,30,2.7273,2.0,2.3703,0.0898,32.8889,304
4,355,ARB,Ann Arbor,Michigan,42.287692,-83.743154,11,5220,474.5455,480.0,131.7402,1194,108.5455,102.0,45.7741,0.2287,44.7273,4026
5,355,JXN,Jackson,Michigan,42.248113,-84.39967,11,2425,220.4545,236.0,50.6801,1178,107.0909,108.0,32.0639,0.4858,33.6364,1247
6,355,BTL,Battle Creek,Michigan,42.318453,-85.187825,11,2722,247.4545,232.0,56.4967,1488,135.2727,144.0,44.6544,0.5467,40.9091,1234
7,355,KAL,Kalamazoo,Michigan,42.295255,-85.584018,11,9229,839.0,844.0,192.3008,5227,475.1818,463.0,136.2357,0.5664,41.3636,4002
8,355,DOA,Dowagiac,Michigan,41.980941,-86.109041,11,896,81.4545,83.0,20.6851,483,43.9091,48.0,13.39,0.5391,46.0,413
9,355,NLS,Niles,Michigan,41.837412,-86.252372,11,3306,300.5455,306.0,55.863,1694,154.0,160.0,44.8152,0.5124,46.4545,1612


In [None]:
#hidden tests are within this cell

##### 6.8.3.1 Write to file [1 pt]

Write `amtk_355_rte_stats` to a CSV file named `stu-amtk_355_rte_stats.csv`. Store the file in the
`data/student` directory. Then compare it to the accompanying `fxt-amtk_355_rte_stats.csv` file.
It must match line for line, character for character.

In [124]:
# YOUR CODE HERE
filepath = parent_path.joinpath("data", "student", "stu-amtk_355_rte_stats.csv")
amtk_355_rte_stats.to_csv(filepath, index=False)

In [None]:
#hidden tests are within this cell

### 6.9 _Wolverine_: westbound mean late arrival times

Review the central tendency, dispersion, and shape of the distribution of mean late arrival times for trains 351, 353, and 355.

#### 6.9.1 _Wolverine_ Train 351 [1 pt]

In [125]:
# Drop missing values
amtk_351_avg_mm_late = amtk_351[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_351_avg_mm_late_describe = frm.describe_numeric_column(amtk_351_avg_mm_late)
amtk_351_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(79),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(33.89873417721519),
  'median': 31.0,
  'mode': np.float64(28.0)},
 'position': {'min': 16.0,
  '25%': np.float64(27.0),
  '50%': np.float64(31.0),
  '75%': np.float64(39.5),
  'max': 100.0},
 'spread': {'variance': 163.93833171048374,
  'std': 12.803840506288875,
  'range': 84.0,
  'iqr': np.float64(12.5)},
 'shape': {'skewness': np.float64(2.405265812101544),
  'kurtosis': np.float64(9.703271715406135)}}

In [None]:
#hidden tests are within this cell
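
The dictionary returned by `frm.describe_numeric_column()` groups statistics under `center`, `position`, `spread`, and `shape`. The sketch below shows how similar measures can be computed with pandas alone. `describe_sketch` is a hypothetical helper, not the package's implementation, and it relies on pandas defaults (sample variance, linear quantile interpolation), which may differ from what the package uses. The sample values are illustrative and are not taken from the dataset.

```python
import pandas as pd


def describe_sketch(series: pd.Series) -> dict:
    """Group common descriptive statistics by theme (hypothetical helper)."""
    q1, q2, q3 = series.quantile([0.25, 0.5, 0.75])
    return {
        "center": {
            "mean": series.mean(),
            "median": series.median(),
            "mode": series.mode().iloc[0],  # first mode if several exist
        },
        "position": {"min": series.min(), "25%": q1, "50%": q2, "75%": q3, "max": series.max()},
        "spread": {
            "variance": series.var(),  # sample variance (ddof=1)
            "std": series.std(),
            "range": series.max() - series.min(),
            "iqr": q3 - q1,
        },
        "shape": {"skewness": series.skew(), "kurtosis": series.kurt()},
    }


# Illustrative values only: one large value pulls the mean above the median.
sample = pd.Series([16.0, 27.0, 28.0, 28.0, 31.0, 40.0, 100.0], name="avg_min_late")
summary = describe_sketch(sample)
print(summary)
```

A mean well above the median together with positive skewness, as in this sample, points to a long right tail of unusually late arrivals.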

#### 6.9.2 _Wolverine_ Train 353 [1 pt]

In [126]:
# Drop missing values
amtk_353_avg_mm_late = amtk_353[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_353_avg_mm_late_describe = frm.describe_numeric_column(amtk_353_avg_mm_late)
amtk_353_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(100),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(43.59),
  'median': 42.0,
  'mode': np.float64(38.0)},
 'position': {'min': 16.0,
  '25%': np.float64(35.75),
  '50%': np.float64(42.0),
  '75%': np.float64(50.0),
  'max': 121.0},
 'spread': {'variance': 257.7190909090909,
  'std': 16.053631704667044,
  'range': 105.0,
  'iqr': np.float64(14.25)},
 'shape': {'skewness': np.float64(1.6758855954166525),
  'kurtosis': np.float64(5.52546183370109)}}

In [None]:
#hidden tests are within this cell

#### 6.9.3 _Wolverine_ Train 355 [1 pt]

In [127]:
# Drop missing values
amtk_355_avg_mm_late = amtk_355[COLS["late_detrn_avg_mm_late"]].dropna().reset_index(drop=True)

# Describe the column
amtk_355_avg_mm_late_describe = frm.describe_numeric_column(amtk_355_avg_mm_late)
amtk_355_avg_mm_late_describe

{'type': pandas.core.series.Series,
 'name': 'Late Detraining Customers Avg Min Late',
 'values': {'non_null': np.int64(126),
  'missing': np.int64(0),
  'dtype': dtype('float64')},
 'center': {'mean': np.float64(44.80952380952381),
  'median': 44.0,
  'mode': np.float64(40.0)},
 'position': {'min': 18.0,
  '25%': np.float64(35.0),
  '50%': np.float64(44.0),
  '75%': np.float64(52.75),
  'max': 89.0},
 'spread': {'variance': 207.86742857142852,
  'std': 14.417608281938739,
  'range': 71.0,
  'iqr': np.float64(17.75)},
 'shape': {'skewness': np.float64(0.6438682121986395),
  'kurtosis': np.float64(0.4560000745119206)}}

In [None]:
#hidden tests are within this cell
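
To compare the three trains side by side, the reported statistics can be collected into a single `DataFrame`. The sketch below hard-codes values from the three outputs above, rounded to two decimal places. The frame and column names are illustrative.

```python
import pandas as pd

# Values copied (and rounded) from the three describe outputs above.
reported = pd.DataFrame(
    {
        "n": [79, 100, 126],
        "mean": [33.90, 43.59, 44.81],
        "median": [31.0, 42.0, 44.0],
        "std": [12.80, 16.05, 14.42],
        "iqr": [12.5, 14.25, 17.75],
        "skewness": [2.41, 1.68, 0.64],
    },
    index=pd.Index([351, 353, 355], name="train"),
)

# Gap between mean and median: a rough indicator of right skew.
reported["mean_minus_median"] = (reported["mean"] - reported["median"]).round(2)

print(reported)
```

For all three trains the mean exceeds the median, which is consistent with the positive skewness values. The gap is largest for Train 351, whose distribution is the most skewed, and smallest for Train 355.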

#### 6.9.4 Generate box plots

Retrieve the chart data, preaggregate it, and generate the box plots.

In [128]:
# Base columns for average minutes late
cols = [COLS["year"], COLS["quarter"], COLS["late_detrn_avg_mm_late"]]

# Data for boxplots
wolv_wb_trns = [
    {"number": 351, "route": amtk_351_rte, "stats": amtk_351_rte_stats},
    {"number": 353, "route": amtk_353_rte, "stats": amtk_353_rte_stats},
    {"number": 355, "route": amtk_355_rte, "stats": amtk_355_rte_stats},
]

# Assemble charts
for trn in wolv_wb_trns:
    chrt_data = detrn.get_qtr_avg_min_late(
        trn["route"], cols, COLS["year_quarter"], [COLORS["amtk_blue"], COLORS["amtk_red"]]
    )

    # Pre-aggregate the data
    chrt_data = frm.aggregate_data(
        chrt_data, [COLS["year_quarter"], COLS["late_detrn_avg_mm_late"]]
    )

    # Create chart title
    txt = TRN[str(trn["number"])]
    title_txt = (
        f"Amtrak {txt['name']} Train {txt['number']} Late Detraining Passengers\n"
        f"{txt['route']} ({txt['direction']})"
    )
    title = ttl.format_title(trn["stats"], title_txt)

    # Create and display the vertical boxplot
    chart_vertical = boxp.create_boxplot(
        data=chrt_data,
        x_shorthand="Fiscal Year Quarter:N",
        x_title="Period",
        y_shorthand="Late Detraining Customers Avg Min Late:Q",
        y_title="Average Minutes Late",
        box_size=20,
        outlier_shorthand="outliers:Q",
        color_shorthand="Color:N",
        chart_title=title,
        orient=boxp.Orient.VERTICAL,
    )
    chart_vertical.display()
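
Pre-aggregating box plot data means computing the quartiles, whisker ends, and outliers for each group before charting. A minimal sketch of the idea follows. `preaggregate_box` is a hypothetical helper and not the implementation of `frm.aggregate_data()`; it assumes the common convention of placing fences 1.5 × IQR beyond the quartiles, and the input values are illustrative.

```python
import pandas as pd


def preaggregate_box(frame: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Compute per-group box plot statistics (hypothetical helper)."""
    rows = []
    for key, values in frame.groupby(group_col)[value_col]:
        q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        is_outlier = (values < lower_fence) | (values > upper_fence)
        inside = values[~is_outlier]
        rows.append(
            {
                group_col: key,
                "q1": q1,
                "median": median,
                "q3": q3,
                "lower": inside.min(),  # whisker ends at the last non-outlier value
                "upper": inside.max(),
                "outliers": values[is_outlier].tolist(),
            }
        )
    return pd.DataFrame(rows)


# Illustrative data, not taken from the Amtrak dataset.
demo = pd.DataFrame(
    {
        "Fiscal Year Quarter": ["2023Q1"] * 5 + ["2023Q2"] * 3,
        "Late Detraining Customers Avg Min Late": [10.0, 12.0, 14.0, 16.0, 100.0, 20.0, 22.0, 24.0],
    }
)
box = preaggregate_box(demo, "Fiscal Year Quarter", "Late Detraining Customers Avg Min Late")
print(box)
```

Working from pre-aggregated rows keeps the chart specification small: the chart receives one row per period rather than every observation.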


### 6.10 _Wolverine_: visualize westbound mean late arrival times by station

Visualizing mean late arrival times by station for each train in a single chart requires aligning
the station order for trains 351, 353, and 355. To do so, add "placeholder" rows to each train's
`DataFrame` for the stations on the route for which that train has no arrival records.

| Train | "Placeholder" Station(s) | Notes |
| :---- | :----------------------- | :---- |
| 351   | Dowagiac, MI ([DOA](https://www.amtrak.com/stations/doa)), Hammond-Whiting, IN ([HMI](https://www.amtrak.com/stations/hmi)), Michigan City, IN ([MCI](https://en.wikipedia.org/wiki/Michigan_City_station)), New Buffalo, MI ([NBU](https://www.amtrak.com/stations/nbu)), Niles, MI ([NLS](https://www.amtrak.com/stations/nls)) | MCI closed 4 April 2022 |
| 353   | Albion, MI ([ALI](https://www.amtrak.com/stations/ali)), Dowagiac, MI ([DOA](https://www.amtrak.com/stations/doa)), Michigan City, IN ([MCI](https://en.wikipedia.org/wiki/Michigan_City_station)) | MCI closed 4 April 2022 |
| 355   | Albion, MI ([ALI](https://www.amtrak.com/stations/ali)) | &nbsp; |
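
The placeholder idea can be sketched with plain pandas. The example below is an assumption about the general technique, not the implementation of `ntwk.add_stations_to_route()`. It uses an illustrative five-station subset of the westbound order and four Train 355 rows copied from the route summary earlier in this notebook.

```python
import pandas as pd

code_col = "Arrival Station Code"
late_col = "Late Detraining Customers Avg Min Late mean"

# Illustrative westbound subset; Albion (ALI) lies between Jackson and Battle Creek.
station_order = ["ARB", "JXN", "ALI", "BTL", "KAL"]

# Four Train 355 rows copied from the route summary earlier in this notebook.
train_355 = pd.DataFrame(
    {
        "Train Number": 355,
        code_col: ["ARB", "JXN", "BTL", "KAL"],
        late_col: [44.7273, 33.6364, 40.9091, 41.3636],
    }
)

# A left merge against the full station order keeps that order and
# leaves NaN wherever the train has no row for a station.
aligned = pd.DataFrame({code_col: station_order}).merge(train_355, on=code_col, how="left")

# Label the placeholder row with the train number so it stays in the right series.
aligned["Train Number"] = aligned["Train Number"].fillna(355).astype(int)

print(aligned)
```

After alignment every train has one row per station in the same order, so the x-axis positions match across trains even where a value is missing.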

#### 6.10.1 Assemble chart data [1 pt]

The first three cells below call `ntwk.add_stations_to_route()` to create the `amtk_*_chrt_data`
`DataFrame` objects. Combine the three objects into a single `DataFrame` named `chrt_data`,
ignoring their current indices.

In [129]:
# Add DOA, HMI, MCI, NBU, and NLS to Train 351 route
amtk_351_chrt_data = ntwk.add_stations_to_route(
    amtk_351_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]].isin(("DOA", "HMI", "MCI", "NBU", "NLS"))],
    wolv_stn_order_wb,
)
# amtk_351_chrt_data

In [130]:
# Add ALI, DOA, and MCI to Train 353 route
amtk_353_chrt_data = ntwk.add_stations_to_route(
    amtk_353_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]].isin(("ALI", "DOA", "MCI"))],
    wolv_stn_order_wb,
)
# amtk_353_chrt_data

In [131]:
# Add ALI to Train 355 route
amtk_355_chrt_data = ntwk.add_stations_to_route(
    amtk_355_rte_stats.copy(),
    wolv_stns[wolv_stns[COLS["station_code"]] == "ALI"],
    wolv_stn_order_wb,
)
# amtk_355_chrt_data

In [132]:
# YOUR CODE HERE
chrt_data = pd.concat([amtk_351_chrt_data, amtk_353_chrt_data, amtk_355_chrt_data], ignore_index=True)

In [None]:
#hidden tests are within this cell
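
The effect of ignoring the existing indices is easiest to see on a small example. The two frames below are illustrative stand-ins for the per-train chart data.

```python
import pandas as pd

# Two small frames, each with its own default index of 0 and 1.
a = pd.DataFrame({"Train Number": [351, 351], "Arrival Station Code": ["ARB", "JXN"]})
b = pd.DataFrame({"Train Number": [353, 353], "Arrival Station Code": ["ARB", "JXN"]})

# Default behavior: the original labels are kept, so the index repeats.
kept = pd.concat([a, b])

# ignore_index=True discards the old labels and renumbers the rows from zero.
renumbered = pd.concat([a, b], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```

A unique, consecutive index avoids ambiguous label lookups later on, for example when selecting rows with `.loc`.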

#### 6.10.2 Generate line chart

In [133]:
# Chart title
title_txt = f"Amtrak {SUB_SVC['wolv']} Service Late Detraining Passengers"
title = ttl.format_title(wolv_wb_stats, title_txt)

# Arrange stations by direction of travel
x_sort_order = chrt_data.index.tolist()

# Custom line colors
line_colors = {351: COLORS["amtk_blue"], 353: COLORS["amtk_red"], 355: COLORS["blue"]}

# Tooltips
tooltip_config = [
    {"shorthand": f"{COLS['trn']}:N", "title": "Train", "format": None},
    {"shorthand": f"{COLS['station']}:N", "title": "Arrival Station", "format": None},
    {
        "shorthand": f"{COLS['late_detrn_avg_mm_late']} mean",
        "title": "Average Minutes Late",
        "format": None,
    },
]

chart = lne.create_line_chart_interp(
    frame=chrt_data,
    x_shorthand=f"{COLS['station']}:N",
    x_title=f"{COLS['station']}",
    x_sort_order=x_sort_order,
    y_shorthand=f"{COLS['late_detrn_avg_mm_late']} mean:Q",
    y_title="Average Minutes Late",
    y_tick_count_max=85,
    point=True,
    # point={"filled": False, "fill": "white"},
    color_shorthand=f"{COLS['trn']}:N",
    colors=line_colors,
    tooltip_config=tooltip_config,
    title=title,
)
chart.display()

## 7.0 Watermark

In [None]:
%load_ext watermark
%watermark -h -i -iv -m -v