# How did the pandemic affect public transport?

In the face of the 2020 lockdowns, public transport ridership plummeted. Regular commuters and casual riders alike stayed home or switched to private modes of transport. Cycling gained popularity, and car sharing may have grown too.

That's the common perception, at least in my city, Kraków. But what kind of data can we study to understand traffic dynamics?  

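One source of truth is the set of publicly available GTFS (General Transit Feed Specification) schedules published by many transit agencies worldwide. They describe the service that was offered rather than how many people rode it, but that offering is exactly what agencies adjusted during the lockdowns.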

Let's take a look at how MPK Kraków changed its transit offering over time.

## Approach

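The metric we want is the number of stop events per hour, i.e. scheduled vehicle arrivals at stops. For example, a line running every ten minutes makes six trips per hour; if each trip serves ten stops, that is about 60 stop events per hour.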
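The arithmetic above fits in a small helper. This is a sketch that assumes evenly spaced trips in one direction and the same number of stops on every trip; `stop_events_per_hour` is a hypothetical name, not part of any library.

```python
def stop_events_per_hour(headway_min: float, stops_per_trip: int) -> float:
    """Estimate stop events per hour for one line in one direction.

    Assumes a fixed headway (evenly spaced trips) and that every trip
    serves the same number of stops.
    """
    trips_per_hour = 60 / headway_min
    return trips_per_hour * stops_per_trip


# A ten-minute headway gives 6 trips per hour; with 10 stops per trip
# that is 60 stop events per hour.
print(stop_events_per_hour(headway_min=10, stops_per_trip=10))  # 60.0
```

Real schedules rarely have perfectly even headways, which is why the analysis below counts actual scheduled stop events instead of estimating them.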

We'll measure the following one-hour windows (peak times are approximate for Kraków):

- Weekday morning peak: 7:30-8:30
- Weekday afternoon peak: 15:30-16:30
- Weekday off-peak: 12:00-13:00
- Weekday evening: 20:00-21:00
- Friday night service: 1:00-2:00 am
- Saturday and Sunday: 12:00-13:00 and 20:00-21:00

This gives five weekday windows (counting the Friday night window) and two windows per weekend day.
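One way to encode these windows is sketched below, assuming times are stored as seconds since midnight (which matches the `arrival_time` values partridge returns later in this notebook). `WINDOWS`, `hhmm_to_seconds` and `count_in_window` are hypothetical names for illustration. Whether Friday night trips appear as 25:00-26:00 on Friday's service day or as 1:00-2:00 on Saturday's depends on how the agency encodes night service; this sketch assumes the former.

```python
def hhmm_to_seconds(hhmm: str) -> int:
    """Convert "HH:MM" to seconds since midnight (hours may exceed 23)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


# Study windows as (start, end) strings; GTFS allows times past 24:00
# for trips that belong to the previous service day.
WINDOWS = {
    "weekday_am_peak": ("07:30", "08:30"),
    "weekday_pm_peak": ("15:30", "16:30"),
    "weekday_off_peak": ("12:00", "13:00"),
    "weekday_evening": ("20:00", "21:00"),
    "friday_night": ("25:00", "26:00"),  # assumption: 1-2 am on Friday's service day
    "weekend_midday": ("12:00", "13:00"),
    "weekend_evening": ("20:00", "21:00"),
}


def count_in_window(times_s, window):
    """Count stop events whose time falls in the half-open range [start, end)."""
    start, end = (hhmm_to_seconds(t) for t in window)
    return sum(1 for t in times_s if start <= t < end)


# Three arrivals in seconds since midnight: 7:30, 8:29:59 and 8:30
print(count_in_window([27000, 30599, 30600], WINDOWS["weekday_am_peak"]))  # 2
```

The windows are half-open so that an arrival exactly on the hour boundary is counted once, not in two adjacent windows.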

It would be interesting to see how the offering changed before and after the lockdowns.

So let's compare 1 March 2020, 1 April 2020, and 1 May 2020, or the nearest complete schedule to each of those dates.
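The transitfeeds.com download links used in this notebook embed the feed date as YYYYMMDD, so picking the feed nearest to each target date can be sketched like this. `nearest_feed` is a hypothetical helper, and the feed dates in the usage example are made up.

```python
from datetime import date, datetime


def nearest_feed(feed_dates, target):
    """Return the feed date string (YYYYMMDD) closest to `target`.

    Ties go to the date that appears first in `feed_dates`.
    """

    def distance(feed_date):
        parsed = datetime.strptime(feed_date, "%Y%m%d").date()
        return abs((parsed - target).days)

    return min(feed_dates, key=distance)


feed_dates = ["20200225", "20200310", "20200405", "20200428"]  # hypothetical
for target in [date(2020, 3, 1), date(2020, 4, 1), date(2020, 5, 1)]:
    print(target, nearest_feed(feed_dates, target))
# 2020-03-01 20200225
# 2020-04-01 20200405
# 2020-05-01 20200428
```

Because a feed normally describes service from its publication date onward, a stricter variant would pick the latest feed published on or before the target date instead of the closest one in either direction.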

We'll study both buses and trams, and their combined statistics.
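GTFS distinguishes modes through the `route_type` column of `routes.txt`, where the basic codes are 0 for tram and 3 for bus. A minimal sketch of splitting routes by mode, with a combined total, follows; the helper names are hypothetical, and feeds that use extended route types would need a larger mapping.

```python
from collections import Counter

# Basic GTFS route_type codes from routes.txt: 0 = tram, 3 = bus.
MODE_BY_ROUTE_TYPE = {0: "tram", 3: "bus"}


def mode_of(route_type):
    """Map a GTFS route_type (int or numeric string) to a mode label."""
    return MODE_BY_ROUTE_TYPE.get(int(route_type), "other")


def count_routes_by_mode(routes):
    """Count routes per mode, plus an "all" total for the combined statistics.

    `routes` is an iterable of dicts with a "route_type" key, e.g. rows
    read from routes.txt.
    """
    counts = Counter(mode_of(route["route_type"]) for route in routes)
    counts["all"] = sum(counts.values())  # summed before "all" is added
    return counts


routes = [{"route_type": "0"}, {"route_type": "3"}, {"route_type": "3"}]
print(count_routes_by_mode(routes))
```

The same `mode_of` mapping can be applied to trips or stop events after joining them to their route, so every per-mode statistic also has a combined counterpart.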

## Visualize

From news articles (TODO link), we know that some lines were suspended entirely while others ran less often. Were the cuts applied evenly across the city, or were certain parts disproportionately affected? Did service drop below acceptable levels, for example by exceeding a maximum acceptable headway?
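The "maximum headway" question can be made measurable as the largest gap between consecutive departures at a stop. This is a sketch; `max_headway_minutes` is a hypothetical helper, it only looks at gaps between the departures it is given (not the wait before the first or after the last one), and the acceptable threshold is left open.

```python
def max_headway_minutes(departures_s):
    """Largest gap, in minutes, between consecutive departures at one stop.

    `departures_s` holds departure times in seconds since midnight, e.g. all
    departures from a single stop within one study window. With fewer than
    two departures no headway can be measured, so None is returned.
    """
    times = sorted(departures_s)
    if len(times) < 2:
        return None
    return max(later - earlier for earlier, later in zip(times, times[1:])) / 60


# Departures at 7:30, 7:40 and 8:10 give gaps of 10 and 30 minutes
print(max_headway_minutes([27000, 27600, 29400]))  # 30.0
```

Computing this per stop and per window, before and after the lockdowns, would show where service fell below whatever threshold is considered acceptable.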

To help us explore these ideas, we'll start mapping the impact visually.

```python
%load_ext lab_black
```

```python
import os
```

```python
# os.chdir(os.path.dirname(os.getcwd()))
# os.chdir("2021sp-final-project-filipwodnicki")
```

## Get schedules

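```python
import requests


def download_url(url, save_path, chunk_size=128):
    """Stream the file at `url` to `save_path`, one chunk at a time."""
    with requests.get(url, stream=True, timeout=60) as r:
        # Fail loudly on HTTP errors instead of saving an error page as a .zip
        r.raise_for_status()
        with open(save_path, "wb") as fd:
            for chunk in r.iter_content(chunk_size=chunk_size):
                fd.write(chunk)


# code credit: https://stackoverflow.com/questions/9419162/download-returned-zip-file-from-url
```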

```python
feeds = [
    "https://transitfeeds.com/p/mpk-sa-w-krakowie/1105/20210201/download",
    "https://transitfeeds.com/p/mpk-sa-w-krakowie/1105/20210315/download",
    "https://transitfeeds.com/p/mpk-sa-w-krakowie/1105/20210327/download",
]
```

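```python
city = "krakow"
os.makedirs(os.path.join("demo", "data"), exist_ok=True)

feeds_paths = []
for feed in feeds:
    # The feed date is the second-to-last URL segment, e.g. "20210201"
    date = feed.split("/")[-2]
    save_path = os.path.join("demo", "data", f"gtfs_{city}_tram_{date}.zip")
    # download_url(feed, save_path=save_path)  # uncomment to (re)download
    feeds_paths.append(save_path)
```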

```python
feeds_paths
```

```
['demo/data/gtfs_krakow_tram_20210201.zip',
 'demo/data/gtfs_krakow_tram_20210315.zip',
 'demo/data/gtfs_krakow_tram_20210327.zip']
```

## Basic stats

Now that we have our schedules saved, we need to open them and start exploring: how many stop events were there in each time period during the week?

Saving each file to disk is practical, since each one is only about 1.8 MB.

A hundred schedules would take about 180 MB, and 100 cities would take about 18 GB. Storage at that size is manageable, but processing at this scale does start to run into big-data issues.

We could cut that to a tenth by limiting the study to 10 schedules per city. But how would we choose them? And is each schedule complete?
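A first completeness check is whether a feed contains the core files the GTFS reference requires for fixed-route service, with at least one of `calendar.txt` or `calendar_dates.txt`. This is a sketch; `check_feed` is a hypothetical helper, and passing it does not guarantee that the feed's calendar actually covers the study dates.

```python
import os
import zipfile

# Core GTFS files, which the spec expects at the root of the zip archive.
REQUIRED = {"agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"}
# At least one of these must be present to define service dates.
CALENDAR = {"calendar.txt", "calendar_dates.txt"}


def check_feed(path):
    """Return (size in MB, list of missing required files) for a GTFS zip."""
    size_mb = os.path.getsize(path) / 1_000_000
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
    missing = sorted(REQUIRED - names)
    if not names & CALENDAR:
        missing.append("calendar.txt or calendar_dates.txt")
    return size_mb, missing
```

Usage, assuming the files listed in `feeds_paths` have been downloaded:

```python
for path in feeds_paths:
    print(path, check_feed(path))
```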

```python
# Let's open the schedule using partridge
import partridge as ptg
```

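```python
inpath = feeds_paths[0]
```

```python
# Find the date with the most trips and the service_ids active on it
date, service_ids = ptg.read_busiest_date(inpath)
```

```python
print(service_ids)
```

```
frozenset({'service_1'})
```

```python
# Keep only the trips that run on the busiest date
view = {"trips.txt": {"service_id": service_ids}}
feed = ptg.load_feed(inpath, view)
```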
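The busiest date is usually a weekday, but the study also needs a Saturday and a Sunday. partridge also offers `read_service_ids_by_date`, which, to our understanding, returns a mapping from `datetime.date` to the set of active service_ids. Assuming that shape, one date per day type could be picked as sketched below; `first_date_per_day_type` is a hypothetical helper and public holidays are not detected.

```python
from datetime import date


def first_date_per_day_type(service_ids_by_date, start):
    """Earliest date on or after `start` with active service, per day type.

    `service_ids_by_date` maps datetime.date -> set of active service_ids.
    Public holidays are not detected and simply count as weekdays.
    """
    chosen = {}
    for day in sorted(d for d in service_ids_by_date if d >= start):
        if not service_ids_by_date[day]:
            continue  # no service scheduled on this date
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6
        kind = {5: "saturday", 6: "sunday"}.get(day.weekday(), "weekday")
        chosen.setdefault(kind, day)
    return chosen


# Toy data: 1 February 2021 was a Monday
toy = {date(2021, 2, d): frozenset({"service_1"}) for d in range(1, 8)}
print(first_date_per_day_type(toy, date(2021, 2, 2)))

# Hypothetical usage with partridge (not executed here):
# by_date = ptg.read_service_ids_by_date(feeds_paths[0])
# days = first_date_per_day_type(by_date, date(2020, 3, 1))
# view = {"trips.txt": {"service_id": by_date[days["saturday"]]}}
```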

```python
feed.stop_times
```

|  | trip_id | arrival_time | departure_time | stop_id | stop_sequence | stop_headsign | pickup_type | drop_off_type | shape_dist_traveled | timepoint |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | block_1_trip_1_service_1 | 15000.0 | 15000.0 | stop_293_61329 | 2 |  | 1 | 0 |  | 1 |
| 1 | block_1_trip_1_service_1 | 15060.0 | 15060.0 | stop_292_61229 | 3 |  | 1 | 0 |  | 1 |
| 2 | block_1_trip_1_service_1 | 15180.0 | 15180.0 | stop_285_58429 | 4 |  | 1 | 0 |  | 1 |
| 3 | block_1_trip_1_service_1 | 15300.0 | 15300.0 | stop_284_57729 | 5 |  | 1 | 0 |  | 1 |
| 4 | block_1_trip_1_service_1 | 15420.0 | 15420.0 | stop_283_57629 | 6 |  | 1 | 0 |  | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 104536 | block_328_trip_13_service_1 | 65700.0 | 65700.0 | stop_353_274439 | 17 |  | 0 | 0 |  | 1 |
| 104537 | block_328_trip_13_service_1 | 65820.0 | 65820.0 | stop_253_42319 | 18 |  | 0 | 0 |  | 1 |
| 104538 | block_328_trip_13_service_1 | 66000.0 | 66000.0 | stop_267_45919 | 19 |  | 1 | 0 |  | 1 |
| 104539 | block_328_trip_14_service_1 | 66120.0 | 66120.0 | stop_273_46529 | 2 |  | 1 | 0 |  | 1 |


```python
# How many stop times were there throughout the day?
```
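Since `arrival_time` is stored in seconds since midnight, integer division by 3600 gives the hour of the service day. Below is a plain-Python sketch (`stop_events_by_hour` is a hypothetical helper); with the pandas DataFrame above, `(feed.stop_times["arrival_time"] // 3600).value_counts().sort_index()` should give the same histogram.

```python
from collections import Counter


def stop_events_by_hour(arrival_times_s):
    """Histogram of stop events per hour of the service day.

    Times are seconds since midnight; missing values (NaN) are skipped.
    Hours can exceed 23 for trips that run past midnight.
    """
    counts = Counter()
    for t in arrival_times_s:
        if t == t:  # NaN != NaN, so this skips missing times
            counts[int(t // 3600)] += 1
    return dict(sorted(counts.items()))


# Usage with the loaded feed (iterating a pandas Series yields its values):
# stop_events_by_hour(feed.stop_times["arrival_time"])
print(stop_events_by_hour([15000.0, 15060.0, 3600.0, float("nan"), 90000.0]))
# {1: 1, 4: 2, 25: 1}
```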

```python
# How many lines are there?
```
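In GTFS a line usually corresponds to a route, so counting distinct `route_id` values among the trips in the view is a reasonable first answer; in pandas that would be `feed.trips["route_id"].nunique()`. The sketch below does the same in plain Python (`count_lines` is a hypothetical helper). Some agencies split one public line into several route_ids, in which case `route_short_name` from `routes.txt` may match the public line numbers more closely.

```python
def count_lines(trips):
    """Number of distinct routes (lines) among the given trips.

    `trips` is an iterable of dicts with a "route_id" key, e.g. rows of
    trips.txt restricted to one service day.
    """
    return len({trip["route_id"] for trip in trips})


trips = [{"route_id": "r1"}, {"route_id": "r1"}, {"route_id": "r2"}]
print(count_lines(trips))  # 2
```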

```python
# What was the busiest stop?
```

```python
# How busy was the average stop?
```

```python
# What was the least busy stop?
```
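All three stop questions come down to counting stop events per `stop_id`. With pandas, `counts = feed.stop_times["stop_id"].value_counts()` followed by `counts.idxmax()`, `counts.mean()` and `counts.idxmin()` would answer them; the plain-Python sketch below shows the same logic (`stop_activity_summary` is a hypothetical helper). If a stop_id identifies a single platform rather than a whole stop, grouping by parent station or stop name could change the ranking.

```python
from collections import Counter
from statistics import mean


def stop_activity_summary(stop_ids):
    """Busiest stop, least busy stop and mean stop events per stop.

    `stop_ids` holds the stop_id of every stop event (one per stop_times
    row). Ties are broken by first appearance.
    """
    counts = Counter(stop_ids)
    if not counts:
        raise ValueError("no stop events")
    return {
        "busiest": max(counts.items(), key=lambda kv: kv[1]),
        "least_busy": min(counts.items(), key=lambda kv: kv[1]),
        "mean_events_per_stop": mean(counts.values()),
    }


# Usage with the loaded feed:
# stop_activity_summary(feed.stop_times["stop_id"])
print(stop_activity_summary(["a", "b", "a", "c", "a", "b"]))
```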

## Visualize

```python
from final_project.network import WalkNetwork, TransitNetwork, MultiNetwork
```