# Subway Ridership Lab

### Introduction

In this lesson, we will work with some subway ridership data.  Information about the API is available [here](https://data.ny.gov/Transportation/MTA-Subway-Hourly-Ridership-Beginning-February-202/wujg-7c2s).

Let's get started.

### Why do it

Here are a couple of reasons why exploring the API is useful.

The main reason is that subway ridership is a proxy for movement in the city.  As we know, city downtowns have changed dramatically post-COVID, so we can explore how ridership has changed and how people now move about in a hybrid working world.

### Working with Data

Let's start by making a request to the API endpoint and looking at the first record it returns.

```python
import requests

url = "https://data.ny.gov/resource/wujg-7c2s.json"

response = requests.get(url)

# Look at the first record only.
response.json()[:1]
```

```
[{'transit_timestamp': '2022-10-14T05:00:00.000',
  'station_complex_id': 'H007',
  'station_complex': '1 Av (L)',
  'borough': 'M',
  'routes': 'L',
  'payment_method': 'omny',
  'ridership': '31.0',
  'transfers': '0.0',
  'latitude': '40.730953216552734',
  'longitude': '-73.98162841796875',
  'georeference': {'type': 'Point',
   'coordinates': [-73.98162841796875, 40.730953216552734]},
  'itsuid': '2022-10-14T05:00:00H0071 Av (L)MLomny',
  ':@computed_region_wbg7_3whc': '724',
  ':@computed_region_kjdx_g34t': '2095',
  ':@computed_region_yamh_8v7k': '749'}]
```

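As always, a good first step is to determine the grain of the data.  Here, we can see that each record holds a ridership count for one station at a given time, most likely a given hour.  Note that the record also carries a `payment_method`, so there may be more than one record per station per hour.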
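One way to test a guess about the grain is to check whether a set of fields uniquely identifies each record. The sketch below is a minimal example on hand-made records shaped like the response above; the helper name `find_duplicate_keys` and the second sample record (including its `metrocard` value) are assumptions for illustration, not values taken from the API.

```python
from collections import Counter


def find_duplicate_keys(records, key_fields):
    """Return each combination of key_fields that appears more than once."""
    counts = Counter(
        tuple(record.get(field) for field in key_fields) for record in records
    )
    return {key: n for key, n in counts.items() if n > 1}


# Hand-made sample; the second record is illustrative, not real data.
sample = [
    {"transit_timestamp": "2022-10-14T05:00:00.000", "station_complex_id": "H007",
     "payment_method": "omny", "ridership": "31.0"},
    {"transit_timestamp": "2022-10-14T05:00:00.000", "station_complex_id": "H007",
     "payment_method": "metrocard", "ridership": "12.0"},
]

# Station + timestamp alone repeats in this sample...
print(find_duplicate_keys(sample, ["transit_timestamp", "station_complex_id"]))
# ...while adding payment_method makes every record unique.
print(find_duplicate_keys(
    sample, ["transit_timestamp", "station_complex_id", "payment_method"]
))
```

Running the same check on `response.json()` would show whether the real data behaves like this sample; an empty dictionary means the chosen fields are a candidate grain.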

### Explore the API

A good next step is to explore the API itself.  The [API documentation](https://dev.socrata.com/foundry/data.ny.gov/wujg-7c2s) lists the available fields.

If you click on some of the fields listed, you can see how to make a specific query.  For example, to filter for information in a specific borough, you can make the following request, which will only return hourly ridership records in Brooklyn.

<img src="./fields.png">

And if we want to combine two filters, we place an ampersand between the query parameters.  For example, copying and pasting the following into a browser will return Brooklyn ridership records for a single timestamp, midnight on October 1, 2023.

```text
https://data.ny.gov/resource/wujg-7c2s.json?transit_timestamp=2023-10-01T00:00:00.000&borough=BK
```

And from there, you can dig deeper.
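The same kind of combined query can be sent from Python by passing a dictionary of parameters instead of building the URL by hand. The sketch below assumes the endpoint accepts the standard Socrata `$where` and `$limit` parameters; the helper names `build_params` and `fetch_day` are hypothetical, and the `$where` clause widens the filter from a single timestamp to a whole day.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

BASE_URL = "https://data.ny.gov/resource/wujg-7c2s.json"


def build_params(day, borough, limit=1000):
    """Build query parameters for one calendar day (ISO string) in one borough."""
    start = date.fromisoformat(day)
    end = start + timedelta(days=1)
    return {
        "borough": borough,
        # Assumed SoQL filter: from midnight of `day` up to, not including, the next midnight.
        "$where": (
            f"transit_timestamp >= '{start.isoformat()}T00:00:00' "
            f"AND transit_timestamp < '{end.isoformat()}T00:00:00'"
        ),
        "$limit": limit,
    }


def fetch_day(day, borough, limit=1000):
    """Request the records for one day and borough and return the parsed JSON."""
    import requests  # imported here so build_params works without requests installed

    response = requests.get(BASE_URL, params=build_params(day, borough, limit), timeout=30)
    response.raise_for_status()
    return response.json()


# Inspect the URL that would be requested, without calling the API.
params = build_params("2023-10-01", "BK")
print(BASE_URL + "?" + urlencode(params))
```

Calling `fetch_day("2023-10-01", "BK")` would then return a list of dictionaries like the one shown earlier. If a day has more records than the limit, the limit needs to be raised or the results paged.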

### Some analysis

We would like this to be relatively free form, but here are some questions to get you started.  The general advice is to start broad and then go narrow.

So if we are exploring subway ridership, which of the subway stops are significant to explore?  That might be our first question.

1. Write a function that, given a day and a borough, returns the dictionaries with the highest ridership numbers.

2. From there, find the stations that have some of the top ridership numbers in a given hour.
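As a starting point for the first question, here is one possible sketch that works on records already fetched from the API. The function name `top_ridership_records`, the parameter `n`, and the sample records (including the station "Example St") are assumptions for illustration. It relies on two details visible in the response above: `ridership` arrives as a string, and `transit_timestamp` begins with the date.

```python
def top_ridership_records(records, day, borough, n=3):
    """Return the n records with the highest ridership for one day and borough.

    `day` is an ISO date string such as "2022-10-14"; a prefix match on
    transit_timestamp selects the records for that day.
    """
    matching = [
        r for r in records
        if r["borough"] == borough and r["transit_timestamp"].startswith(day)
    ]
    # ridership is a string like "31.0", so convert before comparing.
    return sorted(matching, key=lambda r: float(r["ridership"]), reverse=True)[:n]


# Hand-made sample records; the values are illustrative only.
sample = [
    {"transit_timestamp": "2022-10-14T05:00:00.000", "station_complex": "1 Av (L)",
     "borough": "M", "ridership": "31.0"},
    {"transit_timestamp": "2022-10-14T08:00:00.000", "station_complex": "1 Av (L)",
     "borough": "M", "ridership": "250.0"},
    {"transit_timestamp": "2022-10-14T08:00:00.000", "station_complex": "Example St",
     "borough": "BK", "ridership": "900.0"},
    {"transit_timestamp": "2022-10-15T08:00:00.000", "station_complex": "1 Av (L)",
     "borough": "M", "ridership": "400.0"},
]

for record in top_ridership_records(sample, "2022-10-14", "M", n=2):
    print(record["transit_timestamp"], record["station_complex"], record["ridership"])
```

Sorting and slicing, rather than calling `max`, makes it easy to widen the search from the single busiest record to the top few, which is the direction the second question takes.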


### Performing Analysis

Now we are ready to perform some of the analysis.

1. Now write a function that, given a station (as a string) and a day, returns the total number of riders for that day.

2. And now let's see how ridership varies by the day of week.  For example, for a station, what are the most popular days of travel?

3. Can you calculate this per month, and determine monthly travel?

4. Or hourly travel?
    > For something like this, it may be useful to first write a function that places the day, hour, and year of each dictionary into separate key-value pairs, and then perform the analysis.

5. What are some high daily ridership numbers for a station, and what are some low daily ridership numbers (for a day or for a week)?  Can you write a function that will show us the stations that surpass our high numbers for a given day or week, or the stations that fall below the low numbers?

6. How has ridership changed since COVID?  Can you see some stations that have changed more significantly than others?  Or can you find certain days of the week on which ridership has changed more dramatically than on others?

7. And remember that you can use plotly or folium to plot some of these stations or some of the data.
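The hint about separate key-value pairs can be sketched as follows. This is one possible approach under stated assumptions: the helper names `add_time_parts` and `total_ridership_by` are hypothetical, the sample records are hand-made, and the timestamp is assumed to always have the ISO format shown in the response above.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def add_time_parts(record):
    """Return a copy of the record with year, month, day, hour, and weekday keys."""
    ts = datetime.fromisoformat(record["transit_timestamp"])
    return {
        **record,
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "weekday": WEEKDAYS[ts.weekday()],  # datetime.weekday(): Monday is 0
    }


def total_ridership_by(records, key):
    """Sum ridership across records, grouped by the value stored under `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += float(record["ridership"])
    return dict(totals)


# Hand-made sample records; the ridership values are illustrative only.
sample = [
    {"transit_timestamp": "2022-10-14T05:00:00.000", "station_complex": "1 Av (L)", "ridership": "31.0"},
    {"transit_timestamp": "2022-10-14T06:00:00.000", "station_complex": "1 Av (L)", "ridership": "40.0"},
    {"transit_timestamp": "2022-10-15T05:00:00.000", "station_complex": "1 Av (L)", "ridership": "10.0"},
]

enriched = [add_time_parts(record) for record in sample]
print(total_ridership_by(enriched, "weekday"))
print(total_ridership_by(enriched, "hour"))
```

Once the time parts exist as keys, the weekday, monthly, and hourly questions all reduce to the same grouping step with a different `key`.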