Urban Data Science & Smart Cities <br>
URSP688Y <br>
Instructor: Chester Harvey <br>
Urban Studies & Planning <br>
National Center for Smart Growth <br>
University of Maryland

[<img src="https://colab.research.google.com/assets/colab-badge.svg">](https://colab.research.google.com/github/ncsg/ursp688y_sp2024/blob/main/exercises/exercise06/exercise06.ipynb)

# Exercise 6

## Problem

In class this week, we saw how to access real-time data about Capital Bikeshare from the internet using their API. We also dealt with the challenge of wrangling those data. We needed to parse a JSON file into a table, and we considered how we might retrieve, store, and combine many JSONs in order to understand how bike availability changed over time.

These real-time data can help us answer questions about how well Capital Bikeshare is being utilized.

See if you can use data from the API (I have already stored and combined it--see below) to answer these questions:
- How many bikes were available within the system during each hour over a 24-hour period?
    - Can you graph this over time?
    - Which hour of the day were bikes most available? Least available?

**Bonus:** Can you write a function to estimate how many bikes are <ins>currently being used</ins>, whenever you call the function? This will require loading real-time data from the API and comparing it to stored data.

## Data

I wrote a script, which you can see [here](https://github.com/ncsg/ursp688y_sp2024/blob/main/demos/demo06/cabi_data/get_cabi_free_bikes.py), to retrieve and store JSON data from the `free_bike_status` table in [Capital Bikeshare's](https://capitalbikeshare.com/system-data) GBFS feed every 5 minutes. I ran this script on my computer for a bit more than 24 hours. ([Here's a tutorial](https://realpython.com/run-python-scripts/) on running scripts on the command line, if you're curious.) All of those JSONs are available for you to use. They're stored at [`ursp688y_sp2024/demos/demo06/cabi_data`](https://github.com/ncsg/ursp688y_sp2024/tree/main/demos/demo06/cabi_data).
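
For a rough idea of what a polling script like this involves, here is a minimal sketch. It is not the linked script: the function names `snapshot_filename`, `save_snapshot`, and `poll` are hypothetical, and the filename pattern is only inferred from the stored files (for example `cabi_bike_status_2024-03-03_13-11-54.json`).

```python
import json
import time
from datetime import datetime
from pathlib import Path

import requests

FEED_URL = 'https://gbfs.lyft.com/gbfs/1.1/dca-cabi/en/free_bike_status.json'


def snapshot_filename(now):
    """Build a filename like cabi_bike_status_2024-03-03_13-11-54.json."""
    return f'cabi_bike_status_{now:%Y-%m-%d_%H-%M-%S}.json'


def save_snapshot(out_dir, url=FEED_URL, now=None):
    """Download the feed once and store it as a timestamped JSON file."""
    now = now or datetime.now()
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop early on an HTTP error
    out_path = Path(out_dir) / snapshot_filename(now)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(response.json()))
    return out_path


def poll(out_dir, interval_seconds=300, n_snapshots=3):
    """Save n_snapshots files, waiting interval_seconds between requests."""
    for i in range(n_snapshots):
        save_snapshot(out_dir)
        if i < n_snapshots - 1:
            time.sleep(interval_seconds)
```

Calling `poll('cabi_data', interval_seconds=300, n_snapshots=300)` would collect roughly 25 hours of snapshots under these assumptions.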

## Building Off of the Demo

The in-class demo gave us a starting point for how to access real-time JSON data from the API, load saved JSON data, and parse JSON data into a DataFrame.

I have copied what we did in class below and added onto it to develop a single tidy dataframe with records from all the saved JSONs, plus timestamps. This should be all the data you need for the questions above (except the bonus).

See if you can follow my code, then build onto it.

As usual, please wrap the code for your solution in a function, and put that function into a module (you can add to my module, or make a new one if you prefer). Then load your main function from the module and call it in the notebook to demonstrate your solution.
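
To make the "single tidy dataframe" idea concrete, here is a minimal sketch of how many saved JSONs might be combined. It is an illustration, not the code in the `exercise06` module: the name `combine_snapshots` is hypothetical, and it assumes each file carries the standard GBFS `last_updated` field (a POSIX timestamp) alongside `data['bikes']`.

```python
import json
from pathlib import Path

import pandas as pd


def combine_snapshots(json_dir, tz='America/New_York'):
    """Stack every saved snapshot into one dataframe with a timestamp column."""
    frames = []
    for path in sorted(Path(json_dir).glob('*.json')):
        with open(path) as f:
            snapshot = json.load(f)
        # One row per bike listed in this snapshot
        frame = pd.DataFrame(snapshot['data']['bikes'])
        # Assumption: 'last_updated' holds seconds since the Unix epoch
        stamp = pd.to_datetime(snapshot['last_updated'], unit='s', utc=True)
        frame['timestamp'] = stamp.tz_convert(tz)
        frames.append(frame)
    if not frames:
        raise ValueError(f'No JSON files found in {json_dir}')
    return pd.concat(frames, ignore_index=True)
```

Tagging every row with its snapshot time is what makes later grouping by time possible.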


# Setup

In [3]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
# Import packages
import os
import json
import requests
import pandas as pd

In [4]:
# Set the working directory
os.chdir('/content/drive/MyDrive/School/UMD Classes/Colab Notebooks/exercise06')

In [32]:
# Import module
import exercise06

# Making a get request
response = requests.get('https://gbfs.lyft.com/gbfs/1.1/dca-cabi/en/free_bike_status.json')

# Get JSON content
data = response.json()

# # Inspect the contents
# data.keys()

# Make a dataframe out of data for available bikes
df = pd.DataFrame(data['data']['bikes'])

# open a single stored json
# Note the 'cabi_data/' prefix on the path, which points into the subdirectory where the jsons are stored
with open('cabi_data/cabi_bike_status_2024-03-03_13-11-54.json') as json_data:
    data = json.load(json_data)  # the with block closes the file automatically

# drill into the records for each bike
records = data['data']['bikes']

# convert to a dataframe
df = pd.DataFrame(records)

# drop a column that we won't use, just to keep things clean
df = df.drop(columns=['rental_uris'])

# load json and combine
df = exercise06.load_and_combine_free_bike_status_jsons_as_df('cabi_data')

df.head()

# determine identifier for point a of trip

# determine identifier for point b of trip

# count instances of 'point a's

# make a new column with sum of 'point a'

# count instances of 'point b's

# make a new column with sum of 'point b'

# match point a to point b

# count matches


| Unnamed: 0 | is_reserved | fusion_lon | fusion_lat | lat | type | is_disabled | bike_id | name | lon | timestamp |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0.0 | 38.88744 | electric_bike | 0 | 228fcead5bda270ea7ebf04674ef7389 | 320-065 | -77.025751 | 2024-03-04 03:53:20-05:00 |
| 1 | 0 | 0.0 | 0.0 | 38.955424 | electric_bike | 0 | cfada148cfa6fa2ab7903b153d11474d | 570-760 | -76.940124 | 2024-03-04 03:53:20-05:00 |
| 2 | 0 | 0.0 | 0.0 | 38.881418 | electric_bike | 0 | f276923ad259d0e29a4a64ad999c6bab | 268-224 | -77.027456 | 2024-03-04 03:53:20-05:00 |
| 3 | 0 | 0.0 | 0.0 | 38.907854 | electric_bike | 0 | 76f98f8f819c41c4f60ec93ef8b2633b | 137-726 | -77.071638 | 2024-03-04 03:53:20-05:00 |
| 4 | 0 | 0.0 | 0.0 | 38.898327 | electric_bike | 0 | 8e1856bacedc8da0bf14a3213ee65aa3 | 329-768 | -77.046905 | 2024-03-04 03:53:20-05:00 |


This is where you take over. Can you use this dataframe to answer the question(s) above?

apparently not LOL

In [None]:
# # count occurrences of each bike ID
# bike_id_counts = df['bike_id'].value_counts()

# # print the results
# print(bike_id_counts)
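
Counting `bike_id` across the whole dataframe mixes all snapshots together, so it does not show availability at any one time. One possible approach to the hourly question is to count the bikes listed in each snapshot, then average those counts within each clock hour. The sketch below assumes the combined dataframe has the `bike_id`, `timestamp`, `is_disabled`, and `is_reserved` columns shown in the output above; the function name `bikes_available_by_hour` is hypothetical, and excluding disabled or reserved bikes is a judgment call.

```python
import pandas as pd


def bikes_available_by_hour(df):
    """Average number of available bikes per snapshot, for each clock hour."""
    # Assumption: disabled or reserved bikes do not count as available
    mask = (df['is_disabled'] == 0) & (df['is_reserved'] == 0)
    # Parse timestamps in case they were stored as strings
    available = df.loc[mask].assign(
        timestamp=lambda d: pd.to_datetime(d['timestamp'])
    )
    # Bikes listed in one snapshot = bikes available at that moment
    per_snapshot = available.groupby('timestamp')['bike_id'].nunique()
    # Round each snapshot time down to the hour, then average within the hour
    hourly = per_snapshot.groupby(per_snapshot.index.floor('h')).mean()
    hourly.index.name = 'hour'
    return hourly.rename('mean_bikes_available')
```

With `hourly = bikes_available_by_hour(df)`, `hourly.plot()` would graph availability over time (this needs matplotlib), and `hourly.idxmax()` and `hourly.idxmin()` would give the hours with the most and fewest available bikes.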

In [None]:
# How many bikes were available within the system during each hour over a 24 hour period?
# Can you graph this over time?
# Which hour of the day were bikes most available? Least available?

# Count trips in the past 24 hours

## Identify trips in the data -- how will we do this?

### Count dictionaries, how many entries are there?

#### Count changes in appearance

## Filter to the past 24 hours

# 2. How many bikes are being used right now?
# Calculate how many bikes there are in total
# Calculate how many bikes are available right now
# Subtract available bikes from total bikes to find bikes in use
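
The subtraction outlined in the comments above could be sketched as follows. The feed does not report a fleet total here, so this sketch makes a loud assumption: the largest number of bikes seen in any stored snapshot stands in for "total bikes". The function names `count_available_now` and `estimate_bikes_in_use` are hypothetical, and the result is only a rough estimate.

```python
import pandas as pd
import requests

FEED_URL = 'https://gbfs.lyft.com/gbfs/1.1/dca-cabi/en/free_bike_status.json'


def count_available_now(url=FEED_URL):
    """Count the bikes listed in the live feed right now."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return len(response.json()['data']['bikes'])


def estimate_bikes_in_use(stored_df, available_now=None, url=FEED_URL):
    """Estimate bikes in use as (peak stored availability) - (available now)."""
    if available_now is None:
        available_now = count_available_now(url)
    # Assumption: the busiest stored snapshot approximates the total fleet
    peak = stored_df.groupby('timestamp')['bike_id'].nunique().max()
    # Never report a negative number of bikes in use
    return max(int(peak) - available_now, 0)
```

Calling `estimate_bikes_in_use(df)` with the combined dataframe queries the live feed each time, so the estimate changes whenever the function is called.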