## RU Data Science Club - Raw Rutgers Bus Datathon Dataset
Welcome to the Rutgers Data Science Club Fall 2025 Datathon.
This notebook provides the raw Rutgers bus data for this fall's datathon.

## Ridership

Ridership data provides the number of people on each operating bus at 5-minute intervals, starting at 6:00 AM and ending at approximately 3:00 AM the following day.
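
To get a feel for how many samples a full service day can contain, here is a rough sketch that builds the 5-minute grid described above. It assumes the grid starts exactly at 6:00 AM and ends exactly at 3:00 AM, which the description only gives approximately; the date and the `interval_grid` helper are hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta

def interval_grid(day, step_minutes=5):
    """Return every timestamp from 6:00 AM on `day` to 3:00 AM the next day."""
    start = datetime(day.year, day.month, day.day, 6, 0)
    end = start + timedelta(hours=21)  # 6:00 AM + 21 h = 3:00 AM the following day
    grid = []
    current = start
    while current <= end:
        grid.append(current)
        current += timedelta(minutes=step_minutes)
    return grid

# Hypothetical service day, used only to build the grid.
grid = interval_grid(datetime(2025, 10, 1))
print(len(grid), grid[0], grid[-1])
```

Under these assumptions a bus that runs all day has at most 253 samples (21 hours at 12 samples per hour, plus the final endpoint). Real data will usually have fewer rows per bus, since buses do not operate for the whole window.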

Ridership data comes from an rubus API endpoint, which pulls data from PassioGo's servers. <br />
A sample ridership CSV file for a single day is provided as **ridership.csv**. <br />
If you wish to collect the current day's ridership data, feel free to run the code below.

In [1]:
import requests
import pandas as pd
from datetime import datetime

# Fetches JSON from the given API URL and returns the parsed response,
# or None if the request fails.
def api_fetch(api_url):
	print(f"Fetching {api_url}...")
	try:
		response = requests.get(api_url, timeout=10)
		response.raise_for_status()
		print(f"Fetched successfully.")
		return response.json()
	except requests.exceptions.RequestException as e:
		print(f"Failed to fetch {api_url}. Error: {e}")
		return None

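# Converts the ridership JSON (bus_id -> {time: ridership_count}) to a CSV file
# containing: bus_id, time, and ridership_count
def ridership_to_csv(json_data, csv_file_name):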
	if json_data is None:
		print("No data")
		return
	rows = []
	for bus_id, time_data in json_data.items():
		for minute, ridership_count in time_data.items():
			rows.append([bus_id, minute, ridership_count])
	df = pd.DataFrame(rows, columns=['bus_id', 'time', 'ridership_count'])
	df['bus_id'] = pd.to_numeric(df['bus_id'])
	df['time'] = pd.to_numeric(df['time'])
	df['ridership_count'] = pd.to_numeric(df['ridership_count'])
	df = df.sort_values(by=['time']).reset_index(drop=True)
	df.to_csv(csv_file_name, index=False)
	print(df.head())

# Colons are avoided in the timestamp because they are not valid in Windows file names.
current_time = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
output_filename = f"ridership_{current_time}.csv"
ridership_data = api_fetch("https://demo.rubus.live/bus_ridership")
ridership_to_csv(ridership_data, output_filename)

Fetching https://demo.rubus.live/bus_ridership...
Fetched successfully.
   bus_id  time  ridership_count
0   15188   605                0
1   18015   610                3
2    4850   610                0
3   13209   610                0
4   15188   610                0
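
As a starting point for analysis, the sketch below sums `ridership_count` over all buses for each `time` value. It uses only the standard library and inlines the five rows printed above so that it runs without the sample file; the `total_riders_by_time` helper is a hypothetical name, not part of the datathon tooling.

```python
import csv
import io
from collections import defaultdict

# The five rows printed above, inlined so the sketch runs on its own.
SAMPLE_CSV = """bus_id,time,ridership_count
15188,605,0
18015,610,3
4850,610,0
13209,610,0
15188,610,0
"""

def total_riders_by_time(csv_file):
    """Sum ridership_count over all buses for each value of `time`."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[int(row["time"])] += int(row["ridership_count"])
    return dict(sorted(totals.items()))

totals = total_riders_by_time(io.StringIO(SAMPLE_CSV))
print(totals)
```

To run the same aggregation on the provided file, open it with `open("ridership.csv", newline="")` and pass the file object to `total_riders_by_time`, assuming the file has the three columns shown above.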


## Bus Breaks

Bus breaks data provides the times at which a bus arrives at and departs from each stop.

Bus breaks data comes from an rubus API endpoint, which pulls data from PassioGo's servers. <br />
A sample bus breaks CSV file for a single day is provided as **bus_breaks.csv**. <br />
If you wish to collect the current day's bus breaks data, feel free to run the code below.

In [None]:
import requests
import pandas as pd
from datetime import datetime

# Fetches JSON from the given API URL and returns the parsed response,
# or None if the request fails.
def api_fetch(api_url):
	print(f"Fetching {api_url}...")
	try:
		response = requests.get(api_url, timeout=10)
		response.raise_for_status()
		print(f"Fetched successfully.")
		return response.json()
	except requests.exceptions.RequestException as e:
		print(f"Failed to fetch {api_url}. Error: {e}")
		return None

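# Fetches the breaks for each bus ID and writes a CSV file containing:
# id (the bus ID), stop_id, time_arrived, time_departed, and break_duration
def bus_breaks_to_csv(bus_ids, csv_file_name):
	all_breaks = []
	for bus_id in bus_ids:
		api_url = f"https://demo.rubus.live/get_breaks?bus_id={bus_id}"
		breaks_data = api_fetch(api_url)
		if breaks_data is None:
			print(f"No breaks data for bus ID {bus_id}")
			continue
		for break_record in breaks_data:
			break_record['id'] = bus_id
			all_breaks.append(break_record)
	if not all_breaks:
		print("No data")
		return
	df = pd.DataFrame(all_breaks)
	columns = ['id', 'stop_id', 'time_arrived', 'time_departed', 'break_duration']
	df = df[columns]
	df.to_csv(csv_file_name, index=False)
	print(df.head())

# The original cell read bus IDs from a `vehicles_df` that is not defined in this notebook.
# Assumption: the IDs are taken from the bus_id column of the sample ridership.csv instead.
bus_ids = pd.read_csv("ridership.csv")['bus_id'].unique().tolist()
# The ./data directory must already exist.
bus_breaks_to_csv(bus_ids, "./data/bus_breaks_raw.csv")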
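
One simple use of the bus breaks columns listed in the cell above is to average `break_duration` per stop. The following is a minimal sketch under stated assumptions: the records are hypothetical values made up for illustration, `break_duration` is assumed to be numeric, and `mean_break_by_stop` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical records shaped like the bus breaks columns; the values are made up.
SAMPLE_BREAKS = [
    {"id": 101, "stop_id": 1, "break_duration": 30},
    {"id": 101, "stop_id": 2, "break_duration": 90},
    {"id": 102, "stop_id": 1, "break_duration": 50},
]

def mean_break_by_stop(breaks):
    """Average break_duration per stop_id, in whatever units the data uses."""
    durations = defaultdict(list)
    for record in breaks:
        durations[record["stop_id"]].append(float(record["break_duration"]))
    return {
        stop_id: sum(values) / len(values)
        for stop_id, values in sorted(durations.items())
    }

means = mean_break_by_stop(SAMPLE_BREAKS)
print(means)
```

The same function should work on rows read from **bus_breaks.csv** with `csv.DictReader`, provided the `break_duration` values parse as numbers.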

## Other files
We have also provided two more CSV files: **routes.csv** and **stops.csv**.

**routes.csv**:
- route_id: the name of the bus route
- stop sequence: the sequence of stop IDs at which buses on that route stop

**stops.csv**:
- stop_id: the numerical id of the stop
- name: the known name of the stop
- campus: the campus on which the stop is located
- shortname: a shorter version of name
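
Because both the bus breaks data and **stops.csv** carry a `stop_id`, break records can be labeled with a stop name and campus through a simple lookup. The sketch below illustrates the idea with hypothetical rows (the stop names, campuses, and break values are placeholders, not real data), and `load_stops` and `label_breaks` are hypothetical helper names.

```python
import csv
import io

# Hypothetical stops.csv content using the columns described above.
STOPS_CSV = """stop_id,name,campus,shortname
1,Example Stop A,Campus X,Stop A
2,Example Stop B,Campus Y,Stop B
"""

# Hypothetical break records; stop_id 3 is deliberately missing from the stops.
BREAKS = [
    {"id": 101, "stop_id": 1, "break_duration": 30},
    {"id": 101, "stop_id": 2, "break_duration": 90},
    {"id": 102, "stop_id": 3, "break_duration": 45},
]

def load_stops(csv_file):
    """Index the stops by integer stop_id."""
    return {int(row["stop_id"]): row for row in csv.DictReader(csv_file)}

def label_breaks(breaks, stops):
    """Attach stop name and campus to each break; unknown stops get None."""
    labeled = []
    for record in breaks:
        stop = stops.get(record["stop_id"])
        labeled.append({
            **record,
            "stop_name": stop["name"] if stop else None,
            "campus": stop["campus"] if stop else None,
        })
    return labeled

stops = load_stops(io.StringIO(STOPS_CSV))
labeled = label_breaks(BREAKS, stops)
for row in labeled:
    print(row)
```

Keeping unmatched stops as `None` rather than dropping them makes it easier to spot stop IDs that appear in the breaks data but not in **stops.csv**.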
