
311 Data Notebook

Bonnie Wolfe edited this page Oct 14, 2025 · 1 revision

311 Data Notebook, Cleaning & Hosting Overview with Google Colab

Background

The 311 service request dataset is very large and challenging to host or query directly in the browser. To make this data more accessible and usable, the project processes, cleans, and splits it into manageable files, enabling users to work with the data efficiently.

Tools Used

  • Python & Jupyter Notebook – for data cleaning and transformation
  • Pandas – for processing large datasets efficiently
  • Google Colab – for running the data pipeline in-browser, processing datasets, and providing temporary access to cleaned files

Objectives

This project builds a reproducible pipeline that:

  • Downloads raw 311 Service Request data
  • Cleans the dataset according to standardized rules for consistency and quality
  • Splits the data by year, then by month, keeping each file to roughly 100 MB
  • Provides cleaned and split datasets via Colab for direct download by users (instead of publishing large datasets directly to GitHub)

Process

1. Data Acquisition

  • Downloaded 311 Service Request data from the city’s open data portal
  • Users can dynamically select the year they want to process and download in the notebook. (Refer to the notebook for the code snippet that maps each year to the corresponding CSV URL.)
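The notebook's year-to-URL mapping can be sketched roughly as below; the URLs here are placeholders, not the open data portal's real endpoints (refer to the notebook for the actual code):

```python
# Hypothetical mapping of years to CSV URLs; the real URLs live in the notebook.
CSV_URLS = {
    2023: "https://data.example.gov/311/2023.csv",  # placeholder URL
    2024: "https://data.example.gov/311/2024.csv",  # placeholder URL
}

def url_for_year(year: int) -> str:
    """Return the download URL for the user-selected year."""
    try:
        return CSV_URLS[year]
    except KeyError:
        raise ValueError(f"No dataset URL configured for {year}") from None
```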

2. Data Cleaning

High-level steps performed:

  • Removed duplicates
  • Handled missing values
  • Standardized date fields
  • Reviewed and simplified categorical variables
  • Dropped unnecessary columns
  • Converted text columns to lowercase
  • Cleaned and validated geographical data
  • Partitioned and saved cleaned dataset into monthly files
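In pandas, the cleaning rules above might look roughly like this. All column names (`created_date`, `request_type`, `latitude`, `longitude`) are illustrative assumptions; the actual schema and rules are defined in the notebook:

```python
import pandas as pd

def clean_311(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the high-level cleaning rules; column names are illustrative."""
    # Remove duplicate rows
    df = df.drop_duplicates()
    # Standardize date fields to datetime (invalid entries become NaT)
    if "created_date" in df.columns:
        df["created_date"] = pd.to_datetime(df["created_date"], errors="coerce")
    # Convert text columns to lowercase
    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].str.lower()
    # Handle missing values: drop rows missing a hypothetical required field
    if "request_type" in df.columns:
        df = df.dropna(subset=["request_type"])
    # Validate geographical data: keep only coordinates in a plausible range
    if {"latitude", "longitude"}.issubset(df.columns):
        df = df[df["latitude"].between(-90, 90) & df["longitude"].between(-180, 180)]
    return df
```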

3. Data Splitting

  • Partitioned cleaned datasets by year, then by month
  • Organized files in a clear folder hierarchy for easy access

4. Notebook

  • A documented notebook automates:
    • Data download (with dynamic year selection)
    • Cleaning rules
    • Splitting logic
    • Saving outputs
  • Includes annotations explaining each step

How to Use Google Colab

Google Colab lets you run this project in your browser without installing anything on your computer.

Open the Colab Notebook

Steps

  1. Open the link above and sign in with your Google account.
  2. (Optional) Click the "Connect" button in the top right corner to start a runtime. Alternatively, running any cell or using "Run all" will automatically connect.
  3. Run the notebook cells:
    • To run all cells automatically, use "Runtime" → "Run all" from the top menu.
    • To run cells one by one, press Shift + Enter or click the play button next to each cell.
  4. The notebook will:
    • Download raw 311 data (you can select the year to process)
    • Apply cleaning rules
    • Split files by year and month
  5. Download the resulting files by opening the "Files" tab in the left sidebar, right-clicking the CSV files, and choosing "Download".

Note: Files exist only during your Colab session. Be sure to download anything you need before closing the session.

How to Use Locally

For instructions on cloning the repo, installing dependencies, and running the notebook on your machine, please see the project’s README.

Deliverables

  1. Annotated Jupyter Notebook with the full data pipeline
  2. Cleaned and partitioned datasets available via Colab runtime
  3. Cleaning Rules Documentation
