# Analyzing Transportation Patterns Using Chicago Taxi Trip and Rideshare Data for Urban Planning Insights

Urban planning shapes the infrastructure that supports the livelihood of millions of city residents. Transportation in particular strongly influences which jobs people accept, how they spend their time, which places they visit, and even where businesses locate. This exploration of Chicago taxi and rideshare data provides insight into traffic conditions, travel expenses, and frequently visited hotspots in the city, which can inform city planning.

**Group Members**
1. Monika Phuengmak
2. Winny 
3. Syeda Aqeel

## 1. Problem Definition
This project aims to analyze Chicago’s taxi and rideshare data from 2018 to 2023 to generate actionable insights that support urban planning, enhance traffic management, and optimize transportation services. By identifying peak demand zones, assessing traffic congestion effects on trip durations, and analyzing fare trends across variables such as time, location, and service type, the project seeks to provide data-driven recommendations to improve mobility, reduce congestion, and better meet the transportation needs of Chicago’s residents and visitors.

## 2. Data Sources
- Chicago Taxi Trips, 2013 to 2023: [link](https://data.cityofchicago.org/Transportation/Taxi-Trips-2013-2023-/wrvz-psew/about_data)
- Chicago Transportation Network Providers Trips, 2018 to 2022: [link](https://data.cityofchicago.org/Transportation/Transportation-Network-Providers-Trips-2018-2022-/m6dm-c72p/about_data)
- Chicago Transportation Network Providers Trips, 2023 to present: [link](https://data.cityofchicago.org/Transportation/Transportation-Network-Providers-Trips-2023-/n26f-ihde/about_data)
- Chicago Community Area Boundaries: [link](https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6)

### Chicago Taxi Trips

This dataset reflects taxi trips reported to the City of Chicago in its role as a regulatory agency. To protect privacy but allow for aggregate analyses, the Taxi ID is consistent for any given taxi medallion number but does not show the number. Census Tracts are suppressed in some cases for privacy. Due to the data reporting process, not all trips are reported but the City believes that most are.

**Columns in this dataset**

|Column name|Description|Type|
|--|--|--|
|Trip ID|A unique identifier for the trip.|String|
|Taxi ID|A unique identifier for the taxi.|String|
|Trip Start Timestamp|Date and time when the trip started, rounded to the nearest 15 minutes.|Timestamp|
|Trip End Timestamp|Date and time when the trip ended, rounded to the nearest 15 minutes.|Timestamp|
|Trip Seconds|Duration of the trip in seconds.|Integer|
|Trip Miles|Distance of the trip in miles.|Integer|
|Pickup Census Tract|The Census Tract where the trip began. For privacy, this Census Tract is not shown for some trips. This column often will be blank for locations outside Chicago.|Number|
|Dropoff Census Tract|The Census Tract where the trip ended. For privacy, this Census Tract is not shown for some trips. This column often will be blank for locations outside Chicago.|Number|
|Pickup Community Area|The Community Area where the trip began. This column will be blank for locations outside Chicago.|Integer|
|Dropoff Community Area|The Community Area where the trip ended. This column will be blank for locations outside Chicago.|Integer|
|Fare|The fare for the trip.|Integer|
|Tips|The tip for the trip. Cash tips generally will not be recorded.|Integer|
|Tolls|The tolls for the trip.|Integer|
|Extras|Extra charges for the trip. This generally includes airport surcharges, late-night or rush hour surcharges, credit card processing fee, and other surcharges.|Integer|
|Trip Total|Total cost of the trip, calculated from fare, tips, tolls, and extras.|Integer|
|Payment Type|Type of payment for the trip.|String|
|Company|The taxi company.|String|
|Pickup Centroid Latitude|The latitude of the center of the pickup census tract or the community area if the census tract has been hidden for privacy. This column often will be blank for locations outside Chicago.|Double|
|Pickup Centroid Longitude|The longitude of the center of the pickup census tract or the community area if the census tract has been hidden for privacy. This column often will be blank for locations outside Chicago.|Double|
|Pickup Centroid Location|The location of the center of the pickup census tract or the community area if the census tract has been hidden for privacy. This column often will be blank for locations outside Chicago.|Point|
|Dropoff Centroid Latitude|The latitude of the center of the dropoff census tract or the community area if the census tract has been hidden for privacy. This column often will be blank for locations outside Chicago.|Double|
|Dropoff Centroid Longitude|The longitude of the center of the dropoff census tract or the community area if the census tract has been hidden for privacy. This column often will be blank for locations outside Chicago.|Double|
|Dropoff Centroid Location|The location of the center of the dropoff census tract or the community area if the census tract has been hidden for privacy. This column often will be blank for locations outside Chicago.|Point|

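The table describes Trip Total as the sum of fare, tips, tolls, and extras. As a minimal sketch of how that relationship could be checked on individual records, the plain-Python helpers below recompute the total and compare it with the reported value. The function names `trip_total` and `total_matches`, the one-cent tolerance, and the sample row are all assumptions made for illustration; they are not part of the dataset or of this project's pipeline.

```python
from decimal import Decimal


def trip_total(fare, tips=0, tolls=0, extras=0):
    """Sum the four cost components listed in the column table."""
    # Convert through str() so float inputs such as 12.5 become exact decimals.
    return sum(Decimal(str(value)) for value in (fare, tips, tolls, extras))


def total_matches(row, tolerance=Decimal("0.01")):
    """Return True when the reported Trip Total agrees with the component sum."""
    expected = trip_total(row["Fare"], row["Tips"], row["Tolls"], row["Extras"])
    reported = Decimal(str(row["Trip Total"]))
    return abs(reported - expected) <= tolerance


# Hypothetical row, made up for illustration (not taken from the dataset).
sample = {"Fare": 12.50, "Tips": 2.00, "Tolls": 0.0, "Extras": 1.00, "Trip Total": 15.50}
print(trip_total(sample["Fare"], sample["Tips"], sample["Tolls"], sample["Extras"]))
print(total_matches(sample))
```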
Locate the data in the Google Cloud Storage bucket attached to the cluster:

```python
from pyspark.sql import SparkSession

# Get the active Spark session, or create one if the notebook has not defined `spark`
spark = SparkSession.builder.getOrCreate()

# Get the name of the bucket attached to our cluster
bucket = spark._jsc.hadoopConfiguration().get("fs.gs.system.bucket")

# Path to the taxi trip data in our bucket (no need to edit this path)
data = "gs://" + bucket + "/data/chicago-taxi-trip/chicago-taxi-0000000000*"
print(data)
```