Downloaded three monthly trip files from the Citi Bike site (August, September, October 2024). The goal here is a union: stack the files vertically, one on top of the other, as opposed to a join, which would attach them horizontally as extra columns.
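
To make the union-versus-join distinction concrete, here is a minimal sketch on made-up data. The tiny frames and the `rider_types` lookup table are assumptions for illustration only and are not part of the Citi Bike files.

```python
import pandas as pd

# Two tiny stand-ins for monthly trip files (made-up values, same columns)
aug = pd.DataFrame({"ride_id": ["A1", "A2"], "member_casual": ["member", "casual"]})
sept = pd.DataFrame({"ride_id": ["S1"], "member_casual": ["member"]})

# Union: rows are stacked, the set of columns stays the same
stacked = pd.concat([aug, sept], ignore_index=True)
print(stacked.shape)  # (3, 2)

# Join: columns are attached side by side, matched on a key column
rider_types = pd.DataFrame(
    {"member_casual": ["member", "casual"], "label": ["Subscriber", "Day pass"]}
)
joined = aug.merge(rider_types, on="member_casual", how="left")
print(joined.shape)  # (2, 3)
```

The union grows the row count while the join grows the column count, which is why `pd.concat` is the right tool for stacking monthly files that share a schema.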

In [1]:
# Import dependencies
import pandas as pd
import geopandas as gpd
from sqlalchemy import create_engine

In [2]:
# Read the CSV files

# October file
oct_csv = pd.read_csv("Resources/JC-202410-citibike-tripdata.csv")

# September file
sept_csv = pd.read_csv("Resources/JC-202409-citibike-tripdata.csv")

# August file
aug_csv = pd.read_csv("Resources/JC-202408-citibike-tripdata.csv")

In [6]:
# Concatenate the dfs vertically
combined_df = pd.concat([aug_csv, sept_csv, oct_csv], ignore_index=True)

# Display new df
combined_df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,17AE31FCAE74D287,electric_bike,2024-08-07 13:22:55.656,2024-08-07 13:25:09.654,7 St & Monroe St,HB304,4 St & Grand St,HB301,40.746413,-74.037977,40.742258,-74.035111,member
1,FD9859BDBE0CDF70,electric_bike,2024-08-13 13:15:08.627,2024-08-13 13:17:44.971,7 St & Monroe St,HB304,4 St & Grand St,HB301,40.746413,-74.037977,40.742258,-74.035111,member
2,AAC5ECD095AE5572,electric_bike,2024-08-12 20:07:26.975,2024-08-12 20:09:38.180,7 St & Monroe St,HB304,4 St & Grand St,HB301,40.746413,-74.037977,40.742258,-74.035111,member
3,857C4DCB2F29655B,electric_bike,2024-08-09 13:43:18.882,2024-08-09 13:45:38.226,7 St & Monroe St,HB304,4 St & Grand St,HB301,40.746413,-74.037977,40.742258,-74.035111,member
4,4439657C244E7009,classic_bike,2024-08-01 10:29:40.174,2024-08-01 10:32:56.874,Clinton St & Newark St,HB409,4 St & Grand St,HB301,40.73743,-74.03571,40.742258,-74.035111,member


Now that the three files are combined, save the result to CSV for use in Tableau.

In [4]:
# Save to CSV
combined_df.to_csv("Resources/combined_citibike_data.csv", index=False)

The zip code part of the assignment puzzled me quite a bit. None of these files has a zip code field, but they do have the latitude and longitude of each station, so the zip code could in principle be looked up from those coordinates. I couldn't figure out how to do that using Tableau alone.
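
Looking up a zip code from coordinates comes down to a point-in-polygon test: find the boundary polygon that contains the point. The sketch below shows the idea in plain Python using the standard ray-casting test. The rectangular `zones` and their names are hypothetical placeholders, not real zip code boundaries, and this is not how Tableau or GeoPandas implement the lookup internally.

```python
def point_in_polygon(lng, lat, polygon):
    """Return True if (lng, lat) falls inside polygon, a list of (lng, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular areas; real zip code boundaries are irregular polygons
zones = {
    "zone_a": [(-74.04, 40.74), (-74.03, 40.74), (-74.03, 40.75), (-74.04, 40.75)],
    "zone_b": [(-74.04, 40.73), (-74.03, 40.73), (-74.03, 40.74), (-74.04, 40.74)],
}


def find_zone(lng, lat):
    """Return the name of the first zone containing the point, or None."""
    for name, polygon in zones.items():
        if point_in_polygon(lng, lat, polygon):
            return name
    return None


# Coordinates taken from the first rows of the combined trip data
print(find_zone(-74.035111, 40.742258))
print(find_zone(-74.03571, 40.73743))
```

A loop like this is far too slow for hundreds of thousands of trips against real boundaries, which is one reason to hand the work to a library that uses a spatial index.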

Decided to download GeoJSON files containing the zip code boundaries. Thinking about it more, I know that Tableau has an internal database of zip code boundaries, so I should be able to use that instead of importing GeoJSON files on top of it.

After hours of research I was no closer to using Tableau's built-in databases to find the zip code for a set of coordinates. I will ask my instructor about this at the next office hours. Perhaps it's not possible.

Next I will try joining external GeoJSON boundary data to the original dataset. I first tried the join in Tableau and it seemed to work: there were some errors with the NJ data, but it eventually went through. Once I joined the NY data, though, the visuals all disappeared. I want to try the join in Python to see whether that helps the visuals work. If not, I might have to give up on zip code data for now.

In [11]:
# Convert CSV lat/lng into a GeoDataFrame
gdf_points = gpd.GeoDataFrame(combined_df, 
                               geometry=gpd.points_from_xy(combined_df['end_lng'], combined_df['end_lat']),
                               crs="EPSG:4326")

# Load the geoJSON file for Zip Code boundaries
gdf_boundaries = gpd.read_file('Resources/nj_new_jersey_zip_codes_geo.json')

# Make sure both GeoDataFrames are using the same CRS
gdf_boundaries = gdf_boundaries.to_crs("EPSG:4326")

In [14]:
# Perform a spatial join to link lat/lng points to zip code boundaries
# (`predicate` replaces the deprecated `op` argument, which newer GeoPandas versions no longer accept)
gdf_joined = gpd.sjoin(gdf_points, gdf_boundaries, how="left", predicate='within')


In [15]:

# The result should now have the zip code info attached to each point
# Only the point columns are printed here; add the zip code column (using its actual field name from the GeoJSON) to confirm it came through
print(gdf_joined[['end_lng', 'end_lat', 'geometry']])


          end_lng    end_lat                    geometry
0      -74.035111  40.742258  POINT (-74.03511 40.74226)
1      -74.035111  40.742258  POINT (-74.03511 40.74226)
2      -74.035111  40.742258  POINT (-74.03511 40.74226)
3      -74.035111  40.742258  POINT (-74.03511 40.74226)
4      -74.035111  40.742258  POINT (-74.03511 40.74226)
...           ...        ...                         ...
340311 -74.028865  40.737215  POINT (-74.02887 40.73722)
340312 -74.028865  40.737215  POINT (-74.02887 40.73722)
340313 -74.028865  40.737215  POINT (-74.02887 40.73722)
340314 -74.028865  40.737215  POINT (-74.02887 40.73722)
340315 -74.028865  40.737215  POINT (-74.02887 40.73722)

[340316 rows x 3 columns]
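
The printout above only shows the point columns, so it does not confirm that any zip code was attached. One way to check, sketched here on toy data, is to list the columns the join added and count the rows that matched no boundary. The `zip_code` field name and the box-shaped boundaries are assumptions for illustration; the real field name depends on the GeoJSON file.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Toy boundaries; "zip_code" is a hypothetical field name with placeholder values
boundaries = gpd.GeoDataFrame(
    {"zip_code": ["zone_a", "zone_b"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Three toy points: one inside each box and one outside both
trips = pd.DataFrame({"end_lng": [0.5, 1.5, 5.0], "end_lat": [0.5, 0.5, 5.0]})
points = gpd.GeoDataFrame(
    trips,
    geometry=gpd.points_from_xy(trips["end_lng"], trips["end_lat"]),
    crs="EPSG:4326",
)

joined = gpd.sjoin(points, boundaries, how="left", predicate="within")

# Columns that were added by the join (they come from the boundary file)
new_columns = [c for c in joined.columns if c not in points.columns]
print(new_columns)

# With how="left", points that fall in no boundary keep NaN in the joined columns
unmatched = int(joined["zip_code"].isna().sum())
print(unmatched)
```

On the real data, a large unmatched count would be expected if many trips end outside the loaded boundary file, for example trips ending in New York when only the New Jersey boundaries are loaded.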


In [17]:
# Export as CSV
gdf_joined.to_csv('Resources/combined_citibike_gdf.csv', index=False)
