## Code to Join Months Together

This notebook joins the separate monthly datasets into a single CSV file, which is easier to work with during data cleaning and exploration.
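As a minimal sketch of the idea, the snippet below stacks two small tables row-wise with `pd.concat`. The toy DataFrames, their values, and the helper name `combine_frames` are assumptions made for illustration and are not part of the notebook. The column check is also an addition: `pd.concat` would otherwise fill mismatched columns with `NaN` without raising an error.

```python
import pandas as pd

# Two toy "monthly" tables standing in for the real CSV files (made-up values).
jan = pd.DataFrame({"trip_distance": [1.0, 2.0], "fare_amount": [10.0, 20.0]})
feb = pd.DataFrame({"trip_distance": [3.0], "fare_amount": [30.0]})


def combine_frames(frames):
    """Stack DataFrames row-wise after checking that their columns match.

    Hypothetical helper for illustration only.
    """
    expected = list(frames[0].columns)
    for i, frame in enumerate(frames[1:], start=1):
        if list(frame.columns) != expected:
            raise ValueError(f"Frame {i} has different columns: {list(frame.columns)}")
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's index
    return pd.concat(frames, ignore_index=True)


combined = combine_frames([jan, feb])
print(combined.shape)  # (3, 2)
```

The cells that follow apply the same `pd.concat(..., ignore_index=True)` call to DataFrames loaded from the CSV files chosen in a file dialog.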

In [1]:
import pandas as pd
import os
from tkinter import Tk
from tkinter.filedialog import askopenfilenames, asksaveasfilename

In [2]:
# Let the user select multiple CSV files to combine
Tk().withdraw()  # hide the empty Tk root window so only the file dialogs appear
file_paths = askopenfilenames(title="Select the cleaned CSV chunks to combine")

if not file_paths:
    raise ValueError("No files selected.")

# Show the selected files so the choice can be checked
print(f"Files selected:\n{file_paths}")

# Step 3: Load and combine all files
chunks = []
for file in file_paths:
    print(f"Loading: {os.path.basename(file)}")
    df_chunk = pd.read_csv(file)
    chunks.append(df_chunk)

df_combined = pd.concat(chunks, ignore_index=True)
print(f"\nCombined shape: {df_combined.shape}")

# Step 4: Optional check – preview the combined data
print("\n Preview of combined data:")
print(df_combined.head())
print("\n Column list:")
print(df_combined.columns)

# Step 5: Save to a new CSV
output_path = asksaveasfilename(
    title="Save combined file as...",
    defaultextension=".csv",
    filetypes=[("CSV Files", "*.csv")]
)

if output_path:
    df_combined.to_csv(output_path, index=False)
    print(f"\n Combined file saved as: {output_path}")
else:
    print("\n Save cancelled.")

Files selected:
('C:/diksha/Summer Sem/DataAnalysis/Data/cleaned/Clean_Jan_Feb_Taxi.csv', 'C:/diksha/Summer Sem/DataAnalysis/Data/cleaned/Clean_March_April_Taxi.csv')
Loading: Clean_Jan_Feb_Taxi.csv
Loading: Clean_March_April_Taxi.csv

Combined shape: (12060291, 25)

 Preview of combined data:
        tpep_pickup_datetime      tpep_dropoff_datetime  trip_distance  \
0  2023-02-01 00:00:00-05:00  2023-02-01 00:15:00-05:00           3.10   
1  2023-02-01 00:00:01-05:00  2023-02-01 00:33:41-05:00          17.31   
2  2023-02-01 00:00:02-05:00  2023-02-01 00:11:08-05:00           1.91   
3  2023-02-01 00:00:04-05:00  2023-02-01 00:25:20-05:00           6.40   
4  2023-02-01 00:00:07-05:00  2023-02-01 00:03:10-05:00           1.12   

   fare_amount  trip_duration_min pickup_date  pickup_hour  \
0        16.83          15.000000  2023-02-01            0   
1        70.00          33.666667  2023-02-01            0   
2        12.80          11.100000  2023-02-01            0   
3        29.