I have a CSV file with data that has these columns: Year,Month,DayofMonth,DayOfWeek,Carrier,OriginAirportID,OriginAirportName,OriginCity,OriginState,DestAirportID,DestAirportName,DestCity,DestState,CRSDepTime,DepDelay,DepDel15,CRSArrTime,ArrDelay,ArrDel15,Cancelled
Cleanse the data by identifying null values and replacing them with an appropriate value (zero in this case).
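Filling every column with zero works, but a narrower variant can fill only the columns that are expected to hold zeros and then restore an integer dtype. pandas stores a numeric column that contained `NaN` as `float64`, so a 0/1 flag stays a float after `fillna(0)` unless it is cast back. The following is a minimal sketch on a toy frame. It assumes the 0/1 indicator columns are `DepDel15` and `ArrDel15`, and the helper name `fill_indicator_nulls` is hypothetical:

```python
import numpy as np
import pandas as pd


def fill_indicator_nulls(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with nulls in `columns` replaced by 0 and cast to int.

    Hypothetical helper: assumes each listed column holds 0/1 flags plus NaN.
    """
    out = df.copy()
    for col in columns:
        if col in out.columns:
            # Replace missing flags with 0, then drop the float dtype that NaN forced on the column
            out[col] = out[col].fillna(0).astype("int64")
    return out


# Toy frame standing in for flights.csv (values are illustrative only)
toy = pd.DataFrame({
    "DepDelay": [0, 20, 0],
    "DepDel15": [0.0, 1.0, np.nan],
    "ArrDel15": [0, 1, 0],
})
clean = fill_indicator_nulls(toy, ["DepDel15", "ArrDel15"])
print(clean.dtypes)
print(clean.isnull().sum().sum())  # 0
```

Because the helper works on a copy, the original frame keeps its nulls and can still be inspected afterwards.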

In [1]:
# Import required libraries for data cleansing and analysis
import pandas as pd
import numpy as np
import os

In [2]:
# First, let's see what files are available in the data directory
data_dir = '/workspaces/flight-delays/data'
if os.path.exists(data_dir):
    print(f"Files in {data_dir}:")
    for file in os.listdir(data_dir):
        print(f"  - {file}")
else:
    print(f"Data directory {data_dir} not found. Let's check the current directory:")
    for file in os.listdir('.'):
        if file.endswith('.csv'):
            print(f"  - {file}")

Files in /workspaces/flight-delays/data:
  - flights.csv
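The same directory check can be written with `pathlib`. This version also applies the `.csv` filter in both branches, whereas the cell above lists every file in the data directory but only CSV files in the fallback. The following is a minimal sketch. `find_csv_files` is a hypothetical helper, and the path is the one the notebook uses:

```python
from pathlib import Path


def find_csv_files(data_dir: str) -> list[Path]:
    """Return sorted CSV files in data_dir, or in the current directory if data_dir is missing."""
    base = Path(data_dir)
    if not base.is_dir():
        print(f"Data directory {base} not found; searching {Path.cwd()} instead")
        base = Path(".")
    # glob("*.csv") is non-recursive: it only looks at the top level of `base`
    return sorted(base.glob("*.csv"))


for path in find_csv_files("/workspaces/flight-delays/data"):
    print(f"  - {path.name}")
```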


In [3]:
# Load the flights.csv file directly
csv_path = '/workspaces/flight-delays/data/flights.csv'
df = pd.read_csv(csv_path)

print(f"Successfully loaded flights.csv with shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")

Successfully loaded flights.csv with shape: (271940, 20)

Columns: ['Year', 'Month', 'DayofMonth', 'DayOfWeek', 'Carrier', 'OriginAirportID', 'OriginAirportName', 'OriginCity', 'OriginState', 'DestAirportID', 'DestAirportName', 'DestCity', 'DestState', 'CRSDepTime', 'DepDelay', 'DepDel15', 'CRSArrTime', 'ArrDelay', 'ArrDel15', 'Cancelled']
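Before replacing anything, it can help to see how large each gap is relative to the row count. The following is a minimal sketch of a per-column missing-value report. `null_report` is a hypothetical helper, and the toy frame stands in for `df`:

```python
import numpy as np
import pandas as pd


def null_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of null values for each column that has any."""
    counts = df.isnull().sum()
    counts = counts[counts > 0]
    return pd.DataFrame({
        "null_count": counts,
        "null_pct": (counts / len(df) * 100).round(2),
    })


toy = pd.DataFrame({"DepDelay": [0, 5, 0, 30], "DepDel15": [0, 0, np.nan, 1]})
print(null_report(toy))
```

On the notebook's data, the 2,761 null `DepDel15` values out of 271,940 rows come to about 1.0% of the rows.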


In [6]:
# Data cleansing: identify null values and replace with 0
print("=== DATA CLEANSING ===")

# Check for null values before cleaning
print("Null values before cleaning:")
null_before = df.isnull().sum()
print(null_before[null_before > 0])

# Replace all null values with 0
df_clean = df.fillna(0)

# Verify cleaning was successful
print("\nNull values after cleaning:")
null_after = df_clean.isnull().sum()
print(null_after[null_after > 0])

if null_after.sum() == 0:
    print("\n✅ Success! All null values have been replaced with 0")
else:
    print(f"\n⚠️ Warning: {null_after.sum()} null values still remain")

print(f"\nCleaned dataset shape: {df_clean.shape}")

=== DATA CLEANSING ===
Null values before cleaning:
DepDel15    2761
dtype: int64

Null values after cleaning:
Series([], dtype: int64)

✅ Success! All null values have been replaced with 0

Cleaned dataset shape: (271940, 20)
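Replacing the 2,761 missing `DepDel15` values with 0 treats those flights as not delayed. This choice can be checked against `DepDelay` for the same rows, assuming `DepDel15` is the indicator `DepDelay >= 15`. That definition is inferred from the column name and is not stated in the data. The following is a minimal sketch, and `count_late_among_missing` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def count_late_among_missing(df: pd.DataFrame) -> int:
    """Count rows with a null DepDel15 whose DepDelay is 15 minutes or more.

    Assumes DepDel15 == 1 exactly when DepDelay >= 15 (inferred from the column name).
    """
    missing = df[df["DepDel15"].isnull()]
    # Summary statistics of the departure delay for the rows being filled with 0
    print(missing["DepDelay"].describe())
    return int((missing["DepDelay"] >= 15).sum())


# Toy frame standing in for the original `df` (not df_clean, which has no nulls left)
toy = pd.DataFrame({"DepDelay": [0, 20, 0, 3], "DepDel15": [0, 1, np.nan, np.nan]})
late = count_late_among_missing(toy)
print(f"Null DepDel15 rows that look late: {late}")  # 0
```

On the notebook's data, call it as `count_late_among_missing(df)`. This works because `fillna` returned a new frame and left `df` unchanged. A result of 0 means filling with 0 agrees with the recorded delays. A non-zero result would suggest deriving those flags from `DepDelay` instead, for example with `(df['DepDelay'] >= 15).astype(int)`, under the same assumption.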


In [7]:
# Save the cleansed data to a new file (os was imported in the first cell)

# Create processed data directory if it doesn't exist
processed_dir = '/workspaces/flight-delays/data/processed'
os.makedirs(processed_dir, exist_ok=True)

# Save cleansed data
clean_csv_path = os.path.join(processed_dir, 'flights_clean.csv')
df_clean.to_csv(clean_csv_path, index=False)

print(f"✅ Cleansed data saved to: {clean_csv_path}")
print(f"Original data: {df.shape[0]:,} rows")
print(f"Clean data: {df_clean.shape[0]:,} rows")
print(f"Columns: {df_clean.shape[1]}")

# Show a sample of the cleaned data
print("\nSample of cleaned data:")
print(df_clean.head())

✅ Cleansed data saved to: /workspaces/flight-delays/data/processed/flights_clean.csv
Original data: 271,940 rows
Clean data: 271,940 rows
Columns: 20

Sample of cleaned data:
   Year  Month  DayofMonth  DayOfWeek Carrier  OriginAirportID  \
0  2013      9          16          1      DL            15304   
1  2013      9          23          1      WN            14122   
2  2013      9           7          6      AS            14747   
3  2013      7          22          1      OO            13930   
4  2013      5          16          4      DL            13931   

              OriginAirportName  OriginCity OriginState  DestAirportID  \
0           Tampa International       Tampa          FL          12478   
1      Pittsburgh International  Pittsburgh          PA          13232   
2  Seattle/Tacoma International     Seattle          WA          11278   
3  Chicago O'Hare International     Chicago          IL          11042   
4         Norfolk International     Norfolk          VA   
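To confirm that the saved file matches the in-memory result, you can read it back and compare. This check is useful because `pd.read_csv` treats empty fields and strings such as `NA` as missing by default, so some values can become null again on reload. The following is a minimal sketch. `verify_saved_csv` is a hypothetical helper, and the demo writes to a temporary directory rather than the notebook's `processed_dir`:

```python
import os
import tempfile

import pandas as pd


def verify_saved_csv(expected: pd.DataFrame, path: str) -> bool:
    """Reload a CSV and check shape, column names, and absence of nulls."""
    reloaded = pd.read_csv(path)
    same_shape = reloaded.shape == expected.shape
    same_cols = list(reloaded.columns) == list(expected.columns)
    no_nulls = int(reloaded.isnull().sum().sum()) == 0
    return same_shape and same_cols and no_nulls


toy_clean = pd.DataFrame({"Carrier": ["DL", "WN"], "DepDel15": [0, 1]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "flights_clean.csv")
    toy_clean.to_csv(path, index=False)
    ok = verify_saved_csv(toy_clean, path)
print(ok)  # True
```

In the notebook, the equivalent call would be `verify_saved_csv(df_clean, clean_csv_path)`.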