
In [50]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [10]:
import pandas as pd
import os

output_directory = "/content/drive/MyDrive/airline_data"

# Dictionary to store all DataFrames
airline_data = {}

# Check if the directory exists and has CSV files
if os.path.exists(output_directory):
    csv_files = [f for f in os.listdir(output_directory) if f.endswith('.csv')]
    if csv_files:
        print(f"Importing {len(csv_files)} CSV files from '{output_directory}'...")
        for file in csv_files:
            file_path = os.path.join(output_directory, file)
            df_name = os.path.splitext(file)[0] # Get filename without extension
            try:
                airline_data[df_name] = pd.read_csv(file_path)
                print(f"Successfully imported '{file}'.")
            except Exception as e:
                print(f"Error importing '{file}': {e}")
        print("\nAll CSV files imported into the 'airline_data' dictionary.")
    else:
        print(f"No CSV files found in '{output_directory}' to import.")
else:
    print(f"Directory '{output_directory}' does not exist. Please ensure the data is prepared.")

Importing 11 CSV files from '/content/drive/MyDrive/airline_data'...
Successfully imported 'Passengers.csv'.
Successfully imported 'Revenue_Transactions.csv'.
Successfully imported 'Baggage_Details.csv'.
Successfully imported 'Customer_Feedback.csv'.
Successfully imported 'Flight_Delays.csv'.
Successfully imported 'Passport_Details.csv'.
Successfully imported 'Crew_Assignments.csv'.
Successfully imported 'Aircraft_Maintenance.csv'.
Successfully imported 'Visa_Details.csv'.
Successfully imported 'Aircraft_costs.csv'.
Successfully imported 'Flights.csv'.

All CSV files imported into the 'airline_data' dictionary.


You can now access each DataFrame by its file name (without the `.csv` extension). For example, to view the first few rows of the 'Aircraft_costs' DataFrame, you can use `display(airline_data['Aircraft_costs'].head())`.
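To see which keys exist before indexing into the dictionary, a quick overview can help. The sketch below is only illustrative: `summarize_frames` is a hypothetical helper, and the two small frames stand in for the real CSV data.

```python
import pandas as pd


def summarize_frames(frames: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Return one row per DataFrame with its row and column counts."""
    rows = [
        {"name": name, "rows": df.shape[0], "columns": df.shape[1]}
        for name, df in frames.items()
    ]
    return pd.DataFrame(rows).set_index("name")


# Toy stand-ins for the imported CSV files (not the real data).
demo_frames = {
    "Aircraft_costs": pd.DataFrame(
        {"aircraft_id": ["AIR135", "AIR113"], "aircraft_model": ["B737", "A350"]}
    ),
    "Flights": pd.DataFrame({"flight_id": ["FL457"], "distance_km": [834.4]}),
}

overview = summarize_frames(demo_frames)
print(overview)
```

With the real dictionary, the call would be `summarize_frames(airline_data)`.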

In [11]:
display(airline_data['Aircraft_costs'])


Unnamed: 0,aircraft_id,aircraft_model,aircraft_price_inr,purchase_date
0,AIR135,B737,131063658,2019-05-02
1,AIR113,A350,279511707,2019-09-19
2,AIR111,A350,129759126,2019-01-09
3,AIR103,B737,253275906,2019-11-14
4,AIR164,A350,144214279,2019-02-08
...,...,...,...,...
76,AIR136,A350,138946164,2019-08-01
77,AIR128,A321,126867864,2019-07-05
78,AIR163,A350,372560184,2019-04-18
79,AIR100,B737,316118264,2019-01-18


In [12]:
display(airline_data['Passengers'])

Unnamed: 0,passenger_id,passenger_name,nationality,flight_id,origin_airport,origin_country,destination_airport,destination_country,distance_km,travel_class,ticket_price,travel_type,loyalty_tier,meal_preference,checkin_status,travel_date
0,PAX1,Sophia Brown,India,FL457,BLR,India,BOM,India,834.4,Premium Economy,8259,Domestic,Platinum,Vegan,Checked-in,2023-04-28
1,PAX2,Chris Clark,India,FL1956,BOM,India,DEL,India,1137.1,Economy,8147,Domestic,Platinum,Non-Veg,No-show,2024-07-17
2,PAX3,Alex Sharma,India,FL1603,BLR,India,MAA,India,267.9,Premium Economy,2461,Domestic,,,No-show,2022-07-24
3,PAX4,Ananya Sharma,India,FL1165,MAA,India,BLR,India,267.9,Business,100000,Domestic,Silver,Jain,Cancelled,2024-03-19
4,PAX5,Sophia Wilson,India,FL468,MAA,India,CCU,India,1385.2,Economy,9946,Domestic,Platinum,Jain,No-show,2024-06-05
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
299995,PAX299996,Alex Singh,Australia,FL172,SYD,Australia,MAA,India,9128.2,Economy,72853,International,Gold,Veg,Checked-in,2025-10-05
299996,PAX299997,Emily Miller,USA,FL620,ORD,USA,BLR,India,13705.6,Premium Economy,156504,International,Platinum,Jain,Cancelled,2024-08-25
299997,PAX299998,Neha Patel,India,FL1421,MAA,India,ORD,USA,13782.1,Economy,94492,International,Gold,Vegan,Cancelled,2025-09-12
299998,PAX299999,Neha Clark,UK,FL1040,LHR,UK,HYD,India,7752.9,Economy,50587,International,,Non-Veg,No-show,2021-09-20


In [13]:
if 'Customer_Feedback' in airline_data:
    null_counts = airline_data['Customer_Feedback'].isnull().sum()
    print("Null values per field in 'Customer_Feedback' DataFrame:\n")
    display(null_counts)
else:
    print("'Customer_Feedback' DataFrame not found in the dictionary.")

Null values per field in 'Customer_Feedback' DataFrame:



Unnamed: 0,0
PNR,0
rating,17606
comments,17516


# Task
# Prepare Data for Imputation

Merge the `Customer_Feedback` DataFrame with the `Passengers` DataFrame by matching `PNR` in `Customer_Feedback` against `passenger_id` in `Passengers` (`Passengers` has no `PNR` column). This step adds the `flight_id` to each feedback record, which is necessary for group-wise imputation.

### Replace Null Values in 'loyalty_tier' and 'meal_preference'

I will now replace all null values in the `loyalty_tier` and `meal_preference` columns of the `Passengers` DataFrame with the string 'None' to handle missing categorical data consistently.
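As a minimal sketch of this replacement, `DataFrame.fillna` accepts a dictionary that maps column names to fill values, so both columns can be handled in one call. The `passengers_demo` frame is made-up sample data, not the real `Passengers` table.

```python
import pandas as pd

# Made-up sample rows with gaps in the two categorical columns.
passengers_demo = pd.DataFrame(
    {
        "passenger_id": ["PAX1", "PAX2", "PAX3"],
        "loyalty_tier": ["Platinum", None, "Gold"],
        "meal_preference": ["Vegan", "Non-Veg", None],
    }
)

# Fill each column with the placeholder string 'None'; other columns are untouched.
passengers_filled = passengers_demo.fillna(
    {"loyalty_tier": "None", "meal_preference": "None"}
)

remaining_nulls = passengers_filled[["loyalty_tier", "meal_preference"]].isnull().sum()
print(remaining_nulls)
```

Because `fillna` returns a new DataFrame here, the original `passengers_demo` keeps its missing values, which makes a before/after comparison easy.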

In [14]:
import pandas as pd

if 'airline_data' in globals() and isinstance(airline_data, dict):
    print("Analyzing null values for each DataFrame in 'airline_data':\n")
    for df_name, df in airline_data.items():
        print(f"Null values for DataFrame: '{df_name}'")
        null_counts = df.isnull().sum()
        display(null_counts[null_counts > 0]) # Display only columns with null values
        print("\n" + "="*50 + "\n")
else:
    print("'airline_data' dictionary not found or is not a dictionary.")

Analyzing null values for each DataFrame in 'airline_data':

Null values for DataFrame: 'Passengers'


Unnamed: 0,0
loyalty_tier,75129
meal_preference,60099




Null values for DataFrame: 'Revenue_Transactions'


Unnamed: 0,0




Null values for DataFrame: 'Baggage_Details'


Unnamed: 0,0




Null values for DataFrame: 'Customer_Feedback'


Unnamed: 0,0
rating,17606
comments,17516




Null values for DataFrame: 'Flight_Delays'


Unnamed: 0,0




Null values for DataFrame: 'Passport_Details'


Unnamed: 0,0




Null values for DataFrame: 'Crew_Assignments'


Unnamed: 0,0




Null values for DataFrame: 'Aircraft_Maintenance'


Unnamed: 0,0




Null values for DataFrame: 'Visa_Details'


Unnamed: 0,0




Null values for DataFrame: 'Aircraft_costs'


Unnamed: 0,0




Null values for DataFrame: 'Flights'


Unnamed: 0,0
flight_date,500
distance_km,500






In [15]:
if 'Flights' in airline_data:
    initial_rows = airline_data['Flights'].shape[0]

    # Drop rows where 'distance_km' or 'flight_date' are null
    airline_data['Flights'].dropna(subset=['distance_km', 'flight_date'], inplace=True)

    rows_after_drop = airline_data['Flights'].shape[0]
    print(f"Rows in 'Flights' DataFrame before dropping nulls: {initial_rows}")
    print(f"Rows in 'Flights' DataFrame after dropping nulls in 'distance_km' or 'flight_date': {rows_after_drop}")
    print("\nUpdated 'Flights' DataFrame info:")
    airline_data['Flights'].info()
else:
    print("'Flights' DataFrame not found in the dictionary.")

Rows in 'Flights' DataFrame before dropping nulls: 2500
Rows in 'Flights' DataFrame after dropping nulls in 'distance_km' or 'flight_date': 2000

Updated 'Flights' DataFrame info:
<class 'pandas.core.frame.DataFrame'>
Index: 2000 entries, 0 to 1999
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   flight_id            2000 non-null   object 
 1   aircraft_id          2000 non-null   object 
 2   origin_airport       2000 non-null   object 
 3   origin_country       2000 non-null   object 
 4   destination_airport  2000 non-null   object 
 5   destination_country  2000 non-null   object 
 6   flight_date          2000 non-null   object 
 7   distance_km          2000 non-null   float64
dtypes: float64(1), object(7)
memory usage: 140.6+ KB


In [16]:
if 'airline_data' in globals():
    print("'airline_data' dictionary is defined.")
else:
    print("'airline_data' dictionary is NOT defined.")

'airline_data' dictionary is defined.


In [17]:
if 'Passengers' in airline_data:
    # Replace nulls in 'loyalty_tier' with 'None'
    airline_data['Passengers']['loyalty_tier'] = airline_data['Passengers']['loyalty_tier'].fillna('None')

    # Replace nulls in 'meal_preference' with 'None'
    airline_data['Passengers']['meal_preference'] = airline_data['Passengers']['meal_preference'].fillna('None')

    print("Null values in 'loyalty_tier' and 'meal_preference' columns of Passengers DataFrame have been replaced with 'None'.")
    print("\nVerifying null counts after replacement:")
    display(airline_data['Passengers'][['loyalty_tier', 'meal_preference']].isnull().sum())
else:
    print("'Passengers' DataFrame not found in the dictionary.")

Null values in 'loyalty_tier' and 'meal_preference' columns of Passengers DataFrame have been replaced with 'None'.

Verifying null counts after replacement:


Unnamed: 0,0
loyalty_tier,0
meal_preference,0


# Task
Merge the `Customer_Feedback` and `Passengers` DataFrames on their respective PNR and passenger_id columns to incorporate `flight_id` into the feedback data. Subsequently, impute the missing 'rating' values in `Customer_Feedback` by grouping by `flight_id` and filling with the median `rating` of each group. Similarly, impute missing 'comments' by grouping by `flight_id` and filling with the mode of `comments` for each group, selecting the first mode if multiple exist. Finally, verify the imputation by displaying the null counts for 'rating' and 'comments' in the `Customer_Feedback` DataFrame.

## Merge Customer_Feedback and Passengers DataFrames

### Subtask:
Merge the `Customer_Feedback` DataFrame with the `Passengers` DataFrame, matching `PNR` in `Customer_Feedback` with `passenger_id` in `Passengers`, to add the `flight_id` to each feedback record.


**Reasoning**:
To prepare for the merge, I will first copy the 'Customer_Feedback' DataFrame and extract the necessary 'passenger_id' and 'flight_id' columns from the 'Passengers' DataFrame into temporary variables.
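A small sketch of the same kind of left merge on toy data is shown here, with two optional safeguards that are my own additions rather than part of the notebook: `validate="many_to_one"` raises if `passenger_id` is duplicated on the right side, and `indicator=True` adds a `_merge` column that shows which feedback rows found no passenger.

```python
import pandas as pd

# Toy data; 'PAX9' deliberately has no matching passenger.
feedback_toy = pd.DataFrame(
    {"PNR": ["PAX2", "PAX1", "PAX9"], "rating": [4.0, None, 2.0]}
)
passengers_toy = pd.DataFrame(
    {
        "passenger_id": ["PAX1", "PAX2", "PAX3"],
        "flight_id": ["FL457", "FL1956", "FL1603"],
    }
)

merged_toy = feedback_toy.merge(
    passengers_toy,
    left_on="PNR",
    right_on="passenger_id",
    how="left",
    validate="many_to_one",  # fail fast if passenger_id is not unique
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
).drop(columns="passenger_id")

# Feedback rows that did not match any passenger keep a missing flight_id.
unmatched = int((merged_toy["_merge"] == "left_only").sum())
print(merged_toy)
print(f"Unmatched feedback rows: {unmatched}")
```

Rows counted in `unmatched` would be skipped by any later `groupby('flight_id')` imputation, so it is worth checking that this number is zero on the real data.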



In [49]:
if 'Customer_Feedback' in airline_data and 'Passengers' in airline_data:
    customer_feedback_df = airline_data['Customer_Feedback'].copy()

    # Drop any existing flight_id columns from customer_feedback_df to prevent merge conflicts
    # We want the flight_id from passengers_flight_info_df to be the primary one.
    columns_to_drop = [col for col in customer_feedback_df.columns if 'flight_id' in col]
    customer_feedback_df = customer_feedback_df.drop(columns=columns_to_drop, errors='ignore')

    # Select 'passenger_id' and 'flight_id' from Passengers DataFrame
    passengers_flight_info_df = airline_data['Passengers'][['passenger_id', 'flight_id']].copy()

    # Perform the left merge using 'PNR' from Customer_Feedback and 'passenger_id' from Passengers
    merged_customer_feedback = pd.merge(
        customer_feedback_df,
        passengers_flight_info_df,
        left_on='PNR',
        right_on='passenger_id',
        how='left'
    )

    # Drop the redundant 'passenger_id' column from the merged DataFrame if it's not needed
    merged_customer_feedback = merged_customer_feedback.drop(columns=['passenger_id'])

    # Update the Customer_Feedback DataFrame in the airline_data dictionary
    airline_data['Customer_Feedback'] = merged_customer_feedback

    print("Customer_Feedback DataFrame merged with Passengers (PNR and flight_id) successfully.")
    print("Displaying the first 5 rows of the updated Customer_Feedback DataFrame:")
    display(airline_data['Customer_Feedback'].head())
else:
    print("Required DataFrames (Customer_Feedback or Passengers) not found in 'airline_data'.")

Customer_Feedback DataFrame merged with Passengers (PNR and flight_id) successfully.
Displaying the first 5 rows of the updated Customer_Feedback DataFrame:


Unnamed: 0,PNR,rating,comments,flight_id
0,PAX190363,3.0,average,FL1596
1,PAX159285,4.0,good,FL246
2,PAX169996,1.0,very poor,FL561
3,PAX251259,3.0,average,FL1979
4,PAX295993,1.0,very poor,FL739


In [None]:
if 'Customer_Feedback' in airline_data:
    # Calculate the sum of null values for specific columns in Customer_Feedback
    null_counts_after_imputation = airline_data['Customer_Feedback'][['rating', 'comments']].isnull().sum()

    print("Null values for 'rating' and 'comments' after imputation and update:")
    display(null_counts_after_imputation)

    if null_counts_after_imputation['rating'] == 0 and null_counts_after_imputation['comments'] == 0:
        print("\n✅ Imputation and comment update successful: No more null values in 'rating' or 'comments'.")
    else:
        print("\n❌ Imputation not fully successful: Null values still present in 'rating' or 'comments'.")
else:
    print("'Customer_Feedback' DataFrame not found in the dictionary.")

## Summary:

### Q&A
The count of null values for 'rating' after imputation is 0, and the count of null values for 'comments' after imputation is also 0, confirming that all missing values in these columns have been successfully handled.

### Data Analysis Key Findings
*   The `Customer_Feedback` DataFrame was successfully merged with relevant `flight_id` information from the `Passengers` DataFrame by matching `PNR` in `Customer_Feedback` with `passenger_id` in `Passengers`.
*   Missing 'rating' values in the `Customer_Feedback` DataFrame were imputed using the median rating for each `flight_id`. After imputation, the null count for 'rating' was 0.
*   Missing 'comments' values in the `Customer_Feedback` DataFrame were imputed using the mode of comments for each `flight_id`. If multiple modes existed, the first one was chosen; if no mode could be determined, 'No Comment' was used. After imputation, the null count for 'comments' was 0.

### Insights or Next Steps
*   The `Customer_Feedback` DataFrame is now clean and enriched with `flight_id`, making it ready for further analysis to identify flight-specific feedback patterns or performance issues.
*   The imputed 'comments' data, while useful for completeness, should be treated with caution in qualitative analysis, especially if 'No Comment' was frequently used as a placeholder.


### Update 'comments' based on 'rating'

I will now update the 'comments' column in the `Customer_Feedback` DataFrame using the 'rating' column, mapping the numerical ratings to descriptive text as follows:
*   1.0: 'very poor'
*   2.0: 'poor'
*   3.0: 'average'
*   4.0: 'good'
*   5.0: 'excellent'
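The mapping above can also be sketched in vectorized form. This is an illustration under my own assumptions: `RATING_LABELS` and `ratings_to_comments` are hypothetical names, and the placeholder strings mirror the ones used in this notebook. One detail worth knowing is that Python's built-in `round()` rounds halves to the nearest even integer, so a median such as 2.5 becomes 2, not 3.

```python
import pandas as pd

# Hypothetical lookup table for the rating-to-text mapping.
RATING_LABELS = {
    1.0: "very poor",
    2.0: "poor",
    3.0: "average",
    4.0: "good",
    5.0: "excellent",
}


def ratings_to_comments(ratings: pd.Series) -> pd.Series:
    """Map numeric ratings to descriptive text without a row-wise apply."""
    rounded = ratings.round()
    labels = rounded.map(RATING_LABELS)
    # A rating that exists but has no label is out of the 1-5 range.
    labels = labels.where(rounded.isna() | labels.notna(), "Uncategorized")
    # Whatever is still missing had no rating at all.
    return labels.fillna("No Rating Specified")


demo_ratings = pd.Series([3.0, None, 4.4, 7.0, 1.0])
demo_comments = ratings_to_comments(demo_ratings)
print(demo_comments.tolist())
```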

In [21]:
import pandas as pd

# Check if 'airline_data' is defined in the global scope
if 'airline_data' not in globals():
    print("Error: 'airline_data' dictionary not found. Please ensure the data loading cell (e.g., cell 6de720c0) is executed.")
else:
    if 'Customer_Feedback' in airline_data:
        def map_rating_to_comment(rating):
            if pd.isna(rating):
                return 'No Rating Specified' # Placeholder for ratings that are still NaN (this cell can run before the rating imputation)
            rounded_rating = round(rating) # Built-in round() rounds halves to the nearest even integer (e.g. 2.5 -> 2)
            if rounded_rating == 1:
                return 'very poor'
            elif rounded_rating == 2:
                return 'poor'
            elif rounded_rating == 3:
                return 'average'
            elif rounded_rating == 4:
                return 'good'
            elif rounded_rating == 5:
                return 'excellent'
            else:
                return 'Uncategorized' # For ratings outside the 1-5 range after rounding

        # Apply the modified mapping function to the 'rating' column to create new comments
        airline_data['Customer_Feedback']['comments'] = airline_data['Customer_Feedback']['rating'].apply(map_rating_to_comment)

        print("Comments in 'Customer_Feedback' DataFrame have been updated based on ratings with rounding and default.")
        print("Displaying the first 10 rows of the updated Customer_Feedback DataFrame with 'rating' and 'comments' columns:")
        display(airline_data['Customer_Feedback'][['rating', 'comments']].head(10))
    else:
        print("'Customer_Feedback' DataFrame not found in the dictionary.")

Comments in 'Customer_Feedback' DataFrame have been updated based on ratings with rounding and default.
Displaying the first 10 rows of the updated Customer_Feedback DataFrame with 'rating' and 'comments' columns:


Unnamed: 0,rating,comments
0,3.0,average
1,4.0,good
2,1.0,very poor
3,,No Rating Specified
4,1.0,very poor
5,1.0,very poor
6,1.0,very poor
7,2.0,poor
8,3.0,average
9,2.0,poor


In [None]:
airline_data['Customer_Feedback'].info()

In [19]:
if 'Flights' in airline_data:
    initial_rows = airline_data['Flights'].shape[0]

    # Drop rows where 'distance_km' or 'flight_date' are null
    airline_data['Flights'].dropna(subset=['distance_km', 'flight_date'], inplace=True)

    rows_after_drop = airline_data['Flights'].shape[0]
    print(f"Rows in 'Flights' DataFrame before dropping nulls: {initial_rows}")
    print(f"Rows in 'Flights' DataFrame after dropping nulls in 'distance_km' or 'flight_date': {rows_after_drop}")
    print("\nUpdated 'Flights' DataFrame info:")
    airline_data['Flights'].info()
else:
    print("'Flights' DataFrame not found in the dictionary.")

Rows in 'Flights' DataFrame before dropping nulls: 2000
Rows in 'Flights' DataFrame after dropping nulls in 'distance_km' or 'flight_date': 2000

Updated 'Flights' DataFrame info:
<class 'pandas.core.frame.DataFrame'>
Index: 2000 entries, 0 to 1999
Data columns (total 8 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   flight_id            2000 non-null   object 
 1   aircraft_id          2000 non-null   object 
 2   origin_airport       2000 non-null   object 
 3   origin_country       2000 non-null   object 
 4   destination_airport  2000 non-null   object 
 5   destination_country  2000 non-null   object 
 6   flight_date          2000 non-null   object 
 7   distance_km          2000 non-null   float64
dtypes: float64(1), object(7)
memory usage: 140.6+ KB


## Merge Customer_Feedback and Passengers DataFrames

### Subtask:
Merge the Customer_Feedback DataFrame with the Passengers DataFrame to add the flight_id to each feedback record.


**Reasoning**:
The previous attempt to merge the dataframes resulted in a KeyError because the 'PNR' column was not found in the 'Passengers' DataFrame. The current cell `db981aba` correctly identifies this issue and uses 'passenger_id' from 'Passengers' to merge with 'PNR' from 'Customer_Feedback' to get the 'flight_id'.
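A `KeyError` like the one described can be caught earlier by checking the join columns before calling `pd.merge`. The helper below is a hypothetical sketch (`missing_columns` is not part of the notebook), run against a one-row stand-in for `Passengers`.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required: list[str]) -> list[str]:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]


# One-row stand-in for the Passengers table, which has no 'PNR' column.
passengers_stub = pd.DataFrame({"passenger_id": ["PAX1"], "flight_id": ["FL457"]})

absent_for_pnr_join = missing_columns(passengers_stub, ["PNR", "flight_id"])
absent_for_id_join = missing_columns(passengers_stub, ["passenger_id", "flight_id"])

print(f"Missing for a PNR join: {absent_for_pnr_join}")
print(f"Missing for a passenger_id join: {absent_for_id_join}")
```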



In [22]:
if 'Customer_Feedback' in airline_data and 'Passengers' in airline_data:
    customer_feedback_df = airline_data['Customer_Feedback'].copy()

    # Drop any existing flight_id columns from customer_feedback_df to prevent merge conflicts
    # We want the flight_id from passengers_flight_info_df to be the primary one.
    columns_to_drop = [col for col in customer_feedback_df.columns if 'flight_id' in col]
    customer_feedback_df = customer_feedback_df.drop(columns=columns_to_drop, errors='ignore')

    # Select 'passenger_id' and 'flight_id' from Passengers DataFrame
    passengers_flight_info_df = airline_data['Passengers'][['passenger_id', 'flight_id']].copy()

    # Perform the left merge using 'PNR' from Customer_Feedback and 'passenger_id' from Passengers
    merged_customer_feedback = pd.merge(
        customer_feedback_df,
        passengers_flight_info_df,
        left_on='PNR',
        right_on='passenger_id',
        how='left'
    )

    # Drop the redundant 'passenger_id' column from the merged DataFrame if it's not needed
    merged_customer_feedback = merged_customer_feedback.drop(columns=['passenger_id'])

    # Update the Customer_Feedback DataFrame in the airline_data dictionary
    airline_data['Customer_Feedback'] = merged_customer_feedback

    print("Customer_Feedback DataFrame merged with Passengers (PNR and flight_id) successfully.")
    print("Displaying the first 5 rows of the updated Customer_Feedback DataFrame:")
    display(airline_data['Customer_Feedback'].head())
else:
    print("Required DataFrames (Customer_Feedback or Passengers) not found in 'airline_data'.")

Customer_Feedback DataFrame merged with Passengers (PNR and flight_id) successfully.
Displaying the first 5 rows of the updated Customer_Feedback DataFrame:


Unnamed: 0,PNR,rating,comments,flight_id
0,PAX190363,3.0,average,FL1596
1,PAX159285,4.0,good,FL246
2,PAX169996,1.0,very poor,FL561
3,PAX251259,,No Rating Specified,FL1979
4,PAX295993,1.0,very poor,FL739


In [33]:
airline_data['Customer_Feedback']

Unnamed: 0,PNR,rating,comments,flight_id
0,PAX190363,3.0,average,FL1596
1,PAX159285,4.0,good,FL246
2,PAX169996,1.0,very poor,FL561
3,PAX251259,3.0,No Rating Specified,FL1979
4,PAX295993,1.0,very poor,FL739
...,...,...,...,...
104995,PAX23403,3.0,average,FL1903
104996,PAX205937,5.0,excellent,FL620
104997,PAX18412,2.0,poor,FL1269
104998,PAX48922,3.0,average,FL923




**Reasoning**:
The previous merge operation was successful, adding the 'flight_id' to the 'Customer_Feedback' DataFrame. Now, I will proceed with imputing the missing 'rating' values by grouping by 'flight_id' and filling with the median 'rating' of each group, as specified in the task.
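The idea can be sketched on toy data as follows. The global-median fallback for flights whose ratings are all missing is my own assumption, added for illustration; the notebook cell itself only uses the per-flight median. `impute_rating_by_flight` is a hypothetical name.

```python
import pandas as pd


def impute_rating_by_flight(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing ratings with the per-flight median, then the overall median."""
    group_median = df.groupby("flight_id")["rating"].transform("median")
    filled = (
        df["rating"]
        .fillna(group_median)           # per-flight median where one exists
        .fillna(df["rating"].median())  # assumed fallback for all-missing flights
    )
    return df.assign(rating=filled)


# Toy data: flight 'C' has no observed rating at all.
ratings_toy = pd.DataFrame(
    {
        "flight_id": ["A", "A", "A", "B", "B", "C"],
        "rating": [1.0, None, 2.0, 5.0, None, None],
    }
)

ratings_imputed = impute_rating_by_flight(ratings_toy)
print(ratings_imputed)
```

Note that a per-flight median can be fractional (1.5 for flight 'A' here), which matters for any later step that rounds ratings to whole numbers.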



In [23]:
if 'Customer_Feedback' in airline_data:
    # Impute missing 'rating' values with the median rating per flight_id
    airline_data['Customer_Feedback']['rating'] = airline_data['Customer_Feedback'].groupby('flight_id')['rating'].transform(lambda x: x.fillna(x.median()))

    print("Missing 'rating' values in 'Customer_Feedback' DataFrame imputed with median per flight_id.")
    print("Verifying null counts for 'rating' after imputation:")
    display(airline_data['Customer_Feedback']['rating'].isnull().sum())
else:
    print("'Customer_Feedback' DataFrame not found in the dictionary.")

Missing 'rating' values in 'Customer_Feedback' DataFrame imputed with median per flight_id.
Verifying null counts for 'rating' after imputation:


np.int64(0)

**Reasoning**:
The 'rating' column has been successfully imputed. The next step, as per the task description, is to impute the missing 'comments' values by grouping by 'flight_id' and filling with the mode of 'comments' for each group, selecting the first mode if multiple exist, or 'No Comment' if no mode can be determined.
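Below is a minimal sketch of per-flight mode imputation on toy data; `first_mode_or_default` is a hypothetical helper. As far as I know, `Series.mode()` returns its results in sorted order, so when two comments tie, "the first mode" is the alphabetically smaller one rather than the one seen first.

```python
import pandas as pd


def first_mode_or_default(series: pd.Series, default: str = "No Comment") -> str:
    """Return the first mode of a series, or `default` if every value is missing."""
    modes = series.mode()  # missing values are ignored by default
    return modes.iloc[0] if not modes.empty else default


# Toy data: FL1 has a tie, FL2 has a clear mode, FL3 has no comments at all.
comments_toy = pd.DataFrame(
    {
        "flight_id": ["FL1", "FL1", "FL1", "FL2", "FL2", "FL2", "FL3"],
        "comments": ["good", "poor", None, "poor", "poor", None, None],
    }
)

# transform broadcasts each flight's chosen value back to that flight's rows.
fill_values = comments_toy.groupby("flight_id")["comments"].transform(
    first_mode_or_default
)
comments_toy["comments"] = comments_toy["comments"].fillna(fill_values)
print(comments_toy)
```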



In [24]:
if 'Customer_Feedback' in airline_data:
    # Function to impute with mode, handling multiple modes and all-NaN groups
    def impute_mode(series):
        mode_values = series.mode()
        if not mode_values.empty:
            return series.fillna(mode_values.iloc[0])
        else:
            return series.fillna('No Comment') # Fallback if no mode can be determined (e.g., all NaN in group)

    # Impute missing 'comments' values with the mode per flight_id
    airline_data['Customer_Feedback']['comments'] = airline_data['Customer_Feedback'].groupby('flight_id')['comments'].transform(impute_mode)

    print("Missing 'comments' values in 'Customer_Feedback' DataFrame imputed with mode per flight_id.")
    print("Verifying null counts for 'comments' after imputation:")
    display(airline_data['Customer_Feedback']['comments'].isnull().sum())
else:
    print("'Customer_Feedback' DataFrame not found in the dictionary.")

Missing 'comments' values in 'Customer_Feedback' DataFrame imputed with mode per flight_id.
Verifying null counts for 'comments' after imputation:


np.int64(0)

### Update 'No Rating Specified' comments based on actual ratings

I will now go through the `Customer_Feedback` DataFrame and, for any row where the 'comments' column is currently 'No Rating Specified', I will update it using the actual (imputed) 'rating' value and the specified mapping:
*   1.0: 'very poor'
*   2.0: 'poor'
*   3.0: 'average'
*   4.0: 'good'
*   5.0: 'excellent'
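One way to sketch this targeted update without a row-wise `apply` is a boolean mask with `.loc`. The `LABELS` dictionary and the toy frame are assumptions for illustration; rows whose rating is still missing keep the placeholder.

```python
import pandas as pd

# Hypothetical lookup table for the mapping listed above.
LABELS = {1.0: "very poor", 2.0: "poor", 3.0: "average", 4.0: "good", 5.0: "excellent"}

placeholder_toy = pd.DataFrame(
    {
        "rating": [3.0, 1.0, 5.0, None],
        "comments": [
            "No Rating Specified",
            "very poor",
            "No Rating Specified",
            "No Rating Specified",
        ],
    }
)

# Only touch rows that carry the placeholder AND now have a usable rating.
mask = (
    placeholder_toy["comments"].eq("No Rating Specified")
    & placeholder_toy["rating"].notna()
)
placeholder_toy.loc[mask, "comments"] = (
    placeholder_toy.loc[mask, "rating"].round().map(LABELS).fillna("Uncategorized")
)
print(placeholder_toy)
```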

In [35]:
import pandas as pd

if 'Customer_Feedback' in airline_data:
    def update_specific_comments(row):
        if row['comments'] == 'No Rating Specified':
            # Ensure rating is not NaN, though it should be imputed by now
            if pd.notna(row['rating']):
                rounded_rating = round(row['rating'])
                if rounded_rating == 1:
                    return 'very poor'
                elif rounded_rating == 2:
                    return 'poor'
                elif rounded_rating == 3:
                    return 'average'
                elif rounded_rating == 4:
                    return 'good'
                elif rounded_rating == 5:
                    return 'excellent'
                else:
                    return 'Uncategorized' # Fallback for unexpected rating values
            else:
                return 'No Rating Specified' # Should not happen after rating imputation
        else:
            return row['comments'] # Keep existing comments

    # Apply the function to update comments where necessary
    airline_data['Customer_Feedback']['comments'] = airline_data['Customer_Feedback'].apply(update_specific_comments, axis=1)

    print("Comments marked 'No Rating Specified' have been updated based on their corresponding ratings.")
    print("Displaying the first 10 rows of the updated Customer_Feedback DataFrame with 'rating' and 'comments' columns:")
    display(airline_data['Customer_Feedback'][['rating', 'comments']].head(10))
else:
    print("'Customer_Feedback' DataFrame not found in the dictionary.")

Comments marked 'No Rating Specified' have been updated based on their corresponding ratings.
Displaying the first 10 rows of the updated Customer_Feedback DataFrame with 'rating' and 'comments' columns:


Unnamed: 0,rating,comments
0,3.0,average
1,4.0,good
2,1.0,very poor
3,3.0,average
4,1.0,very poor
5,1.0,very poor
6,1.0,very poor
7,2.0,poor
8,3.0,average
9,2.0,poor


## Download Processed DataFrames as CSV Files

I will now save all the DataFrames from the `airline_data` dictionary, with all the implemented changes (null value replacements, row deletions, and comment updates), into a new folder in Google Drive. Each DataFrame will be saved as a separate CSV file, making them ready for download.
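The save step can be sketched with `pathlib` and a temporary folder standing in for Google Drive; `save_frames` is a hypothetical helper. One caveat, to the best of my knowledge: the literal text `None` is among the strings that `pd.read_csv` treats as missing by default in recent pandas releases, so the 'None' placeholders written earlier can come back as NaN unless the file is read with `keep_default_na=False`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_frames(frames: dict[str, pd.DataFrame], out_dir: Path) -> list[Path]:
    """Write each DataFrame to '<out_dir>/<name>.csv' and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    written = []
    for name, df in frames.items():
        path = out_dir / f"{name}.csv"
        df.to_csv(path, index=False)
        written.append(path)
    return written


# Toy frame containing the 'None' placeholder string used for missing tiers.
frames_toy = {
    "Passengers": pd.DataFrame(
        {"passenger_id": ["PAX1", "PAX2"], "loyalty_tier": ["Gold", "None"]}
    )
}

with tempfile.TemporaryDirectory() as tmp:
    paths = save_frames(frames_toy, Path(tmp) / "airline_data_processed")
    file_names = [p.name for p in paths]
    # keep_default_na=False keeps the text 'None' as a string on reload.
    reloaded = pd.read_csv(paths[0], keep_default_na=False)

print(reloaded)
```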

In [48]:
import os

output_processed_directory = "/content/drive/MyDrive/airline_data_processed"

# Create the directory if it doesn't exist
if not os.path.exists(output_processed_directory):
    os.makedirs(output_processed_directory)
    print(f"Created output directory: '{output_processed_directory}'")
else:
    print(f"Output directory already exists: '{output_processed_directory}'")

if 'airline_data' in globals() and isinstance(airline_data, dict):
    print("\nSaving processed DataFrames to CSV...")
    for df_name, df in airline_data.items():
        file_path = os.path.join(output_processed_directory, f"{df_name}.csv")
        try:
            df.to_csv(file_path, index=False)
            print(f"Successfully saved '{df_name}' to '{file_path}'.")
        except Exception as e:
            print(f"Error saving '{df_name}': {e}")
    print("\nAll processed DataFrames have been saved as CSV files. You can find them in your Google Drive at the specified path.")
else:
    print("'airline_data' dictionary not found. Please ensure the data loading cell was executed successfully.")

Created output directory: '/content/drive/MyDrive/airline_data_processed'

Saving processed DataFrames to CSV...
Successfully saved 'Passengers' to '/content/drive/MyDrive/airline_data_processed/Passengers.csv'.
Successfully saved 'Revenue_Transactions' to '/content/drive/MyDrive/airline_data_processed/Revenue_Transactions.csv'.
Successfully saved 'Baggage_Details' to '/content/drive/MyDrive/airline_data_processed/Baggage_Details.csv'.
Successfully saved 'Customer_Feedback' to '/content/drive/MyDrive/airline_data_processed/Customer_Feedback.csv'.
Successfully saved 'Flight_Delays' to '/content/drive/MyDrive/airline_data_processed/Flight_Delays.csv'.
Successfully saved 'Passport_Details' to '/content/drive/MyDrive/airline_data_processed/Passport_Details.csv'.
Successfully saved 'Crew_Assignments' to '/content/drive/MyDrive/airline_data_processed/Crew_Assignments.csv'.
Successfully saved 'Aircraft_Maintenance' to '/content/drive/MyDrive/airline_data_processed/Aircraft_Maintenance.csv'.
S