### Garbage In, Garbage Out (GIGO): Cleaning Missing Data
**Description**: Load a dataset (e.g., the Titanic dataset), identify its missing values, and
handle them with a technique suited to each column's type, such as median imputation for
numeric columns and mode imputation for categorical ones.
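
Before choosing a technique, it helps to see how much of each column is missing, not just the raw count. The sketch below shows one way to do that. The helper name `missing_summary` and the toy frame are hypothetical: the column names echo the Titanic data, but the values are invented so the example runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names echo the Titanic dataset, values are invented.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "fare": [7.25, 71.28, 8.05, 53.10],
    "embarked": ["S", "C", None, "S"],
})


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values for columns that have any."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)


summary = missing_summary(toy)
print(summary)
```

A column that is mostly empty usually calls for a different decision (dropping it, or flagging the gaps) than a column with only a handful of missing entries.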

In [2]:
# Write your code from here


import pandas as pd
import seaborn as sns

# Load Titanic dataset from seaborn
df = sns.load_dataset('titanic')

print("Initial DataFrame shape:", df.shape)

# Step 1: Identify missing values
print("\nMissing values per column:")
print(df.isnull().sum())

# Step 2: Simple cleaning strategy

# For numerical columns, fill missing values with median
num_cols = df.select_dtypes(include=['float64', 'int64']).columns
for col in num_cols:
    median_val = df[col].median()
    df[col] = df[col].fillna(median_val)
    print(f"Filled missing in '{col}' with median: {median_val}")

# For categorical columns, fill missing values with mode
cat_cols = df.select_dtypes(include=['object', 'category']).columns
for col in cat_cols:
    mode_val = df[col].mode()[0]
    df[col] = df[col].fillna(mode_val)
    print(f"Filled missing in '{col}' with mode: {mode_val}")

# Step 3: Verify no missing values remain
print("\nMissing values after cleaning:")
print(df.isnull().sum())

print("\nCleaned DataFrame shape:", df.shape)


Initial DataFrame shape: (891, 15)

Missing values per column:
survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64
Filled missing in 'survived' with median: 0.0
Filled missing in 'pclass' with median: 3.0
Filled missing in 'age' with median: 28.0
Filled missing in 'sibsp' with median: 0.0
Filled missing in 'parch' with median: 0.0
Filled missing in 'fare' with median: 14.4542
Filled missing in 'sex' with mode: male
Filled missing in 'embarked' with mode: S
Filled missing in 'class' with mode: Third
Filled missing in 'who' with mode: man
Filled missing in 'deck' with mode: C
Filled missing in 'embark_town' with mode: Southampton
Filled missing in 'alive' with mode: no

Missing values after cleaning:
survived       0
pclass         0
sex         

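
The per-column fills in the cell above can also be written as a single `DataFrame.fillna` call that takes a dict mapping each column to its fill value, so no intermediate Series is modified. What follows is a minimal sketch on a small made-up frame, not the Titanic data itself. The `"Unknown"` label for `deck` is an assumed placeholder chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names echo the Titanic dataset, values are invented.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 30.0],
    "embarked": ["S", "C", None, "S"],
    "deck": pd.Categorical(["C", None, None, None]),
})

# Mostly-empty column: label the gaps explicitly instead of imputing the mode.
# "Unknown" is an assumed placeholder label, not something the dataset defines.
toy["deck"] = toy["deck"].cat.add_categories("Unknown").fillna("Unknown")

# Build fill values only for columns that still contain missing data.
fill_values = {}
for col in [c for c in toy.columns if toy[c].isna().any()]:
    if pd.api.types.is_numeric_dtype(toy[col]):
        fill_values[col] = toy[col].median()   # numeric: median
    else:
        fill_values[col] = toy[col].mode()[0]  # non-numeric: most frequent value

# One call on the frame itself; returns a new DataFrame and leaves `toy` unchanged.
cleaned = toy.fillna(fill_values)
print(fill_values)
print(cleaned.isna().sum())
```

The separate treatment of `deck` is a design choice rather than a rule. In the output above, `deck` is missing in 688 of 891 rows, so filling it with the mode would invent most of the column; an explicit placeholder category (or dropping the column) keeps that uncertainty visible.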