### Task 1: Handling Schema Mismatches using Spark
**Description**: Use Apache Spark to address schema mismatches by transforming data to match
the expected schema.

**Steps**:
1. Create Spark session
2. Load dataframe
3. Define the expected schema
4. Handle schema mismatches
5. Show corrected data

In [None]:
# Write your code from here
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType

spark = SparkSession.builder.appName("SchemaMismatchHandler").getOrCreate()

# Without inferSchema, every CSV column is loaded as a string
df = spark.read.option("header", True).csv("data.csv")

expected_schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
    StructField("salary", DoubleType(), True)
])

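# Build the casts from expected_schema so the target types live in one place.
# A column missing from the input is added as nulls; input columns that are
# not in the schema are dropped.
df_corrected = df.selectExpr(*[
    f"cast({f.name if f.name in df.columns else 'null'} as {f.dataType.simpleString()}) as {f.name}"
    for f in expected_schema.fields
])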

df_corrected.show()
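
Before casting, it can help to see which columns actually differ from the expected schema. The sketch below is plain Python and only compares column names. `diff_columns` is a hypothetical helper, not part of PySpark, and the sample column lists are invented for illustration.

```python
from typing import Dict, List


def diff_columns(actual: List[str], expected: List[str]) -> Dict[str, List[str]]:
    """Compare two lists of column names and report the differences."""
    actual_set = set(actual)
    expected_set = set(expected)
    return {
        # Expected columns that the input does not provide
        "missing": [name for name in expected if name not in actual_set],
        # Input columns that the expected schema does not mention
        "unexpected": [name for name in actual if name not in expected_set],
    }


# Invented example: the input lacks "age" and carries an extra "dept" column
report = diff_columns(
    actual=["id", "name", "salary", "dept"],
    expected=["id", "name", "age", "salary"],
)
print(report)
```

With the Spark objects from this task, the call would presumably be `diff_columns(df.columns, expected_schema.fieldNames())`, run before building `df_corrected`.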


### Task 2: Detect and Correct Incomplete Data in ETL
**Description**: Use Python and pandas to detect incomplete data in an ETL process and fill
missing values with estimates.

**Steps**:
1. Detect incomplete data
2. Fill missing values
3. Report changes

In [None]:
# Write your code from here
import pandas as pd

df = pd.read_csv("data.csv")
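# Step 1: count missing values per column
incomplete = df.isnull().sum()

# Step 2: fill numeric columns with their column mean; missing values in
# non-numeric columns are left untouched
df_filled = df.fillna(df.mean(numeric_only=True))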

# Step 3: report how many missing values were filled in each column
changes = incomplete - df_filled.isnull().sum()
print("Missing values before filling:\n", incomplete)
print("\nValues filled per column:\n", changes)
print("\nData after filling missing values:\n", df_filled.head())
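
The mean-based fill above only touches numeric columns. As one possible variation, the sketch below fills numeric columns with the median and other columns with the most frequent value, and returns a per-column report of what changed. `fill_missing` is a hypothetical helper, the choice of median and mode is an assumption rather than part of the task, and the sample data is invented.

```python
from typing import Tuple

import pandas as pd


def fill_missing(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Fill numeric gaps with the median and other gaps with the mode."""
    filled = df.copy()
    rows = []
    for column in df.columns:
        missing = int(df[column].isna().sum())
        if missing == 0:
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(df[column]):
            strategy, value = "median", df[column].median()
        else:
            modes = df[column].mode(dropna=True)
            strategy = "mode"
            value = modes.iloc[0] if not modes.empty else None
        if pd.isna(value):
            continue  # column is entirely empty, so no estimate exists
        filled[column] = df[column].fillna(value)
        rows.append(
            {"column": column, "filled": missing, "strategy": strategy, "value": value}
        )
    report = pd.DataFrame(rows, columns=["column", "filled", "strategy", "value"])
    return filled, report


# Invented sample with one numeric gap and one text gap
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "age": [25.0, None, 35.0],
    "city": ["Pune", None, "Pune"],
})
filled, report = fill_missing(sample)
print(report)
print(filled)
```

The median is less sensitive to outliers than the mean, which is why it is used here; whether that suits a given ETL pipeline depends on the data.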
