## Detect Schema Mismatches in Data Pipelines
**Objective**: Identify and resolve schema mismatches that commonly occur in data pipelines.

**Task**: Column Name Mismatch

**Steps**:
1. Load the source DataFrame with the following schema:
    - id : Integer
    - name : String
    - age : Integer
2. Load the target DataFrame with the following schema:
    - id : Integer
    - fullname : String
    - age : Integer
3. Use a schema comparison tool or write a simple function to detect mismatches in column names.
4. Resolve the mismatch by renaming the `fullname` column in the target DataFrame to `name`.

In [1]:
import pandas as pd

# Step 1: Load source DataFrame with schema: id, name, age
source_df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35]
})

# Step 2: Load target DataFrame with schema: id, fullname, age
target_df = pd.DataFrame({
    'id': [1, 2, 3],
    'fullname': ['Alice A.', 'Bob B.', 'Charlie C.'],
    'age': [25, 30, 35]
})

# Step 3: Detect mismatches in column names
def detect_schema_mismatches(source: pd.DataFrame, target: pd.DataFrame):
    """Compare column names only; return (missing_in_target, extra_in_target) as sets."""
    source_cols = set(source.columns)
    target_cols = set(target.columns)

    missing_in_target = source_cols - target_cols
    extra_in_target = target_cols - source_cols

    return missing_in_target, extra_in_target

missing_cols, extra_cols = detect_schema_mismatches(source_df, target_df)
print("Columns missing in target:", missing_cols)
print("Extra columns in target:", extra_cols)

# Step 4: Resolve mismatch by renaming 'fullname' to 'name' in target DataFrame
if 'fullname' in target_df.columns:
    # Reassign instead of using inplace=True; the resulting DataFrame is the same
    target_df = target_df.rename(columns={'fullname': 'name'})

# Verify resolution
missing_cols_after, extra_cols_after = detect_schema_mismatches(source_df, target_df)
print("After rename:")
print("Columns missing in target:", missing_cols_after)
print("Extra columns in target:", extra_cols_after)


Columns missing in target: {'name'}
Extra columns in target: {'fullname'}
After rename:
Columns missing in target: set()
Extra columns in target: set()
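
The schemas in Steps 1 and 2 also list a type for each column, but the check above compares names only. A minimal sketch of how the same idea might extend to dtypes on the shared columns follows. The helper name `compare_schemas`, the report keys, and the sample data (where `age` arrives as strings) are all hypothetical and not part of the exercise.

```python
import pandas as pd

def compare_schemas(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Hypothetical helper: report column-name and dtype differences.

    Assumes column labels are strings so they can be sorted.
    """
    source_cols = set(source.columns)
    target_cols = set(target.columns)
    shared = source_cols & target_cols

    # For columns present on both sides, record any dtype disagreement
    dtype_mismatches = {
        col: (str(source[col].dtype), str(target[col].dtype))
        for col in sorted(shared)
        if source[col].dtype != target[col].dtype
    }

    return {
        "missing_in_target": sorted(source_cols - target_cols),
        "extra_in_target": sorted(target_cols - source_cols),
        "dtype_mismatches": dtype_mismatches,
    }

# Illustrative data (an assumption): 'age' is stored as strings in the target
source = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"], "age": [25, 30]})
target = pd.DataFrame({"id": [1, 2], "fullname": ["Alice A.", "Bob B."], "age": ["25", "30"]})

report = compare_schemas(source, target)
print("Columns missing in target:", report["missing_in_target"])
print("Extra columns in target:", report["extra_in_target"])
print("Dtype mismatches on shared columns:", report["dtype_mismatches"])
```

With this sample data the report should flag `name` as missing, `fullname` as extra, and `age` as a dtype mismatch. The exact dtype strings printed depend on the pandas version and platform.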
