## Detect Schema Mismatches in Data Pipelines
**Objective**: Identify and resolve schema mismatches that commonly occur in data pipelines.

**Task**: Column Name Mismatch

**Steps**:
1. Load the source DataFrame with the following schema:
    - id : Integer
    - name : String
    - age : Integer
2. Load the target DataFrame with the following schema:
    - id : Integer
    - fullname : String
    - age : Integer
3. Use a schema comparison tool or write a simple function to detect mismatches in column names.
4. Resolve the mismatch by renaming the `fullname` column in the target DataFrame to `name`.
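Steps 3 and 4 can also be wrapped in small reusable helpers. The sketch below is an illustration, not part of the original exercise: `find_column_mismatches` and `suggest_renames` are hypothetical names, and the rename suggestion uses the standard library's `difflib.get_close_matches` as a string-similarity heuristic. Any mapping it proposes should be reviewed before it is applied.

```python
import difflib

import pandas as pd


def find_column_mismatches(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Compare the column names of two DataFrames (hypothetical helper)."""
    src, tgt = set(source.columns), set(target.columns)
    return {
        "missing_in_target": sorted(src - tgt),
        "missing_in_source": sorted(tgt - src),
        "common": sorted(src & tgt),
    }


def suggest_renames(mismatches: dict, cutoff: float = 0.6) -> dict:
    """Propose {target_col: source_col} renames by string similarity (heuristic only)."""
    renames = {}
    candidates = list(mismatches["missing_in_source"])
    for src_col in mismatches["missing_in_target"]:
        match = difflib.get_close_matches(src_col, candidates, n=1, cutoff=cutoff)
        if match:
            renames[match[0]] = src_col
            candidates.remove(match[0])  # use each target column at most once
    return renames


source_df = pd.DataFrame({"id": [1], "name": ["Alice"], "age": [25]})
target_df = pd.DataFrame({"id": [1], "fullname": ["Alice Smith"], "age": [25]})

mismatches = find_column_mismatches(source_df, target_df)
print(mismatches)
renames = suggest_renames(mismatches)
print(renames)  # -> {'fullname': 'name'}
target_df = target_df.rename(columns=renames)
print(find_column_mismatches(source_df, target_df))
```

The `0.6` cutoff is `difflib`'s default; `name` and `fullname` score about 0.67, so the pair is suggested. A high score is not proof that two columns mean the same thing: here `fullname` holds values such as `Alice Smith` while the source `name` column holds `Alice`, so the rename aligns the schema but not necessarily the data.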

```python
import pandas as pd

# Step 1: Load the source DataFrame
source_data = {'id': [1, 2, 3],
               'name': ['Alice', 'Bob', 'Charlie'],
               'age': [25, 30, 28]}
source_df = pd.DataFrame(source_data)
print("Source DataFrame Schema:")
print(source_df.dtypes)
print("\nSource DataFrame:")
print(source_df)

# Step 2: Load the target DataFrame
target_data = {'id': [1, 2, 3],
               'fullname': ['Alice Smith', 'Bob Johnson', 'Charlie Brown'],
               'age': [25, 30, 28]}
target_df = pd.DataFrame(target_data)
print("\nTarget DataFrame Schema:")
print(target_df.dtypes)
print("\nTarget DataFrame:")
print(target_df)

# Step 3: Detect mismatches in column names using set differences
source_columns = set(source_df.columns)
target_columns = set(target_df.columns)

missing_in_target = source_columns - target_columns
missing_in_source = target_columns - source_columns
common_columns = source_columns.intersection(target_columns)

print("\nColumn Name Mismatches:")
if missing_in_target:
    print(f"Columns missing in target DataFrame: {missing_in_target}")
if missing_in_source:
    print(f"Columns missing in source DataFrame: {missing_in_source}")
if common_columns:
    print(f"Common columns: {common_columns}")

# Step 4: Resolve the mismatch by renaming the 'fullname' column in the target DataFrame to 'name'
if 'fullname' in target_df.columns:
    target_df = target_df.rename(columns={'fullname': 'name'})
    print("\nTarget DataFrame after renaming 'fullname' to 'name':")
    print(target_df)
    print("\nUpdated Target DataFrame Schema:")
    print(target_df.dtypes)
else:
    print("\n'fullname' column not found in the target DataFrame.")

# Verify that the renamed column now exists in both DataFrames
updated_target_columns = set(target_df.columns)
if 'name' in updated_target_columns and 'name' in source_columns:
    print("\n'name' column in target DataFrame now matches the source DataFrame.")
```

```
Source DataFrame Schema:
id       int64
name    object
age      int64
dtype: object

Source DataFrame:
   id     name  age
0   1    Alice   25
1   2      Bob   30
2   3  Charlie   28

Target DataFrame Schema:
id           int64
fullname    object
age          int64
dtype: object

Target DataFrame:
   id       fullname  age
0   1    Alice Smith   25
1   2    Bob Johnson   30
2   3  Charlie Brown   28

Column Name Mismatches:
Columns missing in target DataFrame: {'name'}
Columns missing in source DataFrame: {'fullname'}
Common columns: {'age', 'id'}

Target DataFrame after renaming 'fullname' to 'name':
   id           name  age
0   1    Alice Smith   25
1   2    Bob Johnson   30
2   3  Charlie Brown   28

Updated Target DataFrame Schema:
id       int64
name    object
age      int64
dtype: object

'name' column in target DataFrame now matches the source DataFrame.
```
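The check above compares only column names, but the schemas in the task also specify types (Integer, String). A minimal sketch of a type-aware comparison follows, assuming both inputs are pandas DataFrames. `compare_schemas` is a hypothetical helper, and the string-typed `age` column is an invented example of type drift, not data from the exercise.

```python
import pandas as pd


def compare_schemas(source: pd.DataFrame, target: pd.DataFrame) -> list:
    """Return human-readable schema differences (hypothetical helper)."""
    # Map column name -> dtype name, e.g. {'id': 'int64', ...}
    src = source.dtypes.astype(str).to_dict()
    tgt = target.dtypes.astype(str).to_dict()
    problems = []
    for col in sorted(src.keys() - tgt.keys()):
        problems.append(f"missing in target: {col} ({src[col]})")
    for col in sorted(tgt.keys() - src.keys()):
        problems.append(f"missing in source: {col} ({tgt[col]})")
    for col in sorted(src.keys() & tgt.keys()):
        if src[col] != tgt[col]:
            problems.append(f"type mismatch: {col} {src[col]} vs {tgt[col]}")
    return problems


source_df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"], "age": [25, 30]})
# Hypothetical drift: same column names, but 'age' arrives as strings.
target_df = pd.DataFrame({"id": [1, 2], "name": ["Alice", "Bob"], "age": ["25", "30"]})

problems = compare_schemas(source_df, target_df)
for problem in problems:
    print(problem)

# Resolve by casting to the source type, then re-check.
target_df["age"] = target_df["age"].astype("int64")
print(compare_schemas(source_df, target_df))
```

Comparing dtype names as strings keeps the report readable. The exact names reported for text columns can differ between pandas versions, so the sketch only relies on whether two columns' dtypes match.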
