## Detect Schema Mismatches in Data Pipelines
**Objective**: Identify and resolve schema mismatches that commonly occur in data pipelines.
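A missing column is only one kind of schema mismatch. A schema comparison can also flag columns that exist only in the target and shared columns whose dtypes disagree. The sketch below puts all three checks in one function. It assumes a hypothetical helper name, `schema_diff`, and compares dtypes by their string names, so `datetime64[ns]` and `datetime64[us]` count as different.

```python
import pandas as pd


def schema_diff(source: pd.DataFrame, target: pd.DataFrame) -> dict:
    """Compare two DataFrames' schemas (column names and dtypes).

    Hypothetical helper. Returns a dict with:
      - "missing_in_target": columns in source but not in target (source order)
      - "extra_in_target": columns in target but not in source (target order)
      - "dtype_mismatches": {column: (source dtype, target dtype)} for shared columns
    """
    missing = [c for c in source.columns if c not in target.columns]
    extra = [c for c in target.columns if c not in source.columns]
    # Compare dtype names as strings to avoid cross-type dtype equality quirks.
    mismatched = {
        c: (str(source[c].dtype), str(target[c].dtype))
        for c in source.columns
        if c in target.columns and str(source[c].dtype) != str(target[c].dtype)
    }
    return {
        "missing_in_target": missing,
        "extra_in_target": extra,
        "dtype_mismatches": mismatched,
    }


# Example: the target lacks signup_date, stores id as strings,
# and carries an extra column that the source does not have.
source = pd.DataFrame({
    "id": [1, 2],
    "email": ["alice@example.com", "bob@example.com"],
    "signup_date": pd.to_datetime(["2023-01-15", "2023-02-20"]),
})
target = pd.DataFrame({
    "id": ["1", "2"],
    "email": ["alice@example.com", "bob@example.com"],
    "country": ["US", "CA"],
})
print(schema_diff(source, target))
```

Returning lists instead of sets keeps the report in a stable, column-ordered form. This is convenient when the result is logged or compared across pipeline runs.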

**Task**: Missing Column

1. Load the source DataFrame with the following schema:
    - id : Integer
    - email : String
    - signup_date : Date (held in pandas as a `datetime64[ns]` column)
2. Load the target DataFrame with the following schema:
    - id : Integer
    - email : String
3. Implement a check to identify any columns that are present in the source DataFrame but missing in the target.
4. Add the missing `signup_date` column to the target DataFrame.

```python
import pandas as pd

# Step 1: Load the source DataFrame
source_data = {'id': [1, 2, 3],
               'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com'],
               'signup_date': pd.to_datetime(['2023-01-15', '2023-02-20', '2023-03-10'])}
source_df = pd.DataFrame(source_data)
print("Source DataFrame Schema:")
print(source_df.dtypes)
print("\nSource DataFrame:")
print(source_df)

# Step 2: Load the target DataFrame
target_data = {'id': [1, 2, 3],
               'email': ['alice@example.com', 'bob@example.com', 'charlie@example.com']}
target_df = pd.DataFrame(target_data)
print("\nTarget DataFrame Schema:")
print(target_df.dtypes)
print("\nTarget DataFrame:")
print(target_df)

# Step 3: Identify columns present in the source but missing in the target
source_columns = set(source_df.columns)
target_columns = set(target_df.columns)

missing_in_target = source_columns - target_columns

print("\nMissing Columns in Target DataFrame (compared to Source):")
if missing_in_target:
    print(missing_in_target)
else:
    print("No columns are missing in the target DataFrame (compared to the source).")

# Step 4: Add the missing 'signup_date' column to the target DataFrame
column_to_add = 'signup_date'

if column_to_add in missing_in_target:
    # Fill the new column with NaT (pandas' missing-value marker for datetimes)
    # so that it gets a datetime64 dtype, matching the source column's type.
    target_df[column_to_add] = pd.NaT
    print(f"\nAdded the '{column_to_add}' column to the target DataFrame.")
    print("\nUpdated Target DataFrame Schema:")
    print(target_df.dtypes)
    print("\nUpdated Target DataFrame:")
    print(target_df)
else:
    print(f"\nThe '{column_to_add}' column was not missing in the target DataFrame.")
```

```text
Source DataFrame Schema:
id                      int64
email                  object
signup_date    datetime64[ns]
dtype: object

Source DataFrame:
   id                email signup_date
0   1    alice@example.com  2023-01-15
1   2      bob@example.com  2023-02-20
2   3  charlie@example.com  2023-03-10

Target DataFrame Schema:
id        int64
email    object
dtype: object

Target DataFrame:
   id                email
0   1    alice@example.com
1   2      bob@example.com
2   3  charlie@example.com

Missing Columns in Target DataFrame (compared to Source):
{'signup_date'}

Added the 'signup_date' column to the target DataFrame.

Updated Target DataFrame Schema:
id                      int64
email                  object
signup_date    datetime64[ns]
dtype: object

Updated Target DataFrame:
   id                email signup_date
0   1    alice@example.com         NaT
1   2      bob@example.com         NaT
2   3  charlie@example.com         NaT
```
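Step 4 above hard-codes both the column name and the fill value `pd.NaT`. The sketch below generalizes it to any number of missing columns. It uses a hypothetical helper, `add_missing_columns`, which copies each missing column's dtype from the source and reorders columns to follow the source. One assumption to note: integer or boolean source columns cannot hold missing values, so pandas upcasts them (for example, `int64` becomes `float64`). A nullable dtype such as `Int64` would be needed to keep them integral.

```python
import pandas as pd


def add_missing_columns(source: pd.DataFrame, target: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `target` in which every column that
    exists only in `source` is added as an all-missing column, typed like the
    source column. Columns follow the source order; target-only columns go last."""
    result = target.copy()
    for col in source.columns:
        if col not in result.columns:
            # Reindexing an empty slice of the source column onto the target's rows
            # gives all-missing values (NaT for datetimes, NaN for object/float)
            # while keeping the source dtype where it can represent missing values.
            result[col] = source[col].iloc[:0].reindex(result.index)
    ordered = list(source.columns) + [c for c in result.columns if c not in source.columns]
    return result[ordered]


source_df = pd.DataFrame({
    "id": [1, 2, 3],
    "email": ["alice@example.com", "bob@example.com", "charlie@example.com"],
    "signup_date": pd.to_datetime(["2023-01-15", "2023-02-20", "2023-03-10"]),
})
target_df = pd.DataFrame({
    "id": [1, 2, 3],
    "email": ["alice@example.com", "bob@example.com", "charlie@example.com"],
})

aligned = add_missing_columns(source_df, target_df)
print(aligned.dtypes)
print(aligned)
```

Returning a copy rather than mutating `target_df` in place keeps the original target available for comparison. That helps when a pipeline needs to log what was changed.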
