### Task 1: Detect Missing Values during Data Ingestion
**Description**: You have a CSV file with missing values in some columns. Write a Python script to detect and report missing values during the ingestion process.

**Steps**:
1. Load data
2. Check for missing values
3. Report missing values
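
The three steps above can be sketched against real CSV text rather than a hand-built dictionary. This is a minimal sketch, not a definitive implementation: `CSV_TEXT` and `missing_value_report` are hypothetical names introduced for illustration, and the in-memory string stands in for a file path you would normally pass to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
CSV_TEXT = """id,name,age,email
1,Alice,25,alice@example.com
2,,30,bob@example.com
3,Charlie,,charlie@example.com
4,David,40,
"""


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its missing count and percentage."""
    counts = df.isna().sum()
    percent = (counts / len(df) * 100).round(1)
    return pd.DataFrame({"missing": counts, "percent": percent})


df = pd.read_csv(io.StringIO(CSV_TEXT))  # step 1: load (empty fields become NaN)
report = missing_value_report(df)        # step 2: check
print(report)                            # step 3: report
```

Returning the report as a DataFrame instead of only printing it lets a later pipeline stage decide what to do, for example rejecting a file when any column's `percent` exceeds a threshold.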

In [1]:
# Write your code from here

import pandas as pd

# ------------------ Task 1: Detect Missing Values ------------------

# Simulate CSV data as DataFrame with missing values
data_csv = {
    "id": [1, 2, 3, 4, 5],
    "name": ["Alice", None, "Charlie", "David", "Eva"],
    "age": [25, 30, None, 40, 35],
    "email": ["alice@example.com", "bob@example.com", "charlie@example.com", None, "eva@example.com"]
}
df = pd.DataFrame(data_csv)

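def detect_missing_values(df):
    """Print the per-column and total count of missing (None/NaN) values in df."""
    missing_report = df.isnull().sum()  # number of missing cells in each column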
    total_missing = missing_report.sum()
    print("Missing Values Report:")
    print(missing_report)
    print(f"Total missing values: {total_missing}\n")
    if total_missing > 0:
        print("Warning: Dataset contains missing values!\n")
    else:
        print("No missing values detected.\n")

detect_missing_values(df)

# ------------------ Task 2: Validate Data Types ------------------

# Simulate JSON data as a Python list of dicts (like extracted from JSON)
data_json = [
    {"id": 1, "name": "Alice", "age": 25, "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "age": "30", "email": "bob@example.com"},  # age as string (invalid)
    {"id": 3, "name": "Charlie", "age": 22, "email": "charlie@example.com"},
    {"id": 4, "name": "David", "age": 40, "email": None}               # email is None (valid: the schema marks email as nullable)
]

# Expected schema: field -> type
expected_schema = {
    "id": int,
    "name": str,
    "age": int,
    "email": (str, type(None))  # email can be string or None (nullable)
}

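def validate_data_types(data, schema):
    """Print every field whose value does not match the type(s) given in schema."""
    print("Data Type Validation Report:")
    errors = []
    for i, record in enumerate(data):
        for field, expected_type in schema.items():
            # A missing field is read as None, so it fails unless the schema allows None.
            value = record.get(field)
            # Note: bool is a subclass of int, so True/False would pass an int check here.
            if not isinstance(value, expected_type):
                errors.append((i, field, value, type(value).__name__))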
    if errors:
        for err in errors:
            print(f"Row {err[0]} - Field '{err[1]}' has invalid type '{err[3]}', value: {err[2]}")
    else:
        print("All records conform to the expected schema.")
    print()

validate_data_types(data_json, expected_schema)

# ------------------ Task 3: Remove Duplicate Records ------------------

# Simulate dataset with duplicates
data_duplicates = {
    "id": [1, 2, 3, 2, 4, 5, 3],
    "name": ["Alice", "Bob", "Charlie", "Bob", "David", "Eva", "Charlie"],
    "age": [25, 30, 22, 30, 40, 35, 22],
    "email": [
        "alice@example.com", "bob@example.com", "charlie@example.com",
        "bob@example.com", "david@example.com", "eva@example.com", "charlie@example.com"
    ]
}
df_dup = pd.DataFrame(data_duplicates)

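def remove_duplicates(df):
    """Report fully duplicated rows and return df with repeats dropped (first occurrence kept)."""
    print("Duplicate Records Detection:")
    # keep=False flags every member of a duplicate group, first occurrences included,
    # so the count printed below is larger than the number of rows actually dropped.
    duplicate_rows = df[df.duplicated(keep=False)]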
    if not duplicate_rows.empty:
        print(duplicate_rows)
        print(f"Number of duplicate rows found: {duplicate_rows.shape[0]}")
        df_cleaned = df.drop_duplicates()
        print(f"Number of rows after removing duplicates: {df_cleaned.shape[0]}\n")
    else:
        print("No duplicates found.\n")
        df_cleaned = df
    return df_cleaned

df_clean = remove_duplicates(df_dup)

Missing Values Report:
id       0
name     1
age      1
email    1
dtype: int64
Total missing values: 3

Warning: Dataset contains missing values!

Data Type Validation Report:
Row 1 - Field 'age' has invalid type 'str', value: 30

Duplicate Records Detection:
   id     name  age                email
1   2      Bob   30      bob@example.com
2   3  Charlie   22  charlie@example.com
3   2      Bob   30      bob@example.com
6   3  Charlie   22  charlie@example.com
Number of duplicate rows found: 4
Number of rows after removing duplicates: 5



### Task 2: Validate Data Types during Extraction
**Description**: You have a JSON file that should have specific data types for each field. Write a script to validate if the data types match the expected schema.

**Steps**:
1. Define expected schema
2. Validate data types
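
The two steps above can be sketched on parsed JSON text as follows. This is a minimal sketch under stated assumptions: `JSON_TEXT`, `SCHEMA`, and `find_type_errors` are hypothetical names, the payload is made up for illustration, and two design choices are assumptions rather than requirements of the task: an absent key is reported as an error even for a nullable field, and booleans are rejected where an integer is expected.

```python
import json

# Hypothetical JSON payload standing in for a real file.
JSON_TEXT = """[
  {"id": 1, "name": "Alice", "age": 25, "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "age": "30", "email": null},
  {"id": true, "name": "Charlie", "age": 22}
]"""

# Step 1: expected schema, field -> allowed type(s).
SCHEMA = {
    "id": int,
    "name": str,
    "age": int,
    "email": (str, type(None)),  # nullable
}

_MISSING = object()  # sentinel so an absent key is not confused with null


def find_type_errors(records, schema):
    """Return (row_index, field, problem) tuples for every schema violation."""
    errors = []
    for i, record in enumerate(records):
        for field, expected in schema.items():
            value = record.get(field, _MISSING)
            if value is _MISSING:
                errors.append((i, field, "missing field"))
                continue
            allowed = expected if isinstance(expected, tuple) else (expected,)
            # bool is a subclass of int, so reject it unless the schema asks for bool.
            is_stray_bool = isinstance(value, bool) and bool not in allowed
            if is_stray_bool or not isinstance(value, allowed):
                errors.append((i, field, f"got {type(value).__name__}"))
    return errors


# Step 2: validate the parsed records against the schema.
errors = find_type_errors(json.loads(JSON_TEXT), SCHEMA)
for row, field, problem in errors:
    print(f"Row {row} - field '{field}': {problem}")
```

Collecting the errors in a list before printing keeps the check reusable: the same function can feed a log, a test assertion, or a decision to reject the file.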

In [None]:
# Write your code from here

### Task 3: Remove Duplicate Records in Data
**Description**: You have a dataset with duplicate entries. Write a Python script to find and remove duplicate records using Pandas.

**Steps**:
1. Find duplicate records
2. Remove duplicates
3. Report results
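
The steps above can be sketched as a small helper that also distinguishes exact duplicates from rows that merely share a key. This is a minimal sketch: `drop_duplicate_records` and the sample data are hypothetical, and treating `id` as the identifying key is an assumption made for illustration.

```python
import pandas as pd


def drop_duplicate_records(df, subset=None):
    """Return (cleaned_df, removed_count), keeping the first row of each duplicate group."""
    # keep="first" flags only the repeats, so the sum is the number of rows dropped.
    is_repeat = df.duplicated(subset=subset, keep="first")
    cleaned = df.loc[~is_repeat].reset_index(drop=True)
    return cleaned, int(is_repeat.sum())


df = pd.DataFrame({
    "id": [1, 2, 2, 3, 3],
    "name": ["Alice", "Bob", "Bob", "Charlie", "Charlie"],
    "email": ["alice@example.com", "bob@example.com", "bob@example.com",
              "charlie@example.com", "charlie@work.example.com"],
})

# Steps 1 and 2: find and remove duplicates, first on all columns, then on id only.
exact, removed_exact = drop_duplicate_records(df)
by_id, removed_by_id = drop_duplicate_records(df, subset=["id"])

# Step 3: report results.
print(f"Exact duplicates removed: {removed_exact}, rows left: {len(exact)}")
print(f"Duplicates by id removed: {removed_by_id}, rows left: {len(by_id)}")
```

Comparing on all columns only catches verbatim repeats; passing `subset` is useful when two rows describe the same entity but differ in a non-key field, as the two Charlie rows do here.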

In [None]:
# Write your code from here