### Task 1: Detecting Missing Values during Data Ingestion
**Description**: You have a CSV file with missing values in some columns. Write a Python script to detect and report missing values during the ingestion process.

**Steps**:
1. Load data
2. Check for missing values
3. Report missing values

In [None]:
# Write your code from here
import pandas as pd

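def detect_missing_values(csv_path):
    """Load a CSV file and report the number of missing values per column."""
    # Step 1: load the data; only the read itself is guarded
    try:
        df = pd.read_csv(csv_path)
    except Exception as e:
        print(f"Error reading CSV: {e}")
        return

    # Step 2: count NaN/None entries in each column
    missing_summary = df.isnull().sum()
    total_missing = missing_summary.sum()

    # Step 3: report only the columns that contain missing values
    if total_missing == 0:
        print("No missing values found in the dataset.")
    else:
        print("Missing Values Detected:")
        print(missing_summary[missing_summary > 0])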

# Example usage
csv_file = "data.csv"  # Replace with your actual file path
detect_missing_values(csv_file)
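
To try the three steps without a file on disk, the following is a minimal sketch that reads a small in-memory CSV and reports both the count and the percentage of missing values per column. The sample rows and the names `sample_csv` and `missing_report` are assumptions made up for illustration; they are not part of the task data.

```python
import io

import pandas as pd

# Hypothetical sample data: two blank cells stand in for missing values
sample_csv = io.StringIO(
    "id,name,age\n"
    "1,Alice,30\n"
    "2,,25\n"
    "3,Carol,\n"
)

df = pd.read_csv(sample_csv)

# Build a per-column report: count and percentage of missing entries
missing_report = pd.DataFrame({
    "missing_count": df.isnull().sum(),
    "missing_pct": (df.isnull().mean() * 100).round(2),
})

# Keep only the columns that actually contain missing values
missing_report = missing_report[missing_report["missing_count"] > 0]
print(missing_report)
```

Reporting a percentage alongside the raw count can make it easier to judge whether a column is still usable, since one missing value means something different in 3 rows than in 3 million.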


### Task 2: Validate Data Types during Extraction
**Description**: You have a JSON file whose records should have a specific data type for each field. Write a Python script to check whether each field's data type matches the expected schema.

**Steps**:
1. Define expected schema
2. Validate data types

In [None]:
# Write your code from here
import json

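def validate_data_types(json_path, expected_schema):
    """Check every record in a JSON file against the expected field types."""
    try:
        with open(json_path, 'r') as file:
            data = json.load(file)
    except (OSError, ValueError) as e:  # JSONDecodeError is a ValueError subclass
        print(f"Error processing JSON file: {e}")
        return

    # Accept a single top-level object as well as a list of records
    records = data if isinstance(data, list) else [data]

    for record in records:
        if not isinstance(record, dict):
            print(f"Record is not a JSON object: {record}")
            continue
        for field, expected_type in expected_schema.items():
            if field not in record:
                print(f"Missing field '{field}' in record: {record}")
                continue
            value = record[field]
            # bool is a subclass of int, so reject True/False for int fields
            bool_as_int = expected_type is int and isinstance(value, bool)
            if bool_as_int or not isinstance(value, expected_type):
                print(f"Invalid type for field '{field}' in record: {record}")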

# Example usage
schema = {
    "id": int,
    "name": str,
    "age": int,
    "email": str
}

json_file = "data.json"  # Replace with your actual file path
validate_data_types(json_file, schema)
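
As a variation that is easier to test, the sketch below collects problems in a list instead of printing them, and runs against an in-memory JSON string. The helper `find_schema_errors`, the three-field schema, and the sample records are hypothetical and exist only for this illustration.

```python
import json

# Hypothetical schema and records, used only for illustration
expected_schema = {"id": int, "name": str, "age": int}

raw_json = """
[
    {"id": 1, "name": "Alice", "age": 30},
    {"id": "2", "name": "Bob", "age": 25},
    {"id": 3, "name": "Carol"},
    {"id": 4, "name": "Dan", "age": true}
]
"""


def find_schema_errors(records, schema):
    """Return a list of (record_index, field, problem) tuples."""
    errors = []
    for index, record in enumerate(records):
        for field, expected_type in schema.items():
            if field not in record:
                errors.append((index, field, "missing"))
                continue
            value = record[field]
            # bool is a subclass of int, so treat True/False as invalid for int fields
            wrong_bool = expected_type is int and isinstance(value, bool)
            if wrong_bool or not isinstance(value, expected_type):
                problem = (
                    f"expected {expected_type.__name__}, "
                    f"got {type(value).__name__}"
                )
                errors.append((index, field, problem))
    return errors


errors = find_schema_errors(json.loads(raw_json), expected_schema)
for index, field, problem in errors:
    print(f"Record {index}: field '{field}' {problem}")
```

Returning the errors rather than printing them lets the caller decide what to do next, for example rejecting the whole file or only skipping the invalid records.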


### Task 3: Remove Duplicate Records in Data
**Description**: You have a dataset with duplicate entries. Write a Python script to find and remove duplicate records using Pandas.

**Steps**:
1. Find duplicate records
2. Remove duplicates
3. Report results

In [None]:
# Write your code from here
import pandas as pd

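def remove_duplicates(csv_path):
    """Drop fully duplicated rows from a CSV file and report the counts."""
    try:
        df = pd.read_csv(csv_path)
    except Exception as e:
        print(f"Error: {e}")
        return None

    initial_count = len(df)
    # Step 1: duplicated() marks each repeat of an earlier, identical row
    duplicates_removed = int(df.duplicated().sum())
    # Step 2: keep the first occurrence of every row
    df_cleaned = df.drop_duplicates()
    final_count = len(df_cleaned)

    # Step 3: report the results
    print(f"Total Records: {initial_count}")
    print(f"Duplicates Removed: {duplicates_removed}")
    print(f"Final Records: {final_count}")
    return df_cleaned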

# Example usage
cleaned_df = remove_duplicates("data.csv")
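
The "find" step can also be made explicit before anything is dropped. The sketch below, built on a small hypothetical DataFrame, lists every row that belongs to a duplicate group and then deduplicates on a single key column. The column names and values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical; rows 1 and 3 share an email
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Robert"],
    "email": ["a@example.com", "b@example.com", "a@example.com", "b@example.com"],
})

# keep=False flags every member of a duplicate group, not just the repeats
exact_duplicates = df[df.duplicated(keep=False)]
print(exact_duplicates)

# subset limits the comparison to the listed columns; the first occurrence is kept
deduped_by_email = df.drop_duplicates(subset=["email"], keep="first")
print(deduped_by_email)
```

Whether to compare whole rows or only a key column such as `email` depends on what counts as "the same record" in the dataset, so the `subset` choice here is an example and not a recommendation.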
