# Measuring Completeness

**Activity Overview**: Evaluate data completeness by checking missing data rates and handling partially available records.

## Customer Profiles

**Task**: Calculate the missing data rate for customer profiles.

**Steps**:
1. List all required fields for a complete customer profile (e.g., name, address, email, phone number).
2. Analyze the dataset to count how many profiles are missing at least one required field.
3. Calculate the percentage of required field values that are missing across all profiles.
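To check steps 2 and 3 without a CSV file, here is a minimal sketch that runs both on a small in-memory DataFrame. The sample rows are made up for illustration and are not from any real dataset. Only `None`/`NaN` count as missing here.

```python
import pandas as pd

# Hypothetical sample profiles for illustration; None marks a missing value.
profiles = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "address": ["1 Main St", None, None, "4 Oak Ave"],
    "email": ["ana@example.com", "ben@example.com", "cy@example.com", None],
    "phone_number": ["555-0101", "555-0102", None, "555-0104"],
})
required_fields = ["name", "address", "email", "phone_number"]

# Boolean mask: True wherever a required value is missing.
missing = profiles[required_fields].isna()

# Step 2: a profile is incomplete if any of its required fields is missing.
incomplete_profiles = int(missing.any(axis=1).sum())

# Step 3: missing cells divided by expected cells (profiles x required fields).
missing_rate = missing.to_numpy().sum() / missing.size * 100

print(f"Incomplete profiles: {incomplete_profiles} of {len(profiles)}")  # Incomplete profiles: 3 of 4
print(f"Missing data rate: {missing_rate:.2f}%")  # Missing data rate: 31.25%
```

The two numbers answer different questions. Five of the 16 expected values are missing (31.25%), yet three of the four profiles (75%) are incomplete. Reporting both keeps a few sparse records from hiding behind a low overall rate.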

```python
import pandas as pd

print("\n--- Calculate Missing Data Rate for Customer Profiles ---")

# --- Configuration ---
DATASET_PATH = 'customer_profiles.csv'  # Path to your customer profile dataset
REQUIRED_FIELDS = ['name', 'address', 'email', 'phone_number']  # Fields a complete profile must have

try:
    # Load the dataset
    customer_profiles = pd.read_csv(DATASET_PATH)
    total_profiles = len(customer_profiles)

    # Build a boolean mask that is True wherever a required value is missing.
    # Empty or whitespace-only strings are treated as missing, just like NaN.
    missing_mask = pd.DataFrame(index=customer_profiles.index)
    for field in REQUIRED_FIELDS:
        if field in customer_profiles.columns:
            column = customer_profiles[field]
            missing_mask[field] = column.isna() | column.astype(str).str.strip().eq('')
        else:
            # An absent required column means the value is missing for every profile
            print(f"Warning: Required field '{field}' not found in the dataset.")
            missing_mask[field] = True

    # Step 2: number of profiles missing at least one required field
    incomplete_profiles = int(missing_mask.any(axis=1).sum())

    # Step 3: share of all expected required values that are missing
    missing_counts = missing_mask.sum()
    total_missing_fields = int(missing_counts.sum())
    total_expected_fields = total_profiles * len(REQUIRED_FIELDS)

    if total_expected_fields > 0:
        overall_missing_percentage = total_missing_fields / total_expected_fields * 100
        print(f"\nTotal number of customer profiles: {total_profiles}")
        print(f"Profiles with at least one missing field: {incomplete_profiles}")
        print("Missing counts per required field:")
        for field, count in missing_counts.items():
            print(f"- {field}: {count}")
        print(f"\nOverall percentage of missing data across all required fields: {overall_missing_percentage:.2f}%")
    else:
        print("No required fields specified or dataset is empty.")

except FileNotFoundError:
    print(f"Error: The CSV file '{DATASET_PATH}' was not found.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
```


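The activity overview also mentions handling partially available records. The sketch below shows one common approach, but it is an assumption and not the method prescribed here. It scores each profile by the fraction of required fields present, records which fields are missing, and splits profiles into complete, usable-partial, and rejected groups. The sample data and the `MIN_COMPLETENESS` threshold of 0.75 are hypothetical.

```python
import pandas as pd

# Hypothetical sample profiles for illustration; None marks a missing value.
profiles = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "address": ["1 Main St", None, None, "4 Oak Ave"],
    "email": ["ana@example.com", "ben@example.com", "cy@example.com", None],
    "phone_number": ["555-0101", "555-0102", None, "555-0104"],
})
REQUIRED_FIELDS = ["name", "address", "email", "phone_number"]
MIN_COMPLETENESS = 0.75  # Hypothetical threshold: keep partial records with >= 75% of fields

# Fraction of required fields present in each profile (1.0 = fully complete).
present = profiles[REQUIRED_FIELDS].notna()
profiles["completeness"] = present.mean(axis=1)

# List the missing fields per profile so they can be followed up or enriched later.
profiles["missing_fields"] = [
    [field for field in REQUIRED_FIELDS if not row_present[field]]
    for _, row_present in present.iterrows()
]

# Split records by how complete they are.
complete = profiles[profiles["completeness"] == 1.0]
usable_partial = profiles[
    (profiles["completeness"] < 1.0) & (profiles["completeness"] >= MIN_COMPLETENESS)
]
rejected = profiles[profiles["completeness"] < MIN_COMPLETENESS]

print(profiles[["name", "completeness", "missing_fields"]])
print(f"complete={len(complete)}, usable partial={len(usable_partial)}, rejected={len(rejected)}")
# complete=1, usable partial=2, rejected=1
```

Keeping the partial records with an explicit `missing_fields` list lets later steps decide how to handle the gaps, for example by requesting the data again or excluding the profile from a campaign that needs email. Dropping the records silently would make the gaps invisible.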
