# Measuring Completeness

**Activity Overview**: Evaluate data completeness by checking missing data rates and handling partially available records.

## Title: Customer Profiles

**Task**: Calculate the missing data rate for customer profiles.

**Steps**:
1. List all required fields for a complete customer profile (e.g., name, address, email, phone number).
2. Analyze the dataset to count how many profiles have missing fields.
3. Calculate the percentage of missing data fields across all profiles.
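
The rate in step 3 can be written as a formula. The symbols are introduced here for illustration and do not come from the exercise: $M$ is the total number of missing required values, $N$ is the number of profiles, and $F$ is the number of required fields per profile.

$$
\text{missing data rate} = \frac{M}{N \times F} \times 100\%
$$

This is a field-level measure. It is not the same as the share of profiles with at least one missing field (step 2), which divides by $N$ alone.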

In [1]:
import pandas as pd

# Simulated dataset of customer profiles (None marks a missing value)
customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'name': ['Alice Smith', 'Bob Jones', None, 'Dana Lee'],
    'address': ['123 Apple St', None, '789 Banana Ave', '456 Orange Blvd'],
    'email': ['alice@example.com', None, 'carol@example.com', 'dana@example.com'],
    'phone_number': ['123-456-7890', '987-654-3210', None, None]
})

# Step 1: Define the required fields for a complete profile
required_fields = ['name', 'address', 'email', 'phone_number']

# Step 2: Count missing required fields per profile
customers['missing_fields_count'] = customers[required_fields].isnull().sum(axis=1)

# Step 3: Calculate the overall missing data rate across all required fields
total_fields = len(customers) * len(required_fields)
total_missing = customers[required_fields].isnull().sum().sum()
missing_percentage = (total_missing / total_fields) * 100

# Output results
print("Customer Profile Data:")
print(customers)

print(f"\nTotal required fields: {total_fields}")
print(f"Total missing fields: {total_missing}")
print(f"Missing data rate: {missing_percentage:.2f}%")

print("\nMissing data count per field:")
print(customers[required_fields].isnull().sum())

Customer Profile Data:
   customer_id         name          address              email  phone_number  \
0            1  Alice Smith     123 Apple St  alice@example.com  123-456-7890   
1            2    Bob Jones             None               None  987-654-3210   
2            3         None   789 Banana Ave  carol@example.com          None   
3            4     Dana Lee  456 Orange Blvd   dana@example.com          None   

   missing_fields_count  
0                     0  
1                     2  
2                     2  
3                     1  

Total required fields: 16
Total missing fields: 5
Missing data rate: 31.25%

Missing data count per field:
name            1
address         1
email           1
phone_number    2
dtype: int64
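
The cell above reports missing values per field, but step 2 asks how many profiles are affected, and the overview mentions handling partially available records. A minimal sketch of one way to do both is shown below, assuming a "partial" record is any profile with at least one missing required field. The helper name `split_by_completeness` is hypothetical and not part of the original exercise.

```python
import pandas as pd

# Same simulated data as the cell above, rebuilt so this sketch runs on its own.
customers = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'name': ['Alice Smith', 'Bob Jones', None, 'Dana Lee'],
    'address': ['123 Apple St', None, '789 Banana Ave', '456 Orange Blvd'],
    'email': ['alice@example.com', None, 'carol@example.com', 'dana@example.com'],
    'phone_number': ['123-456-7890', '987-654-3210', None, None],
})
required_fields = ['name', 'address', 'email', 'phone_number']


def split_by_completeness(df, fields):
    """Hypothetical helper: return (complete, partial) profiles for the given fields."""
    # True for rows where at least one required field is missing
    has_missing = df[fields].isna().any(axis=1)
    return df[~has_missing], df[has_missing]


complete_profiles, partial_profiles = split_by_completeness(customers, required_fields)

# Profile-level rate: share of profiles with at least one missing required field
incomplete_rate = len(partial_profiles) / len(customers) * 100

print(f"Profiles with missing fields: {len(partial_profiles)} of {len(customers)}")
print(f"Incomplete profile rate: {incomplete_rate:.2f}%")
print("Customer IDs needing follow-up:", partial_profiles['customer_id'].tolist())
```

Separating the partial records rather than dropping them keeps the usable fields available. Whether to fill in, flag, or exclude those profiles depends on the downstream use.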
