## Check Accuracy & Completeness

**Objective**: Learn to assess data quality by checking for accuracy and completeness using Python.

For this exercise, you will use a sample dataset, `students.csv`, which contains the following
columns: `ID`, `Name`, `Age`, `Grade`, `Email`.
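
The worked solution in this section builds the table in memory instead of reading a file. If you do have a `students.csv` on disk, loading it might look like the minimal sketch below. The file contents written here are a hypothetical stand-in for the real dataset, and the temporary directory exists only so the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for students.csv so the sketch is self-contained.
sample_csv = (
    "ID,Name,Age,Grade,Email\n"
    "1,Alice,20,85.5,alice@example.com\n"
    "2,Bob,21,92.0,bob@example\n"
    "3,,,88.0,charlie@example.com\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "students.csv"
    csv_path.write_text(sample_csv)
    # By default, read_csv turns empty fields into NaN.
    students = pd.read_csv(csv_path)

print(students.columns.tolist())
print(students.isnull().sum())
```

With a real file you would skip the temporary directory and pass the file's path straight to `pd.read_csv`.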

**Steps**:
1. Check Accuracy
    - Verify Numerical Data Accuracy
    - Validate Email Format
    - Integer Accuracy Check for Age
2. Check Completeness
    - Identify Missing Values
    - Rows with Missing Data
    - Column Specific Missing Value Check
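
The steps above can also be sketched with vectorized pandas operations rather than row-by-row `apply` calls. This is one possible approach, not the only one. The sample rows and the variable names (`students`, `grade_ok`, `email_ok`, `age_ok`, `email_missing`) are assumptions made for illustration. Two behaviours differ from a strict type check: `pd.to_numeric` would accept a numeric string such as `"20"`, and blank emails are counted as missing here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows that mirror the exercise's columns.
students = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Age": [20, "twenty-two", np.nan, 19.5],
    "Grade": [85.5, 101, None, 70],
    "Email": ["alice@example.com", "bob@example", "", None],
})

# Accuracy: grades must lie in [0, 100]; NaN compares as False.
grade_ok = students["Grade"].between(0, 100)

# Accuracy: simple email shape check; missing values count as invalid.
email_ok = students["Email"].str.fullmatch(r"[\w.-]+@[\w.-]+\.\w+", na=False)

# Accuracy: age must be a whole number. Non-numeric text becomes NaN.
age_num = pd.to_numeric(students["Age"], errors="coerce")
age_ok = age_num.notna() & (age_num % 1 == 0)

# Completeness: per-column counts and the rows that contain any NaN/None.
missing_per_column = students.isnull().sum()
rows_with_missing = students[students.isnull().any(axis=1)]

# Completeness for one column, also treating blank strings as missing.
email_missing = students["Email"].isna() | students["Email"].str.strip().eq("")

print(pd.DataFrame({"grade_ok": grade_ok, "email_ok": email_ok, "age_ok": age_ok}))
print(missing_per_column)
print("Emails missing (including blanks):", int(email_missing.sum()))
```

Note that `isnull()` alone reports one missing email for these rows, while the blank-aware mask reports two.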

In [2]:
# Write your code from here

import pandas as pd
import numpy as np
import re

# Simulating students.csv data
data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', None, 'David', 'Eva'],
    'Age': [20, 21, np.nan, 19, 'twenty-two'],
    'Grade': [85.5, 92.0, 88.0, None, 101],
    'Email': ['alice@example.com', 'bob@example', 'charlie@example.com', '', None]
}

df = pd.DataFrame(data)

print("=== DATAFRAME PREVIEW ===")
print(df)

# ------------------------------
# ACCURACY CHECKS
# ------------------------------

# 1. Numerical Data Accuracy (Grade between 0-100)
df['grade_valid'] = df['Grade'].apply(lambda x: 0 <= x <= 100 if pd.notnull(x) else False)

# 2. Email Format Validation
email_pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
df['email_valid'] = df['Email'].apply(lambda x: bool(re.match(email_pattern, str(x))) if pd.notnull(x) else False)

# 3. Integer Accuracy Check for Age
df['age_valid'] = df['Age'].apply(lambda x: isinstance(x, (int, float)) and float(x).is_integer() if pd.notnull(x) else False)

# ------------------------------
# COMPLETENESS CHECKS
# ------------------------------

# 1. Identify Missing Values (per column)
missing_values = df.isnull().sum()

# 2. Rows with Missing Data
rows_with_missing = df[df.isnull().any(axis=1)]

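# 3. Column-specific Missing Value Check
# Same per-column counts as step 1; select one column with, e.g., missing_by_column['Email'].
# Note: isnull() does not flag the empty-string email in row 3.
missing_by_column = missing_values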

# ------------------------------
# OUTPUT
# ------------------------------

print("\n=== ACCURACY CHECKS ===")
print(df[['Age', 'age_valid']])
print(df[['Grade', 'grade_valid']])
print(df[['Email', 'email_valid']])

print("\n=== COMPLETENESS CHECKS ===")
print("Missing values per column:")
print(missing_by_column)

print("\nRows with missing data:")
print(rows_with_missing)

=== DATAFRAME PREVIEW ===
   ID   Name         Age  Grade                Email
0   1  Alice          20   85.5    alice@example.com
1   2    Bob          21   92.0          bob@example
2   3   None         NaN   88.0  charlie@example.com
3   4  David          19    NaN                     
4   5    Eva  twenty-two  101.0                 None

=== ACCURACY CHECKS ===
          Age  age_valid
0          20       True
1          21       True
2         NaN      False
3          19       True
4  twenty-two      False
   Grade  grade_valid
0   85.5         True
1   92.0         True
2   88.0         True
3    NaN        False
4  101.0        False
                 Email  email_valid
0    alice@example.com         True
1          bob@example        False
2  charlie@example.com         True
3                             False
4                 None        False

=== COMPLETENESS CHECKS ===
Missing values per column:
ID             0
Name           1
Age            1
Grade          1
Email    