## Check Uniqueness & Validity

**Objective**: Evaluate data quality by checking the uniqueness and validity of data entries.

For this activity, you will use a sample dataset, `students.csv`, that contains the following
columns: `ID`, `Name`, `Age`, `Grade`, `Email`.
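The contents of `students.csv` are not shown here, so the following is only a sketch of how the file might be loaded and its header checked before any quality checks run. The helper `load_students`, the constant `EXPECTED_COLUMNS`, and the two sample rows are assumptions introduced for illustration; only the column names come from the description above.

```python
import io

import pandas as pd

# Column names taken from the activity description.
EXPECTED_COLUMNS = ["ID", "Name", "Age", "Grade", "Email"]


def load_students(source):
    """Read a CSV (path or file-like object) and verify the expected columns exist.

    Hypothetical helper, not part of the original activity.
    """
    df = pd.read_csv(source)
    # Strip stray whitespace around header names, e.g. "ID ".
    df.columns = df.columns.str.strip()
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# In-memory stand-in for students.csv (made-up rows).
sample_csv = io.StringIO(
    "ID,Name,Age,Grade,Email\n"
    "1,Alice,20,85.5,alice@example.com\n"
    "2,Bob,21,92.0,bob@example.com\n"
)
students = load_students(sample_csv)
print(students.shape)
```

With a real file, the call would presumably be `load_students("students.csv")`; failing early on a missing column keeps later checks from raising a less helpful `KeyError`.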

**Steps**:
1. Check Uniqueness
    - Unique IDs
    - Unique Email Addresses
    - Unique Combination

2. Check Validity
    - Validate Age Range
    - Validate Grade Scale
    - Validate Name Format
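The steps above can be sketched with vectorized pandas operations. This is a minimal illustration under stated assumptions: the four rows are made up, and the 0 to 120 age range, the 0 to 100 grade scale, and the letters, spaces, and hyphens name pattern are rules assumed for the example rather than requirements given in the list.

```python
import pandas as pd

# Made-up rows for illustration only; row 2 repeats the ID, name, age, and email of row 1.
df = pd.DataFrame({
    "ID": [1, 2, 2, 4],
    "Name": ["Alice", "Bob", "Bob", "D4vid"],
    "Age": [20, 21, 21, 150],
    "Grade": [85.5, 92.0, 92.0, 101],
    "Email": ["alice@example.com", "bob@example.com",
              "bob@example.com", "david@example.com"],
})

# 1. Uniqueness: keep=False marks every member of a duplicate group,
#    so the offending rows can be inspected rather than just counted.
dup_ids = df[df["ID"].duplicated(keep=False)]
dup_emails = df[df["Email"].duplicated(keep=False)]
dup_combo = df[df.duplicated(subset=["Name", "Age"], keep=False)]

# 2. Validity: coerce to numbers first so non-numeric entries become NaN;
#    NaN never satisfies between(), so it is flagged as invalid.
age = pd.to_numeric(df["Age"], errors="coerce")
grade = pd.to_numeric(df["Grade"], errors="coerce")
age_ok = age.between(0, 120, inclusive="right")  # 0 < age <= 120 (assumed range)
grade_ok = grade.between(0, 100)                 # 0 <= grade <= 100 (assumed scale)
name_ok = df["Name"].str.fullmatch(r"[A-Za-z\s\-]+", na=False)

invalid = df[~(age_ok & grade_ok & name_ok)]
print(dup_ids.index.tolist())
print(invalid.index.tolist())
```

Returning the duplicated or invalid rows, rather than a single boolean, makes it easier to see which records need attention.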

In [1]:
# Write your code from here

import pandas as pd
import numpy as np
import re

# Sample data built inline (same columns as students.csv) so the cell runs without the file
data = {
    'ID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', None, 'David', 'Eva'],
    'Age': [20, 21, np.nan, 19, 'twenty-two'],
    'Grade': [85.5, 92.0, 88.0, None, 101],
    'Email': ['alice@example.com', 'bob@example', 'charlie@example.com', '', None]
}

df = pd.DataFrame(data)

print("=== DATA PREVIEW ===")
print(df)

# ---------------------------------------
# VALIDATION FUNCTIONS
# ---------------------------------------

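def is_valid_age(age):
    # Accept anything convertible to a number in the range (0, 120].
    try:
        age = float(age)
        return 0 < age <= 120  # NaN fails every comparison, so it is invalid
    except (TypeError, ValueError):  # None or non-numeric text such as 'twenty-two'
        return False

def is_valid_grade(grade):
    # Accept anything convertible to a number on the 0-100 scale.
    try:
        grade = float(grade)
        return 0 <= grade <= 100
    except (TypeError, ValueError):
        return False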

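def is_valid_name(name):
    # Letters, whitespace, and hyphens only; None and other non-strings are invalid.
    return isinstance(name, str) and bool(re.fullmatch(r'[A-Za-z\s\-]+', name))

def is_valid_email(email):
    if not isinstance(email, str) or not email:
        return False
    # Simple shape check (local@domain.tld), not full RFC validation.
    # fullmatch rejects a trailing newline, which '$' with re.match would allow.
    email_regex = r'[\w\.-]+@[\w\.-]+\.\w+'
    return re.fullmatch(email_regex, email) is not None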

# ---------------------------------------
# UNIQUENESS CHECKS
# ---------------------------------------
print("\n=== UNIQUENESS CHECKS ===")
print(f"Unique IDs: {df['ID'].is_unique}")
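# Ignore missing and blank emails so that several blanks are not reported as duplicates
emails = df['Email'].dropna()
emails = emails[emails.str.strip() != '']
print(f"Unique Emails: {emails.is_unique}")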
print(f"Unique (Name, Age) combinations: {df[['Name', 'Age']].duplicated().sum()} duplicates found")

# ---------------------------------------
# VALIDITY CHECKS
# ---------------------------------------
df['age_valid'] = df['Age'].apply(is_valid_age)
df['grade_valid'] = df['Grade'].apply(is_valid_grade)
df['name_valid'] = df['Name'].apply(is_valid_name)
df['email_valid'] = df['Email'].apply(is_valid_email)

print("\n=== VALIDITY CHECKS ===")
print(df[['Age', 'age_valid']])
print(df[['Grade', 'grade_valid']])
print(df[['Name', 'name_valid']])
print(df[['Email', 'email_valid']])

# ---------------------------------------
# INVALID ROWS SUMMARY
# ---------------------------------------
invalid_rows = df[~(df['age_valid'] & df['grade_valid'] & df['name_valid'] & df['email_valid'])]
print("\n=== INVALID DATA ROWS (Any Issue) ===")
print(invalid_rows)

=== DATA PREVIEW ===
   ID   Name         Age  Grade                Email
0   1  Alice          20   85.5    alice@example.com
1   2    Bob          21   92.0          bob@example
2   3   None         NaN   88.0  charlie@example.com
3   4  David          19    NaN                     
4   5    Eva  twenty-two  101.0                 None

=== UNIQUENESS CHECKS ===
Unique IDs: True
Unique Emails: True
Unique (Name, Age) combinations: 0 duplicates found

=== VALIDITY CHECKS ===
          Age  age_valid
0          20       True
1          21       True
2         NaN      False
3          19       True
4  twenty-two      False
   Grade  grade_valid
0   85.5         True
1   92.0         True
2   88.0         True
3    NaN        False
4  101.0        False
    Name  name_valid
0  Alice        True
1    Bob        True
2   None       False
3  David        True
4    Eva        True
                 Email  email_valid
0    alice@example.com         True
1          bob@example        False
2  c