**Task 1**: Checking Null Values for Completeness

**Description**: Verify if there are any null values in a dataset, which indicate incomplete data.

**Task 9**: Custom Completeness Rule Violation Report

**Description**: Create a report showing which rows violate specific completeness rules, such as mandatory fields being empty.

In [1]:
# Write your code from here
import pandas as pd
import numpy as np
import re

# Sample dataset
data = {
    'id': [1, 2, 3, 4, 4],
    'name': ['Alice', 'Bob', None, 'David', 'David'],
    'email': ['alice@example.com', 'bob[at]example.com', 'charlie@example.com', '', 'david@example'],
    'age': [25, -1, 200, 35, np.nan],
    'score': [90, 85, 1000, 70, 75]
}

df = pd.DataFrame(data)

# Task 1: Checking Null Values for Completeness
null_values = df.isnull().sum()

# Task 2: Checking Data Type Validity
data_type_validity = df.dtypes

# Task 3: Verify Uniqueness of Identifiers
id_unique = df['id'].is_unique
email_unique = df['email'].nunique() == len(df)

# Task 4: Validate Email Format Using Regex
email_regex = r'^[\w\.-]+@[\w\.-]+\.\w+$'
df['email_valid'] = df['email'].apply(lambda x: bool(re.match(email_regex, str(x))))

# Task 5: Check for Logical Age Validity (0-120); missing ages evaluate to False
df['age_valid'] = df['age'].between(0, 120)

# Task 6: Identify and Handle Missing Data (Impute with mean for 'age')
df['age'] = df['age'].fillna(df['age'].mean())

# Task 7: Detect Duplicates
duplicates = df.duplicated()

# Task 8: Validate Correctness of Numerical Values (score 0-100)
df['score_valid'] = df['score'].between(0, 100)

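# Task 9: Custom Completeness Rule Violation Report (e.g., mandatory: name, email)
mandatory_fields = ['name', 'email']
# Treat empty or whitespace-only strings as missing, in addition to None/NaN
missing_mask = df[mandatory_fields].replace(r'^\s*$', np.nan, regex=True).isnull()
violation_report = df[missing_mask.any(axis=1)]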

# Task 10: Advanced Regex for Data Validity Check (e.g., name: only alphabets)
name_regex = r'^[A-Za-z]+$'
df['name_valid'] = df['name'].apply(lambda x: bool(re.match(name_regex, str(x))) if pd.notnull(x) else False)

# --- Output Section ---
print("=== Task 1: Null Values ===")
print(null_values)

print("\n=== Task 2: Data Type Validity ===")
print(data_type_validity)

print("\n=== Task 3: Unique Identifiers ===")
print(f"ID Unique: {id_unique}, Email Unique: {email_unique}")

print("\n=== Task 4: Email Format Validity ===")
print(df[['email', 'email_valid']])

print("\n=== Task 5: Logical Age Validity ===")
print(df[['age', 'age_valid']])

print("\n=== Task 6: Age After Imputation ===")
print(df['age'])

print("\n=== Task 7: Duplicate Rows ===")
print(df[duplicates])

print("\n=== Task 8: Score Validity ===")
print(df[['score', 'score_valid']])

print("\n=== Task 9: Completeness Rule Violations ===")
print(violation_report)

print("\n=== Task 10: Name Validity with Advanced Regex ===")
print(df[['name', 'name_valid']])

=== Task 1: Null Values ===
id       0
name     1
email    0
age      1
score    0
dtype: int64

=== Task 2: Data Type Validity ===
id         int64
name      object
email     object
age      float64
score      int64
dtype: object

=== Task 3: Unique Identifiers ===
ID Unique: False, Email Unique: True

=== Task 4: Email Format Validity ===
                 email  email_valid
0    alice@example.com         True
1   bob[at]example.com        False
2  charlie@example.com         True
3                             False
4        david@example        False

=== Task 5: Logical Age Validity ===
      age  age_valid
0   25.00       True
1   -1.00      False
2  200.00      False
3   35.00       True
4   64.75      False

=== Task 6: Age After Imputation ===
0     25.00
1     -1.00
2    200.00
3     35.00
4     64.75
Name: age, dtype: float64

=== Task 7: Duplicate Rows ===
Empty DataFrame
Columns: [id, name, email, age, score, email_valid, age_valid, score_valid, name_valid]
Index: []

=== Ta

**Task 2**: Checking Data Type Validity

**Description**: Ensure that columns contain data of expected types, e.g., ages are integers.

In [None]:
# Write your code from here
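
One possible way to approach this task is to compare each column's dtype against an expected kind. The sketch below is only an illustration: the sample data and the `expected_checks` mapping are assumptions, not part of the exercise. It also shows that a column with missing values is stored as float, and that pandas' nullable `Int64` dtype can hold whole numbers alongside missing entries.

```python
import pandas as pd

# Hypothetical sample data; 'age' is stored as float because of the missing value
df = pd.DataFrame({
    "id": [1, 2, 3],
    "age": [25.0, 31.0, None],
})

# Assumed expectations for this sketch: both columns should be integers
expected_checks = {
    "id": pd.api.types.is_integer_dtype,
    "age": pd.api.types.is_integer_dtype,
}

# True where the column's dtype matches the expectation
type_report = {col: bool(check(df[col])) for col, check in expected_checks.items()}
print(type_report)

# The nullable integer dtype enforces whole numbers while keeping the missing value
df["age"] = df["age"].astype("Int64")
print(df.dtypes)
```

The conversion to `Int64` only succeeds when every non-missing value is a whole number, so it doubles as a validity check.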

**Task 3**: Verify Uniqueness of Identifiers

**Description**: Check if a dataset has unique identifiers (e.g., emails).

In [None]:
# Write your code from here
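
A minimal sketch of a uniqueness check follows, using made-up sample data. It assumes that emails should be compared case-insensitively and without surrounding whitespace, and that missing emails should not count as duplicates of each other; both are choices for this illustration rather than requirements stated in the task.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 4],
    "email": ["alice@example.com", "bob@example.com", "Alice@Example.com ", None, None],
})

# Identifier check: is every id distinct, and which rows share an id?
id_unique = df["id"].is_unique
repeated_ids = df[df["id"].duplicated(keep=False)]

# Normalize emails before comparing (assumption: case and padding are not significant)
emails = df["email"].str.strip().str.lower()
present = emails.dropna()  # missing emails are a completeness issue, not a uniqueness one
email_unique = present.is_unique
repeated_emails = present[present.duplicated(keep=False)]

print(f"ID unique: {id_unique}, email unique: {email_unique}")
print(repeated_ids)
print(repeated_emails)
```

Passing `keep=False` to `duplicated` marks every member of a repeated group, which makes the offending rows easier to inspect than the default of flagging only the later occurrences.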

**Task 4**: Validate Email Format Using Regex

**Description**: Validate if email addresses in a dataset have the correct format.

In [None]:
# Write your code from here
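
The following sketch shows one way to validate email format with a compiled pattern and `fullmatch`. The pattern is a deliberately simplified assumption (it is far looser than the full email address grammar), and the helper name `is_valid_email` is hypothetical.

```python
import re
import pandas as pd

emails = pd.Series(
    ["alice@example.com", "bob[at]example.com", "", "david@example", None]
)

# Simplified pattern (an assumption for this sketch):
# local part, '@', then a domain with at least one dot-separated label
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")

def is_valid_email(value):
    """Return True only for strings that match the whole pattern."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None

email_valid = emails.apply(is_valid_email)
print(pd.DataFrame({"email": emails, "email_valid": email_valid}))
```

Using `fullmatch` avoids relying on `^` and `$` anchors, and the `isinstance` guard means missing values are reported as invalid instead of being converted to the string `'None'` first.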

**Task 5**: Check for Logical Age Validity

**Description**: Ensure ages are within a reasonable human range (e.g., 0-120).

In [None]:
# Write your code from here
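
As a sketch of the range check, the example below uses `Series.between`, which is inclusive on both ends by default and evaluates to `False` for missing values. Separating "missing" from "out of range" is an assumption made here for readability; the labels are illustrative.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, -1, 200, 35, np.nan], name="age")

in_range = ages.between(0, 120)  # NaN compares as False
missing = ages.isna()

# Label each row; the first matching condition wins
status = pd.Series(
    np.select([missing, in_range], ["missing", "valid"], default="out_of_range"),
    index=ages.index,
    name="age_status",
)
print(pd.concat([ages, status], axis=1))
```

Keeping a separate label for missing ages makes it clear whether a row failed because the value is implausible or because it is absent.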

**Task 6**: Identify and Handle Missing Data

**Description**: Identify missing values in a dataset and impute them using a simple strategy (e.g., mean).

In [None]:
# Write your code from here
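
A plain mean over the sample ages used earlier gives 64.75, because the implausible values -1 and 200 pull it upward. The sketch below shows one possible variant that computes the mean from plausible ages only; the 0-120 bounds and the extra column names are assumptions for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, -1, 200, 35, np.nan]})

# Step 1: identify missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Step 2: compute the mean from plausible ages only (assumed range 0-120)
plausible = df["age"].where(df["age"].between(0, 120))
fill_value = plausible.mean()

# Step 3: record which rows were imputed, then fill into a new column
df["age_was_missing"] = df["age"].isna()
df["age_imputed"] = df["age"].fillna(fill_value)
print(df)
```

Writing the result to a new column and keeping an indicator flag preserves the original data, so later checks can still tell observed values from imputed ones.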

**Task 7**: Detect Duplicates

**Description**: Detect duplicate rows in the dataset.

In [None]:
# Write your code from here
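
The sketch below contrasts exact duplicate rows with rows that repeat only on selected key columns. The sample data and the choice of `id` and `name` as the key are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: row 4 repeats row 1 exactly;
# rows 2 and 3 share id and name but differ in email
df = pd.DataFrame({
    "id": [1, 2, 4, 4, 2],
    "name": ["Alice", "Bob", "David", "David", "Bob"],
    "email": ["alice@example.com", "bob@example.com", "", "david@example", "bob@example.com"],
})

# Exact duplicates: every column matches an earlier row
exact_dupes = df[df.duplicated()]

# Key-based duplicates: all rows whose (id, name) pair occurs more than once
key_dupes = df[df.duplicated(subset=["id", "name"], keep=False)]

# Drop exact duplicates, keeping the first occurrence
deduplicated = df.drop_duplicates()

print(exact_dupes)
print(key_dupes)
print(len(deduplicated))
```

A full-row comparison can miss records that describe the same entity with slightly different values, which is why a key-based check is often run alongside it.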

**Task 8**: Validate Correctness of Numerical Values

**Description**: Ensure numerical columns are within a specified range.

In [None]:
# Write your code from here
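
One way to check several numerical columns at once is to drive the validation from a mapping of column names to bounds. In this sketch the `valid_ranges` mapping and its bounds are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, -1, 200, 35],
    "score": [90, 85, 1000, 70],
})

# Assumed inclusive bounds per column
valid_ranges = {"age": (0, 120), "score": (0, 100)}

# One boolean column per checked field: True where the value is within bounds
range_ok = pd.DataFrame(
    {col: df[col].between(low, high) for col, (low, high) in valid_ranges.items()}
)

# Rows where at least one checked column is out of range
out_of_range_rows = df[~range_ok.all(axis=1)]

print(range_ok)
print(out_of_range_rows)
```

Keeping the bounds in a dictionary means a new column can be validated by adding one entry instead of another line of checking code.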

In [None]:
# Write your code from here
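
For the custom completeness rule violation report (Task 9), a sketch along the following lines lists each violating row together with the mandatory fields it is missing. The sample data, the helper `is_blank`, and the rule that whitespace-only strings count as empty are assumptions for this illustration.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["Alice", None, "Charlie", "David"],
    "email": ["alice@example.com", "bob@example.com", "   ", ""],
})

mandatory_fields = ["name", "email"]

def is_blank(value):
    """Treat None/NaN and empty or whitespace-only strings as missing."""
    return bool(pd.isna(value)) or (isinstance(value, str) and value.strip() == "")

# Boolean table: True where a mandatory field is missing
missing = df[mandatory_fields].apply(lambda col: col.map(is_blank))
flagged = missing.any(axis=1)

# Report: one line per violating row, naming the fields that failed
report = df.loc[flagged, ["id"]].copy()
report["missing_fields"] = missing[flagged].apply(
    lambda row: ", ".join(f for f in mandatory_fields if row[f]), axis=1
)
print(report)
```

Naming the failing fields in the report saves the reader from re-deriving which rule each row broke.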

**Task 10**: Advanced Regex for Data Validity Check

**Description**: Check for validity with advanced regex patterns, such as validating complex fields with multi-level rules.

In [None]:
# Write your code from here
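
As an illustration of a multi-rule pattern, the sketch below validates names using a verbose regex that combines a length rule (via a lookahead) with structural rules. The specific rules are assumptions made for this example, not requirements from the task, and real names may legitimately fall outside them (for instance, names with accented letters).

```python
import re
import pandas as pd

names = pd.Series(["Alice", "Mary-Jane O'Neil", "Bob2", " David", "", None])

# Assumed rules for this sketch, written in verbose mode so each can be commented
NAME_PATTERN = re.compile(
    r"""
    (?=.{2,50}\Z)          # rule 1: total length between 2 and 50 characters
    [A-Za-z]+              # rule 2: starts with a run of ASCII letters
    (?:[ '-][A-Za-z]+)*    # rule 3: more parts, each joined by one space, apostrophe, or hyphen
    """,
    re.VERBOSE,
)

def is_valid_name(value):
    """Return True only for strings that satisfy every rule in NAME_PATTERN."""
    return isinstance(value, str) and NAME_PATTERN.fullmatch(value) is not None

name_valid = names.apply(is_valid_name)
print(pd.DataFrame({"name": names, "name_valid": name_valid}))
```

Verbose mode ignores whitespace outside character classes, which is why the space inside `[ '-]` is still matched literally while the layout and comments are not.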