## Data Quality Framework Implementation

**Description**: Implement a simple data quality measurement framework using ISO 8000 principles to assess key dimensions in a dataset.

In [None]:
# Conceptual data quality framework, sketched in Python:

In [1]:
import pandas as pd

# Load dataset
df = pd.read_csv("swiggy.csv")

# Initialize results dictionary
data_quality_report = {}

# Completeness: % of non-null values
completeness = df.notnull().mean() * 100
data_quality_report['Completeness (%)'] = completeness.round(2)

# Uniqueness: % of rows that are not exact duplicates of another row
uniqueness = (len(df.drop_duplicates()) / len(df)) * 100
data_quality_report['Uniqueness (%)'] = round(uniqueness, 2)

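# Accuracy: check known constraints (example: rating between 1 and 5)
# Column names are case-sensitive and must match your schema. A check whose
# column is missing is skipped silently and never reaches the report; this
# dataset has 'Price' and 'Avg ratings', not 'price' or 'rating'.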
if 'rating' in df.columns:
    accuracy = ((df['rating'] >= 1) & (df['rating'] <= 5)).mean() * 100
    data_quality_report['Accuracy (rating between 1-5) (%)'] = round(accuracy, 2)

# Consistency: example - check that 'discounted_price' does not exceed 'price'
if 'price' in df.columns and 'discounted_price' in df.columns:
    consistency = (df['discounted_price'] <= df['price']).mean() * 100
    data_quality_report['Consistency (discounted <= price) (%)'] = round(consistency, 2)

# Timeliness: example - check if 'last_updated' falls within the last 30 days
# (unparseable dates become NaT and count as not recent)
if 'last_updated' in df.columns:
    df['last_updated'] = pd.to_datetime(df['last_updated'], errors='coerce')
    recent_cutoff = pd.Timestamp.now() - pd.Timedelta(days=30)
    timeliness = (df['last_updated'] >= recent_cutoff).mean() * 100
    data_quality_report['Timeliness (last 30 days) (%)'] = round(timeliness, 2)

# Display results
for metric, value in data_quality_report.items():
    print(f"{metric}:\n{value}\n")


Completeness (%):
ID               100.0
Area             100.0
City             100.0
Restaurant       100.0
Price            100.0
Avg ratings      100.0
Total ratings    100.0
Food type        100.0
Address          100.0
Delivery time    100.0
dtype: float64

Uniqueness (%):
100.0
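
Only completeness and uniqueness appear in the output above because the remaining checks look for columns (`rating`, `price`, `discounted_price`, `last_updated`) that are not in this dataset. As a minimal sketch, the same checks could be wrapped in a function that takes the column name as a parameter. The function name `quality_report`, its parameters, and the sample frame are hypothetical and used only for illustration; the 1-5 rating range is the same assumption as in the cell above and may not hold for `Avg ratings`.

```python
import pandas as pd


def quality_report(df, rating_col="Avg ratings", rating_range=(1, 5)):
    """Return a dict of data quality metrics (percentages) for df."""
    report = {}

    # Completeness: share of non-null values per column
    report["Completeness (%)"] = (df.notnull().mean() * 100).round(2)

    # Uniqueness: share of rows that are not exact duplicates
    report["Uniqueness (%)"] = round((~df.duplicated()).mean() * 100, 2)

    # Accuracy: share of ratings inside the assumed valid range.
    # Non-numeric entries are coerced to NaN and count as invalid.
    if rating_col in df.columns:
        ratings = pd.to_numeric(df[rating_col], errors="coerce")
        low, high = rating_range
        report[f"Accuracy ({rating_col} in {low}-{high}) (%)"] = round(
            ratings.between(low, high).mean() * 100, 2
        )
    return report


# Tiny made-up frame for illustration only (not rows from swiggy.csv)
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Price": [300.0, 250.0, None, 400.0],
    "Avg ratings": [4.2, 0.0, 3.9, 5.0],
})
sample_report = quality_report(sample)
for metric, value in sample_report.items():
    print(f"{metric}:\n{value}\n")
```

Passing the column name explicitly makes a schema mismatch easier to spot than a hard-coded `if 'rating' in df.columns` guard, which simply skips the check.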

