### Define Data Quality KPIs

**Task 1**: Identify Relevant KPIs

**Objective**: Develop KPIs that align with organizational goals.

**Steps**:
1. Choose a dataset from a domain of your interest (e.g., sales data, healthcare records, or transaction logs).
2. Identify three KPIs that are crucial for assessing data quality in your chosen dataset. Consider dimensions such as accuracy, completeness, and timeliness.
3. Document why each KPI matters for maintaining high-quality data in that context.

In [None]:
# Write your code from here
# Define data quality KPIs for a sales transactions dataset

def get_data_quality_kpis():
    """Return a dict mapping each KPI name to its definition and importance."""
    kpis = {
        "Accuracy Rate": {
            "definition": "Percentage of transactions with correct product codes and prices",
            "importance": "Ensures billing and inventory accuracy, preventing revenue loss"
        },
        "Completeness Rate": {
            "definition": "Percentage of transactions with all mandatory fields filled",
            "importance": "Prevents reporting gaps and supports downstream analytics"
        },
        "Timeliness": {
            "definition": "Percentage of records entered within 24 hours of transaction",
            "importance": "Supports near-real-time decision-making and forecasting"
        }
    }
    return kpis

# Example usage
kpis = get_data_quality_kpis()
for kpi, details in kpis.items():
    print(f"{kpi}:\n  Definition: {details['definition']}\n  Importance: {details['importance']}\n")
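
The KPI definitions above are descriptive only. As a minimal sketch of how they might be measured, the block below computes the three rates with pandas. The sample transactions, the `catalog` price lookup, the column names, and the `compute_kpis` helper are all hypothetical and exist only for illustration; a real dataset would need its own validation rules.

```python
import pandas as pd

# Hypothetical sample transactions (illustrative values only)
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "product_code": ["A100", "B200", None, "Z999"],
    "price": [9.99, 19.99, 4.99, 14.99],
    "transaction_time": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 10:00",
        "2024-01-02 11:00", "2024-01-03 12:00",
    ]),
    "entry_time": pd.to_datetime([
        "2024-01-01 12:00", "2024-01-03 10:00",
        "2024-01-02 15:00", "2024-01-03 13:00",
    ]),
})

# Hypothetical reference catalog: product code -> expected price
catalog = {"A100": 9.99, "B200": 19.99, "C300": 4.99}
mandatory_fields = ["transaction_id", "product_code", "price"]


def compute_kpis(df, catalog, mandatory_fields, max_delay=pd.Timedelta(hours=24)):
    """Compute the three KPIs as percentages of all rows in df."""
    # Accuracy: the code exists in the catalog and the price matches it.
    # Unknown or missing codes map to NaN, which never equals a price.
    expected_price = df["product_code"].map(catalog)
    accurate = expected_price == df["price"]

    # Completeness: every mandatory field is non-null.
    complete = df[mandatory_fields].notna().all(axis=1)

    # Timeliness: the record was entered within max_delay of the transaction.
    timely = (df["entry_time"] - df["transaction_time"]) <= max_delay

    return {
        "Accuracy Rate": float(accurate.mean() * 100),
        "Completeness Rate": float(complete.mean() * 100),
        "Timeliness": float(timely.mean() * 100),
    }


kpi_values = compute_kpis(transactions, catalog, mandatory_fields)
for name, value in kpi_values.items():
    print(f"{name}: {value:.1f}%")
```

In this sample, two of the four rows have a known code with a matching price, one row is missing its product code, and one row was entered 48 hours after the transaction, so each rate falls below 100%.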


**Task 2**: Develop a KPI Dashboard

**Objective**: Visualize your KPIs for better monitoring.

**Steps**:
1. Use Excel, a BI tool (e.g., Tableau, Power BI), or a Python framework such as Streamlit to create a simple dashboard.
2. Input sample data and visualize your chosen KPIs, showing how they would be monitored.
3. Share your dashboard with peers and gather feedback on KPI relevance and clarity.

In [None]:
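# Write your code from here
# Streamlit apps run as scripts rather than notebook cells: save this code to a
# file (e.g., a hypothetical dashboard.py) and launch it with
# `streamlit run dashboard.py`.
import streamlit as st
import pandas as pd

def get_sample_kpi_data():
    """Return sample KPI values and targets as a DataFrame."""
    data = {
        'KPI': ['Accuracy Rate', 'Completeness Rate', 'Timeliness'],
        'Current Value (%)': [97.5, 92.0, 85.5],
        'Target (%)': [99.0, 95.0, 90.0]
    }
    return pd.DataFrame(data)

st.title("Data Quality KPI Dashboard")

df = get_sample_kpi_data()

# Bar chart of the current KPI values, one bar per KPI
st.bar_chart(df.set_index('KPI')['Current Value (%)'])

# Text summary comparing each KPI's current value with its target
st.write("### KPI Details")
for _, row in df.iterrows():
    st.write(f"**{row['KPI']}**: Current Value = {row['Current Value (%)']}%, Target = {row['Target (%)']}%")

# Optionally display raw data
if st.checkbox("Show raw data"):
    st.write(df)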
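
For monitoring, it can help to make the distance from each target explicit rather than leaving readers to compare two numbers. The sketch below reuses the sample values from the dashboard and adds a gap in percentage points plus a status label. The `add_target_status` helper and the `Gap (pp)` and `Status` column names are hypothetical, and the "on target" rule (current value at or above target) is an assumption.

```python
import pandas as pd

# Same sample values as in the dashboard code
df = pd.DataFrame({
    "KPI": ["Accuracy Rate", "Completeness Rate", "Timeliness"],
    "Current Value (%)": [97.5, 92.0, 85.5],
    "Target (%)": [99.0, 95.0, 90.0],
})


def add_target_status(kpi_df):
    """Return a copy of kpi_df with a gap-to-target column and a status label."""
    out = kpi_df.copy()
    # Negative gap means the KPI is below its target (in percentage points)
    out["Gap (pp)"] = (out["Current Value (%)"] - out["Target (%)"]).round(2)
    # Assumed rule: a KPI is on target when it meets or exceeds the target
    out["Status"] = out["Gap (pp)"].apply(
        lambda gap: "On target" if gap >= 0 else "Below target"
    )
    return out


status_df = add_target_status(df)
print(status_df.to_string(index=False))
```

The resulting table could be passed to `st.write` in the dashboard in place of the per-KPI text lines.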
