In [2]:
# Load and activate the SQL extension so that we can execute SQL in a Jupyter notebook.
%load_ext sql
# Establish a connection to the local database using the '%sql' magic command
%sql mysql+pymysql://root:Dsk264501@localhost:3306/md_water_servicesb
%config SqlMagic.style = '_DEPRECATED_DEFAULT'

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


# Weaving The Data Threads of Maji Ndogo's Narrative

In this part, we focus on assessing water quality, pollution, and audit reports. We compare employee observations with auditor scores, uncover discrepancies, and investigate potential data quality and integrity issues that affect public health.

## 1. Generating an ERD

Understanding the database structure is key before diving into analysis. The Entity Relationship Diagram (ERD) visually represents the schema and the relationships between its tables. (*The ERD image will be included in the README*).

## 2. Integrating the Auditor Report

We add the auditor report to our database for comparison against survey data.
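
The SQL cell that follows only creates an empty `auditor_report` table; the CSV import itself is not shown in this notebook. As a minimal sketch of one way the rows could be loaded from Python, the example below uses the standard `csv` module and an in-memory SQLite database as a stand-in for the MySQL server. The sample rows, the `load_auditor_report` helper, and the SQLite connection are assumptions for illustration only; the real file, its path, and the import method actually used are not given here.

```python
import csv
import io
import sqlite3

# Illustrative sample standing in for the real auditor report CSV.
# Assumed layout: one header row whose names match the table's columns.
SAMPLE_CSV = """location_id,type_of_water_source,true_water_source_score,statements
AkRu05215,well,3,"placeholder statement one"
KiRu29290,shared_tap,3,"placeholder statement, with a comma"
"""


def load_auditor_report(conn, csv_file):
    """Recreate auditor_report and insert every row read from csv_file."""
    conn.execute("DROP TABLE IF EXISTS auditor_report")
    conn.execute(
        """CREATE TABLE auditor_report (
               location_id VARCHAR(32),
               type_of_water_source VARCHAR(64),
               true_water_source_score INT DEFAULT NULL,
               statements VARCHAR(255)
           )"""
    )
    reader = csv.DictReader(csv_file)
    rows = [
        (
            r["location_id"],
            r["type_of_water_source"],
            # Empty score cells become NULL rather than 0.
            int(r["true_water_source_score"]) if r["true_water_source_score"] else None,
            r["statements"],
        )
        for r in reader
    ]
    # Parameterised inserts keep quotes and commas in statements intact.
    conn.executemany("INSERT INTO auditor_report VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the MySQL connection
inserted = load_auditor_report(conn, io.StringIO(SAMPLE_CSV))
print(inserted, "rows loaded")
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle (for example `open(path, newline="")`), and the `?` placeholders would need to match the parameter style of the MySQL driver in use.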

In [3]:
%%sql
-- Weaving The Data Threads of Maji Ndogo's Narrative

-- Creating the auditor_report table that holds the auditor report CSV data (the import step itself is not shown in this cell).
DROP TABLE IF EXISTS auditor_report;
CREATE TABLE auditor_report (
    location_id VARCHAR(32),
    type_of_water_source VARCHAR(64),
    true_water_source_score INT DEFAULT NULL,
    statements VARCHAR(255)
);

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
0 rows affected.
0 rows affected.


[]

In [7]:
%%sql
-- Comparing the quality scores in the water_quality table to the auditor's scores.

SELECT 
    ar.location_id,
    v.record_id,
    ar.true_water_source_score AS auditor_score,
    w.subjective_quality_score AS employee_score
FROM auditor_report AS ar
INNER JOIN visits AS v ON ar.location_id = v.location_id
INNER JOIN water_quality AS w ON v.record_id = w.record_id
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
25 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
AkRu05215,21160,3,10


In [8]:
%%sql
-- Investigating the auditor's and employees' scores.
-- This query returns the first-visit records (visit_count = 1) where the auditor's score equals the surveyor's score.
    
SELECT 
    ar.location_id,
    v.record_id,
    ar.true_water_source_score AS auditor_score,
    w.subjective_quality_score AS employee_score
FROM auditor_report AS ar
INNER JOIN visits AS v ON ar.location_id = v.location_id
INNER JOIN water_quality AS w ON v.record_id = w.record_id
WHERE ar.true_water_source_score = w.subjective_quality_score
  AND v.visit_count = 1
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
25 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2


In [9]:
%%sql
-- Checking for incorrect records
-- This query returns the first-visit records where the auditor's score differs from the surveyor's score,
-- and shows the water source type from both the audit and the survey for comparison.
    
SELECT 
    ar.location_id,
    v.record_id,
    ar.type_of_water_source AS auditor_source,
    ws.type_of_water_source AS survey_source,
    ar.true_water_source_score AS auditor_score,
    w.subjective_quality_score AS employee_score
FROM auditor_report AS ar
INNER JOIN visits AS v ON ar.location_id = v.location_id
INNER JOIN water_quality AS w ON v.record_id = w.record_id
INNER JOIN water_source AS ws ON v.source_id = ws.source_id
WHERE ar.true_water_source_score != w.subjective_quality_score
  AND v.visit_count = 1
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
25 rows affected.


location_id,record_id,auditor_source,survey_source,auditor_score,employee_score
AkRu05215,21160,well,well,3,10
KiRu29290,7938,shared_tap,shared_tap,3,10
KiHa22748,43140,tap_in_home_broken,tap_in_home_broken,9,10
SoRu37841,18495,shared_tap,shared_tap,6,10
KiRu27884,33931,well,well,1,10
KiZu31170,17950,tap_in_home_broken,tap_in_home_broken,9,10
KiZu31370,36864,shared_tap,shared_tap,3,10
AkRu06495,45924,well,well,2,10
HaRu17528,30524,well,well,1,10
SoRu38331,13192,shared_tap,shared_tap,3,10


## 3. Linking Records

We join employee data to identify who was responsible for incorrect or mismatched records.
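
To make the join logic concrete, here is a small self-contained sketch that runs the same four-table join on toy data in an in-memory SQLite database. The table and column names follow the queries in this notebook, but the rows, location names, and employee names are made up for illustration. Note that `LocA` has two visits: because the auditor report is joined on `location_id`, the auditor's row would be repeated for every visit, and the `visit_count = 1` filter keeps only the first one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.executescript("""
CREATE TABLE auditor_report (location_id TEXT, true_water_source_score INT);
CREATE TABLE visits (record_id INT, location_id TEXT, visit_count INT,
                     assigned_employee_id INT);
CREATE TABLE water_quality (record_id INT, subjective_quality_score INT);
CREATE TABLE employee (assigned_employee_id INT, employee_name TEXT);

-- Toy rows (hypothetical): LocA is visited twice, LocB once.
INSERT INTO auditor_report VALUES ('LocA', 3), ('LocB', 1);
INSERT INTO visits VALUES (1, 'LocA', 1, 10), (2, 'LocA', 2, 10), (3, 'LocB', 1, 11);
INSERT INTO water_quality VALUES (1, 10), (2, 10), (3, 1);
INSERT INTO employee VALUES (10, 'Employee A'), (11, 'Employee B');
""")

# Same shape as the notebook query: keep first visits whose scores disagree.
mismatches = conn.execute("""
    SELECT ar.location_id,
           v.record_id,
           e.employee_name,
           ar.true_water_source_score AS auditor_score,
           w.subjective_quality_score AS employee_score
    FROM auditor_report AS ar
    INNER JOIN visits AS v ON ar.location_id = v.location_id
    INNER JOIN water_quality AS w ON v.record_id = w.record_id
    INNER JOIN employee AS e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE ar.true_water_source_score != w.subjective_quality_score
      AND v.visit_count = 1
""").fetchall()

print(mismatches)  # [('LocA', 1, 'Employee A', 3, 10)]
```

Only the first visit to `LocA` is returned: the second visit is dropped by the `visit_count` filter, and `LocB` is dropped because both scores agree.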

In [10]:
%%sql
-- Joining the employee information
-- This query returns the names of the employees who made errors in their quality score observations

SELECT 
    ar.location_id,
    v.record_id,
    e.employee_name,
    ar.true_water_source_score AS auditor_score,
    w.subjective_quality_score AS employee_score
FROM auditor_report AS ar
INNER JOIN visits AS v ON ar.location_id = v.location_id
INNER JOIN water_quality AS w ON v.record_id = w.record_id
INNER JOIN employee AS e ON v.assigned_employee_id = e.assigned_employee_id
WHERE ar.true_water_source_score != w.subjective_quality_score
  AND v.visit_count = 1
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
25 rows affected.


location_id,record_id,employee_name,auditor_score,employee_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


In [11]:
%%sql
-- Converting the above query into a common table expression (CTE) and counting the mistakes per employee

WITH Incorrect_records AS (
    SELECT 
        ar.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        w.subjective_quality_score AS employee_score
    FROM auditor_report AS ar
    INNER JOIN visits AS v ON ar.location_id = v.location_id
    INNER JOIN water_quality AS w ON v.record_id = w.record_id
    INNER JOIN employee AS e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE ar.true_water_source_score != w.subjective_quality_score
      AND v.visit_count = 1),
error_count AS (
    SELECT 
        employee_name,
        COUNT(*) AS number_of_mistakes
    FROM Incorrect_records
    GROUP BY employee_name)
SELECT * FROM error_count
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
17 rows affected.


employee_name,number_of_mistakes
Rudo Imani,5
Bello Azibo,26
Zuriel Matembo,17
Yewande Ebele,3
Jengo Tumaini,3
Farai Nia,4
Malachi Mavuso,21
Makena Thabo,3
Lalitha Kaburi,7
Gamba Shani,3


## 4. Gathering Evidence

Now we examine which employees made more errors than average, create views, and identify suspicious patterns of misconduct.
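
The "more errors than average" filter can be sketched on toy data as follows. This is an illustration only: `Incorrect_records` is a plain SQLite table standing in for the notebook's records of mismatched scores, and the employee names and rows are hypothetical. One detail worth noticing is that `error_count` only contains employees with at least one mistake, so the average is taken over those employees rather than over the whole workforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute("CREATE TABLE Incorrect_records (employee_name TEXT, location_id TEXT)")
conn.executemany(
    "INSERT INTO Incorrect_records VALUES (?, ?)",
    [
        # Hypothetical rows: Employee A has 4 mistakes, B and C have 1 each.
        ("Employee A", "Loc1"),
        ("Employee A", "Loc2"),
        ("Employee A", "Loc3"),
        ("Employee A", "Loc4"),
        ("Employee B", "Loc5"),
        ("Employee C", "Loc6"),
    ],
)

suspects = conn.execute("""
    WITH error_count AS (
        SELECT employee_name, COUNT(*) AS number_of_mistakes
        FROM Incorrect_records
        GROUP BY employee_name)
    SELECT employee_name, number_of_mistakes
    FROM error_count
    -- Average here is (4 + 1 + 1) / 3 = 2.0, so only counts above 2 qualify.
    WHERE number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
    ORDER BY number_of_mistakes DESC
""").fetchall()

print(suspects)  # [('Employee A', 4)]
```

Selecting `number_of_mistakes` directly avoids a second aggregation step, since each employee already has exactly one row in `error_count`.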

In [13]:
%%sql
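-- Investigating employees whose number of mistakes is above the average
-- This query returns the employees who made more mistakes than the average (about 6) among employees with at least one mistake.
-- Note: error_count holds one row per employee, so avg_number_of_mistakes is simply that employee's own mistake count.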

WITH Incorrect_records AS (
    SELECT 
        ar.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        w.subjective_quality_score AS employee_score
    FROM auditor_report AS ar
    INNER JOIN visits AS v ON ar.location_id = v.location_id
    INNER JOIN water_quality AS w ON v.record_id = w.record_id
    INNER JOIN employee AS e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE ar.true_water_source_score != w.subjective_quality_score
      AND v.visit_count = 1),
error_count AS (
    SELECT 
        employee_name,
        COUNT(*) AS number_of_mistakes
    FROM Incorrect_records
    GROUP BY employee_name)
    
SELECT
    employee_name,
    ROUND(AVG(number_of_mistakes), 2) AS avg_number_of_mistakes
FROM error_count
WHERE number_of_mistakes > (SELECT ROUND(AVG(number_of_mistakes), 2) FROM error_count)
GROUP BY employee_name
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
4 rows affected.


employee_name,avg_number_of_mistakes
Bello Azibo,26.0
Zuriel Matembo,17.0
Malachi Mavuso,21.0
Lalitha Kaburi,7.0


In [14]:
%%sql
-- Creating a view of Incorrect records

CREATE OR REPLACE VIEW Incorrect_records AS (
    SELECT
        ar.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score,
        ar.statements AS statements
    FROM auditor_report AS ar
    JOIN visits AS v ON ar.location_id = v.location_id
    JOIN water_quality AS wq ON v.record_id = wq.record_id
    JOIN employee AS e ON e.assigned_employee_id = v.assigned_employee_id
    WHERE v.visit_count = 1
      AND ar.true_water_source_score != wq.subjective_quality_score);

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
0 rows affected.


[]

In [15]:
%%sql
-- Creating the error count and suspect list CTEs
-- The query below defines two CTEs: error_count, with the number of mistakes per employee, and suspect_list, with the employees whose mistake count is above the average.
    
WITH error_count AS (
    SELECT employee_name, COUNT(employee_name) AS number_of_mistakes
    FROM Incorrect_records
    GROUP BY employee_name),

/*
Incorrect_records is a view that joins the auditor report to the database
for first-visit records where the auditor's and the employee's scores differ. */

suspect_list AS (
    SELECT employee_name, number_of_mistakes
    FROM error_count
    WHERE number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count))

-- This query returns the incorrect records, with the auditor's statements, gathered by the employees on the suspect list.
    
SELECT employee_name, location_id, statements
FROM Incorrect_records
WHERE employee_name IN (SELECT employee_name FROM suspect_list)
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
25 rows affected.


employee_name,location_id,statements
Bello Azibo,KiRu29290,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
Bello Azibo,KiHa22748,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
Bello Azibo,KiRu27884,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
Zuriel Matembo,KiZu31170,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
Bello Azibo,AkRu06495,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
Zuriel Matembo,SoRu38331,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Zuriel Matembo,AkHa00314,"A street vendor's sales suffer from time spent waiting, her concern for the water's quality affecting her products."
Malachi Mavuso,KiRu26598,"A teenager's dreams are tempered by reality, her future threatened by the corrupt practices she sees around her."
Bello Azibo,KiIs23853,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The mention of cash changing hands further tainted their perception.
