# Weaving the Data Threads of Maji Ndogo's Narrative

In [1]:
# Load the sql extension
%load_ext sql

In [2]:
# Connect to the MySQL 'md_water_services' database
%sql mysql+pymysql://root:password@localhost:3306/md_water_services

## Maji Ndogo Water Services ERD

An audit has been conducted, and we want to integrate the auditor's report into the database. To do so successfully, we first need to examine the current ERD thoroughly and understand the relationships between the tables.

![The Maji Ndogo Water Services ERD!](./md_water_services_erd.png)

## ERD Investigative Analysis

From the ERD above, we can see that the `visits` table is the central table that connects the other tables.

- `location_id` is the **PRIMARY KEY** in the `location` table and a **FOREIGN KEY** in the `visits` table.
- `source_id` is the **PRIMARY KEY** in the `water_source` table and a **FOREIGN KEY** in the `visits` table.
- `assigned_employee_id` is the **PRIMARY KEY** in the `employee` table and a **FOREIGN KEY** in the `visits` table.

In a nutshell, the `visits` table logs every instance in which an `employee` visited a `location` to assess a particular `water_source`. Because a single location, employee, or water source can appear in **many** visits, each of these three tables has a **one-to-many** relationship with the `visits` table.

However, the ERD shows the relationship between the `visits` table and the `water_quality` table as **one-to-many**, yet each record in `visits` should have exactly one corresponding record in `water_quality`. This points to an error in how the relationship is represented: it should be **one-to-one**, so we need to correct it.
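
Before redrawing the relationship, one quick way to test the one-to-one assumption is to check that no `record_id` appears more than once in `water_quality`. The sketch below is a stand-in, not the project's actual setup: it uses Python's built-in `sqlite3` module with an in-memory database and three rows whose record IDs and scores are borrowed from the query outputs further down. Against the real MySQL database, the same `GROUP BY ... HAVING` query could be run in a `%%sql` cell.

```python
import sqlite3

# In-memory stand-in for md_water_services, with only the columns the check needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (record_id INTEGER PRIMARY KEY, location_id TEXT);
CREATE TABLE water_quality (record_id INTEGER, subjective_quality_score INTEGER);
INSERT INTO visits VALUES (5185, 'SoRu34980'), (59367, 'AkRu08112'), (37379, 'AkLu02044');
INSERT INTO water_quality VALUES (5185, 1), (59367, 3), (37379, 0);
""")

# Any record_id listed here has more than one quality record,
# which would contradict a one-to-one relationship with visits.
duplicates = conn.execute("""
    SELECT record_id, COUNT(*) AS n
    FROM water_quality
    GROUP BY record_id
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)  # []
```

An empty result is consistent with a one-to-one relationship; any returned row would identify a `record_id` with more than one quality record.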

![The Updated Maji Ndogo Water Services ERD!](./updated_md_water_services_erd.png)

## Integrating the Auditor's Report

Now that we have a correct representation of the relationships in our database, we can import the data from the auditor's report, which is in `.csv` format. To do this, follow the steps below:

1. Create an empty `auditor_report` table in the `md_water_services` database by running the following in **MySQL Workbench**:

```sql
DROP TABLE IF EXISTS `auditor_report`;

CREATE TABLE `auditor_report` (
    `location_id` VARCHAR(32),
    `type_of_water_source` VARCHAR(64),
    `true_water_source_score` INT DEFAULT NULL,
    `statements` VARCHAR(255)
);
```

2. [Import](https://www.youtube.com/watch?v=sfRwJH04QJc) the auditor's `.csv` file using **MySQL Workbench**. Since we already created an empty table in the first step, choose the existing `auditor_report` table as the import destination instead of creating a new one.
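
If you would rather script the import than use the Workbench wizard, the sketch below shows the idea with Python's standard `csv` and `sqlite3` modules. It is a stand-in under stated assumptions: it loads a hypothetical two-row excerpt of the auditor's file (values taken from outputs later in this notebook) into an in-memory SQLite table with the same columns, rather than into MySQL. The detail worth noting is that `csv.DictReader` correctly handles statements that contain commas inside double quotes.

```python
import csv
import io
import sqlite3

# Hypothetical two-row excerpt of the auditor's CSV file.
csv_text = '''location_id,type_of_water_source,true_water_source_score,statements
AkRu05215,well,3,"Villagers admired the official's visit for its respectful interactions, hard work, and genuine concern."
KiRu29290,shared_tap,3,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
'''

conn = sqlite3.connect(":memory:")
# Same columns as the MySQL table created in step 1.
conn.execute("""
CREATE TABLE auditor_report (
    location_id VARCHAR(32),
    type_of_water_source VARCHAR(64),
    true_water_source_score INT DEFAULT NULL,
    statements VARCHAR(255)
)
""")

# DictReader keeps quoted fields with embedded commas in one column.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [
    (r["location_id"], r["type_of_water_source"],
     int(r["true_water_source_score"]), r["statements"])
    for r in reader
]
conn.executemany("INSERT INTO auditor_report VALUES (?, ?, ?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM auditor_report").fetchone()[0]
print(row_count)  # 2
```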

## Questions to Answer

### Question 1: Is There a Difference in the Scores?

In [3]:
%%sql
# Get location_id and true_water_source_score columns from auditor report
SELECT
    location_id,
    true_water_source_score
FROM
    md_water_services.auditor_report;

location_id,true_water_source_score
SoRu34980,1
AkRu08112,3
AkLu02044,0
AkHa00421,3
SoRu35221,0
HaAm16170,1
AkRu04812,3
AkRu08304,3
AkRu05107,2
AkRu05215,3


In [4]:
%%sql
# Join the visits table to the auditor report table on the location_id to access the record_id
SELECT
    auditor_report.location_id AS audit_location,
    visits.location_id AS visit_location,
    visits.record_id,
    auditor_report.true_water_source_score
FROM
    md_water_services.visits
JOIN
    md_water_services.auditor_report
    ON auditor_report.location_id = visits.location_id;

audit_location,visit_location,record_id,true_water_source_score
SoRu34980,SoRu34980,5185,1
AkRu08112,AkRu08112,59367,3
AkLu02044,AkLu02044,37379,0
AkHa00421,AkHa00421,51627,3
SoRu35221,SoRu35221,28758,0
HaAm16170,HaAm16170,31048,1
AkRu04812,AkRu04812,1513,3
AkRu08304,AkRu08304,1218,3
AkRu05107,AkRu05107,8322,2
AkRu05215,AkRu05215,21160,3


In [5]:
%%sql
# Join the water_quality table to the newly joined table on record_id to access the subjective_quality_score
SELECT
    auditor_report.location_id AS audit_location,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score
FROM
    md_water_services.auditor_report
JOIN
    md_water_services.visits
    ON visits.location_id = auditor_report.location_id
JOIN
    md_water_services.water_quality
    ON visits.record_id = water_quality.record_id;

audit_location,record_id,auditor_score,surveyor_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
AkRu05215,21160,3,10


In [6]:
%%sql
# Keep only the records where the auditor's score matches the surveyor's score
SELECT
    auditor_report.location_id AS audit_location,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score
FROM
    md_water_services.auditor_report
JOIN
    md_water_services.visits
    ON visits.location_id = auditor_report.location_id
JOIN
    md_water_services.water_quality
    ON visits.record_id = water_quality.record_id
WHERE auditor_report.true_water_source_score = water_quality.subjective_quality_score;

audit_location,record_id,auditor_score,surveyor_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2


In [7]:
%%sql
# Keep only first visits (visit_count = 1) so that locations visited more than once are not counted repeatedly
SELECT
    auditor_report.location_id AS audit_location,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score
FROM
    md_water_services.auditor_report
JOIN
    md_water_services.visits
    ON visits.location_id = auditor_report.location_id
JOIN
    md_water_services.water_quality
    ON visits.record_id = water_quality.record_id
WHERE 
    visits.visit_count = 1 
    AND auditor_report.true_water_source_score = water_quality.subjective_quality_score;

audit_location,record_id,auditor_score,surveyor_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2
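
Filtering on `visit_count = 1` matters because the auditor's report holds one row per location, while `visits` can hold several rows for the same location. Joining on `location_id` therefore returns one row per visit, which would count a repeatedly visited location several times. The sketch below illustrates this fan-out in an in-memory SQLite database; the location `LocA` and its three visits are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auditor_report (location_id TEXT, true_water_source_score INTEGER);
CREATE TABLE visits (record_id INTEGER, location_id TEXT, visit_count INTEGER);
-- One audited location (hypothetical) that surveyors visited three times.
INSERT INTO auditor_report VALUES ('LocA', 3);
INSERT INTO visits VALUES (1, 'LocA', 1), (2, 'LocA', 2), (3, 'LocA', 3);
""")

join_sql = """
    SELECT a.location_id, v.record_id
    FROM auditor_report AS a
    JOIN visits AS v ON v.location_id = a.location_id
"""
# Without the filter, the single audited location fans out to one row per visit.
all_visits = conn.execute(join_sql).fetchall()
# With the filter, each audited location contributes exactly one row.
first_visit_only = conn.execute(join_sql + " WHERE v.visit_count = 1").fetchall()

print(len(all_visits), len(first_visit_only))  # 3 1
```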


In [8]:
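# Calculate the percentage of surveyor scores that the auditor confirmed as correct
correct_scores = 1518  # rows returned by the previous query (only the first 10 are displayed)
total_audited = 1620   # records in the auditor_report table
print(f"Approximately {round((correct_scores / total_audited) * 100)}% of surveyors' scores were correct according to the auditor's records")

Approximately 94% of surveyors' scores were correct according to the auditor's records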
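
The cell above types the two counts in by hand. As an alternative, the agreement rate can be computed inside the query. The sketch below relies on SQLite, like MySQL, evaluating a comparison such as `auditor_score = surveyor_score` to 1 or 0, so `AVG()` of that expression is the fraction of matching rows. The `scores` table and its four rows are hypothetical; in this notebook the equivalent input would be the three-table join filtered on `visit_count = 1` but without the score-equality condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (auditor_score INTEGER, surveyor_score INTEGER);
-- Hypothetical (auditor_score, surveyor_score) pairs for first visits.
INSERT INTO scores VALUES (1, 1), (3, 3), (0, 0), (3, 10);
""")

# The comparison yields 1 or 0 per row, so its average is the match rate.
match_pct = conn.execute("""
    SELECT ROUND(100.0 * AVG(auditor_score = surveyor_score), 1)
    FROM scores
""").fetchone()[0]
print(match_pct)  # 75.0
```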


In [9]:
%%sql
# Let's check records that were incorrect according to the auditor's report
SELECT
    auditor_report.location_id AS audit_location,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score
FROM
    md_water_services.auditor_report
JOIN
    md_water_services.visits
    ON visits.location_id = auditor_report.location_id
JOIN
    md_water_services.water_quality
    ON visits.record_id = water_quality.record_id
WHERE 
    visits.visit_count = 1 
    AND auditor_report.true_water_source_score != water_quality.subjective_quality_score;

audit_location,record_id,auditor_score,surveyor_score
AkRu05215,21160,3,10
KiRu29290,7938,3,10
KiHa22748,43140,9,10
SoRu37841,18495,6,10
KiRu27884,33931,1,10
KiZu31170,17950,9,10
KiZu31370,36864,3,10
AkRu06495,45924,2,10
HaRu17528,30524,1,10
SoRu38331,13192,3,10


In [10]:
%%sql
# Check the incorrect records alongside the water source type recorded by the auditor and by the surveyor
SELECT
    auditor_report.location_id AS audit_location,
    auditor_report.type_of_water_source AS auditor_source,
    water_source.type_of_water_source AS surveyor_source,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score
FROM
    md_water_services.auditor_report
JOIN
    md_water_services.visits
    ON visits.location_id = auditor_report.location_id
JOIN
    md_water_services.water_quality
    ON visits.record_id = water_quality.record_id
JOIN
    md_water_services.water_source
    ON visits.source_id = water_source.source_id
WHERE 
    visits.visit_count = 1 
    AND auditor_report.true_water_source_score != water_quality.subjective_quality_score;

audit_location,auditor_source,surveyor_source,record_id,auditor_score,surveyor_score
AkRu05215,well,well,21160,3,10
KiRu29290,shared_tap,shared_tap,7938,3,10
KiHa22748,tap_in_home_broken,tap_in_home_broken,43140,9,10
SoRu37841,shared_tap,shared_tap,18495,6,10
KiRu27884,well,well,33931,1,10
KiZu31170,tap_in_home_broken,tap_in_home_broken,17950,9,10
KiZu31370,shared_tap,shared_tap,36864,3,10
AkRu06495,well,well,45924,2,10
HaRu17528,well,well,30524,1,10
SoRu38331,shared_tap,shared_tap,13192,3,10


In [11]:
%%sql
# Create a CTE of the employees/surveyors responsible for the erroneous scores
WITH Incorrect_records AS (
    SELECT
        auditor_report.location_id AS audit_location,
        visits.record_id,
        employee.employee_name,
        auditor_report.true_water_source_score AS auditor_score,
        water_quality.subjective_quality_score AS surveyor_score
    FROM
        md_water_services.auditor_report
    JOIN
        md_water_services.visits
        ON visits.location_id = auditor_report.location_id
    JOIN
        md_water_services.water_quality
        ON visits.record_id = water_quality.record_id
    JOIN
        md_water_services.employee
        ON visits.assigned_employee_id = employee.assigned_employee_id
    WHERE 
        visits.visit_count = 1 
        AND auditor_report.true_water_source_score != water_quality.subjective_quality_score
)
SELECT *
FROM Incorrect_records;

audit_location,record_id,employee_name,auditor_score,surveyor_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


In [12]:
%%sql
# Get the surveyors responsible for the erroneous data
WITH Incorrect_records AS (
    SELECT
        auditor_report.location_id AS audit_location,
        visits.record_id,
        employee.employee_name,
        auditor_report.true_water_source_score AS auditor_score,
        water_quality.subjective_quality_score AS surveyor_score
    FROM
        md_water_services.auditor_report
    JOIN
        md_water_services.visits
        ON visits.location_id = auditor_report.location_id
    JOIN
        md_water_services.water_quality
        ON visits.record_id = water_quality.record_id
    JOIN
        md_water_services.employee
        ON visits.assigned_employee_id = employee.assigned_employee_id
    WHERE 
        visits.visit_count = 1 
        AND auditor_report.true_water_source_score != water_quality.subjective_quality_score
)
SELECT DISTINCT employee_name
FROM Incorrect_records;

employee_name
Rudo Imani
Bello Azibo
Zuriel Matembo
Yewande Ebele
Jengo Tumaini
Farai Nia
Malachi Mavuso
Makena Thabo
Lalitha Kaburi
Gamba Shani


In [13]:
%%sql
# Count the number of mistakes made by each employee/surveyor responsible for erroneous scores
WITH Incorrect_records AS (
    SELECT
        auditor_report.location_id AS audit_location,
        visits.record_id,
        employee.employee_name,
        auditor_report.true_water_source_score AS auditor_score,
        water_quality.subjective_quality_score AS surveyor_score
    FROM
        md_water_services.auditor_report
    JOIN
        md_water_services.visits
        ON visits.location_id = auditor_report.location_id
    JOIN
        md_water_services.water_quality
        ON visits.record_id = water_quality.record_id
    JOIN
        md_water_services.employee
        ON visits.assigned_employee_id = employee.assigned_employee_id
    WHERE 
        visits.visit_count = 1 
        AND auditor_report.true_water_source_score != water_quality.subjective_quality_score
)
SELECT
    employee_name,
    COUNT(employee_name) AS number_of_mistakes
FROM 
    Incorrect_records
GROUP BY
    employee_name
ORDER BY number_of_mistakes DESC;

employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
Rudo Imani,5
Farai Nia,4
Enitan Zuri,4
Yewande Ebele,3
Jengo Tumaini,3
Makena Thabo,3


In [14]:
%%sql
# Convert the Incorrect_records CTE into a SQL VIEW, adding the auditor's statements
CREATE OR REPLACE VIEW Incorrect_records AS (
    SELECT
        auditor_report.location_id AS audit_location,
        visits.record_id,
        employee.employee_name,
        auditor_report.true_water_source_score AS auditor_score,
        water_quality.subjective_quality_score AS surveyor_score,
        auditor_report.statements AS statements
    FROM
        md_water_services.auditor_report
    JOIN
        md_water_services.visits
        ON visits.location_id = auditor_report.location_id
    JOIN
        md_water_services.water_quality
        ON visits.record_id = water_quality.record_id
    JOIN
        md_water_services.employee
        ON visits.assigned_employee_id = employee.assigned_employee_id
    WHERE 
        visits.visit_count = 1 
        AND auditor_report.true_water_source_score != water_quality.subjective_quality_score
);
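
Unlike a CTE, which exists only for the single statement it belongs to, a view is saved in the database, which is why later cells can query `md_water_services.Incorrect_records` directly. A view stores its query rather than a snapshot of the result, so it reflects later changes to the underlying tables. The sketch below demonstrates this in SQLite with a hypothetical `scores` table and placeholder employee names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (employee_name TEXT, auditor_score INTEGER, surveyor_score INTEGER);
INSERT INTO scores VALUES ('Employee A', 3, 10), ('Employee B', 2, 2);

-- The view saves the query itself; it is re-evaluated every time it is read.
CREATE VIEW incorrect_records AS
    SELECT employee_name, auditor_score, surveyor_score
    FROM scores
    WHERE auditor_score != surveyor_score;
""")

before = conn.execute("SELECT COUNT(*) FROM incorrect_records").fetchone()[0]
# A new mismatched row in the base table shows up in the view immediately.
conn.execute("INSERT INTO scores VALUES ('Employee B', 1, 10)")
after = conn.execute("SELECT COUNT(*) FROM incorrect_records").fetchone()[0]
print(before, after)  # 1 2
```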

In [15]:
%sql SELECT * FROM md_water_services.Incorrect_records;

audit_location,record_id,employee_name,auditor_score,surveyor_score,statements
AkRu05215,21160,Rudo Imani,3,10,"Villagers admired the official's visit for its respectful interactions, hard work, and genuine concern."
KiRu29290,7938,Bello Azibo,3,10,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
KiHa22748,43140,Bello Azibo,9,10,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
SoRu37841,18495,Rudo Imani,6,10,"The official's respectful and diligent presence was met with heartfelt appreciation, creating a sense of closeness with the villagers."
KiRu27884,33931,Bello Azibo,1,10,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
KiZu31170,17950,Zuriel Matembo,9,10,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
KiZu31370,36864,Yewande Ebele,3,10,"With a keen understanding of urban challenges, the official's visit left a lasting impression of respect and commitment."
AkRu06495,45924,Bello Azibo,2,10,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
HaRu17528,30524,Jengo Tumaini,1,10,"With humility and diligence, the official formed bonds with the villagers that felt like genuine family connections."
SoRu38331,13192,Zuriel Matembo,3,10,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."


In [16]:
%%sql
# Create a CTE called error_count that counts the mistakes made by each surveyor, then compute the average number of mistakes
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
)
SELECT
    AVG(number_of_mistakes)
FROM 
    error_count;

AVG(number_of_mistakes)
6.0


In [17]:
%%sql
# Find employees who made more than the average no. of mistakes
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
)
SELECT
    employee_name,
    number_of_mistakes
FROM
    error_count
WHERE
    number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count);

employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
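
The suspect-list logic boils down to grouping, counting, and comparing each count against an average computed from the same CTE. The sketch below reproduces that shape in SQLite with hypothetical employees `A`, `B`, and `C`, who made 4, 1, and 1 mistakes respectively (an average of 2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incorrect_records (employee_name TEXT);
-- Hypothetical mistakes: A made 4, B made 1, C made 1.
INSERT INTO incorrect_records VALUES
    ('A'), ('A'), ('A'), ('A'), ('B'), ('C');
""")

# Count mistakes per employee, then keep those above the average count.
suspects = conn.execute("""
    WITH error_count AS (
        SELECT employee_name, COUNT(*) AS number_of_mistakes
        FROM incorrect_records
        GROUP BY employee_name
    )
    SELECT employee_name, number_of_mistakes
    FROM error_count
    WHERE number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
""").fetchall()
print(suspects)  # [('A', 4)]
```

Note that the average is taken only over employees who appear in the incorrect records at all, not over every employee in the `employee` table; including employees with zero mistakes would lower the threshold.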


In [18]:
%%sql
# Retrieve the auditor's statements about the employees on the suspect list
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
),
suspect_list AS (
    SELECT
        employee_name,
        number_of_mistakes
    FROM
        error_count
    WHERE
        number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
)
SELECT 
    employee_name,
    audit_location,
    statements
FROM
    md_water_services.Incorrect_records
WHERE
    employee_name IN (SELECT employee_name FROM suspect_list);

employee_name,audit_location,statements
Bello Azibo,KiRu29290,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
Bello Azibo,KiHa22748,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
Bello Azibo,KiRu27884,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
Zuriel Matembo,KiZu31170,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
Bello Azibo,AkRu06495,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
Zuriel Matembo,SoRu38331,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Zuriel Matembo,AkHa00314,"A street vendor's sales suffer from time spent waiting, her concern for the water's quality affecting her products."
Malachi Mavuso,KiRu26598,"A teenager's dreams are tempered by reality, her future threatened by the corrupt practices she sees around her."
Bello Azibo,KiIs23853,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The mention of cash changing hands further tainted their perception.


In [19]:
%%sql
# Retrieve the suspects' statements for four specific locations of interest
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
),
suspect_list AS (
    SELECT
        employee_name,
        number_of_mistakes
    FROM
        error_count
    WHERE
        number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
)
SELECT 
    employee_name,
    audit_location,
    statements
FROM
    md_water_services.Incorrect_records
WHERE
    employee_name IN (SELECT employee_name FROM suspect_list)
    AND audit_location IN ('AkRu04508', 'AkRu07310', 'KiRu29639', 'AmAm09607');

employee_name,audit_location,statements
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Lalitha Kaburi,AkRu07310,"Villagers spoke of their unsettling encounters with an official who seemed indifferent and uninterested, hinting at potential improprieties involving cash exchanges."
Bello Azibo,KiRu29639,An unsettling atmosphere prevailed as villagers shared stories of an official's arrogance and perceived corruption. The mention of cash exchanges only intensified their concerns.


In [21]:
%%sql
# Check whether any surveyor who is not on the suspect list faces allegations of bribery (statements mentioning cash)
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
),
suspect_list AS (
    SELECT
        employee_name,
        number_of_mistakes
    FROM
        error_count
    WHERE
        number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
)
SELECT 
    employee_name,
    audit_location,
    statements
FROM
    md_water_services.Incorrect_records
WHERE
    statements LIKE '%cash%'
    AND employee_name NOT IN (SELECT employee_name FROM suspect_list);

employee_name,audit_location,statements
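
The empty result above comes from combining a `LIKE` pattern with a `NOT IN` anti-filter. The sketch below shows the same shape in SQLite with hypothetical data: the only statement that mentions cash belongs to an employee on the suspect list, so no rows are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incorrect_records (employee_name TEXT, statements TEXT);
CREATE TABLE suspect_list (employee_name TEXT);
-- Hypothetical data: only the suspect's statement mentions cash.
INSERT INTO incorrect_records VALUES
    ('Suspect', 'The mention of cash exchanges intensified their doubts.'),
    ('Other', 'Villagers admired the official''s respectful interactions.');
INSERT INTO suspect_list VALUES ('Suspect');
""")

# Statements mentioning cash, excluding anyone already on the suspect list.
others_with_cash = conn.execute("""
    SELECT employee_name, statements
    FROM incorrect_records
    WHERE statements LIKE '%cash%'
      AND employee_name NOT IN (SELECT employee_name FROM suspect_list)
""").fetchall()
print(others_with_cash)  # []
```

One caveat of `NOT IN`: if the subquery ever returned a `NULL` `employee_name`, the condition would evaluate to unknown for every non-matching row and the query would return nothing at all. `NOT EXISTS` does not have this pitfall.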


In [27]:
%%sql
# Search all incorrect records for statements that mention suspicion
SELECT 
    employee_name,
    audit_location,
    statements
FROM
    md_water_services.Incorrect_records
WHERE
    statements LIKE '%Suspicion%';

employee_name,audit_location,statements
Bello Azibo,KiIs23853,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The mention of cash changing hands further tainted their perception.
Zuriel Matembo,AkRu05880,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The allusion to cash changing hands deepened their skepticism.
Lalitha Kaburi,KiRu29329,Suspicion colored villagers' descriptions of an official's aloof demeanor and apparent laziness. The reference to cash transactions cast doubt on their motives.
Bello Azibo,KiMr24919,Suspicion and unease colored the villagers' accounts of an official's haughty behavior and potential corruption. The mention of cash changing hands added to their apprehension.
Zuriel Matembo,HaSe20888,Suspicion and unease colored the villagers' accounts of an official's haughty behavior and potential corruption. The mention of cash changing hands added to their apprehension.
Malachi Mavuso,AmRu15719,Suspicion and unease colored the villagers' accounts of an official's haughty behavior and potential corruption. The mention of cash changing hands added to their apprehension.
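
The result above includes a statement that contains only the lowercase word "suspicions", which suggests `LIKE` is comparing case-insensitively here, as it does under MySQL's default case-insensitive collations. The sketch below shows the same behavior in SQLite, whose `LIKE` is case-insensitive for ASCII letters by default, alongside SQLite's case-sensitive `GLOB` operator. The three statements are shortened, paraphrased excerpts of the auditor's data; in MySQL, a case-sensitive match would instead need a case-sensitive or binary collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incorrect_records (statements TEXT);
INSERT INTO incorrect_records VALUES
    ('Suspicion colored villagers'' descriptions of an official.'),
    ('Villagers'' wary accounts raised suspicions.'),
    ('Villagers admired the official''s visit.');
""")

# LIKE ignores ASCII case in SQLite by default, so "suspicions" also matches.
matches = conn.execute(
    "SELECT statements FROM incorrect_records WHERE statements LIKE ?",
    ("%Suspicion%",),
).fetchall()

# GLOB (SQLite-specific) is case-sensitive and uses * as its wildcard.
strict = conn.execute(
    "SELECT statements FROM incorrect_records WHERE statements GLOB '*Suspicion*'"
).fetchall()

print(len(matches), len(strict))  # 2 1
```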
