# Weaving the Data Threads of Maji Ndogo's Narrative

## Introduction

In this third part of the integrated project, we will pull data from many different tables and apply some statistical analyses to examine the consequences of an audit report that cross-references a random sample of records.

## Notebook setup

In [1]:
# Load the sql extension
%load_ext sql

In [2]:
# Create a connection to the mysql 'md_water_services' database
%sql mysql+pymysql://root:password@localhost:3306/md_water_services

## Maji Ndogo Water Services ERD

An audit has been conducted and we want to integrate the audit into the database. For us to integrate it successfully, we need to examine our current ERD of the database thoroughly to understand the relationships between the tables.

![The Maji Ndogo Water Services ERD!](../assets/md_water_services_erd.png)

## ERD Investigative Analysis

From the ERD above, we can see that the `visits` table is the central table connecting the other tables.

- `location_id` is the **PRIMARY KEY** in the `location` table and a **FOREIGN KEY** in the `visits` table.
- `source_id` is the **PRIMARY KEY** in the `water_source` table and a **FOREIGN KEY** in the `visits` table.
- `assigned_employee_id` is the **PRIMARY KEY** in the `employee` table and a **FOREIGN KEY** in the `visits` table.

In a nutshell, the `visits` table logs **multiple** visits, each recording a `location`, the `employee` who made the visit, and the `water_source` of interest. Each of the three tables therefore has a **one-to-many** relationship with the `visits` table.

However, according to the ERD, the relationship between the `visits` table and the `water_quality` table is **one-to-many**, yet by our initial understanding each record in `visits` should have exactly one corresponding record in `water_quality`. This points to a potential error in how the ERD represents the relationship between the two tables, so we need to correct it to **one-to-one**.

![The Updated Maji Ndogo Water Services ERD!](../assets/updated_md_water_services_erd.png)
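
One way to test the one-to-one assumption is to look for any `record_id` that appears more than once in `water_quality`. The sketch below is only illustrative: it uses an in-memory SQLite database with made-up rows (`LocA`, `LocB`) instead of the real MySQL database, and the variable name `repeated_record_ids` is hypothetical.

```python
import sqlite3

# Toy stand-in for the real MySQL database: two tiny tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (record_id INTEGER PRIMARY KEY, location_id TEXT);
    CREATE TABLE water_quality (record_id INTEGER, subjective_quality_score INTEGER);
    INSERT INTO visits VALUES (1, 'LocA'), (2, 'LocA'), (3, 'LocB');
    INSERT INTO water_quality VALUES (1, 10), (2, 7), (3, 4);
""")

# Any record_id that appears more than once in water_quality would mean the
# relationship to visits is one-to-many rather than one-to-one.
repeated_record_ids = conn.execute("""
    SELECT record_id, COUNT(*) AS n
    FROM water_quality
    GROUP BY record_id
    HAVING COUNT(*) > 1
""").fetchall()

print(repeated_record_ids)  # an empty list means no record_id repeats
```

The same `GROUP BY ... HAVING COUNT(*) > 1` query should work unchanged against `md_water_services.water_quality`.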

## Importing the Auditor's Report

Now that we have a proper representation of the relationships in our database, we can proceed to import the data from the auditor's report which is in a `.csv` format. To do this, we need to follow the steps below:

1. Create an empty `auditor_report` table in the `md_water_services` database. To do this we run the following in **MySQL Workbench**:

```sql
DROP TABLE IF EXISTS `auditor_report`;

CREATE TABLE `auditor_report` (
    `location_id` VARCHAR(32),
    `type_of_water_source` VARCHAR(64),
    `true_water_source_score` INT DEFAULT NULL,
    `statements` VARCHAR(255)
);
```

2. [Import](https://www.youtube.com/watch?v=sfRwJH04QJc) the data sent by the auditor in `.csv` format using **MySQL Workbench**. Remember to import into the existing table, since we already created an empty table in the first step.
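
For readers who prefer a scripted load over the Workbench wizard, the two steps above can be sketched in Python. This is a rough illustration under stated assumptions: an in-memory SQLite database stands in for MySQL, and two rows taken from the auditor's report are embedded as a string in place of the real `.csv` file.

```python
import csv
import io
import sqlite3

# Two rows from the auditor's report; a real run would open the .csv file instead.
sample_csv = '''location_id,type_of_water_source,true_water_source_score,statements
SoRu34980,well,1,"Residents admired the official's commitment to enhancing urban life, praising their cooperative and inclusive approach."
AkLu02044,river,0,"Villagers were touched by the official's interactions, noting their humility, strong work ethic, and respectful attitude."
'''

# Step 1: create the empty table (same columns as the MySQL definition above).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE auditor_report (
        location_id VARCHAR(32),
        type_of_water_source VARCHAR(64),
        true_water_source_score INT DEFAULT NULL,
        statements VARCHAR(255)
    )
""")

# Step 2: parse the CSV and insert each row, converting the score to an integer.
reader = csv.DictReader(io.StringIO(sample_csv))
rows = [
    (r["location_id"], r["type_of_water_source"],
     int(r["true_water_source_score"]), r["statements"])
    for r in reader
]
conn.executemany("INSERT INTO auditor_report VALUES (?, ?, ?, ?)", rows)

loaded = conn.execute("SELECT COUNT(*) FROM auditor_report").fetchone()[0]
print(loaded)  # number of rows inserted
```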

## Auditor's Report Integration

After importing the auditor's report into the database, we can see at first glance that it has **1620** records, one for each site the auditor revisited. The report also has the following attributes:

- `location_id` for the revisited locations.
- `type_of_water_source` that was visited by the auditor.
- `true_water_source_score` assigned by the auditor as a measure of water quality.
- `statements` captured while the auditor investigated each site by speaking to locals at the sites.

Based on the auditor's report, we can perform a comparative analysis against the surveyors' records.

## Questions to Answer

Now that we have all our data in one place, let's try and answer the following questions:

1. Is there a difference in scores between the auditor's report and the data provided by the surveyors?
2. If a difference exists, is there a pattern we can identify?

## Investigating Differences in Auditor & Surveyor Scores

To investigate if there are differences between auditor and surveyor water quality scores, we will have to perform some joins. Consider the following:

- The `auditor_report` table in our database has a `location_id` attribute, but the `water_quality` table only has a `record_id` attribute, which coincides with the attribute of the same name in the `visits` table.
- The `visits` table has both `location_id` and `record_id` attributes, so it is the right table to link the `auditor_report` and `water_quality` tables.

In [3]:
%sql SELECT * FROM md_water_services.auditor_report;

location_id,type_of_water_source,true_water_source_score,statements
SoRu34980,well,1,"Residents admired the official's commitment to enhancing urban life, praising their cooperative and inclusive approach."
AkRu08112,well,3,"Villagers spoke highly of the official's dedication and genuine interest in their lives, fostering a sense of belonging and appreciation."
AkLu02044,river,0,"Villagers were touched by the official's interactions, noting their humility, strong work ethic, and respectful attitude."
AkHa00421,well,3,"Villagers were moved by the official's visit, praising their hard work, humility, and the profound sense of connection they fostered."
SoRu35221,river,0,"A photographer's lens captures the queue, though his own struggle for water is a hidden part of the story."
HaAm16170,well,1,"With an open heart, the official created an atmosphere of unity and familial camaraderie among the villagers."
AkRu04812,well,3,"The official's presence left an indelible mark, reflecting their humility, dedication, and the genuine connections they nurtured."
AkRu08304,well,3,"The official's interactions resonated deeply with the villagers, leaving a lasting impression of respect and camaraderie."
AkRu05107,well,2,"Villagers spoke highly of the official's dedication and genuine interest in their lives, fostering a sense of belonging and appreciation."
AkRu05215,well,3,"Villagers admired the official's visit for its respectful interactions, hard work, and genuine concern."


The output from the cell above confirms that the auditor's report contains a total of **1620** records. These are the unique locations that the auditor revisited and rescored for water quality.
With that out of the way, there are a couple of steps we need to take to build the right SQL query to get the information we need from the various tables. Let's start with the following:

1. Grab the `location_id` and `true_water_source_score` attributes from the `auditor_report` entity.
2. **JOIN** the `visits` entity to the `auditor_report` on the shared `location_id` attribute to get access to the `record_id` attribute.
3. **JOIN** the `water_quality` entity on the `record_id` attribute to retrieve the corresponding `subjective_quality_score` attribute in the `water_quality` entity.
4. Clean the resulting table by removing redundant columns and renaming the score columns as `surveyor_score` and `auditor_score` respectively.
5. Compare the `surveyor_score` and `auditor_score` using a `WHERE` clause, starting with the records where the two scores agree.

In [4]:
%%sql
# Retrieve the records where the auditor's score matches the surveyor's score
SELECT
    auditor_report.location_id AS audit_location,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score
FROM
    md_water_services.auditor_report
JOIN
    md_water_services.visits
    ON visits.location_id = auditor_report.location_id
JOIN
    md_water_services.water_quality
    ON visits.record_id = water_quality.record_id
WHERE auditor_report.true_water_source_score = water_quality.subjective_quality_score;

audit_location,record_id,auditor_score,surveyor_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2


We get a result of **2505** records, even though the auditor's report has only **1620** records. This is because some locations in the `visits` entity were visited more than once, so each repeat visit adds another row for the same location to our resulting dataset. To remove these repeated rows, we keep only the row where `visit_count` is 1 for each location (its first recorded visit) by adding that condition to the `WHERE` clause.
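
The row multiplication described above can be reproduced on a toy example. The sketch below assumes an in-memory SQLite database with made-up locations (`LocA`, `LocB`); only the table and column names come from the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auditor_report (location_id TEXT, true_water_source_score INTEGER);
    CREATE TABLE visits (record_id INTEGER, location_id TEXT, visit_count INTEGER);
    INSERT INTO auditor_report VALUES ('LocA', 3), ('LocB', 1);
    -- LocA was visited three times, LocB once.
    INSERT INTO visits VALUES (1, 'LocA', 1), (2, 'LocA', 2), (3, 'LocA', 3), (4, 'LocB', 1);
""")

join_sql = """
    SELECT COUNT(*)
    FROM auditor_report
    JOIN visits ON visits.location_id = auditor_report.location_id
"""

# Without a filter, every visit to an audited location produces its own row.
all_visits = conn.execute(join_sql).fetchone()[0]

# Filtering on visit_count = 1 leaves one row per audited location.
first_visits = conn.execute(join_sql + " WHERE visits.visit_count = 1").fetchone()[0]

print(all_visits, first_visits)  # 4 2
```

Two audited locations produce four joined rows until the `visit_count = 1` filter brings the count back to two, which mirrors the 2505 versus 1620 gap.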

In [5]:
%%sql
# Keep only one record per location by filtering on visit_count = 1
SELECT
    auditor_report.location_id AS audit_location,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score
FROM
    md_water_services.auditor_report
JOIN
    md_water_services.visits
    ON visits.location_id = auditor_report.location_id
JOIN
    md_water_services.water_quality
    ON visits.record_id = water_quality.record_id
WHERE 
    visits.visit_count = 1 
    AND auditor_report.true_water_source_score = water_quality.subjective_quality_score;

audit_location,record_id,auditor_score,surveyor_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2


Once we've filtered out the duplicate records, we now have a clear view of **1518** surveyor records that match those of the auditor. We can calculate the approximate percentage of accurate scores recorded by the surveyors per the auditor's report. 

In [6]:
# Calculate the percentage of surveyor scores that were correct according to the auditor
print(f"Approximately {round((1518 / 1620) * 100)}% of surveyors' scores were correct according to the auditor's records")

Approximately 94% of surveyors' scores were correct according to the auditor's records
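
The same arithmetic can be written with named values instead of hard-coded numbers, which also makes the size of the remainder explicit. This is a small sketch; the variable names are illustrative and the two counts are the ones reported above.

```python
total_audited = 1620      # records in the auditor's report
matching_scores = 1518    # records where auditor and surveyor scores agree
mismatched_scores = total_audited - matching_scores

match_pct = matching_scores / total_audited * 100
mismatch_pct = mismatched_scores / total_audited * 100

print(f"{match_pct:.1f}% match, {mismatch_pct:.1f}% differ ({mismatched_scores} records)")
# 93.7% match, 6.3% differ (102 records)
```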


We now know that a large share of the surveyors' scores were accurate, but what about the remaining ~6%? Let's have a look at those.

## Investigating Difference Patterns in Auditor & Surveyor Scores

We filter the resulting dataset to display the records where the `auditor_score` differed from the `surveyor_score`. In our previous analysis, we did not focus on water quality scores but investigated various types of water sources. 
Let's also include the `type_of_water_source` attribute from the `water_source` entity by **JOIN**ing that entity to our resulting dataset on the `source_id` attribute, just to make sure there are no discrepancies in that area.

In [7]:
%%sql
# Let's check records that were incorrect including the water sources information
SELECT
    auditor_report.location_id AS audit_location,
    auditor_report.type_of_water_source AS auditor_source,
    water_source.type_of_water_source AS surveyor_source,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score
FROM
    md_water_services.auditor_report
JOIN
    md_water_services.visits
    ON visits.location_id = auditor_report.location_id
JOIN
    md_water_services.water_quality
    ON visits.record_id = water_quality.record_id
JOIN
    md_water_services.water_source
    ON visits.source_id = water_source.source_id
WHERE 
    visits.visit_count = 1 
    AND auditor_report.true_water_source_score != water_quality.subjective_quality_score;

audit_location,auditor_source,surveyor_source,record_id,auditor_score,surveyor_score
AkRu05215,well,well,21160,3,10
KiRu29290,shared_tap,shared_tap,7938,3,10
KiHa22748,tap_in_home_broken,tap_in_home_broken,43140,9,10
SoRu37841,shared_tap,shared_tap,18495,6,10
KiRu27884,well,well,33931,1,10
KiZu31170,tap_in_home_broken,tap_in_home_broken,17950,9,10
KiZu31370,shared_tap,shared_tap,36864,3,10
AkRu06495,well,well,45924,2,10
HaRu17528,well,well,30524,1,10
SoRu38331,shared_tap,shared_tap,13192,3,10


As we can see from the resulting dataset, there is a total of **102** records. We can also see that there are no differences between the types of water sources in the auditor's report and the surveyors' records being compared, so we are safe to proceed with our investigation on that account.
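
Rather than comparing the two source columns by eye, a count of mismatching rows gives a direct answer. The sketch below is a toy version under stated assumptions: `compared` is a hypothetical table holding three rows from the output above, stored in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE compared (audit_location TEXT, auditor_source TEXT, surveyor_source TEXT);
    INSERT INTO compared VALUES
        ('AkRu05215', 'well', 'well'),
        ('KiRu29290', 'shared_tap', 'shared_tap'),
        ('KiHa22748', 'tap_in_home_broken', 'tap_in_home_broken');
""")

# Count the rows where the auditor and the surveyor recorded different source types.
source_mismatches = conn.execute(
    "SELECT COUNT(*) FROM compared WHERE auditor_source != surveyor_source"
).fetchone()[0]

print(source_mismatches)  # 0
```

Against the real database, the equivalent check would add `auditor_report.type_of_water_source != water_source.type_of_water_source` to the `WHERE` clause of the query above.
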
Moving forward, our SQL query will get more complex, so let's see how we can make our lives easier by using **Common Table Expressions (CTEs)**. To do this, we wrap the query we have in a **CTE** and name the expression `Incorrect_records`.
This allows us to break our query down into simpler, more digestible steps. It also enables us to use the referenced query as if it were a table.

> **NOTE:** **CTEs** are not actual tables hence, they do **NOT** store data.
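
The note above can be demonstrated directly: a CTE is visible only inside the statement that defines it. The sketch below assumes an in-memory SQLite database and a single illustrative row; the same scoping rule applies in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Inside a single statement, the CTE can be queried like a table.
inside = conn.execute(
    "WITH Incorrect_records AS (SELECT 'AkRu05215' AS audit_location) "
    "SELECT audit_location FROM Incorrect_records"
).fetchall()

# In the next statement the name no longer exists, because nothing was stored.
try:
    conn.execute("SELECT audit_location FROM Incorrect_records")
    cte_survived = True
except sqlite3.OperationalError:
    cte_survived = False

print(inside, cte_survived)  # [('AkRu05215',)] False
```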

While we're at it, let us add the `employee` entity and **JOIN** it to our resulting dataset on the `assigned_employee_id` attribute so we can have a first look at the surveyors who recorded inaccurate scores per the auditor's report.

In [8]:
%%sql
# Create a CTE of employees/surveyors responsible for the erroneous scores
WITH Incorrect_records AS (
    SELECT
        auditor_report.location_id AS audit_location,
        visits.record_id,
        employee.employee_name,
        auditor_report.true_water_source_score AS auditor_score,
        water_quality.subjective_quality_score AS surveyor_score
    FROM
        md_water_services.auditor_report
    JOIN
        md_water_services.visits
        ON visits.location_id = auditor_report.location_id
    JOIN
        md_water_services.water_quality
        ON visits.record_id = water_quality.record_id
    JOIN
        md_water_services.employee
        ON visits.assigned_employee_id = employee.assigned_employee_id
    WHERE 
        visits.visit_count = 1 
        AND auditor_report.true_water_source_score != water_quality.subjective_quality_score
)
SELECT *
FROM Incorrect_records;

audit_location,record_id,employee_name,auditor_score,surveyor_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


Our `Incorrect_records` **CTE** works. We notice from the output above that some surveyor names repeat across the records, which means that some surveyors recorded inaccurate scores more than once. First, let's take a look at the unique surveyors who recorded inaccurate scores.

In [9]:
%%sql
# Get the names of the surveyors responsible for the erroneous data
WITH Incorrect_records AS (
    SELECT
        auditor_report.location_id AS audit_location,
        visits.record_id,
        employee.employee_name,
        auditor_report.true_water_source_score AS auditor_score,
        water_quality.subjective_quality_score AS surveyor_score
    FROM
        md_water_services.auditor_report
    JOIN
        md_water_services.visits
        ON visits.location_id = auditor_report.location_id
    JOIN
        md_water_services.water_quality
        ON visits.record_id = water_quality.record_id
    JOIN
        md_water_services.employee
        ON visits.assigned_employee_id = employee.assigned_employee_id
    WHERE 
        visits.visit_count = 1 
        AND auditor_report.true_water_source_score != water_quality.subjective_quality_score
)
SELECT DISTINCT employee_name
FROM Incorrect_records;

employee_name
Rudo Imani
Bello Azibo
Zuriel Matembo
Yewande Ebele
Jengo Tumaini
Farai Nia
Malachi Mavuso
Makena Thabo
Lalitha Kaburi
Gamba Shani


**17** surveyors have erroneous records according to the output above. We take our analysis further by counting the number of times each surveyor recorded inaccurate scores, using the `COUNT` function and grouping the results by the `employee_name` attribute.
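
Both the distinct count and the per-surveyor tally can be obtained from queries instead of counting output rows by hand. The sketch below is a toy illustration: it loads only the ten preview rows of `Incorrect_records` shown earlier into a hypothetical `incorrect_preview` table in an in-memory SQLite database, so its numbers are for those ten rows and not for the full dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incorrect_preview (audit_location TEXT, employee_name TEXT)")
conn.executemany(
    "INSERT INTO incorrect_preview VALUES (?, ?)",
    [
        ("AkRu05215", "Rudo Imani"),
        ("KiRu29290", "Bello Azibo"),
        ("KiHa22748", "Bello Azibo"),
        ("SoRu37841", "Rudo Imani"),
        ("KiRu27884", "Bello Azibo"),
        ("KiZu31170", "Zuriel Matembo"),
        ("KiZu31370", "Yewande Ebele"),
        ("AkRu06495", "Bello Azibo"),
        ("HaRu17528", "Jengo Tumaini"),
        ("SoRu38331", "Zuriel Matembo"),
    ],
)

# COUNT(DISTINCT ...) returns the number of unique surveyors in one value.
distinct_surveyors = conn.execute(
    "SELECT COUNT(DISTINCT employee_name) FROM incorrect_preview"
).fetchone()[0]

# GROUP BY with COUNT tallies the rows recorded by each surveyor.
mistakes = conn.execute("""
    SELECT employee_name, COUNT(*) AS number_of_mistakes
    FROM incorrect_preview
    GROUP BY employee_name
    ORDER BY number_of_mistakes DESC, employee_name
""").fetchall()

print(distinct_surveyors)  # 5
print(mistakes)
```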

In [10]:
%%sql
# Count the number of mistakes made by each employee/surveyor responsible for erroneous scores
WITH Incorrect_records AS (
    SELECT
        auditor_report.location_id AS audit_location,
        visits.record_id,
        employee.employee_name,
        auditor_report.true_water_source_score AS auditor_score,
        water_quality.subjective_quality_score AS surveyor_score
    FROM
        md_water_services.auditor_report
    JOIN
        md_water_services.visits
        ON visits.location_id = auditor_report.location_id
    JOIN
        md_water_services.water_quality
        ON visits.record_id = water_quality.record_id
    JOIN
        md_water_services.employee
        ON visits.assigned_employee_id = employee.assigned_employee_id
    WHERE 
        visits.visit_count = 1 
        AND auditor_report.true_water_source_score != water_quality.subjective_quality_score
)
SELECT
    employee_name,
    COUNT(employee_name) AS number_of_mistakes
FROM 
    Incorrect_records
GROUP BY
    employee_name
ORDER BY number_of_mistakes DESC;

employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
Rudo Imani,5
Farai Nia,4
Enitan Zuri,4
Yewande Ebele,3
Jengo Tumaini,3
Makena Thabo,3


We can see that **Bello Azibo** had the highest number of mistakes at **26**, followed by **Malachi Mavuso** at **21**. This might be due to genuine errors or to malice, but we don't know for sure. We need to investigate further to find the root cause of the erroneous input from our surveyors.
To make our work even simpler, let's transform the `Incorrect_records` **CTE** into a SQL **VIEW** and add the `statements` attribute from the `auditor_report` entity while we're at it.

> **NOTE:** Run the cell below only once and then comment out the SQL query, since the view only needs to be created once. If you restart the notebook and run the cell again, you will encounter an error because the VIEW already exists in the database.
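
An alternative to commenting the cell out is to make the view creation safe to re-run by dropping the view first. The sketch below assumes an in-memory SQLite database and a hypothetical `scores` table; `DROP VIEW IF EXISTS` is also valid in MySQL, which additionally offers `CREATE OR REPLACE VIEW`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (auditor_score INTEGER, surveyor_score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [(3, 10), (2, 2)])

def create_view(connection):
    # Dropping the view first makes this function safe to call repeatedly.
    connection.execute("DROP VIEW IF EXISTS Incorrect_records")
    connection.execute(
        "CREATE VIEW Incorrect_records AS "
        "SELECT * FROM scores WHERE auditor_score != surveyor_score"
    )

create_view(conn)
create_view(conn)  # the second run does not raise an error

view_rows = conn.execute("SELECT * FROM Incorrect_records").fetchall()
print(view_rows)  # [(3, 10)]
```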

In [11]:
# %%sql
# Change the Incorrect_records CTE to a SQL VIEW
# CREATE VIEW Incorrect_records AS (
#     SELECT
#         auditor_report.location_id AS audit_location,
#         visits.record_id,
#         employee.employee_name,
#         auditor_report.true_water_source_score AS auditor_score,
#         water_quality.subjective_quality_score AS surveyor_score,
#         auditor_report.statements AS statements
#     FROM
#         md_water_services.auditor_report
#     JOIN
#         md_water_services.visits
#         ON visits.location_id = auditor_report.location_id
#     JOIN
#         md_water_services.water_quality
#         ON visits.record_id = water_quality.record_id
#     JOIN
#         md_water_services.employee
#         ON visits.assigned_employee_id = employee.assigned_employee_id
#     WHERE 
#         visits.visit_count = 1 
#         AND auditor_report.true_water_source_score != water_quality.subjective_quality_score
# );

Once we've transformed the **CTE** into a **VIEW**, we can test it out to see if our view works as expected.

In [12]:
%sql SELECT * FROM md_water_services.Incorrect_records;

audit_location,record_id,employee_name,auditor_score,surveyor_score,statements
AkRu05215,21160,Rudo Imani,3,10,"Villagers admired the official's visit for its respectful interactions, hard work, and genuine concern."
KiRu29290,7938,Bello Azibo,3,10,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
KiHa22748,43140,Bello Azibo,9,10,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
SoRu37841,18495,Rudo Imani,6,10,"The official's respectful and diligent presence was met with heartfelt appreciation, creating a sense of closeness with the villagers."
KiRu27884,33931,Bello Azibo,1,10,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
KiZu31170,17950,Zuriel Matembo,9,10,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
KiZu31370,36864,Yewande Ebele,3,10,"With a keen understanding of urban challenges, the official's visit left a lasting impression of respect and commitment."
AkRu06495,45924,Bello Azibo,2,10,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
HaRu17528,30524,Jengo Tumaini,1,10,"With humility and diligence, the official formed bonds with the villagers that felt like genuine family connections."
SoRu38331,13192,Zuriel Matembo,3,10,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."


Our **VIEW** works perfectly. Let's go ahead and transform the query to calculate the number of errors made by each employee into a **CTE** called `error_count` and use it to compute how many mistakes were made by the **17** surveyors on average using the `AVG` function.

In [13]:
%%sql
# Create a CTE called error count to count the number of errors made by surveyors and compute average no. of mistakes
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
)
SELECT
    AVG(number_of_mistakes)
FROM 
    error_count;

AVG(number_of_mistakes)
6.0
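
As a sanity check, this average agrees with figures reported earlier: 102 incorrect records spread over 17 surveyors. The sketch below also shows, using only the three largest counts from the earlier output, how far those three pull the mean upward; the variable names are illustrative.

```python
incorrect_records = 102      # rows where the two scores differ
surveyors_with_errors = 17   # distinct surveyors with at least one such row

average_mistakes = incorrect_records / surveyors_with_errors
print(average_mistakes)  # 6.0

# The three largest counts reported earlier (Bello Azibo, Malachi Mavuso, Zuriel Matembo).
top_three = [26, 21, 17]
rest_average = (incorrect_records - sum(top_three)) / (surveyors_with_errors - len(top_three))
print(round(rest_average, 2))  # 2.71
```

Without the top three, the remaining 14 surveyors average fewer than three mistakes each, so the mean of 6 is driven largely by a few individuals.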


From the output above, we see that on average each of these surveyors made **6** mistakes. Let's find the employees who made a significant number of errors by keeping only those with more mistakes than the average. We can use a subquery in the `WHERE` clause for this.

In [14]:
%%sql
# Find employees who made more than the average no. of mistakes
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
)
SELECT
    employee_name,
    number_of_mistakes
FROM
    error_count
WHERE
    number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count);

employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7


Now we are left with only 4 surveyors who made an above-average number of mistakes. Since we have the statements recorded by the auditor during the visits to the various sites, we can look at what locals had to say about these surveyors. First, let's wrap the above-average query in a **CTE** called `suspect_list`, then use it in a subquery in the `WHERE` clause, as demonstrated in the code cell below.

In [15]:
%%sql
# Retrieve statements about the above suspect list
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
),
suspect_list AS (
    SELECT
        employee_name,
        number_of_mistakes
    FROM
        error_count
    WHERE
        number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
)
SELECT 
    employee_name,
    audit_location,
    statements
FROM
    md_water_services.Incorrect_records
WHERE
    employee_name IN (SELECT employee_name FROM suspect_list);

employee_name,audit_location,statements
Bello Azibo,KiRu29290,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
Bello Azibo,KiHa22748,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
Bello Azibo,KiRu27884,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
Zuriel Matembo,KiZu31170,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
Bello Azibo,AkRu06495,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
Zuriel Matembo,SoRu38331,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Zuriel Matembo,AkHa00314,"A street vendor's sales suffer from time spent waiting, her concern for the water's quality affecting her products."
Malachi Mavuso,KiRu26598,"A teenager's dreams are tempered by reality, her future threatened by the corrupt practices she sees around her."
Bello Azibo,KiIs23853,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The mention of cash changing hands further tainted their perception.


From our resulting dataset, we can see some statements alluding to surveyor malpractice, ranging from corruption to arrogance. Let's zoom in on specific audit locations using the `audit_location` column of the resulting dataset and see the kinds of statements the auditor recorded from locals.

In [16]:
%%sql
# Retrieve statements about the suspect list at specific audit locations
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
),
suspect_list AS (
    SELECT
        employee_name,
        number_of_mistakes
    FROM
        error_count
    WHERE
        number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
)
SELECT 
    employee_name,
    audit_location,
    statements
FROM
    md_water_services.Incorrect_records
WHERE
    employee_name IN (SELECT employee_name FROM suspect_list)
    AND audit_location IN ("AkRu04508", "AkRu07310", "KiRu29639", "AmAm09607");

employee_name,audit_location,statements
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Lalitha Kaburi,AkRu07310,"Villagers spoke of their unsettling encounters with an official who seemed indifferent and uninterested, hinting at potential improprieties involving cash exchanges."
Bello Azibo,KiRu29639,An unsettling atmosphere prevailed as villagers shared stories of an official's arrogance and perceived corruption. The mention of cash exchanges only intensified their concerns.


We've uncovered that the surveyors in our `suspect_list` are linked to several malpractices, with some statements pointing to corruption and bribery. Let's check whether any surveyor who is not in our `suspect_list` has similar statements recorded against them.

In [17]:
%%sql
# Check for any other surveyor who is not in the suspect list with allegations of bribery
WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        md_water_services.Incorrect_records
    GROUP BY
        employee_name
    ORDER BY number_of_mistakes DESC
),
suspect_list AS (
    SELECT
        employee_name,
        number_of_mistakes
    FROM
        error_count
    WHERE
        number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
)
SELECT 
    employee_name,
    audit_location,
    statements
FROM
    md_water_services.Incorrect_records
WHERE
    statements LIKE "%cash%"
    AND employee_name NOT IN (SELECT employee_name FROM suspect_list);

employee_name,audit_location,statements


From the empty output above, no surveyor outside the `suspect_list` has a statement mentioning cash, so the statements do not implicate anyone else in bribery or corruption.
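
The `LIKE "%cash%"` filter catches only one phrasing, while some of the statements shown earlier use words such as "corrupt" instead. A broader keyword scan could be sketched as follows. The keyword list is an assumption made for illustration and is not part of the original analysis; the three statements are copied from the outputs above.

```python
# Three statements copied from the earlier outputs, keyed by audit location.
statements = {
    "KiHa22748": "A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system.",
    "AmAm09607": "Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.",
    "KiRu29290": "A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water.",
}

# Assumed keyword stems; "brib" is meant to match both "bribe" and "bribery".
keywords = ("cash", "corrupt", "brib")

def flagged_keywords(text):
    """Return the keyword stems that occur in the text, ignoring case."""
    lowered = text.lower()
    return [k for k in keywords if k in lowered]

flags = {location: flagged_keywords(text) for location, text in statements.items()}
print(flags)
```

In SQL, the same idea would be several `LIKE` conditions combined with `OR`.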

## Conclusion

So we can sum up the evidence we have for Zuriel Matembo, Malachi Mavuso, Bello Azibo and Lalitha Kaburi:

1. They all made more mistakes than their peers on average.
2. They all have incriminating statements made against them, and only them.

Keep in mind that this is not decisive proof.