# Integrated project 3
Weaving the data threads of Maji Ndogo's narrative
What will we be doing?
1. Deal with some of the realities faced in many countries. 
2. Draw from different data sources to deepen the analysis into Maji Ndogo’s crisis.
3. Use advanced SQL tools to assemble the pieces of an audit together. 

This part of the integrated project focuses on integrating and comparing an independent auditor's report with the records in the already existing database to check for accuracy.

First, we have to load our database.

In [1]:
%reload_ext sql

In [2]:
%sql mysql+pymysql://root:<password>@localhost:3306/md_water_services

We will check that we are connected to the database by retrieving records from one of its tables, but first let's create a table into which we can import the auditor's report.

In [3]:
%%sql
DROP TABLE IF EXISTS `auditor_report`;

CREATE TABLE 
    `auditor_report` (
    `location_id` VARCHAR(32),
    `type_of_water_source` VARCHAR(64),
    `true_water_source_score` int DEFAULT NULL,
    `statements` VARCHAR(255)
);

The table has been created and the report imported using MySQL Workbench. To confirm that the connection works, let's retrieve records from the employee table.

In [4]:
%%sql
SELECT
    *
FROM
    employee;

assigned_employee_id,employee_name,phone_number,email,address,province_name,town_name,position
0,Amara Jengo,99637993287,amara.jengo@ndogowater.gov,36 Pwani Mchangani Road,Sokoto,Ilanga,Field Surveyor
1,Bello Azibo,99643864786,bello.azibo@ndogowater.gov,129 Ziwa La Kioo Road,Kilimani,Rural,Field Surveyor
2,Bakari Iniko,99222599041,bakari.iniko@ndogowater.gov,18 Mlima Tazama Avenue,Hawassa,Rural,Field Surveyor
3,Malachi Mavuso,99945849900,malachi.mavuso@ndogowater.gov,100 Mogadishu Road,Akatsi,Lusaka,Field Surveyor
4,Cheche Buhle,99381679640,cheche.buhle@ndogowater.gov,1 Savanna Street,Akatsi,Rural,Field Surveyor
5,Zuriel Matembo,99034075111,zuriel.matembo@ndogowater.gov,26 Bahari Ya Faraja Road,Kilimani,Rural,Field Surveyor
6,Deka Osumare,99379364631,deka.osumare@ndogowater.gov,104 Kenyatta Street,Akatsi,Rural,Field Surveyor
7,Lalitha Kaburi,99681623240,lalitha.kaburi@ndogowater.gov,145 Sungura Amanpour Road,Kilimani,Rural,Field Surveyor
8,Enitan Zuri,99248509202,enitan.zuri@ndogowater.gov,117 Kampala Road,Hawassa,Zanzibar,Field Surveyor
10,Farai Nia,99570082739,farai.nia@ndogowater.gov,33 Angélique Kidjo Avenue,Amanzi,Dahabu,Field Surveyor


# Integrating the report
The first part of the assignment is to add the auditor's report to our database.
To do this, we need to tackle a couple of questions:
1. Is there a difference in the scores?
2. If so, are there patterns?

For the first question, we will have to compare the quality scores in the water_quality table to the auditor's scores. The auditor_report table uses location_id, but the water_quality table only has a record_id we can use. The visits table contains both location_id and record_id, so we can link auditor_report and water_quality through the visits table.

So first, grab the location_id and true_water_source_score columns from auditor_report.

In [6]:
%%sql
SELECT
    location_id,
    true_water_source_score
FROM
    auditor_report
ORDER BY
    location_id;

location_id,true_water_source_score
AkHa00008,3
AkHa00053,9
AkHa00058,3
AkHa00068,3
AkHa00073,3
AkHa00088,1
AkHa00113,3
AkHa00168,9
AkHa00172,2
AkHa00193,9


Now, we join the visits table to the auditor_report table. Make sure to grab record_id and location_id. The subjective_quality_score lives in the water_quality table, so it is added in the step after this one.

In [7]:
%%sql
SELECT
    auditor_report.location_id AS audit_location,
    auditor_report.true_water_source_score,
    visits.location_id AS visit_location,
    visits.record_id
FROM
    auditor_report
JOIN
    visits
ON
    auditor_report.location_id = visits.location_id;

audit_location,true_water_source_score,visit_location,record_id
SoRu34980,1,SoRu34980,5185
AkRu08112,3,AkRu08112,59367
AkLu02044,0,AkLu02044,37379
AkHa00421,3,AkHa00421,51627
SoRu35221,0,SoRu35221,28758
HaAm16170,1,HaAm16170,31048
AkRu04812,3,AkRu04812,1513
AkRu08304,3,AkRu08304,1218
AkRu05107,2,AkRu05107,8322
AkRu05215,3,AkRu05215,21160


Now that we have the record_id for each location, our next step is to retrieve the corresponding scores from the water_quality table. We are particularly interested in the subjective_quality_score. To do this, we'll JOIN the visits table and the water_quality table, using the record_id as the connecting key.

In [8]:
%%sql
SELECT
    auditor_report.location_id AS audit_location,
    auditor_report.true_water_source_score,
    visits.location_id AS visit_location,
    visits.record_id,
    water_quality.subjective_quality_score
FROM
    auditor_report
JOIN
    visits
ON
    auditor_report.location_id = visits.location_id
JOIN
    water_quality
ON
    visits.record_id=water_quality.record_id;

audit_location,true_water_source_score,visit_location,record_id,subjective_quality_score
SoRu34980,1,SoRu34980,5185,1
AkRu08112,3,AkRu08112,59367,3
AkLu02044,0,AkLu02044,37379,0
AkHa00421,3,AkHa00421,51627,3
SoRu35221,0,SoRu35221,28758,0
HaAm16170,1,HaAm16170,31048,1
AkRu04812,3,AkRu04812,1513,3
AkRu08304,3,AkRu08304,1218,3
AkRu05107,2,AkRu05107,8322,2
AkRu05215,3,AkRu05215,21160,10


Since it is a duplicate, we can drop one of the location_id columns. Let's leave record_id and rename the scores to surveyor_score and auditor_score to make it clear which scores we're looking at in the results set.

In [9]:
%%sql
SELECT
    auditor_report.location_id AS location_id,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score   
FROM
    auditor_report
JOIN
    visits
ON
    auditor_report.location_id = visits.location_id
JOIN
    water_quality
ON
    visits.record_id=water_quality.record_id;

location_id,record_id,auditor_score,surveyor_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
AkRu05215,21160,3,10
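
To see the join chain in isolation, here is a minimal Python sketch that runs the same kind of query against an in-memory SQLite database rather than MySQL. The three toy tables reuse the column names from above, but the rows (LocA, LocB, record IDs 101 and 102) are invented purely for illustration.

```python
import sqlite3

# Toy stand-in for md_water_services: same column names, invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auditor_report (location_id TEXT, true_water_source_score INTEGER);
    CREATE TABLE visits (record_id INTEGER, location_id TEXT);
    CREATE TABLE water_quality (record_id INTEGER, subjective_quality_score INTEGER);

    INSERT INTO auditor_report VALUES ('LocA', 3), ('LocB', 1);
    INSERT INTO visits VALUES (101, 'LocA'), (102, 'LocB');
    INSERT INTO water_quality VALUES (101, 10), (102, 1);
""")

# visits is the bridge table: location_id links it to auditor_report,
# and record_id links it to water_quality.
joined = conn.execute("""
    SELECT
        auditor_report.location_id,
        visits.record_id,
        auditor_report.true_water_source_score AS auditor_score,
        water_quality.subjective_quality_score AS surveyor_score
    FROM auditor_report
    JOIN visits ON auditor_report.location_id = visits.location_id
    JOIN water_quality ON visits.record_id = water_quality.record_id
    ORDER BY visits.record_id
""").fetchall()

for row in joined:
    print(row)
```

Each printed tuple pairs one location's auditor score with the surveyor score for the same visit record, which is the shape of the result set above.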


In [11]:
%config SqlMagic.displaylimit = None

Now, let's try to determine if there is a difference in the scores. Note that some of the sites were visited multiple times, so there can be duplicates. We keep only the first visit to each site by filtering on visit_count = 1.

In [12]:
%%sql
SELECT
    auditor_report.location_id AS location_id,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score   
FROM
    auditor_report
JOIN
    visits
ON
    auditor_report.location_id = visits.location_id
JOIN
    water_quality
ON
    visits.record_id=water_quality.record_id
WHERE
    auditor_report.true_water_source_score=water_quality.subjective_quality_score
AND
    visits.visit_count=1;

location_id,record_id,auditor_score,surveyor_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2
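
The effect of the visit_count filter is easier to see on a tiny example. The sketch below again assumes an in-memory SQLite database with invented rows, in which the hypothetical location LocA was visited twice. Without the filter the join returns one row per visit, so LocA is counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auditor_report (location_id TEXT, true_water_source_score INTEGER);
    CREATE TABLE visits (record_id INTEGER, location_id TEXT, visit_count INTEGER);
    CREATE TABLE water_quality (record_id INTEGER, subjective_quality_score INTEGER);

    INSERT INTO auditor_report VALUES ('LocA', 3), ('LocB', 1);
    -- LocA was visited twice, so it has two visit records.
    INSERT INTO visits VALUES (101, 'LocA', 1), (102, 'LocA', 2), (103, 'LocB', 1);
    INSERT INTO water_quality VALUES (101, 3), (102, 3), (103, 1);
""")

base_query = """
    SELECT auditor_report.location_id, visits.record_id
    FROM auditor_report
    JOIN visits ON auditor_report.location_id = visits.location_id
    JOIN water_quality ON visits.record_id = water_quality.record_id
    WHERE auditor_report.true_water_source_score = water_quality.subjective_quality_score
"""

# Every matching visit, including repeat visits to the same location.
all_matches = conn.execute(base_query + " ORDER BY visits.record_id").fetchall()

# Only the first visit to each location.
first_visits = conn.execute(
    base_query + " AND visits.visit_count = 1 ORDER BY visits.record_id"
).fetchall()

print(len(all_matches), "rows without the filter")
print(len(first_visits), "rows with visit_count = 1")
```

With the filter in place, each audited location contributes at most one row, so the row count can be compared directly with the number of sites the auditor visited.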


With the duplicates removed I now get 1518. What does this mean considering the auditor visited 1620 sites?

I think that is an excellent result. 1518/1620 * 100 = 94% of the records the auditor checked were correct!
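
The arithmetic can be checked quickly in Python. The two input numbers come straight from the text above; the variable names are only illustrative.

```python
audited_sites = 1620     # sites the auditor visited
matching_records = 1518  # records where auditor and surveyor scores agree

incorrect_records = audited_sites - matching_records
accuracy_pct = matching_records / audited_sites * 100

print(f"incorrect records: {incorrect_records}")
print(f"accuracy: {accuracy_pct:.1f}%")
```

The exact figure is a little under 94%, which rounds to the 94% quoted above.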

But that means that 102 records are incorrect. So let's look at those. This can be done by adding one character to the last query: the = in the score comparison becomes !=.

In [13]:
%%sql
SELECT
    auditor_report.location_id AS location_id,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score   
FROM
    auditor_report
JOIN
    visits
ON
    auditor_report.location_id = visits.location_id
JOIN
    water_quality
ON
    visits.record_id=water_quality.record_id
WHERE
    auditor_report.true_water_source_score!=water_quality.subjective_quality_score
AND
    visits.visit_count=1;

location_id,record_id,auditor_score,surveyor_score
AkRu05215,21160,3,10
KiRu29290,7938,3,10
KiHa22748,43140,9,10
SoRu37841,18495,6,10
KiRu27884,33931,1,10
KiZu31170,17950,9,10
KiZu31370,36864,3,10
AkRu06495,45924,2,10
HaRu17528,30524,1,10
SoRu38331,13192,3,10


Since we used some of this data in our previous analyses, we need to make sure those results are still valid now that we know some of the scores are incorrect. We didn't use the scores that much, but we relied a lot on the type_of_water_source, so let's check if there are any errors there.

In [14]:
%%sql
SELECT
    auditor_report.location_id AS location_id,
    auditor_report.type_of_water_source AS auditor_source,
    water_source.type_of_water_source AS survey_source,
    visits.record_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score   
FROM
    auditor_report
JOIN
    visits
ON
    auditor_report.location_id = visits.location_id
JOIN
    water_quality
ON
    visits.record_id = water_quality.record_id
JOIN
    water_source
ON
    visits.source_id = water_source.source_id
WHERE
    auditor_report.true_water_source_score!=water_quality.subjective_quality_score
AND
    visits.visit_count=1;

location_id,auditor_source,survey_source,record_id,auditor_score,surveyor_score
AkRu05215,well,well,21160,3,10
KiRu29290,shared_tap,shared_tap,7938,3,10
KiHa22748,tap_in_home_broken,tap_in_home_broken,43140,9,10
SoRu37841,shared_tap,shared_tap,18495,6,10
KiRu27884,well,well,33931,1,10
KiZu31170,tap_in_home_broken,tap_in_home_broken,17950,9,10
KiZu31370,shared_tap,shared_tap,36864,3,10
AkRu06495,well,well,45924,2,10
HaRu17528,well,well,30524,1,10
SoRu38331,shared_tap,shared_tap,13192,3,10


So what I can see is that the types of sources look the same. So even though the scores are wrong, the integrity of the type_of_water_source data we analysed last time is not affected.
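
Eyeballing the two columns works for a short result set, but a count is more reliable. As a sketch only, assuming an in-memory SQLite database with invented rows and hypothetical source IDs (SrcA, SrcB), the check could be phrased as a query that counts disagreeing source types. The toy data contains one deliberate mismatch so that the check has something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auditor_report (location_id TEXT, type_of_water_source TEXT);
    CREATE TABLE visits (record_id INTEGER, location_id TEXT,
                         source_id TEXT, visit_count INTEGER);
    CREATE TABLE water_source (source_id TEXT, type_of_water_source TEXT);

    INSERT INTO auditor_report VALUES ('LocA', 'well'), ('LocB', 'shared_tap');
    INSERT INTO visits VALUES (101, 'LocA', 'SrcA', 1), (102, 'LocB', 'SrcB', 1);
    -- SrcB is recorded as 'river' on purpose, so it disagrees with the auditor.
    INSERT INTO water_source VALUES ('SrcA', 'well'), ('SrcB', 'river');
""")

# Count first-visit records where the auditor and the survey disagree on the type.
mismatched_sources = conn.execute("""
    SELECT COUNT(*)
    FROM auditor_report
    JOIN visits ON auditor_report.location_id = visits.location_id
    JOIN water_source ON visits.source_id = water_source.source_id
    WHERE visits.visit_count = 1
      AND auditor_report.type_of_water_source != water_source.type_of_water_source
""").fetchone()[0]

print("mismatched source types:", mismatched_sources)
```

On the real database, a count of 0 from a query like this would back up the observation that the source types agree.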

# Linking records
Linking records to employees
Next up, let's look at where these errors may have come from. At some of the locations, employees assigned scores incorrectly, and those records ended up in this results set.

I think there are two reasons this can happen.
1. These workers are all humans and make mistakes so this is expected.
2. Unfortunately, the alternative is that someone assigned scores incorrectly on purpose.

In either case, the employees are the source of the errors, so let's JOIN the assigned_employee_id for all the people on our list from the visits table to our query. Remember, our query shows the 102 incorrect records, so when we join the employee data, we can see which employees made these incorrect records.

In [15]:
%%sql
SELECT
    auditor_report.location_id AS location_id,
    visits.record_id,
    employee.assigned_employee_id,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score   
FROM
    auditor_report
JOIN
    visits
ON
    auditor_report.location_id = visits.location_id
JOIN
    water_quality
ON
    visits.record_id=water_quality.record_id
JOIN
    employee
ON
    visits.assigned_employee_id=employee.assigned_employee_id
WHERE
    auditor_report.true_water_source_score!=water_quality.subjective_quality_score
AND
    visits.visit_count=1;

location_id,record_id,assigned_employee_id,auditor_score,surveyor_score
AkRu05215,21160,34,3,10
KiRu29290,7938,1,3,10
KiHa22748,43140,1,9,10
SoRu37841,18495,34,6,10
KiRu27884,33931,1,1,10
KiZu31170,17950,5,9,10
KiZu31370,36864,48,3,10
AkRu06495,45924,1,2,10
HaRu17528,30524,18,1,10
SoRu38331,13192,5,3,10


So now we can link the incorrect records to the employees who recorded them. The IDs don't help us to identify them. We have employees' names stored along with their IDs, so let's fetch their names from the employee table instead of the IDs.

In [16]:
%%sql
SELECT
    auditor_report.location_id AS location_id,
    visits.record_id,
    employee.employee_name,
    auditor_report.true_water_source_score AS auditor_score,
    water_quality.subjective_quality_score AS surveyor_score   
FROM
    auditor_report
JOIN
    visits
ON
    auditor_report.location_id = visits.location_id
JOIN
    water_quality
ON
    visits.record_id=water_quality.record_id
JOIN
    employee
ON
    visits.assigned_employee_id=employee.assigned_employee_id
WHERE
    auditor_report.true_water_source_score!=water_quality.subjective_quality_score
AND
    visits.visit_count=1;

location_id,record_id,employee_name,auditor_score,surveyor_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


In [17]:
%%sql
WITH 
    Incorrect_records AS (
        SELECT
            auditor_report.location_id AS location_id,
            visits.record_id,
            employee.employee_name,
            auditor_report.true_water_source_score AS auditor_score,
            water_quality.subjective_quality_score AS surveyor_score   
        FROM
            auditor_report
        JOIN
            visits
        ON
            auditor_report.location_id = visits.location_id
        JOIN
            water_quality
        ON
            visits.record_id=water_quality.record_id
        JOIN
            employee
        ON
            visits.assigned_employee_id=employee.assigned_employee_id
        WHERE
            auditor_report.true_water_source_score!=water_quality.subjective_quality_score
        AND
            visits.visit_count=1)
SELECT
    *
FROM
    Incorrect_records;

location_id,record_id,employee_name,auditor_score,surveyor_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


Let's see how many unique employees are in this table.

In [18]:
%%sql
WITH 
    Incorrect_records AS (
        SELECT
            auditor_report.location_id AS location_id,
            visits.record_id,
            employee.employee_name,
            auditor_report.true_water_source_score AS auditor_score,
            water_quality.subjective_quality_score AS surveyor_score   
        FROM
            auditor_report
        JOIN
            visits
        ON
            auditor_report.location_id = visits.location_id
        JOIN
            water_quality
        ON
            visits.record_id=water_quality.record_id
        JOIN
            employee
        ON
            visits.assigned_employee_id=employee.assigned_employee_id
        WHERE
            auditor_report.true_water_source_score!=water_quality.subjective_quality_score
        AND
            visits.visit_count=1)
SELECT
    COUNT(DISTINCT employee_name) AS number_of_employees
FROM
    Incorrect_records;

number_of_employees
17


In [19]:
%%sql
WITH 
    Incorrect_records AS (
        SELECT
            auditor_report.location_id AS location_id,
            visits.record_id,
            employee.employee_name,
            auditor_report.true_water_source_score AS auditor_score,
            water_quality.subjective_quality_score AS surveyor_score   
        FROM
            auditor_report
        JOIN
            visits
        ON
            auditor_report.location_id = visits.location_id
        JOIN
            water_quality
        ON
            visits.record_id=water_quality.record_id
        JOIN
            employee
        ON
            visits.assigned_employee_id=employee.assigned_employee_id
        WHERE
            auditor_report.true_water_source_score!=water_quality.subjective_quality_score
        AND
            visits.visit_count=1)
SELECT
    employee_name,
    COUNT(employee_name) AS number_of_mistakes
FROM
    Incorrect_records
GROUP BY
    employee_name
ORDER BY
    COUNT(employee_name) DESC;

employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
Rudo Imani,5
Farai Nia,4
Enitan Zuri,4
Yewande Ebele,3
Jengo Tumaini,3
Makena Thabo,3
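
The two aggregations above, counting distinct employees and counting mistakes per employee, can be tried on a small table. This sketch assumes an in-memory SQLite table named incorrect_records with invented employees and locations; it stands in for the Incorrect_records CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incorrect_records (employee_name TEXT, location_id TEXT);
    INSERT INTO incorrect_records VALUES
        ('Employee A', 'Loc1'), ('Employee A', 'Loc2'), ('Employee A', 'Loc3'),
        ('Employee B', 'Loc4'),
        ('Employee C', 'Loc5'), ('Employee C', 'Loc6');
""")

# How many different employees appear among the incorrect records?
number_of_employees = conn.execute(
    "SELECT COUNT(DISTINCT employee_name) FROM incorrect_records"
).fetchone()[0]

# How many incorrect records does each employee have, worst first?
mistakes = conn.execute("""
    SELECT employee_name, COUNT(employee_name) AS number_of_mistakes
    FROM incorrect_records
    GROUP BY employee_name
    ORDER BY number_of_mistakes DESC
""").fetchall()

print(number_of_employees)
for name, count in mistakes:
    print(name, count)
```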


# Gathering evidence
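We are going to build a complex query seeking the truth.

We need to calculate the average number of mistakes employees made. We can do that by taking the average of the previous query's results. Since we will need the incorrect records repeatedly, we first save the Incorrect_records CTE as a view, this time including the auditor's statements column.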

In [20]:
%%sql
DROP VIEW IF EXISTS Incorrect_records;
CREATE VIEW
    Incorrect_records AS (
        SELECT
            auditor_report.location_id AS location_id,
            visits.record_id,
            employee.employee_name,
            auditor_report.true_water_source_score AS auditor_score,
            water_quality.subjective_quality_score AS surveyor_score,
            auditor_report.statements
        FROM
            auditor_report
        JOIN
            visits
        ON
            auditor_report.location_id = visits.location_id
        JOIN
            water_quality
        ON
            visits.record_id=water_quality.record_id
        JOIN
            employee
        ON
            visits.assigned_employee_id=employee.assigned_employee_id
        WHERE
            auditor_report.true_water_source_score!=water_quality.subjective_quality_score
        AND
            visits.visit_count=1);

In [21]:
%%sql
SELECT
    *
FROM
    Incorrect_records;

location_id,record_id,employee_name,auditor_score,surveyor_score,statements
AkRu05215,21160,Rudo Imani,3,10,"Villagers admired the official's visit for its respectful interactions, hard work, and genuine concern."
KiRu29290,7938,Bello Azibo,3,10,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
KiHa22748,43140,Bello Azibo,9,10,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
SoRu37841,18495,Rudo Imani,6,10,"The official's respectful and diligent presence was met with heartfelt appreciation, creating a sense of closeness with the villagers."
KiRu27884,33931,Bello Azibo,1,10,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
KiZu31170,17950,Zuriel Matembo,9,10,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
KiZu31370,36864,Yewande Ebele,3,10,"With a keen understanding of urban challenges, the official's visit left a lasting impression of respect and commitment."
AkRu06495,45924,Bello Azibo,2,10,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
HaRu17528,30524,Jengo Tumaini,1,10,"With humility and diligence, the official formed bonds with the villagers that felt like genuine family connections."
SoRu38331,13192,Zuriel Matembo,3,10,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."


In [23]:
%%sql
WITH 
    error_count AS ( -- This CTE calculates the number of mistakes each employee made
        SELECT
            employee_name,
            COUNT(employee_name) AS number_of_mistakes
        FROM
            Incorrect_records
                     
        GROUP BY
            employee_name
        ORDER BY
            number_of_mistakes DESC)
-- Query
SELECT 
    * 
FROM 
    error_count;
    
/*
Incorrect_records is a view that joins the audit report to the database
for records where the auditor's and
employees' scores are different.
*/

employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
Rudo Imani,5
Farai Nia,4
Enitan Zuri,4
Yewande Ebele,3
Jengo Tumaini,3
Makena Thabo,3


In [24]:
%%sql
WITH 
    error_count AS ( -- This CTE calculates the number of mistakes each employee made
        SELECT
            employee_name,
            COUNT(employee_name) AS number_of_mistakes
        FROM
            Incorrect_records
                     
        GROUP BY
            employee_name
        ORDER BY
            number_of_mistakes DESC)
-- Query
SELECT 
    ROUND(AVG(number_of_mistakes)) AS  avg_error_count_per_empl
FROM 
    error_count;

avg_error_count_per_empl
6


Finally, we have to compare each employee's error_count with avg_error_count_per_empl. We will call this results set our suspect_list.

In [25]:
%%sql
WITH 
    error_count AS ( -- This CTE calculates the number of mistakes each employee made
        SELECT
            employee_name,
            COUNT(employee_name) AS number_of_mistakes
        FROM
            Incorrect_records
                     
        GROUP BY
            employee_name
        ORDER BY
            number_of_mistakes DESC)
-- Query
SELECT
    employee_name,
    number_of_mistakes
FROM 
    error_count
WHERE
    number_of_mistakes > (
        SELECT
            ROUND(AVG(number_of_mistakes)) AS avg_error_count_per_empl
        FROM
            error_count);

employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
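
As a minimal sketch of the same idea on toy data, the snippet below computes the rounded average and then keeps only the employees above it. It assumes an in-memory SQLite table with invented employees standing in for the Incorrect_records view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incorrect_records (employee_name TEXT);
    INSERT INTO incorrect_records VALUES
        ('Employee A'), ('Employee A'), ('Employee A'), ('Employee A'), ('Employee A'),
        ('Employee B'),
        ('Employee C'), ('Employee C'), ('Employee C');
""")

# Shared CTE: number of mistakes per employee (A = 5, B = 1, C = 3).
error_count_cte = """
    WITH error_count AS (
        SELECT employee_name, COUNT(employee_name) AS number_of_mistakes
        FROM incorrect_records
        GROUP BY employee_name
    )
"""

avg_mistakes = conn.execute(
    error_count_cte + "SELECT ROUND(AVG(number_of_mistakes)) FROM error_count"
).fetchone()[0]

# Keep only employees with strictly more mistakes than the rounded average.
suspects = conn.execute(error_count_cte + """
    SELECT employee_name, number_of_mistakes
    FROM error_count
    WHERE number_of_mistakes > (
        SELECT ROUND(AVG(number_of_mistakes)) FROM error_count)
    ORDER BY number_of_mistakes DESC
""").fetchall()

print("average:", avg_mistakes)
print("suspects:", suspects)
```

Note that the comparison is strict, so an employee sitting exactly on the average (Employee C in the toy data) is not flagged.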


We should look at the Incorrect_records view again, and isolate all of the records these four employees gathered. We should also look at the statements for these records to look for patterns.

In [26]:
%%sql
WITH 
    error_count AS ( 
        SELECT
            employee_name,
            COUNT(employee_name) AS number_of_mistakes
        FROM
            Incorrect_records
                     
        GROUP BY
            employee_name
        ORDER BY
            number_of_mistakes DESC),
   suspect_list AS (
    SELECT ec1.employee_name, ec1.number_of_mistakes
    FROM error_count ec1
    WHERE ec1.number_of_mistakes >= (
        SELECT AVG(ec2.number_of_mistakes)
        FROM error_count ec2
        WHERE ec2.employee_name = ec1.employee_name))
SELECT
    employee_name,
    location_id,
    statements
FROM
    Incorrect_records
WHERE
    employee_name IN (SELECT employee_name FROM suspect_list);

employee_name,location_id,statements
Rudo Imani,AkRu05215,"Villagers admired the official's visit for its respectful interactions, hard work, and genuine concern."
Bello Azibo,KiRu29290,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
Bello Azibo,KiHa22748,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
Rudo Imani,SoRu37841,"The official's respectful and diligent presence was met with heartfelt appreciation, creating a sense of closeness with the villagers."
Bello Azibo,KiRu27884,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
Zuriel Matembo,KiZu31170,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
Yewande Ebele,KiZu31370,"With a keen understanding of urban challenges, the official's visit left a lasting impression of respect and commitment."
Bello Azibo,AkRu06495,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
Jengo Tumaini,HaRu17528,"With humility and diligence, the official formed bonds with the villagers that felt like genuine family connections."
Zuriel Matembo,SoRu38331,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."

This first attempt returns records for every employee, not just the four suspects (note Rudo Imani and Yewande Ebele in the output). The correlated subquery compares each employee's count only with the average of that same employee's single row, so the >= test is always true. The query in In [28] compares against the overall average instead.


In [27]:
%%sql
DROP VIEW error_count;

RuntimeError: (pymysql.err.OperationalError) (1051, "Unknown table 'md_water_services.error_count'")
[SQL: DROP VIEW error_count;]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
If you need help solving this issue, send us a message: https://ploomber.io/community
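
The error above is most likely explained by the fact that error_count was only ever defined inside WITH clauses: a CTE exists only for the statement that defines it, so there is no stored object to drop. The sketch below illustrates the difference between a view and a CTE using an in-memory SQLite database; the table and view names (mistakes, mistakes_view) are invented, and SQLite's error message differs from MySQL's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mistakes (employee_name TEXT);
    INSERT INTO mistakes VALUES ('Employee A'), ('Employee A'), ('Employee B');
    -- A view is stored in the database and can be reused by later statements.
    CREATE VIEW mistakes_view AS SELECT employee_name FROM mistakes;
""")

# A CTE lives only inside the statement that defines it.
cte_rows = conn.execute("""
    WITH error_count AS (
        SELECT employee_name, COUNT(*) AS number_of_mistakes
        FROM mistakes_view
        GROUP BY employee_name
    )
    SELECT * FROM error_count ORDER BY employee_name
""").fetchall()

# Once that statement has finished, error_count is gone, so dropping it fails.
try:
    conn.execute("DROP VIEW error_count")
    drop_failed = False
except sqlite3.OperationalError as exc:
    drop_failed = True
    print(f"DROP VIEW error_count failed: {exc}")

# IF EXISTS makes the statement safe when the view is absent.
conn.execute("DROP VIEW IF EXISTS error_count")

# The stored view, by contrast, can be dropped normally.
conn.execute("DROP VIEW mistakes_view")
```

In short, nothing needs to be cleaned up after a CTE, which is why the In [27] cell can simply be skipped.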


In [28]:
%%sql
WITH 
    error_count AS ( 
        SELECT
            employee_name,
            COUNT(employee_name) AS number_of_mistakes
        FROM
            Incorrect_records
                     
        GROUP BY
            employee_name
        ORDER BY
            number_of_mistakes DESC),
    suspect_list AS (
            SELECT
                employee_name,
                number_of_mistakes
            FROM
                error_count
            WHERE
                number_of_mistakes > (SELECT 
                                        ROUND(AVG(number_of_mistakes)) AS  avg_error_count_per_empl
                                        FROM 
                                            error_count))
SELECT
    employee_name,
    location_id,
    statements
FROM
    Incorrect_records
WHERE
    employee_name IN (SELECT employee_name FROM suspect_list);

employee_name,location_id,statements
Bello Azibo,KiRu29290,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
Bello Azibo,KiHa22748,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
Bello Azibo,KiRu27884,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
Zuriel Matembo,KiZu31170,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
Bello Azibo,AkRu06495,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
Zuriel Matembo,SoRu38331,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Zuriel Matembo,AkHa00314,"A street vendor's sales suffer from time spent waiting, her concern for the water's quality affecting her products."
Malachi Mavuso,KiRu26598,"A teenager's dreams are tempered by reality, her future threatened by the corrupt practices she sees around her."
Bello Azibo,KiIs23853,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The mention of cash changing hands further tainted their perception.
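
The two suspect_list variants from In [26] and In [28] can be reproduced side by side on toy data. This is only a sketch, assuming an in-memory SQLite table with invented employees (A with 5 mistakes, B with 1, C with 3).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incorrect_records (employee_name TEXT);
    INSERT INTO incorrect_records VALUES
        ('Employee A'), ('Employee A'), ('Employee A'), ('Employee A'), ('Employee A'),
        ('Employee B'),
        ('Employee C'), ('Employee C'), ('Employee C');
""")

error_count_cte = """
    WITH error_count AS (
        SELECT employee_name, COUNT(employee_name) AS number_of_mistakes
        FROM incorrect_records
        GROUP BY employee_name
    )
"""

# In [26] style: the subquery is correlated on employee_name, so each
# employee is compared with their own count and everyone passes.
correlated = conn.execute(error_count_cte + """
    SELECT ec1.employee_name
    FROM error_count ec1
    WHERE ec1.number_of_mistakes >= (
        SELECT AVG(ec2.number_of_mistakes)
        FROM error_count ec2
        WHERE ec2.employee_name = ec1.employee_name)
    ORDER BY ec1.employee_name
""").fetchall()

# In [28] style: the subquery averages over all employees.
overall = conn.execute(error_count_cte + """
    SELECT employee_name
    FROM error_count
    WHERE number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
    ORDER BY employee_name
""").fetchall()

print("correlated subquery:", correlated)
print("overall average:   ", overall)
```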


If you have a look, you will notice some alarming statements about these four officials (look at these records, for example: AkRu04508, AkRu07310, KiRu29639, AmAm09607).
We want to see all statements that mention "cash".

In [29]:
%%sql
WITH 
    error_count AS ( 
        SELECT
            employee_name,
            COUNT(employee_name) AS number_of_mistakes
        FROM
            Incorrect_records
                     
        GROUP BY
            employee_name
        ORDER BY
            number_of_mistakes DESC),
   suspect_list AS (
            SELECT
                employee_name,
                number_of_mistakes
            FROM
                error_count
            WHERE
                number_of_mistakes > (SELECT 
                                        ROUND(AVG(number_of_mistakes)) AS  avg_error_count_per_empl
                                        FROM 
                                            error_count))
SELECT
    employee_name,
    location_id,
    statements
FROM
    Incorrect_records
WHERE
    employee_name IN (SELECT employee_name FROM suspect_list)
AND
    statements LIKE '%cash%';

employee_name,location_id,statements
Zuriel Matembo,SoRu38331,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Bello Azibo,KiIs23853,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The mention of cash changing hands further tainted their perception.
Bello Azibo,HaSe21323,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Zuriel Matembo,AkRu05880,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The allusion to cash changing hands deepened their skepticism.
Bello Azibo,KiRu27065,Villagers expressed their discomfort with an official who displayed a haughty demeanor and negligence. The mention of cash transactions deepened their growing sense of unease.
Malachi Mavuso,KiRu25347,Villagers expressed their discontent with an official who appeared dismissive and neglectful. The mention of cash changing hands added to their growing sense of distrust.
Zuriel Matembo,SoIl32575,Villagers recounted unsettling encounters with an official known for their arrogance and avoidance of responsibilities. The mention of cash changing hands added to their apprehension and distrust.
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Lalitha Kaburi,AkRu07310,"Villagers spoke of their unsettling encounters with an official who seemed indifferent and uninterested, hinting at potential improprieties involving cash exchanges."
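
The combination of a membership test and a LIKE pattern can be tried on a few invented rows. In this sketch, suspect_list is a small SQLite table rather than a CTE, purely to keep the example short, and the statements are made up. The same filter is run once with IN and once with its complement, NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incorrect_records (employee_name TEXT, location_id TEXT, statements TEXT);
    CREATE TABLE suspect_list (employee_name TEXT);

    INSERT INTO incorrect_records VALUES
        ('Employee A', 'Loc1', 'Villagers mentioned cash changing hands.'),
        ('Employee A', 'Loc2', 'Long queues at the tap.'),
        ('Employee B', 'Loc3', 'The official was respectful.');
    INSERT INTO suspect_list VALUES ('Employee A');
""")

# '%cash%' matches any statement that contains the substring "cash".
cash_filter = """
    SELECT employee_name, location_id
    FROM incorrect_records
    WHERE employee_name {membership} (SELECT employee_name FROM suspect_list)
      AND statements LIKE '%cash%'
    ORDER BY location_id
"""

suspects_with_cash = conn.execute(cash_filter.format(membership="IN")).fetchall()
others_with_cash = conn.execute(cash_filter.format(membership="NOT IN")).fetchall()

print("suspects:", suspects_with_cash)
print("others:  ", others_with_cash)
```

An empty list for the NOT IN variant means that no statement about a non-suspect mentions cash in the toy data.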


Now, let's see if "cash" was mentioned in statements pertaining to the other surveyors.

In [30]:
%%sql
WITH 
    error_count AS ( 
        SELECT
            employee_name,
            COUNT(employee_name) AS number_of_mistakes
        FROM
            Incorrect_records
                     
        GROUP BY
            employee_name
        ORDER BY
            number_of_mistakes DESC),
    suspect_list AS (
            SELECT
                employee_name,
                number_of_mistakes
            FROM
                error_count
            WHERE
                number_of_mistakes > (SELECT 
                                        ROUND(AVG(number_of_mistakes)) AS  avg_error_count_per_empl
                                        FROM 
                                            error_count))
SELECT
    employee_name,
    location_id,
    statements
FROM
    Incorrect_records
WHERE
    employee_name NOT IN (SELECT employee_name FROM suspect_list)
AND
    statements LIKE '%cash%';

employee_name,location_id,statements


# Summary
We can see that there are 0 results returned.
So we can sum up the evidence we have for Zuriel Matembo, Malachi Mavuso, Bello Azibo and Lalitha Kaburi:
1. They all made more mistakes than their peers on average.
2. They all have incriminating statements made against them, and only them.
Keep in mind that this is not decisive proof, but it is concerning enough that we should flag it. Pres. Naledi has worked hard to stamp out corruption, so she would urge us to report this.

location_id,record_id,auditor_score,employee_score,score_diff
SoBa31691,3267,0,10,10
AkHa00363,25387,0,10,10
SoKo33094,16159,0,10,10
KiRu27364,38102,0,10,10
KiRu27065,29772,0,10,10
KiRu29147,3179,0,10,10
SoIl32770,7548,0,10,10
KiRu29329,11962,0,10,10
SoRu39544,17929,0,10,10
SoRu38535,44282,0,10,10


This last result set, whose query is not shown, was just to fetch other information that was not necessarily part of the project.