## Connecting to our MySQL database

Using the 'md_water_services' database we created in MySQL Workbench, we want to answer some questions about our dataset. We can apply the same queries we used in MySQL Workbench in this notebook if we connect to our MySQL server by running the cells below.

In [1]:
# Load and activate the SQL extension to allows us to execute SQL in a Jupyter notebook.
# If you get an error here, make sure that mysql and pymysql is installed correctly.

%load_ext sql

In [2]:
# Establish a connection to the local database using the '%sql' magic command,
# Replace 'password' with our connection password and `db_name` with our database name.
# If you get an error here, please make sure the database name or password is correct.

%sql mysql+pymysql://root:paschalugwu@localhost:3306/md_water_services

# PROJECT PHASE 0: Setting the stage for our data exploration journey - Introduction.

I am thrilled to share with you the progress of my project, "Maji Ndogo: From Analysis to Action." This project aims to weave the data threads of Maji Ndogo's narrative, focusing on the analysis and action required to address the water challenges in our country. 
 
In the initial stages, I have set the stage for our data exploration journey by understanding the database structure and generating an Entity-Relationship Diagram (ERD). This allows us to visualize the relationships between different data entities and gain a comprehensive understanding of the database. 
 
Furthermore, I have integrated the auditor's report into our database, ensuring that the findings and recommendations are incorporated into our decision-making process. This step is crucial in maintaining data integrity and ensuring that our actions are based on accurate and reliable information. 
 
To link records and gain deeper insights, I have joined the employee data with the audit report. This enables us to analyze the data from multiple perspectives and identify areas for improvement. 
 
As part of this project, I have also gathered evidence by building complex queries to seek the truth. By leveraging the power of data analysis and querying, we can uncover hidden patterns, identify trends, and make informed decisions to drive positive change. 
 
I am excited about the progress we have made so far and the potential impact this project can have on Maji Ndogo and our community. It is my firm belief that through data-driven decision-making, we can address the challenges we face and create a sustainable water future. 

# PROJECT PHASE 1: Understanding the database structure through ERD generation.

At this stage of the project, we have made significant progress in analyzing and integrating the data for the Maji Ndogo water project. We have conducted an independent audit of the database, specifically focusing on the water sources recorded in our country. The objective of this audit was to assess the integrity and accuracy of the data stored in the database.

After a thorough examination of the database's records and the associated data entry and modification procedures, we can confirm that the majority of the data aligns with the principles of good governance and data-driven decision-making. This is a testament to the commitment and efforts put forth by our team.

However, during the audit, we did identify some instances where the data was tampered with. These findings require immediate attention to ensure the integrity of the database and the reliability of the information it provides for decision-making and governance.

In the upcoming stage, we will be integrating the auditor's report into the database, which will add valuable insights and recommendations for further improvement. This integration will enhance the overall functionality and effectiveness of the database.

Before proceeding with the addition of the ERD diagram, it is essential to review the auditor's report and address any issues highlighted. This will ensure that the ERD accurately represents the current state of the database and reflects the necessary modifications based on the audit findings.

By incorporating the ERD diagram, we will have a visual representation of the database structure and its relationships, providing a clearer understanding of the data architecture. This will serve as a valuable reference for future analysis, decision-making, and communication with stakeholders.

Let's proceed with adding the ERD diagram and continue our journey towards transforming Maji Ndogo's narrative into actionable insights and positive change.

![ERD Diagram Without Auditor Report](MajiNdogoModel.png)

# PROJECT PHASE 2: Integrating the Auditor's report

![ERD Diagram With Auditor Report](maji_ndogo_model_AND_AuditorReport.png)

## Step 1: Begin by creating a new SQL query to drop the  auditor_report  table if it already exists and create a new table with the specified columns.

In [3]:
%%sql

DROP TABLE IF EXISTS `auditor_report`;
CREATE TABLE `auditor_report` (
  `location_id` VARCHAR(32),
  `type_of_water_source` VARCHAR(64),
  `true_water_source_score` INT DEFAULT NULL,
  `statements` VARCHAR(255)
);

 * mysql+pymysql://root:***@localhost:3306/md_water_services


0 rows affected.
0 rows affected.


[]

### Lets view what we just imported.

In [3]:
%%sql

SELECT *
FROM auditor_report;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
1620 rows affected.


location_id,type_of_water_source,true_water_source_score,statements
SoRu34980,well,1,"Residents admired the official's commitment to enhancing urban life, praising their cooperative and inclusive approach."
AkRu08112,well,3,"Villagers spoke highly of the official's dedication and genuine interest in their lives, fostering a sense of belonging and appreciation."
AkLu02044,river,0,"Villagers were touched by the official's interactions, noting their humility, strong work ethic, and respectful attitude."
AkHa00421,well,3,"Villagers were moved by the official's visit, praising their hard work, humility, and the profound sense of connection they fostered."
SoRu35221,river,0,"A photographer's lens captures the queue, though his own struggle for water is a hidden part of the story."
HaAm16170,well,1,"With an open heart, the official created an atmosphere of unity and familial camaraderie among the villagers."
AkRu04812,well,3,"The official's presence left an indelible mark, reflecting their humility, dedication, and the genuine connections they nurtured."
AkRu08304,well,3,"The official's interactions resonated deeply with the villagers, leaving a lasting impression of respect and camaraderie."
AkRu05107,well,2,"Villagers spoke highly of the official's dedication and genuine interest in their lives, fostering a sense of belonging and appreciation."
AkRu05215,well,3,"Villagers admired the official's visit for its respectful interactions, hard work, and genuine concern."


## Step 2: To compare the quality scores between the  auditor_report  table and the  water_quality  table, let us perform a join operation using the  visits  table as the intermediary.

In [4]:
%%sql

SELECT
  auditor_report.location_id AS audit_location,
  auditor_report.true_water_source_score,
  visits.location_id AS visit_location,
  visits.record_id
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
2698 rows affected.


audit_location,true_water_source_score,visit_location,record_id
SoRu34980,1,SoRu34980,5185
AkRu08112,3,AkRu08112,59367
AkLu02044,0,AkLu02044,37379
AkHa00421,3,AkHa00421,51627
SoRu35221,0,SoRu35221,28758
HaAm16170,1,HaAm16170,31048
AkRu04812,3,AkRu04812,1513
AkRu08304,3,AkRu08304,1218
AkRu05107,2,AkRu05107,8322
AkRu05215,3,AkRu05215,21160


## Step 3:  To retrieve the corresponding scores from the  water_quality  table, perform another join operation using the  record_id  as the connecting key

In [5]:
%%sql

SELECT
  auditor_report.location_id AS audit_location,
  auditor_report.true_water_source_score,
  visits.location_id AS visit_location,
  visits.record_id,
  water_quality.subjective_quality_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
2698 rows affected.


audit_location,true_water_source_score,visit_location,record_id,subjective_quality_score
SoRu34980,1,SoRu34980,5185,1
AkRu08112,3,AkRu08112,59367,3
AkLu02044,0,AkLu02044,37379,0
AkHa00421,3,AkHa00421,51627,3
SoRu35221,0,SoRu35221,28758,0
HaAm16170,1,HaAm16170,31048,1
AkRu04812,3,AkRu04812,1513,3
AkRu08304,3,AkRu08304,1218,3
AkRu05107,2,AkRu05107,8322,2
AkRu05215,3,AkRu05215,21160,10


## Step 4: Clean up the resulting table by removing duplicate columns and renaming them for clarity.

In [6]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
2698 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
AkRu05215,21160,3,10


## Step 5: To check if the auditor's and employee's scores agree, add a WHERE clause to compare the scores.

In [7]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
WHERE
  auditor_report.true_water_source_score = water_quality.subjective_quality_score;

 * mysql+pymysql://root:***@localhost:3306/md_water_services


2505 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2


<font color="red">

NOTE: With the duplicates removed we now get 1518. What does this mean considering the auditor visited 1620 sites?
I think that is an excellent result. 1518/1620 = 94% of the records the auditor checked were correct!
But that means that 102 records are incorrect. So let's look at those.

</font>

## Step 6: To remove duplicates, add  visits.visit_count = 1  in the WHERE clause.

In [8]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
WHERE
  auditor_report.true_water_source_score = water_quality.subjective_quality_score
  AND visits.visit_count = 1;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
1518 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2


## Step 7:  To analyze the records that are incorrect, modify the WHERE clause to check if the scores are not equal.

In [9]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
WHERE
  auditor_report.true_water_source_score != water_quality.subjective_quality_score
  AND visits.visit_count = 1;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
102 rows affected.


location_id,record_id,auditor_score,employee_score
AkRu05215,21160,3,10
KiRu29290,7938,3,10
KiHa22748,43140,9,10
SoRu37841,18495,6,10
KiRu27884,33931,1,10
KiZu31170,17950,9,10
KiZu31370,36864,3,10
AkRu06495,45924,2,10
HaRu17528,30524,1,10
SoRu38331,13192,3,10


## Step 8: To check if there are any errors in the  type_of_water_source  column, join the  auditor_report  table with the  water_source  table using the  source_id  as the connecting key.

In [10]:
%%sql

SELECT 
    v.location_id, 
    ar.type_of_water_source AS auditor_source, 
    ws.type_of_water_source AS survey_source, 
    v.record_id, 
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS employee_score
FROM 
    visits v
JOIN 
    water_source ws ON v.source_id = ws.source_id
JOIN 
    auditor_report ar ON v.location_id = ar.location_id
JOIN 
    water_quality wq ON v.record_id = wq.record_id
WHERE
  ar.true_water_source_score != wq.subjective_quality_score
  AND v.visit_count = 1;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
102 rows affected.


location_id,auditor_source,survey_source,record_id,auditor_score,employee_score
AkRu05215,well,well,21160,3,10
KiRu29290,shared_tap,shared_tap,7938,3,10
KiHa22748,tap_in_home_broken,tap_in_home_broken,43140,9,10
SoRu37841,shared_tap,shared_tap,18495,6,10
KiRu27884,well,well,33931,1,10
KiZu31170,tap_in_home_broken,tap_in_home_broken,17950,9,10
KiZu31370,shared_tap,shared_tap,36864,3,10
AkRu06495,well,well,45924,2,10
HaRu17528,well,well,30524,1,10
SoRu38331,shared_tap,shared_tap,13192,3,10


# PROJECT PHASE 3: Linking records to employees

## Step 1: Join the assigned_employee_id for all the people on the list from the visits table to the query. This will help identify the employees responsible for the incorrect records.

In [11]:
%%sql

SELECT 
    v.location_id, 
    v.record_id,
    v.assigned_employee_id, 
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS employee_score
FROM 
    visits v
JOIN 
    water_source ws ON v.source_id = ws.source_id
JOIN 
    auditor_report ar ON v.location_id = ar.location_id
JOIN 
    water_quality wq ON v.record_id = wq.record_id
WHERE
    ar.true_water_source_score != wq.subjective_quality_score
    AND v.visit_count = 1;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
102 rows affected.


location_id,record_id,assigned_employee_id,auditor_score,employee_score
AkRu05215,21160,34,3,10
KiRu29290,7938,1,3,10
KiHa22748,43140,1,9,10
SoRu37841,18495,34,6,10
KiRu27884,33931,1,1,10
KiZu31170,17950,5,9,10
KiZu31370,36864,48,3,10
AkRu06495,45924,1,2,10
HaRu17528,30524,18,1,10
SoRu38331,13192,5,3,10


## Step 2: To fetch the names of the employees who recorded the incorrect records, you can join the  employees  table with the previous query. Here's an example query to accomplish this.

In [13]:
%%sql

SELECT
    v.location_id,
    v.record_id,
    e.employee_name,
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS employee_score
FROM
    visits v
JOIN
    water_source ws ON v.source_id = ws.source_id
JOIN
    auditor_report ar ON v.location_id = ar.location_id
JOIN
    water_quality wq ON v.record_id = wq.record_id
JOIN
    employee e ON v.assigned_employee_id = e.assigned_employee_id
WHERE
    ar.true_water_source_score != wq.subjective_quality_score
    AND v.visit_count = 1;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
102 rows affected.


location_id,record_id,employee_name,auditor_score,employee_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


## Step 3: To create the CTE named "Incorrect_records" (this would allow us to easily reference and query the CTE as if it were a table) and check if it returns the same table, we can use the following query:

In [50]:
%%sql

WITH Incorrect_records AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
)
SELECT *
FROM Incorrect_records;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
102 rows affected.


location_id,record_id,employee_name,auditor_score,employee_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


## Step 4: Let's retrieve the  employee_name  column from the Incorrect_records  table and counts the number of occurrences for each unique employee name using the  COUNT(*)  function.

In [40]:
%%sql

WITH  error_count AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
)
SELECT
    employee_name,
    COUNT(*) AS number_of_mistakes
FROM
     error_count
GROUP BY
    employee_name
ORDER BY
    number_of_mistakes DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
17 rows affected.


employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
Rudo Imani,5
Farai Nia,4
Enitan Zuri,4
Yewande Ebele,3
Jengo Tumaini,3
Makena Thabo,3


# PROJECT PHASE 4: Gathering some evidence.

## Step 1: Calculate the number of times an employee's name comes up (error_count).

## Step 2: Calculate the average number of mistakes made by employees (avg_error_count_per_empl).

In [45]:
%%sql

WITH avgerage_error_count AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
),
avg_error_count_per_empl AS (
    SELECT
        AVG(number_of_mistakes) AS avg_error_count
    FROM (
        SELECT
            employee_name,
            COUNT(*) AS number_of_mistakes
        FROM
            avgerage_error_count
        GROUP BY
            employee_name
    ) AS subquery -- Provide an alias for the subquery
)
SELECT
    avg_error_count
FROM
    avg_error_count_per_empl;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
1 rows affected.


avg_error_count
6.0


## Step 3: Compare each employee's error_count with the average number of mistakes (suspect_list).

In [46]:
%%sql

WITH suspect_list AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
),
avg_error_count_per_empl AS (
    SELECT
        AVG(number_of_mistakes) AS avg_error_count
    FROM (
        SELECT
            employee_name,
            COUNT(*) AS number_of_mistakes
        FROM
            suspect_list
        GROUP BY
            employee_name
    ) AS subquery
)
SELECT
    employee_name,
    number_of_mistakes
FROM (
    SELECT
        employee_name,
        COUNT(*) AS number_of_mistakes
    FROM
        suspect_list
    GROUP BY
        employee_name
) AS subquery
JOIN
    avg_error_count_per_empl ON number_of_mistakes > avg_error_count
ORDER BY
    number_of_mistakes DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
4 rows affected.


employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7


## Step 4: Firstly, let's add the statements column to the Incorrect_records CTE. Then pull up all of the records where the employee_name is in the suspect list.

In [52]:
%%sql

WITH suspect_list AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
),
Incorrect_records AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score,
        ar.statements
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
)
SELECT
    i.employee_name,
    i.location_id,
    i.statements
FROM
    Incorrect_records i
JOIN
    suspect_list s ON i.employee_name = s.employee_name
WHERE
    i.employee_name IN ('Bello Azibo', 'Malachi Mavuso', 'Zuriel Matembo', 'Lalitha Kaburi')
ORDER BY
    i.employee_name, i.location_id;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
1455 rows affected.


employee_name,location_id,statements
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."
Bello Azibo,AkHa00363,"A local mason's strong back is bent by the wait for water, his concerns for quality reflecting a broader challenge."


## Step 5: If you have a look, you will notice some alarming statements about these four officials (look at these records: AkRu04508, AkRu07310, KiRu29639, AmAm09607), for example. See how the word "cash" is used a lot in these statements.

In [53]:
%%sql

WITH suspect_list AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
),
Incorrect_records AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score,
        ar.statements
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
)
SELECT
    i.employee_name,
    i.location_id,
    i.statements
FROM
    Incorrect_records i
JOIN
    suspect_list s ON i.employee_name = s.employee_name
WHERE
    i.employee_name IN ('Bello Azibo', 'Malachi Mavuso', 'Zuriel Matembo', 'Lalitha Kaburi')
    AND i.location_id IN ('AkRu04508', 'AkRu07310', 'KiRu29639', 'AmAm09607')
ORDER BY
    i.employee_name, i.location_id;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
80 rows affected.


employee_name,location_id,statements
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."


## Step 6: Let's check if there are any employees in the Incorrect_records table with statements mentioning "cash" that are not in our suspect list

In [55]:
%%sql

WITH suspect_list AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
),
Incorrect_records AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score,
        ar.statements
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
)
SELECT
    i.employee_name,
    i.location_id,
    i.statements
FROM
    Incorrect_records i
JOIN
    suspect_list s ON i.employee_name = s.employee_name
WHERE
    i.employee_name IN ('Bello Azibo', 'Malachi Mavuso', 'Zuriel Matembo', 'Lalitha Kaburi')
    AND i.location_id NOT IN ('AkRu04508', 'AkRu07310', 'KiRu29639', 'AmAm09607')
    AND i.statements LIKE '%cash%'
ORDER BY
    i.employee_name, i.location_id;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
311 rows affected.


employee_name,location_id,statements
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."
Bello Azibo,AkRu05741,"An air of mistrust surrounded the official, as villagers spoke of laziness and hints of corruption. The mention of cash passing discreetly only deepened their concerns."


## Step 7: Let's check if there are any employees in the Incorrect_records table with statements mentioning "cash" that are not in our suspect list.

In [57]:
%%sql

WITH suspect_list AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
),
Incorrect_records AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score,
        ar.statements
    FROM
        visits v
    JOIN
        water_source ws ON v.source_id = ws.source_id
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
)
SELECT
    i.employee_name,
    i.location_id,
    i.statements
FROM
    Incorrect_records i
JOIN
    suspect_list s ON i.employee_name = s.employee_name
WHERE
    i.employee_name NOT IN ('Bello Azibo', 'Malachi Mavuso', 'Zuriel Matembo', 'Lalitha Kaburi')
    AND i.location_id NOT IN ('AkRu04508', 'AkRu07310', 'KiRu29639', 'AmAm09607')
    AND i.statements LIKE '%cash%'
ORDER BY
    i.employee_name, i.location_id;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
0 rows affected.


employee_name,location_id,statements
