## Connecting to our MySQL database

Using the 'md_water_services' database we created in MySQL Workbench, we want to answer some questions about our dataset. We can apply the same queries we used in MySQL Workbench in this notebook if we connect to our MySQL server by running the cells below.

In [2]:
# Load and activate the SQL extension so that we can execute SQL in a Jupyter notebook.
# If you get an error here, make sure that mysql and pymysql are installed correctly.

%load_ext sql

In [3]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your connection password and `db_name` with your database name.
# If you get an error here, please make sure the database name and password are correct.

%sql mysql+pymysql://root:paschalugwu@localhost:3306/md_water_services
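
The connection string above embeds the password directly in the notebook. As a minimal sketch of one alternative, the URL can be assembled in Python so the password is read from the environment and percent-encoded. The helper name `build_mysql_url` and the environment variable `MYSQL_PASSWORD` are assumptions made for this illustration; they are not part of the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(user, password, host="localhost", port=3306,
                    db_name="md_water_services"):
    """Build a SQLAlchemy-style URL for MySQL via the pymysql driver."""
    # quote_plus percent-encodes characters such as '@' or '/' that would
    # otherwise be misread as URL delimiters.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


# MYSQL_PASSWORD is an assumed variable name; 'password' is only a placeholder.
connection_url = build_mysql_url("root", os.environ.get("MYSQL_PASSWORD", "password"))
```

IPython normally expands `$connection_url` in line magics, so `%sql $connection_url` should connect in the same way as the hard-coded line, though that is worth verifying in your own environment.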

# PROJECT PHASE 0: Setting the stage for our data exploration journey - Introduction.

I am thrilled to share with you the progress of my project, "Maji Ndogo: From Analysis to Action." This project aims to weave the data threads of Maji Ndogo's narrative, focusing on the analysis and action required to address the water challenges in our country. 
 
In the initial stages, I have set the stage for our data exploration journey by understanding the database structure and generating an Entity-Relationship Diagram (ERD). This allows us to visualize the relationships between different data entities and gain a comprehensive understanding of the database. 
 
Furthermore, I have integrated the auditor's report into our database, ensuring that the findings and recommendations are incorporated into our decision-making process. This step is crucial in maintaining data integrity and ensuring that our actions are based on accurate and reliable information. 
 
To link records and gain deeper insights, I have joined the employee data with the audit report. This enables us to analyze the data from multiple perspectives and identify areas for improvement. 
 
As part of this project, I have also gathered evidence by building complex queries to seek the truth. By leveraging the power of data analysis and querying, we can uncover hidden patterns, identify trends, and make informed decisions to drive positive change. 
 
I am excited about the progress we have made so far and the potential impact this project can have on Maji Ndogo and our community. It is my firm belief that through data-driven decision-making, we can address the challenges we face and create a sustainable water future. 

# PROJECT PHASE 1: Understanding the database structure through ERD generation.

At this stage of the project, we have made significant progress in analyzing and integrating the data for the Maji Ndogo water project. We have conducted an independent audit of the database, specifically focusing on the water sources recorded in our country. The objective of this audit was to assess the integrity and accuracy of the data stored in the database.

After a thorough examination of the database's records and the associated data entry and modification procedures, we can confirm that the majority of the data aligns with the principles of good governance and data-driven decision-making. This is a testament to the commitment and efforts put forth by our team.

However, during the audit, we did identify some instances where the data was tampered with. These findings require immediate attention to ensure the integrity of the database and the reliability of the information it provides for decision-making and governance.

In the upcoming stage, we will be integrating the auditor's report into the database, which will add valuable insights and recommendations for further improvement. This integration will enhance the overall functionality and effectiveness of the database.

Before adding the ERD, it is essential to review the auditor's report and address any issues it highlights. This ensures that the ERD accurately represents the current state of the database and reflects the modifications required by the audit findings.

The ERD gives us a visual representation of the database structure and its relationships, providing a clearer understanding of the data architecture. It will serve as a valuable reference for future analysis, decision-making, and communication with stakeholders.

Let's proceed with adding the ERD and continue our journey towards transforming Maji Ndogo's narrative into actionable insights and positive change.

![ERD Diagram Without Auditor Report](MajiNdogoModel.png)

# PROJECT PHASE 2: Integrating the Auditor's report

![ERD Diagram With Auditor Report](maji_ndogo_model_AND_AuditorReport.png)

## Step 1: Drop the `auditor_report` table if it already exists, then create a new table with the specified columns.

In [3]:
%%sql

DROP TABLE IF EXISTS `auditor_report`;
CREATE TABLE `auditor_report` (
  `location_id` VARCHAR(32),
  `type_of_water_source` VARCHAR(64),
  `true_water_source_score` INT DEFAULT NULL,
  `statements` VARCHAR(255)
);

 * mysql+pymysql://root:***@localhost:3306/md_water_services


0 rows affected.
0 rows affected.


[]
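
The notebook does not show how the auditor's data was loaded into the new table. The following is only a sketch of one possible approach, assuming the report is available as CSV text. It uses Python's built-in `sqlite3` as a stand-in for the MySQL server, and the two sample rows are copied from the query output shown in this notebook.

```python
import csv
import io
import sqlite3

# Two rows copied from the notebook output; a real import would read the
# auditor's CSV file from disk instead of this in-memory string.
sample_csv = """location_id,type_of_water_source,true_water_source_score,statements
AkLu02044,river,0,"Villagers were touched by the official's interactions, noting their humility, strong work ethic, and respectful attitude."
SoRu35221,river,0,"A photographer's lens captures the queue, though his own struggle for water is a hidden part of the story."
"""

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL server
conn.execute(
    """
    CREATE TABLE auditor_report (
        location_id VARCHAR(32),
        type_of_water_source VARCHAR(64),
        true_water_source_score INT DEFAULT NULL,
        statements VARCHAR(255)
    )
    """
)

# csv.DictReader handles the quoted statements that contain commas.
reader = csv.DictReader(io.StringIO(sample_csv))
rows = [
    (
        r["location_id"],
        r["type_of_water_source"],
        int(r["true_water_source_score"]),
        r["statements"],
    )
    for r in reader
]
conn.executemany("INSERT INTO auditor_report VALUES (?, ?, ?, ?)", rows)
conn.commit()

imported = conn.execute("SELECT COUNT(*) FROM auditor_report").fetchone()[0]
print(imported)
```

Against MySQL, the same parameterized-insert pattern would apply through a MySQL driver, or the table can be filled with MySQL Workbench's import wizard.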

### Let's view what we just imported.

In [21]:
%%sql

SELECT *
FROM auditor_report LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


location_id,type_of_water_source,true_water_source_score,statements
SoRu34980,well,1,"Residents admired the official's commitment to enhancing urban life, praising their cooperative and inclusive approach."
AkRu08112,well,3,"Villagers spoke highly of the official's dedication and genuine interest in their lives, fostering a sense of belonging and appreciation."
AkLu02044,river,0,"Villagers were touched by the official's interactions, noting their humility, strong work ethic, and respectful attitude."
AkHa00421,well,3,"Villagers were moved by the official's visit, praising their hard work, humility, and the profound sense of connection they fostered."
SoRu35221,river,0,"A photographer's lens captures the queue, though his own struggle for water is a hidden part of the story."


## Step 2: To compare the quality scores between the `auditor_report` table and the `water_quality` table, perform a join using the `visits` table as the intermediary.

In [22]:
%%sql

SELECT
  auditor_report.location_id AS audit_location,
  auditor_report.true_water_source_score,
  visits.location_id AS visit_location,
  visits.record_id
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


audit_location,true_water_source_score,visit_location,record_id
SoRu34980,1,SoRu34980,5185
AkRu08112,3,AkRu08112,59367
AkLu02044,0,AkLu02044,37379
AkHa00421,3,AkHa00421,51627
SoRu35221,0,SoRu35221,28758


## Step 3: To retrieve the corresponding scores from the `water_quality` table, perform another join using `record_id` as the connecting key.

In [23]:
%%sql

SELECT
  auditor_report.location_id AS audit_location,
  auditor_report.true_water_source_score,
  visits.location_id AS visit_location,
  visits.record_id,
  water_quality.subjective_quality_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


audit_location,true_water_source_score,visit_location,record_id,subjective_quality_score
SoRu34980,1,SoRu34980,5185,1
AkRu08112,3,AkRu08112,59367,3
AkLu02044,0,AkLu02044,37379,0
AkHa00421,3,AkHa00421,51627,3
SoRu35221,0,SoRu35221,28758,0
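
The two-hop join from Steps 2 and 3 can be tried in isolation with a small, self-contained sketch. It uses an in-memory SQLite database instead of MySQL (an assumption made so the example runs anywhere), trimmed-down tables, and two rows copied from the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript(
    """
    CREATE TABLE auditor_report (location_id TEXT, true_water_source_score INT);
    CREATE TABLE visits (record_id INT, location_id TEXT);
    CREATE TABLE water_quality (record_id INT, subjective_quality_score INT);

    -- Values copied from the outputs shown in this notebook.
    INSERT INTO auditor_report VALUES ('SoRu34980', 1), ('AkRu08112', 3);
    INSERT INTO visits VALUES (5185, 'SoRu34980'), (59367, 'AkRu08112');
    INSERT INTO water_quality VALUES (5185, 1), (59367, 3);
    """
)

joined = conn.execute(
    """
    SELECT ar.location_id, v.record_id,
           ar.true_water_source_score, wq.subjective_quality_score
    FROM auditor_report AS ar
    JOIN visits AS v ON ar.location_id = v.location_id        -- hop 1: location
    JOIN water_quality AS wq ON v.record_id = wq.record_id    -- hop 2: record
    ORDER BY v.record_id
    """
).fetchall()
print(joined)
```

The `visits` table is needed as the bridge because `auditor_report` is keyed by `location_id` while `water_quality` is keyed by `record_id`.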


## Step 4: Clean up the resulting table by removing duplicate columns and renaming them for clarity.

In [24]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0


## Step 5: To check if the auditor's and employee's scores agree, add a WHERE clause to compare the scores.

In [25]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
WHERE
  auditor_report.true_water_source_score = water_quality.subjective_quality_score
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0


## Step 6: To remove duplicates, add `visits.visit_count = 1` to the WHERE clause.

In [26]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
WHERE
  auditor_report.true_water_source_score = water_quality.subjective_quality_score
  AND visits.visit_count = 1
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
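
To see why the `visit_count = 1` filter removes duplicates, consider a location that was visited more than once: the join returns one row per visit, so the single audited location appears several times. The sketch below illustrates this with an in-memory SQLite database. Record 5185 comes from the output above, while record 900001 is a made-up repeat visit added purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript(
    """
    CREATE TABLE auditor_report (location_id TEXT, true_water_source_score INT);
    CREATE TABLE visits (record_id INT, location_id TEXT, visit_count INT);
    CREATE TABLE water_quality (record_id INT, subjective_quality_score INT);

    INSERT INTO auditor_report VALUES ('SoRu34980', 1);
    -- Record 5185 appears in the notebook output; record 900001 is a
    -- hypothetical second visit to the same location.
    INSERT INTO visits VALUES (5185, 'SoRu34980', 1), (900001, 'SoRu34980', 2);
    INSERT INTO water_quality VALUES (5185, 1), (900001, 1);
    """
)

base_query = """
    SELECT v.location_id, v.record_id
    FROM auditor_report AS ar
    JOIN visits AS v ON ar.location_id = v.location_id
    JOIN water_quality AS wq ON v.record_id = wq.record_id
    WHERE ar.true_water_source_score = wq.subjective_quality_score
"""

all_visits = conn.execute(base_query).fetchall()
first_visits = conn.execute(base_query + " AND v.visit_count = 1").fetchall()
print(len(all_visits), len(first_visits))
```

Without the filter the audited location is counted twice; with it, only the first visit remains.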


## Step 7: To analyze the records that are incorrect, modify the WHERE clause to check if the scores are not equal.

In [27]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
WHERE
  auditor_report.true_water_source_score != water_quality.subjective_quality_score
  AND visits.visit_count = 1
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


location_id,record_id,auditor_score,employee_score
AkRu05215,21160,3,10
KiRu29290,7938,3,10
KiHa22748,43140,9,10
SoRu37841,18495,6,10
KiRu27884,33931,1,10
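
One caveat when switching from `=` to `!=`: the `true_water_source_score` column was declared with `DEFAULT NULL`, and in SQL a comparison with NULL is neither true nor false, so a row with a missing score is returned by neither query. The sketch below shows the effect in an in-memory SQLite database. The `scores` table and the record with the NULL score are hypothetical. Whether the real data contains any NULL scores is not shown in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript(
    """
    CREATE TABLE scores (record_id INT, auditor_score INT, employee_score INT);
    -- The first two rows come from the notebook output; the third is a
    -- hypothetical record whose auditor score was never filled in.
    INSERT INTO scores VALUES (5185, 1, 1), (21160, 3, 10), (900002, NULL, 10);
    """
)


def count(where):
    """Count the rows of the toy table that satisfy a WHERE condition."""
    return conn.execute(f"SELECT COUNT(*) FROM scores WHERE {where}").fetchone()[0]


equal = count("auditor_score = employee_score")
not_equal = count("auditor_score != employee_score")
# IS NOT is SQLite's NULL-safe inequality; it treats NULL as a comparable value.
null_safe = count("auditor_score IS NOT employee_score")
print(equal, not_equal, null_safe)
```

In MySQL the NULL-safe form would be written with the `<=>` operator, for example `NOT (auditor_score <=> employee_score)`.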


## Step 8: To check if there are any errors in the `type_of_water_source` column, join the `water_source` table through `visits` using `source_id` as the connecting key, and compare its source type with the one recorded in `auditor_report`.

In [28]:
%%sql

SELECT 
    v.location_id, 
    ar.type_of_water_source AS auditor_source, 
    ws.type_of_water_source AS survey_source, 
    v.record_id, 
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS employee_score
FROM 
    visits v
JOIN 
    water_source ws ON v.source_id = ws.source_id
JOIN 
    auditor_report ar ON v.location_id = ar.location_id
JOIN 
    water_quality wq ON v.record_id = wq.record_id
WHERE
  ar.true_water_source_score != wq.subjective_quality_score
  AND v.visit_count = 1
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


location_id,auditor_source,survey_source,record_id,auditor_score,employee_score
AkRu05215,well,well,21160,3,10
KiRu29290,shared_tap,shared_tap,7938,3,10
KiHa22748,tap_in_home_broken,tap_in_home_broken,43140,9,10
SoRu37841,shared_tap,shared_tap,18495,6,10
KiRu27884,well,well,33931,1,10
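
The query above places the two source types side by side for visual inspection. A possible next step, sketched here under stated assumptions, is to let the database flag disagreements by comparing the two columns in the WHERE clause. The example runs on an in-memory SQLite database. The `source_id` values are placeholders, and the `ZzXx00000` row is an invented mismatch so that the query has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.executescript(
    """
    CREATE TABLE auditor_report (location_id TEXT, type_of_water_source TEXT);
    CREATE TABLE visits (record_id INT, location_id TEXT, source_id TEXT, visit_count INT);
    CREATE TABLE water_source (source_id TEXT, type_of_water_source TEXT);

    -- The first two locations, records and source types come from the
    -- notebook output. The source_id values are placeholders, and the
    -- 'ZzXx00000' row is an invented mismatch.
    INSERT INTO auditor_report VALUES
        ('AkRu05215', 'well'), ('KiRu29290', 'shared_tap'), ('ZzXx00000', 'river');
    INSERT INTO visits VALUES
        (21160, 'AkRu05215', 'src_a', 1),
        (7938, 'KiRu29290', 'src_b', 1),
        (900003, 'ZzXx00000', 'src_c', 1);
    INSERT INTO water_source VALUES
        ('src_a', 'well'), ('src_b', 'shared_tap'), ('src_c', 'well');
    """
)

mismatched_sources = conn.execute(
    """
    SELECT v.location_id, ar.type_of_water_source, ws.type_of_water_source
    FROM visits AS v
    JOIN water_source AS ws ON v.source_id = ws.source_id
    JOIN auditor_report AS ar ON v.location_id = ar.location_id
    WHERE ar.type_of_water_source != ws.type_of_water_source
      AND v.visit_count = 1
    """
).fetchall()
print(mismatched_sources)
```

An empty result from this kind of query on the real tables would indicate that the surveyed and audited source types agree.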


# PROJECT PHASE 4: Linking records to employees

## Step 1: Add the `assigned_employee_id` column from the `visits` table to the query. This will help identify the employees responsible for the incorrect records.

In [29]:
%%sql

SELECT 
    v.location_id, 
    v.record_id,
    v.assigned_employee_id, 
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS employee_score
FROM 
    visits v
JOIN 
    water_source ws ON v.source_id = ws.source_id
JOIN 
    auditor_report ar ON v.location_id = ar.location_id
JOIN 
    water_quality wq ON v.record_id = wq.record_id
WHERE
    ar.true_water_source_score != wq.subjective_quality_score
    AND v.visit_count = 1
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


location_id,record_id,assigned_employee_id,auditor_score,employee_score
AkRu05215,21160,34,3,10
KiRu29290,7938,1,3,10
KiHa22748,43140,1,9,10
SoRu37841,18495,34,6,10
KiRu27884,33931,1,1,10


## Step 2: To fetch the names of the employees who recorded the incorrect records, join the `employee` table to the previous query on `assigned_employee_id`.

In [30]:
%%sql

SELECT
    v.location_id,
    v.record_id,
    e.employee_name,
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS employee_score
FROM
    visits v
JOIN
    water_source ws ON v.source_id = ws.source_id
JOIN
    auditor_report ar ON v.location_id = ar.location_id
JOIN
    water_quality wq ON v.record_id = wq.record_id
JOIN
    employee e ON v.assigned_employee_id = e.assigned_employee_id
WHERE
    ar.true_water_source_score != wq.subjective_quality_score
    AND v.visit_count = 1
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


location_id,record_id,employee_name,auditor_score,employee_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
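
Once names are attached, a natural follow-up is to tally the incorrect records per employee. The sketch below does this with `GROUP BY` on an in-memory SQLite database, using only the five rows displayed above. The table name `incorrect_records` is hypothetical, and because the output above was cut off by `LIMIT 5`, these counts describe the sample only, not the full dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL
conn.execute(
    "CREATE TABLE incorrect_records (location_id TEXT, record_id INT, employee_name TEXT)"
)
# The five rows returned by the LIMIT 5 query above; the full result set
# may contain more rows.
conn.executemany(
    "INSERT INTO incorrect_records VALUES (?, ?, ?)",
    [
        ("AkRu05215", 21160, "Rudo Imani"),
        ("KiRu29290", 7938, "Bello Azibo"),
        ("KiHa22748", 43140, "Bello Azibo"),
        ("SoRu37841", 18495, "Rudo Imani"),
        ("KiRu27884", 33931, "Bello Azibo"),
    ],
)

mistakes_per_employee = conn.execute(
    """
    SELECT employee_name, COUNT(*) AS number_of_mistakes
    FROM incorrect_records
    GROUP BY employee_name
    ORDER BY number_of_mistakes DESC, employee_name
    """
).fetchall()
print(mistakes_per_employee)
```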
