#  Maji Ndogo Project – Part 3: Integrating the Auditor’s Report

In this phase, we integrate the **auditor’s findings** with our internal `md_water_services` database.  
To ensure accurate comparison, we must understand the **relationships** between tables — especially around employees, site visits, and water source quality.



##  Objectives

- Connect to the `md_water_services` database  
- Review the database schema and relationships  
- Identify how employee and visit data link to water sources  
- Integrate the **auditor’s report** for comparison  
- Connect the report to the **evidence gathered** during field visits  



##  Step 1: Connect to the Database
We start by establishing a connection to our local MySQL database using the `ipython-sql` extension.


In [11]:
%reload_ext sql

In [12]:
%sql mysql+pymysql://root:password@localhost/md_water_services
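
The connection string above embeds the password directly in the notebook. As a minimal sketch, assuming the credentials are exported as environment variables (the `MD_DB_USER`, `MD_DB_PASSWORD`, and `MD_DB_HOST` names are hypothetical, not part of this project), the same URL could be assembled like this:

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(database: str = "md_water_services") -> str:
    """Build a SQLAlchemy-style MySQL URL from environment variables.

    The MD_DB_* variable names are hypothetical; adapt them to your setup.
    """
    user = os.environ.get("MD_DB_USER", "root")
    password = os.environ.get("MD_DB_PASSWORD", "")
    host = os.environ.get("MD_DB_HOST", "localhost")
    # quote_plus escapes characters such as '@' or '/' that would break the URL.
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


connection_url = build_mysql_url()
```

The resulting `connection_url` can then be handed to the `%sql` magic, for example through IPython's `$connection_url` variable expansion, so the password never appears in a saved cell.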

##  Step 2: View All Tables in the Database
Let’s list all available tables in the `md_water_services` schema.


In [14]:
%%sql
SHOW TABLES;


 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
8 rows affected.


Tables_in_md_water_services
data_dictionary
employee
global_water_access
location
visits
water_quality
water_source
well_pollution


Our database contains the following tables:

- `data_dictionary` – Describes each dataset and its contents  
- `employee` – Records details about employees responsible for inspections or maintenance  
- `global_water_access` – Summary of access metrics for different regions  
- `location` – Defines province, district, and town information  
- `visits` – Field visits conducted by employees and/or auditors  
- `water_quality` – Subjective quality score recorded for each visit (`record_id`)  
- `water_source` – All registered sources (boreholes, wells, taps, tanks)  
- `well_pollution` – Data on contamination and potential pollution risks  


##  Step 3: Inspect Key Table Structures
We'll now examine a few tables that are central to integrating the auditor's report.


In [15]:
%%sql
DESCRIBE employee;

 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
8 rows affected.


Field,Type,Null,Key,Default,Extra
assigned_employee_id,int,NO,PRI,,
employee_name,varchar(255),YES,,,
phone_number,varchar(15),YES,,,
email,varchar(255),YES,,,
address,varchar(255),YES,,,
province_name,varchar(255),YES,,,
town_name,varchar(255),YES,,,
position,varchar(255),YES,,,


In [16]:
%%sql
DESCRIBE visits;

 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
7 rows affected.


Field,Type,Null,Key,Default,Extra
record_id,int,NO,PRI,,
location_id,varchar(255),YES,MUL,,
source_id,varchar(510),YES,MUL,,
time_of_record,datetime,YES,,,
visit_count,int,YES,,,
time_in_queue,int,YES,,,
assigned_employee_id,int,YES,MUL,,


In [17]:
%%sql
DESCRIBE water_source;

 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
3 rows affected.


Field,Type,Null,Key,Default,Extra
source_id,varchar(510),NO,PRI,,
type_of_water_source,varchar(255),YES,,,
number_of_people_served,int,YES,,,


In [18]:
%%sql
DESCRIBE water_quality;

 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
3 rows affected.


Field,Type,Null,Key,Default,Extra
record_id,int,NO,PRI,,
subjective_quality_score,int,YES,,,
visit_count,int,YES,,,


In [19]:
%%sql
DESCRIBE well_pollution;

 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
6 rows affected.


Field,Type,Null,Key,Default,Extra
source_id,varchar(258),YES,MUL,,
date,datetime,YES,,,
description,varchar(255),YES,,,
pollutant_ppm,float,YES,,,
biological,float,YES,,,
results,varchar(255),YES,,,


From this, we can identify key relationships such as:

- `visits.assigned_employee_id` → links to `employee.assigned_employee_id`  
- `visits.source_id` → links to `water_source.source_id`  
- `water_quality.record_id` → links to `visits.record_id`  
- `well_pollution.source_id` → links to `water_source.source_id`

These connections will be critical when merging internal visit data with the auditor’s findings.
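
To see how these tables chain together, here is a small sketch that uses the column names from the `DESCRIBE` outputs above. It runs against an in-memory SQLite database rather than MySQL so that it works without a server, and every row in it is made up for illustration. Since `water_quality` has no `source_id` column in its `DESCRIBE` output, the sketch reaches it through `visits.record_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (assigned_employee_id INTEGER PRIMARY KEY, employee_name TEXT);
CREATE TABLE water_source (source_id TEXT PRIMARY KEY, type_of_water_source TEXT);
CREATE TABLE visits (record_id INTEGER PRIMARY KEY, source_id TEXT, assigned_employee_id INTEGER);
CREATE TABLE water_quality (record_id INTEGER PRIMARY KEY, subjective_quality_score INTEGER);

-- Made-up rows for illustration only.
INSERT INTO employee VALUES (1, 'Employee A');
INSERT INTO water_source VALUES ('SRC001', 'well');
INSERT INTO visits VALUES (100, 'SRC001', 1);
INSERT INTO water_quality VALUES (100, 7);
""")

# Each join follows one of the key relationships: employee, source, then quality.
rows = conn.execute("""
SELECT v.record_id, e.employee_name, ws.type_of_water_source, wq.subjective_quality_score
FROM visits v
JOIN employee e ON v.assigned_employee_id = e.assigned_employee_id
JOIN water_source ws ON v.source_id = ws.source_id
JOIN water_quality wq ON v.record_id = wq.record_id
""").fetchall()
print(rows)  # → [(100, 'Employee A', 'well', 7)]
```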


##  Step 3b: Visualize the Entity Relationship Diagram (ERD)

Before writing integration queries, we visualize how the tables relate.  
The ERD below illustrates the relationships between employees, visits, water sources, and quality data.

![ERD Diagram](Downloads/erd_md_water_services.png)

*Figure 1: The ERD shows how employee, visits, water_source, water_quality, well_pollution, and location tables are connected.*

If you’re generating your own ERD:
1. Open **MySQL Workbench**  
2. Navigate to **Database → Reverse Engineer**  
3. Select the `md_water_services` schema  
4. Generate and export the ERD as `erd_md_water_services.png`



##  Step 4: Download and Load the Auditor’s Report

We have been provided with an **auditor’s report** (e.g., `auditors_report.csv`).
We’ll download it, inspect it, and load it into MySQL as a new table called `auditors_report`.

This dataset contains the columns used in the integration that follows:
- `location_id`
- `type_of_water_source`
- `true_water_source_score`
- `statements`

Let’s load it into MySQL.
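
The loading step itself is not shown in this notebook. One possible shape for it is sketched below, with several assumptions: the column names are taken from the view definition later in this notebook, the column types and sample rows are made up, and an in-memory SQLite database stands in for MySQL so the example runs anywhere. `load_auditors_report` is a hypothetical helper, not an existing function.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for auditors_report.csv; only the column
# names are taken from the view definition in this notebook.
SAMPLE_CSV = """location_id,type_of_water_source,true_water_source_score,statements
LOC001,well,3,"Villagers reported long queues."
LOC002,shared_tap,8,"No complaints recorded."
"""


def load_auditors_report(conn, csv_file):
    """Create auditors_report and bulk-insert rows from an open CSV file."""
    # Column types and lengths are assumptions for this sketch.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS auditors_report (
            location_id VARCHAR(32),
            type_of_water_source VARCHAR(64),
            true_water_source_score INTEGER,
            statements VARCHAR(255)
        )
    """)
    reader = csv.DictReader(csv_file)
    rows = [
        (r["location_id"], r["type_of_water_source"],
         int(r["true_water_source_score"]), r["statements"])
        for r in reader
    ]
    # SQLite uses '?' placeholders; PyMySQL would use '%s' instead.
    conn.executemany("INSERT INTO auditors_report VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_auditors_report(conn, io.StringIO(SAMPLE_CSV))
print(loaded, "rows loaded")  # → 2 rows loaded
```

Against the real database, the same function body would take a PyMySQL connection's cursor and an `open("auditors_report.csv", newline="")` file handle, with the placeholders switched to `%s`.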



##  Step 5: Integrate the Auditor’s Report with Internal Data

Now that the **Auditor’s Report** has been successfully loaded into the database,  
we can integrate it with our internal operational tables to establish a unified data view.

This integration will help us:
- Link auditor findings to **specific employees** and **visits**.  
- Associate audits with **locations** and **water sources**.  
- Enable performance analysis and data-driven insights for future decision-making.

###  Tables Involved
| Table Name | Purpose | Key Fields |
|-------------|----------|-------------|
| `auditors_report` | Contains external audit evaluations and feedback. | `location_id`, `type_of_water_source`, `true_water_source_score`, `statements` |
| `employee` | Lists field employees conducting inspections and maintenance. | `assigned_employee_id`, `employee_name`, `position`, `town_name` |
| `visits` | Tracks field visits, employee assignments, and timestamps. | `source_id`, `assigned_employee_id`, `visit_count`, `time_of_record` |
| `water_source` | Describes physical water points and their usage. | `source_id`, `type_of_water_source`, `number_of_people_served` |
| `location` | Geographical and administrative information. | `location_id`, `town_name`, `province_name` |



###  Create a Unified Data View

We will create a SQL **VIEW** named `audit_employee_source_view` that consolidates data across these tables.

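This view links:
- The **auditor’s report** with **locations** (`location_id`),
- **Water sources** using their **type** (`type_of_water_source`),
- **Visits** using `source_id`,
- And **employees** using `assigned_employee_id`.

Because `type_of_water_source` is not unique in `water_source`, the type-based join pairs each audit row with every source of the same type, so the view holds far more rows than the report itself. Row counts taken from this view are counts of joined rows, not of individual audits.

This will allow consistent access to integrated audit information for analysis in Step 6.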




In [123]:
%%sql

CREATE OR REPLACE VIEW audit_employee_source_view AS
SELECT 
    ar.location_id,
    ar.type_of_water_source AS audited_water_source,
    ar.true_water_source_score AS audit_score,
    ar.statements AS auditor_feedback,
    l.town_name,
    l.province_name,
    ws.source_id,
    ws.number_of_people_served,
    v.time_of_record AS visit_date,
    v.visit_count,
    e.employee_name,
    e.position,
    e.town_name AS employee_town
FROM auditors_report ar
JOIN location l 
    ON ar.location_id = l.location_id
JOIN water_source ws 
    ON ar.type_of_water_source = ws.type_of_water_source
JOIN visits v 
    ON ws.source_id = v.source_id
JOIN employee e 
    ON v.assigned_employee_id = e.assigned_employee_id;


 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
0 rows affected.


[]

In [124]:
%%sql
SHOW FULL TABLES WHERE TABLE_TYPE = 'VIEW';

 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
1 rows affected.


Tables_in_md_water_services,Table_type
audit_employee_source_view,VIEW


In [125]:
%%sql
SELECT * 
FROM audit_employee_source_view
LIMIT 10;

 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
10 rows affected.


location_id,audited_water_source,audit_score,auditor_feedback,town_name,province_name,source_id,number_of_people_served,visit_date,visit_count,employee_name,position,employee_town
HaDe16549,tap_in_home_broken,9,"Residents were moved by the official's proactive approach to urban development, praising their hard work and involvement.",Deka,Hawassa,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
KiIs23493,tap_in_home_broken,9,"With an approachable demeanor, the official created an atmosphere of warmth and mutual respect that the villagers appreciated.",Isiqalo,Kilimani,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
HaRu18849,tap_in_home_broken,9,"The official's respectful and hardworking nature left a positive mark on the villagers, evoking a strong sense of connection.",Rural,Hawassa,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
KiIs23726,tap_in_home_broken,9,"With sincerity, the official created a sense of unity and familial connection that left a lasting, positive impression on the villagers.",Isiqalo,Kilimani,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
AmPw12595,tap_in_home_broken,9,The official's interactions were marked by genuine kindness and a desire to make a positive impact on the villagers' lives.,Pwani,Amanzi,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
SoBa31815,tap_in_home_broken,9,"A local tailor's creativity is stifled by frustration, his work impacted by the corruption that taints the water.",Bahari,Sokoto,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
SoRu38122,tap_in_home_broken,9,"The official's interactions resonated deeply with the villagers, leaving a lasting impression of respect and camaraderie.",Rural,Sokoto,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
HaSe21267,tap_in_home_broken,9,"With sincerity, the official created a sense of unity and familial connection that left a lasting, positive impression on the villagers.",Serowe,Hawassa,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
KiRu29798,tap_in_home_broken,9,"With sincerity, the official created a sense of unity and familial connection that left a lasting, positive impression on the villagers.",Rural,Kilimani,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
SoCh32020,tap_in_home_broken,9,"During their visit, the official's humility and genuine interactions created an atmosphere of unity and cooperation in the urban setting.",Cheche,Sokoto,AkHa00001224,930,2021-05-27 13:26:00,1,Bello Azibo,Field Surveyor,Rural
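
In the preview above, every row carries the same `source_id` and employee even though the locations differ, which is the signature of a join on a non-unique column. The sketch below reproduces the effect on made-up rows in an in-memory SQLite database and compares it with a join through `visits.location_id`. The `visit_count = 1` filter is an assumption (that the first visit is the one being audited), not something the report confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auditors_report (location_id TEXT, type_of_water_source TEXT, true_water_source_score INTEGER);
CREATE TABLE water_source (source_id TEXT PRIMARY KEY, type_of_water_source TEXT);
CREATE TABLE visits (record_id INTEGER PRIMARY KEY, location_id TEXT, source_id TEXT, visit_count INTEGER);

-- Made-up rows: two audited locations, three sources, one visit each.
INSERT INTO auditors_report VALUES ('LOC001', 'well', 3), ('LOC002', 'well', 6);
INSERT INTO water_source VALUES ('SRC001', 'well'), ('SRC002', 'well'), ('SRC003', 'river');
INSERT INTO visits VALUES (1, 'LOC001', 'SRC001', 1), (2, 'LOC002', 'SRC002', 1), (3, 'LOC003', 'SRC003', 1);
""")

# Join on the non-unique type column: every 'well' audit matches every 'well' source.
by_type = conn.execute("""
SELECT COUNT(*) FROM auditors_report ar
JOIN water_source ws ON ar.type_of_water_source = ws.type_of_water_source
JOIN visits v ON ws.source_id = v.source_id
""").fetchone()[0]

# Join through visits.location_id: each audit matches only visits at its own location.
by_location = conn.execute("""
SELECT COUNT(*) FROM auditors_report ar
JOIN visits v ON ar.location_id = v.location_id AND v.visit_count = 1
JOIN water_source ws ON v.source_id = ws.source_id
""").fetchone()[0]

print(by_type, by_location)  # → 4 2
```

With two audits in the toy report, the location-based join returns two rows while the type-based join returns four. On the full tables the same multiplication would be much larger.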




##  Step 6: Analyze the Integrated Data

Now that we have successfully created the `audit_employee_source_view`,  
we can perform analytical queries to uncover insights from the integrated dataset.

This step focuses on identifying **patterns, performance issues**, and **regional trends**  
based on the auditor’s external assessments and internal operational data.



###  Key Analysis Questions

1. **Regional Performance:**  
   Which provinces or towns have the highest number of low-scoring audits?

2. **Employee Evaluation:**  
   Which employees are most frequently associated with low or high audit scores?

3. **Source Reliability:**  
   Do specific types of water sources (e.g., wells, taps, rivers) receive lower scores?

4. **Operational Alignment:**  
   How do auditor findings correlate with visit frequency or water source size?



###  Query 1: Regional Audit Summary

The following SQL query aggregates the view by **province** and **town**,  
showing the **row count** (`total_audits`, computed as `COUNT(*)` over the view's rows), the **low-score count**, and the **average audit score**.

Audits with a score of `≤ 2` are treated as “needs improvement.”


In [127]:
%%sql

SELECT 
    province_name,
    town_name,
    COUNT(*) AS total_audits,
    SUM(CASE WHEN audit_score <= 2 THEN 1 ELSE 0 END) AS low_score_count,
    ROUND(AVG(audit_score), 2) AS avg_audit_score
FROM 
    audit_employee_source_view
GROUP BY 
    province_name, town_name
ORDER BY 
    low_score_count DESC
LIMIT 10;


 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
10 rows affected.


province_name,town_name,total_audits,low_score_count,avg_audit_score
Kilimani,Rural,4240105,2184439,2.76
Akatsi,Rural,5200953,1916936,3.16
Hawassa,Rural,2381116,1532595,2.54
Sokoto,Rural,3431607,1495352,3.04
Amanzi,Rural,1548841,479478,3.85
Kilimani,Mrembo,624302,406079,2.43
Kilimani,Isiqalo,510710,347660,2.65
Kilimani,Harare,513357,312894,2.67
Kilimani,Amara,457548,236116,3.11
Sokoto,Ilanga,370973,225491,2.8
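
The ranking above is by absolute `low_score_count`, which favours the largest groups. As a small sketch, the figures from the output can be re-ranked by the share of low-scoring rows instead. The numbers are copied from the table above, and `low_score_share` is a hypothetical helper name.

```python
# (province, town, total_audits, low_score_count) copied from the Query 1 output above.
query1_rows = [
    ("Kilimani", "Rural", 4240105, 2184439),
    ("Akatsi", "Rural", 5200953, 1916936),
    ("Hawassa", "Rural", 2381116, 1532595),
    ("Sokoto", "Rural", 3431607, 1495352),
    ("Amanzi", "Rural", 1548841, 479478),
    ("Kilimani", "Mrembo", 624302, 406079),
    ("Kilimani", "Isiqalo", 510710, 347660),
    ("Kilimani", "Harare", 513357, 312894),
    ("Kilimani", "Amara", 457548, 236116),
    ("Sokoto", "Ilanga", 370973, 225491),
]


def low_score_share(rows):
    """Return (province, town, share) tuples, highest low-score share first."""
    shares = [(prov, town, low / total) for prov, town, total, low in rows]
    return sorted(shares, key=lambda r: r[2], reverse=True)


ranked = low_score_share(query1_rows)
for province, town, share in ranked[:3]:
    print(f"{province:<10}{town:<10}{share:.1%}")
```

Ranked this way, Kilimani's Isiqalo moves to the top of these ten rows, while the large rural groups drop down the list.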




###  Query 2: Employee Performance Overview

This query helps identify which employees are most frequently linked with  
low or high audit scores — highlighting training needs or performance excellence.


In [129]:
%%sql

SELECT 
    employee_name,
    position,
    employee_town,
    COUNT(*) AS total_audits,
    SUM(CASE WHEN audit_score <= 2 THEN 1 ELSE 0 END) AS low_score_count,
    ROUND(AVG(audit_score), 2) AS avg_audit_score
FROM 
    audit_employee_source_view
GROUP BY 
    employee_name, position, employee_town
ORDER BY 
    avg_audit_score ASC
LIMIT 10;


 * mysql+pymysql://root:***@localhost/md_water_services
   sqlite:///dam_levels-a-3987.db
10 rows affected.


employee_name,position,employee_town,total_audits,low_score_count,avg_audit_score
Lesedi Kofi,Field Surveyor,Rural,63272,31080,2.78
Wambui Jabali,Field Surveyor,Rural,95983,48084,2.85
Enitan Zuri,Field Surveyor,Zanzibar,1441435,689844,2.93
Sanaa Tendaji,Field Surveyor,Dahabu,1390118,663356,2.97
Nia Furaha,Field Surveyor,Harare,307360,138856,2.97
Vuyisile Kwame,Field Surveyor,Rural,1049096,495660,2.98
Cheche Buhle,Field Surveyor,Rural,486283,230084,2.98
Yewande Ebele,Field Surveyor,Rural,1266991,607004,2.98
Makena Thabo,Field Surveyor,Rural,867236,401744,2.98
Isoke Amani,Field Surveyor,Rural,337394,160796,3.02




###  Interpretation Notes

- **High `low_score_count`** → Indicates areas or employees requiring attention.  
- **Low average score (≤ 2.0)** → Suggests possible systemic or maintenance issues.  
- **Balanced or high average scores** → Reflect consistent service quality and effective supervision.

These insights will guide the **Maji Ndogo Operations Team** in prioritizing maintenance,  
improving staff allocation, and enhancing overall water service reliability.



 **Next Step:** Visualize these insights using Python (`pandas`, `matplotlib`, or `seaborn`)  
to create dashboards and trend plots for management review.
