## Weaving the data threads of Maji Ndogo's narrative

#### Step 1: Load Auditor Report CSV in Jupyter

In [2]:
!pip install pandas


In [3]:
import pandas as pd

In [6]:
auditor_df = pd.read_csv("Auditor_report - Auditor_report.csv.csv")

In [7]:
auditor_df.head()

Unnamed: 0,location_id,type_of_water_source,true_water_source_score,statements
0,SoRu34980,well,1,Residents admired the official's commitment to...
1,AkRu08112,well,3,Villagers spoke highly of the official's dedic...
2,AkLu02044,river,0,Villagers were touched by the official's inter...
3,AkHa00421,well,3,"Villagers were moved by the official's visit, ..."
4,SoRu35221,river,0,"A photographer's lens captures the queue, thou..."


In [9]:
%load_ext sql

In [10]:
%sql mysql+pymysql://root:02510251@localhost:3306/md_water_services

In [11]:
%%sql
DROP TABLE IF EXISTS auditor_report;

CREATE TABLE auditor_report (
  location_id VARCHAR(32),
  type_of_water_source VARCHAR(64),
  true_water_source_score INT DEFAULT NULL,
  statements VARCHAR(255)
);


In [12]:
from sqlalchemy import create_engine

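# Replace with your actual credentials
engine = create_engine('mysql+pymysql://root:02510251@localhost:3306/md_water_services')

# Push the data into the auditor_report table.
# Note: if_exists='replace' drops the table created in the previous cell and
# recreates it from the DataFrame's columns. to_sql returns the number of rows written.
auditor_df.to_sql('auditor_report', con=engine, if_exists='replace', index=False)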


1620
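
For readers without a MySQL server at hand, the load step can be sketched with the standard library alone. This is a minimal illustration, not the notebook's method: SQLite stands in for MySQL, `CSV_TEXT` is a hypothetical three-row stand-in for the real file with placeholder statements, and the leading unnamed column is assumed to be a row index that can be skipped.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the CSV file: same column layout as the preview,
# with an unnamed leading index column and placeholder statements.
CSV_TEXT = """,location_id,type_of_water_source,true_water_source_score,statements
0,SoRu34980,well,1,placeholder statement A
1,AkRu08112,well,3,placeholder statement B
2,AkLu02044,river,0,placeholder statement C
"""

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute(
    """
    CREATE TABLE auditor_report (
        location_id VARCHAR(32),
        type_of_water_source VARCHAR(64),
        true_water_source_score INT DEFAULT NULL,
        statements VARCHAR(255)
    )
    """
)

# Keep only the four named columns; the unnamed index column is ignored.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = [
    (
        r["location_id"],
        r["type_of_water_source"],
        int(r["true_water_source_score"]),
        r["statements"],
    )
    for r in reader
]
conn.executemany("INSERT INTO auditor_report VALUES (?, ?, ?, ?)", rows)

loaded = conn.execute("SELECT COUNT(*) FROM auditor_report").fetchone()[0]
columns = [c[1] for c in conn.execute("PRAGMA table_info(auditor_report)")]
print(loaded, columns)
```

Inserting into the pre-declared table keeps the schema under the author's control, which is the main difference from letting the loader recreate the table.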

### Auditor vs Survey Score Comparison

We joined `auditor_report`, `visits`, and `water_quality` to compare the auditor's independently measured scores with the subjective quality scores recorded by the field surveyors. This lays the foundation for identifying discrepancies and patterns in the water quality assessments.


In [17]:
%%sql
SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS surveyor_score
FROM auditor_report
JOIN visits ON auditor_report.location_id = visits.location_id
JOIN water_quality ON visits.record_id = water_quality.record_id
LIMIT 10000;



location_id,record_id,auditor_score,surveyor_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
AkRu05215,21160,3,10


#### Analysis Query 1: Agreement Check
We first restrict the comparison to first visits (`visit_count = 1`), then count how many scores match exactly:

In [26]:
%%sql
SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS surveyor_score
FROM auditor_report
JOIN visits ON auditor_report.location_id = visits.location_id
JOIN water_quality ON visits.record_id = water_quality.record_id
WHERE visits.visit_count = 1
LIMIT 10000;



location_id,record_id,auditor_score,surveyor_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
AkRu05215,21160,3,10
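
The effect of the `visit_count = 1` filter is easiest to see on a tiny example. The sketch below uses made-up data (one location `LocA` with two visits and invented record ids) in an in-memory SQLite database, which is an assumption for illustration only. Without the filter, a location that was visited twice contributes two joined rows against a single auditor score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auditor_report (location_id TEXT, true_water_source_score INT);
    CREATE TABLE visits (record_id INT, location_id TEXT, visit_count INT);
    CREATE TABLE water_quality (record_id INT, subjective_quality_score INT);

    -- One audited location; the surveyor visited it twice (made-up record ids).
    INSERT INTO auditor_report VALUES ('LocA', 3);
    INSERT INTO visits VALUES (101, 'LocA', 1), (102, 'LocA', 2);
    INSERT INTO water_quality VALUES (101, 3), (102, 3);
    """
)

JOIN_SQL = """
    SELECT COUNT(*)
    FROM auditor_report
    JOIN visits ON auditor_report.location_id = visits.location_id
    JOIN water_quality ON visits.record_id = water_quality.record_id
"""

# Same join, with and without the first-visit filter.
all_visits = conn.execute(JOIN_SQL).fetchone()[0]
first_visits = conn.execute(JOIN_SQL + " WHERE visits.visit_count = 1").fetchone()[0]
print(all_visits, first_visits)  # 2 1
```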


In [28]:
%%sql
SELECT COUNT(*) AS matching_scores
FROM auditor_report
JOIN visits ON auditor_report.location_id = visits.location_id
JOIN water_quality ON visits.record_id = water_quality.record_id
WHERE visits.visit_count = 1
  AND auditor_report.true_water_source_score = water_quality.subjective_quality_score;


matching_scores
1518


#### Analysis Query 2: Disagreement Check
And how many scores differ:

In [27]:
%%sql
SELECT COUNT(*) AS mismatched_scores
FROM auditor_report
JOIN visits ON auditor_report.location_id = visits.location_id
JOIN water_quality ON visits.record_id = water_quality.record_id
WHERE visits.visit_count = 1
  AND auditor_report.true_water_source_score != water_quality.subjective_quality_score;


mismatched_scores
102


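#### Analysis Query 3: Score Difference Breakdown
The cell below repeats the mismatch count from Query 2, so it confirms the total of 102 but does not yet show how far apart the scores are. The size of each gap can be read from the mismatched records table that follows: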

In [29]:
%%sql
SELECT COUNT(*) AS mismatched_scores
FROM auditor_report
JOIN visits ON auditor_report.location_id = visits.location_id
JOIN water_quality ON visits.record_id = water_quality.record_id
WHERE visits.visit_count = 1
  AND auditor_report.true_water_source_score != water_quality.subjective_quality_score;



mismatched_scores
102


#### Mismatched Records Table

In [30]:
%%sql
SELECT
  visits.location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS surveyor_score
FROM auditor_report
JOIN visits ON auditor_report.location_id = visits.location_id
JOIN water_quality ON visits.record_id = water_quality.record_id
WHERE visits.visit_count = 1
  AND auditor_report.true_water_source_score != water_quality.subjective_quality_score
LIMIT 10000;


location_id,record_id,auditor_score,surveyor_score
AkRu05215,21160,3,10
KiRu29290,7938,3,10
KiHa22748,43140,9,10
SoRu37841,18495,6,10
KiRu27884,33931,1,10
KiZu31170,17950,9,10
KiZu31370,36864,3,10
AkRu06495,45924,2,10
HaRu17528,30524,1,10
SoRu38331,13192,3,10
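
To see how far apart the scores are, the mismatches can be grouped by the gap between the two scores. The sketch below does this for the ten rows displayed above only, loaded into an in-memory SQLite table. The table name `mismatches` and the `score_gap` alias are illustrative assumptions, and the full 102-row result may be distributed differently.

```python
import sqlite3

# The ten mismatched rows displayed above:
# (location_id, record_id, auditor_score, surveyor_score)
MISMATCHES = [
    ("AkRu05215", 21160, 3, 10),
    ("KiRu29290", 7938, 3, 10),
    ("KiHa22748", 43140, 9, 10),
    ("SoRu37841", 18495, 6, 10),
    ("KiRu27884", 33931, 1, 10),
    ("KiZu31170", 17950, 9, 10),
    ("KiZu31370", 36864, 3, 10),
    ("AkRu06495", 45924, 2, 10),
    ("HaRu17528", 30524, 1, 10),
    ("SoRu38331", 13192, 3, 10),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mismatches "
    "(location_id TEXT, record_id INT, auditor_score INT, surveyor_score INT)"
)
conn.executemany("INSERT INTO mismatches VALUES (?, ?, ?, ?)", MISMATCHES)

# Count rows per gap (surveyor score minus auditor score), largest gap first.
breakdown = conn.execute(
    """
    SELECT surveyor_score - auditor_score AS score_gap, COUNT(*) AS n
    FROM mismatches
    GROUP BY score_gap
    ORDER BY score_gap DESC
    """
).fetchall()
print(breakdown)  # [(9, 2), (8, 1), (7, 4), (4, 1), (1, 2)]
```

In this sample every surveyor score is 10, so each gap is positive: the surveyor score is always higher than the auditor's.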


### Auditor vs Surveyor Score Analysis (First Visit Only)

We filtered the comparison to only include first-time visits (`visit_count = 1`) to match the auditor’s methodology. This revealed:

- 1518 of 1620 scores matched exactly (about 94% agreement)
- 102 scores differed, indicating potential inconsistencies
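
The agreement figure follows directly from the two counts. A short check, using only the numbers reported by the queries above:

```python
matching = 1518    # result of the matching_scores query
mismatched = 102   # result of the mismatched_scores query

total = matching + mismatched
agreement_pct = 100 * matching / total
print(total, round(agreement_pct, 1))  # 1620 93.7
```

The total of 1620 equals the number of rows loaded into `auditor_report`, which suggests that each audited location has exactly one first visit.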



### Step 1: Check for Source Type Agreement

In [32]:
%%sql
SELECT
  v.location_id,
  ar.type_of_water_source AS auditor_source,
  ws.type_of_water_source AS survey_source,
  v.record_id,
  ar.true_water_source_score AS auditor_score,
  wq.subjective_quality_score AS surveyor_score
FROM auditor_report ar
JOIN visits v ON ar.location_id = v.location_id
JOIN water_quality wq ON v.record_id = wq.record_id
JOIN water_source ws ON v.source_id = ws.source_id
WHERE v.visit_count = 1
  AND ar.true_water_source_score != wq.subjective_quality_score;


location_id,auditor_source,survey_source,record_id,auditor_score,surveyor_score
AkRu05215,well,well,21160,3,10
KiRu29290,shared_tap,shared_tap,7938,3,10
KiHa22748,tap_in_home_broken,tap_in_home_broken,43140,9,10
SoRu37841,shared_tap,shared_tap,18495,6,10
KiRu27884,well,well,33931,1,10
KiZu31170,tap_in_home_broken,tap_in_home_broken,17950,9,10
KiZu31370,shared_tap,shared_tap,36864,3,10
AkRu06495,well,well,45924,2,10
HaRu17528,well,well,30524,1,10
SoRu38331,shared_tap,shared_tap,13192,3,10


### Step 2: Confirm Integrity of Source Type
From our results, `auditor_source` and `survey_source` matched across all records with mismatched scores. That means:

- No water source types were misclassified
- Our previous analyses using `type_of_water_source` are still trustworthy
- Only the scores need re-evaluation
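
Rather than comparing the two source columns by eye, the check can be reduced to a single count of rows where they differ. The sketch below applies that idea to the ten rows displayed above, using an in-memory SQLite table. The table name `source_check` is an illustrative assumption, and a result of zero here covers only the displayed sample.

```python
import sqlite3

# (location_id, auditor_source, survey_source) for the ten rows displayed above.
ROWS = [
    ("AkRu05215", "well", "well"),
    ("KiRu29290", "shared_tap", "shared_tap"),
    ("KiHa22748", "tap_in_home_broken", "tap_in_home_broken"),
    ("SoRu37841", "shared_tap", "shared_tap"),
    ("KiRu27884", "well", "well"),
    ("KiZu31170", "tap_in_home_broken", "tap_in_home_broken"),
    ("KiZu31370", "shared_tap", "shared_tap"),
    ("AkRu06495", "well", "well"),
    ("HaRu17528", "well", "well"),
    ("SoRu38331", "shared_tap", "shared_tap"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_check (location_id TEXT, auditor_source TEXT, survey_source TEXT)"
)
conn.executemany("INSERT INTO source_check VALUES (?, ?, ?)", ROWS)

# Zero disagreements means the source types are consistent in this sample.
disagreements = conn.execute(
    "SELECT COUNT(*) FROM source_check WHERE auditor_source != survey_source"
).fetchone()[0]
print(disagreements)  # 0
```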

### Step 3: Clean Up Query (Remove Water Source Join)
Now that we’ve validated the source types, we can simplify the query again:

In [33]:
%%sql
SELECT
  v.location_id,
  v.record_id,
  ar.true_water_source_score AS auditor_score,
  wq.subjective_quality_score AS surveyor_score
FROM auditor_report ar
JOIN visits v ON ar.location_id = v.location_id
JOIN water_quality wq ON v.record_id = wq.record_id
WHERE v.visit_count = 1
  AND ar.true_water_source_score != wq.subjective_quality_score;


location_id,record_id,auditor_score,surveyor_score
AkRu05215,21160,3,10
KiRu29290,7938,3,10
KiHa22748,43140,9,10
SoRu37841,18495,6,10
KiRu27884,33931,1,10
KiZu31170,17950,9,10
KiZu31370,36864,3,10
AkRu06495,45924,2,10
HaRu17528,30524,1,10
SoRu38331,13192,3,10


### Source Type Validation

We compared `type_of_water_source` between the auditor’s report and survey data for all mismatched scores. All records showed consistent classification, confirming:

- Our previous analyses based on source type remain valid
- Discrepancies are limited to scoring, not source identification


### Gathering some evidence

#### Step 1: Define the Incorrect_records CTE
This captures all mismatched scores from first-time visits and links them to employee names.

In [37]:
%%sql
WITH Incorrect_records AS (
  SELECT
    v.location_id,
    v.record_id,
    e.employee_name,
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS surveyor_score
  FROM auditor_report ar
  JOIN visits v ON ar.location_id = v.location_id
  JOIN water_quality wq ON v.record_id = wq.record_id
  JOIN employee e ON v.assigned_employee_id = e.assigned_employee_id
  WHERE v.visit_count = 1
    AND ar.true_water_source_score != wq.subjective_quality_score
)
SELECT * FROM Incorrect_records;


location_id,record_id,employee_name,auditor_score,surveyor_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


#### Step 2: Get Unique List of Employees

In [40]:
%%sql
WITH Incorrect_records AS (
  SELECT
    v.location_id,
    v.record_id,
    e.employee_name,
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS surveyor_score
  FROM auditor_report ar
  JOIN visits v ON ar.location_id = v.location_id
  JOIN water_quality wq ON v.record_id = wq.record_id
  JOIN employee e ON v.assigned_employee_id = e.assigned_employee_id
  WHERE v.visit_count = 1
    AND ar.true_water_source_score != wq.subjective_quality_score
)
SELECT DISTINCT employee_name
FROM Incorrect_records;



employee_name
Rudo Imani
Bello Azibo
Zuriel Matembo
Yewande Ebele
Jengo Tumaini
Farai Nia
Malachi Mavuso
Makena Thabo
Lalitha Kaburi
Gamba Shani


#### Step 3: Count Mistakes per Employee

In [43]:
%%sql

WITH Incorrect_records AS (
  SELECT
    v.location_id,
    v.record_id,
    e.employee_name,
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS surveyor_score
  FROM auditor_report ar
  JOIN visits v ON ar.location_id = v.location_id
  JOIN water_quality wq ON v.record_id = wq.record_id
  JOIN employee e ON v.assigned_employee_id = e.assigned_employee_id
  WHERE v.visit_count = 1
    AND ar.true_water_source_score != wq.subjective_quality_score
)

SELECT
  employee_name,
  COUNT(*) AS number_of_mistakes
FROM Incorrect_records
GROUP BY employee_name
ORDER BY number_of_mistakes DESC;


employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
Rudo Imani,5
Farai Nia,4
Enitan Zuri,4
Yewande Ebele,3
Jengo Tumaini,3
Makena Thabo,3


### Employee Score Discrepancy Analysis

We defined a CTE (`Incorrect_records`) to collect mismatched scores between the auditor and the surveyors. This allowed us to:

- Identify 17 employees linked to discrepancies
- Count how many mismatches each employee recorded
- Reveal patterns of inconsistency
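
The per-employee count and the number of distinct employees can both be derived from the same set of mismatched records. The sketch below runs the grouping on only the ten `Incorrect_records` rows displayed earlier, in an in-memory SQLite table, so its counts are sample counts and not the real totals.

```python
import sqlite3

# employee_name for each of the ten Incorrect_records rows displayed earlier.
EMPLOYEES = [
    "Rudo Imani", "Bello Azibo", "Bello Azibo", "Rudo Imani", "Bello Azibo",
    "Zuriel Matembo", "Yewande Ebele", "Bello Azibo", "Jengo Tumaini",
    "Zuriel Matembo",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incorrect_records (employee_name TEXT)")
conn.executemany(
    "INSERT INTO incorrect_records VALUES (?)", [(name,) for name in EMPLOYEES]
)

# Mistakes per employee, most mistakes first; ties broken alphabetically.
per_employee = conn.execute(
    """
    SELECT employee_name, COUNT(*) AS number_of_mistakes
    FROM incorrect_records
    GROUP BY employee_name
    ORDER BY number_of_mistakes DESC, employee_name
    """
).fetchall()

distinct_employees = conn.execute(
    "SELECT COUNT(DISTINCT employee_name) FROM incorrect_records"
).fetchone()[0]
print(distinct_employees, per_employee)
```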


### Identifying Employees with Above-Average Discrepancies

#### Step 1: Create the Incorrect_records View
This view captures all mismatched scores and includes the auditor’s statements:

In [45]:
%%sql
CREATE VIEW Incorrect_records AS
SELECT
  auditor_report.location_id,
  visits.record_id,
  employee.employee_name,
  auditor_report.true_water_source_score AS auditor_score,
  wq.subjective_quality_score AS surveyor_score,
  auditor_report.statements AS statements
FROM auditor_report
JOIN visits ON auditor_report.location_id = visits.location_id
JOIN water_quality AS wq ON visits.record_id = wq.record_id
JOIN employee ON employee.assigned_employee_id = visits.assigned_employee_id
WHERE visits.visit_count = 1
  AND auditor_report.true_water_source_score != wq.subjective_quality_score;


#### Step 2: Define error_count CTE
This counts how many mismatches each employee has:

In [46]:
%%sql
WITH error_count AS (
  SELECT
    employee_name,
    COUNT(*) AS number_of_mistakes
  FROM Incorrect_records
  GROUP BY employee_name
)
SELECT * FROM error_count;


employee_name,number_of_mistakes
Rudo Imani,5
Bello Azibo,26
Zuriel Matembo,17
Yewande Ebele,3
Jengo Tumaini,3
Farai Nia,4
Malachi Mavuso,21
Makena Thabo,3
Lalitha Kaburi,7
Gamba Shani,3


#### Step 3: Calculate the Average Mistake Count

In [47]:
%%sql
WITH error_count AS (
  SELECT
    employee_name,
    COUNT(*) AS number_of_mistakes
  FROM Incorrect_records
  GROUP BY employee_name
)
SELECT AVG(number_of_mistakes) AS avg_error_count_per_empl
FROM error_count;


avg_error_count_per_empl
6.0


#### Step 4: Identify Suspect Employees

In [49]:
%%sql
WITH error_count AS (
  SELECT
    employee_name,
    COUNT(*) AS number_of_mistakes
  FROM Incorrect_records
  GROUP BY employee_name
)
SELECT
  employee_name,
  number_of_mistakes
FROM error_count
WHERE number_of_mistakes > (
  SELECT AVG(number_of_mistakes)
  FROM error_count
)
ORDER BY number_of_mistakes DESC;


employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7


### Suspect List: Above-Average Discrepancy Analysis

We created a view (`Incorrect_records`) to track mismatched scores and auditor statements. Then we:

1. Counted mistakes per employee (`error_count`)
2. Calculated the average mistake count
3. Flagged employees with above-average discrepancies

This forms our initial suspect list, a data-driven foundation for further investigation.
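
The threshold can be checked by hand from figures already reported: 102 mismatches spread over 17 employees gives an average of 6.0. The sketch below applies that threshold to the counts displayed in the notebook outputs. Only part of the 17-row result is displayed, so the dictionary is incomplete. Since the sorted output shows the ten highest counts, the remaining employees cannot exceed the threshold.

```python
# Mistake counts as displayed in the notebook outputs (partial list).
displayed_counts = {
    "Bello Azibo": 26,
    "Malachi Mavuso": 21,
    "Zuriel Matembo": 17,
    "Lalitha Kaburi": 7,
    "Rudo Imani": 5,
    "Farai Nia": 4,
    "Enitan Zuri": 4,
    "Yewande Ebele": 3,
    "Jengo Tumaini": 3,
    "Makena Thabo": 3,
    "Gamba Shani": 3,
}

total_mistakes = 102   # mismatched first-visit records
employee_count = 17    # employees linked to at least one mismatch

average = total_mistakes / employee_count
suspects = [name for name, count in displayed_counts.items() if count > average]
print(average, suspects)
# 6.0 ['Bello Azibo', 'Malachi Mavuso', 'Zuriel Matembo', 'Lalitha Kaburi']
```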


### Final Query: Flagging Suspect Employees Based on Score Discrepancies and Statements

In [51]:
%%sql
-- Step 1: Count mistakes per employee
WITH error_count AS (
  SELECT
    employee_name,
    COUNT(*) AS number_of_mistakes
  FROM Incorrect_records
  GROUP BY employee_name
),

-- Step 2: Identify employees with above-average mistake count
suspect_list AS (
  SELECT
    employee_name,
    number_of_mistakes
  FROM error_count
  WHERE number_of_mistakes > (
    SELECT AVG(number_of_mistakes) FROM error_count
  )
)

-- Step 3: Filter Incorrect_records for suspect employees and suspicious statements
SELECT
  employee_name,
  location_id,
  statements
FROM Incorrect_records
WHERE employee_name IN (SELECT employee_name FROM suspect_list)
  AND statements LIKE '%cash%';


employee_name,location_id,statements
Zuriel Matembo,SoRu38331,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Bello Azibo,KiIs23853,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The mention of cash changing hands further tainted their perception.
Bello Azibo,HaSe21323,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Zuriel Matembo,AkRu05880,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The allusion to cash changing hands deepened their skepticism.
Bello Azibo,KiRu27065,Villagers expressed their discomfort with an official who displayed a haughty demeanor and negligence. The mention of cash transactions deepened their growing sense of unease.
Malachi Mavuso,KiRu25347,Villagers expressed their discontent with an official who appeared dismissive and neglectful. The mention of cash changing hands added to their growing sense of distrust.
Zuriel Matembo,SoIl32575,Villagers recounted unsettling encounters with an official known for their arrogance and avoidance of responsibilities. The mention of cash changing hands added to their apprehension and distrust.
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Lalitha Kaburi,AkRu07310,"Villagers spoke of their unsettling encounters with an official who seemed indifferent and uninterested, hinting at potential improprieties involving cash exchanges."


### What This Reveals
We’ve flagged Zuriel Matembo, Malachi Mavuso, Bello Azibo, and Lalitha Kaburi as having:

- Above-average discrepancies in water quality scoring
- Auditor-recorded statements that mention cash, suggesting possible bribery or misconduct

No other employees had statements mentioning "cash", which strengthens the case for a focused investigation.
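
The statement about other employees can be verified by inverting the suspect filter, that is, swapping `IN` for `NOT IN` in the final query and expecting an empty result. The sketch below shows the pattern on made-up data in an in-memory SQLite database. The employee names, statements, and the helper `cash_rows` are hypothetical, and the toy outcome says nothing about the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE incorrect_records (employee_name TEXT, statements TEXT);
    -- Made-up rows: two flagged employees and one who is not flagged.
    INSERT INTO incorrect_records VALUES
        ('Suspect A', 'Villagers mentioned cash changing hands.'),
        ('Suspect A', 'Villagers described a dismissive official.'),
        ('Suspect B', 'A reference to cash transactions.'),
        ('Other C', 'Villagers praised the official.');

    CREATE TABLE suspect_list (employee_name TEXT);
    INSERT INTO suspect_list VALUES ('Suspect A'), ('Suspect B');
    """
)


def cash_rows(membership):
    """Return cash-related rows for employees IN or NOT IN suspect_list."""
    if membership not in ("IN", "NOT IN"):
        raise ValueError("membership must be 'IN' or 'NOT IN'")
    sql = f"""
        SELECT employee_name, statements
        FROM incorrect_records
        WHERE employee_name {membership} (SELECT employee_name FROM suspect_list)
          AND statements LIKE '%cash%'
    """
    return conn.execute(sql).fetchall()


suspect_hits = cash_rows("IN")
other_hits = cash_rows("NOT IN")
print(len(suspect_hits), len(other_hits))  # 2 0
```

An empty `NOT IN` result on the real view is what would support the claim that mentions of cash are confined to the flagged employees.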

### Integrity Audit Summary

We analyzed discrepancies between auditor and surveyor scores, then filtered for employees with above-average mistake counts. Cross-referencing with auditor statements revealed:

- **4 employees** (Zuriel Matembo, Malachi Mavuso, Bello Azibo, and Lalitha Kaburi) with both statistical anomalies and suspicious qualitative evidence
- **No other employees** had statements mentioning "cash"

This is not conclusive proof of corruption, but it is serious enough to warrant escalation. 
