In [33]:

# Load the SQL extension so that we can run SQL in a Jupyter notebook.
%load_ext sql
# Connect to the local MySQL database with the '%sql' magic command.
%sql mysql+pymysql://root:Dsk264501@localhost:3306/md_water_servicesb
%config SqlMagic.style = '_DEPRECATED_DEFAULT'



# Maji Ndogo Water Services Analysis
 
This notebook analyses the water services situation in Maji Ndogo. We run a series of SQL queries to examine employee information, locations, water sources, and queue times, and then summarise the insights for actionable planning.


## 1. Clustering data to unveil Maji Ndogo's water crisis


In [2]:
%%sql
-- Populate each employee's email as first.last@ndogowater.gov (spaces in the name become dots, all lower case)
UPDATE employee
SET email = CONCAT(LOWER(REPLACE(employee_name, ' ', '.')), '@ndogowater.gov');


 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
56 rows affected.


[]
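The UPDATE builds each address in three steps: replace spaces with dots, lower-case the result, and append the domain. Below is a minimal Python sketch of the same transformation, useful for spot-checking a few rows. `make_email` is a hypothetical helper and not part of the project.

```python
def make_email(employee_name: str, domain: str = "ndogowater.gov") -> str:
    """Mirror CONCAT(LOWER(REPLACE(employee_name, ' ', '.')), '@ndogowater.gov')."""
    # REPLACE(employee_name, ' ', '.'): swap every space for a dot
    local_part = employee_name.replace(" ", ".")
    # LOWER(...): lower-case the result
    local_part = local_part.lower()
    # CONCAT(..., '@ndogowater.gov'): append the domain
    return f"{local_part}@{domain}"


print(make_email("Bello Azibo"))  # bello.azibo@ndogowater.gov
```

Like the SQL version, this would turn leading, trailing, or doubled spaces in a name into stray dots. Trimming the names first avoids that.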

In [3]:
%%sql
-- Remove leading and trailing whitespace from phone numbers
UPDATE employee
SET phone_number = TRIM(phone_number);

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
56 rows affected.


[]

## 2. Honouring the workers

In [34]:
%%sql
-- How many of our employees live in each town?
SELECT
	town_name,
    COUNT(*) AS num_employees
FROM 
	employee
GROUP BY town_name
ORDER BY num_employees DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
9 rows affected.


town_name,num_employees
Rural,29
Dahabu,6
Harare,5
Lusaka,4
Zanzibar,4
Ilanga,3
Serowe,3
Kintampo,1
Yaounde,1


In [5]:
%%sql
-- Top 3 employees based on the visit count
SELECT 
	assigned_employee_id,
    COUNT(*) AS number_of_visits
FROM visits
GROUP BY assigned_employee_id
ORDER BY number_of_visits DESC
LIMIT 3;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
3 rows affected.


assigned_employee_id,number_of_visits
1,3708
30,3676
34,3539
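`LIMIT 3` returns exactly three rows. An employee tied with third place would therefore be dropped without any warning. The Python sketch below uses `collections.Counter` to reproduce the GROUP BY / COUNT / ORDER BY / LIMIT logic, and adds a tie-aware variant for comparison. The visit data is made up for illustration, and `top_n_with_ties` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical visits: one entry per visit, holding the assigned_employee_id
visits = [1, 1, 1, 30, 30, 30, 34, 34, 7, 7]

counts = Counter(visits)  # GROUP BY assigned_employee_id + COUNT(*)

# ORDER BY number_of_visits DESC LIMIT 3 (equal counts keep first-seen order)
top_3 = counts.most_common(3)


def top_n_with_ties(counter: Counter, n: int) -> list:
    """Return the top n entries plus any entries tied with the n-th count."""
    if n <= 0:
        return []
    ranked = counter.most_common()
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    return [(key, c) for key, c in ranked if c >= cutoff]


print(top_3)                       # [(1, 3), (30, 3), (34, 2)]
print(top_n_with_ties(counts, 3))  # [(1, 3), (30, 3), (34, 2), (7, 2)]
```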


In [6]:
%%sql
-- Contact details for the top 3 employees (IDs taken from the previous query)
SELECT 
	employee_name,
    email,
    phone_number
FROM employee
WHERE assigned_employee_id IN (1, 30, 34);

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
3 rows affected.


employee_name,email,phone_number
Bello Azibo,bello.azibo@ndogowater.gov,99643864786
Pili Zola,pili.zola@ndogowater.gov,99822478933
Rudo Imani,rudo.imani@ndogowater.gov,99046972648


## 3. Analysing Locations

In [8]:
%%sql
-- Number of records per town 
SELECT 
	town_name,
    COUNT(*) AS number_of_records
FROM location
GROUP BY town_name
ORDER BY number_of_records DESC
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
5 rows affected.


town_name,number_of_records
Rural,23740
Harare,1650
Amina,1090
Lusaka,1070
Mrembo,990


In [9]:
%%sql
-- Number of records per province
SELECT 
	province_name,
    COUNT(*) AS number_of_records
FROM location
GROUP BY province_name
ORDER BY number_of_records DESC
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
5 rows affected.


province_name,number_of_records
Kilimani,9510
Akatsi,8940
Sokoto,8220
Amanzi,6950
Hawassa,6030


In [10]:
%%sql
-- Number of records per province per town
SELECT 
	province_name,
    town_name,
    COUNT(town_name) AS records_per_town
FROM 
	location
GROUP BY province_name, town_name
ORDER BY province_name ASC, records_per_town DESC
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
5 rows affected.


province_name,town_name,records_per_town
Akatsi,Rural,6290
Akatsi,Lusaka,1070
Akatsi,Harare,800
Akatsi,Kintampo,780
Amanzi,Rural,3100


In [11]:
%%sql
-- Number of records per location type
SELECT 
	location_type,
    COUNT(*) AS num_sources
FROM
	location
GROUP BY location_type;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
2 rows affected.


location_type,num_sources
Urban,15910
Rural,23740


In [12]:
%%sql
-- Expressing the number of records per location type as percentages (Rural: 23740, Urban: 15910)
-- Approximately 60% of our water sources are in rural communities across Maji Ndogo.
SELECT 
	(23740 / (15910 + 23740) * 100) AS pct_rural,
    (15910 / (15910 + 23740) * 100) AS pct_urban;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
1 rows affected.


pct_rural,pct_urban
59.8739,40.1261
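The percentages above come from hard-coded counts, so they go stale if the location table changes. The sketch below derives each share from the table itself and uses a scalar subquery for the denominator. It runs against an in-memory SQLite stand-in filled with the counts reported above, which is an assumption and not the project's MySQL database. The SELECT itself should also be valid MySQL.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL location table (an assumption for this sketch);
# only location_type matters here, populated with the counts reported above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location (location_type TEXT)")
conn.executemany(
    "INSERT INTO location (location_type) VALUES (?)",
    [("Urban",)] * 15910 + [("Rural",)] * 23740,
)

# Each type's share of all records; the scalar subquery supplies the denominator,
# and 100.0 forces floating-point division in SQLite.
rows = conn.execute(
    """
    SELECT location_type,
           ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM location), 2) AS pct
    FROM location
    GROUP BY location_type
    ORDER BY pct DESC
    """
).fetchall()
conn.close()

for location_type, pct in rows:
    print(location_type, pct)
```

Because each label is attached to the group it was computed from, a query like this cannot swap the rural and urban figures.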


## 4. Diving into water sources

In [13]:
%%sql
-- Total number of people served by all water sources
SELECT 
    SUM(number_of_people_served) AS total_num_of_people
FROM water_source;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
1 rows affected.


total_num_of_people
27628140


In [14]:
%%sql
-- Number of water sources of each type
SELECT
	type_of_water_source,
    COUNT(*) AS num_of_records
FROM
	water_source
GROUP BY type_of_water_source
ORDER BY num_of_records DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
5 rows affected.


type_of_water_source,num_of_records
well,17383
tap_in_home,7265
tap_in_home_broken,5856
shared_tap,5767
river,3379


In [15]:
%%sql
-- Average number of people per water source
SELECT 
	type_of_water_source,
    ROUND(AVG(number_of_people_served), 0) AS average_num_of_people
FROM water_source
GROUP BY type_of_water_source
ORDER BY average_num_of_people DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
5 rows affected.


type_of_water_source,average_num_of_people
shared_tap,2071
river,699
tap_in_home_broken,649
tap_in_home,644
well,279


In [17]:
%%sql
-- Total people served by each source type, and their share of everyone served (27,628,140 people)
SELECT 
	type_of_water_source,
    SUM(number_of_people_served) AS population_served,
    ROUND(SUM(number_of_people_served) / 27628140 * 100, 0) AS percentage_people_per_source
FROM water_source
GROUP BY type_of_water_source
ORDER BY percentage_people_per_source DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
5 rows affected.


type_of_water_source,population_served,percentage_people_per_source
shared_tap,11945272,43
well,4841724,18
tap_in_home,4678880,17
tap_in_home_broken,3799720,14
river,2362544,9


## 5. Ranking water sources in order of priority

In [19]:
%%sql
-- Rank water source types by total population served (rank 1 = highest priority)
SELECT
    type_of_water_source,
    SUM(number_of_people_served) AS population_served,
    RANK() OVER(ORDER BY SUM(number_of_people_served) DESC) AS rank_population
FROM water_source
GROUP BY type_of_water_source
ORDER BY population_served DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
5 rows affected.


type_of_water_source,population_served,rank_population
shared_tap,11945272,1
well,4841724,2
tap_in_home,4678880,3
tap_in_home_broken,3799720,4
river,2362544,5


In [21]:
%%sql
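-- RANK(): rank each well, shared tap, and river by people served within its type;
-- tied sources share a rank and the next rank is skipped. Sorted so the lowest-priority sources appear first.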
SELECT
	source_id,
    type_of_water_source,
    number_of_people_served,
    RANK() OVER(PARTITION BY type_of_water_source ORDER BY number_of_people_served DESC) AS priority_rank
FROM water_source
WHERE type_of_water_source IN ('well', 'shared_tap', 'river')
ORDER BY type_of_water_source, priority_rank DESC
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
25 rows affected.


source_id,type_of_water_source,number_of_people_served,priority_rank
AkKi01493224,river,400,3364
SoMa33669224,river,400,3364
SoRu38500224,river,400,3364
SoRu34808224,river,400,3364
HaDj16848224,river,400,3364
AmRu14978224,river,400,3364
KiMr24417224,river,400,3364
KiRu27420224,river,400,3364
SoRu38759224,river,400,3364
SoRu39436224,river,400,3364


In [22]:
%%sql
-- DENSE_RANK(): same ranking, but tied sources share a rank without leaving gaps
SELECT
	source_id,
    type_of_water_source,
    number_of_people_served,
    DENSE_RANK() OVER(PARTITION BY type_of_water_source ORDER BY number_of_people_served DESC) AS priority_rank
FROM water_source
WHERE type_of_water_source IN ('well', 'shared_tap', 'river')
ORDER BY type_of_water_source, priority_rank DESC
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
25 rows affected.


source_id,type_of_water_source,number_of_people_served,priority_rank
AkKi01493224,river,400,300
SoMa33669224,river,400,300
SoRu38500224,river,400,300
SoRu34808224,river,400,300
HaDj16848224,river,400,300
AmRu14978224,river,400,300
KiMr24417224,river,400,300
KiRu27420224,river,400,300
SoRu38759224,river,400,300
SoRu39436224,river,400,300


In [25]:
%%sql
-- ROW_NUMBER(): same ranking, but every source gets a unique number, even when tied
SELECT
	source_id,
    type_of_water_source,
    number_of_people_served,
    ROW_NUMBER() OVER(PARTITION BY type_of_water_source ORDER BY number_of_people_served DESC) AS priority_rank
FROM water_source
WHERE type_of_water_source IN ('well', 'shared_tap', 'river')
ORDER BY type_of_water_source, priority_rank DESC
LIMIT 25;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
25 rows affected.


source_id,type_of_water_source,number_of_people_served,priority_rank
SoRu37948224,river,400,3379
SoRu37633224,river,400,3378
SoRu38345224,river,400,3377
SoRu38759224,river,400,3376
SoRu38500224,river,400,3375
SoRu34808224,river,400,3374
HaDj16848224,river,400,3373
AmRu14978224,river,400,3372
KiMr24417224,river,400,3371
KiRu27420224,river,400,3370
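The three ranking functions differ only in how they treat ties. The sketch below runs all three over five toy sources in an in-memory SQLite database. This assumes SQLite 3.25 or newer, the first release with window functions. The toy table is hypothetical.

```python
import sqlite3  # window functions need SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE water_source (source_id TEXT, number_of_people_served INTEGER)")
conn.executemany(
    "INSERT INTO water_source VALUES (?, ?)",
    [("A", 400), ("B", 400), ("C", 300), ("D", 300), ("E", 200)],  # toy data
)

rows = conn.execute(
    """
    SELECT source_id,
           number_of_people_served,
           RANK()       OVER (ORDER BY number_of_people_served DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY number_of_people_served DESC) AS dense_rnk,
           -- source_id breaks ties so ROW_NUMBER is deterministic
           ROW_NUMBER() OVER (ORDER BY number_of_people_served DESC, source_id) AS row_num
    FROM water_source
    ORDER BY number_of_people_served DESC, source_id
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
# ('A', 400, 1, 1, 1)
# ('B', 400, 1, 1, 2)
# ('C', 300, 3, 2, 3)
# ('D', 300, 3, 2, 4)
# ('E', 200, 5, 3, 5)
```

The same behaviour explains the river outputs above:

- **RANK():** rivers serving 400 people share rank 3364 because 3,363 rivers serve more people.
- **DENSE_RANK():** they share rank 300 because there are 299 distinct higher values.
- **ROW_NUMBER():** they receive unique numbers up to 3379, the total number of rivers.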


## 6. Analysing queues

In [28]:
%%sql
-- How long did the survey take?
SELECT 
	MIN(time_of_record) AS first_date,
    MAX(time_of_record) AS last_date,
    DATEDIFF(MAX(time_of_record), MIN(time_of_record)) AS Survey_length
FROM
	visits;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
1 rows affected.


first_date,last_date,Survey_length
2021-01-01 09:10:00,2023-07-14 13:53:00,924
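The cell below is a quick Python cross-check of the survey length. Like MySQL's DATEDIFF, it compares calendar dates only and ignores the time of day. The two timestamps are the first_date and last_date values returned above. The result of 924 days is roughly two and a half years.

```python
from datetime import datetime

first = datetime(2021, 1, 1, 9, 10)   # first_date from the query above
last = datetime(2023, 7, 14, 13, 53)  # last_date from the query above

# MySQL's DATEDIFF uses only the date parts, so drop the times first
survey_days = (last.date() - first.date()).days
print(survey_days)  # 924
```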


In [29]:
%%sql
-- Average queue time in minutes, excluding visits with no queue (NULLIF turns 0 into NULL, which AVG ignores)
SELECT
	AVG(NULLIF(time_in_queue, 0)) AS Avg_time_in_queue
FROM 
	visits;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
1 rows affected.


Avg_time_in_queue
123.2574
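`NULLIF(time_in_queue, 0)` returns NULL whenever the value is 0, and `AVG` skips NULLs. The 123-minute figure is therefore the average over visits that actually involved queuing. The sketch below shows the difference on four toy rows in an in-memory SQLite table, which stands in for the MySQL visits table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (time_in_queue INTEGER)")
# Toy data: two visits with no queue (0) and two with a queue
conn.executemany("INSERT INTO visits VALUES (?)", [(0,), (0,), (100,), (200,)])

avg_all, avg_queued = conn.execute(
    "SELECT AVG(time_in_queue), AVG(NULLIF(time_in_queue, 0)) FROM visits"
).fetchone()
conn.close()

print(avg_all)     # 75.0  (zeros included)
print(avg_queued)  # 150.0 (zeros turned into NULL, which AVG ignores)
```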


In [30]:
%%sql
-- Average queue time (minutes, excluding zero-queue visits) for each day of the week
SELECT 
	DAYNAME(time_of_record) AS day_of_week,
    ROUND(AVG(NULLIF(time_in_queue,0)),0) AS avg_queue_time
FROM visits
GROUP BY DAYNAME(time_of_record);


 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
7 rows affected.


day_of_week,avg_queue_time
Friday,120
Saturday,246
Sunday,82
Monday,137
Tuesday,108
Wednesday,97
Thursday,105
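This query has no ORDER BY, so MySQL does not guarantee the order of the days it returns. The sketch below takes the averages reported above, puts them into calendar order (Sunday first, as in the hour-by-day pivot in this section), and picks out the busiest and quietest days.

In MySQL, appending `ORDER BY MIN(DAYOFWEEK(time_of_record))` to the query should give the same Sunday-first order, because DAYOFWEEK returns 1 for Sunday. This suggestion is untested against this database.

```python
# Average queue times per day, copied from the query output above
avg_queue_by_day = {
    "Friday": 120, "Saturday": 246, "Sunday": 82, "Monday": 137,
    "Tuesday": 108, "Wednesday": 97, "Thursday": 105,
}

# Calendar order, Sunday first
WEEK = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
ordered = [(day, avg_queue_by_day[day]) for day in WEEK]

busiest = max(ordered, key=lambda pair: pair[1])
quietest = min(ordered, key=lambda pair: pair[1])
print(busiest)   # ('Saturday', 246)
print(quietest)  # ('Sunday', 82)
```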


In [31]:
%%sql
-- Average queue time by hour of the day, busiest hours first
SELECT 
	TIME_FORMAT(TIME(time_of_record), '%H:00') AS hour_of_day,
    ROUND(AVG(NULLIF(time_in_queue,0)),0) AS avg_queue_time
FROM visits
GROUP BY TIME_FORMAT(TIME(time_of_record), '%H:00')
ORDER BY avg_queue_time DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
14 rows affected.


hour_of_day,avg_queue_time
19:00,168
07:00,149
08:00,149
17:00,149
06:00,149
18:00,147
09:00,118
13:00,115
10:00,114
14:00,114


In [32]:
%%sql
-- Pivot: average queue time (minutes) for each hour of the day (rows) and day of the week (columns)
SELECT
	TIME_FORMAT(TIME(time_of_record), '%H:00') AS hour_of_day,
	-- One column per day: CASE yields NULL for other days, and AVG ignores NULLs
	ROUND(AVG(CASE WHEN DAYNAME(time_of_record) = 'Sunday' THEN time_in_queue ELSE NULL END), 0) AS Sunday,
	ROUND(AVG(CASE WHEN DAYNAME(time_of_record) = 'Monday' THEN time_in_queue ELSE NULL END), 0) AS Monday,
	ROUND(AVG(CASE WHEN DAYNAME(time_of_record) = 'Tuesday' THEN time_in_queue ELSE NULL END), 0) AS Tuesday,
	ROUND(AVG(CASE WHEN DAYNAME(time_of_record) = 'Wednesday' THEN time_in_queue ELSE NULL END), 0) AS Wednesday,
	ROUND(AVG(CASE WHEN DAYNAME(time_of_record) = 'Thursday' THEN time_in_queue ELSE NULL END), 0) AS Thursday,
	ROUND(AVG(CASE WHEN DAYNAME(time_of_record) = 'Friday' THEN time_in_queue ELSE NULL END), 0) AS Friday,
	ROUND(AVG(CASE WHEN DAYNAME(time_of_record) = 'Saturday' THEN time_in_queue ELSE NULL END), 0) AS Saturday
FROM
	visits
WHERE
	time_in_queue != 0 
GROUP BY
	hour_of_day
ORDER BY
	hour_of_day;


 * mysql+pymysql://root:***@localhost:3306/md_water_servicesb
14 rows affected.


hour_of_day,Sunday,Monday,Tuesday,Wednesday,Thursday,Friday,Saturday
06:00,79,190,134,112,134,153,247
07:00,82,186,128,111,139,156,247
08:00,86,183,130,119,129,153,247
09:00,84,127,105,94,99,107,252
10:00,83,119,99,89,95,112,259
11:00,78,115,102,86,99,104,236
12:00,78,115,97,88,96,109,239
13:00,81,122,97,98,101,115,242
14:00,83,127,104,92,96,110,244
15:00,83,126,104,88,92,110,248
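The pivot uses conditional aggregation: one `AVG(CASE WHEN ... END)` per day, grouped by hour. The Python sketch below applies the same idea to a few made-up visits. A (hour, day) cell with no queued visits comes out as `None`, just as AVG over no rows gives NULL.

One caveat applies if you compare the two. Python's `round()` rounds exact halves to the nearest even number, while MySQL's ROUND on exact values rounds halves away from zero, so x.5 averages can differ.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical (time_of_record, time_in_queue) pairs
visits = [
    (datetime(2021, 1, 2, 7, 15), 240),  # Saturday, 07:xx
    (datetime(2021, 1, 2, 7, 45), 260),  # Saturday, 07:xx
    (datetime(2021, 1, 3, 7, 5), 80),    # Sunday, 07:xx
    (datetime(2021, 1, 4, 18, 20), 0),   # Monday, no queue: filtered out
]

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

# (hour, day) -> queue times; mirrors GROUP BY hour plus one CASE column per day
cells = defaultdict(list)
for recorded_at, minutes in visits:
    if minutes == 0:  # WHERE time_in_queue != 0
        continue
    hour = recorded_at.strftime("%H:00")  # TIME_FORMAT(TIME(time_of_record), '%H:00')
    day = recorded_at.strftime("%A")      # DAYNAME(time_of_record)
    cells[(hour, day)].append(minutes)

hours = sorted({hour for hour, _ in cells})  # ORDER BY hour_of_day
pivot = {
    hour: {
        day: round(mean(cells[(hour, day)])) if cells.get((hour, day)) else None
        for day in DAYS
    }
    for hour in hours
}

print(pivot["07:00"]["Saturday"])  # 250
print(pivot["07:00"]["Monday"])    # None
```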


## 7. Reporting Insights

## **Water Accessibility and Infrastructure Summary Report**

This survey aimed to identify the water sources people use and determine both the total and average number of users for each source. Additionally, it examined the duration citizens typically spend in queues to access water.

### **Insights**

1. **Most water sources are rural:**
   - About 60% of location records are in rural communities.
2. **Shared taps are heavily used:**
   - 43% of our people are using shared taps.
   - On average, about 2,070 people share one tap.
3. **In-home infrastructure is limited and often broken:**
   - 31% of our population has water infrastructure in their homes.
   - Within that group, 45% face non-functional systems due to issues with pipes, pumps, and reservoirs.
4. **Wells are common but often unsafe:**
   - 18% of our people are using wells.
   - Only 28% of those wells are clean.
5. **Queue times are long:**
   - When citizens do have to queue, they wait about 123 minutes on average.
6. **Queue patterns:**
   - Saturday queues are the longest, averaging about 246 minutes.
   - Queues are longer in the early morning (06:00 to 08:00) and evening (17:00 to 19:00) than at midday.
   - Wednesdays and Sundays have the shortest queues.
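Several of the percentages in this list follow directly from the query outputs in section 4. The cell below is a small worked check in Python. It uses only figures reported above; the variable names are illustrative.

```python
# Figures from the "people served", "count", and "percentage" queries above
total_people = 27_628_140
served = {
    "shared_tap": 11_945_272,
    "well": 4_841_724,
    "tap_in_home": 4_678_880,
    "tap_in_home_broken": 3_799_720,
    "river": 2_362_544,
}
shared_tap_count = 5_767

# Sanity check: the per-type totals add up to the overall total
assert sum(served.values()) == total_people

in_home = served["tap_in_home"] + served["tap_in_home_broken"]
pct_in_home = in_home / total_people * 100                         # ~30.7%
pct_broken_in_home = served["tap_in_home_broken"] / in_home * 100  # ~44.8%
people_per_shared_tap = served["shared_tap"] / shared_tap_count    # ~2071

print(round(pct_in_home), round(pct_broken_in_home), round(people_per_shared_tap))
# 31 45 2071
```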


---

### **Practical Solutions**

1. **For river-dependent communities:**
   - **Short-term:** Dispatch water trucks to provide temporary relief.
   - **Long-term:** Deploy drilling teams to establish clean wells.

2. **For well users:**
   - Install **UV filters** for biological contamination.
   - Use **reverse osmosis filters** for chemically polluted wells.
   - Investigate root causes of pollution for sustainable remediation.

3. **For shared taps:**
   - **Short-term:** Send additional water tankers to high-traffic taps during peak queue times (based on pivot table analysis).
   - **Long-term:** Install more taps to reduce wait times below the UN-recommended 30-minute threshold.

4. **For taps with already short queues (< 30 min):**
   - Recognize the logistical challenge of further reducing wait times.
   - Consider **in-home tap installation** as a long-term, resource-intensive goal.

5. **For broken infrastructure:**
   - Prioritize repairs on high-impact systems (e.g., reservoirs or pipes serving multiple taps).
   - Map commonly affected areas to target interventions efficiently.

---

**Goal:** Reduce water access inequality and queue times while improving infrastructure reliability across Maji Ndogo.