# Charting the course for Maji Ndogo's water future

We’ll start by selecting only the columns we need for now:

• All of the information about the location of a water source is in the location table, specifically the town and province of that water source.

• water_source has the type of source and the number of people served by each source.

• visits has queue information, and connects source_id to location_id. Some sites were visited more than once, so we need to be careful not to include duplicate data (rows where visit_count > 1).

• well_pollution has information about the quality of water from wells only, so we need to keep that in mind when we join this table.


## Questions to answer

1. Are there any specific provinces, or towns where some sources are more abundant?
2. We identified that tap_in_home_broken taps are easy wins. Are there any towns where this is a particular problem?



To answer question 1, we need province_name and town_name from the location table, and type_of_water_source and number_of_people_served from the water_source table.
If we use visits as the table we query from, we can join location where the location_id matches, and water_source where the source_id matches.

In [2]:
%load_ext sql

In [3]:
%sql mysql+pymysql://root:02510251@localhost:3306/md_water_services

In [5]:
%%sql
SELECT
  l.province_name,
  l.town_name,
  v.visit_count,
  v.location_id
FROM visits v
JOIN location l ON v.location_id = l.location_id;

province_name,town_name,visit_count,location_id
Akatsi,Harare,1,AkHa00000
Akatsi,Harare,1,AkHa00001
Akatsi,Harare,1,AkHa00002
Akatsi,Harare,1,AkHa00003
Akatsi,Harare,1,AkHa00004
Akatsi,Harare,1,AkHa00005
Akatsi,Harare,1,AkHa00006
Akatsi,Harare,1,AkHa00007
Akatsi,Harare,1,AkHa00008
Akatsi,Harare,1,AkHa00009


This join gives us the geographic context (province and town) for each visit.
Now let's extend the join to include the source type and the number of people served.

In [8]:
%%sql
SELECT
  l.province_name,
  l.town_name,
  v.visit_count,
  v.location_id,
  ws.type_of_water_source,
  ws.number_of_people_served
FROM visits v
JOIN location l ON v.location_id = l.location_id
JOIN water_source ws ON v.source_id = ws.source_id;


province_name,town_name,visit_count,location_id,type_of_water_source,number_of_people_served
Akatsi,Harare,1,AkHa00000,tap_in_home,956
Akatsi,Harare,1,AkHa00001,tap_in_home_broken,930
Akatsi,Harare,1,AkHa00002,tap_in_home_broken,486
Akatsi,Harare,1,AkHa00003,well,364
Akatsi,Harare,1,AkHa00004,tap_in_home_broken,942
Akatsi,Harare,1,AkHa00005,tap_in_home,736
Akatsi,Harare,1,AkHa00006,tap_in_home,882
Akatsi,Harare,1,AkHa00007,tap_in_home,554
Akatsi,Harare,1,AkHa00008,well,398
Akatsi,Harare,1,AkHa00009,well,346


## Let's filter for first visits only

In [12]:
%%sql
SELECT
  l.province_name,
  l.town_name,
  ws.type_of_water_source,
  ws.number_of_people_served
FROM visits v
JOIN location l ON v.location_id = l.location_id
JOIN water_source ws ON v.source_id = ws.source_id
WHERE v.visit_count = 1;

province_name,town_name,type_of_water_source,number_of_people_served
Sokoto,Ilanga,river,402
Kilimani,Rural,well,252
Hawassa,Rural,shared_tap,542
Akatsi,Lusaka,well,210
Akatsi,Rural,shared_tap,2598
Kilimani,Rural,river,862
Akatsi,Rural,tap_in_home_broken,496
Kilimani,Rural,tap_in_home,562
Hawassa,Zanzibar,well,308
Amanzi,Dahabu,tap_in_home,556
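
To see why the visit_count = 1 filter matters, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the MySQL database. The two sources, their IDs, and the numbers are made up for illustration; only the table and column names come from the queries above.

```python
import sqlite3

# In-memory SQLite database standing in for md_water_services (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE water_source (
    source_id TEXT PRIMARY KEY,
    type_of_water_source TEXT,
    number_of_people_served INTEGER
);
CREATE TABLE visits (source_id TEXT, visit_count INTEGER);

-- Hypothetical rows: S1 was visited three times, S2 once.
INSERT INTO water_source VALUES ('S1', 'shared_tap', 500), ('S2', 'well', 200);
INSERT INTO visits VALUES ('S1', 1), ('S1', 2), ('S1', 3), ('S2', 1);
""")

query = """
SELECT SUM(ws.number_of_people_served)
FROM visits v
JOIN water_source ws ON v.source_id = ws.source_id
{where}
"""

# Without the filter, S1's 500 people are counted once per visit.
all_visits = conn.execute(query.format(where="")).fetchone()[0]
# With the filter, every source contributes exactly once.
first_visits = conn.execute(
    query.format(where="WHERE v.visit_count = 1")
).fetchone()[0]

print(all_visits, first_visits)  # 1700 700
```

In this toy data the unfiltered total is inflated because the repeat visits to S1 each bring along the same number_of_people_served value.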


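## Let's add location_type and time_in_queue

We'll also add the well pollution results. Because well_pollution only covers wells, we use a LEFT JOIN so that every other source type is kept, with an empty results column.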

In [13]:
%%sql
SELECT
  ws.type_of_water_source,
  l.town_name,
  l.province_name,
  l.location_type,
  ws.number_of_people_served,
  v.time_in_queue,
  wp.results
FROM visits v
LEFT JOIN well_pollution wp ON wp.source_id = v.source_id
INNER JOIN location l ON l.location_id = v.location_id
INNER JOIN water_source ws ON ws.source_id = v.source_id
WHERE v.visit_count = 1;

type_of_water_source,town_name,province_name,location_type,number_of_people_served,time_in_queue,results
river,Ilanga,Sokoto,Urban,402,15,
well,Rural,Kilimani,Rural,252,0,Contaminated: Biological
shared_tap,Rural,Hawassa,Rural,542,62,
well,Lusaka,Akatsi,Urban,210,0,Contaminated: Biological
shared_tap,Rural,Akatsi,Rural,2598,28,
river,Rural,Kilimani,Rural,862,9,
tap_in_home_broken,Rural,Akatsi,Rural,496,0,
tap_in_home,Rural,Kilimani,Rural,562,0,
well,Zanzibar,Hawassa,Urban,308,0,Contaminated: Chemical
tap_in_home,Dahabu,Amanzi,Urban,556,0,
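
The choice of LEFT JOIN for well_pollution can be illustrated with a small sketch. It uses Python's sqlite3 module rather than MySQL, and the three source IDs are hypothetical; the point is only to compare the two join types on the same data.

```python
import sqlite3

# In-memory SQLite database with made-up rows (assumption, not the real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (source_id TEXT, visit_count INTEGER);
CREATE TABLE well_pollution (source_id TEXT, results TEXT);

INSERT INTO visits VALUES ('river_1', 1), ('well_1', 1), ('tap_1', 1);
-- Only the well has a pollution record.
INSERT INTO well_pollution VALUES ('well_1', 'Contaminated: Biological');
""")

# LEFT JOIN keeps every visit; non-wells get NULL (None in Python) for results.
left_rows = conn.execute("""
    SELECT v.source_id, wp.results
    FROM visits v
    LEFT JOIN well_pollution wp ON wp.source_id = v.source_id
    ORDER BY v.source_id
""").fetchall()

# INNER JOIN keeps only visits that have a matching pollution record.
inner_rows = conn.execute("""
    SELECT v.source_id, wp.results
    FROM visits v
    INNER JOIN well_pollution wp ON wp.source_id = v.source_id
    ORDER BY v.source_id
""").fetchall()

print(len(left_rows), len(inner_rows))  # 3 1
```

An INNER JOIN here would silently drop the river and the tap, which is why the query above joins well_pollution with LEFT JOIN and the other two tables with INNER JOIN.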


## Let's create a view for simplified analysis

In [14]:
%%sql
CREATE VIEW combined_analysis_table AS
SELECT
  ws.type_of_water_source AS source_type,
  l.town_name,
  l.province_name,
  l.location_type,
  ws.number_of_people_served AS people_served,
  v.time_in_queue,
  wp.results
FROM visits v
LEFT JOIN well_pollution wp ON wp.source_id = v.source_id
INNER JOIN location l ON l.location_id = v.location_id
INNER JOIN water_source ws ON ws.source_id = v.source_id
WHERE v.visit_count = 1;

## Provincial Pivot Table Query

This is the pivot table analysis at the provincial level.

In [16]:
%%sql
WITH province_totals AS (
    -- This CTE calculates the total population served in each province
    SELECT
        province_name,
        SUM(people_served) AS total_ppl_serv
    FROM combined_analysis_table
    GROUP BY province_name
)

SELECT
    ct.province_name,

    -- Each CASE statement calculates the percentage of people served by a source type
    ROUND( (SUM(CASE WHEN source_type = 'river'
                     THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS river,

    ROUND( (SUM(CASE WHEN source_type = 'shared_tap'
                     THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS shared_tap,

    ROUND( (SUM(CASE WHEN source_type = 'tap_in_home'
                     THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS tap_in_home,

    ROUND( (SUM(CASE WHEN source_type = 'tap_in_home_broken'
                     THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS tap_in_home_broken,

    ROUND( (SUM(CASE WHEN source_type = 'well'
                     THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS well

FROM combined_analysis_table ct
JOIN province_totals pt
  ON ct.province_name = pt.province_name
GROUP BY ct.province_name
ORDER BY ct.province_name;


province_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Akatsi,5,49,14,10,23
Amanzi,3,38,28,24,7
Hawassa,4,43,15,15,24
Kilimani,8,47,13,12,20
Sokoto,21,38,16,10,15


### CTE province_totals

Calculates the total population served in each province (SUM(people_served)).

This gives us the denominator for percentage calculations.

### Main Query

Uses CASE statements to sum populations by source type (river, shared_tap, tap_in_home, tap_in_home_broken, well).

Divides each by the province’s total population to get percentages.

Rounds results to whole numbers for readability.

### Output

A pivot-style table with one row per province and columns showing % of population served by each source type.
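
The CTE-plus-CASE pattern described above can be tried on a tiny table. The sketch below runs in SQLite through Python's sqlite3 module; the table name combined, the provinces North and South, and all the numbers are hypothetical, and only two source types are pivoted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE combined (province_name TEXT, source_type TEXT, people_served INTEGER);

-- Made-up data: North serves 500 people in total, South serves 200.
INSERT INTO combined VALUES
    ('North', 'river', 100),
    ('North', 'well',  300),
    ('North', 'well',  100),
    ('South', 'river',  50),
    ('South', 'well',  150);
""")

rows = conn.execute("""
WITH province_totals AS (
    -- Denominator: total people served per province
    SELECT province_name, SUM(people_served) AS total_ppl_serv
    FROM combined
    GROUP BY province_name
)
SELECT
    ct.province_name,
    ROUND(SUM(CASE WHEN source_type = 'river'
                   THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv, 0) AS river,
    ROUND(SUM(CASE WHEN source_type = 'well'
                   THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv, 0) AS well
FROM combined ct
JOIN province_totals pt ON ct.province_name = pt.province_name
GROUP BY ct.province_name
ORDER BY ct.province_name
""").fetchall()

for row in rows:
    print(row)
# ('North', 20.0, 80.0)
# ('South', 25.0, 75.0)
```

For North, the river share is 100 out of 500, or 20%, and the two well rows add up to 400 out of 500, or 80%. Each row's percentages sum to 100 here because every source type in the toy table has its own column.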

## Town-level breakdown of water source types
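We group by province_name and town_name together so that towns sharing a name in different provinces (for example Amina, which appears in both Amanzi and Hawassa) are not merged into a single row.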

In [17]:
%%sql
WITH town_totals AS (
    -- This CTE calculates the total population served in each town
    SELECT
        province_name,
        town_name,
        SUM(people_served) AS total_ppl_serv
    FROM combined_analysis_table
    GROUP BY province_name, town_name
)

SELECT
    ct.province_name,
    ct.town_name,

    -- Each CASE statement calculates the percentage of people served by a source type
    ROUND( (SUM(CASE WHEN source_type = 'river'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS river,

    ROUND( (SUM(CASE WHEN source_type = 'shared_tap'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS shared_tap,

    ROUND( (SUM(CASE WHEN source_type = 'tap_in_home'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home,

    ROUND( (SUM(CASE WHEN source_type = 'tap_in_home_broken'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home_broken,

    ROUND( (SUM(CASE WHEN source_type = 'well'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS well

FROM combined_analysis_table ct
JOIN town_totals tt
  ON ct.province_name = tt.province_name
 AND ct.town_name = tt.town_name
GROUP BY ct.province_name, ct.town_name
ORDER BY ct.town_name;

province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Amanzi,Abidjan,2,53,22,19,4
Kilimani,Amara,8,22,25,16,30
Amanzi,Amina,8,24,3,56,9
Hawassa,Amina,2,14,19,24,42
Amanzi,Asmara,3,49,24,20,4
Sokoto,Bahari,21,11,36,12,20
Amanzi,Bello,3,53,20,22,3
Sokoto,Cheche,19,16,35,12,18
Amanzi,Dahabu,3,37,55,1,4
Hawassa,Deka,3,16,23,21,38
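
The output above lists Amina under both Amanzi and Hawassa. The sketch below shows what would happen if we grouped by town_name alone. It uses SQLite via Python's sqlite3 module, and the people_served values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE combined (province_name TEXT, town_name TEXT, people_served INTEGER);

-- Hypothetical numbers for two different towns that share the name Amina.
INSERT INTO combined VALUES
    ('Amanzi',  'Amina', 100),
    ('Amanzi',  'Amina',  50),
    ('Hawassa', 'Amina', 300);
""")

# Grouping by town name only merges the two towns into one row.
by_town = conn.execute("""
    SELECT town_name, SUM(people_served)
    FROM combined
    GROUP BY town_name
""").fetchall()

# Grouping by province and town keeps them apart.
by_province_town = conn.execute("""
    SELECT province_name, town_name, SUM(people_served)
    FROM combined
    GROUP BY province_name, town_name
    ORDER BY province_name
""").fetchall()

print(by_town)           # [('Amina', 450)]
print(by_province_town)  # [('Amanzi', 'Amina', 150), ('Hawassa', 'Amina', 300)]
```

The merged row would mix two towns with very different source profiles, so the town-level query keeps province_name in both the GROUP BY and the join condition.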


## Store as a Temporary Table
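Storing the town-level result in a temporary table lets later queries read it directly instead of recomputing the aggregation each time. Keep in mind that a temporary table only exists for the current database session.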

In [18]:
%%sql
CREATE TEMPORARY TABLE town_aggregated_water_access AS
WITH town_totals AS (
    SELECT
        province_name,
        town_name,
        SUM(people_served) AS total_ppl_serv
    FROM combined_analysis_table
    GROUP BY province_name, town_name
)
SELECT
    ct.province_name,
    ct.town_name,
    ROUND( (SUM(CASE WHEN source_type = 'tap_in_home'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home,
    ROUND( (SUM(CASE WHEN source_type = 'tap_in_home_broken'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home_broken,
    ROUND( (SUM(CASE WHEN source_type = 'shared_tap'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS shared_tap,
    ROUND( (SUM(CASE WHEN source_type = 'well'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS well,
    ROUND( (SUM(CASE WHEN source_type = 'river'
                     THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS river
FROM combined_analysis_table ct
JOIN town_totals tt
  ON ct.province_name = tt.province_name
 AND ct.town_name = tt.town_name
GROUP BY ct.province_name, ct.town_name
ORDER BY ct.town_name;


## Broken Tap Ratio Query

In [20]:
%%sql
SELECT
    province_name,
    town_name,
    ROUND(tap_in_home_broken / (tap_in_home_broken + tap_in_home) * 100, 0) AS Pct_broken_taps
FROM town_aggregated_water_access
ORDER BY Pct_broken_taps DESC;


province_name,town_name,Pct_broken_taps
Amanzi,Amina,95
Kilimani,Zuri,65
Hawassa,Amina,56
Hawassa,Djenne,55
Kilimani,Rural,53
Amanzi,Bello,52
Amanzi,Pwani,51
Hawassa,Yaounde,51
Akatsi,Lusaka,50
Sokoto,Rural,50
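
As a quick check of the ratio, the arithmetic can be reproduced in plain Python using three rows from the town-level table shown earlier (Amanzi/Amina, Hawassa/Amina, and Amanzi/Dahabu). The helper name pct_broken and the guard for towns with no home taps at all are additions for illustration and are not part of the SQL query above.

```python
from typing import Optional


def pct_broken(tap_in_home: float, tap_in_home_broken: float) -> Optional[int]:
    """Share of installed home taps that are broken, as a whole percentage.

    Returns None when a town has no home taps at all (hypothetical guard;
    it avoids a division by zero).
    """
    installed = tap_in_home + tap_in_home_broken
    if installed == 0:
        return None
    return round(tap_in_home_broken / installed * 100)


# (tap_in_home, tap_in_home_broken) percentages taken from the town-level output
towns = {
    ("Amanzi", "Amina"): (3, 56),
    ("Hawassa", "Amina"): (19, 24),
    ("Amanzi", "Dahabu"): (55, 1),
}

results = {town: pct_broken(*values) for town, values in towns.items()}
for town, pct in results.items():
    print(town, pct)
# ('Amanzi', 'Amina') 95
# ('Hawassa', 'Amina') 56
# ('Amanzi', 'Dahabu') 2
```

The first two values match the Pct_broken_taps column above. Because the inputs are already rounded percentages, the ratio is approximate; computing it from raw people_served sums would be more precise.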
