# Introduction

Dear Team,

I would like to express my gratitude to the team for uncovering the corruption among our field workers and bringing it to my attention. As you are aware, I have zero tolerance for individuals who prioritize their own interests over the collective well-being, and I have taken the necessary actions to address this issue.

Our journey continues as we strive to convert our data into actionable knowledge. It is not enough to merely understand the situation; it is the translation of that understanding into informed decisions that will truly make a difference.

In the upcoming phase, your role will be to transform our raw data into meaningful insights, providing crucial information to decision-makers. This will enable us to identify the necessary resources, plan our budgets effectively, and address areas that require immediate attention. Our goal is not just to analyze data; we aim to communicate it in a language that all stakeholders involved in this mission can comprehend and act upon.

Additionally, we will be creating job lists for our engineers. Their expertise will be invaluable in overcoming the challenges we face, but they can only perform their duties efficiently when they have clear, data-driven instructions.

Please remember that each step you take in this process contributes to a larger objective - the transformation of Maji Ndogo. Your dedication and diligence are crucial in shaping a brighter future for our community. Thank you for being an integral part of this journey.

Best regards,

Aziza

In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook.
# If you get an error here, make sure that mysql and pymysql are installed correctly.

%load_ext sql

In [2]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your connection password and `db_name` with your database name.
# If you get an error here, make sure the database name and password are correct.

%sql mysql+pymysql://root:paschalugwu@localhost:3306/md_water_services

# PROJECT PHASE 1: Joining pieces together

## Step 1: Let's start by joining location to visits.

In [3]:
%%sql

SELECT
    l.province_name,
    l.town_name,
    v.visit_count,
    v.location_id
FROM
    location l
JOIN
    visits v ON l.location_id = v.location_id
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


province_name,town_name,visit_count,location_id
Akatsi,Harare,1,AkHa00000
Akatsi,Harare,1,AkHa00001
Akatsi,Harare,1,AkHa00002
Akatsi,Harare,1,AkHa00003
Akatsi,Harare,1,AkHa00004
Akatsi,Harare,1,AkHa00005
Akatsi,Harare,1,AkHa00006
Akatsi,Harare,1,AkHa00007
Akatsi,Harare,1,AkHa00008
Akatsi,Harare,1,AkHa00009


## Step 2: Now, we can join the water_source table on the key shared between water_source and visits.

In [4]:
%%sql

SELECT
    l.province_name,
    l.town_name,
    v.visit_count,
    v.location_id,
    ws.type_of_water_source,
    ws.number_of_people_served
FROM
    location l
JOIN
    visits v ON l.location_id = v.location_id
JOIN
    water_source ws ON v.source_id = ws.source_id
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


province_name,town_name,visit_count,location_id,type_of_water_source,number_of_people_served
Akatsi,Harare,1,AkHa00000,tap_in_home,956
Akatsi,Harare,1,AkHa00001,tap_in_home_broken,930
Akatsi,Harare,1,AkHa00002,tap_in_home_broken,486
Akatsi,Harare,1,AkHa00003,well,364
Akatsi,Harare,1,AkHa00004,tap_in_home_broken,942
Akatsi,Harare,1,AkHa00005,tap_in_home,736
Akatsi,Harare,1,AkHa00006,tap_in_home,882
Akatsi,Harare,1,AkHa00007,tap_in_home,554
Akatsi,Harare,1,AkHa00008,well,398
Akatsi,Harare,1,AkHa00009,well,346


## Step 3: Note that there are rows where visit_count > 1. These are sites where our surveyors collected additional information, but all of the visits happened at the same source/location. For example, add this to your query: WHERE visits.location_id = 'AkHa00103'

In [7]:
%%sql

SELECT
    l.province_name,
    l.town_name,
    v.visit_count,
    v.location_id,
    ws.type_of_water_source,
    ws.number_of_people_served
FROM
    location l
JOIN
    visits v ON l.location_id = v.location_id
JOIN
    water_source ws ON v.source_id = ws.source_id
WHERE
    v.location_id = 'AkHa00103'
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
8 rows affected.


province_name,town_name,visit_count,location_id,type_of_water_source,number_of_people_served
Akatsi,Harare,1,AkHa00103,shared_tap,3340
Akatsi,Harare,2,AkHa00103,shared_tap,3340
Akatsi,Harare,3,AkHa00103,shared_tap,3340
Akatsi,Harare,4,AkHa00103,shared_tap,3340
Akatsi,Harare,5,AkHa00103,shared_tap,3340
Akatsi,Harare,6,AkHa00103,shared_tap,3340
Akatsi,Harare,7,AkHa00103,shared_tap,3340
Akatsi,Harare,8,AkHa00103,shared_tap,3340


## Step 4: There you can see what I mean. For the single location AkHa00103, there are multiple records, one per visit. If we aggregate, we will include these repeated rows, so our results will be incorrect. To fix this, we can just select rows where visits.visit_count = 1.

## Task: Remove WHERE visits.location_id = 'AkHa00103' and add the visits.visit_count = 1 as a filter.

In [8]:
%%sql

SELECT
    l.province_name,
    l.town_name,
    v.visit_count,
    v.location_id,
    ws.type_of_water_source,
    ws.number_of_people_served
FROM
    location l
JOIN
    visits v ON l.location_id = v.location_id
JOIN
    water_source ws ON v.source_id = ws.source_id
WHERE
    v.visit_count = 1
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


province_name,town_name,visit_count,location_id,type_of_water_source,number_of_people_served
Sokoto,Ilanga,1,SoIl32582,river,402
Kilimani,Rural,1,KiRu28935,well,252
Hawassa,Rural,1,HaRu19752,shared_tap,542
Akatsi,Lusaka,1,AkLu01628,well,210
Akatsi,Rural,1,AkRu03357,shared_tap,2598
Kilimani,Rural,1,KiRu29315,river,862
Akatsi,Rural,1,AkRu05234,tap_in_home_broken,496
Kilimani,Rural,1,KiRu28520,tap_in_home,562
Hawassa,Zanzibar,1,HaZa21742,well,308
Amanzi,Dahabu,1,AmDa12214,tap_in_home,556
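
To see why repeat visits distort an aggregate, here is a minimal sketch. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for MySQL, and the two sources and their visit rows are made-up toy data, not survey records.

```python
import sqlite3

# Toy data (an assumption for illustration): source 'A' was visited three
# times, source 'B' once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE water_source (source_id TEXT PRIMARY KEY, number_of_people_served INTEGER);
    CREATE TABLE visits (source_id TEXT, visit_count INTEGER);
    INSERT INTO water_source VALUES ('A', 3340), ('B', 956);
    INSERT INTO visits VALUES ('A', 1), ('A', 2), ('A', 3), ('B', 1);
""")

query = """
    SELECT SUM(ws.number_of_people_served)
    FROM visits v
    JOIN water_source ws ON v.source_id = ws.source_id
"""

# Each repeat visit repeats the joined source row, so its people are counted again.
unfiltered = conn.execute(query).fetchone()[0]

# Keeping only the first visit leaves exactly one row per source.
filtered = conn.execute(query + " WHERE v.visit_count = 1").fetchone()[0]

print(unfiltered, filtered)
```

The unfiltered sum counts source 'A' three times, while the filtered sum counts each source once.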


## Step 5: Ok, now that we verified that the table is joined correctly, we can remove the location_id and visit_count columns. Add the location_type column from location and time_in_queue from visits to our results set.

In [10]:
%%sql

SELECT
    l.province_name,
    l.town_name,
    ws.type_of_water_source,
    l.location_type,
    ws.number_of_people_served,
    v.time_in_queue
FROM
    location l
JOIN
    visits v ON l.location_id = v.location_id
JOIN
    water_source ws ON v.source_id = ws.source_id
WHERE
    v.visit_count = 1
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


province_name,town_name,type_of_water_source,location_type,number_of_people_served,time_in_queue
Sokoto,Ilanga,river,Urban,402,15
Kilimani,Rural,well,Rural,252,0
Hawassa,Rural,shared_tap,Rural,542,62
Akatsi,Lusaka,well,Urban,210,0
Akatsi,Rural,shared_tap,Rural,2598,28
Kilimani,Rural,river,Rural,862,9
Akatsi,Rural,tap_in_home_broken,Rural,496,0
Kilimani,Rural,tap_in_home,Rural,562,0
Hawassa,Zanzibar,well,Urban,308,0
Amanzi,Dahabu,tap_in_home,Urban,556,0


## Step 6: Last one! Now we need to grab the results from the well_pollution table. This one is a bit trickier. The well_pollution table contains data only for wells. If we just use JOIN, we will do an inner join, so only records that are in both well_pollution AND visits will be kept. We have to use a LEFT JOIN so that the results from the well_pollution table are joined for well sources and are NULL for all of the rest. Play around with the different JOIN operations to make sure you understand why we used LEFT JOIN.

In [15]:
%%sql

SELECT
    water_source.type_of_water_source,
    location.town_name,
    location.province_name,
    location.location_type,
    water_source.number_of_people_served,
    visits.time_in_queue,
    well_pollution.results
FROM
    visits
LEFT JOIN
    well_pollution ON well_pollution.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
INNER JOIN
    water_source ON water_source.source_id = visits.source_id
WHERE
    visits.visit_count = 1
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


type_of_water_source,town_name,province_name,location_type,number_of_people_served,time_in_queue,results
river,Ilanga,Sokoto,Urban,402,15,
well,Rural,Kilimani,Rural,252,0,Contaminated: Biological
shared_tap,Rural,Hawassa,Rural,542,62,
well,Lusaka,Akatsi,Urban,210,0,Contaminated: Biological
shared_tap,Rural,Akatsi,Rural,2598,28,
river,Rural,Kilimani,Rural,862,9,
tap_in_home_broken,Rural,Akatsi,Rural,496,0,
tap_in_home,Rural,Kilimani,Rural,562,0,
well,Zanzibar,Hawassa,Urban,308,0,Contaminated: Chemical
tap_in_home,Dahabu,Amanzi,Urban,556,0,
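
The difference between the two join types can be checked on a tiny example. This is a sketch that assumes an in-memory SQLite database in place of MySQL and three invented source IDs; only the join behaviour is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (source_id TEXT, visit_count INTEGER);
    CREATE TABLE well_pollution (source_id TEXT, results TEXT);
    -- Hypothetical sources: only the well has a pollution record.
    INSERT INTO visits VALUES ('river_1', 1), ('well_1', 1), ('tap_1', 1);
    INSERT INTO well_pollution VALUES ('well_1', 'Clean');
""")

template = """
    SELECT v.source_id, wp.results
    FROM visits v
    {join} well_pollution wp ON wp.source_id = v.source_id
    ORDER BY v.source_id
"""

# INNER JOIN keeps only sources that appear in both tables.
inner_rows = conn.execute(template.format(join="INNER JOIN")).fetchall()

# LEFT JOIN keeps every visit and fills missing pollution results with NULL (None).
left_rows = conn.execute(template.format(join="LEFT JOIN")).fetchall()

print(inner_rows)
print(left_rows)
```

With the inner join, the river and the tap disappear from the result, which is why the analysis above would lose every non-well source.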


## Step 7: So this table contains the data we need for this analysis. Now we want to analyse the data in the results set. We can either create a CTE, and then query it, or in my case, I'll make it a VIEW so it is easier to share with you. I'll call it the combined_analysis_table.

In [17]:
%%sql

CREATE VIEW combined_analysis_table AS
-- This view assembles data from different tables into one to simplify analysis
SELECT
    water_source.type_of_water_source,
    location.town_name,
    location.province_name,
    location.location_type,
    water_source.number_of_people_served,
    visits.time_in_queue,
    well_pollution.results
FROM
    visits
LEFT JOIN
    well_pollution ON well_pollution.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
INNER JOIN
    water_source ON water_source.source_id = visits.source_id
WHERE
    visits.visit_count = 1;


 * mysql+pymysql://root:***@localhost:3306/md_water_services
0 rows affected.


[]

# PROJECT PHASE 2: The last analysis

## Step 1: We're building another pivot table! This time, we want to break down our data into provinces or towns and source types. If we understand where the problems are, and what we need to improve at those locations, we can make an informed decision on where to send our repair teams.

In [22]:
%%sql

WITH province_totals AS (
    SELECT
        province_name,
        SUM(number_of_people_served) AS total_ppl_serv
    FROM
        combined_analysis_table
    GROUP BY
        province_name
)
SELECT
    ct.province_name,
    ROUND((SUM(CASE WHEN type_of_water_source = 'river' THEN number_of_people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS river,
    ROUND((SUM(CASE WHEN type_of_water_source = 'shared_tap' THEN number_of_people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS shared_tap,
    ROUND((SUM(CASE WHEN type_of_water_source = 'tap_in_home' THEN number_of_people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS tap_in_home,
    ROUND((SUM(CASE WHEN type_of_water_source = 'tap_in_home_broken' THEN number_of_people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS tap_in_home_broken,
    ROUND((SUM(CASE WHEN type_of_water_source = 'well' THEN number_of_people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS well
FROM
    combined_analysis_table ct
JOIN
    province_totals pt ON ct.province_name = pt.province_name
GROUP BY
    ct.province_name
ORDER BY
    ct.province_name;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


province_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Akatsi,5,49,14,10,23
Amanzi,3,38,28,24,7
Hawassa,4,43,15,15,24
Kilimani,8,47,13,12,20
Sokoto,21,38,16,10,15


## Note: province_totals is a CTE that calculates the sum of all the people surveyed grouped by province. Let's take a look.

In [24]:
%%sql

WITH province_totals AS (
    SELECT
        province_name,
        SUM(number_of_people_served) AS total_ppl_serv
    FROM
        combined_analysis_table
    GROUP BY
        province_name
)
SELECT
    *
FROM
    province_totals;

 * mysql+pymysql://root:***@localhost:3306/md_water_services


5 rows affected.


province_name,total_ppl_serv
Sokoto,5774434
Kilimani,6584764
Hawassa,3843810
Akatsi,5993306
Amanzi,5431826
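
The pivot pattern (a totals CTE joined back to the rows, with one `SUM(CASE ...)` per source type) can be tried on a small scale. The sketch below assumes SQLite instead of MySQL and uses two invented provinces, `P1` and `P2`, with only two source types to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE combined_analysis_table (
        province_name TEXT,
        type_of_water_source TEXT,
        number_of_people_served INTEGER
    );
    -- Toy rows, not survey data.
    INSERT INTO combined_analysis_table VALUES
        ('P1', 'river', 100), ('P1', 'well', 300),
        ('P2', 'river', 40), ('P2', 'well', 60), ('P2', 'well', 100);
""")

pivot = conn.execute("""
    WITH province_totals AS (
        SELECT province_name, SUM(number_of_people_served) AS total_ppl_serv
        FROM combined_analysis_table
        GROUP BY province_name
    )
    SELECT
        ct.province_name,
        ROUND(SUM(CASE WHEN type_of_water_source = 'river'
                       THEN number_of_people_served ELSE 0 END)
              * 100.0 / pt.total_ppl_serv, 0) AS river,
        ROUND(SUM(CASE WHEN type_of_water_source = 'well'
                       THEN number_of_people_served ELSE 0 END)
              * 100.0 / pt.total_ppl_serv, 0) AS well
    FROM combined_analysis_table ct
    JOIN province_totals pt ON ct.province_name = pt.province_name
    GROUP BY ct.province_name, pt.total_ppl_serv
    ORDER BY ct.province_name
""").fetchall()

for row in pivot:
    print(row)
```

Each row's percentages add up to 100 because every source type in the toy data has its own column; in the real table the same holds up to rounding.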


## Step 2: Water Source Analysis by Province

This visualization provides an insightful analysis of water sources in different provinces. It highlights the population served by various source types and allows us to identify patterns and make informed decisions to improve water access and infrastructure.

### Population Served by Water Source Type

The chart below represents the percentage of the population served by each water source type in different provinces:

<img src="water_sources_visual.png" alt="Water Source Analysis" width="600">

### Key Findings

1. **River Water Usage**: The province of Sokoto has the highest population relying on river water as their primary source. This indicates a need for urgent action to provide safe filtered water from wells to ensure the health and well-being of the population.

2. **Taps Usage**: The majority of water from Amanzi comes from taps, but half of these home taps don't work because the infrastructure is broken. We need to send out engineering teams to look at the infrastructure in Amanzi first. Fixing a large pump, treatment plant or reservoir means that thousands of people will have running water. This means they will also not have to queue for water, so we improve two things at once.

3. **Well Usage**: The province of Hawassa has a substantial population relying on wells as their primary water source. It is essential to maintain and monitor the quality and sustainability of these wells to meet the water needs of the population.

## Actionable Insights

Based on the analysis, the following actions are recommended:

- Send drilling equipment to Sokoto to provide safe filtered water from wells, reducing the reliance on river water.
- Prioritize infrastructure repairs in Amanzi to fix broken taps, ensuring a steady supply of water to thousands of people.
- Conduct regular inspections and maintenance of wells in Hawassa to ensure their sustainability and quality.

By taking these actions, we can improve water access, address infrastructure challenges, and enhance the overall well-being of the population in different provinces.

## Step 3: Chidi requires us to aggregate the data per town, taking into account the issue of duplicate town names in different provinces. If we group by town name alone, SQL cannot tell these towns apart and combines their results. To overcome this, we need to group the data by province first and then by town, so that towns with the same name stay distinct because they belong to different provinces.

In [6]:
%%sql

WITH town_totals AS (
    SELECT
        province_name,
        town_name,
        SUM(number_of_people_served) AS total_ppl_serv
    FROM
        combined_analysis_table
    GROUP BY
        province_name,
        town_name
)
SELECT
    ct.province_name,
    ct.town_name,
    ROUND((SUM(CASE WHEN type_of_water_source = 'river' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS river,
    ROUND((SUM(CASE WHEN type_of_water_source = 'shared_tap' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS shared_tap,
    ROUND((SUM(CASE WHEN type_of_water_source = 'tap_in_home' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home,
    ROUND((SUM(CASE WHEN type_of_water_source = 'tap_in_home_broken' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home_broken,
    ROUND((SUM(CASE WHEN type_of_water_source = 'well' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS well
FROM
    combined_analysis_table ct
JOIN
    town_totals tt ON ct.province_name = tt.province_name AND ct.town_name = tt.town_name
GROUP BY
    ct.province_name,
    ct.town_name
ORDER BY
    ct.town_name;

 * mysql+pymysql://root:***@localhost:3306/md_water_services


31 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Amanzi,Abidjan,2,53,22,19,4
Kilimani,Amara,8,22,25,16,30
Amanzi,Amina,8,24,3,56,9
Hawassa,Amina,2,14,19,24,42
Amanzi,Asmara,3,49,24,20,4
Sokoto,Bahari,21,11,36,12,20
Amanzi,Bello,3,53,20,22,3
Sokoto,Cheche,19,16,35,12,18
Amanzi,Dahabu,3,37,55,1,4
Hawassa,Deka,3,16,23,21,38
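
The effect of grouping by town alone versus by province and town can be shown with a few rows. This sketch assumes SQLite as a stand-in for MySQL; the people counts are invented, and only the duplicated town name Amina mirrors the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE combined_analysis_table (
        province_name TEXT,
        town_name TEXT,
        number_of_people_served INTEGER
    );
    -- 'Amina' exists in two provinces; the numbers are toy values.
    INSERT INTO combined_analysis_table VALUES
        ('Amanzi', 'Amina', 100),
        ('Hawassa', 'Amina', 200),
        ('Amanzi', 'Bello', 50);
""")

# Grouping by town name alone merges the two different Aminas.
by_town = conn.execute("""
    SELECT town_name, SUM(number_of_people_served)
    FROM combined_analysis_table
    GROUP BY town_name
    ORDER BY town_name
""").fetchall()

# Grouping by province and town keeps them apart.
by_province_and_town = conn.execute("""
    SELECT province_name, town_name, SUM(number_of_people_served)
    FROM combined_analysis_table
    GROUP BY province_name, town_name
    ORDER BY province_name, town_name
""").fetchall()

print(by_town)
print(by_province_and_town)
```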


## Step 4: Before we jump into the data, let's store it as a temporary table first, so it is quicker to access. Temporary tables in SQL are a nice way to store the results of a complex query. We run the query once, and the results are stored as a table. The catch? If you close the database connection, it deletes the table, so you have to run it again each time you start working in MySQL. The benefit is that we can use the table to do more calculations, without running the whole query each time.

In [10]:
%%sql

CREATE TEMPORARY TABLE town_aggregated_water_access
WITH town_totals AS (
    SELECT
        province_name,
        town_name,
        SUM(number_of_people_served) AS total_ppl_serv
    FROM
        combined_analysis_table
    GROUP BY
        province_name,
        town_name
)
SELECT
    ct.province_name,
    ct.town_name,
    ROUND((SUM(CASE WHEN type_of_water_source = 'river' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS river,
    ROUND((SUM(CASE WHEN type_of_water_source = 'shared_tap' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS shared_tap,
    ROUND((SUM(CASE WHEN type_of_water_source = 'tap_in_home' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home,
    ROUND((SUM(CASE WHEN type_of_water_source = 'tap_in_home_broken' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home_broken,
    ROUND((SUM(CASE WHEN type_of_water_source = 'well' THEN number_of_people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS well
FROM
    combined_analysis_table ct
JOIN
    town_totals tt ON ct.province_name = tt.province_name AND ct.town_name = tt.town_name
GROUP BY
    ct.province_name,
    ct.town_name
ORDER BY
    ct.province_name;

 * mysql+pymysql://root:***@localhost:3306/md_water_services


31 rows affected.


[]
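
The lifetime of a temporary table can be demonstrated in a few lines. This is a sketch under stated assumptions: it uses a throwaway SQLite file rather than MySQL, and the `readings` and `town_totals` tables are hypothetical. SQLite temporary tables are also scoped to the connection that created them, so the pattern is comparable.

```python
import os
import sqlite3
import tempfile

# A throwaway database file so that two separate connections can open it.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE readings (town TEXT, people INTEGER)")
conn.execute("INSERT INTO readings VALUES ('Amina', 100), ('Amina', 50), ('Bello', 70)")
# Store the result of the aggregation once, for reuse within this session.
conn.execute("""
    CREATE TEMPORARY TABLE town_totals AS
    SELECT town, SUM(people) AS total FROM readings GROUP BY town
""")
conn.commit()
first_session = conn.execute(
    "SELECT town, total FROM town_totals ORDER BY town"
).fetchall()
conn.close()

# A new connection sees the permanent table but not the temporary one.
conn = sqlite3.connect(db_path)
try:
    conn.execute("SELECT * FROM town_totals")
    temp_table_survived = True
except sqlite3.OperationalError:
    temp_table_survived = False
base_rows = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
conn.close()

print(first_session, temp_table_survived, base_rows)
```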

In [11]:
%%sql

SELECT
    *
FROM
    town_aggregated_water_access;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
31 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Akatsi,Harare,2,17,28,27,27
Akatsi,Kintampo,2,15,31,26,26
Akatsi,Lusaka,2,17,28,28,26
Akatsi,Rural,6,59,9,5,22
Amanzi,Abidjan,2,53,22,19,4
Amanzi,Amina,8,24,3,56,9
Amanzi,Asmara,3,49,24,20,4
Amanzi,Bello,3,53,20,22,3
Amanzi,Dahabu,3,37,55,1,4
Amanzi,Pwani,3,53,20,21,4


## Step 5: Let's sort by each of the columns to observe the patterns.

### 1. Sort by town_name

In [14]:
%%sql

SELECT
    *
FROM
    town_aggregated_water_access
ORDER BY
    town_name;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
31 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Amanzi,Abidjan,2,53,22,19,4
Kilimani,Amara,8,22,25,16,30
Amanzi,Amina,8,24,3,56,9
Hawassa,Amina,2,14,19,24,42
Amanzi,Asmara,3,49,24,20,4
Sokoto,Bahari,21,11,36,12,20
Amanzi,Bello,3,53,20,22,3
Sokoto,Cheche,19,16,35,12,18
Amanzi,Dahabu,3,37,55,1,4
Hawassa,Deka,3,16,23,21,38


### 2. Sort by river

In [21]:
%%sql

SELECT
    *
FROM
    town_aggregated_water_access
ORDER BY
    river DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
31 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Sokoto,Rural,22,49,8,8,13
Sokoto,Bahari,21,11,36,12,20
Sokoto,Kofi,20,16,34,10,20
Sokoto,Cheche,19,16,35,12,18
Sokoto,Majengo,18,14,36,12,20
Sokoto,Marang,17,19,31,13,21
Sokoto,Ilanga,16,12,36,15,21
Kilimani,Rural,9,55,8,9,19
Amanzi,Amina,8,24,3,56,9
Kilimani,Amara,8,22,25,16,30


### 3. Sort by shared_tap

In [20]:
%%sql

SELECT
    *
FROM
    town_aggregated_water_access
ORDER BY
    shared_tap DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
31 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Kilimani,Zuri,8,71,6,11,4
Akatsi,Rural,6,59,9,5,22
Kilimani,Rural,9,55,8,9,19
Amanzi,Abidjan,2,53,22,19,4
Amanzi,Bello,3,53,20,22,3
Amanzi,Pwani,3,53,20,21,4
Hawassa,Rural,4,52,12,12,19
Amanzi,Asmara,3,49,24,20,4
Sokoto,Rural,22,49,8,8,13
Amanzi,Dahabu,3,37,55,1,4


### 4. Sort by tap_in_home

In [19]:
%%sql

SELECT
    *
FROM
    town_aggregated_water_access
ORDER BY
    tap_in_home DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
31 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Amanzi,Dahabu,3,37,55,1,4
Sokoto,Bahari,21,11,36,12,20
Sokoto,Ilanga,16,12,36,15,21
Sokoto,Majengo,18,14,36,12,20
Sokoto,Cheche,19,16,35,12,18
Sokoto,Kofi,20,16,34,10,20
Akatsi,Kintampo,2,15,31,26,26
Sokoto,Marang,17,19,31,13,21
Amanzi,Rural,3,27,30,30,10
Kilimani,Harare,7,11,30,20,31


### 5. Sort by tap_in_home_broken

In [18]:
%%sql

SELECT
    *
FROM
    town_aggregated_water_access
ORDER BY
    tap_in_home_broken DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
31 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Amanzi,Amina,8,24,3,56,9
Amanzi,Rural,3,27,30,30,10
Akatsi,Lusaka,2,17,28,28,26
Akatsi,Harare,2,17,28,27,27
Akatsi,Kintampo,2,15,31,26,26
Hawassa,Amina,2,14,19,24,42
Hawassa,Djenne,3,18,19,23,36
Hawassa,Serowe,6,14,23,23,34
Hawassa,Yaounde,2,14,22,23,38
Amanzi,Bello,3,53,20,22,3


### 6. Sort by well

In [22]:
%%sql

SELECT
    *
FROM
    town_aggregated_water_access
ORDER BY
    well DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
31 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Hawassa,Amina,2,14,19,24,42
Hawassa,Zanzibar,0,22,22,17,40
Hawassa,Deka,3,16,23,21,38
Hawassa,Yaounde,2,14,22,23,38
Hawassa,Djenne,3,18,19,23,36
Hawassa,Serowe,6,14,23,23,34
Kilimani,Mrembo,7,16,25,21,32
Kilimani,Harare,7,11,30,20,31
Kilimani,Amara,8,22,25,16,30
Kilimani,Isiqalo,7,19,25,18,30


## Step 6: Which town has the highest ratio of people who have taps, but have no running water?

In [25]:
%%sql

SELECT
    province_name,
    town_name,
    ROUND(tap_in_home_broken / (tap_in_home_broken + tap_in_home) * 100,0) AS Pct_broken_taps
FROM
    town_aggregated_water_access
ORDER BY
    Pct_broken_taps DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
31 rows affected.


province_name,town_name,Pct_broken_taps
Amanzi,Amina,95
Kilimani,Zuri,65
Hawassa,Amina,56
Hawassa,Djenne,55
Kilimani,Rural,53
Amanzi,Bello,52
Amanzi,Pwani,51
Hawassa,Yaounde,51
Akatsi,Lusaka,50
Amanzi,Rural,50
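
As a quick check of the arithmetic, the ratio can be recomputed by hand from the percentages in the town table. The helper name `pct_broken_taps` is an assumption for illustration; the input numbers are taken from the rows for Amina (Amanzi), Zuri (Kilimani) and Amina (Hawassa) shown earlier.

```python
def pct_broken_taps(tap_in_home, tap_in_home_broken):
    """Share of installed home taps that are broken, as a whole percentage."""
    installed = tap_in_home + tap_in_home_broken
    if installed == 0:
        # No home taps at all, so the ratio is undefined.
        return None
    return round(tap_in_home_broken / installed * 100)


# (tap_in_home, tap_in_home_broken) percentages from the town table.
amina_amanzi = pct_broken_taps(3, 56)    # 56 / 59
zuri_kilimani = pct_broken_taps(6, 11)   # 11 / 17
amina_hawassa = pct_broken_taps(19, 24)  # 24 / 43

print(amina_amanzi, zuri_kilimani, amina_hawassa)
```

Note that Python's `round` rounds exact halves to the nearest even number, so a value ending in exactly .5 could differ by one from the SQL result; none of the three towns above hits that case.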


# PROJECT PHASE 3: Summary report

## Insights
Based on the data analysis, we have discovered the following insights:

1. **Water Sources in Maji Ndogo**: Most water sources in Maji Ndogo are located in rural areas.

2. **Shared Taps**: Approximately 43% of the population relies on shared taps, with an average of 2000 people sharing one tap.

3. **Water Infrastructure in Homes**: Around 31% of the population has water infrastructure in their homes. However, within this group:

   - 45% face non-functional systems due to issues with pipes, pumps, and reservoirs.
   - Broken infrastructure is observed in towns like Amina, the rural parts of Amanzi, and some towns across Akatsi and Hawassa.

4. **Wells**: About 18% of the population uses wells. However, only 28% of these wells are clean. This issue is prevalent in Hawassa, Kilimani, and Akatsi.

5. **Wait Times for Water**: Citizens often face long wait times for water, with an average wait time of more than 120 minutes. We observed the following patterns:

   - Long queues are experienced on Saturdays.
   - Morning and evening periods have longer queues.
   - Wednesdays and Sundays have shorter queues.

## Plan of Action
Based on the insights, we have devised a plan of action to address the water issues:

1. **Improving Water Sources**: Our efforts will focus on improving the water sources that affect the most people. The priority areas are:

   - Shared taps: Most people will benefit if we improve the shared taps first. We will send additional water tankers to the busiest taps on the busiest days while installing extra taps where needed.
   - Wells: We will install filters to purify the water and address contamination issues. For chemically polluted wells, we will install reverse osmosis (RO) filters, and for biological contamination, we will install UV filters. The long-term goal is to investigate the reasons behind the pollution of these sources.
   - Broken infrastructure: Repairing broken infrastructure offers significant impact, even with a single intervention. Fixing reservoirs or pipes connected to multiple taps can benefit many people. Priority towns for repair include Amina, Lusaka, Zuri, Djenne, and rural parts of Amanzi.

2. **Queue Time Reduction**: Our aim is to reduce queue times for shared taps to below 30 minutes, which aligns with UN standards. The following steps will be taken:

   - Send additional water tankers to the busiest taps on the busiest days, based on the queue time pivot table.
   - Install extra taps in towns with high usage of shared taps, such as Bello, Abidjan, and Zuri.

3. **Taps in Homes**: Installing taps in homes is a resource-intensive solution and better suited as a long-term goal. Currently, towns with short queue times for shared taps (<30 minutes) pose a logistical challenge for further reduction in waiting times.

4. **Rural Challenges**: As most water sources are in rural areas, our teams need to be aware of the challenges posed by road conditions, supplies, and labor in these areas. Repair and upgrade efforts will require additional planning and resources.

## Practical Solutions
To address the identified issues, we will implement the following practical solutions:

1. **Temporary Water Supply**: For communities relying on rivers, we will dispatch trucks to provide temporary water supply while drilling wells for a more permanent solution. The first province we will target is Sokoto.

2. **Water Purification**: For communities using wells, we will install filters to purify the water. Reverse osmosis (RO) filters will be installed for chemically polluted wells, and UV filters will be installed for biological contamination. Biologically contaminated wells will also receive an RO filter as a precautionary measure. Investigating and mitigating the pollution sources is a long-term goal.

3. **Additional Water Tankers**: To address the high demand for shared taps, we will send additional water tankers to the busiest taps on the busiest days. This will be based on the queue time pivot table. Simultaneously, we will initiate the installation of extra taps in towns with high usage, such as Bello, Abidjan, and Zuri.

4. **Infrastructure Repair**: Addressing broken infrastructure can have a significant impact. Repairing a single facility, such as a reservoir or pipe connected to multiple taps, can benefit many people. Priority towns for repair include Amina, Lusaka, Zuri, Djenne, and rural parts of Amanzi.

By implementing these solutions, we aim to improve access to clean and reliable water sources, reduce waiting times, and address infrastructure challenges.

# PROJECT PHASE 4: A practical plan

## Step 1: Our final goal is to implement our plan in the database.
- We have a plan to improve the water access in Maji Ndogo, so we need to think it through, and as our final task, create a table where our teams have the information they need to fix, upgrade and repair water sources. They will need the addresses of the places they should visit (street address, town, province), the type of water source they should improve, and what should be done to improve it.
- We should also make space for them in the database to update us on their progress. We need to know if the repair is complete, and the date it was completed, and give them space to upgrade the sources. Let's call this table Project_progress.

## NOTE: The query below creates the `Project_progress` table with the following columns:

- `Project_id`: A unique identifier for each project (auto-incremented).
- `source_id`: A reference to the `source_id` column in the `water_source` table, ensuring data integrity.
- `Address`: The street address of the location.
- `Town`: The town where the location is situated.
- `Province`: The province where the location is located.
- `Source_type`: The type of water source to be improved.
- `Improvement`: The specific actions or improvements that the engineers need to perform at that location.
- `Source_status`: The status of the project, with a default value of 'Backlog' and limited to three options: 'Backlog', 'In progress', or 'Complete'.
- `Date_of_completion`: The date when the source has been upgraded or completed.
- `Comments`: A text field where engineers can leave comments about the project.

In [3]:
%%sql

CREATE TABLE Project_progress (
    Project_id SERIAL PRIMARY KEY,
    source_id VARCHAR(20) NOT NULL REFERENCES water_source(source_id) ON DELETE CASCADE ON UPDATE CASCADE,
    Address VARCHAR(50),
    Town VARCHAR(30),
    Province VARCHAR(30),
    Source_type VARCHAR(50),
    Improvement VARCHAR(50),
    Source_status VARCHAR(50) DEFAULT 'Backlog' CHECK (Source_status IN ('Backlog', 'In progress', 'Complete')),
    Date_of_completion DATE,
    Comments TEXT
);

 * mysql+pymysql://root:***@localhost:3306/md_water_services
(pymysql.err.OperationalError) (1050, "Table 'project_progress' already exists")
[SQL: CREATE TABLE Project_progress (
    Project_id SERIAL PRIMARY KEY,
    source_id VARCHAR(20) NOT NULL REFERENCES water_source(source_id) ON DELETE CASCADE ON UPDATE CASCADE,
    Address VARCHAR(50),
    Town VARCHAR(30),
    Province VARCHAR(30),
    Source_type VARCHAR(50),
    Improvement VARCHAR(50),
    Source_status VARCHAR(50) DEFAULT 'Backlog' CHECK (Source_status IN ('Backlog', 'In progress', 'Complete')),
    Date_of_completion DATE,
    Comments TEXT
);]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
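
The "already exists" error above appears when the cell is run a second time. One common way around it is `CREATE TABLE IF NOT EXISTS`. The sketch below assumes SQLite in place of MySQL and a trimmed-down version of the table (only three columns, with SQLite column types), and it also checks the default value and the `CHECK` constraint on `Source_status`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced, SQLite-flavoured version of the table; column types are assumptions.
ddl = """
    CREATE TABLE IF NOT EXISTS Project_progress (
        Project_id INTEGER PRIMARY KEY AUTOINCREMENT,
        source_id TEXT NOT NULL,
        Source_status TEXT DEFAULT 'Backlog'
            CHECK (Source_status IN ('Backlog', 'In progress', 'Complete'))
    )
"""
conn.execute(ddl)
conn.execute(ddl)  # Running it again is a no-op rather than an error.

# Omitting Source_status falls back to the default.
conn.execute("INSERT INTO Project_progress (source_id) VALUES ('AkHa00000224')")
default_status = conn.execute(
    "SELECT Source_status FROM Project_progress"
).fetchone()[0]

# A status outside the allowed list is rejected by the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO Project_progress (source_id, Source_status) "
        "VALUES ('AkHa00001224', 'Done')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(default_status, rejected)
```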


## Step 2: Let's start with the provided `Project_progress_query`

In [33]:
%%sql

SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


address,town_name,province_name,source_id,type_of_water_source,results
2 Addis Ababa Road,Harare,Akatsi,AkHa00000224,tap_in_home,
10 Addis Ababa Road,Harare,Akatsi,AkHa00001224,tap_in_home_broken,
9 Addis Ababa Road,Harare,Akatsi,AkHa00002224,tap_in_home_broken,
139 Addis Ababa Road,Harare,Akatsi,AkHa00003224,well,Clean
17 Addis Ababa Road,Harare,Akatsi,AkHa00004224,tap_in_home_broken,
125 Addis Ababa Road,Harare,Akatsi,AkHa00005224,tap_in_home,
98 Addis Ababa Road,Harare,Akatsi,AkHa00006224,tap_in_home,
21 Addis Ababa Road,Harare,Akatsi,AkHa00007224,tap_in_home,
11 Addis Ababa Road,Harare,Akatsi,AkHa00008224,well,Clean
6 Addis Ababa Road,Harare,Akatsi,AkHa00009224,well,Contaminated: Biological


## Step 3: Next, we modify the query by adding CASE statements to handle the different improvement scenarios. We replace the `SELECT` clause with the following:

In [32]:
%%sql

SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    CASE
        WHEN well_pollution.results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN well_pollution.results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30 THEN 'Install additional taps nearby'
        WHEN water_source.type_of_water_source = 'tap_in_home_broken' THEN 'Diagnose local infrastructure'
        ELSE 'No specific improvement identified'
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


address,town_name,province_name,source_id,type_of_water_source,improvement
2 Addis Ababa Road,Harare,Akatsi,AkHa00000224,tap_in_home,No specific improvement identified
10 Addis Ababa Road,Harare,Akatsi,AkHa00001224,tap_in_home_broken,Diagnose local infrastructure
9 Addis Ababa Road,Harare,Akatsi,AkHa00002224,tap_in_home_broken,Diagnose local infrastructure
139 Addis Ababa Road,Harare,Akatsi,AkHa00003224,well,No specific improvement identified
17 Addis Ababa Road,Harare,Akatsi,AkHa00004224,tap_in_home_broken,Diagnose local infrastructure
125 Addis Ababa Road,Harare,Akatsi,AkHa00005224,tap_in_home,No specific improvement identified
98 Addis Ababa Road,Harare,Akatsi,AkHa00006224,tap_in_home,No specific improvement identified
21 Addis Ababa Road,Harare,Akatsi,AkHa00007224,tap_in_home,No specific improvement identified
11 Addis Ababa Road,Harare,Akatsi,AkHa00008224,well,No specific improvement identified
6 Addis Ababa Road,Harare,Akatsi,AkHa00009224,well,Install UV and RO filter


## Step 4: Filter the results based on Chidi's instructions

In the following modified query, a `WHERE` clause is added to filter the data based on Chidi's instructions. It includes the following conditions:

- `visits.visit_count = 1` ensures that only records with `visit_count` equal to 1 are included.
- The conditions within the parentheses `( ... )` are combined with `OR` operators, so a record is included if it meets any of the following:
  - `well_pollution.results != 'Clean'` includes records where the well is not clean. Sources with no pollution result (`NULL`) do not pass this test on their own.
  - `water_source.type_of_water_source IN ('tap_in_home_broken', 'river')` includes records where the source type is either 'tap_in_home_broken' or 'river'.
  - `(water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)` includes records where the source type is 'shared_tap' and the time in the queue is 30 minutes or more.
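
To see how these conditions interact, here is a minimal sketch that runs the same `WHERE` clause against a handful of made-up rows. It assumes an in-memory SQLite database as a stand-in for MySQL, and the source IDs `S1` to `S6` are hypothetical, not taken from `md_water_services`.

```python
import sqlite3

# In-memory SQLite stand-in with invented rows (assumption: not the real data).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE water_source (source_id TEXT, type_of_water_source TEXT);
CREATE TABLE well_pollution (source_id TEXT, results TEXT);
CREATE TABLE visits (source_id TEXT, visit_count INTEGER, time_in_queue INTEGER);

INSERT INTO water_source VALUES
    ('S1', 'well'), ('S2', 'well'), ('S3', 'tap_in_home'),
    ('S4', 'river'), ('S5', 'shared_tap'), ('S6', 'shared_tap');
-- Only wells have a pollution result; every other source gets NULL via the LEFT JOIN.
INSERT INTO well_pollution VALUES
    ('S1', 'Clean'), ('S2', 'Contaminated: Chemical');
-- S5 was visited twice, so the join produces two rows for it.
INSERT INTO visits VALUES
    ('S1', 1, 0), ('S2', 1, 0), ('S3', 1, 0), ('S4', 1, 0),
    ('S5', 1, 45), ('S5', 2, 50), ('S6', 1, 10);
""")

rows = con.execute("""
SELECT water_source.source_id
FROM water_source
LEFT JOIN well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN visits ON water_source.source_id = visits.source_id
WHERE visits.visit_count = 1
  AND (
        well_pollution.results != 'Clean'
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
        OR (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
  )
ORDER BY water_source.source_id
""").fetchall()

selected = [row[0] for row in rows]
print(selected)
```

In this toy data the clean well (`S1`), the working home tap (`S3`, whose pollution result is `NULL`) and the short-queue shared tap (`S6`) are filtered out, and the repeat visit to `S5` is dropped by `visit_count = 1`, so each remaining source appears once.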

In [39]:
%%sql

SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    CASE
        WHEN well_pollution.results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN well_pollution.results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30 THEN 'Install additional taps nearby'
        WHEN water_source.type_of_water_source = 'tap_in_home_broken' THEN 'Diagnose local infrastructure'
        ELSE 'No specific improvement identified'
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
WHERE
    visits.visit_count = 1
    AND (
        well_pollution.results != 'Clean'
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
        OR (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
    )
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


address,town_name,province_name,source_id,type_of_water_source,improvement
36 Pwani Mchangani Road,Ilanga,Sokoto,SoIl32582224,river,No specific improvement identified
129 Ziwa La Kioo Road,Rural,Kilimani,KiRu28935224,well,Install UV and RO filter
18 Mlima Tazama Avenue,Rural,Hawassa,HaRu19752224,shared_tap,Install additional taps nearby
100 Mogadishu Road,Lusaka,Akatsi,AkLu01628224,well,Install UV and RO filter
26 Bahari Ya Faraja Road,Rural,Kilimani,KiRu29315224,river,No specific improvement identified
104 Kenyatta Street,Rural,Akatsi,AkRu05234224,tap_in_home_broken,Diagnose local infrastructure
117 Kampala Road,Zanzibar,Hawassa,HaZa21742224,well,Install RO filter
55 Fennec Way,Rural,Sokoto,SoRu35008224,shared_tap,Install additional taps nearby
52 Moroni Avenue,Rural,Sokoto,SoRu35703224,well,Install UV and RO filter
51 Addis Ababa Road,Harare,Akatsi,AkHa00070224,well,Install RO filter


## Step 5: Insert the query results into Project_progress

Let's combine the modified query with an `INSERT` statement to insert the data into the `Project_progress` table. The `INSERT` statement populates the table with the necessary information for each project: the address, town, province, source ID, source type, and improvement.

In [40]:
%%sql

INSERT INTO Project_progress (Address, Town, Province, source_id, Source_type, Improvement)
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    CASE
        WHEN well_pollution.results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN well_pollution.results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30 THEN 'Install additional taps nearby'
        WHEN water_source.type_of_water_source = 'tap_in_home_broken' THEN 'Diagnose local infrastructure'
        ELSE 'No specific improvement identified'
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
WHERE
    visits.visit_count = 1
    AND (
        well_pollution.results != 'Clean'
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
        OR (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
    );

 * mysql+pymysql://root:***@localhost:3306/md_water_services


25398 rows affected.


[]

## Step 6: Let's view our newly created 'Project_progress' table

In [43]:
%%sql

SELECT
    *
FROM
    Project_progress
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


Project_id,source_id,Address,Town,Province,Source_type,Improvement,Source_status,Date_of_completion,Comments
16,AkHa00000224,2 Addis Ababa Road,Harare,Akatsi,tap_in_home,No specific improvement identified,Backlog,,
17,AkHa00001224,10 Addis Ababa Road,Harare,Akatsi,tap_in_home_broken,Diagnose local infrastructure,Backlog,,
18,AkHa00002224,9 Addis Ababa Road,Harare,Akatsi,tap_in_home_broken,Diagnose local infrastructure,Backlog,,
19,AkHa00003224,139 Addis Ababa Road,Harare,Akatsi,well,No specific improvement identified,Backlog,,
20,AkHa00004224,17 Addis Ababa Road,Harare,Akatsi,tap_in_home_broken,Diagnose local infrastructure,Backlog,,
21,AkHa00005224,125 Addis Ababa Road,Harare,Akatsi,tap_in_home,No specific improvement identified,Backlog,,
22,AkHa00006224,98 Addis Ababa Road,Harare,Akatsi,tap_in_home,No specific improvement identified,Backlog,,
23,AkHa00007224,21 Addis Ababa Road,Harare,Akatsi,tap_in_home,No specific improvement identified,Backlog,,
24,AkHa00008224,11 Addis Ababa Road,Harare,Akatsi,well,No specific improvement identified,Backlog,,
25,AkHa00009224,6 Addis Ababa Road,Harare,Akatsi,well,Install UV and RO filter,Backlog,,


## Step 7: Wells
- Let's start with wells. The intervention depends on whether a well is chemically or biologically contaminated.
- Chidi has asked us to update the query with control flow logic for wells, so that each contaminated well is assigned the matching intervention.
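
As a small illustration of how this `CASE` expression behaves, the sketch below evaluates it on four invented rows. It assumes an in-memory SQLite database rather than MySQL; the table `sample` and its IDs are hypothetical.

```python
import sqlite3

# Invented rows (assumption): two contaminated wells, one clean well,
# and one non-well source whose pollution result is NULL.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sample (source_id TEXT, results TEXT);
INSERT INTO sample VALUES
    ('W1', 'Contaminated: Chemical'),
    ('W2', 'Contaminated: Biological'),
    ('W3', 'Clean'),
    ('T1', NULL);
""")

# CASE returns the first branch whose condition is true; rows that match
# no branch fall through to ELSE NULL.
improvements = dict(con.execute("""
SELECT source_id,
       CASE
           WHEN results = 'Contaminated: Chemical' THEN 'Install RO filter'
           WHEN results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
           ELSE NULL
       END AS improvement
FROM sample
""").fetchall())

print(improvements)
```

Clean wells and sources without a pollution result both end up with an empty improvement, which is why most rows in the output are blank at this stage.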

In [50]:
%%sql

SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    CASE
        WHEN well_pollution.results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN well_pollution.results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
LIMIT 100;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
100 rows affected.


address,town_name,province_name,source_id,type_of_water_source,improvement
2 Addis Ababa Road,Harare,Akatsi,AkHa00000224,tap_in_home,
10 Addis Ababa Road,Harare,Akatsi,AkHa00001224,tap_in_home_broken,
9 Addis Ababa Road,Harare,Akatsi,AkHa00002224,tap_in_home_broken,
139 Addis Ababa Road,Harare,Akatsi,AkHa00003224,well,
17 Addis Ababa Road,Harare,Akatsi,AkHa00004224,tap_in_home_broken,
125 Addis Ababa Road,Harare,Akatsi,AkHa00005224,tap_in_home,
98 Addis Ababa Road,Harare,Akatsi,AkHa00006224,tap_in_home,
21 Addis Ababa Road,Harare,Akatsi,AkHa00007224,tap_in_home,
11 Addis Ababa Road,Harare,Akatsi,AkHa00008224,well,
6 Addis Ababa Road,Harare,Akatsi,AkHa00009224,well,Install UV and RO filter


## Step 8: Rivers
Now for the rivers. We upgrade those by drilling new wells nearby.

In [52]:
%%sql

SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    CASE
        WHEN well_pollution.results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN well_pollution.results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source = 'river' THEN 'Drill well'
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
LIMIT 100;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
100 rows affected.


address,town_name,province_name,source_id,type_of_water_source,improvement
2 Addis Ababa Road,Harare,Akatsi,AkHa00000224,tap_in_home,
10 Addis Ababa Road,Harare,Akatsi,AkHa00001224,tap_in_home_broken,
9 Addis Ababa Road,Harare,Akatsi,AkHa00002224,tap_in_home_broken,
139 Addis Ababa Road,Harare,Akatsi,AkHa00003224,well,
17 Addis Ababa Road,Harare,Akatsi,AkHa00004224,tap_in_home_broken,
125 Addis Ababa Road,Harare,Akatsi,AkHa00005224,tap_in_home,
98 Addis Ababa Road,Harare,Akatsi,AkHa00006224,tap_in_home,
21 Addis Ababa Road,Harare,Akatsi,AkHa00007224,tap_in_home,
11 Addis Ababa Road,Harare,Akatsi,AkHa00008224,well,
6 Addis Ababa Road,Harare,Akatsi,AkHa00009224,well,Install UV and RO filter


## Step 9: Shared taps
Next up, shared taps. We need to install one tap near each shared tap for every 30 min of queue time. This is the logic:
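
The tap count itself is simple arithmetic: divide the queue time by 30 and round down. The sketch below mirrors `FLOOR(time_in_queue / 30)` and the `CONCAT` text in plain Python. The function names are hypothetical, and it covers only the count, not the `WHEN` condition that decides which rows qualify.

```python
def taps_needed(time_in_queue: int) -> int:
    """One tap per full 30 minutes of queue time (integer floor division)."""
    return time_in_queue // 30


def improvement_text(time_in_queue: int) -> str:
    """Build the same text the CONCAT call produces in the query."""
    return f"Install {taps_needed(time_in_queue)} taps nearby"


# A few sample queue times in minutes (made-up values).
for minutes in (30, 59, 60, 95, 120):
    print(minutes, "->", improvement_text(minutes))
```

A 95-minute queue contains three full 30-minute blocks, so it maps to "Install 3 taps nearby".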

In [53]:
%%sql

SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    CASE
        WHEN well_pollution.results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN well_pollution.results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source = 'river' THEN 'Drill well'
        WHEN water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30 THEN CONCAT('Install ', FLOOR(visits.time_in_queue / 30), ' taps nearby')
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
LIMIT 100;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
100 rows affected.


address,town_name,province_name,source_id,type_of_water_source,improvement
2 Addis Ababa Road,Harare,Akatsi,AkHa00000224,tap_in_home,
10 Addis Ababa Road,Harare,Akatsi,AkHa00001224,tap_in_home_broken,
9 Addis Ababa Road,Harare,Akatsi,AkHa00002224,tap_in_home_broken,
139 Addis Ababa Road,Harare,Akatsi,AkHa00003224,well,
17 Addis Ababa Road,Harare,Akatsi,AkHa00004224,tap_in_home_broken,
125 Addis Ababa Road,Harare,Akatsi,AkHa00005224,tap_in_home,
98 Addis Ababa Road,Harare,Akatsi,AkHa00006224,tap_in_home,
21 Addis Ababa Road,Harare,Akatsi,AkHa00007224,tap_in_home,
11 Addis Ababa Road,Harare,Akatsi,AkHa00008224,well,
6 Addis Ababa Road,Harare,Akatsi,AkHa00009224,well,Install UV and RO filter


## Step 10: In-home taps
Lastly, let's look at in-home taps, specifically broken ones. These taps indicate broken infrastructure, so they need to be inspected by our engineers.

In [54]:
%%sql

SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    CASE
        WHEN well_pollution.results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN well_pollution.results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source = 'river' THEN 'Drill well'
        WHEN water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30 THEN CONCAT('Install ', FLOOR(visits.time_in_queue / 30), ' taps nearby')
        WHEN water_source.type_of_water_source = 'tap_in_home_broken' THEN 'Diagnose local infrastructure'
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
LIMIT 100;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
100 rows affected.


address,town_name,province_name,source_id,type_of_water_source,improvement
2 Addis Ababa Road,Harare,Akatsi,AkHa00000224,tap_in_home,
10 Addis Ababa Road,Harare,Akatsi,AkHa00001224,tap_in_home_broken,Diagnose local infrastructure
9 Addis Ababa Road,Harare,Akatsi,AkHa00002224,tap_in_home_broken,Diagnose local infrastructure
139 Addis Ababa Road,Harare,Akatsi,AkHa00003224,well,
17 Addis Ababa Road,Harare,Akatsi,AkHa00004224,tap_in_home_broken,Diagnose local infrastructure
125 Addis Ababa Road,Harare,Akatsi,AkHa00005224,tap_in_home,
98 Addis Ababa Road,Harare,Akatsi,AkHa00006224,tap_in_home,
21 Addis Ababa Road,Harare,Akatsi,AkHa00007224,tap_in_home,
11 Addis Ababa Road,Harare,Akatsi,AkHa00008224,well,
6 Addis Ababa Road,Harare,Akatsi,AkHa00009224,well,Install UV and RO filter


## Step 11: Add the data to Project_progress
- Now that we have the data we want to provide to the engineers, populate the `Project_progress` table with the results of our query.
- HINT: Make sure the columns in the query line up with the columns in `Project_progress`. If you make a mistake, empty the table with `TRUNCATE TABLE Project_progress` (or use `DROP TABLE Project_progress` and recreate the table), then run your query again.
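
The reason the column order matters is that `INSERT ... SELECT` matches columns by position, not by name. The sketch below shows this on two tiny hypothetical tables (`demo_progress` and `demo_location`) in an in-memory SQLite database, assumed here as a stand-in for MySQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hypothetical, cut-down versions of the real tables.
CREATE TABLE demo_progress (
    Project_id INTEGER PRIMARY KEY AUTOINCREMENT,
    Town TEXT,
    Province TEXT,
    Source_status TEXT DEFAULT 'Backlog'
);
CREATE TABLE demo_location (town_name TEXT, province_name TEXT);
INSERT INTO demo_location VALUES ('Harare', 'Akatsi');

-- Correct order: town_name -> Town, province_name -> Province.
INSERT INTO demo_progress (Town, Province)
SELECT town_name, province_name FROM demo_location;

-- Swapped order: accepted without error, but the values land in the wrong columns.
INSERT INTO demo_progress (Town, Province)
SELECT province_name, town_name FROM demo_location;
""")

rows = con.execute(
    "SELECT Project_id, Town, Province, Source_status "
    "FROM demo_progress ORDER BY Project_id"
).fetchall()
for row in rows:
    print(row)
```

The second insert raises no error, which is exactly why a quick look at the first few rows after inserting is worthwhile. Columns left out of the column list, such as `Source_status`, take their default value.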

In [71]:
%%sql

INSERT INTO Project_progress (Address, Town, Province, source_id, Source_type, Improvement)
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    CASE
        WHEN well_pollution.results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN well_pollution.results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source = 'river' THEN 'Drill well'
        WHEN water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30 THEN CONCAT('Install ', FLOOR(visits.time_in_queue / 30), ' taps nearby')
        WHEN water_source.type_of_water_source = 'tap_in_home_broken' THEN 'Diagnose local infrastructure'
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id;

 * mysql+pymysql://root:***@localhost:3306/md_water_services


60146 rows affected.


[]

## Step 12: Let's view our changes

In [4]:
%%sql

SELECT
    *
FROM
    Project_progress
LIMIT 250;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
250 rows affected.


Project_id,source_id,Address,Town,Province,Source_type,Improvement,Source_status,Date_of_completion,Comments
1,AkHa00000224,2 Addis Ababa Road,Harare,Akatsi,tap_in_home,,Backlog,,
2,AkHa00001224,10 Addis Ababa Road,Harare,Akatsi,tap_in_home_broken,Diagnose local infrastructure,Backlog,,
3,AkHa00002224,9 Addis Ababa Road,Harare,Akatsi,tap_in_home_broken,Diagnose local infrastructure,Backlog,,
4,AkHa00003224,139 Addis Ababa Road,Harare,Akatsi,well,,Backlog,,
5,AkHa00004224,17 Addis Ababa Road,Harare,Akatsi,tap_in_home_broken,Diagnose local infrastructure,Backlog,,
6,AkHa00005224,125 Addis Ababa Road,Harare,Akatsi,tap_in_home,,Backlog,,
7,AkHa00006224,98 Addis Ababa Road,Harare,Akatsi,tap_in_home,,Backlog,,
8,AkHa00007224,21 Addis Ababa Road,Harare,Akatsi,tap_in_home,,Backlog,,
9,AkHa00008224,11 Addis Ababa Road,Harare,Akatsi,well,,Backlog,,
10,AkHa00009224,6 Addis Ababa Road,Harare,Akatsi,well,Install UV and RO filter,Backlog,,


**There we go, all done! Now we send off our summary report to Pres. Naledi with our main findings, so they can start organising the teams. We'll also explain the Project_progress table, and how this will help us track our progress.**

In [9]:
%%sql

SELECT
    Project_progress.Project_id,
    Project_progress.Town,
    Project_progress.Province,
    Project_progress.Source_type,
    Project_progress.Improvement,
    water_source.number_of_people_served,
    RANK() OVER(PARTITION BY Province ORDER BY number_of_people_served)
FROM
    Project_progress
JOIN
    water_source ON water_source.source_id = Project_progress.source_id
WHERE
    Improvement = 'Drill well'
ORDER BY
    Province DESC, number_of_people_served
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


Project_id,Town,Province,Source_type,Improvement,number_of_people_served,RANK() OVER(PARTITION BY Province ORDER BY number_of_people_served)
58787,Rural,Sokoto,river,Drill well,400,1
50533,Kofi,Sokoto,river,Drill well,400,1
57388,Rural,Sokoto,river,Drill well,400,1
51176,Majengo,Sokoto,river,Drill well,400,1
52232,Rural,Sokoto,river,Drill well,400,1
