Omotola Lawal
📅 20th October, 2025
🔗 LinkedIn
- 🗣️ Message from President Aziza Naledi
- 🏁 Introduction: Starting the Final Journey
- 🔗 Joining Pieces Together: Finding the Data We Need Across Tables
- 📊 The Last Analysis: Finding the Final Insights from Our Data
- 📋 Summary Report: Sharing Our Knowledge with Decision Makers
- 🧭 A Practical Plan: From Analysis to Action
- 📑 Reference
Dear Team,
I would like to thank the team for uncovering the corruption among our field workers and letting me know. As you all know, I have no tolerance for people who look after themselves first at the cost of everyone else, so I have taken the necessary steps!

Our journey continues as we aim to convert our data into actionable knowledge. Understanding the situation is one thing, but translating that understanding into informed decisions is what will truly make a difference.
As we step into this next phase, you will be shaping our raw data into meaningful views—providing essential information to decision-makers. This will enable us to discern the materials we need, plan our budgets, and identify the areas requiring immediate attention.
Remember, each step you take contributes to a larger goal—the transformation of Maji Ndogo.
Your diligence and dedication are instrumental in shaping a brighter future for our community.

— President Aziza Naledi
Hey there! 👋
It’s been quite a journey working on the Maji Ndogo Water Crisis analysis. As I prepare to move on to my next project, I wanted to share this final reflection with you.
It feels rewarding to see justice finally take the front seat.
Maji Ndogo is truly changing — and I believe it’s because President Naledi understands the power of data.
Her commitment to transparency and accountability gives me hope for the future. 🌍

This project has reminded me why I’m passionate about data storytelling — transforming raw numbers into meaningful insights that can shape real-world outcomes.
- SQL Mastery: Joins · Aggregations · Filtering · Conditional logic · Case statements
- Data Cleaning & Validation: Detecting inconsistencies · Ensuring data integrity · Handling nulls and duplicates
- Exploratory Data Analysis (EDA): Exploring provincial water patterns · Investigating queue times · Spotting contamination trends
- Analytical Thinking: Building stepwise queries · Simplifying logic with CTEs · Iterative testing for accuracy
- Data Storytelling: Turning query outputs into actionable insights · Communicating findings through visuals and reports
- Problem-Solving: Debugging queries · Structuring efficient logic · Balancing readability and scalability
- Transparency & Accountability Focus: Applying data to drive equitable water access · Supporting governance with evidence-based insights
These skills were applied throughout the Maji Ndogo Water Access project to identify improvement priorities and guide data-driven decision-making.
At this stage of the Maji Ndogo Water Crisis project, I was determined to bring together all the scattered pieces of data into a single, insightful view.
When I first started exploring the dataset, I often made the rookie mistake of merging every table and column into one large table before analyzing. It worked on smaller datasets, but with one this size — performance dropped, queries lagged, and the whole process became inefficient.
So this time, I took a step back and thought carefully about the questions I needed to answer.
That mindset shift — from "get everything" to "get only what I need" — changed the game for me.
- Are there any specific provinces or towns where certain types of water sources are more common?
- We previously identified tap_in_home_broken taps as easy wins — are there any towns where this problem is particularly bad?
To answer these, I needed data from:
- location → province and town names
- water_source → type of source and number of people served
- visits → connection between sources and locations
- well_pollution → quality data for wells only
Because location and water_source can’t be directly joined, I realized the visits table would act as the bridge between them.
Once I understood that, building the query felt like solving a puzzle — piece by piece.
🧾 Click to view SQL Query
USE md_water_services;
SELECT
l.province_name,
l.town_name,
v.visit_count,
v.location_id
FROM location AS l
JOIN visits AS v
ON l.location_id = v.location_id;
📊 Click to view result table
| province_name | town_name | visit_count | location_id |
|---|---|---|---|
| Sokoto | Ilanga | 1 | SoIl32582 |
| ... | ... | ... | ... |
Next, I joined the water_source table to include details about each water source.
🧾 Click to view SQL Query
SELECT
l.province_name,
l.town_name,
v.visit_count,
v.location_id,
s.type_of_water_source,
s.number_of_people_served
FROM location AS l
JOIN visits AS v
ON l.location_id = v.location_id
JOIN water_source AS s
ON v.source_id = s.source_id;
📊 Click to view result table
| province_name | town_name | visit_count | location_id | type_of_water_source | number_of_people_served |
|---|---|---|---|---|---|
| Akatsi | Harare | 1 | AkHa00000 | tap_in_home | 956 |
| Akatsi | Harare | 1 | AkHa00001 | tap_in_home_broken | 930 |
| Akatsi | Harare | 1 | AkHa00002 | tap_in_home_broken | 486 |
| Akatsi | Harare | 1 | AkHa00003 | well | 364 |
| ... | ... | ... | ... | ... | ... |
When I filtered for a specific location, I noticed something interesting — the same location_id appeared multiple times with different visit_count values.
🧾 Click to view SQL Query
SELECT
l.province_name,
l.town_name,
v.visit_count,
v.location_id,
s.type_of_water_source,
s.number_of_people_served
FROM location AS l
JOIN visits AS v
ON l.location_id = v.location_id
JOIN water_source AS s
ON v.source_id = s.source_id
WHERE v.location_id = 'AkHa00103';
📊 Click to view result table
| province_name | town_name | visit_count | location_id | type_of_water_source | number_of_people_served |
|---|---|---|---|---|---|
| Akatsi | Harare | 1 | AkHa00103 | shared_tap | 3340 |
| Akatsi | Harare | 2 | AkHa00103 | shared_tap | 3340 |
| Akatsi | Harare | 3 | AkHa00103 | shared_tap | 3340 |
| ... | ... | ... | ... | ... | ... |
This duplication would inflate results during aggregation. To fix it, I filtered for visit_count = 1 to ensure only the first survey record was used.
🧾 Click to view SQL Query
SELECT
l.province_name,
l.town_name,
s.type_of_water_source,
l.location_type,
s.number_of_people_served,
v.time_in_queue
FROM location AS l
JOIN visits AS v
ON l.location_id = v.location_id
JOIN water_source AS s
ON v.source_id = s.source_id
WHERE v.visit_count = 1;
📊 Click to view result table
| province_name | town_name | type_of_water_source | location_type | number_of_people_served | time_in_queue |
|---|---|---|---|---|---|
| Sokoto | Ilanga | river | Urban | 402 | 15 |
| Kilimani | Rural | well | Rural | 252 | 0 |
| Hawassa | Rural | shared_tap | Rural | 542 | 62 |
| Akatsi | Lusaka | well | Urban | 210 | 0 |
| ... | ... | ... | ... | ... | ... |
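To convince myself the visit_count = 1 filter was really necessary, I find it helps to reproduce the problem in miniature. The sketch below uses invented numbers in an in-memory SQLite database (not the real md_water_services schema) to show how repeat visits inflate an aggregation:

```python
import sqlite3

# Toy reproduction of the duplicate-visit problem (hypothetical data,
# not the real md_water_services tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE water_source (source_id TEXT, number_of_people_served INTEGER);
CREATE TABLE visits (source_id TEXT, visit_count INTEGER);

INSERT INTO water_source VALUES ('AkHa00103', 3340);
-- The same source was surveyed three times, so it appears three times in visits.
INSERT INTO visits VALUES ('AkHa00103', 1), ('AkHa00103', 2), ('AkHa00103', 3);
""")

# Without the filter, every repeat visit re-counts the people served.
inflated = conn.execute("""
    SELECT SUM(s.number_of_people_served)
    FROM visits AS v JOIN water_source AS s ON v.source_id = s.source_id
""").fetchone()[0]

# Keeping only the first survey record gives the true figure.
correct = conn.execute("""
    SELECT SUM(s.number_of_people_served)
    FROM visits AS v JOIN water_source AS s ON v.source_id = s.source_id
    WHERE v.visit_count = 1
""").fetchone()[0]

print(inflated, correct)  # 10020 3340
```

Three visits to a single shared tap triple the apparent population served, which is exactly the kind of silent error that would have skewed every percentage downstream.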
For this part, I had to remind myself that well_pollution only contains data about wells. Using an INNER JOIN would exclude non-well sources, so I opted for a LEFT JOIN instead to retain all sources and add pollution data where available.
🧾 Click to view SQL Query
SELECT
l.province_name,
l.town_name,
v.visit_count,
v.location_id,
s.type_of_water_source,
s.number_of_people_served,
l.location_type,
v.time_in_queue,
w.results
FROM visits AS v
LEFT JOIN well_pollution AS w
ON v.source_id = w.source_id
INNER JOIN location AS l
ON v.location_id = l.location_id
INNER JOIN water_source AS s
ON v.source_id = s.source_id
WHERE v.visit_count = 1;
I decided to turn the final query into a view so I could easily reuse it across other analyses and visualizations.
🧾 Click to view SQL Query
CREATE VIEW combined_analysis_table AS
SELECT
l.province_name,
l.town_name,
v.visit_count,
v.location_id,
s.type_of_water_source AS source_type,
s.number_of_people_served AS people_served,
l.location_type,
v.time_in_queue,
w.results
FROM visits AS v
LEFT JOIN well_pollution AS w
ON v.source_id = w.source_id
INNER JOIN location AS l
ON v.location_id = l.location_id
INNER JOIN water_source AS s
ON v.source_id = s.source_id
WHERE v.visit_count = 1;
This part of the project reminded me why I love data analysis. It’s not just about writing SQL — it’s about understanding relationships, optimizing queries, and making data tell a coherent story.
Every join felt like connecting the dots in a mystery, and when the final table came together, it finally made sense. It was a small but satisfying victory in my journey toward becoming a better data storyteller.
This part of the project felt different for me — more personal. I wasn’t just querying data anymore; I was uncovering stories hidden in the numbers. Every percentage, every row, was a reflection of real people’s lives.
So, I decided to take a closer look — province by province, and then town by town — to understand who had access to clean water and who didn’t.
Before jumping into towns, I wanted to see the broader picture across provinces. Here’s the query I wrote to aggregate the data.
👩💻 Click to view my SQL Query
WITH province_totals AS ( -- This CTE calculates the population of each province
SELECT
province_name,
SUM(people_served) AS total_ppl_serv
FROM
combined_analysis_table
GROUP BY
province_name
)
SELECT
ct.province_name,
ROUND((SUM(CASE WHEN source_type = 'river' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS river,
ROUND((SUM(CASE WHEN source_type = 'shared_tap' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS shared_tap,
ROUND((SUM(CASE WHEN source_type = 'tap_in_home' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS tap_in_home,
ROUND((SUM(CASE WHEN source_type = 'tap_in_home_broken' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS tap_in_home_broken,
ROUND((SUM(CASE WHEN source_type = 'well' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv), 0) AS well
FROM
combined_analysis_table ct
JOIN
province_totals pt ON ct.province_name = pt.province_name
GROUP BY
ct.province_name
ORDER BY
ct.province_name;
📊 Click to view provincial result table
| province_name | river | shared_tap | tap_in_home | tap_in_home_broken | well |
|---|---|---|---|---|---|
| Sokoto | 45 | 18 | 25 | 9 | 3 |
| Amanzi | 3 | 28 | 48 | 21 | 0 |
| Akatsi | 7 | 15 | 30 | 32 | 16 |
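The conditional-SUM pivot in the query above is the workhorse of this whole section, so it is worth internalising. Here is a minimal sketch of the same pattern on made-up numbers, run through an in-memory SQLite database rather than the real dataset:

```python
import sqlite3

# Tiny, invented stand-in for combined_analysis_table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE combined (province_name TEXT, source_type TEXT, people_served INTEGER);
INSERT INTO combined VALUES
('Sokoto', 'river',      450),
('Sokoto', 'shared_tap', 300),
('Sokoto', 'well',       250);
""")

# Each CASE picks out one source type; dividing by the province total
# turns absolute people counts into a percentage per source type.
row = conn.execute("""
WITH province_totals AS (
    SELECT province_name, SUM(people_served) AS total_ppl_serv
    FROM combined
    GROUP BY province_name
)
SELECT
    c.province_name,
    ROUND(SUM(CASE WHEN source_type = 'river' THEN people_served ELSE 0 END)
          * 100.0 / pt.total_ppl_serv, 0) AS river,
    ROUND(SUM(CASE WHEN source_type = 'shared_tap' THEN people_served ELSE 0 END)
          * 100.0 / pt.total_ppl_serv, 0) AS shared_tap
FROM combined AS c
JOIN province_totals AS pt ON c.province_name = pt.province_name
GROUP BY c.province_name
""").fetchone()

print(row)  # ('Sokoto', 45.0, 30.0)
```

With 1,000 people total, 450 on rivers becomes 45% — the same mechanics, at toy scale, as the provincial table above.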
Here’s what the data looks like when visualized — clearer, more emotional, and more powerful.
🧩 This graph confirmed what I saw in the query — Sokoto stands out with a very high percentage of people depending on river water. Amanzi, on the other hand, shows a worrying number of broken taps. These patterns helped me prioritize where to send repair teams first.
Next, I zoomed in further. But it wasn’t easy — I realized that some town names repeat (like Harare in both Akatsi and Kilimani). So I grouped my data by both province and town, to get the right picture.
👩💻 Click to view my Town Aggregation Query
WITH town_totals AS (
SELECT
province_name,
town_name,
SUM(people_served) AS total_ppl_serv
FROM
combined_analysis_table
GROUP BY
province_name, town_name
)
SELECT
ct.province_name,
ct.town_name,
ROUND((SUM(CASE WHEN source_type = 'river' THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS river,
ROUND((SUM(CASE WHEN source_type = 'shared_tap' THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS shared_tap,
ROUND((SUM(CASE WHEN source_type = 'tap_in_home' THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home,
ROUND((SUM(CASE WHEN source_type = 'tap_in_home_broken' THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home_broken,
ROUND((SUM(CASE WHEN source_type = 'well' THEN people_served ELSE 0 END) * 100.0 / tt.total_ppl_serv), 0) AS well
FROM
combined_analysis_table AS ct
JOIN
town_totals AS tt
ON ct.province_name = tt.province_name
AND ct.town_name = tt.town_name
GROUP BY
ct.province_name, ct.town_name
ORDER BY
ct.town_name;
📊 Click to view sample Town Result Table
| province_name | town_name | tap_in_home | tap_in_home_broken | shared_tap | well | river |
|---|---|---|---|---|---|---|
| Akatsi | Harare | 28 | 27 | 17 | 27 | 2 |
| Amanzi | Amina | 3 | 56 | 24 | 9 | 8 |
| Amanzi | Dahabu | 55 | 1 | 37 | 4 | 3 |
While reading through these results, one town stood out — Amina in Amanzi. Only 3% of its people have running water, yet more than half have taps installed that don’t work.
That hit me. I wrote in my notes:
“If I could deploy one repair team today, I’d send them to Amina first.”
I also calculated which towns had the highest share of broken taps relative to total installed taps:
👩💻 Click to view Query
SELECT
province_name,
town_name,
ROUND(tap_in_home_broken / (tap_in_home_broken + tap_in_home) * 100, 0) AS Pct_broken_taps
FROM
town_aggregated_water_access; -- town-level results from the previous query, saved for reuse
📊 Click to view Result Table
| province_name | town_name | Pct_broken_taps |
|---|---|---|
| Amanzi | Amina | 95 |
| Amanzi | Bello | 52 |
| Akatsi | Kintampo | 46 |
Amina again. Almost all taps broken. It was clear now — this wasn’t just data; it was evidence. I could visualize the inequality: Dahabu (the capital, where past politicians lived) had perfectly working taps, while surrounding towns struggled to find clean water.
This realization shaped my final summary and recommendations.
This part of the project reminded me why I love data. It’s not just about numbers and queries — it’s about using data to see people, spot injustice, and propose solutions that matter.
Every CTE, every JOIN, every percentage — they all told a story.
And now, I was ready to turn those insights into action.
Alright, let’s wrap up what I’ve discovered from this project.
A few weeks ago, I dug into the data, and here’s what stood out to me:
- Most of Maji Ndogo’s water sources are rural.
- About 43% of our people rely on shared taps — sometimes as many as 2,000 people per tap.
- Around 31% of households have water infrastructure at home, but within that group, nearly 45% face non-functional systems — mostly due to faulty pipes, pumps, and reservoirs.
- Towns like Amina, rural Amanzi, and parts of Akatsi and Hawassa are among the worst hit.
- About 18% of the population use wells, but only 28% of those wells are clean — especially in Hawassa, Kilimani, and Akatsi.
- Water access also comes with long waits — often over 120 minutes on average:
- Queues stretch the longest on Saturdays.
- They peak in the mornings and evenings.
- Wednesdays and Sundays tend to be the easiest days to fetch water.
Seeing all this laid out made me realize just how much everyday life in Maji Ndogo depends on a few key sources — and how much impact a small fix can make.
Based on the data, here’s the approach I’d take moving forward:
- Tackle shared taps first — improving them will help the largest number of people immediately.
- Clean up wells — they’re a good source, but contamination is widespread.
- Repair broken infrastructure — one fix here can benefit hundreds of homes and reduce queuing elsewhere.
- Hold off on new home installations (for now) — focusing resources where they’ll have the biggest impact.
- Prepare rural teams — most work will happen outside urban centers, where logistics and access are tougher.
- Rivers: For communities still depending on rivers, I’d start by sending water trucks as a short-term solution while we drill wells for a permanent fix. Sokoto would be my first target.
- Wells: I’d install filters — RO (Reverse Osmosis) filters for chemical pollution and ultraviolet (UV) filters for biological contamination. Long-term, I’d investigate the root causes of these pollutants.
- Shared taps:
- In the short term, send extra tankers to the busiest taps on the busiest days (guided by the queue-time data).
- In the medium term, install additional taps where demand is highest — starting with Bello, Abidjan, and Zuri.
- The goal: bring average queue times below 30 minutes, following UN standards.
- Low-queue taps: Taps with short wait times (<30 min) don’t need immediate upgrades — installing home taps here will be a longer-term goal.
- Broken infrastructure: Repairing one reservoir or major pipe can restore access for hundreds. I’d prioritize Amina, Lusaka, Zuri, Djenne, and the rural parts of Amanzi.
I’ve really enjoyed diving deep into this project — it’s been more than just SQL queries and dashboards; it’s about understanding how data can tell human stories.
Every row and column pointed back to real people waiting in real queues — and that’s what keeps me motivated to keep building, one dataset at a time. 💧📊
Now that I’ve wrapped up my analysis, it’s time to put the plan into action — right inside the database.
I’ve come a long way with this project, and this part feels like the most meaningful one.
It’s where all the insights, numbers, and logic finally become real, trackable steps that can help our engineers improve water access across Maji Ndogo.
The goal is simple:
I want to build a table that gives our field teams everything they need — where to go, what to fix, and how to report back.
They’ll need:
- The address of each location (street, town, province).
- The type of water source.
- The specific action needed (repair, upgrade, or replacement).
- Space to update progress and record completion dates.
So, I created a table called Project_progress.
🧱 Click to view the CREATE TABLE query
CREATE TABLE Project_progress (
Project_id SERIAL PRIMARY KEY,
source_id VARCHAR(20) NOT NULL REFERENCES water_source(source_id) ON DELETE CASCADE ON UPDATE CASCADE,
Address VARCHAR(50),
Town VARCHAR(30),
Province VARCHAR(30),
Source_type VARCHAR(50),
Improvement VARCHAR(50),
Source_status VARCHAR(50) DEFAULT 'Backlog' CHECK (Source_status IN ('Backlog', 'In progress', 'Complete')),
Date_of_completion DATE,
Comments TEXT
);
Once the structure was ready, I moved on to designing the query that actually fills this table — using everything I’ve learned so far.
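As a side note, the DEFAULT and CHECK clauses in the table above can be exercised in miniature before trusting them with real data. The sketch below uses SQLite as a stand-in (INTEGER PRIMARY KEY replacing SERIAL); note that MySQL itself only enforces CHECK constraints from version 8.0.16 onward:

```python
import sqlite3

# Miniature Project_progress to exercise DEFAULT and CHECK behaviour
# (SQLite stand-in; the real table lives in MySQL's md_water_services).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Project_progress (
    Project_id INTEGER PRIMARY KEY,  -- SQLite's equivalent of SERIAL
    source_id TEXT NOT NULL,
    Source_status TEXT DEFAULT 'Backlog'
        CHECK (Source_status IN ('Backlog', 'In progress', 'Complete'))
)
""")

# Omitting Source_status falls back to the 'Backlog' default.
conn.execute("INSERT INTO Project_progress (source_id) VALUES ('AkHa00001')")
status = conn.execute("SELECT Source_status FROM Project_progress").fetchone()[0]

# A status outside the allowed set is rejected by the CHECK constraint.
rejected = False
try:
    conn.execute(
        "INSERT INTO Project_progress (source_id, Source_status) "
        "VALUES ('AkHa00002', 'Done')"
    )
except sqlite3.IntegrityError:
    rejected = True

print(status, rejected)  # Backlog True
```

Having the database refuse anything other than 'Backlog', 'In progress', or 'Complete' means the field teams can't accidentally invent a fourth status later.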
The diagram below illustrates the logical flow of my query construction and data-filtering steps:
At a high level, my improvement logic looks like this:
- Rivers → Drill wells.
- Wells:
  - If chemically contaminated → Install RO filter.
  - If biologically contaminated → Install UV + RO filters.
- Shared taps: If queue time ≥ 30 minutes → Install X extra taps, where X = FLOOR(time_in_queue / 30).
- In-home taps (broken): Diagnose local infrastructure.
The logic felt like a perfect use case for CASE statements — my favourite SQL control flow tool.
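Before touching the real tables, I like to sanity-check a CASE mapping like this on a handful of invented rows. The sketch below runs the same branch logic in an in-memory SQLite database (integer division stands in for FLOOR, which older SQLite builds lack; column names mirror the project, the data is made up):

```python
import sqlite3

# Invented mini-dataset covering one row per improvement branch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources (type_of_water_source TEXT, results TEXT, time_in_queue INTEGER);
INSERT INTO sources VALUES
('river',              NULL,                     0),
('well',               'Contaminated: Chemical', 0),
('shared_tap',         NULL,                     90),
('tap_in_home_broken', NULL,                     0);
""")

# Same CASE structure as the real query; integer division replaces FLOOR.
rows = conn.execute("""
SELECT
    type_of_water_source,
    CASE
        WHEN results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN results = 'Contaminated: Biological' THEN 'Install UV filter'
        WHEN type_of_water_source = 'river' THEN 'Drill well'
        WHEN type_of_water_source = 'shared_tap' AND time_in_queue > 30
            THEN 'Install ' || (time_in_queue / 30) || ' taps nearby'
        WHEN type_of_water_source = 'tap_in_home_broken'
            THEN 'Diagnose local infrastructure'
    END AS Improvement
FROM sources
""").fetchall()

for r in rows:
    print(r)
```

A 90-minute queue maps to three extra taps, and each of the other branches lands on exactly one improvement — which is the property I wanted before running this against thousands of real rows.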
To make this work, I joined together the water_source, well_pollution, visits, and location tables.
Then I filtered out any data that wasn’t relevant — keeping only the first visit to each site and the sources that actually need improvement.
Here’s the initial query logic:
🪄 Click to view the filtering query
SELECT
l.address,
l.town_name,
l.province_name,
s.source_id,
s.type_of_water_source,
w.results
FROM
water_source AS s
LEFT JOIN
well_pollution AS w ON s.source_id = w.source_id
INNER JOIN
visits AS v ON s.source_id = v.source_id
INNER JOIN
location AS l ON l.location_id = v.location_id
WHERE
v.visit_count = 1
AND (
w.results != 'Clean'
OR s.type_of_water_source IN ('tap_in_home_broken', 'river')
OR (s.type_of_water_source = 'shared_tap' AND v.time_in_queue > 30)
);
After layering in all the logic with CASE statements, here’s the final query that generates the data for Project_progress.
📊 Click to view the final query
SELECT
s.source_id,
l.address,
l.town_name,
l.province_name,
s.type_of_water_source AS source_type,
CASE
WHEN w.results = 'Contaminated: Chemical' THEN 'Install RO filter'
WHEN w.results = 'Contaminated: Biological' THEN 'Install UV filter'
WHEN s.type_of_water_source = 'river' THEN 'Drill well'
WHEN s.type_of_water_source = 'shared_tap' AND v.time_in_queue > 30
THEN CONCAT('Install ', FLOOR(v.time_in_queue / 30), ' taps nearby')
WHEN s.type_of_water_source = 'tap_in_home_broken' THEN 'Diagnose local infrastructure'
ELSE NULL
END AS Improvement,
CASE
WHEN w.results IN ('Contaminated: Chemical', 'Contaminated: Biological')
OR s.type_of_water_source = 'river'
OR (s.type_of_water_source = 'shared_tap' AND v.time_in_queue > 30)
OR s.type_of_water_source = 'tap_in_home_broken'
THEN 'In progress'
ELSE 'Backlog'
END AS Source_status,
NULL AS Date_of_completion,
NULL AS Comments
FROM
water_source AS s
LEFT JOIN
well_pollution AS w ON s.source_id = w.source_id
INNER JOIN
visits AS v ON s.source_id = v.source_id
INNER JOIN
location AS l ON l.location_id = v.location_id
WHERE
v.visit_count = 1
AND (
w.results != 'Clean'
OR s.type_of_water_source IN ('tap_in_home_broken', 'river')
OR (s.type_of_water_source = 'shared_tap' AND v.time_in_queue > 30)
);
After verifying the query results (about 25,398 rows), I inserted them directly into the new table.
🗂️ Click to view the INSERT query
INSERT INTO Project_progress (
source_id,
Address,
Town,
Province,
Source_type,
Improvement,
Source_status,
Date_of_completion,
Comments
)
SELECT
s.source_id,
l.address,
l.town_name,
l.province_name,
s.type_of_water_source AS source_type,
CASE
WHEN w.results = 'Contaminated: Chemical' THEN 'Install RO filter'
WHEN w.results = 'Contaminated: Biological' THEN 'Install UV filter'
WHEN s.type_of_water_source = 'river' THEN 'Drill well'
WHEN s.type_of_water_source = 'shared_tap' AND v.time_in_queue > 30
THEN CONCAT('Install ', FLOOR(v.time_in_queue / 30), ' taps nearby')
WHEN s.type_of_water_source = 'tap_in_home_broken' THEN 'Diagnose local infrastructure'
ELSE NULL
END AS Improvement,
CASE
WHEN w.results IN ('Contaminated: Chemical', 'Contaminated: Biological')
OR s.type_of_water_source = 'river'
OR (s.type_of_water_source = 'shared_tap' AND v.time_in_queue > 30)
OR s.type_of_water_source = 'tap_in_home_broken'
THEN 'In progress'
ELSE 'Backlog'
END AS Source_status,
NULL AS Date_of_completion,
NULL AS Comments
FROM
water_source AS s
LEFT JOIN
well_pollution AS w ON s.source_id = w.source_id
INNER JOIN
visits AS v ON s.source_id = v.source_id
INNER JOIN
location AS l ON l.location_id = v.location_id
WHERE
v.visit_count = 1
AND (
w.results != 'Clean'
OR s.type_of_water_source IN ('tap_in_home_broken', 'river')
OR (s.type_of_water_source = 'shared_tap' AND v.time_in_queue > 30)
);And that’s it — my Project_progress table is ready to guide the repair teams!
It will let us track repairs, upgrades, and completions — and give President Naledi a clear picture of progress as it happens.
This was a long journey — from joins to window functions, and even a few debugging marathons. But looking back, I can honestly say I’ve learned how to think in SQL, not just write it.
- 📚 ALX Data Program: Querying Data: Integrated Project 4 – Maji Ndogo: Charting the Course for Maji Ndogo’s Water Future.
- 🗂️ Dataset: Maji Ndogo Water Services – a fictional yet realistic dataset designed for SQL practice, data cleaning, and exploratory analysis.
- ✍️ Author’s Contribution: Every SQL query, data-cleaning step, and analytical insight presented here was personally executed and refined by me, as part of the ALX Data Program capstone project.
- 🖼️ Image Credits: All images and visual assets used in this documentation are courtesy of the ALX Data Program.
If you found this project insightful, feel free to ⭐ the repo and connect with me on LinkedIn for collaboration.


