# Clustering Data to Unveil Maji Ndogo's Water Crisis

## Introduction

In this second part of the integrated project, we take a deeper analytical look at Maji Ndogo's water situation. We use a range of SQL functions, including window functions, to draw insights from the data tables.

## Notebook Setup

In [1]:
# Load the sql extension
%load_ext sql


In [2]:
# Create a connection to the MySQL 'md_water_services' database
%sql mysql+pymysql://root:password@localhost:3306/md_water_services

## Cleaning the Data

Let's bring up the `employee` table. It has information on all of Maji Ndogo's workers, but note that the email addresses have not been added. To send them reports and figures we will need their emails, so we have to update the `email` attribute. Luckily, per the project description, the organisation's email addresses follow a simple pattern: `first_name.last_name@ndogowater.gov`.

We can determine the email address for each employee by:
- selecting the `employee_name` column
- replacing the space with a full stop
- converting the result to lowercase
- appending the domain `@ndogowater.gov`
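
The steps above can be sketched in plain Python to make the transformation concrete before writing it in SQL. This is only an illustration: the helper name `build_email` and the `DOMAIN` constant are hypothetical and are not part of the project's database or code.

```python
# Hypothetical helper mirroring the listed steps; not part of the project code.
DOMAIN = "@ndogowater.gov"


def build_email(employee_name: str) -> str:
    """Replace spaces with full stops, lowercase the result, append the domain."""
    local_part = employee_name.replace(" ", ".").lower()
    return local_part + DOMAIN


# Sample name taken from the employee table shown in this notebook.
sample_email = build_email("Amara Jengo")
print(sample_email)
```

Like SQL's `REPLACE`, Python's `str.replace` substitutes every space, so a name with more than two parts would produce more than one full stop.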

Because these email addresses have to be written back to the database, we first use a `SELECT` query to get the format right, then use `UPDATE` and `SET` to persist the changes.

In [3]:
%%sql
# Construct the email addresses for Maji Ndogo's workers
SELECT
    CONCAT(LOWER(REPLACE(employee_name, ' ', '.')), '@ndogowater.gov') AS new_email
FROM employee;

new_email
amara.jengo@ndogowater.gov
bello.azibo@ndogowater.gov
bakari.iniko@ndogowater.gov
malachi.mavuso@ndogowater.gov
cheche.buhle@ndogowater.gov
zuriel.matembo@ndogowater.gov
deka.osumare@ndogowater.gov
lalitha.kaburi@ndogowater.gov
enitan.zuri@ndogowater.gov
farai.nia@ndogowater.gov


In [4]:
# %%sql
# # Update the employee table with the constructed emails
# UPDATE employee
# SET email = CONCAT(LOWER(REPLACE(employee_name, " ", ".")), "@ndogowater.gov");

Let's make sure that the query above worked.

In [5]:
%sql SELECT * FROM employee LIMIT 5;

assigned_employee_id,employee_name,phone_number,email,address,province_name,town_name,position
0,Amara Jengo,99637993287,amara.jengo@ndogowater.gov,36 Pwani Mchangani Road,Sokoto,Ilanga,Field Surveyor
1,Bello Azibo,99643864786,bello.azibo@ndogowater.gov,129 Ziwa La Kioo Road,Kilimani,Rural,Field Surveyor
2,Bakari Iniko,99222599041,bakari.iniko@ndogowater.gov,18 Mlima Tazama Avenue,Hawassa,Rural,Field Surveyor
3,Malachi Mavuso,99945849900,malachi.mavuso@ndogowater.gov,100 Mogadishu Road,Akatsi,Lusaka,Field Surveyor
4,Cheche Buhle,99381679640,cheche.buhle@ndogowater.gov,1 Savanna Street,Akatsi,Rural,Field Surveyor
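
A five-row sample only shows that the first few rows look right. One way to check the whole table is to count the rows whose `email` is missing or differs from the expected pattern. The sketch below is a self-contained illustration that uses Python's built-in `sqlite3` with an in-memory table as a stand-in for the MySQL database. The third row's missing email is made up for the demonstration, and SQLite's `||` operator is used where the MySQL query would use `CONCAT`.

```python
import sqlite3

# In-memory stand-in for the employee table; assumption: only two columns matter here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [
        ("Amara Jengo", "amara.jengo@ndogowater.gov"),
        ("Bello Azibo", "bello.azibo@ndogowater.gov"),
        ("Bakari Iniko", None),  # deliberately left empty for the demonstration
    ],
)

# Count rows where the stored email is missing or does not match the pattern.
MISMATCH_QUERY = """
SELECT COUNT(*)
FROM employee
WHERE email IS NULL
   OR email <> LOWER(REPLACE(employee_name, ' ', '.')) || '@ndogowater.gov'
"""
mismatched_rows = conn.execute(MISMATCH_QUERY).fetchone()[0]
print(mismatched_rows)
```

A count of zero on the real table would indicate that every employee has an email in the expected format.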


Awesome, now we have emails for all employees persisted in the database. Let's check the `phone_number` attribute. The phone numbers should be 12 characters long, but as we can see below 👇🏾, they are 13 characters long.

In [8]:
%%sql
# Check the length of the phone numbers
SELECT LENGTH(phone_number) FROM md_water_services.employee LIMIT 5;

LENGTH(phone_number)
13
13
13
13
13


That's because there is a space at the end of the number! If you try to send an automated SMS to that number, it will fail. This problem is common, and the remedy is `TRIM(column)`, which removes any leading or trailing spaces from a string.

In [9]:
%%sql
# Trim the leading and trailing whitespace in the phone_number attribute and check the length
SELECT LENGTH(TRIM(phone_number)) FROM md_water_services.employee LIMIT 5;

LENGTH(TRIM(phone_number))
12
12
12
12
12
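
The sample above covers only five rows. As a rough sketch of how the whole table could be checked, the example below counts the rows where trimming changes the length, before and after an `UPDATE`. It uses Python's built-in `sqlite3` with an in-memory table as a stand-in for the MySQL database. The names and phone numbers are placeholder values invented for the illustration, and `count_untrimmed` is a hypothetical helper.

```python
import sqlite3

# In-memory stand-in for the employee table; all values are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employee_name TEXT, phone_number TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [
        ("Test One", "+00000000001 "),   # trailing space
        ("Test Two", "+00000000002 "),   # trailing space
        ("Test Three", "+00000000003"),  # already clean
    ],
)


def count_untrimmed(connection: sqlite3.Connection) -> int:
    """Count rows whose phone_number has leading or trailing spaces."""
    row = connection.execute(
        "SELECT COUNT(*) FROM employee "
        "WHERE LENGTH(phone_number) <> LENGTH(TRIM(phone_number))"
    ).fetchone()
    return row[0]


untrimmed_before = count_untrimmed(conn)
conn.execute("UPDATE employee SET phone_number = TRIM(phone_number)")
untrimmed_after = count_untrimmed(conn)
print(untrimmed_before, untrimmed_after)
```

The same `WHERE LENGTH(phone_number) <> LENGTH(TRIM(phone_number))` condition should also work in a MySQL `%%sql` cell, since both functions exist there.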


In [10]:
%%sql
# Update the table to persist the changes to the database
UPDATE md_water_services.employee
SET employee.phone_number = TRIM(employee.phone_number);

Let's check whether the query above worked.

In [None]:
%%sql
# Confirm that the phone_number attribute was updated
SELECT LENGTH(phone_number) FROM 