### Connecting to MySQL Database

In [1]:
# Install or upgrade the 'jupysql' package for running SQL queries in Jupyter
!pip install jupysql --upgrade -q


In [2]:
# Configure SQL Magic to display all query results without limiting the output
%config SqlMagic.displaylimit = None


In [3]:
# Load the SQL extension to enable SQL queries within the Jupyter notebook

%load_ext sql


In [4]:
# Connect to the md_water_services MySQL database
%sql mysql+pymysql://root:salomeK2020!@localhost:3306/md_water_services
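
The connection string above embeds the password directly in the notebook. One possible alternative is to read the credentials from environment variables and build the URL in Python. The sketch below assumes hypothetical variable names `MYSQL_USER` and `MYSQL_PASSWORD`, and a hypothetical helper `build_mysql_url`; none of these come from the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(user, password, host="localhost", port=3306,
                    database="md_water_services"):
    """Assemble a mysql+pymysql URL, percent-encoding the credentials.

    Encoding matters because characters such as '@' or '!' in a password
    could otherwise be misread as part of the URL structure.
    """
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# MYSQL_USER and MYSQL_PASSWORD are assumed names, not part of the notebook.
user = os.environ.get("MYSQL_USER", "root")
password = os.environ.get("MYSQL_PASSWORD", "")
connection_url = build_mysql_url(user, password)
```

How the resulting string is handed to `%sql` depends on the installed jupysql version, so check its connection documentation before relying on this.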


### Getting to Know Our Data
*Exploring the Foundational Tables and their Structure*

In [5]:
%%sql
SHOW TABLES

Tables_in_md_water_services
data_dictionary
employee
global_water_access
location
visits
water_quality
water_source
well_pollution


The database contains 8 distinct tables.

#### Location Table

In [6]:
%%sql
SELECT
     *
FROM
    location
LIMIT 5

location_id,address,province_name,town_name,location_type
AkHa00000,2 Addis Ababa Road,Akatsi,Harare,Urban
AkHa00001,10 Addis Ababa Road,Akatsi,Harare,Urban
AkHa00002,9 Addis Ababa Road,Akatsi,Harare,Urban
AkHa00003,139 Addis Ababa Road,Akatsi,Harare,Urban
AkHa00004,17 Addis Ababa Road,Akatsi,Harare,Urban


Each row in this table describes a specific location: its address, the province and town it is in, and whether it is in a city (Urban) or not.

#### Visits Table

In [7]:
%%sql
SELECT
     *
FROM
    visits
LIMIT 5

record_id,location_id,source_id,time_of_record,visit_count,time_in_queue,assigned_employee_id
0,SoIl32582,SoIl32582224,2021-01-01 09:10:00,1,15,12
1,KiRu28935,KiRu28935224,2021-01-01 09:17:00,1,0,46
2,HaRu19752,HaRu19752224,2021-01-01 09:36:00,1,62,40
3,AkLu01628,AkLu01628224,2021-01-01 09:53:00,1,0,1
4,AkRu03357,AkRu03357224,2021-01-01 10:11:00,1,28,14


This table lists a record_id, location_id, source_id, and a date and time for each visit. It makes sense to read a row as: someone (assigned_employee_id) visited some location (location_id) at some time (time_of_record) and found a 'source' there (source_id).
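
The way a visit ties a location to a source can be sketched with a join. The example below uses Python's standard-library `sqlite3` as a stand-in for MySQL, with simplified table definitions. The location and water source rows are taken from the sample outputs in this notebook, while the single visit row (record_id 100000) is hypothetical and exists only to link them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    location_id TEXT PRIMARY KEY,
    town_name TEXT,
    location_type TEXT
);
CREATE TABLE water_source (
    source_id TEXT PRIMARY KEY,
    type_of_water_source TEXT,
    number_of_people_served INTEGER
);
CREATE TABLE visits (
    record_id INTEGER PRIMARY KEY,
    location_id TEXT,
    source_id TEXT,
    time_of_record TEXT,
    visit_count INTEGER,
    time_in_queue INTEGER,
    assigned_employee_id INTEGER
);
INSERT INTO location VALUES ('AkHa00000', 'Harare', 'Urban');
INSERT INTO water_source VALUES ('AkHa00000224', 'tap_in_home', 956);
-- Hypothetical visit row, invented for this illustration only
INSERT INTO visits VALUES
    (100000, 'AkHa00000', 'AkHa00000224', '2021-01-01 09:00:00', 1, 0, 1);
""")

# Follow location_id and source_id from each visit to the related tables.
rows = conn.execute("""
    SELECT v.record_id, l.town_name, l.location_type, s.type_of_water_source
    FROM visits AS v
    JOIN location AS l ON l.location_id = v.location_id
    JOIN water_source AS s ON s.source_id = v.source_id
""").fetchall()
print(rows)
```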

#### Water Source Table

In [8]:
%%sql
SELECT
     *
FROM
    water_source
LIMIT 5

source_id,type_of_water_source,number_of_people_served
AkHa00000224,tap_in_home,956
AkHa00001224,tap_in_home_broken,930
AkHa00002224,tap_in_home_broken,486
AkHa00003224,well,364
AkHa00004224,tap_in_home_broken,942


People in Maji Ndogo get water from different water sources. Each row records one source, its type, and the number of people it serves.

### Dive Into Water Sources
**Understanding Different Sources with SELECT**


*It is important that we understand the different types of water sources we are dealing with.*

In [14]:
%%sql
-- Query the database for the unique types of water sources in Maji Ndogo
SELECT DISTINCT
    type_of_water_source
FROM
    water_source

1. River - People collect drinking water along a river. This is an open water source that millions of people use in Maji Ndogo.
2. Well - These sources draw water from underground sources, and are commonly shared by communities. Since these are closed water sources, contamination is much less likely compared to a river.
3. Shared tap - This is a tap in a public area shared by communities.
4. Tap in home - These are taps that are inside the homes of our citizens. On average about 6 people live together in Maji Ndogo, so each of these taps serves about 6 people.
5. Broken tap in home - These are taps that have been installed in a citizen’s home, but the infrastructure connected to that tap is not functional. This can be due to burst pipes, broken pumps or water treatment plants that are not working.
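
To list each source type only once, `SELECT DISTINCT` is the usual tool. The sketch below runs it with Python's standard-library `sqlite3` (standing in for MySQL) on the five `water_source` rows shown earlier in this notebook, so it returns only the three types present in that small sample rather than all five types described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE water_source (
        source_id TEXT PRIMARY KEY,
        type_of_water_source TEXT,
        number_of_people_served INTEGER
    )
""")
# The five sample rows from the water_source preview in this notebook.
conn.executemany(
    "INSERT INTO water_source VALUES (?, ?, ?)",
    [
        ("AkHa00000224", "tap_in_home", 956),
        ("AkHa00001224", "tap_in_home_broken", 930),
        ("AkHa00002224", "tap_in_home_broken", 486),
        ("AkHa00003224", "well", 364),
        ("AkHa00004224", "tap_in_home_broken", 942),
    ],
)

# DISTINCT collapses repeated values, so each type appears once.
distinct_types = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT type_of_water_source "
        "FROM water_source ORDER BY type_of_water_source"
    )
]
print(distinct_types)
```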



**An important note on the home taps: About 6-10 million people have running water installed in their homes in Maji Ndogo, including broken taps. If we were to document this, we would have a row of data for each home, so that one record is one tap. At about 6 people per home, that means our database would contain 1 million or more rows of data, which may slow our systems down. For now, the surveyors combined the data of many households together into a single record, which is why a single home-tap record can show hundreds of people served.**
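
As a rough illustration of this aggregation, the number of households behind a combined home-tap record can be estimated by dividing `number_of_people_served` by the average household size of 6 given above. The helper name `estimated_households` is hypothetical, and the result is only an approximation based on that average.

```python
PEOPLE_PER_HOME = 6  # average household size stated in the notebook text


def estimated_households(number_of_people_served,
                         people_per_home=PEOPLE_PER_HOME):
    """Approximate how many homes were combined into one record."""
    return round(number_of_people_served / people_per_home)


# Home-tap records (working and broken) from the water_source preview.
home_tap_records = {
    "AkHa00000224": 956,
    "AkHa00001224": 930,
    "AkHa00002224": 486,
    "AkHa00004224": 942,
}

estimates = {
    source_id: estimated_households(people)
    for source_id, people in home_tap_records.items()
}
print(estimates)
```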

### Unpack the Visits
Discovering Visit Patterns

In [None]:
### Visits where the queue time exceeds 500 minutes (a little over 8 hours)


In [None]:
%%sql
SELECT
     *
FROM
    visits
WHERE time_in_queue > 500
ORDER BY time_in_queue DESC
LIMIT 5

In [None]:
%%sql
SELECT
    *
FROM
    water_source
WHERE
    source_id IN (
        'AmRu14612224',
        'HaRu19538224',
        'HaRu20126224',
        'AkRu05234224',
        'SoRu35388224'
    )

From these results, the sources with the longest queue times are shared taps.
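
Rather than copying source IDs by hand from one result into the next query, the two steps could be combined with a join. This is a minimal sketch using Python's standard-library `sqlite3` as a stand-in for MySQL. Every row and ID below is hypothetical and the tables are simplified, so the output only shows the shape of the query and says nothing about the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE water_source (
    source_id TEXT PRIMARY KEY,
    type_of_water_source TEXT,
    number_of_people_served INTEGER
);
CREATE TABLE visits (
    record_id INTEGER PRIMARY KEY,
    source_id TEXT,
    time_in_queue INTEGER
);
-- All rows below are hypothetical
INSERT INTO water_source VALUES
    ('HypA0001224', 'shared_tap', 3000),
    ('HypB0002224', 'well', 400),
    ('HypC0003224', 'shared_tap', 2500);
INSERT INTO visits VALUES
    (1, 'HypA0001224', 533),
    (2, 'HypB0002224', 12),
    (3, 'HypC0003224', 509),
    (4, 'HypA0001224', 45);
""")

# Filter long queues and look up the source type in a single query.
long_queues = conn.execute("""
    SELECT v.record_id, v.source_id, v.time_in_queue, s.type_of_water_source
    FROM visits AS v
    JOIN water_source AS s ON s.source_id = v.source_id
    WHERE v.time_in_queue > 500
    ORDER BY v.time_in_queue DESC
""").fetchall()
print(long_queues)
```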

### Water Source Quality
Understanding the quality of Water

The quality of our water sources is the whole point of this survey. We have a table that contains a quality score, assigned by a field surveyor, for each visit made to a water source. The surveyors scored each source from 1, being terrible, to 10 for a good, clean water source in a home. Shared taps are not rated as high, and the score also depends on how long the queue times are.

In [None]:
%%sql
SELECT
    *
FROM
    water_quality
LIMIT 5

The surveyors only made multiple visits to shared taps and did not revisit other types of water sources. So
there should be no records of second visits to locations where there are good water sources, like taps in homes.

In [None]:
%%sql
SELECT 
    *
FROM 
    water_quality
WHERE 
    subjective_quality_score = 10
AND visit_count = 2
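
The consistency check above can be tried out on a few rows. The sketch below uses Python's standard-library `sqlite3` as a stand-in for MySQL, with a simplified `water_quality` table. All rows are hypothetical, invented so that two of them break the rule that top-scoring sources should not have a second visit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE water_quality (
        record_id INTEGER PRIMARY KEY,
        subjective_quality_score INTEGER,
        visit_count INTEGER
    )
""")
# Hypothetical rows: records 2 and 4 have a score of 10 and a second visit.
conn.executemany(
    "INSERT INTO water_quality VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 2), (3, 6, 2), (4, 10, 2), (5, 3, 1)],
)

# Records that should not exist if only shared taps were revisited.
suspect_ids = [
    row[0]
    for row in conn.execute("""
        SELECT record_id
        FROM water_quality
        WHERE subjective_quality_score = 10
          AND visit_count = 2
        ORDER BY record_id
    """)
]
print(suspect_ids)
```

Any record returned by a query like this would be a candidate for review, since it contradicts the stated survey procedure.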

### Pollution Issues
Correcting Pollution Data with LIKE and String Operators

In [None]:
%%sql
-- Pollution table
SELECT 
     *
FROM
   well_pollution
LIMIT 10