# Beginning Our Data-driven Journey in Maji Ndogo

## Introduction

In this first part of the integrated project, we dive into Maji Ndogo's expansive dataset, which contains just over 60,000 records spread across several tables. As we navigate this trove of data, we'll use basic queries to familiarise ourselves with the contents of each table in the database. Along the way, we'll also use SQL **Data Manipulation Language (DML)** to refine some data points.
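To make the difference between a basic query and DML concrete, here is a minimal sketch. It uses a hypothetical two-row `employee` table in an in-memory SQLite database rather than the project's MySQL server, and the stray spaces in `position` are made up for the demo. A `SELECT` only reads rows, while DML statements such as `UPDATE` (alongside `INSERT` and `DELETE`) change what is stored.

```python
import sqlite3

# Hypothetical toy table in SQLite; the project itself runs against MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (assigned_employee_id INTEGER, employee_name TEXT, position TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    # The leading/trailing spaces are invented "dirty" values for the demo.
    [(0, "Amara Jengo", "Field Surveyor "), (1, "Bello Azibo", " Field Surveyor")],
)

# A basic query only reads the data...
before = conn.execute(
    "SELECT position FROM employee ORDER BY assigned_employee_id"
).fetchall()

# ...whereas a DML statement such as UPDATE rewrites the stored values.
conn.execute("UPDATE employee SET position = TRIM(position)")
conn.commit()

after = conn.execute(
    "SELECT position FROM employee ORDER BY assigned_employee_id"
).fetchall()
print(before)  # [('Field Surveyor ',), (' Field Surveyor',)]
print(after)   # [('Field Surveyor',), ('Field Surveyor',)]
```

`TRIM` is available in both SQLite and MySQL, so the same `UPDATE` would read identically against the real database.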

## Notebook Setup

In [1]:
```python
# Load the SQL extension
%load_ext sql
```

In [2]:
```python
# Create a connection to the MySQL 'md_water_services' database
%sql mysql+pymysql://root:password@localhost:3306/md_water_services
```
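The `%sql` magic takes a SQLAlchemy-style connection URL. The sketch below shows how that string breaks down, using Python's standard `urllib.parse.urlsplit`. It only parses the text and does not open a connection. The `root:password` credentials are simply the ones written in the cell above.

```python
from urllib.parse import urlsplit

# Same URL as in the %sql cell: dialect+driver://user:password@host:port/database
url = urlsplit("mysql+pymysql://root:password@localhost:3306/md_water_services")

dialect, driver = url.scheme.split("+")      # SQLAlchemy dialect and DB-API driver
print(dialect, driver)                       # mysql pymysql
print(url.username, url.hostname, url.port)  # root localhost 3306
print(url.path.lstrip("/"))                  # md_water_services (the database to use)
```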

## Familiarising Ourselves With the Data

Let's start by reviewing the first few records of each table to get a high-level overview of what our data looks like. First things first, let's see which tables are in Maji Ndogo's database.

In [3]:
```python
%sql SHOW TABLES
```

| Tables_in_md_water_services |
|---|
| data_dictionary |
| employee |
| global_water_access |
| location |
| visits |
| water_quality |
| water_source |
| well_pollution |
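`SHOW TABLES` is MySQL-specific syntax. MySQL also exposes the same list through `information_schema.tables`, for example `SELECT table_name FROM information_schema.tables WHERE table_schema = 'md_water_services';`, and that result can be filtered or counted like any other table.

The runnable sketch below shows the same idea in SQLite, where the catalogue lives in `sqlite_master`. The empty tables are hypothetical placeholders that only borrow the names listed above.

```python
import sqlite3

# Hypothetical stand-in: empty SQLite tables named after the SHOW TABLES output.
TABLES = ["data_dictionary", "employee", "global_water_access", "location",
          "visits", "water_quality", "water_source", "well_pollution"]

conn = sqlite3.connect(":memory:")
for name in TABLES:
    # Names come from the fixed list above, so f-string interpolation is safe here.
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")

# SQLite has no SHOW TABLES; its catalogue is queried like a normal table.
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [r[0] for r in rows]
print(len(table_names))  # 8
```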


We have a total of **8** tables. Let's see what each of them contains, starting with the `data_dictionary` table.

In [4]:
```python
%sql SELECT * FROM data_dictionary;
```

| table_name | column_name | description | datatype | related_to |
|---|---|---|---|---|
| employee | assigned_employee_id | Unique ID assigned to each employee | INT | visits |
| employee | employee_name | Name of the employee | VARCHAR(255) | |
| employee | phone_number | Contact number of the employee | VARCHAR(15) | |
| employee | email | Email address of the employee | VARCHAR(255) | |
| employee | address | Residential address of the employee | VARCHAR(255) | |
| employee | town_name | Name of the town where the employee resides | VARCHAR(255) | |
| employee | province_name | Name of the province where the employee resides | VARCHAR(255) | |
| employee | position | Position or job title of the employee | VARCHAR(255) | |
| visits | record_id | Unique ID assigned to each visit | int | water_quality, water_source |
| visits | location_id | ID of the location visited | varchar(255) | location |


The data dictionary describes the columns of every table in the database. To get information about a specific table, such as its column names, their descriptions, data types and related tables, we can simply filter on `table_name`, as in the query below.

In [5]:
```python
%sql SELECT column_name, description, datatype, related_to FROM data_dictionary WHERE table_name = 'employee';
```

| column_name | description | datatype | related_to |
|---|---|---|---|
| assigned_employee_id | Unique ID assigned to each employee | INT | visits |
| employee_name | Name of the employee | VARCHAR(255) | |
| phone_number | Contact number of the employee | VARCHAR(15) | |
| email | Email address of the employee | VARCHAR(255) | |
| address | Residential address of the employee | VARCHAR(255) | |
| town_name | Name of the town where the employee resides | VARCHAR(255) | |
| province_name | Name of the province where the employee resides | VARCHAR(255) | |
| position | Position or job title of the employee | VARCHAR(255) | |
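If we wanted to look up several tables from Python rather than editing the `WHERE` clause by hand, a parameterised query keeps the table name out of the SQL string. This sketch assumes an in-memory SQLite copy of a few `data_dictionary` rows taken from the outputs above, and `describe` is a hypothetical helper. SQLite uses `?` placeholders, whereas PyMySQL (the driver in our connection string) uses `%s`.

```python
import sqlite3

# A few data_dictionary rows copied from the outputs above; SQLite stands in for MySQL.
ROWS = [
    ("employee", "assigned_employee_id", "Unique ID assigned to each employee", "INT", "visits"),
    ("employee", "employee_name", "Name of the employee", "VARCHAR(255)", ""),
    ("visits", "record_id", "Unique ID assigned to each visit", "int", "water_quality, water_source"),
    ("visits", "location_id", "ID of the location visited", "varchar(255)", "location"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_dictionary "
    "(table_name TEXT, column_name TEXT, description TEXT, datatype TEXT, related_to TEXT)"
)
conn.executemany("INSERT INTO data_dictionary VALUES (?, ?, ?, ?, ?)", ROWS)

def describe(table):
    """Return (column_name, datatype, related_to) for one table, in insertion order."""
    # The table name is bound as a value parameter, not pasted into the SQL text.
    return conn.execute(
        "SELECT column_name, datatype, related_to FROM data_dictionary "
        "WHERE table_name = ? ORDER BY rowid",
        (table,),
    ).fetchall()

for column in describe("employee"):
    print(column)
```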


The dictionary entries for `employee` show that the table has **8** columns. One of them, `assigned_employee_id`, appears to be the table's primary key and is used to reference employee information from the `visits` table. We can even retrieve the names of tables that are related to each other by running a query like this one:

In [18]:
```python
%%sql
# Retrieve related tables
SELECT DISTINCT table_name, related_to
FROM data_dictionary
WHERE related_to != '';
```

| table_name | related_to |
|---|---|
| employee | visits |
| visits | water_quality, water_source |
| visits | location |
| visits | well_pollution |
| visits | employee |
| water_quality | visits |
| water_source | visits |
| well_pollution | visits |
| location | visits |
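`related_to` sometimes holds a comma-separated list, for example `water_quality, water_source`. The rows returned by the query above therefore need to be split before we can count the tables involved. This sketch hard-codes those rows in plain Python and builds a small adjacency map. It shows that `visits` links to every other related table, and that `global_water_access` never appears in a relationship.

```python
# The (table_name, related_to) pairs returned by the related-tables query.
PAIRS = [
    ("employee", "visits"),
    ("visits", "water_quality, water_source"),
    ("visits", "location"),
    ("visits", "well_pollution"),
    ("visits", "employee"),
    ("water_quality", "visits"),
    ("water_source", "visits"),
    ("well_pollution", "visits"),
    ("location", "visits"),
]

# related_to can hold a comma-separated list, so split it before building the map.
links = {}
for table, related in PAIRS:
    for other in related.split(","):
        links.setdefault(table, set()).add(other.strip())

# Every table that appears on either side of a relationship.
related_tables = set(links) | set().union(*links.values())
print(sorted(related_tables))
print(len(related_tables))  # 6
```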


According to the `data_dictionary` table, only **6** tables are related to each other. Great: with the `data_dictionary` table as our map and the `md_water_services` database as our landscape, we now know how to navigate our data.

Next, let's view the first few rows of every table except `data_dictionary`. That table serves as a reference for the real data in the database rather than holding data itself. You can rerun the query below, changing the table name after the `FROM` clause each time. The notebook displays the first 10 records of each table/entity, along with all of their attributes.

In [19]:
```python
%sql SELECT * FROM employee;
```

| assigned_employee_id | employee_name | phone_number | email | address | province_name | town_name | position |
|---|---|---|---|---|---|---|---|
| 0 | Amara Jengo | 99637993287 | | 36 Pwani Mchangani Road | Sokoto | Ilanga | Field Surveyor |
| 1 | Bello Azibo | 99643864786 | | 129 Ziwa La Kioo Road | Kilimani | Rural | Field Surveyor |
| 2 | Bakari Iniko | 99222599041 | | 18 Mlima Tazama Avenue | Hawassa | Rural | Field Surveyor |
| 3 | Malachi Mavuso | 99945849900 | | 100 Mogadishu Road | Akatsi | Lusaka | Field Surveyor |
| 4 | Cheche Buhle | 99381679640 | | 1 Savanna Street | Akatsi | Rural | Field Surveyor |
| 5 | Zuriel Matembo | 99034075111 | | 26 Bahari Ya Faraja Road | Kilimani | Rural | Field Surveyor |
| 6 | Deka Osumare | 99379364631 | | 104 Kenyatta Street | Akatsi | Rural | Field Surveyor |
| 7 | Lalitha Kaburi | 99681623240 | | 145 Sungura Amanpour Road | Kilimani | Rural | Field Surveyor |
| 8 | Enitan Zuri | 99248509202 | | 117 Kampala Road | Hawassa | Zanzibar | Field Surveyor |
| 10 | Farai Nia | 99570082739 | | 33 Angélique Kidjo Avenue | Amanzi | Dahabu | Field Surveyor |
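Rather than editing the table name by hand for each preview, the queries can be generated from a list. This is a minimal sketch assuming the seven non-dictionary table names from `SHOW TABLES`. A hypothetical 15-row toy `employee` table in SQLite shows `LIMIT` at work, and `preview_queries` is an illustrative helper, not part of the project. Interpolating table names with an f-string is only acceptable here because they come from a fixed list, since query placeholders bind values, not identifiers.

```python
import sqlite3

# Every table from SHOW TABLES except the data_dictionary reference table.
TABLES = ["employee", "global_water_access", "location", "visits",
          "water_quality", "water_source", "well_pollution"]

def preview_queries(tables, limit=10):
    """Build one 'first N rows' query per table (names from a fixed list, never user input)."""
    return {t: f"SELECT * FROM {t} LIMIT {limit}" for t in tables}

# Hypothetical demo: a single toy table with 15 rows, previewed with LIMIT 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (assigned_employee_id INTEGER, employee_name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(i, f"employee {i}") for i in range(15)],
)

queries = preview_queries(TABLES)
rows = conn.execute(queries["employee"]).fetchall()
print(queries["employee"])  # SELECT * FROM employee LIMIT 10
print(len(rows))            # 10
```

Without an `ORDER BY`, the database decides which rows count as the "first" 10. Adding something like `ORDER BY assigned_employee_id` makes a preview repeatable.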


## Diving Into the Water Sources

## Unpacking the Visits to Water Sources

## Assessing the Quality of Water Sources

## Investigating any Pollution Issues