### Connecting to the MySQL Database

Before creating new tables, we need to establish a connection between our Jupyter Notebook and the MySQL server.

We’ll:
- Load the SQL extension to enable running SQL queries directly in the notebook.
- Connect to the `united_nations` database hosted on our local MySQL server.


In [1]:
%load_ext sql

In [2]:
%sql mysql+pymysql://root:password@localhost:3306/united_nations
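
The connection string above embeds the password in the notebook. As a minimal sketch of one alternative, the URL can be assembled from environment variables instead. The helper `build_mysql_url` and the variable names `MYSQL_USER` and `MYSQL_PASSWORD` are hypothetical, not part of the original notebook; the fallback values simply mirror the cell above.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(user, password, host="localhost", port=3306, database="united_nations"):
    """Return a mysql+pymysql URL with the credentials percent-encoded."""
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be read as URL delimiters.
    return f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


# MYSQL_USER and MYSQL_PASSWORD are assumed names; set them outside the notebook.
user = os.environ.get("MYSQL_USER", "root")
password = os.environ.get("MYSQL_PASSWORD", "password")
connection_url = build_mysql_url(user, password)
```

In a notebook, `%sql $connection_url` should then open the same connection, assuming IPython's usual variable expansion for line magics applies to `%sql`.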

### Testing the Database Connection

To confirm that our connection to the `united_nations` database is successful, we’ll query the first five records from the `Access_to_Basic_Services` table.

This helps ensure:
- The database connection is active.
- The table exists and is accessible.


In [3]:
%%sql

SELECT
    *
FROM
    Access_to_Basic_Services
LIMIT 5;


 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


| Region | Sub_region | Country_name | Time_period | Pct_managed_drinking_water_services | Pct_managed_sanitation_services | Est_population_in_millions | Est_gdp_in_billions | Land_area | Pct_unemployment |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Central and Southern Asia | Central Asia | Kazakhstan | 2015 | 94.67 | 98.0 | 17.542806 | 184.39 | 2699700.0 | 4.93 |
| Central and Southern Asia | Central Asia | Kazakhstan | 2016 | 94.67 | 98.0 | 17.794055 | 137.28 | 2699700.0 | 4.96 |
| Central and Southern Asia | Central Asia | Kazakhstan | 2017 | 95.0 | 98.0 | 18.037776 | 166.81 | 2699700.0 | 4.9 |
| Central and Southern Asia | Central Asia | Kazakhstan | 2018 | 95.0 | 98.0 | 18.276452 | 179.34 | 2699700.0 | 4.85 |
| Central and Southern Asia | Central Asia | Kazakhstan | 2019 | 95.0 | 98.0 | 18.513673 | 181.67 | 2699700.0 | 4.8 |


### Creating the `Geographic_Location` Table

#### Learning Objective
In this section, we will:
- Create a new table named `Geographic_Location` that will store the geographic information for each country.
- Define a **primary key** to ensure each record is unique.
- Extract specific columns from the `Access_to_Basic_Services` table to populate the new table.

#### Steps
1. **Create the table**  
   We’ll define a table named `Geographic_Location` with the following columns:
   - `Country_name` (Primary Key)
   - `Sub_region`
   - `Region`
   - `Land_area`

2. **Extract and insert data**  
   After creating the table, we’ll extract the relevant columns (`Country_name`, `Sub_region`, `Region`, and `Land_area`) from the `Access_to_Basic_Services` table and insert them into the `Geographic_Location` table.
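
Step 1 can be sketched as follows. This is an illustration only: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the MySQL server, and the column types are assumptions rather than the notebook's actual definition. It shows how a primary key on `Country_name` rejects a second row for the same country.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")

# Column types are assumed; only the column names come from the steps above.
conn.execute(
    """
    CREATE TABLE Geographic_Location (
        Country_name VARCHAR(100) PRIMARY KEY,
        Sub_region   VARCHAR(100),
        Region       VARCHAR(100),
        Land_area    NUMERIC
    )
    """
)

kazakhstan = ("Kazakhstan", "Central Asia", "Central and Southern Asia", 2699700.0)
conn.execute("INSERT INTO Geographic_Location VALUES (?, ?, ?, ?)", kazakhstan)

# A second row with the same Country_name violates the primary key.
try:
    conn.execute("INSERT INTO Geographic_Location VALUES (?, ?, ?, ?)", kazakhstan)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Geographic_Location").fetchone()[0]
print(duplicate_rejected, row_count)
```

The same constraint is why rows copied from `Access_to_Basic_Services` have to be reduced to one per country before they can be inserted.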


In [5]:
%%sql

-- Optional: clear any rows left by an earlier insert attempt so the insert below starts from an empty table
TRUNCATE TABLE Geographic_Location;

-- Insert one row per country by grouping on Country_name
INSERT INTO Geographic_Location (Country_name, Sub_region, Region, Land_area)
SELECT
    Country_name,
    MAX(Sub_region) AS Sub_region,
    MAX(Region) AS Region,
    MAX(Land_area) AS Land_area
FROM
    Access_to_Basic_Services
GROUP BY
    Country_name;


 * mysql+pymysql://root:***@localhost:3306/united_nations
0 rows affected.
182 rows affected.


[]

### Inserting Data into the `Geographic_Location` Table

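`Access_to_Basic_Services` holds one row per country per year, so each `Country_name` appears several times and cannot be copied as-is into a table where `Country_name` is the primary key. Collapsing the rows to one per country resolved this duplicate country issue, and the insert succeeded with clean, unique geographic data.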

#### What Happened
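1. **`TRUNCATE TABLE`**  
   This command removed all existing records from the `Geographic_Location` table, so the insert started from an empty table and could not collide with rows left by an earlier attempt.  
   In MySQL, `TRUNCATE TABLE` commits implicitly and cannot be rolled back, so use it only when the existing rows are disposable.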

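2. **`INSERT INTO ... SELECT ... GROUP BY`**  
   We inserted data from the `Access_to_Basic_Services` table, grouped by `Country_name`.  
   The `GROUP BY` clause ensured that each country appeared only once.  
   Columns outside the `GROUP BY` generally have to be aggregated, so we wrapped `Sub_region`, `Region`, and `Land_area` in `MAX()` to reduce each country's repeated values to one. `MAX()` is evaluated per column, so this is safe only when these values are identical in every row for a given country; otherwise the result could combine values from different rows.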

3. **Result**  
   - The command inserted **182 unique country records** into the table.  
   - The message `0 rows affected` refers to the `TRUNCATE TABLE` statement, while  
     `182 rows affected` confirms that the new rows were inserted.  
   - The empty brackets `[]` show that the cell returned no result set, which is expected for an `INSERT` statement.
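
To make the `GROUP BY` and `MAX()` behaviour concrete, here is a small sketch using Python's built-in `sqlite3` as a stand-in for MySQL. The Kazakhstan rows come from the query output earlier in the notebook; the Anguilla rows and their years are illustrative assumptions, with `None` standing in for a missing land area. The second query is a hypothetical sanity check, not part of the original notebook, that lists any country whose attributes differ between rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
conn.execute(
    "CREATE TABLE Access_to_Basic_Services ("
    "Country_name TEXT, Sub_region TEXT, Region TEXT, Time_period INTEGER, Land_area REAL)"
)
conn.executemany(
    "INSERT INTO Access_to_Basic_Services VALUES (?, ?, ?, ?, ?)",
    [
        ("Kazakhstan", "Central Asia", "Central and Southern Asia", 2015, 2699700.0),
        ("Kazakhstan", "Central Asia", "Central and Southern Asia", 2016, 2699700.0),
        # Illustrative rows: the years are assumed, Land_area is missing.
        ("Anguilla", "Caribbean", "Latin America and the Caribbean", 2015, None),
        ("Anguilla", "Caribbean", "Latin America and the Caribbean", 2016, None),
    ],
)

# One row per country; MAX() ignores NULLs and returns NULL if every value is NULL.
collapsed = conn.execute(
    """
    SELECT Country_name, MAX(Sub_region), MAX(Region), MAX(Land_area)
    FROM Access_to_Basic_Services
    GROUP BY Country_name
    ORDER BY Country_name
    """
).fetchall()

# Countries whose attributes vary across rows; MAX() would be unsafe for these.
inconsistent = conn.execute(
    """
    SELECT Country_name
    FROM Access_to_Basic_Services
    GROUP BY Country_name
    HAVING COUNT(DISTINCT Sub_region) > 1
        OR COUNT(DISTINCT Region) > 1
        OR COUNT(DISTINCT Land_area) > 1
    """
).fetchall()

print(collapsed)
print(inconsistent)
```

An empty result from the second query suggests that `MAX()` is only removing repetition rather than choosing between conflicting values.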

Next, we’ll verify that the data has been inserted correctly by selecting the first few rows from the `Geographic_Location` table.


In [6]:
%%sql

SELECT
    *
FROM
    Geographic_Location
LIMIT 5;


 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


| Country_name | Sub_region | Region | Land_area |
| --- | --- | --- | --- |
| Afghanistan | Southern Asia | Central and Southern Asia | 652230.0 |
| Algeria | Northern Africa | Northern Africa and Western Asia | 2381740.0 |
| American Samoa | Polynesia | Oceania | 200.0 |
| Angola | Middle Africa | Sub-Saharan Africa | 1246700.0 |
| Anguilla | Caribbean | Latin America and the Caribbean | |
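
The last row has no `Land_area` value, so a useful follow-up check is to list the countries where that column is missing. The sketch below loads the five rows shown above into an in-memory `sqlite3` table (a stand-in for MySQL) and assumes the empty cell represents `NULL`; the column types are also assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL server
conn.execute(
    "CREATE TABLE Geographic_Location ("
    "Country_name TEXT PRIMARY KEY, Sub_region TEXT, Region TEXT, Land_area REAL)"
)
# The five rows from the output above; None models the empty Land_area cell.
conn.executemany(
    "INSERT INTO Geographic_Location VALUES (?, ?, ?, ?)",
    [
        ("Afghanistan", "Southern Asia", "Central and Southern Asia", 652230.0),
        ("Algeria", "Northern Africa", "Northern Africa and Western Asia", 2381740.0),
        ("American Samoa", "Polynesia", "Oceania", 200.0),
        ("Angola", "Middle Africa", "Sub-Saharan Africa", 1246700.0),
        ("Anguilla", "Caribbean", "Latin America and the Caribbean", None),
    ],
)

total_rows = conn.execute("SELECT COUNT(*) FROM Geographic_Location").fetchone()[0]

# Countries with no recorded land area.
missing_land_area = [
    name
    for (name,) in conn.execute(
        "SELECT Country_name FROM Geographic_Location "
        "WHERE Land_area IS NULL ORDER BY Country_name"
    )
]

print(total_rows, missing_land_area)
```

Against the real table, the same `WHERE Land_area IS NULL` filter can be run in a `%%sql` cell to see every country with a missing value, not just those in the first five rows.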
