# Subquery in WHERE

In this notebook, we take a closer look at subqueries, specifically subqueries within a `WHERE` clause. A powerful aspect of subqueries is that they let us compare individual rows against aggregated data, such as an average computed over a whole table.
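
As a minimal sketch of that idea, the snippet below compares each row with an aggregate computed by a subquery. It uses Python's built-in `sqlite3` module and an in-memory database so that it runs without the MySQL connection; the `gdp` table and its four rows are hypothetical and are not part of the `united_nations` database.

```python
import sqlite3

# Hypothetical toy data, not taken from the united_nations database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gdp (country TEXT, gdp_in_billions REAL)")
conn.executemany(
    "INSERT INTO gdp VALUES (?, ?)",
    [("A", 100.0), ("B", 300.0), ("C", 50.0), ("D", 350.0)],
)

# The inner SELECT returns a single value (the average, 200.0 here);
# the outer WHERE then compares every row against that value.
above_average = conn.execute(
    """
    SELECT country, gdp_in_billions
    FROM gdp
    WHERE gdp_in_billions > (SELECT AVG(gdp_in_billions) FROM gdp)
    ORDER BY country
    """
).fetchall()

print(above_average)  # [('B', 300.0), ('D', 350.0)]
```

The subquery is evaluated once, because it does not depend on the outer row, and its single result acts like a constant in the comparison.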

In [None]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql are installed correctly. 

%load_ext sql

In [None]:
# Establish a connection to the local database using the '%sql' magic command.
# The connection string has the form mysql+pymysql://user:password@host:port/db_name.
# Replace the password and the database name below with your own.
# If you get an error here, make sure the database name and password are correct.

%sql mysql+pymysql://root:Explore2022!@localhost:3306/united_nations



To run a query, we add the `%%sql` cell magic at the start of a cell, leave one blank line, write the query as shown below, and run the cell.

In [None]:
%%sql

SELECT 
    *
FROM
    Access_to_Basic_Services
LIMIT 5;

## Exercise



We want to answer the following question:

For the year 2020, which countries have a GDP above the global average, but still have less than 90% of their population with access to managed drinking water services?

This question will shed light on nations that, despite having a robust economy, may still be facing challenges in providing basic amenities like water.


### Task 1

Start by constructing a query that displays the global average GDP, that is, the average GDP across all countries, for the year 2020.

### Task 2

To answer our question, we need data from both the `Economic_Indicators` and the `Basic_Services` tables, so we need to join them. Using `Country_name` and `Time_period`, join the `Basic_Services` table to the `Economic_Indicators` table.

### Task 3

Using the query created in Task 2, filter the results to display records where:

1. The year is 2020.
2. The GDP is above the global average.
3. Less than 90% of the country's population has access to managed drinking water services.

Hint: Keep in mind that we calculated the global average GDP in Task 1, so that query can serve as the subquery here.

## Solutions

### Task 1

In [None]:
%%sql

SELECT 
    AVG(Est_gdp_in_billions)
FROM 
    Economic_Indicators 
WHERE 
    Time_period = 2020;


### Task 2

In [None]:
%%sql

SELECT 
    econ.Country_name,
    econ.Time_period,
    econ.Est_gdp_in_billions,
    service.Pct_managed_drinking_water_services
FROM 
    Economic_Indicators AS econ
INNER JOIN 
    Basic_Services AS service
ON 
    econ.Country_name = service.Country_name
    AND econ.Time_period = service.Time_period
LIMIT 10;

### Task 3

In [None]:
%%sql

SELECT 
    econ.Country_name,
    econ.Time_period,
    econ.Est_gdp_in_billions,
    service.Pct_managed_drinking_water_services
FROM 
    Economic_Indicators AS econ
INNER JOIN 
    Basic_Services AS service
ON 
    econ.Country_name = service.Country_name
    AND econ.Time_period = service.Time_period
WHERE
    econ.Time_period = 2020
    AND service.Pct_managed_drinking_water_services < 90
    AND econ.Est_gdp_in_billions > (SELECT 
                                        AVG(Est_gdp_in_billions)
                                    FROM 
                                        Economic_Indicators 
                                    WHERE 
                                        Time_period = 2020);
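
One way to check how the subquery behaves is to run it in two steps: compute the average first, then pass it into the outer query as a parameter. The sketch below does this with Python's built-in `sqlite3` module standing in for MySQL. The table and column names are taken from the queries above, but every row is hypothetical and the tables are simplified to the columns used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Economic_Indicators (
        Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions REAL
    );
    CREATE TABLE Basic_Services (
        Country_name TEXT, Time_period INTEGER,
        Pct_managed_drinking_water_services REAL
    );
    -- Hypothetical rows for illustration only.
    INSERT INTO Economic_Indicators VALUES
        ('Alpha', 2020, 400.0), ('Beta', 2020, 500.0),
        ('Gamma', 2020, 60.0), ('Alpha', 2019, 900.0);
    INSERT INTO Basic_Services VALUES
        ('Alpha', 2020, 75.0), ('Beta', 2020, 99.0),
        ('Gamma', 2020, 60.0), ('Alpha', 2019, 70.0);
    """
)

avg_sql = (
    "SELECT AVG(Est_gdp_in_billions) "
    "FROM Economic_Indicators WHERE Time_period = 2020"
)

# Outer query with a placeholder where the GDP threshold goes.
outer_sql = """
    SELECT econ.Country_name
    FROM Economic_Indicators AS econ
    INNER JOIN Basic_Services AS service
        ON econ.Country_name = service.Country_name
        AND econ.Time_period = service.Time_period
    WHERE econ.Time_period = 2020
        AND service.Pct_managed_drinking_water_services < 90
        AND econ.Est_gdp_in_billions > {threshold}
    ORDER BY econ.Country_name
"""

# Step 1: run the inner query on its own (the Task 1 query).
avg_gdp = conn.execute(avg_sql).fetchone()[0]

# Step 2: pass that value into the outer query as a bound parameter.
two_step = conn.execute(outer_sql.format(threshold="?"), (avg_gdp,)).fetchall()

# Single statement: embed the inner query as a subquery instead.
one_step = conn.execute(outer_sql.format(threshold=f"({avg_sql})")).fetchall()

print(avg_gdp, two_step, one_step)
```

In this toy data, both versions return the same rows. The 2019 row for `Alpha` shows why the subquery needs its own `Time_period = 2020` filter: without it, the 2019 value would be included in the average and raise the threshold.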


## Summary

This layered query first calculates the global average GDP for 2020 and then uses that value, together with the other criteria, to filter the joined rows.

Nigeria is the only country that satisfies these criteria. Although its GDP is above the global average, its low access to water is linked, according to the World Bank, to policies that are not aligned with SDG 6, poor governance, and problems with infrastructure, quality, and supply.

Finally, note that queries quickly become complex when we combine JOINs, subqueries, filters, and calculations. The code is already hard to read, and it will be even harder for someone else who works on it later, so we need to keep readability in mind as we build complex queries.
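
One possible way to ease the readability problem is to give the average a name with a common table expression (`WITH ... AS`), so that the main query reads more like the question being asked. This is only a sketch of that option, not the notebook's required approach. It runs against an in-memory SQLite database through Python's `sqlite3` module with hypothetical rows; on MySQL, the `WITH` syntax assumes version 8.0 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Economic_Indicators (
        Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions REAL
    );
    CREATE TABLE Basic_Services (
        Country_name TEXT, Time_period INTEGER,
        Pct_managed_drinking_water_services REAL
    );
    -- Hypothetical rows for illustration only.
    INSERT INTO Economic_Indicators VALUES
        ('Alpha', 2020, 400.0), ('Beta', 2020, 500.0), ('Gamma', 2020, 60.0);
    INSERT INTO Basic_Services VALUES
        ('Alpha', 2020, 75.0), ('Beta', 2020, 99.0), ('Gamma', 2020, 60.0);
    """
)

# The CTE names the aggregate once; the main query then refers to it by name.
readable_sql = """
    WITH avg_gdp_2020 AS (
        SELECT AVG(Est_gdp_in_billions) AS Avg_gdp
        FROM Economic_Indicators
        WHERE Time_period = 2020
    )
    SELECT
        econ.Country_name,
        econ.Est_gdp_in_billions,
        service.Pct_managed_drinking_water_services
    FROM Economic_Indicators AS econ
    INNER JOIN Basic_Services AS service
        ON econ.Country_name = service.Country_name
        AND econ.Time_period = service.Time_period
    WHERE econ.Time_period = 2020
        AND service.Pct_managed_drinking_water_services < 90
        AND econ.Est_gdp_in_billions > (SELECT Avg_gdp FROM avg_gdp_2020)
"""

rows = conn.execute(readable_sql).fetchall()
print(rows)  # [('Alpha', 400.0, 75.0)]
```

The logic is the same as in Task 3; only the position of the average calculation changes, which lets a reader understand it before reaching the filter that uses it.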