# Queries for JOINs

## Overview

In this activity, I practice writing queries that join multiple tables and use aliases to make those queries more readable. To do so, I examine two tables from the World Bank’s International Education dataset and use them to answer some questions.

## Dataset

I will use the BigQuery public dataset called `world_bank_intl_education` with the full path `bigquery-public-data.world_bank_intl_education`. To view all the tables in this public dataset in BigQuery, I use the INFORMATION_SCHEMA.TABLES view as follows:

In [None]:
SELECT *
FROM `bigquery-public-data`.world_bank_intl_education.INFORMATION_SCHEMA.TABLES;

The dataset contains the following tables:

- country_series_definitions
- country_summary
- international_education
- series_summary

## Query: Number of people of the official age for secondary education in 2015 by region

I run the following query, which joins the `international_education` table to the `country_summary` table and returns, for each region, the number of people of the official age for secondary education in 2015. The query keeps only the records:

- where a region has been captured,
- where the `indicator_name` is "Population of the official age for secondary education, both sexes (number)", and
- where the year is 2015.

In [None]:
SELECT
    summary.region,
    edu.year,
    SUM(edu.value) AS secondary_edu_pop
FROM
    `bigquery-public-data.world_bank_intl_education.international_education` AS edu
    --using edu as alias for this table
INNER JOIN
    `bigquery-public-data.world_bank_intl_education.country_summary` AS summary
    --using summary as alias for this table
ON
    edu.country_code = summary.country_code
    --country_code is used as the join key
WHERE
    summary.region IS NOT NULL
    AND edu.indicator_name = 'Population of the official age for secondary education, both sexes (number)'
    AND edu.year = 2015
GROUP BY
    summary.region,
    edu.year
ORDER BY
    secondary_edu_pop DESC;

The query returns the 7 regions of the world, each with its total number of people of the official age for secondary education in 2015, sorted from the largest total to the smallest, as shown below:

![Secondary education population in 2015 by region](c05m03-query-2015-edu-pop.png 'Secondary education population in 2015 by region')
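
To check my understanding of how the inner join, the filters and the aggregation interact, here is a minimal sketch that runs the same query shape against a handful of made-up rows. It uses an in-memory SQLite database through Python's `sqlite3` module as a stand-in for BigQuery. The country codes, region names, values and the shortened indicator name are all assumptions for illustration and do not come from the World Bank dataset.

```python
import sqlite3

# Toy stand-ins for the two BigQuery tables; every row below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE international_education (
        country_code TEXT, indicator_name TEXT, year INTEGER, value REAL);
    CREATE TABLE country_summary (country_code TEXT, region TEXT);

    INSERT INTO country_summary VALUES
        ('AAA', 'Region One'),
        ('BBB', 'Region One'),
        ('CCC', 'Region Two'),
        ('AGG', NULL);  -- hypothetical aggregate entry with no region

    INSERT INTO international_education VALUES
        ('AAA', 'Secondary age population', 2015, 100),
        ('BBB', 'Secondary age population', 2015, 50),
        ('CCC', 'Secondary age population', 2015, 30),
        ('AGG', 'Secondary age population', 2015, 180),  -- region is NULL
        ('ZZZ', 'Secondary age population', 2015, 999),  -- no summary row
        ('AAA', 'Secondary age population', 2014, 90),   -- other year
        ('AAA', 'Some other indicator', 2015, 7);        -- other indicator
""")

# Same shape as the BigQuery query: aliases, inner join, filters, grouping.
query = """
    SELECT summary.region, edu.year, SUM(edu.value) AS secondary_edu_pop
    FROM international_education AS edu
    INNER JOIN country_summary AS summary
        ON edu.country_code = summary.country_code
    WHERE summary.region IS NOT NULL
      AND edu.indicator_name = 'Secondary age population'
      AND edu.year = 2015
    GROUP BY summary.region, edu.year
    ORDER BY secondary_edu_pop DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this toy data only `Region One` (100 + 50) and `Region Two` (30) come back. The `ZZZ` row is dropped by the inner join because it has no match in `country_summary`, the `AGG` row is dropped by the `IS NOT NULL` filter, and the remaining two rows are excluded by the year and indicator filters.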

## Query: Difference in average percentage of male and female population per region in 2015

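Again joining the `international_education` and `country_summary` tables, this time with a left join, I compare the average female and male population percentages per region for the year 2015. Because `international_education` is the left table, the join keeps every education record even when its country code has no match in `country_summary`. Records that end up without a region are then removed by the `HAVING` clause, as follows: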

In [None]:
SELECT
  country.region,
  edu.year,
  --Average percentage female population
  --(no ELSE branch: non-matching rows yield NULL, which AVG ignores)
  ROUND(AVG(
    CASE
      WHEN edu.indicator_name = 'Population, female (% of total)'
      THEN edu.value
    END), 2) AS avg_female_pop,
  --Average percentage male population
  ROUND(AVG(
    CASE
      WHEN edu.indicator_name = 'Population, male (% of total)'
      THEN edu.value
    END), 2) AS avg_male_pop,
  --Calculate gender gap (male average minus female average)
  ROUND(AVG(
    CASE
      WHEN edu.indicator_name = 'Population, male (% of total)'
      THEN edu.value
    END) -
    AVG(
    CASE
      WHEN edu.indicator_name = 'Population, female (% of total)'
      THEN edu.value
    END), 2) AS gender_gap
FROM
  `bigquery-public-data.world_bank_intl_education.international_education` AS edu
LEFT JOIN
  `bigquery-public-data.world_bank_intl_education.country_summary` AS country
ON
  edu.country_code = country.country_code
WHERE
  edu.year = 2015
  AND edu.indicator_name IN ('Population, female (% of total)','Population, male (% of total)')
GROUP BY
  country.region,
  edu.year
HAVING
  country.region IS NOT NULL
--Sorting regions with more males than females first
ORDER BY
  gender_gap DESC;

The query returns a list of the 7 regions of the world with their average female and male population percentages, along with the gender gap between them, as shown below:

![Gender gap in 2015 by region](c05m03-query-2015-gender-gap.png 'Gender gap in 2015 by region')
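
Two details of this query shape are easy to get wrong: what a left join does with unmatched rows, and how the fallback branch of a `CASE` expression affects `AVG`. The following is a minimal sketch of both, again using made-up rows in an in-memory SQLite database through Python's `sqlite3` module rather than BigQuery. The table names `edu` and `country`, the country codes, the values and the short indicator names `female` and `male` are assumptions for illustration only.

```python
import sqlite3

# Toy data: two countries with a region, one country code with no match.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edu (country_code TEXT, indicator_name TEXT, value REAL);
    CREATE TABLE country (country_code TEXT, region TEXT);

    INSERT INTO country VALUES ('AAA', 'Region One'), ('BBB', 'Region One');

    INSERT INTO edu VALUES
        ('AAA', 'female', 52), ('AAA', 'male', 48),
        ('BBB', 'female', 50), ('BBB', 'male', 50),
        ('ZZZ', 'female', 49), ('ZZZ', 'male', 51);  -- no row in country
""")

# The same conditional average written two ways:
#   ELSE 0   -> male rows are counted as zeros in the female average
#   no ELSE  -> male rows become NULL, which AVG skips
query = """
    SELECT
        country.region,
        AVG(CASE WHEN edu.indicator_name = 'female'
                 THEN edu.value ELSE 0 END) AS avg_with_else_0,
        AVG(CASE WHEN edu.indicator_name = 'female'
                 THEN edu.value END) AS avg_without_else
    FROM edu
    LEFT JOIN country ON edu.country_code = country.country_code
    GROUP BY country.region
    {having}
    ORDER BY country.region
"""

# Without HAVING, the unmatched ZZZ rows form a group whose region is NULL.
all_groups = conn.execute(query.format(having="")).fetchall()

# With HAVING, that NULL-region group is removed after aggregation.
named_groups = conn.execute(
    query.format(having="HAVING country.region IS NOT NULL")
).fetchall()

print(all_groups)
print(named_groups)
```

On this toy data the left join keeps the `ZZZ` rows as a group with a `NULL` region (shown as `None` in Python), and the `HAVING` clause removes that group. For `Region One`, the `ELSE 0` version reports 25.5 while the version without `ELSE` reports 51.0, because the zeros contributed by the `male` rows halve the average whenever both indicators pass the `WHERE` filter.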