# Activity: SQL sorting queries

## Overview

For this activity, I am a public health researcher with a state government agency. For my current project, I need to identify the counties in the United States with the most and the fewest births in the 2016-2018 time frame. Additionally, I am curious about the birth trend in upstate New York, specifically in Erie, Niagara, and Chautauqua counties.

## Dataset

I will use the BigQuery public dataset called `sdoh_cdc_wonder_natality` with the full path `bigquery-public-data.sdoh_cdc_wonder_natality`. To view all the tables in this public dataset in BigQuery, I use the INFORMATION_SCHEMA.TABLES view as follows:

In [None]:
SELECT *
FROM `bigquery-public-data.sdoh_cdc_wonder_natality.INFORMATION_SCHEMA.TABLES`;

The dataset contains the following tables:

- `county_natality`
- `county_natality_by_abnormal_conditions`
- `county_natality_by_congenital_abnormalities`
- `county_natality_by_payment`
- `county_natality_by_father_race`
- `county_natality_by_maternal_morbidity`
- `county_natality_by_mother_race`

For my research, I need the number of births (`Births`), the year of birth (`Year`), and the county of residence (`County_of_Residence`). All three columns are in the `county_natality` table, which I will use for the queries in this activity.
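
Before writing the queries, it can help to confirm the column names and types of the chosen table. The following is a minimal sketch, not part of the original activity, that assumes the dataset exposes the standard BigQuery `INFORMATION_SCHEMA.COLUMNS` view alongside `INFORMATION_SCHEMA.TABLES`:

```sql
-- List the columns of county_natality in their defined order.
SELECT
  column_name,
  data_type
FROM
  `bigquery-public-data.sdoh_cdc_wonder_natality.INFORMATION_SCHEMA.COLUMNS`
WHERE
  table_name = 'county_natality'
ORDER BY
  ordinal_position;
```

Checking the type of `Year` here is useful because the queries in this activity apply `EXTRACT(YEAR FROM Year)`, which requires a date-like column.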

## Query: Top 10 U.S. counties with the fewest births

Simply sorting the dataset by the `Births` column and limiting the results to 10 rows would return the 10 rows with the lowest birth counts. However, the focus of this project is to identify the U.S. counties with the fewest births overall, and because a county can have a separate row for each year, a simple sort like that could return the same county multiple times in the top 10. Therefore, I will use a GROUP BY clause to group the results by `County_of_Residence` and obtain the overall birth count for each county.
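
For comparison, the simple sort described above might look like the following sketch. It is shown only to illustrate the problem and is not the query used for the results in this activity:

```sql
-- Naive approach: sorts individual rows, not counties.
-- The same county can appear more than once, once per year.
SELECT
  County_of_Residence,
  Year,
  Births
FROM
  `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
ORDER BY
  Births
LIMIT 10;
```

Because no aggregation is applied, each result row is a single county-year combination rather than a county total.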

I will also wrap multiple CASE expressions in SUM to calculate the number of births for each of the years 2016, 2017, and 2018, alongside the overall total across these years. I used the following SQL query to extract the required data from the dataset:

In [None]:
SELECT
  County_of_Residence,
  -- Overall total across every row for the county
  SUM(Births) AS Total_Births,
  -- Conditional sums: each CASE keeps Births only for the matching year
  SUM(
    CASE
      WHEN EXTRACT(YEAR FROM Year) = 2016 THEN Births
      ELSE 0
    END) AS Births_2016,
  SUM(
    CASE
      WHEN EXTRACT(YEAR FROM Year) = 2017 THEN Births
      ELSE 0
    END) AS Births_2017,
  SUM(
    CASE
      WHEN EXTRACT(YEAR FROM Year) = 2018 THEN Births
      ELSE 0
    END) AS Births_2018
FROM
  `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
GROUP BY
  County_of_Residence
ORDER BY
  Total_Births -- Ascending by default, so the smallest totals come first
LIMIT 10;

The query returns the 10 U.S. counties with the lowest total number of births across 2016, 2017, and 2018. The results of the query are shown below:

![Lowest 10 Births](c05m01-births-lowest-10.png 'Lowest 10 Births')
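
The per-year totals can also be written more compactly. The sketch below swaps each CASE expression for BigQuery's `IF` function. It is an alternative formulation rather than the query that produced the results above, and it assumes, as the original query does, that `Year` is a date-like column that `EXTRACT` accepts:

```sql
SELECT
  County_of_Residence,
  SUM(Births) AS Total_Births,
  -- IF(condition, value_if_true, value_if_false) replaces CASE ... END
  SUM(IF(EXTRACT(YEAR FROM Year) = 2016, Births, 0)) AS Births_2016,
  SUM(IF(EXTRACT(YEAR FROM Year) = 2017, Births, 0)) AS Births_2017,
  SUM(IF(EXTRACT(YEAR FROM Year) = 2018, Births, 0)) AS Births_2018
FROM
  `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
GROUP BY
  County_of_Residence
ORDER BY
  Total_Births
LIMIT 10;
```

Both forms should return the same rows. The CASE version is more portable across SQL dialects, while `IF` is shorter when there is only one condition per column.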

## Query: Top 10 U.S. counties with the most births

To determine the 10 U.S. counties with the highest number of births, I made one change to the previous query: the ORDER BY clause now sorts the results by `Total_Births` in descending order. The modified SQL query is as follows:

In [None]:
SELECT
  County_of_Residence,
  -- Overall total across every row for the county
  SUM(Births) AS Total_Births,
  -- Conditional sums: each CASE keeps Births only for the matching year
  SUM(
    CASE
      WHEN EXTRACT(YEAR FROM Year) = 2016 THEN Births
      ELSE 0
    END) AS Births_2016,
  SUM(
    CASE
      WHEN EXTRACT(YEAR FROM Year) = 2017 THEN Births
      ELSE 0
    END) AS Births_2017,
  SUM(
    CASE
      WHEN EXTRACT(YEAR FROM Year) = 2018 THEN Births
      ELSE 0
    END) AS Births_2018
FROM
  `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
GROUP BY
  County_of_Residence
ORDER BY
  Total_Births DESC -- Modification to change sorting order
LIMIT 10;

The query returns the 10 U.S. counties with the highest total number of births across 2016, 2017, and 2018:

![Highest 10 Births](c05m01-births-highest-10.png 'Highest 10 Births')
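
One caveat: `Total_Births` sums every row for a county, so it matches the sum of the three yearly columns only if the table holds no other years. The following hedged variant, which assumes the same date-like `Year` column, filters the years explicitly and adds `County_of_Residence` as a secondary sort key so that counties with equal totals come back in a stable order:

```sql
SELECT
  County_of_Residence,
  SUM(Births) AS Total_Births
FROM
  `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
WHERE
  -- Restrict the total to the project's time frame
  EXTRACT(YEAR FROM Year) BETWEEN 2016 AND 2018
GROUP BY
  County_of_Residence
ORDER BY
  Total_Births DESC,    -- Primary key: largest totals first
  County_of_Residence   -- Secondary key: breaks ties alphabetically
LIMIT 10;
```

Without a secondary key, the order of tied rows is not guaranteed, which matters when a tie falls on the boundary of the LIMIT.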

## Query: Birth trend for Erie, Niagara, and Chautauqua counties

This query retrieves every column of the table for Erie, Niagara, and Chautauqua counties in New York and sorts the rows first by county and then chronologically by year:

In [None]:
SELECT
  *
FROM
  `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
WHERE
  County_of_Residence IN ('Erie County, NY','Niagara County, NY','Chautauqua County, NY')
ORDER BY
  County_of_Residence, 
  Year;

The query returns the number of births in Erie, Niagara, and Chautauqua counties for each of the years 2016, 2017, and 2018:

![NY County Births](c05m01-births-ny-counties.png 'NY County Births')
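
To make the trend easier to read, the year-over-year change can be computed directly in SQL. The following is a minimal sketch using the `LAG` window function. It assumes the table holds one row per county per year, and the aliases `Birth_Year` and `Change_From_Previous_Year` are illustrative names that do not come from the dataset:

```sql
SELECT
  County_of_Residence,
  EXTRACT(YEAR FROM Year) AS Birth_Year,
  Births,
  -- Difference from the same county's previous year.
  -- NULL for the first year, which has no earlier row to compare with.
  Births - LAG(Births) OVER (
    PARTITION BY County_of_Residence
    ORDER BY Year
  ) AS Change_From_Previous_Year
FROM
  `bigquery-public-data.sdoh_cdc_wonder_natality.county_natality`
WHERE
  County_of_Residence IN ('Erie County, NY', 'Niagara County, NY', 'Chautauqua County, NY')
ORDER BY
  County_of_Residence,
  Year;
```

A negative value in `Change_From_Previous_Year` indicates fewer births than in the preceding year for that county.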