# Use subqueries

## Scenario

I am a junior data analyst for a multinational food and beverage manufacturer. My team is responsible for maintaining the safety of a wide array of food products. Because of the overwhelming number of products on the market, I have been asked to prioritize which products need to be reviewed by our stakeholders.

While it's useful to know which food industries receive the most complaints, it is more critical to identify the complaints that lead to severe health consequences, such as hospital visits. To this end, I will analyze food event reports to support targeted health interventions.

## Dataset

For the analysis, I will use the BigQuery public dataset `fda_food` with the full path `bigquery-public-data.fda_food`, which contains the following tables:

- food_enforcement
- food_events

## Query: Number of hospitalizations in the 10 industries with the most complaints

Using the `food_events` table, I execute the following query to determine the 10 industries with the most complaints and how many of these complaints led to hospitalizations:

In [None]:
SELECT
  products_industry_name,
  COUNT(report_number) AS count_hospitalizations
FROM
  `bigquery-public-data.fda_food.food_events`
/* Keep only reports from the 10 industries with the most
   complaints whose outcome includes a hospitalization */
WHERE products_industry_name IN
  (
    -- Subquery to determine 10 industries with most complaints
    SELECT
      products_industry_name
    FROM
      `bigquery-public-data.fda_food.food_events`
    GROUP BY
      products_industry_name
    ORDER BY
      COUNT(report_number) DESC
    LIMIT 10
  )
AND outcomes LIKE '%Hospitalization%'
GROUP BY
  products_industry_name
ORDER BY
  count_hospitalizations DESC;

The result is a table listing the 10 industries with the most complaints and their number of hospitalizations, sorted in descending order, as shown below:

![Hospitalization for industries with most complaints](c05m03-query-most-reports-hospitalizations.png 'Hospitalization for industries with most complaints')
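
As a quick sanity check, I could also run the inner subquery on its own to see which 10 industries it selects and how many reports each one has. The following is a minimal sketch of that idea rather than part of the original analysis; the alias `total_reports` is my own choice for the sketch.

```sql
-- Inner subquery run on its own, with the report count made visible
SELECT
  products_industry_name,
  COUNT(report_number) AS total_reports  -- alias chosen for this sketch
FROM
  `bigquery-public-data.fda_food.food_events`
GROUP BY
  products_industry_name
ORDER BY
  total_reports DESC
LIMIT 10;
```

The industry names returned here should be the same set that the `IN` filter in the main query uses, which makes it easier to confirm that the outer query is counting hospitalizations for the intended industries.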

## Query: The 10 industries with the highest percentage of hospitalizations

Because some industries have higher overall report volumes than others, simply looking at the number of hospitalizations can be misleading. To identify the industries with the highest likelihood of hospitalization, I execute the following query to determine the percentage of reports in each industry that had an outcome of hospitalization:

In [None]:
SELECT
  all_reports.products_industry_name,
  all_reports.total_reports,
  hosp_reports.hospitalizations,
  ROUND((hosp_reports.hospitalizations / all_reports.total_reports) * 100, 2) AS percentage_hospitalizations
FROM
  /* Subquery "all_reports" returns helper table with total reports
     for each industry to enable efficient reuse */
  (
    SELECT
      products_industry_name,
      COUNT(report_number) AS total_reports
    FROM
      `bigquery-public-data.fda_food.food_events`
    GROUP BY
      products_industry_name
  )
  AS all_reports
JOIN
  /* Subquery "hosp_reports" returns helper table with number of
     hospitalizations for each industry to enable efficient reuse */
  (
    SELECT
      products_industry_name,
      COUNT(report_number) AS hospitalizations
    FROM
      `bigquery-public-data.fda_food.food_events`
    WHERE
      outcomes LIKE '%Hospitalization%'
    GROUP BY
      products_industry_name
  )
  AS hosp_reports
ON
  all_reports.products_industry_name = hosp_reports.products_industry_name
ORDER BY
  percentage_hospitalizations DESC
LIMIT 10;

Although **Vit/Min/Prot/Unconv Diet(Human/Animal)** is the industry with the most reports, and likely the highest number of hospitalizations overall, its hospitalization percentage of 26.24% makes it only the seventh most likely industry to result in a hospitalization. The industry with the highest likelihood of a hospitalization is **Liquid Concentrate Formula**, with a hospitalization percentage of 60%.

![Highest percentage of hospitalizations](c05m03-query-highest-percentage.png 'Highest percentage of hospitalizations')
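
To cross-check these percentages, I could compute the same figures in a single pass over the table instead of joining two subqueries. The sketch below is an alternative I am assuming gives matching results for the top 10, not the method used above. It relies on BigQuery's `COUNTIF` and `SAFE_DIVIDE` functions and assumes `report_number` is never NULL, so that counting rows and counting report numbers agree.

```sql
-- Single-pass cross-check of the hospitalization percentages
SELECT
  products_industry_name,
  COUNT(report_number) AS total_reports,
  -- Counts rows whose outcomes mention a hospitalization
  COUNTIF(outcomes LIKE '%Hospitalization%') AS hospitalizations,
  ROUND(
    SAFE_DIVIDE(
      COUNTIF(outcomes LIKE '%Hospitalization%'),
      COUNT(report_number)
    ) * 100, 2) AS percentage_hospitalizations
FROM
  `bigquery-public-data.fda_food.food_events`
GROUP BY
  products_industry_name
ORDER BY
  percentage_hospitalizations DESC
LIMIT 10;
```

Unlike the join, this version keeps industries with zero hospitalizations and any rows where the industry name is NULL, so the two queries can differ in those edge cases.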

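Very low report volume in some industries can further skew my results: with only a handful of reports, a few hospitalizations are enough to produce a very high percentage. It may therefore be necessary to add more filters to this query, such as a minimum number of total reports per industry, for more accurate results.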
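
One possible way to add such a filter is a `HAVING` clause in the `all_reports` subquery, so that low-volume industries are dropped before the percentage is calculated. The query below is a sketch under that assumption; the cutoff of 100 reports is a hypothetical value chosen for illustration, not a threshold taken from the analysis.

```sql
SELECT
  all_reports.products_industry_name,
  all_reports.total_reports,
  hosp_reports.hospitalizations,
  ROUND((hosp_reports.hospitalizations / all_reports.total_reports) * 100, 2) AS percentage_hospitalizations
FROM
  (
    SELECT
      products_industry_name,
      COUNT(report_number) AS total_reports
    FROM
      `bigquery-public-data.fda_food.food_events`
    GROUP BY
      products_industry_name
    -- Hypothetical cutoff: ignore industries with fewer than 100 reports
    HAVING
      COUNT(report_number) >= 100
  )
  AS all_reports
JOIN
  (
    SELECT
      products_industry_name,
      COUNT(report_number) AS hospitalizations
    FROM
      `bigquery-public-data.fda_food.food_events`
    WHERE
      outcomes LIKE '%Hospitalization%'
    GROUP BY
      products_industry_name
  )
  AS hosp_reports
ON
  all_reports.products_industry_name = hosp_reports.products_industry_name
ORDER BY
  percentage_hospitalizations DESC
LIMIT 10;
```

The right cutoff depends on how the report counts are distributed across industries, so it is worth trying a few values and checking how much the ranking changes.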