# Queries within queries

## Overview

In this activity, I explore SQL subqueries and use them to build more complex queries. A subquery is a query nested inside another query, most often in a SELECT, FROM, or WHERE clause. Conceptually, the innermost query runs first, and its results are passed to the outer query.
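As a minimal sketch of the inner-then-outer flow, the query below uses a small inline table instead of the real dataset. The table name `sample_stations` and its three rows are made up for illustration and are not part of `new_york`.

```sql
/* Hypothetical inline data, not taken from the new_york dataset */
WITH sample_stations AS (
  SELECT 'A' AS station_id, 10 AS num_bikes_available UNION ALL
  SELECT 'B', 20 UNION ALL
  SELECT 'C', 30
)

/* Outer query: keep only stations above the average */
SELECT
  station_id,
  num_bikes_available
FROM sample_stations
WHERE num_bikes_available > (
  /* Inner query: a single value, the average of 10, 20 and 30 */
  SELECT AVG(num_bikes_available)
  FROM sample_stations
);
```

The inner query evaluates to 20, so the outer query keeps only station `C`, which has 30 bikes.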

## Dataset

I use the BigQuery public dataset `new_york`, whose full path is `bigquery-public-data.new_york`. The dataset contains many tables; for this activity, I use only the following two:

- `citibike_stations`
- `citibike_trips`

## Exploring: Use a subquery in a SELECT clause

Using the `citibike_stations` table, I compare the number of bikes available at each station to the overall average number of bikes available across all stations by executing the following query:

```sql
/* Outer query to obtain number of bikes
   available at a particular station */
SELECT
  station_id,
  num_bikes_available,

  /* Subquery (inner query) to obtain
     average number of bikes available */
  (
    SELECT AVG(num_bikes_available)
    FROM `bigquery-public-data.new_york.citibike_stations`
  ) AS avg_num_bikes_available -- Subquery alias

FROM `bigquery-public-data.new_york.citibike_stations`;
```

The query returns a table containing the station ID, the number of bikes available at each station, and the overall average number of bikes available across all stations, as shown below:

![Number of bikes at each station](c05m03-query-select.png 'Number of bikes at each station')
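To make the comparison explicit, the same scalar subquery can be subtracted from each station's count. This is a minimal sketch that goes one step beyond the query above; the alias `diff_from_avg` is a name I chose for illustration.

```sql
SELECT
  station_id,
  num_bikes_available,

  /* Positive values are above the overall average,
     negative values are below it */
  num_bikes_available - (
    SELECT AVG(num_bikes_available)
    FROM `bigquery-public-data.new_york.citibike_stations`
  ) AS diff_from_avg -- Hypothetical alias

FROM `bigquery-public-data.new_york.citibike_stations`;
```

Because the subquery returns a single value, it can be used anywhere a scalar expression is allowed in the SELECT list.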

## Exploring: Use a subquery in a FROM clause

Using both the `citibike_trips` and `citibike_stations` tables, I execute the following query to determine the total number of rides started at each station:

```sql
/* Outer query to obtain station id, station name from
   citibike_stations table and number of rides started
   at each station from citibike_trips table */
SELECT
  station_id,
  name,
  number_of_rides AS number_of_rides_starting_at_station
FROM
  (
    /* Subquery to obtain number of rides grouped
       by start station id from citibike_trips table */
    SELECT
      /* Type conflict:
         start_station_id = integer
         station_id = string */
      CAST(start_station_id AS STRING) AS start_station_id_str,
      COUNT(*) AS number_of_rides
    FROM
      `bigquery-public-data.new_york.citibike_trips`
    GROUP BY
      CAST(start_station_id AS STRING) -- Type conflict
  )
  AS station_num_trips -- Resulting helper table alias

INNER JOIN
  `bigquery-public-data.new_york.citibike_stations`
/* Subquery converted start_station_id to STRING
   using CAST so join key types will match */
ON
  station_id = start_station_id_str

/* Sorted descending so that the most popular stations
   to start rides from will be at the top */
ORDER BY
  number_of_rides DESC;
```

The results are displayed in a table showing the station ID, the station name, and the number of rides that started at each station. A preview of the output is shown below:

![Number of rides from each station](c05m03-query-from.png 'Number of rides from each station')
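For comparison, the same result can be written with a common table expression (a `WITH` clause) in place of the subquery in FROM. This is an alternative formulation, not the approach used in this activity, and the table aliases `trips` and `stations` are names I introduced for illustration.

```sql
/* Helper table defined up front instead of inline in FROM */
WITH station_num_trips AS (
  SELECT
    /* start_station_id is an integer, station_id is a string */
    CAST(start_station_id AS STRING) AS start_station_id_str,
    COUNT(*) AS number_of_rides
  FROM `bigquery-public-data.new_york.citibike_trips`
  GROUP BY CAST(start_station_id AS STRING)
)

SELECT
  stations.station_id,
  stations.name,
  trips.number_of_rides AS number_of_rides_starting_at_station
FROM station_num_trips AS trips -- Hypothetical alias
INNER JOIN `bigquery-public-data.new_york.citibike_stations` AS stations -- Hypothetical alias
  ON stations.station_id = trips.start_station_id_str
ORDER BY trips.number_of_rides DESC;
```

Naming the helper table first keeps the main query short, and qualifying each column with a table alias shows which table it comes from.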

## Exploring: Use a subquery in a WHERE clause