# Kaggle - Intro to SQL & Advanced SQL

https://www.kaggle.com/learn/intro-to-sql  (Chapters 1-6)  

https://www.kaggle.com/learn/advanced-sql  (Chapters 7-10)

## Table of Contents
- [1. Getting Started With SQL and BigQuery](#1.-Getting-Started-With-SQL-and-BigQuery)
    - [1.1 Introduction](#1.1-Introduction)
    - [1.2 First BigQuery commands](#1.2-First-BigQuery-commands)
    - [1.3 Table schema](#1.3-Table-schema)
- [2. Select, From & Where](#2.-Select,-From-&-Where)
    - [2.1 Introduction](#2.1-Introduction)
    - [2.2 SELECT ... FROM](#2.2-SELECT-...-FROM)
    - [2.3 WHERE ...](#2.3-WHERE-...)
    - [2.4 Submitting the query to the dataset](#2.4-Submitting-the-query-to-the-dataset)
- [3. Group by, Having & Count](#3.-Group-by,-Having-&-Count)
    - [3.1 COUNT()](#3.1-COUNT())
    - [3.2 GROUP BY](#3.2-GROUP-BY)
    - [3.3 GROUP BY ... HAVING](#3.3-GROUP-BY-...-HAVING)
    - [3.4 Aliasing and other improvements](#3.4-Aliasing-and-other-improvements)
    - [3.5 Note on using GROUP BY](#3.5-Note-on-using-GROUP-BY)
- [4. Order by](#4.-Order-by)
    - [4.1 ORDER BY](#4.1-ORDER-BY)
    - [4.2 Dates](#4.2-Dates)
    - [4.3 EXTRACT](#4.3-EXTRACT)
- [5. As & With](#5.-As-&-With)
    - [5.1 Introduction](#5.1-Introduction)
    - [5.2 AS](#5.2-AS)
    - [5.3 WITH ... AS](#5.3-WITH-...-AS)
    - [5.4 Example](#5.4-Example)
- [6. Joining Data](#6.-Joining-Data)
    - [6.1 Example](#6.1-Example)
    - [6.2 JOIN](#6.2-JOIN)
- [7. Combine information from multiple tables.](#7.-Combine-information-from-multiple-tables.)
    - [7.1 JOINs](#7.1-JOINs)
    - [7.2 UNIONs](#7.2-UNIONs)
    - [7.3 Example](#7.3-Example)
    - [7.4 Exercise](#7.4-Exercise)


## 1. Getting Started With SQL and BigQuery

Learn the workflow for handling big datasets with BigQuery and SQL

### 1.1 Introduction
__Structured Query Language__ (SQL) is the programming language used with databases, and it is an important skill for any data scientist. In this course, you'll build your SQL skills using __BigQuery__, a web service that lets you apply SQL to huge datasets.

### 1.2 First BigQuery commands

Import the BigQuery Python package:
> ```python
> from google.cloud import bigquery
> ```

The first step in the workflow is to create a `Client` object. 
> ```python
> # Create a "Client" object
> client = bigquery.Client()
> ```

We'll work with a dataset of posts on [Hacker News](https://news.ycombinator.com/), a website focusing on computer science and cybersecurity news.

In BigQuery, each dataset is contained in a corresponding project. In this case, our `hacker_news` dataset is contained in the `bigquery-public-data` project. To access the dataset,
- We begin by constructing a reference to the dataset with the `dataset()` method.
- Next, we use the `get_dataset()` method, along with the reference we just constructed, to fetch the dataset.

> ```python
> # Construct a reference to the "hacker_news" dataset
> dataset_ref = client.dataset("hacker_news", project="bigquery-public-data")  
>
> # API request - fetch the dataset
> dataset = client.get_dataset(dataset_ref)
> ```

Every dataset is just a collection of tables. You can think of a dataset as a spreadsheet file containing multiple tables, all composed of rows and columns.  
We use the `list_tables()` method to list the tables in the dataset.

> ```python
> # List all the tables in the "hacker_news" dataset
> tables = list(client.list_tables(dataset))
> 
> # Print names of all tables in the dataset (there are four!)
> for table in tables:  
>     print(table.table_id)
> ```

<p style="background:black">
<code style="background:black;color:white">comments
full
full_201510
stories
</code>
</p>

Similar to how we fetched a dataset, we can fetch a table. In the code cell below, we fetch the `full` table in the `hacker_news` dataset.

> ```python
> # Construct a reference to the "full" table
> table_ref = dataset_ref.table("full")
> 
> # API request - fetch the table
> table = client.get_table(table_ref)
> ```

<!-- ![BigQuery](img/bigquery.png "BigQuery") -->
<img src="img/bigquery.png" alt="Drawing" title="BigQuery" style="width: 800px;"/>

### 1.3 Table schema
The structure of a table is called its **schema**. We need to understand a table's schema to effectively pull out the data we want.

> ```python
> # Print information on all the columns in the "full" table in the "hacker_news" dataset
> table.schema
> ```

<p style="background:black">
<code style="background:black;color:white">[SchemaField('title', 'STRING', 'NULLABLE', 'Story title', (), None),
 SchemaField('url', 'STRING', 'NULLABLE', 'Story url', (), None),
 ...,
 SchemaField('deleted', 'BOOLEAN', 'NULLABLE', 'Is deleted?', (), None)]
</code>
</p>

Each `SchemaField` tells us about a specific column (which we also refer to as a field). In order, the information is:
- **name** of the column
- **field type** (or **datatype**) in the column
- **mode** of the column ('NULLABLE' means that a column allows NULL values, and is the default)
- **description** of the data in that column

For example, the `by` field has the SchemaField:
<p style="background:black">
<code style="background:black;color:white">SchemaField('by', 'string', 'NULLABLE', "The username of the item's author.",())
</code>
</p>

This tells us that:
- the field (or column) is called `by`,
- the data in this field is strings,
- NULL values are allowed, and
- it contains the usernames corresponding to each item's author.

We can use the `list_rows()` method to check just the first five lines of the `full` table to make sure this is right. (Sometimes databases have outdated descriptions, so it's good to check.) This returns a BigQuery `RowIterator` object that can quickly be converted to a pandas DataFrame with the `to_dataframe()` method.

> ```python
> # Preview the first five lines of the "full" table
> client.list_rows(table, max_results=5).to_dataframe()
> ```

The `list_rows()` method also lets us look at just the information in a specific column. For example, to see the first five entries in the `by` column:

> ```python
> # Preview the first five entries in the "by" column of the "full" table
> client.list_rows(table, selected_fields=[f for f in table.schema if f.name == "by"], max_results=5).to_dataframe()
> ```

## 2. Select, From & Where

The foundational components for all SQL queries

### 2.1 Introduction

We'll work with a small imaginary dataset, `pet_records`, which contains just one table, called `pets`.

| ID | Name | Animal| 
| --- | --- | --- | 
| 1 | Dr. Harris Bonkers | Rabbit | 
| 2 | Moon | Dog|
| 3 | Ripley | Cat |
| 4 | Tom | Cat |

### 2.2 SELECT ... FROM

Select a single column:
- `SELECT` specifies the column
- `FROM` specifies the table

> ```sql
> SELECT Name
> FROM `bigquery-public-data.pet_records.pets`
> ```

Select multiple columns:
> ```sql
> SELECT Name, Animal
> FROM `bigquery-public-data.pet_records.pets`
> ```

Select all columns:
> ```sql
> SELECT *
> FROM `bigquery-public-data.pet_records.pets`
> ```

<div class="alert alert-block alert-info"><b>Note:</b> when writing an SQL query, the argument we pass to FROM is not in single or double quotation marks (' or "). It is in backticks (`).</div>

### 2.3 WHERE ...

Return only the rows meeting specific conditions using the `WHERE` clause.

> ```sql
> SELECT Name
> FROM `bigquery-public-data.pet_records.pets`
> WHERE Animal='Cat'
> ```
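To try these clauses without a BigQuery account, the sketch below rebuilds the imaginary `pets` table in an in-memory SQLite database using Python's built-in `sqlite3` module. SQLite is only a stand-in for BigQuery here (an assumption made for illustration), so the table is referenced by its bare name rather than by a backticked `project.dataset.table` path.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        (1, "Dr. Harris Bonkers", "Rabbit"),
        (2, "Moon", "Dog"),
        (3, "Ripley", "Cat"),
        (4, "Tom", "Cat"),
    ],
)

# SELECT picks the column, FROM picks the table, WHERE filters the rows.
# The names are sorted in Python because the query itself imposes no order.
cat_names = sorted(
    row[0] for row in conn.execute("SELECT Name FROM pets WHERE Animal = 'Cat'")
)
print(cat_names)  # ['Ripley', 'Tom']
```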

### 2.4 Submitting the query to the dataset

> ```python
> query = """
>         SELECT Name
>         FROM `bigquery-public-data.pet_records.pets`
>         WHERE Animal='Cat'
>         """
> # Create a "Client" object
> client = bigquery.Client()
> # Only run the query if it's less than 1 MB
> safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=1000000)
> # Set up the query (will only run if it's less than 1 MB)
> safe_query_job = client.query(query, job_config=safe_config)
> # API request - try to run the query, and return a pandas DataFrame
> safe_query_job.to_dataframe()
> ```

## 3. Group by, Having & Count

Get more interesting insights directly from your SQL queries

### 3.1 COUNT()

`COUNT()` returns a count of things. If you pass it the name of a column, it will return the number of entries in that column.

> ```sql
> SELECT COUNT(ID)
> FROM `bigquery-public-data.pet_records.pets`
> ```

| f0_ |
| --- | 
| 4 | 


`COUNT()` is an example of an **aggregate function**, which takes many values and returns one.  
Other examples: `SUM()`, `AVG()`, `MIN()`, `MAX()`.

### 3.2 GROUP BY

`GROUP BY` takes the name of one or more columns, and treats all rows with the same value in that column as a single group when you apply aggregate functions like `COUNT()`.

How many of each type of animal are in the `pets` table? We can use `GROUP BY` to group together rows that have the same value in the `Animal` column, while using `COUNT()` to find out how many IDs we have in each group.

> ```sql
> SELECT Animal, COUNT(ID)
> FROM `bigquery-public-data.pet_records.pets`
> GROUP BY Animal
> ```

| Animal | f0_ |
| ------ | --- |  
| Rabbit | 1 |
| Dog | 1 |
| Cat | 2 |

### 3.3 GROUP BY ... HAVING

`HAVING` is used in combination with `GROUP BY` to ignore groups that don't meet certain criteria.

The query below only includes groups that have more than one ID in them:

> ```sql
> SELECT Animal, COUNT(ID)
> FROM `bigquery-public-data.pet_records.pets`
> GROUP BY Animal
> HAVING COUNT(ID)>1
> ```

| Animal | f0_ |
| ------ | --- |  
| Cat | 2 |

### 3.4 Aliasing and other improvements

- Aliasing: adding `AS NewName` after the aggregation replaces the default column name `f0_`.
- If you are ever unsure what to put inside the `COUNT()` function, you can use `COUNT(1)` to count the rows in each group. Most people find it especially readable, because it's clear that it isn't focusing on other columns. It also scans less data than if you supply column names (making it faster and using less of your data access quota).

> ```sql
> SELECT Animal, COUNT(1) AS NumPets
> FROM `bigquery-public-data.pet_records.pets`
> GROUP BY Animal
> HAVING COUNT(1)>1
> ```

| Animal | NumPets |
| ------ | --- |  
| Cat | 2 |
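The grouping, filtering and aliasing steps can be reproduced locally. The sketch below again uses an in-memory SQLite database as a stand-in for BigQuery (an assumption for illustration), and the variable names `counts` and `multiples` are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        (1, "Dr. Harris Bonkers", "Rabbit"),
        (2, "Moon", "Dog"),
        (3, "Ripley", "Cat"),
        (4, "Tom", "Cat"),
    ],
)

# One output row per Animal value; COUNT(1) counts the rows in each group
counts = dict(conn.execute(
    "SELECT Animal, COUNT(1) AS NumPets FROM pets GROUP BY Animal"
))

# HAVING filters whole groups after aggregation (WHERE filters rows before it)
multiples = dict(conn.execute(
    "SELECT Animal, COUNT(1) AS NumPets FROM pets "
    "GROUP BY Animal HAVING COUNT(1) > 1"
))

print(counts)     # one entry per animal type
print(multiples)  # only the groups with more than one pet
```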

### 3.5 Note on using GROUP BY

Because `GROUP BY` tells SQL how to apply aggregate functions (like `COUNT()`), it doesn't make sense to use `GROUP BY` without an aggregate function. Similarly, if a query has a `GROUP BY` clause, then every column you select must be passed to either
1. a `GROUP BY` command, or
2. an aggregation function.

The query below breaks this rule, because `ID` is neither grouped nor aggregated, so it returns an error:

> ```sql
> SELECT ID, Animal, COUNT(1) AS NumPets
> FROM `bigquery-public-data.pet_records.pets`
> GROUP BY Animal
> HAVING COUNT(1)>1
> ```

<p style="background:black">
<code style="background:black;color:red">SELECT list expression references column (column's name) which is neither grouped nor aggregated at
</code>
</p>


## 4. Order by

Order your results to focus on the most important data for your use case.

### 4.1 ORDER BY

`ORDER BY` is usually the <span style="color:blue">**last**</span> clause in your query, and it sorts the results returned by the rest of your query.

> ```sql
> SELECT ID, Name, Animal
> FROM `bigquery-public-data.pet_records.pets`
> ORDER BY ID
> ```

| ID | Name | Animal| 
| --- | --- | --- | 
| 1 | Dr. Harris Bonkers | Rabbit | 
| 2 | Moon | Dog|
| 3 | Ripley | Cat |
| 4 | Tom | Cat |

The `ORDER BY` clause also works for columns containing text, where the results show up in alphabetical order.

> ```sql
> SELECT ID, Name, Animal
> FROM `bigquery-public-data.pet_records.pets`
> ORDER BY Animal
> ```

| ID | Name | Animal| 
| --- | --- | --- | 
| 4 | Tom | Cat |
| 3 | Ripley | Cat |
| 2 | Moon | Dog|
| 1 | Dr. Harris Bonkers | Rabbit | 

You can reverse the order using the `DESC` argument (default `ASC`).

> ```sql
> SELECT ID, Name, Animal
> FROM `bigquery-public-data.pet_records.pets`
> ORDER BY Animal DESC
> ```

| ID | Name | Animal| 
| --- | --- | --- | 
| 1 | Dr. Harris Bonkers | Rabbit | 
| 2 | Moon | Dog|
| 3 | Ripley | Cat |
| 4 | Tom | Cat |
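Sorting by `Animal` alone leaves the order of the two cats unspecified. As a minimal sketch (using in-memory SQLite as a stand-in for BigQuery, which is an assumption for illustration), a second sort key such as `ID` can be listed after the first to break ties. The second key is an addition to these notes, not part of the queries above.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        (1, "Dr. Harris Bonkers", "Rabbit"),
        (2, "Moon", "Dog"),
        (3, "Ripley", "Cat"),
        (4, "Tom", "Cat"),
    ],
)

# Alphabetical by Animal; rows with the same Animal are then ordered by ID
ascending = conn.execute(
    "SELECT ID, Animal FROM pets ORDER BY Animal, ID"
).fetchall()

# DESC applies only to the key it follows: Animal is reversed, ID stays ascending
descending = conn.execute(
    "SELECT ID, Animal FROM pets ORDER BY Animal DESC, ID"
).fetchall()

print(ascending)
print(descending)
```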

### 4.2 Dates

There are two ways that dates can be stored in BigQuery: as a **DATE** or as a **DATETIME**.

The **DATE** format has the year first, then the month, and then the day. It looks like this:  

`YYYY-[M]M-[D]D`
- YYYY: Four-digit year
- [M]M: One or two digit month
- [D]D: One or two digit day

The **DATETIME** format is like the date format ... but with time added at the end.

### 4.3 EXTRACT

Often you'll want to look at part of a date, like the year or the day. You can do this with `EXTRACT`. We'll illustrate this with a table called `pets_with_date`.

| ID | Name | Animal| Date |
| --- | --- | --- | --- |
| 1 | Dr. Harris Bonkers | Rabbit | 2019-04-18 | 
| 2 | Moon | Dog| 2019-05-16 |
| 3 | Ripley | Cat | 2019-01-07 |
| 4 | Tom | Cat | 2019-02-23 |

> ```sql
> SELECT Name, EXTRACT(DAY FROM Date) AS Day
> FROM `bigquery-public-data.pet_records.pets_with_date`
> ```

| Name | Day |
| --- | --- |
| Dr. Harris Bonkers | 18 | 
| Moon | 16 |
| Ripley | 7 |
| Tom | 23 |

[Date and time functions](https://cloud.google.com/bigquery/docs/reference/legacy-sql#datetimefunctions)

> ```sql
> SELECT Name, EXTRACT(WEEK FROM Date) AS Week
> FROM `bigquery-public-data.pet_records.pets_with_date`
> ```

| Name | Week |
| --- | --- |
| Dr. Harris Bonkers | 15 | 
| Moon | 19 |
| Ripley | 1 |
| Tom | 7 |

> ```python
> # Query to find out the number of accidents for each day of the week
> query = """
>         SELECT COUNT(consecutive_number) AS num_accidents,
>                EXTRACT(DAYOFWEEK FROM timestamp_of_crash) AS day_of_week
>         FROM `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015`
>         GROUP BY day_of_week
>         ORDER BY num_accidents DESC
>         """
> ```
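To make the `EXTRACT` parts concrete, the sketch below recomputes them in plain Python for the dates in the `pets_with_date` table. It assumes the BigQuery conventions that `WEEK` starts weeks on Sunday (with days before the first Sunday of the year in week 0) and that `DAYOFWEEK` runs from 1 (Sunday) to 7 (Saturday). The helper names are hypothetical.

```python
from datetime import date

# Dates copied from the pets_with_date table
pets_with_date = {
    "Dr. Harris Bonkers": date(2019, 4, 18),
    "Moon": date(2019, 5, 16),
    "Ripley": date(2019, 1, 7),
    "Tom": date(2019, 2, 23),
}

def extract_day(d):
    """Day of the month, like EXTRACT(DAY FROM ...)."""
    return d.day

def extract_week(d):
    """Week of the year with Sunday as the first day (strftime's %U)."""
    return int(d.strftime("%U"))

def extract_dayofweek(d):
    """1 = Sunday ... 7 = Saturday (isoweekday() is 1 = Monday ... 7 = Sunday)."""
    return d.isoweekday() % 7 + 1

for name, d in pets_with_date.items():
    print(name, extract_day(d), extract_week(d), extract_dayofweek(d))
```

The day and week values agree with the `Day` and `Week` tables above.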

## 5. As & With

Organize your query for better readability. This becomes especially important for complex queries.

### 5.1 Introduction

Use **AS** and **WITH** to tidy up queries and make them easier to read.

| ID | Name | Animal| Years_old |
| --- | --- | --- | --- |
| 1 | Dr. Harris Bonkers | Rabbit | 4.5 | 
| 2 | Moon | Dog| 9.0 |
| 3 | Ripley | Cat | 1.5 |
| 4 | Tom | Cat | 7.8 |

### 5.2 AS

Use **AS** to rename the columns generated by your queries, which is also known as **aliasing**.

### 5.3 WITH ... AS

A **common table expression** (or **CTE**) is a temporary table that you return within your query. CTEs are helpful for splitting your queries into readable chunks, and you can write queries against them.

For instance, you might want to use the pets table to ask questions about older animals in particular. So you can start by creating a CTE which only contains information about animals more than five years old like this:

> ```sql
> WITH Seniors AS
> (
>   SELECT ID, Name
>   FROM `bigquery-public-data.pet_records.pets`
>   WHERE Years_old > 5
> )
> SELECT ID
> FROM Seniors
> ```

You could do this without a CTE, but if this were the first part of a very long query, removing the CTE would make it much harder to follow.

Also, it's important to note that CTEs only exist inside the query where you create them, and you can't reference them in later queries. So, any query that uses a CTE is always broken into two parts: (1) first, we create the CTE, and then (2) we write a query that uses the CTE.
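Both points can be checked locally. The sketch below runs the `Seniors` query against an in-memory SQLite database (a stand-in for BigQuery, assumed only for illustration) and then shows that the CTE is gone once that query has finished.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT, Years_old REAL)"
)
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?, ?)",
    [
        (1, "Dr. Harris Bonkers", "Rabbit", 4.5),
        (2, "Moon", "Dog", 9.0),
        (3, "Ripley", "Cat", 1.5),
        (4, "Tom", "Cat", 7.8),
    ],
)

# Part 1 creates the CTE, part 2 queries it, all within a single statement
query = """
    WITH Seniors AS
    (
      SELECT ID, Name
      FROM pets
      WHERE Years_old > 5
    )
    SELECT ID
    FROM Seniors
"""
senior_ids = sorted(row[0] for row in conn.execute(query))
print(senior_ids)  # [2, 4]

# A later query cannot see the CTE, because it only existed inside the query above
try:
    conn.execute("SELECT ID FROM Seniors")
    cte_persisted = True
except sqlite3.OperationalError:
    cte_persisted = False
print(cte_persisted)  # False
```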

### 5.4 Example

> ```python
> # Query to select the number of transactions per date, sorted by date
> query_with_CTE = """
>                  WITH time AS
>                  (
>                      SELECT DATE(block_timestamp) AS trans_date
>                      FROM `bigquery-public-data.crypto_bitcoin.transactions`
>                  )
>                  SELECT COUNT(1) AS transactions,
>                         trans_date
>                  FROM time
>                  GROUP BY trans_date
>                  ORDER BY trans_date
>                  """
> # Set up the query (cancel the query if it would use too much of
> # your quota, with the limit set to 10 GB)
> safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
> query_job = client.query(query_with_CTE, job_config=safe_config)
> # API request - run the query, and convert the results to a pandas DataFrame
> transactions_by_date = query_job.to_dataframe()
> # Print the first five rows
> transactions_by_date.head()
> ```

> ```python
> # Plot the number of transactions per date
> transactions_by_date.set_index('trans_date').plot()
> ```

Exercise  
Write a query that shows, for each hour of the day in the dataset, the corresponding number of trips and average speed.

> ```python
> speeds_query = """
>                WITH RelevantRides AS
>                (
>                    SELECT EXTRACT(HOUR FROM trip_start_timestamp) AS hour_of_day,
>                           trip_miles,
>                           trip_seconds
>                    FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
>                    WHERE trip_start_timestamp > '2017-01-01' AND
>                          trip_start_timestamp < '2017-07-01' AND
>                          trip_seconds > 0 AND
>                          trip_miles > 0
>                )
>                SELECT hour_of_day,
>                       COUNT(1) AS num_trips,
>                       3600 * SUM(trip_miles) / SUM(trip_seconds) AS avg_mph
>                FROM RelevantRides
>                GROUP BY hour_of_day
>                ORDER BY hour_of_day
>                """
> # Set up the query (cancel the query if it would use too much of
> # your quota)
> safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
> speeds_query_job = client.query(speeds_query, job_config=safe_config)
> # API request - run the query, and return a pandas DataFrame
> speeds_result = speeds_query_job.to_dataframe()
> # View results
> print(speeds_result)
> ```
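The `avg_mph` expression divides total miles by total seconds and multiplies by 3600 to convert to miles per hour. The short sketch below works through that arithmetic on two hypothetical trips (invented for illustration, not taken from the taxi dataset) and compares it with averaging the per-trip speeds, which weights a short trip as heavily as a long one.

```python
# Hypothetical trips as (miles, seconds) pairs, invented for illustration
trips = [(1.0, 360), (10.0, 1200)]

total_miles = sum(miles for miles, _ in trips)
total_seconds = sum(seconds for _, seconds in trips)

# Same formula as avg_mph in the query: total distance over total time, per hour
avg_mph = 3600 * total_miles / total_seconds

# Alternative: compute each trip's speed first, then average the speeds
mean_of_speeds = sum(3600 * miles / seconds for miles, seconds in trips) / len(trips)

print(round(avg_mph, 2))         # 25.38
print(round(mean_of_speeds, 2))  # 20.0
```

The two numbers differ because the first one weights each trip by how long it lasted.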

## 6. Joining Data

Combine data sources. Critical for almost all real-world data problems

### 6.1 Example

Use **JOIN** to create a new table combining information from the pets and owners tables.

`owners` table

| ID | Name | Pet_ID | 
| --- | --- | --- | 
| 1 | Aubrey Little | 1 | 
| 2 | Chett Crawfish | 3 |
| 3 | Jules Spinner | 4 |
| 4 | Magnus | 2 |

`pets` table

| ID | Name | Animal| 
| --- | --- | --- | 
| 1 | Dr. Harris Bonkers | Rabbit | 
| 2 | Moon | Dog|
| 3 | Ripley | Cat |
| 4 | Tom | Cat |

### 6.2 JOIN

Using **JOIN**, we can write a query to create a table with just two columns: the name of the pet and the name of the owner.

> ```sql
> SELECT p.Name AS Pet_Name, o.Name AS Owner_Name
> FROM `bigquery-public-data.pet_records.pets` AS p
> INNER JOIN `bigquery-public-data.pet_records.owners` AS o
> ON p.ID=o.Pet_ID
> ```

We combine information from both tables by matching rows where the `ID` column in the `pets` table matches the `Pet_ID` column in the `owners` table.

In the query, **ON** determines which column in each table to use to combine the tables. Notice that since the `ID` column exists in both tables, we have to clarify which one to use. We use `p.ID` to refer to the `ID` column from the `pets` table, and `o.Pet_ID` refers to the `Pet_ID` column from the `owners` table.

> In general, when you're joining tables, it's a good habit to specify which table each of your columns comes from. That way, you don't have to pull up the schema every time you go back to read the query.

The type of **JOIN** we're using today is called an **INNER JOIN**. That means that a row will only be put in the final output table if the value in the columns you're using to combine them shows up in both the tables you're joining.
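As a minimal sketch, the join can be run on the `pets` and `owners` tables from section 6.1 in an in-memory SQLite database (a stand-in for BigQuery, assumed only for illustration). The `ORDER BY p.ID` clause is an addition that makes the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.execute("CREATE TABLE owners (ID INTEGER, Name TEXT, Pet_ID INTEGER)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        (1, "Dr. Harris Bonkers", "Rabbit"),
        (2, "Moon", "Dog"),
        (3, "Ripley", "Cat"),
        (4, "Tom", "Cat"),
    ],
)
conn.executemany(
    "INSERT INTO owners VALUES (?, ?, ?)",
    [
        (1, "Aubrey Little", 1),
        (2, "Chett Crawfish", 3),
        (3, "Jules Spinner", 4),
        (4, "Magnus", 2),
    ],
)

# Match each pet to the owner whose Pet_ID equals the pet's ID
pairs = conn.execute("""
    SELECT p.Name AS Pet_Name, o.Name AS Owner_Name
    FROM pets AS p
    INNER JOIN owners AS o
        ON p.ID = o.Pet_ID
    ORDER BY p.ID
""").fetchall()

for pet_name, owner_name in pairs:
    print(pet_name, "-", owner_name)
```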

How many files are covered by each type of software license?
<img src="img/join.png" alt="join" title="join" style="width: 800px;"/>

Exercise

> ```python
> from google.cloud import bigquery
> # Create a "Client" object
> client = bigquery.Client()
> # Construct a reference to the "stackoverflow" dataset
> dataset_ref = client.dataset("stackoverflow", project="bigquery-public-data")
> # API request - fetch the dataset
> dataset = client.get_dataset(dataset_ref)
> # Get a list of available tables
> tables = list(client.list_tables(dataset))
> list_of_tables = [table.table_id for table in tables]
> # Print your answer
> print(list_of_tables)
> ```

A **WHERE** clause can limit your results to rows with certain text using the **LIKE** feature.

Use `%` as a "wildcard" for **any number of characters**. The query below selects the rows where the name contains 'ipl':

> ```python
> query = """
>         SELECT *
>         FROM `bigquery-public-data.pet_records.pets`
>         WHERE Name LIKE '%ipl%'
>         """
> ```
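The wildcard pattern can be tried out on the `pets` table locally. The sketch below uses in-memory SQLite as a stand-in for BigQuery (an assumption for illustration) and passes the pattern as a bound parameter instead of pasting it into the SQL text. The helper `names_containing` is hypothetical. One caveat: SQLite's `LIKE` ignores ASCII case by default, which as far as I know differs from BigQuery, so case-sensitive behaviour should be checked there.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        (1, "Dr. Harris Bonkers", "Rabbit"),
        (2, "Moon", "Dog"),
        (3, "Ripley", "Cat"),
        (4, "Tom", "Cat"),
    ],
)

def names_containing(fragment):
    """Return the pet names that contain `fragment` (hypothetical helper)."""
    # '%' on both sides matches any number of characters before and after
    cursor = conn.execute(
        "SELECT Name FROM pets WHERE Name LIKE ? ORDER BY ID",
        (f"%{fragment}%",),
    )
    return [row[0] for row in cursor]

print(names_containing("ipl"))  # ['Ripley']
print(names_containing("o"))    # ['Dr. Harris Bonkers', 'Moon', 'Tom']
```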

The function below wraps a query that returns a single row for each user who answered at least one question with a tag that includes the string `topic`. The results have two columns:
- `user_id` - contains the `owner_user_id` column from the `posts_answers` table
- `number_of_answers` - contains the number of answers the user has written to `topic`-related questions

> ```python
> def expert_finder(topic, client):
>     '''
>     Returns a DataFrame with the user IDs who have written Stack Overflow answers on a topic.
>     Inputs:
>         topic: A string with the topic of interest
>         client: A Client object that specifies the connection to the Stack Overflow dataset
>     Outputs:
>         results: A DataFrame with columns for user_id and number_of_answers.
>     '''
>     # @topic_pattern is a named query parameter, filled in by the job config below
>     # (a plain string would pass the literal text '{topic}' to the query)
>     my_query = """
>                SELECT a.owner_user_id AS user_id, COUNT(1) AS number_of_answers
>                FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
>                INNER JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
>                    ON q.id = a.parent_id
>                WHERE q.tags LIKE @topic_pattern
>                GROUP BY a.owner_user_id
>                """
>     # Set up the query (a real service would have good error handling for
>     # queries that scan too much data)
>     safe_config = bigquery.QueryJobConfig(
>         maximum_bytes_billed=10**10,
>         query_parameters=[
>             bigquery.ScalarQueryParameter("topic_pattern", "STRING", f"%{topic}%")
>         ],
>     )
>     my_query_job = client.query(my_query, job_config=safe_config)
>     # API request - run the query, and return a pandas DataFrame
>     results = my_query_job.to_dataframe()
>     return results
> ```

## 7. Combine information from multiple tables.

### 7.1 JOINs

<img src="img/pets_owners.png" alt="" title="Pets and Owners tables" style="width: 800px;"/>

**LEFT JOIN** returns all rows where the two tables have matching entries, along with all of the rows in the left table (whether there is a match or not).  
**RIGHT JOIN** returns the matching rows, along with all rows in the right table (whether there is a match or not).  
**FULL JOIN** returns all rows from both tables. Note that in general, any row that does not have a match in both tables will have NULL entries for the missing values. You can see this in the image below.

<img src="img/jointypes.png" alt="" title="Different types of JOIN" style="width: 800px;"/>
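The difference between an inner and a left join shows up as soon as one table has rows without a match. The sketch below uses in-memory SQLite as a stand-in for BigQuery and a hypothetical variation of the tables in which one owner has no pet and two pets have no owner (the data is invented for illustration). Only `LEFT JOIN` is shown, because support for `RIGHT JOIN` and `FULL JOIN` depends on the SQLite version.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.execute("CREATE TABLE owners (ID INTEGER, Name TEXT, Pet_ID INTEGER)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        (1, "Dr. Harris Bonkers", "Rabbit"),
        (2, "Moon", "Dog"),
        (3, "Ripley", "Cat"),
        (4, "Tom", "Cat"),
    ],
)
# Hypothetical data: Magnus has no pet, and pets 2 and 4 have no owner
conn.executemany(
    "INSERT INTO owners VALUES (?, ?, ?)",
    [(1, "Aubrey Little", 1), (2, "Chett Crawfish", 3), (3, "Magnus", None)],
)

# INNER JOIN keeps only owners that have a matching pet
inner_rows = conn.execute("""
    SELECT o.Name, p.Name
    FROM owners AS o
    INNER JOIN pets AS p ON p.ID = o.Pet_ID
    ORDER BY o.ID
""").fetchall()

# LEFT JOIN keeps every owner; the pet columns are NULL (None) without a match
left_rows = conn.execute("""
    SELECT o.Name, p.Name
    FROM owners AS o
    LEFT JOIN pets AS p ON p.ID = o.Pet_ID
    ORDER BY o.ID
""").fetchall()

print(inner_rows)
print(left_rows)
```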

### 7.2 UNIONs

- **JOINs** horizontally combine results from different tables. 
- **UNION** vertically concatenates columns. The example query below combines the Age columns from both tables.

<img src="img/union.png" alt="" title="UNION combines Age columns from both tables" style="width: 800px;"/>

Note that with a **UNION**, the data types of both columns must be the same, but the column names can be different.

**UNION ALL** includes duplicate values.  
**UNION DISTINCT** drops duplicate values.
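A minimal sketch of the two variants, using in-memory SQLite as a stand-in for BigQuery. The ages are hypothetical values invented for illustration. Note one dialect difference assumed here: SQLite spells the duplicate-dropping form as plain `UNION`, whereas the BigQuery queries in these notes write `UNION DISTINCT`.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owners (Name TEXT, Age INTEGER)")
conn.execute("CREATE TABLE pets (Name TEXT, Age INTEGER)")
# Hypothetical ages; the value 10 appears in both tables
conn.executemany(
    "INSERT INTO owners VALUES (?, ?)",
    [("Aubrey Little", 20), ("Chett Crawfish", 45), ("Jules Spinner", 10)],
)
conn.executemany(
    "INSERT INTO pets VALUES (?, ?)",
    [("Dr. Harris Bonkers", 9), ("Moon", 10), ("Ripley", 2)],
)

# UNION ALL stacks the two Age columns and keeps duplicates
all_ages = sorted(row[0] for row in conn.execute(
    "SELECT Age FROM pets UNION ALL SELECT Age FROM owners"
))

# Plain UNION in SQLite drops duplicates (UNION DISTINCT in BigQuery)
distinct_ages = sorted(row[0] for row in conn.execute(
    "SELECT Age FROM pets UNION SELECT Age FROM owners"
))

print(all_ages)       # [2, 9, 10, 10, 20, 45]
print(distinct_ages)  # [2, 9, 10, 20, 45]
```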

### 7.3 Example

> ```python
> # Query to select all stories posted on January 1, 2012, with number of comments
> join_query = """
>              WITH c AS
>              (
>              SELECT parent, COUNT(*) AS num_comments
>              FROM `bigquery-public-data.hacker_news.comments`
>              GROUP BY parent
>              )
>              SELECT s.id AS story_id, s.by, s.title, c.num_comments
>              FROM `bigquery-public-data.hacker_news.stories` AS s
>              LEFT JOIN c
>              ON s.id = c.parent
>              WHERE EXTRACT(DATE FROM s.time_ts) = '2012-01-01'
>              ORDER BY c.num_comments DESC
>              """
> # Run the query, and return a pandas DataFrame
> join_result = client.query(join_query).result().to_dataframe()
> join_result.head()
> ```

> ```python
> # Query to select all users who posted stories or comments on January 1, 2014
> union_query = """
>               SELECT c.by
>               FROM `bigquery-public-data.hacker_news.comments` AS c
>               WHERE EXTRACT(DATE FROM c.time_ts) = '2014-01-01'
>               UNION DISTINCT
>               SELECT s.by
>               FROM `bigquery-public-data.hacker_news.stories` AS s
>               WHERE EXTRACT(DATE FROM s.time_ts) = '2014-01-01'
>               """
> # Run the query, and return a pandas DataFrame
> union_result = client.query(union_query).result().to_dataframe()
> union_result.head()
> ```

### 7.4 Exercise

1) How long does it take for questions to receive answers?
> ```python
> correct_query = """
>                 SELECT q.id AS q_id,
>                     MIN(TIMESTAMP_DIFF(a.creation_date, q.creation_date, SECOND)) AS time_to_answer
>                 FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
>                     LEFT JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
>                 ON q.id = a.parent_id
>                 WHERE q.creation_date >= '2018-01-01' AND q.creation_date < '2018-02-01'
>                 GROUP BY q_id
>                 ORDER BY time_to_answer
>                 """
> # Check your answer (q_1 is only defined inside the Kaggle exercise notebook)
> # q_1.check()
> # Run the query, and return a pandas DataFrame
> correct_result = client.query(correct_query).result().to_dataframe()
> print("Percentage of answered questions: %s%%" % \
>       (sum(correct_result["time_to_answer"].notnull()) / len(correct_result) * 100))
> print("Number of questions:", len(correct_result))
> ```

2) Initial questions and answers
> ```sql
> SELECT q.owner_user_id AS owner_user_id,
>     MIN(q.creation_date) AS q_creation_date,
>     MIN(a.creation_date) AS a_creation_date
> FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
>     RIGHT JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
> ON q.owner_user_id = a.owner_user_id
> WHERE q.creation_date >= '2019-01-01' AND q.creation_date < '2019-02-01'
>     AND a.creation_date >= '2019-01-01' AND a.creation_date < '2019-02-01'
> GROUP BY owner_user_id
> ```

3) When did users post their first questions and answers, if ever?
> ```sql
> SELECT u.id AS id,
>     MIN(q.creation_date) AS q_creation_date,
>     MIN(a.creation_date) AS a_creation_date
> FROM `bigquery-public-data.stackoverflow.users` AS u
>     LEFT JOIN `bigquery-public-data.stackoverflow.posts_answers` AS a
>         ON u.id = a.owner_user_id
>     LEFT JOIN `bigquery-public-data.stackoverflow.posts_questions` AS q
>         ON u.id = q.owner_user_id
> WHERE u.creation_date >= '2019-01-01'
>     AND u.creation_date < '2019-02-01'
> GROUP BY id
> ```


4) How many distinct users posted on January 1, 2019? 
> ```sql
> SELECT q.owner_user_id
> FROM `bigquery-public-data.stackoverflow.posts_questions` AS q
> WHERE EXTRACT(DATE FROM q.creation_date) = '2019-01-01'
> UNION DISTINCT
> SELECT a.owner_user_id
> FROM `bigquery-public-data.stackoverflow.posts_answers` AS a
> WHERE EXTRACT(DATE FROM a.creation_date) = '2019-01-01'
> ```

