# Kaggle Intro to SQL (and BigQuery)
- https://www.kaggle.com/learn/intro-to-sql

## 4. Order By
- Order your results to focus on the most important data for your use case.

### Introduction

- SELECT pulls specific columns from a table, and WHERE pulls the specific rows that meet specified criteria.
- With SELECT you can use aggregate functions like COUNT(), SUM(), etc., along with GROUP BY to treat multiple rows as a single group.
- ORDER BY changes the order of your results. A popular use case is ordering results by date.

### ORDER BY

- ORDER BY is usually the last clause in your query, and it sorts the results returned by the rest of the query. Rows are sorted in ascending order by default; add DESC to reverse the order.
``` Python:
query = '''
    SELECT ID, Name, Animal
    FROM `bigquery-public-data.pet_records.pets`
    ORDER BY ID '''

query = '''
    SELECT ID, Name, Animal
    FROM `bigquery-public-data.pet_records.pets`
    ORDER BY Animal DESC '''
```
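To try ORDER BY without a BigQuery account, the sketch below runs the same two queries against a small in-memory SQLite table. SQLite stands in for BigQuery here and the pet rows are made up for illustration; the ORDER BY / DESC syntax shown is standard SQL, but other BigQuery features may behave differently.

```python
import sqlite3

# Hypothetical stand-in for the course's `pets` table (SQLite instead of BigQuery).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pets (ID INTEGER, Name TEXT, Animal TEXT)")
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [(3, "Tom", "Cat"), (1, "Dr. Harris Bonkers", "Rabbit"),
     (2, "Moon", "Dog"), (4, "Ripley", "Fish")],
)

# Ascending order is the default.
by_id = conn.execute("SELECT ID, Name, Animal FROM pets ORDER BY ID").fetchall()

# DESC reverses the order (here: reverse alphabetical by Animal).
by_animal_desc = conn.execute(
    "SELECT ID, Name, Animal FROM pets ORDER BY Animal DESC"
).fetchall()
conn.close()

print([row[0] for row in by_id])           # [1, 2, 3, 4]
print([row[2] for row in by_animal_desc])  # ['Rabbit', 'Fish', 'Dog', 'Cat']
```

Each animal appears once in this toy table on purpose: when the sort column has duplicate values, SQL does not guarantee the order of the tied rows unless you add a second sort column.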

### Dates

- This lesson uses two BigQuery types for storing dates: DATE and DATETIME (BigQuery also has TIMESTAMP and TIME types).
- DATE format: YYYY-[M]M-[D]D.
- DATETIME = DATE format + time added at the end.
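Python's `datetime` module draws the same line between a calendar date and a date plus a time of day, so it offers a quick local way to see the two shapes. This is only an analogy (nothing here touches BigQuery); the sample value is the first crash timestamp from the traffic example later in these notes.

```python
from datetime import date, datetime

# DATE-like value: only the calendar day.
d = date(2015, 3, 28)
# DATETIME-like value: the same day with a time of day added at the end.
dt = datetime(2015, 3, 28, 14, 58)

print(d.isoformat())          # 2015-03-28
print(dt.isoformat(sep=" "))  # 2015-03-28 14:58:00

# Dropping the time part gives back the plain date.
print(dt.date() == d)         # True
```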

### EXTRACT

- Often you'll want to look at part of a date, like the year or the day.
``` Python:
query = '''
    SELECT Name, EXTRACT(DAY from Date) AS Day
    FROM `bigquery-public-data.pet_records.pets_with_date` '''
```
- SQL is very smart about dates, and we can ask for information beyond just extracting part of the cell. For example, this query returns each Name along with the week of the year (0 to 53) for each date in the Date column. Weeks start on Sunday, and dates before the first Sunday of the year fall in week 0:
``` Python:
query = '''
    SELECT Name, EXTRACT(WEEK from Date) AS Week
    FROM `bigquery-public-data.pet_records.pets_with_date` '''
```
- https://cloud.google.com/bigquery/docs/reference/legacy-sql?hl=es-419#datetimefunctions
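To check what EXTRACT(DAY ...) and EXTRACT(WEEK ...) return for a given date, here is a local sketch in plain Python. The helper names `extract_day` and `extract_week` are hypothetical; the assumption is that BigQuery's documented WEEK rule (weeks begin on Sunday, days before the year's first Sunday are week 0) matches the `%U` format code of `strftime`, which is defined the same way.

```python
from datetime import date

def extract_day(d: date) -> int:
    """Hypothetical local analogue of EXTRACT(DAY FROM d): day of the month."""
    return d.day

def extract_week(d: date) -> int:
    """Hypothetical local analogue of EXTRACT(WEEK FROM d).
    %U: Sunday-start weeks; days before the first Sunday are week 0."""
    return int(d.strftime("%U"))

# 2015-01-01 is a Thursday, so the first Sunday of 2015 is January 4.
for d in [date(2015, 1, 1), date(2015, 1, 4), date(2015, 12, 31)]:
    print(d, extract_day(d), extract_week(d))
# 2015-01-01 1 0
# 2015-01-04 4 1
# 2015-12-31 31 52
```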

### Example: ORDER BY and dates
- Which day of the week has the most fatal motor accidents?

- Let's use the US Traffic Fatality Records database, which contains information on traffic accidents in the US where at least one person died.
- We'll investigate the `accident_2015` table.

``` Python:
# Fetch the dataset (stored in the `dataset` variable)
from google.cloud import bigquery

# Create a 'Client' object
client = bigquery.Client('jmproject86385')

# Construct a reference to the 'nhtsa_traffic_fatalities' dataset contained in
# the 'bigquery-public-data' project
dataset_ref = bigquery.DatasetReference('bigquery-public-data', 'nhtsa_traffic_fatalities')

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

# List all the tables in the 'nhtsa_traffic_fatalities' dataset
tables = list(client.list_tables(dataset))
print(len(tables))
# for tbl in tables:
#     print(tbl.table_id)
```

Output (number of tables): `143`


``` Python:
# Construct a reference to the 'accident_2015' table
table_ref = dataset_ref.table('accident_2015')

# API request - fetch the table
table = client.get_table(table_ref)

# Preview the first 5 rows of the 'accident_2015' table
#client.list_rows(table).to_dataframe()
client.list_rows(table, max_results=5).to_dataframe()
```


| | state_number | state_name | consecutive_number | number_of_vehicle_forms_submitted_all | number_of_motor_vehicles_in_transport_mvit | number_of_parked_working_vehicles | number_of_forms_submitted_for_persons_not_in_motor_vehicles | number_of_persons_not_in_motor_vehicles_in_transport_mvit | number_of_persons_in_motor_vehicles_in_transport_mvit | number_of_forms_submitted_for_persons_in_motor_vehicles | ... | minute_of_ems_arrival_at_hospital | related_factors_crash_level_1 | related_factors_crash_level_1_name | related_factors_crash_level_2 | related_factors_crash_level_2_name | related_factors_crash_level_3 | related_factors_crash_level_3_name | number_of_fatalities | number_of_drunk_drivers | timestamp_of_crash |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | Montana | 300019 | 5 | 5 | 0 | 0 | 0 | 7 | 7 | ... | 45 | 0 | | 0 | | 0 | | 1 | 0 | 2015-03-28 14:58:00+00:00 |
| 1 | 39 | Ohio | 390099 | 7 | 7 | 0 | 0 | 0 | 15 | 15 | ... | 24 | 27 | Backup Due to Prior Crash | 0 | | 0 | | 1 | 0 | 2015-02-14 11:19:00+00:00 |
| 2 | 49 | Utah | 490123 | 16 | 16 | 0 | 0 | 0 | 28 | 28 | ... | 99 | 0 | | 0 | | 0 | | 1 | 0 | 2015-04-14 12:24:00+00:00 |
| 3 | 48 | Texas | 481184 | 6 | 5 | 1 | 0 | 5 | 5 | 10 | ... | 99 | 0 | | 0 | | 0 | | 1 | 0 | 2015-05-27 16:40:00+00:00 |
| 4 | 41 | Oregon | 410333 | 11 | 11 | 0 | 0 | 0 | 14 | 14 | ... | 99 | 0 | | 0 | | 0 | | 1 | 0 | 2015-11-17 18:17:00+00:00 |


Let's use the table to determine how the number of accidents varies with the day of the week. Since:
- The `consecutive_number` column contains a unique ID for each accident, and
- the `timestamp_of_crash` column contains the date and time of the accident (the `+00:00` offset in the preview suggests it is stored as a TIMESTAMP rather than a plain DATETIME),
  
we can:
- __EXTRACT__ the day of the week (as `day_of_week`) from the `timestamp_of_crash` col., and
- __GROUP BY__ the day of the week, before we __COUNT__ the `consecutive_number` col. to determine the number of accidents for each day of the week.
- Then we sort the table with an __ORDER BY__ clause, so the days with the most accidents are returned first.
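The EXTRACT → GROUP BY/COUNT → ORDER BY plan above can be mirrored in plain Python to see what each step does. This sketch uses only the five preview rows shown earlier (time zone dropped), so its counts are not the real 2015 totals; `day_of_week` is a hypothetical helper that reproduces BigQuery's DAYOFWEEK numbering (1 = Sunday, ..., 7 = Saturday).

```python
from collections import Counter
from datetime import datetime

# The five preview rows: (consecutive_number, timestamp_of_crash), UTC offset dropped.
crashes = [
    (300019, datetime(2015, 3, 28, 14, 58)),
    (390099, datetime(2015, 2, 14, 11, 19)),
    (490123, datetime(2015, 4, 14, 12, 24)),
    (481184, datetime(2015, 5, 27, 16, 40)),
    (410333, datetime(2015, 11, 17, 18, 17)),
]

def day_of_week(ts: datetime) -> int:
    """Hypothetical analogue of EXTRACT(DAYOFWEEK ...): 1 = Sunday ... 7 = Saturday.
    isoweekday() is Monday = 1 ... Sunday = 7, so shift by one and wrap."""
    return ts.isoweekday() % 7 + 1

# GROUP BY day_of_week + COUNT(consecutive_number).
# COUNT(column) skips NULLs; every row here has an ID, so this is a plain row count.
num_accidents = Counter(day_of_week(ts) for _, ts in crashes)

# ORDER BY num_accidents DESC (ties broken by day number to keep the output stable).
result = sorted(num_accidents.items(), key=lambda kv: (-kv[1], kv[0]))
print(result)  # [(3, 2), (7, 2), (4, 1)]
```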

``` Python:
# Query to find out the number of accidents for each day of the week
query = '''
    SELECT COUNT(consecutive_number) AS num_accidents,
        EXTRACT(DAYOFWEEK from timestamp_of_crash) AS day_of_week
    FROM `bigquery-public-data.nhtsa_traffic_fatalities.accident_2015`
    GROUP BY day_of_week
    ORDER BY num_accidents DESC '''

# Set up the query (cancel the query if it would use too much of
# your quota, with the limit set to 1 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**9)
query_job = client.query(query, job_config=safe_config)

# API request - run the query, and convert the results to a pandas DataFrame
accidents_by_day = query_job.to_dataframe()

# Print the DataFrame
accidents_by_day
```

| | num_accidents | day_of_week |
|---|---|---|
| 0 | 5659 | 7 |
| 1 | 5298 | 1 |
| 2 | 4916 | 6 |
| 3 | 4460 | 5 |
| 4 | 4182 | 4 |
| 5 | 4038 | 2 |
| 6 | 3985 | 3 |
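BigQuery's DAYOFWEEK returns 1 for Sunday through 7 for Saturday, so this result answers the opening question: Saturday (7) had the most fatal accidents in 2015, followed by Sunday (1). The short sketch below relabels the rows with day names; the numbers are copied from the output above, and `DAY_NAMES` is a helper dictionary added here for illustration.

```python
# (num_accidents, day_of_week) pairs copied from the query result above.
rows = [(5659, 7), (5298, 1), (4916, 6), (4460, 5), (4182, 4), (4038, 2), (3985, 3)]

# Hypothetical helper: BigQuery DAYOFWEEK numbering, 1 = Sunday ... 7 = Saturday.
DAY_NAMES = {1: "Sunday", 2: "Monday", 3: "Tuesday", 4: "Wednesday",
             5: "Thursday", 6: "Friday", 7: "Saturday"}

labeled = [(DAY_NAMES[day], num_accidents) for num_accidents, day in rows]
for name, num_accidents in labeled:
    print(f"{name:<10}{num_accidents}")
```

On the DataFrame returned by the query, the same dictionary could be applied with `accidents_by_day['day_of_week'].map(DAY_NAMES)`.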

