In [1]:
%load_ext sql
%sql sqlite://

'Connected: @None'

# SQL Basics

## Creating Tables

We can create SQL tables either from scratch or from existing tables. 

* Each `SELECT` clause specifies the values for one row
* Each `UNION` combines the rows of two `SELECT` statements into one table (duplicate rows are dropped)
* Each `AS` clause gives a name to a column
    * The `AS` clauses only need to appear in the first `SELECT`; the remaining rows reuse those column names.

In [None]:
%%sql
CREATE TABLE [table_name] AS
    SELECT [val1] AS [column1], [val2] AS [column2], ... UNION
    SELECT [val3]             , [val4]             , ... UNION
    SELECT [val5]             , [val6]             , ...;

For example,

In [2]:
%%sql
CREATE TABLE big_game AS
  SELECT 30 AS berkeley, 7 AS stanford, 2002 AS year UNION
  SELECT 28,             16,            2003         UNION
  SELECT 17,             38,            2014;

 * sqlite://
Done.


[]
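The same table can also be built outside the `%%sql` magic. The block below is a minimal sketch using Python's standard `sqlite3` module with an in-memory database; the variable names (`conn`, `column_names`, `rows`, `deduplicated`) are illustrative and not part of the original notebook. It also checks that `UNION` drops duplicate rows.

```python
import sqlite3

# In-memory database, analogous to `%sql sqlite://` in the notebook.
conn = sqlite3.connect(":memory:")

conn.execute("""
    CREATE TABLE big_game AS
      SELECT 30 AS berkeley, 7 AS stanford, 2002 AS year UNION
      SELECT 28,             16,            2003         UNION
      SELECT 17,             38,            2014
""")

# The column names come from the AS clauses in the first SELECT.
cursor = conn.execute("SELECT * FROM big_game ORDER BY year")
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(column_names)
print(rows)

# UNION removes duplicates, so two identical rows collapse into one.
deduplicated = conn.execute("SELECT 1 AS x UNION SELECT 1").fetchall()
print(deduplicated)
```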

## Selecting From Tables

In [None]:
SELECT [columns] FROM [tables] WHERE [condition] ORDER BY [columns] LIMIT [limit];

* `SELECT [columns]`: which columns we want to include for the output table
    * `[columns]` is a comma-separated list of column names
    * `*` can be used to select all columns
* `FROM [tables]`: which table (or tables) we want to grab the columns from
* `WHERE [condition]` filters the rows that will be displayed in the output table
* `ORDER BY [columns]` sorts the rows by the values in `[columns]`
* `LIMIT [limit]` limits the number of rows in the output table by the integer `[limit]`

Below is an example of selecting all of Berkeley's scores from the `big_game` table, but only for years later than 2002.

In [3]:
%%sql
SELECT berkeley FROM big_game WHERE year > 2002;

 * sqlite://
Done.


berkeley
17
28


And below are the scores for both Berkeley and Stanford, but only during the years when Berkeley won:

In [4]:
%%sql
SELECT berkeley, stanford FROM big_game WHERE berkeley > stanford;

 * sqlite://
Done.


berkeley,stanford
28,16
30,7


And below, we select the years when Stanford scored more than 15 points.

In [5]:
%%sql
SELECT year FROM big_game WHERE stanford > 15;

 * sqlite://
Done.


year
2014
2003
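The queries in this section use `SELECT`, `FROM`, and `WHERE`, but not `ORDER BY` or `LIMIT`. The following is a small sketch of those two clauses, again assuming Python's `sqlite3` module and an in-memory copy of `big_game`; the `DESC` keyword (descending order) and the variable names are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE big_game AS
      SELECT 30 AS berkeley, 7 AS stanford, 2002 AS year UNION
      SELECT 28,             16,            2003         UNION
      SELECT 17,             38,            2014
""")

# ORDER BY sorts in ascending order by default.
scores_ascending = conn.execute(
    "SELECT berkeley FROM big_game ORDER BY berkeley"
).fetchall()
print(scores_ascending)

# Most recent games first, keeping only the first two rows.
latest_two = conn.execute(
    "SELECT year, berkeley FROM big_game ORDER BY year DESC LIMIT 2"
).fetchall()
print(latest_two)
```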


## SQL Operators

Expressions in the `SELECT`, `WHERE`, and `ORDER BY` clauses can contain one or more of the following operators:
* comparison operators: `=`, `>`, `<`, `<=`, `>=`, `<>`, `!=`
* boolean operators: `AND`, `OR`
* arithmetic operators: `+`, `-`, `*`, `/`
* concatenation operator: `||`

For example, the query below outputs the ratio of Berkeley's score to Stanford's score for each year:

In [7]:
%%sql
SELECT berkeley / stanford FROM big_game;

 * sqlite://
Done.


berkeley / stanford
0
1
4


Note that SQLite performs integer division when both operands are integers. If we want `float` division, at least one of the numbers has to be a `float`.

In [9]:
%%sql
SELECT berkeley * 1.0 / stanford FROM big_game;

 * sqlite://
Done.


berkeley * 1.0 / stanford
0.4473684210526316
1.75
4.285714285714286
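To make the difference concrete, here is a hedged sketch that computes the 2002 ratio three ways with Python's `sqlite3` module. The `CAST(... AS REAL)` form is an alternative to multiplying by `1.0` that is not used in the original notebook; the one-row table and variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A one-row version of big_game is enough to show the behavior.
conn.execute(
    "CREATE TABLE big_game AS SELECT 30 AS berkeley, 7 AS stanford, 2002 AS year"
)

int_ratio, float_ratio, cast_ratio = conn.execute("""
    SELECT berkeley / stanford,              -- integer / integer truncates
           berkeley * 1.0 / stanford,        -- 1.0 turns the left operand into a float
           CAST(berkeley AS REAL) / stanford -- explicit conversion to a float
    FROM big_game
""").fetchone()

print(int_ratio, float_ratio, cast_ratio)
```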


And below is the sum of the scores in years where both teams scored over 10 points:

In [3]:
%%sql
SELECT berkeley + stanford FROM big_game WHERE berkeley > 10 AND stanford > 10;

 * sqlite://
Done.


berkeley + stanford
55
44


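And finally, below is an example of a table with a single column and a single row containing the value `hello world`. (SQLite accepts the double-quoted strings used here, but standard SQL writes string literals with single quotes.)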

In [4]:
%%sql
SELECT "hello" || " " || "world";

 * sqlite://
Done.


"hello" || " " || "world"
hello world
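The `||` operator also works on column values, not just string literals. The sketch below (assuming Python's `sqlite3` module; the `summaries` name and the output format are made up for illustration) builds one summary string per game. SQLite converts the integer columns to text when concatenating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE big_game AS
      SELECT 30 AS berkeley, 7 AS stanford, 2002 AS year UNION
      SELECT 28,             16,            2003         UNION
      SELECT 17,             38,            2014
""")

# Each row becomes a single string such as "2002: 30-7".
summaries = [
    row[0]
    for row in conn.execute("""
        SELECT year || ': ' || berkeley || '-' || stanford
        FROM big_game
        ORDER BY year
    """)
]
print(summaries)
```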


# Joins

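To select data from multiple tables, we can use `joins`. There are many types of joins, but the one we'll be using is the `inner join`. To perform an `inner join` on 2 or more tables, list them, separated by commas, in the `FROM` clause of a `SELECT` statement. This pairs every row of one table with every row of the others, and the `WHERE` clause then keeps only the combinations that satisfy the condition.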

In [None]:
SELECT [columns] FROM [table1], [table2], ... WHERE [condition] ORDER BY [columns] LIMIT [limit];

We can select from **multiple different tables** or from **the same table multiple times**. 

Below we have a table containing the names of the football coaches at Cal since 2002:

In [5]:
%%sql
CREATE TABLE coaches AS
  SELECT "Jeff Tedford" AS name, 2002 AS start, 2012 AS end UNION
  SELECT "Sonny Dykes"         , 2013         , 2016        UNION
  SELECT "Justin Wilcox"       , 2017         , NULL;

 * sqlite://
Done.


[]

If we want to match up each game with the coach that season, we'd have to compare columns from the 2 tables in the `WHERE` clause:

In [6]:
%%sql
SELECT * FROM big_game, coaches WHERE year >= start AND year <= end;

 * sqlite://
Done.


berkeley,stanford,year,name,start,end
17,38,2014,Sonny Dykes,2013,2016
28,16,2003,Jeff Tedford,2002,2012
30,7,2002,Jeff Tedford,2002,2012
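One way to see what the comma-separated `FROM` clause does is to count rows with and without the `WHERE` condition. The following sketch assumes Python's `sqlite3` module and rebuilds both tables in memory; the variable names are illustrative. Without a condition, every game is paired with every coach (3 × 3 rows); with it, only the matching pairs remain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE big_game AS
      SELECT 30 AS berkeley, 7 AS stanford, 2002 AS year UNION
      SELECT 28,             16,            2003         UNION
      SELECT 17,             38,            2014
""")
conn.execute("""
    CREATE TABLE coaches AS
      SELECT 'Jeff Tedford' AS name, 2002 AS start, 2012 AS end UNION
      SELECT 'Sonny Dykes'         , 2013         , 2016        UNION
      SELECT 'Justin Wilcox'       , 2017         , NULL
""")

# No WHERE clause: every row of big_game is paired with every row of coaches.
all_pairs = conn.execute("SELECT * FROM big_game, coaches").fetchall()
print(len(all_pairs))

# The WHERE clause keeps only the pairs where the game falls in the coach's tenure.
matched = conn.execute("""
    SELECT year, name FROM big_game, coaches
    WHERE year >= start AND year <= end
    ORDER BY year
""").fetchall()
print(matched)
```

Justin Wilcox never appears in `matched` because his `end` value is `NULL`, and a comparison with `NULL` is never true.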


The following query outputs the name of the coach and the year for each Berkeley win recorded in `big_game`.

In [11]:
%%sql
SELECT name, year FROM big_game, coaches
    WHERE berkeley > stanford AND year >= start AND year <= end;

 * sqlite://
Done.


name,year
Jeff Tedford,2003
Jeff Tedford,2002


The queries above are relatively easy to write because none of the column names are ambiguous (e.g. the `name` column comes from the `coaches` table, and there is no `name` column in the `big_game` table).

In case the same column name exists in more than one of the tables being joined, we need to disambiguate the column names using `aliases`.

For example, if we want to find how each team's score changed between an earlier game and a later game, we join `big_game` with itself.

In [18]:
%%sql
SELECT b.berkeley - a.berkeley AS 'Berkeley Differences',
       b.stanford - a.stanford AS 'Stanford Differences',
       b.year AS 'Later', a.year AS 'Earlier'
    FROM big_game AS b, big_game AS a WHERE a.year < b.year;

 * sqlite://
Done.


Berkeley Differences,Stanford Differences,Later,Earlier
-11,22,2014,2003
-13,31,2014,2002
-2,9,2003,2002


In the query above, we give the alias `b` to the 1st `big_game` table and `a` to the 2nd `big_game` table. We can then reference columns from each table using dot notation with the aliases.
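To illustrate why the aliases are needed, the sketch below (assuming Python's `sqlite3` module; variable names are illustrative) first runs a self-join with an unqualified `year` column, which SQLite rejects as ambiguous, and then the qualified version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE big_game AS
      SELECT 30 AS berkeley, 7 AS stanford, 2002 AS year UNION
      SELECT 28,             16,            2003         UNION
      SELECT 17,             38,            2014
""")

# Unqualified `year` could refer to either copy of big_game, so SQLite refuses.
try:
    conn.execute(
        "SELECT year FROM big_game AS a, big_game AS b WHERE a.year < b.year"
    )
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Qualifying each column with its alias removes the ambiguity.
year_pairs = conn.execute("""
    SELECT a.year, b.year
    FROM big_game AS a, big_game AS b
    WHERE a.year < b.year
    ORDER BY a.year, b.year
""").fetchall()
print(year_pairs)
```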