[Table of Contents](../../index.ipynb)

# FRC Analytics with Python - Session 17
# Introduction to Structured Query Language (SQL) - Part I
**Last Updated: 17 October 2021**

This session covers *Structured Query Language* (SQL). SQL is used to create, modify, and retrieve data stored in relational databases. SQL has been in general use since the 1980s and is required knowledge for data scientists and business analysts. SQL is included in our curriculum because the Issaquah Robotics Society's scouting system uses SQL to store, retrieve, and manipulate data.

A *database* is a software program that stores information. Relational databases are a type of database that store data in tables with rows and columns. Databases often run on computer servers and provide information to other computer programs over a network. For example, when a web browser retrieves a web page, the web page is generated by a web server like Apache, Microsoft Internet Information Services (IIS), or Nginx. Web servers typically retrieve data  needed to construct web pages from relational database servers. MySQL, PostgreSQL, and Oracle are some of the most common relational database servers.

The IRS's scouting system uses a simple relational database called *SQLite*. SQLite is bundled with Python, so if you've installed Python on your computer, you already have SQLite.

### SQL References
There are many high-quality SQL references on the internet. This tutorial will refer to the following three references when appropriate.
* The [W3 Schools SQL Tutorial](https://www.w3schools.com/Sql/default.asp). The W3 tutorial covers basic SQL syntax that is common to popular database systems. It includes simple examples and it's great for when you know what type of query you need but can't remember the exact syntax. This notebook contains links to applicable sections of the W3 SQL tutorial.
* Although basic SQL syntax is the same on different database systems, there are differences in how they support advanced SQL features. [The official SQLite documentation](https://www.sqlite.org/lang.html) provides precise descriptions of the features supported by SQLite.
* This notebook uses the *sqlite3* package from the *Python Standard Library* to interact with SQLite databases. [The package's documentation is available here.](https://docs.python.org/3/library/sqlite3.html) Keep in mind that SQLite can be used with many programming languages, including Java, Python, C/C++, R, JavaScript, Scala, Julia, and Ruby. Each of these languages has its own library for interacting with SQLite databases.

### If Using Google Colab
It's best if you clone the [pyclass_frc](https://github.com/irs1318dev/pyclass_frc) Github repo and run this notebook from your local computer. But if you would like to run it from Google Colab, uncomment and run the line in the next cell. (*Don't delete the exclamation point at the start of the line!*) The cell will copy a Sqlite database file from the Github repository.

In [None]:
# !wget -nv https://raw.githubusercontent.com/irs1318dev/pyclass_frc/master/sessions/s17_SQL_I/wasno2020.sqlite3
# !sudo apt install sqlite3

## I. Imports and Database Connections
To use a Sqlite database, we must first [import the `sqlite3` package](https://docs.python.org/3/library/sqlite3.html). This package is part of the *Python Standard Library*, so there is no need to install it with Anaconda.

In [None]:
import sqlite3

import pandas as pd

A Sqlite database is stored as a file. Our database is contained in the file *wasno2020.sqlite3*. It contains the IRS's scouting data from the [2020 Pacific Northwest (PNW) district competition at Glacier Peak High School in Snohomish, WA](https://www.thebluealliance.com/event/2020wasno). We can get a connection to this database by using the `sqlite3.connect()` function. 

In [None]:
# Get a connection to the database
db_file = "wasno2020.sqlite3"
con = sqlite3.connect(db_file)

## II. Getting Started
### A. Our First Query
The instructions that we'll give to our database are called queries. Run the next cell to see the results of our first query.

In [None]:
# Our first SQL Query
query = "SELECT * FROM teams;"

# This statement runs the query. We'll explain it later.
teams_dataframe = pd.read_sql_query(query, con, index_col = "team_id")

# Display the first five rows (the default for head())
teams_dataframe.head()

The first line of the code cell defines a SQL query, `SELECT * FROM teams;`, and saves it in a Python variable. The next line sends the query to our *Sqlite* database and converts the results to a *Pandas* dataframe. Don't pay much attention to this line -- we'll cover it in more detail later. The final statement displays the first five records that were retrieved from the database.

The data from the *teams* table consists of rows and columns. Each row corresponds to a different robotics team, and each column represents a different attribute of a team, such as its name, city or year founded. Rows are often called *records* and the attribute represented by a column is often called a *field*.

This particular SQL query, `SELECT * FROM teams;`, requests all data from the *teams* table. The SQL query has several components.
* Our query started with the SQL keyword `SELECT`, which means we want to retrieve data from the database.
* The asterisk, `*`, means we want to return all columns. If you were reading a SQL query out loud, you could say "all columns" or "all fields" instead of "asterisk".
* The phrase `FROM teams` means we want the data to come from the *teams* table. `FROM` is a SQL keyword and `teams` is a user-created name that we gave to our database table.
* All SQL queries end with a semicolon, `;`.

This data was saved after the first day of the FRC competition, before all qualification matches were completed. That's why the teams have only completed nine or ten matches. Each team normally competes in twelve matches at district competitions in the Pacific Northwest (PNW).

### B. Choosing Columns
We can replace the asterisk in the `SELECT` statement with one or more column names. For example, we could modify the query to provide only the *team_name* and *team_number* columns.

In [None]:
# We can choose our columns
query = """SELECT team_name, team_number FROM teams;"""

# Don't worry about this statement yet
teams_dataframe = pd.read_sql_query(query, con)

# Display the first five rows
teams_dataframe.head()

`SELECT` statements will accept one or more column names. If providing more than one column name, separate the column names with commas.

[Refer to the *W3 SQL Tutorial* for additional examples of `SELECT` statements.](https://www.w3schools.com/Sql/sql_select.asp)

### C. Displaying Different Column Names
Using underscores to separate the words in long column names is a good practice, but it does make our tables look a bit crude.  We can use aliases to display the columns using a different name and make our table look more polished.

In [None]:
# We can display columns under different names with aliases
query = """SELECT team_name AS Team,
                  team_number AS "Team Number"
             FROM teams;"""

# Don't worry about this statement yet
teams_dataframe = pd.read_sql_query(query, con)

# Display the first five rows
teams_dataframe.head()

The `AS` keyword caused the columns to be renamed when our results were displayed. The new name is called an *alias*. We can include spaces in the alias if we enclose it in double quotes (but then the entire string needs to be enclosed in single or triple quotes).
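
As a minimal sketch of how aliases behave, the following cell builds a small in-memory table (a hypothetical stand-in for the real *teams* table, assumed only for this illustration) and reads the column names that SQLite reports for an aliased query. It uses the `sqlite3` package directly instead of Pandas.

```python
import sqlite3

# Hypothetical in-memory table; the real teams table has more columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teams (team_name TEXT, team_number TEXT);")

# The alias with a space needs double quotes inside the SQL text,
# so the Python string is wrapped in triple quotes.
cursor = con.execute(
    """SELECT team_name AS Team,
              team_number AS "Team Number"
         FROM teams;"""
)

# cursor.description holds one entry per result column; item 0 is the name.
column_names = [column[0] for column in cursor.description]
con.close()

print(column_names)
```

The aliases only change the names reported with the query results. The column names stored in the table itself are unchanged.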

[Refer to the *W3 SQL Tutorial* for additional alias examples.](https://www.w3schools.com/Sql/sql_alias.asp)

### D. `DISTINCT` Keyword
Suppose we are interested in what cities teams are from. We could use the following query:

In [None]:
# List the city of every team
query = """SELECT city FROM teams;"""

# Don't worry about this statement yet
pd.read_sql_query(query, con)

This query works, but the list of results is longer than it needs to be. Many cities occur multiple times in the list because they have multiple teams.

Adding the DISTINCT keyword to the query solves this problem.

In [None]:
# Using the DISTINCT keyword
query = """SELECT DISTINCT city FROM teams;"""

# Don't worry about this statement yet
pd.read_sql_query(query, con)

Now each city occurs only once in the list. The `DISTINCT` keyword removes duplicate records from the results.

### E. Filtering Results with a `WHERE` Clause

#### Basic `WHERE` Clause
The following query only returns teams that were founded in 2013.

In [None]:
# We can filter the results with a WHERE clause.
query = """SELECT *
             FROM teams
            WHERE year_founded = 2013;"""

# Don't worry about this statement yet
pd.read_sql_query(query, con, index_col="team_id")

We filtered the query results by adding a `WHERE` clause. Here are a couple key points:
* The order of clauses matters. The `WHERE` clause must come after the `FROM` clause, or else the SQL query will fail.
* Unlike Python's equality operator (`==`), the SQL equality operator, `=`, contains a single equals sign.

We can use the Boolean operators `AND`, `OR`, and `NOT` in our `WHERE` clause.

In [None]:
# We can use Boolean operators in the WHERE clause.
query = """SELECT *
             FROM teams
            WHERE year_founded <= 2012
              AND city = 'Seattle';"""

# Don't worry about this statement yet
pd.read_sql_query(query, con, index_col="team_id")

The query displays all teams in Seattle that were founded in 2012 or earlier. Here are a couple important things to note:
* The value *Seattle* is surrounded by single quotes (`'`) but the value *2012* does not use any quotes. This is because the *year_founded* column has a numeric datatype and *city* has a text datatype. Literal text values should always be surrounded by *single* quotes. (*Sqlite* will let you use double quotes around literal strings, but other common database servers, like *Postgres*, will not. It's best to get in the habit of using single quotes in this situation.)
* While SQL keywords and user-supplied identifiers are case insensitive, SQL searches are often case sensitive. For example, if we change the second part of our `WHERE` clause to `AND city = 'seattle'`, our search will return no results (try it for yourself!) because all occurrences of 'Seattle' in the *teams* table are capitalized. *Sqlite* queries can be forced to conduct a case-insensitive search by adding `COLLATE NOCASE` after the comparison in the `WHERE` clause. Also, it's possible to specify that searches should be case-insensitive when creating a database table.
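
The case-sensitivity point can be sketched with a tiny in-memory database. The table contents and team names below are made up for illustration, and the `?` placeholders in the `INSERT` statement are just a convenient way to load the sample rows.

```python
import sqlite3

# Hypothetical sample data standing in for the real teams table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teams (team_name TEXT, city TEXT);")
con.executemany(
    "INSERT INTO teams VALUES (?, ?);",
    [("Team A", "Seattle"), ("Team B", "Seattle"), ("Team C", "Bellevue")],
)

# The = operator compares text case-sensitively by default.
exact = con.execute(
    "SELECT team_name FROM teams WHERE city = 'seattle';"
).fetchall()

# COLLATE NOCASE makes the comparison ignore the case of ASCII letters.
nocase = con.execute(
    "SELECT team_name FROM teams WHERE city = 'seattle' COLLATE NOCASE;"
).fetchall()
con.close()

print(exact)
print(sorted(nocase))
```

The first query finds nothing because no city is stored as lowercase *seattle*, while the second query finds both Seattle teams.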

[Refer to the *W3 SQL Tutorial* for additional examples of `WHERE` clauses.](https://www.w3schools.com/Sql/sql_where.asp)

#### `LIKE` Operator
We just used a `WHERE` clause and the equals operator (`=`) to find teams from Seattle. The equals operator is used to identify fields that exactly match a literal string. Sometimes we want to find fields that start with or contain a sequence of characters, but don't exactly match a literal string. The `LIKE` operator will do this. The following query finds all cities that start with *S*.

In [None]:
# Using the LIKE operator
query = """SELECT DISTINCT city
             FROM teams
            WHERE city LIKE 's%';"""

# Don't worry about this statement yet
pd.read_sql_query(query, con)

In the preceding query, the `LIKE` operator was followed by the literal string `'s%'`. In SQL, the percent sign, `%`, is a wildcard that represents zero or more characters. The literal string `'s%'` matches any string that starts with `'s'`, regardless of the string's length. In SQLite, `LIKE` ignores the case of ASCII letters by default, which is why the lowercase `'s%'` also matches cities such as *Seattle*.

Here is another example.

In [None]:
# Using the LIKE operator
query = """SELECT DISTINCT city
             FROM teams
            WHERE city LIKE '%ish';"""

# Don't worry about this statement yet
pd.read_sql_query(query, con)

The preceding query identifies all cities that end in `'ish'`.

The `LIKE` keyword in SQLite will also accept an underscore as a wildcard. The underscore represents exactly one character. For example, the following query returns no results because Seattle has two t's and the underscore can only match one.

In [None]:
# Query does not work because '_' matches only a single character
# Replace '_' with '%' to fix query.
query = """SELECT DISTINCT city
             FROM teams
            WHERE city LIKE 'Sea_le';"""

# Don't worry about this statement yet
pd.read_sql_query(query, con)
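
The two wildcards can be compared side by side with a short sketch. The in-memory table and the `cities_like()` helper are hypothetical names introduced only for this example.

```python
import sqlite3

# Hypothetical list of cities standing in for the real teams table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teams (city TEXT);")
con.executemany(
    "INSERT INTO teams VALUES (?);",
    [("Seattle",), ("Snohomish",), ("Sammamish",), ("Bellevue",)],
)


def cities_like(pattern):
    """Return the distinct cities matching a LIKE pattern, sorted by name."""
    rows = con.execute(
        "SELECT DISTINCT city FROM teams WHERE city LIKE ? ORDER BY city;",
        (pattern,),
    )
    return [row[0] for row in rows]


starts_with_s = cities_like("s%")        # % matches zero or more characters
ends_with_ish = cities_like("%ish")
one_underscore = cities_like("Sea_le")   # _ matches exactly one character
two_underscores = cities_like("Sea__le")
con.close()

print(starts_with_s, ends_with_ish, one_underscore, two_underscores)
```

The pattern `'Sea_le'` matches nothing, while `'Sea__le'` (two underscores) matches *Seattle* because each underscore stands in for one of the two t's.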

#### `IN` Operator
Suppose we wanted to find all teams from either Sammamish or Bellevue. We could use a compound condition in the `WHERE` clause like this:

In [None]:
# Compound WHERE clause
query = """SELECT * 
             FROM teams
            WHERE city = 'Sammamish'
               OR city = 'Bellevue';"""

# Don't worry about this statement yet
pd.read_sql_query(query, con)

This works, but can get cumbersome. Suppose we wanted to find all teams from Bellevue, Sammamish, and Issaquah? The `IN` operator works great in this situation.

In [None]:
# The IN operator
query = """SELECT * 
             FROM teams
            WHERE city IN ('Sammamish', 'Issaquah', 'Bellevue');"""

# Don't worry about this statement yet
pd.read_sql_query(query, con)
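
As a quick check that `IN` is shorthand for a chain of `OR` comparisons, this sketch runs both forms against a small in-memory table. The team names are invented for the example.

```python
import sqlite3

# Hypothetical sample data standing in for the real teams table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teams (team_name TEXT, city TEXT);")
con.executemany(
    "INSERT INTO teams VALUES (?, ?);",
    [
        ("Team A", "Sammamish"),
        ("Team B", "Issaquah"),
        ("Team C", "Bellevue"),
        ("Team D", "Seattle"),
    ],
)

# Three comparisons joined with OR.
with_or = con.execute(
    """SELECT team_name
         FROM teams
        WHERE city = 'Sammamish'
           OR city = 'Issaquah'
           OR city = 'Bellevue'
     ORDER BY team_name;"""
).fetchall()

# The same filter written with IN.
with_in = con.execute(
    """SELECT team_name
         FROM teams
        WHERE city IN ('Sammamish', 'Issaquah', 'Bellevue')
     ORDER BY team_name;"""
).fetchall()
con.close()

print(with_or == with_in)
```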

### E. Exercises 1 - 6
It's time to write a few SQL queries.

**Ex. #1.** Write a SQL query that lists all FRC teams from Bellevue. It should have three columns: *team_number*, *team_name*, and *year_founded*.

In [None]:
# Ex #1. Write your SQL query between the quotes on the next line.
# Don't forget the semi-colon!
query = """    """

# Don't change or delete this code
pd.read_sql_query(query, con)

**Ex. #2.** Write a SQL query that finds all teams with 'Robo' in their name. Return all columns.

In [None]:
# Ex #2. Write your SQL query between the quotes on the next line.
query = """    """

# Don't change or delete this code
pd.read_sql_query(query, con)

**Ex. #3.** Write a SQL query that finds all teams from Seattle that were founded before 2010. The dataframe should have three columns, *city*, *team_name*, and *year_founded*, in that order.

In [None]:
# Ex #3. Write your SQL query between the quotes on the next line.
query = """    """

# Don't change or delete this code
pd.read_sql_query(query, con)

**Ex. #4.** Write a SQL query that finds all teams founded between 2010 and 2015, inclusive. Return all columns. You will need to use a Boolean operator in your `WHERE` clause.

In [None]:
# Ex #4. Write your SQL query between the quotes on the next line.
query = """    """

# Don't change or delete this code
pd.read_sql_query(query, con)

**Ex. #5.** Write a SQL query that finds all teams founded between 2010 and 2015, just like in exercise #4. Use the `BETWEEN` operator in your query instead of a Boolean operator. We have not covered the `BETWEEN` operator, but [you can find an example here.](https://www.w3schools.com/sql/sql_between.asp)

In [None]:
# Ex #5. Write your SQL query between the quotes on the next line.
query = """    """

# Don't change or delete this code
pd.read_sql_query(query, con)

**Ex. #6.** Write a SQL query that finds all teams whose names include the substring 'robo'. The substring can occur anywhere in the team's name. Return only one column containing the team's name. Rename this column *RoboTeams!*

In [None]:
# Ex #6. Write your SQL query between the quotes on the next line.
query = """    """

# Don't change or delete this code
pd.read_sql_query(query, con)

### F. Closing Connections
We've been running all of our queries on the same database connection. This is fine when we're working in a notebook, but in more significant programs, it's best to close database connections after each transaction. Connections that are left open needlessly consume resources. Promptly closing database connections after use reduces the risk that we'll forget to close them. The next cell closes our connection.

In [None]:
# Closing a database connection
con.close()

From here on, to establish good habits, we'll obtain and close connections within each example.
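
One way to make sure a connection is always closed, sketched here as an option rather than the approach used in the rest of this notebook, is to wrap it in `contextlib.closing()` from the Python Standard Library. The in-memory database and its single row are placeholders.

```python
import sqlite3
from contextlib import closing

# closing() calls con.close() when the with block ends, even after an error.
with closing(sqlite3.connect(":memory:")) as con:
    con.execute("CREATE TABLE teams (team_name TEXT);")
    con.execute("INSERT INTO teams VALUES ('Example Team');")
    team_count = con.execute("SELECT COUNT(*) FROM teams;").fetchone()[0]

# Using the connection after it has been closed raises ProgrammingError.
try:
    con.execute("SELECT 1;")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(team_count, still_open)
```

Note that `with sqlite3.connect(...) as con:` on its own manages a transaction (commit or rollback) but does not close the connection, which is why `closing()` is used here.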

### G. SQL Syntax and Style
Unlike Python, SQL is case insensitive. SQL considers the queries `SELECT * FROM teams;` and `select * frOM TEAMS;` to be identical. This means you can't have two tables where the only differences in the table names are that some characters are upper or lower case.

To make our queries easy to read and understand, we will conform to a few style rules. These style rules come from a [SQL styleguide by Simon Holywell](https://www.sqlstyle.guide/). 
* SQL keywords like `SELECT` and `FROM` will always be all uppercase.
* User-generated table and column names will always be all lowercase.
* If table or column names contain multiple words, the words will be separated by underscores.
* Longer queries will be placed on multiple lines, with the leftmost keywords right-aligned.
* Table names will be a plural noun that describes the data stored in the table.

In addition, when writing SQL queries in Python code, the mentor recommends placing SQL queries in triple-quoted strings. This practice has two advantages:
* It allows multi-line queries, which enhances readability.
* Double quotes (`"`) and single quotes (`'`) occur frequently in SQL queries. Single and double quotes can easily be placed in triple-quoted strings.

[Refer to the *W3 SQL Tutorial* for additional information on SQL syntax.](https://www.w3schools.com/Sql/sql_syntax.asp)

## III. Running Queries
This notebook uses two different techniques to run SQL queries.
* The `pandas.read_sql_query()` function
* The SQLite command line interface (CLI)

We've already seen examples of the `pandas.read_sql_query()` function. We'll use the SQLite CLI later in this section. There are other ways to run SQL queries that will be covered in the next session.

### A. The `pandas.read_sql_query()` Function
We've used the `read_sql_query()` function from the Pandas package to execute all of our queries. The `read_sql_query()` function is not part of the SQL specification. It is a function provided by the Pandas package that makes it easy to get data from a SQL database into a dataframe. We're using `read_sql_query()` function in this notebook because it's easy, which allows us to focus on learning SQL syntax. Also, it makes the query output look nice.

**Helpful Hint:** *Pandas* SQL functions use the *SQLAlchemy* package to connect to most databases, and installing Pandas does not automatically install *SQLAlchemy*. Pandas can work directly with a `sqlite3` connection, but if you get an error the first time you run a Pandas SQL function, check whether SQLAlchemy is installed. The CLI commands `conda list sqlalchemy` and `conda install sqlalchemy` should do the trick.

The `read_sql_query()` function has two required parameters and six optional ones. We've used three of the parameters so far.
* `sql`: A text string containing the SQL query we want to execute.
* `con`: A SQLite database connection object. Without the connection object, Pandas would have no clue which database we want to use.
* `index_col`: Pandas dataframes always have an index column that is displayed on the left side of the dataframe. If the user doesn't specify an index, Pandas will create an integer index starting at zero. The `index_col` parameter allows us to tell Pandas to use one of the table's columns as an index. This prevents Pandas from displaying an extraneous column of integers that don't exist in the underlying database.

[Refer to the official Pandas documentation for more information on the `read_sql_query()` function.](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html#pandas.read_sql_query)


### B. SQLite on the Command Line
The Pandas and sqlite3 packages are essential if we want to interact with a database from a Python program or a Jupyter notebook. But sometimes we just want to quickly check something in the database, for which writing Python code would be overkill. Fortunately, SQLite provides an easy command line tool, the SQLite Command Line Interface (CLI). Here is an example of a command that can be run in PowerShell, Mac Terminal, or Linux Bash. (Remember, the exclamation point at the beginning of a Jupyter code cell causes the statement to be run from a command line.)

In [None]:
!sqlite3 wasno2020.sqlite3 "SELECT * FROM teams LIMIT 6;"

As you can see, we can run a SQL command from a CLI by entering `sqlite3`, then the path to our database file, and finally a SQL query in quotation marks. The results of the query will be printed on the command line.

Entering `sqlite3` and the path to a database file, and nothing else, causes a sqlite prompt to be displayed:
```
(pyclass) PS C:\Users\tedcodd\sql> sqlite3 wasno2020.sqlite3
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite>
```
SQL queries can be entered directly at this prompt, and the results will be displayed within the CLI (see example below). Do not enclose the queries in quotes. Multiple SQL queries can be entered in this fashion.

```
sqlite> SELECT * FROM schedule LIMIT 3;
205768|2020-02-29T11:00:00|qual|001-q|red|4131|1|10
205769|2020-02-29T11:00:00|qual|001-q|red|4683|2|10
205770|2020-02-29T11:00:00|qual|001-q|red|2412|3|9
sqlite> SELECT match, successes FROM measures WHERE team = '1318' AND task = 'launchOuter';
032-q|2
057-q|3
032-q|4
057-q|3
041-q|0
052-q|7
041-q|1
045-q|1
045-q|7
```

Entering `.quit` exits the *sqlite3* program and returns to the normal CLI prompt.
```
sqlite> .quit
(pyclass) PS C:\Users\tedcodd\sql>
```

The SQLite CLI is handy for when you need to check something in your database, but don't want to start up Jupyter and write Python code. But it can do more than display the results of SQL queries.

The SQLite CLI accepts special commands that start with a period, which are called *dot* commands. These commands are not part of the SQL language and only work within the *SQLite* CLI program. We've already seen one dot command, the `.quit` command. Run the next two code cells to see more dot commands in action.

In [None]:
%%writefile sched.sql
.output schedule.csv
.mode csv
SELECT * FROM schedule;

In [None]:
!sqlite3 wasno2020.sqlite3 ".read sched.sql"

There are several things going on in the two preceding code cells.
1. `%%writefile` is a Jupyter magic command. It's neither Python nor SQL, and it only works in a Jupyter code cell. It wrote the contents of the cell to a text file named *sched.sql*. The *sched.sql* file contains two dot commands and one SQL query.
2. The second code cell runs the dot command `.read sched.sql` within the *sqlite3* program. The `.read` command causes each line of the *sched.sql* file to be executed by *sqlite3*.
3. The `.output schedule.csv` redirects all output to the *schedule.csv* file.
4. The `.mode csv` command changes the output format to comma separated value.
5. Finally, we run a `SELECT` query on the *wasno2020.sqlite3* database.

The end result is the entire schedule table is converted to CSV format and saved to the *schedule.csv* file. You can view this file in Jupyter or Google Colab by clicking on the file icon on the left side of the screen, then double-clicking on the *schedule.csv* file.
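
For comparison, roughly the same export can be written in Python with the `sqlite3` and `csv` packages. This is only a sketch: it uses a tiny in-memory table with made-up columns instead of the real *schedule* table, and it writes to a temporary folder rather than the notebook's folder.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical miniature schedule table; the real table has more columns.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE schedule (match_id TEXT, alliance TEXT, team TEXT);")
con.executemany(
    "INSERT INTO schedule VALUES (?, ?, ?);",
    [("001-q", "red", "4131"), ("001-q", "red", "4683")],
)

# Write every record returned by the query as one line of CSV text.
csv_path = os.path.join(tempfile.mkdtemp(), "schedule.csv")
cursor = con.execute("SELECT * FROM schedule;")
with open(csv_path, "w", newline="") as csv_file:
    csv.writer(csv_file).writerows(cursor)
con.close()

# Read the file back to confirm what was written.
with open(csv_path, newline="") as csv_file:
    exported = list(csv.reader(csv_file))

print(exported)
```

The dot-command version is shorter, which is why the SQLite CLI is convenient for quick exports like this one.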

Here are a few more useful dot commands:
* `.help`: Lists all available dot commands. Entering the name of the dot command after `.help` provides additional guidance on that dot command. For example, `.help mode` displays all output modes supported by the SQLite CLI.
* `.databases`: Lists the database files that the SQLite CLI is connected to.
* `.dump`: Produces the SQL statements that will recreate the entire database.
* `.mode`: Change the way that query output is displayed. Options include `json`, `csv`, `tabs` (tab separated values), `html` (for HTML tables), `column`, and `list` (the default). There are many other options.
* `.output`: Saves query output to the designated text file instead of displaying it on the screen.
* `.read`: Executes every SQL command or dot command in the provided text file.
* `.schema`: Display the schema of every table in the database.
* `.tables`: Lists all tables in the database.

[Refer to the *SQLite* command line shell's official documentation for more information.](https://sqlite.org/cli.html)

### C. Exercises 7 - 9

**Ex. #7.** Execute a query that shows the first six rows of the *schedule* table. Create your own connection object, use the `pd.read_sql_query()` function to save the results to a dataframe, close the connection, and display the dataframe.

In [None]:
# Ex #7.



**Ex. #8.** Run SQLite from PowerShell (Windows) or Terminal (Mac) and execute a query that shows all matches for team 2930, the *Sonic Squirrels*. Paste the query results below.

The free version of Google Colab does not provide a CLI. If you are running this session on Google Colab, run the SQLite CLI from a Jupyter code cell, prefacing the `sqlite3` command with an exclamation mark. There is an example of this technique in the first code cell in section III.B.

`# Ex #8 Answer.` Double-click on this cell and paste the query results for Ex. 8 between the tick marks below:
```

```

**Ex. #9.** Use the SQLite CLI's `.tables` dot command to list the tables in the wasno2020.sqlite3 database file. Paste the results as a comment below.

In [None]:
# Ex #9.
# Paste results here: 

## III. Exploring Tables and Schemas

### A. Getting a Table of Tables
Most relational databases contain multiple tables. We can run a special `SELECT` query to see what tables exist in a database.

In [None]:
# Get a list of tables in the database
query = """SELECT *
             FROM sqlite_master
            WHERE type = 'table';"""

# Run the query and display the results
con = sqlite3.connect(db_file)
tables = pd.read_sql_query(query, con)
con.close()
tables

The results indicate our database contains four tables: *measures*, *schedule*, *status*, and *teams*.

*Sqlite* databases always contain a special table called *sqlite_master*, which lists all objects contained in the database. The *sqlite_master* table contains five columns:
1. `type`: Specifies the type of object. There are four possible values: 'table', 'index', 'view', and 'trigger'. We used a `WHERE` clause to filter the results to just tables. 
2. `name`: The name of the object. For tables, this column contains the table's name.
3. `tbl_name`: Indexes and triggers are always associated with a table. This column specifies the associated table. But for tables and views, the value in this column is always the same as the *name* column.
4. `rootpage`: We won't be using this column. But if you must know, *Sqlite* uses a data structure called a *B-tree* (NOT the same as a binary tree) to store objects. This column contains the location of the object within the B-tree.
5. `sql`: Contains the SQL query that was used to create the object.

The official SQLite documentation refers to this table as *sqlite_schema*, but says *sqlite_master* is an alternative name. We're using *sqlite_master* because for some mysterious reason, *sqlite_schema* does not work on Google Colab.
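
To see how the `type`, `name`, and `tbl_name` columns relate to each other, here is a small sketch that creates two tables and one index in an in-memory database and then reads *sqlite_master*. The table, column, and index names are hypothetical.

```python
import sqlite3

# Build a throwaway database with two tables and one index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teams (team_number TEXT, team_name TEXT);")
con.execute("CREATE TABLE schedule (match_id TEXT, team TEXT);")
con.execute("CREATE INDEX idx_schedule_team ON schedule (team);")

# Every table and index appears as one row in sqlite_master.
objects = con.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name;"
).fetchall()
con.close()

# Keep only the rows that describe tables.
table_names = [name for obj_type, name, _ in objects if obj_type == "table"]

for row in objects:
    print(row)
print(table_names)
```

For the index, `tbl_name` is the table the index belongs to (*schedule*). For the two tables, `tbl_name` simply repeats `name`.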

### B. Database Schemas
The arrangement of database tables, including the table's fields and data types, is called a schema. We can learn more about a database schema by inspecting the *sql* fields in the *sqlite_master* table. For example, the following query retrieves the query that was used to create the *teams* table.

In [None]:
# Get the teams table's CREATE statement
query = """SELECT sql
             FROM sqlite_master
            WHERE type = 'table'
              AND name = 'teams';"""

con = sqlite3.connect(db_file)
create = pd.read_sql_query(query, con)
con.close()
print(create.iloc[0, 0])

The `CREATE TABLE` query is used to create a new table in a SQL database.  Viewing the `CREATE TABLE` query is useful because it lists all of the columns and their datatypes for a specific table. For the *teams* table, we can see that all columns have a *TEXT* datatype except for *team_id*, *year_founded*, and *matches_played*. Note that the datatype of *team_number* is actually text, even though all of the values appear to be numeric. 

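The important takeaway is that every column has a declared datatype that describes the kind of data it is meant to hold. Most database servers reject data that doesn't match the column's datatype. SQLite is more lenient and will generally try to convert a mismatched value instead of rejecting it. We'll cover `CREATE TABLE` queries in greater detail in a later session.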

[Refer to the official *SQLite* documentation for additional information on the schema table.](https://www.sqlite.org/schematab.html#interpretation_of_the_schema_table)

### C. Pragma Queries
*Sqlite* provides another method for retrieving information about a table. The following query extracts information about the *schedule* table.

In [None]:
# Pragma Queries
query = """PRAGMA table_info(schedule);"""
con = sqlite3.connect(db_file)
table_info = pd.read_sql_query(query, con, index_col="cid")
con.close()
table_info

Results from `PRAGMA table_info(...)` include six different columns:
* `cid`: Contains the column ID, which is a sequence of integers, starting with zero for the first column.
* `name`: The column name.
* `type`: The column data type.
* `notnull`: If 1, the column must contain information -- it cannot be empty.
* `dflt_value`: Contains the column's default value. The default value is inserted into the column when creating a new record if no value is specified by the user.
* `pk`: If 1, indicates the column is a primary key. Primary keys will be covered in a later session.
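
The six columns are easier to interpret next to a table definition we can see. This sketch defines a hypothetical three-column table in memory (not the real *schedule* table) and prints the `PRAGMA table_info` rows for it.

```python
import sqlite3

# Hypothetical table with a primary key, a NOT NULL column, and a default.
con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE schedule (
           id INTEGER PRIMARY KEY,
           match_id TEXT NOT NULL,
           alliance TEXT DEFAULT 'red'
       );"""
)

# Each row is (cid, name, type, notnull, dflt_value, pk).
columns = con.execute("PRAGMA table_info(schedule);").fetchall()
con.close()

for column in columns:
    print(column)

column_names = [column[1] for column in columns]
primary_keys = [column[1] for column in columns if column[5] == 1]
required = [column[1] for column in columns if column[3] == 1]
```

Here *id* is flagged as the primary key and *match_id* is flagged as not null, matching the `CREATE TABLE` statement.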

There are several dozen different types of `PRAGMA` queries. [The complete list is available on the official *Sqlite* documentation.](https://www.sqlite.org/pragma.html) `PRAGMA` queries can be used to get information about a database's schema or to view or set database settings.

### D. Exercise 10

**Ex. #10.** Use a `PRAGMA` query to list all columns in the *measures* table.

In [None]:
# Ex #10.



## IV. Sorting
### A. `ORDER BY` Clause
Sorting is easy. Just add an `ORDER BY` clause. The following query sorts teleop measures by match.

In [None]:
# Sorting Query Results
query = """SELECT match, phase, task, team, successes,
                  successes * 2 AS teleop_goal_points
             FROM measures
            WHERE task = 'launchOuter'
              AND phase = 'teleop'
         ORDER BY match;"""
con = sqlite3.connect(db_file)
points = pd.read_sql_query(query, con)
con.close()
points.head(12)

The default sort order is ascending. We can change the sort order to descending by adding `DESC` after the column name.

In [None]:
# Sorting Query Results
query = """SELECT team, match, phase, task, successes,
                  successes * 2 AS teleop_goal_points
             FROM measures
            WHERE task = 'launchOuter'
              AND phase = 'teleop'
         ORDER BY team DESC;"""
con = sqlite3.connect(db_file)
points = pd.read_sql_query(query, con)
con.close()
points.head(8)

The team column did not sort as we might have expected. The team column has a text datatype, so all teams whose first digit is 9 are first, then 8, etc, regardless of the size of the integer. We can get a better sort if we convert the team column to an integer datatype, which can be completed within the `ORDER BY` clause.

In [None]:
# Casting text to integer datatype
query = """SELECT team, match, phase, task, successes,
                  successes * 2 AS teleop_goal_points
             FROM measures
            WHERE task = 'launchOuter'
              AND phase = 'teleop'
         ORDER BY CAST(team AS INT) DESC;"""  
con = sqlite3.connect(db_file)
points = pd.read_sql_query(query, con)
con.close()
points.head(8)
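
The difference between sorting team numbers as text and as integers shows up clearly with just four rows. The sketch below uses a made-up, in-memory version of the *measures* table with only two columns.

```python
import sqlite3

# Hypothetical measures table; team numbers are stored as text.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE measures (team TEXT, successes INTEGER);")
con.executemany(
    "INSERT INTO measures VALUES (?, ?);",
    [("1318", 3), ("948", 1), ("2930", 4), ("360", 2)],
)

# Text sort: compares character by character, so '948' sorts above '2930'.
text_order = [
    row[0]
    for row in con.execute("SELECT team FROM measures ORDER BY team DESC;")
]

# Integer sort: CAST converts each team number before comparing.
int_order = [
    row[0]
    for row in con.execute(
        "SELECT team FROM measures ORDER BY CAST(team AS INT) DESC;"
    )
]
con.close()

print(text_order)
print(int_order)
```

Note that `CAST` only affects the sort. The values returned in the *team* column are still text.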

### B. Important Tip
In what order are database records returned if there is no `ORDER BY` clause? The official answer is that the order is *undefined*. In practice, relational databases often return records in the order they were created. For example, the order of records in our *wasno2020.sqlite3* database matches the order of the Pandas dataframes that the tables were created from. But that could change at any time, especially if the SQLite software is updated. Don't rely on the database's natural, undefined sort order. If order matters at all (and it usually does), use an `ORDER BY` clause.

[Refer to the W3 Schools SQL Tutorial for additional information on `ORDER BY` clauses.](https://www.w3schools.com/sql/sql_orderby.asp)

### C. Exercise 11

**Ex. #11.** Run a query to retrieve the names of all robotics teams in descending order.

In [None]:
# Ex #11.



## V. Column Operations
### A. Calculations
The *successes* column in the *measures* table contains the number of times a team successfully completed a task. We would like to know how many points the team scored during the match by completing that task. Consider the following query.

In [None]:
# Calculations with columns
query = """SELECT phase, task, team, match, successes,
                  successes * 2 AS teleop_goal_points
             FROM measures
            WHERE task = 'launchOuter'
              AND phase = 'teleop';"""
con = sqlite3.connect(db_file)
points = pd.read_sql_query(query, con)
con.close()
points.head()

The task *launchOuter* corresponds to launching a power cell into the outer port during the 2020 FRC game *Infinite Recharge*. During teleop, outer goals were worth two points. In the second line of the query we calculated the number of points directly in the SQL statement. SQL has most of the arithmetical operators one would expect, including +, -, &ast;, /, and %.
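
The operators can be tried without any table at all. This is a small sketch using an in-memory database and arbitrary numbers. Note one behavior that may be surprising: in SQLite, dividing one integer by another gives an integer result, so make one operand a real number when a fractional result is wanted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Each expression is evaluated by SQLite, not by Python.
row = con.execute("""SELECT 7 + 2, 7 - 2, 7 * 2,
                            7 / 2, 7 / 2.0, 7 % 2;""").fetchone()
con.close()

print(row)  # (9, 5, 14, 3, 3.5, 1)
```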

SQLite also has functions for absolute value and rounding numbers. For example:
```sql
SELECT ABS(x_pos) FROM coordinates;
SELECT ROUND(account_balance, 2) FROM accounts;
```

There are several mathematical operators and functions that are *not* available in SQLite by default. There is no operator for exponentiation, and trigonometric and square root functions are missing from many builds. (SQLite 3.35.0 added math functions such as `sqrt()` and `pow()`, but only in builds compiled with the `SQLITE_ENABLE_MATH_FUNCTIONS` option.) There are ways to work around these omissions. Since the IRS's scouting data easily fits in memory, we usually read the query results into a Pandas dataframe and do calculations within the dataframe.
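
Besides doing the math in Pandas, one possible workaround is to register a Python function with the connection so it can be called from SQL. The sketch below uses `create_function()` from Python's `sqlite3` module. The SQL names `py_sqrt` and `py_pow` are hypothetical names chosen for this example; they are not part of SQLite.

```python
import math
import sqlite3

con = sqlite3.connect(":memory:")

# Register Python functions under hypothetical SQL names.
# Arguments: SQL function name, number of arguments, Python callable.
con.create_function("py_sqrt", 1, math.sqrt)
con.create_function("py_pow", 2, math.pow)

row = con.execute("SELECT py_sqrt(16), py_pow(2, 10);").fetchone()
con.close()

print(row)  # (4.0, 1024.0)
```

Registered functions exist only for the connection they were registered on, so they must be registered again each time a new connection is opened.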

### B. CASE Statements
Suppose we would like the total points from outer goals for both the autonomous and teleop phases.

In [None]:
# CASE statement - points depend on the phase
query = """SELECT team, match, phase, task, successes,
                  CASE
                        WHEN phase = 'auto' THEN successes * 4
                        WHEN phase = 'teleop' THEN successes * 2
                        ELSE 0
                  END AS outer_goal_points
             FROM measures
            WHERE task = 'launchOuter'
         ORDER BY match;"""  
con = sqlite3.connect(db_file)
points = pd.read_sql_query(query, con)
con.close()
points.head(8)

SQL `CASE` statements are similar to Python `if` statements. With a `CASE` statement, the query is able to inspect the contents of the *phase* field and calculate the points per outer goal correctly.
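
To make the comparison with Python concrete, here is a sketch that computes the same points twice: once with a Python `if` statement and once with a SQL `CASE` statement. It runs against an in-memory table named `demo` with made-up rows (including a hypothetical *endgame* phase to exercise the `ELSE` branch), not the real *measures* table.

```python
import sqlite3

def goal_points(phase, successes):
    """Python equivalent of the CASE statement (outer goals only)."""
    if phase == 'auto':
        return successes * 4
    elif phase == 'teleop':
        return successes * 2
    else:
        return 0

# Made-up sample data: (phase, successes)
rows = [("auto", 2), ("teleop", 5), ("endgame", 1)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (phase TEXT, successes INTEGER);")
con.executemany("INSERT INTO demo VALUES (?, ?);", rows)
sql_points = [r[0] for r in con.execute(
    """SELECT CASE
                  WHEN phase = 'auto' THEN successes * 4
                  WHEN phase = 'teleop' THEN successes * 2
                  ELSE 0
              END AS goal_points
         FROM demo
     ORDER BY rowid;""")]
con.close()

py_points = [goal_points(phase, successes) for phase, successes in rows]
print(sql_points)  # [8, 10, 0]
print(py_points)   # [8, 10, 0]
```

Like a chain of `if`/`elif` branches, the `WHEN` conditions are checked in order and the first one that matches determines the result.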

[Refer to the W3 Schools SQL Tutorial for additional information on `CASE` statements.](https://www.w3schools.com/sql/sql_case.asp)

### C. Exercises 12 - 13

**Ex. #12.** The *measures* table has two numeric columns: *successes* and *attempts*. The *attempts* column records the number of times a robot attempted a task during a single match, regardless of whether the task was successful. The *successes* column records how many times a robot successfully completed the task.

What team missed the most outer goals during teleop during a single match? In which match did this occur? Your query results should include the team's number, the match, and the number of misses. Sort the query in descending order by the number of misses. Use an alias if necessary to give each column a descriptive name. 

In [None]:
# Ex #12.



**Ex. #13.** Modify and run the example query with the `CASE` statement in section V.B. The query should return records for lower, outer, and inner goals, in both teleop and autonomous phases. It should have a column named *goal_points* that contains the points scored from the goals. The point total should be correct regardless of the type of goal or which phase the goals were scored in. Order the records by number of points scored, in descending order.

In [None]:
# Ex #13.



## VI. Empty Fields
### A. NULL Values
You may have noticed that there are a few teams in the *teams* table for which there are no team names. The *team_name* field is empty. For SQL databases, empty fields are considered to contain *NULL* values. One can search for NULL values using a `WHERE` clause.

In [None]:
# Finding NULL values
query = """SELECT * FROM teams
            WHERE team_name IS NULL;"""
con = sqlite3.connect(db_file)
teams = pd.read_sql_query(query, con)
con.close()
teams

Using the phrase `IS NULL` in the `WHERE` clause restricts the query results to records with an empty *team_name* field. Conversely, we can eliminate records with NULL values with the phrase `IS NOT NULL`.

In [None]:
# Eliminating NULL values
query = """SELECT * FROM teams
            WHERE team_name IS NOT NULL;"""
con = sqlite3.connect(db_file)
teams = pd.read_sql_query(query, con)
con.close()
teams.head()
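
The phrase is `IS NULL` rather than `= NULL` for a reason: in SQL, comparing anything to NULL with `=` never evaluates to true, so such a `WHERE` clause matches no rows. The sketch below shows the difference on an in-memory table with made-up team names (Python's `None` is stored as NULL).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (team TEXT, team_name TEXT);")
# Made-up rows; None becomes a NULL value in the database.
con.executemany("INSERT INTO demo VALUES (?, ?);",
                [("1", "Alpha"), ("2", None), ("3", "Charlie")])

is_null = con.execute(
    "SELECT team FROM demo WHERE team_name IS NULL;").fetchall()
eq_null = con.execute(
    "SELECT team FROM demo WHERE team_name = NULL;").fetchall()
con.close()

print(is_null)  # [('2',)]
print(eq_null)  # []
```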

### B. Sorting NULL Values
How do NULL values behave when an `ORDER BY` clause is used? *SQLite* considers NULL values to be smaller than all other values. So NULLs are returned first for an ascending query and last for a descending query.

In [None]:
# Ascending Query - NULL values returned first
query = """SELECT * FROM teams
            ORDER BY team_name
            LIMIT 8;"""
con = sqlite3.connect(db_file)
teams = pd.read_sql_query(query, con)
con.close()
teams

In [None]:
# Descending Query - NULL values returned last
query = """SELECT * FROM teams
            ORDER BY team_name DESC;"""
con = sqlite3.connect(db_file)
teams = pd.read_sql_query(query, con)
con.close()
teams.tail(8)

The default ordering of records with NULL values can be overridden by using the phrases `NULLS FIRST` or `NULLS LAST` in the `ORDER BY` clause. For example, the first records in the following query do not contain NULL values in the *team_name* column because we forced the NULL values to the end of the table. (NOTE: The `NULLS FIRST` and `NULLS LAST` features were added to SQLite in version 3.30.0. As of 8 November 2021, Google Colab is running SQLite version 3.22.0, so using `NULLS FIRST` or `NULLS LAST` will cause an error.)

In [None]:
# Ascending Query - NULL values forced to the end
query = """SELECT * FROM teams
            ORDER BY team_name NULLS LAST
            LIMIT 8;"""
con = sqlite3.connect(db_file)
teams = pd.read_sql_query(query, con)
con.close()
teams
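
If `NULLS LAST` is not supported by the installed SQLite version, a similar result can be obtained by sorting on the expression `team_name IS NULL` first. That expression evaluates to 0 for rows with a name and 1 for rows without one, so named rows sort ahead of NULL rows. The following is a sketch on an in-memory table with made-up names; it also prints the SQLite version that Python is using.

```python
import sqlite3

# Version of the SQLite library used by Python's sqlite3 module
print(sqlite3.sqlite_version)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (team_name TEXT);")
# Made-up rows; None becomes a NULL value in the database.
con.executemany("INSERT INTO demo VALUES (?);",
                [("Bravo",), (None,), ("Alpha",)])

# Default ascending sort: NULL values come first.
default_order = [r[0] for r in con.execute(
    "SELECT team_name FROM demo ORDER BY team_name;")]

# Sort on (team_name IS NULL) first to push NULL values to the end.
nulls_last = [r[0] for r in con.execute(
    "SELECT team_name FROM demo ORDER BY team_name IS NULL, team_name;")]
con.close()

print(default_order)  # [None, 'Alpha', 'Bravo']
print(nulls_last)     # ['Alpha', 'Bravo', None]
```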

[Refer to the W3 Schools SQL tutorial for more information on NULL values.](https://www.w3schools.com/Sql/sql_null_values.asp)

### C. Exercise 14

**#14.** Run a query on the *teams* table. Use a `CASE` statement to check if the *team_name* field is empty. If it's empty, replace the NULL values with "!!!Name Unknown!!!". Otherwise, display the team's name. The results should have two columns: one with the team's number and one with the team's name. Ensure each column has a short, intuitive name.

In [None]:
# Ex. 14




## VII. Exercises 15 - 17

**#15.** Use the SQLite CLI to run a `SELECT` query that lists all matches that the Spartabots (FRC 2976) play in. The results should have columns for team, match, and date, in that order. Sort the results in ascending order.

Copy the query and results from the SQLite CLI and paste them into this cell. (Double-click on the cell to enter edit mode.)
```
# Paste Query and CLI output below this line.


```

**#16.** Get a list of all phases and tasks from the *measures* table. Ensure there are no duplicated values. Sort the results by *phase* and then *task*.

In [None]:
# Ex. #16.



**#17.** Use a SQL query to get a list of teams that scored at least 4 inner or outer goals in the autonomous phase of at least one match. Ensure there are no duplicate values in the results.

In [None]:
# Ex. 17



## VIII. Concept and Terminology Review
You should be able to define the following terms or describe the concept.
* Relational Database
* Table
* Row
* Record
* Column
* Field
* SQL
* SQLite
* Opening and Closing Connections
* *SQLite* CLI
* Dot commands
* `pandas.read_sql_query()`
* `SELECT`
* `FROM`
* Alias and `AS`
* `DISTINCT`
* `WHERE`
* Boolean Operators
* `LIKE`
* `IN`
* `BETWEEN`
* *sqlite_master* table
* `PRAGMA`
* `ORDER BY`
* `DESC`
* `CAST`
* `CASE`
* `NULL`
* `NULLS LAST`

[Table of Contents](../../index.ipynb)