# Important note!

Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your GT login ID and your collaborators' IDs below:

In [None]:
YOUR_ID = "" # Please enter your GT login, e.g., "rvuduc3" or "gtg911x"
COLLABORATORS = [] # list of strings of your collaborators' IDs

In [None]:
import re

RE_CHECK_ID = re.compile (r'''[a-zA-Z]+\d+|[gG][tT][gG]\d+[a-zA-Z]''')
assert RE_CHECK_ID.match (YOUR_ID) is not None

collab_check = [RE_CHECK_ID.match (i) is not None for i in COLLABORATORS]
assert all (collab_check)

del collab_check
del RE_CHECK_ID
del re

**Jupyter / IPython version check.** The following code cell verifies that you are using the correct version of Jupyter/IPython.

In [None]:
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

# Part 1: SQLite [6 points]

The de facto language for managing relational databases is the Structured Query Language, or SQL ("sequel").

Many commercial and open-source relational database management systems (RDBMS) support SQL. The one we will consider in this class is one of the simplest, called [sqlite3](https://www.sqlite.org/). It stores an entire database in a single file and can be run in a "standalone" mode from the command line. However, we will, naturally, [invoke it from Python](https://docs.python.org/3/library/sqlite3.html).

With a little luck, you _might_ by the end of this class understand this [xkcd comic on SQL injection attacks](http://xkcd.com/327).

## Getting started

In Python, you _connect_ to an `sqlite3` database by creating a _connection object_.

In [None]:
import sqlite3 as db

# Connect to a database (or create one if it doesn't exist)
conn = db.connect ('example.db')

SQLite stores each database in a single file; in this example, the name of that file is `example.db`.

> If the named file does not yet exist, connecting to it in this way will create it.
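
To see this, here is a minimal sketch that connects to a scratch file in a temporary directory and checks whether the file exists before and after. The file name `scratch.db` is a hypothetical choice, used so that the notebook's own `example.db` is untouched. The sketch uses `Connection.execute()`, a shortcut in Python's `sqlite3` module that creates a cursor behind the scenes, and it also shows the special name `':memory:'`, which gives a private in-memory database that disappears when the connection closes.

```python
import os
import shutil
import sqlite3
import tempfile

# Work in a throwaway directory so the notebook's example.db is untouched
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'scratch.db')  # hypothetical file name

existed_before = os.path.exists(path)  # no such file yet

conn = sqlite3.connect(path)  # opens the database, creating the file
conn.execute('create table T (x integer)')  # Connection.execute() is a cursor shortcut
conn.close()

existed_after = os.path.exists(path)
print(existed_before, existed_after)  # False True

# ':memory:' is a special name: the database lives in RAM and vanishes on close
mem = sqlite3.connect(':memory:')
mem.close()

shutil.rmtree(tmpdir)  # remove the scratch directory
```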

To issue commands to the database, you also need to create a _cursor_.

In [None]:
# Create a 'cursor' for executing commands
c = conn.cursor ()

A cursor executes SQL statements against the database and keeps track of your position within the results of the most recent query. You will mostly use the cursor to modify or query the database.
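
The sketch below, which assumes a throwaway in-memory database (`':memory:'`) and a hypothetical one-column table `T`, illustrates that the "position" belongs to the cursor: two cursors on the same connection can run the same query and advance through their results independently.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
c1 = conn.cursor()
c1.execute('create table T (x integer)')  # hypothetical table
c1.executemany('insert into T values (?)', [(1,), (2,), (3,)])

# Two cursors on the same connection, each running its own query
c2 = conn.cursor()
c1.execute('select x from T order by x')
c2.execute('select x from T order by x')

first_c1 = c1.fetchone()   # (1,)
second_c1 = c1.fetchone()  # (2,)
first_c2 = c2.fetchone()   # (1,): c2 keeps its own position in its own results
print(first_c1, second_c1, first_c2)
conn.close()
```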

## Tables and Basic Queries

The main object of a relational database is a _table_.

Conceptually, your data consists of items and attributes. In a database table, the items are _rows_ and the attributes are _columns_.

For instance, suppose we wish to maintain a database of Georgia Tech students, whose attributes are their names and GT IDs. You might start by creating a table named `Students` to hold this data. You can create the table using the [`create table`](https://www.sqlite.org/lang_createtable.html) command.

In [None]:
c.execute ("create table Students (gtid integer, name text)")

> Note: This command will fail if the table already exists. If you are trying to carry out these exercises from scratch, you may need to remove any existing `example.db` first.
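
One way to make a `create table` cell safe to re-run is SQLite's `if not exists` clause. A minimal sketch, assuming a throwaway in-memory database (`':memory:'`) rather than `example.db`:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
c = conn.cursor()
c.execute('create table Students (gtid integer, name text)')

# Repeating a plain 'create table' raises an error...
try:
    c.execute('create table Students (gtid integer, name text)')
    second_create_failed = False
except sqlite3.OperationalError as e:
    second_create_failed = True
    print('Error:', e)  # e.g., "table Students already exists"

# ...whereas 'if not exists' turns the repeat into a no-op
c.execute('create table if not exists Students (gtid integer, name text)')
conn.close()
```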

To populate the table with items, use the [`insert into`](https://www.sqlite.org/lang_insert.html) command.

In [None]:
c.execute ("insert into Students values (123, 'Vuduc')")
c.execute ("insert into Students values (456, 'Chau')")
c.execute ("insert into Students values (381, 'Bader')")
c.execute ("insert into Students values (991, 'Sokol')")
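
The inserts above supply one value per column, in the order the columns were declared. The `insert into` command also accepts an explicit column list, in which case values are matched to the listed columns instead. A small sketch, assuming a throwaway in-memory database (`':memory:'`):

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
c = conn.cursor()
c.execute('create table Students (gtid integer, name text)')

# Values matched to columns by position (gtid, then name)...
c.execute("insert into Students values (123, 'Vuduc')")
# ...or, with a column list, in whatever order the columns are named
c.execute("insert into Students (name, gtid) values ('Chau', 456)")

c.execute('select count(*) from Students')
num_rows = c.fetchone()[0]
c.execute('select gtid, name from Students order by gtid')
rows = c.fetchall()
print(num_rows, rows)  # 2 [(123, 'Vuduc'), (456, 'Chau')]
conn.close()
```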

Given a table, the most common operation is a _query_. The simplest kind of query is called a [`select`](https://www.sqlite.org/lang_select.html).

The following example selects all rows (items) from the `Students` table.

In [None]:
c.execute ("select * from Students")

After executing a query, the cursor holds its results, which you can then retrieve. One way to do that is to call `fetchone()` on the cursor object, which returns the next row of the results as a tuple.

This example calls `fetchone()` twice to get the first two query results.

In [None]:
print (c.fetchone ())
print (c.fetchone ())

An alternative to `fetchone()` is `fetchall()`, which returns a list of tuples for all rows _from the cursor's current position onward_.

> Since the preceding code has already fetched the first two results, calling `fetchall()` at this point will return all _remaining_ results.

In [None]:
print (c.fetchall ())

What will calling `fetchone()` at this point return?

In [None]:
print (c.fetchone ())
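
The following sketch summarizes how `fetchone()` and `fetchall()` share one position in the results. It assumes a throwaway in-memory database (`':memory:'`) and a hypothetical one-column table `T` holding the values 0 through 4.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
c = conn.cursor()
c.execute('create table T (x integer)')  # hypothetical table
c.executemany('insert into T values (?)', [(i,) for i in range(5)])

c.execute('select x from T order by x')
first = c.fetchone()   # (0,)
second = c.fetchone()  # (1,)
rest = c.fetchall()    # [(2,), (3,), (4,)]: only the rows not yet fetched
again = c.fetchall()   # []: the results are exhausted
last = c.fetchone()    # None: nothing left to fetch
print(first, second, rest, again, last)
conn.close()
```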

Here is an alternative, and arguably more natural, idiom for executing a query and iterating over its results.

In [None]:
query = 'select * from Students'
for student in c.execute (query):
    print (student)
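
This idiom works because the cursor's `execute()` method returns the cursor itself, and a cursor can be iterated over row by row. The sketch below, assuming a throwaway in-memory database (`':memory:'`), checks that, and also adds an `order by` clause: without one, SQL makes no promise about the order in which rows come back.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
c = conn.cursor()
c.execute('create table Students (gtid integer, name text)')
c.executemany('insert into Students values (?, ?)',
              [(456, 'Chau'), (123, 'Vuduc'), (381, 'Bader')])

# execute() hands back the same cursor object, so it can drive a for-loop
result = c.execute('select * from Students')
same_object = result is c

# 'order by' makes the row order well defined
names = [name for (gtid, name) in c.execute('select gtid, name from Students order by gtid')]
print(same_object, names)  # True ['Vuduc', 'Bader', 'Chau']
conn.close()
```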

## An insertion idiom

Another common operation is to perform a bunch of insertions into a table from a list of tuples. In this case, you can use `executemany()`.

In [None]:
# An important (and secure!) idiom
more_students = [(723, 'Rozga'),
                 (882, 'Zha'),
                 (401, 'Park'),
                 (377, 'Vetter'),
                 (904, 'Brown')]

c.executemany ('insert into Students values (?, ?)', more_students)

query = 'select * from Students'
for student in c.execute (query):
    print (student)
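
Why is the `?` placeholder the secure idiom? A minimal sketch, assuming a throwaway in-memory database (`':memory:'`) and a hypothetical student named `O'Brien`: building the SQL text by string formatting breaks on the embedded quote (and, with hostile input, is exactly the opening exploited in the xkcd comic above), whereas a placeholder passes the value to SQLite separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
c = conn.cursor()
c.execute('create table Students (gtid integer, name text)')

tricky_name = "O'Brien"  # hypothetical name containing a quote

# String formatting splices the quote into the SQL and produces a syntax error
try:
    c.execute("insert into Students values (999, '%s')" % tricky_name)
    formatting_failed = False
except sqlite3.OperationalError:
    formatting_failed = True

# A '?' placeholder keeps the value separate from the SQL text
c.execute('insert into Students values (?, ?)', (999, tricky_name))
c.execute('select name from Students')
stored = c.fetchall()
print(formatting_failed, stored)  # True [("O'Brien",)]
conn.close()
```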

**Exercise 1** (2 points). Suppose we wish to maintain a second table, called `Takes`, which records the classes that students have taken and the grades they earned.

In particular, each row of `Takes` stores a student's GT ID, a course that student took, and the grade earned in it. More formally, suppose this table is defined as follows:

In [None]:
c.execute ('create table Takes (gtid integer, course text, grade real)')

Write a command to insert the following records into the `Takes` table.

* Vuduc: CSE 6040 - C (2.0), ISYE 6644 - B (3.0), MGMT 8803 - D (1.0)
* Sokol: CSE 6040 - A (4.0), ISYE 6740 - A (4.0)
* Chau: CSE 6040 - A (4.0), CSE 6740 - C (2.0), MGMT 8803 - B (3.0)

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Displays the results of your code
c.execute ('select * from Takes')
results = c.fetchall ()
print ("Your results:", len (results), "\nThe entries of Takes:", results)

assert (991, "CSE 6040", 4.0) in results
assert (456, "CSE 6040", 4.0) in results
assert (123, "CSE 6040", 2.0) in results
assert (123, "ISYE 6644", 3.0) in results
assert (123, "MGMT 8803", 1.0) in results
assert (991, "ISYE 6740", 4.0) in results
assert (456, "CSE 6740", 2.0) in results
assert (456, "MGMT 8803", 3.0) in results
assert len (results) == 8

print ("\n(Passed.)")

## Join queries

The main type of query that combines information from multiple tables is the _join query_. Recall these four types from our discussion of tibbles:

- `inner-join (A, B)`: Keep rows of `A` and `B` only where `A` and `B` match.
- `outer-join (A, B)`: Keep all rows of `A` and `B`, but merge matching rows and fill in missing values with some default (`NaN` in Pandas, `NULL` in SQL).
- `left-join (A, B)`: Keep all rows of `A` but only merge matches from `B`.
- `right-join (A, B)`: Keep all rows of `B` but only merge matches from `A`.
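
SQL can also spell these joins out explicitly with `join ... on`. The sketch below contrasts an inner join with a left join on two hypothetical tables, `A` and `B`, in a throwaway in-memory database (`':memory:'`); missing values from `B` come back to Python as `None`, the counterpart of SQL `NULL`. It sticks to inner and left joins because SQLite only added `right join` and `full outer join` in version 3.39.0, and the SQLite library bundled with your Python may be older.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
c = conn.cursor()
# Hypothetical tables: A has ids 1, 2, 3; B has ids 2, 3, 4
c.execute('create table A (id integer, x text)')
c.execute('create table B (id integer, y text)')
c.executemany('insert into A values (?, ?)', [(1, 'a1'), (2, 'a2'), (3, 'a3')])
c.executemany('insert into B values (?, ?)', [(2, 'b2'), (3, 'b3'), (4, 'b4')])

# Inner join: only ids present in both tables
inner = c.execute('''
    select A.id, A.x, B.y
        from A inner join B on A.id = B.id
        order by A.id
''').fetchall()

# Left join: every row of A; unmatched B columns become None (SQL NULL)
left = c.execute('''
    select A.id, A.x, B.y
        from A left join B on A.id = B.id
        order by A.id
''').fetchall()

print(inner)  # [(2, 'a2', 'b2'), (3, 'a3', 'b3')]
print(left)   # [(1, 'a1', None), (2, 'a2', 'b2'), (3, 'a3', 'b3')]
conn.close()
```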

In SQL, one way to express an inner join is to list the tables in the `from` clause and use the `where` clause of the `select` statement to specify how rows from those tables should match.

For example, recall that the `Takes` table stores the classes taken by each student, but it identifies each student only by GT ID. Suppose we want a report that shows each student's name instead. We can get the matching name from the `Students` table. Here is a query that performs this matching:

In [None]:
# See all (name, course, grade) tuples
query = '''
    select Students.name, Takes.course, Takes.grade
        from Students, Takes
        where Students.gtid=Takes.gtid
'''

for match in c.execute (query):
    print (match)
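
The `from ... where` form above is equivalent to an explicit `inner join ... on`. A minimal sketch that checks this on a throwaway in-memory database (`':memory:'`), using hypothetical rows (including a `Takes` entry for gtid 999, which has no matching student and is therefore dropped by both queries):

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
c = conn.cursor()
c.execute('create table Students (gtid integer, name text)')
c.execute('create table Takes (gtid integer, course text, grade real)')
# Hypothetical rows; gtid 999 has no matching student
c.executemany('insert into Students values (?, ?)', [(123, 'Vuduc'), (381, 'Bader')])
c.executemany('insert into Takes values (?, ?, ?)',
              [(123, 'CSE 1000', 3.0), (381, 'CSE 1000', 4.0), (999, 'CSE 1000', 2.0)])

where_style = c.execute('''
    select Students.name, Takes.course, Takes.grade
        from Students, Takes
        where Students.gtid = Takes.gtid
''').fetchall()

join_style = c.execute('''
    select Students.name, Takes.course, Takes.grade
        from Students inner join Takes on Students.gtid = Takes.gtid
''').fetchall()

# Compare as sets, since neither query has an 'order by'
print(set(where_style) == set(join_style), sorted(join_style))
# True [('Bader', 'CSE 1000', 4.0), ('Vuduc', 'CSE 1000', 3.0)]
conn.close()
```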

**Exercise 2** (2 points). Define a query to select only the names and grades of students _who took CSE 6040_. The code below will execute your query and store the results in a list `results1` of tuples, where each tuple is a `(name, grade)` pair; thus, you should structure your query to match this format.

In [None]:
# Define `query` here:
# YOUR CODE HERE
raise NotImplementedError()

# The following code executes your `query`:
c.execute (query)
results1 = c.fetchall ()

In [None]:
print ("Your results:", results1)

assert type (results1) is list
assert len (results1) == 3
assert ('Sokol', 4.0) in results1
assert ('Chau', 4.0) in results1
assert ('Vuduc', 2.0) in results1

print ("\nPassed.")

## Aggregations

Another common style of query is an _aggregation_, which is a summary of information across multiple records, rather than the raw records themselves.

For instance, suppose we want to compute the GPA for each unique GT ID from the `Takes` table. Here is a query that does it:

In [None]:
query = '''
    select gtid, avg (grade)
        from Takes
        group by gtid
'''

for match in c.execute (query):
    print (match)
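
`avg` is only one of SQLite's aggregate functions; `count`, `min`, and `max` work the same way with `group by`, producing one output row per group. A sketch on a throwaway in-memory database (`':memory:'`) with hypothetical grades for two hypothetical students:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
c = conn.cursor()
c.execute('create table Takes (gtid integer, course text, grade real)')
# Hypothetical grades for two students
c.executemany('insert into Takes values (?, ?, ?)',
              [(111, 'C1', 4.0), (111, 'C2', 3.0),
               (222, 'C1', 2.0), (222, 'C2', 3.0), (222, 'C3', 4.0)])

# One output row per gtid, summarizing that group's grades
summary = c.execute('''
    select gtid, count(*), min(grade), max(grade), avg(grade)
        from Takes
        group by gtid
        order by gtid
''').fetchall()
print(summary)  # [(111, 2, 3.0, 4.0, 3.5), (222, 3, 2.0, 4.0, 3.0)]
conn.close()
```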

**Exercise 3** (2 points). Define a query to compute the GPA (that is, the average grade) of every student. The code below will execute your query and store the results in a list `results2` of tuples, where each tuple is a `(name, gpa)` pair; thus, you should structure your query to match this format.

In [None]:
# Define an SQL `query` string that computes the GPA of every student:
# YOUR CODE HERE
raise NotImplementedError()

# Executes your `query`, producing `results2`:
c.execute (query)
results2 = c.fetchall ()

In [None]:
print ("Your results:", results2)

assert ('Vuduc', 2.0) in results2
assert ('Chau', 3.0) in results2
assert ('Sokol', 4.0) in results2
assert len (results2) == 3

print ("\n(Passed.)")

## Cleanup

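As a final note, it's good practice to close the cursor and the connection when you are done, just as you close files. Be aware that `close()` does not commit pending changes: call `conn.commit()` first if you want your insertions saved to `example.db`.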

In [None]:
c.close()
conn.close()
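
The sketch below illustrates committing and closing, assuming a scratch file in a temporary directory (the name `scratch.db` is hypothetical). Using a connection as a context manager (`with conn:`) commits on success and rolls back if an exception escapes, but it does not close the connection, so `contextlib.closing` is added for that. The last part shows that an insert that is never committed is lost when the connection closes.

```python
import os
import shutil
import sqlite3
import tempfile
from contextlib import closing

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'scratch.db')  # hypothetical throwaway file

# closing() guarantees conn.close(); 'with conn:' commits if the block succeeds
with closing(sqlite3.connect(path)) as conn:
    with conn:
        conn.execute('create table T (x integer)')
        conn.execute('insert into T values (1)')

# This insert is never committed, so it is discarded on close
conn = sqlite3.connect(path)
conn.execute('insert into T values (2)')
conn.close()  # no commit() first

with closing(sqlite3.connect(path)) as conn:
    rows = conn.execute('select x from T').fetchall()
print(rows)  # [(1,)]

shutil.rmtree(tmpdir)  # remove the scratch directory
```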