# Denison CS181/DA210 SW Lab #10 - Step 2

Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\rightarrow$Restart And Run All).

Make sure you fill in any place that says `# YOUR CODE HERE` or "YOUR ANSWER HERE".

---

#### Import Python modules and load "SQL Magic"

In [None]:
import pandas as pd
import os
import os.path
import json
import sys
import importlib

module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if module_path not in sys.path:
    sys.path.append(module_path)

%load_ext sql

#### Set credentials

In [None]:
def getsqlite_creds(dirname=".",filename="creds.json"):
    """ Using directory and filename parameters, open a credentials file
        and obtain the three parts needed for a connection string to
        a local provider using the "sqlite" dictionary within
        an outer dictionary.

        Return a scheme, a database directory, and a database name
    """
    assert os.path.isfile(os.path.join(dirname, filename))
    with open(os.path.join(dirname, filename)) as f:
        D = json.load(f)
    sqlite = D["sqlite"]
    return sqlite["scheme"], sqlite["dbdir"], sqlite["database"]

In [None]:
scheme, dbdir, database = getsqlite_creds()
template = '{}:///{}/{}.db'
cstring = template.format(scheme, dbdir, database)
print("Connection string:", cstring)
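
To make the shape of the credentials file concrete, here is a minimal sketch of how the connection string is assembled. The JSON contents below are an assumption for illustration only: the key names match what `getsqlite_creds` reads, but the `dbdir` value is hypothetical and your own `creds.json` may differ.

```python
import json

# Hypothetical contents of creds.json; only the key names are taken from
# getsqlite_creds above, and the dbdir value is made up for illustration.
creds_text = """
{
  "sqlite": {
    "scheme": "sqlite",
    "dbdir": "../../dbfiles",
    "database": "book"
  }
}
"""

# Pull out the inner "sqlite" dictionary, as getsqlite_creds does.
sqlite_creds = json.loads(creds_text)["sqlite"]

# Fill the same template used in the notebook cell above.
cstring = "{}:///{}/{}.db".format(
    sqlite_creds["scheme"], sqlite_creds["dbdir"], sqlite_creds["database"]
)
print("Connection string:", cstring)
```

With these assumed values, the three pieces combine into `sqlite:///../../dbfiles/book.db`.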

#### Establish connection from client to server

In [None]:
%sql $cstring

---

## Part D: Single-Table Column Projection

#### Simple SELECT statements

In SQL, every query uses the `SELECT` statement.

The syntax for a SQL query to project column fields from a table is given by:

`SELECT` _field-spec_ `FROM` _table-spec_

where _table-spec_ is the name of a table (we'll explore more complex options later), and _field-spec_ consists of one or more _expressions_, separated by commas, as given in the following syntax:

_field-spec_ |= _expression_ [, _expression_ ]*

This syntax indicates that _field-spec_ must have at least one _expression_, and may have 0 or more additional _expressions_.

Here is an example for the `indicators0` table in `book.db`:

In [None]:
%sql SELECT code, pop, gdp FROM indicators0

If we want to select all fields for a given table, we can use `*` for the expression:

In [None]:
%sql SELECT * FROM indicators0

We can, accordingly, update our syntax expression for _field-spec_:

_field-spec_ |= _expression_ [, _expression_ ]* | `*`

In this syntax expression, `|` indicates alternatives.
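
The two alternatives for _field-spec_ can also be tried outside the notebook's `%sql` magic. The following is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table here is a toy stand-in for `indicators0`: the fourth column (`life`) and all row values are made up for illustration and are not taken from `book.db`.

```python
import sqlite3

# An in-memory database, so nothing on disk is touched.
conn = sqlite3.connect(":memory:")

# Toy stand-in for indicators0; the column `life` and the rows are invented.
conn.execute("CREATE TABLE indicators0 (code TEXT, pop REAL, gdp REAL, life REAL)")
conn.executemany(
    "INSERT INTO indicators0 VALUES (?, ?, ?, ?)",
    [("AAA", 1.0, 10.0, 70.0), ("BBB", 2.0, 20.0, 80.0)],
)

# First alternative: a comma-separated list of expressions.
two_fields = conn.execute("SELECT code, gdp FROM indicators0").fetchall()

# Second alternative: * projects every field of the table.
all_fields = conn.execute("SELECT * FROM indicators0").fetchall()

print(two_fields)
print(all_fields)
conn.close()
```

Each row of `two_fields` is a 2-tuple, while each row of `all_fields` has one entry per column of the table.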

#### More complex column projections

The `SELECT` statements we've seen so far may result in a huge number of records.  We can use the `LIMIT` keyword to restrict the projection to only a handful:

In [None]:
%sql SELECT name FROM topnames LIMIT 5

We can expand our `SELECT` statement syntax accordingly:

`SELECT` _field-spec_ `FROM` _table-spec_  
[ _limit-clause_ ]

Here, [ ] indicate that _limit-clause_ is optional.  It is given by:

_limit-clause_ |= `LIMIT` _number_

Additionally, we can use column projection to reorder or even rename the columns in the result (this does not impact the original data table, however):

In [None]:
%sql SELECT code, pop AS Population, gdp AS GDP FROM indicators0

Note that we can rewrite the above query to be a little more readable by storing the query as a multi-line string, and then sending it to the database management system.  We'll use this structure going forward for more readable SQL queries.

In [None]:
query = """
SELECT code, pop AS Population, gdp AS GDP
FROM indicators0
"""
%sql $query
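
To see that aliasing and reordering affect only the result and not the stored table, here is a small sketch using Python's built-in `sqlite3` module. It assumes a toy in-memory table named `indicators0` with made-up rows; it is not the table in `book.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-in for indicators0 with invented values.
conn.execute("CREATE TABLE indicators0 (code TEXT, pop REAL, gdp REAL)")
conn.executemany(
    "INSERT INTO indicators0 VALUES (?, ?, ?)",
    [("AAA", 1.0, 10.0), ("BBB", 2.0, 20.0), ("CCC", 3.0, 30.0)],
)

# Reorder the fields, rename two of them, and keep at most two records.
query = """
SELECT gdp AS GDP, code, pop AS Population
FROM indicators0
LIMIT 2
"""
cursor = conn.execute(query)

# cursor.description holds one entry per result column; item 0 is its name.
result_columns = [entry[0] for entry in cursor.description]
rows = cursor.fetchall()

# PRAGMA table_info reports the stored table's columns; item 1 is the name.
table_columns = [info[1] for info in conn.execute("PRAGMA table_info(indicators0)")]

print(result_columns)
print(table_columns)
conn.close()
```

The result columns follow the order and names given in the `SELECT`, while the column names of the table itself are unchanged.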

---

## Part E: Your First SQL Statements

In the following cells, your only action is to remove the `# YOUR CODE HERE` and `raise NotImplementedError()` lines, and then put a valid SQL statement as the **value** of string variable `query`.

In each case, when you execute the cell, the query will be sent to the database management system, a result obtained, and the result converted into a `pandas` data frame, whose first few rows are shown.  (This allows for testing the results as well as displaying them in your Jupyter Notebook.)

_Note: you may want to view the `book` database in SQLiteStudio to get an idea of what the field names are as you work through these exercises._

**Q1:** Using the table `countries`, give the SQL query you would use to obtain a table of all country names. There should be no other columns projected.

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()

In [None]:
# Testing cell
assert len(resultdf) == 217
assert 'Aruba' in list(resultdf['country'])

**Q2:** Project the year, code, population, and number of cell-phones from `indicators`.  Rename each column so that its name begins with a capital letter (for example, `pop` becomes `Pop`).

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()

In [None]:
# Testing cell
assert len(resultdf) == 12862
assert resultdf.shape == (12862,4)
assert 'Pop' in resultdf.columns
assert 'Cell' in resultdf.columns

---

## Part F: Ordering Results

In the relational model, tables in the database do not have an a priori order to their records (at least, not one we should rely on).  This makes sense if we think of them as sets.

SQL provides the ability for us to order (i.e., sort) the result of a query with an `ORDER BY` clause that follows the rest of the SQL query, and comes just before any `LIMIT` clause:

`SELECT` _field-spec_ `FROM` _table-spec_  
[ _order-clause_ ]  
[ _limit-clause_ ]

Here, [ ] again indicates that a clause is optional.  The _order-clause_ is given by:

_order-clause_ |= `ORDER BY` _order-term_ [, _order-term_ ]*

_order-term_ |= _expression_ [ `ASC` | `DESC` ]

For example, with no ordering:

In [None]:
query = """
SELECT * FROM topnames
LIMIT 3
"""
%sql $query

We can instead order by year, most recent first:

In [None]:
query = """
SELECT * FROM topnames
ORDER BY year DESC
LIMIT 4
"""
%sql $query

We can provide multiple order terms, resulting in a multi-level sort.  For example, we could sort `topnames` first by sex, and then within each sex, sort by count (note that these are not necessarily the fields making up the primary key in a table):

In [None]:
query = """
SELECT * FROM topnames
ORDER BY sex, count DESC
LIMIT 8
"""

# Note: ORDER BY defaults to ASC if not specified

%sql $query
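
The multi-level sort can be traced by hand on a tiny example. The sketch below uses Python's built-in `sqlite3` module with a toy in-memory `topnames` table; the column layout is assumed from the queries above, and the names and counts are invented placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-in for topnames; rows are placeholders, not real data.
conn.execute("CREATE TABLE topnames (year INTEGER, sex TEXT, name TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO topnames VALUES (?, ?, ?, ?)",
    [
        (2001, "Male", "N1", 300),
        (2001, "Female", "N2", 500),
        (2002, "Male", "N3", 400),
        (2002, "Female", "N4", 200),
    ],
)

# First level: sex ascending (the default). Second level: count descending.
# LIMIT is applied after the ordering, so it keeps the first three sorted rows.
query = """
SELECT sex, count, year FROM topnames
ORDER BY sex, count DESC
LIMIT 3
"""
ordered = conn.execute(query).fetchall()

for row in ordered:
    print(row)
conn.close()
```

Both `Female` rows come first, largest count first, followed by the `Male` row with the larger count; the fourth row is cut off by `LIMIT 3`.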

**Q3:** Using the SQL table `countries`, project all columns and produce a table of rows ordered by landmass, from smallest to largest.

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()

In [None]:
# Testing cell
assert len(resultdf) == 217
assert resultdf.iloc[0,0] == 'CUW'
assert resultdf.iloc[-1,0] == 'RUS'

**Q4:** Use SQL to find the rows with the top 20 GDP values in `indicators`.  You may project all the columns.

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()

In [None]:
# Testing cell
assert len(resultdf) == 20
assert resultdf.loc[0, 'year'] == 2018
assert resultdf.loc[0, 'code'] == 'USA'
assert resultdf.loc[0, 'gdp'] == 20494.1

**Q5:** Use a SQL query to answer: in which country, and in what year, was the number of cell phones greatest? Project year, code, and cell.

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()

In [None]:
# Testing cell
assert resultdf.loc[0,'cell'] == 1469.88
assert resultdf.loc[0,'year'] == 2017
assert resultdf.loc[0,'code'] == 'CHN'

> You've reached the third (and final) checkpoint in the lab.  Make sure to have it signed off by the instructor or TA.
>
> Checkpoint 3: Does the order of the fields in your `SELECT` statement have to match the order of the columns in the original table?  Also, if you did not already, how could you modify your query in the previous question to result in only a single record (row)?

---

## Part G

How much time (in minutes/hours) did you spend on this lab outside of class?

YOUR ANSWER HERE