# Denison CS181/DA210 SW Lab #12 - Step 2

Before you turn this problem in, make sure everything runs as expected. This is a combination of **restarting the kernel** and then **running all cells** (in the menubar, select Kernel$\rightarrow$Restart And Run All).

Make sure you fill in any place that says `# YOUR CODE HERE` or "YOUR ANSWER HERE".

---

#### Import Python modules

In [None]:
import pandas as pd
import os
import os.path
import json
import sqlalchemy as sa

#### Set credentials

In [None]:
def getsqlite_creds(dirname=".",filename="creds.json",source="sqlite"):
    """ Using directory and filename parameters, open a credentials file
        and obtain the three parts needed for a connection string to
        a local provider, using the dictionary stored under the `source`
        key within an outer dictionary.

        Return a scheme, a dbdir, and a database name
    """
    assert os.path.isfile(os.path.join(dirname, filename))
    with open(os.path.join(dirname, filename)) as f:
        D = json.load(f)
    sqlite = D[source]
    return sqlite["scheme"], sqlite["dbdir"], sqlite["database"]

In [None]:
def buildConnectionString(source="sqlite_book"):
    scheme, dbdir, database = getsqlite_creds(source=source)
    template = '{}:///{}/{}.db'
    return template.format(scheme, dbdir, database)

cstring = buildConnectionString("sqlite_book")
print("Connection string:", cstring)
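
To make the expected file layout concrete, here is a minimal sketch of how a credentials entry turns into a connection string. The key names (`scheme`, `dbdir`, `database`) and the template come from the functions above; the values `./dbfiles` and `book` are hypothetical placeholders, and the actual `creds.json` for this lab may contain different ones.

```python
import json

# Hypothetical credentials text; the real dbdir and database values may differ
creds_text = """
{
  "sqlite_book": {"scheme": "sqlite", "dbdir": "./dbfiles", "database": "book"}
}
"""

# Parse the outer dictionary, then pick the inner dictionary by its source key
creds = json.loads(creds_text)
entry = creds["sqlite_book"]

# Same template as buildConnectionString: scheme:///dbdir/database.db
demo_cstring = "{}:///{}/{}.db".format(entry["scheme"], entry["dbdir"], entry["database"])
print(demo_cstring)
```

With these placeholder values, the result is `sqlite:///./dbfiles/book.db`.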

---

## Part D: Using Variables with `sqlalchemy`

#### Template strings

We already know how to use Python string formatting to build template strings.  We can use the same strategy to write SQL queries whose values (e.g., in the `WHERE` clause) come from Python variables:

In [None]:
# Re-connect to the database
engine = sa.create_engine(cstring)
connection = engine.connect()

In [None]:
# Build a query template string
query_template = """
SELECT code, pop, gdp, life
FROM indicators
WHERE year = {} AND life > {}
"""

In [None]:
# Use the template with year=2010 and threshold=82
query_2010 = query_template.format(2010, 82)
df1 = pd.read_sql_query(query_2010, con=connection)
df1

In [None]:
# Use the template with year=2017 and threshold=84
query_2017 = query_template.format(2017, 84)
df2 = pd.read_sql_query(query_2017, con=connection)
df2

We can, in fact, wrap this query in a function.  In the SQL query below, we also join with `countries` to get the names of the countries that pass the filters.

In [None]:
def indicators_life(dbcon, year=2017, threshold=0):
    query_template = """
    SELECT i.code, country, pop, gdp, life
    FROM indicators AS i
        INNER JOIN countries USING (code)
    WHERE year = {} AND life > {}
    """

    query = query_template.format(year, threshold)
    # Use the connection passed in as a parameter, not the global one
    return pd.read_sql_query(query, con=dbcon)

In [None]:
df3 = indicators_life(connection, year=2000, threshold=80)
df3
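
The templates above substitute numbers only. With text values, string formatting needs more care: the placeholder must be wrapped in single quotes, and a value that itself contains an apostrophe can break the query. The sketch below illustrates this with the standard-library `sqlite3` module (used here instead of `sqlalchemy` only to keep the example small) and a hypothetical in-memory `people` table that is not part of the lab databases.

```python
import sqlite3

# Hypothetical in-memory table, used only for this illustration
demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE people (name TEXT, age INTEGER)")
demo_con.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ana", 31), ("O'Neil", 45)],
)

# Text values need single quotes around the placeholder
template = "SELECT name, age FROM people WHERE name = '{}'"

# A plain value produces a valid query
rows_ok = demo_con.execute(template.format("Ana")).fetchall()

# An apostrophe in the value ends the SQL string literal early,
# leaving the rest of the value as a syntax error
bad_query = template.format("O'Neil")
try:
    demo_con.execute(bad_query)
    failed = False
except sqlite3.OperationalError:
    failed = True

demo_con.close()
print(rows_ok, failed)
```

The same weakness is what makes formatted queries vulnerable to SQL injection when values come from untrusted input.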

#### Variable binding

The `sqlalchemy` package provides another possibility: _binding_ variables.  We name the bound variables in the query string (each prefixed with a colon, e.g., `:yr`), bind those names to values in Python, and execute the resulting statement.

In [None]:
def indicators_life2(connection, year, threshold):
    # Prepare
    pyquery = """
    SELECT i.code, country, pop, gdp, life
    FROM indicators AS i
        INNER JOIN countries USING (code)
    WHERE year = :yr AND life > :thrsh
    """
    prepare_statement = sa.sql.text(pyquery)
    
    # Bind variables (match :x in query)
    bound_statement = prepare_statement.bindparams(yr = year, thrsh = threshold)

    # Execute
    df = pd.read_sql_query(bound_statement, connection)
    return df

In [None]:
df4 = indicators_life2(connection, 1980, 76)
df4

In [None]:
df5 = indicators_life2(connection, 1990, 77)
df5
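
Bound values are passed to the database separately from the query text, so text values need no manual quoting. As a minimal sketch, the example below uses a hypothetical in-memory `people` table (not part of the lab databases) and the same `sa.sql.text` and `bindparams` calls as `indicators_life2` to look up a name containing an apostrophe. The names `demo_engine` and `demo_conn` are chosen here so the notebook's own `engine` and `connection` are left untouched.

```python
import sqlalchemy as sa

# Separate in-memory engine so the lab's engine and connection are unaffected
demo_engine = sa.create_engine("sqlite:///:memory:")

with demo_engine.connect() as demo_conn:
    # Hypothetical table, created only for this illustration
    demo_conn.execute(sa.sql.text("CREATE TABLE people (name TEXT, age INTEGER)"))
    demo_conn.execute(
        sa.sql.text("INSERT INTO people VALUES (:name, :age)"),
        [{"name": "Ana", "age": 31}, {"name": "O'Neil", "age": 45}],
    )

    # Prepare, bind, execute: no quotes around :who in the query text
    stmt = sa.sql.text("SELECT name, age FROM people WHERE name = :who")
    bound = stmt.bindparams(who="O'Neil")
    rows = [tuple(r) for r in demo_conn.execute(bound)]

demo_engine.dispose()
print(rows)
```

Here the apostrophe in the value causes no error, because the value is never spliced into the SQL string.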

In [None]:
# Close the connection!
try:
    connection.close()
except Exception:
    pass
del engine

---

We'll use this technique later for populating tables with data.  For now, let's get some practice using the `school` database.

In [None]:
# Build the connection

cstring = buildConnectionString("sqlite_school")
print("Connection string:", cstring)

# Connect to the database
engine = sa.create_engine(cstring)
connection = engine.connect()

**Q3:** Write a function
```
    departmentsByDivision(connection, division)
```
that queries the `departments` table for the set of departments (all fields) in the given `division`.  Use Python string composition and the `format()` method for building the query.  Return a `pandas` `DataFrame`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Testing cell
df_finearts = departmentsByDivision(connection, "Fine Arts")
assert df_finearts.shape == (5, 4)
assert "Cinema" in list(df_finearts["departmentname"])

df_inter = departmentsByDivision(connection, "Interdisciplinary")
assert df_inter.shape == (13, 4)
assert "Biology" not in list(df_inter["departmentname"])

**Q4:** Write a function
```
    departmentsByDivision2(connection, division)
```
that queries the `departments` table for the set of departments (all fields) in the given `division`.  Use _variable binding_ for building the query.  Return a `pandas` `DataFrame`.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Testing cell
df_finearts = departmentsByDivision2(connection, "Fine Arts")
assert df_finearts.shape == (5, 4)
assert "Cinema" in list(df_finearts["departmentname"])

df_inter = departmentsByDivision2(connection, "Interdisciplinary")
assert df_inter.shape == (13, 4)
assert "Biology" not in list(df_inter["departmentname"])

In [None]:
# Close the connection!
try:
    connection.close()
except Exception:
    pass
del engine