# Denison CS181/DA210 SW Lab #11 - Step 1

Before you turn this problem in, make sure everything runs as expected: first **restart the kernel**, then **run all cells** (in the menu bar, select Kernel$\rightarrow$Restart And Run All).

Make sure you fill in any place that says `# YOUR CODE HERE` or "YOUR ANSWER HERE".

---

#### Import Python modules and load "SQL Magic"

In [None]:
import pandas as pd
import os
import os.path
import json
import sys
import importlib

module_dir = "../../modules"
module_path = os.path.abspath(module_dir)
if module_path not in sys.path:
    sys.path.append(module_path)

%load_ext sql

#### Set credentials

In [None]:
def getsqlite_creds(dirname=".", filename="creds.json", source="sqlite"):
    """ Using directory and filename parameters, open a credentials file
        containing an outer dictionary, and read the inner dictionary
        named by `source` to obtain the three parts needed for a
        connection string to a local SQLite provider.

        Return the scheme, the database directory, and the database name.
    """
    assert os.path.isfile(os.path.join(dirname, filename))
    with open(os.path.join(dirname, filename)) as f:
        D = json.load(f)
    sqlite = D[source]
    return sqlite["scheme"], sqlite["dbdir"], sqlite["database"]

In [None]:
scheme, dbdir, database = getsqlite_creds(source="sqlite2")
template = '{}:///{}/{}.db'
cstring = template.format(scheme, dbdir, database)
print("Connection string:", cstring)
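`getsqlite_creds` expects `creds.json` to hold an outer dictionary keyed by source name (here `"sqlite2"`), where each entry supplies `scheme`, `dbdir`, and `database`. The sketch below shows that assumed layout and the connection string it produces. The directory and database values are hypothetical placeholders, not the lab's actual settings.

```python
import json

# Hypothetical contents of creds.json. Only the key names come from
# getsqlite_creds; the values are placeholders.
creds_text = """
{
  "sqlite2": {
    "scheme": "sqlite",
    "dbdir": "/path/to/dbdir",
    "database": "school"
  }
}
"""

creds = json.loads(creds_text)
entry = creds["sqlite2"]

# Same template as the notebook: scheme, ":///", directory, "/", name + ".db"
template = '{}:///{}/{}.db'
cstring = template.format(entry["scheme"], entry["dbdir"], entry["database"])
print(cstring)  # sqlite:////path/to/dbdir/school.db
```

With an absolute `dbdir`, the result has four slashes after `sqlite:`. SQLAlchemy-style SQLite URLs use this form to denote an absolute path. A relative `dbdir` produces three.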

#### Establish Connection from Client to Server

In [None]:
%sql $cstring

---

## Part A: Two-Table Inner Joins

For this lab, you'll be working with the `school` database.  The schema is shown below.

In [None]:
from IPython.display import Image
Image("figs/school_schema.jpg", width=600, height=600)

Recall from class that we can join two tables together using the following change to our _table-spec_:

_table-spec_ |= _table_ | (subquery) AS newName | _join-table_

where an inner join is specified using:

_join-table_ |= _table-spec_ [AS alias1] INNER JOIN _table-spec_ [AS alias2] ON _match-cond_

or:

_join-table_ |= _table-spec_ [AS alias1] INNER JOIN _table-spec_ [AS alias2] USING (_common-fields_)
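The two forms of _join-table_ are interchangeable whenever the matching columns have the same name in both tables. The sketch below runs each form against two small, hypothetical tables (`depts` and `profs`, which are not part of the `school` database) in an in-memory SQLite database and checks that they return the same rows.

```python
import sqlite3

# Two hypothetical tables that share a column named "deptid"
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE depts (deptid INTEGER, deptname TEXT);
CREATE TABLE profs (profid INTEGER, proflast TEXT, deptid INTEGER);
INSERT INTO depts VALUES (1, 'Math'), (2, 'Physics');
INSERT INTO profs VALUES (10, 'Noether', 1), (11, 'Curie', 2), (12, 'Gauss', 1);
""")

# Inner join with an explicit ON match condition
on_rows = conn.execute("""
SELECT p.proflast, d.deptname
FROM profs AS p INNER JOIN depts AS d ON p.deptid = d.deptid
ORDER BY p.profid
""").fetchall()

# The same join written with USING, because the column name matches
using_rows = conn.execute("""
SELECT proflast, deptname
FROM profs INNER JOIN depts USING (deptid)
ORDER BY profid
""").fetchall()

print(on_rows)              # [('Noether', 'Math'), ('Curie', 'Physics'), ('Gauss', 'Math')]
print(on_rows == using_rows)  # True
```

`USING` only works when the column names are identical. With `ON`, a column such as `deptid` that exists in both tables must be qualified with a table name or alias to avoid ambiguity.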

For example, we could find the meeting times of each spring-semester class of any course worth at least 2 credit hours (sorted by course subject and then course number):

In [None]:
# A fairly complex SQL query, with an example of a two-table inner join
query = """
SELECT co.coursesubject, co.coursenum, co.coursehours, cl.classmeeting
FROM classes AS cl INNER JOIN courses AS co
    ON cl.coursesubject = co.coursesubject AND
       cl.coursenum = co.coursenum
WHERE cl.classterm = 'SPRING' AND co.coursehours >= 2 AND
      cl.classmeeting IS NOT NULL
ORDER BY co.coursesubject, co.coursenum
"""

resultset = %sql $query
resultset.DataFrame()
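The `ON` clause above matches on both `coursesubject` and `coursenum` because a course number on its own need not identify a course: the same number can appear under several subjects. The following minimal sketch uses hypothetical `courses` and `sections` tables in an in-memory database to show what goes wrong when only part of the key is matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (subj TEXT, num INTEGER, title TEXT);
CREATE TABLE sections (subj TEXT, num INTEGER, term TEXT);
INSERT INTO courses VALUES ('MATH', 101, 'Calculus I'), ('CS', 101, 'Intro to CS');
INSERT INTO sections VALUES ('CS', 101, 'SPRING');
""")

# Matching on the number alone pairs the CS section with BOTH 101 courses
num_only = conn.execute("""
SELECT s.subj, s.num, c.title
FROM sections AS s INNER JOIN courses AS c ON s.num = c.num
ORDER BY c.title
""").fetchall()

# Matching on the full (subject, number) key finds only the correct course
full_key = conn.execute("""
SELECT s.subj, s.num, c.title
FROM sections AS s INNER JOIN courses AS c
    ON s.subj = c.subj AND s.num = c.num
""").fetchall()

print(num_only)  # [('CS', 101, 'Calculus I'), ('CS', 101, 'Intro to CS')]
print(full_key)  # [('CS', 101, 'Intro to CS')]
```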

#### Try it out yourself!

**Q1:** Write a SQL query to retrieve department information, including the id of the department, the name of the department, and the last and first names of the chair of the department.

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
print("Number of rows in result:", len(resultdf))
resultdf.head(10)

> You've reached the first checkpoint in the lab.  Make sure to have it signed off by the instructor or TA.
>
> Checkpoint 1: Determine the number of rows in your result and compare it with the total number of rows in the `departments` table.  Are they the same?  Why or why not?
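When comparing row counts, keep in mind a general property of inner joins: a row is kept only if it finds a match on both sides, and `NULL` never compares equal to anything. The minimal sketch below shows this with hypothetical `units` and `staff` tables, which are not part of the `school` database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE units (unitid INTEGER, unitname TEXT, headid INTEGER);
CREATE TABLE staff (staffid INTEGER, stafflast TEXT);
INSERT INTO units VALUES (1, 'Library', 100), (2, 'Archives', NULL), (3, 'Museum', 999);
INSERT INTO staff VALUES (100, 'Dewey');
""")

total = conn.execute("SELECT COUNT(*) FROM units").fetchone()[0]

# 'Archives' has a NULL headid and 'Museum' points at a nonexistent staffid,
# so neither finds a partner and both are dropped by the inner join
joined = conn.execute("""
SELECT u.unitname, s.stafflast
FROM units AS u INNER JOIN staff AS s ON u.headid = s.staffid
""").fetchall()

print(total, len(joined))  # 3 1
print(joined)              # [('Library', 'Dewey')]
```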

**Q2:** In reference to the `school` database, select the course title, class meeting time, and term of every class offered during the year. Keep the default ordering (by `coursetitle`). It's OK to include directed studies, but don't allow any NULL course titles or meeting times.
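A common pitfall when excluding NULLs is to use a comparison operator. In SQL, `x != NULL` evaluates to NULL (unknown) rather than true, so a `WHERE` clause written that way removes every row. The sketch below demonstrates this on a hypothetical one-column table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (title TEXT);
INSERT INTO t VALUES ('Algebra'), (NULL), ('Biology');
""")

# Comparing with NULL using != yields NULL (unknown), never true, so no rows pass
wrong = conn.execute("SELECT title FROM t WHERE title != NULL").fetchall()

# IS NOT NULL is the correct test for a missing value
right = conn.execute(
    "SELECT title FROM t WHERE title IS NOT NULL ORDER BY title"
).fetchall()

print(wrong)  # []
print(right)  # [('Algebra',), ('Biology',)]
```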

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.tail()

In [None]:
# Testing cell
assert len(resultdf) == 1065
assert len(resultdf.iloc[0]) == 3
assert 'Writing Workshop' in list(resultdf['coursetitle'])

**Q3:** Write a query to display students (last name and first name) and instructors (first name) who have the same last name, ordered by student last name, then student first name. Don't include duplicate results.

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()

In [None]:
# Testing cell
assert len(resultdf) == 1768
assert list(resultdf.columns) == ["studentlast", "studentfirst", "instructorfirst"]
assert list(resultdf.iloc[4,:]) == ["Anderson", "Julie", "Philip"]

---

## Part B: Three-Table Joins

We can extend the inner joins we've seen to a chain of joins, allowing us to get information across several tables.

For example, we'll build up a series of queries that give the subject, number, section, and meeting time for a handful of classes, together with the instructor teaching each class.

In [None]:
# First, just look up the info from the classes table
query = """
SELECT coursesubject || '-' || coursenum || '-' || classsection AS class,
       classmeeting
FROM classes
WHERE classid IN (21014, 21088, 21256, 21444)
"""

resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()
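The `class` column above is built with `||`, which is string concatenation in SQLite. SQLite converts numeric operands to text. If any operand is NULL, however, the entire result becomes NULL. The sketch below uses a hypothetical `sections` table to show this, and shows `COALESCE` as one way to substitute a placeholder for a missing part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sections (subj TEXT, num INTEGER, sect INTEGER);
INSERT INTO sections VALUES ('CS', 181, 1), ('DA', 210, NULL);
""")

rows = conn.execute("""
SELECT subj || '-' || num || '-' || sect AS label,                      -- NULL if sect is NULL
       subj || '-' || num || '-' || COALESCE(sect, '?') AS safe_label   -- '?' stands in for NULL
FROM sections
ORDER BY subj
""").fetchall()

print(rows)  # [('CS-181-1', 'CS-181-1'), (None, 'DA-210-?')]
```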

In [None]:
# Now, let's join with instructor_class to begin getting the instructor info
query = """
SELECT coursesubject || '-' || coursenum || '-' || classsection AS class,
       classmeeting,
       instructorid
FROM classes INNER JOIN instructor_class USING (classid)
WHERE classid IN (21014, 21088, 21256, 21444)
"""

resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()

In [None]:
# Finally, we *also* join with instructors to get the names
# (the inner joins are processed left to right)
query = """
SELECT coursesubject || '-' || coursenum || '-' || classsection AS class,
       classmeeting,
       instructorlast, instructorfirst
FROM classes INNER JOIN instructor_class USING (classid)
    INNER JOIN instructors USING (instructorid)
WHERE classid IN (21014, 21088, 21256, 21444)
"""

resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()
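The chain above passes through `instructor_class`, which acts as a bridge table linking classes to instructors. The same shape works for any many-to-many relationship. Each bridge row produces one output row, so an item with two partners appears twice. Whether any `school` class actually has more than one instructor is a question about the data, not something this sketch assumes. The `books`, `book_author`, and `authors` tables below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (bookid INTEGER, title TEXT);
CREATE TABLE book_author (bookid INTEGER, authorid INTEGER);
CREATE TABLE authors (authorid INTEGER, authorlast TEXT);
INSERT INTO books VALUES (1, 'SICP'), (2, 'K&R');
INSERT INTO authors VALUES (10, 'Abelson'), (11, 'Sussman'), (12, 'Kernighan'), (13, 'Ritchie');
INSERT INTO book_author VALUES (1, 10), (1, 11), (2, 12), (2, 13);
""")

# books -> bridge table -> authors, joined left to right
rows = conn.execute("""
SELECT title, authorlast
FROM books INNER JOIN book_author USING (bookid)
    INNER JOIN authors USING (authorid)
ORDER BY bookid, authorid
""").fetchall()

for row in rows:
    print(row)
# ('SICP', 'Abelson')
# ('SICP', 'Sussman')
# ('K&R', 'Kernighan')
# ('K&R', 'Ritchie')
```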

#### Practice with three-table inner joins

**Q4:** Write a query to display all the students (id, last name, first name) who took math or computer science during the fall, along with the classes they took (course subject, number, and class term).  Order your results by `studentid` (lowest to highest).  If a student took multiple math or CS courses, include them multiple times.

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
resultdf.head()

In [None]:
# Testing cell
assert resultdf.shape == (595, 6)
assert resultdf.iloc[3,0] == 61613
assert resultdf.iloc[4,4] == 110

**Q5:** In reference to the `school` database, which instructors (first and last name) were teaching in the spring semester? Your result should not include duplicates.

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()

In [None]:
# Testing cell
assert len(resultdf) == 266
assert 'Taylor' in list(resultdf['instructorfirst'])
assert 'Fuller' in list(resultdf['instructorlast'])

**Q6:** Find the students (id only) who took more than 10 classes over the year. Include the number of classes they took as `count`.
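As a reminder of the general pattern, `WHERE` filters individual rows before grouping, while `HAVING` filters groups after `COUNT(*)` has been computed. The sketch below applies this pattern to a hypothetical `checkouts` table, not the `school` schema, and keeps only patrons with more than one checkout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE checkouts (patronid INTEGER, itemid INTEGER);
INSERT INTO checkouts VALUES (1, 100), (1, 101), (1, 102), (2, 100), (3, 101), (3, 102);
""")

# One row per patron; HAVING discards groups whose count is not above the threshold
rows = conn.execute("""
SELECT patronid, COUNT(*) AS count
FROM checkouts
GROUP BY patronid
HAVING COUNT(*) > 1
ORDER BY patronid
""").fetchall()

print(rows)  # [(1, 3), (3, 2)]
```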

In [None]:
query = """
"""
# YOUR CODE HERE
raise NotImplementedError()

resultset = %sql $query
resultdf = resultset.DataFrame()
print(len(resultdf))
resultdf.head()

In [None]:
# Testing cell
assert resultdf.shape == (919, 2)
assert resultdf["studentid"][0] == 61516
assert 61528 not in set(resultdf["studentid"])

> You've reached the second checkpoint in the lab.  Make sure to have it signed off by the instructor or TA.
>
> Checkpoint 2: The query above counts a student's classes regardless of status.  How would you limit it to classes the student is registered for, i.e., classes they did not drop or withdraw from?