# Homework 8

Answer the following questions using the Chinook database. You should download the Chinook data to your data folder (`../data/`).

Here is the ERD of the database:

![ERD](chinook-erd.png)

## Setup 

Set up the connection and cursor to the database.

In [2]:
```python
import sqlite3

connection = sqlite3.connect("../data/Chinook_Sqlite.sqlite")
cursor = connection.cursor()
```

## Question 1

How many customers are there?

Write the Python SQL statements below to answer this question.

In [4]:
```python
# The Customer table has one row per distinct customer, so to answer this question
# we need to count the rows in that table.
cursor.execute("select count(*) from Customer")
cursor.fetchone()
```

(59,)

You should find that there are 59 customers.
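Counting rows works here because SQL's `count` variants treat NULLs differently. `count(*)` counts every row. `count(column)` skips rows where that column is NULL. `count(distinct column)` counts distinct non-NULL values. The small sketch below shows the difference on a made-up in-memory `Customer` table. The `Company` values are invented for illustration and are not Chinook data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table Customer (CustomerId integer primary key, FirstName text, Company text)")
cur.executemany(
    "insert into Customer (FirstName, Company) values (?, ?)",
    [("Ana", "Acme"), ("Ben", "Acme"), ("Caz", None), ("Dee", "Globex")],  # made-up rows
)

# count(*): all rows; count(Company): non-NULL companies;
# count(distinct Company): distinct non-NULL companies.
cur.execute("select count(*), count(Company), count(distinct Company) from Customer")
counts = cur.fetchone()
print(counts)  # (4, 3, 2)
```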

## Question 2 

Which customer bought the most tracks, and how many? Include the customer's first and last name and the number of tracks purchased.

Write the Python SQL statements below to answer this question.

In [7]:
```python
# NOTE: I am not considering the quantity of each invoice line, though I probably should
# be. If I were, I'd need to sum the Quantity column instead of counting rows.

# We need to find out how many tracks each customer bought. This requires a join between
# the InvoiceLine table (which contains the tracks purchased) and the Invoice table (which
# gives us the CustomerId); we then group by CustomerId and count the rows in each group.
# We also need the customer's first and last name, so we join with the Customer table too.
# I'm only putting "Table." in front of columns that are ambiguous (that is, column names
# that occur in more than one table involved in the query).
cursor.execute("select firstName, lastName, count(*) as trackCount from "+
               "InvoiceLine join Invoice on InvoiceLine.InvoiceId = Invoice.InvoiceId "+
               "            join Customer on Invoice.CustomerId = Customer.CustomerId "+
               "group by Invoice.CustomerId "+
               "order by trackCount desc")
cursor.fetchone()
```

('Luís', 'Gonçalves', 38)
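The NOTE in the cell above mentions summing the Quantity column instead of counting rows. Here is a minimal sketch of that variant. It runs on a tiny in-memory copy of the three tables involved, with only the needed columns and made-up rows, so that `count(*)` and `sum(Quantity)` visibly disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
create table Customer (CustomerId integer primary key, FirstName text, LastName text);
create table Invoice (InvoiceId integer primary key, CustomerId integer);
create table InvoiceLine (InvoiceLineId integer primary key, InvoiceId integer,
                          TrackId integer, Quantity integer);
insert into Customer values (1, 'Ana', 'Silva'), (2, 'Ben', 'Okafor');
insert into Invoice values (10, 1), (11, 2);
-- Ana: two lines with quantity 1; Ben: one line with quantity 3 (made-up data)
insert into InvoiceLine values (100, 10, 1, 1), (101, 10, 2, 1), (102, 11, 3, 3);
""")

# lineCount counts invoice lines; trackCount sums the purchased quantities.
cur.execute("""
    select FirstName, LastName,
           count(*)      as lineCount,
           sum(Quantity) as trackCount
    from InvoiceLine
    join Invoice  using (InvoiceId)
    join Customer using (CustomerId)
    group by Invoice.CustomerId
    order by trackCount desc
""")
rows = cur.fetchall()
print(rows)  # [('Ben', 'Okafor', 1, 3), ('Ana', 'Silva', 2, 2)]
```

For the real database you would swap `count(*)` for `sum(Quantity)` in the query above. Whether the answer changes depends on whether any Chinook invoice line has a quantity other than 1.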

In [10]:
```python
# Here's the same query using natural joins, which saves us from spelling out the join
# condition with an ON clause: a natural join finds the column names the two tables share
# and matches rows on all of them. Be careful with this, especially when tables share more
# than one column name. If, for example, every table has a column named "id" containing ids
# that are unique only within that table, a natural join will match on those unrelated ids
# and give wrong results.
cursor.execute("select firstName, lastName, count(*) as trackCount from "+
               "InvoiceLine natural join Invoice "+
               "            natural join Customer "+
               "group by Invoice.CustomerId "+
               "order by trackCount desc")
cursor.fetchone()
```

('Luís', 'Gonçalves', 38)
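The comment above warns that natural joins misbehave when tables share a column name such as `id` that is unique only within each table. The minimal sketch below uses two hypothetical tables, `Author` and `Book`, which are not part of Chinook. The natural join matches on the unrelated `id` columns, so it pairs books with the wrong authors. An explicit `ON` clause uses the intended foreign key instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
create table Author (id integer primary key, name text);
create table Book   (id integer primary key, title text, author_id integer);
insert into Author values (1, 'Ann'), (2, 'Bo');
insert into Book   values (1, 'First', 2), (2, 'Second', 2);  -- both books are by Bo
""")

# NATURAL JOIN matches on every shared column name; here that is only `id`,
# so it pairs rows whose unrelated ids happen to be equal.
cur.execute("select title, name from Book natural join Author order by title")
natural_rows = cur.fetchall()

# An explicit ON clause joins on the intended foreign key.
cur.execute("select title, name from Book join Author on Book.author_id = Author.id "
            "order by title")
explicit_rows = cur.fetchall()

print(natural_rows)   # [('First', 'Ann'), ('Second', 'Bo')]  (wrong author for 'First')
print(explicit_rows)  # [('First', 'Bo'), ('Second', 'Bo')]
```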

In [12]:
```python
# Here's yet another way to write it, using a USING clause instead of an ON clause.
# USING lets you name one or more shared columns to join on, which makes it a nice
# middle ground between the two examples above.
cursor.execute("select firstName, lastName, count(*) as trackCount from "+
               "InvoiceLine join Invoice using (InvoiceId) "+
               "            join Customer using (CustomerId) "+
               "group by Invoice.CustomerId "+
               "order by trackCount desc")
cursor.fetchone()
```

('Luís', 'Gonçalves', 38)

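You should expect a customer with 38 tracks. Many customers are tied at 38, and `order by trackCount desc` does not specify how ties are ordered, so which one `fetchone()` returns is not guaranteed. The one at the top of my list was `('Manoj', 'Pareek', 38)`, whereas the cells above return `('Luís', 'Gonçalves', 38)`. Both are correct answers.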
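Because of these ties, a query that returns every customer tied for the maximum in a fixed order gives a more complete answer than `fetchone()`. One possible sketch uses a common table expression (`with ... as`) plus a `max()` subquery. It runs on a tiny made-up dataset with invented names so that it is self-contained. For Chinook, you would run the same SQL through the existing `cursor`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
create table Customer (CustomerId integer primary key, FirstName text, LastName text);
create table Invoice (InvoiceId integer primary key, CustomerId integer);
create table InvoiceLine (InvoiceLineId integer primary key, InvoiceId integer, TrackId integer);
insert into Customer values (1, 'Zoe', 'Young'), (2, 'Abe', 'Adams'), (3, 'Cy', 'Cole');
insert into Invoice values (10, 1), (11, 2), (12, 3);
-- Zoe and Abe each bought 2 tracks (a tie); Cy bought 1. Made-up data.
insert into InvoiceLine values (1, 10, 1), (2, 10, 2), (3, 11, 3), (4, 11, 4), (5, 12, 5);
""")

# Count tracks per customer once in a CTE, keep every customer whose count
# equals the maximum, and sort by name so the output order is deterministic.
cur.execute("""
    with counts as (
        select FirstName, LastName, count(*) as trackCount
        from InvoiceLine
        join Invoice  using (InvoiceId)
        join Customer using (CustomerId)
        group by Invoice.CustomerId
    )
    select FirstName, LastName, trackCount
    from counts
    where trackCount = (select max(trackCount) from counts)
    order by LastName, FirstName
""")
top_customers = cur.fetchall()
print(top_customers)  # [('Abe', 'Adams', 2), ('Zoe', 'Young', 2)]
```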

## Question 3

What were the 10 most purchased tracks? For each, include the track name, artist, album, and the number of times it was sold.

Write the Python SQL statements below to answer this question.

In [13]:
```python
# NOTE: same note as for Question 2; I'm not considering the Quantity column of the
# InvoiceLine table, but I probably should be.

# We need to join the InvoiceLine table (which has the purchased tracks) with the Track
# table (to get the track name), join with the Album table to get the album title, and
# join with the Artist table to get the artist's name.
# We then group by TrackId, count the rows in each group, and sort by that count, descending.
cursor.execute("select Track.Name, Artist.Name, Title, count(*) as tracksSold from "+
               "InvoiceLine join Track using (TrackId) "+
               "            join Album using (AlbumId) "+
               "            join Artist using (ArtistId) "+
               "group by InvoiceLine.TrackId "+
               "order by tracksSold desc")
cursor.fetchmany(10)
```

```
[('Balls to the Wall', 'Accept', 'Balls to the Wall', 2),
 ('Inject The Venom', 'AC/DC', 'For Those About To Rock We Salute You', 2),
 ('Snowballed', 'AC/DC', 'For Those About To Rock We Salute You', 2),
 ('Overdose', 'AC/DC', 'Let There Be Rock', 2),
 ('Deuces Are Wild', 'Aerosmith', 'Big Ones', 2),
 ('Not The Doctor', 'Alanis Morissette', 'Jagged Little Pill', 2),
 ('Por Causa De Você', 'Antônio Carlos Jobim', 'Warner 25 Anos', 2),
 ('Welcome Home (Sanitarium)',
  'Apocalyptica',
  'Plays Metallica By Four Cellos',
  2),
 ('Snowblind', 'Black Sabbath', 'Black Sabbath Vol. 4 (Remaster)', 2),
 ('Cornucopia', 'Black Sabbath', 'Black Sabbath Vol. 4 (Remaster)', 2)]
```
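Every row in the output above shows 2 sales, and the results are sorted descending, so all ten tracks are tied at the maximum. If more than ten tracks were sold twice, which ten appear is arbitrary. A sketch of a more deterministic version could do three things: sum `Quantity` as the NOTE suggests, add a tiebreaker to `order by`, and let SQL make the cut with `limit`. It then counts how many tracks share the value at the cutoff. The data is a made-up three-track example, and `top_n` is 2 rather than 10 to keep it small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
create table Artist (ArtistId integer primary key, Name text);
create table Album (AlbumId integer primary key, Title text, ArtistId integer);
create table Track (TrackId integer primary key, Name text, AlbumId integer);
create table InvoiceLine (InvoiceLineId integer primary key, TrackId integer, Quantity integer);
insert into Artist values (1, 'Band A');
insert into Album values (1, 'Album A', 1);
insert into Track values (1, 'Song 1', 1), (2, 'Song 2', 1), (3, 'Song 3', 1);
-- Song 1 sold twice, Songs 2 and 3 once each (made-up data)
insert into InvoiceLine values (1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 3, 1);
""")

top_n = 2  # the homework asks for 10; 2 keeps this toy example small
cur.execute("""
    select Track.Name, Artist.Name, Title, sum(Quantity) as tracksSold
    from InvoiceLine
    join Track  using (TrackId)
    join Album  using (AlbumId)
    join Artist using (ArtistId)
    group by InvoiceLine.TrackId
    order by tracksSold desc, Track.Name  -- tiebreaker makes the cut deterministic
    limit ?
""", (top_n,))
top_tracks = cur.fetchall()
print(top_tracks)  # [('Song 1', 'Band A', 'Album A', 2), ('Song 2', 'Band A', 'Album A', 1)]

# How many tracks share the sales figure at the cutoff? If this exceeds the
# number of cutoff-value rows kept above, the tie was broken by the tiebreaker.
cutoff = top_tracks[-1][3]
cur.execute("""
    select count(*) from (
        select sum(Quantity) as s from InvoiceLine group by TrackId
    ) where s = ?
""", (cutoff,))
tied_at_cutoff = cur.fetchone()[0]
print(tied_at_cutoff)  # 2 (Song 2 and Song 3 both sold once; only Song 2 made the cut)
```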