# SQLite Homework

For this week's graded assignment, you will be using the Chinook database. The Chinook database represents a hypothetical music store, and you have been brought on as a consultant to help the company gain a better understanding of its store, customers, and employees.

To complete these exercises, you may find it helpful to refer to the Chinook database schema. The arrows indicate the columns that link the tables together.

<img src="Images/chinook_schema.jpg" alt="Chinook schema" width="600" />

In [None]:
import sqlite3
import pandas as pd

For this assignment, you will need to use the `chinook.db` database that is stored in the `Data/` folder.
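
As a general illustration of how `sqlite3` connections and cursors work, here is a minimal sketch. It assumes an in-memory database and a hypothetical `fruits` table (neither is part of Chinook), and it uses `demo_`-prefixed names so it does not clash with the `conn` and `cursor` variables you create below.

```python
import sqlite3

# ":memory:" creates a throwaway database in RAM (an assumption for this
# sketch; the assignment connects to a database file on disk instead).
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()

# Hypothetical table, used only for illustration.
demo_cursor.execute("CREATE TABLE fruits (id INTEGER PRIMARY KEY, name TEXT)")
demo_cursor.executemany(
    "INSERT INTO fruits (name) VALUES (?)", [("apple",), ("pear",)]
)

# execute() returns the cursor, so fetchall() can be chained onto it.
rows = demo_cursor.execute("SELECT id, name FROM fruits ORDER BY id").fetchall()
print(rows)
demo_conn.close()
```

To open a database file, the string passed to `sqlite3.connect()` would be a file path rather than `":memory:"`.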

In [None]:
# Connect to the Chinook database
conn =

In [None]:
# Create a cursor object
cursor =

# Explore the Database

To learn more about the company, begin by looking at the types of data stored in its database. Run the cell to print out a list of the database tables and then follow the instructions below.

In [None]:
# Query the table names
cursor.execute("""
        SELECT name
        FROM sqlite_master
        WHERE type = 'table'
        ORDER BY name;
""").fetchall()

# Albums

You decide to first take a closer look at the data in the `albums` table. To run this query, you should chain together the `cursor` object's `execute()` and `fetchall()` methods and select all of the columns in the table.

In [None]:
# Query the `albums` table


That's a lot of albums! Use the `COUNT` function to find the total number of rows in the `albums` table.
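
For reference, a minimal sketch of `COUNT` on a hypothetical `fruits` table in an in-memory database (both are assumptions for illustration, not part of Chinook):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE fruits (name TEXT)")
demo_cursor.executemany(
    "INSERT INTO fruits VALUES (?)", [("apple",), ("pear",), ("plum",)]
)

# COUNT(*) yields one row with one column, so fetchone()[0] unwraps the number.
n_rows = demo_cursor.execute("SELECT COUNT(*) FROM fruits").fetchone()[0]
print(n_rows)
demo_conn.close()
```

Using `fetchall()` instead would return the same value wrapped in a list containing a single tuple.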

In [None]:
# Count the number of rows in the `albums` table


Now that you know how many rows are in the `albums` table, you decide to query the table again, but this time adding a `LIMIT` clause to only return the first five rows.
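
A minimal sketch of `LIMIT`, assuming a hypothetical `numbers` table in an in-memory database:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE numbers (n INTEGER)")
demo_cursor.executemany(
    "INSERT INTO numbers VALUES (?)", [(n,) for n in range(1, 11)]
)

# LIMIT caps how many rows come back; ORDER BY makes "first" well defined.
first_three = demo_cursor.execute(
    "SELECT n FROM numbers ORDER BY n LIMIT 3"
).fetchall()
print(first_three)
demo_conn.close()
```

Without an `ORDER BY`, SQL does not guarantee which rows `LIMIT` returns, although SQLite usually returns them in storage order.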

In [None]:
# Query the first 5 rows of the `albums` table


# Tracks

Next, you decide to take a look at the type of data stored in the `tracks` table. Similar to your last query, you decide to select all of the table columns. However, instead of adding a `LIMIT` clause to control the number of rows returned, you decide to use the pandas `read_sql()` function to store the results in a pandas DataFrame.
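
As a rough sketch of how `pd.read_sql()` pairs a query with a connection, here is an example on a hypothetical `fruits` table in an in-memory database (both assumed for illustration):

```python
import sqlite3
import pandas as pd

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE fruits (name TEXT, price REAL)")
demo_conn.executemany(
    "INSERT INTO fruits VALUES (?, ?)", [("apple", 0.5), ("pear", 0.75)]
)

# read_sql() takes the SQL string first and the open connection second,
# and returns the result set as a DataFrame with the column names attached.
demo_df = pd.read_sql("SELECT name, price FROM fruits ORDER BY name", demo_conn)
print(demo_df.head())
demo_conn.close()
```

Unlike `fetchall()`, which returns a list of plain tuples, the DataFrame keeps the column names, which makes the results easier to read.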

In [None]:
# Query all of the columns in the `tracks` table
df = 

# Print the first 5 rows
df.head()

## Missing Data

You notice that the second row in the `tracks` table is missing a `composer` value, so you'd like to know if there are other rows missing this information. To answer your question, follow these steps:
- `SELECT` the `name` and `composer` columns `FROM` the `tracks` table
- Add a `WHERE` clause with the `IS NULL` operator to return rows missing a `composer` value
- Use `ORDER BY` to sort your results by the `name` field

*Please note: To receive credit for this assignment, you must write the query using SQL and cannot use the pandas DataFrame you created in the previous exercise. Also, for the rest of the exercise you may use either sqlite3 or pandas to execute your queries.*
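
To illustrate why `IS NULL` is needed rather than `=`, here is a minimal sketch on a hypothetical `fruits` table in an in-memory database (assumed for illustration only):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE fruits (name TEXT, color TEXT)")
demo_cursor.executemany(
    "INSERT INTO fruits VALUES (?, ?)",
    [("plum", None), ("apple", "red"), ("fig", None)],  # None is stored as NULL
)

# IS NULL matches rows whose value is missing.
missing = demo_cursor.execute(
    "SELECT name, color FROM fruits WHERE color IS NULL ORDER BY name"
).fetchall()

# Comparing with "= NULL" never evaluates to true, so no rows match.
with_equals = demo_cursor.execute(
    "SELECT name, color FROM fruits WHERE color = NULL"
).fetchall()

print(missing)
print(with_equals)
demo_conn.close()
```

Python receives SQL `NULL` values as `None`.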

In [None]:
# Query the missing data


## Number of Songs

In the `tracks` table, you notice there is an `albumid` field that you can use to find out the number of tracks on each album.
- In the `SELECT` statement, include the `albumid` column and apply the `COUNT` function to the `trackid` column. You should also alias the result of the `COUNT` function as `No_Songs`.
- Use a `GROUP BY` clause to group the songs (rows) in the `tracks` table by `albumid`
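
The general shape of a grouped count can be sketched as follows, using a hypothetical `purchases` table in an in-memory database (table, columns, and alias are all assumptions for illustration):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE purchases (customer_id INTEGER, item TEXT)")
demo_cursor.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (1, "d"), (3, "e"), (3, "f")],
)

# GROUP BY collapses rows that share a customer_id into one output row,
# and COUNT() is then evaluated separately for each group.
per_customer = demo_cursor.execute("""
    SELECT customer_id, COUNT(item) AS n_items
    FROM purchases
    GROUP BY customer_id
    ORDER BY customer_id;
""").fetchall()
print(per_customer)
demo_conn.close()
```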

In [None]:
# Count the number of songs per album


You're surprised to see that there are a lot of albums that only have one song, so you are curious to see what the highest number of songs is. To do this, you repeat your previous search, but this time adding an `ORDER BY` clause that uses the `No_Songs` alias to sort the results in descending order. In addition, you decide to limit your results to just the top five rows.
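
A minimal sketch of sorting by an aggregate's alias and keeping only the top rows, again on a hypothetical `purchases` table (assumed for illustration):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE purchases (customer_id INTEGER, item TEXT)")
demo_cursor.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (1, "d"), (3, "e"), (3, "f")],
)

# The alias defined in SELECT can be reused in ORDER BY;
# DESC puts the largest counts first and LIMIT keeps the top two.
top_two = demo_cursor.execute("""
    SELECT customer_id, COUNT(item) AS n_items
    FROM purchases
    GROUP BY customer_id
    ORDER BY n_items DESC
    LIMIT 2;
""").fetchall()
print(top_two)
demo_conn.close()
```

Note the clause order: `GROUP BY` comes before `ORDER BY`, and `LIMIT` comes last.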

In [None]:
# Find the albums with the highest number of songs


# Album Titles

You liked being able to see how many songs (rows) were on each record. However, you don't want to have to use the `albumid` value to look up the name of each album. Therefore, you decide to use an `INNER JOIN` to combine rows from the `tracks` and `albums` tables.

```
SELECT column_name(s)
FROM tableA
  INNER JOIN tableB 
          ON tableA.column = tableB.column;
```

- `SELECT` the `title` and `COUNT(trackid)` aliased as `No_Songs`. Also `SELECT` the `albumid` field from the `tracks` table (using dot notation).
- `FROM` the `tracks` table, use an `INNER JOIN` to join the `albums` and `tracks` tables. (*Hint: look at the schema above to see which column to join them on.*)
- `GROUP BY` the `albumid` field from the `tracks` table (use dot notation)
- Use `ORDER BY` to sort the results by the `No_Songs` alias in descending order
- Limit your results to just the first five rows
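
The same pattern of steps can be sketched on a pair of hypothetical tables, `authors` and `books`, in an in-memory database (names and data are assumptions for illustration, not part of Chinook):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE authors (author_id INTEGER, name TEXT)")
demo_cursor.execute(
    "CREATE TABLE books (book_id INTEGER, title TEXT, author_id INTEGER)"
)
demo_cursor.executemany("INSERT INTO authors VALUES (?, ?)", [(1, "Ann"), (2, "Bo")])
demo_cursor.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [(1, "X", 1), (2, "Y", 1), (3, "Z", 2)],
)

# Dot notation (table.column) says which table a column comes from,
# which is required when both tables have a column with the same name.
joined = demo_cursor.execute("""
    SELECT authors.name, books.author_id, COUNT(books.book_id) AS n_books
    FROM books
      INNER JOIN authors
              ON books.author_id = authors.author_id
    GROUP BY books.author_id
    ORDER BY n_books DESC;
""").fetchall()
print(joined)
demo_conn.close()
```

An `INNER JOIN` keeps only rows that have a match in both tables, so an author with no books would not appear in this result.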

In [None]:
# INNER JOIN the `tracks` and `albums` tables
df = 

# Print your results
df

# Customers

Now you decide to move on and learn more about the company's finances. You know the company has customers all across the world, so you decide to query a list of the countries where they live.

The `invoices` table contains information about the countries the store's customers live in. Start your query by using `SELECT DISTINCT` to find the unique values in the `billingcountry` column. Then use `ORDER BY` to sort your results alphabetically by `billingcountry`.
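
A minimal sketch of `SELECT DISTINCT`, assuming a hypothetical `orders` table in an in-memory database:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE orders (city TEXT)")
demo_cursor.executemany(
    "INSERT INTO orders VALUES (?)",
    [("Oslo",), ("Lima",), ("Oslo",), ("Cairo",), ("Lima",)],
)

# DISTINCT removes duplicate values; ORDER BY sorts what remains.
unique_cities = demo_cursor.execute(
    "SELECT DISTINCT city FROM orders ORDER BY city"
).fetchall()
print(unique_cities)
demo_conn.close()
```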

In [None]:
# Query a list of where the customers live


## International Customers

One of the ideas you're thinking of pitching to the company is an email marketing campaign to customers living outside of the United States. To query your email list:
- `SELECT` the `firstname`, `lastname`, `country`, and `email` fields from the `customers` table.
- Use a `WHERE` clause to exclude customers living in the United States
- Use `ORDER BY` to sort your results by the `country` column in ascending order
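
Excluding rows that match a value can be sketched like this, using a hypothetical `people` table in an in-memory database (assumed for illustration):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE people (name TEXT, city TEXT)")
demo_cursor.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "Oslo"), ("Bo", "Lima"), ("Cy", "Cairo"), ("Di", "Oslo")],
)

# != (also written <>) keeps every row whose city is not 'Oslo';
# ASC is the default sort direction and is spelled out here for clarity.
outside_oslo = demo_cursor.execute("""
    SELECT name, city
    FROM people
    WHERE city != 'Oslo'
    ORDER BY city ASC;
""").fetchall()
print(outside_oslo)
demo_conn.close()
```

The comparison is an exact string match, so it helps to check how the value is actually spelled in the data before filtering on it.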

In [None]:
# Query email addresses for customers outside the USA


## Customers Per Country

Hmm, that's a lot of different countries to try to target. Instead, you decide to use the database's `invoices` table to find out which of those countries have the largest number of customers.

In [None]:
# Find the countries with the largest number of customers


## Total Sales

The number of customers is important, but you are starting to think that finding out which countries have the highest total sales might be a better strategy. *Hint: instead of `COUNT`, you will need to use a different aggregate function.*
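
To show how different aggregate functions can rank the same groups differently, here is a small sketch on a hypothetical `orders` table in an in-memory database (table, columns, and values are assumptions for illustration):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE orders (city TEXT, amount REAL)")
demo_cursor.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Oslo", 10.0), ("Lima", 5.0), ("Oslo", 2.5), ("Lima", 1.0), ("Cairo", 20.0)],
)

# COUNT(*) counts the rows in each group; SUM(amount) adds up their values.
by_total = demo_cursor.execute("""
    SELECT city, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY city
    ORDER BY total DESC;
""").fetchall()
print(by_total)
demo_conn.close()
```

Here the city with the fewest orders has the largest total, which is why the choice of aggregate matters.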

In [None]:
# Find the countries with the highest sales


## Email Provider

Finally, you notice that a few customers still have an `aol.com` email address, so you are not sure how up to date the email addresses in the company's database are. For this final exercise, query the `customers` table to find only those customers with a `gmail.com` address.
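
One common way to filter text by its ending is the `LIKE` operator with the `%` wildcard. Here is a minimal sketch on a hypothetical `contacts` table in an in-memory database (assumed for illustration):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE contacts (email TEXT)")
demo_cursor.executemany(
    "INSERT INTO contacts VALUES (?)",
    [("ann@example.com",), ("bo@example.org",), ("cy@EXAMPLE.com",)],
)

# % matches any run of characters, so this pattern means "ends with @example.com".
# In SQLite, LIKE ignores case for ASCII letters by default.
matches = demo_cursor.execute("""
    SELECT email
    FROM contacts
    WHERE email LIKE '%@example.com'
    ORDER BY email;
""").fetchall()
print(matches)
demo_conn.close()
```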

In [None]:
# Query for only Gmail addresses


Even though you didn't modify any of the company's data, it is always good practice to commit and close your database connection every time!
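
A minimal sketch of what committing and closing do, using a hypothetical `notes` table in an in-memory database (assumed for illustration):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE notes (body TEXT)")
demo_conn.execute("INSERT INTO notes VALUES ('hello')")

demo_conn.commit()  # make the pending INSERT permanent
demo_conn.close()   # release the connection

# Once closed, the connection can no longer run queries.
try:
    demo_conn.execute("SELECT * FROM notes")
    closed_error = None
except sqlite3.ProgrammingError as exc:
    closed_error = str(exc)
print(closed_error)
```

Closing a connection does not commit for you, so any uncommitted changes are lost unless `commit()` is called first.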

In [None]:
# Commit any changes
conn.commit()

In [None]:
# Close the database connection
conn.close()

Before submitting this assignment on Canvas, please be sure to run `Kernel` > `Restart & Run All`!