# More SQL

Start by downloading a MySQL client and a SQLite client for your computer:
- MySQL
  - MySQL Workbench
    - all operating systems
    - https://dev.mysql.com/downloads/workbench/
  - Sequel Pro
    - Mac only
    - https://sequelpro.com/download
- SQLite
  - DB Browser for SQLite
    - https://sqlitebrowser.org/dl/

We're working with the [Chinook data set](https://archive.codeplex.com/?p=chinookdatabase) today. Some database files are in the `data/` folder here.

### Different relational DBs:
https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

In [1]:
!ls data/

Chinook_MySql.sql     Chinook_Sqlite.sql    Chinook_Sqlite.sqlite


## Connect to MySQL
I have a MySQL instance running in the cloud right now.  Connect to it with the following parameters:
- __Connection Name__: Whatever you want (`flatiron-demo`?)
- __Connection Method__: TCP/IP
- __Hostname__: `demo1.c1doesqrid0e.us-east-1.rds.amazonaws.com`
- __Port__: `3306` (leave default)
- __Username__: `flatiron`
- __Password__: will be given in class or over Slack

Note: As configured, the instance only accepts connections from our current location (it filters by IP address)!

You can also connect using Terminal:
```bash
mysql -h demo1.c1doesqrid0e.us-east-1.rds.amazonaws.com -u flatiron -p
```

## Data Exploration
Most of today is just going to be exploring and answering questions about our data.  We'll first do this in the client, and then move our answers into Python.  The questions we'll be answering are below.

In [3]:
# If you haven't already:
# !pip3 install pymysql

In [5]:
import pandas as pd
import pymysql
import sqlite3

In [6]:
host = 'demo1.c1doesqrid0e.us-east-1.rds.amazonaws.com'
port = 3306
user = 'flatiron'
db = 'Chinook'

In [7]:
password = 'SecurePassword1440'

In [8]:
conn_mysql = pymysql.connect(host=host, port=port, user=user, passwd=password, db=db)
conn_sqlite = sqlite3.connect('data/Chinook_Sqlite.sqlite')

In [23]:
pd.read_sql_query('''SELECT 'A' ''', conn_mysql)

Unnamed: 0,A
0,A


In [24]:
pd.read_sql_query('SELECT count(*) AS album_count, artistid FROM Album GROUP BY ArtistId', conn_mysql)


Unnamed: 0,album_count,artistid
0,2,1
1,2,2
2,1,3
3,1,4
4,1,5
5,2,6
6,1,7
7,3,8
8,1,9
9,1,10


In [16]:
pd.read_sql_query('SELECT count(*) AS album_count, artistid FROM Album GROUP BY ArtistId;', conn_sqlite)


Unnamed: 0,album_count,ArtistId
0,2,1
1,2,2
2,1,3
3,1,4
4,1,5
5,2,6
6,1,7
7,3,8
8,1,9
9,1,10


In [25]:
# Provide a query showing a unique/distinct list of billing
# countries from the Invoice table.
pd.read_sql_query("""
SELECT BillingCountry FROM Chinook.Invoice
GROUP BY 1
LIMIT 10;
""", conn_mysql)

Unnamed: 0,BillingCountry
0,Argentina
1,Australia
2,Austria
3,Belgium
4,Brazil
5,Canada
6,Chile
7,Czech Republic
8,Denmark
9,Finland
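
The query above gets unique countries with `GROUP BY 1`. `SELECT DISTINCT` states the same intent more directly. Here is a minimal sketch that compares the two, assuming a toy in-memory SQLite table that borrows the `Invoice` column names (the rows are hypothetical, not Chinook data):

```python
import sqlite3

# Hypothetical stand-in for Chinook's Invoice table (toy rows only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, BillingCountry TEXT)"
)
conn.executemany(
    "INSERT INTO Invoice (InvoiceId, BillingCountry) VALUES (?, ?)",
    [(1, "Germany"), (2, "Norway"), (3, "Germany"), (4, "Brazil"), (5, "Norway")],
)

distinct_sql = """
SELECT DISTINCT BillingCountry
FROM Invoice
ORDER BY BillingCountry
"""
group_by_sql = """
SELECT BillingCountry
FROM Invoice
GROUP BY BillingCountry
ORDER BY BillingCountry
"""

# Both queries collapse duplicate countries into one row each.
distinct_rows = [row[0] for row in conn.execute(distinct_sql)]
group_by_rows = [row[0] for row in conn.execute(group_by_sql)]
print(distinct_rows)
print(group_by_rows)
```

The same SQL strings could be passed to `pd.read_sql_query` with either connection from earlier in the notebook.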


In [None]:
# Provide a query that shows the Invoice Total, Customer name,
# Country and Sales Agent name for all invoices and customers.


In [26]:
# Provide a query only showing the Customers from Brazil.


In [None]:
# Provide a query showing the Invoices of customers who are from Brazil.
# The resultant table should show the customer's full name, Invoice ID,
# Date of the invoice and billing country.


In [None]:
# Provide a query that shows the # of invoices per country.


In [26]:
# Provide a query that shows all Invoices but includes the # of
# invoice line items.
pd.read_sql_query("""SELECT
  i.*
  , count(*) AS invoice_line_count
FROM InvoiceLine il
JOIN Invoice i USING (InvoiceId)
GROUP BY InvoiceId
LIMIT 10;
""", conn_mysql)

Unnamed: 0,InvoiceId,CustomerId,InvoiceDate,BillingAddress,BillingCity,BillingState,BillingCountry,BillingPostalCode,Total,invoice_line_count
0,1,2,2009-01-01,Theodor-Heuss-Straße 34,Stuttgart,,Germany,70174,1.98,2
1,2,4,2009-01-02,Ullevålsveien 14,Oslo,,Norway,0171,3.96,4
2,3,8,2009-01-03,Grétrystraat 63,Brussels,,Belgium,1000,5.94,6
3,4,14,2009-01-06,8210 111 ST NW,Edmonton,AB,Canada,T6G 2C7,8.91,9
4,5,23,2009-01-11,69 Salem Street,Boston,MA,USA,2113,13.86,14
5,6,37,2009-01-19,Berger Straße 10,Frankfurt,,Germany,60316,0.99,1
6,7,38,2009-02-01,Barbarossastraße 19,Berlin,,Germany,10779,1.98,2
7,8,40,2009-02-01,"8, Rue Hanovre",Paris,,France,75002,1.98,2
8,9,42,2009-02-02,"9, Place Louis Barthou",Bordeaux,,France,33000,3.96,4
9,10,46,2009-02-03,3 Chatham Street,Dublin,Dublin,Ireland,,5.94,6
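
The inner join above only returns invoices that have at least one line item. If "all Invoices" should include invoices with no lines, a `LEFT JOIN` from `Invoice` is one option. This is a minimal sketch on toy in-memory SQLite tables with hypothetical rows; invoice 3 deliberately has no line items:

```python
import sqlite3

# Hypothetical miniature of Invoice and InvoiceLine (toy rows only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER);
CREATE TABLE InvoiceLine (InvoiceLineId INTEGER PRIMARY KEY, InvoiceId INTEGER);
INSERT INTO Invoice VALUES (1, 2), (2, 4), (3, 8);
INSERT INTO InvoiceLine VALUES (1, 1), (2, 1), (3, 2);
""")

inner_join_sql = """
SELECT i.InvoiceId, COUNT(*) AS invoice_line_count
FROM InvoiceLine il
JOIN Invoice i ON i.InvoiceId = il.InvoiceId
GROUP BY i.InvoiceId
ORDER BY i.InvoiceId
"""
left_join_sql = """
SELECT i.InvoiceId, COUNT(il.InvoiceLineId) AS invoice_line_count
FROM Invoice i
LEFT JOIN InvoiceLine il ON il.InvoiceId = i.InvoiceId
GROUP BY i.InvoiceId
ORDER BY i.InvoiceId
"""

inner_rows = conn.execute(inner_join_sql).fetchall()
left_rows = conn.execute(left_join_sql).fetchall()
print(inner_rows)  # invoice 3 is missing
print(left_rows)   # invoice 3 appears with a count of 0
```

Counting `il.InvoiceLineId` rather than `*` matters here: `COUNT(*)` would report 1 for an invoice whose only joined row is the all-NULL placeholder.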


In [None]:
# Provide a query that includes the purchased track name with each
# invoice line item.


In [None]:
# Provide a query that includes the purchased track name AND artist
# name with each invoice line item.
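
Reaching the artist name from a line item takes a chain of joins: `InvoiceLine` to `Track`, `Track` to `Album`, and `Album` to `Artist`. The sketch below shows that chain on toy in-memory SQLite tables that borrow Chinook's column names; all rows and names are hypothetical.

```python
import sqlite3

# Hypothetical miniature of the four tables involved (toy rows only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
CREATE TABLE InvoiceLine (
    InvoiceLineId INTEGER PRIMARY KEY, InvoiceId INTEGER, TrackId INTEGER
);
INSERT INTO Artist VALUES (1, 'Artist A'), (2, 'Artist B');
INSERT INTO Album VALUES (10, 'Album X', 1), (20, 'Album Y', 2);
INSERT INTO Track VALUES (100, 'Song One', 10), (200, 'Song Two', 20);
INSERT INTO InvoiceLine VALUES (1, 1, 100), (2, 1, 200), (3, 2, 100);
""")

line_items_sql = """
SELECT il.InvoiceLineId,
       t.Name  AS track_name,
       ar.Name AS artist_name
FROM InvoiceLine il
JOIN Track  t  ON t.TrackId   = il.TrackId
JOIN Album  al ON al.AlbumId  = t.AlbumId
JOIN Artist ar ON ar.ArtistId = al.ArtistId
ORDER BY il.InvoiceLineId
"""

# Each line item is annotated with its track name and that track's artist.
line_items = conn.execute(line_items_sql).fetchall()
for row in line_items:
    print(row)
```

Aliasing both `Name` columns avoids two identically named columns in the result.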


In [None]:
# Looking at the InvoiceLine table, provide a query that COUNTs the
# number of line items for each Invoice.


In [None]:
# Provide a query showing Customers (just their full names, customer ID
# and country) who are not in the US.


In [None]:
# Provide a query that shows total sales made by each sales agent.


In [30]:
# Which sales agent made the most in sales in 2009?
# Hint: Use the MAX function on a subquery
pd.read_sql_query("""
-- Count each customer's 2009 invoices in a subquery,
-- then total those counts per sales agent and keep the top one.
SELECT SupportRepId, FirstName, LastName, SUM(invoice_count) AS total_sales
FROM (
    SELECT CustomerId, COUNT(*) AS invoice_count, c.SupportRepId
    FROM Invoice i
    JOIN Customer c USING (CustomerId)
    WHERE InvoiceDate BETWEEN '2009-01-01' AND '2009-12-31'
    GROUP BY CustomerId
) AS q1
JOIN Employee e ON q1.SupportRepId = e.EmployeeId
GROUP BY 1
ORDER BY 4 DESC
LIMIT 1;
""", conn_mysql)

Unnamed: 0,CustomerId,invoice_count,SupportRepId
0,2,3,5
1,4,3,4
2,5,1,4
3,6,1,5
4,7,1,5
5,8,2,4
6,9,2,4
7,10,1,4
8,11,2,5
9,12,1,3


In [29]:
pd.read_sql_query("""
-- Same question as above, written with a common table expression (CTE)
-- instead of an inline subquery.
WITH q1 AS (
    SELECT CustomerId, COUNT(*) AS invoice_count, c.SupportRepId
    FROM Invoice i
    JOIN Customer c USING (CustomerId)
    WHERE InvoiceDate BETWEEN '2009-01-01' AND '2009-12-31'
    GROUP BY CustomerId
)
SELECT SupportRepId, FirstName, LastName, SUM(invoice_count) AS total_sales
FROM q1
JOIN Employee e ON q1.SupportRepId = e.EmployeeId
GROUP BY 1
ORDER BY 4 DESC
LIMIT 1;
""", conn_sqlite)

Unnamed: 0,SupportRepId,FirstName,LastName,total_sales
0,4,Margaret,Park,30
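
The two queries above rank agents by the number of invoices. If "most in sales" is read as revenue instead, one option is to sum `Invoice.Total`. This is a minimal sketch under stated assumptions: toy in-memory SQLite tables that borrow Chinook's column names, with hypothetical rows and employee names. The half-open date range is a choice made here so that an invoice stamped `2009-12-31 00:00:00` still counts when dates are compared as text.

```python
import sqlite3

# Hypothetical miniature of Employee, Customer and Invoice (toy rows only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, SupportRepId INTEGER);
CREATE TABLE Invoice (
    InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER,
    InvoiceDate TEXT, Total REAL
);
INSERT INTO Employee VALUES (3, 'Ada', 'Example'), (4, 'Bo', 'Sample');
INSERT INTO Customer VALUES (1, 3), (2, 4), (3, 4);
INSERT INTO Invoice VALUES
    (1, 1, '2009-03-01 00:00:00', 10.0),
    (2, 2, '2009-06-15 00:00:00', 4.0),
    (3, 3, '2009-12-31 00:00:00', 8.5),
    (4, 1, '2010-01-02 00:00:00', 50.0);
""")

top_agent_sql = """
SELECT e.EmployeeId, e.FirstName, e.LastName, SUM(i.Total) AS total_sales
FROM Invoice i
JOIN Customer c ON c.CustomerId = i.CustomerId
JOIN Employee e ON e.EmployeeId = c.SupportRepId
WHERE i.InvoiceDate >= '2009-01-01' AND i.InvoiceDate < '2010-01-01'
GROUP BY e.EmployeeId, e.FirstName, e.LastName
ORDER BY total_sales DESC
LIMIT 1
"""

# Revenue per agent for 2009 only; the 2010 invoice is filtered out.
top_agent = conn.execute(top_agent_sql).fetchone()
print(top_agent)
```

Listing every non-aggregated column in `GROUP BY` keeps the query portable to MySQL servers that enforce `ONLY_FULL_GROUP_BY`.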


In [None]:
# Provide a query that shows the most purchased track of 2013.


In [None]:
# Provide a query that shows the top 3 best selling artists.
