# University of Michigan Intermediate PostgreSQL Week 2 Exercise

## Musical Tracks Many-to-One

This application will read an iTunes library in comma-separated-values (CSV) format and produce properly normalized tables as specified below. 

Here is the structure of the tables you will need for this assignment:

```SQL
CREATE TABLE album (
  id SERIAL,
  title VARCHAR(128) UNIQUE,
  PRIMARY KEY(id)
);

CREATE TABLE track (
    id SERIAL,
    title VARCHAR(128),
    len INTEGER, rating INTEGER, count INTEGER,
    album_id INTEGER REFERENCES album(id) ON DELETE CASCADE,
    UNIQUE(title, album_id),
    PRIMARY KEY(id)
);

DROP TABLE IF EXISTS track_raw;
CREATE TABLE track_raw (
    title TEXT, 
    artist TEXT, 
    album TEXT, 
    album_id INTEGER,
    count INTEGER, 
    rating INTEGER, 
    len INTEGER
);
```

We will ignore the artist field for this assignment and focus on the many-to-one relationship between tracks and albums.
If you run the program multiple times in testing or with different files, make sure to empty out the data before each run.
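
One way to empty the tables between runs is a single `TRUNCATE` statement. The sketch below only builds the SQL text. The helper name `reset_statement` is hypothetical, and the resulting statement would still have to be executed through a `%sql` magic or a Psycopg2 cursor. `RESTART IDENTITY` resets the `SERIAL` counters so primary keys start from 1 again, and `CASCADE` also truncates any table that references the listed ones.

```python
def reset_statement(tables=("track_raw", "track", "album")):
    """Build one TRUNCATE statement that empties all of the given tables."""
    table_list = ", ".join(tables)
    return f"TRUNCATE TABLE {table_list} RESTART IDENTITY CASCADE;"


print(reset_statement())
# prints TRUNCATE TABLE track_raw, track, album RESTART IDENTITY CASCADE;
```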

Your assignment consists of the following after creating the tables above:

1. Load <a href = 'https://www.pg4e.com/tools/sql/library.csv?PHPSESSID=d19b381f98606474b340f9f53d19889d%22'>this CSV data file</a> into the `track_raw` table using the `\copy` command.

2. Write SQL commands to insert all of the <b>distinct</b> albums into the <span style='color:green'>album table</span>  (creating their primary keys).

3. <b>Set</b> the <span style='color:pink'>album_id</span> in the <span style='color:green'>track_raw table</span> using an SQL query like:
```SQL 
UPDATE track_raw SET album_id = (SELECT album.id FROM album WHERE album.title = track_raw.album);
```

4. Use an `INSERT ... SELECT` statement to copy the corresponding data from the <span style='color:green'>track_raw table </span> to the <span style='color:green'>track table</span>, effectively dropping the artist and album text fields.

5. To grade this assignment, the auto-grader will run a query like this on your database:
 
```SQL
SELECT track.title, album.title
    FROM track
    JOIN album ON track.album_id = album.id
    ORDER BY track.title LIMIT 3;
```    
The expected result of this query on your database is:

| track	| album |
| :---- | :---- |
| A Boy Named Sue (live) | The Legend Of Johnny Cash |
| A Brief History of Packets | Computing Conversations |
| Aguas De Marco | Natural Wonders Music Sampler 1999 |


This notebook uses the IPython magics `%sql` and `%%sql` together with the Psycopg2 DB-API. I like the simplicity of the magics, but I have not found a way to copy a CSV file with them, so the CSV load relies on Psycopg2's interface instead.

### Setting Up The Connection

In [1]:
# Import the necessary libraries
# courses_db_user_julia is a local module that holds the PostgreSQL settings as a dictionary, kept separate for privacy

import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from courses_db_user_julia import postgresql as settings
from pgspecial.main import PGSpecial
import psycopg2 as ps
import pandas as pd

In [2]:
# Get Version

sqlalchemy.__version__ 

'1.4.46'

In [3]:
# Create a get_engine function to get our credentials and create an engine

def get_engine(user, passwd, host, port, db):
    url = f"postgresql://{user}:{passwd}@{host}:{port}/{db}"
    engine = create_engine(url)
    return engine
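
If the password contains characters such as `@` or `/`, a URL built with a plain f-string can be parsed incorrectly. A minimal sketch of one workaround, assuming the credentials are plain strings: percent-encode the user name and password with the standard library before formatting. The function name `build_postgres_url` and the sample credentials are hypothetical.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, passwd, host, port, db):
    """Return a PostgreSQL URL with the user name and password percent-encoded."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(passwd)}@{host}:{port}/{db}"


print(build_postgres_url("julia", "p@ss/word", "localhost", 5432, "courses"))
# prints postgresql://julia:p%40ss%2Fword@localhost:5432/courses
```

The resulting string can be passed to `create_engine()` in place of the unescaped URL.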

In [4]:
engine = get_engine(settings['user'],
                    settings['password'],
                    settings['host'],
                    settings['port'],
                    settings['dbname'])

In [5]:
# Start Session

session = sessionmaker(bind=engine)()
session

<sqlalchemy.orm.session.Session at 0x7fcfb9ffafd0>

#### IPython Magic!

In [6]:
# Load IPython-SQL module

%load_ext sql

In [7]:
# Create the connection using $
# IPython expands $name to the value of that Python variable, so the magic receives the engine URL

%sql $engine.url

In [8]:
# Remove connection display when using magics

%config SqlMagic.displaycon = False

### 0. Create Tables

Per the instructions above, we are to create 3 tables:

- album
- track
- track_raw

There is one problem with the code above for the `track_raw` table: it already contains an `album_id` column, but the CSV file does not contain album_id data. I therefore leave the column out here and add it in step 3, where it is filled in from the `album` table.

In [9]:
%%sql

DROP TABLE IF EXISTS album CASCADE;
CREATE TABLE album (
  id SERIAL,
  title VARCHAR(128) UNIQUE,
  PRIMARY KEY(id)
);

DROP TABLE IF EXISTS track CASCADE;
CREATE TABLE track (
    id SERIAL,
    title VARCHAR(128),
    len INTEGER, rating INTEGER, count INTEGER,
    album_id INTEGER REFERENCES album(id) ON DELETE CASCADE,
    UNIQUE(title, album_id),
    PRIMARY KEY(id)
);

DROP TABLE IF EXISTS track_raw CASCADE;
CREATE TABLE track_raw (
    title TEXT, 
    artist TEXT, 
    album TEXT,
    count INTEGER, 
    rating INTEGER, 
    len INTEGER
); 

Done.
Done.
Done.
Done.
Done.
Done.


[]

### 1. Copy 'library.csv' File Into Database

Now we need to copy the <span style='color:pink'>track_raw data</span> into the <span style='color:green'>track_raw table</span>.  This can be done with the `\copy` meta-command in the PostgreSQL psql <b><u>terminal</u></b>.  We use `\copy` rather than `COPY` because `COPY FROM` instructs the PostgreSQL <b>server</b> process to read the file, whereas `\copy` reads the file on the <b>client</b> side and sends its contents to the server.

However, `\copy` does not seem to work through the IPython magic, so we will use Psycopg2 instead.  The basic steps are:

- 1a) Create a Psycopg2 connection object
- 1b) Create a dump file object
- 1c) Create a cursor object
- 1d) Use the `cursor.copy_expert()` function to copy the csv file
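
Steps 1a to 1d can also be collected into one helper. This is a sketch, not the notebook's actual code: the name `load_track_raw` is hypothetical, and it assumes `conn` is an open Psycopg2 connection whose cursors provide `copy_expert()`. It commits on success and rolls back on failure so the connection stays usable.

```python
def load_track_raw(conn, csv_file):
    """Stream csv_file into track_raw; commit on success, roll back on error."""
    copy_sql = (
        "COPY track_raw (title, artist, album, count, rating, len) "
        "FROM STDIN WITH (FORMAT CSV, DELIMITER ',')"
    )
    cur = conn.cursor()
    try:
        cur.copy_expert(copy_sql, csv_file)  # the client streams the file to the server
        conn.commit()
    except Exception:
        conn.rollback()  # clear the failed transaction before re-raising
        raise
    finally:
        cur.close()
```

A hypothetical call would look like `with open('library.csv') as f: load_track_raw(conn, f)`.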

#### 1a) Create a Psycopg2 connection object

In [10]:
conn = ps.connect(user=settings['user'],
                  password=settings['password'],
                  host=settings['host'],
                  port=settings['port'],
                  dbname=settings['dbname'])

#### 1b) Create a dump file object

In [11]:
# Open the CSV file for reading
# Psycopg2 will stream its contents to the server as the COPY input

file_obj = open('/mnt/a/docker_share/SQL/library.csv', 'r') 
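
Before handing the file to the server, it can help to confirm that every row has the six fields that `track_raw` expects, since by default a malformed row aborts the whole `COPY`. A minimal sketch using the standard `csv` module follows. The helper name `count_rows` is hypothetical, and the function rewinds the file so the same object can still be passed to `copy_expert()` afterwards.

```python
import csv
import io


def count_rows(file_obj, expected_cols=6):
    """Count CSV rows, raising ValueError if any row has the wrong field count."""
    rows = 0
    for row_no, row in enumerate(csv.reader(file_obj), start=1):
        if len(row) != expected_cols:
            raise ValueError(
                f"row {row_no} has {len(row)} fields, expected {expected_cols}"
            )
        rows += 1
    file_obj.seek(0)  # rewind so the same object can be reused for COPY
    return rows


# Two sample rows in the same layout as library.csv
sample = io.StringIO(
    "Another One Bites The Dust,Queen,Greatest Hits,55,100,217\n"
    "Asche Zu Asche,Rammstein,Herzeleid,79,100,231\n"
)
print(count_rows(sample))  # prints 2
```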

#### 1c) Create a cursor object

Psycopg2 requires the use of a cursor object in order to execute commands to the PostgreSQL server.

In [12]:
cur = conn.cursor()

#### 1d) Use the `cursor.copy_expert()` function to copy our csv file

In general, the basic syntax is: &ensp;`.copy_expert(sql, file)`

More specifically...

-------------
The basic syntax to copy <b><u>FROM</u></b> a file to a table is: 

(note: STDIN is short for standard input -- STDIN is an input stream where data is sent to and read by a program)

```Python
cursor.copy_expert(
    """
    COPY table_where_csv_data_goes
    FROM STDIN
    WITH (
        FORMAT CSV,
        DELIMITER ',',
        HEADER
    );
    """,
    file_object,
)
```

--------------
The basic syntax to copy a table <b><u>OUT</u></b> to save a file:

(note: STDOUT is short for standard output, the stream a program writes its output to)

```Python
cursor.copy_expert(
    """
    COPY table_to_save
    TO STDOUT
    WITH (
        FORMAT CSV,
        DELIMITER ',',
        HEADER
    );
    """,
    file_object,
)
```
<p style='color:orange'>Change the format, delimiter and/or header as required.</p>

In [13]:
# Check if there is any data in the track_raw table -- there shouldn't be 

%sql SELECT * FROM track_raw LIMIT 1;

0 rows affected.


title,artist,album,count,rating,len


In [14]:
# Copy the data from the input stream to the track_raw table

cur.copy_expert("COPY track_raw (title, artist, album, count, rating, len) FROM STDIN WITH (FORMAT CSV, DELIMITER ',');", file_obj)

<p><span style='color:red; font-size:20px'>&#x26A0;&ensp;Error?</span> &ensp;&ensp;&ensp; You need to rollback the command or nothing else will execute.</p>

In [None]:
# If you make a mistake or there is an error, uncomment the code below and try again

# conn.rollback()

<span style='color:green; font-size:20px'>&#x2705;&ensp;Everything OK?</span>&ensp;&ensp;&ensp; Commit!

In [15]:
# If everything worked and you want the changes to persist, commit the transaction

conn.commit()

In [16]:
# Test to make sure everything worked

%sql SELECT * FROM track_raw LIMIT 3;

3 rows affected.


title,artist,album,count,rating,len
Another One Bites The Dust,Queen,Greatest Hits,55,100,217
Asche Zu Asche,Rammstein,Herzeleid,79,100,231
Beauty School Dropout,Various,Grease,48,100,239


As an aside, Psycopg2 can also run other SQL commands through the `cursor.execute()` method.  Multiple SQL commands separated by semicolons can be sent in one call.

For example:

```Python
cursor.execute("""

DROP TABLE IF EXISTS track_raw;
CREATE TABLE track_raw (
    title TEXT, 
    artist TEXT, 
    album TEXT, 
    count INTEGER, 
    rating INTEGER, 
    len INTEGER
    );
    
""")
```


### 2. Write SQL commands to insert all of the distinct albums into the album table (creating their primary keys)

First, recall the <span style='color:green'>album table</span> schema:

In [17]:
%%sql

SELECT 
    table_name, 
   column_name, 
   data_type 
FROM 
   information_schema.columns
WHERE 
   table_name = 'album';

2 rows affected.


table_name,column_name,data_type
album,id,integer
album,title,character varying


In [18]:
%%sql

INSERT INTO album (title) SELECT DISTINCT album FROM track_raw;

-- Verify 
SELECT * FROM album LIMIT 2;

41 rows affected.
2 rows affected.


id,title
1,Peanut Butter and Jam
2,Greatest Hits


### 3. Set the album_id in the track_raw table

The assignment suggests a query like:

```SQL
UPDATE track_raw SET album_id = (SELECT album.id FROM album WHERE album.title = track_raw.album);
```

First, we need to add this column into the <span style='color:green'>track_raw table</span>:

In [19]:
%sql ALTER TABLE track_raw ADD COLUMN album_id INT;

Done.


[]

In [20]:
%%sql 

UPDATE track_raw SET album_id = (SELECT album.id FROM album WHERE album.title = track_raw.album);

-- Verify
SELECT * FROM track_raw LIMIT 2;

296 rows affected.
2 rows affected.


title,artist,album,count,rating,len,album_id
Another One Bites The Dust,Queen,Greatest Hits,55,100,217,2
Asche Zu Asche,Rammstein,Herzeleid,79,100,231,30


In [21]:
%sql SELECT * FROM track_raw ORDER BY album_id LIMIT 2;

2 rows affected.


title,artist,album,count,rating,len,album_id
Depression in Session,Brent,Peanut Butter and Jam,4,,213,1
Another One Bites The Dust,Queen,Greatest Hits,55,100.0,217,2
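
The correlated subquery leaves `album_id` as NULL for any row whose album text has no match in `album`, so it is worth counting those rows before moving on. The sketch below replays steps 2 and 3 on three sample rows. It uses Python's built-in `sqlite3` as a stand-in so that it runs without a server. That substitution, the `INTEGER PRIMARY KEY` column replacing `SERIAL`, and the helper name `link_albums` are assumptions of the example, not part of the assignment.

```python
import sqlite3


def link_albums(rows):
    """Replay steps 2 and 3 on (title, album) pairs; return (albums, unmatched)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
        CREATE TABLE track_raw (title TEXT, album TEXT, album_id INTEGER);
    """)
    db.executemany("INSERT INTO track_raw (title, album) VALUES (?, ?)", rows)
    # Step 2: one album row per distinct album title
    db.execute("INSERT INTO album (title) SELECT DISTINCT album FROM track_raw")
    # Step 3: look up each row's album id with a correlated subquery
    db.execute("""
        UPDATE track_raw SET album_id =
            (SELECT album.id FROM album WHERE album.title = track_raw.album)
    """)
    albums = db.execute("SELECT COUNT(*) FROM album").fetchone()[0]
    unmatched = db.execute(
        "SELECT COUNT(*) FROM track_raw WHERE album_id IS NULL"
    ).fetchone()[0]
    db.close()
    return albums, unmatched


sample = [
    ("Another One Bites The Dust", "Greatest Hits"),
    ("Asche Zu Asche", "Herzeleid"),
    ("Beauty School Dropout", "Grease"),
]
print(link_albums(sample))  # prints (3, 0)
```

An unmatched count above zero would point to rows with a missing album value, which the final join would silently drop.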


### 4. Copy the corresponding data from the track_raw table to the track table

Use an INSERT ... SELECT statement to copy the corresponding data from the track_raw table to the track table, effectively dropping the artist and album text fields.

First, recall <span style='color:green'>track table</span> schema:

In [22]:
%%sql

SELECT 
    table_name, 
   column_name, 
   data_type 
FROM 
   information_schema.columns
WHERE 
   table_name = 'track';  

6 rows affected.


table_name,column_name,data_type
track,id,integer
track,len,integer
track,rating,integer
track,count,integer
track,album_id,integer
track,title,character varying


In [23]:
%%sql

INSERT INTO track (len, rating, count, album_id, title) SELECT len, rating, count, album_id, title FROM track_raw;

-- Verify
SELECT * FROM track LIMIT 2;

296 rows affected.
2 rows affected.


id,title,len,rating,count,album_id
1,Another One Bites The Dust,217,100,55,2
2,Asche Zu Asche,231,100,79,30
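
Because `track` declares `UNIQUE(title, album_id)`, running the `INSERT ... SELECT` a second time without emptying the table first fails with a uniqueness error rather than silently duplicating rows. The sketch below illustrates this with Python's built-in `sqlite3` as a stand-in for PostgreSQL, an assumption made so the example runs without a server. PostgreSQL would report a unique-violation error through Psycopg2 rather than `sqlite3.IntegrityError`. The helper name `copy_tracks` is hypothetical, and the tables are cut down to three columns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE track_raw (title TEXT, len INTEGER, album_id INTEGER);
    CREATE TABLE track (
        id INTEGER PRIMARY KEY,
        title TEXT,
        len INTEGER,
        album_id INTEGER,
        UNIQUE(title, album_id)
    );
    INSERT INTO track_raw VALUES ('Another One Bites The Dust', 217, 2);
    INSERT INTO track_raw VALUES ('Asche Zu Asche', 231, 30);
""")


def copy_tracks(conn):
    """Step 4: copy the normalized columns from track_raw into track."""
    with conn:  # commits on success, rolls back if the statement fails
        conn.execute(
            "INSERT INTO track (title, len, album_id) "
            "SELECT title, len, album_id FROM track_raw"
        )


copy_tracks(db)  # first run succeeds
try:
    copy_tracks(db)  # second run violates UNIQUE(title, album_id)
    rerun_failed = False
except sqlite3.IntegrityError:
    rerun_failed = True

track_count = db.execute("SELECT COUNT(*) FROM track").fetchone()[0]
print(rerun_failed, track_count)  # prints True 2
```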


### 5. To grade this assignment, the auto-grader will run a query like this on your database:
 
```SQL
SELECT track.title, album.title
    FROM track
    JOIN album ON track.album_id = album.id
    ORDER BY track.title LIMIT 3;
```    
The expected result of this query on your database is:

| track	| album |
| :---- | :---- |
| A Boy Named Sue (live) | The Legend Of Johnny Cash |
| A Brief History of Packets | Computing Conversations |
| Aguas De Marco | Natural Wonders Music Sampler 1999 |

So, let's check our work:

In [24]:
%%sql

SELECT track.title AS track, album.title AS album
    FROM track
    JOIN album ON track.album_id = album.id
    ORDER BY track.title LIMIT 3;

3 rows affected.


track,album
A Boy Named Sue (live),The Legend Of Johnny Cash
A Brief History of Packets,Computing Conversations
Aguas De Marco,Natural Wonders Music Sampler 1999


<span style='color:green; font-size:20px'>&#x2705;&ensp;Success!&ensp;&#x1F389;</span>

<p style='color:red; font-size:22px'>Make sure you CLOSE all connections once you're done:</p>

In [25]:
magic_connections = %sql -l
[c.session.close() for c in magic_connections.values()]

[None]

In [26]:
session.close()
engine = session.get_bind()
engine.dispose() 
file_obj.close()
cur.close()
conn.close()
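
The explicit `close()` calls above are skipped if an earlier statement raises. A hedged alternative is to wrap each resource in `contextlib.closing`, which calls `close()` on exit whether or not an error occurred. The sketch uses an in-memory `io.StringIO` as a stand-in for the CSV file. The same pattern applies to any object with a `close()` method, such as a Psycopg2 connection or cursor. Note that with Psycopg2, `with conn:` on its own ends the transaction but does not close the connection.

```python
import io
from contextlib import closing

# Stand-in for the opened CSV file
resource = io.StringIO("Another One Bites The Dust,Queen,Greatest Hits,55,100,217\n")

try:
    with closing(resource) as f:
        first_field = f.readline().split(",")[0]
        raise RuntimeError("simulated failure after reading")
except RuntimeError:
    pass  # the resource has already been closed by this point

print(first_field, resource.closed)  # prints Another One Bites The Dust True
```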