# University of Michigan Intermediate PostgreSQL Week 2 Exercise

## Musical Track Database plus Artists


This application will read an iTunes library in comma-separated-values (CSV) and produce properly normalized tables as specified below. 

We will do some things differently in this assignment. We will not use a separate "raw" table; instead, we will use `ALTER TABLE` statements to remove columns once we no longer need them (i.e., after we have converted them into foreign keys).

We will use the same <a href='https://www.pg4e.com/tools/sql/library.csv'>CSV track data</a> as in prior exercises. This time we will build a many-to-many relationship using a junction/through/join table between tracks and artists.

To grade this assignment, the program will run a query like this on your database:

```SQL
SELECT track.title, album.title, artist.name
    FROM track
    JOIN album ON track.album_id = album.id
    JOIN tracktoartist ON track.id = tracktoartist.track_id
    JOIN artist ON tracktoartist.artist_id = artist.id
    ORDER BY track.title
    LIMIT 3;
```

The expected result of this query on your database is:

| track | album | artist |
| :---- | :---- | :----- |
| A Boy Named Sue (live) | The Legend Of Johnny Cash | Johnny Cash |
| A Brief History of Packets | Computing Conversations | IEEE Computer Society |
| Aguas De Marco | Natural Wonders Music Sampler 1999 | Rosa Passos |

<br>

In this assignment we will give you a partial script with portions of some of the commands replaced by three dots…

```SQL
DROP TABLE album CASCADE;
CREATE TABLE album (
    id SERIAL,
    title VARCHAR(128) UNIQUE,
    PRIMARY KEY(id)
);

DROP TABLE track CASCADE;
CREATE TABLE track (
    id SERIAL,
    title TEXT, 
    artist TEXT, 
    album TEXT, 
    album_id INTEGER REFERENCES album(id) ON DELETE CASCADE,
    count INTEGER, 
    rating INTEGER, 
    len INTEGER,
    PRIMARY KEY(id)
);

DROP TABLE artist CASCADE;
CREATE TABLE artist (
    id SERIAL,
    name VARCHAR(128) UNIQUE,
    PRIMARY KEY(id)
);

DROP TABLE tracktoartist CASCADE;
CREATE TABLE tracktoartist (
    id SERIAL,
    track VARCHAR(128),
    track_id INTEGER REFERENCES track(id) ON DELETE CASCADE,
    artist VARCHAR(128),
    artist_id INTEGER REFERENCES artist(id) ON DELETE CASCADE,
    PRIMARY KEY(id)
);

\copy track(title,artist,album,count,rating,len) FROM 'library.csv' WITH DELIMITER ',' CSV;

INSERT INTO album (title) SELECT DISTINCT album FROM track;
UPDATE track SET album_id = (SELECT album.id FROM album WHERE album.title = track.album);

INSERT INTO tracktoartist (track, artist) SELECT DISTINCT ...

INSERT INTO artist (name) ...

UPDATE tracktoartist SET track_id = ...
UPDATE tracktoartist SET artist_id = ...

-- We are now done with these text fields
ALTER TABLE track DROP COLUMN album;
ALTER TABLE track ...
ALTER TABLE tracktoartist DROP COLUMN track;
ALTER TABLE tracktoartist ...
```

This notebook uses both the IPython magics `%sql` and `%%sql` and the Psycopg2 DB-API.  I like the simplicity of the magics, but I have not found a way to copy CSV files with them, so I rely on Psycopg2's interface for that step.

### Setting Up The Connection

In [1]:
# Import the necessary libraries
# courses_db_user_julia holds the PostgreSQL settings as a dictionary, kept in a separate file for privacy

import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from courses_db_user_julia import postgresql as settings
from pgspecial.main import PGSpecial
import psycopg2 as ps
import pandas as pd

In [2]:
# Get Version

sqlalchemy.__version__ 

'1.4.46'

In [3]:
# Create a get_engine function to get our credentials and create an engine

def get_engine(user, passwd, host, port, db):
    url = f"postgresql://{user}:{passwd}@{host}:{port}/{db}"
    engine = create_engine(url)
    return engine

In [4]:
engine = get_engine(settings['user'],
                    settings['password'],
                    settings['host'],
                    settings['port'],
                    settings['dbname'])
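
The URL built in `get_engine` works as long as the credentials contain no characters that have a special meaning in a URL. A minimal sketch of one way to guard against that, assuming the password may contain characters such as `@` or `/`: percent-encode the user name and password with the standard library before assembling the string. The helper name `build_postgres_url` and the credentials below are hypothetical and used only for illustration.

```python
from urllib.parse import quote_plus


def build_postgres_url(user, passwd, host, port, db):
    """Return a PostgreSQL URL with the user name and password percent-encoded."""
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be mistaken for URL delimiters.
    return f"postgresql://{quote_plus(user)}:{quote_plus(passwd)}@{host}:{port}/{db}"


# Hypothetical credentials, for illustration only
url = build_postgres_url("julia", "p@ss/word", "localhost", 5432, "courses")
print(url)
```

The resulting string could be passed to `create_engine` in place of the plain f-string.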

In [5]:
# Start Session

session = sessionmaker(bind=engine)()
session

<sqlalchemy.orm.session.Session at 0x7f10c8ccacd0>

#### IPython Magic!

In [6]:
# Load IPython-SQL module

%load_ext sql

In [7]:
# Create the connection from the engine URL
# The $ prefix substitutes the value of a Python variable into the magic command

%sql $engine.url

In [8]:
# Remove connection display when using magics

%config SqlMagic.displaycon = False

### 0. Create Tables

Per the instructions above, we are to create the following tables and are given the SQL commands:

In [9]:
%%sql

DROP TABLE IF EXISTS album CASCADE;
CREATE TABLE album (
    id SERIAL,
    title VARCHAR(128) UNIQUE,
    PRIMARY KEY(id)
);

DROP TABLE IF EXISTS track CASCADE;
CREATE TABLE track (
    id SERIAL,
    title TEXT, 
    artist TEXT, 
    album TEXT, 
    album_id INTEGER REFERENCES album(id) ON DELETE CASCADE,
    count INTEGER, 
    rating INTEGER, 
    len INTEGER,
    PRIMARY KEY(id)
);

DROP TABLE IF EXISTS artist CASCADE;
CREATE TABLE artist (
    id SERIAL,
    name VARCHAR(128) UNIQUE,
    PRIMARY KEY(id)
);

DROP TABLE IF EXISTS tracktoartist CASCADE;
CREATE TABLE tracktoartist (
    id SERIAL,
    track VARCHAR(128),
    track_id INTEGER REFERENCES track(id) ON DELETE CASCADE,
    artist VARCHAR(128),
    artist_id INTEGER REFERENCES artist(id) ON DELETE CASCADE,
    PRIMARY KEY(id)
);

Done.
Done.
Done.
Done.
Done.
Done.
Done.
Done.


[]

### 1. Copy 'library.csv' File Into Database

Now we need to copy the <span style='color:pink'>track data</span> into the <span style='color:green'>track table</span>.  In the PostgreSQL psql <b><u>terminal</u></b>, this can be done with the `\copy` meta-command.  We use `\copy` rather than `COPY` because `COPY FROM` instructs the PostgreSQL <b>server</b> process to read the file, so the file must be readable on the server, whereas `\copy` reads the file on the <b>client</b> side.

However, `\copy` is a psql feature and does not seem to work through the IPython magic, so we will use Psycopg2 instead.  The basic steps are:

- 1a) Create a Psycopg2 connection object
- 1b) Create a dump file object
- 1c) Create a cursor object
- 1d) Use the `cursor.copy_expert()` function to copy the csv file
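
The four steps above can also be wrapped in one helper so that the file is always closed and a failed copy is always rolled back. This is only a sketch under stated assumptions: `conn` is assumed to be an open Psycopg2 connection, and the names `copy_csv_into_table` and `copy_sql` are hypothetical, not part of the assignment.

```python
def copy_csv_into_table(conn, csv_path, copy_sql):
    """Stream a CSV file into a table via COPY ... FROM STDIN.

    `conn` is assumed to be an open Psycopg2 connection.
    """
    cur = conn.cursor()
    try:
        # The file is closed automatically when the with-block exits
        with open(csv_path, "r") as csv_file:
            cur.copy_expert(copy_sql, csv_file)
        conn.commit()    # make the copied rows permanent
    except Exception:
        conn.rollback()  # clear the failed transaction so later commands can run
        raise
    finally:
        cur.close()


# Example call (needs a live connection, so it is left commented out):
# copy_csv_into_table(
#     conn,
#     "library.csv",
#     "COPY track (title, artist, album, count, rating, len) "
#     "FROM STDIN WITH (FORMAT CSV, DELIMITER ',');",
# )
```

The sections below walk through the same steps one cell at a time, which is easier to follow when something goes wrong.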

#### 1a) Create a Psycopg2 connection object

In [10]:
conn = ps.connect(user=settings['user'],
                  password=settings['password'],
                  host=settings['host'],
                  port=settings['port'],
                  dbname=settings['dbname'])

#### 1b) Create a dump file object

In [11]:
# Open the CSV file for reading
# The resulting file object is what gets streamed into our table

file_obj = open('/mnt/a/docker_share/SQL/library.csv', 'r') 

#### 1c) Create a cursor object

Psycopg2 requires a cursor object to send commands to the PostgreSQL server.

In [12]:
cur = conn.cursor()

#### 1d) Use the `cursor.copy_expert()` function to copy our csv file

In general, the basic syntax is: &ensp;`.copy_expert(sql, file)`

More specifically...

-------------
The basic syntax to copy <b><u>FROM</u></b> a file to a table is: 

(note: STDIN is short for standard input -- STDIN is an input stream where data is sent to and read by a program)

```Python
cursor.copy_expert(
    """
    COPY table_where_csv_data_goes
    FROM STDIN
    WITH (
        FORMAT CSV,
        DELIMITER ',',
        HEADER
    );
    """,
    file_object  # opened for reading
)
```

--------------
The basic syntax to copy a table <b><u>OUT</u></b> to save a file:

(note: STDOUT is short for standard output -- STDOUT is an output stream that a program writes data to)

```Python
cursor.copy_expert(
    """
    COPY table_to_save
    TO STDOUT
    WITH (
        FORMAT CSV,
        DELIMITER ',',
        HEADER
    );
    """,
    file_object  # opened for writing
)
```
<p style='color:orange'>Change the format, delimiter and/or header as required.</p>
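
Because `copy_expert` only needs a file-like object, the data does not have to live on disk. The sketch below builds a small CSV in memory with `io.StringIO`. The second row is hypothetical and is there only to show that the `csv` module quotes a field containing a comma, which `FORMAT CSV` expects. The final call is left as a comment because it needs a live cursor.

```python
import csv
import io

# Sample rows in the same column order as the COPY command:
# title, artist, album, count, rating, len
rows = [
    ("Another One Bites The Dust", "Queen", "Greatest Hits", 55, 100, 217),
    ("Title, With A Comma", "Some Artist", "Some Album", 1, 20, 180),  # hypothetical
]

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)  # quotes fields containing commas
buffer.seek(0)  # rewind so a reader starts at the first row

print(buffer.getvalue())

# With a live Psycopg2 cursor `cur`, the buffer could replace the file object:
# cur.copy_expert("COPY track (title, artist, album, count, rating, len) "
#                 "FROM STDIN WITH (FORMAT CSV, DELIMITER ',');", buffer)
```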

In [13]:
# Check if there is any data in the track table -- there shouldn't be 

%sql SELECT * FROM track LIMIT 1;

0 rows affected.


id,title,artist,album,album_id,count,rating,len


In [14]:
# Copy the data from the input stream to the track table

cur.copy_expert("COPY track (title, artist, album, count, rating, len) FROM STDIN WITH (FORMAT CSV, DELIMITER ',');", file_obj)

<p><span style='color:red; font-size:20px'>&#x26A0;&ensp;Error?</span> &ensp;&ensp;&ensp; You need to roll back the transaction or nothing else will execute.</p>

In [None]:
# If you make a mistake or hit an error, uncomment the line below, run it, and try again

# conn.rollback()

<span style='color:green; font-size:20px'>&#x2705;&ensp;Everything OK?</span>&ensp;&ensp;&ensp; Commit!

In [15]:
# If everything worked and you want the changes to persist, commit the transaction

conn.commit()

In [16]:
# Test to make sure everything worked

%sql SELECT * FROM track LIMIT 3;

3 rows affected.


id,title,artist,album,album_id,count,rating,len
1,Another One Bites The Dust,Queen,Greatest Hits,,55,100,217
2,Asche Zu Asche,Rammstein,Herzeleid,,79,100,231
3,Beauty School Dropout,Various,Grease,,48,100,239


As an aside, Psycopg2 can also run other SQL commands through the `cursor.execute()` method, and several commands can be sent in a single call.

For example:

```Python
cursor.execute ("""
    
DROP TABLE IF EXISTS track CASCADE;
CREATE TABLE track (
    id SERIAL,
    title TEXT, 
    artist TEXT, 
    album TEXT, 
    album_id INTEGER REFERENCES album(id) ON DELETE CASCADE,
    count INTEGER, 
    rating INTEGER, 
    len INTEGER,
    PRIMARY KEY(id));
    
""")
```

### 2. Code given to us to execute

The following SQL statements were provided to us.  They populate the album table from the track data imported above and then link each track to its album.

In [17]:
%%sql

INSERT INTO album (title) SELECT DISTINCT album FROM track;
UPDATE track SET album_id = (SELECT album.id FROM album WHERE album.title = track.album);

41 rows affected.
296 rows affected.


[]
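
The two statements above follow a common pattern: insert each distinct name once, then use a correlated subquery to look up the new id for every row. A minimal sketch of that pattern on a tiny data set follows. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so it runs without a server, and the third row is hypothetical, added so that two tracks share an album.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
    CREATE TABLE track (
        id INTEGER PRIMARY KEY,
        title TEXT,
        album TEXT,
        album_id INTEGER REFERENCES album(id)
    );
""")

# Illustrative rows only; two tracks share one album.
conn.executemany(
    "INSERT INTO track (title, album) VALUES (?, ?)",
    [
        ("Another One Bites The Dust", "Greatest Hits"),
        ("Asche Zu Asche", "Herzeleid"),
        ("Hypothetical Second Track", "Greatest Hits"),
    ],
)

# Step 1: one album row per distinct album name.
conn.execute("INSERT INTO album (title) SELECT DISTINCT album FROM track")

# Step 2: the correlated subquery looks up each track's album id by name.
conn.execute("""
    UPDATE track
    SET album_id = (SELECT album.id FROM album WHERE album.title = track.album)
""")

album_count = conn.execute("SELECT COUNT(*) FROM album").fetchone()[0]
unmatched = conn.execute(
    "SELECT COUNT(*) FROM track WHERE album_id IS NULL"
).fetchone()[0]
print(album_count, unmatched)
conn.close()
```

Counting rows whose `album_id` is still `NULL` is a quick way to confirm that every track found its album.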

### 3. Complete the following code

```SQL
INSERT INTO tracktoartist (track, artist) SELECT DISTINCT ...

INSERT INTO artist (name) ...

UPDATE tracktoartist SET track_id = ...
UPDATE tracktoartist SET artist_id = ...
```

This is similar to what we did in the Musical Tracks Many-to-One exercise.  First, recall the schema for the <span style='color:green'>tracktoartist table</span>:

In [18]:
%%sql

SELECT
    column_name,
    data_type
FROM
    information_schema.columns
WHERE
    table_name='tracktoartist';

5 rows affected.


column_name,data_type
id,integer
track_id,integer
artist_id,integer
track,character varying
artist,character varying


#### 3a) Populate tracktoartist and artist tables using track table

In [19]:
%%sql

INSERT INTO tracktoartist (track, artist) SELECT DISTINCT title, artist FROM track;

INSERT INTO artist (name) SELECT DISTINCT artist FROM track;

296 rows affected.
51 rows affected.


[]

Verify it worked:

In [20]:
%sql tractoartist_table << SELECT * FROM tracktoartist LIMIT 2;
%sql album_table << SELECT * FROM album LIMIT 2;
%sql artist_table << SELECT * FROM artist LIMIT 2;

2 rows affected.
Returning data to local variable tractoartist_table
2 rows affected.
Returning data to local variable album_table
2 rows affected.
Returning data to local variable artist_table


In [21]:
print('Track-to-Artist:\n {} \n Album:\n {} \n Artist:\n {}'.format(tractoartist_table, album_table, artist_table))

Track-to-Artist:
 +----+--------------------------------------+----------+---------------+-----------+
| id |                track                 | track_id |     artist    | artist_id |
+----+--------------------------------------+----------+---------------+-----------+
| 1  | Jack the Stripper/Fairies Wear Boots |   None   | Black Sabbath |    None   |
| 2  |            Asche Zu Asche            |   None   |   Rammstein   |    None   |
+----+--------------------------------------+----------+---------------+-----------+ 
 Album:
 +----+------------------------+
| id |         title          |
+----+------------------------+
| 1  | Peanut Butter and Jam  |
| 2  |     Greatest Hits      |
+----+------------------------+ 
 Artist:
 +----+------------------+
| id |       name       |
+----+------------------+
| 1  | The Black Crowes |
| 2  |  Chris Spheeris  |
+----+------------------+


#### 3b) Generate the foreign keys for the tracktoartist table using primary keys from the track & artist tables

You will need to get the <span style='color:pink'>track_id</span> from the <span style='color:green'>track table</span> and the <span style='color:pink'>artist_id</span> from the <span style='color:green'>artist table</span>.

In [22]:
%%sql

UPDATE tracktoartist SET track_id = (SELECT track.id FROM track WHERE track.title = tracktoartist.track);
UPDATE tracktoartist SET artist_id = (SELECT artist.id FROM artist WHERE artist.name = tracktoartist.artist);

-- Verify
SELECT * FROM tracktoartist LIMIT 3;

296 rows affected.
296 rows affected.
3 rows affected.


id,track,track_id,artist,artist_id
3,Heavy,153,Brent,35
1,Jack the Stripper/Fairies Wear Boots,25,Black Sabbath,6
2,Asche Zu Asche,2,Rammstein,10


In [23]:
%sql SELECT * FROM tracktoartist ORDER BY track LIMIT 3;

3 rows affected.


id,track,track_id,artist,artist_id
46,A Boy Named Sue (live),102,Johnny Cash,34
206,A Brief History of Packets,224,IEEE Computer Society,16
72,Aguas De Marco,124,Rosa Passos,11
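
One caveat about the `UPDATE ... SET track_id = (SELECT ...)` statement: the subquery matches on title alone, and PostgreSQL raises an error when a subquery used as an expression returns more than one row. The run above succeeded, which suggests titles are unique in this file, but a quick check before loading other data can help. The sketch below is an assumption-laden illustration: the helper name `duplicate_titles` and the in-memory rows are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_titles(csv_file):
    """Return titles that appear more than once in the first CSV column."""
    counts = Counter(row[0] for row in csv.reader(csv_file) if row)
    return sorted(title for title, n in counts.items() if n > 1)


# Hypothetical in-memory data; with the real file one would pass
# open('library.csv', newline='') instead.
sample = io.StringIO(
    "Heavy,Brent,Album A,1,100,200\n"
    "Heavy,Another Artist,Album B,2,80,210\n"
    "Asche Zu Asche,Rammstein,Herzeleid,79,100,231\n"
)
duplicates = duplicate_titles(sample)
print(duplicates)
```

An empty list means the title-only match is safe for that file.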


### 4. Clean up

We will use `ALTER TABLE` statements to remove the columns we converted into foreign keys.  The code given to us is as follows:

```sql
ALTER TABLE track DROP COLUMN album;
ALTER TABLE track ...
ALTER TABLE tracktoartist DROP COLUMN track;
ALTER TABLE tracktoartist ...
SELECT track.title, album.title, artist.name 
    FROM track
    JOIN album ON track.album_id = album.id
    JOIN tracktoartist ON track.id = tracktoartist.track_id
    JOIN artist ON tracktoartist.artist_id = artist.id
    LIMIT 3;
```

In [24]:
%%sql

ALTER TABLE track DROP COLUMN album;
ALTER TABLE track DROP COLUMN artist;
ALTER TABLE tracktoartist DROP COLUMN track;
ALTER TABLE tracktoartist DROP COLUMN artist;

Done.
Done.
Done.
Done.


[]

### 5. To grade this assignment, the program will run a query like this on your database:

```SQL
SELECT track.title, album.title, artist.name
    FROM track
    JOIN album ON track.album_id = album.id
    JOIN tracktoartist ON track.id = tracktoartist.track_id
    JOIN artist ON tracktoartist.artist_id = artist.id
    ORDER BY track.title
    LIMIT 3;
```

The expected result of this query on your database is:

| track | album | artist |
| :---- | :---- | :----- |
| A Boy Named Sue (live) | The Legend Of Johnny Cash | Johnny Cash |
| A Brief History of Packets | Computing Conversations | IEEE Computer Society |
| Aguas De Marco | Natural Wonders Music Sampler 1999 | Rosa Passos |


So, let's check our work:

In [25]:
%%sql

SELECT track.title AS track, album.title AS album, artist.name AS artist
    FROM track
    JOIN album ON track.album_id = album.id
    JOIN tracktoartist ON track.id = tracktoartist.track_id
    JOIN artist ON tracktoartist.artist_id = artist.id
    ORDER BY track.title
    LIMIT 3;

3 rows affected.


track,album,artist
A Boy Named Sue (live),The Legend Of Johnny Cash,Johnny Cash
A Brief History of Packets,Computing Conversations,IEEE Computer Society
Aguas De Marco,Natural Wonders Music Sampler 1999,Rosa Passos


<span style='color:green; font-size:20px'>&#x2705;&ensp;Success!&ensp;&#x1F389;</span>
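
In this data set every track ends up with exactly one artist, so the junction table may look redundant. The sketch below shows what it allows: a single track linked to two artists, which the same join returns as two rows. It uses `sqlite3` as a stand-in for PostgreSQL, and every row in it is hypothetical.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tracktoartist (
        id INTEGER PRIMARY KEY,
        track_id INTEGER REFERENCES track(id),
        artist_id INTEGER REFERENCES artist(id)
    );
    -- Hypothetical rows: one track credited to two artists
    INSERT INTO track (id, title) VALUES (1, 'Hypothetical Duet');
    INSERT INTO artist (id, name) VALUES (1, 'Artist A'), (2, 'Artist B');
    INSERT INTO tracktoartist (track_id, artist_id) VALUES (1, 1), (1, 2);
""")

rows = conn.execute("""
    SELECT track.title, artist.name
    FROM track
    JOIN tracktoartist ON track.id = tracktoartist.track_id
    JOIN artist ON tracktoartist.artist_id = artist.id
    ORDER BY track.title, artist.name
""").fetchall()
print(rows)
conn.close()
```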

<p style='color:red; font-size:22px'>Make sure you CLOSE all connections once you're done:</p>

In [26]:
magic_connections = %sql -l
[c.session.close() for c in magic_connections.values()]

[None]

In [27]:
session.close()
engine = session.get_bind()
engine.dispose() 
file_obj.close()
cur.close()
conn.close()
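
Closing everything by hand is easy to forget when a cell fails halfway. One alternative is `contextlib.closing`, which calls `.close()` when the block exits even if an exception was raised. Note that Psycopg2's own `with conn:` block only ends the transaction and does not close the connection. The sketch below uses `sqlite3` as a stand-in so it runs without a server; the same wrapping should apply to a Psycopg2 connection and cursor.

```python
import sqlite3
from contextlib import closing

# sqlite3 stands in for Psycopg2 so the sketch runs without a server.
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cur:
        cur.execute("SELECT 1")
        value = cur.fetchone()[0]

# Both objects are closed here, even if the body had raised.
try:
    conn.execute("SELECT 1")
    still_open = True
except sqlite3.ProgrammingError:
    still_open = False

print(value, still_open)
```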