# Database Design




## From Excel to DB to Answers

For this lab you will take an MS Excel file __with three tabs__ of data and use it to create a database in PostgreSQL.

The file is located under datasets as [Module4Data.xlsx](../../../datasets/Module4Data.xlsx).

### Data 
The three tabs of data are:
  1. Artist - contains artist name, genre, and year formed
  1. Albums - contains artist name, album title, and year produced
  1. Songs - contains album title, song name, and song length

You will notice that the last tab, Songs, is similar to the unnormalized data discussed in previous examples.
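To see the redundancy concretely, here is a minimal Python sketch using invented rows shaped like the Songs tab (album title, song name, song length). The values are hypothetical, not taken from Module4Data.xlsx, and the sketch assumes album titles are unique and that songs are listed in track order; check both assumptions against the real data.

```python
# Hypothetical rows shaped like the Songs tab: (album title, song name, song length).
# These values are invented for illustration; they are not from Module4Data.xlsx.
songs_tab = [
    ("Album A", "Song 1", "3:15"),
    ("Album A", "Song 2", "4:02"),
    ("Album B", "Song 3", "2:48"),
]

# Each album title repeats once per song; that repetition is the redundancy
# normalization removes. Give each distinct title a single surrogate album_id.
album_ids = {}
for title, _, _ in songs_tab:
    if title not in album_ids:
        album_ids[title] = len(album_ids) + 1

# Song rows now carry only the album_id plus a per-album track number
# (assumes the rows appear in track order within each album).
song_rows = []
track_counter = {}
for title, name, length in songs_tab:
    album_id = album_ids[title]
    track_counter[album_id] = track_counter.get(album_id, 0) + 1
    song_rows.append((album_id, track_counter[album_id], name, length))

print(album_ids)  # {'Album A': 1, 'Album B': 2}
print(song_rows)  # [(1, 1, 'Song 1', '3:15'), (1, 2, 'Song 2', '4:02'), (2, 1, 'Song 3', '2:48')]
```

Each album title is now stored once, and the song rows refer to it by number, which is the same split the Album and Song tables make in the definitions that follow.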


### Methodology

Please recall the steps of the database design process!

  1. Discovery
  1. Modelling
  1. Defining

### Discovery 

Take a moment now to look at the data. Identify the entities that are relevant to your database and what their attributes will be. Then consider the identifiers for those entities; review the Canvas lessons if needed.


### Modelling
Once you have identified all the aspects of the database, try to sketch out a model.  
  1. How are the entities connected through relationships?
  1. Is there one table per file tab? 
  1. Are you going to use __id__ columns as done in the previous examples?
  1. Which columns of data create overlaps that we can use as foreign keys to reference other tables?
  1. Is your model normalized to remove redundancy?
  
Your resulting model should be able to hold all the data that you have been given. 

This portion of the activity is one of the most important, so don't be afraid to take a while to design the system.

### Defining

Once you have your model designed, it is time to write the CREATE TABLE statements into a text file or use the Notebook Cell below. Add any additional DDL you feel is appropriate as well.

#### Create Table for Artists

Write the Create Table statement for the artist table.
Ensure you use appropriate data types and column/table constraints as necessary.  

**Remember** to define a primary key for the table.

## Artist SQL

```SQL
CREATE TABLE Artist (
    artist_id INT, 
    name varchar(100), 
    genre varchar(100), 
    year_formed INT,
    PRIMARY KEY (artist_id)
);
```

We choose to use an `artist_id` because two artists may have the same name, and an integer (counting) id is well suited to being the target of a `FOREIGN KEY`.
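The sketch below checks this reasoning with Python's built-in `sqlite3` module, used here as an in-memory stand-in for PostgreSQL so it runs without the lab server. The artist names and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the lab's PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Artist (
        artist_id INT,
        name varchar(100),
        genre varchar(100),
        year_formed INT,
        PRIMARY KEY (artist_id)
    )
""")

# Two hypothetical artists that share a name: allowed, because the key is artist_id.
conn.executemany(
    "INSERT INTO Artist VALUES (?, ?, ?, ?)",
    [(1, "The Example", "Rock", 1990), (2, "The Example", "Jazz", 2005)],
)

# Reusing an existing artist_id is rejected by the primary key.
try:
    conn.execute("INSERT INTO Artist VALUES (1, 'Someone Else', 'Pop', 2010)")
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

same_name_count = conn.execute(
    "SELECT COUNT(*) FROM Artist WHERE name = 'The Example'"
).fetchone()[0]
print(same_name_count, duplicate_id_rejected)  # 2 True
```

Had `name` been the primary key, the second "The Example" row would have been rejected instead.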


#### Create Table for Albums

Write the Create Table statement for the album table.
Ensure you use appropriate data types and column/table constraints as necessary.  

**Remember** to link the Albums records to the Artist records via a foreign key relationship.

## Albums Definition

Recall that an Album is recorded by an Artist.
Therefore, we expect the album to have a foreign key reference back to the Artist table.
In fact, we can define the album to be a proper child of the artist by making `artist_id` the first part of Album's primary key.

```SQL
CREATE TABLE Album (
    artist_id INT, 
    title varchar(100), 
    year_produced INT,
    PRIMARY KEY (artist_id, title),
    FOREIGN KEY (artist_id)
        REFERENCES Artist(artist_id)
);
```

Notice the primary key is a composite of the `(artist_id, title)`.

The FOREIGN KEY refers back to the parent entity.
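A minimal `sqlite3` sketch (again standing in for PostgreSQL, with hypothetical rows) of what the composite key and the foreign key allow and reject. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, whereas PostgreSQL enforces them by default. The helper `try_insert` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on;
# PostgreSQL enforces them by default.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Artist (artist_id INT, name varchar(100), genre varchar(100), "
    "year_formed INT, PRIMARY KEY (artist_id))"
)
conn.execute("""
    CREATE TABLE Album (
        artist_id INT,
        title varchar(100),
        year_produced INT,
        PRIMARY KEY (artist_id, title),
        FOREIGN KEY (artist_id) REFERENCES Artist(artist_id)
    )
""")
conn.executemany(
    "INSERT INTO Artist VALUES (?, ?, ?, ?)",
    [(1, "Artist One", "Rock", 1990), (2, "Artist Two", "Pop", 2000)],
)


def try_insert(row):
    """Return True if the Album row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Album VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "first album": try_insert((1, "Greatest Hits", 2001)),
    "same title, other artist": try_insert((2, "Greatest Hits", 2003)),  # new key pair
    "same title, same artist": try_insert((1, "Greatest Hits", 2010)),   # duplicate key
    "unknown artist": try_insert((99, "Orphan", 2012)),                  # no parent row
}
print(results)
```

The first two inserts succeed, the duplicate `(artist_id, title)` pair fails the primary key, and the row pointing at a nonexistent artist fails the foreign key.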


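**Alternatively**, give each album its own `album_id` as the primary key and keep `artist_id` as an ordinary foreign key column: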


```SQL
CREATE TABLE Album (
    album_id INT,
    artist_id INT, 
    title varchar(100), 
    year_produced INT,
    PRIMARY KEY (album_id),
    FOREIGN KEY (artist_id)
        REFERENCES Artist(artist_id)
);
```
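One consequence of the surrogate `album_id`: the primary key no longer stops the same artist from having two rows with the same title. The `sqlite3` sketch below (a stand-in for PostgreSQL; the helpers `album_table` and `duplicate_accepted` are hypothetical, and the foreign key is omitted for brevity) shows that adding `UNIQUE (artist_id, title)` restores that guarantee.

```python
import sqlite3


def album_table(with_unique):
    """Create the album_id version of Album in a fresh in-memory database.

    The FOREIGN KEY to Artist is omitted to keep the sketch short.
    """
    conn = sqlite3.connect(":memory:")
    unique_clause = ", UNIQUE (artist_id, title)" if with_unique else ""
    conn.execute(
        "CREATE TABLE Album (album_id INT, artist_id INT, title varchar(100), "
        "year_produced INT, PRIMARY KEY (album_id)" + unique_clause + ")"
    )
    return conn


def duplicate_accepted(conn):
    """Insert the same (artist_id, title) twice under different album_ids."""
    conn.execute("INSERT INTO Album VALUES (1, 1, 'Greatest Hits', 2001)")
    try:
        conn.execute("INSERT INTO Album VALUES (2, 1, 'Greatest Hits', 2001)")
        return True
    except sqlite3.IntegrityError:
        return False


without_unique = duplicate_accepted(album_table(with_unique=False))
with_unique = duplicate_accepted(album_table(with_unique=True))
print(without_unique, with_unique)  # True False
```

In PostgreSQL the same constraint is written as a `UNIQUE (artist_id, title)` line inside the CREATE TABLE statement.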


## Songs Definition

Albums have tracks, so we can number the tracks sequentially on the album.
We will work with the second definition of Album, where the primary key is an `album_id`.
Therefore, we expect the song to have a foreign key reference back to the Album table.

We can define the song to be a proper child of the album by making `album_id` the first part of Song's primary key.

```SQL
CREATE TABLE Song (
    album_id INT,
    track INT,
    title varchar(100), 
    length varchar(20),
    PRIMARY KEY (album_id, track),
    FOREIGN KEY (album_id)
        REFERENCES Album(album_id)
);
```

Notice the primary key is a composite of `(album_id, track)`.
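A `sqlite3` sketch (a stand-in for PostgreSQL, with hypothetical rows) showing that track numbers restart on each album, that the same song title may appear on two albums, and that a repeated `(album_id, track)` pair is rejected. The Album table's own foreign key to Artist is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed in SQLite; on by default in PostgreSQL
conn.execute(
    "CREATE TABLE Album (album_id INT, artist_id INT, title varchar(100), "
    "year_produced INT, PRIMARY KEY (album_id))"
)
conn.execute("""
    CREATE TABLE Song (
        album_id INT,
        track INT,
        title varchar(100),
        length varchar(20),
        PRIMARY KEY (album_id, track),
        FOREIGN KEY (album_id) REFERENCES Album(album_id)
    )
""")

conn.executemany(
    "INSERT INTO Album VALUES (?, ?, ?, ?)",
    [(1, 1, "Album A", 2001), (2, 1, "Album B", 2004)],
)
# Track 1 exists on both albums, and "Opener" is a title on both.
conn.executemany(
    "INSERT INTO Song VALUES (?, ?, ?, ?)",
    [(1, 1, "Opener", "3:15"), (1, 2, "Closer", "4:02"), (2, 1, "Opener", "2:48")],
)

# A second track 2 on album 1 violates the composite primary key.
try:
    conn.execute("INSERT INTO Song VALUES (1, 2, 'Bonus', '5:00')")
    duplicate_track_rejected = False
except sqlite3.IntegrityError:
    duplicate_track_rejected = True

tracks = conn.execute(
    "SELECT album_id, track, title FROM Song ORDER BY album_id, track"
).fetchall()
print(tracks)  # [(1, 1, 'Opener'), (1, 2, 'Closer'), (2, 1, 'Opener')]
print(duplicate_track_rejected)  # True
```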



Now that you have finalized your DDL, you need to create the database tables in PostgreSQL.

You will add another table to this database and populate it with data in the practice.
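Once the tables are populated, the foreign keys let you answer questions that span all three tabs. Below is a minimal sketch with `sqlite3` as a stand-in for PostgreSQL and hypothetical rows. Rows are inserted parent first (Artist, then Album, then Song) so every foreign key has a row to point at, and a two-step join walks from Song back to Artist. With psycopg2 the same SQL works, but the placeholders are `%s` instead of `?`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Artist (artist_id INT, name varchar(100), genre varchar(100),
                         year_formed INT, PRIMARY KEY (artist_id));
    CREATE TABLE Album (album_id INT, artist_id INT, title varchar(100), year_produced INT,
                        PRIMARY KEY (album_id),
                        FOREIGN KEY (artist_id) REFERENCES Artist(artist_id));
    CREATE TABLE Song (album_id INT, track INT, title varchar(100), length varchar(20),
                       PRIMARY KEY (album_id, track),
                       FOREIGN KEY (album_id) REFERENCES Album(album_id));
""")

# Load parents before children so each foreign key value already exists.
conn.executemany("INSERT INTO Artist VALUES (?, ?, ?, ?)", [(1, "Artist One", "Rock", 1990)])
conn.executemany("INSERT INTO Album VALUES (?, ?, ?, ?)", [(10, 1, "First Album", 1992)])
conn.executemany(
    "INSERT INTO Song VALUES (?, ?, ?, ?)",
    [(10, 1, "Track One", "3:30"), (10, 2, "Track Two", "4:10")],
)

# Follow Song -> Album -> Artist through the foreign key columns.
rows = conn.execute("""
    SELECT ar.name, al.title, s.track, s.title
    FROM Song AS s
    JOIN Album AS al ON al.album_id = s.album_id
    JOIN Artist AS ar ON ar.artist_id = al.artist_id
    ORDER BY s.track
""").fetchall()
print(rows)
```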


```python
# Import libraries
import psycopg2
import getpass

# Set connection variables
database = "dsa_student"
user     = input("Type username (pawprint) and hit enter: ")
password = getpass.getpass("Type password and hit enter: ")

# Open a connection to the course PostgreSQL server
connection = psycopg2.connect(database = database,
                              user     = user,
                              host     = 'pgsql.dsa.lan',
                              password = password)
```


```python
# Drop tables if they already exist.
# Children (Song, then Album) go first so no foreign key still points at a dropped parent.
with connection, connection.cursor() as cursor:
    cursor.execute("DROP TABLE IF EXISTS Song;")
    cursor.execute("DROP TABLE IF EXISTS Album;")
    cursor.execute("DROP TABLE IF EXISTS Artist;")
```

```python
# SQL from above to create tables
#----------------------------------------------------------------

with connection, connection.cursor() as cursor:

    # Create table
    cursor.execute(
        '''
        CREATE TABLE Artist (
            artist_id INT,
            name varchar(100),
            genre varchar(100),
            year_formed INT,
            PRIMARY KEY (artist_id)
        );
        '''
    )

    # Check table created. PostgreSQL stores unquoted names in lowercase,
    # and the values are passed as query parameters rather than pasted into the SQL.
    cursor.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_name = %s AND table_schema = %s;",
        ('artist', user)
    )
    results = cursor.fetchall()

print("Columns in table:")
for row in results:
    print(row)
```

```python
with connection, connection.cursor() as cursor:

    # Create table
    cursor.execute(
        '''
        CREATE TABLE Album (
            album_id INT,
            artist_id INT,
            title varchar(100),
            year_produced INT,
            PRIMARY KEY (album_id),
            FOREIGN KEY (artist_id)
                REFERENCES Artist(artist_id)
        );
        '''
    )

    # Check table created
    cursor.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_name = %s AND table_schema = %s;",
        ('album', user)
    )
    results = cursor.fetchall()

print("Columns in table:")
for row in results:
    print(row)
```

```python
with connection, connection.cursor() as cursor:

    # Create table
    cursor.execute(
        '''
        CREATE TABLE Song (
            album_id INT,
            track INT,
            title varchar(100),
            length varchar(20),
            PRIMARY KEY (album_id, track),
            FOREIGN KEY (album_id)
                REFERENCES Album(album_id)
        );
        '''
    )

    # Check table created
    cursor.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_name = %s AND table_schema = %s;",
        ('song', user)
    )
    results = cursor.fetchall()

print("Columns in table:")
for row in results:
    print(row)
```

```python
# Close the connection
connection.close()
```

## <span style="background:yellow">Your Turn</span>

Copy the CREATE TABLE statement for Song and change it to use a Song ID as the primary key, then think about why we chose not to do that in the first place. (Run on its own, your statement will error because the Song table already exists, which is why the cell below drops it first.)



```python
# Reconnect, since the previous cell closed the connection
connection = psycopg2.connect(database = database,
                              user     = user,
                              host     = 'pgsql.dsa.lan',
                              password = password)

with connection, connection.cursor() as cursor:
    # Remove the earlier Song table so the new definition can be created
    cursor.execute("DROP TABLE IF EXISTS Song;")

    # Create table with a song_id primary key
    cursor.execute(
        '''
        CREATE TABLE Song (
            song_id INT,
            album_id INT,
            track INT,
            title varchar(100),
            length varchar(20),
            PRIMARY KEY (song_id),
            FOREIGN KEY (album_id)
                REFERENCES Album(album_id)
        );
        '''
    )

    # Check table created
    cursor.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_name = %s AND table_schema = %s;",
        ('song', user)
    )
    results = cursor.fetchall()

print("Columns in table:")
for row in results:
    print(row)

connection.close()
```

Write a CREATE TABLE statement for a 'Genre' table that would hold the name of the genre, the year it was introduced, and an id as the primary key. Then think about how the artist table would change if this table existed beforehand.

```python
connection = psycopg2.connect(database = database,
                              user     = user,
                              host     = 'pgsql.dsa.lan',
                              password = password)

with connection, connection.cursor() as cursor:
    cursor.execute("DROP TABLE IF EXISTS genre;")

    # Create table
    cursor.execute(
        '''
        CREATE TABLE genre (
            genre_name varchar(20),
            year_introduced int,
            genre_id int,
            PRIMARY KEY (genre_id)
        );
        '''
    )

    # Check table created
    cursor.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_name = %s AND table_schema = %s;",
        ('genre', user)
    )
    results = cursor.fetchall()

print("Columns in table:")
for row in results:
    print(row)

connection.close()
```
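For the Genre question, here is one possible answer sketched in `sqlite3` (a stand-in for PostgreSQL; the revised Artist definition and all values are hypothetical). The free-text `genre` column becomes a `genre_id` foreign key, so an artist can only reference a genre that already exists, and the genre name comes back through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this; PostgreSQL enforces FKs by default
conn.executescript("""
    CREATE TABLE genre (
        genre_name varchar(20),
        year_introduced int,
        genre_id int,
        PRIMARY KEY (genre_id)
    );
    -- Hypothetical revision: the free-text genre column becomes a genre_id reference.
    CREATE TABLE Artist (
        artist_id INT,
        name varchar(100),
        genre_id INT,
        year_formed INT,
        PRIMARY KEY (artist_id),
        FOREIGN KEY (genre_id) REFERENCES genre(genre_id)
    );
""")
conn.execute("INSERT INTO genre VALUES ('Example Genre', 1950, 1)")
conn.execute("INSERT INTO Artist VALUES (1, 'Artist One', 1, 1990)")

# An unknown genre can no longer slip in: genre_id 2 does not exist.
try:
    conn.execute("INSERT INTO Artist VALUES (2, 'Artist Two', 2, 2001)")
    unknown_genre_rejected = False
except sqlite3.IntegrityError:
    unknown_genre_rejected = True

artist_genres = conn.execute("""
    SELECT a.name, g.genre_name
    FROM Artist AS a
    JOIN genre AS g ON g.genre_id = a.genre_id
""").fetchall()
print(artist_genres, unknown_genre_rejected)  # [('Artist One', 'Example Genre')] True
```

Storing the genre once also means a spelling fix happens in one genre row instead of in every artist row.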



# PLEASE SAVE YOUR NOTEBOOK