## What is Data Modeling?
"... an abstraction that **organizes elements of data** and **how they** will **relate** to each other."

Data modeling can easily translate to database modeling, as this is the essential **end state**.

### Common Questions
**Why can't everything be stored in a giant Excel spreadsheet?**

* There are limits to the amount of data that can be stored in an Excel sheet. A database helps organize the elements into tables (rows and columns). Read and write operations at large scale are also not practical with an Excel sheet, so it's better to use a database to handle most business functions.

**Does data modeling happen before you create a database, or is it an iterative process?**

* It's definitely an iterative process. Data engineers continually reorganize, restructure, and optimize data models to fit the needs of the organization.

**How is data modeling different from machine learning modeling?**

* Machine learning includes a lot of data wrangling to create the inputs for machine learning models, but data modeling is more about how to structure data to be used by different people within an organization. You can think of data modeling as the process of designing data and making it available to machine learning engineers, data scientists, business analysts, etc., so they can make use of it easily.

### Quiz Question
Choose the Correct Order of the Data Modeling Process:
- [ ] Physical -> Logical -> Conceptual
- [ ] Logical -> Physical -> Conceptual
- [ ] Conceptual -> Physical -> Logical
- [x] Conceptual -> Logical -> Physical

## Why is Data Modeling Important?

### Key points about Data Modeling 

* **Data Organization**: The organization of the data for your applications is extremely important and makes everyone's life easier.
* **Use cases**: Having a well thought out and organized data model is critical to how that data can later be used. Queries that could have been straightforward and simple might become complicated queries if data modeling isn't well thought out.
* **Starting early**: Thinking and planning ahead will help you be successful. This is not something you want to leave until the last minute.
* **Iterative Process**: Data modeling is not a fixed process. It is iterative as new requirements and data are introduced. Having flexibility will help as new information becomes available.

### Example of Why Data Modeling is Important:
Let's take an example from Udacity. Here, a Udacity data engineer would help structure the data so it can be used by different people within Udacity for further analysis and also shared with the learner on the website. For instance, when we want to track the students' progress within a Nanodegree program, we want to aggregate data across students and projects within a Nanodegree. In a relational database, this requires the data to be structured so that each student's data is tracked across all Nanodegree programs they have ever enrolled in. The data also needs to track the student's progress within each of those Nanodegree programs.

The data model is critical for accurately representing each data object. For instance, a data table would track a student's progress on project submissions, i.e., whether they passed or failed a specific rubric requirement. Furthermore, the data model should ensure that a student's progress is updated and aggregated to provide an indicator of whether the student passed all the rubric requirements and successfully finished the project. Data modeling is critical to track all of these pieces of data so the tables are speaking to each other, updating the tables correctly (e.g., updating a student's progress on a project submission), and meeting defined rules (e.g., project completed when all rubric requirements are passed).
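
As a rough illustration of the rule described above (a project is completed only when every rubric requirement is passed), the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`rubric_results`, `student_id`, `project_id`, `requirement`, `passed`) and the sample rows are hypothetical and are not Udacity's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE rubric_results (
        student_id  INTEGER,
        project_id  INTEGER,
        requirement TEXT,
        passed      INTEGER  -- 1 = passed, 0 = failed
    )
""")
conn.executemany(
    "INSERT INTO rubric_results VALUES (?, ?, ?, ?)",
    [
        (1, 10, "schema design", 1),
        (1, 10, "etl pipeline", 1),
        (2, 10, "schema design", 1),
        (2, 10, "etl pipeline", 0),
    ],
)

# A project counts as completed only if the minimum of `passed` is 1,
# i.e. no requirement for that student and project has failed.
project_status = conn.execute("""
    SELECT student_id, project_id, MIN(passed) AS completed
    FROM rubric_results
    GROUP BY student_id, project_id
    ORDER BY student_id
""").fetchall()

print(project_status)
conn.close()
```

Because each rubric result is its own row, the "project completed" indicator can be derived by aggregation instead of being stored and updated by hand.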

### Quiz Question
Who should focus on learning data modeling? Choose one response.

- [ ] Data Scientists
- [ ] Data Engineers
- [ ] Software Engineers
- [x] Everyone who deals with data!

## Intro to Relational Databases

### Relational Model
"This model **organizes data into** one or more tables (or "**relations**") **of columns and rows**, with a **unique key** identifying each row. Generally, each table represents one "entity type" (such as customer or product)."

### Relational Database
Invented by Edgar Codd (1970)

"... is a digital database **based on relational model** of data...a software system used to maintain relational databases is a relational database management system (RDBMS)."  

"SQL (Structured Query Language) is the language used across almost all relational database systems for querying and maintaining the database."

### Common Types of Relational Databases
* Oracle  --> Used in Enterprise
* Teradata
* MySQL
* PostgreSQL
* SQLite  --> Used in App development

### The Basics
* Database/Schema
    - Collection of Tables
* Tables/Relation
    * A group of rows sharing the same labeled elements
        * Customers
* Columns/Attribute
    * Labeled element
        * Name, email, city
* Rows/Tuple
    * A single item
        * Amanda, jdoe@xyc.com, NYC
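
The terms above can be made concrete with a minimal sketch. It uses Python's built-in `sqlite3` module and an in-memory database so that it runs without a server; the `customers` table simply mirrors the example values in the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the database: a collection of tables

# The table (relation) and its columns (attributes).
conn.execute("CREATE TABLE customers (name TEXT, email TEXT, city TEXT)")

# One row (tuple): a single item with a value for each column.
conn.execute(
    "INSERT INTO customers (name, email, city) VALUES (?, ?, ?)",
    ("Amanda", "jdoe@xyc.com", "NYC"),
)

cur = conn.execute("SELECT * FROM customers")
columns = [description[0] for description in cur.description]
rows = cur.fetchall()

print(columns)
print(rows)
conn.close()
```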
        
### Quiz Questions
What is RDBMS?
- [ ] Relateable Database Management System
- [x] Relational Database Management System
- [ ] Readable Data Management System
- [ ] Reachable Database Management System

## Quiz: Relational Databases
### Q1: 
True or False: A column holds multiple tables.
- [ ] True
- [x] False

### Q2:
True or False: An attribute is another name for a column.
- [x] True
- [ ] False

### Q3
True or False: A schema is a collection of tables in some database terminology.
- [x] True
- [ ] False

## When to use a Relational Database?
* **Flexibility for writing SQL queries**: SQL is the most common database query language.
* **Modeling the data not modeling queries**
* **Ability to do JOINS**
* **Ability to do aggregations and analytics**
* **Secondary Indexes available** : You have the advantage of being able to add another index to help with quick searching.
* **Smaller data volumes**: If you have a smaller data volume (and not big data) you can use a relational database for its simplicity.
* **ACID Transactions**: Allows you to meet a set of properties of database transactions intended to guarantee validity even in the event of errors or power failures, and thus maintain data integrity.
* **Easier to adapt to changing business requirements**
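
Several of the points above (JOINS, aggregations, secondary indexes) can be seen together in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in relational database; the `customers` and `orders` tables, the index name, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

    -- Secondary index: speeds up lookups by city, which is not the primary key.
    CREATE INDEX idx_customers_city ON customers (city);

    INSERT INTO customers VALUES (1, 'Amanda', 'NYC'), (2, 'Toby', 'Boston');
    INSERT INTO orders VALUES (100, 1, 25.0), (101, 1, 15.0), (102, 2, 40.0);
""")

# JOIN the two tables, then aggregate order amounts per customer.
totals = conn.execute("""
    SELECT c.name, COUNT(o.order_id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(totals)
conn.close()
```

The tables model the data (customers and orders), not one specific query, which is why a new question can usually be answered with a new `SELECT` instead of a new table.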

## ACID Transactions
Properties of database transactions intended to guarantee validity even in the event of errors or power failures.

* **Atomicity**: The whole transaction is processed or nothing is processed. A commonly cited example of an atomic transaction is a money transfer between two bank accounts. The transfer is made up of two operations: first, you withdraw money from one account, and second, you deposit the withdrawn money into the second account. An atomic transaction, i.e., one where either all operations occur or nothing occurs, keeps the database in a consistent state. This ensures that if either of those two operations (withdrawing money from the 1st account or depositing the money into the 2nd account) fails, the money is neither lost nor created. See [Wikipedia](https://en.wikipedia.org/wiki/Atomicity_(database_systems)) for a detailed description of this example.

* **Consistency**: Only transactions that abide by constraints and rules are written into the database, otherwise the database keeps the previous state. The data should be correct across all rows and tables. Check out additional information about consistency on [Wikipedia](https://en.wikipedia.org/wiki/Consistency_(database_systems)).

* **Isolation**: Transactions are processed independently and securely; order does not matter. A low level of isolation enables many users to access the data simultaneously, but it also increases the possibility of concurrency effects (e.g., dirty reads or lost updates). A high level of isolation reduces the chance of concurrency effects, but it also uses more system resources and can cause transactions to block each other. Source: [Wikipedia](https://en.wikipedia.org/wiki/Isolation_(database_systems)).

* **Durability**: Completed transactions are saved to the database even in cases of system failure. A commonly cited example is tracking flight seat bookings: once the flight booking records a confirmed seat booking, the seat remains booked even if a system failure occurs. Source: [Wikipedia](https://en.wikipedia.org/wiki/ACID).
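
The bank transfer example for atomicity, combined with a consistency rule, might look like the following sketch. It uses Python's built-in `sqlite3` module; the `accounts` table, the `transfer` and `balances` helpers, and the amounts are hypothetical. The deposit is deliberately executed before the withdrawal so that a failed withdrawal leaves something to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        name    TEXT PRIMARY KEY,
        balance INTEGER CHECK (balance >= 0)  -- rule the data must obey
    )
""")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move `amount` between accounts; commit both updates or neither."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, target),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, source),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # undo the deposit that already happened
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "a", "b", 30)       # succeeds
failed = transfer(conn, "a", "b", 500)  # would overdraw account "a"
final = balances(conn)
print(ok, failed, final)
conn.close()
```

The second transfer violates the `CHECK` constraint (consistency), so the whole transaction is rolled back (atomicity) and no money is created in account `b`.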

### Question1
Which of these are benefits of a relational database?

- [x] ACID Transactions
- [x] Ability to do JOINS
- [ ] Can handle big data
- [x] Easy to change business requirements on the data

### Question2
Can you JOIN a table with another table on any column?

- [x] Yes, as long as there are matching values in the columns
- [ ] No, only on columns with the same name
- [ ] Yes, you can join any columns together
- [ ] No, only on columns in the same table
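
To see why matching values matter more than matching column names, here is a small sketch using Python's built-in `sqlite3` module. The `albums` and `performers` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE albums (album_name TEXT, artist TEXT);
    CREATE TABLE performers (performer TEXT, country TEXT);

    INSERT INTO albums VALUES ('Let It Be', 'The Beatles'), ('Untitled', 'Unknown Artist');
    INSERT INTO performers VALUES ('The Beatles', 'UK');
""")

# The join columns have different names; only their values need to match.
joined = conn.execute("""
    SELECT a.album_name, p.country
    FROM albums AS a
    JOIN performers AS p ON a.artist = p.performer
""").fetchall()

print(joined)
conn.close()
```

The row for `'Unknown Artist'` has no matching value in `performers.performer`, so an inner join leaves it out of the result.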

### When Not to Use a Relational Database
* **Have large amounts of data**: Relational Databases are not distributed databases and because of this they can only scale vertically by adding more storage in the machine itself. You are limited by how much you can scale and how much data you can store on one machine. You cannot add more machines like you can in NoSQL databases.
* **Need to be able to store different data type formats**: Relational databases are not designed to handle unstructured data.
* **Need high throughput -- fast reads**: While ACID transactions bring benefits, they also slow down the process of reading and writing data. If you need very fast reads and writes, using a relational database may not suit your needs.
* **Need a flexible schema**: Flexible schema can allow for columns to be added that do not have to be used by every row, saving disk space.
* **Need high availability**: Because relational databases are not distributed (and even when they are, they have a coordinator/worker architecture), they have a single point of failure. When that database goes down, a fail-over to a backup system occurs and takes time.
* **Need horizontal scalability**: Horizontal scalability is the ability to add more machines or nodes to a system to increase performance and space for data.

### Question 1  
True or False: Relational Databases are traditionally horizontally scalable.
- [ ] True
- [x] False

### Question 2
When should you use a Relational Database?
- [x] Small amount of data
- [x] Need to be able to do aggregations
- [x] Need ACID transactions
- [ ] You need to be able to scale out quickly
- [x] Need to be able to join multiple tables

## What is PostgreSQL?
* Open source object-relational database system
* Uses and builds on the SQL language.

## Lesson 1 Demo 0: PostgreSQL and AutoCommits

### Walk through the basics of PostgreSQL

In [1]:
# first need to install psycopg2 library
!pip install psycopg2



In [16]:
## Import the PostgreSQL adapter for Python
import psycopg2

### Create a connection to the database
1. Connect to the local instance of PostgreSQL (*127.0.0.1*)
2. Use the database/schema from the instance. 
3. The connection reaches out to the database (*studentdb*) and uses the correct privileges to connect to the database (*user and password = student*).

In [17]:
conn = psycopg2.connect("host=127.0.0.1 dbname=studentdb user=student password=student")

### Use the connection to get a cursor that will be used to execute queries.

In [18]:
cur = conn.cursor()

### Test the connection by querying a table

In [19]:
cur.execute("select * from test")

UndefinedTable: relation "test" does not exist
LINE 1: select * from test
                      ^


### An error occurs, but this is expected because the table has not been created yet. To fix the error, create the table. 

In [20]:
cur.execute("CREATE TABLE test (col1 int, col2 int, col3 int);")

InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block


### The error indicates we cannot execute this query. Because the earlier error aborted the open transaction, every further command is ignored until we roll the transaction back (`conn.rollback()`) or, as done here, restart the connection.

In [21]:
conn = psycopg2.connect("host=127.0.0.1 dbname=studentdb user=student password=student")
cur = conn.cursor()
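
Reconnecting is one way out of an aborted transaction; rolling back is a lighter alternative. The sketch below wraps that pattern in a helper named `execute_or_rollback` (a hypothetical name, not part of psycopg2). It relies only on the DB-API 2.0 methods `cursor()`, `commit()`, and `rollback()`, which psycopg2 connections also provide. It is demonstrated with the built-in `sqlite3` module so that it runs without a PostgreSQL server; SQLite does not block later commands the way PostgreSQL does, so this shows the pattern, not the PostgreSQL error itself.

```python
import sqlite3


def execute_or_rollback(conn, sql):
    """Run one statement; commit on success, roll back on any database error."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        conn.commit()
        return True
    except Exception as e:  # would be psycopg2.Error when used with psycopg2
        print(f"Statement failed, rolling back: {e}")
        conn.rollback()  # ends the failed transaction so the connection stays usable
        return False
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
first = execute_or_rollback(conn, "select * from test")  # table does not exist yet
second = execute_or_rollback(conn, "CREATE TABLE test (col1 int, col2 int, col3 int)")
third = execute_or_rollback(conn, "select * from test")
print(first, second, third)
conn.close()
```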

In our exercises, instead of worrying about committing each transaction or getting a strange error when we hit something unexpected, let's set autocommit to true. **This tells the session to commit after each call and not hold the transaction open for any other actions. One action = one transaction.**

In this demo we will use automatic commit so each action is committed without having to call `conn.commit()` after each command. **The ability to roll back and commit transactions is a feature of Relational Databases.**

In [22]:
conn.set_session(autocommit=True)

In [23]:
cur.execute("select * from test")

UndefinedTable: relation "test" does not exist
LINE 1: select * from test
                      ^


In [24]:
cur.execute("CREATE TABLE test (col1 int, col2 int, col3 int);")

### Once autocommit is set to true, we execute this code successfully. There were no issues with transaction blocks and we did not need to restart our connection. 

In [25]:
cur.execute("select * from test")

In [26]:
cur.execute("select count(*) from test")
print(cur.fetchall())

[(0,)]


### Dropping the table

In [27]:
cur.execute("DROP TABLE test")

## Lesson 1 Demo 1: Creating a Table with PostgreSQL

* Creating a table
* Inserting rows of data
* Running a simple SQL query to validate the information

### Typically, we would use a Python wrapper called *psycopg2* to run the PostgreSQL queries. This library should be preinstalled; to install it in the future, run the following command in the notebook: 
!pip3 install --user psycopg2
#### More documentation can be found here: http://initd.org/psycopg/ 

### Import the library 
Note: An error might pop up after this command has executed. Read it carefully before proceeding.

In [30]:
import psycopg2

### Create a connection to the database
1. Connect to the local instance of PostgreSQL (*127.0.0.1*)
2. Use the database/schema from the instance. 
3. The connection reaches out to the database (*studentdb*) and uses the correct privileges to connect to the database (*user and password = student*).

### Note 1: This block of code will be standard in all notebooks. 
### Note 2: Adding the try except will make sure errors are caught and understood

In [31]:
try: 
    conn = psycopg2.connect("host=127.0.0.1 dbname=studentdb user=student password=student")
except psycopg2.Error as e: 
    print("Error: Could not make connection to the Postgres database")
    print(e)

### Use the connection to get a cursor that can be used to execute queries.

In [32]:
try:
    cur = conn.cursor()
except psycopg2.Error as e:
    print("Error: Could not get cursor to the Database")
    print(e)

### Use automatic commit so that each action is committed without having to call conn.commit() after each command. The ability to roll back and commit transactions is a feature of Relational Databases. 

In [33]:
conn.set_session(autocommit=True)

### Test the Connection and Error Handling Code
The try-except block should handle the error: We are trying to do a select * on a table but the table has not been created yet.

In [34]:
try: 
    cur.execute("select * from udacity.music_library")
except psycopg2.Error as e:
    print(e)

relation "udacity.music_library" does not exist
LINE 1: select * from udacity.music_library
                      ^



### Create a database to work in 

In [36]:
try: 
    cur.execute("create database udacity")
except psycopg2.Error as e:
    print(e)

### Close our connection to the default database, reconnect to the Udacity database, and get a new cursor.

In [37]:
try: 
    conn.close()
except psycopg2.Error as e:
    print(e)
  
try: 
    conn = psycopg2.connect("host=127.0.0.1 dbname=udacity user=student password=student")
except psycopg2.Error as e: 
    print("Error: Could not make connection to the Postgres database")
    print(e)
    
try: 
    cur = conn.cursor()
except psycopg2.Error as e: 
    print("Error: Could not get cursor to the Database")
    print(e)

conn.set_session(autocommit=True)

### We will create a Music Library of albums. Each album has a lot of information we could add to the music library table. We will  start with album name, artist name, year. 
`Table Name: music_library
column 1: Album Name
column 2: Artist Name
column 3: Year `
### Translate this information into a Create Table Statement. 

Review this document on PostgreSQL datatypes: https://www.postgresql.org/docs/9.5/datatype.html


In [42]:
try:
    cur.execute("CREATE TABLE IF NOT EXISTS music_library (album_name varchar, artist_name varchar, year int);")
except psycopg2.Error as e:
    print("Error: Issue creating table")
    print(e)

### No error was found, but let's check that our table was created. Run `select count(*)`, which should return 0 because no rows have been inserted in the table.

In [45]:
try:
    cur.execute("SELECT COUNT(*) FROM music_library")
except psycopg2.Error as e:
    print("Error: Issue counting rows")
    print(e)
print(cur.fetchall())

[(0,)]


### Insert two rows 

In [46]:
try:
    cur.execute("INSERT INTO music_library (album_name, artist_name, year) \
                 VALUES (%s, %s, %s)",
                ("Let It Be", "The Beatles", 1970))
except psycopg2.Error as e:
    # Bind the exception to `e`; a bare `except:` would leave `e` undefined.
    print("Error: Inserting Rows")
    print(e)
    
try:
    cur.execute("INSERT INTO music_library (album_name, artist_name, year) \
                 VALUES (%s, %s, %s)",
                ("Rubber Soul", "The Beatles", 1965))
except psycopg2.Error as e:
    print("Error: Inserting Rows")
    print(e)

### Validate your data was inserted into the table. 
The while loop is used for printing the results. If executing queries in the Postgres shell, this would not be required.

### Note: If you run the insert statement code more than once, you will see duplicates of your data. PostgreSQL allows for duplicates.

In [47]:
try: 
    cur.execute("SELECT * FROM music_library;")
except psycopg2.Error as e: 
    print("Error: select *")
    print (e)

row = cur.fetchone()
while row:
    print(row)
    row = cur.fetchone()

('Let It Be', 'The Beatles', 1970)
('Rubber Soul', 'The Beatles', 1965)
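
The duplicate behavior mentioned in the note can be reproduced in a small sketch. It uses Python's built-in `sqlite3` module so that it runs without a PostgreSQL server; a table without a key accepts the same row twice, while a table with a primary key rejects the repeat. The choice of `(album_name, artist_name)` as the key and the table name `music_library_keyed` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = ("Let It Be", "The Beatles", 1970)

# No key: the same row can be inserted any number of times.
conn.execute("CREATE TABLE music_library (album_name TEXT, artist_name TEXT, year INTEGER)")
conn.execute("INSERT INTO music_library VALUES (?, ?, ?)", row)
conn.execute("INSERT INTO music_library VALUES (?, ?, ?)", row)
count_without_key = conn.execute("SELECT COUNT(*) FROM music_library").fetchone()[0]

# With a primary key, the second insert is rejected.
conn.execute("""
    CREATE TABLE music_library_keyed (
        album_name TEXT, artist_name TEXT, year INTEGER,
        PRIMARY KEY (album_name, artist_name)
    )
""")
conn.execute("INSERT INTO music_library_keyed VALUES (?, ?, ?)", row)
try:
    conn.execute("INSERT INTO music_library_keyed VALUES (?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
count_with_key = conn.execute("SELECT COUNT(*) FROM music_library_keyed").fetchone()[0]

print(count_without_key, duplicate_rejected, count_with_key)
conn.close()
```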


### Drop the table to avoid duplicates and clean up

In [48]:
try: 
    cur.execute("DROP table music_library")
except psycopg2.Error as e: 
    print("Error: Dropping table")
    print (e)

###  Close the cursor and connection. 

In [49]:
cur.close()
conn.close()

## NoSQL Databases
### What is a NoSQL Database?
"... has a simpler design, simpler horizontal scaling, and finer control of availability. Data structures used are different than those in Relational Databases and make some operations faster."

* NoSQL = Not Only SQL; NoSQL and NonRelational are interchangeable terms.
* Various types of NoSQL databases.

### Common Types of NoSQL Databases
* Apache Cassandra (Partition Row Store)
* MongoDB (Document store)
* DynamoDB (Key-Value store)
* Apache HBase (Wide Column Store)
* Neo4j (Graph Database)

### The Basics of Apache Cassandra
* Keyspace
    * Collection of Tables
* Table
    * A group of partitions
* Rows
    * A single item
* Partition
    * Fundamental unit of access
    * Collection of rows
    * How data is distributed
* Primary Key
    * Primary Key is made up of a partition key and clustering columns
* Columns
    * Clustering and Data
    * Labeled element
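
The relationship between partitions, the partition key, and clustering columns can be pictured with a toy model in plain Python. This is only a sketch of the ideas in the list above, not how Apache Cassandra is implemented; `ToyTable`, `NUM_NODES`, and the CRC32-based node assignment are all hypothetical.

```python
from collections import defaultdict
import zlib

NUM_NODES = 3  # hypothetical cluster size


class ToyTable:
    """Toy table: rows grouped into partitions, sorted by clustering column."""

    def __init__(self):
        # partition key -> {clustering value -> data columns}
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, clustering, data):
        self.partitions[partition_key][clustering] = data

    def node_for(self, partition_key):
        # A hash of the partition key decides which node owns the partition.
        return zlib.crc32(str(partition_key).encode()) % NUM_NODES

    def read_partition(self, partition_key):
        # One partition is read as a unit, ordered by the clustering column.
        rows = self.partitions.get(partition_key, {})
        return [(partition_key, c, rows[c]) for c in sorted(rows)]


table = ToyTable()
table.insert(1970, "The Beatles", "Let It Be")
table.insert(1970, "Simon & Garfunkel", "Bridge over Troubled Water")
table.insert(1965, "The Beatles", "Rubber Soul")

rows_1970 = table.read_partition(1970)
print(rows_1970)
```

All rows with the same partition key live together, which is why a query that names the partition key can be answered from a single place.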
### Question 1 of 2
True or False: A Keyspace in Apache Cassandra is similar to a schema in PostgreSQL
- [x] True
- [ ] False

### Question 2 of 2
Which of these are examples of non-relational databases?
- [ ] SQL
- [x] Apache Cassandra
- [ ] RDBMS
- [x] MongoDB

## What is Apache Cassandra?
"...**provides scalability** and **high availability** without compromising performance. Linear Scalability and proven **fault-tolerance** on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data."

* Apache Cassandra uses its own query language, CQL (Cassandra Query Language).

### Common Questions:
**What type of companies use Apache Cassandra?**
All kinds of companies. For example, Uber uses Apache Cassandra for their entire backend. Netflix uses Apache Cassandra to serve all their videos to customers. Good use cases for NoSQL (and more specifically Apache Cassandra) are :

1. Transaction logging (retail, health care)
2. Internet of Things (IoT)
3. Time series data
4. Any workload that is heavy on writes to the database (since Apache Cassandra is optimized for writes).

**Would Apache Cassandra be a hindrance for my analytics work? If yes, why?**
Yes, if you are trying to do analysis, such as using `GROUP BY` statements. Since Apache Cassandra requires data modeling based on the query you want, you can't do ad-hoc queries. However you can add clustering columns into your data model and create new tables.

### Quiz Question
When should you use a NoSQL database?
- [x] Large amounts of data
- [ ] Need to be able to do aggregations
- [x] Need high availability
- [x] You need to be able to scale out quickly
- [ ] Need to be able to join multiple tables

## When to use a NoSQL Database
* **Need to be able to store different data type formats**: NoSQL was also created to handle different data configurations: structured, semi-structured, and unstructured data. JSON, XML documents can all be handled easily with NoSQL.
* **Large amounts of data**: Relational Databases are not distributed databases and because of this they can only scale vertically by adding more storage in the machine itself. NoSQL databases were created to be able to be horizontally scalable. The more servers/systems you add to the database the more data that can be hosted with high availability and low latency (fast reads and writes).
* **Need horizontal scalability**: Horizontal scalability is the ability to add more machines or nodes to a system to increase performance and space for data.
* **Need high throughput**: While ACID transactions bring benefits they also slow down the process of reading and writing data. If you need very fast reads and writes using a relational database may not suit your needs.
* **Need a flexible schema**: Flexible schema can allow for columns to be added that do not have to be used by every row, saving disk space.
* **Need high availability**: Relational databases have a single point of failure. When that database goes down, a failover to a backup system must happen and takes time.

## When NOT to use a NoSQL Database?
* **When you have a small dataset**: NoSQL databases were made for big datasets. They work with small datasets, but they were not created for that.
* **When you need ACID Transactions**: If you need a consistent database with ACID transactions, then most NoSQL databases will not be able to serve this need. Most NoSQL databases are eventually consistent and do not provide ACID transactions. There are exceptions, however: some non-relational databases, such as MongoDB, can support ACID transactions.
* **When you need the ability to do JOINS across tables**: NoSQL databases such as Apache Cassandra do not support JOINS, as these would result in full table scans.
* **If you want to be able to do aggregations and analytics**
* **If you have changing business requirements**: Ad-hoc queries are possible but difficult, as the data model was designed to fit particular queries.
* **If your queries are not available and you need the flexibility**: You need your queries in advance. If those are not available, or you need flexibility in how you query your data, you might need to stick with a relational database.

### Caveats to NoSQL and ACID Transactions

There are some NoSQL databases that offer some form of ACID transaction. As of v4.0, MongoDB added multi-document ACID transactions within a single replica set. With their later version, v4.2, they have added multi-document ACID transactions in a sharded/partitioned deployment.

* Check out this documentation from [MongoDB on multi-document ACID transactions](https://www.mongodb.com/collateral/mongodb-multi-document-acid-transactions)
* Here is another link documenting [MongoDB's ability to handle ACID transactions](https://www.mongodb.com/blog/post/mongodb-multi-document-acid-transactions-general-availability)

Another example of a NoSQL database supporting ACID transactions is MarkLogic.

* Check out this post on their [blog](https://www.marklogic.com/blog/how-marklogic-supports-acid-transactions/) describing how MarkLogic supports ACID transactions.

## Lesson 1 Demo 2: Creating a Table with Apache Cassandra

### Walk through the basics of Apache Cassandra:

* Creating a table
* Inserting rows of data
* Running a simple CQL query to validate the information

### Use the Python driver called cassandra to run the Apache Cassandra queries. This library should be preinstalled; to install it in the future, run this command in a notebook: 
`! pip install cassandra-driver`<br>
More documentation can be found here:  https://datastax.github.io/python-driver/

### Import Apache Cassandra python package

In [None]:
import cassandra

### Create a connection to the database
1. Connect to the local instance of Apache Cassandra *['127.0.0.1']*.
2. The connection reaches out to the database (*studentdb*) and uses the correct privileges to connect to the database (*user and password = student*).
3. Once we get back the cluster object, we need to connect, which creates the session that we will use to execute queries.
    
*Note 1:* This block of code will be standard in all notebooks

In [None]:
from cassandra.cluster import Cluster
try: 
    cluster = Cluster(['127.0.0.1']) #If you have a locally installed Apache Cassandra instance
    session = cluster.connect()
except Exception as e:
    print(e)
 

### Test the Connection and Error Handling Code
*Note:* The try-except block should handle the error: We are trying to do a `select *` on a table but the table has not been created yet.

In [None]:
try: 
    session.execute("""select * from music_library""")
except Exception as e:
    print(e)
 

### Create a keyspace to work in 
*Note:* We will ignore the Replication Strategy and factor information right now as those concepts are covered in depth in Lesson 3. Remember, this will be the strategy and replication factor on a one node local instance. 

In [None]:
try:
    session.execute("""
    CREATE KEYSPACE IF NOT EXISTS udacity 
    WITH REPLICATION = 
    { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }"""
)

except Exception as e:
    print(e)

### Connect to our Keyspace.
*Compare this to how a new session in PostgreSQL is created.*

In [None]:
try:
    session.set_keyspace('udacity')
except Exception as e:
    print(e)

### Begin with creating a Music Library of albums. Each album has a lot of information we could add to the music library table. We will  start with album name, artist name, year. 

### But ...Stop

### We are working with Apache Cassandra, a NoSQL database. We can't model our data and create our table without more information.

### Think about what queries you will be performing on this data.

#### We want to be able to get every album that was released in a particular year. 
`select * from music_library WHERE YEAR=1970`

*To do that:*

1. We need to be able to do a WHERE on YEAR.
2. YEAR will become the partition key.
3. Artist name will be the clustering column to make each Primary Key unique.
4. **Remember there are no duplicates in Apache Cassandra.**

**Table Name:** music_library<br>
**column 1:** Album Name<br>
**column 2:** Artist Name<br>
**column 3:** Year <br>
PRIMARY KEY(year, artist name)
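
One consequence of this key choice is worth seeing in miniature. In Apache Cassandra an insert whose primary key already exists overwrites the existing row, so with `PRIMARY KEY(year, artist name)` two albums by the same artist in the same year cannot both be stored. The toy sketch below models a table as a Python dictionary keyed by the primary key; the `insert` helper and the second album title are hypothetical.

```python
def insert(table, primary_key_columns, row):
    """Toy upsert: a row with an existing primary key replaces the old row."""
    key = tuple(row[column] for column in primary_key_columns)
    table[key] = row


rows = [
    {"year": 1970, "artist_name": "The Beatles", "album_name": "Let It Be"},
    # Hypothetical second album with the same year and artist.
    {"year": 1970, "artist_name": "The Beatles", "album_name": "Another 1970 Album"},
]

# PRIMARY KEY (year, artist_name): the second insert overwrites the first.
by_year_artist = {}
for row in rows:
    insert(by_year_artist, ("year", "artist_name"), row)

# PRIMARY KEY (year, artist_name, album_name): both rows survive.
by_year_artist_album = {}
for row in rows:
    insert(by_year_artist_album, ("year", "artist_name", "album_name"), row)

print(len(by_year_artist), len(by_year_artist_album))
```

If both albums need to be kept, adding the album name as a second clustering column is one possible adjustment to the data model.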


### Now to translate this information into a Create Table Statement. 
More information on Data Types can be found here: https://datastax.github.io/python-driver/<br>
*Note:* Again, we will go in depth with these concepts in Lesson 3.

In [None]:
query = "CREATE TABLE IF NOT EXISTS music_library "
query = query + "(year int, artist_name text, album_name text, PRIMARY KEY (year, artist_name))"
try:
    session.execute(query)
except Exception as e:
    print(e)


The query should run smoothly.

### Insert two rows 

In [None]:
query = "INSERT INTO music_library (year, artist_name, album_name)"
query = query + " VALUES (%s, %s, %s)"

try:
    session.execute(query, (1970, "The Beatles", "Let It Be"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1965, "The Beatles", "Rubber Soul"))
except Exception as e:
    print(e)

### Validate your data was inserted into the table.
*Note:* The for loop is used for printing the results. If executing queries in the cqlsh, this would not be required.

*Note:* Depending on the version of Apache Cassandra you have installed, this might throw an "ALLOW FILTERING" error instead of printing the 2 rows that we just inserted. This is to be expected, as this type of query should not be performed on large datasets, we are only doing this for the sake of the demo.

In [None]:
query = 'SELECT * FROM music_library'
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only iterate when the query succeeded, so `rows` is always defined here.
    for row in rows:
        print(row.year, row.album_name, row.artist_name)

### Validate the Data Model with the original query.

`select * from music_library WHERE YEAR=1970`

In [None]:
query = "select * from music_library WHERE YEAR=1970"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only iterate when the query succeeded, so `rows` is always defined here.
    for row in rows:
        print(row.year, row.album_name, row.artist_name)

### Drop the table to avoid duplicates and clean up. 

In [None]:
query = "drop table music_library"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    

### Close the session and cluster connection

In [None]:
session.shutdown()
cluster.shutdown()