# Lesson 3 Exercise 1: Three Queries Three Tables
<img src="images/cassandralogo.png" width="250" height="250">

### Walk through the basics of creating a table in Apache Cassandra, inserting rows of data, and running a simple CQL query to validate the information. You will practice denormalization and the concept of one table per query, which is an encouraged practice with Apache Cassandra.

### Remember, replace ##### with your answer.



#### We will use the Python driver for Apache Cassandra (the `cassandra-driver` package, imported as `cassandra`) to run the queries. This library should be preinstalled; to install it yourself, run this command in a notebook cell:
! pip install cassandra-driver
#### More documentation can be found here:  https://datastax.github.io/python-driver/

#### Import Apache Cassandra python package

In [1]:
# We are going to use the Python driver to communicate with the Cassandra NoSQL database
import cassandra

### Create a connection to the database

In [2]:
from cassandra.cluster import Cluster

# Create a connection to the database
# We use the local IP address because Apache Cassandra is installed locally
cluster = Cluster(['127.0.0.1'])

In [3]:
# Create a session in which to execute our queries
session = cluster.connect()

### Create a keyspace to work in

In [4]:
# A keyspace is the top-level database object 
# that controls the replication for the object 
# it contains at each datacenter in the cluster.

# Keyspaces contain tables, materialized views and user-defined types, 
# functions and aggregates. 
# Typically, a cluster has one keyspace per application.

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS udacity
    WITH REPLICATION = 
        {'class' : 'SimpleStrategy', 'replication_factor' : 1}"""
)

<cassandra.cluster.ResultSet at 0x1e3f8280640>

#### Connect to our Keyspace. Compare this to how we had to create a new session in PostgreSQL.  

In [5]:
session.set_keyspace('udacity')

### Let's imagine we would like to start creating a Music Library of albums. 

### We want to ask 3 questions of the data
#### 1. Give every album in the music library that was released in a given year
`select * from music_library WHERE YEAR=1970`
#### 2. Give every album in the music library that was created by a given artist  
`select * from artist_library WHERE artist_name='The Beatles'`
#### 3. Give all the information from the music library about a given album
`select * from album_library WHERE album_name='Close To You'`


### Because we want to do three different queries, we will need three different tables that partition the data differently. 
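
As a rough illustration of what "partitioning the data differently" means, here is a minimal pure-Python sketch (no Cassandra involved) that groups the same five albums used in this exercise by each of the three partition keys. The `albums` list and the `partition_by` helper are illustrative names, not part of the driver, and a real cluster also hashes the partition key to choose a node, which this sketch does not model.

```python
from collections import defaultdict

# The five albums used in this exercise: (year, artist_name, album_name)
albums = [
    (1970, "The Beatles", "Let it Be"),
    (1965, "The Beatles", "Rubber Soul"),
    (1965, "The Who", "My Generation"),
    (1966, "The Monkees", "The Monkees"),
    (1970, "The Carpenters", "Close To You"),
]


def partition_by(records, key_index):
    """Group records by the value at key_index (a stand-in for the partition key)."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[key_index]].append(record)
    return dict(partitions)


by_year = partition_by(albums, 0)    # layout that serves query 1
by_artist = partition_by(albums, 1)  # layout that serves query 2
by_album = partition_by(albums, 2)   # layout that serves query 3

print(by_year[1970])
print(by_artist["The Beatles"])
print(by_album["Close To You"])
```

Each lookup reads a single group, which is the property the three Cassandra tables are designed to provide for their respective queries.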

### 1. music_library table

`select * from music_library WHERE YEAR=1970`

`Table Name: music_library
column 1: Year
column 2: Artist_Name
column 3: Album_Name
PRIMARY KEY (year, artist_name)`

* My music_library table will be partitioned by `year`, which will be my `Partition Key`. The `artist_name` will be my clustering column, so that each `Primary Key` is unique.

<img src="images/table1.png" width="350" height="350">

### 2. artist_library table

`select * from artist_library WHERE artist_name='The Beatles'`

`Table Name: artist_library
column 1: Artist_Name
column 2: Album_Name
column 3: Year
PRIMARY KEY (artist_name, year)`

* My artist_library table will be partitioned by `artist_name`, which will be my `Partition Key`. The `year` will be my clustering column, so that each `Primary Key` is unique.

<img src="images/table2.png" width="350" height="350">

### 3. album_library table

`select * from album_library WHERE album_name='Close To You'`

`Table Name: album_library
column 1: Album_Name
column 2: Artist_Name
column 3: Year
PRIMARY KEY (album_name, year)`

* My album_library table will be partitioned by `album_name`, which will be my `Partition Key`. The `year` will be my clustering column, so that each `Primary Key` is unique.

<img src="images/table0.png" width="550" height="550">
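
One consequence of these key choices is worth checking before creating the tables: in Cassandra, an `INSERT` that reuses an existing primary key overwrites the earlier row rather than adding a second one. The sketch below imitates that behavior with a plain dictionary keyed by the primary key. The `upsert` helper and the second 1970 album are hypothetical, added only for illustration.

```python
def upsert(table, primary_key, row):
    """Mimic a Cassandra INSERT: a row with the same primary key replaces the old one."""
    table[primary_key] = row


# music_library: PRIMARY KEY (year, artist_name)
music_library = {}
upsert(music_library, (1970, "The Beatles"), "Let it Be")
# Hypothetical second album by the same artist in the same year
upsert(music_library, (1970, "The Beatles"), "Hypothetical Second Album")

print(len(music_library))                    # one row is left for this key
print(music_library[(1970, "The Beatles")])  # the later insert replaced the earlier one
```

The five albums in this exercise never share a primary key in any of the three tables, so the keys above are sufficient here. If collisions were possible, adding `album_name` as a further clustering column would be one way to keep both rows.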

### TO-DO: Create the tables. 

In [6]:
# Table Name: music_library
# column 1: Year
# column 2: Artist_Name
# column 3: Album_Name
# PRIMARY KEY (year, artist_name)

# Set the query
query = "CREATE TABLE IF NOT EXISTS music_library "
query += "(year INT, artist_name TEXT, album_name TEXT, PRIMARY KEY (year, artist_name))"

# Execute the query
session.execute(query)

<cassandra.cluster.ResultSet at 0x1e3f82bf700>

In [7]:
# Table Name: artist_library
# column 1: Artist_Name
# column 2: Album_Name
# column 3: Year
# PRIMARY KEY (artist_name, year)

# Set the query
query = "CREATE TABLE IF NOT EXISTS artist_library "
query += "(artist_name TEXT, album_name TEXT, year INT, PRIMARY KEY (artist_name, year))"

# Execute the query
session.execute(query)

<cassandra.cluster.ResultSet at 0x1e3f82a1d30>

In [8]:
# Table Name: album_library
# column 1: Album_Name
# column 2: Artist_Name
# column 3: Year
# PRIMARY KEY (album_name, year)

# Set the query
query = "CREATE TABLE IF NOT EXISTS album_library "
query += "(album_name TEXT, artist_name TEXT, year INT, PRIMARY KEY (album_name, year))"

# Execute the query
session.execute(query)

<cassandra.cluster.ResultSet at 0x1e3f82a1790>

### TO-DO: Insert data into the tables

In [9]:
# Table Name: music_library
# column 1: Year
# column 2: Artist_Name
# column 3: Album_Name
# PRIMARY KEY (year, artist_name)

# Set query
query = "INSERT INTO music_library (year, artist_name, album_name) "
query += "VALUES (%s, %s, %s)"

# Execute the query and insert the data (We will insert THE SAME DATA IN ALL TABLES)
session.execute(query, (1970, "The Beatles", "Let it Be"))
session.execute(query, (1965, "The Beatles", "Rubber Soul"))
session.execute(query, (1965, "The Who", "My Generation"))
session.execute(query, (1966, "The Monkees", "The Monkees"))
session.execute(query, (1970, "The Carpenters", "Close To You"))

<cassandra.cluster.ResultSet at 0x1e3f832ea30>

In [10]:
# Table Name: artist_library
# column 1: Artist_Name
# column 2: Album_Name
# column 3: Year
# PRIMARY KEY (artist_name, year)

# Set query
query = "INSERT INTO artist_library (artist_name, album_name, year) "
query += "VALUES (%s, %s, %s)"

# Execute the query and insert the same five rows
session.execute(query, ("The Beatles", "Let it Be", 1970))
session.execute(query, ("The Beatles", "Rubber Soul", 1965))
session.execute(query, ("The Who", "My Generation", 1965))
session.execute(query, ("The Monkees", "The Monkees", 1966))
session.execute(query, ("The Carpenters", "Close To You", 1970))

<cassandra.cluster.ResultSet at 0x1e3f832b640>

In [11]:
# Table Name: album_library
# column 1: Album_Name
# column 2: Artist_Name
# column 3: Year
# PRIMARY KEY (album_name, year)

# Set query
query = "INSERT INTO album_library (album_name, artist_name, year) "
query += "VALUES (%s, %s, %s)"

# Execute the query and insert the same five rows
session.execute(query, ("Let it Be", "The Beatles", 1970))
session.execute(query, ("Rubber Soul", "The Beatles", 1965))
session.execute(query, ("My Generation", "The Who", 1965))
session.execute(query, ("The Monkees", "The Monkees", 1966))
session.execute(query, ("Close To You", "The Carpenters", 1970))

<cassandra.cluster.ResultSet at 0x1e3f832e250>

Inserting duplicate data into the tables might have felt unnatural. If I had normalized these tables, I wouldn't need the extra copies! While this is true, remember there are no `JOINS` in Apache Cassandra. To get the benefits of high availability and scalability, each query needs its own denormalized table.
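
Because every album has to be written to all three tables, it can help to generate the three inserts from a single record so the copies stay in step. The following is a minimal sketch under that assumption; `build_inserts` is a hypothetical helper, not part of the driver, and it only builds the query strings and parameter tuples without contacting a cluster.

```python
def build_inserts(year, artist_name, album_name):
    """Return one (query, parameters) pair per table for a single album."""
    return [
        ("INSERT INTO music_library (year, artist_name, album_name) VALUES (%s, %s, %s)",
         (year, artist_name, album_name)),
        ("INSERT INTO artist_library (artist_name, album_name, year) VALUES (%s, %s, %s)",
         (artist_name, album_name, year)),
        ("INSERT INTO album_library (album_name, artist_name, year) VALUES (%s, %s, %s)",
         (album_name, artist_name, year)),
    ]


statements = build_inserts(1970, "The Carpenters", "Close To You")
for query, params in statements:
    # With a live connection this would be: session.execute(query, params)
    print(query, params)
```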


### TO-DO: Validate the Data Model

In [12]:
# Set the query
query = "select * from music_library WHERE YEAR=1970"

# Execute the query
rows = session.execute(query)

# Print the results
for row in rows:
    print(row)

Row(year=1970, artist_name='The Beatles', album_name='Let it Be')
Row(year=1970, artist_name='The Carpenters', album_name='Close To You')


### Your output should be:
1970 The Beatles Let it Be<br>
1970 The Carpenters Close To You

### TO-DO: Validate the Data Model

**NOTE THAT STRING VALUES INSIDE THE QUERY MUST BE WRAPPED IN SINGLE QUOTES**

In [13]:
# Set the query

query = "SELECT * FROM artist_library WHERE artist_name='The Beatles'"

# Execute the query
rows = session.execute(query)

# Print the results
for row in rows:
    print(row)

Row(artist_name='The Beatles', year=1965, album_name='Rubber Soul')
Row(artist_name='The Beatles', year=1970, album_name='Let it Be')


### Your output should be:
The Beatles Rubber Soul 1965 <br>
The Beatles Let it Be 1970 
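
One way to sidestep the quoting rule altogether is to pass the value as a parameter, using the same `%s` placeholder style as the `INSERT` cells, and let the driver handle the quoting. Here is a minimal sketch of that idea. `albums_by_artist` is a hypothetical helper, and `RecordingSession` is a stand-in that only records the call so the example runs without a cluster; with the real `session` object you would pass that instead.

```python
def albums_by_artist(session, artist_name):
    """Query artist_library, passing the artist name as a bound parameter."""
    query = "SELECT * FROM artist_library WHERE artist_name=%s"
    return list(session.execute(query, (artist_name,)))


class RecordingSession:
    """Hypothetical stand-in for a driver session; it only records each call."""

    def __init__(self):
        self.calls = []

    def execute(self, query, parameters=None):
        self.calls.append((query, parameters))
        return []  # a real session would return the matching rows


fake = RecordingSession()
albums_by_artist(fake, "The Beatles")
print(fake.calls)
```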

### TO-DO: Validate the Data Model

In [14]:
# Set the query
query = "SELECT * FROM album_library WHERE album_name='Close To You'"

# Execute the query
rows = session.execute(query)

# Print the results
for row in rows:
    print(row)

Row(album_name='Close To You', year=1970, artist_name='The Carpenters')


### Your output should be:
The Carpenters 1970 Close To You
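
`print(row)` shows the whole `Row(...)` object rather than the plain line above. Assuming the driver's default row factory, which returns named tuples, the columns can be read by name and printed in any order. The sketch below uses `collections.namedtuple` as a stand-in for the driver's row so it runs without a cluster.

```python
from collections import namedtuple

# Stand-in for the driver's row object, with the columns of album_library
Row = namedtuple("Row", ["album_name", "year", "artist_name"])
rows = [Row(album_name="Close To You", year=1970, artist_name="The Carpenters")]

lines = []
for row in rows:
    # Pick the columns by name, in the order we want to display them
    lines.append(f"{row.artist_name} {row.year} {row.album_name}")

print("\n".join(lines))
```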

### DROP THE TABLES

In [15]:
query = "DROP TABLE IF EXISTS music_library"
session.execute(query)

query = "DROP TABLE IF EXISTS artist_library"
session.execute(query)

query = "DROP TABLE IF EXISTS album_library"
session.execute(query)

<cassandra.cluster.ResultSet at 0x1e3f833afa0>

### Drop the keyspace

In [16]:
query = "DROP KEYSPACE IF EXISTS udacity"
session.execute(query)

<cassandra.cluster.ResultSet at 0x1e3f8326730>

### And finally close the session and cluster connection

In [17]:
session.shutdown()
cluster.shutdown()