# Lesson 3 Exercise 3: Focus on Clustering Columns
<img src="images/cassandralogo.png" width="250" height="250">

### Walk through the basics of creating a table with a good Primary Key and Clustering Columns in Apache Cassandra, inserting rows of data, and running a simple CQL query to validate the information.

### Remember, replace ##### with your own code.


#### We will use the Python driver for Apache Cassandra, imported as `cassandra`, to run the queries. The library should be preinstalled; to install it yourself in the future, run this command in a notebook:
! pip install cassandra-driver
#### More documentation can be found here:  https://datastax.github.io/python-driver/

#### Import Apache Cassandra python package

In [1]:
# We are going to use the Python driver to communicate with the Cassandra NoSQL database
import cassandra

### Create a connection to the database

In [2]:
from cassandra.cluster import Cluster

# Create a connection to the database
# We use the local IP address because Apache Cassandra is installed locally
cluster = Cluster(['127.0.0.1'])

In [3]:
# Create a session in which to execute our queries
session = cluster.connect()

### Create a keyspace to work in 

In [4]:
# A keyspace is the top-level database object 
# that controls the replication for the object 
# it contains at each datacenter in the cluster.

# Keyspaces contain tables, materialized views and user-defined types, 
# functions and aggregates. 
# Typically, a cluster has one keyspace per application.

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS udacity
    WITH REPLICATION = 
        {'class' : 'SimpleStrategy', 'replication_factor' : 1}"""
)

<cassandra.cluster.ResultSet at 0x158548af550>

#### Connect to the Keyspace. Compare this to how we had to create a new session in PostgreSQL.  

In [5]:
session.set_keyspace('udacity')

### Imagine we would like to start creating a new Music Library of albums. 

### We want to ask 1 question of our data:
### 1. Give me all the information from the music library about a given album
`select * from album_library WHERE album_name='Close To You'`

### Here is the data:
<img src="images/table4.png" width="650" height="350">

### How should we model this data? What should be our Primary Key and Partition Key? 

A reasonable model starts the **Primary Key** with the `album_name` column as the **Partition Key**, followed by `artist_name` and `city` as the **Clustering Columns**.

**Since our query filters on `album_name`, let's start with that as the partition key. Because `album_name` alone may not be unique, we need to add other elements to the key. Adding `artist_name` and `city` as `Clustering Columns` sorts the rows within each partition and should be enough to make the row key unique.**
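
As a mental model only (this is not how Cassandra stores data internally), the roles of the partition key and clustering columns can be sketched in plain Python. The function `build_partitions` is a hypothetical helper written for this illustration; the rows are the ones used in this exercise.

```python
from collections import defaultdict

# Rows from this exercise: (year, city, artist_name, album_name)
rows = [
    (1965, 'Oxford', 'The Beatles', 'Rubber Soul'),
    (1970, 'Liverpool', 'The Beatles', 'Let it Be'),
    (1966, 'Los Angeles', 'The Monkees', 'The Monkees'),
    (1970, 'San Diego', 'The Carpenters', 'Close To You'),
    (1964, 'London', 'The Beatles', 'Beatles For Sale'),
]

def build_partitions(rows):
    """Group rows by partition key and order them by clustering columns."""
    partitions = defaultdict(dict)
    for year, city, artist_name, album_name in rows:
        # (artist_name, city) plays the role of the clustering key. Writing the
        # same full primary key twice overwrites the earlier row (an upsert).
        partitions[album_name][(artist_name, city)] = year
    # Within each partition, rows are ordered by the clustering columns.
    return {
        album: sorted((artist, city, year) for (artist, city), year in clustered.items())
        for album, clustered in partitions.items()
    }

partitions = build_partitions(rows)
print(partitions['Close To You'])
```

Looking up one album touches a single partition, which is why the query in this exercise filters on `album_name`.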

### Create the `album_library` table with the composite key

In [6]:
# Set the query
query = "CREATE TABLE IF NOT EXISTS album_library "
query += "(year INT, city TEXT, artist_name TEXT, album_name TEXT, PRIMARY KEY (album_name, artist_name, city)) "

# Execute the query and create the table
session.execute(query)

<cassandra.cluster.ResultSet at 0x158548cff10>
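
Clustering columns sort in ascending order by default. A minimal sketch of the same table definition with the partition key and the sort order spelled out explicitly is shown here; `create_query` is an illustrative name, and running it would assume the `session` created earlier.

```python
# Same columns and primary key as above, with the partition key wrapped in its
# own parentheses and the (default) ascending clustering order made explicit.
create_query = """
    CREATE TABLE IF NOT EXISTS album_library (
        year INT,
        city TEXT,
        artist_name TEXT,
        album_name TEXT,
        PRIMARY KEY ((album_name), artist_name, city)
    ) WITH CLUSTERING ORDER BY (artist_name ASC, city ASC)
"""

# To run it against the keyspace: session.execute(create_query)
print(create_query)
```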

### Insert data into the table

In [7]:
# Set the query
query = "INSERT INTO album_library "
query += "(year, city, artist_name, album_name) "
query += "VALUES (%s, %s, %s, %s)"

# Insert the data
session.execute(query, (1965, 'Oxford', 'The Beatles', 'Rubber Soul'))
session.execute(query, (1970, 'Liverpool', 'The Beatles', 'Let it Be'))
session.execute(query, (1966, 'Los Angeles', 'The Monkees', 'The Monkees'))
session.execute(query, (1970, 'San Diego', 'The Carpenters', 'Close To You'))
session.execute(query, (1964, 'London', 'The Beatles', 'Beatles For Sale'))

<cassandra.cluster.ResultSet at 0x158549356a0>
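
When the same statement runs many times, one option is a prepared statement. The sketch below assumes a connected `session` with the keyspace already set; `insert_albums` is a hypothetical helper, not part of the exercise.

```python
def insert_albums(session, albums):
    """Insert (year, city, artist_name, album_name) tuples via one prepared statement."""
    # Prepared statements use ? placeholders instead of %s.
    prepared = session.prepare(
        "INSERT INTO album_library (year, city, artist_name, album_name) "
        "VALUES (?, ?, ?, ?)"
    )
    count = 0
    for album in albums:
        session.execute(prepared, album)
        count += 1
    return count
```

A call such as `insert_albums(session, [(1970, 'San Diego', 'The Carpenters', 'Close To You')])` would write the same row as the cell above.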

### Validate the Data Model -- Did it work? 
`select * from album_library WHERE album_name='Close To You'`

In [8]:
# Set the query
query = "SELECT * FROM album_library WHERE album_name='Close To You'"

# Execute the query
rows = session.execute(query)

# Print the results
for row in rows:
    print(row.artist_name, '-', row.album_name, '-', row.city, '-', row.year)

The Carpenters - Close To You - San Diego - 1970


### Your output should be:
('The Carpenters', 'Close To You', 'San Diego', 1970)

### OR
('The Carpenters', 'Close To You', 1970, 'San Diego')
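
To reuse the validation query for other albums, the album name can be passed as a bound parameter rather than written into the query string. This is a minimal sketch that assumes a connected `session` using the driver's default named-tuple rows; `fetch_album` is a hypothetical helper.

```python
def fetch_album(session, album_name):
    """Return every row in the partition for album_name as plain tuples."""
    query = (
        "SELECT artist_name, album_name, city, year "
        "FROM album_library WHERE album_name = %s"
    )
    # The driver binds the value, so no manual quoting is needed.
    rows = session.execute(query, (album_name,))
    return [(row.artist_name, row.album_name, row.city, row.year) for row in rows]
```

Listing the columns in the `SELECT` fixes their order in the result, which is otherwise determined by the table definition.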

### Drop the table

In [9]:
query = "DROP TABLE IF EXISTS album_library"
session.execute(query)

<cassandra.cluster.ResultSet at 0x15854901f10>

### Drop the Keyspace

In [10]:
query = "DROP KEYSPACE IF EXISTS udacity"
session.execute(query)

<cassandra.cluster.ResultSet at 0x158548e9b80>

### Close the session and cluster connection

In [11]:
session.shutdown()

In [12]:
cluster.shutdown()
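
Outside a notebook, the two shutdown calls can be wrapped so that they also run when a query raises an error. This sketch assumes a `Cluster` object like the one created above; `open_session` is a hypothetical helper built on the standard library's `contextlib`.

```python
from contextlib import contextmanager

@contextmanager
def open_session(cluster, keyspace=None):
    """Yield a session, then shut down the session and the cluster."""
    session = None
    try:
        session = cluster.connect()
        if keyspace is not None:
            session.set_keyspace(keyspace)
        yield session
    finally:
        # Runs on normal exit and on errors, mirroring the two cells above.
        if session is not None:
            session.shutdown()
        cluster.shutdown()
```

It would be used as `with open_session(Cluster(['127.0.0.1']), 'udacity') as session:` followed by the queries.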