## Lesson 3 Exercise 2: Focus on Primary and Clustering Columns

### Walk through the basics of creating a table in Apache Cassandra, inserting rows of data, and running a simple CQL query to validate the information. You will practice choosing the primary key and clustering columns, which is central to data modeling in Apache Cassandra.

In [2]:
# Import Cassandra driver to perform operations on Cassandra DB
import cassandra

### Create a connection to the Cluster (equivalent to DB server in SQL)

In [3]:
from cassandra.cluster import Cluster
try:
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect()
except Exception as e:
    print(e)

### Create a keyspace to work in (equivalent to a Database in SQL)

In [4]:
try:
    session.execute("""
                    CREATE KEYSPACE IF NOT EXISTS udacity
                    WITH REPLICATION =
                    { 'class': 'SimpleStrategy', 'replication_factor': 1 } """)
except Exception as e:
    print(e)

### Connect to the Keyspace. Compare this with how we create a new session in Postgres

In [5]:
try:
    session.set_keyspace('udacity')
except Exception as e:
    print(e)

### We would like to create a New OLAP table to store the Music Library of albums

#### The following are the queries which the Business would like to use to analyze data.

#### 1. Give every album in the music library that was released in a given year
`SELECT * FROM music_library WHERE year = 1970`

### The Table contains the following columns, with Year as the Primary key

![music_library](images/image4.png)

### CREATE TABLE IN KEYSPACE: udacity

In [10]:
try:
    query = """ CREATE TABLE IF NOT EXISTS music_library
                (year int, month varchar, city varchar, artist_name varchar, album_name text, PRIMARY KEY(year)) """
    
    # CREATE TABLE MUSIC LIBRARY 
    session.execute(query)
    
except Exception as e:
    print(e)

### INSERT DATA INTO THE TABLES IN KEYSPACE: udacity

In [11]:
query = "INSERT INTO music_library (year, month, city, artist_name, album_name)"
query = query + " VALUES (%s, %s, %s, %s, %s)"

In [12]:
try:
    session.execute(query, (1965, "March", "Oxford", "The Beatles", "Rubber Soul"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1970, "August", "Liverpool", "The Beatles", "Let it Be"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1964, "January", "London", "The Beatles", "Beatles for Sale"))
except Exception as e:
    print(e)

try:
    session.execute(query, (1966, "June", "Los Angeles", "The Monkees", "The Monkees"))
except Exception as e:
    print(e)

try:
    session.execute(query, (1970, "July", "San Diego", "The Carpenters", "Close To You"))
except Exception as e:
    print(e)
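
The repeated `try`/`except` blocks above can also be written as a loop over the rows. The following is a minimal sketch rather than part of the original notebook: `insert_rows` and `RecordingSession` are hypothetical names, and the helper assumes only that it is given an object with an `execute(query, params)` method, as the driver's `session` provides. The stand-in class exists so the sketch can be run without a cluster.

```python
INSERT_QUERY = (
    "INSERT INTO music_library (year, month, city, artist_name, album_name) "
    "VALUES (%s, %s, %s, %s, %s)"
)

# The same five rows inserted by the cell above
ALBUMS = [
    (1965, "March", "Oxford", "The Beatles", "Rubber Soul"),
    (1970, "August", "Liverpool", "The Beatles", "Let it Be"),
    (1964, "January", "London", "The Beatles", "Beatles for Sale"),
    (1966, "June", "Los Angeles", "The Monkees", "The Monkees"),
    (1970, "July", "San Diego", "The Carpenters", "Close To You"),
]


def insert_rows(session, query, rows):
    """Execute `query` once per row and return (row, error) pairs for rows that failed."""
    failures = []
    for row in rows:
        try:
            session.execute(query, row)
        except Exception as e:
            print(e)
            failures.append((row, e))
    return failures


class RecordingSession:
    """Hypothetical stand-in for a driver session; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))
```

With a live connection, `insert_rows(session, INSERT_QUERY, ALBUMS)` would issue the same five statements as the cell above and return an empty list if none of them raised.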

### Validate the Data Model of the Music Library table

In [13]:
query = "SELECT * FROM music_library WHERE year = 1970"
try:
    rows = session.execute(query)
    # Iterate inside the try block so a failed query does not leave `rows` undefined
    for row in rows:
        print(row.year, row.month, row.city, row.artist_name, row.album_name)
except Exception as e:
    print(e)

1970 July San Diego The Carpenters Close To You


<div class="alert alert-block alert-warning">
    <b> Analysis </b>
    <ol>
        <li> Both INSERT statements for year 1970 executed, but because 'year' is the entire primary key, the second INSERT <b>overwrote</b> the first. Cassandra treats an INSERT on an existing primary key as an upsert. </li>
        <li> Hence the SELECT statement returned only one record. </li>
    </ol>
    <b> We need to revisit our choice of primary key and clustering columns for the table </b>
</div>
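
Cassandra identifies a row by its full primary key, and writing to a key that already exists replaces the stored values. A minimal pure-Python sketch of that behavior, modeling a table as a dictionary keyed by the primary key columns, is shown below. The `insert` function and the dictionaries are illustrative names, not driver APIs, and the model deliberately ignores details such as write timestamps and per-column merging.

```python
def insert(table, primary_key_columns, row):
    """Store `row` under its primary key, replacing any row already stored there."""
    key = tuple(row[col] for col in primary_key_columns)
    table[key] = row


rows_1970 = [
    {"year": 1970, "month": "August", "city": "Liverpool",
     "artist_name": "The Beatles", "album_name": "Let it Be"},
    {"year": 1970, "month": "July", "city": "San Diego",
     "artist_name": "The Carpenters", "album_name": "Close To You"},
]

# Modeled on PRIMARY KEY (year): both rows share the key (1970,)
year_only = {}
for row in rows_1970:
    insert(year_only, ["year"], row)

# Modeled on PRIMARY KEY ((year), month, artist_name): the keys differ
composite = {}
for row in rows_1970:
    insert(composite, ["year", "month", "artist_name"], row)

print(len(year_only))   # 1
print(len(composite))   # 2
```

In this model the single surviving row under the year-only key is the last one written, which matches the one-row query result above.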

### DROP THE MUSIC LIBRARY TABLE

In [17]:
# dropTable is defined in the "DROP TABLE FUNCTION" cell (In [16]) near the end of this notebook
dropTable('music_library')

### CREATE MODIFIED TABLE IN Keyspace WITH Partition key: year and Clustering columns: month, artist_name

In [18]:
try:
    query = """ CREATE TABLE IF NOT EXISTS music_library
                (year int, month varchar, city varchar, artist_name varchar, album_name text, PRIMARY KEY((year), month, artist_name)) """
    
    # CREATE TABLE MUSIC LIBRARY 
    session.execute(query)
    
except Exception as e:
    print(e)

In [19]:
query = "INSERT INTO music_library (year, month, city, artist_name, album_name)"
query = query + " VALUES (%s, %s, %s, %s, %s)"

In [20]:
try:
    session.execute(query, (1965, "March", "Oxford", "The Beatles", "Rubber Soul"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1970, "August", "Liverpool", "The Beatles", "Let it Be"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1964, "January", "London", "The Beatles", "Beatles for Sale"))
except Exception as e:
    print(e)

try:
    session.execute(query, (1966, "June", "Los Angeles", "The Monkees", "The Monkees"))
except Exception as e:
    print(e)

try:
    session.execute(query, (1970, "July", "San Diego", "The Carpenters", "Close To You"))
except Exception as e:
    print(e)

### ADD ANOTHER RECORD INTO THE Music Library Table

In [21]:
try:
    session.execute(query, (1970, "July", "Cleveland", "Gonugunta's", "Jai Srimanarayana"))
except Exception as e:
    print(e)

### Validate the Data Model of the Music Library table

In [None]:
import pandas as pd

In [37]:
try:
    query = "SELECT * FROM music_library WHERE year = 1970"
    rows = session.execute(query)
    
    data = [list(row) for row in rows]
    # SELECT * returns the partition key, then the clustering columns,
    # then the remaining columns in alphabetical order (album_name before city)
    df = pd.DataFrame(data, columns=['year', 'month', 'artist_name', 'album_name', 'city'])
    
except Exception as e:
    print(e)
    
df

Unnamed: 0,year,month,artist_name,album_name,city
0,1970,August,The Beatles,Let it Be,Liverpool
1,1970,July,Gonugunta's,Jai Srimanarayana,Cleveland
2,1970,July,The Carpenters,Close To You,San Diego
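
A hard-coded column list ties the labels to the order in which `SELECT *` happens to return columns. One alternative is to take the names from the rows themselves. The sketch below assumes the rows behave like named tuples, which is consistent with the attribute access (`row.year`) used earlier in this notebook. The local `Row` type and the `rows_to_records` helper are hypothetical stand-ins so the example runs without a cluster.

```python
from collections import namedtuple

# Stand-in for the rows a query would return; field order mirrors the result set
Row = namedtuple("Row", ["year", "month", "artist_name", "album_name", "city"])

rows = [
    Row(1970, "August", "The Beatles", "Let it Be", "Liverpool"),
    Row(1970, "July", "Gonugunta's", "Jai Srimanarayana", "Cleveland"),
    Row(1970, "July", "The Carpenters", "Close To You", "San Diego"),
]


def rows_to_records(rows):
    """Convert named-tuple rows to dicts so each value keeps its own column name."""
    return [dict(row._asdict()) for row in rows]


records = rows_to_records(rows)
print(records[0]["album_name"], "/", records[0]["city"])
```

Passing `records` to `pd.DataFrame(records)` would then label each column from the dictionary keys rather than from a separately maintained list.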


<div class="alert alert-block alert-success">
    <b> Analysis of the Result Set </b>
    <ol>
    <li> The result set contains every record matching the WHERE clause (year = 1970). </li>
    <li> Rows are returned in ascending order of the clustering columns: first 'month', then 'artist_name'. </li>
    <li> The 'month' column has 'August' followed by 'July' because text values sort alphabetically, not chronologically. </li>
    <li> Within 'July', the 'artist_name' column has 'Gonugunta's' followed by 'The Carpenters'. </li>
    </ol>
    <b> The data model is in better shape now! </b>
</div>
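
The ordering described above can be reproduced in plain Python by sorting on the tuple of clustering column values. This is only an illustration of the comparison rule, under the assumption that the values are plain ASCII text so Python's string ordering matches the ordering of a text column. The `MONTH_NUMBER` mapping is a hypothetical addition showing how a numeric month would sort chronologically instead.

```python
# (month, artist_name) pairs for the three 1970 rows, in insertion order
clustering_values = [
    ("August", "The Beatles"),
    ("July", "The Carpenters"),
    ("July", "Gonugunta's"),
]

# Ascending sort on (month, artist_name): compare month first, then artist_name
alphabetical = sorted(clustering_values)
print(alphabetical)

# Hypothetical alternative: sort by month number to get calendar order
MONTH_NUMBER = {"July": 7, "August": 8}
chronological = sorted(
    clustering_values, key=lambda pair: (MONTH_NUMBER[pair[0]], pair[1])
)
print(chronological)
```

The first sort places 'August' before 'July', as in the query result. The second places the two July rows first, which is what a table clustered on a numeric month column would return.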

### DROP TABLE FUNCTION

In [16]:
# Drop table from the KeySpace
def dropTable(table):
    try:
        query = "DROP TABLE " + table
        session.execute(query)
    except Exception as e:
        print(e)
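
Because `dropTable` builds the statement by string concatenation, any text passed as `table` ends up in the CQL. A hedged sketch of a stricter query builder follows. `build_drop_table_query` is a hypothetical helper, not part of the notebook or the driver, and it accepts only unquoted identifiers (a letter followed by letters, digits, or underscores). It also adds `IF EXISTS` so dropping a missing table does not raise.

```python
import re

# Unquoted CQL identifier: a letter followed by letters, digits, or underscores
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def build_drop_table_query(table):
    """Return a DROP TABLE statement, rejecting names that are not simple identifiers."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"not a valid unquoted table name: {table!r}")
    return f"DROP TABLE IF EXISTS {table}"


print(build_drop_table_query("music_library"))
```

The resulting string could be passed to `session.execute` in place of the concatenated query inside `dropTable`.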

### Close the Session and Connection

In [38]:
session.shutdown()
cluster.shutdown()