# L3 Exercise 1: Three Queries Three Tables
<img src="https://upload.wikimedia.org/wikipedia/commons/5/5e/Cassandra_logo.svg" width="250" height="250">

### Walk through the basics of creating a table in Apache Cassandra, inserting rows of data, and running a simple CQL query to validate the information. You will practice denormalization and the concept of one table per query, which is an encouraged practice with Apache Cassandra. 

#### We will use the DataStax Python driver for Apache Cassandra (installed as `cassandra-driver`, imported as `cassandra`) to run the queries. The library should already be installed; if you ever need to install it locally, run this command in a notebook:
! pip install cassandra-driver
#### More documentation can be found here:  https://datastax.github.io/python-driver/

<h3><span style='color:blue'>Using K8S Cassandra</span></h3>

You need a Kubernetes cluster available, for example Minikube, Minishift, or Docker with Kubernetes enabled.

Helm 3 is needed as well; see [helm.sh](http://helm.sh).

In [1]:
#Checks if Helm V3 is available
helm_version = !helm version --short
assert helm_version[0][:2] == 'v3', "Expected HELM version not available, visit https://helm.sh"
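The cell above compares the first two characters of `helm version --short` (which prints something like `v3.x.y+g<commit>`) with `'v3'`. A slightly sturdier sketch parses the major version as an integer. The helper name `helm_major_version` is hypothetical, and the version strings used here are illustrative rather than output captured from this notebook.

```python
def helm_major_version(short_version: str) -> int:
    """Return the major version from `helm version --short` output, e.g. 'v3.2.4+gabc1234' -> 3."""
    text = short_version.strip()
    if not text.startswith("v"):
        # Anything else (for example output from an older Helm release) is rejected
        raise ValueError(f"Unexpected Helm version string: {short_version!r}")
    core = text[1:].split("+", 1)[0]  # drop the leading 'v' and any '+build' metadata
    return int(core.split(".", 1)[0])  # keep only the major component


print(helm_major_version("v3.2.4+gabc1234"))  # 3
```

In the notebook the check could then read `assert helm_major_version(helm_version[0]) == 3, "Expected HELM version not available, visit https://helm.sh"`.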

In [2]:
!helm repo add bitnami https://charts.bitnami.com/bitnami

"bitnami" has been added to your repositories


In [3]:
CHART_INSTANCE_NAME = 'dend-l3e1'
CASSANDRA_PASSWORD = 'password'

In [4]:
%%writefile dend-cassandra-customize.yaml
service:
    type: NodePort
    nodePorts:
        cql: 30942
        rcp: 30160
dbUser:
    user: cassandra
    password: password

Overwriting dend-cassandra-customize.yaml


In [5]:
helm_chart_out = !helm install {CHART_INSTANCE_NAME} bitnami/cassandra --values dend-cassandra-customize.yaml
#for c_out in helm_chart_out: print(c_out)
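The values file hard-codes the same password that `CASSANDRA_PASSWORD` holds. One way to keep them in sync, sketched here, is to build the `helm install` command in Python and pass the password with Helm's `--set` flag, which takes precedence over `--values`. The helper name is hypothetical; the `dbUser.password` key is the one already used in `dend-cassandra-customize.yaml`.

```python
import shlex


def helm_install_command(release: str, chart: str, values_file: str, overrides: dict) -> str:
    """Build a `helm install` command line; each override becomes a `--set key=value` flag."""
    parts = ["helm", "install", release, chart, "--values", values_file]
    for key, value in overrides.items():
        parts += ["--set", f"{key}={value}"]
    # shlex.quote protects arguments containing spaces or other shell metacharacters
    return " ".join(shlex.quote(part) for part in parts)


# In the notebook, pass CHART_INSTANCE_NAME and CASSANDRA_PASSWORD instead of these literals
cmd = helm_install_command("dend-l3e1", "bitnami/cassandra", "dend-cassandra-customize.yaml",
                           {"dbUser.password": "password"})
print(cmd)
```

The install cell could then become `helm_chart_out = !{cmd}`. Note that `shlex.quote` only handles shell quoting; Helm itself treats commas inside `--set` values specially, so such passwords would need extra escaping.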

In [16]:
!kubectl get pod,svc,pvc

NAME                        READY   STATUS    RESTARTS   AGE
pod/dend-l3e1-cassandra-0   1/1     Running   0          2m1s

NAME                                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                        AGE
service/dend-l3e1-cassandra            NodePort    10.102.154.32   <none>        9042:30942/TCP,9160:31518/TCP                  2m1s
service/dend-l3e1-cassandra-headless   ClusterIP   None            <none>        7000/TCP,7001/TCP,7199/TCP,9042/TCP,9160/TCP   2m1s
service/kubernetes                     ClusterIP   10.96.0.1       <none>        443/TCP                                        46d

NAME                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/data-dend-l3e1-cassandra-0   Bound    pvc-b9ad3c7e-b318-42ab-af6b-88f10f272e25   8Gi        RWO            hostpath       2m1s
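Before connecting, the Cassandra pod should report `READY 1/1` and `STATUS Running`. `kubectl wait --for=condition=Ready pod/dend-l3e1-cassandra-0 --timeout=300s` can block until then; alternatively, here is a small sketch that checks the text printed by `kubectl get pod`. The function name is hypothetical and the sample text is copied from the output above.

```python
def pod_is_ready(kubectl_output: str, pod_name: str) -> bool:
    """Return True if `pod_name` shows READY n/n and STATUS Running in `kubectl get pod` output."""
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Expected columns: NAME READY STATUS RESTARTS AGE (NAME may carry a 'pod/' prefix)
        if len(fields) >= 3 and fields[0] in (pod_name, f"pod/{pod_name}"):
            ready, total = fields[1].split("/")
            return fields[2] == "Running" and ready == total
    return False


sample = """NAME                        READY   STATUS    RESTARTS   AGE
pod/dend-l3e1-cassandra-0   1/1     Running   0          2m1s"""
print(pod_is_ready(sample, "dend-l3e1-cassandra-0"))  # True
```

In the notebook, `pod_out = !kubectl get pod` followed by `pod_is_ready("\n".join(pod_out), f"{CHART_INSTANCE_NAME}-cassandra-0")` would apply it to live output; the `-cassandra-0` suffix is taken from the pod name shown above.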


#### Import the Apache Cassandra Python package

In [17]:
import cassandra

### Create a connection to the database

In [18]:
# Connect to the Cassandra instance running in your Kubernetes cluster, exposed on the host
# through the CQL NodePort (30942) set in dend-cassandra-customize.yaml.
# 127.0.0.1 works when NodePorts are published on localhost (e.g. Docker with Kubernetes);
# with Minikube, use the address printed by `minikube ip` instead.

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

try: 
    # The bitnami Helm chart enables authentication, so pass the dbUser credentials
    auth_provider = PlainTextAuthProvider(username='cassandra', password=CASSANDRA_PASSWORD)
    cluster = Cluster(['127.0.0.1'], port=30942, auth_provider=auth_provider)
    session = cluster.connect()
    print(session.hosts)
except Exception as e:
    print(f"Error: {e}")

[<Host: 127.0.0.1:30942 datacenter1>]
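Right after `helm install`, the pod can report `Running` before Cassandra accepts CQL connections, in which case `cluster.connect()` raises (typically `NoHostAvailable`). A minimal retry sketch, assuming a fixed delay between attempts is acceptable; `connect_with_retry` is a hypothetical helper, not part of the driver.

```python
import time


def connect_with_retry(connect, attempts=5, delay_seconds=5.0):
    """Call `connect()` until it succeeds, trying at most `attempts` times in total.

    `connect` is any zero-argument callable, e.g. the bound method `cluster.connect`.
    The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as e:  # e.g. NoHostAvailable while Cassandra is still starting
            print(f"Attempt {attempt}/{attempts} failed: {e}")
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

With the cluster object from the cell above, the call would be `session = connect_with_retry(cluster.connect)`.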


### Create a keyspace to work in

In [19]:
try:
    session.execute("""
    CREATE KEYSPACE IF NOT EXISTS udacity 
    WITH REPLICATION = 
    { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }"""
)

except Exception as e:
    print(e)

#### Connect to our Keyspace. Compare this to how we had to create a new session in PostgreSQL.  

In [20]:
try:
    session.set_keyspace('udacity')
except Exception as e:
    print(e)

### Let's imagine we would like to start creating a Music Library of albums. 

### We want to ask 3 questions of our data
#### 1. Give every album in the music library that was released in a given year
`select * from music_library WHERE YEAR=1970`
#### 2. Give every album in the music library that was created by a given artist  
`select * from artist_library WHERE artist_name='The Beatles'`
#### 3. Give all the information from the music library about a given album
`select * from album_library WHERE album_name='Close To You'`


### Because we want to do three different queries, we will need different tables that partition the data differently.  
* The music library table will be partitioned by year, which becomes the partition key; artist name will be the clustering column, making each primary key unique. 
* The artist library table will be partitioned by artist name, which becomes the partition key; year will be the clustering column, making each primary key unique. More on primary keys in the next lesson and demo. 
* The album library table will be partitioned by album name, which becomes the partition key; artist name will be the clustering column, making each primary key unique. 
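Why the clustering column matters can be seen with a toy in-memory model, written in plain Python rather than against Cassandra. It assumes only two Cassandra behaviours: rows are grouped by partition key and ordered by clustering column, and an INSERT with an existing primary key overwrites the old row. `ToyTable` is hypothetical and ignores replication and storage entirely.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a Cassandra table: partition key -> clustering value -> row."""

    def __init__(self, partition_key, clustering_key=None):
        self.partition_key = partition_key
        self.clustering_key = clustering_key
        self.partitions = defaultdict(dict)

    def insert(self, row):
        # Like a CQL INSERT, a row with an existing primary key silently replaces the old one
        clustering = row[self.clustering_key] if self.clustering_key else None
        self.partitions[row[self.partition_key]][clustering] = row

    def select(self, partition_value):
        # Rows in one partition come back sorted by clustering value (ascending)
        partition = self.partitions.get(partition_value, {})
        return [partition[key] for key in sorted(partition)]


albums = [
    {"year": 1970, "artist_name": "The Beatles", "album_name": "Let it Be"},
    {"year": 1965, "artist_name": "The Beatles", "album_name": "Rubber Soul"},
    {"year": 1965, "artist_name": "The Who", "album_name": "My Generation"},
    {"year": 1966, "artist_name": "The Monkees", "album_name": "The Monkees"},
    {"year": 1970, "artist_name": "The Carpenters", "album_name": "Close To You"},
]

year_only = ToyTable("year")                   # like PRIMARY KEY (year)
year_artist = ToyTable("year", "artist_name")  # like PRIMARY KEY (year, artist_name)
for album in albums:
    year_only.insert(album)
    year_artist.insert(album)

print([r["album_name"] for r in year_only.select(1970)])    # ['Close To You']
print([r["album_name"] for r in year_artist.select(1970)])  # ['Let it Be', 'Close To You']
```

With `year` alone as the primary key, the 1970 Carpenters row overwrites the 1970 Beatles row; adding `artist_name` as a clustering column keeps both.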

`Table Name: music_library
column 1: Year
column 2: Artist Name
column 3: Album Name
PRIMARY KEY (year, artist_name)`

`Table Name: artist_library
column 1: Artist Name
column 2: Year
column 3: Album Name
PRIMARY KEY (artist_name, year)`

`Table Name: album_library 
column 1: Album Name
column 2: Artist Name
column 3: Year
PRIMARY KEY (album_name, artist_name)`
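The three schemas above differ only in column order and primary key, so the CQL can be generated from one description. This is a sketch with a hypothetical helper, assuming a single-column partition key; for `music_library` it produces the same statement that the next cells build by hand.

```python
def create_table_cql(table, columns, partition_key, clustering_columns=()):
    """Build a CREATE TABLE statement from (name, cql_type) pairs and a primary key.

    Assumes a single-column partition key; a composite partition key would need
    its own parentheses, e.g. PRIMARY KEY ((a, b), c).
    """
    column_defs = ", ".join(f"{name} {cql_type}" for name, cql_type in columns)
    key = ", ".join([partition_key, *clustering_columns])
    return f"CREATE TABLE IF NOT EXISTS {table} ({column_defs}, PRIMARY KEY ({key}))"


# The three schemas described above: (table, columns, partition key, clustering columns)
schemas = [
    ("music_library", [("year", "int"), ("artist_name", "text"), ("album_name", "text")],
     "year", ["artist_name"]),
    ("artist_library", [("artist_name", "text"), ("year", "int"), ("album_name", "text")],
     "artist_name", ["year"]),
    ("album_library", [("album_name", "text"), ("artist_name", "text"), ("year", "int")],
     "album_name", ["artist_name"]),
]

for table, columns, partition_key, clustering in schemas:
    print(create_table_cql(table, columns, partition_key, clustering))
```

Each generated statement could then be passed to `run_query(session, ...)`.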

### Create the tables

In [21]:
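def run_query(session, query, values=None):
    """Execute `query` (with optional bind `values`) and return the ResultSet.

    Errors are printed rather than raised, so a failed query returns None.
    """
    try:
        if values is None:
            rows = session.execute(query)
        else:
            rows = session.execute(query, values)
        return rows
    except Exception as e:
        print(e)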
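`run_query` returns `None` when a query fails, so a later `for row in rows:` loop stops with a `TypeError` that hides the original CQL error. A possible variant, sketched below with the hypothetical name `run_query_safe`, returns an empty list instead so the loop simply prints nothing after the error message.

```python
def run_query_safe(session, query, values=None):
    """Like run_query, but return [] instead of None when the query fails."""
    try:
        if values is None:
            return session.execute(query)
        return session.execute(query, values)
    except Exception as e:
        print(f"Query failed: {e}\n  {query}")
        return []
```

The trade-off is that failures are even easier to overlook, so the printed message matters; re-raising after printing is the stricter alternative.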

In [22]:
query = "CREATE TABLE IF NOT EXISTS music_library "
query = query + "(year int, artist_name text, album_name text, PRIMARY KEY (year, artist_name))"

run_query(session,query)

query = "CREATE TABLE IF NOT EXISTS artist_library "
query = query + "(artist_name text, year int, album_name text, PRIMARY KEY (artist_name,  year))"

run_query(session,query)    

query = "CREATE TABLE IF NOT EXISTS album_library "
query = query + "(artist_name text, album_name text, year int, PRIMARY KEY (album_name, artist_name))"

run_query(session,query)

<cassandra.cluster.ResultSet at 0x1061844a8>
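The `ResultSet` printed above does not show whether the tables exist. One way to check is to read Cassandra's schema tables; this sketch assumes Cassandra 3.0 or later, where table metadata lives in `system_schema.tables`, and `existing_tables` is a hypothetical helper.

```python
def existing_tables(session, keyspace):
    """Return the names of the tables in `keyspace`, read from system_schema.tables."""
    rows = session.execute(
        "SELECT table_name FROM system_schema.tables WHERE keyspace_name = %s",
        (keyspace,),
    )
    # The driver's default row factory returns named tuples, so columns are attributes
    return {row.table_name for row in rows}
```

For this exercise, `assert {"music_library", "artist_library", "album_library"} <= existing_tables(session, "udacity")` would confirm all three were created.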

### Insert data into the tables

In [23]:
query = "INSERT INTO music_library (year, artist_name, album_name)"
query = query + " VALUES (%s, %s, %s)"

query1 = "INSERT INTO artist_library (artist_name, year, album_name)"
query1 = query1 + " VALUES (%s, %s, %s)"

query2 = "INSERT INTO album_library (album_name, artist_name, year)"
query2 = query2 + " VALUES (%s, %s, %s)"


data_music_library = [
    (1970, "The Beatles", "Let it Be"),
    (1965, "The Beatles", "Rubber Soul"),
    (1965, "The Who", "My Generation"),
    (1966, "The Monkees", "The Monkees"),
    (1970, "The Carpenters", "Close To You")
]

data_artist_library = [
    ("The Beatles", 1970, "Let it Be"),
    ("The Beatles", 1965, "Rubber Soul"),
    ("The Who", 1965, "My Generation"),
    ("The Monkees", 1966, "The Monkees"),
    ("The Carpenters", 1970, "Close To You")
]

data_album_library = [
    ("Let it Be", "The Beatles", 1970),
    ("Rubber Soul", "The Beatles", 1965),
    ("My Generation", "The Who", 1965),
    ("The Monkees", "The Monkees", 1966),
    ("Close To You", "The Carpenters", 1970)
]

for citem in data_music_library:
    run_query(session,query,citem)
    
for citem in data_artist_library:
    run_query(session,query1,citem)

for citem in data_album_library:
    run_query(session,query2,citem)
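The three data lists above hold the same five facts, typed out three times in different column orders. A sketch that derives them from one source list, so the denormalized copies cannot drift apart; the name `albums` is hypothetical.

```python
# Single source of truth: (year, artist_name, album_name)
albums = [
    (1970, "The Beatles", "Let it Be"),
    (1965, "The Beatles", "Rubber Soul"),
    (1965, "The Who", "My Generation"),
    (1966, "The Monkees", "The Monkees"),
    (1970, "The Carpenters", "Close To You"),
]

# Reorder each tuple to match the column list of the corresponding INSERT statement
data_music_library = [(year, artist, album) for year, artist, album in albums]
data_artist_library = [(artist, year, album) for year, artist, album in albums]
data_album_library = [(album, artist, year) for year, artist, album in albums]
```

The insert loops then stay as they are, or collapse into one loop over the pairs `(query, data_music_library)`, `(query1, data_artist_library)`, `(query2, data_album_library)`.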

### Inserting the same data into three tables might feel unnatural. If we normalized these tables, we would not need the extra copies! While that is true, remember there are no `JOINS` in Apache Cassandra. For the sake of high availability and scalability, denormalization is how this is done. 
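A toy illustration of the trade-off, in plain Python rather than Cassandra: with only `music_library` (partitioned by year), finding an artist's albums means reading every partition, which Cassandra only permits with `ALLOW FILTERING`. The denormalized `artist_library` answers the same question from a single partition. The partition count here merely stands in for the real cost, which grows with the data spread across nodes.

```python
# Toy model: each table is a dict of partition key -> rows stored in that partition
music_library = {  # PRIMARY KEY (year, artist_name)
    1965: [("The Beatles", "Rubber Soul"), ("The Who", "My Generation")],
    1966: [("The Monkees", "The Monkees")],
    1970: [("The Beatles", "Let it Be"), ("The Carpenters", "Close To You")],
}
artist_library = {  # PRIMARY KEY (artist_name, year)
    "The Beatles": [(1965, "Rubber Soul"), (1970, "Let it Be")],
    "The Who": [(1965, "My Generation")],
    "The Monkees": [(1966, "The Monkees")],
    "The Carpenters": [(1970, "Close To You")],
}


def albums_by_artist_without_denormalization(artist):
    """Only music_library exists: every partition must be read and filtered."""
    albums, partitions_read = [], 0
    for rows in music_library.values():
        partitions_read += 1
        albums += [album for name, album in rows if name == artist]
    return albums, partitions_read


def albums_by_artist_with_denormalization(artist):
    """artist_library is partitioned by artist, so one partition answers the query."""
    return [album for year, album in artist_library.get(artist, [])], 1


print(albums_by_artist_without_denormalization("The Beatles"))  # (['Rubber Soul', 'Let it Be'], 3)
print(albums_by_artist_with_denormalization("The Beatles"))     # (['Rubber Soul', 'Let it Be'], 1)
```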


### Validate our Data Model

`select * from music_library WHERE YEAR=1970`

In [24]:
query = "select * from music_library WHERE YEAR=1970"
rows = run_query(session, query)
    
for row in rows:
    print(row.year, row.artist_name, row.album_name)

1970 The Beatles Let it Be
1970 The Carpenters Close To You


### Validate our Data Model

`select * from artist_library WHERE artist_name = 'The Beatles'`

In [25]:
query = "select * from artist_library WHERE ARTIST_NAME='The Beatles'"
rows = run_query(session, query)
    
for row in rows:
    print(row.artist_name, row.year, row.album_name)

The Beatles 1965 Rubber Soul
The Beatles 1970 Let it Be


### Validate our Data Model

`select * from album_library WHERE album_name = 'Close To You'`

In [26]:
query = "select * from album_library WHERE ALBUM_NAME='Close To You'"
rows = run_query(session, query)
    
for row in rows:
    print(row.artist_name, row.year, row.album_name)

The Carpenters 1970 Close To You
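The three validation cells are checked by eye. A sketch of an automatic check: precompute what each query should return from the source data, then compare it with the rows Cassandra returns. Sets are used because only the partition's own clustering order is guaranteed; the helper and variable names are hypothetical.

```python
from collections import defaultdict

albums = [  # (year, artist_name, album_name), the same facts as data_music_library
    (1970, "The Beatles", "Let it Be"),
    (1965, "The Beatles", "Rubber Soul"),
    (1965, "The Who", "My Generation"),
    (1966, "The Monkees", "The Monkees"),
    (1970, "The Carpenters", "Close To You"),
]


def expected_answers(albums):
    """Index the source rows by year, artist and album: the three WHERE clauses used above."""
    by_year, by_artist, by_album = defaultdict(set), defaultdict(set), defaultdict(set)
    for year, artist, album in albums:
        by_year[year].add((year, artist, album))
        by_artist[artist].add((year, artist, album))
        by_album[album].add((year, artist, album))
    return by_year, by_artist, by_album


by_year, by_artist, by_album = expected_answers(albums)
```

For the first query, for example: `got = {(r.year, r.artist_name, r.album_name) for r in run_query(session, "select * from music_library WHERE YEAR=1970")}` followed by `assert got == by_year[1970]`.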


### For the sake of the demo, drop the tables. 

In [27]:
query = "drop table music_library"
run_query(session, query)

query = "drop table album_library"
run_query(session, query)

query = "drop table artist_library"
run_query(session, query)

<cassandra.cluster.ResultSet at 0x1034d3c18>

### Close the session and cluster connection

In [28]:
session.shutdown()
cluster.shutdown()

<h2><span style='color:blue'>Remove Environment</span></h2>

In [29]:
# Removes chart instances
!helm uninstall {CHART_INSTANCE_NAME}

release "dend-l3e1" uninstalled


In [30]:
# Remove the persistent volume claim: helm uninstall does not delete PVCs created by the StatefulSet
!kubectl get pvc|fgrep {CHART_INSTANCE_NAME}|cut -d ' '  -f1| xargs -t kubectl delete pvc

kubectl delete pvc data-dend-l3e1-cassandra-0
persistentvolumeclaim "data-dend-l3e1-cassandra-0" deleted
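The shell pipeline above runs `fgrep` on whole lines, so the release name matching any column (for example VOLUME) would select that claim. A Python sketch that matches only the NAME column is shown below; it is still a substring test like the original, so `dend-l3e1` would also match a hypothetical `dend-l3e10`. The sample text is adapted from the `kubectl get pod,svc,pvc` output earlier, without the `persistentvolumeclaim/` prefix that plain `kubectl get pvc` omits.

```python
def pvcs_for_release(kubectl_output: str, release: str) -> list:
    """Return PVC names from `kubectl get pvc` output whose NAME column contains `release`."""
    names = []
    # The first line is the header (or a "No resources found." message), so skip it
    for line in kubectl_output.splitlines()[1:]:
        fields = line.split()
        if fields and release in fields[0]:
            names.append(fields[0])
    return names


sample = """NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-dend-l3e1-cassandra-0   Bound    pvc-b9ad3c7e-b318-42ab-af6b-88f10f272e25   8Gi        RWO            hostpath       2m1s"""

for name in pvcs_for_release(sample, "dend-l3e1"):
    print(["kubectl", "delete", "pvc", name])
```

Against a live cluster, `pvc_out = !kubectl get pvc` would supply the text via `"\n".join(pvc_out)`, and each command list could be run with `subprocess.run(cmd, check=True)`.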


In [31]:
!kubectl get pvc

No resources found.
