# Lesson 3 Demo 4: Using the WHERE Clause
<img src="images/cassandralogo.png" width="250" height="250">

### In this exercise we are going to walk through the basics of using the WHERE clause in Apache Cassandra.

`#####` denotes where the code needs to be completed.

Note: __Do not__ click the blue Preview button in the lower task bar

#### We will use the Python driver for Apache Cassandra, imported as `cassandra`, to run our queries. The library should be preinstalled; to install it locally in the future, run this command in a notebook:
! pip install cassandra-driver
#### More documentation can be found here:  https://datastax.github.io/python-driver/

#### Import Apache Cassandra python package

In [1]:
# We are going to use the Python driver to communicate with the Cassandra NoSQL database
import cassandra

### First let's create a connection to the database

In [2]:
from cassandra.cluster import Cluster

# Create a connection to the database
# We use the local IP address because Apache Cassandra is installed locally
cluster = Cluster(['127.0.0.1'])

In [3]:
# Create a session in which to execute our queries
session = cluster.connect()

### Let's create a keyspace to do our work in 

In [4]:
# A keyspace is the top-level database object 
# that controls the replication for the object 
# it contains at each datacenter in the cluster.

# Keyspaces contain tables, materialized views and user-defined types, 
# functions and aggregates. 
# Typically, a cluster has one keyspace per application.

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS udacity
    WITH REPLICATION = 
        {'class' : 'SimpleStrategy', 'replication_factor' : 1}"""
)

<cassandra.cluster.ResultSet at 0x188956c1580>

#### Connect to our Keyspace. Compare this to how we had to create a new session in PostgreSQL.  

In [5]:
session.set_keyspace('udacity')

### Let's imagine we would like to start creating a new Music Library of albums. 
### We want to ask 4 questions of our data
#### 1. Give me every album in my music library that was released in 1970
`SELECT * FROM music_library WHERE year=1970`

#### 2. Give me the album in my music library that was released in 1970 by "The Beatles"
`SELECT * FROM music_library WHERE year=1970 AND artist_name='The Beatles'`

#### 3. Give me all the albums released in a given year that were made in London
`SELECT * FROM music_library WHERE year=1965 AND city='London'`

#### 4. Give me the city in which the album "Rubber Soul" was recorded
`SELECT city FROM music_library WHERE year=1965 AND artist_name='The Beatles' AND album_name='Rubber Soul'`

### Here is our Collection of Data
<img src="images/table3.png" width="650" height="350">

### How should we model this data? What should be our Primary Key and Partition Key? 

**Since every query filters on the year, let's make that the partition key. From there we will add clustering columns on artist name and album name.**

In [6]:
# Set query
query = "CREATE TABLE IF NOT EXISTS music_library "
query += "(year INT, city TEXT, artist_name TEXT, album_name TEXT, PRIMARY KEY (year, artist_name, album_name))"

# Execute the query and create the table
session.execute(query)

<cassandra.cluster.ResultSet at 0x188956f9550>
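
To see what this primary key implies, here is a minimal sketch in plain Python, not Cassandra itself. It assumes a toy model in which each `year` value is one partition and the rows inside a partition are ordered by `(artist_name, album_name)`. The names `partitions` and `albums_in_year` are hypothetical helpers for illustration, and the rows are the ones this notebook inserts.

```python
from collections import defaultdict

# Toy model only: real Cassandra hashes the partition key to choose a node
# and keeps rows within a partition sorted by the clustering columns.
albums = [
    (1965, "Oxford", "The Beatles", "Rubber Soul"),
    (1970, "Liverpool", "The Beatles", "Let it Be"),
    (1966, "Los Angeles", "The Monkees", "The Monkees"),
    (1970, "San Diego", "The Carpenters", "Close To You"),
    (1965, "London", "The Who", "My Generation"),
]

# partition key (year) -> {clustering key (artist_name, album_name): city}
partitions = defaultdict(dict)
for year, city, artist_name, album_name in albums:
    partitions[year][(artist_name, album_name)] = city


def albums_in_year(year):
    """Return (artist_name, album_name, city) tuples in clustering order."""
    rows = partitions.get(year, {})
    return [(artist, album, rows[(artist, album)]) for artist, album in sorted(rows)]


for row in albums_in_year(1970):
    print(row)
```

In this model, finding a year is a single dictionary lookup, and narrowing by artist and album follows the sort order. Finding a city means scanning every row in the partition, which is the kind of work Cassandra declines to do unless asked explicitly.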

### Let's insert our data into the table

In [7]:
# Set the base of the query
query = "INSERT INTO music_library "
query += "(year, city, artist_name, album_name) "
query += "VALUES (%s, %s, %s, %s)"

# Insert the data
session.execute(query, (1965, 'Oxford', 'The Beatles', 'Rubber Soul'))
session.execute(query, (1970, 'Liverpool', 'The Beatles', 'Let it Be'))
session.execute(query, (1966, 'Los Angeles', 'The Monkees', 'The Monkees'))
session.execute(query, (1970, 'San Diego', 'The Carpenters', 'Close To You'))
session.execute(query, (1965, 'London', 'The Who', 'My Generation'))

<cassandra.cluster.ResultSet at 0x18895769640>
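
The five `session.execute` calls above can also be written as a loop over a list of rows. The sketch below is one way to do that. `INSERT_ALBUM`, `ALBUMS` and `insert_albums` are hypothetical names introduced here, and the function only assumes that `session` has an `execute(query, parameters)` method like the driver session created earlier.

```python
# Same statement as the cell above. %s is the driver's placeholder for
# every value type; it is not Python string formatting.
INSERT_ALBUM = (
    "INSERT INTO music_library "
    "(year, city, artist_name, album_name) "
    "VALUES (%s, %s, %s, %s)"
)

# Rows in the column order used by INSERT_ALBUM
ALBUMS = [
    (1965, "Oxford", "The Beatles", "Rubber Soul"),
    (1970, "Liverpool", "The Beatles", "Let it Be"),
    (1966, "Los Angeles", "The Monkees", "The Monkees"),
    (1970, "San Diego", "The Carpenters", "Close To You"),
    (1965, "London", "The Who", "My Generation"),
]


def insert_albums(session, albums):
    """Run one INSERT per album tuple and return how many were sent."""
    count = 0
    for album in albums:
        session.execute(INSERT_ALBUM, album)
        count += 1
    return count
```

In the notebook this would be called as `insert_albums(session, ALBUMS)`. Because a Cassandra INSERT overwrites any existing row with the same primary key, running it again does not create duplicates.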

### Let's Validate our Data Model with our 4 queries.

Query 1: 

In [8]:
# Set the query
query = "SELECT * FROM music_library WHERE year=1970"

# Execute the query
rows = session.execute(query)

# Print the results
for row in rows:
    print(row)

Row(year=1970, artist_name='The Beatles', album_name='Let it Be', city='Liverpool')
Row(year=1970, artist_name='The Carpenters', album_name='Close To You', city='San Diego')


### Let's try the 2nd query.
Query 2: 

In [9]:
# Set the query
query = "SELECT * FROM music_library  WHERE year=1970 AND artist_name='The Beatles'"

# Execute the query
rows = session.execute(query)

# Print the results
for row in rows:
    print(row)

Row(year=1970, artist_name='The Beatles', album_name='Let it Be', city='Liverpool')


### Let's try the 3rd query.
Query 3: 

In [10]:
# Set the query
query = "SELECT * FROM music_library WHERE year=1965 AND city='London'"

try:
    # Execute the query
    rows = session.execute(query)
except Exception as e:
    # Cassandra rejects the query; print the server's error message
    print(e)
else:
    # Print the results only if the query succeeded
    for row in rows:
        print(row)

Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"


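### Did you get an error? The `city` column is not part of the primary key, so Cassandra will not filter on it unless we add ALLOW FILTERING. A WHERE clause has to restrict the partition key, and clustering columns can only be restricted in the order they were defined, without skipping one. Let's try a query that follows those rules.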
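
The rule can be sketched as a small check in plain Python. This is a deliberate simplification, not the logic Cassandra actually runs: it assumes equality-only conditions and ignores secondary indexes, range and IN restrictions, and differences between Cassandra versions. `runs_without_filtering` is a hypothetical helper name.

```python
def runs_without_filtering(where_columns, partition_key, clustering_columns):
    """Return True if an equality-only WHERE clause needs no ALLOW FILTERING.

    Simplified model: every partition key column must be restricted, and
    clustering columns may only be restricted as an unbroken prefix of
    their declared order. Any other restricted column counts as filtering.
    """
    restricted = set(where_columns)
    if not restricted:
        # No WHERE clause at all: a full table scan, which is allowed
        return True
    if not set(partition_key) <= restricted:
        return False
    leftover = restricted - set(partition_key)
    for column in clustering_columns:
        if column not in leftover:
            break  # first unrestricted clustering column ends the prefix
        leftover.remove(column)
    # Anything still left is a skipped clustering column or a regular column
    return not leftover


# Primary key of music_library: PRIMARY KEY (year, artist_name, album_name)
PARTITION_KEY = ["year"]
CLUSTERING_COLUMNS = ["artist_name", "album_name"]

queries = {
    "Query 1": ["year"],
    "Query 2": ["year", "artist_name"],
    "Query 3": ["year", "city"],
    "Query 4": ["year", "artist_name", "album_name"],
}
for name, columns in queries.items():
    print(name, runs_without_filtering(columns, PARTITION_KEY, CLUSTERING_COLUMNS))
```

Only Query 3 fails the check, because `city` is a regular column rather than part of the primary key.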
Try Query 4: 



In [11]:
# Set the query
query = "SELECT city FROM music_library WHERE year=1965 AND artist_name='The Beatles' AND album_name='Rubber Soul'"

# Execute the query
rows = session.execute(query)

# Print the results
for row in rows:
    print(row)

Row(city='Oxford')
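
Query 4 works because it restricts the partition key and both clustering columns. Query 3 still has no answer in this table. One common approach is to model a second table around that query, with `city` as the first clustering column. The sketch below is an assumption about how that could look, not part of the original exercise: the table name `music_library_by_city` and the helper `albums_by_year_and_city` are hypothetical, and the function expects a driver session like the one created earlier.

```python
# Hypothetical second table keyed for Query 3: year is still the partition
# key, and city comes first among the clustering columns.
CREATE_BY_CITY = (
    "CREATE TABLE IF NOT EXISTS music_library_by_city "
    "(year INT, city TEXT, artist_name TEXT, album_name TEXT, "
    "PRIMARY KEY (year, city, artist_name, album_name))"
)

INSERT_BY_CITY = (
    "INSERT INTO music_library_by_city "
    "(year, city, artist_name, album_name) "
    "VALUES (%s, %s, %s, %s)"
)

# Restricts the partition key and the first clustering column only
SELECT_BY_CITY = "SELECT * FROM music_library_by_city WHERE year=%s AND city=%s"


def albums_by_year_and_city(session, year, city):
    """Return all rows for one year and city from the by-city table."""
    return list(session.execute(SELECT_BY_CITY, (year, city)))
```

In the notebook, the table would be created with `session.execute(CREATE_BY_CITY)` and filled with the same rows using `INSERT_BY_CITY` before calling `albums_by_year_and_city(session, 1965, 'London')`. The alternative named in the error message, appending ALLOW FILTERING to the original query, avoids a second table at the cost of the unpredictable performance the message warns about.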


### Drop the table

In [12]:
query = "DROP TABLE IF EXISTS music_library"
session.execute(query)

<cassandra.cluster.ResultSet at 0x1889576e640>

### Drop the Keyspace

In [13]:
query = "DROP KEYSPACE IF EXISTS udacity"
session.execute(query)

<cassandra.cluster.ResultSet at 0x1889576ea60>

### And finally, close the session and cluster connection

In [14]:
session.shutdown()

In [15]:
cluster.shutdown()