# Installing cassandra-driver using pip

To run Cassandra queries from Python, we will use the `cassandra-driver` package.

`pip install cassandra-driver`

# Start server

Run the command below in a terminal to start the server in the foreground:

`cassandra -f`

# Importing package

In [1]:
import cassandra

# Connect to database

In [2]:
from cassandra.cluster import Cluster

try:
    cluster = Cluster(['127.0.0.1'])
    session = cluster.connect()
except Exception as e:
    print(e)

# Test connection

Test the connection by running a SELECT against a table that does not exist. Getting an error back from the Cassandra server, rather than a connection failure, confirms that our connection is working.

In [3]:
try: 
    session.execute("""select * from non_existing_table""")
except Exception as e:
    print(e)

Error from server: code=2200 [Invalid query] message="No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename"
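
A connection can also be checked without provoking an error, by reading the server version from the built-in `system.local` table. The helper below is a minimal sketch; the function name `cassandra_release_version` is made up for illustration, and `session` is assumed to be the connected driver session created earlier.

```python
def cassandra_release_version(session):
    """Return the server's release version string, or None if no row came back.

    `session` is assumed to be a connected cassandra-driver Session.
    """
    # system.local is a built-in table with one row describing the connected node.
    result = session.execute("SELECT release_version FROM system.local")
    row = result.one()  # first row of the result, or None when it is empty
    return row.release_version if row is not None else None
```

Calling `cassandra_release_version(session)` should return a version string when the connection is healthy.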


# Create a keyspace to work in

Since we are running Cassandra locally on a single node, we will set the replication factor to 1.

In [4]:
try:
    session.execute("""
    CREATE KEYSPACE IF NOT EXISTS myspace
    WITH REPLICATION = {'class':'SimpleStrategy', 'replication_factor':1}
    """)
except Exception as e:
    print(e)
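
Keyspace names cannot be passed as bound query parameters, so a reusable version of the statement above has to build the CQL string itself. The sketch below shows one cautious way to do that; `create_keyspace_cql` is a hypothetical helper, and the identifier check is an assumption that only unquoted, letter-first names are wanted.

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def create_keyspace_cql(name, replication_factor=1):
    """Build a CREATE KEYSPACE statement using SimpleStrategy."""
    # Validate the name because it is interpolated directly into the statement.
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a valid unquoted keyspace name: {name!r}")
    if not isinstance(replication_factor, int) or replication_factor < 1:
        raise ValueError("replication_factor must be a positive integer")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        "WITH REPLICATION = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}}"
    )


print(create_keyspace_cql("myspace"))
```

The resulting string can be passed to `session.execute` in the same way as the literal statement.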

# Connect to our keyspace

In [5]:
try:
    session.set_keyspace('myspace')
except Exception as e:
    print(e)

# Creating table

We will create a table to store a music library.

Table name: music_library<br>
Column 1: album_name<br>
Column 2: artist_name<br>
Column 3: year<br>
Column 4: city<br>
Primary key: artist_name, album_name, city (derived below)<br>

Consider the query we want to run against the table:
    `select * from music_library where ARTIST_NAME='The Beatles'`

What should be our primary key?

Since we want to filter by artist name, let's start with artist name as the partition key, the first component of the primary key.

From there we need to add other elements to our primary key to make sure that the key is unique.

Let's assume that all albums have different names, so we can make a primary key of album name along with artist name. In real life we would spend a good amount of time understanding our data and its constraints to come up with the primary key candidates.

Suppose we also wanted to sort our data by city; then we would need to add city as a clustering column after album name.

So our create table query will be:

In [13]:
query = "CREATE TABLE IF NOT EXISTS music_library "
query = query + "(year int, artist_name text, album_name text, city text, PRIMARY KEY (artist_name, album_name, city))"
try:
    session.execute(query)
except Exception as e:
    print(e)
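
To build intuition for what this primary key means, the sketch below imitates it with plain Python data structures: one group per `artist_name` (the partition key), with rows inside each group ordered by `(album_name, city)` (the clustering columns). This is only an in-memory illustration under those assumptions, not how Cassandra stores data; `group_into_partitions` and the sample rows are made up for the example.

```python
from collections import defaultdict


def group_into_partitions(rows):
    """Group (year, artist_name, album_name, city) tuples by artist and
    order each group by its clustering columns (album_name, city)."""
    partitions = defaultdict(dict)
    for year, artist_name, album_name, city in rows:
        # Rows with the same full primary key are the same row:
        # a later write replaces an earlier one.
        partitions[artist_name][(album_name, city)] = year
    return {
        artist: sorted((album, city, year) for (album, city), year in by_key.items())
        for artist, by_key in partitions.items()
    }


albums = [
    (1970, "The Beatles", "Let it Be", "Liverpool"),
    (1965, "The Beatles", "Rubber Soul", "Oxford"),
    (1964, "The Beatles", "Beatles For Sale", "London"),
    (1966, "The Monkees", "The Monkees", "Los Angeles"),
]

for album, city, year in group_into_partitions(albums)["The Beatles"]:
    print(album, city, year)
# Beatles For Sale London 1964
# Let it Be Liverpool 1970
# Rubber Soul Oxford 1965
```

Looking up one artist touches a single group, which is why filtering by artist name is cheap with this key.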

# Inserting rows in our table

In [14]:
query = "INSERT INTO music_library (year, artist_name, album_name, city)"
query = query + " VALUES (%s, %s, %s, %s)"

try:
    session.execute(query, (1970, "The Beatles", "Let it Be", "Liverpool"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1965, "The Beatles", "Rubber Soul", "Oxford"))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (1964, "The Beatles", "Beatles For Sale", "London"))
except Exception as e:
    print(e)

try:
    session.execute(query, (1966, "The Monkees", "The Monkees", "Los Angeles"))
except Exception as e:
    print(e)

try:
    session.execute(query, (1970, "The Carpenters", "Close To You", "San Diego"))
except Exception as e:
    print(e)
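
When the same statement is executed many times, the driver also supports prepared statements, which use `?` placeholders instead of `%s`. The following is a minimal sketch of the inserts above written as a loop; `insert_albums` and `INSERT_CQL` are names invented for this example, and `session` is assumed to be a session already switched to the keyspace that holds `music_library`.

```python
INSERT_CQL = (
    "INSERT INTO music_library (year, artist_name, album_name, city) "
    "VALUES (?, ?, ?, ?)"
)


def insert_albums(session, albums):
    """Insert each (year, artist_name, album_name, city) tuple.

    Returns a list of (album, exception) pairs for the inserts that failed.
    """
    # Prepare once, then bind a new tuple of values on every execute call.
    prepared = session.prepare(INSERT_CQL)
    failures = []
    for album in albums:
        try:
            session.execute(prepared, album)
        except Exception as e:
            failures.append((album, e))
    return failures
```

Collecting the failures instead of printing them lets the caller decide whether to retry or stop.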

# Validating insert

In [15]:
query = 'SELECT * FROM music_library'
try:
    rows = session.execute(query)
    # Iterate inside the try block so a failed query never leaves rows undefined
    for row in rows:
        print(row.year, row.album_name, row.artist_name)
except Exception as e:
    print(e)

1970 Close To You The Carpenters
1964 Beatles For Sale The Beatles
1970 Let it Be The Beatles
1965 Rubber Soul The Beatles
1966 The Monkees The Monkees


We would never run an unrestricted `SELECT *` like this on a large dataset, because it has to read every partition in the table. It is used here only for demonstration.

# Filtering using artist name

In [16]:
query = "select * from music_library WHERE ARTIST_NAME='The Beatles'"
try:
    rows = session.execute(query)
    # Iterate inside the try block so a failed query never leaves rows undefined
    for row in rows:
        print(row.artist_name, row.album_name, row.city, row.year)
except Exception as e:
    print(e)

The Beatles Beatles For Sale London 1964
The Beatles Let it Be Liverpool 1970
The Beatles Rubber Soul Oxford 1965


# Filtering using album name

In [17]:
query = "select * from music_library WHERE ALBUM_NAME='The Monkees'"
try:
    rows = session.execute(query)
    # Iterate inside the try block so a failed query cannot reuse stale rows
    for row in rows:
        print(row.artist_name, row.album_name, row.city, row.year)
except Exception as e:
    print(e)

Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"


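We get an error because the table is partitioned by artist name and the query above does not restrict that partition key. Without it, Cassandra would have to scan every partition to find the matching albums.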

# Filtering using artist name and album name

In [25]:
query = "select * from music_library WHERE ALBUM_NAME='Let it Be' and ARTIST_NAME='The Beatles'"
try:
    rows = session.execute(query)
    # Iterate inside the try block so a failed query never leaves rows undefined
    for row in rows:
        print(row.artist_name, row.album_name, row.city, row.year)
except Exception as e:
    print(e)

The Beatles Let it Be Liverpool 1970


This works because the query restricts the partition key (artist name) and the first clustering column (album name).

# Filtering using year along with artist name

In [26]:
query = "select * from music_library WHERE ARTIST_NAME='The Beatles' and YEAR=1970"
try:
    rows = session.execute(query)
    # Iterate inside the try block so a failed query cannot reuse stale rows
    for row in rows:
        print(row.year, row.album_name, row.artist_name)
except Exception as e:
    print(e)

Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"


This doesn't work because year is not part of the primary key, so Cassandra would have to filter the rows of the partition after reading them.
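
The behaviour of the four filtering queries can be summarised with a small rule of thumb, sketched below in plain Python. It is a deliberately simplified model that assumes equality restrictions only and no secondary indexes; `needs_allow_filtering` is a hypothetical helper, not part of the driver, and Cassandra's real rules cover more cases.

```python
def needs_allow_filtering(partition_key, clustering_columns, restricted):
    """Return True if a WHERE clause restricting `restricted` columns would
    be rejected without ALLOW FILTERING (simplified model)."""
    restricted = set(restricted)
    # Every partition key column must be restricted.
    if not set(partition_key) <= restricted:
        return True
    remaining = restricted - set(partition_key)
    # Clustering columns may only be restricted in declared order, without gaps.
    for column in clustering_columns:
        if column not in remaining:
            break
        remaining.discard(column)
    # Anything left is a non-key column or a clustering column after a gap.
    return bool(remaining)


PARTITION_KEY = ["artist_name"]
CLUSTERING = ["album_name", "city"]

for restricted in (
    ["artist_name"],
    ["album_name"],
    ["artist_name", "album_name"],
    ["artist_name", "year"],
):
    print(restricted, needs_allow_filtering(PARTITION_KEY, CLUSTERING, restricted))
# ['artist_name'] False
# ['album_name'] True
# ['artist_name', 'album_name'] False
# ['artist_name', 'year'] True
```

The two `True` cases are the queries that returned the "ALLOW FILTERING" error above.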

# Dropping our created table

In [27]:
query = "drop table music_library"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)

# Closing session and cluster

In [28]:
session.shutdown()
cluster.shutdown()
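
Outside a notebook, it can be convenient to make sure the shutdown calls run even when a query raises. One possible way is a small context manager, sketched here; `open_session` is a name made up for this example, and `cluster` is assumed to be a `Cluster` object like the one created earlier.

```python
from contextlib import contextmanager


@contextmanager
def open_session(cluster, keyspace=None):
    """Connect to `cluster`, yield the session, and always shut both down."""
    session = None
    try:
        session = cluster.connect(keyspace)
        yield session
    finally:
        # Runs on normal exit and when the body of the with block raises.
        if session is not None:
            session.shutdown()
        cluster.shutdown()
```

It would be used as `with open_session(Cluster(['127.0.0.1']), 'myspace') as session:` followed by the queries.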