# Demo 3 : Data Modeling, Creating a Table with Apache Cassandra

<img src="images/cassandra-logo.png" width="250" height="250">

## Walk through the basics of Apache Cassandra:
<br><li>Creating a table <li>Inserting rows of data <li>Running a simple CQL query to validate the information.

### Use the Python driver for Apache Cassandra, `cassandra-driver`, to run the Apache Cassandra queries.
This library should be preinstalled. To install it locally in the future, you can run this command in a notebook:
`! pip install cassandra-driver`<br>

More documentation can be found here: https://datastax.github.io/python-driver/

With Anaconda, the driver can be installed with:
`conda install -c anaconda cassandra-driver`

The Anaconda package is described here: https://anaconda.org/anaconda/cassandra-driver

### Import Apache Cassandra python package

In [1]:
import cassandra

### Create a connection to the database
1. Connect to the local instance of Apache Cassandra *['127.0.0.1']*.
2. No keyspace, user, or password is passed at this point; the code below assumes a local instance that accepts unauthenticated connections.
3. Once we get back the cluster object, we call `connect()` on it to create the session that we will use to execute queries.<BR><BR>
    
*Note 1:* This block of code will be standard in all notebooks

In [2]:
from cassandra.cluster import Cluster
try: 
    cluster = Cluster(['127.0.0.1']) #If you have a locally installed Apache Cassandra instance
    session = cluster.connect()
except Exception as e:
    print(e)
 

('Unable to connect to any servers', {'127.0.0.1:9042': ConnectionRefusedError(61, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
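
The "Connection refused" output above usually means that nothing is listening on port 9042 at that address. A minimal sketch, using only the Python standard library, of checking whether the port is reachable before creating the `Cluster`. The name `port_is_open` is a hypothetical helper, not part of the driver.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 9042 is the port that appears in the error message above
print("Cassandra reachable:", port_is_open("127.0.0.1", 9042))
```

If this prints `False`, the Cassandra service is probably not running locally, and every later cell that uses `session` will fail as well.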


### Test the Connection and Error Handling Code
*Note:* The try-except block should handle the error: we are trying to do a `select *` on a table, but no keyspace has been selected and the table has not been created yet.

In [3]:
try: 
    session.execute("""select * from en_sahih""")
except Exception as e:
    print(e)
 

Error from server: code=2200 [Invalid query] message="No keyspace has been specified. USE a keyspace, or explicitly specify keyspace.tablename"


### Create a keyspace to work in
*Note:* We will ignore the replication strategy and replication factor for now. Remember, these are the settings for a one-node local instance.

In [4]:
try:
    session.execute("""
    CREATE KEYSPACE IF NOT EXISTS quran 
    WITH REPLICATION = 
    { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }"""
)

except Exception as e:
    print(e)
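
The replication settings in the statement above are written as a CQL map literal. As a rough sketch, the same statement could be generated from a keyspace name and a replication factor. `create_keyspace_cql` is a hypothetical helper, and the name check is an added assumption, included because a keyspace name cannot be passed as a bound query parameter.

```python
import re


def create_keyspace_cql(name, replication_factor=1):
    """Build a CREATE KEYSPACE statement that uses SimpleStrategy."""
    # Identifiers cannot be bound as parameters, so validate before formatting
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid keyspace name: {name!r}")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        "WITH REPLICATION = "
        f"{{ 'class' : 'SimpleStrategy', 'replication_factor' : {int(replication_factor)} }}"
    )


print(create_keyspace_cql("quran"))
```

The returned string can be passed to `session.execute` in the same way as the hand-written statement.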

### Connect to our Keyspace.<br>
*Compare this to how a new session in PostgreSQL is created.*

In [5]:
try:
    session.set_keyspace('quran')
except Exception as e:
    print(e)

### We will create an Al-Quran table. Each surah has a lot of information we could add to the Al-Quran database. We will model the English translation with the columns quran_index, surah, ayah, and text.

### But ...Stop

### We are working with Apache Cassandra, a NoSQL database. We can't model our data and create our table without more information.

### Think about what queries you will be performing on this data.

<img src="images/quran-nosql-schema.png" width="500" height="500">

<img src="images/cassandra-architecture.png">

#### We want to be able to get every Ayah translation in a particular Surah.
`select * from en_sahih WHERE surah=1`

*To do that:* <ol>
    <li> We need to be able to do a WHERE on surah.
    <li> Surah will become our partition key.
    <li> Ayah will be our clustering column, which makes each Primary Key unique.
    <li> <b>Remember there are no duplicates in Apache Cassandra: a row written with an existing Primary Key overwrites the earlier row.</b></ol>

`Table Name: English Saheeh International 
column 1: Quran Index
column 2: Surah
column 3: Ayah
column 4: Text Translation`

PRIMARY KEY(surah, ayah)
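
To make the roles of the partition key and clustering column concrete, here is a toy in-memory model of `PRIMARY KEY (surah, ayah)`. It is only an illustration of the lookup and overwrite behaviour described above, not how Cassandra stores data. `TinyTable` and the `"placeholder text"` row are hypothetical.

```python
from collections import defaultdict


class TinyTable:
    """Toy model of PRIMARY KEY (surah, ayah); illustration only."""

    def __init__(self):
        # partition key (surah) -> {clustering key (ayah) -> other columns}
        self.partitions = defaultdict(dict)

    def insert(self, quran_index, surah, ayah, text):
        # Writing an existing (surah, ayah) overwrites it: an upsert, not a duplicate
        self.partitions[surah][ayah] = (quran_index, text)

    def select_where_surah(self, surah):
        # Rows inside one partition come back ordered by the clustering column
        rows = self.partitions.get(surah, {})
        return [(rows[a][0], surah, a, rows[a][1]) for a in sorted(rows)]


table = TinyTable()
table.insert(2, 1, 2, "[All] praise is [due] to Allah, Lord of the worlds -")
table.insert(1, 1, 1, "placeholder text")
table.insert(1, 1, 1, "In the name of Allah, the Entirely Merciful, the Especially Merciful.")
table.insert(8, 2, 1, "Alif, Lam, Meem.")

for row in table.select_where_surah(1):
    print(row)
```

Surah 1 ends up with two rows, ordered by ayah, and the second write to `(1, 1)` replaces the placeholder instead of adding a duplicate.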


### Now to translate this information into a Create Table Statement. 
More information on Data Types can be found here: https://datastax.github.io/python-driver/<br>
*Note:* Again, we will go in depth with these concepts in Lesson 3.

In [6]:
# IF EXISTS avoids an error on the first run, when the table does not exist yet
query = "DROP TABLE IF EXISTS en_sahih"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)

In [7]:
query = "CREATE TABLE IF NOT EXISTS en_sahih "
query = query + "(quran_index int, surah int, ayah int, text_translation_en text, PRIMARY KEY (surah, ayah))"
try:
    session.execute(query)
except Exception as e:
    print(e)

The query should run smoothly.

### Insert rows of data 

The Quran dataset is available here:
https://github.com/langsari/quran-dataset

Insert a row of data

In [8]:
query = "INSERT INTO en_sahih (quran_index, surah, ayah, text_translation_en)"
query = query + " VALUES (%s, %s, %s, %s)"

try:
    session.execute(query, (1, 1, 1, 'In the name of Allah, the Entirely Merciful, the Especially Merciful.'))
except Exception as e:
    print(e)

Insert multiple rows of data

In [9]:
query = "INSERT INTO en_sahih (quran_index, surah, ayah, text_translation_en)"
query = query + " VALUES (%s, %s, %s, %s)"

# Remaining ayat of surah 1 as (quran_index, surah, ayah, text_translation_en)
rows_to_insert = [
    (2, 1, 2, '[All] praise is [due] to Allah, Lord of the worlds -'),
    (3, 1, 3, 'The Entirely Merciful, the Especially Merciful,'),
    (4, 1, 4, 'Sovereign of the Day of Recompense.'),
    (5, 1, 5, 'It is You we worship and You we ask for help.'),
    (6, 1, 6, 'Guide us to the straight path -'),
    (7, 1, 7, 'The path of those upon whom You have bestowed favor, not of those who have evoked [Your] anger or of those who are astray.'),
]

# Each insert has its own try/except so one failure does not stop the rest
for values in rows_to_insert:
    try:
        session.execute(query, values)
    except Exception as e:
        print(e)


Insert multiple rows of data with different Surah

In [10]:
query = "INSERT INTO en_sahih (quran_index, surah, ayah, text_translation_en)"
query = query + " VALUES (%s, %s, %s, %s)"

try:
    session.execute(query, (8, 2, 1, 'Alif, Lam, Meem.'))
except Exception as e:
    print(e)
    
try:
    session.execute(query, (9, 2, 2, 'This is the Book about which there is no doubt, a guidance for those conscious of Allah -'))
except Exception as e:
    print(e)
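
The inserts above send the same query string with different values each time. The driver also provides prepared statements through `session.prepare`, which use `?` placeholders instead of `%s`. A minimal sketch, assuming an open `session` with the `quran` keyspace already set. `insert_ayat` and `RecordingSession` are hypothetical names. The latter is only a stand-in that mimics `prepare` and `execute`, so the sketch can run without a live cluster.

```python
INSERT_CQL = (
    "INSERT INTO en_sahih (quran_index, surah, ayah, text_translation_en) "
    "VALUES (?, ?, ?, ?)"
)


def insert_ayat(session, rows):
    """Insert rows with one prepared statement; return how many succeeded."""
    prepared = session.prepare(INSERT_CQL)  # prepared once, reused for every row
    inserted = 0
    for row in rows:
        try:
            session.execute(prepared, row)
            inserted += 1
        except Exception as e:
            print(e)
    return inserted


class RecordingSession:
    """Stand-in for a driver session (assumption: mimics only prepare/execute)."""

    def __init__(self):
        self.executed = []

    def prepare(self, query):
        return query

    def execute(self, statement, parameters=None):
        self.executed.append((statement, parameters))


demo = RecordingSession()
count = insert_ayat(demo, [
    (8, 2, 1, "Alif, Lam, Meem."),
    (9, 2, 2, "This is the Book about which there is no doubt, a guidance for those conscious of Allah -"),
])
print(count, "rows sent")
```

Against a real cluster, the `session` created earlier in the notebook would be passed in place of `demo`.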

### Validate your data was inserted into the table.
*Note:* The for loop is used for printing the results. If executing queries in cqlsh, this would not be required.

*Note:* Depending on the version of Apache Cassandra you have installed, this might throw an "ALLOW FILTERING" error instead of printing the rows that we just inserted. This is to be expected, as a query that reads the whole table should not be performed on large datasets. We are only doing this for the sake of the demo.

In [11]:
query = 'SELECT * FROM en_sahih'
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only iterate when the query succeeded; otherwise rows is undefined or stale
    for row in rows:
        print(row.quran_index, row.surah, row.ayah, row.text_translation_en)

1 1 1 In the name of Allah, the Entirely Merciful, the Especially Merciful.
2 1 2 [All] praise is [due] to Allah, Lord of the worlds -
3 1 3 The Entirely Merciful, the Especially Merciful,
4 1 4 Sovereign of the Day of Recompense.
5 1 5 It is You we worship and You we ask for help.
6 1 6 Guide us to the straight path -
7 1 7 The path of those upon whom You have bestowed favor, not of those who have evoked [Your] anger or of those who are astray.
8 2 1 Alif, Lam, Meem.
9 2 2 This is the Book about which there is no doubt, a guidance for those conscious of Allah -


### Validate the Data Model with the original query.

`select * from en_sahih WHERE surah=1`

In [12]:
query = "select * from en_sahih WHERE surah=1"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only iterate when the query succeeded; otherwise rows is undefined or stale
    for row in rows:
        print(row.quran_index, row.surah, row.ayah, row.text_translation_en)

1 1 1 In the name of Allah, the Entirely Merciful, the Especially Merciful.
2 1 2 [All] praise is [due] to Allah, Lord of the worlds -
3 1 3 The Entirely Merciful, the Especially Merciful,
4 1 4 Sovereign of the Day of Recompense.
5 1 5 It is You we worship and You we ask for help.
6 1 6 Guide us to the straight path -
7 1 7 The path of those upon whom You have bestowed favor, not of those who have evoked [Your] anger or of those who are astray.


In [13]:
query = "select * from en_sahih WHERE surah=1 AND ayah=1"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Only iterate when the query succeeded; otherwise rows is undefined or stale
    for row in rows:
        print(row.quran_index, row.surah, row.ayah, row.text_translation_en)

1 1 1 In the name of Allah, the Entirely Merciful, the Especially Merciful.
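
Both queries above restrict the partition key first and then, optionally, the clustering column. That rule can be sketched as a small check. This is a deliberately simplified model (equality restrictions only, no range predicates, `IN` clauses, or secondary indexes), and `served_by_primary_key` is a hypothetical helper, not a driver function.

```python
def served_by_primary_key(partition_key, clustering_cols, where_cols):
    """Return True if equality filters on where_cols follow the primary key layout."""
    where = set(where_cols)
    if not where:
        return True  # no WHERE clause: a full-table scan, valid but costly
    # Every partition key column must be restricted
    if not set(partition_key) <= where:
        return False
    remaining = where - set(partition_key)
    # Clustering columns must be restricted in declared order, with no gaps
    for col in clustering_cols:
        if col in remaining:
            remaining.discard(col)
        else:
            break
    # Anything left is a non-key column or a skipped clustering column
    return not remaining


pk, ck = ["surah"], ["ayah"]
print(served_by_primary_key(pk, ck, ["surah"]))          # WHERE surah=1
print(served_by_primary_key(pk, ck, ["surah", "ayah"]))  # WHERE surah=1 AND ayah=1
print(served_by_primary_key(pk, ck, ["ayah"]))           # WHERE ayah=1
```

For this table, the first two calls return `True` and the third returns `False`: filtering on `ayah` alone skips the partition key, which is the kind of query that triggers an "ALLOW FILTERING" error.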


### Drop the table to avoid duplicates and clean up. 

In [14]:
query = "drop table en_sahih"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
    

### Close the session and cluster connection

In [15]:
session.shutdown()
cluster.shutdown()
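
If an earlier cell raises an exception, the two shutdown calls above may never run. One possible pattern, sketched here with the standard library's `contextlib`, wraps the connect and shutdown steps so that cleanup always happens. `open_session` is a hypothetical helper. It relies only on the `connect`, `set_keyspace`, and `shutdown` calls already used in this notebook.

```python
from contextlib import contextmanager


@contextmanager
def open_session(cluster, keyspace=None):
    """Yield a session, then always shut down the session and the cluster."""
    session = None
    try:
        session = cluster.connect()
        if keyspace is not None:
            session.set_keyspace(keyspace)
        yield session
    finally:
        # Runs on normal exit and when an exception is raised
        if session is not None:
            session.shutdown()
        cluster.shutdown()


# Usage with the driver (requires a running Cassandra instance):
# from cassandra.cluster import Cluster
# with open_session(Cluster(['127.0.0.1']), 'quran') as session:
#     session.execute("select * from en_sahih WHERE surah=1")
```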