# Setting up access to Amazon Keyspaces in a Jupyter notebook

## Prerequisites
Two files are required for accessing Amazon Keyspaces from a Jupyter notebook:

1. Starfield digital certificate
2. AWS credentials for Amazon Keyspaces

More details about the setup can be found in [Using a Cassandra Python client](https://docs.aws.amazon.com/keyspaces/latest/devguide/using_python_driver.html#python_SigV4).

### Starfield digital certificate
Download the digital certificate into the notebook's working directory with:

In [None]:
!curl https://certs.secureserver.net/repository/sf-class2-root.crt -O
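
As a quick sanity check after the download, the sketch below tests whether a file contains a PEM certificate block, which is the format `ssl_context.load_verify_locations` expects for a CA file. The helper name `looks_like_pem_certificate` is hypothetical, and the check only looks at the PEM markers; it does not validate the certificate itself.

```python
from pathlib import Path


def looks_like_pem_certificate(path):
    """Return True if the file at `path` contains PEM certificate markers."""
    p = Path(path)
    if not p.is_file():
        return False
    text = p.read_text(errors="replace")
    return (
        "-----BEGIN CERTIFICATE-----" in text
        and "-----END CERTIFICATE-----" in text
    )


# Assumes the certificate was saved in the current working directory.
print(looks_like_pem_certificate("sf-class2-root.crt"))
```

If this prints `False`, the download probably failed or saved an error page instead of the certificate.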

### AWS credentials
The AWS credentials file can be downloaded via Canvas. Place this `credentials` file in the `~/.aws/` folder on your local machine or your EC2 instance, so that the AWS SDK knows where to look for the credentials.

**MAJOR NOTE**: **Do not** store these credentials anywhere that is publicly accessible (GitHub, a public S3 bucket, etc.). That is the primary reason why this file is only available on Canvas.
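
To confirm the file is in place without printing any secrets, a minimal sketch like the one below lists only the profile names found in the credentials file. It assumes the standard INI layout of an AWS credentials file (for example a `[default]` section); the helper name `list_credential_profiles` is hypothetical.

```python
import configparser
from pathlib import Path


def list_credential_profiles(path=Path.home() / ".aws" / "credentials"):
    """Return the profile names in an AWS credentials file, never the keys."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is ignored and yields no sections
    return parser.sections()


# An empty list means the file is missing or has no profiles.
print(list_credential_profiles())
```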

## Sample code for connection

1. Install `cassandra-sigv4` via the following command:

In [None]:
%pip install cassandra-sigv4

2. Set up a `boto3` session (`boto3` is the Python way of interacting with AWS) and a Cassandra cluster.

In [None]:
from cassandra.cluster import Cluster
from ssl import SSLContext, PROTOCOL_TLSv1_2, CERT_REQUIRED
from cassandra_sigv4.auth import SigV4AuthProvider
import boto3

# ssl setup
ssl_context = SSLContext(PROTOCOL_TLSv1_2)
ssl_context.load_verify_locations('sf-class2-root.crt')  # adjust the path if the certificate is stored elsewhere
ssl_context.verify_mode = CERT_REQUIRED

# boto3 session setup
boto_session = boto3.Session(region_name="us-east-2")  # these AWS credentials are specific to the `us-east-2` region

In [None]:
# authorization setup with SigV4
auth_provider = SigV4AuthProvider(boto_session)

In [None]:
# cluster setup
cluster = Cluster(['cassandra.us-east-2.amazonaws.com'],
                  ssl_context=ssl_context,
                  auth_provider=auth_provider,
                  port=9142)  # Amazon Keyspaces accepts TLS connections on port 9142

## Working with Cassandra (Amazon Keyspaces)

In [None]:
# establishing connection to Keyspace
session = cluster.connect()

In [None]:
# Run any CQL queries between cluster.connect() and session.shutdown()

# For example, show all keyspaces created
r = session.execute('''
    SELECT * FROM system_schema.keyspaces;
    ''')
print(r.current_rows)
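
The printed rows can be hard to read. Assuming the driver's default row factory, which returns each row as a named tuple, the rows can be converted to plain dictionaries as sketched below. The `Row` type and sample values here are stand-ins so the snippet runs without a live connection; `rows_to_dicts` is a hypothetical helper.

```python
from collections import namedtuple


def rows_to_dicts(rows):
    """Convert named-tuple rows into a list of plain dictionaries."""
    return [row._asdict() for row in rows]


# Stand-in for rows returned by session.execute(...); not real query output.
Row = namedtuple("Row", ["keyspace_name", "durable_writes"])
sample_rows = [Row("system_schema", True), Row("de300_demo", True)]

for record in rows_to_dicts(sample_rows):
    print(record["keyspace_name"])
```

With a live session, the same helper would be called as `rows_to_dicts(r.current_rows)`.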

In [None]:
# For example, create a keyspace for HW2
r = session.execute('''
    CREATE KEYSPACE IF NOT EXISTS de300_demo 
    WITH replication = {'class': 'SingleRegionStrategy'};
    ''')
print(r.current_rows)

## Exercises

Let's first create a table within the keyspace `de300_demo`. Note that Amazon Keyspaces requires the `LOCAL_QUORUM` consistency level for writes, so you must explicitly set an `ExecutionProfile` with `LOCAL_QUORUM` when creating your Cassandra `Cluster` in Python. The next cell therefore recreates the cluster with that profile and opens a new session.

In [None]:
from cassandra.cluster import ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra import ConsistencyLevel


# Define execution profile with LOCAL_QUORUM
execution_profile = ExecutionProfile(
    consistency_level=ConsistencyLevel.LOCAL_QUORUM
)

# Cluster setup with correct profile
cluster = Cluster(
    ['cassandra.us-east-2.amazonaws.com'],
    ssl_context=ssl_context,
    auth_provider=auth_provider,
    port=9142,
    execution_profiles={EXEC_PROFILE_DEFAULT: execution_profile}
)

# establishing connection to Keyspace
session = cluster.connect()
session.set_keyspace('de300_demo')  # Replace with your keyspace

Create a new table named `github`.

In [None]:
session.execute("""
CREATE TABLE IF NOT EXISTS github (
    id UUID PRIMARY KEY,
    name TEXT,
    username TEXT
)
""")

Insert a row into the table (please replace the values below with your name and GitHub username).

In [None]:
import uuid

session.execute("""
    INSERT INTO github (id, name, username)
    VALUES (%s, %s, %s)
""", (uuid.uuid4(), "Your_Name", "Your_GitHub_Username"))

# Replace "Your_Name" with your name
# Replace "Your_GitHub_Username" with your real GitHub username
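
Because the `id` is generated inside the call, it is lost once the cell finishes. One possible variation, sketched below, wraps the insert in a small helper that returns the generated UUID so the row can be looked up later by its primary key. The helper name `insert_github_row` and the constant `INSERT_CQL` are hypothetical; `session` is assumed to be the connected session from the cells above.

```python
import uuid

INSERT_CQL = "INSERT INTO github (id, name, username) VALUES (%s, %s, %s)"


def insert_github_row(session, name, username):
    """Insert one row into `github` and return the UUID used as its id."""
    row_id = uuid.uuid4()
    session.execute(INSERT_CQL, (row_id, name, username))
    return row_id
```

A call such as `row_id = insert_github_row(session, "Your_Name", "Your_GitHub_Username")` could then be followed by `session.execute("SELECT * FROM github WHERE id = %s", (row_id,))` to read the row back.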

Export the table as a CSV file.

In [None]:
import csv

rows = session.execute("SELECT * FROM github")
with open("github.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "username"])
    for row in rows:
        writer.writerow([row.id, row.name, row.username])
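
Before uploading, it can help to read the file back and check that the header and rows look right. The sketch below uses only the standard library; `read_csv_rows` is a hypothetical helper, and it assumes the file was written with a header row as in the export cell.

```python
import csv


def read_csv_rows(path):
    """Read a CSV file with a header row into a list of dictionaries."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```

For example, `read_csv_rows("github.csv")` should return one dictionary per inserted row, each with the keys `id`, `name`, and `username`.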

Create a folder named "Lab4" in your GitHub repository, and upload the CSV file there.

## Shutdown your Cassandra connection

In [None]:
session.shutdown()
cluster.shutdown()  # also release the cluster's connections
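
If a query raises an exception, the shutdown cell may never run. One way to guard against that, sketched here under the assumption that `cluster` is a `Cluster` object like the one built earlier, is a small context manager that always shuts the cluster down. The name `open_session` is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def open_session(cluster, keyspace=None):
    """Yield a connected session and shut the cluster down afterwards."""
    session = cluster.connect(keyspace)
    try:
        yield session
    finally:
        # Shutting down the cluster also closes the sessions it created.
        cluster.shutdown()
```

Usage would look like `with open_session(cluster, 'de300_demo') as session:` followed by the queries in the indented block.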