## Model Data using Cassandra


### Submit only this notebook, and make sure every cell has been executed and its output is clearly displayed.

### The aim of the project is to answer the three queries given below.

### Introduction

There is a music streaming app called SoundCloud that has been collecting data on songs and user activity. Its team wants to analyze this data, in particular to understand which songs users are listening to. The data is currently stored in a CSV file rather than in a NoSQL database, which makes it difficult to query. Our task is to create a NoSQL (Apache Cassandra) database that supports this analysis.

#### Import Packages 

In [16]:
import csv
import json

import cassandra
import pandas as pd
from astrapy import DataAPIClient


## The image below is a screenshot of what the data in event_data.csv looks like

<img src="event_data_image.jpg">

#### Creating a Cluster

In [3]:
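# Task: Make a connection to a Cassandra instance and create a session to begin executing queries.
# This notebook connects to a DataStax Astra DB database (managed Cassandra) through the
# astrapy Data API client, rather than to a local node at 127.0.0.1.

# Initialize the client. The application token is read from an environment variable
# (ASTRA_DB_APPLICATION_TOKEN is an assumed name) instead of being hard-coded, so the
# secret does not leak when the notebook is shared.
import os

token = os.environ['ASTRA_DB_APPLICATION_TOKEN']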

client = DataAPIClient(token)
db = client.get_database_by_api_endpoint(
  "https://e3b3a839-6ac8-4ebf-904d-38dba1f973d4-us-east-2.apps.astra.datastax.com"
)

print(f"Connected to Astra DB: {db.list_collection_names()}")

Connected to Astra DB: []


#### Create & Set Keyspace

In [30]:
db_admin = client.get_admin()
# Collect information about the databases created on DataStax Astra
lis_db = list(db_admin.list_databases())
print(len(lis_db))  # number of databases visible to this token
print(f"Database ID: {lis_db[0].id}")  # ID of the first database in DataStax Astra

db_database_admin = db_admin.get_database_admin(
    "https://e3b3a839-6ac8-4ebf-904d-38dba1f973d4-us-east-2.apps.astra.datastax.com"
)
Namespace = 'soundcloud'
# Create the keyspace only if it does not exist yet
if Namespace not in db_database_admin.list_keyspaces():
    db_database_admin.create_keyspace(Namespace)  # creates a keyspace named 'soundcloud'
    print(f"Keyspaces created: {db_database_admin.list_keyspaces()}")
else:
    print(f"Keyspaces: {db_database_admin.list_keyspaces()}")

1
Database ID: e3b3a839-6ac8-4ebf-904d-38dba1f973d4
Keyspaces created: ['default_keyspace', 'soundcloud']
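The INSERT and SELECT cells below call `session.execute`, which belongs to the DataStax Python driver (the `cassandra` package imported above), whereas the cells above use astrapy's Data API and never create a `session`. The sketch below shows one way to open such a session against a local node at 127.0.0.1, as the original task describes, and then create and set the keyspace. The function names are hypothetical, and SimpleStrategy with a replication factor of 1 is an assumption that only suits a single-node development cluster.

```python
def keyspace_cql(keyspace, replication_factor=1):
    """Build a CREATE KEYSPACE statement that uses SimpleStrategy replication."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}}"
    )


def connect_local(keyspace="soundcloud", contact_points=("127.0.0.1",)):
    """Connect to local Cassandra node(s), create the keyspace if needed and switch to it.

    Returns (cluster, session) so both can be shut down at the end of the notebook.
    """
    # Imported lazily so this cell can be defined without the driver installed
    from cassandra.cluster import Cluster  # DataStax Python driver (cassandra-driver)

    cluster = Cluster(list(contact_points))
    session = cluster.connect()
    session.execute(keyspace_cql(keyspace))
    session.set_keyspace(keyspace)  # later CQL statements can use bare table names
    return cluster, session


# In the notebook, with a local node running:
# cluster, session = connect_local()
```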


In [5]:
df = pd.read_csv('event_data.csv')
df.head() 

| Unnamed: 0 | artist_name | fname | gender | item_in_session_number | lname | length | level | location | session_number | song_title | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Pavement | Sylvie | F | 0 | Cruz | 99.16036 | free | Washington-Arlington-Alexandria, DC-VA-MD-WV | 345 | Mercy:The Laundromat | 10 |
| 1 | Barry Tuckwell/Academy of St Martin-in-the-Fie... | Celeste | F | 1 | Williams | 277.15873 | free | Klamath Falls, OR | 438 | Horn Concerto No. 4 in E flat K495: II. Romanc... | 53 |
| 2 | Gary Allan | Celeste | F | 2 | Williams | 211.22567 | free | Klamath Falls, OR | 438 | Nothing On But The Radio | 53 |
| 3 | Charttraxx Karaoke | Celeste | F | 3 | Williams | 225.17506 | free | Klamath Falls, OR | 438 | Fireflies | 53 |
| 4 | The Libertines | Jacqueline | F | 1 | Lynch | 179.53914 | paid | Atlanta-Sandy Springs-Roswell, GA | 389 | The Good Old Days | 29 |
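The `Unnamed: 0` column shows that event_data.csv starts with an unnamed index column, so positions in a `csv.reader` row are shifted by one: gender is `row[3]` and location is `row[8]`. The sketch below derives the positions from the header instead of counting by hand. `column_positions` is a hypothetical helper, and the `HEADER` string is reconstructed from the `df.head()` columns above on the assumption that the first header field is empty.

```python
import csv
import io

# Header reconstructed from df.head(); the leading empty field is the unnamed index column
HEADER = ",artist_name,fname,gender,item_in_session_number,lname,length,level,location,session_number,song_title,user_id"


def column_positions(header_line):
    """Map each named CSV column to its position in a csv.reader row."""
    names = next(csv.reader(io.StringIO(header_line)))
    # Skip the empty name of the index column
    return {name: i for i, name in enumerate(names) if name}


COL = column_positions(HEADER)

# With the real file:
# with open('event_data.csv', encoding='utf8') as f:
#     COL = column_positions(f.readline())
```

An INSERT loop can then use `row[COL['session_number']]`, which keeps working if the column order ever changes.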


In [29]:
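# Teardown: drop the keyspace together with every table in it.
# Run this cell only after the queries below are finished.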
if Namespace in db_database_admin.list_keyspaces():
    db_database_admin.drop_keyspace(Namespace)
    print(f'{Namespace} dropped successfully\nKeyspaces: {db_database_admin.list_keyspaces()}')
else:
    print(f'{Namespace} does not exist in Keyspaces: {db_database_admin.list_keyspaces()}')

SoundCloud dropped successfully
Keyspaces: ['default_keyspace']


## List of Queries 

### 1. Find the artist_name, song_title and length of the song in the SoundCloud app history that was heard during session_number = 338 and item_in_session_number = 4


### 2. Find the artist_name, song_title (sorted by item_in_session_number) and name (fname and lname) of the user for user_id = 10 and session_number = 182
    

### 3. Find the name (fname and lname) of every user in the SoundCloud app history who listened to the song_title 'All Hands Against His Own'




### Query1 Table1: How should we model this data? Think about what our Primary Key, Partition Key and Clustering Key should be

In [None]:
## Task: Query 1: Find the artist_name, song_title and length of the song in the SoundCloud app history
## that was heard during session_number = 338 and item_in_session_number = 4
## make use of the CREATE TABLE command
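Query 1 filters on session_number and item_in_session_number, so both columns have to be part of the primary key. A minimal sketch, assuming a cassandra-driver style `session` and a table name (`song_by_session`) chosen here for illustration: session_number is the partition key and item_in_session_number the clustering column. This assumes each (session_number, item_in_session_number) pair identifies a single song play; if it did not, later rows would overwrite earlier ones, because a Cassandra INSERT with an existing primary key is an upsert.

```python
# Query 1 filters on session_number AND item_in_session_number.
# Partition key: session_number; clustering column: item_in_session_number.
# The table name song_by_session is a choice made for this sketch.
CREATE_SONG_BY_SESSION = (
    "CREATE TABLE IF NOT EXISTS song_by_session ("
    "session_number int, "
    "item_in_session_number int, "
    "artist_name text, "
    "song_title text, "
    "length float, "
    "PRIMARY KEY (session_number, item_in_session_number))"
)


def create_song_by_session(session):
    """Create the Query 1 table through any object with an execute(statement) method."""
    session.execute(CREATE_SONG_BY_SESSION)


# In the notebook (needs an open cassandra-driver session):
# create_song_by_session(session)
```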
           

### Let's insert our data into the table

In [None]:
# Load every row of the CSV into the Query 1 table.
# song_by_session is a table name chosen for this notebook; use the name of the table you created above.
file_name = 'event_data.csv'

# The %s placeholders are filled, in order, by the tuple passed to session.execute
query = "INSERT INTO song_by_session (session_number, item_in_session_number, artist_name, song_title, length) "
query += "VALUES (%s, %s, %s, %s, %s)"

with open(file_name, encoding='utf8', newline='') as f:
    csv_reader = csv.reader(f)
    next(csv_reader)  # skip the header in the csv file
    for row in csv_reader:
        # The CSV starts with an unnamed index column ('Unnamed: 0' in df.head()), so
        # artist_name is row[1], item_in_session_number row[4], length row[6],
        # session_number row[9] and song_title row[10].
        session.execute(query, (int(row[9]), int(row[4]), row[1], row[10], float(row[6])))

### Validate our Data Model using a SELECT

In [None]:
## Task: Make use of the SELECT statement and for loop to check if your query works and display the results
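A minimal sketch of the Query 1 check, assuming the hypothetical `song_by_session` table and a cassandra-driver `session`. Because the WHERE clause restricts the full primary key, Cassandra can serve it from a single partition without `ALLOW FILTERING`. The driver's default row factory returns named tuples, so columns are read as attributes.

```python
# Query 1 lookup against the hypothetical song_by_session table.
SELECT_QUERY1 = (
    "SELECT artist_name, song_title, length FROM song_by_session "
    "WHERE session_number = %s AND item_in_session_number = %s"
)


def run_query1(session, session_number=338, item_in_session_number=4):
    """Run Query 1, print every row, and return the rows as a list."""
    rows = list(session.execute(SELECT_QUERY1, (session_number, item_in_session_number)))
    for row in rows:
        # Named-tuple rows expose each selected column as an attribute
        print(row.artist_name, row.song_title, row.length)
    return rows


# In the notebook:
# run_query1(session)
```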

### Query2 Table2: How should we model this data? Think about what our Primary Key, Partition Key and Clustering Key should be

In [None]:
## Task: Query 2: Find the artist_name, song_title (sorted by item_in_session_number) and 
## name(fname and lname) of the user for user_id = 10, session_number = 182 
## make use of create table command                   
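Query 2 filters on user_id and session_number and asks for the songs sorted by item_in_session_number. A minimal sketch, assuming a cassandra-driver style `session` and a table name (`song_by_user_session`) chosen here for illustration: (user_id, session_number) forms a composite partition key, and item_in_session_number is the clustering column. Rows inside a partition are stored in clustering order, so the results come back sorted without an ORDER BY.

```python
# Query 2 filters on user_id AND session_number and sorts by item_in_session_number.
# Composite partition key: (user_id, session_number); clustering column: item_in_session_number.
# The table name song_by_user_session is a choice made for this sketch.
CREATE_SONG_BY_USER_SESSION = (
    "CREATE TABLE IF NOT EXISTS song_by_user_session ("
    "user_id int, "
    "session_number int, "
    "item_in_session_number int, "
    "artist_name text, "
    "song_title text, "
    "fname text, "
    "lname text, "
    "PRIMARY KEY ((user_id, session_number), item_in_session_number))"
)


def create_song_by_user_session(session):
    """Create the Query 2 table through any object with an execute(statement) method."""
    session.execute(CREATE_SONG_BY_USER_SESSION)


# In the notebook (needs an open cassandra-driver session):
# create_song_by_user_session(session)
```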

### Let's insert our data into the table

In [None]:
# Load every row of the CSV into the Query 2 table.
# song_by_user_session is a table name chosen for this notebook; use the name of the table you created above.
file_name = 'event_data.csv'

# The %s placeholders are filled, in order, by the tuple passed to session.execute
query = "INSERT INTO song_by_user_session (user_id, session_number, item_in_session_number, artist_name, song_title, fname, lname) "
query += "VALUES (%s, %s, %s, %s, %s, %s, %s)"

with open(file_name, encoding='utf8', newline='') as f:
    csv_reader = csv.reader(f)
    next(csv_reader)  # skip the header in the csv file
    for row in csv_reader:
        # Positions after the unnamed index column: user_id row[11], session_number row[9],
        # item_in_session_number row[4], artist_name row[1], song_title row[10], fname row[2], lname row[5]
        session.execute(query, (int(row[11]), int(row[9]), int(row[4]), row[1], row[10], row[2], row[5]))

### Validate our Data Model using a SELECT

In [None]:
## Task: Make use of the SELECT statement and for loop to check if your query works and display the results
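A minimal sketch of the Query 2 check, assuming the hypothetical `song_by_user_session` table and a cassandra-driver `session`. The WHERE clause names the whole composite partition key, and the rows arrive ordered by the clustering column item_in_session_number, which is also selected here so the ordering is visible in the output.

```python
# Query 2 lookup against the hypothetical song_by_user_session table.
SELECT_QUERY2 = (
    "SELECT item_in_session_number, artist_name, song_title, fname, lname "
    "FROM song_by_user_session WHERE user_id = %s AND session_number = %s"
)


def run_query2(session, user_id=10, session_number=182):
    """Run Query 2, print every row (already sorted by item), and return the rows."""
    rows = list(session.execute(SELECT_QUERY2, (user_id, session_number)))
    for row in rows:
        print(row.item_in_session_number, row.artist_name, row.song_title, row.fname, row.lname)
    return rows


# In the notebook:
# run_query2(session)
```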

### Query3 Table3: How should we model this data? Think about what our Primary Key, Partition Key and Clustering Key should be

In [None]:
## Task: Query 3: Find the name (first and last name) of every user in the SoundCloud app history who listened
## to the song_title 'All Hands Against His Own'
## make use of create table command                   
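Query 3 filters on song_title only, so song_title becomes the partition key. A minimal sketch, assuming a cassandra-driver style `session` and a table name (`user_by_song`) chosen here for illustration: user_id is added as the clustering column so that each listener is stored once per song. A user who played the song several times simply overwrites the same row, which is what "every user who listened" needs.

```python
# Query 3 filters on song_title only.
# Partition key: song_title; clustering column: user_id (one row per listener per song).
# The table name user_by_song is a choice made for this sketch.
CREATE_USER_BY_SONG = (
    "CREATE TABLE IF NOT EXISTS user_by_song ("
    "song_title text, "
    "user_id int, "
    "fname text, "
    "lname text, "
    "PRIMARY KEY (song_title, user_id))"
)


def create_user_by_song(session):
    """Create the Query 3 table through any object with an execute(statement) method."""
    session.execute(CREATE_USER_BY_SONG)


# In the notebook (needs an open cassandra-driver session):
# create_user_by_song(session)
```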

### Let's insert our data into the table

In [None]:
# Load every row of the CSV into the Query 3 table.
# user_by_song is a table name chosen for this notebook; use the name of the table you created above.
file_name = 'event_data.csv'

# The %s placeholders are filled, in order, by the tuple passed to session.execute
query = "INSERT INTO user_by_song (song_title, user_id, fname, lname) VALUES (%s, %s, %s, %s)"

with open(file_name, encoding='utf8', newline='') as f:
    csv_reader = csv.reader(f)
    next(csv_reader)  # skip the header in the csv file
    for row in csv_reader:
        # Positions after the unnamed index column: song_title row[10], user_id row[11],
        # fname row[2], lname row[5]
        session.execute(query, (row[10], int(row[11]), row[2], row[5]))

### Validate our Data Model using a SELECT

In [None]:
## Task: Make use of the SELECT statement and for loop to check if your query works and display the results
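A minimal sketch of the Query 3 check, assuming the hypothetical `user_by_song` table and a cassandra-driver `session`. Note the one-element tuple `(song_title,)`: the driver expects a sequence of parameters, and a bare string would not be treated as a single value.

```python
# Query 3 lookup against the hypothetical user_by_song table.
SELECT_QUERY3 = "SELECT fname, lname FROM user_by_song WHERE song_title = %s"


def run_query3(session, song_title="All Hands Against His Own"):
    """Run Query 3, print each listener's name, and return the rows."""
    rows = list(session.execute(SELECT_QUERY3, (song_title,)))
    for row in rows:
        print(row.fname, row.lname)
    return rows


# In the notebook:
# run_query3(session)
```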

### Drop the tables before closing the session
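A minimal teardown sketch, assuming the three hypothetical table names used for the queries in this notebook and a cassandra-driver `session`. `IF EXISTS` makes the cell safe to re-run after a table has already been dropped.

```python
# Hypothetical table names, one per query.
TABLES = ("song_by_session", "song_by_user_session", "user_by_song")


def drop_tables(session, tables=TABLES):
    """Drop each table; IF EXISTS means a missing table is not an error."""
    for table in tables:
        session.execute(f"DROP TABLE IF EXISTS {table}")


# In the notebook:
# drop_tables(session)
```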

### Close the session and cluster connection
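A minimal sketch for the cassandra-driver `session` and `cluster` used by the INSERT and SELECT cells; `close_connection` is a hypothetical helper name. Both objects provide a `shutdown()` method. The session is closed first and the cluster second, since the cluster owns the underlying connections.

```python
def close_connection(session, cluster):
    """Shut down a cassandra-driver Session and then the Cluster that created it."""
    session.shutdown()
    cluster.shutdown()


# In the notebook:
# close_connection(session, cluster)
```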