# Cassandra & CQL Guide

We use Python 3.8 with the DataStax Python driver, and run the code in this Jupyter notebook for convenience.

![title](img/Cassandra_Tutorials_02.png)

![title](img/Cassandra_Tutorials_03.png)

![title](img/Cassandra_Tutorials_04.png)

![title](img/Cassandra_Tutorials_05.png)

### Initialization

Run `cassandra` in cmd or a terminal to start a single node.

In [1]:
from cassandra.cluster import Cluster

cluster = Cluster(port=9042)
session = cluster.connect()

In [2]:
session.execute("select release_version from system.local").all()

[Row(release_version='3.0.27')]

In [3]:
# Drop everything in case you run this notebook again
session.execute("DROP MATERIALIZED VIEW IF EXISTS snu.student_by_name")
session.execute("DROP TABLE IF EXISTS snu.student")

session.execute("DROP TABLE IF EXISTS snu.BKMS2")
session.execute("DROP TABLE IF EXISTS snu.lecture_student")

session.execute("DROP KEYSPACE IF EXISTS snu")

<cassandra.cluster.ResultSet at 0x1404e936fd0>

In [4]:
session.execute("SELECT keyspace_name FROM system_schema.keyspaces").all()

[Row(keyspace_name='system_auth'),
 Row(keyspace_name='system_schema'),
 Row(keyspace_name='system_distributed'),
 Row(keyspace_name='system'),
 Row(keyspace_name='system_traces')]

In [5]:
!nodetool status

Datacenter: datacenter1
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  227.52 KB  256          100.0%            dbc99623-ab6e-4807-9b9d-68f141db7221  rack1



### Basic queries

#### Create Keyspace

In [6]:
# Keyspace = Database
session.execute(
'''
CREATE KEYSPACE snu 
    WITH REPLICATION = { 
        'class' : 'SimpleStrategy', 
        'replication_factor' : 1
        } 
'''
)

<cassandra.cluster.ResultSet at 0x1404e9380a0>

#### Create Table

In [7]:
# Table: year is the partition key, id is the clustering key
session.execute(
'''                
CREATE TABLE snu.student ( 
    year int,
    id int,
    name text, 
    dept text, 
    email text,
    PRIMARY KEY (year, id)
) WITH CLUSTERING ORDER BY (id ASC)
'''
)

<cassandra.cluster.ResultSet at 0x1404e5bf9a0>

In [8]:
response = session.execute("SELECT keyspace_name FROM system_schema.keyspaces")
response.all()

[Row(keyspace_name='system_auth'),
 Row(keyspace_name='system_schema'),
 Row(keyspace_name='snu'),
 Row(keyspace_name='system_distributed'),
 Row(keyspace_name='system'),
 Row(keyspace_name='system_traces')]

In [9]:
session.execute("SELECT table_name FROM system_schema.tables WHERE keyspace_name = 'snu'").all()

[Row(table_name='student')]

In [10]:
session.execute(
'''
SELECT * FROM system_schema.columns
WHERE keyspace_name = 'snu' AND table_name = 'student';
'''
).all()

[Row(keyspace_name='snu', table_name='student', column_name='dept', clustering_order='none', column_name_bytes=b'dept', kind='regular', position=-1, type='text'),
 Row(keyspace_name='snu', table_name='student', column_name='email', clustering_order='none', column_name_bytes=b'email', kind='regular', position=-1, type='text'),
 Row(keyspace_name='snu', table_name='student', column_name='id', clustering_order='asc', column_name_bytes=b'id', kind='clustering', position=0, type='int'),
 Row(keyspace_name='snu', table_name='student', column_name='name', clustering_order='none', column_name_bytes=b'name', kind='regular', position=-1, type='text'),
 Row(keyspace_name='snu', table_name='student', column_name='year', clustering_order='none', column_name_bytes=b'year', kind='partition_key', position=0, type='int')]

#### Insert records into the table

In [11]:
# Every INSERT must supply the full primary key (year, id)
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2019, 55555, 'Jun Ha Chun', 'ECE', 'nikriz@snu.ac.kr');
'''
)
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2020, 44444, 'Ji Soo Jang', 'DS', 'simonjisu@snu.ac.kr');
'''
)
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2020, 11111, 'Kim Kim Kim', 'ECE', 'kim@snu.ac.kr');
'''
)
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2019, 22222, 'Lee Lee Lee', 'DS', 'lee@snu.ac.kr');
'''
)
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2020, 33333, 'Park Park Park', 'DS', 'park@snu.ac.kr');
'''
)

<cassandra.cluster.ResultSet at 0x1404e960670>

In [12]:
session.execute("SELECT * FROM snu.student").all()

[Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
 Row(year=2019, id=55555, dept='ECE', email='nikriz@snu.ac.kr', name='Jun Ha Chun'),
 Row(year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr', name='Kim Kim Kim'),
 Row(year=2020, id=33333, dept='DS', email='park@snu.ac.kr', name='Park Park Park'),
 Row(year=2020, id=44444, dept='DS', email='simonjisu@snu.ac.kr', name='Ji Soo Jang')]
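The five INSERT statements above differ only in their values, so they could also be written with a single prepared statement. The following is a minimal sketch, assuming `session` is the one created in In [1]; the helper name `insert_students` is hypothetical and not part of the driver.

```python
from typing import Iterable, Tuple

# (year, id, name, dept, email)
StudentRow = Tuple[int, int, str, str, str]

INSERT_STUDENT_CQL = (
    "INSERT INTO snu.student (year, id, name, dept, email) "
    "VALUES (?, ?, ?, ?, ?)"
)


def insert_students(session, students: Iterable[StudentRow]) -> int:
    """Insert every row with one prepared statement and return the row count."""
    # prepare() sends the CQL text to the server once;
    # each execute() afterwards only ships the bound values.
    prepared = session.prepare(INSERT_STUDENT_CQL)
    count = 0
    for student in students:
        session.execute(prepared, student)
        count += 1
    return count


students = [
    (2019, 55555, "Jun Ha Chun", "ECE", "nikriz@snu.ac.kr"),
    (2020, 44444, "Ji Soo Jang", "DS", "simonjisu@snu.ac.kr"),
    (2020, 11111, "Kim Kim Kim", "ECE", "kim@snu.ac.kr"),
    (2019, 22222, "Lee Lee Lee", "DS", "lee@snu.ac.kr"),
    (2020, 33333, "Park Park Park", "DS", "park@snu.ac.kr"),
]

# Usage (needs a live session):
# insert_students(session, students)
```

Binding values this way also avoids building CQL text by hand, so quoting mistakes in names or emails cannot break the statement.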

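#### Update a record (and more)

- `nodetool flush` writes a table's in-memory data to SSTable files in the data folder under the install directory.
- You can look up the table's id in `system_schema.tables`; the table's data folder name ends with it.
- You can read a flushed file with `sstabledump`.
- After an update, only the delta (the changed cells) is written to a new SSTable.
- `nodetool compact` merges the SSTables into one.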
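To make the flush / delta / compact idea concrete, here is a rough pure-Python model of how fragments of the same row can be merged by write timestamp (newest cell wins). It is a simplified sketch, not Cassandra's actual compaction code: it ignores tombstones, TTLs, and timestamp ties. The values and timestamps are copied from the `sstabledump` outputs in the cells that follow; the function name `merge_fragments` is made up for illustration.

```python
from typing import Dict, List, Tuple

# One row fragment: column name -> (value, write timestamp in microseconds)
Cells = Dict[str, Tuple[str, int]]


def merge_fragments(fragments: List[Cells]) -> Cells:
    """Merge fragments of one row, keeping the newest cell for each column."""
    merged: Cells = {}
    for fragment in fragments:
        for column, (value, ts) in fragment.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


# Row (2019, 55555) as written by the first flush: the full row.
first_flush = {
    "dept": ("ECE", 1664220528244706),
    "email": ("nikriz@snu.ac.kr", 1664220528244706),
    "name": ("Jun Ha Chun", 1664220528244706),
}
# The same row in the second flush: only the updated cell (the delta).
second_flush = {"email": ("junha@snu.ac.kr", 1664220532906000)}

row = merge_fragments([first_flush, second_flush])
print(row["email"][0])  # junha@snu.ac.kr
print(row["name"][0])   # Jun Ha Chun
```

A read that happens before compaction has to do a similar merge across the SSTables on the fly, which is one reason why many small SSTables make reads slower.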

In [13]:
!nodetool flush snu student

In [14]:
# The table's data folder is named <table name>-<table id without dashes>
uuid = str(session.execute("SELECT id FROM system_schema.tables WHERE keyspace_name = 'snu' AND table_name = 'student'").one().id).replace("-","")

In [15]:
!sstabledump C:\Cassandra\apache-cassandra-3.0.27\data\data\snu\student-{uuid}\me-1-big-Data.db -d

[2019]@0 Row[info=[ts=1664220528254716] ]: 22222 | [dept=DS ts=1664220528254716], [email=lee@snu.ac.kr ts=1664220528254716], [name=Lee Lee Lee ts=1664220528254716]
[2019]@60 Row[info=[ts=1664220528244706] ]: 55555 | [dept=ECE ts=1664220528244706], [email=nikriz@snu.ac.kr ts=1664220528244706], [name=Jun Ha Chun ts=1664220528244706]
[2020]@106 Row[info=[ts=1664220528250707] ]: 11111 | [dept=ECE ts=1664220528250707], [email=kim@snu.ac.kr ts=1664220528250707], [name=Kim Kim Kim ts=1664220528250707]
[2020]@167 Row[info=[ts=1664220528258707] ]: 33333 | [dept=DS ts=1664220528258707], [email=park@snu.ac.kr ts=1664220528258707], [name=Park Park Park ts=1664220528258707]
[2020]@213 Row[info=[ts=1664220528246707] ]: 44444 | [dept=DS ts=1664220528246707], [email=simonjisu@snu.ac.kr ts=1664220528246707], [name=Ji Soo Jang ts=1664220528246707]


In [16]:
session.execute(
'''
UPDATE snu.student
SET email = 'junha@snu.ac.kr'
WHERE year=2019 AND id = 55555 IF EXISTS;
'''
)

<cassandra.cluster.ResultSet at 0x1404e4b57c0>

In [17]:
session.execute("SELECT * FROM snu.student").all()

[Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
 Row(year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr', name='Jun Ha Chun'),
 Row(year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr', name='Kim Kim Kim'),
 Row(year=2020, id=33333, dept='DS', email='park@snu.ac.kr', name='Park Park Park'),
 Row(year=2020, id=44444, dept='DS', email='simonjisu@snu.ac.kr', name='Ji Soo Jang')]

In [18]:
!nodetool flush snu student

In [19]:
!sstabledump C:\Cassandra\apache-cassandra-3.0.27\data\data\snu\student-{uuid}\me-2-big-Data.db -d

[2019]@0 Row[info=[ts=-9223372036854775808] ]: 55555 | [email=junha@snu.ac.kr ts=1664220532906000]


In [20]:
!nodetool compact snu student

In [21]:
!dir C:\Cassandra\apache-cassandra-3.0.27\data\data\snu\student-{uuid}

 Volume in drive C has no label.
 Volume Serial Number is B483-D382

 Directory of C:\Cassandra\apache-cassandra-3.0.27\data\data\snu\student-709315203dd111eda96abf4bb1932169

2022-09-27  04:28 AM    <DIR>          .
2022-09-27  04:28 AM    <DIR>          ..
2022-09-27  04:28 AM    <DIR>          backups
2022-09-27  04:28 AM                43 me-3-big-CompressionInfo.db
2022-09-27  04:28 AM               201 me-3-big-Data.db
2022-09-27  04:28 AM                 9 me-3-big-Digest.crc32
2022-09-27  04:28 AM                16 me-3-big-Filter.db
2022-09-27  04:28 AM                16 me-3-big-Index.db
2022-09-27  04:28 AM             4,838 me-3-big-Statistics.db
2022-09-27  04:28 AM                56 me-3-big-Summary.db
2022-09-27  04:28 AM               100 me-3-big-TOC.txt
               8 File(s)          5,279 bytes
               3 Dir(s)  154,553,118,720 bytes free


#### Delete record from the table

In [22]:
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(9999, 9999, 'Not human', 'X', 'X@snu.ac.kr');
'''
)

<cassandra.cluster.ResultSet at 0x1404e938100>

In [23]:
session.execute("DELETE FROM snu.student WHERE year = 9999")

<cassandra.cluster.ResultSet at 0x1404e536d90>

In [24]:
session.execute("SELECT * FROM snu.student").all()

[Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
 Row(year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr', name='Jun Ha Chun'),
 Row(year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr', name='Kim Kim Kim'),
 Row(year=2020, id=33333, dept='DS', email='park@snu.ac.kr', name='Park Park Park'),
 Row(year=2020, id=44444, dept='DS', email='simonjisu@snu.ac.kr', name='Ji Soo Jang')]

#### Empty, Drop the table / Drop the keyspace

In [25]:
#session.execute("TRUNCATE TABLE snu.student")

In [26]:
#session.execute("DROP TABLE snu.student")

In [27]:
#session.execute("DROP KEYSPACE snu")

#### Async

In [28]:
responses = []
for i in range(5):
    responses.append(session.execute_async("SELECT * FROM snu.student"))

In [29]:
[r.result().all() for r in responses]

[[Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
  Row(year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr', name='Jun Ha Chun'),
  Row(year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr', name='Kim Kim Kim'),
  Row(year=2020, id=33333, dept='DS', email='park@snu.ac.kr', name='Park Park Park'),
  Row(year=2020, id=44444, dept='DS', email='simonjisu@snu.ac.kr', name='Ji Soo Jang')],
 [Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
  Row(year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr', name='Jun Ha Chun'),
  Row(year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr', name='Kim Kim Kim'),
  Row(year=2020, id=33333, dept='DS', email='park@snu.ac.kr', name='Park Park Park'),
  Row(year=2020, id=44444, dept='DS', email='simonjisu@snu.ac.kr', name='Ji Soo Jang')],
 [Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
  Row(year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr', name

![title](img/Cassandra_Tutorials_06.png)

![title](img/Cassandra_Tutorials_07.png)

### Primary keys

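- You can't run range queries on the partition key; only `=` and `IN` are supported (unless you use the `token()` function).

- Querying by clustering key alone is not recommended; it needs `ALLOW FILTERING`.

- Querying by a non-primary-key column is strongly discouraged.

- Design your queries first, then the table and its primary key to serve them.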
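The rules above follow from how rows are laid out. Below is a rough in-memory model, assuming nothing beyond the `snu.student` schema: each partition key (`year`) maps to its own list of rows kept sorted by clustering key (`id`). In a real cluster the partition key is hashed to pick a node, so partitions are not stored in `year` order; that is why a range over `year` has no efficient plan, while a range over `id` inside one partition is a single sorted slice. The function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

# partition key (year) -> rows as (id, name), sorted by clustering key id
Partitions = Dict[int, List[Tuple[int, str]]]


def build_partitions(rows: List[Tuple[int, int, str]]) -> Partitions:
    """Group (year, id, name) rows by partition key and sort each partition by id."""
    partitions = defaultdict(list)
    for year, student_id, name in rows:
        partitions[year].append((student_id, name))
    for clustered in partitions.values():
        clustered.sort()
    return dict(partitions)


def ids_greater_than(partitions: Partitions, year: int, min_id: int) -> List[Tuple[int, str]]:
    """Model of: SELECT ... WHERE year = ? AND id > ? (one partition, one slice)."""
    clustered = partitions.get(year, [])
    ids = [student_id for student_id, _ in clustered]
    start = bisect_right(ids, min_id)  # binary search inside the sorted partition
    return clustered[start:]


rows = [
    (2019, 55555, "Jun Ha Chun"),
    (2020, 44444, "Ji Soo Jang"),
    (2020, 11111, "Kim Kim Kim"),
    (2019, 22222, "Lee Lee Lee"),
    (2020, 33333, "Park Park Park"),
]
partitions = build_partitions(rows)
print(ids_greater_than(partitions, 2019, 20000))
# [(22222, 'Lee Lee Lee'), (55555, 'Jun Ha Chun')]
```

In this model a query on `id` alone would have to visit every partition, which is roughly what `ALLOW FILTERING` asks the cluster to do.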

In [30]:
session.execute("SELECT * FROM snu.student WHERE year > 2019").all()

InvalidRequest: Error from server: code=2200 [Invalid query] message="Only EQ and IN relation are supported on the partition key (unless you use the token() function)"

In [33]:
session.execute("SELECT * FROM snu.student WHERE id = 20000").all()

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

In [31]:
session.execute("SELECT * FROM snu.student WHERE id = 20000 ALLOW FILTERING").all() 

[]

In [32]:
session.execute("SELECT * FROM snu.student WHERE year = 2019 AND id = 20000").all()

[]

In [34]:
session.execute("SELECT * FROM snu.student WHERE year = 2019 AND id > 20000").all()

[Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
 Row(year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr', name='Jun Ha Chun')]

### Materialized view

In [35]:
session.execute(
'''
CREATE MATERIALIZED VIEW snu.student_by_name
AS SELECT year, id, dept, email, name
FROM snu.student
WHERE name IS NOT NULL AND id IS NOT NULL
PRIMARY KEY (name, year, id);
'''
)

<cassandra.cluster.ResultSet at 0x1404fa5cd30>

In [36]:
session.execute("SELECT * FROM snu.student_by_name").all()

[Row(name='Lee Lee Lee', year=2019, id=22222, dept='DS', email='lee@snu.ac.kr'),
 Row(name='Park Park Park', year=2020, id=33333, dept='DS', email='park@snu.ac.kr'),
 Row(name='Ji Soo Jang', year=2020, id=44444, dept='DS', email='simonjisu@snu.ac.kr'),
 Row(name='Kim Kim Kim', year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr'),
 Row(name='Jun Ha Chun', year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr')]

#### Some more

- LIMIT
- ORDER BY
- AS
- IN
- Secondary index (not recommended)

In [37]:
session.execute("SELECT * FROM snu.student LIMIT 3").all()

[Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
 Row(year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr', name='Jun Ha Chun'),
 Row(year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr', name='Kim Kim Kim')]

In [38]:
session.execute("SELECT * FROM snu.student WHERE year = 2020 ORDER BY id DESC").all()

[Row(year=2020, id=44444, dept='DS', email='simonjisu@snu.ac.kr', name='Ji Soo Jang'),
 Row(year=2020, id=33333, dept='DS', email='park@snu.ac.kr', name='Park Park Park'),
 Row(year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr', name='Kim Kim Kim')]

In [39]:
session.execute("SELECT year, id, dept AS department FROM snu.student").all()

[Row(year=2019, id=22222, department='DS'),
 Row(year=2019, id=55555, department='ECE'),
 Row(year=2020, id=11111, department='ECE'),
 Row(year=2020, id=33333, department='DS'),
 Row(year=2020, id=44444, department='DS')]

In [40]:
session.execute("SELECT * FROM snu.student WHERE year IN (2019, 2020)").all()

[Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
 Row(year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr', name='Jun Ha Chun'),
 Row(year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr', name='Kim Kim Kim'),
 Row(year=2020, id=33333, dept='DS', email='park@snu.ac.kr', name='Park Park Park'),
 Row(year=2020, id=44444, dept='DS', email='simonjisu@snu.ac.kr', name='Ji Soo Jang')]

In [41]:
session.execute("CREATE INDEX student_by_email ON snu.student (email)")

<cassandra.cluster.ResultSet at 0x1404fe500d0>

In [42]:
session.execute("SELECT * FROM snu.student WHERE email = 'junha@snu.ac.kr'").all()

[Row(year=2019, id=55555, dept='ECE', email='junha@snu.ac.kr', name='Jun Ha Chun')]

#### Some more 2

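- Referential integrity (not enforced by Cassandra; the application keeps tables in sync, for example with a batch)
- Join (not supported; join on the client side or denormalize)
- Bulk insertion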

In [43]:
# Grades for the BKMS2 lecture, keyed like snu.student: partition key year, clustering key id
session.execute(
'''                
CREATE TABLE snu.BKMS2 ( 
    year int,
    id int,
    name text, 
    grade text,
    PRIMARY KEY (year, id)
) WITH CLUSTERING ORDER BY (id ASC)
'''
)

<cassandra.cluster.ResultSet at 0x1404e493a00>

In [44]:
session.execute(
'''
INSERT INTO snu.BKMS2 (year, id, name, grade)
  VALUES(2019, 55555, 'Jun Ha Chun', 'A');
'''
)
session.execute(
'''
INSERT INTO snu.BKMS2 (year, id, name, grade)
  VALUES(2020, 44444, 'Ji Soo Jang', 'A');
'''
)
session.execute(
'''
INSERT INTO snu.BKMS2 (year, id, name, grade)
  VALUES(2021, 88888, 'Choi Choi Choi', 'C');
'''
)
session.execute(
'''
INSERT INTO snu.BKMS2 (year, id, name, grade)
  VALUES(2020, 6666, 'Jung Jung Jung', 'B');
'''
)

<cassandra.cluster.ResultSet at 0x140504857f0>

In [45]:
session.execute(
'''   
BEGIN BATCH
  DELETE FROM snu.student WHERE year = 2019 AND id = 55555;
  DELETE FROM snu.BKMS2 WHERE year = 2019 AND id = 55555;
APPLY BATCH
'''
)

<cassandra.cluster.ResultSet at 0x140504855e0>

In [46]:
rows = session.execute("SELECT * FROM snu.student WHERE year = 2020").all()
join = []
for row in rows:
    # Let the driver bind the values (%s placeholders) instead of formatting them into the CQL text
    result = session.execute("SELECT * FROM snu.BKMS2 WHERE year = %s AND id = %s", (row.year, row.id)).one()
    if result:
        join.append(result)
join

[Row(year=2020, id=44444, grade='A', name='Ji Soo Jang')]
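The client-side join above sends one query per student. The usual alternative in Cassandra is to denormalize: store the joined result in its own table, such as the `snu.lecture_student` table created in the next cell. The sketch below builds the rows for such a table in plain Python. The namedtuples stand in for the driver's `Row` objects, the function name `denormalize` is hypothetical, and using `'BKMS2'` as the `lecture` value is an assumption based on the grade table's name.

```python
from collections import namedtuple
from typing import Iterable, List, Tuple

# Stand-ins for driver rows (the driver returns namedtuple rows by default).
Student = namedtuple("Student", "year id name dept email")
Grade = namedtuple("Grade", "year id name grade")

INSERT_LECTURE_STUDENT_CQL = (
    "INSERT INTO snu.lecture_student "
    "(year, id, name, dept, email, lecture, grade) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


def denormalize(students: Iterable[Student], grades: Iterable[Grade], lecture: str) -> List[Tuple]:
    """Pair each grade with its student on (year, id); unmatched grades are skipped."""
    by_key = {(s.year, s.id): s for s in students}
    merged = []
    for g in grades:
        s = by_key.get((g.year, g.id))
        if s is not None:
            merged.append((s.year, s.id, s.name, s.dept, s.email, lecture, g.grade))
    return merged


students = [
    Student(2020, 44444, "Ji Soo Jang", "DS", "simonjisu@snu.ac.kr"),
    Student(2020, 11111, "Kim Kim Kim", "ECE", "kim@snu.ac.kr"),
]
grades = [
    Grade(2020, 44444, "Ji Soo Jang", "A"),
    Grade(2020, 6666, "Jung Jung Jung", "B"),  # no matching student
]

rows = denormalize(students, grades, lecture="BKMS2")
print(rows)

# Usage (needs a live session and the lecture_student table):
# for params in rows:
#     session.execute(INSERT_LECTURE_STUDENT_CQL, params)
```

The trade-off is that every later change to a student or a grade has to be written to the denormalized table as well, since Cassandra will not keep the copies in sync for you.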

In [47]:
session.execute(
'''                
CREATE TABLE snu.lecture_student ( 
    year int,
    id int,
    name text, 
    dept text, 
    email text,
    lecture text,
    grade text,
    PRIMARY KEY (year, id)
) WITH CLUSTERING ORDER BY (id ASC)
'''
)

<cassandra.cluster.ResultSet at 0x1404fa66760>

#### Bulk insertion

- Cassandra tools

`sstableloader --nodes [ip] [path to the sstable backup file]`

- cqlsh https://docs.datastax.com/en/cql-oss/3.3/cql/cql_reference/cqlshCopy.html?hl=copy

`COPY [table_name] (columns, ) TO [path] WITH HEADER = TRUE/FALSE;`

`COPY [table_name] (columns, ) FROM [path] WITH HEADER = TRUE/FALSE;`

- Python driver

Use asynchronous insert queries (`session.execute_async`),

or try DSBulk: https://docs.datastax.com/en/dsbulk/docs/reference/dsbulkCmd.html
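For the Python driver route, one possible shape is a prepared statement combined with the driver's `cassandra.concurrent.execute_concurrent_with_args` helper, which keeps a bounded number of requests in flight. The sketch below is an illustration under assumptions: the CSV layout (five columns, text fields wrapped in single quotes) is guessed from this notebook's data, and the function names `read_students` and `bulk_insert` are hypothetical.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple

INSERT_STUDENT_CQL = (
    "INSERT INTO snu.student (year, id, name, dept, email) VALUES (?, ?, ?, ?, ?)"
)


def read_students(f: TextIO) -> Iterator[Tuple[int, int, str, str, str]]:
    """Yield typed (year, id, name, dept, email) tuples from a CSV stream."""
    # Assumed layout: text fields are single-quoted, with optional spaces after commas.
    reader = csv.reader(f, quotechar="'", skipinitialspace=True)
    for record in reader:
        if not record:  # skip blank lines
            continue
        year, student_id, name, dept, email = record
        yield int(year), int(student_id), name, dept, email


def bulk_insert(session, path: str, concurrency: int = 50):
    """Load a CSV file through one prepared statement with bounded concurrency."""
    # Imported here so read_students() can be tried without the driver installed.
    from cassandra.concurrent import execute_concurrent_with_args

    prepared = session.prepare(INSERT_STUDENT_CQL)
    with open(path, newline="") as f:
        params = list(read_students(f))
    # Returns one (success, result_or_exception) pair per row;
    # by default the first failure is raised instead.
    return execute_concurrent_with_args(session, prepared, params, concurrency=concurrency)


sample = io.StringIO("2019, 55555, 'Jun Ha Chun', 'ECE', 'nikriz@snu.ac.kr'\n")
print(list(read_students(sample)))
# [(2019, 55555, 'Jun Ha Chun', 'ECE', 'nikriz@snu.ac.kr')]

# Usage (needs a live session):
# results = bulk_insert(session, "./data.csv")
```

Parsing into typed tuples first keeps the CQL text fixed, so a stray quote or comma in the data cannot change the statement that is executed.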

In [48]:
# In cqlsh
# export
# COPY snu.student TO './data_export.csv' WITH HEADER = FALSE;
# import
# COPY snu.student FROM './data_export.csv' WITH HEADER = FALSE;

In [49]:
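responses = []
with open("./data.csv", "r") as f:
    # next(f) # for skipping header
    for line in f:
        # Values are pasted into the CQL text as-is, so text fields in the CSV must already be single-quoted
        row = line.strip().split(",")
        # execute_async returns a ResponseFuture immediately; call .result() on it to wait for completion
        response = session.execute_async("INSERT INTO snu.student (year, id, name, dept, email) VALUES ({}, {}, {}, {}, {})"\
                                       .format(row[0], row[1], row[2], row[3], row[4]))
        responses.append(response)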

In [50]:
responses[0]

<ResponseFuture: query='<SimpleStatement query="INSERT INTO snu.student (year, id, name, dept, email) VALUES (2019,  55555,  'Jun Ha Chun',  'ECE',  'nikriz@snu.ac.kr')", consistency=Not Set>' request_id=124 result=None exception=None coordinator_host=127.0.0.1:9042>

### Distributed Setting
- Docker

```
docker run --name cassandra-1 -p 9042:9042 -d cassandra:3.0
$INSTANCE1=$(docker inspect --format="{{ .NetworkSettings.IPAddress }}" cassandra-1)
echo "Instance 1: ${INSTANCE1}"

docker run --name cassandra-2 -p 9043:9042 -d -e CASSANDRA_SEEDS=$INSTANCE1 cassandra:3.0
$INSTANCE2=$(docker inspect --format="{{ .NetworkSettings.IPAddress }}" cassandra-2)
echo "Instance 2: ${INSTANCE2}"
echo "Wait 60s until the second node joins the cluster"
sleep 60

docker run --name cassandra-3 -p 9044:9042 -d -e CASSANDRA_SEEDS=$INSTANCE1,$INSTANCE2 cassandra:3.0
$INSTANCE3=$(docker inspect --format="{{ .NetworkSettings.IPAddress }}" cassandra-3)

docker exec cassandra-3 nodetool status

docker exec -it cassandra-1 cqlsh
```

- Astra

Follow the instructions at https://astra.datastax.com/register; you will create the keyspace there.

Save the key dictionary (it contains `clientID` and `clientSecret`) to `./key.txt`.

Download the secure connect bundle (SCB) to `./` (Your database - Connect - Python - Download bundle).

(If you prefer, you can use another connection method for your implementation.)

*`nodetool` and materialized views are not supported on Astra.

*Free $25 credit / month

![title](img/Astra_01.png)

In [51]:
from cassandra.auth import PlainTextAuthProvider
import json

key = {}
with open('key.txt') as f:
    key = json.load(f)

cloud_config = {
    'secure_connect_bundle': './secure-connect-cassandra-tutorial.zip'  # replace with your bundle file name
}
auth_provider = PlainTextAuthProvider(key['clientID'], key['clientSecret'])
cluster = Cluster(cloud=cloud_config, auth_provider=auth_provider)
session = cluster.connect()

session.execute("select release_version from system.local").one()

Row(release_version='4.0.0.6816')

In [52]:
session.execute("SELECT * FROM system.peers").all()

[Row(peer='172.24.35.2', data_center='asia-south1', dse_version=None, graph=None, host_id=UUID('b89edb89-3de0-4228-8046-499e8a4a2e75'), jmx_port=7199, native_transport_address='172.24.35.2', native_transport_port=29042, native_transport_port_ssl=None, preferred_ip='172.24.35.2', rack='rack1', release_version='4.0.0.6816', rpc_address='172.24.35.2', schema_version=UUID('ed9c7d5a-52a8-3ca1-a3da-7b3e59928b2a'), server_id=None, storage_port=7000, storage_port_ssl=7001, tokens=SortedSet(['-3973216027162921237', '-4824212584856678155', '-5641161138316102174', '-6076354814903314155', '-7880102670609790399', '4513513936355413163', '666993213066548225', '7146936265093438770']), workload=None, workloads=None),
 Row(peer='172.24.26.2', data_center='asia-south1', dse_version=None, graph=None, host_id=UUID('36491cd1-63ba-428b-85a2-1c08c1c6a661'), jmx_port=7199, native_transport_address='172.24.26.2', native_transport_port=29042, native_transport_port_ssl=None, preferred_ip='172.24.26.2', rack='rack

In [55]:
# Drop the table in case you run this again
session.execute("DROP TABLE IF EXISTS snu.student")

<cassandra.cluster.ResultSet at 0x1404fa11fa0>

### Replication Factor


![title](img/Lecture_01.png)

In [56]:
# The keyspace was already created in the Astra dashboard, so the statement below is kept as a string and not executed
"""
session.execute(
'''
CREATE KEYSPACE snu
  WITH REPLICATION = { 
   'class' : 'NetworkTopologyStrategy',
   'datacenter1' : 3 
  };
'''
)
"""

"\nsession.execute(\n'''\nCREATE KEYSPACE snu\n  WITH REPLICATION = { \n   'class' : 'NetworkTopologyStrategy',\n   'datacenter1' : 3 \n  };\n'''\n)\n"

In [57]:
session.execute("SELECT * FROM system_schema.keyspaces WHERE keyspace_name = 'snu'").all()

[Row(keyspace_name='snu', durable_writes=True, graph_engine=None, replication=OrderedMapSerializedKey([('asia-south1', '3'), ('class', 'org.apache.cassandra.locator.NetworkTopologyStrategy')]))]

In [58]:
session.execute(
'''                
CREATE TABLE snu.student ( 
    year int,
    id int,
    name text, 
    dept text, 
    email text,
    PRIMARY KEY (year, id)
) WITH CLUSTERING ORDER BY (id ASC)
'''
)

<cassandra.cluster.ResultSet at 0x140504a6850>

In [59]:
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2019, 55555, 'Jun Ha Chun', 'ECE', 'nikriz@snu.ac.kr');
'''
)
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2020, 44444, 'Ji Soo Jang', 'DS', 'simonjisu@snu.ac.kr');
'''
)
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2020, 11111, 'Kim Kim Kim', 'ECE', 'kim@snu.ac.kr');
'''
)
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2019, 22222, 'Lee Lee Lee', 'DS', 'lee@snu.ac.kr');
'''
)
session.execute(
'''
INSERT INTO snu.student (year, id, name, dept, email)
  VALUES(2020, 33333, 'Park Park Park', 'DS', 'park@snu.ac.kr');
'''
)

<cassandra.cluster.ResultSet at 0x14050530970>

### Consistency

![title](img/Consistency.png)

In [60]:
# cqlsh commands
# You can run them in the Astra dashboard CQL console
# CONSISTENCY        (shows the current consistency level)
# CONSISTENCY ALL    (sets the level to ALL)
# CONSISTENCY ONE    (sets the level to ONE)
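As a back-of-the-envelope aid for choosing a level, the sketch below works out how many replicas each level needs and applies the common rule of thumb that a read is guaranteed to overlap the latest write when R + W > RF. It is plain arithmetic for a single datacenter, not driver code; the function names are hypothetical, and RF = 3 matches the `snu` keyspace replication shown earlier.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that form a majority (QUORUM) for one datacenter."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must answer for the given consistency level."""
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_overlaps_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set shares a replica with every write set (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


RF = 3
print(quorum(RF))                                   # 2
print(read_overlaps_write("QUORUM", "QUORUM", RF))  # True
print(read_overlaps_write("ONE", "ONE", RF))        # False
print(read_overlaps_write("ONE", "ALL", RF))        # True
```

Higher levels trade availability for consistency: with RF = 3, `ALL` fails as soon as one replica is down, while `QUORUM` tolerates one unavailable replica.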

In [61]:
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement
from cassandra.cluster import ExecutionProfile, EXEC_PROFILE_DEFAULT

profile = ExecutionProfile(
    consistency_level=ConsistencyLevel.ALL
)
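# With no contact points, Cluster() connects to 127.0.0.1; for Astra, also pass cloud=cloud_config and auth_provider=auth_provider
cluster = Cluster(execution_profiles={EXEC_PROFILE_DEFAULT: profile})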
session = cluster.connect()

In [62]:
query = SimpleStatement("SELECT * FROM snu.student",
    consistency_level=ConsistencyLevel.ALL)
session.execute(query).all()

[Row(year=2019, id=22222, dept='DS', email='lee@snu.ac.kr', name='Lee Lee Lee'),
 Row(year=2019, id=55555, dept='ECE', email='nikriz@snu.ac.kr', name='Jun Ha Chun'),
 Row(year=2020, id=11111, dept='ECE', email='kim@snu.ac.kr', name='Kim Kim Kim'),
 Row(year=2020, id=33333, dept='DS', email='park@snu.ac.kr', name='Park Park Park'),
 Row(year=2020, id=44444, dept='DS', email='simonjisu@snu.ac.kr', name='Ji Soo Jang')]

#### Installation

https://phoenixnap.com/kb/install-cassandra-on-windows

#### Materials

https://docs.datastax.com/en/cql-oss/3.3/index.html

https://cassandra.apache.org/doc/latest/cassandra/tools/cqlsh.html

https://docs.datastax.com/en/developer/python-driver/3.25/getting_started/

https://www.freecodecamp.org/news/the-apache-cassandra-beginner-tutorial/
