# Cassandra

## Installation in Docker

```yaml
version: '3'

services:
  cas1:
    image: cassandra:latest
    ports:
      - 9042:9042
    environment:
      - CASSANDRA_START_RPC=true
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=datacenter1
  cas2:
    image: cassandra:latest
    ports:
      - 9043:9042
    depends_on:
      - cas1
    environment:
      - CASSANDRA_START_RPC=true
      - CASSANDRA_CLUSTER_NAME=MyCluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=datacenter1
      - CASSANDRA_SEEDS=cas1
```
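Note that `depends_on` only orders container start-up; it does not wait until Cassandra accepts CQL connections, which can take a while on first start. Below is a minimal sketch of a readiness check using only the standard library. The helper name `wait_for_port` and its default values are assumptions made for illustration, and an open port does not guarantee that the node has finished joining the cluster.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

One possible use is calling `wait_for_port('192.168.1.100', 9042)` before `cluster.connect()` in the cells below.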

https://towardsdatascience.com/getting-started-with-apache-cassandra-and-python-81e00ccf17c9

https://docs.datastax.com/en/developer/python-driver/3.25/

Terminology equivalents

| Relational database | Cassandra |
|---|---|
| Database | Keyspace |
| Table | Column Family |
| Primary key | Row key |
| Structured data | Unstructured data |
| Fixed schema | Flexible schema |

## Installation

In [7]:
!pip install cassandra-driver



In [10]:
from cassandra.cluster import Cluster

cassandraHost = '192.168.1.100'
cluster = Cluster([cassandraHost], port=9042)

session = cluster.connect()
#session = cluster.connect('mykeyspace')
#session.set_keyspace('users')
#session.execute('USE users')

In [26]:
result = session.execute("""CREATE KEYSPACE IF NOT EXISTS uois WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };""")
print(list(result))

[]


In [30]:
import uuid

session.set_keyspace('uois')
result = session.execute('''CREATE TABLE IF NOT EXISTS users ( id UUID PRIMARY KEY, name text, surname text, email text) ''')
print(list(result))

[]


In [32]:
result = session.execute(
    'ALTER TABLE users ' 
    'ADD groups set<UUID>'
    )
print(list(result))

[]


## CRUD

### Create

In [14]:
session.execute(
    """
    INSERT INTO users (name, surname, id)
    VALUES (%s, %s, %s)
    """,
    ("John", "O'Reilly", uuid.uuid1())
)
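The `%s` placeholders above are filled in by the driver, not by Python string formatting, which is why a value such as `O'Reilly` is safe to pass. The sketch below contrasts the two approaches. The function names `naive_insert_cql` and `insert_user` are hypothetical, and `session` is assumed to be the connected session from the earlier cells.

```python
def naive_insert_cql(name, surname):
    # Anti-pattern: values are pasted into the statement text unescaped
    return f"INSERT INTO users (name, surname) VALUES ('{name}', '{surname}')"


broken = naive_insert_cql("John", "O'Reilly")
# The apostrophe in the surname ends the string literal early,
# leaving an odd number of quote characters in the statement
quote_count = broken.count("'")


def insert_user(session, name, surname, user_id):
    # The driver encodes and escapes each bound value itself
    return session.execute(
        "INSERT INTO users (name, surname, id) VALUES (%s, %s, %s)",
        (name, surname, user_id),
    )
```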

<cassandra.cluster.ResultSet at 0x7f0e2e136700>

In [15]:
import random
import uuid

def randomUser():
    surNames = [
        'Novák', 'Nováková', 'Svobodová', 'Svoboda', 'Novotná',
        'Novotný', 'Dvořáková', 'Dvořák', 'Černá', 'Černý', 
        'Procházková', 'Procházka', 'Kučerová', 'Kučera', 'Veselá',
        'Veselý', 'Horáková', 'Krejčí', 'Horák', 'Němcová', 
        'Marková', 'Němec', 'Pokorná', 'Pospíšilová','Marek'
    ]

    names = [
        'Jiří', 'Jan', 'Petr', 'Jana', 'Marie', 'Josef',
        'Pavel', 'Martin', 'Tomáš', 'Jaroslav', 'Eva',
        'Miroslav', 'Hana', 'Anna', 'Zdeněk', 'Václav',
        'Michal', 'František', 'Lenka', 'Kateřina',
        'Lucie', 'Jakub', 'Milan', 'Věra', 'Alena'
    ]

    name1 = random.choice(names)
    name2 = random.choice(names)
    name3 = random.choice(surNames)
    return {
        'name': f'{name1} {name2}',
        'surname': name3,
        'email': f'{name1}.{name2}.{name3}@university.world',
        'id': uuid.uuid1(),
    }

randomUser()

{'name': 'Jan Petr',
 'surname': 'Kučerová',
 'email': 'Jan.Petr.Kučerová@university.world'}

In [37]:
def denameDict(data, *names):
    return tuple(data[name] for name in names)

newUser = randomUser()
print(denameDict(newUser, 'name', 'surname'))
print(denameDict(newUser, 'email', 'name', 'surname'))

('Václav Lenka', 'Veselý')
('Václav.Lenka.Veselý@university.world', 'Václav Lenka', 'Veselý')


In [39]:
def nameDict(data, *names):
    return dict(zip(names, data))

names = ['email', 'name', 'surname']
newUser = randomUser()
nameDict(denameDict(newUser, *names), *names)

{'email': 'Jana.Zdeněk.Novotná@university.world',
 'name': 'Jana Zdeněk',
 'surname': 'Novotná'}

In [19]:
def fillUsers(count=10):
    for _ in range(count):
        newUser = randomUser()
        session.execute(
            "INSERT INTO users (name, surname, email, id)"
            "VALUES (%s, %s, %s, %s)",
            denameDict(newUser, 'name', 'surname', 'email', 'id')
        )
        
fillUsers()
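When the same statement runs many times, as in `fillUsers`, a prepared statement avoids re-parsing the query on every call. The following is a sketch under the assumption that `session` is the connected session from above; the name `fill_users_prepared` is hypothetical. Prepared statements use `?` placeholders instead of `%s`.

```python
def fill_users_prepared(session, users):
    """Insert an iterable of user dicts using one prepared statement."""
    insert = session.prepare(
        "INSERT INTO users (name, surname, email, id) VALUES (?, ?, ?, ?)"
    )
    count = 0
    for user in users:
        # Values are bound positionally, in the order of the ? markers
        session.execute(
            insert,
            (user['name'], user['surname'], user['email'], user['id']),
        )
        count += 1
    return count

# Usage with the generator defined above (needs a live cluster):
# fill_users_prepared(session, [randomUser() for _ in range(10)])
```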

### Read

In [41]:
rows = session.execute('SELECT id, name, surname, email, groups FROM users')

for row in rows:
    print(row.id, row.email, row.name, row.surname, row.groups, sep='; ')


2ef65749-bef5-11ec-b0f6-6b6ac1b3cd5c; Jan.Hana.Krejčí@university.world; Jan Hana; Krejčí; None
2ee62b04-bef5-11ec-9623-6b6ac1b3cd5c; Jaroslav.Milan.Procházková@university.world; Jaroslav Milan; Procházková; None
2ef0dded-bef5-11ec-be2c-6b6ac1b3cd5c; Petr.Kateřina.Němec@university.world; Petr Kateřina; Němec; None
2ee44cbe-bef5-11ec-b78f-6b6ac1b3cd5c; Václav.Václav.Pospíšilová@university.world; Václav Václav; Pospíšilová; None
2eea2db0-bef5-11ec-a8c7-6b6ac1b3cd5c; Petr.Věra.Kučerová@university.world; Petr Věra; Kučerová; None
0cd930c2-bef5-11ec-8d4a-6b6ac1b3cd5c; None; John; O'Reilly; None
2eef27b2-bef5-11ec-bd3b-6b6ac1b3cd5c; Miroslav.Jakub.Pokorná@university.world; Miroslav Jakub; Pokorná; None
2ee796e2-bef5-11ec-9b90-6b6ac1b3cd5c; Josef.Tomáš.Dvořák@university.world; Josef Tomáš; Dvořák; None
2eec3700-bef5-11ec-91cb-6b6ac1b3cd5c; Michal.Marie.Nováková@university.world; Michal Marie; Nováková; None
2ef2cb92-bef5-11ec-967c-6b6ac1b3cd5c; Hana.Martin.Procházková@university.world; Hana Ma
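With the driver's default row factory, each row is a named tuple, which is why `row.id` and `row.email` work above. A small sketch of turning such rows into plain dictionaries, for example before serialising them, could look like this. The helper `rows_to_dicts` and the stand-in `Row` type are assumptions for illustration; a real `ResultSet` would be passed in place of `sample`.

```python
from collections import namedtuple


def rows_to_dicts(rows):
    """Convert named-tuple rows into a list of plain dictionaries."""
    return [dict(row._asdict()) for row in rows]


# Stand-in for driver rows so the sketch runs without a cluster
Row = namedtuple('Row', ['id', 'name', 'surname'])
sample = [Row(1, 'John', "O'Reilly")]
as_dicts = rows_to_dicts(sample)

# Against the real table:
# rows_to_dicts(session.execute('SELECT id, name, surname FROM users'))
```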

### Update

In [49]:
session.set_keyspace('uois')
rows = session.execute('SELECT id, name, surname, email, groups FROM users')
firstUserId = rows.one().id
print(firstUserId)

rows = session.execute(
    'UPDATE users '
    'SET name=%s, '
    'surname=%s, '
    'email=%s '
    'WHERE id=%s', ('John', 'Newbie', 'john.newbie@university.world', firstUserId)
)

print(list(rows))

rows = session.execute('SELECT id, name, surname, email, groups FROM users')
for row in rows:
    print(row.id, row.email, row.name, row.surname, row.groups, sep='; ')

2ef65749-bef5-11ec-b0f6-6b6ac1b3cd5c
[]
2ef65749-bef5-11ec-b0f6-6b6ac1b3cd5c; john.newbie@university.world; John; Newbie; None
2ee62b04-bef5-11ec-9623-6b6ac1b3cd5c; Jaroslav.Milan.Procházková@university.world; Jaroslav Milan; Procházková; None
2ef0dded-bef5-11ec-be2c-6b6ac1b3cd5c; Petr.Kateřina.Němec@university.world; Petr Kateřina; Němec; None
2ee44cbe-bef5-11ec-b78f-6b6ac1b3cd5c; Václav.Václav.Pospíšilová@university.world; Václav Václav; Pospíšilová; None
2eea2db0-bef5-11ec-a8c7-6b6ac1b3cd5c; Petr.Věra.Kučerová@university.world; Petr Věra; Kučerová; None
0cd930c2-bef5-11ec-8d4a-6b6ac1b3cd5c; None; John; O'Reilly; None
2eef27b2-bef5-11ec-bd3b-6b6ac1b3cd5c; Miroslav.Jakub.Pokorná@university.world; Miroslav Jakub; Pokorná; None
2ee796e2-bef5-11ec-9b90-6b6ac1b3cd5c; Josef.Tomáš.Dvořák@university.world; Josef Tomáš; Dvořák; None
2eec3700-bef5-11ec-91cb-6b6ac1b3cd5c; Michal.Marie.Nováková@university.world; Michal Marie; Nováková; None
2ef2cb92-bef5-11ec-967c-6b6ac1b3cd5c; Hana.Martin.Proch
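The `groups` column added earlier stays `None` in every row above. As a sketch of how it might be filled, and of the remaining CRUD operation, the two helpers below add a group id to the set and delete a row. The function names are hypothetical, and `session` is assumed to be the connected session with the `uois` keyspace selected.

```python
def add_user_to_group(session, user_id, group_id):
    # "groups + {...}" appends to the set column without reading it first
    return session.execute(
        "UPDATE users SET groups = groups + %s WHERE id = %s",
        ({group_id}, user_id),
    )


def delete_user(session, user_id):
    # The full primary key is required to delete a single row
    return session.execute("DELETE FROM users WHERE id = %s", (user_id,))

# Usage (needs a live cluster):
# add_user_to_group(session, firstUserId, uuid.uuid1())
# delete_user(session, firstUserId)
```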

## Models

In [67]:
import uuid
from cassandra.cqlengine import columns
from cassandra.cqlengine import connection
from datetime import datetime
from cassandra.cqlengine.management import sync_table
from cassandra.cqlengine.models import Model

#first, define a model
class ExampleModel(Model):
    example_id      = columns.UUID(primary_key=True, default=uuid.uuid4)
    example_type    = columns.Integer(index=True)
    created_at      = columns.DateTime()
    description     = columns.Text(required=False)
    
cassandraHost = '192.168.1.100'
#cluster = Cluster([cassandraHost], port=9042)

#session = cluster.connect()
#session.set_keyspace('uois')

connection.setup([cassandraHost], "uois", protocol_version=3)
sync_table(ExampleModel)

em1 = ExampleModel.create(example_type=0, description="example1", created_at=datetime.now())
em2 = ExampleModel.create(example_type=0, description="example2", created_at=datetime.now())
em3 = ExampleModel.create(example_type=0, description="example3", created_at=datetime.now())
em4 = ExampleModel.create(example_type=0, description="example4", created_at=datetime.now())
em5 = ExampleModel.create(example_type=1, description="example5", created_at=datetime.now())
em6 = ExampleModel.create(example_type=1, description="example6", created_at=datetime.now())
em7 = ExampleModel.create(example_type=1, description="example7", created_at=datetime.now())
em8 = ExampleModel.create(example_type=1, description="example8", created_at=datetime.now())

In [69]:
rows = session.execute('SELECT example_id, example_type, created_at, description FROM example_model')

for row in rows:
    print(row.example_id, row.example_type, row.created_at, row.description, sep='; ')

7efcf4d7-0d67-4a10-8ceb-194a7f5c542f; 1; 2022-04-18 19:52:43.722000; example6
df8f49fd-37df-4f9c-8e70-aaae24ba206a; 0; 2022-04-18 19:52:43.688000; example4
1892e48f-f921-4f4e-9054-9fcfa9be584d; 0; 2022-04-18 19:52:43.661000; example3
2075763d-6c49-41e2-a336-6018b37a79c9; 0; 2022-04-18 19:52:43.607000; example2
42d515e6-8ca6-4405-b259-e19350260d3f; 1; 2022-04-18 19:52:43.707000; example5
5752b893-7851-4d90-8aa5-83fdf94d4664; 1; 2022-04-18 19:52:43.742000; example7
d4beb9af-4cd6-4dc9-9339-257001740471; 1; 2022-04-18 19:52:43.777000; example8
3bea95d3-a0f8-4250-89cd-f76b7447deff; 0; 2022-04-18 19:52:43.544000; example1


In [70]:
#ExampleModel.objects.count()

q = ExampleModel.objects(example_type=1)
#q.count()

for instance in q:
    print(instance.description)

example6
example5
example7
example8
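The same query interface can also drive updates. This is a minimal sketch, assuming the `ExampleModel` class and the cqlengine connection set up above; the helper name `relabel_examples` is hypothetical. It filters on `example_type`, which works here because that column was declared with `index=True`.

```python
def relabel_examples(model, example_type, new_description):
    """Set a new description on every instance with the given example_type."""
    changed = 0
    for instance in model.objects(example_type=example_type):
        # update() writes the given column values back to the table
        instance.update(description=new_description)
        changed += 1
    return changed

# Usage (needs a live cluster):
# relabel_examples(ExampleModel, 1, "relabelled")
```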


In [63]:
session.set_keyspace('uois')
#session.execute("DROP TABLE wideusers")
rows = session.execute("CREATE COLUMNFAMILY IF NOT EXISTS wideusers (id UUID PRIMARY KEY, name text, surname text, email text)")# with default_validation_class 'UTF8Type'")
print(list(rows))

rowId = uuid.uuid1()
rows = session.execute('INSERT INTO wideusers (name, surname, id) VALUES (%s, %s, %s)', ("Born", "O'Reilly", rowId))
#rows = session.execute("GET wideusers[%s]['surname']", (rowId, ))
#print(list(rows))
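# Note: this is not CQL; the bracket form resembles the old cassandra-cli (Thrift) syntax,
# so the server rejects it with the SyntaxException shown below
rows = session.execute("SET wideusers[%s]['bornname'] = %s", (rowId, "born"))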
print(list(rows))


[]


SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:0 no viable alternative at input 'SET' ([SET]...)">
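The `SET table[key]['column'] = value` form appears to come from the Thrift-era command-line client, where a row could gain arbitrary columns on the fly. In CQL the column has to exist in the schema first. A sketch of the CQL route follows, assuming the `wideusers` table and `session` from the cell above; the helper name `set_born_name` is hypothetical.

```python
def set_born_name(session, row_id, born_name):
    # Declare the column; running this a second time raises an error
    # because the column already exists
    session.execute("ALTER TABLE wideusers ADD bornname text")
    # Then write the value with an ordinary UPDATE
    return session.execute(
        "UPDATE wideusers SET bornname = %s WHERE id = %s",
        (born_name, row_id),
    )

# Usage (needs a live cluster):
# set_born_name(session, rowId, "born")
```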

## Graph

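The graph API is documented at https://docs.datastax.com/en/developer/python-driver/3.25/graph/, but the examples below do not work against this cluster. `execute_graph` targets DataStax Graph, which is part of DataStax Enterprise; plain Apache Cassandra most likely treats the Gremlin string as an ordinary query, which would explain the CQL syntax errors.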

In [42]:
from cassandra.cluster import Cluster, EXEC_PROFILE_GRAPH_SYSTEM_DEFAULT

cassandraHost = '192.168.1.100'
cluster = Cluster([cassandraHost], port=9042)
session = cluster.connect()

graph_name = 'uois_graph'
session.execute_graph("system.graph(name).ifNotExists().engine(Classic).create()", {'name': graph_name},
                      execution_profile=EXEC_PROFILE_GRAPH_SYSTEM_DEFAULT)

SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:0 no viable alternative at input 'system' ([system]...)">

In [72]:
from cassandra.cluster import Cluster, GraphExecutionProfile, EXEC_PROFILE_GRAPH_DEFAULT
from cassandra.graph import GraphOptions, GraphProtocol, graph_graphson3_row_factory

graph_name = 'uois_graph'
ep_graphson3 = GraphExecutionProfile(
    row_factory=graph_graphson3_row_factory,
    graph_options=GraphOptions(
        graph_protocol=GraphProtocol.GRAPHSON_3_0,
        graph_name=graph_name))

cassandraHost = '192.168.1.100'
cluster = Cluster([cassandraHost], port=9042, execution_profiles={'core': ep_graphson3})
session = cluster.connect()

#session.execute_graph("g.addV(...)", execution_profile='core')
session.execute_graph("g.create()", execution_profile='core')

SyntaxException: <Error from server: code=2000 [Syntax error in CQL query] message="line 1:0 no viable alternative at input 'g' ([g]...)">