# Demo 1

<table><tr>
    <td><img src="image/esi-sba.png" width="100" height="100"></td>
    <td><img src="image/cassandra.png" width="100" height="100"></td>
    </tr></table>



## In this demo, we work through the basic concepts of Apache Cassandra:
### creating a keyspace, a table, etc.
### inserting and querying data
### Query model: two queries ==> two tables
### Primary key, partition key & clustering column
### WHERE clause
### ALLOW FILTERING

#### Install cassandra-driver to connect to the Cassandra cluster
! pip install cassandra-driver
#### More documentation can be found here:  https://datastax.github.io/python-driver/

#### Import the Apache Cassandra Python package

In [1]:
import cassandra

### Create a connection to the Cassandra cluster

In [13]:
from cassandra.cluster import Cluster

try: 
    cluster = Cluster(['127.0.0.1'])  # assumes Cassandra is installed locally on the default port 9042
    session = cluster.connect()
except Exception as e:
    print(e)

### Now create a keyspace named "tp2_isi_esi"

In [14]:
try:
    session.execute("""
    CREATE KEYSPACE IF NOT EXISTS tp2_isi_esi
    WITH REPLICATION = 
    { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }"""
)

except Exception as e:
    print(e)

### Connect to your keyspace tp2_isi_esi
Equivalent to "use tp2_isi_esi"

In [15]:
try:
    session.set_keyspace('tp2_isi_esi')
except Exception as e:
    print(e)

### We want to analyze the customer counts of different stores using these two queries
#### 1. Return nbclient by city (ville)
`select nbclient from magasin_client WHERE ville='sba'`
#### 2. Return nbclient by year (annee)
`select nbclient from magasin_client WHERE annee=2021`


## Since we have two different queries, we need two different tables that partition the data differently

* The table magasin_byVille will be partitioned by city (ville)
* The table magasin_byAnnee will be partitioned by year (annee)


`Table Name: magasin_byVille
column 1: ville
column 2: nbclient
column 3: annee
PRIMARY KEY(ville)`


` Table Name: magasin_byAnnee 
column 1: ville
column 2: nbclient
column 3: annee
PRIMARY KEY (annee)`
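
As a rough illustration of what "partitioned by" means, the sketch below hashes a partition key value to pick a node. It is plain Python and does not talk to Cassandra: the node names and the `node_for` helper are hypothetical, and MD5 with a modulo stands in for the Murmur3 partitioner and token ring that Cassandra uses by default.

```python
import hashlib

# Hypothetical three-node cluster; the names are for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def node_for(partition_key):
    """Map a partition key value to a node by hashing it.

    Simplification: real Cassandra hashes the key to a token and looks the
    token up on a ring; here we use MD5 and a modulo.
    """
    digest = hashlib.md5(str(partition_key).encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


rows = [("sba", 900, 2019), ("sba", 1200, 2020),
        ("oran", 2900, 2020), ("oran", 4000, 2019)]

# magasin_byVille: partition key = ville, so all rows of a city land on one node.
by_ville = {ville: node_for(ville) for ville, _, _ in rows}
# magasin_byAnnee: partition key = annee, so all rows of a year land on one node.
by_annee = {annee: node_for(annee) for _, _, annee in rows}

print(by_ville)
print(by_annee)
```

The same rows end up grouped differently depending on the partition key, which is why each query gets its own table.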


In [5]:
query = "CREATE TABLE IF NOT EXISTS magasin_byVille "
query = query + "(ville text, nbclient int, annee int, PRIMARY KEY (ville))"
session.execute(query)
    
query = "CREATE TABLE IF NOT EXISTS magasin_byAnnee  "
query = query + "(ville text, nbclient int, annee int, PRIMARY KEY (annee))"
session.execute(query)

<cassandra.cluster.ResultSet at 0x7f80a1178a60>

### Inserting data into the table "magasin_byVille"

In [6]:
query = "INSERT INTO magasin_byVille (ville, nbclient, annee)"
query = query + " VALUES (%s, %s, %s)"

session.execute(query, ("sba", 900, 2019))
session.execute(query, ("sba", 1200, 2020))
session.execute(query, ("sba", 1300, 2017))
session.execute(query, ("sba", 100, 2018))
session.execute(query, ("oran", 2900, 2020))
session.execute(query, ("oran", 3600, 2017))
session.execute(query, ("oran", 4000, 2019))
session.execute(query, ("oran", 200, 2018))

<cassandra.cluster.ResultSet at 0x7f80a1179f60>

### Inserting data into the table "magasin_byAnnee"

In [7]:
query1 = "INSERT INTO magasin_byAnnee (ville, nbclient, annee)"
query1 = query1 + " VALUES (%s, %s, %s)"

session.execute(query1, ("sba", 900, 2019))
session.execute(query1, ("sba", 1200, 2020))
session.execute(query1, ("sba", 1300, 2017))
session.execute(query1, ("sba", 100, 2018))
session.execute(query1, ("oran", 2900, 2020))
session.execute(query1, ("oran", 3600, 2017))
session.execute(query1, ("oran", 4000, 2019))
session.execute(query1, ("oran", 200, 2018))

<cassandra.cluster.ResultSet at 0x7f809fe74d30>
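
The eight INSERT calls per table can also be driven from a single list. The sketch below assumes `session` is the object returned by `cluster.connect()` earlier in the notebook; `insert_rows` and `FakeSession` are hypothetical names, and the fake session only records calls so that the snippet runs without a cluster.

```python
ROWS = [
    ("sba", 900, 2019), ("sba", 1200, 2020), ("sba", 1300, 2017), ("sba", 100, 2018),
    ("oran", 2900, 2020), ("oran", 3600, 2017), ("oran", 4000, 2019), ("oran", 200, 2018),
]


def insert_rows(session, table, rows):
    """Insert (ville, nbclient, annee) tuples into `table`, one statement per row.

    The table name is interpolated into the statement, so it must come from
    trusted code, never from user input.
    """
    query = f"INSERT INTO {table} (ville, nbclient, annee) VALUES (%s, %s, %s)"
    for row in rows:
        session.execute(query, row)
    return len(rows)


class FakeSession:
    """Stand-in for a driver session: records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def execute(self, query, params=None):
        self.calls.append((query, params))


fake = FakeSession()
for table in ("magasin_byVille", "magasin_byAnnee"):
    insert_rows(fake, table, ROWS)
print(len(fake.calls), "statements recorded")
```

With a real cluster, passing the notebook's `session` instead of `fake` would send the same statements.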

### Validate your model by running the following queries
#### Are all the rows returned?

`select * from magasin_byVille WHERE ville='sba'`

`select * from magasin_byAnnee WHERE annee=2020`

In [8]:
# data By Ville
query = "select * from magasin_byVille WHERE ville='sba'"
rows = session.execute(query)
for row in rows:
    print ("from magasin_byVille:-", row.ville,row.nbclient,row.annee)



from magasin_byVille:- sba 100 2018


In [9]:
# data By Annee    
query = "select * from magasin_byAnnee WHERE annee=2020"
rows = session.execute(query)
for row in rows:
    print ("from magasin_byAnnee:-",row.ville,row.nbclient,row.annee)

from magasin_byAnnee:- oran 2900 2020


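### That did not work as expected! Why? Because the primary key we chose is not unique per row: in Cassandra, an INSERT with an existing primary key overwrites the previous row (upsert), so only the last row written for each ville (or annee) survives.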
### Create a new table magasin_byVille_2 with a composite key (ville, annee), such that:
* ville is the partition key
* annee is a clustering column

`Table Name: magasin_byVille_2
column 1: ville
column 2: nbclient
column 3: annee
PRIMARY KEY(ville,annee)`
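
To see why the first model lost rows and why a composite key keeps them, here is a plain-Python sketch that uses a dict as a stand-in for a table keyed by its primary key. The `load` helper is hypothetical and nothing here touches Cassandra; it only mimics the overwrite-on-same-key behaviour.

```python
ROWS = [
    ("sba", 900, 2019), ("sba", 1200, 2020), ("sba", 1300, 2017), ("sba", 100, 2018),
    ("oran", 2900, 2020), ("oran", 3600, 2017), ("oran", 4000, 2019), ("oran", 200, 2018),
]


def load(rows, key):
    """Store rows in a dict keyed by the primary key; a repeated key overwrites."""
    table = {}
    for ville, nbclient, annee in rows:
        record = {"ville": ville, "nbclient": nbclient, "annee": annee}
        table[key(record)] = record  # same key => the previous row is replaced
    return table


# PRIMARY KEY (ville): one row per city survives.
by_ville = load(ROWS, key=lambda r: r["ville"])
# PRIMARY KEY (ville, annee): one row per (city, year) pair, so nothing is lost.
by_ville_2 = load(ROWS, key=lambda r: (r["ville"], r["annee"]))

print(len(by_ville), by_ville["sba"])
print(len(by_ville_2))
```

The surviving "sba" row under the single-column key is the last one written (100 clients, 2018), which matches the output of the first validation query.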

In [10]:
query = "CREATE TABLE IF NOT EXISTS magasin_byVille_2 "
query = query + "(ville text, nbclient int, annee int, PRIMARY KEY (ville,annee))"
session.execute(query)

query = "INSERT INTO magasin_byVille_2 (ville, nbclient, annee)"
query = query + " VALUES (%s, %s, %s)"

session.execute(query, ("sba", 900, 2019))
session.execute(query, ("sba", 1200, 2020))
session.execute(query, ("sba", 1300, 2017))
session.execute(query, ("sba", 100, 2018))
session.execute(query, ("oran", 2900, 2020))
session.execute(query, ("oran", 3600, 2017))
session.execute(query, ("oran", 4000, 2019))
session.execute(query, ("oran", 200, 2018))

<cassandra.cluster.ResultSet at 0x7f80a1179b70>

### Create a new table magasin_byAnnee_2 with a composite key (annee, ville), such that:
* annee is the partition key
* ville is a clustering column


` Table Name: magasin_byAnnee_2
column 1: ville
column 2: nbclient
column 3: annee
PRIMARY KEY (annee,ville)`

In [24]:
query = "CREATE TABLE IF NOT EXISTS magasin_byAnnee_2 "
query = query + "(ville text, nbclient int, annee int, PRIMARY KEY (annee,ville))"
session.execute(query)

query = "INSERT INTO magasin_byAnnee_2 (ville, nbclient, annee)"
query = query + " VALUES (%s, %s, %s)"

session.execute(query, ("sba", 900, 2019))
session.execute(query, ("sba", 1200, 2020))
session.execute(query, ("sba", 1300, 2017))
session.execute(query, ("sba", 100, 2018))
session.execute(query, ("oran", 2900, 2020))
session.execute(query, ("oran", 3600, 2017))
session.execute(query, ("oran", 4000, 2019))
session.execute(query, ("oran", 200, 2018))

<cassandra.cluster.ResultSet at 0x7f6d6516dcf0>

### Validate the new model

`select * from magasin_byVille_2 WHERE ville='sba'`

`select * from magasin_byAnnee_2 WHERE annee=2020`

In [19]:
# data By Ville
query = "select * from magasin_byVille_2 WHERE ville='sba'"
rows = session.execute(query)
for row in rows:
    print ("from magasin_byVille:-", row.ville,row.annee,row.nbclient)

from magasin_byVille:- sba 2017 1300
from magasin_byVille:- sba 2018 100
from magasin_byVille:- sba 2019 900
from magasin_byVille:- sba 2020 1200


In [25]:
# data By Annee    
query = "select * from magasin_byAnnee_2 WHERE annee=2020"
rows = session.execute(query)
for row in rows:
    print ("from magasin_byAnnee:-",row.annee, row.ville,row.nbclient)

from magasin_byAnnee:- 2020 oran 2900
from magasin_byAnnee:- 2020 sba 1200


### Let's try other queries

* Query1: `select * from magasin_byVille_2 WHERE ville='sba' and annee>2018`

* Query2: `select * from magasin_byVille_2 WHERE annee>2018`

* Query3: `select * from magasin_byVille_2 WHERE ville='oran' and nbClient>1000`

* Query4: `select * from magasin_byAnnee_2 WHERE annee>2018`

#### Query1 ==> "select * from magasin_byVille_2 WHERE ville='sba' and annee>2018"


In [26]:
#query1
query = "select * from magasin_byVille_2 WHERE ville='sba' and annee>2018"

try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Iterate only when the query succeeded; otherwise `rows` would be stale or undefined
    for row in rows:
        print(row.ville, row.annee, row.nbclient)


sba 2019 900
sba 2020 1200


#### Query2==>"select * from magasin_byVille_2 WHERE annee>2018"

In [27]:
#query2    
query = "select * from magasin_byVille_2 WHERE annee>2018"

try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Iterate only when the query succeeded; otherwise `rows` would be stale or undefined
    for row in rows:
        print(row.ville, row.annee, row.nbclient)

Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"


#### Query3 ==> "select * from magasin_byVille_2 WHERE ville='oran' and nbClient>1000"

In [28]:
#query3
query="select * from magasin_byVille_2 WHERE ville='oran' and nbClient>1000"
try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Iterate only when the query succeeded; otherwise `rows` would be stale or undefined
    for row in rows:
        print(row.ville, row.annee, row.nbclient)

Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"


#### Query4 ==> "select * from magasin_byAnnee_2 WHERE annee>2018"

In [29]:
#query4   
query = "select * from magasin_byAnnee_2 WHERE annee>2018"

try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Iterate only when the query succeeded; otherwise `rows` would be stale or undefined
    for row in rows:
        print(row.ville, row.annee, row.nbclient)

Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
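
The pattern behind these three errors can be captured by a small rule checker. The function below is a deliberately simplified sketch, not Cassandra's actual validation logic: `needs_allow_filtering` is a hypothetical helper, and it ignores secondary indexes, `IN`, and `token()`.

```python
def needs_allow_filtering(partition_key, clustering, restrictions):
    """Return True if a WHERE clause would need ALLOW FILTERING (simplified rules).

    restrictions maps a column name to its operator ("=", ">", "<", ...).
    Simplified rules:
      1. every partition key column must be restricted with "=";
      2. clustering columns must be restricted in order, and only the last
         restricted one may use a range operator;
      3. a restriction on any other column requires filtering.
    """
    if not restrictions:
        return False  # a plain SELECT without WHERE is accepted
    cols = {name.lower(): op for name, op in restrictions.items()}
    for col in partition_key:
        if cols.pop(col, None) != "=":
            return True
    gap_seen = range_seen = False
    for col in clustering:
        op = cols.pop(col, None)
        if op is None:
            gap_seen = True
            continue
        if gap_seen or range_seen:
            return True
        if op != "=":
            range_seen = True
    return bool(cols)  # anything left is a non-key column


queries = {
    "Query1": (["ville"], ["annee"], {"ville": "=", "annee": ">"}),
    "Query2": (["ville"], ["annee"], {"annee": ">"}),
    "Query3": (["ville"], ["annee"], {"ville": "=", "nbClient": ">"}),
    "Query4": (["annee"], ["ville"], {"annee": ">"}),
}
results = {name: needs_allow_filtering(*args) for name, args in queries.items()}
print(results)
```

Under these rules only Query1 is accepted as is: it pins the partition key with an equality and applies the range to the clustering column.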


### Force Cassandra to run the expensive queries using "ALLOW FILTERING"
#### Inefficient ==> may scan every partition and therefore every node in the cluster


In [11]:
# query2
query = "select * from magasin_byVille_2 WHERE annee>2018 ALLOW FILTERING"

try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    # Iterate only when the query succeeded
    for row in rows:
        print(row.ville, row.annee, row.nbclient)

print("-------------------------------")

# query3
query = "select * from magasin_byVille_2 WHERE ville='oran' and nbClient>1000 ALLOW FILTERING"

try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    for row in rows:
        print(row.ville, row.annee, row.nbclient)

print("-------------------------------")

# query4
query = "select * from magasin_byAnnee_2 WHERE annee>2018 ALLOW FILTERING"

try:
    rows = session.execute(query)
except Exception as e:
    print(e)
else:
    for row in rows:
        print(row.ville, row.annee, row.nbclient)

sba 2019 900
sba 2020 1200
oran 2019 4000
oran 2020 2900
-------------------------------
oran 2017 3600
oran 2019 4000
oran 2020 2900
-------------------------------
oran 2019 4000
sba 2019 900
oran 2020 2900
sba 2020 1200
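
The cost difference between Query1 and the filtered Query2 can be illustrated with an in-memory model of magasin_byVille_2. This is only a sketch under simplifying assumptions: each partition is a Python list kept sorted by the clustering column, and `query_partition` and `query_filtering` are hypothetical helpers that also report how many partitions they touched.

```python
from bisect import bisect_right

# Partition key (ville) -> rows sorted by clustering column (annee), as (annee, nbclient).
magasin_by_ville_2 = {
    "sba": [(2017, 1300), (2018, 100), (2019, 900), (2020, 1200)],
    "oran": [(2017, 3600), (2018, 200), (2019, 4000), (2020, 2900)],
}


def query_partition(table, ville, annee_gt):
    """ville = ? AND annee > ?: one partition, binary search on the sorted rows."""
    rows = table[ville]
    start = bisect_right(rows, (annee_gt, float("inf")))
    result = [(ville, annee, nb) for annee, nb in rows[start:]]
    return result, 1  # partitions touched


def query_filtering(table, annee_gt):
    """annee > ? alone: every partition is visited and every row is checked."""
    result = []
    for ville, rows in table.items():
        result.extend((ville, annee, nb) for annee, nb in rows if annee > annee_gt)
    return result, len(table)  # partitions touched


targeted, touched_1 = query_partition(magasin_by_ville_2, "sba", 2018)
filtered, touched_2 = query_filtering(magasin_by_ville_2, 2018)
print(targeted, touched_1)
print(filtered, touched_2)
```

With two cities the difference is small, but the filtered query touches every partition, so its cost grows with the whole table rather than with the size of the result.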


### Drop the tables

In [12]:
query = "drop table IF EXISTS  magasin_byVille"
rows = session.execute(query)

query = "drop table IF EXISTS magasin_byVille_2"
rows = session.execute(query)

query = "drop table IF EXISTS magasin_byAnnee"
rows = session.execute(query)

query = "drop table IF EXISTS magasin_byAnnee_2"
rows = session.execute(query)

### Close the session and the cluster connection

In [None]:
session.shutdown()
cluster.shutdown()