# Alternatives

MongoDB is the most popular document-oriented database. However, depending on the data you want to store and the project you are working on, you might want to use another database. We will discuss different alternatives to MongoDB and their strengths and weaknesses. We won't go into much detail on how to install each of them or how to interact with them using Python, but feel free to try them on your own!

## Redis

Redis is a key-value store. It is an open-source (BSD-licensed up to version 7.2), in-memory data structure store used as a database, cache and message broker. It keeps the data in RAM and saves it to disk according to configurable rules (for example after a given number of writes within a given time) and when you shut the server down. This means that the size of the database is limited by your RAM.

Read more:
- https://redis.io/topics/faq
- https://redis.io/topics/persistence

Easy install:
- windows https://redislabs.com/ebook/appendix-a/a-3-installing-on-windows/a-3-2-installing-redis-on-window/
- macos https://redislabs.com/ebook/appendix-a/a-2-installing-on-os-x/
- linux https://redislabs.com/ebook/appendix-a/a-1-installation-on-debian-or-ubuntu-linux/

Once your redis-server is running, you can interact with it from Python with the redis package:

In [1]:
import redis
import json
redis_server = redis.Redis("localhost", db=0)
redis_server.set("name", "orkb")
redis_server.get("name")

user = {"Name":"Pradeep", "Company":"SCTL", "Address":"Mumbai", "Location":"RCP"}
user_updated = {"Name":"Pradeep", "Company":"Google", "Address":"Mumbai", "Location":"RCP"}
# create
redis_server.set("pythonDict", json.dumps(user))
# read
print(redis_server.get("pythonDict"))
# update
redis_server.set("pythonDict", json.dumps(user_updated))
# delete
redis_server.delete("pythonDict")

b'{"Name": "Pradeep", "Company": "SCTL", "Address": "Mumbai", "Location": "RCP"}'


1
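
As the output above shows, redis_server.get returns raw bytes, so a JSON value has to be decoded before it can be used as a dict again. Below is a minimal sketch of two helpers, assuming a redis-py client such as redis_server. set_json and get_json are hypothetical names and are not part of redis-py. The last lines decode the bytes printed above and do not need a server.

```python
import json


def set_json(client, key, obj):
    """Store obj under key as a JSON string (client: e.g. a redis.Redis instance)."""
    return client.set(key, json.dumps(obj))


def get_json(client, key):
    """Read key and decode the stored JSON; return None when the key does not exist."""
    raw = client.get(key)  # bytes, or None for a missing key
    return None if raw is None else json.loads(raw)


# Decoding the bytes printed above works without a running server
raw = b'{"Name": "Pradeep", "Company": "SCTL", "Address": "Mumbai", "Location": "RCP"}'
user = json.loads(raw)  # json.loads accepts UTF-8 encoded bytes directly
print(user["Company"])  # SCTL
```

With the server from the cell above, set_json(redis_server, "pythonDict", user) stores the dict, and get_json(redis_server, "pythonDict") returns it as a dict again. Alternatively, you can create the client with redis.Redis("localhost", db=0, decode_responses=True). In that case get returns str instead of bytes.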

### TODO Save the papers created in chapter I (ai_papers.json) into Redis, update one of the papers by adding a random vector, delete another paper from the DB and read the rest.

## Neo4j


To install Neo4j:
- https://neo4j.com/download-thanks/?edition=community
- https://riptutorial.com/neo4j/example/13244/installation---starting-a-neo4j-server
- https://www.youtube.com/watch?v=3JMhX1sT98U

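Then start the server with one of:
- neo4j console (runs the server in the foreground, in the current terminal)
- neo4j start (runs the server in the background)

and open http://localhost:7474/ in your browser.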


In [2]:
# First steps in py2neo
import py2neo

# Connect to the graph db. The default credentials are auth=("neo4j", "neo4j");
# Neo4j Browser (http://localhost:7474/) asks you to change the password at the first login

graph = py2neo.Graph("bolt://localhost:7687", auth=("neo4j", "tutorial"), name="neo4j")

# py2neo.Node: the first argument is the node label, followed by any number of properties
Pierre = py2neo.Node("Person", name="Pierre", age=25)
Agathe = py2neo.Node("Person", name="Agathe", age=27)
Kevin = py2neo.Node("Person", name="Kevin", age=24)

# You need to commit these nodes before they appear in the db
# We will do that by using a transaction

transaction = graph.begin()
ab = py2neo.Relationship(Pierre, "COLLABORATED", Kevin, n_collab = 3)
ba = py2neo.Relationship(Kevin, "COLLABORATED", Pierre, n_collab = 2)
transaction.create(Pierre|Agathe|Kevin)
transaction.create(ab)
transaction.create(ba)
transaction.commit()

print(graph.exists(ab))

True


### TODO Create a graph of the M1 class with the students' names, linking the groups of 3 students who collaborated (undirected relationship).

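You can also define your own node class with specific properties, which gives a more structured model. py2neo's OGM (Object-Graph Mapping) module provides the GraphObject base class for this.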

In [3]:
import py2neo
from py2neo import ogm
import numpy as np
import tqdm

graph = py2neo.Graph("bolt://localhost:7687", auth=("neo4j", "tutorial"), name="neo4j")
# Delete every node and relationship
graph.delete_all()

# The class Person inherits from ogm.GraphObject
class Person(ogm.GraphObject):
    __primarykey__ = "id_"

    id_ = ogm.Property()
    name = ogm.Property()
    age = ogm.Property()

    def __init__(self):
        # Keep a handle on the underlying py2neo Node
        self.node = self.__ogm__.node

# Names used for random attribution
names = ["Pierre", "Kevin", "Agathe"]

transaction = graph.begin()
for i in tqdm.tqdm(range(1000000)):
    # Create an instance of the class and set its properties
    ind = Person()
    ind.id_ = i
    ind.name = names[np.random.randint(len(names))]
    ind.age = int(np.random.randint(low=20, high=60))

    # Add the node to the current transaction
    transaction.create(ind.node)

    # Commit the transaction and start a new one every 1000 nodes
    if (i + 1) % 1000 == 0:
        transaction.commit()
        transaction = graph.begin()

# Commit the remaining nodes
transaction.commit()

  0%|                                                                        | 1619/1000000 [00:10<1:48:34, 153.26it/s]


KeyboardInterrupt: 

Works pretty well, but it is slow: about 150 nodes per second here, so almost two hours for a million nodes. To overcome this problem, you can write Cypher queries yourself. Cypher is Neo4j's graph query language. Like SQL, it is declarative and textual, but it is designed for graphs. Sending one Cypher query that creates a whole batch of nodes is much faster than creating the nodes one by one through the py2neo objects. For a small project you can skip Cypher, but if you are really interested in Neo4j, I strongly recommend learning it. We will now go through the CRUD operations with Cypher queries: https://neo4j.com/docs/cypher-refcard/current/

### CREATE

In [6]:
import numpy as np
import tqdm
import py2neo

graph = py2neo.Graph("bolt://localhost:7687", auth=("neo4j", "tutorial"), name="neo4j")
graph.delete_all()

# Plain names: they are sent as query parameters, so no extra quotes are needed
names = ["Pierre", "Kevin", "Agathe"]

# Execute a query with graph.run
# This query creates a uniqueness constraint on the id_ of a Person, i.e. a primary key
# (Neo4j 4.x syntax; Neo4j 5 uses CREATE CONSTRAINT FOR (n:Person) REQUIRE n.id_ IS UNIQUE)
try:
    graph.run("CREATE CONSTRAINT ON (n:Person) ASSERT n.id_ IS UNIQUE")
except Exception:
    print("Constraint already exists")

# UNWIND turns the $json list parameter into one row per dict,
# so a whole batch of persons is created by a single query
transaction = "UNWIND $json AS data CREATE (n:Person) SET n = data"

# List of persons (one dict per person) that we send in one query
transaction_list = []
for i in tqdm.tqdm(range(1000000)):
    name = names[np.random.randint(len(names))]
    age = int(np.random.randint(low=20, high=60))
    transaction_list.append({"id_": i, "name": name, "age": age})
    # Send the batch every 1000 persons
    if (i + 1) % 1000 == 0:
        graph.run(transaction, json=transaction_list)
        transaction_list = []

# Send the remaining persons (if any)
if transaction_list:
    graph.run(transaction, json=transaction_list)

# CREATE relationships
# Values are passed as query parameters ($a, $b, $n) instead of being formatted into the string
for i in range(10):
    query = """MATCH (a:Person), (b:Person)
    WHERE a.id_ = $a AND b.id_ = $b
    CREATE (a)-[r:COLLAB]->(b)
    """
    graph.run(query, a=i, b=i + 1)

for i in range(21, 25):
    query = """MATCH (a:Person), (b:Person)
    WHERE a.id_ = $a AND b.id_ = $b
    CREATE (a)-[:COLLAB {n_collab: $n}]->(b)
    """
    graph.run(query, a=i, b=i + 1, n=2)
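
The batching pattern used above collects dicts, sends one UNWIND query per 1000 persons, and then flushes the leftovers. Committing every N items with a counter is easy to get off by one (for example, i % 1000 == 0 is already true at i = 0), and it is easy to forget the final partial batch. The pattern can be factored into a small generator that handles both. This is a minimal sketch: batched is a hypothetical helper (Python 3.12 ships a similar itertools.batched, which yields tuples), and persons stands for any iterable of property dicts.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most size items; the last, partial list is yielded too."""
    if size < 1:
        raise ValueError("size must be >= 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Hypothetical usage with py2neo (needs a running Neo4j server):
# query = "UNWIND $json AS data CREATE (n:Person) SET n = data"
# for batch in batched(persons, 1000):
#     graph.run(query, json=batch)

sizes = [len(batch) for batch in batched(range(2500), 1000)]
print(sizes)  # [1000, 1000, 500]
```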


### TODO 18 Create the graph of the M1 class using Cypher queries

In [7]:
import py2neo
from py2neo import ogm
import numpy as np
import tqdm

graph = py2neo.Graph("bolt://localhost:7687", auth=("neo4j", "tutorial"), name="neo4j")


### READ

In [8]:
# READ

import tqdm
import py2neo

graph = py2neo.Graph("bolt://localhost:7687", auth=("neo4j", "tutorial"), name="neo4j")


query = """
MATCH(person:Person)
RETURN person.name AS name, person.age AS age
"""

# graph.run returns a Cursor, an iterator that yields one Record per row
data = graph.run(query)

# Iterate through all records: the for loop stops cleanly when the cursor is
# exhausted, whereas calling next() in a "while True" loop ends with StopIteration
for record in tqdm.tqdm(data, total=1000000):
    name = record["name"]
    age = record["age"]
    # print(name, age)


100%|██████████████████████████████████████████████████████████████████████▉| 999001/1000000 [03:50<00:00, 4698.70it/s]
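
Streaming every node into Python took almost four minutes here. When you only need summary numbers, you can let Cypher aggregate with count and avg, so the database returns one row per name instead of one row per person. This is a sketch that assumes the Person nodes created above. With py2neo, you would send the query with graph.run(AGG_QUERY).to_data_frame(). The plain-Python aggregate function computes the same result on a few illustrative rows (not data from the database), only to make explicit what the query returns.

```python
from collections import defaultdict

# Cypher: Neo4j groups implicitly by the non-aggregated column (name)
# With py2neo (needs a server): graph.run(AGG_QUERY).to_data_frame()
AGG_QUERY = """
MATCH (person:Person)
RETURN person.name AS name, count(*) AS n, avg(person.age) AS avg_age
ORDER BY name
"""


def aggregate(rows):
    """Plain-Python equivalent of AGG_QUERY on (name, age) pairs."""
    ages = defaultdict(list)
    for name, age in rows:
        ages[name].append(age)
    return [
        {"name": name, "n": len(values), "avg_age": sum(values) / len(values)}
        for name, values in sorted(ages.items())
    ]


# Illustrative sample rows (not data from the database)
sample = [("Pierre", 25), ("Agathe", 27), ("Pierre", 35)]
print(aggregate(sample))
# [{'name': 'Agathe', 'n': 1, 'avg_age': 27.0}, {'name': 'Pierre', 'n': 2, 'avg_age': 30.0}]
```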

### UPDATE

In [9]:
## update

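# Pass the new value as a query parameter ($value) instead of formatting it into the string
query = """
    MATCH (person:Person)
    WHERE person.id_ < 10
    SET person.newobs = $value
    RETURN person.id_, person.newobs
    """

# to_data_frame() requires pandas
df = graph.run(query, value=43).to_data_frame()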
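
Formatting values into the query string (with % or .format) works for numbers, but it breaks as soon as a string value contains a quote, and every distinct value produces a different query text. Passing values as parameters avoids both problems: you write $name in the query and give the value as a keyword argument to graph.run, as the UNWIND example does with json=... . Neo4j can also reuse the cached plan of a query whose text does not change. This minimal sketch only compares the two query strings. formatted_query and parameterized_query are hypothetical helpers, and the name O'Brien is just an example value.

```python
def formatted_query(name):
    # The value is pasted into the Cypher text with % formatting
    return "MATCH (p:Person) WHERE p.name = '%s' RETURN p.age" % name


def parameterized_query(name):
    # Constant Cypher text; the value travels separately as the $name parameter
    return "MATCH (p:Person) WHERE p.name = $name RETURN p.age", {"name": name}


# The quote in the name closes the Cypher string literal too early
print(formatted_query("O'Brien"))

query, params = parameterized_query("O'Brien")
print(query)
print(params)
# With py2neo (needs a server): graph.run(query, parameters=params) or graph.run(query, **params)
```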


### DELETE

In [10]:
# Delete

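query = """
    MATCH (person:Person)
    WHERE person.id_ < 10
    DETACH DELETE person
    """

# Keep the returned cursor in a variable instead of leaving it as the cell's last expression
cursor = graph.run(query)

When graph.run(query) was left as the last expression of the cell, it showed "ValueError: Missing keys" here. The error most likely comes from the notebook trying to display the returned cursor, which has no columns because this query returns nothing.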

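Creating a new database is harder with the Community Edition of Neo4j than with the Enterprise Edition, which supports several databases and the CREATE DATABASE command. With the Community Edition: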

1. Edit the file NEO4J_HOME\conf\neo4j.conf

2. Uncomment the line: dbms.default_database=neo4j

3. Replace neo4j with the name you want for the new database. Note: names must have between 3 and 63 characters. For example: dbms.default_database=mydatabase

4. Save the file

5. (If applicable) Stop the database server and close the browser window with the Neo4j UI

6. Start the Neo4j server and open a new browser window pointing, as usual, to http://localhost:7474/

7. Both the old (default) database "neo4j" and the one you just created will show up. However, attempting to switch between them causes an error. To switch, repeat the steps above starting from step 3.
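
Steps 2 to 4 can also be done with a short script. This is a minimal sketch that assumes neo4j.conf contains the setting at most once, as a line dbms.default_database=... that may be commented out with #. set_default_database is a hypothetical helper that only rewrites the text, so you still have to stop and restart the server yourself (steps 5 and 6).

```python
import re

# Matches the setting line, commented out ("#dbms.default_database=...") or not
SETTING = re.compile(r"^#?[ \t]*dbms\.default_database=.*$", re.MULTILINE)


def set_default_database(conf_text, name):
    """Return conf_text with dbms.default_database set to name (steps 2 and 3).

    The first matching line is uncommented and rewritten; if there is none,
    the setting is appended at the end of the text.
    """
    if not 3 <= len(name) <= 63:
        raise ValueError("database names must have between 3 and 63 characters")
    line = "dbms.default_database=" + name
    if SETTING.search(conf_text):
        # A function as replacement inserts name literally (no backslash escapes)
        return SETTING.sub(lambda match: line, conf_text, count=1)
    return conf_text.rstrip("\n") + "\n" + line + "\n"


# Hypothetical usage on the real file (the path depends on your installation):
# path = "NEO4J_HOME/conf/neo4j.conf"
# with open(path) as f:
#     text = f.read()
# with open(path, "w") as f:
#     f.write(set_default_database(text, "movie"))

example = "# some other setting\n#dbms.default_database=neo4j\n"
print(set_default_database(example, "movie"))
```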

### TODO 19 Change the default database from "neo4j" to "movie". Then run :play movie-graph in the Neo4j Browser console, copy the queries it shows into the console and run them. Congrats, you just imported a brand new sample dataset! Try to answer the following questions with Cypher queries.

How many actors played in more than 2 movies? Get their names and the year they were born.

In how many movies did Tom Hanks play?

On average, how many movies does an actor play in?

On average, how many actors are there in a movie?

On average, how many writers are needed to write a movie?

What proportion of writers also directed the movie they wrote?

Make a histogram of the movies' release years.

Make a histogram of the persons.

