# CosmosDB Graph Database

This notebook shows how to connect to [**Azure CosmosDB**](https://azure.microsoft.com/en-us/services/cosmos-db/) and use it with the [Apache TinkerPop Gremlin API](https://tinkerpop.apache.org/) to model data as **graphs** through the [**CosmosDB Graph API**](https://docs.microsoft.com/en-us/azure/cosmos-db/graph-introduction).

The goal is to show how easy it is to create a **Serverless** Graph Database in Azure to model data as graphs.
Graphs are extremely useful when dealing with complex relations, non-fixed schemas and large data sets.


We will use Azure CosmosDB Graph API to create a simple Graph Database using Python.

Please refer to the article for this notebook: 


## Setup (Optional)

If you want to create your own DB, follow these instructions:

- Log in to your Azure Portal
- Create a database account by following the official [**quickstart**](https://docs.microsoft.com/en-us/azure/cosmos-db/create-graph-python#create-a-database-account)
- Add a [graph](https://docs.microsoft.com/en-us/azure/cosmos-db/create-graph-python#add-a-graph)

Follow the detailed instructions in the article.


## Demo

Now that you have created a graph database in Azure, let's model some data.

A graph is a structure that's composed of vertices and edges. Both objects can have an arbitrary number of key-value pairs as properties.

- **Vertices/nodes**: Vertices denote discrete entities, such as a person, a place, or an event.

- **Edges/relationships**: Edges denote relationships between vertices. For example, a person might know another person, be involved in an event, or have recently been at a location.

- **Properties**: Properties express information about the vertices and edges. There can be any number of properties in either vertices or edges, and they can be used to describe and filter the objects in a query. Example properties include a vertex that has name and age, or an edge, which can have a time stamp and/or a weight.

- **Label**: A label is a name or the identifier of a vertex or an edge. Labels can group multiple vertices or edges such that all the vertices/edges in a group have a certain label. For example, a graph can have multiple vertices of label type "person".
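
To make these four terms concrete, here is a minimal in-memory sketch in plain Python. It is not how CosmosDB stores a graph; the `Vertex` and `Edge` classes and the sample people are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Vertex:
    id: str
    label: str  # groups vertices, e.g. "person"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str   # names the relationship, e.g. "knows"
    source: str  # id of the vertex the edge leaves
    target: str  # id of the vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


# Two vertices that share the label "person", each with its own properties
alice = Vertex("p1", "person", {"name": "Alice", "age": 34})
bob = Vertex("p2", "person", {"name": "Bob", "age": 29})

# One edge between them, carrying a weight property
knows = Edge("knows", alice.id, bob.id, {"weight": 0.8})

# Labels let us select a group of vertices
people = [v for v in (alice, bob) if v.label == "person"]
print([v.properties["name"] for v in people])
```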

Graph databases are often included within the NoSQL or non-relational database category, since there is no dependency on a schema or constrained data model. This lack of schema allows for modeling and storing connected structures naturally and efficiently.

![Cosmos DB](cosmos.png)



As an example, we are going to model a very simplified version of Medium.com!

The goal is to show how to create nodes and several relations. This is what we are going to model. 

Note that this is not the correct model for Medium.com, just an approximation.

![Cosmos DB](demo.png)


We have users, who may have memberships. A user who owns a membership may be interested in some topics. A user may read and clap for articles, and may also comment on an article or publish a new one. An article has an id, title, body and tags. User, Membership, Article and Topic are the labels of the nodes.

We can use the [**Gremlin Console**](https://docs.microsoft.com/en-us/azure/cosmos-db/create-graph-gremlin-console) to add and query data using the browser. We can also use .NET, Java or other programming languages.

In this notebook we will use **Python** to create the graph. Then, you can use the built-in Data Explorer to visualize the graph and execute Gremlin queries.

Let's start by installing and importing the required libs. Note that we use the official Gremlin libs which are fully supported by Azure CosmosDB.

In [7]:
pip install gremlinpython nest_asyncio

Note: you may need to restart the kernel to use updated packages.


Note that you need to restart the notebook kernel after installing these packages.
The client uses an async API with an event loop, and Jupyter already runs its own event loop, so we need support for nested event loops. To get it, we install [nest_asyncio](https://github.com/erdewit/nest_asyncio).
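
The nested event loop problem can be reproduced with the standard library alone. The sketch below does not use `nest_asyncio` or the Gremlin client; `fetch_value` and `caller` are hypothetical names. It is meant to be run as a standalone script, outside Jupyter, because the top-level `asyncio.run` call would itself fail inside an already running notebook loop.

```python
import asyncio


async def fetch_value():
    return 42


async def caller():
    # We are already inside a running event loop here, which is the
    # situation a blocking async client finds itself in under Jupyter.
    coro = fetch_value()
    try:
        asyncio.run(coro)  # tries to start a second loop on the same thread
    except RuntimeError as err:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(err)
    return "no error"


message = asyncio.run(caller())
print(message)
```

Plain asyncio refuses to start a loop from inside a running one, and this is the restriction that `nest_asyncio` lifts.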

Let's add the imports!

In [9]:
from gremlin_python.driver import client, serializer
import nest_asyncio
nest_asyncio.apply()

Let's initialize the client...

Enter the URL for the GREMLIN ENDPOINT and the primary key.

The username is in this format: `/dbs/{database}/colls/{graph}`
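
As a small sketch of that format, the username can be assembled from the database and graph names. The helper name `cosmos_gremlin_username` is hypothetical; the values `demodb` and `dgraph` are the ones used by this demo.

```python
def cosmos_gremlin_username(database, graph):
    """Build the username string in the /dbs/{database}/colls/{graph} format."""
    return "/dbs/{0}/colls/{1}".format(database, graph)


username = cosmos_gremlin_username("demodb", "dgraph")
print(username)
```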

In [None]:
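# Note: this assignment rebinds the name `client` from the imported module
# to the Client instance, so run this cell only once per kernel session.
client = client.Client('wss://gdemo.gremlin.cosmos.azure.com:443/', 'g',
                       username="/dbs/demodb/colls/dgraph",
                       password="<ENTER_PRIMARY_KEY>",
                       message_serializer=serializer.GraphSONSerializersV2d0()
                       )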

Let's define a simple function to execute any Gremlin query. Since this is a demo, we will block on each call until it completes...

In [11]:
def run(queries):
    for query in queries:
        print("\tRunning: {0}".format(query))
        try:
            # submitAsync returns a future; result() blocks until the server responds
            callback = client.submitAsync(query)
            results = callback.result()
            if results is not None:
                print("\tSuccess!")
            else:
                print("Something went wrong with this query: {0}".format(query))
        except Exception as e:
            print('There was an exception: {0}'.format(e))
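
The `run` helper only reports success or failure. If you also want the data a query returns, a variant along these lines may help. This is a sketch, not part of the original notebook: `run_and_collect` is a hypothetical name, and it assumes the client follows the gremlinpython pattern where `submit(query)` returns a result set whose `all()` gives a future.

```python
def run_and_collect(gremlin_client, query):
    """Submit one Gremlin query and return all of its results as a list."""
    # Assumption: submit() returns a result set, as in gremlinpython
    result_set = gremlin_client.submit(query)
    # all() returns a future; result() blocks until every result has arrived
    return result_set.all().result()


# Example usage once the client above is connected:
# names = run_and_collect(client, "g.V().hasLabel('User').values('name')")
```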

**Optional**: Clean up the DB

In [49]:
run(["g.V().drop()"])

	Running: g.V().drop()
	Success!


#### Let's add some data. 

First, we define the queries. Let's add a couple of users...

In [50]:
init_user = [
    "g.addV('User').property('id', 'user1').property('name','User 1').property('pk_id', '1')",
    "g.addV('User').property('id', 'user2').property('name','User 2').property('pk_id', '2').property('location','EU')"
]

Note that we add an ID so that we can easily reference the data; otherwise a UUID would be generated, which is hard to interpret.
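
Writing these query strings by hand gets repetitive, so they can also be generated. The sketch below builds an `addV` query from a label and a dictionary of properties. The helper names, the extra user `user3` and its `pk_id` value are hypothetical, every value is emitted as a string, and the quote escaping is a simple assumption rather than a complete Gremlin escaping routine.

```python
def gremlin_literal(value):
    """Quote a value as a single-quoted Gremlin string literal."""
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'{0}'".format(escaped)


def add_vertex_query(label, properties):
    """Build a g.addV(...) query with one .property(...) step per entry."""
    query = "g.addV({0})".format(gremlin_literal(label))
    for key, value in properties.items():
        query += ".property({0}, {1})".format(
            gremlin_literal(key), gremlin_literal(value))
    return query


user_query = add_vertex_query(
    "User", {"id": "user3", "name": "User 3", "pk_id": "10"})
print(user_query)
```

The resulting string can be passed to `run` in a list, just like the hand-written queries.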

Let's run the queries:

In [51]:
run(init_user)

	Running: g.addV('User').property('id', 'user1').property('name','User 1').property('pk_id', '1')
	Success!
	Running: g.addV('User').property('id', 'user2').property('name','User 2').property('pk_id', '2').property('location','EU')
	Success!


Next, let's add a membership for User 1:

In [52]:
run(["g.addV('Membership').property('id', 'mem1').property('name','Membership 1').property('pk_id', '3').property('type','premium')"])

	Running: g.addV('Membership').property('id', 'mem1').property('name','Membership 1').property('pk_id', '3').property('type','premium')
	Success!


Let's add our first relation between User 1 and Membership 1:

In [53]:
run(["g.V('user1').addE('has').to(g.V('mem1'))"])

	Running: g.V('user1').addE('has').to(g.V('mem1'))
	Success!
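
The edge query follows a fixed pattern: start at the source vertex, add an edge with a label, and point it at the target vertex. A minimal sketch of a builder for that pattern follows; `add_edge_query` and `_quote` are hypothetical helper names, and the optional edge properties are emitted as strings.

```python
def _quote(value):
    """Quote a value as a single-quoted Gremlin string literal."""
    return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"


def add_edge_query(source_id, label, target_id, properties=None):
    """Build g.V(source).addE(label).to(g.V(target)) plus optional properties."""
    query = "g.V({0}).addE({1}).to(g.V({2}))".format(
        _quote(source_id), _quote(label), _quote(target_id))
    for key, value in (properties or {}).items():
        query += ".property({0}, {1})".format(_quote(key), _quote(value))
    return query


has_membership = add_edge_query("user1", "has", "mem1")
print(has_membership)
```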


Next, we add some topics:

In [54]:
init_topics = [
    "g.addV('Topic').property('id', 'topic1').property('name','Machine Learning').property('pk_id', '4')",
    "g.addV('Topic').property('id', 'topic2').property('name','Big Data').property('pk_id', '5')",
    "g.addV('Topic').property('id', 'topic3').property('name','Scala').property('pk_id', '6')"
]
run(init_topics)

	Running: g.addV('Topic').property('id', 'topic1').property('name','Machine Learning').property('pk_id', '4')
	Success!
	Running: g.addV('Topic').property('id', 'topic2').property('name','Big Data').property('pk_id', '5')
	Success!
	Running: g.addV('Topic').property('id', 'topic3').property('name','Scala').property('pk_id', '6')
	Success!


And some articles:

In [55]:
init_articles = [
    "g.addV('Article').property('id', 'article1').property('name','The Secrets of NLP').property('pk_id', '7').property('body', 'blah').property('tags','nlp,ml,python')",
    "g.addV('Article').property('id', 'article2').property('name','Apache Spark Optimizations').property('pk_id', '8').property('body', 'blah').property('tags','none')",
    "g.addV('Article').property('id', 'article3').property('name','Introduction to Scala').property('pk_id', '9').property('body', 'blah')"
]
run(init_articles)

	Running: g.addV('Article').property('id', 'article1').property('name','The Secrets of NLP').property('pk_id', '7').property('body', 'blah').property('tags','nlp,ml,python')
	Success!
	Running: g.addV('Article').property('id', 'article2').property('name','Apache Spark Optimizations').property('pk_id', '8').property('body', 'blah').property('tags','none')
	Success!
	Running: g.addV('Article').property('id', 'article3').property('name','Introduction to Scala').property('pk_id', '9').property('body', 'blah')
	Success!


Let's add relations to topics: we relate the membership to the topics it likes, and each article to its topics:

In [56]:
add_topic_relations = [
    "g.V('mem1').addE('likes').to(g.V('topic1'))",
    "g.V('mem1').addE('likes').to(g.V('topic2'))",
    "g.V('article1').addE('has').to(g.V('topic1'))",
    "g.V('article2').addE('has').to(g.V('topic2'))",
    "g.V('article2').addE('has').to(g.V('topic1'))",
    "g.V('article3').addE('has').to(g.V('topic3'))",
]
run(add_topic_relations)

	Running: g.V('mem1').addE('likes').to(g.V('topic1'))
	Success!
	Running: g.V('mem1').addE('likes').to(g.V('topic2'))
	Success!
	Running: g.V('article1').addE('has').to(g.V('topic1'))
	Success!
	Running: g.V('article2').addE('has').to(g.V('topic2'))
	Success!
	Running: g.V('article2').addE('has').to(g.V('topic1'))
	Success!
	Running: g.V('article3').addE('has').to(g.V('topic3'))
	Success!


Next, let's add read and publish relations:

In [57]:
read_pub_relations = [
    "g.V('user1').addE('reads').to(g.V('article1'))",
    "g.V('user1').addE('reads').to(g.V('article3'))",
    "g.V('user2').addE('reads').to(g.V('article2'))",
    "g.V('user2').addE('publish').to(g.V('article1')).property('date', '2020-11-11')",
    "g.V('user2').addE('publish').to(g.V('article3')).property('date', '2019-11-11')",
    "g.V('user1').addE('publish').to(g.V('article2')).property('date', '2020-11-12')"
]
run(read_pub_relations)

	Running: g.V('user1').addE('reads').to(g.V('article1'))
	Success!
	Running: g.V('user1').addE('reads').to(g.V('article3'))
	Success!
	Running: g.V('user2').addE('reads').to(g.V('article2'))
	Success!
	Running: g.V('user2').addE('publish').to(g.V('article1')).property('date', '2020-11-11')
	Success!
	Running: g.V('user2').addE('publish').to(g.V('article3')).property('date', '2019-11-11')
	Success!
	Running: g.V('user1').addE('publish').to(g.V('article2')).property('date', '2020-11-12')
	Success!


Finally, let's add a comment:

In [59]:
run(["g.V('user1').addE('comment').to(g.V('article1')).property('comment', 'Great Job!')"])

	Running: g.V('user1').addE('comment').to(g.V('article1')).property('comment', 'Great Job!')
	Success!


Awesome, now you have some data in your graph database! If you go to the Data Explorer, you should be able to visualize it:
![Cosmos DB](result.png)
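
To get a feel for what a traversal over this data returns, the sketch below rebuilds the edges from the cells above as plain Python tuples and follows them in memory. It does not talk to CosmosDB. The `out` function is a hypothetical stand-in that mimics Gremlin's `out(label)` step, so `out(["user1"], "reads")` corresponds roughly to the Gremlin query `g.V('user1').out('reads')`.

```python
from collections import defaultdict

# (source id, edge label, target id) triples copied from the cells above
edges = [
    ("user1", "has", "mem1"),
    ("mem1", "likes", "topic1"),
    ("mem1", "likes", "topic2"),
    ("article1", "has", "topic1"),
    ("article2", "has", "topic2"),
    ("article2", "has", "topic1"),
    ("article3", "has", "topic3"),
    ("user1", "reads", "article1"),
    ("user1", "reads", "article3"),
    ("user2", "reads", "article2"),
    ("user2", "publish", "article1"),
    ("user2", "publish", "article3"),
    ("user1", "publish", "article2"),
    ("user1", "comment", "article1"),
]

# Index the targets by (source, label) for quick lookups
outgoing = defaultdict(list)
for source, label, target in edges:
    outgoing[(source, label)].append(target)


def out(vertex_ids, label):
    """Follow outgoing edges with the given label from each vertex id."""
    return [t for v in vertex_ids for t in outgoing[(v, label)]]


# Articles read by User 1
read_by_user1 = out(["user1"], "reads")
# Topics of those articles, without duplicates
topics_of_reads = sorted(set(out(read_by_user1, "has")))
# Topics liked through User 1's membership
liked_topics = out(out(["user1"], "has"), "likes")

print(read_by_user1, topics_of_reads, liked_topics)
```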

Great Job, now you know a bit more about graph databases!

Let's clean up before we go...

In [61]:
run(["g.V().drop()"])
client.close()

	Running: g.V().drop()
	Success!
