# Lecture 8: Graph exercise/MongoDB introduction

Gittu George, February 3, 2022

## Agenda

- Graph exercise 
- Introduction to NoSQL document databases (MongoDB)
- Basic queries in MongoDB

## Objectives

- Apply your CQL skills to a real-world problem (Twitter data)
- Understand document databases and MongoDB
- Set up a MongoDB instance in the cloud
- Write basic MQL (MongoDB Query Language) queries

## Worksheet 8

***For any graph question, keep two rules in mind. Can you...***
- Break the question into parts? For this worksheet:
  - the rate at which people reply
  - and the rate at which they are replied to
- Draw the logic with pen and paper? (This helps with complex scenarios and with identifying interesting questions.)

```{toggle}
Did you draw it?

![](img/graphtwi.png)
```

***Here are some of my thoughts on formulating a CQL query.***

- What are the elements that I need to RETURN?
    - gotreply (the rate at which people reply), 
    - sentreply (and are replied to)
- Where can I get these elements? (Check `CALL apoc.meta.data()`)
- What pattern do I need to give in my MATCH? 
    - I need to be careful [with the direction](https://ggeorg02.github.io/BAIT580/lectures/lecture8.html#relationships).
- Do I need to provide multiple MATCH clauses/patterns? 
    - Maybe multiple MATCH for subquerying
- Is any [subquery needed?](https://ggeorg02.github.io/BAIT580/lectures/lecture10.html#limiting-the-number-of-results) 
    - Maybe YES, one for the first part (gotreply)
    - and another for the second part (sentreply)
- Do I have to do any aggregation? 
    - YES, [sum()](https://neo4j.com/docs/cypher-manual/current/functions/aggregating/#functions-sum), so what property needs to be inside this? 
- Okay, I need to do aggregation, but what should be [the grouping key?](https://ggeorg02.github.io/BAIT580/lectures/lecture10.html#aggregation-in-cypher)
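
One way to sanity-check the grouping and aggregation logic before writing any CQL is to work it out on a toy example in plain Python. The sketch below is not the worksheet answer: the edge list, the user names, and the `reply_counts` helper are all hypothetical, and it only counts replies, whereas the real query may need a different property inside `sum()`.

```python
from collections import Counter

# Hypothetical toy data: each pair is (user who replied, user who was replied to).
replies = [
    ("ana", "bob"),
    ("ana", "cam"),
    ("bob", "ana"),
    ("cam", "ana"),
    ("cam", "ana"),
]

def reply_counts(edges):
    """Group by user and aggregate the two parts of the question separately."""
    sent = Counter(sender for sender, _ in edges)      # part 1: replies a user sent
    received = Counter(target for _, target in edges)  # part 2: replies a user got
    users = sorted(set(sent) | set(received))
    return {user: {"sent": sent[user], "received": received[user]} for user in users}

for user, counts in reply_counts(replies).items():
    print(user, counts)
```

Here the user is the grouping key, and each part of the question gets its own aggregation, which mirrors the idea of one subquery per part.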

## MongoDB

<img src="img/mongo.png" width="400">

MongoDB is a document-based DBMS:

- Released in 2009
- Written in C++
- Source available (open source under the AGPL until 2018; the Community Server now uses the SSPL)
- Cross platform
- Designed for high performance

MongoDB is based on JSON-like documents for data storage. It offers:

- Native replication and sharding
- Automatic scaling and load balancing
- Multi-language support
- Powerful query language

### Who uses MongoDB

Google, eBay, Craigslist, Toyota, Forbes, Electronic Arts, Adobe, AstraZeneca, and [the list goes on](https://www.mongodb.com/who-uses-mongodb).

### JSON

- JSON is short for JavaScript Object Notation.
- JSON documents are simple containers in which a string key is mapped to a value (e.g. a number, string, boolean, array, or another object).

```json
{
  "_id": 1,
  "name" : { "first" : "John", "last" : "Backus" },
  "contribs" : [ "Fortran", "ALGOL", "Backus-Naur Form", "FP" ],
  "awards" : [
    {
      "award" : "W.W. McDowell Award",
      "year" : 1967,
      "by" : "IEEE Computer Society"
    }, {
      "award" : "Draper Prize",
      "year" : 1993,
      "by" : "National Academy of Engineering"
    }
  ]
}
```
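
As a small illustration of how these key-value containers map onto Python, the sketch below parses the same document with the standard-library `json` module. The variable names are only for illustration.

```python
import json

# The same document as above, held as a JSON string.
raw = """
{
  "_id": 1,
  "name": {"first": "John", "last": "Backus"},
  "contribs": ["Fortran", "ALGOL", "Backus-Naur Form", "FP"],
  "awards": [
    {"award": "W.W. McDowell Award", "year": 1967, "by": "IEEE Computer Society"},
    {"award": "Draper Prize", "year": 1993, "by": "National Academy of Engineering"}
  ]
}
"""

doc = json.loads(raw)               # JSON object -> Python dict

last_name = doc["name"]["last"]     # nested object -> nested dict
first_contrib = doc["contribs"][0]  # JSON array -> Python list
award_years = [award["year"] for award in doc["awards"]]

print(last_name, first_contrib, award_years)
```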

JSON documents can be found everywhere:

- APIs
- Configuration files
- Log messages
- Database storage

### BSON

Although JSON may look like a good way to store data **_as is_**, it has several drawbacks:

- JSON is text, and parsing text is slow compared with reading a binary format
- JSON's format is readable but not space-efficient (a database concern)
- JSON supports only a handful of data types (for example, it has no native date or binary type)

For these reasons, MongoDB stores data as BSON (Binary JSON), which addresses the issues above but still looks like JSON when we work with it in MongoDB.

For an overview, see [here](https://www.mongodb.com/json-and-bson).
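
Two of the JSON drawbacks can be seen with the Python standard library alone. This is a rough sketch, not BSON itself: it only shows that the standard `json` encoder has no date type, and that a number stored as text takes more bytes than the same number stored as a fixed-width 32-bit integer (the width BSON uses for its int32 type).

```python
import json
import struct
from datetime import datetime

# 1. Limited types: plain JSON has no date type, so the standard encoder refuses.
try:
    json.dumps({"created": datetime(2022, 2, 3)})
    date_supported = True
except TypeError:
    date_supported = False

# 2. Space: a number stored as text costs one byte per digit ...
as_text = json.dumps(1234567890).encode("utf-8")
# ... while a little-endian 32-bit integer always takes 4 bytes.
as_binary = struct.pack("<i", 1234567890)

print(date_supported, len(as_text), len(as_binary))
```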

### Collections

In MongoDB, a database consists of one or more **collections**, each containing multiple **documents**.

<img src="img/collection.png" width="600">

### Documents

<img src="img/document.png" width="600">

- Each document contains field-value pairs
- The field name `_id` acts as the primary key of each document, and should therefore be unique in a collection
- MongoDB automatically assigns an `_id` value if not specified at the time of inserting a document
- MongoDB creates an index on the `_id` field by default
- The maximum size of a BSON document is 16 MB
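
The `_id` behaviour can be seen from Python with `pymongo`, the driver introduced later in this lecture. This is a minimal sketch: `insert_and_fetch` is a hypothetical helper, and it assumes `collection` is a `pymongo` collection that you are allowed to write to.

```python
def insert_and_fetch(collection, document):
    """Insert one document, then read it back using its primary key."""
    # If `document` has no "_id" field, one is added automatically on insert.
    result = collection.insert_one(document)
    # `inserted_id` holds the "_id" of the stored document; a lookup on "_id"
    # can use the index that MongoDB creates by default.
    return collection.find_one({"_id": result.inserted_id})
```

For example, with a connected `MongoClient` named `client`, calling `insert_and_fetch(client["sandbox"]["people"], {"name": "Ada"})` would return the stored document together with its generated `_id`. The names `sandbox` and `people` are made up for illustration.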

### MongoDB Atlas

[MongoDB Atlas](https://docs.atlas.mongodb.com/) is a fully managed cloud database service that automates configuring, administering, and maintaining a database server for you. You specify what kind of server you need (CPU, RAM, number of nodes, location, etc.), and MongoDB Atlas sets it up for you. Atlas hosts its database instances on Amazon Web Services, Google Cloud Platform, and Microsoft Azure.

Most of these services are paid, but Atlas also offers a basic database service that is **free** and is well suited for learning and exploring. We'll use the free MongoDB Atlas clusters for our course. You can set up your own cluster [here](https://www.mongodb.com/cloud/atlas/register).

```{admonition} See also
[Here are the details](../installation/Mongodb.md) on how to set up your MongoDB Atlas cluster and `mongosh`.
```

### MongoDB interfaces

#### MongoDB shell (`mongosh`)

This is a command-line interface for interacting with a MongoDB database, similar to `psql`, which we've used for Postgres. `mongosh` is based on JavaScript. We will not use `mongosh` much in this course.

#### MongoDB Compass
Compass is a graphical user interface for working with MongoDB databases, similar to pgAdmin, which we've used for Postgres.

#### MongoDB's Python driver (`pymongo`)
Finally, `pymongo` is the official Python driver for MongoDB. If you're using the course `conda` environment, this package is installed and ready to use in JupyterLab. You can take a look at `pymongo`'s documentation [here](https://pymongo.readthedocs.io/en/stable/tutorial.html).

```python
from pymongo import MongoClient
import json
import urllib.parse

# Read the Atlas credentials from a local JSON file (keep it out of version control)
with open('credentials_mongodb.json') as f:
    login = json.load(f)

# Percent-escape the username and password so special characters are safe in the URI
username = urllib.parse.quote_plus(login['username'])
password = urllib.parse.quote_plus(login['password'])
host = login['host']
url = "mongodb+srv://{}:{}@{}/?retryWrites=true&w=majority".format(username, password, host)
```

```python
client = MongoClient(url)
```

```python
client.list_database_names()
```

```
['sample_airbnb',
 'sample_analytics',
 'sample_geospatial',
 'sample_guides',
 'sample_mflix',
 'sample_restaurants',
 'sample_supplies',
 'sample_training',
 'sample_weatherdata',
 'admin',
 'local']
```
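
From here, a first basic query might look like the sketch below. `peek` is a hypothetical helper, not part of `pymongo`; it only wraps two driver calls, `list_collection_names()` and `find_one()`, and it assumes `client` is a connected `MongoClient`.

```python
def peek(client, db_name, collection_name, query=None, fields=None):
    """Return the collection names of a database and one matching document."""
    db = client[db_name]  # databases (and collections) are accessed like dict keys
    names = db.list_collection_names()
    # find_one(filter, projection): the filter selects documents,
    # the projection selects which fields to return.
    sample = db[collection_name].find_one(query or {}, fields)
    return names, sample
```

Assuming the Atlas sample datasets are loaded, `peek(client, "sample_mflix", "movies", {"year": 1999}, {"title": 1, "year": 1, "_id": 0})` would return the collection names in `sample_mflix` together with the title and year of one movie from 1999.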

## Can you?

- Apply your CQL knowledge to a real-life scenario?
- List the benefits of a document database?
- Set up a MongoDB cloud instance?
- Name the various ways to interact with MongoDB?

## Class activity

- Practice CQL.
- Set up MongoDB in the cloud.