# Learning Objectives

- [ ] 3.3.6 Understand how NoSQL database management system addresses the shortcomings of relational database management system (SQL). 
- [ ] 3.3.7 Explain the applications of SQL and NoSQL. 
- [ ] 3.3.8 Use a programming language to work with both SQL and NoSQL databases. 

# References

1. Leadbetter, C., Blackford, R., & Piper, T. (2012). Cambridge international AS and A level computing coursebook. Cambridge: Cambridge University Press.
2. https://www.mongodb.com/compare/mongodb-dynamodb


Recall that a **database** is a collection of related data in which all records have the same structure or, more generally, a collection of data stored in an organised or logical manner.

# 15.1 NoSQL databases
Relational databases (commonly referred to as SQL databases) work well with structured data since each table's **schema** (the precise description of the data to be stored and the relationships between them) is always clearly defined. However, with the increasing number of ways to gather and generate data, we often need to deal with unstructured data.

For example, a convenience store that frequently refreshes the services it provides may sell both mobile phones and groceries. To run the store, information about both mobile phones (e.g., model names and prices) and groceries (e.g., prices and descriptions) needs to be stored. In the future, the store may also start selling mobile phone subscription plans. Storing all this data in the same relational database may not be easy. In this case, non-relational databases, also referred to as NoSQL databases, can offer a better choice.

There are four main types of NoSQL databases:
- key-value databases. Data is stored as a collection of key-value pairs in which each key serves as a unique identifier. Queries are limited to the key **only**, and the value retrieved by a key is opaque, i.e., its contents are not known to the database. E.g., Amazon DynamoDB.
- document databases. Document databases work like a hash table, but each key can point to an embedded key-value structure, also known as a **document**, instead of just a single value. (Recall that in a hash table, each key points to a single value or data item.) E.g., MongoDB.
- wide column databases. Data tables are stored in terms of columns instead of rows.
- graph databases. These databases use graph structures with nodes, edges, and properties to represent and store data, and support semantic queries over them. E.g., Neo4j.
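
The difference between the first two types can be sketched with plain Python dictionaries. This is only an illustration of the two data models, not of how DynamoDB or MongoDB are implemented; the keys and sample values are made up for the example.

```python
# Key-value model: the store only understands keys; each value is an opaque blob.
kv_store = {
    "user:1001": b'{"name": "John Lim", "class": "18S01"}',
}
blob = kv_store["user:1001"]  # lookup by key is the only supported query

# Document model: each key maps to a structured document whose fields are visible.
doc_store = {
    "1001": {"name": "John Lim", "class": "18S01", "hobbies": ["running", "kayaking"]},
    "1002": {"name": "Ben", "age": 15},
}

# Because the fields are visible, we can filter on them and not just on the key.
in_class = [key for key, doc in doc_store.items() if doc.get("class") == "18S01"]
print(type(blob).__name__, in_class)
```

Note that the two documents in `doc_store` do not share the same fields, which is the flexibility the convenience store example calls for.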

# 15.2 Differences between SQL and Document Database

- Relational databases have a **fixed, predefined schema** that their tables follow, whereas NoSQL databases usually have **no predefined schema**: the structure is dynamic and can change easily.
- Relational databases contain tables while document databases like MongoDB contain collections. The data type of each field in a table is fixed for relational databases but is flexible for document databases like MongoDB.
- Relational databases represent data in tables and rows while document databases store data as collections of documents.
- For relational databases, joins are usually used to get data across tables, while for document databases like MongoDB there are usually no such joins. Thus it is usually easier to perform complex queries with relational databases than with NoSQL databases.
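
The schema difference can be seen in a small sketch that uses Python's built-in `sqlite3` module for the relational side and a list of `dict` objects to stand in for a collection of documents. The `Product` table and its columns are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (name TEXT, price REAL)")
conn.execute("INSERT INTO Product VALUES ('Milk', 2.5)")

# A field that only phones need still changes the schema for every row.
conn.execute("ALTER TABLE Product ADD COLUMN model TEXT")
conn.execute("INSERT INTO Product VALUES ('Phone', 899.0, 'X100')")
rows = conn.execute("SELECT name, price, model FROM Product ORDER BY name").fetchall()
conn.close()

# Documents in the same collection may simply carry different fields.
products = [
    {"name": "Milk", "price": 2.5},
    {"name": "Phone", "price": 899.0, "model": "X100"},
]

print(rows)
print([sorted(p) for p in products])
```

In the relational table the grocery row now has an empty `model` column, whereas the grocery document has no `model` field at all.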

# 15.3 MongoDB 

MongoDB is a very popular NoSQL document database, which uses `JSON` (JavaScript Object Notation)-like **documents** to store records. JSON has the format
>```python
> {
>   <attribute_name_1>: <attribute_values_1>,
>   <attribute_name_2>: <attribute_values_2>
>                   ....
> }
>```
which looks like a Python `dict` object.
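
The resemblance can be checked with Python's built-in `json` module, which converts between `dict` objects and JSON text. This is a minimal sketch using the same sample person that appears in the exercises of this section.

```python
import json

person = {"name": "John Lim", "class": "18S01", "hobbies": ["running", "kayaking", "gaming"]}

text = json.dumps(person)    # dict -> JSON text
restored = json.loads(text)  # JSON text -> dict

print(text)
print(restored == person)

# The two notations are close but not identical: JSON always uses double quotes
# and writes true/false/null where Python writes True/False/None.
print(json.dumps({"ok": True, "missing": None}))
```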

The terminology used in MongoDB is a little different from that of SQL. The table below lists SQL terms with the corresponding terms in MongoDB.

<center>

| **SQL Term** | **MongoDB Term** | 
|-|-|
| `Database` | `Database` | 
| `Table` | `Collection` | 
| `Row/Record` | `Document` | 
| `Column/Field/Attribute` | `Field` |

</center>

## 15.3.1 Running MongoDB
After installation, open a command prompt and type `mongo` to run the MongoDB shell (on recent MongoDB versions, which no longer include the legacy `mongo` shell, the command is `mongosh`). To maintain access to the MongoDB databases, you need to **make sure that MongoDB is running**. On a Windows machine, make sure that `mongod.exe` is running in the background. If it isn't, open a command prompt as administrator and type `net start MongoDB`.
<center>
<img src="images/mongo_cmd.gif" width="1080" align="center"/>
</center>

> If you encounter an error, MongoDB folder might not have been added to the PATH environment variable. Click <a href = 'https://dangphongvanthanh.wordpress.com/2017/06/12/add-mongos-bin-folder-to-the-path-environment-variable/' >here</a> for troubleshooting. Remember to add `\` to the `....\bin` when adding to PATH.

> Some errors can be fixed by creating the folder `data\db`.

> The compatible `pymongo` version is 4.6.1. To avoid installing an incompatible version of `pymongo`, use `pip install --force-reinstall -v "pymongo==4.6.1"`.

Some useful commands to run in the MongoDB shell:
- `help` : get the available shell commands
- `show dbs` : show the currently available databases in MongoDB
- `use <db_name>` : set the current database to `<db_name>`
- `db.createCollection(<collection_name>)` : create a collection named `<collection_name>` in the current database
- `db.<collection_name>.insert(<json_obj>)` : once the current database is set, insert a document into `<collection_name>`
- `show collections` : show the available collections in the current database

> Instead of creating a collection with `db.createCollection(<collection_name>)`, you can rely on `db.<collection_name>.insert(<json_obj>)`, which automatically creates the collection when the document is added.

### Exercise 
On MongoDB shell, create a database called `test_info` and insert the following JSON object as a document in the collection `Person` in the database.

>```python 
>{
> 'name':'John Lim',   
> 'class': '18S01',   
> 'hobbies': ['running','kayaking','gaming']   
>}
>```


In [None]:
#YOUR_CODE_HERE
use test_info
db.Person.insert({'name':'John Lim','class': '18S01','hobbies': ['running','kayaking','gaming']})

## 15.3.2 Interacting with MongoDB with `pymongo`

Similar to relational databases, we need to know how to execute the important database operations (CRUD) with MongoDB as well. However, for MongoDB, we will skip the MongoDB shell commands and go straight to `pymongo`, which is a Python library for interacting with MongoDB databases. (As warned earlier, keep MongoDB running or you will encounter errors.)

## 15.3.3 Connecting to MongoDB database with `pymongo` 
Roughly speaking, to work with the database,
1. We first **establish a connection** to the MongoDB server by creating a `pymongo.MongoClient` object for `localhost` with the default port `27017`.
2. Access the database through the client.
3. Access the collection through the database.
4. Perform your queries, insertions, updates and deletions.

### Example 26

The code below illustrates the process of connecting to the database `test_info` and accessing the collection `Person` with `pymongo`.

In [8]:
# We can actually do 
# import pymongo
# but then the connection line becomes client = pymongo.MongoClient('localhost', 27017)

from pymongo import MongoClient 

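try:
    client = MongoClient('localhost', 27017) #localhost is your local computer address 127.0.0.1
    client.admin.command('ping') # MongoClient connects lazily, so ping the server to check it is reachable
    print("Connected successfully!!!")
except Exception:
    print("Could not connect to MongoDB")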

# client = MongoClient('localhost', 27017)

db = client['test_info']

coll = db['Person']

# Note that for pymongo, we don't need to close the connection as it's done automatically for us. 

print(list(coll.find()))

for i in coll.find():
    print(i)

Connected successfully!!!
[{'_id': ObjectId('620b1c144bb116337cc7392e'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}]
{'_id': ObjectId('620b1c144bb116337cc7392e'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}


# 15.4 CRUD operations with `pymongo` 
Unlike `sqlite3`, which performs CRUD operations by passing SQL statements to the `execute()` method, `pymongo` performs CRUD operations through methods of its objects. Some of these methods act on `pymongo.collection.Collection` objects:
- `insert_one()` : insert one document into a collection
- `insert_many()` : insert more than one document into a collection
- `find()` : query documents from the collection
- `update_one()` : update a document in the collection
- `update_many()` : update more than one document in the collection
- `delete_one()` : delete a document from the collection
- `delete_many()` : delete more than one document from the collection

## 15.4.1 Creating Database and Collection
Creating databases and collections in MongoDB with `pymongo` is a simple task. We just need to
- make sure that a MongoDB server is **running**,
- create a connection to it through a `MongoClient` object,
- access the database through the connection object by treating it like a Python `dict` object,
- access the collection through the database object, also by treating it like a Python `dict` object.

MongoDB creates the database and the collection the first time a document is inserted into them.

The boilerplate code is as follows.

In [40]:
from pymongo import MongoClient 

try:
    client = MongoClient('localhost', 27017) #localhost is your local computer address 127.0.0.1
    client.admin.command('ping') # MongoClient connects lazily, so ping the server to check it is reachable
    print("Connected successfully!!!")
except Exception:
    print("Could not connect to MongoDB")

db = client['test_info'] # replace 'test_info' with the name of your database

coll = db['Person']  

coll.insert_one({'name':'John Lim','class': '18S01','hobbies': ['running','kayaking','gaming']})
print(list(coll.find()))

Connected successfully!!!
[{'_id': ObjectId('620b1fe87ce2d88427ea2ef0'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}, {'_id': ObjectId('620b20157ce2d88427ea2ef1'), 'name': 'Ben', 'age': '15', 'hobbies': ['running', 'reading', 'gaming']}, {'_id': ObjectId('620b20157ce2d88427ea2ef2'), 'name': 'Lim Bo', 'class': '18S01', 'hobbies': ['gaming']}, {'_id': ObjectId('620b21777ce2d88427ea2ef3'), 'name': 'Ben', 'age': '30', 'hobbies': ['running', 'reading']}, {'_id': ObjectId('620b21807ce2d88427ea2ef4'), 'name': 'Ben', 'age': '30', 'hobbies': ['running', 'reading']}, {'_id': ObjectId('620c58da7ce2d88427ea2ef6'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}, {'_id': ObjectId('620c58e27ce2d88427ea2ef8'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}]


In [20]:
extra_person = [{
    'name':'Ben',   
    'age': '15',   
    'hobbies': ['running','reading','gaming']   
},{
    'name':'Lim Bo',   
    'class': '18S01',   
    'hobbies': ['gaming']   
}
]

# insert_many() inserts every document in the list with a single call
coll.insert_many(extra_person)

print(list(coll.find()))

[{'_id': ObjectId('620b1fe87ce2d88427ea2ef0'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}, {'_id': ObjectId('620b20157ce2d88427ea2ef1'), 'name': 'Ben', 'age': '15', 'hobbies': ['running', 'reading', 'gaming']}, {'_id': ObjectId('620b20157ce2d88427ea2ef2'), 'name': 'Lim Bo', 'class': '18S01', 'hobbies': ['gaming']}]


In [42]:
coll.insert_one({
    'name':'Ben',   
    'age': '25',   
    'hobbies': ['running','reading']   
})

print(list(coll.find()))

[{'_id': ObjectId('620b1fe87ce2d88427ea2ef0'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}, {'_id': ObjectId('620b20157ce2d88427ea2ef1'), 'name': 'Ben', 'age': '15', 'hobbies': ['running', 'reading', 'gaming']}, {'_id': ObjectId('620b20157ce2d88427ea2ef2'), 'name': 'Lim Bo', 'class': '18S01', 'hobbies': ['gaming']}, {'_id': ObjectId('620b21777ce2d88427ea2ef3'), 'name': 'Ben', 'age': '30', 'hobbies': ['running', 'reading']}, {'_id': ObjectId('620b21807ce2d88427ea2ef4'), 'name': 'Ben', 'age': '30', 'hobbies': ['running', 'reading']}, {'_id': ObjectId('620c58da7ce2d88427ea2ef6'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}, {'_id': ObjectId('620c58e27ce2d88427ea2ef8'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}, {'_id': ObjectId('620c593a7ce2d88427ea2ef9'), 'name': 'Ben', 'age': '25', 'hobbies': ['running', 'reading']}]


## 15.4.2 Query Documents in a collection [READ]
Querying for documents in the collection `coll` is done through the `find()` method of the `pymongo.collection.Collection` object. It acts like the `SELECT` statement in SQL.
- When no parameters are passed into `find()`, it returns a `Cursor` object that contains **all** the documents in the collection. The syntax is
> `<my_collection>.find()`
- To retrieve only the documents whose attributes have specific values, pass `{<attribute_name_1>: <value_1>, <attribute_name_2>: <value_2>, ...}` as an argument, where `...` represents further attributes. A document must match every listed attribute to be returned. Example:
> `<my_collection>.find({'name':'TAN AH GAO', 'address' : 'KERBAU ROAD'})`
- Furthermore, we can use comparison operators on attributes to filter the result further. The operators are

<center>

| **Comparison Operator** | **Description** | 
|-|-|
| `$eq` | Matches values that are equal to the given value. | 
| `$gt` | Matches if values are greater than the given value. | 
| `$lt` | Matches if values are less than the given value. | 
| `$gte` | Matches if values are greater or equal to the given value. |
| `$lte` | Matches if values are less or equal to the given value. |
| `$in` | Matches any of the values in an array. |
| `$ne` | Matches values that are not equal to the given value. |
| `$nin` | Matches none of the values specified in an array. |

</center>

<br>

In this case, we pass `{<attribute_name>: {<comparison_operator>: <value>}}` as an argument to the `find()` method. Example:
> `<my_collection>.find({'name':'TAN AH GAO', 'age' : {'$lt':40}})`

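We can also combine conditions using the logical operators below. `$and`, `$or` and `$nor` take an array of query expressions, so we pass `{<logical_operator>: [<expression_1>, <expression_2>, ...]}` as an argument to the `find()` method, while `$not` is applied to the operator expression of a single attribute, as in `{<attribute_name>: {'$not': {<comparison_operator>: <value>}}}`.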

<center>

| **Logical Operator** | **Description** | 
|-|-|
| `$and` | Joins query clauses with a logical `AND`; returns all documents that match the conditions of both clauses. | 
| `$or` | Joins query clauses with a logical `OR`; returns all documents that match the conditions of either clause. | 
| `$not` | Inverts the effect of a query expression and returns documents that do not match the query expression. | 
| `$nor` | Joins query clauses with a logical `NOR`; returns all documents that fail to match both clauses. | 

</center>

You can head to <a href='https://docs.mongodb.com/manual/reference/operator/query/'>the official query operator docs</a> for more examples and operators. However, the ones mentioned above should suffice for most cases.
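
Since the filters passed to `find()` are ordinary Python `dict` objects, they can be built and inspected before any query is run. The sketch below builds a comparison filter and a logical filter; `comparison_filter` and `any_of` are hypothetical helper names introduced only for this illustration and are not part of `pymongo`.

```python
def comparison_filter(field, operator, value):
    """Return a filter such as {'age': {'$gt': 20}}."""
    return {field: {operator: value}}


def any_of(*clauses):
    """Combine filter documents with a top-level $or."""
    return {"$or": list(clauses)}


older_than_20 = comparison_filter("age", "$gt", 20)
ben_or_older = any_of({"name": "Ben"}, older_than_20)

print(older_than_20)
print(ben_or_older)

# With a pymongo collection `coll`, the filter would be passed as
# coll.find(ben_or_older)
```

One caveat, stated as an expectation rather than a tested result: comparison operators in MongoDB generally compare values of the same type, so a numeric filter such as `{'age': {'$gt': 20}}` would not be expected to match documents in which `age` is stored as a string like `'30'`.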

### Exercise
In the `Person` collection in the `test_info` database,

0. print out all the documents in the collection,
1. find the documents with `Ben` in their `name` field and print them out,
2. find the documents with `age` field having value greater than 20.

In [45]:
#YOUR_CODE_HERE
# print(list(coll.find()))

print(list(coll.find({'name':'Ben','age':'30'})))

[{'_id': ObjectId('620b21777ce2d88427ea2ef3'), 'name': 'Ben', 'age': '30', 'hobbies': ['running', 'reading']}, {'_id': ObjectId('620b21807ce2d88427ea2ef4'), 'name': 'Ben', 'age': '30', 'hobbies': ['running', 'reading']}]


## 15.4.3 Updating Documents in a collection [UPDATE]
Assume `<my_query>` is a JSON-like object (think Python `dict`) of the form `{<attribute_name_1>: <value_1>, <attribute_name_2>: <value_2>, ...}`. To update the documents in a MongoDB database that match `<my_query>` with `pymongo`, we
1. Access the collection `coll` that you want to update document(s) in.
2. To:
    - update one document satisfying your query, use `coll.update_one(<my_query>, <my_values>)`, where `<my_values> = {'$set': {<attribute_name_1>: <value_1>, <attribute_name_2>: <value_2>, ...}}`
    > Take note of the format of `<my_values>`.
    - update all documents satisfying your query, use `coll.update_many(<my_query>, <my_values>)`.
3. The `update_one()` and `update_many()` methods accept a Boolean parameter `upsert` (default is `False`) which modifies their behaviour slightly. If `upsert=True` and the query does not match any document, the method inserts a new document instead. If `upsert=False` and the query does not match any document, nothing is inserted. This can be handy if you want to avoid using conditionals to handle such cases.
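
The steps above can be sketched as a small function that works on any `pymongo` collection passed to it. The name `set_age` is hypothetical and chosen for this illustration; the `matched_count` and `upserted_id` attributes belong to the result object that `update_one()` returns.

```python
def set_age(coll, name, age, upsert=False):
    """Set `age` on the first document whose `name` matches.

    With upsert=True, a new document is inserted when nothing matches.
    Returns (number of matched documents, _id of the upserted document or None).
    """
    my_query = {"name": name}
    my_values = {"$set": {"age": age}}
    result = coll.update_one(my_query, my_values, upsert=upsert)
    return result.matched_count, result.upserted_id


# Example call, assuming `coll` is the `Person` collection from the earlier cells:
# matched, new_id = set_age(coll, "Ben", 20, upsert=True)
```

When an upsert does insert, the new document is built from the equality fields of the query together with the `$set` values, so the call in the comment would be expected to create `{'name': 'Ben', 'age': 20}` if no `Ben` exists yet.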

### Exercise
In the `Person` collection in the `test_info` database, update all the documents with `Ben` in its `name` field and set the `age` field for such documents to `20`.

In [51]:
#YOUR_CODE_HERE
# print(list(coll.find()))

my_query = {'name':'Ben'}
my_update = {'$set':{'age':'30'}}

coll.update_many(my_query,my_update)

print(list(coll.find({'name':'Ben'})))

[{'_id': ObjectId('620b20157ce2d88427ea2ef1'), 'name': 'Ben', 'age': '30', 'hobbies': ['running', 'reading', 'gaming']}, {'_id': ObjectId('620b21807ce2d88427ea2ef4'), 'name': 'Ben', 'age': '30', 'hobbies': ['running', 'reading']}, {'_id': ObjectId('620c593a7ce2d88427ea2ef9'), 'name': 'Ben', 'age': '30', 'hobbies': ['running', 'reading']}]


## 15.4.4 Delete Documents in a collection [DELETE]
Assume `<my_query>` is a JSON-like object (think Python `dict`) of the form `{<attribute_name_1>: <value_1>, <attribute_name_2>: <value_2>, ...}`. To delete the documents that match `<my_query>`, we
1. Access the collection `coll` that you want to delete document(s) in.
2. To:
    - delete one document satisfying your query, use `coll.delete_one(<my_query>)`
    - delete all documents satisfying your query, use `coll.delete_many(<my_query>)`
    - delete all documents in the collection, pass the empty query `{}` to the `coll.delete_many()` method.
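
As a minimal sketch of the deletion steps, the function below wraps `delete_many()` and reports how many documents were removed through the `deleted_count` attribute of the returned result. The name `delete_by_name` is hypothetical and used only for this illustration.

```python
def delete_by_name(coll, name):
    """Delete every document whose `name` matches and return how many were removed."""
    result = coll.delete_many({"name": name})
    return result.deleted_count


# Example call, assuming `coll` is the `Person` collection from the earlier cells:
# removed = delete_by_name(coll, "Ben")
```

Checking the count is a cheap way to confirm that the query matched what you expected before moving on.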

### Exercise
In the `Person` collection in the `test_info` database, 
1. delete all the documents with `Ben` in their `name` field and print the remaining documents,
2. delete all documents from the collection and verify that it is empty.

In [53]:
#YOUR_CODE_HERE
my_query = {'name':'Ben','age':'30'}

coll.delete_many(my_query)

print(list(coll.find()))

[{'_id': ObjectId('620b1fe87ce2d88427ea2ef0'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}, {'_id': ObjectId('620b20157ce2d88427ea2ef2'), 'name': 'Lim Bo', 'class': '18S01', 'hobbies': ['gaming']}, {'_id': ObjectId('620c58da7ce2d88427ea2ef6'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}, {'_id': ObjectId('620c58e27ce2d88427ea2ef8'), 'name': 'John Lim', 'class': '18S01', 'hobbies': ['running', 'kayaking', 'gaming']}]


In [54]:
coll.delete_many({})

print(list(coll.find()))

[]


# 15.5 Situations to use SQL or NoSQL

The choice of whether to use a SQL or NoSQL database depends on the type of data being stored as well as the nature of tasks that the database is required to perform.

SQL databases should be used if:
- The data being stored has a fixed schema.
- Complex and varied queries will be frequently performed.
- The atomicity, consistency, isolation and durability (ACID) properties are critical to the database.
- There will be a high number of simultaneous transactions.

NoSQL databases should be used if:
- The data being stored has a dynamic schema (i.e., unstructured data with flexible data types).
- Data storage needs to be performed quickly.
- There will be an extremely large amount of data (i.e., Big Data).
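
To make the ACID point in the SQL list more concrete, here is a small sketch of atomicity using Python's built-in `sqlite3` module. The `Account` table and the amounts are made up for this illustration. A transfer consists of two updates; when the second one fails, the first is rolled back as well, so the data never ends up half-updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (owner TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

try:
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        conn.execute("UPDATE Account SET balance = balance + 150 WHERE owner = 'B'")
        # This update would make A's balance negative, violating the CHECK constraint.
        conn.execute("UPDATE Account SET balance = balance - 150 WHERE owner = 'A'")
except sqlite3.IntegrityError:
    print("Transfer rejected, transaction rolled back")

balances = dict(conn.execute("SELECT owner, balance FROM Account"))
conn.close()
print(balances)
```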


# 15.6 Advantages of NoSQL Databases over Relational Databases

- Relational databases have a predefined schema that is difficult to change. Even if you wish to add a field to a small number of records, you still need to include the field for the entire table. Therefore, it can be difficult to support the processing of unstructured data using relational databases compared to NoSQL databases.
- Unlike NoSQL databases, relational databases do not usually support hierarchical data storage, where less frequently used data is moved to cheaper, slower storage devices. This means that storing data in a relational database usually costs more than storing the same amount of data in a NoSQL database.
- Relational databases are mainly vertically scalable while NoSQL databases are mainly horizontally scalable. Vertically scalable means that improving the performance of a relational database server usually requires upgrading an existing server with faster processors and more memory. Such high-performance components can be expensive and upgrades are limited by the capacity of a single machine. On the other hand, horizontally scalable means that the performance of a NoSQL database can be improved by simply increasing the number of servers. This is relatively cheaper as mass-produced average-performance computers are easily available at low prices.
- Relational databases are typically stored on a single server, which makes the database unavailable when that server fails. NoSQL databases are designed to take advantage of multiple servers so that if one server fails, the other servers can continue to support applications.

# Appendix 

## A. Loading json files with `json` Module
Sometimes the items are provided in a file with the `.json` extension and you are asked to retrieve them from the file. Instead of parsing the file manually with standard file reads, we can import Python's built-in `json` module and use the `json.load()` function to read the contents. For a file that contains a JSON array of objects, this gives a list of dictionary objects in Python.

The boilerplate code is given below.

In [None]:
import json

with open('<file_name>.JSON') as f:
    <items_list> = json.load(f)
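
A runnable version of the boilerplate is sketched below. It first writes a small sample file into a temporary folder so that it does not depend on any existing file; the file name `people.json` and the sample records are assumptions made for this illustration.

```python
import json
import os
import tempfile

people = [{"name": "John Lim", "class": "18S01"}, {"name": "Ben", "age": 15}]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "people.json")

    # Write the sample data so there is a file to read back.
    with open(path, "w") as f:
        json.dump(people, f)

    # Load the file: a JSON array of objects becomes a list of dicts.
    with open(path) as f:
        items_list = json.load(f)

print(items_list == people)

# A list of dicts like this could then be passed to coll.insert_many(items_list).
```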

## B. MongoDB VSCode Extension
If you are using VSCode, you can also install the MongoDB VSCode Extension <a href = 'https://marketplace.visualstudio.com/items?itemName=mongodb.mongodb-vscode'>here</a>. For our purposes, it helps you to:
- Navigate your databases, collections, and read-only views
- See the documents in your collections
- Edit documents and save changes to the database
- Get a quick overview of your schema and your indexes