# Learning Objectives

- [ ] 3.3.6 *Understand how NoSQL database management systems address the shortcomings of relational database management systems (SQL).
- [ ] 3.3.7 *Explain the applications of SQL and NoSQL. 
- [ ] 3.3.8 *Use a programming language to work with both SQL and NoSQL databases. 

# References

1. Leadbetter, C., Blackford, R., & Piper, T. (2012). Cambridge international AS and A level computing coursebook. Cambridge: Cambridge University Press.
2. https://www.mongodb.com/compare/mongodb-dynamodb


Recall that a **database** is a collection of related data in which all records have the same structure, or more generally a collection of data stored in an organised, logical manner.

# 13.1 NoSQL databases
Relational databases (commonly referred to as SQL databases) work well with structured data since each table's **schema** (the precise description of the data to be stored and the relationships between them) is always clearly defined. However, with the increasing number of ways to gather and generate data, we often need to deal with unstructured data.

For example, a convenience store that frequently refreshes the services it provides may sell both mobile phones and groceries. To run the store, information about both mobile phones (e.g., model names and prices) and groceries (e.g., prices and descriptions) needs to be stored. In the future, the store may also start selling mobile phone subscription plans. Storing all this data in the same relational database may not be easy. In this case, non-relational databases, also referred to as NoSQL databases, can offer a better choice.

There are four main types of NoSQL databases:
- Key-value databases. Data is stored as a collection of key-value pairs in which each key serves as a unique identifier, e.g., Amazon DynamoDB. Queries are limited to the key **only**; the values are opaque to the database, so it cannot search inside them.
- Document databases. Document databases work like a hash table, but each key can point to an embedded key-value structure, also known as a **document**, instead of just a single value. (Recall that in a hash table, each key points to a single value or data item.) E.g., MongoDB.
- Wide column databases. Data is stored by column instead of by row.
- Graph databases. These use graph structures with nodes, edges, and properties to represent and store data, and support semantic queries over them, e.g., Neo4j.
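
The difference between the first two types can be pictured with plain Python dictionaries. This is only an in-memory analogy, not how any real database is implemented, and the keys and product details below are made up for illustration.

```python
# Key-value model: the store only understands keys; each value is an opaque blob.
kv_store = {
    "phone:1001": b'{"model": "X100", "price": 799}',  # bytes the store does not interpret
}

# Document model: each key maps to a nested key-value structure (a "document").
doc_store = {
    "phone:1001": {"model": "X100", "price": 799, "colours": ["black", "blue"]},
    "grocery:2001": {"description": "Instant noodles", "price": 1.2},
}

# Key-value lookup: we can only fetch by key and get the whole value back.
blob = kv_store["phone:1001"]

# Document lookup: fields inside each document can be inspected and filtered on.
cheap_items = [key for key, doc in doc_store.items() if doc["price"] < 10]
print(cheap_items)
```

Note that the two documents do not share the same fields, which is exactly the situation the convenience store example describes.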

# 13.2 Differences between SQL and Document Database

- Relational databases have a **fixed, predefined schema** that their tables follow, whereas NoSQL databases usually have **no predefined schema**: the structure of the data is dynamic and can change easily.
- Relational databases contain tables while document databases like MongoDB contain collections. The data type of each field in a table is fixed for relational databases but flexible for document databases like MongoDB.
- Relational databases represent data in tables and rows while document databases store data as collections of documents.
- For relational databases, joins are usually used to get data across tables, while document databases like MongoDB usually do not rely on joins. Thus it is easier to use relational databases than NoSQL databases for complex queries.
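
The schema difference can be sketched in a few lines. The example below uses an in-memory SQLite database for the relational side and a plain list of `dict` objects as a stand-in for a collection of documents; the `Product` table and its contents are hypothetical.

```python
import sqlite3

# Relational side: the schema is declared up front.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Product(ID INTEGER PRIMARY KEY, Name TEXT, Price REAL)")
connection.execute("INSERT INTO Product(Name, Price) VALUES('Bread', 2.5)")
connection.execute("INSERT INTO Product(Name, Price) VALUES('Phone', 799.0)")

# Recording a phone's model means changing the table, so every row gains the column.
connection.execute("ALTER TABLE Product ADD COLUMN Model TEXT")
rows = connection.execute("SELECT Name, Price, Model FROM Product ORDER BY ID").fetchall()
print(rows)  # the Model value is None (NULL) for rows inserted before the change
connection.close()

# Document side: each document carries only the fields it needs.
products = [
    {"name": "Bread", "price": 2.5},
    {"name": "Phone", "price": 799.0, "model": "X100"},
]
fields_per_document = [sorted(doc) for doc in products]
print(fields_per_document)
```

In the table, the new `Model` column exists for the bread row as well, even though it is meaningless there. In the list of documents, only the phone has a `model` field.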

# 13.3 MongoDB 

MongoDB is a very popular NoSQL document database, which uses `JSON` (JavaScript Object Notation)-like **documents** to store records. JSON has the format
>```python
> {
>   <attribute_name_1>: <attribute_values_1>,
>   <attribute_name_2>: <attribute_values_2>
>                   ....
> }
>```
which looks like a Python `dict` object.
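
To see how close the two are, Python's standard `json` module can convert between a `dict` and JSON text. This is a small sketch using the `Person` record from the exercise in this chapter.

```python
import json

# A record written as a Python dict.
person = {
    "name": "John Lim",
    "class": "18S01",
    "hobbies": ["running", "kayaking", "gaming"],
}

json_text = json.dumps(person)       # dict -> JSON text (JSON always uses double quotes)
round_trip = json.loads(json_text)   # JSON text -> dict

print(json_text)
print(round_trip == person)
```

One visible difference is that JSON strings must use double quotes, whereas Python accepts either single or double quotes.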

The terminology used in MongoDB differs slightly from that of SQL. The table below lists SQL terms with their corresponding MongoDB terms.

<center>

| **SQL Term** | **MongoDB Term** | 
|-|-|
| `Database` | `Database` | 
| `Table` | `Collection` | 
| `Row/Record` | `Document` | 
| `Column/Field/Attribute` | `Field` |

</center>

## 13.3.1 Running MongoDB
After installation, open a command prompt and type `mongo` to run the MongoDB shell (recent MongoDB releases replace `mongo` with `mongosh`). To maintain access to the MongoDB databases, you need to **make sure that MongoDB is running**, i.e. don't close the MongoDB shell.

<center>
<img src="images/database_create.gif" width="1080" align="center"/>
</center>

> If you encounter an error, MongoDB folder might not have been added to the PATH environment variable. Click <a href = 'https://dangphongvanthanh.wordpress.com/2017/06/12/add-mongos-bin-folder-to-the-path-environment-variable/' >here</a> for troubleshooting.

Some useful commands to run in the MongoDB shell:
- `help` : get the available shell commands
- `show dbs` : show the currently available databases in MongoDB
- `use <db_name>` : set current database to `<db_name>`
- `db.createCollection(<collection_name>)` : create a collection named `<collection_name>` in the current database
- `db.<collection_name>.insert(<json_obj>)` : insert the document `<json_obj>` into the collection `<collection_name>` of the current database
- `show collections` : show the available collections in the current database

> Instead of creating a collection with `db.createCollection(<collection_name>)`, you can rely on `db.<collection_name>.insert(<json_obj>)`, which automatically creates the collection when the document is added.

### Exercise 
Create a database called `test_info` and insert the following JSON object as a document in the collection `Person` in the database.

```python
{
 'name':'John Lim',
 'class': '18S01',
 'hobbies': ['running','kayaking','gaming']
}
```


## 13.3.2 Interacting with MongoDB with `pymongo`

As with relational databases, we need to know how to execute the important database operations (CRUD) with MongoDB as well. However, for MongoDB, we will skip the MongoDB shell commands and go straight to `pymongo`, a Python library for interacting with MongoDB databases. (As warned earlier, keep MongoDB running or you will encounter errors.)

## 13.8.1 Connecting to MongoDB database with `pymongo` 
Roughly speaking, to work with the database,
1. We first **establish a connection** to the MongoDB server by creating a `pymongo.MongoClient` object for `localhost` with the default port `27017`.
2. Access the database through the client.
3. Access the collection through the database.
4. Do your query, insertion, updating and deletion.

### Example 26

The code below illustrates the process of connecting to the database `test_info` and accessing the collection `Person` with `pymongo`.

In [None]:
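# We could instead write
# import pymongo
# but then the client would be created with pymongo.MongoClient('localhost', 27017)

from pymongo import MongoClient
from pymongo.errors import ConnectionFailure

# localhost is your local computer address 127.0.0.1
# serverSelectionTimeoutMS limits how long we wait for the server (in milliseconds)
client = MongoClient('localhost', 27017, serverSelectionTimeoutMS=2000)

try:
    # MongoClient connects lazily, so send a cheap command to check that the server is reachable
    client.admin.command('ping')
    print("Connected successfully!!!")
except ConnectionFailure:
    print("Could not connect to MongoDB")

db = client['test_info']

coll = db['Person']

# Note that pymongo manages its connections for us; call client.close() once you are completely done with the client.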

# 13.8.2 CRUD operations with `pymongo` 
Unlike `sqlite3`, which performs CRUD operations by passing SQL statements to the `execute()` method, `pymongo` performs CRUD operations through the various methods of its collection objects.


### Exercise 27 [`INSERT`, `DELETE`, `UPDATE`]
1. Try out the following code and check the database again with DB4S. 
    - What changes do you expect from executing the code?
    - What do you observe about the database? 

# 13.4 Situations to use SQL or NoSQL

The choice of whether to use a SQL or NoSQL database depends on the type of data being stored as well as the nature of tasks that the database is required to perform.

SQL databases should be used if:
- The data being stored has a fixed schema.
- Complex and varied queries will be frequently performed.
- The atomicity, consistency, isolation and durability (ACID) properties are critical to the database.
- There will be a high number of simultaneous transactions.

NoSQL databases should be used if:
- The data being stored has a dynamic schema (i.e., unstructured data with flexible data types).
- Data storage needs to be performed quickly.
- There will be an extremely large amount of data (i.e., Big Data).


In [1]:
import sqlite3

connection = sqlite3.connect("./resources/library_copy.db")

connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('Alex', 'Ong', 98765432)")

#Sometimes it can be clearer if you split your SQL statements into multiple lines and you can use `\` in python for this purpose
connection.execute( "UPDATE Borrower SET Surname = 'Lim' " +\
                    "WHERE FirstName = 'Alex'")
connection.close()

In [None]:
#YOUR_ANSWER_HERE

2. Try the following code and check the database again. What do you observe?

In [None]:
import sqlite3

connection = sqlite3.connect("./resources/library_copy.db")

connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('Alex', 'Ong', 98765432)")
connection.execute( "UPDATE Borrower SET Surname = 'Lim' " +\
                    "WHERE FirstName = 'Alex'")

#added the following method
connection.commit()

connection.close()

In [None]:
#YOUR_ANSWER_HERE

> It is important to run the commit() method to save the changes to the database. This is equivalent to the action `Write Changes` we used in DB4S.

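As an alternative to saving changes manually with the `commit()` method, we can use the `with` statement in Python, much as we do for file I/O. Unlike with files, the `with` block commits the changes when it ends (or rolls them back if an exception is raised) but does not close the connection.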

In [None]:
import sqlite3

connection = sqlite3.connect("./resources/library_copy.db")

with connection:
    connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('Alex', 'Ong', 98765432)")
    connection.execute( "UPDATE Borrower SET Surname = 'Lim' " +\
                        "WHERE FirstName = 'Alex'")

    #commit() method to save the changes is no longer required.

#connection to database still needs to be closed
connection.close()
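
The commit-or-rollback behaviour of the `with` block can be checked with a small self-contained sketch. It uses an in-memory database (`":memory:"`) and a cut-down, hypothetical `Borrower` table so that it does not touch `library_copy.db`.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Borrower(ID INTEGER PRIMARY KEY, FirstName TEXT, Surname TEXT)")
connection.commit()

# Successful block: the INSERT is committed when the block ends.
with connection:
    connection.execute("INSERT INTO Borrower(FirstName, Surname) VALUES('Alex', 'Ong')")

# Failing block: the exception triggers a rollback, so 'Vijay' is never saved.
try:
    with connection:
        connection.execute("INSERT INTO Borrower(FirstName, Surname) VALUES('Vijay', 'Singh')")
        raise ValueError("simulated failure")
except ValueError:
    pass

names = [row[0] for row in connection.execute("SELECT FirstName FROM Borrower")]
print(names)

# The with statement did not close the connection, so we do it ourselves.
connection.close()
```

Only `Alex` should remain in the table, since the second block was rolled back.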

### Exercise 28 [`INSERT`, `DELETE`, `UPDATE`]

1. Try out the following code and check the database with DB4S. 
    - What changes do you expect from executing the code?
    - What do you observe about the database? 

In [None]:
import sqlite3

connection = sqlite3.connect("./resources/library_copy.db")

connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('AlexA', 'Ong', 98765432)")
connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('VijayA', 'Singh', 91919191)")

connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('AlexB', 'Ong', 98765432)")
connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('VijayB', 'Singh', 91919191)")

connection.commit()

connection.close()

2. Try out the following code and check the database again with DB4S. 
    - What changes do you expect from executing the code?
    - What do you observe about the database? 

# 13.3 Advantages of NoSQL Databases over Relational Databases

- Relational databases have a predefined schema that is difficult to change. Even if you wish to add a field to a small number of records, you still need to include the field for the entire table. Therefore, it can be difficult to support the processing of unstructured data using relational databases compared to NoSQL databases.
- Unlike NoSQL databases, relational databases do not usually support hierarchical data storage, where less frequently used data is moved to cheaper, slower storage devices. This means that the cost of storing data in a relational database is higher than the cost of storing the same amount of data in a NoSQL database.
- Relational databases are mainly vertically scalable while NoSQL databases are mainly horizontally scalable. Vertically scalable means that improving the performance of a relational database server usually requires upgrading an existing server with faster processors and more memory. Such high-performance components can be expensive and upgrades are limited by the capacity of a single machine. On the other hand, horizontally scalable means that the performance of a NoSQL database can be improved by simply increasing the number of servers. This is relatively cheaper as mass-produced average-performance computers are easily available at low prices.
- Relational databases are typically stored on a single server, which makes the database unavailable when that server fails. NoSQL databases are designed to take advantage of multiple servers so that if one server fails, the other servers can continue to support applications.

In [None]:
import sqlite3

connection = sqlite3.connect("./resources/library_copy.db")

connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('AlexC', 'Ong', 98765432)")
connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('VijayC', 'Singh', 91919191)")

#the following line is added
connection.rollback()

connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('AlexD', 'Ong', 98765432)")
connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES('VijayD', 'Singh', 91919191)")

connection.commit()

connection.close()

> The `rollback()` method undoes the changes made to the database since the last `commit()`. This is equivalent to the action `Revert Changes` we used in DB4S.
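
The effect of `rollback()` can also be seen without DB4S. The sketch below repeats the idea on an in-memory database with a cut-down, hypothetical `Borrower` table.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Borrower(ID INTEGER PRIMARY KEY, FirstName TEXT)")

connection.execute("INSERT INTO Borrower(FirstName) VALUES('AlexC')")
connection.rollback()   # discards the uncommitted AlexC row

connection.execute("INSERT INTO Borrower(FirstName) VALUES('AlexD')")
connection.commit()     # saves AlexD
connection.rollback()   # nothing left to undo: AlexD was already committed

names = [row[0] for row in connection.execute("SELECT FirstName FROM Borrower")]
print(names)

connection.close()
```

Only `AlexD` remains: a rollback affects uncommitted changes only, and cannot undo work that has already been committed.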

### Example 29 [`CREATE TABLE, DROP TABLE`]
Try out the following code blocks and check the database with DB4S. What do you expect the code blocks to do?

In [None]:
import sqlite3

connection = sqlite3.connect("./resources/newfile.db")

connection.execute("CREATE TABLE Book(" +\
                   "ID INTEGER PRIMARY KEY, " +\
                   "Title TEXT" +\
                   ")"
                  )

connection.execute("CREATE TABLE BookToo(" +\
                   "ID INTEGER PRIMARY KEY, " +\
                   "Title TEXT" +\
                   ")"
                  )
connection.commit()
connection.close()

In [None]:
import sqlite3

connection = sqlite3.connect("./resources/newfile.db")

connection.execute("DROP TABLE Book")
connection.commit()
connection.close()

The last of the CRUD operations we will discuss is the Read/Retrieve operation. We will show 4 ways to do this with the `sqlite3` module:
1. iterate over the `Cursor` object, which is created when we run the `execute()` method on the `Connection` object,
2. use the `fetchone()` method of the `Cursor` object,
3. use the `fetchall()` method of the `Cursor` object,
4. set the `row_factory` attribute of the `Connection` object to `sqlite3.Row`. `Row` provides both index-based and case-insensitive name-based access to columns and is most useful when we want name-based access to columns.

### Example 30 [`SELECT`]
Try out the following code block 3 times with appropriate commenting and uncommenting of the relevant parts of the code. What can you observe about the type of the output given by each of the approaches?

In [None]:
import sqlite3

connection = sqlite3.connect("./resources/library_copy.db")

cursor = connection.execute("SELECT ID, FirstName FROM Borrower")

#Approach 1
for row in cursor:
    print(row)

#Approach 2
row = cursor.fetchone()
while row is not None:
    print(row)
    row = cursor.fetchone()

#Approach 3
rows = cursor.fetchall()
print(rows)

connection.close()

In [None]:
#YOUR_ANSWER_HERE

### Example 31 [`SELECT`]
Try out the following code block 5 times with appropriate commenting and uncommenting of the relevant parts of the code. What can you observe about each output?

In [None]:
import sqlite3

connection = sqlite3.connect("./resources/library_copy.db")

#setting the `row_factory` attribute of `Connection` object
connection.row_factory = sqlite3.Row

cursor = connection.execute("SELECT ID, FirstName FROM Borrower")

#Try 1
for row in cursor:
    print(row)

#Try 2
row = cursor.fetchone()
while row is not None:
    print(row)
    row = cursor.fetchone()

#Try 3
rows = cursor.fetchall()
print(rows)

#Try 4
for row in cursor:
    print(row['ID'])
    print(row['FirstName'])

#Try 5
row = cursor.fetchone()
while row is not None:
    print(row['ID'])
    print(row['FirstName'])
    row = cursor.fetchone()

connection.close()

The main advantage of using `Row` objects is that they are more flexible as they behave like `dict` objects; we can index values by column name instead of relying on the order of columns in the original `SELECT` statement.
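
The different ways of reading a `Row` can be compared side by side. This sketch uses an in-memory database with a cut-down, hypothetical `Borrower` table holding a single record.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.row_factory = sqlite3.Row
connection.execute("CREATE TABLE Borrower(ID INTEGER PRIMARY KEY, FirstName TEXT)")
connection.execute("INSERT INTO Borrower(FirstName) VALUES('Alex')")

row = connection.execute("SELECT ID, FirstName FROM Borrower").fetchone()

columns = row.keys()          # column names, in SELECT order
by_index = row[1]             # index-based access
by_name = row["FirstName"]    # name-based access
by_lower = row["firstname"]   # name-based access is case-insensitive
as_dict = dict(row)           # convert to a real dict if needed

print(columns, by_index, by_name, by_lower, as_dict)

connection.close()
```

Printing a `Row` directly only shows an object description, so converting it with `dict(row)` or `tuple(row)` is convenient when inspecting results.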

# 13.8.3 SQL Injection Protection with `sqlite3`
SQL injection is a web security vulnerability that allows an attacker to interfere with the queries that an application makes to its SQL database.  It generally allows an attacker to view data that they are not normally able to retrieve. This might include data belonging to other users, or any other data that the application itself is able to access. In many cases, an attacker can modify or delete this data, causing persistent changes to the application's content or behavior.

Consider a shopping application that displays products in different categories. When the user clicks on the Gifts category, their browser requests the URL:
>```
>
>https://insecure-website.com/products?category=Gifts
>
>```
This causes the application to make an SQL query to retrieve details of the relevant products from the database:
>```
>
>SELECT * FROM products WHERE category = 'Gifts' AND released = 1
>
>```

Note that we can presume that: 
- This SQL query asks the database to return all details (`*`) from the `products` table where the category is `"Gifts"` and `released` is `1`.
- The restriction `released = 1` is being used to hide products that are not released. For unreleased products, presumably `released = 0`.

The application doesn't implement any defenses against SQL injection attacks, so an attacker can construct an attack like:
>```
>
>https://insecure-website.com/products?category=Gifts'--
>
>```

This results in the SQL query:
>```
>
>SELECT * FROM products WHERE category = 'Gifts'--' AND released = 1
>
>```

The key thing here is that the double-dash sequence `--` is a comment indicator in SQL, and means that the rest of the query is interpreted as a comment. This effectively removes the remainder of the query, so it no longer includes `AND released = 1`. This means that all products in the category are displayed, including unreleased products where `released = 0`.

Here's another example from xkcd.   
<center>
<img src="https://imgs.xkcd.com/comics/exploits_of_a_mom.png" width="800" align="center"/>
</center>

## 13.8.3.1 Parameter Substitution
From the SQL injection example above, we see that user input should never be pasted directly into an SQL statement, and it is a good idea to run a validity check on it first. To keep user input out of the SQL text, we can use **parameter substitution**: the statement contains `?` placeholders, and the actual values are passed as a separate argument to the `execute()` method in `sqlite3`, which treats them as data rather than as SQL.

### Example 32

Consider the following snippet of Python code to delete records in the SQL database where `ID` is strictly between 2 and 4 (greater than 2 and less than 4).

In [None]:
# The symbols `?` are placeholders for user inputs
# the second argument in the execute() method is a tuple of user inputs to use for substitution
# Parameter substitution follows the same order in which the placeholders appear in the SQL
# `connection` is assumed to be an open sqlite3 Connection, e.g. sqlite3.connect("./resources/newfile.db")
connection.execute("DELETE FROM Book WHERE ID > ? AND ID < ?", (2, 4))
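
To see why parameter substitution protects against the attack described earlier, the sketch below replays the `Gifts'--` input against an in-memory database, once with string concatenation and once with a `?` placeholder. The `products` table and its two rows are made up for illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE products(name TEXT, category TEXT, released INTEGER)")
connection.executemany(
    "INSERT INTO products VALUES(?, ?, ?)",
    [("Mug", "Gifts", 1), ("Secret toy", "Gifts", 0)],
)

user_input = "Gifts'--"   # the malicious category value from the URL above

# Unsafe: the input is pasted into the SQL text, so `--` comments out the rest of the query.
unsafe_sql = "SELECT name FROM products WHERE category = '" + user_input + "' AND released = 1"
unsafe_rows = connection.execute(unsafe_sql).fetchall()

# Safe: the input is passed separately and compared as a plain string value.
safe_rows = connection.execute(
    "SELECT name FROM products WHERE category = ? AND released = 1", (user_input,)
).fetchall()

print(unsafe_rows)   # includes the unreleased product
print(safe_rows)     # empty: no category is literally named "Gifts'--"

connection.close()
```

With the placeholder, the whole input is treated as a category name, so the quote and the `--` never become part of the SQL statement.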

### Example 33

The following program can be used to enter new borrowers into the `library.db` database:

In [None]:
import sqlite3

connection = sqlite3.connect("library.db")

while True:
    first = input("Enter first name: ")
    surname = input("Enter surname: ")
    contact = int(input("Enter contact number: "))

    #Note that at this point in the code, we can run the validation checks on the values first, surname and contact
    #before we pass them to the SQL statement below
    connection.execute("INSERT INTO Borrower(FirstName, Surname, Contact) VALUES(?, ?, ?)", (first, surname, contact))
    connection.commit()

    if input("Continue (Y/N)?").upper() != 'Y':
        break

connection.close()