# Non-Relational DBMS (aka NoSQL DBMS)

A non-relational (NoSQL) database stores data in a non-relational format. It is a common misconception that NoSQL databases cannot store relationship data; they can, they just store it differently. NoSQL databases started to gain popularity in the late 2000s, when the cost of storage decreased dramatically and the explosion of web applications generated data that was increasingly difficult to fit within a relational data model.

## Topics
* Differences, Benefits and Drawbacks Between RDBMS and NoSQL
* XML & JSON Data Formats
 * Extensible Markup Language (XML)
 * JavaScript Object Notation (JSON)
* MongoDB
 * Basic Concepts
 * Datatypes
 * Create, Read, Update and Delete (CRUD)
 
---

## Differences, Benefits and Drawbacks Between RDBMS and NoSQL

Let's look at the 4 properties (*Scalability*, *Cost*, *Flexibility* and *Availability*) that organizations normally take into consideration during the decision making process.

#### Scalability
Scalability is the ability of a system to efficiently handle varying workloads. 

**Scaling Horizontally**<br>
A website can have a sudden spike in traffic, requiring additional servers to be brought online to handle the extra load. When the spike subsides, those servers can be shut down. The process of adding and removing multiple machines with the same hardware is called scaling horizontally.

With an RDBMS (termed cluster-based), each additional component adds complexity and cost to the overall operations. The challenge is therefore that additional database software must be purchased to maintain a single database system across multiple servers.

**Scaling Vertically**<br>
Scaling vertically means upgrading the machine's hardware to improve the performance of the DBMS. Most of the time, the RAM chips are the first component to be changed.

![db_scaling.png](attachment:c39c126f-b044-4204-8746-c4d22d5cae99.png)

In comparison, scaling horizontally is more flexible than scaling vertically, as adding or removing servers is physically easier. NoSQL databases are designed to scale horizontally without much intervention. However, should the server machine be scaled vertically, the DBMS may need to be migrated to the new server (especially when upgrading the data storage medium). Whether a DBMS migration is needed generally depends on which hardware components are being added or replaced on the server, but regardless, downtime is still required for the physical replacement process.

#### Cost
The main cost of a DBMS lies in the licensing model of commercial software vendors. These models include charging by the specifications of the server on which the DBMS is to run, by the number of concurrent users on the database, or by the number of named users allowed to access the software. Each of these models poses challenges for any business or organization and its users. To complicate matters, the demands of web applications are highly unpredictable (we cannot predict how many users there will be in 6 months, a year or more), so it is difficult to set a budget for the licenses.


NoSQL databases are generally open source and thus free to use on as many servers as needed. In addition, there are third-party companies that provide commercial support for open-source NoSQL DBMSs, so software support is still available to businesses and organizations.

#### Flexibility
This characteristic is situational, as it is not always easy to decide which type of database (relational or NoSQL) to use. Consider the scenario: a relational database is typically used for storing staff details in an organization, while tracking product attributes on an e-commerce website typically uses a NoSQL database due to its ability to store a variable number of fields.

A key requirement for relational databases is that the database designers must know the database schema (that is, all the tables, fields or columns, and the relationships between the tables) before an application can be developed to support that database. In addition, most of the fields in the tables are assumed to be required by each record and thus filled with some value. This makes relational databases less flexible, as changes to the tables also mean changes to the application.

With a NoSQL database, however, a database schema is not required, and database designers are able to store records with a varying number of fields based on the changing needs of the product without changing the database design.

For example, consider tracking product attributes on an e-commerce store. <br>
Using a document-oriented NoSQL database, each product would be treated as a document (comparable to a row in a table), and each document contains a variable number of attributes with respect to the product.<br>
Using an RDBMS, the developers would need to know beforehand the attributes a product must have. Should a product not have some of those fields, they are left blank. Any change to the products means the whole database design may need to change.

#### Availability
Nowadays, everything needs to be always online or always available. That is a standard we have learnt to accept, so having your database on a single server would be bad. On the other hand, having backup servers and duplicating data does not help with the processing workload; instead, it adds more work.

NoSQL databases are designed to take advantage of multiple low-cost servers; server failure or server maintenance can be easily mitigated, with the other servers in the cluster picking up the extra workload. Although performance may drop, the website or web application is still available for use.

---

Both types of DBMS have evolved to meet changing application requirements. RDBMSs evolved to replace virtually all older types of data management systems, but they had difficulty meeting the requirements of the exponential growth of e-commerce and social media. NoSQL databases were created to address these limitations. Most organizations use both types of DBMS to store different types of data as their applications grow in complexity.

**Main differences are**
<style>
    tr:nth-child(even) { background-color:#f2f2f2; }
    table { width: 100%; }
</style>
<table align="center" border=1>
    <colgroup>
       <col span="1" style="width: 50%;">
       <col span="1" style="width: 50%;">
    </colgroup>
    <tr>
        <th align="center">NoSQL</th>
        <th align="center">RDBMS</th>
    </tr>
    <tr>
        <td>Dynamic schema thus allowing unstructured data.</td>
        <td>Define the schema before adding data.</td>
    </tr>
    <tr>
        <td>Variety of ways data can be structured for storage such as document-oriented, graph based, key-value pairs or wide column thus suited for hierarchical data.</td>
        <td>Only table based, not so good for unstructured data.</td>
    </tr>
    <tr>
        <td>Scaled horizontally.</td>
        <td>Scaled vertically.</td>
    </tr>
    <tr>
        <td>Uses API-like function calls; complex queries are harder for unstructured data.</td>
        <td>Uses Structured Query Language (SQL) commands therefore complex queries are easier to handle.</td>
    </tr>
    <tr>
        <td>Open-source community.</td>
        <td>Often closed source with licensing fees, although open-source RDBMSs such as MySQL and PostgreSQL exist.</td>
    </tr>
</table>

---
## XML & JSON Data Formats

One of the most common forms of NoSQL database is the document-oriented database. These databases store data as documents; however, the encoding or standard format differs. The document format is typically XML, YAML, JSON or its binary variant BSON.

#### Extensible Markup Language (XML)
XML is a markup language that stores data in a structure that is similar to how HTML is used to construct webpages. The basic XML structure involves data being wrapped within custom tags.

```xml
<?xml version="1.0"?>
<contactinfo>
    <address category="office">
        <name>Olympus Inc.</name>
        <location>587 Drive, Mount Olympus, Greece</location>
        <contact>+30 281 8154 2445</contact>
    </address>
</contactinfo>
```

XML is extensible because you define your own tags, and the way you decide to structure your data defines how it is to be processed or displayed. We can add more data to the structure by adding more tags to group related data.

```xml
<?xml version="1.0"?>
<contactinfo>
    <name>John Doe</name>
    <address category="office">
        <name>Olympus Inc.</name>
        <location>587 Drive, Mount Olympus, Greece</location>
        <contact>+30 281 8154 2445</contact>
    </address>
    <address category="home">
        <street>54 Moon Beam Drive</street>
        <houseno>8</houseno>
        <postalcode>487510</postalcode>
    </address>
</contactinfo>
```

In a nutshell, the structure of XML can be shown pictorially as
![xml_format.png](attachment:f09c0525-6ab4-4b1b-8b61-5959dc440aa3.png)

Its properties are:
* **Unicode characters** - XML documents use strings of Unicode characters, but some characters are reserved for markup. The characters `&` and `<` cannot appear literally in content and are written as the entities `&amp;` and `&lt;` (`>` is commonly written as `&gt;`). In addition, the quote character (`"` or `'`) that delimits an attribute value must be escaped inside that value.
* **Markup and Content** - the information making up an XML document is split into 2 types, namely markup and content. Markup strings start with the `<` character and end with the `>` character. Content is the text between the markup strings.
* **Tags** - a tag is a markup string that begins with `<` and ends with `>`. There are 3 types of tags: a start-tag `<houseno>`, an end-tag `</houseno>` and an empty-element tag `<phone />`. Note the placement of the forward slash `/` in the end-tag and the empty-element tag.
* **Elements** - an element either begins with a start-tag and ends with a matching end-tag, or consists of only an empty-element tag. The element's content is sandwiched between the start-tag and the end-tag. It is possible to nest elements within elements in a parent-child relationship, like the `<address>` element in the example above.
* **Attributes** - are the name-value pairs that exist within a start-tag or empty-element tag. In the example `<address category="office">`, the attribute name is `category` and its value is `office`. Each XML attribute can only have a single value, and each attribute can only appear once in an element. Occasionally, there is a need to place multiple values in an attribute. This can be done using delimiter-separated values. The delimiters usually used are the comma `(,)`, the semi-colon `(;)` or a single white space character.
* **XML declaration** - this is the optional declaration at the top of an XML document, such as `<?xml version="1.0"?>`, that describes the XML document (for example, its version and encoding).

Processing XML documents normally requires an XML parser (often from a third-party library) to read the document before the application can extract and process the data. Generally, DBMSs provide some form of XML manipulation functionality.
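
As a rough illustration of what such a parser does, the sketch below reads the contact document from the earlier example using `xml.etree.ElementTree` from Python's standard library. The variable names are chosen here for illustration only, and the XML is held in a string rather than a file to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# The contact document from the example above, held in a string.
# The XML declaration is omitted; it is optional for this parser.
XML_TEXT = """
<contactinfo>
    <name>John Doe</name>
    <address category="office">
        <name>Olympus Inc.</name>
        <location>587 Drive, Mount Olympus, Greece</location>
        <contact>+30 281 8154 2445</contact>
    </address>
    <address category="home">
        <street>54 Moon Beam Drive</street>
        <houseno>8</houseno>
        <postalcode>487510</postalcode>
    </address>
</contactinfo>
"""

root = ET.fromstring(XML_TEXT)      # the root element, <contactinfo>

# Content of the first <name> element that is a direct child of the root.
person = root.find("name").text

# Attribute values of every <address> child element.
categories = [address.get("category") for address in root.findall("address")]

# Select a nested element by its attribute, then read a child's content.
office = root.find("address[@category='office']")
office_name = office.find("name").text

print(root.tag, person, categories, office_name)
```

Note how elements, attributes and content from the list above map onto `find()`, `get()` and `.text` respectively.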

#### JavaScript Object Notation (JSON)
JSON is an open standard file and data interchange format that is used to transfer data between programs. It is a text format based on JavaScript object literals. Despite its name and origins, JSON is platform independent. It uses attribute-value pairs to store and transmit data as objects. This format maps naturally onto Python's dictionary (`dict`) object.

For example, a pair of shoes can have attributes such as brand, colour, size, type of insole, etc. The JSON description of this shoe would be:
```json
{ "shoe": 
    {
        "brand": "Sketchers",
        "colour": "black",
        "size": 38,
        "insole": "memory foam"
    }
}
```

The main things to note about JSON are

* Names/keys are always located on the left, followed by the colon (`:`) character, with the value on the right.
* Double quotes (`""`) always enclose names/keys; values are enclosed in double quotes only when they are strings.
* Although almost any characters can be used within the double quotes, it does not mean that you should haphazardly name the keys, as it may cause issues during processing.
* Curly brackets (`{}`) surrounding 1 or more key-value pairs are used to denote an object, and multiple key-value pairs are separated using commas (`,`).
* Values that are of type array or list use square brackets (`[]`).
* JSON files have the extension `.json`.
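
To illustrate how closely JSON maps onto a Python dictionary, here is a minimal sketch using the standard library `json` module on the shoe example. The extra `sizesInStock` key is a made-up addition for illustration and is not part of the original example.

```python
import json

# The shoe description from the example above, as JSON text.
SHOE_JSON = """
{ "shoe":
    {
        "brand": "Sketchers",
        "colour": "black",
        "size": 38,
        "insole": "memory foam"
    }
}
"""

data = json.loads(SHOE_JSON)    # JSON text becomes a Python dict
shoe = data["shoe"]             # the nested object becomes a nested dict
print(shoe["brand"], shoe["size"])

# Hypothetical extra key holding an array (a Python list).
shoe["sizesInStock"] = [37, 38, 39]

# Convert the dict back to JSON text; keys and strings get double quotes.
text = json.dumps(data, indent=4)
print(text)
```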

<br>

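JSON also has a set of datatypes, namely *Object, String, Number, Boolean, Null* and *Array*. The `//` annotations in the example below are for illustration only, as standard JSON does not allow comments.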
```json
{
    "customer": { //object
        "name": "John Doe", //string
        "weightInKg": 70, //numeral
        "head": {
            "hair": {
                "hairColour": "brunette",
                "length": "short",
                "style": "crew-cut"
            },
            "eyeColour": "blue",
            "piercings": null //null
        },
        "tatoos": ["dove", "eagle", "crane"], //array
        "isMarried": false //boolean
    }
}
```

The datatypes break down as follows:
* The attributes `customer` and `head` are of type **Object**. An object is a list of key-value pairs surrounded by curly brackets; the outermost object is also known as "the root".
* Attributes like `name`, `hairColour`, etc. have values of type **String**. Strings can comprise any Unicode characters and are always enclosed in double quotes. Escape sequences, that is characters preceded by a backslash (`\`), tell the parser how to treat special characters; for example, `\"` is a literal double quote and `\n` is a newline.
* The attribute `weightInKg` has a value of type **Number**. Numbers are integers, decimals, negative numbers or exponents, and are written without double quotes. Very large numbers, like the mass of Earth in kg, are denoted using E notation.
* The attribute `piercings` has a value of type **Null**. In programming terms, `null` represents none, that is the absence of a value, which is useful because for certain attributes a numerical quantity such as `0` does not make sense.
* The attribute `tatoos` has a value of type **Array**. An array is a container for comma-separated items. An array is similar to a JSON object, but it only stores the **values**. Arrays also use an indexing notation from `0` to `n-1`, where `n` is the number of items in the array.
* The attribute `isMarried` has a value of type **Boolean**. Boolean data only has 2 values, `True` or `False`. In JSON, these values are **always in lowercase**: `true` or `false`.
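
The breakdown above can be checked with a short Python sketch. It parses a trimmed version of the customer example (without the `//` annotations, which a standard JSON parser rejects) and reports which Python type each JSON datatype becomes. The `python_types` dictionary and the Earth-mass figure of `5.972e24` kg are illustrative choices made here.

```python
import json

# Trimmed copy of the customer example, as valid JSON (no comments).
CUSTOMER_JSON = """
{
    "customer": {
        "name": "John Doe",
        "weightInKg": 70,
        "head": {"eyeColour": "blue", "piercings": null},
        "tatoos": ["dove", "eagle", "crane"],
        "isMarried": false
    }
}
"""

customer = json.loads(CUSTOMER_JSON)["customer"]

# Name of the Python type that each JSON datatype is converted to.
python_types = {
    "Object": type(customer["head"]).__name__,
    "String": type(customer["name"]).__name__,
    "Number": type(customer["weightInKg"]).__name__,
    "Null": type(customer["head"]["piercings"]).__name__,
    "Array": type(customer["tatoos"]).__name__,
    "Boolean": type(customer["isMarried"]).__name__,
}
for json_type, python_type in python_types.items():
    print(f"{json_type:8} maps to {python_type}")

# E notation for a very large number (approximate mass of Earth in kg).
earth_mass_kg = json.loads("5.972e24")

# Arrays are indexed from 0, so this is the first item.
first_tattoo = customer["tatoos"][0]
```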

---
## MongoDB
MongoDB is a NoSQL document-oriented database. It stores documents using Binary JSON (BSON), which is a JSON-like data structure with additional datatypes.

### Basic Concepts

* **Documents** - a document is the basic unit of data. It is similar to a row in relational databases. It stores data in a structure that is an ordered set of type-sensitive and case-sensitive key-value pairs, which can be mapped naturally to most programming languages. Since MongoDB uses BSON data structures, *values* follow the same rules as the JSON format, and *keys* use any UTF-8 characters except `\0` (end of key), period (`.`) and dollar (`$`), which are reserved characters. Duplicate *keys* are also not allowed.

 ```python
# keys are case-sensitive: these two documents are distinct
{"name": "John"}
{"Name": "John"}
# values are type-sensitive: these two documents are also distinct
{"age": 56}
{"age": "56"}
# not allowed: the same key appears twice within one document
{"age": 56, "age": "56"}
 ```


* **Collections** - are the construct that stores the documents. A collection has a dynamic schema because its documents can have different *shapes* (referring to the different number of key-value pairs). It is still important to use separate collections to store related documents, as this makes querying, aggregation and indexing more efficient. This is likened to storing related data in tables in a relational database.

 Collection names follow rules similar to those for document keys, with a few additions. The naming convention specifies that there must be no `\0` (end of key) or `$` (dollar) characters and that the empty string (`""`) is not valid. Also avoid naming collections starting with `system.`, as that prefix is reserved for internal collections.
 
* **Databases** - these store the collections. Like RDBMS, MongoDB is able to manage multiple databases and within each database, there are multiple collections and within each collection, there are multiple documents. A rule of thumb is to store all data for a single application in the same database.

 ![rdbms_mongo_same.png](attachment:1441b2e7-298d-4cb2-9476-21d710fab01e.png)
 
  Like naming collections, the naming of databases follows similar rules but with more restrictions: empty strings (`""`) are not valid; the characters `/` (forward slash), `\` (backslash), `.` (period), `"` (double quotes), `*` (asterisk), `<` (less than), `>` (greater than), `:` (colon), `|` (pipe), `?` (question mark), `$` (dollar), ` ` (single whitespace) and `\0` (end of key) are not allowed; names are case-insensitive; and names are limited to a maximum of 64 bytes (approximately 64 characters if we are using only the English alphabet).
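
The database naming rules above can be expressed as a small check. The Python sketch below is only an illustration: `is_valid_db_name` and `names_clash` are hypothetical helpers, not part of MongoDB or PyMongo, and the 64-byte limit is taken as stated in the text rather than verified against a particular server version.

```python
# Characters that the text lists as not allowed in a database name:
# / \ . space " * < > : | ? $ and the NUL character.
FORBIDDEN_DB_CHARS = set('/\\. "*<>:|?$\0')

MAX_DB_NAME_BYTES = 64  # limit as stated in the text (assumption)


def is_valid_db_name(name: str) -> bool:
    """Return True when the name passes the rules listed above."""
    if name == "":
        return False
    if any(ch in FORBIDDEN_DB_CHARS for ch in name):
        return False
    # The limit is in bytes, so measure the UTF-8 encoded length.
    return len(name.encode("utf-8")) <= MAX_DB_NAME_BYTES


def names_clash(first: str, second: str) -> bool:
    """Names are case-insensitive, so these would refer to the same database."""
    return first.lower() == second.lower()


for candidate in ["school", "my db", "sales.2020", "", "a" * 100]:
    print(repr(candidate[:12]), is_valid_db_name(candidate))
print(names_clash("School", "school"))
```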

### Datatypes
Since MongoDB documents use BSON data structures, MongoDB not only supports the same 6 datatypes that JSON supports but also has additional datatypes. Let's recap the 6 supported JSON datatypes and how they are used in MongoDB:

* **Null** - is used to represent a `null` value or a non-existent field. Same in MongoDB.
* **Boolean** - used for the values `true` and `false`. Same in MongoDB.
* **Number** - which supports integers, decimals, negative numbers and exponentials. MongoDB splits this classification up into *32-bit integers*, *64-bit integers*, *doubles* and *decimal128*, which is a 128-bit decimal floating-point number (used especially for monetary applications to ensure precision and accuracy).
* **String** - UTF-8 characters. Same in MongoDB.
* **Object** - used for an embedded JSON object. In MongoDB it is the same: an object is a document, and documents can have embedded documents.
* **Array** - a list of values that can be represented as an array. Same in MongoDB.

Some of the additional datatypes are:

* **Binary Data** - string of arbitrary bytes. This datatype is the only option used to store non UTF-8 strings.
* **ObjectId** - this is the 12-byte unique id of each document stored. It is the default type of the *key* named `_id`. These ids can either be autogenerated by MongoDB or allocated by the application.
* **Date** - dates are stored as `ISODate` objects, which use the ISO 8601 date and time format.
 ```python
 { "_id" : ObjectId("5f33b737263f9a36e29149f0"), "c" : ISODate("2020-08-12T09:32:39.864Z") }
 ```
* **Regular Expression** - stores regular expressions in the PCRE (Perl Compatible Regular Expressions) format.
* **Code** - this represents JavaScript code.

### Create, Read, Update and Delete (CRUD)

Unlike relational databases, which use SQL, NoSQL databases use their own query languages that resemble function calls, and each interaction is done through a variety of function calls. However, there is a **caveat**: depending on the driver you are using to interact with MongoDB, the function names differ slightly.

For example, the function to insert 1 document to a collection is `insertOne()` in Mongo Shell but it is `insert_one()` in PyMongo.

> Make sure everyone can open MongoDB shell

Mongo Shell will connect to the MongoDB Server on the local host, and the default database it will use is the `test` database. MongoDB does not actually create a database until a collection or a record within a collection has been added to it. **Note** that Mongo Shell functions use the `db` variable at the start of every function call. This `db` variable refers to the current database, as selected by the `use` command.

The way to call the functions also involves chaining the collection name to the function call, so a call has the structure `db.<collection_name>.<function_name>()`. A full range of Mongo Shell functions and commands can be found [here](https://docs.mongodb.com/manual/reference/method/).

Useful commands to help show information on current database or collections:

* `use <database_name>` - switch databases, also "creates" the database
* `show databases` or `show dbs` - show all current databases managed by MongoDB
* `show collections` - show the current collections in the database
* `show users` - list the users that can use this database
* `show roles` - list all the roles (user-defined and built-in) affixed to the current database
* `show profile` - lists the 5 most recent operations that took more than 1 millisecond to complete
* `help` or `db.help()` or `db.collection.help()`- show help (general), database related functions and collection related functions respectively.

#### Insert Function
Inserting documents into a collection can be done either directly using the function parameters or externally then passed to the function parameters.

![db_insert_breakdown.png](attachment:c94c296f-5ff2-4596-a6c2-ea20111aeb1f.png)

Insert into a collection functions are as follows:
* `db.collection.insertOne()` - insert one document to the collection either directly or using a variable.

 ![db_insertOne.PNG](attachment:4bef117e-044f-456b-95c3-fbf54a144c9b.PNG)
 
* `db.collection.insertMany()` - insert multiple documents to the collection. To insert more than 1 document, either group your documents in a list then assign it to a variable, or directly type them into the function parameter as a list of comma-separated documents. In addition, this function has an `ordered` parameter, which defines whether the documents are inserted in order (`true`, the default), in which case the insert stops at the first document that fails, or unordered (`false`), in which case the remaining documents are still attempted after a failure.

**Note** that the insert functions do very little validation. They only check the basic document structure and add an object id (`_id` field) should none be provided by the application. One of the basic document structure checks is that the **size of each document** must be **less than** 16MB. It is also fairly easy to insert invalid data, so only allow trusted sources such as applications to perform insert operations on the databases.

#### Delete Function
Deleting or removing documents, collections or databases is done via the `delete` or `drop` functions. The `delete` functions take a delete filter to target the documents. The functions are:

* `db.collection.deleteOne()` - removes the first document that matches the delete filter from the collection.
* `db.collection.deleteMany()` - removes all documents that match the delete filter from the collection.
* `db.collection.drop()` - delete the specified collection.
* `db.dropDatabase()` - delete the current database.

The delete filter (pertaining to the documents) consists of one or more key-value pairs that the documents to be deleted must contain. For example, if we have a document with the following data

```python
{
   _id: ObjectId("563237a41a4d68582c2509da"),
   stock: "Brent Crude Futures",
   qty: 250,
   type: "buy-limit",
   limit: 48.90,
   creationts: ISODate("2015-11-01T12:30:15Z"),
   expiryts: ISODate("2015-11-01T12:35:15Z"),
   client: "Crude Traders Inc."
}
```

To delete a single document, the delete filter would consist of a unique key-value pair (such as `_id: ObjectId("563237a41a4d68582c2509da")`) from this document. If we wanted to delete multiple documents, the delete filter would use a common key-value pair like `client: "Crude Traders Inc."` to remove all documents whose client is *Crude Traders Inc.*

![db_delete_breakdown.png](attachment:314193b1-dfc2-432d-b33f-4849d00b06bf.png)
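
To see how a delete filter selects documents, the following Python sketch models a collection as a plain list of dictionaries. The functions `matches`, `delete_one` and `delete_many` are hypothetical stand-ins that only mimic equality filters; they are not the MongoDB or PyMongo API. The sample orders are made up for illustration, with small integers in place of `ObjectId` values.

```python
def matches(doc: dict, flt: dict) -> bool:
    """True when every key-value pair of the filter appears in the document."""
    return all(key in doc and doc[key] == value for key, value in flt.items())


def delete_one(collection: list, flt: dict) -> int:
    """Remove only the first matching document; return how many were removed."""
    for index, doc in enumerate(collection):
        if matches(doc, flt):
            del collection[index]
            return 1
    return 0


def delete_many(collection: list, flt: dict) -> int:
    """Remove every matching document; return how many were removed."""
    kept = [doc for doc in collection if not matches(doc, flt)]
    removed = len(collection) - len(kept)
    collection[:] = kept  # update the list in place
    return removed


orders = [
    {"_id": 1, "stock": "Brent Crude Futures", "qty": 250, "client": "Crude Traders Inc."},
    {"_id": 2, "stock": "Brent Crude Futures", "qty": 100, "client": "Crude Traders Inc."},
    {"_id": 3, "stock": "Gold Futures", "qty": 10, "client": "Bullion Partners"},
]

removed_one = delete_one(orders, {"_id": 1})                         # unique pair
removed_many = delete_many(orders, {"client": "Crude Traders Inc."})  # common pair
print(removed_one, removed_many, orders)
```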

#### Update Function
Whether it is updating a single key-value pair or the whole document, one of the following update functions can be used.

* `db.collection.updateOne()` - updates a single document in the collection.
* `db.collection.updateMany()` - updates multiple documents in the collection.
* `db.collection.replaceOne()` - replaces a single document in the collection.

The *update* functions each take a filter as the first parameter and a modifier document as the second parameter, which describes the changes to be made. The *replace* function takes a filter as the first parameter and a replacement document as the second parameter to replace the document.

**Note** that issuing 2 update commands at the same time results in first come, first served behaviour by the server. This is the default behaviour, so conflicting updates sent in rapid-fire succession will not corrupt any documents, but the last update function call will "win". If that is not desired, you may want to research and then implement the Document Versioning pattern.

**Example: Update using the `replaceOne()` function**
```python
{ "_id" : 1, "name" : "Jambo Seafood", "Orchard" : "Ion" },
{ "_id" : 2, "name" : "Crystal Jade Palace", "Orchard" : "Takashimaya", "stars" : 3 },
{ "_id" : 3, "name" : "Chocolate Origin", "Orchard" : "313 Somerset", "stars" : 4 }
```
We would like to replace the document with the id `1` with another document with the missing information. 
![db_replace_breakdown.png](attachment:b3b0054d-f695-40c7-bf6f-039a7aeea751.png)

Rule of thumb when using the `replaceOne` function, it is better to use the `_id` key-value pair as the replacement filter as it is always guaranteed to be unique to the document.

Generally, we don't want to replace the whole document but only certain attributes; that is when the `update` functions should be used. The `update` functions are very versatile in the sense that they allow us to specify complex update operations, such as altering, adding and removing key-value pairs, and manipulating arrays and embedded documents, via the modifier document.

**Example: Using the update function**

Let's say that we have a customer record below:
```python
{
    "_id" : ObjectId("4b253b067525f35f94b60a31"),
    "name" : "joe",
    "age" : 30,
    "sex" : "male",
    "data" : {
        "weightInKg" : 70,
        "heightInCm" : 175
    },
    "favorite book" : "War and Peace"
}
```

We would like to increment the `age` and `weightInKg` values in the customer's document.
![db_update_breakdown.png](attachment:e43a21ba-e168-4364-8f19-1061ba574882.png)

Notice that the modifier document's key has the dollar (`$`) symbol preceding it. Keys like this are MongoDB operators, and they are used in query, update and aggregation functions. More than 1 operator can be used in the *modifier document*. The list of all the MongoDB update operators can be found [here](https://docs.mongodb.com/manual/reference/operator/update/). The more common update operators that deal with key-value pairs are listed below:

* `$inc` - increments the value of the key by the specified amount
* `$mul` - multiplies the value of the key by the specified amount
* `$rename` - rename the key
* `$set` - sets value of the key or add the key-value pair if the pair is not in the document
* `$unset` - removes the key from the document
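
To make the effect of these operators concrete, here is a simplified Python model that applies a modifier document to a plain dictionary. `apply_update` and `_parent_and_leaf` are hypothetical helpers written for this illustration, not MongoDB or PyMongo functions. The model supports dotted keys such as `data.weightInKg`, creates missing embedded documents along the way (a simplification), and treats the new name in `$rename` as a plain key in the same embedded document.

```python
def _parent_and_leaf(doc: dict, dotted_key: str):
    """Walk 'a.b.c' down to the dict that holds the last key."""
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        doc = doc.setdefault(part, {})  # simplification: create if missing
    return doc, leaf


def apply_update(doc: dict, modifier: dict) -> dict:
    """Apply a modifier document such as {"$inc": {"age": 1}} in place."""
    for operator, changes in modifier.items():
        for key, argument in changes.items():
            target, leaf = _parent_and_leaf(doc, key)
            if operator == "$inc":
                target[leaf] = target.get(leaf, 0) + argument
            elif operator == "$mul":
                target[leaf] = target.get(leaf, 0) * argument
            elif operator == "$set":
                target[leaf] = argument
            elif operator == "$unset":
                target.pop(leaf, None)
            elif operator == "$rename":
                if leaf in target:
                    target[argument] = target.pop(leaf)
            else:
                raise ValueError(f"unsupported operator: {operator}")
    return doc


# The customer record from the example above (string id for simplicity).
customer = {
    "_id": "4b253b067525f35f94b60a31",
    "name": "joe",
    "age": 30,
    "sex": "male",
    "data": {"weightInKg": 70, "heightInCm": 175},
    "favorite book": "War and Peace",
}

apply_update(customer, {
    "$inc": {"age": 1, "data.weightInKg": 2},
    "$set": {"data.heightInCm": 176},
    "$unset": {"sex": ""},
    "$rename": {"name": "firstName"},
})
print(customer)
```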

<br>

So what happens if the value is of type `array`?
* **Adding elements** - use the `$push` operator and its modifiers to add elements, optionally at a given array index

 ```bash
db.collection.updateOne(
   { _id: 1 },
   {
     $push: {
        scores: {
           $each: [ 50, 60, 70 ],
           $position: 0
        }
     }
   }
)
 ```
 The above example inserts all the elements in the array `[50, 60, 70]` in the key `scores` at position `0` (front of the existing array).  It can also be used to add a new array when there is no existing *key* for this array.
 
* **Removing elements** - there are a few ways we can do that. We can treat the array like a list, queue or stack, then we can use the `$pop` operator with the *key* name and either a `-1` or `1` to remove the first or last element of the array, respectively.

```
// removes the first item of an array
db.collection.updateOne( { _id: 1 }, { $pop: { scores: -1 } } )
// removes the last item of an array
db.collection.updateOne( { _id: 1 }, { $pop: { scores: 1 } } )
```

 The `$pull` operator removes elements based on the specified condition. The code below will remove all elements in the `scores` array that are greater than or equal to (`$gte`) 100.
 
 ```
db.collection.update( { _id: 1 }, { $pull: { scores: { $gte: 100 } } } )
 ```

* **Modifying array elements based on position** - arrays use the same indexing concept as most programming languages, that is, the first element has an index of `0` and the last element has an index of `n-1`, where `n` is the length of the array. If we know the position of the element in the array, we can use it directly, as in `{"$inc" : {"comments.5.votes" : 1} }`, where the `5` denotes the 6th element of the array. When we do not know the position of the element, we use the positional operator `$` coupled with the filter to modify the first matching element.

 ```
db.collection.updateOne( {"comments.name": "Amy"}, {$set: {"comments.$.content": "changed"} } )
 ```
 
 The filtered positional operator `$[<identifier>]` can be coupled with array filters to modify all elements that match the specified condition.
 
 **Example: Updating all elements that are greater than or equal to `100` in the `grades` array**
 
 ```
 db.students.update(
    { },
    { $set: { "grades.$[element]" : 100 } },
    { multi: true,
      arrayFilters: [ { "element": { $gte: 100 } } ]
    }
 )
 ```
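
The array operators above behave much like ordinary list operations. The following Python sketch models `$push` (with `$each` and `$position`), `$pop` and `$pull` on a plain list; `push_each`, `pop_end` and `pull` are hypothetical names chosen for this illustration and are not MongoDB or PyMongo functions.

```python
def push_each(array: list, values: list, position=None) -> list:
    """Model of $push with $each: append, or insert at $position if given."""
    if position is None:
        array.extend(values)
    else:
        array[position:position] = values  # insert without replacing anything
    return array


def pop_end(array: list, direction: int) -> list:
    """Model of $pop: -1 removes the first element, 1 removes the last."""
    if array:
        array.pop(0 if direction == -1 else -1)
    return array


def pull(array: list, condition) -> list:
    """Model of $pull: remove every element for which condition(x) is true."""
    array[:] = [x for x in array if not condition(x)]
    return array


scores = [80, 120]
push_each(scores, [50, 60, 70], position=0)   # insert at the front
after_push = list(scores)

pop_end(scores, -1)                           # drop the first element
pop_end(scores, 1)                            # drop the last element
after_pop = list(scores)

push_each(scores, [100, 150])                 # append at the end
pull(scores, lambda score: score >= 100)      # like { $gte: 100 }
print(after_push, after_pop, scores)
```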

#### Query (Read) Functions
The read/querying functionality of MongoDB consists of only 2 functions (listed below), but it has a rich set of operators.

* `db.collection.findOne()` - returns the first document matching the query criteria.
* `db.collection.find()` - returns a `cursor` object that points to all the documents that match the query criteria.

The difference between the `find()` and `findOne()` functions is that `findOne()` is equivalent to `find()` with the `limit()` function set to `1`, except that it returns the document itself rather than a cursor. The `limit()` function is available from the `Cursor` object.

![db_find_breakdown.png](attachment:ee2e0996-8a9e-492e-bca8-e75d867ea571.png)

These functions both take a *query criteria* as the first parameter and a *projection* as the second parameter. The *projection* parameter is used to return only the specified keys. Omitting it will return all keys in the matching document.

* The *query criteria* is also able to accept multiple conditions such as `{ age: {$gt: 40}, name: "Joe" }`, which means to find all documents where the `age` is greater than 40 **and** the `name` is equal to "Joe".
* The *projection* parameter is used to specify which *keys* to return from the document.

The *projection* document accepts the *key* names and a boolean value of either `1` (True) or `0` (False). Remember that a document has many *keys*, and it is not always necessary to return the whole document, just the relevant information. The `0` (False) value for a *key* means that the key is to be excluded from the results. The caveat is that a *projection* **cannot** have a mix of inclusion and exclusion values.

All the key-value pairs in the *projection* document have to consist fully of either *keys* that are to be returned (inclusion) or *keys* that we do not want (exclusion), save for the `_id` *key*, which is exempt from this rule. By default, the `_id` *key* is returned, so it rarely needs to be included explicitly; coupling `_id: 1` with exclusion values will generally result in an error.

**Example: Valid and Invalid projection documents**

```python
# invalid, means we want '_id' and not 'MatrNum'
# change to {"MatrNum":1} to be valid
{"_id":1, "MatrNum":0}

# valid, means we want 'MatrNum' and 'name' and not '_id'
{"_id":0, "MatrNum":1, "name":1}
```
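
The projection rules can be modelled in a few lines of Python. The `project` function below is a hypothetical helper, not a MongoDB or PyMongo call; it follows the rules as stated in this section (including rejecting `_id: 1` combined with exclusions, as in the invalid example above), and a real server may behave differently depending on its version. The sample student document is made up for illustration.

```python
def project(doc: dict, projection: dict) -> dict:
    """Return a copy of doc that keeps only the keys the projection allows."""
    flags = {key: bool(value) for key, value in projection.items() if key != "_id"}
    id_flag = projection.get("_id")  # None when "_id" is not mentioned
    modes = set(flags.values())

    if len(modes) > 1:
        raise ValueError("projection mixes inclusion and exclusion")
    if modes == {False} and id_flag:
        raise ValueError("'_id': 1 combined with exclusions")

    inclusion = modes == {True} or (not modes and bool(id_flag))
    if inclusion:
        wanted = set(flags)
        if id_flag is None or id_flag:   # "_id" is returned by default
            wanted.add("_id")
        return {key: value for key, value in doc.items() if key in wanted}

    hidden = set(flags)
    if id_flag is not None and not id_flag:
        hidden.add("_id")
    return {key: value for key, value in doc.items() if key not in hidden}


student = {"_id": 7, "MatrNum": "A001", "name": "Amy", "course": "BSc Mechanical Engineering"}

only_named = project(student, {"_id": 0, "MatrNum": 1, "name": 1})
with_default_id = project(student, {"MatrNum": 1})
without_course = project(student, {"course": 0})
print(only_named, with_default_id, without_course)
```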

From the figure, we can spot another operator, the greater than (`$gt`) operator. These operators are called *query selectors*, and they come in several types. The full list can be found [here](https://docs.mongodb.com/manual/reference/operator/query/), but let's go through some of them. Bear in mind that these operators are also applicable for use in the *update filters* of the update functions.

* `$in` - a comparison operator that selects documents where the value of the *key* matches any of the values in the specified array. In the example below, the `$in` operator is used to find all the documents that have either a `5` or `15` value for the `qty` *key*.

 **Example: Usage of `$in` operator**
 
 ```bash
# document
{ _id: 1, item: "abc", qty: 15, tags: [ "school", "clothing" ], sale: false }
db.collection.find( { qty: { $in: [ 5, 15 ] } } )
 ```
 
 
* `$and`, `$or`, `$nor` and `$not` - these are logical operators that join multiple query conditions. They work like the logical operators in most programming languages.
 
 **Example: Usage of logical operators**
 
 ```bash
db.collection.find( {
    $and: [
        { $or: [ { qty: { $lt : 10 } }, { qty : { $gt: 50 } } ] },
        { $or: [ { sale: true }, { price : { $lt : 5 } } ] }
    ]
} )
 ```
 
 
* `$type` - querying by the BSON datatype of the *value* in the key-value pairs. Take the zipcode value as an example; it is a string of digits that is location based. Because it is made of digits, most people would store it as `integer` or `long`, but because the value does not behave like a number, it can also be stored as a `string`. Documents can get messy when a particular key (zipcode) has different *value* datatypes, so finding and correcting them is required.

 **Example: Usage of `$type` operators**
 
 ```bash
db.collection.find( { "zipCode" : { $type : "number" } } );
 ```
 
 
* `$slice` - is used to return a subset of elements for *keys* that have *array* type values. This operator takes either a single number or an array of 2 numbers. For a single number `n`, it returns the first `n` elements when `n` is positive or the last `n` elements when it is negative; for example, `{"$slice" : 10}` returns the first 10 elements. For an array of 2 numbers, it *skips* the number of elements given by the first number and then *returns* up to the number of elements given by the second number; for example, `{"$slice" : [23, 10]}` skips the first 23 elements and then returns the 24th element to the 33rd element.
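
The two forms of `$slice` correspond closely to Python list slicing. The sketch below uses a hypothetical helper, `slice_projection`, to mimic them on a plain list; it is an illustration of the behaviour described above, not a MongoDB or PyMongo function.

```python
def slice_projection(array: list, spec) -> list:
    """Mimic {"$slice": n} and {"$slice": [skip, limit]} on a Python list."""
    if isinstance(spec, int):
        # Positive n: first n elements. Negative n: last n elements.
        return array[:spec] if spec >= 0 else array[spec:]
    skip, limit = spec
    # A negative skip counts back from the end of the array.
    start = skip if skip >= 0 else max(len(array) + skip, 0)
    return array[start:start + limit]


comments = list(range(1, 41))  # stand-in for an array of 40 comments

first_ten = slice_projection(comments, 10)
last_three = slice_projection(comments, -3)
page = slice_projection(comments, [23, 10])   # skip 23, return the next 10
print(first_ten, last_three, page)
```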


It was mentioned earlier that the `find()` function returns a `Cursor` object. This object has the ability to limit the number of results, skip over some results, sort the results by any combination of keys, and more. Cursors are also iterator objects, so depending on the driver used, built-in iterator functions can be used on them.

**Example**

```bash 
db.collection.find({"course": "BSc Mechanical Engineering" }).limit(2)
```

---
## Summary

* Compared the differences, benefits and drawbacks of RDBMS and NoSQL
* Learnt in greater detail about the XML and JSON data formats
* Were introduced to MongoDB
* Performed CRUD operations with MongoDB