# MongoDB Tutoring

## 1. The introduction of MongoDB

### 1.1 What is MongoDB?

MongoDB is a database based on distributed file storage. It is written in C++ and designed to provide scalable, high-performance data storage solutions for web applications. MongoDB is a high-performance, open-source, schema-less document database. 
MongoDB describes itself as a bridge between key-value stores (high performance and high scalability) and traditional RDBMSs (rich queries and functionality).

### 1.2 The Document and BSON in MongoDB

In MongoDB, data are stored as BSON, for example:
{
	name: "ruizhi",
	age: "23",
	groups: ["art", "game"]
}

The basic unit of data in MongoDB is called a document. It is the core concept of MongoDB and is composed of multiple key-value pairs placed together in order. Documents correspond to rows in a relational database.
Data in MongoDB are stored in the form of BSON (Binary JSON). BSON (Binary Serialized Document Format) is a binary, JSON-like storage format. Like JSON, it supports embedded document objects and array objects, but BSON also has data types that JSON lacks, such as the Date and BinData types.
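To see why these extra types matter, here is a small standard-library sketch (plain Python, not MongoDB code). Python's `json` module refuses a `datetime` or `bytes` value, which BSON would store natively as Date and BinData. The document fields are hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical document: "joined" maps to BSON's Date type and "avatar" to BinData.
doc = {
    "name": "ruizhi",
    "joined": datetime(2020, 4, 25, tzinfo=timezone.utc),
    "avatar": b"\x89PNG",
}

def to_json_or_error(document):
    """Return JSON text, or the error message if JSON cannot represent a value."""
    try:
        return json.dumps(document)
    except TypeError as exc:
        return f"TypeError: {exc}"

print(to_json_or_error(doc))

# Plain JSON has to fall back to strings, so the original types are lost.
as_strings = {**doc, "joined": doc["joined"].isoformat(), "avatar": doc["avatar"].hex()}
print(json.dumps(as_strings))
```

With pymongo, the original `doc` could be inserted as-is; the driver encodes `datetime` and `bytes` values into the corresponding BSON types.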

### 1.3 Features

MongoDB is a database for document storage, which is relatively simple and easy to operate.  
You can index any attribute in a MongoDB record (for example FirstName = "Sameer" or Address = "8 Gandhi Road") to make queries and sorting faster.  
Data can be mirrored locally or across the network (replication), which makes MongoDB more scalable.  
If the load increases (more storage space and more processing power are needed), the data can be distributed across other nodes in the network. This is called sharding.  
MongoDB supports rich query expressions. Queries are written as JSON-style documents, which makes it easy to query embedded objects and arrays.  
MongoDB can use the update() command to replace an entire document or only specified fields.  
Map/reduce in MongoDB is mainly used for batch processing and aggregation of data.  
It works in two steps, Map and Reduce: the Map function is called for every record in the collection and calls emit(key, value), and the emitted keys and values are passed to the Reduce function for processing.  
The Map and Reduce functions are written in JavaScript, and map-reduce operations can be run through db.runCommand or the mapReduce command.  
GridFS is a built-in feature of MongoDB for storing large files (larger than the 16 MB document size limit) by splitting them into chunks.  
MongoDB allows scripts to run on the server side. You can write a function in JavaScript and execute it directly on the server, or store the function definition on the server and call it directly later.  
MongoDB supports many programming languages: Ruby, Python, Java, C++, PHP, C#, and others.  
MongoDB is easy to install.  
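The two-step map/reduce flow described above can be sketched in plain Python. This is only a model of the idea: in MongoDB the map and reduce functions are JavaScript running on the server, and reduce may be called repeatedly on partial results, so it must give the same answer when re-applied. Map-reduce has also been deprecated since MongoDB 5.0 in favor of the aggregation pipeline. The sample documents and function names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample documents, shaped like the articles used later in this tutorial.
sample_articles = [
    {"author": "Luxun", "tags": ["society", "history", "critical"]},
    {"author": "Ruizhi", "tags": ["first tag", "second tag"]},
    {"author": "Nuo", "tags": ["first tag", "second tag"]},
]

def map_fn(doc, emit):
    # Like a MongoDB map function: emit one (key, value) pair per tag.
    for tag in doc["tags"]:
        emit(tag, 1)

def reduce_fn(key, values):
    # Like a MongoDB reduce function: combine all values emitted for one key.
    return sum(values)

def map_reduce(docs, mapper, reducer):
    """Group every emitted value by key, then reduce each group once."""
    grouped = defaultdict(list)
    for doc in docs:
        mapper(doc, lambda k, v: grouped[k].append(v))
    return {key: reducer(key, values) for key, values in grouped.items()}

tag_counts = map_reduce(sample_articles, map_fn, reduce_fn)
print(tag_counts)
```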

## 2. Getting Started with MongoDB
### 2.1 Make a connection through pymongo

First, we need to import pymongo.

In [1]:
import pymongo

Then we create a MongoClient to connect to the running MongoDB instance:

In [2]:
from pymongo import MongoClient
client = MongoClient()

The above code connects to the default host and port (localhost:27017), but we can also specify the host and port explicitly:

In [3]:
client = MongoClient("localhost", 27017)

We can also make the connection using the MongoDB URI format:

In [4]:
client = MongoClient('mongodb://localhost:27017/')
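When the server requires a login, the credentials go inside the URI, and the pymongo documentation recommends escaping them with `urllib.parse.quote_plus` so characters such as `@` or `:` do not break parsing. The helper `build_mongo_uri` and the credentials below are hypothetical; this is a sketch, not part of pymongo.

```python
from urllib.parse import quote_plus

def build_mongo_uri(host="localhost", port=27017, user=None, password=None):
    """Build a mongodb:// connection string, percent-escaping any credentials."""
    credentials = ""
    if user is not None:
        credentials = f"{quote_plus(user)}:{quote_plus(password or '')}@"
    return f"mongodb://{credentials}{host}:{port}/"

print(build_mongo_uri())                                    # mongodb://localhost:27017/
print(build_mongo_uri(user="tutor", password="p@ss:word"))  # mongodb://tutor:p%40ss%3Aword@localhost:27017/
```

The resulting string is passed to `MongoClient` exactly like the URI above.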

### 2.2 Create a Database

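To create a database in MongoDB, we only need to specify its name. MongoDB creates the database lazily: it actually appears on the server only after the first document is stored in it.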

In [5]:
db = client['tutoring']

## 3. Basic operations in MongoDB

### 3.1 Inserting a Document

To insert a document into MongoDB, we use the insert_one() method.
A document is a single record of data, and a collection is a group of documents. For example, all user information might be stored in the users collection, with each user's information stored as one user document. In the mongo shell, the insertion looks like this:
db.users.insertOne(user);
If the collection exists, the document is added to it. If the collection does not exist, MongoDB first creates the collection and then saves the document.

In [6]:
article = {
    "author": "Luxun",
    "description":"the most important book in China",
    "tags": ["society","history","critical"]
}

In [7]:
articles = db.articles
result = articles.insert_one(article)

When the document is inserted, an _id is generated automatically (unless the document already has one). We can print it as follows:

In [8]:
print("the id is: {}".format(result.inserted_id))

the id is: 5ea3906d920957af0a71267d
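The _id printed above is an ObjectId, a 12-byte value whose first 4 bytes are a big-endian Unix timestamp (in seconds) recording when it was created. pymongo exposes this as `ObjectId.generation_time`; the standard-library sketch below decodes it by hand so the structure is visible. The helper name `objectid_timestamp` is hypothetical.

```python
from datetime import datetime, timezone

def objectid_timestamp(oid_hex):
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # big-endian seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(objectid_timestamp("5ea3906d920957af0a71267d"))  # 2020-04-25 01:20:45+00:00
```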


The collection named articles has now been created, and we can confirm that with the list_collection_names() method:

In [9]:
db.list_collection_names()

['articles']

Besides single documents, we can insert multiple documents at once with insert_many(). Its parameter is a list whose elements are documents, which pymongo encodes as BSON, for example:

In [10]:
article1 = {"author": "Ruizhi",
            "about": "Book one",
            "tags":
                ["first tag","second tag"]}

article2 = {"author": "Nuo",
            "about": "Book two",
            "tags":
                ["first tag", "second tag"]}

new_articles = articles.insert_many([article1, article2])

print("The new article IDs are {}".format(new_articles.inserted_ids))

The new article IDs are [ObjectId('5ea3906d920957af0a71267e'), ObjectId('5ea3906d920957af0a71267f')]


### 3.2 Retrieving a Single Document with find_one()

find_one() returns a single document that matches the query. Called without a filter, as below, it returns the first document it finds in the collection:

In [11]:
print(articles.find_one())

{'_id': ObjectId('5ea3906d920957af0a71267d'), 'author': 'Luxun', 'description': 'the most important book in China', 'tags': ['society', 'history', 'critical']}


### 3.3 Finding all Documents with find()

Use the find() method to query and retrieve data in MongoDB. It takes two optional parameters: filter, the query condition, and projection, the fields to return. If no filter is passed, all documents in the collection are returned:

In [12]:
for article in articles.find():
    print(article)

{'_id': ObjectId('5ea3906d920957af0a71267d'), 'author': 'Luxun', 'description': 'the most important book in China', 'tags': ['society', 'history', 'critical']}
{'_id': ObjectId('5ea3906d920957af0a71267e'), 'author': 'Ruizhi', 'about': 'Book one', 'tags': ['first tag', 'second tag']}
{'_id': ObjectId('5ea3906d920957af0a71267f'), 'author': 'Nuo', 'about': 'Book two', 'tags': ['first tag', 'second tag']}


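#### 3.3.1 Question: How do we find specific items in a collection?

Exercise: Sometimes we want to search for items with a specific value. We can still use find_one() or find(); try to figure out how to find the item whose author is Luxun.

In [13]:
# The first argument of find() is the filter (which documents to return);
# the optional second argument is a projection (which fields to return).
# find() returns a Cursor, so iterate over it to see the matching documents.
for article in articles.find({"author": "Luxun"}):
    print(article)

{'_id': ObjectId('5ea3906d920957af0a71267d'), 'author': 'Luxun', 'description': 'the most important book in China', 'tags': ['society', 'history', 'critical']}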
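The two arguments of find() do different jobs: the filter chooses which documents come back, and the projection chooses which fields of those documents are returned. The deliberately simplified in-memory model below (plain Python, not pymongo) shows both, including MongoDB's rule that an equality condition on an array field matches when any element is equal. Integer _id values stand in for ObjectIds, and the helper names are hypothetical.

```python
# Simplified in-memory model of find(filter, projection); not pymongo itself.
docs = [
    {"_id": 1, "author": "Luxun", "tags": ["society", "history", "critical"]},
    {"_id": 2, "author": "Ruizhi", "tags": ["first tag", "second tag"]},
    {"_id": 3, "author": "Nuo", "tags": ["first tag", "second tag"]},
]

def matches(doc, flt):
    """Equality filter: every key must match; an array field matches if any element does."""
    for key, wanted in flt.items():
        value = doc.get(key)
        if isinstance(value, list) and not isinstance(wanted, list):
            if wanted not in value:
                return False
        elif value != wanted:
            return False
    return True

def project(doc, projection):
    """Inclusion projections only: keep the listed fields, plus _id unless it is 0."""
    if not projection:
        return dict(doc)
    keep = {k for k, v in projection.items() if v}
    if projection.get("_id", 1):
        keep.add("_id")
    return {k: v for k, v in doc.items() if k in keep}

def find(collection, flt=None, projection=None):
    return [project(d, projection) for d in collection if matches(d, flt or {})]

print(find(docs, {"author": "Luxun"}))                             # filter: which documents
print(find(docs, {"tags": "first tag"}, {"author": 1, "_id": 0}))  # projection: which fields
```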


### 3.4 Sorting the results

We can use the sort() method to sort the results, passing 1 for ascending order and -1 for descending order (also available as pymongo.ASCENDING and pymongo.DESCENDING):

In [14]:
doc = articles.find().sort("author", -1)

for x in doc:
    print(x)

{'_id': ObjectId('5ea3906d920957af0a71267e'), 'author': 'Ruizhi', 'about': 'Book one', 'tags': ['first tag', 'second tag']}
{'_id': ObjectId('5ea3906d920957af0a71267f'), 'author': 'Nuo', 'about': 'Book two', 'tags': ['first tag', 'second tag']}
{'_id': ObjectId('5ea3906d920957af0a71267d'), 'author': 'Luxun', 'description': 'the most important book in China', 'tags': ['society', 'history', 'critical']}
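For compound ordering, pymongo's sort() also accepts a list of (key, direction) pairs, such as `articles.find().sort([("author", pymongo.ASCENDING), ("about", pymongo.DESCENDING)])`. The plain-Python sketch below mimics that behavior on hypothetical sample data by relying on Python's stable sort: sorting by the least significant key first yields the compound order. It assumes every document has every sort key, whereas MongoDB also handles missing fields.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING

def sort_docs(docs, spec):
    """Sort like cursor.sort([(field, direction), ...]) using repeated stable sorts."""
    result = list(docs)
    # Apply the least significant key first; stability preserves earlier orderings.
    for field, direction in reversed(spec):
        result.sort(key=lambda d: d[field], reverse=(direction == DESCENDING))
    return result

# Hypothetical sample data
books = [
    {"author": "Nuo", "year": 2019},
    {"author": "Luxun", "year": 1921},
    {"author": "Nuo", "year": 2020},
    {"author": "Ruizhi", "year": 2019},
]

for book in sort_docs(books, [("year", DESCENDING), ("author", ASCENDING)]):
    print(book)
```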


### 3.5 Updating a Document

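We usually use the update_one() method to update a document. It takes a filter and an update document such as {"$set": {...}}, and modifies the first document that matches the filter. Note that string matching is case-sensitive: the filter below looks for "ruizhi", but the stored value is "Ruizhi", so no document matches and the output is unchanged. Using {"author": "Ruizhi"} as the filter would change that author to "ma".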

In [15]:
query = {"author": "ruizhi"}

new_author = {"$set": {"author": "ma"}}

articles.update_one(query, new_author)

for article in articles.find():
    print(article)

{'_id': ObjectId('5ea3906d920957af0a71267d'), 'author': 'Luxun', 'description': 'the most important book in China', 'tags': ['society', 'history', 'critical']}
{'_id': ObjectId('5ea3906d920957af0a71267e'), 'author': 'Ruizhi', 'about': 'Book one', 'tags': ['first tag', 'second tag']}
{'_id': ObjectId('5ea3906d920957af0a71267f'), 'author': 'Nuo', 'about': 'Book two', 'tags': ['first tag', 'second tag']}
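Why did the update above leave every document unchanged? The plain-Python sketch below contrasts exact, case-sensitive equality with a case-insensitive regular expression (the idea behind a MongoDB filter such as `{"author": {"$regex": "^ruizhi$", "$options": "i"}}`) and shows what a `$set` update does to a matching document. It is an in-memory illustration, not pymongo, and `apply_set` is a hypothetical helper that ignores dotted field paths. With pymongo, the UpdateResult returned by update_one() reports `matched_count` and `modified_count`, which would both be 0 for the query above.

```python
import re

stored_authors = ["Luxun", "Ruizhi", "Nuo"]  # author values from the collection above

# Exact equality, as in {"author": "ruizhi"}: case matters, so nothing matches.
exact = [a for a in stored_authors if a == "ruizhi"]

# Roughly what {"author": {"$regex": "^ruizhi$", "$options": "i"}} does.
pattern = re.compile(r"^ruizhi$", re.IGNORECASE)
case_insensitive = [a for a in stored_authors if pattern.search(a)]

def apply_set(document, update):
    """Apply a {"$set": {...}} update to a copy of a plain dict (top-level fields only)."""
    updated = dict(document)
    updated.update(update.get("$set", {}))
    return updated

print(exact, case_insensitive)  # [] ['Ruizhi']
print(apply_set({"author": "Ruizhi", "about": "Book one"}, {"$set": {"author": "ma"}}))
```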


### 3.6 Limiting the Result

MongoDB lets us cap the number of results returned with the limit() method:

In [16]:
limited = articles.find().limit(2)

for x in limited:
    print(x)

{'_id': ObjectId('5ea3906d920957af0a71267d'), 'author': 'Luxun', 'description': 'the most important book in China', 'tags': ['society', 'history', 'critical']}
{'_id': ObjectId('5ea3906d920957af0a71267e'), 'author': 'Ruizhi', 'about': 'Book one', 'tags': ['first tag', 'second tag']}
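limit() is often combined with skip() to page through results, e.g. `articles.find().skip(skip).limit(limit)`. The hypothetical helper below computes both values for a 1-based page number and uses list slicing on a plain list as an in-memory stand-in for the cursor. Keep in mind that the server still has to walk past the skipped documents, so very large skip values get slower.

```python
def page_window(page, page_size):
    """Return (skip, limit) for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size

# In-memory stand-in for a cursor: slicing plays the role of skip()/limit().
authors = ["Luxun", "Ruizhi", "Nuo"]
for page in (1, 2):
    skip, limit = page_window(page, 2)
    print(page, authors[skip:skip + limit])
```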


### 3.7 Deleting Documents with delete_one()

We use delete_one() to delete a single document in MongoDB. Pass a filter that matches the document you want to delete, for example its _id:

In [17]:
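from bson.objectid import ObjectId  # _id values are ObjectIds, so a plain string would not match
db.articles.delete_one({"_id": ObjectId("5ea1e1bc106c83ee073753ff")})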

<pymongo.results.DeleteResult at 0x10c965300>
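Because _id filters need a real ObjectId, ids that arrive as strings (for example from a URL) are usually checked before conversion. pymongo's bson package offers `ObjectId.is_valid()` for this; the standard-library sketch below only shows the rule for the hex form, a string of exactly 24 hexadecimal characters. The helper name `looks_like_objectid` is hypothetical.

```python
import re

# An ObjectId in its string form is exactly 24 hexadecimal characters (12 bytes).
OBJECTID_HEX = re.compile(r"[0-9a-fA-F]{24}")

def looks_like_objectid(value):
    """Cheap pre-check before calling ObjectId(value) on untrusted input."""
    return isinstance(value, str) and OBJECTID_HEX.fullmatch(value) is not None

print(looks_like_objectid("5ea1e1bc106c83ee073753ff"))  # True
print(looks_like_objectid("ruizhi"))                     # False
```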

### 3.8 Deleting Many Documents with delete_many()

To delete many documents at once, we use the delete_many() method. An empty filter {} matches every document, so the call below removes all documents in the collection:

In [18]:
delete_many = articles.delete_many({})

print(delete_many)

<pymongo.results.DeleteResult object at 0x10c965040>


### 3.9 Dropping a Collection
In MongoDB, we can drop a collection with the drop() method:

In [19]:
articles.drop()

Then we can confirm it by calling list_collection_names():

In [20]:
db.list_collection_names()

[]

## 4. Advanced Knowledge in MongoDB
### 4.1 One-to-Many Relationships
#### 4.1.1 n (n<100)
For example, each Person may have multiple Addresses. In this case, we use the simplest approach, embedded documents, to model the relationship.  

{  
  name: 'Kate Monster',  
  id: '123-456-7890',  
  addresses : [  
     { street: '123 Sesame St', city: 'Anytown', cc: 'USA' },  
     { street: '123 Avenue Q', city: 'New York', cc: 'USA' }  
  ]  
}  
  
This modeling approach has obvious strengths and weaknesses:  
Strengths: You don't need a separate query to get all the Address information of a person.  
Weaknesses: You can't manipulate Address information as an independent document.  

You must first operate on (for example, query) the Person document before you can work with its Addresses.  

In this example, we do not need to operate on Addresses independently, and Address information is meaningful only when it is associated with a specific Person. So the conclusion is that embedded modeling is very suitable for the Person-Address scenario.

#### 4.1.2 n (100 < n < 1000)
For example, consider products and parts: each product has many parts. In such a scenario, we can model the relationship with references, as follows:  

Part:  
{  
    _id : ObjectID('AAAA'),  
    partno : '123-aff-456',  
    name : '#4 grommet',  
    qty: 94,  
    cost: 0.94,  
    price: 3.99  
}  


Product:  
{  
    name : 'left-handed smoke shifter',  
    manufacturer : 'Acme Corp',  
    catalog_number: 1234,  
    parts : [     // array of references to Part documents  
        ObjectID('AAAA'),    // reference to the #4 grommet above  
        ObjectID('F17C'),    // reference to a different Part  
        ObjectID('D2AA'),  
        // etc  
    ]  
}  
  
Strengths: Parts exist as independent documents, so you can operate on a part independently, for example query or update it.  
Weaknesses: Finding the information of all parts belonging to a product takes two queries: one for the product and one for its parts.  

In this case, the shortcoming is acceptable and not difficult to handle in application code. Moreover, this model extends easily from one-to-n to n-to-n: a product can include multiple parts, and a part can be referenced by multiple products at the same time (that is, the same part can be used in multiple products).
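The two-query lookup for referenced parts can be sketched in plain Python. Short strings stand in for ObjectIds, and apart from the "#4 grommet" taken from the example, the part names and prices are made up for illustration. With pymongo, the second step would be a query such as `parts.find({"_id": {"$in": product["parts"]}})`; note that `$in` does not promise results in the order of the reference array.

```python
# Hypothetical in-memory stand-ins for the parts and products collections.
parts = [
    {"_id": "AAAA", "name": "#4 grommet", "price": 3.99},
    {"_id": "F17C", "name": "lever", "price": 1.50},
    {"_id": "D2AA", "name": "spring", "price": 0.25},
]
products = [
    {"name": "left-handed smoke shifter", "parts": ["AAAA", "F17C", "D2AA"]},
]

def parts_for_product(product_name):
    """Resolve a product's part references with two 'queries'."""
    # Query 1: fetch the product document.
    product = next(p for p in products if p["name"] == product_name)
    # Query 2: fetch every part whose _id appears in the product's reference array.
    wanted = set(product["parts"])
    return [part for part in parts if part["_id"] in wanted]

found = parts_for_product("left-handed smoke shifter")
total = sum(part["price"] for part in found)
print([part["name"] for part in found], round(total, 2))
```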

#### 4.1.3 n (1000 < n)  
For example, each host generates a very large number of log messages (logmsg).

In this case, if you used embedded modeling, a host document would become very large and could easily exceed MongoDB's document size limit (16 MB), so that is not feasible. The second method, storing all logmsg _id values in an array, is not feasible either: when there are too many logs, even the ObjectId references alone can exceed the size limit. So in this case, we use the following method:
  
Host:  
{  
    _id : ObjectID('AAAB'),  
    name : 'goofy.example.com',  
    ipaddr : '127.66.66.66'  
}  
 
Log messages:  
{  
    time : ISODate("2014-03-28T09:42:41.382Z"),  
    message : 'cpu is on fire!',  
    host: ObjectID('AAAB')       // Reference to the Host document  
}  

We only need to store a reference to the host's _id in each log message.
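To put a rough number on why an array of references eventually fails for very large n, the sketch below counts how many 12-byte ObjectIds fit in a single array field before a document reaches MongoDB's 16 MB (16 MiB) limit, using the element layout from the BSON specification. It ignores every other field, so real documents hold fewer; the field name `logs` and the helper name are hypothetical.

```python
MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's maximum BSON document size (16 MiB)
OBJECTID_BYTES = 12                # an ObjectId value is always 12 bytes

def max_objectid_refs(field_name="logs"):
    """Estimate how many ObjectId references fit in one array field of an
    otherwise empty document. Every BSON element is laid out as:
    type byte + key as a NUL-terminated string + value."""
    # Outer document: int32 length + trailing NUL byte.
    # Array field header: type byte + field name + NUL.
    # Embedded array document: its own int32 length + trailing NUL byte.
    used = (4 + 1) + (1 + len(field_name) + 1) + (4 + 1)
    count = 0
    while True:
        # Array elements use the decimal indices "0", "1", "2", ... as keys.
        element = 1 + len(str(count)) + 1 + OBJECTID_BYTES
        if used + element > MAX_BSON_BYTES:
            return count
        used += element
        count += 1

print(max_objectid_refs())
```

Under these assumptions the estimate comes out at roughly 840,000 references, a ceiling that a busy host's log messages can pass, which is why each log message points back to its host instead.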

### 4.2 Summary
In summary, when modeling a one-to-n relationship, we need to consider:

1) When n is very small and the entities on the n side do not need to be operated on separately, use embedded modeling.

2) When n is relatively large, or the entities on the n side need to be operated on separately, store an array of references on the "one" side.

3) When n is very large, there is no other choice: each document on the n side stores a reference to the "one" side.
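The three rules above can be condensed into a small decision helper. The thresholds 100 and 1000 come from the headings of 4.1.1 to 4.1.3 and are rough guides rather than hard limits; the function name and return labels are hypothetical.

```python
def one_to_many_model(n, needs_independent_access):
    """Rule of thumb from this section; thresholds are rough guides, not hard limits."""
    if n < 100 and not needs_independent_access:
        return "embed"                # 4.1.1: array of embedded documents
    if n < 1000:
        return "array of references"  # 4.1.2: child _ids stored on the "one" side
    return "parent reference"         # 4.1.3: each child stores the parent's _id

print(one_to_many_model(2, False))          # addresses of a person
print(one_to_many_model(500, True))         # parts of a product
print(one_to_many_model(10_000_000, True))  # log messages of a host
```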

## 5. Assessment


1. How to switch to the my_user database
2. How to insert a document into the user collection of the database
3. How to query the documents in the user collection
4. How to insert another document into the user collection of the database
5. How to query the documents in the database user collection
6. How to count the number of documents in the database user collection
7. How to query the document whose username is sunwukong in the database user collection
8. How to add an address attribute, with the value huaguoshan, to the document whose username is sunwukong in the database user collection
9. How to use {username: "tangseng"} to replace the document whose username is hubajie
10. How to delete the address attribute of the document whose username is sunwukong
11. How to add a hobby: {cities: ["beijing", "shanghai", "shenzhen"], movies: ["sanguo", "hero"]} to the document whose username is sunwukong
12. How to add a hobby: {movies: ["A Chinese man", "the king of god"]} to the document whose username is tangseng
13. How to query the documents whose favorite movies include hero
14. How to add a new movie, Interstellar, to tangseng's movies
15. How to delete the users who like beijing
16. How to empty the user collection

### Conclusion

NoSQL databases such as MongoDB share these characteristics: they are non-relational, distributed, open source, and horizontally scalable. They were originally designed for large-scale web applications. Early in this new database movement, proponents predicted that the trend would grow even stronger in 2009. NoSQL advocates the use of non-relational data storage, with common traits such as a schema-free design, simple replication support, a simple API, eventual consistency (rather than strict ACID guarantees), and support for large volumes of data. Key-value stores are the most widely used kind of NoSQL database, but there are also document stores, column stores, graph databases, XML databases, and others. Compared with the currently dominant relational databases, this concept undoubtedly brings new ways of thinking.

### CONTRIBUTION
We contributed By Own: 70%  
By External source: 30%

### CITATIONS
https://docs.mongodb.com/

### LICENSE
Copyright 2020 Ruizhi Ma, Nuo Xu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.