### MongoDB

* MongoDB is a leading NoSQL database, classified as a document-oriented database. The name MongoDB derives from "humongous", meaning huge.
* MongoDB is written in C++.
* MongoDB pairs each key with a complex data structure called a document.
* MongoDB stores documents in a binary-encoded format called BSON. BSON extends the JSON data model with additional data types.

### NoSQL
* NoSQL provides a mechanism for storing and retrieving data that is modeled differently from relational databases. General observations about NoSQL:
    * NoSQL does not use the relational model.
    * NoSQL runs well on clusters.
    * Most NoSQL databases are open source.
    * NoSQL is mostly used in big data and real-time web applications.
    * The commonly used data structures are document, graph, key-value, and wide column.

### What are Document Databases?
* Document databases pair each key with a complex data structure, commonly a block of XML or JSON, called a document.
* Document-oriented databases are a special type of NoSQL database used for
    * storing
    * retrieving
    * managing semi-structured data.

### Document Databases - Features
The key features of Document Databases include:
* Flexible data modeling: Document databases are well suited to modern application data models such as web, mobile, social, and IoT-based applications. They remove the need to force-fit data into a relational model.
* Fast querying: Document databases have strong query engines and indexing features that provide fast and efficient retrieval of data.
* Faster write performance: Document databases prioritize write availability over strict data consistency.

### Why MongoDB?
MongoDB is a good fit:
* where a very low Total Cost of Ownership (TCO) is required.
* when replication across multiple data centers globally is needed.
* where rapid deployment and faster scaling are required.
* when data must be easy to load, both initially and over time.
* when massive concurrency is demanded by users.
* when no downtime can be tolerated.
* when the database needs to grow rapidly as per user needs.
* when high uncertainty in sizing exists.
* where a seamless and consistent experience is expected.

### Advantages of MongoDB
* Rich document-based queries for easy readability
* Schema-free: The schema can change as applications evolve
* Performance-oriented database: Best suited for fast request/response.
* Ease of use: The codebase is simple, less hardware is needed, and new functionality is quick and easy to add.
* High scalability can be achieved on low-cost commodity hardware.
* Supports Consistency and Partition Tolerance under the CAP theorem
* Easy Replication for high availability
* Able to handle large volumes of structured, semi-structured, and unstructured data.

### What it is not
* MongoDB does not compete with SAS and SPSS as an analytical suite.
* MongoDB does not compete with Teradata, Netezza, or Vertica as a data warehouse technology.
* MongoDB is not a BI tool competing with Tableau or QlikView for back-office transaction processing.
* MongoDB does not compete with IBM mainframes as a backend for a billing system or general ledger system.
* MongoDB does not compete with Elasticsearch or Solr as a search engine.
* Not suited for complex transactions spanning multiple operations.
* Not for applications with traditional database system requirements such as foreign key constraints.

### Indexing
* **Default _id**: Each collection has a default index on the _id field.
* **Single Field**: Used for queries or sorts on a single field. Indexes can be in either ascending or descending order.
* **Compound Index**: Used for multiple fields.
* **Multikey Index**: Used to index array data.
* **Geospatial Index**: The available index types are 2d (two-dimensional) and 2dsphere (geolocation on a sphere).

### Load Balancing
* Sharding is a technique used for distributing data across multiple servers.
* MongoDB supports Horizontal scaling by sharding.
* MongoDB uses sharding to split a large collection among multiple servers.
* Through sharding, MongoDB supports deployments with very large data sets and high-throughput operations.
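
The idea of distributing documents across servers can be sketched in plain Python. This is only a toy illustration, assuming a hash of a hypothetical `_id` value picks the shard; MongoDB itself assigns ranges of shard-key values (chunks) to shards, so the function below is not how the server routes data.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard number for a key by hashing it (toy routing rule)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Three in-memory lists stand in for three servers.
NUM_SHARDS = 3
shards = {i: [] for i in range(NUM_SHARDS)}

# Hypothetical documents; each one lands on exactly one shard.
for user_id in ("u1", "u2", "u3", "u4", "u5", "u6"):
    shards[shard_for(user_id, NUM_SHARDS)].append({"_id": user_id})

total_docs = sum(len(docs) for docs in shards.values())
```

Because the same key always hashes to the same shard, a lookup by key only needs to contact one server.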

### Capped Collections
* MongoDB supports capped collections. A capped collection is a fixed-size collection that preserves insertion order; once the specified size is reached, the oldest documents are overwritten by new ones.
* It acts as a circular queue. You can limit the collection by its size in bytes (size), by the maximum number of documents (max), or both.
``` db.createCollection(<CollectionName>, {capped: <true/false>, size: Number, max:number }) ```

### Capped Collections - Example
Capped collections should not be confused with the two collections that GridFS uses to store large files, which are ordinary collections:
* fs.files is used to store the metadata.
* fs.chunks is used to store the file chunks.

Example: Suppose we have an eCommerce application that logs user data, and the log must never hold more than four documents. In such a scenario, we use a capped collection.

``` db.createCollection("LogUsers", {capped : true,size : 100, max :4}) ```
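
The circular-queue behaviour described above can be imitated with a bounded `deque` from the Python standard library. This is a minimal sketch, not MongoDB code: `CappedLog` is a hypothetical class, and it only models the `max: 4` document limit, not the byte-size limit.

```python
from collections import deque

class CappedLog:
    """In-memory stand-in for a capped collection limited by document count."""

    def __init__(self, max_docs: int):
        # A deque with maxlen drops the oldest item when a new one overflows it.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc: dict) -> None:
        self._docs.append(doc)

    def find(self) -> list:
        # Documents come back in insertion order, oldest first.
        return list(self._docs)

log_users = CappedLog(max_docs=4)
for n in range(1, 7):  # insert six documents into a four-document log
    log_users.insert({"user": f"user{n}"})

remaining = [doc["user"] for doc in log_users.find()]
```

After six inserts only the four most recent documents remain, still in insertion order.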

### Replication
* MongoDB uses replica sets for high availability.
* A replica set contains two or more copies of the data. Each member of a replica set acts as either the primary or a secondary. By default, read and write operations are performed on the primary. The secondaries maintain a copy of the primary's data.

### Storage Mechanisms
* MongoDB supports different storage engines:
    * **MMAPv1**: Default storage engine before MongoDB version 3.2.
    * **WiredTiger**: Default storage engine starting from MongoDB 3.2.
    * **In-Memory Storage Engine**: Available in the Enterprise version. It retains documents in memory.
* MongoDB uses the GridFS specification for storing and retrieving large files. GridFS is a special type of file system in which data is stored within MongoDB collections. GridFS splits a large file into smaller chunks and stores each chunk in a separate document, with a default chunk size of 255 KB.

### Aggregation
In MongoDB, aggregation operations process records and return computed results.
* Aggregation can be categorized as:
    * Pipeline Aggregation: Documents are piped through a processing pipeline that executes in stages and transforms the documents into a final aggregated result.
    * Map-Reduce: It splits a larger problem into smaller chunks and sends them to different machines for processing. It comprises two phases: map and reduce.
    * Single Purpose: These operations aggregate documents from a single collection.

### Architecture
![image.png](attachment:image.png)

The architecture of MongoDB comprises:
* Application Driver
* Databases
* Collections
* Documents
* Indexes
* Security Features
* Storage Engine

### Drivers
* Drivers are client libraries that provide interfaces and methods for applications to interact with a MongoDB database.
* Drivers handle the translation of documents between BSON objects and the application's own data structures.
* Widely used drivers are available for C++, Java, .NET, Ruby, JavaScript, Node.js, Python, Perl, PHP, and Scala.

### Databases
* A database can be defined as a physical container of collections. A MongoDB server can host one or more databases.
* The default database for MongoDB is test. If no database is selected, collections are stored in the test database.
* The command to list the databases in a MongoDB server is `show dbs`.

### Document
* A document is a set of key-value pairs that supports a dynamic schema. A document is similar to a row in an RDBMS. In relational databases, the schema must be defined before any data is added, whereas MongoDB allows data to be inserted without a predefined schema.
* Dynamic schema implies that the documents stored in the database can have different fields, with different types for each field.
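
A dynamic schema can be pictured with plain Python dictionaries. In this sketch a list stands in for a collection, and the documents and field names are made up for illustration.

```python
# A list stands in for a MongoDB collection; each dict is one document.
collection = []

# No schema is declared up front, so every document can carry its own fields.
collection.append({"_id": 1, "title": "MongoDB", "tags": ["nosql", "document"]})
collection.append({"_id": 2, "title": "HBase", "pages": 320})
# The same field name may even hold a different type in another document.
collection.append({"_id": 3, "title": 42})

field_sets = [sorted(doc.keys()) for doc in collection]
title_types = {type(doc["title"]).__name__ for doc in collection}
```

Here `field_sets` differs from document to document, and `title` holds a string in some documents and an integer in another.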

### Collections
A collection consists of a group of MongoDB documents. It is similar to an RDBMS table. Documents inside a collection can have the same or different fields.

### Index
* In MongoDB, indexes support fast and efficient execution of queries.
* In the absence of indexes, MongoDB will scan every document in a collection to select those documents that match the query statement. If an index exists, the number of documents to inspect can be restricted.
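
The difference between a full collection scan and an index lookup can be sketched in plain Python. This is a simplified model with made-up data: a dictionary plays the role of the index, whereas MongoDB indexes are tree structures, and the helper names are hypothetical.

```python
from collections import defaultdict

# 1000 sample documents; every tenth one is "Active".
docs = [{"_id": i, "status": "Active" if i % 10 == 0 else "Inactive"}
        for i in range(1000)]

def collection_scan(docs, field, value):
    """Inspect every document and keep the ones that match."""
    inspected, matches = 0, []
    for doc in docs:
        inspected += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, inspected

def build_index(docs, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        index[doc.get(field)].append(position)
    return index

def index_scan(docs, index, value):
    """Use the index to jump straight to the matching documents."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)

status_index = build_index(docs, "status")
scan_matches, scan_inspected = collection_scan(docs, "status", "Active")
idx_matches, idx_inspected = index_scan(docs, status_index, "Active")
```

Both approaches return the same documents, but the scan inspects all 1000 while the index lookup touches only the 100 that match.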

### Security
* Authentication
* Authorization
* Encryption on data
* Auditing
* Hardening - (Ensure only trusted hosts have access)

### Architecture - GridFS
* GridFS helps us store files larger than 16 MB, the maximum size of a single BSON document.
* For such files, the GridFS specification splits the file into small chunks and stores each chunk in its own document, with a default chunk size of 255 KB.
Syntax to store a file using GridFS:
```mongofiles.exe -d <database_name> put <file_name>```
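
The chunking idea can be sketched in plain Python without a server. This is an illustration under assumptions: `split_into_chunks` is a hypothetical helper, the payload is dummy data, and the dictionaries only mimic the shape of the metadata and chunk documents rather than reproduce what a driver writes.

```python
CHUNK_SIZE = 255 * 1024  # 255 KB, the default chunk size mentioned above

def split_into_chunks(files_id, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return one metadata document and a list of chunk documents."""
    files_doc = {"_id": files_id, "length": len(data), "chunkSize": chunk_size}
    chunk_docs = [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs

# A 600 KB dummy payload needs three chunks: 255 KB + 255 KB + 90 KB.
payload = bytes(600 * 1024)
files_doc, chunk_docs = split_into_chunks("demo-file", payload)

# Reading the file back means concatenating the chunks in order of n.
reassembled = b"".join(c["data"] for c in sorted(chunk_docs, key=lambda c: c["n"]))
```

The metadata document corresponds to what is kept in fs.files and the chunk documents to what is kept in fs.chunks.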

### Storage Engine
* The storage engine is the part of the database that manages how data is stored in memory and on disk. The choice of storage engine can significantly affect application performance.
* It acts as an interface between persistent storage (the disk) and the MongoDB database. MongoDB supports different storage engines:
    * MMAPv1
    * WiredTiger
    * In-Memory Storage Engine

### MMAPv1 Storage Engine
* MMAPv1 is the original storage engine, based on memory-mapped files. It performs well for workloads with high-volume reads, inserts, and in-place updates.
* MMAPv1 automatically allocates power-of-two-sized records when new documents are inserted.
* The operating system decides which pages are kept in memory.
* It supports two allocation strategies:
    * Power-of-two allocation: Stores documents in records whose sizes are powers of two, e.g., 32, 64, 128, 256, 512 … 2 MB. It works efficiently for workloads with many inserts, updates, and deletes.
    * Exact fit: Suited to collections whose workloads consist of insert-only operations, or of update operations that do not increase document size.
* Consistency in this storage engine is achieved through journaling. MongoDB writes journal files every 100 milliseconds and writes data files to disk every 60 seconds.
    * Indexes and data are mapped into virtual address space.
    * Frequently used pages are retained in RAM.
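
Power-of-two allocation amounts to rounding a document's size up to the next power of two. A small sketch of that arithmetic follows; `power_of_two_size` is a hypothetical helper, the 32-byte minimum is taken from the list above, and the 2 MB upper end is not modelled.

```python
def power_of_two_size(record_bytes: int, minimum: int = 32) -> int:
    """Round a record size up to the next power of two, starting at `minimum`."""
    if record_bytes <= minimum:
        return minimum
    # (n - 1).bit_length() gives the exponent of the next power of two >= n.
    return 1 << (record_bytes - 1).bit_length()

sizes = [power_of_two_size(n) for n in (20, 33, 64, 100, 1000)]
```

A 100-byte document, for example, gets a 128-byte record, which leaves 28 bytes of room for the document to grow in place.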

### Limitations of MMAPv1 Storage Engine:
* MMAPv1 offers only collection-level locking.
* It does not support compression.
* MMAPv1 uses B-trees to store indexes.
* The storage engine uses a multiple-reader, single-writer lock, so two write calls on the same collection cannot be processed in parallel.
* It is fast for reads and slow for writes.

### WiredTiger
WiredTiger is the default storage engine starting from MongoDB 3.2. It was created by engineers who had earlier built Berkeley DB, the engine that Oracle later used in its NoSQL Database.

### WiredTiger supports
* Document-level concurrency model
* Compression
* Encryption at Rest for MongoDB Enterprise Edition
* Durability with and without journal
* B-trees by default but also supports LSM trees
* Non-locking algorithms such as hazard pointers.
* WiredTiger is reported to deliver 7x-10x better write performance and up to 80% file system compression compared with MMAPv1.
* **Compression**: WiredTiger compresses both indexes and collections. Compression works by identifying repeating values or patterns that can be stored once in compressed form, reducing the total amount of space. Larger units of data tend to compress more effectively because they offer more repeating values and patterns. The available compressors are snappy, zlib, and none.
* **Snappy**: The default compressor. It has low overhead and uses resources efficiently.
* **Zlib**: Similar to gzip. It compresses better than snappy but costs more CPU.
* **None**: Used when no compression is needed.
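
The two claims about compression (repeating values compress well, and larger units compress better than small ones) can be checked with Python's standard `zlib` module. This is only a sketch with made-up data; it does not use WiredTiger, and snappy is left out because it is not in the standard library.

```python
import os
import zlib

# One small sample document, repeated to form a large block of data.
doc = b'{"country": "India", "status": "Active"}'
block = doc * 500                      # one large unit with many repeats
random_block = os.urandom(len(block))  # same size, no repeating patterns

# Compressing the whole block at once lets zlib exploit the repetition.
block_size = len(zlib.compress(block))

# Compressing each document separately pays the overhead 500 times.
per_doc_size = sum(len(zlib.compress(doc)) for _ in range(500))

# Random bytes have nothing to deduplicate, so they barely shrink at all.
random_size = len(zlib.compress(random_block))
```

Comparing `block_size`, `per_doc_size`, and `random_size` with `len(block)` shows how much the repetition and the unit size matter.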

There are two compression options for indexes:
* No compression
* Prefix compression: Effective for data sets whose values are frequently duplicated, e.g., Country

* **Concurrency**: When an existing record is updated, the original record is not modified immediately. WiredTiger keeps multiple versions of a record: it makes a copy of the existing record and applies the update to that copy in a temporary cache. Threads reading from the storage engine see the version of the data that was committed before their read started.
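
The multi-version idea can be sketched in a few lines of Python. This is a conceptual toy, not WiredTiger's implementation: `VersionedRecord` and its methods are hypothetical, and it handles a single record with no real threads.

```python
class VersionedRecord:
    """Keeps every committed version of one record, oldest first."""

    def __init__(self, value: dict):
        self._versions = [(0, value)]  # list of (commit_id, value)
        self._last_commit = 0

    def begin_read(self) -> int:
        # A reader remembers which commit was the latest when it started.
        return self._last_commit

    def read(self, snapshot: int) -> dict:
        # Return the newest version committed at or before the snapshot.
        for commit_id, value in reversed(self._versions):
            if commit_id <= snapshot:
                return value

    def update(self, change: dict) -> int:
        # Copy the latest version, change the copy, commit it as a new version.
        _, latest = self._versions[-1]
        self._last_commit += 1
        self._versions.append((self._last_commit, {**latest, **change}))
        return self._last_commit

record = VersionedRecord({"item": "pepper", "qty": 250})
snapshot = record.begin_read()    # a reader starts here
record.update({"qty": 200})       # a writer commits afterwards
old_view = record.read(snapshot)  # the earlier reader still sees qty 250
new_view = record.read(record.begin_read())
```

The reader that started before the update keeps seeing the old version, while a reader that starts afterwards sees the new one.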

### Limitations of WiredTiger:
WiredTiger is not available on the Solaris platform. When only a single element of a large document is updated, WiredTiger rewrites the whole document, which slows the update.

### InMemory Storage Engine
* Instead of storing documents on disk, the engine keeps them in memory for more predictable data latencies.
* By default, the storage engine uses 50% of physical RAM minus 1 GB.
* Limitations of the In-Memory Storage Engine:
    * It does not persist data after a process shutdown.
    * It requires all of its data to fit in memory, so it is not a good option when the dataset is huge.

### Package Component
![image-2.png](attachment:image-2.png)

### Data Import and Export Tools
Following are the commonly used Data Import and Export Tools in MongoDB.

#### **mongoexport**
A utility that exports MongoDB data to JSON or CSV files.

Syntax:
```
mongoexport -d <database> \
            -c <collection name> \
            -o <output file name>.json
```
Example: ```mongoexport -d customer -c order -o student.json```


#### **mongoimport**
mongoimport is a utility that imports JSON, TSV or CSV data into a MongoDB database.

Syntax:
```
mongoimport -d <databasename> \
            -c <collectionname> \
            --file <filename>
```
Example: ```mongoimport -d customers -c orders --file student.json```

### Pipeline Aggregation
MongoDB's aggregation framework is designed around data processing pipelines. Documents are piped through a processing pipeline that executes in stages and transforms the documents into an aggregated result. The stages are passed as elements of an array, in the order in which they run.

Some of the stages are:
* **$project**: Reshapes the documents. Handling of documents is 1:1.
* **$match**: Filters the documents. It reduces the number of documents, hence handling is n:1 (n is the input).
* **$group**: Groups documents together and applies aggregate operators such as sum or count. It reduces the number of documents, hence handling is n:1 (n is the input).
* **$sort**: Sorts the documents in a given order, typically after grouping. This stage is 1:1 in nature.
* **$skip**: Skips some documents. n:1 transformation in nature.
* **$limit**: Limits the number of documents. n:1 transformation in nature.
* **$unwind**: Deconstructs an array field, producing one output document per array element.
* **$out**: Writes the results to an output collection. 1:1 transformation in nature.
* **$redact**: Security-related feature that restricts the content returned to certain users.
* **$geoNear**: Performs location-based queries, returning documents in order of their distance from a given point.

Syntax:
```
db.collectionname.aggregate( [
                 { <stage1> },
                 { <stage2> } ,......                 
                 { <stage..n> } ])
```
Example:
```
db.customers.aggregate( [
  { $match : { status : "Active" } },
  { $group : { _id : "$Customer_id", total : { $sum : "$salesamount" } } }
] )
```

A single-purpose aggregation on the same collection:

 ```db.customers.distinct("Customer_ID")```
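
What a `$match` stage followed by a `$group` stage with `$sum` computes can be sketched in plain Python. This is an illustration only: the `match` and `group_sum` helpers and the sample documents are hypothetical, and a real pipeline runs inside the MongoDB server.

```python
from collections import defaultdict

def match(docs, criteria):
    """Like $match: keep only documents whose fields equal the criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

def group_sum(docs, key_field, sum_field):
    """Like $group with $sum: total one field per distinct key value."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key_field]] += d[sum_field]
    return [{"_id": key, "total": total} for key, total in totals.items()]

# Made-up sample data using the field names from the example above.
customers = [
    {"Customer_id": "A1", "status": "Active", "salesamount": 5000},
    {"Customer_id": "A2", "status": "Inactive", "salesamount": 15000},
    {"Customer_id": "A3", "status": "Active", "salesamount": 35000},
    {"Customer_id": "A3", "status": "Active", "salesamount": 150},
]

# The output of one stage becomes the input of the next.
result = group_sum(match(customers, {"status": "Active"}),
                   "Customer_id", "salesamount")
```

The inactive customer is filtered out first, and the two documents for A3 are then combined into a single total.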

```python
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi

# Replace <username> and <password> with your own credentials; do not commit real ones.
uri = "mongodb+srv://<username>:<password>@cluster0.4tzzamn.mongodb.net/?retryWrites=true&w=majority&appName=Cluster0"

# Create a new client and connect to the server
client = MongoClient(uri)

# Send a ping to confirm a successful connection
try:
    client.admin.command('ping')
    print("Pinged your deployment. You successfully connected to MongoDB!")
except Exception as e:
    print(e)
```

Output:
```
Pinged your deployment. You successfully connected to MongoDB!
```



 This question gives you hands-on experience on CRUD Operations in MongoDB.

 1. Create Database in MongoDB

 Enter the following command in the MongoDB shell to create the database "TestDB".

 ```use TestDB```

 2. Insert Record into the Existing Collection in MongoDB

 Enter the following commands in the MongoDB shell one by one:

 ```
 db.topic.insert({title: 'MongoDB',
 desc: 'MongoDB is document oriented database'})

 db.topic.insert({title: 'Hbase',
 desc: 'Hbase is a column-oriented database'})
 ```

 Show the databases created in MongoDB:

 ```show dbs```

 List all collections:

 ```show collections```

 3. Read Data from the Database

 ```db.topic.find()```

 Use the pretty() method to display records in a formatted manner.

 ```db.topic.find().pretty()```

 Projection: Display only the title and hide ID

 ```db.topic.find({},{"title":1,_id:0})```

 4. Delete a Record

 Delete a record from a topic collection already created.

 ```db.topic.deleteOne({title : 'MongoDB'})```

 Read the records in the topic collection to verify if the records are deleted.

 ```db.topic.find().pretty()```

 5. Update the Record

 Consider inserting records into the stock collection. The stock consists of Item, Quantity, Color, Size, and Measurement in units.

```
 db.Stock.insertMany([
   { item: "pepper", qty: 250, color: ["blank", "dark red"],
 sizes: { h: 14, w: 12, uom: "m" } },
   { item: "clothes", qty: 850, color: ["gray blue"], sizes: { h: 27, w: 35, uom: "m" } },
   { item: "table", qty: 250, color: ["gel", "dark blue"],
 sizes: { h: 10, w: 22, uom: "m" } }
             ])
```

 Update the record by changing the unit of measurement from "meter" to "centimeter".

 ```
 db.Stock.updateOne(
   { item: "pepper" },
   {
    $set: { "sizes.uom": "cm"},
    $currentDate: { lastModified: true }
   }
 )
 ```

 Use the find() statement to check and confirm if the record is updated or not.

```db.Stock.find( {"item": "pepper" },{"sizes.uom":1,_id:0} ) ```
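
The dotted path `"sizes.uom"` addresses a field inside an embedded document. A rough Python equivalent of what `$set` does with such a path follows; `set_path` is a hypothetical helper, and unlike MongoDB, which updates the stored document, it returns a modified copy.

```python
import copy

def set_path(doc: dict, dotted_path: str, value):
    """Return a copy of doc with the field at dotted_path set to value."""
    updated = copy.deepcopy(doc)
    *parents, leaf = dotted_path.split(".")
    node = updated
    for key in parents:
        # Walk down into the embedded documents, creating them if missing.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated

stock = {"item": "pepper", "qty": 250, "sizes": {"h": 14, "w": 12, "uom": "m"}}
updated = set_path(stock, "sizes.uom", "cm")
```

Only the `uom` field changes; the other fields of the embedded `sizes` document are left as they were.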

Starting MongoDB:
In a terminal, type `mongo` (on MongoDB 6.0 and later the shell is named `mongosh`).

Now you have:
* A successfully running MongoDB environment
* A command line shell to interact with MongoDB
 

This question gives you hands-on experience on Aggregations in MongoDB.

1. Pipeline Aggregation

Bookshop is a collection that contains book documents. Each document consists of the title, description, creator, tags, and the number of users. Use the pipeline function to find the total number of books for each creator.

 
```
db.bookshop.insert([
  {
    titlename: 'MongoDB',
    desc: 'MongoDB is no sql database',
    creator: 'Sydney Team',
    tags: ['mongodb', 'database', 'NoSQLDB'],
    User: 500
  },
  {
    titlename: 'HBASE Database',
    desc: "HBASE is a columnar database",
    creator: 'India Team',
    tags: ['MongoDB', 'database', 'NoSQLDB'],
    User: 2000
  },
 {
    titlename: 'Cassandra',
    desc: 'cassandra is no sql database',
    creator: 'Sydney Team',
    tags: ['cassandra', 'database', 'NoSQLDB'],
    User: 5000
  },
])
```
 

Run the following command to perform Pipeline Aggregation.

Note: $group is a stage in Pipeline aggregation.

```
db.bookshop.aggregate( [
   { $group : { _id : "$creator", total : { $sum : 1 } } }
] );
```
 

2. Map-Reduce Aggregation

 

School is a collection that contains student documents. Each document consists of the full name of the student, the subject studied, and the marks scored in that subject. Using the Map-Reduce function, find the accumulated marks scored by each student.

 
```
try {
  db.school.insertMany( [
{ "_id" :"1", "Fullname" :"Mridhula", "subject" : "Science", "score" : 680 },
{ "_id" : "2", "Fullname" : "Mridhula", "subject" : "Mathematics", "score" : 980 },
{ "_id" : "3", "Fullname" : "Mridhula", "subject" : "English", "score" : 770 },
{ "_id" :"4", "Fullname" : "Akhila", "subject" : "science", "score" : 670 },
{ "_id" : "5", "Fullname" : "Akhila", "subject" : "Mathematics", "score" : 870},
{ "_id" : "6", "Fullname" : "Akhila", "subject" : "English", "score" : 890 },
{ "_id" :"7", "Fullname" : "Abhilasha", "subject" : "Science", "score" : 670 },
{ "_id" :"8", "Fullname" : "Abhilasha", "subject" : "Mathematics", "score" : 780 },
{ "_id" :"9", "Fullname" : "Abhilasha", "subject" : "English", "score" : 900 }
  ] );
} catch (e) {
  print (e);
};
```
 

Run find() and view the result set:

 

```db.school.find({});```

 

Run Map-Reduce. In the following example, totalscores is the output collection of Map-Reduce.

 
```
var map = function() {emit(this.Fullname,this.score);};
var reduce = function(Fullname,score) {return Array.sum(score);};
db.school.mapReduce(
   map,
  reduce,
   { out: "totalscores" }
 );
```
 

Check the result.

 

```db.totalscores.find();```
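
The same map and reduce steps can be traced in plain Python to see what `totalscores` should contain. This is a sketch that runs locally on the nine school documents above; the `map_reduce` helper is hypothetical and is not the server's implementation.

```python
from collections import defaultdict

# (Fullname, subject, score) rows from the school collection above.
rows = [
    ("Mridhula", "Science", 680), ("Mridhula", "Mathematics", 980),
    ("Mridhula", "English", 770), ("Akhila", "science", 670),
    ("Akhila", "Mathematics", 870), ("Akhila", "English", 890),
    ("Abhilasha", "Science", 670), ("Abhilasha", "Mathematics", 780),
    ("Abhilasha", "English", 900),
]
school = [{"Fullname": name, "subject": subject, "score": score}
          for name, subject, score in rows]

def map_fn(doc):
    # Map phase: emit one (key, value) pair per document.
    yield doc["Fullname"], doc["score"]

def reduce_fn(key, values):
    # Reduce phase: combine all values emitted for the same key.
    return sum(values)

def map_reduce(docs, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

totalscores = map_reduce(school, map_fn, reduce_fn)
```

Each student ends up with a single total: 2430 for Mridhula, 2430 for Akhila, and 2350 for Abhilasha.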


3. Single Purpose Aggregation Operations

 

The custorders collection contains records with a customer ID, description, sales amount, and status.

 
```
db.custorders.insert(
{
Customer_ID: 'A1',
Desc: 'Orange',
SalesAmount :5000 ,
Status : 'Active'
}
)
db.custorders.insert(
{
Customer_ID: 'A2',
Desc: 'Apple',
SalesAmount :15000 ,
Status : 'Active'
}
)
db.custorders.insert(
{
Customer_ID: 'A3',
Desc: 'Melon',
SalesAmount :35000 ,
Status : 'Active'
}
)
db.custorders.insert(
{
Customer_ID: 'A3',
Desc: 'Banana',
SalesAmount :150 ,
Status : 'Active'
}
)
```
Run the following command to perform Single Purpose Aggregation operations.

 

Note: Only distinct records of the Customer ID are displayed as output.

 

```db.custorders.distinct("Customer_ID")```
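
For comparison, the effect of `distinct` on the four documents inserted above can be reproduced in plain Python. This is a local sketch with a hypothetical `distinct` helper; it sorts the values to make the result predictable, which is a choice of this sketch.

```python
# The four custorders documents inserted above.
custorders = [
    {"Customer_ID": "A1", "Desc": "Orange", "SalesAmount": 5000, "Status": "Active"},
    {"Customer_ID": "A2", "Desc": "Apple", "SalesAmount": 15000, "Status": "Active"},
    {"Customer_ID": "A3", "Desc": "Melon", "SalesAmount": 35000, "Status": "Active"},
    {"Customer_ID": "A3", "Desc": "Banana", "SalesAmount": 150, "Status": "Active"},
]

def distinct(docs, field):
    """Collect each value of `field` once, ignoring documents without it."""
    return sorted({doc[field] for doc in docs if field in doc})

customer_ids = distinct(custorders, "Customer_ID")
```

Customer A3 appears in two documents but only once in the result.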
