# MongoDB

## What is MongoDB?
- MongoDB is a **document-oriented** NoSQL database used for high volume data storage. 
- Instead of using tables and rows as in the traditional relational databases, MongoDB makes use of **collections** and **documents**. 
- Documents are the basic unit of data in MongoDB and consist of **key-value pairs**.
- Collections contain sets of documents and are the equivalent of relational database tables.
- MongoDB offers support for many programming languages, such as C, C++, C#, Go, Java, **Python**, Ruby and Swift.


<img src="images/Capture.png" height=400px width=400px align='center'>




## MongoDB history
MongoDB was created by **Dwight Merriman and Eliot Horowitz**, who encountered development and scalability issues with traditional relational database approaches while building web applications at DoubleClick, an online advertising company that is now owned by Google Inc. The name of the database was derived from the word humongous to represent the idea of supporting large amounts of data.

Merriman and Horowitz helped form **10Gen Inc.** in 2007 to commercialize MongoDB and related software. The company was renamed **MongoDB Inc.** in 2013 and went public in October 2017 under the ticker symbol MDB. The DBMS was released as **open source software** in 2009 and has been kept updated since.

### Applications:

- Organizations such as the insurance company MetLife have used MongoDB for customer service applications.
- Websites such as Craigslist have used it for archiving data.
- The CERN physics lab has used it for data aggregation and discovery.
- The New York Times has used MongoDB to support a form-building application for photo submissions.

## Key Components of MongoDB Architecture
MongoDB environments provide users with a server on which to create databases. MongoDB stores data as documents, which are grouped into collections.

**Database:** This is a container for collections, just as a database in an RDBMS is a container for tables. Each database gets its own set of files on the file system. A MongoDB server can store multiple databases.

**Collections:** This is a grouping of MongoDB documents. Sets of documents are called collections, which function as the equivalent of relational database tables. Collections can contain any type of data, but the restriction is the data in a collection cannot be spread across different databases. Users of MongoDB can create multiple databases with multiple collections.

**Documents:** Documents contain the data the user wants to store in the MongoDB database. The documents are similar to JavaScript Object Notation (JSON) but use a variant called Binary JSON (BSON). The benefit of using BSON is that it accommodates more data types. 

**_id:** A unique identifier for each document in the collection. This field is required in every MongoDB document and acts like the document's primary key. If you create a new document without an _id field, MongoDB automatically creates it; the default value is an ObjectId, a 12-byte value that is usually displayed as 24 hexadecimal digits.

**Fields:** A name-value pair in a document. A document has zero or more fields. The fields in these documents are like the columns in a relational database.

**Values:** Values can be a variety of data types, including other documents, arrays and arrays of documents, according to the MongoDB user manual. Documents also incorporate a primary key as a unique identifier. A document's structure is changed by adding new fields or deleting existing ones.

**Cursor:** This is a pointer to the result set of a query. Clients can iterate through a cursor to retrieve results.

**Mongo Shell:** The mongo shell is a standard component of the open-source distributions of MongoDB. Once MongoDB is installed, users connect the mongo shell to their running MongoDB instances. The mongo shell acts as an interactive JavaScript interface to MongoDB, which allows users to query or update data and conduct administrative operations.

**JSON:** This is known as JavaScript Object Notation. This is a human-readable, plain text format for expressing structured data. JSON is currently supported in many programming languages.



**Example**

        {
            "_id" : ObjectId("5b98bfe7e8b9ab9875e4c80c"),
            "StudentName" : "George  Beckonn",
            "ParentPhone" : 75646344,
            "age" : 10
        }
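
The _id in the example is an ObjectId. As a rough illustration of what those 24 hexadecimal digits contain, the sketch below splits the value into a 4-byte creation timestamp, 5 bytes that current drivers fill with a per-process random value, and a 3-byte counter. It uses only the Python standard library; `objectid_parts` is a hypothetical helper written for this example, not part of PyMongo.

```python
from datetime import datetime, timezone

def objectid_parts(hex_id: str) -> dict:
    """Split a 24-character ObjectId hex string into its three parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex digits)")
    # The first 4 bytes are a big-endian count of seconds since the Unix epoch.
    seconds = int.from_bytes(raw[:4], "big")
    return {
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),                   # 5-byte random value
        "counter": int.from_bytes(raw[9:], "big"),  # 3-byte incrementing counter
    }

parts = objectid_parts("5b98bfe7e8b9ab9875e4c80c")
print(parts["created"].isoformat())
print(parts["random"], parts["counter"])
```

Because the timestamp comes first, ObjectIds generated later tend to sort after earlier ones.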

## Features of MongoDB
An organization might want to use MongoDB for the following:

- **Storage:** MongoDB can store large structured and unstructured data volumes. Indexes can be created to improve the performance of searches within MongoDB. Any field in a MongoDB document can be indexed.
- **Schema-less:** MongoDB is a schema-less database, which means the database can manage data without the need for a blueprint. Documents (the equivalent of rows) don't need to have a schema defined beforehand. Instead, the fields can be created on the fly.
- **Data integration:** This integrates data for applications, including for hybrid and multi-cloud applications.
- **Complex data structures descriptions:** Document databases enable the embedding of documents to describe nested structures (a structure within a structure) and can tolerate variations in data. The data model available within MongoDB makes it easier to represent hierarchical relationships, arrays, and other more complex structures.
- **Scalability:** MongoDB supports vertical and horizontal scaling. Vertical scaling works by adding more power to an existing machine, while horizontal scaling works by adding more machines to a user's resources.
- **Document-oriented:** Since MongoDB is a NoSQL database, it stores data in documents instead of in a relational format. This makes MongoDB very flexible and adaptable to real-world business situations and requirements.
- **Ad hoc queries:** MongoDB supports searching by field, range queries, and regular expression searches. Queries can be made to return specific fields within documents.

- **Load balancing:** MongoDB uses the concept of sharding to scale horizontally by splitting data across multiple MongoDB instances. MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure.



- **Replication:** MongoDB can provide high availability with replica sets. A replica set consists of two or more MongoDB instances. Each replica set member may act in the role of the primary or a secondary replica at any time. The primary replica is the main server that interacts with the client; it receives all write operations and, by default, serves reads as well. The secondary replicas maintain a copy of the primary's data using built-in replication. When the primary replica fails, the replica set automatically elects a secondary, which then becomes the primary server.




## Advantages of MongoDB
MongoDB offers several potential benefits:

- Schema-less.
- Document-oriented. 
- Scalability. 
- Third-party support. MongoDB supports several storage engines and provides pluggable storage engine APIs that let third parties develop their own storage engines for MongoDB.
- Aggregation. The DBMS also has built-in aggregation capabilities, which let users run MapReduce code directly on the database rather than running MapReduce on Hadoop. MongoDB also includes its own file system called GridFS, akin to the Hadoop Distributed File System. The file system is used primarily for storing files larger than BSON's size limit of 16 MB per document. These similarities let MongoDB be used instead of Hadoop, though the database software does integrate with Hadoop, Spark and other data processing frameworks.



## Data Modelling in MongoDB
As we have seen from the Introduction section, the data in MongoDB has a flexible schema. Unlike in SQL databases, where you must have a table’s schema declared before inserting data, MongoDB’s collections do not enforce document structure. This sort of flexibility is what makes MongoDB so powerful.

Data modeling is the blueprint on which a full-fledged database system is developed. The primary function of a data model is to visualize how data points relate to one another. For example, a data model must define the relationship that connects each user to the appropriate profile. In a nutshell, data modelling is the first step in database design, as well as the foundation for object-oriented programming. It also indicates how the physical application will look as development progresses.

When data professionals start building data models in MongoDB, they must choose whether to embed related information or to keep it separately in its own collection of documents. Hence two concepts exist for efficient MongoDB Data Modeling:

- The Embedded Data Model, and
- The Normalized (Reference) Data Model

### Embedded Data Model
In this case, related data is stored as a field value or an array within a single document. The main benefit of this method is that data is denormalized, making it possible to manipulate related data in a single database operation. As a result, the efficiency of CRUD operations improves, and fewer queries are required. Consider the following document as an example:

<img src="images/5.png" height=400px width=400px align='center'>

**Pros of Embedded Model**
- Increased data access speed 
- Reduced data inconsistency 
- Reduced CRUD operations 

**Cons of Embedded Model**
- Restricted document size
- Data duplication

### Reference Data Model
In this case, the related data is stored in separate documents, with a reference linking them. The sample data can be restructured as follows:

<img src="images/6.png" height=400px width=400px align='center'>

**Pros of Reference Data Model**
- Data consistency 
- Improved data integrity 
- Improved cache utilization 
- Improved flexibility 
- Faster writes
- Efficient hardware utilization

**Cons of Reference Data Model**
- Multiple lookups
- Many queries are issued to achieve some operation 
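
The trade-off between the two models can be sketched with plain Python dictionaries standing in for documents. The data and helper names below are hypothetical and no MongoDB server is involved; the point is only that the embedded form answers the question from one document, while the reference form needs an extra lookup per referenced document.

```python
# Embedded model: the related addresses live inside the user document.
user_embedded = {
    "_id": 1,
    "name": "XYZ",
    "addresses": [
        {"city": "Anytown"},
        {"city": "New York"},
    ],
}

# Reference model: the user stores only the ids of documents kept elsewhere.
user_referenced = {"_id": 1, "name": "XYZ", "address_ids": [10, 11]}
address_collection = {
    10: {"_id": 10, "city": "Anytown"},
    11: {"_id": 11, "city": "New York"},
}

def cities_embedded(user):
    """One document read is enough."""
    return [address["city"] for address in user["addresses"]]

def cities_referenced(user, addresses):
    """Each referenced id costs an additional lookup."""
    return [addresses[address_id]["city"] for address_id in user["address_ids"]]

print(cities_embedded(user_embedded))
print(cities_referenced(user_referenced, address_collection))
```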



## Key Considerations for MongoDB Data Modeling
When deciding on the best data model, there are several factors to consider. These aspects differ depending on the stage of the Data Lifecycle for which we are designing. These elements are as follows:

- Data Creation, Modification Speed, and Frequency: Small amounts of data should be captured more quickly while maintaining consistency.
- Data Retrieval Speed: The ability to retrieve small or large amounts of data for reporting and analysis.
- ACID Properties: Atomicity, Consistency, Isolation, and Durability of transactions.
- Business scope: involves one or more departments or business functions.
- Access to the Finest Grain of Data: Different data use cases may necessitate access to the finest level of detail or various levels of aggregation.

Other factors may exist, but the ones mentioned above have a significant impact on the decision-making process for selecting the best data model.

## MongoDB Data Modeling Schema
For a given set of data, a schema is essentially a skeleton of the fields and the data type that each field should contain. In SQL databases, all rows of a table have the same columns, and each column contains the defined data type. MongoDB data modelling, however, comes with a flexible schema that doesn't require all documents to conform to the same structure.

**1. Flexible Schema**
In MongoDB, a flexible schema means that documents do not have to have the same fields or data types, and that a field can vary between documents within a collection. The main benefit of this concept is that it allows you to add new fields, delete existing ones, or change field values to a different type, resulting in a new structure for the document. For example:

        {
            "_id" : ObjectId("5b98bfe7e8b9ab9875e4c80c"),
            "StudentName" : "George  Beckonn",
            "ParentPhone" : 75646344,
            "age" : 10
        }
        {
            "_id" : ObjectId("5b98bfe7e8b9ab98757e8b9a"),
            "StudentName" : "Fredrick  Wesonga",
            "ParentPhone" : false
        }

**2. Rigid Schema**
You may decide to create a rigid schema even though these documents may differ from one another. A rigid schema specifies that all documents in a collection have the same structure, allowing you to create document validation rules to ensure data integrity during insert and update operations.

        const mongoose = require('mongoose');

        // Each field is declared together with the type its values must have.
        var userSchema = new mongoose.Schema({
            userId: Number,
            Email: String,
            Birthday: Date,
            Adult: Boolean,
            Binary: Buffer,
            height: mongoose.Schema.Types.Decimal128,
            units: []
        });

Its example use case is as follows:

        var user = mongoose.model('Users', userSchema);
        var newUser = new user();
        newUser.userId = 1;
        newUser.Email = "example@gmail.com";
        newUser.Birthday = new Date();
        newUser.Adult = false;
        newUser.Binary = Buffer.alloc(0);
        newUser.height = 12.45;
        newUser.units = ['Circuit network Theory', 'Algebra', 'Calculus'];
        // save() returns a promise; an open mongoose connection is assumed.
        newUser.save().then(function () { console.log('saved'); });

**3. Schema Validation**

Schema validation is vital for validating data on the server side. MongoDB applies the validation rules during insert and update operations. The rules can also be added to an existing collection using the 'collMod' command. Documents that already exist in the collection are not checked until they are next modified.

A validator can be specified when creating a new collection using the 'db.createCollection()' command.

        db.createCollection("students", {
           validator: {$jsonSchema: {
                 bsonType: "object",
                 required: [ "name", "year", "major", "gpa" ],
                 properties: {
                    name: {
                       bsonType: "string",
                       description: "must be a string and is required"
                    },
                    year: {
                       bsonType: "int",
                       minimum: 2017,
                       maximum: 3017,
                       exclusiveMaximum: false,
                       description: "must be an integer in [ 2017, 3017 ] and is required"
                    },
                    gpa: {
                       bsonType: [ "double" ],
                       minimum: 0,
                       description: "must be a double and is required"
                    }
                 }

           }}})
           
To insert a new document into the collection, follow the example below:

            db.students.insert({
               name: "James Karanja",
               year: NumberInt(2016),
               major: "History",
               gpa: NumberInt(3)
            })
            
The insert is rejected with a validation error because the document violates the rules: the supplied year (2016) is below the minimum of 2017, and gpa is supplied as an integer rather than a double.
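
To see why the sample document fails, the rules of the validator above can be mirrored in plain Python. This is only an illustrative sketch: `validate_student` is a hypothetical helper, not a MongoDB or PyMongo API, and real validation is carried out by the server.

```python
def validate_student(doc: dict) -> list:
    """Return a list of rule violations; an empty list means the document passes."""
    errors = []
    for field in ("name", "year", "major", "gpa"):
        if field not in doc:
            errors.append(f"missing required field: {field}")
    if "name" in doc and not isinstance(doc["name"], str):
        errors.append("name must be a string")
    if "year" in doc:
        year = doc["year"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(year, bool) or not isinstance(year, int):
            errors.append("year must be an integer")
        elif not 2017 <= year <= 3017:
            errors.append("year must be in [2017, 3017]")
    if "gpa" in doc:
        gpa = doc["gpa"]
        if not isinstance(gpa, float):
            errors.append("gpa must be a double")
        elif gpa < 0:
            errors.append("gpa must be >= 0")
    return errors

student = {"name": "James Karanja", "year": 2016, "major": "History", "gpa": 3}
print(validate_student(student))
```

Changing the year to 2017 and the gpa to 3.0 makes the same document pass.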

There are three levels of validation:
- **Strict:** Validation rules are applied to all inserts and updates.
- **Moderate:** Validation rules are applied to inserts and to updates of existing documents that already fulfill the validation criteria. Updates to existing documents that do not fulfill the criteria are not checked.
- **Off:** Validation is off; hence no validation criteria are applied to any document.

## Schema Validation Actions
Schema validation actions determine what happens to documents that violate the validation criteria. MongoDB provides two actions: Error and Warn.

- **Error:** This action rejects the insert or update if the validation criteria are not met.

- **Warn:** This action records every violation in the MongoDB log and allows the insert or update operation to complete.

## Defining Relationships in MongoDB Data Modeling

Defining relationships for your schema is one of the most important considerations in a MongoDB data modeling project. These relationships define how data will be used by your system. There are three central relationships in MongoDB Data Modeling:

- One-to-one
- One-to-many
- Many-to-many.

### One-to-one Relationship

A user's name is a good example of this relationship, because one user can have only one name. One-to-one data can be modeled as key-value pairs in a single document, as in the example below:

**User:**

        {
            "_id": "ObjectId('AAA')",
            "name": "XYZ"
        }


### One-to-many Relationship

An employee with more than one address is a good example of this relationship. A one-to-many schema can potentially store thousands of sub-documents and relationships. Let's take a look at how it works:

**Employee**

        {
            "_id": "ObjectId('AAA')",
            "name": "XYZ",
            "department": "MMM",
            "addresses": [
                { "street": "123 Sesame St", "city": "Anytown", "cc": "USA" },
                { "street": "123 Avenue Q",  "city": "New York", "cc": "USA" }
            ]
        }


### Many-to-many Relationship

To better understand many-to-many relationships, imagine a to-do application. In the application, a user might have many tasks, and a task might have multiple users assigned to it. Hence, to preserve the relationships between users and tasks, references exist from one user to many tasks and from one task to many users, as in the example below:

**Users:**

        {
            "_id": ObjectID("AAF1"),
            "name": "XYZ",
            "tasks": [ObjectID("ADF9"), ObjectID("AE02"), ObjectID("AE73")]
        }

**Tasks:**

        {
            "_id": ObjectID("ADF9"),
            "description": "Write a blog about MongoDB schema design",
            "due_date": ISODate("2023-04-01"),
            "owners": [ObjectID("AAF1"), ObjectID("BB3G")]
        }
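
The two-way references can be followed in either direction. The sketch below uses plain Python dictionaries keyed by id to stand in for the two collections; the second user and the extra task descriptions are hypothetical additions made only so that every reference resolves, and the helper names are illustrative.

```python
users = {
    "AAF1": {"name": "XYZ", "tasks": ["ADF9", "AE02", "AE73"]},
    "BB3G": {"name": "ABC", "tasks": ["ADF9"]},  # hypothetical second user
}
tasks = {
    "ADF9": {"description": "Write a blog about MongoDB schema design",
             "owners": ["AAF1", "BB3G"]},
    "AE02": {"description": "Review pull requests", "owners": ["AAF1"]},  # hypothetical
    "AE73": {"description": "Plan the sprint", "owners": ["AAF1"]},       # hypothetical
}

def tasks_for_user(user_id):
    """Follow the user's task references to the task documents."""
    return [tasks[t]["description"] for t in users[user_id]["tasks"] if t in tasks]

def owners_of_task(task_id):
    """Follow the task's owner references back to the user documents."""
    return [users[u]["name"] for u in tasks[task_id]["owners"] if u in users]

print(tasks_for_user("AAF1"))
print(owners_of_task("ADF9"))
```

Keeping the references on both sides makes both lookups cheap, at the cost of having to update two documents whenever an assignment changes.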

## Installing MongoDB
### Step 1: Download the installer (version 5)
- Go to the download page at https://www.mongodb.com/try/download/community
- Choose your OS and your desired MongoDB version.
- Click Download.


### Step 2: Run the MongoDB installer (a .msi file)
- Go to your 'Downloads' folder.
- Double-click the installer.
- Choose to run the service as a network service user.
- While installing, check the box to install MongoDB Compass.
- Click Next.
- Click Finish.

### Installing pymongo

In [1]:
pip install pymongo

Collecting pymongo
  Downloading pymongo-4.3.3-cp39-cp39-win_amd64.whl (382 kB)
     -------------------------------------- 382.5/382.5 kB 2.2 MB/s eta 0:00:00
Collecting dnspython<3.0.0,>=1.16.0
  Downloading dnspython-2.3.0-py3-none-any.whl (283 kB)
     -------------------------------------- 283.7/283.7 kB 2.9 MB/s eta 0:00:00
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.3.0 pymongo-4.3.3
Note: you may need to restart the kernel to use updated packages.





In [1]:
### importing pymongo package
import pymongo as pm

In [19]:
### a variable named connection_url to store the MongoDB connection URL
connection_url="mongodb://localhost:27017"

In [20]:
### To connect to the MongoDB server, we will use the MongoClient method.
client=pm.MongoClient(connection_url)

In [21]:
 ### To display the databases we use list_database_names() method of client object created above .
client.list_database_names()

['admin', 'config', 'local']

In [22]:
### selecting a database (MongoDB creates it lazily, when the first document is inserted)
database_name="student_database"
student_db=client[database_name]

In [23]:
### selecting a collection in the database (also created lazily, which is why the list below is still empty)
collection_name="computer science"
collection=student_db[collection_name]

In [24]:
## To list the available collections in a database
student_db.list_collection_names()


[]

### Inserting document in the collection
To insert a record, or document as it is called in MongoDB, into a collection, we use the **insert_one()** and **insert_many()** methods.

In [25]:
### Inserting single document into the collection.
### The keys carry no trailing spaces, so they match the field names used in the queries below.
document={"Name":"Zainab",
"Roll No":  16,
"Branch": "BS(SE)"}
collection.insert_one(document)

<pymongo.results.InsertOneResult at 0x2307c924370>

In [26]:
### Inserting multiple documents into the collection.
documents=[{"Name":"Ali","Roll No":159,"Branch":"CSE"},
           {"Name":"Ayesha","Roll No":155,"Branch":"CSE"},
           {"Name":"Zaib","Roll No":156,"Branch":"CSE"}]
collection.insert_many(documents)

<pymongo.results.InsertManyResult at 0x2307c9596d0>

### Retrieving the data from the collection

In MongoDB we use the **find()** and **find_one()** methods to find data in a collection, just as the SELECT statement is used to find data in a table in a MySQL database. Calling find() with no parameters gives the same result as SELECT * in MySQL.

In [11]:
### retrieving single document using find_one
query={"Name":"Zainab"}
print(collection.find_one(query))

{'_id': ObjectId('6422a4549cbbb2799d32f0c3'), 'Name': 'Zainab', 'Roll No': 16, 'Branch': 'BS(SE)'}


In [12]:
### Retrieving multiple documents from the collection
query={"Branch":"CSE"}
result=collection.find(query)
for i in result:
    print(i)

{'_id': ObjectId('6422a4589cbbb2799d32f0c4'), 'Name': 'Ali', 'Roll No': 159, 'Branch': 'CSE'}
{'_id': ObjectId('6422a4589cbbb2799d32f0c5'), 'Name': 'Ayesha', 'Roll No': 155, 'Branch': 'CSE'}
{'_id': ObjectId('6422a4589cbbb2799d32f0c6'), 'Name': 'Zaib', 'Roll No': 156, 'Branch': 'CSE'}


In [13]:
### To retrieve all the documents you need to pass an empty query into the find method
result=collection.find({})
for i in result:
    print(i)

{'_id': ObjectId('6422a4549cbbb2799d32f0c3'), 'Name': 'Zainab', 'Roll No': 16, 'Branch': 'BS(SE)'}
{'_id': ObjectId('6422a4589cbbb2799d32f0c4'), 'Name': 'Ali', 'Roll No': 159, 'Branch': 'CSE'}
{'_id': ObjectId('6422a4589cbbb2799d32f0c5'), 'Name': 'Ayesha', 'Roll No': 155, 'Branch': 'CSE'}
{'_id': ObjectId('6422a4589cbbb2799d32f0c6'), 'Name': 'Zaib', 'Roll No': 156, 'Branch': 'CSE'}


### Updating the documents in the collection
To update a record, or document as it is called in MongoDB, in a collection, we use the **update_one()** and **update_many()** methods.

In [14]:
### Updating single document using update_one() method
### the first argument is the filter that selects the document, the second describes the change
query={"Roll No":{"$eq":155}}
new_data={'$set':{"Name":'Aahil'}}
collection.update_one(query,new_data)


<pymongo.results.UpdateResult at 0x2307c9139d0>

In [15]:
### Updating multiple documents using update_many method
present_data={"Branch":"CSE"}
new_data={"$set":{"Branch":"IT"}}
collection.update_many(present_data,new_data)

<pymongo.results.UpdateResult at 0x2307c913910>

### Deleting documents from the collection
To delete a record, or document as it is called in MongoDB, from a collection, we use the **delete_one()** and **delete_many()** methods.

In [16]:
### deleting single document using delete_one() method
query={"Roll No":159}
collection.delete_one(query)

<pymongo.results.DeleteResult at 0x2307c9136a0>

In [17]:
### deleting multiple documents using delete_many() method
query={"Branch":"IT"}
collection.delete_many(query)

<pymongo.results.DeleteResult at 0x2307c913a00>

### Dropping the collection

In [18]:
collection.drop()

## MongoDB Atlas

### Steps

- Create MongoDB Atlas account
- Create Shared Cluster with default options

<img src="images/Cloud1.png" height=700px width=700px align='center'>

- Create database user

<img src="images/Cloud2.png" height=700px width=700px align='center'>



<img src="images/Cloud3.png" height=700px width=700px align='center'>

- Select option to connect from

<img src="images/Cloud4.png" height=700px width=700px align='center'>

- Add entries to your IP Access List. I chose "Add my current IP Address"

<img src="images/Cloud5.png" height=700px width=700px align='center'>

- Click Finish and Close

<img src="images/Cloud6.png" height=700px width=700px align='center'>

- Click Go To Database

<img src="images/Cloud7.png" height=700px width=700px align='center'>

- Click on Connect

<img src="images/Cloud8.png" height=700px width=700px align='center'>

- Select "Connect Using MongoDB Compass"

<img src="images/Cloud9.png" height=600px width=600px align='center'>

- Select "I have MongoDB Compass"
- Copy Connection string

<img src="images/Cloud10.png" height=600px width=600px align='center'>

- Open MongoDB Compass

<img src="images/Cloud11.png" height=700px width=700px align='center'>

- Click on Connect in the menu bar; a drop-down appears. Then disconnect the running connection
- Paste the connection string and replace the username and password fields. Then click Connect

<img src="images/Cloud14.png" height=600px width=600px align='center'>

- Now your MongoDB Compass is connected to the cloud

<img src="images/Cloud15.png" height=700px width=700px align='center'>

### Connecting with PyMongo

In [2]:
### a variable named connection_url to store the MongoDB connection URL
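### replace <username> and <password> with your database user's credentials; never publish real ones
connection_url="mongodb+srv://<username>:<password>@cluster0.odan0mg.mongodb.net/test"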

In [3]:
### To connect to the MongoDB server, we will use the MongoClient method.
client=pm.MongoClient(connection_url)

In [4]:
 ### To display the databases we use list_database_names() method of client object created above .
client.list_database_names()

['admin', 'local']

In [5]:
### creating database
database_name="student_cloud_database"
student_cloud_db=client[database_name]

In [6]:
### creating a collection in the database
collection_name="computer science"
collection=student_cloud_db[collection_name]

In [7]:
## To list the available collections in a database
student_cloud_db.list_collection_names()


[]

### Inserting document in the collection

In [8]:
### Inserting single document into the collection.
### The keys carry no trailing spaces, so they match the field names used in the queries below.
document={"Name":"Zainab",
"Roll No":  16,
"Branch": "BS(SE)"}
collection.insert_one(document)

<pymongo.results.InsertOneResult at 0x20cbc30a070>

The inserted document is visible in MongoDB Compass

<img src="images/Cloud16.png" height=800px width=800px align='center'>

The inserted document is visible in the Atlas account

<img src="images/Cloud18.png" height=800px width=800px align='center'>

In [9]:
### Inserting multiple documents into the collection.
documents=[{"Name":"Ali","Roll No":159,"Branch":"CSE"},
           {"Name":"Ayesha","Roll No":155,"Branch":"CSE"},
           {"Name":"Zaib","Roll No":156,"Branch":"CSE"}]
collection.insert_many(documents)

<pymongo.results.InsertManyResult at 0x20cbc3175e0>

### Retrieving the data from the collection

In [10]:
### retrieving single document using find_one
query={"Name":"Zainab"}
print(collection.find_one(query))

{'_id': ObjectId('6423020eed0027b2613ea7d1'), 'Name': 'Zainab', 'Roll No': 16, 'Branch': 'BS(SE)'}


In [11]:
### Retrieving multiple documents from the collection
query={"Branch":"CSE"}
result=collection.find(query)
for i in result:
    print(i)

{'_id': ObjectId('64230283ed0027b2613ea7d2'), 'Name': 'Ali', 'Roll No': 159, 'Branch': 'CSE'}
{'_id': ObjectId('64230283ed0027b2613ea7d3'), 'Name': 'Ayesha', 'Roll No': 155, 'Branch': 'CSE'}
{'_id': ObjectId('64230283ed0027b2613ea7d4'), 'Name': 'Zaib', 'Roll No': 156, 'Branch': 'CSE'}


In [12]:
### To retrieve all the documents you need to pass an empty query into the find method
result=collection.find({})
for i in result:
    print(i)

{'_id': ObjectId('6423020eed0027b2613ea7d1'), 'Name': 'Zainab', 'Roll No': 16, 'Branch': 'BS(SE)'}
{'_id': ObjectId('64230283ed0027b2613ea7d2'), 'Name': 'Ali', 'Roll No': 159, 'Branch': 'CSE'}
{'_id': ObjectId('64230283ed0027b2613ea7d3'), 'Name': 'Ayesha', 'Roll No': 155, 'Branch': 'CSE'}
{'_id': ObjectId('64230283ed0027b2613ea7d4'), 'Name': 'Zaib', 'Roll No': 156, 'Branch': 'CSE'}


### Updating the documents in the collection

In [13]:
### Updating single document using update_one() method
### the first argument is the filter that selects the document, the second describes the change
query={"Roll No":{"$eq":155}}
new_data={'$set':{"Name":'Aahil'}}
collection.update_one(query,new_data)


<pymongo.results.UpdateResult at 0x20cbc2dfc40>

In [14]:
### Updating multiple documents using update_many method
present_data={"Branch":"CSE"}
new_data={"$set":{"Branch":"IT"}}
collection.update_many(present_data,new_data)

<pymongo.results.UpdateResult at 0x20cbc2dfb20>

### Deleting documents from the collection

In [15]:
### deleting single document using delete_one() method
query={"Roll No":159}
collection.delete_one(query)

<pymongo.results.DeleteResult at 0x20cbc2d0c40>

In [16]:
### deleting multiple documents using delete_many() method
query={"Branch":"IT"}
collection.delete_many(query)

<pymongo.results.DeleteResult at 0x20cbc317ee0>

## Aggregation Framework

Aggregation is a way of processing a large number of documents in a collection by means of passing them through different stages. The stages make up what is known as a pipeline. The stages in a pipeline can filter, sort, group, reshape and modify documents that pass through the pipeline.

One of the most common use cases of Aggregation is to calculate aggregate values for groups of documents. This is similar to the basic aggregation available in SQL with the GROUP BY clause and COUNT, SUM and AVG functions. MongoDB Aggregation goes further, though, and can also perform relational-like joins, reshape documents, and create new or update existing collections.

An aggregation pipeline consists of one or more stages that process documents. Each stage performs an operation on the input documents. For example, a stage can filter documents, group documents, and calculate values. The documents that are output from a stage are passed to the next stage.

#### The aggregate() Method
To run an aggregation in MongoDB, use the aggregate() method.

#### Syntax
This is the general form of an aggregation query:

<b>db.collectionName.aggregate(pipeline, options)</b>

where:
- collectionName is the name of a collection,
- pipeline is an array that contains the aggregation stages,
- options holds optional parameters for the aggregation.

This is an example of the aggregation pipeline syntax:

<img src="images/Mongodbframework.png" height=600px width=600px align='center'>



- $match stage: filters the documents we need to work with, keeping those that fit our needs.
- $group stage: does the aggregation job.
- $sort stage: sorts the resulting documents the way we require (ascending or descending).

<img src="images/ag1.png" height=600px width=600px align='center'>
<img src="images/ag2.png" height=600px width=600px align='center'>
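
To make the stage-by-stage flow concrete, the sketch below imitates a three-stage pipeline ($match, $group, $sort) over a plain Python list. The helper functions and the sample data are hypothetical and only mimic what the server does; no MongoDB connection is used. The comment shows the comparable pipeline that could be passed to aggregate().

```python
from collections import defaultdict

# Comparable MongoDB pipeline:
# [{"$match": {"age": {"$gte": 18}}},
#  {"$group": {"_id": "$name", "count": {"$sum": 1}}},
#  {"$sort": {"count": -1}}]

users = [
    {"id": 1, "name": "mp", "age": 10},
    {"id": 2, "name": "np", "age": 18},
    {"id": 3, "name": "mp", "age": 25},
    {"id": 4, "name": "mp", "age": 30},
]

def match_stage(docs, predicate):
    """Keep only the documents that satisfy the predicate."""
    return [doc for doc in docs if predicate(doc)]

def group_count_stage(docs, key):
    """Group documents by a field and count the members of each group."""
    counts = defaultdict(int)
    for doc in docs:
        counts[doc[key]] += 1
    return [{"_id": value, "count": count} for value, count in counts.items()]

def sort_stage(docs, key, descending=False):
    """Order the documents by one field."""
    return sorted(docs, key=lambda doc: doc[key], reverse=descending)

# The output of each stage is the input of the next one.
adults = match_stage(users, lambda doc: doc["age"] >= 18)
grouped = group_count_stage(adults, "name")
result = sort_stage(grouped, "count", descending=True)
print(result)
```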

In [31]:
### count the users named "mp"; users_db is the database handle defined in the querying section below
users_db.users.aggregate([{'$match':{"name":"mp"}},{'$count':"Total users with name mp:"}])

<pymongo.command_cursor.CommandCursor at 0x254192ce590>

In [40]:
### count all documents in the collection; iterate the returned cursor (for example with list()) to read the result
users_db.users.aggregate( [
   { '$count': "myCount" }
])

<pymongo.command_cursor.CommandCursor at 0x2541a04f610>

### Advanced Querying

In [3]:
import pymongo as pm
connection_url="mongodb://localhost:27017"
client=pm.MongoClient(connection_url)
database_name="users_database"
users_db=client[database_name]
### collection names are case-sensitive, so this must match users_db.users in the queries
collection_name="users"
collection=users_db[collection_name]
documents=[{ "id": 1, "name": "mp","age": 10,"email": "xyz@gmail.com"},{
  "id": 2,
  "name": "np",
  "age": 18,
  "email": "abc@gmail.com",}]
collection.insert_many(documents)

<pymongo.results.InsertManyResult at 0x2084fefec20>

In [4]:
### Example 1: find() returns a cursor; no actual data is returned until the cursor is iterated.
users_db.users.find({"name": "mp"})


<pymongo.cursor.Cursor at 0x2084f9e9110>

In [5]:
### Limit Query:
users_db.users.find().limit(10)

<pymongo.cursor.Cursor at 0x2084f9aeb90>

In [6]:
### Greater/Less than Modifier($gt, $lt, $gte, $lte):
users_db.users.find({"age": {'$gt': 20}})
users_db.users.find({"age": {'$lt': 20}})
users_db.users.find({"age": {'$gte': 20}})
users_db.users.find({"age": {'$lte': 20}})

<pymongo.cursor.Cursor at 0x2084f9cc1d0>
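Because the cells above only display cursor objects, it can be hard to see what each comparison operator selects. The following sketch mimics the four operators in plain Python on the two sample documents; `matches` and `select_names` are hypothetical helpers written for this illustration, not part of pymongo, and they only handle the simple `{field: {operator: value}}` query shape.

```python
import operator

# Python equivalents of the MongoDB comparison operators
COMPARATORS = {
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$gte": operator.ge,
    "$lte": operator.le,
}


def matches(document, query):
    """Return True if `document` satisfies {field: {operator: value, ...}}.

    A document that lacks the field never matches.
    """
    for field, conditions in query.items():
        if field not in document:
            return False
        for op, value in conditions.items():
            if not COMPARATORS[op](document[field], value):
                return False
    return True


documents = [
    {"id": 1, "name": "mp", "age": 10},
    {"id": 2, "name": "np", "age": 18},
]


def select_names(query):
    """Names of the sample documents that satisfy the query."""
    return [doc["name"] for doc in documents if matches(doc, query)]


older_than_20 = select_names({"age": {"$gt": 20}})
younger_than_20 = select_names({"age": {"$lt": 20}})
# Several operators on one field are combined with a logical AND
teens = select_names({"age": {"$gte": 13, "$lte": 19}})
print(older_than_20, younger_than_20, teens)
```

Against a real server, the same filter documents are passed to `find()`, and the cursor is then iterated, for example with `list(users_db.users.find({"age": {"$lt": 20}}))`.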

### GridFS
GridFS is the MongoDB specification for storing and retrieving large files such as images, audio files, and video files. It behaves like a file system, but its data is stored within MongoDB collections. GridFS can store files larger than the BSON document size limit of 16 MB.

GridFS divides a file into chunks and stores each chunk in a separate document; by default, each chunk is at most 255 kB.

By default, GridFS uses two collections, fs.files and fs.chunks, to store the file's metadata and the chunks. Each chunk is identified by its unique _id ObjectId field. A document in fs.files serves as the parent document, and the files_id field in each fs.chunks document links the chunk to its parent.
Following is a sample document of fs.files collection −
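As a rough sketch of that layout, the snippet below mimics in plain Python how a file could be split into fs.chunks-style documents linked to one fs.files-style document. It does not talk to MongoDB: the field names follow the GridFS specification, but `split_into_chunks`, the `demo.bin` filename, and the use of `uuid` values in place of ObjectIds are assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size, in bytes


def split_into_chunks(data, filename, chunk_size=DEFAULT_CHUNK_SIZE):
    """Build one fs.files-style document and its fs.chunks-style documents."""
    files_id = uuid.uuid4().hex  # stand-in for the ObjectId MongoDB would assign
    files_doc = {
        "_id": files_id,
        "length": len(data),        # total file size in bytes
        "chunkSize": chunk_size,    # size of every chunk except possibly the last
        "uploadDate": datetime.now(timezone.utc),
        "filename": filename,
    }
    chunk_docs = [
        {
            "_id": uuid.uuid4().hex,
            "files_id": files_id,   # links the chunk to its parent document
            "n": index,             # position of the chunk within the file
            "data": data[start:start + chunk_size],
        }
        for index, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


payload = b"x" * (600 * 1024)  # a 600 KiB dummy file
files_doc, chunk_docs = split_into_chunks(payload, "demo.bin")
print(files_doc["length"], len(chunk_docs))
```

In practice this bookkeeping is handled by the driver; with pymongo, the `gridfs` module offers `gridfs.GridFS(db).put(data, filename=...)` and a matching `get()` for reading the file back.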

### MongoDB Scalability

Scalability is the ability of a database to constantly adjust its resources to meet application demands. As an application grows or traffic increases, the original server resources, such as RAM, CPU, storage, and I/O, might not suffice. This is when you will need to scale your database.

#### Vertical scaling
Vertical scaling refers to increasing the processing power of a single server or cluster. Both relational and non-relational databases can scale up, but eventually, there will be a limit in terms of maximum processing power and throughput. Additionally, there are increased costs with scaling up to high-performing hardware, as costs do not scale linearly.

<img src="vs.png" height=300px width=300px align='center'>

#### Horizontal scaling
Horizontal scaling (or scale-out) involves dividing your application data and workload over multiple servers and adding servers to increase capacity. Each machine handles a subset of the overall workload, so capacity is not bound by the limits of a single machine. Expanding the capacity of the deployment only requires adding servers as needed, so costs grow more linearly than with vertical scaling. This is difficult with relational databases because related data is hard to spread across nodes. With non-relational databases it is simpler, since collections are self-contained and not coupled relationally, so they can be distributed across nodes without queries having to “join” them together across nodes. Scaling MongoDB horizontally is achieved through sharding (preferred) and replica sets.

<img src="hs.png" height=300px width=300px align='center'>



#### Sharding
With sharding, each node (shard) contains a subset of the overall data. This is especially effective for increasing throughput in use cases that involve significant amounts of write operations, as each operation only affects one of the nodes and the partition of data it manages. While sharding happens automatically in MongoDB Atlas, it is still up to us to configure the shard key, which MongoDB uses to partition the data across shards in a non-overlapping fashion.

<img src="sharding.png" height=400px width=400px align='center'>
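To make the idea of non-overlapping partitioning by shard key concrete, here is a deliberately simplified sketch in plain Python. It hashes the shard key and takes the result modulo the number of shards; MongoDB itself instead splits the (optionally hashed) key space into ranges and assigns those ranges to shards, so `shard_for` and `partition` are hypothetical helpers for illustration only.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Map a shard key value to a shard index (simplified hash-modulo scheme)."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def partition(documents, shard_key, shard_count):
    """Place every document on exactly one shard, based on its shard key."""
    shards = [[] for _ in range(shard_count)]
    for doc in documents:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return shards


documents = [{"id": i, "name": f"user{i}"} for i in range(1, 101)]
shards = partition(documents, shard_key="id", shard_count=3)
sizes = [len(shard) for shard in shards]
print(sizes)
```

A write for a given `id` touches only the shard that `shard_for` selects, which is why a well-chosen shard key spreads write load across the cluster.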

Over time, datasets typically do not grow uniformly, and various shards will grow at faster rates than others. As your workloads evolve and data sets grow, there will be a need to rebalance data to ensure an even distribution of load across the cluster. This uneven distribution of data is addressed through shard balancing. In MongoDB, this is handled automatically by the sharded cluster balancer.

#### Replica Sets
Replica sets seem similar to sharding, but they differ in that the dataset is duplicated. Replication allows for high availability, redundancy/failover handling, and decreased bottlenecks on read operations. However, they can also introduce issues for applications with large amounts of write transactions, as each update must be propagated over to every replica set member.

<img src="images/replica.png" height=400px width=400px align='center'>
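From the application side, a replica set is usually addressed through one connection string that lists several members. The sketch below builds such a URI; the host names and the replica set name `rs0` are placeholders, and `build_replica_set_uri` is a hypothetical helper. The `readPreference=secondaryPreferred` option is one way to spread reads over secondaries, at the cost of possibly reading slightly stale data.

```python
from urllib.parse import urlencode


def build_replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string that targets a replica set."""
    options = urlencode({
        "replicaSet": replica_set,          # name of the replica set to connect to
        "readPreference": read_preference,  # where read operations are routed
    })
    return f"mongodb://{','.join(hosts)}/?{options}"


# Placeholder hosts; replace them with the members of a real replica set.
hosts = ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"]
uri = build_replica_set_uri(hosts, "rs0")
print(uri)
# With pymongo installed, the URI would be passed to pymongo.MongoClient(uri).
```

Listing more than one host means the driver can still discover the set when one of the listed members is down.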

- **Load balancing:** MongoDB uses the concept of sharding to scale horizontally by splitting data across multiple MongoDB instances. MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure.

<img src="images/2.png" height=400px width=400px align='center'>

- **Replication:** MongoDB can provide high availability with replica sets. A replica set consists of two or more MongoDB instances. Each replica set member may act in the role of the primary or secondary replica at any time. The primary replica is the main server which interacts with the client: it receives all write operations and, by default, all read operations as well. The secondary replicas maintain a copy of the primary's data using built-in replication. When the primary replica fails, the replica set automatically elects one of the secondaries, which then becomes the new primary.

<img src="images/3.png" height=400px width=400px align='left'> <img src="images/4.png" height=400px width=400px align='right'>

#### Is horizontal or vertical scaling better?
It depends on your use case! For most applications, you want the ability to do both, as that gives flexibility in meeting the throughput needs of your application. If the needs of your application can be met with a single instance, vertical scaling tends to be the simpler, more straightforward option.

For workloads that are more than a single instance can handle, horizontal scaling becomes necessary. Horizontal scaling also supports low latency in globally distributed applications and helps with complying with data-sovereignty requirements. From an administrative and maintenance perspective, horizontal scaling tends to be the more difficult task. Having this feature provided by your database platform can be a huge time-saver.

#### Scaling Proactively Versus Scaling Reactively
Proactive scaling refers to scaling your database in advance of foreseen load or high-traffic events. This could be based upon a regular pattern (e.g., day of the week or certain times of the year), or it could be done before specific events, such as launching a marketing campaign.

In contrast, reactive scaling refers to scaling in response to metrics. These could be warning signs such as slow transactions and query response times, or it could even be error messages coming from your database monitoring. In the worst-case scenario, this could be an outage due to excessive load.

### The MongoDB Backup and Restore
The MongoDB Backup and Restore Tool allows you to encapsulate the state of a cluster and return to that state at any time. This helps protect you from data loss, as you can restore a database to a MongoDB instance using a created copy of that instance.

#### Backup Types
There are two types of backups in MongoDB: logical backups and physical backups.

##### Logical Backups
Logical backups dump data from databases into backup files. During the logical backup process, client APIs are used to get the data from the server. The data is serialized and written as a “.bson,” “.json,” or “.csv” file, depending on the backup utility used. If you have enabled field level encryption, the encrypted fields remain encrypted in the backup.

MongoDB supplies two utilities to manage logical backups: **mongodump and mongorestore.**

The mongodump command dumps a backup of the database into the “.bson” format, which can later be loaded back into a database with mongorestore.

The mongorestore command is used to restore the dump files created by mongodump. Index creation happens after the data is restored.

Logical backups copy the data itself. They don’t copy any of the physical files relating to the data (like control files, log files, executables, etc.). They are typically used to archive databases, verify database structures, and move databases across different environments and operating systems.

If you have one server that contains a collection you need in another server, you could use a MongoDB logical backup to migrate the collection from the original server to the target server.
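The migration described above could be scripted roughly as follows. This is a sketch under assumptions: the host names, the `dump` directory, and the helper functions are made up, and the snippet only builds and prints the command lines. The flags used (`--host`, `--port`, `--db`, `--collection`, `--out`, `--nsInclude`) are standard mongodump and mongorestore options.

```python
import shlex
import subprocess


def dump_command(host, port, db, collection, out_dir):
    """Command line that dumps one collection to BSON files under out_dir."""
    return [
        "mongodump",
        f"--host={host}",
        f"--port={port}",
        f"--db={db}",
        f"--collection={collection}",
        f"--out={out_dir}",
    ]


def restore_command(host, port, db, collection, dump_dir):
    """Command line that restores only db.collection from dump_dir."""
    return [
        "mongorestore",
        f"--host={host}",
        f"--port={port}",
        f"--nsInclude={db}.{collection}",
        dump_dir,
    ]


def migrate(dump, restore):
    """Run both steps; needs the MongoDB database tools and reachable servers."""
    subprocess.run(dump, check=True)
    subprocess.run(restore, check=True)


# Placeholder hosts for the original and the target server
dump = dump_command("source.example.net", 27017, "users_database", "users", "dump")
restore = restore_command("target.example.net", 27017, "users_database", "users", "dump")
print(shlex.join(dump))
print(shlex.join(restore))
```

Calling `migrate(dump, restore)` would execute the two commands in order; it is left out here so the snippet can be read and run without a MongoDB installation.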

##### Physical Backups
Physical backups are snapshots of the data files in MongoDB at a given point in time. The snapshots can be used to cleanly recover the database, as they include all the data found when the snapshot was created. Physical backups are critical when attempting to back up large databases quickly.

There are currently no provided, out-of-the-box solutions for creating physical backups with MongoDB. While you can create physical backups with LVM snapshots or block storage volume snapshots, it’s easier to use MongoDB Atlas.

MongoDB Atlas can be used to restore both logical and physical backups.

### Data Visualization and MongoDB
In today’s digital world, you’re constantly surrounded by data. But in its raw form, it’s hard to get actionable insights from this data. You can imagine how difficult and confusing it can be to interpret rows and rows of numbers. Because of this, you usually use a method called data visualization to present patterns and trends in data more easily. 

Data visualization is therefore the translation of data into a visual form such as a graph, plot, heatmap, or equivalent, with the main goal of making the data easy to digest.

MongoDB Charts provides an intuitive way to build charts and dashboards from data stored in MongoDB Atlas. It is fully interactive: a user can drag and drop fields, zoom in, pan, and so on. It also includes a Chart builder that you can use to customize how charts appear.

### Visualization tool:
Qlik Sense is a modern cloud analytics platform designed to help you create a data literate workforce and transform your business. Qlik has a variety of different modules that all work in harmony to help you manage and clean your data, share information, create visualizations, and reports. It combines the power of AI with the creativity of human intuition.