<h1 align="center">NoSQL-with-MongoDB</h1>

# Learning Agenda of Notebook:
- Introduction to MongoDB and NoSQL databases
- Difference between SQL and NoSQL
- Setting up MongoDB and Python
- Common MongoDB Operators
- CRUD Operations
- Data Modeling
- Indexing and Performance Optimization
- Advanced Querying
- Aggregation Framework
- Working with GridFS
- MongoDB and Cloud Services
- Error handling and Debugging
- Understanding MongoDB Architecture
- MongoDB Atlas and Cloud Integration
- MongoDB and data visualization
- MongoDB and scalability
- MongoDB and backup and recovery

# Introduction to MongoDB and NoSQL databases


## NoSQL Databases

- **NoSQL databases** (Not Only SQL) are a type of database that differs from traditional relational databases in the way it stores and manages data. Unlike relational databases, which use tables and Structured Query Language (SQL) to store and retrieve data, NoSQL databases use a variety of data models, such as document-oriented, key-value, graph, and column-family.

- Here are some **examples** of NoSQL databases:

    - **Document-oriented databases:** MongoDB is an example of a document-oriented database. In this type of database, data is stored in the form of documents, which are similar to JSON objects. Each document can have different fields and structures, making it easy to store complex, unstructured data.

    - **Key-value databases:** Redis is an example of a key-value database. In this type of database, data is stored as key-value pairs, where the key is a unique identifier and the value is the data being stored. Key-value databases are designed to be fast and scalable and are often used for caching and session management.

    - **Graph databases:** Neo4j is an example of a graph database. In this type of database, data is stored in the form of nodes and relationships, making it well-suited for storing data with complex relationships, such as social networks or recommendation systems.

    - **Column-family databases:** Apache Cassandra is an example of a column-family database. In this type of database, data is stored in columns rather than rows, making it well-suited for large-scale, distributed systems.


> **Note:** NoSQL databases are often preferred over relational databases for certain use cases, such as when dealing with large amounts of unstructured data, when scalability is a concern, or when the data has complex relationships that are difficult to model in a traditional relational database.


<img src="images/NoSQLDatabases.jpg" align="center" height=400px width=400px>


## What is MongoDB?

<img src="images/intro_mongodb.jpeg" align="center" height=400px width=400px>

- **MongoDB** is a `document-oriented NoSQL database` that is designed to be scalable and flexible. Unlike traditional relational databases, which store data in tables and use structured query language (SQL) to retrieve data, MongoDB stores data in documents, which are similar to JSON objects.

- Here are some **examples** to illustrate how MongoDB works:

    - **Storing data in documents:** In MongoDB, data is stored in documents, which are similar to JSON objects. For example, consider a database of books. Each book can be stored as a document with fields for the title, author, ISBN, and so on.
    
    - **Collections:** Documents are stored in collections, which are similar to tables in relational databases. For example, the books database might have a collection named "books" that contains all the books in the database.

    - **Querying data:** MongoDB provides a rich query language that allows you to retrieve data based on various criteria. For example, a query on the `author` field retrieves all books written by F. Scott Fitzgerald. A document matched by such a query looks like this:

```
{
    "_id": ObjectId("5f22b3e3e3f0b375bc0edc71"),
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "ISBN": "9780743273565"
}

```
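
In PyMongo, the query itself is an ordinary Python dictionary. The following is a minimal sketch, assuming a `books` collection like the one described above; the helper names `author_filter` and `find_books_by_author` are hypothetical and not part of PyMongo.

```python
def author_filter(author):
    """Build a filter matching documents whose `author` field equals `author`."""
    return {"author": author}


def find_books_by_author(books, author):
    """Return all matching documents from `books`, a PyMongo collection."""
    # Collection.find() returns a lazy cursor; list() drains it.
    return list(books.find(author_filter(author)))


print(author_filter("F. Scott Fitzgerald"))
```

Calling `find_books_by_author(books, "F. Scott Fitzgerald")` against a live collection would return documents shaped like the one shown above.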



# Difference between SQL and NoSQL


SQL and NoSQL are two different types of database management systems, each with its own strengths and weaknesses. Here are some of the key differences between the two:

- **Data model:** SQL databases use a tabular data model, where data is stored in tables with fixed schemas, and each table has a unique primary key. NoSQL databases, on the other hand, use a variety of data models, including document, key-value, graph, and column-family models.

- **Scaling:** SQL databases are traditionally scaled vertically, which means that to handle more traffic, you upgrade your hardware to a more powerful server. NoSQL databases are generally designed to scale horizontally, which means that to handle more traffic, you can add more servers to the database cluster.

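- **ACID compliance:** SQL databases are typically ACID compliant, which means that they guarantee atomicity, consistency, isolation, and durability for transactions. NoSQL databases, on the other hand, often relax some of these guarantees in exchange for performance and scalability, although this varies by product. MongoDB, for example, has supported multi-document ACID transactions since version 4.0.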

- **Query language:** SQL databases use the Structured Query Language (SQL) for querying data, which is a powerful and widely-used language. NoSQL databases use a variety of query languages, including MongoDB's query language, Cassandra Query Language (CQL), and Amazon DynamoDB's API.

- **Use cases:** SQL databases are typically used for applications that require complex queries and transactions, such as banking, finance, and e-commerce. NoSQL databases are often used for applications that require high scalability and performance, such as real-time analytics, IoT, and social media.

<img src="images/sqlvsnosql.jpeg" height=400px width=400px>

# Setting up MongoDB and Python

<img src="images/mongodbvspython.png" align="center" height=400px width=400px>



To set up MongoDB and Python, you need to follow these steps:

- **Install MongoDB:** You can download MongoDB from the official [website](https://www.mongodb.com/try/download/community) and install it on your system.

<br/>

- **Install MongoDB Compass:** MongoDB Compass is a graphical user interface (GUI) tool for MongoDB that allows users to explore and interact with their data in a visual way. To install it, go to the official [download page](https://www.mongodb.com/try/download/compass) and click the `Download Compass` button.
<br/>

- **Install PyMongo:** PyMongo is the official Python driver for MongoDB. You can install it using pip: `pip3 install pymongo`. The command shown is for Linux; on other systems, `python -m pip install pymongo` works as well.
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>


- **Start and Enable the MongoDB service**: 
By default, the MongoDB service is disabled upon installation. You can verify this by running the command:

<p style="color:green; text-align:center"> <b> sudo systemctl status mongod </b></p>


<img src="images/s1_1.png" height=500px width=500px>
<br/>

To start the MongoDB service execute the command:
<p style="color:green; text-align:center"> <b> sudo systemctl start mongod </b></p>


Once again, confirm if the service is running:
<p style="color:green; text-align:center"> <b>sudo systemctl status mongod </b></p>



<img src="images/s2_2.png" height=500px width=500px>
<br/>


<br/>
<br/>
<br/>
<br/>


- **Connect to MongoDB:** To connect to MongoDB from Python, you need to create a `MongoClient object`. You can connect to a local MongoDB instance like this:

<img src="images/s3_3.png" height=500px width=500px align="center">
<br/>

<img src="images/s4_4.png" height=500px width=500px align="center">
<br/>





<p style="color:green; text-align:center"> <b>from pymongo import MongoClient</b></p>
<br />
<p style="color:green; text-align:center"> <b>client = MongoClient("mongodb://localhost:27017/")</b></p>
<br />



>**Note:** The **MongoClient object** in PyMongo is used to establish a connection to a MongoDB database. It represents a client-side connection to a MongoDB database and provides a way to interact with the database. With a MongoClient object, you can create and manipulate databases, collections, and documents. In this example, the MongoClient object is connecting to a MongoDB instance running on the same machine as the Python code (localhost) and using the default MongoDB port (27017). Once you have a MongoClient object, you can use it to access and manipulate the databases, collections, and documents in the MongoDB instance.
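
Creating a `MongoClient` does not by itself prove that the server is reachable, because PyMongo connects lazily in the background. A minimal sketch of an explicit check follows; the helper name `is_reachable` is hypothetical, and the broad `except` is a simplification made only to keep the example free of imports.

```python
def is_reachable(client):
    """Return True if the server behind `client` answers a ping command."""
    try:
        # `ping` is a cheap server command that forces a real round trip.
        client.admin.command("ping")
        return True
    except Exception:
        # Broad on purpose to keep the sketch import-free; in real code,
        # catch pymongo.errors.ConnectionFailure instead.
        return False
```

With a live client, this might be used as `is_reachable(MongoClient("mongodb://localhost:27017/", serverSelectionTimeoutMS=2000))`, where the timeout option limits how long the driver waits for a server before giving up.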

- **Work with databases and collections:** Once you have connected to MongoDB, you can work with databases and collections. To refer to a database, you can use the `client.db_name` syntax, where `db_name` is the name of the database. To refer to a collection, you can use the `db.collection_name` syntax, where `collection_name` is the name of the collection. MongoDB creates both lazily: these expressions only return handles, and the database and collection come into existence when the first document is written. For example, to get handles for a database named `booksdb` and a collection named `books`, you can do the following:


```

db = client.booksdb
books = db.books

```

- **Insert data into the collection:** To insert data into the collection, you can use the `insert_one` or `insert_many` method. For example, to insert a single book into the collection, you can do the following:

```
book = {"title": "Taxonomy of IDS", "author": "Arif Butt"}
books.insert_one(book)

```

This will insert the book document into the "books" collection in the "booksdb" database.





### Before Creating new Database 

<img src="images/s5.png" height=700px width=700px>

### Demo 

In [3]:
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")



In [7]:
# get list of all current databases
client.list_database_names()

['admin', 'config', 'local', 'test_database']

In [8]:
# create a new database
db = client.booksdb

In [9]:
# create a collection into `booksdb` database
books = db.books

In [11]:
# insert record into new created collection
book = {"title": "Taxonomy of IDS", "author": "Arif Butt"}
books.insert_one(book)

<pymongo.results.InsertOneResult at 0x7f09d055ba40>

In [16]:
# fetch result
books.find_one()

{'_id': ObjectId('63fb9fecc766296b075ab4f3'),
 'title': 'Taxonomy of IDS',
 'author': 'Arif Butt'}

### After Creating new Database
<img src="images/s6.png" height=600px width=600px>
<img src="images/s7.png" height=600px width=600px>

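> **Note:** `_id` values must be unique within a collection. You can verify this by inserting two documents with the same `_id` into the same collection: the second insert fails with a `DuplicateKeyError`. Inserting them into two different collections succeeds, because uniqueness is enforced per collection.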

In [13]:
# after creating new database get list of all the databases
client.list_database_names()

['admin', 'booksdb', 'config', 'local', 'test_database']

### Check Your Concept:
- Create a new database and `drop` it from `MongoDB` local host.
- Create a collection in a database and `drop` it from the database.

# Common MongoDB Operators

**MongoDB offers different types of operators that can be used to interact with the database. An operator is a special keyword prefixed with `$` (for example `$gt` or `$and`) that tells MongoDB which comparison, logical, or other operation to apply.**

The query operators enhance the functionality of MongoDB by allowing developers to create complex queries to interact with data sets that match their applications.

MongoDB offers the following query operator types:

- Comparison
- Logical
- Element
- Evaluation
- Array
- Bitwise


## Comparison Operators
MongoDB comparison operators can be used to compare values in a document. The following table contains the common comparison operators.

<img src="images/s9.png" height=500px width=500px>

## Logical Operators

MongoDB logical operators can be used to filter data based on given conditions. These operators provide a way to combine multiple conditions. Each operator evaluates the given conditions to a true or false value.

<img src="images/s10.png" height=500px width=500px>
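
In PyMongo, comparison and logical operators appear as `$`-prefixed keys inside the filter dictionary. The sketch below builds a few such filters; the helper names (`age_between`, `name_in`, `any_of`) and the `name` and `age` fields are assumptions chosen for illustration.

```python
def age_between(low, high):
    """Comparison operators: match documents with low <= age <= high."""
    return {"age": {"$gte": low, "$lte": high}}


def name_in(names):
    """Comparison operator $in: match documents whose name is in `names`."""
    return {"name": {"$in": list(names)}}


def any_of(*conditions):
    """Logical operator $or: match documents satisfying at least one condition."""
    return {"$or": list(conditions)}


# Documents aged 18 to 25, or named Ali or Ayesha regardless of age.
query = any_of(age_between(18, 25), name_in(["Ali", "Ayesha"]))
print(query)
```

The resulting dictionary can be passed directly to `collection.find(query)`.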

## CRUD Operations 
<img src="images/crud.png" align="center" height=400px width=400px>

- **CRUD** stands for `Create`, `Read`, `Update`, and `Delete`, the four basic operations that you perform on data in a database.

Here's a brief overview of the CRUD operations in MongoDB using Python's PyMongo library:


### Create (Insert) 
**To insert a document (record) into a collection `(table)`, you use the `insert_one()` or `insert_many()` method on the collection object. For example:**


In [18]:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")

# create a new database
db = client["test_database"]

# create a new collection
collection = db["test_collection"]

# Insert a single document
collection.insert_one({"name": "Ehtisham", "age": 22})


<pymongo.results.InsertOneResult at 0x7f09d0532780>

In [19]:
# Insert multiple documents
docs = [{"name": "Ali", "age": 20}, {"name": "Ayesha", "age": 17}]
collection.insert_many(docs)


<pymongo.results.InsertManyResult at 0x7f09d01d9880>

<img src="images/s8.png" height=800px width=800px>

### Read (Find)

**To query for documents in a collection, you use the `find()` method on the collection object. For example:**

In [21]:
# Find all documents
docs = collection.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('63fba517c766296b075ab4f5'), 'name': 'Ehtisham', 'age': 22}
{'_id': ObjectId('63fba52ac766296b075ab4f6'), 'name': 'Ali', 'age': 20}
{'_id': ObjectId('63fba52ac766296b075ab4f7'), 'name': 'Ayesha', 'age': 17}


In [24]:
# Find documents that match a certain condition
# display all the documents whose age is greater than 20
# $gt is the "greater than" comparison operator
docs = collection.find({"age": {"$gt": 20}})
for doc in docs:
    print(doc)

{'_id': ObjectId('63fba517c766296b075ab4f5'), 'name': 'Ehtisham', 'age': 22}


### Update 

**To update a document in a collection, you use the `update_one()` or `update_many()` method on the collection object. For example:**

In [26]:
# Update a single document
collection.update_one({"name": "Ehtisham"}, {"$set": {"age": 21}})


<pymongo.results.UpdateResult at 0x7f09baa7e840>

In [29]:
# after updating record, verify your record
docs = collection.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('63fba517c766296b075ab4f5'), 'name': 'Ehtisham', 'age': 21}
{'_id': ObjectId('63fba52ac766296b075ab4f6'), 'name': 'Ali', 'age': 20}
{'_id': ObjectId('63fba52ac766296b075ab4f7'), 'name': 'Ayesha', 'age': 17}


In [30]:
# Update multiple documents
# The $inc operator is used to increment the value of a field by a specified amount.
# Increment by 1 in age in all the documents, where age is less than 20
collection.update_many({"age": {"$lt": 20}}, {"$inc": {"age": 1}})


<pymongo.results.UpdateResult at 0x7f09b8eced00>

In [31]:
# after updating record, verify your record
docs = collection.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('63fba517c766296b075ab4f5'), 'name': 'Ehtisham', 'age': 21}
{'_id': ObjectId('63fba52ac766296b075ab4f6'), 'name': 'Ali', 'age': 20}
{'_id': ObjectId('63fba52ac766296b075ab4f7'), 'name': 'Ayesha', 'age': 18}
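
The `UpdateResult` objects printed above carry more information than their default representation shows. As a small sketch, the hypothetical helper below reads the two counters that PyMongo exposes on an update result.

```python
def summarize_update(result):
    """Summarize a pymongo.results.UpdateResult as a plain dictionary."""
    return {
        "matched": result.matched_count,    # documents that matched the filter
        "modified": result.modified_count,  # documents actually changed
    }
```

For instance, `summarize_update(collection.update_many({"age": {"$lt": 20}}, {"$inc": {"age": 1}}))` reports how many documents were touched. The two numbers can differ when an update sets a field to the value it already has.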


### Delete 

**To delete a document from a collection, you use the `delete_one()` or `delete_many()` method on the collection object. For example:**

In [32]:
# Delete a single document
# Delete a document where name is Ehtisham
collection.delete_one({"name": "Ehtisham"})

<pymongo.results.DeleteResult at 0x7f09b8e50800>

In [33]:
# after deleting, verify the remaining records
docs = collection.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('63fba52ac766296b075ab4f6'), 'name': 'Ali', 'age': 20}
{'_id': ObjectId('63fba52ac766296b075ab4f7'), 'name': 'Ayesha', 'age': 18}


In [34]:
# Delete multiple documents
# Delete all the documents which contain age less than 20
collection.delete_many({"age": {"$lt": 20}})


<pymongo.results.DeleteResult at 0x7f09d0179d00>

In [35]:
# after deleting, verify the remaining records
docs = collection.find()
for doc in docs:
    print(doc)

{'_id': ObjectId('63fba52ac766296b075ab4f6'), 'name': 'Ali', 'age': 20}


<img src="images/s11.png">

## Data Modeling

<img src="images/data_model.jpg" align="right" height=500px width=500px>

- **Data modeling** in MongoDB is the process of designing and organizing data in a way that is optimized for the MongoDB document-oriented data model. It involves creating a schema that defines the structure, relationships, and constraints of the data, and using that schema to create collections and documents. The goal of data modeling in MongoDB is to ensure that the data is stored in a way that is efficient, flexible, and consistent with the requirements of the application.

- **Data modeling** in MongoDB differs from data modeling in relational databases because MongoDB is document-oriented: it stores data as documents in BSON (binary JSON) format, grouped into collections rather than tables.

- MongoDB also provides a more flexible data model, which accommodates dynamic and loosely structured data, and it supports horizontal scaling, which helps performance and scalability as data grows.


#### Another Definition

- **Data modeling** in MongoDB refers to the process of defining the structure of your data, including the relationships between different collections, documents, and fields. This structure is important for ensuring that your data is stored and retrieved efficiently, and for making it easy to access and manipulate the data in your applications.



### Example

Here's a simple example to give you an idea of how data modeling works in MongoDB:

**Let's say you have a collection of blog posts, and each post has an author, a title, a body, and a list of comments. To model this data in MongoDB, you might define two collections: one for posts and one for comments.**

Here's what the posts collection might look like:

In [None]:
{
    "_id": ObjectId("5f3ab0fa7ef2d01234567890"),
    "author": "Ehtisham",
    "title": "My first post",
    "body": "I love my country.",
    "comments": [
        ObjectId("5f3ab0fa7ef2d01234567891"),
        ObjectId("5f3ab0fa7ef2d01234567892"),
        ...
    ]
}


And here's what the **comments** collection might look like:



In [None]:
{
    "_id": ObjectId("5f3ab0fa7ef2d01234567891"),
    "post_id": ObjectId("5f3ab0fa7ef2d01234567890"),
    "author": "Ehtisham",
    "text": "This is a great post!"
}


- In this example, each post document has an `_id` field that is unique to that document and serves as `its primary key`. The comments field is an array of references to comment documents, which are stored in the separate comments collection. The comments collection also has a `post_id` field, which links each comment to the post it belongs to.

- This is just one simple example of data modeling in MongoDB. In a real-world application, your data structures may be much more complex, with multiple levels of nested documents, arrays, and references to other collections. 

> **Note:** The key is to think carefully about the structure of your data and to define the relationships between collections, documents, and fields in a way that makes sense for your specific use case.
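
The example above links posts and comments by reference. A common alternative is to embed the comments inside the post document, which can suit data that is always read together and stays small. The sketch below illustrates that transformation on plain dictionaries; the helper name `embed_comments` and the string identifiers are assumptions made to keep the example self-contained.

```python
def embed_comments(post, comments):
    """Return a copy of `post` whose `comments` field holds the comment
    documents themselves instead of references to them."""
    embedded = dict(post)
    embedded["comments"] = [
        {"author": c["author"], "text": c["text"]}
        for c in comments
        if c["post_id"] == post["_id"]
    ]
    return embedded


# Plain strings stand in for ObjectId values to keep the sketch import-free.
post = {"_id": "post-1", "author": "Ehtisham", "title": "My first post",
        "body": "I love my country.", "comments": ["comment-1"]}
comments = [{"_id": "comment-1", "post_id": "post-1",
             "author": "Ehtisham", "text": "This is a great post!"}]

print(embed_comments(post, comments)["comments"])
```

Embedding lets a single read return a post together with its comments, while referencing keeps each document small and lets comments be queried on their own. Which one fits depends on how the application reads and updates the data.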

## Indexing and Performance Optimization
<img src="images/indexing.png" height=500px width=500px>

**Indexing** is a crucial aspect of performance optimization in MongoDB. It allows the database to quickly find the relevant documents based on the specified criteria, improving query performance. Some of the key points about indexing in MongoDB are:

- Indexing in MongoDB is done at the field level.

- MongoDB supports multiple types of indexes, including single-field, compound, geospatial, text, and hashed indexes.
- By default, MongoDB creates a unique index on the `_id` field.

- To create an index, you can use the `createIndex()` method in the MongoDB shell or the equivalent method of your language's driver, such as `create_index()` in PyMongo.

- When you run a query, MongoDB uses the most appropriate index to match the query conditions. You can use the `explain()` method to see which index MongoDB has selected.

- Indexes can improve query performance, but they also have some overhead in terms of disk space and write operations. Therefore, it's important to choose your indexes carefully.

- To optimize performance, you can use indexing techniques such as compound indexes, partial indexes, and multikey indexes.


> **Note:** Indexes let the database find the relevant documents without scanning the whole collection. Choosing carefully which indexes to create, and checking that your queries actually use them, can significantly improve the performance of your MongoDB-based applications.
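
As an illustration of the partial indexes mentioned above, the sketch below creates an index on `name` that covers only documents with `age` of at least 18. The helper name, the index name `adult_name_idx`, and the age threshold are hypothetical; `partialFilterExpression` and `name` are options accepted by PyMongo's `create_index()`.

```python
def create_adult_name_index(collection):
    """Create a partial index on `name` covering only documents with age >= 18."""
    return collection.create_index(
        [("name", 1)],                                  # 1 means ascending
        name="adult_name_idx",                          # hypothetical index name
        partialFilterExpression={"age": {"$gte": 18}},  # index only these docs
    )
```

A partial index is smaller and cheaper to maintain than a full one, but MongoDB can use it only for queries whose conditions guarantee that every match falls inside the filter expression.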

### Implementation of Indexing and Performance Optimization 

- Identify the queries that are slow and causing performance issues.

- Use the **explain()** method to analyze the query execution plan and identify any performance issues. For example, if a query is doing a full collection scan, it may be a sign that an index is needed.

- Create indexes on the fields that are frequently used in queries using the **create_index()** method in PyMongo. For example, if you have a collection of user data and frequently query based on the user's name and age, you could create indexes on those fields like this:

<br />
<p style="color:green; text-align:center"> <b>db.users.create_index([('name', pymongo.ASCENDING), ('age', pymongo.ASCENDING)])</b></p>
<br />

This creates a compound index on the `name` and `age` fields in ascending order.

- Monitor the performance of your queries using the MongoDB Profiler or other monitoring tools to see if the indexes are improving query performance.

- If you are still experiencing performance issues, consider using sharding (discussed later) to distribute the data across multiple MongoDB instances. Sharding can improve query performance by allowing for parallel query execution across multiple shards.

- Continuously monitor and optimize your indexes and queries to ensure that your MongoDB database continues to perform well over time.

### Demo

In [38]:
# Step1: Connect to the Mongodb database using PyMongo
from pymongo import MongoClient

# establish a connection to the database
client = MongoClient('localhost', 27017)

db = client['mydatabase']


In [39]:
# Step2: Create a collection and insert data

# create a collection
collection = db['mycollection']

# insert some sample data
collection.insert_many([
    {"name": "Ehtisham", "age": 22, "city": "Lahore"},
    {"name": "Ali", "age": 20, "city": "Okara"},
    {"name": "Ayesha", "age": 17, "city": "Okara"},
    {"name": "Dua", "age": 6, "city": "Sahiwal"},
    {"name": "Adeen", "age": 3, "city": "Lahore"}
])


<pymongo.results.InsertManyResult at 0x7f09d0556c40>

In [41]:
# Step3: Create index on the `name` field for faster querying

collection.create_index([('name',1)])

'name_1'

In [43]:
# Step4: query the collection with the 'find' method and the 'explain' option
query = {"name": "Ehtisham"}
result = collection.find(query).explain()

# print the execution plan
result


{'queryPlanner': {'plannerVersion': 1,
  'namespace': 'mydatabase.mycollection',
  'indexFilterSet': False,
  'parsedQuery': {'name': {'$eq': 'Ehtisham'}},
  'winningPlan': {'stage': 'FETCH',
   'inputStage': {'stage': 'IXSCAN',
    'keyPattern': {'name': 1},
    'indexName': 'name_1',
    'isMultiKey': False,
    'multiKeyPaths': {'name': []},
    'isUnique': False,
    'isSparse': False,
    'isPartial': False,
    'indexVersion': 2,
    'direction': 'forward',
    'indexBounds': {'name': ['["Ehtisham", "Ehtisham"]']}}},
  'rejectedPlans': []},
 'executionStats': {'executionSuccess': True,
  'nReturned': 1,
  'executionTimeMillis': 0,
  'totalKeysExamined': 1,
  'totalDocsExamined': 1,
  'executionStages': {'stage': 'FETCH',
   'nReturned': 1,
   'executionTimeMillisEstimate': 0,
   'works': 2,
   'advanced': 1,
   'needTime': 0,
   'needYield': 0,
   'saveState': 0,
   'restoreState': 0,
   'isEOF': 1,
   'docsExamined': 1,
   'alreadyHasObj': 0,
   'inputStage': {'stage': 'IXSCAN',

Note that the `winningPlan` section indicates the use of the `name_1` index.
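
Reading the plan by eye becomes tedious for larger outputs. The sketch below walks the nested `inputStage` entries of a winning plan shaped like the one above and reports whether an index scan is involved. The helper names are hypothetical, and plans with several branches are not handled.

```python
def plan_stages(plan):
    """Return the stage names of a query plan, outermost first."""
    stages = []
    while plan is not None:
        stages.append(plan["stage"])
        plan = plan.get("inputStage")  # absent on the innermost stage
    return stages


def uses_index(explain_output):
    """True if the winning plan contains an index scan (IXSCAN)."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "IXSCAN" in plan_stages(winning)


# Trimmed-down copy of the explain output shown above.
sample = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "name_1"}}}}

print(plan_stages(sample["queryPlanner"]["winningPlan"]))  # ['FETCH', 'IXSCAN']
print(uses_index(sample))  # True
```

A plan whose only stage is `COLLSCAN` would make `uses_index` return `False`, which is the usual sign that the query scans the whole collection.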

In [45]:
# Step5: Use the `explain` option to compare the performance of the query with and without the index

In [48]:
# query the collection without the index and print the execution time
result = collection.find({"age": {"$gt": 35}}).explain()
print(result["executionStats"]["executionTimeMillis"])



0


In [49]:
# create an index on the 'age' field
collection.create_index([("age", 1)])

# query the collection with the index and print the execution time
result = collection.find({"age": {"$gt": 35}}).explain()
print(result["executionStats"]["executionTimeMillis"])


0


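> **Note:** With only five documents, both queries finish in under a millisecond, so `executionTimeMillis` is 0 in both runs and shows no difference here. On a collection this small, compare `totalDocsExamined` and the stage of the winning plan instead: without the index the query scans the whole collection (`COLLSCAN`), while with the index it uses an index scan (`IXSCAN`). The difference in execution time becomes visible only on much larger collections.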

## Understanding MongoDB Architecture

<img src="images/MongoDB-Architecture.png" height=500px width=500px>

MongoDB architecture refers to the components and design of a MongoDB database system. It includes the following key components:

- **MongoDB Server:** This is the main component of MongoDB architecture, responsible for handling client requests, managing data storage, and executing queries.

- **Replication:** Replication is a key aspect of MongoDB architecture, providing high availability and data durability. MongoDB uses a replication mechanism known as replica sets, where data is replicated across multiple servers for redundancy.

- **Sharding:** Sharding is the process of dividing data across multiple servers to distribute load and improve performance. In a MongoDB sharded cluster, a collection's data is partitioned into chunks according to a shard key, and the chunks are distributed across the shards. Each shard holds a subset of the data, and `mongos` routers direct each client request to the shards that hold the relevant data.

- **Database:** A MongoDB database is a collection of related data stored together in a single unit. Databases are made up of collections, which are equivalent to tables in a relational database.

- **Collection:** A collection is a group of related documents stored together in a single unit. Documents in a collection are equivalent to rows in a table in a relational database.

- **Document:** A document is a single unit of data stored in a collection. It is a JSON-like structure, containing one or more key-value pairs.



> **Note:** Understanding MongoDB architecture is important for ensuring that your database is designed and optimized for your specific use case. It helps you make informed decisions about things like data modeling, replication, and sharding, and ensures that your database is scalable, reliable, and efficient.

+------------+     +------------+     +------------+
| Config     |     | Config     |     | Config     |
| Server 1   |     | Server 2   |     | Server 3   |
+------------+     +------------+     +------------+
         |                 |                 |
         +-----------------+-----------------+
                             |
                      +---------------+
                      | Mongos Router |
                      +---------------+
                             |
                 +---------------+---------------+
                 |               |               |
          +---------+     +---------+     +---------+
          | Shard 1 |     | Shard 2 |     | Shard 3 |
          +---------+     +---------+     +---------+


To connect to a sharded MongoDB cluster using the PyMongo library in Python, you point the client at one or more `mongos` routers rather than at the config servers or the shards. The routers consult the config servers and forward each operation to the right shards. You also specify the name of the database and collection that you want to access. Here is an example code snippet (the host names are placeholders):

In [None]:
from pymongo import MongoClient

# Host names below are placeholders; replace them with your own mongos routers.
# Applications connect to mongos, never directly to config servers or shards.
mongos_routers = [
    'mongos-1.example.com:27017',
    'mongos-2.example.com:27017'
]

# Define the name of the database and collection to access
db_name = 'mydatabase'
collection_name = 'mycollection'

# Create a connection to the cluster using the MongoClient class;
# listing several routers lets the driver fail over if one is unavailable
client = MongoClient(f'mongodb://{",".join(mongos_routers)}/')

# Specify the database and collection to access
db = client[db_name]
collection = db[collection_name]

# Perform CRUD operations on the collection as usual


# Advanced Querying in MongoDB

<img src="images/pipeline.png" height=400px width=400px>

**Advanced querying in MongoDB involves using more complex query operations and aggregation pipelines to retrieve and analyze data in the database.**

Here is an example of how to perform advanced querying in MongoDB using the MongoDB shell:

Assume we have a collection of customer orders in our MongoDB database named `orders`. Each document in the collection represents a customer order and has the following fields: orderId, customerId, orderDate, orderItems, and totalAmount.

To retrieve all orders placed by a specific customer, we can use the following query:

In [None]:
db.orders.find( { customerId: "12345" } )


To retrieve all orders placed between two specific dates, we can use the following query:

In [None]:
db.orders.find( { orderDate: { $gte: ISODate("2022-01-01"), $lte: ISODate("2022-01-31") } } )
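
In PyMongo, the same date range can be expressed with Python `datetime` objects, which the driver stores as BSON dates. The following is a minimal sketch; the helper name `orders_between` is hypothetical, and the field name follows the `orders` collection assumed above.

```python
from datetime import datetime


def orders_between(start, end):
    """Build a filter for orders with start <= orderDate <= end."""
    return {"orderDate": {"$gte": start, "$lte": end}}


# Same bounds as the shell query above.
january = orders_between(datetime(2022, 1, 1), datetime(2022, 1, 31))
print(january)
```

The filter would be passed as `db.orders.find(january)`. Note that the upper bound is midnight at the start of January 31, so orders placed later that day are excluded. To cover the whole month, use `$lt` with February 1 instead.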


To retrieve the total number of orders placed by each customer, we can use the following aggregation pipeline:

In [None]:
db.orders.aggregate( [
   { $group: { _id: "$customerId", totalOrders: { $sum: 1 } } }
] )


This pipeline groups the orders by customer ID, and then uses the $sum operator to count the number of orders for each customer.

To retrieve the total amount of orders placed by each customer, we can use the following aggregation pipeline:

In [None]:
db.orders.aggregate( [
   { $group: { _id: "$customerId", totalAmount: { $sum: "$totalAmount" } } }
] )


This pipeline groups the orders by customer ID, and then uses the $sum operator to calculate the total amount of orders for each customer.
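
The two shell pipelines above translate almost literally to PyMongo, where a pipeline is a list of dictionaries and operator names are quoted strings. The sketch below assumes the same `orders` collection; the function names are hypothetical.

```python
def orders_per_customer():
    """Pipeline counting the orders placed by each customer."""
    return [{"$group": {"_id": "$customerId", "totalOrders": {"$sum": 1}}}]


def amount_per_customer():
    """Pipeline summing `totalAmount` over each customer's orders."""
    return [{"$group": {"_id": "$customerId",
                        "totalAmount": {"$sum": "$totalAmount"}}}]


def run_pipeline(orders, pipeline):
    """Run `pipeline` on `orders`, a PyMongo collection, and collect the results."""
    # Collection.aggregate() returns a cursor; list() drains it.
    return list(orders.aggregate(pipeline))
```

For example, `run_pipeline(db.orders, orders_per_customer())` would return one document per customer, with the customer ID in `_id` and the count in `totalOrders`.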

In addition to these basic examples, MongoDB provides a wide range of query operators and aggregation stages that can be used to perform more complex queries and data analysis. It's important to carefully analyze your data and query requirements to select the most appropriate operations and pipelines to achieve the desired results.

# Aggregation Framework

<img src="images/aggregation.png" height=400px width=400px>

The Aggregation Framework is a powerful feature in MongoDB that allows for complex data analysis by processing data in multiple stages. An aggregation pipeline is a sequence of stages, each of which transforms the documents passing through it, for example by filtering, grouping, or sorting them. A typical pipeline groups values from many documents and computes a result for each group, such as a sum, average, minimum, or maximum. This is similar to combining `GROUP BY` with aggregate functions such as `SUM()` and `AVG()` in SQL.
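
To make the idea of stages concrete, here is a sketch of a three-stage pipeline in PyMongo syntax. It assumes documents with `age` and `city` fields, like those inserted in the indexing demo; the function name and the output field names `averageAge` and `people` are hypothetical.

```python
def average_age_by_city(min_age=0):
    """Build a three-stage pipeline: filter, group, then sort."""
    return [
        # Stage 1: keep only documents with age >= min_age.
        {"$match": {"age": {"$gte": min_age}}},
        # Stage 2: one output document per city with its average age and size.
        {"$group": {"_id": "$city",
                    "averageAge": {"$avg": "$age"},
                    "people": {"$sum": 1}}},
        # Stage 3: highest average age first.
        {"$sort": {"averageAge": -1}},
    ]


pipeline = average_age_by_city(min_age=5)
print(len(pipeline))  # 3
```

The list would be passed to `collection.aggregate(pipeline)`. Placing `$match` first is a common choice because it reduces the number of documents the later stages have to process.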

# Working with GridFS in MongoDB

<img src="images/gridfs.png" height=400px width=400px >

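**GridFS** is a specification used by MongoDB to store and retrieve large files such as images, audio, and video. Instead of storing a file in a single document, GridFS splits it into chunks (255 KB each by default) and stores each chunk as a separate document, which makes it suitable for files larger than the 16 MB limit on a single BSON document.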
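
The chunking arithmetic can be sketched in a few lines. The helper name `chunks_needed` is hypothetical, and the calculation assumes the default chunk size of 255 KB, taken here as 255 × 1024 bytes.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes


def chunks_needed(file_size, chunk_size=DEFAULT_CHUNK_SIZE):
    """Number of chunk documents needed for a file of `file_size` bytes."""
    # Ceiling division: the final chunk may be only partly filled.
    return (file_size + chunk_size - 1) // chunk_size


print(chunks_needed(16 * 1024 * 1024))  # 65
```

A 16 MB file therefore occupies 65 chunk documents: 64 full chunks plus a final, partly filled one.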


To work with `GridFS` in Python using PyMongo, you can use the `gridfs` module. Here's an example of how to upload a file to GridFS:

In [53]:
import pymongo
from gridfs import GridFS

# connect to MongoDB server
client = pymongo.MongoClient("mongodb://localhost:27017/")

# select a database and GridFS collection
db = client["gridfs"]
fs = GridFS(db, collection="myfiles")

# open the file to be uploaded
with open("ranking.csv", "rb") as file:
    # upload the file to GridFS
    file_id = fs.put(file, filename="ranking.csv")
    
print(f"File uploaded with ID: {file_id}")


File uploaded with ID: 63fc4184c766296b075ab500


<img src="images/s12.png">

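The code above connects to the server, selects the `gridfs` database, and creates a `GridFS` object that stores its data in collections prefixed with `myfiles`. It then opens the file in binary mode and uses the **put** method to store it in GridFS. The **put** method returns the ID of the newly uploaded file.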

To retrieve a file from GridFS, you can use the **get** method and pass in the ID of the file. Here's an example:

In [55]:
# retrieve a file from GridFS
retrieved_file = fs.get(file_id)

# print the contents of the file
print(retrieved_file.read())


b'Rank,institution,location code,location,SCORE,ar rank,er score,fsr score,fsr rank,cpf score,cpf rank,ifr score,ifr rank,isr score,isr rank,irn score,irn rank,ger score,ger rank,score scaled\n1,Massachusetts Institute of Technology (MIT) ,US,United States,100,5,100,100,14,100,5,100.0,54,90,109,96.1,58,100,3,100\n2,University of Cambridge,UK,United Kingdom,100,2,100,100,11,92,55,100.0,60,96,70,99.5,6,100,9,98\n3,Stanford University,US,United States,100,4,100,100,6,99,9,99.0,74,60,235,96.3,55,100,2,98\n4,University of Oxford,UK,United Kingdom,100,3,100,100,8,90,64,98.0,101,98,54,99.9,3,100,7,98\n5,Harvard University,US,United States,100,1,100,99,35,100,2,76.0,228,66,212,100.0,1,100,1,97\n6,California Institute of Technology (Caltech),US,United States,96,28,87,100,3,100,4,99.0,75,85,134,73.0,425,98,24,97\n7,Imperial College London,UK,United Kingdom,98,24,99,99,34,86,84,100.0,55,100,13,98.1,20,88,76,97\n8,UCL,UK,United Kingdom,99,14,98,97,51,77,119,99.0,87,100,14,100.0,2,90,71,95\n9,ETH Z

> **Note**: GridFS is a useful tool for working with large files in MongoDB, but it is not the best fit for every use case. Consider factors such as file size, frequency of access, and performance when deciding whether to use GridFS or a different approach.
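For large files, reading the whole content into memory with a single `read()` call may not be practical. The sketch below copies a GridFS file to disk in fixed-size pieces instead. It assumes `fs` is a `GridFS` instance like the one created earlier. The helper name `save_gridfs_file` and the `buffer_size` parameter are hypothetical.

```python
def save_gridfs_file(fs, file_id, dest_path, buffer_size=1024 * 1024):
    """Copy a GridFS file to dest_path, reading buffer_size bytes at a time."""
    grid_out = fs.get(file_id)  # file-like object returned by GridFS
    written = 0
    with open(dest_path, "wb") as dest:
        while True:
            block = grid_out.read(buffer_size)
            if not block:  # an empty result means the end of the file
                break
            dest.write(block)
            written += len(block)
    return written  # total number of bytes copied
```

A call such as `save_gridfs_file(fs, file_id, "ranking_copy.csv")` would write the uploaded file back to disk. The destination file name is only an example.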

## MongoDB Atlas and Cloud Integration:


- MongoDB Atlas is a fully managed cloud database service provided by MongoDB Inc. It allows developers to easily deploy, run and manage MongoDB databases in the cloud. MongoDB Atlas integrates with popular cloud providers such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), providing seamless deployment and scalability options.

- MongoDB offers several options to run your database in the cloud, with a range of features and benefits depending on your use case. Some of the popular cloud services offered by MongoDB include:

### MongoDB Atlas: 

A fully-managed cloud database service that offers automatic scaling, built-in security features, and backup and recovery options. You can get started with a free tier and scale up as needed.

Here's an example of how to connect to a MongoDB Atlas cluster using PyMongo:

In [None]:
from pymongo import MongoClient

# Replace the following with your Atlas connection string
conn_str = "mongodb+srv://<username>:<password>@<cluster>.mongodb.net/test?retryWrites=true&w=majority"

client = MongoClient(conn_str)

# List all the databases in the cluster
print(client.list_database_names())
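If the username or password contains characters such as `@`, `/`, or `:`, they need to be percent-encoded before they are placed in the connection string. The sketch below shows one way to do that with the standard library. The helper name `build_atlas_uri`, the credentials, and the host `cluster0.example.mongodb.net` are all made-up placeholders.

```python
from urllib.parse import quote_plus


def build_atlas_uri(username, password, cluster_host, database="test"):
    """Build a mongodb+srv connection string with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return (
        f"mongodb+srv://{user}:{pwd}@{cluster_host}/{database}"
        "?retryWrites=true&w=majority"
    )


# Placeholder values; replace them with your own Atlas credentials and host.
uri = build_atlas_uri("app_user", "p@ss/word:1", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string can be passed to `MongoClient` in the same way as `conn_str` above.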


# Error handling and Debugging in MongoDB

Error handling and debugging are important aspects of any development process, and MongoDB provides several tools to help with this process.

In MongoDB, errors can occur for various reasons, such as invalid queries, network issues, or authentication problems. PyMongo, the Python driver for MongoDB, reports these errors as exceptions defined in the `pymongo.errors` module, for example `ConnectionFailure`, `OperationFailure`, and `DuplicateKeyError`. The most common way to handle them is with try-except blocks, which let you catch an exception and react to it when an error occurs.
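One way to apply try-except blocks is a small retry helper for transient failures, sketched below. The helper `run_with_retry` and its parameters are hypothetical. It is written generically so that it runs without a server, and the comment at the end shows how it might be combined with PyMongo's exception classes.

```python
import time


def run_with_retry(operation, retry_on, attempts=3, delay_s=0.5):
    """Call operation(); retry on the given exception types, re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            print(f"Attempt {attempt} failed: {exc!r}; retrying in {delay_s}s")
            time.sleep(delay_s)


# Possible use with PyMongo (assumes a `collection` object already exists):
#   from pymongo.errors import AutoReconnect, PyMongoError
#   try:
#       doc = run_with_retry(lambda: collection.find_one({"_id": 1}), (AutoReconnect,))
#   except PyMongoError as exc:
#       print(f"Query failed: {exc}")
```

Retrying is only safe for operations that can be repeated without side effects, such as the read in the comment above.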

## MongoDB and data visualization:
    
**Data visualization** is the process of converting raw data into visual representations such as graphs, charts, and dashboards. MongoDB can be used in combination with various data visualization tools to create rich visual representations of data. Some popular tools include `Tableau`, `PowerBI`, and `D3.js`. This allows organizations to easily extract insights from their data and make data-driven decisions.


<p style="color:red">MongoDB provides many features for data visualization and analysis. One popular tool for data visualization with MongoDB is called MongoDB Charts, which allows you to create charts, graphs, and other visualizations based on data stored in a MongoDB database. You can use MongoDB Charts to create interactive dashboards, share data insights with others, and gain a better understanding of your data. </p>

Here is an example of using MongoDB Charts to create a bar chart based on data stored in a MongoDB database:

- First, open MongoDB Charts (available as part of MongoDB Atlas) and connect it to your MongoDB database.

- Next, create a new chart and choose the "Bar Chart" option.

- Select the collection from your database that you want to visualize.

- Choose the `Group By` option and select the field that you want to group the data by.

- Choose the `Count` option to count the number of documents in each group.

- Customize the chart's appearance and labels as desired.

- Save the chart and share it with others.

Here is example Python code that uses the PyMongo library to query data from a MongoDB database and the matplotlib library to generate a simple bar chart:

In [None]:
import pymongo
import matplotlib.pyplot as plt

# Connect to the MongoDB database
client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["mydatabase"]
collection = db["mycollection"]

# Query the data
data = collection.aggregate([
  { "$group": { "_id": "$field", "count": { "$sum": 1 } } },
  { "$sort": { "count": -1 } },
  { "$limit": 10 }
])




In [None]:
# Extract the data into separate lists
labels = []
values = []
for d in data:
  labels.append(d["_id"])
  values.append(d["count"])

# Create the bar chart
plt.bar(labels, values)
plt.title("My Bar Chart")
plt.xlabel("Field Values")
plt.ylabel("Count")
plt.show()

This code connects to a MongoDB database, queries the data in a collection, and groups the data by a specific field. It then extracts the grouped data into separate lists and uses matplotlib to generate a simple bar chart based on the data.

## MongoDB and scalability:

<br/>
<img src="images/scaling.avif">
<br/>

MongoDB is designed to be scalable, allowing organizations to easily handle increasing amounts of data and increasing numbers of users. MongoDB supports horizontal scalability through sharding, which allows data to be split across multiple servers. This makes it possible to handle large amounts of data and high levels of traffic, ensuring that the database remains fast and responsive.

There are several strategies for achieving scalability in MongoDB, including:

- **Sharding:** This involves distributing data across multiple servers or clusters, based on a shard key. Each shard contains a subset of the data and can be scaled independently.

- **Replication:** This involves creating multiple copies of your data across multiple servers, which improves availability and provides fault tolerance.

- **Load balancing:** This involves distributing requests across multiple servers to ensure that each server is utilized evenly.
<br/>
<br/>
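To give an intuition for how a shard key spreads documents across servers, here is a simplified sketch. It is not MongoDB's actual algorithm: MongoDB divides the shard key space into ranges and balances them between shards, whereas this example simply hashes the key and takes the remainder. The function `shard_for` and the customer names are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(shard_key_value, shard_names):
    """Pick a shard by hashing the shard key value (simplified illustration)."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return shard_names[int(digest, 16) % len(shard_names)]


shards = ["shard1", "shard2"]
customers = [f"customer-{i}" for i in range(1000)]

# Count how many of the hypothetical customers land on each shard.
placement = Counter(shard_for(name, shards) for name in customers)
print(placement)
```

Because the hash output is spread evenly, each shard ends up with roughly half of the documents, which is the effect a well-chosen shard key aims for.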

### Example
<br/>

Let's say we have a MongoDB database containing **customer** data for an online retailer. As the retailer grows, the amount of customer data increases, which could result in slower queries and decreased performance. In order to maintain high performance, we could add more servers to our MongoDB cluster to handle the increased load.
<br/>



Here is an example of how to set up a sharded MongoDB cluster for scalability:

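- Start by setting up a config server, which keeps track of the cluster's metadata and configuration. Since MongoDB 3.4, config servers must run as a replica set (named `configReplSet` here, an illustrative name), so initiate the replica set after starting the server.

<p style="text-align:center;"> <b>mongod --configsvr --replSet configReplSet --dbpath /data/configdb --port 27019 </b></p>
<p style="text-align:center;"> <b>mongo --port 27019    </b></p>
<p style="text-align:center;"> <b>rs.initiate()</b></p>
<br/>

- Start a `mongos` router on port 27017 and point it at the config server. The later steps connect to this router.

<p style="text-align:center;"> <b>mongos --configdb configReplSet/localhost:27019 --port 27017</b></p>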
<br/>

- Next, start each shard server on its own port. Port 27019 is already used by the config server, so the second shard uses port 27020.

<p style="text-align:center;"> <b> mongod --shardsvr --replSet shard1 --dbpath /data/shard1 --port 27018</b></p>

<p style="text-align:center;"> <b>mongod --shardsvr --replSet shard2 --dbpath /data/shard2 --port 27020</b></p>

<br/>


- Initialize the replica set for each shard. Repeat these commands with `--port 27020` for `shard2`.

<p style="text-align:center;"> <b>mongo --port 27018    </b></p>
<p style="text-align:center;"> <b>rs.initiate()</b></p>

<br/>


- Connect to the `mongos` router (assumed here to listen on port 27017) and add each shard to the cluster.

<p style="text-align:center;"> <b>mongo --port 27017   </b></p>
<p style="text-align:center;"> <b>sh.addShard("shard1/localhost:27018")</b></p>
<p style="text-align:center;"> <b>sh.addShard("shard2/localhost:27020")</b></p>

<br/>

- Enable sharding for the database, then shard the collection by specifying its shard key.

<p style="text-align:center;"> <b>mongo --port 27017   </b></p>
<p style="text-align:center;"> <b>use mydb</b></p>
<p style="text-align:center;"> <b>sh.enableSharding("mydb")</b></p>

<p style="text-align:center;"> <b>sh.shardCollection("mydb.mycollection", {"name": 1})</b></p>


> **Note**: Once this is set up, the **MongoDB** cluster will automatically distribute data across the different shards and handle queries in parallel, providing horizontal scalability and improved performance.
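The last step can also be issued from Python by sending the `enableSharding` and `shardCollection` admin commands through PyMongo. The sketch below assumes `client` is a `MongoClient` connected to the cluster's `mongos` router. The helper name `shard_collection` is hypothetical.

```python
def shard_collection(client, db_name, coll_name, shard_key):
    """Enable sharding on a database and shard one collection through a mongos router."""
    # Both commands must be run against the admin database.
    client.admin.command("enableSharding", db_name)
    return client.admin.command(
        "shardCollection", f"{db_name}.{coll_name}", key=shard_key
    )


# Possible usage, assuming a mongos router is listening on localhost:27017:
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017/")
#   shard_collection(client, "mydb", "mycollection", {"name": 1})
```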

## MongoDB and backup and recovery:

<br/>
<img src="images/backup.jpg" height=400px width=400px>
<br/>


**Backups** are essential for ensuring that data is protected against loss in the event of a disaster or system failure. MongoDB provides several options for backing up data, including on-demand backups, scheduled backups, and continuous backups. Additionally, MongoDB provides various **recovery** options, such as point-in-time recovery and rollback, to ensure that data can be restored in the event of a disaster. This provides peace of mind and helps organizations ensure that their data is safe and secure.


**MongoDB** provides several mechanisms to back up and recover data in case of hardware failures or accidental data loss. Some of the ways to perform backup and recovery in MongoDB are:

- **mongodump and mongorestore:** These are command-line utilities provided by MongoDB that are used to back up and restore MongoDB databases. `mongodump` creates a binary export of the data in a MongoDB database, while `mongorestore` imports that binary export back into a MongoDB database. Here is an example of using `mongodump` to back up a database named `mydb`:

<p style="text-align:center;"><b>mongodump --db mydb</b></p>

<br/>

And here is an example of using `mongorestore` to restore that backup:
<br/>
<p style="text-align:center;"><b>mongorestore --db mydb dump/mydb</b></p>


- **Snapshots:** You can also use hardware snapshots to backup your MongoDB data. This method involves taking a point-in-time snapshot of the data and copying it to a different location. This is often done using tools provided by your cloud provider or storage vendor.
<br/>

- **Replication:** MongoDB supports replication, which involves maintaining multiple copies of data across multiple servers. If the primary fails, one of the secondaries can be promoted to primary and take over its responsibilities. Replication protects against hardware failure, but it is not a substitute for backups, because an accidental delete or update is replicated to every member.
<br/>

- **Sharding:** MongoDB also supports sharding, which involves distributing data across multiple servers. Sharding is primarily a scaling mechanism rather than a backup method: if one shard becomes unavailable, the remaining shards can continue to serve the data they hold, but the data on the failed shard must still be recovered from a replica or a backup.
<br/>


Here is an example of using `mongodump` and `mongorestore` to backup and restore a database:

**Backup the `mydb` database to a directory called `mydb-backup`**

<br/>
<p style="text-align:center;"><b>mongodump --db mydb --out mydb-backup</b></p>


<br/>

**Restore the `mydb` database from the backup directory**

<br/>
<p style="text-align:center;"><b>mongorestore --db mydb mydb-backup/mydb</b></p>
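Backups like this are often scripted so that each run writes to its own directory. The following sketch builds the `mongodump` command shown above with a timestamped output directory and runs it with `subprocess`. The function names, the `backups` directory, and the naming scheme are assumptions. `run_backup` requires the `mongodump` tool to be installed and on the `PATH`.

```python
import subprocess
from datetime import datetime


def build_dump_command(db_name, backup_root, now=None):
    """Build a mongodump command that writes to a timestamped directory."""
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    out_dir = f"{backup_root}/{db_name}-{stamp}"
    return ["mongodump", "--db", db_name, "--out", out_dir]


def run_backup(db_name, backup_root):
    """Run mongodump and return the output directory; raises if the command fails."""
    command = build_dump_command(db_name, backup_root)
    subprocess.run(command, check=True)
    return command[-1]


# Show the command for a fixed timestamp without actually running it.
print(build_dump_command("mydb", "backups", now=datetime(2024, 1, 31, 2, 0, 0)))
```

Passing the command as a list rather than a single string avoids shell quoting problems if a database name or path contains spaces.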




## MongoDB interview questions:

- What is MongoDB and how is it different from a relational database?
- What is a document-oriented database and how does MongoDB store data?
- How do you install and start MongoDB?
- What are the different data types supported by MongoDB?
- What is a collection in MongoDB?
- What is sharding and how does it work in MongoDB?
- What are indexes in MongoDB and why are they important?
- What is the aggregation framework in MongoDB?
- How do you create and drop a database in MongoDB?
- What is a replica set and how does it work in MongoDB?
- How do you perform backup and recovery in MongoDB?
- What is the role of the BSON format in MongoDB?
- How do you optimize query performance in MongoDB?
- How does MongoDB handle transactions?
- How do you use MongoDB with Python?

## Industrial applications of MongoDB

**MongoDB** has become a popular choice for a wide range of applications across various industries. Here are some examples of industrial applications of MongoDB:

- **E-commerce platforms:** MongoDB can be used for building e-commerce platforms for online retail and wholesale stores.
<br/>

- **Social media platforms:** MongoDB is used for storing and processing social media data such as user profiles, posts, comments, and likes.
<br/>

- **Content management systems:** MongoDB is used for building content management systems (CMS) that can handle large amounts of unstructured data.
<br/>

- **Financial services:** MongoDB is used for building financial applications such as trading platforms, portfolio management systems, and fraud detection systems.
<br/>

- **Healthcare:** MongoDB is used for building healthcare applications such as electronic health record (EHR) systems, patient monitoring systems, and clinical research data management systems.
<br/>

- **Internet of Things (IoT):** MongoDB is used for building IoT applications such as smart home systems, connected car platforms, and industrial monitoring systems.
<br/>

- **Gaming:** MongoDB is used for building gaming applications such as online multiplayer games and mobile games.
<br/>

- **Advertising:** MongoDB is used for building advertising platforms that can handle large amounts of data and provide real-time analytics.
<br/>

- **Government:** MongoDB is used for building government applications such as citizen services, public safety systems, and open data portals.
<br/>

- **Education:** MongoDB is used for building education applications such as learning management systems, student information systems, and research data management systems.
<br/>

- **Energy and utilities:** MongoDB is used for building energy and utilities applications such as smart grid systems, predictive maintenance systems, and asset management systems.
<br/>

- **Logistics and transportation:** MongoDB is used for building logistics and transportation applications such as real-time tracking systems, fleet management systems, and supply chain management systems.
<br/>

- **Media and entertainment:** MongoDB is used for building media and entertainment applications such as content distribution platforms, digital asset management systems, and video streaming services.
<br/>

- **Travel and hospitality:** MongoDB is used for building travel and hospitality applications such as booking engines, travel recommendation systems, and hotel management systems.
<br/>

- **Retail:** MongoDB is used for building retail applications such as inventory management systems, customer relationship management systems, and point of sale systems.
<br/>
