<h1 align="center">NoSQL-with-MongoDB</h1>

### Learning Agenda of Notebook:
- Introduction to MongoDB and NoSQL databases
- Setting up MongoDB and Python
- CRUD Operations
- Data Modeling
- Indexing and Performance Optimization
- Advanced Querying
- Aggregation Framework
- Working with GridFS
- MongoDB and PyMongo library
- MongoDB and Cloud Services
- Error handling and Debugging
- Understanding MongoDB Architecture
- MongoDB Atlas and Cloud Integration
- MongoDB and data visualization
- MongoDB and scalability
- MongoDB and backup and recovery

# Introduction to MongoDB and NoSQL databases


## NoSQL Databases

- **NoSQL databases** (Not Only SQL) differ from traditional relational databases in the way they store and manage data. Where relational databases use tables and Structured Query Language (SQL) to store and retrieve data, NoSQL databases use a variety of data models, such as document-oriented, key-value, graph, and column-family.

- Here are some **examples** of NoSQL databases:

    - **Document-oriented databases:** MongoDB is an example of a document-oriented database. In this type of database, data is stored in the form of documents, which are similar to JSON objects. Each document can have different fields and structures, making it easy to store complex, unstructured data.

    - **Key-value databases:** Redis is an example of a key-value database. In this type of database, data is stored as key-value pairs, where the key is a unique identifier and the value is the data being stored. Key-value databases are designed to be fast and scalable and are often used for caching and session management.

    - **Graph databases:** Neo4j is an example of a graph database. In this type of database, data is stored in the form of nodes and relationships, making it well-suited for storing data with complex relationships, such as social networks or recommendation systems.

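    - **Column-family databases:** Apache Cassandra is an example of a column-family database. In this type of database, data is organized into rows that are located by a partition key, and each row can hold its own set of columns. This layout spreads naturally across many nodes, making it well-suited for large-scale, distributed systems.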


> **Note:** NoSQL databases are often preferred over relational databases for certain use cases, such as when dealing with large amounts of unstructured data, when scalability is a concern, or when the data has complex relationships that are difficult to model in a traditional relational database.


<img src="images/NoSQLDatabases.jpg" align="center" height=400px width=400px>


## What is MongoDB?

<img src="images/intro_mongodb.jpeg" align="center" height=400px width=400px>

- **MongoDB** is a `document-oriented NoSQL database` that is designed to be scalable and flexible. Unlike traditional relational databases, which store data in tables and use structured query language (SQL) to retrieve data, MongoDB stores data in documents, which are similar to JSON objects.

- Here are some **examples** to illustrate how MongoDB works:

    - **Storing data in documents:** In MongoDB, data is stored in documents, which are similar to JSON objects. For example, consider a database of books. Each book can be stored as a document with fields for the title, author, ISBN, and so on.
    
    - **Collections:** Documents are stored in collections, which are similar to tables in relational databases. For example, the books database might have a collection named "books" that contains all the books in the database.

    - **Querying data:** MongoDB provides a rich query language that allows you to retrieve data based on various criteria. For example, querying the "books" collection with the filter `{"author": "F. Scott Fitzgerald"}` retrieves every book by that author, returning documents such as the following.

```
{
    "_id": ObjectId("5f22b3e3e3f0b375bc0edc71"),
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "ISBN": "9780743273565"
}

```
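In PyMongo, a query for that author is a filter document passed to `find()`, and the document shown above is the kind of result it returns. The sketch below assumes a collection of book documents shaped like that one; the helper name `find_books_by_author` is made up for illustration.

```python
def find_books_by_author(books, author):
    """Return every document in `books` whose "author" field equals `author`.

    `books` is expected to be a PyMongo Collection (hypothetical helper).
    """
    # The filter document names the field and the value it must equal.
    query = {"author": author}
    # find() returns a lazy cursor; list() drains it into a Python list.
    return list(books.find(query))
```

With a connected `MongoClient` named `client`, this could be called as `find_books_by_author(client.booksdb.books, "F. Scott Fitzgerald")`.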




## Setting up MongoDB and Python

<img src="images/mongodbvspython.png" align="center" height=400px width=400px>



To set up MongoDB and Python, you need to follow these steps:

- **Install MongoDB:** You can download MongoDB from the official [website](https://www.mongodb.com/try/download/community) and install it on your system.


- **Install PyMongo:** PyMongo is the Python driver for MongoDB. You can install it using pip: `pip3 install pymongo`. On systems where `pip3` is not on the path, `python -m pip install pymongo` does the same thing.


- **Connect to MongoDB:** To connect to MongoDB from Python, you need to create a `MongoClient object`. You can connect to a local MongoDB instance like this:

```
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")

```
>**Note:** A **MongoClient object** represents the client-side connection to a MongoDB deployment. In this example it connects to an instance running on the same machine as the Python code (`localhost`) on MongoDB's default port (27017). Through it you can access and manipulate the databases, collections, and documents in that instance.

- **Work with databases and collections:** Once you have connected to MongoDB, you can work with databases and collections. To refer to a database, use the `client.db_name` syntax, where `db_name` is the name of the database. To refer to a collection, use the `db.collection_name` syntax, where `collection_name` is the name of the collection. MongoDB creates a database or collection lazily, the first time data is written to it, so there is no separate create step. For example, to get a database named `booksdb` and a collection named `books`, you can do the following:


```

db = client.booksdb
books = db.books

```

- **Insert data into the collection:** To insert data into the collection, you can use the `insert_one` or `insert_many` method. For example, to insert a single book into the collection, you can do the following:

```
book = {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald"}
books.insert_one(book)

```

This will insert the book document into the "books" collection in the "booksdb" database.





## CRUD Operations 
<img src="images/crud.png" align="center" height=400px width=400px>

- **CRUD** stands for `Create`, `Read`, `Update`, and `Delete`, the four basic operations that you perform on data in a database.

Here's a brief overview of the CRUD operations in MongoDB using Python's PyMongo library:


- **Create (Insert):** To insert a document (record) into a collection `(table)`, you use the `insert_one()` or `insert_many()` method on the collection object. For example:


In [6]:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["test_database"]
collection = db["test_collection"]

# Insert a single document
collection.insert_one({"name": "Ehtisham", "age": 22})

# Insert multiple documents
docs = [{"name": "Ali", "age": 20}, {"name": "Ayesha", "age": 17}]
collection.insert_many(docs)


<pymongo.results.InsertManyResult at 0x7f469a132d80>

- **Read (Find):** To query for documents in a collection, you use the `find()` method on the collection object. For example:

In [8]:
# Find all documents
docs = collection.find()
for doc in docs:
    print(doc)

# Find documents that match a certain condition
docs = collection.find({"age": {"$gt": 20}})
for doc in docs:
    print(doc)


{'_id': ObjectId('63e48107321747d78e212af5'), 'name': 'Ehtisham', 'age': 22}
{'_id': ObjectId('63e48107321747d78e212af6'), 'name': 'Ali', 'age': 20}
{'_id': ObjectId('63e48107321747d78e212af7'), 'name': 'Ayesha', 'age': 17}
{'_id': ObjectId('63e48107321747d78e212af5'), 'name': 'Ehtisham', 'age': 22}


- **Update:** To update a document in a collection, you use the `update_one()` or `update_many()` method on the collection object. For example:

In [11]:
# Update a single document
collection.update_one({"name": "Ehtisham"}, {"$set": {"age": 21}})

# Update multiple documents
collection.update_many({"age": {"$lt": 20}}, {"$inc": {"age": 1}})


<pymongo.results.UpdateResult at 0x7f469a731200>

In [12]:
# Find all documents
docs = collection.find()
for doc in docs:
    print(doc)


{'_id': ObjectId('63e48107321747d78e212af5'), 'name': 'Ehtisham', 'age': 21}
{'_id': ObjectId('63e48107321747d78e212af6'), 'name': 'Ali', 'age': 21}
{'_id': ObjectId('63e48107321747d78e212af7'), 'name': 'Ayesha', 'age': 19}


- **Delete:** To delete a document from a collection, you use the `delete_one()` or `delete_many()` method on the collection object. For example:

In [13]:
# Delete a single document
collection.delete_one({"name": "Ehtisham"})

# Delete multiple documents
collection.delete_many({"age": {"$gt": 20}})


<pymongo.results.DeleteResult at 0x7f469ae19500>

In [14]:
# Find all documents
docs = collection.find()
for doc in docs:
    print(doc)


{'_id': ObjectId('63e48107321747d78e212af7'), 'name': 'Ayesha', 'age': 19}
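The `<pymongo.results...>` lines printed by the cells above are result objects, and their attributes report what each call actually changed. Here is a rough sketch of reading them, assuming `collection` is a PyMongo collection; `crud_counts` is a made-up helper and the sample document is purely illustrative.

```python
def crud_counts(collection):
    """Run one insert, one update and one delete; report what each changed."""
    inserted = collection.insert_one({"name": "Sara", "age": 30})
    updated = collection.update_one({"name": "Sara"}, {"$set": {"age": 31}})
    deleted = collection.delete_one({"name": "Sara"})
    return {
        "inserted_id": inserted.inserted_id,   # _id assigned to the new document
        "matched": updated.matched_count,      # documents that matched the filter
        "modified": updated.modified_count,    # documents whose content changed
        "deleted": deleted.deleted_count,      # documents removed
    }
```

Checking `matched_count` and `deleted_count` is a cheap way to notice a filter that silently matched nothing.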


## Data Modeling

<img src="images/data_model.jpg" align="right" height=500px width=500px>

- **Data modeling** in MongoDB is the process of designing and organizing data in a way that is optimized for the MongoDB document-oriented data model. It involves creating a schema that defines the structure, relationships, and constraints of the data, and using that schema to create collections and documents. The goal of data modeling in MongoDB is to ensure that the data is stored in a way that is efficient, flexible, and consistent with the requirements of the application.

- **Data modeling** in MongoDB is different from data modeling in relational databases because MongoDB is a document-oriented database: it stores data as documents in BSON (binary JSON) format, grouped into collections rather than tables.

- MongoDB also provides a more flexible data model, which accommodates dynamic and unstructured data, and it supports horizontal scaling, which helps performance and scalability as data grows.


#### Another Definition

- **Data modeling** in MongoDB refers to the process of defining the structure of your data, including the relationships between different collections, documents, and fields. This structure is important for ensuring that your data is stored and retrieved efficiently, and for making it easy to access and manipulate the data in your applications.



### Example

Here's a simple example to give you an idea of how data modeling works in MongoDB:

**Let's say you have a collection of blog posts, and each post has an author, a title, a body, and a list of comments. To model this data in MongoDB, you might define two collections: one for posts and one for comments.**

Here's what the posts collection might look like:

In [None]:
{
    "_id": ObjectId("5f3ab0fa7ef2d01234567890"),
    "author": "Ehtisham",
    "title": "My first post",
    "body": "I love my country.",
    "comments": [
        ObjectId("5f3ab0fa7ef2d01234567891"),
        ObjectId("5f3ab0fa7ef2d01234567892"),
        ...
    ]
}


And here's what the **comments** collection might look like:



In [None]:
{
    "_id": ObjectId("5f3ab0fa7ef2d01234567891"),
    "post_id": ObjectId("5f3ab0fa7ef2d01234567890"),
    "author": "Ehtisham",
    "text": "This is a great post!"
}


- In this example, each post document has an `_id` field that is unique to that document and serves as its primary key. The `comments` field is an array of references to comment documents, which are stored in the separate comments collection. Each comment document also has a `post_id` field, which links the comment to the post it belongs to.

- This is just one simple example of data modeling in MongoDB. In a real-world application, your data structures may be much more complex, with multiple levels of nested documents, arrays, and references to other collections. 

> **Note:** The key is to think carefully about the structure of your data and to define the relationships between collections, documents, and fields in a way that makes sense for your specific use case.
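One way to read a post together with its comments under this model is an aggregation with a `$lookup` stage, which joins on the `post_id` reference. The following is a minimal sketch: the function name and the output field `comment_docs` are assumptions for illustration, while the collection and field names follow the example above.

```python
def post_with_comments_pipeline(post_id):
    """Build an aggregation pipeline that fetches one post and its comments."""
    return [
        # Keep only the post whose _id matches.
        {"$match": {"_id": post_id}},
        # Join: attach every comment whose post_id equals this post's _id.
        {
            "$lookup": {
                "from": "comments",
                "localField": "_id",
                "foreignField": "post_id",
                "as": "comment_docs",
            }
        },
    ]
```

With a database handle `db`, the pipeline would be run as `list(db.posts.aggregate(post_with_comments_pipeline(some_post_id)))`. Embedding the comments directly inside each post is the usual alternative when comments are always read with their post.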

## Indexing and Performance Optimization
<img src="images/indexing.png" height=500px width=500px>

**Indexing** is a crucial aspect of performance optimization in MongoDB. It allows the database to quickly find the relevant documents based on the specified criteria, improving query performance. Some of the key points about indexing in MongoDB are:

- Indexing in MongoDB is done at the field level.

- MongoDB supports multiple types of indexes, including single-field, compound, geospatial, text, and hashed indexes.
- By default, MongoDB creates a unique index on the `_id` field.

- To create an index, you can use the `createIndex()` method in the MongoDB shell or the equivalent method in your driver (`create_index()` in PyMongo).

- When you run a query, MongoDB uses the most appropriate index to match the query conditions. You can use the `explain()` method to see which index MongoDB has selected.

- Indexes can improve query performance, but they also have some overhead in terms of disk space and write operations. Therefore, it's important to choose your indexes carefully.

- To optimize performance, you can use indexing techniques such as compound indexes, partial indexes, and multikey indexes.


> **Note:** Careful consideration of which indexes to create and how queries use them can significantly improve the performance of your MongoDB-based applications.
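The points above can be sketched with PyMongo's `create_index()` and the cursor's `explain()` method. The helper names and the choice of the `author` and `title` fields are assumptions borrowed from the books example, not a recommendation for any particular workload.

```python
def ensure_book_indexes(books):
    """Create a single-field and a compound index; return their names."""
    # 1 means ascending order (the same value as pymongo.ASCENDING).
    author_index = books.create_index([("author", 1)])
    # Compound index: serves queries on author alone, or on author plus title.
    author_title_index = books.create_index([("author", 1), ("title", 1)])
    return [author_index, author_title_index]


def explain_author_query(books, author):
    """Return the query plan MongoDB reports for a lookup by author."""
    return books.find({"author": author}).explain()
```

Calling `create_index()` again with the same specification is harmless, so a helper like this can run at application start-up.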

## MongoDB and PyMongo library
<img src="images/pymongo.jpg" height=400px width=400px>

## Understanding of MongoDB architecture

<img src="images/MongoDB-Architecture.png" height=500px width=500px>

MongoDB architecture refers to the components and design of a MongoDB database system. It includes the following key components:

- **MongoDB Server:** This is the main component of MongoDB architecture, responsible for handling client requests, managing data storage, and executing queries.

- **Replication:** Replication is a key aspect of MongoDB architecture, providing high availability and data durability. MongoDB uses a replication mechanism known as replica sets, where data is replicated across multiple servers for redundancy.

- **Sharding:** Sharding is the process of dividing data across multiple servers to distribute load and improve performance. MongoDB implements this with sharded clusters: a collection's data is partitioned by a shard key into chunks, and the chunks are distributed across servers called shards.

- **Database:** A MongoDB database is a collection of related data stored together in a single unit. Databases are made up of collections, which are equivalent to tables in a relational database.

- **Collection:** A collection is a group of related documents stored together in a single unit. Documents in a collection are equivalent to rows in a table in a relational database.

- **Document:** A document is a single unit of data stored in a collection. It is a JSON-like structure, containing one or more key-value pairs.



> **Note:** Understanding MongoDB architecture is important for ensuring that your database is designed and optimized for your specific use case. It helps you make informed decisions about things like data modeling, replication, and sharding, and ensures that your database is scalable, reliable, and efficient.

## MongoDB Atlas and Cloud Integration:

**MongoDB Atlas and Cloud Integration:** MongoDB Atlas is a fully managed cloud database service provided by MongoDB Inc. It allows developers to easily deploy, run and manage MongoDB databases in the cloud. MongoDB Atlas integrates with popular cloud providers such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), providing seamless deployment and scalability options.

## MongoDB and data visualization:
    
**Data visualization** is the process of converting raw data into visual representations such as graphs, charts, and dashboards. MongoDB can be used in combination with various data visualization tools to create rich visual representations of data. Some popular tools include `Tableau`, `PowerBI`, and `D3.js`. This allows organizations to easily extract insights from their data and make data-driven decisions.


## MongoDB and scalability:

<br/>
<img src="images/scaling.avif">
<br/>

MongoDB is designed to be scalable, allowing organizations to easily handle increasing amounts of data and increasing numbers of users. MongoDB supports horizontal scalability through sharding, which allows data to be split across multiple servers. This makes it possible to handle large amounts of data and high levels of traffic, ensuring that the database remains fast and responsive.

There are several strategies for achieving scalability in MongoDB, including:

- **Sharding:** This involves distributing data across multiple servers or clusters, based on a shard key. Each shard contains a subset of the data and can be scaled independently.

- **Replication:** This involves creating multiple copies of your data across multiple servers, which improves availability and provides fault tolerance.

- **Load balancing:** This involves distributing requests across multiple servers to ensure that each server is utilized evenly.
<br/>
<br/>

### Example
<br/>

Let's say we have a MongoDB database containing **customer** data for an online retailer. As the retailer grows, the amount of customer data increases, which could result in slower queries and decreased performance. In order to maintain high performance, we could add more servers to our MongoDB cluster to handle the increased load.
<br/>



Here is an example of how to set up a sharded MongoDB cluster for scalability:

- Start by setting up a config server, which keeps track of the cluster's metadata and configuration. Config servers run as a replica set; the name `configReplSet` below is only an illustrative choice.

<p style="text-align:center;"> <b>mongod --configsvr --replSet configReplSet --dbpath /data/configdb --port 27019 </b></p>
<br/>

- Next, start each shard server. Every process on the same machine needs its own port, so the second shard uses 27020 rather than reusing the config server's 27019.

<p style="text-align:center;"> <b> mongod --shardsvr --replSet shard1 --dbpath /data/shard1 --port 27018</b></p>

<p style="text-align:center;"> <b>mongod --shardsvr --replSet shard2 --dbpath /data/shard2 --port 27020</b></p>

<br/>


- Initialize the replica set for the config server and for each shard by repeating the following against ports 27019, 27018, and 27020. On MongoDB 6.0 and later the shell command is `mongosh` instead of `mongo`.

<p style="text-align:center;"> <b>mongo --port 27018    </b></p>
<p style="text-align:center;"> <b>rs.initiate()</b></p>

<br/>


- Start a `mongos` router on port 27017 and point it at the config server. Clients, and the remaining steps, connect to this router.

<p style="text-align:center;"> <b>mongos --configdb configReplSet/localhost:27019 --port 27017</b></p>

<br/>


- Add each shard to the cluster.

<p style="text-align:center;"> <b>mongo --port 27017   </b></p>
<p style="text-align:center;"> <b>sh.addShard("shard1/localhost:27018")</b></p>
<p style="text-align:center;"> <b>sh.addShard("shard2/localhost:27020")</b></p>

<br/>

- Enable sharding for the database, then shard the collection by defining a shard key.

<p style="text-align:center;"> <b>mongo --port 27017   </b></p>
<p style="text-align:center;"> <b>use mydb</b></p>
<p style="text-align:center;"> <b>sh.enableSharding("mydb")</b></p>

<p style="text-align:center;"> <b>sh.shardCollection("mydb.mycollection", {"name": 1})</b></p>


> **Note**: Once this is set up, the **MongoDB** cluster will automatically distribute data across the different shards and handle queries in parallel, providing horizontal scalability and improved performance.
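The last step can also be driven from Python. In the shell, the helpers `sh.enableSharding()` and `sh.shardCollection()` wrap the `enableSharding` and `shardCollection` admin commands, which PyMongo can send with `client.admin.command()`. This is a sketch under the assumption that `client` is connected to the cluster's `mongos` router; the function name is made up for illustration.

```python
def shard_collection(client, database, collection, shard_key):
    """Enable sharding on `database` and shard `database.collection`.

    `client` must be connected to a mongos router, not directly to a shard.
    """
    namespace = f"{database}.{collection}"
    # Same effect as sh.enableSharding("<database>") in the shell.
    client.admin.command("enableSharding", database)
    # Same effect as sh.shardCollection("<namespace>", <shard_key>).
    client.admin.command("shardCollection", namespace, key=shard_key)
    return namespace
```

For the example above this would be `shard_collection(client, "mydb", "mycollection", {"name": 1})`.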

## MongoDB and backup and recovery:

<br/>
<img src="images/backup.jpg" height=400px width=400px>
<br/>


**Backups** are essential for ensuring that data is protected against loss in the event of a disaster or system failure. MongoDB provides several options for backing up data, including on-demand backups, scheduled backups, and continuous backups. Additionally, MongoDB provides various **recovery** options, such as point-in-time recovery and rollback, to ensure that data can be restored in the event of a disaster. This provides peace of mind and helps organizations ensure that their data is safe and secure.


**MongoDB** provides several mechanisms to backup and recover data in case of hardware failures or accidental data loss. Some of the ways to perform backup and recovery in MongoDB are:

- **mongodump and mongorestore:** These are command-line utilities provided by MongoDB that are used to backup and restore MongoDB databases. `mongodump` creates a binary export of the data in a MongoDB database, while `mongorestore` imports that binary export back into a MongoDB database. Here is an example of using `mongodump` to backup a database named `mydb`: 

<p style="text-align:center;"><b>mongodump --db mydb</b></p>

<br/>

And here is an example of using `mongorestore` to restore that backup:
<br/>
<p style="text-align:center;"><b>mongorestore --db mydb dump/mydb</b></p>


- **Snapshots:** You can also use hardware snapshots to backup your MongoDB data. This method involves taking a point-in-time snapshot of the data and copying it to a different location. This is often done using tools provided by your cloud provider or storage vendor.
<br/>

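- **Replication:** MongoDB supports replication, which involves maintaining multiple copies of data across multiple servers. In case of hardware failure, one of the replicas can be promoted to primary and take over the responsibilities of the failed primary. Replication does not protect against accidental deletes or bad writes, because those are copied to every replica, so it complements backups rather than replacing them.
<br/>

- **Sharding:** MongoDB also supports sharding, which involves distributing data across multiple servers. In case of hardware failure, the remaining shards can continue to provide access to the data they hold. Sharding alone keeps only one copy of each document, so each shard is normally run as a replica set and still needs to be backed up.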
<br/>


Here is an example of using `mongodump` and `mongorestore` to backup and restore a database:

**Backup the `mydb` database to a directory called `mydb-backup`**

<br/>
<p style="text-align:center;"><b>mongodump --db mydb --out mydb-backup</b></p>


<br/>

**Restore the "mydb" database from the backup directory**

<br/>

<br/>
<p style="text-align:center;"><b>
mongorestore --db mydb mydb-backup/mydb</b></p>
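If the backup needs to run from a Python script, the same two commands can be assembled as argument lists and handed to `subprocess`. This is only a sketch: the function names are invented here, it uses just the `--db` and `--out` flags shown above, and it assumes `mongodump` is on the system path.

```python
import subprocess


def mongodump_command(db_name, out_dir):
    """Build the argument list for dumping one database into `out_dir`."""
    return ["mongodump", "--db", db_name, "--out", out_dir]


def mongorestore_command(db_name, out_dir):
    """Build the argument list for restoring that dump."""
    # mongodump writes each database into <out_dir>/<db_name>.
    return ["mongorestore", "--db", db_name, f"{out_dir}/{db_name}"]


def run_backup(db_name, out_dir):
    """Run mongodump; raises CalledProcessError if it exits with an error."""
    subprocess.run(mongodump_command(db_name, out_dir), check=True)
```

Passing the arguments as a list rather than one shell string avoids quoting problems with database or directory names.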




## MongoDB interview questions:

- What is MongoDB and how is it different from a relational database?
- What is a document-oriented database and how does MongoDB store data?
- How do you install and start MongoDB?
- What are the different data types supported by MongoDB?
- What is a collection in MongoDB?
- What is sharding and how does it work in MongoDB?
- What are indexes in MongoDB and why are they important?
- What is the aggregation framework in MongoDB?
- How do you create and drop a database in MongoDB?
- What is a replica set and how does it work in MongoDB?
- How do you perform backup and recovery in MongoDB?
- What is the role of the BSON format in MongoDB?
- How do you optimize query performance in MongoDB?
- How does MongoDB handle transactions?
- How do you use MongoDB with Python?

## Industrial applications of MongoDB

**MongoDB** has become a popular choice for a wide range of applications across various industries. Here are some examples of industrial applications of MongoDB:

- **E-commerce platforms:** MongoDB can be used for building e-commerce platforms for online retail and wholesale stores.
<br/>
- **Social media platforms:** MongoDB is used for storing and processing social media data such as user profiles, posts, comments, and likes.
<br/>

- **Content management systems:** MongoDB is used for building content management systems (CMS) that can handle large amounts of unstructured data.
<br/>

- **Financial services:** MongoDB is used for building financial applications such as trading platforms, portfolio management systems, and fraud detection systems.
<br/>

- **Healthcare:** MongoDB is used for building healthcare applications such as electronic health record (EHR) systems, patient monitoring systems, and clinical research data management systems.
<br/>

- **Internet of Things (IoT):** MongoDB is used for building IoT applications such as smart home systems, connected car platforms, and industrial monitoring systems.
<br/>

- **Gaming:** MongoDB is used for building gaming applications such as online multiplayer games and mobile games.
<br/>

- **Advertising:** MongoDB is used for building advertising platforms that can handle large amounts of data and provide real-time analytics.
<br/>

- **Government:** MongoDB is used for building government applications such as citizen services, public safety systems, and open data portals.
<br/>

- **Education:** MongoDB is used for building education applications such as learning management systems, student information systems, and research data management systems.
<br/>

- **Energy and utilities:** MongoDB is used for building energy and utilities applications such as smart grid systems, predictive maintenance systems, and asset management systems.
<br/>

- **Logistics and transportation:** MongoDB is used for building logistics and transportation applications such as real-time tracking systems, fleet management systems, and supply chain management systems.
<br/>

- **Media and entertainment:** MongoDB is used for building media and entertainment applications such as content distribution platforms, digital asset management systems, and video streaming services.
<br/>

- **Travel and hospitality:** MongoDB is used for building travel and hospitality applications such as booking engines, travel recommendation systems, and hotel management systems.
<br/>

- **Retail:** MongoDB is used for building retail applications such as inventory management systems, customer relationship management systems, and point of sale systems.
<br/>


In [16]:
string="Ehtisham Sadiq"
string = string.replace(" ","")
sorted(string)

['E', 'S', 'a', 'a', 'd', 'h', 'h', 'i', 'i', 'm', 'q', 's', 't']

In [20]:
# Check whether two strings are anagrams.

# Two strings are anagrams when they contain exactly the same characters,
# each the same number of times, in any order (spaces are ignored here).

def check_anagram(str1, str2):
    str1 = str1.replace(" ",'')
    str2 = str2.replace(" ",'')
    return sorted(str1) == sorted(str2)

str1 = "rats"
str2 = "starb"

if check_anagram(str1, str2):
    print("Both strings are anagrams")
else:
    print("Strings are not anagrams")

Strings are not anagrams


In [42]:
# Print a linked list forwards and in reverse order using recursion

# create a node class to create a new node

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

    
class LinkedList:
    def __init__(self):
        self.head=None
        
    def push(self,data):
        new_node = Node(data)
        new_node.next = self.head
        self.head = new_node
        
    def print_list(self, node):
        if node is None:
            return 
        print(node.data, end=" ")
        self.print_list(node.next)
    
    def reverse_list(self, node):
        # Prints the values in reverse order; the links themselves are not changed
        if node is None:
            return 
        self.reverse_list(node.next)
        print(node.data, end=" ")

In [43]:
list1 = LinkedList()
list1.push(3)
list1.push(9)
list1.push(13)
list1.push(34)
list1.push(4)
list1.print_list(list1.head)
print()  # newline between the two traversals
list1.reverse_list(list1.head)


4 34 13 9 3 
3 9 13 34 4 
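The `reverse_list` method above prints the values in reverse but leaves the links untouched. If the goal is to actually reverse the list, one common recursive approach is to reverse the tail first and then hook the current node onto its end. This sketch redefines a small `Node` class so that it runs on its own; `reverse_recursive` and `to_list` are names chosen here.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


def reverse_recursive(node):
    """Reverse the list starting at `node` in place; return the new head."""
    # An empty list or a single node is already reversed.
    if node is None or node.next is None:
        return node
    new_head = reverse_recursive(node.next)
    # node.next is now the tail of the reversed rest; append node after it.
    node.next.next = node
    node.next = None
    return new_head


def to_list(node):
    """Collect the values of a linked list into a Python list."""
    values = []
    while node is not None:
        values.append(node.data)
        node = node.next
    return values


# Build 4 -> 34 -> 13 -> 9 -> 3 by pushing at the head, as in the cell above.
head = None
for value in [3, 9, 13, 34, 4]:
    new_node = Node(value)
    new_node.next = head
    head = new_node

head = reverse_recursive(head)
print(to_list(head))  # [3, 9, 13, 34, 4]
```

The recursion depth equals the list length, so an iterative version is the safer choice for very long lists.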

In [44]:
def left_rotate(arr, steps):
    n = len(arr)
    if n == 0:
        return arr[:]  # Nothing to rotate; also avoids a modulo by zero
    steps %= n  # Wrap around when steps is greater than or equal to n
    
    # Move the first `steps` items to the end of the array
    return arr[steps:] + arr[:steps]


In [48]:
arr1 =[4, 34, 13, 9, 3] 
left_rotate(arr1, 5)

[4, 34, 13, 9, 3]

In [54]:
# When steps is greater than the array size, the modulo wraps it around

def left_rotate(arr, steps):
    size = len(arr)
    steps = steps%size
    return arr[steps:] + arr[:steps]

In [55]:
left_rotate(arr1,2)

[13, 9, 3, 4, 34]

In [56]:
def first_non_repeating_char(s):
    char_freq = {}
    for char in s:
        if char in char_freq:
            char_freq[char] = char_freq[char] +1
        else:
            char_freq[char] = 1
            
            
    for char in s:
        if char_freq[char]==1:
            return char
    return None

In [58]:
s = 'abbcde'
result = first_non_repeating_char(s)
print(result)  # Output: 'a'


a


In [64]:
data = {'col':arr1}
data

{'col': [4, 34, 13, 9, 3]}

In [65]:
import  pandas as pd

In [68]:
a = pd.DataFrame(data).mode()
a

|   | col |
|---|-----|
| 0 | 3 |
| 1 | 4 |
| 2 | 9 |
| 3 | 13 |
| 4 | 34 |


In [71]:
a.col.mode()[0]

3