# Theoretical Questions

### 1 What are the key differences between SQL and NoSQL databases?

In [None]:
'''
SQL Databases:

Structured Data: SQL databases store data in predefined tables with a fixed schema
                 that defines the structure of the data.

ACID Compliance: SQL databases follow the ACID properties (Atomicity, Consistency, Isolation,
                 and Durability) to keep data consistent and reliable.

Querying: SQL databases use SQL queries to retrieve and manipulate data.

Schema-on-write: SQL databases require a predefined schema before data can be inserted or updated.


NoSQL Databases:

Flexible Data Models: NoSQL databases store data in various formats, such as JSON documents,
                      key-value pairs, wide columns, or graphs, which allows for flexible schema design.

CAP Theorem Trade-offs: Many NoSQL databases are distributed and, during a network partition,
                        favor availability over strict consistency to achieve high scalability.
                        This is a per-system design choice: MongoDB, for example, keeps writes
                        consistent by routing them through a single primary.

Querying: NoSQL databases use database-specific query languages or APIs
          to retrieve and manipulate data.

'''

### 2 What makes MongoDB a good choice for modern applications?

In [None]:
'''

Easier Data Integration: The JSON-like document format makes it easier to work with data
            from various sources and to integrate with modern programming languages and frameworks.

Indexing: MongoDB provides rich indexing capabilities, allowing you to optimize query performance.


Easy to Learn and Use: The JSON-like document structure is intuitive, which makes it
                        easier for developers to learn and use MongoDB.


MongoDB Atlas: MongoDB Atlas is a fully managed cloud database service that simplifies the deployment,
                management, and scaling of MongoDB clusters.


Cloud Support: MongoDB is designed to work well with cloud platforms like AWS, Azure, and Google Cloud.

'''

### 3 Explain the concept of collections in MongoDB

In [None]:
'''

In MongoDB, a collection is an organizational unit that serves as a container for documents,
roughly comparable to a table in a relational database.

Collections can hold documents with varying structures (schemas).

Each document within a collection is stored in BSON (Binary JSON) format.

This allows you to store complex data types such as arrays, nested documents, dates, and binary data.

Collections can be indexed to improve the performance of queries.

You can query a collection to retrieve specific documents using a variety of criteria.

'''

#### Example

In [None]:
{
  "_id": ObjectId("60c72b2f9b1d2258f7b2a111"),
  "username": "john_doe",
  "email": "john@example.com",
}

### 4 How does MongoDB ensure high availability using replication?

In [None]:
'''

MongoDB ensures high availability through a feature called replication.

Replication keeps multiple copies of your data on different server instances,
which together form a replica set.

A replica set has one primary node that accepts writes and one or more secondary nodes
that copy the primary's changes.

A replica set is usually set up from the MongoDB shell: initialize it with the
rs.initiate() command and then add members with rs.add().

If the primary becomes unavailable, the remaining members hold an election and
promote a secondary to primary.

Replication therefore provides automatic failover, data redundancy, and read scalability, which are
essential for modern applications that require continuous operation and data durability.

'''
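As a rough sketch of how an application would connect to such a replica set, the helper below builds a connection string that lists several members. The host names and the replica set name `rs0` are made-up placeholders; `replicaSet` and `readPreference` are standard MongoDB connection string options.

```python
def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a MongoDB connection string that lists several replica set members."""
    host_list = ",".join(hosts)
    return (
        f"mongodb://{host_list}/"
        f"?replicaSet={replica_set}&readPreference={read_preference}"
    )


# Hypothetical hosts: listing more than one member lets the driver find
# the replica set even if one of them is down.
uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    replica_set="rs0",
    read_preference="secondaryPreferred",
)
print(uri)
```

The resulting string could be passed to `MongoClient(uri)`. After a failover the driver should route writes to the newly elected primary without a change to the application's connection string.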

### 5 What are the main benefits of MongoDB Atlas?

In [None]:
'''

Main benefits of MongoDB Atlas:

Simplified Operations: Atlas handles the operational work of running a MongoDB database,
                such as provisioning, monitoring, and backups.

Automated Maintenance: Atlas automatically applies security patches, software updates, and version upgrades.

Reduced Operational Costs: By offloading operational tasks to Atlas, a team can focus on application work.

Global Clusters: Atlas supports global clusters that can span multiple geographic regions, allowing you
                to deploy your database closer to your users and reduce latency.

Built-in Security Features: Atlas includes a range of security features, such as encryption at rest,
                encryption in transit (TLS), and IP access lists.


'''

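### 6 What is the role of indexes in MongoDB, and how do they improve performance?


In [None]:
'''

Role of Indexes:

The primary role of an index is to speed up read operations (queries)
by allowing MongoDB to locate and retrieve documents more efficiently.

Instead of scanning every document in a collection, MongoDB can use the
index to quickly find the documents that match the query criteria.

How Indexes Improve Performance:

 (a) Reduced Query Execution Time: fewer documents are examined for each query.

 (b) Optimized Query Targeting: MongoDB goes directly to the matching index entries.

 (c) Efficient Sorting: results can be returned in index order, avoiding an in-memory sort.

 (d) Covered Queries: if every requested field is part of the index, the query is
     answered from the index alone, without reading the documents.


Indexes are essential for optimizing query performance in MongoDB, but each index
adds some overhead to writes, so index the fields you actually query.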
'''
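The difference between a full scan and an index lookup can be illustrated with a toy in-memory analogy. This is only a sketch: MongoDB's indexes are B-tree structures, not Python dictionaries, and the sample documents and helper names are invented for illustration.

```python
from collections import defaultdict

# Made-up sample documents
orders = [
    {"order_id": 1, "region": "West"},
    {"order_id": 2, "region": "South"},
    {"order_id": 3, "region": "West"},
    {"order_id": 4, "region": "East"},
]


def collection_scan(docs, region):
    """Check every document, like a query that has no usable index."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc["region"] == region:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Map each field value to the positions of the documents that hold it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        index[doc[field]].append(position)
    return index


def index_lookup(docs, index, value):
    """Fetch only the documents the index points to."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)


region_index = build_index(orders, "region")
scan_result, scan_examined = collection_scan(orders, "West")
index_result, index_examined = index_lookup(orders, region_index, "West")
print(scan_examined, index_examined)  # prints: 4 2
```

Both approaches return the same documents, but the lookup touches only the matches. In PyMongo, a real index on a field would be created with `collection.create_index("Region")`.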

### 7 Describe the stages of the MongoDB aggregation pipeline

In [None]:
'''
 The MongoDB aggregation pipeline is a series of stages that process documents from a collection.
 Each stage transforms its input documents and passes the result to the next stage.

 Commonly used stages:

 1 $match: Filters the input documents to include only those that match the specified condition.

 2 $sort: Sorts the input documents in ascending or descending order based on the specified field(s).

 3 $project: Includes, excludes, or computes fields, reshaping each input document.

 4 $group: Groups the input documents by one or more fields and applies an aggregation
             operation (such as $sum or $avg) to each group.

 5 $skip: Skips a specified number of documents from the input document stream.

 Other frequently used stages include $limit, $unwind, $lookup, and $out.

 

'''

In [None]:
'''
db.collection.aggregate([
    {
        $match: { category: "Wood" }
    },
    {
        $sort: { price: 1 }
    },
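    {
        // A projection cannot mix inclusion (1) and exclusion (0), except for _id;
        // category is kept because the $group stage below reads it.
        $project: { _id: 0, category: 1, price: 1 }
    },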
    {
        $group: { _id: "$category", total: { $sum: "$price" } }
    },
    {
        $out: "wood_sales"
    }
]).explain()
'''
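In PyMongo, a pipeline is an ordinary list of dictionaries, one per stage. The sketch below builds a pipeline around the `category` and `price` fields used above; the function name and the `min_price` parameter are illustrative assumptions.

```python
def category_totals_pipeline(min_price=0):
    """Return aggregation stages that total prices per category."""
    return [
        # 1. Keep only documents at or above the price threshold
        {"$match": {"price": {"$gte": min_price}}},
        # 2. Pass on only the fields the later stages need
        {"$project": {"_id": 0, "category": 1, "price": 1}},
        # 3. One output document per category, with the summed price
        {"$group": {"_id": "$category", "total": {"$sum": "$price"}}},
        # 4. Largest total first
        {"$sort": {"total": -1}},
    ]


pipeline = category_totals_pipeline(min_price=10)
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)  # prints: ['$match', '$project', '$group', '$sort']
```

The list would be run with `collection.aggregate(pipeline)`. The order matters, because each stage sees only the output of the stage before it.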

### 8 What is sharding in MongoDB? How does it differ from replication?

In [None]:
'''

Sharding in MongoDB is a method of distributing data across multiple servers
so that each server holds only a portion of the overall data and workload.

It enables horizontal scaling: you can add more servers and distribute the data
among them rather than upgrading existing hardware.

In a sharded cluster, data is divided into chunks based on a shard key, and each chunk is assigned
to a shard. Each shard is a MongoDB instance (in production, a replica set) that holds a subset of the data.


Difference from Replication:

Replication focuses on data availability and reliability rather than
on distributing data across servers.

It maintains multiple copies of the same data on different servers for redundancy,
fault tolerance, and high availability.

In short, sharding splits different data across servers to scale out, while replication
copies the same data across servers to stay available.

'''
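The idea of routing documents by shard key can be sketched with a toy example. This is a simplification under stated assumptions: MongoDB assigns ranges of shard key values (or of their hashes) to chunks and moves chunks between shards, whereas the code below simply takes a hash modulo the number of shards. The `customer_id` field and function names are hypothetical.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Pick a shard number from a stable hash of the shard key value."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(docs, shard_key, shard_count):
    """Group documents by the shard each one would be routed to."""
    shards = {i: [] for i in range(shard_count)}
    for doc in docs:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return shards


# Made-up documents keyed by a hypothetical customer_id
customers = [{"customer_id": f"C{n:03d}"} for n in range(12)]
shards = distribute(customers, "customer_id", shard_count=3)
for shard_id, docs in shards.items():
    print(shard_id, len(docs))
```

Each document lands on exactly one shard, and the same key always maps to the same shard. That is the property a router needs in order to send a query for one key to a single shard instead of all of them.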

### 9 What is PyMongo, and why is it used?

In [None]:
'''

PyMongo is a Python library that provides a way for Python applications to interact with MongoDB databases.

 It serves as the driver that handles communication between Python code and MongoDB, allowing developers
 to perform database operations such as inserts, queries, updates, and aggregations.

 Why use PyMongo?

 Ease of Use: It provides a Pythonic interface for interacting with MongoDB, with documents
 represented as ordinary Python dictionaries.

 Feature Coverage: PyMongo offers comprehensive support for most MongoDB features, enabling
 developers to leverage the full power of MongoDB.

 Community Support: PyMongo has strong community support and extensive
 documentation, making it easier for developers to find resources and troubleshoot issues.

 Compatibility: PyMongo is maintained by MongoDB, so developers can take advantage of the
 specific functionalities and optimizations of current MongoDB versions.

'''

### 10 What are the ACID properties in the context of MongoDB transactions?

In [None]:
'''

1 Atomicity means that a transaction is treated as a single, indivisible unit of work.
If any part of a transaction fails, the entire transaction is rolled back,
and the database is restored to its previous state.
In MongoDB, writes to a single document are always atomic; multi-document
transactions extend this guarantee across documents and collections.


2 Consistency means that a transaction brings the database from one valid state to another.
In MongoDB, consistency is supported by schema validation rules, unique indexes,
and sound data modeling.


3 Isolation means that a transaction executes without interference from other concurrent transactions.
In MongoDB, multi-document transactions use snapshot isolation: each transaction works on a
consistent snapshot of the data, and its changes become visible to others only after it commits.


4 Durability means that the results of a transaction persist after it has completed successfully.
In MongoDB, durability is ensured through journaling (a write-ahead log), together with
write concerns such as "majority".


'''

### 11 What is the purpose of MongoDB’s explain() function?

In [None]:
'''

 The explain() method provides detailed information about the execution plan of a query.

 It shows how MongoDB processes a query, including the order of operations,
 the indexes used, and how the data is retrieved.

 This helps you diagnose performance problems, for example a query that scans
 the whole collection (COLLSCAN) instead of using an index (IXSCAN).

 explain() supports three verbosity modes: "queryPlanner" (the default) shows the chosen plan,
 "executionStats" adds run-time statistics such as the number of documents examined,
 and "allPlansExecution" also reports on the rejected candidate plans.

 
'''
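In PyMongo, `collection.find(...).explain()` returns the plan as a dictionary, so it can be inspected in code. The helper below is a minimal sketch that walks the `winningPlan` of such a dictionary. The sample is hand-written in the general shape of explain output, not real server output, and the exact layout can differ between server versions.

```python
def plan_stages(winning_plan):
    """Return stage names from the outermost stage down to the innermost."""
    stages = []
    node = winning_plan
    while node is not None:
        stages.append(node.get("stage"))
        # Single-input stages nest their source under "inputStage"
        node = node.get("inputStage")
    return stages


def uses_index(explain_output):
    """True if the winning plan contains an index scan."""
    stages = plan_stages(explain_output["queryPlanner"]["winningPlan"])
    return "IXSCAN" in stages


# Hand-written illustrative sample; the index name is hypothetical
sample_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
        }
    }
}

print(plan_stages(sample_explain["queryPlanner"]["winningPlan"]))  # prints: ['FETCH', 'IXSCAN']
print(uses_index(sample_explain))  # prints: True
```

A plan whose only stage is `COLLSCAN` would make `uses_index` return `False`, which is usually the signal to consider adding an index.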

### 12 How does MongoDB handle schema validation?

In [None]:
'''

MongoDB provides schema validation rules, which help
maintain data consistency and integrity.

Rules are defined per collection with the validator option, typically using the $jsonSchema operator.

Two settings control how the rules are applied:

Validation Level (validationLevel):

Strict (default): MongoDB applies the validation rules to all inserts and updates.

Moderate: MongoDB applies the validation rules to inserts and to updates of documents
that already satisfy the rules. Updates to existing documents that do not satisfy
the rules are not checked.

Validation Action (validationAction):

Error (default): MongoDB rejects any insert or update that violates the rules.

Warn: MongoDB allows the write but records the violation in the log.

'''
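A validator is just a document, so in PyMongo it can be written as a dictionary and passed to `create_collection`. The sketch below assumes a `users` collection with `username` and `email` fields; the rules themselves are illustrative, not a recommended schema.

```python
# Illustrative rules: both fields are required and must be strings
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["username", "email"],
        "properties": {
            "username": {"bsonType": "string"},
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
        },
    }
}


def create_users_collection(db):
    """Create a 'users' collection that rejects documents violating the schema."""
    return db.create_collection(
        "users",
        validator=user_validator,
        validationLevel="strict",   # check all inserts and updates
        validationAction="error",   # reject invalid writes instead of logging them
    )
```

With these settings, inserting a document without an `email` field should fail with a write error. Switching `validationAction` to `"warn"` would let the write through and log the violation instead.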

### 13 What is the difference between a primary and a secondary node in a replica set?

In [None]:
'''
Primary Node:

The primary node is the member of a replica set that receives all write operations.

Only one primary node can exist in a replica set at any given time.

The primary records every write in its oplog (operation log) and acknowledges
writes according to the requested write concern.

If the primary node becomes unavailable (e.g., due to a server crash or network issue),
an automatic failover occurs: the remaining members elect a new primary.

By default, all reads are also sent to the primary node.

Secondary Node:

Secondary nodes hold copies of the primary's data and provide redundancy within the replica set.

Secondary nodes replicate the operations from the primary node’s oplog.

They apply these operations asynchronously, so their data sets stay synchronized
with the primary, possibly with a small lag.

Depending on the read preference configuration, secondary nodes can also handle read operations,
allowing query load to be distributed.
'''

### 14 What security mechanisms does MongoDB provide for data protection?

In [None]:
'''

1. Authentication
Authentication ensures that users are who they claim to be. MongoDB supports several authentication
methods, including SCRAM (username and password) and x.509 certificates; MongoDB Enterprise adds
LDAP and Kerberos.


2. Authorization
Authorization ensures that authenticated users have permission to perform certain actions.
MongoDB uses role-based access control (RBAC): users are granted built-in roles such as
read, readWrite, and dbAdmin, or custom roles.


3. Encryption
MongoDB provides encryption mechanisms to protect data in transit (TLS/SSL) and at rest
(the encrypted storage engine in MongoDB Enterprise, or encryption at rest in Atlas).


4. Auditing
MongoDB Enterprise and Atlas include an auditing feature that allows you to log access and changes to the database.


5. Network Security
MongoDB offers several measures for securing network communications:

IP Whitelisting: You can restrict access to the MongoDB server
by allowing connections only from specific IP addresses, for example
through a firewall or the Atlas IP access list.



'''

### 15 Explain the concept of embedded documents and when they should be used

In [None]:
'''

Embedded documents, also known as nested documents, are a way to structure
data in MongoDB by including one document inside another.


An embedded document is a document stored as a field value of another document,
creating a parent-child relationship.


When to Use Embedded Documents:


Data is Hierarchical: When you have a natural "parent-child" or one-to-many relationship.


Data is Frequently Accessed Together: If you often need to retrieve the embedded data alongside
the main document, embedding returns everything in a single read.


Atomicity is Required: If changes to the parent document and its embedded documents
must always be consistent (all or nothing), embedding ensures this, because a write
to a single document is atomic.


Avoid embedding when the embedded data can grow without bound or is shared by many parents,
because a single document is limited to 16 MB; use references in those cases.

'''
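As a small illustration, the document below embeds a customer sub-document and an array of line items inside one order. All field names and values are made up for this sketch.

```python
# One order document with an embedded sub-document and an embedded array
order = {
    "_id": 1,
    "customer": {"name": "Claire Gute", "segment": "Consumer"},
    "items": [
        {"product": "Bookcase", "quantity": 2},
        {"product": "Chair", "quantity": 3},
    ],
}

# The embedded data travels with the parent, so no second lookup is needed
total_quantity = sum(item["quantity"] for item in order["items"])
print(total_quantity)  # prints: 5

# In a query, embedded fields are addressed with dot notation
embedded_field_filter = {"customer.segment": "Consumer"}
```

A filter like `embedded_field_filter` could be passed to `collection.find(...)` to match on the nested `segment` field without restructuring the data.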

### 16 What is the purpose of MongoDB’s $lookup stage in aggregation?

In [None]:
'''

The $lookup stage in MongoDB's aggregation framework performs
a left outer join between two collections.


Purpose of $lookup


The primary purpose of the $lookup stage is to join documents from one collection with
related documents from another. Every input document is kept, and its matches are added
as an array field; the array is empty when nothing matches.


$lookup allows you to build queries that include related data, making it easier to
retrieve comprehensive information in a single query.


By using $lookup, you can enrich the documents returned by the aggregation pipeline with
additional data from another collection, helping to unify data for reporting or presentation purposes.


'''
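The sketch below shows a `$lookup` stage as a PyMongo dictionary, followed by a plain-Python imitation of the join it performs. The `customers` collection, the `customer_id` field, and the sample data are assumptions made for illustration.

```python
def customer_lookup_stage():
    """Aggregation stage that attaches matching customers to each order."""
    return {
        "$lookup": {
            "from": "customers",          # collection to join with
            "localField": "customer_id",  # field in the input documents
            "foreignField": "_id",        # field in the "from" collection
            "as": "customer",             # name of the output array field
        }
    }


def simulate_lookup(orders, customers):
    """Plain-Python imitation of the left outer join that $lookup performs."""
    joined = []
    for order in orders:
        matches = [c for c in customers if c["_id"] == order["customer_id"]]
        # Every order is kept; an order without a match gets an empty list
        joined.append({**order, "customer": matches})
    return joined


# Made-up sample data
orders = [{"_id": 1, "customer_id": "A"}, {"_id": 2, "customer_id": "Z"}]
customers = [{"_id": "A", "name": "Alice"}]
for row in simulate_lookup(orders, customers):
    print(row)
```

The second order has no matching customer, yet it still appears in the output with an empty `customer` array. That is the "left outer" part of the join.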

### 17 What are some common use cases for MongoDB?

In [None]:
'''

Real-time Analytics: Businesses often need to analyze data in real time.
MongoDB's aggregation pipeline and indexes support queries on live operational data.

Internet of Things (IoT): IoT applications generate large volumes of data from numerous devices.
MongoDB can scale horizontally to handle this influx of data.


Mobile Applications: Due to its flexible schema and easy integration with various programming
languages, MongoDB is often used as the backend database for mobile apps.


Social Networks: MongoDB can efficiently store user profiles, connections, and interactions,
making it suitable for social networking applications.

Machine Learning and AI: MongoDB can be used to store and retrieve the large datasets needed
for training machine learning models.

'''

### 18 What are the advantages of using MongoDB for horizontal scaling?

In [None]:
'''

Easy Horizontal Scaling: MongoDB makes it easy to add more shards (and more replica set
members per shard) to the cluster as needed.


Transparent Routing: Applications connect to a mongos router, which directs each query
to the shards that hold the relevant data, so application code does not need to know
how the data is distributed.


No Manual Data Rebalancing: MongoDB's balancer automatically migrates data between shards
when shards are added or removed.


Multi-Data Center Support: MongoDB clusters can span data centers in different
geographic regions, allowing for global distribution and replication of data.


Fault Tolerance: Each shard is normally deployed as a replica set, so the cluster keeps
its redundancy and automatic failover as it grows.


'''

### 19 How do MongoDB transactions differ from SQL transactions?

In [None]:
'''

ACID Support:

SQL Transactions: SQL databases are built around the ACID
(Atomicity, Consistency, Isolation, Durability)
properties to ensure data integrity.

These properties are central to how transactions are handled.

SQL databases typically operate with a rigid schema that enforces
consistency across the entire database.


MongoDB Transactions: Historically, MongoDB emphasized performance and scalability
over strict ACID compliance across multiple documents; only single-document writes were atomic.

MongoDB added multi-document ACID transactions in version 4.0 (replica sets)
and extended them to sharded clusters in version 4.2.

MongoDB multi-document transactions use snapshot isolation.


Usage:

SQL Transactions: SQL transactions typically start with BEGIN TRANSACTION
and end with COMMIT or ROLLBACK.

MongoDB Transactions: MongoDB transactions are managed through a client session: you start a session,
start a transaction on it, and then commit or abort.

'''
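The session-based style can be sketched with PyMongo's `start_session()` and `with_transaction()`. The `bank` database, the `accounts` collection, and the `transfer` function are hypothetical names used only for this example. Multi-document transactions need a replica set or sharded cluster, so this would not work against a standalone server.

```python
def transfer(client, from_id, to_id, amount):
    """Move an amount between two account documents in one transaction."""
    accounts = client["bank"]["accounts"]  # hypothetical database and collection

    def callback(session):
        # Both updates carry the session, so they commit or abort together
        accounts.update_one(
            {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
        )
        accounts.update_one(
            {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
        )

    with client.start_session() as session:
        # with_transaction starts the transaction, runs the callback,
        # and commits; an exception raised in the callback aborts it
        session.with_transaction(callback)
```

Compared with `BEGIN TRANSACTION ... COMMIT` in SQL, the transaction boundary here is the callback: everything that passes `session=session` inside it belongs to the same transaction.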

### 20 What are the main differences between capped collections and regular collections?

In [None]:
'''

Capped Collections:

Capped collections are fixed-size collections that work like a circular buffer.

They are suited to log data, audit trails, or any other type of data that should be kept
in first-in-first-out (FIFO) order.

Behavior: The maximum size is set when the collection is created. When a capped collection
reaches that size, it wraps around and overwrites the oldest documents.

This means that only the most recent documents are retained.

Insertion Order: Documents are stored in the order they are inserted,
so reading them back in insertion order does not require an index.

Performance: Capped collections serialize write operations, so concurrent
write throughput can be lower than for a regular collection.

Regular Collections:

Regular collections are the default type of collection in MongoDB.

They are used for storing a wide range of data types and use cases.

Behavior: There is no automatic limit on the size of a regular collection,
and documents are not automatically removed when the collection grows.

Insertion Order: There is no guarantee that documents are stored or returned
in insertion order; use sort() when order matters.

There is no limit to the number of documents that can be inserted into a regular collection.

Performance: Regular collections allow concurrent writes, so they generally
handle write-heavy workloads from many clients better than capped collections.


'''
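The wrap-around behavior can be imitated with a bounded `deque` from the standard library. This is only an analogy: a real capped collection is bounded by size in bytes (and optionally by document count), and in PyMongo it would be created with something like `db.create_collection("logs", capped=True, size=1048576)`, where the collection name and size are placeholders.

```python
from collections import deque

# A buffer that holds at most three entries, like a tiny capped collection
log = deque(maxlen=3)

for n in range(1, 6):
    # Once the buffer is full, each append silently drops the oldest entry
    log.append({"event": n})

print([entry["event"] for entry in log])  # prints: [3, 4, 5]
```

Events 1 and 2 were overwritten without any explicit delete, and the remaining entries are still in insertion order.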

### 21 What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

In [None]:
'''
Purpose of the $match stage:

Filter out irrelevant data: The $match stage passes on only the documents that
meet the specified criteria, using the same query syntax as find().

Improve performance: Placing $match early in the pipeline reduces the amount of data
that later stages have to process. When $match is the first stage, it can also use indexes.

Improve scalability: Because less data flows through the pipeline, the same
aggregation can handle larger collections.

Improve data quality: By filtering out invalid or incomplete documents, the $match stage ensures
that the input to the next stage of the pipeline is clean.

'''

### 22 How can you secure access to a MongoDB database?

In [None]:
'''

Enable Authentication:

Use MongoDB’s built-in user authentication so that every client must log in.

Use Role-Based Access Control (RBAC):

Implement RBAC to restrict access based on roles. Assign users only the
permissions they need to operate.

Enable TLS/SSL:

Use Transport Layer Security (TLS) to encrypt data in transit between the client and server.

Regularly Update MongoDB:

Keep MongoDB up to date with the latest security patches and releases to
protect against vulnerabilities.

Data Encryption at Rest:

Use MongoDB's Encrypted Storage Engine (available in MongoDB Enterprise) to encrypt data at rest.

Audit and Logging:

Enable audit logging to track access and actions performed in the database.

'''
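On the client side, authentication and TLS are usually switched on through the connection string. The helper below is a sketch with placeholder credentials and host; `authSource` and `tls` are standard connection string options, and `quote_plus` escapes characters such as `@` and `:` that would otherwise break the URI.

```python
from urllib.parse import quote_plus


def secure_uri(username, password, host, auth_db="admin"):
    """Build a connection string with escaped credentials and TLS enabled."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}/?authSource={auth_db}&tls=true"


# Placeholder credentials and host, for illustration only
uri = secure_uri("app_user", "p@ss:word/1", "db.example.com:27017")
print(uri)
# prints: mongodb://app_user:p%40ss%3Aword%2F1@db.example.com:27017/?authSource=admin&tls=true
```

In practice the password would come from an environment variable or a secrets manager rather than from source code.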

### 23 What is MongoDB’s WiredTiger storage engine, and why is it important?

In [None]:
'''

MongoDB's WiredTiger storage engine is a high-performance, scalable storage engine
that has been the default since MongoDB version 3.2.

It provides several features that improve the
overall performance and usability of MongoDB databases.

Importance of WiredTiger

Performance Optimization: Document-level concurrency control lets multiple clients modify
different documents in the same collection at the same time, which significantly improves throughput.

Scalability: WiredTiger handles large datasets efficiently by caching frequently used data in memory.

Cost Efficiency: Built-in data compression can lead to lower storage costs,
which is especially important for cloud deployments.

Durability: Features like write-ahead logging (journaling) and checkpointing enhance the durability of data.
 
'''

# Practical Questions

### 1 Write a Python script to load the Superstore dataset from a CSV file into MongoDB

In [3]:
from pymongo import MongoClient
import pandas as pd

In [10]:
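# Connect to the local MongoDB server
client = MongoClient("mongodb://localhost:27017/")
db = client["mydb"]
# Use the "Orders" collection, which the queries in the following questions read from
collection = db["Orders"]
# Load the Superstore CSV into a DataFrame and preview the first rows
df = pd.read_csv(r'C:\Users\Hrishabh\Downloads\superstore.csv')
df.head()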

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,2,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582
2,3,CA-2016-138688,6/12/2016,6/16/2016,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,2,0.0,6.8714
3,4,US-2015-108966,10/11/2015,10/18/2015,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,5,0.45,-383.031
4,5,US-2015-108966,10/11/2015,10/18/2015,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,2,0.2,2.5164


In [11]:
# Insert data into MongoDB
collection.insert_many(df.to_dict(orient='records'))

InsertManyResult([ObjectId('67acab1f2923d727a67bb85f'), ObjectId('67acab1f2923d727a67bb860'), ObjectId('67acab1f2923d727a67bb861'), ObjectId('67acab1f2923d727a67bb862'), ObjectId('67acab1f2923d727a67bb863'), ObjectId('67acab1f2923d727a67bb864'), ObjectId('67acab1f2923d727a67bb865'), ObjectId('67acab1f2923d727a67bb866'), ObjectId('67acab1f2923d727a67bb867'), ObjectId('67acab1f2923d727a67bb868'), ObjectId('67acab1f2923d727a67bb869'), ObjectId('67acab1f2923d727a67bb86a'), ObjectId('67acab1f2923d727a67bb86b'), ObjectId('67acab1f2923d727a67bb86c'), ObjectId('67acab1f2923d727a67bb86d'), ObjectId('67acab1f2923d727a67bb86e'), ObjectId('67acab1f2923d727a67bb86f'), ObjectId('67acab1f2923d727a67bb870'), ObjectId('67acab1f2923d727a67bb871'), ObjectId('67acab1f2923d727a67bb872'), ObjectId('67acab1f2923d727a67bb873'), ObjectId('67acab1f2923d727a67bb874'), ObjectId('67acab1f2923d727a67bb875'), ObjectId('67acab1f2923d727a67bb876'), ObjectId('67acab1f2923d727a67bb877'), ObjectId('67acab1f2923d727a67bb8

In [15]:
client.close()

### 2 Retrieve and print all documents from the Orders collection

In [31]:
client = MongoClient("mongodb://localhost:27017/")
db = client["mydb"]
collection = db["Orders"]

In [32]:
data = collection.find()
for doc in data:
    print(doc)

{'_id': ObjectId('67acad7b2923d727a67bdf6f'), 'Row ID': 1, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-BO-10001798', 'Category': 'Furniture', 'Sub-Category': 'Bookcases', 'Product Name': 'Bush Somerset Collection Bookcase', 'Sales': 261.96, 'Quantity': 2, 'Discount': 0.0, 'Profit': 41.9136}
{'_id': ObjectId('67acad7b2923d727a67bdf70'), 'Row ID': 2, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-CH-10000454', 'Category': 'Furniture', 'Sub




### 3 Count and display the total number of documents in the Orders collection

In [34]:
t_doc = collection.count_documents({})
print(f"Total number of documents in the Orders collection: {t_doc}")

Total number of documents in the Orders collection: 9994


### 4 Write a query to fetch all orders from the "West" region

In [36]:
# Equality filter: match documents whose Region field is exactly "West"
data = collection.find({"Region": "West"})
for i in data:
    print(i)

{'_id': ObjectId('67acad7b2923d727a67bdf71'), 'Row ID': 3, 'Order ID': 'CA-2016-138688', 'Order Date': '6/12/2016', 'Ship Date': '6/16/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'DV-13045', 'Customer Name': 'Darrin Van Huff', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Los Angeles', 'State': 'California', 'Postal Code': 90036, 'Region': 'West', 'Product ID': 'OFF-LA-10000240', 'Category': 'Office Supplies', 'Sub-Category': 'Labels', 'Product Name': 'Self-Adhesive Address Labels for Typewriters by Universal', 'Sales': 14.62, 'Quantity': 2, 'Discount': 0.0, 'Profit': 6.8714}
{'_id': ObjectId('67acad7b2923d727a67bdf74'), 'Row ID': 6, 'Order ID': 'CA-2014-115812', 'Order Date': '6/9/2014', 'Ship Date': '6/14/2014', 'Ship Mode': 'Standard Class', 'Customer ID': 'BH-11710', 'Customer Name': 'Brosina Hoffman', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Los Angeles', 'State': 'California', 'Postal Code': 90032, 'Region': 'West', 'Product ID': 'FUR-FU-1

### 5 Write a query to find orders where Sales is greater than 500

In [37]:
# $gt matches documents whose Sales value is strictly greater than 500
data = collection.find({"Sales": {"$gt": 500}})
for i in data:
    print(i)

{'_id': ObjectId('67acad7b2923d727a67bdf70'), 'Row ID': 2, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-CH-10000454', 'Category': 'Furniture', 'Sub-Category': 'Chairs', 'Product Name': 'Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back', 'Sales': 731.94, 'Quantity': 3, 'Discount': 0.0, 'Profit': 219.582}
{'_id': ObjectId('67acad7b2923d727a67bdf72'), 'Row ID': 4, 'Order ID': 'US-2015-108966', 'Order Date': '10/11/2015', 'Ship Date': '10/18/2015', 'Ship Mode': 'Standard Class', 'Customer ID': 'SO-20335', 'Customer Name': "Sean O'Donnell", 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Fort Lauderdale', 'State': 'Florida', 'Postal Code': 33311, 'Region': 'South', 'Product ID': 'FUR-TA-100005

### 6 Fetch the top 3 orders with the highest Profit

In [38]:
# Sort by Profit in descending order (-1) and keep the first three documents
data = collection.find().sort("Profit", -1).limit(3)
for i in data:
    print(i)

{'_id': ObjectId('67acad7b2923d727a67bfa19'), 'Row ID': 6827, 'Order ID': 'CA-2016-118689', 'Order Date': '10/2/2016', 'Ship Date': '10/9/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'TC-20980', 'Customer Name': 'Tamara Chand', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Lafayette', 'State': 'Indiana', 'Postal Code': 47905, 'Region': 'Central', 'Product ID': 'TEC-CO-10004722', 'Category': 'Technology', 'Sub-Category': 'Copiers', 'Product Name': 'Canon imageCLASS 2200 Advanced Copier', 'Sales': 17499.95, 'Quantity': 5, 'Discount': 0.0, 'Profit': 8399.976}
{'_id': ObjectId('67acad7b2923d727a67bff48'), 'Row ID': 8154, 'Order ID': 'CA-2017-140151', 'Order Date': '3/23/2017', 'Ship Date': '3/25/2017', 'Ship Mode': 'First Class', 'Customer ID': 'RB-19360', 'Customer Name': 'Raymond Buch', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Seattle', 'State': 'Washington', 'Postal Code': 98115, 'Region': 'West', 'Product ID': 'TEC-CO-10004722', 'Category': 'Te

### 7 Update all orders with Ship Mode as "First Class" to "Premium Class"

In [40]:
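fil = {"Ship Mode": "First Class"}
update = {"$set": {"Ship Mode": "Premium Class"}}
# Keep the result so the number of modified documents can be reported
result = collection.update_many(fil, update)
result

UpdateResult({'n': 0, 'nModified': 0, 'ok': 1.0, 'updatedExisting': False}, acknowledged=True)

In [41]:
# modified_count is the number of documents this call changed. Counting the documents that
# still match the filter would print 0 after any successful update, whatever was changed.
print(f"Updated {result.modified_count} orders")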

Updated 0 orders


In [43]:
data=collection.find({"Ship Mode":"Premium Class"}).limit(2)
for i in data:
    print(i)

{'_id': ObjectId('67acad7b2923d727a67bdf92'), 'Row ID': 36, 'Order ID': 'CA-2016-117590', 'Order Date': '12/8/2016', 'Ship Date': '12/10/2016', 'Ship Mode': 'Premium Class', 'Customer ID': 'GH-14485', 'Customer Name': 'Gene Hale', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Richardson', 'State': 'Texas', 'Postal Code': 75080, 'Region': 'Central', 'Product ID': 'TEC-PH-10004977', 'Category': 'Technology', 'Sub-Category': 'Phones', 'Product Name': 'GE 30524EE4', 'Sales': 1097.544, 'Quantity': 7, 'Discount': 0.2, 'Profit': 123.4737}
{'_id': ObjectId('67acad7b2923d727a67bdf93'), 'Row ID': 37, 'Order ID': 'CA-2016-117590', 'Order Date': '12/8/2016', 'Ship Date': '12/10/2016', 'Ship Mode': 'Premium Class', 'Customer ID': 'GH-14485', 'Customer Name': 'Gene Hale', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Richardson', 'State': 'Texas', 'Postal Code': 75080, 'Region': 'Central', 'Product ID': 'FUR-FU-10003664', 'Category': 'Furniture', 'Sub-Category': 'Furnis

### 8 Delete all orders where Sales is less than 50

In [45]:
# $lt matches documents whose Sales value is strictly less than 50
data = collection.delete_many({"Sales": {"$lt": 50}})
print(f"Deleted {data.deleted_count} orders with Sales less than 50.")

Deleted 4849 orders with Sales less than 50.


### 9 Use aggregation to group orders by Region and calculate total sales per region

In [47]:
pipeline = [
    # Ignore documents that have no Region value
    {"$match": {"Region": {"$ne": None}}},
    # Group by Region and add up the Sales of each group
    {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}},
]

# Execute the aggregation pipeline
data = collection.aggregate(pipeline)
print("Total Sales per Region:")
for doc in data:
    print(f"Region: {doc['_id']}, Total Sales: {doc['TotalSales']}")

Total Sales per Region:
Region: Central, Total Sales: 479611.8458
Region: South, Total Sales: 376023.312
Region: East, Total Sales: 651137.705
Region: West, Total Sales: 694686.6195


### 10 Fetch all distinct values for Ship Mode from the collection

In [51]:
data=collection.distinct("Ship Mode")
print("Distinct values for Ship Mode :")
print("-------------------------------")
for mode in data:
    print(mode)

Distinct values for Ship Mode :
-------------------------------
Premium Class
Same Day
Second Class
Standard Class


### 11 Count the number of orders for each category

In [53]:
# Group by Category and add 1 for every document in the group
p = [{"$group": {"_id": "$Category", "TotalOrders": {"$sum": 1}}}]
data = collection.aggregate(p)
for doc in data:
    print(f"Category: {doc['_id']}, Total Orders: {doc['TotalOrders']}")

Category: Furniture, Total Orders: 1573
Category: Technology, Total Orders: 1496
Category: Office Supplies, Total Orders: 2076


### Close the connection

In [54]:
client.close()