1. What are the key differences between SQL and NoSQL databases?

-> Here are the **key differences between SQL and NoSQL databases**:

| Feature                  | **SQL Databases**                                       | **NoSQL Databases**                                      |
| ------------------------ | ------------------------------------------------------- | -------------------------------------------------------- |
| **Type**                 | Relational (RDBMS)                                      | Non-relational or distributed                            |
| **Data Storage Format**  | Tables with rows and columns                            | Key-value, document, column, or graph formats            |
| **Schema**               | Fixed and predefined schema                             | Dynamic or flexible schema                               |
| **Scalability**          | Typically scaled vertically (a more powerful server)    | Typically scaled horizontally (more servers)             |
| **Query Language**       | SQL (Structured Query Language)                         | No standard query language; varies by system             |
| **Examples**             | MySQL, PostgreSQL, Oracle, SQL Server                   | MongoDB, Cassandra, Redis, CouchDB                       |
| **Transactions Support** | Strong ACID compliance                                  | Often BASE compliant, some support ACID                  |
| **Best For**             | Complex queries, structured data, consistency           | Big data, unstructured/semi-structured data, flexibility |



2. What makes MongoDB a good choice for modern applications?

-> MongoDB is a **popular NoSQL database** that is well-suited for modern applications due to the following key features:

---

###  1. **Flexible Schema (Document-Oriented)**

* Stores data in **JSON-like documents** (BSON), allowing dynamic fields.
* You don’t need to predefine your schema — useful for rapidly changing requirements.

---

###  2. **Scalability**

* Supports **horizontal scaling** using sharding (distributing data across multiple servers).
* Ideal for **big data** and **high-traffic** applications.

---

###  3. **High Performance**

* Fast reads and writes for large volumes of semi-structured data.
* Keeps frequently accessed data and indexes in an in-memory cache for quick access.

---

###  4. **Developer-Friendly**

* Easy to use with popular programming languages (Node.js, Python, Java, etc.).
* Rich query language supports filtering, aggregation, geospatial queries, etc.

---

###  5. **High Availability**

* Built-in **replication** and **automatic failover** ensure data availability and reliability.

---

###  6. **Cloud-Native with Atlas**

* MongoDB Atlas offers a fully managed database-as-a-service.
* Supports automated backups, monitoring, and global deployments.

---

###  7. **Good for Modern Use Cases**

Perfect for:

* Real-time analytics
* IoT applications
* Content management
* E-commerce platforms
* Mobile apps



3. Explain the concept of collections in MongoDB.

-> ###  Concept of **Collections** in MongoDB:

In **MongoDB**, a **collection** is the equivalent of a **table** in a relational (SQL) database — but it's **schema-less** and much more flexible.

---

###  **Definition:**

A **collection** is a **group of documents** (records) that are stored together in a MongoDB database.

---

###  **Key Features of Collections:**

| Feature                | Description                                                               |
| ---------------------- | ------------------------------------------------------------------------- |
| **Document storage**   | Holds **documents** in **BSON** (Binary JSON) format                      |
| **No fixed schema**    | Documents in the same collection can have different fields and structures |
| **Named containers**   | Collections are identified by **names**, like `users`, `orders`, etc.     |
| **Indexing supported** | You can add **indexes** on fields within documents to improve query speed |

---

### 🧾 **Example:**

You can create a collection called `users`:

```js
db.users.insertOne({
  name: "Megha",
  age: 22,
  email: "megha@example.com"
});
```

In this example:

* `users` is the **collection**
* The object inside `{}` is a **document**

---

###  Think of it like:

* **SQL**: Database → Table → Row → Column
* **MongoDB**: Database → Collection → Document → Field



4. How does MongoDB ensure high availability using replication?

-> MongoDB provides high availability through replica sets: groups of servers that each hold a copy of the same data. Writes go to the primary and are replicated to the secondaries; if the primary fails, the remaining members automatically elect a new one, so the data stays accessible. Secondaries can optionally serve reads as well, which makes replica sets a reliable basis for mission-critical applications.

5. What are the main benefits of MongoDB Atlas?

-> MongoDB Atlas is the fully managed cloud version of MongoDB, designed to handle deployment, scaling, and security — so you can focus on building apps, not managing databases.



6. What is the role of indexes in MongoDB, and how do they improve performance?

-> ###  Role of **Indexes in MongoDB** & How They Improve Performance

---

###  **What is an Index in MongoDB?**

An **index** in MongoDB is a special data structure that **stores a small portion of the dataset** in a way that makes queries faster — similar to an index in a book.

---

###  **How Indexes Improve Performance:**

Without an index:

* MongoDB must **scan every document** in a collection to find matches (called a *collection scan*).
* This is **slow**, especially for large collections.

With an index:

* MongoDB can **quickly locate** documents using the indexed fields.
* **Faster read operations** and better query performance.

---

###  **Types of Indexes in MongoDB:**

| Type             | Description                                              |
| ---------------- | -------------------------------------------------------- |
| **Single Field** | Indexes one field (e.g., `name`, `email`)                |
| **Compound**     | Indexes multiple fields (e.g., `firstName` + `lastName`) |
| **Multikey**     | Indexes array fields                                     |
| **Text Index**   | Enables full-text search on string fields                |
| **Geospatial**   | Used for location-based queries                          |
| **Hashed**       | For sharding, evenly distributes data across shards      |

---

###  **Example: Creating an Index**

```js
db.users.createIndex({ email: 1 });
```

* `1` means **ascending order**.
* Now queries like `db.users.find({ email: "megha@example.com" })` will be much faster.
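
The difference between a collection scan and an index lookup can be illustrated with a small pure-Python analogy. This is only a sketch: MongoDB's indexes are B-tree structures maintained by the storage engine, whereas here a sorted list searched with `bisect` stands in for the index, and the names `collection_scan`, `build_index`, and `index_lookup` are hypothetical.

```python
from bisect import bisect_left

# Toy "collection": a list of documents in insertion order.
users = [{"_id": i, "email": f"user{i:04d}@example.com"} for i in range(1000)]

def collection_scan(docs, email):
    """Check documents one by one until a match is found (O(n))."""
    examined = 0
    for doc in docs:
        examined += 1
        if doc["email"] == email:
            return doc, examined
    return None, examined

def build_index(docs, field):
    """Sorted (value, position) pairs, standing in for a B-tree index."""
    return sorted((doc[field], pos) for pos, doc in enumerate(docs))

def index_lookup(docs, index, value):
    """Binary-search the sorted keys, then fetch the document (O(log n))."""
    i = bisect_left(index, (value,))
    if i < len(index) and index[i][0] == value:
        return docs[index[i][1]]
    return None

email_index = build_index(users, "email")
doc, examined = collection_scan(users, "user0999@example.com")
print(examined)  # number of documents the scan had to examine
print(index_lookup(users, email_index, "user0999@example.com") == doc)
```

The scan has to touch every document before it reaches the last one, while the lookup needs only a handful of comparisons on the sorted keys. The price is that the index must be kept up to date on every write.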

---



7. Describe the stages of the MongoDB aggregation pipeline.

-> The aggregation pipeline stages work like an assembly line:

$match → $group → $project → $sort → ...

Each stage focuses on filtering, transforming, joining, or summarizing data — making MongoDB powerful for analytics and reporting.
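
As a rough illustration of the assembly-line idea, the sketch below writes a three-stage pipeline as the list of stage documents that would be passed to `collection.aggregate(...)`, and then emulates each stage in plain Python on a tiny in-memory list. The field names `region` and `sales` and the sample values are made up for the example; the emulation is not how the server executes a pipeline.

```python
from collections import defaultdict

orders = [
    {"region": "West", "sales": 120.0},
    {"region": "East", "sales": 80.0},
    {"region": "West", "sales": 30.0},
    {"region": "East", "sales": 10.0},
    {"region": "South", "sales": 55.0},
]

# The pipeline as it would be passed to collection.aggregate(pipeline)
pipeline = [
    {"$match": {"sales": {"$gte": 50}}},
    {"$group": {"_id": "$region", "total": {"$sum": "$sales"}}},
    {"$sort": {"total": -1}},
]

# Stage 1 ($match): keep only documents that satisfy the condition
matched = [o for o in orders if o["sales"] >= 50]

# Stage 2 ($group): one output document per region, with summed sales
totals = defaultdict(float)
for o in matched:
    totals[o["region"]] += o["sales"]
grouped = [{"_id": region, "total": total} for region, total in totals.items()]

# Stage 3 ($sort): order the groups by total, descending
result = sorted(grouped, key=lambda d: d["total"], reverse=True)
print(result)
```

Each stage consumes the output of the previous one, which is why filtering first keeps the later stages cheap.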

8. What is sharding in MongoDB? How does it differ from replication?

-> ###  **What is Sharding in MongoDB?**

**Sharding** is the process of **splitting large datasets** across multiple machines (called **shards**) to ensure **horizontal scalability**.

---

###  **Purpose of Sharding:**

* To handle **very large volumes of data**.
* To support **high-throughput** read/write operations.
* To **scale out** by adding more machines instead of upgrading a single server.

---

###  **How Sharding Works:**

MongoDB distributes data based on a **shard key**, which determines how data is split.

**Architecture Includes:**

1. **Shards**: Data-bearing nodes, each deployed as a replica set (required since MongoDB 3.6).
2. **Config Servers**: Store metadata and cluster configuration.
3. **Query Routers (`mongos`)**: Route queries to the correct shard(s).

---

###  **Replication vs Sharding:**

| Feature              | **Replication**                                         | **Sharding**                                  |
| -------------------- | ------------------------------------------------------- | --------------------------------------------- |
| **Purpose**          | High availability and redundancy                        | Horizontal scaling of data and workload       |
| **Data Copy**        | Copies same data to multiple nodes                      | Splits data across nodes                      |
| **Data Location**    | Each node has **full copy** of data                     | Each node holds **a portion** of data         |
| **Failover Support** | Yes (via replica sets and elections)                    | Not the goal; focuses on scale                |
| **Used For**         | Backup, high availability, disaster recovery            | Large datasets, high-performance applications |
| **Can Combine?**     | Yes: each shard is a replica set (HA + scale)           | Yes: common in production environments        |
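
To show what "each node holds a portion of the data" means, here is a toy router that assigns documents to shards by hashing the shard key. It is a simplified sketch under stated assumptions: the shard names are hypothetical, and MongoDB does not use MD5-modulo routing; it divides the (optionally hashed) key space into ranges and balances those across shards.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def route(shard_key_value, shards=SHARDS):
    """Pick a shard deterministically from a hash of the shard key value."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

# Distribute 300 customer IDs and record where each one lands.
placement = {}
for customer_id in range(1, 301):
    placement.setdefault(route(customer_id), []).append(customer_id)

for shard in SHARDS:
    print(shard, len(placement.get(shard, [])))
```

With replication, all three nodes would hold all 300 documents; here each document lives on exactly one shard, and the same key always routes to the same shard.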





9. What is PyMongo, and why is it used?

-> PyMongo is the official Python driver for working with MongoDB.

It allows Python applications to:

* Connect to MongoDB
* Perform CRUD operations (Create, Read, Update, Delete)
* Execute queries and aggregations
* Manage collections and databases

10. What are the ACID properties in the context of MongoDB transactions?

-> ###  **ACID Properties in MongoDB Transactions**

Since **version 4.0** (replica sets; sharded clusters followed in 4.2), MongoDB supports multi-document **ACID transactions**, which ensure **reliable and consistent** data operations, just like in relational databases.

---

###  What Does ACID Stand For?

| Property            | Meaning                                                                                                           | MongoDB Behavior                                                                  |
| ------------------- | ----------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- |
| **A — Atomicity**   | All operations in a transaction are treated as a **single unit**: either **all succeed** or **none** are applied. | MongoDB rolls back the transaction if any part fails.                             |
| **C — Consistency** | Transactions always move the database from one **valid state to another**.                                        | MongoDB enforces schema validation rules and unique indexes if defined; it has no foreign keys, so referential integrity is left to the application. |
| **I — Isolation**   | **Concurrent transactions** do not interfere with each other. Intermediate states are **not visible** to others.  | Uses **snapshot isolation** to ensure no dirty reads.                             |
| **D — Durability**  | Once a transaction is **committed**, the changes are **permanently stored**, even in the case of a crash.         | Data is written to the **journal** and is recoverable.                            |
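
Atomicity is the easiest of the four properties to demonstrate in a few lines. The sketch below is a plain-Python analogy, not MongoDB's implementation: it applies a list of operations to a working copy and keeps the result only if every operation succeeds. The helper names `run_atomically`, `debit`, and `credit` are hypothetical.

```python
import copy

def run_atomically(state, operations):
    """Apply all operations to a working copy; keep it only if none fail."""
    working = copy.deepcopy(state)
    try:
        for op in operations:
            op(working)
    except Exception:
        return state, False   # roll back: the original state is untouched
    return working, True      # commit: the new state replaces the old one

def debit(account, amount):
    def op(state):
        if state[account] < amount:
            raise ValueError("insufficient funds")
        state[account] -= amount
    return op

def credit(account, amount):
    def op(state):
        state[account] += amount
    return op

accounts = {"alice": 100, "bob": 20}

# Both steps succeed, so the transfer is committed.
accounts, ok = run_atomically(accounts, [debit("alice", 70), credit("bob", 70)])
print(ok, accounts)

# The debit fails after the credit was applied, so nothing is kept.
accounts, ok = run_atomically(accounts, [credit("bob", 50), debit("alice", 70)])
print(ok, accounts)
```

In the second call the credit to `bob` has already been applied to the working copy when the debit fails, yet the returned state shows no trace of it. That is the "all or nothing" guarantee.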



11. What is the purpose of MongoDB’s explain() function?

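-> The `explain()` function in MongoDB shows how a query is executed. It reports the query plan the optimizer chose, including whether an index was used (`IXSCAN`) or the whole collection was scanned (`COLLSCAN`); with the `executionStats` verbosity it also reports how many documents and index keys were examined. This makes it the main tool for diagnosing slow queries and checking that indexes are actually used.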
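
In PyMongo, calling `explain()` on a cursor returns the plan as a dictionary. The sketch below shows one way such a result might be inspected. The two sample dictionaries are hand-written and heavily abbreviated, the helper `plan_stages` is hypothetical, and it follows only the single `inputStage` chain; real output has many more fields and its exact layout varies by server version.

```python
def plan_stages(plan):
    """Return stage names from the outermost stage down to the leaf."""
    stages = []
    while plan is not None:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")
    return stages

# Hand-written, abbreviated samples (not captured from a real server).
without_index = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}
with_index = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "email_1"},
        }
    }
}

for sample in (without_index, with_index):
    stages = plan_stages(sample["queryPlanner"]["winningPlan"])
    verdict = "full collection scan" if "COLLSCAN" in stages else "uses an index"
    print(stages, verdict)
```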

12. How does MongoDB handle schema validation?

-> MongoDB, while known for its flexible and schema-less nature, provides a powerful feature called **schema validation** to enforce data structure and integrity. This feature allows developers to define rules for the documents stored in a collection using the **JSON Schema standard**. Through schema validation, you can specify required fields, data types, and even value constraints such as minimum values or string patterns (e.g., for email validation). These rules are applied at the collection level using the `$jsonSchema` operator. When creating or updating a collection, you can set options like `validationLevel` (e.g., "strict" to apply rules to all documents) and `validationAction` (e.g., "error" to block invalid writes or "warn" to log them). This functionality helps maintain **data consistency and quality**, especially in applications where certain fields and formats must be enforced. Despite MongoDB’s flexibility, schema validation strikes a balance by offering structured enforcement without fully compromising the benefits of a dynamic schema model.
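
A validator of the kind described above might look like the following. This is a minimal sketch: the `users` fields and the email pattern are illustrative assumptions, and the dictionaries are only built here, not sent to a server. With PyMongo they could be passed as keyword arguments, for example `db.create_collection("users", **collection_options)`.

```python
# Validator document using the $jsonSchema operator.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email", "age"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
            "age": {"bsonType": "int", "minimum": 0},
        },
    }
}

# Options that accompany the validator when the collection is created.
collection_options = {
    "validator": user_validator,
    "validationLevel": "strict",   # check all inserts and updates
    "validationAction": "error",   # reject invalid writes instead of logging
}

schema = user_validator["$jsonSchema"]
print(schema["required"])
print(schema["properties"]["age"])
```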


13. What is the difference between a primary and a secondary node in a replica set?

-> In a MongoDB **replica set**, the difference between a **primary** and a **secondary** node lies in their **roles and responsibilities** within data replication and client interaction.

The **primary node** is the **main node** that **accepts all write operations**. Any changes made to the data—such as inserts, updates, or deletes—are first applied to the primary. It then records these operations in its **oplog** (operations log), which is used by the secondary nodes to replicate the data.

The **secondary nodes**, on the other hand, do **not accept write operations** from clients. Instead, they **continuously replicate** the primary's oplog and apply its operations to maintain an up-to-date copy of the data. With a suitable read preference (for example `secondary` or `secondaryPreferred`), secondaries can also serve **read operations**, and they provide **high availability** in case the primary fails.

If the primary node goes down, the replica set automatically **elects a new primary** from the secondaries to ensure continuous service — this process is known as **automatic failover**.

In summary, the **primary node handles writes and coordinates replication**, while **secondary nodes replicate data and provide redundancy and failover support**.
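
An election succeeds only when a majority of the voting members can reach each other, which determines how many failures a replica set can survive. The arithmetic can be sketched as follows; the function names are hypothetical.

```python
def majority(voting_members):
    """Votes needed to elect a primary."""
    return voting_members // 2 + 1

def fault_tolerance(voting_members):
    """Members that can be unavailable while a primary can still be elected."""
    return voting_members - majority(voting_members)

for n in (3, 4, 5):
    print(n, majority(n), fault_tolerance(n))
```

A four-member set tolerates no more failures than a three-member set, which is why replica sets are usually deployed with an odd number of voting members.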


14. What security mechanisms does MongoDB provide for data protection?

-> MongoDB’s security mechanisms include authentication, authorization, encryption, auditing, and network access controls, making it a secure choice for both cloud and on-premise deployments. These features help ensure your data is protected against unauthorized access, breaches, and compliance failures.

15. Explain the concept of embedded documents and when they should be used.

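-> In MongoDB, embedded documents are documents that are nested inside other documents as field values. This is a core feature of MongoDB’s document-oriented structure, allowing related data to be stored together in a single document. Embedding fits best when the nested data is read and written together with its parent and stays bounded in size, since a single document cannot exceed 16 MB; data that grows without limit or is shared by many parents is usually better kept in its own collection and referenced.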
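
The contrast between embedding and referencing can be seen with plain Python dictionaries shaped like MongoDB documents. The field names and values below are made up for illustration.

```python
# Embedded: customer details and line items live inside the order document.
order_embedded = {
    "_id": 1,
    "customer": {"name": "Megha", "email": "megha@example.com"},
    "items": [
        {"sku": "A-100", "qty": 2, "price": 9.5},
        {"sku": "B-200", "qty": 1, "price": 30.0},
    ],
}

# Referenced: the order stores only the customer's _id; the customer
# document lives in a separate collection and needs a second lookup.
customer = {"_id": 42, "name": "Megha", "email": "megha@example.com"}
order_referenced = {"_id": 1, "customerId": 42, "items": order_embedded["items"]}

# With embedding, one document read is enough to compute the order total.
total = sum(item["qty"] * item["price"] for item in order_embedded["items"])
print(total)
```

Nested fields are addressed with dot notation in queries, for example `{"customer.email": "megha@example.com"}`.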



16. What is the purpose of MongoDB’s $lookup stage in aggregation?

-> The purpose of MongoDB’s **`$lookup`** stage in aggregation is to **join data from two collections** by matching fields, similar to a **left outer join** in SQL. It allows you to combine documents from one collection with related documents from another collection based on a common field. This is especially useful when you need to enrich data or create more complex reports within a single aggregation query.

For example, if you have an `orders` collection and a `customers` collection, you can use `$lookup` to include customer details in each order document by matching `orders.customerId` with `customers._id`. The joined data appears as a **new array field** (e.g., `customerInfo`) in each document.

Overall, `$lookup` helps avoid multiple queries in application logic, simplifies data processing, and allows MongoDB to handle relational-style operations in a NoSQL environment.
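
The `orders` and `customers` example above corresponds to the stage document below. The plain-Python function after it is only an emulation of the left-outer-join behaviour on in-memory lists, with hypothetical sample data; it is not how the server evaluates `$lookup`.

```python
lookup_stage = {
    "$lookup": {
        "from": "customers",          # collection to join with
        "localField": "customerId",   # field in the orders documents
        "foreignField": "_id",        # field in the customers documents
        "as": "customerInfo",         # name of the new array field
    }
}

orders = [
    {"_id": 1, "customerId": "C1", "total": 50},
    {"_id": 2, "customerId": "C9", "total": 75},  # no matching customer
]
customers = [{"_id": "C1", "name": "Megha"}]

def emulate_lookup(left, right, local_field, foreign_field, as_field):
    """Left outer join: every left document is kept; matches go in an array."""
    joined = []
    for doc in left:
        matches = [r for r in right if r[foreign_field] == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined

spec = lookup_stage["$lookup"]
result = emulate_lookup(orders, customers, spec["localField"],
                        spec["foreignField"], spec["as"])
for doc in result:
    print(doc)
```

The order without a matching customer is still returned, with an empty `customerInfo` array, which is what makes this a left outer join rather than an inner join.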


17. What are some common use cases for MongoDB?

-> MongoDB is a document-based NoSQL database designed for flexibility, scalability, and performance, and it is used wherever applications need flexible data models, horizontal scaling, and high availability. Common use cases include content management systems, e-commerce platforms, IoT data collection, real-time analytics, and mobile app backends. Its flexible document structure makes it a good fit for cloud-based applications whose data model changes often.

18. What are the advantages of using MongoDB for horizontal scaling?

-> MongoDB scales horizontally through **sharding**, which distributes data across multiple servers (shards). Spreading both the data and the read/write load over many machines lets a cluster hold datasets larger than any single server could and serve more concurrent requests. One major benefit is **elastic scalability**: new shards can be added as the data or user base grows, which avoids expensive vertical upgrades to a single server. Each shard is also a **replica set**, so the cluster keeps high availability and fault tolerance when individual nodes fail. Finally, the **shard key** is chosen by the application (for example, user ID or region), so data can be distributed to match access patterns; a poorly chosen key, however, can concentrate load on one shard. Overall, horizontal scaling offers higher throughput, high availability, and cost-efficiency, at the price of some additional operational complexity.


19. How do MongoDB transactions differ from SQL transactions?

-> MongoDB transactions and SQL transactions both aim to ensure **data consistency** and **integrity**, but they differ in how they are implemented and used due to the underlying architecture of each database system.

SQL databases, being relational and traditionally ACID-compliant, are built around **multi-row, multi-table transactions**. They natively support complex transactions across multiple tables by default, allowing operations like `BEGIN`, `COMMIT`, and `ROLLBACK` to be executed easily and consistently. These transactions are core to SQL databases, providing strong guarantees for consistency, isolation, and durability even in complex relational environments.

MongoDB, being a NoSQL document-based database, originally focused on single-document operations, which are atomic by default. Starting from **MongoDB 4.0**, multi-document transactions were introduced, allowing MongoDB to perform ACID-compliant operations across multiple documents and collections — much like SQL. However, unlike in SQL, MongoDB’s transactions are typically less performant at scale, and are intended for specific use cases that truly require strict consistency.

In summary, SQL databases are designed around transactions and use them extensively, while MongoDB supports transactions but encourages simpler, atomic operations within documents to maintain performance and scalability. MongoDB’s transaction model is powerful but best used only when necessary, given its NoSQL foundations.


20. What are the main differences between capped collections and regular collections?

-> Capped collections and regular collections in MongoDB differ primarily in how they store and manage data, particularly regarding **size limits, order, and update rules**.

A **capped collection** is a fixed-size collection that maintains the insertion order of documents. When the allocated size limit is reached, it **automatically overwrites the oldest documents** with new ones, similar to a circular buffer. This makes capped collections ideal for **logging, caching, and real-time analytics**, where you only need the most recent data and don't require deleting or modifying older entries.

In contrast, a **regular collection** has **no size limit** and allows full CRUD (Create, Read, Update, Delete) operations. Documents in regular collections can grow dynamically, be updated freely, and are not automatically removed. This makes them suitable for most general-purpose applications where historical data is important and flexible updates are required.

In summary, capped collections are optimized for high-throughput insertions and fixed storage size, with limited update and delete capabilities, while regular collections offer more flexibility and control over data management.
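
The circular-buffer behaviour can be mimicked with a bounded `deque`. This is an analogy only: a real capped collection is limited by a size in bytes, with an optional maximum document count, and in PyMongo would be created with something like `db.create_collection("log", capped=True, size=1048576, max=3)`, where the collection name and numbers are illustrative.

```python
from collections import deque

# Keep at most 3 entries, mimicking the optional document-count limit.
log = deque(maxlen=3)
for n in range(1, 6):
    log.append({"seq": n, "msg": f"event {n}"})

# The two oldest entries were discarded; insertion order is preserved.
print([entry["seq"] for entry in log])
```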


21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

-> The **`$match`** stage in MongoDB’s aggregation pipeline is used to **filter documents** based on specified conditions, similar to the `WHERE` clause in SQL. It allows only those documents that meet the criteria to pass through to the next stage in the pipeline.

This stage is typically placed early in the pipeline to **reduce the number of documents** processed by subsequent stages, which improves performance. For example, if you're analyzing sales data and only want to work with records from a specific year or product category, `$match` will select only those relevant documents.

In summary, the purpose of `$match` is to **narrow down the dataset** to documents that satisfy particular conditions, making the aggregation process more efficient and focused.


22. How can you secure access to a MongoDB database?

-> Securing access to a MongoDB database is essential to protect data from unauthorized access, data breaches, or malicious activity. MongoDB provides several robust **security mechanisms** to safeguard your database:

1. **Authentication**: MongoDB supports user-based authentication, requiring users to provide valid credentials (username and password) to access the database. You can enable different authentication mechanisms like SCRAM, LDAP, or Kerberos depending on your environment.

2. **Authorization**: Once authenticated, MongoDB uses **role-based access control (RBAC)** to limit what each user can do. Roles define privileges such as read, write, or admin-level operations, helping you enforce the principle of least privilege.

3. **Network Access Control**: Restrict which hosts can reach the database. On self-managed deployments, bind `mongod` to specific interfaces with `net.bindIp`; on Atlas, use the IP access list. Firewalls and security groups should be configured to block all external access unless explicitly permitted.

4. **Encryption**: MongoDB supports **encryption at rest** and **TLS/SSL encryption in transit**. Encryption at rest (available in the Enterprise edition and on Atlas) protects stored data on disk, while TLS/SSL ensures data is encrypted while moving between clients and servers.

5. **Auditing**: MongoDB’s Enterprise edition includes auditing features that track database activity. This helps you monitor who accessed what data and when, useful for compliance and forensic analysis.

6. **Security Best Practices**: Disable unused features like the default localhost exception, avoid running MongoDB as a root user, update regularly, and always use strong passwords or certificate-based access.

In summary, securing MongoDB involves a layered approach: **authenticating users, restricting access, encrypting data, and monitoring activity**—all of which work together to ensure a strong and secure database environment.
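
On the client side, authentication and encryption in transit come together in the connection string. The sketch below builds one with percent-encoded credentials; the user name, password, and host are placeholders, and the helper `build_uri` is hypothetical. In practice the credentials would come from an environment variable or a secrets manager rather than from source code.

```python
from urllib.parse import quote_plus

def build_uri(username, password, host="localhost", port=27017, auth_db="admin"):
    """Build a MongoDB URI with percent-encoded credentials and TLS enabled."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/?authSource={auth_db}&tls=true"

# Characters such as '@', ':' and '/' must be escaped inside the URI.
uri = build_uri("app_user", "p@ss:word/1")
print(uri)
```

The resulting string can be passed to `MongoClient(uri)`; `authSource` names the database that holds the user, and `tls=true` requests an encrypted connection.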


23. What is MongoDB’s WiredTiger storage engine, and why is it important?

-> MongoDB’s **WiredTiger** storage engine is the default storage engine since version 3.2, and it plays a critical role in determining how data is stored, accessed, and managed on disk. WiredTiger is designed for **high performance, concurrency, and efficient storage**, making it a key factor in MongoDB’s scalability and speed.

One of the most important features of WiredTiger is its **document-level concurrency control**. Unlike the older MMAPv1 engine that used collection-level locks, WiredTiger allows multiple write operations to occur simultaneously on different documents within the same collection. This greatly improves throughput and performance, especially in write-heavy applications.

WiredTiger also provides **data compression**, which reduces storage costs and the amount of data read from and written to disk. Additionally, it supports **checkpointing** and **journaling**, which are essential for data durability and crash recovery.

In summary, WiredTiger is important because it enables MongoDB to handle modern workloads with better performance, finer-grained locking, and reduced storage requirements, all while maintaining data integrity and durability.
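
The effect of block compression on repetitive data can be demonstrated with the standard library. This is a rough sketch under stated assumptions: WiredTiger compresses BSON blocks and uses snappy by default (zlib and zstd are optional), whereas the example compresses a JSON string with zlib, so the ratio is only indicative.

```python
import json
import zlib

# Many documents share the same field names and similar values.
docs = [
    {"Row ID": i, "Ship Mode": "Standard Class", "Region": "West",
     "Category": "Office Supplies"}
    for i in range(1000)
]
raw = json.dumps(docs).encode("utf-8")
compressed = zlib.compress(raw)

print(len(raw), len(compressed))            # bytes before and after
print(zlib.decompress(compressed) == raw)   # compression is lossless
```

Documents in a collection tend to repeat field names and common values, which is why compression pays off for this kind of data.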



In [35]:
!pip install pymongo



In [3]:
from pymongo import MongoClient

# Connect to your local MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Access your database and collection
db = client['superstore']
orders = db['orders']

# Print a sample document
print(orders.find_one())


{'_id': ObjectId('68824eac63303348430ddffb'), 'Row ID': 1, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-BO-10001798', 'Category': 'Furniture', 'Sub-Category': 'Bookcases', 'Product Name': 'Bush Somerset Collection Bookcase', 'Sales': 261.96, 'Quantity': 2, 'Discount': 0, 'Profit': 41.9136}


 1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB

In [16]:
import pandas as pd
df = pd.read_csv("superstore.csv.csv")
df.head()


Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136
1,2,CA-2016-152156,11/8/2016,11/11/2016,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582
2,3,CA-2016-138688,6/12/2016,6/16/2016,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62,2,0.0,6.8714
3,4,US-2015-108966,10/11/2015,10/18/2015,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,5,0.45,-383.031
4,5,US-2015-108966,10/11/2015,10/18/2015,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368,2,0.2,2.5164


In [18]:
import pandas as pd
from pymongo import MongoClient

# Step 1: Load the CSV file (must be in same folder as this script or notebook)
csv_file = "superstore.csv.csv"
df = pd.read_csv(csv_file)

# Step 2: Connect to MongoDB (local instance)
client = MongoClient("mongodb://localhost:27017/")

# Step 3: Create/use database and collection
db = client["superstore"]           # Database
collection = db["orders"]           # Collection

# Step 4: Convert DataFrame to list of dictionaries
data = df.to_dict(orient="records")

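# Step 5: Insert into MongoDB
# Note: insert_many appends, so re-running this cell loads every row again.
# Call collection.drop() first if the collection should hold exactly one copy.
collection.insert_many(data)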

# Step 6: Confirm upload
print(f"✅ Inserted {len(data)} documents into MongoDB!")


✅ Inserted 9994 documents into MongoDB!
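
The load above stores `Order Date` and `Ship Date` as plain strings, so they cannot be compared or sorted as dates inside MongoDB. One possible refinement, sketched here on a single hand-written record, is to convert them to `datetime` objects before inserting; PyMongo stores `datetime` values as BSON dates. The helper `parse_dates` is hypothetical, and the month/day/year format is inferred from the sample rows.

```python
from datetime import datetime

DATE_FIELDS = ("Order Date", "Ship Date")

def parse_dates(record, fmt="%m/%d/%Y"):
    """Return a copy of the record with date strings converted to datetime."""
    converted = dict(record)
    for field in DATE_FIELDS:
        converted[field] = datetime.strptime(record[field], fmt)
    return converted

sample = {"Row ID": 1, "Order Date": "11/8/2016", "Ship Date": "11/11/2016"}
doc = parse_dates(sample)
print(doc["Ship Date"] - doc["Order Date"])  # shipping delay as a timedelta
```

Applied to the full dataset, this would be `data = [parse_dates(r) for r in df.to_dict(orient="records")]` before the `insert_many` call.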


 2. Retrieve and print all documents from the Orders collection.


In [19]:
from pymongo import MongoClient

# Step 1: Connect to local MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Step 2: Access database and collection
db = client["superstore"]
orders = db["orders"]

# Step 3: Retrieve and print all documents
for doc in orders.find():
    print(doc)


{'_id': ObjectId('688254d763303348430e0706'), 'Row ID': 1, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-BO-10001798', 'Category': 'Furniture', 'Sub-Category': 'Bookcases', 'Product Name': 'Bush Somerset Collection Bookcase', 'Sales': 261.96, 'Quantity': 2, 'Discount': 0, 'Profit': 41.9136}
{'_id': ObjectId('688254d763303348430e0707'), 'Row ID': 2, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-CH-10000454', 'Category': 'Furniture', 'Sub-C

[Output truncated: Jupyter stopped sending output because the IOPub data rate limit was exceeded.]




{'_id': ObjectId('688257b8e0aee02cc686b816'), 'Row ID': 9941, 'Order ID': 'CA-2016-169824', 'Order Date': '12/12/2016', 'Ship Date': '12/17/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'NS-18640', 'Customer Name': 'Noel Staavos', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'New York City', 'State': 'New York', 'Postal Code': 10009, 'Region': 'East', 'Product ID': 'OFF-AR-10000462', 'Category': 'Office Supplies', 'Sub-Category': 'Art', 'Product Name': 'Sanford Pocket Accent Highlighters', 'Sales': 11.2, 'Quantity': 7, 'Discount': 0.0, 'Profit': 4.816}
{'_id': ObjectId('688257b8e0aee02cc686b817'), 'Row ID': 9942, 'Order ID': 'CA-2017-164028', 'Order Date': '11/24/2017', 'Ship Date': '11/30/2017', 'Ship Mode': 'Standard Class', 'Customer ID': 'JL-15835', 'Customer Name': 'John Lee', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'San Francisco', 'State': 'California', 'Postal Code': 94122, 'Region': 'West', 'Product ID': 'TEC-AC-10001772', 'Category': 

 3. Count and display the total number of documents in the Orders collection.

In [20]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Access the database and collection
db = client["superstore"]
collection = db["orders"]

# Count documents
total_docs = collection.count_documents({})
print(f"📦 Total documents in 'orders' collection: {total_docs}")


📦 Total documents in 'orders' collection: 19988


4. Write a query to fetch all orders from the "West" region.

In [22]:
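# `collection` is the orders collection opened in the previous cell.
# This counts the West-region orders; collection.find({"Region": "West"})
# would return the matching documents themselves.
count = collection.count_documents({"Region": "West"})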
print(f"🧾 Total West region orders: {count}")


🧾 Total West region orders: 6406


 5. Write a query to find orders where Sales is greater than 500.

In [24]:
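# Count orders whose Sales value is greater than 500; collection.find(...)
# with the same filter would return the matching documents themselves.
count = collection.count_documents({"Sales": {"$gt": 500}})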
print(f"📊 Total orders with Sales > 500: {count}")


📊 Total orders with Sales > 500: 2324


6. Fetch the top 3 orders with the highest Profit.

In [25]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Access database and collection
db = client["superstore"]
collection = db["orders"]

# Query: Sort by Profit in descending order and limit to top 3
top_profit_orders = collection.find().sort("Profit", -1).limit(3)

# Print the top 3 orders
for order in top_profit_orders:
    print(order)


{'_id': ObjectId('688257b7e0aee02cc686abec'), 'Row ID': 6827, 'Order ID': 'CA-2016-118689', 'Order Date': '10/2/2016', 'Ship Date': '10/9/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'TC-20980', 'Customer Name': 'Tamara Chand', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Lafayette', 'State': 'Indiana', 'Postal Code': 47905, 'Region': 'Central', 'Product ID': 'TEC-CO-10004722', 'Category': 'Technology', 'Sub-Category': 'Copiers', 'Product Name': 'Canon imageCLASS 2200 Advanced Copier', 'Sales': 17499.95, 'Quantity': 5, 'Discount': 0.0, 'Profit': 8399.976}
{'_id': ObjectId('688254d963303348430e21b0'), 'Row ID': 6827, 'Order ID': 'CA-2016-118689', 'Order Date': '10/2/2016', 'Ship Date': '10/9/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'TC-20980', 'Customer Name': 'Tamara Chand', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Lafayette', 'State': 'Indiana', 'Postal Code': 47905, 'Region': 'Central', 'Product ID': 'TEC-CO-10004722', 'Category

7. Update all orders with Ship Mode as "First Class" to "Premium Class."

In [26]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Access the database and collection
db = client["superstore"]
collection = db["orders"]

# Update all documents where Ship Mode is "First Class"
result = collection.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)

# Print result summary
print(f"✅ Matched: {result.matched_count}, Modified: {result.modified_count}")


✅ Matched: 3076, Modified: 3076


 8.  Delete all orders where Sales is less than 50.

In [27]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Access the database and collection
db = client["superstore"]
collection = db["orders"]

# Delete all documents where Sales < 50
result = collection.delete_many({"Sales": {"$lt": 50}})

# Output how many were deleted
print(f"🗑️ Deleted {result.deleted_count} orders with Sales < 50")


🗑️ Deleted 9698 orders with Sales < 50


9. Use aggregation to group orders by Region and calculate total sales per region.

In [28]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Access database and collection
db = client["superstore"]
collection = db["orders"]

# Aggregation pipeline
pipeline = [
    {
        "$group": {
            "_id": "$Region",                 # Group by Region
            "total_sales": {"$sum": "$Sales"} # Sum the Sales field
        }
    },
    {
        "$sort": {"total_sales": -1}           # Optional: sort by total sales descending
    }
]

# Run the aggregation
results = collection.aggregate(pipeline)

# Display results
for region in results:
    print(f"📍 Region: {region['_id']} — 💰 Total Sales: {region['total_sales']:.2f}")


📍 Region: West — 💰 Total Sales: 1389373.24
📍 Region: East — 💰 Total Sales: 1302275.41
📍 Region: Central — 💰 Total Sales: 959223.69
📍 Region: South — 💰 Total Sales: 752046.62


10. Fetch all distinct values for Ship Mode from the collection.

In [31]:
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Access the database and collection
db = client["superstore"]
collection = db["orders"]

# Get all distinct Ship Mode values
distinct_modes = collection.distinct("Ship Mode")

# Print the results
print("📦 Distinct Ship Modes:")
for mode in distinct_modes:
    print(f"• {mode}")


📦 Distinct Ship Modes:
• Premium Class
• Same Day
• Second Class
• Standard Class


11. Count the number of orders for each category.

In [34]:


from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Access database and collection
db = client["superstore"]
collection = db["orders"]

# Aggregation pipeline
pipeline = [
    {
        "$group": {
            "_id": "$Category",          # Group by Category
            "order_count": {"$sum": 1}   # Count each document (order)
        }
    },
    {
        "$sort": {"order_count": -1}    # Optional: sort by count descending
    }
]

# Run the aggregation
results = collection.aggregate(pipeline)

# Display results
print("🗂️ Order Count per Category:")
for category in results:
    print(f"• {category['_id']}: {category['order_count']} orders")

🗂️ Order Count per Category:
• Office Supplies: 4152 orders
• Furniture: 3146 orders
• Technology: 2992 orders
