# MongoDB

Q1. What are the key differences between SQL and NoSQL databases?

->  The key differences between **SQL (Structured Query Language)** and **NoSQL (Not Only SQL)** databases lie in how they structure, store, and query data. Here's a breakdown across several core areas:

---

### 1. **Data Model**

* **SQL Databases (Relational)**:

  * Use **tables** with rows and columns.
  * Data is highly **structured** and follows a strict schema.
  * Relationships between tables are defined using **foreign keys**.

* **NoSQL Databases (Non-relational)**:

  * Use flexible data models:

    * **Document-based** (e.g., MongoDB)
    * **Key-Value** (e.g., Redis)
    * **Column-family** (e.g., Cassandra)
    * **Graph-based** (e.g., Neo4j)
  * Schema-less or dynamic schemas — better for **unstructured or semi-structured** data.

---

### 2. **Schema**

* **SQL**:

  * **Predefined schema**; changing it requires migrations.
  * Data must conform to the schema (e.g., data types, constraints).

* **NoSQL**:

  * **Flexible schema**; allows for varied data formats in the same collection or table.
  * Easier to adapt to changes in data structure.

---

### 3. **Scalability**

* **SQL**:

  * Typically **scaled vertically** by increasing server capacity (CPU, RAM).
  * Horizontal scaling is possible (read replicas, sharding) but usually takes more effort.

* **NoSQL**:

  * **Horizontally scalable** — add more servers to handle load.
  * Designed for distributed environments and large-scale data.

---

### 4. **Query Language**

* **SQL**:

  * Use standard **SQL** for queries (e.g., `SELECT`, `JOIN`, `WHERE`).
  * Powerful for complex queries and data relationships.

* **NoSQL**:

  * Use different querying mechanisms depending on the type (e.g., JSON-based queries in MongoDB).
  * Often optimized for **specific use cases**, not general-purpose querying.

---

### 5. **Transactions**

* **SQL**:

  * Support **ACID** (Atomicity, Consistency, Isolation, Durability) properties.
  * Reliable for applications needing complex transactions (e.g., banking).

* **NoSQL**:

  * Tend to favor **BASE** (Basically Available, Soft state, Eventually consistent).
  * Some modern NoSQL databases offer ACID compliance (e.g., MongoDB with multi-document transactions).

---

### 6. **Use Cases**

* **SQL**:

  * Ideal for applications needing structured data and complex queries (e.g., CRM systems, financial applications).
  * Preferred when data integrity is critical.

* **NoSQL**:

  * Great for large-scale data, real-time analytics, IoT, content management, or applications with changing data models (e.g., social media platforms, recommendation engines).

---

### 7. **Examples**

* **SQL Databases**:

  * MySQL, PostgreSQL, Oracle, Microsoft SQL Server

* **NoSQL Databases**:

  * MongoDB (Document), Cassandra (Column), Redis (Key-Value), Neo4j (Graph)

---

### Summary Table:

| Feature        | SQL                            | NoSQL                               |
| -------------- | ------------------------------ | ----------------------------------- |
| Data Model     | Relational (tables)            | Document, Key-Value, Graph, Column  |
| Schema         | Fixed, predefined              | Dynamic, flexible                   |
| Scalability    | Vertical                       | Horizontal                          |
| Query Language | SQL                            | Varies by DB type                   |
| Transactions   | Full ACID support              | Often BASE; some offer ACID         |
| Best for       | Structured data, relationships | Big data, fast changes, scalability |
| Examples       | MySQL, PostgreSQL              | MongoDB, Cassandra, Redis           |

---


Q2. What makes MongoDB a good choice for modern applications?

->  MongoDB is a popular **NoSQL, document-oriented database** that has become a strong choice for modern applications due to its **flexibility, scalability, and developer-friendly features**. Here's a breakdown of what makes it a good fit:

---

### 🔑 1. **Flexible Schema (Schema-less Design)**

* Documents are stored in **BSON (Binary JSON)** format.
* Each document can have a different structure — ideal for applications where data structures evolve quickly (e.g. startups, agile development).
* No need for rigid schema definitions like in SQL databases.

**✅ Useful for:** Rapid prototyping, frequent updates to the data model.

---

### ⚡ 2. **High Performance at Scale**

* MongoDB is designed for **high throughput and low latency**.
* Supports **horizontal scaling** through **sharding** (distributes data across multiple machines).
* Good for big data and real-time applications.

**✅ Useful for:** Web apps, mobile backends, real-time analytics.

---

### 🌐 3. **JSON-Like Documents**

* Uses **documents (BSON)** which map naturally to modern application code (especially in JavaScript/Node.js, Python, etc.).
* Reduces impedance mismatch between database and application objects.

**✅ Useful for:** Full-stack JavaScript applications (e.g., MERN stack — MongoDB, Express, React, Node.js).

---

### 🔁 4. **Powerful Query Language**

* Rich, expressive query capabilities:

  * Nested queries
  * Filtering
  * Aggregations (via the **Aggregation Pipeline**)
  * Full-text search
* Optional indexing for performance tuning.

**✅ Useful for:** Reporting, dashboards, complex data operations.

---

### 🔒 5. **Built-in Features for Modern Needs**

* **Horizontal scaling** (sharding)
* **Replication** for high availability
* **Change streams** for real-time data updates (e.g., live notifications)
* **Atlas**: Fully managed cloud version with easy scaling, backup, and monitoring.

**✅ Useful for:** Cloud-native apps, microservices, and global deployments.

---

### 🔄 6. **ACID Transactions Support**

* MongoDB 4.0+ supports **multi-document ACID transactions** on replica sets, similar to relational databases; 4.2+ extends them to sharded clusters.
* Useful when you need strong consistency across multiple documents or collections.

**✅ Useful for:** Financial apps, order processing, or anywhere atomicity is required.

---

### 🛠️ 7. **Rich Ecosystem and Tooling**

* Tools like:

  * **MongoDB Compass** (GUI for querying and visualization)
  * **MongoDB Atlas** (managed DBaaS)
  * **Drivers for most programming languages**
* Integration with popular frameworks and cloud platforms.

---

### ✅ Summary: When MongoDB Is a Good Fit

| Situation / Use Case                      | Why MongoDB Works Well                |
| ----------------------------------------- | ------------------------------------- |
| Rapid development & changing requirements | Flexible schema, easy iteration       |
| Real-time applications                    | Fast queries, change streams          |
| Large-scale web or mobile apps            | Horizontal scaling, high availability |
| JSON-heavy applications (e.g., APIs)      | Natural document model, fast I/O      |
| Distributed or cloud-native systems       | Sharding, replication, cloud support  |
| Analytics or reporting                    | Aggregation pipeline, indexing        |

---


Q3. Explain the concept of collections in MongoDB.

-> In **MongoDB**, a **collection** is the equivalent of a **table** in relational databases. It is a **group of related documents** that are stored together in the database.

Here’s a clear breakdown of the concept:

---

### 📘 What Is a Collection?

* A **collection** is a **container for documents**.
* Documents in a collection are stored in **BSON format** (a binary representation of JSON).
* Collections **do not enforce a schema** by default, so each document can have a different structure (fields, data types, etc.). Optional schema validation rules can be added when a stricter structure is needed.

---

### 🧩 Key Characteristics of Collections

| Feature              | Description                                                                           |
| -------------------- | ------------------------------------------------------------------------------------- |
| **Schema-less**      | Documents in the same collection can have different fields and data types.            |
| **Logical grouping** | Collections logically group related data (e.g., all "users" in a `users` collection). |
| **Indexed**          | Collections can have indexes for faster search and query performance.                 |
| **Dynamic creation** | A collection is created automatically when its first document is inserted.            |

---

### 📂 Example

Let’s say you have a **`users`** collection. It might contain documents like:

```js
// Document 1
{
  "_id": ObjectId("..."),
  "name": "Alice",
  "email": "alice@example.com",
  "age": 30
}

// Document 2
{
  "_id": ObjectId("..."),
  "name": "Bob",
  "email": "bob@example.com",
  "joined": "2025-01-01",
  "isActive": true
}
```

* Both documents are in the **same collection**.
* They have different structures, and that’s completely fine in MongoDB.
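
As a rough illustration in Python, where a driver such as PyMongo represents documents as dictionaries, code that reads from such a collection usually has to tolerate missing fields. The `describe` helper below is hypothetical and not part of any MongoDB API; it only sketches the idea.

```python
# Two documents from the same hypothetical `users` collection,
# written as the plain dicts a Python driver would return.
users = [
    {"name": "Alice", "email": "alice@example.com", "age": 30},
    {"name": "Bob", "email": "bob@example.com",
     "joined": "2025-01-01", "isActive": True},
]

def describe(user):
    """Summarize a user document, tolerating fields that may be absent."""
    age = user.get("age", "unknown")       # Bob has no "age" field
    active = user.get("isActive", False)   # Alice has no "isActive" field
    return f"{user['name']}: age={age}, active={active}"

summaries = [describe(u) for u in users]
```

The flexibility moves some responsibility into the application: defaults such as `"unknown"` are a design choice made here for illustration.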

---

### 🛠️ Creating a Collection

You don’t always have to manually create a collection — MongoDB creates it when you insert the first document:

```js
db.users.insertOne({ name: "Charlie", email: "charlie@example.com" });
```

But you can also create it explicitly:

```js
db.createCollection("users");
```

---

### 📌 When to Use Separate Collections

Use different collections when:

* Documents represent different types of entities (e.g., `users` vs `orders`).
* The structure of data is significantly different.
* Access patterns (queries, indexing) are different.

---

### ✅ Summary

| Term           | MongoDB              | Relational Equivalent |
| -------------- | -------------------- | --------------------- |
| **Document**   | JSON-like object     | Row                   |
| **Collection** | Group of documents   | Table                 |
| **Database**   | Contains collections | Database              |

---

Q4. How does MongoDB ensure high availability using replication?

-> MongoDB ensures **high availability** through a feature called **replication**, implemented using a **replica set**.

---

## 🔁 What is a Replica Set?

A **replica set** is a group of MongoDB servers that maintain the **same data set**. It provides **automatic failover**, **redundancy**, and **data availability**.

### A typical replica set includes:

* **1 Primary node**
* **1 or more Secondary nodes**
* **(Optional) Arbiter node** – helps in elections but doesn't store data

---

## ⚙️ How It Works

### 1. **Primary Node**

* The only node that **receives write operations**.
* All **reads and writes** by default go to the primary.
* Keeps an **oplog** (operation log) of changes.

### 2. **Secondary Nodes**

* Continuously **replicate data** from the primary’s oplog.
* They do **not accept writes**; clients can read from them only when a **read preference** allows it.
* Can be **promoted** to primary if the current primary fails.

### 3. **Automatic Failover**

* If the **primary becomes unreachable**, an **election** is triggered among the secondaries.
* A new **primary** is elected automatically — no manual intervention needed.
* This ensures minimal downtime and **high availability**.

---

## 🔐 Example: Replica Set with 3 Nodes

| Node     | Role      | Behavior                             |
| -------- | --------- | ------------------------------------ |
| `Node A` | Primary   | Handles writes, replicates to others |
| `Node B` | Secondary | Replicates the primary's oplog       |
| `Node C` | Secondary | Replicates the primary's oplog       |

If `Node A` fails:

* An election occurs.
* Either `Node B` or `Node C` becomes the new primary.
* The application continues running after a brief interruption while the election completes.
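
The failover above can be sketched as a toy election in Python. This is a deliberately simplified model, not MongoDB's actual algorithm: real elections use a consensus protocol with terms, priorities, and heartbeats. The `elect_primary` function and the `last_applied` field are assumptions made for illustration.

```python
def elect_primary(members):
    """Pick a new primary from reachable members, or None without a majority.

    Simplified model: real elections also consider priorities, terms, and votes.
    """
    reachable = [m for m in members if m["reachable"]]
    # A strict majority of all members must be reachable to elect anyone.
    if len(reachable) <= len(members) // 2:
        return None
    # Prefer the member whose copy of the data is most up to date.
    return max(reachable, key=lambda m: m["last_applied"])["name"]

replica_set = [
    {"name": "Node A", "reachable": False, "last_applied": 120},  # failed primary
    {"name": "Node B", "reachable": True, "last_applied": 120},
    {"name": "Node C", "reachable": True, "last_applied": 118},
]
new_primary = elect_primary(replica_set)
```

The majority check is why replica sets are usually deployed with an odd number of voting members: with only one of three nodes reachable, no primary can be elected.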

---

## 🔄 Replication Mechanics

* **Oplog** (operation log):

  * A special capped collection that records all changes on the primary.
  * Secondaries replay the oplog to apply changes in the same order.

---

## 💡 Benefits of Replication in MongoDB

| Benefit               | Description                                                  |
| --------------------- | ------------------------------------------------------------ |
| **High Availability** | Automatic failover ensures minimal downtime                  |
| **Redundancy**        | Copies of data are stored across multiple nodes              |
| **Disaster Recovery** | Can recover from hardware or network failures                |
| **Read Scalability**  | Reads can be distributed to secondaries (with configuration) |

---

## 🛠️ Optional Features for Replication

* **Read Preferences**: Route reads to secondaries for load balancing, at the cost of possibly reading slightly stale data.
* **Write Concern**: Control how many nodes must acknowledge writes for durability.
* **Arbiters**: Lightweight nodes that participate in elections but don’t store data.
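
Read preference and write concern are commonly set through connection string options. The sketch below builds such a string in Python; `replicaSet`, `readPreference`, and `w` are standard connection string options, while the host names and the replica set name `rs0` are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical host names and replica set name, for illustration only.
hosts = [
    "node-a.example.com:27017",
    "node-b.example.com:27017",
    "node-c.example.com:27017",
]
options = {
    "replicaSet": "rs0",
    "readPreference": "secondaryPreferred",  # read from a secondary when one is available
    "w": "majority",                         # wait for a majority to acknowledge writes
}
uri = "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)
```

A string like this could then be passed to PyMongo's `MongoClient`.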

---

## ✅ Summary

MongoDB ensures high availability by:

* Using **replica sets** with automatic **failover**.
* Keeping **multiple copies** of the data, updated in near real time through asynchronous replication.
* Supporting **self-healing** via election of a new primary if the old one fails.



Q5. What are the main benefits of MongoDB Atlas?

-> **MongoDB Atlas** is the **fully managed cloud version** of MongoDB, designed to simplify database deployment, scaling, and management across major cloud providers (AWS, Azure, GCP). It offers many benefits that are especially valuable for modern, cloud-native applications.

Here are the **main benefits of using MongoDB Atlas**:

---

## 🚀 1. **Fully Managed Service**

* No need to worry about infrastructure, setup, patching, upgrades, or backups — Atlas handles it all.
* Saves time and reduces DevOps overhead.

**✅ Benefit:** Focus on building applications, not managing databases.

---

## 🌍 2. **Multi-Cloud & Global Clusters**

* Deploy across **AWS, Azure, or Google Cloud** — or even span multiple cloud providers.
* Support for **global clusters**: store data close to users for low-latency access.

**✅ Benefit:** Global availability, vendor flexibility, and reduced latency.

---

## 📈 3. **Automatic Scalability**

* **Vertical scaling**: Adjust CPU/RAM on-demand.
* **Horizontal scaling**: Add shards as needed for large datasets or high throughput.
* **Autoscaling**: Automatically scales storage and compute resources based on usage.

**✅ Benefit:** Effortless growth as your app scales.

---

## 🔐 4. **Built-in Security**

* End-to-end encryption (in-transit and at-rest).
* VPC peering, IP access lists, and private endpoints.
* Role-based access control (RBAC) and integration with identity providers (e.g., LDAP, Okta).

**✅ Benefit:** Enterprise-grade security without complex setup.

---

## 💾 5. **Automated Backups & Recovery**

* Continuous backups with point-in-time recovery.
* Snapshot-based backups for long-term data protection.

**✅ Benefit:** Data safety and easy disaster recovery.

---

## 📊 6. **Monitoring & Performance Tools**

* Real-time performance dashboards and metrics.
* Integrated **performance advisor** and **query profiler** to detect slow queries and suggest indexes.

**✅ Benefit:** Easy performance tuning and troubleshooting.

---

## 🔄 7. **Data Sync, Triggers, and Real-Time Features**

* **Change Streams**: React to real-time data changes.
* **Triggers**: Run serverless functions on document changes.
* **Data Lake**: Query data across multiple sources (Atlas, S3, etc.) using the MongoDB query language.

**✅ Benefit:** Build real-time, event-driven, or analytics-heavy applications more easily.

---

## 🧰 8. **Developer-Friendly Tools & Integrations**

* Atlas integrates with:

  * MongoDB Compass
  * MongoDB Realm (for mobile/backend sync)
  * CI/CD pipelines
  * Infrastructure-as-code (Terraform, CloudFormation)
* Native SDKs for JavaScript, Python, Java, Go, and more.

**✅ Benefit:** Fast integration into existing dev workflows and toolchains.

---

## 🛡️ 9. **High Availability by Default**

* Every Atlas cluster runs as a **replica set**, ensuring redundancy and failover.
* SLA-backed uptime depending on the tier (often >99.99%).

**✅ Benefit:** Minimal downtime, automatic recovery from failures.

---

## 💳 10. **Flexible Pricing**

* Free tier available (shared cluster with 512MB storage).
* Pay-as-you-go pricing model with predictable costs.
* Suitable for startups to enterprises.

**✅ Benefit:** Cost-effective for all stages of development.

---

## ✅ Summary: Why Use MongoDB Atlas?

| Feature                | Benefit                                     |
| ---------------------- | ------------------------------------------- |
| Fully managed          | No ops headaches                            |
| Cloud-agnostic         | Use AWS, GCP, or Azure                      |
| Autoscaling            | Scales compute and storage automatically    |
| Security & compliance  | Built-in security and regulatory compliance |
| Real-time features     | Triggers, change streams                    |
| High availability      | Automatic failover with replica sets        |
| Dev-friendly ecosystem | Integrates with your tools and languages    |
| Global deployments     | Low-latency access worldwide                |

---

Q6. What is the role of indexes in MongoDB, and how do they improve performance?

-> In **MongoDB**, **indexes** play a crucial role in **optimizing query performance** by allowing the database to **quickly locate and access the data** you're searching for, without scanning the entire collection.

Think of an index like the **index at the back of a book**: instead of flipping through every page to find a topic, you go straight to the pages it lists.

---

## ⚙️ How Indexes Work in MongoDB

When you run a query on a MongoDB collection:

* **Without an index**: MongoDB performs a **collection scan** — it checks every document in the collection.
* **With an index**: MongoDB uses the **index to jump directly** to the documents that match the query.
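
The difference can be sketched in plain Python. MongoDB's indexes are B-tree structures; here a sorted list searched with `bisect` stands in for one, so this is an analogy under that assumption rather than MongoDB's implementation. The function names are hypothetical.

```python
from bisect import bisect_left

users = [
    {"email": "carol@example.com", "name": "Carol"},
    {"email": "alice@example.com", "name": "Alice"},
    {"email": "bob@example.com", "name": "Bob"},
]

def collection_scan(docs, email):
    """Check every document; cost grows linearly with the collection."""
    examined = 0
    for doc in docs:
        examined += 1
        if doc["email"] == email:
            return doc, examined
    return None, examined

# A sorted list of (key, position) pairs stands in for the index.
index = sorted((doc["email"], pos) for pos, doc in enumerate(users))
keys = [key for key, _ in index]

def index_lookup(docs, email):
    """Binary-search the sorted keys, then fetch the matching document."""
    i = bisect_left(keys, email)
    if i < len(keys) and keys[i] == email:
        return docs[index[i][1]]
    return None

scanned_doc, examined = collection_scan(users, "bob@example.com")
indexed_doc = index_lookup(users, "bob@example.com")
```

Both calls find the same document, but the scan had to examine all three documents to reach it, while the lookup needed only a binary search over the sorted keys.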

---

## 🚀 Benefits of Indexes

| Benefit                       | Explanation                                                                    |
| ----------------------------- | ------------------------------------------------------------------------------ |
| **Improved Read Performance** | Speeds up queries by avoiding full collection scans.                           |
| **Efficient Sorting**         | Allows MongoDB to sort results using the index, without loading all documents. |
| **Faster Joins (`$lookup`)**  | An index on the joined collection's `foreignField` can significantly reduce lookup time. |
| **Support for Uniqueness**    | Enforce uniqueness on fields (e.g., unique email addresses).                   |
| **Improved Aggregations**     | Speeds up aggregation pipelines that filter or group by indexed fields.        |

---

## 🧠 Types of Indexes in MongoDB

| Index Type             | Description                                                                 |
| ---------------------- | --------------------------------------------------------------------------- |
| **Single Field Index** | Indexes a single field. Most basic and common.                              |
| **Compound Index**     | Indexes multiple fields in a specified order.                               |
| **Multikey Index**     | Created when you index a field that holds an array; MongoDB indexes each element of the array. |
| **Text Index**         | Enables full-text search on string content.                                 |
| **Geospatial Index**   | For location-based queries (e.g., near, within a radius).                   |
| **Hashed Index**       | Supports sharding by hashing field values.                                  |
| **Wildcard Index**     | Indexes all fields or unknown/dynamic fields — useful for flexible schemas. |

---

## 📌 Example: Basic Index Usage

### Sample Document:

```json
{
  "name": "Alice",
  "email": "alice@example.com",
  "age": 30
}
```

### Creating an Index:

```js
db.users.createIndex({ email: 1 });  // 1 for ascending order
```

### Query:

```js
db.users.find({ email: "alice@example.com" });
```

* Without an index: Full scan of every document.
* With index: MongoDB jumps straight to matching documents.

---

## 📊 Tools to Analyze Index Performance

* **Explain Plans (`.explain()`)**: Shows how MongoDB executes a query and whether it uses an index.

  ```js
  db.users.find({ email: "alice@example.com" }).explain("executionStats");
  ```

* **Atlas Performance Advisor**: Recommends indexes based on your workload.

---

## ⚠️ Considerations & Trade-offs

| Trade-off            | Description                                                              |
| -------------------- | ------------------------------------------------------------------------ |
| **Storage Overhead** | Indexes consume disk space.                                              |
| **Write Overhead**   | Every insert/update/delete must also update the index — can slow writes. |
| **Too Many Indexes** | May lead to inefficient query planner decisions and higher maintenance.  |

---

## ✅ Best Practices

* Index fields used frequently in **filters, sorts, and joins**.
* Use **compound indexes** to cover multiple query fields in order.
* Monitor performance and **remove unused indexes**.
* Use **covered queries** (where all needed fields are in the index) for maximum performance.

---

## 🔚 Summary

| Feature         | Role of Indexes in MongoDB                        |
| --------------- | ------------------------------------------------- |
| **Speed**       | Drastically improves query and read performance   |
| **Efficiency**  | Reduces CPU and memory usage for large queries    |
| **Sorting**     | Enables fast, index-based sorting                 |
| **Uniqueness**  | Enforces data integrity rules                     |
| **Scalability** | Essential for performant large-scale applications |

---

Q7. Describe the stages of the MongoDB aggregation pipeline.

-> The **MongoDB Aggregation Pipeline** is a powerful framework used for **data transformation and analysis**. It processes data through a sequence of **stages**, where each stage performs a specific operation on the input documents and passes the results to the next stage — much like a data processing pipeline.

---

## 🔄 Key Concept

* Think of the aggregation pipeline as a **series of filters or transformations** applied step-by-step to your documents.
* It’s used for **grouping, filtering, projecting, sorting, joining, and calculating aggregates** on the server, which often avoids pulling raw documents into application code.

---

## 📊 Common Aggregation Pipeline Stages

Here are the most frequently used stages:

---

### 1. **`$match`** – Filter Documents

* Filters documents to **pass only those that match the condition**.
* Uses the same query syntax as **`find()`**.

```js
{ $match: { status: "active" } }
```

✅ **Use early in the pipeline** to reduce data as soon as possible.

---

### 2. **`$project`** – Reshape the Output

* Used to **include, exclude, rename, or compute fields** in the output.

```js
{ $project: { name: 1, email: 1, fullName: { $concat: ["$firstName", " ", "$lastName"] } } }
```

---

### 3. **`$group`** – Aggregate Documents

* Groups documents by a key and **applies accumulators** (e.g., `$sum`, `$avg`, `$max`, `$push`, etc.).

```js
{ $group: { _id: "$department", totalEmployees: { $sum: 1 } } }
```

---

### 4. **`$sort`** – Sort Results

* Sorts the documents based on field(s).

```js
{ $sort: { createdAt: -1 } }  // Descending order
```

---

### 5. **`$limit` and `$skip`** – Pagination

* **`$limit`**: Restricts the number of documents that pass through.
* **`$skip`**: Skips a specified number of documents.

```js
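// Two consecutive stages in the pipeline array: skip the first 10 documents, then return the next 5.
{ $skip: 10 },
{ $limit: 5 }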
```
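
In application code, the two stages are usually derived from a page number. The helper below is a hypothetical sketch that returns them as the plain Python dictionaries PyMongo accepts in a pipeline list; `page_stages` is not a library function.

```python
def page_stages(page, page_size):
    """Return the $skip and $limit stages for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return [
        {"$skip": (page - 1) * page_size},
        {"$limit": page_size},
    ]

stages = page_stages(page=3, page_size=5)
```

Page 3 with 5 documents per page yields the same `$skip: 10` and `$limit: 5` stages as the example above.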

---

### 6. **`$unwind`** – Deconstruct Arrays

* Breaks up an **array field into multiple documents**, one per array element.

```js
{ $unwind: "$tags" }
```
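
To make the effect concrete, here is a pure-Python imitation of the default `$unwind` behavior on a list of dictionaries. It is an approximation for illustration: the `unwind` function is hypothetical, and it only covers array fields, where documents with an empty or missing array are dropped.

```python
def unwind(docs, field):
    """Yield one copy of each document per element of its array field."""
    for doc in docs:
        # Documents whose array is empty, missing, or None produce no output.
        for element in doc.get(field) or []:
            yield {**doc, field: element}

posts = [
    {"title": "Intro", "tags": ["mongodb", "nosql"]},
    {"title": "Empty", "tags": []},
]
unwound = list(unwind(posts, "tags"))
```

The first post becomes two documents, one per tag, and the post with no tags disappears from the output.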

---

### 7. **`$lookup`** – Perform Joins

* Performs a **left outer join** with another collection: each input document gains an array of the matching documents.

```js
{
  $lookup: {
    from: "orders",
    localField: "_id",
    foreignField: "customerId",
    as: "orders"
  }
}
```
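
The shape of the result can be imitated in plain Python. This sketch assumes simple equality matching between the two fields; the `lookup` function and the sample data are hypothetical and only mirror the stage's parameters.

```python
def lookup(docs, foreign_docs, local_field, foreign_field, as_field):
    """Attach to each document the foreign documents whose field matches."""
    joined = []
    for doc in docs:
        matches = [f for f in foreign_docs
                   if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})  # empty list when nothing matches
    return joined

customers = [{"_id": 1, "name": "Alice"}, {"_id": 2, "name": "Bob"}]
orders = [
    {"_id": 101, "customerId": 1, "total": 40},
    {"_id": 102, "customerId": 1, "total": 15},
]
result = lookup(customers, orders, "_id", "customerId", "orders")
```

Every customer appears in the result, including Bob, whose `orders` array is empty because no order references him.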

---

### 8. **`$addFields`** – Add Computed Fields

* Adds new fields or updates existing ones.

```js
{ $addFields: { yearJoined: { $year: "$joinDate" } } }
```

---

### 9. **`$count`** – Count Documents

* Outputs a count of the number of documents in the pipeline.

```js
{ $count: "totalActiveUsers" }
```

---

### 10. **`$facet`** – Multiple Pipelines in Parallel

* Runs **multiple sub-pipelines** over the same input documents within a single stage and returns their results in a single document.

```js
{
  $facet: {
    groupedByStatus: [ { $group: { _id: "$status", count: { $sum: 1 } } } ],
    recentDocs: [ { $sort: { createdAt: -1 } }, { $limit: 5 } ]
  }
}
```

---

## 📌 Example Aggregation Pipeline

```js
db.users.aggregate([
  { $match: { status: "active" } },
  { $group: { _id: "$role", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
]);
```

### This pipeline:

1. Filters for active users.
2. Groups them by role and counts each group.
3. Sorts the result by count in descending order.
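
The same three steps can be traced in plain Python over a small in-memory list, which may help when reasoning about what each stage passes on. The sample data is made up, and this is an imitation of the stages rather than how the server executes them.

```python
from collections import Counter

users = [
    {"name": "Alice", "role": "admin", "status": "active"},
    {"name": "Bob", "role": "editor", "status": "active"},
    {"name": "Carol", "role": "editor", "status": "inactive"},
    {"name": "Dave", "role": "editor", "status": "active"},
]

# $match: keep only active users.
active = [u for u in users if u["status"] == "active"]

# $group: count documents per role.
counts = Counter(u["role"] for u in active)

# $sort: order the groups by count, descending.
result = sorted(
    ({"_id": role, "count": n} for role, n in counts.items()),
    key=lambda group: group["count"],
    reverse=True,
)
```

Each step consumes only the output of the previous one, which is why filtering first keeps the later steps small.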

---

## ✅ Summary Table

| Stage        | Purpose                                |
| ------------ | -------------------------------------- |
| `$match`     | Filter documents                       |
| `$project`   | Select or transform fields             |
| `$group`     | Aggregate data (e.g., count, sum)      |
| `$sort`      | Order the results                      |
| `$limit`     | Limit the number of results            |
| `$skip`      | Skip documents (for pagination)        |
| `$unwind`    | Flatten arrays                         |
| `$lookup`    | Join data from another collection      |
| `$addFields` | Add or modify fields                   |
| `$facet`     | Run multiple sub-pipelines in parallel |
| `$count`     | Count documents                        |

---

Q8. What is sharding in MongoDB? How does it differ from replication?

-> **Sharding** and **replication** in MongoDB are both strategies for handling large-scale data and ensuring availability, but they serve very **different purposes**.

Let’s break them down clearly:

---

## 🌐 **What is Sharding in MongoDB?**

### ➤ **Definition:**

**Sharding** is the process of **horizontally partitioning** data across multiple servers, called **shards**, to support large datasets and high-throughput operations.

Each **shard** contains a subset of the total data. MongoDB uses a special **shard key** to determine which data belongs to which shard.

---

### 🔀 **How Sharding Works:**

1. **Data Distribution:**

   * You define a **shard key** (e.g., `userId`, `region`, or `timestamp`).
   * MongoDB splits the data into **chunks** based on this key.
   * Chunks are distributed across multiple shards.

2. **Cluster Components:**

   * **Shards:** Each shard holds a subset of the data and is typically deployed as a replica set.
   * **Query Router (mongos):** Routes client queries to the correct shard(s).
   * **Config Servers:** Store metadata and configuration info for the cluster.

3. **Automatic Load Balancing:**

   * As data grows, MongoDB automatically balances chunks across shards to maintain even load.
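
The routing idea can be sketched in Python with a hashed shard key. This is a simplification under stated assumptions: MongoDB assigns ranges of key values (chunks) to shards and can move chunks between them, whereas the sketch maps a hash directly to a shard with a modulo. The shard names and the `shard_for` function are hypothetical.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def shard_for(shard_key_value):
    """Map a shard key value to a shard by hashing it."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# Route 1000 user ids and record which shard each one lands on.
placement = {}
for user_id in range(1000):
    placement.setdefault(shard_for(user_id), []).append(user_id)
```

The same key always maps to the same shard, so a query that includes the shard key can be sent to one shard instead of all of them.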

---

### ✅ **Benefits of Sharding:**

| Benefit                    | Description                                   |
| -------------------------- | --------------------------------------------- |
| **Horizontal scalability** | Supports massive data volumes                 |
| **High throughput**        | Spreads read/write load across multiple nodes |
| **Big data support**       | Scales beyond single server limitations       |

---

## 🔁 **What is Replication in MongoDB?**

### ➤ **Definition:**

**Replication** is the process of **copying data** across multiple servers to ensure **high availability** and **data redundancy**.

MongoDB achieves this using **replica sets**.

---

### 🔄 **How Replication Works:**

1. **Primary Node:** Handles all writes and by default all reads.
2. **Secondary Nodes:** Copy the primary’s data and can serve reads if configured.
3. **Automatic Failover:** If the primary fails, a secondary is elected to take over.

---

### ✅ **Benefits of Replication:**

| Benefit               | Description                                   |
| --------------------- | --------------------------------------------- |
| **High availability** | Automatic failover if primary node goes down  |
| **Redundancy**        | Multiple copies protect against data loss     |
| **Read scalability**  | Distribute read operations across secondaries |

---

## 🔄 vs 🔀 **Key Differences: Sharding vs Replication**

| Feature                | **Sharding**                                       | **Replication**                                     |
| ---------------------- | -------------------------------------------------- | --------------------------------------------------- |
| **Goal**               | Scale data horizontally                            | Ensure high availability and redundancy             |
| **Data distribution**  | Each node holds a different subset of data         | All nodes have a full copy of the data              |
| **Scalability**        | Handles very large datasets                        | Limited by the capacity of a single node, since every member stores the full data set |
| **Failure handling**   | Not focused on availability (use with replication) | Automatic failover with replica sets                |
| **Read/Write Scaling** | Improves write and read throughput                 | Improves read throughput (if read from secondaries) |
| **Components**         | Shards, mongos, config servers                     | Primary, secondaries, arbiter (optional)            |

---

## 🔧 **Can You Use Both Together?**

Yes! **Sharded clusters often use replication within each shard** for both scalability **and** high availability.

✅ This setup:

* Uses **sharding** to distribute data.
* Uses **replication** to protect each shard from failure.

---

## 🧠 Quick Analogy:

* **Sharding** = Spreading books across different shelves to handle a growing library.
* **Replication** = Making copies of the same book so it’s always available even if one copy gets lost.

---

Q9. What is PyMongo, and why is it used?

-> **PyMongo** is the **official Python driver** for **MongoDB**. It allows Python applications to **connect to, interact with, and perform operations** on a MongoDB database.

---

## 📦 In Simple Terms:

PyMongo is a **library** that lets you do things like:

* Connect to a MongoDB server or cluster
* Insert, update, delete, and query documents
* Work with collections and databases
* Use MongoDB features like aggregation pipelines, indexes, and more

---

## 🧰 Why is PyMongo Used?

Here are the main reasons **PyMongo** is used in Python applications:

---

### ✅ 1. **Easy Integration with Python**

* PyMongo uses familiar **Python syntax and data types** (like dictionaries for documents).
* Ideal for developers building **Python backends**, scripts, or data pipelines.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["mydatabase"]
collection = db["users"]

collection.insert_one({"name": "Alice", "email": "alice@example.com"})
```

---

### ✅ 2. **Full Access to MongoDB Features**

* Supports all major MongoDB operations:

  * CRUD (Create, Read, Update, Delete)
  * Aggregation pipelines
  * Indexes
  * Transactions
  * Change streams
  * GridFS (for storing large files)

---

### ✅ 3. **Great for Data-Driven Applications**

* Works well for:

  * Web apps using frameworks like **Flask** or **Django**
  * Data analytics and ETL jobs
  * Prototyping apps that need flexible schemas

---

### ✅ 4. **MongoDB Atlas Compatibility**

* PyMongo can easily connect to **MongoDB Atlas**, the fully managed cloud version of MongoDB.

```python
client = MongoClient("mongodb+srv://<user>:<password>@cluster.mongodb.net")
```

---

### ✅ 5. **Strong Community and Documentation**

* As the **official MongoDB driver**, it’s well-supported and regularly updated.
* Comprehensive documentation available at:
  👉 [https://pymongo.readthedocs.io](https://pymongo.readthedocs.io)

---

## 🔄 Summary Table

| Feature                | PyMongo Advantage                          |
| ---------------------- | ------------------------------------------ |
| MongoDB Integration    | Official driver for full MongoDB support   |
| Python-Friendly        | Uses native Python data types (dict, list) |
| Flexible Data Handling | Works well with MongoDB's dynamic schema   |
| Rich Functionality     | Supports CRUD, aggregation, transactions   |
| Cloud-Ready            | Connects to MongoDB Atlas with ease        |

---

## 🧪 Example Use Case

Build a user login system:

* Store user records with salted password hashes (never plaintext passwords) in MongoDB.
* Use PyMongo in a Flask app to handle registration and login.
* The same design works for small projects and can grow with MongoDB's replication and sharding.
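
The registration and login steps above could be sketched roughly as follows. This is a minimal illustration using only Python's standard library; the helper names `make_user_doc` and `check_password`, the field names, and the iteration count are assumptions made for the example, not part of PyMongo.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def make_user_doc(email: str, password: str) -> dict:
    """Build the document to store: salt + PBKDF2 hash, never the raw password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return {"email": email, "salt": salt, "password_hash": digest}


def check_password(user_doc: dict, password: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), user_doc["salt"], ITERATIONS
    )
    return hmac.compare_digest(digest, user_doc["password_hash"])


user = make_user_doc("alice@example.com", "s3cret")
# With PyMongo you would then store it with: collection.insert_one(user)
```

At login, the app would fetch the document by email (for example with `find_one`) and pass it to `check_password`.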

---

Q10. What are the ACID properties in the context of MongoDB transactions?

-> In the context of **MongoDB transactions**, **ACID** properties ensure that **data remains accurate, consistent, and reliable**, even in the event of failures or concurrent operations.

Here’s what each **ACID** property means and how MongoDB supports it:

---

## 🔐 ACID in MongoDB

| **ACID Property**   | **What It Means**                                                                                     | **How MongoDB Supports It**                                                                                                                    |
| ------------------- | ----------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| **A – Atomicity**   | **All or nothing**: A transaction's operations either all succeed or all fail.                        | MongoDB transactions are atomic — if one operation in a multi-document transaction fails, the entire transaction is rolled back.               |
| **C – Consistency** | Data must move from one **valid state to another**. Rules (like constraints) must always be followed. | MongoDB maintains data integrity by ensuring that constraints, schemas (if used), and validation rules are respected during transactions.      |
| **I – Isolation**   | Concurrent transactions should **not interfere** with each other.                                     | MongoDB provides **snapshot isolation**, so each transaction sees a consistent snapshot of the data as it was at the start of the transaction. |
| **D – Durability**  | Once a transaction is committed, its changes are **permanently saved**, even after crashes.           | MongoDB writes transactions to the journal and uses write concern levels to ensure data is safely written to disk.                             |

---

## ✅ ACID Support in MongoDB

| Feature                      | Available Since                 |
| ---------------------------- | ------------------------------- |
| Single-document atomicity    | Since the beginning             |
| Multi-document transactions  | **MongoDB 4.0+ (replica sets)** |
| Sharded cluster transactions | **MongoDB 4.2+**                |

---

## 🧪 Example of ACID in Action

Imagine transferring money between two accounts:

```python
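from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # must be a replica set or sharded cluster
accounts = client["bank"]["accounts"]  # "bank" is an assumed database name
with client.start_session() as session:
    with session.start_transaction():  # commits on normal exit, aborts if an exception is raised
        accounts.update_one({"name": "Alice"}, {"$inc": {"balance": -100}}, session=session)
        accounts.update_one({"name": "Bob"}, {"$inc": {"balance": 100}}, session=session)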
```

If either update fails, the **entire transaction is rolled back**, ensuring **atomicity and consistency**.

---

## 🛡️ Write Concern & Read Concern in ACID

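* **Write Concern**: Controls how many replica set members must acknowledge a write, and whether it must reach the on-disk journal, before the write is reported as successful (for example `w: "majority"`, `j: true`). This governs **durability**.
* **Read Concern**: Controls which data a read may return (for example `"local"`, `"majority"`, or `"snapshot"`), which governs consistency and isolation when reading within a transaction.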

---

## 🚫 Without ACID?

Without ACID guarantees:

* Partial updates might corrupt data.
* Concurrent operations might interfere.
* Crashes could leave inconsistent states.

MongoDB’s ACID support ensures **safe and predictable operations**, especially important for **financial apps**, **inventory systems**, or **order processing**.

---

Q11.   What is the purpose of MongoDB’s explain() function?

->  Great question!

### 🔎 What is MongoDB’s `explain()` Function?

The **`explain()`** function in MongoDB is a **diagnostic tool** that provides detailed information about how MongoDB executes a query or an aggregation pipeline.

---

## 🎯 Purpose of `explain()`

* **Analyze query performance:** Understand how MongoDB processes a query.
* **Identify if indexes are used:** Check whether the query uses an index or performs a full collection scan.
* **View query plan details:** See the stages MongoDB uses internally to retrieve results.
* **Optimize queries:** Find bottlenecks or inefficient parts of a query.
* **Compare different query shapes or indexes:** Test how changes impact performance.

---

## 🛠 How to Use `explain()`

You attach `explain()` to a query, for example:

```js
db.users.find({ age: { $gt: 30 } }).explain();
```

Or with options to get more detail:

```js
db.users.find({ age: { $gt: 30 } }).explain("executionStats");
```

---

## 📊 Output Levels

* **`queryPlanner`** (default): Shows the query plan selected by MongoDB’s optimizer.
* **`executionStats`**: Adds detailed statistics about the actual execution of the query (e.g., number of documents scanned, keys examined).
* **`allPlansExecution`**: Includes info on all considered query plans and execution stats for each.

---

## 🔍 What `explain()` Tells You

* Whether an **index** was used or a **COLLSCAN** (collection scan) happened.
* Number of **documents examined** vs. number returned.
* Stages and operations performed internally (e.g., FETCH, IXSCAN).
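* The **winning plan** chosen by the optimizer and any rejected candidate plans.
* Execution time (`executionTimeMillis`) and work counters such as `totalKeysExamined` and `totalDocsExamined` (in `executionStats` mode).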
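
To make the list above concrete, here is a small Python sketch that walks a winning plan and reports whether a collection scan occurred. The `sample_explain` dictionary is hand-written and heavily abbreviated; it only assumes the classic `stage` / `inputStage` / `inputStages` nesting, and real output (for example from PyMongo's `collection.find(...).explain()`) contains many more fields and may be shaped differently on newer servers. The helper name `plan_stages` is hypothetical.

```python
def plan_stages(plan: dict) -> list:
    """Return stage names from the top of a winning plan down to its leaves."""
    stages = [plan["stage"]]
    children = plan.get("inputStages", [])
    if "inputStage" in plan:
        children = [plan["inputStage"]] + children
    for child in children:
        stages.extend(plan_stages(child))
    return stages


# Hand-written, abbreviated sample in the shape of explain() output.
sample_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "age_1"},
        }
    }
}

stages = plan_stages(sample_explain["queryPlanner"]["winningPlan"])
uses_collscan = "COLLSCAN" in stages
print(stages, "collection scan:", uses_collscan)
```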

---

## ✅ Why Use `explain()`?

| Benefit                          | Explanation                                          |
| -------------------------------- | ---------------------------------------------------- |
| **Query optimization**           | Improve query speed by understanding query execution |
| **Index effectiveness check**    | Verify if indexes are properly used                  |
| **Performance debugging**        | Identify slow queries and bottlenecks                |
| **Planning schema/index design** | Make informed decisions on indexes and schema        |

---

Q12.  How does MongoDB handle schema validation?

->  Great question! Here's how **MongoDB handles schema validation**:

---

## 📋 What is Schema Validation in MongoDB?

MongoDB is a **schema-less** (or schema-flexible) database by design, meaning it **does not require a fixed schema** for documents in a collection. However, to enforce some structure and data integrity, MongoDB provides **schema validation** rules at the collection level.

---

## ⚙️ How Schema Validation Works

* You define **validation rules** using **JSON Schema** (a standard for describing the structure and constraints of JSON documents).
* When documents are inserted or updated, MongoDB **validates** them against these rules.
* If a document **does not comply**, the operation is **rejected** (under the default `validationAction` of `error`).

---

## 🛠 Defining Validation Rules

You specify rules when creating or modifying a collection using the `validator` option.

### Example: Creating a Collection with Validation

```js
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "email", "age"],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        email: {
          bsonType: "string",
          pattern: "^.+@.+\\..+$",
          description: "must be a valid email address"
        },
        age: {
          bsonType: "int",
          minimum: 18,
          maximum: 120,
          description: "must be an integer in [18, 120]"
        }
      }
    }
  }
});
```

---

## 📝 Validation Levels and Actions

* **Validation Levels**:

  * **strict** (default): Validation applies to all inserts and all updates.
  * **moderate**: Validation applies to inserts and to updates of documents that already satisfy the rules; updates to existing documents that are already invalid are not checked.

* **Validation Actions**:

  * **error** (default): Reject documents that fail validation.
  * **warn**: Allow documents but log warnings.

Example modifying existing collection validation:

```js
db.runCommand({
  collMod: "users",
  validator: { /* validation rules */ },
  validationLevel: "moderate",
  validationAction: "warn"
});
```
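
The interaction between levels and actions can be summarised in a few lines of Python. This is only a model of the documented behaviour for illustration, not MongoDB code; the function names `is_validated` and `outcome` are made up for this sketch.

```python
def is_validated(level: str, is_insert: bool, existing_doc_valid: bool = True) -> bool:
    """Decide whether a write is checked against the collection's validator."""
    if level == "off":
        return False
    if level == "strict" or is_insert:
        return True
    # "moderate": updates are checked only if the stored document already passes.
    return existing_doc_valid


def outcome(action: str, doc_passes: bool) -> str:
    """What happens to a checked write under each validationAction."""
    if doc_passes:
        return "accepted"
    return "rejected" if action == "error" else "accepted, warning logged"


# An update to a legacy document that never matched the schema:
checked = is_validated("moderate", is_insert=False, existing_doc_valid=False)
print(checked)  # False: the update is not validated at all
```

This is why `moderate` is useful when adding rules to a collection that already contains non-conforming documents.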

---

## ✅ Benefits of Schema Validation in MongoDB

* Enforces **data integrity** without sacrificing flexibility.
* Helps **catch errors early** at the database layer.
* Allows gradual enforcement — no need for rigid schemas upfront.
* Works seamlessly with MongoDB’s flexible document model.

---

## Summary

| Feature          | Description                            |
| ---------------- | -------------------------------------- |
| Schema Type      | Flexible, but optionally enforce rules |
| Validation Rules | Defined using JSON Schema              |
| Enforcement      | On insert/update operations            |
| Levels           | strict or moderate                     |
| Actions          | error (reject) or warn (log but allow) |

---

Q13. What is the difference between a primary and a secondary node in a replica set?

-> Great question! Here's a clear breakdown of the difference between **primary** and **secondary nodes** in a MongoDB replica set:

---

## 🔑 Primary vs Secondary Nodes in a MongoDB Replica Set

| Aspect              | Primary Node                                                                        | Secondary Node                                                                                        |
| ------------------- | ----------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
| **Role**            | Handles all **write operations** and reads by default                               | Maintains a **copy** of the primary’s data and can serve read operations if configured                |
| **Data Updates**    | Accepts all **writes and updates** from clients                                     | Replicates data by **applying oplog entries** from the primary asynchronously                         |
| **Read Operations** | Default target for reads unless configured otherwise                                | Can serve reads if **read preference** is set to allow it (e.g., `secondary` or `secondaryPreferred`) |
| **Failover**        | If the primary fails, an **election occurs** and a secondary is promoted to primary | Participates in elections to become primary if needed                                                 |
| **Number of Nodes** | Only **one primary** per replica set                                                | Can have **multiple secondary** nodes                                                                 |
| **Writes**          | Only primary accepts writes                                                         | Secondary nodes are **read-only replicas**                                                            |
| **Oplog**           | Generates the oplog entries for replication                                         | Applies oplog entries to stay in sync                                                                 |

---

## 🛠 Summary

* The **primary** node is the **main** server that clients write to.
* The **secondary** nodes are **read-only copies** that replicate from the primary for redundancy and failover.
* If the primary goes down, a secondary is elected to become the new primary, ensuring **high availability**.
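
As a rough mental model of the points above, the toy Python classes below mimic a primary recording writes in an oplog and a secondary catching up later. Everything here (`Node`, `Primary`, `sync`) is invented for illustration and ignores elections, timestamps, and networking.

```python
class Node:
    """Toy replica set member: a dict of documents plus an applied-oplog position."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries applied so far


class Primary(Node):
    def __init__(self):
        super().__init__()
        self.oplog = []  # ordered log of write operations

    def write(self, _id, doc):
        self.data[_id] = doc
        self.oplog.append(("upsert", _id, doc))
        self.applied = len(self.oplog)


def sync(secondary: Node, primary: Primary) -> None:
    """Apply any oplog entries the secondary has not seen yet."""
    for _op, _id, doc in primary.oplog[secondary.applied:]:
        secondary.data[_id] = doc
    secondary.applied = len(primary.oplog)


primary, secondary = Primary(), Node()
primary.write(1, {"name": "Alice"})
lagging = dict(secondary.data)  # still empty: replication is asynchronous
sync(secondary, primary)
```

The `lagging` snapshot shows why reads from a secondary can briefly return stale data.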

---

Q14.  What security mechanisms does MongoDB provide for data protection?

-> Great question! MongoDB offers a robust set of **security mechanisms** to protect your data at rest, in transit, and control access. Here’s a breakdown of the key features:

---

## 🔐 MongoDB Security Mechanisms for Data Protection

### 1. **Authentication**

* Verifies the identity of users connecting to the database.
* Supports multiple mechanisms:

  * **SCRAM (Salted Challenge Response Authentication Mechanism)**: the default and most common.
  * **LDAP**: integrates with enterprise directory services (MongoDB Enterprise).
  * **x.509 Certificates**: for mutual TLS authentication.
  * **Kerberos**: for secure single sign-on (SSO) environments (MongoDB Enterprise).

---

### 2. **Authorization (Access Control)**

* Controls what authenticated users can do.
* Uses **Role-Based Access Control (RBAC)** to assign granular permissions.
* Built-in roles (e.g., read, readWrite, dbAdmin) and ability to create custom roles.
* Restricts access at database, collection, or even operation level.
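
For example, a read-only user could be described with a `createUser` command document. The sketch below only builds that document in Python; the user name, password, and database name are placeholders, and the helper `create_user_command` is hypothetical.

```python
def create_user_command(username: str, password: str, db_name: str, role: str) -> dict:
    """Build a createUser command document granting one built-in role on one database."""
    return {
        "createUser": username,  # the command name must be the first key
        "pwd": password,
        "roles": [{"role": role, "db": db_name}],
    }


# Hypothetical reporting user that may only read the "shop" database.
cmd = create_user_command("report_reader", "change-me", "shop", "read")
# With PyMongo, an administrator would run something like: client["shop"].command(cmd)
```

Granting the narrowest role that works (here `read` rather than `readWrite`) follows the principle of least privilege.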

---

### 3. **Encryption**

* **Encryption in Transit**:

  * Uses **TLS/SSL** to encrypt data moving between clients and servers.
  * Prevents eavesdropping and man-in-the-middle attacks.

* **Encryption at Rest**:

  * **Encrypted Storage Engines** available in MongoDB Enterprise.
  * Data files are encrypted on disk.
  * Also supports integration with external Key Management Systems (KMS) for key management.

---

### 4. **Auditing**

* Tracks database activity such as authentication attempts, CRUD operations, and administrative actions.
* Useful for compliance and forensic analysis.
* Configurable with filters to capture only relevant events; available in MongoDB Enterprise and MongoDB Atlas.

---

### 5. **Network Security**

* **IP access lists / Network Access Controls** restrict which clients can connect (on self-managed servers, the `bindIp` setting limits the interfaces `mongod` listens on).
* **Firewall rules** and **VPC peering** (especially in MongoDB Atlas).
* Supports **Virtual Private Cloud (VPC)** and private endpoints in managed cloud services.

---

### 6. **Data Masking & Redaction**

* The `$redact` aggregation stage and read-only views can hide sensitive fields from query results; redaction of sensitive values in log files is a MongoDB Enterprise feature.

---

### 7. **Backup and Recovery**

* MongoDB provides tools for creating consistent backups, including point-in-time recovery options, which help protect against data loss.

---

## 📋 Summary Table

| Security Mechanism      | Purpose                                 |
| ----------------------- | --------------------------------------- |
| Authentication          | Verify user identity                    |
| Authorization (RBAC)    | Control user permissions                |
| TLS/SSL Encryption      | Secure data in transit                  |
| Encryption at Rest      | Secure data stored on disk              |
| Auditing                | Monitor and log database activity       |
| Network Access Controls | Restrict network connections            |
| Data Masking/Redaction  | Protect sensitive data in query outputs |
| Backup and Recovery     | Protect against data loss               |

---

Q15.  Explain the concept of embedded documents and when they should be used.

-> Absolutely! Here's a clear explanation of **embedded documents** in MongoDB and when to use them:

---

## 📦 What Are Embedded Documents?

In MongoDB, **embedded documents** are **documents nested inside other documents**. Instead of storing related data in separate collections (like in relational databases), you can store related information **within the same document** as a sub-document.

### Example:

```json
{
  "_id": 1,
  "name": "Alice",
  "address": {
    "street": "123 Maple St",
    "city": "Springfield",
    "zip": "12345"
  }
}
```

Here, `address` is an **embedded document** inside a user document (for example, one stored in a `users` collection).

---

## 🛠 How Embedded Documents Work

* Embedded docs are stored **inline** inside the parent document.
* They can be nested multiple levels deep.
* They support all BSON data types, including arrays and other embedded docs.

---

## 🔍 When to Use Embedded Documents

### Use embedding when:

| Scenario                                              | Reason                                                      |
| ----------------------------------------------------- | ----------------------------------------------------------- |
| **Related data is frequently accessed together**      | Embedding avoids costly joins or multiple queries.          |
| **Data has a one-to-few relationship**                | E.g., user profile with a few addresses or contact numbers. |
| **Data is tightly coupled and changes together**      | Ensures atomic updates to the parent and child data.        |
| **You want faster read performance for related data** | All needed data is retrieved in a single document fetch.    |

### Avoid embedding when:

| Scenario                                                   | Reason                                                                                             |
| ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| **One-to-many relationships with large or growing arrays** | Large embedded arrays can grow documents beyond MongoDB’s 16MB limit and cause performance issues. |
| **Data is accessed or updated independently**              | Separate collections reduce unnecessary data transfer or locking.                                  |
| **Data needs separate indexing or querying**               | Multikey indexes on embedded arrays work, but querying, sorting, or paginating the child items on their own is usually simpler in a separate collection. |

---

## 🧩 Embedded vs Referenced Documents

* **Embedded documents** = store related data **inside** one document.
* **Referenced documents** = store related data in **separate collections**, link via IDs.

Embedding favors **read efficiency and atomicity**, referencing favors **scalability and flexibility**.
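
The two shapes can be compared side by side using plain Python dictionaries, which is how PyMongo represents documents. The `orders` data and the `orders_for` helper are invented for this illustration; in MongoDB the second step would be a separate query or a `$lookup`.

```python
# Embedded: the address lives inside the user document (one read returns both).
user_embedded = {
    "_id": 1,
    "name": "Alice",
    "address": {"street": "123 Maple St", "city": "Springfield", "zip": "12345"},
}

# Referenced: orders live in their own collection and point back via user_id.
user_referenced = {"_id": 1, "name": "Alice"}
orders = [
    {"_id": 10, "user_id": 1, "total": 40},
    {"_id": 11, "user_id": 1, "total": 15},
    {"_id": 12, "user_id": 2, "total": 99},
]


def orders_for(user: dict, all_orders: list) -> list:
    """Second lookup needed for referenced data (a separate query in MongoDB)."""
    return [o for o in all_orders if o["user_id"] == user["_id"]]


city = user_embedded["address"]["city"]  # no extra lookup
alice_orders = orders_for(user_referenced, orders)
```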

---

## 💡 Summary

| Embedded Documents                           |
| -------------------------------------------- |
| Store related data **inside** the parent doc |
| Best for **one-to-few** relationships        |
| Improves read performance with fewer queries |
| Supports atomic updates on whole document    |
| Beware of document size limit (16MB)         |

---

Q16.  What is the purpose of MongoDB’s $lookup stage in aggregation?

-> Great question!

### 🔗 What is MongoDB’s `$lookup` Stage?

The **`$lookup`** stage in MongoDB’s **aggregation pipeline** allows you to perform a **left outer join** between documents from two collections. It lets you **combine related data from different collections** into a single query result.

---

## 🎯 Purpose of `$lookup`

* To **join documents** from a **foreign collection** to the current aggregation pipeline.
* Helps you **enrich data** by embedding related documents from another collection.
* Allows you to perform complex queries that would otherwise require multiple queries or application-side joins.

---

## 🛠 How `$lookup` Works

You specify:

* `from`: The target collection to join.
* `localField`: Field from the input documents.
* `foreignField`: Field from the `from` collection.
* `as`: The name of the new array field to add to the input documents with matching documents.

---

### Example:

Suppose you have two collections:

* `orders`:

```json
{ "_id": 1, "product_id": 101, "quantity": 2 }
```

* `products`:

```json
{ "_id": 101, "name": "Laptop", "price": 1200 }
```

Use `$lookup` to join `products` info into `orders`:

```js
db.orders.aggregate([
  {
    $lookup: {
      from: "products",
      localField: "product_id",
      foreignField: "_id",
      as: "productDetails"
    }
  }
])
```

**Result:**

```json
{
  "_id": 1,
  "product_id": 101,
  "quantity": 2,
  "productDetails": [
    { "_id": 101, "name": "Laptop", "price": 1200 }
  ]
}
```
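
The "left outer" behaviour is easiest to see when an order has no matching product. The following pure-Python function is a simplified emulation of `$lookup` equality matching, written only for illustration (it ignores array-valued fields and the pipeline form); the second order with `product_id` 999 is an added example.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Emulate $lookup equality matching: every local doc is kept (left outer join)."""
    joined = []
    for doc in local_docs:
        matches = [
            f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)
        ]
        joined.append({**doc, as_field: matches})
    return joined


orders = [
    {"_id": 1, "product_id": 101, "quantity": 2},
    {"_id": 2, "product_id": 999, "quantity": 1},  # no such product
]
products = [{"_id": 101, "name": "Laptop", "price": 1200}]

result = lookup(orders, products, "product_id", "_id", "productDetails")
```

The unmatched order is still returned, with `productDetails` set to an empty array rather than being dropped.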

---

## ✅ Key Points

| Feature               | Description                                                                                     |
| --------------------- | ----------------------------------------------------------------------------------------------- |
| **Type of join**      | Left outer join                                                                                 |
| **Output**            | Adds an array of matching docs                                                                  |
| **Use case**          | Combine related data from two collections                                                       |
| **Supports pipeline** | Since MongoDB 3.6, supports joining with pipeline queries (`$lookup` with `let` and `pipeline`) |

---

Q17.  What are some common use cases for MongoDB?

-> Absolutely! MongoDB is super popular because it’s flexible and scalable, making it fit well in lots of scenarios. Here are some common use cases:

---

## 🏆 Common Use Cases for MongoDB

### 1. **Content Management Systems (CMS) & Blogs**

* Stores diverse content types with flexible schemas (articles, comments, metadata).
* Easy to evolve schema as content types change.
* Supports rich querying and indexing for fast content delivery.

---

### 2. **Real-Time Analytics and Big Data**

* Handles large volumes of semi-structured data.
* Aggregation framework enables fast data processing and analytics.
* Scales horizontally to ingest data streams and perform real-time analysis.

---

### 3. **Internet of Things (IoT) Applications**

* Stores sensor data and device logs with varying formats.
* Handles high write loads and time-series data efficiently.
* Flexible schema accommodates new device types and data structures easily.

---

### 4. **Mobile and Social Media Apps**

* Manages user profiles, posts, comments, likes, and social graphs.
* Supports rapid development with evolving data models.
* Offers high availability and scalability for large user bases.

---

### 5. **E-commerce and Catalog Management**

* Stores product catalogs with variable attributes.
* Supports complex queries and indexing for fast product searches.
* Handles inventory updates and order processing transactions.

---

### 6. **Gaming**

* Stores player profiles, game states, scores, and leaderboards.
* Supports real-time data updates and retrieval.
* Can scale out through sharding to serve very large numbers of concurrent users, with flexible data models.

---

### 7. **Catalog and Inventory Management**

* Models complex products with nested attributes.
* Efficiently handles inventory levels, pricing changes, and product variants.

---

### 8. **Personalization and Recommendation Engines**

* Stores user behavior data and preferences.
* Supports fast retrieval and aggregation for generating personalized content.

---

### 9. **Healthcare Applications**

* Stores patient records, clinical notes, and medical images.
* Flexible schema accommodates diverse healthcare data.
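* Offers security features (authentication, role-based access control, encryption, auditing) that help meet security and privacy requirements; compliance still depends on how the deployment is configured.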

---

## 🔑 Why MongoDB Fits These Use Cases

* **Flexible schema** to adapt to changing requirements.
* **Scalable** across distributed clusters.
* **Rich querying and aggregation** capabilities.
* **High availability** via replication.
* **Strong developer ecosystem** and integrations.

---

Q18. What are the advantages of using MongoDB for horizontal scaling?

-> Great topic! Here are the main advantages of using **MongoDB for horizontal scaling**:

---

## 🚀 Advantages of MongoDB for Horizontal Scaling

### 1. **Native Sharding Support**

* MongoDB has built-in **sharding**, which partitions data across multiple servers (shards).
* Each shard holds a subset of the data, allowing you to distribute the load horizontally.

### 2. **Automatic Data Distribution**

* MongoDB automatically **balances data across shards** to ensure even distribution.
* It moves chunks of data behind the scenes without downtime.

### 3. **Scales Out with Commodity Hardware**

* You can add more shards to scale out your cluster.
* No need for expensive, high-end servers—just add more affordable machines.

### 4. **High Throughput and Low Latency**

* Distributing data means queries and writes can be parallelized across shards.
* This reduces bottlenecks and improves overall performance.

### 5. **Flexible Shard Keys**

* You choose a shard key to optimize data distribution based on your query patterns.
* Good shard keys minimize hotspots and maximize scalability.
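
To see why the shard key matters, the toy Python sketch below compares a ranged key with a hashed key for monotonically increasing IDs. It is a deliberate simplification: real MongoDB assigns ranges of key (or hashed key) values to chunks rather than using a modulo, and the shard count, boundaries, and function names here are assumptions.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # assumed cluster size for the illustration


def hashed_shard(key) -> int:
    """Toy hashed sharding: hash the key, then map it onto a shard number."""
    digest = hashlib.sha256(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def ranged_shard(key: int, boundaries=(1000, 2000)) -> int:
    """Toy ranged sharding: shard 0 holds keys < 1000, shard 1 < 2000, shard 2 the rest."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)


new_ids = range(5000, 6000)  # monotonically increasing keys, e.g. a counter
hashed_counts = Counter(hashed_shard(k) for k in new_ids)
ranged_counts = Counter(ranged_shard(k) for k in new_ids)
print("hashed:", dict(hashed_counts))
print("ranged:", dict(ranged_counts))
```

With the ranged key every new document lands on the last shard (a write hotspot), while the hashed key spreads the same inserts across all shards.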

### 6. **Supports Large Data Sets Beyond Single Server Limits**

* Sharding allows your database to exceed the storage or RAM capacity of any single machine.
* Enables handling **terabytes or petabytes** of data.

### 7. **Transparent to Applications**

* Applications connect to a **mongos query router**, which abstracts the sharded cluster.
* Developers can query the cluster as if it were a single database.

### 8. **Fault Tolerance and High Availability**

* Each shard is typically a **replica set**, so data remains available even if some nodes fail.
* Sharding combined with replication gives both scalability and resilience.

---

## 🔑 Summary Table

| Advantage                    | Description                                     |
| ---------------------------- | ----------------------------------------------- |
| Built-in Sharding            | Native horizontal partitioning of data          |
| Automatic Balancing          | Redistributes data across shards automatically  |
| Commodity Hardware Scaling   | Add more servers easily and cost-effectively    |
| Parallel Query Execution     | Queries can run across shards simultaneously    |
| Flexible Shard Key Selection | Optimize data distribution and reduce hotspots  |
| Handles Very Large Data Sets | Scales beyond single server limitations         |
| Transparent to Applications  | Query routing abstracts complexity              |
| Combines with Replication    | Provides both scalability and high availability |

---

Q19.  How do MongoDB transactions differ from SQL transactions?

->  Great question! While **MongoDB transactions** and **SQL transactions** share the same core ACID principles, there are some important differences due to their underlying architectures and data models.

---

## 🆚 MongoDB Transactions vs. SQL Transactions

| Aspect                      | MongoDB Transactions                                                                     | SQL Transactions                                                              |
| --------------------------- | ---------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| **Data Model**              | Document-based (NoSQL, JSON-like BSON documents)                                         | Relational tables with fixed schema                                           |
| **Transaction Scope**       | Multi-document transactions (since MongoDB 4.0)                                          | Multi-row, multi-table transactions                                           |
| **Atomicity**               | Atomic across multiple documents and collections                                         | Atomic across multiple rows/tables                                            |
| **Isolation Level**         | Snapshot isolation (default)                                                             | Varies by DBMS; often supports Serializable, Repeatable Read, Read Committed  |
| **Schema Flexibility**      | Schema-less or flexible schema                                                           | Rigid schema defined upfront                                                  |
| **Complex Joins**           | Limited support; aggregation pipelines and `$lookup` are used                            | Full SQL joins supported natively                                             |
| **Transaction Size Limits** | Must finish within the transaction lifetime limit (60 seconds by default); the 16 MB total-size cap of MongoDB 4.0 was removed in 4.2; keep transactions short | Generally no strict size limits, but long transactions can impact performance |
| **Performance Impact**      | Multi-document transactions add overhead; single-document operations are atomic and fast | Transactions are fundamental, optimized for relational workloads              |
| **Use Cases**               | Useful for atomic updates spanning multiple documents or collections                     | Ideal for complex, multi-table relational data with strong consistency        |

---

## 🔍 Key Differences Explained

### 1. **Data Model & Flexibility**

* MongoDB works with flexible documents that can vary in structure, so transactions must handle potentially diverse data formats.
* SQL uses fixed schemas, making it easier to enforce constraints and complex joins inside transactions.

### 2. **Transaction Support History**

* MongoDB introduced multi-document ACID transactions in version 4.0; before that, only single-document atomicity was guaranteed.
* Most relational databases have supported multi-statement ACID transactions for decades, and transactions are a core part of their design.

### 3. **Isolation Levels**

* MongoDB provides **snapshot isolation**, meaning each transaction works with a consistent snapshot of data.
* SQL databases offer multiple isolation levels allowing fine control over locking and concurrency behavior.

### 4. **Joins and Relationships**

* MongoDB transactions less often involve joins, because related data is commonly embedded in a single document or fetched with `$lookup`.
* SQL transactions routinely involve complex joins across normalized tables.

---

## 🧩 When to Use MongoDB Transactions vs SQL Transactions

* Use **MongoDB transactions** when you need atomicity across multiple documents or collections but still want a flexible schema and scalability.
* Use **SQL transactions** when your application relies heavily on complex relationships, joins, and strong relational integrity.

---

Q20. What are the main differences between capped collections and regular collections?

-> Great question! Here’s a clear comparison between **capped collections** and **regular collections** in MongoDB:

---

## 📊 Main Differences: Capped Collections vs Regular Collections

| Feature                | Capped Collections                                                             | Regular Collections                                  |
| ---------------------- | ------------------------------------------------------------------------------ | ---------------------------------------------------- |
| **Purpose**            | Store fixed-size, high-throughput, **circular** data (e.g., logs, caches)      | General-purpose data storage                         |
| **Size Limit**         | Fixed maximum size in bytes set at creation (`size`), optionally with a maximum document count (`max`) | No fixed size limit (grows dynamically)              |
| **Document Order**     | Preserve **insertion order** (FIFO behavior)                                   | No guaranteed order unless explicitly sorted         |
| **Document Overwrite** | Once size limit is reached, oldest documents are **overwritten automatically** | Documents remain until explicitly deleted or updated |
| **Updates**            | Can only update documents without increasing their size                        | Supports updates that can increase document size     |
| **Deletion**           | Documents are deleted automatically in circular fashion                        | Documents deleted manually                           |
| **Performance**        | Very fast writes and reads due to fixed size and no fragmentation              | General performance, may vary with size and indexes  |
| **Use Cases**          | Logging, caching, real-time analytics, queues                                  | General application data                             |
| **Indexing**           | Supports indexes, but with some restrictions                                   | Full indexing support                                |

---

## 🔍 Summary

* **Capped collections** are like a fixed-size circular buffer optimized for high-throughput inserts and reads, automatically deleting old data when full.
* **Regular collections** are flexible, grow as needed, and suit a wide variety of use cases.
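
The circular-buffer behaviour can be imitated with Python's `collections.deque`. This is only an analogy: a real capped collection is limited by size in bytes (optionally also by document count), whereas the deque below is limited by item count alone.

```python
from collections import deque

# deque(maxlen=3) drops the oldest item when a fourth is appended.
capped_log = deque(maxlen=3)
for n in range(1, 6):
    capped_log.append({"seq": n, "msg": f"event {n}"})

remaining = [entry["seq"] for entry in capped_log]  # insertion order is preserved
print(remaining)

# With PyMongo, a real capped collection might be created along the lines of:
#   db.create_collection("logs", capped=True, size=1_048_576, max=1000)
```

After five inserts into a buffer of three, only the three most recent entries remain.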

---

Q21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

### 🎯 Purpose of the `$match` Stage in MongoDB’s Aggregation Pipeline

-> The **`$match`** stage **filters documents** in the aggregation pipeline against the criteria you specify, much like the **query filter** passed to `find()`.

---

## How `$match` Works:

* It filters input documents and passes only those that **match the conditions** to the next stage.
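* You can use most standard query operators inside `$match` (e.g., `$eq`, `$gt`, `$in`, `$regex`); exceptions include `$where`, which is not allowed, and `$text`, which requires `$match` to be the first stage.
* Usually placed as early as possible in the pipeline to **reduce the number of documents processed downstream**; a `$match` at the very start can also use indexes, like a regular `find()` query.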

---

## Example:

Suppose you have a collection `orders` and want to aggregate only orders where `status` is `"shipped"`:

```js
db.orders.aggregate([
  { $match: { status: "shipped" } },
  // other stages follow...
])
```

This filters the documents to only those with `status` equal to `"shipped"` before further processing.
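
The same idea in Python with PyMongo might look like the sketch below. The field names `customerId` and `amount` and the function names are assumptions made for illustration; only `status` comes from the example above.

```python
def shipped_totals_pipeline():
    """Build a pipeline that filters first, then groups the remaining documents."""
    return [
        # Filter early so later stages see only shipped orders
        {"$match": {"status": "shipped"}},
        # Hypothetical follow-up stage: total amount per customer
        {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
    ]


def run_shipped_totals(collection):
    """Run the pipeline on a PyMongo collection (assumed to be connected)."""
    return list(collection.aggregate(shipped_totals_pipeline()))
```

Building the pipeline in its own function keeps the stage order easy to inspect before it is sent to the server.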

---

## Key Points

| Feature                   | Description                                            |
| ------------------------- | ------------------------------------------------------ |
| Filters documents         | Selects documents matching criteria                    |
| Improves efficiency       | Limits documents early in pipeline                     |
| Supports rich queries     | Uses most MongoDB query operators                      |
| Can appear multiple times | Often combined with other stages for complex pipelines |

---

Q22.  How can you secure access to a MongoDB database?

-> Securing access to a MongoDB database involves multiple layers that protect data from unauthorized access and attacks. Here are the key steps and best practices:

---

## 🔒 How to Secure Access to a MongoDB Database

### 1. **Enable Authentication**

* Turn on authentication so that users must provide valid credentials to connect.
* MongoDB supports various authentication mechanisms like:

  * **SCRAM-SHA-1 / SCRAM-SHA-256** (default)
  * **LDAP** (MongoDB Enterprise)
  * **x.509 certificates**
  * **Kerberos** (MongoDB Enterprise)

---

### 2. **Use Role-Based Access Control (RBAC)**

* Follow the **principle of least privilege** when assigning roles to users.
* Use built-in roles (read, readWrite, dbAdmin, etc.) or create **custom roles**.
* Limit user actions to only what they need.
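
As a rough sketch of least privilege in practice, the snippet below builds a `createUser` command that grants only the built-in `read` role on one database. The function name, user name, and database name are hypothetical; the command document itself follows MongoDB's `createUser` command format.

```python
def build_read_only_user_command(username, password, db_name):
    """Build a createUser command granting only the built-in 'read' role."""
    return {
        "createUser": username,  # the command name must be the first key
        "pwd": password,
        "roles": [{"role": "read", "db": db_name}],
    }


command = build_read_only_user_command("report_reader", "change-me", "superstore")
print(command["roles"])
```

With a connected PyMongo client and sufficient privileges, the command could be sent with `client["superstore"].command(command)`.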

---

### 3. **Encrypt Data in Transit**

* Enable **TLS/SSL** to encrypt network traffic between clients and MongoDB servers.
* Prevents eavesdropping and man-in-the-middle attacks.
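
One way to request TLS from a client is through the connection string. The sketch below assembles a URI with the `tls=true` option and percent-escapes the credentials; the helper name, host, and user are made up for illustration, and the server is assumed to be configured for TLS already.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(username, password, host, db_name="", tls=True):
    """Assemble a mongodb:// URI with escaped credentials and the tls option."""
    options = urlencode({"tls": "true" if tls else "false", "authSource": "admin"})
    user = quote_plus(username)    # escape reserved characters such as '@' or '/'
    secret = quote_plus(password)
    return f"mongodb://{user}:{secret}@{host}/{db_name}?{options}"


uri = build_mongo_uri("app_user", "p@ss/word", "db.example.com:27017", "superstore")
print(uri)
```

Escaping matters because characters such as `@` or `/` in a password would otherwise be read as part of the URI structure.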

---

### 4. **Encrypt Data at Rest**

* Use MongoDB’s **Encrypted Storage Engine** (Enterprise version) or encrypt disks at the OS level.
* Protects data on physical storage from unauthorized access.

---

### 5. **Restrict Network Access**

* Use **firewalls** and **IP whitelisting** to restrict which IP addresses can connect.
* In cloud environments (like MongoDB Atlas), configure **VPC peering** and private endpoints.
* Avoid exposing MongoDB directly to the public internet.

---

### 6. **Enable Auditing**

* Track database activities like login attempts, CRUD operations, and changes to roles (auditing is available in MongoDB Enterprise and Atlas).
* Helps detect suspicious behavior and supports compliance.

---

### 7. **Use Strong Passwords and Rotate Credentials**

* Enforce complex passwords and change them regularly.
* Avoid default accounts or credentials.

---

### 8. **Keep MongoDB Updated**

* Regularly apply security patches and updates to MongoDB server and drivers.
* Stay informed about security advisories.

---

### 9. **Limit Exposure of Sensitive Data**

* Use **client-side field-level encryption** if needed (automatic encryption requires MongoDB Enterprise or Atlas).
* Mask or redact sensitive data in query results when appropriate.

---

## 📋 Summary Table

| Security Measure     | Purpose                      |
| -------------------- | ---------------------------- |
| Authentication       | Verify user identity         |
| RBAC                 | Control user permissions     |
| TLS/SSL Encryption   | Secure data in transit       |
| Encryption at Rest   | Secure stored data           |
| Network Restrictions | Limit network access         |
| Auditing             | Monitor and log activity     |
| Strong Passwords     | Prevent unauthorized access  |
| Regular Updates      | Fix security vulnerabilities |
| Field-Level Encryption | Protect sensitive fields   |

---

Q23.  What is MongoDB’s WiredTiger storage engine, and why is it important?

-> WiredTiger is the storage engine MongoDB uses by default; the sections below cover what it does and why it matters.

---

## 🔥 What is MongoDB’s WiredTiger Storage Engine?

**WiredTiger** is the **default storage engine** for MongoDB (since version 3.2). It’s the component responsible for managing how data is stored, accessed, and managed on disk.

---

## 🛠 Key Features of WiredTiger:

* **Document-level concurrency control:** Allows multiple operations to access different documents simultaneously without blocking each other, improving throughput.
* **Compression:** Supports data compression (Snappy by default, or zlib and zstd) to reduce disk usage and improve I/O efficiency.
* **Checkpointing:** Periodically (every 60 seconds by default) writes a consistent snapshot of data to disk, improving recovery times after crashes.
* **Write-Ahead Logging (WAL):** Uses journaling to ensure durability and crash resilience.
* **Internal cache:** Keeps frequently accessed data in its own in-memory cache, working alongside the filesystem cache; unlike the older MMAPv1 engine, it does not rely on memory-mapped files.
* **High performance and scalability:** Designed to work well with modern multi-core CPUs and large datasets.
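
Compression can also be chosen per collection at creation time. The sketch below passes a WiredTiger `block_compressor` setting through PyMongo's `create_collection`; the helper name is hypothetical and `db` is assumed to be a connected PyMongo database.

```python
ALLOWED_COMPRESSORS = {"snappy", "zlib", "zstd", "none"}


def create_compressed_collection(db, name, compressor="zlib"):
    """Create a collection whose data blocks use the given WiredTiger compressor."""
    if compressor not in ALLOWED_COMPRESSORS:
        raise ValueError(f"unsupported compressor: {compressor}")
    storage_engine = {
        "wiredTiger": {"configString": f"block_compressor={compressor}"}
    }
    # Extra keyword arguments are forwarded to the server's create command
    return db.create_collection(name, storageEngine=storage_engine)
```

Heavier compressors such as zlib usually trade some CPU time for smaller files, so the right choice depends on the workload.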

---

## 📌 Why is WiredTiger Important?

* **Better Performance:** Thanks to document-level locking and compression, it improves read/write speeds compared to previous engines (like MMAPv1).
* **Efficient Storage:** Compression saves disk space and reduces I/O costs.
* **Improved Concurrency:** Multiple clients can read/write without significant contention.
* **Data Durability and Crash Recovery:** Journaling and checkpointing help keep your data safe even in failures.
* **Scalability:** Handles large-scale applications and workloads efficiently.

---

## Summary Table

| Feature                    | Benefit                                    |
| -------------------------- | ------------------------------------------ |
| Document-level Locking     | Higher concurrency, better throughput      |
| Compression                | Saves disk space and I/O bandwidth         |
| Journaling & Checkpointing | Data durability and fast recovery          |
| Efficient RAM Use          | Faster data access and caching             |
| Designed for Modern HW     | Scales with multi-core CPUs and large data |

---


#  Practical Questions




In [2]:
!pip install pymongo


Collecting pymongo
  Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.14.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 23.4 MB/s eta 0:00:00
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 313.6/313.6 kB 19.0 MB/s eta 0:00:00
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.14.0


In [3]:
pip install pandas pymongo




Q1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.

In [7]:
from pymongo import MongoClient



In [9]:
import os

# Read the connection string from the environment (assumes MONGO_URI is set) instead of hardcoding credentials
client = MongoClient(os.environ["MONGO_URI"])


In [12]:
import os

import pandas as pd
from pymongo import MongoClient

# === Configuration ===
CSV_FILE_PATH = '/content/superstore.csv'   # Path to the Superstore CSV file
MONGO_URI = os.environ['MONGO_URI']         # MongoDB Atlas connection string (assumes MONGO_URI is set; never hardcode credentials)
DB_NAME = 'superstore'                      # Target database
COLLECTION_NAME = 'orders'                  # Target collection

# === Step 1: Load CSV using pandas ===
try:
    df = pd.read_csv(CSV_FILE_PATH, encoding='ISO-8859-1')  # or use 'cp1252'
    print(f"✅ Loaded {len(df)} records from '{CSV_FILE_PATH}'.")
except Exception as e:
    print(f"❌ Error reading CSV file: {e}")
    exit(1)

# === Step 2: Convert DataFrame to list of dictionaries ===
records = df.to_dict(orient='records')

# === Step 3: Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    # MongoClient connects lazily; a ping forces a round trip so failures surface here
    client.admin.command('ping')
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Step 4: Insert records into MongoDB ===
try:
    if records:
        result = collection.insert_many(records)
        print(f"✅ Inserted {len(result.inserted_ids)} documents into collection '{COLLECTION_NAME}'.")
    else:
        print("⚠️ No records to insert.")
except Exception as e:
    print(f"❌ Failed to insert records: {e}")


✅ Loaded 9994 records from '/content/superstore.csv'.
✅ Connected to MongoDB database 'superstore'.
✅ Inserted 9994 documents into collection 'orders'.


Q2. Retrieve and print all documents from the Orders collection.

In [19]:
!pip install python-dotenv


Collecting python-dotenv
  Downloading python_dotenv-1.1.1-py3-none-any.whl.metadata (24 kB)
Downloading python_dotenv-1.1.1-py3-none-any.whl (20 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.1.1


In [20]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os
import pprint

# === Load environment variables ===
load_dotenv()

# === Configuration from .env ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === Construct MongoDB URI from the environment variables loaded above ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Retrieve and print all documents ===
try:
    print(f"\n📄 All documents in '{COLLECTION_NAME}' collection:\n")
    for doc in collection.find():
        pprint.pprint(doc)
except Exception as e:
    print(f"❌ Failed to retrieve documents: {e}")


Streaming output truncated to the last 5000 lines.
 'Segment': 'Consumer',
 'Ship Date': '12/15/2017',
 'Ship Mode': 'Standard Class',
 'State': 'Michigan',
 'Sub-Category': 'Binders',
 '_id': ObjectId('6896428c6603f244d6d94bc0')}
{'Category': 'Office Supplies',
 'City': 'San Francisco',
 'Country': 'United States',
 'Customer ID': 'TC-21535',
 'Customer Name': 'Tracy Collins',
 'Discount': 0.0,
 'Order Date': '12/7/2017',
 'Order ID': 'CA-2017-142328',
 'Postal Code': 94122,
 'Product ID': 'OFF-PA-10000380',
 'Product Name': 'REDIFORM Incoming/Outgoing Call Register, 11" X 8 1/2", 100 '
                 'Messages',
 'Profit': 25.02,
 'Quantity': 6,
 'Region': 'West',
 'Row ID': 9769,
 'Sales': 50.04,
 'Segment': 'Home Office',
 'Ship Date': '12/14/2017',
 'Ship Mode': 'Standard Class',
 'State': 'California',
 'Sub-Category': 'Paper',
 '_id': ObjectId('6896428c6603f244d6d94bc1')}
{'Category': 'Furniture',
 'City': 'Hialeah',
 'Country': 'United States',
 'Customer ID': '

Q3.  Count and display the total number of documents in the Orders collection.

In [21]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os

# === Load environment variables ===
load_dotenv()

# === Configuration from .env ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === Construct MongoDB URI ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Count and display documents ===
try:
    total_docs = collection.count_documents({})
    print(f"📊 Total documents in '{COLLECTION_NAME}' collection: {total_docs}")
except Exception as e:
    print(f"❌ Failed to count documents: {e}")


✅ Connected to MongoDB database 'superstore'.
📊 Total documents in 'orders' collection: 9994


Q4. Write a query to fetch all orders from the "West" region

In [22]:
query = {"Region": "West"}


In [23]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os
import pprint

# === Load environment variables ===
load_dotenv()

# === Configuration ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === MongoDB URI ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Query for orders from the "West" region ===
try:
    query = {"Region": "West"}
    west_orders = collection.find(query)

    print(f"\n📦 Orders from 'West' region:\n")
    for order in west_orders:
        pprint.pprint(order)

except Exception as e:
    print(f"❌ Failed to retrieve 'West' region orders: {e}")


Streaming output truncated to the last 5000 lines.
 'State': 'New Mexico',
 'Sub-Category': 'Art',
 '_id': ObjectId('6896428c6603f244d6d949d3')}
{'Category': 'Furniture',
 'City': 'Seattle',
 'Country': 'United States',
 'Customer ID': 'HJ-14875',
 'Customer Name': 'Heather Jas',
 'Discount': 0.0,
 'Order Date': '9/20/2016',
 'Order ID': 'CA-2016-166772',
 'Postal Code': 98105,
 'Product ID': 'FUR-BO-10002853',
 'Product Name': "O'Sullivan 5-Shelf Heavy-Duty Bookcases",
 'Profit': 40.97,
 'Quantity': 2,
 'Region': 'West',
 'Row ID': 9281,
 'Sales': 163.88,
 'Segment': 'Home Office',
 'Ship Date': '9/24/2016',
 'Ship Mode': 'Standard Class',
 'State': 'Washington',
 'Sub-Category': 'Bookcases',
 '_id': ObjectId('6896428c6603f244d6d949d9')}
{'Category': 'Office Supplies',
 'City': 'Fort Collins',
 'Country': 'United States',
 'Customer ID': 'LW-17215',
 'Customer Name': 'Luke Weiss',
 'Discount': 0.2,
 'Order Date': '9/2/2017',
 'Order ID': 'CA-2017-102218',
 'Postal Code':

Q5.  Write a query to find orders where Sales is greater than 500.

In [24]:
query = { "Sales": { "$gt": 500 } }


In [25]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os
import pprint

# === Load environment variables ===
load_dotenv()

# === Configuration ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === Construct MongoDB URI ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Query: Find orders where Sales > 500 ===
try:
    query = { "Sales": { "$gt": 500 } }
    high_sales_orders = collection.find(query)

    print("\n💰 Orders with Sales > 500:\n")
    for order in high_sales_orders:
        pprint.pprint(order)

except Exception as e:
    print(f"❌ Failed to retrieve high-sales orders: {e}")


Streaming output truncated to the last 5000 lines.
 'Country': 'United States',
 'Customer ID': 'CW-11905',
 'Customer Name': 'Carl Weiss',
 'Discount': 0.0,
 'Order Date': '12/8/2017',
 'Order ID': 'CA-2017-152436',
 'Postal Code': 2920,
 'Product ID': 'OFF-ST-10000036',
 'Product Name': 'Recycled Data-Pak for Archival Bound Computer Printouts, '
                 '12-1/2 x 12-1/2 x 16',
 'Profit': 160.0398,
 'Quantity': 6,
 'Region': 'East',
 'Row ID': 7821,
 'Sales': 592.74,
 'Segment': 'Home Office',
 'Ship Date': '12/10/2017',
 'Ship Mode': 'Second Class',
 'State': 'Rhode Island',
 'Sub-Category': 'Storage',
 '_id': ObjectId('6896428c6603f244d6d94425')}
{'Category': 'Technology',
 'City': 'Chicago',
 'Country': 'United States',
 'Customer ID': 'CY-12745',
 'Customer Name': 'Craig Yedwab',
 'Discount': 0.2,
 'Order Date': '10/31/2017',
 'Order ID': 'CA-2017-117114',
 'Postal Code': 60610,
 'Product ID': 'TEC-PH-10004042',
 'Product Name': 'ClearOne Communications CHAT

Q6.  Fetch the top 3 orders with the highest Profit.

In [27]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os
import pprint

# === Load environment variables ===
load_dotenv()

# === Configuration ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === MongoDB URI ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Query: Top 3 orders with highest Profit ===
try:
    top_profit_orders = collection.find().sort("Profit", -1).limit(3)

    print("\n🏆 Top 3 orders with highest Profit:\n")
    for order in top_profit_orders:
        pprint.pprint(order)

except Exception as e:
    print(f"❌ Failed to fetch top profit orders: {e}")


✅ Connected to MongoDB database 'superstore'.

🏆 Top 3 orders with highest Profit:

{'Category': 'Technology',
 'City': 'Lafayette',
 'Country': 'United States',
 'Customer ID': 'TC-20980',
 'Customer Name': 'Tamara Chand',
 'Discount': 0.0,
 'Order Date': '10/2/2016',
 'Order ID': 'CA-2016-118689',
 'Postal Code': 47905,
 'Product ID': 'TEC-CO-10004722',
 'Product Name': 'Canon imageCLASS 2200 Advanced Copier',
 'Profit': 8399.976,
 'Quantity': 5,
 'Region': 'Central',
 'Row ID': 6827,
 'Sales': 17499.95,
 'Segment': 'Corporate',
 'Ship Date': '10/9/2016',
 'Ship Mode': 'Standard Class',
 'State': 'Indiana',
 'Sub-Category': 'Copiers',
 '_id': ObjectId('6896428c6603f244d6d94043')}
{'Category': 'Technology',
 'City': 'Seattle',
 'Country': 'United States',
 'Customer ID': 'RB-19360',
 'Customer Name': 'Raymond Buch',
 'Discount': 0.0,
 'Order Date': '3/23/2017',
 'Order ID': 'CA-2017-140151',
 'Postal Code': 98115,
 'Product ID': 'TEC-CO-10004722',
 'Product Name': 'Canon imageCLASS 22

Q7.  Update all orders with Ship Mode as "First Class" to "Premium Class."

In [28]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os

# === Load environment variables ===
load_dotenv()

# === Configuration ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === MongoDB URI ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Update all documents where Ship Mode is "First Class" ===
try:
    filter_query = {"Ship Mode": "First Class"}
    update_operation = {"$set": {"Ship Mode": "Premium Class"}}

    result = collection.update_many(filter_query, update_operation)
    print(f"✅ Updated {result.modified_count} documents from 'First Class' to 'Premium Class'.")
except Exception as e:
    print(f"❌ Failed to update documents: {e}")


✅ Connected to MongoDB database 'superstore'.
✅ Updated 1538 documents from 'First Class' to 'Premium Class'.


Q8.  Delete all orders where Sales is less than 50.

In [29]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os

# === Load environment variables ===
load_dotenv()

# === Configuration ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === MongoDB URI ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Delete orders where Sales < 50 ===
try:
    filter_query = {"Sales": {"$lt": 50}}
    result = collection.delete_many(filter_query)
    print(f"✅ Deleted {result.deleted_count} documents where Sales < 50.")
except Exception as e:
    print(f"❌ Failed to delete documents: {e}")


✅ Connected to MongoDB database 'superstore'.
✅ Deleted 4849 documents where Sales < 50.


Q9.  Use aggregation to group orders by Region and calculate total sales per region.

In [30]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os
import pprint

# === Load environment variables ===
load_dotenv()

# === Configuration ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === MongoDB URI ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Aggregation pipeline to group by Region and sum Sales ===
try:
    pipeline = [
        {
            "$group": {
                "_id": "$Region",
                "totalSales": {"$sum": "$Sales"}
            }
        },
        {
            "$sort": {"totalSales": -1}  # Optional: sort descending by totalSales
        }
    ]

    results = collection.aggregate(pipeline)

    print("\n📊 Total Sales by Region:\n")
    for doc in results:
        print(f"Region: {doc['_id']}, Total Sales: {doc['totalSales']:.2f}")

except Exception as e:
    print(f"❌ Aggregation failed: {e}")


✅ Connected to MongoDB database 'superstore'.

📊 Total Sales by Region:

Region: West, Total Sales: 694686.62
Region: East, Total Sales: 651137.70
Region: Central, Total Sales: 479611.85
Region: South, Total Sales: 376023.31


Q10.  Fetch all distinct values for Ship Mode from the collection

In [31]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os

# === Load environment variables ===
load_dotenv()

# === Configuration ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === MongoDB URI ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Get all distinct Ship Mode values ===
try:
    distinct_ship_modes = collection.distinct("Ship Mode")
    print("\n🚚 Distinct Ship Modes in the 'orders' collection:")
    for mode in distinct_ship_modes:
        print(f"- {mode}")
except Exception as e:
    print(f"❌ Failed to fetch distinct ship modes: {e}")


✅ Connected to MongoDB database 'superstore'.

🚚 Distinct Ship Modes in the 'orders' collection:
- Premium Class
- Same Day
- Second Class
- Standard Class


Q11.  Count the number of orders for each category.

In [32]:
from pymongo import MongoClient
from dotenv import load_dotenv
import os

# === Load environment variables ===
load_dotenv()

# === Configuration ===
MONGO_USERNAME = os.getenv("MONGO_USERNAME")
MONGO_PASSWORD = os.getenv("MONGO_PASSWORD")
MONGO_CLUSTER = os.getenv("MONGO_CLUSTER")
DB_NAME = os.getenv("DB_NAME", "superstore")
COLLECTION_NAME = 'orders'

# === MongoDB URI ===
MONGO_URI = f"mongodb+srv://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_CLUSTER}/{DB_NAME}?retryWrites=true&w=majority"

# === Connect to MongoDB ===
try:
    client = MongoClient(MONGO_URI)
    db = client[DB_NAME]
    collection = db[COLLECTION_NAME]
    print(f"✅ Connected to MongoDB database '{DB_NAME}'.")
except Exception as e:
    print(f"❌ MongoDB connection failed: {e}")
    exit(1)

# === Aggregation: Count orders per category ===
try:
    pipeline = [
        {
            "$group": {
                "_id": "$Category",
                "orderCount": {"$sum": 1}
            }
        },
        {
            "$sort": {"orderCount": -1}  # Optional: sort by count descending
        }
    ]

    results = collection.aggregate(pipeline)

    print("\n📦 Order count per category:\n")
    for doc in results:
        print(f"Category: {doc['_id']}, Orders: {doc['orderCount']}")

except Exception as e:
    print(f"❌ Aggregation failed: {e}")


✅ Connected to MongoDB database 'superstore'.

📦 Order count per category:

Category: Office Supplies, Orders: 2076
Category: Furniture, Orders: 1573
Category: Technology, Orders: 1496
