
**Q1.** What are the key differences between SQL and NoSQL databases?

**Ans:** Data Model

SQL → Structured data, stored in tables with rows & columns.

NoSQL → Unstructured/semi-structured data, stored in documents, key-value pairs, wide-columns, or graphs.

Example:

SQL:

CREATE TABLE Customers (
  ID INT,
  Name VARCHAR(100),
  Age INT
);


NoSQL (MongoDB document):

{
  "ID": 1,
  "Name": "Alice",
  "Age": 25
}


Schema

SQL → Fixed schema (columns must be predefined).

NoSQL → Flexible schema (documents can have different fields).

Example:

SQL: Every row must have the same columns (Name, Age).

NoSQL: One document can have {"Name": "Alice", "Age": 25} and another can have {"Name": "Bob", "City": "Delhi"}.

Scalability

SQL → Traditionally vertical scaling (increase the power of one server).

NoSQL → Designed for horizontal scaling (add more servers).

Example:

SQL: Add more CPU/RAM to one server.

NoSQL: Add more machines to handle more data.

Transactions

SQL → Strong ACID properties (Atomicity, Consistency, Isolation, Durability).

NoSQL → Often BASE (Basically Available, Soft state, Eventually consistent).

Example:

SQL: Banking system → Transfer ₹100 must update both accounts safely.

NoSQL: Social media likes → Eventual consistency is fine (a like count may update a bit later).

Query Language

SQL → Structured Query Language (SQL).

NoSQL → Different APIs, JSON-based queries, or key-value access.

 Example:

SQL:

SELECT * FROM Customers WHERE Age > 30;


MongoDB (NoSQL):

db.Customers.find({ Age: { $gt: 30 } })


Use Cases

SQL → Best for structured data, complex queries, and transactions. (e.g., Banking, ERP systems).

NoSQL → Best for big data, real-time apps, unstructured data. (e.g., Social media, IoT, Recommendation engines).

**Q2.** What makes MongoDB a good choice for modern applications?

**ans:** Flexible Schema (No Fixed Structure)

MongoDB stores data in JSON-like documents (BSON).

Fields can differ from one document to another.

Example:

{ "name": "Alice", "email": "alice@gmail.com" }
{ "name": "Bob", "phone": "9876543210", "city": "Delhi" }


Unlike SQL, no need to predefine all columns.

Scalability & Performance

Supports horizontal scaling using sharding → data is distributed across multiple servers.

Good for apps that need to handle huge traffic & big data.

Example: Social media apps (Instagram, Twitter-like apps).

High Availability

Built-in replica sets (copies of data on multiple servers).

If one server fails, another takes over automatically.

Useful for 24/7 apps like e-commerce or banking dashboards.

Powerful Querying

Rich query language → supports filters, aggregation, sorting, regex, geospatial queries, etc.

Example:

db.orders.find({ Sales: { $gt: 500 }, Region: "West" })


Aggregation Framework

Allows advanced analytics directly inside MongoDB (like SQL GROUP BY).

Example: Total sales per region:

db.orders.aggregate([
  { $group: { _id: "$Region", totalSales: { $sum: "$Sales" } } }
])


Integration with Modern Tech Stack

Works seamlessly with JavaScript (Node.js), Python, Java, Go, Cloud services.

JSON/BSON fits naturally with REST APIs & microservices.

Supports Modern Data Types

Can handle geospatial data, arrays, nested documents, and even time-series data.

Example: A ride-sharing app like Uber can store driver location as:

{ "driver": "Rahul", "location": { "lat": 28.7041, "lng": 77.1025 } }


Open Source & Cloud Ready

Free community edition + MongoDB Atlas (fully managed cloud DB).

Easy deployment in AWS, Azure, GCP.

**Q3.** Explain the concept of collections in MongoDB.

**Ans:**

In MongoDB, a collection is a group of documents.

It is the equivalent of a table in SQL databases.

Unlike SQL tables, collections do not enforce a fixed schema → documents inside the same collection can have different fields.

Key Features of Collections

Schema-less → No need to predefine columns.

Flexible → Each document can have different fields.

Dynamic → You can insert new fields at any time.

Organized grouping → Similar types of data are stored together.

// Collection: Users

{ "_id": 1, "name": "Alice", "email": "alice@gmail.com", "phone": "1234567890" }
{ "_id": 2, "name": "Bob", "email": "bob@yahoo.com" }


**Q4.** How does MongoDB ensure high availability using replication?

**Ans:**
Replication in MongoDB means making copies of data across multiple servers.

If one server (primary) fails, another server (secondary) can take over automatically → ensuring high availability.

Replication Set (Replica Set)

A replica set is a group of MongoDB servers that maintain the same data set.

It consists of:

Primary → receives all write operations and, by default, all reads.

Secondary(s) → maintain copies of the data by replicating from the primary.

Arbiter (optional) → participates in voting during failover but does not store data.

How High Availability Works

If the Primary server goes down, MongoDB automatically triggers an election among the secondaries.

One of the secondaries is promoted to become the new Primary.

This process ensures no single point of failure.



 Replica Set with 3 Nodes

+-----------+         +-----------+         +-----------+
| Primary   | <-----> | Secondary | <-----> | Secondary |
| (Writes)  |         | (Replica) |         | (Replica) |
+-----------+         +-----------+         +-----------+


Clients always write to the Primary.

Secondaries replicate data from the primary asynchronously, usually with only a short lag.

If the primary fails → one secondary is promoted to primary automatically.
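The failover rule above depends on a majority vote: a new primary can only be elected while more than half of the voting members can reach each other. The sketch below is a simplified model of that arithmetic, not MongoDB's actual election protocol; the function names are made up for illustration.

```python
def majority_needed(voting_members: int) -> int:
    """Smallest number of votes that is more than half of the voters."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members: int, reachable_members: int) -> bool:
    """True if the reachable members form a majority of all voters."""
    return reachable_members >= majority_needed(voting_members)


def fault_tolerance(voting_members: int) -> int:
    """How many members can fail while an election is still possible."""
    return voting_members - majority_needed(voting_members)


for size in (3, 4, 5):
    print(size, "members tolerate", fault_tolerance(size), "failure(s)")
```

A four-member set tolerates no more failures than a three-member set, which is why replica sets are usually deployed with an odd number of voting members.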

Advantages of Replication

High Availability → automatic failover.

Data Redundancy → copies of data on multiple servers.

Read Scalability → secondary nodes can handle read queries (if configured).

Disaster Recovery → backups from secondaries possible.

Example Command (Creating a Replica Set in MongoDB Shell)

rs.initiate(
   {
      _id: "rs0",
      members: [
         { _id: 0, host: "localhost:27017" },
         { _id: 1, host: "localhost:27018" },
         { _id: 2, host: "localhost:27019" }
      ]
   }
)

**Q5.** What are the main benefits of MongoDB Atlas?

**Ans:**
1. Fully Managed Service

No need to worry about installation, setup, upgrades, or maintenance.

MongoDB Atlas takes care of everything automatically.

2. High Availability (HA)

Built-in replica sets across multiple availability zones.

Automatic failover if a primary node goes down.
 Applications stay online with minimal downtime.

3. Global Distribution

Deploy databases across multiple regions worldwide.

Data is closer to your users, reducing latency.

Multi-cloud support (AWS, Azure, Google Cloud).

4. Scalability

Vertical Scaling → Increase cluster size (more CPU, RAM, storage).

Horizontal Scaling (Sharding) → Automatically distribute data across multiple servers for very large datasets.

5. Security

End-to-end encryption (at rest and in transit).

Fine-grained access control with roles.

Integration with cloud identity providers (AWS IAM, Azure AD, Okta, etc.).

Network isolation (VPC peering, private endpoints).

6. Performance Optimization

Built-in monitoring and performance dashboards.

Automated backups and point-in-time recovery.

Intelligent indexing suggestions for queries.

7. Serverless & Flexible

Serverless instances → Pay only for what you use.

Flexible document model (NoSQL), handling structured + unstructured data.

Ideal for modern apps (IoT, real-time analytics, AI/ML apps).

8. Integration & Ecosystem

Works seamlessly with BI tools (Tableau, Power BI).

Supports GraphQL and Data API for easy app development.

Built-in support for full-text search and analytics.

Example Use Case:
A global e-commerce site using MongoDB Atlas can:

Deploy clusters in US, Europe, and Asia for low latency.

Automatically scale up during holiday sales traffic.

Keep downtime to a minimum if one region/server fails, through automatic failover.

**Q6.** What is the role of indexes in MongoDB, and how do they improve performance?

**ans:** Role of Indexes in MongoDB

An index in MongoDB is a data structure (typically a B-tree) that improves the speed of query operations by allowing the database to quickly locate documents without scanning the entire collection.

 How Indexes Improve Performance
1. Faster Query Execution

Without an index → MongoDB performs a collection scan (checks every document).

With an index → MongoDB can jump directly to the documents that match.

 Example:

db.users.find({ age: 25 })


Without index on age → MongoDB checks every document.

With index on age → MongoDB scans only the matching index entries (IXSCAN) and fetches just those documents.

2. Support for Sorting

Queries with .sort() can use indexes instead of sorting all documents in memory.

 Example:

db.orders.find().sort({ orderDate: -1 })


If there’s an index on orderDate, sorting is fast.

3. Efficient Range Queries

Indexes help with queries like $gt, $lt, $gte, $lte.

 Example:

db.products.find({ price: { $gt: 500 } })


With an index on price, MongoDB can quickly find all products above 500.

4. Unique Constraints

Indexes can enforce uniqueness (like primary keys in SQL).

 Example:

db.users.createIndex({ email: 1 }, { unique: true })


This prevents duplicate emails.

 Types of Indexes in MongoDB

Single Field Index → On one field (e.g., name).

Compound Index → On multiple fields (e.g., { lastName: 1, firstName: 1 }).

Multikey Index → For arrays (e.g., tags: ["tech", "AI"]).

Text Index → For full-text search in strings.

Hashed Index → For sharding & equality queries.

Example in Mongo Shell
// Create index on "age"
db.users.createIndex({ age: 1 })

// Check indexes on collection
db.users.getIndexes()

 Quick Analogy

Think of a book without an index → you’d flip through every page to find "Machine Learning".
With an index at the back → you jump directly to the right page.
 Same idea in MongoDB → Indexes save time and resources.
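To make the difference between a collection scan and an index lookup concrete, here is a minimal pure-Python sketch. It uses a sorted list plus binary search as a stand-in for MongoDB's B-tree index; the sample documents and function names are invented for illustration.

```python
from bisect import bisect_left, bisect_right

users = [
    {"name": "Alice", "age": 25},
    {"name": "Bob", "age": 31},
    {"name": "Chen", "age": 25},
    {"name": "Dina", "age": 42},
]


def collection_scan(docs, age):
    """No index: every document is examined."""
    return [doc for doc in docs if doc["age"] == age]


# The "index": (key, position) pairs kept sorted by key.
age_index = sorted((doc["age"], pos) for pos, doc in enumerate(users))
index_keys = [key for key, _ in age_index]


def index_equality(age):
    """Binary search finds the run of matching keys without a full scan."""
    lo = bisect_left(index_keys, age)
    hi = bisect_right(index_keys, age)
    return [users[pos] for _, pos in age_index[lo:hi]]


def index_range_gt(age):
    """Range query (like $gt): everything after the last key <= age."""
    start = bisect_right(index_keys, age)
    return [users[pos] for _, pos in age_index[start:]]


print(index_equality(25))
print(index_range_gt(30))
```

Because the keys are kept sorted, the same structure serves equality lookups, range queries, and sorted output, which mirrors the benefits listed above.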

**Q7.** Describe the stages of the MongoDB aggregation pipeline.

**Ans:** Key Stages of the MongoDB Aggregation Pipeline
1. $match – Filtering Documents

Filters documents by a condition (like WHERE in SQL).

Reduces the number of documents moving to the next stage.

 Example:

db.orders.aggregate([
  { $match: { Region: "West" } }
])

2. $group – Grouping Data

Groups documents by a field and performs aggregations (SUM, AVG, COUNT).

 Example:

db.orders.aggregate([
  { $group: { _id: "$Category", totalSales: { $sum: "$Sales" } } }
])

3. $project – Reshaping Documents

Selects specific fields, renames them, creates computed fields.

 Example:

db.orders.aggregate([
  { $project: { orderId: 1, profitMargin: { $divide: ["$Profit", "$Sales"] } } }
])

4. $sort – Sorting Data

Orders documents by one or more fields.

 Example:

db.orders.aggregate([
  { $sort: { Sales: -1 } }   // Descending order
])

5. $limit – Limiting Results

Restricts the number of documents passed.

 Example:

db.orders.aggregate([
  { $limit: 5 }
])

6. $skip – Skipping Documents

Skips a given number of documents (useful for pagination).

Example:

db.orders.aggregate([
  { $skip: 10 }
])

7. $unwind – Breaking Down Arrays

Deconstructs an array field into multiple documents (like flattening).

 Example:

db.orders.aggregate([
  { $unwind: "$items" }
])

8. $lookup – Joining Collections

Performs a left outer join with another collection.

 Example:

db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customerId",
      foreignField: "_id",
      as: "customerDetails"
    }
  }
])

9. $count – Counting Documents

Counts the number of documents in the pipeline.

Example:

db.orders.aggregate([
  { $match: { Region: "West" } },
  { $count: "totalOrders" }
])

10. $facet – Multiple Pipelines in Parallel

Runs different pipelines in a single stage.

 Example:

db.orders.aggregate([
  {
    $facet: {
      totalByRegion: [{ $group: { _id: "$Region", total: { $sum: "$Sales" } } }],
      topOrders: [{ $sort: { Sales: -1 } }, { $limit: 3 }]
    }
  }
])

 Analogy

Think of the aggregation pipeline like a factory assembly line:

Raw materials (documents) go in,

Each machine (stage) performs a task (filtering, grouping, sorting, etc.),

The final product (result set) comes out.
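The assembly-line idea can be sketched in plain Python, with one small function per stage and the output of each stage fed into the next. This only imitates what $match, $group, $sort, and $limit do; the sample orders and helper names are assumptions made up for the example.

```python
from collections import defaultdict

orders = [
    {"orderId": 1, "Region": "West", "Category": "Office", "Sales": 120.0},
    {"orderId": 2, "Region": "East", "Category": "Tech", "Sales": 900.0},
    {"orderId": 3, "Region": "West", "Category": "Tech", "Sales": 450.0},
    {"orderId": 4, "Region": "West", "Category": "Office", "Sales": 80.0},
]


def match(docs, field, value):
    """Like $match: keep only documents whose field equals value."""
    return [doc for doc in docs if doc.get(field) == value]


def group_sum(docs, key_field, sum_field, out_field):
    """Like $group with $sum: one output document per distinct key."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc[key_field]] += doc[sum_field]
    return [{"_id": key, out_field: total} for key, total in totals.items()]


def sort_desc(docs, field):
    """Like $sort with -1: highest value first."""
    return sorted(docs, key=lambda doc: doc[field], reverse=True)


def limit(docs, count):
    """Like $limit: pass on only the first `count` documents."""
    return docs[:count]


# Run the stages in order, each consuming the previous stage's output.
west_orders = match(orders, "Region", "West")
sales_by_category = group_sum(west_orders, "Category", "Sales", "totalSales")
ranked = sort_desc(sales_by_category, "totalSales")
top_category = limit(ranked, 1)
print(top_category)
```

Placing the filter first means the grouping step only touches the West orders, which is the same reason $match is usually put early in a real pipeline.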

**Q8.** What is sharding in MongoDB? How does it differ from replication?

**ans:**
Sharding in MongoDB

Sharding is the process of splitting large datasets into smaller chunks and distributing them across multiple servers (called shards).

Each shard holds a portion of the data.

A shard key determines how data is divided.

A mongos router routes queries to the right shard(s).

Goal → Scalability (handle huge amounts of data & high throughput).

Example:
If you have 1 billion customer orders:

Shard 1 → Orders from Region = "West"

Shard 2 → Orders from Region = "East"

Shard 3 → Orders from Region = "South"

 Replication in MongoDB

Replication means making multiple copies of the same data across different servers.

One primary node → accepts writes.

Multiple secondary nodes → replicate (copy) the primary’s data.

If the primary fails, a secondary becomes the new primary (automatic failover).

Goal → High availability & fault tolerance.

Example:

Node A (Primary) → handles writes.

Node B, C (Secondaries) → keep exact copies.

If Node A fails → Node B takes over automatically.
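The difference between the two can be shown with a small placement sketch: sharding stores each document on exactly one shard, while replication stores every document on every node. The modulo-of-a-hash routing below is a deliberate simplification (MongoDB's hashed sharding assigns ranges of hashed values to shards), and the shard and node names are hypothetical.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]
REPLICAS = ["nodeA", "nodeB", "nodeC"]


def shard_for(shard_key_value: str) -> str:
    """Pick one shard from a stable hash of the shard key value."""
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def place_sharded(order_ids):
    """Sharding: each order lives on exactly one shard."""
    placement = {shard: [] for shard in SHARDS}
    for order_id in order_ids:
        placement[shard_for(order_id)].append(order_id)
    return placement


def place_replicated(order_ids):
    """Replication: every node holds a full copy of the data."""
    return {node: list(order_ids) for node in REPLICAS}


order_ids = [f"order-{n}" for n in range(12)]
sharded = place_sharded(order_ids)
replicated = place_replicated(order_ids)
print({shard: len(ids) for shard, ids in sharded.items()})
print({node: len(ids) for node, ids in replicated.items()})
```

In production the two are combined: each shard is itself a replica set, so data is both partitioned for scale and copied for availability.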

**Q9.** What is PyMongo, and why is it used?

**ans:**

PyMongo is the official Python driver (library) for working with MongoDB.

It provides the tools to connect Python applications to a MongoDB database, perform queries, insert/update/delete documents, and use advanced MongoDB features.

In simple words: PyMongo = Bridge between Python and MongoDB.

 Why is PyMongo used?

Database Connectivity

Connects Python code to MongoDB databases (MongoClient).

CRUD Operations

Allows performing Create, Read, Update, Delete on collections.

Query Execution

Lets you write MongoDB queries in Python (e.g., find(), aggregate()).

Scalability

Works with MongoDB’s advanced features like replication and sharding.

Ease of Use

Python-friendly API that integrates smoothly with libraries like Pandas, Flask, Django, etc.
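The points above can be sketched as a few small helpers built on real PyMongo calls (`MongoClient`, `insert_one`, `find`, `update_one`, `delete_one`). The database name `shop`, the collection name `users`, the URI, and the helper names are assumptions for illustration; running `connect()` requires the `pymongo` package and a reachable MongoDB server.

```python
def connect(uri="mongodb://localhost:27017"):
    """Return a MongoClient (needs `pip install pymongo` and a running server)."""
    from pymongo import MongoClient  # imported here so the helpers load without it
    return MongoClient(uri)


def add_user(users, name, age):
    """Create: insert one document and return its generated _id."""
    return users.insert_one({"name": name, "age": age}).inserted_id


def names_older_than(users, min_age):
    """Read: find() takes a filter and a projection, as in the shell."""
    cursor = users.find({"age": {"$gt": min_age}}, {"_id": 0, "name": 1})
    return [doc["name"] for doc in cursor]


def add_birthday(users, name):
    """Update: increment age by one; returns how many documents changed."""
    return users.update_one({"name": name}, {"$inc": {"age": 1}}).modified_count


def remove_user(users, name):
    """Delete: remove one matching document; returns how many were deleted."""
    return users.delete_one({"name": name}).deleted_count


# Typical use against a real server (not executed here):
#   client = connect()
#   users = client["shop"]["users"]
#   add_user(users, "Alice", 25)
#   print(names_older_than(users, 18))
```

Each helper takes the collection as an argument, so the same functions work against a local server, a replica set, or an Atlas cluster by changing only the connection URI.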

**Q10.** What are the ACID properties in the context of MongoDB transactions?

**Ans:** ACID Properties in MongoDB Transactions

ACID stands for Atomicity, Consistency, Isolation, Durability — fundamental principles that ensure reliable database transactions.

MongoDB has supported multi-document ACID transactions since version 4.0 (replica sets) and 4.2 (sharded clusters), so it can behave like a traditional relational database when needed.

1. Atomicity

 A transaction’s operations are all-or-nothing:

Either all operations succeed, or none are applied.

Example: Transferring money between two accounts — debit and credit happen together, or neither happens.

In MongoDB (PyMongo):

    with client.start_session() as session:
        # Leaving the inner block normally commits; an exception aborts the transaction.
        with session.start_transaction():
            accounts.update_one({"name": "Alice"}, {"$inc": {"balance": -100}}, session=session)
            accounts.update_one({"name": "Bob"}, {"$inc": {"balance": 100}}, session=session)

2. Consistency

 A transaction must take the database from one valid state to another, following rules like schema validation, constraints, and indexes.

 In MongoDB:

If a collection validator requires balance >= 0, a write that breaks the rule is rejected; aborting the transaction then leaves the database in its previous valid state.

3. Isolation

Transactions should be independent of each other.

One transaction’s intermediate changes are not visible to others until committed.

Prevents dirty reads, non-repeatable reads, and phantom reads (depending on isolation level).

In MongoDB:

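Changes made inside a transaction are not visible outside it until the transaction commits. What the transaction itself reads depends on its read concern ("local", "majority", or "snapshot"); "snapshot" gives the strongest guarantee, a consistent view of majority-committed data when the transaction commits with write concern "majority".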

4. Durability

Once a transaction is committed, the changes are permanent, even if there is a crash.

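MongoDB supports this through the write-ahead journal of the WiredTiger storage engine.

When the commit uses a journaled write concern (for example w: "majority", which waits for the journal by default), the change is written to the on-disk journal before the commit is acknowledged.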


**Q11.** What is the purpose of MongoDB’s explain() function?

**Ans:** The explain() function in MongoDB is used to analyze and understand how a query is executed by the database.
It helps developers and DBAs optimize queries by showing whether indexes are being used or if a full collection scan (COLLSCAN) is happening.

What it shows:

Query Execution Plan → How MongoDB processes the query internally.

Index Usage → Whether an index is being used or not.

Execution Stats → Number of documents scanned vs returned.

Performance Insights → Helps in optimizing queries for speed.

Example

Suppose you have a collection Orders and you run:

db.Orders.find({ Region: "West" }).explain("executionStats")

Key Modes of explain()

"queryPlanner" → Shows planned strategy for the query.

"executionStats" → Shows real execution performance.

"allPlansExecution" → Shows all considered plans with performance details.

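When explain() is called from PyMongo, the result is a nested dictionary, and the useful facts can be pulled out with a few lines of code. The sketch below assumes the classic output layout (`queryPlanner.winningPlan` with nested `inputStage`, plus `executionStats`); newer server versions may nest the plan differently, and some plans use a plural `inputStages` that this sketch ignores. The sample dictionary is hand-written and simplified, not real server output.

```python
def plan_stages(plan):
    """Walk a winningPlan dict and list its stage names, outermost first."""
    stages = []
    while plan:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")
    return stages


def summarize(explain_output):
    """Reduce an explain() result to the facts that matter for tuning."""
    stages = plan_stages(explain_output["queryPlanner"]["winningPlan"])
    stats = explain_output.get("executionStats", {})
    return {
        "stages": stages,
        "uses_index": "IXSCAN" in stages,
        "collection_scan": "COLLSCAN" in stages,
        "docs_examined": stats.get("totalDocsExamined"),
        "docs_returned": stats.get("nReturned"),
    }


# Hand-written, simplified sample for a query with no supporting index.
sample = {
    "queryPlanner": {
        "winningPlan": {"stage": "COLLSCAN", "filter": {"Region": {"$eq": "West"}}}
    },
    "executionStats": {
        "nReturned": 120,
        "totalKeysExamined": 0,
        "totalDocsExamined": 5000,
    },
}

print(summarize(sample))
```

A large gap between documents examined and documents returned, as in this sample, is the usual sign that an index on the filtered field would help.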
**Q12.** How does MongoDB handle schema validation?

**ans:** 1. Schema-less by Default

Collections can store documents with different fields and structures.

Example:

{ "name": "Alice", "age": 25 }
{ "username": "bob123", "email": "bob@test.com" }


Both documents can exist in the same collection.

2. JSON Schema Validation

MongoDB uses the $jsonSchema validator to define rules for fields, data types, and constraints.
You can set this when creating or updating a collection.

Example:
db.createCollection("Orders", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["OrderID", "CustomerName", "Sales"],
      properties: {
        OrderID: { bsonType: "int", description: "Must be an integer" },
        CustomerName: { bsonType: "string", description: "Must be a string" },
        Sales: { bsonType: "double", minimum: 0, description: "Must be ≥ 0" }
      }
    }
  },
  validationLevel: "strict",   // enforce for all inserts/updates
  validationAction: "error"    // reject invalid documents
})

3. Validation Options

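validationLevel

"off" → no validation

"moderate" → validates inserts, and updates to existing documents that already satisfy the rules; existing invalid documents can still be updated without being checked

"strict" → validates all inserts and updates (the default)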

validationAction

"warn" → allows invalid docs but logs a warning

"error" → blocks invalid docs

4. Examples

 Insert valid data:


db.Orders.insertOne({ OrderID: 101, CustomerName: "Alice", Sales: 500.5 })


 Insert invalid data:

db.Orders.insertOne({ OrderID: "ABC", Sales: -50 })


This insert fails: OrderID has the wrong type, the required CustomerName field is missing, and Sales is negative.

Summary

MongoDB is schema-less by default but supports schema validation.

Validation is done using JSON Schema rules.

Ensures data integrity and consistency without losing flexibility.
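To show the kind of checks a validator performs, here is a pure-Python imitation of the Orders rules above: required fields, expected types, and a minimum value. It is an illustration only, not how MongoDB evaluates $jsonSchema; the `RULES` layout and `validation_errors` function are invented for the example, and Python's `int` and `float` stand in for the BSON int and double types.

```python
RULES = {
    "required": ["OrderID", "CustomerName", "Sales"],
    "types": {"OrderID": int, "CustomerName": str, "Sales": float},
    "minimum": {"Sales": 0},
}


def validation_errors(doc, rules=RULES):
    """Return a list of rule violations; an empty list means the doc is valid."""
    errors = []
    for field in rules["required"]:
        if field not in doc:
            errors.append(f"missing required field: {field}")
    for field, expected in rules["types"].items():
        if field in doc and not isinstance(doc[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    for field, lowest in rules["minimum"].items():
        value = doc.get(field)
        if isinstance(value, (int, float)) and value < lowest:
            errors.append(f"{field} must be >= {lowest}")
    return errors


valid_order = {"OrderID": 101, "CustomerName": "Alice", "Sales": 500.5}
invalid_order = {"OrderID": "ABC", "Sales": -50}

print(validation_errors(valid_order))
print(validation_errors(invalid_order))
```

With validationAction "error" the server would reject the second document; with "warn" it would accept it and log the violation instead.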

**Q13.** What is the difference between a primary and a secondary node in a replica set?

**Ans:-** The primary node is the main server where all the write operations happen. Whenever you insert, update, or delete data, it always goes to the primary first. By default, your application also reads from the primary to make sure the data is the most up-to-date.

The secondary nodes are like backup servers. They don’t accept direct writes. Instead, they continuously copy the changes (via something called the oplog, or operation log) from the primary so that they always stay in sync. Secondaries can serve read requests if you configure your app to read from them — this helps distribute the load.

**Q14.** What security mechanisms does MongoDB provide for data protection?

**Ans:** Security Mechanisms in MongoDB

Authentication

Ensures that only authorized users can access the database.

MongoDB supports multiple authentication methods:

SCRAM (Salted Challenge Response Authentication Mechanism) → default username/password.

x.509 Certificates → for SSL/TLS-based client authentication.

LDAP & Kerberos → integration with enterprise identity systems.

Authorization (Role-Based Access Control - RBAC)

Once a user is authenticated, MongoDB uses roles to define what operations they can perform.

For example:

read → can only read data.

readWrite → can read and write.

dbAdmin → manage indexes and stats.

root → full access.

Encryption in Transit (TLS/SSL)

All communication between clients and servers can be encrypted with TLS/SSL, preventing eavesdropping or man-in-the-middle attacks.

Encryption at Rest

MongoDB Enterprise and Atlas provide Encrypted Storage Engines, so data stored on disk is encrypted automatically.

Uses AES-256 encryption.

Auditing

MongoDB can record who accessed what and when, providing accountability.

Helps meet compliance requirements (e.g., GDPR, HIPAA).

Network Security

Supports IP whitelisting and binding MongoDB to specific IPs.

You can configure firewalls, VPC peering, and private endpoints in Atlas to isolate your database from the public internet.

Field-Level and Client-Side Encryption (FLE / CSFLE)

Allows you to encrypt specific fields (like credit card numbers or SSNs).

Data is encrypted on the client side and MongoDB never sees the plain text.

**Q15.** Explain the concept of embedded documents and when they should be used.

**Ans:**  Embedded Documents in MongoDB

In MongoDB, instead of storing related data in multiple collections (like tables in SQL), you can embed documents inside a parent document.

This means you store nested data directly within a single document, using BSON (Binary JSON).

 Example:

Suppose you have a user with multiple addresses.

Using Embedded Documents (preferred in MongoDB):

{
  "name": "Alice",
  "email": "alice@example.com",
  "addresses": [
    {
      "type": "home",
      "city": "New York",
      "zip": "10001"
    },
    {
      "type": "work",
      "city": "Boston",
      "zip": "02115"
    }
  ]
}


 Here, the addresses are embedded documents inside the user document.

Using Referenced Documents (alternative):

// User document

{
  "name": "Alice",
  "email": "alice@example.com",
  "address_ids": [101, 102]
}

// Address collection
{
  "_id": 101,
  "type": "home",
  "city": "New York",
  "zip": "10001"
}
{
  "_id": 102,
  "type": "work",
  "city": "Boston",
  "zip": "02115"
}


 Here, addresses are stored separately, and we use references (like foreign keys in SQL).

 When to Use Embedded Documents

 Use embedded documents when:

Data is accessed together frequently

Example: A blog post with its comments.

Fetching one document gives everything in one query.

One-to-few relationships

Example: A customer with 2-3 phone numbers.

Data has strong ownership

Example: An order and its order items (items only exist with the order).

Improves read performance

Because no joins are needed.

 Avoid embedding when:

Data grows too large (MongoDB document size limit = 16 MB).

Example: Millions of comments for a viral post → better to use a separate collection.

Relationship is many-to-many or highly dynamic.

Example: Users and groups → better to use references.
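The trade-off can be seen in a small sketch that uses plain dictionaries in place of collections: with embedding, one fetched document already contains the addresses, while with references a second lookup is needed to resolve each id. The data mirrors the example above; the function names are made up for illustration.

```python
user_embedded = {
    "name": "Alice",
    "addresses": [
        {"type": "home", "city": "New York", "zip": "10001"},
        {"type": "work", "city": "Boston", "zip": "02115"},
    ],
}

user_referenced = {"name": "Alice", "address_ids": [101, 102]}

# Stand-in for a separate addresses collection, keyed by _id.
addresses_by_id = {
    101: {"_id": 101, "type": "home", "city": "New York", "zip": "10001"},
    102: {"_id": 102, "type": "work", "city": "Boston", "zip": "02115"},
}


def cities_embedded(user):
    """Embedded: everything needed is already inside the user document."""
    return [address["city"] for address in user["addresses"]]


def cities_referenced(user, addresses):
    """Referenced: each id must be resolved against the other collection."""
    return [addresses[address_id]["city"] for address_id in user["address_ids"]]


print(cities_embedded(user_embedded))
print(cities_referenced(user_referenced, addresses_by_id))
```

Both return the same cities, but the referenced version needs the extra lookup step, which in MongoDB means a second query or a $lookup stage.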

**Q16.** What is the purpose of MongoDB’s $lookup stage in aggregation?

**ans:** Purpose of MongoDB’s $lookup Stage in Aggregation

The $lookup stage in MongoDB’s aggregation pipeline performs a left outer join from the current collection to another collection in the same database.

It allows you to combine data from two different collections into a single result, similar to an SQL JOIN.

Syntax:
{
  $lookup: {
    from: "other_collection",     // The collection to join with
    localField: "field_in_current_collection",  
    foreignField: "field_in_other_collection",  
    as: "output_array_field"      // Name of the new array field for joined docs
  }
}

 Example:

Suppose we have two collections:

orders collection

{ "_id": 1, "item": "laptop", "customer_id": 101 }
{ "_id": 2, "item": "phone", "customer_id": 102 }
{ "_id": 3, "item": "tablet", "customer_id": 101 }


customers collection

{ "_id": 101, "name": "Alice", "city": "New York" }
{ "_id": 102, "name": "Bob", "city": "Boston" }

 Using $lookup:
db.orders.aggregate([
  {
    $lookup: {
      from: "customers",
      localField: "customer_id",
      foreignField: "_id",
      as: "customer_details"
    }
  }
])

Output:
{
  "_id": 1,
  "item": "laptop",
  "customer_id": 101,
  "customer_details": [
    { "_id": 101, "name": "Alice", "city": "New York" }
  ]
}
{
  "_id": 2,
  "item": "phone",
  "customer_id": 102,
  "customer_details": [
    { "_id": 102, "name": "Bob", "city": "Boston" }
  ]
}
{
  "_id": 3,
  "item": "tablet",
  "customer_id": 101,
  "customer_details": [
    { "_id": 101, "name": "Alice", "city": "New York" }
  ]
}

 Key Points about $lookup

Works like a LEFT OUTER JOIN (keeps documents from the left even if no match is found).

Stores joined documents inside an array field (as).

Helps avoid multiple queries by combining data in one pipeline.

Can be combined with $unwind if you want to flatten the joined array.
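The left outer join behaviour can be imitated in a few lines of Python, which makes it clear why unmatched orders are kept with an empty array. This is a sketch of the semantics only, not of how the server executes $lookup; the `lookup` function is hypothetical and the order with `customer_id` 999 is added here to show the unmatched case.

```python
orders = [
    {"_id": 1, "item": "laptop", "customer_id": 101},
    {"_id": 2, "item": "phone", "customer_id": 102},
    {"_id": 3, "item": "tablet", "customer_id": 101},
    {"_id": 4, "item": "monitor", "customer_id": 999},  # no matching customer
]

customers = [
    {"_id": 101, "name": "Alice", "city": "New York"},
    {"_id": 102, "name": "Bob", "city": "Boston"},
]


def lookup(left_docs, right_docs, local_field, foreign_field, as_field):
    """Attach matching right-hand documents to each left-hand document."""
    by_key = {}
    for doc in right_docs:
        by_key.setdefault(doc.get(foreign_field), []).append(doc)
    joined = []
    for doc in left_docs:
        combined = dict(doc)
        # Left outer join: keep the document even when nothing matches.
        combined[as_field] = list(by_key.get(doc.get(local_field), []))
        joined.append(combined)
    return joined


result = lookup(orders, customers, "customer_id", "_id", "customer_details")
for row in result:
    print(row["item"], row["customer_details"])
```

The joined field is always an array, so a single match still appears as a one-element list unless the pipeline flattens it with $unwind.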

**Q17.** What are some common use cases for MongoDB?

**ans:** Content Management Systems (CMS)

Websites, blogs, and media platforms store articles, videos, images, and metadata.

MongoDB’s flexible schema allows different content types to coexist without rigid table structures.
 Example: A news website storing articles with different formats (text, video, gallery).

 E-commerce Applications

Stores product catalogs, user profiles, orders, and inventory.

Products may have varying attributes (color, size, specifications), which fit well with MongoDB’s schema-less design.
 Example: Amazon-like product catalog with millions of items.

 Real-Time Analytics

Applications requiring high-speed inserts and querying, like IoT devices or financial dashboards.

MongoDB’s aggregation framework supports fast, complex data analysis.
 Example: Stock trading dashboards, IoT sensor monitoring.

 Social Media Platforms

Store user profiles, posts, comments, likes, and connections.

Data is highly relational but also unstructured (images, text, videos, etc.).
 Example: Facebook-like friend connections or Twitter-like feeds.

 Gaming Applications

Store user progress, in-game assets, leaderboards, and event logs.

Flexible schema makes it easy to track varying types of game data.
 Example: Multiplayer online games storing user profiles and live scores.

Mobile Applications

Mobile apps need scalable and flexible backends.

MongoDB integrates well with cloud (via MongoDB Atlas) for syncing across devices.

 Example: Food delivery apps (Zomato, UberEats) storing orders and live tracking.

 Big Data & Machine Learning

Handles large volumes of unstructured/semi-structured data.

Often used as a data lake before transformation into ML pipelines.
Example: Collecting and preprocessing customer behavior data for recommendation systems.

IoT (Internet of Things)

Devices generate time-series data.

MongoDB handles high-write throughput and supports geospatial queries.
Example: Smart home devices, connected cars storing telemetry.

 Healthcare Applications

Storing patient records, medical images, prescriptions in a flexible way.

Schema-less storage helps adapt to evolving healthcare data standards.
Example: Electronic health record (EHR) systems.

 Why MongoDB is chosen:

Schema flexibility → Handle structured & unstructured data.

Horizontal scalability → Sharding for big data.

High availability → Replica sets ensure fault tolerance.

Rich queries → Aggregation framework supports analytics.

**Q18.** What are the advantages of using MongoDB for horizontal scaling?

**ans:** Advantages of Using MongoDB for Horizontal Scaling

Sharding Support (Built-in Horizontal Scaling)

MongoDB natively supports sharding, which distributes data across multiple servers.

Each shard stores a subset of data, and queries are intelligently routed.
 Example: An e-commerce app with millions of products can spread data across shards (e.g., by product category).

Handles Large Datasets Easily

With horizontal scaling, MongoDB can manage terabytes to petabytes of data.

Instead of one huge server, many smaller servers share the load.

Improved Read and Write Throughput

Since data is distributed, multiple servers handle queries in parallel.

This means faster read and write performance, especially for high-traffic apps.
 Example: Social media apps where millions of users post, like, and comment simultaneously.

 Elastic Scalability

MongoDB allows you to add more shards anytime without downtime.

Useful for businesses with growing data and users.

 Cost Efficiency

Instead of buying one very powerful (and expensive) server, you can use multiple low-cost commodity servers.

MongoDB makes scaling more affordable.

 High Availability with Replication + Sharding

Shards are usually deployed with replica sets, meaning even if one server fails, others keep working.

This ensures fault tolerance + horizontal scaling together.

 Geographically Distributed Data

With zones, MongoDB sharding allows data to be stored close to users (e.g., US data in US servers, Europe data in EU servers).

Improves latency and user experience globally.

**Q19.** How do MongoDB transactions differ from SQL transactions?

**ans:**  Nature of Database

SQL (Relational DBs):

Built around ACID transactions from the start.

Multi-row and multi-table transactions are the norm.

MongoDB (NoSQL, Document DB):

Originally designed for single-document atomicity (each document update was already atomic).

Multi-document transactions were added later (MongoDB 4.0+) to support complex use cases.

 Scope of Transactions

SQL: Transactions can span multiple tables and rows easily.

MongoDB: Transactions can span multiple documents and collections, but performance may degrade for very large, multi-document operations.

 Example:

SQL → Transfer money across two accounts (two rows in two tables).

MongoDB → Transfer money across two users (two documents, possibly in different collections).

 Performance Impact

SQL: Optimized for transactions; heavy use is common.

MongoDB: Transactions are more costly, since MongoDB is optimized for single-document operations. Overusing transactions can reduce performance.

Data Model Differences

SQL: Normalized model → data spread across multiple tables → transactions often required.

MongoDB: Denormalized model → related data often stored in a single document → fewer transactions needed.

Use Cases

SQL: Banking systems, inventory management → heavy reliance on transactions.

MongoDB: Real-time apps, content management, IoT → transactions used only when truly necessary.

**Q20.** What are the main differences between capped collections and regular collections?

**Ans:** Storage Size

Capped Collection:

Has a fixed size defined at creation.

Once full, it automatically overwrites the oldest documents (like a circular buffer).

Regular Collection:

Grows dynamically as you insert data.

No fixed size unless explicitly limited by disk space.

Document Deletion

Capped Collection:

You cannot delete individual documents (except by dropping the entire collection).

Old documents are automatically removed when space is needed.

Regular Collection:

You can freely insert, update, and delete documents.

 Document Order

Capped Collection:

Maintains documents in the insertion order.

Queries return documents in the order they were added.

Regular Collection:

No guaranteed order unless you explicitly use sorting in queries.

 Use Cases

Capped Collection:

Logging, real-time analytics, sensor/IoT data, caching — where you only care about the most recent data.

Regular Collection:

General-purpose use cases — transactions, e-commerce, user profiles, etc.

Performance

Capped Collection:

Faster inserts and reads because MongoDB doesn’t need to move documents around (fixed space, no fragmentation).

Regular Collection:

Inserts/updates may cause fragmentation and require more overhead.
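The circular-buffer behaviour of a capped collection can be sketched with Python's `collections.deque` and a fixed `maxlen`. This is an analogy only: a real capped collection is sized in bytes (with an optional maximum document count), whereas this sketch caps the number of entries. The log entries are made up.

```python
from collections import deque

# A "capped" log that keeps at most three entries.
recent_events = deque(maxlen=3)

for sequence in range(1, 6):
    # Once the buffer is full, appending silently drops the oldest entry.
    recent_events.append({"seq": sequence, "msg": f"event {sequence}"})

# Entries come back in insertion order, oldest surviving entry first.
remaining = [entry["seq"] for entry in recent_events]
print(remaining)
```

Five events were written but only the last three survive, which is the behaviour wanted for logs and recent-activity feeds.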

**Q21.** What is the purpose of the $match stage in MongoDB’s aggregation pipeline?

**ans:** The $match stage is used to filter documents in the aggregation pipeline, similar to the WHERE clause in SQL.

It only passes documents that match the given condition(s) to the next stage in the pipeline.

This reduces the number of documents processed in later stages, making the pipeline faster and more efficient.
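To make the filtering concrete, here is a minimal sketch that evaluates a `$match` condition against an in-memory array. The `matches` helper is a toy written for this example (it handles only equality, `$gt`, and `$lt`) and is not how MongoDB implements the stage; the `orders` data is hypothetical.

```javascript
// Toy evaluator for a small subset of $match (equality, $gt, $lt).
// This is an illustration only, not MongoDB's implementation.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, value]) => {
        if (op === "$gt") return doc[field] > value;
        if (op === "$lt") return doc[field] < value;
        throw new Error("unsupported operator: " + op);
      });
    }
    return doc[field] === cond; // a plain value means equality
  });
}

// Hypothetical sample data.
const orders = [
  { _id: 1, status: "shipped", amount: 250 },
  { _id: 2, status: "pending", amount: 90 },
  { _id: 3, status: "shipped", amount: 40 },
  { _id: 4, status: "shipped", amount: 500 },
];

// The stage as it would appear inside db.orders.aggregate([...]).
const matchStage = { $match: { status: "shipped", amount: { $gt: 100 } } };

const passed = orders.filter((doc) => matches(doc, matchStage.$match));
console.log(passed.map((d) => d._id)); // only documents 1 and 4 reach the next stage
```

Half of the documents are dropped before any later stage runs, which is why $match is usually placed as early in the pipeline as possible.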

**Q23.** How can you secure access to a MongoDB database?

**Ans:** Ways to Secure Access to MongoDB
1. Enable Authentication & Authorization

By default, a self-managed MongoDB deployment starts with access control disabled, so clients can connect without a username or password.

Create admin users and enforce role-based access control (RBAC).

mongosh
use admin
db.createUser({
    user: "adminUser",
    pwd: passwordPrompt(),  // prompt for the password instead of hard-coding it
    roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
})


Start MongoDB with authentication enabled:

mongod --auth --port 27017 --dbpath /data/db

2. Use Strong Passwords and Roles

Assign the minimum necessary roles (principle of least privilege).

Example: Give a reporting app only read access, not write/delete.
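The least-privilege idea can be sketched as a small permission check. The role-to-action table below is a deliberately simplified subset of what the built-in `read` and `readWrite` roles grant, and the user `reportingApp` and database `sales` are hypothetical names chosen for the example.

```javascript
// Simplified subset of the actions granted by two built-in roles
// (the real lists are longer).
const ROLE_ACTIONS = {
  read: ["find"],
  readWrite: ["find", "insert", "update", "remove"],
};

// Hypothetical user definition, shaped like the argument to db.createUser().
const reportingUser = {
  user: "reportingApp",
  roles: [{ role: "read", db: "sales" }],
};

// True if any of the user's roles on `db` grants `action`.
function isAllowed(user, db, action) {
  return user.roles.some(
    (r) => r.db === db && (ROLE_ACTIONS[r.role] || []).includes(action)
  );
}

console.log(isAllowed(reportingUser, "sales", "find"));   // true
console.log(isAllowed(reportingUser, "sales", "remove")); // false
console.log(isAllowed(reportingUser, "hr", "find"));      // false
```

The reporting app can query the one database it needs and nothing else, so a leaked credential cannot be used to modify or delete data.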

3. Enable Network Security

Bind MongoDB to specific IP addresses (avoid 0.0.0.0).

# mongod.conf
net:
  bindIp: 127.0.0.1,192.168.1.100


Use firewalls (UFW, iptables, security groups in cloud) to allow only trusted IPs.

4. Use TLS/SSL Encryption

Encrypt traffic between clients and MongoDB servers to prevent sniffing.

Example (in mongod.conf, MongoDB 4.2 and later; older releases use the equivalent net.ssl settings):

net:
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/mongodb.pem
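On the client side, the connection string also has to request TLS. The sketch below only builds such a string; the helper name `buildMongoUri`, the host `db.example.com`, and the credentials are placeholders for illustration, while `tls` and `authSource` are standard connection string options.

```javascript
// Builds a MongoDB connection string that requires TLS.
// Host, user, and password below are hypothetical placeholders.
function buildMongoUri({ user, password, host, port = 27017, db = "admin" }) {
  // Credentials must be percent-encoded if they contain special characters.
  const creds = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const params = new URLSearchParams({ tls: "true", authSource: "admin" });
  return `mongodb://${creds}@${host}:${port}/${db}?${params.toString()}`;
}

const uri = buildMongoUri({
  user: "adminUser",
  password: "p@ss/word",
  host: "db.example.com",
});
console.log(uri);
```

Percent-encoding matters here because characters such as `@` and `/` in a password would otherwise be parsed as part of the URI structure.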

5. Encrypt Data at Rest

Use the WiredTiger encrypted storage engine (available in MongoDB Enterprise) or disk-level/filesystem encryption.

6. Enable Auditing & Logging

Turn on MongoDB auditing (available in MongoDB Enterprise and Atlas) to track who accessed what.

Regularly review logs for suspicious activity.

7. Use Replica Sets and Authentication Between Nodes

For replica sets/sharded clusters, enable internal authentication using keyfiles or x.509 certificates so only trusted nodes can join the cluster.

8. Regular Updates & Patches

Always keep MongoDB updated to the latest stable version to fix security vulnerabilities.

9. MongoDB Atlas Security (if using cloud)

IP access lists (only listed addresses may connect)

Built-in authentication

TLS enabled by default

Encryption at rest and in transit

**Q21.** What is MongoDB’s WiredTiger storage engine, and why is it important?

**Ans:** What is it?

WiredTiger is the default storage engine in MongoDB since version 3.2.

A storage engine is the component that manages how data is stored, compressed, and retrieved from disk.

Before WiredTiger, MongoDB used the MMAPv1 storage engine, which was less efficient.

🔹 Why is WiredTiger important?

Document-Level Concurrency

Unlike MMAPv1 (which locked entire collections), WiredTiger supports document-level locking.

This means multiple operations can happen simultaneously on different documents → better performance for high-concurrency applications.
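The effect of lock granularity can be illustrated with a toy calculation. This is a simplified model, not how either storage engine is implemented (WiredTiger actually uses optimistic concurrency control rather than a literal lock per document); the `roundsNeeded` helper and the sample writes are hypothetical.

```javascript
// Toy model: count how many sequential "rounds" a batch of writes needs.
// Writes that contend for the same lock must run one after another;
// writes holding different locks can run in the same round.
function roundsNeeded(writes, lockKeyOf) {
  const perLock = new Map();
  for (const w of writes) {
    const key = lockKeyOf(w);
    perLock.set(key, (perLock.get(key) || 0) + 1);
  }
  return Math.max(0, ...perLock.values());
}

// Hypothetical writes, all against one collection but mostly different documents.
const writes = [
  { coll: "orders", docId: 1 },
  { coll: "orders", docId: 2 },
  { coll: "orders", docId: 3 },
  { coll: "orders", docId: 1 },
];

console.log(roundsNeeded(writes, (w) => w.coll));                 // 4 with one lock per collection
console.log(roundsNeeded(writes, (w) => `${w.coll}:${w.docId}`)); // 2 with one lock per document
```

With a collection-wide lock all four writes queue up, while per-document granularity only serializes the two writes that touch the same document.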

Compression for Efficiency

WiredTiger supports snappy (the default for collections), zlib, and, from MongoDB 4.2, zstd compression.

This reduces disk space usage and the amount of I/O per read or write, at the cost of some extra CPU work.
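A quick way to see why compression pays off for document data is to compress a batch of similar JSON documents with Node's built-in `zlib` module. This only illustrates the general effect on repetitive data; it does not reproduce WiredTiger's on-disk format, and the sample documents are hypothetical.

```javascript
const zlib = require("zlib");

// Hypothetical batch of similar documents, serialized as JSON text.
const docs = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  status: "shipped",
  city: "Delhi",
}));
const raw = Buffer.from(JSON.stringify(docs));

// zlib is one of the algorithms WiredTiger can use; this shows only the
// general effect on repetitive data, not WiredTiger's block format.
const compressed = zlib.deflateSync(raw);
const restored = zlib.inflateSync(compressed);

console.log(`raw: ${raw.length} bytes, compressed: ${compressed.length} bytes`);
console.log(restored.equals(raw)); // true: compression is lossless
```

Field names and repeated values make up most of the bytes in such data, which is why document collections tend to compress well.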

High Performance

Faster read/write throughput thanks to modern caching and concurrency control.

Uses a write-ahead log (WAL) for crash recovery.

In-Memory and On-Disk Balancing

WiredTiger maintains a large cache in memory for frequently accessed data.

This speeds up reads while still writing changes safely to disk.

Transactions Support

Starting from MongoDB 4.0, WiredTiger enables multi-document ACID transactions on replica sets (extended to sharded clusters in 4.2), making MongoDB behave more like traditional SQL databases for critical apps.

Checkpoints & Crash Recovery

WiredTiger periodically writes checkpoints, consistent snapshots of the data, to disk (every 60 seconds by default).

In case of a crash, MongoDB can quickly recover using the journal + checkpoints.
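The recovery idea (start from the last checkpoint, then replay the journal) can be sketched with a toy key-value store. The `ToyStore` class is invented for this example and does not reflect WiredTiger's actual data structures.

```javascript
// Toy model of checkpoint + journal recovery
// (not WiredTiger's actual data structures).
class ToyStore {
  constructor() {
    this.memory = {};     // current in-memory state
    this.checkpoint = {}; // last durable snapshot
    this.journal = [];    // writes logged since the last checkpoint
  }

  write(key, value) {
    this.journal.push({ key, value }); // log first (write-ahead)
    this.memory[key] = value;
  }

  takeCheckpoint() {
    this.checkpoint = { ...this.memory };
    this.journal = []; // entries before the checkpoint are no longer needed
  }

  // Simulates a restart: rebuild state from the checkpoint plus the journal.
  recover() {
    this.memory = { ...this.checkpoint };
    for (const { key, value } of this.journal) {
      this.memory[key] = value;
    }
    return this.memory;
  }
}

const store = new ToyStore();
store.write("stock", 10);
store.takeCheckpoint();
store.write("stock", 7);      // happens after the checkpoint
store.memory = {};            // crash: in-memory contents vanish
console.log(store.recover()); // { stock: 7 }
```

The write made after the checkpoint survives the simulated crash because it was journaled before being applied, which is the point of write-ahead logging.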

Scalability

Optimized for multi-core CPUs and large memory systems.

Makes MongoDB scale better in modern hardware and cloud environments.

🔹 Real-World Example

Imagine you’re building an e-commerce app with thousands of users placing orders at the same time.

With MMAPv1 → if one user updates their order, the entire collection might be locked, slowing down everyone else.

With WiredTiger → only the specific document being updated is locked, so other users can still browse products, update carts, and place orders simultaneously → smoother user experience.