# MongoDB Theoretical Questions

---

#### **1. What are the key differences between SQL and NoSQL databases?**

**Answer:**
SQL databases are relational: they store data in tables with predefined schemas (fixed columns and types) and are queried with Structured Query Language (SQL). NoSQL databases are non-relational and cover document, key-value, wide-column, and graph models; MongoDB is a document database with a flexible schema, well suited to semi-structured data. SQL databases excel at strong consistency and complex joins, while NoSQL databases typically emphasize horizontal scalability and schema flexibility for large-scale applications.

---

#### **2. What makes MongoDB a good choice for modern applications?**

**Answer:**
MongoDB supports a flexible schema, allowing dynamic and nested documents, which aligns well with agile development and evolving data structures. It handles large volumes of data efficiently, scales horizontally via sharding, and supports powerful features like indexing, aggregation, replication, and transactions. These qualities make it ideal for cloud-native, real-time, and big data applications.

---

#### **3. Explain the concept of collections in MongoDB.**

**Answer:**
In MongoDB, a collection is analogous to a table in SQL but more flexible. It holds multiple documents (records) whose structures may vary. By default, collections do not enforce a schema, so documents in the same collection can have different fields. This enables rapid development and easy iteration. Collections are created automatically when the first document is inserted and can be indexed and queried efficiently.
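
As a minimal sketch of documents with different fields sharing one collection, the example below uses plain Python dictionaries as stand-ins for documents. The `products` list and all field names are invented for illustration; with PyMongo, the same dictionaries could be passed to `insert_many`.

```python
# Plain dicts stand in for MongoDB documents; the list stands in for a collection.
# Field names and values are invented for illustration.
products = [
    {"_id": 1, "name": "Desk", "price": 120.0},
    {"_id": 2, "name": "Lamp", "price": 35.5, "tags": ["lighting", "office"]},
    {"_id": 3, "name": "E-book", "download": {"format": "epub", "size_mb": 2.4}},
]

# Documents in one collection need not share the same fields.
field_sets = [sorted(doc.keys()) for doc in products]
for fields in field_sets:
    print(fields)

# A query such as find({"price": {"$gt": 100}}) only matches documents that
# actually have the field; emulated here with a list comprehension.
expensive = [doc["name"] for doc in products if doc.get("price", 0) > 100]
print(expensive)
```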

---

#### **4. How does MongoDB ensure high availability using replication?**

**Answer:**
MongoDB uses replica sets to achieve high availability. A replica set is a group of mongod instances that maintain the same dataset. It includes a primary node (which handles writes) and secondary nodes (which replicate data from the primary). If the primary fails, an automatic election promotes a secondary to primary, ensuring minimal downtime.

---

#### **5. What are the main benefits of MongoDB Atlas?**

**Answer:**
MongoDB Atlas is a fully managed cloud database service. It offers automated backups, real-time monitoring, global distribution, scalability, and security features like encryption and access control. Atlas removes the burden of infrastructure management, making it easy to deploy, scale, and maintain MongoDB clusters in AWS, Azure, or GCP.

---

#### **6. What is the role of indexes in MongoDB, and how do they improve performance?**

**Answer:**
Indexes in MongoDB speed up queries by letting the server locate matching documents without scanning every document in a collection, much like the index of a book. Common types include single-field, compound, text, and geospatial indexes. Without a suitable index, MongoDB must perform a full collection scan, which becomes slow on large datasets. The trade-off is that each index uses additional storage and adds work to every write.
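
The difference between a collection scan and an index lookup can be sketched in pure Python. This is only an analogy: a dictionary plays the role of the index here, whereas MongoDB maintains its indexes as B-tree structures on the server. The sample `orders` data is made up.

```python
# Pure-Python analogy: a dict keyed by field value plays the role of an index.
orders = [
    {"_id": i, "Region": region}
    for i, region in enumerate(["West", "East", "West", "South", "East", "West"])
]

def collection_scan(docs, region):
    """Check every document, as a query without a usable index must."""
    return [d["_id"] for d in docs if d["Region"] == region]

def build_index(docs, field):
    """Map each field value to the _ids of the documents that hold it."""
    index = {}
    for d in docs:
        index.setdefault(d[field], []).append(d["_id"])
    return index

region_index = build_index(orders, "Region")

# Both approaches return the same documents; the index avoids touching
# documents that cannot match.
print(collection_scan(orders, "West"))
print(region_index.get("West", []))

# With PyMongo, a comparable single-field index would be created with:
#   collection.create_index([("Region", 1)])
```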

---

#### **7. Describe the stages of the MongoDB aggregation pipeline.**

**Answer:**
The aggregation pipeline processes data through a sequence of stages; each stage transforms the documents it receives and passes the result to the next. Key stages include:

- **$match** : filters documents
- **$group** : groups documents and computes aggregates such as sums and counts
- **$project** : reshapes documents
- **$sort**, **$limit**, **$skip** : control output order and size
- **$lookup** : joins documents from another collection

This allows powerful, SQL-like analytical processing on MongoDB data.
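
The stages listed above can be combined as in the following sketch. The pipeline itself is ordinary Python data that PyMongo would accept in `collection.aggregate(pipeline)`; the hand-written emulation underneath it only illustrates what each stage does and is not how the server executes it. The sample documents and the `total_sales` field name are assumptions.

```python
from collections import defaultdict

# The pipeline is ordinary Python data: a list of single-key stage documents.
pipeline = [
    {"$match": {"Sales": {"$gt": 100}}},
    {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}},
    {"$sort": {"total_sales": -1}},
    {"$limit": 2},
]
# With PyMongo this would run as: collection.aggregate(pipeline)

# Hand-written emulation of the same four stages on sample documents.
orders = [
    {"Region": "West", "Sales": 250.0},
    {"Region": "East", "Sales": 90.0},
    {"Region": "West", "Sales": 150.0},
    {"Region": "South", "Sales": 300.0},
    {"Region": "East", "Sales": 120.0},
]

# $match: keep only orders with Sales above 100
matched = [d for d in orders if d["Sales"] > 100]

# $group: sum Sales per Region
totals = defaultdict(float)
for d in matched:
    totals[d["Region"]] += d["Sales"]
grouped = [{"_id": region, "total_sales": total} for region, total in totals.items()]

# $sort (descending) followed by $limit
ranked = sorted(grouped, key=lambda d: d["total_sales"], reverse=True)
result = ranked[:2]
print(result)
```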

---

#### **8. What is sharding in MongoDB? How does it differ from replication?**

**Answer:**
Sharding splits large datasets across multiple servers (shards) to enable horizontal scaling. Each shard holds a portion of the data. It improves performance and storage capacity. Replication, by contrast, copies data across nodes for redundancy and availability. Sharding spreads the load; replication ensures reliability.
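
A rough sketch of the contrast, in pure Python: under sharding each document lands on exactly one shard, while under replication every member holds the full dataset. The shard names, the modulo routing, and the member names are simplifying assumptions; MongoDB actually assigns ranges of shard key (or hashed shard key) values to shards.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def pick_shard(shard_key_value):
    """Route a document to a shard from a hash of its shard key.

    Simplification: MongoDB assigns ranges of key (or hashed key) values
    to shards; the modulo here only illustrates the idea.
    """
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Sharding: each document lives on exactly one shard.
placement = {}
for customer_id in range(1, 10):
    placement.setdefault(pick_shard(customer_id), []).append(customer_id)
print(placement)

# Replication: every member of a replica set holds a full copy of the data.
replica_set = {
    member: list(range(1, 10))
    for member in ["primary", "secondary1", "secondary2"]
}
print({member: len(docs) for member, docs in replica_set.items()})
```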

---

#### **9. What is PyMongo, and why is it used?**

**Answer:**
PyMongo is the official Python driver for MongoDB. It allows Python applications to interact with MongoDB databases by providing methods to perform CRUD operations, aggregations, indexing, and connection management. It's essential for building Python-based data pipelines and applications that use MongoDB.

---

#### **10. What are the ACID properties in the context of MongoDB transactions?**

**Answer:**
MongoDB supports ACID (Atomicity, Consistency, Isolation, Durability) guarantees for multi-document transactions, available on replica sets since version 4.0 and on sharded clusters since version 4.2. Operations on a single document have always been atomic. Within a transaction, either all operations succeed or none do (atomicity), which preserves data integrity in use cases such as financial systems where consistency is critical.

---

#### **11. What is the purpose of MongoDB’s explain() function?**

**Answer:**
The **explain()** function analyzes query performance by showing how MongoDB executes a query. It details stages like collection scans, index use, number of documents scanned, and query execution time. This helps developers optimize queries and understand why certain operations may be slow.

---

#### **12. How does MongoDB handle schema validation?**

**Answer:**
MongoDB does not enforce a schema by default, but it supports schema validation through the **$jsonSchema** operator. Validation rules are defined at the collection level and can enforce required fields, data types, value ranges, or string patterns. This combines flexibility with control, ensuring data integrity without a rigid schema.
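
A validator is just a document, so it can be sketched as a Python dictionary. The collection name `orders`, the field names, and the helper `missing_required` are hypothetical; the helper checks only the `required` rule on the client side and is no substitute for server-side validation.

```python
# A validator document using the $jsonSchema query operator.
# Collection and field names are hypothetical.
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["order_id", "region", "sales"],
        "properties": {
            "order_id": {"bsonType": "string"},
            "region": {"enum": ["West", "East", "Central", "South"]},
            "sales": {"bsonType": ["double", "int"], "minimum": 0},
        },
    }
}

# With PyMongo, the rules could be attached when the collection is created:
#   db.create_collection("orders", validator=order_validator)

def missing_required(doc, validator):
    """Tiny client-side check covering only the 'required' rule."""
    required = validator["$jsonSchema"]["required"]
    return [field for field in required if field not in doc]

print(missing_required({"order_id": "CA-1001", "region": "West"}, order_validator))
```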

---

#### **13. What is the difference between a primary and a secondary node in a replica set?**

**Answer:**
The primary node in a replica set receives all write operations. Secondary nodes replicate the primary's data and can serve reads when the client's read preference allows it (by default, reads go to the primary). If the primary fails, an automatic election promotes a secondary to become the new primary, maintaining high availability.

---

#### **14. What security mechanisms does MongoDB provide for data protection?**

**Answer:**
MongoDB offers multiple security features:

* Authentication (SCRAM, LDAP)
* Role-based access control (RBAC)
* Encryption at rest and in transit (TLS/SSL)
* Auditing and IP whitelisting
* Field-level encryption

These help ensure that data is accessed and modified only by authorized users.

---

#### **15. Explain the concept of embedded documents and when they should be used.**

**Answer:**
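Embedded documents are nested documents stored inside a parent document. Because related data is kept together, it can be read or written in a single operation. They are ideal for one-to-few relationships (e.g., a blog post with a handful of comments). Used excessively, however, they increase document size and complexity; a single document is limited to 16 MB, so data that grows without bound is better stored in a separate collection and referenced.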
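
The two modelling options can be compared with plain Python dictionaries standing in for documents. All names and values are invented for illustration; the point is that the embedded form answers "which comments belong to this post?" from one document, while the referenced form needs a second lookup.

```python
# Embedded: comments live inside the post, so one read returns everything.
post_embedded = {
    "_id": 1,
    "title": "Intro to MongoDB",
    "comments": [
        {"author": "ana", "text": "Helpful, thanks."},
        {"author": "raj", "text": "Clear examples."},
    ],
}

# Referenced: comments are separate documents that point back to the post.
post_referenced = {"_id": 1, "title": "Intro to MongoDB"}
comments = [
    {"_id": 10, "post_id": 1, "author": "ana", "text": "Helpful, thanks."},
    {"_id": 11, "post_id": 1, "author": "raj", "text": "Clear examples."},
    {"_id": 12, "post_id": 2, "author": "lee", "text": "Different post."},
]

# Embedded form: the authors come straight out of the post document.
embedded_authors = [c["author"] for c in post_embedded["comments"]]

# Referenced form: a second pass over the comments is needed.
referenced_authors = [
    c["author"] for c in comments if c["post_id"] == post_referenced["_id"]
]

print(embedded_authors == referenced_authors)
```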

---

#### **16. What is the purpose of MongoDB’s $lookup stage in aggregation?**

**Answer:**
The **$lookup** stage performs a left outer join between collections, combining documents based on matching fields. It is useful when related data is stored in separate collections, for example fetching user details from a **users** collection for each order in an **orders** collection, similar to a SQL join.
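
As a sketch of the left outer join semantics, the stage document below uses the four standard `$lookup` fields, and `emulate_lookup` is a hypothetical pure-Python helper that mimics the behaviour on sample data. Note that an order without a matching user is kept and simply gets an empty list.

```python
# The stage as it would appear in a pipeline; collection and field names
# are hypothetical.
lookup_stage = {
    "$lookup": {
        "from": "users",
        "localField": "user_id",
        "foreignField": "_id",
        "as": "user",
    }
}
# With PyMongo: db.orders.aggregate([lookup_stage])

users = [{"_id": 1, "name": "Ana"}, {"_id": 2, "name": "Raj"}]
orders = [
    {"_id": 100, "user_id": 1, "total": 40},
    {"_id": 101, "user_id": 2, "total": 15},
    {"_id": 102, "user_id": 9, "total": 70},  # no matching user
]

def emulate_lookup(left, right, local_field, foreign_field, as_field):
    """Left outer join: every left document is kept; matches go into a list."""
    joined = []
    for doc in left:
        matches = [r for r in right if r[foreign_field] == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined

joined_orders = emulate_lookup(orders, users, "user_id", "_id", "user")
for row in joined_orders:
    print(row["_id"], [u["name"] for u in row["user"]])
```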

---

#### **17. What are some common use cases for MongoDB?**

**Answer:**
MongoDB is ideal for:

* Real-time analytics
* Content management systems
* IoT applications
* Product catalogs
* Social media platforms
* Mobile and web apps

Its flexibility and scalability make it suitable for handling rapidly changing and large-scale data.

---

#### **18. What are the advantages of using MongoDB for horizontal scaling?**

**Answer:**
MongoDB supports horizontal scaling via sharding, which distributes data across multiple machines. This improves read/write throughput, manages large datasets efficiently, and avoids the limitations of vertical scaling (adding more CPU/RAM to one machine). It’s essential for handling big data workloads.

---

#### **19. How do MongoDB transactions differ from SQL transactions?**

**Answer:**
MongoDB added multi-document ACID transactions in v4.0 (replica sets) and extended them to sharded clusters in v4.2. SQL databases offer mature ACID transactions across multiple rows and tables by default. Because single-document operations in MongoDB are already atomic and multi-document transactions add overhead, transactions should be reserved for cases where atomicity across documents is essential.

---

#### **20. What are the main differences between capped collections and regular collections?**

**Answer:**
Capped collections are fixed-size, circular collections that overwrite the oldest data when full. They preserve insertion order and are useful for logging or caching. Regular collections grow dynamically and can be modified freely. Capped collections improve performance for append-only workloads.
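
The overwrite-oldest behaviour resembles a bounded queue. The sketch below uses `collections.deque` as an in-memory analogy only; a real capped collection is bounded by size in bytes and, optionally, by a maximum document count. The collection name and sizes in the final comment are arbitrary examples.

```python
from collections import deque

# In-memory analogy: a deque with maxlen drops the oldest entry when full.
# (A real capped collection is limited by size in bytes and, optionally,
# by document count.)
recent_logs = deque(maxlen=3)
for i in range(1, 6):
    recent_logs.append({"seq": i, "msg": f"event {i}"})

# Insertion order is preserved and only the newest three entries remain.
print([entry["seq"] for entry in recent_logs])  # → [3, 4, 5]

# With PyMongo, a capped collection could be created with, for example:
#   db.create_collection("logs", capped=True, size=1_048_576, max=1000)
```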

---

#### **21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?**

**Answer:**
The **$match** stage filters documents based on specific conditions, similar to a SQL WHERE clause. It’s often used at the beginning of a pipeline to reduce the number of documents passed to subsequent stages, thereby improving performance and narrowing focus.

---

#### **22. How can you secure access to a MongoDB database?**

**Answer:**
You can secure MongoDB by enabling authentication, using role-based access control, encrypting data in transit with TLS, setting firewall rules, enabling IP whitelisting, and regularly auditing access. MongoDB Atlas simplifies security configuration with built-in tools and alerts.

---

#### **23. What is MongoDB’s WiredTiger storage engine, and why is it important?**

**Answer:**
WiredTiger is MongoDB’s default storage engine (since version 3.2). It offers document-level concurrency control, data compression, and efficient memory usage, and it uses checkpoints together with journaling for durability. Compared with the older MMAPv1 engine, it improves write throughput under concurrent load, which matters for performance-intensive applications.

---

In [22]:
## Practical Questions
#1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.

# Step 1: Import required libraries
import pandas as pd
from mongita import MongitaClientMemory  # Use in-memory MongoDB-like DB

# Step 2: Load the CSV file with encoding fix
df = pd.read_csv('/content/superstore.csv', encoding='ISO-8859-1')

# Step 3: Create a Mongita client (in-memory)
client = MongitaClientMemory()
db = client["SuperstoreDB"]
collection = db["Orders"]

# Step 4: Insert data into the collection
data = df.to_dict(orient='records')
collection.insert_many(data)

print(f"Successfully inserted {len(data)} documents into the 'Orders' collection.")

In [None]:
#2. Retrieve and print all documents from the Orders collection.

# Iterate over the full cursor; chain .limit(5) onto find() for a quick preview instead.
for doc in collection.find():
    print(doc)

In [None]:
#3. Count and display the total number of documents in the Orders collection.

count = collection.count_documents({})
print("Total number of documents:", count)

In [None]:
#4. Write a query to fetch all orders from the "West" region.

west_orders = collection.find({"Region": "West"})
for order in west_orders:
    print(order)

In [None]:
#5. Write a query to find orders where Sales is greater than 500.

high_sales = collection.find({"Sales": {"$gt": 500}})
for order in high_sales:
    print(order)

In [None]:
#6. Fetch the top 3 orders with the highest Profit.

top_profit_orders = sorted(collection.find(), key=lambda x: x.get("Profit", 0), reverse=True)[:3]
for order in top_profit_orders:
    print(order)

In [None]:
#7. Update all orders with Ship Mode as "First Class" to "Premium Class".

result = collection.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)
print(f"Updated {result.modified_count} documents.")

In [None]:
#8. Delete all orders where Sales is less than 50.

result = collection.delete_many({"Sales": {"$lt": 50}})
print(f"Deleted {result.deleted_count} documents.")

In [None]:
#9. Use aggregation to group orders by Region and calculate total sales per region.

from collections import defaultdict

# Client-side equivalent of a $group on Region with a $sum of Sales.
# Iterate over every document (no limit), otherwise the totals are wrong.
sales_by_region = defaultdict(float)
for doc in collection.find():
    region = doc["Region"]
    sales = doc.get("Sales", 0)
    sales_by_region[region] += sales

# Display the result
for region, total_sales in sales_by_region.items():
    print(f"Region: {region}, Total Sales: {round(total_sales, 2)}")

In [None]:
#10. Fetch all distinct values for Ship Mode from the collection.

# Scan every document (no limit) so that no Ship Mode value is missed.
distinct_ship_modes = set()
for doc in collection.find():
    if "Ship Mode" in doc:
        distinct_ship_modes.add(doc["Ship Mode"])

print("Distinct Ship Modes:", sorted(distinct_ship_modes))

In [None]:
#11. Count the number of orders for each category.

from collections import Counter

# Count over every document (no limit); each document is one order line.
category_counts = Counter()
for doc in collection.find():
    category = doc.get("Category")
    if category:
        category_counts[category] += 1

for category, count in category_counts.items():
    print(f"Category: {category}, Order Count: {count}")