## Theoretical Questions and Answers

1. What are the key differences between SQL and NoSQL databases?
- SQL databases are relational, enforce a fixed schema, support complex joins, and are queried with Structured Query Language. NoSQL databases are non-relational and schema-flexible; they use document, key-value, graph, or wide-column models, are designed to scale horizontally, and handle unstructured data better.

2. What makes MongoDB a good choice for modern applications?
- MongoDB offers schema flexibility, a native JSON-like document format (stored as BSON), and built-in replication and sharding for high availability and horizontal scale. This suits applications whose data model changes often.

3. Explain the concept of collections in MongoDB.
- Collections are groups of MongoDB documents and are equivalent to tables in relational databases. They store related JSON-like documents without a fixed schema.

4. How does MongoDB ensure high availability using replication?
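- MongoDB uses replica sets, each consisting of one primary and one or more secondary nodes. Writes are replicated asynchronously from the primary to the secondaries, which provides data redundancy. If the primary becomes unavailable, the remaining members elect a new primary (automatic failover).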

5. What are the main benefits of MongoDB Atlas?
- Atlas is MongoDB’s fully managed cloud service, providing automated backups, monitoring, scaling, and security, reducing operational overhead.

6. What is the role of indexes in MongoDB, and how do they improve performance?
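- Indexes let the query planner locate matching documents without scanning the whole collection, much like indexes in SQL databases. MongoDB offers several index types, including single-field, compound, multikey, text, and geospatial indexes. The trade-off is extra storage and slower writes, because every index must be updated when documents change.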

7. Describe the stages of the MongoDB aggregation pipeline.
- Stages include $match (filter), $group (aggregate), $sort, $project (reshape), $lookup (join), etc., chaining operations to transform and summarize data.
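
The chaining described above can be sketched as a plain Python list of stage documents. This is a minimal illustration, assuming the `Region` and `Sales` fields used in the practical cells of this notebook; the function name `top_regions_pipeline` and its parameters are hypothetical.

```python
def top_regions_pipeline(min_sales, limit):
    """Build a pipeline: filter, group by region, sort, limit, then reshape."""
    return [
        # $match: keep only orders at or above the sales threshold
        {"$match": {"Sales": {"$gte": min_sales}}},
        # $group: one output document per region, with totals
        {"$group": {
            "_id": "$Region",
            "TotalSales": {"$sum": "$Sales"},
            "Orders": {"$sum": 1},
        }},
        # $sort: highest total first
        {"$sort": {"TotalSales": -1}},
        # $limit: keep the top N regions
        {"$limit": limit},
        # $project: rename _id to Region and drop _id from the output
        {"$project": {"_id": 0, "Region": "$_id", "TotalSales": 1, "Orders": 1}},
    ]


pipeline = top_regions_pipeline(min_sales=100, limit=3)
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)  # ['$match', '$group', '$sort', '$limit', '$project']
```

With a live connection, the list would be passed to `orders_col.aggregate(pipeline)`. Each stage receives the output of the one before it, so the order of the list matters.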

8. What is sharding in MongoDB? How does it differ from replication?
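- Sharding horizontally partitions data across multiple servers for scalability, so each shard holds a different subset of the data. Replication copies the same data to multiple nodes for redundancy and availability. The two are usually combined: each shard is itself deployed as a replica set.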

9. What is PyMongo, and why is it used?
- PyMongo is the Python driver for MongoDB, allowing you to connect to and interact with MongoDB databases from Python code.

10. What are the ACID properties in the context of MongoDB transactions?
- Operations on a single document are always atomic in MongoDB. For operations that span multiple documents or collections, MongoDB supports multi-document transactions with ACID properties: atomicity, consistency, isolation, and durability.
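
A multi-document transaction in PyMongo might look like the sketch below. The helper `move_order` and its arguments are hypothetical; `start_session`, `start_transaction`, and the `session=` keyword are the PyMongo calls being illustrated. It assumes the server is a replica set or sharded cluster, since transactions are not available on a standalone server.

```python
def move_order(client, source, target, order_id):
    """Copy one document to `target` and delete it from `source` atomically.

    `client` is a pymongo.MongoClient; `source` and `target` are collections.
    Returns False when no document with `order_id` exists.
    """
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception propagates out of it.
        with session.start_transaction():
            doc = source.find_one({"_id": order_id}, session=session)
            if doc is None:
                return False
            target.insert_one(doc, session=session)
            source.delete_one({"_id": order_id}, session=session)
    return True
```

Every operation inside the block passes `session=session`; an operation that omits it runs outside the transaction.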

11. What is the purpose of MongoDB’s explain() function?
- It provides detailed information on how queries are executed, including index usage and query plan, helping optimize performance.
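
One way to read an explain result is to walk the winning plan and list its stages: `COLLSCAN` indicates a full collection scan, while `IXSCAN` indicates an index was used. The helper `plan_stages` below is hypothetical, and `sample_explain` is a hand-written illustration of the general shape rather than real server output. The exact layout varies by server version, and this sketch follows only a linear chain of `inputStage` entries.

```python
def plan_stages(plan):
    """Return the stage names of a winning plan, outermost first."""
    stages = []
    while plan is not None:
        stages.append(plan.get("stage"))
        plan = plan.get("inputStage")  # None when the innermost stage is reached
    return stages


# Hand-written example of the shape only (assumption, not captured output).
sample_explain = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
        }
    }
}

stages = plan_stages(sample_explain["queryPlanner"]["winningPlan"])
print(stages)  # ['FETCH', 'IXSCAN']
```

Against a live collection, the input would come from something like `orders_col.find({"Region": "West"}).explain()`, typically after creating an index with `orders_col.create_index("Region")`.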

12. How does MongoDB handle schema validation?
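- MongoDB supports optional schema validation via JSON Schema (the `$jsonSchema` operator), which enforces document structure at the collection level. Validation runs on inserts and updates, and the collection's `validationAction` determines whether an invalid document is rejected (`error`) or only logged (`warn`).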
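
As a rough sketch, a validator can be attached when a collection is created. The rules below (required `Region` and `Sales` fields, non-negative `Sales`) and the helper name `create_validated_collection` are assumptions chosen for illustration; `create_collection` with `validator` and `validationAction` options is the PyMongo call being shown.

```python
# Assumed rules for illustration: Region must be a string,
# Sales must be a non-negative number.
order_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales"],
        "properties": {
            "Region": {"bsonType": "string"},
            "Sales": {"bsonType": ["int", "long", "double"], "minimum": 0},
        },
    }
}


def create_validated_collection(db, name):
    """Create `name` in `db` (a pymongo Database) with the validator attached."""
    return db.create_collection(
        name,
        validator=order_validator,
        validationAction="error",  # reject documents that fail validation
    )
```

With this in place, an insert such as `{"Region": "West", "Sales": -5}` would be rejected by the server instead of being stored.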

13. What is the difference between a primary and a secondary node in a replica set?
- The primary node receives all write operations; secondaries replicate data from the primary and can serve read queries depending on configuration.

14. What security mechanisms does MongoDB provide for data protection?
- Authentication, authorization (role-based access control), encryption at rest and in transit, auditing, and network segmentation.

15. Explain the concept of embedded documents and when they should be used.
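- Embedded documents store related data inside a parent document, which reduces the need for joins and improves read performance. They fit best when the related data is read together with the parent, the relationship is one-to-one or one-to-few, and the embedded data stays bounded in size, since a single document cannot exceed 16 MB. Use references instead when the related data is large, grows without bound, or is shared by many parents.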
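
The difference between embedding and referencing can be shown with two plain Python dictionaries. All field names and values here are made up for illustration.

```python
# Embedded: customer and line items live inside the order document,
# so one read returns everything needed to display the order.
embedded_order = {
    "_id": "order-1",
    "customer": {"name": "Ada", "city": "Seattle"},
    "items": [
        {"product": "Desk", "qty": 1, "price": 250.0},
        {"product": "Lamp", "qty": 2, "price": 40.0},
    ],
}

# Referenced: the order stores only identifiers; the related documents
# live in other collections and need extra queries or a $lookup.
referenced_order = {
    "_id": "order-1",
    "customer_id": "cust-1",
    "item_ids": ["item-1", "item-2"],
}


def order_total(order):
    """Sum qty * price over the embedded line items."""
    return sum(item["qty"] * item["price"] for item in order["items"])


print(order_total(embedded_order))  # 330.0
```

The total can be computed from the embedded document alone, whereas the referenced form would first need the item documents to be fetched.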

16. What is the purpose of MongoDB’s $lookup stage in aggregation?
- $lookup performs left outer joins between collections in aggregation pipelines, combining related data.
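
A `$lookup` stage could be written as follows. The `Customers` collection and the `Customer ID` field are assumptions for illustration; neither appears elsewhere in this notebook.

```python
def orders_with_customers_pipeline():
    """Join each order to its customer document (left outer join)."""
    return [
        {"$lookup": {
            "from": "Customers",          # collection to join with (assumed name)
            "localField": "Customer ID",  # field on the order (assumed name)
            "foreignField": "_id",        # field on the customer document
            "as": "customer",             # output array of matching customers
        }},
        # Flatten the array; keep orders that matched no customer,
        # which preserves the left outer join behaviour.
        {"$unwind": {"path": "$customer", "preserveNullAndEmptyArrays": True}},
    ]


lookup_pipeline = orders_with_customers_pipeline()
print(lookup_pipeline[0]["$lookup"]["as"])  # customer
```

Without the `$unwind` stage, each order would carry a `customer` array that is empty when no match exists.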

17. What are some common use cases for MongoDB?
- Content management, real-time analytics, IoT applications, mobile apps, catalogues, and any use case needing schema flexibility.

18. What are the advantages of using MongoDB for horizontal scaling?
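- Data is partitioned across shards by a shard key, the balancer redistributes data automatically as it grows, and capacity is added by adding shards rather than upgrading a single server. Because each shard can be a replica set, the cluster stays highly available while it scales.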

19. How do MongoDB transactions differ from SQL transactions?
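- Relational databases have long supported multi-statement ACID transactions as a core feature. MongoDB has always guaranteed atomicity for single-document operations, added multi-document ACID transactions for replica sets in version 4.0, and extended them to sharded clusters in version 4.2. MongoDB transactions carry limitations, such as a default 60-second lifetime, and the document model usually favors embedding related data so that transactions are needed less often.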

20. What are the main differences between capped collections and regular collections?
- Capped collections have a fixed maximum size, preserve insertion order, and automatically overwrite the oldest documents once full, which makes inserts fast. Regular collections grow dynamically and have no such limit.
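
Creating a capped collection from PyMongo might look like this sketch. The helper name, the collection name in the usage note, and the size limits are arbitrary choices for illustration; `capped`, `size`, and `max` are the standard creation options.

```python
def create_event_log(db, name, max_bytes=1_048_576, max_docs=1000):
    """Create a capped collection in `db` (a pymongo Database).

    `size` is the byte limit and is required for capped collections;
    `max` optionally limits the number of documents as well.
    """
    return db.create_collection(name, capped=True, size=max_bytes, max=max_docs)
```

For example, `create_event_log(db, "EventLog")` would give a collection that keeps at most about 1 MiB or 1000 documents, whichever limit is reached first.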

21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?
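- It filters documents so that only those matching the specified conditions pass to the next stage. Placing $match as early as possible reduces the work done by later stages and, at the start of a pipeline, lets it use indexes.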

22. How can you secure access to a MongoDB database?
- Enforce authentication, use encryption, restrict IP access, implement role-based permissions.

23. What is MongoDB’s WiredTiger storage engine, and why is it important?
- WiredTiger has been the default storage engine since MongoDB 3.2. It supports document-level concurrency control, data compression, and multiversion concurrency control (MVCC), which together improve write throughput and storage efficiency.



In [None]:
# 1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB

import pymongo
import pandas as pd

client = pymongo.MongoClient("mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.5.6")
db = client["superstore_db"]
orders_col = db["Orders"]

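# Many Superstore CSV exports are not UTF-8. "latin-1" is an assumption here;
# adjust the encoding to match your file.
df = pd.read_csv("Superstore.csv", encoding="latin-1")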

data_dict = df.to_dict("records")
orders_col.insert_many(data_dict)
print(f"Inserted {len(data_dict)} documents into Orders collection.")


In [None]:
# 2. Retrieve and print all documents from the Orders collection.
for doc in orders_col.find():
    print(doc)


In [None]:
# 3. Count and display the total number of documents in the Orders collection.
print("Total documents:", orders_col.count_documents({}))


In [None]:
# 4. Write a query to fetch all orders from the "West" region.
for doc in orders_col.find({"Region": "West"}):
    print(doc)


In [None]:
# 5. Write a query to find orders where Sales is greater than 500.
for doc in orders_col.find({"Sales": {"$gt": 500}}):
    print(doc)


In [None]:
# 6. Fetch the top 3 orders with the highest Profit.
for doc in orders_col.find().sort("Profit", -1).limit(3):
    print(doc)


In [None]:
# 7. Update all orders with Ship Mode as "First Class" to "Premium Class".
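result = orders_col.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}}
)
# Report how many documents matched the filter and how many were changed.
print("Matched:", result.matched_count, "Modified:", result.modified_count)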


In [None]:
# 8. Delete all orders where Sales is less than 50.
result = orders_col.delete_many({"Sales": {"$lt": 50}})
# Report how many documents were removed.
print("Deleted:", result.deleted_count)


In [None]:
# 9. Use aggregation to group orders by Region and calculate total sales per region.
pipeline = [
    {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}}
]
for doc in orders_col.aggregate(pipeline):
    print(doc)


In [None]:
# 10. Fetch all distinct values for Ship Mode from the collection.
print(orders_col.distinct("Ship Mode"))


In [None]:
# 11. Count the number of orders for each category.
pipeline = [
    {"$group": {"_id": "$Category", "Count": {"$sum": 1}}}
]
for doc in orders_col.aggregate(pipeline):
    print(doc)
