# MongoDB Assignment

# Theoretical Questions

1. What are the key differences between SQL and NoSQL databases?
- SQL: Structured tables, fixed schema, relational, vertical scaling

- NoSQL: Flexible schema, document-based/key-value, horizontal scaling

2. What makes MongoDB a good choice for modern applications?
- Schema flexibility
- High scalability
- Fast performance
- JSON-like document storage
- Easy integration with modern tech stacks

3. Explain the concept of collections in MongoDB.
- A collection is a group of documents
- Similar to a table in SQL
- No fixed schema required

4. How does MongoDB ensure high availability using replication?
- Uses Replica Sets
- Multiple copies of data across nodes
- Automatic failover if primary fails

5. What are the main benefits of MongoDB Atlas?
- Fully managed cloud database
- Auto-scaling
- Backup & recovery
- High security
- Global deployment

6. What is the role of indexes in MongoDB, and how do they improve performance?
- Speed up query execution
- Reduce data scanning
- Improve read performance
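
As a minimal sketch of the points above, an index could be created from Python roughly as follows. The `Region` and `Sales` field names come from the Superstore data used in the practical section; the function name and index name are hypothetical, and `collection` is assumed to be a PyMongo `Collection`.

```python
# Sort directions used in index specifications (PyMongo exposes the same
# values as pymongo.ASCENDING and pymongo.DESCENDING).
ASCENDING = 1
DESCENDING = -1


def create_region_sales_index(collection):
    """Create a compound index on Region (ascending) and Sales (descending).

    `collection` is assumed to be a pymongo Collection. The function name
    and the index name are hypothetical.
    """
    keys = [("Region", ASCENDING), ("Sales", DESCENDING)]
    # create_index returns the name of the index.
    return collection.create_index(keys, name="region_sales_idx")
```

A query that filters on `Region` and sorts by `Sales` can then use this index instead of scanning every document.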

7. Describe the stages of the MongoDB aggregation pipeline.
- $match : filter data
- $group : group documents
- $project : reshape fields
- $sort : sort results
- $lookup : join collections
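
The stages listed above run in the order they appear in the pipeline, each receiving the output of the previous one. A short sketch, assuming the `Region` and `Sales` fields of the Superstore data (the function name and the `min_sales` parameter are illustrative):

```python
def sales_by_region_pipeline(min_sales=0):
    """Return an aggregation pipeline as a list of stage documents."""
    return [
        # Keep only orders at or above the sales threshold.
        {"$match": {"Sales": {"$gte": min_sales}}},
        # Sum sales per region.
        {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}},
        # Rename _id to Region and keep only the fields of interest.
        {"$project": {"_id": 0, "Region": "$_id", "TotalSales": 1}},
        # Largest totals first.
        {"$sort": {"TotalSales": -1}},
    ]


pipeline = sales_by_region_pipeline(min_sales=100)
print([next(iter(stage)) for stage in pipeline])
# ['$match', '$group', '$project', '$sort']
```

With a PyMongo collection, the list would be passed as `orders.aggregate(pipeline)`.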

8. What is sharding in MongoDB? How does it differ from replication?
- Sharding: Distributes data across servers (scalability)

- Replication: Copies data for fault tolerance

9. What is PyMongo, and why is it used?
- Python driver for MongoDB

- Used to connect Python apps with MongoDB

10. What are the ACID properties in the context of MongoDB transactions?
- Atomicity – all or nothing

- Consistency – valid state

- Isolation – transactions don’t interfere

- Durability – data persists after commit

11. What is the purpose of MongoDB’s explain() function?
- Shows query execution plan
- Helps in performance optimization
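
A minimal sketch of reading a query plan from Python. It assumes `collection` is a PyMongo `Collection`; the helper name is hypothetical, and the exact shape of the explain output can vary with server version and deployment, so the lookups are defensive.

```python
def winning_plan(collection, query):
    """Return the winning plan section of a find() query's explain output.

    Returns None if the expected keys are not present.
    """
    plan = collection.find(query).explain()
    return plan.get("queryPlanner", {}).get("winningPlan")
```

A plan stage of `COLLSCAN` indicates a full collection scan, while `IXSCAN` indicates that an index was used.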

12. How does MongoDB handle schema validation?
- Uses JSON Schema validation
- Enforces rules on documents
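
As an illustration, a `$jsonSchema` validator could be attached when a collection is created. This is a sketch only: the collection name `ValidatedOrders`, the helper name, and the chosen rules are assumptions, and `db` is assumed to be a PyMongo `Database`.

```python
# Validation rules: every document must be an object with a string Region
# and a non-negative numeric Sales value.
orders_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["Region", "Sales"],
        "properties": {
            "Region": {"bsonType": "string"},
            "Sales": {"bsonType": ["double", "int", "long"], "minimum": 0},
        },
    }
}


def create_validated_collection(db, name="ValidatedOrders"):
    """Create a collection whose inserts and updates are checked against the schema."""
    return db.create_collection(name, validator=orders_validator)
```

With this validator in place, inserting a document without a `Region` field would be rejected by the server.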

13. What is the difference between a primary and a secondary node in a replica set?
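- Primary: Receives all writes; serves reads by default

- Secondary: Replicates the primary's data; can serve reads if the read preference allows it; can be elected primary on failover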
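
A client usually names the replica set and a read preference in its connection string. The sketch below only builds such a string; the host names and the replica set name `rs0` are placeholders, and the helper name is hypothetical.

```python
def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a MongoDB connection string for a replica set."""
    host_list = ",".join(hosts)
    return (
        f"mongodb://{host_list}/"
        f"?replicaSet={replica_set}&readPreference={read_preference}"
    )


uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    "rs0",
    read_preference="secondaryPreferred",
)
print(uri)
```

With `secondaryPreferred`, reads go to a secondary when one is available and fall back to the primary otherwise; writes always go to the primary.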

14. What security mechanisms does MongoDB provide for data protection?
- Authentication
- Role-based access control (RBAC)
- Encryption (at rest & in transit)
- Auditing

15. Explain the concept of embedded documents and when they should be used.
- Documents inside documents
- Used when data is tightly related
- Improves read performance
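
A small illustration of an embedded document and of a filter that reaches into it with dot notation. All values and the nested field names (`Customer`, `Items`) are made up for this example and are not part of the Superstore CSV.

```python
# One order document with the customer and line items embedded in it.
order = {
    "Order ID": "EXAMPLE-001",
    "Region": "West",
    "Customer": {"Name": "Example Customer", "Segment": "Consumer"},
    "Items": [
        {"Product": "Chair", "Quantity": 2, "Sales": 300.0},
        {"Product": "Desk", "Quantity": 1, "Sales": 450.0},
    ],
}

# Filter on a field inside the embedded Customer document (dot notation).
consumer_filter = {"Customer.Segment": "Consumer"}

# Total sales can be computed from the embedded items without a second query.
order_total = sum(item["Sales"] for item in order["Items"])
print(order_total)  # 750.0
```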

16. What is the purpose of MongoDB’s $lookup stage in aggregation?
- Performs join between collections
- Similar to SQL JOIN
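
A sketch of a `$lookup` stage, assuming a second, hypothetical `Customers` collection that shares a `Customer ID` field with `Orders`. The output field name `CustomerInfo` is also an assumption.

```python
def customer_lookup_stage():
    """Return a $lookup stage joining each order to its matching customers."""
    return {
        "$lookup": {
            "from": "Customers",           # collection to join with (hypothetical)
            "localField": "Customer ID",   # field in the input (Orders) documents
            "foreignField": "Customer ID", # field in the Customers documents
            "as": "CustomerInfo",          # array field that receives the matches
        }
    }


# Filter first so that fewer documents need to be joined.
lookup_pipeline = [{"$match": {"Region": "West"}}, customer_lookup_stage()]
```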

17. What are some common use cases for MongoDB?
- Web & mobile apps
- Real-time analytics
- IoT
- Content management systems

18. What are the advantages of using MongoDB for horizontal scaling?
- Built-in sharding
- Handles large data volumes
- High performance across clusters

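19. How do MongoDB transactions differ from SQL transactions?
- Single-document operations in MongoDB are always atomic
- MongoDB supports multi-document ACID transactions on replica sets since v4.0 and on sharded clusters since v4.2
- In SQL databases, multi-statement transactions are the standard way to keep related tables consistent; in MongoDB, embedding related data in one document often removes the need for them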
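
A minimal sketch of a multi-document transaction with PyMongo. It assumes `client` is a `MongoClient` connected to a replica set (transactions are not available on a standalone server) and that `source` and `target` are collections; the function name is hypothetical.

```python
def archive_order(client, source, target, order_filter):
    """Move one matching document from `source` to `target` atomically.

    Both operations commit together when the inner block exits normally;
    if an exception is raised, the transaction is aborted.
    """
    with client.start_session() as session:
        with session.start_transaction():
            doc = source.find_one_and_delete(order_filter, session=session)
            if doc is not None:
                target.insert_one(doc, session=session)
    return doc
```

Passing `session=session` to each operation is what ties it to the transaction; operations called without it run outside the transaction.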

20. What are the main differences between capped collections and regular collections?
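- Capped: Fixed maximum size; preserves insertion order and overwrites the oldest documents when full (FIFO); supports high-throughput inserts

- Regular: Grows as needed; documents stay until they are explicitly deleted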
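
A sketch of creating a capped collection with PyMongo. The collection name `EventLog`, the size limits, and the helper name are assumptions; `db` is assumed to be a PyMongo `Database`.

```python
def create_capped_log(db, name="EventLog", size_bytes=1024 * 1024, max_docs=1000):
    """Create a capped collection limited by total size and document count.

    `size` (in bytes) is required for capped collections; `max` optionally
    limits the number of documents as well.
    """
    return db.create_collection(name, capped=True, size=size_bytes, max=max_docs)
```

Once either limit is reached, each new insert replaces the oldest document.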

21. What is the purpose of the $match stage in MongoDB’s aggregation pipeline?
- Filters documents in aggregation
- Works like WHERE clause

22. How can you secure access to a MongoDB database?
- Enable authentication
- Assign least-privilege roles (RBAC)
- Restrict network access with IP allowlisting
- SSL/TLS encryption
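
Several of these measures show up in the connection string itself. The sketch below builds a URI with credentials, an authentication database, and TLS enabled; the user name, password, and host are placeholders, and the helper name is hypothetical.

```python
from urllib.parse import quote_plus


def secure_uri(username, password, host, auth_source="admin"):
    """Build a connection string with escaped credentials and TLS enabled."""
    # Special characters in credentials must be percent-escaped.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}/?authSource={auth_source}&tls=true"


print(secure_uri("app_user", "p@ss/word", "db.example.com:27017"))
# mongodb://app_user:p%40ss%2Fword@db.example.com:27017/?authSource=admin&tls=true
```

In practice the credentials would come from environment variables or a secrets manager rather than being written into the notebook.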

23. What is MongoDB’s WiredTiger storage engine, and why is it important?
- Default MongoDB engine
- Supports compression
- Document-level locking
- High performance & concurrency



# Practical Questions

In [5]:
!pip install pymongo


Collecting pymongo
  Downloading pymongo-4.16.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (10.0 kB)
Collecting dnspython<3.0.0,>=2.6.1 (from pymongo)
  Downloading dnspython-2.8.0-py3-none-any.whl.metadata (5.7 kB)
Downloading pymongo-4.16.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (1.7 MB)
Downloading dnspython-2.8.0-py3-none-any.whl (331 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.8.0 pymongo-4.16.0


In [6]:
from pymongo import MongoClient
import pandas as pd

# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")

# Create / use database
db = client["SuperstoreDB"]

# Create / use collection
orders = db["Orders"]

print("Connected to MongoDB")


Connected to MongoDB


In [None]:
# 1.Write a Python script to load the Superstore dataset from a CSV file into MongoDB.
import pandas as pd
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders = db["Orders"]

# Load CSV
df = pd.read_csv("superstore.csv", encoding="latin1")

# Convert DataFrame to dictionary
data = df.to_dict(orient="records")

# Insert into MongoDB
orders.insert_many(data)

print("Data inserted successfully")


In [None]:
# 2.Retrieve and print all documents from the Orders collection.
for order in orders.find():
    print(order)

In [None]:
# 3.Count and display the total number of documents in the Orders collection.
count = orders.count_documents({})
print("Total documents:", count)

In [None]:
# 4.Write a query to fetch all orders from the "West" region.
west_orders = orders.find({"Region": "West"})

for order in west_orders:
    print(order)


In [None]:
#  5.Write a query to find orders where Sales is greater than 500.
high_sales = orders.find({"Sales": {"$gt": 500}})

for order in high_sales:
    print(order)


In [None]:
# 6.Fetch the top 3 orders with the highest Profit.
top_profit = orders.find().sort("Profit", -1).limit(3)

for order in top_profit:
    print(order)

In [None]:
# 7.Update all orders with Ship Mode as "First Class" to "Premium Class".
orders.update_many(
    {"Ship Mode": "First Class"},
    {"$set": {"Ship Mode": "Premium Class"}},
)
print("Ship Mode updated")

In [None]:
# 8.Delete all orders where Sales is less than 50.
# Field names are case-sensitive ("Sales"), and the operator is "$lt" (less than)
orders.delete_many({"Sales": {"$lt": 50}})
print("Low sales orders deleted")

In [None]:
# 9.Use aggregation to group orders by Region and calculate total sales per region.
pipeline=[
    {
        "$group":{
            "_id":"$Region",
            "TotalSales":{"$sum":"$Sales"}
        }
    }
]

result= orders.aggregate(pipeline)

for r in result:
  print(r)

In [None]:
# 10.Fetch all distinct values for Ship Mode from the collection.
ship_modes=orders.distinct("Ship Mode")
print("Ship Modes:", ship_modes)

In [None]:
# 11.Count the number of orders for each category.
pipeline=[
    {
        "$group":{
            "_id":"$Category",
            "OrderCount":{"$sum":1}
        }
    }
]

result=orders.aggregate(pipeline)
for r in result:
  print(r)
