# MongoDB Assignment - Theoretical Questions

1. **Key differences between SQL and NoSQL databases**  
SQL databases store data in tables with rows and columns, while NoSQL databases like MongoDB store data in collections of documents that use a flexible JSON-like structure (BSON). SQL requires a fixed schema, whereas MongoDB allows dynamic schemas. SQL typically scales vertically by increasing server capacity, but MongoDB supports horizontal scaling through sharding. SQL databases provide ACID transactions across multiple rows and tables by default. MongoDB guarantees atomicity for single-document operations and has supported multi-document ACID transactions since version 4.0. Reads from the primary are strongly consistent by default, while reads from secondaries can be eventually consistent.

2. **What makes MongoDB a good choice for modern applications**  
MongoDB is ideal for modern applications because it offers a flexible schema, high scalability, rich queries, built-in replication, and native JSON support, making it easier to handle dynamic and large-scale data.

3. **Concept of collections in MongoDB**  
A collection in MongoDB is a container for documents, similar to a table in SQL, but without a fixed schema, allowing different documents to have different structures.

4. **High availability using replication**  
MongoDB ensures high availability through replica sets, which have one primary node that handles writes and multiple secondary nodes that replicate data. If the primary fails, an automatic election promotes a secondary to primary.

5. **Benefits of MongoDB Atlas**  
MongoDB Atlas is a fully managed cloud service offering automated scaling, backups, monitoring, global distribution, and built-in security.

6. **Role of indexes in MongoDB**  
Indexes improve query performance by reducing the number of scanned documents. They can be single-field, compound, text, or geospatial. Without indexes, MongoDB must scan all documents to find results.
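
As a minimal sketch of creating indexes from PyMongo: the helper below assumes a collection with `Region` and `Sales` fields, as in the Superstore data used later. The function name and the choice of indexed fields are illustrative assumptions.

```python
def create_order_indexes(collection):
    """Create a single-field and a compound index and return their names.

    `collection` is assumed to be a PyMongo Collection, e.g. `orders`.
    """
    # Single-field ascending index on Region.
    region_index = collection.create_index("Region")
    # Compound index: Region ascending, then Sales descending.
    compound_index = collection.create_index([("Region", 1), ("Sales", -1)])
    return [region_index, compound_index]
```

With a live connection, `create_order_indexes(orders)` would return the index names generated by the server.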

7. **Stages of aggregation pipeline**  
The MongoDB aggregation pipeline processes data through stages such as `$match` for filtering, `$group` for grouping and aggregations, `$project` for reshaping documents, `$sort` for sorting, `$limit` and `$skip` for pagination, and `$lookup` for joining collections.
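
The stages can be combined as in the sketch below, which builds a pipeline as plain Python data. The function name, the threshold, and the output field names are assumptions chosen for illustration; `Region` and `Sales` are the Superstore fields used later in this notebook.

```python
def top_regions_pipeline(min_sales, top_n):
    """Build a pipeline: filter, group by region, reshape, sort, limit."""
    return [
        {"$match": {"Sales": {"$gt": min_sales}}},                         # filter
        {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}},  # aggregate
        {"$project": {"_id": 0, "Region": "$_id", "TotalSales": 1}},       # reshape
        {"$sort": {"TotalSales": -1}},                                     # order
        {"$limit": top_n},                                                 # truncate
    ]


pipeline = top_regions_pipeline(min_sales=500, top_n=3)
# With a live collection: for doc in orders.aggregate(pipeline): print(doc)
```

Stage order matters: each stage receives the output of the previous one, so filtering first keeps the later stages cheap.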

8. **Sharding vs replication**  
Sharding distributes data across multiple servers to handle large datasets, improving scalability. Replication creates multiple copies of data for availability and fault tolerance.

9. **What is PyMongo**  
PyMongo is the official Python driver for MongoDB, enabling CRUD operations, aggregations, and administrative tasks from Python.

10. **ACID properties in MongoDB transactions**  
Atomicity ensures all operations complete or none do. Consistency ensures data rules are maintained. Isolation ensures concurrent transactions don’t interfere. Durability ensures committed data remains permanent even after failures.
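
A minimal sketch of a multi-document transaction in PyMongo follows. It assumes a deployment that supports transactions (a replica set or sharded cluster, which includes Atlas clusters). The function name, the `archived_orders` collection, and the `Order ID` field are hypothetical.

```python
def archive_order(client, db_name, order_id):
    """Move one order into an archive collection inside a transaction.

    Either both the delete and the insert are committed, or neither is.
    """
    db = client[db_name]
    with client.start_session() as session:
        # Commits on normal exit; aborts if an exception is raised inside.
        with session.start_transaction():
            doc = db["orders"].find_one_and_delete(
                {"Order ID": order_id}, session=session
            )
            if doc is not None:
                db["archived_orders"].insert_one(doc, session=session)
    return doc
```

Passing `session=session` to each operation is what ties it to the transaction; operations called without it run outside the transaction.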

11. **Purpose of explain()**  
The `explain()` function reveals how MongoDB will execute a query, including index usage and execution plan, helping optimize performance.
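
As a small sketch, the helper below extracts the chosen plan from a cursor's `explain()` output. The helper name is an assumption; `collection` is assumed to be a PyMongo Collection.

```python
def winning_plan(collection, query):
    """Return the plan the query planner selected for `query`."""
    plan = collection.find(query).explain()
    return plan["queryPlanner"]["winningPlan"]
```

In the returned plan, a `COLLSCAN` stage indicates a full collection scan, while an `IXSCAN` stage indicates that an index was used.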

12. **Schema validation**  
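MongoDB supports JSON Schema validation (the `$jsonSchema` operator) to enforce structure and rules for documents in a collection. Depending on the collection's `validationAction`, an invalid write is either rejected (`"error"`, the default) or allowed and logged as a warning (`"warn"`).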
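
A possible validator for order documents is sketched below. The required fields, the numeric types allowed for `Sales`, and both function names are assumptions for illustration; `db` is assumed to be a PyMongo Database.

```python
def order_validator():
    """Return a $jsonSchema validator requiring Region and a non-negative Sales."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["Region", "Sales"],
            "properties": {
                "Region": {"bsonType": "string"},
                "Sales": {
                    "bsonType": ["double", "int", "long", "decimal"],
                    "minimum": 0,
                },
            },
        }
    }


def create_validated_collection(db, name):
    """Create a collection that rejects documents failing the validator."""
    return db.create_collection(
        name, validator=order_validator(), validationAction="error"
    )
```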

13. **Primary vs secondary node**  
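The primary node receives all writes and records them in its oplog, from which the secondaries replicate asynchronously. Secondaries never accept writes, but they can serve reads when the client sets a read preference such as `secondary` or `secondaryPreferred`.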

14. **Security mechanisms in MongoDB**  
MongoDB offers authentication, role-based access control, TLS/SSL encryption, field-level encryption, IP whitelisting, and audit logging.

15. **Embedded documents**  
Embedded documents store related data inside a single document and are best used when related data is frequently accessed together.
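
For illustration, the document below embeds customer details inside an order, and the query filter reaches into the embedded document with dot notation. All field names and values here are hypothetical.

```python
# An order with the customer embedded, so one read returns both.
order = {
    "Order ID": "EX-0001",  # hypothetical identifier
    "Sales": 120.5,
    "customer": {"name": "Ada", "city": "Seattle"},
}

# Dot notation addresses a field inside the embedded document.
city_query = {"customer.city": "Seattle"}
# With a live collection: orders.find(city_query)
```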

16. **Purpose of $lookup**  
The `$lookup` stage joins documents from another collection, similar to a left join in SQL.
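
A sketch of a `$lookup` stage follows. It assumes a second collection named `customers` whose `_id` matches a `Customer ID` field on each order; both names, and the function name, are hypothetical.

```python
def orders_with_customers_pipeline():
    """Build a pipeline that attaches matching customer documents to each order."""
    return [
        {
            "$lookup": {
                "from": "customers",          # collection to join (assumed name)
                "localField": "Customer ID",  # field on the order documents
                "foreignField": "_id",        # field on the customer documents
                "as": "customer_docs",        # output array field
            }
        }
    ]
```

Like a left outer join, orders without a matching customer are kept, with `customer_docs` set to an empty array.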

17. **Common use cases of MongoDB**  
MongoDB is used for content management, IoT applications, e-commerce catalogs, real-time analytics, and mobile or gaming apps.

18. **Advantages for horizontal scaling**  
MongoDB can horizontally scale by sharding, allowing large datasets to be distributed across multiple servers without relying on expensive vertical scaling.

19. **MongoDB vs SQL transactions**  
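In MongoDB, an operation on a single document is always atomic. Multi-document transactions (available on replica sets since version 4.0 and on sharded clusters since 4.2) can span multiple collections and shards, but they must be started explicitly within a session. In SQL databases, multi-row and multi-table ACID transactions are the default model.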

20. **Capped vs regular collections**  
Capped collections have a fixed size and automatically overwrite the oldest data when full, while regular collections grow as needed and require explicit deletions.
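
Creating a capped collection from PyMongo might look like the sketch below. The function name, collection name, and size limits are assumptions; `db` is assumed to be a PyMongo Database.

```python
def create_capped_log(db, name="recent_events", size_bytes=1_048_576, max_docs=1000):
    """Create a capped collection limited by size in bytes and document count."""
    return db.create_collection(name, capped=True, size=size_bytes, max=max_docs)
```

`size` is mandatory for capped collections, while `max` is an optional additional limit on the number of documents.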

21. **Purpose of $match**  
The `$match` stage filters documents using a query condition. Placing it as early as possible in the pipeline lets it use indexes and reduces the number of documents that later stages must process.

22. **Securing MongoDB**  
Secure MongoDB by enabling authentication, using role-based permissions, restricting network access, enabling encryption, and applying audit logs.

23. **WiredTiger storage engine**  
WiredTiger is MongoDB’s default storage engine, providing document-level locking, compression, and improved concurrency for better performance.


In [None]:

from pymongo import MongoClient
import pandas as pd
from google.colab import files

# Upload CSV
uploaded = files.upload()

## 1. Load Superstore dataset into MongoDB

In [None]:
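# 1️⃣ Install PyMongo
!pip install pymongo

# 2️⃣ Imports
import io
import urllib.parse  # For URL-encoding the credentials
from pymongo import MongoClient, errors

# 3️⃣ User inputs
username = "your_username_here"   # From Atlas Database Access
password = "your_password_here"   # From Atlas Database Access
cluster_url = "cluster0.xxxxx.mongodb.net"  # From Atlas cluster connect string

# 4️⃣ Encode credentials for URL safety (quote_plus also escapes "/" and "@")
encoded_username = urllib.parse.quote_plus(username)
encoded_password = urllib.parse.quote_plus(password)

# 5️⃣ Build URI (with authSource=admin)
uri = f"mongodb+srv://{encoded_username}:{encoded_password}@{cluster_url}/?retryWrites=true&w=majority&authSource=admin"

# 6️⃣ Try connecting
try:
    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    print("Pinging MongoDB server...")
    print(client.admin.command("ping"))
    print("✅ Connected to MongoDB Atlas!")
except errors.OperationFailure as err:
    print("❌ Authentication failed! Check username, password, and roles in Atlas.")
    print("Detailed error:", err)
except errors.ServerSelectionTimeoutError as err:
    print("❌ Cannot reach MongoDB Atlas! Check your IP whitelist in Atlas.")
    print("Detailed error:", err)

# 7️⃣ Load the uploaded CSV into the `orders` collection used by the cells below
# (`uploaded` and `pd` come from the upload cell above)
csv_name = next(iter(uploaded))  # name of the file chosen in the upload cell
df = pd.read_csv(io.BytesIO(uploaded[csv_name]))
db = client["superstore_db"]  # database name is an assumption; use your own
orders = db["orders"]
orders.insert_many(df.to_dict("records"))
print("Inserted documents:", orders.count_documents({}))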


## 2. Retrieve all documents

In [None]:
for doc in orders.find():
    print(doc)


## 3. Count total documents

In [None]:
print("Total orders:", orders.count_documents({}))


## 4. Orders from 'West' region

In [None]:
for doc in orders.find({"Region": "West"}):
    print(doc)


## 5. Orders where Sales > 500

In [None]:
for doc in orders.find({"Sales": {"$gt": 500}}):
    print(doc)


## 6. Top 3 orders with highest Profit

In [None]:
for doc in orders.find().sort("Profit", -1).limit(3):
    print(doc)


## 7. Update Ship Mode

In [None]:
result = orders.update_many({"Ship Mode": "First Class"}, {"$set": {"Ship Mode": "Premium Class"}})
# Report how many documents were actually changed
print("Ship Mode updated for", result.modified_count, "orders.")


## 8. Delete orders where Sales < 50

In [None]:
result = orders.delete_many({"Sales": {"$lt": 50}})
# Report how many documents were actually removed
print("Orders deleted where Sales < 50:", result.deleted_count)


## 9. Group orders by Region & total sales

In [None]:
pipeline = [
    {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}}
]
for doc in orders.aggregate(pipeline):
    print(doc)


## 10. Distinct Ship Modes

In [None]:
print(orders.distinct("Ship Mode"))


## 11. Count orders per category

In [None]:
pipeline = [
    {"$group": {"_id": "$Category", "Count": {"$sum": 1}}}
]
for doc in orders.aggregate(pipeline):
    print(doc)
