1. Key Differences Between SQL and NoSQL Databases

Data Model: SQL uses structured tables with predefined schemas; NoSQL stores data in flexible formats (document, key-value, graph, columnar).

Scalability: SQL databases typically scale vertically (a larger server); NoSQL databases are usually designed to scale horizontally across many nodes.

Schema: SQL requires a fixed schema; NoSQL is schema-less or schema-flexible.

Transactions: SQL databases provide ACID transactions by default; NoSQL systems often prioritize availability and scalability (the BASE model), though some modern ones, such as MongoDB since v4.0, also support multi-document ACID transactions.

Query Language: SQL uses SQL syntax; NoSQL uses database-specific query APIs (e.g., MongoDB Query Language - MQL).

2. Why MongoDB Is a Good Choice for Modern Applications

Flexible schema for evolving data structures.

Horizontal scalability with sharding.

Built-in high availability via replication.

Rich query language and powerful aggregation pipeline.

Cloud-native integration (MongoDB Atlas).

JSON-like BSON format integrates naturally with modern applications.

3. Concept of Collections in MongoDB

A collection is a group of MongoDB documents, similar to a table in SQL.

Collections do not enforce a fixed schema by default, so documents in the same collection can have different fields.

They store BSON (binary JSON) documents for efficient storage and retrieval.

4. High Availability in MongoDB via Replication

Achieved through replica sets: groups of mongod instances that each hold a copy of the same data.

One primary node handles writes, while the secondary nodes replicate its oplog to stay in sync.

Automatic failover: if the primary becomes unavailable, the remaining members hold an election and promote an eligible secondary to primary.
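
As a rough illustration of how an application addresses a replica set, the sketch below builds a connection string that lists several members and names the set. The host names and the set name `rs0` are placeholders, not values from this notebook.

```python
def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a mongodb:// URI that lists several replica set members."""
    seed_list = ",".join(hosts)
    return (
        f"mongodb://{seed_list}/"
        f"?replicaSet={replica_set}&readPreference={read_preference}"
    )


# Placeholder hosts; replace them with the real members of the deployment.
uri = replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    replica_set="rs0",
)
print(uri)
```

Passing such a URI to `MongoClient(uri)` lets the driver discover the current primary and reconnect after a failover, so the application does not need to track which member is primary.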

5. Benefits of MongoDB Atlas

Fully managed cloud database.

Automatic scaling (both storage and performance).

Built-in backup and restore options.

Global distribution and multi-region replication.

Security features such as encryption, IP access lists, and role-based access control.

6. Role of Indexes in MongoDB & Performance Improvement

Purpose: Speed up data retrieval by avoiding full collection scans.

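An index stores the values of the indexed fields in sorted order, together with references to the matching documents, so queries can locate documents without reading each one.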

Common types: Single-field, compound, multikey, text, geospatial indexes.
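
A minimal sketch of creating two of these index types with PyMongo's `create_index`. The function expects a PyMongo collection such as the `orders_collection` used later in this notebook; the choice of `Profit`, `Region`, and `Sales` as indexed fields is an assumption made for illustration.

```python
ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / pymongo.DESCENDING


def create_order_indexes(collection):
    """Create a single-field and a compound index; return the index names."""
    # Single-field index: supports sorting and filtering by Profit.
    profit_idx = collection.create_index([("Profit", DESCENDING)])
    # Compound index: supports filtering by Region, then ordering by Sales.
    region_sales_idx = collection.create_index(
        [("Region", ASCENDING), ("Sales", DESCENDING)]
    )
    return [profit_idx, region_sales_idx]
```

Calling `create_order_indexes(orders_collection)` against a running server returns the generated index names. Each extra index also adds work to every write, so it is worth indexing only the fields that queries actually filter or sort on.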

7. Stages of the MongoDB Aggregation Pipeline

$match: Filters documents.

$group: Groups data by specified keys.

$project: Shapes the output documents.

$sort: Orders documents.

$limit / $skip: Cap or skip a number of documents, commonly used for pagination.

$lookup: Performs joins between collections.

$unwind: Deconstructs arrays into separate documents.
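
The stages above are combined by listing them in order. The sketch below builds one such pipeline as plain Python dictionaries, using the `Region` and `Sales` fields of the Superstore data; the threshold and the number of results are arbitrary example values.

```python
def top_regions_pipeline(min_sales, top_n):
    """Build a pipeline that ranks regions by total sales."""
    return [
        # Keep only orders at or above the sales threshold.
        {"$match": {"Sales": {"$gte": min_sales}}},
        # Sum the sales of the remaining orders per region.
        {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}},
        # Largest totals first.
        {"$sort": {"TotalSales": -1}},
        # Keep only the top N regions.
        {"$limit": top_n},
        # Rename _id to Region in the output documents.
        {"$project": {"_id": 0, "Region": "$_id", "TotalSales": 1}},
    ]


pipeline = top_regions_pipeline(min_sales=100, top_n=3)
for stage in pipeline:
    print(stage)
```

With a live connection, the list can be passed directly to `orders_collection.aggregate(pipeline)`.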

8. Sharding in MongoDB & Difference from Replication

Sharding: Splitting data across multiple servers (shards) for horizontal scaling.

Replication: Copying data across multiple servers for high availability.

Difference: Sharding distributes different parts of the dataset; replication duplicates the same dataset.
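
The difference can be made concrete with a toy model in plain Python. This is not MongoDB's actual chunk-routing algorithm, only an illustration of "each document lives on one shard" versus "every member holds everything"; the hash function and the sample orders are assumptions.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a shard-key value to a shard index with a stable hash (toy model)."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def shard(documents, shard_key, num_shards):
    """Sharding: each document is stored on exactly one shard."""
    shards = [[] for _ in range(num_shards)]
    for doc in documents:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards


def replicate(documents, num_members):
    """Replication: every member holds a full copy of the data."""
    return [list(documents) for _ in range(num_members)]


orders = [{"Order ID": f"ORD-{i}", "Sales": 10 * i} for i in range(1, 9)]
shards = shard(orders, "Order ID", num_shards=3)
replicas = replicate(orders, num_members=3)
print([len(s) for s in shards])    # shard sizes add up to len(orders)
print([len(r) for r in replicas])  # every member holds len(orders) documents
```

In production the two are combined: each shard is itself a replica set, so the data is both partitioned and duplicated.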

9. What PyMongo Is & Why It Is Used

Official Python driver for MongoDB.

Allows Python applications to connect, query, and manage MongoDB databases.

Supports CRUD operations, aggregation, and transactions.

10. ACID Properties in MongoDB Transactions

Atomicity: All operations in a transaction succeed or fail as a whole.

Consistency: Transactions leave the database in a valid state.

Isolation: Changes made inside a transaction are not visible to other operations until it commits.

Durability: Once committed (and acknowledged under the chosen write concern), the changes survive server failures.
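
A minimal sketch of a multi-document transaction with PyMongo, assuming the server runs as a replica set or sharded cluster (transactions are not available on a standalone server). The idea of moving an order into a separate archive collection is hypothetical and only serves to show two writes that must succeed or fail together.

```python
def move_order(client, source, target, order_id):
    """Copy an order to `target` and delete it from `source` atomically."""
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it.
        with session.start_transaction():
            order = source.find_one({"Order ID": order_id}, session=session)
            if order is None:
                raise LookupError(f"order {order_id!r} not found")
            target.insert_one(order, session=session)
            source.delete_one({"Order ID": order_id}, session=session)
    return order
```

Every operation must receive the same `session` argument; a call made without it runs outside the transaction and is not rolled back on failure.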

11. Purpose of MongoDB’s explain() Function

Shows how MongoDB executes a query.

Displays index usage, execution stages, and performance metrics.

Helps in query optimization.
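
To show how the output is typically read, the helper below walks the winning plan of an explain document and lists its stages. It assumes the classic layout (`queryPlanner.winningPlan` with nested `inputStage` entries); some server versions nest the plan differently, so treat this as a sketch. The sample document is hand-written for illustration, not captured from a real server.

```python
def plan_stages(explain_output):
    """Return the stage names of the winning plan, outermost first."""
    stages = []
    node = explain_output.get("queryPlanner", {}).get("winningPlan")
    while node:
        stages.append(node.get("stage"))
        node = node.get("inputStage")
    return stages


# Hand-written sample in the classic explain layout, for illustration only.
sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "FETCH",
            "inputStage": {"stage": "IXSCAN", "indexName": "Region_1"},
        }
    }
}
print(plan_stages(sample))  # ['FETCH', 'IXSCAN']
```

With PyMongo, the input would come from `orders_collection.find({"Region": "West"}).explain()`. A `COLLSCAN` stage means the query scanned the whole collection, while `IXSCAN` means an index was used.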

12. Schema Validation in MongoDB

Allows defining rules for document structure using JSON Schema.

Can enforce data types, required fields, and value constraints.

Rejects inserts and updates that violate the rules by default; the validation action can instead be set to only log a warning.
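
A small sketch of what such a validator can look like for the Superstore orders. The required fields, the list of regions, and the non-negative `Sales` rule are assumptions chosen for illustration rather than rules taken from the dataset.

```python
def order_validator():
    """Return a $jsonSchema validator for order documents."""
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["Order ID", "Region", "Sales"],
            "properties": {
                # Data type constraint.
                "Order ID": {"bsonType": "string"},
                # Value constraint: only these regions are accepted (assumed list).
                "Region": {"enum": ["East", "West", "Central", "South"]},
                # Numeric type plus a lower bound.
                "Sales": {"bsonType": ["double", "int"], "minimum": 0},
            },
        }
    }


validator = order_validator()
print(validator["$jsonSchema"]["required"])
```

With PyMongo, the validator is attached when the collection is created, for example `db.create_collection("ValidatedOrders", validator=order_validator())`, where the collection name is hypothetical.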

13. Difference Between Primary and Secondary Nodes in Replica Sets

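Primary: Receives all write operations and, by default, all reads.

Secondary: Replicates the primary's oplog; it can serve reads when the client's read preference allows it (possibly returning slightly stale data) and can be elected primary during failover.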

14. Security Mechanisms in MongoDB

Authentication (SCRAM, LDAP, x.509).

Authorization via role-based access control (RBAC).

TLS/SSL for encrypted connections.

Encryption at rest.

IP whitelisting and network access controls.

15. Embedded Documents & When to Use

Storing related data inside a single document instead of separate collections.

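Ideal when the data is read together and has a one-to-one or one-to-few relationship; unbounded one-to-many data fits better in a separate collection, since a document cannot exceed 16 MB.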

Reduces need for joins, improves read performance.
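
The contrast between embedding and referencing can be sketched with plain Python dictionaries. All field names and values below are made up for illustration.

```python
# Embedded: the order carries its customer and line items in one document.
embedded_order = {
    "Order ID": "ORD-1",
    "Customer": {"Name": "Example Customer", "Segment": "Consumer"},
    "Items": [
        {"Product": "Stapler", "Quantity": 2, "Sales": 15.0},
        {"Product": "Desk", "Quantity": 1, "Sales": 250.0},
    ],
}

# Referenced: the order stores only a key; the customer lives elsewhere.
referenced_order = {"Order ID": "ORD-1", "Customer ID": "CUST-1"}
customers = {"CUST-1": {"Name": "Example Customer", "Segment": "Consumer"}}

# The embedded form answers the question from a single document...
order_total = sum(item["Sales"] for item in embedded_order["Items"])
# ...while the referenced form needs a second lookup.
customer = customers[referenced_order["Customer ID"]]
print(order_total, customer["Name"])
```

The trade-off is duplication: if the same customer appears in many orders, the embedded copy has to be updated in each of them.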

16. Purpose of $lookup Stage in Aggregation

Performs a left outer join between two collections.

Useful for combining related data without embedding.
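
A sketch of a `$lookup` stage that attaches customer details to each order. It assumes a separate `Customers` collection keyed by `Customer ID`; that collection is hypothetical and is not created anywhere in this notebook.

```python
def orders_with_customers_pipeline():
    """Build a pipeline that joins each order to its customer document."""
    return [
        {
            "$lookup": {
                "from": "Customers",           # collection to join with (hypothetical)
                "localField": "Customer ID",   # field in the orders collection
                "foreignField": "Customer ID",  # field in the Customers collection
                "as": "customer",              # output array field
            }
        },
        # $lookup always produces an array; flatten it but keep unmatched orders.
        {"$unwind": {"path": "$customer", "preserveNullAndEmptyArrays": True}},
    ]


for stage in orders_with_customers_pipeline():
    print(stage)
```

Because the join is a left outer join, orders without a matching customer still appear in the result, which is why the `$unwind` stage keeps empty arrays.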

17. Common Use Cases for MongoDB

Real-time analytics.

Content management systems.

IoT and sensor data storage.

Catalog and inventory management.

Social media applications.

18. Advantages of MongoDB for Horizontal Scaling

Built-in sharding support.

Distributes data across servers, avoiding bottlenecks.

Shards can be added or removed while the cluster stays online; the balancer redistributes data in the background.

19. How MongoDB Transactions Differ from SQL Transactions

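MongoDB has always made single-document writes atomic; multi-document transactions arrived in v4.0 for replica sets and v4.2 for sharded clusters.

SQL databases treat multi-row, multi-table transactions as the default unit of work; in MongoDB they are opt-in, since related data is often embedded in one document and updated atomically without a transaction.

MongoDB transactions that span shards add coordination and network overhead, so they are best kept short.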

20. Differences Between Capped Collections & Regular Collections

Capped: Fixed size, FIFO document replacement, high write throughput.

Regular: Unlimited growth, no automatic overwrite.
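
The first-in, first-out behaviour of a capped collection can be imitated in plain Python with a bounded deque. This is only an analogy: a real capped collection is bounded by size in bytes (and optionally by document count).

```python
from collections import deque

# A deque with maxlen behaves like a collection capped at 3 documents.
capped = deque(maxlen=3)
for n in range(1, 6):
    # Once the cap is reached, the oldest entry is dropped automatically.
    capped.append({"event": n})

print(list(capped))  # [{'event': 3}, {'event': 4}, {'event': 5}]
```

In PyMongo a capped collection is created explicitly, for example `db.create_collection("EventLog", capped=True, size=1_048_576, max=1000)`, where the collection name and limits are illustrative.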

21. Purpose of $match Stage in Aggregation Pipeline

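Filters documents so that only those matching the condition continue down the pipeline; it can appear at any stage, but placing it first lets it use indexes and reduces the work for later stages.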

Similar to SQL WHERE clause.

22. Securing Access to MongoDB

Enable authentication & role-based access.

Use TLS/SSL.

Restrict network access.

Enable auditing.

Use strong passwords and encryption at rest.
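
Several of these measures show up directly in the connection string. The sketch below builds a URI with percent-escaped credentials, an authentication database, and TLS enabled; the user name, password, and host are placeholders.

```python
from urllib.parse import quote_plus


def secure_uri(username, password, host, auth_db="admin"):
    """Build a URI with escaped credentials and TLS enabled."""
    # Reserved characters such as '@' or '/' must be percent-escaped.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}/?authSource={auth_db}&tls=true"


# Placeholder credentials; real ones belong in environment variables or a secret store.
uri = secure_uri("app_user", "p@ss/word", "db.example.net:27017")
print(uri)
```

Escaping matters because an unescaped `@` or `/` in the password would be read as part of the URI structure and the connection string would fail to parse.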

23. MongoDB’s WiredTiger Storage Engine & Importance

Default storage engine since MongoDB 3.2.

Supports document-level concurrency control.

Provides compression (snappy by default, with zlib and zstd as options) for efficient storage.

Better read/write performance under concurrent load than the older MMAPv1 engine, which was removed in MongoDB 4.2.

In [None]:
# Load the Superstore dataset from a CSV file into MongoDB
import pandas as pd
from pymongo import MongoClient

# MongoDB connection (Update if using a remote server)
client = MongoClient("mongodb://localhost:27017/")

# Connect to the 'Mongo' database and 'Superstore_Data' collection
db = client["Mongo"]
collection = db["Superstore_Data"]

# Load CSV into Pandas DataFrame
csv_file = "superstore.csv"  # Update with the correct path
df = pd.read_csv(csv_file, encoding="ISO-8859-1")

# Convert DataFrame to a list of dictionaries
data = df.to_dict(orient="records")

# Insert data into MongoDB
if data:
    collection.insert_many(data)
    print(f"Successfully inserted {len(data)} records into MongoDB!")
else:
    print("No data found in CSV.")

# Verify insertion
print("Sample Record from MongoDB:")
print(collection.find_one())

In [None]:
# Load CSV file
df = pd.read_csv("superstore.csv", encoding="ISO-8859-1")

In [None]:
# Connect to MongoDB
client = MongoClient("mongodb://localhost:27017/")
db = client["SuperstoreDB"]
orders_collection = db["Orders"]

In [None]:
# Insert data into MongoDB
orders_collection.insert_many(df.to_dict(orient='records'))

In [None]:
# Retrieve and print all documents
for order in orders_collection.find():
    print(order)

In [None]:
# Count and display the total number of documents
total_orders = orders_collection.count_documents({})
print("Total number of orders:", total_orders)

In [None]:
# Fetch all orders from the "West" region
west_orders = orders_collection.find({"Region": "West"})
for order in west_orders:
    print(order)

In [None]:
# Find orders where Sales is greater than 500
high_sales_orders = orders_collection.find({"Sales": {"$gt": 500}})
for order in high_sales_orders:
    print(order)

In [None]:
# Fetch the top 3 orders with the highest Profit
top_profit_orders = orders_collection.find().sort("Profit", -1).limit(3)
for order in top_profit_orders:
    print(order)

In [None]:
# Update all orders with Ship Mode as "First Class" to "Premium Class"
result = orders_collection.update_many({"Ship Mode": "First Class"}, {"$set": {"Ship Mode": "Premium Class"}})
print("Documents updated:", result.modified_count)

In [None]:
# Delete all orders where Sales is less than 50
result = orders_collection.delete_many({"Sales": {"$lt": 50}})
print("Documents deleted:", result.deleted_count)

In [None]:
# Aggregation: Group orders by Region and calculate total sales per region
total_sales_per_region = orders_collection.aggregate([
    {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}}
])
for region in total_sales_per_region:
    print(region)

In [None]:
# Fetch all distinct values for Ship Mode
distinct_ship_modes = orders_collection.distinct("Ship Mode")
print("Distinct Ship Modes:", distinct_ship_modes)

In [None]:
# Count the number of orders for each category
orders_per_category = orders_collection.aggregate([
    {"$group": {"_id": "$Category", "TotalOrders": {"$sum": 1}}}
])
for category in orders_per_category:
    print(category)

In [None]:
# Close the MongoDB connection
client.close()