1. Key Differences Between SQL and NoSQL Databases

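SQL databases are relational: they store data in tables with a predefined schema and are queried with SQL (e.g., MySQL). NoSQL databases are non-relational and store data in flexible models such as documents, key-value pairs, wide columns, or graphs (e.g., MongoDB, a document store). SQL databases are typically the stronger choice for complex joins and multi-row transactions; NoSQL databases are typically easier to scale horizontally.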

2. Why MongoDB is a Good Choice for Modern Applications

MongoDB supports schema flexibility, horizontal scaling, high availability, and efficient handling of large, unstructured data, making it ideal for real-time applications, big data, and agile development workflows.

3. Concept of Collections in MongoDB

Collections in MongoDB are analogous to tables in SQL. They store related documents (records) and provide a structure for grouping data without enforcing a fixed schema.

4. High Availability Through Replication in MongoDB

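MongoDB uses replica sets: a primary node receives all writes, and secondary nodes replicate the primary's data. If the primary fails, the remaining members elect a new primary, so the database stays available. By default reads also go to the primary; secondaries serve reads only when the client sets a read preference that allows it.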
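
As a rough sketch of how an application might opt in to reading from secondaries, the helper below builds a replica set connection string. The host names, the replica set name `rs0`, and the function name are hypothetical; `replicaSet` and `readPreference` are standard MongoDB connection string options.

```python
def build_replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a MongoDB connection string for a replica set.

    hosts: list of "host:port" strings (hypothetical values in the example).
    read_preference: "primary" sends reads to the primary (the default);
    "secondaryPreferred" lets secondaries serve reads when one is available.
    """
    host_list = ",".join(hosts)
    return (
        f"mongodb://{host_list}/"
        f"?replicaSet={replica_set}&readPreference={read_preference}"
    )


uri = build_replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    replica_set="rs0",
    read_preference="secondaryPreferred",
)
print(uri)
```

The resulting string can be passed to `MongoClient`. Because replication is asynchronous, reads served by a secondary may be slightly behind the primary.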

5. Benefits of MongoDB Atlas

MongoDB Atlas offers automatic scaling, global distribution, automated backups, and a fully managed cloud database service, reducing operational overhead and ensuring reliability.

6. Role of Indexes in MongoDB and Performance Improvement

Indexes improve query performance by allowing MongoDB to locate matching documents without scanning the entire collection. They speed up reads but take extra storage and add work to every write, because each index must be updated when documents change.
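
A minimal sketch of creating an index from Python follows. It assumes `collection` is a PyMongo `Collection` and that the documents have `Region` and `Sales` fields, as in the Superstore data used later in this notebook; the function name is made up for illustration.

```python
def ensure_region_sales_index(collection):
    """Create a compound index on Region (ascending) and Sales (descending).

    `collection` is assumed to be a pymongo Collection. Calling create_index
    again with the same keys does not rebuild an index that already exists.
    Returns the index name reported by the driver.
    """
    return collection.create_index([("Region", 1), ("Sales", -1)])
```

A compound index like this one can serve queries that filter on `Region` alone as well as queries that filter on `Region` and sort or filter on `Sales`.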

7. Stages of the MongoDB Aggregation Pipeline

The aggregation pipeline consists of stages like $match, $group, $sort, and $project, which sequentially process and transform documents for data analysis and manipulation.
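
The stages named above can be sketched as a single pipeline. The field names (`Region`, `Category`, `Sales`) are taken from the Superstore data used later in this notebook; the function name and the `min_total` parameter are illustrative assumptions.

```python
def build_category_sales_pipeline(region, min_total=0):
    """Return an aggregation pipeline as a list of stage dictionaries.

    Stages run in order: filter, group, filter on the grouped value,
    sort, and reshape.
    """
    return [
        # $match: keep only orders from the requested region.
        {"$match": {"Region": region}},
        # $group: one output document per Category with summed Sales.
        {"$group": {"_id": "$Category", "TotalSales": {"$sum": "$Sales"}}},
        # A second $match can filter on values computed by $group.
        {"$match": {"TotalSales": {"$gte": min_total}}},
        # $sort: highest total first.
        {"$sort": {"TotalSales": -1}},
        # $project: expose _id as Category and hide the _id field itself.
        {"$project": {"_id": 0, "Category": "$_id", "TotalSales": 1}},
    ]
```

With a PyMongo collection, the pipeline would be run as `collection.aggregate(build_category_sales_pipeline("West"))`.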

8. Sharding vs. Replication in MongoDB

Sharding distributes data across multiple servers for horizontal scaling, while replication duplicates data for high availability and fault tolerance. They solve different problems and are usually combined: in a sharded cluster, each shard is itself deployed as a replica set.

9. PyMongo and Its Use

PyMongo is a Python driver for MongoDB. It provides an interface for connecting to MongoDB, performing CRUD operations, and working with its database functionalities.

10. ACID Properties in MongoDB Transactions

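ACID stands for Atomicity, Consistency, Isolation, and Durability. Operations on a single document are always atomic in MongoDB. Multi-document transactions, available since MongoDB 4.0 on replica sets and 4.2 on sharded clusters, extend these guarantees to operations that span several documents or collections.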
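
The following is a sketch of a multi-document transaction with PyMongo, under stated assumptions: `client` is a `MongoClient` connected to a replica set or sharded cluster (a standalone server does not support transactions), and the `orders` and `orders_archive` collection names are hypothetical.

```python
def archive_order(client, db_name, order_id):
    """Move one order into an archive collection atomically.

    Either both the insert and the delete take effect, or neither does.
    Returns True if an order was moved, False if it was not found.
    """
    db = client[db_name]
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it.
        with session.start_transaction():
            doc = db["orders"].find_one({"_id": order_id}, session=session)
            if doc is None:
                return False  # nothing to move
            db["orders_archive"].insert_one(doc, session=session)
            db["orders"].delete_one({"_id": order_id}, session=session)
    return True
```

Every operation that should be part of the transaction must be passed the same `session`; an operation called without it runs outside the transaction.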

11. Purpose of MongoDB’s explain() Function

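The explain() function returns the execution plan MongoDB chose for a query: which indexes were used, whether a full collection scan was needed, and, at the executionStats verbosity level, how many documents and index keys were examined and how long execution took. This information helps identify queries that need an index or a rewrite.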
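
As an illustration, the helpers below inspect an explain result in Python. They assume the dictionary returned by PyMongo's `cursor.explain()` has the classic `queryPlanner` / `winningPlan` / `inputStage` layout; other plan shapes (for example plans with several input stages, or the layout used by newer query engines) are not handled. Both function names are made up for this sketch.

```python
def plan_stages(explain_output):
    """List the stage names in the winning plan, outermost first.

    Follows the chain winningPlan -> inputStage -> inputStage ... and
    collects each "stage" value it finds.
    """
    stages = []
    node = explain_output.get("queryPlanner", {}).get("winningPlan")
    while isinstance(node, dict):
        if "stage" in node:
            stages.append(node["stage"])
        node = node.get("inputStage")
    return stages


def uses_collection_scan(explain_output):
    """True if the winning plan contains a full collection scan (COLLSCAN)."""
    return "COLLSCAN" in plan_stages(explain_output)
```

A typical call would be `uses_collection_scan(collection.find({"Region": "West"}).explain())`; a `True` result suggests the query has no usable index.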

12. How MongoDB Handles Schema Validation

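MongoDB supports optional schema validation: a collection can be given a validator, usually written with the $jsonSchema operator, that specifies required fields, types, and allowed values. Depending on the configured validation action, documents that break the rules are either rejected on insert and update or accepted with a logged warning. Fields not covered by the validator remain free-form, so integrity rules can be added without giving up flexibility.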
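
A minimal sketch of such a validator follows. The required fields and types are illustrative assumptions based on the Superstore columns used elsewhere in this notebook, and `db` is assumed to be a PyMongo `Database`.

```python
def build_order_validator():
    """Return a $jsonSchema validator for order documents."""
    numeric = ["double", "int", "long", "decimal"]
    return {
        "$jsonSchema": {
            "bsonType": "object",
            "required": ["Region", "Category", "Sales"],
            "properties": {
                "Region": {"bsonType": "string"},
                "Category": {"bsonType": "string"},
                # Sales must be numeric and non-negative.
                "Sales": {"bsonType": numeric, "minimum": 0},
                # Profit is optional but must be numeric when present.
                "Profit": {"bsonType": numeric},
            },
        }
    }


def create_validated_collection(db, name):
    """Create a collection whose inserts and updates are checked by the validator."""
    return db.create_collection(name, validator=build_order_validator())
```

Fields that the schema does not mention, such as `Ship Mode`, are still accepted, which is how validation and schema flexibility coexist.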

13. Difference Between Primary and Secondary Nodes in Replica Sets

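The primary node handles all writes and, by default, all reads. Secondary nodes replicate the primary's data; they can serve reads when the client's read preference allows it, and one of them is elected as the new primary if the current primary fails.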

14. Security Mechanisms in MongoDB

MongoDB supports authentication, role-based authorization, encryption in transit (TLS/SSL), auditing, and network restrictions such as IP whitelisting to protect data and control database access.

15. Embedded Documents in MongoDB

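Embedded documents store related data inside a single parent document, which avoids joins and lets one read return everything. They work best when the data is tightly related, is usually accessed together, and will not grow without bound, since a single document is limited to 16 MB.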
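
To make the idea concrete, the sketch below turns a flat order row into a document with an embedded customer sub-document. The `Customer ID`, `Customer Name`, and `Segment` keys are assumed column names, not confirmed by this notebook; adjust them to the actual dataset.

```python
def embed_customer(flat_row):
    """Move the customer fields of a flat order row into a sub-document.

    Returns a new dictionary; the input row is not modified.
    """
    customer_keys = ("Customer ID", "Customer Name", "Segment")
    order = {k: v for k, v in flat_row.items() if k not in customer_keys}
    order["customer"] = {k: flat_row[k] for k in customer_keys if k in flat_row}
    return order


# Dot notation reaches into the embedded document in a query filter.
consumer_filter = {"customer.Segment": "Consumer"}
```

A filter such as `consumer_filter` can be passed to `collection.find()` to match on a field of the embedded document without any join.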

16. Purpose of MongoDB’s $lookup Stage

The $lookup stage performs left outer joins between collections, enabling retrieval of related data from different collections within the aggregation pipeline.
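
A hedged sketch of a `$lookup` pipeline follows. It assumes a separate, hypothetical `customers` collection whose `_id` matches a `Customer ID` field on each order; neither is defined in this notebook.

```python
def build_order_customer_lookup(customers_collection="customers"):
    """Return a pipeline that attaches the matching customer to each order."""
    return [
        {
            "$lookup": {
                "from": customers_collection,  # collection to join with
                "localField": "Customer ID",   # field in the order documents
                "foreignField": "_id",         # field in the customer documents
                "as": "customer",              # name of the output array field
            }
        },
        # $lookup always produces an array. $unwind turns it into a single
        # embedded document; preserveNullAndEmptyArrays keeps orders that
        # have no matching customer, as a left outer join would.
        {"$unwind": {"path": "$customer", "preserveNullAndEmptyArrays": True}},
    ]
```

The pipeline would be run against the orders collection, for example `collection.aggregate(build_order_customer_lookup())`.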

17. Common Use Cases for MongoDB

MongoDB is used in content management, e-commerce, IoT applications, real-time analytics, social media platforms, and any use case requiring schema flexibility and scalability.

18. Advantages of MongoDB for Horizontal Scaling

MongoDB’s sharding distributes data across servers, so growing data volumes and workloads can be spread over more machines instead of requiring a single larger server.

19. How MongoDB Transactions Differ From SQL Transactions

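MongoDB supports multi-document ACID transactions on replica sets and sharded clusters, but its document model is designed so that most operations touch a single document and need no transaction. In SQL databases, multi-row transactions are the normal way of working, and the tooling around them is more mature.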

20. Differences Between Capped and Regular Collections

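Capped collections have a fixed maximum size, preserve insertion order, and automatically overwrite their oldest documents once the size limit is reached, which makes inserts fast and suits logs and caches. Regular collections have no size limit and are used for general-purpose storage.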
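
Creating a capped collection from Python might look like the sketch below. It assumes `db` is a PyMongo `Database`; the collection name and the default limits are arbitrary illustrative values.

```python
def create_capped_log(db, name="event_log", size_bytes=1_048_576, max_docs=1000):
    """Create a capped collection limited by size and document count.

    `size` (in bytes) is required for capped collections; `max` is an
    optional additional limit on the number of documents.
    """
    return db.create_collection(name, capped=True, size=size_bytes, max=max_docs)
```

Once either limit is reached, each new insert replaces the oldest document in the collection.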

21. Purpose of $match in Aggregation Pipeline

The $match stage filters documents based on conditions, similar to a SQL WHERE clause. Placing it as early as possible reduces the number of documents that later stages must process, and a $match at the start of the pipeline can use an index.

22. Securing Access to a MongoDB Database

Access is secured using strong authentication, user roles, IP whitelisting, TLS/SSL encryption, and proper firewall configurations to protect data from unauthorized access.
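
As a sketch of the client side of these measures, the helper below collects keyword arguments for an authenticated, TLS-encrypted `MongoClient` connection. The environment variable names `MONGO_USER` and `MONGO_PASSWORD` are hypothetical, and the `admin` authentication database is an assumption about how the user was created.

```python
import os


def build_client_options(env=os.environ):
    """Collect MongoClient keyword arguments for an authenticated TLS connection.

    Credentials are read from the environment so they never appear in the
    notebook itself.
    """
    return {
        "username": env["MONGO_USER"],
        "password": env["MONGO_PASSWORD"],
        "authSource": "admin",  # database that stores the user's credentials
        "tls": True,            # encrypt traffic between client and server
    }
```

The options would be used as `MongoClient(uri, **build_client_options())`, where `uri` points at the server. Firewall rules and IP whitelisting are configured on the server or cloud side, not in the client.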

23. MongoDB’s WiredTiger Storage Engine

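WiredTiger has been MongoDB's default storage engine since version 3.2. It uses document-level concurrency control, so concurrent writes to different documents do not block one another, compresses data on disk, and relies on journaling and checkpoints for durability.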


In [13]:
!pip install pymongo



In [14]:
import pandas as pd
from pymongo import MongoClient
import numpy as np
import matplotlib.pyplot as plt


# 1. Load the Superstore dataset into MongoDB
def load_csv_to_mongodb(csv_path, db_name, collection_name):
    client = MongoClient("mongodb://localhost:27017/")
    db = client[db_name]
    collection = db[collection_name]

    # Read from the path passed in rather than a hard-coded location.
    data = pd.read_csv(csv_path, encoding='latin-1')
    collection.insert_many(data.to_dict("records"))
    print("Data loaded into MongoDB.")



In [15]:
# 2. Retrieve and print all documents from the Orders collection
def print_all_documents(db_name, collection_name):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    for doc in collection.find():
        print(doc)

In [16]:
# 3. Count and display the total number of documents in the Orders collection
def count_documents(db_name, collection_name):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    count = collection.count_documents({})
    print(f"Total documents: {count}")

In [17]:
# 4. Fetch all orders from the "West" region
def fetch_orders_from_region(db_name, collection_name, region):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    results = collection.find({"Region": region})
    for doc in results:
        print(doc)

In [18]:
# 5. Find orders where Sales > 500
def fetch_orders_sales_gt(db_name, collection_name, sales_value):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    results = collection.find({"Sales": {"$gt": sales_value}})
    for doc in results:
        print(doc)

In [19]:
# 6. Fetch top 3 orders with the highest Profit
def fetch_top_profit_orders(db_name, collection_name, top_n=3):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    results = collection.find().sort("Profit", -1).limit(top_n)
    for doc in results:
        print(doc)

In [20]:
# 7. Update Ship Mode from "First Class" to "Premium Class"
def update_ship_mode(db_name, collection_name):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    result = collection.update_many({"Ship Mode": "First Class"}, {"$set": {"Ship Mode": "Premium Class"}})
    print(f"Updated Ship Mode from 'First Class' to 'Premium Class' in {result.modified_count} documents.")

In [21]:
# 8. Delete all orders where Sales < 50
def delete_low_sales_orders(db_name, collection_name, sales_value):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    result = collection.delete_many({"Sales": {"$lt": sales_value}})
    print(f"Deleted {result.deleted_count} documents where Sales < {sales_value}.")

In [22]:
# 9. Use aggregation to group orders by Region and calculate total sales per region
def calculate_sales_per_region(db_name, collection_name):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    pipeline = [
        {"$group": {"_id": "$Region", "TotalSales": {"$sum": "$Sales"}}},
        {"$sort": {"TotalSales": -1}}
    ]
    results = collection.aggregate(pipeline)
    for result in results:
        print(result)

In [23]:
# 10. Fetch all distinct values for Ship Mode from the collection
def fetch_distinct_ship_modes(db_name, collection_name):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    distinct_modes = collection.distinct("Ship Mode")
    print("Distinct Ship Modes:", distinct_modes)

In [24]:
# 11. Count the number of orders for each category
def count_orders_per_category(db_name, collection_name):
    client = MongoClient("mongodb://localhost:27017/")
    collection = client[db_name][collection_name]
    pipeline = [
        {"$group": {"_id": "$Category", "OrderCount": {"$sum": 1}}},
        {"$sort": {"OrderCount": -1}}
    ]
    results = collection.aggregate(pipeline)
    for result in results:
        print(result)