In [1]:
# MongoDB Questions and Answers

# 1. Key differences between SQL and NoSQL databases
print("""
- Data Model: SQL uses structured, tabular data; NoSQL supports unstructured, semi-structured, or hierarchical data.
- Schema: SQL enforces a fixed schema, while NoSQL offers flexible schemas.
- Scalability: SQL scales vertically, while NoSQL is designed for horizontal scaling.
- Transactions: SQL is strongly ACID-compliant; NoSQL supports eventual consistency, though some offer transactional capabilities.
- Query Language: SQL uses structured query language (SQL); NoSQL databases often use specific APIs or query languages.
""")

# 2. What makes MongoDB a good choice for modern applications?
print("""
- Flexible schema for agile development.
- Horizontal scaling and high availability.
- Native support for JSON-like data (BSON), which aligns with modern application needs.
- Rich query and aggregation capabilities.
- Strong ecosystem with tools like MongoDB Atlas for cloud management.
""")

# 3. Explain the concept of collections in MongoDB
print("""
- A collection in MongoDB is analogous to a table in relational databases.
- It stores documents (JSON-like objects) and does not enforce a fixed schema.
- Collections group related data for efficient querying and management.
""")

# 4. How does MongoDB ensure high availability using replication?
print("""
- MongoDB uses replica sets for high availability.
- A replica set consists of a primary node and one or more secondary nodes.
- The primary node handles all write operations, while secondaries replicate data from the primary.
- In case of primary node failure, an automatic election promotes a secondary to primary, ensuring uptime.
""")

# 5. What are the main benefits of MongoDB Atlas?
print("""
- Fully managed cloud database service with automated backups and scaling.
- Built-in security features like encryption and network isolation.
- Multi-cloud support and global deployment options.
- Monitoring and performance optimization tools.
- Simplified integration with modern development workflows.
""")

# 6. What is the role of indexes in MongoDB, and how do they improve performance?
print("""
- Indexes in MongoDB improve query performance by reducing the amount of data scanned.
- They allow efficient access to documents based on query filters.
- Common index types include single-field, compound, multikey, and text indexes.
""")

# 7. Describe the stages of the MongoDB aggregation pipeline
print("""
- The aggregation pipeline processes data through a series of stages:
  1. `$match`: Filters documents based on criteria.
  2. `$group`: Groups data and performs operations like sum, avg, etc.
  3. `$sort`: Sorts documents by specified fields.
  4. `$project`: Reshapes documents by including/excluding fields.
  5. `$lookup`: Performs joins with other collections.
  6. `$limit` and `$skip`: Limits or skips the number of documents.
""")

# 8. What is sharding in MongoDB? How does it differ from replication?
print("""
- Sharding distributes data across multiple servers to handle large datasets and high throughput.
- Data is divided into chunks based on a shard key and distributed across shards.
- Replication, on the other hand, duplicates data across servers for high availability.
- Sharding focuses on scalability, while replication ensures redundancy.
""")

# 9. What is PyMongo, and why is it used?
print("""
- PyMongo is the official Python driver for MongoDB.
- It allows developers to interact with MongoDB databases using Python.
- Provides functionality for CRUD operations, aggregation, indexing, and transactions.
""")

# 10. What are the ACID properties in the context of MongoDB transactions?
print("""
- Atomicity: Ensures all operations in a transaction are completed or none are.
- Consistency: Guarantees data integrity before and after a transaction.
- Isolation: Transactions are executed in isolation from other operations.
- Durability: Changes are permanently written to disk upon successful transaction completion.
""")

# 11. What is the purpose of MongoDB's explain() function?
print("""
- The `explain()` function provides detailed information on how a query is executed.
- It helps in analyzing query performance and optimizing indexes.
""")

# 12. How does MongoDB handle schema validation?
print("""
- MongoDB supports schema validation using JSON Schema.
- Validation rules can be defined for collections to enforce constraints on document fields.
""")

# 13. What is the difference between a primary and a secondary node in a replica set?
print("""
- Primary Node: Handles all write operations and can serve read requests.
- Secondary Node: Replicates data from the primary and can optionally handle read requests.
""")

# 14. What security mechanisms does MongoDB provide for data protection?
print("""
- Authentication and authorization using role-based access control (RBAC).
- Encryption at rest and in transit.
- IP whitelisting and network isolation.
- Auditing to monitor database activity.
""")

# 15. Explain the concept of embedded documents and when they should be used
print("""
- Embedded documents are nested documents within a document.
- Useful when related data is accessed together or has a one-to-few relationship.
- Avoids the need for joins, improving query performance.
""")

# 16. What is the purpose of MongoDB's $lookup stage in aggregation?
print("""
- The `$lookup` stage performs a left outer join between collections.
- It adds fields from the joined collection to the documents in the pipeline.
""")

# 17. What are some common use cases for MongoDB?
print("""
- Content management systems.
- Real-time analytics and event logging.
- IoT applications and time-series data.
- E-commerce platforms and catalogs.
- Mobile and social media applications.
""")

# 18. What are the advantages of using MongoDB for horizontal scaling?
print("""
- Sharding distributes data across multiple servers, increasing throughput.
- Enables handling of large datasets and concurrent user requests.
- Allows dynamic addition of shards to accommodate growing data.
""")

# 19. How do MongoDB transactions differ from SQL transactions?
print("""
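- MongoDB supports multi-document ACID transactions on replica sets (since version 4.0) and on sharded clusters (since version 4.2).
- Single-document writes are always atomic in MongoDB, so related data embedded in one document often needs no transaction; SQL transactions routinely span rows in multiple tables.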
""")

# 20. What are the main differences between capped collections and regular collections?
print("""
- Capped Collections:
  - Fixed-size collections with a maximum size in bytes and, optionally, a maximum document count.
  - Automatically overwrite the oldest documents when full.
- Regular Collections:
  - No size restrictions and documents are not overwritten.
""")

# 21. What is the purpose of the $match stage in MongoDB's aggregation pipeline?
print("""
- The `$match` stage filters documents based on specified conditions.
- Reduces data early in the pipeline for efficiency.
""")

# 22. How can you secure access to a MongoDB database?
print("""
- Enable authentication and enforce strong passwords.
- Use role-based access control (RBAC).
- Enable TLS/SSL for encrypted communication.
- Restrict network access using firewalls and IP whitelisting.
""")

# 23. What is MongoDB's WiredTiger storage engine, and why is it important?
print("""
- WiredTiger is the default storage engine for MongoDB.
- Provides document-level concurrency and compression.
- Enhances performance for write-intensive workloads.
""")




In [2]:
pip install pymongo pandas



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49m/Users/saurabhkumar/anaconda3/bin/python -m pip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [24]:
# 1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.


import pandas as pd
from pymongo import MongoClient

def load_csv_to_mongodb(csv_file_path, database_name, collection_name, mongo_uri="mongodb://localhost:27017/"):
    """
    Load data from a CSV file into a MongoDB collection.

    Args:
        csv_file_path (str): Path to the CSV file.
        database_name (str): Name of the MongoDB database.
        collection_name (str): Name of the MongoDB collection.
        mongo_uri (str): MongoDB connection URI. Default is "mongodb://localhost:27017/".
    """
    client = None
    try:
        print("Loading CSV file...")
        df = pd.read_csv(csv_file_path, encoding="latin1")
        print("CSV file loaded successfully.")

        print(f"Connecting to MongoDB at {mongo_uri}...")
        client = MongoClient(mongo_uri)
        # MongoClient connects lazily; ping so a bad URI fails here, not later.
        client.admin.command("ping")
        db = client[database_name]
        collection = db[collection_name]
        print("Connected to MongoDB.")

        # Start from an empty collection so re-running the cell does not
        # duplicate rows.
        print(f"Clearing existing documents in the '{collection_name}' collection...")
        collection.delete_many({})
        print("Existing documents cleared.")

        # One dict per CSV row; insert_many raises if the list is empty.
        data = df.to_dict("records")
        print(f"Inserting data into the '{collection_name}' collection...")
        collection.insert_many(data)
        print("Data inserted successfully.")
        print(f"Data from '{csv_file_path}' has been loaded into the '{database_name}.{collection_name}' collection.")

    except Exception as e:
        print("An error occurred:", e)
    finally:
        # Release the connection pool whether or not the load succeeded.
        if client is not None:
            client.close()

# Example usage
if __name__ == "__main__":
    csv_file = "/Users/saurabhkumar/Downloads/superstore.csv" 
    db_name = "SuperstoreDB"    
    collection = "Orders"       

    load_csv_to_mongodb(csv_file, db_name, collection)



Loading CSV file...
CSV file loaded successfully.
Connecting to MongoDB at mongodb://localhost:27017/...
Connected to MongoDB.
Clearing existing documents in the 'Orders' collection...
Existing documents cleared.
Inserting data into the 'Orders' collection...
Data inserted successfully.
Data from '/Users/saurabhkumar/Downloads/superstore.csv' has been loaded into the 'SuperstoreDB.Orders' collection.


In [25]:
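# 2. Retrieve and print all documents from the Orders collection.

# NOTE: MongoDBClient is used from here on but is not defined in this notebook.
# The class below is an assumed reconstruction, inferred from the calls made on
# it and from the "Connected to..." / "Connection closed." lines in the outputs.
class MongoDBClient:
    def __init__(self, mongo_uri="mongodb://localhost:27017/"):
        self.client = MongoClient(mongo_uri)
        print(f"Connected to MongoDB at {mongo_uri}")

    def get_collection(self, database_name, collection_name):
        return self.client[database_name][collection_name]

    def close_connection(self):
        self.client.close()
        print("Connection closed.")

def fetch_all_documents(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        documents = collection.find()
        for doc in documents:
            print(doc)
    except Exception as e:
        print("An error occurred:", e)

if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"
    collection = "Orders"
    fetch_all_documents(mongo_client, db_name, collection)
    mongo_client.close_connection()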


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
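
The rate-limit message above appears because the cell prints every document in the collection. A lighter alternative, sketched below under the assumption that a short preview is enough, prints only the first few documents and drops the `_id` field. `preview_documents` is a hypothetical helper, not part of the original notebook.

```python
def preview_documents(collection, limit=5):
    """Print and return at most `limit` documents, without their _id field.

    `collection` is assumed to be a pymongo Collection.
    """
    # The second argument to find() is a projection; {"_id": 0} hides _id.
    docs = list(collection.find({}, {"_id": 0}).limit(limit))
    for doc in docs:
        print(doc)
    return docs
```

It could be called with the collection object returned by `mongo_client.get_collection(db_name, collection)`.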



In [26]:
# 3. Count and display the total number of documents in the Orders collection.

def count_documents(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        total_documents = collection.count_documents({})
        print(f"Total number of documents: {total_documents}")
    except Exception as e:
        print("An error occurred:", e)

if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"   
    collection = "Orders"
    count_documents(mongo_client, db_name, collection)
    mongo_client.close_connection()
    

Connected to MongoDB at mongodb://localhost:27017/
Total number of documents: 9994
Connection closed.


In [27]:
# 4. Write a query to fetch all orders from the "West" region.
def fetch_orders_from_west(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        filter_condition = {"Region": "West"}
        orders_from_west = collection.find(filter_condition)
        for order in orders_from_west:
            print(order)
    except Exception as e:
        print("An error occurred:", e)
if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"   
    collection = "Orders"
    fetch_orders_from_west(mongo_client, db_name, collection)
    mongo_client.close_connection()

Connected to MongoDB at mongodb://localhost:27017/
{'_id': ObjectId('6797f2bce868574b4fb4f773'), 'Row ID': 3, 'Order ID': 'CA-2016-138688', 'Order Date': '6/12/2016', 'Ship Date': '6/16/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'DV-13045', 'Customer Name': 'Darrin Van Huff', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Los Angeles', 'State': 'California', 'Postal Code': 90036, 'Region': 'West', 'Product ID': 'OFF-LA-10000240', 'Category': 'Office Supplies', 'Sub-Category': 'Labels', 'Product Name': 'Self-Adhesive Address Labels for Typewriters by Universal', 'Sales': 14.62, 'Quantity': 2, 'Discount': 0.0, 'Profit': 6.8714}
{'_id': ObjectId('6797f2bce868574b4fb4f776'), 'Row ID': 6, 'Order ID': 'CA-2014-115812', 'Order Date': '6/9/2014', 'Ship Date': '6/14/2014', 'Ship Mode': 'Standard Class', 'Customer ID': 'BH-11710', 'Customer Name': 'Brosina Hoffman', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Los Angeles', 'State': 'California', 'Postal Code

In [28]:
# 5. Write a query to find orders where Sales is greater than 500.

def fetch_orders_sales_greater_than_500(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        filter_condition = {"Sales": {"$gt": 500}}
        orders = collection.find(filter_condition)
        for order in orders:
            print(order)
    except Exception as e:
        print("An error occurred:", e)
if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"   
    collection = "Orders"
    fetch_orders_sales_greater_than_500(mongo_client, db_name, collection)
    mongo_client.close_connection()

Connected to MongoDB at mongodb://localhost:27017/
{'_id': ObjectId('6797f2bce868574b4fb4f772'), 'Row ID': 2, 'Order ID': 'CA-2016-152156', 'Order Date': '11/8/2016', 'Ship Date': '11/11/2016', 'Ship Mode': 'Second Class', 'Customer ID': 'CG-12520', 'Customer Name': 'Claire Gute', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Henderson', 'State': 'Kentucky', 'Postal Code': 42420, 'Region': 'South', 'Product ID': 'FUR-CH-10000454', 'Category': 'Furniture', 'Sub-Category': 'Chairs', 'Product Name': 'Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back', 'Sales': 731.94, 'Quantity': 3, 'Discount': 0.0, 'Profit': 219.582}
{'_id': ObjectId('6797f2bce868574b4fb4f774'), 'Row ID': 4, 'Order ID': 'US-2015-108966', 'Order Date': '10/11/2015', 'Ship Date': '10/18/2015', 'Ship Mode': 'Standard Class', 'Customer ID': 'SO-20335', 'Customer Name': "Sean O'Donnell", 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Fort Lauderdale', 'State': 'Florida', 'Postal Code': 333

In [29]:
# 6. Fetch the top 3 orders with the highest Profit.

def fetch_top_3_orders_by_profit(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        top_orders = collection.find().sort("Profit", -1).limit(3)
        for order in top_orders:
            print(order)
    except Exception as e:
        print("An error occurred:", e)
if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"   
    collection = "Orders"
    fetch_top_3_orders_by_profit(mongo_client, db_name, collection)
    mongo_client.close_connection()

Connected to MongoDB at mongodb://localhost:27017/
{'_id': ObjectId('6797f2bce868574b4fb5121b'), 'Row ID': 6827, 'Order ID': 'CA-2016-118689', 'Order Date': '10/2/2016', 'Ship Date': '10/9/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'TC-20980', 'Customer Name': 'Tamara Chand', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Lafayette', 'State': 'Indiana', 'Postal Code': 47905, 'Region': 'Central', 'Product ID': 'TEC-CO-10004722', 'Category': 'Technology', 'Sub-Category': 'Copiers', 'Product Name': 'Canon imageCLASS 2200 Advanced Copier', 'Sales': 17499.95, 'Quantity': 5, 'Discount': 0.0, 'Profit': 8399.976}
{'_id': ObjectId('6797f2bce868574b4fb5174a'), 'Row ID': 8154, 'Order ID': 'CA-2017-140151', 'Order Date': '3/23/2017', 'Ship Date': '3/25/2017', 'Ship Mode': 'First Class', 'Customer ID': 'RB-19360', 'Customer Name': 'Raymond Buch', 'Segment': 'Consumer', 'Country': 'United States', 'City': 'Seattle', 'State': 'Washington', 'Postal Code': 98115, 'Region': 'West

In [30]:
# 7. Update all orders with Ship Mode "First Class" to "Premium Class".

def update_ship_mode(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        update_result = collection.update_many(
            {"Ship Mode": "First Class"},
            {"$set": {"Ship Mode": "Premium Class"}}
        )
        print(f"Matched {update_result.matched_count} documents and updated {update_result.modified_count} documents.")
    except Exception as e:
        print("An error occurred:", e)
if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"   
    collection = "Orders"
    update_ship_mode(mongo_client, db_name, collection)
    mongo_client.close_connection()

Connected to MongoDB at mongodb://localhost:27017/
Matched 1538 documents and updated 1538 documents.
Connection closed.


In [31]:
# 8. Delete all orders where Sales is less than 50.

def delete_orders_sales_less_than_50(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        delete_result = collection.delete_many({"Sales": {"$lt": 50}})
        print(f"Deleted {delete_result.deleted_count} documents.")
    except Exception as e:
        print("An error occurred:", e)
if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"   
    collection = "Orders"
    delete_orders_sales_less_than_50(mongo_client, db_name, collection)
    mongo_client.close_connection()

Connected to MongoDB at mongodb://localhost:27017/
Deleted 4849 documents.
Connection closed.


In [32]:
# 9. Use aggregation to group orders by Region and calculate total sales per region.

def aggregate_sales_per_region(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        pipeline = [
            {"$group": {"_id": "$Region", "total_sales": {"$sum": "$Sales"}}}
        ]
        results = collection.aggregate(pipeline)
        for result in results:
            print(result)
    except Exception as e:
        print("An error occurred:", e)
if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"   
    collection = "Orders"
    aggregate_sales_per_region(mongo_client, db_name, collection)
    mongo_client.close_connection()

Connected to MongoDB at mongodb://localhost:27017/
{'_id': 'South', 'total_sales': 376023.312}
{'_id': 'Central', 'total_sales': 479611.8458}
{'_id': 'West', 'total_sales': 694686.6195}
{'_id': 'East', 'total_sales': 651137.705}
Connection closed.


In [33]:
# 10. Fetch all distinct values for Ship Mode from the collection.

def fetch_distinct_ship_modes(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        distinct_ship_modes = collection.distinct("Ship Mode")
        print(f"Distinct Ship Modes: {distinct_ship_modes}")
    except Exception as e:
        print("An error occurred:", e)
if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"   
    collection = "Orders"
    fetch_distinct_ship_modes(mongo_client, db_name, collection)
    mongo_client.close_connection()

Connected to MongoDB at mongodb://localhost:27017/
Distinct Ship Modes: ['Premium Class', 'Same Day', 'Second Class', 'Standard Class']
Connection closed.


In [34]:
# 11. Count the number of orders for each category.

def count_orders_per_category(mongo_client, database_name, collection_name):
    try:
        collection = mongo_client.get_collection(database_name, collection_name)
        pipeline = [
            {"$group": {"_id": "$Category", "order_count": {"$sum": 1}}}
        ]
        results = collection.aggregate(pipeline)
        for result in results:
            print(result)
    except Exception as e:
        print("An error occurred:", e)
if __name__ == "__main__":
    mongo_client = MongoDBClient(mongo_uri="mongodb://localhost:27017/")
    db_name = "SuperstoreDB"   
    collection = "Orders"
    count_orders_per_category(mongo_client, db_name, collection)
    mongo_client.close_connection()

Connected to MongoDB at mongodb://localhost:27017/
{'_id': 'Office Supplies', 'order_count': 2076}
{'_id': 'Furniture', 'order_count': 1573}
{'_id': 'Technology', 'order_count': 1496}
Connection closed.
