## Question-1. What are the key differences between SQL and NoSQL databases?
* SQL stands for Structured Query Language, and NoSQL stands for "Not Only SQL". The two kinds of databases serve different purposes and have distinct characteristics.

### 1. Data Model:
#### SQL Databases: 
Use a structured schema with tables, rows, and columns. Data is organized in a relational model, where relationships between tables are defined using foreign keys.
#### NoSQL Databases: 
Use various data models, including document, key-value, column-family, and graph. They are more flexible and can handle unstructured or semi-structured data.
### 2. Schema:
#### SQL Databases: 
Have a fixed schema that requires a predefined structure. Changes to the schema can be complex and may require migrations.
#### NoSQL Databases: 
Typically have a dynamic schema, allowing for more flexibility in data storage. New fields can be added without affecting existing data.
### 3. Query Language:
#### SQL Databases: 
Use SQL as the standard query language for defining and manipulating data. SQL provides powerful querying capabilities, including joins and complex transactions.
#### NoSQL Databases: 
Use various query languages or APIs specific to the database type. They may not support complex queries or joins in the same way SQL does.
### 4. Transactions:
#### SQL Databases: 
Support ACID (Atomicity, Consistency, Isolation, Durability) properties ensuring reliable transactions and data integrity.
#### NoSQL Databases: 
Often prioritize availability and partition tolerance over strict consistency (following the CAP theorem). Some NoSQL databases offer eventual consistency rather than strict ACID compliance.
### 5. Scalability:
#### SQL Databases: 
Typically scale vertically (adding more power to a single server). Horizontal scaling (adding more servers) can be challenging due to the relational model.
#### NoSQL Databases: 
Designed for horizontal scaling, allowing them to handle large volumes of data and high traffic by distributing data across multiple servers.
### 6. Use Cases:
#### SQL Databases: 
Best suited for applications requiring complex queries, transactions, and data integrity, such as financial systems, ERP, and CRM applications.
#### NoSQL Databases: 
Ideal for applications with large volumes of unstructured data, real-time analytics, content management, and scenarios requiring high availability and scalability, such as social media, IoT, and big data applications.
### 7. Examples:
* SQL Databases: MySQL, PostgreSQL, Oracle, Microsoft SQL Server.
* NoSQL Databases: MongoDB (document), Cassandra (column-family), Redis (key-value), Neo4j (graph).

## Question-2. What makes MongoDB a good choice for modern applications?

* MongoDB is a popular NoSQL database that offers several features and advantages that make it a good choice for modern applications.
#### 1. Flexible Schema:
MongoDB uses a document-oriented data model, which allows for a flexible schema. Documents in a collection can have different structures, making it easy to adapt to changing application requirements without the need for complex migrations.
#### 2. Scalability:
MongoDB is designed for horizontal scalability, allowing it to handle large volumes of data and high traffic loads. It supports sharding, which distributes data across multiple servers, enabling applications to scale out as demand grows.
#### 3. High Performance:
MongoDB provides high performance for read and write operations. Its in-memory processing capabilities and efficient indexing mechanisms contribute to fast query execution, making it suitable for real-time applications.
#### 4. Rich Query Language:
MongoDB offers a powerful and expressive query language that supports a wide range of queries, including filtering, sorting, and aggregation. This allows developers to perform complex data operations without needing to write extensive code.
#### 5. Document Storage:
Data is stored in BSON (Binary JSON) format, which allows for rich data types, including arrays and nested documents. This makes it easy to represent complex data structures and relationships within a single document.
#### 6. Built-in Replication and High Availability:
MongoDB supports replica sets, which provide automatic failover and data redundancy. This ensures high availability and data durability, making it suitable for mission-critical applications.
#### 7. Geospatial Queries:
MongoDB has built-in support for geospatial data and queries, making it a great choice for applications that require location-based services, such as mapping and location tracking.
#### 8. Aggregation Framework:
The aggregation framework in MongoDB allows for powerful data processing and transformation capabilities. It enables developers to perform complex data analysis and reporting directly within the database.
#### 9. Integration with Modern Technologies:
MongoDB integrates well with modern development frameworks and tools, including cloud services, microservices architectures, and containerization technologies like Docker and Kubernetes.
#### 10. Community and Ecosystem:
MongoDB has a large and active community, along with a rich ecosystem of tools, libraries, and resources. This support makes it easier for developers to find solutions, share knowledge, and access third-party integrations.
#### 11. Cloud Services:
MongoDB Atlas, the fully managed cloud database service, simplifies deployment, scaling, and management of MongoDB databases. It provides features like automated backups, monitoring, and security, making it easier for teams to focus on application development.

## Question-3. Explain the concept of collections in MongoDB.

* In MongoDB, a collection is a fundamental data structure that serves as a container for documents.

#### 1. Definition:
A collection is analogous to a table in a relational database. It is a grouping of MongoDB documents that share a similar structure or purpose. Each document within a collection can have a different schema, allowing for flexibility in data representation.
#### 2. Documents:
Documents are the individual records stored within a collection. They are represented in BSON (Binary JSON) format, which allows for rich data types, including arrays, nested objects, and various data types (strings, numbers, dates). Each document has a unique identifier called the _id field, which is automatically generated by MongoDB unless specified otherwise.
#### 3. Schema Flexibility:
Unlike traditional relational databases, collections in MongoDB do not require a predefined schema. This means that documents within the same collection can have different fields and data types. This flexibility is particularly useful for applications that evolve over time or deal with unstructured data.
#### 4. Creating Collections:
Collections are created automatically when a document is inserted into a non-existent collection. However, developers can also explicitly create collections using the createCollection command if they want to specify options such as validation rules, storage engine, or indexing.
#### 5. Indexes:
Collections can have indexes to improve query performance. MongoDB supports various types of indexes, including single-field, compound, geospatial, and text indexes. Indexes can be created on one or more fields within a collection to optimize read operations.

## Question-4. How does MongoDB ensure high availability using replication?

MongoDB ensures high availability through replica sets, which are groups of MongoDB servers that maintain the same dataset. Here are the key components of this mechanism:

#### 1. Primary and Secondary Nodes: 
In a replica set, one node acts as the primary and handles all write operations, while one or more secondary nodes replicate the data from the primary.

#### 2. Automatic Failover: 
If the primary node fails, the secondary nodes detect the failure through heartbeat signals. An election is then held to select a new primary from the secondaries, ensuring continuous availability.

#### 3. Oplog: 
The primary maintains an operation log (oplog) that records all changes. Secondary nodes replicate these changes from the oplog, keeping their data synchronized with the primary.

#### 4. Read Preferences: 
Clients can configure read preferences to read from either the primary or secondary nodes, allowing for load balancing and improved read performance.

#### 5. Arbiters: 
When a replica set has an even number of data-bearing members, an arbiter can be added to give it an odd number of voting members. Arbiters vote in elections but do not store data, which helps the set reach a majority.

This replication strategy provides redundancy and fault tolerance, ensuring that MongoDB remains available even in the event of node failures.
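
As a rough illustration of why member counts matter for elections, the sketch below computes the majority needed to elect a primary and how many members can be lost before an election can no longer succeed. The function names are hypothetical helpers, not part of MongoDB or any driver, and the arithmetic assumes every member has exactly one vote.

```python
def majority_required(voting_members: int) -> int:
    """Votes needed to elect a primary: more than half of the voting members."""
    if voting_members < 1:
        raise ValueError("a replica set needs at least one voting member")
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Voting members that can be unavailable while an election can still succeed."""
    return voting_members - majority_required(voting_members)


# Print members, required majority, and tolerated failures for a few set sizes.
for n in (2, 3, 4, 5):
    print(n, majority_required(n), fault_tolerance(n))
```

Under these assumptions, going from three to four members does not increase the number of failures the set can survive, which is one reason an arbiter is sometimes added to an even-sized set.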

## Question-5. What are the main benefits of MongoDB Atlas?


### Main Benefits of MongoDB Atlas

#### Fully Managed Service: 
Handles database deployment, updates, backups, and scaling automatically.
#### Global Distribution: 
Easily distribute data across multiple regions for low-latency access and disaster recovery.
#### Scalability: 
Supports vertical and horizontal scaling with minimal effort.
#### High Security: 
Provides built-in encryption, role-based access control, and compliance with standards like GDPR and HIPAA.
#### Performance Monitoring: 
Offers real-time monitoring and performance optimization tools.
* MongoDB Atlas simplifies database management while ensuring reliability, scalability, and security.

## Question-6. What is the role of indexes in MongoDB, and how do they improve performance?


### Role of Indexes in MongoDB and Their Impact on Performance
Indexes in MongoDB play a crucial role in optimizing query performance and improving data retrieval efficiency. Here are the key points regarding their role and benefits:

#### Faster Query Execution:
Indexes allow MongoDB to quickly locate and access the documents that match a query, significantly reducing the amount of data that needs to be scanned. This leads to faster query execution times.

#### Efficient Sorting:
Indexes can be used to sort query results without requiring additional processing. When a query includes sorting on indexed fields, MongoDB can return results in the desired order directly from the index.

#### Reduced Resource Consumption:
By minimizing the number of documents scanned during a query, indexes reduce CPU and memory usage, leading to more efficient resource utilization and improved overall performance.

#### Support for Complex Queries:
Indexes enable the execution of complex queries, including those with multiple fields, range queries, and text searches. This enhances the flexibility and capability of the database to handle diverse query patterns.

#### Types of Indexes:
MongoDB supports various types of indexes, including single-field, compound, geospatial, and text indexes. Each type is designed to optimize specific query patterns, allowing developers to tailor indexing strategies to their application needs.
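
The index types listed above could be declared from Python roughly as follows. This is a minimal sketch: the field names are invented for the example, and `ensure_indexes` is a hypothetical helper that assumes it receives a PyMongo `Collection`. The key and option shapes are the ones PyMongo's `create_index()` accepts.

```python
# Each entry is (keys, options). Field names are made up for illustration.
INDEX_SPECS = [
    ([("email", 1)], {"unique": True}),             # single-field, ascending
    ([("last_name", 1), ("created_at", -1)], {}),   # compound
    ([("location", "2dsphere")], {}),               # geospatial
    ([("bio", "text")], {}),                        # text
]


def ensure_indexes(collection):
    """Create every index in INDEX_SPECS and return the resulting index names.

    `collection` is assumed to be a pymongo.collection.Collection.
    """
    return [collection.create_index(keys, **options) for keys, options in INDEX_SPECS]
```

Keeping the specifications in one list makes it easier to review which query patterns the indexes are meant to serve.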

## Question- 7. Describe the stages of the MongoDB aggregation pipeline.

* The MongoDB aggregation pipeline is a powerful framework for processing and transforming data in a collection. It consists of a series of stages, each performing a specific operation on the data.

* $match: Filters documents based on a condition (like the SQL WHERE clause).
* $group: Groups documents by a field and performs aggregations (e.g., sum, average).
* $project: Reshapes documents by including, excluding, or computing fields.
* $sort: Sorts documents in ascending or descending order.
* $limit: Limits the number of documents in the output.
* $skip: Skips a specified number of documents.
* $unwind: Deconstructs an array field into individual documents.
* $lookup: Performs a left outer join with another collection, allowing you to combine documents from different collections based on a specified field.
* $facet: Runs multiple independent aggregations within a single pipeline.
* $merge: Writes the results of the aggregation pipeline to a specified collection.
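
To show how several of these stages chain together, here is a small pipeline expressed as a Python list. The collection and field names (`orders`, `status`, `customer_id`, `amount`) are assumptions made for the example, and `top_customers` is a hypothetical helper that expects a PyMongo `Collection`.

```python
# Total amount per customer for completed orders, keeping the five largest totals.
pipeline = [
    {"$match": {"status": "completed"}},                                 # filter first
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},   # aggregate
    {"$sort": {"total": -1}},                                            # largest first
    {"$limit": 5},
    {"$project": {"_id": 0, "customer_id": "$_id", "total": 1}},         # reshape output
]


def stage_names(stages):
    """Return the operator name of each stage, in pipeline order."""
    return [next(iter(stage)) for stage in stages]


def top_customers(orders):
    """Run the pipeline; `orders` is assumed to be a pymongo.collection.Collection."""
    return list(orders.aggregate(pipeline))


print(stage_names(pipeline))
```

Each stage receives the documents produced by the previous one, so placing `$match` first keeps later stages from processing documents that would be discarded anyway.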

## Question-8. What is sharding in MongoDB? How does it differ from replication?

Sharding in MongoDB is a method for distributing data across multiple servers or clusters to ensure horizontal scalability. It allows MongoDB to handle large datasets and high-throughput applications by partitioning data into smaller, more manageable pieces called shards. Each shard is an independent database that contains a subset of the data, and together, they form a single logical database.

### Key Features of Sharding:
* Data Distribution: Sharding divides the dataset into chunks based on a shard key, which is a specific field in the documents.
* Horizontal Scalability: As the data grows, new shards can be added to the cluster without downtime, allowing the system to scale out easily.
* Load Balancing: MongoDB automatically balances the data across shards to ensure even distribution and optimal performance.
* High Availability: Each shard can be configured as a replica set providing redundancy and failover capabilities.
### How Does Sharding Differ from Replication?
While both sharding and replication are used in MongoDB to enhance performance and availability, they serve different purposes:

### Purpose:
* Sharding: Primarily used for horizontal scaling and managing large datasets by distributing data across multiple servers.
* Replication: Used for data redundancy and high availability by creating copies of the same data on multiple servers.
### Data Structure:
* Sharding: Involves partitioning data into distinct shards, each containing a subset of the overall dataset.
* Replication: Involves maintaining identical copies of the same dataset across multiple nodes (replica sets).
### Data Access: 
* Sharding: Queries may be directed to specific shards based on the shard key, allowing for efficient data retrieval from large datasets.
* Replication: Writes always go to the primary node. Reads also go to the primary by default, although read preferences can route them to secondary nodes.
### Scalability vs. Availability:
* Sharding: Focuses on scaling out the database to handle increased load and larger datasets.
* Replication: Focuses on ensuring data availability and durability in case of node failures.
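
The partitioning idea behind sharding can be imitated in a few lines of plain Python. This is a deliberately simplified sketch: `shard_for` is a hypothetical function that hashes a key and takes it modulo the shard count, whereas a real MongoDB cluster divides the shard key space into chunks and routes operations using cluster metadata.

```python
import hashlib
from collections import Counter


def shard_for(shard_key_value: str, shard_count: int) -> int:
    """Map a shard key value to a shard number (toy hashed partitioning)."""
    digest = hashlib.md5(shard_key_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


# Distribute 1000 made-up user ids over three shards and count the placement.
user_ids = [f"user-{i}" for i in range(1000)]
placement = Counter(shard_for(uid, 3) for uid in user_ids)
print(dict(placement))
```

Each key lands on exactly one shard, so every shard holds a different subset of the data. Replication is the opposite arrangement: every member of a replica set holds a copy of the same data.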

## Question-9. What is PyMongo, and why is it used?

PyMongo is the official Python driver for MongoDB, providing a way for Python applications to interact with MongoDB databases. It allows developers to perform various database operations, such as creating, reading, updating, and deleting documents, as well as managing collections and databases.

### Key Features of PyMongo:
#### Database Operations:
* PyMongo provides methods to perform CRUD (Create, Read, Update, Delete) operations on MongoDB documents, making it easy to manipulate data.
#### Connection Management:
* It handles connections to MongoDB servers, including support for replica sets and sharded clusters, ensuring efficient communication with the database.
#### Aggregation Framework:
* PyMongo supports MongoDB's aggregation framework, allowing developers to perform complex data processing and analysis directly from their Python applications.
#### Indexing:
* It provides functionality to create and manage indexes, which can improve query performance.
#### Support for BSON:
* PyMongo can handle BSON (Binary JSON) data types, which are used by MongoDB, allowing for rich data structures, including arrays and nested documents.

### Why is PyMongo Used?
#### Ease of Use:
* PyMongo offers a simple and intuitive API for interacting with MongoDB, making it accessible for developers familiar with Python.
#### Integration with Python Applications:
* It seamlessly integrates with Python applications, allowing developers to leverage MongoDB's capabilities within their existing codebases.
#### Community and Documentation:
* As the official MongoDB driver for Python, PyMongo has extensive documentation and a supportive community, making it easier for developers to find resources and troubleshoot issues.
#### Performance:
* PyMongo is optimized for performance, allowing efficient data access and manipulation, which is crucial for applications that require fast response times.
#### Flexibility:
* It supports various MongoDB features, including transactions, change streams, and GridFS, providing developers with the flexibility to build a wide range of applications.
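
A minimal sketch of the CRUD operations mentioned above might look like this. The connection string, database name, and collection name are assumptions for the example. `crud_roundtrip` is a hypothetical helper, and `main()` is only defined here, not called, because it needs PyMongo installed and a running server.

```python
def crud_roundtrip(users):
    """Run one create/read/update/delete cycle and return (found_document, deleted_count).

    `users` is assumed to be a pymongo.collection.Collection.
    """
    inserted_id = users.insert_one({"name": "Ada", "langs": ["python"]}).inserted_id
    found = users.find_one({"_id": inserted_id})
    users.update_one({"_id": inserted_id}, {"$push": {"langs": "c"}})
    deleted = users.delete_one({"_id": inserted_id}).deleted_count
    return found, deleted


def main():
    # Imported here so the sketch can be loaded without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # assumed local server
    users = client["demo"]["users"]                    # hypothetical database and collection
    print(crud_roundtrip(users))
```

Passing the collection into the function, rather than creating the client inside it, keeps the database logic easy to exercise with a stand-in object in tests.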

## Question-10. What are the ACID properties in the context of MongoDB transactions?

* ACID properties refer to a set of principles that ensure reliable processing of database transactions. ACID stands for Atomicity, Consistency, Isolation, and Durability.

### Atomicity :
* Transactions are treated as a single unit: either all operations succeed, or none do. If any operation fails, the entire transaction is rolled back, maintaining the database's previous state.

### Consistency :
* Transactions must take the database from one valid state to another. After a transaction, the data must still comply with any schema validation rules and constraints.

### Isolation : 
* Transactions operate independently, meaning the operations in one transaction are not visible to others until committed. This prevents interference and ensures that transactions do not affect each other.

### Durability : 
* Once a transaction is committed, its changes are permanent and survive later failures such as a crash or power loss. In MongoDB, durability depends on journaling and the write concern; for example, a majority write concern acknowledges a commit only after it has reached a majority of replica set members.

## Question- 11. What is the purpose of MongoDB’s explain() function?

* The explain() function in MongoDB returns information about how a query is executed: the plan selected by the query planner, the stages in that plan, and, in the more verbose modes, runtime statistics for those stages. It helps developers understand the performance of their queries and optimize them.

* Query Execution Plan: The explain() function returns the execution plan for a given query, detailing how MongoDB processes the query. 

* Performance Metrics:  It provides various performance metrics, such as the time taken to execute the query, the number of documents examined, and the number of documents returned.

* Index Usage: The output of explain() indicates whether the query is using an index and if so which index is being utilized.

* Different Modes: The explain() function can be run in different verbosity modes (e.g., "queryPlanner", "executionStats", and "allPlansExecution"), allowing users to choose the level of detail they need for their analysis.

* Optimization Guidance: By analyzing the output of explain(), developers can make informed decisions about query optimization, such as modifying queries, adding indexes, or restructuring data models to enhance performance.
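
As an illustration of reading explain() output, the sketch below walks a plan and reports whether it contains an index scan. The two sample dictionaries are hand-written and heavily abbreviated, and the helper names are hypothetical. The exact layout of real output varies between server versions, so treat this as a starting point rather than a complete parser.

```python
def plan_stages(plan: dict) -> list:
    """Return stage names from the top of a plan down to its leaves."""
    stages = [plan.get("stage")]
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    for child in children:
        stages.extend(plan_stages(child))
    return stages


def uses_index(explain_output: dict) -> bool:
    """True if the winning plan contains an index scan (IXSCAN) stage."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "IXSCAN" in plan_stages(winning)


# Hand-written, abbreviated examples of what explain() might return.
indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "email_1"},
}}}
unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}

print(uses_index(indexed), uses_index(unindexed))  # True False
```

With PyMongo, a dictionary of this general shape can be obtained by calling `explain()` on the cursor returned by `find()`.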

## Question- 12. How does MongoDB handle schema validation?

* MongoDB uses a schemaless data model, which means it does not require a predefined structure or schema for documents. This flexibility allows developers to define the structure of their documents based on their specific requirements and use cases.

* MongoDB also provides optional schema validation. Validation rules, written with JSON Schema or query operators, combine with configurable validation levels and actions to ensure that documents meet specific criteria.

### Validation Rules: 
MongoDB allows you to define validation rules using JSON Schema, a powerful and flexible schema definition language. You can specify required fields, data types, value ranges, and other constraints that documents must meet to be considered valid.

### Validation Levels:

#### Strict: 
* The default level. Validation rules are applied to all inserts and all updates.
#### Moderate: 
* Validation rules are applied to inserts and to updates of existing documents that already satisfy the rules. Updates to existing documents that do not satisfy the rules are not checked.

### Validation Actions: 
* When a document fails validation, you can configure the action that MongoDB should take:
* Error: The operation will fail, and an error message will be returned.
* Warn: The operation will succeed, but a warning will be logged.
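
The rules, level, and action described above can be combined when a collection is created. The following is a sketch only: the `users` collection and its fields are invented for the example, and `create_users_collection` is a hypothetical helper that assumes it receives a PyMongo `Database`.

```python
# JSON Schema validator: name and email are required, age is optional but bounded.
USER_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string", "pattern": "^.+@.+$"},
            "age": {"bsonType": "int", "minimum": 0, "maximum": 150},
        },
    }
}


def create_users_collection(db):
    """Create the collection with validation; `db` is assumed to be a pymongo.database.Database."""
    return db.create_collection(
        "users",
        validator=USER_VALIDATOR,
        validationLevel="strict",   # check all inserts and updates
        validationAction="error",   # reject documents that fail validation
    )
```

Switching `validationAction` to `"warn"` would let invalid documents through while logging the violation, which can be useful when introducing rules to an existing collection.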

## Question- 13. What is the difference between a primary and a secondary node in a replica set?

In MongoDB a replica set is a group of MongoDB servers that maintain the same dataset, providing redundancy and high availability. Within a replica set, there are two main types of nodes: primary nodes and secondary nodes.

### Primary Node
* Role: The primary node is the main node in a replica set that receives all write operations.
* Data Consistency: The primary node holds the most up-to-date version of the data.
* Election: In the event of a primary node failure, the replica set will automatically elect a new primary from the secondary nodes.
* Read Operations: By default, read operations are directed to the primary node.
* Replication: The primary node replicates its data to the secondary nodes, ensuring that they have the same dataset.

### Secondary Node
* Role: Secondary nodes are replicas of the primary node. They do not accept write operations directly but replicate the data from the primary node.
* Data Consistency: Secondary nodes maintain copies of the data from the primary node, but they may not always be up-to-date due to replication lag.
* Read Operations: Secondary nodes can be configured to handle read operations, depending on the read preference settings. 
* Failover: If the primary node fails, one of the secondary nodes can be elected as the new primary, ensuring high availability and fault tolerance.
* Backup and Reporting: Secondary nodes can be used for backup purposes or for running read-heavy operations without impacting the performance of the primary node.

## Question - 14. What security mechanisms does MongoDB provide for data protection?

MongoDB incorporates a variety of security mechanisms to protect data and ensure secure access to the database.

### Authentication: 
MongoDB supports multiple authentication methods to verify user identities, including:

* SCRAM: The default mechanism that uses salted passwords for secure authentication.
* LDAP: Integrates with existing directory services for user authentication.
* Kerberos: Provides secure authentication for users and services in a network.
* x.509 Certificates: Used for client and server authentication in secure environments.

### Authorization: 
* MongoDB employs role-based access control (RBAC), allowing administrators to define roles with specific permissions. Users are assigned roles that grant access to certain resources and operations, ensuring that only authorized individuals can perform specific actions on the database.

### Encryption:
* Encryption at Rest: Data stored on disk can be encrypted using the encrypted storage engine available in MongoDB Enterprise, protecting it from unauthorized access.
* Encryption in Transit: Data transmitted over the network can be encrypted using TLS/SSL, safeguarding it from interception during communication between clients and servers.

### Auditing: 
* MongoDB provides auditing capabilities that log database operations, including user access and administrative actions.

### Network Security: 
* MongoDB can be configured to bind to specific IP addresses, limiting access to trusted sources. It also supports firewalls and Virtual Private Networks (VPNs) to enhance network security.

## Question- 15. Explain the concept of embedded documents and when they should be used.

Embedded documents are documents that are stored as part of another document, rather than being stored as separate documents in a separate collection. Embedded documents can be used to represent hierarchical data structures, enabling more efficient querying and manipulation of the data.

### When to Use Embedded Documents:
* One-to-Few Relationships: Embedded documents are ideal for one-to-few relationships, where a parent document contains a limited number of related sub-documents.
* Data Locality: When you frequently access a parent document and its embedded documents together, embedding can enhance performance.
* Atomicity: MongoDB ensures that updates to a single document, including its embedded documents, are atomic.
* Simplified Data Model: Embedding can lead to a more straightforward data model by reducing the need for complex joins or multiple collections.
* Avoiding Joins: MongoDB can join collections with the $lookup aggregation stage, but joins are not as central as they are in relational databases. Embedding related data avoids the need for $lookup or for multiple queries.
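
To make the trade-off concrete, here is the same one-to-few relationship modeled both ways as plain Python dictionaries. All names and values are invented for the example.

```python
# Embedded: the addresses travel with the customer document.
customer_embedded = {
    "_id": 1,
    "name": "Ada",
    "addresses": [
        {"label": "home", "city": "London"},
        {"label": "work", "city": "Cambridge"},
    ],
}

# Referenced: addresses live in their own collection and point back to the customer.
customer_referenced = {"_id": 1, "name": "Ada"}
addresses = [
    {"_id": 10, "customer_id": 1, "label": "home", "city": "London"},
    {"_id": 11, "customer_id": 1, "label": "work", "city": "Cambridge"},
]


def cities_embedded(customer):
    """One document is enough to answer the question."""
    return [a["city"] for a in customer["addresses"]]


def cities_referenced(customer, all_addresses):
    """A second lookup over the address documents is required."""
    return [a["city"] for a in all_addresses if a["customer_id"] == customer["_id"]]


print(cities_embedded(customer_embedded))
print(cities_referenced(customer_referenced, addresses))
```

Both functions return the same cities, but the embedded form needs only the parent document, which is the data locality benefit described above.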

## Question - 16. What is the purpose of MongoDB’s $lookup stage in aggregation?

The $lookup stage in MongoDB's aggregation framework is used to perform a left outer join between two collections. It combines each input document with the matching documents from the specified collection, based on a specified condition.

### Purpose:
* Combines data from a primary collection and a secondary collection.
* Matches documents based on specified fields.
* Embeds the matching documents into an array field in the result.
* Useful for querying related data across collections without denormalizing.
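
The sketch below shows the shape of an equality $lookup stage, followed by a plain-Python imitation of its left outer join behavior. The collection and field names are assumptions, and `emulate_lookup` is a hypothetical helper written only to illustrate the semantics, not how the server implements the stage.

```python
# Stage as it would appear in a pipeline run against an "orders" collection.
lookup_stage = {
    "$lookup": {
        "from": "customers",          # collection to join with
        "localField": "customer_id",  # field in the input documents
        "foreignField": "_id",        # field in the "from" collection
        "as": "customer",             # output array field
    }
}


def emulate_lookup(input_docs, foreign_docs, local_field, foreign_field, as_field):
    """Imitate an equality $lookup: keep every input document, attach matches as a list."""
    joined = []
    for doc in input_docs:
        matches = [f for f in foreign_docs if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


orders = [{"_id": 1, "customer_id": 7}, {"_id": 2, "customer_id": 99}]
customers = [{"_id": 7, "name": "Ada"}]
result = emulate_lookup(orders, customers, "customer_id", "_id", "customer")
print(result)
```

The second order has no matching customer, yet it still appears in the result with an empty array, which is what makes the join a left outer join.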

## Question - 17. What are some common use cases for MongoDB?

* MongoDB is widely used for various applications due to its flexibility and scalability. Common use cases include:

* ### *Content Management Systems (CMS):* 
Handles unstructured or semi-structured data like blog posts, articles, and media files, allowing for easy content updates and retrieval.

* ### *E-commerce Platforms:*
Manages product catalogs, customer details, and transaction data efficiently, enabling dynamic product variations and personalized user experiences.

* ### *Real-Time Analytics:* 
Processes and analyzes data streams, such as user activity logs or IoT sensor data, providing immediate insights for decision-making.

* ### *Mobile and Web Applications:* 
Supports dynamic schema changes for user-generated content and rapid iteration, making it suitable for applications with evolving data requirements.

* ### *Gaming Applications:*
Stores player profiles, in-game assets, and real-time leaderboards, facilitating high write loads and real-time data processing.

* ### *Big Data Applications:* 
Handles large volumes of unstructured or semi-structured data for data lakes, enabling efficient storage and analysis of diverse data sources.

## Question - 18. What are the advantages of using MongoDB for horizontal scaling?

* MongoDB offers several advantages for horizontal scaling, making it a popular choice for applications that require high availability and the ability to handle large volumes of data.

* #### *Distributes Data:* 
Automatically divides data across multiple servers (shards) based on a shard key, ensuring balanced data distribution and preventing any single server from becoming a bottleneck.

* #### *Increases Performance:* 
Queries and writes are distributed across shards, reducing the load on individual servers and significantly improving overall speed and responsiveness.

* #### *Supports High Availability:* 
Each shard can have its own replica set, ensuring fault tolerance and minimal downtime in case of server failures, which is crucial for mission-critical applications.

* #### *Handles Large Datasets:* 
Easily scales storage capacity by adding more shards as data grows, allowing organizations to manage large volumes of data without performance degradation.

* #### *Cost-Effective:* 
Enables scaling with commodity hardware rather than relying on expensive high-end servers, making it a more economical solution for growing data needs.

## Question - 19. How do MongoDB transactions differ from SQL transactions?

* MongoDB transactions and SQL transactions serve the same fundamental purpose of ensuring data integrity and consistency during operations that involve multiple documents or tables. 

### Data Model:
* *MongoDB:*   MongoDB is a document-oriented NoSQL database that stores data in BSON format. Transactions can span multiple documents within a single collection or across multiple collections, allowing for more flexible data structures.
* *SQL:*   SQL databases are based on a relational model, where data is organized into tables with predefined schemas. Transactions typically operate within a single table or across multiple related tables using foreign keys.

### ACID Compliance:
* *MongoDB:*   As of version 4.0, MongoDB supports multi-document ACID transactions on replica sets, and version 4.2 extended them to sharded clusters. All operations within a transaction are atomic, consistent, isolated, and durable. Earlier versions only supported atomic operations at the document level.
* *SQL:*   SQL databases have traditionally supported ACID transactions from the outset, ensuring strong consistency and integrity across multiple rows and tables.

### Isolation Levels:
* *MongoDB:*   MongoDB provides snapshot isolation for transactions, meaning that transactions see a consistent snapshot of the data at the start of the transaction. This can lead to less contention but may not provide the same level of isolation as some SQL databases.
* *SQL:*   SQL databases offer various isolation levels (e.g., READ COMMITTED, REPEATABLE READ, SERIALIZABLE) that allow developers to choose the level of consistency and concurrency control needed for their applications.

### Performance:
* *MongoDB:*   Transactions in MongoDB can introduce overhead, especially when spanning multiple documents or collections. However, MongoDB is designed for high throughput and can handle many concurrent operations efficiently.
* *SQL:*   SQL transactions can also introduce overhead, particularly with complex joins and locking mechanisms. However, they are optimized for relational data access patterns.

### Syntax and Implementation:
* *MongoDB:*   Transactions in MongoDB run inside a session. The client starts a session with startSession(), begins a transaction on it with startTransaction(), passes the session to each operation, and then commits or aborts. The API consists of driver methods (JavaScript in the shell) rather than SQL statements.
* *SQL:*   SQL transactions are typically initiated with BEGIN TRANSACTION, followed by a series of SQL statements, and concluded with COMMIT or ROLLBACK. The syntax is largely standardized across SQL databases, with minor differences between dialects.
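
A multi-document transaction in PyMongo could be sketched as follows. The `bank` database, `accounts` collection, and `balance` field are assumptions for the example, and both functions are hypothetical helpers. `transfer` additionally assumes a `MongoClient` connected to a deployment that supports transactions, such as a replica set.

```python
def transfer_updates(from_id, to_id, amount):
    """Build the two (filter, update) pairs that make up a transfer."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return [
        ({"_id": from_id}, {"$inc": {"balance": -amount}}),
        ({"_id": to_id}, {"$inc": {"balance": amount}}),
    ]


def transfer(client, from_id, to_id, amount):
    """Apply both updates in one transaction; `client` is assumed to be a pymongo.MongoClient."""
    accounts = client["bank"]["accounts"]  # hypothetical database and collection
    with client.start_session() as session:
        # The transaction commits when the block exits normally
        # and aborts if an exception is raised inside it.
        with session.start_transaction():
            for filter_doc, update_doc in transfer_updates(from_id, to_id, amount):
                accounts.update_one(filter_doc, update_doc, session=session)
```

The role played by BEGIN and COMMIT in SQL is taken here by the `start_transaction()` block, and each operation joins the transaction by receiving the `session` argument.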

## Question - 20. What are the main differences between capped collections and regular collections?

Capped collections in MongoDB are a special type of collection that have specific characteristics and behaviors that differentiate them from regular collections.


#### *Fixed Size:*
* *Capped Collections:* Capped collections have a fixed size limit defined in bytes. Once this limit is reached, the oldest documents are automatically removed to make space for new documents. This behavior ensures that the collection does not grow indefinitely.
* *Regular Collections:* Regular collections do not have a size limit. They can grow as needed, depending on the amount of data being inserted, until the storage capacity of the database is reached.

#### *Insertion Order:*
* *Capped Collections:* Capped collections maintain the order of document insertion. The documents are stored in the order they are added, and this order is preserved when retrieving documents.
* *Regular Collections:* Regular collections do not guarantee any specific order of documents unless the query specifies a sort, which an index can make efficient. Without a sort, the order in which documents are returned can change as a result of operations such as updates or deletions.

#### *Document Deletion:*
* *Capped Collections:* Documents in capped collections are automatically deleted in a first-in, first-out (FIFO) manner when the size limit is reached. This means that the oldest documents are removed to accommodate new ones.
* *Regular Collections:* In regular collections, documents are deleted only when explicitly removed by a delete operation. There is no automatic deletion based on size or age.

#### *Indexing:*
* *Capped Collections:* Capped collections have an index on the _id field by default and can also have secondary indexes. Their restrictions lie elsewhere: for example, a capped collection cannot be sharded.
* *Regular Collections:* Regular collections support multiple indexes, including secondary indexes, allowing for more complex queries and efficient data retrieval.

#### *Use Cases:*
* *Capped Collections:* Capped collections are ideal for use cases where you need to store a fixed-size log of events, such as logging systems, real-time data feeds, or caching scenarios where only the most recent data is relevant.
* *Regular Collections:* Regular collections are suitable for general-purpose data storage where the size and lifespan of the data are not predetermined, such as user profiles, product catalogs, and other dynamic datasets.

#### *Performance:*
* *Capped Collections:* Capped collections can offer better performance for certain use cases, as they are optimized for high-throughput insert operations and do not require complex management of document deletion.
* *Regular Collections:* Regular collections may incur overhead due to the need for managing document growth, deletions, and indexing, which can affect performance in write-heavy scenarios.
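
The differences above can be sketched with PyMongo. This is a minimal sketch, not a definitive setup: the helper names, the collection name event_log, and the size limits are assumptions chosen for illustration. Both helpers take an existing PyMongo database or collection object, so nothing connects to a server until they are called.

```python
def create_event_log(db, name="event_log", size_bytes=1024 * 1024, max_docs=1000):
    """Create a capped collection on a PyMongo Database object.

    `size` (in bytes) is required for capped collections; `max` is an
    optional limit on the number of documents. Once either limit is
    reached, the oldest documents are removed to make room.
    """
    return db.create_collection(name, capped=True, size=size_bytes, max=max_docs)


def latest_events(collection, n=10):
    """Return the n most recently inserted documents (reverse insertion order)."""
    return list(collection.find().sort("$natural", -1).limit(n))


# Usage, assuming the pymongo package and a server on localhost:
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017/")["superstore_db"]
#   log = create_event_log(db)
#   log.insert_one({"event": "login"})
#   print(latest_events(log, 5))
```

Sorting on $natural in descending order reads the collection newest-first, which is the usual access pattern for a log.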

## Question - 21. What is the purpose of the *'$match'* stage in MongoDB’s aggregation pipeline?

The *'$match'* stage in MongoDB’s aggregation pipeline filters the documents passing through the pipeline based on specified criteria. It uses the same query syntax as the *'find()'* method but is integrated into the aggregation framework, allowing for more complex data processing and transformation.

### *Purpose of the '$match' Stage*

* #### Filtering Documents:
The primary purpose of the *'$match'* stage is to filter documents that meet certain conditions. Only the documents that satisfy the specified criteria will be passed to the next stage in the pipeline, reducing the amount of data processed in subsequent stages.

* #### Using Query Operators:
The *'$match'* stage supports a wide range of query operators, such as $eq, $gt, $lt, $in, and logical operators like $and, $or, and $not. This allows for complex filtering conditions to be applied to the documents.

* #### Improving Performance:
By filtering out unnecessary documents early in the aggregation pipeline, the *'$match'* stage can improve performance. It reduces the amount of data that needs to be processed in later stages, which can lead to faster query execution times.

* #### Combining with Other Stages:
The *'$match'* stage can be used in conjunction with other aggregation stages, such as $group, $sort, and $project. This allows for powerful data transformations and aggregations based on filtered data.

* #### Placement in the Pipeline:
The *'$match'* stage can appear anywhere in the aggregation pipeline, and more than once. Placing it as early as possible reduces the dataset size before further processing, and a *'$match'* at the very start of the pipeline can use indexes in the same way a regular query does.
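
To make the points above concrete, here is a small sketch of a pipeline that filters first and aggregates afterwards. The field names Region, Sales, and Category are taken from the Superstore data used later in this document; the helper name and its default thresholds are assumptions for illustration.

```python
def sales_by_category_pipeline(region="West", min_sales=500):
    """Build an aggregation pipeline that filters before it groups."""
    return [
        # 1. Keep only the documents of interest
        {"$match": {"Region": region, "Sales": {"$gt": min_sales}}},
        # 2. Aggregate the filtered documents per category
        {"$group": {"_id": "$Category", "total_sales": {"$sum": "$Sales"}}},
        # 3. Largest totals first
        {"$sort": {"total_sales": -1}},
    ]


pipeline = sales_by_category_pipeline()

# Usage, assuming `collection` is a PyMongo collection:
#   for row in collection.aggregate(pipeline):
#       print(row)
```

Because $match comes first, the $group stage only sees the documents that passed the filter.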

## Question- 22. How can you secure access to a MongoDB database?

#### Authentication:
* Enable MongoDB authentication (access control) so that only authenticated users can access the database.
* Use SCRAM (Salted Challenge Response Authentication Mechanism), the default mechanism, or x.509 certificates. LDAP and Kerberos authentication are available in MongoDB Enterprise.

#### Authorization (Role-Based Access Control):
* Implement RBAC to define and enforce access control at the database level, assigning users to specific roles with defined permissions.
* Use built-in roles or create custom roles for fine-grained access control.

#### Encryption:
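* Encryption at Rest: Enable the encrypted storage engine (a MongoDB Enterprise feature) to encrypt data stored on disk, or rely on disk-level or filesystem-level encryption where that feature is not available.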
* Encryption in Transit: Use TLS/SSL to encrypt data transmitted between MongoDB clients and servers to protect against interception.

#### Auditing:
* Enable auditing to track user actions and detect unauthorized access attempts or suspicious behavior.

#### IP Whitelisting and Network Security:
* Restrict database access to trusted IP addresses using IP whitelisting.
* Use firewalls to prevent unauthorized external access.

#### Disable Unused Features:
* Disable unnecessary services and close unused ports to reduce the attack surface. On versions older than MongoDB 3.6, this includes the legacy HTTP interface, which was removed in 3.6.

#### Backup Security:
* Ensure that backups are encrypted and stored securely to prevent unauthorized access to sensitive data.
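
A minimal sketch of how a few of these measures look from PyMongo, assuming the server already has access control and TLS enabled. The helper names, the user name, and the certificate path in the usage note are hypothetical; credentials should come from a secret store or environment variable rather than source code.

```python
def secure_client_options(username, password, ca_file, auth_db="admin"):
    """Keyword arguments for pymongo.MongoClient that enable SCRAM auth and TLS."""
    return {
        "username": username,
        "password": password,
        "authSource": auth_db,             # database that stores the user
        "authMechanism": "SCRAM-SHA-256",
        "tls": True,                       # encrypt traffic in transit
        "tlsCAFile": ca_file,              # CA used to verify the server certificate
    }


def create_read_only_user(db, username, password):
    """Create a user limited to the built-in `read` role on the given database."""
    return db.command(
        "createUser",
        username,
        pwd=password,
        roles=[{"role": "read", "db": db.name}],
    )


# Usage, assuming the pymongo package and a reachable server:
#   import os
#   from pymongo import MongoClient
#   opts = secure_client_options("admin_user", os.environ["MONGO_PASSWORD"], "/path/to/ca.pem")
#   client = MongoClient("mongodb://localhost:27017/", **opts)
#   create_read_only_user(client["superstore_db"], "report_reader", os.environ["READER_PASSWORD"])
```

Granting only the read role follows the least-privilege idea behind role-based access control: the reporting user can query the database but cannot modify it.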

## Question - 23. What is MongoDB’s WiredTiger storage engine, and why is it important?

MongoDB’s WiredTiger storage engine is the default storage engine for MongoDB starting from version 3.2. It is designed to provide high performance, scalability, and efficient data management. 

### Key Features of WiredTiger

* #### *Document-Level Locking:*
WiredTiger uses document-level locking, which allows multiple operations to occur concurrently on different documents. This improves performance and throughput, especially in write-heavy workloads, as it reduces contention compared to collection-level or database-level locking.

* #### *Compression:*
WiredTiger supports data compression, which reduces the amount of disk space used by the database. It offers several compression algorithms, such as snappy (the default for collection data) and zlib, so users can trade CPU cost against compression ratio. Besides saving storage space, compression can also improve I/O performance because less data is read from and written to disk.

* #### *Internal Cache:*
WiredTiger does not rely on memory-mapped files; that was the approach of the older MMAPv1 storage engine. Instead, it manages its own internal cache, whose size is configurable, and additionally benefits from the operating system's filesystem cache. This gives the engine direct control over which data stays in memory.

* #### *Multi-Version Concurrency Control (MVCC):*
WiredTiger implements MVCC, which allows readers to access a consistent snapshot of the data without being blocked by writers. This enhances read performance and ensures that read operations do not interfere with write operations, providing a more responsive experience for applications.

* #### *Checkpointing:*
WiredTiger uses a checkpointing mechanism to ensure data durability and consistency. Checkpoints are created periodically, allowing the system to recover to a consistent state in the event of a failure.

* #### *Scalability:*
The architecture of WiredTiger is designed to scale efficiently with increasing data volumes and workloads. It can handle large datasets and high-throughput applications, making it suitable for modern applications that require robust performance.

### Importance of WiredTiger

* #### *Performance Improvements:*
The features of WiredTiger, such as document-level locking and MVCC, significantly enhance the performance of MongoDB, especially in environments with high concurrency and large datasets. This makes it suitable for applications that require fast read and write operations.

* #### *Efficient Resource Utilization:*
With its support for compression and a configurable internal cache, WiredTiger optimizes resource usage, reducing the overall storage footprint and improving I/O performance. This is particularly important for organizations looking to minimize costs associated with storage and infrastructure.

* #### *Enhanced Data Integrity:*
The checkpointing and MVCC mechanisms in WiredTiger contribute to data integrity and durability, ensuring that data remains consistent even in the event of failures. This is critical for applications that require reliable data storage and retrieval.

* #### *Flexibility for Developers:*
WiredTiger provides developers with options for tuning performance through configuration settings, such as choosing different compression algorithms and adjusting cache sizes. This flexibility allows developers to optimize the database for their specific use cases.
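
As a hedged sketch of the tuning options mentioned above, the helpers below create a collection with a non-default block compressor and report which storage engine the server is running. The helper names and the collection name in the usage note are assumptions; both helpers expect an existing PyMongo database object.

```python
def create_zlib_collection(db, name):
    """Create a collection whose WiredTiger data blocks are compressed with zlib."""
    return db.create_collection(
        name,
        storageEngine={"wiredTiger": {"configString": "block_compressor=zlib"}},
    )


def storage_engine_name(db):
    """Return the name of the storage engine reported by serverStatus."""
    return db.command("serverStatus")["storageEngine"]["name"]


# Usage, assuming the pymongo package and a server on localhost:
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017/")["superstore_db"]
#   print(storage_engine_name(db))
#   archive = create_zlib_collection(db, "orders_archive")
```

zlib usually compresses more tightly than the default snappy at a higher CPU cost, so it tends to suit data that is written once and read rarely.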

## *Question- 1. Write a Python script to load the Superstore dataset from a CSV file into MongoDB.*

In [81]:
import pandas as pd
from pymongo import MongoClient

In [82]:
df = pd.read_csv("superstore.csv", encoding="latin1")

In [83]:
df.head()

| | Row ID | Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | ... | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | CA-2016-152156 | 11/8/2016 | 11/11/2016 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.96 | 2 | 0.0 | 41.9136 |
| 1 | 2 | CA-2016-152156 | 11/8/2016 | 11/11/2016 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | ... | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.94 | 3 | 0.0 | 219.582 |
| 2 | 3 | CA-2016-138688 | 6/12/2016 | 6/16/2016 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | ... | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels for Typewriters b... | 14.62 | 2 | 0.0 | 6.8714 |
| 3 | 4 | US-2015-108966 | 10/11/2015 | 10/18/2015 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.031 |
| 4 | 5 | US-2015-108966 | 10/11/2015 | 10/18/2015 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | ... | 33311 | South | OFF-ST-10000760 | Office Supplies | Storage | Eldon Fold 'N Roll Cart System | 22.368 | 2 | 0.2 | 2.5164 |


In [84]:
client = MongoClient("mongodb://localhost:27017/")
db = client["superstore_db"]
collection = db["superstore_data"]

In [85]:
data_dict = df.to_dict(orient="records")

In [86]:
if data_dict:
    collection.insert_many(data_dict)
    print(f"Successfully inserted {len(data_dict)} records into MongoDB.")
else:
    print("No data to insert.")

Successfully inserted 9994 records into MongoDB.
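
Note that insert_many always appends, so running the load cell more than once stores another full copy of the dataset each time. One possible way to make the load safe to re-run is sketched below. The helper name reload_collection is hypothetical, and it deliberately deletes every existing document in the target collection first, so it should only be pointed at a collection that can be rebuilt from the CSV.

```python
def reload_collection(collection, records):
    """Replace the collection's contents with `records` and return the new count."""
    collection.delete_many({})           # remove documents left by earlier runs
    if records:                          # insert_many rejects an empty list
        collection.insert_many(records)
    return collection.count_documents({})


# Usage with the objects defined in the cells above:
#   total = reload_collection(collection, df.to_dict(orient="records"))
#   print(f"Collection now holds {total} records.")
```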


## *Question - 2. Retrieve and print all documents from the Orders collection.*


In [87]:
collection = db["superstore_data"] 

In [88]:
documents = collection.find()

In [None]:
for doc in documents:
    print(doc)
    
'''I can't display the file output on GitHub due to size limitations'''

## *Question-3. Count and display the total number of documents in the Orders collection*

In [90]:
total_documents = collection.count_documents({})

In [91]:
print(f"Total number of documents in the Superstore collection is :- {total_documents}")

Total number of documents in the Superstore collection is :- 39976

Note: 39976 is exactly 4 × 9994, which suggests the insert cell in Question 1 was run four times against the same collection, so each row is stored four times. The counts and totals in the following questions reflect these duplicates.


## *Question-4. Write a query to fetch all orders from the "West" region.*

In [92]:
west_orders = collection.find({"Region": "West"})

In [None]:
for order in west_orders:
    print(order)

'''I can't display the file output on GitHub due to size limitations'''

## *Question-5. Write a query to find orders where Sales is greater than 500.*

In [94]:
high_sales_orders = collection.find({"Sales": {"$gt": 500}})

In [None]:
for order in high_sales_orders:
    print(order)

'''I can't display the file output on GitHub due to size limitations'''

## *Question-6. Fetch the top 3 orders with the highest Profit.*

In [96]:
top_profitable_orders = collection.find().sort("Profit", -1).limit(3)

In [97]:
for order in top_profitable_orders:
    print(order)

{'_id': ObjectId('6798adc75a7a33d83efe5814'), 'Row ID': 6827, 'Order ID': 'CA-2016-118689', 'Order Date': '10/2/2016', 'Ship Date': '10/9/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'TC-20980', 'Customer Name': 'Tamara Chand', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Lafayette', 'State': 'Indiana', 'Postal Code': 47905, 'Region': 'Central', 'Product ID': 'TEC-CO-10004722', 'Category': 'Technology', 'Sub-Category': 'Copiers', 'Product Name': 'Canon imageCLASS 2200 Advanced Copier', 'Sales': 17499.95, 'Quantity': 5, 'Discount': 0.0, 'Profit': 8399.976}
{'_id': ObjectId('6798a6015a7a33d83efe3105'), 'Row ID': 6827, 'Order ID': 'CA-2016-118689', 'Order Date': '10/2/2016', 'Ship Date': '10/9/2016', 'Ship Mode': 'Standard Class', 'Customer ID': 'TC-20980', 'Customer Name': 'Tamara Chand', 'Segment': 'Corporate', 'Country': 'United States', 'City': 'Lafayette', 'State': 'Indiana', 'Postal Code': 47905, 'Region': 'Central', 'Product ID': 'TEC-CO-10004722', 'Category

## *Question- 7. Update all orders with Ship Mode as "First Class" to "Premium Class".*

In [98]:
updated_orders = collection.update_many({"Ship Mode": "First Class"}, {"$set": {"Ship Mode": "Premium Class"}})
print(f"Successfully updated {updated_orders.modified_count} orders.")

Successfully updated 1538 orders.


## *Question- 8. Delete all orders where Sales is less than 50.*

In [99]:
deleted_orders = collection.delete_many({"Sales": {"$lt": 50}})
print(f"Successfully deleted {deleted_orders.deleted_count} orders.")

Successfully deleted 19396 orders.


## *Question- 9. Use aggregation to group orders by Region and calculate total sales per region.*

In [111]:
grouped_orders = [
    {"$group": {"_id": "$Region", "Total Sales": {"$sum": "$Sales"}}}
]

In [112]:
results = collection.aggregate(grouped_orders)

In [113]:
for result in results:
    print(result)

{'_id': 'East', 'Total Sales': 2604550.82}
{'_id': 'West', 'Total Sales': 2778746.478}
{'_id': 'South', 'Total Sales': 1504093.248}
{'_id': 'Central', 'Total Sales': 1918447.3832}


## *Question- 10. Fetch all distinct values for Ship Mode from the collection.*

In [114]:
distinct_ship_modes = collection.distinct("Ship Mode")
print(distinct_ship_modes)

['Premium Class', 'Same Day', 'Second Class', 'Standard Class']


## *Question- 11. Count the number of orders for each category.*

In [136]:
order_count_by_category = [
    {"$group": {"_id": "$Category", "Order Count": {"$sum": 1}}}
]

In [137]:
results = collection.aggregate(order_count_by_category)

In [138]:
for result in results:
    # Print only the count, not the whole result document
    print(f"Category: {result['_id']}, Order Count: {result['Order Count']}")

Category: Office Supplies, Order Count: 8304
Category: Technology, Order Count: 5984
Category: Furniture, Order Count: 6292
