# MongoDB Basics with Python: Post Data Analysis in CMS

MongoDB is a popular NoSQL database that allows for flexible, schema-less storage of data. It is widely used in content management systems (CMS) due to its ability to handle diverse data types and scale easily. In this notebook, we will practice the basics of MongoDB CRUD operations using Python, covering essential concepts and practical examples for managing and analyzing post data within a CMS.

#### Setting up the connection to MongoDB from Python
First, we need to set up the connection to the MongoDB server.

In [1]:
from pymongo import MongoClient
from datetime import datetime

# Replace the connection string with your MongoDB instance details
client = MongoClient('mongodb://localhost:27017/')

# Get a handle to a database called 'cms_database' (it is only created on the first write)
db = client['cms_database']

# List available databases to verify the creation (the database will appear after inserting a document)
databases = client.list_database_names()
databases

['admin', 'config', 'conversations_db', 'local']

The `cms_database` database does not appear yet because it is empty: MongoDB creates a database only when the first document is inserted into it.

### Creating a collection and inserting a document

* Collection: Similar to a table in relational databases. It is a grouping of documents.
* Document: A record in MongoDB. Documents are stored in BSON (Binary JSON) format.

Let's create a collection for storing posts and insert a document.

In [2]:
# Create a collection called 'posts'
posts = db['posts']

# Insert a sample document
post = {
    "_id": "post1",
    "author": "Brad Pitt",
    "content": "I love Italian cuisine! Especially pasta.",
    "tags": ["food", "Italian cuisine"],
    "published_date": datetime(2024, 7, 18),
    "views": 1500,
    "rating": 4.5,
    "verified": True,
    "timestamp": datetime(2024, 7, 18, 12, 0, 0),
    "comments": [
        {
            "commenter": "Angelina Jolie",
            "comment": "Me too! Pasta is the best.",
            "comment_date": datetime(2024, 7, 18),
            "likes": 15,
            "approved": True
        },
        {
            "commenter": "Leonardo DiCaprio",
            "comment": "Try the carbonara, it's amazing!",
            "comment_date": datetime(2024, 7, 19),
            "likes": 20,
            "approved": False
        }
    ]
}

# Insert the post into the 'posts' collection
post_id = posts.insert_one(post).inserted_id

# Verify the insertion by listing available databases
databases = client.list_database_names()
databases

['admin', 'cms_database', 'config', 'conversations_db', 'local']

* Every document in MongoDB must have an `_id` field, which uniquely identifies the document within a collection. If we do not specify an `_id`, MongoDB will automatically generate an ObjectId. In this example, we explicitly set the _id to "post1".

* The `insert_one()` operation creates both the database `cms_database` and the collection `posts` if they do not already exist. The database now appears in the list because it contains a document.
 
* The `inserted_id` attribute returns the `_id` of the document that was just inserted, so we can reference this document later. Here it is the string "post1" that we supplied; when no `_id` is provided, it is the automatically generated ObjectId.
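
Because `_id` must be unique, running the cell above a second time raises a duplicate key error for "post1". One way around this while experimenting is to upsert instead of insert. The sketch below is a minimal example, assuming `collection` is a PyMongo collection such as `posts`; the helper name `upsert_post` is hypothetical and not part of PyMongo.

```python
def upsert_post(collection, post):
    """Insert `post`, or replace the stored document that has the same `_id`.

    `collection` is assumed to be a PyMongo collection such as `posts`.
    """
    if "_id" not in post:
        raise ValueError("upsert_post needs an explicit '_id' to match on")
    # With upsert=True, replace_one inserts the document when the filter
    # matches nothing and replaces the existing document otherwise.
    return collection.replace_one({"_id": post["_id"]}, post, upsert=True)

# Usage in this notebook (safe to re-run):
# upsert_post(posts, post)
```

This keeps the cell repeatable, at the cost of silently overwriting whatever was stored under that `_id`.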

#### Inserting many documents

We can also insert multiple documents at once using the `insert_many()` method.

In [3]:
# Create multiple documents to insert
new_posts = [
    {
        "author": "Angelina Jolie",
        "content": "Mediterranean food is not only healthy but also delicious!",
        "tags": ["food", "Mediterranean"],
        "published_date": datetime(2024, 7, 19),
        "views": 1200,
        "rating": 4.8,
        "verified": True,
        "timestamp": datetime(2024, 7, 19, 14, 30, 0),
        "comments": [
            {
                "commenter": "Brad Pitt",
                "comment": "Totally agree!",
                "comment_date": datetime(2024, 7, 20),
                "likes": 10,
                "approved": True
            }
        ]
    },
    {
        "author": "Leonardo DiCaprio",
        "content": "Sushi is my favorite comfort food.",
        "tags": ["food", "Japanese cuisine", "Sushi"],
        "published_date": datetime(2024, 7, 20),
        "views": 800,
        "rating": 4.3,
        "verified": False,
        "timestamp": datetime(2024, 7, 20, 16, 0, 0),
        "comments": [
            {
                "commenter": "Brad Pitt",
                "comment": "Sushi is great! Have you tried sashimi?",
                "comment_date": datetime(2024, 7, 21),
                "likes": 5,
                "approved": True
            }
        ]
    },
    {
        "author": "Natalie Portman",
        "content": "As a vegan, I love discovering new plant-based recipes.",
        "tags": ["food", "vegan"],
        "published_date": datetime(2024, 7, 21),
        "views": 1100,
        "rating": 4.5,
        "verified": True,
        "timestamp": datetime(2024, 7, 21, 10, 0, 0),
        "comments": [
            {
                "commenter": "Angelina Jolie",
                "comment": "I should try more vegan dishes!",
                "comment_date": datetime(2024, 7, 22),
                "likes": 8,
                "approved": False
            }
        ]
    }
]

# Insert the new posts into the 'posts' collection
inserted_ids = posts.insert_many(new_posts).inserted_ids

### Querying the collection

Now that we have inserted multiple documents, let's perform some queries to retrieve and manipulate data.

##### Retrieving a single document

To retrieve a single document from a MongoDB collection, we use the `find_one()` method.

  - **no criteria**: When called without parameters, `find_one()` returns the first document in the collection's natural order, which usually (but not necessarily) matches insertion order.
  - **with criteria**: When provided with a query (e.g., `find_one({"author": "Brad Pitt"})`), it returns the first document that matches the query.

In [4]:
# Retrieve a document in the 'posts' collection without any criteria
print("Retrieving a document without criteria:")
post = posts.find_one()
print(post)

# Retrieve a document with criteria
print("\nRetrieving a document with criteria (author: Angelina Jolie):")
post_with_criteria = posts.find_one({"author": "Angelina Jolie"})
print(post_with_criteria)

Retrieving a document without criteria:
{'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I love Italian cuisine! Especially pasta.', 'tags': ['food', 'Italian cuisine'], 'published_date': datetime.datetime(2024, 7, 18, 0, 0), 'views': 1500, 'rating': 4.5, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 18, 12, 0), 'comments': [{'commenter': 'Angelina Jolie', 'comment': 'Me too! Pasta is the best.', 'comment_date': datetime.datetime(2024, 7, 18, 0, 0), 'likes': 15, 'approved': True}, {'commenter': 'Leonardo DiCaprio', 'comment': "Try the carbonara, it's amazing!", 'comment_date': datetime.datetime(2024, 7, 19, 0, 0), 'likes': 20, 'approved': False}]}

Retrieving a document with criteria (author: Angelina Jolie):
{'_id': ObjectId('66a3c7c02c7227f672b7667a'), 'author': 'Angelina Jolie', 'content': 'Mediterranean food is not only healthy but also delicious!', 'tags': ['food', 'Mediterranean'], 'published_date': datetime.datetime(2024, 7, 19, 0, 0), 'views': 1200, 'rating': 4.8

* `{ }` in MongoDB querying - In MongoDB, curly braces `{ }` are used to define query conditions. Inside these braces, we specify the fields and the conditions that the documents must meet to be selected by the query.
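
One detail worth handling in real code: `find_one()` returns `None` when nothing matches, so indexing the result directly can fail. The following is a small sketch, assuming `collection` is a PyMongo collection such as `posts`; the helper names `find_post_by_author` and `describe` are hypothetical.

```python
def find_post_by_author(collection, author):
    """Return the first post written by `author`, or None if there is none."""
    return collection.find_one({"author": author})

def describe(post):
    """Build a short summary line, tolerating a missing post."""
    if post is None:
        return "no matching post"
    return f"{post['author']}: {post['views']} views"

# Usage in this notebook:
# print(describe(find_post_by_author(posts, "Brad Pitt")))
# print(describe(find_post_by_author(posts, "Unknown Author")))
```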

##### Retrieving multiple documents

To retrieve multiple documents that match certain criteria, use the `find()` method.

In [5]:
# Find all documents
all_posts = posts.find()

# Print the results
print("All posts:")
for post in all_posts:
    print("\n",post)

All posts:

 {'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I love Italian cuisine! Especially pasta.', 'tags': ['food', 'Italian cuisine'], 'published_date': datetime.datetime(2024, 7, 18, 0, 0), 'views': 1500, 'rating': 4.5, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 18, 12, 0), 'comments': [{'commenter': 'Angelina Jolie', 'comment': 'Me too! Pasta is the best.', 'comment_date': datetime.datetime(2024, 7, 18, 0, 0), 'likes': 15, 'approved': True}, {'commenter': 'Leonardo DiCaprio', 'comment': "Try the carbonara, it's amazing!", 'comment_date': datetime.datetime(2024, 7, 19, 0, 0), 'likes': 20, 'approved': False}]}

 {'_id': ObjectId('66a3c7c02c7227f672b7667a'), 'author': 'Angelina Jolie', 'content': 'Mediterranean food is not only healthy but also delicious!', 'tags': ['food', 'Mediterranean'], 'published_date': datetime.datetime(2024, 7, 19, 0, 0), 'views': 1200, 'rating': 4.8, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 19, 14, 30), 'comments': [{'

#### Filtering

In [6]:
# Retrieve multiple documents with a specific criterion
posts_with_high_views = posts.find({"views": {"$gt": 1100}})  # Find posts with more than 1100 views

# Print the results
print("Posts with high views:")
for post in posts_with_high_views:
    print("\n",post)

Posts with high views:

 {'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I love Italian cuisine! Especially pasta.', 'tags': ['food', 'Italian cuisine'], 'published_date': datetime.datetime(2024, 7, 18, 0, 0), 'views': 1500, 'rating': 4.5, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 18, 12, 0), 'comments': [{'commenter': 'Angelina Jolie', 'comment': 'Me too! Pasta is the best.', 'comment_date': datetime.datetime(2024, 7, 18, 0, 0), 'likes': 15, 'approved': True}, {'commenter': 'Leonardo DiCaprio', 'comment': "Try the carbonara, it's amazing!", 'comment_date': datetime.datetime(2024, 7, 19, 0, 0), 'likes': 20, 'approved': False}]}

 {'_id': ObjectId('66a3c7c02c7227f672b7667a'), 'author': 'Angelina Jolie', 'content': 'Mediterranean food is not only healthy but also delicious!', 'tags': ['food', 'Mediterranean'], 'published_date': datetime.datetime(2024, 7, 19, 0, 0), 'views': 1200, 'rating': 4.8, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 19, 14, 30), 'co

##### Common MongoDB filtering operators
We can use various query operators to filter the results based on specific conditions. An operator goes inside the inner curly braces `{ }` attached to a field, while the outer curly braces `{ }` define the overall query document.

1. `$gt` (greater than) - To find documents where a field's value is greater than a specified value. For example, to find documents where the `views` field is greater than 1000.
```json
{ "views": { "$gt": 1000 } }
```

2. `$lt` (less than) - To find documents where a field's value is less than a specified value. For example, to find documents where the `views` field is less than 500.
```json
{ "views": { "$lt": 500 } }
```

3. `$gte` (greater than or equal to) - To find documents where a field's value is greater than or equal to a specified value. For example, to find documents where the `published_date` is on or after July 1, 2024. The example uses the shell's `ISODate`; from Python, pass a `datetime` object instead.
```json
{ "published_date": { "$gte": ISODate("2024-07-01T00:00:00Z") } }
```

4. `$lte` (less than or equal to) - To find documents where a field's value is less than or equal to a specified value. For example, to find documents where the `rating` is less than or equal to 4.5.
```json
{ "rating": { "$lte": 4.5 } }
```

5. `$eq` (equal to) - To find documents where a field's value is equal to a specified value. For example, to find documents where the `author` is "Brad Pitt".
```json
{ "author": { "$eq": "Brad Pitt" } }
```
 - A more straightforward and common way to check equality is to specify the field’s value directly:
     ```json
    { "author": "Brad Pitt" }
    ```

6. `$ne` (not equal to) - To find documents where a field's value is not equal to a specified value. For example, to find documents where the `verified` field is not true.
```json
{ "verified": { "$ne": true } }
```

7. `$in` (in) - To find documents where a field's value matches any value in a specified array. For example, to find documents where the `tags` field contains either "food" or "Italian cuisine".
```json
{ "tags": { "$in": ["food", "Italian cuisine"] } }
```

8. `$nin` (not in) - To find documents where a field's value does not match any value in a specified array. For example, to find documents where the `tags` field does not contain "vegan".
```json
{ "tags": { "$nin": ["vegan"] } }
```

9. `$exists` (field existence) - To find documents where a specific field exists or does not exist. For example, to find documents where the `comments` field exists.
```json
{ "comments": { "$exists": true } }
```

10. `$regex` (regular expression) - To find documents where a field's value matches a regular expression pattern. For example, to find documents where the `content` field contains the word "pasta". The `$options: "i"` makes the search case-insensitive.
```json
{ "content": { "$regex": "pasta", "$options": "i" } }
```
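
To make the operator semantics concrete, here is a plain-Python sketch that evaluates a few of these filters against simplified copies of the sample posts. It is only a teaching model and not how MongoDB evaluates queries: the `matches` function is hypothetical, it assumes every queried field is present and scalar, and it ignores arrays, missing fields, and BSON type ordering.

```python
import operator

# Plain-Python stand-ins for some comparison operators (illustration only).
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}

def matches(doc, query):
    """Return True if every condition in `query` holds for `doc`."""
    for field, condition in query.items():
        value = doc[field]
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # {"author": "X"} is shorthand for $eq
        for op, operand in condition.items():
            if not OPERATORS[op](value, operand):
                return False
    return True

# Simplified copies of the four sample posts.
sample = [
    {"author": "Brad Pitt", "views": 1500},
    {"author": "Angelina Jolie", "views": 1200},
    {"author": "Leonardo DiCaprio", "views": 800},
    {"author": "Natalie Portman", "views": 1100},
]

high_views = [d["author"] for d in sample if matches(d, {"views": {"$gt": 1100}})]
mid_range = [d["author"] for d in sample
             if matches(d, {"views": {"$gte": 1000, "$lt": 1500}})]
print(high_views)  # ['Brad Pitt', 'Angelina Jolie']
print(mid_range)   # ['Angelina Jolie', 'Natalie Portman']
```

Note how several operators on the same field, as in `mid_range`, must all hold, which is how range filters are written.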

#### Query with array elements

In [7]:
# Retrieve documents with a specific tag
posts_with_food_tag = posts.find({"tags": "Mediterranean"})
print("Posts with 'Mediterranean' Tag:")
for post in posts_with_food_tag:
    print("\n",post)

# Retrieve documents with arrays that match exactly - with correct order
posts_with_exact_match_with_food_and_mediterranean = posts.find({"tags": ["food", "Mediterranean"]})
print("\nPosts with tags exact order as ['food', 'Mediterranean']:")
for post in posts_with_exact_match_with_food_and_mediterranean:
    print("\n", post)
    
# Retrieve documents with arrays that match exactly - with incorrect order
posts_with_exact_match_with_mediterranean_and_food = posts.find({"tags": ["Mediterranean", "food"]})
print("\nPosts with tags exact order as ['Mediterranean', 'food']:")
for post in posts_with_exact_match_with_mediterranean_and_food:
    print("\n", post)

# Retrieve documents where the tags field contains both "food" and "Italian cuisine"
posts_with_food_and_italian = posts.find({"tags": {"$all": ["food", "Italian cuisine"]}})
print("\nPosts containing both 'food' and 'Italian cuisine' tags:")
for post in posts_with_food_and_italian:
    print("\n",post)
    

# Retrieve documents where the tags field contains at least one of "Mediterranean" or "Japanese cuisine"
posts_with_mediterranean_or_japanese = posts.find({"tags": {"$in": ["Mediterranean", "Japanese cuisine"]}})
print("\nPosts containing at least one of 'Mediterranean' or 'Japanese cuisine' tags:")
for post in posts_with_mediterranean_or_japanese:
    print("\n",post)
    
# Retrieve documents where the tags array contains exactly 3 elements
posts_with_three_tags = posts.find({"tags": {"$size": 3}})
print("\nPosts with exactly 3 tags:")
for post in posts_with_three_tags:
    print("\n", post)
    
# Retrieve documents where the first element in the comments array has more than 10 likes
posts_with_first_comment_likes = posts.find({"comments.0.likes": {"$gt": 10}})
print("\nPosts where the first comment has more than 10 likes:")
for post in posts_with_first_comment_likes:
    print("\n", post)

Posts with 'Mediterranean' Tag:

 {'_id': ObjectId('66a3c7c02c7227f672b7667a'), 'author': 'Angelina Jolie', 'content': 'Mediterranean food is not only healthy but also delicious!', 'tags': ['food', 'Mediterranean'], 'published_date': datetime.datetime(2024, 7, 19, 0, 0), 'views': 1200, 'rating': 4.8, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 19, 14, 30), 'comments': [{'commenter': 'Brad Pitt', 'comment': 'Totally agree!', 'comment_date': datetime.datetime(2024, 7, 20, 0, 0), 'likes': 10, 'approved': True}]}

Posts with tags exact order as ['food', 'Mediterranean']:

 {'_id': ObjectId('66a3c7c02c7227f672b7667a'), 'author': 'Angelina Jolie', 'content': 'Mediterranean food is not only healthy but also delicious!', 'tags': ['food', 'Mediterranean'], 'published_date': datetime.datetime(2024, 7, 19, 0, 0), 'views': 1200, 'rating': 4.8, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 19, 14, 30), 'comments': [{'commenter': 'Brad Pitt', 'comment': 'Totally agree!', 'c

##### MongoDB array query operators

1. Equality query (`{ "field": value }`) - Matches documents where the specified array field contains the exact value. For example, to find documents where the `tags` field contains the tag "food".
```json
{ "tags": "food" }
```

2. Exact array match (`{ "field": [value1, value2] }`) - Matches documents where the array field matches the specified array exactly, including the order of elements. For example, to find documents where the `tags` field is exactly `["food", "Mediterranean"]`.
```json
{ "tags": ["food", "Mediterranean"] }
```

3. `$all` operator - Matches documents where the array field contains all of the specified values. This operator does not consider the order of the elements. For example, to find documents where the `tags` field contains both "food" and "Italian cuisine".
```json
{ "tags": { "$all": ["food", "Italian cuisine"] } }
```

4. `$in` operator - Matches documents where the array field contains any of the specified values. For example, to find documents where the `tags` field contains at least one of "food" or "Japanese cuisine".
```json
{ "tags": { "$in": ["food", "Japanese cuisine"] } }
```

5. `$nin` operator (not in) - Matches documents where the array field does not contain any of the specified values. For example, to find documents where the `tags` field does not contain "vegan" or "gluten-free".
```json
{ "tags": { "$nin": ["vegan", "gluten-free"] } }
```

6. `$size` operator - Matches documents where the array field has a specific number of elements. For example, to find documents where the `comments` array contains exactly 2 comments.
```json
{ "comments": { "$size": 2 } }
```

7. `$elemMatch` operator - Matches documents where at least one array element matches all specified criteria. For example, to find documents where the `comments` array contains an element where the `likes` field is greater than 10 and `approved` is true.
```json
{ "comments": { "$elemMatch": { "likes": { "$gt": 10 }, "approved": true } } }
```

8. `$exists` operator - Matches documents where the array field exists or does not exist. For example, to find documents where the `tags` field exists.
```json
{ "tags": { "$exists": true } }
```

9. `$type` operator - Matches documents where the array field is of a specific BSON type. For example, to find documents where the `tags` field is an array.
```json
{ "tags": { "$type": "array" } }
```

10. Query by the array index position - Matches documents where a specific element in the array matches the specified condition by its index position. For example, where the first element in the `comments` array (`comments.0`) has `likes` greater than 10. The index `0` refers to the first element in the array.

```json
{ "comments.0.likes": { "$gt": 10 } }
```
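
The difference between exact match, `$all`, `$in`, and `$size` can be mirrored with ordinary Python list and set operations. The sketch below is only an analogy for building intuition, with hypothetical helper names; it does not call MongoDB.

```python
def exact_match(tags, wanted):
    """Like {"tags": wanted}: same elements in the same order."""
    return tags == wanted

def contains_all(tags, wanted):
    """Like {"tags": {"$all": wanted}}: every wanted value is present, any order."""
    return set(wanted) <= set(tags)

def contains_any(tags, wanted):
    """Like {"tags": {"$in": wanted}}: at least one wanted value is present."""
    return bool(set(wanted) & set(tags))

def has_size(tags, n):
    """Like {"tags": {"$size": n}}: the array has exactly n elements."""
    return len(tags) == n

tags = ["food", "Mediterranean"]
print(exact_match(tags, ["Mediterranean", "food"]))              # False
print(contains_all(tags, ["Mediterranean", "food"]))             # True
print(contains_any(tags, ["Mediterranean", "Japanese cuisine"])) # True
print(has_size(tags, 3))                                         # False
```

The first two lines show why the reversed-order exact match earlier returned nothing, while `$all` with the same values would still match.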

### Using `AND` and `OR` conditions

In [8]:
# Implicit $and using `,` - Find documents where the author is 'Brad Pitt' and views are greater than 1100
query = {"author": "Brad Pitt", "views": {"$gt": 1100}}
results = posts.find(query)
print("Documents where the author is 'Brad Pitt' and views are greater than 1100 (implicit $and):")
for post in results:
    print("\n",post)
    
# Explicit $and - Find documents where the author is 'Brad Pitt' and views are greater than 1100
query = {"$and": [{"author": "Brad Pitt"}, {"views": {"$gt": 1100}}]}
results = posts.find(query)
print("\nDocuments where the author is 'Brad Pitt' and views are greater than 1100 (explicit $and):")
for post in results:
    print("\n",post)
    
# $or operator - Find documents where the author is 'Brad Pitt' or views are greater than 1100
query = {"$or": [{"author": "Brad Pitt"}, {"views": {"$gt": 1100}}]}
results = posts.find(query)
print("\nDocuments where the author is 'Brad Pitt' or views are greater than 1100:")
for post in results:
    print("\n",post)

Documents where the author is 'Brad Pitt' and views are greater than 1100 (implicit $and):

 {'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I love Italian cuisine! Especially pasta.', 'tags': ['food', 'Italian cuisine'], 'published_date': datetime.datetime(2024, 7, 18, 0, 0), 'views': 1500, 'rating': 4.5, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 18, 12, 0), 'comments': [{'commenter': 'Angelina Jolie', 'comment': 'Me too! Pasta is the best.', 'comment_date': datetime.datetime(2024, 7, 18, 0, 0), 'likes': 15, 'approved': True}, {'commenter': 'Leonardo DiCaprio', 'comment': "Try the carbonara, it's amazing!", 'comment_date': datetime.datetime(2024, 7, 19, 0, 0), 'likes': 20, 'approved': False}]}

Documents where the author is 'Brad Pitt' and views are greater than 1100 (explicit $and):

 {'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I love Italian cuisine! Especially pasta.', 'tags': ['food', 'Italian cuisine'], 'published_date': datetime.datetime(2024, 7, 18, 

##### Using the `$and` operator
The `$and` operator matches documents that satisfy all the specified conditions. We can specify `$and` in two ways. Both methods achieve the same result, but the explicit `$and` operator is useful when combining complex conditions.

1. **Implicit `$and` using `,`**: We simply list the conditions separated by commas within the curly braces `{}`. When we include multiple fields separated by `,` in a single query object, MongoDB implicitly performs an `$and` operation.
  ```python
  {"field1": value1, "field2": value2}
  ```
  
2. **Explicit `$and` using the `$and` operator**: We can explicitly use the `$and` operator to combine multiple conditions. We use the `$and` operator followed by an array `[]` that contains the conditions as separate objects.
  ```python
  {"$and": [{"field1": value1}, {"field2": value2}]}
  ```
  
##### Using the `$or` operator
The `$or` operator matches documents that satisfy at least one of the specified conditions. We use the `$or` operator followed by an array `[]` that contains the conditions as separate objects.
  ```python
  {"$or": [{"field1": value1}, {"field2": value2}]}
  ```
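
One case where the explicit `$and` form matters in Python: a dictionary cannot hold the same key twice, so two `$or` clauses cannot sit side by side in one query dict. The sketch below shows the problem and the explicit `$and` workaround; the two example conditions are illustrative and reuse fields from the sample posts.

```python
# Two independent $or clauses we would like to combine.
by_author = {"$or": [{"author": "Brad Pitt"}, {"author": "Angelina Jolie"}]}
by_reach = {"$or": [{"views": {"$gt": 1100}}, {"rating": {"$gte": 4.8}}]}

# Merging them into one dict loses the first clause,
# because the second "$or" key overwrites the first.
merged = {**by_author, **by_reach}
print(merged == by_reach)  # True: the author condition is gone

# Explicit $and keeps both clauses as separate array entries.
combined = {"$and": [by_author, by_reach]}
print(len(combined["$and"]))  # 2

# Usage in this notebook:
# for post in posts.find(combined):
#     print(post["author"])
```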

#### Query with nested/embedded documents

In [9]:
# Query by field in a nested document using $elemMatch - Retrieve documents with a specific commenter
posts_with_comment = posts.find({"comments": {"$elemMatch": {"commenter": "Angelina Jolie"}}})
print("Posts with comments by Angelina Jolie ($elemMatch operator):")
for post in posts_with_comment:
    print("\n", post)
    

# Query by field in a nested document using dot notation - Retrieve documents with a specific commenter
posts_with_comment = posts.find({"comments.commenter": "Angelina Jolie"})
print("\nPosts with comments by Angelina Jolie (dot notation):")
for post in posts_with_comment:
    print("\n", post)
    

####### Query by multiple fields ######
print("\nQuery by multiple fields:")
# Query by multiple fields in a nested document using $elemMatch - Retrieve documents where a commenter is "Angelina Jolie", 'approved' is True, and likes are greater than 10
posts_with_approved_angelina_comment_explicit = posts.find({
    "comments": {"$elemMatch": {"commenter": "Angelina Jolie", "approved": True, "likes": {"$gt": 10}}}
})
print("\nPosts with approved comments with likes > 10 by Angelina Jolie ($elemMatch operator):")
for post in posts_with_approved_angelina_comment_explicit:
    print("\n", post)
    

# Query by multiple fields in a nested document using dot notation - Retrieve documents where a commenter is "Angelina Jolie", 'approved' is True, and likes are greater than 10
posts_with_approved_angelina_comment = posts.find({
    "comments.commenter": "Angelina Jolie",
    "comments.approved": True,
    "comments.likes": {"$gt": 10}
})
print("\nPosts with approved comments with likes > 10 by Angelina Jolie (dot notation):")
for post in posts_with_approved_angelina_comment:
    print("\n",post)

Posts with comments by Angelina Jolie ($elemMatch operator):

 {'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I love Italian cuisine! Especially pasta.', 'tags': ['food', 'Italian cuisine'], 'published_date': datetime.datetime(2024, 7, 18, 0, 0), 'views': 1500, 'rating': 4.5, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 18, 12, 0), 'comments': [{'commenter': 'Angelina Jolie', 'comment': 'Me too! Pasta is the best.', 'comment_date': datetime.datetime(2024, 7, 18, 0, 0), 'likes': 15, 'approved': True}, {'commenter': 'Leonardo DiCaprio', 'comment': "Try the carbonara, it's amazing!", 'comment_date': datetime.datetime(2024, 7, 19, 0, 0), 'likes': 20, 'approved': False}]}

 {'_id': ObjectId('66a3c7c02c7227f672b7667c'), 'author': 'Natalie Portman', 'content': 'As a vegan, I love discovering new plant-based recipes.', 'tags': ['food', 'vegan'], 'published_date': datetime.datetime(2024, 7, 21, 0, 0), 'views': 1100, 'rating': 4.5, 'verified': True, 'timestamp': datetime.dateti

MongoDB provides two main ways to query these nested fields: the `$elemMatch` operator and dot notation.

##### Querying using `$elemMatch`
The `$elemMatch` operator matches documents that contain an array with at least one element that satisfies all the specified conditions at once.

  ```python
  {"field": {"$elemMatch": {"subfield": value}}}
  ```

##### Querying with dot notation
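Dot notation allows us to access and query fields within nested documents by specifying the path to the field. When several dot-notation conditions target the same array of documents, each condition may be satisfied by a different element of the array.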
  ```python
  { "field.subfield": value }
  ```
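
The two styles returned the same posts above, but they are not equivalent when several conditions are involved: `$elemMatch` requires one array element to meet all conditions, whereas separate dot-notation conditions can each be met by a different element. The plain-Python sketch below models this difference on a hypothetical `comments` array (not one of the sample posts); the function names are illustrative.

```python
def dot_notation_style(comments):
    """Each condition may be satisfied by a different comment."""
    return (
        any(c["commenter"] == "Angelina Jolie" for c in comments)
        and any(c["approved"] is True for c in comments)
        and any(c["likes"] > 10 for c in comments)
    )

def elem_match_style(comments):
    """A single comment must satisfy all three conditions."""
    return any(
        c["commenter"] == "Angelina Jolie" and c["approved"] is True and c["likes"] > 10
        for c in comments
    )

# Hypothetical post: Angelina's comment is unapproved with few likes,
# while someone else's comment is approved and popular.
hypothetical_comments = [
    {"commenter": "Angelina Jolie", "likes": 8, "approved": False},
    {"commenter": "Brad Pitt", "likes": 20, "approved": True},
]

print(dot_notation_style(hypothetical_comments))  # True
print(elem_match_style(hypothetical_comments))    # False
```

For a question like "approved comments by Angelina Jolie with more than 10 likes", `$elemMatch` is therefore the form that expresses the intent.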

#### Query with date range

In [10]:
# Retrieve documents published from July 20, 2024 up to (but not including) August 1, 2024
posts_in_july = posts.find({"published_date": {"$gte": datetime(2024, 7, 20), "$lt": datetime(2024, 8, 1)}})

# Print the results
print("Posts Published between July 20th to August 1st 2024:")
for post in posts_in_july:
    print("\n",post)

Posts Published between July 20th to August 1st 2024:

 {'_id': ObjectId('66a3c7c02c7227f672b7667b'), 'author': 'Leonardo DiCaprio', 'content': 'Sushi is my favorite comfort food.', 'tags': ['food', 'Japanese cuisine', 'Sushi'], 'published_date': datetime.datetime(2024, 7, 20, 0, 0), 'views': 800, 'rating': 4.3, 'verified': False, 'timestamp': datetime.datetime(2024, 7, 20, 16, 0), 'comments': [{'commenter': 'Brad Pitt', 'comment': 'Sushi is great! Have you tried sashimi?', 'comment_date': datetime.datetime(2024, 7, 21, 0, 0), 'likes': 5, 'approved': True}]}

 {'_id': ObjectId('66a3c7c02c7227f672b7667c'), 'author': 'Natalie Portman', 'content': 'As a vegan, I love discovering new plant-based recipes.', 'tags': ['food', 'vegan'], 'published_date': datetime.datetime(2024, 7, 21, 0, 0), 'views': 1100, 'rating': 4.5, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 21, 10, 0), 'comments': [{'commenter': 'Angelina Jolie', 'comment': 'I should try more vegan dishes!', 'comment_date'
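
To query a whole calendar month rather than a hand-picked range, the bounds can be computed. This is a minimal sketch using only the standard library; the helper name `month_range` is hypothetical. It uses a half-open interval (`$gte` start, `$lt` start of the next month) so that no timestamp on the last day is missed.

```python
from datetime import datetime

def month_range(year, month):
    """Return a half-open date filter covering one calendar month."""
    start = datetime(year, month, 1)
    # Roll over to January of the next year after December.
    end = datetime(year + 1, 1, 1) if month == 12 else datetime(year, month + 1, 1)
    return {"$gte": start, "$lt": end}

july_2024 = {"published_date": month_range(2024, 7)}
print(july_2024)

# Usage in this notebook:
# for post in posts.find(july_2024):
#     print(post["author"], post["published_date"])
```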

### Projection

Projection specifies which fields should be included or excluded from the query results.

* Include fields: By default, MongoDB returns all fields in the documents that match the query. Using projection, we can choose to include only specific fields.

* Exclude fields: We can also exclude specific fields from the query results, in which case every field not named in the projection is still returned.

* Limit fields: Projection allows us to limit the amount of data retrieved by including only the necessary fields, which can improve query performance.

In [11]:
# Including specific fields - Retrieve only the author and content fields
projected_posts = posts.find({}, {"author": 1, "content": 1})
print("Posts with only 'author' and 'content' fields:")
for post in projected_posts:
    print(post)
    
# Excluding specific fields - Retrieve all fields except comments and _id
no_comments = posts.find({}, {"comments": 0, "_id": 0})
print("\nPosts without comments and _id field:")
for post in no_comments:
    print(post)
    
# Retrieve documents where only author and views fields are included, and exclude _id
author_views = posts.find({}, {"author": 1, "views": 1, "_id": 0})
print("\nAuthors and views of all posts (without _id):")
for post in author_views:
    print(post)
    
# Projection with filtering - Retrieve documents with 'Mediterranean' tag and include only 'author' and 'views'
filtered_projection = posts.find({"tags": "Mediterranean"}, {"author": 1, "views": 1, "_id": 0})
print("\nPosts with 'Mediterranean' tag showing only 'author' and 'views':")
for post in filtered_projection:
    print(post)
    
# Projection with nested fields - Retrieve only 'comments.commenter' from documents
nested_projection = posts.find({}, {"comments.commenter": 1, "_id": 0})
print("\nPosts with only 'comments.commenter':")
for post in nested_projection:
    print(post)

Posts with only 'author' and 'content' fields:
{'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I love Italian cuisine! Especially pasta.'}
{'_id': ObjectId('66a3c7c02c7227f672b7667a'), 'author': 'Angelina Jolie', 'content': 'Mediterranean food is not only healthy but also delicious!'}
{'_id': ObjectId('66a3c7c02c7227f672b7667b'), 'author': 'Leonardo DiCaprio', 'content': 'Sushi is my favorite comfort food.'}
{'_id': ObjectId('66a3c7c02c7227f672b7667c'), 'author': 'Natalie Portman', 'content': 'As a vegan, I love discovering new plant-based recipes.'}

Posts without comments and _id field:
{'author': 'Brad Pitt', 'content': 'I love Italian cuisine! Especially pasta.', 'tags': ['food', 'Italian cuisine'], 'published_date': datetime.datetime(2024, 7, 18, 0, 0), 'views': 1500, 'rating': 4.5, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 18, 12, 0)}
{'author': 'Angelina Jolie', 'content': 'Mediterranean food is not only healthy but also delicious!', 'tags': ['food', 'Mediter

##### Projection concepts

1. **Default Behavior**: By default, if no projection is specified, MongoDB returns all fields in the documents.

2. **Inclusion projection** `{ "field1": 1, "field2": 1 }`: Specifies which fields should be included in the query results. For example: To retrieve only the `author` field from all documents:
 ```json
 { "author": 1 }
 ```
   - **Note**: If we include one field, MongoDB will include the `_id` field by default unless explicitly excluded.

3. **Exclusion projection** `{ "field1": 0, "field2": 0 }`: Specifies which fields should be excluded from the query results. For example: To exclude the `comments` field from all documents:
 ```json
 { "comments": 0 }
 ```
    - **Note**: The `_id` field is still returned unless it is explicitly excluded with `"_id": 0`.

4. **Mixed projection** `{ "field1": 1, "_id": 0 }`: Combines inclusion and exclusion. In an inclusion projection, `_id` is the only field that can be excluded; other mixes of `1` and `0` are not allowed. For example, to include `author` and `views`, and exclude `_id`:
 ```json
 { "author": 1, "views": 1, "_id": 0 }
 ```

5. **Projection with nested fields** `{ "nestedField.subField": 1 }`: To project fields within nested documents, use dot notation. For example: To include only `comments.commenter` in each document:
 ```json
 { "comments.commenter": 1 }
 ```
 
6. **Projection with filtering**: We can use projection in combination with filtering to return specific fields for documents that match certain criteria. For example, to filter posts with the `Mediterranean` tag and project only the `author` and `content` fields:
 ```python
 posts.find({"tags": "Mediterranean"}, {"author": 1, "content": 1, "_id": 0})
 ```
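
Since most projections in this notebook are "these fields only, without `_id`", a small builder can reduce repetition. The sketch below is one possible helper; the name `include_only` and its `keep_id` parameter are hypothetical.

```python
def include_only(*fields, keep_id=False):
    """Build an inclusion projection for the given field names."""
    if not fields:
        raise ValueError("name at least one field to include")
    projection = {field: 1 for field in fields}
    if not keep_id:
        # _id is the one field that may be excluded in an inclusion projection.
        projection["_id"] = 0
    return projection

print(include_only("author", "views"))
# {'author': 1, 'views': 1, '_id': 0}

# Usage in this notebook:
# for post in posts.find({"tags": "Mediterranean"}, include_only("author", "views")):
#     print(post)
```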
 
#### Limiting and skipping

In [12]:
# Limit: Retrieve only the first 2 documents
limited_posts = posts.find({}, {"author": 1, "comments.commenter": 1, "_id": 0}).limit(2)
print("First 2 posts:")
for post in limited_posts:
    print(post)

# Skip: Skip the first 2 documents and retrieve the rest (2 documents remain in this collection)
skipped_posts = posts.find({}, {"author": 1, "comments.commenter": 1, "_id": 0}).skip(2)
print("\nPosts after skipping the first 2, showing next 2:")
for post in skipped_posts:
    print(post)

# Limit and Skip Together: Skip the first 2 documents and limit the result to 1 document
combined_posts = posts.find({}, {"author": 1, "comments.commenter": 1, "_id": 0}).skip(2).limit(1)
print("\nCombined: Skipping the first 2 and limiting to 1 more post:")
for post in combined_posts:
    print(post)

First 2 posts:
{'author': 'Brad Pitt', 'comments': [{'commenter': 'Angelina Jolie'}, {'commenter': 'Leonardo DiCaprio'}]}
{'author': 'Angelina Jolie', 'comments': [{'commenter': 'Brad Pitt'}]}

Posts after skipping the first 2, showing next 2:
{'author': 'Leonardo DiCaprio', 'comments': [{'commenter': 'Brad Pitt'}]}
{'author': 'Natalie Portman', 'comments': [{'commenter': 'Angelina Jolie'}]}

Combined: Skipping the first 2 and limiting to 1 more post:
{'author': 'Leonardo DiCaprio', 'comments': [{'commenter': 'Brad Pitt'}]}


1. **Limiting results (`limit(n)`)**: The `limit()` method restricts the number of documents returned by a query to a specified number. This is useful for controlling the size of the result set and for implementing pagination.

2. **Skipping results (`skip(n)`)**: The `skip()` method skips a specified number of documents in the result set before returning the remaining documents. This is often used in conjunction with `limit` to implement pagination, where we want to skip a number of documents based on the current page.

3. **Combining limit and skip**: We can combine `skip()` and `limit()` to paginate through a dataset. For example, `skip(2).limit(1)` skips the first two documents and returns only the next one.
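
Page-based pagination can be sketched as a pair of small helpers. The names `page_to_skip_limit` and `get_page` are hypothetical, and sorting by `_id` is an assumption made here so that pages come back in a stable order.

```python
def page_to_skip_limit(page, page_size):
    """Translate a 1-based page number into (skip, limit) values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")
    return (page - 1) * page_size, page_size


def get_page(collection, page, page_size, projection=None):
    """Return one page of documents as a list (hypothetical helper).

    `collection` is expected to behave like a PyMongo collection.
    Sorting by `_id` is an assumption; any consistently ordered field would do.
    """
    skip, limit = page_to_skip_limit(page, page_size)
    cursor = collection.find({}, projection).sort("_id", 1).skip(skip).limit(limit)
    return list(cursor)


print(page_to_skip_limit(1, 2))  # first page: skip nothing
print(page_to_skip_limit(3, 2))  # third page: skip two full pages
```

With the `posts` collection from this notebook, `get_page(posts, 2, 2, {"author": 1, "_id": 0})` would request the second page of two posts.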

#### Sorting

In [13]:
# Sort by 'published_date' in ascending order
sorted_by_date = posts.find({}, {"author": 1, "views": 1, "comments.likes": 1, "_id": 0}).sort("published_date", 1)
print("Posts sorted by 'published_date' (ascending):")
for post in sorted_by_date:
    print(post)
    
# Sort by 'published_date' in descending order
sorted_by_date_desc = posts.find({}, {"author": 1, "views": 1, "comments.likes": 1, "_id": 0}).sort("published_date", -1)
print("\nPosts sorted by 'published_date' (descending):")
for post in sorted_by_date_desc:
    print(post)
    
# Sort by 'rating' (ascending) and then by 'views' (descending)
sorted_by_rating_and_views = posts.find({}, {"author": 1, "published_date": 1, "comments.likes": 1, "_id": 0}).sort([("rating", 1), ("views", -1)])
print("\nPosts sorted by 'rating' (ascending) and then by 'views' (descending):")
for post in sorted_by_rating_and_views:
    print(post)
    
# Sort by 'comments.comment_date' (ascending) - Sorting by a nested field's date
sorted_by_comment_date = posts.find({}, {"author": 1, "views": 1, "comments.likes": 1, "_id": 0}).sort("comments.comment_date", 1)
print("\nPosts sorted by 'comments.comment_date' (ascending):")
for post in sorted_by_comment_date:
    print(post)

Posts sorted by 'published_date' (ascending):
{'author': 'Brad Pitt', 'views': 1500, 'comments': [{'likes': 15}, {'likes': 20}]}
{'author': 'Angelina Jolie', 'views': 1200, 'comments': [{'likes': 10}]}
{'author': 'Leonardo DiCaprio', 'views': 800, 'comments': [{'likes': 5}]}
{'author': 'Natalie Portman', 'views': 1100, 'comments': [{'likes': 8}]}

Posts sorted by 'published_date' (descending):
{'author': 'Natalie Portman', 'views': 1100, 'comments': [{'likes': 8}]}
{'author': 'Leonardo DiCaprio', 'views': 800, 'comments': [{'likes': 5}]}
{'author': 'Angelina Jolie', 'views': 1200, 'comments': [{'likes': 10}]}
{'author': 'Brad Pitt', 'views': 1500, 'comments': [{'likes': 15}, {'likes': 20}]}

Posts sorted by 'rating' (ascending) and then by 'views' (descending):
{'author': 'Leonardo DiCaprio', 'published_date': datetime.datetime(2024, 7, 20, 0, 0), 'comments': [{'likes': 5}]}
{'author': 'Brad Pitt', 'published_date': datetime.datetime(2024, 7, 18, 0, 0), 'comments': [{'likes': 15}, {'li

1. **Sorting order**:
   - **Ascending order `sort("field_name", 1)`**: Sorts documents in ascending order based on the specified field. For example, sorting by `published_date` in ascending order will show the earliest dates first.
     ```python
     posts.find().sort("published_date", 1)
     ```
   - **Descending order `sort("field_name", -1)`**: Sorts documents in descending order based on the specified field. For example, sorting by `published_date` in descending order will show the latest dates first.
     ```python
     posts.find().sort("published_date", -1)
     ```

2. **Multiple fields sorting `sort([("field_name1", sort_order1), ("field_name2", sort_order2)])`**: When sorting by multiple fields, MongoDB first sorts by the first field. If there are documents with the same value for the first field, it then sorts by the second field, and so on. For example, sort posts by `rating` (ascending) and then by `views` (descending):
     ```python
     posts.find().sort([("rating", 1), ("views", -1)])
     ```
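
To build intuition for how a multi-field sort breaks ties, the same ordering can be reproduced on plain Python dictionaries. The sample values below are made up for illustration and are not the notebook's data. In MongoDB the sort runs on the server; this is only a local analogy for numeric fields.

```python
# Hypothetical sample data (not the notebook's documents)
docs = [
    {"author": "A", "rating": 4, "views": 800},
    {"author": "B", "rating": 5, "views": 1500},
    {"author": "C", "rating": 4, "views": 1200},
]

# Local analogy of sort([("rating", 1), ("views", -1)]):
# rating ascending first, then views descending (negated) to break ties.
ordered = sorted(docs, key=lambda d: (d["rating"], -d["views"]))

for d in ordered:
    print(d["author"], d["rating"], d["views"])
```

Authors A and C share the same rating, so their relative order is decided by `views`, with the higher value first.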
     
### Updating documents

We can update documents using the `update_one`, `update_many`, and `replace_one` methods.

#### Update one document

In [14]:
################################# Update a single document and a single field ##################################
# Find and display the document to be updated
document_to_update = posts.find_one({"author": "Brad Pitt"},{"author": 1, "content": 1})
print("Document to update (author 'Brad Pitt'):")
print(document_to_update)
# Update a single document - update the content of a post by Brad Pitt
update_result = posts.update_one(
    {"author": "Brad Pitt"},
    {"$set": {"content": "I love all kinds of cuisine! Especially pasta."}}
)
print(f"Matched {update_result.matched_count} document(s) and modified {update_result.modified_count} document(s).")
# Verify the update
verification = posts.find_one({"author": "Brad Pitt"},{"author": 1, "content": 1})
print("\nVerification of update (content should be updated):")
print(verification)

############################## Update a single document and multiple fields ##################################
# Find and display the document to be updated
document_to_update = posts.find_one({"author": "Brad Pitt"}, {"author": 1, "content": 1, "views": 1, "_id": 0})
print("Document to update (author 'Brad Pitt'):")
print(document_to_update)
# Update a single document - modify multiple fields
update_result = posts.update_one(
    {"author": "Brad Pitt"},
    {"$set": {"content": "I love all kinds of cuisine! Especially Pizza.", "views": 2000}}
)
print(f"Matched {update_result.matched_count} document(s) and modified {update_result.modified_count} document(s).")
# Verify the update
verification = posts.find_one({"author": "Brad Pitt"}, {"author": 1, "content": 1, "views": 1, "_id": 0})
print("\nVerification of update (content and views should be updated):")
print(verification)

######################################## Update a nested field ########################################
# Find and display the document to be updated (showing the nested field)
document_to_update = posts.find_one({"comments.commenter": "Angelina Jolie"}, {"author": 1, "comments.$": 1, "_id": 0})
print("Document to update (commenter 'Angelina Jolie'):")
print(document_to_update)
# Update a nested field - change the likes of a specific comment
update_result = posts.update_one(
    {"comments.commenter": "Angelina Jolie"},
    {"$set": {"comments.$.likes": 25}}
)
print(f"Matched {update_result.matched_count} document(s) and modified {update_result.modified_count} document(s).")
# Verify the update
verification = posts.find_one({"comments.commenter": "Angelina Jolie"}, {"author": 1, "comments.$": 1, "_id": 0})
print("\nVerification of update (likes should be 25):")
print(verification)

Document to update (author 'Brad Pitt'):
{'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I love Italian cuisine! Especially pasta.'}
Matched 1 document(s) and modified 1 document(s).

Verification of update (content should be updated):
{'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I love all kinds of cuisine! Especially pasta.'}
Document to update (author 'Brad Pitt'):
{'author': 'Brad Pitt', 'content': 'I love all kinds of cuisine! Especially pasta.', 'views': 1500}
Matched 1 document(s) and modified 1 document(s).

Verification of update (content and views should be updated):
{'author': 'Brad Pitt', 'content': 'I love all kinds of cuisine! Especially Pizza.', 'views': 2000}
Document to update (commenter 'Angelina Jolie'):
{'author': 'Brad Pitt', 'comments': [{'commenter': 'Angelina Jolie', 'comment': 'Me too! Pasta is the best.', 'comment_date': datetime.datetime(2024, 7, 18, 0, 0), 'likes': 15, 'approved': True}]}
Matched 1 document(s) and modified 1 document(s).

Verification of u

##### Explanation
The `update_one` method is used to update a single document that matches a specified filter. The syntax of the `update_one` method:
```python
update_one(filter, update, upsert=False)
```

- **`filter`**: The query criteria that select the document to update, given as a dictionary of conditions. It can use query operators (e.g., `$eq`, `$gt`, `$lt`). If several documents match, `update_one` updates only the first one found. For example, to find the document where the `author` is "Brad Pitt":
```python
{"author": "Brad Pitt"}
```
- **`update`**: The update operations to be applied to the matched document(s). This is a dictionary specifying the respective update operators and the modifications. For example, update the `content` field of a post by Brad Pitt using the `$set` operation:
    ```python
     {"$set": {"content": "I love all kinds of cuisine! Especially pasta."}}
    ```

    - **Updating multiple fields**: Multiple fields can be updated in a single `update_one` operation. For example, update both the `content` and `views` fields of a post by Brad Pitt:
        ```python
        {"$set": {"content": "I love all kinds of cuisine! Especially Pizza.", "views": 2000}}
        ```
    
    - **Update of a nested field**: Fields inside array elements can be updated in an `update_one` operation. The positional operator (`$`) acts as a placeholder for the index of the first array element that matches the filter condition. For example, update the `likes` field of a comment made by "Angelina Jolie" within the `comments` array:
        ```python
        {"$set": {"comments.$.likes": 25}}
        ```
        The `$` operator in `{"comments.$.likes": 25}` stands for the index of the first element in the `comments` array that matches the filter, so MongoDB sets the `likes` field of that comment to 25. If several elements match, only the first one is updated. To update every matching element, or elements inside nested arrays, MongoDB provides the all-positional (`$[]`) and filtered positional (`$[<identifier>]`) operators; the latter is used together with the `array_filters` argument of `update_one`.

- **`upsert`** (optional): A boolean value that, if set to `True`, inserts a new document when no matching document is found. The default is `False`. The inserted document combines the equality conditions from the filter with the fields set by the update. For example, if there is no document with `author` equal to "Leonardo DiCaprio", the following operation inserts a new document with that author and the specified content:
```python
posts.update_one(
    {"author": "Leonardo DiCaprio"},
    {"$set": {"content": "I love all kinds of cuisine! Especially seafood."}},
    upsert=True
)
```
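
When `upsert=True` is used, the counts alone do not reveal whether a document was inserted, because an upsert reports zero matched documents. The result object also exposes `upserted_id`, which is `None` unless an insert happened. The helper below is a minimal sketch: `describe_update` is a hypothetical name, and `SimpleNamespace` with the made-up `_id` value `"post5"` stands in for a real result so the example runs without a server.

```python
from types import SimpleNamespace


def describe_update(result):
    """Summarize an UpdateResult-like object (hypothetical helper)."""
    if result.upserted_id is not None:
        return f"Inserted new document with _id={result.upserted_id!r}"
    if result.matched_count == 0:
        return "No document matched; nothing changed"
    if result.modified_count == 0:
        return "Matched, but the values were already up to date"
    return f"Modified {result.modified_count} of {result.matched_count} matched document(s)"


# Stand-in for the result of an upsert that inserted a document
fake_result = SimpleNamespace(matched_count=0, modified_count=0, upserted_id="post5")
print(describe_update(fake_result))
```

In the notebook, the same function could be called on the value returned by `posts.update_one(...)` or `posts.replace_one(...)`.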


##### **Update operators**:
- **`$set`**: Updates the value of a field in the document. If the field does not exist, it will be created.
```python
{"$set": {"field_name": "new_value"}}
```
- **`$inc`**: Increments the value of a field by a specified amount. This operator is used for numeric fields.
```python
{"$inc": {"field_name": 1}}
```
- **`$unset`**: Removes a field from the document.
```python
{"$unset": {"field_name": ""}}
```
- **`$push`**: Adds an element to an array field. If the field does not exist, it will be created.
```python
{"$push": {"array_field": "new_element"}}
```
- **`$pull`**: Removes all elements from an array field that match a specified value or condition.
```python
{"$pull": {"array_field": "element_to_remove"}}
```
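
The effect of these operators can be illustrated on a plain Python dictionary. This is a simplified local simulation, not how MongoDB implements updates: the hypothetical `apply_update` function handles only top-level fields and exact-value `$pull`, and the `status` value is made up for the example.

```python
import copy


def apply_update(doc, update):
    """Apply a small subset of update operators to a copy of a plain dict."""
    result = copy.deepcopy(doc)
    for field, value in update.get("$set", {}).items():
        result[field] = value  # create or overwrite the field
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount  # a missing field starts at 0
    for field in update.get("$unset", {}):
        result.pop(field, None)  # remove the field if present
    for field, element in update.get("$push", {}).items():
        result.setdefault(field, []).append(element)  # create the array if needed
    for field, element in update.get("$pull", {}).items():
        if field in result:
            result[field] = [x for x in result[field] if x != element]
    return result


post = {"author": "Brad Pitt", "views": 1500, "tags": ["food", "Italian cuisine"], "verified": True}
updated = apply_update(post, {
    "$set": {"status": "featured"},
    "$inc": {"views": 1},
    "$unset": {"verified": ""},
    "$push": {"tags": "pasta"},
})
print(updated)
```

Several operators can appear in one update document, as long as they do not target the same field.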

#### Update multiple documents

In [15]:
# Update multiple documents and a single field - set the 'status' field to 'archived' for documents with views less than 1000
update_result = posts.update_many(
    {"views": {"$lt": 1000}}, 
    {"$set": {"status": "archived"}}
)
print(f"Matched {update_result.matched_count} document(s) and modified {update_result.modified_count} document(s).")


# Update multiple documents and multiple fields - update the rating to 4 and increment the views by 500 for all documents where the views are more than 1000
update_result = posts.update_many(
    {"views": {"$gt": 1000}},
    {"$set": {"rating": 4}, "$inc": {"views": 500}}
)
print(f"Matched {update_result.matched_count} document(s) and modified {update_result.modified_count} document(s).")


# Update multiple documents - update nested fields
update_result = posts.update_many(
    {"comments.commenter": "Brad Pitt"},
    {
        "$set": {"comments.$.likes": 25, "status": "updated"},  # Update likes and status
        "$unset": {"comments.$.approved": ""},  # Remove unwanted field
        "$inc": {"views": 100}  # Increment views by 100
    }
)
print(f"Matched {update_result.matched_count} document(s) and modified {update_result.modified_count} document(s).")

Matched 1 document(s) and modified 1 document(s).
Matched 3 document(s) and modified 3 document(s).
Matched 2 document(s) and modified 2 document(s).


#### Replace one document

In [16]:
# Replace a single document - replace the document where author is 'Brad Pitt' with a new document
replacement_doc = {
    "author": "Brad Pitt",
    "content": "I also love Indian food",
    "views": 5000,
    "status": "replaced"
}

replace_result = posts.replace_one(
    {"author": "Brad Pitt"},
    replacement_doc
)
print(f"Matched {replace_result.matched_count} document(s) and replaced {replace_result.modified_count} document(s).")


# Replace a single document or insert a new one if no document matches (with upsert)
replacement_doc = {
    "author": "Leonardo DiCaprio",
    "content": "I also love Shakshuka",
    "views": 3000,
    "status": "new"
}

replace_result = posts.replace_one(
    {"author": "Leonardo DiCaprio", "views": 20},
    replacement_doc,
    upsert=True
)
print(f"Matched {replace_result.matched_count} document(s) and replaced {replace_result.modified_count} document(s).")

Matched 1 document(s) and replaced 1 document(s).
Matched 0 document(s) and replaced 0 document(s).


##### Explanation
The `replace_one` method is used to replace a single document that matches a specified filter with a new document. This method is different from `update_one` in that it replaces the entire document rather than just modifying specific fields. The syntax of the `replace_one` method:
```python
replace_one(filter, replacement, upsert=False)
```

- **`replacement`**: The new document that will replace the matched document. This document completely replaces the existing document.
- **`upsert`** (optional): If `True`, it inserts a new document if no matching document is found. The default is `False`.
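
The difference between modifying fields and replacing a document can be shown with plain dictionaries. This is a local sketch of the resulting document shapes, not a call to MongoDB; the values are taken from the examples in this notebook.

```python
existing = {
    "_id": "post1",
    "author": "Brad Pitt",
    "tags": ["food", "Italian cuisine"],
    "views": 1500,
}
replacement = {"author": "Brad Pitt", "content": "I also love Indian food", "views": 5000}

# update_one with $set: the new fields are merged into the existing document
after_update = {**existing, **replacement}

# replace_one: only _id survives, everything else comes from the replacement
after_replace = {"_id": existing["_id"], **replacement}

print("tags" in after_update)
print("tags" in after_replace)
```

After a replacement, fields such as `tags` that are missing from the new document are gone, while `_id` stays the same.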

### Deleting documents

We can delete documents using the `delete_one` or `delete_many` methods.

#### Delete documents

In [17]:
################### Delete one document ###################
# Find and display the document to be deleted
document_to_delete = posts.find_one({"author": "Brad Pitt"})
print("Document to delete (author 'Brad Pitt'):")
print(document_to_delete)
# Delete a single document by a specific criterion
delete_result = posts.delete_one({"author": "Brad Pitt"})
print(f"\nDeleted {delete_result.deleted_count} document(s) with author 'Brad Pitt'.")
# Verify deletion
verification = posts.find_one({"author": "Brad Pitt"})
print("\nVerification of deletion (should be None):")
print(verification)


########### Delete multiple documents matching a criterion ###########
# Find and display documents to be deleted
documents_to_delete = posts.find({"views": {"$gt": 1000}})
print("\nDocuments to delete (views > 1000):")
for doc in documents_to_delete:
    print(doc)
# Delete multiple documents matching a criterion
delete_result = posts.delete_many({"views": {"$gt": 1000}})
print(f"\nDeleted {delete_result.deleted_count} document(s) with views greater than 1000.")
# Verify deletion
verification = posts.find({"views": {"$gt": 1000}})
print("\nVerification of deletion (should be empty):")
for doc in verification:
    print(doc)
    

################### Delete all documents ###################
# Find and display all documents
all_documents = posts.find()
print("\nAll documents before deletion:")
for doc in all_documents:
    print(doc)
# Delete all documents in the collection
delete_result = posts.delete_many({})
print(f"\nDeleted {delete_result.deleted_count} document(s) from the collection.")
# Verify deletion
verification = posts.count_documents({})
print("\nVerification of deletion (should be zero):")
print(verification)

Document to delete (author 'Brad Pitt'):
{'_id': 'post1', 'author': 'Brad Pitt', 'content': 'I also love Indian food', 'views': 5000, 'status': 'replaced'}

Deleted 1 document(s) with author 'Brad Pitt'.

Verification of deletion (should be None):
None

Documents to delete (views > 1000):
{'_id': ObjectId('66a3c7c02c7227f672b7667a'), 'author': 'Angelina Jolie', 'content': 'Mediterranean food is not only healthy but also delicious!', 'tags': ['food', 'Mediterranean'], 'published_date': datetime.datetime(2024, 7, 19, 0, 0), 'views': 1800, 'rating': 4, 'verified': True, 'timestamp': datetime.datetime(2024, 7, 19, 14, 30), 'comments': [{'commenter': 'Brad Pitt', 'comment': 'Totally agree!', 'comment_date': datetime.datetime(2024, 7, 20, 0, 0), 'likes': 25}], 'status': 'updated'}
{'_id': ObjectId('66a3c7c02c7227f672b7667c'), 'author': 'Natalie Portman', 'content': 'As a vegan, I love discovering new plant-based recipes.', 'tags': ['food', 'vegan'], 'published_date': datetime.datetime(2024, 

##### Notes
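* `delete_one(filter)` - Deletes the first document that matches the given filter.
* `delete_many(filter)` - Deletes every document that matches the given filter. An empty filter (`{}`) matches all documents, so it empties the collection.
* Use of filters: The filter criteria for deletion use the same query language as for finding documents, allowing for complex and flexible conditions.
* Impact on indexes: Each deletion also removes the document's entries from the indexes on the collection, so deleting large numbers of documents can be slow. It's important to consider these performance implications before running bulk deletions.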
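
Because an empty filter deletes everything, it can help to count the matches first and to guard against accidental full deletions. The helper below is a minimal sketch under those assumptions; `delete_with_preview` and its `allow_all` flag are hypothetical names, not part of PyMongo.

```python
def delete_with_preview(collection, query, allow_all=False):
    """Count matching documents, then delete them (hypothetical helper).

    `collection` is expected to behave like a PyMongo collection.
    Refuses an empty filter unless `allow_all=True`, since `{}` matches
    every document in the collection.
    """
    if not query and not allow_all:
        raise ValueError("Empty filter would delete every document; pass allow_all=True")
    expected = collection.count_documents(query)
    result = collection.delete_many(query)
    return expected, result.deleted_count


# Example call against the notebook's collection (requires a running server):
# expected, deleted = delete_with_preview(posts, {"views": {"$gt": 1000}})
```

Comparing the two returned numbers shows whether the number of deleted documents matched what was expected at the time of the count.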

#### Drop database

Dropping a database permanently deletes all of its collections and documents.

In [18]:
# Confirm the existence of the database before deletion
print(f"Databases before deletion: {client.list_database_names()}")
# Drop the database
client.drop_database('cms_database')
print("Deleted database 'cms_database'.")
# Verify deletion by listing all databases
print(f"Databases after deletion: {client.list_database_names()}")

Databases before deletion: ['admin', 'cms_database', 'config', 'conversations_db', 'local']
Deleted database 'cms_database'.
Databases after deletion: ['admin', 'config', 'conversations_db', 'local']
