# Mongo tutorial

## Prerequisites

### Documentation

You will find the documentation for:
* [Mongo commands](https://docs.mongodb.com/manual/reference/)
* [Mongo python client](http://api.mongodb.com/python/current/api/pymongo/mongo_client.html#pymongo.mongo_client.MongoClient)

### Import libraries

In [None]:
import datetime
from pprint import pprint

import pymongo
from pymongo import MongoClient

In [None]:
client = MongoClient('localhost', 27017)

In [None]:
# let's work in a test_database
db = client.test_database
posts = db.posts

In [None]:
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.utcnow()
}
post_id = posts.insert_one(post).inserted_id
post_id

In [None]:
db.list_collection_names()

In [None]:
pprint(posts.find_one())

You can open a terminal alongside the notebook, connect to your server with the `mongo` shell, and check that the document is present:

```bash
vagrant@nosql:~$ mongo
> show databases;
admin          0.000GB
config         0.000GB
local          0.000GB
test_database  0.000GB
> use test_database;
switched to db test_database
> db.posts.find()
{ 
    "_id" : ObjectId("..."), 
    "author" : "Mike", 
    "text" : "My first blog post!", 
    "tags" : [ "mongodb", "python", "pymongo"], 
    "date" : ISODate("2019-02-10T11:33:47.883Z") 
}
```

## I. Quick start

### First steps

**Q**: Create a document `{msg: 'hello'}` in the `test` collection with `insert_one()`. Fetch it back and display it. What is the `_id` for?

NB: if the collection doesn't exist yet, MongoDB creates it automatically on the first insert.

In [None]:
test = db.test
test_id = test.insert_one({"msg": "hello"}).inserted_id

# Fetch and display the document
pprint(test.find_one())
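On the `_id` question: `_id` is the document's primary key. It must be unique within the collection, MongoDB indexes it automatically, and if you don't supply one, pymongo generates an `ObjectId` and adds it to the document before sending the insert. An `ObjectId` is 12 bytes, and its first 4 bytes are a big-endian Unix timestamp in seconds, so the creation time can be recovered from the id itself. The pure-Python sketch below decodes it from the 24-character hex string; `objectid_timestamp` is a hypothetical helper name (with pymongo, `test_id.generation_time` gives the same information).

```python
import datetime


def objectid_timestamp(hex_id: str) -> datetime.datetime:
    """Return the creation time encoded in an ObjectId given as a hex string.

    An ObjectId is 12 bytes (24 hex characters); its first 4 bytes are a
    big-endian count of seconds since the Unix epoch.
    """
    if len(hex_id) != 24:
        raise ValueError("an ObjectId hex string has exactly 24 characters")
    seconds = int(hex_id[:8], 16)  # first 4 bytes = first 8 hex digits
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


# With the notebook's real id: objectid_timestamp(str(test_id))
# Hypothetical id, with the non-timestamp bytes zeroed for readability:
print(objectid_timestamp("5c600b4b" + "0" * 16))
```

The module is imported as `import datetime` so the notebook's later `datetime.datetime.utcnow()` calls keep working.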

**Q**: Display the number of documents inside the `test` collection

In [None]:
test.count_documents({})

### Interacting with a database

The `data` folder contains two `.json` files we want to work with. Let's first load them into a `MovieLens` database, in the `users` and `movies` collections.

For this section, you will need to read a bit about [query operators](https://docs.mongodb.com/manual/reference/operator/query/#query-selectors). Most collection methods you will use take a `filter` as their first parameter: a dictionary describing which documents to match.
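To build some intuition for filter dictionaries, here is a toy, pure-Python matcher that mimics a tiny subset of MongoDB's semantics: plain equality plus the `$lt`, `$gt` and `$in` operators on top-level fields, with all conditions combined by an implicit AND. It is an illustration only (`toy_match` is a hypothetical name), not how the server evaluates queries; it ignores nested fields, array matching and every other operator.

```python
from typing import Any, Callable, Dict

# Operators supported by this toy matcher (a tiny subset of MongoDB's).
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$lt": lambda value, arg: value is not None and value < arg,
    "$gt": lambda value, arg: value is not None and value > arg,
    "$in": lambda value, arg: value in arg,
}


def toy_match(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Return True if every condition in `flt` holds for `doc` (implicit AND)."""
    for field, condition in flt.items():
        value = doc.get(field)
        is_operator_doc = (
            isinstance(condition, dict)
            and condition
            and all(key.startswith("$") for key in condition)
        )
        if is_operator_doc:
            # e.g. {"age": {"$lt": 18}}: every operator must be satisfied
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # e.g. {"occupation": "programmer"}: plain equality
            return False
    return True


# Made-up documents, not taken from the MovieLens data.
sample_users = [
    {"name": "Ann", "age": 15, "occupation": "student"},
    {"name": "Bob", "age": 35, "occupation": "programmer"},
]
print([u["name"] for u in sample_users if toy_match(u, {"age": {"$lt": 18}})])
```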

**Q**: In the `MovieLens` database, load `data/movielens_movies.json` into `movies` and `data/movielens_users.json` into `users`.

The dedicated shell command for this is `mongoimport --db <some_db> --collection <some_collection> --file <some_file>`. The cell below does the equivalent from Python, reading one JSON document per line.

In [None]:
import json


# Create MovieLens database
movielens_db = client.MovieLens

# Create collections for movies and users
movies = movielens_db.movies
users = movielens_db.users

# Read movies data
try:
    with open('data/movielens_movies.json', 'r') as f:
        movies_data = [json.loads(line) for line in f if line.strip()]  # skip blank lines
    
    # Only insert if collection is empty
    if movies.count_documents({}) == 0:
        movies.insert_many(movies_data)
    else:
        print("Movies collection already populated")
except FileNotFoundError:
    print("Warning: movielens_movies.json not found")

# Read users data
try:
    with open('data/movielens_users.json', 'r') as f:
        users_data = [json.loads(line) for line in f if line.strip()]  # skip blank lines
    
    # Only insert if collection is empty
    if users.count_documents({}) == 0:
        users.insert_many(users_data)
    else:
        print("Users collection already populated")
except FileNotFoundError:
    print("Warning: movielens_users.json not found")
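If you prefer the `mongoimport` command from the question, you can call it from the notebook. The sketch below builds the argument list using only the flags given above (`--db`, `--collection`, `--file`) and runs it with `subprocess.run`. It assumes the `mongoimport` binary is on your `PATH` and that each file holds one JSON document per line (the format `mongoimport` reads by default); `mongoimport_cmd` and `run_mongoimport` are hypothetical helper names. Use either this or the Python loader, not both, or the data would be imported a second time.

```python
import subprocess
from typing import List


def mongoimport_cmd(db: str, collection: str, path: str) -> List[str]:
    """Build the argv for `mongoimport --db <db> --collection <collection> --file <path>`."""
    return ["mongoimport", "--db", db, "--collection", collection, "--file", path]


def run_mongoimport(db: str, collection: str, path: str) -> None:
    """Run mongoimport; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(mongoimport_cmd(db, collection, path), check=True)


# Usage (requires a running server and the mongoimport binary):
# run_mongoimport("MovieLens", "movies", "data/movielens_movies.json")
# run_mongoimport("MovieLens", "users", "data/movielens_users.json")
```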





**Q**: How many users are in the `MovieLens` database?

In [None]:
print(f"Number of users in MovieLens: {users.count_documents({})}")

In [None]:
print(f"Number of movies in MovieLens: {movies.count_documents({})}")

**Q**: Display all comedies (the `genres` property equals `Comedy`).

NB: `find()` returns a `Cursor`, which you need to iterate over (for example with a `for` loop); use the `pprint` function for a more readable display of each document.

In [None]:
comedy_cursor = movies.find({'genres': 'Comedy'})

for movie in comedy_cursor:
    pprint(movie)

**Q**: Fetch and display the `name` and `occupation` of Clifford Johnathan. The second parameter of `find()` ([doc here](https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find)) is called the `projection` and limits which fields the query returns.

In [None]:
result = users.find_one({"name": "Clifford Johnathan"}, {"name": 1, "occupation": 1, "_id": 0})
pprint(result)
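In an inclusion projection like the one above, only the listed fields come back, and `_id` is included unless it is explicitly set to `0`. The toy, pure-Python model below reproduces that rule for top-level fields only (`toy_project` is a hypothetical name; exclusion projections and nested fields are not handled), on a made-up document.

```python
from typing import Any, Dict


def toy_project(doc: Dict[str, Any], projection: Dict[str, int]) -> Dict[str, Any]:
    """Mimic an inclusion projection on top-level fields."""
    result: Dict[str, Any] = {}
    # `_id` is kept unless the projection explicitly excludes it.
    if projection.get("_id", 1) != 0 and "_id" in doc:
        result["_id"] = doc["_id"]
    for field, flag in projection.items():
        if field != "_id" and flag and field in doc:
            result[field] = doc[field]
    return result


user_doc = {"_id": 42, "name": "Jane Doe", "occupation": "writer", "age": 35}
print(toy_project(user_doc, {"name": 1, "occupation": 1}))
print(toy_project(user_doc, {"name": 1, "occupation": 1, "_id": 0}))
```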

**Q**: How many minors (by `age`) have rated movies ?

In [None]:
minor_count = users.count_documents({"age": {"$lt": 18}})
print(f"Number of minors who rated movies: {minor_count}")

**Q**: Display science fiction movies ('Sci-Fi') and suspense movies ('Thriller'). This time, use a regex on the `genres` field to look for either value.

In [None]:
import re

# pymongo sends a compiled Python pattern as a BSON regex, equivalent to {'$regex': ...}
sci_fi_thriller = movies.find({'genres': re.compile(r'Sci-Fi|Thriller')})
for movie in sci_fi_thriller:
    pprint(movie)
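MongoDB's `$regex` is unanchored: like Python's `re.search`, it matches when the pattern occurs anywhere in the string. The pure-Python check below applies the same pattern to a few made-up `genres` values, assuming (as the question suggests) that a movie's genres are stored together in one `|`-separated string; the sample values are illustrative, not taken from the dataset.

```python
import re

pattern = re.compile(r"Sci-Fi|Thriller")

# Made-up genre strings, assuming genres are stored as "A|B|C".
sample_genres = ["Action|Sci-Fi", "Comedy", "Crime|Thriller", "Comedy|Romance"]

matching = [g for g in sample_genres if pattern.search(g)]
print(matching)  # ['Action|Sci-Fi', 'Crime|Thriller']
```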

**Q**: More advanced textual search requires a dedicated index. Use the `create_index()` method to create a [text index](https://docs.mongodb.com/manual/core/index-text/) on the `genres` field of the `movies` collection.

In [None]:
movies.create_index([("genres", "text")])

**Q**: Redo the search for science fiction and thriller movies with the `$text` operator.

In [None]:
sci_fi_thriller = movies.find({'$text': {'$search': 'Sci-Fi Thriller'}})
for movie in sci_fi_thriller:
    pprint(movie)
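Unlike the regex version, `$text` splits the search string into terms and returns documents whose indexed field contains any of them (a logical OR); the server also applies language-specific stemming and stop-word removal. The toy model below captures only the OR-of-terms idea: it lowercases and splits on non-alphanumeric characters, which is a simplification of MongoDB's tokenizer, and `tokens` and `toy_text_match` are hypothetical names. To rank real results, MongoDB exposes a relevance score through `{"$meta": "textScore"}` in the projection and sort.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercase and split on anything that is not a letter or a digit."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def toy_text_match(field_value: str, search: str) -> bool:
    """True if the field shares at least one term with the search string (OR semantics)."""
    return bool(tokens(field_value) & tokens(search))


print(toy_text_match("Crime|Thriller", "Sci-Fi Thriller"))  # True
print(toy_text_match("Comedy|Romance", "Sci-Fi Thriller"))  # False
```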

**Q**: Display the first 30 movies (`limit`) in alphabetical order (`sort`) by title

In [None]:
for movie in movies.find().sort('title', pymongo.ASCENDING).limit(30):
    pprint(movie)
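`sort('title', 1)` sorts in ascending order (`pymongo.ASCENDING` is `1`, `pymongo.DESCENDING` is `-1`), and the server applies the sort before the limit whatever order you chain the two calls in. On an in-memory list, the same "first N by title" can be sketched with `heapq.nsmallest`, which is equivalent to `sorted(...)[:n]` without sorting the whole list; the sample titles are made up.

```python
import heapq

sample_movies = [
    {"title": "Heat (1995)"},
    {"title": "Alien (1979)"},
    {"title": "Casablanca (1942)"},
    {"title": "Brazil (1985)"},
]

# Equivalent of .sort('title', 1).limit(3) on a Python list.
first_three = heapq.nsmallest(3, sample_movies, key=lambda m: m["title"])
print([m["title"] for m in first_three])
```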

**Q**: How many users have seen the movie "Star Wars: Episode V - The Empire Strikes Back (1980)" (`_id 1196`)? The `movies` field is an array, so we should try the [`$elemMatch`](https://docs.mongodb.com/manual/reference/operator/query/elemMatch/) query operator here.

In [None]:
count = users.count_documents({"movies": {"$elemMatch": {"movieid": 1196}}})
print(f"Number of users who have seen Star Wars Episode V: {count}")

**Q**: And how many gave it a rating of 1 or 2 ?

In [None]:
count_low_ratings = users.count_documents({
    "movies": {
        "$elemMatch": {
            "movieid": 1196,
            "rating": {"$in": [1, 2]}
        }
    }
})
print(f"Number of users who rated Star Wars Episode V with 1 or 2 stars: {count_low_ratings}")
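`$elemMatch` matters for the second query. With dotted notation, a filter like `{"movies.movieid": 1196, "movies.rating": {"$in": [1, 2]}}` would also match a user who watched movie 1196 and gave a 1 or 2 to some *other* movie, because each condition may be satisfied by a different array element. (For the first, single-condition query, `{"movies.movieid": 1196}` gives the same result.) The toy, pure-Python model below contrasts the two semantics on a made-up user; `dotted_match` and `elem_match` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

Predicate = Callable[[Dict[str, Any]], bool]


def dotted_match(items: List[Dict[str, Any]], conditions: List[Predicate]) -> bool:
    """Dotted notation: each condition may be met by a different element."""
    return all(any(cond(item) for item in items) for cond in conditions)


def elem_match(items: List[Dict[str, Any]], conditions: List[Predicate]) -> bool:
    """$elemMatch: a single element must meet every condition."""
    return any(all(cond(item) for cond in conditions) for item in items)


# Made-up user: loved movie 1196, disliked another movie.
rated = [{"movieid": 1196, "rating": 5}, {"movieid": 2571, "rating": 1}]
conditions = [
    lambda m: m["movieid"] == 1196,
    lambda m: m["rating"] in (1, 2),
]

print(dotted_match(rated, conditions))  # True: met by two different elements
print(elem_match(rated, conditions))    # False: no single element meets both
```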

### Updating data

**Q**: Insert a new user with the properties `name`, `gender` ('M' or 'F'), `occupation` and `age`, using the `insert_one()` command. Display it with `find_one()`.

In [None]:
# Insert a new user
new_user = {
    "name": "John Smith",
    "gender": "M", 
    "occupation": "programmer",
    "age": 28
}

users.insert_one(new_user)

# Display the newly inserted user
pprint(users.find_one({"name": "John Smith"}))

**Q**: Add a rating for a watched movie with `update_one()`: add a `movies` property containing an array with one document (`movieid`, `rating`, and `timestamp` set to `datetime.datetime.utcnow()`).

You will need to read the documentation on [update operators](https://docs.mongodb.org/manual/reference/operator/update/).

In [None]:
users.update_one(
    {"name": "John Smith"},
    {"$set": {
        "movies": [{
            "movieid": 1,  # Toy Story
            "rating": 5,
            "timestamp": datetime.datetime.utcnow()
        }]
    }}
)

# Verify the update
pprint(users.find_one({"name": "John Smith"}))
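`$set` replaces the whole `movies` field, which is fine for a user with no ratings yet but would discard any existing ones; `$push` appends one element to the array instead, creating the array if the field is missing. The pure-Python sketch below models just those two operators on a plain dict; `toy_update` is a hypothetical name and it handles top-level fields only.

```python
import copy
from typing import Any, Dict


def toy_update(doc: Dict[str, Any], update: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Apply a tiny subset of MongoDB update operators to a copy of `doc`."""
    new_doc = copy.deepcopy(doc)
    for op, fields in update.items():
        for field, value in fields.items():
            if op == "$set":
                new_doc[field] = value  # overwrite (or create) the field
            elif op == "$push":
                new_doc.setdefault(field, []).append(value)  # append, creating the array if absent
            else:
                raise NotImplementedError(op)
    return new_doc


user_doc = {"name": "John Smith", "movies": [{"movieid": 1, "rating": 5}]}
new_rating = {"movieid": 2, "rating": 3}

print(toy_update(user_doc, {"$set": {"movies": [new_rating]}})["movies"])  # old rating lost
print(toy_update(user_doc, {"$push": {"movies": new_rating}})["movies"])  # old rating kept
# With pymongo, the appending form would be:
# users.update_one({"name": "John Smith"}, {"$push": {"movies": new_rating}})
```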

**Q**: Find the number of users who have declared a `programmer` occupation. Modify them so that they are `developer`. Verify your update.

In [None]:
# Count programmers
programmer_count = users.count_documents({"occupation": "programmer"})
print(f"Number of programmers before update: {programmer_count}")

# Update programmers to developers
update_result = users.update_many(
    {"occupation": "programmer"},
    {"$set": {"occupation": "developer"}}
)
print(f"Modified {update_result.modified_count} documents")

# Verify update
developer_count = users.count_documents({"occupation": "developer"})
print(f"Number of developers after update: {developer_count}")

## II. Modelling a blog

We will now model a blog using Mongo. 

First, switch to a new `Blog` database. Each blog post will have the following fields:

* The author (author field, string type)
* The date (date field, string type in YYYY-MM-DD format)
* The content (content field)
* Tags (tags field, a string array)
* A list of comments (comments field), each containing:
  * The author (author field, string type)
  * The date (date field, string type in YYYY-MM-DD format)
  * The content (content field)
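Before inserting, you can check on the client side that a post follows this schema. The pure-Python sketch below does so (`is_iso_date`, `check_entry` and `validate_post` are hypothetical names); it assumes `content` is a string, which the schema above does not state explicitly. MongoDB can also enforce a schema on the server through collection validation rules (`$jsonSchema`), which this sketch does not use.

```python
import datetime
from typing import Any, Dict, List


def is_iso_date(value: Any) -> bool:
    """True for a zero-padded YYYY-MM-DD string."""
    if not isinstance(value, str) or len(value) != 10:
        return False
    try:
        datetime.datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_entry(entry: Any, where: str) -> List[str]:
    """Check the author/date/content fields shared by posts and comments."""
    if not isinstance(entry, dict):
        return [f"{where}: must be a document"]
    errors = []
    if not isinstance(entry.get("author"), str):
        errors.append(f"{where}: author must be a string")
    if not is_iso_date(entry.get("date")):
        errors.append(f"{where}: date must be a YYYY-MM-DD string")
    if not isinstance(entry.get("content"), str):  # assumption: content is text
        errors.append(f"{where}: content must be a string")
    return errors


def validate_post(post: Dict[str, Any]) -> List[str]:
    """Return the list of schema violations; an empty list means the post looks valid."""
    errors = check_entry(post, "post")
    tags = post.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        errors.append("post: tags must be an array of strings")
    comments = post.get("comments")
    if not isinstance(comments, list):
        errors.append("post: comments must be an array")
    else:
        for i, comment in enumerate(comments):
            errors.extend(check_entry(comment, f"comment {i}"))
    return errors


print(validate_post({"author": "rick", "date": "15/01/2024", "content": "Hi",
                     "tags": ["mongodb"], "comments": []}))
```

The module is imported as `import datetime` so the notebook's `datetime.datetime.utcnow()` calls are not shadowed.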


**Q**: Create a first post by `rick`, on January 15th, with the tags `mongodb` and `nosql`.

In [None]:
# Switch to Blog database
blog_db = client.Blog
blog_posts = blog_db.posts

# Create first post
first_post = {
    "author": "rick",
    "date": "2024-01-15",
    "content": "This is my first post about MongoDB!",
    "tags": ["mongodb", "nosql"],
    "comments": []
}

blog_posts.insert_one(first_post)

**Q**: Create a second post by `kate`, on January 21, with the tag `nosql` and a comment from `rick` on the same day.

In [None]:
second_post = {
    "author": "kate",
    "date": "2024-01-21",  
    "content": "Learning more about NoSQL databases!",
    "tags": ["nosql"],
    "comments": [
        {
            "author": "rick",
            "date": "2024-01-21",
            "content": "Great post about NoSQL!"
        }
    ]
}

blog_posts.insert_one(second_post)

**Q**: Display the author of the last post with the tag `nosql`

In [None]:
last_nosql_post = blog_posts.find_one({"tags": "nosql"}, sort=[("date", pymongo.DESCENDING)])
print(f"Author of last nosql post: {last_nosql_post['author']}")
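Sorting on `date` gives the right answer here only because the dates are zero-padded `YYYY-MM-DD` strings, whose lexicographic order matches chronological order; a format such as `DD/MM/YYYY` would sort incorrectly. Storing real dates (a `datetime`, as in the first `posts` example) avoids the issue. A quick pure-Python check with made-up dates:

```python
iso_dates = ["2024-01-21", "2024-01-15", "2023-12-31"]
dmy_dates = ["21/01/2024", "15/01/2024", "31/12/2023"]

# ISO strings: string order == chronological order
print(sorted(iso_dates))  # ['2023-12-31', '2024-01-15', '2024-01-21']
# DD/MM/YYYY strings: string order puts the 2023 date last
print(sorted(dmy_dates))  # ['15/01/2024', '21/01/2024', '31/12/2023']
```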

**Q**: Add a comment by `jack` on January 25, to `kate`'s post

In [None]:
blog_posts.update_one(
    {"author": "kate"},
    {"$push": {
        "comments": {
            "author": "jack",
            "date": "2024-01-25",
            "content": "Thanks for sharing this info about NoSQL!"
        }
    }}
)

# Verify the update
pprint(blog_posts.find_one({"author": "kate"}))

**Q**: Display all comments by `kate`

In [None]:
# Find all posts with comments by kate.
# A positional projection ({"comments.$": 1}) would return only the FIRST
# matching comment of each post, so fetch the whole array and filter here.
comments_by_kate = blog_posts.find(
    {"comments.author": "kate"},
    {"comments": 1, "_id": 0}
)

for post in comments_by_kate:
    for comment in post['comments']:
        if comment['author'] == 'kate':
            pprint(comment)
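Another approach is to flatten the comments so that each one becomes its own document, which is what the `$unwind` aggregation stage does on the server (a pipeline such as `[{"$unwind": "$comments"}, {"$match": {"comments.author": "kate"}}]` passed to `blog_posts.aggregate()`). The pure-Python sketch below models only the flattening step, on made-up posts; `toy_unwind` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, Iterator


def toy_unwind(docs: Iterable[Dict[str, Any]], field: str) -> Iterator[Dict[str, Any]]:
    """Yield one copy of each document per element of its array `field`.

    Like $unwind with default options, documents whose field is missing,
    null or an empty array produce no output; a non-array value is
    treated as a one-element array.
    """
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            yield {**doc, field: element}


sample_posts = [
    {"author": "kate", "comments": [{"author": "rick"}, {"author": "jack"}]},
    {"author": "rick", "comments": []},
]
print([d["comments"] for d in toy_unwind(sample_posts, "comments")
       if d["comments"]["author"] == "rick"])
```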

## Cleanup

In [None]:
!mongo test_database --eval 'db.dropDatabase()'

In [None]:
!mongo MovieLens --eval 'db.dropDatabase()'

In [None]:
!mongo Blog --eval 'db.dropDatabase()'
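The `!mongo` commands rely on the legacy `mongo` shell, which was removed in MongoDB 6.0 in favor of `mongosh`. The same cleanup can be done from Python with `MongoClient.drop_database()`. In the sketch below the client is passed in as a parameter, so it works with the `client` created at the top of the notebook; `drop_tutorial_databases` and `TUTORIAL_DATABASES` are hypothetical names.

```python
from typing import Iterable

# Databases created in this tutorial.
TUTORIAL_DATABASES = ("test_database", "MovieLens", "Blog")


def drop_tutorial_databases(client, names: Iterable[str] = TUTORIAL_DATABASES) -> None:
    """Drop each named database through the given MongoClient."""
    for name in names:
        client.drop_database(name)


# Usage with the client created at the top of the notebook:
# drop_tutorial_databases(client)
```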