# Mongo tutorial

## Prerequisites

### Documentation

You can find the documentation for:
* [Mongo commands](https://docs.mongodb.com/manual/reference/)
* [Mongo python client](http://api.mongodb.com/python/current/api/pymongo/mongo_client.html#pymongo.mongo_client.MongoClient)

### Import libraries

In [192]:
import datetime
from pprint import pprint

import pymongo
from pymongo import MongoClient

In [193]:
client = MongoClient('localhost', 27017)

In [194]:
# let's work in a test_database
db = client.test_database
posts = db.posts

In [195]:
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.utcnow()
}
post_id = posts.insert_one(post).inserted_id
post_id


ObjectId('6793961ea3c9010d1f1668b3')

In [196]:
db.list_collection_names()

['posts']

In [197]:
pprint(posts.find_one())

{'_id': ObjectId('6793961ea3c9010d1f1668b3'),
 'author': 'Mike',
 'date': datetime.datetime(2025, 1, 24, 13, 31, 10, 666000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


You can open a terminal alongside the notebook, connect to your server with the `mongo` shell, and check that the document is present:

```bash
vagrant@nosql:~$ mongo
> show databases;
admin          0.000GB
config         0.000GB
local          0.000GB
test_database  0.000GB
> use test_database;
switched to db test_database
> db.posts.find()
{ 
    "_id" : ObjectId("..."), 
    "author" : "Mike", 
    "text" : "My first blog post!", 
    "tags" : [ "mongodb", "python", "pymongo"], 
    "date" : ISODate("2019-02-10T11:33:47.883Z") 
}
```

## I. Quick start

### First steps

**Q**: Create a document `{msg: 'hello'}` in the `test` collection with `insert_one()`. Fetch it back and display it. What is the `_id` field for?

NB: if the collection does not exist yet, MongoDB creates it automatically.

In [198]:
# `_id` is the document's primary key: it must be unique within the collection,
# and when a document has none, PyMongo generates an ObjectId for it before inserting.

test = db.test
test.insert_one({'msg': 'hello'})
pprint(test.find_one())

{'_id': ObjectId('6793961fa3c9010d1f1668b4'), 'msg': 'hello'}


**Q**: Display the number of documents inside the `test` collection

In [199]:
#Display the number of documents inside the `test` collection
test.count_documents({})

1

### Interacting with a database

The `data` folder contains two `.json` files we want to work with. Let's first load them into a `MovieLens` database, in the `users` and `movies` collections.

For this section, you will need to read up on [query operators](https://docs.mongodb.com/manual/reference/operator/query/#query-selectors). Most collection methods you will use take a `filter` as their first parameter: a dictionary describing which documents to match.
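A minimal sketch of how such filter dictionaries compose: several operators on one field form a range, and extra top-level keys are combined with an implicit AND. The helper `age_range_filter` is hypothetical; `age` and `gender` are the user fields used later in this notebook.

```python
def age_range_filter(min_age, max_age, gender=None):
    """Build a filter matching users with min_age <= age < max_age."""
    # Two operators on the same field: both bounds must hold
    query = {"age": {"$gte": min_age, "$lt": max_age}}
    if gender is not None:
        # An extra top-level key adds another condition (implicit AND)
        query["gender"] = gender
    return query


print(age_range_filter(18, 30, "F"))
# → {'age': {'$gte': 18, '$lt': 30}, 'gender': 'F'}
# e.g. users.count_documents(age_range_filter(18, 30, "F"))
```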

**Q**: In the `MovieLens` database, load `data/movielens_movies.json` into `movies` and `data/movielens_users.json` into `users`.

Use the dedicated shell command for this: `mongoimport --db <some_db> --collection <some_collection> --file <some_file>`
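If you prefer to drive `mongoimport` from Python rather than a terminal, here is a minimal sketch. It assumes the MongoDB Database Tools are installed and on the `PATH` (recent versions ship them separately from the server); `mongoimport_args` and `run_import` are hypothetical helpers. By default `mongoimport` reads one JSON document per line, and `--drop` empties the target collection first so the import can be re-run.

```python
import shlex
import subprocess


def mongoimport_args(db, collection, path, drop=False):
    """Build the argument list for the mongoimport command-line tool."""
    args = ["mongoimport", "--db", db, "--collection", collection, "--file", path]
    if drop:
        # Drop the collection before importing to avoid duplicate documents
        args.append("--drop")
    return args


def run_import(db, collection, path, drop=False):
    """Run mongoimport; raises CalledProcessError if it exits with an error."""
    subprocess.run(mongoimport_args(db, collection, path, drop), check=True)


# Print the shell commands instead of running them
for coll, path in [("movies", "data/movielens_movies.json"),
                   ("users", "data/movielens_users.json")]:
    print(shlex.join(mongoimport_args("MovieLens", coll, path, drop=True)))
```

In a notebook, the printed command can also be run directly with a `!` prefix.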

In [200]:
from pymongo import MongoClient
import json

# Connect to the MongoDB server
client = MongoClient('mongodb://localhost:27017/')
db = client["MovieLens"]  # Database name

def load_jsonl_to_mongodb(file_path, collection_name):
    """Insert each line of a JSON Lines file as one document into `collection_name`."""
    with open(file_path, 'r', encoding='utf-8') as file:
        data = [json.loads(line) for line in file if line.strip()]  # Parse each non-empty line as a JSON object
    db[collection_name].insert_many(data)

# Import the JSONL files (run once). While these calls stay commented out,
# the collections remain empty, which is why the counts below are 0.
#load_jsonl_to_mongodb("data/movielens_movies.json", "movies")
#load_jsonl_to_mongodb("data/movielens_users.json", "users")


In [201]:
# Display the number of documents in each collection
print("Number of movies:", db.movies.count_documents({}))
print("Number of users:", db.users.count_documents({}))

# Example: display the first 5 movies
for movie in db.movies.find().limit(5):
    print(movie)


Number of movies: 0
Number of users: 0


**Q**: How many users are in the `MovieLens` database?

In [202]:
users = db.users
users.count_documents({})

0

**Q**: Display all comedies (the `genres` property equals `Comedy`).

NB: `find()` returns a `Cursor`; you will need to iterate over it, then use the `pprint` function for a more readable display of each document.

In [203]:
comedies = db.movies.find({"genres": "Comedy"})
for comedy in comedies:
    pprint(comedy)

**Q**: Fetch and display the `name` and `occupation` of Clifford Johnathan. The second parameter of `find()` ([doc here](https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find)) is called `projection` and limits which fields the query returns.

In [204]:
name = "Clifford Johnathan"
user = users.find_one({"name": name}, {"name": 1, "occupation": 1})
pprint(user)

None
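An inclusion projection such as `{"name": 1, "occupation": 1}` still returns `_id` unless it is excluded explicitly with `"_id": 0`. The plain-Python function below only imitates that rule on top-level fields (no dotted paths, no exclusion projections) to make it concrete; it is not how MongoDB evaluates projections, and the sample user is hypothetical.

```python
def apply_inclusion_projection(doc, projection):
    """Mimic a top-level MongoDB inclusion projection such as {"name": 1}."""
    # _id is kept by default; only an explicit 0/False removes it
    keep_id = projection.get("_id", 1)
    fields = [k for k, v in projection.items() if k != "_id" and v]
    result = {k: doc[k] for k in fields if k in doc}
    if keep_id and "_id" in doc:
        result = {"_id": doc["_id"], **result}
    return result


user = {"_id": 42, "name": "Jane Doe", "occupation": "writer", "age": 30}  # hypothetical
print(apply_inclusion_projection(user, {"name": 1, "occupation": 1}))
# → {'_id': 42, 'name': 'Jane Doe', 'occupation': 'writer'}
print(apply_inclusion_projection(user, {"name": 1, "occupation": 1, "_id": 0}))
# → {'name': 'Jane Doe', 'occupation': 'writer'}
```

So `users.find_one({"name": name}, {"name": 1, "occupation": 1, "_id": 0})` returns only the two requested fields.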


**Q**: How many minors (by `age`) have rated movies?

In [205]:
age = 18
minors = users.count_documents({"age": {"$lt": age}})
print(minors)

0


**Q**: Display science fiction movies ('Sci-Fi') and suspense movies ('Thriller'). This time you need to use a regex to parse genres and look for those values.

In [206]:
movies = db.movies.find({"genres": {"$regex": "Sci-Fi|Thriller"}})
for movie in movies:
    pprint(movie)
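If `genres` is stored as a single pipe-separated string such as `"Action|Sci-Fi"` (an assumption about this dataset), an exact match like `{"genres": "Comedy"}` only returns movies whose sole genre is Comedy, and an unanchored regex can also match inside a longer word. The sketch below builds a pattern that matches whole genre tokens and checks it with Python's `re` on hypothetical values; `genre_pattern` is a hypothetical helper, and the same pattern string can be passed to `$regex`.

```python
import re


def genre_pattern(*genres):
    """Regex matching any of the given genres as a whole '|'-separated token."""
    alternatives = "|".join(re.escape(g) for g in genres)
    # (^|\|) and (\||$) require each genre to sit between separators or string ends
    return rf"(^|\|)({alternatives})(\||$)"


pattern = genre_pattern("Sci-Fi", "Thriller")
samples = ["Action|Sci-Fi", "Thriller", "Comedy|Romance", "Comedy"]
print([s for s in samples if re.search(pattern, s)])
# → ['Action|Sci-Fi', 'Thriller']

# Against the collection (not executed here):
# db.movies.find({"genres": {"$regex": genre_pattern("Comedy")}})
```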
    

**Q**: More advanced text search requires a dedicated index. Use the `create_index()` method to create a [TEXT](https://docs.mongodb.com/manual/core/index-text/) index on the `genres` field of the `movies` collection.

In [207]:
db.movies.create_index([("genres", pymongo.TEXT)])

'genres_text'

**Q**: Repeat the search for science fiction and thriller movies, this time with the `$text` operator.

In [208]:
movies = db.movies.find({"$text": {"$search": "Sci-Fi Thriller"}})
for movie in movies:
    pprint(movie)
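`$text` splits the search string on whitespace and returns documents matching any of the terms; projecting the `textScore` metadata lets you rank results by relevance. The sketch below only builds the query arguments (the helper `text_search_args` is hypothetical), so it runs without a server; the commented lines show how they would be passed to `find()` and `sort()`.

```python
def text_search_args(terms):
    """Build filter, projection and sort spec for a relevance-ranked $text query."""
    query = {"$text": {"$search": " ".join(terms)}}  # space-separated terms are ORed
    # Return the relevance score computed from the text index
    projection = {"title": 1, "genres": 1, "score": {"$meta": "textScore"}}
    sort_spec = [("score", {"$meta": "textScore"})]
    return query, projection, sort_spec


query, projection, sort_spec = text_search_args(["Thriller", "Sci-Fi"])
print(query)
# → {'$text': {'$search': 'Thriller Sci-Fi'}}

# Against the collection (requires the text index created above):
# for movie in db.movies.find(query, projection).sort(sort_spec):
#     pprint(movie)
```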


**Q**: Display the first 30 movies (`limit`) in alphabetical order (`sort`) by title

In [209]:
movies = db.movies.find().sort("title").limit(30)
for movie in movies:
    pprint(movie)

**Q**: How many users have seen the movie "Star Wars: Episode V - The Empire Strikes Back (1980)" (`_id` 1196)? The `movies` field of each user is an array, so the [$elemMatch](https://docs.mongodb.com/manual/reference/operator/query/elemMatch/) query operator is a good fit here.

In [210]:
movie_id = 1196
# Rating entries use the `movieid` key (as in the update question below)
users_count = users.count_documents({"movies": {"$elemMatch": {"movieid": movie_id}}})
print(users_count)

0


**Q**: And how many gave it a rating of 1 or 2?

In [211]:
users_count = users.count_documents({"movies": {"$elemMatch": {"movieid": movie_id, "rating": {"$lte": 2}}}})
print(users_count)

0
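`$elemMatch` matters once there are several conditions: with dot notation, `{"movies.movieid": 1196, "movies.rating": {"$lte": 2}}` is satisfied if *some* element has the right id and *some* (possibly different) element has a low rating, whereas `$elemMatch` requires one element to satisfy both. The plain-Python functions below imitate the two behaviours on a hypothetical user document; they illustrate the semantics only and are not how MongoDB evaluates queries.

```python
def dot_notation_match(user, movie_id, max_rating):
    """Each condition may be satisfied by a different array element."""
    movies = user.get("movies", [])
    has_id = any(m.get("movieid") == movie_id for m in movies)
    has_low = any(m.get("rating", float("inf")) <= max_rating for m in movies)
    return has_id and has_low


def elem_match(user, movie_id, max_rating):
    """Both conditions must hold for the same array element ($elemMatch)."""
    return any(
        m.get("movieid") == movie_id and m.get("rating", float("inf")) <= max_rating
        for m in user.get("movies", [])
    )


# Hypothetical user: rated 1196 with 5, and another movie with 1
user = {"name": "demo", "movies": [{"movieid": 1196, "rating": 5},
                                   {"movieid": 260, "rating": 1}]}
print(dot_notation_match(user, 1196, 2))  # True: id and low rating come from different elements
print(elem_match(user, 1196, 2))          # False: no single element has both
```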


### Updating data

**Q**: Insert a new user with the properties `name`, `gender` ('M' or 'F'), `occupation` and `age`, using the `insert_one()` command. Display it with `find_one()`.

In [212]:
new_user = {
    "name": "Alice",
    "gender": "F",
    "occupation": "Student",
    "age": 22}
users.insert_one(new_user)
pprint(users.find_one({"name": "Alice"}))

{'_id': ObjectId('6793961fa3c9010d1f1668b6'),
 'age': 22,
 'gender': 'F',
 'name': 'Alice',
 'occupation': 'Student'}


**Q**: Add a rating for a watched movie with `update_one()`: add a `movies` property containing an array with one document (`movieid`, `rating`, and `timestamp` set to `datetime.datetime.utcnow()`).

You will need to read the documentation on [update operators](https://docs.mongodb.org/manual/reference/operator/update/).

In [213]:
# Name of the target user
user_name = "Alice"
movie_entry = {
    "movieid": "movie789",  # Movie ID
    "rating": 4.0,          # Rating given
    "timestamp": datetime.datetime.utcnow()  # Timestamp
}

# Update the document matching the name
result = db.users.update_one(
    {"name": user_name},          # Filter on the `name` field
    {"$push": {"movies": movie_entry}}  # Append the rating to the `movies` array (created if missing)
)


In [214]:
# Check the result
if result.matched_count > 0:
    print(f"Movie rating added for user {user_name}.")
else:
    print(f"No user found with name {user_name}.")

Movie rating added for user Alice.


In [215]:
# Display the document for user Alice
pprint(users.find_one({"name": "Alice"}))

{'_id': ObjectId('6793961fa3c9010d1f1668b6'),
 'age': 22,
 'gender': 'F',
 'movies': [{'movieid': 'movie789',
             'rating': 4.0,
             'timestamp': datetime.datetime(2025, 1, 24, 13, 31, 12, 985000)}],
 'name': 'Alice',
 'occupation': 'Student'}
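`$push` creates the `movies` array if the field is missing and appends to it otherwise, so the same update works for a user's first rating and for later ones. To append several ratings in one call, `$push` accepts the `$each` modifier. In this hypothetical helper the movie ids are placeholders, and the timestamp uses `datetime.datetime.now(datetime.timezone.utc)` because `utcnow()` is deprecated as of Python 3.12; PyMongo stores aware datetimes as UTC.

```python
import datetime


def push_ratings_update(ratings):
    """Build an update document appending several (movieid, rating) entries at once."""
    now = datetime.datetime.now(datetime.timezone.utc)
    entries = [
        {"movieid": movie_id, "rating": rating, "timestamp": now}
        for movie_id, rating in ratings
    ]
    # $each lets a single $push append every entry in the list
    return {"$push": {"movies": {"$each": entries}}}


update = push_ratings_update([(1196, 4.5), (260, 5.0)])
print(len(update["$push"]["movies"]["$each"]))  # → 2
# db.users.update_one({"name": "Alice"}, update)
```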


**Q**: Find the number of users who have declared a `programmer` occupation. Change their occupation to `developer`, then verify your update.

In [216]:
programmers = users.count_documents({"occupation": "programmer"})
print(programmers)

0
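The cell above only counts the programmers. The remaining steps (update, then verify) can be sketched with `update_many()`, which applies `$set` to every matching document, whereas `update_one()` would change only the first. `rename_occupation` is a hypothetical helper expecting a PyMongo collection such as `db.users`.

```python
def rename_occupation(users, old, new):
    """Rename an occupation on all matching users; return (before, modified, remaining)."""
    before = users.count_documents({"occupation": old})
    # update_many applies the $set to every matching document
    result = users.update_many({"occupation": old}, {"$set": {"occupation": new}})
    # Verification: no document should still carry the old value
    remaining = users.count_documents({"occupation": old})
    return before, result.modified_count, remaining


# Usage (requires a running server):
# before, modified, remaining = rename_occupation(db.users, "programmer", "developer")
# print(before, modified, remaining)  # remaining should be 0 after the update
```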


## II. Modelling a blog

We will now model a blog using Mongo.

First, switch to a new `Blog` database. Each blog post will have the following fields:

* The author (`author` field, string)
* The date (`date` field, string in YYYY-MM-DD format)
* The content (`content` field)
* Tags (`tags` field, array of strings)
* A list of comments (`comments` field), each containing:
  * The author (`author` field, string)
  * The date (`date` field, string in YYYY-MM-DD format)
  * The content (`content` field)
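Because the dates are stored as strings, nothing stops a malformed value from being inserted. A minimal sketch of hypothetical helpers that build documents with this shape and reject dates that are not real calendar dates written as zero-padded YYYY-MM-DD:

```python
import datetime
import re


def check_date(date):
    """Return `date` unchanged if it is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        raise ValueError(f"expected YYYY-MM-DD, got {date!r}")
    datetime.date.fromisoformat(date)  # rejects impossible dates such as 2021-02-30
    return date


def make_comment(author, date, content):
    """Build a comment sub-document."""
    return {"author": author, "date": check_date(date), "content": content}


def make_post(author, date, content, tags=(), comments=()):
    """Build a blog post document with the fields listed above."""
    return {
        "author": author,
        "date": check_date(date),
        "content": content,
        "tags": list(tags),
        "comments": list(comments),
    }


post = make_post("kate", "2021-01-21", "This is my second post", ["nosql"],
                 [make_comment("rick", "2021-01-21", "Nice post!")])
print(post["comments"][0]["author"])  # → rick
# posts.insert_one(post) would then store it
```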


**Q**: Create a first post by `rick`, on January 15th, with the tags `mongodb` and `nosql`.

In [217]:
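# Switch to the Blog database (`posts` still pointed to test_database.posts)
posts = client["Blog"].posts

post = {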
    "author": "rick",
    "date": "2021-01-15",
    "content": "This is my first post",
    "tags": ["mongodb", "nosql"],
    "comments": []
}
posts.insert_one(post)

InsertOneResult(ObjectId('67939622a3c9010d1f1668b7'), acknowledged=True)

**Q**: Create a second post by `kate`, on January 21, with the tag `nosql` and a comment from `rick` on the same day.

In [218]:
post = {
    "author": "kate",
    "date": "2021-01-21",
    "content": "This is my second post",
    "tags": ["nosql"],
    "comments": [{
        "author": "rick",
        "date": "2021-01-21",
        "content": "Nice post!"
    }]
}
posts.insert_one(post)

InsertOneResult(ObjectId('67939623a3c9010d1f1668b8'), acknowledged=True)

**Q**: Display the author of the most recent post with the tag `nosql`

In [219]:
post = posts.find_one({"tags": "nosql"}, sort=[("date", pymongo.DESCENDING)])
print(post["author"])

kate
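Sorting on `date` works here only because the dates are zero-padded `YYYY-MM-DD` strings, whose lexicographic order matches chronological order; storing real `datetime` values (BSON dates) avoids depending on the string format. A quick plain-Python check with hypothetical values:

```python
iso_dates = ["2021-01-21", "2021-01-15", "2020-12-31"]
# Zero-padded YYYY-MM-DD strings sort in chronological order
print(sorted(iso_dates, reverse=True))
# → ['2021-01-21', '2021-01-15', '2020-12-31']

# A day-first format does not: 2020 ends up first in "newest first" order
day_first = ["21/01/2021", "15/01/2021", "31/12/2020"]
print(sorted(day_first, reverse=True))
# → ['31/12/2020', '21/01/2021', '15/01/2021']
```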


**Q**: Add a comment by `jack` on January 25, to `kate`'s post

In [220]:
post_id = post["_id"]
comment = {
    "author": "jack",
    "date": "2021-01-25",
    "content": "Great post!"
}
posts.update_one({"_id": post_id}, {"$push": {"comments": comment}})
post = posts.find_one({"_id": post_id})
pprint(post)

{'_id': ObjectId('67939623a3c9010d1f1668b8'),
 'author': 'kate',
 'comments': [{'author': 'rick', 'content': 'Nice post!', 'date': '2021-01-21'},
              {'author': 'jack',
               'content': 'Great post!',
               'date': '2021-01-25'}],
 'content': 'This is my second post',
 'date': '2021-01-21',
 'tags': ['nosql']}


**Q**: Display all comments by `kate`

In [221]:
comments = posts.find({"author": "kate"}, {"comments": 1})
for comment in comments:
    pprint(comment)

{'_id': ObjectId('67939623a3c9010d1f1668b8'),
 'comments': [{'author': 'rick', 'content': 'Nice post!', 'date': '2021-01-21'},
              {'author': 'jack',
               'content': 'Great post!',
               'date': '2021-01-25'}]}
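The query above matches posts *written* by `kate` and returns every comment on them, which is not the same as comments written by `kate`. One way to get the latter is an aggregation pipeline that unwinds the `comments` array and filters on the comment author. The pipeline is built as plain data below, with a plain-Python equivalent applied to copies of the two posts created above (a check of the logic only). Since `kate` has not commented on any post, both return nothing for her. The helper names are hypothetical.

```python
def comments_by(author):
    """Aggregation pipeline returning each comment written by `author`."""
    return [
        {"$unwind": "$comments"},                    # one document per comment
        {"$match": {"comments.author": author}},     # keep only this author's comments
        {"$replaceRoot": {"newRoot": "$comments"}},  # output the comment itself
    ]


def comments_by_python(posts, author):
    """Plain-Python equivalent of the pipeline, for checking the logic."""
    return [c for p in posts for c in p.get("comments", []) if c["author"] == author]


sample_posts = [
    {"author": "rick", "comments": []},
    {"author": "kate", "comments": [
        {"author": "rick", "date": "2021-01-21", "content": "Nice post!"},
        {"author": "jack", "date": "2021-01-25", "content": "Great post!"},
    ]},
]
print(comments_by_python(sample_posts, "kate"))  # → []
print(comments_by_python(sample_posts, "rick"))
# → [{'author': 'rick', 'date': '2021-01-21', 'content': 'Nice post!'}]
# With a server: list(posts.aggregate(comments_by("kate")))
```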


## Postquisites

In [222]:
!mongo test_database --eval 'db.dropDatabase()'

'mongo' is not recognized as an internal or external command,
operable program or batch file.


In [223]:
!mongo MovieLens --eval 'db.dropDatabase()'

'mongo' is not recognized as an internal or external command,
operable program or batch file.


In [224]:
!mongo Blog --eval 'db.dropDatabase()'

'mongo' is not recognized as an internal or external command,
operable program or batch file.
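These cells fail because the `mongo` executable is not found on this Windows machine; the legacy `mongo` shell was removed in MongoDB 6.0 in favour of `mongosh`, which accepts the same `<db> --eval "db.dropDatabase()"` form. The cleanup can also be done from Python with `MongoClient.drop_database()`, as in this minimal sketch (the helper name `drop_databases` is hypothetical):

```python
def drop_databases(client, names):
    """Drop each named database through an existing PyMongo MongoClient."""
    for name in names:
        client.drop_database(name)


# Usage with the client created earlier in the notebook:
# drop_databases(client, ["test_database", "MovieLens", "Blog"])
```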
