# Mongo tutorial

## Prerequisites

### Documentation

Reference documentation for this tutorial:
* [Mongo commands](https://docs.mongodb.com/manual/reference/)
* [Mongo python client](http://api.mongodb.com/python/current/api/pymongo/mongo_client.html#pymongo.mongo_client.MongoClient)

### Import libraries

In [87]:
import datetime
from pprint import pprint

import pymongo
from pymongo import MongoClient, errors

In [88]:
client = MongoClient('localhost', 27017)

In [89]:
# let's work in a test_database
db = client.test_database
posts = db.posts

In [90]:
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.utcnow()
}
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('6793941cdf3d5360bbc84b73')

In [91]:
db.list_collection_names()

['test_collection', 'posts']

In [92]:
pprint(posts.find_one())

{'_id': ObjectId('67938a0edf3d5360bbc84b6c'),
 'author': 'Mike',
 'date': datetime.datetime(2025, 1, 24, 12, 39, 42, 238000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


You can open a terminal alongside the notebook, connect to your server with the mongo shell, and check that the document is present:

```bash
vagrant@nosql:~$ mongo
> show databases;
admin          0.000GB
config         0.000GB
local          0.000GB
test_database  0.000GB
> use test_database;
switched to db test_database
> db.posts.find()
{ 
    "_id" : ObjectId("..."), 
    "author" : "Mike", 
    "text" : "My first blog post!", 
    "tags" : [ "mongodb", "python", "pymongo"], 
    "date" : ISODate("2019-02-10T11:33:47.883Z") 
}
```

## I. Quick start

### First steps

**Q**: Create a document `{msg: 'hello'}` in the `test` collection with `insert_one()`, then fetch it back and display it. What is the `_id` field for?

NB : if the collection doesn't exist yet, MongoDB automatically creates it.

In [93]:
# Build the document
doc = {
    'msg' : 'hello'
}

# Get a handle on the collection (MongoDB creates it on the first insert);
# db.test_collection and db['test_collection'] are equivalent
test_collection = db['test_collection']

# Insert the document
test_collection.insert_one(doc)

InsertOneResult(ObjectId('6793941cdf3d5360bbc84b74'), acknowledged=True)

**Q**: Display the number of documents inside the `test` collection

In [94]:
test_collection.count_documents({})

3

### Interacting with a database

We have 2 `.json` files we want to interact with inside the `data` folder. Let's first dump them into a `MovieLens` database, inside `users` and `movies` collections.

For this section, you will need to read a bit about [query operators](https://docs.mongodb.com/manual/reference/operator/query/#query-selectors). Most collection methods you will use take `filter` as their first parameter, to which you pass a dictionary describing the query.
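As a rough illustration of what such a filter dictionary looks like, the sketch below builds three filters as plain Python dictionaries. The helper name `age_below` is made up for this example and nothing here talks to a server; in the notebook, dictionaries like these would be passed as the first argument of `find()` or `count_documents()`.

```python
# Filters are ordinary Python dictionaries; no MongoDB connection is needed to build them.

def age_below(limit):
    """Hypothetical helper: filter matching documents whose `age` is strictly below `limit`."""
    return {"age": {"$lt": limit}}

# Equality on a single field.
comedy_filter = {"genres": "Comedy"}

# Comparison operator: field -> {operator: value}.
minors_filter = age_below(18)

# Several keys in one dictionary are combined with an implicit AND.
young_men_filter = {"gender": "M", **age_below(18)}

print(comedy_filter)
print(minors_filter)
print(young_men_filter)
```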

**Q** : In the `MovieLens` database, load `data/movielens_movies.json` into `movies` and `data/movielens_users.json` into `users`. 

Use the dedicated shell command for this : `mongoimport --db <some_db> --collection <some_collection> --file <some_file>` 

In [95]:
%pwd

'c:\\Users\\pbourbon\\Documents\\GitHub\\lyon2-nosql\\notebooks'

In [96]:
# Import the movies
!mongoimport --db MovieLens --collection movies --file ../data/movielens_movies.json

# Import the users
!mongoimport --db MovieLens --collection users --file ../data/movielens_users.json

2025-01-24T14:22:37.058+0100	connected to: localhost
2025-01-24T14:22:37.144+0100	error inserting documents: multiple errors in bulk operation:
  - E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 12 }
  - E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 6 }
  - E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 13 }
  - E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 14 }
  - E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 15 }
  - E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 16 }
  - E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 17 }
  - E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 19 }
  - E11000 duplicate key error collection: MovieLens.movies index: _id_ dup key: { _id: 18 }
  - E11000 duplicate

**Q**: How many users are in the `MovieLens` database?

In [97]:
db2 = client['MovieLens']
users_collection = db2['users']

user_count = users_collection.count_documents({})
print(f"Number of users: {user_count}")

Number of users: 6041


**Q** : Display all comedies (the `genres` property equals `Comedy`). 

NB: You will need to find out how to iterate over the `Cursor` returned by `find()`, then use the `pprint` function to display the documents more readably.

In [98]:
movies_collection = db2["movies"]

cursor = movies_collection.find({"genres":"Comedy"})

for movie in cursor:
    pprint(movie)

{'_id': 19,
 'genres': 'Comedy',
 'title': 'Ace Ventura: When Nature Calls (1995)'}
{'_id': 5, 'genres': 'Comedy', 'title': 'Father of the Bride Part II (1995)'}
{'_id': 38, 'genres': 'Comedy', 'title': 'It Takes Two (1995)'}
{'_id': 63,
 'genres': 'Comedy',
 'title': "Don't Be a Menace to South Central While Drinking Your Juice in the "
          'Hood (1996)'}
{'_id': 65, 'genres': 'Comedy', 'title': 'Bio-Dome (1996)'}
{'_id': 69, 'genres': 'Comedy', 'title': 'Friday (1995)'}
{'_id': 52, 'genres': 'Comedy', 'title': 'Mighty Aphrodite (1995)'}
{'_id': 101, 'genres': 'Comedy', 'title': 'Bottle Rocket (1996)'}
{'_id': 88, 'genres': 'Comedy', 'title': 'Black Sheep (1996)'}
{'_id': 102, 'genres': 'Comedy', 'title': 'Mr. Wrong (1996)'}
{'_id': 104, 'genres': 'Comedy', 'title': 'Happy Gilmore (1996)'}
{'_id': 109, 'genres': 'Comedy', 'title': 'Headless Body in Topless Bar (1995)'}
{'_id': 115, 'genres': 'Comedy', 'title': 'Happiness Is in the Field (1995)'}
{'_id': 96, 'genres': 'Comedy', '

**Q**: Fetch and display the `name` and `occupation` of Clifford Johnathan. The second parameter of `find()` ([doc here](https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find)) is called the `projection` and limits which fields are returned by the query.

In [99]:
cursor2 = users_collection.find({"name":"Clifford Johnathan"}, projection=["name", "occupation"])

for user in cursor2:
    pprint(user)

{'_id': 1276, 'name': 'Clifford Johnathan', 'occupation': 'technician/engineer'}
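PyMongo accepts the projection either as a list of field names or as a dictionary. The sketch below only builds the dictionary form locally, to show what the list used above stands for; `projection_from_fields` is a made-up helper name, not a PyMongo function. With the list form `_id` is always returned, so excluding it requires the dictionary form.

```python
def projection_from_fields(fields, include_id=True):
    """Hypothetical helper: turn a list of field names into a projection dictionary."""
    projection = {field: 1 for field in fields}
    if not include_id:
        # `_id` is returned by default and has to be excluded explicitly.
        projection["_id"] = 0
    return projection

with_id = projection_from_fields(["name", "occupation"])
without_id = projection_from_fields(["name", "occupation"], include_id=False)

print(with_id)
print(without_id)
```

Either dictionary could then be passed as the `projection` argument of `find()`.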


**Q**: How many minors (by `age`) have rated movies ?

In [100]:
nbrs_minors = users_collection.count_documents({"age" : {"$lt" : 18} })

print(nbrs_minors)

222


**Q**: Display science fiction movies ('Sci-Fi') and suspense movies ('Thriller'). This time you need to use a regex to parse genres and look for those values.

In [101]:
cursor3 = movies_collection.find({
    "genres": {
        "$regex":"(Sci-Fi|Thriller)",
    }
})

compteur = 0
for movie in cursor3:
    compteur +=1 
    pprint(movie)

print(compteur)

{'_id': 16, 'genres': 'Drama|Thriller', 'title': 'Casino (1995)'}
{'_id': 18, 'genres': 'Thriller', 'title': 'Four Rooms (1995)'}
{'_id': 6, 'genres': 'Action|Crime|Thriller', 'title': 'Heat (1995)'}
{'_id': 23, 'genres': 'Thriller', 'title': 'Assassins (1995)'}
{'_id': 22, 'genres': 'Crime|Drama|Thriller', 'title': 'Copycat (1995)'}
{'_id': 24, 'genres': 'Drama|Sci-Fi', 'title': 'Powder (1995)'}
{'_id': 29,
 'genres': 'Adventure|Sci-Fi',
 'title': 'City of Lost Children, The (1995)'}
{'_id': 10, 'genres': 'Action|Adventure|Thriller', 'title': 'GoldenEye (1995)'}
{'_id': 32, 'genres': 'Drama|Sci-Fi', 'title': 'Twelve Monkeys (1995)'}
{'_id': 61, 'genres': 'Drama|Thriller', 'title': 'Eye for an Eye (1996)'}
{'_id': 66,
 'genres': 'Sci-Fi|Thriller',
 'title': 'Lawnmower Man 2: Beyond Cyberspace (1996)'}
{'_id': 47, 'genres': 'Crime|Thriller', 'title': 'Seven (Se7en) (1995)'}
{'_id': 70,
 'genres': 'Action|Comedy|Crime|Horror|Thriller',
 'title': 'From Dusk Till Dawn (1996)'}
{'_id': 50, 
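To see why this pattern works on the pipe-separated `genres` strings, the alternation can be tried locally with Python's `re` module. This is only an approximation: MongoDB evaluates `$regex` with its own Perl-compatible engine, but for a simple, unanchored alternation like this one the two should agree. The sample strings are copied from the output above, plus one comedy that should not match.

```python
import re

# Same alternation as in the query; unanchored, so it may match anywhere in the string.
pattern = re.compile(r"Sci-Fi|Thriller")

samples = [
    "Drama|Thriller",
    "Adventure|Sci-Fi",
    "Sci-Fi|Thriller",
    "Comedy",
]

for genres in samples:
    # re.search mirrors the "match anywhere" behaviour of an unanchored $regex.
    print(f"{genres!r:22} -> {bool(pattern.search(genres))}")

matching = [g for g in samples if pattern.search(g)]
```

Note that the `|` inside the pattern is the regex alternation, while the `|` inside the data is just a literal separator between genres.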

**Q**: If we want more advanced textual search, we need a particular index. Use the `create_index()` method to index as [TEXT](https://docs.mongodb.com/manual/core/index-text/) the `genres` field of the `movies` collection.

In [102]:
# List-of-tuples form: (field name, index type)
movies_collection.create_index([("genres", pymongo.TEXT)])

'genres_text'

**Q**: Repeat the search for science fiction and thriller movies, this time with the `$text` operator.

In [103]:
cursor4 = movies_collection.find({
    "$text": {
        "$search" : "Sci-Fi Thriller"
    }
})

compteur2 = 0
for movie in cursor4:
    compteur2 +=1 

print(compteur2)

698


**Q**: Display the first 30 movies (`limit`) in alphabetical order (`sort`) by title

In [104]:
result = movies_collection.find().sort('title',1).limit(30)

compt = 0
for movie in result : 
    compt += 1
    print(movie)

print(compt)

{'_id': 2031, 'title': '$1,000,000 Duck (1971)', 'genres': "Children's|Comedy"}
{'_id': 3112, 'title': "'Night Mother (1986)", 'genres': 'Drama'}
{'_id': 779, 'title': "'Til There Was You (1997)", 'genres': 'Drama|Romance'}
{'_id': 2072, 'title': "'burbs, The (1989)", 'genres': 'Comedy'}
{'_id': 3420, 'title': '...And Justice for All (1979)', 'genres': 'Drama|Thriller'}
{'_id': 889, 'title': '1-900 (1994)', 'genres': 'Romance'}
{'_id': 2572, 'title': '10 Things I Hate About You (1999)', 'genres': 'Comedy|Romance'}
{'_id': 2085, 'title': '101 Dalmatians (1961)', 'genres': "Animation|Children's"}
{'_id': 1367, 'title': '101 Dalmatians (1996)', 'genres': "Children's|Comedy"}
{'_id': 1203, 'title': '12 Angry Men (1957)', 'genres': 'Drama'}
{'_id': 2826, 'title': '13th Warrior, The (1999)', 'genres': 'Action|Horror|Thriller'}
{'_id': 1609, 'title': '187 (1997)', 'genres': 'Drama'}
{'_id': 999, 'title': '2 Days in the Valley (1996)', 'genres': 'Crime'}
{'_id': 2492, 'title': '20 Dates (1998)
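The order above may look surprising: titles starting with `$`, `'` or a digit come before any letter, and `'Til` comes before `'burbs`. Without a collation, MongoDB compares strings by their binary value, which for ASCII text gives the same order as Python's built-in `sorted()`. The sketch below reproduces this locally on a handful of titles taken from the outputs in this notebook.

```python
titles = [
    "Ace Ventura: When Nature Calls (1995)",
    "12 Angry Men (1957)",
    "'burbs, The (1989)",
    "$1,000,000 Duck (1971)",
    "'Til There Was You (1997)",
]

# Code-point order: punctuation < digits < uppercase letters < lowercase letters.
ordered = sorted(titles)

# Keep the first three, like limit(3) would.
for title in ordered[:3]:
    print(title)
```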

**Q**: How many users have seen the movie "Star Wars: Episode V - The Empire Strikes Back (1980)" (`_id` 1196)? The `movies` field is an array of documents, so try the [elemMatch](https://docs.mongodb.com/manual/reference/operator/projection/elemMatch/) operator here.

In [105]:
results_swV = users_collection.find({"movies" : {"$elemMatch" : {"movieid" :1196}}})

compt_swV = 0
for user in results_swV : 
    compt_swV += 1
    
print(compt_swV)

2990


**Q**: And how many gave it a rating of 1 or 2 ?

In [106]:
results_swV_rate = users_collection.find({"movies" : {"$elemMatch" : {"movieid" :1196, "rating": {"$in" : [1,2]}}}})

compt_swV_rating = 0
for user in results_swV_rate : 
    compt_swV_rating += 1
    
print(compt_swV_rating)

105
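The reason `$elemMatch` matters in the last query is that it requires a single array element to satisfy all conditions at once. Two separate dot-notation conditions (`movies.movieid` and `movies.rating`) could instead be satisfied by two different elements of the array. The pure-Python sketch below imitates both behaviours on a made-up user; `elem_match` and `dotted_match` are illustrative names, not MongoDB or PyMongo functions.

```python
def elem_match(movies, movieid, ratings):
    """True if ONE element has the given movieid AND a rating in `ratings`."""
    return any(m["movieid"] == movieid and m["rating"] in ratings for m in movies)

def dotted_match(movies, movieid, ratings):
    """True if SOME element has the movieid and SOME (possibly other) element has the rating."""
    return any(m["movieid"] == movieid for m in movies) and any(
        m["rating"] in ratings for m in movies
    )

# Made-up user: loved movie 1196, disliked another (invented) movie.
user_movies = [
    {"movieid": 1196, "rating": 5},
    {"movieid": 260, "rating": 1},
]

print(elem_match(user_movies, 1196, {1, 2}))    # False
print(dotted_match(user_movies, 1196, {1, 2}))  # True
```

With the dot-notation form this user would be counted among those who rated the movie 1 or 2, which is not what the question asks.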


### Updating data

**Q**: Insert a new user with the properties `name`, `gender` ('M' or 'F'), `occupation` and `age`, using the `insert_one()` command. Display it with `find_one()`.

In [107]:
new_user = {
    'name': 'Roger',
    'gender' : 'M',
    'occupation' : 'data scientist',
    'age' : 64  # stored as a number so that comparisons such as $lt work
}

try :
    result2 = users_collection.insert_one(new_user)
    print(f"User inserted with ID: {result2.inserted_id}")
except errors.DuplicateKeyError:
    print("Error: a user with this _id already exists.")

User inserted with ID: 67939420df3d5360bbc84b75


**Q**: Record a rating for a movie the user has seen: with `update_one()`, add a `movies` property containing an array with one document (`movieid`, `rating`, and `timestamp` set to `datetime.datetime.utcnow()`).

You will need to read the documentation on [update operators](https://docs.mongodb.org/manual/reference/operator/update/).

In [108]:
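# Match the user inserted above (avoid shadowing the built-in `filter`)
user_filter = {'name' : 'Roger'}

# Add a `movies` array holding one rating document;
# movieid is an integer, like the ids already in the dataset
update = {
    "$set": {
        'movies' : [{
            'movieid' : 4352432,
            'rating' : 5,
            'timestamp' : datetime.datetime.utcnow()
        }]
    }
}

result3 = users_collection.update_one(user_filter, update, upsert=True)

if result3.upserted_id:
    print(f"User inserted with ID: {result3.upserted_id}")
else:
    print("Update applied.")


Update applied.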


**Q**: Find the number of users who have declared a `programmer` occupation. Modify them so that they are `developer`. Verify your update.

In [109]:
result4 = users_collection.find({'occupation' : 'programmer'})

compt_prog = 0

for prog in result4:
    compt_prog += 1

print(f"{compt_prog} users declared the occupation programmer")

0 users declared the occupation programmer


In [110]:
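# Change the occupation from programmer to developer
filter1 = {'occupation' : 'programmer'}

update1 = {
    "$set" : {
        'occupation' : 'developer'
    }
}

result5 = users_collection.update_many(filter1, update1)

# Verify: no programmer should remain after the update
print(f"Modified: {result5.modified_count}")
print(f"Remaining programmers: {users_collection.count_documents(filter1)}")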

## II. Modelling a blog

We will now model a blog using Mongo. 

First, switch to a new `Blog` database. Each blog post will have the following fields:

* The author (author field, string type)
* The date (date field, string type in YYYY-MM-DD format)
* The content (field content)
* Tags (field tags, a string array)
* A list of comments (field comments) containing:
  * The author (author field, string type)
  * The date (date field, string type in YYYY-MM-DD format)
  * The content (field content)
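One possible way to keep posts consistent with this schema is to build them through small helpers. The functions `check_date`, `make_comment` and `make_post` below are illustrative names, not part of PyMongo; they only build plain dictionaries and check that dates follow the zero-padded `YYYY-MM-DD` format. The example values are invented.

```python
import datetime

def check_date(date):
    """Return `date` unchanged if it is a zero-padded YYYY-MM-DD string, else raise ValueError."""
    parsed = datetime.datetime.strptime(date, "%Y-%m-%d")
    # strptime also accepts "2025-1-5", so compare against the canonical form.
    if parsed.strftime("%Y-%m-%d") != date:
        raise ValueError(f"date must be zero-padded YYYY-MM-DD, got {date!r}")
    return date

def make_comment(author, date, content):
    """Build a comment sub-document."""
    return {"author": author, "date": check_date(date), "content": content}

def make_post(author, date, content, tags, comments=None):
    """Build a post document; `comments` defaults to an empty list."""
    return {
        "author": author,
        "date": check_date(date),
        "content": content,
        "tags": list(tags),
        "comments": list(comments or []),
    }

post = make_post(
    "alice", "2025-02-01", "Hello", ["example"],
    comments=[make_comment("bob", "2025-02-02", "Nice")],
)
print(post)
```

A dictionary built this way could be passed directly to `insert_one()`.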


**Q**: Create a first post by `rick`, on January 15th, with the tags `mongodb` and `nosql`.

In [115]:
#Blog database
db_blog = client.Blog 
blog_collection = db_blog['Blog']

#First post
first_post = {
    "author": "Rick",
    "date": "2025-01-15",
    "content": "This is Rick's first post. Stay tuned for more content about MongoDB and NoSQL.",
    "tags": ["mongodb", "nosql"],
    "comments": []  
}

# Insert the post
insert_post = blog_collection.insert_one(first_post)

print(f"First post inserted with ID: {insert_post.inserted_id}")

First post inserted with ID: 67939ab8df3d5360bbc84b77


**Q**: Create a second post by `kate`, on January 21, with the tag `nosql` and a comment from `rick` on the same day.

In [116]:
second_post = {
    "author": "Kate",
    "date": "2025-01-21",
    "content": "This is Kate's second post, diving deeper into NoSQL databases.",
    "tags": ["nosql"],
    "comments": [
        {
            "author": "Rick",
            "date": "2025-01-21",
            "content": "Great post, Kate! I love learning more about NoSQL."
        }
    ]
}

insert_second_post = blog_collection.insert_one(second_post)

print(f"Second post inserted with ID: {insert_second_post.inserted_id}")

Second post inserted with ID: 67939b1bdf3d5360bbc84b78


**Q**: Display the author of the last post with the tag `nosql`

In [117]:
last_post = blog_collection.find_one(
    {"tags":"nosql"},
    sort=[("date", -1)]
)

if last_post:
    print(f"The author of the last 'nosql' post is: {last_post['author']}")
else:
    print("No posts found with the 'nosql' tag.")

The author of the last 'nosql' post is: Kate
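Sorting on `date` works here even though the field is a string, because zero-padded `YYYY-MM-DD` strings sort lexicographically in the same order as the dates they represent. A quick local check, using the two post dates from above plus one invented date, and a counter-example showing what goes wrong when the padding is dropped:

```python
import datetime

dates = ["2025-01-21", "2025-01-15", "2024-12-31"]

# Plain string sort versus a sort on the parsed dates.
by_string = sorted(dates)
by_date = sorted(dates, key=datetime.date.fromisoformat)
print(by_string == by_date)  # True

# Without zero-padding the string order breaks: "2025-1-15" sorts before "2025-1-9".
unpadded = sorted(["2025-1-9", "2025-1-15"])
print(unpadded)  # ['2025-1-15', '2025-1-9']
```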


**Q**: Add a comment by `jack` on January 25, to `kate`'s post

In [119]:
# Jack's comment
new_comment = {
    "author":"Jack",
    "date":"2025-01-25",
    "content":"Great insights!"
}

result6 = blog_collection.update_one(
    {"author": "Kate", "date": "2025-01-21"},  
    {"$push": {"comments": new_comment}}
)

if result6.modified_count > 0:
    print("Comment added successfully!")
else:
    print("No matching post found, or comment not added.")

Comment added successfully!


**Q**: Display all comments by `rick`

In [123]:
comments_by_rick = blog_collection.find(
    {"comments.author": "Rick"},
    {"comments": 1}  # "comments.$" would return only the first matching comment per post
)

for post in comments_by_rick:
    for comment in post["comments"]:
        if comment["author"] == "Rick":
            print(f"Comment by Rick (Date: {comment['date']}): {comment['content']}")
            print("-----")

Comment by Rick (Date: 2025-01-21): Great post, Kate! I love learning more about NoSQL.
-----


## Postquisites

In [111]:
# The `mongo` shell is not on the PATH here, so drop the database from Python instead
client.drop_database('test_database')


In [112]:
# The `mongo` shell is not on the PATH here, so drop the database from Python instead
client.drop_database('MovieLens')


In [113]:
# The `mongo` shell is not on the PATH here, so drop the database from Python instead
client.drop_database('Blog')
