# Mongo tutorial

## Prerequisites

### Documentation

You will find the full documentation for:
* [Mongo commands](https://docs.mongodb.com/manual/reference/)
* [Mongo python client](http://api.mongodb.com/python/current/api/pymongo/mongo_client.html#pymongo.mongo_client.MongoClient)

### Import libraries

In [1]:
import datetime
from pprint import pprint

import pymongo
from pymongo import MongoClient

In [2]:
client = MongoClient('localhost', 27017)

In [3]:
# let's work in a test_database
db = client.test_database
posts = db.posts

In [4]:
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.utcnow()
}
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('6793938d4afc85658fafd38e')

In [5]:
db.list_collection_names()

['posts']

In [6]:
pprint(posts.find_one())

{'_id': ObjectId('6793938d4afc85658fafd38e'),
 'author': 'Mike',
 'date': datetime.datetime(2025, 1, 24, 13, 20, 13, 153000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


You can open a separate terminal, connect to your server with the mongo shell (`mongosh` in MongoDB 6.0 and later, which no longer ship the legacy `mongo` shell) and check that the document is present:

```bash
vagrant@nosql:~$ mongo
> show databases;
admin          0.000GB
config         0.000GB
local          0.000GB
test_database  0.000GB
> use test_database;
switched to db test_database
> db.posts.find()
{ 
    "_id" : ObjectId("..."), 
    "author" : "Mike", 
    "text" : "My first blog post!", 
    "tags" : [ "mongodb", "python", "pymongo"], 
    "date" : ISODate("2019-02-10T11:33:47.883Z") 
}
```

## I. Quick start

### First steps

**Q**: Create a document `{msg: 'hello'}` in the `test` collection with `insert_one()`. Fetch it back and display it. What is the `_id` for?

NB: if the collection does not exist yet, MongoDB creates it automatically.

In [7]:
test = db.test
test1 = {"msg": "hello"}

test.insert_one(test1)

pprint(test.find_one())

{'_id': ObjectId('6793938d4afc85658fafd38f'), 'msg': 'hello'}
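As for the `_id` question: `_id` is the document's primary key. It must be unique within the collection (MongoDB enforces this with the index it creates automatically on `_id`), and when you do not supply one, PyMongo generates an `ObjectId` before sending the document. An ObjectId is 12 bytes, and its first 4 bytes are a big-endian Unix timestamp in seconds, so it also records roughly when the document was created. The sketch below decodes that timestamp with the standard library only; `objectid_timestamp` is a hypothetical helper name, and PyMongo exposes the same value as `ObjectId.generation_time`.

```python
import datetime


def objectid_timestamp(oid_hex: str) -> datetime.datetime:
    """Return the creation time embedded in an ObjectId given as 24 hex characters.

    Hypothetical helper for illustration; PyMongo's ObjectId.generation_time
    returns the same value.
    """
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    # Bytes 0-3: seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


# ObjectId of the {"msg": "hello"} document inserted above
print(objectid_timestamp("6793938d4afc85658fafd38f"))  # → 2025-01-24 13:20:13+00:00
```

This matches the `date` stored in the first blog post (13:20:13 UTC), which was inserted in the same second.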


**Q**: Display the number of documents inside the `test` collection

In [8]:
test.count_documents({})

1

### Interacting with a database

The `data` folder contains two `.json` files we want to work with. Let's first load them into a `MovieLens` database, in the `users` and `movies` collections.

For this section, you will need to read up on [query operators](https://docs.mongodb.com/manual/reference/operator/query/#query-selectors). Most collection methods you will use take `filter` as their first parameter: a dictionary of query conditions.

**Q**: In the `MovieLens` database, load `data/movielens_movies.json` into `movies` and `data/movielens_users.json` into `users`.

Use the dedicated shell command for this: `mongoimport --db <some_db> --collection <some_collection> --file <some_file>`

In [9]:
# Handles to the MovieLens database and its users and movies collections
MovieLens = client.MovieLens
users = MovieLens.users
movies = MovieLens.movies

!mongoimport --db MovieLens --collection users --file "../data/movielens_users.json"
!mongoimport --db MovieLens --collection movies --file "../data/movielens_movies.json"

# List the collections
print(MovieLens.list_collection_names())

2025-01-24T14:20:13.649+0100	connected to: localhost
2025-01-24T14:20:15.082+0100	imported 6040 documents


['users', 'movies']


2025-01-24T14:20:15.346+0100	connected to: localhost
2025-01-24T14:20:15.528+0100	imported 3883 documents
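If you would rather build the import command from Python than type `!mongoimport` by hand, the sketch below assembles the same arguments as a list. `--drop` (a real `mongoimport` option) empties the target collection before importing, so re-running the cell does not run into duplicate `_id` errors; `--jsonArray` is only needed when a file holds a single JSON array rather than the stream of JSON documents `mongoimport` expects by default. The helper name `mongoimport_args` is hypothetical, and the paths are the ones used in the cell above.

```python
def mongoimport_args(db: str, collection: str, path: str,
                     drop: bool = True, json_array: bool = False) -> list[str]:
    """Build the argument list for a mongoimport call (hypothetical helper)."""
    args = ["mongoimport", "--db", db, "--collection", collection, "--file", path]
    if drop:
        # drop the collection first, so a re-run starts from an empty collection
        args.append("--drop")
    if json_array:
        # the file is one JSON array instead of a stream of documents
        args.append("--jsonArray")
    return args


print(" ".join(mongoimport_args("MovieLens", "users", "../data/movielens_users.json")))
```

To run it, pass the list to `subprocess.run(args, check=True)`; a list of arguments (rather than one shell string) avoids quoting problems with paths that contain spaces.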


**Q**: How many users are in the `MovieLens` database?

In [10]:
users.count_documents({})

6040

**Q** : Display all comedies (the `genres` property equals `Comedy`). 

NB: `find()` returns a `Cursor`, not a list. Find out how to iterate over it, then use the `pprint` function to display those documents more readably.

In [11]:
comedy = movies.find({"genres": "Comedy"})
for movie in comedy:
    pprint(movie)

{'_id': 19,
 'genres': 'Comedy',
 'title': 'Ace Ventura: When Nature Calls (1995)'}
{'_id': 5, 'genres': 'Comedy', 'title': 'Father of the Bride Part II (1995)'}
{'_id': 38, 'genres': 'Comedy', 'title': 'It Takes Two (1995)'}
{'_id': 63,
 'genres': 'Comedy',
 'title': "Don't Be a Menace to South Central While Drinking Your Juice in the "
          'Hood (1996)'}
{'_id': 65, 'genres': 'Comedy', 'title': 'Bio-Dome (1996)'}
{'_id': 69, 'genres': 'Comedy', 'title': 'Friday (1995)'}
{'_id': 52, 'genres': 'Comedy', 'title': 'Mighty Aphrodite (1995)'}
{'_id': 88, 'genres': 'Comedy', 'title': 'Black Sheep (1996)'}
{'_id': 96, 'genres': 'Comedy', 'title': 'In the Bleak Midwinter (1995)'}
{'_id': 101, 'genres': 'Comedy', 'title': 'Bottle Rocket (1996)'}
{'_id': 102, 'genres': 'Comedy', 'title': 'Mr. Wrong (1996)'}
{'_id': 115, 'genres': 'Comedy', 'title': 'Happiness Is in the Field (1995)'}
{'_id': 119, 'genres': 'Comedy', 'title': 'Steal Big, Steal Little (1995)'}
{'_id': 125, 'genres': 'Comedy

**Q**: Fetch and display the `name` and `occupation` of Clifford Johnathan. The second parameter of `find()` ([doc here](https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find)) is called `projection`; it limits which fields the query returns.

In [12]:
docs = users.find({"name": "Clifford Johnathan"}, projection=["name", "occupation"])
for user in docs:
    pprint(user)

{'_id': 1276, 'name': 'Clifford Johnathan', 'occupation': 'technician/engineer'}
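Passing a list of field names is PyMongo shorthand for an inclusion projection (`{"name": 1, "occupation": 1}`). `_id` is always returned unless you exclude it explicitly, which is why it appears above; the dictionary form lets you drop it, e.g. `users.find({"name": "Clifford Johnathan"}, {"name": 1, "occupation": 1, "_id": 0})`. The pure-Python function below is a hypothetical model of what a top-level inclusion projection returns (MongoDB applies projections on the server), using the document printed above with its other fields elided.

```python
def project(doc: dict, include: list, keep_id: bool = True) -> dict:
    """Model of a top-level inclusion projection such as {"name": 1, "occupation": 1}.

    Hypothetical helper for illustration only.
    """
    fields = set(include)
    if keep_id:
        # _id is returned by default unless explicitly excluded
        fields.add("_id")
    return {key: value for key, value in doc.items() if key in fields}


clifford = {
    "_id": 1276,
    "name": "Clifford Johnathan",
    "occupation": "technician/engineer",
    "movies": [],  # ratings elided for brevity
}

print(project(clifford, ["name", "occupation"]))
# → {'_id': 1276, 'name': 'Clifford Johnathan', 'occupation': 'technician/engineer'}
print(project(clifford, ["name", "occupation"], keep_id=False))
# → {'name': 'Clifford Johnathan', 'occupation': 'technician/engineer'}
```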


**Q**: How many minors (by `age`) have rated movies?

In [13]:
# count the users younger than 18
users.count_documents({"age": {"$lt": 18}})

222

**Q**: Display science fiction movies ('Sci-Fi') and suspense movies ('Thriller'). This time, use a regex on the `genres` field to look for those values.

In [14]:
compte = 0
for movie in movies.find({"genres": {"$regex": "(Sci-Fi|Thriller)"}}):
    pprint(movie)
    compte += 1
compte

{'_id': 16, 'genres': 'Drama|Thriller', 'title': 'Casino (1995)'}
{'_id': 18, 'genres': 'Thriller', 'title': 'Four Rooms (1995)'}
{'_id': 6, 'genres': 'Action|Crime|Thriller', 'title': 'Heat (1995)'}
{'_id': 22, 'genres': 'Crime|Drama|Thriller', 'title': 'Copycat (1995)'}
{'_id': 23, 'genres': 'Thriller', 'title': 'Assassins (1995)'}
{'_id': 24, 'genres': 'Drama|Sci-Fi', 'title': 'Powder (1995)'}
{'_id': 29,
 'genres': 'Adventure|Sci-Fi',
 'title': 'City of Lost Children, The (1995)'}
{'_id': 10, 'genres': 'Action|Adventure|Thriller', 'title': 'GoldenEye (1995)'}
{'_id': 32, 'genres': 'Drama|Sci-Fi', 'title': 'Twelve Monkeys (1995)'}
{'_id': 47, 'genres': 'Crime|Thriller', 'title': 'Seven (Se7en) (1995)'}
{'_id': 50, 'genres': 'Crime|Thriller', 'title': 'Usual Suspects, The (1995)'}
{'_id': 61, 'genres': 'Drama|Thriller', 'title': 'Eye for an Eye (1996)'}
{'_id': 66,
 'genres': 'Sci-Fi|Thriller',
 'title': 'Lawnmower Man 2: Beyond Cyberspace (1996)'}
{'_id': 70,
 'genres': 'Action|Come

698
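`genres` is stored as one pipe-separated string (e.g. `'Action|Crime|Thriller'`), so the unanchored pattern `(Sci-Fi|Thriller)` matches those names anywhere in the string. That works for this dataset, but a pattern anchored on the `|` separators only matches whole genre names, which is safer if one genre name could ever appear inside another. The sketch checks both patterns with Python's `re` on genre strings from the output above; the last string is hypothetical, and the patterns only use syntax that Python's `re` and MongoDB's PCRE-based `$regex` share.

```python
import re

# Unanchored pattern used in the cell above: matches the names anywhere in the string.
LOOSE = re.compile(r"(Sci-Fi|Thriller)")
# Anchored on the "|" separators or the string ends: matches whole genre names only.
STRICT = re.compile(r"(^|\|)(Sci-Fi|Thriller)(\||$)")

genres = [
    "Drama|Thriller",
    "Action|Crime|Thriller",
    "Adventure|Sci-Fi",
    "Sci-Fi|Thriller",
    "Comedy",
    "Psychological Thriller",  # hypothetical genre name, not in the dataset
]

for g in genres:
    print(f"{g!r:28} loose={bool(LOOSE.search(g))} strict={bool(STRICT.search(g))}")

# The anchored version as a PyMongo filter:
# movies.find({"genres": {"$regex": r"(^|\|)(Sci-Fi|Thriller)(\||$)"}})
```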

**Q**: More advanced text search requires a dedicated index. Use the `create_index()` method to create a [TEXT](https://docs.mongodb.com/manual/core/index-text/) index on the `genres` field of the `movies` collection.

In [15]:
movies.create_index([("genres", pymongo.TEXT)])

'genres_text'

**Q**: Run the search for science fiction and thriller movies again, this time with the `$text` operator.

In [16]:
compte = 0
for movie in movies.find({"$text": {"$search": "Sci-Fi Thriller"}}):
    pprint(movie)
    compte += 1
compte

{'_id': 3934, 'genres': 'Sci-Fi', 'title': 'Kronos (1957)'}
{'_id': 3878, 'genres': 'Sci-Fi', 'title': 'X: The Unknown (1956)'}
{'_id': 3780, 'genres': 'Sci-Fi', 'title': 'Rocketship X-M (1950)'}
{'_id': 3779, 'genres': 'Sci-Fi', 'title': 'Project Moon Base (1953)'}
{'_id': 3687, 'genres': 'Sci-Fi', 'title': 'Light Years (1988)'}
{'_id': 3658, 'genres': 'Sci-Fi', 'title': 'Quatermass and the Pit (1967)'}
{'_id': 3486, 'genres': 'Sci-Fi', 'title': 'Devil Girl From Mars (1954)'}
{'_id': 3375, 'genres': 'Sci-Fi', 'title': 'Destination Moon (1950)'}
{'_id': 3354, 'genres': 'Sci-Fi', 'title': 'Mission to Mars (2000)'}
{'_id': 3032, 'genres': 'Sci-Fi', 'title': 'Omega Man, The (1971)'}
{'_id': 2698, 'genres': 'Sci-Fi', 'title': 'Zone 39 (1997)'}
{'_id': 2667, 'genres': 'Sci-Fi', 'title': 'Mole People, The (1956)'}
{'_id': 2666, 'genres': 'Sci-Fi', 'title': 'It Conquered the World (1956)'}
{'_id': 2665,
 'genres': 'Sci-Fi',
 'title': 'Earth Vs. the Flying Saucers (1956)'}
{'_id': 2663,
 'genr

698
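`$text` treats the space-separated words in `$search` as alternatives (a logical OR), consistent with this query returning the same 698 movies as the regex. Unlike the regex query, it can also rank results by relevance. The sketch below only builds the query documents; the commented `find`/`sort` calls show how they would be passed to PyMongo, following the text-score sorting pattern described in the PyMongo `Cursor.sort` documentation. The variable names are illustrative.

```python
# Filter: documents whose text index matches "Sci-Fi" or "Thriller".
text_filter = {"$text": {"$search": "Sci-Fi Thriller"}}

# Projection: keep title and genres, and expose the relevance score as "score".
projection = {"title": 1, "genres": 1, "score": {"$meta": "textScore"}}

# Sort specification: highest text score first.
by_score = [("score", {"$meta": "textScore"})]

# With the movies collection from this notebook:
# for movie in movies.find(text_filter, projection).sort(by_score).limit(10):
#     pprint(movie)
print(text_filter, projection, by_score, sep="\n")
```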

**Q**: Display the first 30 movies (`limit`) in alphabetical order (`sort`) by title

In [17]:
for movie in movies.find().sort("title").limit(30):
    pprint(movie)

{'_id': 2031, 'genres': "Children's|Comedy", 'title': '$1,000,000 Duck (1971)'}
{'_id': 3112, 'genres': 'Drama', 'title': "'Night Mother (1986)"}
{'_id': 779, 'genres': 'Drama|Romance', 'title': "'Til There Was You (1997)"}
{'_id': 2072, 'genres': 'Comedy', 'title': "'burbs, The (1989)"}
{'_id': 3420,
 'genres': 'Drama|Thriller',
 'title': '...And Justice for All (1979)'}
{'_id': 889, 'genres': 'Romance', 'title': '1-900 (1994)'}
{'_id': 2572,
 'genres': 'Comedy|Romance',
 'title': '10 Things I Hate About You (1999)'}
{'_id': 2085,
 'genres': "Animation|Children's",
 'title': '101 Dalmatians (1961)'}
{'_id': 1367, 'genres': "Children's|Comedy", 'title': '101 Dalmatians (1996)'}
{'_id': 1203, 'genres': 'Drama', 'title': '12 Angry Men (1957)'}
{'_id': 2826,
 'genres': 'Action|Horror|Thriller',
 'title': '13th Warrior, The (1999)'}
{'_id': 1609, 'genres': 'Drama', 'title': '187 (1997)'}
{'_id': 999, 'genres': 'Crime', 'title': '2 Days in the Valley (1996)'}
{'_id': 2492, 'genres': 'Comedy
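The titles starting with `$`, `'` and `.` come before `1-900 (1994)` because, without a collation, MongoDB compares strings with a simple binary comparison rather than dictionary order. For ASCII titles this is the same as Python's default string ordering, as the sketch shows with the first titles from the output above. For a case-insensitive, language-aware order, `find()` also accepts a `collation` argument (for example `pymongo.collation.Collation(locale="en", strength=2)`).

```python
import random

# First titles returned by movies.find().sort("title"), copied from the output above.
mongo_order = [
    "$1,000,000 Duck (1971)",
    "'Night Mother (1986)",
    "'Til There Was You (1997)",
    "'burbs, The (1989)",
    "...And Justice for All (1979)",
    "1-900 (1994)",
    "10 Things I Hate About You (1999)",
    "101 Dalmatians (1961)",
    "101 Dalmatians (1996)",
    "12 Angry Men (1957)",
    "13th Warrior, The (1999)",
    "187 (1997)",
    "2 Days in the Valley (1996)",
]

shuffled = mongo_order[:]
random.shuffle(shuffled)

# Python compares str by code point, which for ASCII matches a byte-wise comparison.
print(sorted(shuffled) == mongo_order)  # → True

# "'burbs" sorts after "'Til" because uppercase letters come before lowercase ones;
# a case-insensitive key changes the order.
print(sorted(mongo_order, key=str.lower) == mongo_order)  # → False
```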

**Q**: How many users have seen the movie "Star Wars: Episode V - The Empire Strikes Back (1980)" (`_id` 1196)? The `movies` field is an array, so try the [elemMatch](https://docs.mongodb.com/manual/reference/operator/projection/elemMatch/) operator here.

In [18]:
users.count_documents({"movies": {"$elemMatch": {"movieid": 1196}}})

2990
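For a single condition, `{"movies.movieid": 1196}` matches the same users as the `$elemMatch` query. `$elemMatch` starts to matter with two conditions on the array, as in the next question: with dot notation, each condition may be satisfied by a *different* element of `movies`, whereas `$elemMatch` requires one element that satisfies all of them. The pure-Python sketch below is a simplified model of both query forms on a hypothetical user, not MongoDB's actual matcher.

```python
# Hypothetical user: rated movie 1196 with a 5, and another movie with a 1.
user = {
    "name": "Jane Example",
    "movies": [
        {"movieid": 1196, "rating": 5},
        {"movieid": 2058, "rating": 1},
    ],
}


def dot_notation_match(doc, movieid, max_rating):
    """Model of {"movies.movieid": movieid, "movies.rating": {"$lte": max_rating}}:
    each condition is checked independently against any element."""
    ratings = doc.get("movies", [])
    return (any(m.get("movieid") == movieid for m in ratings)
            and any(m.get("rating", float("inf")) <= max_rating for m in ratings))


def elem_match(doc, movieid, max_rating):
    """Model of {"movies": {"$elemMatch": {"movieid": movieid, "rating": {"$lte": max_rating}}}}:
    a single element must satisfy both conditions."""
    return any(m.get("movieid") == movieid and m.get("rating", float("inf")) <= max_rating
               for m in doc.get("movies", []))


print(dot_notation_match(user, 1196, 2))  # → True (false positive)
print(elem_match(user, 1196, 2))          # → False
```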

**Q**: And how many gave it a rating of 1 or 2 ?

In [19]:
# a rating of 1 or 2 means rating <= 2 ("$lt": 2 would only match ratings of 1)
users.count_documents({"movies": {"$elemMatch": {"movieid": 1196, "rating": {"$lte": 2}}}})

### Updating data

**Q**: Insert a new user with the properties `name`, `gender` ('M' or 'F'), `occupation` and `age`, using the `insert_one()` command. Display it with `find_one()`.

In [20]:
# insert a new user, then fetch it back by name

new_user = {
    "name": "John Doe",
    "gender": "M",
    "occupation": "student",
    "age": 20
}

users.insert_one(new_user)

pprint(users.find_one({"name": "John Doe"}))

{'_id': ObjectId('679393904afc85658fafd390'),
 'age': 20,
 'gender': 'M',
 'name': 'John Doe',
 'occupation': 'student'}


**Q**: Add a rating for a watched movie with `update_one()`: add a `movies` property containing an array with one document (`movieid`, `rating`, and `timestamp` set to `datetime.datetime.utcnow()`).

You will need to read the documentation on [update operators](https://docs.mongodb.org/manual/reference/operator/update/).

In [21]:
users.update_one({"name": "John Doe"}, {"$push": {"movies": {"movieid": 1196, "rating": 4, "timestamp": datetime.datetime.utcnow()}}})
pprint(users.find_one({"name": "John Doe"}))

{'_id': ObjectId('679393904afc85658fafd390'),
 'age': 20,
 'gender': 'M',
 'movies': [{'movieid': 1196,
             'rating': 4,
             'timestamp': datetime.datetime(2025, 1, 24, 13, 20, 17, 4000)}],
 'name': 'John Doe',
 'occupation': 'student'}
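The imported ratings store `timestamp` as an integer number of seconds since the Unix epoch (for example `956749684`), while the cell above stores a BSON date. Mixing both types in one field makes range queries and sorts on `timestamp` awkward, because MongoDB only compares values of the same type in range queries and groups values by type when sorting. The sketch below converts between the two representations with the standard library so a new rating can follow the dataset's integer convention; it uses `datetime.now(timezone.utc)` because `datetime.utcnow()` is deprecated as of Python 3.12. The helper names are hypothetical.

```python
import datetime


def to_unix_seconds(moment: datetime.datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time ambiguity")
    return int(moment.timestamp())


def from_unix_seconds(seconds: int) -> datetime.datetime:
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


# A timestamp taken from the imported MovieLens ratings
print(from_unix_seconds(956749684))  # → 2000-04-26 11:48:04+00:00

# A new rating following the dataset's integer convention
rating = {
    "movieid": 1196,
    "rating": 4,
    "timestamp": to_unix_seconds(datetime.datetime.now(datetime.timezone.utc)),
}
print(rating)
```

With PyMongo, `users.update_one({"name": "John Doe"}, {"$push": {"movies": rating}})` would then store it in the same shape as the imported ratings.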


**Q**: Find the number of users who have declared a `programmer` occupation. Change their occupation to `developer`. Verify your update.

In [22]:
# 388 users declared "programmer" in this dataset
print(users.count_documents({"occupation": "programmer"}))

# Calling .update() on the dicts returned by find() only changes local Python
# copies; update_many() sends the change to the server.
result = users.update_many({"occupation": "programmer"},
                           {"$set": {"occupation": "developer"}})
print(result.modified_count)

# Verify: no "programmer" should be left
print(users.count_documents({"occupation": "programmer"}))

## II. Modelling a blog

We will now model a blog using Mongo. 

First, switch to a new `Blog` database. Each blog post will have the following fields:

* The author (`author` field, string)
* The date (`date` field, string in YYYY-MM-DD format)
* The content (`content` field)
* Tags (`tags` field, array of strings)
* A list of comments (`comments` field), each containing:
  * The author (`author` field, string)
  * The date (`date` field, string in YYYY-MM-DD format)
  * The content (`content` field)


**Q**: Create a first post by `rick`, on January 15th, with the tags `mongodb` and `nosql`.

In [23]:
Blog = client.Blog
posts = Blog.posts

post = {
    "author": "rick",
    "date": "2021-01-15",
    "content": "This is a generic blog post",
    "tags": ["mongodb", "nosql"],
    "comments": []
}

posts.insert_one(post)

InsertOneResult(ObjectId('679393924afc85658fafd391'), acknowledged=True)
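MongoDB does not enforce the post structure listed above on its own, so a typo such as `"tag"` instead of `"tags"` would be stored silently. A lightweight option is to check documents in Python before inserting them. The sketch below is one way to do that under this tutorial's schema; `check_post` and its helpers are hypothetical, and `content` is assumed to be a string since the schema does not state its type. Server-side, MongoDB also supports `$jsonSchema` validators on collections.

```python
import datetime
import re

DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def _check_date(value, where):
    """Dates are 'YYYY-MM-DD' strings in this schema; impossible days are rejected too."""
    if not isinstance(value, str) or not DATE_RE.fullmatch(value):
        raise ValueError(f"{where}: date must be a 'YYYY-MM-DD' string")
    datetime.date.fromisoformat(value)  # raises ValueError for e.g. '2021-02-30'


def _check_str(doc, field, where):
    if not isinstance(doc.get(field), str):
        raise ValueError(f"{where}: '{field}' must be a string")


def check_post(post: dict) -> dict:
    """Validate a blog post against the schema above and return it unchanged."""
    _check_str(post, "author", "post")
    _check_str(post, "content", "post")
    _check_date(post.get("date"), "post")
    tags = post.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("post: 'tags' must be a list of strings")
    comments = post.get("comments", [])
    if not isinstance(comments, list):
        raise ValueError("post: 'comments' must be a list")
    for i, comment in enumerate(comments):
        where = f"comment {i}"
        if not isinstance(comment, dict):
            raise ValueError(f"{where}: must be a document")
        _check_str(comment, "author", where)
        _check_str(comment, "content", where)
        _check_date(comment.get("date"), where)
    return post
```

Used with PyMongo, this would read `posts.insert_one(check_post(post))`.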

**Q**: Create a second post by `kate`, on January 21, with the tag `nosql` and a comment from `rick` on the same day.

In [24]:
post2 = {
    "author": "kate",
    "date": "2021-01-21",
    "content": "This is another generic blog post",
    "tags": ["nosql"],
    "comments": [{
        "author": "rick",
        "date": "2021-01-21",
        "content": "This is a generic comment"
    }]
}

posts.insert_one(post2)

InsertOneResult(ObjectId('679393924afc85658fafd392'), acknowledged=True)

In [25]:
for post in posts.find():
    pprint(post)

{'_id': ObjectId('679393924afc85658fafd391'),
 'author': 'rick',
 'comments': [],
 'content': 'This is a generic blog post',
 'date': '2021-01-15',
 'tags': ['mongodb', 'nosql']}
{'_id': ObjectId('679393924afc85658fafd392'),
 'author': 'kate',
 'comments': [{'author': 'rick',
               'content': 'This is a generic comment',
               'date': '2021-01-21'}],
 'content': 'This is another generic blog post',
 'date': '2021-01-21',
 'tags': ['nosql']}
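Because the dates are stored as zero-padded `YYYY-MM-DD` strings, sorting them as strings (which is what `sort("date", ...)` does here) gives chronological order. That only holds for this fixed-width format, as the sketch below shows with hypothetical non-padded dates; storing real `datetime` values (BSON dates) would avoid relying on the string format at all.

```python
import datetime

padded = ["2021-01-21", "2021-01-15", "2021-01-25"]
unpadded = ["2021-1-21", "2021-1-9"]  # hypothetical values, not this schema's format

# Zero-padded ISO dates: string order is chronological order.
print(sorted(padded) == sorted(padded, key=datetime.date.fromisoformat))  # → True

# Without padding, string order puts "2021-1-21" before "2021-1-9".
print(sorted(unpadded))  # → ['2021-1-21', '2021-1-9']

# Parsing the strings restores chronological order.
chronological = sorted(unpadded, key=lambda d: datetime.datetime.strptime(d, "%Y-%m-%d"))
print(chronological)  # → ['2021-1-9', '2021-1-21']
```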


**Q**: Display the author of the last post with the tag `nosql`

In [26]:
for post in posts.find({"tags": "nosql"}).sort("date", pymongo.DESCENDING).limit(1):
    print(post["author"])

kate


**Q**: Add a comment by `jack` on January 25, to `kate`'s post

In [27]:
posts.update_one(
    {"author": "kate"},
    {"$push": {"comments": {
        "author": "jack",
        "date": "2021-01-25",
        "content": "This is again a generic comment",
    }}},
)

UpdateResult({'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}, acknowledged=True)

In [28]:
for post in posts.find().sort("date", pymongo.DESCENDING).limit(1):
    pprint(post)

{'_id': ObjectId('679393924afc85658fafd392'),
 'author': 'kate',
 'comments': [{'author': 'rick',
               'content': 'This is a generic comment',
               'date': '2021-01-21'},
              {'author': 'jack',
               'content': 'This is again a generic comment',
               'date': '2021-01-25'}],
 'content': 'This is another generic blog post',
 'date': '2021-01-21',
 'tags': ['nosql']}


**Q**: Display all comments by `jack`

In [29]:
for post in posts.find({"comments.author": "jack"}):
    for comment in post['comments']:
        if comment['author'] == 'jack':
            pprint(comment)

{'author': 'jack',
 'content': 'This is again a generic comment',
 'date': '2021-01-25'}
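The loop above fetches whole posts and filters the comments in Python. An aggregation pipeline can do the filtering on the server and return only the matching comments. The stages below (`$match`, `$unwind`, `$replaceRoot`) are standard MongoDB aggregation stages; the pure-Python function is only a model of what they do, checked against the two posts from this section (the `content` of each post is omitted for brevity).

```python
pipeline = [
    {"$match": {"comments.author": "jack"}},     # only posts with a comment by jack
    {"$unwind": "$comments"},                    # one document per comment
    {"$match": {"comments.author": "jack"}},     # keep jack's comments
    {"$replaceRoot": {"newRoot": "$comments"}},  # return the comment itself
]
# With PyMongo: for comment in posts.aggregate(pipeline): pprint(comment)


def model_pipeline(docs, author):
    """Pure-Python model of the pipeline above (for illustration only)."""
    for doc in docs:
        for comment in doc.get("comments", []):
            if comment.get("author") == author:
                yield comment


blog = [
    {"author": "rick", "date": "2021-01-15", "tags": ["mongodb", "nosql"], "comments": []},
    {"author": "kate", "date": "2021-01-21", "tags": ["nosql"], "comments": [
        {"author": "rick", "date": "2021-01-21", "content": "This is a generic comment"},
        {"author": "jack", "date": "2021-01-25", "content": "This is again a generic comment"},
    ]},
]
print(list(model_pipeline(blog, "jack")))
```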


## Postquisites

In [30]:
# Drop the databases through PyMongo: the legacy `mongo` shell is not
# installed on every machine (and is no longer shipped since MongoDB 6.0).
client.drop_database("test_database")

In [31]:
client.drop_database("MovieLens")

In [32]:
client.drop_database("Blog")
