# Mongo tutorial

## Prerequisites

### Documentation

The reference documentation is available here:
* [Mongo commands](https://docs.mongodb.com/manual/reference/)
* [Mongo python client](http://api.mongodb.com/python/current/api/pymongo/mongo_client.html#pymongo.mongo_client.MongoClient)

### Import libraries

In [1]:
!pip install pymongo

Collecting pymongo
  Downloading pymongo-4.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.4.2-py3-none-any.whl.metadata (4.9 kB)
Downloading pymongo-4.6.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (680 kB)
Downloading dnspython-2.4.2-py3-none-any.whl (300 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.4.2 pymongo-4.6.1


In [2]:
import datetime
from pprint import pprint

import pymongo
from pymongo import MongoClient

In [3]:
client = MongoClient('mongo', 27017)

In [4]:
# let's work in a test_database
db = client.test_database
posts = db.posts

In [5]:
post = {
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
    "date": datetime.datetime.utcnow()
}
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('65a13404c9b87f7955f6bb93')

In [6]:
db.list_collection_names()

['posts']

In [7]:
pprint(posts.find_one())

{'_id': ObjectId('65a13404c9b87f7955f6bb93'),
 'author': 'Mike',
 'date': datetime.datetime(2024, 1, 12, 12, 43, 48, 80000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}


You can open a terminal alongside the notebook, connect to your server with the `mongo` shell, and check that the document is there:

```bash
vagrant@nosql:~$ mongo
> show databases;
admin          0.000GB
config         0.000GB
local          0.000GB
test_database  0.000GB
> use test_database;
switched to db test_database
> db.posts.find()
{ 
    "_id" : ObjectId("..."), 
    "author" : "Mike", 
    "text" : "My first blog post!", 
    "tags" : [ "mongodb", "python", "pymongo"], 
    "date" : ISODate("2019-02-10T11:33:47.883Z") 
}
```

## I. Quick start

### First steps

**Q** : Create a document `{msg: 'hello'}` in the `test` collection with `insert_one()`, then fetch it back and display it. What is the `_id` field for ?

NB : if the collection doesn't exist yet, MongoDB automatically creates it.

In [8]:
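# This answer reuses the `posts` collection defined above rather than a separate `test` collection
document_data = {'msg': 'hello'}
result = posts.insert_one(document_data)

# MongoDB generated a unique `_id` for the new document
print(f"Inserted document ID: {result.inserted_id}")

# Fetch the document back using that `_id`
fetched_document = posts.find_one({'_id': result.inserted_id})
print("Fetched document:", fetched_document)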

Inserted document ID: 65a1340dc9b87f7955f6bb94
Fetched document: {'_id': ObjectId('65a1340dc9b87f7955f6bb94'), 'msg': 'hello'}


**Q**: Display the number of documents inside the `test` collection

In [9]:
num_documents = posts.count_documents({})
print(num_documents)

2


### Interacting with a database

We have 2 `.json` files we want to interact with inside the `data` folder. Let's first dump them into a `MovieLens` database, inside `users` and `movies` collections.

For this section, you will need to read a bit about [query operators](https://docs.mongodb.com/manual/reference/operator/query/#query-selectors). Most of the collection methods you will use take a `filter` as their first parameter: a dictionary describing the conditions that documents must satisfy.
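As a rough illustration of what such a filter dictionary expresses, the sketch below evaluates a few filters against plain Python dictionaries. The `matches` helper and the sample users are hypothetical; the helper covers only equality, `$lt` and `$in` on scalar fields, whereas MongoDB evaluates filters on the server and supports many more operators.

```python
# Simplified, in-memory illustration of MongoDB-style filters.
# `matches` is a hypothetical helper, not part of pymongo.

def matches(document, query_filter):
    """Return True if `document` satisfies every condition in `query_filter`."""
    for field, condition in query_filter.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$lt": 18}}
            for operator, operand in condition.items():
                if operator == "$lt":
                    if value is None or not value < operand:
                        return False
                elif operator == "$in":
                    if value not in operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {operator}")
        elif value != condition:
            # Plain form, e.g. {"gender": "F"}, means equality
            return False
    return True


# Made-up sample data
sample_users = [
    {"name": "Ann", "age": 15, "occupation": "student"},
    {"name": "Bob", "age": 42, "occupation": "programmer"},
    {"name": "Cid", "age": 17, "occupation": "artist"},
]

minors = [u["name"] for u in sample_users if matches(u, {"age": {"$lt": 18}})]
coders = [u["name"] for u in sample_users
          if matches(u, {"occupation": {"$in": ["programmer", "scientist"]}})]
print(minors)  # ['Ann', 'Cid']
print(coders)  # ['Bob']
```

The same dictionaries can be passed unchanged as the `filter` argument of methods such as `find()` or `count_documents()`.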

**Q** : In the `MovieLens` database, load `data/movielens_movies.json` into `movies` and `data/movielens_users.json` into `users`. 

Use the dedicated shell command for this : `mongoimport --db <some_db> --collection <some_collection> --file <some_file>` 

In [10]:
!conda install -c anaconda mongo-tools -y

Channels:
 - anaconda
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /opt/conda

  added / updated specs:
    - mongo-tools


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2023.11.17         |  py311h06a4308_0         160 KB  anaconda
    mongo-tools-100.9.4        |       h8361555_0        50.7 MB  conda-forge
    ------------------------------------------------------------
                                           Total:        50.9 MB

The following NEW packages will be INSTALLED:

  mongo-tools        conda-forge/linux-64::mongo-tools-100.9.4-h8361555_0 

The following packages will be SUPERSEDED by a higher-priority channel:

  certifi            conda-forge/noarch::certifi-2023.11.1~ --> anaconda/linux-64::certifi-2023.11.17-py311h06a4308_0 



Downloading and Extr

In [11]:
!mongoimport --db MoviesLens --collection users --file data/movielens_users.json --host mongo --port 27017
!mongoimport --db MoviesLens --collection movies --file data/movielens_movies.json --host mongo --port 27017

2024-01-12T12:45:45.217+0000	connected to: mongodb://mongo:27017/
2024-01-12T12:45:47.021+0000	6040 document(s) imported successfully. 0 document(s) failed to import.
2024-01-12T12:45:47.683+0000	connected to: mongodb://mongo:27017/
2024-01-12T12:45:47.758+0000	3883 document(s) imported successfully. 0 document(s) failed to import.


**Q** : how many users are in the `MovieLens` database ?

In [12]:
client = MongoClient('mongo', 27017)  # Use the name of your MongoDB service as the host

db = client['MoviesLens']  # the import cell above created the database under this name
users_collection = db['users']

user_count = users_collection.count_documents({})

print("Number of users:", user_count)

Number of users: 6040


**Q** : Display all comedies (the `genres` property equals `Comedy`). 

NB : `find()` returns a `Cursor`, so you will need to work out how to iterate over it; the `pprint` function then gives a more readable display of the documents.
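A cursor can be consumed like any other Python iterable. The sketch below shows one way to print only the first few documents; `show_first` is a hypothetical helper, and a plain iterator over made-up documents stands in for the cursor so that the example runs without a server.

```python
from itertools import islice
from pprint import pformat


def show_first(cursor, n=5):
    """Pretty-print at most `n` documents from any iterable of documents."""
    shown = 0
    for document in islice(cursor, n):
        print(pformat(document))
        shown += 1
    return shown


# Stand-in for something like `movies_collection.find({'genres': 'Comedy'})`
fake_cursor = iter([
    {"_id": 1, "title": "Example A (1995)", "genres": "Comedy"},
    {"_id": 2, "title": "Example B (1996)", "genres": "Comedy"},
    {"_id": 3, "title": "Example C (1997)", "genres": "Comedy"},
])

count = show_first(fake_cursor, n=2)
```

With a real cursor, calling `.limit(n)` on it before iterating achieves a similar result while letting the server stop early.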

In [13]:
movies_collection = db['movies']

comedies = movies_collection.find({'genres': 'Comedy'})

for comedy in comedies:
    print(comedy)

{'_id': 19, 'title': 'Ace Ventura: When Nature Calls (1995)', 'genres': 'Comedy'}
{'_id': 38, 'title': 'It Takes Two (1995)', 'genres': 'Comedy'}
{'_id': 5, 'title': 'Father of the Bride Part II (1995)', 'genres': 'Comedy'}
{'_id': 52, 'title': 'Mighty Aphrodite (1995)', 'genres': 'Comedy'}
{'_id': 63, 'title': "Don't Be a Menace to South Central While Drinking Your Juice in the Hood (1996)", 'genres': 'Comedy'}
{'_id': 65, 'title': 'Bio-Dome (1996)', 'genres': 'Comedy'}
{'_id': 69, 'title': 'Friday (1995)', 'genres': 'Comedy'}
{'_id': 88, 'title': 'Black Sheep (1996)', 'genres': 'Comedy'}
{'_id': 96, 'title': 'In the Bleak Midwinter (1995)', 'genres': 'Comedy'}
{'_id': 102, 'title': 'Mr. Wrong (1996)', 'genres': 'Comedy'}
{'_id': 101, 'title': 'Bottle Rocket (1996)', 'genres': 'Comedy'}
{'_id': 119, 'title': 'Steal Big, Steal Little (1995)', 'genres': 'Comedy'}
{'_id': 104, 'title': 'Happy Gilmore (1996)', 'genres': 'Comedy'}
{'_id': 109, 'title': 'Headless Body in Topless Bar (1995)'

**Q** : Fetch and display the `name` and `occupation` of Clifford Johnathan. The second parameter of `find()` ([doc here](https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find)) is called the `projection`; it restricts which fields are returned for each matching document.
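To make the effect of an inclusion projection concrete, here is a minimal in-memory sketch. The `project` helper and the sample user are hypothetical; the helper only mimics the simplest case, where fields flagged with `1` are kept and `_id` is returned unless it is explicitly set to `0`.

```python
def project(document, projection):
    """Keep only fields flagged with 1; `_id` is kept unless explicitly set to 0."""
    result = {}
    if projection.get("_id", 1) and "_id" in document:
        result["_id"] = document["_id"]
    for field, flag in projection.items():
        if field != "_id" and flag and field in document:
            result[field] = document[field]
    return result


# Made-up sample document
user = {"_id": 7, "name": "Sample User", "age": 30, "occupation": "artist"}

print(project(user, {"name": 1, "occupation": 1}))
# {'_id': 7, 'name': 'Sample User', 'occupation': 'artist'}
print(project(user, {"_id": 0, "name": 1, "occupation": 1}))
# {'name': 'Sample User', 'occupation': 'artist'}
```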

In [14]:
clifford = users_collection.find_one({'name': 'Clifford Johnathan'}, {'_id': 0, 'name': 1, 'occupation': 1})

print(clifford)

{'name': 'Clifford Johnathan', 'occupation': 'technician/engineer'}


**Q**: How many minors (by `age`) have rated movies ?

In [15]:
age_threshold = 18

minors_count = users_collection.count_documents({'age': {'$lt': age_threshold}})

print(f"Number of minors who have rated movies: {minors_count}")

Number of minors who have rated movies: 222


**Q**: Display science fiction movies ('Sci-Fi') and suspense movies ('Thriller'). This time you need to use a regex to parse genres and look for those values.
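Because `genres` is stored as a single pipe-separated string (for example `'Drama|Thriller'`), an equality filter cannot find a genre inside it. The sketch below tries out a candidate pattern locally with Python's `re` module before using it in a query; the pattern, which matches a genre only as a whole token, is one possible choice and not the only valid one.

```python
import re

# Matches 'Sci-Fi' or 'Thriller' as a whole genre token in an 'A|B|C' string.
GENRE_PATTERN = re.compile(r"(^|\|)(Sci-Fi|Thriller)(\||$)")

# A few documents shaped like those in the `movies` collection
movies = [
    {"title": "Casino (1995)", "genres": "Drama|Thriller"},
    {"title": "Powder (1995)", "genres": "Drama|Sci-Fi"},
    {"title": "Father of the Bride Part II (1995)", "genres": "Comedy"},
]

selected = [m["title"] for m in movies if GENRE_PATTERN.search(m["genres"])]
print(selected)  # ['Casino (1995)', 'Powder (1995)']
```

The pattern string (`GENRE_PATTERN.pattern`) can then be supplied as the value of `$regex` in a filter on `genres`.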

In [22]:
import re

# Build a regex that matches either 'Sci-Fi' or 'Thriller' anywhere in the genres string
genres_regex = re.compile(r'Sci-Fi|Thriller', re.IGNORECASE)

# Use the regex in the query
sci_fi_thrillers = movies_collection.find({'genres': {'$regex': genres_regex}})

# Iterate over the results and print each movie
for movie in sci_fi_thrillers:
    print(movie)

{'_id': 16, 'title': 'Casino (1995)', 'genres': 'Drama|Thriller'}
{'_id': 18, 'title': 'Four Rooms (1995)', 'genres': 'Thriller'}
{'_id': 6, 'title': 'Heat (1995)', 'genres': 'Action|Crime|Thriller'}
{'_id': 22, 'title': 'Copycat (1995)', 'genres': 'Crime|Drama|Thriller'}
{'_id': 24, 'title': 'Powder (1995)', 'genres': 'Drama|Sci-Fi'}
{'_id': 23, 'title': 'Assassins (1995)', 'genres': 'Thriller'}
{'_id': 29, 'title': 'City of Lost Children, The (1995)', 'genres': 'Adventure|Sci-Fi'}
{'_id': 10, 'title': 'GoldenEye (1995)', 'genres': 'Action|Adventure|Thriller'}
{'_id': 32, 'title': 'Twelve Monkeys (1995)', 'genres': 'Drama|Sci-Fi'}
{'_id': 47, 'title': 'Seven (Se7en) (1995)', 'genres': 'Crime|Thriller'}
{'_id': 50, 'title': 'Usual Suspects, The (1995)', 'genres': 'Crime|Thriller'}
{'_id': 51, 'title': 'Guardian Angel (1994)', 'genres': 'Action|Drama|Thriller'}
{'_id': 61, 'title': 'Eye for an Eye (1996)', 'genres': 'Drama|Thriller'}
{'_id': 76, 'title': 'Screamers (1995)', 'genres': 'S

**Q**: More advanced text search requires a dedicated index. Use the `create_index()` method to create a [text index](https://docs.mongodb.com/manual/core/index-text/) on the `genres` field of the `movies` collection.

In [23]:
movies_collection.create_index([('genres', 'text')])

'genres_text'

**Q**: Repeat the search for science fiction and thriller movies, this time with the `$text` operator

In [26]:
# Use the $text operator, which relies on the text index created above
sci_fi_thrillers = movies_collection.find({'$text': {'$search': 'Sci-Fi Thriller'}})

# Iterate over the results and print each movie
for movie in sci_fi_thrillers:
    print(movie)

{'_id': 3934, 'title': 'Kronos (1957)', 'genres': 'Sci-Fi'}
{'_id': 3878, 'title': 'X: The Unknown (1956)', 'genres': 'Sci-Fi'}
{'_id': 3779, 'title': 'Project Moon Base (1953)', 'genres': 'Sci-Fi'}
{'_id': 3780, 'title': 'Rocketship X-M (1950)', 'genres': 'Sci-Fi'}
{'_id': 3658, 'title': 'Quatermass and the Pit (1967)', 'genres': 'Sci-Fi'}
{'_id': 3687, 'title': 'Light Years (1988)', 'genres': 'Sci-Fi'}
{'_id': 3486, 'title': 'Devil Girl From Mars (1954)', 'genres': 'Sci-Fi'}
{'_id': 3354, 'title': 'Mission to Mars (2000)', 'genres': 'Sci-Fi'}
{'_id': 3375, 'title': 'Destination Moon (1950)', 'genres': 'Sci-Fi'}
{'_id': 3032, 'title': 'Omega Man, The (1971)', 'genres': 'Sci-Fi'}
{'_id': 2698, 'title': 'Zone 39 (1997)', 'genres': 'Sci-Fi'}
{'_id': 2667, 'title': 'Mole People, The (1956)', 'genres': 'Sci-Fi'}
{'_id': 2666, 'title': 'It Conquered the World (1956)', 'genres': 'Sci-Fi'}
{'_id': 2665, 'title': 'Earth Vs. the Flying Saucers (1956)', 'genres': 'Sci-Fi'}
{'_id': 2663, 'title':

**Q**: Display the first 30 movies (`limit`) in alphabetical order (`sort`) by title
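Without a collation, MongoDB compares strings with a simple binary comparison, so titles starting with punctuation or digits sort before those starting with letters. For ASCII titles, Python's default string ordering behaves the same way, which makes the following in-memory sketch a reasonable preview of what `sort` followed by `limit` returns (the titles are taken from the collection; the list itself is just a small sample).

```python
movies = [
    {"title": "12 Angry Men (1957)"},
    {"title": "...And Justice for All (1979)"},
    {"title": "$1,000,000 Duck (1971)"},
    {"title": "'Night Mother (1986)"},
]

# Ascending sort on `title`, then keep the first three documents
first_three = sorted(movies, key=lambda m: m["title"])[:3]
print([m["title"] for m in first_three])
# ['$1,000,000 Duck (1971)', "'Night Mother (1986)", '...And Justice for All (1979)']
```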

In [27]:
result = movies_collection.find().sort('title', 1).limit(30)

# Iterate over the results and print each movie
for movie in result:
    print(movie)

{'_id': 2031, 'title': '$1,000,000 Duck (1971)', 'genres': "Children's|Comedy"}
{'_id': 3112, 'title': "'Night Mother (1986)", 'genres': 'Drama'}
{'_id': 779, 'title': "'Til There Was You (1997)", 'genres': 'Drama|Romance'}
{'_id': 2072, 'title': "'burbs, The (1989)", 'genres': 'Comedy'}
{'_id': 3420, 'title': '...And Justice for All (1979)', 'genres': 'Drama|Thriller'}
{'_id': 889, 'title': '1-900 (1994)', 'genres': 'Romance'}
{'_id': 2572, 'title': '10 Things I Hate About You (1999)', 'genres': 'Comedy|Romance'}
{'_id': 2085, 'title': '101 Dalmatians (1961)', 'genres': "Animation|Children's"}
{'_id': 1367, 'title': '101 Dalmatians (1996)', 'genres': "Children's|Comedy"}
{'_id': 1203, 'title': '12 Angry Men (1957)', 'genres': 'Drama'}
{'_id': 2826, 'title': '13th Warrior, The (1999)', 'genres': 'Action|Horror|Thriller'}
{'_id': 1609, 'title': '187 (1997)', 'genres': 'Drama'}
{'_id': 999, 'title': '2 Days in the Valley (1996)', 'genres': 'Crime'}
{'_id': 2492, 'title': '20 Dates (1998)

**Q**: How many users have seen the movie "Star Wars: Episode V - The Empire Strikes Back (1980)" (`_id` 1196) ? The `movies` field is an array, so try the [$elemMatch](https://docs.mongodb.com/manual/reference/operator/query/elemMatch/) query operator here.
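The point of `$elemMatch` is that a single array element must satisfy all the conditions at once. The in-memory sketch below contrasts this with checking each condition separately; `elem_match` is a hypothetical helper and the user document is made up.

```python
def elem_match(array, conditions):
    """True if ONE element of `array` satisfies every (field, allowed values) pair."""
    return any(
        all(element.get(field) in allowed for field, allowed in conditions.items())
        for element in array
    )


# Made-up user: loved movie 1196, disliked movie 260
user = {
    "name": "Sample User",
    "movies": [
        {"movieid": 1196, "rating": 5},
        {"movieid": 260, "rating": 1},
    ],
}

# The same element must be movie 1196 AND rated 1 or 2: no such element here.
same_element = elem_match(user["movies"], {"movieid": [1196], "rating": [1, 2]})

# Checked one at a time, the conditions are satisfied by two different elements.
separately = (elem_match(user["movies"], {"movieid": [1196]})
              and elem_match(user["movies"], {"rating": [1, 2]}))

print(same_element, separately)  # False True
```

This is why a filter written with dotted paths, such as `{'movies.movieid': 1196, 'movies.rating': {'$in': [1, 2]}}`, can match users who never gave that particular movie a low rating.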

In [67]:
# Peek at two user documents to see how the `movies` array is structured
result = users_collection.find().limit(2)

for user in result:
    print(user)

{'_id': 6038, 'name': 'Yaeko Hassan', 'gender': 'F', 'age': 95, 'occupation': 'academic/educator', 'movies': [{'movieid': 1419, 'rating': 4, 'timestamp': 956714815}, {'movieid': 920, 'rating': 3, 'timestamp': 956706827}, {'movieid': 3088, 'rating': 5, 'timestamp': 956707640}, {'movieid': 232, 'rating': 4, 'timestamp': 956707640}, {'movieid': 1136, 'rating': 4, 'timestamp': 956707708}, {'movieid': 1148, 'rating': 5, 'timestamp': 956707604}, {'movieid': 1183, 'rating': 5, 'timestamp': 956717204}, {'movieid': 2146, 'rating': 4, 'timestamp': 956706909}, {'movieid': 3548, 'rating': 4, 'timestamp': 956707604}, {'movieid': 356, 'rating': 4, 'timestamp': 956707005}, {'movieid': 1210, 'rating': 4, 'timestamp': 956706876}, {'movieid': 1223, 'rating': 5, 'timestamp': 956707734}, {'movieid': 1276, 'rating': 3, 'timestamp': 956707604}, {'movieid': 1296, 'rating': 5, 'timestamp': 956714684}, {'movieid': 1354, 'rating': 3, 'timestamp': 956714725}, {'movieid': 1387, 'rating': 2, 'timestamp': 956707005

In [40]:
movie_id = 1196

count = users_collection.count_documents({'movies': {'$elemMatch': {'movieid': movie_id}}})

print(f'Number of users who have rated Star Wars: Episode V - The Empire Strikes Back (1980): {count}')

Number of users who have rated Star Wars: Episode V - The Empire Strikes Back (1980): 2990


**Q**: And how many gave it a rating of 1 or 2 ?

In [43]:
movie_id = 1196

# A single element of `movies` must be this movie AND carry a rating of 1 or 2
count_rate = users_collection.count_documents(
    {'movies': {'$elemMatch': {'movieid': movie_id, 'rating': {'$in': [1, 2]}}}}
)

print(f'Number of users who have rated Star Wars: Episode V - The Empire Strikes Back (1980) 1 or 2: {count_rate}')

Number of users who have rated Star Wars: Episode V - The Empire Strikes Back (1980) 1 or 2: 105


### Updating data

**Q**: Insert a new user with the properties `name`, `gender` ('M' or 'F'), `occupation` and `age`, using the `insert_one()` command. Display it with `find_one()`.

In [50]:
user = {
    "name": "Natacha",
    "gender": "F",
    "occupation": "student",
    "age": 25,
}

# Use insert_one to insert the new user into the collection
result = users_collection.insert_one(user)

# Use find_one to retrieve and display the inserted user
inserted_user = users_collection.find_one({"_id": result.inserted_id})
pprint(inserted_user)

{'_id': ObjectId('65a14523c9b87f7955f6bb99'),
 'age': 25,
 'gender': 'F',
 'name': 'Natacha',
 'occupation': 'student'}


**Q**: Add a rating for a movie this user has seen with `update_one()`: add a `movies` property containing an array with one document (`movieid`, `rating`, and `timestamp` set to `datetime.datetime.utcnow()`).

You will need to read the documentation on [update operators](https://docs.mongodb.org/manual/reference/operator/update/).
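As a hedged illustration of two of these operators, the sketch below applies `$set` and `$push` to a plain dictionary. `apply_update` is a hypothetical helper that handles top-level fields only; it is meant to show the intent of the operators, not how MongoDB implements them.

```python
import copy


def apply_update(document, update):
    """Return a copy of `document` with `$set` and `$push` applied (top-level fields only)."""
    updated = copy.deepcopy(document)
    for field, value in update.get("$set", {}).items():
        updated[field] = value  # overwrite or create the field
    for field, value in update.get("$push", {}).items():
        updated.setdefault(field, []).append(value)  # append, creating the array if needed
    return updated


# Made-up user document
user = {"name": "Sample User", "occupation": "programmer"}
user = apply_update(user, {"$set": {"occupation": "developer"}})
user = apply_update(user, {"$push": {"movies": {"movieid": 1196, "rating": 4}}})
user = apply_update(user, {"$push": {"movies": {"movieid": 260, "rating": 5}}})
print(user["occupation"], len(user["movies"]))  # developer 2

# Pushing a list appends the list itself as ONE element (a nested array)
nested = apply_update({}, {"$push": {"comments": [{"author": "x"}]}})
print(nested["comments"])  # [[{'author': 'x'}]]
```

MongoDB's `$push` behaves the same way with a list value: to append several elements in one update, the `$each` modifier is needed.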

In [57]:
user_name = "Natacha"
movie_id = 1196
rating = 2

# $push appends the rating document to the `movies` array, creating the array if it is missing.
# The `datetime` module imported at the top of the notebook is used as-is, so it is not shadowed.
result = users_collection.update_one(
    {"name": user_name},
    {
        "$push": {
            "movies": {
                "movieid": movie_id,
                "rating": rating,
                "timestamp": datetime.datetime.utcnow()
            }
        }
    }
)

# modified_count stays at 0 when no document matched the filter
if result.modified_count > 0:
    print(f"Appreciation added successfully for user: {user_name}")
else:
    print("User not found or no modifications were made")

Appreciation added successfully for user: Natacha
{'_id': ObjectId('65a144d5c9b87f7955f6bb96'),
 'age': 25,
 'gender': 'F',
 'movies': [{'movieid': 1196,
             'rating': 4,
             'timestamp': datetime.datetime(2024, 1, 12, 14, 4, 18, 708000)},
            {'movieid': 1196,
             'rating': 2,
             'timestamp': datetime.datetime(2024, 1, 12, 14, 4, 42, 140000)}],
 'name': 'Natacha',
 'occupation': 'student'}


In [59]:
# Display the updated user
updated_user = users_collection.find_one({"name": user_name})
pprint(updated_user)

{'_id': ObjectId('65a144d5c9b87f7955f6bb96'),
 'age': 25,
 'gender': 'F',
 'movies': [{'movieid': 1196,
             'rating': 2,
             'timestamp': datetime.datetime(2024, 1, 12, 14, 4, 42, 140000)}],
 'name': 'Natacha',
 'occupation': 'student'}


**Q**: Find the number of users who have declared a `programmer` occupation. Modify them so that they are `developer`. Verify your update.

In [70]:
count_programmers = users_collection.count_documents({'occupation': 'programmer'})
print(f'The number of users with occupation "programmer": {count_programmers}')

The number of users with occupation "programmer": 388


In [75]:
result = users_collection.update_many(
    {'occupation': 'programmer'},
    {'$set': {'occupation': 'developer'}}
)

# Verify the update: no user should be left with the old occupation
count_programmers = users_collection.count_documents({'occupation': 'programmer'})
print(f'The number of users with occupation "programmer": {count_programmers}')

The number of users with occupation "programmer": 0


## II. Modelling a blog

We will now model a blog using Mongo. 

First, switch to a new `Blog` database. Each blog post will have the following fields:

* The author (`author` field, string type)
* The date (`date` field, string type in YYYY-MM-DD format)
* The content (`content` field)
* Tags (`tags` field, an array of strings)
* A list of comments (`comments` field), each containing:
    * The author (`author` field, string type)
    * The date (`date` field, string type in YYYY-MM-DD format)
    * The content (`content` field)
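The schema above can be sketched as two small builder functions. `make_post` and `make_comment` are hypothetical helpers, and the authors, dates and texts are sample values. Storing dates as `YYYY-MM-DD` strings has a convenient side effect: plain string comparison orders them chronologically, so sorting on `date` works as expected.

```python
from datetime import date


def make_comment(author, day, content):
    """Build a comment sub-document (date stored as a 'YYYY-MM-DD' string)."""
    return {"author": author, "date": day.isoformat(), "content": content}


def make_post(author, day, content, tags, comments=None):
    """Build a post document following the schema described above."""
    return {
        "author": author,
        "date": day.isoformat(),
        "content": content,
        "tags": list(tags),
        "comments": list(comments or []),
    }


# Sample values, for illustration only
post = make_post(
    "kate", date(2023, 1, 21), "Hello NoSQL", ["nosql"],
    comments=[make_comment("rick", date(2023, 1, 21), "Nice post")],
)
print(post["date"])  # 2023-01-21

# ISO-formatted dates compare correctly as plain strings
print("2023-01-15" < "2023-01-21" < "2023-02-01")  # True
```

A dictionary built this way can be passed directly to `insert_one()`.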


In [77]:
blog_db = client['Blog']

posts_collection = blog_db['posts']

**Q**: Create a first post by `rick`, on January 15th, with the tags `mongodb` and `nosql`.

In [78]:
post_data = {
    'author': 'Rick',
    'date': '2023-01-15', 
    'content': 'This is my first blog post about MongoDB.',
    'tags': ['mongodb', 'nosql'],
    'comments': []
}

result = posts_collection.insert_one(post_data)

print(f'Inserted post with _id: {result.inserted_id}')

Inserted post with _id: 65a14c79c9b87f7955f6bb9a


**Q**: Create a second post by `kate`, on January 21, with the tag `nosql` and a comment from `rick` on the same day.

In [79]:
post_data = {
    'author': 'Kate',
    'date': '2023-01-21', 
    'content': 'This is my first blog post about MongoDB.',
    'tags': ['nosql'],
    'comments': [
        {
            'author': 'Rick',
            'date': '2023-01-21',
            'content': 'Great post, Kate!'
        }
    ]
}

result_2 = posts_collection.insert_one(post_data)

# Print the id of the post just inserted (result_2), not the id of the first post
print(f'Inserted post with _id: {result_2.inserted_id}')

Inserted post with _id: 65a14d01c9b87f7955f6bb9b


**Q**: Display the author of the last post with the tag `nosql`

In [81]:
last_post = posts_collection.find_one(
    {'tags': 'nosql'},
    sort=[('date', -1)])
pprint(last_post)

{'_id': ObjectId('65a14d01c9b87f7955f6bb9b'),
 'author': 'Kate',
 'comments': [{'author': 'Rick',
               'content': 'Great post, Kate!',
               'date': '2023-01-21'}],
 'content': 'This is my first blog post about MongoDB.',
 'date': '2023-01-21',
 'tags': ['nosql']}


**Q**: Add a comment by `jack` on January 25, to `kate`'s post

In [82]:
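result = posts_collection.update_one(
    {'author': 'Kate'},
    {
        # $push takes the comment document itself; wrapping it in a list
        # would append a nested array instead of a comment
        "$push": {
            'comments': {
                'author': 'Jack',
                'date': '2023-01-25',
                'content': 'Bullshit!'
            }
        }
    }
)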

## Postquisites

In [85]:
# The `mongo` shell is not installed in this environment, so drop the databases with pymongo instead
client.drop_database('test_database')

In [None]:
# The import cell created this database under the name `MoviesLens`
client.drop_database('MoviesLens')

In [None]:
client.drop_database('Blog')