# Section 4: Creating and manipulating documents

<div class="alert alert-block alert-info">
   
## Jupyter Notebook basics

- **Code cells:** Cells shaded grey are code cells. As you work through the lab, run all code cells in order.
- **Running code:** To run code, press Shift + Enter or click the 'Run' button on the menu bar. Where a cell already contains code, run it as written. Where a code cell contains the comment `# Write your code here`, write code to complete the task and then run it. If needed, consult the hints and answer so that you run the correct code for a task before moving on to the next one. Not every command produces visible output.
- **Markdown cells:** The non-code cells are written in the Markdown markup language. Double-clicking a Markdown cell will cause it to appear in raw Markdown format. To render as text again, run the cell just like running a code cell: press Shift + Enter or click the 'Run' button on the menu bar.  
- **Restarting kernel:** If the notebook becomes unresponsive, or if either the notebook or your code displays unexpected behavior, reset the notebook by choosing "Kernel -> Restart & Clear Output" from the menu bar. This will clear all memory objects in the notebook, stop any code running, and reset the notebook to its initial state. 
- **Session timeout:** Sessions will automatically shut down after about 10 minutes of inactivity. (If you leave a lab window open in the foreground, this will generally be counted as “activity”.) See Binder docs: [How long will my Binder session last?](https://mybinder.readthedocs.io/en/latest/about/about.html?highlight=session%20last#how-long-will-my-binder-session-last)
- **File navigation:** To navigate the other files in this lab, click the folder icon (File Browser) at the top of the left sidebar and choose the `Contents.ipynb` file (or open the Contents file directly [here](../Contents.ipynb)).

</div>

## Introduction

In this section you'll create and manipulate documents in the `movies` collection of the `sample_mflix` database. Specifically, you'll:

- Insert a new document into a given database and collection using `db.collection.insert_one()`
- Delete all documents that match a condition using `db.collection.delete_many()` 
- Update one document that matches a condition using `db.collection.update_one()`
- Update all documents that match a condition using `db.collection.update_many()`

## Setup 

Before starting on the tasks below, run the following cells. 

The first cell creates a new MongoDB client, connects it to the MongoDB server instance and selects the `sample_mflix` database for querying.  

In [12]:
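from pymongo import MongoClient

# Connect to the MongoDB server on the default host and port (localhost:27017)
client = MongoClient()
# Select the sample_mflix database used throughout this section
db = client.sample_mflix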

Run the cell below, which imports the `pprint` function from Python's built-in `pprint` module. You'll use `pprint` to print output in a more readable format. 

In [13]:
# Import the pprint function from Python's built-in pprint module
from pprint import pprint

## Tasks

In [14]:
movies = db.movies

### 1. Insert a new document using `db.collection.insert_one()`

Insert a document for a movie of your choice. Include the `title`, `director` and `runtime` fields.

In [15]:
# Write your code here 
new_doc = {
    'title': 'Our family',
    'director': 'Us',
    'runtime': '74880'
}

movies.insert_one(new_doc)

InsertOneResult(ObjectId('67320cddcfd2a5f8c3ff5d4c'), acknowledged=True)

#### <span style="color:blue">Hints</span>
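- If your document runs longer than the code block, split it over multiple lines to make it easier to edit and debug. 
- If the document is successfully inserted, you will get a return value like `InsertOneResult(ObjectId('67320cddcfd2a5f8c3ff5d4c'), acknowledged=True)`. Depending on your PyMongo version, it may instead display as `<pymongo.results.InsertOneResult at 0x7efe19e8e140>`.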
- Related docs: [Insert data into MongoDB](https://docs.mongodb.com/guides/server/insert/) - select 'Python' client
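
The existing documents in `movies` store `runtime` as a number of minutes (for example, the "Coraline" document has `'runtime': 100`), and numeric query conditions such as `{'$gt': 240}` only match numeric values. The sketch below shows one way to build the document so that `runtime` is always stored as an integer. `build_movie_doc` is a hypothetical helper written for this illustration, not part of PyMongo.

```python
def build_movie_doc(title, director, runtime_minutes):
    """Return a movie document with `runtime` stored as an integer number of minutes."""
    return {
        "title": str(title),
        "director": str(director),
        # int() accepts 74880 as well as "74880" and raises ValueError for
        # non-numeric text, so bad input fails before it reaches the database.
        "runtime": int(runtime_minutes),
    }


new_doc = build_movie_doc("Our family", "Us", "74880")
print(new_doc)
```

The resulting `new_doc` can then be passed to `movies.insert_one(new_doc)` in the same way as a hand-written dictionary.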

### Check that document inserted

To confirm that the document was added to the database, run a `find()` command on the title of the movie you added. 

(There might be multiple documents for the movie title if the database already contained data on this film before you added your document. Confirm that the exact document you entered is in the database.) 

In [16]:
# Replace the blank below with the title of the movie you inserted
cursor = db.movies.find({"title": "Our family"})

for movie in cursor:
    pprint(movie)

{'_id': ObjectId('67320c0dcfd2a5f8c3ff5d4a'),
 'director': 'Us',
 'runtime': '74880',
 'title': 'Our family'}
{'_id': ObjectId('67320cddcfd2a5f8c3ff5d4c'),
 'director': 'Us',
 'runtime': '74880',
 'title': 'Our family'}


### 2. Delete all documents that match a condition using `db.collection.delete_many()`
The `movies` collection contains 46 films longer than 240 minutes. Delete the data on all of them.

In [17]:
# Write your code here 
query = {'runtime': {'$gt': 240}}

result = movies.delete_many(query)
pprint(result)

DeleteResult({'n': 43, 'ok': 1.0}, acknowledged=True)


#### <span style="color:blue">Hints</span>
- Select documents where the `"runtime"` field value is greater than 240.
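- You should get a return value like `DeleteResult({'n': 43, 'ok': 1.0}, acknowledged=True)`. Depending on your PyMongo version, it may instead display as `<pymongo.results.DeleteResult at 0x7fb5a42c6730>`.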
- Related docs: [Delete documents](https://docs.mongodb.com/manual/tutorial/remove-documents/)
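
When the query value is a number, `$gt` only matches documents whose field holds a numeric value, so a document that stores `runtime` as a string (such as `'74880'`) is not selected by `{'runtime': {'$gt': 240}}`. The following pure-Python sketch approximates that behavior for a single field value. `matches_gt` is a hypothetical function for illustration only, and it is a simplified model: it ignores array fields and BSON numeric types other than Python `int` and `float`.

```python
def matches_gt(value, threshold):
    """Approximate how {'$gt': threshold} treats one field value for a numeric threshold."""
    # Booleans are a separate BSON type, so exclude them even though
    # Python treats bool as a subclass of int.
    is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_numeric and value > threshold


# Sample values: two numeric runtimes, one string runtime, one missing value
sample_runtimes = [100, 250, "74880", None]
print([matches_gt(r, 240) for r in sample_runtimes])  # prints [False, True, False, False]
```

Only the numeric value above 240 matches; the string `"74880"` is skipped even though it looks like a large number.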

### Check that documents deleted

To confirm that the documents were deleted from the database, count the number of movies with runtime greater than 240. There should now be zero.

In [18]:
db.movies.count_documents({"runtime": {"$gt": 240}})

0

### 3. Update one document using `db.collection.update_one()`
The number of award nominations recorded for the movie "Coraline" is out of date. The movie has now received 46 award nominations. Use the `$inc` update operator to add 9 more award nominations to "Coraline".  

In [23]:
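# Write your code here 
query = {'title': 'Coraline'}
new_vals = {'$inc': {'awards.nominations': 9}}

# Apply the update; without this call the document is left unchanged
result = movies.update_one(query, new_vals)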

#### <span style="color:blue">Hints</span>
- Specify the `"awards.nominations"` sub-field to be increased by 9 using the `$inc` update operator. 
- You should get a return value like `<pymongo.results.UpdateResult at 0x7fb880029f50>`.
- Related docs: [`$inc` update operator](https://docs.mongodb.com/manual/reference/operator/update/inc/#mongodb-update-up.-inc)
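
To make the effect of `$inc` on a dotted sub-field concrete, here is a small pure-Python model of the operation applied to an in-memory dictionary. It is only a sketch of the semantics, not how the server implements it, and `apply_inc` is a hypothetical helper name. The sample document uses the `awards` values from the "Coraline" document in this collection.

```python
import copy


def apply_inc(doc, dotted_path, amount):
    """Return a copy of `doc` with the field at `dotted_path` increased by `amount`."""
    updated = copy.deepcopy(doc)  # leave the caller's document untouched
    *parents, leaf = dotted_path.split(".")
    target = updated
    for key in parents:
        # This sketch creates missing parent dicts so the walk never fails.
        target = target.setdefault(key, {})
    # If the field does not exist yet, $inc sets it to the given amount.
    target[leaf] = target.get(leaf, 0) + amount
    return updated


coraline = {"title": "Coraline", "awards": {"wins": 10, "nominations": 37}}
after = apply_inc(coraline, "awards.nominations", 9)
print(after["awards"]["nominations"])  # prints 46
```

Starting from 37 nominations, an increment of 9 gives the 46 that the check in the next step expects.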

### Check that document updated

To confirm that the document was updated, query for the movie "Coraline" again. The value of the `"awards.nominations"` subfield should now be 46. 

In [24]:
cursor = db.movies.find({"title": "Coraline"})

for movie in cursor:
    pprint(movie)

{'_id': ObjectId('573a13aaf29313caabd218b4'),
 'awards': {'nominations': 46,
            'text': 'Nominated for 1 Oscar. Another 9 wins & 37 nominations.',
            'wins': 10},
 'cast': ['Dakota Fanning', 'Teri Hatcher', 'Jennifer Saunders', 'Dawn French'],
 'countries': ['USA'],
 'directors': ['Henry Selick'],
 'genres': ['Animation', 'Fantasy'],
 'imdb': {'id': 327597, 'rating': 7.7, 'votes': 131308},
 'languages': ['English', 'Russian'],
 'plot': 'An adventurous girl finds another world that is a strangely '
         'idealized version of her frustrating home, but it has sinister '
         'secrets.',
 'rated': 'PG',
 'released': datetime.datetime(2009, 2, 6, 0, 0),
 'runtime': 100,
 'title': 'Coraline',
 'year': 2009}


### 4. Update all documents that match a condition using `db.collection.update_many()`
Update movies that were nominated for an award but didn't win by adding a `"summary"` sub-field to the `"awards"` field. Set the value of `"summary"` to `"Nominated but didn't win"`.

In [25]:
# Write your code here 
query = {"awards.nominations": {"$gt": 0}, "awards.wins": 0}
update = {"$set": {"awards.summary": "Nominated but didn't win"}}
result = movies.update_many(query, update)

print(f"Matched {result.matched_count} document(s), modified {result.modified_count} document(s)")

Matched 3506 document(s), modified 3506 document(s)


#### <span style="color:blue">Hints</span>
- Use the `$set` update operator. It will create the field specified in the update if it doesn't yet exist. 
- You should get a return value like `<pymongo.results.UpdateResult at 0x7f89e2c5c780>`.
- Related docs: [Update documents](https://docs.mongodb.com/manual/tutorial/update-documents/)
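
The difference between `matched_count` and `modified_count` is easiest to see when the same update runs twice: the second run still matches the same documents, but modifies none because `awards.summary` already holds the target value. The pure-Python model below illustrates this on a few made-up documents. `update_many_model` and the sample titles are hypothetical, and the function hard-codes this task's filter rather than parsing a general MongoDB query.

```python
import copy


def update_many_model(docs, summary_text):
    """Model update_many for the filter nominations > 0 and wins == 0,
    with a $set on awards.summary.

    Returns (updated_docs, matched_count, modified_count).
    """
    updated, matched, modified = [], 0, 0
    for doc in docs:
        doc = copy.deepcopy(doc)  # do not mutate the input list
        awards = doc.get("awards", {})
        if awards.get("nominations", 0) > 0 and awards.get("wins") == 0:
            matched += 1
            # A document counts as modified only if the value actually changes.
            if awards.get("summary") != summary_text:
                awards["summary"] = summary_text
                modified += 1
        updated.append(doc)
    return updated, matched, modified


sample = [
    {"title": "A", "awards": {"nominations": 8, "wins": 0}},
    {"title": "B", "awards": {"nominations": 3, "wins": 2}},
    {"title": "C", "awards": {"nominations": 0, "wins": 0}},
]
text = "Nominated but didn't win"
first_pass, matched1, modified1 = update_many_model(sample, text)
second_pass, matched2, modified2 = update_many_model(first_pass, text)
print(matched1, modified1)  # prints 1 1
print(matched2, modified2)  # prints 1 0
```

If you rerun the `update_many` cell in this notebook, expect the same pattern: an unchanged matched count and a modified count of zero.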

### Check that documents updated

To confirm that the documents were updated, query for movies that were nominated for an award but didn't win. The value of the `"awards.summary"` subfield should now be `"Nominated but didn't win"`. 

The query is limited to return three documents, for brevity's sake.

In [26]:
cursor = db.movies.find( {"awards.wins": 0, "awards.nominations": {"$gte": 1}}).limit(3)

for movie in cursor:
    pprint(movie)

{'_id': ObjectId('573a139af29313caabcf0eb3'),
 'awards': {'nominations': 8,
            'summary': "Nominated but didn't win",
            'text': '8 nominations.',
            'wins': 0},
 'cast': ['Johnny Depp', 'Heather Graham', 'Ian Holm', 'Robbie Coltrane'],
 'countries': ['USA'],
 'directors': ['Albert Hughes', 'Allen Hughes'],
 'genres': ['Horror', 'Mystery', 'Thriller'],
 'imdb': {'id': 120681, 'rating': 6.8, 'votes': 116865},
 'languages': ['English'],
 'plot': 'In Victorian Era London, a troubled clairvoyant police detective '
         'investigates the murders by Jack The Ripper.',
 'rated': 'R',
 'released': datetime.datetime(2001, 10, 19, 0, 0),
 'runtime': 122,
 'title': 'From Hell',
 'year': 2001}
{'_id': ObjectId('573a139af29313caabcf1e9c'),
 'awards': {'nominations': 2,
            'summary': "Nominated but didn't win",
            'text': '2 nominations.',
            'wins': 0},
 'cast': ['Sigourney Weaver',
          'Jennifer Love Hewitt',
          'Ray Liotta',
 

## Section wrap-up

Congratulations! In this section you created and manipulated documents in the `movies` collection of the `sample_mflix` database. Specifically, you:

- Inserted a new document into a given database and collection using `db.collection.insert_one()`
- Deleted all documents that match a condition using `db.collection.delete_many()` 
- Updated one document that matches a condition using `db.collection.update_one()`
- Updated all documents that match a condition using `db.collection.update_many()`

Your next step could be to use aggregation pipelines to do more complex data processing and work with documents across multiple collections in a database. 