# Using MongoDB with Python

This tutorial shows how to store and retrieve collections of documents in Python using MongoDB.

## 1. Installing MongoDB

Begin by installing MongoDB:

### 1.1 Installing MongoDB on macOS

You can find instructions for installing MongoDB on macOS [here](https://docs.mongodb.com/manual/tutorial/install-mongodb-on-os-x/).

Essentially we do:

```
brew install mongodb
```

MongoDB should automatically start as a background process when it is installed.

### 1.2 Installing MongoDB on Linux

Instructions for Ubuntu are [here](https://docs.mongodb.com/manual/tutorial/install-mongodb-on-ubuntu/).

Essentially:

```
sudo apt-get install mongodb
```

Unlike on macOS, however, MongoDB does not always start in the background once it is installed. To start it:

```
sudo service mongodb start
```

If that does not work, start the server directly and point it at your database directory:

```
mongod --dbpath=/path/to/database
```

### 1.3 Installing MongoDB on Windows

Windows is a whole weird beast with its own quirks, so we will not work directly with Windows. Instead you should follow the instructions [here](https://docs.microsoft.com/en-us/windows/wsl/install-win10) to install Windows Subsystem for Linux (WSL). We recommend installing Ubuntu since it is the simplest distribution to use.


### 1.4 Installing PyMongo

PyMongo is installed using pip3, so the steps are the same on all platforms:

```
source venv/bin/activate
pip3 install pymongo
```
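To confirm that the install landed in the environment you are actually using, a quick check along these lines can help. This is only a minimal sketch using the standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pymongo"):
    print("pymongo is available")
else:
    print("pymongo is missing; activate the virtual environment and run pip3 install pymongo")
```

If the check reports that pymongo is missing, the most likely cause is that pip3 installed it into a different environment from the one running your Python code.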

## 2. Working with MongoDB in Python

Now that all the installation is out of the way, let's see how to interface with MongoDB in Python.

### 2.1 Importing MongoClient from pymongo and Connecting

The main interface is MongoClient, which we import from pymongo. Once we have imported it we can call MongoClient to connect to MongoDB through the default port 27017:

In [8]:
from pymongo import MongoClient
from pymongo.errors import BulkWriteError
# Connect to the Mongo server
client = MongoClient('mongodb://localhost:27017/')



### 2.2 Establishing a Connection to the Database

Once we have connected to the MongoDB server, we can connect to our database. A database is a set of collections, while a collection is a set of documents. Here we will call our database "testdb", and we will have two collections: 'col1' and 'col2':

In [9]:
# Get a database object
mydb = client['testdb']

# Get first collection
mycol1 = mydb['col1']

# Get second collection
mycol2 = mydb['col2']


### 2.3 Creating and Inserting Documents

Documents in MongoDB are stored as JSON-like objects (BSON internally), which lets us search for documents by their fields. MongoDB enforces a 16MB limit on the size of a single document, so it is generally not practical to store large images directly in a document. You can, however, store images as files and put their pathnames in a MongoDB document.
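The size limit and the pathname approach can be sketched roughly as follows. Note the assumptions: the helper `approx_size_bytes`, the field name `cover_image_path`, and the file path are all invented for illustration, and the size is estimated from the JSON text, whereas MongoDB measures the BSON encoding, so treat the number as a ballpark figure only.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16MB per-document limit


def approx_size_bytes(doc):
    """Rough size estimate: length of the UTF-8 encoded JSON form.

    MongoDB measures the BSON encoding, so this is only an approximation.
    """
    return len(json.dumps(doc).encode("utf-8"))


# Hypothetical document: the image stays on disk, only its path is stored
photo_doc = {
    "author": "Gal Gadot",
    "title": "Being Wonderwoman",
    "cover_image_path": "/data/images/cover.png",  # made-up path
}

size = approx_size_bytes(photo_doc)
print(size, "bytes; fits within the limit:", size < MAX_DOCUMENT_BYTES)
```

Because only the short pathname is stored, the document stays tiny no matter how large the image file is.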

At any rate, we create three documents, insert two of them into collection col1, and insert one into collection col2.

In [11]:
test_doc = {"author":"Gal Gadot", "title":"Being Wonderwoman"}
test_doc_2 = {"author":"Victor Hugo", "title":"Les Miserables"}
test_doc_3 = {"author":"Jean Valjean", 
              "title":"Javert: What's His Problem?"}

# Insert first two documents into mycol1, which is the variable
# that we use to access the "col1" collection

result = mycol1.insert_one(test_doc)
result = mycol1.insert_one(test_doc_2)

# Insert the third document into mycol2
result = mycol2.insert_one(test_doc_3)

# We can print the results
print("Result of last insert: ", result)




Result of last insert:  <pymongo.results.InsertOneResult object at 0x10b109240>


We can also call "insert_many" to insert many documents at one time.  To help in debugging we can catch the BulkWriteError exception:

In [12]:
test_doc_4 = {"title":"Sleep is good", "SaleCount":4}
test_doc_5 = {"Chapter":4, "ChapterTitle":"Reflections, reflections"}
# Can also call insert_many. 
try:
    result = mycol2.insert_many([test_doc_4, test_doc_5])
except BulkWriteError as bwe:
    # This is how we debug
    print(bwe.details)


### 2.4 Querying the Database

Now let's look at how we can query the database. We can use "count_documents" to see if a record exists. If count_documents returns a 0 then the record does not exist:

In [13]:
num = mycol1.count_documents({"author":"Victor Hugo"})
if num == 0:
    print("Victor Hugo did not write any books! The bum!")
else:
    print("Victor Hugo only wrote %d book(s). The bum!" % num)


Victor Hugo only wrote 3 book(s). The bum!
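To build some intuition for what a simple query like the one above selects, here is a rough pure-Python model of equality matching. It is not how MongoDB is implemented, and it deliberately ignores query operators, nested fields, arrays, and null handling; the names `matches` and `count_matching` are invented for this sketch.

```python
_MISSING = object()  # sentinel so an absent field never equals a real value


def matches(doc, query):
    """True if every field in the query is present in doc with an equal value."""
    return all(doc.get(field, _MISSING) == value for field, value in query.items())


def count_matching(docs, query):
    """Count the documents in a plain list that satisfy the query."""
    return sum(1 for doc in docs if matches(doc, query))


books = [
    {"author": "Gal Gadot", "title": "Being Wonderwoman"},
    {"author": "Victor Hugo", "title": "Les Miserables"},
    {"title": "Sleep is good", "SaleCount": 4},  # no "author" field at all
]

print(count_matching(books, {"author": "Victor Hugo"}))  # 1
print(count_matching(books, {"author": "Nobody"}))       # 0
print(count_matching(books, {}))                         # 3 (empty query matches everything)
```

The third document has no "author" field, so it is never counted by an author query; an empty query, on the other hand, matches every document.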


To search the collection we can use the "find" function:

In [14]:
print("All collections:", mydb.list_collection_names())

# Search for all works by Gal Gadot:
results = mycol1.find({"author":"Gal Gadot"})

print("Result of Gal Gadot Query:")

for result in results:
    print(result)
    
# Alternatively we can search for one record:
result = mycol2.find_one({"author":"Jean Valjean"})
print("Search for Valjean:", result)    


All collections: ['col2', 'col1', 'user']
Result of Gal Gadot Query:
{'_id': ObjectId('60f0d822afe96feaa04a0043'), 'author': 'Gal Gadot', 'title': 'Being Wonderwoman'}
{'_id': ObjectId('60f0f01cafe96feaa04a0049'), 'author': 'Gal Gadot', 'title': 'Being Wonderwoman'}
{'_id': ObjectId('60f0f02eafe96feaa04a004c'), 'author': 'Gal Gadot', 'title': 'Being Wonderwoman'}
Search for Valjean: {'_id': ObjectId('60f0d822afe96feaa04a0045'), 'author': 'Jean Valjean', 'title': "Javert: What's His Problem?"}


We can of course do better in printing our records:


In [15]:
print("Author= %s, Title= %s" % (result['author'], result['title']))

Author= Jean Valjean, Title= Javert: What's His Problem?


### 2.5 Updating Records

In addition to creating and searching for records, we also want to update records, and for this we use the "$set" operator and the "update_one" operation. The "update_one" operation updates the first record matching the query that we provide.

Here we change the author "Gal Gadot" to "Diana Prince". For a full view of how to do this, we first use count_documents to ensure that the record exists,  and then update it.

(Here we use update_one to update one record. You can also use update_many to bulk update all records that match the query.)

---

Note: You can also use "find_one" and check whether the returned result is None. You cannot use "find" for this, as it always returns a Cursor object even if no record exists, and there is no convenient way of measuring the number of items in the Cursor.

---
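The "find_one" approach from the note might look roughly like the sketch below. The function `rename_author` is a made-up helper that works with any object offering `find_one` and `update_one`, such as a PyMongo collection. So that the sketch runs without a server, it is exercised here against `StubCollection`, a hypothetical in-memory stand-in that only mimics those two calls for flat equality queries.

```python
def rename_author(collection, old_name, new_name):
    """Rename the first record by old_name; return False if none exists."""
    # find_one returns None when no record matches the query
    if collection.find_one({"author": old_name}) is None:
        return False
    collection.update_one({"author": old_name}, {"$set": {"author": new_name}})
    return True


class StubCollection:
    """Hypothetical in-memory stand-in, NOT part of PyMongo."""

    def __init__(self, docs):
        self.docs = docs

    def find_one(self, query):
        for doc in self.docs:
            if all(k in doc and doc[k] == v for k, v in query.items()):
                return doc
        return None

    def update_one(self, query, update):
        doc = self.find_one(query)
        if doc is not None:
            doc.update(update["$set"])  # only "$set" is modelled here


stub = StubCollection([{"author": "Gal Gadot", "title": "Being Wonderwoman"}])
print(rename_author(stub, "Gal Gadot", "Diana Prince"))  # True
print(rename_author(stub, "Gal Gadot", "Diana Prince"))  # False, already renamed
```

With a real server you would pass a collection such as mycol1 instead of the stub.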



In [16]:
query = {"author":"Gal Gadot"}

# Check that the author Gal Gadot exists
if mycol1.count_documents(query) > 0:
    mycol1.update_one(query, {"$set":{"author":"Diana Prince"}})
    
    result = mycol1.find_one({"author":"Diana Prince"})
    print("Result of searching for Diana Prince: ", result)
    
else:
    print("Author does not exist")
        


Result of searching for Diana Prince:  {'_id': ObjectId('60f0d822afe96feaa04a0043'), 'author': 'Diana Prince', 'title': 'Being Wonderwoman'}


### 2.6 Delete Records and Collections

Finally, we can also delete records and collections. We use "delete_one" to delete the first record that matches the query, and "delete_many" to delete all records that match the query.

We can also use .drop() to delete the entire collection (!!).

In [17]:

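# We can delete records. delete_many removes every record that matches
# the query (delete_one would remove only the first match)
query = {"author":"Jean Valjean"}
mycol2.delete_many(query)

print("\nDeleted all Valjean")
# Search the same collection to confirm that nothing is left
for result in mycol2.find(query):
    print(result)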
    
print("\nDropping collections")
mycol1.drop()
mycol2.drop()

results = mycol1.find({"author":"Diana Prince"})

for result in results:
    print(result)


Deleted all Valjean

Dropping collections
