# MongoDB PyMongo [Tutorial](http://api.mongodb.com/python/current/tutorial.html), [Back to Basics](https://www.mongodb.com/presentations/back-to-basics-introduction-to-mongodb) and [Beyond the Basics](https://www.mongodb.com/presentations/beyond-the-basics-1)

## Modules

In [1]:
import pymongo
from pymongo import MongoClient
import datetime
import pprint
from bson.objectid import ObjectId

## Creating connection, database and collection

In [2]:
client = MongoClient('mongodb://localhost:27017/')
db = client['test-database']
collection = db['test-collection']

## Documents

Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:

In [3]:
post = {"author": "Mike",
         "text": "My first blog post!",
         "tags": ["mongodb", "python", "pymongo"],
         "date": datetime.datetime.now(tz=datetime.timezone.utc)}

### Inserting a Document

In [4]:
posts = db.posts
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('5931570f89df412dd8d605d6')

When a document is inserted a special key, *"_id"*, is automatically added if the document doesn’t already contain an *"_id"* key.
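The generated *"_id"* is an ObjectId, whose first four bytes hold its creation time as seconds since the Unix epoch. As a minimal sketch using only the standard library, the hypothetical helper `object_id_timestamp` below decodes that timestamp from the hex string shown above. PyMongo exposes the same information through the `generation_time` attribute of an ObjectId.

```python
import datetime


def object_id_timestamp(oid_hex):
    """Decode the creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId has 24 hexadecimal characters")
    # The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


created = object_id_timestamp("5931570f89df412dd8d605d6")
print(created.isoformat())
```

The decoded time falls within a second of the *"date"* value stored in the post, as expected for a document inserted right after it was built.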

After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database:

In [42]:
db.list_collection_names()  # collection_names() was removed in PyMongo 4

['posts', 'profiles']

### Updating a Document

In [87]:
db.posts.update_many({"author":"Mike"},
                {"$set": {"tags":["new updated tag", "mongodb", "python", "pymongo"]}})

<pymongo.results.UpdateResult at 0x47c6c10>

The **$push** operator appends a specified value to an array.

In [89]:
db.posts.update_many({"author":"Mike"},
                {"$push": {"tags":"last tag"}})

<pymongo.results.UpdateResult at 0x47c6968>

In [90]:
db.posts.find_one({"author": "Mike"})

{'_id': ObjectId('5931570f89df412dd8d605d6'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 2, 12, 16, 14, 522000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}
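To make the difference between the two operators concrete, here is a local, pure-Python sketch that mimics **$set** and **$push** on a plain dictionary. It only illustrates the semantics for top-level fields and is not how the server implements updates; `apply_update` is a hypothetical helper, not part of PyMongo.

```python
import copy


def apply_update(document, update):
    """Return a copy of `document` with "$set" and "$push" applied (top-level fields only)."""
    result = copy.deepcopy(document)
    for field, value in update.get("$set", {}).items():
        # $set replaces the whole value of the field.
        result[field] = value
    for field, value in update.get("$push", {}).items():
        # $push appends a single element, creating the array if it is missing.
        result.setdefault(field, []).append(value)
    return result


post = {"author": "Mike", "tags": ["mongodb", "python", "pymongo"]}
post = apply_update(post, {"$set": {"tags": ["new updated tag", "mongodb", "python", "pymongo"]}})
post = apply_update(post, {"$push": {"tags": "last tag"}})
print(post["tags"])
```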

### Getting a Single Document With find_one()

The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match.

In [25]:
posts.find_one()

{'_id': ObjectId('5931570f89df412dd8d605d6'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 2, 12, 16, 14, 522000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}

In [26]:
posts.find_one({"author": "Mike"})

{'_id': ObjectId('5931570f89df412dd8d605d6'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 2, 12, 16, 14, 522000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}

In [27]:
posts.find_one({"author": "Eliot"}) # Returns None until a post by "Eliot" exists (see Bulk Inserts)

{'_id': ObjectId('5931571789df412dd8d605d8'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}

#### Projection
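Pass a [projection](https://docs.mongodb.com/manual/reference/glossary/#term-projection) document as the second argument to find() or find_one() to return only the fields you need. Fields set to 1 are included, and *"_id"* is included unless it is explicitly set to 0. The positional $ operator can additionally be used in the projection when you only need one particular array element in the selected documents.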

In [40]:
posts.find_one( {}, { "_id" : 1, "author" : 1, "tags" : 1 } )

{'_id': ObjectId('5931570f89df412dd8d605d6'),
 'author': 'Mike',
 'tags': ['mongodb', 'python', 'pymongo']}
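The inclusion rules of a projection can be sketched locally in pure Python. The hypothetical helper `project` below mimics an inclusion projection on top-level fields only, and the sample document uses a plain string for *"_id"* to keep the example free of dependencies. It is an illustration of the behaviour, not the server's implementation.

```python
def project(document, projection):
    """Mimic an inclusion projection on top-level fields."""
    wanted = {field for field, flag in projection.items() if flag}
    # "_id" is returned unless the projection explicitly sets it to 0.
    if projection.get("_id", 1):
        wanted.add("_id")
    return {field: value for field, value in document.items() if field in wanted}


doc = {
    "_id": "5931570f89df412dd8d605d6",
    "author": "Mike",
    "text": "My first blog post!",
    "tags": ["mongodb", "python", "pymongo"],
}

with_id = project(doc, {"author": 1, "tags": 1})
without_id = project(doc, {"_id": 0, "author": 1})
print(with_id)
print(without_id)
```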

### Querying By ObjectId

In [29]:
posts.find_one({"_id": post_id})

{'_id': ObjectId('5931570f89df412dd8d605d6'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 2, 12, 16, 14, 522000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}

ObjectId is not the same as its string representation:

In [10]:
post_id_as_str = str(post_id)
posts.find_one({"_id": post_id_as_str}) # No result

In [30]:
print(post_id)
print(post_id_as_str)

5931570f89df412dd8d605d6
5931570f89df412dd8d605d6


A common task in web applications is to get an ObjectId from the request URL and find the matching document. It’s necessary in this case to **convert the ObjectId from a string** before passing it to *find_one*:

In [12]:
from bson.objectid import ObjectId

# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId before querying;
    # the posts collection defined above is used here.
    return posts.find_one({'_id': ObjectId(post_id)})
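Because the string comes from a URL, it may not be a valid ObjectId at all, and `ObjectId(...)` raises `bson.errors.InvalidId` for malformed input. One possible guard, sketched here with only the standard library, is to check for 24 hexadecimal characters before converting. The helper name `looks_like_object_id` is hypothetical; PyMongo's own `ObjectId.is_valid()` can be used for the same purpose.

```python
import re

# An ObjectId is 12 bytes, so its string form is exactly 24 hex characters.
_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """Return True if `value` is a string that could be converted to an ObjectId."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None


print(looks_like_object_id("5931570f89df412dd8d605d6"))
print(looks_like_object_id("not-an-id"))
```

A request handler could return a "not found" response early when the check fails, instead of letting the conversion error propagate.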

### Bulk Inserts

In addition to inserting a single document, we can also perform bulk insert operations, by passing a list as the first argument to *insert_many()*.

In [13]:
new_posts = [{"author": "Mike",
              "text": "Another post!",
              "tags": ["bulk", "insert"],
              "date": datetime.datetime(2009, 11, 12, 11, 14)},
             {"author": "Eliot",
              "title": "MongoDB is fun",
              "text": "and pretty easy too!",
              "date": datetime.datetime(2009, 11, 10, 10, 45)}]
result = posts.insert_many(new_posts)
result.inserted_ids

[ObjectId('5931571789df412dd8d605d7'), ObjectId('5931571789df412dd8d605d8')]

There are a couple of interesting things to note about this example:
- The result from *insert_many()* now returns two ObjectId instances, one for each inserted document.
- *new_posts[1]* has a different “shape” than the other posts - there is no *"tags"* field and we’ve added a new field, *"title"*. This is what we mean when we say that MongoDB is *schema-free*.
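Since documents in one collection can differ in shape, code that reads them should not assume every field is present. A minimal sketch of that idea, using trimmed copies of the two posts above and a hypothetical `summarize` helper with made-up placeholder strings:

```python
new_posts = [
    {"author": "Mike", "text": "Another post!", "tags": ["bulk", "insert"]},
    {"author": "Eliot", "title": "MongoDB is fun", "text": "and pretty easy too!"},
]


def summarize(post):
    """Build a one-line summary, tolerating missing "title" and "tags" fields."""
    title = post.get("title", "(untitled)")
    tags = ", ".join(post.get("tags", [])) or "(no tags)"
    return f"{post['author']}: {title} [{tags}]"


summaries = [summarize(p) for p in new_posts]
for line in summaries:
    print(line)
```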

### Querying for More Than One Document

To get more than a single document as the result of a query we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents:

In [33]:
for post in posts.find():
    pprint.pprint(post)

{'_id': ObjectId('5931570f89df412dd8d605d6'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 2, 12, 16, 14, 522000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
{'_id': ObjectId('5931571789df412dd8d605d7'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}
{'_id': ObjectId('5931571789df412dd8d605d8'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}


In [15]:
for post in posts.find({"author": "Mike"}):
    pprint.pprint(post)

{'_id': ObjectId('5931570f89df412dd8d605d6'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 2, 12, 16, 14, 522000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}
{'_id': ObjectId('5931571789df412dd8d605d7'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}


### Counting

In [16]:
posts.count_documents({})  # Collection.count() was removed in PyMongo 4

3

In [17]:
posts.count_documents({"author": "Mike"})  # Cursor.count() was removed in PyMongo 4

2

### Range Queries

MongoDB supports many different types of [advanced queries](http://www.mongodb.org/display/DOCS/Advanced+Queries). As an example, let's perform a query where we limit results to posts older than a certain date, and also sort the results by author:

In [18]:
d = datetime.datetime(2009, 11, 12, 12)

# Here we use the special "$lt" operator to do a range query, and also call sort() to sort the results by author.
for post in posts.find({"date": {"$lt": d}}).sort("author"):
    pprint.pprint(post)

{'_id': ObjectId('5931571789df412dd8d605d8'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}
{'_id': ObjectId('5931571789df412dd8d605d7'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['bulk', 'insert'],
 'text': 'Another post!'}
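Range operators can be combined in a single filter to express a window with both a lower and an upper bound. The sketch below builds such a filter as a plain dictionary; `date_window` is a hypothetical helper, and the final check applies the same bounds locally to two of the dates used above so the example runs without a server.

```python
import datetime


def date_window(start, end):
    """Build a filter matching documents with start <= date < end."""
    return {"date": {"$gte": start, "$lt": end}}


query = date_window(datetime.datetime(2009, 11, 1), datetime.datetime(2009, 12, 1))
# With a live collection this would be used as: posts.find(query).sort("author")

# Local check of the same bounds against sample dates.
sample_dates = [
    datetime.datetime(2009, 11, 12, 11, 14),
    datetime.datetime(2017, 6, 2, 12, 16, 14),
]
bounds = query["date"]
in_window = [d for d in sample_dates if bounds["$gte"] <= d < bounds["$lt"]]
print(in_window)
```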


### Indexing

Adding [indexes](https://docs.mongodb.com/manual/indexes/) can help accelerate certain queries and can also add additional functionality to querying and storing documents. In this example, we’ll demonstrate how to create a [unique index](http://docs.mongodb.org/manual/core/index-unique/) on a key that rejects documents whose value for that key already exists in the index.

First, we’ll need to create the index:

In [19]:
result = db.profiles.create_index([('user_id', pymongo.ASCENDING)], unique=True)

sorted(list(db.profiles.index_information()))

['_id_', 'user_id_1']

Notice that we have two indexes now: one is the index on *_id* that MongoDB creates automatically, and the other is the index on *user_id* we just created.

In [20]:
user_profiles = [
    {'user_id': 211, 'name': 'Luke'},
    {'user_id': 212, 'name': 'Ziltoid'}]

result = db.profiles.insert_many(user_profiles)

The index prevents us from inserting a document whose *user_id* is already in the collection:

In [21]:
new_profile = {'user_id': 213, 'name': 'Drew'}
duplicate_profile = {'user_id': 212, 'name': 'Tommy'}
result = db.profiles.insert_one(new_profile)  # This is fine.
result = db.profiles.insert_one(duplicate_profile) # This raises DuplicateKeyError

DuplicateKeyError: E11000 duplicate key error collection: test-database.profiles index: user_id_1 dup key: { : 212 }

### Deleting documents/collections

In [111]:
# Delete documents
db.posts.delete_many({})

<pymongo.results.DeleteResult at 0x49495a8>

In [34]:
posts.find_one()

{'_id': ObjectId('5931570f89df412dd8d605d6'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 2, 12, 16, 14, 522000),
 'tags': ['mongodb', 'python', 'pymongo'],
 'text': 'My first blog post!'}

In [113]:
db.list_collection_names() # at the moment we have 2 collections

['posts', 'profiles']

In [114]:
# Delete collection
db.profiles.drop()

In [115]:
db.list_collection_names()

['posts']

## Full Text Indexes

### Hence

### Using Weights

### $textscore

### Other parameters

## Geospatial Indexes

- Coordinates in MongoDB are stored in longitude/latitude order
- Coordinates in Google are stored in latitude/longitude order
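Because the two orders are easy to mix up, it can help to convert through one small function. The sketch below takes a latitude/longitude pair and returns a GeoJSON Point with the coordinates in longitude/latitude order; `geojson_point` is a hypothetical helper and the sample values are illustrative coordinates in New York City.

```python
def geojson_point(latitude, longitude):
    """Build a GeoJSON Point from a latitude/longitude pair."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    # GeoJSON lists longitude first, then latitude.
    return {"type": "Point", "coordinates": [longitude, latitude]}


point = geojson_point(40.7484, -73.9857)
print(point)
```

The range checks also catch some swapped arguments, since a longitude beyond ±90 passed as the latitude is rejected.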

### Coordinates

### Importing datasets

Datasets: [neighborhoods](https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/neighborhoods.json) and [restaurants](https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/restaurants.json)

Run from the command line (with MongoDB's bin directory on your PATH):
~~~
mongoimport -c neighborhoods -d geo neighborhoods.json
mongoimport -c restaurants -d geo restaurants.json
~~~