# MongoDB PyMongo [Tutorial](http://api.mongodb.com/python/current/tutorial.html), [Back to Basics](https://www.mongodb.com/presentations/back-to-basics-introduction-to-mongodb) and [Beyond the Basics](https://www.mongodb.com/presentations/beyond-the-basics-1)

## Modules

In [1]:
import pymongo
from pymongo import MongoClient
import datetime
import pprint
from bson.objectid import ObjectId
from bson.son import SON

## Creating connection, database and collection variables

In [2]:
client = MongoClient("mongodb://localhost:27017/")
db = client["test-database"]
posts = db["posts"] # collection

## Documents

Data in MongoDB is represented (and stored) using JSON-style documents. In PyMongo we use dictionaries to represent documents. As an example, the following dictionary might be used to represent a blog post:

In [3]:
post = {"author": "Mike",
         "text": "My first blog post!",
         "tags": ["mongodb", "python", "pymongo"],
         "date": datetime.datetime.utcnow()}

### Inserting a Document

In [4]:
post_id = posts.insert_one(post).inserted_id
post_id

ObjectId('594b85dc5069ef619c87206a')

When a document is inserted a special key, *"_id"*, is automatically added if the document doesn’t already contain an *"_id"* key.

After inserting the first document, the posts collection has actually been created on the server. We can verify this by listing all of the collections in our database:

In [5]:
db.list_collection_names()

['posts']

### Bulk Inserts

In addition to inserting a single document, we can also perform bulk insert operations, by passing a list as the first argument to *insert_many()*.

In [6]:
new_posts = [{"author": "Mike",
              "text": "Another post!",
              "tags": ["bulk", "insert"],
              "date": datetime.datetime(2009, 11, 12, 11, 14)},
             {"author": "Eliot",
              "title": "MongoDB is fun",
              "text": "and pretty easy too!",
              "date": datetime.datetime(2009, 11, 10, 10, 45)}]
result = posts.insert_many(new_posts)
result.inserted_ids

[ObjectId('594b85ea5069ef619c87206b'), ObjectId('594b85ea5069ef619c87206c')]

There are a couple of interesting things to note about this example:
- The result from *insert_many()* now returns two ObjectId instances, one for each inserted document.
- *new_posts[1]* has a different “shape” than the other posts - there is no *"tags"* field and we’ve added a new field, *"title"*. This is what we mean when we say that MongoDB is *schema-free*.
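The difference in shape can be checked directly on the Python dictionaries before they reach the server. The snippet below is a minimal sketch that rebuilds the two documents from the cell above and compares their key sets; the names `only_in_first` and `only_in_second` are illustrative and not part of PyMongo.

```python
import datetime

new_posts = [
    {"author": "Mike",
     "text": "Another post!",
     "tags": ["bulk", "insert"],
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot",
     "title": "MongoDB is fun",
     "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)},
]

# A document's "shape" is simply the set of keys it carries.
first_keys, second_keys = (set(post) for post in new_posts)

only_in_first = first_keys - second_keys    # fields missing from the second post
only_in_second = second_keys - first_keys   # fields missing from the first post

print(sorted(only_in_first))   # ['tags']
print(sorted(only_in_second))  # ['title']
```

Nothing in the collection forbids this mismatch; any validation of the shape is left to the application.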

### Updating a Document

In [7]:
db.posts.update_many({"author":"Mike"},
                {"$set": {"tags":["new updated tag", "mongodb", "python", "pymongo"]}})

<pymongo.results.UpdateResult at 0x109edc8>

The **$push** operator appends a specified value to an array.

In [8]:
db.posts.update_many({"author":"Mike"},
                {"$push": {"tags":"last tag"}})

<pymongo.results.UpdateResult at 0x10ae418>

In [9]:
db.posts.find_one({"author": "Mike"})

{'_id': ObjectId('59459eba5069ef52a894c794'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 17, 21, 27, 7, 907000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}

## Querying

### Getting a Single Document With find_one()

The most basic type of query that can be performed in MongoDB is find_one(). This method returns a single document matching a query (or None if there are no matches). It is useful when you know there is only one matching document, or are only interested in the first match.

In [10]:
posts.find_one()

{'_id': ObjectId('59459eba5069ef52a894c794'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 17, 21, 27, 7, 907000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}

In [11]:
posts.find_one({"author": "Mike"})

{'_id': ObjectId('59459eba5069ef52a894c794'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 17, 21, 27, 7, 907000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}

In [12]:
posts.find_one({"author": "Eliot"})

{'_id': ObjectId('594b85ea5069ef619c87206c'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}

#### Projection
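A [projection](https://docs.mongodb.com/manual/reference/glossary/#term-projection) document, passed as the second argument to *find()* or *find_one()*, selects which fields are returned: set a field to 1 to include it. The positional \$ operator can additionally be used in a projection when you only need one particular array element from the selected documents.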

In [13]:
posts.find_one( {}, { "_id" : 1, "author" : 1, "tags" : 1 } )

{'_id': ObjectId('59459eba5069ef52a894c794'),
 'author': 'Mike',
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag']}

### Querying By ObjectId

In [14]:
posts.find_one({"_id": post_id})

{'_id': ObjectId('594b85dc5069ef619c87206a'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 22, 8, 54, 51, 590000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}

*ObjectId* is not the same as its string representation:

In [15]:
post_id_as_str = str(post_id)
posts.find_one({"_id": post_id_as_str}) # No result

This is true even though both values print the same text:

In [16]:
print(post_id)
print(post_id_as_str)

594b85dc5069ef619c87206a
594b85dc5069ef619c87206a


A common task in web applications is to get an ObjectId from the request URL and find the matching document. It’s necessary in this case to **convert the ObjectId from a string** before passing it to *find_one*:

In [17]:
# The web framework gets post_id from the URL and passes it as a string
def get(post_id):
    # Convert from string to ObjectId:
    document = posts.find_one({'_id': ObjectId(post_id)})
    return document

In [18]:
get("594b85dc5069ef619c87206a") # Now it returns the document

{'_id': ObjectId('594b85dc5069ef619c87206a'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 22, 8, 54, 51, 590000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}
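A string taken from a URL may not be a valid ObjectId at all, so it can help to check it before converting. The sketch below uses only the standard library and relies on two facts about ObjectIds: their string form is 24 hexadecimal characters, and the first 4 bytes encode the creation time in seconds since the Unix epoch. The helper names `looks_like_object_id` and `object_id_timestamp` are hypothetical; `bson` itself offers `ObjectId.is_valid()` and the `generation_time` property for the same purposes.

```python
import datetime
import re

_OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """Return True if value is a 24-character hexadecimal string."""
    return isinstance(value, str) and _OBJECT_ID_RE.fullmatch(value) is not None


def object_id_timestamp(hex_id):
    """Decode the creation time stored in the first 4 bytes (8 hex digits)."""
    seconds = int(hex_id[:8], 16)
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


print(looks_like_object_id("594b85dc5069ef619c87206a"))  # True
print(looks_like_object_id("not-an-id"))                 # False
print(object_id_timestamp("594b85dc5069ef619c87206a"))   # 2017-06-22 08:54:52+00:00
```

The decoded time agrees, to the second, with the *date* field of the post inserted earlier, which was created a moment before its ObjectId was generated.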

### Querying for More Than One Document

To get more than a single document as the result of a query we use the find() method. find() returns a Cursor instance, which allows us to iterate over all matching documents:

In [19]:
for post in posts.find():
    pprint.pprint(post)

{'_id': ObjectId('59459eba5069ef52a894c794'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 17, 21, 27, 7, 907000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}
{'_id': ObjectId('594b85dc5069ef619c87206a'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 22, 8, 54, 51, 590000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}
{'_id': ObjectId('594b85ea5069ef619c87206b'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'Another post!'}
{'_id': ObjectId('594b85ea5069ef619c87206c'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}


In [20]:
for post in posts.find({"author": "Mike"}):
    pprint.pprint(post)

{'_id': ObjectId('59459eba5069ef52a894c794'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 17, 21, 27, 7, 907000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}
{'_id': ObjectId('594b85dc5069ef619c87206a'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 22, 8, 54, 51, 590000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}
{'_id': ObjectId('594b85ea5069ef619c87206b'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'Another post!'}


### Counting

In [21]:
posts.count_documents({})

4

In [22]:
posts.count_documents({"author": "Mike"})

3

### Range Queries

MongoDB supports many different types of [advanced queries](http://www.mongodb.org/display/DOCS/Advanced+Queries). As an example, let's perform a query where we limit results to posts older than a certain date, but also sort the results by author.

Here we use the special *$lt* operator to do a range query, and also call *sort()* to sort the results by author:


In [23]:
d = datetime.datetime(2009, 11, 12, 12)

for post in posts.find({"date": {"$lt": d}}).sort("author"):
    pprint.pprint(post)

{'_id': ObjectId('594b85ea5069ef619c87206c'),
 'author': 'Eliot',
 'date': datetime.datetime(2009, 11, 10, 10, 45),
 'text': 'and pretty easy too!',
 'title': 'MongoDB is fun'}
{'_id': ObjectId('594b85ea5069ef619c87206b'),
 'author': 'Mike',
 'date': datetime.datetime(2009, 11, 12, 11, 14),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'Another post!'}
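To make the semantics of the filter and the sort concrete, here is a minimal pure-Python sketch of the same operation on an in-memory list. It is only an emulation for illustration (the server does the real work) and the `docs` list is a trimmed copy of the posts used in this section.

```python
import datetime

docs = [
    {"author": "Mike", "text": "My first blog post!",
     "date": datetime.datetime(2017, 6, 22, 8, 54, 51)},
    {"author": "Mike", "text": "Another post!",
     "date": datetime.datetime(2009, 11, 12, 11, 14)},
    {"author": "Eliot", "text": "and pretty easy too!",
     "date": datetime.datetime(2009, 11, 10, 10, 45)},
]

d = datetime.datetime(2009, 11, 12, 12)

# {"date": {"$lt": d}} keeps documents whose date is strictly earlier than d;
# .sort("author") then orders the remaining documents by author, ascending.
older = sorted((doc for doc in docs if doc["date"] < d),
               key=lambda doc: doc["author"])

print([doc["author"] for doc in older])  # ['Eliot', 'Mike']
```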


### Explain plan

*explain()* provides information on the query plan for the *db.collection.find()* method.

In [24]:
posts.find({"date": {"$lt": d}}).sort("author").explain()

{'executionStats': {'allPlansExecution': [],
  'executionStages': {'advanced': 2,
   'executionTimeMillisEstimate': 0,
   'inputStage': {'advanced': 2,
    'executionTimeMillisEstimate': 0,
    'inputStage': {'advanced': 2,
     'direction': 'forward',
     'docsExamined': 4,
     'executionTimeMillisEstimate': 0,
     'filter': {'date': {'$lt': datetime.datetime(2009, 11, 12, 12, 0)}},
     'invalidates': 0,
     'isEOF': 1,
     'nReturned': 2,
     'needTime': 3,
     'needYield': 0,
     'restoreState': 0,
     'saveState': 0,
     'stage': 'COLLSCAN',
     'works': 6},
    'invalidates': 0,
    'isEOF': 1,
    'nReturned': 2,
    'needTime': 4,
    'needYield': 0,
    'restoreState': 0,
    'saveState': 0,
    'stage': 'SORT_KEY_GENERATOR',
    'works': 7},
   'invalidates': 0,
   'isEOF': 1,
   'memLimit': 33554432,
   'memUsage': 298,
   'nReturned': 2,
   'needTime': 7,
   'needYield': 0,
   'restoreState': 0,
   'saveState': 0,
   'sortPattern': {'author': 1},
   'stage': 'SOR

In [25]:
# see only the executionStats
posts.find({"date": {"$lt": d}}).sort("author").explain()['executionStats']

{'allPlansExecution': [],
 'executionStages': {'advanced': 2,
  'executionTimeMillisEstimate': 0,
  'inputStage': {'advanced': 2,
   'executionTimeMillisEstimate': 0,
   'inputStage': {'advanced': 2,
    'direction': 'forward',
    'docsExamined': 4,
    'executionTimeMillisEstimate': 0,
    'filter': {'date': {'$lt': datetime.datetime(2009, 11, 12, 12, 0)}},
    'invalidates': 0,
    'isEOF': 1,
    'nReturned': 2,
    'needTime': 3,
    'needYield': 0,
    'restoreState': 0,
    'saveState': 0,
    'stage': 'COLLSCAN',
    'works': 6},
   'invalidates': 0,
   'isEOF': 1,
   'nReturned': 2,
   'needTime': 4,
   'needYield': 0,
   'restoreState': 0,
   'saveState': 0,
   'stage': 'SORT_KEY_GENERATOR',
   'works': 7},
  'invalidates': 0,
  'isEOF': 1,
  'memLimit': 33554432,
  'memUsage': 298,
  'nReturned': 2,
  'needTime': 7,
  'needYield': 0,
  'restoreState': 0,
  'saveState': 0,
  'sortPattern': {'author': 1},
  'stage': 'SORT',
  'works': 10},
 'executionSuccess': True,
 'executio
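The nested *executionStages* document is easier to read once it is flattened into a list of stage names. The helper below is a small sketch that follows the chain of *inputStage* keys; `plan_stages` is a hypothetical name, the dictionary is a trimmed copy of the output above, and plans that branch through a plural *inputStages* key are not handled.

```python
def plan_stages(execution_stages):
    """Return stage names from the outermost stage down to the innermost."""
    stages = []
    node = execution_stages
    while node is not None:
        stages.append(node["stage"])
        node = node.get("inputStage")  # None once the innermost stage is reached
    return stages


# Trimmed copy of the 'executionStages' document shown above.
execution_stages = {
    "stage": "SORT",
    "nReturned": 2,
    "inputStage": {
        "stage": "SORT_KEY_GENERATOR",
        "nReturned": 2,
        "inputStage": {"stage": "COLLSCAN", "docsExamined": 4, "nReturned": 2},
    },
}

print(plan_stages(execution_stages))  # ['SORT', 'SORT_KEY_GENERATOR', 'COLLSCAN']
```

A *COLLSCAN* at the bottom of the chain means the whole collection was scanned: here 4 documents were examined to return 2, because there is no index on *date*.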

### Deleting documents/collections

Delete the documents whose author is Eliot:

In [26]:
db.posts.delete_many({"author" : "Eliot"})

<pymongo.results.DeleteResult at 0x6c83418>

In [27]:
posts.find_one()

{'_id': ObjectId('59459eba5069ef52a894c794'),
 'author': 'Mike',
 'date': datetime.datetime(2017, 6, 17, 21, 27, 7, 907000),
 'tags': ['new updated tag', 'mongodb', 'python', 'pymongo', 'last tag'],
 'text': 'My first blog post!'}

Delete all documents:

In [28]:
db.posts.delete_many({})

<pymongo.results.DeleteResult at 0x6c83620>

In [29]:
posts.find_one() # no result

In [30]:
# at the moment we still have the posts collection
db.list_collection_names()

['posts']

In [31]:
# Delete posts collection
db.posts.drop()

In [32]:
# now the posts collection is gone
db.list_collection_names()

[]

### Indexing

Adding [indexes](https://docs.mongodb.com/manual/indexes/) can help accelerate certain queries and can also add additional functionality to querying and storing documents. In this example, we’ll demonstrate how to create a [unique index](http://docs.mongodb.org/manual/core/index-unique/) on a key that rejects documents whose value for that key already exists in the index.

First, we’ll need to create the index:

In [33]:
result = db.profiles.create_index([("user_id", pymongo.ASCENDING)], unique=True)

Now we can see the indexes:

In [34]:
db.profiles.index_information()

{'_id_': {'key': [('_id', 1)], 'ns': 'test-database.profiles', 'v': 2},
 'user_id_1': {'key': [('user_id', 1)],
  'ns': 'test-database.profiles',
  'unique': True,
  'v': 2}}

Showing only the names of the indexes:

In [35]:
sorted(list(db.profiles.index_information()))

['_id_', 'user_id_1']

Notice that we have two indexes now: one is the index on *_id* that MongoDB creates automatically, and the other is the index on *user_id* we just created.

In [36]:
user_profiles = [
    {"user_id": 211, "name": "Luke"},
    {"user_id": 212, "name": "Ziltoid"}]

result = db.profiles.insert_many(user_profiles)

The index prevents us from inserting a document whose *user_id* is already in the collection:

In [37]:
new_profile = {"user_id": 213, "name": "Drew"}
duplicate_profile = {"user_id": 212, "name": "Tommy"}

In [38]:
result = db.profiles.insert_one(new_profile)  # This is fine.

In [39]:
result = db.profiles.insert_one(duplicate_profile) # This raises DuplicateKeyError

DuplicateKeyError: E11000 duplicate key error collection: test-database.profiles index: user_id_1 dup key: { : 212 }

Now we will **drop** the indexes:

In [40]:
db.profiles.drop_indexes()

Now the *user_id_1* index is gone:

In [41]:
sorted(list(db.profiles.index_information()))

['_id_']

## Geospatial Indexes

### Coordinates

- Coordinates in MongoDB are stored in Longitude/Latitude order
- Coordinates in Google are stored in Latitude/Longitude order
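Because the two orders are easy to mix up, it can be convenient to do the swap in one place. The following is a minimal sketch; `to_geojson_point` is a hypothetical helper that takes a pair in the Latitude/Longitude order used by Google and returns a GeoJSON point in the Longitude/Latitude order MongoDB expects.

```python
def to_geojson_point(latitude, longitude):
    """Build a GeoJSON Point; GeoJSON lists longitude first, then latitude."""
    return {"type": "Point", "coordinates": [longitude, latitude]}


# Latitude/Longitude as copied from a map service:
print(to_geojson_point(40.848447, -73.856077))
# {'type': 'Point', 'coordinates': [-73.856077, 40.848447]}
```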

### Connecting to another database

In [42]:
client = MongoClient("mongodb://localhost:27017/")
dbgeo = client["geo"]
polygons = dbgeo["polygons"] # collection

### Inserting test case and creating geospatial index

In [43]:
# poly1 will be our reference polygon
poly1 = { "type" : "Polygon", "coordinates" : [[[0, 0], [3, 0], [0, 3], [0, 0]]] }

# poly2 is a smaller triangle inside poly1
poly2 = { "type" : "Polygon", "coordinates" : [[[1, 1], [2, 1], [1, 2], [1, 1]]] }
# poly3 is poly1 flipped around its "vertical" edge, then bumped over one unit
# so it intersects poly1 but is not contained in it
poly3 = { "type" : "Polygon", "coordinates" : [[[1, 0], [-2, 0], [1, 3], [1, 0]]] }

In [44]:
# as poly1 is our reference we will only insert poly2 and poly3
dbgeo.polygons.insert_one({ "loc" : poly2 })
dbgeo.polygons.insert_one({ "loc" : poly3 })

<pymongo.results.InsertOneResult at 0xe04dc8>

In [45]:
# Create index
dbgeo.polygons.create_index([("loc", "2dsphere")])

'loc_2dsphere'

### Importing, loading datasets and creating geospatial index

Datasets: [neighborhoods](https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/neighborhoods.json) and [restaurants](https://raw.githubusercontent.com/mongodb/docs-assets/geospatial/restaurants.json)

Execute in the **command line** (make sure the MongoDB *bin* directory is in your PATH):
~~~
mongoimport -c neighborhoods -d geo neighborhoods.json
mongoimport -c restaurants -d geo restaurants.json
~~~

#### Loading datasets:

In [46]:
neighborhoods = dbgeo["neighborhoods"]
restaurants = dbgeo["restaurants"]

#### Creating geospatial indexes:

Before creating the indexes, we first look at one document from each dataset to see which field to index:

In [47]:
dbgeo.neighborhoods.find_one() # field: geometry, type: Polygon

{'_id': ObjectId('55cb9c666c522cafdb053a1a'),
 'geometry': {'coordinates': [[[-73.94193078816193, 40.70072523469547],
    [-73.9443878859649, 40.70042452378256],
    [-73.94424286147482, 40.69969927964773],
    [-73.94409591260093, 40.69897295461309],
    [-73.94394947271304, 40.69822127983908],
    [-73.94391750192877, 40.69805620211356],
    [-73.94380383211836, 40.697469265449826],
    [-73.94378455587042, 40.6973697290538],
    [-73.94374306706803, 40.69715549995503],
    [-73.9437245356891, 40.697059812179496],
    [-73.94368427322361, 40.696851909818065],
    [-73.9436842703752, 40.69685189440415],
    [-73.94363806934868, 40.69661331854307],
    [-73.94362121369004, 40.696526279661654],
    [-73.9435563415296, 40.69619128295102],
    [-73.94354024149403, 40.6961081421151],
    [-73.94352527471477, 40.69603085523812],
    [-73.94338802084431, 40.69528899051899],
    [-73.943242490861, 40.694557485733355],
    [-73.94312826743185, 40.693967038330925],
    [-73.94311427813774, 40.6

In [48]:
dbgeo.restaurants.find_one() # field: location, type: Point

{'_id': ObjectId('55cba2476c522cafdb053add'),
 'location': {'coordinates': [-73.856077, 40.848447], 'type': 'Point'},
 'name': 'Morris Park Bake Shop'}

Now we can create both indexes:

In [49]:
dbgeo.neighborhoods.create_index([("geometry", "2dsphere")])

'geometry_2dsphere'

In [50]:
dbgeo.restaurants.create_index([("location", "2dsphere")])

'location_2dsphere'

### [Geospatial Query Operators](https://docs.mongodb.com/manual/reference/operator/query-geospatial/)

Query operators provide ways to locate data within the database and projection operators modify how data is presented.

*Note: To inspect geometries visually, use a tool such as [geodndmap](https://github.com/rueckstiess/geodndmap) ([webapp version](https://s3.amazonaws.com/geodndmap/index.html)). [This comment](https://jira.mongodb.org/browse/SERVER-24549?focusedCommentId=1293398&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-1293398), reported in MongoDB's Jira, is a good example of why such a tool is useful.*

#### [\$geoWithin](https://docs.mongodb.com/manual/reference/operator/query/geoWithin/#op._S_geoWithin)

Selects documents with geospatial data that exists entirely within a specified shape. When determining inclusion, MongoDB considers the border of a shape to be part of the shape, subject to the precision of floating point numbers.

- Find which polygon fits inside polygon 1:

In [51]:
within = dbgeo.polygons.find(
   {
     "loc": { "$geoWithin": { "$geometry": poly1 } }
   }
)
for doc in within:
    pprint.pprint(doc) # only 1 result: poly2

{'_id': ObjectId('594b87cd5069ef619c872072'),
 'loc': {'coordinates': [[[1, 1], [2, 1], [1, 2], [1, 1]]], 'type': 'Polygon'}}


- Find all restaurants within 0.4 km of the coordinates [-73.9, 40.634]:

In [53]:
rests_nearby = dbgeo.restaurants.find({
     "location": {
         "$geoWithin": {
             "$centerSphere": [ [ -73.9, 40.634 ], 0.4 / 6378.1 ]
                                 # 0.4 km divided by 6378.1 km (equatorial radius of the Earth)
                         }
                 }
})

[\$centerSphere](https://docs.mongodb.com/manual/reference/operator/query/centerSphere/) takes as parameters the center coordinates and the circle’s radius measured in radians. To convert a distance to radians, divide it by the radius of the Earth expressed in the same unit; see [Calculate Distance Using Spherical Geometry](https://docs.mongodb.com/manual/tutorial/calculate-distances-using-spherical-geometry-with-2d-geospatial-indexes/).

In [54]:
rests_nearby.count() # 5 restaurants found

5

In [55]:
for doc in rests_nearby:
    pprint.pprint(doc)

{'_id': ObjectId('55cba2476c522cafdb0569de'),
 'location': {'coordinates': [-73.9022305, 40.6339824], 'type': 'Point'},
 'name': 'Double Dragon'}
{'_id': ObjectId('55cba2476c522cafdb0568ce'),
 'location': {'coordinates': [-73.899778, 40.6360412], 'type': 'Point'},
 'name': 'Yi Hong Restaurant'}
{'_id': ObjectId('55cba2476c522cafdb0562f4'),
 'location': {'coordinates': [-73.89863749999999, 40.6362343], 'type': 'Point'},
 'name': 'Happy Taco'}
{'_id': ObjectId('55cba2476c522cafdb057ec4'),
 'location': {'coordinates': [-73.899084, 40.635921], 'type': 'Point'},
 'name': "Richard'S Diner & Catering"}
{'_id': ObjectId('55cba2476c522cafdb0586f7'),
 'location': {'coordinates': [-73.89760249999999, 40.636928], 'type': 'Point'},
 'name': 'Citadelle Bar And Restaurant'}
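The radius conversion for \$centerSphere is easy to get wrong by a factor of ten, so it may be worth wrapping it in a helper. This is a minimal sketch using the 6378.1 km equatorial radius quoted in the MongoDB documentation; `km_to_radians` and `center_sphere_filter` are hypothetical names, and the resulting dictionary would be passed to *find()* exactly like the filter above.

```python
EARTH_RADIUS_KM = 6378.1  # equatorial radius of the Earth


def km_to_radians(distance_km):
    """Convert a distance on the Earth's surface to an angle in radians."""
    return distance_km / EARTH_RADIUS_KM


def center_sphere_filter(longitude, latitude, radius_km):
    """Build the $geoWithin/$centerSphere clause for a circle of radius_km."""
    return {
        "$geoWithin": {
            "$centerSphere": [[longitude, latitude], km_to_radians(radius_km)]
        }
    }


query = {"location": center_sphere_filter(-73.9, 40.634, 1.0)}
print(query)
```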


#### [\$geoIntersects](https://docs.mongodb.com/manual/reference/operator/query/geoIntersects/#op._S_geoIntersects)

Selects documents whose geospatial data intersects with a specified GeoJSON object.

- Find which polygons intersect with polygon 1:

In [56]:
intersects = dbgeo.polygons.find(
   {
     "loc": { "$geoIntersects": { "$geometry": poly1 } }
   }
)
for doc in intersects:
    pprint.pprint(doc) # 2 results: poly3 and poly2

{'_id': ObjectId('594b87cd5069ef619c872073'),
 'loc': {'coordinates': [[[1, 0], [-2, 0], [1, 3], [1, 0]]], 'type': 'Polygon'}}
{'_id': ObjectId('594b87cd5069ef619c872072'),
 'loc': {'coordinates': [[[1, 1], [2, 1], [1, 2], [1, 1]]], 'type': 'Polygon'}}


#### [\$near](https://docs.mongodb.com/manual/reference/operator/query/near/#op._S_near)

Specifies a point for which a geospatial query returns the documents from nearest to farthest. The \$near operator can specify either a [GeoJSON](https://docs.mongodb.com/manual/reference/glossary/#term-geojson) point or legacy coordinate point. **\$near** requires a geospatial index:
- **2dsphere** index if specifying a GeoJSON point
- **2d** index if specifying a point using legacy coordinates.

In this example we search around the point [ -73.85607**9**, 40.84844**9** ], which lies very close to *Morris Park Bake Shop* (-73.856077, 40.848447), and limit the results to documents that are at most 150 m from that point.

In [57]:
near = dbgeo.restaurants.find(
   {
     "location": { "$near": 
                      { "coordinates": [ -73.856079, 40.848449 ] }
                     , "$maxDistance": 150
                 }
   }
)

In [58]:
near.count() # 13 restaurants found

13

As we expected *Morris Park Bake Shop* is the nearest one:

In [59]:
for doc in near.limit(5): # limit output
    pprint.pprint(doc)

{'_id': ObjectId('55cba2476c522cafdb053add'),
 'location': {'coordinates': [-73.856077, 40.848447], 'type': 'Point'},
 'name': 'Morris Park Bake Shop'}
{'_id': ObjectId('55cba2476c522cafdb056c87'),
 'location': {'coordinates': [-73.8560691, 40.8483622], 'type': 'Point'},
 'name': "Luciano'S Pizza"}
{'_id': ObjectId('55cba2486c522cafdb05984f'),
 'location': {'coordinates': [-73.8559476, 40.8479438], 'type': 'Point'},
 'name': 'La Masa Restaurant'}
{'_id': ObjectId('55cba2476c522cafdb055c0d'),
 'location': {'coordinates': [-73.85517089999999, 40.8483418], 'type': 'Point'},
 'name': 'Doyles Pub'}
{'_id': ObjectId('55cba2486c522cafdb059637'),
 'location': {'coordinates': [-73.856526, 40.847648], 'type': 'Point'},
 'name': "Patsy'S Pizzeria"}
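For reference, the MongoDB documentation writes a GeoJSON \$near query with an explicit *\$geometry* document, and *\$maxDistance* (in meters) placed inside the *\$near* document. The sketch below only builds that filter as a plain dictionary; `near_filter` is a hypothetical helper, and the query has not been run against the dataset here, so no output is shown.

```python
def near_filter(longitude, latitude, max_distance_m):
    """Build a $near clause around a GeoJSON point; distance is in meters."""
    return {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": [longitude, latitude]},
            "$maxDistance": max_distance_m,
        }
    }


# Would be used as: dbgeo.restaurants.find(query)
query = {"location": near_filter(-73.856079, 40.848449, 150)}
print(query)
```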


#### [\$nearSphere](https://docs.mongodb.com/manual/reference/operator/query/nearSphere/#op._S_nearSphere)

Specifies a point for which a geospatial query returns the documents from nearest to farthest. MongoDB calculates distances for **\$nearSphere** using spherical geometry. **\$nearSphere** requires a geospatial index:
- **2dsphere** index for location data defined as GeoJSON points
- **2d** index for location data defined as legacy coordinate pairs. To use a 2d index on GeoJSON points, create the index on the coordinates field of the GeoJSON object.

There is not much difference between \$near and \$nearSphere (more info [here](https://stackoverflow.com/questions/38287374/in-mongo-what-is-the-difference-between-near-and-nearsphere)). The previous \$near example also works with \$nearSphere.

#### Some notes about \$near and \$nearSphere

Some operations, like **\$near** and **\$nearSphere**, **[require a geospatial index](https://docs.mongodb.com/manual/core/2dsphere/#geonear-and-geonear-restrictions)**. If we call them on a collection that has no geospatial index, we get an error like this:
~~~
OperationFailure: error processing query: ns=geo.restaurantsTree: GEONEAR  field=location maxdist=1.79769e+308 isNearSphere=0
Sort: {}
Proj: {}
 planner returned error: unable to find index for $geoNear query
~~~

In order to reproduce the problem we have to delete the index:

In [60]:
dbgeo.restaurants.drop_indexes()

Now we will see the error mentioned:

In [61]:
dbgeo.restaurants.find( { "location": { "$near": { "coordinates": [ -73.856079, 40.848449 ] } } }).count()

OperationFailure: error processing query: ns=geo.restaurantsTree: GEONEAR  field=location maxdist=1.79769e+308 isNearSphere=0
Sort: {}
Proj: {}
 planner returned error: unable to find index for $geoNear query

More info about the kind of geometry each geospatial operator uses can be found in [Differences Between Flat and Spherical Geometry](https://docs.mongodb.com/manual/tutorial/geospatial-tutorial/#differences-between-flat-and-spherical-geometry).

Let's fix the problem by creating the index again:

In [62]:
dbgeo.restaurants.create_index([("location", "2dsphere")])

'location_2dsphere'

#### [\$geoNear (aggregation)](https://docs.mongodb.com/manual/reference/operator/aggregation/geoNear/#pipe._S_geoNear)

Outputs documents in order of nearest to farthest from a specified point.

The \$geoNear aggregation stage is not really a geospatial query operator, but it has similar uses, and one of its advantages is the ability to **calculate the distance between two points**. The option that allows us to calculate the distance is:
- **distanceField**: The output field that contains the calculated distance. To specify a field within an embedded document, use dot notation.


In the next example we will locate the restaurant nearest to an arbitrary point (-74.4, 40.4):

In [63]:
geoNear = restaurants.aggregate(
[{ 
    "$geoNear": {
        "near": [ -74.4, 40.4 ],
        "distanceField": "dist", 
        "spherical": True,
        "limit":1
    }
}])

for doc in geoNear:
    pprint.pprint(doc)
    print("Distance: " + str(doc['dist'] * 6378.1) + " km")

{'_id': ObjectId('55cba2476c522cafdb053c92'),
 'dist': 0.0007912778972893815,
 'location': {'coordinates': [-74.3731727, 40.4404759], 'type': 'Point'},
 'name': "Water'S Edge Club"}
Distance: 5.046849556701405 km


So the distance from point 40.4, -74.4 to point 40.4404759, -74.3731727 is about 5.05 km.

In order to calculate the distance we have to multiply the radian measure (*dist* obtained) by the radius of the Earth (6378.1 km). As stated before, more info here: [Calculate Distance Using Spherical Geometry](https://docs.mongodb.com/manual/tutorial/calculate-distances-using-spherical-geometry-with-2d-geospatial-indexes/) 
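The conversion can be cross-checked without the database. The sketch below multiplies the *dist* value from the output above by the Earth's radius and compares it with an independent haversine computation between the same two points; `radians_to_km` and `haversine_km` are hypothetical helper names, and the Earth is assumed to be a sphere of radius 6378.1 km.

```python
import math

EARTH_RADIUS_KM = 6378.1  # equatorial radius of the Earth


def radians_to_km(angle_rad):
    """Convert an angular distance in radians to kilometers."""
    return angle_rad * EARTH_RADIUS_KM


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two Longitude/Latitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


from_dist = radians_to_km(0.0007912778972893815)  # 'dist' returned by $geoNear
from_points = haversine_km(-74.4, 40.4, -74.3731727, 40.4404759)

print(round(from_dist, 2))    # 5.05
print(round(from_points, 2))  # 5.05
```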

We can check whether the query and the calculation are correct in Google Maps. Remember that MongoDB stores coordinates in Longitude/Latitude order while Google uses Latitude/Longitude order, so we have to search for 40.4, -74.4 and 40.4404759, -74.3731727:

![mongodb_geoNear-gmaps_measure_distance](../images/mongodb_geoNear-gmaps_measure_distance.png)

#### Other querying examples

- Querying a restaurant by coordinates (the GeoJSON *type* must be specified as well, because the embedded document is matched as a whole):

In [64]:
dbgeo.restaurants.count_documents({'location': {'coordinates': [-73.856077, 40.848447], 'type': 'Point'}})

1

- Querying a restaurant by name:

In [65]:
dbgeo.restaurants.count_documents({'name': 'Morris Park Bake Shop'})

1

## [The Aggregation Framework](https://docs.mongodb.com/manual/aggregation/)

Aggregation operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.

### Pipeline Operators

- **\$match**: Filter documents, standard query, uses indexes, reduces doc count, use first.
- **\$project**: Reshape documents, remove fields, add fields, reduces doc size, looks at every doc.
- **\$group**: Summarize documents (sum, avg, etc.), rewrites _id, reduces doc count, looks at every doc.
- **\$sort**: Order documents, several sorts allowed, ascending or descending, 100 MB memory limit unless allowDiskUse is set.
- **\$limit/$skip**: Paginate documents
- **\$lookup**: Join two collections together
- **\$unwind**: Expand an array
- **\$out**: Create new collections, \$out overwrites, only one per aggregate, last member
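In PyMongo a pipeline is just a Python list of stage dictionaries, applied in order. The following is a minimal sketch of how several of the operators above could be combined on the restaurants collection; the specific stages (counting restaurants per name among those containing "Pizza") are an illustrative assumption, and the pipeline is only built here, not executed.

```python
# Each stage is a one-key dictionary; the key names the pipeline operator.
pipeline = [
    {"$match": {"name": {"$regex": "Pizza"}}},           # filter first, reduces doc count
    {"$group": {"_id": "$name", "count": {"$sum": 1}}},  # one output doc per distinct name
    {"$sort": {"count": -1}},                            # most frequent names first
    {"$limit": 5},                                       # keep the top five
]

# Would be used as: dbgeo.restaurants.aggregate(pipeline)
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)  # ['$match', '$group', '$sort', '$limit']
```

Putting *\$match* first follows the advice in the list above: it is the stage that can use indexes and it shrinks the input for every later stage.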

#### \$match

Let's select all restaurants whose name ends with *Express Coffee Shop*:

In [66]:
match = dbgeo.restaurants.aggregate([ { "$match": {"name": { "$regex":"Express Coffee Shop$" } } } ])

for doc in match:
    pprint.pprint(doc)

{'_id': ObjectId('55cba2476c522cafdb056982'),
 'location': {'coordinates': [-73.99420099999999, 40.660136], 'type': 'Point'},
 'name': 'Express Coffee Shop'}
{'_id': ObjectId('55cba2476c522cafdb059096'),
 'location': {'coordinates': [-73.99620399999999, 40.690177], 'type': 'Point'},
 'name': "Henry'S Express Coffee Shop"}
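The trailing `$` in the pattern anchors the match to the end of the name, which is why both documents above qualify. The same behavior can be checked locally with Python's `re` module, as in this small sketch; MongoDB uses its own regular expression engine, but for a simple anchored pattern like this one the result is the same. The third and fourth names in the list are made-up counterexamples, not taken from the query output.

```python
import re

names = [
    "Express Coffee Shop",
    "Henry'S Express Coffee Shop",
    "Express Coffee Shop Cafe",   # hypothetical: the text continues after the phrase
    "Morris Park Bake Shop",      # does not contain the phrase at all
]

pattern = re.compile(r"Express Coffee Shop$")

# search() scans the whole string, like an unanchored $regex; '$' pins the end.
matches = [name for name in names if pattern.search(name)]
print(matches)  # ['Express Coffee Shop', "Henry'S Express Coffee Shop"]
```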
