# MongoDB 3.2 New Feature Review

This notebook collects a set of examples of the new features in the MongoDB 3.2 release and shows how they can be used from Python.
Although most of the examples are written in Python (since we are using [ipython notebooks](http://ipython.org/notebook.html)), the same features are available from any official driver that supports the 3.2 release.

## Basic Connection to a MongoDB 3.2 instance
Make sure you have a MongoDB 3.2 instance running on your host machine.
Setting one up is quick: just follow the [installation tutorial](https://docs.mongodb.org/master/installation/#).

In [2]:
from pymongo import MongoClient

host = "mongodb://localhost:27017"

mc = MongoClient(host)

#check the server version: >3.2.0-rc0
mc.server_info()['version']

u'3.2.0-rc2'
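`server_info()['version']` returns a plain string such as `u'3.2.0-rc2'`, and comparing such strings lexically is fragile (`'3.10'` sorts before `'3.2'`). A minimal sketch, assuming you only care about the numeric `major.minor.patch` prefix, parses it into a tuple before comparing; `version_at_least` is a hypothetical helper, not part of PyMongo. The `buildInfo` result behind `server_info()` also carries a `versionArray` field that could be used instead.

```python
import re

def version_at_least(version, minimum):
    """Hypothetical helper: True if a MongoDB version string such as
    '3.2.0-rc2' is at least `minimum`, given as a tuple like (3, 2)."""
    # keep only the leading "major.minor[.patch]" digits; drop suffixes like "-rc2"
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    parts = tuple(int(p) for p in match.groups(default="0"))
    # compare only as many components as `minimum` specifies
    return parts[:len(minimum)] >= tuple(minimum)

# usage with the value returned by mc.server_info()['version'] above
print(version_at_least(u"3.2.0-rc2", (3, 2)))   # True
print(version_at_least(u"3.0.7", (3, 2)))       # False
```

Note that release candidates pass the check, which matches the `>3.2.0-rc0` intent of the cell above.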

## WiredTiger default storage engine
In 3.2, WiredTiger replaces MMAPv1 as the default storage engine. If you are used to the previous default (MMAPv1) and plan to migrate to 3.2, make sure you understand the implications.
To change the storage engine of an existing deployment, follow this [upgrade tutorial](https://docs.mongodb.org/manual/release-notes/3.0-upgrade/#change-storage-engine-to-wiredtiger).

In [3]:
#let's connect to `admin` db and run serverStatus command 
mc.admin.command('serverStatus')['storageEngine']

{u'name': u'wiredTiger', u'supportsCommittedReads': True}

## [Partial Indexes](https://docs.mongodb.org/manual/release-notes/3.2/#partial-indexes)
Partial indexes only index the documents that match a filter expression, so you can use _range_, _type_ and _existence_ conditions on fields to decide which documents get indexed.
The typical _range_ example is a field that only becomes interesting above a certain value, so only documents where it exceeds that value need to be indexed. `Ex: only index if quantity is above X`
```javascript
{_id: 1, quantity: 10}, {_id: 2, quantity: 11}, {_id:3, quantity: 4}
```
If we know that our queries are only interested in documents whose `quantity` is greater than 9, we can use a partial index to reflect that and make our indexes more efficient _(smaller = better)_.
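To make the _smaller = better_ point concrete, the sketch below models in plain Python which of the three example documents would get an entry in an index on `quantity` built with the filter `{quantity: {$gt: 9}}`. It is only a model of the behaviour (both helper names are hypothetical); the server evaluates the `partialFilterExpression` itself.

```python
def matches_filter(doc, threshold):
    """Hypothetical model of the partial filter {quantity: {$gt: threshold}}."""
    value = doc.get("quantity")
    # a missing or non-numeric quantity never satisfies $gt on a number
    return isinstance(value, (int, float)) and value > threshold

def partial_index_entries(docs, threshold):
    """Return the (key, _id) pairs a partial index on `quantity` would hold."""
    return sorted((d["quantity"], d["_id"]) for d in docs
                  if matches_filter(d, threshold))

docs = [{"_id": 1, "quantity": 10}, {"_id": 2, "quantity": 11}, {"_id": 3, "quantity": 4}]
print(partial_index_entries(docs, 9))   # [(10, 1), (11, 2)]
```

Only two of the three documents end up in the index; the document with `quantity: 4` is skipped entirely.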


In [6]:
import pymongo
coll = mc.notebook.partialindexcollection
# 30 documents with quantity 0..29
docs = [{'quantity': i} for i in xrange(30)]

coll.insert_many(docs)
# note: this cell was run several times, hence 210 (7 x 30) documents below
coll.count()

210

In [7]:
partialFilterExpression = { 'quantity':{'$gt': 9} }

coll.create_index([("quantity", pymongo.ASCENDING)], partialFilterExpression=partialFilterExpression )

u'quantity_1'
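The planner only uses a partial index when that cannot produce an incomplete result, i.e. when the query condition is a subset of the `partialFilterExpression`. For a single `$gt` bound on the same field, that reduces to comparing the two thresholds. The sketch below, a hypothetical helper that handles only this `$gt`-versus-`$gt` case, predicts the outcome of the two explain runs that follow.

```python
def gt_query_can_use_partial_index(query_gt, filter_gt):
    """Hypothetical helper: can {f: {$gt: query_gt}} use a partial index
    built with partialFilterExpression {f: {$gt: filter_gt}}?"""
    # every value > query_gt is also > filter_gt exactly when query_gt >= filter_gt
    return query_gt >= filter_gt

for query_gt in (10, 9, 8):
    print("$gt %d -> %s" % (query_gt, gt_query_can_use_partial_index(query_gt, 9)))
# $gt 10 -> True
# $gt 9 -> True
# $gt 8 -> False
```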

In [12]:
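import simplejson as json
# let's run some queries and check whether they use the partial index:
# quantity > 10 implies quantity > 9, so every match is in the index.
# "covered" is used loosely here: the plan still FETCHes the documents,
# so this is not a covered query in the strict MongoDB sense
partialcovered = {"quantity": {"$gt": 10}}
print("covered")
print(json.dumps(coll.find(partialcovered).explain(), indent=4))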

covered
{
    "queryPlanner": {
        "parsedQuery": {
            "quantity": {
                "$gt": 10
            }
        },
        "rejectedPlans": [],
        "namespace": "notebook.partialindexcollection",
        "winningPlan": {
            "inputStage": {
                "direction": "forward",
                "indexName": "quantity_1",
                "isUnique": false,
                "isSparse": false,
                "isPartial": true,
                "indexBounds": {
                    "quantity": [
                        "(10, inf.0]"
                    ]
                },
                "isMultiKey": false,
                "stage": "IXSCAN",
                "indexVersion": 1,
                "keyPattern": {
                    "quantity": 1
                }
            },
            "stage": "FETCH"
        },
        "indexFilterSet": false,
        "plannerVersion": 1
    },
    "serverInfo": {
        "host": "nair.local",
        "version": "3.2.0-rc2"

In [11]:
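# quantity > 8 also matches quantity == 9, which is not in the partial index,
# so the planner falls back to a collection scan (COLLSCAN)
notcovered = {"quantity": {"$gt": 8}}
print("not covered")
print(json.dumps(coll.find(notcovered).explain(), indent=4))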

not covered
{
    "queryPlanner": {
        "parsedQuery": {
            "quantity": {
                "$gt": 8
            }
        },
        "rejectedPlans": [],
        "namespace": "notebook.partialindexcollection",
        "winningPlan": {
            "filter": {
                "quantity": {
                    "$gt": 8
                }
            },
            "direction": "forward",
            "stage": "COLLSCAN"
        },
        "indexFilterSet": false,
        "plannerVersion": 1
    },
    "serverInfo": {
        "host": "nair.local",
        "version": "3.2.0-rc2",
        "port": 27017,
        "gitVersion": "8a3acb42742182c5e314636041c2df368232bbc5"
    },
    "waitedMS": 0,
    "ok": 1.0,
    "executionStats": {
        "executionTimeMillis": 0,
        "nReturned": 147,
        "totalKeysExamined": 0,
        "allPlansExecution": [],
        "executionSuccess": true,
        "executionStages": {
            "needYield": 0,
            "direction": "forward",
   

## [Document Validation](https://docs.mongodb.org/manual/release-notes/3.2/#document-validation)
MongoDB 3.2 adds document validation at the collection level.
This feature lets developers define rules that every incoming insert and update is checked against.
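Validation is configured per collection through three options accepted by the `create` and `collMod` commands: `validator` (a query-style expression), `validationLevel` (`"off"`, `"strict"`, which is the default, or `"moderate"`) and `validationAction` (`"error"`, the default, or `"warn"`). As a sketch, the hypothetical helper `validation_options` below assembles those options as a plain dict and rejects unknown values before they reach the server; the result could then be passed as keyword arguments, e.g. `db.create_collection(name, **opts)` or `db.command("collMod", name, **opts)`.

```python
VALID_LEVELS = ("off", "strict", "moderate")
VALID_ACTIONS = ("error", "warn")

def validation_options(validator, level="strict", action="error"):
    """Hypothetical helper: build create/collMod options for document validation."""
    if level not in VALID_LEVELS:
        raise ValueError("validationLevel must be one of %s" % (VALID_LEVELS,))
    if action not in VALID_ACTIONS:
        raise ValueError("validationAction must be one of %s" % (VALID_ACTIONS,))
    return {"validator": validator,
            "validationLevel": level,
            "validationAction": action}

# the rule used in the next cell: `name` must exist
opts = validation_options({"$and": [{"name": {"$exists": True}}]})
print(sorted(opts))  # ['validationAction', 'validationLevel', 'validator']
```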

In [51]:
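# Validate that the attribute `name` always exists
# mongo shell equivalent:
# > db.createCollection('validateme', { validator: { $and: [ {'name': {$exists:1}}  ]  }  })
validator = {"$and": [{'name': {"$exists": True}}]}
# collMod only works on an existing collection: create it with the validator
# if needed, otherwise update its validator in place
# (collection_names() is the PyMongo 3.x call; newer versions use list_collection_names())
if "validateme" not in mc.notebook.collection_names():
    mc.notebook.create_collection("validateme", validator=validator)
else:
    mc.notebook.command("collMod", "validateme", validator=validator)

validateme = mc.notebook.validateme
validDoc = {'name': 'Jose Mourinho', 'nickname': 'Special One'}
insertRes = validateme.insert_one(validDoc)

print(insertRes.inserted_id)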

565485894cc75f0d2c6e05c9


In [52]:
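from pymongo.errors import WriteError

notValidDoc = {"nickname": "Not So Special"}
try:
    insertRes = validateme.insert_one(notValidDoc)
except WriteError as e:
    # the server rejects the document because it has no `name` field
    print(e)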

Document failed validation


In [53]:
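# validation can be bypassed for a single write (MongoDB 3.2+, PyMongo 3.2+);
# with access control enabled the user needs the bypassDocumentValidation privilege
notValidDoc = {"nickname": "Not So Special"}
insertRes = validateme.insert_one(notValidDoc, bypass_document_validation=True)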

print(insertRes.inserted_id)

5654858f4cc75f0d2c6e05cb


In [54]:
# change the existing validator rules: `name` must exist, `age` must exist and be > 18
from pymongo.errors import WriteError

newvalidator = {'$and': [{"name": {"$exists": True}}, {"age": {"$gt": 18, "$exists": True}}]}
mc.notebook.command("collMod", "validateme", validator=newvalidator)
used_to_be_valid = {'name': 'Jose Mourinho', 'nickname': 'Special One'}
try:
    result = validateme.insert_one(used_to_be_valid)
except WriteError as e:
    print(e)

Document failed validation
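A validator is an ordinary query filter evaluated against each inserted or updated document. A minimal Python model of the operators used above (`$exists` and `$gt` on top-level fields, combined with `$and`) shows why the Mourinho document, which has no `age`, is now rejected. The `matches` helper is hypothetical and only understands those three operators (numbers only for `$gt`); it is a teaching aid, not the server's matcher. Also keep in mind that changing the validator does not re-check documents already stored; under the default `strict` level they are validated when they are next updated.

```python
def matches(doc, expr):
    """Hypothetical helper: evaluate a tiny subset of MongoDB query syntax
    ($and, $exists, $gt) against a top-level document."""
    for key, cond in expr.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
            continue
        present = key in doc
        value = doc.get(key)
        for op, arg in cond.items():
            if op == "$exists":
                ok = present == bool(arg)
            elif op == "$gt":
                # only numeric comparisons are modelled here
                ok = present and isinstance(value, (int, float)) and value > arg
            else:
                raise NotImplementedError("operator not modelled: %s" % op)
            if not ok:
                return False
    return True

newvalidator = {"$and": [{"name": {"$exists": True}},
                         {"age": {"$gt": 18, "$exists": True}}]}
print(matches({"name": "Jose Mourinho", "nickname": "Special One"}, newvalidator))  # False
print(matches({"name": "Alice", "age": 30}, newvalidator))                        # True
```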


In [60]:
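# we can also report a warning instead of an error: the write succeeds and
# the violation is only recorded in the server log
# > db.createCollection('validatewarn', { validator: { $and: [ {'name': {$exists:1}}  ]  }, validationAction: 'warn'  })
if "validatewarn" not in mc.notebook.collection_names():
    mc.notebook.create_collection("validatewarn", validator=validator,
                                  validationAction="warn")
validatewarn = mc.notebook.validatewarn
notValidDoc = {"nickname": "Not So Special"}

result = validatewarn.insert_one(notValidDoc)
# the insert is accepted, so this prints the new document's _id
print(result.inserted_id)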


## [New Aggregation Stages](https://docs.mongodb.org/manual/release-notes/3.2/#new-aggregation-stages)
MongoDB 3.2 adds new stages (such as `$sample` and `$lookup`) and new operators (such as `$stdDevSamp` and `$stdDevPop`) to the aggregation framework.

In [62]:
# let's start with a simple set of documents
sample = mc.notebook.sample
ds = []
for i in xrange(10):
    d = {'i': i}
    ds.append(d)

sample.insert_many(ds)



In [84]:
# $sample allows you to randomly select N documents from the input
pipeline = [{"$sample": {"size": 3}}]

cur = sample.aggregate( pipeline )
for d in cur:
    print(d)

{u'i': 0, u'_id': ObjectId('565487d44cc75f0d2c6e05d3')}
{u'i': 9, u'_id': ObjectId('565487d44cc75f0d2c6e05dc')}
{u'i': 8, u'_id': ObjectId('565487d44cc75f0d2c6e05db')}
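Conceptually, `$sample` behaves like drawing `size` documents at random from the stage's input. The plain-Python model below uses `random.sample` to illustrate that; it is an analogy, not how the server implements the stage (depending on collection size and position in the pipeline, the server uses either a random cursor or a random sort, and the documentation warns that the result may contain the same document more than once). The helper name `sample_stage` is hypothetical.

```python
import random

def sample_stage(docs, size, rng=random):
    """Hypothetical model of {$sample: {size: size}}: pick `size` documents at random.
    random.sample draws without replacement and returns every document
    (in shuffled order) when size >= len(docs)."""
    docs = list(docs)
    return rng.sample(docs, min(size, len(docs)))

ds = [{"i": i} for i in range(10)]
picked = sample_stage(ds, 3, random.Random(0))
print(len(picked))  # 3
```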


In [104]:
# $lookup performs a left outer join with another (unsharded) collection in the same database

names = mc.notebook.names 
ns = [{"_id": 1, "name": "Alice"}, {"_id": 2, "name": "Eve"}, {"_id": 3, "name": "MOURINHO"}]
names.insert_many(ns)

pipeline = [{"$lookup": { "from": "names", "localField": "i", "foreignField": "_id", "as": "names" }}]

cur = sample.aggregate(pipeline)
for d in cur:
    print(d)
names.drop()  

{u'i': 0, u'_id': ObjectId('565487d44cc75f0d2c6e05d3'), u'names': []}
{u'i': 1, u'_id': ObjectId('565487d44cc75f0d2c6e05d4'), u'names': [{u'_id': 1, u'name': u'Alice'}]}
{u'i': 2, u'_id': ObjectId('565487d44cc75f0d2c6e05d5'), u'names': [{u'_id': 2, u'name': u'Eve'}]}
{u'i': 3, u'_id': ObjectId('565487d44cc75f0d2c6e05d6'), u'names': [{u'_id': 3, u'name': u'MOURINHO'}]}
{u'i': 4, u'_id': ObjectId('565487d44cc75f0d2c6e05d7'), u'names': []}
{u'i': 5, u'_id': ObjectId('565487d44cc75f0d2c6e05d8'), u'names': []}
{u'i': 6, u'_id': ObjectId('565487d44cc75f0d2c6e05d9'), u'names': []}
{u'i': 7, u'_id': ObjectId('565487d44cc75f0d2c6e05da'), u'names': []}
{u'i': 8, u'_id': ObjectId('565487d44cc75f0d2c6e05db'), u'names': []}
{u'i': 9, u'_id': ObjectId('565487d44cc75f0d2c6e05dc'), u'names': []}
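The join above can be modelled in a few lines of Python to make the _left outer_ part explicit: every input document is kept, and the `as` array is empty when nothing in the `from` collection matches. MongoDB treats a missing `localField` or `foreignField` as `null` for matching purposes, which the `dict.get` calls below approximate. The function name `lookup` is hypothetical, and array-valued fields are not modelled.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Hypothetical model of $lookup as a left outer equality join."""
    result = []
    for doc in local_docs:
        key = doc.get(local_field)  # a missing field behaves like None/null
        joined = [f for f in foreign_docs if f.get(foreign_field) == key]
        out = dict(doc)
        out[as_field] = joined  # empty list when there is no match
        result.append(out)
    return result

sample_docs = [{"i": i} for i in range(5)]
names = [{"_id": 1, "name": "Alice"}, {"_id": 2, "name": "Eve"}, {"_id": 3, "name": "MOURINHO"}]
for d in lookup(sample_docs, names, "i", "_id", "names"):
    print(d)
```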


In [147]:
# new $group accumulators in 3.2: $stdDevSamp (sample) and $stdDevPop (population)
import random
ages = mc.notebook.ages

people = []
for i in xrange(100):
    people.append( {'i': i, "city":"NYC", 'age': random.randint(0,100)} )

ages.insert_many(people)

pipeline = [ 
    {"$group": { "_id": "$city", "stdDev": {"$stdDevSamp": "$age"}  }},
    ]
cur = ages.aggregate(pipeline)
for d in cur:
    print(d)
ages.drop()

{u'_id': u'NYC', u'stdDev': 30.422830010677597}
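`$stdDevSamp` returns the sample standard deviation (dividing by n - 1), while `$stdDevPop` divides by n; both ignore non-numeric values. The sketch below computes both in plain Python, which can help sanity-check a result like the one above against the raw `age` values client-side. Following the documented behaviour, the sample version returns `None` (null) when fewer than two numeric values are present. The function names are hypothetical.

```python
import math

def _numeric(values):
    # both operators skip non-numeric input (booleans are excluded as well)
    return [float(v) for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]

def std_dev_pop(values):
    """Population standard deviation, like $stdDevPop."""
    xs = _numeric(values)
    if not xs:
        return None
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

def std_dev_samp(values):
    """Sample standard deviation, like $stdDevSamp."""
    xs = _numeric(values)
    if len(xs) < 2:
        return None
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

sample_ages = [2, 4, 4, 4, 5, 5, 7, 9, "n/a"]
print(std_dev_pop(sample_ages))    # 2.0
print(std_dev_samp(sample_ages))
```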


## [Case Sensitive Text Search](https://docs.mongodb.org/manual/release-notes/3.2/#case-sensitive-text-search)
By default, `$text` queries on a `text` index are case insensitive, but in some cases we want case-sensitive searches; 3.2 adds the `$caseSensitive` option for that.
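Setting aside stemming, stop words and diacritic handling (all of which the real `text` index applies), the difference between the default and `$caseSensitive: True` comes down to whether terms are lower-cased before comparison. The toy matcher below is a hypothetical helper, not anything in MongoDB or PyMongo; it reproduces which of the sample descriptions match `City` and `CITY` in the cells that follow.

```python
import re

def text_match(search_term, text, case_sensitive=False):
    """Hypothetical toy single-term match on whole words; no stemming or stop words."""
    words = re.findall(r"\w+", text)
    if not case_sensitive:
        # default behaviour: compare lower-cased terms
        search_term = search_term.lower()
        words = [w.lower() for w in words]
    return search_term in words

descs = ["CITY Capital of The Empire", "la ville LUMIERE", "the underground city"]
print([d for d in descs if text_match("City", d)])
print([d for d in descs if text_match("CITY", d, case_sensitive=True)])
```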

In [149]:
# load a few documents 
docs = [
    {'name': 'New York', 'desc': 'CITY Capital of The Empire'}, 
    {'name': 'Paris', 'desc': 'la ville LUMIERE'},
    {'name': 'Berlin', 'desc': 'the underground city'}
]
cities = mc.notebook.cities
len(cities.insert_many(docs).inserted_ids)

3

In [134]:
# create the text index
cities.create_index([( 'desc','text' )])

u'desc_text'

In [139]:
# default: case insensitive
query = {"$text": {"$search": "City"}}
for d in cities.find(query): print(d)

{u'_id': ObjectId('5654aa844cc75f0d2c6e0e75'), u'name': u'New York', u'desc': u'CITY Capital of The Empire'}
{u'_id': ObjectId('5654aa844cc75f0d2c6e0e77'), u'name': u'Berlin', u'desc': u'the underground city'}


In [141]:
# using the $caseSensitive option
query = {"$text": {"$search": "CITY", "$caseSensitive": True}}
for d in cities.find(query): print(d)

{u'_id': ObjectId('5654aa844cc75f0d2c6e0e75'), u'name': u'New York', u'desc': u'CITY Capital of The Empire'}
