# Aggregation in MongoDB

In [1]:
import pymongo  # pymongo is a Python driver for MongoDB

client = pymongo.MongoClient('mongodb://localhost:27017/')

# cardio and patients were created in a previous notebook (4)
cardio_db = client.cardio # this is a 'database'
patients_collection = cardio_db.patients # this is a 'collection'

# companies and reviews were created in a previous notebook (2)
companies_db = client.companies # this is a 'database'
reviews_collection = companies_db.reviews # this is a 'collection'

MapReduce was long one of the main methods for data processing in MongoDB. It is a programming model for processing and generating large data sets (we will see more later).

As of MongoDB 5.0, map-reduce is deprecated in favor of the aggregation pipeline.

https://www.mongodb.com/docs/manual/core/map-reduce/

Aggregation is a data processing framework that runs documents through a pipeline of stages: each stage transforms the documents it receives and passes its output to the next stage.

https://www.mongodb.com/docs/manual/reference/map-reduce-to-aggregation-pipeline/
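To make the idea of a pipeline concrete, here is a minimal pure-Python sketch that does not touch MongoDB at all. The sample documents and the helper names (`match_stage`, `group_stage`, `sort_stage`, `run_pipeline`) are hypothetical and only mimic, in spirit, what the `$match`, `$group`, and `$sort` stages do on the server.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical in-memory documents; the field names are illustrative.
docs = [
    {"office_num": 1, "height": 150},
    {"office_num": 1, "height": 160},
    {"office_num": 2, "height": 170},
    {"office_num": None, "height": 180},
]

def match_stage(documents):
    """Keep only documents that have an office number (similar to $match)."""
    return [d for d in documents if d.get("office_num") is not None]

def group_stage(documents):
    """Average the height per office (similar to $group with $avg)."""
    heights = defaultdict(list)
    for d in documents:
        heights[d["office_num"]].append(d["height"])
    return [{"_id": office, "avg height": mean(h)} for office, h in heights.items()]

def sort_stage(documents):
    """Order by average height, highest first (similar to $sort with -1)."""
    return sorted(documents, key=lambda d: d["avg height"], reverse=True)

def run_pipeline(documents, stages):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        documents = stage(documents)
    return documents

result = run_pipeline(docs, [match_stage, group_stage, sort_stage])
print(result)
```

With this sample data, office 2 comes first because its single patient is taller than the average of office 1. In MongoDB the stages run inside the database rather than in Python, so this is only a mental model of the data flow.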

In this notebook, we will see how to use aggregation in MongoDB. NOTE: This is a large topic, so we will only cover the basics here, which should give you a solid foundation for further exploration.

Let's start by computing the average height of patients per office; we will use the aggregation framework to do this.

In [2]:
averages = patients_collection.aggregate( [
   {
    "$match" : 
        { "office_num" : { "$ne" : None }}
   },
   {
    "$group": 
        { "_id": "$office_num", "avg height": { "$avg": "$height" }}
   },
   { 
    "$sort": 
        { "avg height": -1 }
   }
])
list(averages)

[{'_id': [2], 'avg height': 157.33333333333334},
 {'_id': [9], 'avg height': 156.5},
 {'_id': [6], 'avg height': 153.6},
 {'_id': [1], 'avg height': 152.0},
 {'_id': [5], 'avg height': 146.5},
 {'_id': [7], 'avg height': 144.25},
 {'_id': [3], 'avg height': 142.83333333333334},
 {'_id': [4], 'avg height': 130.25},
 {'_id': [8], 'avg height': 129.0},
 {'_id': [10], 'avg height': 102.0}]
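Each `_id` in the output above is a list such as `[2]`, which suggests that `office_num` is stored as a one-element array; that is an assumption based on the output, not something confirmed here. If scalar office numbers are preferred, one possible variant adds an `$unwind` stage before grouping. The variable name `avg_height_pipeline` is illustrative.

```python
# Assumption: office_num is stored as a one-element array (e.g. [2]),
# which would explain why each '_id' in the output is a list.
avg_height_pipeline = [
    # Drop patients without an office number.
    {"$match": {"office_num": {"$ne": None}}},
    # Emit one document per array element, so office_num becomes a scalar.
    {"$unwind": "$office_num"},
    # Average the height of all patients in each office.
    {"$group": {"_id": "$office_num", "avg height": {"$avg": "$height"}}},
    # Highest average first.
    {"$sort": {"avg height": -1}},
]

# With the connection from the first cell, this could be run as:
# list(patients_collection.aggregate(avg_height_pipeline))
```

Note that a patient assigned to several offices would then be counted once per office, which may or may not be what you want.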

A more complex problem is to compute the average weight by office, because each patient has multiple weight readings. To accomplish this type of aggregation, we will use the $project and $group stages. The $project stage computes the average of each patient's weight readings, and the $group stage then averages those per-patient averages for each office.

In [3]:
averages = patients_collection.aggregate( [
   {
    "$match" : 
        { "office_num" : { "$ne" : None }}
   },
   { 
    "$project": 
        {'office_num':'$office_num',
         "avg weight" : {"$avg":"$weights"} }
   },
   { 
    "$group": 
        { "_id": "$office_num", "avg weight": { "$avg": "$avg weight" }}
   },
   { 
    "$sort": 
        { "_id": 1 }
   }
])
list(averages)

[{'_id': [1], 'avg weight': 241.17499999999998},
 {'_id': [2], 'avg weight': 229.35555555555553},
 {'_id': [3], 'avg weight': 280.5833333333333},
 {'_id': [4], 'avg weight': 278.0416666666667},
 {'_id': [5], 'avg weight': 266.96999999999997},
 {'_id': [6], 'avg weight': 254.81333333333333},
 {'_id': [7], 'avg weight': 238.7},
 {'_id': [8], 'avg weight': 303.8},
 {'_id': [9], 'avg weight': 247.33958333333334},
 {'_id': [10], 'avg weight': 217.08333333333331}]
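Keep in mind that this pipeline computes an average of per-patient averages, which is generally not the same as averaging every reading in an office when patients have different numbers of readings. The pure-Python sketch below uses hypothetical sample data to illustrate the difference; it does not query MongoDB.

```python
from statistics import mean

# Hypothetical patients in one office; 'weights' holds repeated readings.
patients = [
    {"office_num": 1, "weights": [200, 220]},
    {"office_num": 1, "weights": [100, 100, 100, 100]},
]

# What $project followed by $group computes: average each patient first,
# then average those per-patient averages.
per_patient = [mean(p["weights"]) for p in patients]
mean_of_means = mean(per_patient)

# Alternative: pool every reading in the office into a single average.
all_readings = [w for p in patients for w in p["weights"]]
pooled_mean = mean(all_readings)

print(mean_of_means, pooled_mean)
```

The first value weights every patient equally, while the second weights every reading equally. If the pooled version is what you need, one option would be to `$unwind` the `weights` array before the `$group` stage instead of using `$project`.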

# More information

If you are interested in learning more about aggregation, you can read the documentation here: https://www.mongodb.com/docs/manual/core/aggregation-pipeline/ 

For more advanced aggregation, you can read the documentation here: https://www.mongodb.com/blog/post/advanced-aggregation-with-mongodb

It's also recommended that you look at MongoDB Compass. It's a GUI for MongoDB that makes it easy to explore your data and includes an aggregation builder. You can download it here: https://www.mongodb.com/products/compass
