# Aggregation Pipeline Expression Operators

----

- An expression describes how a value is computed from a document's fields, literals, and operators. A sample expression object:

> - `{ <field1>: <expression1>, ... }`

> - `'$project':{'ID':'$enrollee_id'}`

- Expression operators are available to construct expressions for use in the aggregation pipeline stages.


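- In general, these expression operators take an array of arguments. An operator that accepts a single argument, such as `$floor`, can be given that argument without the array.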

> `{ <operator>: [ <argument1>, <argument2> ... ] }`


- Expression operators are grouped by type: arithmetic, date, string, and so on.


- This lets us derive new fields from existing fields and pass them on to the next stage of the pipeline.

- [Aggregation Pipeline Expression operators](https://docs.mongodb.com/manual/reference/operator/aggregation/#aggregation-pipeline-operators)
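
To make the operator form concrete, here is a minimal sketch that builds a `$project` stage as a plain Python dictionary, without connecting to a server. The output field name `Hours_plus_one` is hypothetical and chosen only for illustration; `enrollee_id` and `training_hours` are fields of the `hr` sample document used in this notebook.

```python
import json

# An expression operator takes its arguments as a list.
# A string starting with '$' is a field path; a bare number is a literal.
add_one_hour = {'$add': ['$training_hours', 1]}

# Expression object: each output field is mapped to an expression.
project_stage = {
    '$project': {
        'ID': '$enrollee_id',            # field path expression
        'Hours_plus_one': add_one_hour,  # operator expression (hypothetical field name)
    }
}

# The stage is ordinary data, so it can be inspected before it is sent to MongoDB.
print(json.dumps(project_stage, indent=2))
```

Because a stage is just a dictionary, expressions can be built up in small pieces and reused across pipelines.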


---
### Connecting to MongoDB using PyMongo
----

In [1]:
# Importing the required libraries
import pymongo
import pprint as pp

# pprint sorts dict keys by default; overriding its `sorted` keeps fields in document order
pp.sorted = lambda x, key=None: x

In [2]:
# Connect to local server
client = pymongo.MongoClient('mongodb://localhost:27017/')

In [3]:
# Using training dataset
db = client.training

In [4]:
# Sample document
pp.pprint(
    db.hr.find_one()
)

{'_id': ObjectId('60bc95fb12d1778df87722e2'),
 'enrollee_id': 23798,
 'gender': 'Male',
 'date_of_enrollment': datetime.datetime(2016, 8, 4, 8, 4, 14, 780000),
 'city': {'name': 'city_149', 'development_index': 0.689},
 'education': {'level': 'Graduate', 'discipline': 'STEM'},
 'experience': {'years': 3,
                'company_type': 'Pvt Ltd',
                'last_new_job': 1,
                'relevent_experience': 1},
 'training_hours': 106}


---
----
### [Arithmetic Expression Operators](https://docs.mongodb.com/manual/reference/operator/aggregation/#arithmetic-expression-operators)

Arithmetic expressions perform mathematical operations on numbers.

----

For example, suppose we want to see `training_hours` reduced by 3 hours for those documents where `education.discipline` is `STEM`.

The [$subtract](https://docs.mongodb.com/manual/reference/operator/aggregation/subtract/#mongodb-expression-exp.-subtract) operator does this: it subtracts its second argument from its first.

---

In [5]:
# Use of aggregation operators

result = db.hr.aggregate(
        # Pipeline
        [
            # Stage 1
            {
                '$match':{'education.discipline':'STEM'}
            },
            # Stage 2
            {
                '$project':{
                                '_id':0,
                                'Actual_tranining_hours': '$training_hours',
                                'Modified_training_hours': {'$subtract': ['$training_hours', 3]}
                }
            }
        ])

# Print result
for doc in result:
    pp.pprint(doc)

{'Actual_tranining_hours': 106, 'Modified_training_hours': 103}
{'Actual_tranining_hours': 69, 'Modified_training_hours': 66}
{'Actual_tranining_hours': 4, 'Modified_training_hours': 1}
{'Actual_tranining_hours': 26, 'Modified_training_hours': 23}
{'Actual_tranining_hours': 88, 'Modified_training_hours': 85}
{'Actual_tranining_hours': 23, 'Modified_training_hours': 20}
{'Actual_tranining_hours': 8, 'Modified_training_hours': 5}
{'Actual_tranining_hours': 10, 'Modified_training_hours': 7}
{'Actual_tranining_hours': 85, 'Modified_training_hours': 82}
{'Actual_tranining_hours': 106, 'Modified_training_hours': 103}
{'Actual_tranining_hours': 51, 'Modified_training_hours': 48}
{'Actual_tranining_hours': 4, 'Modified_training_hours': 1}
{'Actual_tranining_hours': 35, 'Modified_training_hours': 32}
{'Actual_tranining_hours': 42, 'Modified_training_hours': 39}
{'Actual_tranining_hours': 45, 'Modified_training_hours': 42}
{'Actual_tranining_hours': 106, 'Modified_training_hours': 103}
{'Actual_

---
**We can refer to these derived fields in the subsequent stages of the pipeline.**

For example, we keep only those documents from the previous step where `Modified_training_hours` is greater than 100.
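
As a rough illustration of how a derived field flows from one stage to the next, the sketch below emulates the three stages in plain Python on a few in-memory documents. The documents are hypothetical and contain only the fields the stages use; this is not how MongoDB executes the pipeline, only a way to follow the logic.

```python
# Hypothetical in-memory documents; only the fields used below are included.
docs = [
    {'education': {'discipline': 'STEM'}, 'training_hours': 106},
    {'education': {'discipline': 'STEM'}, 'training_hours': 69},
    {'education': {'discipline': 'Arts'}, 'training_hours': 150},
]

# Stage 1 ($match): keep STEM documents.
stage1 = [d for d in docs if d['education']['discipline'] == 'STEM']

# Stage 2 ($project): output only the two projected fields.
stage2 = [
    {
        'Actual_tranining_hours': d['training_hours'],
        'Modified_training_hours': d['training_hours'] - 3,
    }
    for d in stage1
]

# Stage 3 ($match): filter on the field created in stage 2.
stage3 = [d for d in stage2 if d['Modified_training_hours'] > 100]

print(stage3)
# [{'Actual_tranining_hours': 106, 'Modified_training_hours': 103}]
```

Note that stage 3 can only see the fields that stage 2 emitted; the original `training_hours` field is no longer available under that name.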

----

In [6]:
# Use of aggregation operators

result = db.hr.aggregate(
       
        # Pipeline
        [
            # Stage 1
            {
                '$match':{'education.discipline':'STEM'}
            },
            
            # Stage 2
            {
                '$project':{
                                '_id':0,
                                'Actual_tranining_hours': '$training_hours',
                                'Modified_training_hours': {'$subtract': ['$training_hours', 3]}
                            }
            },
            
            # Stage 3
            {
                '$match':{'Modified_training_hours':{'$gt':100}}
            }
        ])

# Print result
for doc in result:
    pp.pprint(doc)

{'Actual_tranining_hours': 106, 'Modified_training_hours': 103}
{'Actual_tranining_hours': 106, 'Modified_training_hours': 103}
{'Actual_tranining_hours': 106, 'Modified_training_hours': 103}
{'Actual_tranining_hours': 298, 'Modified_training_hours': 295}
{'Actual_tranining_hours': 114, 'Modified_training_hours': 111}
{'Actual_tranining_hours': 104, 'Modified_training_hours': 101}
{'Actual_tranining_hours': 109, 'Modified_training_hours': 106}
{'Actual_tranining_hours': 262, 'Modified_training_hours': 259}
{'Actual_tranining_hours': 112, 'Modified_training_hours': 109}
{'Actual_tranining_hours': 166, 'Modified_training_hours': 163}
{'Actual_tranining_hours': 110, 'Modified_training_hours': 107}
{'Actual_tranining_hours': 170, 'Modified_training_hours': 167}
{'Actual_tranining_hours': 132, 'Modified_training_hours': 129}
{'Actual_tranining_hours': 145, 'Modified_training_hours': 142}
{'Actual_tranining_hours': 152, 'Modified_training_hours': 149}
{'Actual_tranining_hours': 168, 'Modifie

----
**We can even subtract two field values.**

For example, we subtract `experience.relevent_experience` (the field name is spelled this way in the dataset) from `experience.years`.

----

In [7]:
# Use of aggregation operators

result = db.hr.aggregate(
       
        # Pipeline
        [
            # Stage 1
            {
                '$match':{'education.discipline':'STEM'}
            },
            
            # Stage 2
            {
                '$project':{
                                '_id':0,
                                'Total_Exp': '$experience.years',
                                'Relevant_Exp': '$experience.relevent_experience',
                                # Subtract
                                'Result': {'$subtract': ['$experience.years', '$experience.relevent_experience']}
                            }
            }
        ])

# Print result
for doc in result:
    pp.pprint(doc)

{'Total_Exp': 3, 'Relevant_Exp': 1, 'Result': 2}
{'Total_Exp': 14, 'Relevant_Exp': 1, 'Result': 13}
{'Total_Exp': 6, 'Relevant_Exp': 1, 'Result': 5}
{'Total_Exp': 14, 'Relevant_Exp': 1, 'Result': 13}
{'Total_Exp': 8, 'Relevant_Exp': 0, 'Result': 8}
{'Total_Exp': 6, 'Relevant_Exp': 1, 'Result': 5}
{'Total_Exp': 20, 'Relevant_Exp': 1, 'Result': 19}
{'Total_Exp': 20, 'Relevant_Exp': 1, 'Result': 19}
{'Total_Exp': 20, 'Relevant_Exp': 1, 'Result': 19}
{'Total_Exp': 4, 'Relevant_Exp': 0, 'Result': 4}
{'Total_Exp': 20, 'Relevant_Exp': 1, 'Result': 19}
{'Total_Exp': 3, 'Relevant_Exp': 1, 'Result': 2}
{'Total_Exp': 20, 'Relevant_Exp': 1, 'Result': 19}
{'Total_Exp': 3, 'Relevant_Exp': 1, 'Result': 2}
{'Total_Exp': 15, 'Relevant_Exp': 1, 'Result': 14}
{'Total_Exp': 3, 'Relevant_Exp': 1, 'Result': 2}
{'Total_Exp': 10, 'Relevant_Exp': 1, 'Result': 9}
{'Total_Exp': 1, 'Relevant_Exp': 0, 'Result': 1}
{'Total_Exp': 15, 'Relevant_Exp': 0, 'Result': 15}
{'Total_Exp': 14, 'Relevant_Exp': 1, 'Result': 13}

---
**We can nest expression operators inside one another, to any depth.**

For example, we want to print `city.development_index` as a whole number out of 10 instead of a fraction.

----

In [8]:
# Sample document
pp.pprint(
    db.hr.find_one())

{'_id': ObjectId('60bc95fb12d1778df87722e2'),
 'enrollee_id': 23798,
 'gender': 'Male',
 'date_of_enrollment': datetime.datetime(2016, 8, 4, 8, 4, 14, 780000),
 'city': {'name': 'city_149', 'development_index': 0.689},
 'education': {'level': 'Graduate', 'discipline': 'STEM'},
 'experience': {'years': 3,
                'company_type': 'Pvt Ltd',
                'last_new_job': 1,
                'relevent_experience': 1},
 'training_hours': 106}


---
----
We first multiply the value by 10 using the [\$multiply](https://docs.mongodb.com/manual/reference/operator/aggregation/multiply/#mongodb-expression-exp.-multiply) operator and then apply [\$floor](https://docs.mongodb.com/manual/reference/operator/aggregation/floor/#mongodb-expression-exp.-floor) to the result.
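
The same two-step arithmetic can be checked in plain Python before running it on the server. This is only a sketch of the calculation: the function name `index_out_of_10` is hypothetical, and `math.floor` returns an `int`, whereas MongoDB's `$floor` on a double returns a double (for example `6.0`).

```python
import math

def index_out_of_10(development_index):
    scaled = development_index * 10   # inner step, like $multiply
    return math.floor(scaled)         # outer step, like $floor

# Values taken from the sample outputs in this notebook.
for value in (0.689, 0.923, 0.887):
    print(value, index_out_of_10(value))
```

The inner operation is evaluated first and its result becomes the argument of the outer one, which is the same order in which the nested expression is read from the inside out.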

---

In [9]:
# Nested operators

result = db.hr.aggregate(
        
        # Pipeline
        [
            # Stage 1
            {
                '$match':{'education.discipline':'STEM'}
            },
            
            # Stage 2
            {
                '$project':{
                            '_id':0,
                            'city.development_index':1,
                            'Index': {
                                        # Take floor value
                                        '$floor':{
                                                    # Multiply by 10
                                                    '$multiply': ['$city.development_index', 10]
                                                 }
                                     }
                        }
            }
        ])

# Print results
for doc in result:
    pp.pprint(doc)

{'city': {'development_index': 0.689}, 'Index': 6.0}
{'city': {'development_index': 0.923}, 'Index': 9.0}
{'city': {'development_index': 0.91}, 'Index': 9.0}
{'city': {'development_index': 0.666}, 'Index': 6.0}
{'city': {'development_index': 0.887}, 'Index': 8.0}
{'city': {'development_index': 0.624}, 'Index': 6.0}
{'city': {'development_index': 0.926}, 'Index': 9.0}
{'city': {'development_index': 0.92}, 'Index': 9.0}
{'city': {'development_index': 0.925}, 'Index': 9.0}
{'city': {'development_index': 0.92}, 'Index': 9.0}
{'city': {'development_index': 0.698}, 'Index': 6.0}
{'city': {'development_index': 0.92}, 'Index': 9.0}
{'city': {'development_index': 0.92}, 'Index': 9.0}
{'city': {'development_index': 0.897}, 'Index': 8.0}
{'city': {'development_index': 0.843}, 'Index': 8.0}
{'city': {'development_index': 0.624}, 'Index': 6.0}
{'city': {'development_index': 0.855}, 'Index': 8.0}
{'city': {'development_index': 0.939}, 'Index': 9.0}
{'city': {'development_index': 0.754}, 'Index': 7.0

----
[Aggregation Pipeline Expression operators](https://docs.mongodb.com/manual/reference/operator/aggregation/#aggregation-pipeline-operators)

----

---
### Exercise 1 - 

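Return the `enrollee_id` and the total experience of each enrollee in months, using the [$multiply](https://docs.mongodb.com/manual/reference/operator/aggregation/multiply/#mongodb-expression-exp.-multiply) operator on the `experience.years` field.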

----

----
### Exercise 2 - 

Return the `enrollee_id`, the city name, and the `development_index` truncated to the first decimal place for enrollees whose education discipline is not STEM.

*Hint - Use [$trunc](https://docs.mongodb.com/manual/reference/operator/aggregation/trunc/#-trunc--aggregation-) operator.*

----