# Sorting data in MongoDB

----
- [$sort](https://docs.mongodb.com/manual/reference/operator/aggregation/sort/) - Sorts all input documents and returns them to the pipeline in sorted order.

- [$limit](https://docs.mongodb.com/manual/reference/operator/aggregation/limit/) - Limits the number of documents passed to the next stage in the pipeline.

---
### Connecting to MongoDB using PyMongo
----

In [1]:
# Importing the required libraries
import pymongo
import pprint as pp

# Stop pprint from sorting dictionary keys, so fields print in document order
pp.sorted = lambda x, key=None: x

In [2]:
# Connect to the MongoDB server (a local instance here; use your Atlas connection string for a cluster)
client = pymongo.MongoClient('mongodb://localhost:27017/')

In [3]:
# Select the `training` database
db = client.training

In [4]:
# Sample document
pp.pprint(
    db.hr.find_one())

{'_id': ObjectId('60bc95fb12d1778df87722e2'),
 'enrollee_id': 23798,
 'gender': 'Male',
 'date_of_enrollment': datetime.datetime(2016, 8, 4, 8, 4, 14, 780000),
 'city': {'name': 'city_149', 'development_index': 0.689},
 'education': {'level': 'Graduate', 'discipline': 'STEM'},
 'experience': {'years': 3,
                'company_type': 'Pvt Ltd',
                'last_new_job': 1,
                'relevent_experience': 1},
 'training_hours': 106}


---
### `$sort` stage

[$sort](https://docs.mongodb.com/manual/reference/operator/aggregation/sort/) sorts all input documents and returns them to the pipeline in sorted order.



**Syntax -** `{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }`
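
In PyMongo a `$sort` stage is an ordinary Python dict, with `1` meaning ascending and `-1` meaning descending, so it can also be built programmatically. The following is a minimal sketch, assuming a hypothetical helper named `sort_stage` (it is not part of PyMongo) that turns readable `(field, direction)` pairs into a stage:

```python
# Hypothetical helper (not part of PyMongo): build a $sort stage
# from (field, direction) pairs, where direction is 'asc' or 'desc'.
ORDER = {'asc': 1, 'desc': -1}

def sort_stage(*pairs):
    """Return {'$sort': {...}}, keeping the fields in the order given."""
    spec = {}
    for field, direction in pairs:
        if direction not in ORDER:
            raise ValueError(f"direction must be 'asc' or 'desc', got {direction!r}")
        spec[field] = ORDER[direction]
    return {'$sort': spec}

stage = sort_stage(('experience.years', 'desc'), ('training_hours', 'asc'))
print(stage)  # {'$sort': {'experience.years': -1, 'training_hours': 1}}
```

The resulting dict can be placed directly in the list passed to `aggregate`. Python dicts keep insertion order (Python 3.7+), which matters because the order of the fields determines the sort priority.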

---

For example, sort documents by the `training_hours` field in ascending order, keeping only those where `experience.years` is greater than 3.

---

In [5]:
# Ascending order sort

result = db.hr.aggregate([
                            # Stage 1
                            {
                                '$match': {'experience.years':{'$gt':3}}
                            },
                            # Stage 2
                            {
                                '$sort': {'training_hours':1}
                            },
                            # Stage 3
                            {
                                '$project': {
                                                '_id':0, 
                                                'ID':'$enrollee_id',
                                                'Experience':'$experience.years', 
                                                'Training':'$training_hours'
                                            }
                            }
                        ])

# Print result
for doc in result:
    pp.pprint(doc)

{'ID': 24949, 'Experience': 9, 'Training': 1}
{'ID': 31032, 'Experience': 20, 'Training': 1}
{'ID': 12243, 'Experience': 7, 'Training': 1}
{'ID': 8759, 'Experience': 20, 'Training': 1}
{'ID': 27199, 'Experience': 20, 'Training': 1}
{'ID': 4168, 'Experience': 15, 'Training': 1}
{'ID': 10057, 'Experience': 8, 'Training': 1}
{'ID': 27830, 'Experience': 6, 'Training': 1}
{'ID': 17156, 'Experience': 9, 'Training': 2}
{'ID': 9391, 'Experience': 7, 'Training': 2}
{'ID': 10440, 'Experience': 20, 'Training': 2}
{'ID': 3946, 'Experience': 18, 'Training': 2}
{'ID': 12392, 'Experience': 20, 'Training': 2}
{'ID': 17843, 'Experience': 9, 'Training': 2}
{'ID': 3974, 'Experience': 7, 'Training': 2}
{'ID': 4659, 'Experience': 4, 'Training': 2}
{'ID': 15908, 'Experience': 7, 'Training': 2}
{'ID': 10985, 'Experience': 6, 'Training': 2}
{'ID': 833, 'Experience': 7, 'Training': 2}
{'ID': 25341, 'Experience': 12, 'Training': 2}
{'ID': 12994, 'Experience': 14, 'Training': 2}
{'ID': 22648, 'Experience': 20, '

---
To sort in descending order, use the value `-1`.

----

In [6]:
# Descending order sort

result = db.hr.aggregate([
                            # Stage 1
                            {
                                '$match': {'experience.years':{'$gt':3}}
                            },
                            # Stage 2
                            {
                                '$sort': {'training_hours':-1}
                            },
                            # Stage 3
                            {
                                '$project': {
                                                '_id':0, 
                                                'ID':'$enrollee_id',
                                                'Experience':'$experience.years', 
                                                'Training':'$training_hours'
                                            }
                            }
                        ])

# Print result
for doc in result:
    pp.pprint(doc)

{'ID': 28975, 'Experience': 12, 'Training': 336}
{'ID': 27795, 'Experience': 6, 'Training': 336}
{'ID': 32716, 'Experience': 17, 'Training': 336}
{'ID': 4983, 'Experience': 5, 'Training': 336}
{'ID': 31454, 'Experience': 15, 'Training': 336}
{'ID': 26858, 'Experience': 16, 'Training': 336}
{'ID': 14332, 'Experience': 5, 'Training': 336}
{'ID': 1919, 'Experience': 4, 'Training': 334}
{'ID': 6564, 'Experience': 4, 'Training': 334}
{'ID': 32712, 'Experience': 14, 'Training': 334}
{'ID': 15053, 'Experience': 20, 'Training': 334}
{'ID': 17854, 'Experience': 15, 'Training': 334}
{'ID': 13744, 'Experience': 6, 'Training': 334}
{'ID': 17951, 'Experience': 15, 'Training': 334}
{'ID': 7443, 'Experience': 8, 'Training': 334}
{'ID': 22466, 'Experience': 7, 'Training': 334}
{'ID': 24537, 'Experience': 14, 'Training': 334}
{'ID': 10600, 'Experience': 14, 'Training': 334}
{'ID': 31256, 'Experience': 19, 'Training': 334}
{'ID': 32401, 'Experience': 6, 'Training': 332}
{'ID': 24845, 'Experience': 14, '

---
**Sort on multiple fields.**

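We can also sort on multiple fields. The fields are applied from left to right: documents are ordered by the first field, and each following field is used only to break ties among documents with equal values in the fields before it.

For example, sort documents by `experience.years` and then by `training_hours`, both in descending order, where `experience.years` is greater than 3.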
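
Sorting on multiple fields means the first field decides the order and later fields only break ties. This can be illustrated without a database. The sketch below is a plain-Python analogy, not MongoDB code, and the documents in it are made up for illustration:

```python
# Made-up sample documents, for illustration only.
docs = [
    {'ID': 1, 'Experience': 20, 'Training': 100},
    {'ID': 2, 'Experience': 15, 'Training': 300},
    {'ID': 3, 'Experience': 20, 'Training': 250},
    {'ID': 4, 'Experience': 15, 'Training': 50},
]

# Similar in spirit to {'$sort': {'experience.years': -1, 'training_hours': -1}}:
# compare on Experience first, and use Training only to break ties.
ordered = sorted(docs, key=lambda d: (-d['Experience'], -d['Training']))

print([d['ID'] for d in ordered])  # [3, 1, 2, 4]
```

Documents 3 and 1 share the highest experience, so training hours decide their relative order. Document 2 has the most training hours overall but still comes after both, because experience is compared first.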

---

In [7]:
# Descending order sort on multiple fields

result = db.hr.aggregate([
                            # Stage 1
                            {
                                '$match': {'experience.years':{'$gt':3}}
                            },
                            # Stage 2
                            {
                                '$sort': {
                                            'experience.years':-1,
                                            'training_hours':-1
                                        }
                            },
                            # Stage 3
                            {
                                '$project': {
                                                '_id':0, 
                                                'ID':'$enrollee_id',
                                                'Experience':'$experience.years', 
                                                'Training':'$training_hours'
                                            }
                            }
                        ])

# Print result
for doc in result:
    pp.pprint(doc)

{'ID': 15053, 'Experience': 20, 'Training': 334}
{'ID': 31979, 'Experience': 20, 'Training': 330}
{'ID': 5063, 'Experience': 20, 'Training': 330}
{'ID': 2934, 'Experience': 20, 'Training': 328}
{'ID': 10063, 'Experience': 20, 'Training': 328}
{'ID': 5682, 'Experience': 20, 'Training': 328}
{'ID': 13333, 'Experience': 20, 'Training': 326}
{'ID': 32476, 'Experience': 20, 'Training': 326}
{'ID': 28863, 'Experience': 20, 'Training': 326}
{'ID': 23112, 'Experience': 20, 'Training': 324}
{'ID': 9744, 'Experience': 20, 'Training': 324}
{'ID': 5854, 'Experience': 20, 'Training': 322}
{'ID': 15475, 'Experience': 20, 'Training': 322}
{'ID': 1092, 'Experience': 20, 'Training': 322}
{'ID': 10803, 'Experience': 20, 'Training': 322}
{'ID': 11763, 'Experience': 20, 'Training': 322}
{'ID': 18196, 'Experience': 20, 'Training': 320}
{'ID': 25080, 'Experience': 20, 'Training': 320}
{'ID': 26177, 'Experience': 20, 'Training': 316}
{'ID': 30707, 'Experience': 20, 'Training': 314}
{'ID': 10135, 'Experience'

---
**Overcoming memory limit error**


The `$sort` stage can use at most 100 MB of RAM. If the sort needs more than that and disk use is not enabled, the operation fails with an error.

To overcome this, pass the `allowDiskUse` option to `aggregate`. It lets MongoDB write temporary files to disk for the data that exceeds the 100 MB memory limit while processing the sort.


---

In [8]:
# Descending order sort with `allowDiskUse` parameter

result = db.hr.aggregate(
                         # Pipeline
                         [
                            # Stage 1
                            {
                                '$match': {'experience.years':{'$gt':3}}
                            },
                            # Stage 2
                            {
                                '$sort': {
                                            'experience.years':-1,
                                            'training_hours':-1
                                        }
                            },
                            # Stage 3
                            {
                                '$project': {
                                                '_id':0, 
                                                'ID':'$enrollee_id',
                                                'Experience':'$experience.years', 
                                                'Training':'$training_hours'
                                            }
                            }
                        ],
                        # Enable parameter
                        allowDiskUse=True)

# Print result
for doc in result:
    pp.pprint(doc)

{'ID': 15053, 'Experience': 20, 'Training': 334}
{'ID': 31979, 'Experience': 20, 'Training': 330}
{'ID': 5063, 'Experience': 20, 'Training': 330}
{'ID': 2934, 'Experience': 20, 'Training': 328}
{'ID': 10063, 'Experience': 20, 'Training': 328}
{'ID': 5682, 'Experience': 20, 'Training': 328}
{'ID': 13333, 'Experience': 20, 'Training': 326}
{'ID': 32476, 'Experience': 20, 'Training': 326}
{'ID': 28863, 'Experience': 20, 'Training': 326}
{'ID': 23112, 'Experience': 20, 'Training': 324}
{'ID': 9744, 'Experience': 20, 'Training': 324}
{'ID': 5854, 'Experience': 20, 'Training': 322}
{'ID': 15475, 'Experience': 20, 'Training': 322}
{'ID': 1092, 'Experience': 20, 'Training': 322}
{'ID': 10803, 'Experience': 20, 'Training': 322}
{'ID': 11763, 'Experience': 20, 'Training': 322}
{'ID': 18196, 'Experience': 20, 'Training': 320}
{'ID': 25080, 'Experience': 20, 'Training': 320}
{'ID': 26177, 'Experience': 20, 'Training': 316}
{'ID': 30707, 'Experience': 20, 'Training': 314}
{'ID': 10135, 'Experience'

---
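### `$limit` stage

Another way to keep a sort within the memory limit is to follow it with the [$limit](https://docs.mongodb.com/manual/reference/operator/aggregation/limit/) stage, which limits the number of documents passed to the next stage of the pipeline. When `$limit` comes directly after `$sort`, MongoDB only has to keep the top *n* documents in memory while sorting.

For example, return the top 5 documents sorted by `experience.years` and then by `training_hours`, both in descending order, where `experience.years` is greater than 3.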
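
To see why a limit right after a sort needs less memory, note that a top-*n* query never has to hold the fully sorted result, only the best *n* candidates seen so far. The following is a plain-Python analogy of that idea using `heapq.nlargest`. It is not how MongoDB is implemented, and the documents are made up for illustration:

```python
import heapq

# Made-up sample documents, for illustration only.
docs = [
    {'ID': 1, 'Experience': 20, 'Training': 100},
    {'ID': 2, 'Experience': 15, 'Training': 300},
    {'ID': 3, 'Experience': 20, 'Training': 250},
    {'ID': 4, 'Experience': 15, 'Training': 50},
    {'ID': 5, 'Experience': 18, 'Training': 336},
]

# Top 2 by Experience, then Training, both descending.
# nlargest keeps at most 2 candidates while scanning the input.
top2 = heapq.nlargest(2, docs, key=lambda d: (d['Experience'], d['Training']))

print([d['ID'] for d in top2])  # [3, 1]
```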

----

In [9]:
# Descending order sort with limit 5

result = db.hr.aggregate(
                         # Pipeline
                         [
                            # Stage 1
                            {
                                '$match': {'experience.years':{'$gt':3}}
                            },
                            # Stage 2
                            {
                                '$sort': {
                                            'experience.years':-1,
                                            'training_hours':-1
                                        }
                            },
                            # Stage 3
                            {
                                '$limit':5
                            },
                            # Stage 4
                            {
                                '$project': {
                                                '_id':0, 
                                                'ID':'$enrollee_id',
                                                'Experience':'$experience.years', 
                                                'Training':'$training_hours'
                                            }
                            }
                        ])

# Print result
for doc in result:
    pp.pprint(doc)

{'ID': 15053, 'Experience': 20, 'Training': 334}
{'ID': 5063, 'Experience': 20, 'Training': 330}
{'ID': 31979, 'Experience': 20, 'Training': 330}
{'ID': 5682, 'Experience': 20, 'Training': 328}
{'ID': 10063, 'Experience': 20, 'Training': 328}


----
### Question -

Sort enrollees by date of enrollment, earliest first, where the training hours are more than 100, and return the first 10.

----

In [10]:
# Ascending sort by enrollment date where training hours > 100, limited to 10 documents

result = db.hr.aggregate(
                         # Pipeline
                         [
                            # Stage 1
                            {
                                '$match': {'training_hours':{'$gt':100}}
                            },
                            # Stage 2
                            {
                                '$sort': {
                                            'date_of_enrollment':1
                                        }
                            },
                            # Stage 3
                            {
                                '$limit':10
                            },
                            # Stage 4
                            {
                                '$project': {
                                                '_id':0, 
                                                'ID':'$enrollee_id',
                                                'Experience':'$experience.years', 
                                                'Training':'$training_hours',
                                                'Date':'$date_of_enrollment'
                                            }
                            }
                        ])

# Print result
for doc in result:
    pp.pprint(doc)

{'ID': 20804,
 'Experience': 2,
 'Training': 214,
 'Date': datetime.datetime(2015, 1, 1, 10, 45, 56, 62000)}
{'ID': 241,
 'Experience': 7,
 'Training': 196,
 'Date': datetime.datetime(2015, 1, 1, 16, 15, 54, 294000)}
{'ID': 6156,
 'Experience': 7,
 'Training': 110,
 'Date': datetime.datetime(2015, 1, 1, 22, 41, 33, 820000)}
{'ID': 30016,
 'Experience': 20,
 'Training': 107,
 'Date': datetime.datetime(2015, 1, 1, 23, 28, 44, 319000)}
{'ID': 11246,
 'Experience': 18,
 'Training': 306,
 'Date': datetime.datetime(2015, 1, 2, 11, 48, 19, 504000)}
{'ID': 2208,
 'Experience': 2,
 'Training': 320,
 'Date': datetime.datetime(2015, 1, 2, 18, 10, 55, 81000)}
{'ID': 22603,
 'Experience': 0,
 'Training': 114,
 'Date': datetime.datetime(2015, 1, 2, 21, 35, 18, 832000)}
{'ID': 6478,
 'Experience': 16,
 'Training': 105,
 'Date': datetime.datetime(2015, 1, 4, 6, 6, 42, 976000)}
{'ID': 9620,
 'Experience': 15,
 'Training': 102,
 'Date': datetime.datetime(2015, 1, 4, 6, 21, 16, 688000)}
{'ID': 14398,
 'E

---
### Exercise 1 - 

Find the enrollee who has at least one year of relevant experience and the highest experience in years and, among those, the highest training hours.
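
One possible starting point, not a definitive solution: match, sort descending on both fields, then keep a single document. The sketch assumes that `experience.relevent_experience` (spelled as in the sample document) holds the years of relevant experience, so "at least one year" is written as `$gte: 1`. The variable name `exercise1_pipeline` is made up for this example.

```python
# Sketch of a pipeline for Exercise 1; field names come from the sample document.
exercise1_pipeline = [
    # Assumption: this field stores years of relevant experience
    {'$match': {'experience.relevent_experience': {'$gte': 1}}},
    # Highest experience first, then highest training hours to break ties
    {'$sort': {'experience.years': -1, 'training_hours': -1}},
    # Keep only the top document
    {'$limit': 1},
]

# To run it against the notebook's collection:
# for doc in db.hr.aggregate(exercise1_pipeline):
#     pp.pprint(doc)
```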

----

### Exercise 2 - 

Find the first enrollee who enrolled for training from the city `city_150`.
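
A hedged sketch of one way to approach this: filter on the nested `city.name` field, sort by `date_of_enrollment` in ascending order so the earliest enrollment comes first, and limit the result to one document. The variable name `exercise2_pipeline` is made up for this example.

```python
# Sketch of a pipeline for Exercise 2; field names come from the sample document.
exercise2_pipeline = [
    # Only enrollees from city_150
    {'$match': {'city.name': 'city_150'}},
    # Earliest enrollment date first
    {'$sort': {'date_of_enrollment': 1}},
    # Keep only the first document
    {'$limit': 1},
]

# To run it against the notebook's collection:
# for doc in db.hr.aggregate(exercise2_pipeline):
#     pp.pprint(doc)
```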

----