# Compound Index

---
---

### [Compound Index](https://docs.mongodb.com/manual/core/index-compound/)

- A compound index is a single index created on multiple fields.

- Index entries are sorted by the fields in the order in which they are listed at creation time.

- Each field in a compound index can be sorted in either ascending or descending order.

![compound_index](Images/compound_index.svg)
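
To make the ordering concrete, here is a minimal pure-Python sketch of how entries in a compound index on (`price` ascending, number of reviews descending) would be laid out. It is only a toy model: the documents and the `index_entries` name are made up for illustration, and a real MongoDB index is a B-tree rather than a Python list.

```python
# Toy model of a compound index on (price ASC, reviews DESC).
# The documents below are hypothetical, not taken from the nyc.airbnb collection.
docs = [
    {"_id": 1, "price": 150, "reviews": 48},
    {"_id": 2, "price": 100, "reviews": 75},
    {"_id": 3, "price": 150, "reviews": 120},
    {"_id": 4, "price": 100, "reviews": 10},
    {"_id": 5, "price": 170, "reviews": 60},
]

# Each entry is (price, reviews, _id).
# Sort by price ascending first, then by reviews descending (negated key).
index_entries = sorted(
    ((d["price"], d["reviews"], d["_id"]) for d in docs),
    key=lambda entry: (entry[0], -entry[1]),
)

for entry in index_entries:
    print(entry)
```

All entries with the same `price` sit next to each other, and within each price block the entries are ordered by number of reviews.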



---
### Connect to local server

---

In [1]:
# Importing the required libraries
import pymongo

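import pprint as pp
# Keep dict keys in their original order when pretty-printing
# (replaces the sorted() that the pprint module would otherwise call)
pp.sorted = lambda x, key=None: x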

In [2]:
# Connect to local host
client = pymongo.MongoClient("mongodb://localhost:27017/")

In [3]:
# Connect to database
db = client['nyc']

In [4]:
# Sample document
pp.pprint(
    db.airbnb.find_one()
)

{'_id': ObjectId('60c21bf5b653d40e79b4a7d0'),
 'accom_id': 2595,
 'description': 'Skylit Midtown Castle',
 'host': {'id': 2845,
          'name': 'Jennifer',
          'listings_count': 3,
          'neighbourhood_list': ['Midtown', "Hell's Kitchen"]},
 'neighbourhood': {'name': 'Midtown', 'group': 'Manhattan'},
 'location': {'type': 'Point', 'coordinates': [-73.98559, 40.75356]},
 'room_type': 'Entire home/apt',
 'price': 150,
 'minimum_nights': 30,
 'reviews': {'number_of_reviews': 48,
             'last_review': datetime.datetime(2019, 11, 4, 0, 0),
             'reviews_per_month': 0.35},
 'availability_365': 365}


----
**Drop previous indexes.**

---

In [5]:
# Drop indexes
db.airbnb.drop_indexes()

___
Query data based on `price` and `number_of_reviews`. Look for documents where `price = 150` and `reviews.number_of_reviews > 50`.
____


In [6]:
# Query on price and number of reviews
pp.pprint(
            db.airbnb.find({
                                'price': 150,
                                'reviews.number_of_reviews': {'$gt': 50}
                            })\
                     .explain()['executionStats']
        )

{'executionSuccess': True,
 'nReturned': 124,
 'executionTimeMillis': 19,
 'totalKeysExamined': 0,
 'totalDocsExamined': 36905,
 'executionStages': {'stage': 'COLLSCAN',
                     'filter': {'$and': [{'price': {'$eq': 150}},
                                         {'reviews.number_of_reviews': {'$gt': 50}}]},
                     'nReturned': 124,
                     'executionTimeMillisEstimate': 1,
                     'works': 36907,
                     'advanced': 124,
                     'needTime': 36782,
                     'needYield': 0,
                     'saveState': 36,
                     'restoreState': 36,
                     'isEOF': 1,
                     'direction': 'forward',
                     'docsExamined': 36905},
 'allPlansExecution': []}


---
--- 
The query returned 124 documents but had to examine all 36905 documents in the collection (`COLLSCAN`). We can reduce the number of examined documents by creating a **single-field index** on the `price` field.

----

In [7]:
# Create single index
db.airbnb.create_index('price',
                       name = 'price_index')

'price_index'

---
This will reduce the number of examined documents.

---

In [8]:
# Query
pp.pprint(
            db.airbnb.find({
                                'price': 150,
                                'reviews.number_of_reviews': {'$gt': 50}
                            })\
                     .explain()['executionStats']
    )

{'executionSuccess': True,
 'nReturned': 124,
 'executionTimeMillis': 4,
 'totalKeysExamined': 1298,
 'totalDocsExamined': 1298,
 'executionStages': {'stage': 'FETCH',
                     'filter': {'reviews.number_of_reviews': {'$gt': 50}},
                     'nReturned': 124,
                     'executionTimeMillisEstimate': 0,
                     'works': 1299,
                     'advanced': 124,
                     'needTime': 1174,
                     'needYield': 0,
                     'saveState': 1,
                     'restoreState': 1,
                     'isEOF': 1,
                     'docsExamined': 1298,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 1298,
                                    'executionTimeMillisEstimate': 0,
                                    'works': 1299,
                                    'advanced': 1298,
                                  

---
---
Still, far more documents are examined (1298) than returned (124). Since we are querying on more than one field, we can create an index that covers more than one field. We can use a `Compound Index`.
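
One way to compare the plans so far is the number of documents examined per document returned. The helper below is a small sketch; `examined_per_returned` is a hypothetical name, and it only reads the `nReturned` and `totalDocsExamined` keys that appear in the `executionStats` outputs above.

```python
def examined_per_returned(stats):
    """Documents examined per document returned, from an executionStats dict."""
    returned = stats["nReturned"]
    examined = stats["totalDocsExamined"]
    if returned == 0:
        # Nothing returned: any examined document was wasted work.
        return float("inf") if examined else 0.0
    return examined / returned

# Values copied from the two explain() outputs above.
collscan = {"nReturned": 124, "totalDocsExamined": 36905}
single_index = {"nReturned": 124, "totalDocsExamined": 1298}

print(round(examined_per_returned(collscan), 1))      # 297.6
print(round(examined_per_returned(single_index), 1))  # 10.5
```

A ratio close to 1 means the index leads the query almost directly to the matching documents.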

----
Create compound index using `price` and `reviews.number_of_reviews` fields.

![compound_example](Images/compound_example.png)

---

In [9]:
# Create compound index
# price in ASCENDING order
# reviews.number_of_reviews in DESCENDING order

db.airbnb.create_index(
                        # Compound index
                        [
                            ('price', pymongo.ASCENDING),
                            ('reviews.number_of_reviews', pymongo.DESCENDING)
                        ],
                        # Index name
                        name = 'price_reviews')

'price_reviews'

In [10]:
# Query

pp.pprint(
            db.airbnb.find({
                                'price': 150,
                                'reviews.number_of_reviews': {'$gt': 50}
                           })\
                     .explain()['executionStats']
    )

{'executionSuccess': True,
 'nReturned': 124,
 'executionTimeMillis': 2,
 'totalKeysExamined': 124,
 'totalDocsExamined': 124,
 'executionStages': {'stage': 'FETCH',
                     'nReturned': 124,
                     'executionTimeMillisEstimate': 0,
                     'works': 125,
                     'advanced': 124,
                     'needTime': 0,
                     'needYield': 0,
                     'saveState': 0,
                     'restoreState': 0,
                     'isEOF': 1,
                     'docsExamined': 124,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 124,
                                    'executionTimeMillisEstimate': 0,
                                    'works': 125,
                                    'advanced': 124,
                                    'needTime': 0,
                                    'needYield': 0,
                

___
Look for documents where `150 < price < 170` and `reviews.number_of_reviews > 50`.
____


In [11]:
# Query

pp.pprint(
            db.airbnb.find({
                                'price': {
                                            '$gt': 150,
                                            '$lt': 170
                                        },
                                'reviews.number_of_reviews': {'$gt': 50}
                           })\
                     .explain()['executionStats']
    )

{'executionSuccess': True,
 'nReturned': 201,
 'executionTimeMillis': 1,
 'totalKeysExamined': 221,
 'totalDocsExamined': 201,
 'executionStages': {'stage': 'FETCH',
                     'nReturned': 201,
                     'executionTimeMillisEstimate': 0,
                     'works': 221,
                     'advanced': 201,
                     'needTime': 19,
                     'needYield': 0,
                     'saveState': 0,
                     'restoreState': 0,
                     'isEOF': 1,
                     'docsExamined': 201,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 201,
                                    'executionTimeMillisEstimate': 0,
                                    'works': 221,
                                    'advanced': 201,
                                    'needTime': 19,
                                    'needYield': 0,
              

---
***While creating a compound index:***
- First add the fields that are queried with equality conditions.

- Add last the fields that are queried with range conditions.
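
The ordering rule above can be illustrated with a simplified pure-Python model that treats an index as a sorted list of tuples and counts how many keys fall inside the contiguous range a query has to walk. This is a sketch under stated assumptions: the data is made up, both fields are sorted ascending for simplicity, the function names are hypothetical, and the real MongoDB planner can use tighter bounds than this model does.

```python
from bisect import bisect_right

# Made-up (price, reviews) pairs, purely for illustration.
listings = [(p, r) for p in (100, 150, 200) for r in (10, 40, 60, 90)]

def keys_scanned_equality_first(pairs, price, min_reviews):
    """Index ordered (price, reviews): equality field first, range field last."""
    keys = sorted(pairs)
    # First key in the price block with reviews > min_reviews.
    lo = bisect_right(keys, (price, min_reviews))
    # End of the block of keys with this price.
    hi = bisect_right(keys, (price, float("inf")))
    return hi - lo

def keys_scanned_range_first(pairs, price, min_reviews):
    """Index ordered (reviews, price): the range field leads, so in this
    model every key with reviews > min_reviews is walked and price is
    checked on each one."""
    keys = sorted((r, p) for p, r in pairs)
    lo = bisect_right(keys, (min_reviews, float("inf")))
    return len(keys) - lo

# Query: price == 150 and reviews > 50 (two pairs match).
print(keys_scanned_equality_first(listings, 150, 50))  # 2
print(keys_scanned_range_first(listings, 150, 50))     # 6
```

With the equality field first, the matching keys form one contiguous block; with the range field first, keys for every price are interleaved inside the scanned range.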


---

----
### Exercise

Consider the following queries.

- How many accommodations in the "Manhattan" `neighbourhood.group` had a `price` less than $100?

- How many accommodations in the "Manhattan", "Brooklyn" or "Queens" `neighbourhood.group` had a `price` less than $100 and `reviews.number_of_reviews` greater than 10?

What was the difference in the execution of the two queries? What index did you create?

-----