# Querying on Single Field Indexes

---
### Connecting to MongoDB using PyMongo
----

In [1]:
# Importing the required libraries

import pymongo

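import pprint as pp
# Keep dict keys in insertion order: pprint looks up `sorted` in its own module
# namespace, so this override disables key sorting. On Python 3.8+,
# pp.pprint(obj, sort_dicts=False) does the same without patching.
pp.sorted = lambda x, key=None: x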

In [2]:
# Connect to local host

client = pymongo.MongoClient("mongodb://localhost:27017/")

In [3]:
# Connect to database

db = client['nyc']

In [4]:
# Sample document

pp.pprint(
    db.airbnb.find_one()
)

{'_id': ObjectId('60c21bf5b653d40e79b4a7d0'),
 'accom_id': 2595,
 'description': 'Skylit Midtown Castle',
 'host': {'id': 2845,
          'name': 'Jennifer',
          'listings_count': 3,
          'neighbourhood_list': ['Midtown', "Hell's Kitchen"]},
 'neighbourhood': {'name': 'Midtown', 'group': 'Manhattan'},
 'location': {'type': 'Point', 'coordinates': [-73.98559, 40.75356]},
 'room_type': 'Entire home/apt',
 'price': 150,
 'minimum_nights': 30,
 'reviews': {'number_of_reviews': 48,
             'last_review': datetime.datetime(2019, 11, 4, 0, 0),
             'reviews_per_month': 0.35},
 'availability_365': 365}


---
### Querying on a single-field index


Create a single-field index on the `price` field.


----

In [5]:
# Create single index

db.airbnb.create_index('price',
                       name = 'price_index')

'price_index'

In [6]:
# Get indexes

pp.pprint(
    db.airbnb.index_information()
)

{'_id_': {'v': 2, 'key': [('_id', 1)]},
 'price_index': {'v': 2, 'key': [('price', 1)]}}


---
Look for accommodations with a `price` between 150 and 170. The bounds are exclusive because the query uses `$gt` and `$lt`.

---

In [7]:
# Query

pp.pprint(
            db.airbnb.find({
                                'price': {
                                            '$gt': 150,
                                            '$lt': 170
                                        }
                            })\
                     .explain()['executionStats']
        )

{'executionSuccess': True,
 'nReturned': 1215,
 'executionTimeMillis': 4,
 'totalKeysExamined': 1215,
 'totalDocsExamined': 1215,
 'executionStages': {'stage': 'FETCH',
                     'nReturned': 1215,
                     'executionTimeMillisEstimate': 0,
                     'works': 1216,
                     'advanced': 1215,
                     'needTime': 0,
                     'needYield': 0,
                     'saveState': 1,
                     'restoreState': 1,
                     'isEOF': 1,
                     'docsExamined': 1215,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 1215,
                                    'executionTimeMillisEstimate': 0,
                                    'works': 1216,
                                    'advanced': 1215,
                                    'needTime': 0,
                                    'needYield': 0,
      

---
MongoDB used the `price_index` that we created: it examined 1215 index keys and 1215 documents, and all 1215 documents were returned.

---
Next, suppose we want the accommodations where the `price` is between 150 and 170 and the number of reviews (`reviews.number_of_reviews`) is greater than 50.

---

In [8]:
# Query

pp.pprint(
            db.airbnb.find({
                                'price': {
                                            '$gt': 150,
                                            '$lt': 170
                                        },
                                'reviews.number_of_reviews': {'$gt': 50}
                            })\
                     .explain()['executionStats']
        )

{'executionSuccess': True,
 'nReturned': 201,
 'executionTimeMillis': 2,
 'totalKeysExamined': 1215,
 'totalDocsExamined': 1215,
 'executionStages': {'stage': 'FETCH',
                     'filter': {'reviews.number_of_reviews': {'$gt': 50}},
                     'nReturned': 201,
                     'executionTimeMillisEstimate': 0,
                     'works': 1216,
                     'advanced': 201,
                     'needTime': 1014,
                     'needYield': 0,
                     'saveState': 1,
                     'restoreState': 1,
                     'isEOF': 1,
                     'docsExamined': 1215,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 1215,
                                    'executionTimeMillisEstimate': 0,
                                    'works': 1216,
                                    'advanced': 1215,
                                  

---
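MongoDB used the index to examine only the 1215 documents with a price between 150 and 170. The `FETCH` stage then applied the `reviews.number_of_reviews` filter and returned only the 201 documents with more than 50 reviews, so 1014 of the fetched documents were discarded.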
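
To make the comparison between the two queries concrete, here is a minimal sketch that condenses an `executionStats` dictionary into a few efficiency figures. The helper name `summarize_stats` and the `docs_per_result` ratio are illustrative assumptions, not part of PyMongo; the numbers are copied from the two explain outputs above.

```python
def summarize_stats(stats):
    """Condense an explain()['executionStats'] dict into a few figures."""
    returned = stats["nReturned"]
    docs = stats["totalDocsExamined"]
    return {
        "keys_examined": stats["totalKeysExamined"],
        "docs_examined": docs,
        "returned": returned,
        # Documents fetched per document returned; 1.0 means no wasted fetches.
        "docs_per_result": docs / returned if returned else float("inf"),
    }


# Figures copied from the two explain outputs above (hypothetical variable names).
price_only = {"nReturned": 1215, "totalKeysExamined": 1215, "totalDocsExamined": 1215}
price_and_reviews = {"nReturned": 201, "totalKeysExamined": 1215, "totalDocsExamined": 1215}

for label, stats in [("price only", price_only), ("price + reviews", price_and_reviews)]:
    print(label, summarize_stats(stats))
```

A ratio of 1.0 for the first query means every fetched document was returned; the second query fetches roughly six documents for each one it returns, which is the waste a better-suited index would aim to reduce.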

We can further improve the performance of such queries using compound indexes.

----

### [Using indexes with Aggregation pipelines](https://docs.mongodb.com/manual/core/aggregation-pipeline/#pipeline-operators-and-indexes)

Starting in MongoDB 3.2, indexes can cover an aggregation pipeline.

----
For example, create an aggregation pipeline that filters documents with a `price` between 150 and 170.

---

In [9]:
# Aggregation Pipeline

cur = db.airbnb.aggregate([
                                {
                                    '$match':{
                                                'price':{'$gt':150, '$lt':170}
                                            }
                                }
                            ])

for doc in cur:
    pp.pprint(doc)

{'_id': ObjectId('60c21bf6b653d40e79b4aa69'),
 'accom_id': 435909,
 'description': 'Sunny West Village Dream',
 'host': {'id': 2165401,
          'name': 'Andrew',
          'listings_count': 1,
          'neighbourhood_list': ['West Village']},
 'neighbourhood': {'name': 'West Village', 'group': 'Manhattan'},
 'location': {'type': 'Point', 'coordinates': [-74.00418, 40.73553]},
 'room_type': 'Entire home/apt',
 'price': 151,
 'minimum_nights': 88,
 'reviews': {'number_of_reviews': 19,
             'last_review': datetime.datetime(2016, 12, 24, 0, 0),
             'reviews_per_month': 0.18},
 'availability_365': 159}
{'_id': ObjectId('60c21bf6b653d40e79b4af32'),
 'accom_id': 1780748,
 'description': 'Sunny 1 bedroom In The Heart of NYC',
 'host': {'id': 9293512,
          'name': 'Pk',
          'listings_count': 1,
          'neighbourhood_list': ['Murray Hill']},
 'neighbourhood': {'name': 'Murray Hill', 'group': 'Manhattan'},
 'location': {'type': 'Point', 'coordinates': [-73.97218,

---
We can define the pipeline separately.

For example, filter documents where `150 < price < 170`.

----

In [10]:
# Pipeline

pipeline_1 = [
                {
                    '$match':{
                                'price':{'$gt':150, '$lt':170}
                             }
                }
            ]
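
Since the pipeline is an ordinary Python list, it can also be built by a function. The sketch below is one possible way to parameterize the price range; `price_range_pipeline` and its `inclusive` flag are hypothetical names introduced for illustration only.

```python
def price_range_pipeline(low, high, inclusive=False):
    """Build a one-stage pipeline that matches documents by price range."""
    # $gt/$lt exclude the bounds; $gte/$lte include them.
    lower_op, upper_op = ("$gte", "$lte") if inclusive else ("$gt", "$lt")
    return [{"$match": {"price": {lower_op: low, upper_op: high}}}]


# Same filter as pipeline_1: 150 < price < 170.
pipeline_exclusive = price_range_pipeline(150, 170)

# Variant that also matches prices of exactly 150 or 170.
pipeline_inclusive = price_range_pipeline(150, 170, inclusive=True)

print(pipeline_exclusive)
print(pipeline_inclusive)
```

Either list can be passed to `db.airbnb.aggregate(...)` in the same way as `pipeline_1`.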

In [11]:
# Aggregation Pipeline

cur = db.airbnb.aggregate(pipeline_1)

for doc in cur:
    pp.pprint(doc)

{'_id': ObjectId('60c21bf6b653d40e79b4aa69'),
 'accom_id': 435909,
 'description': 'Sunny West Village Dream',
 'host': {'id': 2165401,
          'name': 'Andrew',
          'listings_count': 1,
          'neighbourhood_list': ['West Village']},
 'neighbourhood': {'name': 'West Village', 'group': 'Manhattan'},
 'location': {'type': 'Point', 'coordinates': [-74.00418, 40.73553]},
 'room_type': 'Entire home/apt',
 'price': 151,
 'minimum_nights': 88,
 'reviews': {'number_of_reviews': 19,
             'last_review': datetime.datetime(2016, 12, 24, 0, 0),
             'reviews_per_month': 0.18},
 'availability_365': 159}
{'_id': ObjectId('60c21bf6b653d40e79b4af32'),
 'accom_id': 1780748,
 'description': 'Sunny 1 bedroom In The Heart of NYC',
 'host': {'id': 9293512,
          'name': 'Pk',
          'listings_count': 1,
          'neighbourhood_list': ['Murray Hill']},
 'neighbourhood': {'name': 'Murray Hill', 'group': 'Manhattan'},
 'location': {'type': 'Point', 'coordinates': [-73.97218,

---
In PyMongo, we use the [`command`](https://pymongo.readthedocs.io/en/stable/api/pymongo/database.html#pymongo.database.Database.command) method to run the explain plan for an aggregation pipeline.

---

In [12]:
# Explain

pp.pprint(
    db.command('aggregate','airbnb',pipeline=pipeline_1, explain=True)
)

{'queryPlanner': {'plannerVersion': 1,
                  'namespace': 'nyc.airbnb',
                  'indexFilterSet': False,
                  'parsedQuery': {'$and': [{'price': {'$lt': 170}},
                                           {'price': {'$gt': 150}}]},
                  'queryHash': '4291FEFF',
                  'planCacheKey': 'B99FC200',
                  'optimizedPipeline': True,
                  'winningPlan': {'stage': 'FETCH',
                                  'inputStage': {'stage': 'IXSCAN',
                                                 'keyPattern': {'price': 1},
                                                 'indexName': 'price_index',
                                                 'isMultiKey': False,
                                                 'multiKeyPaths': {'price': []},
                                                 'isUnique': False,
                                                 'isSparse': False,
                                        


---
The `$match` stage can use an index to filter documents if it occurs at the beginning of a pipeline.
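
One way to check whether a plan actually uses an index is to walk its stages and look for `IXSCAN`. The sketch below is a simplified helper under stated assumptions: it only follows the single `inputStage` chain seen in the output above, the names `plan_stages` and `uses_index` are made up for illustration, and the `COLLSCAN` plan is a hypothetical example of a plan without an index.

```python
def plan_stages(plan):
    """Yield (stage, indexName) pairs from the top stage down through inputStage."""
    while plan is not None:
        yield plan.get("stage"), plan.get("indexName")
        plan = plan.get("inputStage")


def uses_index(plan):
    """Return True if any stage in the chain is an index scan."""
    return any(stage == "IXSCAN" for stage, _ in plan_stages(plan))


# Trimmed copy of the winningPlan shown in the explain output above.
winning_plan = {
    "stage": "FETCH",
    "inputStage": {
        "stage": "IXSCAN",
        "keyPattern": {"price": 1},
        "indexName": "price_index",
    },
}

# Hypothetical plan for a query that cannot use any index.
collscan_plan = {"stage": "COLLSCAN"}

print(list(plan_stages(winning_plan)))
print(uses_index(winning_plan), uses_index(collscan_plan))
```

Plans with branching stages or a different explain layout would need a more general traversal than this single-chain walk.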

----

---
### Exercise 1

Using a single-field index, find the documents whose `room_type` is `Shared room`. Also check the size of the index you just created.

---
### Exercise 2


Use an aggregation pipeline to find the documents whose `room_type` is `Shared room` and whose `price` is less than 100. Check the query execution using explain.

---