# Multikey Index

---
### [Multikey Index](https://docs.mongodb.com/manual/core/index-multikey/)

- A multikey index is an index on a field that holds an array value.

- MongoDB creates one index key for each element of the array.

- Array elements can be scalars (such as numbers and strings) or nested documents.

- In a compound index, at most one indexed field per document can hold an array.
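
The "one key per array element" idea can be sketched in plain Python without a server. The documents, the `build_multikey_index` helper, and the `toy_index` name below are illustrative assumptions, not part of MongoDB or PyMongo; the sketch only models how each array element becomes its own index entry pointing back to the document.

```python
from collections import defaultdict

# Toy documents shaped like the airbnb collection (values are illustrative).
docs = [
    {"_id": 1, "host": {"neighbourhood_list": ["Midtown", "Hell's Kitchen"]}},
    {"_id": 2, "host": {"neighbourhood_list": ["Gravesend"]}},
    {"_id": 3, "host": {"neighbourhood_list": ["Gravesend", "Kensington"]}},
]


def build_multikey_index(documents):
    """Map each array element to the _ids of the documents containing it."""
    index = defaultdict(list)
    for doc in documents:
        for value in doc["host"]["neighbourhood_list"]:
            index[value].append(doc["_id"])
    # Sort the keys to mimic the ordered storage of a real index.
    return dict(sorted(index.items()))


toy_index = build_multikey_index(docs)
for key, ids in toy_index.items():
    print(key, ids)
```

Three documents produce five index entries here, because the entry count follows the number of array elements rather than the number of documents.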


![img](Images/multikey_index.svg)



----

---
### Connect to local server

---

In [1]:
# Importing the required libraries
import pymongo

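import pprint as pp
# Keep dictionary keys in insertion order when pretty-printing by replacing
# the sorted() that the pprint module looks up internally.
# On Python 3.8+, pp.pprint(obj, sort_dicts=False) has the same effect.
pp.sorted = lambda x, key=None: x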

In [2]:
# Connect to local host
client = pymongo.MongoClient("mongodb://localhost:27017/")

In [3]:
# Connect to database
db = client['nyc']

In [4]:
# Sample document
pp.pprint(db.airbnb.find_one())

{'_id': ObjectId('60c21bf5b653d40e79b4a7d0'),
 'accom_id': 2595,
 'description': 'Skylit Midtown Castle',
 'host': {'id': 2845,
          'name': 'Jennifer',
          'listings_count': 3,
          'neighbourhood_list': ['Midtown', "Hell's Kitchen"]},
 'neighbourhood': {'name': 'Midtown', 'group': 'Manhattan'},
 'location': {'type': 'Point', 'coordinates': [-73.98559, 40.75356]},
 'room_type': 'Entire home/apt',
 'price': 150,
 'minimum_nights': 30,
 'reviews': {'number_of_reviews': 48,
             'last_review': datetime.datetime(2019, 11, 4, 0, 0),
             'reviews_per_month': 0.35},
 'availability_365': 365}


----
**Drop any existing indexes (the default `_id` index is always kept).**

----

In [5]:
# Drop indexes
db.airbnb.drop_indexes()

---
**Query on array field.**

Find documents whose host has accommodation in the `Gravesend` neighbourhood. No index exists yet, so the query runs as a collection scan.

----

In [6]:
# Query

pp.pprint(
            db.airbnb.find(
                            {
                                'host.neighbourhood_list': 'Gravesend'
                            })\
                     .explain()['executionStats']
)

{'executionSuccess': True,
 'nReturned': 74,
 'executionTimeMillis': 24,
 'totalKeysExamined': 0,
 'totalDocsExamined': 36905,
 'executionStages': {'stage': 'COLLSCAN',
                     'filter': {'host.neighbourhood_list': {'$eq': 'Gravesend'}},
                     'nReturned': 74,
                     'executionTimeMillisEstimate': 2,
                     'works': 36907,
                     'advanced': 74,
                     'needTime': 36832,
                     'needYield': 0,
                     'saveState': 36,
                     'restoreState': 36,
                     'isEOF': 1,
                     'direction': 'forward',
                     'docsExamined': 36905},
 'allPlansExecution': []}
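
The full `executionStats` output is long, and usually only a few fields matter when comparing plans. The helper below is a minimal sketch for condensing it; `summarize_plan` is a hypothetical name, not a PyMongo function, and it assumes a plan in which each stage has at most one `inputStage` child, as in the outputs in this notebook.

```python
def summarize_plan(stats):
    """Return the stage chain and headline counters from executionStats."""
    stages = []
    stage = stats.get("executionStages")
    # Follow the single-child 'inputStage' links from the root stage down.
    while stage is not None:
        stages.append(stage["stage"])
        stage = stage.get("inputStage")
    return {
        "stages": " <- ".join(stages),
        "nReturned": stats["nReturned"],
        "totalKeysExamined": stats["totalKeysExamined"],
        "totalDocsExamined": stats["totalDocsExamined"],
    }


# Trimmed copy of the collection-scan output shown above.
collscan_stats = {
    "nReturned": 74,
    "totalKeysExamined": 0,
    "totalDocsExamined": 36905,
    "executionStages": {"stage": "COLLSCAN", "nReturned": 74},
}
summary = summarize_plan(collscan_stats)
print(summary)
```

Against a live server, the same helper could be called as `summarize_plan(cursor.explain()['executionStats'])`.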


---
---
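**Create multikey index.**

The indexed field can be sorted in ascending or descending order. No special option is needed: MongoDB marks the index as multikey automatically when an indexed document holds an array in that field.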

---

In [7]:
# Multikey index

db.airbnb.create_index(
                        [('host.neighbourhood_list', pymongo.ASCENDING)],
                        name = 'neighbourhood_list'
                      )

'neighbourhood_list'

---
**Query with index.**

Find documents whose host has accommodation in the `Gravesend` neighbourhood, this time with the multikey index in place.

----

In [8]:
# Query

pp.pprint(
            db.airbnb.find(
                            {
                                'host.neighbourhood_list': 'Gravesend'
                            })\
                     .explain()['executionStats']
)

{'executionSuccess': True,
 'nReturned': 74,
 'executionTimeMillis': 2,
 'totalKeysExamined': 74,
 'totalDocsExamined': 74,
 'executionStages': {'stage': 'FETCH',
                     'nReturned': 74,
                     'executionTimeMillisEstimate': 0,
                     'works': 75,
                     'advanced': 74,
                     'needTime': 0,
                     'needYield': 0,
                     'saveState': 0,
                     'restoreState': 0,
                     'isEOF': 1,
                     'docsExamined': 74,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 74,
                                    'executionTimeMillisEstimate': 0,
                                    'works': 75,
                                    'advanced': 74,
                                    'needTime': 0,
                                    'needYield': 0,
                          

---

Find documents whose host has accommodation in both `Kensington` and `Gravesend`.


---

In [9]:
# Explain query execution

pp.pprint(
            db.airbnb.find({
                                'host.neighbourhood_list': {
                                                                '$all': ['Gravesend', 
                                                                        'Kensington']
                                                            }
                            })\
                     .explain()['executionStats']
)

{'executionSuccess': True,
 'nReturned': 8,
 'executionTimeMillis': 0,
 'totalKeysExamined': 74,
 'totalDocsExamined': 74,
 'executionStages': {'stage': 'FETCH',
                     'filter': {'host.neighbourhood_list': {'$eq': 'Kensington'}},
                     'nReturned': 8,
                     'executionTimeMillisEstimate': 0,
                     'works': 76,
                     'advanced': 8,
                     'needTime': 66,
                     'needYield': 0,
                     'saveState': 0,
                     'restoreState': 0,
                     'isEOF': 1,
                     'docsExamined': 74,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 74,
                                    'executionTimeMillisEstimate': 0,
                                    'works': 75,
                                    'advanced': 74,
                                    'needTime': 
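
In the output above, the `IXSCAN` stage returns 74 entries (the same count as the single-value `Gravesend` query) and the `FETCH` stage carries a filter on `Kensington`, which leaves 8 documents. One reading is that the index is scanned for one of the `$all` values and the other is checked on the fetched documents. The plain-Python sketch below models that two-step shape; the data and the `find_all` helper are illustrative assumptions, not server internals.

```python
# Toy collection: document id -> host.neighbourhood_list (illustrative values).
docs = {
    1: ["Gravesend"],
    2: ["Gravesend", "Kensington"],
    3: ["Kensington", "Midtown"],
    4: ["Gravesend", "Bensonhurst"],
}

# Toy multikey index: one entry per array element.
index = {}
for doc_id, neighbourhoods in docs.items():
    for name in neighbourhoods:
        index.setdefault(name, []).append(doc_id)


def find_all(first, second):
    """Scan the index for `first`, then filter fetched documents on `second`."""
    candidates = index.get(first, [])  # models the IXSCAN on one value
    matches = [d for d in candidates if second in docs[d]]  # FETCH + filter
    return candidates, matches


candidates, matches = find_all("Gravesend", "Kensington")
print(len(candidates), "documents examined,", len(matches), "returned")
```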

----

Find documents whose host's `neighbourhood_list` has exactly two entries, at least one of which is `Gravesend` or `Kensington`.

----

In [10]:
# Explain query execution

pp.pprint(
            db.airbnb.find({
                                'host.neighbourhood_list': {
                                                                '$in': ['Gravesend', 
                                                                        'Kensington'],
                                                                '$size':2
                                                            }
                            })\
                     .explain()['executionStats']
)

{'executionSuccess': True,
 'nReturned': 21,
 'executionTimeMillis': 0,
 'totalKeysExamined': 211,
 'totalDocsExamined': 201,
 'executionStages': {'stage': 'FETCH',
                     'filter': {'host.neighbourhood_list': {'$size': 2}},
                     'nReturned': 21,
                     'executionTimeMillisEstimate': 0,
                     'works': 211,
                     'advanced': 21,
                     'needTime': 189,
                     'needYield': 0,
                     'saveState': 0,
                     'restoreState': 0,
                     'isEOF': 1,
                     'docsExamined': 201,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 201,
                                    'executionTimeMillisEstimate': 0,
                                    'works': 211,
                                    'advanced': 201,
                                    'needTime'
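
The output above shows the `$size` condition as a filter on the `FETCH` stage, so only the `$in` part appears to be answered from the index. It also reports more keys examined (211) than documents examined (201); a host listing both neighbourhoods is reachable through two index keys but only needs to be fetched once. The sketch below models that behaviour in plain Python. The data and the `find_in_with_size` helper are illustrative assumptions, and the toy counters are not meant to reproduce the server's exact numbers.

```python
# Toy collection: document id -> host.neighbourhood_list (illustrative values).
docs = {
    1: ["Gravesend"],
    2: ["Gravesend", "Kensington"],
    3: ["Kensington", "Midtown"],
    4: ["Gravesend", "Bensonhurst", "Midtown"],
    5: ["Midtown", "Harlem"],
}

# Toy multikey index: one entry per array element.
index = {}
for doc_id, neighbourhoods in docs.items():
    for name in neighbourhoods:
        index.setdefault(name, []).append(doc_id)


def find_in_with_size(values, size):
    """Use the index for $in, de-duplicate, then apply $size as a filter."""
    keys_examined = 0
    seen = []
    for value in values:
        for doc_id in index.get(value, []):
            keys_examined += 1
            if doc_id not in seen:  # a document may sit under several keys
                seen.append(doc_id)
    matches = [d for d in seen if len(docs[d]) == size]
    return keys_examined, seen, matches


keys_examined, examined, matches = find_in_with_size(["Gravesend", "Kensington"], 2)
print(keys_examined, "keys,", len(examined), "documents,", len(matches), "returned")
```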

----
**Drop index**

---

In [11]:
# Drop indexes
db.airbnb.drop_indexes()

---
**Create compound multikey index.**

----

In [12]:
# Compound multikey index

db.airbnb.create_index(
                            [
                                ('host.neighbourhood_list', pymongo.ASCENDING),
                                ('host.listings_count', pymongo.ASCENDING),
                            ],
                            name = 'listings_neighbourhood'
                        )

'listings_neighbourhood'
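
To picture what a compound multikey index stores, the sketch below pairs each array element with the scalar field of the same document, and rejects a document in which more than one indexed field holds an array. It is a toy model in plain Python: `compound_keys` is a hypothetical helper, the field names are flattened for brevity (no `host.` prefix), and the error raised is not the server's actual error.

```python
def compound_keys(doc, fields):
    """Return the compound index keys a document would contribute (toy model)."""
    values = [doc[f] for f in fields]
    array_positions = [i for i, v in enumerate(values) if isinstance(v, list)]
    if len(array_positions) > 1:
        # Models the rule that at most one indexed field may hold an array.
        raise ValueError("at most one indexed field may hold an array")
    if not array_positions:
        return [tuple(values)]
    pos = array_positions[0]
    # One key per array element, combined with the other field values.
    return [tuple(values[:pos] + [item] + values[pos + 1:]) for item in values[pos]]


host = {"neighbourhood_list": ["Midtown", "Hell's Kitchen"], "listings_count": 3}
keys = compound_keys(host, ["neighbourhood_list", "listings_count"])
print(keys)
```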

---

Find documents whose host has accommodation in both `Harlem` and `Midtown` and whose `listings_count` is greater than 4.

----

In [13]:
# Query on compound multikey index

pp.pprint(
            db.airbnb.find(
                            {
                                'host.neighbourhood_list': {
                                                                '$all': ['Harlem', 
                                                                        'Midtown']
                                                            },
                                'host.listings_count': {'$gt': 4},
                            })\
                     .explain()['executionStats']
    )

{'executionSuccess': True,
 'nReturned': 251,
 'executionTimeMillis': 5,
 'totalKeysExamined': 696,
 'totalDocsExamined': 696,
 'executionStages': {'stage': 'FETCH',
                     'filter': {'host.neighbourhood_list': {'$eq': 'Midtown'}},
                     'nReturned': 251,
                     'executionTimeMillisEstimate': 4,
                     'works': 697,
                     'advanced': 251,
                     'needTime': 445,
                     'needYield': 0,
                     'saveState': 1,
                     'restoreState': 1,
                     'isEOF': 1,
                     'docsExamined': 696,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 696,
                                    'executionTimeMillisEstimate': 4,
                                    'works': 697,
                                    'advanced': 696,
                                    '

---
### Exercise 1

Look for documents where `host.listings_count` is greater than 4 and `host.neighbourhood_list` contains `Kensington` or `Bensonhurst`.

Find the distinct hosts from the above query. 

*Hint: use the `distinct()` method on `host.id`.*

---

### Exercise 2

Look for documents where `host.listings_count` is greater than 10 and `host.neighbourhood_list` contains exactly 3 elements, none of which is `Upper East Side` or `Upper West Side`.

Find the distinct hosts from the above query. 

*Hint: use the `distinct()` method on `host.id`.*

---