### Problem Statement

The data is originally taken from the [NYC Open Data website](https://opendata.cityofnewyork.us/) and contains data related to park events in the New York City area.

The data provided here contain two collections - **events** and **neighbourhoods**.

**events** collection documents have the following fields - 

- `event_id` - Unique event id

- `title` - Name of the event

- `start_date_time` - The start date and time of the event

- `end_date_time` - The end date and time of the event

- `snippet` - A brief description of the event

- `cost_free` - Indicates whether an event is free (1) or not (0)

- `must_see` - Indicates if event should be featured on Parks website with "Must See" banner. 0 if event is not featured and 1 if event is featured.

- `location_name` - Location name where event takes place

- `location` - Longitude and latitude of the location of event


**neighbourhoods** collection documents have the following fields -

- `properties` - Embedded document containing information related to the neighbourhood

>- `ntacode` - Neighbourhood code
>- `ntaname` - Neighbourhood name
>- `boro_code` - Code of borough in which neighbourhood falls
>- `boro_name` - Name of borough in which neighbourhood falls

- `geometry` - GeoJSON object containing coordinates of boundary of the neighbourhood



----

*The data for the **events** collection was originally taken from - https://data.cityofnewyork.us/browse?Data-Collection_Data-Collection=NYC+Parks+Events&sortBy=most_accessed&utf8=%E2%9C%93*

*The data for the **neighbourhoods** collection was originally taken from - https://data.cityofnewyork.us/City-Government/Neighborhood-Tabulation-Areas-NTA-/cpf4-rkhq*


----

### Connecting to MongoDB


----

In [1]:
# Importing the required libraries
import pymongo
import pprint as pp

from datetime import datetime

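# Shadow sorted() inside the pprint module so dict keys print in insertion order
# (on Python 3.8+, pprint(..., sort_dicts=False) has the same effect)
pp.sorted = lambda x, key=None: x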

In [2]:
client = pymongo.MongoClient("mongodb://localhost:27017/")

In [3]:
client

MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True)

---
### Importing data

----

In [4]:
# # Restore database

# mongorestore c:\temp\av\indexing_assignment\events.bson
# mongorestore c:\temp\av\indexing_assignment\neighbourhoods.bson

In [5]:
# List databases
client.list_database_names()

['admin',
 'av',
 'config',
 'indexing_assignment',
 'local',
 'sample',
 'sample_analytics']

In [6]:
# Database
db = client['indexing_assignment']

In [7]:
# List collections
db.list_collection_names()

['events', 'neighbourhoods']

In [8]:
# Sample document
pp.pprint(
    db.events.find_one()
)

{'_id': ObjectId('60d9cb7310d0be7a77638579'),
 'event_id': 173635,
 'title': 'Central Park Tour: Iconic Views of Central Park',
 'start_date_time': datetime.datetime(2018, 10, 21, 11, 0),
 'end_date_time': datetime.datetime(2018, 10, 21, 12, 30),
 'snippet': 'Some of New York’s most iconic sights are found in Central Park, '
            'including the fountain at Bethesda Terrace and Bow Bridge. Join '
            'Central Park Conservancy guides for an insider’s look.',
 'cost_free': 0,
 'must_see': 0,
 'location_name': 'Dairy Visitor Center & Gift Shop',
 'location': {'type': 'Point',
              'coordinates': [-73.973614931107, 40.769109102536]}}


In [9]:
# Sample document
pp.pprint(
    db.neighbourhoods.find_one()
)

{'_id': ObjectId('60d9d8036fa8d9e558634fc0'),
 'properties': {'ntacode': 'SI05',
                'ntaname': 'New Springville-Bloomfield-Travis',
                'boro_name': 'Staten Island',
                'boro_code': '5'},
 'geometry': {'type': 'MultiPolygon',
              'coordinates': [[[[-74.15379116657292, 40.61225561225275],
                                [-74.1537183422139, 40.612206281350744],
                                [-74.1531727534184, 40.611761521752825],
                                [-74.1512034818308, 40.610264994879955],
                                [-74.15031900700392, 40.609592814688725],
                                [-74.15018666451151, 40.609497811155244],
                                [-74.14998477050442, 40.60953841783077],
                                [-74.14954343557085, 40.60962717289029],
                                [-74.14878329527237, 40.609199987171],
                                [-74.14852820182477, 40.60908079531739],
      

                                [-74.20979737148632, 40.56702877942574],
                                [-74.20977691342713, 40.5670703728058],
                                [-74.20976134380433, 40.567098207219836],
                                [-74.20973856379577, 40.56713340987283],
                                [-74.20971519072347, 40.56717473398208],
                                [-74.20969971286488, 40.567209072409554],
                                [-74.20967795471469, 40.56724902182599],
                                [-74.20964655233287, 40.56728992198326],
                                [-74.20962824988968, 40.56731836464221],
                                [-74.20960959528612, 40.56736116242448],
                                [-74.2095963852099, 40.567395386799255],
                                [-74.20957929223883, 40.56744243665677],
                                [-74.20955643852933, 40.56748894802501],
                                [-74.2095412585739

                                [-74.20080892377673, 40.61707296732945],
                                [-74.20082804386197, 40.61709428804825],
                                [-74.20083040964775, 40.61713528352543],
                                [-74.20083723821065, 40.6171694381884],
                                [-74.20084742522637, 40.6172027324161],
                                [-74.20084535241517, 40.6172591114855],
                                [-74.20084328605802, 40.61731719833679],
                                [-74.20084680818466, 40.617370150367776],
                                [-74.20086267588442, 40.61742820588299],
                                [-74.20087629624597, 40.61748455752935],
                                [-74.20087646075423, 40.61753922321929],
                                [-74.20087662783493, 40.61759474363882],
                                [-74.20089136968501, 40.61765109231757],
                                [-74.20089938404747, 

----
### Assignment Questions


Note - View all queries before attempting the questions. Use proper indexing to answer the questions.

----

----
**Drop previous indexes.**

---

In [10]:
# Drop indexes
db.events.drop_indexes()
db.neighbourhoods.drop_indexes()

<div class="alert alert-block alert-info">
Q1, Q2, Q3 - Without indexes
</div>

### Q1

How many events were `must see events`?

In [11]:
# Enter your code here
db.events.count_documents({'must_see':{'$eq':1}})

4360

In [12]:
db.events.find({'must_see':{'$eq':1}}).explain()['executionStats']

{'executionSuccess': True,
 'nReturned': 4360,
 'executionTimeMillis': 88,
 'totalKeysExamined': 0,
 'totalDocsExamined': 76526,
 'executionStages': {'stage': 'COLLSCAN',
  'filter': {'must_see': {'$eq': 1}},
  'nReturned': 4360,
  'executionTimeMillisEstimate': 5,
  'works': 76528,
  'advanced': 4360,
  'needTime': 72167,
  'needYield': 0,
  'saveState': 76,
  'restoreState': 76,
  'isEOF': 1,
  'direction': 'forward',
  'docsExamined': 76526},
 'allPlansExecution': []}

### Q2

How `many events` were must see as well as `cost free`?

In [13]:
# Enter your code here
db.events.count_documents(
    {
        'must_see':{'$eq':1},
        'cost_free':{'$eq':1}
    }
)

3643

In [14]:
db.events.find(
    {
        'must_see':{'$eq':1},
        'cost_free':{'$eq':1}
    }
).explain()['executionStats']

{'executionSuccess': True,
 'nReturned': 3643,
 'executionTimeMillis': 162,
 'totalKeysExamined': 0,
 'totalDocsExamined': 76526,
 'executionStages': {'stage': 'COLLSCAN',
  'filter': {'$and': [{'cost_free': {'$eq': 1}}, {'must_see': {'$eq': 1}}]},
  'nReturned': 3643,
  'executionTimeMillisEstimate': 18,
  'works': 76528,
  'advanced': 3643,
  'needTime': 72884,
  'needYield': 0,
  'saveState': 76,
  'restoreState': 76,
  'isEOF': 1,
  'direction': 'forward',
  'docsExamined': 76526},
 'allPlansExecution': []}

### Q3

How many `must see and cost free events` were held after `2018-01-01`?

In [15]:
# Enter your code here
# datetime is already imported in cell In [1]


In [16]:
db.events.count_documents(
    {
        'must_see':{'$eq':1},
        'cost_free':{'$eq':1},
        'start_date_time':{'$gt': datetime(2018, 1, 1)}
    }
)

597

In [17]:
pp.pprint(
    db.events.find(
    {
        'must_see':{'$eq':1},
        'cost_free':{'$eq':1},
        'start_date_time':{'$gt': datetime(2018, 1, 1)}
    }).explain()['executionStats']
)

{'executionSuccess': True,
 'nReturned': 597,
 'executionTimeMillis': 106,
 'totalKeysExamined': 0,
 'totalDocsExamined': 76526,
 'executionStages': {'stage': 'COLLSCAN',
                     'filter': {'$and': [{'cost_free': {'$eq': 1}},
                                         {'must_see': {'$eq': 1}},
                                         {'start_date_time': {'$gt': datetime.datetime(2018, 1, 1, 0, 0)}}]},
                     'nReturned': 597,
                     'executionTimeMillisEstimate': 3,
                     'works': 76528,
                     'advanced': 597,
                     'needTime': 75930,
                     'needYield': 0,
                     'saveState': 76,
                     'restoreState': 76,
                     'isEOF': 1,
                     'direction': 'forward',
                     'docsExamined': 76526},
 'allPlansExecution': []}
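The three un-indexed runs above all read every document in the collection. A minimal sketch for condensing an `executionStats` dictionary into the few numbers worth comparing is shown below; the helper name `summarise_execution_stats` and the `docs_per_result` key are illustrative assumptions, and the sample values are copied from the Q3 output above.

```python
def summarise_execution_stats(stats):
    """Condense an explain()['executionStats'] dict into a few key figures."""
    returned = stats["nReturned"]
    examined = stats["totalDocsExamined"]
    return {
        "stage": stats["executionStages"]["stage"],
        "nReturned": returned,
        "totalKeysExamined": stats["totalKeysExamined"],
        "totalDocsExamined": examined,
        # Documents read per document returned; a value near 1 is ideal.
        "docs_per_result": examined / returned if returned else float("inf"),
    }


# Figures copied from the Q3 explain output (no index).
q3_no_index = {
    "nReturned": 597,
    "totalKeysExamined": 0,
    "totalDocsExamined": 76526,
    "executionStages": {"stage": "COLLSCAN"},
}

summary = summarise_execution_stats(q3_no_index)
print(summary["stage"], round(summary["docs_per_result"], 1))
```

A `COLLSCAN` stage together with a high documents-per-result ratio is the usual sign that a query would benefit from an index.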


<div class="alert alert-block alert-info">
Q1, Q2, Q3 - With `indexes`
</div>

In [18]:
#  Create index on - must_see,  cost_free and start_date_time
db.events.create_index(
                        # Compound index
                        [
                            ('must_see', pymongo.ASCENDING),
                            ('cost_free', pymongo.ASCENDING),
                            ('start_date_time',pymongo.DESCENDING)
                        ],
                        # Index name
                        name = 'see_cost_stDt')

'see_cost_stDt'

In [19]:
db.events.index_information()

{'_id_': {'v': 2, 'key': [('_id', 1)]},
 'see_cost_stDt': {'v': 2,
  'key': [('must_see', 1), ('cost_free', 1), ('start_date_time', -1)]}}
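A single compound index can serve Q1, Q2 and Q3 because each query filters on a leading prefix of the index keys. The sketch below illustrates that prefix rule in plain Python; `usable_index_prefix` is a hypothetical helper and a deliberate simplification of what the query planner actually does.

```python
def usable_index_prefix(index_keys, filter_fields):
    """Return the leading index fields that the given filter fields cover."""
    prefix = []
    for field, _direction in index_keys:
        if field not in filter_fields:
            break  # the prefix ends at the first index field not filtered on
        prefix.append(field)
    return prefix


# Key pattern of the 'see_cost_stDt' index created above.
see_cost_stDt = [("must_see", 1), ("cost_free", 1), ("start_date_time", -1)]

q1_prefix = usable_index_prefix(see_cost_stDt, {"must_see"})
q2_prefix = usable_index_prefix(see_cost_stDt, {"must_see", "cost_free"})
q3_prefix = usable_index_prefix(
    see_cost_stDt, {"must_see", "cost_free", "start_date_time"}
)
# A filter on cost_free alone skips the first key, so no prefix applies.
no_prefix = usable_index_prefix(see_cost_stDt, {"cost_free"})

print(q1_prefix, q2_prefix, q3_prefix, no_prefix)
```

Placing the two equality fields first and the range field (`start_date_time`) last keeps the scanned index range contiguous for Q3.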

In [20]:
# Storage statistics

db.command('collStats', 'events')

{'ns': 'indexing_assignment.events',
 'size': 32425004,
 'count': 76526,
 'avgObjSize': 423,
 'storageSize': 16175104,
 'freeStorageSize': 0,
 'capped': False,
 'wiredTiger': {'metadata': {'formatVersion': 1},
  'creationString': 'access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(enabled=false,file_metadata=,repair=false),internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_co

### Q1

How many events were `must see events`?

In [21]:
# Enter your code here
db.events.count_documents({'must_see':{'$eq':1}})

4360

In [22]:
db.events.find({'must_see':{'$eq':1}}).explain()['executionStats']

{'executionSuccess': True,
 'nReturned': 4360,
 'executionTimeMillis': 57,
 'totalKeysExamined': 4360,
 'totalDocsExamined': 4360,
 'executionStages': {'stage': 'FETCH',
  'nReturned': 4360,
  'executionTimeMillisEstimate': 38,
  'works': 4361,
  'advanced': 4360,
  'needTime': 0,
  'needYield': 0,
  'saveState': 4,
  'restoreState': 4,
  'isEOF': 1,
  'docsExamined': 4360,
  'alreadyHasObj': 0,
  'inputStage': {'stage': 'IXSCAN',
   'nReturned': 4360,
   'executionTimeMillisEstimate': 29,
   'works': 4361,
   'advanced': 4360,
   'needTime': 0,
   'needYield': 0,
   'saveState': 4,
   'restoreState': 4,
   'isEOF': 1,
   'keyPattern': {'must_see': 1, 'cost_free': 1, 'start_date_time': -1},
   'indexName': 'see_cost_stDt',
   'isMultiKey': False,
   'multiKeyPaths': {'must_see': [], 'cost_free': [], 'start_date_time': []},
   'isUnique': False,
   'isSparse': False,
   'isPartial': False,
   'indexVersion': 2,
   'direction': 'forward',
   'indexBounds': {'must_see': ['[1, 1]'],
    'c

### Q2

How `many events` were must see as well as `cost free`?

In [23]:
# Enter your code here
db.events.count_documents(
    {
        'must_see':{'$eq':1},
        'cost_free':{'$eq':1}
    }
)

3643

In [24]:
db.events.find(
    {
        'must_see':{'$eq':1},
        'cost_free':{'$eq':1}
    }
).explain()['executionStats']

{'executionSuccess': True,
 'nReturned': 3643,
 'executionTimeMillis': 19,
 'totalKeysExamined': 3643,
 'totalDocsExamined': 3643,
 'executionStages': {'stage': 'FETCH',
  'nReturned': 3643,
  'executionTimeMillisEstimate': 10,
  'works': 3644,
  'advanced': 3643,
  'needTime': 0,
  'needYield': 0,
  'saveState': 3,
  'restoreState': 3,
  'isEOF': 1,
  'docsExamined': 3643,
  'alreadyHasObj': 0,
  'inputStage': {'stage': 'IXSCAN',
   'nReturned': 3643,
   'executionTimeMillisEstimate': 5,
   'works': 3644,
   'advanced': 3643,
   'needTime': 0,
   'needYield': 0,
   'saveState': 3,
   'restoreState': 3,
   'isEOF': 1,
   'keyPattern': {'must_see': 1, 'cost_free': 1, 'start_date_time': -1},
   'indexName': 'see_cost_stDt',
   'isMultiKey': False,
   'multiKeyPaths': {'must_see': [], 'cost_free': [], 'start_date_time': []},
   'isUnique': False,
   'isSparse': False,
   'isPartial': False,
   'indexVersion': 2,
   'direction': 'forward',
   'indexBounds': {'must_see': ['[1, 1]'],
    'co

### Q3

How many `must see and cost free events` were held after `2018-01-01`?

In [25]:
# Enter your code here
db.events.count_documents(
    {
        'must_see':{'$eq':1},
        'cost_free':{'$eq':1},
        'start_date_time':{'$gt': datetime(2018, 1, 1)}
    }
)

597

In [26]:
pp.pprint(
    db.events.find(
    {
        'must_see':{'$eq':1},
        'cost_free':{'$eq':1},
        'start_date_time':{'$gt': datetime(2018, 1, 1)}
    }).explain()['executionStats']
)

{'executionSuccess': True,
 'nReturned': 597,
 'executionTimeMillis': 9,
 'totalKeysExamined': 597,
 'totalDocsExamined': 597,
 'executionStages': {'stage': 'FETCH',
                     'nReturned': 597,
                     'executionTimeMillisEstimate': 10,
                     'works': 598,
                     'advanced': 597,
                     'needTime': 0,
                     'needYield': 0,
                     'saveState': 0,
                     'restoreState': 0,
                     'isEOF': 1,
                     'docsExamined': 597,
                     'alreadyHasObj': 0,
                     'inputStage': {'stage': 'IXSCAN',
                                    'nReturned': 597,
                                    'executionTimeMillisEstimate': 0,
                                    'works': 598,
                                    'advanced': 597,
                                    'needTime': 0,
                                    'needYield': 0,
               

### Q4

How many indexes did you use to answer the above queries? List the index keys for each index used.

In [27]:
# Answer
db.events.index_information()

{'_id_': {'v': 2, 'key': [('_id', 1)]},
 'see_cost_stDt': {'v': 2,
  'key': [('must_see', 1), ('cost_free', 1), ('start_date_time', -1)]}}
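`index_information()` lists every index on the collection, not only the ones a query used. To confirm which index a particular query relied on, one option is to walk the `executionStages` tree from `explain()` and collect each `indexName`. The helper below is a sketch under that assumption; `indexes_used` is a hypothetical name, and the sample plan is a trimmed copy of the Q1 output above.

```python
def indexes_used(stage):
    """Recursively collect (indexName, keyPattern) pairs from a plan stage."""
    found = []
    if "indexName" in stage:
        found.append((stage["indexName"], stage.get("keyPattern")))
    # Most stages have a single child; some (for example OR) have several.
    if "inputStage" in stage:
        found.extend(indexes_used(stage["inputStage"]))
    for child in stage.get("inputStages", []):
        found.extend(indexes_used(child))
    return found


# Trimmed copy of the Q1 plan with the index in place.
q1_plan = {
    "stage": "FETCH",
    "inputStage": {
        "stage": "IXSCAN",
        "keyPattern": {"must_see": 1, "cost_free": 1, "start_date_time": -1},
        "indexName": "see_cost_stDt",
    },
}

used = indexes_used(q1_plan)
print(used)
```

Against a live server the same helper could be called as `indexes_used(db.events.find({...}).explain()['executionStats']['executionStages'])`.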

### Q5

What was the combined size of all the indexes created for the above queries?

In [28]:
# Answer (first attempt): totalIndexSize() is a mongosh shell helper, not a PyMongo Collection method
db.events.totalIndexSize()

TypeError: 'Collection' object is not callable. If you meant to call the 'totalIndexSize' method on a 'Collection' object it is failing because no such method exists.

In [29]:
# Second attempt: indexSizes is a field of the collStats result, not a command of its own
db.command('collStats.indexSizes', 'events')

OperationFailure: no such command: 'collStats.indexSizes', full error: {'ok': 0.0, 'errmsg': "no such command: 'collStats.indexSizes'", 'code': 59, 'codeName': 'CommandNotFound'}

In [30]:
# Storage statistics
db.command('collStats', 'events')

{'ns': 'indexing_assignment.events',
 'size': 32425004,
 'count': 76526,
 'avgObjSize': 423,
 'storageSize': 16175104,
 'freeStorageSize': 0,
 'capped': False,
 'wiredTiger': {'metadata': {'formatVersion': 1},
  'creationString': 'access_pattern_hint=none,allocation_size=4KB,app_metadata=(formatVersion=1),assert=(commit_timestamp=none,durable_timestamp=none,read_timestamp=none,write_timestamp=off),block_allocation=best,block_compressor=snappy,cache_resident=false,checksum=on,colgroups=,collator=,columns=,dictionary=0,encryption=(keyid=,name=),exclusive=false,extractor=,format=btree,huffman_key=,huffman_value=,ignore_in_memory_cache_size=false,immutable=false,import=(enabled=false,file_metadata=,repair=false),internal_item_max=0,internal_key_max=0,internal_key_truncate=true,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,log=(enabled=true),lsm=(auto_throttle=true,bloom=true,bloom_bit_count=16,bloom_config=,bloom_hash_co

# From above details - 
'nindexes': 2,
'file size in bytes': 819200,
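Rather than reading the figures out of the long `collStats` dump, the index sizes can be picked from the result directly, since the `collStats` output includes `totalIndexSize` and a per-index `indexSizes` mapping. A minimal sketch, assuming `db` is the PyMongo database handle used throughout this notebook (`index_sizes` is a hypothetical helper name):

```python
def index_sizes(db, collection_name):
    """Return (total index size, per-index sizes) in bytes for a collection."""
    stats = db.command("collStats", collection_name)
    return stats["totalIndexSize"], stats["indexSizes"]


# Example usage against the notebook's database handle:
# total, per_index = index_sizes(db, "events")
# print(total, per_index)
```

Note that `totalIndexSize` also counts the default `_id_` index, so the size of the indexes created for the queries is the total minus the `_id_` entry in `indexSizes`.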

### Q6

How many events have the exact term `"Central Park" but not the term "Tour"` in the `title` field? 

***Hint - You will need to create a text index here.***

In [31]:
# Enter your code here
# Sample document
pp.pprint(
    db.events.find_one()
)

{'_id': ObjectId('60d9cb7310d0be7a77638579'),
 'event_id': 173635,
 'title': 'Central Park Tour: Iconic Views of Central Park',
 'start_date_time': datetime.datetime(2018, 10, 21, 11, 0),
 'end_date_time': datetime.datetime(2018, 10, 21, 12, 30),
 'snippet': 'Some of New York’s most iconic sights are found in Central Park, '
            'including the fountain at Bethesda Terrace and Bow Bridge. Join '
            'Central Park Conservancy guides for an insider’s look.',
 'cost_free': 0,
 'must_see': 0,
 'location_name': 'Dairy Visitor Center & Gift Shop',
 'location': {'type': 'Point',
              'coordinates': [-73.973614931107, 40.769109102536]}}


In [32]:
# Create text index on title field
db.events.create_index(
                        [('title', 'text')],
                        name = 'title_txt_index'
                )

'title_txt_index'

In [33]:
db.events.index_information()

{'_id_': {'v': 2, 'key': [('_id', 1)]},
 'see_cost_stDt': {'v': 2,
  'key': [('must_see', 1), ('cost_free', 1), ('start_date_time', -1)]},
 'title_txt_index': {'v': 2,
  'key': [('_fts', 'text'), ('_ftsx', 1)],
  'weights': SON([('title', 1)]),
  'default_language': 'english',
  'language_override': 'language',
  'textIndexVersion': 3}}

In [34]:
# Perform text search on collection with text index using $text operator
# How many events have the exact term "Central Park" but not the term "Tour" in the title field?
db.events.count_documents({'$text':{'$search': '"Central Park" -Tour'}})

462
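The `$search` string above combines two pieces of `$text` syntax: a phrase wrapped in double quotes must match exactly, and a term prefixed with `-` is excluded. A small sketch for assembling such strings follows; `build_text_search` is a hypothetical helper, not part of PyMongo.

```python
def build_text_search(phrases=(), excluded=()):
    """Build a $text search string from exact phrases and excluded terms."""
    parts = [f'"{phrase}"' for phrase in phrases]  # quoted: exact phrase
    parts += [f"-{term}" for term in excluded]     # leading '-': negated term
    return " ".join(parts)


search = build_text_search(phrases=["Central Park"], excluded=["Tour"])
query = {"$text": {"$search": search}}
print(query)
```

The resulting `query` dictionary has the same shape as the filter passed to `count_documents` above.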

In [35]:
# Explain
db.events.find({'$text':{'$search': '"Central Park" -Tour'}}).explain()['executionStats']

{'executionSuccess': True,
 'nReturned': 462,
 'executionTimeMillis': 460,
 'totalKeysExamined': 24778,
 'totalDocsExamined': 21473,
 'executionStages': {'stage': 'TEXT_MATCH',
  'nReturned': 462,
  'executionTimeMillisEstimate': 395,
  'works': 24780,
  'advanced': 462,
  'needTime': 24317,
  'needYield': 0,
  'saveState': 36,
  'restoreState': 36,
  'isEOF': 1,
  'indexPrefix': {},
  'indexName': 'title_txt_index',
  'parsedTextQuery': {'terms': ['central', 'park'],
   'negatedTerms': ['tour'],
   'phrases': ['Central Park'],
   'negatedPhrases': []},
  'textIndexVersion': 3,
  'docsRejected': 21011,
  'inputStage': {'stage': 'FETCH',
   'nReturned': 21473,
   'executionTimeMillisEstimate': 130,
   'works': 24780,
   'advanced': 21473,
   'needTime': 3306,
   'needYield': 0,
   'saveState': 36,
   'restoreState': 36,
   'isEOF': 1,
   'docsExamined': 21473,
   'alreadyHasObj': 0,
   'inputStage': {'stage': 'OR',
    'nReturned': 21473,
    'executionTimeMillisEstimate': 93,
    'work

### Q7

How many events were held in `Williamsburg` neighbourhood of `Brooklyn` borough?

***Hint - Create geospatial index for this query. Use the `neighbourhoods` collection for geometry of the neighbourhood. Query on the `ntaname` and `boro_name` fields.***

In [36]:
# Enter your code here
# Sample document
pp.pprint(
    db.neighbourhoods.find_one()
)

{'_id': ObjectId('60d9d8036fa8d9e558634fc0'),
 'properties': {'ntacode': 'SI05',
                'ntaname': 'New Springville-Bloomfield-Travis',
                'boro_name': 'Staten Island',
                'boro_code': '5'},
 'geometry': {'type': 'MultiPolygon',
              'coordinates': [[[[-74.15379116657292, 40.61225561225275],
                                [-74.1537183422139, 40.612206281350744],
                                [-74.1531727534184, 40.611761521752825],
                                [-74.1512034818308, 40.610264994879955],
                                [-74.15031900700392, 40.609592814688725],
                                [-74.15018666451151, 40.609497811155244],
                                [-74.14998477050442, 40.60953841783077],
                                [-74.14954343557085, 40.60962717289029],
                                [-74.14878329527237, 40.609199987171],
                                [-74.14852820182477, 40.60908079531739],
      

                                [-74.21192721864456, 40.55945666388716],
                                [-74.21191561803289, 40.55948255196265],
                                [-74.21190768055529, 40.55950120769871],
                                [-74.21190056124928, 40.55952005525457],
                                [-74.2118942710298, 40.559539072998895],
                                [-74.2118888118123, 40.559558241327196],
                                [-74.21188419516277, 40.559577536093954],
                                [-74.21188042321681, 40.559596937861784],
                                [-74.21187398427335, 40.55961152202366],
                                [-74.21186683213746, 40.55962591149037],
                                [-74.2118589825614, 40.55964008646415],
                                [-74.21185044273771, 40.55965402934092],
                                [-74.21184122621574, 40.5596677183166],
                                [-74.21183134501631

                                [-74.20069092927316, 40.61921029084947],
                                [-74.20070789218406, 40.61925980264958],
                                [-74.20071472324891, 40.61929481103731],
                                [-74.200739547533, 40.61934943365838],
                                [-74.20075316134255, 40.619388013817364],
                                [-74.20075764740834, 40.619404069692656],
                                [-74.20077574921665, 40.619459557774206],
                                [-74.20079832794018, 40.61951247701321],
                                [-74.20082091697915, 40.6195688129885],
                                [-74.20085806260045, 40.61961999679621],
                                [-74.20089632653291, 40.61967032608111],
                                [-74.2009356989252, 40.6197163819225],
                                [-74.20096499633091, 40.61976758108818],
                                [-74.20097296280494, 

In [37]:
# Create geospatial index
db.neighbourhoods.create_index([('geometry', '2dsphere')])

'geometry_2dsphere'

In [38]:
db.neighbourhoods.index_information()

{'_id_': {'v': 2, 'key': [('_id', 1)]},
 'geometry_2dsphere': {'v': 2,
  'key': [('geometry', '2dsphere')],
  '2dsphereIndexVersion': 3}}

In [39]:
# Query
pp.pprint(
        db.neighbourhoods.find_one({
                                'properties.ntaname': 'Williamsburg'
                            })
)

{'_id': ObjectId('60d9d8036fa8d9e558634f37'),
 'properties': {'ntacode': 'BK72',
                'ntaname': 'Williamsburg',
                'boro_name': 'Brooklyn',
                'boro_code': '3'},
 'geometry': {'type': 'MultiPolygon',
              'coordinates': [[[[-73.95023693757913, 40.70547324665451],
                                [-73.94983788592845, 40.70522902156278],
                                [-73.94947068333505, 40.70501901928315],
                                [-73.94925335721452, 40.70489593076167],
                                [-73.94866498773293, 40.70456507443061],
                                [-73.9485496267594, 40.70450043097954],
                                [-73.94814066076677, 40.70427125438625],
                                [-73.94785367209381, 40.7041113832059],
                                [-73.94705205297525, 40.703663949340196],
                                [-73.94753858146478, 40.70335065066481],
                                [

In [40]:
db.neighbourhoods.find_one(
                            {
                                'properties.ntaname': 'Williamsburg',
                                'properties.boro_name': 'Brooklyn'
            
                            })

{'_id': ObjectId('60d9d8036fa8d9e558634f37'),
 'properties': {'ntacode': 'BK72',
  'ntaname': 'Williamsburg',
  'boro_name': 'Brooklyn',
  'boro_code': '3'},
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[-73.95023693757913, 40.70547324665451],
     [-73.94983788592845, 40.70522902156278],
     [-73.94947068333505, 40.70501901928315],
     [-73.94925335721452, 40.70489593076167],
     [-73.94866498773293, 40.70456507443061],
     [-73.9485496267594, 40.70450043097954],
     [-73.94814066076677, 40.70427125438625],
     [-73.94785367209381, 40.7041113832059],
     [-73.94705205297525, 40.703663949340196],
     [-73.94753858146478, 40.70335065066481],
     [-73.94937873477332, 40.702158889825995],
     [-73.9502742404355, 40.7015792419503],
     [-73.95128819368698, 40.70092236548564],
     [-73.95161150233153, 40.701211496744214],
     [-73.951920189279, 40.70148754916077],
     [-73.9525505277791, 40.702051667016995],
     [-73.95318085172319, 40.702616905477456],
     [-73

In [41]:
# Neighbourhood
neighbourhood_loc = db.neighbourhoods.find_one(
                            {
                                'properties.ntaname': 'Williamsburg',
                                'properties.boro_name': 'Brooklyn'
            
                            })['geometry']

In [42]:
# Neighbourhood geometry
neighbourhood_loc

{'type': 'MultiPolygon',
 'coordinates': [[[[-73.95023693757913, 40.70547324665451],
    [-73.94983788592845, 40.70522902156278],
    [-73.94947068333505, 40.70501901928315],
    [-73.94925335721452, 40.70489593076167],
    [-73.94866498773293, 40.70456507443061],
    [-73.9485496267594, 40.70450043097954],
    [-73.94814066076677, 40.70427125438625],
    [-73.94785367209381, 40.7041113832059],
    [-73.94705205297525, 40.703663949340196],
    [-73.94753858146478, 40.70335065066481],
    [-73.94937873477332, 40.702158889825995],
    [-73.9502742404355, 40.7015792419503],
    [-73.95128819368698, 40.70092236548564],
    [-73.95161150233153, 40.701211496744214],
    [-73.951920189279, 40.70148754916077],
    [-73.9525505277791, 40.702051667016995],
    [-73.95318085172319, 40.702616905477456],
    [-73.9538119690652, 40.70318097979544],
    [-73.95572361014882, 40.70194576955723],
    [-73.95745736438872, 40.70082260318484],
    [-73.95722517405659, 40.69999934952369],
    [-73.957167301

In [43]:
# Number of events that fall within the neighbourhood
db.neighbourhoods.count_documents({
                'location': {
                                '$geoWithin': {
                                                '$geometry': neighbourhood_loc
                                            }
                            }
            })

0

In [44]:
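# Why 0? The count above runs against `neighbourhoods`, whose documents have no `location` field.
# The event points are stored in the `events` collection, so the count belongs on that collection.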
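The zero count is consistent with the `$geoWithin` filter being applied to `neighbourhoods`, which has no `location` field, instead of `events`. A sketch of the intended query is given below, assuming `db` is the notebook's database handle; `count_events_in_neighbourhood` is a hypothetical helper, and no result is shown because it has not been run against the data.

```python
def count_events_in_neighbourhood(db, ntaname, boro_name):
    """Count events whose location lies inside the named neighbourhood."""
    neighbourhood = db.neighbourhoods.find_one(
        {"properties.ntaname": ntaname, "properties.boro_name": boro_name}
    )
    if neighbourhood is None:
        return 0  # unknown neighbourhood: nothing to count
    # Filter the events collection by the neighbourhood's GeoJSON boundary.
    return db.events.count_documents(
        {"location": {"$geoWithin": {"$geometry": neighbourhood["geometry"]}}}
    )


# Example usage against the notebook's database handle:
# count_events_in_neighbourhood(db, "Williamsburg", "Brooklyn")
```

`$geoWithin` does not strictly require a geospatial index, but a `2dsphere` index on `events.location` (rather than on `neighbourhoods.geometry`) is the one that would help this query.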

In [45]:
db.neighbourhoods.count_documents(
                            {
                                'properties.ntaname': 'Williamsburg',
                                'properties.boro_name': 'Brooklyn'
            
                            })

1

In [46]:
cur = db.neighbourhoods.find(
                            {
                                'properties.ntaname': 'Williamsburg',
                                'properties.boro_name': 'Brooklyn'
            
                            })

# Print docs
for doc in cur :
    pp.pprint(doc)

{'_id': ObjectId('60d9d8036fa8d9e558634f37'),
 'properties': {'ntacode': 'BK72',
                'ntaname': 'Williamsburg',
                'boro_name': 'Brooklyn',
                'boro_code': '3'},
 'geometry': {'type': 'MultiPolygon',
              'coordinates': [[[[-73.95023693757913, 40.70547324665451],
                                [-73.94983788592845, 40.70522902156278],
                                [-73.94947068333505, 40.70501901928315],
                                [-73.94925335721452, 40.70489593076167],
                                [-73.94866498773293, 40.70456507443061],
                                [-73.9485496267594, 40.70450043097954],
                                [-73.94814066076677, 40.70427125438625],
                                [-73.94785367209381, 40.7041113832059],
                                [-73.94705205297525, 40.703663949340196],
                                [-73.94753858146478, 40.70335065066481],
                                [

### Q8

List the titles of the `paid and must see events` held after `2018-06-06` within `500 meters` of the `Brooklyn Museum (coordinates = [-73.9636, 40.6712])`.

In [47]:
# Sample document
pp.pprint(
    db.events.find_one()
)

{'_id': ObjectId('60d9cb7310d0be7a77638579'),
 'event_id': 173635,
 'title': 'Central Park Tour: Iconic Views of Central Park',
 'start_date_time': datetime.datetime(2018, 10, 21, 11, 0),
 'end_date_time': datetime.datetime(2018, 10, 21, 12, 30),
 'snippet': 'Some of New York’s most iconic sights are found in Central Park, '
            'including the fountain at Bethesda Terrace and Bow Bridge. Join '
            'Central Park Conservancy guides for an insider’s look.',
 'cost_free': 0,
 'must_see': 0,
 'location_name': 'Dairy Visitor Center & Gift Shop',
 'location': {'type': 'Point',
              'coordinates': [-73.973614931107, 40.769109102536]}}


In [48]:
# Create geospatial index
db.events.create_index([('location', '2dsphere')])

'location_2dsphere'

In [49]:
db.events.index_information()

{'_id_': {'v': 2, 'key': [('_id', 1)]},
 'see_cost_stDt': {'v': 2,
  'key': [('must_see', 1), ('cost_free', 1), ('start_date_time', -1)]},
 'title_txt_index': {'v': 2,
  'key': [('_fts', 'text'), ('_ftsx', 1)],
  'weights': SON([('title', 1)]),
  'default_language': 'english',
  'language_override': 'language',
  'textIndexVersion': 3},
 'location_2dsphere': {'v': 2,
  'key': [('location', '2dsphere')],
  '2dsphereIndexVersion': 3}}

In [50]:
# Enter your code here
# Aggregate pipeline
cur = db.events.aggregate([
                        # geoNear
                        {
                            '$geoNear':{
                                            # Point
                                            'near': {
                                                        'type': 'Point',
                                                        'coordinates': [-73.9636,40.6712]
                                                    },
                                            # Output field with calculated distance
                                            'distanceField': 'Distance',
                                            # Optional fields
                                            # Spherical geometry
                                            'spherical': True,
                                            # Maximum distance
                                            'maxDistance': 500,
                                            # Minimum distance
                                            #'minDistance': 1000, 
                                            #Query
                                            'query': {
                                                        'must_see':{'$eq':1},
                                                        'cost_free':{'$eq':0},
                                                        'start_date_time':{'$gt': datetime(2016, 1, 1)} # question asks for datetime(2018, 6, 6); widened because that cutoff returns no events

                                            },
                                            # Location of the matched document
                                            'includeLocs': 'Location'
                                        }
                        },
                        # Project
                        {
                            '$project':{
                                            '_id':0,
                                            'title': 1,
                                            'location_name': 1,
                                            'cost_free': 1,
                                            'must_see':1,
                                            'start_date_time':1
                                        }
                        },
                        # Limit
                        {
                            '$limit': 20
                       }
                ])

for doc in cur:
    pp.pprint(doc)


{'title': 'Sakura Matsuri 2017: Brooklyn Botanic Garden Cherry Blossom '
          'Festival',
 'start_date_time': datetime.datetime(2017, 4, 29, 10, 0),
 'cost_free': 0,
 'must_see': 1,
 'location_name': 'Brooklyn Botanic Garden'}
{'title': 'Sakura Matsuri 2018: Brooklyn Botanic Garden Cherry Blossom '
          'Festival',
 'start_date_time': datetime.datetime(2018, 4, 28, 10, 0),
 'cost_free': 0,
 'must_see': 1,
 'location_name': 'Brooklyn Botanic Garden'}
{'title': 'Sakura Matsuri 2017: Brooklyn Botanic Garden Cherry Blossom '
          'Festival',
 'start_date_time': datetime.datetime(2017, 4, 29, 10, 0),
 'cost_free': 0,
 'must_see': 1,
 'location_name': 'Brooklyn Botanic Garden'}
{'title': 'Sakura Matsuri 2018: Brooklyn Botanic Garden Cherry Blossom '
          'Festival',
 'start_date_time': datetime.datetime(2018, 4, 29, 10, 0),
 'cost_free': 0,
 'must_see': 1,
 'location_name': 'Brooklyn Botanic Garden'}
{'title': 'Sakura Matsuri 2017: Brooklyn Botanic Garden Cherry Blossom '

In [51]:
# The latest matching events are in 2018-04, so none fall after 2018-06-06.
# The same event recurs every April (2016, 2017 and 2018).
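Because the cutoff date was changed by hand while exploring, it can help to build the pipeline from parameters so the question's own date (`2018-06-06`) is easy to restore. The sketch below assumes the same field names as the query above; `build_nearby_events_pipeline` is a hypothetical helper, and plain values are used in place of `$eq`, which is equivalent.

```python
from datetime import datetime


def build_nearby_events_pipeline(coordinates, max_distance_m, after, limit=20):
    """Build a $geoNear pipeline for paid, must-see events near a point."""
    return [
        {
            # $geoNear has to be the first stage and needs a geospatial index.
            "$geoNear": {
                "near": {"type": "Point", "coordinates": coordinates},
                "distanceField": "Distance",
                "spherical": True,
                "maxDistance": max_distance_m,  # metres for a GeoJSON point
                "query": {
                    "must_see": 1,
                    "cost_free": 0,  # paid events
                    "start_date_time": {"$gt": after},
                },
            }
        },
        {"$project": {"_id": 0, "title": 1, "start_date_time": 1}},
        {"$limit": limit},
    ]


pipeline = build_nearby_events_pipeline(
    [-73.9636, 40.6712], 500, datetime(2018, 6, 6)
)
# Example usage against the notebook's database handle:
# for doc in db.events.aggregate(pipeline):
#     print(doc["title"])
```

Passing `datetime(2016, 1, 1)` as `after` builds the widened query that was actually run above.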

In [52]:
# Drop indexes
db.events.drop_indexes()
db.neighbourhoods.drop_indexes()