# Geospatial Index

---
### [Geospatial Index](https://docs.mongodb.com/manual/geospatial-queries/)

- Support queries on geospatial data.
> - Think of GPS location in Google maps, Uber, Zomato, etc.

- Support objects for point, line, polygon, etc.

- Can find the points nearest to a given location, the points that lie within a bounding geometry, and more.
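
As a minimal sketch of the object types listed above, here is how a point, a line, and a polygon look as GeoJSON written in plain Python dicts. The coordinate values and the two helper functions (`is_valid_position`, `ring_is_closed`) are illustrative assumptions, not part of the dataset or of any MongoDB API. Note that GeoJSON lists longitude first, then latitude.

```python
# GeoJSON objects as plain Python dicts; coordinates are [longitude, latitude].
point = {"type": "Point", "coordinates": [-73.9857, 40.7484]}

line = {
    "type": "LineString",
    "coordinates": [[-73.9857, 40.7484], [-73.9680, 40.7850]],
}

# A polygon is a list of linear rings; each ring ends on its first position.
polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-74.00, 40.70],
        [-73.95, 40.70],
        [-73.95, 40.75],
        [-74.00, 40.75],
        [-74.00, 40.70],
    ]],
}


def is_valid_position(position):
    """Check that a [longitude, latitude] pair is within valid bounds."""
    lon, lat = position
    return -180 <= lon <= 180 and -90 <= lat <= 90


def ring_is_closed(ring):
    """A linear ring needs at least 4 positions and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]
```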

----

In [6]:
!mongorestore --db training Data/airbnb

2023-08-24T16:31:50.938-0600	The --db and --collection flags are deprecated for this use-case; please use --nsInclude instead, i.e. with --nsInclude=${DATABASE}.${COLLECTION}
2023-08-24T16:31:50.940-0600	building a list of collections to restore from Data\airbnb dir
2023-08-24T16:31:50.940-0600	reading metadata for training.airbnb from Data\airbnb\airbnb.metadata.json
2023-08-24T16:31:50.964-0600	restoring training.airbnb from Data\airbnb\airbnb.bson
2023-08-24T16:31:51.738-0600	finished restoring training.airbnb (36905 documents, 0 failures)
2023-08-24T16:31:51.738-0600	restoring indexes for collection training.airbnb from metadata
2023-08-24T16:31:51.738-0600	index: &idx.IndexDocument{Options:primitive.M{"2dsphereIndexVersion":3, "name":"location_2dsphere", "v":2}, Key:primitive.D{primitive.E{Key:"location", Value:"2dsphere"}}, PartialFilterExpression:primitive.D(nil)}
2023-08-24T16:31:52.115-0600	36905 document(s) restored successfully. 0 document(s) failed to restore.


In [16]:
db.list_collection_names()

['airbnb']

In [17]:
db.airbnb

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True), 'nyc'), 'airbnb')

In [25]:
db.nyc.count_documents({})

0

### Connect to local server

---

In [7]:
# Importing the required libraries
import pymongo

import pprint as pp
# Keep dict keys in insertion order by overriding the `sorted` that pprint looks up
pp.sorted = lambda x, key=None: x

In [8]:
# Connect to local host
client = pymongo.MongoClient("mongodb://localhost:27017/")

In [9]:
# Connect to database
db = client['nyc']

In [10]:
# Sample document
pp.pprint(db.airbnb.find_one())

None


---
**Drop previous indexes.**

---

In [None]:
db.airbnb.drop_indexes()

---
**Create geospatial index.**

----

In [None]:
# Create geospatial index
db.airbnb.create_index([
                        ('location', '2dsphere')
                    ])
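
To confirm that the index exists, one option is to inspect the dictionary returned by `db.airbnb.index_information()`. The helper below is a hypothetical sketch: it assumes the usual PyMongo shape, where each index name maps to a dict whose `key` entry is a list of `(field, type)` pairs. It is exercised here on a hand-written sample rather than on a live server.

```python
def geo_indexed_fields(index_info):
    """Return the fields covered by a 2dsphere index.

    `index_info` is expected to look like the result of
    `collection.index_information()` (assumed shape, see lead-in).
    """
    return [
        field
        for spec in index_info.values()
        for field, kind in spec["key"]
        if kind == "2dsphere"
    ]


# Hand-written sample shaped like index_information() output (illustrative only)
sample_info = {
    "_id_": {"v": 2, "key": [("_id", 1)]},
    "location_2dsphere": {
        "v": 2,
        "key": [("location", "2dsphere")],
        "2dsphereIndexVersion": 3,
    },
}

indexed = geo_indexed_fields(sample_info)
```

With a running server, the same helper could be called as `geo_indexed_fields(db.airbnb.index_information())`.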

----
### $nearSphere operator

Suppose you want to find an accommodation close to the point at longitude -73.93414657 and latitude 40.82302903.

[$nearSphere](https://docs.mongodb.com/manual/reference/operator/query/nearSphere/#mongodb-query-op.-nearSphere) specifies a point for which a geospatial query returns the documents from nearest to farthest.
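
To build intuition for the nearest-to-farthest ordering, here is a rough pure-Python sketch that sorts a few points by great-circle (haversine) distance from the center point. This is not how MongoDB evaluates the query internally; the three listings are hypothetical, and the Earth radius is an approximate mean value chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # approximate mean Earth radius in meters (assumption)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Center point from the text; the listings are made up for this sketch.
center = (-73.93414657, 40.82302903)
listings = [
    {"name": "far", "coordinates": [-73.9857, 40.7484]},
    {"name": "near", "coordinates": [-73.9350, 40.8235]},
    {"name": "mid", "coordinates": [-73.9500, 40.8100]},
]

# Sort from nearest to farthest, mirroring the order $nearSphere returns.
ordered = sorted(listings, key=lambda d: haversine_m(*center, *d["coordinates"]))
names = [d["name"] for d in ordered]
```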

----

In [None]:
# Query using geospatial index
pp.pprint(
    db.airbnb.find_one({
                        'location': {
                                        '$nearSphere': {
                                                            '$geometry': {
                                                                            'type': 'Point',
                                                                            'coordinates': [-73.93414657, 40.82302903]
                                                                        }
                                                        }
                                                    }
                        })
)

---
**Use minimum and maximum distances to narrow down the search.**

`$minDistance` limits the results to documents that are at least the specified distance from the center point, while `$maxDistance` limits them to documents that are at most the specified distance away. With a GeoJSON point, both values are in meters.
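
The nested filter document is easy to get wrong by hand, so a small builder can help. The function below is a hypothetical helper (its name and parameters are not part of PyMongo); it only assembles the dictionary and does not talk to the server.

```python
def near_sphere_query(field, lon, lat, min_m=None, max_m=None):
    """Build a $nearSphere filter around a GeoJSON point; distances in meters."""
    if min_m is not None and max_m is not None and min_m > max_m:
        raise ValueError("min_m must not exceed max_m")
    near = {"$geometry": {"type": "Point", "coordinates": [lon, lat]}}
    # Add the distance bounds only when the caller supplies them.
    if min_m is not None:
        near["$minDistance"] = min_m
    if max_m is not None:
        near["$maxDistance"] = max_m
    return {field: {"$nearSphere": near}}


# Band between 1000 m and 1500 m around the point used in this section
band_filter = near_sphere_query("location", -73.93414657, 40.82302903, 1000, 1500)
```

The resulting dictionary could then be passed to `db.airbnb.find(band_filter)`.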

---

In [None]:
# Query using maxDistance
cur = db.airbnb.find({
                'location': {
                                '$nearSphere': {
                                                    '$geometry': {
                                                                    'type': 'Point',
                                                                    'coordinates': [-73.93414657, 40.82302903]
                                                                },
                                                    '$maxDistance': 1500
                                                }
                                            }
            })

for doc in cur:
    print(doc)

---
To exclude documents that lie too close to the center point, add `$minDistance`.

---

In [None]:
# Query minDistance and maxDistance
cur = db.airbnb.find({
                'location': {
                                '$nearSphere': {
                                                    '$geometry': {
                                                        'type': 'Point',
                                                        'coordinates': [-73.93414657, 40.82302903]
                                                    },
                                                    '$minDistance': 1000,
                                                    '$maxDistance': 1500
                                                }
                            }
            })

for doc in cur:
    pp.pprint(doc)

----
### Neighbourhood data

New York City neighbourhood boundaries data. 

Website - https://opendata.cityofnewyork.us/

----

In [None]:
# # Restore neighbourhoods data
# !mongorestore --db nyc --collection neighbourhoods /home/avadmin/Desktop/Mongo/Indexing/nyc_neighbourhoods/nyc/neighbourhoods.bson

In [None]:
# Collections
db.list_collection_names()

In [None]:
# Sample document
pp.pprint(
    db.neighbourhoods.find_one()
)

In [None]:
# Create geospatial index
db.neighbourhoods.create_index([('geometry', '2dsphere')])

---
### $geoWithin operator

Suppose you need to find the number of accommodations within a specific neighbourhood.

For this you can use [$geoWithin](https://docs.mongodb.com/manual/reference/operator/query/geoWithin/#mongodb-query-op.-geoWithin), which selects documents whose geospatial data lies entirely within a specified shape.

Find how many accommodations fall in the `Upper West Side` neighbourhood.
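
For a point, "within a shape" amounts to a point-in-polygon test. The sketch below uses the classic ray-casting technique on flat coordinates, which is only an approximation of the spherical geometry MongoDB applies. The square ring is a made-up example, not a real neighbourhood boundary, and `point_in_ring` is a hypothetical helper.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: toggle `inside` for every ring edge crossed by a ray going east."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Consider only edges that straddle the point's latitude.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Made-up closed ring (first position repeated at the end)
square = [
    [-74.00, 40.70],
    [-73.95, 40.70],
    [-73.95, 40.75],
    [-74.00, 40.75],
    [-74.00, 40.70],
]

inside_result = point_in_ring(-73.97, 40.72, square)
outside_result = point_in_ring(-73.90, 40.72, square)
```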

---

In [None]:
# Query
pp.pprint(
        db.neighbourhoods.find_one({
                                'properties.ntaname': 'Upper West Side'
                            })
)

In [None]:
# Neighbourhood
neighbourhood_loc = db.neighbourhoods.find_one({
                                                'properties.ntaname': 'Upper West Side'
                                            })['geometry']

In [None]:
# Neighbourhood geometry
neighbourhood_loc

----

Find all the documents that fall within the neighbourhood boundary in the airbnb collection. 

---

In [None]:
# Number of accommodations that fall within the neighbourhood
# (Cursor.count() was removed in PyMongo 4; count_documents replaces it)
db.airbnb.count_documents({
                'location': {
                                '$geoWithin': {
                                                '$geometry': neighbourhood_loc
                                            }
                            }
            })

In [None]:
# Documents
cur = db.airbnb.find({
                        'location': {
                                        '$geoWithin': {
                                            '$geometry': neighbourhood_loc
                                                    }
                                    }
                    },
                    {
                        'neighbourhood':1,
                        '_id':0,
                        'accom_id':1
                    })

for doc in cur:
    pp.pprint(doc)

---
### Aggregation Pipeline

We can run `$nearSphere`-style queries in an aggregation pipeline using the [$geoNear](https://docs.mongodb.com/manual/reference/operator/aggregation/geoNear/#-geonear--aggregation-) stage.

It outputs documents in order of nearest to farthest from a specified point.

**Syntax -** `{ $geoNear: { <geoNear options> } }`

The `$geoNear` stage requires a geospatial index, and it must be the first stage in the aggregation pipeline.
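
One way to keep the first-stage rule in mind is to build the pipeline through a small helper that always places `$geoNear` at the front. Both functions below are hypothetical sketches that only assemble and inspect Python lists; the option names (`near`, `distanceField`, `spherical`, `minDistance`, `maxDistance`, `query`) are the `$geoNear` options used in this section.

```python
def geo_near_pipeline(lon, lat, min_m=None, max_m=None, query=None, extra_stages=()):
    """Return a pipeline whose first stage is $geoNear around a GeoJSON point."""
    geo_near = {
        "near": {"type": "Point", "coordinates": [lon, lat]},
        "distanceField": "Distance",  # output field holding the computed distance
        "spherical": True,
    }
    if min_m is not None:
        geo_near["minDistance"] = min_m
    if max_m is not None:
        geo_near["maxDistance"] = max_m
    if query is not None:
        geo_near["query"] = query
    return [{"$geoNear": geo_near}, *extra_stages]


def geo_near_is_first(pipeline):
    """True if $geoNear does not appear anywhere after the first stage."""
    return all("$geoNear" not in stage for stage in pipeline[1:])


pipeline = geo_near_pipeline(
    -73.93414657, 40.82302903,
    min_m=1000, max_m=5000,
    query={"room_type": "Private room"},
    extra_stages=[{"$limit": 5}],
)
```

The list could then be passed to `db.airbnb.aggregate(pipeline)`.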


----


For example, find the documents in the airbnb collection that lie between 1000 and 5000 meters from `[-73.93414657, 40.82302903]`, ordered from nearest to farthest.

---

In [None]:
db.airbnb.find_one({})

In [None]:
# Aggregate pipeline
cur = db.airbnb.aggregate([
                        # geoNear
                        {
                            '$geoNear':{
                                            # Point
                                            'near': {
                                                        'type': 'Point',
                                                        'coordinates': [-73.93414657, 40.82302903]
                                                    },
                                            # Output field with calculated distance
                                            'distanceField': 'Distance',
                                            # Optional fields
                                            # Spherical geometry
                                            'spherical': True,
                                            # Maximum distance
                                            'maxDistance': 5000,
                                            # Minimum distance
                                            'minDistance': 1000, 
                                            # Query
                                            'query': {'room_type': 'Private room'},
                                            # Location of the matched document
                                            'includeLocs': 'Location'
                                        }
                        },
                        # Project
                        {
                            '$project':{
                                            '_id':0,
                                            'ID': '$accom_id',
                                            'Distance': 1,
                                            'Location': 1,
                                            'Room': '$room_type'
                                        }
                        },
                        # Limit
                        {
                            '$limit': 5
                       }
                ])

for doc in cur:
    pp.pprint(doc)

----
### Exercises

- Find the number of accommodations within 500 meters of `-73.9857, 40.7484` in the airbnb collection.

- How many accommodations in airbnb are within the neighbourhoods whose `boro_name` is `Manhattan` and `boro_code` is `1`? ***Use the neighbourhoods collection for this, along with the $geoWithin operator.*** 


----