[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/advanced_techniques/geospatialqueries_vectorsearch_spritzes.ipynb)

[![View Article](https://img.shields.io/badge/View%20Article-blue)](https://www.mongodb.com/developer/products/mongodb/geospatial-queries-vector-search/)


First, install the Google Maps and OpenAI libraries. We need the Google Maps library to call the Google Places API, and we need OpenAI to embed our documents.





In [None]:
!pip install googlemaps
!pip install openai==0.28

Collecting googlemaps
  Downloading googlemaps-4.10.0.tar.gz (33 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: googlemaps
  Building wheel for googlemaps (setup.py) ... done
  Created wheel for googlemaps: filename=googlemaps-4.10.0-py3-none-any.whl size=40716 sha256=aa9beccb8d49bbfcafd19079f1c0a481c60a44b4f47d62b4a571907d6daaede5
  Stored in directory: /root/.cache/pip/wheels/17/f8/79/999d5d37118fd35d7219ef57933eb9d09886c4c4503a800f84
Successfully built googlemaps
Installing collected packages: googlemaps
Successfully installed googlemaps-4.10.0
Collecting openai==0.28
  Downloading openai-0.28.0-py3-none-any.whl.metadata (13 kB)
Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.0


Now let's add our imports. We include the `getpass` library so we can type in our secret keys without them being displayed in the notebook.

In [None]:
import getpass

import googlemaps
import openai

Enter your secret keys. We need our Google API Key and our [OpenAI API Key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).

In [None]:
# google API Key
google_api_key = getpass.getpass(prompt="Put in Google API Key here")
map_client = googlemaps.Client(key=google_api_key)

# openAI API Key
openai_api_key = getpass.getpass(prompt="Put in OpenAI API Key here")

Put in Google API Key here··········
Put in OpenAI API Key here··········


Now, let's set ourselves up for Vector Search success. First, set your key and then establish our embedding function. For this tutorial, we are using OpenAI's "text-embedding-3-small" embedding model. We are going to be embedding the reviews of our spritz locations so we can make some judgements on where to go!

In [None]:
# set your key
openai.api_key = openai_api_key

# embedding model we are using
EMBEDDING_MODEL = "text-embedding-3-small"


# our embedding function
def get_embedding(text):
    response = openai.Embedding.create(input=text, model=EMBEDDING_MODEL)
    return response["data"][0]["embedding"]

When using Nearby Search in our Google Maps API, we set three parameters: location, radius, and keyword. For our location, we can find our starting coordinates (the very middle of West Village) by right-clicking on Google Maps and copying the coordinates to our clipboard. That is how I got the coordinates shown below.

For our radius, we have to have it in meters. Since I'm not very savvy with meters, let's write a small function to help us make that conversion.

Our keyword will just be what we're hoping to find from the Google Places API: Aperol spritzes!

We can then make our API call using the `places_nearby` method.


In [None]:
# for Google Maps API we need to use a radius in meters. Let's first change our miles to meters
def miles_to_meters(miles):
    return miles * 1609.344


middle_of_west_village = (40.73490473393682, -74.00521094160642)
search_radius = miles_to_meters(
    0.4
)  # West Village is small so just do less than half a mile.
spritz_finder = "aperol spritz"

# making the API call using our places_nearby method and our parameters
response = map_client.places_nearby(
    location=middle_of_west_village, radius=search_radius, keyword=spritz_finder
)

Before we can go ahead and print out our locations, let's think about our end goal. We want to achieve a couple of things before we insert our documents into our MongoDB Atlas cluster. We want to:
1. Get detailed information about our locations. The Nearby Search results give us each location's `place_id`, so we make a second API call per `place_id` to get the location `name`, the `formatted_address`, the `geometry`, some `reviews` (only up to 5), and the location `rating`. You can find more fields to return (if your heart desires!) from the [Nearby Search documentation](https://developers.google.com/maps/documentation/places/web-service/search-nearby).

2. Then, we want to embed our reviews for each location using our embedding function. We store the resulting vector as an array in its own field, so it lands in our cluster alongside the rest of the document. We are choosing to embed here just to make things easier for ourselves in the long run. Let's also join all five reviews together into one string, so each location needs only one embedding.

3. While we're creating our dictionary with all the important information we want to portray, we need to think about how our coordinates are set up. MongoDB Geospatial Queries require GeoJSON format. This means we need to make sure we have the proper format, or else we won't be able to use our Geospatial Queries operators later. We also need to keep in mind that the latitude and longitude are nested under `geometry` and then `location` in the Google Places API response, as `lat` and `lng`. So, we unfortunately can't just read them off directly; we need to pull them out and put them in GeoJSON's `[longitude, latitude]` order first. Here is an excerpt of an example response, copied from the documentation:
```
{
  "html_attributions": [],
  "results":
    [
      {
        "business_status": "OPERATIONAL",
        "geometry":
          {
            "location": { "lat": -33.8587323, "lng": 151.2100055 },
            "viewport":
              {
                "northeast":
                  { "lat": -33.85739847010727, "lng": 151.2112436298927 },
                "southwest":
                  { "lat": -33.86009812989271, "lng": 151.2085439701072 },
              },

```

With all this in mind, let's get to it:

In [None]:
# find information we want: use the Nearby Places documentation to figure out which fields you want
spritz_locations = []
for location in response.get("results", []):
    location_detail = map_client.place(
        place_id=location["place_id"],
        fields=["name", "formatted_address", "geometry", "reviews", "rating"],
    )

    # these are the specific details we want to be saved as fields in our documents
    details = location_detail.get("result", {})

    # we want to embed the five reviews so lets extract and join together
    location_reviews = details.get("reviews", [])
    store_reviews = [review["text"] for review in location_reviews[:5]]
    joined_reviews = " ".join(store_reviews)

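    # generate embedding on your reviews; skip places without review text,
    # since there would be nothing to embed
    if not joined_reviews:
        continue
    embedding_reviews = get_embedding(joined_reviews)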

    # MongoDB geospatial queries require GeoJSON formatting. We know that the longitude and latitude are nested inside geometry and location,
    # so let's grab them using .get and then format them how we want.
    geometry = details.get("geometry", {})
    place_location = geometry.get("location", {})

    # both are nested under location so open it up
    longitude = place_location.get("lng")
    latitude = place_location.get("lat")

    location_info = {
        "name": details.get("name"),
        "address": details.get("formatted_address"),
        # GeoJSON order is [longitude, latitude]
        "location": {"type": "Point", "coordinates": [longitude, latitude]},
        "rating": details.get("rating"),
        "reviews": store_reviews,
        "embedding": embedding_reviews,
    }
    spritz_locations.append(location_info)
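The coordinate handling in the loop above can also be isolated into a small helper. This is only a sketch: `to_geojson_point` is a hypothetical name, not part of `googlemaps` or `pymongo`, and returning `None` for missing coordinates is an assumption about how you might want to handle incomplete places.

```python
def to_geojson_point(google_location):
    """Convert a Google Places {"lat": ..., "lng": ...} dict to a GeoJSON Point.

    Returns None when either coordinate is missing, so the caller can skip
    the place instead of storing an invalid point.
    """
    lat = google_location.get("lat")
    lng = google_location.get("lng")
    if lat is None or lng is None:
        return None
    # GeoJSON order is [longitude, latitude], the reverse of Google's lat/lng
    return {"type": "Point", "coordinates": [lng, lat]}


# sample values taken from the documentation excerpt shown earlier
sample = {"lat": -33.8587323, "lng": 151.2100055}
point = to_geojson_point(sample)
print(point)
```

Keeping the swap in one place makes it harder to accidentally store `[latitude, longitude]`, which is a valid-looking point in the wrong part of the world.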

Let's print out our output and see what our spritz locations in the West Village neighborhood of NYC are! Let's also check and make sure we can see an embedding field.

In [None]:
# Print our spritz information
for location in spritz_locations:
    print(
        f"Name: {location['name']}, Address: {location['address']}, Coordinates: {location['location']}, Rating: {location['rating']}, Reviews: {location['reviews']}, Embedding: {location['embedding']}"
    )

Name: Bar Pisellino, Address: 52 Grove Street, 7th Ave S at, New York, NY 10014, USA, Coordinates: {'type': 'Point', 'coordinates': [-74.0034603, 40.7329348]}, Rating: 4.3, Reviews: ['this place gets sooo busy in the summer even on a weeknight so come knowing there will likely be a line or wait time. wait was longer than expected and the drinks are definitely pricey too on avg of 18-20 per drink. the aperol spritz was great and i love the ambience and location of this bar but indoor seating is limited. overall a cute bar to come with friends on a nice warm day and enjoy a drink or two. wish they had happy hours!', 'Came for a martini on the 4th of July. Was pretty busy but grateful they were open! I believe our server was Benji who was doing a good job! Was the only server tending to the outside area so waited a little bit but not bad at all consider it was a holiday. Just had a martini with vodka and it was delicious. Came served with a little side car!', 'Love the interior and atmosp

Now that we have our documents formatted the way we want them to be, let's insert everything into MongoDB Atlas using the `pymongo` library.

First, let's install `pymongo`.

In [None]:
# install pymongo
!pip install pymongo

Collecting pymongo
  Downloading pymongo-4.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.6.1-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.6.1 pymongo-4.8.0


Now, let's set up our MongoDB connection. To do this, please make sure you have your connection string. If you need help finding it, please refer to the MongoDB Atlas documentation.

Keep in mind that you can name your database and collection anything you like! I am naming my database "spritz_summer" and my collection "spritz_locations_WV". Run the code block below to insert your documents into your cluster.

In [None]:
from pymongo import MongoClient

# set up your MongoDB connection
connection_string = getpass.getpass(
    prompt="Enter connection string WITH USER + PASS here"
)
client = MongoClient(
    connection_string, appname="devrel.showcase.geospatial_vector_search"
)

# name your database and collection anything you want since it will be created when you enter your data
database = client["spritz_summer"]
collection = database["spritz_locations_WV"]

# insert our spritz locations
collection.insert_many(spritz_locations)

Enter connection string WITH USER + PASS here··········


InsertManyResult([ObjectId('66bcd2b05f6a590b646ef361'), ObjectId('66bcd2b05f6a590b646ef362'), ObjectId('66bcd2b05f6a590b646ef363'), ObjectId('66bcd2b05f6a590b646ef364'), ObjectId('66bcd2b05f6a590b646ef365'), ObjectId('66bcd2b05f6a590b646ef366'), ObjectId('66bcd2b05f6a590b646ef367'), ObjectId('66bcd2b05f6a590b646ef368'), ObjectId('66bcd2b05f6a590b646ef369'), ObjectId('66bcd2b05f6a590b646ef36a'), ObjectId('66bcd2b05f6a590b646ef36b'), ObjectId('66bcd2b05f6a590b646ef36c'), ObjectId('66bcd2b05f6a590b646ef36d'), ObjectId('66bcd2b05f6a590b646ef36e'), ObjectId('66bcd2b05f6a590b646ef36f'), ObjectId('66bcd2b05f6a590b646ef370'), ObjectId('66bcd2b05f6a590b646ef371'), ObjectId('66bcd2b05f6a590b646ef372'), ObjectId('66bcd2b05f6a590b646ef373'), ObjectId('66bcd2b05f6a590b646ef374')], acknowledged=True)

Perfect! Go ahead and check back in MongoDB Atlas in your cluster and make sure that everything looks the way we want it to look before we proceed. Please double-check that your embedding field is there and that it's an array of 1536 numbers, which is the vector size we will declare in our Vector Search index.
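If you prefer to check the embedding length in code rather than by eye, a minimal sketch could look like the following. `find_bad_embeddings` is a hypothetical helper, and the demo documents are made up purely to show the behaviour.

```python
EXPECTED_DIMENSIONS = 1536  # vector length this tutorial expects


def find_bad_embeddings(documents, expected=EXPECTED_DIMENSIONS):
    """Return the names of documents whose embedding is missing or the wrong length."""
    bad = []
    for doc in documents:
        embedding = doc.get("embedding")
        if not isinstance(embedding, list) or len(embedding) != expected:
            bad.append(doc.get("name"))
    return bad


# tiny made-up documents, only to show the behaviour
demo_docs = [
    {"name": "ok place", "embedding": [0.0] * 1536},
    {"name": "short vector", "embedding": [0.0] * 3},
    {"name": "no vector"},
]
print(find_bad_embeddings(demo_docs))
```

In the notebook, the same function could be called on `spritz_locations`; an empty list back means every document has a vector of the expected size.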

## Which one comes first, Vector Search or Geospatial Queries?
Both `$vectorSearch` and `$geoNear` need to be the first stage in their aggregation pipelines, so they can't share one pipeline. Instead, we will use a little workaround: two pipelines, where the results of the first narrow down the second. But how will we decide which goes first?!

When I'm using Google Maps to figure out where to go, I normally first search for what I'm looking for and then I see how far away it is from where I currently am and pick the closest location to me. So let's keep that mindset in place and start off with MongoDB Atlas Vector Search for this tutorial. But I understand that some of you might intuitively prefer to search all nearby locations first and then use Vector Search, so I'll highlight that method of searching for your spritzes as well.

## MongoDB Atlas Vector Search
We have a couple of steps here. Our first step is to create a Vector Search Index. Do this inside of MongoDB Atlas by following the Atlas Vector Search documentation, using the definition below. The pipelines in this tutorial refer to the index by the name `vector_index`. Please keep in mind that your index is NOT run in your script; it lives in your cluster. You'll know it's ready to go when the button turns green and it's activated.

In [None]:
# create a Vector Search Index so we can use it
{
    "fields": [
        {
            "numDimensions": 1536,
            "path": "embedding",
            "similarity": "cosine",
            "type": "vector",
        }
    ]
}
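The definition above is written as a Python literal, trailing comma included, while a JSON editor expects strict JSON. As a small sketch, the standard library's `json.dumps` can produce a version that is safe to paste; the variable names here are only illustrative.

```python
import json

# same definition as the cell above, written as a Python dict
vector_index_definition = {
    "fields": [
        {
            "numDimensions": 1536,
            "path": "embedding",
            "similarity": "cosine",
            "type": "vector",
        }
    ]
}

# json.dumps emits strict JSON (double quotes, no trailing commas)
index_json = json.dumps(vector_index_definition, indent=2)
print(index_json)
```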

Once it's activated, let's get to Vector Searching!

So, let's say I just finished dinner with my besties at our favorite restaurant in the West Village, Balaboosta. The food was great, it's a warm summer day, and we're in the mood for post-dinner spritzes outside. We would also prefer to be seated quickly. Let's see if we can find a spot!

Our first step in building out our pipeline is to embed our query. We can't compare text to vectors; we have to compare vectors to vectors. This takes only a couple of lines, since we use the same embedding model that we embedded our reviews with:

In [None]:
# You have to embed your queries just the same way you embedded your documents.
# my query
query_description = "outdoor seating quick service"


# we need to embed the query as well, since our documents are embedded
query_vector = get_embedding(query_description)
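To make "comparing vectors to vectors" concrete, here is a minimal sketch of cosine similarity, the measure named by `"similarity": "cosine"` in the index definition. Atlas Vector Search does the comparison for us inside the cluster; the function and the toy 3-dimensional vectors below are illustrative assumptions, not what Atlas runs internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# toy 3-dimensional vectors standing in for real 1536-dimensional embeddings
query = [1.0, 0.0, 1.0]
review_close = [0.9, 0.1, 0.8]  # points in nearly the same direction as the query
review_far = [0.0, 1.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query, review_close) > cosine_similarity(query, review_far))
```

The review whose vector points in nearly the same direction as the query scores higher, which is the intuition behind ranking reviews against "outdoor seating quick service".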

Now, let's build out our aggregation pipeline. Since we are going to run a `$geoNear` pipeline next, we want to keep the `_id`s found by this search:

In [None]:
spritz_near_me_vector = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 15,
            "limit": 5,
        }
    },
    {
        "$project": {
            "_id": 1,  # we want to keep this in place so we can search again using GeoNear
            "name": 1,
            "rating": 1,
            "reviews": 1,
            # "address": 1,
            # "location": 1,
            # "embedding": 1
        }
    },
]
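Because this tutorial builds a `$vectorSearch` pipeline more than once, it can help to wrap the construction in a function. The sketch below is one possible shape: `build_vector_search_pipeline` and its parameter names are hypothetical, and the optional `_id` filter only works if `_id` is indexed as a filter field in the Vector Search index. The check reflects the rule that `numCandidates` cannot be smaller than `limit`.

```python
def build_vector_search_pipeline(
    query_vector, limit=5, num_candidates=15, allowed_ids=None, index_name="vector_index"
):
    """Build a $vectorSearch + $project pipeline like the one in this tutorial."""
    if num_candidates < limit:
        raise ValueError("numCandidates must be greater than or equal to limit")

    stage = {
        "index": index_name,
        "path": "embedding",
        "queryVector": query_vector,
        "numCandidates": num_candidates,
        "limit": limit,
    }
    if allowed_ids is not None:
        # requires "_id" to be indexed as a filter field in the vector index
        stage["filter"] = {"_id": {"$in": list(allowed_ids)}}

    return [
        {"$vectorSearch": stage},
        {"$project": {"_id": 1, "name": 1, "rating": 1, "reviews": 1}},
    ]


# a short made-up vector, only to show the resulting structure
demo_pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], limit=3)
print(demo_pipeline[0]["$vectorSearch"]["limit"])
```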

Let's print out our results and see what we get for our query of "outdoor seating quick service":

In [None]:
spritz_near_me_vector_results = list(collection.aggregate(spritz_near_me_vector))
for result in spritz_near_me_vector_results:
    print(result)

{'_id': ObjectId('66bcd2b05f6a590b646ef369'), 'name': 'While We Were Young Kitchen & Cocktails', 'rating': 4.3, 'reviews': ["We went here for my birthday! It's a small space but the light lavender paint brightens and makes it feel wider than it is. The truffle fries and kale salad are to DIE for. If you're a rabbit like me you'll love it. They are really good at fries too. While everything may be slightly pricey the protons are totally worth it. I wish me and my boyfriend actually ordered to share everything lol. We just wish his burger was cooked a little bit better. The drinks were well made if i had another i wouldn't have made it home lol.", '3.5 / 5.0 - I came here for weekend brunch with a friend and we had a nice time catching up.\n\nWhile We Were Young is such a cute place, with a stylish decor. The dining area is quite small inside, but because of the large windows and layout of the tables, it didn’t feel claustrophobic at all.\n\nThe service was great. The staff was welcoming

We have five fantastic options! Let's go ahead and save the IDs from our above pipeline in a simple line:

In [None]:
# now, we want to take the _ids from our above pipeline so we can use it to geo search
spritz_near_me_ids = [result["_id"] for result in spritz_near_me_vector_results]
print(spritz_near_me_ids)

[ObjectId('66bcd2b05f6a590b646ef369'), ObjectId('66bcd2b05f6a590b646ef361'), ObjectId('66bcd2b05f6a590b646ef373'), ObjectId('66bcd2b05f6a590b646ef366'), ObjectId('66bcd2b05f6a590b646ef374')]


Now that they're saved, we can build out our `$geoNear` pipeline and see which of these options is closest to our starting location, Balaboosta, so we can walk over to it!

To figure out the coordinates of Balaboosta, I right-clicked on Google Maps and copied the coordinates. Google Maps gives them as latitude, longitude, while GeoJSON expects longitude first, so I swapped them to get the proper order.

In [None]:
# https://www.mongodb.com/docs/manual/geospatial-queries/

# create a 2dsphere index on the location field of our spritz_locations_WV collection
collection.create_index({"location": "2dsphere"})

# use the $geoNear operator to return documents that are at least 100 meters and at most 1000 meters from our specified GeoJSON point.
spritz_near_me_geo = [
    {
        "$geoNear": {
            "near": {
                "type": "Point",
                "coordinates": [-74.0059456749148, 40.73781277366724],
            },
            # here we are saying that we only want to use the sample size from above
            "query": {"_id": {"$in": spritz_near_me_ids}},
            "minDistance": 100,
            "maxDistance": 1000,
            "spherical": True,
            "distanceField": "dist.calculated",
        }
    },
    {
        "$project": {
            "_id": 0,
            "name": 1,
            "address": 1,
            "rating": 1,
            "dist.calculated": 1,
            # "location": 1,
            # "embedding": 1
        }
    },
    # $geoNear already returns documents nearest-first; sort explicitly, then keep the closest three
    {"$sort": {"dist.calculated": 1}},
    {"$limit": 3},
]

spritz_near_me_geo_results = collection.aggregate(spritz_near_me_geo)
for result in spritz_near_me_geo_results:
    print(result)

{'name': 'Pastis', 'address': '52 Gansevoort St, New York, NY 10014, USA', 'rating': 4.5, 'dist': {'calculated': 182.82575242382333}}
{'name': 'While We Were Young Kitchen & Cocktails', 'address': '183 W 10th St, New York, NY 10014, USA', 'rating': 4.3, 'dist': {'calculated': 468.19791207065526}}
{'name': 'Bar Pisellino', 'address': '52 Grove Street, 7th Ave S at, New York, NY 10014, USA', 'rating': 4.3, 'dist': {'calculated': 582.0735279994905}}


Seems like the restaurant we're heading over to is Pastis since it's the closest and fits our criteria perfectly.
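As a sanity check on the distances in meters, we can compute a great-circle distance ourselves with the haversine formula. This is a sketch under assumptions: `haversine_meters` is a hypothetical helper and the Earth radius is a commonly used mean value, so the result should land close to, but not exactly on, the `dist.calculated` value of about 582 meters that `$geoNear` reported for Bar Pisellino.

```python
import math

# mean Earth radius in meters; an assumption, MongoDB may use a slightly different value
EARTH_RADIUS_M = 6_371_008.8


def haversine_meters(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# coordinates in GeoJSON order (longitude, latitude), copied from this notebook
balaboosta = (-74.0059456749148, 40.73781277366724)
bar_pisellino = (-74.0034603, 40.7329348)

distance = haversine_meters(*balaboosta, *bar_pisellino)
print(round(distance))
```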

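## Other way around! Geospatial Queries first, then Vector Search

If you would rather start with everything nearby, we can flip the order. This time we run `$geoNear` first, save the `_id`s and distances of every location within range, and then pass those `_id`s to `$vectorSearch` as a filter. For the filter to work, `_id` has to be added as a `filter` field in the Vector Search index in Atlas.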

In [None]:
# create a 2dsphere index on our location field
collection.create_index({"location": "2dsphere"})

# our $geoNear pipeline
spritz_near_me_geo = [
    {
        "$geoNear": {
            "near": {
                "type": "Point",
                "coordinates": [-74.0059456749148, 40.73781277366724],
            },
            "minDistance": 100,
            "maxDistance": 1000,
            "spherical": True,
            "distanceField": "dist.calculated",
        }
    },
    {"$project": {"_id": 1, "dist.calculated": 1}},
]

# list of IDs and distances so we can use them as our sample size
places_ids = list(collection.aggregate(spritz_near_me_geo))
distances = {
    result["_id"]: result["dist"]["calculated"] for result in places_ids
}  # have to create a new dictionary to keep our distances
spritz_near_me_ids = [result["_id"] for result in places_ids]
# print(spritz_near_me_ids)

# our vector search index definition inside of MongoDB Atlas, now with "_id" added as a filter field
vector_search_index = {
    "fields": [
        {
            "numDimensions": 1536,
            "path": "embedding",
            "similarity": "cosine",
            "type": "vector",
        },
        {"type": "filter", "path": "_id"},
    ]
}

# vector search pipeline
spritz_near_me_vector = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": query_vector,
            "numCandidates": 15,
            "limit": 3,
            "filter": {"_id": {"$in": spritz_near_me_ids}},
        }
    },
    {
        "$project": {
            "_id": 1,  # we want to keep this in place
            "name": 1,
            "rating": 1,
            "dist.calculated": 1,
            # "reviews": 1
            # "address": 1,
            # "location": 1,
            # "embedding": 1
        }
    },
]


spritz_near_me_vector_results = collection.aggregate(spritz_near_me_vector)
for result in spritz_near_me_vector_results:
    result["dist.calculated"] = distances.get(result["_id"])
    print(result)

{'_id': ObjectId('66bcd2b05f6a590b646ef369'), 'name': 'While We Were Young Kitchen & Cocktails', 'rating': 4.3, 'dist.calculated': 468.19791207065526}
{'_id': ObjectId('66bcd2b05f6a590b646ef361'), 'name': 'Bar Pisellino', 'rating': 4.3, 'dist.calculated': 582.0735279994905}
{'_id': ObjectId('66bcd2b05f6a590b646ef373'), 'name': 'Pastis', 'rating': 4.5, 'dist.calculated': 182.82575242382333}
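The results above come back in Vector Search order, not distance order. If you want the closest match first, one option is to attach the saved distances and sort in Python. This is a sketch: `attach_distances` and the `distance_m` field name are hypothetical, and the demo uses stand-in `_id`s with the names and distances printed above.

```python
def attach_distances(vector_results, distances):
    """Add each document's $geoNear distance and order the list nearest-first."""
    merged = []
    for doc in vector_results:
        enriched = dict(doc)  # copy so the original result is left untouched
        enriched["distance_m"] = distances.get(doc["_id"])
        merged.append(enriched)
    # documents without a known distance sort to the end
    return sorted(
        merged,
        key=lambda d: (d["distance_m"] is None, d["distance_m"] or 0.0),
    )


# stand-in _ids ("a", "b", "c") with the names and distances printed above
demo_results = [
    {"_id": "a", "name": "While We Were Young Kitchen & Cocktails"},
    {"_id": "b", "name": "Bar Pisellino"},
    {"_id": "c", "name": "Pastis"},
]
demo_distances = {
    "a": 468.19791207065526,
    "b": 582.0735279994905,
    "c": 182.82575242382333,
}

for place in attach_distances(demo_results, demo_distances):
    print(place["name"], round(place["distance_m"]))
```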
