# Install and Connect

To get started, we'll need to connect to our Elastic deployment using the Python client.

In [21]:
!pip install -qU elasticsearch requests openai python-dotenv langchain-openai

Next, we import the modules we need.

In [2]:
from dotenv import load_dotenv
import os
from elasticsearch import Elasticsearch, helpers, exceptions
from elasticsearch.helpers import BulkIndexError
import time
import json as JSON

Now we can instantiate the Python Elasticsearch client. We load the connection settings from the environment and create a `client` object, which is an instance of the `Elasticsearch` class.

In [44]:
load_dotenv()

ES_USER = os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")

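# Pass the credentials via basic_auth rather than embedding them in the URL:
# characters such as "@" or "/" in a password would otherwise need URL-encoding,
# and printing the URL would leak the password into the notebook output.
url = f"https://{ES_ENDPOINT}:9200"

client = Elasticsearch(
    url, basic_auth=(ES_USER, ES_PASSWORD), ca_certs="./http_ca.crt", verify_certs=True
)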


In [4]:
print(client.info())

{'name': 'liuxgm.local', 'cluster_name': 'elasticsearch', 'cluster_uuid': 'JXoZ_Xu-QnasteO4AWnVvQ', 'version': {'number': '8.13.4', 'build_flavor': 'default', 'build_type': 'tar', 'build_hash': 'da95df118650b55a500dcc181889ac35c6d8da7c', 'build_date': '2024-05-06T22:04:45.107454559Z', 'build_snapshot': False, 'lucene_version': '9.10.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}


Refer to https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new to learn how to connect to a self-managed deployment.

Read https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html to learn how to connect using API keys.
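
As a rough sketch of how the two authentication styles could be selected from the environment, the helper below builds the keyword arguments for `Elasticsearch(...)`. The function name `build_client_kwargs` and the `ES_API_KEY` and `ES_CA_CERTS` variables are assumptions made for this illustration; only `ES_USER`, `ES_PASSWORD` and `ES_ENDPOINT` appear in this notebook.

```python
import os


def build_client_kwargs(env=os.environ):
    """Return keyword arguments for Elasticsearch(...) from environment-style settings."""
    endpoint = env.get("ES_ENDPOINT")
    if not endpoint:
        raise ValueError("ES_ENDPOINT is not set")

    kwargs = {
        "hosts": f"https://{endpoint}:9200",
        # ES_CA_CERTS is a hypothetical variable; the notebook hard-codes this path.
        "ca_certs": env.get("ES_CA_CERTS", "./http_ca.crt"),
        "verify_certs": True,
    }

    # ES_API_KEY is a hypothetical variable: prefer an API key when one is set.
    api_key = env.get("ES_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
        return kwargs

    user, password = env.get("ES_USER"), env.get("ES_PASSWORD")
    if not (user and password):
        raise ValueError("Set ES_API_KEY, or both ES_USER and ES_PASSWORD")
    kwargs["basic_auth"] = (user, password)
    return kwargs


# In the notebook this could be used as:
# client = Elasticsearch(**build_client_kwargs())
```

Keeping the settings in a plain dictionary makes it easy to check the configuration before any network connection is attempted.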

# Download and Deploy ELSER Model

In this example, we download and deploy the ELSER model on our ML node. Make sure your deployment has an ML node, because the ELSER model cannot run without one.

In [5]:
# delete model if already downloaded and deployed
try:
    client.ml.delete_trained_model(model_id=".elser_model_2", force=True)
    print("Model deleted successfully, We will proceed with creating one")
except exceptions.NotFoundError:
    print("Model doesn't exist, but We will proceed with creating one")

# Creates the ELSER model configuration. Automatically downloads the model if it doesn't exist.
client.ml.put_trained_model(
    model_id=".elser_model_2", input={"field_names": ["text_field"]}
)

Model deleted successfully, We will proceed with creating one


ObjectApiResponse({'model_id': '.elser_model_2', 'model_type': 'pytorch', 'model_package': {'packaged_model_id': 'elser_model_2', 'model_repository': 'https://ml-models.elastic.co', 'minimum_version': '11.0.0', 'size': 438123914, 'sha256': '2e0450a1c598221a919917cbb05d8672aed6c613c028008fedcd696462c81af0', 'metadata': {}, 'tags': [], 'vocabulary_file': 'elser_model_2.vocab.json'}, 'created_by': 'api_user', 'version': '12.0.0', 'create_time': 1717137482538, 'model_size_bytes': 0, 'estimated_operations': 0, 'license_level': 'platinum', 'description': 'Elastic Learned Sparse EncodeR v2', 'tags': ['elastic'], 'metadata': {}, 'input': {'field_names': ['text_field']}, 'inference_config': {'text_expansion': {'vocabulary': {'index': '.ml-inference-native-000002'}, 'tokenization': {'bert': {'do_lower_case': True, 'with_special_tokens': True, 'max_sequence_length': 512, 'truncate': 'first', 'span': -1}}}}, 'location': {'index': {'name': '.ml-inference-native-000002'}}})

The above command starts the download of the ELSER model, which takes a few minutes to complete. The following loop polls the status every five seconds until the model is fully defined.

In [6]:
while True:
    status = client.ml.get_trained_models(
        model_id=".elser_model_2", include="definition_status"
    )

    if status["trained_model_configs"][0]["fully_defined"]:
        print("ELSER Model is downloaded and ready to be deployed.")
        break
    else:
        print("ELSER Model is downloaded but not ready to be deployed.")
    time.sleep(5)

ELSER Model is downloaded but not ready to be deployed.
ELSER Model is downloaded but not ready to be deployed.
ELSER Model is downloaded but not ready to be deployed.
...

Once the model is downloaded, we can deploy it on our ML node. The following cell starts the deployment and polls until it reaches the `started` state.

In [7]:
# Start trained model deployment if not already deployed
client.ml.start_trained_model_deployment(
    model_id=".elser_model_2", number_of_allocations=1, wait_for="starting"
)

while True:
    status = client.ml.get_trained_models_stats(
        model_id=".elser_model_2",
    )
    if status["trained_model_stats"][0]["deployment_stats"]["state"] == "started":
        print("ELSER Model has been successfully deployed.")
        break
    else:
        print("ELSER Model is currently being deployed.")
    time.sleep(5)

ELSER Model is currently being deployed.
ELSER Model has been successfully deployed.
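
Both polling loops above run forever if the download or the deployment never finishes. A minimal sketch of a polling helper with a timeout is shown below; `wait_until` and its parameters are names invented for this example and are not part of the Elasticsearch client.

```python
import time


def wait_until(check, timeout_s=600.0, interval_s=5.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True.

    Raises TimeoutError if the condition is still false after timeout_s seconds.
    The sleep and clock arguments exist only to make the helper easy to test.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} seconds")
        sleep(interval_s)


# Hypothetical usage with the deployment check from the cell above:
# def model_is_started():
#     stats = client.ml.get_trained_models_stats(model_id=".elser_model_2")
#     return stats["trained_model_stats"][0]["deployment_stats"]["state"] == "started"
#
# wait_until(model_is_started, timeout_s=900)
```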


### Indexing Documents with ELSER
To use ELSER at index time, we need an ingest pipeline that contains inference processors running the ELSER model. The pipeline below first strips HTML from the `name`, `description`, `amenities` and `host_about` fields and then writes an ELSER embedding for each of them. Let's add that pipeline using the [put_pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/master/put-pipeline-api.html) method.

In [8]:
client.ingest.put_pipeline(
    id="elser-ingest-pipeline",
    description="Ingest pipeline for ELSER",
    processors=[
        {"html_strip": {"field": "name", "ignore_failure": True}},
        {"html_strip": {"field": "description", "ignore_failure": True}},
        {"html_strip": {"field": "amenities", "ignore_failure": True}},
        {"html_strip": {"field": "host_about", "ignore_failure": True}},
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {"input_field": "name", "output_field": "name_embedding"}
                ],
                "ignore_failure": True,
            }
        },
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {
                        "input_field": "description",
                        "output_field": "description_embedding",
                    }
                ],
                "ignore_failure": True,
            }
        },
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {"input_field": "amenities", "output_field": "amenities_embedding"}
                ],
                "ignore_failure": True,
            }
        },
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {
                        "input_field": "host_about",
                        "output_field": "host_about_embedding",
                    }
                ],
                "ignore_failure": True,
            }
        },
    ],
)

ObjectApiResponse({'acknowledged': True})
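
The processor list above repeats the same two blocks for every field. As a sketch, the same list could be generated from the field names; `build_elser_processors` is a helper name made up for this example.

```python
def build_elser_processors(fields, model_id=".elser_model_2"):
    """Build html_strip processors followed by one ELSER inference processor per field."""
    # First strip HTML from every field, as in the pipeline above.
    processors = [
        {"html_strip": {"field": field, "ignore_failure": True}} for field in fields
    ]
    # Then add one inference processor per field, writing to "<field>_embedding".
    processors += [
        {
            "inference": {
                "model_id": model_id,
                "input_output": [
                    {"input_field": field, "output_field": f"{field}_embedding"}
                ],
                "ignore_failure": True,
            }
        }
        for field in fields
    ]
    return processors


processors = build_elser_processors(["name", "description", "amenities", "host_about"])
print(len(processors), "processors")
```

The resulting list can be passed as the `processors` argument of `client.ingest.put_pipeline`.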

### Preparing the AirBnB Listings

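Next up we need to prepare the index. We will map every string field as `keyword` unless otherwise specified. The four embedding fields that the ELSER pipeline produces (for `name`, `description`, `amenities` and `host_about`) are mapped as `sparse_vector`, and `location` is mapped as `geo_point`.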

In [9]:
client.indices.delete(index="airbnb-listings", ignore_unavailable=True)
client.indices.create(
    index="airbnb-listings",
    settings={"index": {"default_pipeline": "elser-ingest-pipeline"}},
    mappings={
        "dynamic_templates": [
            {
                "stringsaskeywords": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
        "properties": {
            "host_about_embedding": {"type": "sparse_vector"},
            "amenities_embedding": {"type": "sparse_vector"},
            "description_embedding": {"type": "sparse_vector"},
            "name_embedding": {"type": "sparse_vector"},
            "location": {"type": "geo_point"},
        },
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'airbnb-listings'})

### Downloading the AirBnB data

Next up we will download the AirBnB listings CSV and upload it to Elasticsearch. This can take a couple of minutes! The listings file is roughly 80 MB of CSV when expanded and contains roughly 40,000 documents. The download and unpack steps are commented out in the cell below; uncomment them if `listings.csv` is not already present. The code also stops after the first 5,000 documents.

In [10]:
import requests
import gzip
import shutil
import csv

# Download the CSV file
# url = "https://data.insideairbnb.com/united-states/ny/new-york-city/2024-03-07/data/listings.csv.gz"
# response = requests.get(url, stream=True)

# Save the downloaded file
#with open("listings.csv.gz", "wb") as file:
#    shutil.copyfileobj(response.raw, file)

# Unpack the CSV file
#with gzip.open("./listings.csv.gz", "rb") as file_in:
#    with open("listings.csv", "wb") as file_out:
#        shutil.copyfileobj(file_in, file_out)

def remove_empty_fields(data):
    empty_fields = []
    # Iterate over the dictionary items
    for key, value in data.items():
        # Check if the value is empty (None, empty string, empty list, etc.)
        if not value:
            empty_fields.append(key)
    # Remove empty fields from the dictionary
    for key in empty_fields:
        del data[key]
    return data


def prepare_documents():
    with open("./listings.csv", "r", encoding="utf-8") as file:
        reader = csv.DictReader(file, delimiter=",")
        # Only index the first 5,000 listings.
        limit = 5000
        for index, row in enumerate(reader):
            if index >= limit:
                break
            if index % 250 == 0:
                print(f"Processing document {index}")
            row["location"] = {
                "lat": float(row["latitude"]),
                "lon": float(row["longitude"]),
            }
            row = remove_empty_fields(row)
            yield {
                "_index": "airbnb-listings",
                "_source": dict(row),
            }

# Note: A bigger chunk_size might cause "connection timeout error"
helpers.bulk(client, prepare_documents(), chunk_size=10)

Processing document 0
Processing document 250
Processing document 500
Processing document 750
Processing document 1000
Processing document 1250
Processing document 1500
Processing document 1750
Processing document 2000
Processing document 2250
Processing document 2500
Processing document 2750
Processing document 3000
Processing document 3250
Processing document 3500
Processing document 3750
Processing document 4000
Processing document 4250
Processing document 4500
Processing document 4750


(5000, [])
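
The `float(row["latitude"])` conversion above raises a `ValueError` if a row has an empty or malformed coordinate. A defensive variant could look like the sketch below; `to_geo_point` is a hypothetical helper, and returning `None` for bad rows is a design choice made for this example (the caller would then skip the document or leave out the `location` field).

```python
def to_geo_point(lat, lon):
    """Return {"lat": ..., "lon": ...}, or None if the values are missing or invalid."""
    try:
        lat_f, lon_f = float(lat), float(lon)
    except (TypeError, ValueError):
        # Empty strings, None and non-numeric text end up here.
        return None
    # Reject values outside the valid coordinate ranges (this also rejects NaN).
    if not (-90.0 <= lat_f <= 90.0 and -180.0 <= lon_f <= 180.0):
        return None
    return {"lat": lat_f, "lon": lon_f}


print(to_geo_point("40.75", "-73.98"))
print(to_geo_point("", "-73.98"))
```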

### Prepare the MTA subway stations index

We need to prepare the index and make sure that the `location` field is mapped as a `geo_point`, so that Elasticsearch treats it as a geo location rather than as plain numbers or strings.

In [11]:
client.indices.delete(index="mta-stations", ignore_unavailable=True)
client.indices.create(
    index="mta-stations",
    mappings={
        "dynamic_templates": [
            {
                "stringsaskeywords": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
        "properties": {"location": {"type": "geo_point"}},
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'mta-stations'})

### Index the MTA data

We now download the MTA station list and index it, building the `location` field from the `GTFS Latitude` and `GTFS Longitude` columns.

In [12]:
import csv

# Download the CSV file
url = "https://data.ny.gov/api/views/39hk-dx4f/rows.csv?accessType=DOWNLOAD"
response = requests.get(url)


# Parse and index the CSV data
def prepare_documents():
    reader = csv.DictReader(response.text.splitlines())
    for row in reader:
        row["location"] = {
            "lat": float(row["GTFS Latitude"]),
            "lon": float(row["GTFS Longitude"]),
        }
        yield {
            "_index": "mta-stations",
            "_source": dict(row),
        }


# Index the documents
helpers.bulk(client, prepare_documents())

(496, [])

### Prepare points of interest

Same as before. We want to index the points of interest and use ELSER so that semantic searches work. For example, searching for `sights with gardens` should return `Central Park` even though its name does not contain `garden`.

In [13]:
client.indices.delete(index="points-of-interest", ignore_unavailable=True)
client.indices.create(
    index="points-of-interest",
    settings={"index": {"default_pipeline": "elser-ingest-pipeline"}},
    mappings={
        "dynamic_templates": [
            {
                "stringsaskeywords": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
        "properties": {
            "NAME": {"type": "text"},
            "location": {"type": "geo_point"},
            "name_embedding": {"type": "sparse_vector"},
        },
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'points-of-interest'})
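
All three indices in this notebook share the same "strings as keywords" dynamic template and differ only in their explicit properties. A small sketch of a helper that captures this pattern is shown below; `keyword_default_mappings` is a name invented for this example.

```python
def keyword_default_mappings(properties):
    """Return index mappings that default strings to keyword, plus explicit properties."""
    return {
        "dynamic_templates": [
            {
                "stringsaskeywords": {
                    "match": "*",
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            }
        ],
        # Copy the dict so that later changes by the caller do not leak in.
        "properties": dict(properties),
    }


poi_mappings = keyword_default_mappings(
    {
        "NAME": {"type": "text"},
        "location": {"type": "geo_point"},
        "name_embedding": {"type": "sparse_vector"},
    }
)
print(sorted(poi_mappings["properties"]))
```

The returned dictionary can be passed as the `mappings` argument of `client.indices.create`.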

### Download points of interest

The `the_geom` column looks like this: `POINT (-74.00701717096757 40.724634757833414)`. This is the Well-Known Text (WKT) point format, with the longitude first and the latitude second, and the Elasticsearch `geo_point` type accepts it directly. I personally prefer to store coordinates as an object with explicit `lat` and `lon` keys, so that the order cannot be confused.

In [14]:
import csv

# Download the CSV file
url = "https://data.cityofnewyork.us/api/views/t95h-5fsr/rows.csv?accessType=DOWNLOAD"
response = requests.get(url)


# Parse and index the CSV data
def prepare_documents():
    reader = csv.DictReader(response.text.splitlines())
    for row in reader:
        row["location"] = {
            "lat": float(row["the_geom"].split(" ")[2].replace(")", "")),
            "lon": float(row["the_geom"].split(" ")[1].replace("(", "")),
        }
        row["name"] = row["NAME"].lower()
        yield {
            "_index": "points-of-interest",
            "_source": dict(row),
        }


# Index the documents
helpers.bulk(client, prepare_documents(),chunk_size=10)

(20567, [])
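
The split-based parsing in the cell above works for this dataset but fails silently on unexpected input. As an alternative sketch, a regular expression can validate the WKT point and make the longitude-first order explicit. `parse_wkt_point` is a hypothetical helper; it only handles plain decimal numbers, not exponent notation or 3D points.

```python
import re

# Matches "POINT (<lon> <lat>)" with optional whitespace and signed decimal numbers.
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt):
    """Convert a WKT point string into a {"lat": ..., "lon": ...} object."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    # WKT stores the x coordinate (longitude) first, then y (latitude).
    lon, lat = float(match.group(1)), float(match.group(2))
    return {"lat": lat, "lon": lon}


print(parse_wkt_point("POINT (-74.00701717096757 40.724634757833414)"))
```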

### Now we have everything prepped

First, let's see how well ELSER does with a "geo" query. Let's ask it for an AirBnB next to Central Park and the Empire State Building. For now we only search the description, not the name or the host's about text. Let's keep it simple.

In [15]:
response = client.search(
    index="airbnb-*",
    size=10,
    query={
        "text_expansion": {
            "description_embedding": {
                "model_id": ".elser_model_2",
                "model_text": "Next to Central Park and Empire State Building",
            }
        }
    },
)

for hit in response["hits"]["hits"]:
    doc_id = hit["_id"]
    score = hit["_score"]
    name = hit["_source"]["name"]
    location = hit["_source"]["location"]
    print(
        f"Score: {score}\nTitle: {name}\nLocation: {location}\nDocument ID: {doc_id}\n"
    )

Score: 18.937181
Title: Midtown Gem!
Location: {'lon': -73.9809504105621, 'lat': 40.75069179184811}
Document ID: BVlozY8BHFTyMPLT4Ish

Score: 18.497099
Title: 2-Bedroom Apartment in Manhattan
Location: {'lon': -73.98831, 'lat': 40.769}
Document ID: fFlkzY8BHFTyMPLTCoYx

Score: 18.24646
Title: 4 Units w/ Impressive City View! Pets Allowed!
Location: {'lon': -73.9758092977172, 'lat': 40.76538626102478}
Document ID: bllqzY8BHFTyMPLTMYzH

Score: 18.24646
Title: Minutes to Central Park Zoo and MoMA! Parking!
Location: {'lon': -73.97414, 'lat': 40.76354}
Document ID: pVlnzY8BHFTyMPLTZ4nB

Score: 18.24646
Title: Central Park Views, Luxury Stay, Pets Are Welcome!
Location: {'lon': -73.97432, 'lat': 40.76366}
Document ID: tllnzY8BHFTyMPLTf4me

Score: 18.24646
Title: Pleasant Stay! Located in the heart of Midtown!
Location: {'lon': -73.97621, 'lat': 40.76555}
Document ID: pVlszY8BHFTyMPLTqY5L

Score: 18.24646
Title: Pleasant Stay! Located in the heart of Midtown!
Location: {'lon': -73.9743032315

### Analysing the response

For this analysis we indexed all AirBnBs, so the results discussed here might differ slightly from what you get when you only index the first 5,000 listings.

The next step is to run a `geo_distance` query within Elasticsearch. First, let's analyse how far apart `Central Park` and the `Empire State Building` are. Since Central Park is pretty big and contains a multitude of points of interest, we will use the `Bow Bridge`, an iconic sight, as the reference point.

We will use a simple `term` query to get the geo location of `Central Park Bow Bridge` and then run a `geo_distance` query with a `_geo_distance` sort to get the exact distance back. The `geo_distance` query always requires a `distance` parameter, so we pass a generous `200km`. We add a `term` clause to search for `empire state building`, since that is the only point of interest we care about here.

In [16]:
response = client.search(
    index="points-of-interest",
    size=1,
    query={"term": {"name": "central park bow bridge"}},
)

for hit in response["hits"]["hits"]:
    # this should now be the central park bow bridge.
    print(f"Name: {hit['_source']['name']}\nLocation: {hit['_source']['location']}\n")
    response = client.search(
        index="points-of-interest",
        size=1,
        query={
            "bool": {
                "must": {"term": {"name": "empire state building"}},
                "filter": {
                    "geo_distance": {
                        "distance": "200km",
                        "location": {
                            "lat": hit["_source"]["location"]["lat"],
                            "lon": hit["_source"]["location"]["lon"],
                        },
                    }
                },
            }
        },
        sort=[
            {
                "_geo_distance": {
                    "location": {
                        "lat": hit["_source"]["location"]["lat"],
                        "lon": hit["_source"]["location"]["lon"],
                    },
                    "unit": "km",
                    "distance_type": "plane",
                    "order": "asc",
                }
            }
        ],
    )
    print(
        f"Distance to Empire State Building: {response['hits']['hits'][0]['sort'][0]} km"
    )

Name: central park bow bridge
Location: {'lon': -73.97178440451849, 'lat': 40.77577539823907}

Distance to Empire State Building: 3.247504472145157 km


### Comparing to ELSER

Now our top-scoring document from the ELSER query:

```
Score: 20.003891
Title: Gorgeous 1 Bedroom - Upper East Side Manhattan -
Location: {'lon': -73.95856, 'lat': 40.76701}
Document ID: AkgfEI8BHToGwgcUA6-7
```

Let's run the calculation from above using geo_distance.

In [17]:
response = client.search(
    index="points-of-interest",
    size=10,
    query={
        "bool": {
            "must": {
                "terms": {"name": ["central park bow bridge", "empire state building"]}
            },
            "filter": {
                "geo_distance": {
                    "distance": "200km",
                    "location": {"lat": "40.76701", "lon": "-73.95856"},
                }
            },
        }
    },
    sort=[
        {
            "_geo_distance": {
                "location": {"lat": "40.76701", "lon": "-73.95856"},
                "unit": "km",
                "distance_type": "plane",
                "order": "asc",
            }
        }
    ],
)

for hit in response["hits"]["hits"]:
    print("Distance between AirBnB and", hit["_source"]["name"], hit["sort"][0], "km")

Distance between AirBnB and central park bow bridge 1.4799179352060348 km
Distance between AirBnB and empire state building 3.0577584374128617 km
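
As a sanity check on the `_geo_distance` values above, the distance can be recomputed on the client side. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; both are assumptions of this example and not the `plane` calculation that Elasticsearch performs, although at city scale the two should agree closely. The coordinates are the AirBnB location from the query and the Bow Bridge location printed earlier, so the result should come out close to the 1.48 km reported above.

```python
import math


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two {"lat": ..., "lon": ...} points."""
    lat1, lon1 = math.radians(a["lat"]), math.radians(a["lon"])
    lat2, lon2 = math.radians(b["lat"]), math.radians(b["lon"])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    # Haversine term: squared half-chord length between the two points.
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


airbnb = {"lat": 40.76701, "lon": -73.95856}
bow_bridge = {"lat": 40.77577539823907, "lon": -73.97178440451849}
print(f"Distance between AirBnB and Bow Bridge: {haversine_km(airbnb, bow_bridge):.2f} km")
```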


### Analysing

Only about 1.5 km and 3.1 km away from the two sights. Not that bad. Let's see what we can find when we create a geo-bounding box spanned by the Empire State Building and the Central Park Bow Bridge. Additionally, we will sort the results by the distance to the Central Park Bow Bridge and then by the distance to the Empire State Building.

In [18]:
response = client.search(
    index="points-of-interest",
    size=2,
    query={"terms": {"name": ["central park bow bridge", "empire state building"]}},
)

# for easier access we store the locations in two variables
central = {}
empire = {}
for hit in response["hits"]["hits"]:
    hit = hit["_source"]
    if "central park bow bridge" in hit["name"]:
        central = hit["location"]
    elif "empire state building" in hit["name"]:
        empire = hit["location"]

# Now we can run the geo_bounding_box query and sort it by the
# distance first to Central Park Bow Bridge
# and then to the Empire State Building.
response = client.search(
    index="airbnb-*",
    size=50,
    query={
        "geo_bounding_box": {
            "location": {
                "top_left": {"lat": central["lat"], "lon": empire["lon"]},
                "bottom_right": {"lat": empire["lat"], "lon": central["lon"]},
            }
        }
    },
    sort=[
        {
            "_geo_distance": {
                "location": {"lat": central["lat"], "lon": central["lon"]},
                "unit": "km",
                "distance_type": "plane",
                "order": "asc",
            }
        },
        {
            "_geo_distance": {
                "location": {"lat": empire["lat"], "lon": empire["lon"]},
                "unit": "km",
                "distance_type": "plane",
                "order": "asc",
            }
        },
    ],
)

for hit in response["hits"]["hits"]:
    print(f"Distance to Central Park Bow Bridge: {hit['sort'][0]} km")
    print(f"Distance to Empire State Building: {hit['sort'][1]} km")
    print(f"Title: {hit['_source']['name']}\nDocument ID: {hit['_id']}\n")

Distance to Central Park Bow Bridge: 0.7522774827930032 km
Distance to Empire State Building: 2.788981551632797 km
Title: Shared room at Lincoln Center
Document ID: b1lzzY8BHFTyMPLTt5UM

Distance to Central Park Bow Bridge: 0.8285887310207944 km
Distance to Empire State Building: 2.654302842845152 km
Title: Blueground | UWS, w/d, nr Lincon Center
Document ID: TFljzY8BHFTyMPLTv4bj

Distance to Central Park Bow Bridge: 0.9179057084098068 km
Distance to Empire State Building: 2.633649076174077 km
Title: Manhattan New York Upper West side
Document ID: YlllzY8BHFTyMPLT7Ig9

Distance to Central Park Bow Bridge: 1.1243476248351183 km
Distance to Empire State Building: 2.4285558932557834 km
Title: Luxury 3 BR condo + balcony w views of Central PK
Document ID: O1ljzY8BHFTyMPLTtIb7

Distance to Central Park Bow Bridge: 1.1712213104190234 km
Distance to Empire State Building: 2.2972489327173635 km
Title: Upper West Apartment Block Away from Central Park!
Document ID: MVlzzY8BHFTyMPLTeJWm

Distanc
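
The `geo_bounding_box` query above hard-codes which landmark supplies the top latitude and which the left longitude, which only works because the Bow Bridge lies north-east of the Empire State Building. A sketch of a helper that works for any two points is shown below. `bounding_box` is a made-up name, the Empire State Building coordinates are approximate values assumed for illustration, and the helper does not handle boxes that cross the antimeridian.

```python
def bounding_box(a, b):
    """Return top_left / bottom_right corners of the box spanned by two geo points."""
    return {
        # Top-left is the northernmost latitude and the westernmost longitude.
        "top_left": {"lat": max(a["lat"], b["lat"]), "lon": min(a["lon"], b["lon"])},
        # Bottom-right is the southernmost latitude and the easternmost longitude.
        "bottom_right": {"lat": min(a["lat"], b["lat"]), "lon": max(a["lon"], b["lon"])},
    }


central = {"lat": 40.77577539823907, "lon": -73.97178440451849}
empire = {"lat": 40.7484, "lon": -73.9857}  # approximate, assumed for this example
box = bounding_box(central, empire)
print(box)
```

The result can be used as the value of the `location` key inside the `geo_bounding_box` query.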

## AI

Now let's finally get to the AI part. Everything so far was setup and an exploration of what makes geospatial searches tick and how they work. There is still a lot more to discover. Let's hook it up to OpenAI.

In [42]:
from openai import OpenAI

# Use a separate name for the OpenAI client so that `client` keeps pointing at
# the Elasticsearch client, which is needed again further below.
# The API key is read from the OPENAI_API_KEY environment variable.
openai = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Let's do a test:
question = "What is the capital of France? Answer with just the capital city."

answer = openai.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": question,
        }
    ],
    model="gpt-3.5-turbo",
)

print(answer.choices[0].message.content)

Paris


Now that this works, we know the OpenAI connection is set up correctly and can move on to our actual question. We write a prompt that instructs the model to extract the relevant information from the question and return it as JSON.

In [43]:
question = """
As an expert in named entity recognition machine learning models, I will give you a sentence from which I would like you to extract what needs to be found (location, apartment, airbnb, sight, etc) near which location and the distance between them. The distance needs to be a number expressed in kilometers. I would like the result to be expressed in JSON with the following fields: "what", "near", "distance_in_km". Only return the JSON.
Here is the sentence: "Get me the closest AirBnB between 1 miles distance from the Empire State Building"
"""

answer = openai.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": question,
        }
    ],
    model="gpt-3.5-turbo",
)
print(answer.choices[0].message.content)

{
  "what": "AirBnB",
  "near": "Empire State Building",
  "distance_in_km": 1.6
}


The answer in our case matches the desired output:
```
{
    "what": "AirBnB",
    "near": "Empire State Building",
    "distance_in_km": 1.6
}
```
1) Extract distance - done (1 mile)
2) Convert distance to km - done (1 mile is about 1.6 km)
3) Extract location - This should be "Empire State Building", but in more general terms we should recognize that this is a location so we make a separate label called

In [45]:
json = answer.choices[0].message.content
# This now should contain just the json.
json = JSON.loads(json)

# first let's grab the location of the `near` field
# it could be multiple locations, so we will search for all of them.
near = client.search(
    index="points-of-interest",
    size=100,
    query={"bool": {"must": {"terms": {"name": [json["near"].lower()]}}}},
)

# we store just all of the geo-locations of the near locations.
near_location = []
sort = []

for hit in near["hits"]["hits"]:
    near_location.append(hit["_source"]["location"])
    sort.append(
        {
            "_geo_distance": {
                "location": {
                    "lat": hit["_source"]["location"]["lat"],
                    "lon": hit["_source"]["location"]["lon"],
                },
                "unit": "km",
                "distance_type": "plane",
                "order": "asc",
            }
        }
    )

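# Fail early with a clear message if the point of interest was not found.
if not near_location:
    raise ValueError(f"No point of interest found for {json['near']!r}")

query = {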
    "geo_distance": {
        "distance": str(json["distance_in_km"]) + "km",
        "location": {"lat": near_location[0]["lat"], "lon": near_location[0]["lon"]},
    }
}
# Now let's get all the AirBnBs `what` near the `near` location.
# We always use the first location as our primary reference.
airbnbs = client.search(index="airbnb-*", size=100, query=query, sort=sort)

for hit in airbnbs["hits"]["hits"]:
    print(f"Distance to {json['near']}: {hit['sort'][0]} km")
    print(f"Title: {hit['_source']['name']}\nDocument ID: {hit['_id']}\n")

Distance to Empire State Building: 0.07179165027056177 km
Title: Gorgeous 1 bedroom luxury condo
Document ID: mlllzY8BHFTyMPLTHIen

Distance to Empire State Building: 0.11711484106305878 km
Title: Exclusive Private Studio 1103 | Private Bathroom
Document ID: 91lmzY8BHFTyMPLTjojD

Distance to Empire State Building: 0.11786577229462668 km
Title: Room available in Midtown
Document ID: fFlwzY8BHFTyMPLTs5Ie

Distance to Empire State Building: 0.12218665438647491 km
Title: A+ Location Deluxe Studio(3 beds) #5
Document ID: 9llozY8BHFTyMPLT1orX

Distance to Empire State Building: 0.1361089243182925 km
Title: Easy access to transit for seeing all the sights
Document ID: Z1lwzY8BHFTyMPLTmZKx

Distance to Empire State Building: 0.1361089243182925 km
Title: Well appointed queen with ADA features
Document ID: xllwzY8BHFTyMPLTBpHL

Distance to Empire State Building: 0.14812006828137478 km
Title: Cozy Scandinavian private room in Midtown
Document ID: P1lzzY8BHFTyMPLTgpWk

Distance to Empire State Bui
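
The cell above passes the model's reply straight to `JSON.loads`, which fails if the model wraps the JSON in a Markdown code fence or omits a field. A more defensive parser could look like the sketch below. `parse_extraction` and `REQUIRED_KEYS` are names invented for this example, and the validation rules (all three fields present, positive distance) are assumptions about what the downstream query needs.

```python
import json as JSON

REQUIRED_KEYS = ("what", "near", "distance_in_km")
FENCE = "`" * 3  # a Markdown code fence, built this way to keep this block readable


def parse_extraction(reply):
    """Parse the model reply into a dict with what, near and distance_in_km."""
    text = reply.strip()
    # Drop a surrounding Markdown code fence if the model added one.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)

    data = JSON.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    distance = float(data["distance_in_km"])
    if distance <= 0:
        raise ValueError("distance_in_km must be positive")
    return {"what": str(data["what"]), "near": str(data["near"]), "distance_in_km": distance}


reply = '{"what": "AirBnB", "near": "Empire State Building", "distance_in_km": 1.6}'
print(parse_extraction(reply))
```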

With that, we have combined geospatial search with LLMs.

Some ideas for further exploration:
* Let any LLM generate an itinerary with sights.
...