### Geospatial Recommendation System

![image-for-the-concept](https://github.com/lancedb/vectordb-recipes/blob/main/examples/Geospatial-Recommendation-System/Geospatial%20Recommendation%20System.png?raw=1)

In this tutorial, we'll enhance our restaurant recommendation system using Full Text Search (FTS) Indexes and Geospatial APIs.

1. Extract User Preferences: Identify key details from user input such as preferred cuisines and location.
2. Construct Query String: Synthesize these details into a structured query string for searching.
3. Perform FTS Index Search: Use the query string to find relevant restaurant recommendations.
4. Apply Geospatial Filtering: Use a Geospatial API to locate the user and refine recommendations based on proximity.

As a later enhancement, we can sort the recommendations by distance from the user.

### Importing the relevant libraries

In [14]:
%%capture
!pip install lancedb pandas sentence-transformers requests openai tantivy

In [4]:
import pandas as pd

!wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=17Div0ml4Nelr1C4QaGVJzC7lnMx--BkM' -O data.csv

--2025-01-05 10:34:14--  https://drive.google.com/uc?export=download&id=17Div0ml4Nelr1C4QaGVJzC7lnMx--BkM
Resolving drive.google.com (drive.google.com)... 74.125.126.139, 74.125.126.138, 74.125.126.102, ...
Connecting to drive.google.com (drive.google.com)|74.125.126.139|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://drive.usercontent.google.com/download?id=17Div0ml4Nelr1C4QaGVJzC7lnMx--BkM&export=download [following]
--2025-01-05 10:34:14--  https://drive.usercontent.google.com/download?id=17Div0ml4Nelr1C4QaGVJzC7lnMx--BkM&export=download
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 74.125.202.132, 2607:f8b0:4001:c06::84
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|74.125.202.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 883287 (863K) [application/octet-stream]
Saving to: ‘data.csv’


2025-01-05 10:34:17 (121 MB/s) - ‘data.csv’ saved [883287/883287]



In [5]:
import lancedb
import pandas as pd

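restaurant_data = pd.read_csv("data.csv")
# Drop the first column of the CSV and keep the remaining fields
restaurant_data = restaurant_data[restaurant_data.columns[1:]]
# Remove rows with missing values, then exact duplicates
restaurant_data.dropna(inplace=True)
restaurant_data.drop_duplicates(inplace=True)
restaurant_data.head()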

Unnamed: 0,Area,City,Restaurant,Price,Avg ratings,Total ratings,Food type,Address,Delivery time
0,Koramangala,Bangalore,Tandoor Hut,300.0,4.4,100,"Biryani,Chinese,North Indian,South Indian",5Th Block,59
1,Koramangala,Bangalore,Tunday Kababi,300.0,4.1,100,"Mughlai,Lucknowi",5Th Block,56
2,Jogupalya,Bangalore,Kim Lee,650.0,4.4,100,Chinese,Double Road,50
3,Indiranagar,Bangalore,New Punjabi Hotel,250.0,3.9,500,"North Indian,Punjabi,Tandoor,Chinese",80 Feet Road,57
4,Indiranagar,Bangalore,Nh8,350.0,4.0,50,"Rajasthani,Gujarati,North Indian,Snacks,Desser...",80 Feet Road,63


### Embedding the relevant parts of the data

We will extract key information from the restaurant dataset columns and create a query string. This string will be encoded using our embedding model and then combined with additional data for storage in the Vector Database.

In [7]:
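import os
from sentence_transformers import SentenceTransformer

# Token read by the Hugging Face Hub client when authentication is requested
os.environ["HUGGING_FACE_HUB_TOKEN"] = "****"

# Note: newer sentence-transformers releases accept `token=` in place of the
# deprecated `use_auth_token=` argument.
model = SentenceTransformer('paraphrase-MiniLM-L6-v2', use_auth_token=True)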
data_points_vectors = []

for _, row in restaurant_data.iterrows():
    filter_cols = ['Food type', 'Avg ratings', 'Address']
    data_point = "#".join(f"{col}/{row[col]}" for col in filter_cols)
    data_points_vectors.append(data_point)

# Add the new column to the DataFrame
restaurant_data["query_string"] = data_points_vectors

list_of_payloads = []

for index, row in restaurant_data.iterrows():
    encoded_vector = model.encode(row['query_string'])
    payload = {
        'Area': row['Area'],
        'City': row['City'],
        'Restaurant': row['Restaurant'],
        'Price': row['Price'],
        'Avg_ratings': row['Avg ratings'],
        'Total_ratings': row['Total ratings'],
        'Food_type': row['Food type'],
        'Address': row['Address'],
        'Delivery_time': row['Delivery time'],
        'query_string': row['query_string'],
        'vector': encoded_vector
    }

    list_of_payloads.append(payload)



### Using the LanceDB database

In [8]:
# Connect to the LanceDB instance
uri = "data"
db = lancedb.connect(uri)

lancedb_table = db.create_table("restaurant-geocoding-app", data=list_of_payloads)

In [9]:
df = lancedb_table.to_pandas()
df.head()

Unnamed: 0,Area,City,Restaurant,Price,Avg_ratings,Total_ratings,Food_type,Address,Delivery_time,query_string,vector
0,Koramangala,Bangalore,Tandoor Hut,300.0,4.4,100,"Biryani,Chinese,North Indian,South Indian",5Th Block,59,"Food type/Biryani,Chinese,North Indian,South I...","[0.12830292, 0.14721094, -0.086350575, 0.08263..."
1,Koramangala,Bangalore,Tunday Kababi,300.0,4.1,100,"Mughlai,Lucknowi",5Th Block,56,"Food type/Mughlai,Lucknowi#Avg ratings/4.1#Add...","[-0.10582731, 0.15009499, -0.35311985, 0.12081..."
2,Jogupalya,Bangalore,Kim Lee,650.0,4.4,100,Chinese,Double Road,50,Food type/Chinese#Avg ratings/4.4#Address/Doub...,"[-0.09362272, 0.16319357, 0.12415688, 0.012913..."
3,Indiranagar,Bangalore,New Punjabi Hotel,250.0,3.9,500,"North Indian,Punjabi,Tandoor,Chinese",80 Feet Road,57,"Food type/North Indian,Punjabi,Tandoor,Chinese...","[0.12705283, 0.17128171, 0.013174878, 0.239679..."
4,Indiranagar,Bangalore,Nh8,350.0,4.0,50,"Rajasthani,Gujarati,North Indian,Snacks,Desser...",80 Feet Road,63,"Food type/Rajasthani,Gujarati,North Indian,Sna...","[0.08238438, 0.014472998, -0.11513413, 0.28430..."


### Query Transformation

In [10]:
df["query_string"][0]

'Food type/Biryani,Chinese,North Indian,South Indian#Avg ratings/4.4#Address/5Th Block'

### Extracting the specifics from the query

Just as we pulled the key details out of the CSV to craft query strings, we'll do the same with user queries. This step matters because the user query then has the same `column/value` shape as the indexed `query_string` field, which makes it straightforward to run the FTS index search.
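The shared format can be sketched as a single helper that serves both dataset rows and extracted user preferences. This is a minimal illustration, not part of the original notebook: the name `build_query_string` and the `FILTER_COLS` constant are hypothetical, and skipping `None` values is an assumption that mirrors how missing fields are dropped later in this tutorial.

```python
# Columns used to build the query string (same three as in the notebook)
FILTER_COLS = ["Food type", "Avg ratings", "Address"]

def build_query_string(record, cols=FILTER_COLS):
    """Join 'column/value' pairs with '#', skipping fields that are missing or None."""
    parts = [f"{col}/{record[col]}" for col in cols if record.get(col) is not None]
    return "#".join(parts)

# A dataset row and an extracted user preference go through the same helper
row = {"Food type": "Chinese", "Avg ratings": 4.4, "Address": "Double Road"}
user = {"Food type": "Indian or Italian", "Avg ratings": None, "Address": "HSR Bangalore"}

print(build_query_string(row))
print(build_query_string(user))
```

Keeping one helper for both sides means the indexed text and the search text cannot drift apart in format.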

In [11]:
from openai import OpenAI
OPENAI_API_KEY = "****"
client = OpenAI(api_key = OPENAI_API_KEY)


query_string = "Hi, I am looking for a casual dining restaurant where Indian or Italian food is served near the HSR Bangalore"

# Helper prompt to extract structured data from query_string
total_prompt = f"""Query String: {query_string}\n\n\
Now from the query string above extract these following entities pinpoints:
1. Food type : Extract the food type
2. Avg ratings : Extract the average ratings
3. Address : Extract the current exact location, don't consider the fillers like "near" or "nearby".

NOTE : For the Current location, try to understand the pin point location in the query string. Do not give any extra information. If you make the mistakes, bad things
will happen.

Finally return a python dictionary using those points as keys and don't write the markdown of python. If value of a key is not mentioned, then set it as None.
"""

# Make a request to OpenAI's API
completion = client.chat.completions.create(
    model="gpt-4o",  # Use the appropriate model
    store=True,
    messages=[
        {"role": "user", "content": total_prompt}
    ]
)

# Extract the generated text
content = completion.choices[0].message.content
print(content)

{
  "Food type": "Indian or Italian",
  "Avg ratings": None,
  "Address": "HSR Bangalore"
}


In [12]:
import ast

# Convert the string content to a dictionary
try:
    response_dict = ast.literal_eval(content)
except (ValueError, SyntaxError) as e:
    print("Error parsing the response:", e)
    response_dict = {}


filter_cols = ['Food type', 'Avg ratings', 'Address']
query_string_parts = [f"{col}/{response_dict.get(col)}" for col in filter_cols if response_dict.get(col)]

query_string = "#".join(query_string_parts)
print(query_string)

Food type/Indian or Italian#Address/HSR Bangalore
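Even when asked not to, a model may wrap its reply in a Markdown code fence, which would make `ast.literal_eval` fail. The sketch below shows one way to guard against that. It is an illustration only: `parse_llm_dict` is a hypothetical helper, and the fenced `reply` string is a made-up example of such a response, not output from the API call above.

```python
import ast
import re

FENCE = "`" * 3  # a Markdown code-fence marker, built here to keep the example readable

def parse_llm_dict(text):
    """Parse a Python dict literal from a model reply, tolerating a surrounding code fence."""
    cleaned = text.strip()
    # Remove an opening fence (optionally followed by a language tag) and a closing fence
    cleaned = re.sub(r"^`{3}[a-zA-Z]*\s*", "", cleaned)
    cleaned = re.sub(r"\s*`{3}$", "", cleaned)
    try:
        parsed = ast.literal_eval(cleaned)
    except (ValueError, SyntaxError):
        return {}
    # Anything other than a dict (e.g. a list) is treated as a failed extraction
    return parsed if isinstance(parsed, dict) else {}

# Hypothetical fenced reply with the same fields as the response shown earlier
reply = (
    FENCE + "python\n"
    + '{"Food type": "Indian or Italian", "Avg ratings": None, "Address": "HSR Bangalore"}\n'
    + FENCE
)
print(parse_llm_dict(reply))
```

Returning an empty dict on failure matches the fallback used in the cell above, so the rest of the pipeline can stay unchanged.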


### Using LanceDB FTS for searching

In [15]:
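# Build (or rebuild) the full-text index on the query_string column
lancedb_table.create_fts_index("query_string", replace=True)
# Run a full-text search; query_type="fts" makes the search mode explicit
results = lancedb_table.search(query_string, query_type="fts").to_pandas()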

### Geospatial Recommendation

Now we will use the Google Geocoding API to look up the coordinates of the user's location and of each recommended restaurant. The next step is to calculate the distance between each restaurant and the user. For this, I am going to use the Haversine formula, which takes the latitude and longitude of two points and returns the great-circle distance between them, that is, the shortest path along the Earth's surface rather than a straight line through it. There's some math behind how this formula works, but we'll keep things simple and focus on its application for now.
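For reference, the formula can be written as follows, where $\varphi_1, \varphi_2$ are the two latitudes and $\lambda_1, \lambda_2$ the two longitudes (all in radians), $\Delta\varphi = \varphi_2 - \varphi_1$, $\Delta\lambda = \lambda_2 - \lambda_1$, and $R$ is the Earth's radius. The notation is chosen here for illustration and corresponds to the variables `lat1`, `lat2`, `dlat`, `dlon`, `a`, `c`, and `R` in the code that follows.

$$
a = \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right)
$$

$$
c = 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right), \qquad d = R\,c
$$

With $R = 6371$ km, $d$ is the distance in kilometers.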

In [18]:
import requests
import math

def get_google_geocoding(address, api_key):
    base_url = "https://maps.googleapis.com/maps/api/geocode/json"
    params = {"address": address, "key": api_key}
    response = requests.get(base_url, params=params)

    if response.status_code == 200:
        result = response.json()
        if result["status"] == "OK":
            latitude = result["results"][0]["geometry"]["location"]["lat"]
            longitude = result["results"][0]["geometry"]["location"]["lng"]
            return (latitude, longitude)
        else:
            print(f"Google API: No results found for address: {address}")
            return None
    else:
        print(f"Google API: Request failed for address: {address}")
        return None

def haversine(coord1, coord2):
    R = 6371.0  # Radius of the Earth in kilometers
    lat1, lon1 = map(math.radians, coord1)
    lat2, lon2 = map(math.radians, coord2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = R * c
    return distance

def process_top_restaurants(data, current_location, api_key, top_n=5):
    current_coords = get_google_geocoding(current_location, api_key)
    if not current_coords:
        return

    for index, row in data.head(top_n).iterrows():
        complete_address = f"{row['Restaurant']}, {row['City']}"
        restaurant_coords = get_google_geocoding(complete_address, api_key)
        if restaurant_coords:
            distance = haversine(current_coords, restaurant_coords)
            print(f"Restaurant Name: {row['Restaurant']}")
            print(f"Distance: {distance:.2f} km")
            print(f"Area: {row['Area']}")
            print(f"Price: {row['Price']}")
            print(f"Coordinates: {restaurant_coords}")
            print(f"Cuisines Type: {row['Food_type']}")
            print("-" * 40)

# Example usage
GOOGLE_GEOSPATIAL_API_KEY = '****'
current_location = 'HSR, Bengaluru, India'
process_top_restaurants(results, current_location, GOOGLE_GEOSPATIAL_API_KEY, top_n=5)

Restaurant Name: Cafe Azzure
Distance: 8.06 km
Area: Ashok Nagar
Price: 1000.0
Coordinates: (12.975012, 77.6076558)
Cuisines Type: American,Italian
----------------------------------------
Restaurant Name: Hyderabad Biryaani House
Distance: 8.55 km
Area: Victoria Layout
Price: 499.0
Coordinates: (12.9715987, 77.5945627)
Cuisines Type: Indian
----------------------------------------
Restaurant Name: Aaliyar Ambur Dum Biryani
Distance: 7.53 km
Area: Ashok Nagar
Price: 200.0
Coordinates: (12.9694702, 77.60761529999999)
Cuisines Type: Indian
----------------------------------------
Restaurant Name: Jw Kitchen - Jw Marriott
Distance: 8.58 km
Area: Ashok Nagar
Price: 1000.0
Coordinates: (12.972231, 77.59495299999999)
Cuisines Type: Indian,Continental
----------------------------------------
Restaurant Name: The Ritz-Carlton - Ganache
Distance: 8.55 km
Area: Ashok Nagar
Price: 1000.0
Coordinates: (12.9715987, 77.5945627)
Cuisines Type: Indian,Bakery
----------------------------------------
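The results above are printed in search order, not by proximity. Sorting by distance, mentioned in the overview as a later enhancement, could be sketched as below. This is a minimal illustration under stated assumptions: `rank_by_distance` is a hypothetical helper, the restaurant coordinates are copied from the printed output above, and `user_coords` is an assumed approximate location for HSR Layout rather than a value returned by the geocoding call.

```python
import math

def haversine(coord1, coord2):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    R = 6371.0  # Radius of the Earth in kilometers
    lat1, lon1 = map(math.radians, coord1)
    lat2, lon2 = map(math.radians, coord2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

def rank_by_distance(candidates, origin):
    """Return (name, distance_km) pairs sorted from nearest to farthest."""
    ranked = [(name, haversine(origin, coords)) for name, coords in candidates]
    return sorted(ranked, key=lambda item: item[1])

# Restaurant coordinates copied from the printed output above
candidates = [
    ("Cafe Azzure", (12.975012, 77.6076558)),
    ("Hyderabad Biryaani House", (12.9715987, 77.5945627)),
    ("Aaliyar Ambur Dum Biryani", (12.9694702, 77.60761529999999)),
    ("Jw Kitchen - Jw Marriott", (12.972231, 77.59495299999999)),
]

# ASSUMPTION: approximate coordinates for HSR Layout, used only as a placeholder
user_coords = (12.9116, 77.6474)

for name, km in rank_by_distance(candidates, user_coords):
    print(f"{name}: {km:.2f} km")
```

In the notebook itself, the same idea would mean collecting each restaurant's distance inside `process_top_restaurants` and sorting before printing, instead of printing inside the loop.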
