# A Data-Driven Framework for Proximity-Based Restaurant Ranking Using NLP and Mapping APIs

**Sian Bhari**

*Goal: This project aims to find and rank the best Italian restaurants near the Conrad Hotel in New York by using geolocation, mapping, and sentiment analysis. It combines location data, user reviews, and proximity to score each venue and highlight the top choice.*

**Packages used:**
- **requests** — Sends HTTP requests to the Foursquare and Google Places APIs to fetch venue and review data.
- **pandas** — Handles tabular data: creates DataFrames, normalizes JSON, aggregates sentiment scores, and calculates final rankings.
- **numpy** — Used implicitly by pandas for numerical operations; not directly referenced in the code.
- **geopy** — Uses `Nominatim` to convert the Conrad Hotel’s address into coordinates and `geodesic` to calculate distances between locations.
- **pandas.json_normalize** — Flattens nested JSON data (from Foursquare API) into a structured DataFrame.
- **folium** — Visualizes the map with markers for the Conrad Hotel and nearby Italian restaurants.
- **textblob** — Performs sentiment analysis on Google Places reviews (calculates polarity and subjectivity).
- **time** — Introduces a 1-second delay between review API calls to respect rate limits.

In [26]:
# Importing required libraries
import requests  # handle API requests
import pandas as pd  # data analysis
import numpy as np  # vectorized data operations

# Install and import geopy
!pip install geopy
from geopy.geocoders import Nominatim  # convert address into coordinates

# JSON to DataFrame transformation (fixed import)
from pandas import json_normalize

# Install and import folium for mapping
!pip install folium
import folium

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


### Search for Italian food within 500 metres of the Conrad Hotel

In [2]:
import requests
import pandas as pd
import folium
from geopy.geocoders import Nominatim
from pandas import json_normalize

# Your Foursquare API Key (v3)
API_KEY = 'fsq3h8L1OWTl7+aY0bxoaFoHijSnabYjGIBxSq4cpJPeWpI='
headers = {
    "Accept": "application/json",
    "Authorization": API_KEY
}

# Get coordinates of Conrad Hotel using geopy
address = '102 North End Ave, New York, NY'
geolocator = Nominatim(user_agent="foursquare_agent")
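location = geolocator.geocode(address)
if location is None:
    # Nominatim returns None when the address cannot be resolved
    raise ValueError(f"Could not geocode address: {address}")
latitude = location.latitude
longitude = location.longitude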
print(f"Coordinates of Conrad Hotel: {latitude}, {longitude}")

# Search parameters
query = "Italian"
radius = 500
limit = 10
search_url = "https://api.foursquare.com/v3/places/search"
params = {
    "query": query,
    "ll": f"{latitude},{longitude}",
    "radius": radius,
    "limit": limit
}

# Make the API request
response = requests.get(search_url, headers=headers, params=params)
print("Status code:", response.status_code)

# Parse the response
results = response.json()
venues = results.get("results", [])

# Check and display results
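if not venues:
    print("No venues found.")
    # Keep an empty frame so the map loop below still runs without a NameError
    df = pd.DataFrame(columns=["Name", "Address", "Lat", "Lng", "Category"])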
else:
    print(f"Found {len(venues)} venues:")
    df = json_normalize(venues)
    df = df[["name", "location.address", "geocodes.main.latitude", "geocodes.main.longitude", "categories"]]
    df.columns = ["Name", "Address", "Lat", "Lng", "Categories"]
    df["Category"] = df["Categories"].apply(lambda x: x[0]["name"] if isinstance(x, list) and x else None)
    df = df.drop(columns=["Categories"])
    display(df)

# Create map centered on Conrad Hotel
map_it = folium.Map(location=[latitude, longitude], zoom_start=15)

# Add marker for Conrad Hotel (red)
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Conrad Hotel',
    color='red',
    fill=True,
    fill_color='red',
    fill_opacity=0.7
).add_to(map_it)

# Add restaurant markers (blue)
for _, row in df.iterrows():
    folium.CircleMarker(
        [row["Lat"], row["Lng"]],
        radius=5,
        popup=row["Name"],
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(map_it)

# Display map
map_it

Coordinates of Conrad Hotel: 40.7151482, -74.0156573
Status code: 200
Found 6 venues:


|   | Name | Address | Lat | Lng | Category |
| - | ---- | ------- | --- | --- | -------- |
| 0 | Parm | 250 Vesey St | 40.714418 | -74.015893 | Italian Restaurant |
| 1 | Sant Ambroeus Brookfield | 200 Vesey St | 40.713759 | -74.014339 | Bar |
| 2 | St Ambroeus | 200 Vesey St | 40.713857 | -74.014692 | Italian Restaurant |
| 3 | Gigino Trattoria | 323 Greenwich St | 40.717261 | -74.010506 | Pizzeria |
| 4 | Harry's Italian Pizza Bar | 225 Murray St | 40.715258 | -74.014748 | Italian Restaurant |
| 5 | Sauce pizzeria | 225 Liberty St | 40.711686 | -74.015217 | Pizzeria |
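
As a rough cross-check of the 500 m search radius, the sketch below computes the straight-line distance from the hotel to each venue in the table. It uses the haversine formula on a spherical Earth rather than the `geodesic` function from geopy that the notebook uses later, so values may differ slightly from ellipsoidal distances. The helper name `haversine_m` and the Earth-radius constant are assumptions made for this illustration.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation, an assumption)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Coordinates copied from the notebook output above
hotel = (40.7151482, -74.0156573)
venues = {
    "Parm": (40.714418, -74.015893),
    "Sant Ambroeus Brookfield": (40.713759, -74.014339),
    "St Ambroeus": (40.713857, -74.014692),
    "Gigino Trattoria": (40.717261, -74.010506),
    "Harry's Italian Pizza Bar": (40.715258, -74.014748),
    "Sauce pizzeria": (40.711686, -74.015217),
}

distances = {name: haversine_m(*hotel, *coords) for name, coords in venues.items()}

# Print venues from nearest to farthest
for name, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(f"{name}: {d:.0f} m")
```

Sorting by distance like this is one simple way to confirm that every returned venue actually falls inside the requested radius.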


### Sentiment Analysis of Top Italian Restaurants Near Conrad Hotel (NYC)

| Metric           | Range        | Meaning                  |
| ---------------- | ------------ | ------------------------ |
| **Polarity**     | -1.0 to +1.0 | Negativity to Positivity |
| **Subjectivity** | 0.0 to 1.0   | Fact to Opinion          |
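
To make the polarity range concrete, here is a minimal sketch of how a polarity value can be turned into a label and how the most common label for a restaurant can be picked. The thresholds mirror the rule used in the analysis cell further down (above zero is positive, below zero is negative, exactly zero is neutral). The function names and the sample polarity values are hypothetical, and no TextBlob call is made here.

```python
from collections import Counter

def label_sentiment(polarity):
    """Map a polarity in [-1.0, +1.0] to a coarse label."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"

def dominant_sentiment(polarities):
    """Return the most frequent label among a list of polarity values."""
    labels = [label_sentiment(p) for p in polarities]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical polarity values for five reviews of one restaurant
sample_polarities = [0.6, 0.1, -0.3, 0.0, 0.45]
print([label_sentiment(p) for p in sample_polarities])
print(dominant_sentiment(sample_polarities))
```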

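#### Reliability score

The reliability score quantifies how trustworthy the aggregated sentiment for a restaurant is, based on:

- Review volume (how many people rated it, min-max normalized across the restaurants)
- Objectivity of reviews (one minus the mean subjectivity)
- Consistency of sentiment (one minus the standard deviation of polarity)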

Formula:

$$
\text{Reliability} = 0.5 \times \text{Norm\_Reviews} + 0.25 \times (1 - \text{Subjectivity}) + 0.25 \times (1 - \text{Polarity\_STD})
$$
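
A minimal sketch of the reliability formula as a plain function may help when checking individual values by hand. The function name and the input values are hypothetical and chosen only for illustration; `norm_reviews` is assumed to be already min-max scaled to the range 0 to 1.

```python
def reliability(norm_reviews, subjectivity, polarity_std):
    """Weighted reliability: 50% review volume, 25% objectivity, 25% stability."""
    return (
        0.5 * norm_reviews
        + 0.25 * (1 - subjectivity)
        + 0.25 * (1 - polarity_std)
    )

# Hypothetical inputs: the most-reviewed venue (norm_reviews = 1.0),
# moderately subjective reviews, and fairly consistent polarity
example = reliability(norm_reviews=1.0, subjectivity=0.5, polarity_std=0.2)
print(round(example, 3))
```

Because review volume carries half of the weight, a venue with few reviews cannot reach a reliability above 0.5 under this weighting, however objective and consistent its reviews are.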


In [4]:
import requests

API_KEY = "AIzaSyBx4aO-veYZnLZKiE_iVZdFP6Emi07sNq4"
location = f"{latitude},{longitude}"  # Conrad Hotel
radius = 1000
type_ = "restaurant"
keyword = "Italian"

url = (
    f"https://maps.googleapis.com/maps/api/place/nearbysearch/json"
    f"?location={location}&radius={radius}&type={type_}&keyword={keyword}&key={API_KEY}"
)

response = requests.get(url)
places_data = response.json()

# Display top 5 results
for place in places_data.get("results", [])[:5]:
    print(f"{place['name']}, Rating: {place.get('rating', 'N/A')} ({place.get('user_ratings_total', 0)} reviews)")

Piccola Cucina Osteria Siciliana, Rating: 4.6 (2241 reviews)
A Pasta Bar, Rating: 4.2 (1516 reviews)
Serafina Tribeca, Rating: 4.3 (1278 reviews)
Eataly, Rating: 4.2 (5911 reviews)
Mamo, Rating: 4.4 (733 reviews)
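
The request URL in the cell above is assembled with an f-string, which leaves keywords containing spaces or special characters unescaped. As a hedged alternative, the sketch below builds the same query string with `urllib.parse.urlencode` and reads the key from an environment variable instead of hard-coding it. The helper name `build_nearby_url`, the variable name `GOOGLE_PLACES_API_KEY`, and the placeholder key are assumptions for this example; no request is sent.

```python
import os
from urllib.parse import urlencode, urlsplit, parse_qs

NEARBY_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"

def build_nearby_url(lat, lng, radius_m, keyword, api_key, place_type="restaurant"):
    """Return a Nearby Search URL with all query parameters percent-encoded."""
    params = {
        "location": f"{lat},{lng}",
        "radius": radius_m,
        "type": place_type,
        "keyword": keyword,
        "key": api_key,
    }
    return f"{NEARBY_URL}?{urlencode(params)}"

# Hypothetical environment variable; falls back to a placeholder if unset
api_key = os.environ.get("GOOGLE_PLACES_API_KEY", "YOUR_KEY_HERE")
url = build_nearby_url(40.7151482, -74.0156573, 1000, "Italian pizza", api_key)

# Round-trip the query string to confirm the keyword survives encoding
print(parse_qs(urlsplit(url).query)["keyword"])
```

With `requests`, the same effect can be had by passing the dictionary as `params=` to `requests.get`, as the Foursquare cell earlier in the notebook already does.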


In [23]:
import requests
from textblob import TextBlob
import pandas as pd
import time
from geopy.distance import geodesic

API_KEY = "AIzaSyBx4aO-veYZnLZKiE_iVZdFP6Emi07sNq4"
latitude = 40.7151482
longitude = -74.0156573
location = f"{latitude},{longitude}"
radius = 1000
type_ = "restaurant"
keyword = "Italian"

# Step 1: Get nearby restaurants
url = (
    f"https://maps.googleapis.com/maps/api/place/nearbysearch/json"
    f"?location={location}&radius={radius}&type={type_}&keyword={keyword}&key={API_KEY}"
)
response = requests.get(url)
places_data = response.json()

sentiment_results = []

# Step 2: Loop through top 5 places
for place in places_data.get("results", [])[:5]:
    name = place["name"]
    place_id = place["place_id"]
    rating = place.get("rating")  # None (not "N/A") keeps the column numeric for normalization
    review_count = place.get("user_ratings_total", 0)

    # Step 3: Get reviews
    details_url = (
        f"https://maps.googleapis.com/maps/api/place/details/json"
        f"?place_id={place_id}&fields=reviews&key={API_KEY}"
    )
    details_response = requests.get(details_url)

    # The Details endpoint typically returns at most five reviews per place,
    # so the [:100] slice is only an upper bound
    reviews = details_response.json().get("result", {}).get("reviews", [])[:100]

    for review in reviews:
        text = review.get("text", "")
        if text:
            blob = TextBlob(text)
            polarity = blob.polarity
            subjectivity = blob.subjectivity
            sentiment = "Positive" if polarity > 0 else "Negative" if polarity < 0 else "Neutral"

            sentiment_results.append({
                "Restaurant": name,
                "Rating": rating,
                "Total Reviews": review_count,
                "Review": text,
                "Sentiment": sentiment,
                "Polarity": polarity,
                "Subjectivity": subjectivity
            })

    time.sleep(1)

# Step 4: Create and display table
df = pd.DataFrame(sentiment_results)
df_summary = df.groupby("Restaurant").agg({
    "Rating": "first",
    "Total Reviews": "first",
    "Polarity": "mean",
    "Subjectivity": "mean",
    "Sentiment": lambda x: x.value_counts().idxmax()  # Most common sentiment
}).reset_index()

# Calculate standard deviation of polarity
polarity_std = df.groupby("Restaurant")["Polarity"].std().reset_index(name="Polarity_STD")
df_summary = df_summary.merge(polarity_std, on="Restaurant")

# Normalize components for reliability
df_summary["Norm_Reviews"] = (df_summary["Total Reviews"] - df_summary["Total Reviews"].min()) / \
                             (df_summary["Total Reviews"].max() - df_summary["Total Reviews"].min())
df_summary["Norm_Subjectivity"] = 1 - df_summary["Subjectivity"]
df_summary["Norm_Stability"] = 1 - df_summary["Polarity_STD"].fillna(0)

# Calculate reliability
df_summary["Reliability"] = (
    0.5 * df_summary["Norm_Reviews"] +
    0.25 * df_summary["Norm_Subjectivity"] +
    0.25 * df_summary["Norm_Stability"]
)

# Map each restaurant name to its (lat, lng) so the distance to the hotel can be computed
coordinates = {}
for place in places_data.get("results", [])[:5]:
    loc = place["geometry"]["location"]
    coordinates[place["name"]] = (loc["lat"], loc["lng"])

df_summary["Distance_km"] = df_summary["Restaurant"].apply(lambda x: geodesic((latitude, longitude), coordinates[x]).km)

# Normalize relevant features (higher is better unless specified)
df_summary["Norm_Rating"] = (df_summary["Rating"] - df_summary["Rating"].min()) / (df_summary["Rating"].max() - df_summary["Rating"].min())
df_summary["Norm_Reviews"] = (df_summary["Total Reviews"] - df_summary["Total Reviews"].min()) / (df_summary["Total Reviews"].max() - df_summary["Total Reviews"].min())
df_summary["Norm_Polarity"] = (df_summary["Polarity"] - df_summary["Polarity"].min()) / (df_summary["Polarity"].max() - df_summary["Polarity"].min())
df_summary["Norm_Subjectivity"] = 1 - df_summary["Subjectivity"]  # lower subjectivity is better
df_summary["Norm_Distance"] = 1 - (df_summary["Distance_km"] - df_summary["Distance_km"].min()) / (df_summary["Distance_km"].max() - df_summary["Distance_km"].min())

# Compute Overall Score
df_summary["Overall Score"] = (
    0.2 * df_summary["Norm_Rating"] +
    0.15 * df_summary["Norm_Reviews"] +
    0.15 * df_summary["Norm_Polarity"] +
    0.15 * df_summary["Norm_Subjectivity"] +
    0.2 * df_summary["Reliability"] +
    0.15 * df_summary["Norm_Distance"]
)


# Drop intermediate columns including Polarity_STD
df_summary = df_summary.drop(columns=[col for col in df_summary.columns if 'Norm' in col or col == "Polarity_STD"])
df_summary

|   | Restaurant | Rating | Total Reviews | Polarity | Subjectivity | Sentiment | Reliability | Distance_km | Overall Score |
| - | ---------- | ------ | ------------- | -------- | ------------ | --------- | ----------- | ----------- | ------------- |
| 0 | A Pasta Bar | 4.2 | 1516 | 0.304397 | 0.604344 | Positive | 0.406271 | 1.270949 | 0.202089 |
| 1 | Eataly | 4.2 | 5911 | 0.41822 | 0.632532 | Positive | 0.779182 | 0.63593 | 0.650288 |
| 2 | Mamo | 4.4 | 733 | 0.324536 | 0.567296 | Positive | 0.31264 | 1.244216 | 0.29701 |
| 3 | Piccola Cucina Osteria Siciliana | 4.6 | 2241 | 0.351217 | 0.554022 | Positive | 0.495566 | 1.516069 | 0.471396 |
| 4 | Serafina Tribeca | 4.3 | 1278 | 0.417689 | 0.589224 | Positive | 0.366662 | 0.568536 | 0.500036 |
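
The overall score can be re-derived from the table using only the standard library, which is a useful sanity check on the weights (0.2 rating, 0.15 reviews, 0.15 polarity, 0.15 objectivity, 0.2 reliability, 0.15 proximity). The sketch below copies the table values and applies the same min-max normalization as the cell above. The `min_max` helper and its zero-range guard are assumptions added here; the notebook itself would produce NaN if all values in a column were equal.

```python
# (name, rating, total_reviews, polarity, subjectivity, reliability, distance_km)
rows = [
    ("A Pasta Bar", 4.2, 1516, 0.304397, 0.604344, 0.406271, 1.270949),
    ("Eataly", 4.2, 5911, 0.41822, 0.632532, 0.779182, 0.63593),
    ("Mamo", 4.4, 733, 0.324536, 0.567296, 0.31264, 1.244216),
    ("Piccola Cucina Osteria Siciliana", 4.6, 2241, 0.351217, 0.554022, 0.495566, 1.516069),
    ("Serafina Tribeca", 4.3, 1278, 0.417689, 0.589224, 0.366662, 0.568536),
]

def min_max(values):
    """Scale values to [0, 1]; return zeros if all values are equal (assumed guard)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

names = [r[0] for r in rows]
norm_rating = min_max([r[1] for r in rows])
norm_reviews = min_max([r[2] for r in rows])
norm_polarity = min_max([r[3] for r in rows])
norm_subjectivity = [1 - r[4] for r in rows]                    # lower subjectivity is better
reliability = [r[5] for r in rows]
norm_distance = [1 - d for d in min_max([r[6] for r in rows])]  # closer is better

overall = {
    name: 0.2 * nr + 0.15 * nv + 0.15 * npol + 0.15 * ns + 0.2 * rel + 0.15 * nd
    for name, nr, nv, npol, ns, rel, nd in zip(
        names, norm_rating, norm_reviews, norm_polarity,
        norm_subjectivity, reliability, norm_distance,
    )
}

# Print restaurants from highest to lowest overall score
for name, score in sorted(overall.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.6f}")
```

Note that review volume enters the score twice, once directly and once through reliability, so heavily reviewed venues are favoured more than the 0.15 weight alone suggests.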


Reading the table above:

* Higher is better: Rating, Polarity, Reliability, Total Reviews

* Lower is better: Distance, Subjectivity

Eataly scores the highest (0.650). It has the most reviews (5,911), the highest mean polarity (0.418), the highest reliability (0.779), and is the second-closest venue (0.64 km), making it a consistently trusted and popular choice. These strengths outweigh its joint-lowest rating (4.2) and the highest subjectivity in the group.

A Pasta Bar scores the lowest (0.202). It has the lowest mean polarity (0.304), the joint-lowest rating (4.2), the second-highest subjectivity (0.604), and is the second-farthest venue (1.27 km), with no standout strength to offset these factors.