## Note: MySQL OpenGIS for Nearest Neighbour Query / Proximity Search

The objective is to leverage MySQL's OpenGIS support for nearest-neighbour queries (NNQ).

This notebook demonstrates NNQ using the MySQL OpenGIS implementation.

**Query**: find the restaurants that serve $cuisine and are within X km of a location (Lat, Lon).

The following restaurants table is created in the test_spatial database:

```
CREATE TABLE restaurants (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    cuisine VARCHAR(63) NOT NULL,
    longitude FLOAT NOT NULL,
    latitude FLOAT NOT NULL,
    location POINT NOT NULL SRID 4326  -- 'NOT NULL SRID xyz' is needed for Spatial Index
);
```
Note that:
- MySQL requires the 'location' column to be 'NOT NULL' so that a spatial index can be defined on it.
- MySQL requires the 'SRID xyz' attribute on the column so that the query planner can use the spatial index when appropriate.



The spatial part of the query is tested with **MBRContains**, **MBRWithin(location, Polygon)**, and **ST_Within(location, ST_Buffer())**.

The query planner uses the spatial index for all three variations.

Queries with MBRContains or MBRWithin predicates require additional ST_Distance-based filtering, because the bounding box covers more area than the query circle.

The query is also tested after adding a B-Tree index on the 'cuisine' column.

We can conclude that a client can use a single query to retrieve paginated query results.
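
The pagination step itself is not shown in the cells of this notebook. The following is a minimal sketch of how one page could be requested, assuming PyMySQL-style `%s` placeholders and plain `LIMIT`/`OFFSET` paging. The helper name `build_paged_nnq` and its `page`/`page_size` parameters are hypothetical and not part of the original notebook.

```python
def build_paged_nnq(lat, lon, radius_m, cuisine, page=0, page_size=10):
    """Return (sql, params) for one page of results, nearest first.

    Hypothetical helper: table and column names follow the restaurants
    table above; the POINT text uses 'lat lon' order for SRID 4326.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    point_wkt = f"POINT({lat} {lon})"
    sql = (
        "SELECT id, name, cuisine, longitude, latitude, "
        "ST_Distance(location, ST_PointFromText(%s, 4326), 'metre') AS dist2NNQ "
        "FROM restaurants "
        "WHERE ST_Within(location, ST_Buffer(ST_GeomFromText(%s, 4326), %s)) "
        "AND cuisine = %s "
        "ORDER BY dist2NNQ, id "  # id breaks ties so page boundaries are stable
        "LIMIT %s OFFSET %s"
    )
    params = (point_wkt, point_wkt, radius_m, cuisine, page_size, page * page_size)
    return sql, params

# Third page (page index 2) of ten results each.
sql, params = build_paged_nnq(40.615121, -74.092853, 2000, "chinese", page=2, page_size=10)
print(params)
```

With an open PyMySQL cursor the pair would be passed as `cursor.execute(sql, params)`. Whether the planner still picks the spatial index for this form should be confirmed with EXPLAIN.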

## Setup

In [12]:
%pip install --user PyMySQL
%pip install --user pymysql[rsa]
%pip install --user cryptography

Note: you may need to restart the kernel to use updated packages.


In [1]:
import pymysql
import pandas as pd

In [2]:
def getConfigOrDefault(config_filename, config,label,default=None):
    if config.get(label) is not None:
        return config.get(label)
    if default is not None:
        return default
    print(f"{config_filename} file does not have {label} parameter!")
    return None

In [3]:
from dotenv import dotenv_values

config_filename = './mysql.cfg'
config = dotenv_values(config_filename)

USER_NAME = getConfigOrDefault(config_filename, config, 'MYSQL_UNAME')
USER_PWD  = getConfigOrDefault(config_filename, config, 'MYSQL_UPWD')
MYSQL_HOST = getConfigOrDefault(config_filename, config, 'MYSQL_HOST', 'localhost')
MYSQL_PORT = getConfigOrDefault(config_filename, config, 'MYSQL_PORT', '3306')
MYSQL_DB = getConfigOrDefault(config_filename, config, 'MYSQL_DBNAME', 'test_spatial')

if USER_NAME is None or USER_PWD is None:
    print(f"{config_filename} file does not have parameters: MYSQL_UNAME and/or MYSQL_UPWD!")
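
The check above can be generalised to any list of required keys. A small sketch, assuming the config is a plain mapping such as the one returned by `dotenv_values`; the helper name `missing_keys` and the example values are hypothetical.

```python
def missing_keys(config, required):
    """Return the required keys that are absent or empty in a config mapping."""
    return [key for key in required if not config.get(key)]

# Illustrative values only; a real mysql.cfg holds KEY=VALUE lines.
example_config = {"MYSQL_UNAME": "demo_user", "MYSQL_HOST": "localhost"}
missing = missing_keys(example_config, ["MYSQL_UNAME", "MYSQL_UPWD"])
if missing:
    print(f"mysql.cfg is missing: {', '.join(missing)}")
```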

In [8]:
import pymysql

# Database connection details
db_host = MYSQL_HOST      # IP Address
db_user = USER_NAME       # MySQL username
db_password = USER_PWD    # MySQL password
db_name = MYSQL_DB        # Database name

# Establish a persistent connection
conn = pymysql.connect(
    host=db_host,
    user=db_user,
    password=db_password,
    database=db_name,
    cursorclass=pymysql.cursors.DictCursor  # Fetch results as dictionaries
)

print("Connected to MySQL database")

Connected to MySQL database


In [28]:
query="""
CREATE TABLE restaurants (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    coordinates POINT
);
"""
response = None
try:
    with conn.cursor() as cursor:
        response = cursor.execute(query)
        conn.commit()  # Commit changes for INSERT/UPDATE/DELETE

except pymysql.MySQLError as e:
    print(f"Query Error: {e}")

print(response)


0


In [9]:
query="""
show tables;
"""
response = None
try:
    with conn.cursor() as cursor:
        cursor.execute(query)
        response = cursor.fetchall()
        conn.commit()  # Commit changes for INSERT/UPDATE/DELETE

except pymysql.MySQLError as e:
    print(f"Query Error: {e}")

print(response)

()


In [10]:
query="""
DROP TABLE restaurants;
"""
response = None
try:
    with conn.cursor() as cursor:
        response = cursor.execute(query)
        conn.commit()  # Commit changes

except pymysql.MySQLError as e:
    print(f"Query Error: {e}")

print(response)

Query Error: (1051, "Unknown table 'test_spatial.restaurants'")
None


In [11]:
response = conn.close()
print(response)

None


In [31]:
dml_insert="""
INSERT INTO restaurants (name, coordinates)
VALUES ('First Duck', ST_PointFromText('POINT(72.8777 19.0760)'));
"""
response = None
try:
    with conn.cursor() as cursor:
        response = cursor.execute(dml_insert)
        conn.commit()  # Commit changes for INSERT/UPDATE/DELETE

except pymysql.MySQLError as e:
    print(f"Query Error: {e}")

print(response)

1


In [12]:
# Helper Function
def execute_sqL_with_fetchX(sql_stmt, x=0):
    """ Executes a SQL statement on a fresh connection and optionally fetches rows.
        Not worried about performance: every call opens and closes its own connection.

    :param sql_stmt: SQL statement to execute
    :param x: if x=1, then calls fetchone(). If x=2, then calls fetchall().
              Any other value returns the result of cursor.execute().
    :return: fetched row(s), the result of cursor.execute(), or an error string
    """

    print(f"Executing SQL statement with fetchX-{sql_stmt,x}")
    response = None
    connection = None  # defined up front so the finally block is safe if connect() fails
    try:
        # Open a new connection to the database
        connection = pymysql.connect(
                host=db_host,
                user=db_user,
                password=db_password,
                database=db_name,
                cursorclass=pymysql.cursors.DictCursor  # Fetch results as dictionaries
        )

        # Create a cursor to perform database operations
        cursor = connection.cursor()
        # Execute the given SQL statement
        response = cursor.execute(sql_stmt)
        connection.commit()

        if x == 1:
            response = cursor.fetchone()
        elif x == 2:
            response = cursor.fetchall()

        cursor.close()

    except Exception as error:
        response = f"Error : {error}"
        print(response)
    finally:
        # Close the connection on both the success and the error path
        if connection is not None:
            connection.close()
            print(f"Connection to MySQL({MYSQL_HOST}:{MYSQL_PORT}) is closed")
    return response
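
The open, execute, fetch, close pattern of this helper can also be written with `contextlib.closing`, which closes the cursor and connection even when the statement raises. The sketch below is an illustration, not the notebook's helper: the name `run_sql`, the `connect` callable, and the `fetch` parameter are assumptions, and the demonstration uses the standard-library `sqlite3` driver only because it needs no server.

```python
import sqlite3
from contextlib import closing

def run_sql(connect, sql_stmt, fetch="none"):
    """Run one statement on a fresh DB-API connection and always close it.

    connect: zero-argument callable returning a DB-API connection
             (hypothetical parameter; with PyMySQL it could wrap pymysql.connect).
    fetch:   "one", "all", or "none" (in which case None is returned).
    """
    with closing(connect()) as connection:
        with closing(connection.cursor()) as cursor:
            cursor.execute(sql_stmt)
            if fetch == "one":
                result = cursor.fetchone()
            elif fetch == "all":
                result = cursor.fetchall()
            else:
                result = None
        connection.commit()
        return result

# sqlite3 stands in for MySQL here; it is not used elsewhere in this notebook.
rows = run_sql(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1", fetch="all")
print(rows)
```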

In [36]:
print(execute_sqL_with_fetchX("SELECT * FROM restaurants",2))

Executing SQL statement with fetchX-('SELECT * FROM restaurants', 2)
Connection to MySQL(127.0.0.1:5432) is closed
[{'id': 1, 'name': 'First Duck', 'coordinates': b'\x00\x00\x00\x00\x01\x01\x00\x00\x00\xc0\xec\x9e<,8R@\xfa~j\xbct\x133@'}, {'id': 2, 'name': 'First Duck', 'coordinates': b'\x00\x00\x00\x00\x01\x01\x00\x00\x00\xc0\xec\x9e<,8R@\xfa~j\xbct\x133@'}]


In [219]:
print(execute_sqL_with_fetchX("DROP TABLE IF EXISTS restaurants",0))

Executing SQL statement with fetchX-('DROP TABLE IF EXISTS restaurants', 0)
Connection to MySQL(127.0.0.1:5432) is closed
0


In [223]:
print(execute_sqL_with_fetchX("SHOW TABLES;",2))

Executing SQL statement with fetchX-('SHOW TABLES;', 2)
Connection to MySQL(127.0.0.1:5432) is closed
[{'Tables_in_test_spatial': 'restaurants'}]


## Create restaurants table

In [222]:
query="""
CREATE TABLE restaurants (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    cuisine VARCHAR(63) NOT NULL,
    longitude FLOAT NOT NULL,
    latitude FLOAT NOT NULL,
    location POINT NOT NULL SRID 4326  -- 'NOT NULL SRID xyz' is needed for Spatial Index
);
"""
print(execute_sqL_with_fetchX(query,0))

Executing SQL statement with fetchX-('\nCREATE TABLE restaurants (\n    id INT AUTO_INCREMENT PRIMARY KEY,\n    name VARCHAR(255) NOT NULL,\n    cuisine VARCHAR(63) NOT NULL,\n    longitude FLOAT NOT NULL,\n    latitude FLOAT NOT NULL,\n    location POINT NOT NULL SRID 4326\n);\n', 0)
Connection to MySQL(127.0.0.1:5432) is closed
0


## Load data from file to restaurants table

In [224]:
filename="./restaurants.csv"
df_loaded = pd.read_csv(filename)
df_loaded[:10]

Unnamed: 0,Name,Lon,Lat
0,Morris Park Bake Shop,-73.856077,40.848447
1,Wendy'S,-73.961704,40.662942
2,Riviera Caterer,-73.98242,40.579505
3,Tov Kosher Kitchen,-73.860115,40.731174
4,Brunos On The Boulevard,-73.880383,40.764312
5,Dj Reynolds Pub And Restaurant,-73.985136,40.767692
6,Wilken'S Fine Food,-73.906851,40.619903
7,Regina Caterers,-74.005289,40.628886
8,Taste The Tropics Ice Cream,-73.948261,40.640827
9,Kosher Island,-74.137729,40.611957


In [225]:
import random
def assignCuisineRandomly(df,cuisine,cname):
    for i in range(df.shape[0]):
        df.at[i,cname] = cuisine[random.randint(0,len(cuisine)-1)]

In [226]:
cuisine = ['italian', 'chinese', 'french', 'zambian', 'egyptian', 'canadian', 'mexican', 'vietnamese', 'cajun', 'korean', 'thai', 'brazilian','colombian','peruvian','ecuadorian', 'japanese','indian','malaysian','russian', 'indonesian']
assignCuisineRandomly(df_loaded,cuisine,'Cuisine')
df_loaded[-4:]

Unnamed: 0,Name,Lon,Lat,Cuisine
4996,Wagner College - Hawk' Nest,-74.092853,40.615121,vietnamese
4997,Ellen Deli & Grocery,-74.00781,40.725708,ecuadorian
4998,Crepes On Columbus,-73.961831,40.801052,thai
4999,Capital Grille,-73.974723,40.751244,mexican


Generates INSERT statement

In [227]:
def formatSQL(df):
    """
    Returns formatted SQL INSERT statement

    POINT(x y) where x is Latitude and y is Longitude, SRID 4326

    :param df: dictionary containing all attributes of a row
    :return: Formatted SQL INSERT statement
    """
    name = df['Name'].replace("'"," ")
    sql_stmt = f"INSERT INTO test_spatial.restaurants (name, cuisine, longitude, latitude, location) VALUES ('{name}','{df['Cuisine']}',{df['Lon']},{df['Lat']},ST_PointFromText('POINT({df['Lat']} {df['Lon']})', 4326));"

    return sql_stmt

In [228]:
#print(formatSQL(df_loaded[0]))
idx=4996
print(type(df_loaded.loc[idx].to_dict()))
df_loaded.loc[idx].to_dict()
print(formatSQL(df_loaded.loc[idx].to_dict()))

<class 'dict'>
INSERT INTO test_spatial.restaurants (name, cuisine, longitude, latitude, location) VALUES ('Wagner College - Hawk  Nest','vietnamese',-74.09285299999999,40.61512099999999,ST_PointFromText('POINT(40.61512099999999 -74.09285299999999)', 4326));
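
`formatSQL` builds the statement by string interpolation and replaces apostrophes in names with spaces. An alternative sketch, assuming PyMySQL's `%s` parameter substitution, lets the driver handle quoting so the name is stored unchanged. The helper name `insert_params` is hypothetical.

```python
def insert_params(row):
    """Return (sql, params) for one restaurant row (hypothetical helper)."""
    sql = (
        "INSERT INTO test_spatial.restaurants "
        "(name, cuisine, longitude, latitude, location) "
        "VALUES (%s, %s, %s, %s, ST_PointFromText(%s, 4326))"
    )
    point_wkt = f"POINT({row['Lat']} {row['Lon']})"  # latitude first for SRID 4326
    return sql, (row["Name"], row["Cuisine"], row["Lon"], row["Lat"], point_wkt)

row = {"Name": "Wagner College - Hawk' Nest", "Cuisine": "vietnamese",
       "Lon": -74.092853, "Lat": 40.615121}
sql, params = insert_params(row)
print(params)
# With an open PyMySQL cursor this would be run as: cursor.execute(sql, params)
```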


In [231]:
# Load data to test_spatial.restaurants table
# using ST_PointFromText('POINT($latitude $longitude)', 4326)
try:
    # Connect to an existing database
    connection = pymysql.connect(
        host=db_host,
        user=db_user,
        password=db_password,
        database=db_name,
        cursorclass=pymysql.cursors.DictCursor  # Fetch results as dictionaries
    )

    # Create a cursor to perform database operations
    cursor = connection.cursor()
    # Executing a SQL query
    cursor.execute("SELECT version();")
    # Fetch result
    record = cursor.fetchone()
    print("You are connected to - ", record, "\n")
    sql_stmt = ""
    for i in range(df_loaded.shape[0]):
        #addARestaurant(connection, df_loaded.loc[i].to_dict())
        row = df_loaded.loc[i]
        if row['Lat']>=-90 and row['Lat']<=90 and row['Lon']>-180 and row['Lon']<180:
            sql_stmt = formatSQL(df_loaded.loc[i].to_dict())
            #print(sql_stmt)
            cursor.execute(sql_stmt)
        else:
            print(f"Skipped {i} because of lon:{row['Lon']} lat:{row['Lat']}")

    connection.commit()

    #Closing the connection
    connection.close()

except (Exception) as error:
    print("Error: ", error)
finally:
    #if (connection):
    #cursor.close()
    #connection.close()
    print("MySQL connection is closed")

You are connected to -  {'version()': '8.0.44'} 

MySQL connection is closed


## Query

In [232]:
NNQ_LON = -74.092853
NNQ_LAT =  40.615121
NNQ_RADIUS = 2000
query = f"""
SELECT id, name
    , cuisine
    , longitude
    , latitude
    , ST_Distance(location, ST_PointFromText('POINT({NNQ_LAT} {NNQ_LON})',4326)) AS dist2NNQ
FROM test_spatial.restaurants
WHERE ST_Distance(location, ST_PointFromText('POINT({NNQ_LAT} {NNQ_LON})',4326),'metre') < {NNQ_RADIUS};
"""

explain_query = "Explain "+ query

In [233]:
reply = execute_sqL_with_fetchX(query,2)
print(type(reply))

Executing SQL statement with fetchX-("\nSELECT id, name\n    , cuisine\n    , longitude\n    , latitude\n    , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326)) AS dist2NNQ\nFROM test_spatial.restaurants\nWHERE ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') < 2000;\n", 2)
Connection to MySQL(127.0.0.1:5432) is closed
<class 'list'>


In [234]:
#print(reply)

df = pd.DataFrame(reply)
df = df.sort_values(by=['dist2NNQ'])
df[:10]

Unnamed: 0,id,name,cuisine,longitude,latitude,dist2NNQ
13,4997,Wagner College - Hawk Nest,vietnamese,-74.0928,40.6151,1.862645e-09
1,221,Roadhouse Restaurant,chinese,-74.1006,40.6134,680.5019
12,4595,Dunkin Donuts,egyptian,-74.1039,40.6165,945.5402
3,1345,Island Chateau,colombian,-74.0859,40.6012,1657.007
0,87,Labetti S Post # 2159,ecuadorian,-74.0744,40.6097,1670.551
8,4036,Garibaldi Deli Restaurant,zambian,-74.0768,40.6263,1840.276
10,4398,Vida,canadian,-74.0798,40.6287,1874.32
2,255,Jody S Club,egyptian,-74.1018,40.6308,1898.157
6,3582,The Black Dog Grill,indian,-74.0976,40.6319,1903.09
5,3283,Dock Street Bar & Grill,canadian,-74.0746,40.6252,1907.043


In [235]:
reply = execute_sqL_with_fetchX(explain_query,2)
#print(reply)

df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("Explain \nSELECT id, name\n    , cuisine\n    , longitude\n    , latitude\n    , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326)) AS dist2NNQ\nFROM test_spatial.restaurants\nWHERE ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') < 2000;\n", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
0,1,SIMPLE,restaurants,,ALL,,,,,4912,100.0,Using where


The execution scans the whole restaurants table: **type** is ALL, **possible_keys** is empty, and **rows** is the estimated table size (4912).

## Spatial Index

Let's test how a spatial index can help and whether the query planner uses it.

In [236]:
create_spatial_index = f"""
CREATE SPATIAL INDEX restaurants_spatial_idx ON test_spatial.restaurants (location) ;
"""
drop_spatial_index = "DROP INDEX restaurants_spatial_idx ON test_spatial.restaurants;"
show_indexes = "SHOW INDEXES FROM test_spatial.restaurants;"


In [237]:
reply = execute_sqL_with_fetchX(create_spatial_index,1)
print(reply)

Executing SQL statement with fetchX-('\nCREATE SPATIAL INDEX restaurants_spatial_idx ON test_spatial.restaurants (location) ;\n', 1)
Connection to MySQL(127.0.0.1:5432) is closed
None


In [238]:
reply = execute_sqL_with_fetchX(query,2)
print(type(reply))

Executing SQL statement with fetchX-("\nSELECT id, name\n    , cuisine\n    , longitude\n    , latitude\n    , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326)) AS dist2NNQ\nFROM test_spatial.restaurants\nWHERE ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') < 2000;\n", 2)
Connection to MySQL(127.0.0.1:5432) is closed
<class 'list'>


In [239]:
reply = execute_sqL_with_fetchX(show_indexes,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-('SHOW INDEXES FROM test_spatial.restaurants;', 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation,Cardinality,Sub_part,Packed,Null,Index_type,Comment,Index_comment,Visible,Expression
0,restaurants,0,PRIMARY,1,id,A,4912,,,,BTREE,,,YES,
1,restaurants,1,restaurants_spatial_idx,1,location,A,4912,32.0,,,SPATIAL,,,YES,


In [240]:
reply = execute_sqL_with_fetchX(explain_query,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("Explain \nSELECT id, name\n    , cuisine\n    , longitude\n    , latitude\n    , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326)) AS dist2NNQ\nFROM test_spatial.restaurants\nWHERE ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') < 2000;\n", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
0,1,SIMPLE,restaurants,,ALL,,,,,4912,100.0,Using where

Creating the spatial index does not change the plan: EXPLAIN still reports a full table scan (type ALL, no possible_keys). The ST_Distance(...) < radius predicate is not served by the spatial index here, so the queries below express the search area with MBRContains, MBRWithin, or ST_Within instead.


In [241]:
import math

def get_bounding_box(lat, lon, distance_m):
    # Approximate length of one degree of latitude, in meters
    lat_const = 111320.0

    # Calculate offsets
    delta_lat = distance_m / lat_const
    delta_lon = distance_m / (lat_const * math.cos(math.radians(lat)))

    return {
        "min_lat": lat - delta_lat,
        "max_lat": lat + delta_lat,
        "min_lon": lon - delta_lon,
        "max_lon": lon + delta_lon
    }

In [242]:
mbb = get_bounding_box(NNQ_LAT,NNQ_LON,NNQ_RADIUS)
print(mbb)


{'min_lat': 40.59715477650018, 'max_lat': 40.63308722349982, 'min_lon': -74.11652080993636, 'max_lon': -74.06918519006365}


In [243]:
# MBB : Minimum Bounding Box
#
def get_mbb_query(lat, lon, distance_m):
    """
    Returns a SQL query that filters by the MBB of the search area.

    MBRContains(g1, g2) checks whether the minimum bounding rectangle of g1
    contains the minimum bounding rectangle of g2. Here g1 is a LINESTRING
    between two opposite corners of the MBB and g2 is the location point.

    Because lat and lon are in degrees (SRID 4326) but distance_m is in meters,
    the offsets are computed with an approximate spherical-Earth conversion
    from meters to degrees.

    :param lat: latitude of the NNQ center
    :param lon: longitude of the NNQ center
    :param distance_m: search distance in meters
    :return: SQL query string
    """
    lon_const = 111320.0   ## meters per degree of longitude at the equator
    lat_const = 111133.0   ## meters per degree of latitude
    delta_lon = distance_m / (lon_const * math.cos(math.radians(lat)))
    delta_lat = distance_m / lat_const
    top_lon = lon + delta_lon
    top_lat = lat + delta_lat
    bottom_lon = lon - delta_lon
    bottom_lat = lat - delta_lat

    query_v2 = f"""
    SELECT id
        , name
        , cuisine
        , longitude
        , latitude
        , ST_Distance(location, ST_PointFromText('POINT({lat} {lon})',4326),'metre') AS dist2NNQ
    FROM test_spatial.restaurants
    WHERE MBRContains
            (   ST_GeomFromText('LINESTRING({top_lat} {top_lon},{bottom_lat} {bottom_lon})',4326),
                location
            )
    ORDER BY dist2NNQ;
    """
    return query_v2



In [244]:
query_v2 = get_mbb_query(NNQ_LAT,NNQ_LON,NNQ_RADIUS)
explain_query_v2 = "Explain "+ query_v2

In [245]:
print(explain_query_v2)

Explain 
    SELECT id
        , name
        , cuisine
        , longitude
        , latitude
        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ
    FROM test_spatial.restaurants
    WHERE MBRContains
            (   ST_GeomFromText('LINESTRING(40.63311745469843 -74.06918519006365,40.597124545301575 -74.11652080993636)',4326),
                location
            )
    ORDER BY dist2NNQ;
    


In [246]:
reply = execute_sqL_with_fetchX(query_v2,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("\n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    WHERE MBRContains\n            (   ST_GeomFromText('LINESTRING(40.63311745469843 -74.06918519006365,40.597124545301575 -74.11652080993636)',4326),\n                location\n            )\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,name,cuisine,longitude,latitude,dist2NNQ
0,4997,Wagner College - Hawk Nest,vietnamese,-74.0928,40.6151,1.862645e-09
1,221,Roadhouse Restaurant,chinese,-74.1006,40.6134,680.5019
2,4595,Dunkin Donuts,egyptian,-74.1039,40.6165,945.5402
3,1345,Island Chateau,colombian,-74.0859,40.6012,1657.007
4,87,Labetti S Post # 2159,ecuadorian,-74.0744,40.6097,1670.551
5,4036,Garibaldi Deli Restaurant,zambian,-74.0768,40.6263,1840.276
6,4398,Vida,canadian,-74.0798,40.6287,1874.32
7,255,Jody S Club,egyptian,-74.1018,40.6308,1898.157
8,3582,The Black Dog Grill,indian,-74.0976,40.6319,1903.09
9,3283,Dock Street Bar & Grill,canadian,-74.0746,40.6252,1907.043


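Results after row 13 are beyond the given query distance. This is expected: the Minimum Bounding Box is a superset of the query circle, so points near its corners are farther from the center than the radius.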
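
A quick check of how far a corner of the bounding box lies from the center makes this concrete. The sketch below uses the haversine formula on a spherical Earth, which is an approximation and will differ slightly from MySQL's ST_Distance; the function name `haversine_m` and the Earth radius value are assumptions of this example.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in meters on a spherical Earth (an approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

center = (40.615121, -74.092853)
corner = (40.63311745469843, -74.06918519006365)  # top corner used in the query above
corner_dist = haversine_m(*center, *corner)
print(f"corner distance: {corner_dist:.0f} m")  # well beyond the 2000 m radius
```

The corner is roughly the radius times the square root of two away from the center, so any restaurant in the corner regions passes the box test while failing the distance test.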

In [247]:
reply = execute_sqL_with_fetchX(explain_query_v2,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("Explain \n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    WHERE MBRContains\n            (   ST_GeomFromText('LINESTRING(40.63311745469843 -74.06918519006365,40.597124545301575 -74.11652080993636)',4326),\n                location\n            )\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
0,1,SIMPLE,restaurants,,range,restaurants_spatial_idx,restaurants_spatial_idx,34,,167,100.0,Using where; Using filesort


The query uses the R-Tree index, and the planner estimates that only 167 rows need to be examined (out of about 5000); see the **rows**, **possible_keys**, and **key** columns.

In [248]:
# Defines a rectangular bounding box (as a polygon) for the given NNQ query
def get_polygon_of_query(lat, lon, distance_m):
    """
    Returns a SQL query that uses the MBRWithin() function.

    MBRWithin(g1, g2) checks whether the MBR of g1 is within the MBR of g2;
    here g1 is the location point and g2 is a polygon around the search area.

    :param lat: Latitude, center of NNQ
    :param lon: Longitude, center of NNQ
    :param distance_m: distance in meters
    :return: Returns a SQL query that has the MBRWithin() function
    """
    # Convert the distance in meters to degrees of latitude and longitude (SRID 4326)
    lon_const = 111320.0   ## meters per degree of longitude at the equator
    lat_const = 111133.0   ## meters per degree of latitude
    delta_lon = distance_m / (lon_const * math.cos(math.radians(lat)))
    delta_lat = distance_m / lat_const

    top_lat = lat + delta_lat
    right_lon = lon + delta_lon

    bottom_lat = lat - delta_lat
    left_lon = lon - delta_lon

    query = f"""
    SELECT id
        , name
        , cuisine
        , longitude
        , latitude
        , ST_Distance(location, ST_PointFromText('POINT({lat} {lon})',4326),'metre') AS dist2NNQ
    FROM test_spatial.restaurants
    WHERE MBRWithin
            (   location,
                ST_GeomFromText('Polygon(({top_lat} {right_lon},{bottom_lat} {right_lon},{bottom_lat} {left_lon},{top_lat} {left_lon},{top_lat} {right_lon}))',4326)
            )
    ORDER BY dist2NNQ;
    """
    return query

print('✅ get_polygon_of_query() defined!')

✅ get_polygon_of_query() defined!


In [249]:
query_v3 = get_polygon_of_query(NNQ_LAT,NNQ_LON,NNQ_RADIUS)
explain_query_v3 = "Explain "+ query_v3

In [250]:
print(explain_query_v3)

Explain 
    SELECT id
        , name
        , cuisine
        , longitude
        , latitude
        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ
    FROM test_spatial.restaurants
    WHERE MBRWithin
            (   location,
                ST_GeomFromText('Polygon((40.63311745469843 -74.06918519006365,40.597124545301575 -74.06918519006365,40.597124545301575 -74.11652080993636,40.63311745469843 -74.11652080993636,40.63311745469843 -74.06918519006365))',4326)
            )
    ORDER BY dist2NNQ;
    


In [251]:
reply = execute_sqL_with_fetchX(query_v3,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("\n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    WHERE MBRWithin\n            (   location,\n                ST_GeomFromText('Polygon((40.63311745469843 -74.06918519006365,40.597124545301575 -74.06918519006365,40.597124545301575 -74.11652080993636,40.63311745469843 -74.11652080993636,40.63311745469843 -74.06918519006365))',4326)\n            )\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,name,cuisine,longitude,latitude,dist2NNQ
0,4997,Wagner College - Hawk Nest,vietnamese,-74.0928,40.6151,1.862645e-09
1,221,Roadhouse Restaurant,chinese,-74.1006,40.6134,680.5019
2,4595,Dunkin Donuts,egyptian,-74.1039,40.6165,945.5402
3,1345,Island Chateau,colombian,-74.0859,40.6012,1657.007
4,87,Labetti S Post # 2159,ecuadorian,-74.0744,40.6097,1670.551
5,4036,Garibaldi Deli Restaurant,zambian,-74.0768,40.6263,1840.276
6,4398,Vida,canadian,-74.0798,40.6287,1874.32
7,255,Jody S Club,egyptian,-74.1018,40.6308,1898.157
8,3582,The Black Dog Grill,indian,-74.0976,40.6319,1903.09
9,3283,Dock Street Bar & Grill,canadian,-74.0746,40.6252,1907.043


The returned results include additional points beyond the NNQ distance because the MBB, specified here as a polygon, covers more area than the query circle.
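
The additional ST_Distance filtering that the bounding-box queries need is not shown in the cells. One possible form, sketched here under the assumption that the distance predicate is simply added to the WHERE clause, is given below. The helper name `with_distance_filter` is hypothetical, and the polygon text is the one printed for query_v3 above.

```python
def with_distance_filter(lat, lon, radius_m, bbox_polygon_wkt):
    """Build an MBRWithin query that also discards rows outside the radius.

    Hypothetical helper: bbox_polygon_wkt is the polygon text already
    computed for the bounding box (as in get_polygon_of_query above).
    """
    center = f"ST_PointFromText('POINT({lat} {lon})',4326)"
    return f"""
    SELECT id, name, cuisine, longitude, latitude
        , ST_Distance(location, {center},'metre') AS dist2NNQ
    FROM test_spatial.restaurants
    WHERE MBRWithin(location, ST_GeomFromText('{bbox_polygon_wkt}',4326))
        AND ST_Distance(location, {center},'metre') <= {radius_m}
    ORDER BY dist2NNQ;
    """

bbox = ("Polygon((40.63311745469843 -74.06918519006365,"
        "40.597124545301575 -74.06918519006365,"
        "40.597124545301575 -74.11652080993636,"
        "40.63311745469843 -74.11652080993636,"
        "40.63311745469843 -74.06918519006365))")
query_v3_filtered = with_distance_filter(40.615121, -74.092853, 2000, bbox)
print(query_v3_filtered)
```

The intent is that the MBRWithin predicate narrows the candidates through the index and the distance predicate then removes the corner points. The resulting plan should be verified with EXPLAIN before relying on it.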

In [252]:
reply = execute_sqL_with_fetchX(explain_query_v3,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("Explain \n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    WHERE MBRWithin\n            (   location,\n                ST_GeomFromText('Polygon((40.63311745469843 -74.06918519006365,40.597124545301575 -74.06918519006365,40.597124545301575 -74.11652080993636,40.63311745469843 -74.11652080993636,40.63311745469843 -74.06918519006365))',4326)\n            )\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
0,1,SIMPLE,restaurants,,range,restaurants_spatial_idx,restaurants_spatial_idx,34,,167,100.0,Using where; Using filesort


In [253]:
# Returns a SQL query that uses the ST_Buffer() function
def get_circle_of_query(lat, lon, distance_m):
    """
    Returns a SQL query that uses the ST_Buffer() function.
    ST_Buffer() around a point returns a polygon that approximates a circle
    of the given distance around that point.

    :param lat: Latitude, center of NNQ
    :param lon: Longitude, center of NNQ
    :param distance_m: distance in meters
    :return: SQL query string
    """

    query = f"""
    SELECT id
        , name
        , cuisine
        , longitude
        , latitude
        , ST_Distance(location, ST_PointFromText('POINT({lat} {lon})',4326),'metre') AS dist2NNQ
    FROM test_spatial.restaurants
    USE INDEX(restaurants_spatial_idx)
    WHERE ST_Within
            (   location,
                ST_Buffer(ST_GeomFromText('POINT({lat} {lon})',4326),{distance_m})
            )
    ORDER BY dist2NNQ;
    """
    return query

print('✅ get_circle_of_query() defined!')

✅ get_circle_of_query() defined!


In [254]:
query_v4 = get_circle_of_query(NNQ_LAT,NNQ_LON,NNQ_RADIUS)
explain_query_v4 = "Explain "+ query_v4

In [255]:
print(explain_query_v4)

Explain 
    SELECT id
        , name
        , cuisine
        , longitude
        , latitude
        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ
    FROM test_spatial.restaurants
    USE INDEX(restaurants_spatial_idx)
    WHERE ST_Within
            (   location,
                ST_Buffer(ST_GeomFromText('POINT(40.615121 -74.092853)',4326),2000)
            )
    ORDER BY dist2NNQ;
    


In [256]:
reply = execute_sqL_with_fetchX(query_v4,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("\n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    USE INDEX(restaurants_spatial_idx)\n    WHERE ST_Within\n            (   location,\n                ST_Buffer(ST_GeomFromText('POINT(40.615121 -74.092853)',4326),2000)\n            )\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,name,cuisine,longitude,latitude,dist2NNQ
0,4997,Wagner College - Hawk Nest,vietnamese,-74.0928,40.6151,1.862645e-09
1,221,Roadhouse Restaurant,chinese,-74.1006,40.6134,680.5019
2,4595,Dunkin Donuts,egyptian,-74.1039,40.6165,945.5402
3,1345,Island Chateau,colombian,-74.0859,40.6012,1657.007
4,87,Labetti S Post # 2159,ecuadorian,-74.0744,40.6097,1670.551
5,4036,Garibaldi Deli Restaurant,zambian,-74.0768,40.6263,1840.276
6,4398,Vida,canadian,-74.0798,40.6287,1874.32
7,255,Jody S Club,egyptian,-74.1018,40.6308,1898.157
8,3582,The Black Dog Grill,indian,-74.0976,40.6319,1903.09
9,3283,Dock Street Bar & Grill,canadian,-74.0746,40.6252,1907.043


Query results do not contain any record beyond the given distance in NNQ.

In [257]:
reply = execute_sqL_with_fetchX(explain_query_v4,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("Explain \n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    USE INDEX(restaurants_spatial_idx)\n    WHERE ST_Within\n            (   location,\n                ST_Buffer(ST_GeomFromText('POINT(40.615121 -74.092853)',4326),2000)\n            )\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
0,1,SIMPLE,restaurants,,range,restaurants_spatial_idx,restaurants_spatial_idx,34,,167,100.0,Using where; Using filesort


The query uses the R-Tree (spatial) index; see the **key** column.
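
Reading the EXPLAIN tables by eye can be replaced with a small check on the fetched rows. A sketch, assuming the rows are dictionaries as returned by the DictCursor; the helper name `uses_index` is hypothetical, and the two sample plans are abridged copies of the EXPLAIN outputs shown in this notebook.

```python
def uses_index(explain_rows, index_name):
    """True if any EXPLAIN row reports index_name in its 'key' column."""
    return any(row.get("key") == index_name for row in explain_rows)

# Abridged from the EXPLAIN outputs above.
plan_st_distance = [{"table": "restaurants", "type": "ALL", "key": None, "rows": 4912}]
plan_st_within = [{"table": "restaurants", "type": "range",
                   "key": "restaurants_spatial_idx", "rows": 167}]

print(uses_index(plan_st_distance, "restaurants_spatial_idx"))  # False
print(uses_index(plan_st_within, "restaurants_spatial_idx"))    # True
```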

In [262]:
# Implementation of query with cuisine predicate
#
def get_query_with_circle_cuisine(lat, lon, distance_m,cuisine):
    """
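    Returns a SQL query that uses the ST_Buffer() function and filters by cuisine.

    Note: the cuisine value is interpolated into the SQL text as-is, so it
    must come from a trusted source.

    :param lat: Latitude, center of NNQ
    :param lon: Longitude, center of NNQ
    :param distance_m: distance in meters
    :param cuisine: cuisine name to match, e.g. 'chinese'
    :return: SQL query string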
    """
    query = f"""
    SELECT id
        , name
        , cuisine
        , longitude
        , latitude
        , ST_Distance(location, ST_PointFromText('POINT({lat} {lon})',4326),'metre') AS dist2NNQ
    FROM test_spatial.restaurants
    USE INDEX(restaurants_spatial_idx)
    WHERE ST_Within
            (   location,
                ST_Buffer(ST_GeomFromText('POINT({lat} {lon})',4326),{distance_m})
            )
            AND cuisine in ('{cuisine}')
    ORDER BY dist2NNQ;
    """
    return query

print('✅ get_query_with_circle_cuisine() defined!')

✅ get_query_with_circle_cuisine() defined!


In [263]:
query_v5 = get_query_with_circle_cuisine(NNQ_LAT,NNQ_LON,NNQ_RADIUS,'chinese')
explain_query_v5 = "Explain "+ query_v5
print(explain_query_v5)

Explain 
    SELECT id
        , name
        , cuisine
        , longitude
        , latitude
        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ
    FROM test_spatial.restaurants
    USE INDEX(restaurants_spatial_idx)
    WHERE ST_Within
            (   location,
                ST_Buffer(ST_GeomFromText('POINT(40.615121 -74.092853)',4326),2000)
            )
            AND cuisine in ('chinese')
    ORDER BY dist2NNQ;
    


In [264]:
reply = execute_sqL_with_fetchX(query_v5,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("\n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    USE INDEX(restaurants_spatial_idx)\n    WHERE ST_Within\n            (   location,\n                ST_Buffer(ST_GeomFromText('POINT(40.615121 -74.092853)',4326),2000)\n            )\n            AND cuisine in ('chinese')\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,name,cuisine,longitude,latitude,dist2NNQ
0,221,Roadhouse Restaurant,chinese,-74.1006,40.6134,680.501933
1,4245,Cafe Milano,chinese,-74.1029,40.6311,1963.934565
2,2272,Afternoone S Restaurant,chinese,-74.1033,40.631,1972.94682


Query results contain records satisfying the NNQ and cuisine predicate.

In [265]:
reply = execute_sqL_with_fetchX(explain_query_v5,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("Explain \n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    USE INDEX(restaurants_spatial_idx)\n    WHERE ST_Within\n            (   location,\n                ST_Buffer(ST_GeomFromText('POINT(40.615121 -74.092853)',4326),2000)\n            )\n            AND cuisine in ('chinese')\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
0,1,SIMPLE,restaurants,,range,restaurants_spatial_idx,restaurants_spatial_idx,34,,167,10.0,Using where; Using filesort


## Adding B-Tree index

In [266]:
create_index_cuisine = """
CREATE INDEX restaurants_idx_cuisine ON test_spatial.restaurants (cuisine) ;
"""
drop_index_cuisine = "DROP INDEX restaurants_idx_cuisine ON test_spatial.restaurants;"


In [267]:
reply = execute_sqL_with_fetchX(create_index_cuisine,1)
print(reply)

Executing SQL statement with fetchX-('\nCREATE INDEX restaurants_idx_cuisine ON test_spatial.restaurants (cuisine) ;\n', 1)
Connection to MySQL(127.0.0.1:5432) is closed
None


In [268]:
reply = execute_sqL_with_fetchX(show_indexes,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-('SHOW INDEXES FROM test_spatial.restaurants;', 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation,Cardinality,Sub_part,Packed,Null,Index_type,Comment,Index_comment,Visible,Expression
0,restaurants,0,PRIMARY,1,id,A,4912,,,,BTREE,,,YES,
1,restaurants,1,restaurants_spatial_idx,1,location,A,4912,32.0,,,SPATIAL,,,YES,
2,restaurants,1,restaurants_idx_cuisine,1,cuisine,A,20,,,,BTREE,,,YES,


In [None]:
# Cleanup: drops the cuisine index again (this cell was not executed in this run)
reply = execute_sqL_with_fetchX(drop_index_cuisine,1)
print(reply)

In [269]:
reply = execute_sqL_with_fetchX(query_v5,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("\n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    USE INDEX(restaurants_spatial_idx)\n    WHERE ST_Within\n            (   location,\n                ST_Buffer(ST_GeomFromText('POINT(40.615121 -74.092853)',4326),2000)\n            )\n            AND cuisine in ('chinese')\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,name,cuisine,longitude,latitude,dist2NNQ
0,221,Roadhouse Restaurant,chinese,-74.1006,40.6134,680.501933
1,4245,Cafe Milano,chinese,-74.1029,40.6311,1963.934565
2,2272,Afternoone S Restaurant,chinese,-74.1033,40.631,1972.94682


In [270]:
reply = execute_sqL_with_fetchX(explain_query_v5,2)
df1 = pd.DataFrame(reply)
df1

Executing SQL statement with fetchX-("Explain \n    SELECT id\n        , name\n        , cuisine\n        , longitude\n        , latitude\n        , ST_Distance(location, ST_PointFromText('POINT(40.615121 -74.092853)',4326),'metre') AS dist2NNQ\n    FROM test_spatial.restaurants\n    USE INDEX(restaurants_spatial_idx)\n    WHERE ST_Within\n            (   location,\n                ST_Buffer(ST_GeomFromText('POINT(40.615121 -74.092853)',4326),2000)\n            )\n            AND cuisine in ('chinese')\n    ORDER BY dist2NNQ;\n    ", 2)
Connection to MySQL(127.0.0.1:5432) is closed


Unnamed: 0,id,select_type,table,partitions,type,possible_keys,key,key_len,ref,rows,filtered,Extra
0,1,SIMPLE,restaurants,,range,restaurants_spatial_idx,restaurants_spatial_idx,34,,167,5.0,Using where; Using filesort
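
Because the statement carries `USE INDEX(restaurants_spatial_idx)`, MySQL considers only that index, so the new B-Tree index cannot change the access path of this query. To see what did change, the two EXPLAIN rows for `query_v5` (before and after creating `restaurants_idx_cuisine`) can be compared column by column. A minimal sketch, with both rows copied from the outputs above as plain dicts; the helper name `diff_explain_rows` is hypothetical.

```python
def diff_explain_rows(before, after):
    """Return {column: (before_value, after_value)} for columns that differ."""
    return {
        col: (before[col], after.get(col))
        for col in before
        if before[col] != after.get(col)
    }

# Values copied from the EXPLAIN output before the cuisine index existed
# (empty cells are represented as None).
before_index = {
    "select_type": "SIMPLE", "table": "restaurants", "partitions": None,
    "type": "range", "possible_keys": "restaurants_spatial_idx",
    "key": "restaurants_spatial_idx", "key_len": "34", "ref": None,
    "rows": 167, "filtered": 10.0, "Extra": "Using where; Using filesort",
}
# Values copied from the EXPLAIN output after creating restaurants_idx_cuisine.
after_index = {
    "select_type": "SIMPLE", "table": "restaurants", "partitions": None,
    "type": "range", "possible_keys": "restaurants_spatial_idx",
    "key": "restaurants_spatial_idx", "key_len": "34", "ref": None,
    "rows": 167, "filtered": 5.0, "Extra": "Using where; Using filesort",
}

print(diff_explain_rows(before_index, after_index))
```

Only `filtered` differs (10.0 versus 5.0). One plausible reading, not verified here, is that the optimizer now uses the statistics of the cuisine index (cardinality 20, roughly 1/20 = 5%) to estimate the selectivity of the cuisine predicate.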


# Summary

MySQL implements spatial indexes as R-Trees.

This notebook compared several spatial predicates, and the query planner used the spatial index for all of them.
 - MBRContains: the match is made against a minimum bounding rectangle (MBR), so the result can include points that lie beyond the query distance; an additional ST_Distance filter is required.
 - MBRWithin: also matches against the MBR, so it needs the same additional ST_Distance filter.
 - ST_Within with [ST_Buffer](https://dev.mysql.com/doc/refman/8.4/en/spatial-operator-functions.html): returns only the points within the given query distance. Note that 'point_circle' is the default point strategy of ST_Buffer.
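
To make the MBR overshoot concrete, the following sketch estimates how far an MBR match can lie from the query point. It assumes the MBR is the square that circumscribes the search circle and uses a flat (planar) approximation, which is reasonable for a radius of a few kilometres; the helper name `mbr_overshoot` is hypothetical.

```python
import math

def mbr_overshoot(radius_m):
    """Planar approximation for a square MBR circumscribing a circle of radius_m."""
    # Distance from the centre to a corner of the square: r * sqrt(2)
    corner_distance = math.hypot(radius_m, radius_m)
    # Share of the square's area that lies outside the circle: 1 - pi/4
    outside_fraction = 1 - math.pi / 4
    return corner_distance, outside_fraction

corner, outside = mbr_overshoot(2000)
print(f"farthest MBR point: {corner:.0f} m, MBR area outside circle: {outside:.1%}")
```

For the 2000 m radius used in this notebook, a point in a corner of the MBR can be about 2828 m from the query point, which is why the MBR-based queries need the extra ST_Distance filter.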


## Appendix

In [None]:
import math

def normalize_coordinates(lat, lon):
    """
    Wraps longitude to [-180, 180) and latitude to [-90, 90].
    If latitude wraps over a pole, the longitude is flipped by 180 degrees.
    """

    # 1. Wrap longitude into [-180, 180)
    # Formula: ((x + 180) % 360) - 180
    lon = ((lon + 180) % 360) - 180

    # 2. Wrap latitude into [-180, 180) first.
    # Python's % with a positive divisor never returns a negative value,
    # so no extra sign correction is needed.
    lat = ((lat + 180) % 360) - 180

    # Latitude reflects back from the poles: crossing a pole mirrors the
    # latitude and moves the point to the opposite meridian.
    if lat > 90:
        lat = 180 - lat
        lon += 180
    elif lat < -90:
        lat = -180 - lat
        lon += 180

    # 3. Final longitude wrap (in case the latitude flip pushed lon out of bounds)
    lon = ((lon + 180) % 360) - 180

    return round(lat, 6), round(lon, 6)


# --- Test Cases ---
# Integer inputs produce integer outputs, because round() keeps ints as ints.
print(f"Standard Point: {normalize_coordinates(40, -74)}")  # (40, -74)
print(f"Past Antimeridian: {normalize_coordinates(10, 190)}")  # (10, -170)
print(f"Past North Pole: {normalize_coordinates(100, 0)}")  # (80, -180)
print(f"Multiple Revolutions: {normalize_coordinates(10, 730)}")  # (10, 10)

In [None]:
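# Count rows with in-range coordinates and report the rest
valid_count = 0
for i, row in df_loaded.iterrows():
    # Accepted ranges: latitude in [-90, 90], longitude in (-180, 180)
    if -90 <= row['Lat'] <= 90 and -180 < row['Lon'] < 180:
        valid_count += 1
    else:
        # Show the out-of-range row and what normalization would produce
        print(f"{i} lon:{row['Lon']} lat:{row['Lat']}")
        print(normalize_coordinates(row['Lat'], row['Lon']))

print(f"valid rows: {valid_count} of {len(df_loaded)}")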
