## Snowflake provides the following data types for geospatial data:

### The [GEOGRAPHY](https://docs.snowflake.com/en/sql-reference/data-types-geospatial#geography-data-type) data type, which models Earth as though it were a perfect sphere.
#### Points on the earth are represented as degrees of longitude (from -180 degrees to +180 degrees) and latitude (-90 to +90). Snowflake uses 14 decimal places to store GEOGRAPHY coordinates.

### The [GEOMETRY](https://docs.snowflake.com/en/sql-reference/data-types-geospatial#geometry-data-type) data type, which represents features in a planar (Euclidean, Cartesian) coordinate system.
#### The coordinates are represented as pairs of real numbers (x, y). Currently, only 2D coordinates are supported.
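
To make the difference between the two models concrete, here is a minimal sketch in plain Python (it does not call Snowflake). It compares a great-circle distance on a sphere, which is how a GEOGRAPHY-style model measures, with a straight-line distance in a flat plane, which is how a GEOMETRY-style model measures. The Earth radius, the function names, and the two sample coordinates are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch


def spherical_distance_km(lon1, lat1, lon2, lat2):
    """Great-circle (haversine) distance between two lon/lat points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_distance(x1, y1, x2, y2):
    """Euclidean distance in the units of the inputs (degrees, if given lon/lat)."""
    return math.hypot(x2 - x1, y2 - y1)


# Two hypothetical points, roughly inside New York City
a = (-73.9857, 40.7484)
b = (-73.8648, 40.8448)

print(spherical_distance_km(*a, *b))  # kilometres along the sphere
print(planar_distance(*a, *b))        # degrees in a flat plane
```

The two results are not comparable numbers: one is in kilometres on a curved surface, the other is in the raw coordinate units of a flat plane. That is the practical reason to choose the data type deliberately.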

# Analyze New York City Small Business Data Using Snowflake Geospatial Functions.
## Goals
### 1. Convert LATITUDE & LONGITUDE in FLOAT to GEOGRAPHY Data Type. 
### 2. Aggregate all the small business locations for a selected sales territory.
### 3. Create a minimum bounding box (Envelope) that encompasses the small businesses in a territory. 
### 4. Display the Box and the small businesses located within that box. 
## Steps:
### 1. Create GEOGRAPHY Data Type for Each New York City Small Business Using [ST_MAKEPOINT](https://docs.snowflake.com/en/sql-reference/functions/st_makepoint)
### 2. Aggregate the GEOGRAPHY Points for the Sales Territory of Your Choice (for example, Bronx) Using [ST_COLLECT](https://docs.snowflake.com/en/sql-reference/functions/st_collect)
### 3. Convert to GEOMETRY Data Type for Easy Envelope (Box) Creation Using [TO_GEOMETRY](https://docs.snowflake.com/en/sql-reference/functions/to_geometry)
### 4. Create Envelope (minimum bounding box) With the GEOMETRY Object Using [ST_ENVELOPE](https://docs.snowflake.com/en/sql-reference/functions/st_envelope)
### 5. Find the Center of the Envelope for Proper Positioning on the Map Using [ST_CENTROID](https://docs.snowflake.com/en/sql-reference/functions/st_centroid)
### 6. Layer the Envelope & Bronx Small Business GEOGRAPHY Points on the Map Using PyDeck.  

## Create NYC Small Business Table from the CSV file. 
### You can download the CSV file [here](https://github.com/rrprasan/Finance/tree/main/Snowflake/Notebooks/Miscellaneous_Topics/NYC_Small_Business_Geospatial_Analysis).
### [Follow these instructions](https://github.com/rrprasan/Finance/blob/main/Snowflake/Notebooks/Miscellaneous_Topics/NYC_Small_Business_Geospatial_Analysis/ReadMe.md) to create the Small Business table from the CSV file using Snowflake Snowsight. 
### After creating the table and loading the data, come back to this Notebook to work through the geospatial example.

## Why should we convert to [GEOGRAPHY](https://docs.snowflake.com/en/sql-reference/data-types-geospatial#geography-data-type) data type in Snowflake? 
### If you have geospatial data (for example, longitude and latitude data, WKT, WKB, GeoJSON, and so on), Snowflake suggests converting and storing this data in GEOGRAPHY columns, rather than keeping the data in their original formats in VARCHAR, VARIANT or NUMBER columns. 
### Storing your data in GEOGRAPHY columns can significantly improve the *performance of queries* that use geospatial functionality.
(Source: [Snowflake Documentation](https://docs.snowflake.com/en/sql-reference/data-types-geospatial))

## For this learning exercise, we are selecting only the businesses that have latitude and longitude in their data.
- ### We will create a view in Snowflake to only select the NYC Small Businesses with LATITUDE and LONGITUDE values in the table. 
- ### The [Selection](https://en.wikipedia.org/wiki/Selection_%28relational_algebra%29) in our query is restricted to NOT NULL values for LATITUDE and LONGITUDE columns.
- ### One way to clean the data set is to use [Geocoding](https://en.wikipedia.org/wiki/Address_geocoding) to get the LATITUDE and LONGITUDE for each business. We will explore this in a later demo.
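
The selection is meant to keep only rows where both coordinates are present. The sketch below, in plain Python with made-up rows, shows why that calls for combining the two NOT NULL checks with AND rather than OR. The vendor names and coordinate values are hypothetical.

```python
# Hypothetical rows; None stands in for SQL NULL
rows = [
    {"VENDOR_FORMAL_NAME": "Vendor A", "LATITUDE": 40.85, "LONGITUDE": -73.87},
    {"VENDOR_FORMAL_NAME": "Vendor B", "LATITUDE": None, "LONGITUDE": -73.95},
    {"VENDOR_FORMAL_NAME": "Vendor C", "LATITUDE": None, "LONGITUDE": None},
]


def has_both_coordinates(row):
    """Equivalent of: LONGITUDE IS NOT NULL AND LATITUDE IS NOT NULL."""
    return row["LONGITUDE"] is not None and row["LATITUDE"] is not None


def has_any_coordinate(row):
    """Equivalent of: LONGITUDE IS NOT NULL OR LATITUDE IS NOT NULL."""
    return row["LONGITUDE"] is not None or row["LATITUDE"] is not None


both = [r["VENDOR_FORMAL_NAME"] for r in rows if has_both_coordinates(r)]
either = [r["VENDOR_FORMAL_NAME"] for r in rows if has_any_coordinate(r)]

print(both)    # ['Vendor A']
print(either)  # ['Vendor A', 'Vendor B']
```

With OR, Vendor B passes the filter even though it has no latitude, so it could not be turned into a usable point later.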

In [None]:
create or replace view NYC_SMALL_BUSINESS_VW(
	VENDOR_FORMAL_NAME,
	VENDOR_DBA,
	FIRST_NAME,
	LAST_NAME,
	TELEPHONE,
	BUSINESS_DESCRIPTION,
	CITY,
	BOROUGH,
	LATITUDE,
	LONGITUDE
) as
SELECT
    VENDOR_FORMAL_NAME,
	VENDOR_DBA,
	FIRST_NAME,
	LAST_NAME,
	TELEPHONE,
    BUSINESS_DESCRIPTION,
    CITY, 
    BOROUGH,
    LATITUDE,
    LONGITUDE
FROM
    NYC_SMALL_BUSINESS_TBL
WHERE
    LONGITUDE IS NOT NULL
AND
    LATITUDE IS NOT NULL;

## Test the View


In [None]:
SELECT
    *
FROM
    NYC_SMALL_BUSINESS_VW
LIMIT 10;

### How many small businesses do we have in the view?  

In [None]:
SELECT 
    COUNT(*)
FROM
    NYC_SMALL_BUSINESS_VW

### What are the [Boroughs in New York City](https://en.wikipedia.org/wiki/Boroughs_of_New_York_City)? 

In [None]:
SELECT
    DISTINCT BOROUGH
FROM
    NYC_SMALL_BUSINESS_VW;

### We are going to create a Streamlit application to show the businesses in each NYC borough on a map. 
- ### To accomplish this task, we need to do the following to the data:
#### 1. Convert the LATITUDE and LONGITUDE in FLOAT data type to GEOGRAPHY data type using ST_MAKEPOINT function. 
#### 2. We will then focus our attention on creating a [Minimum Bounding Box](https://en.wikipedia.org/wiki/Minimum_bounding_box) (Envelope)
#### 3. The Minimum Bounding Box will encompass all the businesses (aggregation) in a borough
    3.a. Aggregation of all Small Businesses in a Selected Territory. 
    3.b. Convert Aggregation From GEOGRAPHY to GEOMETRY Data Type.  
    3.c. Create Minimum Bounding Box (Envelope) Using the Created Geometry.
#### 4. We will explore all these geospatial features in SQL and then use Python to do the same and create a map. 

### [ST_MAKEPOINT](https://docs.snowflake.com/en/sql-reference/functions/st_makepoint) constructs a GEOGRAPHY object that represents a point with the specified longitude and latitude.
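
The argument order matters: longitude comes first, then latitude. The sketch below mimics that convention in plain Python by building a GeoJSON-style point. It is an illustration only, not Snowflake's implementation, and the names `in_range` and `make_point` are hypothetical.

```python
import json


def in_range(longitude, latitude):
    """True when the pair lies within the valid longitude/latitude ranges."""
    return -180 <= longitude <= 180 and -90 <= latitude <= 90


def make_point(longitude, latitude):
    """Build a GeoJSON-style point; note the (longitude, latitude) order."""
    if not in_range(longitude, latitude):
        raise ValueError(f"coordinates out of range: {longitude}, {latitude}")
    return {"type": "Point", "coordinates": [longitude, latitude]}


# Hypothetical location, roughly in the Bronx
point = make_point(-73.8648, 40.8448)
print(json.dumps(point))
```

Swapping the two arguments for a New York City location would not fail the range check, because both values are also valid in the other position. The point would simply land far from New York, so it is worth checking the order explicitly.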

In [None]:
SELECT 
    NYCSB.LONGITUDE,
    NYCSB.LATITUDE,
    ST_MAKEPOINT(NYCSB.LONGITUDE, NYCSB.LATITUDE) SMALL_BUSINESS_LOCATION,
    NYCSB.VENDOR_FORMAL_NAME, 
    NYCSB.BUSINESS_DESCRIPTION,
    NYCSB.CITY, 
    NYCSB.FIRST_NAME, 
    NYCSB.LAST_NAME, 
FROM NYC_SMALL_BUSINESS_VW NYCSB;

## Create a View With ST_MAKEPOINT for this Demo
### You can use this view as the input for practicing the next step. Or, you can use the query as a CTE ([Common Table Expression](https://docs.snowflake.com/en/user-guide/queries-cte)).
### The CTE method is also illustrated below.

In [None]:
CREATE OR REPLACE VIEW NYC_SMALL_BUSINESS_LAT_LON_MAKEPOINT_VW
AS
SELECT 
    NYCSB.LONGITUDE,
    NYCSB.LATITUDE,
    ST_MAKEPOINT(NYCSB.LONGITUDE, NYCSB.LATITUDE) SMALL_BUSINESS_LOCATION,
    NYCSB.VENDOR_FORMAL_NAME, 
    NYCSB.BUSINESS_DESCRIPTION,
    NYCSB.CITY, 
    NYCSB.BOROUGH,
    NYCSB.FIRST_NAME, 
    NYCSB.LAST_NAME, 
FROM NYC_SMALL_BUSINESS_VW NYCSB;

## Goal: Create Minimum Bounding Box (Envelope)
### Requirements:
#### 1. Aggregation of all Small Businesses in a Selected Territory. 
#### 2. Convert Aggregation From GEOGRAPHY to GEOMETRY Data Type.  
#### 3. Create Minimum Bounding Box (Envelope) Using the Created Geometry.

### We are going to see an example of aggregating all the businesses in the Bronx borough using [ST_COLLECT](https://docs.snowflake.com/en/sql-reference/functions/st_collect)

## Aggregate All Small Business Locations for Borough = Bronx using the view.  

In [None]:
SELECT
    ST_COLLECT(SMALL_BUSINESS_LOCATION) BRONX_SMALL_BUSINESS_POINTS,
FROM
   NYC_SMALL_BUSINESS_LAT_LON_MAKEPOINT_VW
WHERE
    BOROUGH ilike 'Bronx'

### Here's the CTE query as explained by [Snowflake Copilot](https://docs.snowflake.com/en/user-guide/snowflake-copilot)
- *Snowflake Copilot can be accessed by clicking on the "<" at the bottom right side of your Snowflake Notebook to open the Copilot Sidebar*.
- This query performs geospatial analysis on small businesses in the Bronx through two main steps:

- First, it creates a CTE that transforms latitude and longitude coordinates into spatial points using ST_MAKEPOINT for each business from the NYC_SMALL_BUSINESS_VW view, while also selecting relevant business information like name, description, and location details

- Then, the main query uses ST_COLLECT to aggregate all the spatial points of businesses in the Bronx (filtered case-insensitively using ILIKE) into a single geometry collection, effectively creating a spatial representation of all Bronx business locations

- The result is a single row containing a geometry collection of all small business locations in the Bronx borough, which could be useful for visualization or further spatial analysis.
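
The two steps described above can be sketched in plain Python to show the shape of the result: many rows in, one collection out. This is an illustration under assumptions, not Snowflake's implementation. The function name `collect_points`, the sample rows, and the choice of a GeoJSON `MultiPoint` as the output shape are all assumptions for this sketch.

```python
def collect_points(rows, borough):
    """Gather the points of all rows whose BOROUGH matches, ignoring case."""
    wanted = borough.casefold()
    coords = [
        row["POINT"]["coordinates"]
        for row in rows
        if row["BOROUGH"] is not None and row["BOROUGH"].casefold() == wanted
    ]
    return {"type": "MultiPoint", "coordinates": coords}


# Hypothetical rows, each already carrying a point built from lon/lat
rows = [
    {"BOROUGH": "Bronx", "POINT": {"type": "Point", "coordinates": [-73.86, 40.84]}},
    {"BOROUGH": "BRONX", "POINT": {"type": "Point", "coordinates": [-73.90, 40.82]}},
    {"BOROUGH": "Queens", "POINT": {"type": "Point", "coordinates": [-73.79, 40.73]}},
]

bronx = collect_points(rows, "bronx")
print(len(bronx["coordinates"]))  # 2
```

The case-insensitive comparison plays the role of `ILIKE 'Bronx'` without wildcards, so differently capitalized borough names still match.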

In [None]:
WITH NYC_SMALL_BUSINESS_LAT_LON_MAKEPOINT_CTE 
AS (
SELECT 
    NYCSB.LONGITUDE,
    NYCSB.LATITUDE,
    ST_MAKEPOINT(NYCSB.LONGITUDE, NYCSB.LATITUDE) SMALL_BUSINESS_LOCATION,
    NYCSB.VENDOR_FORMAL_NAME, 
    NYCSB.BUSINESS_DESCRIPTION,
    NYCSB.CITY, 
    NYCSB.BOROUGH,
    NYCSB.FIRST_NAME, 
    NYCSB.LAST_NAME, 
FROM NYC_SMALL_BUSINESS_VW NYCSB
)
SELECT
    ST_COLLECT(SMALL_BUSINESS_LOCATION) BRONX_SMALL_BUSINESS_POINTS,
FROM
   NYC_SMALL_BUSINESS_LAT_LON_MAKEPOINT_CTE
WHERE
    BOROUGH ilike 'Bronx'

### Convert the Geospatial Aggregation from GEOGRAPHY to GEOMETRY Data Type to create the Envelope ([Minimum Bounding Box](https://en.wikipedia.org/wiki/Minimum_bounding_box))
### Why convert from GEOGRAPHY to GEOMETRY Data Type?
- ST_ENVELOPE function has been deprecated for GEOGRAPHY objects. The use of this function with GEOGRAPHY objects will be obsoleted in a future release (TBD).

- As an alternative, for GEOGRAPHY objects, use ST_XMIN, ST_XMAX, ST_YMIN, and ST_YMAX to determine the vertices of the bounding box around an input GEOGRAPHY object.
- It is much easier to use the GEOMETRY Data Type to create the Envelope. So, the conversion becomes essential. 

In [None]:
WITH BRONX_SMALL_BUS_AGGR_CTE 
AS (
SELECT
    ST_COLLECT(SMALL_BUSINESS_LOCATION) BRONX_SMALL_BUSINESS_POINTS,
FROM
   NYC_SMALL_BUSINESS_LAT_LON_MAKEPOINT_VW
WHERE
    BOROUGH ilike 'Bronx'
)
SELECT 
    TO_GEOMETRY(BRONX_SMALL_BUSINESS_POINTS) BRONX_SMALL_BUSINESS_GEOMETRY
FROM
    BRONX_SMALL_BUS_AGGR_CTE

In [None]:
CREATE OR REPLACE VIEW NYC_SMALL_BUSINESS_BRONX_LAT_LON_AGGR_VW
AS 
SELECT
    ST_COLLECT(SMALL_BUSINESS_LOCATION) BRONX_SMALL_BUSINESS_POINTS,
FROM
   NYC_SMALL_BUSINESS_LAT_LON_MAKEPOINT_VW
WHERE
    BOROUGH ilike 'Bronx';

## Create a Minimum Bounding Box with ST_ENVELOPE SQL Function
### Query Explained by Snowflake Copilot:
- This query builds upon the previous one and performs two additional geospatial operations:

- First, it creates a CTE that converts the previously collected points (geometry collection of Bronx businesses) from GEOGRAPHY to GEOMETRY type using TO_GEOMETRY

- Then, it uses ST_ENVELOPE to compute the minimum bounding rectangle (MBR) that contains all the Bronx business locations. This creates a rectangular geometry that completely encompasses all business points in the Bronx, useful for understanding the geographical extent of business distribution

- The result is a single row containing a geometry object representing the smallest possible rectangle that contains all small business locations in the Bronx borough.
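
In a planar coordinate system, a minimum bounding rectangle is just the smallest and largest x and y values among the points. The sketch below shows that idea in plain Python. It is not Snowflake's implementation: the function name `envelope` is hypothetical, and the vertex order of the ring is an assumption that may differ from what ST_ENVELOPE returns.

```python
def envelope(coords):
    """Return a GeoJSON-style rectangle that encloses all (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    # A polygon ring is closed: the first vertex is repeated at the end
    ring = [
        [xmin, ymin],
        [xmax, ymin],
        [xmax, ymax],
        [xmin, ymax],
        [xmin, ymin],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


# Hypothetical business locations as (longitude, latitude) pairs
points = [[-73.90, 40.82], [-73.86, 40.84], [-73.88, 40.80]]
box = envelope(points)
print(box["coordinates"][0])
```

Every input point lies on or inside the rectangle, and no smaller axis-aligned rectangle would contain them all.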

In [None]:
WITH BRONX_SMALL_BUSINESS_LAT_LON_AGGR_CTE
AS
(SELECT 
    TO_GEOMETRY(BRONX_SMALL_BUSINESS_POINTS) BRONX_SMALL_BUSINESS_GEOMETRY
FROM
    NYC_SMALL_BUSINESS_BRONX_LAT_LON_AGGR_VW)
SELECT
    ST_ENVELOPE(BRONX_SMALL_BUSINESS_GEOMETRY) MIN_BOUNDING_BOX_BRONX
FROM
    BRONX_SMALL_BUSINESS_LAT_LON_AGGR_CTE;

### Store the GEOGRAPHY to GEOMETRY Conversion as a View.

In [None]:
CREATE OR REPLACE VIEW BRONX_SMALL_BUSINESS_GEOMETRY_VW
AS
SELECT 
    TO_GEOMETRY(BRONX_SMALL_BUSINESS_POINTS) BRONX_SMALL_BUSINESS_GEOMETRY
FROM
    NYC_SMALL_BUSINESS_BRONX_LAT_LON_AGGR_VW

In [None]:
SELECT
    ST_ENVELOPE(BRONX_SMALL_BUSINESS_GEOMETRY)
FROM
    BRONX_SMALL_BUSINESS_GEOMETRY_VW;

## Find the Center of the Minimum Bounding Box.
### This center point is used to position the map at the center of the box that encloses all of the Bronx's businesses.
### Query explanation from Snowflake Copilot:
- This query builds upon the previous one by finding the center point of the minimum bounding box that contains all Bronx business locations:

- First, it creates a CTE that gets the minimum bounding rectangle (from the previous query's result) using ST_ENVELOPE
- Then, it applies ST_CENTROID to calculate the geometric center point of that bounding box
- The result is a single point (geometry) representing the center of the rectangular area that encompasses all Bronx business locations, which could be useful for understanding the central location of business distribution in the Bronx.
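
For an axis-aligned rectangle in a plane, the center is the midpoint of the x range and the midpoint of the y range. The sketch below computes it in plain Python from a GeoJSON-style polygon. The function name `box_center` and the sample rectangle are hypothetical, and the code only illustrates the idea for rectangles. It is not a general centroid routine.

```python
def box_center(polygon):
    """Center of an axis-aligned rectangle given as a GeoJSON-style polygon."""
    ring = polygon["coordinates"][0]
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


# Hypothetical rectangle with corners (0, 0) and (4, 2)
rectangle = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [4, 0], [4, 2], [0, 2], [0, 0]]],
}

print(box_center(rectangle))  # (2.0, 1.0)
```

Note that this is the center of the box, not the average of the business locations. A cluster of businesses in one corner of the borough does not pull the center toward it.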


In [None]:
WITH BRONX_ENVELOPE_CTE AS(
SELECT
    ST_ENVELOPE(BRONX_SMALL_BUSINESS_GEOMETRY) MIN_BOUNDING_BOX_BRONX
FROM
    BRONX_SMALL_BUSINESS_GEOMETRY_VW
)
SELECT
    ST_CENTROID(MIN_BOUNDING_BOX_BRONX) BOUNDING_BOX_CENTER
FROM
    BRONX_ENVELOPE_CTE

### We will use the same functions (ST_MAKEPOINT, ST_COLLECT, ST_ENVELOPE, TO_GEOMETRY, ST_CENTROID) in Python to create a map of the small businesses that fall in a borough.

In [None]:
# Import python packages
import streamlit as st
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark.functions import *
from snowflake.snowpark.types import *
import json
import pandas as pd
import numpy as np
import pydeck as pdk

# Write directly to the app
st.title("New York City Small Business Location & Information")
st.write(
    """This app shows New York City small businesses on a map.
    """
)

# Get the current credentials
session = get_active_session()

### This Python code gets all the NYC Small Businesses into a Data Frame and uses the Map widget in Streamlit to visualize all the businesses.
### The Map widget in Streamlit is the easiest way to create a map.
### But we will go further: we will create a map using the PyDeck package and draw a Minimum Bounding Box on it.

In [None]:
NYC_Small_Business_latlon = session.table('DEMODB.EQUITY_RESEARCH."NYC_SMALL_BUSINESS_VW"')
# pdGeoNYCSBDF = pd.DataFrame(NYC_Small_Business_latlon)
st.markdown('#### A dataframe which shows small businesses in NYC')
st.dataframe(NYC_Small_Business_latlon)
st.map(NYC_Small_Business_latlon, latitude='LATITUDE', longitude='LONGITUDE')

### ST_MAKEPOINT, ST_COLLECT, ST_ENVELOPE, TO_GEOMETRY, TO_GEOGRAPHY Functions in a Python Data Frame.  
### The following code in Python visualizes all the businesses in New York City found in our view.  

In [None]:
#create a point from the coordinates
envelope = NYC_Small_Business_latlon.with_column('POINT',call_function('ST_MAKEPOINT',col('"LONGITUDE"'),col('"LATITUDE"')))

#collect all the points into one row of data
envelope = envelope.select(call_function('ST_COLLECT',col('POINT')).alias('POINTS'))

#convert from geography to geometry
envelope = envelope.select(to_geometry('POINTS').alias('POINTS'))


#create the smallest rectangle that borders and covers all of the points
envelope = envelope.select(call_function('ST_ENVELOPE',col('POINTS')).alias('BOUNDARY'))

#convert back to geography
envelope = envelope.select(to_geography('BOUNDARY').alias('BOUNDARY'))
envelope.collect()[0][0]

### Here you will visualize the New York City Minimum Bounding Box (Envelope) on a Map

In [None]:
#find the centre point so the map will render from that location

centre = envelope.with_column('CENTROID',call_function('ST_CENTROID',col('BOUNDARY')))
centre = centre.with_column('LON',call_function('ST_X',col('CENTROID')))
centre = centre.with_column('LAT',call_function('ST_Y',col('CENTROID')))

#create LON and LAT variables

centrepd = centre.select('LON','LAT').to_pandas()
LON = centrepd.LON.iloc[0]
LAT = centrepd.LAT.iloc[0]

#transform the data in pandas so the pydeck visualisation tool can view it as a polygon

envelopepd = envelope.to_pandas()
envelopepd["coordinates"] = envelopepd["BOUNDARY"].apply(lambda row: json.loads(row)["coordinates"][0])


#visualise on a map

#create a layer - this layer will visualise the rectangle

polygon_layer = pdk.Layer(
            "PolygonLayer",
            envelopepd,
            opacity=0.3,
            get_polygon="coordinates",
            filled=True,
            get_fill_color=[16, 14, 40],
            auto_highlight=True,
            pickable=False,
        )

 
#render the map
    
st.pydeck_chart(pdk.Deck(
    map_style=None,
    initial_view_state=pdk.ViewState(
        latitude=LAT,
        longitude=LON,
        zoom=8,
        height=400
        ),
    
layers= [polygon_layer]

))
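
The `coordinates` column built in the cell above depends on one small transformation: the boundary arrives as GeoJSON text, and the polygon layer needs the outer ring as a plain list of `[longitude, latitude]` pairs. The sketch below isolates that step in plain Python. The sample GeoJSON string and the function name `outer_ring` are hypothetical.

```python
import json

# Hypothetical GeoJSON text for a rectangular boundary
boundary = (
    '{"type": "Polygon", "coordinates": [[[-73.93, 40.80], [-73.78, 40.80], '
    '[-73.78, 40.91], [-73.93, 40.91], [-73.93, 40.80]]]}'
)


def outer_ring(geojson_text):
    """Parse GeoJSON polygon text and return its outer ring of [lon, lat] pairs."""
    # "coordinates" is a list of rings; index 0 is the outer ring
    return json.loads(geojson_text)["coordinates"][0]


ring = outer_ring(boundary)
print(len(ring))  # 5
print(ring[0])    # [-73.93, 40.8]
```

A rectangle has five entries in its ring because the first vertex is repeated at the end to close the shape.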

### We have the Envelope created. 
### We can now visualize all the NYC small businesses inside the Envelope. 
### You can zoom in and mouse over each business to get the business name and the description. 
### You can expand on this and make it more useful for your use case by adding details of your choice to the tooltip.
### Remember, we are using a view to look at small businesses with Latitude and Longitude in the original data published by the city. So, this is a map of the subset of the businesses in New York City.  

In [None]:
placespd = NYC_Small_Business_latlon.to_pandas()
poi_l = pdk.Layer(
            'ScatterplotLayer',
            data=placespd,
            get_position='[LONGITUDE, LATITUDE]',
            get_color='[255,255,255]',
            get_radius=30,
            pickable=True)
    
st.pydeck_chart(pdk.Deck(
    map_style=None,
    initial_view_state=pdk.ViewState(
        latitude=LAT,
        longitude=LON,
        zoom=10,
        height=500
        ),
    
layers= [polygon_layer, poi_l], tooltip = {'text':"Name: {VENDOR_FORMAL_NAME},\n Description: {BUSINESS_DESCRIPTION}, \n City: {CITY},\n Borough:{BOROUGH}"}

))

## Reset your database if you choose to remove this data.  

In [None]:
DROP VIEW BRONX_SMALL_BUSINESS_GEOMETRY_VW;

In [None]:
DROP VIEW NYC_SMALL_BUSINESS_BRONX_LAT_LON_AGGR_VW;

In [None]:
DROP VIEW NYC_SMALL_BUSINESS_LAT_LON_MAKEPOINT_VW;

In [None]:
DROP VIEW NYC_SMALL_BUSINESS_VW;

In [None]:
DROP TABLE NYC_SMALL_BUSINESS_TBL;