# Spatial Queries in Sedona


In this tutorial, we experiment with the geospatial predicates used in spatial join queries. To make the queries concrete, we use this [website](https://www.keene.edu/campus/maps/tool/) to get real coordinates.

For example, the polygon below represents Île-de-France (each line is a `longitude,latitude` pair, and the last point closes the ring):
```text
1.8814087,49.2265665
1.8099976,48.5884175
2.9347229,48.5820584
3.0528259,49.2068317
1.8814087,49.2265665
```

The polygon representing CASD:
```text
2.3065817,48.8204849
2.3063672,48.8177934
2.3113775,48.8177369
2.3114955,48.8205838
2.3065817,48.8204849
```

The polygon representing INSEE:
```text
2.3066783,48.8179488
2.3065925,48.8159283
2.3108518,48.8159566
2.3109269,48.8179559
2.3066783,48.8179488
```

The polygon representing the Paul-Brousse hospital:
```text
2.359668,48.7974719
2.3590672,48.7944258
2.3640883,48.7945106
2.3641849,48.7977687
2.359668,48.7974719
```


      
The coordinates of the Eiffel Tower:
```text
2.2949409,48.8579388
```

The coordinates of Bordeaux city hall:
```text
-0.574851,44.8453837
```
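The `longitude,latitude` listings above have to be rewritten as WKT before Sedona can parse them with `ST_GeomFromWKT`. The following is a minimal sketch of that conversion; `coords_to_wkt_polygon` is a hypothetical helper written for this tutorial, not a Sedona function, and it assumes one comma-separated pair per line.

```python
def coords_to_wkt_polygon(text: str) -> str:
    """Convert 'lon,lat' lines (as copied from the map tool) into a WKT POLYGON.

    Hypothetical helper for this tutorial, not part of Sedona.
    """
    points = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the pasted listing
        lon, lat = (float(value) for value in line.split(","))
        points.append((lon, lat))
    # a WKT polygon ring must be closed: first point == last point
    if len(points) < 4 or points[0] != points[-1]:
        raise ValueError("a polygon ring needs at least 4 points and must be closed")
    ring = ",".join(f"{lon} {lat}" for lon, lat in points)
    return f"POLYGON(({ring}))"


paul_brousse_text = """
2.359668,48.7974719
2.3590672,48.7944258
2.3640883,48.7945106
2.3641849,48.7977687
2.359668,48.7974719
"""
paul_brousse_wkt = coords_to_wkt_polygon(paul_brousse_text)
print(paul_brousse_wkt)
```

Note that WKT separates longitude and latitude with a space and separates points with a comma, which is the reverse of the listing format.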

Below is a list of predefined functions that implement geospatial predicates:

- **ST_Contains**(A: Geometry, B: Geometry): Returns true if A fully contains B.
- **ST_Crosses**(A: Geometry, B: Geometry): Returns true if A crosses B (e.g. check whether a line crosses a polygon).
- **ST_Disjoint**(A: Geometry, B: Geometry): Returns true if A and B are disjoint (e.g. check whether two polygons share no point).
- **ST_DWithin**(leftGeometry: Geometry, rightGeometry: Geometry, distance: Double, useSpheroid: Optional(Boolean) = false): Returns true if leftGeometry and rightGeometry are within the given distance of each other.
- **ST_Equals**(A: Geometry, B: Geometry): Returns true if A equals B (e.g. the two line strings LINESTRING(0 0,10 10) and LINESTRING(0 0,5 5,10 10) are equal).
- **ST_Intersects**(A: Geometry, B: Geometry): Returns true if A intersects B.
- **ST_OrderingEquals**(A: Geometry, B: Geometry): Returns true if the geometries are equal and their coordinates are in the same order.
- **ST_Overlaps**(A: Geometry, B: Geometry): Returns true if A overlaps B.
- **ST_Relate**(geom1: Geometry, geom2: Geometry, intersectionMatrix: String): Returns true if the DE-9IM relationship between geom1 and geom2 matches the given intersectionMatrix pattern.
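The `intersectionMatrix` argument of ST_Relate is a 9-character DE-9IM pattern. As a rough illustration of how such a pattern is matched against a relationship matrix, here is a pure-Python sketch. `matches_de9im` is a hypothetical name and this is not Sedona's implementation; the matrix `212101212` is the usual value for two partially overlapping polygons.

```python
def matches_de9im(matrix: str, pattern: str) -> bool:
    """Check a 9-character DE-9IM matrix against a pattern.

    Pattern symbols: 'T' = non-empty intersection (dimension 0, 1 or 2),
    'F' = empty intersection, '*' = anything, '0'/'1'/'2' = exact dimension.
    """
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("matrix and pattern must both have 9 characters")
    for actual, expected in zip(matrix.upper(), pattern.upper()):
        if expected == "*":
            continue  # wildcard: any value is accepted
        if expected == "T":
            if actual not in "012":
                return False
        elif actual != expected:
            return False
    return True


# DE-9IM matrix of two partially overlapping polygons
overlap_matrix = "212101212"
is_overlap = matches_de9im(overlap_matrix, "T*T***T**")   # "overlaps" pattern for two areas
is_disjoint = matches_de9im(overlap_matrix, "FF*FF****")  # "disjoint" pattern
print(is_overlap, is_disjoint)
```

Most of the named predicates in the list above correspond to one or more such patterns, so ST_Relate is the general tool when no named predicate fits.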

In [12]:
from pyspark.sql import DataFrame
from sedona.spark import SedonaContext
import geopandas as gpd
from pyspark.sql.functions import trim, col
from pathlib import Path

In [2]:
# get the project root dir
project_root_dir = Path.cwd().parent.parent

In [3]:
# collect the Sedona jars shipped with the project (sedona = 1.6.1)
jar_folder = project_root_dir / "jars" / "sedona-35-213-161"
jar_list = [str(jar) for jar in jar_folder.iterdir() if jar.is_file()]
jar_path = ",".join(jar_list)

# build the Spark session offline from the local jars
config = (
    SedonaContext.builder()
    .master("local[*]")
    .config("spark.jars", jar_path)
    .getOrCreate()
)

In [4]:
# create a sedona context
sedona = SedonaContext.create(config)
sc = sedona.sparkContext

In [5]:
# this sets the encoding of shape files
sc.setSystemProperty("sedona.global.charset", "utf8")

In [13]:
def evalSpaceJoinQuery(TargetQuery: str) -> DataFrame:
    """Run a single-expression SELECT and name its output column `result`."""
    inQuery = f"{TargetQuery} as result"
    return sedona.sql(inQuery)

## 1. ST_Contains VS ST_Within

- **ST_Contains**(A: Geometry, B: Geometry): Return true if A fully contains B.
- **ST_Within**(A: Geometry, B: Geometry): Return true if A is fully contained by B

In the code example below, we check whether:
- eiffel_tour (point) is within Île-de-France (polygon), i.e. Île-de-France contains eiffel_tour
- bordeaux city hall (point) is within Île-de-France (polygon), i.e. Île-de-France contains bordeaux
- casd (polygon) is within Île-de-France (polygon), i.e. Île-de-France contains casd
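To build some intuition for what "contains" means for a point and a polygon, here is a minimal pure-Python ray-casting sketch. It is only an illustration, not how Sedona evaluates `ST_Contains`; `point_in_polygon` is a hypothetical name, and the behaviour for points lying exactly on the boundary is not handled.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: True if `point` lies inside the closed `ring`.

    `ring` is a list of (lon, lat) tuples whose last point repeats the first.
    """
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # does a horizontal ray going right from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


ile_france_ring = [
    (1.8814087, 49.2265665),
    (1.8099976, 48.5884175),
    (2.9347229, 48.5820584),
    (3.0528259, 49.2068317),
    (1.8814087, 49.2265665),
]
eiffel_in_idf = point_in_polygon((2.2949409, 48.8579388), ile_france_ring)
bordeaux_in_idf = point_in_polygon((-0.574851, 44.8453837), ile_france_ring)
print(eiffel_in_idf, bordeaux_in_idf)
```

The Eiffel Tower point falls inside the ring and the Bordeaux point does not, which matches what the `ST_Contains` queries in this section return.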

In [36]:
ile_france = "POLYGON((1.8814087 49.2265665,1.8099976 48.5884175,2.9347229 48.5820584,3.0528259 49.2068317,1.8814087 49.2265665))"

casd = "POLYGON((2.3065817  48.8204849,2.3063672  48.8177934,2.3113775  48.8177369,2.3114955  48.8205838,2.3065817  48.8204849))"

insee = "POLYGON((2.3066783  48.8179488,2.3065925  48.8159283,2.3108518  48.8159566,2.3109269  48.8179559,2.3066783  48.8179488))"
paul_brousse = "POLYGON((2.359668  48.7974719,2.3590672  48.7944258,2.3640883  48.7945106,2.3641849  48.7977687,2.359668  48.7974719))"

eiffel_tour = "POINT(2.2949409 48.8579388)"
bordeaux = "POINT(-0.574851 44.8453837)"

In [47]:
# in query 1 we use ST_Contains
query1 = f"SELECT ST_Contains(ST_GeomFromWKT('{ile_france}'), ST_GeomFromWKT('{eiffel_tour}'))"

resu1 = evalSpaceJoinQuery(query1)

# equivalently, use ST_Within with the arguments swapped
query1bis = f"SELECT ST_Within(ST_GeomFromWKT('{eiffel_tour}'), ST_GeomFromWKT('{ile_france}'))"

resu1bis = evalSpaceJoinQuery(query1bis)


In [48]:
resu1.show()
resu1bis.show()

+------+
|result|
+------+
|  true|
+------+

+------+
|result|
+------+
|  true|
+------+



In [26]:
query2 = f"SELECT ST_Contains(ST_GeomFromWKT('{ile_france}'), ST_GeomFromWKT('{bordeaux}'))"

resu2 = evalSpaceJoinQuery(query2)

In [27]:
resu2.show()

+------+
|result|
+------+
| false|
+------+



In [32]:
query3 = f"SELECT ST_Contains(ST_GeomFromWKT('{ile_france}'), ST_GeomFromWKT('{casd}'))"
resu3 = evalSpaceJoinQuery(query3)
resu3.show()

+------+
|result|
+------+
|  true|
+------+



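## 2. ST_Crosses

- **ST_Crosses**(A: Geometry, B: Geometry): Return true if A crosses B.

The predicate is meant for cases such as a line crossing a polygon or another line. For two polygons it always returns false, which is why the query below returns false even though casd and insee intersect.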

In [33]:
query4 = f"SELECT ST_Crosses(ST_GeomFromWKT('{casd}'),ST_GeomFromWKT('{insee}'))"
resu4 = evalSpaceJoinQuery(query4)
resu4.show()

+------+
|result|
+------+
| false|
+------+



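## 3. ST_Disjoint

- **ST_Disjoint**(A: Geometry, B: Geometry): Return true if A and B share no point.

In the examples below, casd and insee intersect, so the result is false. casd and paul_brousse are far apart, so the result is true.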

In [34]:
query = f"SELECT ST_Disjoint(ST_GeomFromWKT('{casd}'),ST_GeomFromWKT('{insee}'))"
resu = evalSpaceJoinQuery(query)
resu.show()

+------+
|result|
+------+
| false|
+------+



In [37]:
query = f"SELECT ST_Disjoint(ST_GeomFromWKT('{casd}'),ST_GeomFromWKT('{paul_brousse}'))"
resu = evalSpaceJoinQuery(query)
resu.show()

+------+
|result|
+------+
|  true|
+------+
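A cheap way to see why casd and paul_brousse are disjoint is to compare their bounding boxes: if the envelopes do not overlap, the geometries cannot share a point. The sketch below illustrates that idea in pure Python. It is a sufficient condition only (overlapping envelopes prove nothing), and the function names are hypothetical, not Sedona APIs.

```python
def envelope(ring):
    """Return the bounding box (min_x, min_y, max_x, max_y) of a ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def envelopes_disjoint(ring_a, ring_b):
    """True if the two bounding boxes share no point, so the geometries are disjoint."""
    ax0, ay0, ax1, ay1 = envelope(ring_a)
    bx0, by0, bx1, by1 = envelope(ring_b)
    return ax1 < bx0 or bx1 < ax0 or ay1 < by0 or by1 < ay0


casd_ring = [(2.3065817, 48.8204849), (2.3063672, 48.8177934),
             (2.3113775, 48.8177369), (2.3114955, 48.8205838),
             (2.3065817, 48.8204849)]
insee_ring = [(2.3066783, 48.8179488), (2.3065925, 48.8159283),
              (2.3108518, 48.8159566), (2.3109269, 48.8179559),
              (2.3066783, 48.8179488)]
paul_brousse_ring = [(2.359668, 48.7974719), (2.3590672, 48.7944258),
                     (2.3640883, 48.7945106), (2.3641849, 48.7977687),
                     (2.359668, 48.7974719)]

casd_vs_paul = envelopes_disjoint(casd_ring, paul_brousse_ring)
casd_vs_insee = envelopes_disjoint(casd_ring, insee_ring)
print(casd_vs_paul, casd_vs_insee)
```

For casd and insee the envelopes overlap, so this shortcut cannot decide and an exact test such as `ST_Disjoint` is needed.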



## 4. ST_DWithin

Returns true if `leftGeometry` and `rightGeometry` are within the specified `distance` of each other.

If `useSpheroid` is true, ST_DWithin uses Sedona's `ST_DistanceSpheroid` to check the spheroid distance between the centroids of the two geometries. The **unit of the distance in this case is the meter**.

If `useSpheroid` is false, ST_DWithin uses the Euclidean distance, expressed in the units of the geometries' CRS. To obtain correct results, consider using `ST_Transform` to put the data in an appropriate CRS first.
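To get a feel for the order of magnitude of the centroid-to-centroid distance between casd and paul_brousse, the sketch below approximates it in pure Python. It relies on two simplifying assumptions that differ from Sedona: the haversine formula on a sphere instead of a spheroid distance, and the average of the ring vertices instead of the true centroid. All names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius: a sphere, not the WGS84 spheroid


def vertex_mean(ring):
    """Average of the ring's vertices (closing point dropped); a rough centroid."""
    pts = ring[:-1]
    return (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))


def haversine_m(p, q):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1 = map(radians, p)
    lon2, lat2 = map(radians, q)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


casd_ring = [(2.3065817, 48.8204849), (2.3063672, 48.8177934),
             (2.3113775, 48.8177369), (2.3114955, 48.8205838),
             (2.3065817, 48.8204849)]
paul_brousse_ring = [(2.359668, 48.7974719), (2.3590672, 48.7944258),
                     (2.3640883, 48.7945106), (2.3641849, 48.7977687),
                     (2.359668, 48.7974719)]

approx_distance = haversine_m(vertex_mean(casd_ring), vertex_mean(paul_brousse_ring))
print(round(approx_distance))
```

The approximation comes out at roughly 4.6 km, a little under the 4700 m threshold used in the query of this section, which is consistent with that query returning true.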



In [44]:
# useSpheroid is true below, so the distance is in meters
distance = 4700
query = f"SELECT ST_DWithin(ST_GeomFromWKT('{casd}'),ST_GeomFromWKT('{paul_brousse}'),{distance},true)"
resu = evalSpaceJoinQuery(query)
resu.show()

+------+
|result|
+------+
|  true|
+------+



## 5. ST_Equals and ST_OrderingEquals

- **ST_Equals**: Returns true if the geometries are equal.
- **ST_OrderingEquals**: Returns true if the geometries are equal and their coordinates are in the same order.

We have two lines, `line1` and `line2`. They represent the same line but with different vertex lists: `line2` has an extra vertex at (5 5) in the middle of the segment.
So `ST_Equals` returns true, while `ST_OrderingEquals` returns false.
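The difference between the two notions of equality can be sketched in pure Python. This is an illustration under simplifying assumptions, not Sedona's implementation: it only handles line strings with at least two vertices that never double back on themselves, and the function names are hypothetical.

```python
def ordering_equals(a, b):
    """Same vertices in the same order."""
    return a == b


def drop_collinear(line):
    """Remove interior vertices lying on the straight segment between their neighbours."""
    kept = [line[0]]
    for cur, nxt in zip(line[1:], line[2:]):
        prev = kept[-1]
        # the cross product is zero when prev, cur and nxt are aligned
        cross = (cur[0] - prev[0]) * (nxt[1] - prev[1]) - (cur[1] - prev[1]) * (nxt[0] - prev[0])
        if cross != 0:
            kept.append(cur)
    kept.append(line[-1])
    return kept


def topologically_equal(a, b):
    """Same set of points: compare after removing redundant vertices, in either direction."""
    na, nb = drop_collinear(a), drop_collinear(b)
    return na == nb or na == nb[::-1]


line1_pts = [(0, 0), (10, 10)]
line2_pts = [(0, 0), (5, 5), (10, 10)]
same_shape = topologically_equal(line1_pts, line2_pts)
same_order = ordering_equals(line1_pts, line2_pts)
print(same_shape, same_order)
```

The vertex (5, 5) is redundant because it lies on the segment from (0, 0) to (10, 10), so the shapes are the same while the coordinate sequences differ.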

In [45]:
line1 = "LINESTRING(0 0,10 10)"
line2 = "LINESTRING(0 0,5 5,10 10)"
query = f"SELECT ST_Equals(ST_GeomFromWKT('{line1}'),ST_GeomFromWKT('{line2}'))"
resu = evalSpaceJoinQuery(query)
resu.show()

+------+
|result|
+------+
|  true|
+------+



In [46]:
line1 = "LINESTRING(0 0,10 10)"
line2 = "LINESTRING(0 0,5 5,10 10)"
query = f"SELECT ST_OrderingEquals(ST_GeomFromWKT('{line1}'),ST_GeomFromWKT('{line2}'))"
resu = evalSpaceJoinQuery(query)
resu.show()

+------+
|result|
+------+
| false|
+------+

