# Calculate distance

In [1]:
from sedona.spark import *

In [2]:
# build a sedona session (sedona >= 1.4.1)
config = (
    SedonaContext.builder()
    .config(
        'spark.jars.packages',
        'org.apache.sedona:sedona-spark-shaded-3.0_2.12:1.4.1,'
        'org.datasyslab:geotools-wrapper:1.4.0-28.2',
    )
    .getOrCreate()
)

# create a sedona context
sedona = SedonaContext.create(config)


## 1. Euclidean distance between A and B

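**ST_Distance (A:geometry, B:geometry)** : It returns the Euclidean distance between A and B, in the units of the input coordinates (degrees for the lat/lon points used here).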

In [5]:
# ST_Distance works on the raw coordinates, so the result is in degrees, not meters
euclidean_dist_df = sedona.sql("SELECT ST_Distance(ST_GeomFromWKT('POINT (51.3168 -0.56)'), ST_GeomFromWKT('POINT (55.9533 -3.1883)'))")
euclidean_dist_df.show()

+-------------------------------------------------------------------------------------------------+
|st_distance(st_geomfromwkt(POINT (51.3168 -0.56), 0), st_geomfromwkt(POINT (55.9533 -3.1883), 0))|
+-------------------------------------------------------------------------------------------------+
|                                                                               5.3296428717128865|
+-------------------------------------------------------------------------------------------------+
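
As a sanity check, the same number can be reproduced without Spark. The sketch below is plain Python (the variable names are illustrative and not part of Sedona) and treats the two WKT points as planar (x, y) pairs. The result is expressed in the units of the input coordinates, which are degrees here.

```python
import math

# The two points from the query above, read as planar (x, y) pairs.
a = (51.3168, -0.56)
b = (55.9533, -3.1883)

# Planar Euclidean distance: sqrt(dx^2 + dy^2).
euclidean = math.hypot(b[0] - a[0], b[1] - a[1])
print(euclidean)  # about 5.3296, matching the ST_Distance output above
```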



## 2. Sphere distance

The earth is not a flat 2D map but, to a good approximation, a sphere. When two points are far apart, for example on two different continents, the Euclidean distance is no longer accurate.
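
To get a feel for the size of the error, one rough experiment is to convert the Euclidean result from section 1 (about 5.33 degrees) into meters as if every degree were a degree of arc on a great circle. This is a back-of-the-envelope sketch in plain Python, not something Sedona does; the radius is the default value used by ST_DistanceSphere, and the variable names are illustrative.

```python
import math

EARTH_RADIUS_M = 6371008.0            # default radius of ST_DistanceSphere
euclidean_deg = 5.3296428717128865    # ST_Distance output from section 1

# Naive conversion: pretend the planar distance in degrees is an arc on a great circle.
naive_m = EARTH_RADIUS_M * math.radians(euclidean_deg)
print(round(naive_m / 1000, 1), "km")
```

The naive figure comes out near 593 km, whereas the great-circle distance computed later in this section is about 544 km. The gap appears because a degree of longitude spans less ground at these latitudes than a degree of latitude, which the planar calculation ignores.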

**ST_DistanceSphere (A:geometry, B:geometry, radius=6371008.0)** : It returns the `haversine / great-circle distance` between A and B on a sphere with the given earth radius (default radius: 6371008.0). The unit is **meters**.

Compared to `ST_Distance + ST_Transform`, it works better for datasets that cover large regions such as continents or the entire planet. It is equivalent to the PostGIS functions ST_Distance(geography, use_spheroid=false) and ST_DistanceSphere, and produces nearly identical results. It is faster but less accurate than ST_DistanceSpheroid.

> Geometry must be in EPSG:4326 (WGS84) projection and must be in lat/lon order. You can use ST_FlipCoordinates to swap lat and lon. For non-point data, we first take the centroids of both geometries and then compute the distance.

In [6]:
# Calculate the distance using the default radius (6371008.0 m)
sphere_dist_df1 = sedona.sql("SELECT ST_DistanceSphere(ST_GeomFromWKT('POINT (51.3168 -0.56)'), ST_GeomFromWKT('POINT (55.9533 -3.1883)'))")
sphere_dist_df1.show()

+------------------------------------------------------------------------------------------------------------------+
|st_distancesphere(st_geomfromwkt(POINT (51.3168 -0.56), 0), st_geomfromwkt(POINT (55.9533 -3.1883), 0), 6371008.0)|
+------------------------------------------------------------------------------------------------------------------+
|                                                                                                 543796.9506134904|
+------------------------------------------------------------------------------------------------------------------+
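
For intuition, the haversine formula behind this result can be sketched in plain Python. This is an illustration, not Sedona's source code: the function name `haversine_m` is made up here, and it assumes point inputs in lat/lon order, as required by the note above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius=6371008.0):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Central angle between the two points, in radians.
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius * c

d_default = haversine_m(51.3168, -0.56, 55.9533, -3.1883)
print(d_default)  # about 543797 m, in line with the ST_DistanceSphere output above
```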



In [7]:
# Calculate the distance using a custom radius (6378137.0 m is the WGS84 equatorial radius)
sphere_dist_df2 = sedona.sql("SELECT ST_DistanceSphere(ST_GeomFromWKT('POINT (51.3168 -0.56)'), ST_GeomFromWKT('POINT (55.9533 -3.1883)'), 6378137.0)")
sphere_dist_df2.show()

+------------------------------------------------------------------------------------------------------------------+
|st_distancesphere(st_geomfromwkt(POINT (51.3168 -0.56), 0), st_geomfromwkt(POINT (55.9533 -3.1883), 0), 6378137.0)|
+------------------------------------------------------------------------------------------------------------------+
|                                                                                                 544405.4459192449|
+------------------------------------------------------------------------------------------------------------------+
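
Because a great-circle distance is the radius multiplied by the central angle between the two points, changing the radius simply rescales the result. A quick plain-Python check using the two outputs above (the variable names are illustrative):

```python
d_default = 543796.9506134904   # output with radius 6371008.0
d_custom = 544405.4459192449    # output with radius 6378137.0

# Scaling the first result by the ratio of the two radii should reproduce the second.
rescaled = d_default * 6378137.0 / 6371008.0
difference_m = abs(rescaled - d_custom)
print(difference_m < 1e-3)
```

The two radii differ by roughly 0.1%, so the two distances differ by about 600 m over this 544 km path.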

