Download data in GeoJSON format from a chosen source

 

Source 1

https://www.data.gov.uk/ 

https://environment.data.gov.uk/catchment-planning/v/c3-plan 

Source 2

https://geojson.io/#map=2/0/20 

Source 3

dbfs:/databricks-datasets/nyctaxi/ 

 

Task 1

Create a notebook that uses the functions listed below:

Compute the distance between points and check whether a given geographic point lies inside a polygon. If the data does not include a polygon, you can create one.

If the chosen source does not contain enough data to run all of the functions, use a different source.

Use one of the following libraries: GeoMesa, Mosaic, or Sedona.

ST_Area 

ST_Distance 

ST_Contains 

ST_Intersects 

ST_Within 

In [0]:
%pip install databricks-mosaic


Python interpreter will be restarted.
Collecting databricks-mosaic
  Downloading databricks_mosaic-0.3.14-py3-none-any.whl (81.5 MB)
Collecting keplergl==0.3.2
  Downloading keplergl-0.3.2.tar.gz (9.7 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Collecting h3==3.7.3
  Downloading h3-3.7.3-cp38-cp38-manylinux2010_x86_64.whl (805 kB)
Collecting geopandas>=0.5.0
  Downloading geopandas-0.13.2-py3-none-any.whl (1.1 MB)
Collecting Shapely>=1.6.4.post2
  Downloading shapely-2.0.7-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB)
Collecting traittypes>=0.2.1
  Downloading traittypes-0.2.1-py2.py3-none-any.whl (8.6 kB)
Collecting pyproj>=3.0.1
  Downloading pyproj-3.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2

In [0]:
from mosaic import enable_mosaic
from pyspark.sql import functions as F

enable_mosaic(spark)



                Please use a Databricks:
                    - Photon-enabled Runtime for performance benefits
                    - Runtime ML for spatial AI benefits
                Mosaic will stop working on this cluster after v0.3.x.


## Loading the data

In [0]:
from pyspark.sql.functions import explode, to_json, col, expr

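polygon_df = (
    # multiline=true lets Spark parse a GeoJSON document that spans several lines
    spark.read.option("multiline", "true").json("dbfs:/FileStore/tables/region.geojson")
    # One row per GeoJSON feature
    .select(explode("features").alias("feature"))
    # Serialize each geometry struct back into a GeoJSON string
    .select(to_json(col("feature.geometry")).alias("geometry_json"))
    # Parse the GeoJSON string into a Mosaic geometry
    .withColumn("polygon_geom", expr("st_geomfromgeojson(geometry_json)"))
    .drop("geometry_json")
)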

polygon_df.createOrReplaceTempView("region")
polygon_df.show()

+--------------------+
|        polygon_geom|
+--------------------+
|{3, 4326, [[[20.0...|
+--------------------+
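
Before running the spatial functions, it can help to check which geometry types the file actually contains, because area and containment are only meaningful for polygons. The following is a minimal sketch in plain Python. The inline `SAMPLE_GEOJSON` is a made-up stand-in for `region.geojson`, and the helper name `geometry_types` is hypothetical; neither comes from the dataset used above.

```python
import json
from collections import Counter

# Hypothetical sample standing in for region.geojson (coordinates are made up).
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[20.0, 52.0], [22.0, 52.0], [22.0, 53.0],
                                   [20.0, 53.0], [20.0, 52.0]]]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "LineString",
                  "coordinates": [[20.0, 52.0], [22.0, 53.0]]}}
  ]
}
"""


def geometry_types(geojson_text):
    """Count the geometry types found in a GeoJSON FeatureCollection."""
    collection = json.loads(geojson_text)
    counts = Counter()
    for feature in collection["features"]:
        # GeoJSON allows a feature whose geometry is null.
        geometry = feature.get("geometry") or {}
        counts[geometry.get("type", "None")] += 1
    return counts


type_counts = geometry_types(SAMPLE_GEOJSON)
print(dict(type_counts))
```

If the count shows only `LineString` or `Point` features, a polygon has to be drawn or constructed before `ST_Area`, `ST_Contains`, and `ST_Within` can return meaningful results.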



In [0]:
from pyspark.sql import Row

# WKT uses (longitude latitude) order; this point lies in central Warsaw
point_df = spark.createDataFrame([
    Row(wkt_point="POINT(21.0122 52.2297)")
]).withColumn("point_geom", expr("st_geomfromwkt(wkt_point)"))

point_df.createOrReplaceTempView("location")
point_df.show()

+--------------------+--------------------+
|           wkt_point|          point_geom|
+--------------------+--------------------+
|POINT(21.0122 52....|{1, 0, [[[21.0122...|
+--------------------+--------------------+



In [0]:
spark.sql("""
SELECT st_contains(r.polygon_geom, l.point_geom) AS point_in_region
FROM region r CROSS JOIN location l
""").show()


+---------------+
|point_in_region|
+---------------+
|          false|
+---------------+
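
To cross-check a containment result outside Spark, the test can be approximated with the classic ray-casting algorithm. This is a sketch under stated assumptions, not how Mosaic implements `st_contains`: the function name `point_in_ring` and the rectangular `sample_ring` are hypothetical, holes are ignored, and points lying exactly on the boundary are not handled consistently.

```python
def point_in_ring(point, ring):
    """Ray casting: return True if (x, y) lies inside the closed ring.

    `ring` is a list of (x, y) tuples; repeating the first vertex at the
    end, as GeoJSON does, is allowed.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangle around Warsaw, in (longitude, latitude) order.
sample_ring = [(20.0, 52.0), (22.0, 52.0), (22.0, 53.0), (20.0, 53.0), (20.0, 52.0)]

warsaw_inside = point_in_ring((21.0122, 52.2297), sample_ring)
outside_point = point_in_ring((19.0, 52.5), sample_ring)
print(warsaw_inside, outside_point)
```

In the OGC predicate definitions, `ST_Within(a, b)` is the converse of `ST_Contains(b, a)`, so the same check covers both when the arguments are swapped.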



In [0]:
spark.sql("""
SELECT st_distance(r.polygon_geom, l.point_geom) AS distance_deg
FROM region r CROSS JOIN location l
""").show()

+-------------------+
|       distance_deg|
+-------------------+
|0.41443117956072995|
+-------------------+
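
As the `distance_deg` alias indicates, the value above is in coordinate units, that is degrees, because the geometries are given as longitude and latitude. To get a distance in metres between two points, one option is the haversine formula. The sketch below is plain Python; the function name `haversine_m`, the mean Earth radius constant, and the approximate Kraków coordinates are assumptions added for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (assumed spherical model)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# The point used in this notebook (Warsaw) and an approximate point in Krakow.
warsaw = (21.0122, 52.2297)
krakow = (19.9450, 50.0647)  # assumed coordinates, for illustration only

distance_m = haversine_m(*warsaw, *krakow)
print(f"{distance_m / 1000:.1f} km")
```

A spherical model is accurate to a fraction of a percent, which is usually enough for a sanity check; reprojecting to a metric coordinate reference system is the more rigorous route.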



In [0]:
spark.sql("""
SELECT st_area(polygon_geom) AS area_deg
FROM region
""").show()

+--------+
|area_deg|
+--------+
|     0.0|
+--------+
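
An area of 0.0 suggests that the loaded geometry may not be a closed polygon, for example a line or a degenerate ring, which would also be consistent with `st_contains` returning false. The shoelace formula gives a quick way to check a ring by hand. This is a sketch in plain Python; `ring_area` and both sample geometries are hypothetical, and the result is in square degrees, just like `area_deg`.

```python
def ring_area(ring):
    """Planar area of a ring of (x, y) vertices, using the shoelace formula."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Cross product of consecutive vertices
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical 2 x 1 degree rectangle, closed as in GeoJSON.
rectangle = [(20.0, 52.0), (22.0, 52.0), (22.0, 53.0), (20.0, 53.0), (20.0, 52.0)]
# A two-point line encloses nothing, so its area is zero.
line = [(20.0, 52.0), (22.0, 53.0)]

rectangle_area = ring_area(rectangle)
line_area = ring_area(line)
print(rectangle_area, line_area)
```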



In [0]:
spark.sql("""
SELECT st_intersects(r.polygon_geom, l.point_geom) AS point_intersects_region
FROM region r CROSS JOIN location l
""").show()

+-----------------------+
|point_intersects_region|
+-----------------------+
|                  false|
+-----------------------+

