I started this cluster on Databricks Runtime 15.4 before learning that Mosaic 0.4 only supports Databricks Runtime 13.x. Let's use Sedona instead...

In [0]:
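# dbutils.fs resolves bare paths against DBFS; the file:/ prefix targets the workspace filesystem
dbutils.fs.mkdirs("file:/Workspace/Shared/sedona/1.7.2")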

In [0]:
%sh
# Create directory
mkdir -p /Workspace/Shared/sedona/1.7.2

# Download Sedona JAR
curl -fL -o /Workspace/Shared/sedona/1.7.2/sedona-spark-shaded-3.5_2.12-1.7.2.jar \
  "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.5_2.12/1.7.2/sedona-spark-shaded-3.5_2.12-1.7.2.jar"


# Download Geotools wrapper
curl -fL -o /Workspace/Shared/sedona/1.7.2/geotools-wrapper-1.7.2-28.5.jar \
  "https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.7.2-28.5/geotools-wrapper-1.7.2-28.5.jar"
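
Both download URLs follow the usual Maven Central layout: the group ID with dots turned into slashes, then the artifact, the version, and `artifact-version.jar`. Here is a minimal Python sketch of that pattern, in case the versions need bumping later. `maven_jar_url` is a hypothetical helper written for this note, not part of Sedona or Databricks.

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_jar_url(group_id: str, artifact_id: str, version: str) -> str:
    """Build the Maven Central URL for a JAR (hypothetical helper)."""
    # Group IDs map to directories: org.apache.sedona -> org/apache/sedona
    group_path = group_id.replace(".", "/")
    return (
        f"{MAVEN_CENTRAL}/{group_path}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.jar"
    )


sedona_url = maven_jar_url(
    "org.apache.sedona", "sedona-spark-shaded-3.5_2.12", "1.7.2"
)
geotools_url = maven_jar_url("org.datasyslab", "geotools-wrapper", "1.7.2-28.5")
print(sedona_url)
print(geotools_url)
```

The two printed URLs should be the same ones passed to `curl` above.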


In [0]:
%sh
mkdir -p /Workspace/Shared/sedona/
cat > /Workspace/Shared/sedona/sedona-init.sh <<'EOF'
#!/bin/bash
cp /Workspace/Shared/sedona/1.7.2/*.jar /databricks/jars
EOF
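
After the cluster restarts with the init script attached, it may help to confirm that the JARs actually landed in `/databricks/jars`. Here is a minimal sketch using only the standard library. `find_jars` is a hypothetical helper, and the file-name prefixes are an assumption based on the two JARs downloaded above.

```python
from pathlib import Path


def find_jars(directory: str, prefixes=("sedona-", "geotools-wrapper-")) -> list[str]:
    """Return sorted names of JARs in `directory` starting with one of `prefixes`."""
    root = Path(directory)
    if not root.is_dir():
        # Directory missing (e.g. when run off-cluster): nothing to report
        return []
    return sorted(
        p.name for p in root.glob("*.jar") if p.name.startswith(tuple(prefixes))
    )


# On the driver this should list both JARs; elsewhere it prints an empty list.
print(find_jars("/databricks/jars"))
```

An empty list on the cluster would suggest the init script did not run or the copy failed.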


Under Cluster | Advanced, add the init script and Spark config:

```
(init)
/Workspace/Shared/sedona/sedona-init.sh

(do NOT prepend "dbfs:/" to the init script path)



(spark config)
spark.sql.extensions org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
spark.sedona.enableParserExtension false

```
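
Once the cluster has restarted, it may be worth confirming that the settings above actually reached the session. Here is a minimal sketch. `conf_mismatches` and `EXPECTED_CONF` are hypothetical names, and the exact string comparison is an assumption: if Databricks appends its own entries to `spark.sql.extensions`, a substring check would be more forgiving. The `fake_cluster_conf` dict only stands in for a live session so the snippet runs anywhere.

```python
EXPECTED_CONF = {
    "spark.sql.extensions": (
        "org.apache.sedona.viz.sql.SedonaVizExtensions,"
        "org.apache.sedona.sql.SedonaSqlExtensions"
    ),
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
    "spark.sedona.enableParserExtension": "false",
}


def conf_mismatches(get_conf, expected=EXPECTED_CONF):
    """Return {key: (expected, actual)} for every setting that differs.

    `get_conf` is any callable taking (key, default), such as `spark.conf.get`.
    """
    mismatches = {}
    for key, want in expected.items():
        got = get_conf(key, None)
        if got != want:
            mismatches[key] = (want, got)
    return mismatches


# Stand-in for a cluster where the serializer setting was not applied.
fake_cluster_conf = dict(EXPECTED_CONF)
fake_cluster_conf["spark.serializer"] = "org.apache.spark.serializer.JavaSerializer"
print(conf_mismatches(fake_cluster_conf.get))
```

In the notebook the call would be `conf_mismatches(spark.conf.get)`; an empty dict means every setting matched.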

For reference, here's the (read-only) config:

```
{
  "data_security_mode": "DATA_SECURITY_MODE_DEDICATED",
  "single_user_name": "bryan@purr.io",
  "cluster_name": "okay-cluster",
  "kind": "CLASSIC_PREVIEW",
  "spark_conf": {
    "spark.sql.extensions": "org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
    "spark.sedona.enableParserExtension": "false"
  },
  "spark_env_vars": {
    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
  },
  "aws_attributes": {
    "zone_id": "auto",
    "availability": "SPOT_WITH_FALLBACK",
    "first_on_demand": 1,
    "spot_bid_price_percent": 100
  },
  "runtime_engine": "PHOTON",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "rd-fleet.xlarge",
  "autotermination_minutes": 120,
  "init_scripts": [
    {
      "workspace": {
        "destination": "dbfs:/Workspace/Shared/sedona/sedona-init.sh"
      }
    }
  ],
  "is_single_node": false,
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  },
  "cluster_id": "0607-021933-dgy0wie3"
}
```

In [0]:
# %pip install apache-sedona geopandas

%pip install apache-sedona==1.7.2



In [0]:
dbutils.library.restartPython()
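
The Python package and the JAR on the cluster are generally expected to be the same Sedona release, so a quick comparison after the restart can catch a mismatch early. This is a minimal sketch. `jar_version` and `installed_version` are hypothetical helpers, and the file-name parsing assumes the `<artifact>-<version>.jar` naming of the Sedona JAR only (it would not handle the two-part geotools-wrapper version).

```python
from importlib import metadata

JAR_NAME = "sedona-spark-shaded-3.5_2.12-1.7.2.jar"


def jar_version(jar_name: str) -> str:
    """Extract the trailing version from '<artifact>-<version>.jar'."""
    stem = jar_name.removesuffix(".jar")
    return stem.rsplit("-", 1)[1]


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


py_version = installed_version("apache-sedona")
print(f"JAR: {jar_version(JAR_NAME)}, Python package: {py_version}")
```

If the two values differ, reinstalling the Python package at the JAR's version is probably the simpler fix.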

In [0]:
spark.sql("SELECT ST_Point(1, 1)").show()


In [0]:
from sedona.spark import SedonaContext
from sedona.sql.st_constructors import ST_Point

# Initialize Sedona
sedona = SedonaContext.create(spark)

# Create DataFrame and add geometry column
df = spark.createDataFrame([(1, -74.0060, 40.7128)], ["id", "lon", "lat"])
df = df.withColumn("geom", ST_Point("lon", "lat"))
df.show()

