# SedonaSpark => SedonaDB

This notebook shows how to convert a SedonaSpark DataFrame to a SedonaDB DataFrame.

You can convert the SedonaSpark DataFrame to an Arrow table and then create the SedonaDB DataFrame from that table.

SedonaSpark normally runs on a cluster, and collecting the results into an Arrow table puts all the data on the driver node. This causes an out-of-memory error if the dataset is too large to fit in the driver's memory.
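
Before collecting, it can help to estimate whether the result is likely to fit on the driver. The sketch below is a rough heuristic and not part of Sedona: the function name `fits_on_driver`, the average row size, and the safety fraction are all hypothetical choices made for illustration. In practice the row count would come from a Spark DataFrame's `count()`.

```python
def fits_on_driver(row_count, avg_row_bytes, driver_memory_bytes, safety_fraction=0.25):
    """Return True if the estimated result size stays within a fraction of driver memory.

    Hypothetical helper: every parameter is an assumption supplied by the caller.
    """
    if not 0 < safety_fraction <= 1:
        raise ValueError("safety_fraction must be in (0, 1]")
    estimated_bytes = row_count * avg_row_bytes
    return estimated_bytes <= driver_memory_bytes * safety_fraction


GIB = 1024 ** 3

# 10 million rows at roughly 200 bytes each is about 2 GB,
# which is within 25% of a 12 GiB driver.
print(fits_on_driver(10_000_000, 200, 12 * GIB))

# 1 billion rows at the same row size is about 200 GB, far beyond the driver.
print(fits_on_driver(1_000_000_000, 200, 12 * GIB))
```

The estimate ignores Arrow and JVM overhead, so a conservative safety fraction is the safer choice.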

## Create the Spark DataFrame

In [7]:
from sedona.spark import *
from pyspark.sql.functions import col

In [15]:
config = (
    SedonaContext.builder()
    .config(
        "spark.jars.packages",
        "org.apache.sedona:sedona-spark-3.5_2.12:1.7.1,"
        "org.datasyslab:geotools-wrapper:1.7.1-28.5",
    )
    .config(
        "spark.jars.repositories",
        "https://artifacts.unidata.ucar.edu/repository/unidata-all",
    )
    .config("spark.executor.memory", "12G")
    .config("spark.driver.memory", "12G")
    .config("spark.sql.shuffle.partitions", "2")
    .getOrCreate()
)

sedona = SedonaContext.create(config)

25/09/28 10:31:18 WARN UDTRegistration: Cannot register UDT for org.geotools.coverage.grid.GridCoverage2D, which is already registered.
25/09/28 10:31:18 WARN SimpleFunctionRegistry: The function rs_union_aggr replaced a previously registered function.
25/09/28 10:31:18 WARN UDTRegistration: Cannot register UDT for org.locationtech.jts.geom.Geometry, which is already registered.
25/09/28 10:31:18 WARN UDTRegistration: Cannot register UDT for org.apache.sedona.common.geometryObjects.Geography, which is already registered.
25/09/28 10:31:18 WARN UDTRegistration: Cannot register UDT for org.locationtech.jts.index.SpatialIndex, which is already registered.
25/09/28 10:31:18 WARN SimpleFunctionRegistry: The function st_envelope_aggr replaced a previously registered function.
25/09/28 10:31:18 WARN SimpleFunctionRegistry: The function st_intersection_aggr replaced a previously registered function.
25/09/28 10:31:18 WARN SimpleFunctionRegistry: The function st_union_aggr replaced a previously registered function.

In [8]:
spark_df = sedona.createDataFrame(
    [
        ("a", "LINESTRING(2.0 5.0,6.0 1.0)"),
        ("b", "POINT(1.0 2.0)"),
        ("c", "POLYGON((7.0 1.0,7.0 3.0,9.0 3.0,7.0 1.0))"),
    ],
    ["id", "geometry"],
).withColumn("geometry", ST_GeomFromText(col("geometry")))

In [9]:
type(spark_df)

pyspark.sql.dataframe.DataFrame

## Convert the Spark DataFrame to a SedonaDB DataFrame

In [10]:
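# The alias avoids rebinding the name `sedona`, which already holds the Spark session.
import sedona.db as sedona_db

sd = sedona_db.connect()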

In [11]:
df = sd.create_data_frame(dataframe_to_arrow(spark_df))
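
The conversion can be wrapped in a small helper that optionally caps the number of rows pulled onto the driver. This is a sketch under assumptions: `spark_to_sedonadb` and its `max_rows` parameter are hypothetical, the Spark-to-Arrow converter is passed in as an argument (in this notebook it would be `dataframe_to_arrow`), and `limit` is the standard PySpark `DataFrame.limit` method.

```python
def spark_to_sedonadb(spark_df, sd, to_arrow, max_rows=None):
    """Convert a Spark DataFrame to a SedonaDB DataFrame via an Arrow table.

    Hypothetical helper: `sd` is a SedonaDB connection and `to_arrow` is the
    function that turns a Spark DataFrame into an Arrow table.
    """
    if max_rows is not None:
        # Cap the rows collected on the driver. Without an explicit ordering,
        # which rows are kept is not deterministic.
        spark_df = spark_df.limit(max_rows)
    arrow_table = to_arrow(spark_df)  # this step collects the data on the driver
    return sd.create_data_frame(arrow_table)
```

With the objects defined in this notebook, the call would look like `spark_to_sedonadb(spark_df, sd, dataframe_to_arrow, max_rows=1_000_000)`.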

In [12]:
df.show()

┌──────┬────────────────────────────┐
│  id  ┆          geometry          │
│ utf8 ┆          geometry          │
╞══════╪════════════════════════════╡
│ a    ┆ LINESTRING(2 5,6 1)        │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b    ┆ POINT(1 2)                 │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ c    ┆ POLYGON((7 1,7 3,9 3,7 1)) │
└──────┴────────────────────────────┘


In [13]:
df.schema

SedonaSchema with 2 fields:
  id: utf8<Utf8>
  geometry: geometry<Wkb>
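
The `geometry<Wkb>` entry in the schema suggests that geometries are held as well-known binary (WKB). As a rough illustration of that encoding, and not SedonaDB's own code, the standard-library sketch below builds the WKB bytes for `POINT(1 2)` by hand. The helper name `point_to_wkb` is hypothetical.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as little-endian WKB: byte order, geometry type, x, y."""
    byte_order = 1     # 1 means little endian
    geometry_type = 1  # 1 is the WKB type code for Point
    # "<" disables padding: 1 byte + 4-byte unsigned int + two 8-byte doubles
    return struct.pack("<BIdd", byte_order, geometry_type, x, y)


wkb = point_to_wkb(1.0, 2.0)
print(len(wkb))  # 21 bytes: 1 + 4 + 8 + 8
print(wkb.hex())
```

Linestrings and polygons follow the same pattern, with point counts and ring counts added before the coordinates.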

In [14]:
type(df)

sedonadb.dataframe.DataFrame