- Author: Ben Du
- Date: 2020-06-17
- Title: Empty DataFrames in Spark
- Slug: spark-dataframe-empty
- Category: Computer Science
- Tags: programming, Scala, Spark, DataFrame, empty
- Modified: 2020-06-17


In [2]:
%%classpath add mvn
org.apache.spark spark-core_2.11 2.1.1
org.apache.spark spark-sql_2.11 2.1.1

In [3]:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession
    .builder()
    .master("local")
    .appName("Spark SQL basic example")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()

import spark.implicits._

org.apache.spark.sql.SparkSession$implicits$@1c18cefa

Create an empty DataFrame with one column.

In [4]:
val s = Seq.empty[String]
val df = s.toDF("x")
df.show

+---+
|  x|
+---+
+---+



null
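
`Seq.empty[String].toDF("x")` takes its column type from the Scala element type. When the columns are not known as a Scala type, an explicit schema can be used instead. The following is a minimal sketch of that approach; the name `session`, the column names, and the column types are assumptions chosen only for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// getOrCreate reuses the notebook's session if one is already running.
val session = SparkSession.builder().master("local").getOrCreate()

// Hypothetical schema, used only for illustration.
val schema = StructType(Seq(
    StructField("x", StringType, nullable = true),
    StructField("n", IntegerType, nullable = true)
))

// An empty RDD of rows plus a schema gives an empty DataFrame with typed columns.
val dfTyped = session.createDataFrame(session.sparkContext.emptyRDD[Row], schema)
dfTyped.printSchema
```

The result has zero rows, like `df` above, but its column names and types come from `schema`.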

Counting the rows of an empty DataFrame returns 0.

In [5]:
df.count

0
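
`count` scans the whole DataFrame, which is more work than needed when the only question is whether any row exists. One possible alternative, sketched below, fetches at most one row with `head(1)`. The names `session`, `emptyDf`, and `hasNoRows` are assumptions made for this example.

```scala
import org.apache.spark.sql.SparkSession

// getOrCreate reuses the notebook's session if one is already running.
val session = SparkSession.builder().master("local").getOrCreate()
import session.implicits._

val emptyDf = Seq.empty[String].toDF("x")

// head(1) returns an Array[Row] holding at most one row,
// so the check does not need a full count.
val hasNoRows = emptyDf.head(1).isEmpty
```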

Write the empty DataFrame to CSV. Spark treats `empty.csv` as an output directory rather than a single file.

In [5]:
df.write.mode("overwrite").csv("empty.csv")

null

Create an empty DataFrame with no rows and no columns.

In [6]:
val df2 = spark.emptyDataFrame
df2.show

++
||
++
++



null

This DataFrame also has a row count of 0.

In [7]:
df2.count

0

Write this DataFrame, which has no columns, to a CSV output directory as well. The write succeeds on the Spark 2.1.1 setup used here.

In [7]:
df2.write.mode("overwrite").csv("empty2.csv")

null

## Add a Column into an Empty DataFrame

The resulting DataFrame is still empty, but it has one more column.

In [8]:
import org.apache.spark.sql.functions._

df.withColumn("y", lit(1)).show

+---+---+
|  x|  y|
+---+---+
+---+---+



null
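
`lit(1)` produces an integer column. If the new column should have a particular type but no meaningful default value, one option is a null literal cast to that type. The following is a minimal sketch; the column name `z`, the target type, and the names `session`, `base`, and `withZ` are assumptions for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// getOrCreate reuses the notebook's session if one is already running.
val session = SparkSession.builder().master("local").getOrCreate()
import session.implicits._

val base = Seq.empty[String].toDF("x")

// lit(null) alone has NullType; the cast gives the column a usable type.
val withZ = base.withColumn("z", lit(null).cast("string"))
withZ.printSchema
```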