- Title: Case of Column Names in Spark DataFrames
- Slug: spark-case-col-names
- Date: 2019-12-03
- Category: Programming
- Tags: programming, Scala, Spark, DataFrame, case, column names
- Author: Ben Du

## Comment

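By default, Spark DataFrame/SQL APIs resolve column names case-insensitively.
However, the case of column names is preserved when a DataFrame is saved into HDFS,
so other tools reading those files may treat the column names as case-sensitive!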

In [1]:
%%classpath add mvn
org.apache.spark spark-core_2.11 2.3.1
org.apache.spark spark-sql_2.11 2.3.1

In [2]:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
    .master("local[2]")
    .appName("Spark Column Example")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()

import spark.implicits._

org.apache.spark.sql.SparkSession$implicits$@556040f4

Create a Spark DataFrame whose column names are in lower case.

In [15]:
val df = Seq(
    (1L, "a", "foo", 3.0),
    (2L, "b", "bar", 4.0),
    (3L, "c", "foo", 5.0),
    (4L, "d", "bar", 7.0)
).toDF("col1", "col2", "col3", "col4")
df.show

+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   1|   a| foo| 3.0|
|   2|   b| bar| 4.0|
|   3|   c| foo| 5.0|
|   4|   d| bar| 7.0|
+----+----+----+----+




Even though the column names of the DataFrame are in lower case,
column lookup is case-insensitive when you access them using Spark DataFrame/SQL APIs.
The result columns take the case used in the call rather than the original case.

In [16]:
df.select("col1", "COL2", "Col3").show

+----+----+----+
|col1|COL2|Col3|
+----+----+----+
|   1|   a| foo|
|   2|   b| bar|
|   3|   c| foo|
|   4|   d| bar|
+----+----+----+
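The case-insensitive lookup above is controlled by the Spark SQL setting `spark.sql.caseSensitive`, which is `false` by default. The following is a minimal sketch, assuming a fresh local session with that default; the small DataFrame and the names `resolvedByDefault` and `failedWhenSensitive` are made up for illustration. With the setting switched to `true`, selecting `COL2` from a DataFrame whose column is `col2` is expected to fail with an `AnalysisException`.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder()
    .master("local[2]")
    .appName("Case Sensitivity Sketch")
    .getOrCreate()

import spark.implicits._

// Illustrative DataFrame with lower-case column names.
val df = Seq((1L, "a"), (2L, "b")).toDF("col1", "col2")

// With the default setting (spark.sql.caseSensitive=false), "COL2" resolves to col2.
val resolvedByDefault = df.select("COL2").columns.toSeq

// Turn on case-sensitive resolution for this session.
spark.conf.set("spark.sql.caseSensitive", "true")
val failedWhenSensitive =
    try { df.select("COL2"); false }
    catch { case _: AnalysisException => true }

// Restore the default so that later cells behave as before.
spark.conf.set("spark.sql.caseSensitive", "false")

println(s"default: $resolvedByDefault, fails when case-sensitive: $failedWhenSensitive")
```

The setting is restored at the end because it applies to the whole session, not only to this one query.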



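The case of column names is preserved when you write a Spark DataFrame to disk.
The names written are the ones used in the select call (col1, COL2, Col3), not the original lower-case names.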

In [18]:
df.select("col1", "COL2", "Col3").write.mode("overwrite").parquet("/tmp/df")
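The stored names can also be checked from Spark itself by reading the files back and inspecting the schema. This is a minimal sketch, assuming a local session; the output path `/tmp/df_case_check` and the name `storedNames` are hypothetical and chosen only for this example.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
    .master("local[2]")
    .appName("Parquet Column Case Sketch")
    .getOrCreate()

import spark.implicits._

val df = Seq((1L, "a", "foo"), (2L, "b", "bar")).toDF("col1", "col2", "col3")

// Write with mixed-case names in the select, as in the cell above.
// The path is a hypothetical scratch location.
df.select("col1", "COL2", "Col3")
    .write.mode("overwrite").parquet("/tmp/df_case_check")

// Read the files back and list the column names stored in the Parquet schema.
val storedNames = spark.read.parquet("/tmp/df_case_check").columns.toSeq
println(storedNames.mkString(", "))
```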

In [19]:
%%python

import pandas as pd
pd.read_parquet("/tmp/df")

   col1 COL2 Col3
0     1    a  foo
1     2    b  bar
2     3    c  foo
3     4    d  bar
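Since the written names keep whatever case the DataFrame carries, one option is to normalize the names before writing so that case-sensitive readers see a consistent convention. The sketch below assumes lower case is the desired convention; the output path `/tmp/df_lower` and the name `normalized` are hypothetical. It would not be safe if two columns differed only by case, since they would collide after renaming.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
    .master("local[2]")
    .appName("Normalize Column Names Sketch")
    .getOrCreate()

import spark.implicits._

// Illustrative DataFrame whose column names use mixed case.
val df = Seq((1L, "a", "foo"), (2L, "b", "bar")).toDF("col1", "COL2", "Col3")

// Rename every column to its lower-case form, keeping the column order.
val normalized = df.toDF(df.columns.map(_.toLowerCase): _*)
normalized.printSchema()

// Write the normalized DataFrame; the path is a hypothetical scratch location.
normalized.write.mode("overwrite").parquet("/tmp/df_lower")
```

Renaming with `toDF` touches only the schema, so the data itself is unchanged.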


## References

https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/Dataset.html

https://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/sql/functions.html

https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/Row.html