2. Getting Started with Spark


Simon Renauld edited this page Oct 21, 2021 · 13 revisions

This section walks through a first Spark example.

2.1. Apache Spark Example and Core Concepts

Read CSV Data:

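Transformations in Spark are lazy: spark.read...load() defines a DataFrame over the CSV file, but Spark does not read the full file until an action is called. take(5) is such an action. It runs the read and returns the first five rows to the driver as a local Python list of Row objects.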

flightData2015 = spark.read.format("csv").option("header","true").load("C:/Users/renau/OneDrive/02-Data Projects/09-Apache-Spark/Spark-The-Definitive-Guide/data/flight-data/csv/2015-summary.csv")

flightData2015.take(5)

We can now call explain() to print the physical plan Spark will use to compute the sorted DataFrame:

flightData2015.sort("count").explain() 

>>>  FileScan csv [DEST_COUNTRY_NAME#38,ORIGIN_COUNTRY_NAME#39,count#40] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/C:/Users/renau/OneDrive/02-Data Projects/09-Apache-Spark/Spark-The-Defini..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<DEST_COUNTRY_NAME:string,ORIGIN_COUNTRY_NAME:string,count:string>
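The ReadSchema in this plan shows count:string, because the file was loaded with only the header option and no schema inference. Sorting a string column compares values character by character rather than numerically. The plain-Python sketch below (not Spark code; the values are hypothetical) shows the difference. In PySpark, one way to get a numeric column is to add .option("inferSchema", "true") to the reader.

```python
# Plain-Python illustration (not Spark) of why the column type matters for sort().
counts_as_strings = ["15", "1", "344", "2"]    # hypothetical values of a string column

lexicographic = sorted(counts_as_strings)       # compares character by character
numeric = sorted(counts_as_strings, key=int)    # compares the values as integers

print(lexicographic)   # ['1', '15', '2', '344']
print(numeric)         # ['1', '2', '15', '344']
```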