2. Getting Started with Spark
Simon Renauld edited this page Oct 21, 2021
This section walks through a first example in Spark.
Read CSV Data:
Lazy operation: reading the CSV into a DataFrame is lazy, so Spark does not load the rows until an action is called. take(5) is such an action: it returns the first five rows as a local list of Row objects.
flightData2015 = df = spark.read.format("csv").option("header", "true").load("C:/Users/renau/OneDrive/02-Data Projects/09-Apache-Spark/Spark-The-Definitive-Guide/data/flight-data/csv/2015-summary.csv")
flightData2015.take(5)
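The difference between describing a read and actually running it can be illustrated without Spark. The following is only an analogy in plain Python, not how Spark is implemented. The sample rows, the `lazy_rows` and `take` helpers, and the `rows_parsed` counter are all hypothetical names made up for this sketch.

```python
import csv
import io
from itertools import islice

# Hypothetical in-memory sample standing in for the CSV file; the rows are made up.
SAMPLE_CSV = """DEST_COUNTRY_NAME,ORIGIN_COUNTRY_NAME,count
A,B,15
C,D,1
E,F,344
"""

rows_parsed = 0  # counts how many rows have actually been read


def lazy_rows(text):
    """Yield one dict per CSV row; nothing is parsed until a consumer asks for rows."""
    global rows_parsed
    for row in csv.DictReader(io.StringIO(text)):
        rows_parsed += 1
        yield row


def take(rows, n):
    """Pull at most n rows from the lazy source into a local list."""
    return list(islice(rows, n))


rows = lazy_rows(SAMPLE_CSV)   # analogous to the read: only a recipe so far
parsed_before = rows_parsed    # still 0, no row has been parsed yet
first_two = take(rows, 2)      # analogous to take(2): triggers the work

print(parsed_before)  # 0
print(rows_parsed)    # 2
print(first_two)
```

Only the rows that were asked for are parsed, which is the same idea behind take(5) returning a small local list instead of the whole dataset.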
We can now call explain() to see the plan Spark will use to execute the query:
flightData2015.sort("count").explain()
>>> FileScan csv [DEST_COUNTRY_NAME#38,ORIGIN_COUNTRY_NAME#39,count#40] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/C:/Users/renau/OneDrive/02-Data Projects/09-Apache-Spark/Spark-The-Defini..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<DEST_COUNTRY_NAME:string,ORIGIN_COUNTRY_NAME:string,count:string>
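Note that the ReadSchema in the plan above reports count as a string, since the read did not ask Spark to infer column types. Sorting a string column compares values character by character rather than numerically. The plain-Python sketch below shows the effect on a few made-up values. In PySpark, adding .option("inferSchema", "true") to the reader or casting the column are common ways to get a numeric column, though which one fits depends on the data.

```python
# Made-up count values, stored as strings the way the ReadSchema above reports them.
counts_as_strings = ["15", "1", "344", "62"]

# Sorting strings compares character by character, so "344" lands before "62".
string_order = sorted(counts_as_strings)

# Converting to int first gives the numeric order most readers expect.
numeric_order = sorted(int(c) for c in counts_as_strings)

print(string_order)   # ['1', '15', '344', '62']
print(numeric_order)  # [1, 15, 62, 344]
```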