## Overview

This notebook shows how to load the `departuredelays.csv` flight dataset into a DataFrame, convert it to a typed Dataset, and query it as a table through a temporary view.

This notebook is written in **Python**, so the default cell type is Python. You can switch an individual cell to another language by starting it with a magic command such as `%scala`, `%sql`, or `%r`. Python, Scala, SQL, and R are all supported.

In [None]:
# File location and type
file_location = "/databricks-datasets/flights/departuredelays.csv"
file_type = "csv"

# CSV options
# With inferSchema disabled, every column is read as a string.
infer_schema = False
first_row_is_header = True
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("delimiter", delimiter) \
  .load(file_location)
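The three CSV options above control how raw text becomes rows. As a rough illustration outside Spark, the sketch below uses Python's standard `csv` module on a few made-up rows, assuming the file's header is `date,delay,distance,origin,destination`. `read_csv` and `infer_column` are hypothetical helpers, and the all-or-nothing integer check is a simplified stand-in for Spark's schema inference, not its actual algorithm.

```python
import csv
import io

# Made-up sample in the assumed layout of departuredelays.csv.
SAMPLE = """date,delay,distance,origin,destination
01011245,6,602,ABE,ATL
01020600,-8,369,ABE,DTW
01021245,-2,602,ABE,ATL
"""

def infer_column(values):
    """Simplified inference: convert the whole column to int only if every value parses as one."""
    try:
        return [int(v) for v in values]
    except ValueError:
        return list(values)

def read_csv(text, header=True, delimiter=",", infer_schema=False):
    """Return (column_names, rows), loosely mirroring the header/delimiter/inferSchema options."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if header:
        names, body = rows[0], rows[1:]
    else:
        # Without a header, Spark names the columns _c0, _c1, ...
        names, body = [f"_c{i}" for i in range(len(rows[0]))], rows
    if infer_schema:
        columns = [infer_column(col) for col in zip(*body)]
        body = [list(row) for row in zip(*columns)]
    return names, body

names, raw_rows = read_csv(SAMPLE)
print(names)          # ['date', 'delay', 'distance', 'origin', 'destination']
print(raw_rows[0])    # ['01011245', '6', '602', 'ABE', 'ATL']
_, typed_rows = read_csv(SAMPLE, infer_schema=True)
print(typed_rows[0])  # [1011245, 6, 602, 'ABE', 'ATL']
```

With `infer_schema=False` every value stays a string, as in the DataFrame read above. With inference on, the sketch turns the `date` column into integers, so its leading zero disappears.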


In [None]:
# Display the first 10 rows of the DataFrame
display(df.limit(10))

Create a Dataset in Scala

In [None]:
%scala
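// Databricks notebooks import spark.implicits._ automatically; elsewhere, import it to get the tuple encoder.
val ds = spark.read.options(Map("inferSchema" -> "true", "delimiter" -> ",", "header" -> "true"))
  .csv("/databricks-datasets/flights/departuredelays.csv")
  .as[(Int, Int, Int, String, String)]
// The trailing .as[(Int, Int, Int, String, String)] converts the untyped DataFrame into a typed Dataset.
// Tuple fields are bound to columns by position; inferSchema is enabled so the first three columns are read as integers, not strings.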
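The `.as[...]` step binds values to tuple fields by position and expects each column to already have a compatible type. Spark rejects an implicit string-to-`Int` conversion at this step, which is why this cell enables `inferSchema`. The plain-Python sketch below mimics that positional, type-checked binding. `Flight` and `as_typed` are hypothetical names, and the field names are assumed for readability; the Scala tuple itself exposes its fields as `_1` through `_5`.

```python
from typing import NamedTuple, get_type_hints

class Flight(NamedTuple):
    # Assumed field names; the Scala tuple exposes these positions as _1 .. _5.
    date: int
    delay: int
    distance: int
    origin: str
    destination: str

def as_typed(values):
    """Bind values to Flight fields by position, rejecting any value that is not already the declared type."""
    if len(values) != len(Flight._fields):
        raise ValueError(f"expected {len(Flight._fields)} columns, got {len(values)}")
    hints = get_type_hints(Flight)
    for name, value in zip(Flight._fields, values):
        expected = hints[name]
        if not isinstance(value, expected):
            raise TypeError(f"column {name!r}: cannot use {value!r} as {expected.__name__}")
    return Flight(*values)

typed = as_typed([1011245, 6, 602, "ABE", "ATL"])
print(typed.delay, typed[1])  # 6 6
```

Passing the all-string row that an `inferSchema=false` read would produce raises `TypeError` in this sketch, loosely paralleling the analysis error Spark reports.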

In [None]:
%scala
display(ds.limit(10))


Print the first 10 rows of the Dataset: `take(10)` returns them to the driver as a Scala array, and `foreach(println(_))` prints each element.

In [None]:
%scala
ds.take(10).foreach(println(_))

In [None]:
%scala
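// Keep flights with a negative delay; show(false) prints up to 20 rows without truncating column values.
ds.filter(ds("delay") < 0).show(false)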
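The filter keeps only rows whose `delay` is negative. In Scala, the same predicate can also be written against the tuple type, for example `ds.filter(_._2 < 0)`. The Python sketch below applies the equivalent predicate to made-up positional rows and caps the output at 20 rows to mirror the default row count of `show`; `early_departures` is a hypothetical helper.

```python
from typing import List, Tuple

# Positional row, like the Scala (Int, Int, Int, String, String) tuple.
FlightRow = Tuple[int, int, int, str, str]

# Made-up rows for illustration only.
rows: List[FlightRow] = [
    (1011245, 6, 602, "ABE", "ATL"),
    (1020600, -8, 369, "ABE", "DTW"),
    (1021245, -2, 602, "ABE", "ATL"),
]

def early_departures(rows: List[FlightRow], limit: int = 20) -> List[FlightRow]:
    """Keep rows whose delay (index 1, the Scala _2 field) is negative, capped at `limit` rows."""
    return [row for row in rows if row[1] < 0][:limit]

for row in early_departures(rows):
    print(row)
```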

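Register the Dataset as a temporary view named `DepartureDelay` so SQL cells can query it. The view exists only for the current Spark session.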

In [None]:
%scala
ds.createOrReplaceTempView("DepartureDelay")

In [None]:
%sql
select * from DepartureDelay
where delay<0
order by delay asc
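To try the same query without a cluster, the sketch below uses Python's built-in `sqlite3` module, with an in-memory table standing in for the `DepartureDelay` temporary view. The rows are made up, and SQLite is only a stand-in: it accepts this particular query but does not implement Spark SQL's full dialect.

```python
import sqlite3

# Made-up rows in the assumed (date, delay, distance, origin, destination) layout.
rows = [
    (1011245, 6, 602, "ABE", "ATL"),
    (1020600, -8, 369, "ABE", "DTW"),
    (1021245, -2, 602, "ABE", "ATL"),
]

conn = sqlite3.connect(":memory:")
# An in-memory table stands in for the DepartureDelay temporary view.
conn.execute(
    "CREATE TABLE DepartureDelay "
    "(date INTEGER, delay INTEGER, distance INTEGER, origin TEXT, destination TEXT)"
)
conn.executemany("INSERT INTO DepartureDelay VALUES (?, ?, ?, ?, ?)", rows)

# Same query as the SQL cell: early departures, earliest first.
result = conn.execute(
    "select * from DepartureDelay where delay < 0 order by delay asc"
).fetchall()
for row in result:
    print(row)
conn.close()
```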