### Step 1: Specify the file metadata

Before reading any data, we define two notebook widgets, `file_name` and `table_name`, so that the notebook can be parameterized. We then set the file type and the storage location from which the source and target paths are built.

In [0]:
dbutils.widgets.removeAll()
dbutils.widgets.text("file_name", "")
dbutils.widgets.text("table_name", "")

In [0]:
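# Read the widget values (getArgument is deprecated in favor of dbutils.widgets.get)
file_name = dbutils.widgets.get("file_name")
file_type = "csv"
data_location = "/mnt/sparkcontainer"
table_name = dbutils.widgets.get("table_name")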
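
Both widgets default to an empty string, so a run without parameters would build a path such as `/mnt/sparkcontainer/Triage/.csv`. The following is a minimal sketch of a guard that could run right after the widget values are read. The helper name `require_value` is hypothetical, and the literal values stand in for what the widgets would return.

```python
def require_value(name, value):
    """Return the stripped widget value, or raise if it is empty."""
    cleaned = value.strip()
    if not cleaned:
        raise ValueError("Widget '{0}' must not be empty".format(name))
    return cleaned


# Example values; in the notebook these would come from the widgets.
file_name = require_value("file_name", " AirportCodeLocationLookupClean ")
print(file_name)

try:
    require_value("table_name", "")
except ValueError as err:
    print(err)
```

Failing early here keeps the later read, write, and `DROP TABLE` cells from running against an empty name.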

### Step 2: Read the data

Now that we have specified our file metadata, we can create a DataFrame. The `inferSchema` option tells Spark to derive the column types from the file contents, and the `header` option treats the first row as column names. If we already know the schema, we can supply it explicitly instead of inferring it.

First, let's create a DataFrame in Python.

In [0]:
source_location = "{0}/Triage/{1}.{2}".format(data_location, file_name, file_type)
df = spark.read.format(file_type).options(inferSchema='true', header='true').load(source_location)

display(df)

AIRPORT_ID,AIRPORT,DISPLAY_AIRPORT_NAME,LATITUDE,LONGITUDE
10001,01A,Afognak Lake Airport,58.10944444,-152.9066667
10003,03A,Bear Creek Mining Strip,65.54805556,-161.0716667
10004,04A,Lik Mining Camp,68.08333333,-163.1666667
10005,05A,Little Squaw Airport,67.57,-148.1838889
10006,06A,Kizhuyak Bay,57.74527778,-152.8827778
10007,07A,Klawock Seaplane Base,55.55472222,-133.1016667
10008,08A,Elizabeth Island Airport,59.15694444,-151.8291667
10009,09A,Augustin Island,59.36277778,-153.4305556
10010,1B1,Columbia County,42.29138889,-73.71027778
10011,1G4,Grand Canyon West,35.98611111,-113.8169444
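
The explicit-schema alternative mentioned above could look roughly like the sketch below. `DataFrameReader.schema` accepts a DDL-formatted string, so the schema can be assembled in plain Python. The column names come from the output shown above; the column types are assumptions based on the displayed values, not something the notebook states.

```python
# Column names from the displayed output; the types are assumed for illustration.
airport_columns = [
    ("AIRPORT_ID", "INT"),
    ("AIRPORT", "STRING"),
    ("DISPLAY_AIRPORT_NAME", "STRING"),
    ("LATITUDE", "DOUBLE"),
    ("LONGITUDE", "DOUBLE"),
]

# Build a DDL-formatted schema string such as "AIRPORT_ID INT, AIRPORT STRING, ..."
schema_ddl = ", ".join(
    "{0} {1}".format(name, dtype) for name, dtype in airport_columns
)
print(schema_ddl)

# In the notebook, the string would replace the inferSchema option:
# df = (spark.read.format(file_type)
#       .schema(schema_ddl)
#       .option("header", "true")
#       .load(source_location))
```

Declaring the types up front means a change in the file contents cannot silently change the column types of the resulting table.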


In [0]:
target_location = "{0}/Bronze/{1}".format(data_location, file_name)
df.write.format("delta").mode("overwrite").save(target_location)
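# Alternative for files that contain Year and Month columns: replace the schema and partition the output
#df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").partitionBy("Year", "Month").save(target_location)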
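
The commented-out alternative partitions by `Year` and `Month`, which only works when the file has those columns. Below is a small sketch of a check that could run before choosing the partitioned write. The helper name `missing_partition_columns` is hypothetical, and the case-insensitive comparison assumes Spark's default case-insensitive column resolution.

```python
def missing_partition_columns(columns, partition_by):
    """Return the partition columns that are absent from the DataFrame columns."""
    # Assumption: column names are matched case-insensitively, as Spark does by default.
    available = {name.lower() for name in columns}
    return [name for name in partition_by if name.lower() not in available]


# Columns of the airport lookup file shown earlier in this notebook.
airport_columns = ["AIRPORT_ID", "AIRPORT", "DISPLAY_AIRPORT_NAME", "LATITUDE", "LONGITUDE"]
print(missing_partition_columns(airport_columns, ["Year", "Month"]))

# In the notebook, the first argument would be df.columns:
# if not missing_partition_columns(df.columns, ["Year", "Month"]):
#     ... use the partitioned write ...
```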

In [0]:
spark.conf.set("tables.location", target_location)

In [0]:
%sql
SET table_Location = ${tables.location}

key,value
table_Location,/mnt/sparkcontainer/Bronze/AirportCodeLocationLookupClean


In [0]:
%sql
DROP TABLE IF EXISTS ${table_name};

CREATE TABLE ${table_name}
USING DELTA LOCATION '${tables.location}'
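
Because the widget value is substituted directly into the SQL text, it can help to validate the table name first. The sketch below builds the same two statements in Python and rejects names that are not plain identifiers. The helper name, the identifier pattern, and the example table name `airport_codes` are all assumptions made for illustration.

```python
import re

# Assumed rule: an optional database prefix plus a name made of letters, digits, and underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def create_table_statements(table_name, location):
    """Return the DROP and CREATE statements for a Delta table at the given location."""
    if not IDENTIFIER.fullmatch(table_name):
        raise ValueError("Invalid table name: {0!r}".format(table_name))
    return [
        "DROP TABLE IF EXISTS {0}".format(table_name),
        "CREATE TABLE {0} USING DELTA LOCATION '{1}'".format(table_name, location),
    ]


statements = create_table_statements(
    "airport_codes",  # hypothetical table name
    "/mnt/sparkcontainer/Bronze/AirportCodeLocationLookupClean",
)
for statement in statements:
    print(statement)
    # In the notebook, each statement could be run with spark.sql(statement)
```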

In [0]:
%sql
SELECT * FROM ${table_name}

AIRPORT_ID,AIRPORT,DISPLAY_AIRPORT_NAME,LATITUDE,LONGITUDE
10001,01A,Afognak Lake Airport,58.10944444,-152.9066667
10003,03A,Bear Creek Mining Strip,65.54805556,-161.0716667
10004,04A,Lik Mining Camp,68.08333333,-163.1666667
10005,05A,Little Squaw Airport,67.57,-148.1838889
10006,06A,Kizhuyak Bay,57.74527778,-152.8827778
10007,07A,Klawock Seaplane Base,55.55472222,-133.1016667
10008,08A,Elizabeth Island Airport,59.15694444,-151.8291667
10009,09A,Augustin Island,59.36277778,-153.4305556
10010,1B1,Columbia County,42.29138889,-73.71027778
10011,1G4,Grand Canyon West,35.98611111,-113.8169444


In [0]:
%sql
DESCRIBE TABLE EXTENDED ${table_name}

col_name,data_type,comment
Year,int,
Month,int,
DayofMonth,int,
DayOfWeek,int,
Carrier,string,
CRSDepTime,int,
DepDelay,int,
DepDel15,int,
CRSArrTime,int,
ArrDelay,int,
