##Load data from ADLS (Azure Data Lake Storage)
Enter the credentials for your storage account (ADLS).

In [2]:
STORAGE_ACCOUNT_NAME = '<YOUR_STORAGE_ACCOUNT>'
CONTAINER_INPUT_RAW = '<YOUR_CONTAINER>'
CONTAINER_INPUT_TABLES = '<YOUR_CONTAINER>'
CONTAINER_INPUT_MODELS = '<YOUR_CONTAINER>'
KEY = '<YOUR_KEY>'
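
Hard-coding the account key in a notebook cell exposes it to anyone who can read the notebook. A minimal sketch of fetching it from a Databricks secret scope instead, using `dbutils.secrets.get`. The helper name `load_storage_key`, the scope name `adls-scope`, and the key name `storage-account-key` are hypothetical; the scope and secret are assumed to have been created beforehand. `dbutils` is passed in as an argument so the function does not depend on notebook globals.

```python
def load_storage_key(dbutils, scope="adls-scope", key_name="storage-account-key"):
    """Return the storage account key stored in a Databricks secret scope.

    The default scope and key names are placeholders (assumptions), not
    values taken from this notebook.
    """
    return dbutils.secrets.get(scope=scope, key=key_name)
```

In the notebook this could replace the literal assignment, for example `KEY = load_storage_key(dbutils)`.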

##Mount ADLS
We will mount three containers (raw data, tables, and models).

In [4]:
dbutils.fs.mount(
  source = "wasbs://{}@{}.blob.core.windows.net".format(CONTAINER_INPUT_RAW, STORAGE_ACCOUNT_NAME),
  mount_point = "/mnt/{}".format(CONTAINER_INPUT_RAW),
  extra_configs = {"fs.azure.account.key.{}.blob.core.windows.net".format(STORAGE_ACCOUNT_NAME):"{}".format(KEY)})

In [5]:
dbutils.fs.mount(
  source = "wasbs://{}@{}.blob.core.windows.net".format(CONTAINER_INPUT_TABLES, STORAGE_ACCOUNT_NAME),
  mount_point = "/mnt/{}".format(CONTAINER_INPUT_TABLES),
  extra_configs = {"fs.azure.account.key.{}.blob.core.windows.net".format(STORAGE_ACCOUNT_NAME):"{}".format(KEY)})

In [6]:
dbutils.fs.mount(
  source = "wasbs://{}@{}.blob.core.windows.net".format(CONTAINER_INPUT_MODELS, STORAGE_ACCOUNT_NAME),
  mount_point = "/mnt/{}".format(CONTAINER_INPUT_MODELS),
  extra_configs = {"fs.azure.account.key.{}.blob.core.windows.net".format(STORAGE_ACCOUNT_NAME):"{}".format(KEY)})
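
The three mount cells differ only in the container name, and rerunning a mount cell fails once the mount point exists. A minimal sketch of a helper that builds the arguments in one place and skips containers that are already mounted. `mount_args` and `mount_container` are hypothetical names, and `dbutils` is passed in explicitly so the logic can also be exercised outside a notebook.

```python
def mount_args(container, account, key):
    """Build the keyword arguments for dbutils.fs.mount for one container."""
    host = "{}.blob.core.windows.net".format(account)
    return {
        "source": "wasbs://{}@{}".format(container, host),
        "mount_point": "/mnt/{}".format(container),
        "extra_configs": {"fs.azure.account.key.{}".format(host): key},
    }


def mount_container(dbutils, container, account, key):
    """Mount a container unless its mount point is already in use.

    Returns True if a new mount was created, False if it already existed.
    """
    args = mount_args(container, account, key)
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if args["mount_point"] in existing:
        return False
    dbutils.fs.mount(**args)
    return True
```

In the notebook this could be called once per container, for example `mount_container(dbutils, CONTAINER_INPUT_RAW, STORAGE_ACCOUNT_NAME, KEY)`.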

##Read from ADLS to Spark
Read the data from ADLS into a Spark DataFrame.

In [8]:
FILENAME = "/mnt/{}/UsedCars.csv".format(CONTAINER_INPUT_RAW)

# The surrounding parentheses already allow line breaks, so no backslashes are needed
UsedCars = (spark
  .read
  .option("header", True)
  .option("sep", ',')
  .csv(FILENAME)
)

In [9]:
display(UsedCars)
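
With only `header` and `sep` set, the CSV reader loads every column as a string. A minimal sketch of a variant that also enables the `inferSchema` option, so that numeric columns receive numeric types at the cost of an extra pass over the file. `read_csv_typed` is a hypothetical helper, and the session is passed in as an argument rather than taken from the notebook's global `spark`.

```python
def read_csv_typed(spark, path, sep=","):
    """Read a CSV file with a header row and let Spark infer column types."""
    return (spark.read
            .options(header=True, sep=sep, inferSchema=True)
            .csv(path))
```

In the notebook this could be used as `UsedCars = read_csv_typed(spark, FILENAME)`, followed by `UsedCars.printSchema()` to inspect the inferred types.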

##Create a temporary table
We can register the Spark DataFrame as a temporary view so that it can be queried with Spark SQL.

In [11]:
temp_table_name = "UsedCars_temp"

UsedCars.createOrReplaceTempView(temp_table_name)

In [12]:
%sql
select * from UsedCars_temp

##Persist to a Permanent Table
To share the data with all notebooks and users, we can persist the Spark DataFrame to a permanent table. There are two options: a **managed table** or an **unmanaged table**. Dropping a managed table deletes its data as well, whereas dropping an unmanaged table removes only the metadata and leaves the files at the given path. The differences are described in more detail in this [link](https://docs.databricks.com/data/tables.html#managed-and-unmanaged-tables&language-python). *In almost all use cases, unmanaged tables are preferred.*

###Create a managed table

In [15]:
permanent_table_name = "usedcars_managed"
UsedCars.write.saveAsTable(permanent_table_name)

###Create an unmanaged table

In [17]:
permanent_table_name = "usedcars_unmanaged"
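# Write the data files to the mounted tables container rather than a hard-coded path
UsedCars.write.option('path', "/mnt/{}/".format(CONTAINER_INPUT_TABLES)).saveAsTable(permanent_table_name)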

##Review Spark Catalog

Note the `tableType` field for our tables and views:
- The unmanaged table `usedcars_unmanaged` is `EXTERNAL`
- The managed table `usedcars_managed` is `MANAGED`
- The temp view `usedcars_temp` is `TEMPORARY`

In [19]:
spark.catalog.listTables()
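
The raw listing can be hard to scan once the database holds more than a few tables. A minimal sketch that groups the catalog entries by their `tableType` field; `tables_by_type` is a hypothetical helper that relies only on the `name` and `tableType` attributes of the entries returned by `spark.catalog.listTables()`.

```python
from collections import defaultdict


def tables_by_type(tables):
    """Group catalog entries by tableType, with names sorted within each group."""
    grouped = defaultdict(list)
    for table in tables:
        grouped[table.tableType].append(table.name)
    return {kind: sorted(names) for kind, names in grouped.items()}
```

In the notebook this could be called as `tables_by_type(spark.catalog.listTables())`, which should place each of the three objects created above under its own type.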

##Unmount if necessary

In [21]:
# Only the last expression in a cell is shown, so display the mount list explicitly
display(dbutils.fs.mounts())
dbutils.fs.ls('/mnt/')

#CONTAINER = 'rawdata'
#dbutils.fs.unmount("/mnt/{}".format(CONTAINER))
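
A cleanup cell is easier to rerun if it first checks whether the mount point exists. A minimal sketch under that assumption; `unmount_if_mounted` is a hypothetical helper, and `dbutils` is passed in explicitly so the check can be exercised outside a notebook.

```python
def unmount_if_mounted(dbutils, mount_point):
    """Unmount mount_point only if it appears in the current mount list.

    Returns True if an unmount was performed, False if nothing was mounted there.
    """
    mounted = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point not in mounted:
        return False
    dbutils.fs.unmount(mount_point)
    return True
```

In the notebook this could be called as `unmount_if_mounted(dbutils, "/mnt/{}".format(CONTAINER_INPUT_RAW))`.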