
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 1200px">
</div>

# Reading and Writing Data - Azure Data Lake Storage Gen2
**Technical Accomplishments:**
- Access an Azure Data Lake Storage Gen2 filesystem by mounting to DBFS

**Requirements:**
- ADLS Gen2 storage account in the same region as your Azure Databricks workspace
- A service principal with delegated permissions, or a storage account access key

In the steps below, you will create an ADLS Gen2 storage account and access it using a service principal.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Setup<br>

For each lesson to execute correctly, please make sure to run the **`Classroom-Setup`** cell at the start of each lesson (see the next cell) and the **`Classroom-Cleanup`** cell at the end of each lesson.

In [0]:
%run "../Includes/Classroom-Setup"

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Azure Data Lake Storage Gen2 (ADLS Gen2)

Azure Data Lake Storage Gen2 is a next-generation data lake solution for big data analytics.

ADLS Gen2 builds ADLS Gen1 capabilities - such as file system semantics, file-level security, and scale - into Azure Blob Storage, with its low-cost tiered storage, high availability, and disaster recovery features.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Create Service Principal
You can access an ADLS Gen2 filesystem by using a **service principal** or by using the **storage account access key** directly.

However, if you want to mount the filesystem to DBFS, this requires OAuth 2.0 authentication, which means you'll have to use a **service principal**.
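For comparison, direct access with the storage account access key involves no mount: the key is set as a Spark configuration value and files are addressed by their full `abfss://` URI. The sketch below only builds that configuration key and URI. The helper names `account_key_conf` and `abfss_uri` are hypothetical, and the account name, file system name, and key in the usage comments are placeholders.

```python
def account_key_conf(storage_account_name, access_key):
    """Return the (key, value) Spark conf pair for account-key access to ADLS Gen2."""
    conf_key = f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net"
    return conf_key, access_key


def abfss_uri(file_system_name, storage_account_name, path=""):
    """Build a full abfss:// URI for a path inside an ADLS Gen2 file system."""
    host = f"{storage_account_name}.dfs.core.windows.net"
    return f"abfss://{file_system_name}@{host}/{path.lstrip('/')}"


# In a Databricks notebook (where `spark` is predefined), usage might look like:
#   key, value = account_key_conf("mystorageacct", "<access-key>")
#   spark.conf.set(key, value)
#   df = spark.read.json(abfss_uri("myfilesystem", "mystorageacct", "some/file.json"))
```

Because the configuration is set on the Spark session, this form of access is not a mount and is not visible under `/mnt`.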

As you work through the following steps, record the **Directory ID**, **Application ID**, and **Secret** in the cell below:
1. In Azure Active Directory, go to Properties and note the **Directory ID**
1. Go to App Registrations and create a new application registration
   * For example: name `airlift-app-registration`, type Web app/API, URL `https://can-be-literally-anything.com` (any valid URL works)
1. Note the **Application ID**
1. Go to Certificates & Secrets, create a new client **secret**, and copy its value

In [0]:
# TODO
directoryID = FILL_IN
applicationID = FILL_IN
keyValue = FILL_IN
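The Directory ID and Application ID are both GUIDs, so a quick format check can catch a mis-pasted value before the mount step fails with a less obvious authentication error. This is a minimal sketch; `is_guid` is a hypothetical helper, not part of the course includes, and it only checks the format, not whether the ID exists in Azure.

```python
import uuid


def is_guid(value):
    """Return True if `value` is a GUID in canonical hyphenated form."""
    try:
        # uuid.UUID accepts several spellings; comparing against its canonical
        # string form keeps only the 8-4-4-4-12 hyphenated layout.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


# Possible usage once the cell above is filled in:
#   assert is_guid(directoryID), "Directory ID does not look like a GUID"
#   assert is_guid(applicationID), "Application ID does not look like a GUID"
```

The client secret is not a GUID, so `keyValue` is deliberately left out of the check.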

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Create Storage Account

Follow these steps to <a href="https://docs.microsoft.com/en-us/azure/storage/data-lake-storage/quickstart-create-account" target="_blank">create your ADLS Gen2 storage account</a>.

As you work through the following steps, record the **Storage Account Name** and **File System Name** in the cell below:
1. In the Azure Portal, select Create a resource > Storage account
1. Specify the correct *Resource Group* and *Region*, and choose a globally unique **Storage Account Name** (3 to 24 lowercase letters and digits)
   - Ensure the storage account is in the same region as your Azure Databricks workspace
1. Go to the **Advanced** tab and enable **Hierarchical namespace**
1. Create a Data Lake Gen2 file system on the storage account and note the **File System Name**
1. Under Access control (IAM), add a **Role assignment** with the role **Storage Blob Data Contributor (Preview)**, assigned to the App Registration created previously

In [0]:
# TODO
storageAccountName = FILL_IN.VALUE
fileSystemName = FILL_IN.VALUE
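Both names end up inside the `abfss://` URI, so a typo here surfaces later as a mount failure. The sketch below checks them against Azure's naming rules: storage account names are 3 to 24 lowercase letters and digits, and file system (container) names are 3 to 63 lowercase letters, digits, and single hyphens, starting and ending with a letter or digit. The function names are hypothetical.

```python
import re

# 3-24 characters, lowercase letters and digits only
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# starts and ends with a letter or digit; hyphens never appear back to back
_FILE_SYSTEM_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])+")


def is_valid_storage_account_name(name):
    """Check a storage account name against Azure's naming rules."""
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_file_system_name(name):
    """Check a file system (container) name against Azure's naming rules."""
    return 3 <= len(name) <= 63 and _FILE_SYSTEM_RE.fullmatch(name) is not None


# Possible usage once the cell above is filled in:
#   assert is_valid_storage_account_name(storageAccountName)
#   assert is_valid_file_system_name(fileSystemName)
```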

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Mount ADLS Gen2 filesystem to DBFS

Use **`dbutils.fs.mount`** to <a href="https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html#mount-an-adls-filesystem-to-dbfs-using-a-service-principal-and-oauth-2-0" target="_blank">mount this filesystem</a> to DBFS.

In [0]:
# OAuth 2.0 client-credentials settings for the service principal created above
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": applicationID,
  "fs.azure.account.oauth2.client.secret": keyValue,
  "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{directoryID}/oauth2/token"
}

# `username` is assumed to be defined by Classroom-Setup
mountPoint = f"/mnt/{username}-{fileSystemName}"
# abfss:// is the TLS-secured scheme of the ABFS driver for ADLS Gen2
source = f"abfss://{fileSystemName}@{storageAccountName}.dfs.core.windows.net/"

In [0]:
dbutils.fs.mount(
  source = source,
  mount_point = mountPoint,
  extra_configs = configs)
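`dbutils.fs.mount` raises an error if the mount point is already in use, which makes the cell above fail when it is re-run. One way to make the step repeatable is to check `dbutils.fs.mounts()` first. The sketch below assumes that approach; `mount_if_absent` is a hypothetical helper, and `dbutils` is passed in explicitly so the function does not depend on the notebook's global.

```python
def mount_if_absent(dbutils, source, mount_point, extra_configs):
    """Mount `source` at `mount_point` unless that mount point already exists.

    Returns True if a new mount was created, False if it was already present.
    """
    existing = {m.mountPoint for m in dbutils.fs.mounts()}
    if mount_point in existing:
        return False
    dbutils.fs.mount(
        source=source,
        mount_point=mount_point,
        extra_configs=extra_configs,
    )
    return True


# Possible usage in this notebook:
#   mount_if_absent(dbutils, source, mountPoint, configs)
```

Note that this only checks the mount point path. It does not verify that an existing mount points at the same source or uses the same credentials.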

### Access Your Files
Now you can access files in your ADLS Gen2 file system through the mount point, as if they were stored in DBFS.

In [0]:
files = dbutils.fs.ls(mountPoint)

display(files)

In [0]:
wikiEditsDF = spark.read.json(f"{mountPoint}/wikipedia/edits/snapshot-2016-05-26.json")

display(wikiEditsDF)

In [0]:
(wikiEditsDF.write
  .mode("overwrite")
  .format("delta")
  .partitionBy("channel")
  .save(f"{mountPoint}/wikiEdits")
)

### Unmount a Mount Point
Use **`dbutils.fs.unmount`** to unmount a mount point.

In [0]:
dbutils.fs.unmount(mountPoint)
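`dbutils.fs.unmount` fails if the path is not currently mounted, for example when this cell is run twice. A guarded version, sketched below under the assumption that checking `dbutils.fs.mounts()` first is acceptable, avoids that. The helper name `unmount_if_present` is hypothetical.

```python
def unmount_if_present(dbutils, mount_point):
    """Unmount `mount_point` if it is currently mounted; return True if it was."""
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.unmount(mount_point)
        return True
    return False


# Possible usage in this notebook:
#   unmount_if_present(dbutils, mountPoint)
```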

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Cleanup<br>

Run the **`Classroom-Cleanup`** cell below to remove any artifacts created by this lesson.

In [0]:
%run "../Includes/Classroom-Cleanup"


&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>