
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 1200px">
</div>

# Reading and Writing Data - Azure Blob Storage
**Technical Accomplishments:**
- Access Azure Blob Storage directly using the DataFrame API
- Access Azure Blob Storage by mounting storage with DBFS

**Requirements:**
- A Shared Key or Shared Access Signature (for accessing *private* storage accounts)

You will configure a Shared Key and Shared Access Signature (SAS) in the steps below.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Setup<br>

For each lesson to execute correctly, please make sure to run the **`Classroom-Setup`** cell at the start of each lesson (see the next cell) and the **`Classroom-Cleanup`** cell at the end of each lesson.

In [0]:
%run "../Includes/Classroom-Setup"

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Azure Blob Storage
Azure Blob Storage is a service for storing large amounts of unstructured object data, such as text or binary data.

You can use Blob Storage to expose data publicly to the world, or to store application data privately.

This lesson covers two ways to access files in Azure Blob Storage.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Method 1: Access Azure Blob Storage Directly

It is possible to read directly from Azure Blob Storage using the <a href="https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html#access-azure-blob-storage-using-the-dataframe-api" target="_blank">Spark API and Databricks APIs</a>.

### Configure Access to a Container Using a Shared Access Signature (SAS)

A <a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-dotnet-shared-access-signature-part-1" target="_blank">Shared Access Signature (SAS)</a> grants other clients limited access to objects in your storage account without exposing your account key.

Run the cell below to set up a SAS in your notebook by configuring the <a href="https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html" target="_blank">SparkSession</a>.

In this example, we provide you with a SAS and storage container. Normally, you'll get these values from Azure, and then configure your SparkSession with these values.

In [0]:
(source, sasEntity, sasToken) = getAzureDataSource()

spark.conf.set(sasEntity, sasToken)

displayHTML("""
Retrieved the following values:
  <li><b style="color:green">source:</b> <code>{}</code></li>
  <li><b style="color:green">sasEntity:</b> <code>{}</code></li>
  <li><b style="color:green">sasToken:</b> <code>{}</code></li><br>

Successfully set up a Shared Access Signature (SAS) in your notebook.
""".format(source, sasEntity, sasToken))
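The `sasEntity` value above is a Spark configuration key that names the container and storage account the token applies to. As a rough sketch, assuming the `fs.azure.sas.<container>.<account>.blob.core.windows.net` key format described in the Databricks documentation for Blob Storage, the key and the matching `wasbs://` path could be built as follows. The function name and the account and container names are hypothetical and are not part of the classroom setup.

```python
def build_sas_config(storage_account, container):
    """Return (config_key, source_uri) for SAS access to one container."""
    host = f"{storage_account}.blob.core.windows.net"
    # Assumed key format: fs.azure.sas.<container>.<account>.blob.core.windows.net
    config_key = f"fs.azure.sas.{container}.{host}"
    # Root path of the container, using the secure WASB scheme
    source_uri = f"wasbs://{container}@{host}/"
    return config_key, source_uri


# Hypothetical names, for illustration only
config_key, source_uri = build_sas_config("examplestorage", "training")
print(config_key)
print(source_uri)
```

In a notebook, you would then pass the key to `spark.conf.set` together with the SAS token you obtained from Azure.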

### Read Data from Storage Account
You can now use standard Spark and Databricks APIs to read from the storage account.

In [0]:
path = source + "wikipedia/edits/snapshot-2016-05-26.json"
wikiEditsDF = spark.read.json(path)
display(wikiEditsDF)

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Method 2: Mount Azure Blob Storage with DBFS

Another way to access your Azure Blob Storage is to mount your storage with DBFS.

For this example, you'll create your own Azure <a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal" target="_blank">Storage Account</a> and <a href="https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-portal" target="_blank">Container</a>.

This time, let's use a <a href="https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key" target="_blank">Shared Key</a> to authorize requests to the Storage Account.

### Create an Azure Storage Account and Container
As you work through the following steps, record the **Storage Account Name**, **Container Name**, and **Access Key** in the cell below:
0. Access the Azure Portal > Create a new resource > Storage account
0. Specify the correct *Resource Group* and *Region*, and choose a globally unique **Storage Account Name** (3 to 24 characters, lowercase letters and digits only)
0. Access the new Storage account > Access Blobs
0. Create a New Container, choosing a **Container Name** of 3 to 63 characters (lowercase letters, digits, and hyphens)
0. Retrieve the primary **Access Key** for the new Storage Account

In [0]:
# TODO
storageAccountName = FILL_IN.VALUE
containerName = FILL_IN.VALUE
storageEntity = f"fs.azure.account.key.{storageAccountName}.blob.core.windows.net"
accessKey = FILL_IN.VALUE
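A quick local check of the names you recorded can catch typos before you try to mount. The sketch below assumes Azure's documented naming rules (summarized in the comments). The helper names and sample values are hypothetical.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")
# Container names: lowercase letters, digits, and single hyphens;
# must start and end with a letter or digit. Length (3-63) is checked below.
CONTAINER_NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_account_name(name):
    return ACCOUNT_NAME_RE.fullmatch(name) is not None


def is_valid_container_name(name):
    return 3 <= len(name) <= 63 and CONTAINER_NAME_RE.fullmatch(name) is not None


# Hypothetical values, for illustration only
print(is_valid_account_name("examplestorage"))
print(is_valid_container_name("my--container"))
```

You could run both checks against `storageAccountName` and `containerName` once you have filled them in.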

### Mount Container Using DBFS
Use **`dbutils.fs.mount`** to <a href="https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html#mount-azure-blob-storage-containers-with-dbfs" target="_blank">mount this container</a> (or a folder inside this container) to DBFS.

First, let's construct the **`sourceURI`** and **`mountPoint`** variables:

In [0]:
sourceURI = f"wasbs://{containerName}@{storageAccountName}.blob.core.windows.net/"
mountPoint = f"/mnt/{username}-{containerName}"

displayHTML(f"""
  <li><b style="color:green">sourceURI:</b> <code>{sourceURI}</code></li>
  <li><b style="color:green">mountPoint:</b> <code>{mountPoint}</code></li>
""")

Now we can mount the container:

In [0]:
dbutils.fs.mount(
  source = sourceURI, 
  mount_point = mountPoint, 
  extra_configs = {storageEntity: accessKey})
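Running the mount cell a second time typically fails because the mount point is already in use. One way to guard against that is to check the existing mounts first. The sketch below is a minimal, hypothetical helper. It assumes each mount record exposes a `mountPoint` attribute, as the entries returned by `dbutils.fs.mounts()` do, and it uses stand-in records so that it runs outside a notebook.

```python
from collections import namedtuple


def is_mounted(mount_point, mounts):
    """Return True if mount_point appears in a list of mount records."""
    return any(m.mountPoint == mount_point for m in mounts)


# Stand-in records for illustration; in a notebook, use dbutils.fs.mounts().
MountInfo = namedtuple("MountInfo", ["mountPoint", "source"])
existing = [
    MountInfo("/mnt/training", "wasbs://training@examplestorage.blob.core.windows.net/"),
]

print(is_mounted("/mnt/training", existing))
print(is_mounted("/mnt/other", existing))
```

In the notebook, you could call `dbutils.fs.mount` only when `is_mounted(mountPoint, dbutils.fs.mounts())` returns `False`.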

### Read and Write Data to Storage Account with DBFS
Now, you can access files in your container as if they were local files!

In [0]:
files = dbutils.fs.ls(mountPoint)

display(files)

Data written to this mount point is stored in the container in your storage account.

In [0]:
path = f"{mountPoint}/wikiEdits.parquet"

(wikiEditsDF
  .write
  .mode("overwrite")
  .parquet(path))

### Unmount a Mount Point
Use **`dbutils.fs.unmount`** to unmount a mount point.

In [0]:
dbutils.fs.unmount(mountPoint)

To confirm, list all current mounts. The mount point you just removed should no longer appear.

In [0]:
display(dbutils.fs.mounts())

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Cleanup<br>

Run the **`Classroom-Cleanup`** cell below to remove any artifacts created by this lesson.

In [0]:
%run "../Includes/Classroom-Cleanup"


&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>