# ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Key Vault-Backed Secret Scopes

## Learning Objectives
By the end of these lessons, you should be able to:
* Configure Databricks to access Key Vault secrets
* Read data from and write data to Blob Storage directly, using secrets stored in Key Vault
* Set different levels of access permission using SAS at the Storage service level
* Mount Blob Storage into DBFS
* Describe how mounting impacts secure access to data

The overall goal of these three notebooks is to read data from and write data to Blob Storage directly, using secrets that are stored in Azure Key Vault and accessed securely through the Databricks Secrets utility.

This goal has been broken into 3 notebooks to make each step more digestible:
1. `1 - Blob Storage` - In the first notebook, we will upload a file to a Blob Storage container in a Storage Account and generate SAS tokens with different permission levels
1. `2 - Key Vault` - In the second notebook, we will configure an Azure Key Vault access policy and add text-based credentials as secrets
1. `3 - Key Vault Backed Secret Scopes` - In the third notebook, we will define a secret scope in Databricks by linking it to the Key Vault, then use the previously stored credentials to read from and write to the Storage container

### Online Resources

- [Azure Databricks Secrets](https://docs.azuredatabricks.net/user-guide/secrets/index.html)
- [Azure Key Vault](https://docs.microsoft.com/en-us/azure/key-vault/key-vault-whatis)
- [Azure Databricks DBFS](https://docs.azuredatabricks.net/user-guide/dbfs-databricks-file-system.html)
- [Introduction to Azure Blob storage](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
- [Databricks with Azure Blob Storage](https://docs.databricks.com/spark/latest/data-sources/azure/azure-storage.html)
- [Azure Data Lake Storage Gen1](https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html#mount-azure-data-lake)
- [Azure Data Lake Storage Gen2](https://docs.databricks.com/spark/latest/data-sources/azure/azure-datalake-gen2.html)

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) 3 - Key Vault Backed Secret Scopes

In this notebook, we will use a Databricks secret scope to connect securely to the Key Vault. The secret scope will allow us to use the Blob Storage SAS tokens, stored as secrets in the Key Vault, to read data from and write data to Blob Storage.

### Learning Objectives
By the end of this lesson, you should be able to:
- Create a Secret Scope connected to Azure Key Vault
- Mount Blob Storage to DBFS using a SAS token
- Write data to Blob using a SAS token in Spark Configuration

### Classroom setup

A quick script to define a username variable in Python and Scala.

In [4]:
%run ./Includes/User-Name


## Access Azure Databricks Secrets UI

Now that you have an instance of Azure Key Vault up and running, it is time to let Azure Databricks know how to connect to it.

The first step is to open a new web browser tab and navigate to `https://<your_azure_databricks_url>#secrets/createScope`.

<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> In your workspace URL, the number after `?o=` is the unique workspace identifier; append `#secrets/createScope` to that URL.

<img src="https://files.training.databricks.com/images/adbcore/config-keyvault/db-secrets.png" width=800px />
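
If you would rather construct this URL from the workspace URL in your browser, the sketch below does it with the Python standard library. It keeps the `?o=` query parameter and replaces any existing `#...` fragment, since simply appending to a URL that already has a fragment would produce the wrong fragment. `create_scope_url` is a hypothetical helper, and the host name is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def create_scope_url(workspace_url: str) -> str:
    """Return the Secrets UI 'create scope' URL for a workspace URL.

    Keeps scheme, host, path and query (e.g. ?o=<workspace-id>) and
    replaces any existing fragment with 'secrets/createScope'.
    """
    parts = urlsplit(workspace_url)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path or "/", parts.query, "secrets/createScope")
    )


# Placeholder workspace URL; substitute the one from your browser.
print(create_scope_url("https://example.azuredatabricks.net/?o=1234567890123456"))
```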

## Link Azure Databricks to Key Vault
We'll copy some values from the Azure Portal and paste them into this UI.

In the Azure Portal on your Key Vault tab:
1. Open **Properties**
2. Copy the DNS Name
3. Copy the Resource ID

<img src="https://files.training.databricks.com/images/adbcore/config-keyvault/properties.png" width=800px />
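
Because a DNS Name from one vault and a Resource ID from another are easy to mix up when copying between tabs, a quick sanity check can confirm that both values name the same vault before you paste them. This is a minimal sketch assuming the Azure public cloud (`*.vault.azure.net`); sovereign clouds use different DNS suffixes, and `vault_names_match` is a hypothetical helper, not part of any Databricks or Azure SDK.

```python
import re

# Expected shapes (Azure public cloud):
#   DNS Name:    https://<vault-name>.vault.azure.net/
#   Resource ID: /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>
_DNS_RE = re.compile(r"^https://(?P<name>[A-Za-z0-9-]+)\.vault\.azure\.net/?$", re.IGNORECASE)
_RID_RE = re.compile(
    r"^/subscriptions/[^/]+/resourceGroups/[^/]+"
    r"/providers/Microsoft\.KeyVault/vaults/(?P<name>[A-Za-z0-9-]+)/?$",
    re.IGNORECASE,
)


def vault_names_match(dns_name: str, resource_id: str) -> bool:
    """Return True if both values are well formed and refer to the same vault."""
    dns = _DNS_RE.match(dns_name.strip())
    rid = _RID_RE.match(resource_id.strip())
    if not dns or not rid:
        return False
    # Vault names are part of a DNS host name, so compare case-insensitively.
    return dns.group("name").lower() == rid.group("name").lower()
```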

### Add the values copied from Azure Key Vault to the Databricks Secret Scope UI


In the Databricks Secrets UI:

1. Enter the name of the secret scope; here, we'll use `students`.
2. Paste the DNS Name
3. Paste the Resource ID
4. Click "Create"

<img src="https://files.training.databricks.com/images/adbcore/config-keyvault/db-secrets-complete.png" />

  > MANAGE permission allows users to read and write to this secret scope, and, in the case of accounts on the Azure Databricks Premium Plan, to change permissions for the scope.

  > Your account must have the Azure Databricks Premium Plan for you to be able to select Creator in the Manage Principal field. This is the recommended approach: grant MANAGE permission to the Creator when you create the secret scope, and then assign more granular access permissions after you have tested the scope.
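
The same scope can also be created with the Databricks Secrets REST API (`POST /api/2.0/secrets/scopes/create`). The sketch below only builds the request body; it does not send it. The field names follow the Secrets API 2.0 as we understand it: leaving out `initial_manage_principal` corresponds to the Creator option (the caller receives MANAGE), while `"users"` corresponds to All Users. Verify against the current API reference; for Key Vault-backed scopes the documentation also calls for a Microsoft Entra ID (Azure AD) token rather than a personal access token. `key_vault_scope_payload` is a hypothetical helper, and the `<...>` values are placeholders.

```python
import json


def key_vault_scope_payload(scope: str, dns_name: str, resource_id: str,
                            all_users_manage: bool = False) -> dict:
    """Build a request body for POST /api/2.0/secrets/scopes/create.

    Mirrors the UI form: scope name, DNS Name and Resource ID.
    """
    payload = {
        "scope": scope,
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": resource_id,
            "dns_name": dns_name,
        },
    }
    if all_users_manage:
        # "users" grants MANAGE to all workspace users ("All Users" in the UI).
        # Omitting the field grants MANAGE to the caller ("Creator").
        payload["initial_manage_principal"] = "users"
    return payload


body = key_vault_scope_payload(
    scope="students",
    dns_name="https://<vault-name>.vault.azure.net/",
    resource_id="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault-name>",
)
print(json.dumps(body, indent=2))
```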

### Apply Changes

After a moment, you will see a dialog verifying that the secret scope has been created. Click "Ok" to close the box.

<img src="https://files.training.databricks.com/images/adbcore/config-keyvault/db-secrets-confirm.png" />

### List Secret Scopes

You can list all secret scopes currently available in your workspace with the `dbutils.secrets` utility:

In [10]:
%python
dbutils.secrets.listScopes()
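
`listScopes()` returns a list of scope objects, each with a `name` attribute, so a notebook can check that the expected scope exists before it tries to read from it. A minimal sketch; `scope_exists` is a hypothetical helper that takes `dbutils` as an argument so it can also live in a shared module.

```python
def scope_exists(dbutils, scope_name: str) -> bool:
    """Return True if scope_name appears in dbutils.secrets.listScopes()."""
    return any(s.name == scope_name for s in dbutils.secrets.listScopes())


# Example (in a Databricks notebook):
# if not scope_exists(dbutils, "students"):
#     raise ValueError("Create the 'students' secret scope first.")
```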

### List Secrets within a specific scope


To list the secrets within a specific scope, you can supply that scope name.

In [12]:
%python
dbutils.secrets.list("students")
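
`list` returns metadata only (each item exposes a `key`), never secret values, which makes it a safe way to fail fast when a credential was not added in the Key Vault notebook. The key names below are the ones this notebook reads; `missing_secrets` is a hypothetical helper.

```python
# Secret keys read later in this notebook.
REQUIRED_KEYS = ("storageaccount", "storageread", "storagewrite")


def missing_secrets(dbutils, scope: str, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are not present in the given scope.

    Only metadata is listed; no secret values are fetched.
    """
    present = {m.key for m in dbutils.secrets.list(scope)}
    return [k for k in required if k not in present]


# Example (in a Databricks notebook):
# missing = missing_secrets(dbutils, "students")
# if missing:
#     raise ValueError(f"Secret scope 'students' is missing keys: {missing}")
```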

### Using your Secrets

To use your secrets, you supply the scope and key to the `get` method.

Run the following cell to retrieve and print a secret.

In [14]:
%python
print(dbutils.secrets.get(scope="students", key="storageread"))

### Secrets are not displayed in clear text

Notice that the printed value is `[REDACTED]`. Databricks redacts secret values in notebook output to prevent your secrets from being exposed.

## Mount Azure Blob Container - Read/List

In this section, we'll demonstrate using a `SASTOKEN` that has only list and read permissions, managed at the Storage Account level.

**This means:**
- Any user within the workspace can view and read the files mounted using this key
- This key can be used to mount any container within the storage account with these privileges

First, unmount the directory if it was previously mounted.

In [18]:
%python

MOUNTPOINT = "/mnt/commonfiles"

if MOUNTPOINT in [mnt.mountPoint for mnt in dbutils.fs.mounts()]:
  dbutils.fs.unmount(MOUNTPOINT)
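
The same mount check appears again at the end of this notebook, so it may be worth factoring into a function. A minimal sketch; `unmount_if_mounted` is a hypothetical helper that relies on the `mountPoint` attribute of the entries returned by `dbutils.fs.mounts()`, as the cell above does.

```python
def unmount_if_mounted(dbutils, mount_point: str) -> bool:
    """Unmount mount_point if it is currently mounted.

    Returns True if an unmount was performed, False if nothing was mounted there.
    """
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        dbutils.fs.unmount(mount_point)
        return True
    return False


# Example (in a Databricks notebook):
# unmount_if_mounted(dbutils, "/mnt/commonfiles")
```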

In [19]:
%python

# Add the Storage Account, Container, and reference the secret to pass the SAS Token
STORAGE_ACCOUNT = dbutils.secrets.get(scope="students", key="storageaccount")
CONTAINER = "commonfiles"
SASTOKEN = dbutils.secrets.get(scope="students", key="storageread")

# Do not change these values
SOURCE = "wasbs://{container}@{storage_acct}.blob.core.windows.net/".format(container=CONTAINER, storage_acct=STORAGE_ACCOUNT)
URI = "fs.azure.sas.{container}.{storage_acct}.blob.core.windows.net".format(container=CONTAINER, storage_acct=STORAGE_ACCOUNT)

try:
  dbutils.fs.mount(
    source=SOURCE,
    mount_point=MOUNTPOINT,
    extra_configs={URI:SASTOKEN})
except Exception as e:
  if "Directory already mounted" in str(e):
    pass # Ignore error if already mounted.
  else:
    raise e
print("Success.")
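
The `wasbs://` source URL and the `fs.azure.sas...` configuration key are both built from the same container and account names, and both are rebuilt later for direct access. A small pair of helpers keeps them consistent; `wasbs_url` and `sas_conf_key` are hypothetical names, and the format strings are the ones used in the cell above. `wasbs_url` also strips leading slashes from a relative path, since the container root already ends in `/`.

```python
def wasbs_url(container: str, account: str, path: str = "") -> str:
    """Build a wasbs:// URL for a container, optionally with a relative path."""
    root = f"wasbs://{container}@{account}.blob.core.windows.net/"
    # The root ends in '/', so drop leading slashes from the relative path.
    return root + path.lstrip("/")


def sas_conf_key(container: str, account: str) -> str:
    """Configuration key under which the container's SAS token is supplied."""
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


# Example (in a Databricks notebook):
# dbutils.fs.mount(
#     source=wasbs_url(CONTAINER, STORAGE_ACCOUNT),
#     mount_point=MOUNTPOINT,
#     extra_configs={sas_conf_key(CONTAINER, STORAGE_ACCOUNT): SASTOKEN})
```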

In [20]:
%python
dbutils.fs.ls(MOUNTPOINT)

### Define and display a DataFrame that reads a file from the mounted directory

In [22]:
salesDF = (spark.read
              .option("header", True)
              .option("inferSchema", True)
              .csv(MOUNTPOINT + "/sales.csv"))

display(salesDF)

### Filter the DataFrame and display the results

In [24]:
from pyspark.sql.functions import col

sales2004DF = (salesDF
                  .filter((col("ShipDateKey") > 20031231) &
                          (col("ShipDateKey") <= 20041231)))
display(sales2004DF)

### Attempting to write with a read-only token

While we can list and read files with this token, the job will fail when we try to write, because the token does not grant write permission.

In [26]:
try:
  sales2004DF.write.mode("overwrite").parquet(MOUNTPOINT + "/sales2004")
except Exception as e:
  print(e)

### Review

At this point, you should know how to:
* Use secrets to access Blob Storage
* Mount Blob Storage to DBFS (Databricks File System)

Mounting data to DBFS makes that content available to every user in the workspace.

If you want to access Blob Storage directly without mounting it, the rest of this notebook demonstrates that process.

## Writing Directly to Blob using SAS token

Note that when you mount a directory, by default, all users within the workspace have the same privileges to interact with that directory. Here, we'll use a SAS token to write directly to a blob container without mounting it. This ensures that only users within the workspace who have access to the associated Key Vault secrets can write.

In [29]:
%python

CONTAINER = "commonfiles"
SASTOKEN = dbutils.secrets.get(scope="students", key="storagewrite")

# Redefine the source and URI (same container as before; only the SAS token changes)
SOURCE = "wasbs://{container}@{storage_acct}.blob.core.windows.net/".format(container=CONTAINER, storage_acct=STORAGE_ACCOUNT)
URI = "fs.azure.sas.{container}.{storage_acct}.blob.core.windows.net".format(container=CONTAINER, storage_acct=STORAGE_ACCOUNT)

# Set up container SAS
spark.conf.set(URI, SASTOKEN)

### Listing Directory Contents and writing using SAS token

Because the configured container SAS token grants full permissions, we can interact with Blob Storage using `dbutils.fs` methods.

In [31]:
%python
dbutils.fs.ls(SOURCE)

We can write to this container directly, without mounting it and thereby exposing it to others in our workspace.

In [33]:
sales2004DF.write.mode("overwrite").parquet(SOURCE + "sales2004")

In [34]:
%python
dbutils.fs.ls(SOURCE)

### Deleting using SAS token

This SAS token also grants delete permission.

In [36]:
dbutils.fs.rm(SOURCE + "sales2004", True)

### Cleaning up mounts

If you don't explicitly unmount, the read-only container that you mounted at the beginning of this notebook will remain accessible to everyone in your workspace.

In [38]:
%python

if MOUNTPOINT in [mnt.mountPoint for mnt in dbutils.fs.mounts()]:
  dbutils.fs.unmount(MOUNTPOINT)
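
Direct access has a counterpart to unmounting: the SAS token set with `spark.conf.set` earlier remains in this notebook's Spark session configuration until it is unset or the session ends. A context manager can limit that to a single block of work. This is a sketch, assuming `spark.conf.unset` is available (it is part of PySpark's `RuntimeConfig`); it removes the key from the session configuration but does not guarantee that storage clients already created in the session forget the credential. `session_sas` is a hypothetical helper.

```python
from contextlib import contextmanager


@contextmanager
def session_sas(spark, conf_key: str, sas_token: str):
    """Set a SAS token in the Spark session configuration for the duration
    of a with-block, then remove it, even if the block raises."""
    spark.conf.set(conf_key, sas_token)
    try:
        yield
    finally:
        spark.conf.unset(conf_key)


# Example (in a Databricks notebook):
# with session_sas(spark, URI, SASTOKEN):
#     sales2004DF.write.mode("overwrite").parquet(SOURCE + "sales2004")
```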

## Congratulations!

You should now be able to use the following tools in your workspace:

* Databricks Secrets
* Azure Key Vault
* SAS tokens
* `dbutils.fs.mount`