# Securely Mounting Azure Data Lake Storage Gen2 in a Databricks Notebook

## Overview

This notebook demonstrates how to securely mount Azure Data Lake Storage Gen2 (ADLS Gen2) containers to a Databricks workspace. It covers the setup of OAuth authentication, dynamic configuration through widgets, and error handling during the mounting process.

## Guide

Before loading and inspecting data in Spark, configure access to Azure Data Lake Storage Gen2 with a service principal, an identity that applications and services use to authenticate securely with Azure resources.

**Key Steps**

1. Register a Microsoft Entra ID application: create an application identity in Azure for integration.
2. Generate a client secret: obtain a secret for application authentication.
3. Assign permissions: grant access roles to the service principal for data lake access.
4. Store credentials securely: use Azure Key Vault to store the client secret.
5. Configure the Databricks workspace: set up OAuth 2.0 authentication in Databricks with the service principal credentials.

Refer to [Microsoft Documentation](https://learn.microsoft.com/en-us/azure/databricks/connect/storage/tutorial-azure-storage) for detailed guidance.

## Prerequisites

- Access to an Azure subscription and permissions to manage resources.
- An Azure Data Lake Storage Gen2 account configured with appropriate access controls.
- A Databricks workspace with necessary permissions to create and manage clusters, notebooks, and secret scopes.

## Steps

### 1. Setup Secret Scope

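First, create an Azure Key Vault-backed secret scope in Databricks (this requires the Key Vault's DNS Name and Resource ID) and keep the service principal's credentials in it. Reading credentials from a secret scope keeps them out of the notebook code.

List all secret scopes to confirm that the scope exists.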

In [0]:
# List all secret scopes

dbutils.secrets.listScopes()

[SecretScope(name='secretScopeDemo')]
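
Beyond confirming that the scope exists, it can help to check that it holds the secrets the next step reads. The sketch below assumes the secret names `applicationID` and `secretSPValue` that appear in the configuration cell; `missing_secrets` is a hypothetical helper, and the `dbutils.secrets.list` call is left as a comment because it only runs inside Databricks.

```python
def missing_secrets(available_keys, required_keys):
    """Return the required secret names that are absent from the scope, in order."""
    available = set(available_keys)
    return [key for key in required_keys if key not in available]


# Inside Databricks, the available keys would come from the scope itself:
# available = [s.key for s in dbutils.secrets.list("secretScopeDemo")]
available = ["applicationID"]  # stand-in value so the sketch runs anywhere
required = ["applicationID", "secretSPValue"]

print(missing_secrets(available, required))
```

An empty list means every required secret is present; anything else names the secrets that still need to be added to the Key Vault.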

### 2. Configure Authentication

In your Databricks notebook, define the authentication parameters. Use widgets to allow dynamic specification of the secret scope containing your credentials.

In [0]:
# Configure access to Azure Data Lake Storage Gen2 using OAuth authentication by setting up necessary configurations and retrieving sensitive credentials securely via Databricks' secret scopes.

dbutils.widgets.text('keyvault_scope', 'secretScopeDemo')
keyvault_scope = dbutils.widgets.get('keyvault_scope')

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": dbutils.secrets.get(keyvault_scope, "applicationID"),
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(keyvault_scope, "secretSPValue"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/{tenant-id}/oauth2/token"
}

_NOTE_: Replace {tenant-id} with your actual Azure tenant ID.
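
One way to avoid editing the endpoint string by hand is to build the configuration from a tenant ID variable. This is a minimal sketch, not part of the original notebook: `build_oauth_configs` is a hypothetical helper, and the placeholder arguments stand in for values that would normally come from `dbutils.secrets.get`.

```python
def build_oauth_configs(client_id, client_secret, tenant_id):
    """Assemble the ABFS OAuth settings for a service principal."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        # The tenant ID is interpolated here instead of being replaced manually.
        "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Placeholder values; in the notebook these would be read from the secret scope.
example_configs = build_oauth_configs("<client-id>", "<client-secret>", "<tenant-id>")
print(example_configs["fs.azure.account.oauth2.client.endpoint"])
```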

### 3. Mount the Data Lakes

Unmount any existing mount at each mount point, then mount the ADLS Gen2 container for each layer (base, bronze, silver, and gold).

In [0]:
# BASE
# Unmount if exists, then mount Azure Data Lake Storage Gen2 at 'baseMountPoint' with 'configs'.

source = "abfss://base@sadbhgtsh.dfs.core.windows.net/"
baseMountPoint = "/mnt/base/"
try:
    dbutils.fs.unmount(baseMountPoint)
except Exception:
    # Unmounting fails when nothing is mounted at this path yet; mounting can still proceed.
    print("could not unmount base storage")

dbutils.fs.mount(
    source=source,
    mount_point=baseMountPoint,
    extra_configs=configs
)

# BRONZE
# Unmount if exists, then mount Azure Data Lake Storage Gen2 at 'bronzeMountPoint' with 'configs'.

source = "abfss://bronze@sadbhgtsh.dfs.core.windows.net/"
bronzeMountPoint = "/mnt/bronze/"
try:
    dbutils.fs.unmount(bronzeMountPoint)
except Exception:
    # Unmounting fails when nothing is mounted at this path yet; mounting can still proceed.
    print("could not unmount bronze storage")

dbutils.fs.mount(
    source=source,
    mount_point=bronzeMountPoint,
    extra_configs=configs
)

# SILVER
# Unmount if exists, then mount Azure Data Lake Storage Gen2 at 'silverMountPoint' with 'configs'.

source = "abfss://silver@sadbhgtsh.dfs.core.windows.net/"
silverMountPoint = "/mnt/silver/"
try:
    dbutils.fs.unmount(silverMountPoint)
except Exception:
    # Unmounting fails when nothing is mounted at this path yet; mounting can still proceed.
    print("could not unmount silver storage")

dbutils.fs.mount(
    source=source,
    mount_point=silverMountPoint,
    extra_configs=configs
)

# GOLD
# Unmount if exists, then mount Azure Data Lake Storage Gen2 at 'goldMountPoint' with 'configs'.

source = "abfss://gold@sadbhgtsh.dfs.core.windows.net/"
goldMountPoint = "/mnt/gold/"
try:
    dbutils.fs.unmount(goldMountPoint)
except Exception:
    # Unmounting fails when nothing is mounted at this path yet; mounting can still proceed.
    print("could not unmount gold storage")

dbutils.fs.mount(
    source=source,
    mount_point=goldMountPoint,
    extra_configs=configs
)

/mnt/base/ has been unmounted.
/mnt/bronze/ has been unmounted.
/mnt/silver/ has been unmounted.
/mnt/gold/ has been unmounted.


True
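
The four cells above repeat the same unmount-then-mount pattern. A possible alternative, sketched here under assumptions, checks `dbutils.fs.mounts()` before unmounting instead of relying on a caught exception. `remount` is a hypothetical helper that takes the `dbutils.fs` object as a parameter; the trailing-slash normalization is a precaution, since the exact form in which mount points are listed is not confirmed by this notebook.

```python
def remount(fs, source, mount_point, extra_configs):
    """Unmount mount_point only if it is currently mounted, then mount source there."""
    # Compare without trailing slashes so "/mnt/base" and "/mnt/base/" match.
    target = mount_point.rstrip("/")
    mounted = {m.mountPoint.rstrip("/") for m in fs.mounts()}
    if target in mounted:
        fs.unmount(mount_point)
    return fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)


# Inside Databricks, the four layers could then be mounted in a loop:
# for layer in ["base", "bronze", "silver", "gold"]:
#     remount(
#         dbutils.fs,
#         f"abfss://{layer}@sadbhgtsh.dfs.core.windows.net/",
#         f"/mnt/{layer}/",
#         configs,
#     )
```

Checking first means a failed unmount for any other reason (for example, a permissions problem) is no longer silently swallowed.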

### 4. Exit Notebook with Mount Points Information

In [0]:
# Exit the notebook and return the mount points for the base, bronze, silver and gold data lakes as a JSON string.

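import json

# json.dumps produces valid JSON (double-quoted keys); str() on a dict would produce a Python literal instead.
dbutils.notebook.exit(json.dumps({"baseMountPoint": baseMountPoint, "bronzeMountPoint": bronzeMountPoint, "silverMountPoint": silverMountPoint, "goldMountPoint": goldMountPoint}))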
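
A calling notebook receives the exit value as a plain string and has to parse it back into a dictionary. The sketch below is one way to do that; `parse_mount_points` is a hypothetical helper, the notebook path and timeout in the comment are placeholders, and the fallback covers exit values produced with `str()` on a dict rather than `json.dumps`.

```python
import ast
import json


def parse_mount_points(result):
    """Turn the string returned by the mounting notebook back into a dict."""
    try:
        return json.loads(result)
    except json.JSONDecodeError:
        # A dict passed through str() is a Python literal, not JSON, so parse it safely as one.
        return ast.literal_eval(result)


# In a calling notebook (path and timeout are placeholders):
# result = dbutils.notebook.run("./mount_adls", 600)
result = '{"baseMountPoint": "/mnt/base/", "goldMountPoint": "/mnt/gold/"}'
mount_points = parse_mount_points(result)
print(mount_points["goldMountPoint"])
```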