##Adding a new Mount point to Azure Data Lake Gen2

Source: [Access Azure Data Lake Storage Gen2 using OAuth 2.0 with an Azure service principal](https://docs.databricks.com/data/data-sources/azure/adls-gen2/azure-datalake-gen2-sp-access.html)

We need a few pieces of information to create a mount point:
* The **Service Principal ID** - (Azure Portal -> Azure Active Directory service -> App Registrations -> Application (client) ID value)
* The **Service Principal Key** - (Azure Portal -> Azure Active Directory service -> App Registrations -> Certificates & Secrets -> Client secrets value)
* The **DirectoryID** - (Azure Portal -> Azure Active Directory service -> App Registrations -> Directory (tenant) ID value)
* The ADLS storage account name
* The ADLS container name

**Important.** To get the Service Principal ID, Service Principal Key, and DirectoryID, you need to create an Azure AD application, which creates an associated service principal used to access the storage account:
1. In the Azure portal, go to the Azure Active Directory service.
2. Under Manage, click App Registrations.
3. Click + New registration. Enter a name for the application and click Register.

###Adding scoped secrets

Secret scopes and secrets are created with the Databricks CLI.

If you need the Databricks CLI, you can pip install it locally: ```pip install databricks-cli```
Alternatively, you can run the CLI from **Cloud Shell**.

You need a personal access token (PAT) before you can use the CLI. (Databricks -> User Settings -> Generate New Token)

Run ```databricks configure --token``` to configure the Databricks CLI with that token.

Once you're connected, it is as easy as this:
1. ```databricks secrets create-scope --scope Analysts --initial-manage-principal "users"```
2. ```databricks secrets put --scope Analysts --key SPID --string-value "Service Principal ID"```
3. ```databricks secrets put --scope Analysts --key SPKey --string-value "Service Principal Key"```
4. ```databricks secrets put --scope Analysts --key DirectoryID --string-value "Azure Directory ID"```
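
To check from a notebook that the scope holds every key the mount code expects, a minimal sketch along these lines may help. It assumes `dbutils.secrets.list(scope)` returns entries that expose a `key` attribute; the helper name `missing_secret_keys` is hypothetical, and `dbutils` is passed in explicitly so the function can also be exercised outside a notebook.

```python
def missing_secret_keys(dbutils, scope, required_keys):
    """Return the required keys that are absent from the given secret scope."""
    # Only key names are listed here; the secret values themselves are never read
    present = {item.key for item in dbutils.secrets.list(scope)}
    return [key for key in required_keys if key not in present]
```

In a notebook, `missing_secret_keys(dbutils, "Analysts", ["SPID", "SPKey", "DirectoryID"])` should return an empty list once the four CLI commands above have succeeded.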

The **NOT** recommended approach (credentials hard-coded in the notebook):

In [0]:
ServicePrincipalID = "<Service Principal ID>"
ServicePrincipalKey = "<Service Principal Secret Key>"
DirectoryID = "<Azure Directory ID>"

The recommended approach (credentials read from the secret scope):

In [0]:
import os

# Gather relevant keys
ServicePrincipalID = dbutils.secrets.get(scope = 'Analysts', key = 'SPID')
ServicePrincipalKey = dbutils.secrets.get(scope = 'Analysts', key = 'SPKey')
DirectoryID = dbutils.secrets.get(scope = 'Analysts', key = 'DirectoryID')

# Combine DirectoryID into full string
Directory = f"https://login.microsoftonline.com/{DirectoryID}/oauth2/token"

# Input parameter: file to transform (falls back to the sample file when the widget is empty)
dbutils.widgets.text("fileName", "", "")
input_file_name = dbutils.widgets.get("fileName") or "CoordStatusMA_Short.csv"

# Storage account options
# ADLS container name
container_name = "covidsbcontainer"
# ADLS storage account name
stac_name = "covidsbstac2"

# Variables for DBFS 
url = f"abfss://{container_name}@{stac_name}.dfs.core.windows.net/"
mnt_path = '/mnt/covid'
input_file_path = "/dbfs" + mnt_path + "/input/" + input_file_name
output_file_path = "/dbfs" + mnt_path + "/output/" + os.path.splitext(os.path.basename(input_file_path))[0] + "_prepared.csv"

# Create configurations for our connection
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": ServicePrincipalID,
           "fs.azure.account.oauth2.client.secret": ServicePrincipalKey,
           "fs.azure.account.oauth2.client.endpoint": Directory}

# Mount the Data Lake onto DBFS at the /mnt/covid location
dbutils.fs.mount(
  source = url,
  mount_point = mnt_path,
  extra_configs = configs)
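
Re-running the mount cell while `/mnt/covid` is still mounted typically fails, because the mount point is already in use. One way to make the step repeatable is to check the existing mounts first. The sketch below assumes `dbutils.fs.mounts()` returns entries with a `mountPoint` attribute; the helper name `mount_if_needed` is hypothetical, and `dbutils` is passed in as an argument so the logic can be tested with a stand-in object.

```python
def mount_if_needed(dbutils, source, mount_point, extra_configs):
    """Mount `source` at `mount_point` unless something is already mounted there.

    Returns True if a new mount was created, False if the mount point was already in use.
    """
    # Compare the requested mount point against every existing mount
    already_mounted = any(m.mountPoint == mount_point for m in dbutils.fs.mounts())
    if already_mounted:
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
    return True
```

With the variables defined above, the call would be `mount_if_needed(dbutils, url, mnt_path, configs)`. Note that this check does not verify that an existing mount points at the same storage account or uses the same credentials.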

In [0]:
status = None
column_names = ['Status', 'Date', 'Latitude', 'Longitude', 'Count']
if os.path.isfile(input_file_path):
    # Open both files in one with-statement so they are closed even if parsing fails
    with open(input_file_path, mode='r', encoding='cp1252') as f, open(output_file_path, 'w') as new_file:
        new_file.write(",".join(column_names) + "\n")
        for line in f:
            # Skip blank lines
            if not line.strip():
                continue
            fields = line.split(',')
            if line.startswith('Status='):
                # Header row: remember the status for the data rows that follow
                status = fields[0].split('=')[1]
            elif status is not None:
                # Data row: a date followed by repeating (latitude, longitude, count) triples
                date = fields[0]
                _la, _lo, _c = fields[1::3], fields[2::3], fields[3::3]
                if len(_la) == len(_lo) == len(_c):
                    for la, lo, c in zip(_la, _lo, _c):
                        # Write only triples in which every value is present
                        if la and lo and c:
                            new_file.write(",".join(v.strip() for v in (status, date, la, lo, c)) + "\n")
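
The reshaping done by the cell above can also be sketched as a pure function, which makes the wide-to-long logic easier to follow and to test without a mounted file system. The input layout is inferred from the parsing code (a `Status=` header row, then rows holding a date followed by repeating latitude, longitude, count triples); the function name `reshape_wide_rows` and the sample values are made up for illustration. Unlike the cell above, this sketch strips whitespace before checking whether a value is empty.

```python
def reshape_wide_rows(lines):
    """Turn 'Status=' headers and wide data rows into (status, date, lat, lon, count) tuples."""
    status = None
    rows = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(",")]
        if line.startswith("Status="):
            # Header row: remember the status for the data rows that follow
            status = fields[0].split("=", 1)[1]
        elif status is not None:
            date = fields[0]
            # Every third field, starting at offsets 1, 2 and 3
            lats, lons, counts = fields[1::3], fields[2::3], fields[3::3]
            if len(lats) == len(lons) == len(counts):
                for lat, lon, count in zip(lats, lons, counts):
                    if lat and lon and count:
                        rows.append((status, date, lat, lon, count))
    return rows


# Made-up sample in the assumed layout
sample = [
    "Status=Confirmed,,,\n",
    "2020-03-01,42.36,-71.06,5,42.10,-72.59,2\n",
    "\n",
    "2020-03-02,42.36,-71.06,7,,,\n",
]
for row in reshape_wide_rows(sample):
    print(row)
```

Each wide row yields one output tuple per complete triple, so the sample produces three tuples: two for the first date and one for the second, whose trailing empty triple is dropped.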

In [0]:
# Unmount the data lake
dbutils.fs.unmount(mnt_path)