## Using Azure Service Principal for Databricks automation

The main motivation is to remove any dependency on user accounts from automation workflows. The solution is to use a Service Principal (SP) instead of a user account for job ownership and triggering. **There are several ways to set up an SP for Databricks automation; the process below illustrates only one of them.**

**IMPORTANT: SP credentials and PATs are sensitive information and should be treated with care. Store them in Azure Key Vault and read them from there at runtime.**

### Steps:

- Assign the Service Principal the "Contributor" role on the Azure Databricks (ADB) workspace resource in Azure. (The first time the SP is added to ADB, this also puts it in the workspace admins group.)
- Add the SP to the ADB workspace. This requires an existing admin PAT.
- Get an Azure Active Directory (AAD) OAuth2 access token using the SP credentials.
- Get a Databricks PAT using the AAD OAuth2 access token.
- Store the PAT in Azure Key Vault and use it in automation or to trigger jobs from Azure Data Factory (ADF).
    - e.g., in a Databricks notebook, read it through a secret scope: `ADB_PAT = dbutils.secrets.get(scope="azure-kv", key="sp-pat")`
    - **At the time of writing, ADF does not support the OAuth2 workflow and needs a PAT.**
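
The "store the PAT in Key Vault" step can be sketched with the Key Vault REST API and only the Python standard library. Assumptions not taken from the steps above: the vault name, the `api-version` value `7.4`, and that `kv_token` is an AAD access token requested for the resource `https://vault.azure.net` by an identity allowed to set and get secrets in that vault. The builder functions only construct requests; `send_json` sends one.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Key Vault data-plane REST API version; check the Key Vault REST docs.
KV_API_VERSION = "7.4"


def _secret_url(vault_name: str, secret_name: str) -> str:
    # Key Vault secret names may contain only letters, digits and dashes.
    return (
        f"https://{vault_name}.vault.azure.net/secrets/"
        f"{urllib.parse.quote(secret_name)}?api-version={KV_API_VERSION}"
    )


def build_set_secret_request(
    vault_name: str, secret_name: str, secret_value: str, kv_token: str
) -> urllib.request.Request:
    """Build a 'Set Secret' call: PUT /secrets/{name} with a JSON {"value": ...} body."""
    body = json.dumps({"value": secret_value}).encode("utf-8")
    return urllib.request.Request(
        _secret_url(vault_name, secret_name),
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {kv_token}",
        },
    )


def build_get_secret_request(
    vault_name: str, secret_name: str, kv_token: str
) -> urllib.request.Request:
    """Build a 'Get Secret' call for the latest version of the secret."""
    return urllib.request.Request(
        _secret_url(vault_name, secret_name),
        method="GET",
        headers={"Authorization": f"Bearer {kv_token}"},
    )


def send_json(req: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send_json(build_set_secret_request("my-vault", "sp-pat", SP_PAT, kv_token))` would store the PAT, and `send_json(build_get_secret_request("my-vault", "sp-pat", kv_token))["value"]` would read it back; `my-vault` is a hypothetical vault name.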

**For jobs**
- Transfer ownership of the jobs to the SP.
- Because the SP was added to the admins group (through its Contributor role assignment in Azure), it inherits "Manage" permission on all clusters, plus access to all notebooks and secret scopes.
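
Ownership can be transferred with the Databricks Permissions API. The sketch below builds a `PUT /api/2.0/permissions/jobs/{job_id}` request that makes the SP (identified by its application ID) the job's `IS_OWNER`. Because `PUT` replaces the job's entire access control list, the sketch also re-grants `CAN_MANAGE` to the `admins` group; that second entry, and which token sends the request (for example the admin PAT), are assumptions to adapt.

```python
import json
import urllib.request


def build_transfer_job_ownership_request(
    adb_instance: str, job_id: int, sp_app_id: str, token: str
) -> urllib.request.Request:
    """Build a PUT that sets the SP as the owner of one job.

    PUT replaces the whole ACL of the job (PATCH would merge instead), so every
    permission that should survive must be listed here.
    """
    payload = {
        "access_control_list": [
            # Service principals are referenced by their application (client) ID.
            {"service_principal_name": sp_app_id, "permission_level": "IS_OWNER"},
            # Assumption: keep workspace admins able to manage the job.
            {"group_name": "admins", "permission_level": "CAN_MANAGE"},
        ]
    }
    return urllib.request.Request(
        f"https://{adb_instance}/api/2.0/permissions/jobs/{job_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Send it with `urllib.request.urlopen(req)`, once per job to transfer.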

**Using an SP without adding it to the admins group**
- Don't assign the SP the "Contributor" role on the ADB workspace resource in Azure.
- Follow https://docs.microsoft.com/en-us/azure/databricks/tutorials/run-jobs-with-service-principals#--create-a-service-principal-in-azure-active-directory
- Permissions on clusters and secret scopes then have to be granted to the SP manually.
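
Without admin rights, the SP needs explicit grants. A minimal sketch, assuming the Secrets API `acls/put` endpoint and the Permissions API for clusters; the scope name, cluster ID and permission levels are placeholders, and using the SP's application ID as the secret ACL `principal` is an assumption to confirm against the Secrets API docs. `CAN_RESTART` is chosen as the default so the SP can also start a terminated cluster.

```python
import json
import urllib.request


def _json_request(url: str, method: str, payload: dict, token: str) -> urllib.request.Request:
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
    )


def build_secret_acl_request(
    adb_instance: str, scope: str, sp_app_id: str, token: str, permission: str = "READ"
) -> urllib.request.Request:
    """Grant the SP READ (or WRITE/MANAGE) on one secret scope."""
    payload = {"scope": scope, "principal": sp_app_id, "permission": permission}
    return _json_request(
        f"https://{adb_instance}/api/2.0/secrets/acls/put", "POST", payload, token
    )


def build_cluster_permission_request(
    adb_instance: str,
    cluster_id: str,
    sp_app_id: str,
    token: str,
    level: str = "CAN_RESTART",
) -> urllib.request.Request:
    """Add one cluster permission for the SP; PATCH keeps the existing entries."""
    payload = {
        "access_control_list": [
            {"service_principal_name": sp_app_id, "permission_level": level}
        ]
    }
    return _json_request(
        f"https://{adb_instance}/api/2.0/permissions/clusters/{cluster_id}",
        "PATCH",
        payload,
        token,
    )
```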


**Ref**:
- https://docs.microsoft.com/en-us/azure/databricks/tutorials/run-jobs-with-service-principals
- https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/scim/scim-sp
- https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/scim/
- https://docs.databricks.com/dev-tools/api/latest/index.html
- https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/service-prin-aad-token


### Another option for ADF
- [Using ADF "Managed Identity" to authenticate instead of PAT](https://techcommunity.microsoft.com/t5/azure-data-factory/azure-databricks-activities-now-support-managed-identity/ba-p/1922818)


```python
import json
from getpass import getpass

import requests
```

```python
ADB_ADMIN_PAT = getpass("Existing Databricks admin PAT: ")  ## used only to add the SP to the workspace
```

```python
SP_secret = getpass("SP client secret: ")  ## client secret created when the SP/application was registered
```

```python
ADB_instance = "adb-xxxxxxxxxxxxxxxx.xx.azuredatabricks.net"  ## Databricks workspace instance (domain name)

SP_name = "xxxx"  ## SP/application display name in AAD
SP_app_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  ## application (client) ID

AZ_TENENT_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  ## tenant ID: the "homeTenantId" value from `az account show`
AZ_ADB_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"  ## fixed AAD resource ID of Azure Databricks, the same for every workspace
```


```python
## Add the SP to the workspace through the SCIM API, authenticated with the existing admin PAT
## see https://docs.microsoft.com/en-us/azure/databricks/tutorials/run-jobs-with-service-principals#--add-the-service-principal-to-the-azure-databricks-workspace
headers_add_sp = {
    'Content-Type': 'application/scim+json',
    'Authorization': f'Bearer {ADB_ADMIN_PAT}'
}

payload_add_sp = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
    "applicationId": SP_app_id,
    "displayName": SP_name,
    "entitlements": [{"value": "allow-cluster-create"}],  ## lets the SP create clusters
}

URL_ADD_SP = f'https://{ADB_instance}/api/2.0/preview/scim/v2/ServicePrincipals'

resp_add_sp = requests.post(URL_ADD_SP, headers=headers_add_sp, json=payload_add_sp)
resp_add_sp.json()  ## the new SP's SCIM record, or an error body if the request was rejected
```
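
Re-running the cell above for an SP that is already in the workspace is expected to be rejected as a duplicate. A small sketch checks first by listing service principals through the same SCIM endpoint and matching on `applicationId` in the response's `Resources` list. It assumes that response shape and ignores SCIM paging (`startIndex`/`count`), which only matters for workspaces with many SPs.

```python
import urllib.request
from typing import Optional


def build_list_sps_request(adb_instance: str, admin_token: str) -> urllib.request.Request:
    """GET the workspace's service principals through the SCIM API."""
    return urllib.request.Request(
        f"https://{adb_instance}/api/2.0/preview/scim/v2/ServicePrincipals",
        method="GET",
        headers={"Authorization": f"Bearer {admin_token}"},
    )


def find_sp(scim_list_response: dict, sp_app_id: str) -> Optional[dict]:
    """Return the SCIM resource whose applicationId matches, or None."""
    # An empty listing may omit "Resources" entirely, hence the default.
    for resource in scim_list_response.get("Resources", []):
        if resource.get("applicationId") == sp_app_id:
            return resource
    return None
```

In the notebook, `find_sp(json.load(urllib.request.urlopen(build_list_sps_request(ADB_instance, ADB_ADMIN_PAT))), SP_app_id)` returning `None` would mean the `POST` is still needed.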


```python
## Get an AAD OAuth2 access token for the SP (client credentials grant)
head_aad_access_token = {
    'Content-Type': 'application/x-www-form-urlencoded',
}
data_aad_access_token = {
    'grant_type': 'client_credentials',
    'client_id': SP_app_id,
    'resource': AZ_ADB_RESOURCE_ID,  ## request a token for Azure Databricks
    'client_secret': SP_secret,
}
```

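```python
## The AAD token endpoint expects a POST with a form-encoded body
resp_aad_token = requests.post(
    f'https://login.microsoftonline.com/{AZ_TENENT_ID}/oauth2/token',
    headers=head_aad_access_token,
    data=data_aad_access_token,
)
resp_aad_token.raise_for_status()  ## fail loudly instead of with a KeyError below
aad_token_sp = resp_aad_token.json()["access_token"]
```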
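
AAD access tokens are short-lived, so long-running automation has to request a new one before the old one expires. The sketch below builds the same client-credentials request with only the standard library and adds an expiry check. It assumes the v1 endpoint's `expires_on` field (epoch seconds, returned as a string) and an arbitrary 5-minute safety margin. Passing `resource="https://vault.azure.net"` instead would request a token for Key Vault rather than Databricks.

```python
import time
import urllib.parse
import urllib.request
from typing import Optional

DATABRICKS_AAD_RESOURCE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"  # Azure Databricks resource ID


def build_aad_token_request(
    tenant_id: str,
    client_id: str,
    client_secret: str,
    resource: str = DATABRICKS_AAD_RESOURCE,
) -> urllib.request.Request:
    """POST a form-encoded client-credentials grant to the AAD v1 token endpoint."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,
    }).encode("utf-8")
    return urllib.request.Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def token_needs_refresh(
    token_response: dict, now: Optional[float] = None, margin_seconds: int = 300
) -> bool:
    """True if the token expires within `margin_seconds` (expires_on is epoch seconds)."""
    now = time.time() if now is None else now
    return float(token_response["expires_on"]) - margin_seconds <= now
```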

```python
## Create a Databricks PAT for the SP, authenticated with the SP's AAD access token
TOKEN_API_CREATE = f"https://{ADB_instance}/api/2.0/token/create"

headers_adb = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {aad_token_sp}'
}

## "lifetime_seconds" sets the PAT lifetime (86400 s = 24 h).
## Omitting it creates a PAT that never expires.
data_adb = {
    "lifetime_seconds": 86400,
    "comment": "token generated using SP aad oauth token, expires in 24hrs",
}

resp_token_create = requests.post(TOKEN_API_CREATE, headers=headers_adb, json=data_adb)
resp_token_create.raise_for_status()
SP_PAT = resp_token_create.json()["token_value"]  ## save this to Azure Key Vault
```
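
Because this PAT expires after 24 hours, automation typically creates a new one on a schedule, stores it, and revokes the previous ones. A sketch of the revocation half, assuming the Token API's `GET /api/2.0/token/list` response (a `token_infos` list with `token_id` and `comment`) and `POST /api/2.0/token/delete`. Matching on a comment prefix is a hypothetical convention, not part of the API; it keeps unrelated tokens alone. The new token's ID is assumed to be in `resp_token_create.json()["token_info"]["token_id"]`.

```python
import json
import urllib.request
from typing import List


def _headers(token: str) -> dict:
    return {"Content-Type": "application/json", "Authorization": f"Bearer {token}"}


def build_list_tokens_request(adb_instance: str, token: str) -> urllib.request.Request:
    """List the calling identity's PATs (GET /api/2.0/token/list)."""
    return urllib.request.Request(
        f"https://{adb_instance}/api/2.0/token/list", method="GET", headers=_headers(token)
    )


def tokens_to_revoke(list_response: dict, keep_token_id: str, comment_prefix: str) -> List[str]:
    """IDs of tokens created by this automation (matched by comment prefix), except the new one."""
    return [
        info["token_id"]
        for info in list_response.get("token_infos", [])
        if info.get("comment", "").startswith(comment_prefix)
        and info["token_id"] != keep_token_id
    ]


def build_revoke_token_request(adb_instance: str, token_id: str, token: str) -> urllib.request.Request:
    """Revoke one PAT by ID (POST /api/2.0/token/delete)."""
    return urllib.request.Request(
        f"https://{adb_instance}/api/2.0/token/delete",
        data=json.dumps({"token_id": token_id}).encode("utf-8"),
        method="POST",
        headers=_headers(token),
    )
```

Both calls would be authenticated as the SP (for example with `aad_token_sp`), since the Token API lists and revokes the caller's own tokens.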

### Test: trigger a job using the SP PAT

```python
JOB_SUBMIT_URL = f"https://{ADB_instance}/api/2.0/jobs/run-now"
TEST_JOB = int(input("Job ID to trigger: "))  ## numeric ID of the job to trigger
```


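```python
## Trigger the job with the SP PAT (Jobs API run-now)
data_job = {'job_id': TEST_JOB}

head_job = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {SP_PAT}",
}
resp_job = requests.post(JOB_SUBMIT_URL, headers=head_job, json=data_job)
resp_job.json()  ## on success, contains the "run_id" of the triggered run
```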
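
`run-now` only starts the run; its response carries a `run_id`. A sketch of waiting for the result by polling `GET /api/2.0/jobs/runs/get`, assuming the run's `state.life_cycle_state` becomes `TERMINATED`, `SKIPPED` or `INTERNAL_ERROR` when it is finished and `state.result_state` then holds the outcome (for example `SUCCESS`). The fetch function and `sleep` are injected so the loop can be exercised without a workspace; the 30-second interval is arbitrary.

```python
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional

TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def build_run_get_request(adb_instance: str, run_id: int, token: str) -> urllib.request.Request:
    """GET /api/2.0/jobs/runs/get?run_id=... authenticated with the SP PAT."""
    query = urllib.parse.urlencode({"run_id": run_id})
    return urllib.request.Request(
        f"https://{adb_instance}/api/2.0/jobs/runs/get?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


def wait_for_run(
    fetch_run: Callable[[], dict],
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the run reaches a terminal state; return its result_state (may be None)."""
    while True:
        state = fetch_run()["state"]
        if state["life_cycle_state"] in TERMINAL_STATES:
            return state.get("result_state")
        sleep(poll_seconds)
```

In the notebook, `fetch_run` could be `lambda: json.load(urllib.request.urlopen(build_run_get_request(ADB_instance, resp_job.json()["run_id"], SP_PAT)))`.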