Description
Is your feature request related to a problem? Please describe.
Right now, accessing workspace storage account data from within Databricks is a little convoluted (if there is an easier way, please let me know!):
- Get the workspace storage account data lake endpoint and account key
- Put the account key in a Databricks Secret (or just use it directly in notebooks, although this is definitely not best practice).
- Access the storage account from a notebook, pulling the key out of the secret with dbutils, e.g.
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope-name>", key="<storage-account-access-key-name>"))
# read from the workspace storage account (container/path are placeholders)
df = spark.read.json("abfss://<container>@<storage-account-name>.dfs.core.windows.net/<path>/iot_devices.json")
Describe the solution you'd like
It would be a lot easier if the Databricks workspace service automatically created a catalog in Unity Catalog pointing to the datalake container.
This would require the following steps to be automated:
- Grant the Databricks managed identity the Storage Blob Data Contributor role on the workspace storage account.
- There is already a Databricks Credential set up to use the managed identity, so a new one does not need to be created.
- Create an External Location in Databricks Unity Catalog configured to access the datalake container in the workspace storage account (stgwsNNNN).
- Create a Catalog in Unity Catalog configured to access the external location.
- All of the above steps can be performed via Terraform included in the Databricks Workspace Service (there is a first-party Terraform provider for Databricks, and the Databricks CLI actually uses Terraform internally for much of its functionality); a rough sketch follows this list.
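As a rough illustration only of what that Terraform could look like: the resource names, the "datalake" container name, and the assumption that the workspace service's existing Terraform already defines the storage account, access connector, and storage credential are all placeholders/assumptions, not the service's actual configuration.

# Assumed to already exist in the workspace service's Terraform (names are placeholders):
#   azurerm_storage_account.stgws            - the workspace storage account (stgwsNNNN)
#   azurerm_databricks_access_connector.this - the workspace managed identity
#   databricks_storage_credential.workspace  - the existing Databricks Credential

# 1. Grant the managed identity Storage Blob Data Contributor on the storage account.
resource "azurerm_role_assignment" "workspace_storage" {
  scope                = azurerm_storage_account.stgws.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.this.identity[0].principal_id
}

# 2. Register the datalake container as a Unity Catalog External Location,
#    reusing the existing storage credential.
resource "databricks_external_location" "datalake" {
  name            = "workspace-datalake"
  url             = "abfss://datalake@${azurerm_storage_account.stgws.name}.dfs.core.windows.net/"
  credential_name = databricks_storage_credential.workspace.name
  depends_on      = [azurerm_role_assignment.workspace_storage]
}

# 3. Create a Catalog whose storage root is the external location.
resource "databricks_catalog" "workspace_datalake" {
  name         = "workspace_datalake"
  storage_root = databricks_external_location.datalake.url
  comment      = "Workspace datalake container, created by the workspace service"
}

The depends_on on the external location matters because Unity Catalog validates access to the container when the location is created, so the role assignment needs to be in place first.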
This would allow Databricks users to immediately start working with files/data in the datalake container.
Describe alternatives you've considered
Manually accessing the storage account via the code above works, but it is cumbersome and requires multiple manual steps and connections.
Additional context
Unity Catalog is the preferred method for Databricks to access cloud storage at this point.