# Access Azure Data Lake Using SAS Tokens
1. Set Spark Config
2. List Contents From a Container
3. Read Data from a File

***This is the recommended access pattern for external users.***

## SAS Token
- Provide fine-grained access to the storage
- Restrict access to specific resource types/services
- Allow specific permissions
- Restrict access to specific time periods
- Limit access to specific IP addresses
- Recommended access pattern for external clients
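
The restrictions listed above are encoded as query parameters in the SAS token itself, so they can be inspected before the token is used. The following is a minimal sketch, assuming the token is in the usual query-string form with second-precision UTC timestamps. The helper names `describe_sas_token` and `_parse_time` are hypothetical, and the signature in the example token is a placeholder.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def _parse_time(value):
    """Parse a SAS timestamp such as 2023-06-15T19:26:57Z (assumed format)."""
    if value is None:
        return None
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def describe_sas_token(token):
    """Return the access restrictions carried by a SAS token (hypothetical helper)."""
    fields = {k: v[0] for k, v in parse_qs(token.lstrip("?")).items()}
    return {
        "permissions": fields.get("sp"),   # e.g. "rl" = read + list
        "resource": fields.get("sr"),      # e.g. "c" = container
        "protocol": fields.get("spr"),     # e.g. "https"
        "allowed_ip": fields.get("sip"),   # None when no IP restriction is set
        "start": _parse_time(fields.get("st")),
        "expiry": _parse_time(fields.get("se")),
    }


# Placeholder token: the signature is not real.
example_token = (
    "sp=rl&st=2023-06-15T11:26:57Z&se=2023-06-15T19:26:57Z"
    "&spr=https&sv=2022-11-02&sr=c&sig=PLACEHOLDER"
)
info = describe_sas_token(example_token)
print(info["permissions"])                          # rl
print(info["expiry"] < datetime.now(timezone.utc))  # True once the token has expired
```

Checking the expiry up front can make an expired token easier to diagnose than an authorization error raised later by a read.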

### Create Secret Scopes in Databricks
https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes

### Create SAS Tokens for Storage Containers
https://learn.microsoft.com/en-us/azure/cognitive-services/translator/document-translation/how-to-guides/create-sas-tokens?tabs=Containers

In [0]:
# Set Spark Config

# spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "SAS")
# spark.conf.set("fs.azure.sas.token.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
# spark.conf.set("fs.azure.sas.fixed.token.<storage-account>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope>", key="<sas-token-key>"))

In [0]:
# Set Spark Config

spark.conf.set(
    "fs.azure.account.auth.type.dbcourselakehouse.dfs.core.windows.net", 
    "SAS"
    )

spark.conf.set(
    "fs.azure.sas.token.provider.type.dbcourselakehouse.dfs.core.windows.net", 
    "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider"
    )

spark.conf.set(
    "fs.azure.sas.fixed.token.dbcourselakehouse.dfs.core.windows.net",
    # `key` is the name of the secret that holds the SAS token, never the token itself
    dbutils.secrets.get(
        scope="databricks-course-secret-scope",
        key="databricks-course-adls-sas-key"
        )
    )

In [0]:
# Set Spark Config Using Secret Scope

# adls_sas_key = dbutils.secrets.get(
#     scope='<secret_scope>',
#     key='<secret_scope_key>'
# )

# spark.conf.set(
#     "fs.azure.account.auth.type.<storage_account_name>.dfs.core.windows.net", 
#     "SAS"
#     )

# spark.conf.set(
#     "fs.azure.sas.token.provider.type.<storage_account_name>.dfs.core.windows.net", 
#     "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider"
#     )

# spark.conf.set(
#     "fs.azure.sas.fixed.token.<storage_account_name>.dfs.core.windows.net",
#     adls_sas_key
#     )

In [0]:
# Set Spark Config Using Secret Scope

adls_sas_key = dbutils.secrets.get(
    scope='databricks-course-secret-scope',
    key='databricks-course-adls-sas-key'
)

spark.conf.set(
    "fs.azure.account.auth.type.dbcourselakehouse.dfs.core.windows.net", 
    "SAS"
    )

spark.conf.set(
    "fs.azure.sas.token.provider.type.dbcourselakehouse.dfs.core.windows.net", 
    "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider"
    )

spark.conf.set(
    "fs.azure.sas.fixed.token.dbcourselakehouse.dfs.core.windows.net",
    adls_sas_key
    )
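
The three settings above differ only in the storage account name, so they can be generated in one place. This is a sketch rather than a required pattern: the helper name `sas_spark_conf` is hypothetical, while the config keys and the provider class are the ones used in the cells above.

```python
def sas_spark_conf(storage_account, sas_token):
    """Build the Spark settings for SAS access to one storage account (hypothetical helper)."""
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "SAS",
        f"fs.azure.sas.token.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
        f"fs.azure.sas.fixed.token.{host}": sas_token,
    }


# "<sas-token>" stands in for the value returned by dbutils.secrets.get(...).
conf = sas_spark_conf("dbcourselakehouse", "<sas-token>")
for key in conf:
    print(key)

# In a Databricks notebook, where `spark` is predefined:
# for key, value in conf.items():
#     spark.conf.set(key, value)
```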

In [0]:
# List Contents From a Container

# dbutils.fs.ls("abfss://<container>@<storage_account_name>.dfs.core.windows.net")

In [0]:
# List Contents From a Container

dbutils.fs.ls("abfss://raw@dbcourselakehouse.dfs.core.windows.net")

Out[3]: [FileInfo(path='abfss://raw@dbcourselakehouse.dfs.core.windows.net/demo/', name='demo/', size=0, modificationTime=1686826405000)]
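
Every path in this notebook follows the pattern `abfss://<container>@<storage_account_name>.dfs.core.windows.net/<path>`. A small sketch of building such paths consistently is shown here; the helper name `abfss_uri` is hypothetical and not part of `dbutils`.

```python
def abfss_uri(container, storage_account, path=""):
    """Compose an abfss:// URI for a container and optional path (hypothetical helper)."""
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    path = path.strip("/")  # tolerate leading or trailing slashes
    return f"{base}/{path}" if path else base


print(abfss_uri("raw", "dbcourselakehouse"))
# abfss://raw@dbcourselakehouse.dfs.core.windows.net
print(abfss_uri("raw", "dbcourselakehouse", "/demo/calendar.csv"))
# abfss://raw@dbcourselakehouse.dfs.core.windows.net/demo/calendar.csv
```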

In [0]:
# List Contents From a Container & Directory

# dbutils.fs.ls("abfss://<container>@<storage_account_name>.dfs.core.windows.net/<directory>")

In [0]:
# List Contents From a Container & Directory

dbutils.fs.ls("abfss://raw@dbcourselakehouse.dfs.core.windows.net/demo")

Out[4]: [FileInfo(path='abfss://raw@dbcourselakehouse.dfs.core.windows.net/demo/calendar.csv', name='calendar.csv', size=75435, modificationTime=1686826417000)]
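
The `modificationTime` field in the listing above is a Unix timestamp in milliseconds. A minimal sketch of converting it to a readable UTC datetime follows; the helper name `modification_datetime` is hypothetical.

```python
from datetime import datetime, timezone


def modification_datetime(modification_time_ms):
    """Convert a FileInfo.modificationTime value (epoch milliseconds) to UTC."""
    return datetime.fromtimestamp(modification_time_ms / 1000, tz=timezone.utc)


print(modification_datetime(1686826417000).isoformat())
# 2023-06-15T10:53:37+00:00
```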

In [0]:
# Using the Display Function

# display(
#     dbutils.fs.ls("abfss://<container>@<storage_account_name>.dfs.core.windows.net/<directory>")
# )

In [0]:
# Using the Display Function

display(
    dbutils.fs.ls("abfss://raw@dbcourselakehouse.dfs.core.windows.net/demo")
)

path,name,size,modificationTime
abfss://raw@dbcourselakehouse.dfs.core.windows.net/demo/calendar.csv,calendar.csv,75435,1686826417000


In [0]:
# Read Data From a File Using Display Function

# display(
#     spark.read.csv("abfss://<container>@<storage_account_name>.dfs.core.windows.net/<directory>/<file_name>")
# )

In [0]:
# Read Data From a File Using Display Function

display(
    spark.read.csv("abfss://raw@dbcourselakehouse.dfs.core.windows.net/demo/calendar.csv")
)

_c0,_c1,_c2,_c3,_c4,_c5,_c6,_c7,_c8,_c9,_c10,_c11
date_key,date,year,month,day,day_name,day_of_year,week_of_month,week_of_year,month_name,year_month,year_week
20200101,2020-01-01,2020,1,1,Wednesday,1,1,1,January,202001,202001
20200102,2020-01-02,2020,1,2,Thursday,2,1,1,January,202001,202001
20200103,2020-01-03,2020,1,3,Friday,3,1,1,January,202001,202001
20200104,2020-01-04,2020,1,4,Saturday,4,1,1,January,202001,202001
20200105,2020-01-05,2020,1,5,Sunday,5,2,2,January,202001,202002
20200106,2020-01-06,2020,1,6,Monday,6,2,2,January,202001,202002
20200107,2020-01-07,2020,1,7,Tuesday,7,2,2,January,202001,202002
20200108,2020-01-08,2020,1,8,Wednesday,8,2,2,January,202001,202002
20200109,2020-01-09,2020,1,9,Thursday,9,2,2,January,202001,202002
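
In the output above the columns are named `_c0` to `_c11` and the real column names appear as the first data row, because `spark.read.csv` does not treat the first line as a header by default. The sketch below shows one way to read the same file with a header. The wrapper name `read_csv_with_header` is hypothetical; `header` and `inferSchema` are standard options of the Spark CSV reader.

```python
def read_csv_with_header(spark, path):
    """Read a CSV whose first line holds the column names (hypothetical wrapper)."""
    # header=True uses the first line as column names;
    # inferSchema=True asks Spark to infer column types instead of reading all as strings.
    return spark.read.csv(path, header=True, inferSchema=True)


# In a Databricks notebook, where `spark` and `display` are predefined:
# display(
#     read_csv_with_header(
#         spark,
#         "abfss://raw@dbcourselakehouse.dfs.core.windows.net/demo/calendar.csv",
#     )
# )
```

Schema inference requires an extra pass over the data, so for large files an explicit schema may be preferable.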
