# Using Datasets in Azure ML Synapse Spark

This mockup notebook shows how to use Datasets in Azure ML Synapse Spark. It covers Tabular and File Datasets, in both 'local' and 'remote' modes.

## Create Dataset

Use an existing Dataset (created from a Datastore through the UI, the Python SDK, etc.), or create one from a dataframe.

In [None]:
df = ... # dataframe in Synapse Spark session or otherwise

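# create and register an Azure ML Dataset from the dataframe
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()  # see https://aka.ms/azureml/workspace
# register() takes the workspace first, then the dataset name
dset = Dataset.Tabular.from_spark_dataframe(df).register(workspace=ws, name='my-data')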

## Read in dataframe

In 'local' mode, you can use your user credentials (for example, AAD) to access the [workspace](https://aka.ms/azureml/workspace) object. In 'remote' runs, the Run Token is the recommended way to access Workspace resources. The `azureml.core.Run.get_context` method abstracts this: it returns the current run, from which you can obtain the Workspace object and then access Datasets.
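The two access paths described above can be wrapped in one small helper. This is a minimal sketch: the function name `get_workspace` and its `mode` parameter are hypothetical and not part of the Azure ML SDK. It only calls `Workspace.from_config` and `Run.get_context`, which are the same calls this notebook uses.

```python
def get_workspace(mode):
    """Return the Azure ML Workspace for a 'local' or 'remote' session.

    `get_workspace` and `mode` are illustrative names (assumptions),
    not SDK API. Imports are deferred so only the needed class is loaded.
    """
    if mode == 'local':
        # local: authenticate with user credentials via the workspace config
        from azureml.core import Workspace
        return Workspace.from_config()
    if mode == 'remote':
        # remote: resolve the workspace from the current run's context
        from azureml.core import Run
        return Run.get_context().experiment.workspace
    raise ValueError(f"mode must be 'local' or 'remote', got {mode!r}")
```

With this helper, a cell would call `ws = get_workspace('local')` or `ws = get_workspace('remote')`. Any other value raises a `ValueError` instead of leaving `ws` undefined.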

### Tabular Datasets 

In [None]:
run = 'local'

# get the workspace (compare strings with ==, not `is`)
if run == 'local':
    from azureml.core import Workspace
    ws = Workspace.from_config()
elif run == 'remote':
    from azureml.core import Run
    ws = Run.get_context().experiment.workspace

# get the dataset 
dset = ws.datasets['my-data']

# convert to PySpark dataframe
df = dset.to_spark_dataframe()

### File Datasets 

In [None]:
run = 'remote'

# get the workspace (compare strings with ==, not `is`)
if run == 'local':
    from azureml.core import Workspace
    ws = Workspace.from_config()
elif run == 'remote':
    from azureml.core import Run
    ws = Run.get_context().experiment.workspace

# get the dataset 
dset = ws.datasets['my-data']

# mount as hdfs (and get the path)
path = dset.to_hdfs()

# use the normal Spark readers (`spark` is the active SparkSession)
df = spark.read.csv(f'{path}/*.csv')