In [1]:
subscription_id = 'a89228ac-8bf4-4646-9d2c-442b1cb5d622'
resource_group = 'lauri-ml'
aml_workspace = 'lauri-ml'
# External data
datastore_name = 'tfworld'
# Example dataset
dataset_path = 'azure-service-classifier/data/train.csv'
dataset_name = 'Stackoverflow dataset'
dataset_descr = 'Tabular Stackoverflow dataset'
# Azure Stackoverflow dataset
azure_dataset_path = 'azure-service-classifier/data'
azure_dataset_name = 'Azure Services Dataset'
azure_dataset_descr = 'Dataset containing azure related posts on Stackoverflow'

In [2]:
from azureml.core import Workspace, Dataset
# from azureml.core.runconfig import RunConfiguration

## Connect To Workspace

In [None]:
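# First run only: connect explicitly, then save the connection details
# so that Workspace.from_config() can find them on later runs.
# workspace = Workspace(
#     subscription_id=subscription_id, resource_group=resource_group, workspace_name=aml_workspace
# )
# workspace.write_config()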

In [4]:
workspace = Workspace.from_config()
print('Workspace name: ' + workspace.name, 
      'Azure region: ' + workspace.location, 
      'Subscription id: ' + workspace.subscription_id, 
      'Resource group: ' + workspace.resource_group, sep = '\n')

Performing interactive authentication. Please follow the instructions on the terminal.
Interactive authentication successfully completed.
Workspace name: lauri-ml
Azure region: westeurope
Subscription id: a89228ac-8bf4-4646-9d2c-442b1cb5d622
Resource group: lauri-ml
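
The two cells above cover two situations: the commented-out cell connects explicitly and saves the connection details, while `Workspace.from_config()` reuses them on later runs. A minimal sketch of combining both follows. It assumes `write_config()` was called with its defaults, which save the file as `.azureml/config.json`. The names `find_workspace_config` and `connect_to_workspace` are hypothetical helpers, not part of the SDK, and the lookup is a simplification of the search that `from_config()` performs itself.

```python
from pathlib import Path


def find_workspace_config(start="."):
    """Return the nearest .azureml/config.json in `start` or its parents, else None."""
    start = Path(start).resolve()
    for folder in (start, *start.parents):
        candidate = folder / ".azureml" / "config.json"
        if candidate.is_file():
            return candidate
    return None


def connect_to_workspace(subscription_id, resource_group, workspace_name):
    """Reuse a saved config when one exists; otherwise connect explicitly and save it."""
    # Imported here so the lookup helper above works without the SDK installed.
    from azureml.core import Workspace

    if find_workspace_config() is not None:
        return Workspace.from_config()

    workspace = Workspace(
        subscription_id=subscription_id,
        resource_group=resource_group,
        workspace_name=workspace_name,
    )
    workspace.write_config()  # default location: ./.azureml/config.json
    return workspace
```

In this notebook the call would be `workspace = connect_to_workspace(subscription_id, resource_group, aml_workspace)`.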


#### If the datastore has already been registered, you (and other users in your workspace) can run this cell directly.

In [8]:
datastore = workspace.datastores[datastore_name]
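
Indexing `workspace.datastores` with a name that is not registered raises a bare `KeyError`. A minimal sketch of a friendlier lookup follows, assuming only that `workspace.datastores` behaves like a dictionary keyed by datastore name. `lookup_registered` is a hypothetical helper, not an SDK function.

```python
def lookup_registered(registry, name, kind="datastore"):
    """Return registry[name], or raise a LookupError that lists the registered names."""
    if name not in registry:
        available = ", ".join(sorted(registry)) or "none"
        raise LookupError(
            f"No {kind} named {name!r} is registered; available: {available}"
        )
    return registry[name]


# Usage in this notebook (needs the workspace from the earlier cell):
# datastore = lookup_registered(workspace.datastores, datastore_name)
```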

## Create Dataset

In [17]:
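# Parse the CSV file in the datastore into a TabularDataset
tabular = Dataset.Tabular.from_delimited_files(path=(datastore, dataset_path))
# Register it in the workspace so it can be retrieved by name later
tabular = tabular.register(workspace, name=dataset_name, description=dataset_descr)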

## Register Dataset

Azure Machine Learning service supports a first-class notion of a Dataset. A [Dataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py) is a resource for exploring, transforming, and managing data in Azure Machine Learning. The following Dataset types are supported:

* [TabularDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format created by parsing the provided file or list of files.

* [FileDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py) references single or multiple files in datastores or from public URLs.

First, we will use the visual tools in Azure ML studio to register and explore our dataset as a Tabular Dataset.

* **ACTION**: Follow the [create-dataset](images/create-dataset.ipynb) guide to create a Tabular Dataset from our training data.

#### Use the created dataset in code

In [12]:
# Get a dataset by name
tabular_ds = Dataset.get_by_name(workspace=workspace, name=dataset_name)

# Load a TabularDataset into pandas DataFrame
df = tabular_ds.to_pandas_dataframe()

In [13]:
df.head(10)

Unnamed: 0,54398502,Issue with load for Azure functions <p>We are creating an application by making use of some Microsoft features like Vision API Text analytics and Bing translator onto Azure functions. As part of our application we are good with extracting and processing image content for smaller set of documents Whereas when we process larger set of documents we are resulting to much more processing time and various errors like database concurrency ocr error and translator error. It seems there were some limitation with Vision API(10 calls per second) and language detection (1000 call per minute) It would be great if you can help us to point towards parallelism with best statistics in terms of infrastructure like number of functions vs number of Vision API..etc. </p> <p>Thanks in advance.</p> <p>Regards Satya</p>,azure-functions
0,47576895,Terraform - Azure: Attach Existing Disk - Chan...,azure-virtual-machine
1,55119865,Devops npm task with custom command (build) no...,azure-devops
2,55502026,Azure DevOps (On Premise) - Minimatch Pattern ...,azure-devops
3,55659808,How do you get available Area Paths from Azure...,azure-devops
4,43471597,Azure WebJob Queue Trigger not responding to e...,azure-functions
5,55727018,Can we pass Databricks output to function in a...,azure-functions
6,52991806,azure ioteage device module fails to communica...,azure-devops
7,40620472,Azure Application Gateway with App Service <p>...,azure-web-app-service
8,9284781,Host ASPNET pages in Windows Azure Blob Storag...,azure-storage
9,48276779,Visual Studio Team Services how to publish cod...,azure-devops


## Register Dataset using SDK

In addition to the UI, we can register datasets using the SDK. In this workshop we will register the second type of dataset, a File Dataset, in code. A File Dataset lets us register a specific folder in our datastore that contains our data files as a Dataset.

There is a folder within our datastore, **azure-service-classifier/data** (the value of `azure_dataset_path`), that contains all our training and testing data. We will register this folder as a dataset.

#### If the dataset has already been registered, you (and other users in your workspace) can run this cell directly.

In [16]:
# Fetch the already-registered dataset by name
azure_dataset = workspace.datasets[azure_dataset_name]

In [None]:
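# Otherwise, create a FileDataset that points at the folder in the datastore
azure_dataset = Dataset.File.from_files(path=(datastore, azure_dataset_path))

# Register it in the workspace so it can be retrieved by name later
azure_dataset = azure_dataset.register(
    workspace=workspace, name=azure_dataset_name, description=azure_dataset_descr
)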
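
The two cells above can be folded into one step that is safe to re-run: return the dataset if its name is already registered, otherwise create and register it. A minimal sketch follows, assuming `workspace.datasets` behaves like a dictionary keyed by dataset name. `get_or_create` is a hypothetical helper, not part of the SDK.

```python
def get_or_create(registry, name, create):
    """Return registry[name] if present; otherwise return the result of create()."""
    if name in registry:
        return registry[name]
    # create() is only called when nothing is registered under `name`.
    return create()


# Usage in this notebook (needs the workspace and datastore from earlier cells):
# azure_dataset = get_or_create(
#     workspace.datasets,
#     azure_dataset_name,
#     lambda: Dataset.File.from_files(path=(datastore, azure_dataset_path)).register(
#         workspace=workspace, name=azure_dataset_name, description=azure_dataset_descr
#     ),
# )
```

Passing the creation step as a callable keeps the helper independent of the dataset type, so the same function could serve the Tabular Dataset registered earlier.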