Data Annotation

Relevant Links

Prerequisites

Steps

  1. Create Datastore and Dataset
  2. Create Labeling Project and Annotate Data
  3. Export and use Labeled data

Creating a Datastore and Dataset

This guide briefly shows how to create a datastore and dataset through the Azure portal. For a more complete tutorial, please see the guide on creating a dataset using the Python SDK or through the Azure portal.

Obtaining Azure Storage Key

To obtain an Azure storage key, navigate to the storage account that contains your blob container and select Access keys from the menu on the left. In this tab, click Show keys at the top, then copy the desired key (either one will work).

Access Keys

Creating a Datastore

Next, navigate back to the AzureML Studio and select Datastores in the menu on the left, then click New Datastore. This brings up a prompt. Fill out the form (including the account key) and click Create.

Create Datastore
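
The same registration can also be done from Python. The following is a minimal sketch, assuming the azureml-core (v1) SDK and its Datastore.register_azure_blob_container call. The helper names is_valid_datastore_name and register_blob_datastore are hypothetical, and the name check reflects the SDK rule that datastore names may only contain letters, digits, and underscores.

```python
import re


def is_valid_datastore_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, and underscores."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None


def register_blob_datastore(workspace, datastore_name, container_name,
                            account_name, account_key):
    """Register an existing blob container as a datastore (hypothetical helper)."""
    if not is_valid_datastore_name(datastore_name):
        raise ValueError(f"Invalid datastore name: {datastore_name!r}")

    # Imported lazily so the name check above works without azureml-core installed.
    from azureml.core import Datastore

    return Datastore.register_azure_blob_container(
        workspace=workspace,
        datastore_name=datastore_name,
        container_name=container_name,
        account_name=account_name,
        account_key=account_key,  # the key copied from the Access keys tab
    )


# Example call (placeholders are assumptions; fill in your own values):
# import os
# from azureml.core import Workspace
# datastore = register_blob_datastore(
#     Workspace.from_config(), "<datastore_name>", "<container_name>",
#     "<storage_account_name>", os.environ["AZURE_STORAGE_KEY"])
```

Reading the key from an environment variable (the variable name above is only an illustration) keeps it out of notebooks and source control.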

Creating Dataset

After creating a Datastore, you can create a Dataset which allows fast access to the files for training/testing. Go to Datasets from the AzureML Studio menu, then click Create dataset, and choose From datastore from the dropdown menu.

Create Dataset

Next, in the prompts that appear, set the name to padchest (or any unique name) and the Dataset type to File, then click Next. In the following menu, use the search bar to find and select the datastore that you previously created, and set the path to the location within the blob container where the files are stored (if you are following the tutorial, this path should be: /). Finally, complete the remaining prompts to create the dataset.

Create Dataset
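
For reference, the portal steps above roughly correspond to the following Python sketch. It assumes the azureml-core (v1) SDK, where Datastore.get, Dataset.File.from_files, and FileDataset.register are available. The helper name create_file_dataset is hypothetical, and whether the path "/" behaves the same way through the SDK as in the portal form is an assumption worth checking against your container layout.

```python
def create_file_dataset(workspace, datastore_name, path_in_container, dataset_name):
    """Create and register a File dataset from a datastore path (hypothetical helper)."""
    # Imported lazily so this module can be loaded without azureml-core installed.
    from azureml.core import Dataset, Datastore

    # Look up the datastore that was registered earlier.
    datastore = Datastore.get(workspace, datastore_name)

    # Point the dataset at a path inside the blob container.
    dataset = Dataset.File.from_files(path=(datastore, path_in_container))

    # Registering under an existing name creates a new version of that dataset.
    return dataset.register(
        workspace=workspace,
        name=dataset_name,
        create_new_version=True,
    )


# Example call (the datastore name is a placeholder):
# from azureml.core import Workspace
# dataset = create_file_dataset(
#     Workspace.from_config(), "<datastore_name>", "/", "padchest")
```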

Example dataset usage

Once a dataset has been created, it is easy to use. Here is some example code:

# azureml-core of version 1.0.72 or higher is required
from azureml.core import Workspace, Dataset
import pandas as pd
import os
os.environ["RSLEX_DIRECT_VOLUME_MOUNT"] = "true" # IMPORTANT for performance

# When running from inside an AzureML notebook, 
# workspace can be pulled from the environment
workspace = Workspace.from_config()

# Or load the workspace outside of AzureML notebooks
# subscription_id, resource_group, workspace_name = '<sub_id>', '<rg_name>', '<ws_name>'
# workspace = Workspace(subscription_id, resource_group, workspace_name)

# Find specified dataset
dataset_name = "padchest"
dataset = Dataset.get_by_name(workspace, name=dataset_name)

# On AzureML, it is very easy to mount and use files
mount = dataset.mount()
mount.start()
print(mount.mount_point)

pc_csv_file = os.path.join(mount.mount_point, "PADCHEST_chest_x_ray_images_labels_160K_01.02.19.csv")
pc_df = pd.read_csv(pc_csv_file, low_memory=False, index_col=0)

# You can also download the dataset locally
# dataset.download(target_path='.', overwrite=False)
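
The example above starts the mount but never stops it. As a minimal sketch, assuming the mount context returned by dataset.mount() can be used as a context manager (as it can in azureml-core v1), the mount can be scoped to a with block so it is released automatically. The helper name list_mounted_files is hypothetical.

```python
import os


def list_mounted_files(dataset, limit=10):
    """Mount a dataset, collect up to `limit` relative file paths, then unmount."""
    found = []
    # The mount is released when the `with` block exits, even on an early return.
    with dataset.mount() as mount_context:
        root = mount_context.mount_point
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in sorted(filenames):
                full_path = os.path.join(dirpath, filename)
                found.append(os.path.relpath(full_path, root))
                if len(found) >= limit:
                    return found
    return found


# Example call, reusing the `dataset` object from the code above:
# print(list_mounted_files(dataset, limit=5))
```

Listing a handful of files this way is a quick check that the dataset path points at the expected part of the blob container.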

Create Labeling Project and Annotate Data

Note: This step is not required to complete this tutorial. You can jump directly to the next steps and continue the tutorial.

Now that we have created a dataset, we can create a Labeling Project and start annotating data. For an in-depth look at this process, check out the AzureML documentation on it here.

To create a labeling project, navigate to Data Labeling from the Azure ML Studio menu, then select Add project. In the prompt, give your labeling project a unique name, set the media type to Image, then select a Labeling task type. You can create a few different labeling projects of different types to experiment.

Create Label Project

Create Label Project

Go through the creation prompts until you reach the Select or create dataset page. On this page, select the dataset we created in the previous steps (padchest).

Create Label Project

Continue through the prompts until the end, then click Create project.

Create Label Project

The project can take some time to create, especially if there are numerous files within the dataset. Once the project has been created, the State will show Running.

Create Label Project

Annotating Data

Now that we have created labeling projects, we can start annotating the data. A more complete guide to labeling can be found here; this section gives only a brief introduction.

To start annotating images, navigate to the Data Labeling page by clicking Data Labeling within AzureML studio (the same page from which you created the Labeling Project). From here select a Project name to view details.

Dashboard

From this page, select Label data, review the instructions, then click Start labeling at the bottom.

Instructions

From this page, you can start annotating data according to the project's labeling task type. Note that Azure ML supports regular (PNG/JPEG) images for annotation as well as DICOMs (only 2D X-ray modalities at the time of writing):

Segmentation

Segmentation

Bounding Box/Detection

Detection

Whole Image Classification

Segmentation

Export Labeling Data

Once you have annotated a project, you can export the labels in various formats from the Labeling project Dashboard. To do this, navigate to the desired Labeling project Dashboard and click Export, then select the appropriate option from the dropdown menu.

Export Labels

Selecting Azure ML dataset creates a versioned representation of your dataset that can be easily used throughout Azure ML by features such as Automated ML, Notebooks, and Designer.

Labeled Dataset
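
If you export to a file instead, a quick sanity check is to count how many annotations each label received. The sketch below assumes the Export dropdown offers a COCO file option and that the downloaded JSON follows the usual COCO layout, with "categories" and "annotations" lists. Both are assumptions here, and the helper names and the file name in the usage comment are hypothetical.

```python
import json
from collections import Counter


def load_coco(path):
    """Load an exported COCO-style JSON file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_labels_per_category(coco):
    """Return {category name: number of annotations} for a COCO-style dict."""
    # Map numeric category ids to their human-readable names.
    names = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    counts = Counter(
        names.get(ann["category_id"], "unknown")
        for ann in coco.get("annotations", [])
    )
    return dict(counts)


# Example call (the file name is a placeholder for your downloaded export):
# coco = load_coco("<exported_labels>.json")
# print(count_labels_per_category(coco))
```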

Next Steps

Now you are ready to build a model in Azure ML!