# Train HuggingFace text classification models with PyTorch

description: train single-node, including single-node multi-gpu, pytorch

In [None]:
!pip install --upgrade tensorboard azureml-tensorboard gitpython

In [None]:
from azureml.core import Workspace

ws = Workspace.from_config()
ws

In [None]:
import git
from pathlib import Path

# get root of git repo
prefix = Path(git.Repo(".", search_parent_directories=True).working_tree_dir)

# training script
source_dir = prefix.joinpath(
    "code", "train", "huggingface", "classification"
)
script_name = "train.py"

# environment file
environment_file = prefix.joinpath("environments", "huggingface-stc.yml")

# azure ml settings
environment_name = "hf-classification"
experiment_name = "hf-classification"
cluster_name = "gpu-K80-2"
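
The cell above relies on GitPython to locate the repository root. If GitPython is not available, a similar lookup can be sketched with the standard library alone by walking up the parent directories until one contains a `.git` entry. The helper name `find_repo_root` is hypothetical and is not part of GitPython or Azure ML; treat this as a rough equivalent under that assumption, not as a description of how GitPython works internally.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Return the nearest ancestor of `start` (inclusive) that contains `.git`.

    Hypothetical helper for illustration; returns None if no ancestor matches.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        # `.git` is a directory in a normal clone and a file in a worktree
        # or submodule, so test for existence rather than for a directory.
        if (candidate / ".git").exists():
            return candidate
    return None
```

With this helper, `prefix = find_repo_root(Path("."))` would play the same role as the GitPython call, provided the notebook runs somewhere inside the repository.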

## Create environment

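Define a conda environment YAML file with your training script's dependencies and create an Azure ML environment from it. The dependencies for this tutorial, which include **torch**, **torchvision**, and **pytorch-lightning**, are listed in the file that `environment_file` points to (`environments/huggingface-stc.yml`).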

Because this example trains on GPUs, you need to specify a GPU base image that has the necessary dependencies. Azure ML maintains a set of base images published on Microsoft Container Registry (MCR) that you can use. See the [Azure/AzureML-Containers](https://github.com/Azure/AzureML-Containers) GitHub repo for more information.

Azure ML will build a conda environment with the dependencies you specified in your .yml file on the base image.

In [None]:
from azureml.core import Environment

env = Environment.from_conda_specification(environment_name, environment_file)

# specify a GPU base image
env.docker.enabled = True
env.docker.base_image = (
    "mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.2-cudnn8-ubuntu18.04"
)

Alternatively, you can just capture all your dependencies directly in a custom Docker image or Dockerfile, and create your environment from that. For more information, see [Train with custom image](https://docs.microsoft.com/azure/machine-learning/how-to-train-with-custom-image).

## Configure and run training job
Create a `ScriptRunConfig` to specify the training script and its arguments, the environment, and the cluster to run on.

In [None]:
from azureml.core import ScriptRunConfig, Experiment

cluster = ws.compute_targets[cluster_name]

src = ScriptRunConfig(
    source_directory=source_dir,
    script=script_name,
    arguments=["--model_name_or_path", "bert-base-cased",
               "--task_name", "MRPC",
               "--do_train", "",
               "--do_eval", "",
               "--max_seq_length", 128,
               "--per_device_train_batch_size", 32,
               "--learning_rate", 2e-5,
               "--num_train_epochs", 3,
               "--output_dir", "./output"
               ],
    compute_target=cluster,
    environment=env,
)
run = Experiment(ws, experiment_name).submit(src)
run
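
The `arguments` list above is a flat sequence of flag and value pairs, which makes a stray space or a missing value easy to overlook. As a minimal sketch, the same list can be built from a dictionary. The helper name `build_arguments` is hypothetical and not part of the Azure ML SDK; the option names and values are the ones used in the cell above.

```python
from typing import Any, Dict, List


def build_arguments(options: Dict[str, Any]) -> List[Any]:
    """Flatten {"name": value} into ["--name", value, ...].

    Hypothetical helper for illustration; it only reshapes the data.
    """
    arguments: List[Any] = []
    for name, value in options.items():
        arguments.append(f"--{name}")
        # Strip stray whitespace from strings; leave numbers untouched.
        arguments.append(value.strip() if isinstance(value, str) else value)
    return arguments


arguments = build_arguments(
    {
        "model_name_or_path": "bert-base-cased",
        "task_name": "MRPC",
        "do_train": "",
        "do_eval": "",
        "max_seq_length": 128,
        "per_device_train_batch_size": 32,
        "learning_rate": 2e-5,
        "num_train_epochs": 3,
        "output_dir": "./output",
    }
)
```

The resulting list could then be passed as `arguments=arguments` to `ScriptRunConfig`. Keeping the options in a dictionary also makes it simpler to change a single hyperparameter between runs.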

In [None]:
from azureml.widgets import RunDetails

RunDetails(run).show()

In [None]:
run.wait_for_completion(show_output=True)
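
Once `wait_for_completion` returns, it can be useful to collect the final status and any logged metrics in one place. The sketch below assumes the object passed in exposes `get_status()` and `get_metrics()`, as `azureml.core.Run` does. The function name `summarize_run` is hypothetical, and whether any metrics are present depends on what the training script logs.

```python
from typing import Any, Dict


def summarize_run(run: Any) -> Dict[str, Any]:
    """Return the status and logged metrics of a finished run.

    Hypothetical helper; `run` is assumed to provide get_status() and
    get_metrics(), like azureml.core.Run.
    """
    status = run.get_status()
    if status != "Completed":
        # Surface failed or canceled runs instead of returning partial results.
        raise RuntimeError(f"Run finished with status {status!r}")
    return {"status": status, "metrics": run.get_metrics()}
```

Calling `summarize_run(run)` after the cell above would return a dictionary, or raise if the run did not complete successfully.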