MIGRATING TO THE NEW AZURE ML
This project was originally built for the old Azure Machine Learning Service. As of September 2018, the Azure ML service has changed drastically. Some of the improvements are:
- A powerful Python SDK to create projects, manage models, deploy services, create compute targets...
- Azure ML Workbench desktop application is deprecated
- "Machine learning experimentation account", the Workbench App, and "model management account" have been replaced by one clear Azure resource, the new Azure ML service workspace.
The following is an overview of how I migrated an existing service to the new Azure ML. It does not cover training in the new Azure ML, since the model had already been trained. It describes how to migrate the models, images and services. The rest of the documentation, starting from section "0", has not been updated for the new service. This overview assumes some familiarity with the Azure ML service.
The new way of deploying a service with the Python SDK is demonstrated in the notebook newAzureML_deploy.ipynb. Below is an overview of steps that I took in order to support this deployment.
- Fetch the trained model of a successful run (from the Azure Portal or from the deprecated Azure ML Workbench) and save it to this working directory as my_model.h5. It is loaded from the notebook.
- Since there is now a nice Python SDK, we can handle everything in Python and without the Azure ML CLI. The process is described in the notebook "newAzureML_deploy.ipynb".
- Added a file "config.json" with info about my own Azure ML workspace. See the quickstarts on how to create one, or adapt the file "config_example.json" and rename it to "config.json". This file is loaded in the notebook.
- newAzureML_score.py is a slightly adapted version of score.py. It leverages the new SDK in the init() function to easily get the path to the model inside the container image, but you could also just use the old score.py file.
- A new conda dependencies file was made for the image creation (aml_config/newAzureML_env.yml) to support this new scoring file.
- callService.py and webcamdetect.py might no longer work, but calling the service is now easily done from the Python SDK. This is also demonstrated in the notebook.
- Other files have remained the same.
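As a minimal sketch of the config.json step above: the snippet below checks that a workspace config file contains the three keys that the SDK reads when loading a workspace from a config file (subscription_id, resource_group, workspace_name). The helper name load_workspace_config and the placeholder values are illustrative assumptions and are not part of this project.

```python
import json
import os
import tempfile

# Keys expected in an Azure ML workspace config.json.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Load a workspace config file and fail early if a key is missing or empty."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    return config


# Demo with placeholder values written to a temporary file.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "config.json")
with open(demo_path, "w", encoding="utf-8") as f:
    json.dump(
        {
            "subscription_id": "<your-subscription-id>",
            "resource_group": "<your-resource-group>",
            "workspace_name": "<your-workspace-name>",
        },
        f,
    )

demo_config = load_workspace_config(demo_path)
print(sorted(demo_config))
```

Running a check like this before opening the notebook gives a clearer error than a failed workspace lookup.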
0.1 Azure ML components
- Experimentation Account: Required for Azure ML Workbench. Contains workspaces, which in turn contain projects. You can add multiple users (seats).
- Model Management Account: Used to register, maintain and deploy containerized ML services with the CLI: https://docs.microsoft.com/en-gb/azure/machine-learning/desktop-workbench/model-management-cli-reference
- Azure ML Workbench and CLI
0.2 Input files
Input files are read from the Azure ML shared folder by the training script. These files can be downloaded here. Save the files to the Azure ML shared directory on the compute target (see 2.1).
- Haarcascade_frontalface_default.xml (for the cascade face detector, included in the opencv-python library)
- The FACENET model file and weights in Keras format (both .h5 files). With a slight adaptation of the code, you could also use a single .h5 file which contains both the model and the weights.
- Training images. Use one subfolder for each person, and name the subfolder like the person.
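The folder-per-person layout can be turned into class labels with a few lines of Python. This is only a sketch: how train.py actually assigns class numbers is not shown here, so the sorted-folder ordering, the function name build_label_map and the example names are assumptions.

```python
import os
import tempfile


def build_label_map(image_root):
    """Map a class index to each person, using the sorted subfolder names."""
    people = sorted(
        name
        for name in os.listdir(image_root)
        if os.path.isdir(os.path.join(image_root, name))
    )
    return {index: person for index, person in enumerate(people)}


# Demo on a throwaway directory with two hypothetical people.
demo_root = tempfile.mkdtemp()
for person in ("bob", "alice"):
    os.makedirs(os.path.join(demo_root, person))

label_map = build_label_map(demo_root)
print(label_map)  # {0: 'alice', 1: 'bob'}
```

Sorting the folder names keeps the numbering stable between runs, as long as no folders are added or removed.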
Building a face recognition service using the Azure ML Workbench and CLI. We use a pretrained facenet model, to which we add a dense layer with softmax activation for classification. We use Keras, a higher-level API on top of Tensorflow.
The input to the created service is a preprocessed image encoded as a string.
Train.py trains and saves the model. Images are read from files on the shared directory of the compute target, faces are extracted with a cascade classifier, and these faces are then used to train a model. Starting from a pretrained facenet model, we add an extra dense layer with softmax activation to classify our own images. You can add extra people by just adding extra folders of images.
Note: you can train with different people by adding more subfolders to the image folder.
The model is trained using the higher-level API Keras with a Tensorflow backend.
Running this script can be done with the Workbench GUI. Select the script (e.g. train.py), add arguments if applicable, and run. The script is then submitted to the compute target (e.g. local). See next section.
We are using the outputs folder and the shared folder. More info: https://docs.microsoft.com/en-gb/azure/machine-learning/desktop-workbench/how-to-read-write-files
Input training data and the pretrained model are read from the shared folder. This is a requirement, since these files are too large to be saved in the project folder. Otherwise, they would need to be copied to the compute target every time an experiment is submitted. The shared folder is found with the environment variable "AZUREML_NATIVE_SHARE_DIRECTORY".
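Resolving paths inside the shared folder could look like the sketch below. The environment variable name comes from the text above. The helper name shared_path, the fallback to a local directory and the example file names are assumptions made for illustration.

```python
import os


def shared_path(*parts, default="."):
    """Build a path inside the Azure ML shared directory.

    Falls back to `default` when the variable is not set, for example when
    the script runs outside Azure ML (the fallback is an assumption of this
    sketch, not documented behaviour).
    """
    share_dir = os.environ.get("AZUREML_NATIVE_SHARE_DIRECTORY", default)
    return os.path.join(share_dir, *parts)


# Hypothetical file and folder names, for illustration only.
weights_path = shared_path("facenet_weights.h5")
images_dir = shared_path("images")
print(weights_path)
print(images_dir)
```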
The trained Keras model (my_model.h5) is saved to the outputs/ folder. This folder must be named outputs/ and receives special treatment. It is not submitted to the compute target when submitting an experiment. Saving files to the outputs folder is preferable when the script produces files that will change with every experiment (e.g. the resulting model after a run with new settings). They become part of the run history. Any outputs saved to this folder can be retrieved after a run in the GUI or with the CLI, by specifying the run ID.
Score.py generates an API schema (schema.json) and is also passed to the CLI to generate a scoring container image.
Score.py must be run as an "experiment" first. The schema.json file is saved by run.py to the outputs/ folder. When running as an experiment (via the GUI or with az ml experiment submit), the code below "if __name__ == '__main__'" will run. This part of the code is not relevant inside the service container image when deploying.
The init() and run(..) methods are required for deploying the service:
- Init() defines what happens when the service is first started. In our case, this is mainly loading the model.
- Run(input_bytes) defines what happens with an input request. Since the schema was created with DataTypes.STANDARD, it expects a string as input. This string is an encoded version of an input, which is decoded by run(input_bytes) to a numpy array and served to the model. It returns a JSON string as output (required, must be JSON serializable. Otherwise the created webservice will not work).
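The shape of these two functions can be sketched as follows. This is not the project's score.py: the real script loads the Keras model and decodes an image, while this sketch uses a stand-in model (the mean of the input values) and assumes, purely for illustration, that the input string is a base64-wrapped JSON list of numbers.

```python
import base64
import json

model = None


def _mean_model(values):
    """Stand-in for the real Keras model: returns the mean of the inputs."""
    return sum(values) / len(values)


def init():
    """Runs once when the service starts; the real script loads the model here."""
    global model
    model = _mean_model


def run(input_bytes):
    """Runs per request: decode the string, predict, return a JSON string."""
    try:
        # Assumed encoding: base64-wrapped JSON list of floats.
        values = json.loads(base64.b64decode(input_bytes).decode("utf-8"))
        return json.dumps({"prediction": model(values)})
    except Exception as exc:
        # Errors are also returned as JSON so the response stays serializable.
        return json.dumps({"error": str(exc)})


init()
encoded = base64.b64encode(json.dumps([0.0, 0.5, 1.0]).encode("utf-8")).decode("ascii")
print(run(encoded))  # {"prediction": 0.5}
```

Returning errors as JSON as well keeps the response serializable even when decoding fails.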
Used to test the service after deploying. This reads an image from a file, extracts the face*, preprocesses** (encodes) it and then sends it to the service. The data that is sent in the request must be a JSON string with an "input_bytes" key, i.e. exactly matching the name of the argument in run(input_bytes).
Unfortunately, the face must be extracted and processed into an encoded string before sending it to the service. Just uploading an image is not possible.
*In the current project, the face is extracted with a cascade classifier (instead of using some deep learning method such as object detection). Afterwards, only the extracted face is sent to a neural net for classification. Note that this was also done at training time.
**The code for encoding the image is found in the myImageLibrary.py file. Also note the importance of normalizing the image (dividing pixel values by 255), since this was also done at training time!
example usage: python callService.py --url (service url) --path (path to local image) --key (authorization key, when using cluster environment)
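The request described above can be sketched with the standard library. This is not the code of callService.py. The URL is a made-up placeholder, the encoded face is a dummy string (the real encoding lives in myImageLibrary.py), and sending the key as a Bearer token in the Authorization header is an assumption about how the service expects it.

```python
import json
import urllib.request


def normalize_pixels(pixels):
    """Scale 0-255 pixel values to the 0-1 range, as was done at training time."""
    return [p / 255.0 for p in pixels]


def build_request(url, encoded_face, key=None):
    """Build a POST request whose JSON body uses the "input_bytes" key."""
    headers = {"Content-Type": "application/json"}
    if key:
        # Assumption: the authorization key is sent as a Bearer token.
        headers["Authorization"] = "Bearer " + key
    body = json.dumps({"input_bytes": encoded_face}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical local service URL and dummy payload, for illustration only.
request = build_request("http://localhost:32768/score", "<encoded face>")
print(request.get_method(), json.loads(request.data))
# To actually send it: urllib.request.urlopen(request).read()
```

The body key must stay "input_bytes", because it has to match the argument name of run(input_bytes).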
Detects people on the webcam, using the service URL.
2 Submitting the experiments (training)
First, the experiments train.py and score.py must be run on the compute target. Train.py will create a trained model, score.py will create a schema.json file.
Attaching a remote computetarget is done with the command below.
az ml computetarget attach remote -a (hostname/ip-address) -n (new targetname) -u (username) [-w (password)]
This creates two files, targetname.runconfig and targetname.compute. These contain information about the connection and configuration.
More info about compute target types and configuration can be found in the documentation.
Prepare the environment (installing dependencies etc.)
- az ml experiment prepare -c (targetname)
Submit the training experiment (train.py) to the remote target
- az ml experiment submit -c (targetname) train.py [--epochs 5]
The script reads files (training data and base model) from the Azure ML shared directory. Make sure these files are present on the target machine in this directory. The shared directory location is set with the environment variable AZUREML_NATIVE_SHARE_DIRECTORY. Save the model files and images in this directory.
View the history of all previous runs with the following command. The run history is stored in the associated storage account and stores output files (most notably the model file) and metrics (if configured) for each run. All files stored in the 'outputs/' folder by the script are considered outputs to be saved in history.
- az ml history list
Return generated model
This command will return the outputs of the experiment (situated on the target computer, in the outputs/ folder of that specific run) back to your local outputs/ folder in the project directory.
- az ml experiment return -r (run-id) -t (target name)
The run id is found in the output of the 'submit' command.
Along with the model, a number_to_text_label.json file is also present in the outputs of the experiment. Copy this file from the outputs/ folder to the root of the working directory. Otherwise, the service will return numbers instead of people's names. If the file remains in outputs/, it will not be present in the service container created in the next section.
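The role of this file can be illustrated with a short lookup. The exact layout of number_to_text_label.json is not shown in this document, so the format below (a JSON object that maps class numbers to names) and the helper name to_name are assumptions.

```python
import json


def to_name(class_index, labels):
    """Return the person's name for a class index, or the index as text if unknown."""
    return labels.get(str(class_index), str(class_index))


# Assumed file content: class numbers (as JSON object keys) mapped to names.
labels = json.loads('{"0": "alice", "1": "bob"}')
print(to_name(1, labels))  # bob
print(to_name(7, labels))  # 7
```

The fallback mirrors the behaviour described above: without a usable mapping, only the number can be returned.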
3 Deploying the service
Our model is now trained and ready to be used. Using this model, we can create a service that takes an image as input and returns the name of the recognized person.
First, an environment must be created. Typically, you will set up a local environment for testing and a cluster environment for deploying. We are talking about deployment environments now, not about compute environments for training! You will also need to create your model management account in advance. Both are one-time steps. If not yet registered, register for the Microsoft.ContainerRegistry Resource provider.
The commands below perform these steps. More info can be found here, under 'prepare to operationalize locally'. The second command creates a local deployment environment for testing. Later, we will create a cluster environment in Azure Container Services.
- az provider register --namespace Microsoft.ContainerRegistry
- az ml env setup -n (new env name) --location "West Europe"
- az ml account modelmanagement create --location "West Europe" -n (new account name) -g (resource group name) --sku-name S1
Find the resource group name associated with the environment with
- az ml env show
After creating an environment and model management account, do the following: log in to Azure, set the environment, and set the model management account.
- az login
- az ml env set --name (env name) --resource-group (rg name)
- az ml account modelmanagement set --name (account name) --resource-group (rg name)
3.2 Deploying locally
Deploying is done in 4 steps: registering the model (with the given model file), creating a manifest (given the score.py file and the schema), building an image from this manifest and creating a service from this image. Thus, the 4 main components in model management are models, manifests, images and services.
- az ml model register --model outputs/my_model.h5 --name my_model.h5
- az ml manifest create --manifest-name my_manifest -f score.py -r python -i (model ID returned by previous command) -s outputs/schema.json -c aml_config\conda_dependencies.yml
- az ml image create -n imagefacerecognition --manifest-id (manifest ID returned by previous command)
- az ml service create realtime --image-id (Image ID returned by previous command) -n (service name)
These 4 steps can also be done in one command. More info here.
The registered models, manifests, images and services can be reviewed in the Model Management Account from the Azure portal.
The image is stored in an Azure container registry with an automatically generated name. There is (currently?) no option to store the image in your own registry. Locate the container with "az ml image usage -i (image ID)".
After the service was created, you can obtain the URL with the following command. In case of a cluster environment (AKS), this will also return an authorization key:
- az ml service usage realtime -i (service-id)
3.3 Deploying in an Azure Container Services (AKS) cluster
AKS is Azure's Kubernetes offering. With just a few commands, we can provision a cluster and deploy our service to it.
Set up a new cluster environment for deploying.
- az ml env setup -n (new cluster name) --location "West Europe" --cluster
Switch to this new environment and set execution context to cluster. The resource group for the -g parameter is created automatically when creating the environment and is typically named (new cluster name)rg. (also see az ml env list).
- az ml env set -n (cluster name) -g (resource group)
- az ml env cluster
Now, use the image created in the previous section and create a service in exactly the same way.
- az ml service create realtime --image-id (Image ID) -n (service name)
The previous command will output the service ID. Now, obtain the service URL and authorization key.
- az ml service usage realtime -i (service-id)
- az ml service keys realtime -i (service-id)
After testing, don't forget to clean up the resources. The AKS cluster environment is not free, so take it down immediately.
- az ml service delete realtime -i (service ID)
- az ml env delete -n (cluster name) -g (resource group name)
3.4 Testing the model
See the callService.py script for an example of how to use the service.
- python callService.py --url (service URL) --path (path to test image) [--key (authorization key, if applicable. For cluster environments.)]
The webcamdetect.py script reads input from your webcam (if one is present) and outputs the people detected in the frame. Use a local service for this.
- python webcamdetect.py --url (service URL) [--key (if applicable)]