# Quickstart to integrate Recommenders in AzureML Designer

This notebook shows how to integrate any algorithm from the Recommenders library into AzureML Designer.

[AzureML Designer](https://docs.microsoft.com/en-us/azure/machine-learning/concept-designer) lets you visually connect datasets and modules on an interactive canvas to create machine learning models. 

![img](https://recodatasets.blob.core.windows.net/images/designer-drag-and-drop.gif)

One feature of AzureML Designer is that developers can integrate any Python library and make it available as a module. In this notebook we show how to integrate [SAR](sar_movielens.ipynb) and several other modules into Designer.


## Installation

The first step is to install the [Azure CLI](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli) and the Module CLI extension. Assuming that you have installed the Recommenders environment `reco_base` as explained in [SETUP.md](../../SETUP.md), run:
```bash
conda activate reco_base
pip install azure-cli
# Uninstall azure-cli-ml (the `az ml` commands)
az extension remove -n azure-cli-ml
# Install the azure-cli-ml build that includes the `az ml module` commands
az extension add --source https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13082891/azure_cli_ml-0.1.0.13082891-py3-none-any.whl --pip-extra-index-urls https://azuremlsdktestpypi.azureedge.net/CLI-SDK-Runners-Validation/13082891 --yes
```
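Before moving on, it can help to confirm that the `az` executable is visible from the active environment. The snippet below is a minimal sketch that only checks the `PATH`; the helper name `az_cli_available` is hypothetical and is not part of the Azure CLI or of Recommenders.

```python
import shutil


def az_cli_available(executable: str = "az") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if az_cli_available():
    print("Azure CLI found on PATH.")
else:
    print("Azure CLI not found; check that the reco_base environment is active.")
```

This does not verify that the `az ml module` commands are installed, only that the CLI itself can be located.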

## Module implementation

The scenario that we are going to reproduce in Designer, as a reference example, is the content of the [SAR quickstart notebook](sar_movielens.ipynb). In it, we load a dataset, split it into train and test sets, train the SAR algorithm, predict on the test set, and compute several ranking metrics (precision at k, recall at k, MAP and nDCG).
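To make one of these ranking metrics concrete, here is a minimal pure-Python sketch of precision at k: for each user, the fraction of the top-k predicted items that also appear in that user's test set, averaged over users. The function name `precision_at_k_sketch` and the toy data are assumptions for illustration; the library's own `precision_at_k` operates on DataFrames and may differ in details such as which users are counted.

```python
from collections import defaultdict


def precision_at_k_sketch(true_pairs, ranked_preds, k=10):
    """Average over users of (relevant items among the top-k predictions) / k.

    true_pairs:   iterable of (user, item) pairs observed in the test set.
    ranked_preds: dict mapping user -> list of items, best prediction first.
    """
    relevant = defaultdict(set)
    for user, item in true_pairs:
        relevant[user].add(item)

    # In this sketch, only users present in the test set contribute.
    users = list(relevant)
    if not users:
        return 0.0

    total = 0.0
    for user in users:
        top_k = ranked_preds.get(user, [])[:k]
        hits = sum(1 for item in top_k if item in relevant[user])
        total += hits / k
    return total / len(users)


# Toy data (hypothetical): two users, three ranked predictions each.
true_pairs = [("u1", "a"), ("u1", "b"), ("u2", "c")]
ranked_preds = {"u1": ["a", "x", "b"], "u2": ["y", "z", "c"]}
print(precision_at_k_sketch(true_pairs, ranked_preds, k=2))  # 0.25
```

With `k=2`, user `u1` has one hit out of two and user `u2` has none, so the average is 0.25.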

For the pipeline that we want to create in Designer, we need to build the following modules:

- Stratified splitter
- SAR training
- SAR prediction
- Precision at k
- Recall at k
- MAP
- nDCG

Each module is defined by a Python entry script and a YAML specification file. All the Python entries and YAML files for this pipeline can be found in [reco_utils/azureml/azureml_designer_modules](../../reco_utils/azureml/azureml_designer_modules).


### Define python entry

To illustrate how a Python entry is defined, we walk through the [precision at k entry](../../reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py). A simplified version of the code is shown next:

```python
# Dependencies
import argparse

import pandas as pd
from azureml.studio.core.data_frame_schema import DataFrameSchema
from azureml.studio.core.io.data_frame_directory import (
    load_data_frame_from_directory,
    save_data_frame_to_directory,
)
from reco_utils.evaluation.python_evaluation import precision_at_k

# First, the input variables of precision_at_k are defined as argparse arguments
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--rating-true", help="True DataFrame.")
    parser.add_argument("--rating-pred", help="Predicted DataFrame.")
    parser.add_argument(
        "--col-user", type=str, help="A string parameter with column name for user."
    )
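    # Output location, matching the --score-result flag in the module spec
    parser.add_argument("--score-result", help="Output directory for the score.")
    # ... more arguments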
    args, _ = parser.parse_known_args()

    # This module has two main inputs from the canvas: the true and predicted ratings.
    # Each one is loaded into the runtime as a pandas DataFrame.
    rating_true = load_data_frame_from_directory(args.rating_true).data
    rating_pred = load_data_frame_from_directory(args.rating_pred).data

    # The metric function is called and the computation is performed
    eval_precision = precision_at_k(rating_true, rating_pred)
    
    # To output the result to Designer, we write it as a DataFrame
    score_result = pd.DataFrame({"precision_at_k": [eval_precision]})
    save_data_frame_to_directory(
        args.score_result,
        score_result,
        schema=DataFrameSchema.data_frame_to_dict(score_result),
    )
```
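Two `argparse` details in the entry above are easy to miss: dashes in a flag name become underscores on the parsed namespace (`--rating-true` is read as `args.rating_true`), and `parse_known_args` returns unrecognised arguments instead of raising an error. The following standalone sketch shows both; the paths and the `--unknown-flag` argument are made up for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--rating-true", help="True DataFrame.")
parser.add_argument("--col-user", type=str, help="Column name for user.")

# Hypothetical command line; "--unknown-flag" is not declared on the parser.
argv = ["--rating-true", "/tmp/true", "--col-user", "UserId", "--unknown-flag", "1"]
args, unknown = parser.parse_known_args(argv)

print(args.rating_true)  # dashes in the flag name became underscores
print(args.col_user)
print(unknown)           # arguments the parser did not recognise
```

Using `parse_known_args` keeps the entry script from failing if it is invoked with extra arguments that it does not declare.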


### Define module specification yaml

Once we have the Python entry, we need to create the YAML file that describes the module to Designer, [precision_at_k.yaml](../../reco_utils/azureml/azureml_designer_modules/module_specs/precision_at_k.yaml).

```yaml
moduleIdentifier: 
  namespace: microsoft.com/cat
  moduleName: Precision at K
  moduleVersion: 1.1.0
description: "Precision at K metric from Recommenders repo: https://github.com/Microsoft/Recommenders."
metadata:
  annotations:
    tags: ["Recommenders", "Metrics"]
inputs:
- name: Rating true
  type: DataFrameDirectory
  description: True DataFrame.
- name: Rating pred
  type: DataFrameDirectory
  description: Predicted DataFrame.
- name: User column
  type: String
  default: UserId
  description: Column name of user IDs.
- name: Item column
  type: String
  default: MovieId
  description: Column name of item IDs.
- name: Rating column
  type: String
  default: Rating
  description: Column name of ratings.
- name: Prediction column
  type: String
  default: prediction
  description: Column name of predictions.
- name: Relevancy method
  type: String
  default: top_k
  description: method for determining relevancy ['top_k', 'by_threshold'].
- name: Top k
  type: Integer
  default: 10
  description: Number of top k items per user.
- name: Threshold
  type: Float
  default: 10.0
  description: Threshold of top items per user.
outputs:
- name: Score
  type: DataFrameDirectory
  description: Precision at k (min=0, max=1).
implementation:
  container:
    amlEnvironment:
      python:
        condaDependenciesFile: sar_conda.yaml
    additionalIncludes:
      - ../../../
    command: [python, reco_utils/azureml/azureml_designer_modules/entries/precision_at_k_entry.py]
    args:
    - --rating-true
    - inputPath: Rating true
    - --rating-pred
    - inputPath: Rating pred
    - --col-user
    - inputValue: User column
    - --col-item
    - inputValue: Item column
    - --col-rating
    - inputValue: Rating column
    - --col-prediction
    - inputValue: Prediction column
    - --relevancy-method
    - inputValue: Relevancy method
    - --k
    - inputValue: Top k
    - --threshold
    - inputValue: Threshold
    - --score-result
    - outputPath: Score
```

The YAML file has several sections. The header (`moduleIdentifier`, `description`, `metadata`) defines attributes such as name, version, and description. The `inputs` section lists every input. The two `DataFrameDirectory` inputs are exposed as ports, which can be connected to other modules; the remaining inputs have no port and appear in a canvas menu. The output is also defined as a `DataFrameDirectory`. The last section, `implementation`, defines the conda environment, the associated Python entry, and the arguments passed to the Python file.
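One way to read the `args` list is as a template for the command line: plain strings are passed through, while `inputPath`, `inputValue`, and `outputPath` entries are replaced by the corresponding path or parameter value at run time. The sketch below mimics that substitution in plain Python. It is an illustration of the idea only, not Designer's actual implementation, and `resolve_args` and the example paths are hypothetical.

```python
def resolve_args(arg_specs, input_paths, input_values, output_paths):
    """Turn a spec-style args list into a flat list of command-line strings."""
    resolved = []
    for item in arg_specs:
        if isinstance(item, str):
            # Literal flags such as "--k" are passed through unchanged.
            resolved.append(item)
        elif "inputPath" in item:
            resolved.append(input_paths[item["inputPath"]])
        elif "inputValue" in item:
            resolved.append(str(input_values[item["inputValue"]]))
        elif "outputPath" in item:
            resolved.append(output_paths[item["outputPath"]])
        else:
            raise ValueError(f"Unsupported argument entry: {item!r}")
    return resolved


# A shortened version of the args list from the spec above.
arg_specs = [
    "--rating-true", {"inputPath": "Rating true"},
    "--k", {"inputValue": "Top k"},
    "--score-result", {"outputPath": "Score"},
]
command_line = resolve_args(
    arg_specs,
    input_paths={"Rating true": "/mnt/inputs/rating_true"},  # made-up path
    input_values={"Top k": 10},
    output_paths={"Score": "/mnt/outputs/score"},  # made-up path
)
print(command_line)
```

The resulting list is what the Python entry would receive in `sys.argv[1:]`, which is why each flag in the spec must match an `add_argument` call in the entry.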


## Module Registration

Once the code is implemented, we need to register it as an AzureML Designer custom module. The registration can be performed following these simple steps:

```python
!az login
```

```python
!az account set -s "Your subscription name"
!az ml folder attach -w "Your workspace name" -g "Your resource group name"
```

```python
import os
import tempfile
import shutil
import subprocess
```

```python
# Register modules with spec via Azure CLI
root_path = os.path.abspath(os.path.join(os.getcwd(), "../../"))
specs_folder = os.path.join(root_path, "reco_utils/azureml/azureml_designer_modules/module_specs")
github_prefix = 'https://github.com/microsoft/recommenders/blob/master/reco_utils/azureml/azureml_designer_modules/module_specs/'
specs = os.listdir(specs_folder)
for spec in specs:
    spec_path = github_prefix + spec
    print(f"Start to register module spec: {spec} ...")
    subprocess.run(f"az ml module register --spec-file {spec_path}", shell=True)
    print("Done.")
```

```
Start to register module spec: map.yaml ...
Done.
Start to register module spec: ndcg.yaml ...
Done.
Start to register module spec: precision_at_k.yaml ...
Done.
Start to register module spec: recall_at_k.yaml ...
Done.
Start to register module spec: sar_conda.yaml ...
Done.
Start to register module spec: sar_score.yaml ...
Done.
Start to register module spec: sar_train.yaml ...
Done.
Start to register module spec: stratified_splitter.yaml ...
Done.
```
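Note that the loop above passes every file in the folder to `az ml module register`, including `sar_conda.yaml`, which the module spec references as its conda dependencies file rather than as a module spec. A possible variation, sketched below, builds the commands as argument lists and skips that file. The helper `build_register_commands` is hypothetical, and skipping the conda file is an assumption about intent, not something the original notebook does.

```python
def build_register_commands(spec_names, url_prefix, skip=("sar_conda.yaml",)):
    """Build one `az ml module register` command per module spec file."""
    commands = []
    for name in sorted(spec_names):
        # Ignore non-YAML files and files listed in `skip` (assumed non-specs).
        if not name.endswith(".yaml") or name in skip:
            continue
        commands.append(
            ["az", "ml", "module", "register", "--spec-file", url_prefix + name]
        )
    return commands


# Placeholder prefix for illustration; the notebook uses its GitHub URL instead.
example_commands = build_register_commands(
    ["precision_at_k.yaml", "sar_conda.yaml", "map.yaml"],
    "https://example.invalid/module_specs/",
)
for cmd in example_commands:
    print(" ".join(cmd))
```

Each command could then be executed with `subprocess.run(cmd, check=True)`, which avoids `shell=True` and raises an error if a registration fails.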


## Running Recommenders in AzureML Designer

Once the modules are registered, they will appear in the canvas as the module `Recommenders`. There you will be able to create a pipeline like this:

![img](https://recodatasets.blob.core.windows.net/images/azureml_designer_sar_precisionatk.png)

Now, thanks to AzureML Designer, users can run state-of-the-art recommendation algorithms without writing a line of Python code.

## References

1. [AzureML Designer documentation](https://docs.microsoft.com/en-us/azure/machine-learning/concept-designer)
1. [Tutorial: Prediction of automobile price](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-designer-automobile-price-train-score)
1. [Tutorial: Classification of time flight delays](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-designer-sample-classification-flight-delay)
1. [Tutorial: Text classification of company categories](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-designer-sample-text-classification)