
## Prerequisites

Before running this notebook, make sure you have gone through the steps listed below: 

 - You have [created a workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started).
 - You have [configured a development environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment).
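
The workspace cells further down expect a subscription ID, a resource group and a workspace name. As a minimal sketch, assuming you prefer to keep these out of the notebook, they could be read from environment variables and checked before use. The variable names (`AML_SUBSCRIPTION_ID` and so on) and the helper `load_workspace_settings` are hypothetical and not part of the Azure ML SDK.

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
REQUIRED_SETTINGS = {
    "subscription_id": "AML_SUBSCRIPTION_ID",
    "resource_group": "AML_RESOURCE_GROUP",
    "workspace_name": "AML_WORKSPACE_NAME",
}


def load_workspace_settings(env=None):
    """Return the three workspace identifiers, failing early if any is missing or empty."""
    env = os.environ if env is None else env
    settings = {key: env.get(var, "").strip() for key, var in REQUIRED_SETTINGS.items()}
    missing = [REQUIRED_SETTINGS[key] for key, value in settings.items() if not value]
    if missing:
        raise ValueError("Missing workspace settings: " + ", ".join(missing))
    return settings
```

The returned dictionary uses the same keys as the `Workspace` constructor arguments used in this notebook, so it could be unpacked with `Workspace(**settings)`.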

In [None]:
%load_ext autoreload
%autoreload 2

import os
from azureml.core import (Workspace, Run, VERSION,
                          Experiment, Datastore)
from azureml.core.runconfig import (RunConfiguration,
                                    DEFAULT_GPU_IMAGE)
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.compute import (AmlCompute, ComputeTarget)
from azureml.exceptions import ComputeTargetException
from azureml.data.data_reference import DataReference
from azureml.pipeline.core import (Pipeline, 
                                   PipelineData)
from azureml.pipeline.steps import PythonScriptStep
from azureml.widgets import RunDetails
import pandas as pd


print("AML SDK version :", VERSION)

In [None]:
subscription_id = ''
resource_group = ''
workspace_name = ''

In [None]:
project_folder = os.getcwd()
exp_name = "facereco"
ws = Workspace(workspace_name = workspace_name,
               subscription_id = subscription_id,
               resource_group = resource_group)

ws.write_config()
print('Workspace loaded:', ws.name)

## Data Store

The preprocessed dataset is already available in the storage account referenced in the cell below. Alternatively, you can download it from [here](https://amlgitsamples.blob.core.windows.net/facereco/fgnet.zip), upload it to your own Azure Blob Storage account, and point the cell below to that account instead.
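
To see how the account and container names relate to the download link, here is a small sketch that rebuilds the blob URL from its parts. It assumes the public Azure cloud endpoint (`blob.core.windows.net`); the helper name `blob_url` is made up for illustration.

```python
def blob_url(account_name, container_name, blob_name):
    """Build the HTTPS URL of a blob, assuming the public Azure cloud endpoint."""
    return f"https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}"


# Same account and container as in this notebook's datastore registration.
url = blob_url("amlgitsamples", "facereco", "fgnet.zip")
print(url)
```

If you host the archive in your own storage account, the same two values (account and container) are the ones to change in the datastore registration.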


In [None]:
account_name = "amlgitsamples"
container_name = "facereco"
datastore_name = 'fgnet'
datastore = Datastore.register_azure_blob_container(workspace = ws, 
                                        datastore_name = datastore_name, 
                                        container_name = container_name,
                                        account_name = account_name, 
                                        overwrite = True)

## Compute target 

Here we execute the pipeline on an Azure Machine Learning Compute (`AmlCompute`) GPU cluster, but you can easily swap the compute target for one of the other [supported types](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines#key-advantages).

In [None]:
cluster_name = "gpu-cluster"

try:
    cluster = ComputeTarget(ws, cluster_name)
    print(cluster_name, "found")
    
except ComputeTargetException:
    print(cluster_name, "not found, provisioning....")
    provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                                max_nodes=1)
    cluster = ComputeTarget.create(ws, cluster_name, provisioning_config)
    cluster.wait_for_completion(show_output=True)

## Run configuration


Here, we define the conda environment and the package dependencies needed by our training scripts, together with the [run configuration](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-azure-machine-learning-architecture#run-configuration).

In [None]:
cd = CondaDependencies()
cd.add_conda_package('pandas')
cd.add_channel(channel = 'menpo')
cd.add_conda_package('matplotlib')
#cd.add_conda_package('opencv')
cd.add_conda_package('scikit-learn')

cd.add_pip_package('tensorflow-gpu==1.14')
cd.add_pip_package('keras==2.2.4')
cd.add_pip_package('keras-vggface')
cd.add_pip_package('opencv-python')


run_config = RunConfiguration(framework="python",
                              conda_dependencies= cd)
run_config.target = cluster
run_config.environment.docker.enabled = True
run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE
run_config.environment.python.user_managed_dependencies = False

## Pipeline data input/output

We define a reference to the datastore we registered earlier, which points to the storage that contains the images.

Note that the `PipelineData` objects use the workspace's default datastore.

In [None]:
images_dir = DataReference(data_reference_name = 'images', 
                             path_on_datastore = 'fgnet', 
                             mode ="download", 
                             datastore = datastore
                          )
default_datastore=ws.datastores["workspaceblobstore"]

metadata_dir = PipelineData(name = 'outputs')
vggface_dir = PipelineData(name = 'vggface')
pca_dir = PipelineData(name = 'pca')
clf_dir = PipelineData(name = 'outputs')

## Pipeline steps
 

Below are the four script files that make up the pipeline. For a detailed description of each step, refer to the readme file [here](https://github.com/Azure/AMLSamples/blob/master/facereco/readme.md).
    
   - Step 1: Metadata processing ([file](./preprocess.py))
   - Step 2: VGG-Face feature extraction ([file](./vggface_features.py))
   - Step 3: Dimensionality reduction ([file](./pca.py))
   - Step 4: Classifier training ([file](./classifier.py))
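
Each script receives its input and output locations as command-line arguments, using the flag names passed in the step declarations in this notebook. The scripts themselves are not shown here, so the following is only a sketch of how `preprocess.py` might parse its two flags with `argparse`; the help texts and the example paths are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags that the metadata processing step passes to the script."""
    parser = argparse.ArgumentParser(description="Hypothetical argument handling for preprocess.py")
    parser.add_argument("--images_dir", required=True,
                        help="Directory where the images are made available on the compute target")
    parser.add_argument("--metadata_path", required=True,
                        help="Directory where the processed metadata should be written")
    return parser.parse_args(argv)


# Example paths for illustration; at run time the pipeline supplies the real ones.
args = parse_args(["--images_dir", "/data/images", "--metadata_path", "/data/metadata"])
print(args.images_dir, args.metadata_path)
```

The other three scripts would follow the same pattern with their own flags (`--vggface_path`, `--pca_path`, `--clf_path`).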
   
Next, we declare the steps that make up the pipeline.

In [None]:
metadata_processing = PythonScriptStep(
                            name = 'process images metadata',
                            script_name = 'preprocess.py',
                            arguments = ['--images_dir', images_dir,
                                         '--metadata_path', metadata_dir],
                            inputs = [images_dir],
                            outputs = [metadata_dir],
                            compute_target = cluster_name,
                            runconfig = run_config
                        )


vggface_features = PythonScriptStep(
                            name = 'VGG-face features extractor',
                            script_name = 'vggface_features.py',
                            arguments = ['--metadata_path', metadata_dir,
                                         '--images_dir', images_dir,
                                         '--vggface_path', vggface_dir],
                            inputs = [metadata_dir, images_dir],
                            outputs = [vggface_dir],
                            compute_target = cluster_name,
                            runconfig = run_config
                        )

pca_features = PythonScriptStep(
                            name = 'PCA features extractor',
                            script_name = 'pca.py',
                            arguments = ['--vggface_path', vggface_dir,
                                         '--pca_path', pca_dir],
                            inputs = [vggface_dir],
                            outputs = [pca_dir],
                            compute_target = cluster_name,
                            runconfig = run_config
                        )

classifier_step = PythonScriptStep(
                            name = 'Fit classifier',
                            script_name = 'classifier.py',
                            arguments = ['--vggface_path', vggface_dir,
                                         '--pca_path', pca_dir,
                                         '--clf_path', clf_dir],
                            inputs = [vggface_dir, pca_dir],
                            outputs = [clf_dir],
                            compute_target = cluster_name,
                            runconfig = run_config
                        )

## Pipeline execution

Finally, we put it all together: we construct an experiment and run the pipeline.
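
The cell below lists only `classifier_step` in `steps`; the upstream steps are expected to be brought in through their data dependencies. The following standard-library sketch (using `graphlib`, available from Python 3.9) illustrates that idea and is not the SDK's actual implementation. The data labels (`metadata`, `vggface`, `pca`, `clf`) are illustrative stand-ins for the `PipelineData` objects, and the `images` input is left out because no step produces it.

```python
from graphlib import TopologicalSorter

# Intermediate data each script consumes and produces (illustrative labels).
STEPS = {
    "preprocess.py": {"inputs": [], "outputs": ["metadata"]},
    "vggface_features.py": {"inputs": ["metadata"], "outputs": ["vggface"]},
    "pca.py": {"inputs": ["vggface"], "outputs": ["pca"]},
    "classifier.py": {"inputs": ["vggface", "pca"], "outputs": ["clf"]},
}


def execution_order(steps, final_step):
    """Collect every step the final step depends on and return them in a runnable order."""
    # Map each piece of data to the step that produces it.
    producer = {out: name for name, spec in steps.items() for out in spec["outputs"]}
    graph = {}
    pending = [final_step]
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        # A step depends on the producers of all of its inputs.
        deps = {producer[data] for data in steps[name]["inputs"]}
        graph[name] = deps
        pending.extend(deps)
    return list(TopologicalSorter(graph).static_order())


order = execution_order(STEPS, "classifier.py")
print(order)
# → ['preprocess.py', 'vggface_features.py', 'pca.py', 'classifier.py']
```

Starting from the final step and walking backwards over inputs is enough to recover all four steps, which is why a single entry in `steps` can describe the whole chain.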

In [None]:
pipeline = Pipeline(default_datastore=ws.datastores["workspaceblobstore"],
                description = 'face recognition pipeline', 
                default_source_directory = project_folder,
                workspace = ws,
                steps = [classifier_step]
                   )

pipeline_run = Experiment(workspace=ws, name ="Face_recognition_exp").submit(pipeline, regenerate_outputs=True)
RunDetails(pipeline_run).show()