Copyright (c) Microsoft Corporation. All rights reserved.  
Licensed under the MIT License.

## Goal

The goal of this notebook is to demonstrate how Azure Machine Learning Pipelines cache the results of steps, and how you can use the `hash_paths` parameter of `PythonScriptStep` or the `regenerate_outputs` argument of `Experiment.submit()` to change this behavior. It is intended as a very simple `hello_world`-type example that highlights this functionality.

For a more complex example that uses a script to run a notebook via `papermill`, please see this [notebook](simple-pm-run-as-pipeline.ipynb).

This notebook assumes that the required Azure Machine Learning resources (a workspace and a compute target) have already been created. If necessary, see [simple-pm-run.ipynb](simple-pm-run.ipynb) for an example.

This notebook started as a copy of the Pipelines Getting Started Notebook [here](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb).

## Dependencies

This notebook requires that `azureml-sdk` is installed in the environment in which it is run.

## Workload

This notebook submits the [main.py](./hash_paths_examples/main.py) script as a step in an Azure ML pipeline. The script imports two helper modules, [helper1.py](./hash_paths_examples/helper1.py) and [helper2.py](./hash_paths_examples/helper2.py), calls the function defined in each, and logs the results.

This notebook builds pipelines multiple times after changing various files and parameters to show their respective effects.

To start out, we can see what values [helper1.py](./hash_paths_examples/helper1.py) and [helper2.py](./hash_paths_examples/helper2.py) produce, and therefore, what should be logged when [main.py](./hash_paths_examples/main.py) is run.

In [1]:
import hash_paths_examples   ## needed for reload() later
from hash_paths_examples.helper1 import helper1
from hash_paths_examples.helper2 import helper2

original_helper_vals={'msg1': helper1(), 'msg2': helper2()}
print('msg1: {}'.format(original_helper_vals['msg1']))
print('msg2: {}'.format(original_helper_vals['msg2']))

msg1: msg1
msg2: msg2


Next, we can import the relevant libraries and set up the infrastructure we need to execute [main.py](./hash_paths_examples/main.py) remotely.

### Azure Machine Learning Imports

In this first code cell, we import key Azure Machine Learning modules that we will use below. 

In [2]:
import os
import subprocess
from importlib import reload  ## the imp module is deprecated; importlib.reload replaces imp.reload
import pandas as pd

import azureml.core
from azureml.core import Workspace, Experiment
from azureml.core.runconfig import CondaDependencies, RunConfiguration
from azureml.core.runconfig import DEFAULT_CPU_IMAGE
from azureml.core.compute import AmlCompute

## load pipeline dependencies
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.0.30


### Initialize Workspace

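Initialize a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace) object from a persisted configuration if one exists. Otherwise, retrieve it from Azure using the subscription, resource group, and workspace name stored in environment variables.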

In [3]:
## Adjust these to fit your environment
path_to_amlconfig = '.'
aml_compute_target = "aml-compute-d2" ## 2-16 characters

In [4]:
exp_name = "hash_paths_demo_tst"
pipeline_step_name = "run main.py"
script_name = "main.py"
# project folder
project_folder = './hash_paths_examples'

main_file = os.path.join(project_folder,'main.py')
helper_file = os.path.join(project_folder,'helper1.py')
helper2_file = os.path.join(project_folder,'helper2.py')

In [5]:
if os.path.isdir(os.path.join(path_to_amlconfig,'aml_config')):
    print('Loading Workspace information from configuration')
    ws = Workspace.from_config(path_to_amlconfig)
else:
    print('Getting Workspace information from Variables. This will fail if you have not set these!')
    SUBSCRIPTION_ID = os.getenv("AZ_SUB","")
    RESOURCE_GROUP = os.getenv("RESOURCE_GROUP","")
    WS_NAME = os.getenv("WS_NAME","")
    WS_LOCATION = 'eastus'
    ws=Workspace.get(name=WS_NAME,
                    resource_group=RESOURCE_GROUP,
                    subscription_id=SUBSCRIPTION_ID)

print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')


Loading Workspace information from configuration
jeremr_top10_mvl_aml
jeremr_top10_mvl
eastus
03909a66-bef8-4d52-8e9a-a346604e0902
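When `AZ_SUB`, `RESOURCE_GROUP` or `WS_NAME` is unset, the `else` branch above falls back to empty strings. The failure then only surfaces later, inside `Workspace.get()`. The sketch below validates the variables up front so the error names exactly what is missing. `read_workspace_settings()` is a hypothetical helper, not part of the SDK.

```python
import os


def read_workspace_settings(env=None):
    """Collect workspace coordinates from environment variables, failing fast if any are missing.

    Hypothetical helper; AZ_SUB, RESOURCE_GROUP and WS_NAME are the variable
    names the cell above reads with os.getenv().
    """
    env = os.environ if env is None else env
    # Map Workspace.get() keyword arguments to environment variable names
    names = {"subscription_id": "AZ_SUB", "resource_group": "RESOURCE_GROUP", "name": "WS_NAME"}
    settings = {key: env.get(var, "") for key, var in names.items()}
    missing = [names[key] for key, value in settings.items() if not value]
    if missing:
        raise EnvironmentError("Missing environment variables: " + ", ".join(missing))
    return settings


# Usage with an explicit mapping instead of the real environment
settings = read_workspace_settings(
    {"AZ_SUB": "<subscription-id>", "RESOURCE_GROUP": "my-rg", "WS_NAME": "my-ws"}
)
# ws = Workspace.get(**settings)  # keys match the keyword arguments used above

error_message = None
try:
    read_workspace_settings({"AZ_SUB": "<subscription-id>"})
except EnvironmentError as err:
    error_message = str(err)
print(error_message)  # Missing environment variables: RESOURCE_GROUP, WS_NAME
```

The keys of the returned dictionary match the `name`, `resource_group`, and `subscription_id` keyword arguments that the cell above passes to `Workspace.get()`.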


## Compute Targets

We will use `AmlCompute` to execute our pipeline step. If you have not already created an `AmlCompute` resource, you may need to do so first.

#### List of Compute Targets on the workspace

In [6]:
cts = ws.compute_targets
for ct in cts:
    print(ct)

jeremr-top10-adb
jeremr-top10-mvl
top10-mvl-d4v2
aml-compute-d2


In [7]:
## RunConfiguration.load() did not seem to load the saved values here:
# my_aml_run_config = RunConfiguration()
# my_aml_run_config.load(path='.', name=aml_compute_target)
# print(my_aml_run_config.target) ## still prints 'local'

## so look up the compute target and build the run configuration explicitly
aml_compute = AmlCompute(ws, aml_compute_target)

# cd = CondaDependencies.create(pip_packages=["ipykernel", "papermill", "azureml-sdk"])
#my_aml_run_config = RunConfiguration(conda_dependencies=cd)
my_aml_run_config = RunConfiguration()
my_aml_run_config.target = aml_compute_target
my_aml_run_config.environment.docker.enabled = True
my_aml_run_config.environment.docker.base_image = DEFAULT_CPU_IMAGE

## Run as a Pipeline 

Because we will re-create the pipeline several times, we define a helper function that builds the pipeline, validates it, submits it, and retrieves the metrics logged by the relevant step.

This function performs the following steps:

### Build and Validate the Pipeline

You have the option to [validate](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline.pipeline?view=azure-ml-py#validate) the pipeline before submitting it. This is not strictly necessary: the platform runs validation checks, such as checking for circular dependencies, even if you do not call `validate()` explicitly.

### Submit the Pipeline

[Submitting](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline.pipeline?view=azure-ml-py#submit) the pipeline involves creating an [Experiment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment?view=azure-ml-py) object and providing the built pipeline for submission. 

### Capture logged metrics

Pipelines are generally designed to accommodate multiple steps. To support this, calling `Experiment.submit()` with a pipeline returns a nested structure of runs. The top-level run represents the pipeline as a whole and has one child run per step. To access the metrics logged by the particular `PythonScriptStep` we want to examine, we therefore need to get the children of the pipeline run and read the metrics of the relevant child.
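The nesting can be illustrated without a workspace. In this sketch, `FakePipelineRun` and `FakeStepRun` are hypothetical stand-ins, not azureml classes. They mimic only the `get_children()` and `get_metrics()` methods that the `get_step_metrics()` helper in the next cell relies on. Indexing the result by step name is how the helper picks out the metrics of a single `PythonScriptStep`.

```python
class FakeStepRun:
    """Hypothetical stand-in for a child (step) run."""

    def __init__(self, name, metrics):
        self.name = name
        self._metrics = metrics

    def get_metrics(self):
        return dict(self._metrics)


class FakePipelineRun:
    """Hypothetical stand-in for the top-level pipeline run."""

    def __init__(self, children):
        self._children = children

    def get_children(self):
        # An iterator is all the dict comprehension below needs
        return iter(self._children)


def get_step_metrics(pipeline_run):
    """Map each child run's name to the metrics it logged."""
    return {step.name: step.get_metrics() for step in pipeline_run.get_children()}


pipeline_run = FakePipelineRun([
    FakeStepRun("run main.py", {"main_value": 1, "msg1": "msg1", "msg2": "msg2"}),
    FakeStepRun("other step", {"accuracy": 0.9}),
])
step_metrics = get_step_metrics(pipeline_run)
print(step_metrics["run main.py"]["msg1"])  # msg1
```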


In [8]:
def get_step_metrics(pipeline_run):
    """Gets metrics logged for all children of a pipeline run.
    """
    steps=pipeline_run.get_children()
    logged_values = {i.name: i.get_metrics() for i in steps}
    return(logged_values)

# Uses default values for PythonScriptStep construct.
def build_and_run_pipeline(tag,pipeline_step_name=pipeline_step_name,hash_paths=None,regenerate_outputs=False):
    """Helper function build and run pipelines for testing purposes
    """
    step1 = PythonScriptStep(name=pipeline_step_name,
                             script_name=script_name, 
                             compute_target=aml_compute, 
                             source_directory=project_folder,
                             runconfig=my_aml_run_config,
                             allow_reuse=False,
                             hash_paths=hash_paths
                            )
    print("*** Step1 created")
    pipeline1 = Pipeline(workspace=ws, steps=[step1])
    print("*** Pipeline is built")
    pipeline1.validate()
    print("*** Pipeline validation complete")
    pipeline_run1 = Experiment(ws, exp_name).submit(pipeline1, regenerate_outputs=regenerate_outputs)
    print("*** Pipeline is submitted for execution")
    pipeline_run1.wait_for_completion()
    ## only return the metrics for the pipeline_step we care about
    metrics=get_step_metrics(pipeline_run1)[pipeline_step_name]
    ## add additional fields to make later presentation easier
    metrics['tag']=tag
    metrics['hash_paths']=hash_paths
    metrics['regenerate_outputs']=regenerate_outputs    
    ## reload these and log the current local status of helper1() and helper2()
    reload(hash_paths_examples.helper1)
    from hash_paths_examples.helper1 import helper1
    metrics['msg1_local']=helper1()
    reload(hash_paths_examples.helper2)
    from hash_paths_examples.helper2 import helper2
    metrics['msg2_local']=helper2()
    return(metrics)

In [9]:
metrics_original = build_and_run_pipeline(tag='ORIGINAL',regenerate_outputs=True)

*** Step1 created
*** Pipeline is built
*** Pipeline validation complete
Created step run main.py [3f3c6053][5e858c90-87c1-42d8-bbd8-5fb716a813ae], (This step will run and generate new outputs)
Submitted pipeline run: c74a811a-1b61-4304-9609-13c53a9f3bf3
*** Pipeline is submitted for execution
RunId: c74a811a-1b61-4304-9609-13c53a9f3bf3
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/03909a66-bef8-4d52-8e9a-a346604e0902/resourceGroups/jeremr_top10_mvl/providers/Microsoft.MachineLearningServices/workspaces/jeremr_top10_mvl_aml/experiments/hash_paths_demo_tst/runs/c74a811a-1b61-4304-9609-13c53a9f3bf3
Status: NotStarted
.........
Status: Running
...........
Status: Finished
{'runId': 'c74a811a-1b61-4304-9609-13c53a9f3bf3', 'status': 'Completed', 'startTimeUtc': '2019-04-24T18:05:50.188506Z', 'endTimeUtc': '2019-04-24T18:06:56.416039Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': None, 'runType': 'HTTP', 'azureml.parameters': '{}'}, 'logFiles':

Now we can display the metrics, along with the extra fields added by our wrapper, as a table:

In [10]:
display(pd.DataFrame([metrics_original]))

|   | hash_paths | main_value | msg1 | msg1_local | msg2 | msg2_local | regenerate_outputs | tag |
|---|---|---|---|---|---|---|---|---|
| 0 |  | 1 | msg1 | msg1 | msg2 | msg2 | True | ORIGINAL |



We can then confirm that the metrics logged by the pipeline step match the values produced by the code in our local project directory:

In [12]:
assert metrics_original['msg1'] == metrics_original['msg1_local'], "helper1() does not return the appropriate value"
assert metrics_original['msg2'] == metrics_original['msg2_local'], "helper2() does not return the appropriate value"

## Change a helper and see how that impacts the logged metrics

Try editing `msg` in [helper1.py](./hash_paths_examples/helper1.py), and then building and running the same pipeline. 

We can do this programmatically. First, check what `helper1()` currently returns, which we stored above.

In [13]:
print(original_helper_vals['msg1'])

msg1


Next, replace that string with a different one. You can edit the file manually, or run the following code, which does it programmatically with plain Python file I/O. (The later `! grep` and `! cat` cells do assume a Unix-like shell.)

In [14]:
## Function to change the file in this very simple case:
def change_file(file_name, str1, str2):
    with open(file_name, 'r') as fh:
        content = fh.read()
    content = content.replace(str1, str2)
    with open(file_name, 'w') as fh:
        fh.write(content)
    

In [15]:
change_file(helper_file, original_helper_vals['msg1'], original_helper_vals['msg1'] + "-1")
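`change_file()` silently does nothing if `str1` is not in the file. Because it replaces every occurrence, running the cell twice also turns `msg1` into `msg1-1-1`. The sketch below is a slightly more defensive variant, the hypothetical `change_file_checked()`. It raises if there is no match and returns the number of replacements, so unexpected matches are easy to spot.

```python
import os
import tempfile


def change_file_checked(file_name, str1, str2):
    """Replace every occurrence of str1 with str2; raise if str1 is absent.

    Hypothetical variant of change_file(). Returns the number of replacements.
    """
    with open(file_name, "r") as fh:
        content = fh.read()
    count = content.count(str1)
    if count == 0:
        raise ValueError("{!r} not found in {}".format(str1, file_name))
    with open(file_name, "w") as fh:
        fh.write(content.replace(str1, str2))
    return count


# Try it on a throwaway file with the same contents as helper1.py
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "helper1.py")
    with open(path, "w") as fh:
        fh.write("def helper1():\n    msg='msg1'\n    return(msg)\n")
    n_replaced = change_file_checked(path, "msg1", "msg1-1")
    with open(path) as fh:
        new_content = fh.read()
    try:
        change_file_checked(path, "no-such-string", "x")
        raised = False
    except ValueError:
        raised = True

print(n_replaced, raised)  # 1 True
```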

Next, reload the module and check the new value of `helper1()`:

In [16]:
reload(hash_paths_examples.helper1)
from hash_paths_examples.helper1 import helper1

new_local1_val = helper1()
print('ORIGINAL LOCAL VALUE: {}'.format(original_helper_vals['msg1']))
print('NEW LOCAL VALUE:      {}'.format(new_local1_val))
assert original_helper_vals['msg1'] != new_local1_val, 'Strings must be different! change_file() failed.'

ORIGINAL LOCAL VALUE: msg1
NEW LOCAL VALUE:      msg1-1


Now that we have verified that the file has changed, we can run the pipeline again and check the metrics.

In [17]:
metrics_change_helper1 = build_and_run_pipeline(tag='HELPER1_CHANGE')

*** Step1 created
*** Pipeline is built
*** Pipeline validation complete
Created step run main.py [96eb9c5a][5e858c90-87c1-42d8-bbd8-5fb716a813ae], (This step will run and generate new outputs)
Submitted pipeline run: 7133a69b-e598-47e5-bdd9-8634c9c934f7
*** Pipeline is submitted for execution
RunId: 7133a69b-e598-47e5-bdd9-8634c9c934f7
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/03909a66-bef8-4d52-8e9a-a346604e0902/resourceGroups/jeremr_top10_mvl/providers/Microsoft.MachineLearningServices/workspaces/jeremr_top10_mvl_aml/experiments/hash_paths_demo_tst/runs/7133a69b-e598-47e5-bdd9-8634c9c934f7
Status: Running
..........
Status: Finished
{'runId': '7133a69b-e598-47e5-bdd9-8634c9c934f7', 'status': 'Completed', 'startTimeUtc': '2019-04-24T18:07:09.863826Z', 'endTimeUtc': '2019-04-24T18:08:17.12753Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': None, 'runType': 'HTTP', 'azureml.parameters': '{}'}, 'logFiles': {'logs/azureml/executionlogs.t

In [18]:
results_list = [metrics_original, metrics_change_helper1]
display(pd.DataFrame(results_list))

|   | hash_paths | main_value | msg1 | msg1_local | msg2 | msg2_local | regenerate_outputs | tag |
|---|---|---|---|---|---|---|---|---|
| 0 |  | 1 | msg1 | msg1 | msg2 | msg2 | True | ORIGINAL |
| 1 |  | 1 | msg1 | msg1-1 | msg2 | msg2 | False | HELPER1_CHANGE |


As the table shows, the remote step logged the same `msg1` value as the first run (`msg1` column), even though [helper1.py](./hash_paths_examples/helper1.py) has demonstrably changed (`msg1_local` column). Next, instead of changing [helper1.py](./hash_paths_examples/helper1.py), let's change a line in [main.py](./hash_paths_examples/main.py), rerun the pipeline, and look at the results.
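With many runs, comparing the remote and local columns by eye gets tedious. The hypothetical helper below uses plain Python and makes no Azure ML calls. It returns the tags of runs whose logged messages differ from what the local code produced when the run finished. The sample rows are copied from the first two rows of the table above.

```python
def flag_stale(results):
    """Return tags of runs whose remote msg1/msg2 differ from the local values."""
    stale = []
    for row in results:
        if row["msg1"] != row["msg1_local"] or row["msg2"] != row["msg2_local"]:
            stale.append(row["tag"])
    return stale


results = [
    {"tag": "ORIGINAL", "msg1": "msg1", "msg1_local": "msg1", "msg2": "msg2", "msg2_local": "msg2"},
    {"tag": "HELPER1_CHANGE", "msg1": "msg1", "msg1_local": "msg1-1", "msg2": "msg2", "msg2_local": "msg2"},
]
print(flag_stale(results))  # ['HELPER1_CHANGE']
```

In the notebook, the same function could be called as `flag_stale(results_list)`, because each metrics dictionary carries these keys.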

In [19]:
change_file(main_file, "main_value = 1", "main_value = 2")

You can confirm that `main.py` has changed:

In [20]:
! grep 'main_value =' {main_file}

main_value = 2


In [21]:
metrics_change_main = build_and_run_pipeline(tag='MAIN_CHANGE')

*** Step1 created
*** Pipeline is built
*** Pipeline validation complete
Created step run main.py [4ba94026][acdb8fc9-0b46-4ffe-94a4-8af4c46ea418], (This step will run and generate new outputs)
Submitted pipeline run: 246bd7d5-8cfd-43d9-af63-f83d463ecdfb
*** Pipeline is submitted for execution
RunId: 246bd7d5-8cfd-43d9-af63-f83d463ecdfb
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/03909a66-bef8-4d52-8e9a-a346604e0902/resourceGroups/jeremr_top10_mvl/providers/Microsoft.MachineLearningServices/workspaces/jeremr_top10_mvl_aml/experiments/hash_paths_demo_tst/runs/246bd7d5-8cfd-43d9-af63-f83d463ecdfb
Status: Running
....
Status: Finished
{'runId': '246bd7d5-8cfd-43d9-af63-f83d463ecdfb', 'status': 'Completed', 'startTimeUtc': '2019-04-24T18:08:26.598604Z', 'endTimeUtc': '2019-04-24T18:08:58.457949Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': None, 'runType': 'HTTP', 'azureml.parameters': '{}'}, 'logFiles': {'logs/azureml/stderrlogs.txt': 'ht

In [22]:
results_list.append(metrics_change_main)
display(pd.DataFrame(results_list))

|   | hash_paths | main_value | msg1 | msg1_local | msg2 | msg2_local | regenerate_outputs | tag |
|---|---|---|---|---|---|---|---|---|
| 0 |  | 1 | msg1 | msg1 | msg2 | msg2 | True | ORIGINAL |
| 1 |  | 1 | msg1 | msg1-1 | msg2 | msg2 | False | HELPER1_CHANGE |
| 2 |  | 2 | msg1-1 | msg1-1 | msg2 | msg2 | False | MAIN_CHANGE |


## Results Summary

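By default, step contents are cached and reused. Every run above still executed the step (the logs say *This step will run and generate new outputs*), but the step could run against a previously uploaded copy of `source_directory`.

By default:

- Changing an imported module does **NOT** update the cached step contents.
- Changing the main script referenced by `script_name` in a `PythonScriptStep` causes all changes in `source_directory` to be propagated.

Experiment with other file changes to convince yourself of this.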
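The results above are consistent with a simple model. In this model, the service fingerprints only `script_name` plus anything listed in `hash_paths`. When the fingerprint is unchanged, it reuses the previously uploaded copy of `source_directory`. The sketch below implements that model with `hashlib.sha256`. It illustrates the observed behavior under that assumption and is not Azure ML's actual hashing code.

```python
import hashlib
import os
import tempfile


def fingerprint(source_directory, script_name, hash_paths=None):
    """Simplified model: hash script_name plus any hash_paths (files or directories)."""
    files = set()
    for rel in [script_name] + list(hash_paths or []):
        full = os.path.join(source_directory, rel)
        if os.path.isdir(full):
            # A directory entry such as '.' covers every file beneath it
            for root, _, names in os.walk(full):
                for name in names:
                    files.add(os.path.relpath(os.path.join(root, name), source_directory))
        else:
            files.add(os.path.normpath(rel))
    digest = hashlib.sha256()
    for rel in sorted(files):
        digest.update(rel.encode("utf-8"))
        with open(os.path.join(source_directory, rel), "rb") as fh:
            digest.update(fh.read())
    return digest.hexdigest()


def write(src, name, text):
    with open(os.path.join(src, name), "w") as fh:
        fh.write(text)


def all_fingerprints(src):
    return {
        "default": fingerprint(src, "main.py"),
        "helper1": fingerprint(src, "main.py", ["helper1.py"]),
        "dot": fingerprint(src, "main.py", ["."]),
    }


with tempfile.TemporaryDirectory() as src:
    write(src, "main.py", "main_value = 1\n")
    write(src, "helper1.py", "msg='msg1'\n")
    before = all_fingerprints(src)
    write(src, "helper1.py", "msg='msg1-1'\n")  # edit only the helper module
    after = all_fingerprints(src)
    write(src, "main.py", "main_value = 2\n")  # now edit the main script
    main_changed = fingerprint(src, "main.py") != after["default"]

changed = {key: before[key] != after[key] for key in before}
print(changed, main_changed)  # {'default': False, 'helper1': True, 'dot': True} True
```

Under this model, editing `helper1.py` leaves the default fingerprint unchanged, matching the HELPER1_CHANGE row. Editing `main.py` changes it, matching the MAIN_CHANGE row.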

### How to change this behavior

There are at least two ways to change this behavior:

- Use the `hash_paths` argument to `PythonScriptStep()` to make sure that key files (such as imported helper modules, or a notebook run via `papermill`) are checked for changes before submission.
- Set `regenerate_outputs=True` when you call `Experiment.submit()`.
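Listing key files by hand is error-prone. One option is to collect every Python file in the project folder and pass that list as `hash_paths`. The helper below is a hypothetical sketch. It assumes that entries are paths relative to `source_directory`, as they are in the `['helper1.py']` example in the next section.

```python
import os
import tempfile


def python_files(source_directory):
    """Return the paths of all .py files under source_directory, relative to it."""
    found = []
    for root, _, names in os.walk(source_directory):
        for name in names:
            if name.endswith(".py"):
                found.append(os.path.relpath(os.path.join(root, name), source_directory))
    return sorted(found)


with tempfile.TemporaryDirectory() as src:
    for name in ["main.py", "helper1.py", "helper2.py", "notes.txt"]:
        open(os.path.join(src, name), "w").close()
    hash_paths = python_files(src)

print(hash_paths)  # ['helper1.py', 'helper2.py', 'main.py']
```

In the notebook, this could be used as `build_and_run_pipeline(tag='ALL_PY', hash_paths=python_files(project_folder))`. Unlike `['.']`, this ignores changes to non-Python files such as data or configuration files. Files in subdirectories are returned with the platform's path separator.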

The following sections show each option in action.

## Using hash_paths

Reset [helper1.py](./hash_paths_examples/helper1.py) to its original contents. Then rebuild the `PythonScriptStep` and the pipeline, this time passing that file's path in `hash_paths`.

In [23]:
change_file(helper_file, metrics_change_main['msg1'], original_helper_vals['msg1'])

In [24]:
print(metrics_change_main['msg1'])
original_helper_vals['msg1']

msg1-1


'msg1'

In [25]:
! cat {helper_file}

def helper1():
    msg='msg1'
    return(msg)


In [26]:
metrics_change_helper_hash1 = build_and_run_pipeline(tag='HELPER1_CHANGE_HASH', hash_paths=['helper1.py'])

*** Step1 created
*** Pipeline is built
*** Pipeline validation complete
Created step run main.py [8caf0926][afe10abd-d8b9-4e04-8b33-f960fe13f700], (This step will run and generate new outputs)
Submitted pipeline run: f23213e3-c735-4821-a2fe-34db1265c48c
*** Pipeline is submitted for execution
RunId: f23213e3-c735-4821-a2fe-34db1265c48c
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/03909a66-bef8-4d52-8e9a-a346604e0902/resourceGroups/jeremr_top10_mvl/providers/Microsoft.MachineLearningServices/workspaces/jeremr_top10_mvl_aml/experiments/hash_paths_demo_tst/runs/f23213e3-c735-4821-a2fe-34db1265c48c
Status: NotStarted
.........
Status: Running
......
Status: Finished
{'runId': 'f23213e3-c735-4821-a2fe-34db1265c48c', 'status': 'Completed', 'startTimeUtc': '2019-04-24T18:10:13.265789Z', 'endTimeUtc': '2019-04-24T18:10:51.080151Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': None, 'runType': 'HTTP', 'azureml.parameters': '{}'}, 'logFiles': {'lo

In [27]:
results_list.append(metrics_change_helper_hash1)
display(pd.DataFrame(results_list))

|   | hash_paths | main_value | msg1 | msg1_local | msg2 | msg2_local | regenerate_outputs | tag |
|---|---|---|---|---|---|---|---|---|
| 0 |  | 1 | msg1 | msg1 | msg2 | msg2 | True | ORIGINAL |
| 1 |  | 1 | msg1 | msg1-1 | msg2 | msg2 | False | HELPER1_CHANGE |
| 2 |  | 2 | msg1-1 | msg1-1 | msg2 | msg2 | False | MAIN_CHANGE |
| 3 | [helper1.py, main.py] | 2 | msg1 | msg1 | msg2 | msg2 | False | HELPER1_CHANGE_HASH |


Even though we changed only `helper1.py` and left `main.py` untouched, the remote step now logs the updated message: `msg1` matches `msg1_local` again.
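The `hash_paths` column shows `[helper1.py, main.py]`, even though we passed `['helper1.py']`. `build_and_run_pipeline()` stores the same list object it hands to `PythonScriptStep`, so this suggests the step appended `script_name` to our list in place. The hypothetical `make_step()` below mimics that behavior and shows why passing a copy leaves the caller's list unchanged.

```python
def make_step(script_name, hash_paths):
    """Hypothetical stand-in that appends script_name to hash_paths in place."""
    if hash_paths is not None and script_name not in hash_paths:
        hash_paths.append(script_name)
    return hash_paths


# Passing the caller's list directly: the caller sees the mutation
mine = ["helper1.py"]
make_step("main.py", mine)

# Passing a copy keeps the caller's list unchanged
mine_safe = ["helper1.py"]
make_step("main.py", list(mine_safe))

print(mine, mine_safe)  # ['helper1.py', 'main.py'] ['helper1.py']
```

Inside `build_and_run_pipeline()`, you could pass `hash_paths=list(hash_paths) if hash_paths is not None else None`. The recorded `hash_paths` would then equal what was requested.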

Note that when you use the `hash_paths` argument, changes to files that are not listed are still ignored. Try updating [helper2.py](./hash_paths_examples/helper2.py) while running with `hash_paths=['helper1.py']`. You should see behavior similar to the first helper change above.

In [28]:
change_file(helper2_file, metrics_change_main['msg2'], metrics_change_main['msg2']+'-1')
metrics_change_helper2_hash1 = build_and_run_pipeline(tag='HELPER2_CHANGE', hash_paths=['helper1.py'])

*** Step1 created
*** Pipeline is built
*** Pipeline validation complete
Created step run main.py [6f069ed7][afe10abd-d8b9-4e04-8b33-f960fe13f700], (This step will run and generate new outputs)
Submitted pipeline run: b57d6d14-3584-43a5-bd93-17a973d5084f
*** Pipeline is submitted for execution
RunId: b57d6d14-3584-43a5-bd93-17a973d5084f
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/03909a66-bef8-4d52-8e9a-a346604e0902/resourceGroups/jeremr_top10_mvl/providers/Microsoft.MachineLearningServices/workspaces/jeremr_top10_mvl_aml/experiments/hash_paths_demo_tst/runs/b57d6d14-3584-43a5-bd93-17a973d5084f
Status: Running
....
Status: Finished
{'runId': 'b57d6d14-3584-43a5-bd93-17a973d5084f', 'status': 'Completed', 'startTimeUtc': '2019-04-24T18:11:04.096981Z', 'endTimeUtc': '2019-04-24T18:11:34.888377Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': None, 'runType': 'HTTP', 'azureml.parameters': '{}'}, 'logFiles': {'logs/azureml/stderrlogs.txt': 'ht

In [29]:
results_list.append(metrics_change_helper2_hash1)
display(pd.DataFrame(results_list))

|   | hash_paths | main_value | msg1 | msg1_local | msg2 | msg2_local | regenerate_outputs | tag |
|---|---|---|---|---|---|---|---|---|
| 0 |  | 1 | msg1 | msg1 | msg2 | msg2 | True | ORIGINAL |
| 1 |  | 1 | msg1 | msg1-1 | msg2 | msg2 | False | HELPER1_CHANGE |
| 2 |  | 2 | msg1-1 | msg1-1 | msg2 | msg2 | False | MAIN_CHANGE |
| 3 | [helper1.py, main.py] | 2 | msg1 | msg1 | msg2 | msg2 | False | HELPER1_CHANGE_HASH |
| 4 | [helper1.py, main.py] | 2 | msg1 | msg1 | msg2 | msg2-1 | False | HELPER2_CHANGE |


To check every file in `source_directory` for changes, pass `'.'` as the only element of `hash_paths`.

In [30]:
metrics_change_helper2_hashdot = build_and_run_pipeline(tag='HASH_PATHS_DOT', hash_paths=['.'])

*** Step1 created
*** Pipeline is built
Step run main.py is ready to be created [86a48fc8]
*** Pipeline validation complete
Created step run main.py [86a48fc8][997ac2bc-1668-40d7-b68e-e40c3c4d83b0], (This step will run and generate new outputs)
Submitted pipeline run: 8d5d4d7b-9856-41e9-9821-00c0e541f68b
*** Pipeline is submitted for execution
RunId: 8d5d4d7b-9856-41e9-9821-00c0e541f68b
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/03909a66-bef8-4d52-8e9a-a346604e0902/resourceGroups/jeremr_top10_mvl/providers/Microsoft.MachineLearningServices/workspaces/jeremr_top10_mvl_aml/experiments/hash_paths_demo_tst/runs/8d5d4d7b-9856-41e9-9821-00c0e541f68b
Status: Running
.....
Status: Finished
{'runId': '8d5d4d7b-9856-41e9-9821-00c0e541f68b', 'status': 'Completed', 'startTimeUtc': '2019-04-24T18:11:50.146973Z', 'endTimeUtc': '2019-04-24T18:12:27.477191Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': None, 'runType': 'HTTP', 'azureml.parameters': '{

In [31]:
results_list.append(metrics_change_helper2_hashdot)
display(pd.DataFrame(results_list))

|   | hash_paths | main_value | msg1 | msg1_local | msg2 | msg2_local | regenerate_outputs | tag |
|---|---|---|---|---|---|---|---|---|
| 0 |  | 1 | msg1 | msg1 | msg2 | msg2 | True | ORIGINAL |
| 1 |  | 1 | msg1 | msg1-1 | msg2 | msg2 | False | HELPER1_CHANGE |
| 2 |  | 2 | msg1-1 | msg1-1 | msg2 | msg2 | False | MAIN_CHANGE |
| 3 | [helper1.py, main.py] | 2 | msg1 | msg1 | msg2 | msg2 | False | HELPER1_CHANGE_HASH |
| 4 | [helper1.py, main.py] | 2 | msg1 | msg1 | msg2 | msg2-1 | False | HELPER2_CHANGE |
| 5 | [., main.py] | 2 | msg1 | msg1 | msg2-1 | msg2-1 | False | HASH_PATHS_DOT |


## Explore regenerate_outputs

We can try the same experiment (this time with `hash_paths=None`) to see if the `regenerate_outputs` parameter in `Experiment.submit()` has a similar effect. 

Because we already know that changes to the script itself trigger an update, this time we leave `main.py` alone. We only revert [helper2.py](./hash_paths_examples/helper2.py) to its original value and set `regenerate_outputs=True` on the submit call.


In [32]:
change_file(helper2_file, metrics_change_helper2_hashdot['msg2'], metrics_original['msg2'])
## hash_paths=None (the default) for this run
metrics_change_helper1_regen = build_and_run_pipeline(tag='HELPER2_CHANGE', regenerate_outputs=True)

*** Step1 created
*** Pipeline is built
*** Pipeline validation complete
Created step run main.py [9ba1612a][31114ab8-e8ec-444b-825f-059b5e62b106], (This step will run and generate new outputs)
Submitted pipeline run: 9a2170f6-b4f9-4866-81ce-b6edddaad880
*** Pipeline is submitted for execution
RunId: 9a2170f6-b4f9-4866-81ce-b6edddaad880
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/03909a66-bef8-4d52-8e9a-a346604e0902/resourceGroups/jeremr_top10_mvl/providers/Microsoft.MachineLearningServices/workspaces/jeremr_top10_mvl_aml/experiments/hash_paths_demo_tst/runs/9a2170f6-b4f9-4866-81ce-b6edddaad880
Status: Running
....
Status: Finished
{'runId': '9a2170f6-b4f9-4866-81ce-b6edddaad880', 'status': 'Completed', 'startTimeUtc': '2019-04-24T18:12:39.537701Z', 'endTimeUtc': '2019-04-24T18:13:11.081568Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': None, 'runType': 'HTTP', 'azureml.parameters': '{}'}, 'logFiles': {'logs/azureml/stdoutlogs.txt': 'ht

In [33]:
results_list.append(metrics_change_helper1_regen)
display(pd.DataFrame(results_list))

|   | hash_paths | main_value | msg1 | msg1_local | msg2 | msg2_local | regenerate_outputs | tag |
|---|---|---|---|---|---|---|---|---|
| 0 |  | 1 | msg1 | msg1 | msg2 | msg2 | True | ORIGINAL |
| 1 |  | 1 | msg1 | msg1-1 | msg2 | msg2 | False | HELPER1_CHANGE |
| 2 |  | 2 | msg1-1 | msg1-1 | msg2 | msg2 | False | MAIN_CHANGE |
| 3 | [helper1.py, main.py] | 2 | msg1 | msg1 | msg2 | msg2 | False | HELPER1_CHANGE_HASH |
| 4 | [helper1.py, main.py] | 2 | msg1 | msg1 | msg2 | msg2-1 | False | HELPER2_CHANGE |
| 5 | [., main.py] | 2 | msg1 | msg1 | msg2-1 | msg2-1 | False | HASH_PATHS_DOT |
| 6 |  | 2 | msg1 | msg1 | msg2 | msg2 | True | HELPER2_CHANGE |


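In row 6, the remote `msg1` matches `msg1_local`. The last run with `hash_paths=None` and the same `main.py` (MAIN_CHANGE) logged `msg1-1`, which was the value in `helper1.py` at that time. With `regenerate_outputs=True`, the step instead picked up the current contents of `source_directory`.

**Note:** If `regenerate_outputs` is set to `True`, a new submit always forces generation of all step outputs and disallows data reuse for any step of this run. Once this run is complete, however, subsequent runs may reuse its results.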
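The note above can be summarized as a small cache model. `SnapshotCache` is hypothetical and assumes that snapshots are keyed by a fingerprint of the hashed files. The demo replays three runs. The first is the MAIN_CHANGE run. The second is a hypothetical rerun without regeneration after `helper1.py` was reverted. The third is the `regenerate_outputs=True` run. The fingerprint stays the same throughout because `main.py` does not change.

```python
class SnapshotCache:
    """Hypothetical model: reuse a snapshot whenever its fingerprint was seen before."""

    def __init__(self):
        self._snapshots = {}  # fingerprint -> copy of the files at upload time

    def get_snapshot(self, fingerprint, current_files, regenerate_outputs=False):
        if regenerate_outputs or fingerprint not in self._snapshots:
            # Upload a fresh copy of the current files under this fingerprint
            self._snapshots[fingerprint] = dict(current_files)
        return self._snapshots[fingerprint]


cache = SnapshotCache()
# MAIN_CHANGE: first run with main_value = 2 uploads helper1 returning 'msg1-1'
cache.get_snapshot("main_value=2", {"msg1": "msg1-1"})
# helper1.py reverted locally, main.py unchanged: the old snapshot is reused
stale = cache.get_snapshot("main_value=2", {"msg1": "msg1"})
# regenerate_outputs=True forces a fresh snapshot...
fresh = cache.get_snapshot("main_value=2", {"msg1": "msg1"}, regenerate_outputs=True)
# ...which later runs without regeneration can then reuse
reused = cache.get_snapshot("main_value=2", {"msg1": "msg1"})

print(stale["msg1"], fresh["msg1"], reused["msg1"])  # msg1-1 msg1 msg1
```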


## Change back to original

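Finally, we can change the files back to their original state, which should reproduce the original results. Only `main.py` still differs from the original, because both helpers were already reverted above. Because this changes `main.py`, the pipeline picks up the update.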

In [34]:
change_file(main_file, "main_value = 2", "main_value = 1")
metrics_restore_original = build_and_run_pipeline(tag='RESTORE_ORIGINAL')

*** Step1 created
*** Pipeline is built
*** Pipeline validation complete
Created step run main.py [1d0d801a][5e858c90-87c1-42d8-bbd8-5fb716a813ae], (This step will run and generate new outputs)
Submitted pipeline run: a6ae7853-192c-4a49-af44-8e1c39b20592
*** Pipeline is submitted for execution
RunId: a6ae7853-192c-4a49-af44-8e1c39b20592
Link to Portal: https://mlworkspace.azure.ai/portal/subscriptions/03909a66-bef8-4d52-8e9a-a346604e0902/resourceGroups/jeremr_top10_mvl/providers/Microsoft.MachineLearningServices/workspaces/jeremr_top10_mvl_aml/experiments/hash_paths_demo_tst/runs/a6ae7853-192c-4a49-af44-8e1c39b20592
Status: NotStarted
.........
Status: Running
....
Status: Finished
{'runId': 'a6ae7853-192c-4a49-af44-8e1c39b20592', 'status': 'Completed', 'startTimeUtc': '2019-04-24T18:14:22.167511Z', 'endTimeUtc': '2019-04-24T18:14:52.619439Z', 'properties': {'azureml.runsource': 'azureml.PipelineRun', 'runSource': None, 'runType': 'HTTP', 'azureml.parameters': '{}'}, 'logFiles': {'logs

In [35]:
results_list.append(metrics_restore_original)
display(pd.DataFrame(results_list))

|   | hash_paths | main_value | msg1 | msg1_local | msg2 | msg2_local | regenerate_outputs | tag |
|---|---|---|---|---|---|---|---|---|
| 0 |  | 1 | msg1 | msg1 | msg2 | msg2 | True | ORIGINAL |
| 1 |  | 1 | msg1 | msg1-1 | msg2 | msg2 | False | HELPER1_CHANGE |
| 2 |  | 2 | msg1-1 | msg1-1 | msg2 | msg2 | False | MAIN_CHANGE |
| 3 | [helper1.py, main.py] | 2 | msg1 | msg1 | msg2 | msg2 | False | HELPER1_CHANGE_HASH |
| 4 | [helper1.py, main.py] | 2 | msg1 | msg1 | msg2 | msg2-1 | False | HELPER2_CHANGE |
| 5 | [., main.py] | 2 | msg1 | msg1 | msg2-1 | msg2-1 | False | HASH_PATHS_DOT |
| 6 |  | 2 | msg1 | msg1 | msg2 | msg2 | True | HELPER2_CHANGE |
| 7 |  | 1 | msg1 | msg1 | msg2 | msg2 | False | RESTORE_ORIGINAL |
