# External References
Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.

In addition to opening existing Dataflows in code and modifying them, it is also possible to create and persist Dataflows that reference another Dataflow that has been persisted to a DataPrep package. In this case, executing this Dataflow will load the referenced DataPrep package dynamically, execute the referenced Dataflow, and then execute the steps in the referencing Dataflow.
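
The execution order described above can be sketched with a small toy model. This is a hypothetical illustration, not the azureml.dataprep API: a "package" is assumed to be a JSON file that maps Dataflow names to inline source rows and a list of simple steps, and `ReferencingDataflow`, `run_steps`, and `save_package` are invented names. Executing the referencing object loads the package from disk, runs the referenced Dataflow's steps, and only then applies its own steps.

```python
import json
import os
import tempfile

# Hypothetical toy model of an external reference; none of these names
# are part of the azureml.dataprep API.

def run_steps(rows, steps):
    """Apply simple serialized steps to a list of row dicts, in order."""
    for step in steps:
        if step["op"] == "take":
            rows = rows[: step["count"]]
        elif step["op"] == "filter_not_none":
            col = step["column"]
            rows = [r for r in rows if r.get(col) is not None]
        else:
            raise ValueError("unknown op: " + step["op"])
    return rows

def save_package(path, dataflows):
    """Persist {name: {"rows": [...], "steps": [...]}} as a JSON 'package'."""
    with open(path, "w") as f:
        json.dump(dataflows, f)

class ReferencingDataflow:
    """Stores only a (package path, Dataflow name) reference plus its own steps."""

    def __init__(self, package_path, name, steps=None):
        self.package_path = package_path
        self.name = name
        self.steps = steps or []

    def execute(self):
        # 1. Load the referenced package from disk at execution time.
        with open(self.package_path) as f:
            package = json.load(f)
        referenced = package[self.name]
        # 2. Execute the referenced Dataflow's steps on its source rows.
        rows = run_steps(referenced["rows"], referenced["steps"])
        # 3. Execute the referencing Dataflow's own steps on that result.
        return run_steps(rows, self.steps)

pkg_path = os.path.join(tempfile.gettempdir(), "toy_package.json")
save_package(pkg_path, {
    "FWF": {
        "rows": [{"id": 1, "x": 10}, {"id": 2, "x": None}, {"id": 3, "x": 30}],
        "steps": [{"op": "filter_not_none", "column": "x"}],
    }
})
ref = ReferencingDataflow(pkg_path, "FWF", steps=[{"op": "take", "count": 1}])
print(ref.execute())  # [{'id': 1, 'x': 10}]
```

The referenced Dataflow's filter runs first (leaving ids 1 and 3), and the referencing Dataflow's `take` is applied to that result.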

To demonstrate, we will create a Dataflow that loads and transforms some data, and then persist it to a DataPrep package.

In [1]:
import azureml.dataprep as dprep
import tempfile
import os

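# Let DataPrep infer how to parse the fixed-width file
df = dprep.smart_read_file('./data/fixed_width_file.txt')
# Drop records where any of these columns contains an error value
df = df.drop_errors(['Column7', 'Column8', 'Column9'], dprep.ColumnRelationship.ANY)
# Name the Dataflow so it can be referenced by name from the package
df = df.set_name('FWF')
# Wrap the Dataflow in a package and save it to a temporary location
pkg = dprep.Package(df)
pkg_path = os.path.join(tempfile.gettempdir(), 'package.dprep')
pkg = pkg.save(pkg_path)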



Now that we have a package file, we can create a new Dataflow that references it.

In [2]:
# Reference the Dataflow named 'FWF' inside the saved package
new_df = dprep.Dataflow.reference(dprep.ExternalReference(pkg_path, 'FWF'))
new_df.head(10)

|   | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10010.0 | 99999.0 | JAN MAYEN | azureml.dataprep.native.DataPrepError("'Micros... | JN | ENJA | 70933.0 | -8667.0 | 90.0 |
| 1 | 10014.0 | 99999.0 | SOERSTOKKEN | azureml.dataprep.native.DataPrepError("'Micros... | NO | ENSO | 59783.0 | 5350.0 | 500.0 |
| 2 | 10015.0 | 99999.0 | BRINGELAND | azureml.dataprep.native.DataPrepError("'Micros... | NO | ENBL | 61383.0 | 5867.0 | 3270.0 |
| 3 | 10016.0 | 99999.0 | RORVIK/RYUM | azureml.dataprep.native.DataPrepError("'Micros... | NO |  | 64850.0 | 11233.0 | 140.0 |
| 4 | 10017.0 | 99999.0 | FRIGG | azureml.dataprep.native.DataPrepError("'Micros... | NO | ENFR | 59933.0 | 2417.0 | 480.0 |
| 5 | 10020.0 | 99999.0 | VERLEGENHUKEN | azureml.dataprep.native.DataPrepError("'Micros... | SV |  | 80050.0 | 16250.0 | 80.0 |
| 6 | 10030.0 | 99999.0 | HORNSUND | azureml.dataprep.native.DataPrepError("'Micros... | SV |  | 77000.0 | 15500.0 | 120.0 |
| 7 | 10040.0 | 99999.0 | NY-ALESUND II | azureml.dataprep.native.DataPrepError("'Micros... | SV | ENAS | 78917.0 | 11933.0 | 80.0 |
| 8 | 10050.0 | 99999.0 | ISFJORD RADIO | azureml.dataprep.native.DataPrepError("'Micros... | NO | ENIS | 78067.0 | 13633.0 | 50.0 |
| 9 | 10060.0 | 99999.0 | EDGEOYA | azureml.dataprep.native.DataPrepError("'Micros... | NO |  | 78250.0 | 22783.0 | 140.0 |


When executed, the new Dataflow returns the same results as the Dataflow we saved in the package. Because the reference is resolved at execution time rather than when the referencing Dataflow is created, any update to the package file becomes visible the next time the referencing Dataflow is executed.
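
The difference between resolving a reference at execution time and copying the referenced data up front can be sketched as follows. This is again a hypothetical toy model rather than the azureml.dataprep implementation: `LateBoundReference` and `Snapshot` are invented names, and the package is assumed to be a JSON file that maps a Dataflow name to a list of rows. Overwriting the file changes what the late-bound reference returns, but not what the snapshot returns.

```python
import json
import os
import tempfile

# Hypothetical sketch contrasting late binding (resolve on every execution)
# with an eager snapshot (resolve once). Not the azureml.dataprep API.

def save_package(path, name, rows):
    """Write a single named 'Dataflow' (here just a list of rows) to disk."""
    with open(path, "w") as f:
        json.dump({name: rows}, f)

def load_rows(path, name):
    """Read the rows stored under `name` from the package file."""
    with open(path) as f:
        return json.load(f)[name]

class LateBoundReference:
    def __init__(self, path, name):
        self.path, self.name = path, name

    def head(self, n):
        # Re-read the package on every call, so file updates are picked up.
        return load_rows(self.path, self.name)[:n]

class Snapshot:
    def __init__(self, path, name):
        # Read once at construction; later file updates are not seen.
        self.rows = load_rows(path, name)

    def head(self, n):
        return self.rows[:n]

pkg_path = os.path.join(tempfile.gettempdir(), "toy_package_late.json")
save_package(pkg_path, "FWF", list(range(10)))
late = LateBoundReference(pkg_path, "FWF")
snap = Snapshot(pkg_path, "FWF")

# Overwrite the package with only the first 5 rows.
save_package(pkg_path, "FWF", list(range(10))[:5])

print(len(late.head(10)))  # 5
print(len(snap.head(10)))  # 10
```

The late-bound object plays the role of `new_df` in the next cell: it is never modified, yet its output follows whatever the package file currently contains.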

In [3]:
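# Keep only the first 5 records and overwrite the package at the same path
df = df.take(5)
pkg = dprep.Package(df)
pkg.save(pkg_path)

# Re-execute the unchanged referencing Dataflow
new_df.head(10)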

|   | Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 | Column9 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10010.0 | 99999.0 | JAN MAYEN | azureml.dataprep.native.DataPrepError("'Micros... | JN | ENJA | 70933.0 | -8667.0 | 90.0 |
| 1 | 10014.0 | 99999.0 | SOERSTOKKEN | azureml.dataprep.native.DataPrepError("'Micros... | NO | ENSO | 59783.0 | 5350.0 | 500.0 |
| 2 | 10015.0 | 99999.0 | BRINGELAND | azureml.dataprep.native.DataPrepError("'Micros... | NO | ENBL | 61383.0 | 5867.0 | 3270.0 |
| 3 | 10016.0 | 99999.0 | RORVIK/RYUM | azureml.dataprep.native.DataPrepError("'Micros... | NO |  | 64850.0 | 11233.0 | 140.0 |
| 4 | 10017.0 | 99999.0 | FRIGG | azureml.dataprep.native.DataPrepError("'Micros... | NO | ENFR | 59933.0 | 2417.0 | 480.0 |


As we can see, even though we did not modify `new_df`, it now returns only 5 records, even though we asked for 10, because the package was overwritten with the Dataflow produced by `df.take(5)`.