# Importing TFX Orchestrators in the `tfx.v1` Namespace

<table align="left">
  <td>
    <a 
    href="https://colab.research.google.com/github/gbih/machine_learning/blob/main/tfx-templates/template_orchestrator_beam.ipynb" 
    target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
  </td>
</table>


TFX supports multiple orchestrators for running pipelines. However, there is a subtle change in which orchestrators we can import when we use the `tfx.v1` module (the public API for TFX).

I raised an issue about this on the TFX GitHub repo: [tfx.v1 / AirflowDagRunner now inaccessible #5145](https://github.com/tensorflow/tfx/issues/5145)


---

Reference:
* https://github.com/tensorflow/tfx/tree/master/tfx/v1
* https://www.tensorflow.org/tfx/tutorials/tfx/penguin_simple
* https://www.tensorflow.org/tfx/api_docs/python/tfx/v1/orchestration/LocalDagRunner

```python
import sys

# Only needed when running on Colab or Kaggle
IS_COLAB = "google.colab" in sys.modules
IS_KAGGLE = "kaggle_secrets" in sys.modules
if IS_COLAB or IS_KAGGLE:
    # IPython shell escape: quietly install or upgrade TFX
    !pip install --upgrade tfx &> /dev/null
    print()
    print("Need to restart runtime on Colab")
```

```

Need to restart runtime on Colab
```


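```python
def HR():
    """Print a horizontal rule to separate outputs."""
    print("-" * 40)

def dir_ex(obj):
    """Print the type of obj and its public attribute names in 40-character columns."""
    result = [x for x in dir(obj) if not x.startswith('_')]
    print(type(obj))
    print()
    for x in result:
        print(f'{x:<40}', end="")
    print()  # end the line so that later output starts on a new line
```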

---


```python
# Alias the public tfx.v1 API as `tfx`
from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))
dir_ex(tfx)
```

```
TFX version: 1.9.1
<class 'module'>

components                              dsl                                     extensions                              orchestration                           proto                                   types                                   utils
```

```python
print(tfx.orchestration.LocalDagRunner)
```

```
<class 'tfx.orchestration.local.local_dag_runner.LocalDagRunner'>
```


```python
tfx.__path__
```

```
['/usr/local/lib/python3.7/dist-packages/tfx/v1']
```

---
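Using `from tfx import v1 as tfx` binds the name `tfx` to the `tfx.v1` package, so attribute access through `tfx` is limited to the public TFX modules that `tfx.v1` exposes (`components`, `dsl`, `extensions`, `orchestration`, `proto`, `types`, `utils`).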

```python
dir_ex(tfx.orchestration)
HR()

# Remember that this is actually `tfx.v1.orchestration`
print(tfx.orchestration)

# Added in a later TFX version (not available in 1.9.1)
# print(tfx.orchestration.experimental.KubeflowDagRunner)
# print(tfx.orchestration.experimental.KubeflowV2DagRunner)
```

```
<class 'module'>

LocalDagRunner                          experimental                            metadata
----------------------------------------
<module 'tfx.v1.orchestration' from '/usr/local/lib/python3.7/dist-packages/tfx/v1/orchestration/__init__.py'>
```


---
Through the v1 alias, the orchestration runners are exposed under `tfx.orchestration`, but the actual module path is `tfx.v1.orchestration`:

https://github.com/tensorflow/tfx/tree/master/tfx/v1/orchestration
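To check which runners are reachable through the alias without wrapping every access in `try`/`except`, one option is a small helper that walks a dotted attribute path with `getattr` and returns `None` at the first missing step. `resolve_attr` is a hypothetical helper written for this notebook, not part of TFX; the demo uses the standard-library `os` module so that it runs without TFX installed.

```python
import os
from types import ModuleType
from typing import Any, Optional

_MISSING = object()  # sentinel distinguishing "absent" from an attribute whose value is None

def resolve_attr(root: ModuleType, dotted: str) -> Optional[Any]:
    """Follow a dotted attribute path from `root`; return None if any step is missing.

    Only attribute access is used, so this sees exactly what `root` exposes
    (for example, only the public tfx.v1 API when `root` is the `tfx` alias).
    """
    obj: Any = root
    for part in dotted.split("."):
        obj = getattr(obj, part, _MISSING)
        if obj is _MISSING:
            return None
    return obj

# Demo with the standard library
print(resolve_attr(os, "path.join") is os.path.join)  # True
print(resolve_attr(os, "path.no_such_function"))      # None
```

With TFX 1.9.1 installed and `tfx` aliased to `tfx.v1`, one would expect `resolve_attr(tfx, "orchestration.LocalDagRunner")` to return the class and `resolve_attr(tfx, "orchestration.beam")` to return `None`, consistent with the outputs recorded in this notebook.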

```python
# LocalDagRunner
# Declared in https://github.com/tensorflow/tfx/blob/master/tfx/v1/orchestration/__init__.py
print(tfx.orchestration.LocalDagRunner)

# The following runners were added in a later TFX version.

# KubeflowDagRunner, apparently available from TFX 1.9.2 onward
# Declared in https://github.com/tensorflow/tfx/blob/master/tfx/v1/orchestration/experimental/__init__.py
# print(tfx.orchestration.experimental.KubeflowDagRunner)

# KubeflowV2DagRunner, apparently available from TFX 1.9.2 onward
# Declared in https://github.com/tensorflow/tfx/blob/master/tfx/v1/orchestration/experimental/__init__.py
# print(tfx.orchestration.experimental.KubeflowV2DagRunner)
```

```
<class 'tfx.orchestration.local.local_dag_runner.LocalDagRunner'>
```


```python
# However, some orchestration runners are not exposed here,
# notably BeamDagRunner
try:
    print(tfx.orchestration.beam.beam_dag_runner.BeamDagRunner)
except AttributeError as e:
    print(f"Error1: {e}")
```

```
Error1: module 'tfx.v1.orchestration' has no attribute 'beam'
```


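```python
# BeamDagRunner still lives in the full `tfx` package, so we can import it
# directly by its module path: the import statement resolves the real
# top-level `tfx` package, not the local `tfx` alias for tfx.v1
from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
print(BeamDagRunner)
```

```
<class 'tfx.orchestration.beam.beam_dag_runner.BeamDagRunner'>
```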
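The direct import works even though `tfx` now names `tfx.v1` because an alias only rebinds a local variable; an absolute `from ... import` statement always resolves its dotted path from the real top-level package through Python's import system. The standard-library `xml` package can stand in for `tfx` to show the same effect without TFX installed (an illustrative analogy, not TFX code):

```python
# Rebind the local name `xml` to the xml.etree subpackage,
# just as `from tfx import v1 as tfx` rebinds `tfx` to tfx.v1.
from xml import etree as xml

print(xml.__name__)         # xml.etree: the local name now refers to the subpackage
print(hasattr(xml, "dom"))  # False: attribute access only sees xml.etree's contents

# An absolute import still starts from the real top-level `xml` package,
# so sibling subpackages remain importable by their full path.
from xml.dom import minidom

print(minidom.__name__)     # xml.dom.minidom
```

Here `hasattr(xml, "dom")` plays the role of `tfx.orchestration.beam` (attribute access through the alias fails), while `from xml.dom import minidom` plays the role of `from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner` (the absolute import succeeds).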


---
The other orchestration runners defined in
https://github.com/tensorflow/tfx/tree/master/tfx/orchestration
can likewise be made available by importing them directly from their modules.

AirflowDagRunner, however, is still unavailable here: importing it fails because the `airflow` module it depends on is not installed in this environment.
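When probing several runners at once, it helps to tell apart a runner whose module does not exist from one whose module exists but needs an optional dependency (the `airflow` case). A minimal sketch, assuming a hypothetical `probe_runner` helper (not a TFX API) that inspects the `name` attribute of `ModuleNotFoundError`; the demo uses standard-library modules so it runs anywhere:

```python
import importlib
from typing import Optional, Tuple

def probe_runner(module_path: str, class_name: str) -> Tuple[Optional[object], str]:
    """Try to import `class_name` from `module_path` and report why it failed, if it did.

    Other ImportErrors (e.g. a broken dependency) are not caught and propagate.
    """
    try:
        module = importlib.import_module(module_path)
    except ModuleNotFoundError as e:
        missing = e.name or ""
        # The missing module is the target itself or one of its parent packages.
        if missing == module_path or module_path.startswith(missing + "."):
            return None, f"missing module: {missing}"
        # The target module exists but imports something that is not installed.
        return None, f"missing dependency: {missing}"
    obj = getattr(module, class_name, None)
    if obj is None:
        return None, f"missing attribute: {class_name}"
    return obj, "ok"

# Demo with the standard library
for path, name in [("collections", "OrderedDict"),
                   ("collections", "NoSuchClass"),
                   ("no_such_package_xyz.sub", "Runner")]:
    print(path, name, "->", probe_runner(path, name)[1])
```

Applied to `("tfx.orchestration.airflow.airflow_dag_runner", "AirflowDagRunner")` in the environment above, one would expect `missing dependency: airflow`, since the recorded error names `airflow` rather than a `tfx` module.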

```python
# Import path used in the Airflow workshop example pipeline:
# https://github.com/tensorflow/tfx/blob/master/tfx/examples/airflow_workshop/setup/dags/taxi_pipeline.py

try:
    from tfx.orchestration.airflow.airflow_dag_runner import AirflowDagRunner
except ImportError as e:
    print(f"Error: {e}")
```

```
Error: No module named 'airflow'
```


# Note:

`tfx.orchestration.experimental.KubeflowDagRunner` and `tfx.orchestration.experimental.KubeflowV2DagRunner` appear to have been added in TFX 1.9.2 or later, so they cannot be tested with the TFX version installed in this Colab session (1.9.1).
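If a notebook needs to branch on the installed TFX version, for example to use the experimental Kubeflow runners only where they exist, note that comparing version strings directly is unreliable (`"1.10.0" < "1.9.2"` is `True` as strings). A minimal sketch that compares numeric tuples instead; `version_tuple` and `version_at_least` are hypothetical helpers, and the `(1, 9, 2)` threshold only reflects the uncertain estimate in the note, so checking with `hasattr` on `tfx.orchestration.experimental` is the more robust option when TFX is available:

```python
import re
from typing import Tuple

def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.9.1' into (1, 9, 1) and '1.10.0rc0' into (1, 10, 0).

    Only the leading dot-separated integers are kept; pre-release tags are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def version_at_least(version: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if `version` is numerically >= `minimum`."""
    return version_tuple(version) >= minimum

print("1.10.0" < "1.9.2")                     # True: string comparison is lexicographic
print(version_at_least("1.9.1", (1, 9, 2)))   # False
print(version_at_least("1.10.0", (1, 9, 2)))  # True
```

In this notebook's environment, `version_at_least(tfx.__version__, (1, 9, 2))` would be `False`, since the recorded version is 1.9.1.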

