# Creating flows from your laptop machine learning code

To introduce data scientists to Metaflow, it helps to show them how to take their existing ML code and turn it into flows. Arguably, the three most practical types of models are

* Random forests
* Boosted trees, and
* Neural nets.

To this end, we show below how to take code for each of these model types and turn it into a Metaflow flow.

## Setup instructions

We'll be using `conda` to install the necessary packages, but you can also use `pip` or `virtualenv`. To use `conda`, install the Anaconda distribution from [the Anaconda website](https://www.anaconda.com/products/individual).
From the command line, run

```bash
conda env create --name mf-tutorial
```
to create your environment (this form assumes an `environment.yml` file in the current directory). You can then activate it by executing

```bash
conda activate mf-tutorial
```
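
Before moving on, it can help to confirm that the activated environment contains the libraries used in the rest of this tutorial. The following is a minimal sketch using only the standard library; the helper name `check_packages` is ours, and the package list is simply the set of top-level modules imported in the code below.

```python
from importlib.util import find_spec

# Top-level modules imported in this tutorial.
REQUIRED = ["metaflow", "sklearn", "xgboost", "tensorflow", "numpy"]


def check_packages(names=REQUIRED):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


for name, found in check_packages().items():
    print(f"{name:<12} {'ok' if found else 'MISSING'}")
```

Anything reported as `MISSING` needs to be installed into the environment before the corresponding section will run.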

## Random forests

This is typical random forest code:

In [1]:
#Import scikit-learn dataset library
from sklearn import datasets

#Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
X.shape, y.shape

((150, 4), (150,))

In [2]:


# https://scikit-learn.org/stable/modules/ensemble.html#forest
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.tree import DecisionTreeClassifier

# Baseline: a single decision tree, scored with 5-fold cross-validation
clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,
    random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores)

# Random forest with 10 trees
clf = RandomForestClassifier(n_estimators=10, max_depth=None,
    min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores)

# Extremely randomized trees with 10 trees
clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
    min_samples_split=2, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores)


[0.96666667 0.96666667 0.9        0.96666667 1.        ]
[0.96666667 0.96666667 0.9        0.93333333 1.        ]
[0.96666667 0.96666667 0.93333333 0.9        1.        ]
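
Five per-fold accuracies are hard to compare by eye, so it is common to reduce each array to a mean and a spread. The sketch below does this with the standard library, using the numbers printed above; the dictionary `cv_scores` and its keys are ours, added for illustration.

```python
from statistics import mean, stdev

# Per-fold accuracies copied from the output above.
cv_scores = {
    "DecisionTreeClassifier": [0.96666667, 0.96666667, 0.9, 0.96666667, 1.0],
    "RandomForestClassifier": [0.96666667, 0.96666667, 0.9, 0.93333333, 1.0],
    "ExtraTreesClassifier": [0.96666667, 0.96666667, 0.93333333, 0.9, 1.0],
}

# Mean accuracy and sample standard deviation for each model.
summary = {name: (mean(s), stdev(s)) for name, s in cv_scores.items()}
for name, (avg, spread) in summary.items():
    print(f"{name:<24} mean={avg:.3f}  std={spread:.3f}")
```

On the arrays returned by `cross_val_score` you can call `scores.mean()` and `scores.std()` directly; NumPy's `std` uses the population formula by default, so its value differs slightly from `stdev` here.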


## Boosted trees



This is typical boosted tree code:

In [3]:
import xgboost as xgb
# read in data
dtrain = xgb.DMatrix('data/agaricus.txt.train')
dtest = xgb.DMatrix('data/agaricus.txt.test')
# specify parameters via map
param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
num_round = 2
bst = xgb.train(param, dtrain, num_round)
# make prediction
preds = bst.predict(dtest)
print(preds)

[0.28583017 0.9239239  0.28583017 ... 0.9239239  0.05169873 0.9239239 ]
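
With the `binary:logistic` objective, `predict` returns the predicted probability of the positive class rather than a class label. A minimal sketch of turning such probabilities into labels, using only the six values visible in the truncated output above and an assumed cut-off of 0.5:

```python
# The six values visible in the (elided) output above.
preds = [0.28583017, 0.9239239, 0.28583017, 0.9239239, 0.05169873, 0.9239239]

THRESHOLD = 0.5  # assumed cut-off; choose one that suits your problem

# 1 if the predicted probability exceeds the threshold, else 0.
labels = [int(p > THRESHOLD) for p in preds]
print(labels)
```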


## Neural nets

This is (somewhat) typical deep learning code:

In [4]:
# https://keras.io/examples/vision/mnist_convnet/
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

In [5]:
# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")


# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)


x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


In [6]:
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

model.summary()


Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1600)              0         
_________________________________________________________________
dropout (Dropout)            (None, 1600)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                1
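
The output shapes and parameter counts in this summary can be reproduced by hand. The sketch below assumes the Keras defaults used in the model above (unpadded convolutions with stride 1, pooling stride equal to the pool size); the helper names are ours.

```python
def conv_out(size, kernel):
    """Output width/height of an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool):
    """Output width/height of max pooling with stride equal to the pool size."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


side = conv_out(28, 3)     # 26
side = pool_out(side, 2)   # 13
side = conv_out(side, 3)   # 11
side = pool_out(side, 2)   # 5
flat = side * side * 64    # 1600 values after Flatten

print(conv_params(3, 1, 32))    # first Conv2D layer
print(conv_params(3, 32, 64))   # second Conv2D layer
print(flat * 10 + 10)           # Dense layer: weights plus biases
```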


In [7]:
batch_size = 128
epochs = 15

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)


Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<tensorflow.python.keras.callbacks.History at 0x7fb1804fd7c0>

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])


## Writing local machine learning flows


### Random forests

In [8]:
%%writefile flows/local/rf_flow.py

from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile, card
import json

class ClassificationFlow(FlowSpec):
    """
    train a random forest
    """
    @card 
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model)
        

    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.next(self.train)

        
        
    @step
    def train(self):
        """
        Train the model
        """
        from sklearn.model_selection import cross_val_score
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.end)
        
        
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()

Writing flows/local/rf_flow.py


Execute the above from the command line with

```bash
python flows/local/rf_flow.py run
```
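
Once a run has finished, everything assigned to `self` in the steps is stored as an artifact and can be read back, for example from a notebook, with Metaflow's Client API. The following is a minimal sketch, assuming at least one successful run of the flow above exists in the datastore; the helper name `latest_scores` is ours. Note that the branching flow further down uses the same class name, so the latest run may belong to it, and its final step may not carry a `scores` artifact.

```python
def latest_scores(flow_name="ClassificationFlow"):
    """Return the `scores` artifact of the latest successful run of a flow."""
    # Imported lazily, in the same style as the steps in this tutorial.
    from metaflow import Flow

    run = Flow(flow_name).latest_successful_run
    if run is None:
        raise RuntimeError(f"no successful run found for {flow_name}")
    # `run.data` exposes the artifacts of the run's final (`end`) step.
    return run.data.scores
```

Calling `latest_scores().mean()` after running `rf_flow.py` should then give the mean cross-validation accuracy of that run.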

In [9]:
! python flows/local/rf_flow.py run

Metaflow 2.5.0 executing ClassificationFlow for user:hba
Creating local datastore in current directory (/Users/hba/Documents/Projects/tutorial-metaflow/.metaflow)
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
2022-03-16 11:28:50.805 Workflow starting (run-id 1647390530800029):
2022-03-16 11:28:50.816 [1647390530800029/start/1 (pid 12483)] Task is starting.
2022-03-16 11:28:52.625 [1647390530800029/start/1 (pid 12483)] Task finished successfully.
2022-03-16 11:28:52.635 [1647390530800029/rf_model/2 (pid 12491)] Task is starting.
2022-03-16 11:28:53.581 [1647390530800029/rf_model/2 (pid 124

In [10]:
%%writefile flows/local/tree_branch_flow.py

from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile, card
import json





class ClassificationFlow(FlowSpec):
    """
    train multiple tree based methods
    """
    @card 
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model, self.xt_model, self.dt_model)
    
                
    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.choose_model)

    @step
    def xt_model(self):
        """
        build extra trees classifier
        """
        from sklearn.ensemble import ExtraTreesClassifier
        from sklearn.model_selection import cross_val_score
        

        self.clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)

        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.choose_model)

    @step
    def dt_model(self):
        """
        build decision tree classifier
        """
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import cross_val_score
        
        self.clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,
            random_state=0)

        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)

        self.next(self.choose_model)
                        
    @step
    def choose_model(self, inputs):
        """
        find 'best' model
        """
        import numpy as np

        def score(inp):
            return inp.clf,\
                   np.mean(inp.scores)

            
        self.results = sorted(map(score, inputs), key=lambda x: -x[1]) 
        self.model = self.results[0][0]
        self.next(self.end)
        
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print('Scores:')
        print('\n'.join('%s %f' % res for res in self.results))


if __name__ == "__main__":
    ClassificationFlow()

Writing flows/local/tree_branch_flow.py


Execute the above from the command line with

```bash
python flows/local/tree_branch_flow.py run
```
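
The `choose_model` step is a join: Metaflow passes it `inputs`, one entry per branch, and each entry exposes that branch's artifacts as attributes. The ranking logic can be tried outside Metaflow with stand-in objects. In the sketch below, the `SimpleNamespace` objects and the string model names are placeholders of ours, the scores are the cross-validation results shown earlier in this notebook, and `statistics.mean` replaces `np.mean` to keep the example dependency-free.

```python
from statistics import mean
from types import SimpleNamespace

# Stand-ins for the three branch tasks; real `inputs` come from Metaflow.
inputs = [
    SimpleNamespace(clf="random forest",
                    scores=[0.96666667, 0.96666667, 0.9, 0.93333333, 1.0]),
    SimpleNamespace(clf="extra trees",
                    scores=[0.96666667, 0.96666667, 0.93333333, 0.9, 1.0]),
    SimpleNamespace(clf="decision tree",
                    scores=[0.96666667, 0.96666667, 0.9, 0.96666667, 1.0]),
]


def score(inp):
    """Pair a branch's model with its mean cross-validation accuracy."""
    return inp.clf, mean(inp.scores)


# Same ranking as in `choose_model`: highest mean accuracy first.
results = sorted(map(score, inputs), key=lambda x: -x[1])
best_model = results[0][0]
print(best_model)
```

Because the sort key is the negated mean, the first element of `results` is the branch with the highest average accuracy, which the flow then stores as `self.model`.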

In [11]:
! python flows/local/rf_flow.py run

Metaflow 2.5.0 executing ClassificationFlow for user:hba
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
2022-03-16 11:29:10.194 Workflow starting (run-id 1647390550190432):
2022-03-16 11:29:10.202 [1647390550190432/start/1 (pid 12508)] Task is starting.
2022-03-16 11:29:11.998 [1647390550190432/start/1 (pid 12508)] Task finished successfully.
2022-03-16 11:29:12.006 [1647390550190432/rf_model/2 (pid 12516)] Task is starting.
2022-03-16 11:29:12.958 [1647390550190432/rf_model/2 (pid 12516)] Task finished successfully.
2022-03-16 11:29:12.966 [1647390550190432/train/3 (pid 12520)] 

### Boosted trees

In [12]:
%%writefile flows/local/boosted_flow.py

from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile




class BSTFlow(FlowSpec):
    """
    train a boosted tree
    """

    @step
    def start(self):
        """
        Load the data & train model
        """
        import xgboost as xgb

        # read in the training data
        dtrain = xgb.DMatrix('data/agaricus.txt.train')

        # specify parameters
        param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic'}
        num_round = 2
        bst = xgb.train(param, dtrain, num_round)
        # write the model to disk so that the predict step can load it
        bst.save_model("model.json")
        self.next(self.predict)
        

        
        
    @step
    def predict(self):
        """
        make predictions
        """
        import xgboost as xgb

        dtest = xgb.DMatrix('data/agaricus.txt.test')
        # load the model written by the start step
        bst = xgb.Booster()
        bst.load_model("model.json")
        # store the predictions as an artifact so they are kept with the run
        self.preds = bst.predict(dtest)
        self.next(self.end)
        
        
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("BSTFlow is all done.")


if __name__ == "__main__":
    BSTFlow()

Writing flows/local/boosted_flow.py


Execute the above from the command line with

```bash
python flows/local/boosted_flow.py run
```
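
The flow above hands the model from `start` to `predict` through `model.json` in the working directory, which works when every step runs on the same machine. If the steps may run elsewhere, one option is to keep the serialized model as an artifact instead, in the same spirit as the neural-net flow below. The following is a sketch under that assumption; the helper names `booster_to_bytes` and `booster_from_bytes` are hypothetical, and only `save_model` and `load_model` from the flow above are used.

```python
import os
import tempfile


def booster_to_bytes(bst):
    """Serialize a booster to JSON bytes via a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        bst.save_model(path)
        with open(path, "rb") as f:
            return f.read()


def booster_from_bytes(raw):
    """Rebuild a booster from bytes produced by `booster_to_bytes`."""
    # Imported lazily, in the same style as the steps in this tutorial.
    import xgboost as xgb

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.json")
        with open(path, "wb") as f:
            f.write(raw)
        bst = xgb.Booster()
        bst.load_model(path)
        return bst
```

In `start` you would then assign `self.model = booster_to_bytes(bst)`, and in `predict` restore it with `bst = booster_from_bytes(self.model)`.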

In [13]:
! python flows/local/boosted_flow.py run

Metaflow 2.5.0 executing BSTFlow for user:hba
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
2022-03-16 11:29:33.542 Workflow starting (run-id 1647390573537743):
2022-03-16 11:29:33.551 [1647390573537743/start/1 (pid 12532)] Task is starting.
2022-03-16 11:29:34.692 [1647390573537743/start/1 (pid 12532)] Task finished successfully.
2022-03-16 11:29:34.700 [1647390573537743/predict/2 (pid 12536)] Task is starting.
2022-03-16 11:29:35.849 [1647390573537743/predict/2 (pid 12536)] Task finished successfully.
2022-03-16 11:29:35.856 [1647390573537743/end/3 (pid 12540)] Task is star

### Deep learning

In [14]:
%%writefile flows/local/NN_flow.py

from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile
import json


class NNFlow(FlowSpec):
    """
    train a NN
    """

    @step
    def start(self):
        """
        Load the data
        """
        from tensorflow import keras

        # the data, split between train and test sets
        (self.x_train, self.y_train), (self.x_test, self.y_test) = keras.datasets.mnist.load_data()
        self.next(self.wrangle)
        
    @step
    def wrangle(self):
        """
        massage data
        """
        import numpy as np
        from tensorflow import keras
        # Model / data parameters
        self.num_classes = 10
        self.input_shape = (28, 28, 1)

        # Scale images to the [0, 1] range
        self.x_train = self.x_train.astype("float32") / 255
        self.x_test = self.x_test.astype("float32") / 255
        # Make sure images have shape (28, 28, 1)
        self.x_train = np.expand_dims(self.x_train, -1)
        self.x_test = np.expand_dims(self.x_test, -1)

        # convert class vectors to binary class matrices
        self.y_train = keras.utils.to_categorical(self.y_train, self.num_classes)
        self.y_test = keras.utils.to_categorical(self.y_test, self.num_classes)
        
        self.next(self.build_model)


    @step
    def build_model(self):
        """
        build NN model
        """
        import tempfile
        import numpy as np
        import tensorflow as tf
        from tensorflow import keras
        from tensorflow.keras import layers

        model = keras.Sequential(
            [
                keras.Input(shape=self.input_shape),
                layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
                layers.MaxPooling2D(pool_size=(2, 2)),
                layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
                layers.MaxPooling2D(pool_size=(2, 2)),
                layers.Flatten(),
                layers.Dropout(0.5),
                layers.Dense(self.num_classes, activation="softmax"),
            ]
        )
        model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
        with tempfile.NamedTemporaryFile() as f:
            tf.keras.models.save_model(model, f.name, save_format='h5')
            self.model = f.read()
        self.next(self.train)

        
        
    @step
    def train(self):
        """
        Train the model
        """
        import tempfile
        import tensorflow as tf
        self.batch_size = 128
        self.epochs = 15
        
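        # restore the compiled model from the bytes saved in build_model
        with tempfile.NamedTemporaryFile() as f:
            f.write(self.model)
            f.flush()
            model = tf.keras.models.load_model(f.name)
        model.fit(self.x_train, self.y_train, batch_size=self.batch_size, epochs=self.epochs, validation_split=0.1)

        # save the trained model back as an artifact so it is not lost
        with tempfile.NamedTemporaryFile() as f:
            tf.keras.models.save_model(model, f.name, save_format='h5')
            self.model = f.read()

        self.next(self.end)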
        
        
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("NNFlow is all done.")


if __name__ == "__main__":
    NNFlow()

Writing flows/local/NN_flow.py


Execute the above from the command line with

```bash
python flows/local/NN_flow.py run
```

In [16]:
! python flows/local/NN_flow.py run

Metaflow 2.5.0 executing NNFlow for user:hba
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
2022-03-16 11:30:40.446 Workflow starting (run-id 1647390640441582):
2022-03-16 11:30:40.459 [1647390640441582/start/1 (pid 12566)] Task is starting.
2022-03-16 11:30:43.742 [1647390640441582/start/1 (pid 12566)] Task finished successfully.
2022-03-16 11:30:43.752 [1647390640441582/wrangle/2 (pid 12573)] Task is starting.
2022-03-16 11:30:48.318 [1647390640441582/wrangle/2 (pid 12573)] Task finished successfully.
2022-03-16 11:30:48.328 [1647390640441582/build_model/3 (pid 12577)] Task 

## Flows for the cloud

In [1]:
%%writefile flows/cloud/rf_flow_cloud.py

from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile, card, conda, conda_base
import json





class ClassificationFlow(FlowSpec):
    """
    train a random forest
    """
    @conda(libraries={'scikit-learn':'1.0.2'}) 
    @card
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model)
        
    @conda(libraries={'scikit-learn':'1.0.2'})
    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.next(self.train)

        
    @conda(libraries={'scikit-learn':'1.0.2'})       
    @step
    def train(self):
        """
        Train the model
        """
        from sklearn.model_selection import cross_val_score
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.end)
        
        
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()

Overwriting flows/cloud/rf_flow_cloud.py


Execute the above from the command line with

```bash
python flows/cloud/rf_flow_cloud.py --environment=conda run --with batch
```

In [3]:
! python flows/cloud/rf_flow_cloud.py --environment=conda run --with batch

Metaflow 2.5.0 executing ClassificationFlow for user:hba
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
Bootstrapping conda environment...(this could take a few minutes)
2022-03-16 11:40:52.815 Workflow starting (run-id 7224):
2022-03-16 11:41:03.883 [7224/start/135700 (pid 13067)] Task is starting.
2022-03-16 11:41:06.449 [7224/start/135700 (pid 13067)] [37af24cc-085b-43b9-8e1f-045c3e8fc0ba] Task is starting (status SUBMITTED)...
2022-03-16 11:41:10.704 [7224/start/135700 (pid 13067)] [37af24cc-085b-43b9-8e1f-045c3e8fc0ba] Task is starting (status RUNNABLE)...
2022-03-16 11:41:40.735 [7

In [4]:
%%writefile flows/cloud/tree_branch_flow_cloud.py

from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile, card, conda, conda_base
import json





class ClassificationFlow(FlowSpec):
    """
    train multiple tree based methods
    """
    @conda(libraries={'scikit-learn':'1.0.2'}) 
    @card 
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model, self.xt_model, self.dt_model)
    
    @conda(libraries={'scikit-learn':'1.0.2'})             
    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.choose_model)
    
    @conda(libraries={'scikit-learn':'1.0.2'}) 
    @step
    def xt_model(self):
        """
        build extra trees classifier
        """
        from sklearn.ensemble import ExtraTreesClassifier
        from sklearn.model_selection import cross_val_score
        

        self.clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)

        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.choose_model)
    
    @conda(libraries={'scikit-learn':'1.0.2'}) 
    @step
    def dt_model(self):
        """
        build decision tree classifier
        """
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import cross_val_score
        
        self.clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,
            random_state=0)

        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)

        self.next(self.choose_model)

    @conda(libraries={'scikit-learn':'1.0.2'})                         
    @step
    def choose_model(self, inputs):
        """
        find 'best' model
        """
        import numpy as np

        def score(inp):
            return inp.clf,\
                   np.mean(inp.scores)

            
        self.results = sorted(map(score, inputs), key=lambda x: -x[1]) 
        self.model = self.results[0][0]
        self.next(self.end)

    @conda(libraries={'scikit-learn':'1.0.2'})         
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print('Scores:')
        print('\n'.join('%s %f' % res for res in self.results))


if __name__ == "__main__":
    ClassificationFlow()

Writing flows/cloud/tree_branch_flow_cloud.py


Execute the above from the command line with

```bash
python flows/cloud/tree_branch_flow_cloud.py --environment=conda run --with batch
```

In [5]:
! python flows/cloud/tree_branch_flow_cloud.py --environment=conda run --with batch

Metaflow 2.5.0 executing ClassificationFlow for user:hba
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint is happy!
Bootstrapping conda environment...(this could take a few minutes)
2022-03-16 11:50:02.709 Workflow starting (run-id 7225):
2022-03-16 11:50:14.018 [7225/start/135705 (pid 13733)] Task is starting.
2022-03-16 11:50:16.538 [7225/start/135705 (pid 13733)] [86b488ca-24a6-4219-8738-9782818ed21b] Task is starting (status SUBMITTED)...
2022-03-16 11:50:18.644 [7225/start/135705 (pid 13733)] [86b488ca-24a6-4219-8738-9782818ed21b] Task is starting (status RUNNABLE)...
2022-03-16 11:50:20.766 [7