## Lesson 3: Writing Flows for the Cloud

## Learning objectives of this lesson

* How to burst to the cloud with your ML workflows!
* How to use `--with batch` to access cloud compute.
* How to define step-level dependencies using the `@conda` decorator.

In this lesson, we'll show how to get Metaflow working on the cloud for when you need more compute than your local machine provides. We'll be using AWS for the purposes of this lesson. I've configured my AWS credentials so that I can access AWS from this Jupyter notebook.

To reproduce this, you'll need access to compute and storage resources on AWS, which you can configure by either:
- Following the instructions [here](https://outerbounds.com/docs/admin/metaflow-on-aws/deployment-guide) or
- [Requesting a sandbox](https://docs.metaflow.org/metaflow-on-aws/metaflow-sandbox).



## Random Forest flows on the cloud

In this section, we'll get our random forest flows up and running on AWS:

```python
%%writefile ../flows/cloud/rf_flow_cloud.py

from metaflow import FlowSpec, step, card, conda


class ClassificationFlow(FlowSpec):
    """
    Train a random forest
    """

    @conda(libraries={'scikit-learn': '1.0.2'})
    @card
    @step
    def start(self):
        """
        Load the data
        """
        # Import scikit-learn dataset library
        from sklearn import datasets

        # Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model)

    @conda(libraries={'scikit-learn': '1.0.2'})
    @step
    def rf_model(self):
        """
        Build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier

        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                                          min_samples_split=2, random_state=0)
        self.next(self.train)

    @conda(libraries={'scikit-learn': '1.0.2'})
    @step
    def train(self):
        """
        Train the model
        """
        from sklearn.model_selection import cross_val_score

        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.end)

    @step
    def end(self):
        """
        End of flow!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()
```



Execute the above from the command line with

```bash
python flows/cloud/rf_flow_cloud.py --environment=conda run --with batch
```

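While this is executing, let's talk about what has changed in the code we just wrote, in particular:

- the `@conda` decorator, which pins the libraries a step needs (here `scikit-learn` 1.0.2) so that the same environment is created wherever the step runs; it takes effect when the flow is run with `--environment=conda`, and
- the `--with batch` option, which attaches the `@batch` decorator to every step at run time, so that each step executes as an AWS Batch job rather than on your local machine.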

Let's also check out the Metaflow card with:

```bash
python flows/cloud/rf_flow_cloud.py --environment=conda card view start
```

## Parallel training on the cloud

We'll also get our parallel training/branching example working on AWS:

```python
%%writefile ../flows/cloud/tree_branch_flow_cloud.py

from metaflow import FlowSpec, step, card, conda


class ClassificationFlow(FlowSpec):
    """
    Train multiple tree-based methods
    """

    @conda(libraries={'scikit-learn': '1.0.2'})
    @card
    @step
    def start(self):
        """
        Load the data
        """
        # Import scikit-learn dataset library
        from sklearn import datasets

        # Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model, self.xt_model, self.dt_model)

    @conda(libraries={'scikit-learn': '1.0.2'})
    @step
    def rf_model(self):
        """
        Build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                                          min_samples_split=2, random_state=0)
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.choose_model)

    @conda(libraries={'scikit-learn': '1.0.2'})
    @step
    def xt_model(self):
        """
        Build extra trees classifier
        """
        from sklearn.ensemble import ExtraTreesClassifier
        from sklearn.model_selection import cross_val_score

        self.clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
                                        min_samples_split=2, random_state=0)
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.choose_model)

    @conda(libraries={'scikit-learn': '1.0.2'})
    @step
    def dt_model(self):
        """
        Build decision tree classifier
        """
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import cross_val_score

        self.clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,
                                          random_state=0)
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.choose_model)

    @conda(libraries={'scikit-learn': '1.0.2'})
    @step
    def choose_model(self, inputs):
        """
        Find 'best' model
        """
        import numpy as np

        def score(inp):
            # Pair each branch's classifier with its mean cross-validation score
            return inp.clf, np.mean(inp.scores)

        # Sort from highest to lowest mean score and keep the top classifier
        self.results = sorted(map(score, inputs), key=lambda x: -x[1])
        self.model = self.results[0][0]
        self.next(self.end)

    @conda(libraries={'scikit-learn': '1.0.2'})
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print('Scores:')
        print('\n'.join('%s %f' % res for res in self.results))


if __name__ == "__main__":
    ClassificationFlow()
```



Execute the above from the command line with

```bash
python flows/cloud/tree_branch_flow_cloud.py --environment=conda run --with batch
```

Let's also check out the Metaflow card:

```bash
python flows/cloud/tree_branch_flow_cloud.py --environment=conda card view start
```
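To see what the `choose_model` join step in the branching flow is doing without launching any cloud jobs, here is a minimal plain-Python sketch of the same ranking logic. It is an illustration only: the `branch_scores` dictionary and the `rank_models` helper are hypothetical names, and the scores are made-up numbers rather than results from running the flow.

```python
from statistics import mean

# Hypothetical per-branch cross-validation scores (made-up numbers for
# illustration; these are NOT results from running the flow).
branch_scores = {
    "rf_model": [0.90, 0.95, 0.90, 0.95, 1.00],
    "xt_model": [0.95, 0.95, 0.90, 0.90, 0.95],
    "dt_model": [0.90, 0.90, 0.85, 0.95, 0.95],
}


def rank_models(scores_by_model):
    """Return (name, mean_score) pairs sorted from best to worst mean score."""
    summary = [(name, mean(scores)) for name, scores in scores_by_model.items()]
    return sorted(summary, key=lambda pair: pair[1], reverse=True)


results = rank_models(branch_scores)
best_name, best_score = results[0]  # first entry has the highest mean score

for name, score in results:
    print(f"{name} {score:.3f}")
```

In the flow itself, each element of `inputs` carries the `clf` and `scores` artifacts from one branch, so the join step keeps the fitted-model object rather than a name; the sorting idea is the same.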

## Lesson Recap

In this lesson, you learnt

* How to burst to the cloud with your ML workflows!
* How to use `--with batch` to access cloud compute.
* How to define step-level dependencies using the `@conda` decorator.