## Writing Flows for the Cloud

In this section, we'll show how to run Metaflow flows in the cloud, for example when you need more compute than your local machine provides. We'll use AWS for this lesson. I've configured AWS so that I can access it from this Jupyter notebook. To reproduce this, you'll need access to AWS instances.

**to do:** provide link to instructions on how to do this OR let them know about the Outerbounds sandbox ;)

_note_: there's a lot to explain here, e.g. the `@conda` decorator, which pins the libraries (here `scikit-learn` 1.0.2) that each step runs with.

### Random Forest flows on the cloud

First, we'll get our random forest flow up and running on AWS:

In [1]:
%%writefile ../flows/cloud/rf_flow_cloud.py

from metaflow import FlowSpec, step, card, conda


class ClassificationFlow(FlowSpec):
    """
    train a random forest
    """
    @conda(libraries={'scikit-learn':'1.0.2'}) 
    @card
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model)
        
    @conda(libraries={'scikit-learn':'1.0.2'})
    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.next(self.train)

        
    @conda(libraries={'scikit-learn':'1.0.2'})       
    @step
    def train(self):
        """
        Train the model
        """
        from sklearn.model_selection import cross_val_score
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.end)
        
        
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()

Overwriting ../flows/cloud/rf_flow_cloud.py


Execute the above from the command line with

```bash
python flows/cloud/rf_flow_cloud.py --environment=conda run --with batch
```
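The order of the arguments matters in this command: `--environment=conda` is a top-level Metaflow option and comes before the `run` command, while `--with batch` belongs to `run` and attaches the `batch` decorator to every step for that run. As a minimal sketch, the same command could be assembled from Python; the helper name `build_run_command` is hypothetical and not part of Metaflow.

```python
import sys


def build_run_command(flow_path, environment="conda", with_decorators=("batch",)):
    """Assemble the argv for running a Metaflow flow (hypothetical helper)."""
    # Top-level options go between the script path and the `run` command.
    cmd = [sys.executable, flow_path, f"--environment={environment}", "run"]
    # Each `--with NAME` attaches that decorator to every step of this run.
    for name in with_decorators:
        cmd += ["--with", name]
    return cmd


cmd = build_run_command("flows/cloud/rf_flow_cloud.py")
print(" ".join(cmd[1:]))
# To actually launch the run (requires Metaflow and AWS Batch to be configured):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing an empty tuple for `with_decorators` gives the plain local `run` command, which is handy for a quick test before sending anything to AWS.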

In [2]:
! python ../flows/cloud/rf_flow_cloud.py --environment=conda run --with batch

[35m[1mMetaflow 2.5.0[0m[35m[22m executing [0m[31m[1mClassificationFlow[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:hba[0m[35m[22m[K[0m[35m[22m[0m
[35m[22mValidating your flow...[K[0m[35m[22m[0m
[32m[1m    The graph looks good![K[0m[32m[1m[0m
[35m[22mRunning pylint...[K[0m[35m[22m[0m
[32m[1m    Pylint is happy![K[0m[32m[1m[0m
[22mBootstrapping conda environment...(this could take a few minutes)[K[0m[22m[0m
[35m2022-03-16 18:15:14.489 [0m[1mWorkflow starting (run-id 7244):[0m
[35m2022-03-16 18:15:21.130 [0m[32m[7244/start/135808 (pid 20305)] [0m[1mTask is starting.[0m
[35m2022-03-16 18:15:23.826 [0m[32m[7244/start/135808 (pid 20305)] [0m[22m[ede20ec9-8453-47ea-aca9-538ceca0a7e8] Task is starting (status SUBMITTED)...[0m
[35m2022-03-16 18:15:28.023 [0m[32m[7244/start/135808 (pid 20305)] [0m[22m[ede20ec9-8453-47ea-aca9-538ceca0a7e8] Task is starting (status RUNNABLE)...[0m
[35m2022-03-16 18:15:58.040 [0m[32m[7

Let's also check out the Metaflow card:

In [3]:
! python ../flows/cloud/rf_flow_cloud.py --environment=conda card view start

[35m[1mMetaflow 2.5.0[0m[35m[22m executing [0m[31m[1mClassificationFlow[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:hba[0m[35m[22m[K[0m[35m[22m[0m
[32m[22mResolving card: ClassificationFlow/7246/start/135822[K[0m[32m[22m[0m


We'll also get our branching example working on AWS:

In [4]:
%%writefile ../flows/cloud/tree_branch_flow_cloud.py

from metaflow import FlowSpec, step, card, conda


class ClassificationFlow(FlowSpec):
    """
    train multiple tree based methods
    """
    @conda(libraries={'scikit-learn':'1.0.2'}) 
    @card 
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model, self.xt_model, self.dt_model)
    
    @conda(libraries={'scikit-learn':'1.0.2'})             
    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.choose_model)
    
    @conda(libraries={'scikit-learn':'1.0.2'}) 
    @step
    def xt_model(self):
        """
        build extra trees classifier
        """
        from sklearn.ensemble import ExtraTreesClassifier
        from sklearn.model_selection import cross_val_score
        

        self.clf = ExtraTreesClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)

        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.choose_model)
    
    @conda(libraries={'scikit-learn':'1.0.2'}) 
    @step
    def dt_model(self):
        """
        build decision tree classifier
        """
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.model_selection import cross_val_score
        
        self.clf = DecisionTreeClassifier(max_depth=None, min_samples_split=2,
            random_state=0)

        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)

        self.next(self.choose_model)

    @conda(libraries={'scikit-learn':'1.0.2'})                         
    @step
    def choose_model(self, inputs):
        """
        find 'best' model
        """
        import numpy as np

        def score(inp):
            # Pair each branch's model with its mean cross-validation score
            return inp.clf, np.mean(inp.scores)

        # Sort by mean score, highest first
        self.results = sorted(map(score, inputs), key=lambda x: -x[1])
        self.model = self.results[0][0]
        self.next(self.end)

    @conda(libraries={'scikit-learn':'1.0.2'})         
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print('Scores:')
        print('\n'.join('%s %f' % res for res in self.results))


if __name__ == "__main__":
    ClassificationFlow()

Overwriting ../flows/cloud/tree_branch_flow_cloud.py


Execute the above from the command line with

```bash
python flows/cloud/tree_branch_flow_cloud.py --environment=conda run --with batch
```
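
Before submitting the branching flow to AWS, it can help to sanity-check the ranking logic of `choose_model` locally. The sketch below mimics the join step in plain Python, without Metaflow or scikit-learn: the `inputs` list stands in for the three branches, the model names are plain strings, and the scores are made up for illustration only.

```python
from statistics import mean
from types import SimpleNamespace

# Stand-ins for the three branch results; the scores are made up for illustration.
inputs = [
    SimpleNamespace(clf="RandomForestClassifier", scores=[0.96, 0.94, 0.98]),
    SimpleNamespace(clf="ExtraTreesClassifier", scores=[0.95, 0.93, 0.94]),
    SimpleNamespace(clf="DecisionTreeClassifier", scores=[0.97, 0.97, 0.98]),
]


def score(inp):
    # Pair each model with its mean cross-validation score.
    return inp.clf, mean(inp.scores)


# Sort by mean score, highest first (equivalent to sorting on the negated score).
results = sorted(map(score, inputs), key=lambda pair: pair[1], reverse=True)
best_model = results[0][0]

for name, avg in results:
    print(f"{name} {avg:.4f}")
```

In the real flow, each element of `inputs` is one finished branch, and `inp.clf` and `inp.scores` are the artifacts that branch assigned to `self`.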

In [5]:
! python ../flows/cloud/tree_branch_flow_cloud.py --environment=conda run --with batch

[35m[1mMetaflow 2.5.0[0m[35m[22m executing [0m[31m[1mClassificationFlow[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:hba[0m[35m[22m[K[0m[35m[22m[0m
[35m[22mValidating your flow...[K[0m[35m[22m[0m
[32m[1m    The graph looks good![K[0m[32m[1m[0m
[35m[22mRunning pylint...[K[0m[35m[22m[0m
[32m[1m    Pylint is happy![K[0m[32m[1m[0m
[22mBootstrapping conda environment...(this could take a few minutes)[K[0m[22m[0m
[35m2022-03-16 18:25:09.372 [0m[1mWorkflow starting (run-id 7248):[0m
[35m2022-03-16 18:25:15.966 [0m[32m[7248/start/135835 (pid 20541)] [0m[1mTask is starting.[0m
[35m2022-03-16 18:25:18.510 [0m[32m[7248/start/135835 (pid 20541)] [0m[22m[5a84b95f-3876-45b8-b5ff-59f2a6b16577] Task is starting (status SUBMITTED)...[0m
[35m2022-03-16 18:25:23.821 [0m[32m[7248/start/135835 (pid 20541)] [0m[22m[5a84b95f-3876-45b8-b5ff-59f2a6b16577] Task is starting (status RUNNABLE)...[0m
[35m2022-03-16 18:25:27.012 [0m[32m[7

Let's also check out the Metaflow card:

In [6]:
! python ../flows/cloud/tree_branch_flow_cloud.py --environment=conda card view start

[35m[1mMetaflow 2.5.0[0m[35m[22m executing [0m[31m[1mClassificationFlow[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:hba[0m[35m[22m[K[0m[35m[22m[0m
[32m[22mResolving card: ClassificationFlow/7248/start/135835[K[0m[32m[22m[0m
