## Lesson 3: Writing Flows for the Cloud

## Learning objectives of this lesson

* How to burst to the cloud with your ML workflows!
* How to use `--with batch` to access cloud compute.
* How to define step-level dependencies using the `@conda` decorator.

In this lesson, we'll show how to get Metaflow working on the cloud, for example when you need access to more compute. We'll use AWS throughout this lesson. I've configured AWS access so that I can reach it from this Jupyter notebook.


To reproduce this, you'll need access to compute and storage resources on AWS, and you'll need to configure them. A few ways to do this:
- If you have an existing AWS account, there's a ready-made recipe, a so-called CloudFormation template, that you can execute on your account. Go to [docs.metaflow.org](https://docs.metaflow.org) to learn more.
- If you don't have an account (or you can't use your company account, say), you can request a Metaflow Sandbox [here](https://docs.metaflow.org/metaflow-on-aws/metaflow-sandbox), which gives you a full AWS test environment for free.
- If you have any trouble with this, the [Metaflow community Slack](https://outerbounds-community.slack.com) is friendly and helpful for newcomers. You can join and ask for help!




## Why the Cloud?

If you're asking "Why should I care about running ML on AWS?", that's a great question. One answer is scalability: the cloud is like one massive laptop. And not only that, but as many massive laptops as you need, right at your fingertips!

Have you ever had a Pandas dataframe run out of memory because your laptop has, say, 16GB of RAM? Sure, you could restructure your code or use a smaller dataset, but there's a better way. What if I told you that you could press a button and get more memory installed on your laptop in seconds? That's essentially what the cloud gives you!
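To put rough numbers on that memory limit, here is a back-of-the-envelope sketch. The table dimensions are hypothetical, and the estimate assumes a dense table of 8-byte (float64) values with no overhead; real workloads typically need more, since intermediate copies add to the total.

```python
def estimated_gib(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> float:
    """Rough in-memory size of a dense numeric table, in GiB."""
    return n_rows * n_cols * bytes_per_value / 1024**3


laptop_ram_gib = 16  # the laptop from the example above

# Hypothetical dataset: 100 million rows, 25 numeric columns
size = estimated_gib(n_rows=100_000_000, n_cols=25)

print(f"{size:.1f} GiB needed vs {laptop_ram_gib} GiB available")
print("fits in memory" if size <= laptop_ram_gib else "does not fit in memory")
```

A dataset of this shape would not fit on the 16GB laptop even before any processing, which is the kind of situation where bursting to a larger cloud instance helps.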

On top of this, a cloud-based workstation can pay big dividends when it comes to security, operational concerns, scalability, and interaction with production deployments.

## Random Forest flows on the cloud

In this section, we'll get our random forest flows up and running on AWS:

```python
%%writefile ../flows/cloud/rf_flow_cloud.py

from metaflow import FlowSpec, step, card, conda


class RF_Flow_cloud(FlowSpec):
    """
    Train a random forest.
    """

    # @conda pins scikit-learn for this step; @card attaches a card to it
    @conda(libraries={'scikit-learn': '1.0.2'})
    @card
    @step
    def start(self):
        """
        Load the data
        """
        # Import scikit-learn dataset library
        from sklearn import datasets

        # Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.next(self.rf_model)

    @conda(libraries={'scikit-learn': '1.0.2'})
    @step
    def rf_model(self):
        """
        Build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier

        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
                                          min_samples_split=2, random_state=0)
        self.next(self.train)

    @conda(libraries={'scikit-learn': '1.0.2'})
    @step
    def train(self):
        """
        Train the model
        """
        from sklearn.model_selection import cross_val_score

        # 5-fold cross-validation scores are stored as a flow artifact
        self.scores = cross_val_score(self.clf, self.X, self.y, cv=5)
        self.next(self.end)

    @step
    def end(self):
        """
        End of flow!
        """
        print("RF_Flow_cloud is all done.")


if __name__ == "__main__":
    RF_Flow_cloud()
```

Execute the above from the command line with:

```bash
python flows/cloud/rf_flow_cloud.py --environment=conda run --with batch
```
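If you'd rather launch the run from Python (say, from a notebook cell), the command above can be assembled piece by piece. The following is only a sketch: `build_run_command` is a hypothetical helper written for illustration, not part of Metaflow, and the local variant assumes conda is installed on your machine.

```python
import shlex
import sys


def build_run_command(flow_path, environment="conda", with_batch=True):
    """Assemble the argument list for running a Metaflow flow file.

    Hypothetical helper for illustration; it is not part of Metaflow.
    """
    cmd = [sys.executable, flow_path, f"--environment={environment}", "run"]
    if with_batch:
        # Append the `--with batch` option used in the command above
        cmd += ["--with", "batch"]
    return cmd


flow_path = "flows/cloud/rf_flow_cloud.py"
local_cmd = build_run_command(flow_path, with_batch=False)  # runs on this machine
cloud_cmd = build_run_command(flow_path)                    # runs on AWS Batch

print(shlex.join(local_cmd))
print(shlex.join(cloud_cmd))
```

The only difference between the two commands is the trailing `--with batch`; the flow file itself is unchanged. To actually start a run, you could pass either list to `subprocess.run(..., check=True)` from the repository root.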

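While this is executing, let's talk about what's new in the code we just wrote, in particular:

- the `@conda` decorator, which pins the libraries a step needs (here, scikit-learn 1.0.2) so that the step runs with the same dependencies locally and in the cloud, and
- the `--with batch` option, which runs every step of the flow on AWS Batch instead of on your local machine.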

Also note that [you can use the `@batch` decorator](https://docs.metaflow.org/metaflow/scaling#using-aws-batch-selectively-with-batch-decorator) to "selectively run some steps locally and some on AWS Batch."

Let's also check out the Metaflow card with:

```bash
python flows/cloud/rf_flow_cloud.py --environment=conda card view start
```

## Parallel training on the cloud

**HANDS-ON:** Write a flow that gets our parallel training/branching example from Lesson 2 working:

```python
%%writefile ../flows/cloud/tree_branch_flow_cloud_student.py

from metaflow import FlowSpec, step, card, conda
import json


class Branch_Flow_Cloud(FlowSpec):
    """
    Train multiple tree-based methods.
    """
    ____

if __name__ == "__main__":
    Branch_Flow_Cloud()
```

Execute the above from the command line with:

```bash
python flows/cloud/tree_branch_flow_cloud_student.py --environment=conda run --with batch
```

Let's also check out the Metaflow card:

```bash
python flows/cloud/tree_branch_flow_cloud_student.py --environment=conda card view start
```

## Lesson Recap

In this lesson, you learnt

* How to burst to the cloud with your ML workflows!
* How to use `--with batch` to access cloud compute.
* How to define step-level dependencies using the `@conda` decorator.

Ready to take it to the next level with more powerful hardware? Check out this guide on [scaling model training to GPUs while tuning many models in parallel](https://outerbounds.com/docs/scale-model-training-and-tuning).