# Metaflow and the MLOps ecosystem

_Human-centricity_ is a foundational principle of Metaflow. As a result, Metaflow strives to be compatible with the other ML tools that you already use (and ones you may want to use!). In this lesson, we'll show how to incorporate two _types of tools_, those for
* experiment tracking and
* data validation.

We'll be using Weights & Biases for the former and Great Expectations for the latter, but keep in mind that Metaflow is agnostic with respect to the other tools you use. Let's jump in:

## Experiment Tracking

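Experiment tracking means recording the parameters, metrics, and plots produced by each training run in one place, so that runs can be compared and reproduced later. Here, we add a `monitor` step to the flow that logs diagnostic plots for a random forest classifier to Weights & Biases (wandb).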

Note that I've already logged into wandb from my terminal (with `wandb login`). If you prefer, you can instead provide your API key through the `WANDB_API_KEY` environment variable.
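
If you'd rather not log in interactively, W&B can also read an API key from the `WANDB_API_KEY` environment variable. The following is a minimal sketch of a pre-flight check you could run before starting a flow; the helper name `wandb_key_available` is hypothetical and not part of W&B or Metaflow.

```python
import os


def wandb_key_available(env=None):
    """Return True if a non-empty WANDB_API_KEY is set.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This helper is illustrative only and is not part of wandb or Metaflow.
    """
    env = os.environ if env is None else env
    return bool(env.get("WANDB_API_KEY", "").strip())


if __name__ == "__main__":
    if wandb_key_available():
        print("WANDB_API_KEY found; wandb can authenticate without a prompt.")
    else:
        print("WANDB_API_KEY not set; run `wandb login` in a terminal first.")
```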

In [1]:
%%writefile ../flows/rf_flow_monitor.py
from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile, card
import json

class ClassificationFlow(FlowSpec):
    """
    train a random forest
    """
    @card 
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets
        from sklearn.model_selection import train_test_split

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.labels = self.iris['target_names']

        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(self.X, self.y, test_size=0.2)
        self.next(self.rf_model)
        

    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.next(self.train)

        
        
    @step
    def train(self):
        """
        Train the model
        """
        import wandb
        from sklearn.model_selection import cross_val_score
        self.clf.fit(self.X_train, self.y_train)
        self.y_pred = self.clf.predict(self.X_test)
        self.y_probs = self.clf.predict_proba(self.X_test)
        self.next(self.monitor)
        

    
        
    @step
    def monitor(self):
        """
        plot some things using an experiment tracker
        
        """
        import wandb
        wandb.init(project="mf-rf-wandb", entity="hugobowne", name="mf-tutorial-iris")

        wandb.sklearn.plot_class_proportions(self.y_train, self.y_test, self.labels)
        wandb.sklearn.plot_learning_curve(self.clf, self.X_train, self.y_train)
        wandb.sklearn.plot_roc(self.y_test, self.y_probs, self.labels)
        wandb.sklearn.plot_precision_recall(self.y_test, self.y_probs, self.labels)
        wandb.sklearn.plot_feature_importances(self.clf)

        wandb.sklearn.plot_classifier(self.clf, 
                              self.X_train, self.X_test, 
                              self.y_train, self.y_test, 
                              self.y_pred, self.y_probs, 
                              self.labels, 
                              is_binary=False,  # iris has three classes
                              model_name='RandomForest')

        wandb.finish()
        self.next(self.end)
    
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()


Overwriting ../flows/rf_flow_monitor.py


Execute the above from the command line with

```bash
python ../flows/rf_flow_monitor.py run
```

or, from a notebook cell, by prefixing the command with `!`:

In [2]:
! python ../flows/rf_flow_monitor.py run

[35m[1mMetaflow 2.5.0[0m[35m[22m executing [0m[31m[1mClassificationFlow[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:hba[0m[35m[22m[K[0m[35m[22m[0m
[35m[22mValidating your flow...[K[0m[35m[22m[0m
[32m[1m    The graph looks good![K[0m[32m[1m[0m
[35m[22mRunning pylint...[K[0m[35m[22m[0m
[32m[1m    Pylint is happy![K[0m[32m[1m[0m
[35m2022-03-26 11:10:43.293 [0m[1mWorkflow starting (run-id 7824):[0m
[35m2022-03-26 11:10:49.260 [0m[32m[7824/start/138382 (pid 3358)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:11:34.641 [0m[32m[7824/start/138382 (pid 3358)] [0m[1mTask finished successfully.[0m
[35m2022-03-26 11:11:38.541 [0m[32m[7824/rf_model/138383 (pid 3405)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:11:55.406 [0m[32m[7824/rf_model/138383 (pid 3405)] [0m[1mTask finished successfully.[0m
[35m2022-03-26 11:11:59.280 [0m[32m[7824/train/138384 (pid 3414)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:12:21.44

In [3]:
import wandb
%wandb hugobowne/mf-rf-wandb
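
Every run of the flow above is logged to W&B under the same name, `mf-tutorial-iris`, which can make runs hard to tell apart in the dashboard. One option, sketched here, is to derive the W&B run name from Metaflow's own run metadata. Inside a step, `current.flow_name` and `current.run_id` (available via `from metaflow import current`) provide these values; the helper `wandb_run_name` and its `prefix` argument are hypothetical names used for illustration.

```python
def wandb_run_name(flow_name, run_id, prefix="mf"):
    """Build a W&B run name that points back to a specific Metaflow run."""
    return f"{prefix}-{flow_name}-{run_id}"


# Inside a Metaflow step this would be called roughly as:
#   from metaflow import current
#   wandb.init(project="mf-rf-wandb",
#              name=wandb_run_name(current.flow_name, current.run_id))
print(wandb_run_name("ClassificationFlow", 7824))
```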

## Data Validation

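Data validation checks that incoming data meets explicit expectations (for example, value ranges and column types) before it reaches the model. The rest of this notebook is mostly code. In a flow, a validation step has roughly this shape: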

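```
@step
def data_validation(self):
    """
    Perform data validation with great_expectations
    """
    # `current` exposes metadata about the running flow (run id, flow name)
    from metaflow import current
    # `data_validation` is assumed to be a local helper module (not shown here)
    from data_validation import validate_data

    validate_data(current.run_id, current.flow_name, self.data_paths)

    self.next(...)
```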
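
The `validate_data` helper isn't shown here. To give a feel for what a single expectation checks, here is a pure-Python stand-in (it does not use Great Expectations) for a range check in the spirit of `expect_column_values_to_be_between`. The function name and the shape of the returned dictionary are assumptions made for illustration.

```python
def expect_values_between(values, min_value, max_value):
    """Check that every value lies in [min_value, max_value].

    Returns a small report with a success flag and the offending values,
    loosely modeled on how validation tools summarize a failed check.
    """
    unexpected = [v for v in values if not (min_value <= v <= max_value)]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_values": unexpected,
    }


sepal_lengths = [5.1, 4.9, 4.7, 4.6]
print(expect_values_between(sepal_lengths, 0, 10))

# A physically impossible measurement is flagged
print(expect_values_between([-1, 4.9, 4.7], 0, 10))
```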

In [4]:
%%writefile ../flows/iris_validate.py

from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile, card
import json

class ClassificationFlow(FlowSpec):
    """
    train a random forest
    """
    @card 
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets
        from sklearn.model_selection import train_test_split

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.labels = self.iris['target_names']

        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(self.X, self.y, test_size=0.2)
        self.next(self.data_validation)
        


    @step
    def data_validation(self):
        """
        Perform data validation with great_expectations
        """
        import pandas as pd
        from ruamel import yaml
        import great_expectations as ge
        from great_expectations.core.batch import RuntimeBatchRequest

        context = ge.get_context()

        
        from sklearn import datasets
        iris = datasets.load_iris()
        df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])
        df["target"] = iris['target']
        # df["sepal length (cm)"][0] = -1


        checkpoint_config = {
            "name": "flowers-test-flow-checkpoint",
            "config_version": 1,
            "class_name": "SimpleCheckpoint",
            "run_name_template": "%Y%m%d-%H%M%S-flower-power",
            "validations": [
                {
                    "batch_request": {
                        "datasource_name": "flowers",
                        "data_connector_name": "default_runtime_data_connector_name",
                        "data_asset_name": "iris",
                    },
                    "expectation_suite_name": "flowers-testing-suite",
                }
            ],
        }
        context.add_checkpoint(**checkpoint_config)


        results = context.run_checkpoint(
            checkpoint_name="flowers-test-flow-checkpoint",
            batch_request={
                "runtime_parameters": {"batch_data": df},
                "batch_identifiers": {
                    "default_identifier_name": "<YOUR MEANINGFUL IDENTIFIER>"
                },
            },
        )
        context.build_data_docs()
        context.open_data_docs()

        self.next(self.end)
        
        
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()


Overwriting ../flows/iris_validate.py


In [5]:
! python ../flows/iris_validate.py run

[35m[1mMetaflow 2.5.0[0m[35m[22m executing [0m[31m[1mClassificationFlow[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:hba[0m[35m[22m[K[0m[35m[22m[0m
[35m[22mValidating your flow...[K[0m[35m[22m[0m
[32m[1m    The graph looks good![K[0m[32m[1m[0m
[35m[22mRunning pylint...[K[0m[35m[22m[0m
[32m[1m    Pylint is happy![K[0m[32m[1m[0m
[35m2022-03-26 11:14:54.195 [0m[1mWorkflow starting (run-id 7825):[0m
[35m2022-03-26 11:15:00.017 [0m[32m[7825/start/138388 (pid 3495)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:15:44.655 [0m[32m[7825/start/138388 (pid 3495)] [0m[1mTask finished successfully.[0m
[35m2022-03-26 11:15:48.437 [0m[32m[7825/data_validation/138389 (pid 3522)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:15:56.744 [0m[32m[7825/data_validation/138389 (pid 3522)] [0m[22mDeprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations[0m
[35m2022-03-26 1

## Combination station!!

In [6]:
%%writefile ../flows/rf_flow_monitor_validate.py

from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile, card
import json

class ClassificationFlow(FlowSpec):
    """
    train a random forest
    """
    @card 
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets
        from sklearn.model_selection import train_test_split

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.labels = self.iris['target_names']

        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(self.X, self.y, test_size=0.2)
        self.next(self.data_validation)
        

    @step
    def data_validation(self):
        """
        Perform data validation with great_expectations
        """
        import pandas as pd
        from ruamel import yaml
        import great_expectations as ge
        from great_expectations.core.batch import RuntimeBatchRequest

        context = ge.get_context()

        
        from sklearn import datasets
        iris = datasets.load_iris()
        df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])
        df["target"] = iris['target']
        # df["sepal length (cm)"][0] = -1


        checkpoint_config = {
            "name": "flowers-test-flow-checkpoint",
            "config_version": 1,
            "class_name": "SimpleCheckpoint",
            "run_name_template": "%Y%m%d-%H%M%S-flower-power",
            "validations": [
                {
                    "batch_request": {
                        "datasource_name": "flowers",
                        "data_connector_name": "default_runtime_data_connector_name",
                        "data_asset_name": "iris",
                    },
                    "expectation_suite_name": "flowers-testing-suite",
                }
            ],
        }
        context.add_checkpoint(**checkpoint_config)


        results = context.run_checkpoint(
            checkpoint_name="flowers-test-flow-checkpoint",
            batch_request={
                "runtime_parameters": {"batch_data": df},
                "batch_identifiers": {
                    "default_identifier_name": "<YOUR MEANINGFUL IDENTIFIER>"
                },
            },
        )
        context.build_data_docs()
        context.open_data_docs()

        self.next(self.rf_model)
        
        
    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.next(self.train)

        
        
    @step
    def train(self):
        """
        Train the model
        """
        import wandb
        from sklearn.model_selection import cross_val_score
        self.clf.fit(self.X_train, self.y_train)
        self.y_pred = self.clf.predict(self.X_test)
        self.y_probs = self.clf.predict_proba(self.X_test)
        self.next(self.monitor)
        

    
        
    @step
    def monitor(self):
        """
        plot some things using an experiment tracker
        
        """
        import wandb
        wandb.init(project="mf-rf-wandb", entity="hugobowne", name="mf-tutorial-iris")

        wandb.sklearn.plot_class_proportions(self.y_train, self.y_test, self.labels)
        wandb.sklearn.plot_learning_curve(self.clf, self.X_train, self.y_train)
        wandb.sklearn.plot_roc(self.y_test, self.y_probs, self.labels)
        wandb.sklearn.plot_precision_recall(self.y_test, self.y_probs, self.labels)
        wandb.sklearn.plot_feature_importances(self.clf)

        wandb.sklearn.plot_classifier(self.clf, 
                              self.X_train, self.X_test, 
                              self.y_train, self.y_test, 
                              self.y_pred, self.y_probs, 
                              self.labels, 
                              is_binary=False,  # iris has three classes
                              model_name='RandomForest')

        wandb.finish()
        self.next(self.end)
    
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()


Overwriting ../flows/rf_flow_monitor_validate.py


In [7]:
! python ../flows/rf_flow_monitor_validate.py run

[35m[1mMetaflow 2.5.0[0m[35m[22m executing [0m[31m[1mClassificationFlow[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:hba[0m[35m[22m[K[0m[35m[22m[0m
[35m[22mValidating your flow...[K[0m[35m[22m[0m
[32m[1m    The graph looks good![K[0m[32m[1m[0m
[35m[22mRunning pylint...[K[0m[35m[22m[0m
[32m[1m    Pylint is happy![K[0m[32m[1m[0m
[35m2022-03-26 11:16:45.707 [0m[1mWorkflow starting (run-id 7826):[0m
[35m2022-03-26 11:16:51.487 [0m[32m[7826/start/138392 (pid 3580)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:17:36.845 [0m[32m[7826/start/138392 (pid 3580)] [0m[1mTask finished successfully.[0m
[35m2022-03-26 11:17:40.769 [0m[32m[7826/data_validation/138393 (pid 3630)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:17:49.399 [0m[32m[7826/data_validation/138393 (pid 3630)] [0m[22mDeprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations[0m
[35m2022-03-26 1

In [8]:
# import wandb
%wandb hugobowne/mf-rf-wandb

## Data Not Validated

Let's run the combined flow again, but this time corrupt the data slightly (the first petal length is set to -1) so that it doesn't pass all our tests. This is a quick way to make sure that the tests are working:

In [9]:
%%writefile ../flows/rf_flow_monitor_validate_bad_data.py

from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile, card
import json

class ClassificationFlow(FlowSpec):
    """
    train a random forest
    """
    @card 
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets
        from sklearn.model_selection import train_test_split

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.labels = self.iris['target_names']

        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(self.X, self.y, test_size=0.2)
        self.next(self.data_validation)
        

    @step
    def data_validation(self):
        """
        Perform data validation with great_expectations
        """
        import pandas as pd
        from ruamel import yaml
        import great_expectations as ge
        from great_expectations.core.batch import RuntimeBatchRequest

        context = ge.get_context()

        
        from sklearn import datasets
        iris = datasets.load_iris()
        df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])
        df["target"] = iris['target']
        # Corrupt one value; .loc avoids pandas chained-assignment pitfalls
        df.loc[0, "petal length (cm)"] = -1


        checkpoint_config = {
            "name": "flowers-test-flow-checkpoint",
            "config_version": 1,
            "class_name": "SimpleCheckpoint",
            "run_name_template": "%Y%m%d-%H%M%S-flower-power",
            "validations": [
                {
                    "batch_request": {
                        "datasource_name": "flowers",
                        "data_connector_name": "default_runtime_data_connector_name",
                        "data_asset_name": "iris",
                    },
                    "expectation_suite_name": "flowers-testing-suite",
                }
            ],
        }
        context.add_checkpoint(**checkpoint_config)


        results = context.run_checkpoint(
            checkpoint_name="flowers-test-flow-checkpoint",
            batch_request={
                "runtime_parameters": {"batch_data": df},
                "batch_identifiers": {
                    "default_identifier_name": "<YOUR MEANINGFUL IDENTIFIER>"
                },
            },
        )
        context.build_data_docs()
        context.open_data_docs()

        self.next(self.rf_model)
        
        
    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.next(self.train)

        
        
    @step
    def train(self):
        """
        Train the model
        """
        import wandb
        from sklearn.model_selection import cross_val_score
        self.clf.fit(self.X_train, self.y_train)
        self.y_pred = self.clf.predict(self.X_test)
        self.y_probs = self.clf.predict_proba(self.X_test)
        self.next(self.monitor)
        

    
        
    @step
    def monitor(self):
        """
        plot some things using an experiment tracker
        
        """
        import wandb
        wandb.init(project="mf-rf-wandb", entity="hugobowne", name="mf-tutorial-iris")

        wandb.sklearn.plot_class_proportions(self.y_train, self.y_test, self.labels)
        wandb.sklearn.plot_learning_curve(self.clf, self.X_train, self.y_train)
        wandb.sklearn.plot_roc(self.y_test, self.y_probs, self.labels)
        wandb.sklearn.plot_precision_recall(self.y_test, self.y_probs, self.labels)
        wandb.sklearn.plot_feature_importances(self.clf)

        wandb.sklearn.plot_classifier(self.clf, 
                              self.X_train, self.X_test, 
                              self.y_train, self.y_test, 
                              self.y_pred, self.y_probs, 
                              self.labels, 
                              is_binary=False,  # iris has three classes
                              model_name='RandomForest')

        wandb.finish()
        self.next(self.end)
    
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()


Overwriting ../flows/rf_flow_monitor_validate_bad_data.py


In [10]:
! python ../flows/rf_flow_monitor_validate_bad_data.py run

[35m[1mMetaflow 2.5.0[0m[35m[22m executing [0m[31m[1mClassificationFlow[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:hba[0m[35m[22m[K[0m[35m[22m[0m
[35m[22mValidating your flow...[K[0m[35m[22m[0m
[32m[1m    The graph looks good![K[0m[32m[1m[0m
[35m[22mRunning pylint...[K[0m[35m[22m[0m
[32m[1m    Pylint is happy![K[0m[32m[1m[0m
[35m2022-03-26 11:22:51.783 [0m[1mWorkflow starting (run-id 7827):[0m
[35m2022-03-26 11:22:57.656 [0m[32m[7827/start/138399 (pid 3768)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:23:41.731 [0m[32m[7827/start/138399 (pid 3768)] [0m[1mTask finished successfully.[0m
[35m2022-03-26 11:23:45.541 [0m[32m[7827/data_validation/138400 (pid 3808)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:23:54.086 [0m[32m[7827/data_validation/138400 (pid 3808)] [0m[22mDeprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations[0m
[35m2022-03-26 1
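
As written, the `data_validation` step stores the checkpoint output in `results` but never inspects it, so the flow carries on to training whatever the outcome. If you want bad data to stop the run, one approach is to raise an exception when validation fails. The sketch below assumes the checkpoint result can be read like a mapping with a boolean `"success"` entry; check this against your Great Expectations version. `DataValidationError` and `require_valid` are hypothetical names.

```python
class DataValidationError(Exception):
    """Raised when a validation result reports failure."""


def require_valid(results):
    """Raise DataValidationError unless results["success"] is truthy.

    `results` is assumed to behave like a mapping with a "success" key.
    """
    if not results["success"]:
        raise DataValidationError("Data validation failed; see Data Docs.")
    return results


# Demonstration with plain dictionaries standing in for real results
require_valid({"success": True})
try:
    require_valid({"success": False})
except DataValidationError as err:
    print(f"Stopping flow: {err}")
```

In the step, a call like `require_valid(results)` would go right after `context.run_checkpoint(...)`. An uncaught exception makes the Metaflow task fail, so the later steps don't run.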

## Deploying your model

In [13]:
%%writefile ../flows/RF-deploy.py


from metaflow import FlowSpec, step, Parameter, JSONType, IncludeFile, card, S3
import json

class ClassificationFlow(FlowSpec):
    """
    train a random forest
    """




    @card 
    @step
    def start(self):
        """
        Load the data
        """
        #Import scikit-learn dataset library
        from sklearn import datasets
        from sklearn.model_selection import train_test_split

        #Load dataset
        self.iris = datasets.load_iris()
        self.X = self.iris['data']
        self.y = self.iris['target']
        self.labels = self.iris['target_names']

        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(self.X, self.y, test_size=0.2)
        self.next(self.rf_model)
        


        
        
    @step
    def rf_model(self):
        """
        build random forest model
        """
        from sklearn.ensemble import RandomForestClassifier
        
        
        self.clf = RandomForestClassifier(n_estimators=10, max_depth=None,
            min_samples_split=2, random_state=0)
        self.next(self.train)

        
        
    @step
    def train(self):
        """
        Train the model
        """
        #import wandb
        from sklearn.model_selection import cross_val_score
        self.clf.fit(self.X_train, self.y_train)
        self.y_pred = self.clf.predict(self.X_test)
        self.y_probs = self.clf.predict_proba(self.X_test)
        self.next(self.deploy)
        

    
        

    

    @step
    def deploy(self):
        """
        Use SageMaker to deploy the model as a stand-alone, PaaS endpoint, with our choice of the underlying
        Docker image and hardware capabilities.

        Available images for inferences can be chosen from AWS official list:
        https://github.com/aws/deep-learning-containers/blob/master/available_images.md

        Once the endpoint is deployed, you can add a further step with for example behavioral testing, to
        ensure model robustness (e.g. see https://arxiv.org/pdf/2005.04118.pdf). Here, we just "prove" that
        the endpoint is up and running!

        also see here: https://github.com/jacopotagliabue/FREE_7773/blob/main/mlsys/training/small_flow_sagemaker.py

        """
        import os
        import time
        import joblib
        import shutil
        import tarfile
        from sagemaker.sklearn import SKLearnModel


        model_name = "model"
        local_tar_name = "model.tar.gz"

        os.makedirs(model_name, exist_ok=True)
        # save model to local folder
        joblib.dump(self.clf, "{}/{}.joblib".format(model_name, model_name))
        # save model as tar.gz
        with tarfile.open(local_tar_name, mode="w:gz") as _tar:
            _tar.add(model_name, recursive=True)
        # save model onto S3
        with S3(run=self) as s3:
            with open(local_tar_name, "rb") as in_file:
                data = in_file.read()
                self.model_s3_path = s3.put(local_tar_name, data)
                print('Model saved at {}'.format(self.model_s3_path))
        # remove local model folder and tar
        shutil.rmtree(model_name)
        os.remove(local_tar_name)
        # initialize SageMaker SKLearn Model
        sklearn_model = SKLearnModel(model_data=self.model_s3_path,
                                     role='oleg2-sagemaker-mztdpcvj',
                                     entry_point='../flows/sm_entry_point.py',
                                     framework_version='0.23-1',
                                     code_location='s3://oleg2-s3-mztdpcvj/sagemaker/')
        endpoint_name = 'HBA-RF-endpoint-{}'.format(int(round(time.time() * 1000)))
        print("\n\n================\nEndpoint name is: {}\n\n".format(endpoint_name))
        # deploy model
        predictor = sklearn_model.deploy(instance_type='ml.c5.2xlarge',
                                         initial_instance_count=1,
                                         endpoint_name=endpoint_name)
        # prepare a test input and check response
        test_input = self.X
        result = predictor.predict(test_input)
        print(result)
        
        self.next(self.end)
    
    @step
    def end(self):
        """
        End of flow, yo!
        """
        print("ClassificationFlow is all done.")


if __name__ == "__main__":
    ClassificationFlow()




Overwriting ../flows/RF-deploy.py


In [14]:
! python ../flows/RF-deploy.py run

[35m[1mMetaflow 2.5.0[0m[35m[22m executing [0m[31m[1mClassificationFlow[0m[35m[22m[0m[35m[22m for [0m[31m[1muser:hba[0m[35m[22m[K[0m[35m[22m[0m
[35m[22mValidating your flow...[K[0m[35m[22m[0m
[32m[1m    The graph looks good![K[0m[32m[1m[0m
[35m[22mRunning pylint...[K[0m[35m[22m[0m
[32m[1m    Pylint is happy![K[0m[32m[1m[0m
[35m2022-03-26 11:29:29.956 [0m[1mWorkflow starting (run-id 7829):[0m
[35m2022-03-26 11:29:35.861 [0m[32m[7829/start/138411 (pid 4032)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:30:20.303 [0m[32m[7829/start/138411 (pid 4032)] [0m[1mTask finished successfully.[0m
[35m2022-03-26 11:30:24.100 [0m[32m[7829/rf_model/138412 (pid 4063)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:30:40.441 [0m[32m[7829/rf_model/138412 (pid 4063)] [0m[1mTask finished successfully.[0m
[35m2022-03-26 11:30:44.236 [0m[32m[7829/train/138413 (pid 4092)] [0m[1mTask is starting.[0m
[35m2022-03-26 11:31:05.60
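
The `deploy` step packs the model into `model.tar.gz` by adding the whole `model` directory, so the file ends up nested at `model/model.joblib` inside the archive rather than at its root. The entry point script (not shown in this notebook) has to look for it at the matching path. This stdlib-only sketch reproduces the packaging with a placeholder file to show the resulting layout; `archive_members` is a hypothetical helper, and the placeholder bytes stand in for a real `joblib.dump`.

```python
import os
import tarfile
import tempfile


def archive_members(model_name="model"):
    """Pack <model_name>/<model_name>.joblib as the deploy step does
    and return the member names found in the resulting tar.gz."""
    with tempfile.TemporaryDirectory() as workdir:
        model_dir = os.path.join(workdir, model_name)
        os.makedirs(model_dir, exist_ok=True)
        # Placeholder content; the real flow writes the model with joblib.dump
        with open(os.path.join(model_dir, f"{model_name}.joblib"), "wb") as f:
            f.write(b"placeholder")
        tar_path = os.path.join(workdir, "model.tar.gz")
        with tarfile.open(tar_path, mode="w:gz") as tar:
            # arcname keeps member paths relative to the working directory
            tar.add(model_dir, arcname=model_name, recursive=True)
        with tarfile.open(tar_path, mode="r:gz") as tar:
            return tar.getnames()


print(archive_members())
```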

In [15]:
import boto3
import pandas as pd
from sklearn import datasets


iris = datasets.load_iris()
X = iris['data']

# Create a low-level client representing Amazon SageMaker Runtime
sagemaker_runtime = boto3.client("sagemaker-runtime", region_name='us-west-2')

# The name of the endpoint. The name must be unique within an AWS Region in your AWS account. 

endpoint_name='HBA-RF-endpoint-1648254679131'


# csv serialization
response = sagemaker_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=pd.DataFrame(X).to_csv(header=False, index=False),
    ContentType="text/csv",
)

print(response["Body"].read())

b'[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]'
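
The endpoint returns the predictions as bytes containing a JSON-style list of class indices. A small sketch of how that payload could be decoded and mapped back to species names follows; the sample bytes are a shortened, made-up payload in the same format, and `decode_predictions` is a hypothetical helper. The label order is the one scikit-learn's iris dataset uses.

```python
import json

# Class names in the order used by sklearn.datasets.load_iris()
IRIS_LABELS = ["setosa", "versicolor", "virginica"]


def decode_predictions(body, labels=IRIS_LABELS):
    """Turn a response body like b'[0, 1, 2]' into label strings."""
    indices = json.loads(body)
    return [labels[i] for i in indices]


sample_body = b"[0, 0, 1, 2, 1]"  # shortened stand-in for response["Body"].read()
print(decode_predictions(sample_body))
```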

