diff --git a/README.md b/README.md index aaa8696ce..33d704d10 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ This repository contains example notebooks demonstrating the [Azure Machine Learning](https://azure.microsoft.com/en-us/services/machine-learning-service/) Python SDK which allows you to build, train, deploy and manage machine learning solutions using Azure. The AML SDK allows you the choice of using local or cloud compute resources, while managing and maintaining the complete data science workflow from the cloud. -![Azure ML Workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/service/media/concept-azure-machine-learning-architecture/workflow.png) +![Azure ML Workflow](https://raw.githubusercontent.com/MicrosoftDocs/azure-docs/master/articles/machine-learning/media/concept-azure-machine-learning-architecture/workflow.png) ## Quick installation @@ -13,15 +13,15 @@ Read more detailed instructions on [how to set up your environment](./NBSETUP.md ## How to navigate and use the example notebooks? If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, you should always run the [Configuration](./configuration.ipynb) notebook first when setting up a notebook library on a new machine or in a new environment. It configures your notebook library to connect to an Azure Machine Learning workspace, and sets up your workspace and compute to be used by many of the other examples. -This [index](.index.md) should assist in navigating the Azure Machine Learning notebook samples and encourage efficient retrieval of topics and content. +This [index](./index.md) should assist in navigating the Azure Machine Learning notebook samples and encourage efficient retrieval of topics and content. If you want to... - * ...try out and explore Azure ML, start with image classification tutorials: [Part 1 (Training)](./tutorials/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/img-classification-part2-deploy.ipynb). + * ...try out and explore Azure ML, start with image classification tutorials: [Part 1 (Training)](./tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb) and [Part 2 (Deployment)](./tutorials/image-classification-mnist-data/img-classification-part2-deploy.ipynb). * ...learn about experimentation and tracking run history, first [train within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then try [training on remote VM](./how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb) and [using logging APIs](./how-to-use-azureml/training/logging-api/logging-api.ipynb). * ...train deep learning models at scale, first learn about [Machine Learning Compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and then try [distributed hyperparameter tuning](./how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) and [distributed training](./how-to-use-azureml/training-with-deep-learning/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb). 
- * ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [register and manage models, and create Docker images](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), and [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb). - * ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), learn how to [register and manage models](./how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring). + * ...deploy models as a realtime scoring service, first learn the basics by [training within Notebook and deploying to Azure Container Instance](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then learn how to [production deploy models on Azure Kubernetes Cluster](./how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb). + * ...deploy models as a batch scoring service, first [train a model within Notebook](./how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb), then [create Machine Learning Compute for scoring compute](./how-to-use-azureml/training/train-on-amlcompute/train-on-amlcompute.ipynb), and [use Machine Learning Pipelines to deploy your model](https://aka.ms/pl-batch-scoring). * ...monitor your deployed models, learn about using [App Insights](./how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb). ## Tutorials diff --git a/configuration.ipynb b/configuration.ipynb index 35401ea3b..91163dc85 100644 --- a/configuration.ipynb +++ b/configuration.ipynb @@ -103,7 +103,7 @@ "source": [ "import azureml.core\n", "\n", - "print(\"This notebook was created using version 1.0.76.2 of the Azure ML SDK\")\n", + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" ] }, diff --git a/how-to-use-azureml/README.md b/how-to-use-azureml/README.md index 78427ce05..c91109331 100644 --- a/how-to-use-azureml/README.md +++ b/how-to-use-azureml/README.md @@ -9,7 +9,6 @@ As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) not * [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure. * [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs. * [logging-api](./track-and-monitor-experiments/logging-api): Learn about the details of logging metrics to run history. -* [register-model-create-image-deploy-service](./deployment/register-model-create-image-deploy-service): Learn about the details of model management. 
* [production-deploy-to-aks](./deployment/production-deploy-to-aks) Deploy a model to production at scale on Azure Kubernetes Service. * [enable-app-insights-in-production-service](./deployment/enable-app-insights-in-production-service) Learn how to use App Insights with production web service. diff --git a/how-to-use-azureml/automated-machine-learning/README.md b/how-to-use-azureml/automated-machine-learning/README.md index 70c29dcbc..ec5aa15a8 100644 --- a/how-to-use-azureml/automated-machine-learning/README.md +++ b/how-to-use-azureml/automated-machine-learning/README.md @@ -117,7 +117,7 @@ jupyter notebook - Simple example of using automated ML for regression - Uses azure compute for training -- [auto-ml-regression-hardware-performance-explanation-and-featurization.ipynb](regression-hardware-performance-explanation-and-featurization/auto-ml-regression-hardware-performance-explanation-and-featurization.ipynb) +- [auto-ml-regression-explanation-featurization.ipynb](regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb) - Dataset: Hardware Performance Dataset - Shows featurization and excplanation - Uses azure compute for training @@ -144,7 +144,7 @@ jupyter notebook - Dataset: forecasting for a bike-sharing - Example of training an automated ML forecasting model on multiple time-series -- [automl-forecasting-function.ipynb](forecasting-high-frequency/automl-forecasting-function.ipynb) +- [auto-ml-forecasting-function.ipynb](forecasting-high-frequency/auto-ml-forecasting-function.ipynb) - Example of training an automated ML forecasting model on multiple time-series - [auto-ml-forecasting-beer-remote.ipynb](forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb) @@ -152,7 +152,7 @@ jupyter notebook - Beer Production Forecasting - [auto-ml-continuous-retraining.ipynb](continuous-retraining/auto-ml-continuous-retraining.ipynb) - - Continous retraining using Pipelines and Time-Series TabularDataset + - Continuous retraining using Pipelines and Time-Series TabularDataset - [auto-ml-classification-text-dnn.ipynb](classification-text-dnn/auto-ml-classification-text-dnn.ipynb) - Classification with text data using deep learning in AutoML @@ -197,6 +197,17 @@ If automl_setup_linux.sh fails on Ubuntu Linux with the error: `unable to execut 4) Check that the region is one of the supported regions: `eastus2`, `eastus`, `westcentralus`, `southeastasia`, `westeurope`, `australiaeast`, `westus2`, `southcentralus` 5) Check that you have access to the region using the Azure Portal. +## import AutoMLConfig fails after upgrade from before 1.0.76 to 1.0.76 or later +There were package changes in automated machine learning version 1.0.76, which require the previous version to be uninstalled before upgrading to the new version. +If you have manually upgraded from a version of automated machine learning before 1.0.76 to 1.0.76 or later, you may get the error: +`ImportError: cannot import name 'AutoMLConfig'` + +This can be resolved by running: +`pip uninstall azureml-train-automl` and then +`pip install azureml-train-automl` + +The automl_setup.cmd script does this automatically. + ## workspace.from_config fails If the call `ws = Workspace.from_config()` fails: 1) Make sure that you have run the `configuration.ipynb` notebook successfully. 
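If you want to confirm that the uninstall/reinstall described above actually fixed the import, a minimal sanity check (assuming the same `azureml-train-automl` package and `AutoMLConfig` import used throughout these notebooks) is:

```python
# Sanity check after `pip uninstall azureml-train-automl` / `pip install azureml-train-automl`.
# If the upgrade left a broken install, the import below raises
# "ImportError: cannot import name 'AutoMLConfig'".
import azureml.core
from azureml.train.automl import AutoMLConfig

print("Azure ML SDK version:", azureml.core.VERSION)
```

If the import succeeds and the printed version is 1.0.76 or later, the environment is ready for the AutoML notebooks.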
diff --git a/how-to-use-azureml/automated-machine-learning/automl_env.yml b/how-to-use-azureml/automated-machine-learning/automl_env.yml index 32fd466cd..c8bfe39f3 100644 --- a/how-to-use-azureml/automated-machine-learning/automl_env.yml +++ b/how-to-use-azureml/automated-machine-learning/automl_env.yml @@ -2,8 +2,9 @@ name: azure_automl dependencies: # The python interpreter version. # Currently Azure ML only supports 3.5.2 and later. -- pip +- pip<=19.3.1 - python>=3.5.2,<3.6.8 +- wheel==0.30.0 - nb_conda - matplotlib==2.1.0 - numpy>=1.16.0,<=1.16.2 @@ -12,8 +13,7 @@ dependencies: - scipy>=1.0.0,<=1.1.0 - scikit-learn>=0.19.0,<=0.20.3 - pandas>=0.22.0,<=0.23.4 -- py-xgboost<=0.80 -- pyarrow>=0.11.0 +- py-xgboost<=0.90 - fbprophet==0.5 - pytorch=1.1.0 - cudatoolkit=9.0 @@ -21,18 +21,17 @@ dependencies: - pip: # Required packages for AzureML execution, history, and data preparation. - azureml-defaults + - azureml-dataprep[pandas] - azureml-train-automl - azureml-train - azureml-widgets - - azureml-explain-model - azureml-pipeline - - azureml-contrib-interpret - pytorch-transformers==1.0.0 - spacy==2.1.8 - - joblib - - onnxruntime==0.4.0 + - onnxruntime==1.0.0 - https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz channels: +- anaconda - conda-forge - pytorch diff --git a/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml b/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml index dc5b313a4..697494672 100644 --- a/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml +++ b/how-to-use-azureml/automated-machine-learning/automl_env_mac.yml @@ -2,9 +2,10 @@ name: azure_automl dependencies: # The python interpreter version. # Currently Azure ML only supports 3.5.2 and later. -- pip +- pip<=19.3.1 - nomkl - python>=3.5.2,<3.6.8 +- wheel==0.30.0 - nb_conda - matplotlib==2.1.0 - numpy>=1.16.0,<=1.16.2 @@ -14,7 +15,6 @@ dependencies: - scikit-learn>=0.19.0,<=0.20.3 - pandas>=0.22.0,<0.23.0 - py-xgboost<=0.80 -- pyarrow>=0.11.0 - fbprophet==0.5 - pytorch=1.1.0 - cudatoolkit=9.0 @@ -22,18 +22,17 @@ dependencies: - pip: # Required packages for AzureML execution, history, and data preparation. - azureml-defaults + - azureml-dataprep[pandas] - azureml-train-automl - azureml-train - azureml-widgets - - azureml-explain-model - azureml-pipeline - - azureml-contrib-interpret - pytorch-transformers==1.0.0 - spacy==2.1.8 - - joblib - - onnxruntime==0.4.0 + - onnxruntime==1.0.0 - https://aka.ms/automl-resources/packages/en_core_web_sm-2.1.0.tar.gz channels: +- anaconda - conda-forge - pytorch diff --git a/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb b/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb index 1423925a0..351332c03 100644 --- a/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb +++ b/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb @@ -92,6 +92,49 @@ "from azureml.explain.model._internal.explanation_client import ExplanationClient" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Accessing the Azure ML workspace requires authentication with Azure.\n", + "\n", + "The default authentication is interactive authentication using the default tenant. Executing the `ws = Workspace.from_config()` line in the cell below will prompt for authentication the first time that it is run.\n", + "\n", + "If you have multiple Azure tenants, you can specify the tenant by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n", + "\n", + "```\n", + "from azureml.core.authentication import InteractiveLoginAuthentication\n", + "auth = InteractiveLoginAuthentication(tenant_id = 'mytenantid')\n", + "ws = Workspace.from_config(auth = auth)\n", + "```\n", + "\n", + "If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the `ws = Workspace.from_config()` line in the cell below with the following:\n", + "\n", + "```\n", + "from azureml.core.authentication import ServicePrincipalAuthentication\n", + "auth = auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')\n", + "ws = Workspace.from_config(auth = auth)\n", + "```\n", + "For more details, see [aka.ms/aml-notebook-auth](http://aka.ms/aml-notebook-auth)" + ] + }, { "cell_type": "code", "execution_count": null, @@ -106,7 +149,6 @@ "experiment=Experiment(ws, experiment_name)\n", "\n", "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", @@ -134,35 +176,22 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.compute import AmlCompute\n", - "from azureml.core.compute import ComputeTarget\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", "\n", - "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"cpu-cluster-4\"\n", + "# Choose a name for your CPU cluster\n", + "cpu_cluster_name = \"cpu-cluster-4\"\n", "\n", - "found = False\n", - "# Check if this compute target already exists in the workspace.\n", - "cts = ws.compute_targets\n", - "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[amlcompute_cluster_name]\n", - " \n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " max_nodes = 6)\n", - "\n", - " # Create the cluster.\n", - " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", - " \n", - "print('Checking cluster status...')\n", - "# Can poll for a minimum number of nodes and for a specific timeout.\n", - "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", - "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", - " \n", 
- "# For a more detailed view of current AmlCompute status, use get_status()." + "# Verify that cluster does not exist already\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)\n", + " print('Found existing cluster, use it.')\n", + "except ComputeTargetException:\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + " max_nodes=6)\n", + " compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)\n", + "\n", + "compute_target.wait_for_completion(show_output=True)" ] }, { @@ -288,13 +317,12 @@ "|**blacklist_models** | *List* of *strings* indicating machine learning algorithms for AutoML to avoid in this run.

Allowed values for **Classification**<br>LogisticRegression<br>SGD<br>MultinomialNaiveBayes<br>BernoulliNaiveBayes<br>SVM<br>LinearSVM<br>KNN<br>DecisionTree<br>RandomForest<br>ExtremeRandomTrees<br>LightGBM<br>GradientBoosting<br>TensorFlowDNN<br>TensorFlowLinearClassifier<br><br>Allowed values for **Regression**<br>ElasticNet<br>GradientBoosting<br>DecisionTree<br>KNN<br>LassoLars<br>SGD<br>RandomForest<br>ExtremeRandomTrees<br>LightGBM<br>TensorFlowLinearRegressor<br>TensorFlowDNN<br><br>Allowed values for **Forecasting**<br>ElasticNet<br>GradientBoosting<br>DecisionTree<br>KNN<br>LassoLars<br>SGD<br>RandomForest<br>ExtremeRandomTrees<br>LightGBM<br>TensorFlowLinearRegressor<br>TensorFlowDNN<br>Arima<br>Prophet|\n", "| **whitelist_models** | *List* of *strings* indicating machine learning algorithms for AutoML to use in this run. Same values listed above for **blacklist_models** allowed for **whitelist_models**.|\n", "|**experiment_exit_score**| Value indicating the target for *primary_metric*.
Once the target is surpassed the run terminates.|\n", - "|**experiment_timeout_minutes**| Maximum amount of time in minutes that all iterations combined can take before the experiment terminates.|\n", + "|**experiment_timeout_hours**| Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|\n", "|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n", "|**featurization**| 'auto' / 'off' Indicator for whether featurization step should be done automatically or not. Note: If the input data is sparse, featurization cannot be turned on.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n", "|**training_data**|Input dataset, containing both features and label column.|\n", "|**label_column_name**|The name of the label column.|\n", - "|**model_explainability**|Indicate to explain each trained pipeline or not.|\n", "\n", "**_You can find more information about primary metrics_** [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#primary-metric)" ] @@ -306,7 +334,7 @@ "outputs": [], "source": [ "automl_settings = {\n", - " \"experiment_timeout_minutes\" : 20,\n", + " \"experiment_timeout_hours\" : 0.3,\n", " \"enable_early_stopping\" : True,\n", " \"iteration_timeout_minutes\": 5,\n", " \"max_concurrent_iterations\": 4,\n", @@ -326,7 +354,6 @@ " training_data = train_data,\n", " label_column_name = label,\n", " validation_data = validation_dataset,\n", - " model_explainability=True,\n", " **automl_settings\n", " )" ] @@ -370,8 +397,6 @@ "outputs": [], "source": [ "#from azureml.train.automl.run import AutoMLRun\n", - "#experiment_name = 'automl-classification-bmarketing'\n", - "#experiment = Experiment(ws, experiment_name)\n", "#remote_run = AutoMLRun(experiment=experiment, run_id=' last_train_time: - # New data is available since the model was last trained - print("Dataset was last updated on {0}. Retraining...".format(dataset_changed_time)) - train_ds = train_ds.drop_columns(["partition_date"]) - X_train = train_ds.drop_columns( - columns=[args.target_column]).to_pandas_dataframe() - y_train = train_ds.keep_columns( - columns=[args.target_column]).to_pandas_dataframe() - - non_null = y_train[args.target_column].notnull() - y = y_train[non_null] - X = X_train[non_null] - - if not (args.output_x is None and args.output_y is None): - write_output(X, args.output_x) - write_output(y, args.output_y) -else: +if not dataset_changed_time > last_train_time: print("Cancelling run since there is no new data.") run.parent.cancel() +else: + # New data is available since the model was last trained + print("Dataset was last updated on {0}. 
Retraining...".format(dataset_changed_time)) diff --git a/how-to-use-azureml/automated-machine-learning/continuous-retraining/get_data.py b/how-to-use-azureml/automated-machine-learning/continuous-retraining/get_data.py deleted file mode 100644 index 6fe5a6a25..000000000 --- a/how-to-use-azureml/automated-machine-learning/continuous-retraining/get_data.py +++ /dev/null @@ -1,15 +0,0 @@ -import os -import pandas as pd - - -def get_data(): - print("In get_data") - print(os.environ['AZUREML_DATAREFERENCE_output_x']) - X_train = pd.read_csv( - os.environ['AZUREML_DATAREFERENCE_output_x'] + "/part-00000") - y_train = pd.read_csv( - os.environ['AZUREML_DATAREFERENCE_output_y'] + "/part-00000") - - print(X_train.head(3)) - - return {"X": X_train.values, "y": y_train.values.flatten()} diff --git a/how-to-use-azureml/automated-machine-learning/continuous-retraining/upload_weather_data.py b/how-to-use-azureml/automated-machine-learning/continuous-retraining/upload_weather_data.py index 08fd522a4..0dd882e9e 100644 --- a/how-to-use-azureml/automated-machine-learning/continuous-retraining/upload_weather_data.py +++ b/how-to-use-azureml/automated-machine-learning/continuous-retraining/upload_weather_data.py @@ -58,7 +58,7 @@ def get_noaa_data(start_time, end_time): print(traceback.format_exc()) print("Dataset with name {0} not found, registering new dataset.".format(args.ds_name)) register_dataset = True - end_time_last_slice = datetime.today() - relativedelta(weeks=1) + end_time_last_slice = datetime.today() - relativedelta(weeks=2) end_time = datetime.utcnow() train_df = get_noaa_data(end_time_last_slice, end_time) @@ -80,10 +80,10 @@ def get_noaa_data(start_time, end_time): target_path=folder_name, overwrite=True, show_progress=True) - - if register_dataset: - ds = Dataset.Tabular.from_delimited_files(dstor.path("{}/**/*.csv".format( - args.ds_name)), partition_format='/{partition_date:yyyy/MM/dd/hh/mm/ss}/data.csv') - ds.register(ws, name=args.ds_name) else: print("No new data since {0}.".format(end_time_last_slice)) + +if register_dataset: + ds = Dataset.Tabular.from_delimited_files(dstor.path("{}/**/*.csv".format( + args.ds_name)), partition_format='/{partition_date:yyyy/MM/dd/HH/mm/ss}/data.csv') + ds.register(ws, name=args.ds_name) diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb index 7a69898f9..dba224acc 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb +++ b/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb @@ -101,6 +101,23 @@ "from azureml.train.estimator import Estimator" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" + ] + }, { "cell_type": "markdown", "metadata": { @@ -128,7 +145,6 @@ "experiment = Experiment(ws, experiment_name)\n", "\n", "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", @@ -163,7 +179,7 @@ "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# Choose a name for your CPU cluster\n", - "cpu_cluster_name = \"cpu-cluster\"\n", + "cpu_cluster_name = \"beer-cluster\"\n", "\n", "# Verify that cluster does not exist already\n", "try:\n", @@ -218,19 +234,18 @@ "import pandas as pd\n", "from pandas import DataFrame\n", "from pandas import Grouper\n", - "from matplotlib import pyplot\n", "from pandas import concat\n", - "from matplotlib import pyplot\n", "from pandas.plotting import register_matplotlib_converters\n", + "\n", "register_matplotlib_converters()\n", - "plt.tight_layout()\n", "plt.figure(figsize=(20, 10))\n", + "plt.tight_layout()\n", "\n", "plt.subplot(2, 1, 1)\n", "plt.title('Beer Production By Year')\n", "df = pd.read_csv(\"Beer_no_valid_split_train.csv\", parse_dates=True, index_col= 'DATE').drop(columns='grain')\n", "test_df = pd.read_csv(\"Beer_no_valid_split_test.csv\", parse_dates=True, index_col= 'DATE').drop(columns='grain')\n", - "pyplot.plot(df)\n", + "plt.plot(df)\n", "\n", "plt.subplot(2, 1, 2)\n", "plt.title('Beer Production By Month')\n", @@ -239,7 +254,8 @@ "months = DataFrame(months)\n", "months.columns = range(1,13)\n", "months.boxplot()\n", - "pyplot.show()\n" + "\n", + "plt.show()" ] }, { @@ -358,7 +374,7 @@ "\n", "automl_config = AutoMLConfig(task='forecasting', \n", " primary_metric='normalized_root_mean_squared_error',\n", - " experiment_timeout_minutes = 60,\n", + " experiment_timeout_hours = 1,\n", " training_data=train_dataset,\n", " label_column_name=target_column_name,\n", " validation_data=valid_dataset, \n", @@ -538,7 +554,7 @@ "metadata": {}, "outputs": [], "source": [ - "compute_target = ws.compute_targets['cpu-cluster']\n", + "compute_target = ws.compute_targets['beer-cluster']\n", "test_experiment = Experiment(ws, experiment_name + \"_test\")" ] }, diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.yml b/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.yml index 999962ea8..a70c70336 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.yml +++ b/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.yml @@ -1,12 +1,11 @@ name: auto-ml-forecasting-beer-remote dependencies: -- fbprophet==0.5 -- py-xgboost<=0.80 +- py-xgboost<=0.90 - pip: - azureml-sdk + - numpy==1.16.2 + - pandas==0.23.4 - azureml-train-automl - - azureml-train - azureml-widgets - matplotlib - - pandas_ml - - statsmodels + - azureml-train diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/helper.py b/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/helper.py index 825f2b881..0da8e18a8 100644 --- 
a/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/helper.py +++ b/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/helper.py @@ -76,9 +76,12 @@ def get_result_df(remote_run): def run_inference(test_experiment, compute_target, script_folder, train_run, test_dataset, lookback_dataset, max_horizon, target_column_name, time_column_name, freq): - train_run.download_file('outputs/model.pkl', 'inference/model.pkl') - train_run.download_file('outputs/conda_env_v_1_0_0.yml', - 'inference/condafile.yml') + model_base_name = 'model.pkl' + if 'model_data_location' in train_run.properties: + model_location = train_run.properties['model_data_location'] + _, model_base_name = model_location.rsplit('/', 1) + train_run.download_file('outputs/{}'.format(model_base_name), 'inference/{}'.format(model_base_name)) + train_run.download_file('outputs/conda_env_v_1_0_0.yml', 'inference/condafile.yml') inference_env = Environment("myenv") inference_env.docker.enabled = True @@ -91,7 +94,8 @@ def run_inference(test_experiment, compute_target, script_folder, train_run, '--max_horizon': max_horizon, '--target_column_name': target_column_name, '--time_column_name': time_column_name, - '--frequency': freq + '--frequency': freq, + '--model_path': model_base_name }, inputs=[test_dataset.as_named_input('test_data'), lookback_dataset.as_named_input('lookback_data')], diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/infer.py b/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/infer.py index 6b2fc9262..9b3a3171e 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/infer.py +++ b/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/infer.py @@ -232,6 +232,9 @@ def MAPE(actual, pred): parser.add_argument( '--frequency', type=str, dest='freq', help='Frequency of prediction') +parser.add_argument( + '--model_path', type=str, dest='model_path', + default='model.pkl', help='Filename of model to be loaded') args = parser.parse_args() @@ -239,6 +242,7 @@ def MAPE(actual, pred): target_column_name = args.target_column_name time_column_name = args.time_column_name freq = args.freq +model_path = args.model_path print('args passed are: ') @@ -246,6 +250,7 @@ def MAPE(actual, pred): print(target_column_name) print(time_column_name) print(freq) +print(model_path) run = Run.get_context() # get input dataset by name @@ -267,7 +272,8 @@ def MAPE(actual, pred): y_lookback_df = lookback_dataset.with_timestamp_columns( None).keep_columns(columns=[target_column_name]) -fitted_model = joblib.load('model.pkl') +fitted_model = joblib.load(model_path) + if hasattr(fitted_model, 'get_lookback'): lookback = fitted_model.get_lookback() diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb index 2432dfd16..e57266709 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb +++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb @@ -74,6 +74,23 @@ "from datetime import datetime" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -95,7 +112,6 @@ "experiment = Experiment(ws, experiment_name)\n", "\n", "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace'] = ws.name\n", "output['SKU'] = ws.sku\n", @@ -124,35 +140,22 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.compute import AmlCompute\n", - "from azureml.core.compute import ComputeTarget\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"cpu-cluster-bike\"\n", + "amlcompute_cluster_name = \"bike-cluster\"\n", "\n", - "found = False\n", - "# Check if this compute target already exists in the workspace.\n", - "cts = ws.compute_targets\n", - "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[amlcompute_cluster_name]\n", - " \n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " max_nodes = 4)\n", + "# Verify that cluster does not exist already\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n", + " print('Found existing cluster, use it.')\n", + "except ComputeTargetException:\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + " max_nodes=4)\n", + " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n", "\n", - " # Create the cluster.\n", - " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", - " \n", - "print('Checking cluster status...')\n", - "# Can poll for a minimum number of nodes and for a specific timeout.\n", - "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", - "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", - " \n", - "# For a more detailed view of current AmlCompute status, use get_status()." 
+ "compute_target.wait_for_completion(show_output=True)" ] }, { @@ -202,7 +205,7 @@ "outputs": [], "source": [ "dataset = Dataset.Tabular.from_delimited_files(path = [(datastore, 'dataset/bike-no.csv')]).with_timestamp_columns(fine_grain_timestamp=time_column_name) \n", - "dataset.take(5).to_pandas_dataframe()" + "dataset.take(5).to_pandas_dataframe().reset_index(drop=True)" ] }, { @@ -221,8 +224,8 @@ "outputs": [], "source": [ "# select data that occurs before a specified date\n", - "train = dataset.time_before(datetime(2012, 9, 1))\n", - "train.to_pandas_dataframe().tail(5)" + "train = dataset.time_before(datetime(2012, 8, 31), include_boundary=True)\n", + "train.to_pandas_dataframe().tail(5).reset_index(drop=True)" ] }, { @@ -231,8 +234,8 @@ "metadata": {}, "outputs": [], "source": [ - "test = dataset.time_after(datetime(2012, 8, 31))\n", - "test.to_pandas_dataframe().head(5)" + "test = dataset.time_after(datetime(2012, 9, 1), include_boundary=True)\n", + "test.to_pandas_dataframe().head(5).reset_index(drop=True)" ] }, { @@ -247,8 +250,8 @@ "|-|-|\n", "|**task**|forecasting|\n", "|**primary_metric**|This is the metric that you want to optimize.
Forecasting supports the following primary metrics<br>spearman_correlation<br>normalized_root_mean_squared_error<br>r2_score<br>
normalized_mean_absolute_error\n", - "|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found at [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.constants.supportedmodels.regression?view=azure-ml-py).|\n", - "|**experiment_timeout_minutes**|Experimentation timeout in minutes.|\n", + "|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found at [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py).|\n", + "|**experiment_timeout_hours**|Experimentation timeout in hours.|\n", "|**training_data**|Input dataset, containing both features and label column.|\n", "|**label_column_name**|The name of the label column.|\n", "|**compute_target**|The remote compute for training.|\n", @@ -260,7 +263,7 @@ "|**target_lags**|The target_lags specifies how far back we will construct the lags of the target variable.|\n", "|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n", "\n", - "This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_minutes parameter value to get results." + "This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_hours parameter value to get results." ] }, { @@ -305,7 +308,7 @@ "automl_config = AutoMLConfig(task='forecasting', \n", " primary_metric='normalized_root_mean_squared_error',\n", " blacklist_models = ['ExtremeRandomTrees'], \n", - " experiment_timeout_minutes=20,\n", + " experiment_timeout_hours=0.3,\n", " training_data=train,\n", " label_column_name=target_column_name,\n", " compute_target=compute_target,\n", diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.yml b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.yml index 4fbac4600..c488ffc3a 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.yml +++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/auto-ml-forecasting-bike-share.yml @@ -1,11 +1,10 @@ name: auto-ml-forecasting-bike-share dependencies: -- fbprophet==0.5 -- py-xgboost<=0.80 +- py-xgboost<=0.90 - pip: - azureml-sdk + - numpy==1.16.2 + - pandas==0.23.4 - azureml-train-automl - azureml-widgets - matplotlib - - pandas_ml - - statsmodels diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/forecasting_script.py b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/forecasting_script.py index 215ae3500..f3fb7b892 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/forecasting_script.py +++ b/how-to-use-azureml/automated-machine-learning/forecasting-bike-share/forecasting_script.py @@ -32,18 +32,17 @@ grain_column_names = [] -df = test_dataset.to_pandas_dataframe() +df = test_dataset.to_pandas_dataframe().reset_index(drop=True) -X_test_df = test_dataset.drop_columns(columns=[target_column_name]) -y_test_df = test_dataset.with_timestamp_columns( - None).keep_columns(columns=[target_column_name]) 
+X_test_df = test_dataset.drop_columns(columns=[target_column_name]).to_pandas_dataframe().reset_index(drop=True) +y_test_df = test_dataset.with_timestamp_columns(None).keep_columns(columns=[target_column_name]).to_pandas_dataframe() fitted_model = joblib.load('model.pkl') df_all = forecasting_helper.do_rolling_forecast( fitted_model, - X_test_df.to_pandas_dataframe(), - y_test_df.to_pandas_dataframe().values.T[0], + X_test_df, + y_test_df.values.T[0], target_column_name, time_column_name, max_horizon, diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb index 266dde25a..d8af37fb5 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb +++ b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb @@ -28,11 +28,10 @@ "1. [Setup](#Setup)\n", "1. [Data and Forecasting Configurations](#Data)\n", "1. [Train](#Train)\n", - "1. [Results](#Results)\n", "\n", "Advanced Forecasting\n", "1. [Advanced Training](#advanced_training)\n", - "1. [Advanced Results](#advanced Results)" + "1. [Advanced Results](#advanced_results)" ] }, { @@ -85,6 +84,23 @@ "from datetime import datetime" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -109,7 +125,6 @@ "experiment = Experiment(ws, experiment_name)\n", "\n", "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", @@ -140,35 +155,22 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.compute import AmlCompute\n", - "from azureml.core.compute import ComputeTarget\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"aml-compute\"\n", - "\n", - "found = False\n", - "# Check if this compute target already exists in the workspace.\n", - "cts = ws.compute_targets\n", - "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[amlcompute_cluster_name]\n", - "\n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_DS12_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " max_nodes = 6)\n", - "\n", - " # Create the cluster.\\n\",\n", - " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", + "amlcompute_cluster_name = \"energy-cluster\"\n", "\n", - "print('Checking cluster status...')\n", - "# Can poll for a minimum number of nodes and for a specific timeout.\n", - "# If 
no min_node_count is provided, it will use the scale settings for the cluster.\n", - "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", + "# Verify that cluster does not exist already\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n", + " print('Found existing cluster, use it.')\n", + "except ComputeTargetException:\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS12_V2',\n", + " max_nodes=6)\n", + " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n", "\n", - "# For a more detailed view of current AmlCompute status, use get_status()." + "compute_target.wait_for_completion(show_output=True)" ] }, { @@ -211,7 +213,7 @@ "outputs": [], "source": [ "dataset = Dataset.Tabular.from_delimited_files(path = \"https://automlsamplenotebookdata.blob.core.windows.net/automl-sample-notebook-data/nyc_energy.csv\").with_timestamp_columns(fine_grain_timestamp=time_column_name) \n", - "dataset.take(5).to_pandas_dataframe()" + "dataset.take(5).to_pandas_dataframe().reset_index(drop=True)" ] }, { @@ -253,7 +255,7 @@ "source": [ "# split into train based on time\n", "train = dataset.time_before(datetime(2017, 8, 8, 5), include_boundary=True)\n", - "train.to_pandas_dataframe().sort_values(time_column_name).tail(5)" + "train.to_pandas_dataframe().reset_index(drop=True).sort_values(time_column_name).tail(5)" ] }, { @@ -263,8 +265,8 @@ "outputs": [], "source": [ "# split into test based on time\n", - "test = dataset.time_between(datetime(2017, 8, 8, 5), datetime(2017, 8, 10, 5))\n", - "test.to_pandas_dataframe().head(5)" + "test = dataset.time_between(datetime(2017, 8, 8, 6), datetime(2017, 8, 10, 5))\n", + "test.to_pandas_dataframe().reset_index(drop=True).head(5)" ] }, { @@ -301,8 +303,8 @@ "|-|-|\n", "|**task**|forecasting|\n", "|**primary_metric**|This is the metric that you want to optimize.
Forecasting supports the following primary metrics<br>spearman_correlation<br>normalized_root_mean_squared_error<br>r2_score<br>
normalized_mean_absolute_error|\n", - "|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found at [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.constants.supportedmodels.regression?view=azure-ml-py).|\n", - "|**experiment_timeout_minutes**|Maximum amount of time in minutes that the experiment take before it terminates.|\n", + "|**blacklist_models**|Models in blacklist won't be used by AutoML. All supported models can be found at [here](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting?view=azure-ml-py).|\n", + "|**experiment_timeout_hours**|Maximum amount of time in hours that the experiment take before it terminates.|\n", "|**training_data**|The training data to be used within the experiment.|\n", "|**label_column_name**|The name of the label column.|\n", "|**compute_target**|The remote compute for training.|\n", @@ -316,7 +318,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_minutes parameter value to get results." + "This notebook uses the blacklist_models parameter to exclude some models that take a longer time to train on this dataset. You can choose to remove models from the blacklist_models list but you may need to increase the experiment_timeout_hours parameter value to get results." ] }, { @@ -333,7 +335,7 @@ "automl_config = AutoMLConfig(task='forecasting', \n", " primary_metric='normalized_root_mean_squared_error',\n", " blacklist_models = ['ExtremeRandomTrees', 'AutoArima', 'Prophet'], \n", - " experiment_timeout_minutes=20,\n", + " experiment_timeout_hours=0.3,\n", " training_data=train,\n", " label_column_name=target_column_name,\n", " compute_target=compute_target,\n", @@ -454,7 +456,7 @@ "metadata": {}, "outputs": [], "source": [ - "X_test = test.to_pandas_dataframe()\n", + "X_test = test.to_pandas_dataframe().reset_index(drop=True)\n", "y_test = X_test.pop(target_column_name).values" ] }, @@ -463,7 +465,7 @@ "metadata": {}, "source": [ "### Forecast Function\n", - "For forecasting, we will use the forecast function instead of the predict function. Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use. Forecast function also can handle more complicated scenarios, see notebook on [high frequency forecasting](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb)." + "For forecasting, we will use the forecast function instead of the predict function. Using the predict method would result in getting predictions for EVERY horizon the forecaster can predict at. This is useful when training and evaluating the performance of the forecaster at various horizons, but the level of detail is excessive for normal use. 
Forecast function also can handle more complicated scenarios, see notebook on [high frequency forecasting](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.ipynb)." ] }, { @@ -578,7 +580,7 @@ "automl_config = AutoMLConfig(task='forecasting', \n", " primary_metric='normalized_root_mean_squared_error',\n", " blacklist_models = ['ElasticNet','ExtremeRandomTrees','GradientBoosting','XGBoostRegressor','ExtremeRandomTrees', 'AutoArima', 'Prophet'], #These models are blacklisted for tutorial purposes, remove this for real use cases. \n", - " experiment_timeout_minutes=20,\n", + " experiment_timeout_hours=0.3,\n", " training_data=train,\n", " label_column_name=target_column_name,\n", " compute_target=compute_target,\n", @@ -633,7 +635,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Advanced Results\n", + "## Advanced Results\n", "We did not use lags in the previous model specification. In effect, the prediction was the result of a simple regression on date, grain and any additional features. This is often a very good prediction as common time series patterns like seasonality and trends can be captured in this manner. Such simple regression is horizon-less: it doesn't matter how far into the future we are predicting, because we are not using past data. In the previous example, the horizon was only used to split the data for cross-validation." ] }, diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.yml b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.yml index 4a4aeabd2..30672301c 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.yml +++ b/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.yml @@ -2,11 +2,8 @@ name: auto-ml-forecasting-energy-demand dependencies: - pip: - azureml-sdk - - interpret + - numpy==1.16.2 + - pandas==0.23.4 - azureml-train-automl - azureml-widgets - matplotlib - - pandas_ml - - statsmodels - - azureml-explain-model - - azureml-contrib-interpret diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/auto-ml-forecasting-grouping.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/auto-ml-forecasting-grouping.ipynb deleted file mode 100644 index d0d9eb53b..000000000 --- a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/auto-ml-forecasting-grouping.ipynb +++ /dev/null @@ -1,551 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-grouping/auto-ml-forecasting-grouping.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Automated Machine Learning\n", - "\n", - "_**Forecasting with grouping using Pipelines**_\n", - "\n", - "## Contents\n", - "\n", - "1. [Introduction](#Introduction)\n", - "2. [Setup](#Setup)\n", - "3. [Data](#Data)\n", - "4. [Compute](#Compute)\n", - "4. [AutoMLConfig](#AutoMLConfig)\n", - "5. 
[Pipeline](#Pipeline)\n", - "5. [Train](#Train)\n", - "6. [Test](#Test)\n", - "\n", - "\n", - "## Introduction\n", - "In this example we use Automated ML and Pipelines to train, select, and operationalize forecasting models for multiple time-series.\n", - "\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the [configuration notebook](../../../configuration.ipynb) first if you haven't already to establish your connection to the AzureML Workspace.\n", - "\n", - "In this notebook you will learn how to:\n", - "\n", - "* Create an Experiment in an existing Workspace.\n", - "* Configure AutoML using AutoMLConfig.\n", - "* Use our helper script to generate pipeline steps to split, train, and deploy the models.\n", - "* Explore the results.\n", - "* Test the models.\n", - "\n", - "It is advised you ensure your cluster has at least one node per group.\n", - "\n", - "An Enterprise workspace is required for this notebook. To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page.](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade)\n", - "\n", - "## Setup\n", - "As part of the setup you have already created an Azure ML `Workspace` object. For Automated ML you will need to create an `Experiment` object, which is a named object in a `Workspace` used to run experiments. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "import logging\n", - "import warnings\n", - "\n", - "import numpy as np\n", - "import pandas as pd\n", - "\n", - "import azureml.core\n", - "\n", - "from azureml.core.workspace import Workspace\n", - "from azureml.core.experiment import Experiment\n", - "from azureml.train.automl import AutoMLConfig" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Accessing the Azure ML workspace requires authentication with Azure.\n", - "\n", - "The default authentication is interactive authentication using the default tenant. 
Executing the ws = Workspace.from_config() line in the cell below will prompt for authentication the first time that it is run.\n", - "\n", - "If you have multiple Azure tenants, you can specify the tenant by replacing the ws = Workspace.from_config() line in the cell below with the following:\n", - "```\n", - "from azureml.core.authentication import InteractiveLoginAuthentication\n", - "auth = InteractiveLoginAuthentication(tenant_id = 'mytenantid')\n", - "ws = Workspace.from_config(auth = auth)\n", - "```\n", - "If you need to run in an environment where interactive login is not possible, you can use Service Principal authentication by replacing the ws = Workspace.from_config() line in the cell below with the following:\n", - "```\n", - "from azureml.core.authentication import ServicePrincipalAuthentication\n", - "auth = auth = ServicePrincipalAuthentication('mytenantid', 'myappid', 'mypassword')\n", - "ws = Workspace.from_config(auth = auth)\n", - "```\n", - "For more details, see aka.ms/aml-notebook-auth" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "ds = ws.get_default_datastore()\n", - "\n", - "# choose a name for the run history container in the workspace\n", - "experiment_name = 'automl-grouping-oj'\n", - "# project folder\n", - "project_folder = './sample_projects/{}'.format(experiment_name)\n", - "\n", - "experiment = Experiment(ws, experiment_name)\n", - "\n", - "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", - "output['Subscription ID'] = ws.subscription_id\n", - "output['Workspace'] = ws.name\n", - "output['Resource Group'] = ws.resource_group\n", - "output['Location'] = ws.location\n", - "output['Project Directory'] = project_folder\n", - "output['Run History Name'] = experiment_name\n", - "pd.set_option('display.max_colwidth', -1)\n", - "outputDf = pd.DataFrame(data = output, index = [''])\n", - "outputDf.T" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Data\n", - "Upload data to your default datastore and then load it as a `TabularDataset`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.dataset import Dataset" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# upload training and test data to your default datastore\n", - "ds = ws.get_default_datastore()\n", - "ds.upload(src_dir='./data', target_path='groupdata', overwrite=True, show_progress=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# load data from your datastore\n", - "data = Dataset.Tabular.from_delimited_files(path=ds.path('groupdata/dominicks_OJ_2_5_8_train.csv'))\n", - "data_test = Dataset.Tabular.from_delimited_files(path=ds.path('groupdata/dominicks_OJ_2_5_8_test.csv'))\n", - "\n", - "data.take(5).to_pandas_dataframe()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Compute \n", - "\n", - "#### Create or Attach existing AmlCompute\n", - "\n", - "You will need to create a compute target for your automated ML run. In this tutorial, you create AmlCompute as your training compute resource.\n", - "#### Creation of AmlCompute takes approximately 5 minutes. 
\n", - "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n", - "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this article on the default limits and how to request more quota." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.compute import AmlCompute\n", - "from azureml.core.compute import ComputeTarget\n", - "\n", - "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"cpu-cluster-11\"\n", - "\n", - "found = False\n", - "# Check if this compute target already exists in the workspace.\n", - "cts = ws.compute_targets\n", - "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[amlcompute_cluster_name]\n", - " \n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " max_nodes = 6)\n", - "\n", - " # Create the cluster.\n", - " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", - " \n", - "print('Checking cluster status...')\n", - "# Can poll for a minimum number of nodes and for a specific timeout.\n", - "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", - "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", - " \n", - "# For a more detailed view of current AmlCompute status, use get_status()." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## AutoMLConfig\n", - "#### Create a base AutoMLConfig\n", - "This configuration will be used for all the groups in the pipeline." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "target_column = 'Quantity'\n", - "time_column_name = 'WeekStarting'\n", - "grain_column_names = ['Brand']\n", - "group_column_names = ['Store']\n", - "max_horizon = 20" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "automl_settings = {\n", - " \"iteration_timeout_minutes\" : 5,\n", - " \"experiment_timeout_minutes\" : 15,\n", - " \"primary_metric\" : 'normalized_mean_absolute_error',\n", - " \"time_column_name\": time_column_name,\n", - " \"grain_column_names\": grain_column_names,\n", - " \"max_horizon\": max_horizon,\n", - " \"drop_column_names\": ['logQuantity'],\n", - " \"max_concurrent_iterations\": 2,\n", - " \"max_cores_per_iteration\": -1\n", - "}\n", - "base_configuration = AutoMLConfig(task = 'forecasting',\n", - " path = project_folder,\n", - " n_cross_validations=3,\n", - " **automl_settings\n", - " )" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Pipeline\n", - "We've written a script to generate the individual pipeline steps used to create each automl step. Calling this script will return a list of PipelineSteps that will train multiple groups concurrently and then deploy these models.\n", - "\n", - "This step requires an Enterprise workspace to gain access to this feature. 
To learn more about creating an Enterprise workspace or upgrading to an Enterprise workspace from the Azure portal, please visit our [Workspace page](https://docs.microsoft.com/azure/machine-learning/service/concept-workspace#upgrade).\n", - "\n", - "### Call the method to build pipeline steps\n", - "\n", - "`build_pipeline_steps()` takes as input:\n", - "* **automlconfig**: The configuration used for every AutoML step\n", - "* **data**: The dataset to be used for training\n", - "* **target_column**: The target column of the dataset\n", - "* **compute_target**: The compute to be used for training\n", - "* **group_column_names**: The columns used to split the dataset into groups, with one model trained per group\n", - "* **deploy**: Whether to deploy the models after training; if set to `True`, an extra step is added that deploys a web service serving all the models (default is `True`)\n", - "* **service_name**: The service name for the model query endpoint\n", - "* **time_column_name**: The time column of the data" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "from azureml.exceptions import WebserviceException\n", - "\n", - "service_name = 'grouped-model'\n", - "try:\n", - " # The ACI service name must be unique within the subscription,\n", - " # so delete any existing web service with this name before deploying.\n", - " service = Webservice(ws, name=service_name)\n", - " if service:\n", - " service.delete()\n", - "except WebserviceException:\n", - " pass" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from build import build_pipeline_steps\n", - "\n", - "steps = build_pipeline_steps(\n", - " base_configuration, \n", - " data, \n", - " target_column,\n", - " compute_target, \n", - " group_column_names=group_column_names, \n", - " deploy=True, \n", - " service_name=service_name, \n", - " time_column_name=time_column_name\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Train\n", - "Use the list of steps generated above to build the pipeline and submit it to your compute target for remote training." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core import Pipeline\n", - "pipeline = Pipeline(\n", - " description=\"A pipeline with one model per data group using Automated ML.\",\n", - " workspace=ws, \n", - " steps=steps)\n", - "\n", - "pipeline_run = experiment.submit(pipeline)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(pipeline_run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pipeline_run.wait_for_completion(show_output=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Test\n", - "\n", - "Now we can use the holdout set to test our models and ensure our web service is running as expected." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "service = AciWebservice(ws, service_name)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X_test = data_test.to_pandas_dataframe()\n", - "# Drop the column we are trying to predict (target column)\n", - "x_pred = X_test.drop(target_column, inplace=False, axis=1)\n", - "x_pred.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Get Predictions\n", - "test_sample = X_test.drop(target_column, inplace=False, axis=1).to_json()\n", - "predictions = service.run(input_data=test_sample)\n", - "print(predictions)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Convert predictions from JSON to DataFrame\n", - "pred_dict = json.loads(predictions)\n", - "X_pred = pd.read_json(pred_dict['predictions'])\n", - "X_pred.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Fix the index\n", - "PRED = 'pred_target'\n", - "X_pred[time_column_name] = pd.to_datetime(X_pred[time_column_name], unit='ms')\n", - "\n", - "X_pred.set_index([time_column_name] + grain_column_names, inplace=True, drop=True)\n", - "X_pred.rename({'_automl_target_col': PRED}, inplace=True, axis=1)\n", - "# Drop all but the target column and index\n", - "X_pred.drop(list(set(X_pred.columns.values).difference({PRED})), axis=1, inplace=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X_test[time_column_name] = pd.to_datetime(X_test[time_column_name])\n", - "X_test.set_index([time_column_name] + grain_column_names, inplace=True, drop=True)\n", - "# Merge predictions with raw features\n", - "pred_test = X_test.merge(X_pred, left_index=True, right_index=True)\n", - "pred_test.head()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.metrics import mean_absolute_error, mean_squared_error\n", - "def MAPE(actual, pred):\n", - " \"\"\"\n", - " Calculate mean absolute percentage error.\n", - " Remove NA and values where actual is close to zero\n", - " \"\"\"\n", - " not_na = ~(np.isnan(actual) | np.isnan(pred))\n", - " not_zero = ~np.isclose(actual, 0.0)\n", - " actual_safe = actual[not_na & not_zero]\n", - " pred_safe = pred[not_na & not_zero]\n", - " APE = 100*np.abs((actual_safe - pred_safe)/actual_safe)\n", - " return np.mean(APE)\n", - "\n", - "def get_metrics(actuals, preds):\n", - " return pd.Series(\n", - " {\n", - " \"RMSE\": np.sqrt(mean_squared_error(actuals, preds)),\n", - " \"NormRMSE\": np.sqrt(mean_squared_error(actuals, preds))/np.abs(actuals.max()-actuals.min()),\n", - " \"MAE\": mean_absolute_error(actuals, preds),\n", - " \"MAPE\": MAPE(actuals, preds)},\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "get_metrics(pred_test[target_column].values, pred_test[PRED].values)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "authors": [ - { - "name": "alyerman" - } - ], - "category": "other", - "compute": [ - "AML Compute" - ], - "datasets": [ - "Orange Juice Sales" - ], - 
"deployment": [ - "Azure Container Instance" - ], - "exclude_from_index": false, - "framework": [ - "Scikit-learn", - "Pytorch" - ], - "friendly_name": "Automated ML Grouping with Pipeline.", - "index_order": 10, - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - }, - "tags": [ - "AutomatedML" - ], - "task": "Use AzureML Pipeline to trigger multiple Automated ML runs." - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/build.py b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/build.py deleted file mode 100644 index b6a4a8b53..000000000 --- a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/build.py +++ /dev/null @@ -1,144 +0,0 @@ -from typing import List, Dict -import copy -import json -import pandas as pd -import re - -from azureml.core import RunConfiguration -from azureml.core.compute import ComputeTarget -from azureml.core.conda_dependencies import CondaDependencies -from azureml.core.dataset import Dataset -from azureml.data import TabularDataset -from azureml.pipeline.core import PipelineData, PipelineParameter, TrainingOutput, StepSequence -from azureml.pipeline.steps import PythonScriptStep -from azureml.train.automl import AutoMLConfig -from azureml.train.automl.runtime import AutoMLStep - - -def _get_groups(data: Dataset, group_column_names: List[str]) -> pd.DataFrame: - return data._dataflow.distinct(columns=group_column_names)\ - .keep_columns(columns=group_column_names).to_pandas_dataframe() - - -def _get_configs(automlconfig: AutoMLConfig, - data: Dataset, - target_column: str, - compute_target: ComputeTarget, - group_column_names: List[str]) -> Dict[str, AutoMLConfig]: - # remove invalid characters regex - valid_chars = re.compile('[^a-zA-Z0-9-]') - groups = _get_groups(data, group_column_names) - configs = {} - for i, group in groups.iterrows(): - single = data - group_name = "#####".join(str(x) for x in group.values) - group_name = valid_chars.sub('', group_name) - for key in group.index: - single = single._dataflow.filter(data._dataflow[key] == group[key]) - t_dataset = TabularDataset._create(single) - group_conf = copy.deepcopy(automlconfig) - group_conf.user_settings['training_data'] = t_dataset - group_conf.user_settings['label_column_name'] = target_column - group_conf.user_settings['compute_target'] = compute_target - configs[group_name] = group_conf - return configs - - -def build_pipeline_steps(automlconfig: AutoMLConfig, - data: Dataset, - target_column: str, - compute_target: ComputeTarget, - group_column_names: list, - time_column_name: str, - deploy: bool, - service_name: str = 'grouping-demo') -> StepSequence: - steps = [] - - metrics_output_name = 'metrics_{}' - best_model_output_name = 'best_model_{}' - count = 0 - model_names = [] - - # get all automl configs by group - configs = _get_configs(automlconfig, data, target_column, compute_target, group_column_names) - - # build a runconfig for register model - register_config = RunConfiguration() - cd = CondaDependencies() - cd.add_pip_package('azureml-pipeline') - register_config.environment.python.conda_dependencies = cd - - # create each automl step end-to-end (train, 
register) - for group_name, conf in configs.items(): - # create automl metrics output - metirics_data = PipelineData( - name='metrics_data_{}'.format(group_name), - pipeline_output_name=metrics_output_name.format(group_name), - training_output=TrainingOutput(type='Metrics')) - # create automl model output - model_data = PipelineData( - name='model_data_{}'.format(group_name), - pipeline_output_name=best_model_output_name.format(group_name), - training_output=TrainingOutput(type='Model', metric=conf.user_settings['primary_metric'])) - - automl_step = AutoMLStep( - name='automl_{}'.format(group_name), - automl_config=conf, - outputs=[metirics_data, model_data], - allow_reuse=True) - steps.append(automl_step) - - # pass the group name as a parameter to the register step -> - # this will become the name of the model for this group. - group_name_param = PipelineParameter("group_name_{}".format(count), default_value=group_name) - count += 1 - - reg_model_step = PythonScriptStep( - 'register.py', - name='register_{}'.format(group_name), - arguments=["--model_name", group_name_param, "--model_path", model_data], - inputs=[model_data], - compute_target=compute_target, - runconfig=register_config, - source_directory="register", - allow_reuse=True - ) - steps.append(reg_model_step) - model_names.append(group_name) - - final_steps = steps - if deploy: - # modify the conda dependencies to ensure we pick up correct - # versions of azureml-defaults and azureml-train-automl - cd = CondaDependencies.create(pip_packages=['azureml-defaults', 'azureml-train-automl']) - automl_deps = CondaDependencies(conda_dependencies_file_path='deploy/myenv.yml') - cd._merge_dependencies(automl_deps) - cd.save('deploy/myenv.yml') - - # add deployment step - pp_group_column_names = PipelineParameter( - "group_column_names", - default_value="#####".join(list(reversed(group_column_names)))) - - pp_model_names = PipelineParameter( - "model_names", - default_value=json.dumps(model_names)) - - pp_service_name = PipelineParameter( - "service_name", - default_value=service_name) - - deployment_step = PythonScriptStep( - 'deploy.py', - name='service_deploy', - arguments=["--group_column_names", pp_group_column_names, - "--model_names", pp_model_names, - "--service_name", pp_service_name, - "--time_column_name", time_column_name], - compute_target=compute_target, - runconfig=RunConfiguration(), - source_directory="deploy" - ) - final_steps = StepSequence(steps=[steps, deployment_step]) - - return final_steps diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/data/dominicks_OJ_2_5_8_test.csv b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/data/dominicks_OJ_2_5_8_test.csv deleted file mode 100644 index a91b39316..000000000 --- a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/data/dominicks_OJ_2_5_8_test.csv +++ /dev/null @@ -1,61 +0,0 @@ -WeekStarting,Store,Brand,Quantity,logQuantity,Advert,Price,Age60,COLLEGE,INCOME,Hincome150,Large HH,Minorities,WorkingWoman,SSTRDIST,SSTRVOL,CPDIST5,CPWVOL5 -1992-08-20,2,minute.maid,23488,10.06424493,1,1.94,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-20,2,tropicana,13376,9.501217335,1,2.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1992-08-27,2,tropicana,8128,9.00307017,0,2.75,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-27,2,minute.maid,19008,9.852615222,0,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-27,2,dominicks,9024,9.107642974,0,1.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-03,2,tropicana,19456,9.875910785,1,2.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-03,2,minute.maid,11584,9.357380115,0,1.81,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-03,2,dominicks,2048,7.624618986000001,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-10,2,tropicana,10048,9.215128888999999,0,2.64,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-10,2,minute.maid,26752,10.19436452,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-10,2,dominicks,1984,7.592870287999999,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-17,2,tropicana,6336,8.754002933999999,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-17,2,minute.maid,3904,8.269756948,0,2.83,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-17,2,dominicks,4160,8.333270353,0,1.77,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-24,2,tropicana,16192,9.692272572,1,2.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-24,2,minute.maid,3712,8.219326094,0,2.67,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-09-24,2,dominicks,35264,10.47061789,0,1.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-10-01,2,dominicks,8640,9.064157862,0,1.82,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-10-01,2,minute.maid,41216,10.62658181,1,2.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-10-01,2,tropicana,5824,8.66974259,0,2.97,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1992-08-20,5,tropicana,17728,9.78290059,1,2.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-20,5,minute.maid,27072,10.20625526,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-27,5,tropicana,9600,9.169518378,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-27,5,minute.maid,3840,8.253227646000001,0,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-27,5,dominicks,1856,7.526178913,0,1.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-03,5,tropicana,25664,10.15284451,1,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-03,5,minute.maid,6144,8.723231275,0,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-03,5,dominicks,3712,8.219326094,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-10,5,tropicana,9984,9.208739091,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-10,5,dominicks,2688,7.896552702,0,1.85,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-10,5,minute.maid,36416,10.50276352,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-17,5,tropicana,8576,9.056722882999999,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-17,5,minute.maid,5440,8.60153434,0,2.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-17,5,dominicks,6464,8.774003599999999,0,1.85,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-24,5,tropicana,13184,9.486759252,1,2.78,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-24,5,dominicks,40896,10.61878754,0,1.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-09-24,5,minute.maid,7680,8.946374826,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-10-01,5,dominicks,6144,8.723231275,0,1.85,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-10-01,5,minute.maid,50304,10.82583988,1,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1992-10-01,5,tropicana,7488,8.921057017999999,0,2.78,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-20,8,minute.maid,55552,10.9250748,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-20,8,tropicana,8576,9.056722882999999,1,2.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-27,8,tropicana,8000,8.987196821,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-27,8,minute.maid,18688,9.835636886,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-27,8,dominicks,19200,9.862665558,0,1.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-03,8,tropicana,21760,9.987828701,1,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-03,8,minute.maid,14656,9.592605087,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-03,8,dominicks,12800,9.45720045,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-10,8,tropicana,12800,9.45720045,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-10,8,minute.maid,30144,10.31374118,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-10,8,dominicks,15296,9.635346635,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-17,8,tropicana,10112,9.221478116,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-17,8,minute.maid,6208,8.733594062,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-17,8,dominicks,20992,9.951896692,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-24,8,tropicana,10304,9.240287448,1,2.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-24,8,minute.maid,7104,8.868413285,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-09-24,8,dominicks,73856,11.20987253,0,1.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-10-01,8,minute.maid,65856,11.09522582,1,2.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-10-01,8,dominicks,16192,9.692272572,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1992-10-01,8,tropicana,6400,8.764053269,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/data/dominicks_OJ_2_5_8_train.csv b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/data/dominicks_OJ_2_5_8_train.csv deleted file mode 100644 index 8077497de..000000000 --- a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/data/dominicks_OJ_2_5_8_train.csv +++ /dev/null @@ -1,973 +0,0 @@ -WeekStarting,Store,Brand,Quantity,logQuantity,Advert,Price,Age60,COLLEGE,INCOME,Hincome150,Large HH,Minorities,WorkingWoman,SSTRDIST,SSTRVOL,CPDIST5,CPWVOL5 -1990-06-14,2,dominicks,10560,9.264828557000001,1,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-06-14,2,minute.maid,4480,8.407378325,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-06-14,2,tropicana,8256,9.018695487999999,0,3.87,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-07-26,2,dominicks,8000,8.987196821,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-07-26,2,minute.maid,4672,8.449342525,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-07-26,2,tropicana,6144,8.723231275,0,3.87,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-02,2,tropicana,3840,8.253227646000001,0,3.87,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-02,2,minute.maid,20160,9.911455722000001,1,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-02,2,dominicks,6848,8.831711918,1,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-09,2,dominicks,2880,7.965545572999999,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-09,2,minute.maid,2688,7.896552702,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-09,2,tropicana,8000,8.987196821,0,3.87,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-23,2,dominicks,1600,7.377758908,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-23,2,minute.maid,3008,8.009030685,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1990-08-23,2,tropicana,8896,9.093357017,0,3.87,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-30,2,tropicana,7168,8.877381955,0,3.87,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-30,2,minute.maid,4672,8.449342525,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-08-30,2,dominicks,25344,10.140297300000002,1,1.89,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-09-06,2,dominicks,10752,9.282847063,0,1.89,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-09-06,2,minute.maid,2752,7.920083199,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-09-06,2,tropicana,10880,9.29468152,0,3.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-09-13,2,minute.maid,26176,10.17259824,1,2.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-09-13,2,dominicks,6656,8.803273982999999,0,1.89,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-09-13,2,tropicana,7744,8.954673629,0,3.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-09-20,2,dominicks,6592,8.793612072,0,1.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-09-20,2,minute.maid,3712,8.219326094,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-09-20,2,tropicana,8512,9.049232212,0,3.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-10-11,2,tropicana,5504,8.61323038,0,3.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-10-11,2,minute.maid,30656,10.33058368,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-10-11,2,dominicks,1728,7.454719948999999,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-10-18,2,tropicana,5888,8.68067166,0,3.56,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-10-18,2,minute.maid,3840,8.253227646000001,0,2.98,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1990-10-18,2,dominicks,33792,10.42797937,1,1.24,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-10-25,2,tropicana,8384,9.034080407000001,0,3.56,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-10-25,2,minute.maid,2816,7.943072717000001,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-10-25,2,dominicks,1920,7.560080465,0,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-01,2,tropicana,5952,8.691482577,0,3.56,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-01,2,minute.maid,23104,10.04776104,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-01,2,dominicks,8960,9.100525506,1,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-08,2,dominicks,11392,9.340666634,0,1.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-08,2,tropicana,6848,8.831711918,0,3.56,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-08,2,minute.maid,3392,8.129174997,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-15,2,tropicana,9216,9.128696383,0,3.87,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-15,2,minute.maid,26304,10.1774763,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-15,2,dominicks,28416,10.25470765,0,0.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-22,2,dominicks,17152,9.749870064,1,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-22,2,tropicana,12160,9.405907156,0,2.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-22,2,minute.maid,6336,8.754002933999999,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-29,2,tropicana,12672,9.447150114,0,2.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-11-29,2,minute.maid,9920,9.2023082,0,3.17,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1990-11-29,2,dominicks,26560,10.1871616,1,2.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-06,2,dominicks,6336,8.754002933999999,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-06,2,minute.maid,25280,10.13776885,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-06,2,tropicana,6528,8.783855897,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-13,2,dominicks,26368,10.17990643,1,1.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-13,2,tropicana,6144,8.723231275,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-13,2,minute.maid,14848,9.605620455,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-20,2,tropicana,21120,9.957975738,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-20,2,minute.maid,12288,9.416378455,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-20,2,dominicks,896,6.797940412999999,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-27,2,tropicana,12416,9.426741242,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-27,2,minute.maid,6272,8.743850562,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-12-27,2,dominicks,1472,7.294377299,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-03,2,tropicana,9472,9.156095357,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-03,2,minute.maid,9152,9.121727714,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-03,2,dominicks,1344,7.2034055210000005,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-10,2,tropicana,17920,9.793672686,0,2.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-10,2,minute.maid,4160,8.333270353,0,2.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1991-01-10,2,dominicks,111680,11.62339292,1,0.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-17,2,tropicana,9408,9.14931567,0,2.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-17,2,minute.maid,10176,9.227787286,0,2.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-17,2,dominicks,1856,7.526178913,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-24,2,tropicana,6272,8.743850562,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-24,2,minute.maid,29056,10.27698028,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-24,2,dominicks,5568,8.624791202,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-31,2,tropicana,6912,8.841014311,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-31,2,minute.maid,7104,8.868413285,0,2.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-01-31,2,dominicks,32064,10.37548918,1,1.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-07,2,tropicana,16768,9.727227587,0,2.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-07,2,dominicks,4352,8.378390789,0,1.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-07,2,minute.maid,7488,8.921057017999999,0,2.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-14,2,dominicks,704,6.556778356000001,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-14,2,minute.maid,4224,8.348537825,0,2.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-14,2,tropicana,6272,8.743850562,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-21,2,tropicana,7936,8.979164649,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-21,2,minute.maid,8960,9.100525506,0,2.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1991-02-21,2,dominicks,13760,9.529521112000001,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-28,2,tropicana,6144,8.723231275,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-28,2,minute.maid,22464,10.01966931,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-02-28,2,dominicks,43328,10.67655436,1,1.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-07,2,tropicana,7936,8.979164649,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-07,2,minute.maid,3840,8.253227646000001,0,2.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-07,2,dominicks,57600,10.96127785,1,1.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-14,2,tropicana,7808,8.962904128,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-14,2,minute.maid,12992,9.472089062,0,2.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-14,2,dominicks,704,6.556778356000001,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-21,2,tropicana,6080,8.712759975,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-21,2,minute.maid,70144,11.15830555,1,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-21,2,dominicks,6016,8.702177866,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-28,2,tropicana,42176,10.64960662,1,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-28,2,dominicks,10368,9.246479419,1,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-03-28,2,minute.maid,21248,9.964018052,0,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-04,2,dominicks,12608,9.442086812000001,0,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-04,2,minute.maid,5696,8.647519453,1,2.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1991-04-04,2,tropicana,4928,8.502688505,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-11,2,tropicana,29504,10.29228113,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-11,2,minute.maid,7680,8.946374826,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-11,2,dominicks,6336,8.754002933999999,0,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-18,2,tropicana,9984,9.208739091,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-18,2,minute.maid,6336,8.754002933999999,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-18,2,dominicks,140736,11.85464107,1,0.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-25,2,tropicana,35200,10.46880136,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-25,2,dominicks,960,6.866933285,1,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-04-25,2,minute.maid,8576,9.056722882999999,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-02,2,dominicks,1216,7.103322062999999,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-02,2,minute.maid,15104,9.622714887999999,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-02,2,tropicana,23936,10.08313888,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-09,2,tropicana,7104,8.868413285,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-09,2,minute.maid,76480,11.24478455,1,1.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-09,2,dominicks,1664,7.416979621,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-16,2,dominicks,4992,8.51559191,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-16,2,minute.maid,5056,8.528330936,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1991-05-16,2,tropicana,24512,10.10691807,1,2.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-23,2,tropicana,6336,8.754002933999999,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-23,2,minute.maid,4736,8.462948177000001,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-23,2,dominicks,27968,10.23881628,1,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-30,2,dominicks,12160,9.405907156,0,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-30,2,minute.maid,4480,8.407378325,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-05-30,2,tropicana,6080,8.712759975,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-06,2,tropicana,33536,10.42037477,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-06,2,minute.maid,4032,8.30201781,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-06,2,dominicks,2240,7.714231145,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-13,2,dominicks,5504,8.61323038,1,1.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-13,2,minute.maid,14784,9.601300794,1,1.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-13,2,tropicana,13248,9.491601877,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-20,2,tropicana,6208,8.733594062,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-20,2,dominicks,8832,9.086136769,0,1.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-20,2,minute.maid,12096,9.400630097999999,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-27,2,dominicks,2624,7.87245515,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-06-27,2,minute.maid,41792,10.64046021,1,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1991-06-27,2,tropicana,10624,9.270870872,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-07-04,2,tropicana,44672,10.70710219,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-07-04,2,minute.maid,10560,9.264828557000001,0,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-07-04,2,dominicks,10432,9.252633284,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-07-18,2,tropicana,20096,9.908276069,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-07-18,2,dominicks,8320,9.026417534,0,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-07-18,2,minute.maid,4224,8.348537825,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-07-25,2,dominicks,6784,8.822322178,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-07-25,2,minute.maid,2880,7.965545572999999,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-07-25,2,tropicana,9152,9.121727714,1,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-08-01,2,tropicana,21952,9.996613531,0,2.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-08-01,2,minute.maid,3968,8.286017467999999,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-08-01,2,dominicks,60544,11.01112565,1,0.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-08-08,2,dominicks,20608,9.933434629,0,0.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-08-08,2,minute.maid,3712,8.219326094,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-08-08,2,tropicana,13568,9.515469357999999,0,2.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-08-29,2,tropicana,4160,8.333270353,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-08-29,2,minute.maid,2816,7.943072717000001,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1991-08-29,2,dominicks,16064,9.684336023,0,1.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-05,2,tropicana,39424,10.58213005,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-05,2,minute.maid,4288,8.363575702999999,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-05,2,dominicks,12480,9.431882642,0,1.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-12,2,tropicana,5632,8.636219898,0,3.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-12,2,minute.maid,18240,9.811372264,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-12,2,dominicks,17024,9.742379392,0,1.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-19,2,dominicks,13440,9.505990614,1,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-19,2,minute.maid,7360,8.903815212,0,1.95,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-19,2,tropicana,9024,9.107642974,1,2.68,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-26,2,tropicana,6016,8.702177866,0,3.44,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-26,2,minute.maid,7808,8.962904128,0,1.83,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-09-26,2,dominicks,10112,9.221478116,0,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-03,2,dominicks,9088,9.114710141,0,1.56,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-03,2,minute.maid,13504,9.510741217,0,1.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-03,2,tropicana,7744,8.954673629,0,3.14,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-10,2,tropicana,6784,8.822322178,0,3.07,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-10,2,dominicks,22848,10.03661887,1,1.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1991-10-10,2,minute.maid,10048,9.215128888999999,0,1.91,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-17,2,dominicks,6976,8.850230966,0,1.65,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-17,2,minute.maid,135936,11.81993947,1,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-17,2,tropicana,6784,8.822322178,0,3.07,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-24,2,tropicana,6272,8.743850562,0,3.07,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-24,2,minute.maid,5056,8.528330936,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-24,2,dominicks,4160,8.333270353,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-31,2,tropicana,5312,8.577723691000001,0,3.07,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-31,2,minute.maid,27968,10.23881628,0,1.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-10-31,2,dominicks,3328,8.110126802,0,1.83,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-07,2,tropicana,9216,9.128696383,0,3.11,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-07,2,minute.maid,4736,8.462948177000001,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-07,2,dominicks,12096,9.400630097999999,1,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-14,2,tropicana,7296,8.895081532,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-14,2,minute.maid,7808,8.962904128,0,2.14,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-14,2,dominicks,6208,8.733594062,0,1.76,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-21,2,tropicana,34240,10.44114983,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-21,2,minute.maid,12480,9.431882642,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1991-11-21,2,dominicks,3008,8.009030685,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-28,2,dominicks,19456,9.875910785,1,1.5,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-28,2,minute.maid,9664,9.17616292,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-11-28,2,tropicana,7168,8.877381955,0,2.64,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-05,2,minute.maid,7168,8.877381955,0,2.06,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-05,2,dominicks,16768,9.727227587,0,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-05,2,tropicana,6080,8.712759975,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-12,2,dominicks,13568,9.515469357999999,1,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-12,2,minute.maid,4480,8.407378325,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-12,2,tropicana,5120,8.540909718,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-19,2,tropicana,8320,9.026417534,0,2.74,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-19,2,minute.maid,5952,8.691482577,0,2.22,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-19,2,dominicks,6080,8.712759975,0,1.61,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-26,2,dominicks,10432,9.252633284,1,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-26,2,minute.maid,21696,9.984883191,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1991-12-26,2,tropicana,17728,9.78290059,0,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-02,2,minute.maid,12032,9.395325046,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-02,2,dominicks,11712,9.368369236,0,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1992-01-02,2,tropicana,13120,9.481893063,0,2.35,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-09,2,dominicks,4032,8.30201781,0,1.76,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-09,2,minute.maid,7040,8.859363449,0,2.12,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-09,2,tropicana,13120,9.481893063,0,2.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-16,2,dominicks,6336,8.754002933999999,0,1.82,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-16,2,tropicana,9792,9.189321005,0,2.43,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-16,2,minute.maid,10240,9.234056899,1,2.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-23,2,tropicana,3520,8.166216269,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-23,2,minute.maid,6848,8.831711918,1,2.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-23,2,dominicks,13632,9.520175249,0,1.47,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-30,2,tropicana,5504,8.61323038,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-30,2,minute.maid,3968,8.286017467999999,0,2.61,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-01-30,2,dominicks,45120,10.71708089,0,1.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-06,2,tropicana,6720,8.812843434,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-06,2,minute.maid,5888,8.68067166,0,2.26,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-06,2,dominicks,9984,9.208739091,0,1.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-13,2,tropicana,20224,9.914625297,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-13,2,dominicks,4800,8.476371197,0,1.82,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1992-02-13,2,minute.maid,6208,8.733594062,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-20,2,dominicks,11776,9.373818841,0,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-20,2,minute.maid,72256,11.18797065,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-20,2,tropicana,5056,8.528330936,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-27,2,tropicana,43584,10.68244539,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-27,2,minute.maid,11520,9.351839934,0,2.11,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-02-27,2,dominicks,11584,9.357380115,0,1.54,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-05,2,tropicana,25728,10.15533517,0,1.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-05,2,minute.maid,5824,8.66974259,0,2.35,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-05,2,dominicks,51264,10.84474403,1,1.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-12,2,tropicana,31808,10.36747311,0,1.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-12,2,minute.maid,19392,9.872615889,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-12,2,dominicks,14976,9.614204199,0,1.44,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-19,2,tropicana,20736,9.939626599,0,1.91,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-19,2,minute.maid,9536,9.162829389,0,2.1,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-19,2,dominicks,30784,10.33475035,0,1.59,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-26,2,tropicana,15168,9.626943225,0,2.81,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-03-26,2,minute.maid,5312,8.577723691000001,0,2.28,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1992-03-26,2,dominicks,12480,9.431882642,0,1.6,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-02,2,tropicana,28096,10.2433825,1,2.5,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-02,2,dominicks,3264,8.090708716,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-02,2,minute.maid,14528,9.583833101,1,1.9,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-09,2,dominicks,8768,9.078864009,0,1.48,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-09,2,minute.maid,12416,9.426741242,0,2.12,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-09,2,tropicana,12416,9.426741242,0,2.58,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-16,2,tropicana,5376,8.589699882,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-16,2,minute.maid,5376,8.589699882,0,2.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-16,2,dominicks,70848,11.16829202,1,1.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-23,2,tropicana,9792,9.189321005,0,2.67,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-23,2,minute.maid,19008,9.852615222,1,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-23,2,dominicks,18560,9.828764006,0,1.42,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-30,2,tropicana,16960,9.738612909,1,2.39,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-30,2,minute.maid,3904,8.269756948,0,2.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-04-30,2,dominicks,9152,9.121727714,0,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-07,2,tropicana,8320,9.026417534,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-07,2,minute.maid,6336,8.754002933999999,0,2.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1992-05-07,2,dominicks,9600,9.169518378,0,2.0,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-14,2,tropicana,6912,8.841014311,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-14,2,minute.maid,5440,8.60153434,0,2.79,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-14,2,dominicks,4800,8.476371197,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-21,2,tropicana,6976,8.850230966,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-21,2,minute.maid,22400,10.01681624,1,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-21,2,dominicks,9664,9.17616292,0,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-28,2,minute.maid,3968,8.286017467999999,0,2.84,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-28,2,tropicana,7232,8.886270902,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-05-28,2,dominicks,45568,10.726961,0,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-04,2,tropicana,51520,10.84972536,1,2.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-04,2,minute.maid,3264,8.090708716,0,2.89,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-04,2,dominicks,20992,9.951896692,0,1.74,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-11,2,minute.maid,4352,8.378390789,0,2.89,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-11,2,tropicana,22272,10.01108556,0,2.21,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-11,2,dominicks,6592,8.793612072,0,2.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-18,2,dominicks,4992,8.51559191,0,2.05,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-18,2,minute.maid,4480,8.407378325,0,2.89,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1992-06-18,2,tropicana,46144,10.73952222,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-25,2,tropicana,4352,8.378390789,1,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-25,2,minute.maid,3840,8.253227646000001,0,2.52,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-06-25,2,dominicks,8064,8.99516499,0,1.24,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-02,2,tropicana,17280,9.757305042,0,2.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-02,2,minute.maid,13312,9.496421162999999,1,2.0,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-02,2,dominicks,7360,8.903815212,0,1.61,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-09,2,tropicana,5696,8.647519453,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-09,2,minute.maid,3776,8.236420527,1,2.33,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-09,2,dominicks,10048,9.215128888999999,0,1.4,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-16,2,tropicana,6848,8.831711918,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-16,2,dominicks,10112,9.221478116,0,1.91,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-16,2,minute.maid,4800,8.476371197,0,2.89,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-23,2,dominicks,9152,9.121727714,0,1.69,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-23,2,minute.maid,24960,10.12502982,1,2.29,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-23,2,tropicana,4416,8.392989587999999,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-30,2,tropicana,4672,8.449342525,0,3.16,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-07-30,2,minute.maid,4544,8.42156296,0,2.86,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 
-1992-07-30,2,dominicks,36288,10.49924239,1,1.49,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-06,2,tropicana,7168,8.877381955,1,3.09,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-06,2,minute.maid,3968,8.286017467999999,1,2.81,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-06,2,dominicks,3776,8.236420527,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-13,2,tropicana,5056,8.528330936,0,3.19,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-13,2,dominicks,3328,8.110126802,0,1.97,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-13,2,minute.maid,49600,10.81174611,1,1.99,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1992-08-20,2,dominicks,13824,9.534161491,0,1.36,0.232864734,0.248934934,10.55320518,0.463887065,0.103953406,0.114279949,0.303585347,2.110122129,1.142857143,1.927279669,0.37692661299999997 -1990-06-14,5,dominicks,1792,7.491087594,1,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-06-14,5,minute.maid,4224,8.348537825,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-06-14,5,tropicana,5888,8.68067166,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-06-28,5,minute.maid,4352,8.378390789,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-06-28,5,dominicks,2496,7.82244473,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-06-28,5,tropicana,6976,8.850230966,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-07-05,5,dominicks,2944,7.98752448,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-07-05,5,minute.maid,4928,8.502688505,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-07-05,5,tropicana,6528,8.783855897,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-07-12,5,dominicks,1024,6.931471806,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-07-12,5,minute.maid,31168,10.34714721,1,2.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1990-07-12,5,tropicana,4928,8.502688505,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-07-26,5,dominicks,4224,8.348537825,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-07-26,5,minute.maid,10048,9.215128888999999,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-07-26,5,tropicana,5312,8.577723691000001,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-02,5,minute.maid,21760,9.987828701,1,2.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-02,5,tropicana,5120,8.540909718,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-02,5,dominicks,4544,8.42156296,1,2.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-09,5,dominicks,1728,7.454719948999999,0,2.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-09,5,minute.maid,4544,8.42156296,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-09,5,tropicana,7936,8.979164649,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-16,5,tropicana,6080,8.712759975,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-16,5,minute.maid,52224,10.86329744,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-16,5,dominicks,1216,7.103322062999999,0,2.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-23,5,dominicks,1152,7.049254841000001,0,2.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-23,5,minute.maid,3584,8.184234774,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-23,5,tropicana,4160,8.333270353,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-30,5,minute.maid,5120,8.540909718,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-30,5,tropicana,5888,8.68067166,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-08-30,5,dominicks,30144,10.31374118,1,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1990-09-06,5,dominicks,8960,9.100525506,0,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-06,5,minute.maid,4416,8.392989587999999,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-06,5,tropicana,9536,9.162829389,0,3.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-13,5,tropicana,8320,9.026417534,0,3.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-13,5,dominicks,8192,9.010913347,0,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-13,5,minute.maid,30208,10.31586207,1,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-20,5,dominicks,6528,8.783855897,0,1.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-20,5,minute.maid,4160,8.333270353,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-20,5,tropicana,8000,8.987196821,0,3.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-27,5,dominicks,34688,10.45414909,1,1.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-27,5,minute.maid,4992,8.51559191,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-09-27,5,tropicana,5824,8.66974259,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-04,5,dominicks,4672,8.449342525,0,1.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-04,5,minute.maid,13952,9.543378146,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-04,5,tropicana,10624,9.270870872,1,3.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-11,5,tropicana,6656,8.803273982999999,0,3.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-11,5,dominicks,1088,6.992096427000001,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-11,5,minute.maid,47680,10.772267300000001,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-18,5,tropicana,5184,8.553332238,0,3.51,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1990-10-18,5,minute.maid,7616,8.938006577000001,0,2.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-18,5,dominicks,69440,11.14821835,1,1.24,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-25,5,tropicana,4928,8.502688505,0,3.51,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-25,5,minute.maid,8896,9.093357017,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-10-25,5,dominicks,1280,7.154615357000001,0,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-01,5,tropicana,5888,8.68067166,0,3.51,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-01,5,minute.maid,28544,10.25920204,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-01,5,dominicks,35456,10.47604777,1,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-08,5,tropicana,5312,8.577723691000001,0,3.51,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-08,5,dominicks,13824,9.534161491,0,1.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-08,5,minute.maid,5440,8.60153434,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-15,5,tropicana,9984,9.208739091,0,3.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-15,5,minute.maid,52416,10.86696717,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-15,5,dominicks,14208,9.561560465,0,0.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-22,5,tropicana,8448,9.041685006,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-22,5,dominicks,29312,10.28575227,1,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-22,5,minute.maid,11712,9.368369236,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-29,5,tropicana,10880,9.29468152,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-11-29,5,minute.maid,13952,9.543378146,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1990-11-29,5,dominicks,52992,10.87789624,1,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-06,5,dominicks,15680,9.660141293999999,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-06,5,minute.maid,36160,10.49570882,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-06,5,tropicana,5696,8.647519453,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-13,5,tropicana,5696,8.647519453,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-13,5,minute.maid,12864,9.462187991,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-13,5,dominicks,43520,10.68097588,1,1.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-20,5,tropicana,32384,10.38541975,0,2.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-20,5,minute.maid,22208,10.00820786,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-20,5,dominicks,3904,8.269756948,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-27,5,tropicana,10752,9.282847063,0,2.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-27,5,minute.maid,9984,9.208739091,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-12-27,5,dominicks,896,6.797940412999999,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-03,5,tropicana,6912,8.841014311,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-03,5,minute.maid,14016,9.547954812999999,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-03,5,dominicks,2240,7.714231145,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-10,5,tropicana,13440,9.505990614,0,2.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-10,5,minute.maid,6080,8.712759975,0,2.46,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-10,5,dominicks,125760,11.74213061,1,0.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1991-01-17,5,tropicana,7808,8.962904128,0,2.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-17,5,minute.maid,7808,8.962904128,0,2.46,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-17,5,dominicks,1408,7.249925537,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-24,5,tropicana,5248,8.565602331000001,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-24,5,minute.maid,40896,10.61878754,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-24,5,dominicks,7232,8.886270902,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-31,5,tropicana,6208,8.733594062,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-31,5,minute.maid,6272,8.743850562,0,2.46,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-01-31,5,dominicks,41216,10.62658181,1,1.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-07,5,tropicana,21440,9.973013615,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-07,5,minute.maid,7872,8.971067439,0,2.41,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-07,5,dominicks,9024,9.107642974,0,1.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-14,5,dominicks,1600,7.377758908,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-14,5,tropicana,7360,8.903815212,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-14,5,minute.maid,6144,8.723231275,0,2.41,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-21,5,tropicana,6720,8.812843434,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-21,5,minute.maid,8448,9.041685006,0,2.41,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-21,5,dominicks,2496,7.82244473,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-28,5,tropicana,6656,8.803273982999999,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1991-02-28,5,minute.maid,18688,9.835636886,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-02-28,5,dominicks,6336,8.754002933999999,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-07,5,tropicana,6016,8.702177866,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-07,5,minute.maid,6272,8.743850562,0,2.46,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-07,5,dominicks,56384,10.93994071,1,1.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-14,5,tropicana,6144,8.723231275,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-14,5,minute.maid,12096,9.400630097999999,0,2.46,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-14,5,dominicks,1600,7.377758908,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-21,5,tropicana,4928,8.502688505,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-21,5,minute.maid,73216,11.20116926,1,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-21,5,dominicks,2944,7.98752448,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-28,5,tropicana,67712,11.1230187,1,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-28,5,minute.maid,18944,9.849242538,0,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-03-28,5,dominicks,13504,9.510741217,1,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-04,5,dominicks,5376,8.589699882,0,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-04,5,tropicana,8640,9.064157862,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-04,5,minute.maid,6400,8.764053269,1,2.46,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-11,5,tropicana,35520,10.477851199999998,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-11,5,minute.maid,8640,9.064157862,0,2.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1991-04-11,5,dominicks,6656,8.803273982999999,0,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-18,5,tropicana,9664,9.17616292,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-18,5,minute.maid,7296,8.895081532,0,2.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-18,5,dominicks,95680,11.46876457,1,0.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-25,5,tropicana,49088,10.80136989,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-25,5,minute.maid,12480,9.431882642,0,2.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-04-25,5,dominicks,896,6.797940412999999,1,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-02,5,dominicks,1728,7.454719948999999,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-02,5,minute.maid,14144,9.557045785,0,2.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-02,5,tropicana,14912,9.609921537,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-09,5,minute.maid,88256,11.38799696,1,1.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-09,5,tropicana,6464,8.774003599999999,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-09,5,dominicks,1280,7.154615357000001,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-16,5,dominicks,5696,8.647519453,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-16,5,minute.maid,6848,8.831711918,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-16,5,tropicana,25024,10.12759064,1,2.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-23,5,minute.maid,7808,8.962904128,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-23,5,tropicana,6272,8.743850562,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-23,5,dominicks,28288,10.25019297,1,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1991-05-30,5,dominicks,4864,8.489616424,0,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-30,5,minute.maid,6272,8.743850562,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-05-30,5,tropicana,5056,8.528330936,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-06,5,minute.maid,6144,8.723231275,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-06,5,tropicana,47616,10.77092412,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-06,5,dominicks,2880,7.965545572999999,0,2.09,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-13,5,dominicks,5760,8.658692754,1,1.41,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-13,5,minute.maid,27776,10.23192762,1,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-13,5,tropicana,13888,9.538780437,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-20,5,tropicana,6144,8.723231275,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-20,5,minute.maid,20800,9.942708266,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-20,5,dominicks,15040,9.618468598,0,1.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-27,5,dominicks,5120,8.540909718,0,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-27,5,minute.maid,45696,10.72976605,1,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-06-27,5,tropicana,9344,9.142489705,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-04,5,minute.maid,14336,9.570529135,0,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-04,5,tropicana,32896,10.40110635,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-04,5,dominicks,3264,8.090708716,0,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-11,5,dominicks,9536,9.162829389,1,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1991-07-11,5,minute.maid,4928,8.502688505,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-11,5,tropicana,21056,9.954940834,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-18,5,tropicana,15360,9.639522007,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-18,5,minute.maid,4608,8.435549202,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-18,5,dominicks,6208,8.733594062,0,1.59,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-25,5,dominicks,6592,8.793612072,0,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-25,5,tropicana,8000,8.987196821,1,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-07-25,5,minute.maid,5248,8.565602331000001,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-01,5,tropicana,21120,9.957975738,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-01,5,dominicks,63552,11.05961375,1,0.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-01,5,minute.maid,4224,8.348537825,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-08,5,dominicks,27968,10.23881628,0,0.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-08,5,minute.maid,4288,8.363575702999999,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-08,5,tropicana,11904,9.384629757,0,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-15,5,minute.maid,16896,9.734832187,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-15,5,tropicana,5056,8.528330936,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-15,5,dominicks,21760,9.987828701,1,1.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-22,5,dominicks,2688,7.896552702,0,1.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-22,5,minute.maid,77184,11.25394746,1,1.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1991-08-22,5,tropicana,4608,8.435549202,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-29,5,tropicana,6016,8.702177866,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-29,5,minute.maid,5184,8.553332238,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-08-29,5,dominicks,10432,9.252633284,0,1.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-09-05,5,tropicana,50752,10.83470631,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-09-05,5,minute.maid,5248,8.565602331000001,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-09-05,5,dominicks,9792,9.189321005,0,1.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-09-12,5,minute.maid,20672,9.936535407000001,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-09-12,5,tropicana,5632,8.636219898,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-09-12,5,dominicks,8448,9.041685006,0,1.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-09-26,5,tropicana,6400,8.764053269,0,3.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-09-26,5,dominicks,6912,8.841014311,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-09-26,5,minute.maid,12352,9.421573272,0,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-03,5,dominicks,8256,9.018695487999999,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-03,5,minute.maid,12032,9.395325046,0,1.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-03,5,tropicana,5440,8.60153434,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-10,5,minute.maid,13440,9.505990614,0,1.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-10,5,dominicks,28672,10.26367632,1,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-10,5,tropicana,8128,9.00307017,0,2.94,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1991-10-24,5,tropicana,7232,8.886270902,0,2.94,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-24,5,minute.maid,5824,8.66974259,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-24,5,dominicks,4416,8.392989587999999,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-31,5,tropicana,7168,8.877381955,0,2.94,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-31,5,minute.maid,50112,10.82201578,0,1.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-10-31,5,dominicks,1856,7.526178913,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-07,5,minute.maid,5184,8.553332238,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-07,5,tropicana,7872,8.971067439,0,2.94,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-07,5,dominicks,6528,8.783855897,1,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-14,5,tropicana,7552,8.929567707999999,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-14,5,minute.maid,8384,9.034080407000001,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-14,5,dominicks,6080,8.712759975,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-21,5,tropicana,69504,11.14913958,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-21,5,dominicks,3456,8.14786713,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-21,5,minute.maid,10112,9.221478116,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-28,5,dominicks,25856,10.16029796,1,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-28,5,minute.maid,8384,9.034080407000001,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-11-28,5,tropicana,8960,9.100525506,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-05,5,tropicana,6912,8.841014311,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1991-12-05,5,dominicks,25728,10.15533517,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-05,5,minute.maid,11456,9.346268889,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-12,5,dominicks,23552,10.06696602,1,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-12,5,minute.maid,5952,8.691482577,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-12,5,tropicana,6656,8.803273982999999,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-19,5,tropicana,8192,9.010913347,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-19,5,dominicks,2944,7.98752448,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-19,5,minute.maid,8512,9.049232212,0,2.26,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-26,5,dominicks,5888,8.68067166,1,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-26,5,minute.maid,27968,10.23881628,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1991-12-26,5,tropicana,13440,9.505990614,0,2.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-02,5,tropicana,12160,9.405907156,0,2.39,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-02,5,dominicks,6848,8.831711918,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-02,5,minute.maid,24000,10.08580911,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-09,5,dominicks,1792,7.491087594,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-09,5,minute.maid,6848,8.831711918,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-09,5,tropicana,11840,9.379238908,0,2.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-16,5,tropicana,8640,9.064157862,0,2.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-16,5,dominicks,5248,8.565602331000001,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1992-01-16,5,minute.maid,15104,9.622714887999999,1,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-23,5,tropicana,5888,8.68067166,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-23,5,minute.maid,11392,9.340666634,1,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-23,5,dominicks,16768,9.727227587,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-30,5,tropicana,7424,8.912473275,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-30,5,minute.maid,5824,8.66974259,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-01-30,5,dominicks,52160,10.8620712,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-06,5,tropicana,5632,8.636219898,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-06,5,minute.maid,7488,8.921057017999999,0,2.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-06,5,dominicks,16640,9.719564714,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-13,5,tropicana,33600,10.42228135,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-13,5,minute.maid,8320,9.026417534,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-13,5,dominicks,1344,7.2034055210000005,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-20,5,dominicks,4608,8.435549202,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-20,5,tropicana,5376,8.589699882,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-20,5,minute.maid,99904,11.511965,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-27,5,tropicana,54272,10.90176372,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-27,5,minute.maid,6976,8.850230966,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-02-27,5,dominicks,12672,9.447150114,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1992-03-05,5,tropicana,33600,10.42228135,0,1.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-05,5,minute.maid,9984,9.208739091,0,2.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-05,5,dominicks,48640,10.79220152,1,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-12,5,tropicana,24448,10.10430369,0,1.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-12,5,minute.maid,32832,10.39915893,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-12,5,dominicks,13248,9.491601877,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-19,5,tropicana,22784,10.03381381,0,1.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-19,5,minute.maid,8128,9.00307017,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-19,5,dominicks,29248,10.28356647,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-26,5,tropicana,19008,9.852615222,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-26,5,minute.maid,6464,8.774003599999999,0,2.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-03-26,5,dominicks,4608,8.435549202,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-02,5,tropicana,15808,9.66827142,1,2.5,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-02,5,minute.maid,36800,10.51325312,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-02,5,dominicks,3136,8.050703382,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-09,5,dominicks,13184,9.486759252,0,1.58,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-09,5,tropicana,14144,9.557045785,0,2.5,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-09,5,minute.maid,12928,9.467150781,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-16,5,tropicana,9600,9.169518378,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1992-04-16,5,minute.maid,7424,8.912473275,0,2.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-16,5,dominicks,67712,11.1230187,1,1.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-23,5,tropicana,10112,9.221478116,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-23,5,minute.maid,34176,10.43927892,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-23,5,dominicks,18880,9.84585844,0,1.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-30,5,minute.maid,4160,8.333270353,0,2.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-30,5,tropicana,31872,10.36948316,1,2.24,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-04-30,5,dominicks,6208,8.733594062,0,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-07,5,tropicana,9280,9.135616826,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-07,5,minute.maid,5952,8.691482577,0,2.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-07,5,dominicks,5952,8.691482577,0,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-14,5,tropicana,7680,8.946374826,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-14,5,minute.maid,6528,8.783855897,0,2.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-14,5,dominicks,4160,8.333270353,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-21,5,tropicana,8704,9.071537969,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-21,5,minute.maid,30656,10.33058368,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-21,5,dominicks,23488,10.06424493,0,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-28,5,tropicana,9920,9.2023082,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-05-28,5,dominicks,60480,11.01006801,0,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1992-05-28,5,minute.maid,6656,8.803273982999999,0,2.66,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-06-04,5,tropicana,91968,11.42919597,1,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-06-04,5,minute.maid,4416,8.392989587999999,0,2.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-06-04,5,dominicks,20416,9.924074186,0,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-06-11,5,tropicana,44096,10.69412435,0,2.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-06-11,5,dominicks,6336,8.754002933999999,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-06-11,5,minute.maid,5696,8.647519453,0,2.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-06-25,5,minute.maid,5696,8.647519453,0,2.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-06-25,5,tropicana,7296,8.895081532,1,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-06-25,5,dominicks,1408,7.249925537,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-02,5,tropicana,12928,9.467150781,0,2.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-02,5,minute.maid,39680,10.58860256,1,2.01,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-02,5,dominicks,4672,8.449342525,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-09,5,tropicana,6848,8.831711918,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-09,5,minute.maid,6208,8.733594062,1,2.19,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-09,5,dominicks,19520,9.87919486,0,1.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-16,5,tropicana,8064,8.99516499,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-16,5,minute.maid,7872,8.971067439,0,2.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-16,5,dominicks,7872,8.971067439,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 
-1992-07-23,5,dominicks,5184,8.553332238,0,1.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-23,5,tropicana,4992,8.51559191,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-23,5,minute.maid,54528,10.90646961,1,2.29,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-30,5,tropicana,7360,8.903815212,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-30,5,minute.maid,6400,8.764053269,0,2.69,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-07-30,5,dominicks,42240,10.65112292,1,1.49,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-06,5,tropicana,8384,9.034080407000001,1,2.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-06,5,minute.maid,5888,8.68067166,1,2.65,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-06,5,dominicks,6592,8.793612072,1,1.89,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-13,5,tropicana,8832,9.086136769,0,2.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-13,5,minute.maid,56384,10.93994071,1,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-13,5,dominicks,2112,7.655390645,0,1.99,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1992-08-20,5,dominicks,21248,9.964018052,0,1.79,0.117368032,0.32122573,10.92237097,0.535883355,0.103091585,0.053875277,0.410568032,3.801997814,0.681818182,1.600573425,0.736306837 -1990-06-14,8,dominicks,14336,9.570529135,1,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-06-14,8,minute.maid,6080,8.712759975,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-06-14,8,tropicana,8896,9.093357017,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-06-21,8,dominicks,6400,8.764053269,0,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-06-21,8,minute.maid,51968,10.85838342,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-06-21,8,tropicana,7296,8.895081532,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-06-28,8,tropicana,10368,9.246479419,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1990-06-28,8,minute.maid,4928,8.502688505,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-06-28,8,dominicks,3968,8.286017467999999,0,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-05,8,dominicks,4352,8.378390789,0,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-05,8,minute.maid,5312,8.577723691000001,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-05,8,tropicana,6976,8.850230966,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-12,8,tropicana,6464,8.774003599999999,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-12,8,dominicks,3520,8.166216269,0,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-12,8,minute.maid,39424,10.58213005,1,2.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-19,8,tropicana,8192,9.010913347,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-19,8,dominicks,6464,8.774003599999999,0,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-19,8,minute.maid,5568,8.624791202,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-26,8,dominicks,5952,8.691482577,0,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-26,8,minute.maid,14592,9.588228712000001,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-07-26,8,tropicana,7936,8.979164649,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-02,8,tropicana,6656,8.803273982999999,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-02,8,minute.maid,22208,10.00820786,1,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-02,8,dominicks,8832,9.086136769,1,2.09,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-09,8,dominicks,7232,8.886270902,0,2.09,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-09,8,minute.maid,5760,8.658692754,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-09,8,tropicana,8256,9.018695487999999,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1990-08-16,8,tropicana,5568,8.624791202,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-16,8,minute.maid,54016,10.89703558,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-16,8,dominicks,5504,8.61323038,0,2.09,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-23,8,dominicks,4800,8.476371197,0,2.09,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-23,8,minute.maid,5824,8.66974259,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-23,8,tropicana,7488,8.921057017999999,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-30,8,tropicana,6144,8.723231275,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-30,8,minute.maid,6528,8.783855897,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-08-30,8,dominicks,52672,10.87183928,1,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-06,8,dominicks,16448,9.707959168,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-06,8,minute.maid,5440,8.60153434,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-06,8,tropicana,11008,9.30637756,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-13,8,minute.maid,36544,10.50627229,1,2.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-13,8,dominicks,19072,9.85597657,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-13,8,tropicana,5760,8.658692754,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-20,8,dominicks,13376,9.501217335,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-20,8,minute.maid,3776,8.236420527,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-20,8,tropicana,10112,9.221478116,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-27,8,tropicana,8448,9.041685006,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-09-27,8,minute.maid,5504,8.61323038,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1990-09-27,8,dominicks,61440,11.02581637,1,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-04,8,tropicana,8448,9.041685006,1,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-04,8,dominicks,13760,9.529521112000001,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-04,8,minute.maid,12416,9.426741242,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-11,8,minute.maid,53696,10.89109379,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-11,8,dominicks,3136,8.050703382,0,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-11,8,tropicana,7424,8.912473275,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-18,8,tropicana,5824,8.66974259,0,3.04,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-18,8,minute.maid,5696,8.647519453,0,2.51,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-18,8,dominicks,186176,12.13444774,1,1.14,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-25,8,tropicana,6656,8.803273982999999,0,3.04,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-25,8,minute.maid,4864,8.489616424,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-10-25,8,dominicks,3712,8.219326094,0,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-01,8,tropicana,6272,8.743850562,0,3.04,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-01,8,minute.maid,37184,10.52363384,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-01,8,dominicks,35776,10.48503256,1,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-08,8,tropicana,6912,8.841014311,0,3.04,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-08,8,minute.maid,5504,8.61323038,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-08,8,dominicks,26880,10.1991378,0,1.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-15,8,tropicana,10496,9.258749511,0,3.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1990-11-15,8,minute.maid,51008,10.83973776,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-15,8,dominicks,71680,11.17996705,0,0.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-22,8,tropicana,11840,9.379238908,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-22,8,minute.maid,11072,9.312174678,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-22,8,dominicks,25088,10.13014492,1,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-29,8,tropicana,9664,9.17616292,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-29,8,minute.maid,12160,9.405907156,0,2.62,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-11-29,8,dominicks,91456,11.42361326,1,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-06,8,minute.maid,30528,10.32639957,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-06,8,dominicks,23808,10.07777694,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-06,8,tropicana,6272,8.743850562,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-13,8,dominicks,89856,11.40596367,1,1.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-13,8,minute.maid,12096,9.400630097999999,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-13,8,tropicana,7168,8.877381955,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-20,8,minute.maid,16448,9.707959168,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-20,8,dominicks,12224,9.411156511,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-20,8,tropicana,29504,10.29228113,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-27,8,minute.maid,9344,9.142489705,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-27,8,dominicks,3776,8.236420527,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1990-12-27,8,tropicana,8704,9.071537969,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1991-01-03,8,tropicana,9280,9.135616826,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-03,8,minute.maid,16128,9.688312171,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-03,8,dominicks,13824,9.534161491,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-10,8,minute.maid,5376,8.589699882,0,2.17,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-10,8,dominicks,251072,12.43349503,1,0.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-10,8,tropicana,12224,9.411156511,0,2.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-17,8,minute.maid,6656,8.803273982999999,0,2.17,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-17,8,tropicana,10368,9.246479419,0,2.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-17,8,dominicks,4864,8.489616424,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-24,8,minute.maid,59712,10.99728828,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-24,8,dominicks,10176,9.227787286,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-24,8,tropicana,8128,9.00307017,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-31,8,tropicana,5952,8.691482577,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-31,8,minute.maid,9856,9.195835686,0,2.17,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-01-31,8,dominicks,105344,11.56498647,1,1.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-07,8,minute.maid,6720,8.812843434,0,2.12,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-07,8,dominicks,33600,10.42228135,0,1.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-07,8,tropicana,21696,9.984883191,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-14,8,dominicks,4736,8.462948177000001,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-14,8,minute.maid,4224,8.348537825,0,2.12,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1991-02-14,8,tropicana,7808,8.962904128,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-21,8,tropicana,8128,9.00307017,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-21,8,minute.maid,9728,9.182763604,0,2.12,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-21,8,dominicks,10304,9.240287448,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-28,8,tropicana,7424,8.912473275,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-28,8,minute.maid,40320,10.604602900000001,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-02-28,8,dominicks,5056,8.528330936,1,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-07,8,dominicks,179968,12.10053434,1,0.94,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-07,8,tropicana,5952,8.691482577,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-07,8,minute.maid,5120,8.540909718,0,2.17,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-14,8,minute.maid,19264,9.865993348,0,2.17,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-14,8,dominicks,4992,8.51559191,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-14,8,tropicana,7616,8.938006577000001,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-21,8,tropicana,5312,8.577723691000001,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-21,8,minute.maid,170432,12.04609167,1,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-21,8,dominicks,6400,8.764053269,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-28,8,minute.maid,39680,10.58860256,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-28,8,dominicks,14912,9.609921537,1,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-03-28,8,tropicana,161792,11.99406684,1,1.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-04,8,dominicks,34624,10.45230236,0,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1991-04-04,8,minute.maid,8128,9.00307017,1,2.17,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-04,8,tropicana,17280,9.757305042,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-11,8,tropicana,47040,10.75875358,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-11,8,minute.maid,9088,9.114710141,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-11,8,dominicks,10368,9.246479419,0,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-18,8,tropicana,14464,9.579418083,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-18,8,minute.maid,6720,8.812843434,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-18,8,dominicks,194880,12.18013926,1,0.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-25,8,tropicana,52928,10.87668778,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-25,8,dominicks,5696,8.647519453,1,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-04-25,8,minute.maid,7552,8.929567707999999,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-05-02,8,dominicks,7168,8.877381955,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-05-02,8,minute.maid,24768,10.11730778,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-05-02,8,tropicana,21184,9.961001459,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-05-09,8,tropicana,7360,8.903815212,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-05-09,8,minute.maid,183296,12.11885761,1,1.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-05-09,8,dominicks,2880,7.965545572999999,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-05-16,8,dominicks,12288,9.416378455,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-05-16,8,minute.maid,8896,9.093357017,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-05-16,8,tropicana,15744,9.664214619,1,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1991-06-06,8,dominicks,9280,9.135616826,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-06,8,tropicana,46912,10.75602879,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-06,8,minute.maid,6656,8.803273982999999,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-13,8,tropicana,18240,9.811372264,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-13,8,dominicks,25856,10.16029796,1,1.26,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-13,8,minute.maid,35456,10.47604777,1,1.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-20,8,dominicks,19264,9.865993348,0,1.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-20,8,minute.maid,17408,9.76468515,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-20,8,tropicana,6464,8.774003599999999,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-27,8,dominicks,6848,8.831711918,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-27,8,minute.maid,75520,11.2321528,1,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-06-27,8,tropicana,8512,9.049232212,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-04,8,tropicana,28416,10.25470765,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-04,8,minute.maid,21632,9.981928979,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-04,8,dominicks,12928,9.467150781,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-11,8,dominicks,44032,10.69267192,1,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-11,8,minute.maid,8384,9.034080407000001,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-11,8,tropicana,16960,9.738612909,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-18,8,minute.maid,9920,9.2023082,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-18,8,dominicks,25408,10.14281936,0,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1991-07-18,8,tropicana,8320,9.026417534,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-25,8,dominicks,38336,10.55414468,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-25,8,minute.maid,6592,8.793612072,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-07-25,8,tropicana,11136,9.317938383,1,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-01,8,tropicana,27712,10.22962081,0,2.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-01,8,minute.maid,7168,8.877381955,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-01,8,dominicks,152384,11.93415893,1,0.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-08,8,dominicks,54464,10.90529521,0,0.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-08,8,minute.maid,6208,8.733594062,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-08,8,tropicana,7744,8.954673629,0,2.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-15,8,minute.maid,30528,10.32639957,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-15,8,dominicks,47680,10.772267300000001,1,1.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-15,8,tropicana,5184,8.553332238,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-22,8,dominicks,14720,9.596962392,0,1.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-22,8,minute.maid,155840,11.95658512,1,1.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-22,8,tropicana,6272,8.743850562,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-29,8,tropicana,7744,8.954673629,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-29,8,dominicks,53248,10.88271552,0,1.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-08-29,8,minute.maid,10752,9.282847063,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-05,8,tropicana,53184,10.88151288,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1991-09-05,8,minute.maid,6976,8.850230966,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-05,8,dominicks,40576,10.61093204,0,1.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-12,8,dominicks,25856,10.16029796,0,1.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-12,8,tropicana,6784,8.822322178,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-12,8,minute.maid,31872,10.36948316,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-19,8,dominicks,24064,10.08847223,1,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-19,8,minute.maid,5312,8.577723691000001,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-19,8,tropicana,8000,8.987196821,1,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-26,8,tropicana,6592,8.793612072,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-26,8,minute.maid,33344,10.41463313,0,1.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-09-26,8,dominicks,15680,9.660141293999999,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-03,8,minute.maid,13504,9.510741217,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-03,8,dominicks,16576,9.715711145,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-03,8,tropicana,5248,8.565602331000001,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-10,8,dominicks,49664,10.8130356,1,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-10,8,tropicana,6592,8.793612072,0,2.94,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-10,8,minute.maid,13504,9.510741217,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-17,8,dominicks,10752,9.282847063,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-17,8,minute.maid,335808,12.72429485,1,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-17,8,tropicana,5888,8.68067166,0,2.94,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1991-10-24,8,tropicana,6336,8.754002933999999,0,2.94,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-24,8,dominicks,9792,9.189321005,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-24,8,minute.maid,13120,9.481893063,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-31,8,tropicana,5888,8.68067166,0,2.94,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-31,8,minute.maid,49664,10.8130356,0,1.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-10-31,8,dominicks,7104,8.868413285,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-07,8,dominicks,9216,9.128696383,1,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-07,8,tropicana,6080,8.712759975,0,2.94,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-07,8,minute.maid,10880,9.29468152,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-14,8,tropicana,6848,8.831711918,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-14,8,minute.maid,9984,9.208739091,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-14,8,dominicks,12608,9.442086812000001,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-21,8,tropicana,54016,10.89703558,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-21,8,minute.maid,9216,9.128696383,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-21,8,dominicks,16448,9.707959168,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-28,8,tropicana,10368,9.246479419,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-28,8,dominicks,27968,10.23881628,1,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-11-28,8,minute.maid,7680,8.946374826,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-05,8,minute.maid,7296,8.895081532,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-05,8,dominicks,37824,10.5406991,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1991-12-05,8,tropicana,5568,8.624791202,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-12,8,dominicks,33664,10.4241843,1,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-12,8,minute.maid,8192,9.010913347,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-12,8,tropicana,4864,8.489616424,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-19,8,tropicana,7232,8.886270902,0,2.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-19,8,minute.maid,6080,8.712759975,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-19,8,dominicks,17728,9.78290059,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-26,8,tropicana,15232,9.631153757,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-26,8,dominicks,25088,10.13014492,1,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1991-12-26,8,minute.maid,15040,9.618468598,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-02,8,minute.maid,9472,9.156095357,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-02,8,dominicks,13184,9.486759252,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-02,8,tropicana,47040,10.75875358,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-09,8,dominicks,3136,8.050703382,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-09,8,minute.maid,5888,8.68067166,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-09,8,tropicana,9280,9.135616826,0,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-16,8,tropicana,6720,8.812843434,0,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-16,8,minute.maid,14336,9.570529135,1,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-16,8,dominicks,5696,8.647519453,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-23,8,minute.maid,11712,9.368369236,1,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1992-01-23,8,dominicks,19008,9.852615222,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-23,8,tropicana,5056,8.528330936,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-30,8,minute.maid,7936,8.979164649,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-30,8,dominicks,121664,11.70901843,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-01-30,8,tropicana,6080,8.712759975,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-06,8,tropicana,10496,9.258749511,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-06,8,minute.maid,5184,8.553332238,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-06,8,dominicks,38848,10.56741187,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-13,8,minute.maid,7168,8.877381955,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-13,8,dominicks,6144,8.723231275,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-13,8,tropicana,39040,10.57234204,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-20,8,dominicks,13632,9.520175249,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-20,8,minute.maid,216064,12.28332994,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-20,8,tropicana,4480,8.407378325,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-27,8,tropicana,61760,11.03101119,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-27,8,minute.maid,15040,9.618468598,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-02-27,8,dominicks,9792,9.189321005,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-05,8,tropicana,15360,9.639522007,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-05,8,minute.maid,11840,9.379238908,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-05,8,dominicks,86912,11.37265139,1,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1992-03-12,8,minute.maid,25472,10.14533509,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-12,8,dominicks,24512,10.10691807,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-12,8,tropicana,54976,10.91465201,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-19,8,minute.maid,16384,9.704060528,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-19,8,dominicks,58048,10.96902553,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-19,8,tropicana,34368,10.44488118,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-26,8,tropicana,10752,9.282847063,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-26,8,minute.maid,20480,9.927204079,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-03-26,8,dominicks,13952,9.543378146,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-02,8,minute.maid,34688,10.45414909,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-02,8,dominicks,15168,9.626943225,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-02,8,tropicana,20096,9.908276069,1,2.5,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-09,8,dominicks,14592,9.588228712000001,0,1.58,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-09,8,minute.maid,22400,10.01681624,0,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-09,8,tropicana,16192,9.692272572,0,2.5,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-16,8,tropicana,6528,8.783855897,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-16,8,minute.maid,7808,8.962904128,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-16,8,dominicks,145088,11.88509573,1,1.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-23,8,tropicana,8320,9.026417534,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-23,8,minute.maid,48064,10.78028874,1,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1992-04-23,8,dominicks,43712,10.68537794,0,1.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-30,8,tropicana,30784,10.33475035,1,2.16,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-30,8,minute.maid,7360,8.903815212,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-04-30,8,dominicks,20608,9.933434629,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-07,8,tropicana,18048,9.800790154,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-07,8,minute.maid,6272,8.743850562,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-07,8,dominicks,18752,9.839055692,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-14,8,tropicana,12864,9.462187991,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-14,8,minute.maid,6400,8.764053269,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-14,8,dominicks,20160,9.911455722000001,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-21,8,tropicana,7168,8.877381955,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-21,8,minute.maid,54592,10.90764263,1,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-21,8,dominicks,18688,9.835636886,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-28,8,minute.maid,8128,9.00307017,0,2.39,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-28,8,tropicana,9024,9.107642974,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-05-28,8,dominicks,133824,11.80428078,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-06-04,8,tropicana,84992,11.35031241,1,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-06-04,8,minute.maid,4928,8.502688505,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-06-04,8,dominicks,63488,11.05860619,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-06-11,8,minute.maid,5440,8.60153434,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1992-06-11,8,tropicana,14144,9.557045785,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-06-11,8,dominicks,71040,11.17099838,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-06-25,8,tropicana,7488,8.921057017999999,1,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-06-25,8,minute.maid,5888,8.68067166,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-06-25,8,dominicks,15360,9.639522007,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-02,8,minute.maid,23872,10.0804615,1,2.02,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-02,8,dominicks,17728,9.78290059,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-02,8,tropicana,12352,9.421573272,0,2.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-09,8,tropicana,5696,8.647519453,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-09,8,minute.maid,6848,8.831711918,1,2.19,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-09,8,dominicks,24256,10.09641929,0,1.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-16,8,minute.maid,8192,9.010913347,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-16,8,dominicks,19968,9.901886271,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-16,8,tropicana,7680,8.946374826,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-23,8,dominicks,15936,9.67633598,0,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-23,8,minute.maid,55040,10.91581547,1,2.29,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-23,8,tropicana,5440,8.60153434,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-30,8,tropicana,5632,8.636219898,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-30,8,minute.maid,6528,8.783855897,0,2.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-07-30,8,dominicks,76352,11.24310951,1,1.49,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 
-1992-08-06,8,tropicana,8960,9.100525506,1,2.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-06,8,minute.maid,6208,8.733594062,1,2.45,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-06,8,dominicks,17408,9.76468515,1,1.69,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-13,8,minute.maid,94720,11.45868045,1,1.99,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-13,8,tropicana,6080,8.712759975,0,2.89,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-13,8,dominicks,17536,9.77201119,0,1.79,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 -1992-08-20,8,dominicks,31232,10.34919849,0,1.59,0.252394035,0.095173274,10.59700966,0.054227156,0.131749698,0.035243328,0.283074736,2.636332801,1.5,2.905384316,0.641015947 diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/deploy.py b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/deploy.py deleted file mode 100644 index d49356674..000000000 --- a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/deploy.py +++ /dev/null @@ -1,66 +0,0 @@ -import argparse -import json - -from azureml.core import Run, Model, Workspace -from azureml.core.conda_dependencies import CondaDependencies -from azureml.core.model import InferenceConfig -from azureml.core.webservice import AciWebservice - - -script_file_name = 'score.py' -conda_env_file_name = 'myenv.yml' - -print("In deploy.py") -parser = argparse.ArgumentParser() -parser.add_argument("--time_column_name", type=str, help="time column name") -parser.add_argument("--group_column_names", type=str, help="group column names") -parser.add_argument("--model_names", type=str, help="model names") -parser.add_argument("--service_name", type=str, help="service name") - -args = parser.parse_args() - -# replace the group column names in scoring script to the ones set by user -print("Update group_column_names") -print(args.group_column_names) - -with open(script_file_name, 'r') as cefr: - content = cefr.read() -with open(script_file_name, 'w') as cefw: - content = content.replace('<>', args.group_column_names.rstrip()) - cefw.write(content.replace('<>', args.time_column_name.rstrip())) - -with open(script_file_name, 'r') as cefr1: - content1 = cefr1.read() -print(content1) - -model_list = json.loads(args.model_names) -print(model_list) - -run = Run.get_context() -ws = run.experiment.workspace - -deployment_config = AciWebservice.deploy_configuration( - cpu_cores=1, - memory_gb=2, - tags={"method": "grouping"}, - description='grouping demo aci deployment' -) - -inference_config = InferenceConfig( - entry_script=script_file_name, - runtime='python', - conda_file=conda_env_file_name -) - -models = [] -for model_name in model_list: - models.append(Model(ws, name=model_name)) - -service = Model.deploy( - ws, - name=args.service_name, - models=models, - inference_config=inference_config, - deployment_config=deployment_config -) -service.wait_for_deployment(True) diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/myenv.yml 
b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/myenv.yml deleted file mode 100644 index 1fa82fa8d..000000000 --- a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/myenv.yml +++ /dev/null @@ -1,11 +0,0 @@ -name: automl_grouping_env -dependencies: - # The python interpreter version. - - # Currently Azure ML only supports 3.5.2 and later. - -- python=3.6.2 -- numpy>=1.16.0,<=1.16.2 -- scikit-learn>=0.19.0,<=0.20.3 -- conda-forge::fbprophet==0.5 - diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/score.py b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/score.py deleted file mode 100644 index 42e89392a..000000000 --- a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/deploy/score.py +++ /dev/null @@ -1,55 +0,0 @@ -import json -import pickle -import re - -import numpy as np -import pandas as pd -from sklearn.externals import joblib -from sklearn.linear_model import Ridge - -from azureml.core.model import Model -import azureml.train.automl - - -def init(): - global models - models = {} - global group_columns_str - group_columns_str = "<>" - global time_column_name - time_column_name = "<>" - - global group_columns - group_columns = group_columns_str.split("#####") - global valid_chars - valid_chars = re.compile('[^a-zA-Z0-9-]') - - -def run(raw_data): - try: - data = pd.read_json(raw_data) - # Make sure we have correct time points. - data[time_column_name] = pd.to_datetime(data[time_column_name], unit='ms') - dfs = [] - for grain, df_one in data.groupby(group_columns): - if isinstance(grain, int): - cur_group = str(grain) - elif isinstance(grain, str): - cur_group = grain - else: - cur_group = "#####".join(list(grain)) - cur_group = valid_chars.sub('', cur_group) - print("Query model for group {}".format(cur_group)) - if cur_group not in models: - model_path = Model.get_model_path(cur_group) - model = joblib.load(model_path) - models[cur_group] = model - _, xtrans = models[cur_group].forecast(df_one) - dfs.append(xtrans) - df_ret = pd.concat(dfs) - df_ret.reset_index(drop=False, inplace=True) - return json.dumps({'predictions': df_ret.to_json()}) - - except Exception as e: - error = str(e) - return error diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/register/register.py b/how-to-use-azureml/automated-machine-learning/forecasting-grouping/register/register.py deleted file mode 100644 index d02e30066..000000000 --- a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/register/register.py +++ /dev/null @@ -1,22 +0,0 @@ -import argparse - -from azureml.core import Run, Model - -parser = argparse.ArgumentParser() -parser.add_argument("--model_name") -parser.add_argument("--model_path") - -args = parser.parse_args() - -run = Run.get_context() -ws = run.experiment.workspace -print('retrieved ws: {}'.format(ws)) - -print('begin register model') -model = Model.register( - workspace=ws, - model_path=args.model_path, - model_name=args.model_name -) -print('model registered: {}'.format(model)) -print('complete') diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.ipynb similarity index 95% rename from how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb rename to 
how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.ipynb index f2708ee0d..3cfac46fe 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb +++ b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.ipynb @@ -68,6 +68,7 @@ "import logging\n", "import warnings\n", "\n", + "import azureml.core\n", "from azureml.core.dataset import Dataset\n", "from pandas.tseries.frequencies import to_offset\n", "from azureml.core.compute import AmlCompute\n", @@ -81,13 +82,29 @@ "np.set_printoptions(precision=4, suppress=True, linewidth=120)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" + ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "import azureml.core\n", "from azureml.core.workspace import Workspace\n", "from azureml.core.experiment import Experiment\n", "from azureml.train.automl import AutoMLConfig\n", @@ -100,7 +117,6 @@ "experiment = Experiment(ws, experiment_name)\n", "\n", "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace'] = ws.name\n", "output['SKU'] = ws.sku\n", @@ -258,29 +274,22 @@ "metadata": {}, "outputs": [], "source": [ - "amlcompute_cluster_name = \"cpu-cluster-fcfn\"\n", - " \n", - "found = False\n", - "# Check if this compute target already exists in the workspace.\n", - "cts = ws.compute_targets\n", - "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[amlcompute_cluster_name]\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", "\n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " max_nodes = 6)\n", + "# Choose a name for your CPU cluster\n", + "amlcompute_cluster_name = \"fcfn-cluster\"\n", "\n", - " # Create the cluster.\\n\",\n", - " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", + "# Verify that cluster does not exist already\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n", + " print('Found existing cluster, use it.')\n", + "except ComputeTargetException:\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + " max_nodes=6)\n", + " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n", "\n", - "print('Checking cluster status...')\n", - "# Can poll for a minimum number of nodes and for a specific timeout.\n", - "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", - "compute_target.wait_for_completion(show_output = True, min_node_count = None, 
timeout_in_minutes = 20)" + "compute_target.wait_for_completion(show_output=True)" ] }, { @@ -335,7 +344,7 @@ "automl_config = AutoMLConfig(task='forecasting',\n", " debug_log='automl_forecasting_function.log',\n", " primary_metric='normalized_root_mean_squared_error',\n", - " experiment_timeout_minutes=15,\n", + " experiment_timeout_hours=0.25,\n", " enable_early_stopping=True,\n", " training_data=train_data,\n", " compute_target=compute_target,\n", @@ -459,8 +468,8 @@ "# use forecast_quantiles function, not the forecast() one\n", "y_pred_quantiles = fitted_model.forecast_quantiles(X_test)\n", "\n", - "# it all nicely aligns column-wise\n", - "pd.concat([X_test.reset_index(), y_pred_quantiles], axis=1)" + "# quantile forecasts returned in a Dataframe along with the time and grain columns \n", + "y_pred_quantiles" ] }, { @@ -701,7 +710,7 @@ "metadata": { "authors": [ { - "name": "erwright, nirovins" + "name": "erwright" } ], "category": "tutorial", diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.yml b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.yml new file mode 100644 index 000000000..b91ef1789 --- /dev/null +++ b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.yml @@ -0,0 +1,10 @@ +name: auto-ml-forecasting-function +dependencies: +- py-xgboost<=0.90 +- pip: + - azureml-sdk + - numpy==1.16.2 + - pandas==0.23.4 + - azureml-train-automl + - azureml-widgets + - matplotlib diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.yml b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.yml deleted file mode 100644 index 16b8a581f..000000000 --- a/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.yml +++ /dev/null @@ -1,11 +0,0 @@ -name: automl-forecasting-function -dependencies: -- fbprophet==0.5 -- py-xgboost<=0.80 -- pip: - - azureml-sdk - - azureml-train-automl - - azureml-widgets - - pandas_ml - - statsmodels - - matplotlib diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/forecast_function_at_train.png b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/forecast_function_at_train.png new file mode 100644 index 000000000..af006b851 Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/forecast_function_at_train.png differ diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/forecast_function_away_from_train.png b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/forecast_function_away_from_train.png new file mode 100644 index 000000000..66c757034 Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/forecast_function_away_from_train.png differ diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb index e8d3a49bc..deb35f222 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb +++ 
b/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb @@ -68,6 +68,23 @@ "from azureml.train.automl import AutoMLConfig" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -89,7 +106,6 @@ "experiment = Experiment(ws, experiment_name)\n", "\n", "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace'] = ws.name\n", "output['SKU'] = ws.sku\n", @@ -118,35 +134,22 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.compute import AmlCompute\n", - "from azureml.core.compute import ComputeTarget\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", "\n", - "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"cpu-cluster-oj\"\n", + "# Choose a name for your CPU cluster\n", + "amlcompute_cluster_name = \"oj-cluster\"\n", "\n", - "found = False\n", - "# Check if this compute target already exists in the workspace.\n", - "cts = ws.compute_targets\n", - "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[amlcompute_cluster_name]\n", - " \n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " max_nodes = 6)\n", - "\n", - " # Create the cluster.\n", - " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", - " \n", - "print('Checking cluster status...')\n", - "# Can poll for a minimum number of nodes and for a specific timeout.\n", - "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", - "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", - " \n", - "# For a more detailed view of current AmlCompute status, use get_status()." + "# Verify that cluster does not exist already\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n", + " print('Found existing cluster, use it.')\n", + "except ComputeTargetException:\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + " max_nodes=6)\n", + " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n", + "\n", + "compute_target.wait_for_completion(show_output=True)" ] }, { @@ -335,7 +338,7 @@ "|-|-|\n", "|**task**|forecasting|\n", "|**primary_metric**|This is the metric that you want to optimize.
Forecasting supports the following primary metrics<br>spearman_correlation<br>normalized_root_mean_squared_error<br>r2_score<br>
normalized_mean_absolute_error\n", - "|**experiment_timeout_minutes**|Experimentation timeout in minutes.|\n", + "|**experiment_timeout_hours**|Experimentation timeout in hours.|\n", "|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n", "|**training_data**|Input dataset, containing both features and label column.|\n", "|**label_column_name**|The name of the label column.|\n", @@ -366,7 +369,7 @@ "automl_config = AutoMLConfig(task='forecasting',\n", " debug_log='automl_oj_sales_errors.log',\n", " primary_metric='normalized_mean_absolute_error',\n", - " experiment_timeout_minutes=15,\n", + " experiment_timeout_hours=0.25,\n", " training_data=train_dataset,\n", " label_column_name=target_column_name,\n", " compute_target=compute_target,\n", @@ -631,9 +634,7 @@ "outputs": [], "source": [ "import json\n", - "# The request data frame needs to have y_query column which corresponds to query.\n", "X_query = X_test.copy()\n", - "X_query['y_query'] = np.NaN\n", "# We have to convert datetime to string, because Timestamps cannot be serialized to JSON.\n", "X_query[time_column_name] = X_query[time_column_name].astype(str)\n", "# The Service object accept the complex dictionary, which is internally converted to JSON string.\n", diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.yml b/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.yml index f03860943..7d20e1741 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.yml +++ b/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.yml @@ -1,11 +1,10 @@ name: auto-ml-forecasting-orange-juice-sales dependencies: -- fbprophet==0.5 -- py-xgboost<=0.80 +- py-xgboost<=0.90 - pip: - azureml-sdk + - numpy==1.16.2 + - pandas==0.23.4 - azureml-train-automl - azureml-widgets - matplotlib - - pandas_ml - - statsmodels diff --git a/how-to-use-azureml/automated-machine-learning/local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb b/how-to-use-azureml/automated-machine-learning/local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb index c6b5e60f0..20ec6564d 100644 --- a/how-to-use-azureml/automated-machine-learning/local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb +++ b/how-to-use-azureml/automated-machine-learning/local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.ipynb @@ -49,7 +49,9 @@ "2. Configure AutoML using `AutoMLConfig`.\n", "3. Train the model.\n", "4. Explore the results.\n", - "5. Test the fitted model." + "5. Visualization model's feature importance in azure portal\n", + "6. Explore any model's explanation and explore feature importance in azure portal\n", + "7. Test the fitted model." 
] }, { @@ -71,13 +73,30 @@ "\n", "from matplotlib import pyplot as plt\n", "import pandas as pd\n", - "import os\n", "\n", "import azureml.core\n", "from azureml.core.experiment import Experiment\n", "from azureml.core.workspace import Workspace\n", "from azureml.core.dataset import Dataset\n", - "from azureml.train.automl import AutoMLConfig" + "from azureml.train.automl import AutoMLConfig\n", + "from azureml.explain.model._internal.explanation_client import ExplanationClient" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" ] }, { @@ -94,7 +113,6 @@ "experiment=Experiment(ws, experiment_name)\n", "\n", "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", @@ -155,8 +173,7 @@ "automl_settings = {\n", " \"n_cross_validations\": 3,\n", " \"primary_metric\": 'average_precision_score_weighted',\n", - " \"preprocess\": True,\n", - " \"experiment_timeout_minutes\": 10, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ablity to find the best model possible\n", + " \"experiment_timeout_hours\": 0.25, # This is a time limit for testing purposes, remove it for real use cases, this will drastically limit ability to find the best model possible\n", " \"verbosity\": logging.INFO,\n", " \"enable_stack_ensemble\": False\n", "}\n", @@ -260,17 +277,134 @@ "metadata": {}, "source": [ "#### Print the properties of the model\n", - "The fitted_model is a python object and you can read the different properties of the object.\n", - "See *Print the properties of the model* section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification/auto-ml-classification.ipynb)." + "The fitted_model is a python object and you can read the different properties of the object.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Deploy\n", + "## Best Model 's explanation\n", + "Retrieve the explanation from the best_run which includes explanations for engineered features and raw features.\n", "\n", - "To deploy the model into a web service endpoint, see _Deploy_ section in [this sample notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-with-deployment/auto-ml-classification-with-deployment.ipynb)" + "#### Download engineered feature importance from artifact store\n", + "You can use ExplanationClient to download the engineered feature explanations from the artifact store of the best_run. You can also use azure portal url to view the dash board visualization of the feature importance values of the engineered features." 
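Editorial aside (not part of the diff): the hunk above adds a markdown cell explaining that `ExplanationClient.download_model_explanation` pulls the engineered feature importances from the artifact store, and the following cell prints the raw dictionary. A minimal sketch of how those values could be inspected locally, assuming the `engineered_explanations` object produced by that cell:

```python
# Hedged sketch: rank the engineered feature importances returned by
# ExplanationClient.download_model_explanation(raw=False).
# Assumes `engineered_explanations` from the notebook cell shown in the hunk above.
importance = engineered_explanations.get_feature_importance_dict()

# Sort by absolute importance and show the ten strongest engineered features.
top_features = sorted(importance.items(), key=lambda kv: abs(kv[1]), reverse=True)[:10]
for name, value in top_features:
    print(f"{name}: {value:.4f}")
```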
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "client = ExplanationClient.from_run(best_run)\n", + "engineered_explanations = client.download_model_explanation(raw=False)\n", + "print(engineered_explanations.get_feature_importance_dict())\n", + "print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + best_run.get_portal_url())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explanations\n", + "In this section, we will show how to compute model explanations and visualize the explanations using azureml-explain-model package. Besides retrieving an existing model explanation for an AutoML model, you can also explain your AutoML model with different test data. The following steps will allow you to compute and visualize engineered feature importance based on your test data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Retrieve any other AutoML model from training" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "automl_run, fitted_model = local_run.get_output(metric='accuracy')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Setup the model explanations for AutoML models\n", + "The fitted_model can generate the following which will be used for getting the engineered explanations using automl_setup_model_explanations:-\n", + "\n", + "1. Featurized data from train samples/test samples\n", + "2. Gather engineered name lists\n", + "3. Find the classes in your labeled column in classification scenarios\n", + "\n", + "The automl_explainer_setup_obj contains all the structures from above list." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "X_train = training_data.drop_columns(columns=[label_column_name])\n", + "y_train = training_data.keep_columns(columns=[label_column_name], validate=True)\n", + "X_test = validation_data.drop_columns(columns=[label_column_name])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.automl.runtime.automl_explain_utilities import automl_setup_model_explanations\n", + "\n", + "automl_explainer_setup_obj = automl_setup_model_explanations(fitted_model, X=X_train, \n", + " X_test=X_test, y=y_train, \n", + " task='classification')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Initialize the Mimic Explainer for feature importance\n", + "For explaining the AutoML models, use the MimicWrapper from azureml.explain.model package. The MimicWrapper can be initialized with fields in automl_explainer_setup_obj, your workspace and a LightGBM model which acts as a surrogate model to explain the AutoML model (fitted_model here). The MimicWrapper also takes the automl_run object where engineered explanations will be uploaded." 
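Editorial aside (not part of the diff): the markdown cell above notes that `MimicWrapper` is constructed with `feature_maps`, which is what allows raw-feature importances to be recovered in addition to the engineered ones computed in the next hunk. A hedged sketch of that raw-feature counterpart; the `get_raw` and `raw_feature_names` arguments are assumptions drawn from the same azureml-explain-model package family and are not shown in this diff, so verify them against the installed SDK:

```python
# Hedged sketch, not part of the diff: raw-feature counterpart of the
# engineered-feature explain() call added in the next hunk. Parameter names
# (get_raw, raw_feature_names) are assumptions; check your azureml-explain-model version.
raw_explanations = explainer.explain(
    ['local', 'global'],
    get_raw=True,
    raw_feature_names=automl_explainer_setup_obj.raw_feature_names,
    eval_dataset=automl_explainer_setup_obj.X_test_transform)
print(raw_explanations.get_feature_importance_dict())
```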
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel\n", + "from azureml.explain.model.mimic_wrapper import MimicWrapper\n", + "explainer = MimicWrapper(ws, automl_explainer_setup_obj.automl_estimator, LGBMExplainableModel, \n", + " init_dataset=automl_explainer_setup_obj.X_transform, run=automl_run,\n", + " features=automl_explainer_setup_obj.engineered_feature_names, \n", + " feature_maps=[automl_explainer_setup_obj.feature_map],\n", + " classes=automl_explainer_setup_obj.classes)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Use Mimic Explainer for computing and visualizing engineered feature importance\n", + "The explain() method in MimicWrapper can be called with the transformed test samples to get the feature importance for the generated engineered features. You can also use azure portal url to view the dash board visualization of the feature importance values of the engineered features." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "engineered_explanations = explainer.explain(['local', 'global'], eval_dataset=automl_explainer_setup_obj.X_test_transform)\n", + "print(engineered_explanations.get_feature_importance_dict())\n", + "print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())" ] }, { @@ -369,7 +503,7 @@ "metadata": { "authors": [ { - "name": "tzvikei" + "name": "anumamah" } ], "category": "tutorial", diff --git a/how-to-use-azureml/automated-machine-learning/local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.yml b/how-to-use-azureml/automated-machine-learning/local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.yml index 023f0ac4b..ed7aa7e05 100644 --- a/how-to-use-azureml/automated-machine-learning/local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.yml +++ b/how-to-use-azureml/automated-machine-learning/local-run-classification-credit-card-fraud/auto-ml-classification-credit-card-fraud-local.yml @@ -2,10 +2,6 @@ name: auto-ml-classification-credit-card-fraud-local dependencies: - pip: - azureml-sdk - - interpret - - azureml-defaults - - azureml-explain-model - azureml-train-automl - azureml-widgets - matplotlib - - pandas_ml diff --git a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/auto-ml-regression-hardware-performance-explanation-and-featurization.ipynb b/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb similarity index 86% rename from how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/auto-ml-regression-hardware-performance-explanation-and-featurization.ipynb rename to how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb index e5996ce5b..e730c092c 100644 --- a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/auto-ml-regression-hardware-performance-explanation-and-featurization.ipynb +++ 
b/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb @@ -51,8 +51,8 @@ "4. Explore the results and featurization transparency options\n", "5. Setup remote compute for computing the model explanations for a given AutoML model.\n", "6. Start an AzureML experiment on your remote compute to compute explanations for an AutoML model.\n", - "7. Download the feature importance for engineered features and visualize the explanations for engineered features. \n", - "8. Download the feature importance for raw features and visualize the explanations for raw features. \n" + "7. Download the feature importance for engineered features and visualize the explanations for engineered features on azure portal. \n", + "8. Download the feature importance for raw features and visualize the explanations for raw features on azure portal. \n" ] }, { @@ -85,6 +85,23 @@ "from azureml.core.dataset import Dataset" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" + ] + }, { "cell_type": "code", "execution_count": null, @@ -98,7 +115,6 @@ "experiment = Experiment(ws, experiment_name)\n", "\n", "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace Name'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", @@ -127,35 +143,22 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.compute import AmlCompute\n", - "from azureml.core.compute import ComputeTarget\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# Choose a name for your cluster.\n", - "amlcompute_cluster_name = \"cpu-cluster-5\"\n", - "\n", - "found = False\n", - "# Check if this compute target already exists in the workspace.\n", - "cts = ws.compute_targets\n", - "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing compute target.')\n", - " compute_target = cts[amlcompute_cluster_name]\n", - "\n", - "if not found:\n", - " print('Creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", - " #vm_priority = 'lowpriority', # optional\n", - " max_nodes = 4)\n", - "\n", - " # Create the cluster.\\n\",\n", - " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", + "amlcompute_cluster_name = \"hardware-cluster\"\n", "\n", - "print('Checking cluster status...')\n", - "# Can poll for a minimum number of nodes and for a specific timeout.\n", - "# If no min_node_count is provided, it will use the scale settings for the cluster.\n", - "compute_target.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", + "# Verify that cluster does not exist already\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n", + " print('Found existing cluster, use it.')\n", + "except 
ComputeTargetException:\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + " max_nodes=4)\n", + " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n", "\n", - "# For a more detailed view of current AmlCompute status, use get_status()." + "compute_target.wait_for_completion(show_output=True)" ] }, { @@ -206,9 +209,9 @@ "|-|-|\n", "|**task**|classification, regression or forecasting|\n", "|**primary_metric**|This is the metric that you want to optimize. Regression supports the following primary metrics:
spearman_correlation<br>normalized_root_mean_squared_error<br>r2_score<br>
normalized_mean_absolute_error|\n", - "|**experiment_timeout_minutes**| Maximum amount of time in minutes that all iterations combined can take before the experiment terminates.|\n", + "|**experiment_timeout_hours**| Maximum amount of time in hours that all iterations combined can take before the experiment terminates.|\n", "|**enable_early_stopping**| Flag to enble early termination if the score is not improving in the short term.|\n", - "|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Note: If the input data is sparse, featurization cannot be turned on.|\n", + "|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Setting this enables AutoML to perform featurization on the input to handle *missing data*, and to perform some common *feature extraction*. Note: If the input data is sparse, featurization cannot be turned on.|\n", "|**n_cross_validations**|Number of cross validation splits.|\n", "|**training_data**|(sparse) array-like, shape = [n_samples, n_features]|\n", "|**label_column_name**|(sparse) array-like, shape = [n_samples, ], targets values.|" @@ -244,7 +247,7 @@ "source": [ "featurization_config = FeaturizationConfig()\n", "featurization_config.blocked_transformers = ['LabelEncoder']\n", - "#featurization_config.drop_columns = ['ERP', 'MMIN']\n", + "#featurization_config.drop_columns = ['MMIN']\n", "featurization_config.add_column_purpose('MYCT', 'Numeric')\n", "featurization_config.add_column_purpose('VendorName', 'CategoricalHash')\n", "#default strategy mean, add transformer param for for 3 columns\n", @@ -262,7 +265,7 @@ "source": [ "automl_settings = {\n", " \"enable_early_stopping\": True, \n", - " \"experiment_timeout_minutes\" : 10,\n", + " \"experiment_timeout_hours\" : 0.25,\n", " \"max_concurrent_iterations\": 4,\n", " \"max_cores_per_iteration\": -1,\n", " \"n_cross_validations\": 5,\n", @@ -320,8 +323,6 @@ "outputs": [], "source": [ "#from azureml.train.automl.run import AutoMLRun\n", - "#experiment_name = 'automl-regression-hardware'\n", - "#experiment = Experiment(ws, experiment_name)\n", "#remote_run = AutoMLRun(experiment=experiment, run_id='>', automl_run.experiment.name) # your experiment name.\n", + "content = content.replace('<>', automl_run.experiment.name) # your experiment name.\n", "content = content.replace('<>', automl_run.id) # Run-id of the AutoML run for which you want to explain the model.\n", "content = content.replace('<>', 'ERP') # Your target column name\n", "content = content.replace('<>', 'regression') # Training task type\n", @@ -532,8 +533,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Create conda configuration for model explanations experiment\n", - "We need `azureml-explain-model`, `azureml-train-automl` and `azureml-core` packages for computing model explanations for your AutoML model on remote compute." 
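Tying the `AutoMLConfig` settings table above back to code, the following is a minimal, illustrative sketch of how those settings and the customized `FeaturizationConfig` could come together for this regression task. It is not the notebook's exact cell: `train_data` is assumed to be the TabularDataset prepared earlier, `compute_target` is the "hardware-cluster" created above, `'ERP'` is the target column used elsewhere in this notebook, and the primary metric shown is just one of the supported regression metrics.

```python
from azureml.train.automl import AutoMLConfig
from azureml.automl.core.featurization import FeaturizationConfig

# Customized featurization, mirroring the cells above (column names follow the hardware dataset).
featurization_config = FeaturizationConfig()
featurization_config.blocked_transformers = ['LabelEncoder']
featurization_config.add_column_purpose('MYCT', 'Numeric')

automl_settings = {
    "enable_early_stopping": True,
    "experiment_timeout_hours": 0.25,  # replaces the older experiment_timeout_minutes setting
    "max_concurrent_iterations": 4,
    "max_cores_per_iteration": -1,
    "n_cross_validations": 5,
    "primary_metric": 'normalized_root_mean_squared_error',  # any metric from the table above
}

automl_config = AutoMLConfig(task='regression',
                             compute_target=compute_target,       # the "hardware-cluster" created above
                             featurization=featurization_config,
                             training_data=train_data,            # assumed TabularDataset
                             label_column_name='ERP',
                             **automl_settings)
```

The config would then be submitted with `experiment.submit(automl_config, show_output=False)`, as in the remote-run cells of this notebook.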
+ "#### Create conda configuration for model explanations experiment from automl_run object" ] }, { @@ -552,14 +552,9 @@ "# Set compute target to AmlCompute\n", "conda_run_config.target = compute_target\n", "conda_run_config.environment.docker.enabled = True\n", - "azureml_pip_packages = [\n", - " 'azureml-train-automl', 'azureml-core', 'azureml-explain-model'\n", - "]\n", "\n", "# specify CondaDependencies obj\n", - "conda_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n", - " conda_packages=['scikit-learn', 'numpy','py-xgboost<=0.80'],\n", - " pip_packages=azureml_pip_packages)" + "conda_run_config.environment.python.conda_dependencies = automl_run.get_environment().python.conda_dependencies" ] }, { @@ -604,38 +599,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Feature importance and explanation dashboard\n", - "In this section we describe how you can download the explanation results from the explanations experiment and visualize the feature importance for your AutoML model. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Setup for visualizing the model explanation results\n", - "For visualizing the explanation results for the *fitted_model* we need to perform the following steps:-\n", - "1. Featurize test data samples.\n", - "\n", - "The *automl_explainer_setup_obj* contains all the structures from above list. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "X_test = test_data.drop_columns([label]).to_pandas_dataframe()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, automl_setup_model_explanations\n", - "explainer_setup_class = automl_setup_model_explanations(fitted_model, 'regression', X_test=X_test)" + "### Feature importance and visualizing explanation dashboard\n", + "In this section we describe how you can download the explanation results from the explanations experiment and visualize the feature importance for your AutoML model on the azure portal." ] }, { @@ -643,7 +608,7 @@ "metadata": {}, "source": [ "#### Download engineered feature importance from artifact store\n", - "You can use *ExplanationClient* to download the engineered feature explanations from the artifact store of the *automl_run*. You can also use ExplanationDashboard to view the dash board visualization of the feature importance values of the engineered features." + "You can use *ExplanationClient* to download the engineered feature explanations from the artifact store of the *automl_run*. You can also use azure portal url to view the dash board visualization of the feature importance values of the engineered features." 
] }, { @@ -653,11 +618,10 @@ "outputs": [], "source": [ "from azureml.explain.model._internal.explanation_client import ExplanationClient\n", - "from interpret_community.widget import ExplanationDashboard\n", "client = ExplanationClient.from_run(automl_run)\n", - "engineered_explanations = client.download_model_explanation(raw=False)\n", + "engineered_explanations = client.download_model_explanation(raw=False, comment='engineered explanations')\n", "print(engineered_explanations.get_feature_importance_dict())\n", - "ExplanationDashboard(engineered_explanations, explainer_setup_class.automl_estimator, datasetX=explainer_setup_class.X_test_transform)" + "print(\"You can visualize the engineered explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())" ] }, { @@ -665,7 +629,7 @@ "metadata": {}, "source": [ "#### Download raw feature importance from artifact store\n", - "You can use *ExplanationClient* to download the raw feature explanations from the artifact store of the *automl_run*. You can also use ExplanationDashboard to view the dash board visualization of the feature importance values of the raw features." + "You can use *ExplanationClient* to download the raw feature explanations from the artifact store of the *automl_run*. You can also use azure portal url to view the dash board visualization of the feature importance values of the raw features." ] }, { @@ -674,9 +638,9 @@ "metadata": {}, "outputs": [], "source": [ - "raw_explanations = client.download_model_explanation(raw=True)\n", + "raw_explanations = client.download_model_explanation(raw=True, comment='raw explanations')\n", "print(raw_explanations.get_feature_importance_dict())\n", - "ExplanationDashboard(raw_explanations, explainer_setup_class.automl_pipeline, datasetX=explainer_setup_class.X_test_raw)" + "print(\"You can visualize the raw explanations under the 'Explanations (preview)' tab in the AutoML run at:-\\n\" + automl_run.get_portal_url())" ] }, { @@ -718,20 +682,10 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "azureml_pip_packages = [\n", - " 'azureml-explain-model', 'azureml-train-automl', 'azureml-defaults'\n", - "]\n", - " \n", - "\n", - "# specify CondaDependencies obj\n", - "myenv = CondaDependencies.create(conda_packages=['scikit-learn', 'pandas', 'numpy', 'py-xgboost<=0.80'],\n", - " pip_packages=azureml_pip_packages,\n", - " pin_sdk_version=True)\n", + "conda_dep = automl_run.get_environment().python.conda_dependencies\n", "\n", "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())\n", + " f.write(conda_dep.serialize_to_string())\n", "\n", "with open(\"myenv.yml\",\"r\") as f:\n", " print(f.read())" @@ -772,6 +726,7 @@ "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", "from azureml.core.model import Model\n", + "from azureml.core.environment import Environment\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", " memory_gb=1, \n", @@ -779,9 +734,8 @@ " \"method\" : \"local_explanation\"}, \n", " description='Get local explanations for Machine test data')\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score_explain.py\",\n", - " conda_file=\"myenv.yml\")\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score_explain.py\", 
environment=myenv)\n", "\n", "# Use configs and models generated above\n", "service = Model.deploy(ws, 'model-scoring', [scoring_explainer_model, original_model], inference_config, aciconfig)\n", @@ -819,6 +773,7 @@ "outputs": [], "source": [ "if service.state == 'Healthy':\n", + " X_test = test_data.drop_columns([label]).to_pandas_dataframe()\n", " # Serialize the first row of the test data into json\n", " X_test_json = X_test[:1].to_json(orient='records')\n", " print(X_test_json)\n", diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/auto-ml-forecasting-grouping.yml b/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.yml similarity index 53% rename from how-to-use-azureml/automated-machine-learning/forecasting-grouping/auto-ml-forecasting-grouping.yml rename to how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.yml index 3527c5c8e..75fe1e52d 100644 --- a/how-to-use-azureml/automated-machine-learning/forecasting-grouping/auto-ml-forecasting-grouping.yml +++ b/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.yml @@ -1,10 +1,7 @@ -name: auto-ml-forecasting-grouping +name: auto-ml-regression-explanation-featurization dependencies: - pip: - azureml-sdk - azureml-train-automl - - azureml-pipeline - azureml-widgets - - pandas_ml - - statsmodels - matplotlib diff --git a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/score_explain.py b/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/score_explain.py similarity index 100% rename from how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/score_explain.py rename to how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/score_explain.py diff --git a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/train_explainer.py b/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/train_explainer.py similarity index 85% rename from how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/train_explainer.py rename to how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/train_explainer.py index 473604645..cda35b494 100644 --- a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/train_explainer.py +++ b/how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/train_explainer.py @@ -7,10 +7,11 @@ from sklearn.externals import joblib from azureml.core.dataset import Dataset from azureml.train.automl.runtime.automl_explain_utilities import AutoMLExplainerSetupClass, \ - automl_setup_model_explanations + automl_setup_model_explanations, automl_check_model_if_explainable from azureml.explain.model.mimic.models.lightgbm_model import LGBMExplainableModel from azureml.explain.model.mimic_wrapper import MimicWrapper from automl.client.core.common.constants import MODEL_PATH +from azureml.automl.core.shared.constants import MODEL_EXPLANATION_TAG from azureml.explain.model.scoring.scoring_explainer import TreeScoringExplainer, save @@ -22,9 +23,14 @@ ws = run.experiment.workspace # Get the 
AutoML run object from the experiment name and the workspace -experiment = Experiment(ws, '<>') +experiment = Experiment(ws, '<>') automl_run = Run(experiment=experiment, run_id='<>') +# Check if this AutoML model is explainable +if not automl_check_model_if_explainable(automl_run): + raise Exception("Model explanations is currently not supported for " + automl_run.get_properties().get( + 'run_algorithm')) + # Download the best model from the artifact store automl_run.download_file(name=MODEL_PATH, output_file_path='model.pkl') @@ -55,16 +61,18 @@ classes=automl_explainer_setup_obj.classes) # Compute the engineered explanations -engineered_explanations = explainer.explain(['local', 'global'], +engineered_explanations = explainer.explain(['local', 'global'], tag='engineered explanations', eval_dataset=automl_explainer_setup_obj.X_test_transform) # Compute the raw explanations -raw_explanations = explainer.explain(['local', 'global'], get_raw=True, +raw_explanations = explainer.explain(['local', 'global'], get_raw=True, tag='raw explanations', raw_feature_names=automl_explainer_setup_obj.raw_feature_names, eval_dataset=automl_explainer_setup_obj.X_test_transform) -print("Engineered and raw explanations computed successfully") +# Set tag that explanations completed +automl_run.tag(MODEL_EXPLANATION_TAG, 'True') +print("Engineered and raw explanations computed successfully") # Initialize the ScoringExplainer scoring_explainer = TreeScoringExplainer(explainer.explainer, feature_maps=[automl_explainer_setup_obj.feature_map]) diff --git a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/auto-ml-regression-hardware-performance-explanation-and-featurization.yml b/how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/auto-ml-regression-hardware-performance-explanation-and-featurization.yml deleted file mode 100644 index bc3c55518..000000000 --- a/how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/auto-ml-regression-hardware-performance-explanation-and-featurization.yml +++ /dev/null @@ -1,13 +0,0 @@ -name: auto-ml-regression-hardware-performance-explanation-and-featurization -dependencies: -- pip: - - azureml-sdk - - interpret - - azureml-defaults - - azureml-explain-model - - azureml-train-automl - - azureml-widgets - - matplotlib - - pandas_ml - - azureml-explain-model - - azureml-contrib-interpret diff --git a/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb b/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb index c1b305bdf..7955f5de1 100644 --- a/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb +++ b/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb @@ -79,6 +79,23 @@ "from azureml.train.automl import AutoMLConfig" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This sample notebook may use features that are not available in previous versions of the Azure ML SDK." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", + "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" + ] + }, { "cell_type": "code", "execution_count": null, @@ -93,7 +110,6 @@ "experiment = Experiment(ws, experiment_name)\n", "\n", "output = {}\n", - "output['SDK version'] = azureml.core.VERSION\n", "output['Subscription ID'] = ws.subscription_id\n", "output['Workspace'] = ws.name\n", "output['Resource Group'] = ws.resource_group\n", @@ -122,7 +138,7 @@ "from azureml.core.compute_target import ComputeTargetException\n", "\n", "# Choose a name for your CPU cluster\n", - "cpu_cluster_name = \"cpu-cluster-2\"\n", + "cpu_cluster_name = \"reg-cluster\"\n", "\n", "# Verify that cluster does not exist already\n", "try:\n", @@ -188,15 +204,18 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "automlconfig-remarks-sample" + ] + }, "outputs": [], "source": [ "automl_settings = {\n", " \"n_cross_validations\": 3,\n", " \"primary_metric\": 'r2_score',\n", - " \"preprocess\": True,\n", " \"enable_early_stopping\": True, \n", - " \"experiment_timeout_minutes\": 20, #for real scenarios we reccommend a timeout of at least one hour \n", + " \"experiment_timeout_hours\": 0.3, #for real scenarios we reccommend a timeout of at least one hour \n", " \"max_concurrent_iterations\": 4,\n", " \"max_cores_per_iteration\": -1,\n", " \"verbosity\": logging.INFO,\n", diff --git a/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.yml b/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.yml index 18789ff36..01892fb3f 100644 --- a/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.yml +++ b/how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.yml @@ -2,8 +2,7 @@ name: auto-ml-regression dependencies: - pip: - azureml-sdk + - pandas==0.23.4 - azureml-train-automl - azureml-widgets - matplotlib - - pandas_ml - - paramiko<2.5.0 diff --git a/how-to-use-azureml/automated-machine-learning/sql-server/setup/AutoMLTrain.sql b/how-to-use-azureml/automated-machine-learning/sql-server/setup/AutoMLTrain.sql index d0840ac29..359db0958 100644 --- a/how-to-use-azureml/automated-machine-learning/sql-server/setup/AutoMLTrain.sql +++ b/how-to-use-azureml/automated-machine-learning/sql-server/setup/AutoMLTrain.sql @@ -56,7 +56,7 @@ CREATE OR ALTER PROCEDURE [dbo].[AutoMLTrain] @task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting. @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal. @iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline. - @experiment_timeout_minutes INT = 60, -- The maximum time in minutes for training all pipelines. + @experiment_timeout_hours FLOAT = 1, -- The maximum time in hours for training all pipelines. @n_cross_validations INT = 3, -- The number of cross validations. @blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used. 
-- The list of possible models can be found at: @@ -131,8 +131,8 @@ if __name__.startswith("sqlindb"): X_train = data_train - if experiment_timeout_minutes == 0: - experiment_timeout_minutes = None + if experiment_timeout_hours == 0: + experiment_timeout_hours = None if experiment_exit_score == 0: experiment_exit_score = None @@ -163,7 +163,7 @@ if __name__.startswith("sqlindb"): debug_log = log_file_name, primary_metric = primary_metric, iteration_timeout_minutes = iteration_timeout_minutes, - experiment_timeout_minutes = experiment_timeout_minutes, + experiment_timeout_hours = experiment_timeout_hours, iterations = iterations, n_cross_validations = n_cross_validations, preprocess = preprocess, @@ -204,7 +204,7 @@ if __name__.startswith("sqlindb"): @iterations INT, @task NVARCHAR(40), @experiment_name NVARCHAR(32), @iteration_timeout_minutes INT, - @experiment_timeout_minutes INT, + @experiment_timeout_hours FLOAT, @n_cross_validations INT, @blacklist_models NVARCHAR(MAX), @whitelist_models NVARCHAR(MAX), @@ -223,7 +223,7 @@ if __name__.startswith("sqlindb"): , @task = @task , @experiment_name = @experiment_name , @iteration_timeout_minutes = @iteration_timeout_minutes - , @experiment_timeout_minutes = @experiment_timeout_minutes + , @experiment_timeout_hours = @experiment_timeout_hours , @n_cross_validations = @n_cross_validations , @blacklist_models = @blacklist_models , @whitelist_models = @whitelist_models diff --git a/how-to-use-azureml/automated-machine-learning/sql-server/setup/auto-ml-sql-setup.ipynb b/how-to-use-azureml/automated-machine-learning/sql-server/setup/auto-ml-sql-setup.ipynb index 3935f3052..3b4eabc1f 100644 --- a/how-to-use-azureml/automated-machine-learning/sql-server/setup/auto-ml-sql-setup.ipynb +++ b/how-to-use-azureml/automated-machine-learning/sql-server/setup/auto-ml-sql-setup.ipynb @@ -235,7 +235,7 @@ " @task NVARCHAR(40)='classification', -- The type of task. Can be classification, regression or forecasting.\r\n", " @experiment_name NVARCHAR(32)='automl-sql-test', -- This can be used to find the experiment in the Azure Portal.\r\n", " @iteration_timeout_minutes INT = 15, -- The maximum time in minutes for training a single pipeline. 
\r\n", - " @experiment_timeout_minutes INT = 60, -- The maximum time in minutes for training all pipelines.\r\n", + " @experiment_timeout_hours FLOAT = 1, -- The maximum time in hours for training all pipelines.\r\n", " @n_cross_validations INT = 3, -- The number of cross validations.\r\n", " @blacklist_models NVARCHAR(MAX) = '', -- A comma separated list of algos that will not be used.\r\n", " -- The list of possible models can be found at:\r\n", @@ -307,8 +307,8 @@ "\r\n", " X_train = data_train\r\n", "\r\n", - " if experiment_timeout_minutes == 0:\r\n", - " experiment_timeout_minutes = None\r\n", + " if experiment_timeout_hours == 0:\r\n", + " experiment_timeout_hours = None\r\n", "\r\n", " if experiment_exit_score == 0:\r\n", " experiment_exit_score = None\r\n", @@ -337,7 +337,7 @@ " debug_log = log_file_name, \r\n", " primary_metric = primary_metric, \r\n", " iteration_timeout_minutes = iteration_timeout_minutes, \r\n", - " experiment_timeout_minutes = experiment_timeout_minutes,\r\n", + " experiment_timeout_hours = experiment_timeout_hours,\r\n", " iterations = iterations, \r\n", " n_cross_validations = n_cross_validations, \r\n", " preprocess = preprocess,\r\n", @@ -378,7 +378,7 @@ "\t\t\t\t @iterations INT, @task NVARCHAR(40),\r\n", "\t\t\t\t @experiment_name NVARCHAR(32),\r\n", "\t\t\t\t @iteration_timeout_minutes INT,\r\n", - "\t\t\t\t @experiment_timeout_minutes INT,\r\n", + "\t\t\t\t @experiment_timeout_hours FLOAT,\r\n", "\t\t\t\t @n_cross_validations INT,\r\n", "\t\t\t\t @blacklist_models NVARCHAR(MAX),\r\n", "\t\t\t\t @whitelist_models NVARCHAR(MAX),\r\n", @@ -396,7 +396,7 @@ "\t, @task = @task\r\n", "\t, @experiment_name = @experiment_name\r\n", "\t, @iteration_timeout_minutes = @iteration_timeout_minutes\r\n", - "\t, @experiment_timeout_minutes = @experiment_timeout_minutes\r\n", + "\t, @experiment_timeout_hours = @experiment_timeout_hours\r\n", "\t, @n_cross_validations = @n_cross_validations\r\n", "\t, @blacklist_models = @blacklist_models\r\n", "\t, @whitelist_models = @whitelist_models\r\n", @@ -560,9 +560,6 @@ "framework": [ "Azure ML AutoML" ], - "tags": [ - "" - ], "friendly_name": "Setup automated ML SQL integration", "index_order": 1, "kernelspec": { @@ -574,6 +571,9 @@ "name": "sql", "version": "" }, + "tags": [ + "" + ], "task": "None" }, "nbformat": 4, diff --git a/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb b/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb index f17bfb968..910e6eef3 100644 --- a/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb +++ b/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb @@ -11,6 +11,13 @@ "Licensed under the MIT License." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register Azure Databricks trained model and deploy it to ACI\n" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -161,9 +168,9 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies \n", "\n", - "myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) #showing how to add libs as an eg. - not needed for this model.\n", + "myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) # showing how to add libs as an eg. 
- not needed for this model.\n", "\n", - "with open(\"mydeployenv.yml\",\"w\") as f:\n", + "with open(\"myenv.yml\",\"w\") as f:\n", " f.write(myacienv.serialize_to_string())" ] }, @@ -177,6 +184,9 @@ "from azureml.core.webservice import AciWebservice, Webservice\n", "from azureml.exceptions import WebserviceException\n", "from azureml.core.model import InferenceConfig\n", + "from azureml.core.environment import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", "\n", "myaci_config = AciWebservice.deploy_configuration(cpu_cores = 2, \n", " memory_gb = 2, \n", @@ -191,9 +201,16 @@ "except WebserviceException:\n", " pass\n", "\n", - "inference_config = InferenceConfig(runtime= 'spark-py', \n", - " entry_script='score_sparkml.py',\n", - " conda_file='mydeployenv.yml')\n", + "myenv = Environment.get(ws, name='AzureML-PySpark-MmlSpark-0.15')\n", + "# we need to add extra packages to procured environment\n", + "# in order to deploy amended environment we need to rename it\n", + "myenv.name = 'myenv'\n", + "model_dependencies = CondaDependencies('myenv.yml')\n", + "for pip_dep in model_dependencies.pip_packages:\n", + " myenv.python.conda_dependencies.add_pip_package(pip_dep)\n", + "for conda_dep in model_dependencies.conda_packages:\n", + " myenv.python.conda_dependencies.add_conda_package(conda_dep)\n", + "inference_config = InferenceConfig(entry_script='score_sparkml.py', environment=myenv)\n", "\n", "myservice = Model.deploy(ws, service_name, [mymodel], inference_config, myaci_config)\n", "myservice.wait_for_deployment(show_output=True)" @@ -255,6 +272,15 @@ "myservice.delete()" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Deploying to other types of computes\n", + "\n", + "In order to learn how to deploy to other types of compute targets, such as AKS, please take a look at the set of notebooks in the [deployment](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/deployment) folder." + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-05.ipynb b/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-05.ipynb deleted file mode 100644 index f31b245bc..000000000 --- a/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-05.ipynb +++ /dev/null @@ -1,312 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Azure ML & Azure Databricks notebooks by Parashar Shah.\n", - "\n", - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This notebook uses image from ACI notebook for deploying to AKS." 
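Although the dedicated AKS notebook is removed in this change, the ACI notebook above now points readers to the deployment folder for other compute targets. For reference, here is a minimal, hedged sketch of the AKS variant using the same Environment-based `InferenceConfig`; `ws`, `mymodel`, `myenv`, and `score_sparkml.py` are assumed from the ACI notebook, and the cluster and service names are illustrative.

```python
from azureml.core.compute import AksCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.core.webservice import AksWebservice
from azureml.core.model import InferenceConfig, Model

# Create (or reuse) an AKS cluster; provisioning a new cluster can take 20-25 minutes.
aks_name = 'my-aks-cluster'  # illustrative name
try:
    aks_target = ComputeTarget(workspace=ws, name=aks_name)
except ComputeTargetException:
    prov_config = AksCompute.provisioning_configuration()
    aks_target = ComputeTarget.create(ws, aks_name, prov_config)
    aks_target.wait_for_completion(show_output=True)

# Deploy the registered model behind an AKS web service.
inference_config = InferenceConfig(entry_script='score_sparkml.py', environment=myenv)
aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)
aks_service = Model.deploy(ws, 'my-aks-service', [mymodel], inference_config, aks_config, aks_target)
aks_service.wait_for_deployment(show_output=True)
```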
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import azureml.core\n", - "\n", - "# Check core SDK version number\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Set auth to be used by workspace related APIs.\n", - "# For automation or CI/CD ServicePrincipalAuthentication can be used.\n", - "# https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication?view=azure-ml-py\n", - "auth = None" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config(auth = auth)\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#Register the model\n", - "import os\n", - "from azureml.core.model import Model\n", - "\n", - "model_name = \"AdultCensus_runHistory_aks.mml\" # \n", - "model_name_dbfs = os.path.join(\"/dbfs\", model_name)\n", - "\n", - "print(\"copy model from dbfs to local\")\n", - "model_local = \"file:\" + os.getcwd() + \"/\" + model_name\n", - "dbutils.fs.cp(model_name, model_local, True)\n", - "\n", - "mymodel = Model.register(model_path = model_name, # this points to a local file\n", - " model_name = model_name, # this is the name the model is registered as, am using same name for both path and name. \n", - " description = \"ADB trained model by Parashar\",\n", - " workspace = ws)\n", - "\n", - "print(mymodel.name, mymodel.description, mymodel.version)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#%%writefile score_sparkml.py\n", - "score_sparkml = \"\"\"\n", - " \n", - "import json\n", - " \n", - "def init():\n", - " # One-time initialization of PySpark and predictive model\n", - " import pyspark\n", - " from azureml.core.model import Model\n", - " from pyspark.ml import PipelineModel\n", - " \n", - " global trainedModel\n", - " global spark\n", - " \n", - " spark = pyspark.sql.SparkSession.builder.appName(\"ADB and AML notebook by Parashar\").getOrCreate()\n", - " model_name = \"{model_name}\" #interpolated\n", - " model_path = Model.get_model_path(model_name)\n", - " trainedModel = PipelineModel.load(model_path)\n", - " \n", - "def run(input_json):\n", - " if isinstance(trainedModel, Exception):\n", - " return json.dumps({{\"trainedModel\":str(trainedModel)}})\n", - " \n", - " try:\n", - " sc = spark.sparkContext\n", - " input_list = json.loads(input_json)\n", - " input_rdd = sc.parallelize(input_list)\n", - " input_df = spark.read.json(input_rdd)\n", - " \n", - " # Compute prediction\n", - " prediction = trainedModel.transform(input_df)\n", - " #result = prediction.first().prediction\n", - " predictions = prediction.collect()\n", - " \n", - " #Get each scored result\n", - " preds = [str(x['prediction']) for x in predictions]\n", - " result = \",\".join(preds)\n", - " # you can return any data type as long as it is JSON-serializable\n", - " return result.tolist()\n", - " except Exception as e:\n", - " result = str(e)\n", - " return result\n", - " \n", - 
"\"\"\".format(model_name=model_name)\n", - " \n", - "exec(score_sparkml)\n", - " \n", - "with open(\"score_sparkml.py\", \"w\") as file:\n", - " file.write(score_sparkml)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myacienv = CondaDependencies.create(conda_packages=['scikit-learn','numpy','pandas']) #showing how to add libs as an eg. - not needed for this model.\n", - "\n", - "with open(\"mydeployenv.yml\",\"w\") as f:\n", - " f.write(myacienv.serialize_to_string())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#create AKS compute\n", - "#it may take 20-25 minutes to create a new cluster\n", - "\n", - "from azureml.core.compute import AksCompute, ComputeTarget\n", - "from azureml.core.compute_target import ComputeTargetException\n", - "\n", - "aks_name = 'ps-aks-demo2' \n", - "\n", - "try:\n", - " aks_target = ComputeTarget(workspace=ws, name=aks_name)\n", - " print('Found existing cluster, use it.')\n", - "except ComputeTargetException:\n", - " # Use the default configuration (can also provide parameters to customize)\n", - " prov_config = AksCompute.provisioning_configuration()\n", - " \n", - " # Create the cluster\n", - " aks_target = ComputeTarget.create(workspace = ws, \n", - " name = aks_name, \n", - " provisioning_configuration = prov_config)\n", - "\n", - "aks_target.wait_for_completion(show_output = True)\n", - "\n", - "print(aks_target.provisioning_state)\n", - "print(aks_target.provisioning_errors)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#deploy to AKS\n", - "from azureml.core.webservice import AksWebservice, Webservice\n", - "from azureml.exceptions import WebserviceException\n", - "from azureml.core.model import InferenceConfig\n", - "\n", - "aks_config = AksWebservice.deploy_configuration(enable_app_insights=True)\n", - "\n", - "service_name = 'ps-aks-service'\n", - "\n", - "# Remove any existing service under the same name.\n", - "try:\n", - " Webservice(ws, service_name).delete()\n", - "except WebserviceException:\n", - " pass\n", - "\n", - "inference_config = InferenceConfig(runtime = 'spark-py', \n", - " entry_script ='score_sparkml.py',\n", - " conda_file ='mydeployenv.yml')\n", - "\n", - "aks_service = Model.deploy(ws, service_name, [mymodel], inference_config, aks_config, aks_target)\n", - "aks_service.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "aks_service.deployment_status" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#for using the Web HTTP API \n", - "print(aks_service.scoring_uri)\n", - "print(aks_service.get_keys())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "\n", - "#get the some sample data\n", - "test_data_path = \"AdultCensusIncomeTest\"\n", - "test = spark.read.parquet(test_data_path).limit(5)\n", - "\n", - "test_json = json.dumps(test.toJSON().collect())\n", - "\n", - "print(test_json)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#using data defined above predict if income is >50K (1) or <=50K (0)\n", - 
"aks_service.run(input_data=test_json)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#comment to not delete the web service\n", - "aks_service.delete()\n", - "#model.delete()\n", - "aks_target.delete() " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-existingimage-05.png)" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "pasha" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.8" - }, - "name": "deploy-to-aks-existingimage-05", - "notebookId": 1030695628045968 - }, - "nbformat": 4, - "nbformat_minor": 1 -} \ No newline at end of file diff --git a/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb b/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb index f4a423428..3ca930221 100644 --- a/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb +++ b/how-to-use-azureml/azure-databricks/automl/automl-databricks-local-with-deployment.ipynb @@ -512,9 +512,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Retrieve the Best Model after the above run is complete \n", + "## Deploy\n", "\n", - "Below we select the best pipeline from our iterations. The `get_output` method returns the best run and the fitted model. The Model includes the pipeline and any pre-processing. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." + "### Retrieve the Best Model\n", + "\n", + "Below we select the best pipeline from our iterations. The `get_output` method on `automl_classifier` returns the best run and the fitted model for the last invocation. Overloads on `get_output` allow you to retrieve the best run and fitted model for *any* logged metric or for a particular *iteration*." ] }, { @@ -523,17 +525,15 @@ "metadata": {}, "outputs": [], "source": [ - "best_run, fitted_model = local_run.get_output()\n", - "print(best_run)\n", - "print(fitted_model)" + "best_run, fitted_model = local_run.get_output()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Best Model Based on Any Other Metric after the above run is complete based on the child run\n", - "Show the run and the model that has the smallest `log_loss` value:" + "### Download the conda environment file\n", + "From the *best_run* download the conda environment file that was used to train the AutoML model." 
] }, { @@ -542,18 +542,20 @@ "metadata": {}, "outputs": [], "source": [ - "lookup_metric = \"log_loss\"\n", - "best_run, fitted_model = local_run.get_output(metric = lookup_metric)\n", - "print(best_run)\n", - "print(fitted_model)" + "from automl.client.core.common import constants\n", + "conda_env_file_name = 'conda_env.yml'\n", + "best_run.download_file(name=\"outputs/conda_env_v_1_0_0.yml\", output_file_path=conda_env_file_name)\n", + "with open(conda_env_file_name, \"r\") as conda_file:\n", + " conda_file_contents = conda_file.read()\n", + " print(conda_file_contents)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Register the Fitted Model for Deployment\n", - "If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered." + "### Download the model scoring file\n", + "From the *best_run* download the scoring file to get the predictions from the AutoML model." ] }, { @@ -562,49 +564,20 @@ "metadata": {}, "outputs": [], "source": [ - "description = 'AutoML Model'\n", - "tags = None\n", - "model = local_run.register_model(description = description, tags = tags)\n", - "local_run.model_id # This will be written to the scoring script file later in the notebook." + "from automl.client.core.common import constants\n", + "script_file_name = 'scoring_file.py'\n", + "best_run.download_file(name=\"outputs/scoring_file_v_1_0_0.py\", output_file_path=script_file_name)\n", + "with open(script_file_name, \"r\") as scoring_file:\n", + " scoring_file_contents = scoring_file.read()\n", + " print(scoring_file_contents)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Create Scoring Script\n", - "Replace model_id with name of model from output of above register cell" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import pickle\n", - "import json\n", - "import numpy as np\n", - "import azureml.train.automl\n", - "from sklearn.externals import joblib\n", - "from azureml.core.model import Model\n", - "import pandas as pd\n", - "\n", - "def init():\n", - " global model\n", - " model_path = Model.get_model_path(model_name = '<>') # this name is model.id of model that we want to deploy\n", - " # deserialize the model file back into a sklearn model\n", - " model = joblib.load(model_path)\n", - "\n", - "def run(raw_data):\n", - " try:\n", - " data = (pd.DataFrame(np.array(json.loads(raw_data)['data']), columns=[str(i) for i in range(0,64)]))\n", - " result = model.predict(data)\n", - " except Exception as e:\n", - " result = str(e)\n", - " return json.dumps({\"error\": result})\n", - " return json.dumps({\"result\":result.tolist()})" + "## Register the Fitted Model for Deployment\n", + "If neither metric nor iteration are specified in the register_model call, the iteration with the best primary metric is registered." ] }, { @@ -613,43 +586,19 @@ "metadata": {}, "outputs": [], "source": [ - "#Replace <>\n", - "content = \"\"\n", - "with open(\"score.py\", \"r\") as fo:\n", - " content = fo.read()\n", - "\n", - "new_content = content.replace(\"<>\", local_run.model_id)\n", - "with open(\"score.py\", \"w\") as fw:\n", - " fw.write(new_content)" + "description = 'AutoML Model'\n", + "tags = None\n", + "model = local_run.register_model(description = description, tags = tags)\n", + "local_run.model_id # This will be written to the scoring script file later in the notebook." 
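Once the model is registered as above, it can be looked up again in a later session using the id captured in `local_run.model_id`. A minimal sketch, assuming the same workspace object `ws`:

```python
from azureml.core.model import Model

# Fetch the registered model by the id recorded during registration.
registered_model = Model(ws, id=local_run.model_id)
print(registered_model.name, registered_model.version)
```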
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "#### Create a YAML File for the Environment" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies\n", + "### Deploy the model as a Web Service on Azure Container Instance\n", "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'], pip_packages=['azureml-defaults', 'azureml-sdk[automl]'])\n", - "\n", - "conda_env_file_name = 'mydeployenv.yml'\n", - "myenv.save_to_file('.', conda_env_file_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy the model as a Web Service on Azure Container Instance\n", - "Replace servicename with any meaningful name of service" + "Create the configuration needed for deploying the model as a web service service." ] }, { @@ -658,37 +607,17 @@ "metadata": {}, "outputs": [], "source": [ - "# this will take 10-15 minutes to finish\n", - "\n", - "from azureml.core.webservice import AciWebservice, Webservice\n", - "from azureml.exceptions import WebserviceException\n", "from azureml.core.model import InferenceConfig\n", - "from azureml.core.model import Model\n", - "import uuid\n", + "from azureml.core.webservice import AciWebservice\n", + "from azureml.core.environment import Environment\n", "\n", - "myaci_config = AciWebservice.deploy_configuration(\n", - " cpu_cores = 2, \n", - " memory_gb = 2, \n", - " tags = {'name':'Databricks Azure ML ACI'}, \n", - " description = 'This is for ADB and AutoML example.')\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=conda_env_file_name)\n", + "inference_config = InferenceConfig(entry_script=script_file_name, environment=myenv)\n", "\n", - "inference_config = InferenceConfig(runtime= 'spark-py', \n", - " entry_script='score.py',\n", - " conda_file='mydeployenv.yml')\n", - "\n", - "guid = str(uuid.uuid4()).split(\"-\")[0]\n", - "service_name = \"myservice-{}\".format(guid)\n", - "\n", - "# Remove any existing service under the same name.\n", - "try:\n", - " Webservice(ws, service_name).delete()\n", - "except WebserviceException:\n", - " pass\n", - "\n", - "print(\"Creating service with name: {}\".format(service_name))\n", - "\n", - "myservice = Model.deploy(ws, service_name, [model], inference_config, myaci_config)\n", - "myservice.wait_for_deployment(show_output=True)" + "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", + " memory_gb = 1, \n", + " tags = {'area': \"digits\", 'type': \"automl_classification\"}, \n", + " description = 'sample service for Automl Classification')" ] }, { @@ -697,8 +626,14 @@ "metadata": {}, "outputs": [], "source": [ - "#for using the Web HTTP API \n", - "print(myservice.scoring_uri)" + "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import Model\n", + "\n", + "aci_service_name = 'automl-databricks-local'\n", + "print(aci_service_name)\n", + "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n", + "aci_service.wait_for_deployment(True)\n", + "print(aci_service.state)" ] }, { @@ -742,7 +677,7 @@ "for index in np.random.choice(len(y_test), 2, replace = False):\n", " print(index)\n", " test_sample = json.dumps({'data':X_test[index:index + 1].values.tolist()})\n", - " predicted = myservice.run(input_data = test_sample)\n", + " predicted = aci_service.run(input_data = test_sample)\n", " label = y_test.values[index]\n", " predictedDict = 
json.loads(predicted)\n", " title = \"Label value = %d Predicted value = %s \" % ( label,predictedDict['result'][0]) \n", diff --git a/how-to-use-azureml/azureml-sdk-for-r/README.md b/how-to-use-azureml/azureml-sdk-for-r/README.md new file mode 100644 index 000000000..bd2784b97 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/README.md @@ -0,0 +1,36 @@ +## Examples to get started with Azure Machine Learning SDK for R + +Learn how to use Azure Machine Learning SDK for R for experimentation and model management. + +As a pre-requisite, go through the [Installation](vignettes/installation.Rmd) and [Configuration](vignettes/configuration.Rmd) vignettes to first install the package and set up your Azure Machine Learning Workspace unless you are running these examples on an Azure Machine Learning compute instance. Azure Machine Learning compute instances have the Azure Machine Learning SDK pre-installed and your workspace details pre-configured. + + +Samples +* Deployment + * [deploy-to-aci](./samples/deployment/deploy-to-aci): Deploy a model as a web service to Azure Container Instances (ACI). + * [deploy-to-local](./samples/deployment/deploy-to-local): Deploy a model as a web service locally. +* Training + * [train-on-amlcompute](./samples/training/train-on-amlcompute): Train a model on a remote AmlCompute cluster. + * [train-on-local](./samples/training/train-on-local): Train a model locally with Docker. + +Vignettes +* [deploy-to-aks](./vignettes/deploy-to-aks): Production deploy a model as a web service to Azure Kubernetes Service (AKS). +* [hyperparameter-tune-with-keras](./vignettes/hyperparameter-tune-with-keras): Hyperparameter tune a Keras model using HyperDrive, Azure ML's hyperparameter tuning functionality. +* [train-and-deploy-to-aci](./vignettes/train-and-deploy-to-aci): Train a caret model and deploy as a web service to Azure Container Instances (ACI). +* [train-with-tensorflow](./vignettes/train-with-tensorflow): Train a deep learning TensorFlow model with Azure ML. + +Find more information on the [official documentation site for Azure Machine Learning SDK for R](https://azure.github.io/azureml-sdk-for-r/). + + +### Troubleshooting + +- If the following error occurs when submitting an experiment using RStudio: + ```R + Error in py_call_impl(callable, dots$args, dots$keywords) : + PermissionError: [Errno 13] Permission denied + ``` + Move the files for your project into a subdirectory and reset the working directory to that directory before re-submitting. + + In order to submit an experiment, the Azure ML SDK must create a .zip file of the project directory to send to the service. However, + the SDK does not have permission to write into the .Rproj.user subdirectory that is automatically created during an RStudio + session. For this reason, the recommended best practice is to isolate project files into their own directory. diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/README.md b/how-to-use-azureml/azureml-sdk-for-r/samples/README.md new file mode 100644 index 000000000..070fe71d6 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/samples/README.md @@ -0,0 +1,11 @@ +## Azure Machine Learning samples +These samples are short code examples for using Azure Machine Learning SDK for R. If you are new to the R SDK, we recommend that you first take a look at the more detailed end-to-end [vignettes](../vignettes). 
+ +Before running a sample in RStudio, set the working directory to the folder that contains the sample script in RStudio using `setwd(dirname)` or Session -> Set Working Directory -> To Source File Location. Each vignette assumes that the data and scripts are in the current working directory. + +1. [train-on-amlcompute](training/train-on-amlcompute): Train a model on a remote AmlCompute cluster. +2. [train-on-local](training/train-on-local): Train a model locally with Docker. +2. [deploy-to-aci](deployment/deploy-to-aci): Deploy a model as a web service to Azure Container Instances (ACI). +3. [deploy-to-local](deployment/deploy-to-local): Deploy a model as a web service locally. + +> Before you run these samples, make sure you have an Azure Machine Learning workspace. You can follow the [configuration vignette](../vignettes/configuration.Rmd) to set up a workspace. (You do not need to do this if you are running these examples on an Azure Machine Learning compute instance). diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-aci/deploy-to-aci.R b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-aci/deploy-to-aci.R new file mode 100644 index 000000000..1c286023f --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-aci/deploy-to-aci.R @@ -0,0 +1,59 @@ +# Copyright(c) Microsoft Corporation. +# Licensed under the MIT license. + +library(azuremlsdk) +library(jsonlite) + +ws <- load_workspace_from_config() + +# Register the model +model <- register_model(ws, model_path = "project_files/model.rds", + model_name = "model.rds") + +# Create environment +r_env <- r_environment(name = "r_env") + +# Create inference config +inference_config <- inference_config( + entry_script = "score.R", + source_directory = "project_files", + environment = r_env) + +# Create ACI deployment config +deployment_config <- aci_webservice_deployment_config(cpu_cores = 1, + memory_gb = 1) + +# Deploy the web service +service <- deploy_model(ws, + 'rservice', + list(model), + inference_config, + deployment_config) +wait_for_deployment(service, show_output = TRUE) + +# If you encounter any issue in deploying the webservice, please visit +# https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment + +# Inferencing +# versicolor +plant <- data.frame(Sepal.Length = 6.4, + Sepal.Width = 2.8, + Petal.Length = 4.6, + Petal.Width = 1.8) +# setosa +plant <- data.frame(Sepal.Length = 5.1, + Sepal.Width = 3.5, + Petal.Length = 1.4, + Petal.Width = 0.2) +# virginica +plant <- data.frame(Sepal.Length = 6.7, + Sepal.Width = 3.3, + Petal.Length = 5.2, + Petal.Width = 2.3) + +# Test the web service +predicted_val <- invoke_webservice(service, toJSON(plant)) +predicted_val + +# Delete the web service +delete_webservice(service) diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-aci/project_files/model.rds b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-aci/project_files/model.rds new file mode 100644 index 000000000..9e46c2fea Binary files /dev/null and b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-aci/project_files/model.rds differ diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-aci/project_files/score.R b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-aci/project_files/score.R new file mode 100644 index 000000000..be132918b --- /dev/null +++ 
b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-aci/project_files/score.R @@ -0,0 +1,17 @@ +# Copyright(c) Microsoft Corporation. +# Licensed under the MIT license. + +library(jsonlite) + +init <- function() { + model_path <- Sys.getenv("AZUREML_MODEL_DIR") + model <- readRDS(file.path(model_path, "model.rds")) + message("model is loaded") + + function(data) { + plant <- as.data.frame(fromJSON(data)) + prediction <- predict(model, plant) + result <- as.character(prediction) + toJSON(result) + } +} \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/deploy-to-local.R b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/deploy-to-local.R new file mode 100644 index 000000000..dad3ba692 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/deploy-to-local.R @@ -0,0 +1,112 @@ +# Copyright(c) Microsoft Corporation. +# Licensed under the MIT license. + +# Register model and deploy locally +# This example shows how to deploy a web service in step-by-step fashion: +# +# 1) Register model +# 2) Deploy the model as a web service in a local Docker container. +# 3) Invoke web service with SDK or call web service with raw HTTP call. +# 4) Quickly test changes to your entry script by reloading the local service. +# 5) Optionally, you can also make changes to model and update the local service. + +library(azuremlsdk) +library(jsonlite) + +ws <- load_workspace_from_config() + +# Register the model +model <- register_model(ws, model_path = "project_files/model.rds", + model_name = "model.rds") + +# Create environment +r_env <- r_environment(name = "r_env") + +# Create inference config +inference_config <- inference_config( + entry_script = "score.R", + source_directory = "project_files", + environment = r_env) + +# Create local deployment config +local_deployment_config <- local_webservice_deployment_config() + +# Deploy the web service +# NOTE: +# The Docker image runs as a Linux container. 
If you are running Docker for Windows, you need to ensure the Linux Engine is running: +# # PowerShell command to switch to Linux engine +# & 'C:\Program Files\Docker\Docker\DockerCli.exe' -SwitchLinuxEngine +service <- deploy_model(ws, + 'rservice-local', + list(model), + inference_config, + local_deployment_config) +# Wait for deployment +wait_for_deployment(service, show_output = TRUE) + +# Show the port of local service +message(service$port) + +# If you encounter any issue in deploying the webservice, please visit +# https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment + +# Inferencing +# versicolor +# plant <- data.frame(Sepal.Length = 6.4, +# Sepal.Width = 2.8, +# Petal.Length = 4.6, +# Petal.Width = 1.8) +# setosa +plant <- data.frame(Sepal.Length = 5.1, + Sepal.Width = 3.5, + Petal.Length = 1.4, + Petal.Width = 0.2) +# # virginica +# plant <- data.frame(Sepal.Length = 6.7, +# Sepal.Width = 3.3, +# Petal.Length = 5.2, +# Petal.Width = 2.3) + +#Test the web service +invoke_webservice(service, toJSON(plant)) + +## The last few lines of the logs should have the correct prediction and should display -> R[write to console]: "setosa" +cat(gsub(pattern = "\n", replacement = " \n", x = get_webservice_logs(service))) + +## Test the web service with a HTTP Raw request +# +# NOTE: +# To test the service locally use the https://localhost: URL + +# Import the request library +library(httr) +# Get the service scoring URL from the service object, its URL is for testing locally +local_service_url <- service$scoring_uri #Same as https://localhost: + +#POST request to web service +resp <- POST(local_service_url, body = plant, encode = "json", verbose()) + +## The last few lines of the logs should have the correct prediction and should display -> R[write to console]: "setosa" +cat(gsub(pattern = "\n", replacement = " \n", x = get_webservice_logs(service))) + + +# Optional, use a new scoring script +inference_config <- inference_config( + entry_script = "score_new.R", + source_directory = "project_files", + environment = r_env) + +## Then reload the service to see the changes made +reload_local_webservice_assets(service) + +## Check reloaded service, you will see the last line will say "this is a new scoring script! I was reloaded" +invoke_webservice(service, toJSON(plant)) +cat(gsub(pattern = "\n", replacement = " \n", x = get_webservice_logs(service))) + +# Update service +# If you want to change your model(s), environment, or deployment configuration, call update() to rebuild the Docker image. 
+ +# update_local_webservice(service, models = [NewModelObject], deployment_config = deployment_config, wait = FALSE, inference_config = inference_config) + +# Delete service +delete_local_webservice(service) diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/project_files/model.rds b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/project_files/model.rds new file mode 100644 index 000000000..9e46c2fea Binary files /dev/null and b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/project_files/model.rds differ diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/project_files/score.R b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/project_files/score.R new file mode 100644 index 000000000..73bb16bab --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/project_files/score.R @@ -0,0 +1,18 @@ +# Copyright(c) Microsoft Corporation. +# Licensed under the MIT license. + +library(jsonlite) + +init <- function() { + model_path <- Sys.getenv("AZUREML_MODEL_DIR") + model <- readRDS(file.path(model_path, "model.rds")) + message("model is loaded") + + function(data) { + plant <- as.data.frame(fromJSON(data)) + prediction <- predict(model, plant) + result <- as.character(prediction) + message(result) + toJSON(result) + } +} \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/project_files/score_new.R b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/project_files/score_new.R new file mode 100644 index 000000000..ebc57449e --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/samples/deployment/deploy-to-local/project_files/score_new.R @@ -0,0 +1,19 @@ +# Copyright(c) Microsoft Corporation. +# Licensed under the MIT license. + +library(jsonlite) + +init <- function() { + model_path <- Sys.getenv("AZUREML_MODEL_DIR") + model <- readRDS(file.path(model_path, "model.rds")) + message("model is loaded") + + function(data) { + plant <- as.data.frame(fromJSON(data)) + prediction <- predict(model, plant) + result <- as.character(prediction) + message(result) + message("this is a new scoring script! 
I was reloaded") + toJSON(result) + } +} \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-amlcompute/scripts/train.R b/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-amlcompute/scripts/train.R new file mode 100644 index 000000000..9825c2bae --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-amlcompute/scripts/train.R @@ -0,0 +1,34 @@ +# This script loads a dataset of which the last column is supposed to be the +# class and logs the accuracy + +library(azuremlsdk) +library(caret) +library(optparse) +library(datasets) + + +iris_data <- data(iris) +summary(iris_data) + +in_train <- createDataPartition(y = iris_data$Species, p = .8, list = FALSE) +train_data <- iris_data[in_train,] +test_data <- iris_data[-in_train,] + +# Run algorithms using 10-fold cross validation +control <- trainControl(method = "cv", number = 10) +metric <- "Accuracy" + +set.seed(7) +model <- train(Species ~ ., + data = train_data, + method = "lda", + metric = metric, + trControl = control) +predictions <- predict(model, test_data) +conf_matrix <- confusionMatrix(predictions, test_data$Species) +message(conf_matrix) + +log_metric_to_run(metric, conf_matrix$overall["Accuracy"]) + +saveRDS(model, file = "./outputs/model.rds") +message("Model saved") diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-amlcompute/train-on-amlcompute.R b/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-amlcompute/train-on-amlcompute.R new file mode 100644 index 000000000..e033db77c --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-amlcompute/train-on-amlcompute.R @@ -0,0 +1,41 @@ +# Copyright(c) Microsoft Corporation. +# Licensed under the MIT license. 
+ +# Reminder: set working directory to current file location prior to running this script + +library(azuremlsdk) + +ws <- load_workspace_from_config() + +# Create AmlCompute cluster +cluster_name <- "r-cluster" +compute_target <- get_compute(ws, cluster_name = cluster_name) +if (is.null(compute_target)) { + vm_size <- "STANDARD_D2_V2" + compute_target <- create_aml_compute(workspace = ws, + cluster_name = cluster_name, + vm_size = vm_size, + max_nodes = 1) + + wait_for_provisioning_completion(compute_target, show_output = TRUE) +} + +# Define estimator +est <- estimator(source_directory = "scripts", + entry_script = "train.R", + compute_target = compute_target) + +experiment_name <- "train-r-script-on-amlcompute" +exp <- experiment(ws, experiment_name) + +# Submit job and display the run details +run <- submit_experiment(exp, est) +view_run_details(run) +wait_for_run_completion(run, show_output = TRUE) + +# Get the run metrics +metrics <- get_run_metrics(run) +metrics + +# Delete cluster +delete_compute(compute_target) diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-local/scripts/train.R b/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-local/scripts/train.R new file mode 100644 index 000000000..720045870 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-local/scripts/train.R @@ -0,0 +1,28 @@ +# This script loads a dataset of which the last column is supposed to be the +# class and logs the accuracy + +library(azuremlsdk) +library(caret) +library(datasets) + +iris_data <- data(iris) +summary(iris_data) + +in_train <- createDataPartition(y = iris_data$Species, p = .8, list = FALSE) +train_data <- iris_data[in_train,] +test_data <- iris_data[-in_train,] +# Run algorithms using 10-fold cross validation +control <- trainControl(method = "cv", number = 10) +metric <- "Accuracy" + +set.seed(7) +model <- train(Species ~ ., + data = train_data, + method = "lda", + metric = metric, + trControl = control) +predictions <- predict(model, test_data) +conf_matrix <- confusionMatrix(predictions, test_data$Species) +message(conf_matrix) + +log_metric_to_run(metric, conf_matrix$overall["Accuracy"]) \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-local/train-on-local.R b/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-local/train-on-local.R new file mode 100644 index 000000000..ecb75cd01 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/samples/training/train-on-local/train-on-local.R @@ -0,0 +1,26 @@ +# Copyright(c) Microsoft Corporation. +# Licensed under the MIT license. 
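+#
+# This sample submits the scripts/train.R training script as an Azure ML
+# experiment run on local compute (the "local" compute target) and retrieves
+# the logged metrics once the run completes.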
+ +# Reminder: set working directory to current file location prior to running this script + +library(azuremlsdk) + +ws <- load_workspace_from_config() + +# Define estimator +est <- estimator(source_directory = "scripts", + entry_script = "train.R", + compute_target = "local") + +# Initialize experiment +experiment_name <- "train-r-script-on-local" +exp <- experiment(ws, experiment_name) + +# Submit job and display the run details +run <- submit_experiment(exp, est) +view_run_details(run) +wait_for_run_completion(run, show_output = TRUE) + +# Get the run metrics +metrics <- get_run_metrics(run) +metrics diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/README.md b/how-to-use-azureml/azureml-sdk-for-r/vignettes/README.md new file mode 100644 index 000000000..618714e81 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/README.md @@ -0,0 +1,17 @@ +## Azure Machine Learning vignettes + +These vignettes are end-to-end tutorials for using Azure Machine Learning SDK for R. + +Before running a vignette in RStudio, set the working directory to the folder that contains the vignette file (.Rmd file) in RStudio using `setwd(dirname)` or Session -> Set Working Directory -> To Source File Location. Each vignette assumes that the data and scripts are in the current working directory. + +The following vignettes are included: +1. [installation](installation.Rmd): Install the Azure ML SDK for R. +2. [configuration](configuration.Rmd): Set up an Azure ML workspace. +3. [train-and-deploy-to-aci](train-and-deploy-to-aci): Train a caret model and deploy as a web service to Azure Container Instances (ACI). +4. [train-with-tensorflow](train-with-tensorflow/): Train a deep learning TensorFlow model with Azure ML. +5. [hyperparameter-tune-with-keras](hyperparameter-tune-with-keras/): Hyperparameter tune a Keras model using HyperDrive, Azure ML's hyperparameter tuning functionality. +6. [deploy-to-aks](deploy-to-aks/): Production deploy a model as a web service to Azure Kubernetes Service (AKS). + +> Before you run these samples, make sure you have an Azure Machine Learning workspace. You can follow the [configuration vignette](../vignettes/configuration.Rmd) to set up a workspace. (You do not need to do this if you are running these examples on an Azure Machine Learning compute instance). + +For additional examples on using the R SDK, see the [samples](../samples) folder. \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/configuration.Rmd b/how-to-use-azureml/azureml-sdk-for-r/vignettes/configuration.Rmd new file mode 100644 index 000000000..2cde4d75e --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/configuration.Rmd @@ -0,0 +1,108 @@ +--- +title: "Set up an Azure ML workspace" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Set up an Azure ML workspace} + %\VignetteEngine{knitr::rmarkdown} + \use_package{UTF-8} +--- + +This tutorial gets you started with the Azure Machine Learning service by walking through the requirements and instructions for setting up a workspace, the top-level resource for Azure ML. + +You do not need run this if you are working on an Azure Machine Learning Compute Instance, as the compute instance is already associated with an existing workspace. + +## What is an Azure ML workspace? +The workspace is the top-level resource for Azure ML, providing a centralized place to work with all the artifacts you create when you use Azure ML. 
The workspace keeps a history of all training runs, including logs, metrics, output, and a snapshot of your scripts. + +When you create a new workspace, it automatically creates several Azure resources that are used by the workspace: + +* Azure Container Registry: Registers docker containers that you use during training and when you deploy a model. To minimize costs, ACR is lazy-loaded until deployment images are created. +* Azure Storage account: Used as the default datastore for the workspace. +* Azure Application Insights: Stores monitoring information about your models. +* Azure Key Vault: Stores secrets that are used by compute targets and other sensitive information that's needed by the workspace. + +## Setup +This section describes the steps required before you can access any Azure ML service functionality. + +### Azure subscription +In order to create an Azure ML workspace, first you need access to an Azure subscription. An Azure subscription allows you to manage storage, compute, and other assets in the Azure cloud. You can [create a new subscription](https://azure.microsoft.com/en-us/free/) or access existing subscription information from the [Azure portal](https://portal.azure.com/). Later in this tutorial you will need information such as your subscription ID in order to create and access workspaces. + +### Azure ML SDK installation +Follow the [installation guide](https://azure.github.io/azureml-sdk-for-r/articles/installation.html) to install **azuremlsdk** on your machine. + +## Configure your workspace +### Workspace parameters +To use an Azure ML workspace, you will need to supply the following information: + +* Your subscription ID +* A resource group name +* (Optional) The region that will host your workspace +* A name for your workspace + +You can get your subscription ID from the [Azure portal](https://portal.azure.com/). + +You will also need access to a [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups), which organizes Azure resources and provides a default region for the resources in a group. You can see what resource groups to which you have access, or create a new one in the Azure portal. If you don't have a resource group, the `create_workspace()` method will create one for you using the name you provide. + +The region to host your workspace will be used if you are creating a new workspace. You do not need to specify this if you are using an existing workspace. You can find the list of supported regions [here](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=machine-learning-service). You should pick a region that is close to your location or that contains your data. + +The name for your workspace is unique within the subscription and should be descriptive enough to discern among other workspaces. The subscription may be used only by you, or it may be used by your department or your entire enterprise, so choose a name that makes sense for your situation. + +The following code chunk allows you to specify your workspace parameters. It uses `Sys.getenv` to read values from environment variables, which is useful for automation. If no environment variable exists, the parameters will be set to the specified default values. Replace the default values in the code below with your default parameter values. 
+ +``` {r configure_parameters, eval=FALSE} +subscription_id <- Sys.getenv("SUBSCRIPTION_ID", unset = "") +resource_group <- Sys.getenv("RESOURCE_GROUP", unset = "") +workspace_name <- Sys.getenv("WORKSPACE_NAME", unset = "") +workspace_region <- Sys.getenv("WORKSPACE_REGION", unset = "eastus2") +``` + +### Create a new workspace +If you don't have an existing workspace and are the owner of the subscription or resource group, you can create a new workspace. If you don't have a resource group, `create_workspace()` will create one for you using the name you provide. If you don't want it to do so, set the `create_resource_group = FALSE` parameter. + +Note: As with other Azure services, there are limits on certain resources (e.g. AmlCompute quota) associated with the Azure ML service. Please read this [article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota. + +This cell will create an Azure ML workspace for you in a subscription, provided you have the correct permissions. + +This will fail if: + +* You do not have permission to create a workspace in the resource group. +* You do not have permission to create a resource group if it does not exist. +* You are not a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription. + +If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources. + +There are additional parameters that are not shown below that can be configured when creating a workspace. Please see [`create_workspace()`](https://azure.github.io/azureml-sdk-for-r/reference/create_workspace.html) for more details. + +``` {r create_workspace, eval=FALSE} +library(azuremlsdk) + +ws <- create_workspace(name = workspace_name, + subscription_id = subscription_id, + resource_group = resource_group, + location = workspace_region, + exist_ok = TRUE) +``` + +You can write out the workspace ARM properties to a config file with [`write_workspace_config()`](https://azure.github.io/azureml-sdk-for-r/reference/write_workspace_config.html). The method provides a simple way of reusing the same workspace across multiple files or projects. Users can save the workspace details with `write_workspace_config()`, and use [`load_workspace_from_config()`](https://azure.github.io/azureml-sdk-for-r/reference/load_workspace_from_config.html) to load the same workspace in different files or projects without retyping the workspace ARM properties. The method defaults to writing out the config file to the current working directory with "config.json" as the file name. To specify a different path or file name, set the `path` and `file_name` parameters. + +``` {r write_config, eval=FALSE} +write_workspace_config(ws) +``` + +### Access an existing workspace +You can access an existing workspace in a couple of ways. If your workspace properties were previously saved to a config file, you can load the workspace as follows: + +``` {r load_config, eval=FALSE} +ws <- load_workspace_from_config() +``` + +If Azure ML cannot find the config file, specify the path to the config file with the `path` parameter. The method defaults to starting the search in the current directory. + +You can also initialize a workspace using the [`get_workspace()`](https://azure.github.io/azureml-sdk-for-r/reference/get_workspace.html) method.
+ +``` {r get_workspace, eval=FALSE} +ws <- get_workspace(name = workspace_name, + subscription_id = subscription_id, + resource_group = resource_group) +``` \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/deploy-to-aks/deploy-to-aks.Rmd b/how-to-use-azureml/azureml-sdk-for-r/vignettes/deploy-to-aks/deploy-to-aks.Rmd new file mode 100644 index 000000000..603af7525 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/deploy-to-aks/deploy-to-aks.Rmd @@ -0,0 +1,188 @@ +--- +title: "Deploy a web service to Azure Kubernetes Service" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Deploy a web service to Azure Kubernetes Service} + %\VignetteEngine{knitr::rmarkdown} + \use_package{UTF-8} +--- + +This tutorial demonstrates how to deploy a model as a web service on [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service/) (AKS). AKS is good for high-scale production deployments; use it if you need one or more of the following capabilities: + +* Fast response time +* Autoscaling of the deployed service +* Hardware acceleration options such as GPU + +You will learn to: + +* Set up your testing environment +* Register a model +* Provision an AKS cluster +* Deploy the model to AKS +* Test the deployed service + +## Prerequisites +If you don’t have access to an Azure ML workspace, follow the [setup tutorial](https://azure.github.io/azureml-sdk-for-r/articles/configuration.html) to configure and create a workspace. + +## Set up your testing environment +Start by setting up your environment. This includes importing the **azuremlsdk** package and connecting to your workspace. + +### Import package +```{r import_package, eval=FALSE} +library(azuremlsdk) +``` + +### Load your workspace +Instantiate a workspace object from your existing workspace. The following code will load the workspace details from a **config.json** file if you previously wrote one out with `write_workspace_config()`. +```{r load_workspace, eval=FALSE} +ws <- load_workspace_from_config() +``` + +Or, you can retrieve a workspace by directly specifying your workspace details: +```{r get_workspace, eval=FALSE} +ws <- get_workspace("", "", "") +``` + +## Register the model +In this tutorial we will deploy a model that was trained in one of the [samples](https://github.com/Azure/azureml-sdk-for-r/blob/master/samples/training/train-on-amlcompute/train-on-amlcompute.R). The model was trained with the Iris dataset and can be used to determine if a flower is one of three Iris flower species (setosa, versicolor, virginica). We have provided the model file (`model.rds`) for the tutorial; it is located in the "project_files" directory of this vignette. + +First, register the model to your workspace with [`register_model()`](https://azure.github.io/azureml-sdk-for-r/reference/register_model.html). A registered model can be any collection of files, but in this case the R model file is sufficient. Azure ML will use the registered model for deployment. + +```{r register_model, eval=FALSE} +model <- register_model(ws, + model_path = "project_files/model.rds", + model_name = "iris_model", + description = "Predict an Iris flower type") +``` + +## Provision an AKS cluster +When deploying a web service to AKS, you deploy to an AKS cluster that is connected to your workspace. There are two ways to connect an AKS cluster to your workspace: + +* Create the AKS cluster. 
The process automatically connects the cluster to the workspace. +* Attach an existing AKS cluster to your workspace. You can attach a cluster with the [`attach_aks_compute()`](https://azure.github.io/azureml-sdk-for-r/reference/attach_aks_compute.html) method. + +Creating or attaching an AKS cluster is a one-time process for your workspace. You can reuse this cluster for multiple deployments. If you delete the cluster or the resource group that contains it, you must create a new cluster the next time you need to deploy. + +In this tutorial, we will go with the first method of provisioning a new cluster. See the [`create_aks_compute()`](https://azure.github.io/azureml-sdk-for-r/reference/create_aks_compute.html) reference for the full set of configurable parameters. If you pick custom values for the `agent_count` and `vm_size` parameters, you need to make sure `agent_count` multiplied by `vm_size` is greater than or equal to `12` virtual CPUs. + +``` {r provision_cluster, eval=FALSE} +aks_target <- create_aks_compute(ws, cluster_name = 'myakscluster') + +wait_for_provisioning_completion(aks_target, show_output = TRUE) +``` + +The Azure ML SDK does not provide support for scaling an AKS cluster. To scale the nodes in the cluster, use the UI for your AKS cluster in the Azure portal. You can only change the node count, not the VM size of the cluster. + +## Deploy as a web service +### Define the inference dependencies +To deploy a model, you need an **inference configuration**, which describes the environment needed to host the model and web service. To create an inference config, you will first need a scoring script and an Azure ML environment. + +The scoring script (`entry_script`) is an R script that will take as input variable values (in JSON format) and output a prediction from your model. For this tutorial, use the provided scoring file `score.R`. The scoring script must contain an `init()` method that loads your model and returns a function that uses the model to make a prediction based on the input data. See the [documentation](https://azure.github.io/azureml-sdk-for-r/reference/inference_config.html#details) for more details. + +Next, define an Azure ML **environment** for your script’s package dependencies. With an environment, you specify R packages (from CRAN or elsewhere) that are needed for your script to run. You can also provide the values of environment variables that your script can reference to modify its behavior. + +By default Azure ML will build a default Docker image that includes R, the Azure ML SDK, and additional required dependencies for deployment. See the documentation here for the full list of dependencies that will be installed in the default container. You can also specify additional packages to be installed at runtime, or even a custom Docker image to be used instead of the base image that will be built, using the other available parameters to [`r_environment()`](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html). + +```{r create_env, eval=FALSE} +r_env <- r_environment(name = "deploy_env") +``` + +Now you have everything you need to create an inference config for encapsulating your scoring script and environment dependencies. 
+ +``` {r create_inference_config, eval=FALSE} +inference_config <- inference_config( + entry_script = "score.R", + source_directory = "project_files", + environment = r_env) +``` + +### Deploy to AKS +Now, define the deployment configuration that describes the compute resources needed, for example, the number of cores and memory. See the [`aks_webservice_deployment_config()`](https://azure.github.io/azureml-sdk-for-r/reference/aks_webservice_deployment_config.html) for the full set of configurable parameters. + +``` {r deploy_config, eval=FALSE} +aks_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1) +``` + +Now, deploy your model as a web service to the AKS cluster you created earlier. + +```{r deploy_service, eval=FALSE} +aks_service <- deploy_model(ws, + 'my-new-aksservice', + models = list(model), + inference_config = inference_config, + deployment_config = aks_config, + deployment_target = aks_target) + +wait_for_deployment(aks_service, show_output = TRUE) +``` + +To inspect the logs from the deployment: +```{r get_logs, eval=FALSE} +get_webservice_logs(aks_service) +``` + +If you encounter any issue in deploying the web service, please visit the [troubleshooting guide](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment). + +## Test the deployed service +Now that your model is deployed as a service, you can test the service from R using [`invoke_webservice()`](https://azure.github.io/azureml-sdk-for-r/reference/invoke_webservice.html). Provide a new set of data to predict from, convert it to JSON, and send it to the service. + +``` {r test_service, eval=FALSE} +library(jsonlite) +# versicolor +plant <- data.frame(Sepal.Length = 6.4, + Sepal.Width = 2.8, + Petal.Length = 4.6, + Petal.Width = 1.8) + +# setosa +# plant <- data.frame(Sepal.Length = 5.1, +# Sepal.Width = 3.5, +# Petal.Length = 1.4, +# Petal.Width = 0.2) + +# virginica +# plant <- data.frame(Sepal.Length = 6.7, +# Sepal.Width = 3.3, +# Petal.Length = 5.2, +# Petal.Width = 2.3) + +predicted_val <- invoke_webservice(aks_service, toJSON(plant)) +message(predicted_val) +``` + +You can also get the web service’s HTTP endpoint, which accepts REST client calls. You can share this endpoint with anyone who wants to test the web service or integrate it into an application. + +``` {r eval=FALSE} +aks_service$scoring_uri +``` + +## Web service authentication +When deploying to AKS, key-based authentication is enabled by default. You can also enable token-based authentication. Token-based authentication requires clients to use an Azure Active Directory account to request an authentication token, which is used to make requests to the deployed service. + +To disable key-based auth, set the `auth_enabled = FALSE` parameter when creating the deployment configuration with [`aks_webservice_deployment_config()`](https://azure.github.io/azureml-sdk-for-r/reference/aks_webservice_deployment_config.html). +To enable token-based auth, set `token_auth_enabled = TRUE` when creating the deployment config. + +### Key-based authentication +If key authentication is enabled, you can use the [`get_webservice_keys()`](https://azure.github.io/azureml-sdk-for-r/reference/get_webservice_keys.html) method to retrieve a primary and secondary authentication key. To generate a new key, use [`generate_new_webservice_key()`](https://azure.github.io/azureml-sdk-for-r/reference/generate_new_webservice_key.html). 
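+
+Here is a minimal sketch of calling the scoring endpoint directly with a key. It assumes the **httr** package is installed, reuses the `plant` data frame from the earlier chunk, and assumes the first element returned by `get_webservice_keys()` is the primary key:
+
+``` {r call_service_with_key, eval=FALSE}
+library(httr)
+
+# Retrieve the authentication keys for the deployed service
+keys <- get_webservice_keys(aks_service)
+primary_key <- keys[[1]]  # assumption: the first element is the primary key
+
+# Call the scoring URI directly, passing the key as a bearer token
+resp <- POST(aks_service$scoring_uri,
+             add_headers(Authorization = paste("Bearer", primary_key)),
+             body = toJSON(plant),
+             content_type_json())
+content(resp, as = "text")
+```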
+ +### Token-based authentication +If token authentication is enabled, you can use the [`get_webservice_token()`](https://azure.github.io/azureml-sdk-for-r/reference/get_webservice_token.html) method to retrieve a JWT token and that token's expiration time. Make sure to request a new token after the token's expiration time. + +## Clean up resources +Delete the resources once you no longer need them. Do not delete any resource you plan on still using. + +Delete the web service: +```{r delete_service, eval=FALSE} +delete_webservice(aks_service) +``` + +Delete the registered model: +```{r delete_model, eval=FALSE} +delete_model(model) +``` + +Delete the AKS cluster: +```{r delete_cluster, eval=FALSE} +delete_compute(aks_target) +``` \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/deploy-to-aks/project_files/model.rds b/how-to-use-azureml/azureml-sdk-for-r/vignettes/deploy-to-aks/project_files/model.rds new file mode 100644 index 000000000..9e46c2fea Binary files /dev/null and b/how-to-use-azureml/azureml-sdk-for-r/vignettes/deploy-to-aks/project_files/model.rds differ diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/deploy-to-aks/project_files/score.R b/how-to-use-azureml/azureml-sdk-for-r/vignettes/deploy-to-aks/project_files/score.R new file mode 100644 index 000000000..53c47dac9 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/deploy-to-aks/project_files/score.R @@ -0,0 +1,17 @@ +#' Copyright(c) Microsoft Corporation. +#' Licensed under the MIT license. + +library(jsonlite) + +init <- function() { + model_path <- Sys.getenv("AZUREML_MODEL_DIR") + model <- readRDS(file.path(model_path, "model.rds")) + message("model is loaded") + + function(data) { + plant <- as.data.frame(fromJSON(data)) + prediction <- predict(model, plant) + result <- as.character(prediction) + toJSON(result) + } +} diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/hyperparameter-tune-with-keras/hyperparameter-tune-with-keras.Rmd b/how-to-use-azureml/azureml-sdk-for-r/vignettes/hyperparameter-tune-with-keras/hyperparameter-tune-with-keras.Rmd new file mode 100644 index 000000000..200cac139 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/hyperparameter-tune-with-keras/hyperparameter-tune-with-keras.Rmd @@ -0,0 +1,242 @@ +--- +title: "Hyperparameter tune a Keras model" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Hyperparameter tune a Keras model} + %\VignetteEngine{knitr::rmarkdown} + \use_package{UTF-8} +--- + +This tutorial demonstrates how you can efficiently tune hyperparameters for a model using HyperDrive, Azure ML's hyperparameter tuning functionality. You will train a Keras model on the CIFAR10 dataset, automate hyperparameter exploration, launch parallel jobs, log your results, and find the best run. + +### What are hyperparameters? + +Hyperparameters are variable parameters chosen to train a model. Learning rate, number of epochs, and batch size are all examples of hyperparameters. + +Using brute-force methods to find the optimal values for parameters can be time-consuming, and poor-performing runs can result in wasted money. To avoid this, HyperDrive automates hyperparameter exploration in a time-saving and cost-effective manner by launching several parallel runs with different configurations and finding the configuration that results in best performance on your primary metric. + +Let's get started with the example to see how it works! 
+ +## Prerequisites + +If you don’t have access to an Azure ML workspace, follow the [setup tutorial](https://azure.github.io/azureml-sdk-for-r/articles/configuration.html) to configure and create a workspace. + +## Set up development environment +The setup for your development work in this tutorial includes the following actions: + +* Import required packages +* Connect to a workspace +* Create an experiment to track your runs +* Create a remote compute target to use for training + +### Import **azuremlsdk** package +```{r eval=FALSE} +library(azuremlsdk) +``` + +### Load your workspace +Instantiate a workspace object from your existing workspace. The following code will load the workspace details from a **config.json** file if you previously wrote one out with [`write_workspace_config()`](https://azure.github.io/azureml-sdk-for-r/reference/write_workspace_config.html). +```{r load_workpace, eval=FALSE} +ws <- load_workspace_from_config() +``` + +Or, you can retrieve a workspace by directly specifying your workspace details: +```{r get_workpace, eval=FALSE} +ws <- get_workspace("", "", "") +``` + +### Create an experiment +An Azure ML **experiment** tracks a grouping of runs, typically from the same training script. Create an experiment to track hyperparameter tuning runs for the Keras model. + +```{r create_experiment, eval=FALSE} +exp <- experiment(workspace = ws, name = 'hyperdrive-cifar10') +``` + +If you would like to track your runs in an existing experiment, simply specify that experiment's name to the `name` parameter of `experiment()`. + +### Create a compute target +By using Azure Machine Learning Compute (AmlCompute), a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. In this tutorial, you create a GPU-enabled cluster as your training environment. The code below creates the compute cluster for you if it doesn't already exist in your workspace. + +You may need to wait a few minutes for your compute cluster to be provisioned if it doesn't already exist. + +```{r create_cluster, eval=FALSE} +cluster_name <- "gpucluster" + +compute_target <- get_compute(ws, cluster_name = cluster_name) +if (is.null(compute_target)) +{ + vm_size <- "STANDARD_NC6" + compute_target <- create_aml_compute(workspace = ws, + cluster_name = cluster_name, + vm_size = vm_size, + max_nodes = 4) + + wait_for_provisioning_completion(compute_target, show_output = TRUE) +} +``` + +## Prepare the training script +A training script called `cifar10_cnn.R` has been provided for you in the "project_files" directory of this tutorial. + +In order to leverage HyperDrive, the training script for your model must log the relevant metrics during model training. When you configure the hyperparameter tuning run, you specify the primary metric to use for evaluating run performance. You must log this metric so it is available to the hyperparameter tuning process. + +In order to log the required metrics, you need to do the following **inside the training script**: + +* Import the **azuremlsdk** package +``` +library(azuremlsdk) +``` + +* Take the hyperparameters as command-line arguments to the script. This is necessary so that when HyperDrive carries out the hyperparameter sweep, it can run the training script with different values to the hyperparameters as defined by the search space. + +* Use the [`log_metric_to_run()`](https://azure.github.io/azureml-sdk-for-r/reference/log_metric_to_run.html) function to log the hyperparameters and the primary metric. 
+``` +log_metric_to_run("batch_size", batch_size) +... +log_metric_to_run("epochs", epochs) +... +log_metric_to_run("lr", lr) +... +log_metric_to_run("decay", decay) +... +log_metric_to_run("Loss", results[[1]]) +``` + +## Create an estimator + +An Azure ML **estimator** encapsulates the run configuration information needed for executing a training script on the compute target. Azure ML runs are run as containerized jobs on the specified compute target. By default, the Docker image built for your training job will include R, the Azure ML SDK, and a set of commonly used R packages. See the full list of default packages included [here](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html). The estimator is used to define the configuration for each of the child runs that the parent HyperDrive run will kick off. + +To create the estimator, define the following: + +* The directory that contains your scripts needed for training (`source_directory`). All the files in this directory are uploaded to the cluster node(s) for execution. The directory must contain your training script and any additional scripts required. +* The training script that will be executed (`entry_script`). +* The compute target (`compute_target`), in this case the AmlCompute cluster you created earlier. +* Any environment dependencies required for training. Since the training script requires the Keras package, which is not included in the image by default, pass the package name to the `cran_packages` parameter to have it installed in the Docker container where the job will run. See the [`estimator()`](https://azure.github.io/azureml-sdk-for-r/reference/estimator.html) reference for the full set of configurable options. +* Set the `use_gpu = TRUE` flag so the default base GPU Docker image will be built, since the job will be run on a GPU cluster. + +```{r create_estimator, eval=FALSE} +est <- estimator(source_directory = "project_files", + entry_script = "cifar10_cnn.R", + compute_target = compute_target, + cran_packages = c("keras"), + use_gpu = TRUE) +``` + +## Configure the HyperDrive run +To kick off hyperparameter tuning in Azure ML, you will need to configure a HyperDrive run, which will in turn launch individual children runs of the training scripts with the corresponding hyperparameter values. + +### Define search space + +In this experiment, we will use four hyperparameters: batch size, number of epochs, learning rate, and decay. In order to begin tuning, we must define the range of values we would like to explore from and how they will be distributed. This is called a parameter space definition and can be created with discrete or continuous ranges. + +__Discrete hyperparameters__ are specified as a choice among discrete values represented as a list. + +Advanced discrete hyperparameters can also be specified using a distribution. The following distributions are supported: + + * `quniform(low, high, q)` + * `qloguniform(low, high, q)` + * `qnormal(mu, sigma, q)` + * `qlognormal(mu, sigma, q)` + +__Continuous hyperparameters__ are specified as a distribution over a continuous range of values. The following distributions are supported: + + * `uniform(low, high)` + * `loguniform(low, high)` + * `normal(mu, sigma)` + * `lognormal(mu, sigma)` + +Here, we will use the [`random_parameter_sampling()`](https://azure.github.io/azureml-sdk-for-r/reference/random_parameter_sampling.html) function to define the search space for each hyperparameter. 
`batch_size` and `epochs` will be chosen from discrete sets while `lr` and `decay` will be drawn from continuous distributions. + +Other available sampling function options are: + + * [`grid_parameter_sampling()`](https://azure.github.io/azureml-sdk-for-r/reference/grid_parameter_sampling.html) + * [`bayesian_parameter_sampling()`](https://azure.github.io/azureml-sdk-for-r/reference/bayesian_parameter_sampling.html) + +```{r search_space, eval=FALSE} +sampling <- random_parameter_sampling(list(batch_size = choice(c(16, 32, 64)), + epochs = choice(c(200, 350, 500)), + lr = normal(0.0001, 0.005), + decay = uniform(1e-6, 3e-6))) +``` + +### Define termination policy + +To prevent resource waste, Azure ML can detect and terminate poorly performing runs. HyperDrive will do this automatically if you specify an early termination policy. + +Here, you will use the [`bandit_policy()`](https://azure.github.io/azureml-sdk-for-r/reference/bandit_policy.html), which terminates any runs where the primary metric is not within the specified slack factor with respect to the best performing training run. + +```{r termination_policy, eval=FALSE} +policy <- bandit_policy(slack_factor = 0.15) +``` + +Other termination policy options are: + + * [`median_stopping_policy()`](https://azure.github.io/azureml-sdk-for-r/reference/median_stopping_policy.html) + * [`truncation_selection_policy()`](https://azure.github.io/azureml-sdk-for-r/reference/truncation_selection_policy.html) + +If no policy is provided, all runs will continue to completion regardless of performance. + +### Finalize configuration + +Now, you can create a `HyperDriveConfig` object to define your HyperDrive run. Along with the sampling and policy definitions, you need to specify the name of the primary metric that you want to track and whether we want to maximize it or minimize it. The `primary_metric_name` must correspond with the name of the primary metric you logged in your training script. `max_total_runs` specifies the total number of child runs to launch. See the [hyperdrive_config()](https://azure.github.io/azureml-sdk-for-r/reference/hyperdrive_config.html) reference for the full set of configurable parameters. + +```{r create_config, eval=FALSE} +hyperdrive_config <- hyperdrive_config(hyperparameter_sampling = sampling, + primary_metric_goal("MINIMIZE"), + primary_metric_name = "Loss", + max_total_runs = 4, + policy = policy, + estimator = est) +``` + +## Submit the HyperDrive run + +Finally submit the experiment to run on your cluster. The parent HyperDrive run will launch the individual child runs. `submit_experiment()` will return a `HyperDriveRun` object that you will use to interface with the run. In this tutorial, since the cluster we created scales to a max of `4` nodes, all 4 child runs will be launched in parallel. + +```{r submit_run, eval=FALSE} +hyperdrive_run <- submit_experiment(exp, hyperdrive_config) +``` + +You can view the HyperDrive run’s details as a table. Clicking the “Web View” link provided will bring you to Azure Machine Learning studio, where you can monitor the run in the UI. + +```{r eval=FALSE} +view_run_details(hyperdrive_run) +``` + +Wait until hyperparameter tuning is complete before you run more code. + +```{r eval=FALSE} +wait_for_run_completion(hyperdrive_run, show_output = TRUE) +``` + +## Analyse runs by performance + +Finally, you can view and compare the metrics collected during all of the child runs! 
+ +```{r analyse_runs, eval=FALSE} +# Get the metrics of all the child runs +child_run_metrics <- get_child_run_metrics(hyperdrive_run) +child_run_metrics + +# Get the child run objects sorted in descending order by the best primary metric +child_runs <- get_child_runs_sorted_by_primary_metric(hyperdrive_run) +child_runs + +# Directly get the run object of the best performing run +best_run <- get_best_run_by_primary_metric(hyperdrive_run) + +# Get the metrics of the best performing run +metrics <- get_run_metrics(best_run) +metrics +``` + +The `metrics` variable will include the values of the hyperparameters that resulted in the best performing run. + +## Clean up resources +Delete the resources once you no longer need them. Don't delete any resource you plan to still use. + +Delete the compute cluster: +```{r delete_compute, eval=FALSE} +delete_compute(compute_target) +``` \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/hyperparameter-tune-with-keras/project_files/cifar10_cnn.R b/how-to-use-azureml/azureml-sdk-for-r/vignettes/hyperparameter-tune-with-keras/project_files/cifar10_cnn.R new file mode 100644 index 000000000..2d1f8ac9d --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/hyperparameter-tune-with-keras/project_files/cifar10_cnn.R @@ -0,0 +1,124 @@ +#' Modified from: "https://github.com/rstudio/keras/blob/master/vignettes/ +#' examples/cifar10_cnn.R" +#' +#' Train a simple deep CNN on the CIFAR10 small images dataset. +#' +#' It gets down to 0.65 test logloss in 25 epochs, and down to 0.55 after 50 +#' epochs, though it is still underfitting at that point. + +library(keras) +install_keras() + +library(azuremlsdk) + +# Parameters -------------------------------------------------------------- + +args <- commandArgs(trailingOnly = TRUE) + +batch_size <- as.numeric(args[2]) +log_metric_to_run("batch_size", batch_size) + +epochs <- as.numeric(args[4]) +log_metric_to_run("epochs", epochs) + +lr <- as.numeric(args[6]) +log_metric_to_run("lr", lr) + +decay <- as.numeric(args[8]) +log_metric_to_run("decay", decay) + +data_augmentation <- TRUE + + +# Data Preparation -------------------------------------------------------- + +# See ?dataset_cifar10 for more info +cifar10 <- dataset_cifar10() + +# Feature scale RGB values in test and train inputs +x_train <- cifar10$train$x / 255 +x_test <- cifar10$test$x / 255 +y_train <- to_categorical(cifar10$train$y, num_classes = 10) +y_test <- to_categorical(cifar10$test$y, num_classes = 10) + + +# Defining Model ---------------------------------------------------------- + +# Initialize sequential model +model <- keras_model_sequential() + +model %>% + +# Start with hidden 2D convolutional layer being fed 32x32 pixel images +layer_conv_2d( + filter = 32, kernel_size = c(3, 3), padding = "same", + input_shape = c(32, 32, 3) + ) %>% + layer_activation("relu") %>% + + # Second hidden layer + layer_conv_2d(filter = 32, kernel_size = c(3, 3)) %>% + layer_activation("relu") %>% + + # Use max pooling + layer_max_pooling_2d(pool_size = c(2, 2)) %>% + layer_dropout(0.25) %>% + + # 2 additional hidden 2D convolutional layers + layer_conv_2d(filter = 32, kernel_size = c(3, 3), padding = "same") %>% + layer_activation("relu") %>% + layer_conv_2d(filter = 32, kernel_size = c(3, 3)) %>% + layer_activation("relu") %>% + + # Use max pooling once more + layer_max_pooling_2d(pool_size = c(2, 2)) %>% + layer_dropout(0.25) %>% + + # Flatten max filtered output into feature vector + # and feed into dense layer + 
layer_flatten() %>% + layer_dense(512) %>% + layer_activation("relu") %>% + layer_dropout(0.5) %>% + + # Outputs from dense layer are projected onto 10 unit output layer + layer_dense(10) %>% + layer_activation("softmax") + +opt <- optimizer_rmsprop(lr, decay) + +model %>% + compile(loss = "categorical_crossentropy", + optimizer = opt, + metrics = "accuracy" +) + + +# Training ---------------------------------------------------------------- + +if (!data_augmentation) { + + model %>% + fit(x_train, + y_train, + batch_size = batch_size, + epochs = epochs, + validation_data = list(x_test, y_test), + shuffle = TRUE + ) + +} else { + + datagen <- image_data_generator(rotation_range = 20, + width_shift_range = 0.2, + height_shift_range = 0.2, + horizontal_flip = TRUE + ) + + datagen %>% fit_image_data_generator(x_train) + + results <- evaluate(model, x_train, y_train, batch_size) + log_metric_to_run("Loss", results[[1]]) + cat("Loss: ", results[[1]], "\n") + cat("Accuracy: ", results[[2]], "\n") +} \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/installation.Rmd b/how-to-use-azureml/azureml-sdk-for-r/vignettes/installation.Rmd new file mode 100644 index 000000000..d072acabb --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/installation.Rmd @@ -0,0 +1,100 @@ +--- +title: "Install the Azure ML SDK for R" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Install the Azure ML SDK for R} + %\VignetteEngine{knitr::rmarkdown} + \use_package{UTF-8} +--- + +This article covers the step-by-step instructions for installing the Azure ML SDK for R. + +You do not need run this if you are working on an Azure Machine Learning Compute Instance, as the compute instance already has the Azure ML SDK preinstalled. + +## Install Conda + +If you do not have Conda already installed on your machine, you will first need to install it, since the Azure ML R SDK uses **reticulate** to bind to the Python SDK. We recommend installing [Miniconda](https://docs.conda.io/en/latest/miniconda.html), which is a smaller, lightweight version of Anaconda. Choose the 64-bit binary for Python 3.5 or later. + +## Install the **azuremlsdk** R package +You will need **remotes** to install **azuremlsdk** from the GitHub repo. +``` {r install_remotes, eval=FALSE} +install.packages('remotes') +``` + +Then, you can use the `install_github` function to install the package. +``` {r install_azuremlsdk, eval=FALSE} +remotes::install_cran('azuremlsdk', repos = 'https://cloud.r-project.org/') +``` + +If you are using R installed from CRAN, which comes with 32-bit and 64-bit binaries, you may need to specify the parameter `INSTALL_opts=c("--no-multiarch")` to only build for the current 64-bit architecture. +``` {r eval=FALSE} +remotes::install_cran('azuremlsdk', repos = 'https://cloud.r-project.org/', INSTALL_opts=c("--no-multiarch")) +``` + +## Install the Azure ML Python SDK +Lastly, use the **azuremlsdk** R library to install the Python SDK. By default, `azuremlsdk::install_azureml()` will install the [latest version of the Python SDK](https://pypi.org/project/azureml-sdk/) in a conda environment called `r-azureml` if reticulate < 1.14 or `r-reticulate` if reticulate ≥ 1.14. +``` {r install_pythonsdk, eval=FALSE} +azuremlsdk::install_azureml() +``` + +If you would like to override the default version, environment name, or Python version, you can pass in those arguments. 
If you would like to restart the R session after installation or delete the conda environment if it already exists and create a new environment, you can also do so: +``` {r eval=FALSE} +azuremlsdk::install_azureml(version = NULL, + custom_envname = "", + conda_python_version = "", + restart_session = TRUE, + remove_existing_env = TRUE) +``` + +## Test installation +You can confirm your installation worked by loading the library and successfully retrieving a run. +``` {r test_installation, eval=FALSE} +library(azuremlsdk) +get_current_run() +``` + +## Troubleshooting +- In step 3 of the installation, if you get ssl errors on windows, it is due to an +outdated openssl binary. Install the latest openssl binaries from +[here](https://wiki.openssl.org/index.php/Binaries). + +- If installation fails due to this error: + + ```R + Error in strptime(xx, f, tz = tz) : + (converted from warning) unable to identify current timezone 'C': + please set environment variable 'TZ' + In R CMD INSTALL + Error in i.p(...) : + (converted from warning) installation of package ‘C:/.../azureml_0.4.0.tar.gz’ had non-zero exit + status + ``` + + You will need to set your time zone environment variable to GMT and restart the installation process. + + ```R + Sys.setenv(TZ='GMT') + ``` + +- If the following permission error occurs while installing in RStudio, + change your RStudio session to administrator mode, and re-run the installation command. + + ```R + Downloading GitHub repo Azure/azureml-sdk-for-r@master + Skipping 2 packages ahead of CRAN: reticulate, rlang + Running `R CMD build`... + + Error: (converted from warning) invalid package + 'C:/.../file2b441bf23631' + In R CMD INSTALL + Error in i.p(...) : + (converted from warning) installation of package + ‘C:/.../file2b441bf23631’ had non-zero exit status + In addition: Warning messages: + 1: In file(con, "r") : + cannot open file 'C:...\file2b44144a540f': Permission denied + 2: In file(con, "r") : + cannot open file 'C:...\file2b4463c21577': Permission denied + ``` + \ No newline at end of file diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-and-deploy-to-aci/project_files/accident_predict.R b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-and-deploy-to-aci/project_files/accident_predict.R new file mode 100644 index 000000000..37e16795c --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-and-deploy-to-aci/project_files/accident_predict.R @@ -0,0 +1,16 @@ +#' Copyright(c) Microsoft Corporation. +#' Licensed under the MIT license. + +library(jsonlite) + +init <- function() { + model_path <- Sys.getenv("AZUREML_MODEL_DIR") + model <- readRDS(file.path(model_path, "model.rds")) + message("logistic regression model loaded") + + function(data) { + vars <- as.data.frame(fromJSON(data)) + prediction <- as.numeric(predict(model, vars, type = "response") * 100) + toJSON(prediction) + } +} diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-and-deploy-to-aci/project_files/accidents.R b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-and-deploy-to-aci/project_files/accidents.R new file mode 100644 index 000000000..14dd52371 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-and-deploy-to-aci/project_files/accidents.R @@ -0,0 +1,33 @@ +#' Copyright(c) Microsoft Corporation. +#' Licensed under the MIT license. 
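+#'
+#' This training script reads the accidents data from the folder passed in via
+#' the --data_folder argument, fits a logistic regression model predicting the
+#' probability of a fatality, logs the resulting accuracy to the run with
+#' log_metric_to_run(), and saves the fitted model to the ./outputs directory.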
+ +library(azuremlsdk) +library(optparse) +library(caret) + +options <- list( + make_option(c("-d", "--data_folder")) +) + +opt_parser <- OptionParser(option_list = options) +opt <- parse_args(opt_parser) + +paste(opt$data_folder) + +accidents <- readRDS(file.path(opt$data_folder, "accidents.Rd")) +summary(accidents) + +mod <- glm(dead ~ dvcat + seatbelt + frontal + sex + ageOFocc + yearVeh + airbag + occRole, family = binomial, data = accidents) +summary(mod) +predictions <- factor(ifelse(predict(mod) > 0.1, "dead", "alive")) +conf_matrix <- confusionMatrix(predictions, accidents$dead) +message(conf_matrix) + +log_metric_to_run("Accuracy", conf_matrix$overall["Accuracy"]) + +output_dir = "outputs" +if (!dir.exists(output_dir)) { + dir.create(output_dir) +} +saveRDS(mod, file = "./outputs/model.rds") +message("Model saved") diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-and-deploy-to-aci/train-and-deploy-to-aci.Rmd b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-and-deploy-to-aci/train-and-deploy-to-aci.Rmd new file mode 100644 index 000000000..a0b9f4c33 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-and-deploy-to-aci/train-and-deploy-to-aci.Rmd @@ -0,0 +1,326 @@ +--- +title: "Train and deploy your first model with Azure ML" +author: "David Smith" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Train and deploy your first model with Azure ML} + %\VignetteEngine{knitr::rmarkdown} + \use_package{UTF-8} +--- + +In this tutorial, you learn the foundational design patterns in Azure Machine Learning. You'll train and deploy a **caret** model to predict the likelihood of a fatality in an automobile accident. After completing this tutorial, you'll have the practical knowledge of the R SDK to scale up to developing more-complex experiments and workflows. + +In this tutorial, you learn the following tasks: + +* Connect your workspace +* Load data and prepare for training +* Upload data to the datastore so it is available for remote training +* Create a compute resource +* Train a caret model to predict probability of fatality +* Deploy a prediction endpoint +* Test the model from R + +## Prerequisites + +If you don't have access to an Azure ML workspace, follow the [setup tutorial](https://azure.github.io/azureml-sdk-for-r/articles/configuration.html) to configure and create a workspace. + +## Set up your development environment +The setup for your development work in this tutorial includes the following actions: + +* Install required packages +* Connect to a workspace, so that your local computer can communicate with remote resources +* Create an experiment to track your runs +* Create a remote compute target to use for training + +### Install required packages +This tutorial assumes you already have the Azure ML SDK installed. Go ahead and import the **azuremlsdk** package. + +```{r eval=FALSE} +library(azuremlsdk) +``` + +The tutorial uses data from the [**DAAG** package](https://cran.r-project.org/package=DAAG). Install the package if you don't have it. + +```{r eval=FALSE} +install.packages("DAAG") +``` + +The training and scoring scripts (`accidents.R` and `accident_predict.R`) have some additional dependencies. If you plan on running those scripts locally, make sure you have those required packages as well. + +### Load your workspace +Instantiate a workspace object from your existing workspace. The following code will load the workspace details from the **config.json** file. 
You can also retrieve a workspace using [`get_workspace()`](https://azure.github.io/azureml-sdk-for-r/reference/get_workspace.html). + +```{r load_workpace, eval=FALSE} +ws <- load_workspace_from_config() +``` + +### Create an experiment +An Azure ML experiment tracks a grouping of runs, typically from the same training script. Create an experiment to track the runs for training the caret model on the accidents data. + +```{r create_experiment, eval=FALSE} +experiment_name <- "accident-logreg" +exp <- experiment(ws, experiment_name) +``` + +### Create a compute target +By using Azure Machine Learning Compute (AmlCompute), a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create a single-node AmlCompute cluster as your training environment. The code below creates the compute cluster for you if it doesn't already exist in your workspace. + +You may need to wait a few minutes for your compute cluster to be provisioned if it doesn't already exist. + +```{r create_cluster, eval=FALSE} +cluster_name <- "rcluster" +compute_target <- get_compute(ws, cluster_name = cluster_name) +if (is.null(compute_target)) { + vm_size <- "STANDARD_D2_V2" + compute_target <- create_aml_compute(workspace = ws, + cluster_name = cluster_name, + vm_size = vm_size, + max_nodes = 1) + + wait_for_provisioning_completion(compute_target, show_output = TRUE) +} +``` + +## Prepare data for training +This tutorial uses data from the **DAAG** package. This dataset includes data from over 25,000 car crashes in the US, with variables you can use to predict the likelihood of a fatality. First, import the data into R and transform it into a new dataframe `accidents` for analysis, and export it to an `Rdata` file. + +```{r load_data, eval=FALSE} +library(DAAG) +data(nassCDS) + +accidents <- na.omit(nassCDS[,c("dead","dvcat","seatbelt","frontal","sex","ageOFocc","yearVeh","airbag","occRole")]) +accidents$frontal <- factor(accidents$frontal, labels=c("notfrontal","frontal")) +accidents$occRole <- factor(accidents$occRole) + +saveRDS(accidents, file="accidents.Rd") +``` + +### Upload data to the datastore +Upload data to the cloud so that it can be access by your remote training environment. Each Azure ML workspace comes with a default datastore that stores the connection information to the Azure blob container that is provisioned in the storage account attached to the workspace. The following code will upload the accidents data you created above to that datastore. + +```{r upload_data, eval=FALSE} +ds <- get_default_datastore(ws) + +target_path <- "accidentdata" +upload_files_to_datastore(ds, + list("./project_files/accidents.Rd"), + target_path = target_path, + overwrite = TRUE) +``` + + +## Train a model + +For this tutorial, fit a logistic regression model on your uploaded data using your remote compute cluster. To submit a job, you need to: + +* Prepare the training script +* Create an estimator +* Submit the job + +### Prepare the training script +A training script called `accidents.R` has been provided for you in the "project_files" directory of this tutorial. Notice the following details **inside the training script** that have been done to leverage the Azure ML service for training: + +* The training script takes an argument `-d` to find the directory that contains the training data. When you define and submit your job later, you point to the datastore for this argument. 
Azure ML will mount the storage folder to the remote cluster for the training job. +* The training script logs the final accuracy as a metric to the run record in Azure ML using `log_metric_to_run()`. The Azure ML SDK provides a set of logging APIs for logging various metrics during training runs. These metrics are recorded and persisted in the experiment run record. The metrics can then be accessed at any time or viewed in the run details page in [Azure Machine Learning studio](http://ml.azure.com). See the [reference](https://azure.github.io/azureml-sdk-for-r/reference/index.html#section-training-experimentation) for the full set of logging methods `log_*()`. +* The training script saves your model into a directory named **outputs**. The `./outputs` folder receives special treatment by Azure ML. During training, files written to `./outputs` are automatically uploaded to your run record by Azure ML and persisted as artifacts. By saving the trained model to `./outputs`, you'll be able to access and retrieve your model file even after the run is over and you no longer have access to your remote training environment. + +### Create an estimator + +An Azure ML estimator encapsulates the run configuration information needed for executing a training script on the compute target. Azure ML runs are run as containerized jobs on the specified compute target. By default, the Docker image built for your training job will include R, the Azure ML SDK, and a set of commonly used R packages. See the full list of default packages included [here](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html). + +To create the estimator, define: + +* The directory that contains your scripts needed for training (`source_directory`). All the files in this directory are uploaded to the cluster node(s) for execution. The directory must contain your training script and any additional scripts required. +* The training script that will be executed (`entry_script`). +* The compute target (`compute_target`), in this case the AmlCompute cluster you created earlier. +* The parameters required from the training script (`script_params`). Azure ML will run your training script as a command-line script with `Rscript`. In this tutorial you specify one argument to the script, the data directory mounting point, which you can access with `ds$path(target_path)`. +* Any environment dependencies required for training. The default Docker image built for training already contains the three packages (`caret`, `e1071`, and `optparse`) needed in the training script. So you don't need to specify additional information. If you are using R packages that are not included by default, use the estimator's `cran_packages` parameter to add additional CRAN packages. See the [`estimator()`](https://azure.github.io/azureml-sdk-for-r/reference/estimator.html) reference for the full set of configurable options. + +```{r create_estimator, eval=FALSE} +est <- estimator(source_directory = "project_files", + entry_script = "accidents.R", + script_params = list("--data_folder" = ds$path(target_path)), + compute_target = compute_target + ) +``` + +### Submit the job on the remote cluster + +Finally submit the job to run on your cluster. `submit_experiment()` returns a Run object that you then use to interface with the run. In total, the first run takes **about 10 minutes**. But for later runs, the same Docker image is reused as long as the script dependencies don't change. 
In this case, the image is cached and the container startup time is much faster. + +```{r submit_job, eval=FALSE} +run <- submit_experiment(exp, est) +``` + +You can view a table of the run's details. Clicking the "Web View" link provided will bring you to Azure Machine Learning studio, where you can monitor the run in the UI. + +```{r view_run, eval=FALSE} +view_run_details(run) +``` + +Model training happens in the background. Wait until the model has finished training before you run more code. + +```{r wait_run, eval=FALSE} +wait_for_run_completion(run, show_output = TRUE) +``` + +You -- and colleagues with access to the workspace -- can submit multiple experiments in parallel, and Azure ML will take care of scheduling the tasks on the compute cluster. You can even configure the cluster to automatically scale up to multiple nodes, and scale back when there are no more compute tasks in the queue. This configuration is a cost-effective way for teams to share compute resources. + +## Retrieve training results +Once your model has finished training, you can access the artifacts of your job that were persisted to the run record, including any metrics logged and the final trained model. + +### Get the logged metrics +In the training script `accidents.R`, you logged a metric from your model: the accuracy of the predictions in the training data. You can see metrics in the [studio](https://ml.azure.com), or extract them to the local session as an R list as follows: + +```{r metrics, eval=FALSE} +metrics <- get_run_metrics(run) +metrics +``` + +If you've run multiple experiments (say, using differing variables, algorithms, or hyperparameters), you can use the metrics from each run to compare and choose the model you'll use in production. + +### Get the trained model +You can retrieve the trained model and look at the results in your local R session. The following code will download the contents of the `./outputs` directory, which includes the model file. + +```{r retrieve_model, eval=FALSE} +download_files_from_run(run, prefix="outputs/") +accident_model <- readRDS("project_files/outputs/model.rds") +summary(accident_model) +``` + +You see some factors that contribute to an increase in the estimated probability of death: + +* higher impact speed +* male driver +* older occupant +* passenger + +You see lower probabilities of death with: + +* presence of airbags +* presence of seatbelts +* frontal collision + +The vehicle year of manufacture does not have a significant effect. + +You can use this model to make new predictions: + +```{r manual_predict, eval=FALSE} +newdata <- data.frame( # valid values shown below + dvcat="10-24", # "1-9km/h" "10-24" "25-39" "40-54" "55+" + seatbelt="none", # "none" "belted" + frontal="frontal", # "notfrontal" "frontal" + sex="f", # "f" "m" + ageOFocc=16, # age in years, 16-97 + yearVeh=2002, # year of vehicle, 1955-2003 + airbag="none", # "none" "airbag" + occRole="pass" # "driver" "pass" + ) + +## predicted probability of death for these variables, as a percentage +as.numeric(predict(accident_model,newdata, type="response")*100) +``` + +## Deploy as a web service + +With your model, you can predict the danger of death from a collision. Use Azure ML to deploy your model as a prediction service. In this tutorial, you will deploy the web service in [Azure Container Instances](https://docs.microsoft.com/en-us/azure/container-instances/) (ACI).
+ +### Register the model + +First, register the model you downloaded to your workspace with [`register_model()`](https://azure.github.io/azureml-sdk-for-r/reference/register_model.html). A registered model can be any collection of files, but in this case the R model object is sufficient. Azure ML will use the registered model for deployment. + +```{r register_model, eval=FALSE} +model <- register_model(ws, + model_path = "project_files/outputs/model.rds", + model_name = "accidents_model", + description = "Predict probablity of auto accident") +``` + +### Define the inference dependencies +To create a web service for your model, you first need to create a scoring script (`entry_script`), an R script that will take as input variable values (in JSON format) and output a prediction from your model. For this tutorial, use the provided scoring file `accident_predict.R`. The scoring script must contain an `init()` method that loads your model and returns a function that uses the model to make a prediction based on the input data. See the [documentation](https://azure.github.io/azureml-sdk-for-r/reference/inference_config.html#details) for more details. + +Next, define an Azure ML **environment** for your script's package dependencies. With an environment, you specify R packages (from CRAN or elsewhere) that are needed for your script to run. You can also provide the values of environment variables that your script can reference to modify its behavior. By default, Azure ML will build the same default Docker image used with the estimator for training. Since the tutorial has no special requirements, create an environment with no special attributes. + +```{r create_environment, eval=FALSE} +r_env <- r_environment(name = "basic_env") +``` + +If you want to use your own Docker image for deployment instead, specify the `custom_docker_image` parameter. See the [`r_environment()`](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html) reference for the full set of configurable options for defining an environment. + +Now you have everything you need to create an **inference config** for encapsulating your scoring script and environment dependencies. + +``` {r create_inference_config, eval=FALSE} +inference_config <- inference_config( + entry_script = "accident_predict.R", + source_directory = "project_files", + environment = r_env) +``` + +### Deploy to ACI +In this tutorial, you will deploy your service to ACI. This code provisions a single container to respond to inbound requests, which is suitable for testing and light loads. See [`aci_webservice_deployment_config()`](https://azure.github.io/azureml-sdk-for-r/reference/aci_webservice_deployment_config.html) for additional configurable options. (For production-scale deployments, you can also [deploy to Azure Kubernetes Service](https://azure.github.io/azureml-sdk-for-r/articles/deploy-to-aks/deploy-to-aks.html).) + +``` {r create_aci_config, eval=FALSE} +aci_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 0.5) +``` + +Now you deploy your model as a web service. Deployment **can take several minutes**. + +```{r deploy_service, eval=FALSE} +aci_service <- deploy_model(ws, + 'accident-pred', + list(model), + inference_config, + aci_config) + +wait_for_deployment(aci_service, show_output = TRUE) +``` + +If you encounter any issue in deploying the web service, please visit the [troubleshooting guide](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment). 
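+
+If the deployment does not succeed, the container logs are usually the fastest way to find the cause. Here is a minimal sketch, assuming the `get_webservice_logs()` helper from **azuremlsdk** and the reticulate-style `$state` field on the service object:
+
+```{r get_logs, eval=FALSE}
+# Check the service state and pull the container logs to diagnose failures
+aci_service$state
+logs <- get_webservice_logs(aci_service)
+cat(logs)
+```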
+ +## Test the deployed service + +Now that your model is deployed as a service, you can test the service from R using [`invoke_webservice()`](https://azure.github.io/azureml-sdk-for-r/reference/invoke_webservice.html). Provide a new set of data to predict from, convert it to JSON, and send it to the service. + +```{r test_deployment, eval=FALSE} +library(jsonlite) + +newdata <- data.frame( # valid values shown below + dvcat="10-24", # "1-9km/h" "10-24" "25-39" "40-54" "55+" + seatbelt="none", # "none" "belted" + frontal="frontal", # "notfrontal" "frontal" + sex="f", # "f" "m" + ageOFocc=22, # age in years, 16-97 + yearVeh=2002, # year of vehicle, 1955-2003 + airbag="none", # "none" "airbag" + occRole="pass" # "driver" "pass" + ) + +prob <- invoke_webservice(aci_service, toJSON(newdata)) +prob +``` + +You can also get the web service's HTTP endpoint, which accepts REST client calls. You can share this endpoint with anyone who wants to test the web service or integrate it into an application. + +```{r get_endpoint, eval=FALSE} +aci_service$scoring_uri +``` + +## Clean up resources + +Delete the resources once you no longer need them. Don't delete any resource you plan to still use. + +Delete the web service: +```{r delete_service, eval=FALSE} +delete_webservice(aci_service) +``` + +Delete the registered model: +```{r delete_model, eval=FALSE} +delete_model(model) +``` + +Delete the compute cluster: +```{r delete_compute, eval=FALSE} +delete_compute(compute_target) +``` diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-with-tensorflow/project_files/tf_mnist.R b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-with-tensorflow/project_files/tf_mnist.R new file mode 100644 index 000000000..4eaff5db8 --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-with-tensorflow/project_files/tf_mnist.R @@ -0,0 +1,62 @@ +# Copyright 2015 The TensorFlow Authors. All Rights Reserved. +# Copyright 2016 RStudio, Inc. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# ============================================================================== + + +library(tensorflow) +install_tensorflow(version = "1.13.2-gpu") + +library(azuremlsdk) + +# Create the model +x <- tf$placeholder(tf$float32, shape(NULL, 784L)) +W <- tf$Variable(tf$zeros(shape(784L, 10L))) +b <- tf$Variable(tf$zeros(shape(10L))) + +y <- tf$nn$softmax(tf$matmul(x, W) + b) + +# Define loss and optimizer +y_ <- tf$placeholder(tf$float32, shape(NULL, 10L)) +cross_entropy <- tf$reduce_mean(-tf$reduce_sum(y_ * log(y), + reduction_indices = 1L)) +train_step <- tf$train$GradientDescentOptimizer(0.5)$minimize(cross_entropy) + +# Create session and initialize variables +sess <- tf$Session() +sess$run(tf$global_variables_initializer()) + +# Load mnist data ) +datasets <- tf$contrib$learn$datasets +mnist <- datasets$mnist$read_data_sets("MNIST-data", one_hot = TRUE) + +# Train +for (i in 1:1000) { + batches <- mnist$train$next_batch(100L) + batch_xs <- batches[[1]] + batch_ys <- batches[[2]] + sess$run(train_step, + feed_dict = dict(x = batch_xs, y_ = batch_ys)) +} + +# Test trained model +correct_prediction <- tf$equal(tf$argmax(y, 1L), tf$argmax(y_, 1L)) +accuracy <- tf$reduce_mean(tf$cast(correct_prediction, tf$float32)) +cat("Accuracy: ", sess$run(accuracy, + feed_dict = dict(x = mnist$test$images, + y_ = mnist$test$labels))) + +log_metric_to_run("accuracy", + sess$run(accuracy, feed_dict = dict(x = mnist$test$images, + y_ = mnist$test$labels))) diff --git a/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-with-tensorflow/train-with-tensorflow.Rmd b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-with-tensorflow/train-with-tensorflow.Rmd new file mode 100644 index 000000000..1290c4b4b --- /dev/null +++ b/how-to-use-azureml/azureml-sdk-for-r/vignettes/train-with-tensorflow/train-with-tensorflow.Rmd @@ -0,0 +1,143 @@ +--- +title: "Train a TensorFlow model" +date: "`r Sys.Date()`" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Train a TensorFlow model} + %\VignetteEngine{knitr::rmarkdown} + \use_package{UTF-8} +--- + +This tutorial demonstrates how run a TensorFlow job at scale using Azure ML. You will train a TensorFlow model to classify handwritten digits (MNIST) using a deep neural network (DNN) and log your results to the Azure ML service. + +## Prerequisites +If you don’t have access to an Azure ML workspace, follow the [setup tutorial](https://azure.github.io/azureml-sdk-for-r/articles/configuration.html) to configure and create a workspace. + +## Set up development environment +The setup for your development work in this tutorial includes the following actions: + +* Import required packages +* Connect to a workspace +* Create an experiment to track your runs +* Create a remote compute target to use for training + +### Import **azuremlsdk** package +```{r eval=FALSE} +library(azuremlsdk) +``` + +### Load your workspace +Instantiate a workspace object from your existing workspace. The following code will load the workspace details from a **config.json** file if you previously wrote one out with [`write_workspace_config()`](https://azure.github.io/azureml-sdk-for-r/reference/write_workspace_config.html). +```{r load_workpace, eval=FALSE} +ws <- load_workspace_from_config() +``` + +Or, you can retrieve a workspace by directly specifying your workspace details: +```{r get_workpace, eval=FALSE} +ws <- get_workspace("", "", "") +``` + +### Create an experiment +An Azure ML **experiment** tracks a grouping of runs, typically from the same training script. 
Create an experiment to track the runs for training the TensorFlow model on the MNIST data. + +```{r create_experiment, eval=FALSE} +exp <- experiment(workspace = ws, name = "tf-mnist") +``` + +If you would like to track your runs in an existing experiment, simply specify that experiment's name to the `name` parameter of `experiment()`. + +### Create a compute target +By using Azure Machine Learning Compute (AmlCompute), a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. In this tutorial, you create a GPU-enabled cluster as your training environment. The code below creates the compute cluster for you if it doesn't already exist in your workspace. + +You may need to wait a few minutes for your compute cluster to be provisioned if it doesn't already exist. + +```{r create_cluster, eval=FALSE} +cluster_name <- "gpucluster" +compute_target <- get_compute(ws, cluster_name = cluster_name) +if (is.null(compute_target)) +{ + vm_size <- "STANDARD_NC6" + compute_target <- create_aml_compute(workspace = ws, + cluster_name = cluster_name, + vm_size = vm_size, + max_nodes = 4) + + wait_for_provisioning_completion(compute_target, show_output = TRUE) +} +``` + +## Prepare the training script + +A training script called `tf_mnist.R` has been provided for you in the "project_files" directory of this tutorial. The Azure ML SDK provides a set of logging APIs for logging various metrics during training runs. These metrics are recorded and persisted in the experiment run record, and can be be accessed at any time or viewed in the run details page in [Azure Machine Learning studio](http://ml.azure.com/). + +In order to collect and upload run metrics, you need to do the following **inside the training script**: + +* Import the **azuremlsdk** package +``` +library(azuremlsdk) +``` + +* Add the [`log_metric_to_run()`](https://azure.github.io/azureml-sdk-for-r/reference/log_metric_to_run.html) function to track our primary metric, "accuracy", for this experiment. If you have your own training script with several important metrics, simply create a logging call for each one within the script. +``` +log_metric_to_run("accuracy", + sess$run(accuracy, + feed_dict = dict(x = mnist$test$images, y_ = mnist$test$labels))) +``` + +See the [reference](https://azure.github.io/azureml-sdk-for-r/reference/index.html#section-training-experimentation) for the full set of logging methods `log_*()` available from the R SDK. + +## Create an estimator + +An Azure ML **estimator** encapsulates the run configuration information needed for executing a training script on the compute target. Azure ML runs are run as containerized jobs on the specified compute target. By default, the Docker image built for your training job will include R, the Azure ML SDK, and a set of commonly used R packages. See the full list of default packages included [here](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html). + +To create the estimator, define the following: + +* The directory that contains your scripts needed for training (`source_directory`). All the files in this directory are uploaded to the cluster node(s) for execution. The directory must contain your training script and any additional scripts required. +* The training script that will be executed (`entry_script`). +* The compute target (`compute_target`), in this case the AmlCompute cluster you created earlier. +* Any environment dependencies required for training. 
Since the training script requires the TensorFlow package, which is not included in the image by default, pass the package name to the `cran_packages` parameter to have it installed in the Docker container where the job will run. See the [`estimator()`](https://azure.github.io/azureml-sdk-for-r/reference/estimator.html) reference for the full set of configurable options. +* Set the `use_gpu = TRUE` flag so the default base GPU Docker image will be built, since the job will be run on a GPU cluster. + +```{r create_estimator, eval=FALSE} +est <- estimator(source_directory = "project_files", + entry_script = "tf_mnist.R", + compute_target = compute_target, + cran_packages = c("tensorflow"), + use_gpu = TRUE) +``` + +## Submit the job + +Finally submit the job to run on your cluster. [`submit_experiment()`](https://azure.github.io/azureml-sdk-for-r/reference/submit_experiment.html) returns a `Run` object that you can then use to interface with the run. + +```{r submit_job, eval=FALSE} +run <- submit_experiment(exp, est) +``` + +You can view the run’s details as a table. Clicking the “Web View” link provided will bring you to Azure Machine Learning studio, where you can monitor the run in the UI. + +```{r eval=FALSE} +view_run_details(run) +``` + +Model training happens in the background. Wait until the model has finished training before you run more code. + +```{r eval=FALSE} +wait_for_run_completion(run, show_output = TRUE) +``` + +## View run metrics +Once your job has finished, you can view the metrics collected during your TensorFlow run. + +```{r get_metrics, eval=FALSE} +metrics <- get_run_metrics(run) +metrics +``` + +## Clean up resources +Delete the resources once you no longer need them. Don't delete any resource you plan to still use. + +Delete the compute cluster: +```{r delete_compute, eval=FALSE} +delete_compute(compute_target) +``` \ No newline at end of file diff --git a/how-to-use-azureml/deployment/deploy-multi-model/multi-model-register-and-deploy.ipynb b/how-to-use-azureml/deployment/deploy-multi-model/multi-model-register-and-deploy.ipynb index 2cafca10a..1fbc55118 100644 --- a/how-to-use-azureml/deployment/deploy-multi-model/multi-model-register-and-deploy.ipynb +++ b/how-to-use-azureml/deployment/deploy-multi-model/multi-model-register-and-deploy.ipynb @@ -195,7 +195,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment. Only Environments that were created using azureml-defaults version 1.0.48 or later will work with this new handling however.\n", + "You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment. Please note that your environment must include azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service.\n", "\n", "More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)." 
] @@ -221,23 +221,30 @@ "## Create Inference Configuration\n", "\n", "There is now support for a source directory, you can upload an entire folder from your local machine as dependencies for the Webservice.\n", - "Note: in that case, your entry_script, conda_file, and extra_docker_file_steps paths are relative paths to the source_directory path.\n", + "Note: in that case, environments's entry_script and file_path are relative paths to the source_directory path; myenv.docker.base_dockerfile is a string containing extra docker steps or contents of the docker file.\n", "\n", "Sample code for using a source directory:\n", "\n", "```python\n", + "from azureml.core.environment import Environment\n", + "from azureml.core.model import InferenceConfig\n", + "\n", + "myenv = Environment.from_conda_specification(name='myenv', file_path='env/myenv.yml')\n", + "\n", + "# explicitly set base_image to None when setting base_dockerfile\n", + "myenv.docker.base_image = None\n", + "# add extra docker commends to execute\n", + "myenv.docker.base_dockerfile = \"FROM ubuntu\\n RUN echo \\\"hello\\\"\"\n", + "\n", "inference_config = InferenceConfig(source_directory=\"C:/abc\",\n", - " runtime= \"python\", \n", " entry_script=\"x/y/score.py\",\n", - " conda_file=\"env/myenv.yml\", \n", - " extra_docker_file_steps=\"helloworld.txt\")\n", + " environment=myenv)\n", "```\n", "\n", - " - source_directory = holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n", - " - runtime = Which runtime to use for the image. Current supported runtimes are 'spark-py' and 'python\n", - " - entry_script = contains logic specific to initializing your model and running predictions\n", - " - conda_file = manages conda and python package dependencies.\n", - " - extra_docker_file_steps = optional: any extra steps you want to inject into docker file" + " - file_path: input parameter to Environment constructor. 
Manages conda and python package dependencies.\n", + " - env.docker.base_dockerfile: any extra steps you want to inject into docker file\n", + " - source_directory: holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n", + " - entry_script: contains logic specific to initializing your model and running predictions" ] }, { @@ -278,7 +285,7 @@ "from azureml.exceptions import WebserviceException\n", "\n", "deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n", - "aci_service_name = 'aciservice1'\n", + "aci_service_name = 'aciservice-multimodel'\n", "\n", "try:\n", " # if you want to get existing service below is the command\n", diff --git a/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb b/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb index e18da596f..6c81ead93 100644 --- a/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb +++ b/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.ipynb @@ -20,7 +20,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Register model and deploy as webservice\n", + "# Register model and deploy as webservice in ACI\n", "\n", "Following this notebook, you will:\n", "\n", @@ -45,6 +45,7 @@ "source": [ "import azureml.core\n", "\n", + "\n", "# Check core SDK version number.\n", "print('SDK version:', azureml.core.VERSION)" ] @@ -70,6 +71,7 @@ "source": [ "from azureml.core import Workspace\n", "\n", + "\n", "ws = Workspace.from_config()\n", "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" ] @@ -91,6 +93,7 @@ "source": [ "from azureml.core import Dataset\n", "\n", + "\n", "datastore = ws.get_default_datastore()\n", "datastore.upload_files(files=['./features.csv', './labels.csv'],\n", " target_path='sklearn_regression/',\n", @@ -125,6 +128,7 @@ "from azureml.core import Model\n", "from azureml.core.resource_configuration import ResourceConfiguration\n", "\n", + "\n", "model = Model.register(workspace=ws,\n", " model_name='my-sklearn-model', # Name of the registered model in your workspace.\n", " model_path='./sklearn_regression_model.pkl', # Local file to upload and register as a model.\n", @@ -159,6 +163,8 @@ "\n", "The Azure Machine Learning service provides a default environment for supported model frameworks, including scikit-learn, based on the metadata you provided when registering your model. This is the easiest way to deploy your model.\n", "\n", + "Even when you deploy your model to ACI with a default environment you can still customize the deploy configuration (i.e. the number of cores and amount of memory made available for the deployment) using the [AciWebservice.deploy_configuration()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.webservice.aci.aciwebservice#deploy-configuration-cpu-cores-none--memory-gb-none--tags-none--properties-none--description-none--location-none--auth-enabled-none--ssl-enabled-none--enable-app-insights-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--ssl-cname-none--dns-name-label-none--). Look at the \"Use a custom environment\" section of this notebook for more information on deploy configuration.\n", + "\n", "**Note**: This step can take several minutes." 
] }, @@ -171,6 +177,7 @@ "from azureml.core import Webservice\n", "from azureml.exceptions import WebserviceException\n", "\n", + "\n", "service_name = 'my-sklearn-service'\n", "\n", "# Remove any existing service under the same name.\n", @@ -198,6 +205,7 @@ "source": [ "import json\n", "\n", + "\n", "input_payload = json.dumps({\n", " 'data': [\n", " [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n", @@ -231,9 +239,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Use a custom environment (for all models)\n", + "### Use a custom environment\n", "\n", - "If you want more control over how your model is run, if it uses another framework, or if it has special runtime requirements, you can instead specify your own environment and scoring method.\n", + "If you want more control over how your model is run, if it uses another framework, or if it has special runtime requirements, you can instead specify your own environment and scoring method. Custom environments can be used for any model you want to deploy.\n", "\n", "Specify the model's runtime environment by creating an [Environment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment%28class%29?view=azure-ml-py) object and providing the [CondaDependencies](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.conda_dependencies.condadependencies?view=azure-ml-py) needed by your model." ] @@ -247,6 +255,7 @@ "from azureml.core import Environment\n", "from azureml.core.conda_dependencies import CondaDependencies\n", "\n", + "\n", "environment = Environment('my-sklearn-environment')\n", "environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n", " 'azureml-defaults',\n", @@ -278,7 +287,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Deploy your model in the custom environment by providing an [InferenceConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py) object to [Model.deploy()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#deploy-workspace--name--models--inference-config--deployment-config-none--deployment-target-none-).\n", + "Deploy your model in the custom environment by providing an [InferenceConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.inferenceconfig?view=azure-ml-py) object to [Model.deploy()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.model.model?view=azure-ml-py#deploy-workspace--name--models--inference-config--deployment-config-none--deployment-target-none-). In this case we are also using the [AciWebservice.deploy_configuration()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.webservice.aci.aciwebservice#deploy-configuration-cpu-cores-none--memory-gb-none--tags-none--properties-none--description-none--location-none--auth-enabled-none--ssl-enabled-none--enable-app-insights-none--ssl-cert-pem-file-none--ssl-key-pem-file-none--ssl-cname-none--dns-name-label-none--) method to generate a custom deploy configuration.\n", "\n", "**Note**: This step can take several minutes." 
] @@ -288,15 +297,18 @@ "execution_count": null, "metadata": { "tags": [ - "azuremlexception-remarks-sample" + "azuremlexception-remarks-sample", + "sample-aciwebservice-deploy-config" ] }, "outputs": [], "source": [ "from azureml.core import Webservice\n", "from azureml.core.model import InferenceConfig\n", + "from azureml.core.webservice import AciWebservice\n", "from azureml.exceptions import WebserviceException\n", "\n", + "\n", "service_name = 'my-custom-env-service'\n", "\n", "# Remove any existing service under the same name.\n", @@ -305,11 +317,14 @@ "except WebserviceException:\n", " pass\n", "\n", - "inference_config = InferenceConfig(entry_script='score.py',\n", - " source_directory='.',\n", - " environment=environment)\n", + "inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n", + "aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n", "\n", - "service = Model.deploy(ws, service_name, [model], inference_config)\n", + "service = Model.deploy(workspace=ws,\n", + " name=service_name,\n", + " models=[model],\n", + " inference_config=inference_config,\n", + " deployment_config=aci_config)\n", "service.wait_for_deployment(show_output=True)" ] }, @@ -326,8 +341,6 @@ "metadata": {}, "outputs": [], "source": [ - "import json\n", - "\n", "input_payload = json.dumps({\n", " 'data': [\n", " [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n", @@ -360,16 +373,118 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Model profiling\n", + "### Model Profiling\n", "\n", - "You can also take advantage of the profiling feature to estimate CPU and memory requirements for models.\n", + "Profile your model to understand how much CPU and memory the service, created as a result of its deployment, will need. Profiling returns information such as CPU usage, memory usage, and response latency. It also provides a CPU and memory recommendation based on the resource usage. You can profile your model (or more precisely the service built based on your model) on any CPU and/or memory combination where 0.1 <= CPU <= 3.5 and 0.1GB <= memory <= 15GB. If you do not provide a CPU and/or memory requirement, we will test it on the default configuration of 3.5 CPU and 15GB memory.\n", "\n", - "```python\n", - "profile = Model.profile(ws, \"profilename\", [model], inference_config, test_sample)\n", - "profile.wait_for_profiling(True)\n", - "profiling_results = profile.get_results()\n", - "print(profiling_results)\n", - "```" + "In order to profile your model you will need:\n", + "- a registered model\n", + "- an entry script\n", + "- an inference configuration\n", + "- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n", + "\n", + "At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n", + "\n", + "Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You may want to register datasets using the register() method to your workspace so they can be shared with others, reused and referred to by name in your script.\n", + "You can try get the dataset first to see if it's already registered." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Datastore\n", + "from azureml.core.dataset import Dataset\n", + "from azureml.data import dataset_type_definitions\n", + "\n", + "dataset_name='diabetes_sample_request_data'\n", + "\n", + "dataset_registered = False\n", + "try:\n", + " sample_request_data = Dataset.get_by_name(workspace = ws, name = dataset_name)\n", + " dataset_registered = True\n", + "except:\n", + " print(\"The dataset {} is not registered in workspace yet.\".format(dataset_name))\n", + "\n", + "if not dataset_registered:\n", + " # create a string that can be utf-8 encoded and\n", + " # put in the body of the request\n", + " serialized_input_json = json.dumps({\n", + " 'data': [\n", + " [ 0.03807591, 0.05068012, 0.06169621, 0.02187235, -0.0442235,\n", + " -0.03482076, -0.04340085, -0.00259226, 0.01990842, -0.01764613]\n", + " ]\n", + " })\n", + " dataset_content = []\n", + " for i in range(100):\n", + " dataset_content.append(serialized_input_json)\n", + " dataset_content = '\\n'.join(dataset_content)\n", + " file_name = \"{}.txt\".format(dataset_name)\n", + " f = open(file_name, 'w')\n", + " f.write(dataset_content)\n", + " f.close()\n", + "\n", + " # upload the txt file created above to the Datastore and create a dataset from it\n", + " data_store = Datastore.get_default(ws)\n", + " data_store.upload_files(['./' + file_name], target_path='sample_request_data')\n", + " datastore_path = [(data_store, 'sample_request_data' +'/' + file_name)]\n", + " sample_request_data = Dataset.Tabular.from_delimited_files(\n", + " datastore_path,\n", + " separator='\\n',\n", + " infer_column_types=True,\n", + " header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n", + " sample_request_data = sample_request_data.register(workspace=ws,\n", + " name=dataset_name,\n", + " create_new_version=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have an input dataset we are ready to go ahead with profiling. In this case we are testing the previously introduced sklearn regression model on 1 CPU and 0.5 GB memory. The memory usage and recommendation presented in the result is measured in Gigabytes. The CPU usage and recommendation is measured in CPU cores." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime\n", + "\n", + "\n", + "environment = Environment('my-sklearn-environment')\n", + "environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n", + " 'azureml-defaults',\n", + " 'inference-schema[numpy-support]',\n", + " 'joblib',\n", + " 'numpy',\n", + " 'scikit-learn'\n", + "])\n", + "inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n", + "# if cpu and memory_in_gb parameters are not provided\n", + "# the model will be profiled on default configuration of\n", + "# 3.5CPU and 15GB memory\n", + "profile = Model.profile(ws,\n", + " 'rgrsn-%s' % datetime.now().strftime('%m%d%Y-%H%M%S'),\n", + " [model],\n", + " inference_config,\n", + " input_dataset=sample_request_data,\n", + " cpu=1.0,\n", + " memory_in_gb=0.5)\n", + "\n", + "profile.wait_for_completion(True)\n", + "details = profile.get_details()" ] }, { @@ -405,7 +520,7 @@ "\n", " - To run a production-ready web service, see the [notebook on deployment to Azure Kubernetes Service](../production-deploy-to-aks/production-deploy-to-aks.ipynb).\n", " - To run a local web service, see the [notebook on deployment to a local Docker container](../deploy-to-local/register-model-deploy-local.ipynb).\n", - " - For more information on datasets, see the [notebook on training with datasets](../../work-with-data/datasets-tutorial/train-with-datasets.ipynb).\n", + " - For more information on datasets, see the [notebook on training with datasets](../../work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.ipynb).\n", " - For more information on environments, see the [notebook on using environments](../../training/using-environments/using-environments.ipynb).\n", " - For information on all the available deployment targets, see [“How and where to deploy models”](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#choose-a-compute-target)." ] @@ -414,7 +529,7 @@ "metadata": { "authors": [ { - "name": "aashishb" + "name": "vaidyas" } ], "category": "deployment", diff --git a/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.ipynb b/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.ipynb index 8adcce0c1..b0374f64f 100644 --- a/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.ipynb +++ b/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local-advanced.ipynb @@ -189,6 +189,15 @@ " return error" ] }, + { + "cell_type": "markdown", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency for your environemnt. This package contains the functionality needed to host the model as a web service." 
+ ] + }, { "cell_type": "code", "execution_count": null, @@ -206,16 +215,6 @@ " - inference-schema[numpy-support]" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile C:/abc/dockerstep/customDockerStep.txt\n", - "RUN echo \"this is test\"" - ] - }, { "cell_type": "code", "execution_count": null, @@ -240,11 +239,10 @@ "source": [ "## Create Inference Configuration\n", "\n", - " - source_directory = holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n", - " - runtime = Which runtime to use for the image. Current supported runtimes are 'spark-py' and 'python\n", - " - entry_script = contains logic specific to initializing your model and running predictions\n", - " - conda_file = manages conda and python package dependencies.\n", - " - extra_docker_file_steps = optional: any extra steps you want to inject into docker file" + " - file_path: input parameter to Environment constructor. Manages conda and python package dependencies.\n", + " - env.docker.base_dockerfile: any extra steps you want to inject into docker file\n", + " - source_directory: holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n", + " - entry_script: contains logic specific to initializing your model and running predictions" ] }, { @@ -253,13 +251,19 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core.environment import Environment\n", "from azureml.core.model import InferenceConfig\n", "\n", + "\n", + "myenv = Environment.from_conda_specification(name='myenv', file_path='env/myenv.yml')\n", + "\n", + "# explicitly set base_image to None when setting base_dockerfile\n", + "myenv.docker.base_image = None\n", + "myenv.docker.base_dockerfile = \"RUN echo \\\"this is test\\\"\"\n", + "\n", "inference_config = InferenceConfig(source_directory=\"C:/abc\",\n", - " runtime=\"python\", \n", " entry_script=\"x/y/score.py\",\n", - " conda_file=\"env/myenv.yml\", \n", - " extra_docker_file_steps=\"dockerstep/customDockerStep.txt\")" + " environment=myenv)\n" ] }, { diff --git a/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb b/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb index 6b9d519c4..0b7660a21 100644 --- a/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb +++ b/how-to-use-azureml/deployment/deploy-to-local/register-model-deploy-local.ipynb @@ -145,6 +145,110 @@ " environment=environment)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Model Profiling\n", + "\n", + "Profile your model to understand how much CPU and memory the service, created as a result of its deployment, will need. Profiling returns information such as CPU usage, memory usage, and response latency. It also provides a CPU and memory recommendation based on the resource usage. You can profile your model (or more precisely the service built based on your model) on any CPU and/or memory combination where 0.1 <= CPU <= 3.5 and 0.1GB <= memory <= 15GB. 
If you do not provide a CPU and/or memory requirement, we will test it on the default configuration of 3.5 CPU and 15GB memory.\n", + "\n", + "In order to profile your model you will need:\n", + "- a registered model\n", + "- an entry script\n", + "- an inference configuration\n", + "- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n", + "\n", + "At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n", + "\n", + "Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from azureml.core import Datastore\n", + "from azureml.core.dataset import Dataset\n", + "from azureml.data import dataset_type_definitions\n", + "\n", + "\n", + "# create a string that can be put in the body of the request\n", + "serialized_input_json = json.dumps({\n", + " 'data': [\n", + " [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n", + " [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]\n", + " ]\n", + "})\n", + "dataset_content = []\n", + "for i in range(100):\n", + " dataset_content.append(serialized_input_json)\n", + "dataset_content = '\\n'.join(dataset_content)\n", + "file_name = 'sample_request_data_diabetes.txt'\n", + "f = open(file_name, 'w')\n", + "f.write(dataset_content)\n", + "f.close()\n", + "\n", + "# upload the txt file created above to the Datastore and create a dataset from it\n", + "data_store = Datastore.get_default(ws)\n", + "data_store.upload_files(['./' + file_name], target_path='sample_request_data_diabetes')\n", + "datastore_path = [(data_store, 'sample_request_data_diabetes' +'/' + file_name)]\n", + "sample_request_data_diabetes = Dataset.Tabular.from_delimited_files(\n", + " datastore_path,\n", + " separator='\\n',\n", + " infer_column_types=True,\n", + " header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n", + "sample_request_data_diabetes = sample_request_data_diabetes.register(workspace=ws,\n", + " name='sample_request_data_diabetes',\n", + " create_new_version=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have an input dataset we are ready to go ahead with profiling. In this case we are testing the previously introduced sklearn regression model on 1 CPU and 0.5 GB memory. The memory usage and recommendation presented in the result is measured in Gigabytes. The CPU usage and recommendation is measured in CPU cores." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime\n", + "from azureml.core import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "from azureml.core.model import Model, InferenceConfig\n", + "\n", + "\n", + "environment = Environment('my-sklearn-environment')\n", + "environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n", + " 'azureml-defaults',\n", + " 'inference-schema[numpy-support]',\n", + " 'joblib',\n", + " 'numpy',\n", + " 'scikit-learn'\n", + "])\n", + "inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n", + "# if cpu and memory_in_gb parameters are not provided\n", + "# the model will be profiled on default configuration of\n", + "# 3.5CPU and 15GB memory\n", + "profile = Model.profile(ws,\n", + " 'profile-%s' % datetime.now().strftime('%m%d%Y-%H%M%S'),\n", + " [model],\n", + " inference_config,\n", + " input_dataset=sample_request_data_diabetes,\n", + " cpu=1.0,\n", + " memory_in_gb=0.5)\n", + "\n", + "profile.wait_for_completion(True)\n", + "details = profile.get_details()" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb b/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb index 66b827a83..91479a4cb 100644 --- a/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb +++ b/how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb @@ -158,7 +158,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 5. *Create myenv.yml file*" + "## 5. *Create myenv.yml file*\n", + "Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service." 
] }, { @@ -169,7 +170,8 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies \n", "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", + "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'],\n", + " pip_packages=['azureml-defaults'])\n", "\n", "with open(\"myenv.yml\",\"w\") as f:\n", " f.write(myenv.serialize_to_string())" @@ -189,10 +191,11 @@ "outputs": [], "source": [ "from azureml.core.model import InferenceConfig\n", + "from azureml.core.environment import Environment\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " conda_file=\"myenv.yml\")" + "\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)" ] }, { diff --git a/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb b/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb index ee50aaa34..9d5b89be0 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb @@ -244,7 +244,7 @@ "metadata": {}, "source": [ "### Setting up inference configuration\n", - "First we create a YAML file that specifies which dependencies we would like to see in our container." + "First we create a YAML file that specifies which dependencies we would like to see in our container. Please note that you must include azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service." ] }, { @@ -255,7 +255,7 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies \n", "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime==0.4.0\",\"azureml-core\"])\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime==0.4.0\", \"azureml-core\", \"azureml-defaults\"])\n", "\n", "with open(\"myenv.yml\",\"w\") as f:\n", " f.write(myenv.serialize_to_string())" @@ -275,11 +275,11 @@ "outputs": [], "source": [ "from azureml.core.model import InferenceConfig\n", + "from azureml.core.environment import Environment\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " conda_file=\"myenv.yml\",\n", - " extra_docker_file_steps = \"Dockerfile\")" + "\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)" ] }, { @@ -373,7 +373,7 @@ "metadata": {}, "outputs": [], "source": [ - "#aci_service.delete()" + "aci_service.delete()" ] } ], diff --git a/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb b/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb index 14e1c06de..9f2963e15 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-inference-facial-expression-recognition-deploy.ipynb @@ -319,7 +319,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Write Environment File" + "### Write Environment File\n", + "Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as 
a web service." ] }, { @@ -330,7 +331,8 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies \n", "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n", + "\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n", "\n", "with open(\"myenv.yml\",\"w\") as f:\n", " f.write(myenv.serialize_to_string())" @@ -350,11 +352,11 @@ "outputs": [], "source": [ "from azureml.core.model import InferenceConfig\n", + "from azureml.core.environment import Environment\n", + "\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " conda_file=\"myenv.yml\",\n", - " extra_docker_file_steps = \"Dockerfile\")" + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)" ] }, { @@ -724,7 +726,7 @@ "source": [ "# remember to delete your service after you are done using it!\n", "\n", - "# aci_service.delete()" + "aci_service.delete()" ] }, { diff --git a/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb b/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb index c6c6c19f5..b9202470f 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-inference-mnist-deploy.ipynb @@ -306,7 +306,7 @@ "source": [ "### Write Environment File\n", "\n", - "This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine." + "This step creates a YAML environment file that specifies which dependencies we would like to see in our Linux Virtual Machine. Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service." 
] }, { @@ -317,7 +317,7 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies \n", "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\"])\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n", "\n", "with open(\"myenv.yml\",\"w\") as f:\n", " f.write(myenv.serialize_to_string())" @@ -337,11 +337,11 @@ "outputs": [], "source": [ "from azureml.core.model import InferenceConfig\n", + "from azureml.core.environment import Environment\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " extra_docker_file_steps = \"Dockerfile\",\n", - " conda_file=\"myenv.yml\")" + "\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)" ] }, { @@ -733,7 +733,7 @@ "source": [ "# remember to delete your service after you are done using it!\n", "\n", - "# aci_service.delete()" + "aci_service.delete()" ] }, { diff --git a/how-to-use-azureml/deployment/onnx/onnx-model-register-and-deploy.ipynb b/how-to-use-azureml/deployment/onnx/onnx-model-register-and-deploy.ipynb index f0334e854..42fdb6d8a 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-model-register-and-deploy.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-model-register-and-deploy.ipynb @@ -202,7 +202,7 @@ "metadata": { "authors": [ { - "name": "aashishb" + "name": "vaidyas" } ], "kernelspec": { diff --git a/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb b/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb index e7f75daee..fb408032d 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb @@ -241,7 +241,8 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies \n", "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n", + "\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\", \"onnxruntime\", \"azureml-core\", \"azureml-defaults\"])\n", "\n", "with open(\"myenv.yml\",\"w\") as f:\n", " f.write(myenv.serialize_to_string())" @@ -251,7 +252,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Create the inference configuration object" + "Create the inference configuration object. Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service." 
] }, { @@ -261,11 +262,11 @@ "outputs": [], "source": [ "from azureml.core.model import InferenceConfig\n", + "from azureml.core.environment import Environment\n", + "\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " conda_file=\"myenv.yml\",\n", - " extra_docker_file_steps = \"Dockerfile\")" + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)" ] }, { @@ -361,7 +362,7 @@ "metadata": {}, "outputs": [], "source": [ - "#aci_service.delete()" + "aci_service.delete()" ] } ], diff --git a/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb b/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb index e69ba0d8e..92d8ef5ef 100644 --- a/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb +++ b/how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb @@ -405,7 +405,7 @@ "metadata": {}, "source": [ "### Create inference configuration\n", - "First we create a YAML file that specifies which dependencies we would like to see in our container." + "First we create a YAML file that specifies which dependencies we would like to see in our container. Please note that you must indicate azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service." ] }, { @@ -416,7 +416,7 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies \n", "\n", - "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\"])\n", + "myenv = CondaDependencies.create(pip_packages=[\"numpy\",\"onnxruntime\",\"azureml-core\", \"azureml-defaults\"])\n", "\n", "with open(\"myenv.yml\",\"w\") as f:\n", " f.write(myenv.serialize_to_string())" @@ -436,11 +436,11 @@ "outputs": [], "source": [ "from azureml.core.model import InferenceConfig\n", + "from azureml.core.environment import Environment\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " conda_file=\"myenv.yml\",\n", - " extra_docker_file_steps = \"Dockerfile\")" + "\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)" ] }, { @@ -537,7 +537,7 @@ "metadata": {}, "outputs": [], "source": [ - "#aci_service.delete()" + "aci_service.delete()" ] } ], diff --git a/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.ipynb b/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.ipynb new file mode 100644 index 000000000..aed26009b --- /dev/null +++ b/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.ipynb @@ -0,0 +1,314 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Deploying a web service to Azure Kubernetes Service (AKS)\n", + "This notebook shows the steps for deploying a service: registering a model, creating an image, provisioning a cluster (one-time action), and deploying a service to it. \n", + "We then test and delete the service, image and model." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import azureml.core\n", + "print(azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Get workspace\n", + "Load the existing workspace from the config file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.workspace import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register the model\n", + "Register an existing trained model, adding a description and tags. Prior to registering the model, you should have a TensorFlow [Saved Model](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md) in the `resnet50` directory. You can download a [pretrained resnet50](http://download.tensorflow.org/models/official/20181001_resnet/savedmodels/resnet_v1_fp32_savedmodel_NCHW_jpg.tar.gz) and unpack it to that directory." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Register the model\n", + "from azureml.core.model import Model\n", + "model = Model.register(model_path = \"resnet50\", # this points to a local folder\n", + " model_name = \"resnet50\", # this is the name the model is registered as\n", + " tags = {'area': \"Image classification\", 'type': \"classification\"},\n", + " description = \"Image classification trained on Imagenet Dataset\",\n", + " workspace = ws)\n", + "\n", + "print(model.name, model.description, model.version)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Provision the AKS Cluster\n", + "This is a one-time setup. You can reuse this cluster for multiple deployments after it has been created. If you delete the cluster or the resource group that contains it, you will have to recreate it."
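The provisioning cell follows below. If a suitable AKS cluster already exists in your subscription, a hedged alternative is to attach it to the workspace instead of creating a new one; the resource group and cluster names here are placeholders, not values from the notebook:

```python
from azureml.core.compute import AksCompute, ComputeTarget

# Attach an existing AKS cluster rather than provisioning a new one.
attach_config = AksCompute.attach_configuration(resource_group="my-resource-group",
                                                cluster_name="my-existing-aks")
gpu_cluster = ComputeTarget.attach(ws, "aks-gpu-cluster", attach_config)
gpu_cluster.wait_for_completion(show_output=True)
```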
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AksCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# Choose a name for your GPU cluster\n", + "gpu_cluster_name = \"aks-gpu-cluster\"\n", + "\n", + "# Verify that cluster does not exist already\n", + "try:\n", + " gpu_cluster = ComputeTarget(workspace=ws, name=gpu_cluster_name)\n", + " print(\"Found existing gpu cluster\")\n", + "except ComputeTargetException:\n", + " print(\"Creating new gpu-cluster\")\n", + " \n", + " # Specify the configuration for the new cluster\n", + " compute_config = AksCompute.provisioning_configuration(cluster_purpose=AksCompute.ClusterPurpose.DEV_TEST,\n", + " agent_count=1,\n", + " vm_size=\"Standard_NV6\")\n", + " # Create the cluster with the specified name and configuration\n", + " gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, compute_config)\n", + "\n", + " # Wait for the cluster to complete, show the output log\n", + " gpu_cluster.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Deploy the model as a web service to AKS\n", + "\n", + "First create a scoring script" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%writefile score.py\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "import json\n", + "import os\n", + "from azureml.contrib.services.aml_request import AMLRequest, rawhttp\n", + "from azureml.contrib.services.aml_response import AMLResponse\n", + "\n", + "def init():\n", + " global session\n", + " global input_name\n", + " global output_name\n", + " \n", + " session = tf.Session()\n", + "\n", + " # AZUREML_MODEL_DIR is an environment variable created during deployment.\n", + " # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n", + " # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n", + " model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'resnet50')\n", + " model = tf.saved_model.loader.load(session, ['serve'], model_path)\n", + " if len(model.signature_def['serving_default'].inputs) > 1:\n", + " raise ValueError(\"This score.py only supports one input\")\n", + " input_name = [tensor.name for tensor in model.signature_def['serving_default'].inputs.values()][0]\n", + " output_name = [tensor.name for tensor in model.signature_def['serving_default'].outputs.values()]\n", + " \n", + "\n", + "@rawhttp\n", + "def run(request):\n", + " if request.method == 'POST':\n", + " reqBody = request.get_data(False)\n", + " resp = score(reqBody)\n", + " return AMLResponse(resp, 200)\n", + " if request.method == 'GET':\n", + " respBody = str.encode(\"GET is not supported\")\n", + " return AMLResponse(respBody, 405)\n", + " return AMLResponse(\"bad request\", 500)\n", + "\n", + "def score(data):\n", + " result = session.run(output_name, {input_name: [data]})\n", + " return json.dumps(result[1].tolist())\n", + "\n", + "if __name__ == \"__main__\":\n", + " init()\n", + " with open(\"test_image.jpg\", 'rb') as f:\n", + " content = f.read()\n", + " print(score(content))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now create the deployment configuration objects and deploy the model as a webservice." 
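The cell that follows uses the default `AksWebservice.deploy_configuration()`. If the service needs explicit sizing, a configuration along these lines could be used instead; the resource values are illustrative assumptions, not the notebook's settings:

```python
from azureml.core.webservice import AksWebservice

# Explicitly sized configuration for a GPU-backed service (values are examples only).
aks_config = AksWebservice.deploy_configuration(cpu_cores=1,
                                                memory_gb=4,
                                                gpu_cores=1,
                                                autoscale_enabled=False,
                                                num_replicas=1)
```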
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Set the web service configuration (using default here)\n", + "from azureml.core.model import InferenceConfig\n", + "from azureml.core.webservice import AksWebservice\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "from azureml.core.environment import Environment, DEFAULT_GPU_IMAGE\n", + "\n", + "env = Environment('deploytocloudenv')\n", + "# Please see [Azure ML Containers repository](https://github.com/Azure/AzureML-Containers#featured-tags)\n", + "# for open-sourced GPU base images.\n", + "env.docker.base_image = DEFAULT_GPU_IMAGE\n", + "env.python.conda_dependencies = CondaDependencies.create(conda_packages=['tensorflow-gpu==1.12.0','numpy'],\n", + " pip_packages=['azureml-contrib-services', 'azureml-defaults'])\n", + "\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)\n", + "aks_config = AksWebservice.deploy_configuration()\n", + "\n", + "# # Enable token auth and disable (key) auth on the webservice\n", + "# aks_config = AksWebservice.deploy_configuration(token_auth_enabled=True, auth_enabled=False)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_service_name ='gpu-rn50'\n", + "\n", + "aks_service = Model.deploy(workspace=ws,\n", + " name=aks_service_name,\n", + " models=[model],\n", + " inference_config=inference_config,\n", + " deployment_config=aks_config,\n", + " deployment_target=gpu_cluster)\n", + "\n", + "aks_service.wait_for_deployment(show_output = True)\n", + "print(aks_service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Test the web service\n", + "We test the web service by passing the test image's content."
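The request cell below stores the raw response in `resp` but does not inspect it. A small follow-up such as this (a sketch that reuses the `resp` and `aks_service` objects created below) can confirm the call succeeded, or surface the container logs when it does not:

```python
# Quick checks after posting the test image (resp and aks_service come from the cells below).
print(resp.status_code)   # expect 200 on success
print(resp.text)          # JSON-serialized scores returned by score.py

# When the call fails, the container logs usually explain why.
print(aks_service.get_logs())
```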
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "import requests\n", + "\n", + "# if (key) auth is enabled, fetch keys and include in the request\n", + "key1, key2 = aks_service.get_keys()\n", + "\n", + "headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + key1}\n", + "\n", + "# # if token auth is enabled, fetch token and include in the request\n", + "# access_token, fetch_after = aks_service.get_token()\n", + "# headers = {'Content-Type':'application/json', 'Authorization': 'Bearer ' + access_token}\n", + "\n", + "test_sample = open('snowleopardgaze.jpg', 'rb').read()\n", + "resp = requests.post(aks_service.scoring_uri, test_sample, headers=headers)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Clean up\n", + "Delete the service, image, model and compute target" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "aks_service.delete()\n", + "model.delete()\n", + "gpu_cluster.delete()\n" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "vaidyas" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.yml b/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.yml new file mode 100644 index 000000000..c2afb644b --- /dev/null +++ b/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.yml @@ -0,0 +1,5 @@ +name: production-deploy-to-aks-gpu +dependencies: +- pip: + - azureml-sdk + - tensorflow diff --git a/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/snowleopardgaze.jpg b/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/snowleopardgaze.jpg new file mode 100644 index 000000000..80450160b Binary files /dev/null and b/how-to-use-azureml/deployment/production-deploy-to-aks-gpu/snowleopardgaze.jpg differ diff --git a/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb b/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb index 16649a241..5ea43c868 100644 --- a/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb +++ b/how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb @@ -198,6 +198,124 @@ "inf_config = InferenceConfig(entry_script='score.py', environment=myenv)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Model Profiling\n", + "\n", + "Profile your model to understand how much CPU and memory the service, created as a result of its deployment, will need. Profiling returns information such as CPU usage, memory usage, and response latency. It also provides a CPU and memory recommendation based on the resource usage. You can profile your model (or more precisely the service built based on your model) on any CPU and/or memory combination where 0.1 <= CPU <= 3.5 and 0.1GB <= memory <= 15GB. 
If you do not provide a CPU and/or memory requirement, we will test it on the default configuration of 3.5 CPU and 15GB memory.\n", + "\n", + "In order to profile your model you will need:\n", + "- a registered model\n", + "- an entry script\n", + "- an inference configuration\n", + "- a single column tabular dataset, where each row contains a string representing sample request data sent to the service.\n", + "\n", + "At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.\n", + "\n", + "Below is an example of how you can construct an input dataset to profile a service which expects its incoming requests to contain serialized json. In this case we created a dataset based one hundred instances of the same request data. In real world scenarios however, we suggest that you use larger datasets with various inputs, especially if your model resource usage/behavior is input dependent." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You may want to register datasets using the register() method to your workspace so they can be shared with others, reused and referred to by name in your script.\n", + "You can try get the dataset first to see if it's already registered." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "from azureml.core import Datastore\n", + "from azureml.core.dataset import Dataset\n", + "from azureml.data import dataset_type_definitions\n", + "\n", + "dataset_name='sample_request_data'\n", + "\n", + "dataset_registered = False\n", + "try:\n", + " sample_request_data = Dataset.get_by_name(workspace = ws, name = dataset_name)\n", + " dataset_registered = True\n", + "except:\n", + " print(\"The dataset {} is not registered in workspace yet.\".format(dataset_name))\n", + "\n", + "if not dataset_registered:\n", + " input_json = {'data': [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],\n", + " [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]]}\n", + " # create a string that can be put in the body of the request\n", + " serialized_input_json = json.dumps(input_json)\n", + " dataset_content = []\n", + " for i in range(100):\n", + " dataset_content.append(serialized_input_json)\n", + " sample_request_data = '\\n'.join(dataset_content)\n", + " file_name = \"{}.txt\".format(dataset_name)\n", + " f = open(file_name, 'w')\n", + " f.write(sample_request_data)\n", + " f.close()\n", + "\n", + " # upload the txt file created above to the Datastore and create a dataset from it\n", + " data_store = Datastore.get_default(ws)\n", + " data_store.upload_files(['./' + file_name], target_path='sample_request_data')\n", + " datastore_path = [(data_store, 'sample_request_data' +'/' + file_name)]\n", + " sample_request_data = Dataset.Tabular.from_delimited_files(\n", + " datastore_path,\n", + " separator='\\n',\n", + " infer_column_types=True,\n", + " header=dataset_type_definitions.PromoteHeadersBehavior.NO_HEADERS)\n", + " sample_request_data = sample_request_data.register(workspace=ws,\n", + " name=dataset_name,\n", + " create_new_version=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have an input dataset we are ready to go ahead with profiling. 
In this case we are testing the previously introduced sklearn regression model on 1 CPU and 0.5 GB memory. The memory usage and recommendation presented in the result is measured in Gigabytes. The CPU usage and recommendation is measured in CPU cores." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from datetime import datetime\n", + "from azureml.core import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "from azureml.core.model import Model, InferenceConfig\n", + "\n", + "\n", + "environment = Environment('my-sklearn-environment')\n", + "environment.python.conda_dependencies = CondaDependencies.create(pip_packages=[\n", + " 'azureml-defaults',\n", + " 'inference-schema[numpy-support]',\n", + " 'joblib',\n", + " 'numpy',\n", + " 'scikit-learn'\n", + "])\n", + "inference_config = InferenceConfig(entry_script='score.py', environment=environment)\n", + "# if cpu and memory_in_gb parameters are not provided\n", + "# the model will be profiled on default configuration of\n", + "# 3.5CPU and 15GB memory\n", + "profile = Model.profile(ws,\n", + " 'sklearn-%s' % datetime.now().strftime('%m%d%Y-%H%M%S'),\n", + " [model],\n", + " inference_config,\n", + " input_dataset=sample_request_data,\n", + " cpu=1.0,\n", + " memory_in_gb=0.5)\n", + "\n", + "profile.wait_for_completion(True)\n", + "details = profile.get_details()" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -318,7 +436,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "sample-deploy-to-aks" + ] + }, "outputs": [], "source": [ "# Set the web service configuration (using default here)\n", @@ -331,7 +453,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "sample-deploy-to-aks" + ] + }, "outputs": [], "source": [ "%%time\n", @@ -452,7 +578,7 @@ "metadata": { "authors": [ { - "name": "aashishb" + "name": "vaidyas" } ], "kernelspec": { diff --git a/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb b/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb deleted file mode 100644 index 678152529..000000000 --- a/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb +++ /dev/null @@ -1,457 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Register Model, Create Image and Deploy Service\n", - "\n", - "This example shows how to deploy a web service in step-by-step fashion:\n", - "\n", - " 1. Register model\n", - " 2. Query versions of models and select one to deploy\n", - " 3. Create Docker image\n", - " 4. Query versions of images\n", - " 5. 
Deploy the image as web service\n", - " \n", - "**IMPORTANT**:\n", - " * This notebook requires you to first complete [train-within-notebook](../../training/train-within-notebook/train-within-notebook.ipynb) example\n", - " \n", - "The train-within-notebook example taught you how to deploy a web service directly from model in one step. This Notebook shows a more advanced approach that gives you more control over model versions and Docker image versions. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Check core SDK version number\n", - "import azureml.core\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Initialize Workspace\n", - "\n", - "Initialize a workspace object from persisted configuration." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create workspace" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can add tags and descriptions to your models. Note you need to have a `sklearn_linreg_model.pkl` file in the current directory. This file is generated by the 01 notebook. The below call registers that file as a model with the same name `sklearn_linreg_model.pkl` in the workspace.\n", - "\n", - "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from file" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.model import Model\n", - "import sklearn\n", - "\n", - "library_version = \"sklearn\"+sklearn.__version__.replace(\".\",\"x\")\n", - "\n", - "model = Model.register(model_path = \"sklearn_regression_model.pkl\",\n", - " model_name = \"sklearn_regression_model.pkl\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\", 'version': library_version},\n", - " description = \"Ridge regression model to predict diabetes\",\n", - " workspace = ws)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can explore the registered models within your workspace and query by tag. Models are versioned. If you call the register_model command many times with same model name, you will get multiple versions of the model with increasing version numbers." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "register model from file" - ] - }, - "outputs": [], - "source": [ - "regression_models = Model.list(workspace=ws, tags=['area'])\n", - "for m in regression_models:\n", - " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can pick a specific model to deploy" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(model.name, model.description, model.version, sep = '\\t')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create Docker Image" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Show `score.py`. Note that the `sklearn_regression_model.pkl` in the `get_model_path` call is referring to a model named `sklearn_linreg_model.pkl` registered under the workspace. It is NOT referenceing the local file." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import os\n", - "import pickle\n", - "import json\n", - "import numpy\n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import Ridge\n", - "\n", - "def init():\n", - " global model\n", - " # AZUREML_MODEL_DIR is an environment variable created during deployment.\n", - " # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n", - " # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n", - " model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_regression_model.pkl')\n", - " # deserialize the model file back into a sklearn model\n", - " model = joblib.load(model_path)\n", - "\n", - "# note you can pass in multiple rows for scoring\n", - "def run(raw_data):\n", - " try:\n", - " data = json.loads(raw_data)['data']\n", - " data = numpy.array(data)\n", - " result = model.predict(data)\n", - " # you can return any datatype as long as it is JSON-serializable\n", - " return result.tolist()\n", - " except Exception as e:\n", - " error = str(e)\n", - " return error" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", - "\n", - "myenv = CondaDependencies.create(conda_packages=['numpy','scikit-learn'])\n", - "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Note that following command can take few minutes. \n", - "\n", - "You can add tags and descriptions to images. Also, an image can contain multiple models." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create image", - "sample-image-create" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.image import Image, ContainerImage\n", - "\n", - "image_config = ContainerImage.image_configuration(runtime= \"python\",\n", - " execution_script=\"score.py\",\n", - " conda_file=\"myenv.yml\",\n", - " tags = {'area': \"diabetes\", 'type': \"regression\"},\n", - " description = \"Image with ridge regression model\")\n", - "\n", - "image = Image.create(name = \"myimage1\",\n", - " # this is the model object. 
note you can pass in 0-n models via this list-type parameter\n", - " # in case you need to reference multiple models, or none at all, in your scoring script.\n", - " models = [model],\n", - " image_config = image_config, \n", - " workspace = ws)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create image" - ] - }, - "outputs": [], - "source": [ - "image.wait_for_creation(show_output = True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Use a custom Docker image\n", - "\n", - "You can also specify a custom Docker image to be used as base image if you don't want to use the default base image provided by Azure ML. Please make sure the custom Docker image has Ubuntu >= 16.04, Conda >= 4.5.\\* and Python(3.5.\\* or 3.6.\\*).\n", - "\n", - "Only Supported for `ContainerImage`(from azureml.core.image) with `python` runtime.\n", - "```python\n", - "# use an image available in public Container Registry without authentication\n", - "image_config.base_image = \"mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda\"\n", - "\n", - "# or, use an image available in a private Container Registry\n", - "image_config.base_image = \"myregistry.azurecr.io/mycustomimage:1.0\"\n", - "image_config.base_image_registry.address = \"myregistry.azurecr.io\"\n", - "image_config.base_image_registry.username = \"username\"\n", - "image_config.base_image_registry.password = \"password\"\n", - "\n", - "# or, use an image built during training.\n", - "image_config.base_image = run.properties[\"AzureML.DerivedImageName\"]\n", - "```\n", - "You can get the address of training image from the properties of a Run object. Only new runs submitted with azureml-sdk>=1.0.22 to AMLCompute targets will have the 'AzureML.DerivedImageName' property. Instructions on how to get a Run can be found in [manage-runs](../../training/manage-runs/manage-runs.ipynb). \n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "List images by tag and find out the detailed build log for debugging." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "create image" - ] - }, - "outputs": [], - "source": [ - "for i in Image.list(workspace = ws,tags = [\"area\"]):\n", - " print('{}(v.{} [{}]) stored at {} with build log {}'.format(i.name, i.version, i.creation_state, i.image_location, i.image_build_log_uri))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy image as web service on Azure Container Instance\n", - "\n", - "Note that the service creation can take few minutes." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci", - "sample-aciwebservice-deploy-config" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, \n", - " memory_gb = 1, \n", - " tags = {'area': \"diabetes\", 'type': \"regression\"}, \n", - " description = 'Predict diabetes using regression model')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci", - "sample-aciwebservice-deploy-from-image" - ] - }, - "outputs": [], - "source": [ - "from azureml.core.webservice import Webservice\n", - "\n", - "aci_service_name = 'my-aci-service-2'\n", - "print(aci_service_name)\n", - "aci_service = Webservice.deploy_from_image(deployment_config = aciconfig,\n", - " image = image,\n", - " name = aci_service_name,\n", - " workspace = ws)\n", - "aci_service.wait_for_deployment(True)\n", - "print(aci_service.state)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Test web service" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Call the web service with some dummy input data to get a prediction." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "import json\n", - "\n", - "test_sample = json.dumps({'data': [\n", - " [1,2,3,4,5,6,7,8,9,10], \n", - " [10,9,8,7,6,5,4,3,2,1]\n", - "]})\n", - "test_sample = bytes(test_sample,encoding = 'utf8')\n", - "\n", - "prediction = aci_service.run(input_data=test_sample)\n", - "print(prediction)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Delete ACI to clean up" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "deploy service", - "aci" - ] - }, - "outputs": [], - "source": [ - "aci_service.delete()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "aashishb" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.6" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.yml b/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.yml deleted file mode 100644 index 509d9b40c..000000000 --- a/how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.yml +++ /dev/null @@ -1,8 +0,0 @@ -name: register-model-create-image-deploy-service -dependencies: -- pip: - - azureml-sdk - - matplotlib - - tqdm - - scipy - - sklearn diff --git a/how-to-use-azureml/deployment/register-model-create-image-deploy-service/sklearn_regression_model.pkl b/how-to-use-azureml/deployment/register-model-create-image-deploy-service/sklearn_regression_model.pkl deleted file mode 100644 index d10309b6c..000000000 Binary files a/how-to-use-azureml/deployment/register-model-create-image-deploy-service/sklearn_regression_model.pkl and /dev/null differ 
diff --git a/how-to-use-azureml/deployment/spark/iris.model/data/_SUCCESS b/how-to-use-azureml/deployment/spark/iris.model/data/_SUCCESS new file mode 100644 index 000000000..e69de29bb diff --git a/how-to-use-azureml/deployment/spark/iris.model/data/part-00000-dabcf097-2b45-4b28-bbca-6c17889ddcbf-c000.snappy.parquet b/how-to-use-azureml/deployment/spark/iris.model/data/part-00000-dabcf097-2b45-4b28-bbca-6c17889ddcbf-c000.snappy.parquet new file mode 100644 index 000000000..8f17afc95 Binary files /dev/null and b/how-to-use-azureml/deployment/spark/iris.model/data/part-00000-dabcf097-2b45-4b28-bbca-6c17889ddcbf-c000.snappy.parquet differ diff --git a/how-to-use-azureml/deployment/spark/iris.model/metadata/_SUCCESS b/how-to-use-azureml/deployment/spark/iris.model/metadata/_SUCCESS new file mode 100644 index 000000000..e69de29bb diff --git a/how-to-use-azureml/deployment/spark/iris.model/metadata/part-00000 b/how-to-use-azureml/deployment/spark/iris.model/metadata/part-00000 new file mode 100644 index 000000000..312ecb55e --- /dev/null +++ b/how-to-use-azureml/deployment/spark/iris.model/metadata/part-00000 @@ -0,0 +1 @@ +{"class":"org.apache.spark.ml.classification.LogisticRegressionModel","timestamp":1570147252329,"sparkVersion":"2.4.0","uid":"LogisticRegression_5df3978caaf3","paramMap":{"regParam":0.01},"defaultParamMap":{"aggregationDepth":2,"threshold":0.5,"rawPredictionCol":"rawPrediction","featuresCol":"features","labelCol":"label","predictionCol":"prediction","family":"auto","regParam":0.0,"tol":1.0E-6,"probabilityCol":"probability","standardization":true,"elasticNetParam":0.0,"maxIter":100,"fitIntercept":true}} diff --git a/how-to-use-azureml/deployment/spark/model-register-and-deploy-spark.ipynb b/how-to-use-azureml/deployment/spark/model-register-and-deploy-spark.ipynb new file mode 100644 index 000000000..a254f8982 --- /dev/null +++ b/how-to-use-azureml/deployment/spark/model-register-and-deploy-spark.ipynb @@ -0,0 +1,343 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/deployment/deploy-to-cloud/model-register-and-deploy.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register Spark Model and deploy as Webservice\n", + "\n", + "This example shows how to deploy a Webservice in step-by-step fashion:\n", + "\n", + " 1. Register Spark Model\n", + " 2. Deploy Spark Model as Webservice" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the [configuration](../../../configuration.ipynb) Notebook first if you haven't." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Initialize Workspace\n", + "\n", + "Initialize a workspace object from persisted configuration." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create workspace" + ] + }, + "outputs": [], + "source": [ + "from azureml.core import Workspace\n", + "\n", + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can add tags and descriptions to your Models. Note you need to have a `iris.model` file in the current directory. This model file is generated using [train in spark](../training/train-in-spark/train-in-spark.ipynb) notebook. The below call registers that file as a Model with the same name `iris.model` in the workspace.\n", + "\n", + "Using tags, you can track useful information such as the name and version of the machine learning library used to train the model. Note that tags must be alphanumeric." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "register model from file" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "\n", + "model = Model.register(model_path=\"iris.model\",\n", + " model_name=\"iris.model\",\n", + " tags={'type': \"regression\"},\n", + " description=\"Logistic regression model to predict iris species\",\n", + " workspace=ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Fetch Environment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can now create and/or use an Environment object when deploying a Webservice. The Environment can have been previously registered with your Workspace, or it will be registered with it as a part of the Webservice deployment.\n", + "\n", + "In this notebook, we will be using 'AzureML-PySpark-MmlSpark-0.15', a curated environment.\n", + "\n", + "More information can be found in our [using environments notebook](../training/using-environments/using-environments.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Environment\n", + "\n", + "env = Environment.get(ws, name='AzureML-PySpark-MmlSpark-0.15')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Inference Configuration\n", + "\n", + "There is now support for a source directory, you can upload an entire folder from your local machine as dependencies for the Webservice.\n", + "Note: in that case, your entry_script is relative path to the source_directory path.\n", + "\n", + "Sample code for using a source directory:\n", + "\n", + "```python\n", + "inference_config = InferenceConfig(source_directory=\"C:/abc\",\n", + " entry_script=\"x/y/score.py\",\n", + " environment=environment)\n", + "```\n", + "\n", + " - source_directory = holds source path as string, this entire folder gets added in image so its really easy to access any files within this folder or subfolder\n", + " - entry_script = contains logic specific to initializing your model and running predictions\n", + " - environment = An environment object to use for the deployment. 
Doesn't have to be registered" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "create image" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.model import InferenceConfig\n", + "\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=env)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Deploy Model as Webservice on Azure Container Instance\n", + "\n", + "Note that the service creation can take few minutes." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "azuremlexception-remarks-sample" + ] + }, + "outputs": [], + "source": [ + "from azureml.core.webservice import AciWebservice, Webservice\n", + "from azureml.exceptions import WebserviceException\n", + "\n", + "deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)\n", + "aci_service_name = 'aciservice1'\n", + "\n", + "try:\n", + " # if you want to get existing service below is the command\n", + " # since aci name needs to be unique in subscription deleting existing aci if any\n", + " # we use aci_service_name to create azure aci\n", + " service = Webservice(ws, name=aci_service_name)\n", + " if service:\n", + " service.delete()\n", + "except WebserviceException as e:\n", + " print()\n", + "\n", + "service = Model.deploy(ws, aci_service_name, [model], inference_config, deployment_config)\n", + "\n", + "service.wait_for_deployment(True)\n", + "print(service.state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Test web service" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "test_sample = json.dumps({'features':{'type':1,'values':[4.3,3.0,1.1,0.1]},'label':2.0})\n", + "\n", + "test_sample_encoded = bytes(test_sample, encoding='utf8')\n", + "prediction = service.run(input_data=test_sample_encoded)\n", + "print(prediction)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Delete ACI to clean up" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "deploy service", + "aci" + ] + }, + "outputs": [], + "source": [ + "service.delete()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Profiling\n", + "\n", + "You can also take advantage of the profiling feature to estimate CPU and memory requirements for models.\n", + "\n", + "```python\n", + "profile = Model.profile(ws, \"profilename\", [model], inference_config, test_sample)\n", + "profile.wait_for_profiling(True)\n", + "profiling_results = profile.get_results()\n", + "print(profiling_results)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Model Packaging\n", + "\n", + "If you want to build a Docker image that encapsulates your model and its dependencies, you can use the model packaging option. 
The output image will be pushed to your workspace's ACR.\n", + "\n", + "You must include an Environment object in your inference configuration to use `Model.package()`.\n", + "\n", + "```python\n", + "package = Model.package(ws, [model], inference_config)\n", + "package.wait_for_creation(show_output=True) # Or show_output=False to hide the Docker build logs.\n", + "package.pull()\n", + "```\n", + "\n", + "Instead of a fully-built image, you can also generate a Dockerfile and download all the assets needed to build an image on top of your Environment.\n", + "\n", + "```python\n", + "package = Model.package(ws, [model], inference_config, generate_dockerfile=True)\n", + "package.wait_for_creation(show_output=True)\n", + "package.save(\"./local_context_dir\")\n", + "```" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "vaidyas" + } + ], + "category": "deployment", + "compute": [ + "None" + ], + "datasets": [ + "Iris" + ], + "deployment": [ + "Azure Container Instance" + ], + "exclude_from_index": false, + "framework": [ + "PySpark" + ], + "friendly_name": "Register Spark model and deploy as webservice", + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.2" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/deployment/spark/model-register-and-deploy-spark.yml b/how-to-use-azureml/deployment/spark/model-register-and-deploy-spark.yml new file mode 100644 index 000000000..8414fbb0d --- /dev/null +++ b/how-to-use-azureml/deployment/spark/model-register-and-deploy-spark.yml @@ -0,0 +1,4 @@ +name: model-register-and-deploy-spark +dependencies: +- pip: + - azureml-sdk diff --git a/how-to-use-azureml/deployment/spark/score.py b/how-to-use-azureml/deployment/spark/score.py new file mode 100644 index 000000000..48543326e --- /dev/null +++ b/how-to-use-azureml/deployment/spark/score.py @@ -0,0 +1,37 @@ +import traceback +from pyspark.ml.linalg import VectorUDT +from azureml.core.model import Model +from pyspark.ml.classification import LogisticRegressionModel +from pyspark.sql.types import StructType, StructField +from pyspark.sql.types import DoubleType +from pyspark.sql import SQLContext +from pyspark import SparkContext + +sc = SparkContext.getOrCreate() +sqlContext = SQLContext(sc) +spark = sqlContext.sparkSession + +input_schema = StructType([StructField("features", VectorUDT()), StructField("label", DoubleType())]) +reader = spark.read +reader.schema(input_schema) + + +def init(): + global model + # note here "iris.model" is the name of the model registered under the workspace + # this call should return the path to the model.pkl file on the local disk. 
+ model_path = Model.get_model_path('iris.model') + # Load the model file back into a LogisticRegression model + model = LogisticRegressionModel.load(model_path) + + +def run(data): + try: + input_df = reader.json(sc.parallelize([data])) + result = model.transform(input_df) + # you can return any datatype as long as it is JSON-serializable + return result.collect()[0]['prediction'] + except Exception as e: + traceback.print_exc() + error = str(e) + return error diff --git a/how-to-use-azureml/deployment/tensorflow/tensorflow-model-register-and-deploy.ipynb b/how-to-use-azureml/deployment/tensorflow/tensorflow-model-register-and-deploy.ipynb index 10a33aa5a..7cdbcca4c 100644 --- a/how-to-use-azureml/deployment/tensorflow/tensorflow-model-register-and-deploy.ipynb +++ b/how-to-use-azureml/deployment/tensorflow/tensorflow-model-register-and-deploy.ipynb @@ -234,7 +234,7 @@ "metadata": { "authors": [ { - "name": "aashishb" + "name": "vaidyas" } ], "kernelspec": { diff --git a/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb b/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb index b444686b5..2cae8957d 100644 --- a/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb +++ b/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb @@ -687,12 +687,12 @@ "source": [ "## Next\n", "Learn about other use cases of the explain package on a:\n", - "1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n", - "1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n", - "1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n", + "1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb) \n", + "1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n", + "1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n", "1. Explain models with engineered features:\n", - " 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n", - " 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n", + " 1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n", + " 1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n", "1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n", "1. Inferencing time: deploy a classification model and explainer:\n", " 1. 
[Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n", diff --git a/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.yml b/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.yml index c460bb421..7a4b22f77 100644 --- a/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.yml +++ b/how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.yml @@ -2,8 +2,9 @@ name: explain-model-on-amlcompute dependencies: - pip: - azureml-sdk - - interpret - azureml-interpret + - interpret-community[visualization] + - matplotlib - azureml-contrib-interpret - sklearn-pandas - azureml-dataprep diff --git a/how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.ipynb b/how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.ipynb index 1343d51bc..60dca68de 100644 --- a/how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.ipynb +++ b/how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.ipynb @@ -582,12 +582,12 @@ "source": [ "## Next\n", "Learn about other use cases of the explain package on a:\n", - "1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n", - "1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n", - "1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n", + "1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb) \n", + "1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n", + "1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n", "1. Explain models with engineered features:\n", - " 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n", - " 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n", + " 1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n", + " 1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n", "1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n", "1. Inferencing time: deploy a classification model and explainer:\n", " 1. 
[Deploy a locally-trained model and explainer](../scoring-time/train-explain-model-locally-and-deploy.ipynb)\n", diff --git a/how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.yml b/how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.yml index e6f99504d..ff76d75f3 100644 --- a/how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.yml +++ b/how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.yml @@ -2,7 +2,8 @@ name: save-retrieve-explanations-run-history dependencies: - pip: - azureml-sdk - - interpret - azureml-interpret + - interpret-community[visualization] + - matplotlib - azureml-contrib-interpret - ipywidgets diff --git a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb index 2f1dde13d..7a1af0297 100644 --- a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb +++ b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb @@ -308,7 +308,9 @@ "source": [ "## Deploy \n", "\n", - "Deploy Model and ScoringExplainer" + "Deploy Model and ScoringExplainer.\n", + "\n", + "Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service." ] }, { @@ -319,7 +321,7 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies \n", "\n", - "# WARNING: to install this, g++ needs to be available on the Docker image and is not by default (look at the next cell)\n", + "# azureml-defaults is required to host the model as a web service.\n", "azureml_pip_packages = [\n", " 'azureml-defaults', 'azureml-contrib-interpret', 'azureml-core', 'azureml-telemetry',\n", " 'azureml-interpret'\n", @@ -338,16 +340,6 @@ " print(f.read())" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile dockerfile\n", - "RUN apt-get update && apt-get install -y g++ " - ] - }, { "cell_type": "code", "execution_count": null, @@ -369,6 +361,8 @@ "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", "from azureml.core.model import Model\n", + "from azureml.core.environment import Environment\n", + "\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", " memory_gb=1, \n", @@ -376,10 +370,8 @@ " \"method\" : \"local_explanation\"}, \n", " description='Get local explanations for IBM Employee Attrition data')\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score_local_explain.py\",\n", - " conda_file=\"myenv.yml\",\n", - " extra_docker_file_steps=\"dockerfile\")\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score_local_explain.py\", environment=myenv)\n", "\n", "# Use configs and models generated above\n", "service = Model.deploy(ws, 'model-scoring-deploy-local', [scoring_explainer_model, original_model], inference_config, aciconfig)\n", @@ -453,12 +445,12 @@ "source": [ "## Next\n", "Learn about other use cases of the explain package on a:\n", - "1. 
[Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n", - "1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n", - "1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n", + "1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb) \n", + "1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n", + "1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n", "1. Explain models with engineered features:\n", - " 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n", - " 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n", + " 1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n", + " 1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n", "1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n", "1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n", "1. [Inferencing time: deploy a remotely-trained model and explainer](./train-explain-model-on-amlcompute-and-deploy.ipynb)" diff --git a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.yml b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.yml index 000daa741..b7a4cd6fd 100644 --- a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.yml +++ b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.yml @@ -2,8 +2,9 @@ name: train-explain-model-locally-and-deploy dependencies: - pip: - azureml-sdk - - interpret - azureml-interpret + - interpret-community[visualization] + - matplotlib - azureml-contrib-interpret - sklearn-pandas - ipywidgets diff --git a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb index 0f3f6fe2d..946eef5ca 100644 --- a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb +++ b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb @@ -409,16 +409,6 @@ " print(f.read())" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile dockerfile\n", - "RUN apt-get update && apt-get install -y g++ " - ] - }, { "cell_type": "code", "execution_count": null, @@ -439,6 +429,8 @@ "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", "from 
azureml.core.model import Model\n", + "from azureml.core.environment import Environment\n", + "\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", " memory_gb=1, \n", @@ -446,10 +438,8 @@ " \"method\" : \"local_explanation\"}, \n", " description='Get local explanations for IBM Employee Attrition data')\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score_remote_explain.py\",\n", - " conda_file=\"myenv.yml\",\n", - " extra_docker_file_steps=\"dockerfile\")\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score_remote_explain.py\", environment=myenv)\n", "\n", "# Use configs and models generated above\n", "service = Model.deploy(ws, 'model-scoring-service', [scoring_explainer_model, original_model], inference_config, aciconfig)\n", @@ -493,16 +483,15 @@ "source": [ "## Next\n", "Learn about other use cases of the explain package on a:\n", - "1. [Training time: regression problem](../../tabular-data/explain-binary-classification-local.ipynb) \n", - "1. [Training time: binary classification problem](../../tabular-data/explain-binary-classification-local.ipynb)\n", - "1. [Training time: multiclass classification problem](../../tabular-data/explain-multiclass-classification-local.ipynb)\n", + "1. [Training time: regression problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-regression-local.ipynb) \n", + "1. [Training time: binary classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-binary-classification-local.ipynb)\n", + "1. [Training time: multiclass classification problem](https://github.com/interpretml/interpret-community/blob/master/notebooks/explain-multiclass-classification-local.ipynb)\n", "1. Explain models with engineered features:\n", - " 1. [Simple feature transformations](../../tabular-data/simple-feature-transformations-explain-local.ipynb)\n", - " 1. [Advanced feature transformations](../../tabular-data/advanced-feature-transformations-explain-local.ipynb)\n", + " 1. [Simple feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/simple-feature-transformations-explain-local.ipynb)\n", + " 1. [Advanced feature transformations](https://github.com/interpretml/interpret-community/blob/master/notebooks/advanced-feature-transformations-explain-local.ipynb)\n", "1. [Save model explanations via Azure Machine Learning Run History](../run-history/save-retrieve-explanations-run-history.ipynb)\n", "1. [Run explainers remotely on Azure Machine Learning Compute (AMLCompute)](../remote-explanation/explain-model-on-amlcompute.ipynb)\n", - "1. [Inferencing time: deploy a locally-trained model and explainer](./train-explain-model-locally-and-deploy.ipynb)\n", - " " + "1. 
[Inferencing time: deploy a locally-trained model and explainer](./train-explain-model-locally-and-deploy.ipynb)" ] }, { diff --git a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.yml b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.yml index 2905fd700..ab5f0f94b 100644 --- a/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.yml +++ b/how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.yml @@ -2,8 +2,9 @@ name: train-explain-model-on-amlcompute-and-deploy dependencies: - pip: - azureml-sdk - - interpret - azureml-interpret + - interpret-community[visualization] + - matplotlib - azureml-contrib-interpret - sklearn-pandas - azureml-dataprep diff --git a/how-to-use-azureml/machine-learning-pipelines/README.md b/how-to-use-azureml/machine-learning-pipelines/README.md index 094b21a1e..c13065fb8 100644 --- a/how-to-use-azureml/machine-learning-pipelines/README.md +++ b/how-to-use-azureml/machine-learning-pipelines/README.md @@ -42,6 +42,8 @@ In this directory, there are two types of notebooks: 1. [pipeline-batch-scoring.ipynb](https://aka.ms/pl-batch-score): This notebook demonstrates how to run a batch scoring job using Azure Machine Learning pipelines. 2. [pipeline-style-transfer.ipynb](https://aka.ms/pl-style-trans): This notebook demonstrates a multi-step pipeline that uses GPU compute. This sample also showcases how to use conda dependencies using runconfig when using Pipelines. -3. [nyc-taxi-data-regression-model-building.ipynb](https://aka.ms/pl-nyctaxi-tutorial): This notebook is an AzureML Pipelines version of the previously published two part sample. +3. [nyc-taxi-data-regression-model-building.ipynb](https://aka.ms/pl-nyctaxi-tutorial): This notebook is an AzureML Pipelines version of the previously published two part sample. +4. [file-dataset-image-inference-mnist.ipynb](https://aka.ms/pl-pr-filedata): This notebook demonstrates how to use ParallelRunStep to process unstructured data (file dataset). +5. [tabular-dataset-inference-iris.ipynb](https://aka.ms/pl-pr-tabulardata): This notebook demonstrates how to use ParallelRunStep to process structured data (tabular dataset). ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/README.png) diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.md b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.md index 09eab61c6..c0fae5ac2 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.md +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.md @@ -18,5 +18,6 @@ These notebooks below are designed to go in sequence. 13. [aml-pipelines-showcasing-datapath-and-pipelineparameter.ipynb](https://aka.ms/pl-datapath): This notebook showcases how to use DataPath and PipelineParameter in AML Pipeline. 14. [aml-pipelines-how-to-use-pipeline-drafts.ipynb](http://aka.ms/pl-pl-draft): This notebook shows how to use Pipeline Drafts. Pipeline Drafts are mutable pipelines which can be used to submit runs and create Published Pipelines. 15. 
[aml-pipelines-hot-to-use-modulestep.ipynb](https://aka.ms/pl-modulestep): This notebook shows how to define Module, ModuleVersion and how to use them in an AML Pipeline using ModuleStep. +16. [aml-pipelines-with-notebook-runner-step.ipynb](https://aka.ms/pl-nbrstep): This notebook shows how you can run another notebook as a step in Azure Machine Learning Pipeline. ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/README.png) diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb index 93b71001e..1193163cb 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-parameter-tuning-with-hyperdrive.ipynb @@ -246,7 +246,7 @@ "metadata": {}, "source": [ "## Create TensorFlow estimator\n", - "Next, we construct an [TensorFlow](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py) estimator object.\n", + "Next, we construct an [TensorFlow](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py) estimator object.\n", "The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed -- if additional pip or conda packages are required, their names can be passed in via the `pip_packages` and `conda_packages` arguments and they will be included in the resulting docker.\n", "\n", "The TensorFlow estimator also takes a `framework_version` parameter -- if no version is provided, the estimator will default to the latest version supported by AzureML. 
Use `TensorFlow.get_supported_versions()` to get a list of all versions supported by your current SDK version or see the [SDK documentation](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.dnn?view=azure-ml-py) for the versions supported in the most current release.\n", @@ -385,7 +385,7 @@ "outputs": [], "source": [ "metrics_output_name = 'metrics_output'\n", - "metirics_data = PipelineData(name='metrics_data',\n", + "metrics_data = PipelineData(name='metrics_data',\n", " datastore=ds,\n", " pipeline_output_name=metrics_output_name)\n", "\n", @@ -395,7 +395,7 @@ " hyperdrive_config=hd_config,\n", " estimator_entry_script_arguments=['--data-folder', data_folder],\n", " inputs=[data_folder],\n", - " metrics_output=metirics_data)" + " metrics_output=metrics_data)" ] }, { @@ -620,14 +620,13 @@ "outputs": [], "source": [ "%%time\n", - "from azureml.core.model import InferenceConfig\n", + "from azureml.core.environment import Environment\n", + "from azureml.core.model import Model, InferenceConfig\n", "from azureml.core.webservice import AciWebservice\n", - "from azureml.core.webservice import Webservice\n", - "from azureml.core.model import Model\n", "\n", - "inference_config = InferenceConfig(runtime = \"python\", \n", - " entry_script = \"score.py\",\n", - " conda_file = \"myenv.yml\")\n", + "\n", + "myenv = Environment.from_conda_specification(name=\"env\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", " memory_gb=1, \n", diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb index c637910cd..d16538747 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-schedule-for-a-published-pipeline.ipynb @@ -180,7 +180,7 @@ "# just get the published pipeline object that you have the ID for.\n", "\n", "# Get all published pipeline objects in the workspace\n", - "all_pub_pipelines = PublishedPipeline.get_all(ws)\n", + "all_pub_pipelines = PublishedPipeline.list(ws)\n", "\n", "# We will iterate through the list of published pipelines and \n", "# use the last ID in the list for Schelue operations: \n", @@ -244,7 +244,7 @@ "metadata": {}, "outputs": [], "source": [ - "schedules = Schedule.get_all(ws, pipeline_id=pub_pipeline_id)\n", + "schedules = Schedule.list(ws, pipeline_id=pub_pipeline_id)\n", "\n", "# We will iterate through the list of schedules and \n", "# use the last recurrence schedule in the list for further operations: \n", @@ -272,7 +272,7 @@ "outputs": [], "source": [ "# Use active_only=False to get all schedules including disabled schedules\n", - "schedules = Schedule.get_all(ws, active_only=True) \n", + "schedules = Schedule.list(ws, active_only=True) \n", "print(\"Your workspace has the following schedules set up:\")\n", "for schedule in schedules:\n", " print(\"{} (Published pipeline: {}\".format(schedule.id, schedule.pipeline_id))" @@ -378,7 +378,8 @@ "### Create a schedule for the pipeline using a Datastore\n", "This schedule will run when additions or modifications are made to Blobs in the Datastore.\n", "By default, the Datastore 
container is monitored for changes. Use the path_on_datastore parameter to instead specify a path on the Datastore to monitor for changes. Note: the path_on_datastore will be under the container for the datastore, so the actual path monitored will be container/path_on_datastore. Changes made to subfolders in the container/path will not trigger the schedule.\n", - "Note: Only Blob Datastores are supported." + "Note: Only Blob Datastores are supported.\n", + "Note: Not supported for CMK workspaces. Please review these [instructions](https://docs.microsoft.com/azure/machine-learning/how-to-trigger-published-pipeline) in order to setup a blob trigger submission schedule with CMK enabled. Also see those instructions to bring your own LogicApp to avoid the schedule triggers per month limit." ] }, { diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb index 54c2c2044..4fc2d9cce 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-setup-versioned-pipeline-endpoints.ipynb @@ -230,7 +230,7 @@ "metadata": {}, "outputs": [], "source": [ - "endpoint_list = PipelineEndpoint.get_all(workspace=ws, active_only=True)\n", + "endpoint_list = PipelineEndpoint.list(workspace=ws, active_only=True)\n", "endpoint_list" ] }, @@ -360,7 +360,7 @@ "metadata": {}, "outputs": [], "source": [ - "versions = pipeline_endpoint_by_name.get_all_versions()\n", + "versions = pipeline_endpoint_by_name.list_versions()\n", "\n", "for ve in versions:\n", " print(ve.version)\n", @@ -381,7 +381,7 @@ "metadata": {}, "outputs": [], "source": [ - "pipelines = pipeline_endpoint_by_name.get_all_pipelines(active_only=True)\n", + "pipelines = pipeline_endpoint_by_name.list_pipelines(active_only=True)\n", "pipelines" ] }, diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb index 44ecdfb36..34a2a723e 100644 --- a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb @@ -76,7 +76,7 @@ "from azureml.core.runconfig import RunConfiguration\n", "from azureml.core.conda_dependencies import CondaDependencies\n", "\n", - "from azureml.train.automl.runtime import AutoMLStep\n", + "from azureml.pipeline.steps import AutoMLStep\n", "\n", "# Check core SDK version number\n", "print(\"SDK version:\", azureml.core.VERSION)" @@ -105,7 +105,7 @@ "metadata": {}, "source": [ "## Create an Azure ML experiment\n", - "Let's create an experiment named \"automl-classification\" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.\n", + "Let's create an experiment named \"automlstep-classification\" and a folder to hold the training scripts. 
The script runs will be recorded under the experiment in Azure.\n", "\n", "The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step." ] @@ -165,25 +165,6 @@ " # For a more detailed view of current AmlCompute status, use get_status()." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create a new RunConfig object\n", - "conda_run_config = RunConfiguration(framework=\"python\")\n", - "\n", - "conda_run_config.environment.docker.enabled = True\n", - "conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", - "\n", - "cd = CondaDependencies.create(pip_packages=['azureml-sdk[automl]'], \n", - " conda_packages=['numpy', 'py-xgboost<=0.80'])\n", - "conda_run_config.environment.python.conda_dependencies = cd\n", - "\n", - "print('run config is ready')" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -197,19 +178,30 @@ "metadata": {}, "outputs": [], "source": [ - "# The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n", - "example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n", - "dataset = Dataset.Tabular.from_delimited_files(example_data)\n", - "dataset.to_pandas_dataframe().describe()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "dataset.take(5).to_pandas_dataframe()" + "# Try to load the dataset from the Workspace. 
Otherwise, create it from the file\n", + "found = False\n", + "key = \"Crime-Dataset\"\n", + "description_text = \"Crime Dataset (used in the the aml-pipelines-with-automated-machine-learning-step.ipynb notebook)\"\n", + "\n", + "if key in ws.datasets.keys(): \n", + " found = True\n", + " dataset = ws.datasets[key] \n", + "\n", + "if not found:\n", + " # Create AML Dataset and register it into Workspace\n", + " # The data referenced here was a 1MB simple random sample of the Chicago Crime data into a local temporary directory.\n", + " example_data = 'https://dprepdata.blob.core.windows.net/demo/crime0-random.csv'\n", + " dataset = Dataset.Tabular.from_delimited_files(example_data)\n", + " dataset = dataset.drop_columns(['FBI Code'])\n", + " \n", + " #Register Dataset in Workspace\n", + " dataset = dataset.register(workspace=ws,\n", + " name=key,\n", + " description=description_text)\n", + "\n", + "\n", + "df = dataset.to_pandas_dataframe()\n", + "df.describe()" ] }, { @@ -229,9 +221,7 @@ "metadata": {}, "outputs": [], "source": [ - "X = dataset.drop_columns(columns=['Primary Type', 'FBI Code'])\n", - "y = dataset.keep_columns(columns=['Primary Type'], validate=True)\n", - "print('X and y are ready!')" + "dataset.take(5).to_pandas_dataframe()" ] }, { @@ -249,19 +239,18 @@ "outputs": [], "source": [ "automl_settings = {\n", - " \"iteration_timeout_minutes\" : 5,\n", - " \"iterations\" : 2,\n", - " \"primary_metric\" : 'AUC_weighted',\n", - " \"preprocess\" : True,\n", - " \"verbosity\" : logging.INFO\n", + " \"experiment_timeout_minutes\": 20,\n", + " \"max_concurrent_iterations\": 4,\n", + " \"primary_metric\" : 'AUC_weighted'\n", "}\n", - "automl_config = AutoMLConfig(task = 'classification',\n", - " debug_log = 'automl_errors.log',\n", + "automl_config = AutoMLConfig(compute_target=compute_target,\n", + " task = \"classification\",\n", + " training_data=dataset,\n", + " label_column_name=\"Primary Type\", \n", " path = project_folder,\n", - " compute_target=compute_target,\n", - " run_configuration=conda_run_config,\n", - " X = X,\n", - " y = y,\n", + " enable_early_stopping= True,\n", + " featurization= 'auto',\n", + " debug_log = \"automl_errors.log\",\n", " **automl_settings\n", " )" ] @@ -270,6 +259,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "#### Create Pipeline and AutoMLStep\n", + "\n", "You can define outputs for the AutoMLStep using TrainingOutput." 
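To make the pattern in this hunk easier to follow, here is a minimal consolidated sketch of how the AutoMLStep outputs are wired up with `TrainingOutput`; it assumes `ds` (the default datastore) and `automl_config` are defined as in the cells above, and uses the same output names as this notebook.

```python
from azureml.pipeline.core import PipelineData, TrainingOutput
from azureml.pipeline.steps import AutoMLStep

# Metrics and best-model outputs of the AutoML step, surfaced as named pipeline outputs.
metrics_data = PipelineData(name='metrics_data',
                            datastore=ds,
                            pipeline_output_name='metrics_output',
                            training_output=TrainingOutput(type='Metrics'))

model_data = PipelineData(name='model_data',
                          datastore=ds,
                          pipeline_output_name='best_model_output',
                          training_output=TrainingOutput(type='Model'))

# Attach both outputs to the AutoML step; they can be downloaded after the pipeline run.
automl_step = AutoMLStep(name='automl_module',
                         automl_config=automl_config,
                         outputs=[metrics_data, model_data],
                         allow_reuse=True)
```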
] }, @@ -285,7 +276,7 @@ "metrics_output_name = 'metrics_output'\n", "best_model_output_name = 'best_model_output'\n", "\n", - "metirics_data = PipelineData(name='metrics_data',\n", + "metrics_data = PipelineData(name='metrics_data',\n", " datastore=ds,\n", " pipeline_output_name=metrics_output_name,\n", " training_output=TrainingOutput(type='Metrics'))\n", @@ -305,20 +296,28 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "automlstep-remarks-sample1" + ] + }, "outputs": [], "source": [ "automl_step = AutoMLStep(\n", " name='automl_module',\n", " automl_config=automl_config,\n", - " outputs=[metirics_data, model_data],\n", + " outputs=[metrics_data, model_data],\n", " allow_reuse=True)" ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "automlstep-remarks-sample2" + ] + }, "outputs": [], "source": [ "from azureml.pipeline.core import Pipeline\n", @@ -383,8 +382,8 @@ "outputs": [], "source": [ "import json\n", - "with open(metrics_output._path_on_datastore) as f: \n", - " metrics_output_result = f.read()\n", + "with open(metrics_output._path_on_datastore) as f:\n", + " metrics_output_result = f.read()\n", " \n", "deserialized_metrics_output = json.loads(metrics_output_result)\n", "df = pd.DataFrame(deserialized_metrics_output)\n", @@ -404,6 +403,7 @@ "metadata": {}, "outputs": [], "source": [ + "# Retrieve best model from Pipeline Run\n", "best_model_output = pipeline_run.get_pipeline_output(best_model_output_name)\n", "num_file_downloaded = best_model_output.download('.', show_progress=True)" ] @@ -421,6 +421,15 @@ "best_model" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "best_model.steps" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -436,11 +445,11 @@ "metadata": {}, "outputs": [], "source": [ - "dataset = Dataset.Tabular.from_delimited_files(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv')\n", + "dataset_test = Dataset.Tabular.from_delimited_files(path='https://dprepdata.blob.core.windows.net/demo/crime0-test.csv')\n", "df_test = dataset_test.to_pandas_dataframe()\n", - "df_test = df_test[pd.notnull(df['Primary Type'])]\n", + "df_test = df_test[pd.notnull(df_test['Primary Type'])]\n", "\n", - "y_test = df_test[['Primary Type']]\n", + "y_test = df_test['Primary Type']\n", "X_test = df_test.drop(['Primary Type', 'FBI Code'], axis=1)" ] }, @@ -459,15 +468,19 @@ "metadata": {}, "outputs": [], "source": [ - "from pandas_ml import ConfusionMatrix\n", - "\n", + "from sklearn.metrics import confusion_matrix\n", "ypred = best_model.predict(X_test)\n", - "\n", - "cm = ConfusionMatrix(y_test['Primary Type'], ypred)\n", - "\n", - "print(cm)\n", - "\n", - "cm.plot()" + "cm = confusion_matrix(y_test, ypred)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Visualize the confusion matrix\n", + "pd.DataFrame(cm).style.background_gradient(cmap='Blues', low=0, high=0.9)" ] } ], diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-notebook-runner-step.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-notebook-runner-step.ipynb new file mode 100644 index 000000000..764cbee61 --- /dev/null +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-notebook-runner-step.ipynb @@ -0,0 +1,432 @@ +{ + "cells": [ + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Azure Machine Learning Pipeline with NotebookRunnerStep\n", + "This notebook demonstrates the use of `NotebookRunnerStep`. It allows you to run a local notebook as a step in Azure Machine Learning Pipeline." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction\n", + "In this example we showcase how you can run another notebook `notebook_runner/training_notebook.ipynb` as a step in Azure Machine Learning Pipeline.\n", + "\n", + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the [configuration](https://aka.ms/pl-config) before running this notebook.\n", + "\n", + "In this notebook you will learn how to:\n", + "1. Create an `Experiment` in an existing `Workspace`.\n", + "2. Create or Attach existing AmlCompute to a workspace.\n", + "3. Configure NotebookRun using `NotebokRunConfig`.\n", + "5. Use NotebookRunnerStep.\n", + "6. Run the notebook on `AmlCompute` as a pipeline step consuming the output of a python script step.\n", + "\n", + "Advantages of running your notebook as a step in pipeline:\n", + "1. Run your notebook like a python script without converting into .py files, leveraging complete end to end experience of Azure Machine Learning Pipelines.\n", + "2. Use pipeline intermediate data to and from the notebook along with other steps in pipeline.\n", + "3. Parameterize your notebook with [Pipeline Parameters](./aml-pipelines-publish-and-run-using-rest-endpoint.ipynb).\n", + "\n", + "Try some more [quick start notebooks](https://github.com/microsoft/recommenders/tree/master/notebooks/00_quick_start) with `NotebookRunnerStep`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Azure Machine Learning and Pipeline SDK-specific imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "import azureml.core\n", + "\n", + "from azureml.core.compute import AmlCompute, ComputeTarget\n", + "from azureml.core.runconfig import RunConfiguration\n", + "from azureml.data.data_reference import DataReference\n", + "from azureml.pipeline.core import PipelineData\n", + "from azureml.core.datastore import Datastore\n", + "\n", + "from azureml.core import Workspace, Experiment\n", + "from azureml.contrib.notebook import NotebookRunConfig, AzureMLNotebookHandler\n", + "\n", + "from azureml.pipeline.core import Pipeline\n", + "from azureml.pipeline.steps import PythonScriptStep\n", + "from azureml.contrib.notebook import NotebookRunnerStep\n", + "\n", + "# Check core SDK version number\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize Workspace\n", + "\n", + "Initialize a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace(class%29) object from persisted configuration." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "ws = Workspace.from_config()\n", + "print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')\n", + "ws.set_default_datastore(\"workspaceblobstore\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Upload data to datastore" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Datastore.get(ws, \"workspaceblobstore\").upload_files([\"./20news.pkl\"], target_path=\"20newsgroups\", overwrite=True)\n", + "print(\"Upload call completed\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create an Azure ML experiment\n", + "Let's create an experiment named \"notebook-step-run-example\" and a folder to holding the notebook and other scripts. The script runs will be recorded under the experiment in Azure.\n", + "\n", + "The best practice is to use separate folders for scripts and its dependent files for each step and specify that folder as the `source_directory` for the step. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since changes in any files in the `source_directory` would trigger a re-upload of the snapshot, this helps keep the reuse of the step when there are no changes in the `source_directory` of the step." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Choose a name for the run history container in the workspace.\n", + "experiment_name = 'notebook-step-run-example'\n", + "source_directory = 'notebook_runner'\n", + "\n", + "experiment = Experiment(ws, experiment_name)\n", + "experiment" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create or Attach an AmlCompute cluster\n", + "You will need to create a [compute target](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.computetarget?view=azure-ml-py) for your remote run. In this tutorial, you get the default `AmlCompute` as your training compute resource." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Choose a name for your cluster.\n", + "amlcompute_cluster_name = \"cpu-cluster\"\n", + "\n", + "found = False\n", + "# Check if this compute target already exists in the workspace.\n", + "cts = ws.compute_targets\n", + "if amlcompute_cluster_name in cts and cts[amlcompute_cluster_name].type == 'AmlCompute':\n", + " found = True\n", + " print('Found existing compute target.')\n", + " compute_target = cts[amlcompute_cluster_name]\n", + " \n", + "if not found:\n", + " print('Creating a new compute target...')\n", + " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\", # for GPU, use \"STANDARD_NC6\"\n", + " #vm_priority = 'lowpriority', # optional\n", + " max_nodes = 4)\n", + "\n", + " # Create the cluster.\n", + " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", + " \n", + " # Can poll for a minimum number of nodes and for a specific timeout.\n", + " # If no min_node_count is provided, it will use the scale settings for the cluster.\n", + " compute_target.wait_for_completion(show_output = True, min_node_count = 1, timeout_in_minutes = 10)\n", + " \n", + " # For a more detailed view of current AmlCompute status, use get_status()." 
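As a brief aside on the `get_status()` hint in the comment above, here is a small sketch of how the provisioned cluster can be inspected and rescaled afterwards; it only assumes the `compute_target` created in that cell.

```python
# Inspect the cluster created above.
status = compute_target.get_status()
print("Provisioning state:", status.provisioning_state)
print("Nodes:", status.current_node_count, "current /", status.target_node_count, "target")

# Scale settings can be adjusted later, e.g. to cap cost or release idle nodes sooner.
compute_target.update(max_nodes=2, idle_seconds_before_scaledown=1200)
```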
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a new RunConfig object" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "conda_run_config = RunConfiguration(framework=\"python\")\n", + "\n", + "conda_run_config.environment.docker.enabled = True\n", + "conda_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE\n", + "\n", + "cd = CondaDependencies.create(pip_packages=['azureml-sdk'])\n", + "conda_run_config.environment.python.conda_dependencies = cd\n", + "\n", + "print('run config is ready')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define input and outputs" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "input_data = DataReference(\n", + "    datastore=Datastore.get(ws, \"workspaceblobstore\"),\n", + "    data_reference_name=\"blob_test_data\",\n", + "    path_on_datastore=\"20newsgroups/20news.pkl\")\n", + "\n", + "output_data = PipelineData(name=\"processed_data\",\n", + "                           datastore=Datastore.get(ws, \"workspaceblobstore\"))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create notebook run configuration and set parameter values" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "handler = AzureMLNotebookHandler(timeout=600, progress_bar=False, log_output=True)\n", + "\n", + "cfg = NotebookRunConfig(source_directory=source_directory, notebook=\"training_notebook.ipynb\",\n", + "                        handler = handler,\n", + "                        parameters={\"arg1\": \"Machine Learning\"},\n", + "                        run_config=conda_run_config)\n", + "\n", + "print(\"Notebook Run Config is created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define PythonScriptStep" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print('Source directory for the step is {}.'.format(os.path.realpath('./train')))\n", + "python_script_step = PythonScriptStep(\n", + "    script_name=\"train.py\",\n", + "    arguments=[\"--input_data\", input_data],\n", + "    inputs=[input_data],\n", + "    outputs=[output_data],\n", + "    compute_target=compute_target, \n", + "    source_directory=\"./train\",\n", + "    allow_reuse=True)\n", + "print(\"python_script_step created\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Define NotebookRunnerStep\n", + "\n", + "This step will consume intermediate output produced by `python_script_step` as an input.\n", + "\n", + "Optionally, an output can be added to the `NotebookRunnerStep` via `output_notebook_pipeline_data_name` to redirect the `output_notebook` of the notebook run to a step output produced as `PipelineData`, which can then be passed further along the pipeline."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.pipeline.core import PipelineParameter\n", + "\n", + "output_from_notebook = PipelineData(name=\"notebook_processed_data\",\n", + "                                    datastore=Datastore.get(ws, \"workspaceblobstore\"))\n", + "\n", + "my_pipeline_param = PipelineParameter(name=\"pipeline_param\", default_value=\"my_param\")\n", + "\n", + "print('Source directory for the step is {}.'.format(os.path.realpath(source_directory)))\n", + "notebook_runner_step = NotebookRunnerStep(name=\"training_notebook_step\",\n", + "                                          notebook_run_config=cfg,\n", + "                                          params={\"my_pipeline_param\": my_pipeline_param},\n", + "                                          inputs=[output_data],\n", + "                                          outputs=[output_from_notebook],\n", + "                                          allow_reuse=True,\n", + "                                          compute_target=compute_target,\n", + "                                          output_notebook_pipeline_data_name=\"notebook_result\")\n", + "\n", + "print(\"Notebook Runner Step is created.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Build Pipeline\n", + "\n", + "Once we have the steps (or steps collection), we can build the [pipeline](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline.pipeline?view=azure-ml-py). By default, all these steps will run in **parallel** once we submit the pipeline for a run, unless data dependencies force them to run in sequence.\n", + "\n", + "A pipeline is created with a list of steps and a workspace. Submit a pipeline using [submit](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline.pipeline?view=azure-ml-py#submit-experiment-name--pipeline-parameters-none--continue-on-step-failure-false--regenerate-outputs-false--parent-run-id-none----kwargs-). When submit is called, a [PipelineRun](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinerun?view=azure-ml-py) is created which in turn creates [StepRun](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-core/azureml.pipeline.core.steprun?view=azure-ml-py) objects for each step in the workflow."
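The next cells build and submit the pipeline. As a hedged sketch of that flow (using the names defined earlier in this notebook), note that `validate()` is optional but surfaces wiring problems such as a missing input before any run is created, and that the data dependency between the two steps determines their execution order.

```python
from azureml.pipeline.core import Pipeline

# Build the pipeline; notebook_runner_step consumes output_data from python_script_step,
# so the two steps run in sequence even though steps are parallel by default.
pipeline = Pipeline(workspace=ws, steps=[python_script_step, notebook_runner_step])
pipeline.validate()  # optional sanity check of the step graph

# Submit under the experiment created earlier; this returns a PipelineRun.
pipeline_run = experiment.submit(pipeline)
pipeline_run.wait_for_completion(show_output=True)
```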
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline1 = Pipeline(workspace=ws, steps=[notebook_runner_step])\n", + "print(\"Pipeline creation complete\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline_run1 = experiment.submit(pipeline1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "RunDetails(pipeline_run1).show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download output notebook\n", + "\n", + "`output_notebook` can be retrieved via pipeline step output if `output_notebook_pipeline_data_name` is provided to the `NotebookRunnerStep`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "pipeline_run1.wait_for_completion()\n", + "train_step = pipeline_run1.find_step_run('training_notebook_step') # Retrieve the step runs by name `train.py`\n", + "\n", + "if train_step:\n", + " train_step_obj = train_step[0] # since we have only one step by name `training_notebook_step`\n", + " train_step_obj.get_output_data('notebook_result').download(source_directory) # download the output to source_directory" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "sanpil" + } + ], + "category": "tutorial", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "How to use run a notebook as a step in AML Pipelines", + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + }, + "order_index": 12, + "star_tag": [ + "None" + ], + "tags": [ + "None" + ], + "task": "Demonstrates the use of NotebookRunnerStep" + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-notebook-runner-step.yml b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-notebook-runner-step.yml new file mode 100644 index 000000000..3585659d6 --- /dev/null +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-notebook-runner-step.yml @@ -0,0 +1,6 @@ +name: aml-pipelines-with-notebook-runner-step +dependencies: +- pip: + - azureml-sdk + - azureml-widgets + - azureml-contrib-notebook diff --git a/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/notebook_runner/training_notebook.ipynb b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/notebook_runner/training_notebook.ipynb new file mode 100644 index 000000000..db234669f --- /dev/null +++ b/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/notebook_runner/training_notebook.ipynb @@ -0,0 +1,106 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "Licensed under the MIT License." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/notebook_runner/training_notebook.png)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"In training_notebook.ipynb\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "tags": [ + "parameters" + ] + }, + "outputs": [], + "source": [ + "# declaring parameters to override\n", + "\n", + "arg1 = \"Azure\"\n", + "processed_data = None\n", + "notebook_processed_data = None\n", + "my_pipeline_param = None" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Final parameter values\n", + "\n", + "print(\"arg1: %s\" % arg1)\n", + "print(\"input from previous step: %s\" % processed_data)\n", + "print(\"output from notebook: %s\" % notebook_processed_data)\n", + "print(\"pipeline_parameter: %s\" % my_pipeline_param)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if not (notebook_processed_data is None):\n", + " os.makedirs(notebook_processed_data, exist_ok=True)\n", + " print(\"%s created\" % notebook_processed_data)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "sanpil" + } + ], + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.7" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb index 727de8865..aace1bf8a 100644 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb @@ -16,16 +16,12 @@ "\n", "You can combine the two part tutorial into one using AzureML Pipelines as Pipelines provide a way to stitch together various steps involved (like data preparation and training in this case) in a machine learning workflow.\n", "\n", - "In this notebook, you learn how to prepare data for regression modeling by using the [Azure Machine Learning Data Prep SDK](https://aka.ms/data-prep-sdk) for Python. You run various transformations to filter and combine two different NYC taxi data sets. Once you prepare the NYC taxi data for regression modeling, then you will use [AutoMLStep](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py) available with [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines) to define your machine learning goals and constraints as well as to launch the automated machine learning process. 
The automated machine learning technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.\n", + "In this notebook, you learn how to prepare data for regression modeling by using open source library [pandas](https://pandas.pydata.org/). You run various transformations to filter and combine two different NYC taxi datasets. Once you prepare the NYC taxi data for regression modeling, then you will use [AutoMLStep](https://docs.microsoft.com/python/api/azureml-train-automl-runtime/azureml.train.automl.runtime.automl_step.automlstep?view=azure-ml-py) available with [Azure Machine Learning Pipelines](https://aka.ms/aml-pipelines) to define your machine learning goals and constraints as well as to launch the automated machine learning process. The automated machine learning technique iterates over many combinations of algorithms and hyperparameters until it finds the best model based on your criterion.\n", "\n", "After you complete building the model, you can predict the cost of a taxi trip by training a model on data features. These features include the pickup day and time, the number of passengers, and the pickup location.\n", "\n", "## Prerequisite\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc.\n", - "\n", - "We will run various transformations to filter and combine two different NYC taxi data sets. We will use DataPrep SDK for this preparing data. \n", - "\n", - "Perform `pip install azureml-dataprep` if you have't already done so." + "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc." ] }, { @@ -108,7 +104,6 @@ "metadata": {}, "outputs": [], "source": [ - "import azureml.dataprep as dprep\n", "from IPython.display import display\n", "\n", "display(green_df_raw.head(5))\n", @@ -144,8 +139,8 @@ "if not os.path.exists(yelloDir):\n", " os.mkdir(yelloDir)\n", " \n", - "greenTaxiData = greenDir + \"/part-00000\"\n", - "yellowTaxiData = yelloDir + \"/part-00000\"\n", + "greenTaxiData = greenDir + \"/unprepared.parquet\"\n", + "yellowTaxiData = yelloDir + \"/unprepared.parquet\"\n", "\n", "green_df_raw.to_csv(greenTaxiData, index=False)\n", "yellow_df_raw.to_csv(yellowTaxiData, index=False)\n", @@ -169,17 +164,54 @@ "\n", "default_store.upload_files([greenTaxiData], \n", " target_path = 'green', \n", - " overwrite = False, \n", + " overwrite = True, \n", " show_progress = True)\n", "\n", "default_store.upload_files([yellowTaxiData], \n", " target_path = 'yellow', \n", - " overwrite = False, \n", + " overwrite = True, \n", " show_progress = True)\n", "\n", "print(\"Upload calls completed.\")" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create and register datasets\n", + "\n", + "By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. 
You can learn more about which subsetting capabilities are supported by referring to [our documentation](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset?view=azure-ml-py#remarks). The data remains in its existing location, so no extra storage cost is incurred." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Dataset\n", + "green_taxi_data = Dataset.Tabular.from_delimited_files(default_store.path('green/unprepared.parquet'))\n", + "yellow_taxi_data = Dataset.Tabular.from_delimited_files(default_store.path('yellow/unprepared.parquet'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Register the taxi datasets with the workspace so that you can reuse them in other experiments or share them with colleagues who have access to your workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "green_taxi_data = green_taxi_data.register(ws, 'green_taxi_data')\n", + "yellow_taxi_data = yellow_taxi_data.register(ws, 'yellow_taxi_data')" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -194,20 +226,22 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.compute import AmlCompute\n", - "from azureml.core.compute import ComputeTarget\n", + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", "\n", - "aml_compute = ws.get_default_compute_target(\"CPU\")\n", + "# Choose a name for your CPU cluster\n", + "amlcompute_cluster_name = \"cpu-cluster\"\n", "\n", - "if aml_compute is None:\n", - "    amlcompute_cluster_name = \"cpu-cluster\"\n", - "    provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_D2_V2\",\n", - "                                                                max_nodes = 4)\n", + "# Verify that cluster does not exist already\n", + "try:\n", + "    aml_compute = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n", + "    print('Found existing cluster, use it.')\n", + "except ComputeTargetException:\n", + "    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',\n", + "                                                           max_nodes=4)\n", + "    aml_compute = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n", "\n", - "    aml_compute = ComputeTarget.create(ws, amlcompute_cluster_name, provisioning_config)\n", - "    aml_compute.wait_for_completion(show_output = True, min_node_count = None, timeout_in_minutes = 20)\n", - "\n", - "aml_compute" + "aml_compute.wait_for_completion(show_output=True)" ] }, { @@ -215,7 +249,7 @@ "metadata": {}, "source": [ "#### Define RunConfig for the compute\n", - "We need `azureml-dataprep` SDK for all the steps below. We will also use `pandas`, `scikit-learn` and `automl` for the training step. Defining the `runconfig` for that." + "We will also use `pandas`, `scikit-learn`, `automl` and `pyarrow` for the pipeline steps. We define the `runconfig` for them below."
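Because the diff only shows fragments of the run-configuration cell, here is a consolidated sketch of what it describes; `aml_compute` is the cluster attached above, and the Docker base image mirrors the SDK's default CPU image used elsewhere in these notebooks.

```python
import azureml.core
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

# Run configuration shared by the data-preparation and training steps (a sketch).
aml_run_config = RunConfiguration()
aml_run_config.target = aml_compute

# Run the steps in a Docker container based on the default AzureML CPU image.
aml_run_config.environment.docker.enabled = True
aml_run_config.environment.docker.base_image = azureml.core.runconfig.DEFAULT_CPU_IMAGE

# Let AzureML manage the Python environment from the dependencies below.
aml_run_config.environment.python.user_managed_dependencies = False
aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(
    conda_packages=['pandas', 'scikit-learn'],
    pip_packages=['azureml-sdk[automl,explain]', 'pyarrow'],
    pin_sdk_version=False)
```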
] }, { @@ -242,13 +276,10 @@ "# Use conda_dependencies.yml to create a conda environment in the Docker image for execution\n", "aml_run_config.environment.python.user_managed_dependencies = False\n", "\n", - "# Auto-prepare the Docker image when used for execution (if it is not already prepared)\n", - "aml_run_config.auto_prepare_environment = True\n", - "\n", "# Specify CondaDependencies obj, add necessary packages\n", "aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(\n", " conda_packages=['pandas','scikit-learn'], \n", - " pip_packages=['azureml-sdk', 'azureml-dataprep', 'azureml-train-automl'], \n", + " pip_packages=['azureml-sdk[automl,explain]', 'pyarrow'], \n", " pin_sdk_version=False)\n", "\n", "print (\"Run configuration created.\")" @@ -259,7 +290,7 @@ "metadata": {}, "source": [ "### Prepare data\n", - "Now we will prepare for regression modeling by using the `Azure Machine Learning Data Prep SDK for Python`. We run various transformations to filter and combine two different NYC taxi data sets.\n", + "Now we will prepare for regression modeling by using `pandas`. We run various transformations to filter and combine two different NYC taxi datasets.\n", "\n", "We achieve this by creating a separate step for each transformation as this allows us to reuse the steps and saves us from running all over again in case of any change. We will keep data preparation scripts in one subfolder and training scripts in another.\n", "\n", @@ -270,7 +301,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### Define Useful Colums\n", + "#### Define Useful Columns\n", "Here we are defining a set of \"useful\" columns for both Green and Yellow taxi data." ] }, @@ -304,18 +335,12 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.data.data_reference import DataReference \n", "from azureml.pipeline.core import PipelineData\n", "from azureml.pipeline.steps import PythonScriptStep\n", "\n", "# python scripts folder\n", "prepare_data_folder = './scripts/prepdata'\n", "\n", - "blob_green_data = DataReference(\n", - " datastore=default_store,\n", - " data_reference_name=\"green_taxi_data\",\n", - " path_on_datastore=\"green/part-00000\")\n", - "\n", "# rename columns as per Azure Machine Learning NYC Taxi tutorial\n", "green_columns = str({ \n", " \"vendorID\": \"vendor\",\n", @@ -332,7 +357,7 @@ "}).replace(\",\", \";\")\n", "\n", "# Define output after cleansing step\n", - "cleansed_green_data = PipelineData(\"green_taxi_data\", datastore=default_store)\n", + "cleansed_green_data = PipelineData(\"cleansed_green_data\", datastore=default_store).as_dataset()\n", "\n", "print('Cleanse script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "\n", @@ -341,11 +366,10 @@ "cleansingStepGreen = PythonScriptStep(\n", " name=\"Cleanse Green Taxi Data\",\n", " script_name=\"cleanse.py\", \n", - " arguments=[\"--input_cleanse\", blob_green_data, \n", - " \"--useful_columns\", useful_columns,\n", + " arguments=[\"--useful_columns\", useful_columns,\n", " \"--columns\", green_columns,\n", " \"--output_cleanse\", cleansed_green_data],\n", - " inputs=[blob_green_data],\n", + " inputs=[green_taxi_data.as_named_input('raw_data')],\n", " outputs=[cleansed_green_data],\n", " compute_target=aml_compute,\n", " runconfig=aml_run_config,\n", @@ -369,11 +393,6 @@ "metadata": {}, "outputs": [], "source": [ - "blob_yellow_data = DataReference(\n", - " datastore=default_store,\n", - " data_reference_name=\"yellow_taxi_data\",\n", - " 
path_on_datastore=\"yellow/part-00000\")\n", - "\n", "yellow_columns = str({\n", " \"vendorID\": \"vendor\",\n", " \"tpepPickupDateTime\": \"pickup_datetime\",\n", @@ -389,7 +408,7 @@ "}).replace(\",\", \";\")\n", "\n", "# Define output after cleansing step\n", - "cleansed_yellow_data = PipelineData(\"yellow_taxi_data\", datastore=default_store)\n", + "cleansed_yellow_data = PipelineData(\"cleansed_yellow_data\", datastore=default_store).as_dataset()\n", "\n", "print('Cleanse script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "\n", @@ -398,11 +417,10 @@ "cleansingStepYellow = PythonScriptStep(\n", " name=\"Cleanse Yellow Taxi Data\",\n", " script_name=\"cleanse.py\", \n", - " arguments=[\"--input_cleanse\", blob_yellow_data, \n", - " \"--useful_columns\", useful_columns,\n", + " arguments=[\"--useful_columns\", useful_columns,\n", " \"--columns\", yellow_columns,\n", " \"--output_cleanse\", cleansed_yellow_data],\n", - " inputs=[blob_yellow_data],\n", + " inputs=[yellow_taxi_data.as_named_input('raw_data')],\n", " outputs=[cleansed_yellow_data],\n", " compute_target=aml_compute,\n", " runconfig=aml_run_config,\n", @@ -428,7 +446,7 @@ "outputs": [], "source": [ "# Define output after merging step\n", - "merged_data = PipelineData(\"merged_data\", datastore=default_store)\n", + "merged_data = PipelineData(\"merged_data\", datastore=default_store).as_dataset()\n", "\n", "print('Merge script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "\n", @@ -437,10 +455,9 @@ "mergingStep = PythonScriptStep(\n", " name=\"Merge Taxi Data\",\n", " script_name=\"merge.py\", \n", - " arguments=[\"--input_green_merge\", cleansed_green_data, \n", - " \"--input_yellow_merge\", cleansed_yellow_data,\n", - " \"--output_merge\", merged_data],\n", - " inputs=[cleansed_green_data, cleansed_yellow_data],\n", + " arguments=[\"--output_merge\", merged_data],\n", + " inputs=[cleansed_green_data.parse_parquet_files(file_extension=None),\n", + " cleansed_yellow_data.parse_parquet_files(file_extension=None)],\n", " outputs=[merged_data],\n", " compute_target=aml_compute,\n", " runconfig=aml_run_config,\n", @@ -466,7 +483,7 @@ "outputs": [], "source": [ "# Define output after merging step\n", - "filtered_data = PipelineData(\"filtered_data\", datastore=default_store)\n", + "filtered_data = PipelineData(\"filtered_data\", datastore=default_store).as_dataset()\n", "\n", "print('Filter script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "\n", @@ -475,9 +492,8 @@ "filterStep = PythonScriptStep(\n", " name=\"Filter Taxi Data\",\n", " script_name=\"filter.py\", \n", - " arguments=[\"--input_filter\", merged_data, \n", - " \"--output_filter\", filtered_data],\n", - " inputs=[merged_data],\n", + " arguments=[\"--output_filter\", filtered_data],\n", + " inputs=[merged_data.parse_parquet_files(file_extension=None)],\n", " outputs=[filtered_data],\n", " compute_target=aml_compute,\n", " runconfig = aml_run_config,\n", @@ -503,7 +519,7 @@ "outputs": [], "source": [ "# Define output after normalize step\n", - "normalized_data = PipelineData(\"normalized_data\", datastore=default_store)\n", + "normalized_data = PipelineData(\"normalized_data\", datastore=default_store).as_dataset()\n", "\n", "print('Normalize script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "\n", @@ -512,9 +528,8 @@ "normalizeStep = PythonScriptStep(\n", " name=\"Normalize Taxi Data\",\n", " script_name=\"normalize.py\", \n", - " arguments=[\"--input_normalize\", filtered_data, \n", - " 
\"--output_normalize\", normalized_data],\n", - " inputs=[filtered_data],\n", + " arguments=[\"--output_normalize\", normalized_data],\n", + " inputs=[filtered_data.parse_parquet_files(file_extension=None)],\n", " outputs=[normalized_data],\n", " compute_target=aml_compute,\n", " runconfig = aml_run_config,\n", @@ -544,8 +559,8 @@ "metadata": {}, "outputs": [], "source": [ - "# Define output after transforme step\n", - "transformed_data = PipelineData(\"transformed_data\", datastore=default_store)\n", + "# Define output after transform step\n", + "transformed_data = PipelineData(\"transformed_data\", datastore=default_store).as_dataset()\n", "\n", "print('Transform script is in {}.'.format(os.path.realpath(prepare_data_folder)))\n", "\n", @@ -554,9 +569,8 @@ "transformStep = PythonScriptStep(\n", " name=\"Transform Taxi Data\",\n", " script_name=\"transform.py\", \n", - " arguments=[\"--input_transform\", normalized_data,\n", - " \"--output_transform\", transformed_data],\n", - " inputs=[normalized_data],\n", + " arguments=[\"--output_transform\", transformed_data],\n", + " inputs=[normalized_data.parse_parquet_files(file_extension=None)],\n", " outputs=[transformed_data],\n", " compute_target=aml_compute,\n", " runconfig = aml_run_config,\n", @@ -571,8 +585,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Extract features\n", - "Add the following columns to be features for our model creation. The prediction value will be *cost*." + "### Split the data into train and test sets\n", + "This function segregates the data into dataset for model training and dataset for testing." ] }, { @@ -581,92 +595,11 @@ "metadata": {}, "outputs": [], "source": [ - "feature_columns = str(['pickup_weekday','pickup_hour', 'distance','passengers', 'vendor']).replace(\",\", \";\")\n", - "\n", "train_model_folder = './scripts/trainmodel'\n", "\n", - "print('Extract script is in {}.'.format(os.path.realpath(train_model_folder)))\n", - "\n", - "# features data after transform step\n", - "features_data = PipelineData(\"features_data\", datastore=default_store)\n", - "\n", - "# featurization step creation\n", - "# See the featurization.py for details about input and output\n", - "featurizationStep = PythonScriptStep(\n", - " name=\"Extract Features\",\n", - " script_name=\"featurization.py\", \n", - " arguments=[\"--input_featurization\", transformed_data, \n", - " \"--useful_columns\", feature_columns,\n", - " \"--output_featurization\", features_data],\n", - " inputs=[transformed_data],\n", - " outputs=[features_data],\n", - " compute_target=aml_compute,\n", - " runconfig = aml_run_config,\n", - " source_directory=train_model_folder,\n", - " allow_reuse=True\n", - ")\n", - "\n", - "print(\"featurizationStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Extract label" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "label_columns = str(['cost']).replace(\",\", \";\")\n", - "\n", - "# label data after transform step\n", - "label_data = PipelineData(\"label_data\", datastore=default_store)\n", - "\n", - "print('Extract script is in {}.'.format(os.path.realpath(train_model_folder)))\n", - "\n", - "# label step creation\n", - "# See the featurization.py for details about input and output\n", - "labelStep = PythonScriptStep(\n", - " name=\"Extract Labels\",\n", - " script_name=\"featurization.py\", \n", - " arguments=[\"--input_featurization\", transformed_data, \n", - " \"--useful_columns\", 
label_columns,\n", - " \"--output_featurization\", label_data],\n", - " inputs=[transformed_data],\n", - " outputs=[label_data],\n", - " compute_target=aml_compute,\n", - " runconfig = aml_run_config,\n", - " source_directory=train_model_folder,\n", - " allow_reuse=True\n", - ")\n", - "\n", - "print(\"labelStep created.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Split the data into train and test sets\n", - "This function segregates the data into the **x**, features, dataset for model training and **y**, values to predict, dataset for testing." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ "# train and test splits output\n", - "output_split_train_x = PipelineData(\"output_split_train_x\", datastore=default_store)\n", - "output_split_train_y = PipelineData(\"output_split_train_y\", datastore=default_store)\n", - "output_split_test_x = PipelineData(\"output_split_test_x\", datastore=default_store)\n", - "output_split_test_y = PipelineData(\"output_split_test_y\", datastore=default_store)\n", + "output_split_train = PipelineData(\"output_split_train\", datastore=default_store).as_dataset()\n", + "output_split_test = PipelineData(\"output_split_test\", datastore=default_store).as_dataset()\n", "\n", "print('Data spilt script is in {}.'.format(os.path.realpath(train_model_folder)))\n", "\n", @@ -675,14 +608,10 @@ "testTrainSplitStep = PythonScriptStep(\n", " name=\"Train Test Data Split\",\n", " script_name=\"train_test_split.py\", \n", - " arguments=[\"--input_split_features\", features_data, \n", - " \"--input_split_labels\", label_data,\n", - " \"--output_split_train_x\", output_split_train_x,\n", - " \"--output_split_train_y\", output_split_train_y,\n", - " \"--output_split_test_x\", output_split_test_x,\n", - " \"--output_split_test_y\", output_split_test_y],\n", - " inputs=[features_data, label_data],\n", - " outputs=[output_split_train_x, output_split_train_y, output_split_test_x, output_split_test_y],\n", + " arguments=[\"--output_split_train\", output_split_train,\n", + " \"--output_split_test\", output_split_test],\n", + " inputs=[transformed_data.parse_parquet_files(file_extension=None)],\n", + " outputs=[output_split_train, output_split_test],\n", " compute_target=aml_compute,\n", " runconfig = aml_run_config,\n", " source_directory=train_model_folder,\n", @@ -697,7 +626,7 @@ "metadata": {}, "source": [ "## Use automated machine learning to build regression model\n", - "Now we will use **automated machine learning** to build the regression model. We will use [AutoMLStep](https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl.automlstep?view=azure-ml-py) in AML Pipelines for this part. These functions use various features from the data set and allow an automated model to build relationships between the features and the price of a taxi trip." + "Now we will use **automated machine learning** to build the regression model. We will use [AutoMLStep](https://docs.microsoft.com/python/api/azureml-train-automl-runtime/azureml.train.automl.runtime.automl_step.automlstep?view=azure-ml-py) in AML Pipelines for this part. Perform `pip install azureml-sdk[automl]`to get the automated machine learning package. These functions use various features from the data set and allow an automated model to build relationships between the features and the price of a taxi trip." 
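As a quick reference, here is a minimal sketch of the setup the following cells depend on: the install command the markdown above mentions plus the two imports used later in this diff. The exact extras syntax is taken from the markdown cell; treat the placement of the install (notebook environment vs. compute image) as an assumption.

```python
# One-time install of the automated ML extras used by the cells below
# (run in the notebook's Python environment, as noted in the markdown above):
#   pip install azureml-sdk[automl]

from azureml.train.automl import AutoMLConfig    # experiment and model settings for automated ML
from azureml.pipeline.steps import AutoMLStep    # pipeline step that runs the AutoML experiment
```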
] }, { @@ -727,52 +656,13 @@ "print(\"Experiment created\")" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Create get_data script\n", - "\n", - "A script with `get_data()` function is necessary to fetch training features(X) and labels(Y) on remote compute, from input data. Here we use mounted path of `train_test_split` step to get the x and y train values. They are added as environment variable on compute machine by default\n", - "\n", - "Note: Every DataReference are added as environment variable on compute machine since the defualt mode is mount" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print('get_data.py will be written to {}.'.format(os.path.realpath(train_model_folder)))" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile $train_model_folder/get_data.py\n", - "import os\n", - "import pandas as pd\n", - "\n", - "def get_data():\n", - " print(\"In get_data\")\n", - " print(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'])\n", - " X_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'] + \"/part-00000\", header=0)\n", - " y_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_y'] + \"/part-00000\", header=0)\n", - " \n", - " return { \"X\" : X_train.values, \"y\" : y_train.values.flatten() }" - ] - }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Define settings for autogeneration and tuning\n", "\n", - "Here we define the experiment parameter and model settings for autogeneration and tuning. We can specify automl_settings as **kwargs as well. Also note that we have to use a get_data() function for remote excutions. See get_data script for more details.\n", + "Here we define the experiment parameter and model settings for autogeneration and tuning. We can specify automl_settings as **kwargs as well.\n", "\n", "Use your defined training settings as a parameter to an `AutoMLConfig` object. 
Additionally, specify your training data and the type of model, which is `regression` in this case.\n", "\n", @@ -793,17 +683,20 @@ " \"iteration_timeout_minutes\" : 10,\n", " \"iterations\" : 2,\n", " \"primary_metric\" : 'spearman_correlation',\n", - " \"preprocess\" : True,\n", - " \"verbosity\" : logging.INFO,\n", " \"n_cross_validations\": 5\n", "}\n", "\n", + "train_X = output_split_train.parse_parquet_files(file_extension=None).keep_columns(['pickup_weekday','pickup_hour', 'distance','passengers', 'vendor'])\n", + "train_y = output_split_train.parse_parquet_files(file_extension=None).keep_columns('cost')\n", + "\n", "automl_config = AutoMLConfig(task = 'regression',\n", " debug_log = 'automated_ml_errors.log',\n", " path = train_model_folder,\n", - " compute_target=aml_compute,\n", - " run_configuration=aml_run_config,\n", - " data_script = train_model_folder + \"/get_data.py\",\n", + " compute_target = aml_compute,\n", + " run_configuration = aml_run_config,\n", + " featurization = 'auto',\n", + " X = train_X,\n", + " y = train_y,\n", " **automl_settings)\n", " \n", "print(\"AutoML config created.\")" @@ -822,15 +715,12 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.train.automl.runtime import AutoMLStep\n", - "\n", - "trainWithAutomlStep = AutoMLStep(\n", - " name='AutoML_Regression',\n", - " automl_config=automl_config,\n", - " inputs=[output_split_train_x, output_split_train_y],\n", - " allow_reuse=True,\n", - " hash_paths=[os.path.realpath(train_model_folder)])\n", + "from azureml.pipeline.steps import AutoMLStep\n", "\n", + "trainWithAutomlStep = AutoMLStep(name='AutoML_Regression',\n", + " automl_config=automl_config,\n", + " passthru_automl_config=False,\n", + " allow_reuse=True)\n", "print(\"trainWithAutomlStep created.\")" ] }, @@ -892,12 +782,11 @@ " return path\n", "\n", "def fetch_df(step, output_name):\n", - " output_data = step.get_output_data(output_name)\n", - " \n", + " output_data = step.get_output_data(output_name) \n", " download_path = './outputs/' + output_name\n", - " output_data.download(download_path)\n", - " df_path = get_download_path(download_path, output_name) + '/part-00000'\n", - " return dprep.auto_read_file(path=df_path)" + " output_data.download(download_path, overwrite=True)\n", + " df_path = get_download_path(download_path, output_name) + '/processed.parquet'\n", + " return pd.read_parquet(df_path)" ] }, { @@ -939,7 +828,7 @@ "merge_step = pipeline_run.find_step_run(mergingStep.name)[0]\n", "combined_df = fetch_df(merge_step, merged_data.name)\n", "\n", - "display(combined_df.get_profile())" + "display(combined_df.describe())" ] }, { @@ -958,7 +847,7 @@ "filter_step = pipeline_run.find_step_run(filterStep.name)[0]\n", "filtered_df = fetch_df(filter_step, filtered_data.name)\n", "\n", - "display(filtered_df.get_profile())" + "display(filtered_df.describe())" ] }, { @@ -996,7 +885,7 @@ "transform_step = pipeline_run.find_step_run(transformStep.name)[0]\n", "transformed_df = fetch_df(transform_step, transformed_data.name)\n", "\n", - "display(transformed_df.get_profile())\n", + "display(transformed_df.describe())\n", "display(transformed_df.head(5))" ] }, @@ -1014,16 +903,10 @@ "outputs": [], "source": [ "split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n", - "train_split_x = fetch_df(split_step, output_split_train_x.name)\n", - "train_split_y = fetch_df(split_step, output_split_train_y.name)\n", - "\n", - "display_x_train = train_split_x.keep_columns(columns=[\"vendor\", \"pickup_weekday\", \"pickup_hour\", 
\"passengers\", \"distance\"])\n", - "display_y_train = train_split_y.rename_columns(column_pairs={\"Column1\": \"cost\"})\n", + "train_split = fetch_df(split_step, output_split_train.name)\n", "\n", - "display(display_x_train.get_profile())\n", - "display(display_x_train.head(5))\n", - "display(display_y_train.get_profile())\n", - "display(display_y_train.head(5))" + "display(train_split.describe())\n", + "display(train_split.head(5))" ] }, { @@ -1125,14 +1008,11 @@ "source": [ "# split_step = pipeline_run.find_step_run(testTrainSplitStep.name)[0]\n", "\n", - "# x_test = fetch_df(split_step, output_split_test_x.name)\n", - "# y_test = fetch_df(split_step, output_split_test_y.name)\n", - "\n", - "# display(x_test.keep_columns(columns=[\"vendor\", \"pickup_weekday\", \"pickup_hour\", \"passengers\", \"distance\"]).head(5))\n", - "# display(y_test.rename_columns(column_pairs={\"Column1\": \"cost\"}).head(5))\n", + "# x_test = fetch_df(split_step, output_split_test.name)[['distance','passengers', 'vendor','pickup_weekday','pickup_hour']]\n", + "# y_test = fetch_df(split_step, output_split_test.name)[['cost']]\n", "\n", - "# x_test = x_test.to_pandas_dataframe()\n", - "# y_test = y_test.to_pandas_dataframe()" + "# display(x_test.head(5))\n", + "# display(y_test.head(5))" ] }, { @@ -1150,9 +1030,9 @@ "metadata": {}, "outputs": [], "source": [ - "# y_predict = fitted_model.predict(x_test.values)\n", + "# y_predict = fitted_model.predict(x_test)\n", "\n", - "# y_actual = y_test.iloc[:,0].values.tolist()\n", + "# y_actual = y_test.values.tolist()\n", "\n", "# display(pd.DataFrame({'Actual':y_actual, 'Predicted':y_predict}).head(5))" ] @@ -1168,7 +1048,7 @@ "# fig = plt.figure(figsize=(14, 10))\n", "# ax1 = fig.add_subplot(111)\n", "\n", - "# distance_vals = [x[4] for x in x_test.values]\n", + "# distance_vals = [x[0] for x in x_test.values]\n", "\n", "# ax1.scatter(distance_vals[:100], y_predict[:100], s=18, c='b', marker=\"s\", label='Predicted')\n", "# ax1.scatter(distance_vals[:100], y_actual[:100], s=18, c='r', marker=\"o\", label='Actual')\n", @@ -1204,7 +1084,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" + "version": "3.6.9" } }, "nbformat": 4, diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.yml b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.yml index dcdee6963..12b58a211 100644 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.yml +++ b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.yml @@ -4,6 +4,7 @@ dependencies: - azureml-sdk - azureml-widgets - azureml-opendatasets - - azureml-dataprep - azureml-train-automl - matplotlib + - pandas + - pyarrow diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/cleanse.py b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/cleanse.py index 0b8c4143a..bae27e828 100644 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/cleanse.py +++ b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/cleanse.py @@ -3,15 +3,14 @@ import argparse import os 
-import pandas as pd -import azureml.dataprep as dprep +from azureml.core import Run def get_dict(dict_str): pairs = dict_str.strip("{}").split("\;") new_dict = {} for pair in pairs: - key, value = pair.strip('\\').split(":") + key, value = pair.strip().split(":") new_dict[key.strip().strip("'")] = value.strip().strip("'") return new_dict @@ -19,40 +18,37 @@ def get_dict(dict_str): print("Cleans the input data") +# Get the input green_taxi_data. To learn more about how to access dataset in your script, please +# see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets. +run = Run.get_context() +raw_data = run.input_datasets["raw_data"] + + parser = argparse.ArgumentParser("cleanse") -parser.add_argument("--input_cleanse", type=str, help="raw taxi data") parser.add_argument("--output_cleanse", type=str, help="cleaned taxi data directory") parser.add_argument("--useful_columns", type=str, help="useful columns to keep") parser.add_argument("--columns", type=str, help="rename column pattern") args = parser.parse_args() -print("Argument 1(input taxi data path): %s" % args.input_cleanse) -print("Argument 2(columns to keep): %s" % str(args.useful_columns.strip("[]").split("\;"))) -print("Argument 3(columns renaming mapping): %s" % str(args.columns.strip("{}").split("\;"))) -print("Argument 4(output cleansed taxi data path): %s" % args.output_cleanse) - -raw_df = dprep.read_csv(path=args.input_cleanse, header=dprep.PromoteHeadersMode.GROUPED) +print("Argument 1(columns to keep): %s" % str(args.useful_columns.strip("[]").split("\;"))) +print("Argument 2(columns renaming mapping): %s" % str(args.columns.strip("{}").split("\;"))) +print("Argument 3(output cleansed taxi data path): %s" % args.output_cleanse) -# These functions ensure that null data is removed from the data set, +# These functions ensure that null data is removed from the dataset, # which will help increase machine learning model accuracy. 
-# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep -# for more details useful_columns = [s.strip().strip("'") for s in args.useful_columns.strip("[]").split("\;")] columns = get_dict(args.columns) -all_columns = dprep.ColumnSelector(term=".*", use_regex=True) -drop_if_all_null = [all_columns, dprep.ColumnRelationship(dprep.ColumnRelationship.ALL)] +new_df = (raw_data.to_pandas_dataframe() + .dropna(how='all') + .rename(columns=columns))[useful_columns] -new_df = (raw_df - .replace_na(columns=all_columns) - .drop_nulls(*drop_if_all_null) - .rename_columns(column_pairs=columns) - .keep_columns(columns=useful_columns)) +new_df.reset_index(inplace=True, drop=True) if not (args.output_cleanse is None): os.makedirs(args.output_cleanse, exist_ok=True) print("%s created" % args.output_cleanse) - write_df = new_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_cleanse)) - write_df.run_local() + path = args.output_cleanse + "/processed.parquet" + write_df = new_df.to_parquet(path) diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/filter.py b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/filter.py index a72481859..a999c54ec 100644 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/filter.py +++ b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/filter.py @@ -1,55 +1,47 @@ import argparse import os -import azureml.dataprep as dprep +from azureml.core import Run print("Filters out coordinates for locations that are outside the city border.", "Chain the column filter commands within the filter() function", "and define the minimum and maximum bounds for each field.") +run = Run.get_context() + +# To learn more about how to access dataset in your script, please +# see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets. +merged_data = run.input_datasets["merged_data"] +combined_df = merged_data.to_pandas_dataframe() + parser = argparse.ArgumentParser("filter") -parser.add_argument("--input_filter", type=str, help="merged taxi data directory") parser.add_argument("--output_filter", type=str, help="filter out out of city locations") args = parser.parse_args() -print("Argument 1(input taxi data path): %s" % args.input_filter) -print("Argument 2(output filtered taxi data path): %s" % args.output_filter) - -combined_df = dprep.read_csv(args.input_filter + '/part-*') +print("Argument (output filtered taxi data path): %s" % args.output_filter) # These functions filter out coordinates for locations that are outside the city border. -# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details - -# Create a condensed view of the dataflow to just show the lat/long fields, -# which makes it easier to evaluate missing or out-of-scope coordinates -decimal_type = dprep.TypeConverter(data_type=dprep.FieldType.DECIMAL) -combined_df = combined_df.set_column_types(type_conversions={ - "pickup_longitude": decimal_type, - "pickup_latitude": decimal_type, - "dropoff_longitude": decimal_type, - "dropoff_latitude": decimal_type -}) # Filter out coordinates for locations that are outside the city border. 
 # Chain the column filter commands within the filter() function
 # and define the minimum and maximum bounds for each field
-latlong_filtered_df = (combined_df
-                       .drop_nulls(columns=["pickup_longitude",
-                                            "pickup_latitude",
-                                            "dropoff_longitude",
-                                            "dropoff_latitude"],
-                                   column_relationship=dprep.ColumnRelationship(dprep.ColumnRelationship.ANY))
-                       .filter(dprep.f_and(dprep.col("pickup_longitude") <= -73.72,
-                                           dprep.col("pickup_longitude") >= -74.09,
-                                           dprep.col("pickup_latitude") <= 40.88,
-                                           dprep.col("pickup_latitude") >= 40.53,
-                                           dprep.col("dropoff_longitude") <= -73.72,
-                                           dprep.col("dropoff_longitude") >= -74.09,
-                                           dprep.col("dropoff_latitude") <= 40.88,
-                                           dprep.col("dropoff_latitude") >= 40.53)))
+
+combined_df = combined_df.astype({"pickup_longitude": 'float64', "pickup_latitude": 'float64',
+                                  "dropoff_longitude": 'float64', "dropoff_latitude": 'float64'})
+
+latlong_filtered_df = combined_df[(combined_df.pickup_longitude <= -73.72) &
+                                  (combined_df.pickup_longitude >= -74.09) &
+                                  (combined_df.pickup_latitude <= 40.88) &
+                                  (combined_df.pickup_latitude >= 40.53) &
+                                  (combined_df.dropoff_longitude <= -73.72) &
+                                  (combined_df.dropoff_longitude >= -74.09) &
+                                  (combined_df.dropoff_latitude <= 40.88) &
+                                  (combined_df.dropoff_latitude >= 40.53)]
+
+latlong_filtered_df.reset_index(inplace=True, drop=True)
 
 if not (args.output_filter is None):
     os.makedirs(args.output_filter, exist_ok=True)
     print("%s created" % args.output_filter)
-    write_df = latlong_filtered_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_filter))
-    write_df.run_local()
+    path = args.output_filter + "/processed.parquet"
+    write_df = latlong_filtered_df.to_parquet(path)
diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/merge.py b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/merge.py
index 4764023aa..bf3c8d936 100644
--- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/merge.py
+++ b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/merge.py
@@ -1,29 +1,30 @@
-
 import argparse
 import os
-import azureml.dataprep as dprep
+from azureml.core import Run
 
 print("Merge Green and Yellow taxi data")
 
+run = Run.get_context()
+
+# To learn more about how to access dataset in your script, please
+# see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets.
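Taken together, the rewritten prep scripts in this change (cleanse, merge, filter, normalize, transform, train/test split) all follow the same skeleton: read the named dataset inputs from the run context, transform them with pandas, and write a single `processed.parquet` file into the step's output directory. The following is a condensed, illustrative sketch of that shape; the input and argument names are placeholders, not any one script from this diff.

```python
import argparse
import os

from azureml.core import Run

# Named dataset inputs arrive through the run context instead of path arguments.
run = Run.get_context()
input_df = run.input_datasets["some_named_input"].to_pandas_dataframe()  # placeholder input name

# Only the output folder is still passed on the command line.
parser = argparse.ArgumentParser("prep_step")
parser.add_argument("--output_dir", type=str, help="directory for the processed parquet file")
args = parser.parse_args()

# ...pandas transformations on input_df go here...
output_df = input_df.reset_index(drop=True)

if args.output_dir is not None:
    os.makedirs(args.output_dir, exist_ok=True)
    # Each step writes one parquet file that downstream steps re-read as a tabular dataset.
    output_df.to_parquet(args.output_dir + "/processed.parquet")
```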
+cleansed_green_data = run.input_datasets["cleansed_green_data"] +cleansed_yellow_data = run.input_datasets["cleansed_yellow_data"] +green_df = cleansed_green_data.to_pandas_dataframe() +yellow_df = cleansed_yellow_data.to_pandas_dataframe() + parser = argparse.ArgumentParser("merge") -parser.add_argument("--input_green_merge", type=str, help="cleaned green taxi data directory") -parser.add_argument("--input_yellow_merge", type=str, help="cleaned yellow taxi data directory") parser.add_argument("--output_merge", type=str, help="green and yellow taxi data merged") args = parser.parse_args() - -print("Argument 1(input green taxi data path): %s" % args.input_green_merge) -print("Argument 2(input yellow taxi data path): %s" % args.input_yellow_merge) -print("Argument 3(output merge taxi data path): %s" % args.output_merge) - -green_df = dprep.read_csv(args.input_green_merge + '/part-*') -yellow_df = dprep.read_csv(args.input_yellow_merge + '/part-*') +print("Argument (output merge taxi data path): %s" % args.output_merge) # Appending yellow data to green data -combined_df = green_df.append_rows([yellow_df]) +combined_df = green_df.append(yellow_df, ignore_index=True) +combined_df.reset_index(inplace=True, drop=True) if not (args.output_merge is None): os.makedirs(args.output_merge, exist_ok=True) print("%s created" % args.output_merge) - write_df = combined_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_merge)) - write_df.run_local() + path = args.output_merge + "/processed.parquet" + write_df = combined_df.to_parquet(path) diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/normalize.py b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/normalize.py index f7b384d12..589fd2976 100644 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/normalize.py +++ b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/normalize.py @@ -1,47 +1,48 @@ import argparse import os -import azureml.dataprep as dprep +import pandas as pd +from azureml.core import Run print("Replace undefined values to relavant values and rename columns to meaningful names") +run = Run.get_context() + +# To learn more about how to access dataset in your script, please +# see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets. +filtered_data = run.input_datasets['filtered_data'] +combined_converted_df = filtered_data.to_pandas_dataframe() + parser = argparse.ArgumentParser("normalize") -parser.add_argument("--input_normalize", type=str, help="combined and converted taxi data") parser.add_argument("--output_normalize", type=str, help="replaced undefined values and renamed columns") args = parser.parse_args() -print("Argument 1(input taxi data path): %s" % args.input_normalize) -print("Argument 2(output normalized taxi data path): %s" % args.output_normalize) - -combined_converted_df = dprep.read_csv(args.input_normalize + '/part-*') +print("Argument (output normalized taxi data path): %s" % args.output_normalize) # These functions replace undefined values and rename to use meaningful names. 
-# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details +replaced_stfor_vals_df = (combined_converted_df.replace({"store_forward": "0"}, {"store_forward": "N"}) + .fillna({"store_forward": "N"})) + +replaced_distance_vals_df = (replaced_stfor_vals_df.replace({"distance": ".00"}, {"distance": 0}) + .fillna({"distance": 0})) -replaced_stfor_vals_df = combined_converted_df.replace(columns="store_forward", - find="0", - replace_with="N").fill_nulls("store_forward", "N") +normalized_df = replaced_distance_vals_df.astype({"distance": 'float64'}) -replaced_distance_vals_df = replaced_stfor_vals_df.replace(columns="distance", - find=".00", - replace_with=0).fill_nulls("distance", 0) +temp = pd.DatetimeIndex(normalized_df["pickup_datetime"]) +normalized_df["pickup_date"] = temp.date +normalized_df["pickup_time"] = temp.time -replaced_distance_vals_df = replaced_distance_vals_df.to_number(["distance"]) +temp = pd.DatetimeIndex(normalized_df["dropoff_datetime"]) +normalized_df["dropoff_date"] = temp.date +normalized_df["dropoff_time"] = temp.time -time_split_df = (replaced_distance_vals_df - .split_column_by_example(source_column="pickup_datetime") - .split_column_by_example(source_column="dropoff_datetime")) +del normalized_df["pickup_datetime"] +del normalized_df["dropoff_datetime"] -# Split the pickup and dropoff datetime values into the respective date and time columns -renamed_col_df = (time_split_df - .rename_columns(column_pairs={ - "pickup_datetime_1": "pickup_date", - "pickup_datetime_2": "pickup_time", - "dropoff_datetime_1": "dropoff_date", - "dropoff_datetime_2": "dropoff_time"})) +normalized_df.reset_index(inplace=True, drop=True) if not (args.output_normalize is None): os.makedirs(args.output_normalize, exist_ok=True) print("%s created" % args.output_normalize) - write_df = renamed_col_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_normalize)) - write_df.run_local() + path = args.output_normalize + "/processed.parquet" + write_df = normalized_df.to_parquet(path) diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/transform.py b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/transform.py index c2ac6e95e..5584d6aba 100644 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/transform.py +++ b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/prepdata/transform.py @@ -1,22 +1,24 @@ import argparse import os -import azureml.dataprep as dprep +from azureml.core import Run print("Transforms the renamed taxi data to the required format") +run = Run.get_context() + +# To learn more about how to access dataset in your script, please +# see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-with-datasets. 
+normalized_data = run.input_datasets['normalized_data'] +normalized_df = normalized_data.to_pandas_dataframe() + parser = argparse.ArgumentParser("transform") -parser.add_argument("--input_transform", type=str, help="renamed taxi data") parser.add_argument("--output_transform", type=str, help="transformed taxi data") args = parser.parse_args() -print("Argument 1(input taxi data path): %s" % args.input_transform) print("Argument 2(output final transformed taxi data): %s" % args.output_transform) -renamed_df = dprep.read_csv(args.input_transform + '/part-*') - # These functions transform the renamed data to be used finally for training. -# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep for more details # Split the pickup and dropoff date further into the day of the week, day of the month, and month values. # To get the day of the week value, use the derive_column_by_example() function. @@ -27,62 +29,46 @@ # use the drop_columns() function to delete the original fields as the newly generated features are preferred. # Rename the rest of the fields to use meaningful descriptions. -transformed_features_df = (renamed_df - .derive_column_by_example( - source_columns="pickup_date", - new_column_name="pickup_weekday", - example_data=[("2009-01-04", "Sunday"), ("2013-08-22", "Thursday")]) - .derive_column_by_example( - source_columns="dropoff_date", - new_column_name="dropoff_weekday", - example_data=[("2013-08-22", "Thursday"), ("2013-11-03", "Sunday")]) - - .split_column_by_example(source_column="pickup_time") - .split_column_by_example(source_column="dropoff_time") - - .split_column_by_example(source_column="pickup_time_1") - .split_column_by_example(source_column="dropoff_time_1") - .drop_columns(columns=[ - "pickup_date", "pickup_time", "dropoff_date", "dropoff_time", - "pickup_date_1", "dropoff_date_1", "pickup_time_1", "dropoff_time_1"]) - - .rename_columns(column_pairs={ - "pickup_date_2": "pickup_month", - "pickup_date_3": "pickup_monthday", - "pickup_time_1_1": "pickup_hour", - "pickup_time_1_2": "pickup_minute", - "pickup_time_2": "pickup_second", - "dropoff_date_2": "dropoff_month", - "dropoff_date_3": "dropoff_monthday", - "dropoff_time_1_1": "dropoff_hour", - "dropoff_time_1_2": "dropoff_minute", - "dropoff_time_2": "dropoff_second"})) - -# Drop the pickup_datetime and dropoff_datetime columns because they're -# no longer needed (granular time features like hour, -# minute and second are more useful for model training). -processed_df = transformed_features_df.drop_columns(columns=["pickup_datetime", "dropoff_datetime"]) +normalized_df = normalized_df.astype({"pickup_date": 'datetime64', "dropoff_date": 'datetime64', + "pickup_time": 'datetime64', "dropoff_time": 'datetime64', + "distance": 'float64', "cost": 'float64'}) -# Use the type inference functionality to automatically check the data type of each field, -# and display the inference results. -type_infer = processed_df.builders.set_column_types() -type_infer.learn() +normalized_df["pickup_weekday"] = normalized_df["pickup_date"].dt.dayofweek +normalized_df["pickup_month"] = normalized_df["pickup_date"].dt.month +normalized_df["pickup_monthday"] = normalized_df["pickup_date"].dt.day -# The inference results look correct based on the data. Now apply the type conversions to the dataflow. 
-type_converted_df = type_infer.to_dataflow() +normalized_df["dropoff_weekday"] = normalized_df["dropoff_date"].dt.dayofweek +normalized_df["dropoff_month"] = normalized_df["dropoff_date"].dt.month +normalized_df["dropoff_monthday"] = normalized_df["dropoff_date"].dt.day + +normalized_df["pickup_hour"] = normalized_df["pickup_time"].dt.hour +normalized_df["pickup_minute"] = normalized_df["pickup_time"].dt.minute +normalized_df["pickup_second"] = normalized_df["pickup_time"].dt.second + +normalized_df["dropoff_hour"] = normalized_df["dropoff_time"].dt.hour +normalized_df["dropoff_minute"] = normalized_df["dropoff_time"].dt.minute +normalized_df["dropoff_second"] = normalized_df["dropoff_time"].dt.second + +# Drop the pickup_date, dropoff_date, pickup_time, dropoff_time columns because they're +# no longer needed (granular time features like hour, +# minute and second are more useful for model training). +del normalized_df["pickup_date"] +del normalized_df["dropoff_date"] +del normalized_df["pickup_time"] +del normalized_df["dropoff_time"] -# Before you package the dataflow, run two final filters on the data set. +# Before you package the dataset, run two final filters on the dataset. # To eliminate incorrectly captured data points, -# filter the dataflow on records where both the cost and distance variable values are greater than zero. +# filter the dataset on records where both the cost and distance variable values are greater than zero. # This step will significantly improve machine learning model accuracy, # because data points with a zero cost or distance represent major outliers that throw off prediction accuracy. -final_df = type_converted_df.filter(dprep.col("distance") > 0) -final_df = final_df.filter(dprep.col("cost") > 0) +final_df = normalized_df[(normalized_df.distance > 0) & (normalized_df.cost > 0)] +final_df.reset_index(inplace=True, drop=True) # Writing the final dataframe to use for training in the following steps if not (args.output_transform is None): os.makedirs(args.output_transform, exist_ok=True) print("%s created" % args.output_transform) - write_df = final_df.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_transform)) - write_df.run_local() + path = args.output_transform + "/processed.parquet" + write_df = final_df.to_parquet(path) diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/featurization.py b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/featurization.py deleted file mode 100644 index bcf2338af..000000000 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/featurization.py +++ /dev/null @@ -1,31 +0,0 @@ -import argparse -import os -import azureml.dataprep as dprep -import azureml.core - -print("Extracts important features from prepared data") - -parser = argparse.ArgumentParser("featurization") -parser.add_argument("--input_featurization", type=str, help="input featurization") -parser.add_argument("--useful_columns", type=str, help="columns to use") -parser.add_argument("--output_featurization", type=str, help="output featurization") - -args = parser.parse_args() - -print("Argument 1(input training data path): %s" % args.input_featurization) -print("Argument 2(column features to use): %s" % str(args.useful_columns.strip("[]").split("\;"))) -print("Argument 3:(output featurized training data path) %s" % args.output_featurization) - -dflow_prepared = 
dprep.read_csv(args.input_featurization + '/part-*') - -# These functions extracts useful features for training -# Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models for more detail - -useful_columns = [s.strip().strip("'") for s in args.useful_columns.strip("[]").split("\;")] -dflow = dflow_prepared.keep_columns(useful_columns) - -if not (args.output_featurization is None): - os.makedirs(args.output_featurization, exist_ok=True) - print("%s created" % args.output_featurization) - write_df = dflow.write_to_csv(directory_path=dprep.LocalFileOutput(args.output_featurization)) - write_df.run_local() diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/get_data.py b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/get_data.py deleted file mode 100644 index 6472e46a2..000000000 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/get_data.py +++ /dev/null @@ -1,12 +0,0 @@ - -import os -import pandas as pd - - -def get_data(): - print("In get_data") - print(os.environ['AZUREML_DATAREFERENCE_output_split_train_x']) - X_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_x'] + "/part-00000", header=0) - y_train = pd.read_csv(os.environ['AZUREML_DATAREFERENCE_output_split_train_y'] + "/part-00000", header=0) - - return {"X": X_train.values, "y": y_train.values.flatten()} diff --git a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/train_test_split.py b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/train_test_split.py index cdc80b619..48571e64f 100644 --- a/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/train_test_split.py +++ b/how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/scripts/trainmodel/train_test_split.py @@ -1,48 +1,38 @@ import argparse import os -import azureml.dataprep as dprep import azureml.core +from azureml.core import Run from sklearn.model_selection import train_test_split def write_output(df, path): os.makedirs(path, exist_ok=True) print("%s created" % path) - df.to_csv(path + "/part-00000", index=False) + df.to_parquet(path + "/processed.parquet") print("Split the data into train and test") +run = Run.get_context() +transformed_data = run.input_datasets['transformed_data'] +transformed_df = transformed_data.to_pandas_dataframe() parser = argparse.ArgumentParser("split") -parser.add_argument("--input_split_features", type=str, help="input split features") -parser.add_argument("--input_split_labels", type=str, help="input split labels") -parser.add_argument("--output_split_train_x", type=str, help="output split train features") -parser.add_argument("--output_split_train_y", type=str, help="output split train labels") -parser.add_argument("--output_split_test_x", type=str, help="output split test features") -parser.add_argument("--output_split_test_y", type=str, help="output split test labels") +parser.add_argument("--output_split_train", type=str, help="output split train data") +parser.add_argument("--output_split_test", type=str, help="output split test data") args = parser.parse_args() -print("Argument 1(input taxi data features path): %s" % args.input_split_features) -print("Argument 2(input taxi data labels path): %s" % 
args.input_split_labels) -print("Argument 3(output training features split path): %s" % args.output_split_train_x) -print("Argument 4(output training labels split path): %s" % args.output_split_train_y) -print("Argument 5(output test features split path): %s" % args.output_split_test_x) -print("Argument 6(output test labels split path): %s" % args.output_split_test_y) - -x_df = dprep.read_csv(path=args.input_split_features, header=dprep.PromoteHeadersMode.GROUPED).to_pandas_dataframe() -y_df = dprep.read_csv(path=args.input_split_labels, header=dprep.PromoteHeadersMode.GROUPED).to_pandas_dataframe() +print("Argument 1(output training data split path): %s" % args.output_split_train) +print("Argument 2(output test data split path): %s" % args.output_split_test) # These functions splits the input features and labels into test and train data # Visit https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-auto-train-models for more detail -x_train, x_test, y_train, y_test = train_test_split(x_df, y_df, test_size=0.2, random_state=223) +output_split_train, output_split_test = train_test_split(transformed_df, test_size=0.2, random_state=223) +output_split_train.reset_index(inplace=True, drop=True) +output_split_test.reset_index(inplace=True, drop=True) -if not (args.output_split_train_x is None and - args.output_split_test_x is None and - args.output_split_train_y is None and - args.output_split_test_y is None): - write_output(x_train, args.output_split_train_x) - write_output(y_train, args.output_split_train_y) - write_output(x_test, args.output_split_test_x) - write_output(y_test, args.output_split_test_y) +if not (args.output_split_train is None and + args.output_split_test is None): + write_output(output_split_train, args.output_split_train) + write_output(output_split_test, args.output_split_test) diff --git a/contrib/batch_inferencing/Code/digit_identification.py b/how-to-use-azureml/machine-learning-pipelines/parallel-run/Code/digit_identification.py similarity index 100% rename from contrib/batch_inferencing/Code/digit_identification.py rename to how-to-use-azureml/machine-learning-pipelines/parallel-run/Code/digit_identification.py diff --git a/contrib/batch_inferencing/Code/iris_score.py b/how-to-use-azureml/machine-learning-pipelines/parallel-run/Code/iris_score.py similarity index 100% rename from contrib/batch_inferencing/Code/iris_score.py rename to how-to-use-azureml/machine-learning-pipelines/parallel-run/Code/iris_score.py diff --git a/contrib/batch_inferencing/README.md b/how-to-use-azureml/machine-learning-pipelines/parallel-run/README.md similarity index 93% rename from contrib/batch_inferencing/README.md rename to how-to-use-azureml/machine-learning-pipelines/parallel-run/README.md index a1c23c83a..b7274ae9f 100644 --- a/contrib/batch_inferencing/README.md +++ b/how-to-use-azureml/machine-learning-pipelines/parallel-run/README.md @@ -11,13 +11,13 @@ Batch inference public preview offers a platform in which to do large inference ### Python package installation Following the convention of most AzureML Public Preview features, Batch Inference SDK is currently available as a contrib package. -If you're unfamiliar with creating a new Python environment, you may follow this example for [creating a conda environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local). Batch Inference package can be installed through the following pip command. 
+If you're unfamiliar with creating a new Python environment, you may follow this example for [creating a conda environment](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#local). Batch Inference package can be installed through the following pip command. ``` pip install azureml-contrib-pipeline-steps ``` ### Creation of Azure Machine Learning Workspace -If you do not already have a Azure ML Workspace, please run the [configuration Notebook](../../configuration.ipynb). +If you do not already have a Azure ML Workspace, please run the [configuration Notebook](https://aka.ms/pl-config). ## Configure a Batch Inference job @@ -71,7 +71,7 @@ base_image_registry.password = "password" - **models**: zero or more model names already registered in Azure Machine Learning model registry. - **parallel_run_config**: ParallelRunConfig as defined above. - **inputs**: one or more Dataset objects. - - **output**: this should be a PipelineData object encapsulating an Azure BLOB container path. + - **output**: this should be a PipelineData object encapsulating an Azure BLOB container path. - **arguments**: list of custom arguments passed to scoring script (optional) - **allow_reuse**: optional, default value is True. If the inputs remain the same as a previous run, it will make the previous run results immediately available (skips re-computing the step). @@ -121,7 +121,8 @@ pipeline_run.wait_for_completion(show_output=True) # Sample notebooks -- [file-dataset-image-inference-mnist.ipynb](./file-dataset-image-inference-mnist.ipynb) demonstrates how to run batch inference on an MNIST dataset. -- [tabular-dataset-inference-iris.ipynb](./tabular-dataset-inference-iris.ipynb) demonstrates how to run batch inference on an IRIS dataset. +- [file-dataset-image-inference-mnist.ipynb](./file-dataset-image-inference-mnist.ipynb) demonstrates how to run batch inference on an MNIST dataset using FileDataset. +- [tabular-dataset-inference-iris.ipynb](./tabular-dataset-inference-iris.ipynb) demonstrates how to run batch inference on an IRIS dataset using TabularDataset. +- [pipeline-style-transfer.ipynb](../pipeline-style-transfer/pipeline-style-transfer.ipynb) demonstrates using ParallelRunStep in multi-step pipeline and using output from one step as input to ParallelRunStep. 
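To show how the arguments listed above fit together, here is a hedged sketch of a ParallelRunStep declaration. The dataset, model, and `parallel_run_config` objects are assumed to have been created as described earlier in this README, the names are illustrative, and the import path reflects the contrib preview package (`azureml-contrib-pipeline-steps`).

```python
from azureml.contrib.pipeline.steps import ParallelRunStep  # contrib preview package
from azureml.pipeline.core import Pipeline, PipelineData

# Output location for the inference results: a PipelineData object backed by a blob datastore.
output_dir = PipelineData(name="inferences", datastore=default_datastore)

parallel_run_step = ParallelRunStep(
    name="batch-inference-step",              # illustrative step name
    models=[model],                           # zero or more registered models
    parallel_run_config=parallel_run_config,  # ParallelRunConfig as defined above
    inputs=[input_dataset],                   # one or more Dataset objects
    output=output_dir,
    arguments=["--model_name", "my-model"],   # optional custom arguments for the scoring script
    allow_reuse=True,
)

pipeline = Pipeline(workspace=ws, steps=[parallel_run_step])
```

Submitting the resulting pipeline then follows the usual experiment submission pattern, ending with the `pipeline_run.wait_for_completion(show_output=True)` call shown later in this README.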
-![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/contrib/batch_inferencing/README.png) +![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/parallel-run/README.png) diff --git a/contrib/batch_inferencing/file-dataset-image-inference-mnist.ipynb b/how-to-use-azureml/machine-learning-pipelines/parallel-run/file-dataset-image-inference-mnist.ipynb similarity index 98% rename from contrib/batch_inferencing/file-dataset-image-inference-mnist.ipynb rename to how-to-use-azureml/machine-learning-pipelines/parallel-run/file-dataset-image-inference-mnist.ipynb index a35b53206..398a7f6ff 100644 --- a/contrib/batch_inferencing/file-dataset-image-inference-mnist.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/parallel-run/file-dataset-image-inference-mnist.ipynb @@ -12,7 +12,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/contrib/batch_inferencing/file-dataset-image-inference-mnist.png)" + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/parallel-run/file-dataset-image-inference-mnist.png)" ] }, { @@ -23,6 +23,11 @@ "\n", "In this notebook, we will demonstrate how to make predictions on large quantities of data asynchronously using the ML pipelines with Azure Machine Learning. Batch inference (or batch scoring) provides cost-effective inference, with unparalleled throughput for asynchronous applications. Batch prediction pipelines can scale to perform inference on terabytes of production data. Batch prediction is optimized for high throughput, fire-and-forget predictions for a large collection of data.\n", "\n", + "> **Note**\n", + "This notebook uses public preview functionality (ParallelRunStep). Please install azureml-contrib-pipeline-steps package before running this notebook. 
Pandas is used to display job results.\n", + "```\n", + "pip install azureml-contrib-pipeline-steps pandas\n", + "```\n", "> **Tip**\n", "If your system requires low-latency processing (to process a single document or small set of documents quickly), use [real-time scoring](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-consume-web-service) instead of batch prediction.\n", "\n", @@ -519,9 +524,6 @@ "name": "tracych" } ], - "friendly_name": "MNIST data inferencing using ParallelRunStep", - "exclude_from_index": false, - "index_order": 1, "category": "Other notebooks", "compute": [ "AML Compute" @@ -532,14 +534,12 @@ "deployment": [ "None" ], + "exclude_from_index": false, "framework": [ "None" ], - "tags": [ - "Batch Inferencing", - "Pipeline" - ], - "task": "Digit identification", + "friendly_name": "MNIST data inferencing using ParallelRunStep", + "index_order": 1, "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -556,7 +556,12 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" - } + }, + "tags": [ + "Batch Inferencing", + "Pipeline" + ], + "task": "Digit identification" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/contrib/batch_inferencing/file-dataset-image-inference-mnist.yml b/how-to-use-azureml/machine-learning-pipelines/parallel-run/file-dataset-image-inference-mnist.yml similarity index 92% rename from contrib/batch_inferencing/file-dataset-image-inference-mnist.yml rename to how-to-use-azureml/machine-learning-pipelines/parallel-run/file-dataset-image-inference-mnist.yml index e1727c0a6..cd4be0864 100644 --- a/contrib/batch_inferencing/file-dataset-image-inference-mnist.yml +++ b/how-to-use-azureml/machine-learning-pipelines/parallel-run/file-dataset-image-inference-mnist.yml @@ -4,3 +4,4 @@ dependencies: - azureml-sdk - azureml-contrib-pipeline-steps - azureml-widgets + - pandas diff --git a/contrib/batch_inferencing/tabular-dataset-inference-iris.ipynb b/how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.ipynb similarity index 98% rename from contrib/batch_inferencing/tabular-dataset-inference-iris.ipynb rename to how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.ipynb index 61d93c1a6..5aae38616 100644 --- a/contrib/batch_inferencing/tabular-dataset-inference-iris.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", + "Copyright (c) Microsoft Corporation. All rights reserved.\n", "Licensed under the MIT License." ] }, @@ -12,7 +12,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/contrib/batch_inferencing/tabular-dataset-inference-iris.png)" + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.png)" ] }, { @@ -23,6 +23,11 @@ "\n", "In this notebook, we will demonstrate how to make predictions on large quantities of data asynchronously using the ML pipelines with Azure Machine Learning. Batch inference (or batch scoring) provides cost-effective inference, with unparalleled throughput for asynchronous applications. 
Batch prediction pipelines can scale to perform inference on terabytes of production data. Batch prediction is optimized for high throughput, fire-and-forget predictions for a large collection of data.\n", "\n", + "> **Note**\n", + "This notebook uses public preview functionality (ParallelRunStep). Please install azureml-contrib-pipeline-steps package before running this notebook. Pandas is used to display job results.\n", + "```\n", + "pip install azureml-contrib-pipeline-steps pandas\n", + "```\n", "> **Tip**\n", "If your system requires low-latency processing (to process a single document or small set of documents quickly), use [real-time scoring](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-consume-web-service) instead of batch prediction.\n", "\n", @@ -494,9 +499,6 @@ "name": "tracych" } ], - "friendly_name": "IRIS data inferencing using ParallelRunStep", - "exclude_from_index": false, - "index_order": 1, "category": "Other notebooks", "compute": [ "AML Compute" @@ -507,14 +509,12 @@ "deployment": [ "None" ], + "exclude_from_index": false, "framework": [ "None" ], - "tags": [ - "Batch Inferencing", - "Pipeline" - ], - "task": "Recognize flower type", + "friendly_name": "IRIS data inferencing using ParallelRunStep", + "index_order": 1, "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -531,7 +531,12 @@ "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" - } + }, + "tags": [ + "Batch Inferencing", + "Pipeline" + ], + "task": "Recognize flower type" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/contrib/batch_inferencing/tabular-dataset-inference-iris.yml b/how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.yml similarity index 92% rename from contrib/batch_inferencing/tabular-dataset-inference-iris.yml rename to how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.yml index 8bbf1c7ad..6d1c08a8b 100644 --- a/contrib/batch_inferencing/tabular-dataset-inference-iris.yml +++ b/how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.yml @@ -4,3 +4,4 @@ dependencies: - azureml-sdk - azureml-contrib-pipeline-steps - azureml-widgets + - pandas diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/batch_scoring.py b/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/batch_scoring.py deleted file mode 100644 index dfd135bd6..000000000 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/batch_scoring.py +++ /dev/null @@ -1,119 +0,0 @@ -# Copyright (c) Microsoft. All rights reserved. -# Licensed under the MIT license. 
- -import os -import argparse -import datetime -import time -import tensorflow as tf -from math import ceil -import numpy as np -import shutil -from tensorflow.contrib.slim.python.slim.nets import inception_v3 -from azureml.core.model import Model - -slim = tf.contrib.slim - -parser = argparse.ArgumentParser(description="Start a tensorflow model serving") -parser.add_argument('--model_name', dest="model_name", required=True) -parser.add_argument('--label_dir', dest="label_dir", required=True) -parser.add_argument('--dataset_path', dest="dataset_path", required=True) -parser.add_argument('--output_dir', dest="output_dir", required=True) -parser.add_argument('--batch_size', dest="batch_size", type=int, required=True) - -args = parser.parse_args() - -image_size = 299 -num_channel = 3 - -# create output directory if it does not exist -os.makedirs(args.output_dir, exist_ok=True) - - -def get_class_label_dict(label_file): - label = [] - proto_as_ascii_lines = tf.gfile.GFile(label_file).readlines() - for l in proto_as_ascii_lines: - label.append(l.rstrip()) - return label - - -class DataIterator: - def __init__(self, data_dir): - self.file_paths = [] - image_list = os.listdir(data_dir) - # total_size = len(image_list) - self.file_paths = [data_dir + '/' + file_name.rstrip() for file_name in image_list] - - self.labels = [1 for file_name in self.file_paths] - - @property - def size(self): - return len(self.labels) - - def input_pipeline(self, batch_size): - images_tensor = tf.convert_to_tensor(self.file_paths, dtype=tf.string) - labels_tensor = tf.convert_to_tensor(self.labels, dtype=tf.int64) - input_queue = tf.train.slice_input_producer([images_tensor, labels_tensor], shuffle=False) - labels = input_queue[1] - images_content = tf.read_file(input_queue[0]) - - image_reader = tf.image.decode_jpeg(images_content, channels=num_channel, name="jpeg_reader") - float_caster = tf.cast(image_reader, tf.float32) - new_size = tf.constant([image_size, image_size], dtype=tf.int32) - images = tf.image.resize_images(float_caster, new_size) - images = tf.divide(tf.subtract(images, [0]), [255]) - - image_batch, label_batch = tf.train.batch([images, labels], batch_size=batch_size, capacity=5 * batch_size) - return image_batch - - -def main(_): - # start_time = datetime.datetime.now() - label_file_name = os.path.join(args.label_dir, "labels.txt") - label_dict = get_class_label_dict(label_file_name) - classes_num = len(label_dict) - test_feeder = DataIterator(data_dir=args.dataset_path) - total_size = len(test_feeder.labels) - count = 0 - # get model from model registry - model_path = Model.get_model_path(args.model_name) - with tf.Session() as sess: - test_images = test_feeder.input_pipeline(batch_size=args.batch_size) - with slim.arg_scope(inception_v3.inception_v3_arg_scope()): - input_images = tf.placeholder(tf.float32, [args.batch_size, image_size, image_size, num_channel]) - logits, _ = inception_v3.inception_v3(input_images, - num_classes=classes_num, - is_training=False) - probabilities = tf.argmax(logits, 1) - - sess.run(tf.global_variables_initializer()) - sess.run(tf.local_variables_initializer()) - coord = tf.train.Coordinator() - threads = tf.train.start_queue_runners(sess=sess, coord=coord) - saver = tf.train.Saver() - saver.restore(sess, model_path) - out_filename = os.path.join(args.output_dir, "result-labels.txt") - with open(out_filename, "w") as result_file: - i = 0 - while count < total_size and not coord.should_stop(): - test_images_batch = sess.run(test_images) - file_names_batch = 
test_feeder.file_paths[i * args.batch_size: - min(test_feeder.size, (i + 1) * args.batch_size)] - results = sess.run(probabilities, feed_dict={input_images: test_images_batch}) - new_add = min(args.batch_size, total_size - count) - count += new_add - i += 1 - for j in range(new_add): - result_file.write(os.path.basename(file_names_batch[j]) + ": " + label_dict[results[j]] + "\n") - result_file.flush() - coord.request_stop() - coord.join(threads) - - # copy the file to artifacts - shutil.copy(out_filename, "./outputs/") - # Move the processed data out of the blob so that the next run can process the data. - - -if __name__ == "__main__": - tf.app.run() diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb b/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb deleted file mode 100644 index 86866d76a..000000000 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb +++ /dev/null @@ -1,630 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved. \n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Note**: Azure Machine Learning recently released ParallelRunStep for public preview, this will allow for parallelization of your workload across many compute nodes without the difficulty of orchestrating worker pools and queues. See the [batch inference notebooks](../../../contrib/batch_inferencing/) for examples on how to get started." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Using Azure Machine Learning Pipelines for batch prediction\n", - "\n", - "In this notebook we will demonstrate how to run a batch scoring job using Azure Machine Learning pipelines. Our example job will be to take an already-trained image classification model, and run that model on some unlabeled images. The image classification model that we'll use is the __[Inception-V3 model](https://arxiv.org/abs/1512.00567)__ and we'll run this model on unlabeled images from the __[ImageNet](http://image-net.org/)__ dataset. \n", - "\n", - "The outline of this notebook is as follows:\n", - "\n", - "- Register the pretrained inception model into the model registry. \n", - "- Store the dataset images in a blob container.\n", - "- Use the registered model to do batch scoring on the images in the data blob container." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prerequisites\n", - "If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration Notebook located at https://github.com/Azure/MachineLearningNotebooks first if you haven't. This sets you up with a working config file that has information on your workspace, subscription id, etc. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "from azureml.core.compute import AmlCompute, ComputeTarget\n", - "from azureml.core.datastore import Datastore\n", - "from azureml.core.runconfig import CondaDependencies, RunConfiguration\n", - "from azureml.data.data_reference import DataReference\n", - "from azureml.pipeline.core import Pipeline, PipelineData\n", - "from azureml.pipeline.steps import PythonScriptStep" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "from azureml.core import Workspace\n", - "\n", - "ws = Workspace.from_config()\n", - "print('Workspace name: ' + ws.name, \n", - " 'Azure region: ' + ws.location, \n", - " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Set up machine learning resources" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Set up datastores\n", - "First, let\u00e2\u20ac\u2122s access the datastore that has the model, labels, and images. \n", - "\n", - "### Create a datastore that points to a blob container containing sample images\n", - "\n", - "We have created a public blob container `sampledata` on an account named `pipelinedata`, containing images from the ImageNet evaluation set. In the next step, we create a datastore with the name `images_datastore`, which points to this container. In the call to `register_azure_blob_container` below, setting the `overwrite` flag to `True` overwrites any datastore that was created previously with that name. \n", - "\n", - "This step can be changed to point to your blob container by providing your own `datastore_name`, `container_name`, and `account_name`." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "account_name = \"pipelinedata\"\n", - "datastore_name=\"images_datastore\"\n", - "container_name=\"sampledata\"\n", - "\n", - "batchscore_blob = Datastore.register_azure_blob_container(ws, \n", - " datastore_name=datastore_name, \n", - " container_name= container_name, \n", - " account_name=account_name, \n", - " overwrite=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, let\u00e2\u20ac\u2122s specify the default datastore for the outputs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def_data_store = ws.get_default_datastore()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Configure data references\n", - "Now you need to add references to the data, as inputs to the appropriate pipeline steps in your pipeline. A data source in a pipeline is represented by a DataReference object. The DataReference object points to data that lives in, or is accessible from, a datastore. We need DataReference objects corresponding to the following: the directory containing the input images, the directory in which the pretrained model is stored, the directory containing the labels, and the output directory." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "input_images = DataReference(datastore=batchscore_blob, \n", - " data_reference_name=\"input_images\",\n", - " path_on_datastore=\"batchscoring/images\",\n", - " mode=\"download\"\n", - " )\n", - "model_dir = DataReference(datastore=batchscore_blob, \n", - " data_reference_name=\"input_model\",\n", - " path_on_datastore=\"batchscoring/models\",\n", - " mode=\"download\" \n", - " )\n", - "label_dir = DataReference(datastore=batchscore_blob, \n", - " data_reference_name=\"input_labels\",\n", - " path_on_datastore=\"batchscoring/labels\",\n", - " mode=\"download\" \n", - " )\n", - "output_dir = PipelineData(name=\"scores\", \n", - " datastore=def_data_store, \n", - " output_path_on_compute=\"batchscoring/results\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create and attach Compute targets\n", - "Use the below code to create and attach Compute targets. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# choose a name for your cluster\n", - "aml_compute_name = os.environ.get(\"AML_COMPUTE_NAME\", \"gpu-cluster\")\n", - "cluster_min_nodes = os.environ.get(\"AML_COMPUTE_MIN_NODES\", 0)\n", - "cluster_max_nodes = os.environ.get(\"AML_COMPUTE_MAX_NODES\", 1)\n", - "vm_size = os.environ.get(\"AML_COMPUTE_SKU\", \"STANDARD_NC6\")\n", - "\n", - "\n", - "if aml_compute_name in ws.compute_targets:\n", - " compute_target = ws.compute_targets[aml_compute_name]\n", - " if compute_target and type(compute_target) is AmlCompute:\n", - " print('found compute target. just use it. ' + aml_compute_name)\n", - "else:\n", - " print('creating a new compute target...')\n", - " provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled\n", - " vm_priority = 'lowpriority', # optional\n", - " min_nodes = cluster_min_nodes, \n", - " max_nodes = cluster_max_nodes)\n", - "\n", - " # create the cluster\n", - " compute_target = ComputeTarget.create(ws, aml_compute_name, provisioning_config)\n", - " \n", - " # can poll for a minimum number of nodes and for a specific timeout. 
\n", - " # if no min node count is provided it will use the scale settings for the cluster\n", - " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", - " \n", - " # For a more detailed view of current Azure Machine Learning Compute status, use get_status()\n", - " print(compute_target.get_status().serialize())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Prepare the Model" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download the Model\n", - "\n", - "Download and extract the model from http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz to `\"models\"`" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# create directory for model\n", - "model_dir = 'models'\n", - "if not os.path.isdir(model_dir):\n", - " os.mkdir(model_dir)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import tarfile\n", - "import urllib.request\n", - "\n", - "url=\"http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz\"\n", - "response = urllib.request.urlretrieve(url, \"model.tar.gz\")\n", - "tar = tarfile.open(\"model.tar.gz\", \"r:gz\")\n", - "tar.extractall(model_dir)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register the model with Workspace" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import shutil\n", - "from azureml.core.model import Model\n", - "\n", - "# register downloaded model \n", - "model = Model.register(model_path = \"models/inception_v3.ckpt\",\n", - " model_name = \"inception\", # this is the name the model is registered as\n", - " tags = {'pretrained': \"inception\"},\n", - " description = \"Imagenet trained tensorflow inception\",\n", - " workspace = ws)\n", - "# remove the downloaded dir after registration if you wish\n", - "shutil.rmtree(\"models\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Write your scoring script" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To do the scoring, we use a batch scoring script `batch_scoring.py`, which is located in the same directory that this notebook is in. You can take a look at this script to see how you might modify it for your custom batch scoring task.\n", - "\n", - "The python script `batch_scoring.py` takes input images, applies the image classification model to these images, and outputs a classification result to a results file.\n", - "\n", - "The script `batch_scoring.py` takes the following parameters:\n", - "\n", - "- `--model_name`: the name of the model being used, which is expected to be in the `model_dir` directory\n", - "- `--label_dir` : the directory holding the `labels.txt` file \n", - "- `--dataset_path`: the directory containing the input images\n", - "- `--output_dir` : the script will run the model on the data and output a `results-label.txt` to this directory\n", - "- `--batch_size` : the batch size used in running the model.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Build and run the batch scoring pipeline\n", - "You have everything you need to build the pipeline. Let\u00e2\u20ac\u2122s put all these together." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Specify the environment to run the script\n", - "Specify the conda dependencies for your script. You will need this object when you create the pipeline step later on." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.runconfig import DEFAULT_GPU_IMAGE\n", - "\n", - "cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.13.1\", \"azureml-defaults\"])\n", - "\n", - "# Runconfig\n", - "amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n", - "amlcompute_run_config.environment.docker.enabled = True\n", - "amlcompute_run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE\n", - "amlcompute_run_config.environment.spark.precache_packages = False" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Specify the parameters for your pipeline\n", - "A subset of the parameters to the python script can be given as input when we re-run a `PublishedPipeline`. In the current example, we define `batch_size` taken by the script as such parameter." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core.graph import PipelineParameter\n", - "batch_size_param = PipelineParameter(name=\"param_batch_size\", default_value=20)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create the pipeline step\n", - "Create the pipeline step using the script, environment configuration, and parameters. Specify the compute target you already attached to your workspace as the target of execution of the script. We will use PythonScriptStep to create the pipeline step." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "inception_model_name = \"inception_v3.ckpt\"\n", - "\n", - "batch_score_step = PythonScriptStep(\n", - " name=\"batch_scoring\",\n", - " script_name=\"batch_scoring.py\",\n", - " arguments=[\"--dataset_path\", input_images, \n", - " \"--model_name\", \"inception\",\n", - " \"--label_dir\", label_dir, \n", - " \"--output_dir\", output_dir, \n", - " \"--batch_size\", batch_size_param],\n", - " compute_target=compute_target,\n", - " inputs=[input_images, label_dir],\n", - " outputs=[output_dir],\n", - " runconfig=amlcompute_run_config\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run the pipeline\n", - "At this point you can run the pipeline and examine the output it produced. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "pipelineparameterssample" - ] - }, - "outputs": [], - "source": [ - "pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n", - "pipeline_run = Experiment(ws, 'batch_scoring').submit(pipeline, pipeline_parameters={\"param_batch_size\": 20})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor the run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(pipeline_run).show()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pipeline_run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Download and review output" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "step_run = list(pipeline_run.get_children())[0]\n", - "step_run.download_file(\"./outputs/result-labels.txt\")" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "df = pd.read_csv(\"result-labels.txt\", delimiter=\":\", header=None)\n", - "df.columns = [\"Filename\", \"Prediction\"]\n", - "df.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Publish a pipeline and rerun using a REST call" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a published pipeline\n", - "Once you are satisfied with the outcome of the run, you can publish the pipeline to run it with different input values later. When you publish a pipeline, you will get a REST endpoint that accepts invoking of the pipeline with the set of parameters you have already incorporated above using PipelineParameter." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "published_pipeline = pipeline_run.publish_pipeline(\n", - " name=\"Inception_v3_scoring\", description=\"Batch scoring using Inception v3 model\", version=\"1.0\")\n", - "\n", - "published_pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get published pipeline\n", - "\n", - "You can get the published pipeline using **pipeline id**.\n", - "\n", - "To get all the published pipelines for a given workspace(ws): \n", - "```css\n", - "all_pub_pipelines = PublishedPipeline.get_all(ws)\n", - "```" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core import PublishedPipeline\n", - "\n", - "pipeline_id = published_pipeline.id # use your published pipeline id\n", - "published_pipeline = PublishedPipeline.get(ws, pipeline_id)\n", - "\n", - "published_pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Rerun the pipeline using the REST endpoint" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Get AAD token\n", - "[This notebook](https://aka.ms/pl-restep-auth) shows how to authenticate to AML workspace." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.authentication import InteractiveLoginAuthentication\n", - "import requests\n", - "\n", - "auth = InteractiveLoginAuthentication()\n", - "aad_token = auth.get_authentication_header()\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Run published pipeline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "rest_endpoint = published_pipeline.endpoint\n", - "# specify batch size when running the pipeline\n", - "response = requests.post(rest_endpoint, \n", - " headers=aad_token, \n", - " json={\"ExperimentName\": \"batch_scoring\",\n", - " \"ParameterAssignments\": {\"param_batch_size\": 50}})" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "try:\n", - " response.raise_for_status()\n", - "except Exception: \n", - " raise Exception('Received bad response from the endpoint: {}\\n'\n", - " 'Response Code: {}\\n'\n", - " 'Headers: {}\\n'\n", - " 'Content: {}'.format(rest_endpoint, response.status_code, response.headers, response.content))\n", - "\n", - "run_id = response.json().get('Id')\n", - "print('Submitted pipeline run: ', run_id)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Monitor the new run" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.pipeline.core.run import PipelineRun\n", - "published_pipeline_run = PipelineRun(ws.experiments[\"batch_scoring\"], run_id)\n", - "\n", - "RunDetails(published_pipeline_run).show()" - ] - } - ], - "metadata": { - "authors": [ - { - "name": "sanpil" - } - ], - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.7" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.yml b/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.yml deleted file mode 100644 index ac67d296d..000000000 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.yml +++ /dev/null @@ -1,7 +0,0 @@ -name: pipeline-batch-scoring -dependencies: -- pip: - - azureml-sdk - - azureml-widgets - - pandas - - requests diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/neural_style_mpi.py b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/neural_style_mpi.py deleted file mode 100644 index d73f330ae..000000000 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/neural_style_mpi.py +++ /dev/null @@ -1,207 +0,0 @@ -# Original source: https://github.com/pytorch/examples/blob/master/fast_neural_style/neural_style/neural_style.py -import argparse -import os -import sys -import re - -from PIL import Image -import torch -from torchvision import transforms - -from mpi4py import MPI - - -def load_image(filename, size=None, scale=None): - img = Image.open(filename) - if size is not None: - img = img.resize((size, size), 
Image.ANTIALIAS) - elif scale is not None: - img = img.resize((int(img.size[0] / scale), int(img.size[1] / scale)), Image.ANTIALIAS) - return img - - -def save_image(filename, data): - img = data.clone().clamp(0, 255).numpy() - img = img.transpose(1, 2, 0).astype("uint8") - img = Image.fromarray(img) - img.save(filename) - - -class TransformerNet(torch.nn.Module): - def __init__(self): - super(TransformerNet, self).__init__() - # Initial convolution layers - self.conv1 = ConvLayer(3, 32, kernel_size=9, stride=1) - self.in1 = torch.nn.InstanceNorm2d(32, affine=True) - self.conv2 = ConvLayer(32, 64, kernel_size=3, stride=2) - self.in2 = torch.nn.InstanceNorm2d(64, affine=True) - self.conv3 = ConvLayer(64, 128, kernel_size=3, stride=2) - self.in3 = torch.nn.InstanceNorm2d(128, affine=True) - # Residual layers - self.res1 = ResidualBlock(128) - self.res2 = ResidualBlock(128) - self.res3 = ResidualBlock(128) - self.res4 = ResidualBlock(128) - self.res5 = ResidualBlock(128) - # Upsampling Layers - self.deconv1 = UpsampleConvLayer(128, 64, kernel_size=3, stride=1, upsample=2) - self.in4 = torch.nn.InstanceNorm2d(64, affine=True) - self.deconv2 = UpsampleConvLayer(64, 32, kernel_size=3, stride=1, upsample=2) - self.in5 = torch.nn.InstanceNorm2d(32, affine=True) - self.deconv3 = ConvLayer(32, 3, kernel_size=9, stride=1) - # Non-linearities - self.relu = torch.nn.ReLU() - - def forward(self, X): - y = self.relu(self.in1(self.conv1(X))) - y = self.relu(self.in2(self.conv2(y))) - y = self.relu(self.in3(self.conv3(y))) - y = self.res1(y) - y = self.res2(y) - y = self.res3(y) - y = self.res4(y) - y = self.res5(y) - y = self.relu(self.in4(self.deconv1(y))) - y = self.relu(self.in5(self.deconv2(y))) - y = self.deconv3(y) - return y - - -class ConvLayer(torch.nn.Module): - def __init__(self, in_channels, out_channels, kernel_size, stride): - super(ConvLayer, self).__init__() - reflection_padding = kernel_size // 2 - self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding) - self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride) - - def forward(self, x): - out = self.reflection_pad(x) - out = self.conv2d(out) - return out - - -class ResidualBlock(torch.nn.Module): - """ResidualBlock - introduced in: https://arxiv.org/abs/1512.03385 - recommended architecture: http://torch.ch/blog/2016/02/04/resnets.html - """ - - def __init__(self, channels): - super(ResidualBlock, self).__init__() - self.conv1 = ConvLayer(channels, channels, kernel_size=3, stride=1) - self.in1 = torch.nn.InstanceNorm2d(channels, affine=True) - self.conv2 = ConvLayer(channels, channels, kernel_size=3, stride=1) - self.in2 = torch.nn.InstanceNorm2d(channels, affine=True) - self.relu = torch.nn.ReLU() - - def forward(self, x): - residual = x - out = self.relu(self.in1(self.conv1(x))) - out = self.in2(self.conv2(out)) - out = out + residual - return out - - -class UpsampleConvLayer(torch.nn.Module): - """UpsampleConvLayer - Upsamples the input and then does a convolution. This method gives better results - compared to ConvTranspose2d. 
- ref: http://distill.pub/2016/deconv-checkerboard/ - """ - - def __init__(self, in_channels, out_channels, kernel_size, stride, upsample=None): - super(UpsampleConvLayer, self).__init__() - self.upsample = upsample - if upsample: - self.upsample_layer = torch.nn.Upsample(mode='nearest', scale_factor=upsample) - reflection_padding = kernel_size // 2 - self.reflection_pad = torch.nn.ReflectionPad2d(reflection_padding) - self.conv2d = torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride) - - def forward(self, x): - x_in = x - if self.upsample: - x_in = self.upsample_layer(x_in) - out = self.reflection_pad(x_in) - out = self.conv2d(out) - return out - - -def stylize(args, comm): - - rank = comm.Get_rank() - size = comm.Get_size() - - device = torch.device("cuda" if args.cuda else "cpu") - with torch.no_grad(): - style_model = TransformerNet() - state_dict = torch.load(os.path.join(args.model_dir, args.style + ".pth")) - # remove saved deprecated running_* keys in InstanceNorm from the checkpoint - for k in list(state_dict.keys()): - if re.search(r'in\d+\.running_(mean|var)$', k): - del state_dict[k] - style_model.load_state_dict(state_dict) - style_model.to(device) - - filenames = os.listdir(args.content_dir) - filenames = sorted(filenames) - partition_size = len(filenames) // size - partitioned_filenames = filenames[rank * partition_size: (rank + 1) * partition_size] - print("RANK {} - is processing {} images out of the total {}".format(rank, len(partitioned_filenames), - len(filenames))) - - output_paths = [] - for filename in partitioned_filenames: - # print("Processing {}".format(filename)) - full_path = os.path.join(args.content_dir, filename) - content_image = load_image(full_path, scale=args.content_scale) - content_transform = transforms.Compose([ - transforms.ToTensor(), - transforms.Lambda(lambda x: x.mul(255)) - ]) - content_image = content_transform(content_image) - content_image = content_image.unsqueeze(0).to(device) - - output = style_model(content_image).cpu() - - output_path = os.path.join(args.output_dir, filename) - save_image(output_path, output[0]) - - output_paths.append(output_path) - - print("RANK {} - number of pre-aggregated output files {}".format(rank, len(output_paths))) - - output_paths_list = comm.gather(output_paths, root=0) - - if rank == 0: - print("RANK {} - number of aggregated output files {}".format(rank, len(output_paths_list))) - print("RANK {} - end".format(rank)) - - -def main(): - arg_parser = argparse.ArgumentParser(description="parser for fast-neural-style") - - arg_parser.add_argument("--content-scale", type=float, default=None, - help="factor for scaling down the content image") - arg_parser.add_argument("--model-dir", type=str, required=True, - help="saved model to be used for stylizing the image.") - arg_parser.add_argument("--cuda", type=int, required=True, - help="set it to 1 for running on GPU, 0 for CPU") - arg_parser.add_argument("--style", type=str, help="style name") - arg_parser.add_argument("--content-dir", type=str, required=True, - help="directory holding the images") - arg_parser.add_argument("--output-dir", type=str, required=True, - help="directory holding the output images") - args = arg_parser.parse_args() - - comm = MPI.COMM_WORLD - - if args.cuda and not torch.cuda.is_available(): - print("ERROR: cuda is not available, try running on CPU") - sys.exit(1) - os.makedirs(args.output_dir, exist_ok=True) - stylize(args, comm) - - -if __name__ == "__main__": - main() diff --git 
a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb index 9dbaf1f3e..0643b8a9d 100644 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb +++ b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb @@ -16,13 +16,6 @@ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.png)" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Note**: Azure Machine Learning recently released ParallelRunStep for public preview, this will allow for parallelization of your workload across many compute nodes without the difficulty of orchestrating worker pools and queues. See the [batch inference notebooks](../../../contrib/batch_inferencing/) for examples on how to get started." - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -31,7 +24,13 @@ "Using modified code from `pytorch`'s neural style [example](https://pytorch.org/tutorials/advanced/neural_style_tutorial.html), we show how to setup a pipeline for doing style transfer on video. The pipeline has following steps:\n", "1. Split a video into images\n", "2. Run neural style on each image using one of the provided models (from `pytorch` pretrained models for this example).\n", - "3. Stitch the image back into a video." + "3. Stitch the image back into a video.\n", + "\n", + "> **Note**\n", + "This notebook uses public preview functionality (ParallelRunStep). Please install azureml-contrib-pipeline-steps package before running this notebook.\n", + "```\n", + "pip install azureml-contrib-pipeline-steps\n", + "```" ] }, { @@ -57,19 +56,25 @@ "metadata": {}, "outputs": [], "source": [ - "import os\n", + "# Check core SDK version number\n", + "import azureml.core\n", + "\n", + "print(\"SDK version:\", azureml.core.VERSION)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ "from azureml.core import Workspace, Experiment\n", "\n", "ws = Workspace.from_config()\n", "print('Workspace name: ' + ws.name, \n", " 'Azure region: ' + ws.location, \n", " 'Subscription id: ' + ws.subscription_id, \n", - " 'Resource group: ' + ws.resource_group, sep = '\\n')\n", - "\n", - "scripts_folder = \"scripts_folder\"\n", - "\n", - "if not os.path.isdir(scripts_folder):\n", - " os.mkdir(scripts_folder)" + " 'Resource group: ' + ws.resource_group, sep = '\\n')" ] }, { @@ -82,11 +87,96 @@ "from azureml.core.datastore import Datastore\n", "from azureml.data.data_reference import DataReference\n", "from azureml.pipeline.core import Pipeline, PipelineData\n", - "from azureml.pipeline.steps import PythonScriptStep, MpiStep\n", + "from azureml.pipeline.steps import PythonScriptStep\n", "from azureml.core.runconfig import CondaDependencies, RunConfiguration\n", "from azureml.core.compute_target import ComputeTargetException" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Download models" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "# create directory for model\n", + "model_dir = 'models'\n", + "if not os.path.isdir(model_dir):\n", + " os.mkdir(model_dir)" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import urllib.request\n", + "\n", + "def download_model(model_name):\n", + " # downloaded models from https://pytorch.org/tutorials/advanced/neural_style_tutorial.html are kept here\n", + " url=\"https://pipelinedata.blob.core.windows.net/styletransfer/saved_models/\" + model_name\n", + " local_path = os.path.join(model_dir, model_name)\n", + " urllib.request.urlretrieve(url, local_path)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Register all Models" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.model import Model\n", + "mosaic_model = None\n", + "candy_model = None\n", + "\n", + "models = Model.list(workspace=ws, tags=['scenario'])\n", + "for m in models:\n", + " print(\"Name:\", m.name,\"\\tVersion:\", m.version, \"\\tDescription:\", m.description, m.tags)\n", + " if m.name == 'mosaic' and mosaic_model is None:\n", + " mosaic_model = m\n", + " elif m.name == 'candy' and candy_model is None:\n", + " candy_model = m\n", + "\n", + "if mosaic_model is None:\n", + " print('Mosaic model does not exist, registering it')\n", + " download_model('mosaic.pth')\n", + " mosaic_model = Model.register(model_path = os.path.join(model_dir, \"mosaic.pth\"),\n", + " model_name = \"mosaic\",\n", + " tags = {'type': \"mosaic\", 'scenario': \"Style transfer using batch inference\"},\n", + " description = \"Style transfer - Mosaic\",\n", + " workspace = ws)\n", + "else:\n", + " print('Reusing existing mosaic model')\n", + " \n", + "\n", + "if candy_model is None:\n", + " print('Candy model does not exist, registering it')\n", + " download_model('candy.pth')\n", + " candy_model = Model.register(model_path = os.path.join(model_dir, \"candy.pth\"),\n", + " model_name = \"candy\",\n", + " tags = {'type': \"candy\", 'scenario': \"Style transfer using batch inference\"},\n", + " description = \"Style transfer - Candy\",\n", + " workspace = ws)\n", + "else:\n", + " print('Reusing existing candy model')" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -122,7 +212,7 @@ "except ComputeTargetException:\n", " print(\"creating new cluster\")\n", " provisioning_config = AmlCompute.provisioning_configuration(vm_size = \"STANDARD_NC6\",\n", - " max_nodes = 3)\n", + " max_nodes = 3)\n", "\n", " # create the cluster\n", " gpu_cluster = ComputeTarget.create(ws, gpu_cluster_name, provisioning_config)\n", @@ -145,8 +235,7 @@ "metadata": {}, "outputs": [], "source": [ - "import shutil\n", - "shutil.copy(\"neural_style_mpi.py\", scripts_folder)" + "scripts_folder = \"scripts\"" ] }, { @@ -155,31 +244,11 @@ "metadata": {}, "outputs": [], "source": [ - "%%writefile $scripts_folder/process_video.py\n", - "import argparse\n", - "import glob\n", - "import os\n", - "import subprocess\n", - "\n", - "parser = argparse.ArgumentParser(description=\"Process input video\")\n", - "parser.add_argument('--input_video', required=True)\n", - "parser.add_argument('--output_audio', required=True)\n", - "parser.add_argument('--output_images', required=True)\n", - "\n", - "args = parser.parse_args()\n", - "\n", - "os.makedirs(args.output_audio, exist_ok=True)\n", - "os.makedirs(args.output_images, exist_ok=True)\n", - "\n", - "subprocess.run(\"ffmpeg -i {} {}/video.aac\"\n", - " .format(args.input_video, args.output_audio),\n", - " shell=True, check=True\n", - " )\n", + "process_video_script_file = \"process_video.py\"\n", "\n", - 
"subprocess.run(\"ffmpeg -i {} {}/%05d_video.jpg -hide_banner\"\n", - " .format(args.input_video, args.output_images),\n", - " shell=True, check=True\n", - " )" + "# peek at contents\n", + "with open(os.path.join(scripts_folder, process_video_script_file)) as process_video_file:\n", + " print(process_video_file.read())" ] }, { @@ -188,31 +257,11 @@ "metadata": {}, "outputs": [], "source": [ - "%%writefile $scripts_folder/stitch_video.py\n", - "import argparse\n", - "import os\n", - "import subprocess\n", - "\n", - "parser = argparse.ArgumentParser(description=\"Process input video\")\n", - "parser.add_argument('--images_dir', required=True)\n", - "parser.add_argument('--input_audio', required=True)\n", - "parser.add_argument('--output_dir', required=True)\n", - "\n", - "args = parser.parse_args()\n", - "\n", - "os.makedirs(args.output_dir, exist_ok=True)\n", + "stitch_video_script_file = \"stitch_video.py\"\n", "\n", - "subprocess.run(\"ffmpeg -framerate 30 -i {}/%05d_video.jpg -c:v libx264 -profile:v high -crf 20 -pix_fmt yuv420p \"\n", - " \"-y {}/video_without_audio.mp4\"\n", - " .format(args.images_dir, args.output_dir),\n", - " shell=True, check=True\n", - " )\n", - "\n", - "subprocess.run(\"ffmpeg -i {}/video_without_audio.mp4 -i {}/video.aac -map 0:0 -map 1:0 -vcodec \"\n", - " \"copy -acodec copy -y {}/video_with_audio.mp4\"\n", - " .format(args.output_dir, args.input_audio, args.output_dir),\n", - " shell=True, check=True\n", - " )" + "# peek at contents\n", + "with open(os.path.join(scripts_folder, stitch_video_script_file)) as stitch_video_file:\n", + " print(stitch_video_file.read())" ] }, { @@ -233,15 +282,6 @@ "video_ds = Datastore.register_azure_blob_container(ws, \"videos\", \"sample-videos\",\n", " account_name=account_name, overwrite=True)\n", "\n", - "# datastore for models\n", - "models_ds = Datastore.register_azure_blob_container(ws, \"models\", \"styletransfer\", \n", - " account_name=\"pipelinedata\", \n", - " overwrite=True)\n", - " \n", - "# downloaded models from https://pytorch.org/tutorials/advanced/neural_style_tutorial.html are kept here\n", - "models_dir = DataReference(data_reference_name=\"models\", datastore=models_ds, \n", - " path_on_datastore=\"saved_models\", mode=\"download\")\n", - "\n", "# the default blob store attached to a workspace\n", "default_datastore = ws.get_default_datastore()" ] @@ -274,15 +314,10 @@ "cd = CondaDependencies()\n", "\n", "cd.add_channel(\"conda-forge\")\n", - "cd.add_conda_package(\"ffmpeg\")\n", - "\n", - "cd.add_channel(\"pytorch\")\n", - "cd.add_conda_package(\"pytorch\")\n", - "cd.add_conda_package(\"torchvision\")\n", + "cd.add_conda_package(\"ffmpeg==4.0.2\")\n", "\n", "# Runconfig\n", "amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n", - "amlcompute_run_config.environment.docker.enabled = True\n", "amlcompute_run_config.environment.docker.base_image = \"pytorch/pytorch\"\n", "amlcompute_run_config.environment.spark.precache_packages = False" ] @@ -294,9 +329,12 @@ "outputs": [], "source": [ "ffmpeg_audio = PipelineData(name=\"ffmpeg_audio\", datastore=default_datastore)\n", - "ffmpeg_images = PipelineData(name=\"ffmpeg_images\", datastore=default_datastore)\n", "processed_images = PipelineData(name=\"processed_images\", datastore=default_datastore)\n", - "output_video = PipelineData(name=\"output_video\", datastore=default_datastore)" + "output_video = PipelineData(name=\"output_video\", datastore=default_datastore)\n", + "\n", + "ffmpeg_images_ds_name = \"ffmpeg_images_data\"\n", + "ffmpeg_images = 
PipelineData(name=\"ffmpeg_images\", datastore=default_datastore)\n", + "ffmpeg_images_file_dataset = ffmpeg_images.as_dataset()" ] }, { @@ -304,7 +342,10 @@ "metadata": {}, "source": [ "# Define tweakable parameters to pipeline\n", - "These parameters can be changed when the pipeline is published and rerun from a REST call" + "These parameters can be changed when the pipeline is published and rerun from a REST call.\n", + "As part of ParallelRunStep following 2 pipeline parameters will be created which can be used to override values.\n", + " node_count\n", + " process_count_per_node" ] }, { @@ -314,10 +355,8 @@ "outputs": [], "source": [ "from azureml.pipeline.core.graph import PipelineParameter\n", - "# create a parameter for style (one of \"candy\", \"mosaic\", \"rain_princess\", \"udnie\") to transfer the images to\n", - "style_param = PipelineParameter(name=\"style\", default_value=\"mosaic\")\n", - "# create a parameter for the number of nodes to use in step no. 2 (style transfer)\n", - "nodecount_param = PipelineParameter(name=\"nodecount\", default_value=1)" + "# create a parameter for style (one of \"candy\", \"mosaic\") to transfer the images to\n", + "style_param = PipelineParameter(name=\"style\", default_value=\"mosaic\")" ] }, { @@ -331,36 +370,15 @@ " script_name=\"process_video.py\",\n", " arguments=[\"--input_video\", orangutan_video,\n", " \"--output_audio\", ffmpeg_audio,\n", - " \"--output_images\", ffmpeg_images,\n", + " \"--output_images\", ffmpeg_images_file_dataset,\n", " ],\n", " compute_target=cpu_cluster,\n", " inputs=[orangutan_video],\n", - " outputs=[ffmpeg_images, ffmpeg_audio],\n", + " outputs=[ffmpeg_images_file_dataset, ffmpeg_audio],\n", " runconfig=amlcompute_run_config,\n", " source_directory=scripts_folder\n", ")\n", "\n", - "# create a MPI step for distributing style transfer step across multiple nodes in AmlCompute \n", - "# using 'nodecount_param' PipelineParameter\n", - "distributed_style_transfer_step = MpiStep(\n", - " name=\"mpi style transfer\",\n", - " script_name=\"neural_style_mpi.py\",\n", - " arguments=[\"--content-dir\", ffmpeg_images,\n", - " \"--output-dir\", processed_images,\n", - " \"--model-dir\", models_dir,\n", - " \"--style\", style_param,\n", - " \"--cuda\", 1\n", - " ],\n", - " compute_target=gpu_cluster,\n", - " node_count=nodecount_param, \n", - " process_count_per_node=1,\n", - " inputs=[models_dir, ffmpeg_images],\n", - " outputs=[processed_images],\n", - " pip_packages=[\"mpi4py\", \"torch\", \"torchvision\"],\n", - " use_gpu=True,\n", - " source_directory=scripts_folder\n", - ")\n", - "\n", "stitch_video_step = PythonScriptStep(\n", " name=\"stitch\",\n", " script_name=\"stitch_video.py\",\n", @@ -379,7 +397,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Run the pipeline" + "# Create environment, parallel step run config and parallel run step" ] }, { @@ -388,16 +406,19 @@ "metadata": {}, "outputs": [], "source": [ - "pipeline = Pipeline(workspace=ws, steps=[stitch_video_step])\n", - "# submit the pipeline and provide values for the PipelineParameters used in the pipeline\n", - "pipeline_run = Experiment(ws, 'style_transfer').submit(pipeline, pipeline_parameters={\"style\": \"mosaic\", \"nodecount\": 3})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Monitor using widget" + "from azureml.core import Environment\n", + "from azureml.core.runconfig import DEFAULT_GPU_IMAGE\n", + "\n", + "parallel_cd = CondaDependencies()\n", + "\n", + "parallel_cd.add_channel(\"pytorch\")\n", + 
"parallel_cd.add_conda_package(\"pytorch\")\n", + "parallel_cd.add_conda_package(\"torchvision\")\n", + "parallel_cd.add_conda_package(\"pillow<7\") # needed for torchvision==0.4.0\n", + "\n", + "styleenvironment = Environment(name=\"styleenvironment\")\n", + "styleenvironment.python.conda_dependencies=parallel_cd\n", + "styleenvironment.docker.base_image = DEFAULT_GPU_IMAGE" ] }, { @@ -406,22 +427,48 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.widgets import RunDetails\n", - "RunDetails(pipeline_run).show()" + "from azureml.contrib.pipeline.steps import ParallelRunConfig\n", + "\n", + "parallel_run_config = ParallelRunConfig(\n", + " environment=styleenvironment,\n", + " entry_script='transform.py',\n", + " output_action='summary_only',\n", + " mini_batch_size=\"1\",\n", + " error_threshold=1,\n", + " source_directory=scripts_folder,\n", + " compute_target=gpu_cluster, \n", + " node_count=3)" ] }, { - "cell_type": "markdown", + "cell_type": "code", + "execution_count": null, "metadata": {}, + "outputs": [], "source": [ - "Downloads the video in `output_video` folder" + "from azureml.contrib.pipeline.steps import ParallelRunStep\n", + "from datetime import datetime\n", + "\n", + "parallel_step_name = 'styletransfer-' + datetime.now().strftime('%Y%m%d%H%M')\n", + "\n", + "distributed_style_transfer_step = ParallelRunStep(\n", + " name=parallel_step_name,\n", + " inputs=[ffmpeg_images_file_dataset], # Input file share/blob container/file dataset\n", + " output=processed_images, # Output file share/blob container\n", + " models=[mosaic_model, candy_model],\n", + " tags = {'scenario': \"batch inference\", 'type': \"demo\"},\n", + " properties = {'area': \"style transfer\"},\n", + " arguments=[\"--style\", style_param],\n", + " parallel_run_config=parallel_run_config,\n", + " allow_reuse=True #[optional - default value True]\n", + ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Download output video" + "# Run the pipeline" ] }, { @@ -430,10 +477,9 @@ "metadata": {}, "outputs": [], "source": [ - "def download_video(run, target_dir=None):\n", - " stitch_run = run.find_step_run(\"stitch\")[0]\n", - " port_data = stitch_run.get_output_data(\"output_video\")\n", - " port_data.download(target_dir, show_progress=True)" + "pipeline = Pipeline(workspace=ws, steps=[stitch_video_step])\n", + "\n", + "pipeline.validate()" ] }, { @@ -442,15 +488,15 @@ "metadata": {}, "outputs": [], "source": [ - "pipeline_run.wait_for_completion()\n", - "download_video(pipeline_run, \"output_video_mosaic\")" + "# submit the pipeline and provide values for the PipelineParameters used in the pipeline\n", + "pipeline_run = Experiment(ws, 'styletransfer_parallel_mosaic').submit(pipeline)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Publish pipeline" + "# Monitor using widget" ] }, { @@ -459,24 +505,9 @@ "metadata": {}, "outputs": [], "source": [ - "published_pipeline = pipeline_run.publish_pipeline(\n", - " name=\"batch score style transfer\", description=\"style transfer\", version=\"1.0\")\n", - "\n", - "published_pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Get published pipeline\n", - "\n", - "You can get the published pipeline using **pipeline id**.\n", - "\n", - "To get all the published pipelines for a given workspace(ws): \n", - "```css\n", - "all_pub_pipelines = PublishedPipeline.get_all(ws)\n", - "```" + "# Track pipeline run progress\n", + "from azureml.widgets import RunDetails\n", + 
"RunDetails(pipeline_run).show()" ] }, { @@ -485,27 +516,21 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.pipeline.core import PublishedPipeline\n", - "\n", - "pipeline_id = published_pipeline.id # use your published pipeline id\n", - "published_pipeline = PublishedPipeline.get(ws, pipeline_id)\n", - "\n", - "published_pipeline" + "pipeline_run.wait_for_completion()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Re-run pipeline through REST calls for other styles" + "Downloads the video in `output_video` folder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Get AAD token\n", - "[This notebook](https://aka.ms/pl-restep-auth) shows how to authenticate to AML workspace." + "# Download output video" ] }, { @@ -514,18 +539,10 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.authentication import InteractiveLoginAuthentication\n", - "import requests\n", - "\n", - "auth = InteractiveLoginAuthentication()\n", - "aad_token = auth.get_authentication_header()\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Get endpoint URL" + "def download_video(run, target_dir=None):\n", + " stitch_run = run.find_step_run(\"stitch\")[0]\n", + " port_data = stitch_run.get_output_data(\"output_video\")\n", + " port_data.download(target_dir, show_progress=True)" ] }, { @@ -534,21 +551,15 @@ "metadata": {}, "outputs": [], "source": [ - "rest_endpoint = published_pipeline.endpoint" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Send request and monitor" + "pipeline_run.wait_for_completion()\n", + "download_video(pipeline_run, \"output_video_mosaic\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Run the pipeline using PipelineParameter values style='candy' and nodecount=2" + "# Publish pipeline" ] }, { @@ -557,28 +568,21 @@ "metadata": {}, "outputs": [], "source": [ - "response = requests.post(rest_endpoint, \n", - " headers=aad_token,\n", - " json={\"ExperimentName\": \"style_transfer\",\n", - " \"ParameterAssignments\": {\"style\": \"candy\", \"nodecount\": 2}})" + "pipeline_name = \"style-transfer-batch-inference\"\n", + "print(pipeline_name)\n", + "\n", + "published_pipeline = pipeline.publish(\n", + " name=pipeline_name, \n", + " description=pipeline_name)\n", + "print(\"Newly published pipeline id: {}\".format(published_pipeline.id))" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "try:\n", - " response.raise_for_status()\n", - "except Exception: \n", - " raise Exception('Received bad response from the endpoint: {}\\n'\n", - " 'Response Code: {}\\n'\n", - " 'Headers: {}\\n'\n", - " 'Content: {}'.format(rest_endpoint, response.status_code, response.headers, response.content))\n", - "\n", - "run_id = response.json().get('Id')\n", - "print('Submitted pipeline run: ', run_id)" + "# Get published pipeline\n", + "This is another way to get the published pipeline." 
] }, { @@ -587,16 +591,32 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.pipeline.core.run import PipelineRun\n", - "published_pipeline_run_candy = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", - "RunDetails(published_pipeline_run_candy).show()" + "from azureml.pipeline.core import PublishedPipeline\n", + "\n", + "# You could retrieve all pipelines that are published, or \n", + "# just get the published pipeline object that you have the ID for.\n", + "\n", + "# Get all published pipeline objects in the workspace\n", + "all_pub_pipelines = PublishedPipeline.list(ws)\n", + "\n", + "# We will iterate through the list of published pipelines and \n", + "# use the last ID in the list for Schelue operations: \n", + "print(\"Published pipelines found in the workspace:\")\n", + "for pub_pipeline in all_pub_pipelines:\n", + " print(\"Name:\", pub_pipeline.name,\"\\tDescription:\", pub_pipeline.description, \"\\tId:\", pub_pipeline.id, \"\\tStatus:\", pub_pipeline.status)\n", + " if(pub_pipeline.name == pipeline_name):\n", + " published_pipeline = pub_pipeline\n", + "\n", + "print(\"Published pipeline id: {}\".format(published_pipeline.id))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Run the pipeline using PipelineParameter values style='rain_princess' and nodecount=3" + "# Run pipeline through REST calls for other styles\n", + "\n", + "# Get AAD token" ] }, { @@ -605,28 +625,18 @@ "metadata": {}, "outputs": [], "source": [ - "response = requests.post(rest_endpoint, \n", - " headers=aad_token,\n", - " json={\"ExperimentName\": \"style_transfer\",\n", - " \"ParameterAssignments\": {\"style\": \"rain_princess\", \"nodecount\": 3}})" + "from azureml.core.authentication import InteractiveLoginAuthentication\n", + "import requests\n", + "\n", + "auth = InteractiveLoginAuthentication()\n", + "aad_token = auth.get_authentication_header()" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", "metadata": {}, - "outputs": [], "source": [ - "try:\n", - " response.raise_for_status()\n", - "except Exception: \n", - " raise Exception('Received bad response from the endpoint: {}\\n'\n", - " 'Response Code: {}\\n'\n", - " 'Headers: {}\\n'\n", - " 'Content: {}'.format(rest_endpoint, response.status_code, response.headers, response.content))\n", - "\n", - "run_id = response.json().get('Id')\n", - "print('Submitted pipeline run: ', run_id)" + "# Get endpoint URL" ] }, { @@ -635,15 +645,15 @@ "metadata": {}, "outputs": [], "source": [ - "published_pipeline_run_rain = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", - "RunDetails(published_pipeline_run_rain).show()" + "rest_endpoint = published_pipeline.endpoint\n", + "print(\"Pipeline REST endpoing: {}\".format(rest_endpoint))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Run the pipeline using PipelineParameter values style='udnie' and nodecount=4" + "# Send request and monitor" ] }, { @@ -652,45 +662,24 @@ "metadata": {}, "outputs": [], "source": [ + "experiment_name = 'styletransfer_parallel_candy'\n", "response = requests.post(rest_endpoint, \n", " headers=aad_token,\n", - " json={\"ExperimentName\": \"style_transfer\",\n", - " \"ParameterAssignments\": {\"style\": \"udnie\", \"nodecount\": 3}})\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "try:\n", - " response.raise_for_status()\n", - "except Exception: \n", - " raise Exception('Received bad response from the endpoint: {}\\n'\n", - " 
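The submission cell that follows reads the run ID straight out of the JSON body, so a failed POST (expired token, mistyped parameter name) surfaces only as a KeyError. A minimal check, assuming `response` is the object returned by `requests.post` in that cell:

```python
# Fail fast with the endpoint's status and body instead of a KeyError on "Id".
if not response.ok:
    raise RuntimeError("Pipeline submission failed ({}): {}".format(response.status_code, response.text))

run_id = response.json()["Id"]
print("Submitted pipeline run:", run_id)
```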
'Response Code: {}\\n'\n", - " 'Headers: {}\\n'\n", - " 'Content: {}'.format(rest_endpoint, response.status_code, response.headers, response.content))\n", + " json={\"ExperimentName\": experiment_name,\n", + " \"ParameterAssignments\": {\"style\": \"candy\", \"aml_node_count\": 2}})\n", + "run_id = response.json()[\"Id\"]\n", "\n", - "run_id = response.json().get('Id')\n", - "print('Submitted pipeline run: ', run_id)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "published_pipeline_run_udnie = PipelineRun(ws.experiments[\"style_transfer\"], run_id)\n", - "RunDetails(published_pipeline_run_udnie).show()" + "from azureml.pipeline.core.run import PipelineRun\n", + "published_pipeline_run_candy = PipelineRun(ws.experiments[experiment_name], run_id)\n", + "\n", + "RunDetails(published_pipeline_run_candy).show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Download output from re-run" + "# Download output from re-run" ] }, { @@ -699,9 +688,7 @@ "metadata": {}, "outputs": [], "source": [ - "published_pipeline_run_candy.wait_for_completion()\n", - "published_pipeline_run_rain.wait_for_completion()\n", - "published_pipeline_run_udnie.wait_for_completion()" + "published_pipeline_run_candy.wait_for_completion()" ] }, { @@ -710,18 +697,30 @@ "metadata": {}, "outputs": [], "source": [ - "download_video(published_pipeline_run_candy, target_dir=\"output_video_candy\")\n", - "download_video(published_pipeline_run_rain, target_dir=\"output_video_rain_princess\")\n", - "download_video(published_pipeline_run_udnie, target_dir=\"output_video_udnie\")" + "download_video(published_pipeline_run_candy, target_dir=\"output_video_candy\")" ] } ], "metadata": { "authors": [ { - "name": "sanpil" + "name": "sanpil joringer asraniwa pansav tracych" } ], + "category": "Other notebooks", + "compute": [ + "AML Compute" + ], + "datasets": [], + "deployment": [ + "None" + ], + "exclude_from_index": true, + "framework": [ + "None" + ], + "friendly_name": "Style transfer using ParallelRunStep", + "index_order": 1, "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -737,8 +736,13 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.7" - } + "version": "3.6.9" + }, + "tags": [ + "Batch Inferencing", + "Pipeline" + ], + "task": "Style transfer" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.yml b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.yml index a77e69222..6e5fcd7e7 100644 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.yml +++ b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.yml @@ -2,5 +2,6 @@ name: pipeline-style-transfer dependencies: - pip: - azureml-sdk + - azureml-contrib-pipeline-steps - azureml-widgets - requests diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/scripts/process_video.py b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/scripts/process_video.py new file mode 100644 index 000000000..1148f5330 --- /dev/null +++ b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/scripts/process_video.py @@ -0,0 +1,22 @@ +import argparse +import glob +import os +import subprocess + +parser = argparse.ArgumentParser(description="Process input 
video") +parser.add_argument('--input_video', required=True) +parser.add_argument('--output_audio', required=True) +parser.add_argument('--output_images', required=True) + +args = parser.parse_args() + +os.makedirs(args.output_audio, exist_ok=True) +os.makedirs(args.output_images, exist_ok=True) + +subprocess.run("ffmpeg -i {} {}/video.aac".format(args.input_video, args.output_audio), + shell=True, + check=True) + +subprocess.run("ffmpeg -i {} {}/%05d_video.jpg -hide_banner".format(args.input_video, args.output_images), + shell=True, + check=True) diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/scripts/stitch_video.py b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/scripts/stitch_video.py new file mode 100644 index 000000000..ce237772c --- /dev/null +++ b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/scripts/stitch_video.py @@ -0,0 +1,22 @@ +import argparse +import os +import subprocess + +parser = argparse.ArgumentParser(description="Process input video") +parser.add_argument('--images_dir', required=True) +parser.add_argument('--input_audio', required=True) +parser.add_argument('--output_dir', required=True) + +args = parser.parse_args() + +os.makedirs(args.output_dir, exist_ok=True) + +subprocess.run("ffmpeg -framerate 30 -i {}/%05d_video.jpg -c:v libx264 -profile:v high -crf 20 -pix_fmt yuv420p " + "-y {}/video_without_audio.mp4" + .format(args.images_dir, args.output_dir), + shell=True, check=True) + +subprocess.run("ffmpeg -i {}/video_without_audio.mp4 -i {}/video.aac -map 0:0 -map 1:0 -vcodec " + "copy -acodec copy -y {}/video_with_audio.mp4" + .format(args.output_dir, args.input_audio, args.output_dir), + shell=True, check=True) diff --git a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/neural_style.py b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/scripts/transform.py similarity index 70% rename from how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/neural_style.py rename to how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/scripts/transform.py index 1a59c6430..f8ac0ee44 100644 --- a/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/neural_style.py +++ b/how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/scripts/transform.py @@ -1,28 +1,17 @@ -# Original source: https://github.com/pytorch/examples/blob/master/fast_neural_style/neural_style/neural_style.py import argparse import os import sys import re - +import json +import traceback from PIL import Image + import torch from torchvision import transforms +from azureml.core.model import Model -def load_image(filename, size=None, scale=None): - img = Image.open(filename) - if size is not None: - img = img.resize((size, size), Image.ANTIALIAS) - elif scale is not None: - img = img.resize((int(img.size[0] / scale), int(img.size[1] / scale)), Image.ANTIALIAS) - return img - - -def save_image(filename, data): - img = data.clone().clamp(0, 255).numpy() - img = img.transpose(1, 2, 0).astype("uint8") - img = Image.fromarray(img) - img.save(filename) +style_model = None class TransformerNet(torch.nn.Module): @@ -123,62 +112,61 @@ def forward(self, x): out = self.reflection_pad(x_in) out = self.conv2d(out) return out - -def stylize(args): - device = torch.device("cuda" if args.cuda else "cpu") + +def load_image(filename): + img = Image.open(filename) + return img + + +def save_image(filename, data): + img = data.clone().clamp(0, 
255).numpy() + img = img.transpose(1, 2, 0).astype("uint8") + img = Image.fromarray(img) + img.save(filename) + + +def init(): + global output_path, args + global style_model, device + output_path = os.environ['AZUREML_BI_OUTPUT_PATH'] + print(f'output path: {output_path}') + print(f'Cuda available? {torch.cuda.is_available()}') + + arg_parser = argparse.ArgumentParser(description="parser for fast-neural-style") + arg_parser.add_argument("--style", type=str, help="style name") + args, unknown_args = arg_parser.parse_known_args() + device = torch.device("cuda" if torch.cuda.is_available() else "cpu") with torch.no_grad(): style_model = TransformerNet() - state_dict = torch.load(os.path.join(args.model_dir, args.style+".pth")) + model_path = Model.get_model_path(args.style) + state_dict = torch.load(os.path.join(model_path)) # remove saved deprecated running_* keys in InstanceNorm from the checkpoint for k in list(state_dict.keys()): if re.search(r'in\d+\.running_(mean|var)$', k): del state_dict[k] style_model.load_state_dict(state_dict) style_model.to(device) + print(f'Model loaded successfully. Path: {model_path}') + + +def run(mini_batch): - filenames = os.listdir(args.content_dir) + result = [] + for image_file_path in mini_batch: + img = load_image(image_file_path) - for filename in filenames: - print("Processing {}".format(filename)) - full_path = os.path.join(args.content_dir, filename) - content_image = load_image(full_path, scale=args.content_scale) + with torch.no_grad(): content_transform = transforms.Compose([ transforms.ToTensor(), transforms.Lambda(lambda x: x.mul(255)) ]) - content_image = content_transform(content_image) + content_image = content_transform(img) content_image = content_image.unsqueeze(0).to(device) output = style_model(content_image).cpu() + output_file_path = os.path.join(output_path, os.path.basename(image_file_path)) + save_image(output_file_path, output[0]) + result.append(output_file_path) - output_path = os.path.join(args.output_dir, filename) - save_image(output_path, output[0]) - -def main(): - arg_parser = argparse.ArgumentParser(description="parser for fast-neural-style") - - arg_parser.add_argument("--content-scale", type=float, default=None, - help="factor for scaling down the content image") - arg_parser.add_argument("--model-dir", type=str, required=True, - help="saved model to be used for stylizing the image.") - arg_parser.add_argument("--cuda", type=int, required=True, - help="set it to 1 for running on GPU, 0 for CPU") - arg_parser.add_argument("--style", type=str, - help="style name") - - arg_parser.add_argument("--content-dir", type=str, required=True, - help="directory holding the images") - arg_parser.add_argument("--output-dir", type=str, required=True, - help="directory holding the output images") - args = arg_parser.parse_args() - - if args.cuda and not torch.cuda.is_available(): - print("ERROR: cuda is not available, try running on CPU") - sys.exit(1) - os.makedirs(args.output_dir, exist_ok=True) - stylize(args) - - -if __name__ == "__main__": - main() + return result diff --git a/how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb b/how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb index 177b4b215..94ce595ef 100644 --- 
a/how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb +++ b/how-to-use-azureml/ml-frameworks/chainer/deployment/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb @@ -418,6 +418,15 @@ "hyperdrive_run.wait_for_completion(show_output=True)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert(hyperdrive_run.get_status() == \"Completed\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -507,7 +516,7 @@ "metadata": {}, "source": [ "### Create myenv.yml\n", - "We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify conda packages `numpy` and `chainer`." + "We also need to create an environment file so that Azure Machine Learning can install the necessary packages in the Docker image which are required by your scoring script. In this case, we need to specify conda package `numpy` and pip install `chainer`. Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service." ] }, { @@ -520,7 +529,8 @@ "\n", "cd = CondaDependencies.create()\n", "cd.add_conda_package('numpy')\n", - "cd.add_conda_package('chainer')\n", + "cd.add_pip_package('chainer==5.1.0')\n", + "cd.add_pip_package(\"azureml-defaults\")\n", "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", "\n", "print(cd.serialize_to_string())" @@ -544,10 +554,11 @@ "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import Webservice\n", "from azureml.core.model import Model\n", + "from azureml.core.environment import Environment\n", + "\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"chainer_score.py\",\n", - " conda_file=\"myenv.yml\")\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"chainer_score.py\", environment=myenv)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,\n", " auth_enabled=True, # this flag generates API keys to secure access\n", @@ -671,13 +682,11 @@ "metadata": {}, "outputs": [], "source": [ - "models = ws.models\n", - "for name, model in models.items():\n", - " print(\"Model: {}, ID: {}\".format(name, model.id))\n", + "model = ws.models['chainer-dnn-mnist']\n", + "print(\"Model: {}, ID: {}\".format('chainer-dnn-mnist', model.id))\n", " \n", - "webservices = ws.webservices\n", - "for name, webservice in webservices.items():\n", - " print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))" + "webservice = ws.webservices['chainer-mnist-1']\n", + "print(\"Webservice: {}, scoring URI: {}\".format('chainer-mnist-1', webservice.scoring_uri))" ] }, { diff --git a/how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb b/how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb index ed14ede77..936fb6273 100644 --- 
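(Editor's note, not part of the patch.) In the Chainer deployment notebook above, the updated markdown calls for azureml-defaults with version >= 1.0.45, but the corresponding cell adds the package without a version specifier. If you want the environment to actually enforce that minimum, add_pip_package accepts an ordinary pip requirement string, exactly as it already does for 'chainer==5.1.0':

    cd.add_pip_package("azureml-defaults>=1.0.45")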
a/how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb +++ b/how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb @@ -440,6 +440,15 @@ "hyperdrive_run.wait_for_completion(show_output=True)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert(hyperdrive_run.get_status() == \"Completed\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -535,7 +544,7 @@ "source": [ "from azureml.core.conda_dependencies import CondaDependencies \n", "\n", - "myenv = CondaDependencies.create(pip_packages=['azureml-defaults', 'torch', 'torchvision'])\n", + "myenv = CondaDependencies.create(pip_packages=['azureml-defaults', 'torch', 'torchvision>=0.5.0'])\n", "\n", "with open(\"myenv.yml\",\"w\") as f:\n", " f.write(myenv.serialize_to_string())\n", @@ -561,10 +570,11 @@ "from azureml.core.model import InferenceConfig\n", "from azureml.core.webservice import Webservice\n", "from azureml.core.model import Model\n", + "from azureml.core.environment import Environment\n", + "\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"pytorch_score.py\",\n", - " conda_file=\"myenv.yml\")\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"pytorch_score.py\", environment=myenv)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", " memory_gb=1, \n", diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/coco_eval.py b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/coco_eval.py new file mode 100644 index 000000000..2f032132a --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/coco_eval.py @@ -0,0 +1,350 @@ +import json +import tempfile + +import numpy as np +import copy +import time +import torch +import torch._six + +from pycocotools.cocoeval import COCOeval +from pycocotools.coco import COCO +import pycocotools.mask as mask_util + +from collections import defaultdict + +import utils + + +class CocoEvaluator(object): + def __init__(self, coco_gt, iou_types): + assert isinstance(iou_types, (list, tuple)) + coco_gt = copy.deepcopy(coco_gt) + self.coco_gt = coco_gt + + self.iou_types = iou_types + self.coco_eval = {} + for iou_type in iou_types: + self.coco_eval[iou_type] = COCOeval(coco_gt, iouType=iou_type) + + self.img_ids = [] + self.eval_imgs = {k: [] for k in iou_types} + + def update(self, predictions): + img_ids = list(np.unique(list(predictions.keys()))) + self.img_ids.extend(img_ids) + + for iou_type in self.iou_types: + results = self.prepare(predictions, iou_type) + coco_dt = loadRes(self.coco_gt, results) if results else COCO() + coco_eval = self.coco_eval[iou_type] + + coco_eval.cocoDt = coco_dt + coco_eval.params.imgIds = list(img_ids) + img_ids, eval_imgs = evaluate(coco_eval) + + self.eval_imgs[iou_type].append(eval_imgs) + + def synchronize_between_processes(self): + for iou_type in self.iou_types: + self.eval_imgs[iou_type] = np.concatenate(self.eval_imgs[iou_type], 2) + create_common_coco_eval(self.coco_eval[iou_type], self.img_ids, self.eval_imgs[iou_type]) + + def accumulate(self): + for coco_eval in self.coco_eval.values(): + coco_eval.accumulate() + + def summarize(self): + 
for iou_type, coco_eval in self.coco_eval.items(): + print("IoU metric: {}".format(iou_type)) + coco_eval.summarize() + + def prepare(self, predictions, iou_type): + if iou_type == "bbox": + return self.prepare_for_coco_detection(predictions) + elif iou_type == "segm": + return self.prepare_for_coco_segmentation(predictions) + elif iou_type == "keypoints": + return self.prepare_for_coco_keypoint(predictions) + else: + raise ValueError("Unknown iou type {}".format(iou_type)) + + def prepare_for_coco_detection(self, predictions): + coco_results = [] + for original_id, prediction in predictions.items(): + if len(prediction) == 0: + continue + + boxes = prediction["boxes"] + boxes = convert_to_xywh(boxes).tolist() + scores = prediction["scores"].tolist() + labels = prediction["labels"].tolist() + + coco_results.extend( + [ + { + "image_id": original_id, + "category_id": labels[k], + "bbox": box, + "score": scores[k], + } + for k, box in enumerate(boxes) + ] + ) + return coco_results + + def prepare_for_coco_segmentation(self, predictions): + coco_results = [] + for original_id, prediction in predictions.items(): + if len(prediction) == 0: + continue + + scores = prediction["scores"] + labels = prediction["labels"] + masks = prediction["masks"] + + masks = masks > 0.5 + + scores = prediction["scores"].tolist() + labels = prediction["labels"].tolist() + + rles = [ + mask_util.encode(np.array(mask[0, :, :, np.newaxis], dtype=np.uint8, order="F"))[0] + for mask in masks + ] + for rle in rles: + rle["counts"] = rle["counts"].decode("utf-8") + + coco_results.extend( + [ + { + "image_id": original_id, + "category_id": labels[k], + "segmentation": rle, + "score": scores[k], + } + for k, rle in enumerate(rles) + ] + ) + return coco_results + + def prepare_for_coco_keypoint(self, predictions): + coco_results = [] + for original_id, prediction in predictions.items(): + if len(prediction) == 0: + continue + + boxes = prediction["boxes"] + boxes = convert_to_xywh(boxes).tolist() + scores = prediction["scores"].tolist() + labels = prediction["labels"].tolist() + keypoints = prediction["keypoints"] + keypoints = keypoints.flatten(start_dim=1).tolist() + + coco_results.extend( + [ + { + "image_id": original_id, + "category_id": labels[k], + 'keypoints': keypoint, + "score": scores[k], + } + for k, keypoint in enumerate(keypoints) + ] + ) + return coco_results + + +def convert_to_xywh(boxes): + xmin, ymin, xmax, ymax = boxes.unbind(1) + return torch.stack((xmin, ymin, xmax - xmin, ymax - ymin), dim=1) + + +def merge(img_ids, eval_imgs): + all_img_ids = utils.all_gather(img_ids) + all_eval_imgs = utils.all_gather(eval_imgs) + + merged_img_ids = [] + for p in all_img_ids: + merged_img_ids.extend(p) + + merged_eval_imgs = [] + for p in all_eval_imgs: + merged_eval_imgs.append(p) + + merged_img_ids = np.array(merged_img_ids) + merged_eval_imgs = np.concatenate(merged_eval_imgs, 2) + + # keep only unique (and in sorted order) images + merged_img_ids, idx = np.unique(merged_img_ids, return_index=True) + merged_eval_imgs = merged_eval_imgs[..., idx] + + return merged_img_ids, merged_eval_imgs + + +def create_common_coco_eval(coco_eval, img_ids, eval_imgs): + img_ids, eval_imgs = merge(img_ids, eval_imgs) + img_ids = list(img_ids) + eval_imgs = list(eval_imgs.flatten()) + + coco_eval.evalImgs = eval_imgs + coco_eval.params.imgIds = img_ids + coco_eval._paramsEval = copy.deepcopy(coco_eval.params) + + +################################################################# +# From pycocotools, just removed the prints and 
fixed +# a Python3 bug about unicode not defined +################################################################# + +# Ideally, pycocotools wouldn't have hard-coded prints +# so that we could avoid copy-pasting those two functions + +def createIndex(self): + # create index + # print('creating index...') + anns, cats, imgs = {}, {}, {} + imgToAnns, catToImgs = defaultdict(list), defaultdict(list) + if 'annotations' in self.dataset: + for ann in self.dataset['annotations']: + imgToAnns[ann['image_id']].append(ann) + anns[ann['id']] = ann + + if 'images' in self.dataset: + for img in self.dataset['images']: + imgs[img['id']] = img + + if 'categories' in self.dataset: + for cat in self.dataset['categories']: + cats[cat['id']] = cat + + if 'annotations' in self.dataset and 'categories' in self.dataset: + for ann in self.dataset['annotations']: + catToImgs[ann['category_id']].append(ann['image_id']) + + # print('index created!') + + # create class members + self.anns = anns + self.imgToAnns = imgToAnns + self.catToImgs = catToImgs + self.imgs = imgs + self.cats = cats + + +maskUtils = mask_util + + +def loadRes(self, resFile): + """ + Load result file and return a result api object. + :param resFile (str) : file name of result file + :return: res (obj) : result api object + """ + res = COCO() + res.dataset['images'] = [img for img in self.dataset['images']] + + # print('Loading and preparing results...') + # tic = time.time() + if isinstance(resFile, torch._six.string_classes): + anns = json.load(open(resFile)) + elif type(resFile) == np.ndarray: + anns = self.loadNumpyAnnotations(resFile) + else: + anns = resFile + assert type(anns) == list, 'results in not an array of objects' + annsImgIds = [ann['image_id'] for ann in anns] + assert set(annsImgIds) == (set(annsImgIds) & set(self.getImgIds())), \ + 'Results do not correspond to current coco set' + if 'caption' in anns[0]: + imgIds = set([img['id'] for img in res.dataset['images']]) & set([ann['image_id'] for ann in anns]) + res.dataset['images'] = [img for img in res.dataset['images'] if img['id'] in imgIds] + for id, ann in enumerate(anns): + ann['id'] = id + 1 + elif 'bbox' in anns[0] and not anns[0]['bbox'] == []: + res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) + for id, ann in enumerate(anns): + bb = ann['bbox'] + x1, x2, y1, y2 = [bb[0], bb[0] + bb[2], bb[1], bb[1] + bb[3]] + if 'segmentation' not in ann: + ann['segmentation'] = [[x1, y1, x1, y2, x2, y2, x2, y1]] + ann['area'] = bb[2] * bb[3] + ann['id'] = id + 1 + ann['iscrowd'] = 0 + elif 'segmentation' in anns[0]: + res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) + for id, ann in enumerate(anns): + # now only support compressed RLE format as segmentation results + ann['area'] = maskUtils.area(ann['segmentation']) + if 'bbox' not in ann: + ann['bbox'] = maskUtils.toBbox(ann['segmentation']) + ann['id'] = id + 1 + ann['iscrowd'] = 0 + elif 'keypoints' in anns[0]: + res.dataset['categories'] = copy.deepcopy(self.dataset['categories']) + for id, ann in enumerate(anns): + s = ann['keypoints'] + x = s[0::3] + y = s[1::3] + x1, x2, y1, y2 = np.min(x), np.max(x), np.min(y), np.max(y) + ann['area'] = (x2 - x1) * (y2 - y1) + ann['id'] = id + 1 + ann['bbox'] = [x1, y1, x2 - x1, y2 - y1] + # print('DONE (t={:0.2f}s)'.format(time.time()- tic)) + + res.dataset['annotations'] = anns + createIndex(res) + return res + + +def evaluate(self): + ''' + Run per image evaluation on given images and store results (a list of dict) in self.evalImgs + :return: None 
+ ''' + # tic = time.time() + # print('Running per image evaluation...') + p = self.params + # add backward compatibility if useSegm is specified in params + if p.useSegm is not None: + p.iouType = 'segm' if p.useSegm == 1 else 'bbox' + print('useSegm (deprecated) is not None. Running {} evaluation'.format(p.iouType)) + # print('Evaluate annotation type *{}*'.format(p.iouType)) + p.imgIds = list(np.unique(p.imgIds)) + if p.useCats: + p.catIds = list(np.unique(p.catIds)) + p.maxDets = sorted(p.maxDets) + self.params = p + + self._prepare() + # loop through images, area range, max detection number + catIds = p.catIds if p.useCats else [-1] + + if p.iouType == 'segm' or p.iouType == 'bbox': + computeIoU = self.computeIoU + elif p.iouType == 'keypoints': + computeIoU = self.computeOks + self.ious = { + (imgId, catId): computeIoU(imgId, catId) + for imgId in p.imgIds + for catId in catIds} + + evaluateImg = self.evaluateImg + maxDet = p.maxDets[-1] + evalImgs = [ + evaluateImg(imgId, catId, areaRng, maxDet) + for catId in catIds + for areaRng in p.areaRng + for imgId in p.imgIds + ] + # this is NOT in the pycocotools code, but could be done outside + evalImgs = np.asarray(evalImgs).reshape( + len(catIds), len(p.areaRng), len(p.imgIds)) + self._paramsEval = copy.deepcopy(self.params) + # toc = time.time() + # print('DONE (t={:0.2f}s).'.format(toc-tic)) + return p.imgIds, evalImgs + +################################################################# +# end of straight copy from pycocotools, just removing the prints +################################################################# diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/coco_utils.py b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/coco_utils.py new file mode 100644 index 000000000..26701a2cb --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/coco_utils.py @@ -0,0 +1,252 @@ +import copy +import os +from PIL import Image + +import torch +import torch.utils.data +import torchvision + +from pycocotools import mask as coco_mask +from pycocotools.coco import COCO + +import transforms as T + + +class FilterAndRemapCocoCategories(object): + def __init__(self, categories, remap=True): + self.categories = categories + self.remap = remap + + def __call__(self, image, target): + anno = target["annotations"] + anno = [obj for obj in anno if obj["category_id"] in self.categories] + if not self.remap: + target["annotations"] = anno + return image, target + anno = copy.deepcopy(anno) + for obj in anno: + obj["category_id"] = self.categories.index(obj["category_id"]) + target["annotations"] = anno + return image, target + + +def convert_coco_poly_to_mask(segmentations, height, width): + masks = [] + for polygons in segmentations: + rles = coco_mask.frPyObjects(polygons, height, width) + mask = coco_mask.decode(rles) + if len(mask.shape) < 3: + mask = mask[..., None] + mask = torch.as_tensor(mask, dtype=torch.uint8) + mask = mask.any(dim=2) + masks.append(mask) + if masks: + masks = torch.stack(masks, dim=0) + else: + masks = torch.zeros((0, height, width), dtype=torch.uint8) + return masks + + +class ConvertCocoPolysToMask(object): + def __call__(self, image, target): + w, h = image.size + + image_id = target["image_id"] + image_id = torch.tensor([image_id]) + + anno = target["annotations"] + + anno = [obj for obj in anno if obj['iscrowd'] == 0] + + boxes = [obj["bbox"] for obj in anno] + # guard against no boxes via resizing + boxes = 
torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4) + boxes[:, 2:] += boxes[:, :2] + boxes[:, 0::2].clamp_(min=0, max=w) + boxes[:, 1::2].clamp_(min=0, max=h) + + classes = [obj["category_id"] for obj in anno] + classes = torch.tensor(classes, dtype=torch.int64) + + segmentations = [obj["segmentation"] for obj in anno] + masks = convert_coco_poly_to_mask(segmentations, h, w) + + keypoints = None + if anno and "keypoints" in anno[0]: + keypoints = [obj["keypoints"] for obj in anno] + keypoints = torch.as_tensor(keypoints, dtype=torch.float32) + num_keypoints = keypoints.shape[0] + if num_keypoints: + keypoints = keypoints.view(num_keypoints, -1, 3) + + keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0]) + boxes = boxes[keep] + classes = classes[keep] + masks = masks[keep] + if keypoints is not None: + keypoints = keypoints[keep] + + target = {} + target["boxes"] = boxes + target["labels"] = classes + target["masks"] = masks + target["image_id"] = image_id + if keypoints is not None: + target["keypoints"] = keypoints + + # for conversion to coco api + area = torch.tensor([obj["area"] for obj in anno]) + iscrowd = torch.tensor([obj["iscrowd"] for obj in anno]) + target["area"] = area + target["iscrowd"] = iscrowd + + return image, target + + +def _coco_remove_images_without_annotations(dataset, cat_list=None): + def _has_only_empty_bbox(anno): + return all(any(o <= 1 for o in obj["bbox"][2:]) for obj in anno) + + def _count_visible_keypoints(anno): + return sum(sum(1 for v in ann["keypoints"][2::3] if v > 0) for ann in anno) + + min_keypoints_per_image = 10 + + def _has_valid_annotation(anno): + # if it's empty, there is no annotation + if len(anno) == 0: + return False + # if all boxes have close to zero area, there is no annotation + if _has_only_empty_bbox(anno): + return False + # keypoints task have a slight different critera for considering + # if an annotation is valid + if "keypoints" not in anno[0]: + return True + # for keypoint detection tasks, only consider valid images those + # containing at least min_keypoints_per_image + if _count_visible_keypoints(anno) >= min_keypoints_per_image: + return True + return False + + assert isinstance(dataset, torchvision.datasets.CocoDetection) + ids = [] + for ds_idx, img_id in enumerate(dataset.ids): + ann_ids = dataset.coco.getAnnIds(imgIds=img_id, iscrowd=None) + anno = dataset.coco.loadAnns(ann_ids) + if cat_list: + anno = [obj for obj in anno if obj["category_id"] in cat_list] + if _has_valid_annotation(anno): + ids.append(ds_idx) + + dataset = torch.utils.data.Subset(dataset, ids) + return dataset + + +def convert_to_coco_api(ds): + coco_ds = COCO() + # annotation IDs need to start at 1, not 0, see torchvision issue #1530 + ann_id = 1 + dataset = {'images': [], 'categories': [], 'annotations': []} + categories = set() + for img_idx in range(len(ds)): + # find better way to get target + # targets = ds.get_annotations(img_idx) + img, targets = ds[img_idx] + image_id = targets["image_id"].item() + img_dict = {} + img_dict['id'] = image_id + img_dict['height'] = img.shape[-2] + img_dict['width'] = img.shape[-1] + dataset['images'].append(img_dict) + bboxes = targets["boxes"] + bboxes[:, 2:] -= bboxes[:, :2] + bboxes = bboxes.tolist() + labels = targets['labels'].tolist() + areas = targets['area'].tolist() + iscrowd = targets['iscrowd'].tolist() + if 'masks' in targets: + masks = targets['masks'] + # make masks Fortran contiguous for coco_mask + masks = masks.permute(0, 2, 1).contiguous().permute(0, 2, 1) + if 'keypoints' 
in targets: + keypoints = targets['keypoints'] + keypoints = keypoints.reshape(keypoints.shape[0], -1).tolist() + num_objs = len(bboxes) + for i in range(num_objs): + ann = {} + ann['image_id'] = image_id + ann['bbox'] = bboxes[i] + ann['category_id'] = labels[i] + categories.add(labels[i]) + ann['area'] = areas[i] + ann['iscrowd'] = iscrowd[i] + ann['id'] = ann_id + if 'masks' in targets: + ann["segmentation"] = coco_mask.encode(masks[i].numpy()) + if 'keypoints' in targets: + ann['keypoints'] = keypoints[i] + ann['num_keypoints'] = sum(k != 0 for k in keypoints[i][2::3]) + dataset['annotations'].append(ann) + ann_id += 1 + dataset['categories'] = [{'id': i} for i in sorted(categories)] + coco_ds.dataset = dataset + coco_ds.createIndex() + return coco_ds + + +def get_coco_api_from_dataset(dataset): + for _ in range(10): + if isinstance(dataset, torchvision.datasets.CocoDetection): + break + if isinstance(dataset, torch.utils.data.Subset): + dataset = dataset.dataset + if isinstance(dataset, torchvision.datasets.CocoDetection): + return dataset.coco + return convert_to_coco_api(dataset) + + +class CocoDetection(torchvision.datasets.CocoDetection): + def __init__(self, img_folder, ann_file, transforms): + super(CocoDetection, self).__init__(img_folder, ann_file) + self._transforms = transforms + + def __getitem__(self, idx): + img, target = super(CocoDetection, self).__getitem__(idx) + image_id = self.ids[idx] + target = dict(image_id=image_id, annotations=target) + if self._transforms is not None: + img, target = self._transforms(img, target) + return img, target + + +def get_coco(root, image_set, transforms, mode='instances'): + anno_file_template = "{}_{}2017.json" + PATHS = { + "train": ("train2017", os.path.join("annotations", anno_file_template.format(mode, "train"))), + "val": ("val2017", os.path.join("annotations", anno_file_template.format(mode, "val"))), + # "train": ("val2017", os.path.join("annotations", anno_file_template.format(mode, "val"))) + } + + t = [ConvertCocoPolysToMask()] + + if transforms is not None: + t.append(transforms) + transforms = T.Compose(t) + + img_folder, ann_file = PATHS[image_set] + img_folder = os.path.join(root, img_folder) + ann_file = os.path.join(root, ann_file) + + dataset = CocoDetection(img_folder, ann_file, transforms=transforms) + + if image_set == "train": + dataset = _coco_remove_images_without_annotations(dataset) + + # dataset = torch.utils.data.Subset(dataset, [i for i in range(500)]) + + return dataset + + +def get_coco_kp(root, image_set, transforms): + return get_coco(root, image_set, transforms, mode="person_keypoints") diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/data.py b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/data.py new file mode 100644 index 000000000..6b8ee4929 --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/data.py @@ -0,0 +1,77 @@ +import numpy as np +import os +import torch.utils.data + +from azureml.core import Run +from PIL import Image + + +class PennFudanDataset(torch.utils.data.Dataset): + def __init__(self, root, transforms=None): + self.root = root + self.transforms = transforms + + # load all image files, sorting them to ensure that they are aligned + self.img_dir = os.path.join(root, "PNGImages") + self.mask_dir = os.path.join(root, "PedMasks") + + self.imgs = list(sorted(os.listdir(self.img_dir))) + self.masks = list(sorted(os.listdir(self.mask_dir))) + + def __getitem__(self, idx): + # 
load images ad masks + img_path = os.path.join(self.img_dir, self.imgs[idx]) + mask_path = os.path.join(self.mask_dir, self.masks[idx]) + + img = Image.open(img_path).convert("RGB") + # note that we haven't converted the mask to RGB, + # because each color corresponds to a different instance + # with 0 being background + mask = Image.open(mask_path) + + mask = np.array(mask) + # instances are encoded as different colors + obj_ids = np.unique(mask) + # first id is the background, so remove it + obj_ids = obj_ids[1:] + + # split the color-encoded mask into a set + # of binary masks + masks = mask == obj_ids[:, None, None] + + # get bounding box coordinates for each mask + num_objs = len(obj_ids) + boxes = [] + for i in range(num_objs): + pos = np.where(masks[i]) + xmin = np.min(pos[1]) + xmax = np.max(pos[1]) + ymin = np.min(pos[0]) + ymax = np.max(pos[0]) + boxes.append([xmin, ymin, xmax, ymax]) + + boxes = torch.as_tensor(boxes, dtype=torch.float32) + # there is only one class + labels = torch.ones((num_objs,), dtype=torch.int64) + masks = torch.as_tensor(masks, dtype=torch.uint8) + + image_id = torch.tensor([idx]) + area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]) + # suppose all instances are not crowd + iscrowd = torch.zeros((num_objs,), dtype=torch.int64) + + target = {} + target["boxes"] = boxes + target["labels"] = labels + target["masks"] = masks + target["image_id"] = image_id + target["area"] = area + target["iscrowd"] = iscrowd + + if self.transforms is not None: + img, target = self.transforms(img, target) + + return img, target + + def __len__(self): + return len(self.imgs) diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/dockerfiles/Dockerfile b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/dockerfiles/Dockerfile new file mode 100644 index 000000000..9b76f6001 --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/dockerfiles/Dockerfile @@ -0,0 +1,16 @@ +# From https://github.com/microsoft/AzureML-BERT/blob/master/finetune/PyTorch/dockerfile + +FROM mcr.microsoft.com/azureml/base-gpu:openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04 + +RUN apt update && apt install git -y && rm -rf /var/lib/apt/lists/* + +RUN /opt/miniconda/bin/conda update -n base -c defaults conda +RUN /opt/miniconda/bin/conda install -y cython=0.29.15 numpy=1.18.1 +RUN /opt/miniconda/bin/conda install -y pytorch=1.4 torchvision=0.5.0 -c pytorch + +# Install cocoapi, required for drawing bounding boxes +RUN git clone https://github.com/cocodataset/cocoapi.git && cd cocoapi/PythonAPI && python setup.py build_ext install + +RUN pip install azureml-defaults +RUN pip install "azureml-dataprep[fuse]" +RUN pip install pandas pyarrow diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/engine.py b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/engine.py new file mode 100644 index 000000000..68c39a4fc --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/engine.py @@ -0,0 +1,108 @@ +import math +import sys +import time +import torch + +import torchvision.models.detection.mask_rcnn + +from coco_utils import get_coco_api_from_dataset +from coco_eval import CocoEvaluator +import utils + + +def train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq): + model.train() + metric_logger = utils.MetricLogger(delimiter=" ") + metric_logger.add_meter('lr', 
utils.SmoothedValue(window_size=1, fmt='{value:.6f}')) + header = 'Epoch: [{}]'.format(epoch) + + lr_scheduler = None + if epoch == 0: + warmup_factor = 1. / 1000 + warmup_iters = min(1000, len(data_loader) - 1) + + lr_scheduler = utils.warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor) + + for images, targets in metric_logger.log_every(data_loader, print_freq, header): + images = list(image.to(device) for image in images) + targets = [{k: v.to(device) for k, v in t.items()} for t in targets] + + loss_dict = model(images, targets) + + losses = sum(loss for loss in loss_dict.values()) + + # reduce losses over all GPUs for logging purposes + loss_dict_reduced = utils.reduce_dict(loss_dict) + losses_reduced = sum(loss for loss in loss_dict_reduced.values()) + + loss_value = losses_reduced.item() + + if not math.isfinite(loss_value): + print("Loss is {}, stopping training".format(loss_value)) + print(loss_dict_reduced) + sys.exit(1) + + optimizer.zero_grad() + losses.backward() + optimizer.step() + + if lr_scheduler is not None: + lr_scheduler.step() + + metric_logger.update(loss=losses_reduced, **loss_dict_reduced) + metric_logger.update(lr=optimizer.param_groups[0]["lr"]) + + +def _get_iou_types(model): + model_without_ddp = model + if isinstance(model, torch.nn.parallel.DistributedDataParallel): + model_without_ddp = model.module + iou_types = ["bbox"] + if isinstance(model_without_ddp, torchvision.models.detection.MaskRCNN): + iou_types.append("segm") + if isinstance(model_without_ddp, torchvision.models.detection.KeypointRCNN): + iou_types.append("keypoints") + return iou_types + + +@torch.no_grad() +def evaluate(model, data_loader, device): + n_threads = torch.get_num_threads() + # FIXME remove this and make paste_masks_in_image run on the GPU + torch.set_num_threads(1) + cpu_device = torch.device("cpu") + model.eval() + metric_logger = utils.MetricLogger(delimiter=" ") + header = 'Test:' + + coco = get_coco_api_from_dataset(data_loader.dataset) + iou_types = _get_iou_types(model) + coco_evaluator = CocoEvaluator(coco, iou_types) + + for image, targets in metric_logger.log_every(data_loader, 100, header): + image = list(img.to(device) for img in image) + targets = [{k: v.to(device) for k, v in t.items()} for t in targets] + + torch.cuda.synchronize() + model_time = time.time() + outputs = model(image) + + outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs] + model_time = time.time() - model_time + + res = {target["image_id"].item(): output for target, output in zip(targets, outputs)} + evaluator_time = time.time() + coco_evaluator.update(res) + evaluator_time = time.time() - evaluator_time + metric_logger.update(model_time=model_time, evaluator_time=evaluator_time) + + # gather the stats from all processes + metric_logger.synchronize_between_processes() + print("Averaged stats:", metric_logger) + coco_evaluator.synchronize_between_processes() + + # accumulate predictions from all images + coco_evaluator.accumulate() + coco_evaluator.summarize() + torch.set_num_threads(n_threads) + return coco_evaluator diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/model.py b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/model.py new file mode 100644 index 000000000..12e32effa --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/model.py @@ -0,0 +1,23 @@ +import torchvision + +from torchvision.models.detection.faster_rcnn import FastRCNNPredictor +from 
torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
+
+
+def get_instance_segmentation_model(num_classes):
+    # load an instance segmentation model pre-trained on COCO
+    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
+
+    # get the number of input features for the classifier
+    in_features = model.roi_heads.box_predictor.cls_score.in_features
+    # replace the pre-trained head with a new one
+    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
+
+    # now get the number of input features for the mask classifier
+    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
+    hidden_layer = 256
+    # and replace the mask predictor with a new one
+    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
+                                                       hidden_layer,
+                                                       num_classes)
+    return model
diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/pytorch-mask-rcnn.ipynb b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/pytorch-mask-rcnn.ipynb
new file mode 100644
index 000000000..e21a40aba
--- /dev/null
+++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/pytorch-mask-rcnn.ipynb
@@ -0,0 +1,544 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+    "\n",
+    "Licensed under the MIT License."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/pytorch-mask-rcnn.png)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Object detection with PyTorch, Mask R-CNN, and a custom Dockerfile\n",
+    "\n",
+    "In this tutorial, you will finetune a pre-trained [Mask R-CNN](https://arxiv.org/abs/1703.06870) model on images from the [Penn-Fudan Database for Pedestrian Detection and Segmentation](https://www.cis.upenn.edu/~jshi/ped_html/). The dataset has 170 images with 345 instances of pedestrians. After running this tutorial, you will have a model that can outline the silhouettes of all pedestrians within an image.\n",
+    "\n",
+    "You'll use Azure Machine Learning to: \n",
+    "\n",
+    "- Initialize a workspace \n",
+    "- Create a compute cluster\n",
+    "- Define a training environment\n",
+    "- Train a model remotely\n",
+    "- Register your model\n",
+    "- Generate predictions locally\n",
+    "\n",
+    "## Prerequisites\n",
+    "\n",
+    "- If you are using an Azure Machine Learning Notebook VM, your environment already meets these prerequisites. Otherwise, go through the [configuration notebook](../../../../../configuration.ipynb) to install the Azure Machine Learning Python SDK and [create an Azure ML Workspace](https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace#create-a-workspace). You also need matplotlib 3.2, pycocotools-2.0.0, torchvision >= 0.5.0 and torch >= 1.4.0.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Check core SDK version number, check other dependencies\n",
+    "import azureml.core\n",
+    "import matplotlib\n",
+    "import pycocotools\n",
+    "import torch\n",
+    "import torchvision\n",
+    "\n",
+    "print(\"SDK version:\", azureml.core.VERSION)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Diagnostics\n",
+    "\n",
+    "Opt-in diagnostics for better experience, quality, and security in future releases."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.telemetry import set_diagnostics_collection\n",
+    "\n",
+    "set_diagnostics_collection(send_diagnostics=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialize a workspace\n",
+    "\n",
+    "Initialize a [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) object from the existing workspace you created in the Prerequisites step. `Workspace.from_config()` creates a workspace object from the details stored in `config.json`, using the [from_config()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py#from-config-path-none--auth-none---logger-none---file-name-none-) method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from azureml.core.workspace import Workspace\n",
+    "\n",
+    "ws = Workspace.from_config()\n",
+    "print('Workspace name: ' + ws.name, \n",
+    "      'Azure region: ' + ws.location, \n",
+    "      'Subscription id: ' + ws.subscription_id, \n",
+    "      'Resource group: ' + ws.resource_group, sep='\\n')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Create or attach existing Azure ML Managed Compute\n",
+    "\n",
+    "You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/concept-compute-target) for training your model. In this tutorial, we use [Azure ML managed compute](https://docs.microsoft.com/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for our remote training compute resource. Specifically, the below code creates a `STANDARD_NC6` GPU cluster that autoscales from 0 to 4 nodes.\n",
+    "\n",
+    "**Creation of Compute takes approximately 5 minutes.** If the Azure ML Compute with that name is already in your workspace, this code will skip the creation process. \n",
+    "\n",
+    "As with other Azure services, there are limits on certain resources associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota.\n",
+    "\n",
+    "> Note that the below code creates GPU compute. If you instead want to create CPU compute, provide a different VM size to the `vm_size` parameter, such as `STANDARD_D2_V2`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = 'gpu-cluster'\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=ws, name=cluster_name)\n", + " print('Found existing compute target.')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(ws, cluster_name, compute_config)\n", + "\n", + " compute_target.wait_for_completion(show_output=True)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define a training environment\n", + "\n", + "### Create a project directory\n", + "Create a directory that will contain all the code from your local machine that you will need access to on the remote resource. This includes the training script an any additional files your training script depends on." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "project_folder = './pytorch-peds'\n", + "\n", + "try:\n", + " os.makedirs(project_folder, exist_ok=False)\n", + "except FileExistsError:\n", + " print('project folder {} exists, moving on...'.format(project_folder))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Copy training script and dependencies into project directory" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import shutil\n", + "\n", + "files_to_copy = ['data', 'model', 'script', 'utils', 'transforms', 'coco_eval', 'engine', 'coco_utils']\n", + "for file in files_to_copy:\n", + " shutil.copy(os.path.join(os.getcwd(), (file + '.py')), project_folder)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create an experiment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Experiment\n", + "\n", + "experiment_name = 'pytorch-peds'\n", + "experiment = Experiment(ws, name=experiment_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Specify dependencies with a custom Dockerfile\n", + "\n", + "There are a number of ways to [use environments](https://docs.microsoft.com/azure/machine-learning/how-to-use-environments) for specifying dependencies during model training. In this case, we use a custom Dockerfile." 
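(Editor's note, not part of the notebook JSON.) The custom Dockerfile used in the next cell is only one of the environment options the cell above alludes to; the Chainer and PyTorch deployment notebooks touched earlier in this same change take the conda-specification route instead. A minimal sketch of that alternative, assuming a hypothetical conda file named environment.yml, would be:

    from azureml.core import Environment

    my_env = Environment.from_conda_specification(name="maskr-conda", file_path="environment.yml")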
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Environment\n", + "\n", + "my_env = Environment(name='maskr-docker')\n", + "my_env.docker.enabled = True\n", + "with open(\"dockerfiles/Dockerfile\", \"r\") as f:\n", + " dockerfile_contents=f.read()\n", + "my_env.docker.base_dockerfile=dockerfile_contents\n", + "my_env.docker.base_image = None\n", + "my_env.python.interpreter_path = '/opt/miniconda/bin/python'\n", + "my_env.python.user_managed_dependencies = True\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a ScriptRunConfig\n", + "\n", + "Use the [ScriptRunConfig](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py) class to define your run. Specify the source directory, compute target, and environment." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.dnn import PyTorch\n", + "from azureml.core import ScriptRunConfig\n", + "\n", + "model_name = 'pytorch-peds'\n", + "output_dir = './outputs/'\n", + "n_epochs = 2\n", + "\n", + "script_args = [\n", + " '--model_name', model_name,\n", + " '--output_dir', output_dir,\n", + " '--n_epochs', n_epochs,\n", + "]\n", + "# Add training script to run config\n", + "runconfig = ScriptRunConfig(\n", + " source_directory=project_folder,\n", + " script=\"script.py\",\n", + " arguments=script_args)\n", + "\n", + "# Attach compute target to run config\n", + "runconfig.run_config.target = cluster_name\n", + "\n", + "# Uncomment the line below if you want to try this locally first\n", + "#runconfig.run_config.target = \"local\"\n", + "\n", + "# Attach environment to run config\n", + "runconfig.run_config.environment = my_env" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train remotely\n", + "\n", + "### Submit your run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Submit run \n", + "run = experiment.submit(runconfig)\n", + "\n", + "# to get more details of your run\n", + "print(run.get_details())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor your run\n", + "\n", + "Use a widget to keep track of your run. You can also view the status of the run within the [Azure Machine Learning service portal](https://ml.azure.com)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.widgets import RunDetails\n", + "\n", + "RunDetails(run).show()\n", + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Test your model\n", + "\n", + "Now that we are done training, let's see how well this model actually performs.\n", + "\n", + "### Get your latest run\n", + "First, pull the latest run using `experiment.get_runs()`, which lists runs from `experiment` in reverse chronological order." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Run\n", + "\n", + "last_run = next(experiment.get_runs())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register your model\n", + "Next, [register the model](https://docs.microsoft.com/azure/machine-learning/concept-model-management-and-deployment#register-package-and-deploy-models-from-anywhere) from your run. Registering your model assigns it a version and helps you with auditability." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "last_run.register_model(model_name=model_name, model_path=os.path.join(output_dir, model_name))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download your model\n", + "Next, download this registered model. Notice how we can initialize the `Model` object with the name of the registered model, rather than a path to the file itself." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Model\n", + "\n", + "model = Model(workspace=ws, name=model_name)\n", + "path = model.download(target_dir='model', exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Use your model to make a prediction\n", + "\n", + "Run inferencing on a single test image and display the results." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import torch\n", + "from azureml.core import Dataset\n", + "from data import PennFudanDataset\n", + "from script import get_transform, download_data, NUM_CLASSES\n", + "from model import get_instance_segmentation_model\n", + "\n", + "if torch.cuda.is_available():\n", + " device = torch.device('cuda')\n", + "else:\n", + " device = torch.device('cpu')\n", + "\n", + "# Instantiate model with correct weights, cast to correct device, place in evaluation mode\n", + "predict_model = get_instance_segmentation_model(NUM_CLASSES)\n", + "predict_model.to(device)\n", + "predict_model.load_state_dict(torch.load(path, map_location=device))\n", + "predict_model.eval()\n", + "\n", + "# Load dataset\n", + "root_dir=download_data()\n", + "dataset_test = PennFudanDataset(root=root_dir, transforms=get_transform(train=False))\n", + "\n", + "# pick one image from the test set\n", + "img, _ = dataset_test[0]\n", + "\n", + "with torch.no_grad():\n", + " prediction = predict_model([img.to(device)])\n", + "\n", + "# model = torch.load(path)\n", + "#torch.load(model.get_model_path(model_name='outputs/model.pt'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Display the input image\n", + "\n", + "While tensors are great for computers, a tensor of RGB values doesn't mean much to a human. Let's display the input image in a way that a human could understand." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "\n", + "\n", + "Image.fromarray(img.mul(255).permute(1, 2, 0).byte().numpy())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Display the predicted masks\n", + "\n", + "The prediction consists of masks, displaying the outline of pedestrians in the image. Let's take a look at the first two masks, below." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Image.fromarray(prediction[0]['masks'][0, 0].mul(255).byte().cpu().numpy())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Image.fromarray(prediction[0]['masks'][1, 0].mul(255).byte().cpu().numpy())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Next steps\n", + "\n", + "Congratulations! You just trained a Mask R-CNN model with PyTorch in Azure Machine Learning. As next steps, consider:\n", + "1. Learn more about using PyTorch in Azure Machine Learning service by checking out the [README](./README.md]\n", + "2. Try exporting your model to [ONNX](https://docs.microsoft.com/azure/machine-learning/concept-onnx) for accelerated inferencing." + ] + } + ], + "metadata": { + "authors": [ + { + "name": "gopalv" + } + ], + "category": "training", + "compute": [ + "AML Compute" + ], + "datasets": [ + "Custom" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "PyTorch" + ], + "friendly_name": "PyTorch object detection", + "index_order": 1, + "kernel_info": { + "name": "python3" + }, + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.5-final" + }, + "nteract": { + "version": "nteract-front-end@1.0.0" + }, + "tags": [ + "remote run", + "docker" + ], + "task": "Fine-tune PyTorch object detection model with a custom dockerfile" + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/pytorch-mask-rcnn.yml b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/pytorch-mask-rcnn.yml new file mode 100644 index 000000000..4302c3493 --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/pytorch-mask-rcnn.yml @@ -0,0 +1,14 @@ +name: pytorch-mask-rcnn +dependencies: +- cython +- pytorch -c pytorch +- torchvision -c pytorch +- pip: + - azureml-sdk + - azureml-widgets + - azureml-dataprep + - fuse + - pandas + - matplotlib + - pillow==7.0.0 + - git+https://github.com/philferriere/cocoapi.git#subdirectory=PythonAPI diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/script.py b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/script.py new file mode 100644 index 000000000..5851cffaf --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/script.py @@ -0,0 +1,117 @@ +import argparse +import os +import torch +import torchvision +import transforms as T +import urllib.request +import utils + +from azureml.core import Dataset, Run +from data import PennFudanDataset +from engine import train_one_epoch, evaluate +from model import get_instance_segmentation_model +from zipfile import ZipFile + +NUM_CLASSES = 2 + + +def download_data(): + data_file = 'PennFudanPed.zip' + ds_path = 'PennFudanPed/' + urllib.request.urlretrieve('https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip', data_file) + zip = ZipFile(file=data_file) + zip.extractall(path=ds_path) + return os.path.join(ds_path, zip.namelist()[0]) + 
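# --- Editor's note: the comments and example below are an illustrative addition, not part of the original patch ---
# download_data() above fetches PennFudanPed.zip from the UPenn site, extracts it next to the
# working directory and returns the dataset root; get_transform() below composes the torchvision
# transforms (ToTensor always, plus RandomHorizontalFlip(0.5) during training only).
# Assuming the dependencies from dockerfiles/Dockerfile (torch 1.4, torchvision 0.5, pycocotools)
# are installed, the script can be smoke-tested locally with a short run, for example:
#
#     python script.py --model_name pytorch-peds.pt --output_dir local-outputs --n_epochs 1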
+ +def get_transform(train): + transforms = [] + # converts the image, a PIL image, into a PyTorch Tensor + transforms.append(T.ToTensor()) + if train: + # during training, randomly flip the training images + # and ground-truth for data augmentation + transforms.append(T.RandomHorizontalFlip(0.5)) + return T.Compose(transforms) + + +def main(): + print("Torch version:", torch.__version__) + # get command-line arguments + parser = argparse.ArgumentParser() + parser.add_argument('--model_name', type=str, default="pytorch-peds.pt", + help='name with which to register your model') + parser.add_argument('--output_dir', default="local-outputs", + type=str, help='output directory') + parser.add_argument('--n_epochs', type=int, + default=10, help='number of epochs') + args = parser.parse_args() + + # In case user inputs a nested output directory + os.makedirs(name=args.output_dir, exist_ok=True) + + # Get a dataset by name + root_dir = download_data() + + # use our dataset and defined transformations + dataset = PennFudanDataset(root=root_dir, transforms=get_transform(train=True)) + dataset_test = PennFudanDataset(root=root_dir, transforms=get_transform(train=False)) + + # split the dataset in train and test set + torch.manual_seed(1) + indices = torch.randperm(len(dataset)).tolist() + dataset = torch.utils.data.Subset(dataset, indices[:-50]) + dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:]) + + # define training and validation data loaders + data_loader = torch.utils.data.DataLoader( + dataset, batch_size=2, shuffle=True, num_workers=4, + collate_fn=utils.collate_fn) + + data_loader_test = torch.utils.data.DataLoader( + dataset_test, batch_size=1, shuffle=False, num_workers=4, + collate_fn=utils.collate_fn) + + if torch.cuda.is_available(): + print('Using GPU') + device = torch.device('cuda') + else: + print('Using CPU') + device = torch.device('cpu') + + # our dataset has two classes only - background and person + num_classes = NUM_CLASSES + + # get the model using our helper function + model = get_instance_segmentation_model(num_classes) + + # move model to the right device + model.to(device) + + # construct an optimizer + params = [p for p in model.parameters() if p.requires_grad] + optimizer = torch.optim.SGD(params, lr=0.005, + momentum=0.9, weight_decay=0.0005) + + # and a learning rate scheduler which decreases the learning rate by + # 10x every 3 epochs + lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, + step_size=3, + gamma=0.1) + + for epoch in range(args.n_epochs): + # train for one epoch, printing every 10 iterations + train_one_epoch( + model, optimizer, data_loader, device, epoch, print_freq=10) + # update the learning rate + lr_scheduler.step() + # evaluate on the test dataset + evaluate(model, data_loader_test, device=device) + + # Saving the state dict is recommended method, per + # https://pytorch.org/tutorials/beginner/saving_loading_models.html + torch.save(model.state_dict(), os.path.join(args.output_dir, args.model_name)) + + +if __name__ == '__main__': + main() diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/transforms.py b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/transforms.py new file mode 100644 index 000000000..73efc92bd --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/transforms.py @@ -0,0 +1,50 @@ +import random +import torch + +from torchvision.transforms import functional as F + + +def 
_flip_coco_person_keypoints(kps, width): + flip_inds = [0, 2, 1, 4, 3, 6, 5, 8, 7, 10, 9, 12, 11, 14, 13, 16, 15] + flipped_data = kps[:, flip_inds] + flipped_data[..., 0] = width - flipped_data[..., 0] + # Maintain COCO convention that if visibility == 0, then x, y = 0 + inds = flipped_data[..., 2] == 0 + flipped_data[inds] = 0 + return flipped_data + + +class Compose(object): + def __init__(self, transforms): + self.transforms = transforms + + def __call__(self, image, target): + for t in self.transforms: + image, target = t(image, target) + return image, target + + +class RandomHorizontalFlip(object): + def __init__(self, prob): + self.prob = prob + + def __call__(self, image, target): + if random.random() < self.prob: + height, width = image.shape[-2:] + image = image.flip(-1) + bbox = target["boxes"] + bbox[:, [0, 2]] = width - bbox[:, [2, 0]] + target["boxes"] = bbox + if "masks" in target: + target["masks"] = target["masks"].flip(-1) + if "keypoints" in target: + keypoints = target["keypoints"] + keypoints = _flip_coco_person_keypoints(keypoints, width) + target["keypoints"] = keypoints + return image, target + + +class ToTensor(object): + def __call__(self, image, target): + image = F.to_tensor(image) + return image, target diff --git a/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/utils.py b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/utils.py new file mode 100644 index 000000000..0e8e85601 --- /dev/null +++ b/how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/utils.py @@ -0,0 +1,326 @@ +from __future__ import print_function + +from collections import defaultdict, deque +import datetime +import pickle +import time + +import torch +import torch.distributed as dist + +import errno +import os + + +class SmoothedValue(object): + """Track a series of values and provide access to smoothed values over a + window or the global series average. + """ + + def __init__(self, window_size=20, fmt=None): + if fmt is None: + fmt = "{median:.4f} ({global_avg:.4f})" + self.deque = deque(maxlen=window_size) + self.total = 0.0 + self.count = 0 + self.fmt = fmt + + def update(self, value, n=1): + self.deque.append(value) + self.count += n + self.total += value * n + + def synchronize_between_processes(self): + """ + Warning: does not synchronize the deque! 
+ """ + if not is_dist_avail_and_initialized(): + return + t = torch.tensor([self.count, self.total], dtype=torch.float64, device='cuda') + dist.barrier() + dist.all_reduce(t) + t = t.tolist() + self.count = int(t[0]) + self.total = t[1] + + @property + def median(self): + d = torch.tensor(list(self.deque)) + return d.median().item() + + @property + def avg(self): + d = torch.tensor(list(self.deque), dtype=torch.float32) + return d.mean().item() + + @property + def global_avg(self): + return self.total / self.count + + @property + def max(self): + return max(self.deque) + + @property + def value(self): + return self.deque[-1] + + def __str__(self): + return self.fmt.format( + median=self.median, + avg=self.avg, + global_avg=self.global_avg, + max=self.max, + value=self.value) + + +def all_gather(data): + """ + Run all_gather on arbitrary picklable data (not necessarily tensors) + Args: + data: any picklable object + Returns: + list[data]: list of data gathered from each rank + """ + world_size = get_world_size() + if world_size == 1: + return [data] + + # serialized to a Tensor + buffer = pickle.dumps(data) + storage = torch.ByteStorage.from_buffer(buffer) + tensor = torch.ByteTensor(storage).to("cuda") + + # obtain Tensor size of each rank + local_size = torch.tensor([tensor.numel()], device="cuda") + size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)] + dist.all_gather(size_list, local_size) + size_list = [int(size.item()) for size in size_list] + max_size = max(size_list) + + # receiving Tensor from all ranks + # we pad the tensor because torch all_gather does not support + # gathering tensors of different shapes + tensor_list = [] + for _ in size_list: + tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda")) + if local_size != max_size: + padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda") + tensor = torch.cat((tensor, padding), dim=0) + dist.all_gather(tensor_list, tensor) + + data_list = [] + for size, tensor in zip(size_list, tensor_list): + buffer = tensor.cpu().numpy().tobytes()[:size] + data_list.append(pickle.loads(buffer)) + + return data_list + + +def reduce_dict(input_dict, average=True): + """ + Args: + input_dict (dict): all the values will be reduced + average (bool): whether to do average or sum + Reduce the values in the dictionary from all processes so that all processes + have the averaged results. Returns a dict with the same fields as + input_dict, after reduction. 
+ """ + world_size = get_world_size() + if world_size < 2: + return input_dict + with torch.no_grad(): + names = [] + values = [] + # sort the keys so that they are consistent across processes + for k in sorted(input_dict.keys()): + names.append(k) + values.append(input_dict[k]) + values = torch.stack(values, dim=0) + dist.all_reduce(values) + if average: + values /= world_size + reduced_dict = {k: v for k, v in zip(names, values)} + return reduced_dict + + +class MetricLogger(object): + def __init__(self, delimiter="\t"): + self.meters = defaultdict(SmoothedValue) + self.delimiter = delimiter + + def update(self, **kwargs): + for k, v in kwargs.items(): + if isinstance(v, torch.Tensor): + v = v.item() + assert isinstance(v, (float, int)) + self.meters[k].update(v) + + def __getattr__(self, attr): + if attr in self.meters: + return self.meters[attr] + if attr in self.__dict__: + return self.__dict__[attr] + raise AttributeError("'{}' object has no attribute '{}'".format( + type(self).__name__, attr)) + + def __str__(self): + loss_str = [] + for name, meter in self.meters.items(): + loss_str.append( + "{}: {}".format(name, str(meter)) + ) + return self.delimiter.join(loss_str) + + def synchronize_between_processes(self): + for meter in self.meters.values(): + meter.synchronize_between_processes() + + def add_meter(self, name, meter): + self.meters[name] = meter + + def log_every(self, iterable, print_freq, header=None): + i = 0 + if not header: + header = '' + start_time = time.time() + end = time.time() + iter_time = SmoothedValue(fmt='{avg:.4f}') + data_time = SmoothedValue(fmt='{avg:.4f}') + space_fmt = ':' + str(len(str(len(iterable)))) + 'd' + if torch.cuda.is_available(): + log_msg = self.delimiter.join([ + header, + '[{0' + space_fmt + '}/{1}]', + 'eta: {eta}', + '{meters}', + 'time: {time}', + 'data: {data}', + 'max mem: {memory:.0f}' + ]) + else: + log_msg = self.delimiter.join([ + header, + '[{0' + space_fmt + '}/{1}]', + 'eta: {eta}', + '{meters}', + 'time: {time}', + 'data: {data}' + ]) + MB = 1024.0 * 1024.0 + for obj in iterable: + data_time.update(time.time() - end) + yield obj + iter_time.update(time.time() - end) + if i % print_freq == 0 or i == len(iterable) - 1: + eta_seconds = iter_time.global_avg * (len(iterable) - i) + eta_string = str(datetime.timedelta(seconds=int(eta_seconds))) + if torch.cuda.is_available(): + print(log_msg.format( + i, len(iterable), eta=eta_string, + meters=str(self), + time=str(iter_time), data=str(data_time), + memory=torch.cuda.max_memory_allocated() / MB)) + else: + print(log_msg.format( + i, len(iterable), eta=eta_string, + meters=str(self), + time=str(iter_time), data=str(data_time))) + i += 1 + end = time.time() + total_time = time.time() - start_time + total_time_str = str(datetime.timedelta(seconds=int(total_time))) + print('{} Total time: {} ({:.4f} s / it)'.format( + header, total_time_str, total_time / len(iterable))) + + +def collate_fn(batch): + return tuple(zip(*batch)) + + +def warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor): + + def f(x): + if x >= warmup_iters: + return 1 + alpha = float(x) / warmup_iters + return warmup_factor * (1 - alpha) + alpha + + return torch.optim.lr_scheduler.LambdaLR(optimizer, f) + + +def mkdir(path): + try: + os.makedirs(path) + except OSError as e: + if e.errno != errno.EEXIST: + raise + + +def setup_for_distributed(is_master): + """ + This function disables printing when not in master process + """ + import builtins as __builtin__ + builtin_print = __builtin__.print + + def print(*args, 
**kwargs): + force = kwargs.pop('force', False) + if is_master or force: + builtin_print(*args, **kwargs) + + __builtin__.print = print + + +def is_dist_avail_and_initialized(): + if not dist.is_available(): + return False + if not dist.is_initialized(): + return False + return True + + +def get_world_size(): + if not is_dist_avail_and_initialized(): + return 1 + return dist.get_world_size() + + +def get_rank(): + if not is_dist_avail_and_initialized(): + return 0 + return dist.get_rank() + + +def is_main_process(): + return get_rank() == 0 + + +def save_on_master(*args, **kwargs): + if is_main_process(): + torch.save(*args, **kwargs) + + +def init_distributed_mode(args): + if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ: + args.rank = int(os.environ["RANK"]) + args.world_size = int(os.environ['WORLD_SIZE']) + args.gpu = int(os.environ['LOCAL_RANK']) + elif 'SLURM_PROCID' in os.environ: + args.rank = int(os.environ['SLURM_PROCID']) + args.gpu = args.rank % torch.cuda.device_count() + else: + print('Not using distributed mode') + args.distributed = False + return + + args.distributed = True + + torch.cuda.set_device(args.gpu) + args.dist_backend = 'nccl' + print('| distributed init (rank {}): {}'.format( + args.rank, args.dist_url), flush=True) + torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url, + world_size=args.world_size, rank=args.rank) + torch.distributed.barrier() + setup_for_distributed(args.rank == 0) diff --git a/how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb b/how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb index a501e22fc..281864b32 100644 --- a/how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb +++ b/how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb @@ -487,6 +487,15 @@ "hyperdrive_run.wait_for_completion(show_output=True)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert(hyperdrive_run.get_status() == \"Completed\")" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb b/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb index ce6ff6dec..314161c1c 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb +++ b/how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb @@ -161,7 +161,7 @@ }, "source": [ "## Download MNIST dataset\n", - "In order to train on the MNIST dataset we will first need to download it from Yan LeCun's web site directly and save them in a `data` folder locally." + "In order to train on the MNIST dataset we will first need to download it from azuremlopendatasets blob directly and save them in a `data` folder locally. 
If you want you can directly download the same data from Yan LeCun's web site." ] }, { @@ -171,13 +171,17 @@ "outputs": [], "source": [ "import urllib\n", + "data_folder = 'data'\n", + "os.makedirs(data_folder, exist_ok=True)\n", "\n", - "os.makedirs('./data/mnist', exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'train-images.gz'))\n", + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'train-labels.gz'))\n", + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'test-images.gz'))\n", + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'test-labels.gz'))" ] }, { @@ -205,11 +209,11 @@ "from utils import load_data\n", "\n", "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n", - "X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n", - "y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n", + "X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0\n", + "y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)\n", "\n", - "X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n", - "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", + "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n", + "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)\n", "\n", "count = 0\n", "sample_size = 30\n", @@ -239,10 +243,10 @@ "outputs": [], "source": [ "from azureml.core.dataset import Dataset\n", - "web_paths = ['http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',\n", - " 'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',\n", - " 'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',\n", - " 'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'\n", + "web_paths = ['https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',\n", + " 'https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',\n", + " 'https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',\n", + " 'https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz'\n", " ]\n", "dataset = Dataset.File.from_files(path = web_paths)" ] @@ -251,7 +255,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You may want to regiester datasets using the register() method to your workspace so they can be shared with others, reused across various experiments, and referred to by name 
in your training script." + "You may want to regiester datasets using the register() method to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script.\n", + "You can try get the dataset first to see if it's already registered." ] }, { @@ -260,10 +265,18 @@ "metadata": {}, "outputs": [], "source": [ - "dataset = dataset.register(workspace = ws,\n", - " name = 'mnist dataset',\n", - " description='training and test dataset',\n", - " create_new_version=True)\n", + "dataset_registered = False\n", + "try:\n", + " temp = Dataset.get_by_name(workspace = ws, name = 'mnist-dataset')\n", + " dataset_registered = True\n", + "except:\n", + " print(\"The dataset mnist-dataset is not registered in workspace yet.\")\n", + "\n", + "if not dataset_registered:\n", + " dataset = dataset.register(workspace = ws,\n", + " name = 'mnist-dataset',\n", + " description='training and test dataset',\n", + " create_new_version=True)\n", "# list the files referenced by dataset\n", "dataset.to_path()" ] @@ -819,6 +832,15 @@ "htr.wait_for_completion(show_output=True)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert(htr.get_status() == \"Completed\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -908,13 +930,16 @@ "def init():\n", " global X, output, sess\n", " tf.reset_default_graph()\n", - " model_root = Model.get_model_path('tf-dnn-mnist')\n", - " saver = tf.train.import_meta_graph(os.path.join(model_root, 'mnist-tf.model.meta'))\n", + " model_root = os.getenv('AZUREML_MODEL_DIR')\n", + " # the name of the folder in which to look for tensorflow model files\n", + " tf_model_folder = 'model'\n", + " saver = tf.train.import_meta_graph(\n", + " os.path.join(model_root, tf_model_folder, 'mnist-tf.model.meta'))\n", " X = tf.get_default_graph().get_tensor_by_name(\"network/X:0\")\n", " output = tf.get_default_graph().get_tensor_by_name(\"network/output/MatMul:0\")\n", - " \n", + "\n", " sess = tf.Session()\n", - " saver.restore(sess, os.path.join(model_root, 'mnist-tf.model'))\n", + " saver.restore(sess, os.path.join(model_root, tf_model_folder, 'mnist-tf.model'))\n", "\n", "def run(raw_data):\n", " data = np.array(json.loads(raw_data)['data'])\n", @@ -942,7 +967,8 @@ "\n", "cd = CondaDependencies.create()\n", "cd.add_conda_package('numpy')\n", - "cd.add_tensorflow_conda_package()\n", + "cd.add_pip_package('tensorflow==1.13.1')\n", + "cd.add_pip_package(\"azureml-defaults\")\n", "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", "\n", "print(cd.serialize_to_string())" @@ -964,12 +990,12 @@ "source": [ "from azureml.core.webservice import AciWebservice\n", "from azureml.core.model import InferenceConfig\n", - "from azureml.core.webservice import Webservice\n", "from azureml.core.model import Model\n", + "from azureml.core.environment import Environment\n", + "\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " conda_file=\"myenv.yml\")\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", " memory_gb=1, \n", @@ -1110,13 +1136,11 @@ "metadata": {}, "outputs": [], "source": [ - "models = ws.models\n", - "for name, model in models.items():\n", - " print(\"Model: {}, ID: {}\".format(name, 
model.id))\n", + "model = ws.models['tf-dnn-mnist']\n", + "print(\"Model: {}, ID: {}\".format('tf-dnn-mnist', model.id))\n", " \n", - "webservices = ws.webservices\n", - "for name, webservice in webservices.items():\n", - " print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))" + "webservice = ws.webservices['tf-mnist-svc']\n", + "print(\"Webservice: {}, scoring URI: {}\".format('tf-mnist-svc', webservice.scoring_uri))" ] }, { diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/tf_horovod_word2vec.py b/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/tf_horovod_word2vec.py index e075d4e8e..483053302 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/tf_horovod_word2vec.py +++ b/how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/tf_horovod_word2vec.py @@ -37,7 +37,7 @@ print("the input data is at %s" % input_data) # Step 1: Read data. -filename = glob.glob(os.path.join(input_data, '**/text8.zip'), recursive=True)[0] +filename = input_data # Read the data into a list of strings. diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/training/hyperparameter-tune-and-warm-start-with-tensorflow/hyperparameter-tune-and-warm-start-with-tensorflow.ipynb b/how-to-use-azureml/ml-frameworks/tensorflow/training/hyperparameter-tune-and-warm-start-with-tensorflow/hyperparameter-tune-and-warm-start-with-tensorflow.ipynb index 86fc3d9b0..aabcacc51 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/training/hyperparameter-tune-and-warm-start-with-tensorflow/hyperparameter-tune-and-warm-start-with-tensorflow.ipynb +++ b/how-to-use-azureml/ml-frameworks/tensorflow/training/hyperparameter-tune-and-warm-start-with-tensorflow/hyperparameter-tune-and-warm-start-with-tensorflow.ipynb @@ -149,7 +149,7 @@ "script_folder = './tf-mnist'\n", "os.makedirs(script_folder, exist_ok=True)\n", "\n", - "exp = Experiment(workspace=ws, name='tf-mnist')" + "exp = Experiment(workspace=ws, name='tf-mnist-2')" ] }, { @@ -171,13 +171,17 @@ "outputs": [], "source": [ "import urllib\n", + "data_folder = 'data'\n", + "os.makedirs(data_folder, exist_ok=True)\n", "\n", - "os.makedirs('./data/mnist', exist_ok=True)\n", - "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')" + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'train-images.gz'))\n", + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'train-labels.gz'))\n", + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'test-images.gz'))\n", + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',\n", + 
" filename=os.path.join(data_folder, 'test-labels.gz'))" ] }, { @@ -204,13 +208,13 @@ "source": [ "from utils import load_data\n", "\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the neural network converge faster.\n", - "X_train = load_data('./data/mnist/train-images.gz', False) / 255.0\n", - "y_train = load_data('./data/mnist/train-labels.gz', True).reshape(-1)\n", - "\n", - "X_test = load_data('./data/mnist/test-images.gz', False) / 255.0\n", - "y_test = load_data('./data/mnist/test-labels.gz', True).reshape(-1)\n", + "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n", + "X_train = load_data(os.path.join(data_folder, 'train-images.gz'), False) / 255.0\n", + "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n", + "y_train = load_data(os.path.join(data_folder, 'train-labels.gz'), True).reshape(-1)\n", + "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)\n", "\n", + "# now let's show some randomly chosen images from the training set.\n", "count = 0\n", "sample_size = 30\n", "plt.figure(figsize = (16, 6))\n", @@ -219,8 +223,8 @@ " plt.subplot(1, sample_size, count)\n", " plt.axhline('')\n", " plt.axvline('')\n", - " plt.text(x = 10, y = -10, s = y_train[i], fontsize = 18)\n", - " plt.imshow(X_train[i].reshape(28, 28), cmap = plt.cm.Greys)\n", + " plt.text(x=10, y=-10, s=y_train[i], fontsize=18)\n", + " plt.imshow(X_train[i].reshape(28, 28), cmap=plt.cm.Greys)\n", "plt.show()" ] }, @@ -251,7 +255,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Use the register() method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script." + "Use the register() method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script.\n", + "You can try get the dataset first to see if it's already registered." 
] }, { @@ -260,10 +265,18 @@ "metadata": {}, "outputs": [], "source": [ - "dataset = dataset.register(workspace = ws,\n", - " name = 'mnist dataset',\n", - " description='training and test dataset',\n", - " create_new_version=True)" + "dataset_registered = False\n", + "try:\n", + " temp = Dataset.get_by_name(workspace = ws, name = 'mnist-dataset')\n", + " dataset_registered = True\n", + "except:\n", + " print(\"The dataset mnist-dataset is not registered in workspace yet.\")\n", + "\n", + "if not dataset_registered:\n", + " dataset = dataset.register(workspace = ws,\n", + " name = 'mnist-dataset',\n", + " description='training and test dataset',\n", + " create_new_version=True)" ] }, { @@ -630,6 +643,15 @@ "htr.wait_for_completion(show_output=True)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert(htr.get_status() == \"Completed\")" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb b/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb index be6851feb..71f1e7ec3 100644 --- a/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb +++ b/how-to-use-azureml/ml-frameworks/tensorflow/training/train-tensorflow-resume-training/train-tensorflow-resume-training.ipynb @@ -170,7 +170,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "you may want to register datasets using the register() method to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script." + "You may want to register datasets to your workspace using the register() method so they can be shared with others, reused across various experiments, and referred to by name in your training script.\n", + "You can try to get the dataset first to see if it's already registered."
] }, { @@ -179,11 +180,19 @@ "metadata": {}, "outputs": [], "source": [ - "#register dataset to workspace\n", - "dataset = dataset.register(workspace = ws,\n", - " name = 'mnist dataset',\n", - " description='training and test dataset',\n", - " create_new_version=True)" + "dataset_registered = False\n", + "try:\n", + " temp = Dataset.get_by_name(workspace = ws, name = 'mnist-dataset')\n", + " dataset_registered = True\n", + "except:\n", + " print(\"The dataset mnist-dataset is not registered in workspace yet.\")\n", + "\n", + "if not dataset_registered:\n", + " #register dataset to workspace\n", + " dataset = dataset.register(workspace = ws,\n", + " name = 'mnist-dataset',\n", + " description='training and test dataset',\n", + " create_new_version=True)" ] }, { diff --git a/how-to-use-azureml/monitor-models/data-drift/dataset/testing.csv b/how-to-use-azureml/monitor-models/data-drift/dataset/testing.csv new file mode 100644 index 000000000..64a243bb2 --- /dev/null +++ b/how-to-use-azureml/monitor-models/data-drift/dataset/testing.csv @@ -0,0 +1,346 @@ +latitude,longitude,temperature,windAngle,windSpeed,elevation +26.536,-81.755,17.8,10.0,2.1,9.0 +26.536,-81.755,16.7,360.0,1.5,9.0 +26.536,-81.755,16.1,350.0,1.5,9.0 +26.536,-81.755,15.0,0.0,0.0,9.0 +26.536,-81.755,14.4,350.0,1.5,9.0 +26.536,-81.755,0.0,0.0,0.0,9.0 +26.536,-81.755,13.9,360.0,2.1,9.0 +26.536,-81.755,13.3,350.0,1.5,9.0 +26.536,-81.755,13.3,10.0,2.1,9.0 +26.536,-81.755,13.3,360.0,1.5,9.0 +26.536,-81.755,13.3,0.0,0.0,9.0 +26.536,-81.755,12.2,0.0,0.0,9.0 +26.536,-81.755,11.7,0.0,0.0,9.0 +26.536,-81.755,14.4,0.0,0.0,9.0 +26.536,-81.755,17.2,10.0,2.6,9.0 +26.536,-81.755,20.0,20.0,2.6,9.0 +26.536,-81.755,22.2,10.0,3.6,9.0 +26.536,-81.755,23.3,30.0,4.6,9.0 +26.536,-81.755,23.3,330.0,2.6,9.0 +26.536,-81.755,24.4,0.0,0.0,9.0 +26.536,-81.755,25.0,360.0,3.1,9.0 +26.536,-81.755,24.4,20.0,4.1,9.0 +26.536,-81.755,23.3,10.0,2.6,9.0 +26.536,-81.755,21.1,30.0,2.1,9.0 +26.536,-81.755,18.3,0.0,0.0,9.0 +26.536,-81.755,17.2,30.0,2.1,9.0 +26.536,-81.755,15.6,60.0,2.6,9.0 +26.536,-81.755,15.6,0.0,0.0,9.0 +26.536,-81.755,13.9,60.0,2.6,9.0 +26.536,-81.755,12.8,70.0,2.6,9.0 +26.536,-81.755,0.0,0.0,0.0,9.0 +26.536,-81.755,11.7,70.0,2.1,9.0 +26.536,-81.755,12.2,20.0,2.1,9.0 +26.536,-81.755,11.7,30.0,1.5,9.0 +26.536,-81.755,11.1,40.0,2.1,9.0 +26.536,-81.755,12.2,40.0,2.6,9.0 +26.536,-81.755,12.2,30.0,2.6,9.0 +26.536,-81.755,12.2,0.0,0.0,9.0 +26.536,-81.755,15.0,30.0,6.2,9.0 +26.536,-81.755,17.2,50.0,3.6,9.0 +26.536,-81.755,20.6,60.0,5.1,9.0 +26.536,-81.755,22.8,50.0,4.6,9.0 +26.536,-81.755,24.4,80.0,6.2,9.0 +26.536,-81.755,25.0,100.0,5.7,9.0 +26.536,-81.755,25.6,60.0,3.1,9.0 +26.536,-81.755,25.6,80.0,4.6,9.0 +26.536,-81.755,25.0,90.0,5.1,9.0 +26.536,-81.755,24.4,80.0,5.1,9.0 +26.536,-81.755,21.1,60.0,2.6,9.0 +26.536,-81.755,19.4,70.0,3.6,9.0 +26.536,-81.755,18.3,70.0,2.6,9.0 +26.536,-81.755,18.3,80.0,2.6,9.0 +26.536,-81.755,17.2,60.0,1.5,9.0 +26.536,-81.755,16.1,70.0,2.6,9.0 +26.536,-81.755,15.6,70.0,2.6,9.0 +26.536,-81.755,0.0,0.0,0.0,9.0 +26.536,-81.755,16.1,50.0,2.6,9.0 +26.536,-81.755,15.6,50.0,2.1,9.0 +26.536,-81.755,15.0,50.0,1.5,9.0 +26.536,-81.755,15.0,0.0,0.0,9.0 +26.536,-81.755,15.0,0.0,0.0,9.0 +26.536,-81.755,14.4,0.0,0.0,9.0 +26.536,-81.755,14.4,30.0,4.1,9.0 +26.536,-81.755,16.1,40.0,1.5,9.0 +26.536,-81.755,19.4,0.0,1.5,9.0 +26.536,-81.755,22.8,90.0,2.6,9.0 +26.536,-81.755,24.4,130.0,3.6,9.0 +26.536,-81.755,25.6,100.0,4.6,9.0 +26.536,-81.755,26.1,120.0,3.1,9.0 +26.536,-81.755,26.7,0.0,2.6,9.0 +26.536,-81.755,27.2,0.0,0.0,9.0 
+26.536,-81.755,27.2,40.0,3.1,9.0 +26.536,-81.755,26.1,30.0,1.5,9.0 +26.536,-81.755,22.8,310.0,2.1,9.0 +26.536,-81.755,23.3,330.0,2.1,9.0 +-34.067,-56.238,17.5,30.0,3.1,68.0 +-34.067,-56.238,21.2,30.0,5.7,68.0 +-34.067,-56.238,24.5,30.0,3.1,68.0 +-34.067,-56.238,27.5,330.0,3.6,68.0 +-34.067,-56.238,29.2,30.0,4.1,68.0 +-34.067,-56.238,31.0,20.0,4.6,68.0 +-34.067,-56.238,33.0,360.0,2.6,68.0 +-34.067,-56.238,33.6,60.0,3.1,68.0 +-34.067,-56.238,33.6,30.0,3.6,68.0 +-34.067,-56.238,18.6,40.0,3.1,68.0 +-34.067,-56.238,22.0,120.0,1.5,68.0 +-34.067,-56.238,25.0,120.0,2.6,68.0 +-34.067,-56.238,28.6,50.0,3.1,68.0 +-34.067,-56.238,30.6,50.0,4.1,68.0 +-34.067,-56.238,31.5,30.0,6.7,68.0 +-34.067,-56.238,32.0,40.0,7.2,68.0 +-34.067,-56.238,33.0,30.0,5.7,68.0 +-34.067,-56.238,33.2,360.0,3.6,68.0 +-34.067,-56.238,20.6,30.0,3.1,68.0 +-34.067,-56.238,21.2,0.0,0.0,68.0 +-34.067,-56.238,22.0,210.0,3.1,68.0 +-34.067,-56.238,23.0,210.0,3.6,68.0 +-34.067,-56.238,24.0,180.0,6.7,68.0 +-34.067,-56.238,24.5,210.0,7.2,68.0 +-34.067,-56.238,21.0,180.0,8.2,68.0 +-34.067,-56.238,20.0,180.0,6.7,68.0 +-34.083,-56.233,20.2,180.0,7.2,68.0 +-29.917,-71.2,16.6,290.0,4.1,146.0 +-29.916,-71.2,17.0,290.0,4.1,147.0 +-29.916,-71.2,16.0,310.0,3.1,147.0 +-29.916,-71.2,16.0,300.0,2.1,147.0 +-29.917,-71.2,15.1,0.0,0.0,146.0 +-29.916,-71.2,15.0,0.0,1.0,147.0 +-29.916,-71.2,15.0,160.0,1.0,147.0 +-29.916,-71.2,15.0,120.0,1.0,147.0 +-29.917,-71.2,14.3,190.0,1.0,146.0 +-29.916,-71.2,14.0,190.0,1.0,147.0 +-29.916,-71.2,14.0,0.0,0.0,147.0 +-29.916,-71.2,14.0,100.0,3.1,147.0 +-29.917,-71.2,12.9,0.0,0.0,146.0 +-29.916,-71.2,13.0,0.0,1.0,147.0 +-29.916,-71.2,14.0,0.0,0.5,147.0 +-29.916,-71.2,15.0,0.0,0.5,147.0 +-29.917,-71.2,15.9,0.0,0.0,146.0 +-29.916,-71.2,16.0,0.0,0.0,147.0 +-29.916,-71.2,17.0,270.0,4.6,147.0 +-29.916,-71.2,19.0,260.0,4.1,147.0 +-29.917,-71.2,18.1,270.0,6.2,146.0 +-29.916,-71.2,18.0,270.0,6.2,147.0 +-29.916,-71.2,19.0,270.0,6.2,147.0 +-29.916,-71.2,20.0,260.0,5.1,147.0 +-29.917,-71.2,19.6,280.0,6.2,146.0 +-29.916,-71.2,20.0,280.0,6.2,147.0 +-29.916,-71.2,20.0,270.0,6.2,147.0 +-29.916,-71.2,19.0,280.0,6.7,147.0 +-29.917,-71.2,18.3,270.0,5.7,146.0 +-29.916,-71.2,18.0,270.0,5.7,147.0 +-29.916,-71.2,18.0,0.0,0.0,147.0 +-29.916,-71.2,17.0,280.0,4.6,147.0 +-29.917,-71.2,15.9,280.0,4.1,146.0 +-29.916,-71.2,16.0,280.0,4.1,147.0 +-29.916,-71.2,15.0,280.0,3.6,147.0 +-29.916,-71.2,15.0,280.0,3.6,147.0 +-29.917,-71.2,15.4,280.0,4.1,146.0 +-29.916,-71.2,15.0,280.0,4.1,147.0 +-29.916,-71.2,16.0,240.0,2.1,147.0 +-29.916,-71.2,15.0,0.0,0.5,147.0 +-29.917,-71.2,15.8,80.0,3.6,146.0 +-29.916,-71.2,16.0,80.0,3.6,147.0 +-29.916,-71.2,16.0,10.0,1.5,147.0 +-29.916,-71.2,16.0,100.0,1.5,147.0 +-29.917,-71.2,15.3,130.0,1.5,146.0 +-29.916,-71.2,15.0,130.0,1.5,147.0 +-29.916,-71.2,15.0,110.0,1.0,147.0 +-29.916,-71.2,16.0,280.0,6.2,147.0 +-29.917,-71.2,15.9,240.0,3.6,146.0 +-29.916,-71.2,16.0,240.0,3.6,147.0 +-29.916,-71.2,16.0,240.0,3.1,147.0 +-29.916,-71.2,16.0,220.0,3.1,147.0 +-29.917,-71.2,16.4,260.0,3.1,146.0 +-29.916,-71.2,16.0,260.0,3.1,147.0 +-29.916,-71.2,17.0,230.0,2.6,147.0 +-29.916,-71.2,18.0,0.0,1.5,147.0 +-29.917,-71.2,20.3,340.0,2.6,146.0 +-29.916,-71.2,20.0,340.0,2.6,147.0 +-29.916,-71.2,21.0,270.0,5.1,147.0 +-29.916,-71.2,20.0,270.0,6.7,147.0 +-29.917,-71.2,19.2,280.0,6.7,146.0 +-29.916,-71.2,19.0,280.0,6.7,147.0 +-29.916,-71.2,19.0,310.0,2.6,147.0 +-29.916,-71.2,18.0,270.0,5.1,147.0 +-29.917,-71.2,17.0,300.0,4.6,146.0 +-29.916,-71.2,17.0,300.0,4.6,147.0 +-29.916,-71.2,17.0,300.0,3.6,147.0 +-29.916,-71.2,17.0,290.0,3.1,147.0 
+-29.917,-71.2,16.3,290.0,2.1,146.0 +-29.916,-71.2,16.0,290.0,2.1,147.0 +-29.916,-71.2,17.0,270.0,1.0,147.0 +-29.916,-71.2,17.0,0.0,0.5,147.0 +-29.917,-71.2,16.5,160.0,2.1,146.0 +-29.916,-71.2,17.0,160.0,2.1,147.0 +-29.916,-71.2,15.0,120.0,3.1,147.0 +-29.916,-71.2,16.0,180.0,1.5,147.0 +-29.917,-71.2,14.7,0.0,0.0,146.0 +-29.916,-71.2,15.0,0.0,1.0,147.0 +-29.916,-71.2,15.0,300.0,1.0,147.0 +-29.916,-71.2,16.0,0.0,0.0,147.0 +-29.917,-71.2,18.5,110.0,1.0,146.0 +-29.916,-71.2,19.0,110.0,1.0,147.0 +-29.916,-71.2,20.0,270.0,3.6,147.0 +-29.916,-71.2,20.0,270.0,5.7,147.0 +-29.917,-71.2,20.0,280.0,6.2,146.0 +-29.916,-71.2,20.0,280.0,6.2,147.0 +-29.916,-71.2,21.0,290.0,6.7,147.0 +-29.916,-71.2,20.0,270.0,6.2,147.0 +-29.917,-71.2,21.0,260.0,6.7,146.0 +-29.916,-71.2,21.0,260.0,6.7,147.0 +-29.916,-71.2,20.0,270.0,6.2,147.0 +-29.916,-71.2,19.0,260.0,5.1,147.0 +-29.916,-71.2,18.0,280.0,4.6,147.0 +-29.917,-71.2,17.5,280.0,3.1,146.0 +-29.916,-71.2,18.0,280.0,3.1,147.0 +30.349,-85.788,11.1,0.0,0.0,21.0 +30.349,-85.788,11.1,0.0,0.0,21.0 +30.349,-85.788,9.4,0.0,0.0,21.0 +30.349,-85.788,9.4,0.0,0.0,21.0 +30.349,-85.788,8.3,300.0,2.1,21.0 +30.349,-85.788,11.1,280.0,1.5,21.0 +30.349,-85.788,0.0,0.0,0.0,21.0 +30.349,-85.788,10.6,320.0,3.1,21.0 +30.349,-85.788,9.4,310.0,3.1,21.0 +30.349,-85.788,7.8,320.0,2.6,21.0 +30.349,-85.788,6.1,340.0,2.1,21.0 +30.349,-85.788,6.7,330.0,2.6,21.0 +30.349,-85.788,6.1,310.0,1.5,21.0 +30.349,-85.788,7.2,310.0,2.1,21.0 +30.349,-85.788,12.8,360.0,3.1,21.0 +30.349,-85.788,15.0,0.0,3.1,21.0 +30.349,-85.788,16.7,20.0,4.6,21.0 +30.349,-85.788,18.9,30.0,5.1,21.0 +30.349,-85.788,19.4,10.0,4.1,21.0 +30.349,-85.788,21.1,330.0,2.6,21.0 +30.349,-85.788,21.1,10.0,4.6,21.0 +30.349,-85.788,21.7,360.0,4.1,21.0 +30.349,-85.788,21.7,30.0,2.1,21.0 +30.349,-85.788,21.7,330.0,2.6,21.0 +30.349,-85.788,16.1,350.0,2.1,21.0 +30.349,-85.788,11.7,0.0,0.0,21.0 +30.349,-85.788,8.9,0.0,0.0,21.0 +30.349,-85.788,9.4,0.0,0.0,21.0 +30.349,-85.788,7.8,0.0,0.0,21.0 +30.349,-85.788,11.1,30.0,3.1,21.0 +30.349,-85.788,7.2,0.0,0.0,21.0 +30.349,-85.788,7.2,0.0,0.0,21.0 +30.349,-85.788,0.0,0.0,0.0,21.0 +30.349,-85.788,7.8,30.0,2.1,21.0 +30.349,-85.788,8.3,40.0,2.6,21.0 +30.349,-85.788,7.2,50.0,1.5,21.0 +30.349,-85.788,8.3,60.0,1.5,21.0 +30.349,-85.788,5.6,40.0,2.1,21.0 +30.349,-85.788,6.7,40.0,2.1,21.0 +30.349,-85.788,7.8,50.0,3.1,21.0 +30.349,-85.788,11.7,70.0,2.6,21.0 +30.349,-85.788,15.6,70.0,3.1,21.0 +30.349,-85.788,18.9,100.0,3.6,21.0 +30.349,-85.788,20.0,130.0,3.6,21.0 +30.349,-85.788,21.1,140.0,4.1,21.0 +30.349,-85.788,21.7,150.0,4.1,21.0 +30.349,-85.788,21.7,170.0,3.1,21.0 +30.349,-85.788,22.2,170.0,3.1,21.0 +30.349,-85.788,20.6,0.0,0.0,21.0 +30.349,-85.788,17.2,0.0,0.0,21.0 +30.349,-85.788,14.4,0.0,0.0,21.0 +30.349,-85.788,12.8,100.0,1.5,21.0 +30.349,-85.788,13.3,100.0,1.5,21.0 +30.349,-85.788,10.6,0.0,0.0,21.0 +30.349,-85.788,9.4,0.0,0.0,21.0 +30.349,-85.788,7.8,0.0,0.0,21.0 +30.358,-85.799,8.3,0.0,0.0,21.0 +30.349,-85.788,0.0,0.0,0.0,21.0 +30.358,-85.799,6.7,0.0,0.0,21.0 +30.358,-85.799,7.2,0.0,0.0,21.0 +30.358,-85.799,7.2,0.0,0.0,21.0 +30.358,-85.799,8.3,50.0,1.5,21.0 +30.358,-85.799,9.4,0.0,0.0,21.0 +30.358,-85.799,8.9,0.0,0.0,21.0 +30.358,-85.799,10.0,340.0,1.5,21.0 +30.358,-85.799,12.8,40.0,1.5,21.0 +30.358,-85.799,16.7,100.0,2.1,21.0 +30.358,-85.799,21.1,100.0,1.5,21.0 +30.358,-85.799,23.3,0.0,0.0,21.0 +30.358,-85.799,25.0,180.0,4.6,21.0 +30.358,-85.799,24.4,230.0,3.6,21.0 +30.358,-85.799,25.0,210.0,4.1,21.0 +30.358,-85.799,23.9,170.0,4.1,21.0 +30.358,-85.799,22.8,0.0,0.0,21.0 
+30.358,-85.799,19.4,0.0,0.0,21.0 +30.358,-85.799,17.8,140.0,2.1,21.0 +60.383,5.333,-0.7,0.0,0.0,36.0 +60.383,5.333,0.6,270.0,2.0,36.0 +60.383,5.333,-0.9,120.0,1.0,36.0 +60.383,5.333,-1.6,130.0,2.0,36.0 +60.383,5.333,-1.4,150.0,1.0,36.0 +60.383,5.333,-1.7,0.0,0.0,36.0 +60.383,5.333,-1.7,140.0,1.0,36.0 +60.383,5.333,-1.4,0.0,0.0,36.0 +60.383,5.333,-1.0,0.0,0.0,36.0 +60.383,5.333,-1.0,150.0,1.0,36.0 +60.383,5.333,-0.7,140.0,1.0,36.0 +60.383,5.333,0.5,150.0,1.0,36.0 +60.383,5.333,1.9,0.0,0.0,36.0 +60.383,5.333,1.7,0.0,0.0,36.0 +60.383,5.333,2.1,310.0,2.0,36.0 +60.383,5.333,1.5,90.0,1.0,36.0 +60.383,5.333,1.9,290.0,1.0,36.0 +60.383,5.333,2.0,320.0,1.0,36.0 +60.383,5.333,1.9,330.0,1.0,36.0 +60.383,5.333,1.3,350.0,1.0,36.0 +60.383,5.333,1.5,120.0,1.0,36.0 +60.383,5.333,1.3,150.0,2.0,36.0 +60.383,5.333,0.8,140.0,1.0,36.0 +60.383,5.333,0.3,300.0,1.0,36.0 +60.383,5.333,0.2,140.0,1.0,36.0 +60.383,5.333,0.4,140.0,1.0,36.0 +60.383,5.333,0.5,320.0,1.0,36.0 +60.383,5.333,1.5,330.0,1.0,36.0 +60.383,5.333,1.8,40.0,1.0,36.0 +60.383,5.333,2.3,170.0,1.0,36.0 +60.383,5.333,2.7,140.0,1.0,36.0 +60.383,5.333,3.1,330.0,1.0,36.0 +60.383,5.333,3.8,350.0,1.0,36.0 +60.383,5.333,3.8,140.0,1.0,36.0 +60.383,5.333,4.1,150.0,1.0,36.0 +60.383,5.333,4.4,180.0,1.0,36.0 +60.383,5.333,4.9,300.0,1.0,36.0 +60.383,5.333,5.2,320.0,1.0,36.0 +60.383,5.333,6.7,340.0,1.0,36.0 +60.383,5.333,6.9,250.0,1.0,36.0 +60.383,5.333,7.9,300.0,2.0,36.0 +60.383,5.333,5.5,140.0,1.0,36.0 +60.383,5.333,7.1,140.0,2.0,36.0 +60.383,5.333,7.0,280.0,2.0,36.0 +60.383,5.333,4.6,170.0,1.0,36.0 +60.383,5.333,4.8,330.0,1.0,36.0 +60.383,5.333,6.4,260.0,2.0,36.0 +60.383,5.333,6.2,340.0,1.0,36.0 +60.383,5.333,5.7,320.0,2.0,36.0 +60.383,5.333,5.2,100.0,1.0,36.0 +60.383,5.333,5.1,310.0,1.0,36.0 +60.383,5.333,4.9,290.0,2.0,36.0 +60.383,5.333,4.9,310.0,2.0,36.0 +60.383,5.333,6.1,320.0,2.0,36.0 +60.383,5.333,7.0,250.0,1.0,36.0 +60.383,5.333,5.3,140.0,1.0,36.0 +60.383,5.333,6.9,350.0,1.0,36.0 +60.383,5.333,9.7,110.0,3.0,36.0 +60.383,5.333,10.3,300.0,3.0,36.0 +60.383,5.333,8.7,310.0,1.0,36.0 +60.383,5.333,9.0,270.0,3.0,36.0 +60.383,5.333,11.6,80.0,3.0,36.0 +60.383,5.333,11.4,80.0,4.0,36.0 +60.383,5.333,9.7,70.0,5.0,36.0 +60.383,5.333,9.5,80.0,6.0,36.0 +60.383,5.333,8.7,80.0,5.0,36.0 +60.383,5.333,7.7,80.0,5.0,36.0 +60.383,5.333,8.2,80.0,4.0,36.0 +60.383,5.333,7.7,30.0,1.0,36.0 +60.383,5.333,7.2,310.0,1.0,36.0 +60.383,5.333,6.8,300.0,2.0,36.0 +60.383,5.333,6.7,140.0,1.0,36.0 diff --git a/how-to-use-azureml/monitor-models/data-drift/training-dataset/training.csv b/how-to-use-azureml/monitor-models/data-drift/dataset/training.csv similarity index 100% rename from how-to-use-azureml/monitor-models/data-drift/training-dataset/training.csv rename to how-to-use-azureml/monitor-models/data-drift/dataset/training.csv diff --git a/how-to-use-azureml/monitor-models/data-drift/drift-on-aks.ipynb b/how-to-use-azureml/monitor-models/data-drift/drift-on-aks.ipynb index 1e6a7f530..9131ed12e 100644 --- a/how-to-use-azureml/monitor-models/data-drift/drift-on-aks.ipynb +++ b/how-to-use-azureml/monitor-models/data-drift/drift-on-aks.ipynb @@ -92,7 +92,7 @@ "dstore = ws.get_default_datastore()\n", "\n", "# upload weather data\n", - "dstore.upload('training-dataset', 'drift-on-aks-data', overwrite=True, show_progress=False)" + "dstore.upload('dataset', 'drift-on-aks-data', overwrite=True, show_progress=False)" ] }, { @@ -229,7 +229,7 @@ "source": [ "## Run recent weather data through the webservice \n", "\n", - "The below cells take the past 2 days of weather data, filter and transform using 
the same processes as the training dataset, and runs the data through the service." + "The cells below take Florida weather data from 2019-11-12 to 2019-11-20, filter and transform it using the same processes as the training dataset, and run it through the service." ] }, { @@ -238,16 +238,10 @@ "metadata": {}, "outputs": [], "source": [ - "from datetime import datetime, timedelta\n", - "from azureml.opendatasets import NoaaIsdWeather\n", - "\n", - "start = datetime.today() - timedelta(days=2)\n", - "end = datetime.today()\n", - "\n", - "isd = NoaaIsdWeather(start, end)\n", + "# create dataset \n", + "tset = Dataset.Tabular.from_delimited_files(dstore.path('drift-on-aks-data/testing.csv'))\n", "\n", - "df = isd.to_pandas_dataframe().fillna(0)\n", - "df = df[df['stationName'].str.contains('FLORIDA', regex=True, na=False)]\n", + "df = tset.to_pandas_dataframe().fillna(0)\n", "\n", "X_features = ['latitude', 'longitude', 'temperature', 'windAngle', 'windSpeed']\n", "y_features = ['elevation']\n", @@ -264,9 +258,9 @@ "source": [ "import json\n", "\n", - "today_data = json.dumps({'data': X.values.tolist()})\n", + "data = json.dumps({'data': X.values.tolist()})\n", "\n", - "data_encoded = bytes(today_data, encoding='utf8')\n", + "data_encoded = bytes(data, encoding='utf8')\n", "prediction = service.run(input_data=data_encoded)\n", "print(prediction)" ] }, @@ -342,6 +336,7 @@ "metadata": {}, "outputs": [], "source": [ + "from datetime import datetime, timedelta\n", "from azureml.datadrift import DataDriftDetector, AlertConfiguration\n", "\n", "services = [service_name]\n", @@ -395,15 +390,6 @@ "run = monitor.run(target_date, services, feature_list=feature_list, compute_target='cpu-cluster')" ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "time.sleep(1200)" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -423,6 +409,24 @@ "# Here we retrieve the individual service run to get its output results and metrics. 
\n", "\n", "child_run = list(run.get_children())[0]\n", + "child_run" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "child_run.wait_for_completion(wait_post_processing=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ "results, metrics = monitor.get_output(run_id=child_run.id)" ] }, diff --git a/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb b/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb index 2cf01370a..fbc0d5065 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb +++ b/how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb @@ -100,7 +100,7 @@ "\n", "# Check core SDK version number\n", "\n", - "print(\"This notebook was created using SDK version 1.0.76.2, you are currently running version\", azureml.core.VERSION)" + "print(\"This notebook was created using SDK version 1.3.0, you are currently running version\", azureml.core.VERSION)" ] }, { diff --git a/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb b/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb index aa2837152..83ff13b5d 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb +++ b/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.ipynb @@ -145,9 +145,12 @@ "import requests\n", "import os\n", "\n", - "tf_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r1.8/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py\")\n", + "tf_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r2.1/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py\")\n", + "input_code = requests.get(\"https://raw.githubusercontent.com/tensorflow/tensorflow/r2.1/tensorflow/examples/tutorials/mnist/input_data.py\")\n", "with open(os.path.join(exp_dir, \"mnist_with_summaries.py\"), \"w\") as file:\n", - " file.write(tf_code.text)" + " file.write(tf_code.text.replace(\"from tensorflow.examples.tutorials.mnist import input_data\", \"import input_data\"))\n", + "with open(os.path.join(exp_dir, \"input_data.py\"), \"w\") as file:\n", + " file.write(input_code.text)" ] }, { @@ -186,7 +189,7 @@ "from azureml.core import Experiment\n", "from azureml.core.script_run_config import ScriptRunConfig\n", "\n", - "logs_dir = os.path.join(os.curdir, \"logs\")\n", + "logs_dir = os.path.join(os.curdir, os.path.join(\"logs\", \"tb-logs\"))\n", "data_dir = os.path.abspath(os.path.join(os.curdir, \"mnist_data\"))\n", "\n", "if not path.exists(data_dir):\n", @@ -334,7 +337,8 @@ "tf_estimator = TensorFlow(source_directory=exp_dir,\n", " compute_target=attached_dsvm_compute,\n", " entry_script='mnist_with_summaries.py',\n", - " script_params=script_params)\n", + " script_params=script_params,\n", + " framework_version=\"2.0\")\n", "\n", "run = exp.submit(tf_estimator)\n", "\n", @@ -396,17 +400,16 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.core.compute import ComputeTarget, AmlCompute\n", - "\n", + "from azureml.core.compute import AmlCompute\n", "# choose a name for your cluster\n", - "cluster_name = \"cpucluster\"\n", + "cluster_name = \"cpu-cluster\"\n", "\n", "cts = ws.compute_targets\n", "found = False\n", "if cluster_name in cts and cts[cluster_name].type == 'AmlCompute':\n", - " found = True\n", - " print('Found existing 
compute target.')\n", - " compute_target = cts[cluster_name]\n", + " found = True\n", + " print('Found existing compute target.')\n", + " compute_target = cts[cluster_name]\n", "if not found:\n", " print('Creating a new compute target...')\n", " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', \n", @@ -444,7 +447,8 @@ "tf_estimator = TensorFlow(source_directory=exp_dir,\n", " compute_target=compute_target,\n", " entry_script='mnist_with_summaries.py',\n", - " script_params=script_params)\n", + " script_params=script_params,\n", + " framework_version=\"2.0\")\n", "\n", "run = exp.submit(tf_estimator)\n", "\n", @@ -539,6 +543,24 @@ "name": "roastala" } ], + "category": "training", + "compute": [ + "Local", + "DSVM", + "AML Compute" + ], + "datasets": [ + "None" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "TensorFlow" + ], + "friendly_name": "Tensorboard integration with run history", + "index_order": 3, "kernelspec": { "display_name": "Python 3.6", "language": "python", @@ -556,28 +578,10 @@ "pygments_lexer": "ipython3", "version": "3.6.6" }, - "friendly_name": "Tensorboard integration with run history", - "exclude_from_index": false, - "index_order": 3, - "category": "training", - "task": "Run a TensorFlow job and view its Tensorboard output live", - "datasets": [ - "None" - ], - "compute": [ - "Local", - "DSVM", - "AML Compute" - ], - "deployment": [ - "None" - ], - "framework": [ - "TensorFlow" - ], "tags": [ "None" - ] + ], + "task": "Run a TensorFlow job and view its Tensorboard output live" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.yml b/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.yml index cd0618ebd..024d3600f 100644 --- a/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.yml +++ b/how-to-use-azureml/track-and-monitor-experiments/tensorboard/tensorboard.yml @@ -3,4 +3,5 @@ dependencies: - pip: - azureml-sdk - azureml-tensorboard - - tensorflow<1.15 + - tensorflow + - setuptools>=41.0.0 diff --git a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb deleted file mode 100644 index 4f4159407..000000000 --- a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb +++ /dev/null @@ -1,342 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/using-mlflow/deploy-model/deploy-model.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Deploy Model as Azure Machine Learning Web Service using MLflow\n", - "\n", - "This example shows you how to use mlflow together with Azure Machine Learning services for deploying a model as a web service. You'll learn how to:\n", - "\n", - " 1. Retrieve a previously trained scikit-learn model\n", - " 2. Create a Docker image from the model\n", - " 3. Deploy the model as a web service on Azure Container Instance\n", - " 4. 
Make a scoring request against the web service.\n", - "\n", - "## Prerequisites and Set-up\n", - "\n", - "This notebook requires you to first complete the [Use MLflow with Azure Machine Learning for Local Training Run](../train-local/train-local.ipnyb) or [Use MLflow with Azure Machine Learning for Remote Training Run](../train-remote/train-remote.ipnyb) notebook, so as to have an experiment run with uploaded model in your Azure Machine Learning Workspace.\n", - "\n", - "Also install following packages if you haven't already\n", - "\n", - "```\n", - "pip install azureml-mlflow pandas\n", - "```\n", - "\n", - "Then, import necessary packages:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import mlflow\n", - "import azureml.mlflow\n", - "import azureml.core\n", - "from azureml.core import Workspace\n", - "\n", - "# Check core SDK version number\n", - "print(\"SDK version:\", azureml.core.VERSION)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Connect to workspace and set MLflow tracking URI\n", - "\n", - "Setting the tracking URI is required for retrieving the model and creating an image using the MLflow APIs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "\n", - "mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Retrieve model from previous run\n", - "\n", - "Let's retrieve the experiment from training notebook, and list the runs within that experiment." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = \"experiment-with-mlflow\"\n", - "exp = ws.experiments[experiment_name]\n", - "\n", - "runs = list(exp.get_runs())\n", - "runs" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then, let's select the most recent training run and find its ID. You also need to specify the path in run history where the model was saved. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "runid = runs[0].id\n", - "model_save_path = \"model\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create Docker image\n", - "\n", - "To create a Docker image with Azure Machine Learning for Model Management, use ```mlflow.azureml.build_image``` method. Specify the model path, your workspace, run ID and other parameters.\n", - "\n", - "MLflow automatically recognizes the model framework as scikit-learn, and creates the scoring logic and includes library dependencies for you.\n", - "\n", - "Note that the image creation can take several minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import mlflow.azureml\n", - "\n", - "azure_image, azure_model = mlflow.azureml.build_image(model_uri=\"runs:/{}/{}\".format(runid, model_save_path),\n", - " workspace=ws,\n", - " model_name='diabetes-sklearn-model',\n", - " image_name='diabetes-sklearn-image',\n", - " synchronous=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy web service\n", - "\n", - "Let's use Azure Machine Learning SDK to deploy the image as a web service. \n", - "\n", - "First, specify the deployment configuration. 
Azure Container Instance is a suitable choice for a quick dev-test deployment, while Azure Kubernetes Service is suitable for scalable production deployments.\n", - "\n", - "Then, deploy the image using Azure Machine Learning SDK's ```deploy_from_image``` method.\n", - "\n", - "Note that the deployment can take several minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice, Webservice\n", - "\n", - "\n", - "aci_config = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={\"method\" : \"sklearn\"}, \n", - " description='Diabetes model',\n", - " location='eastus2')\n", - "\n", - "\n", - "# Deploy the image to Azure Container Instances (ACI) for real-time serving\n", - "webservice = Webservice.deploy_from_image(\n", - " image=azure_image, workspace=ws, name=\"diabetes-model-1\", deployment_config=aci_config)\n", - "\n", - "\n", - "webservice.wait_for_deployment(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Make a scoring request\n", - "\n", - "Let's take the first few rows of test data and score them using the web service" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "test_rows = [\n", - " [0.01991321, 0.05068012, 0.10480869, 0.07007254, -0.03596778,\n", - " -0.0266789 , -0.02499266, -0.00259226, 0.00371174, 0.04034337],\n", - " [-0.01277963, -0.04464164, 0.06061839, 0.05285819, 0.04796534,\n", - " 0.02937467, -0.01762938, 0.03430886, 0.0702113 , 0.00720652],\n", - " [ 0.03807591, 0.05068012, 0.00888341, 0.04252958, -0.04284755,\n", - " -0.02104223, -0.03971921, -0.00259226, -0.01811827, 0.00720652]]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "MLflow-based web service for scikit-learn model requires the data to be converted to Pandas DataFrame, and then serialized as JSON. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import json\n", - "import pandas as pd\n", - "\n", - "test_rows_as_json = pd.DataFrame(test_rows).to_json(orient=\"split\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's pass the conveted and serialized data to web service to get the predictions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "predictions = webservice.run(test_rows_as_json)\n", - "\n", - "print(predictions)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can use the web service's scoring URI to make a raw HTTP request" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "webservice.scoring_uri" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can diagnose the web service using ```get_logs``` method." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "webservice.get_logs()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Next Steps\n", - "\n", - "Learn about [model management and inference in Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-model-management-and-deployment)." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "authors": [ - { - "name": "shipatel" - } - ], - "category": "deployment", - "compute": [ - "None" - ], - "datasets": [ - "Diabetes" - ], - "deployment": [ - "Azure Container Instance" - ], - "exclude_from_index": false, - "framework": [ - "Scikit-learn" - ], - "friendly_name": "Deploy a model as a web service using MLflow", - "index_order": 4, - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.4" - }, - "tags": [ - "None" - ], - "task": "Use MLflow with AML" - }, - "nbformat": 4, - "nbformat_minor": 2 -} \ No newline at end of file diff --git a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.yml b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.yml deleted file mode 100644 index 9abee0451..000000000 --- a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.yml +++ /dev/null @@ -1,8 +0,0 @@ -name: deploy-model -dependencies: -- scikit-learn -- matplotlib -- pip: - - azureml-sdk - - azureml-mlflow - - pandas diff --git a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/scripts/train.py b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/scripts/train.py deleted file mode 100644 index 85fa510f0..000000000 --- a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/scripts/train.py +++ /dev/null @@ -1,150 +0,0 @@ -# Copyright (c) 2017, PyTorch Team -# All rights reserved -# Licensed under BSD 3-Clause License. - -# This example is based on PyTorch MNIST example: -# https://github.com/pytorch/examples/blob/master/mnist/main.py - -import mlflow -import mlflow.pytorch -from mlflow.utils.environment import _mlflow_conda_env -import warnings -import cloudpickle -import torch -import torch.nn as nn -import torch.nn.functional as F -import torch.optim as optim -import torchvision -from torchvision import datasets, transforms - - -class Net(nn.Module): - def __init__(self): - super(Net, self).__init__() - self.conv1 = nn.Conv2d(1, 20, 5, 1) - self.conv2 = nn.Conv2d(20, 50, 5, 1) - self.fc1 = nn.Linear(4 * 4 * 50, 500) - self.fc2 = nn.Linear(500, 10) - - def forward(self, x): - # Added the view for reshaping score requests - x = x.view(-1, 1, 28, 28) - x = F.relu(self.conv1(x)) - x = F.max_pool2d(x, 2, 2) - x = F.relu(self.conv2(x)) - x = F.max_pool2d(x, 2, 2) - x = x.view(-1, 4 * 4 * 50) - x = F.relu(self.fc1(x)) - x = self.fc2(x) - return F.log_softmax(x, dim=1) - - -def train(args, model, device, train_loader, optimizer, epoch): - model.train() - for batch_idx, (data, target) in enumerate(train_loader): - data, target = data.to(device), target.to(device) - optimizer.zero_grad() - output = model(data) - loss = F.nll_loss(output, target) - loss.backward() - optimizer.step() - if batch_idx % args.log_interval == 0: - print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format( - epoch, batch_idx * len(data), len(train_loader.dataset), - 100. 
* batch_idx / len(train_loader), loss.item())) - # Use MLflow logging - mlflow.log_metric("epoch_loss", loss.item()) - - -def test(args, model, device, test_loader): - model.eval() - test_loss = 0 - correct = 0 - with torch.no_grad(): - for data, target in test_loader: - data, target = data.to(device), target.to(device) - output = model(data) - # sum up batch loss - test_loss += F.nll_loss(output, target, reduction="sum").item() - # get the index of the max log-probability - pred = output.argmax(dim=1, keepdim=True) - correct += pred.eq(target.view_as(pred)).sum().item() - - test_loss /= len(test_loader.dataset) - print("\n") - print("Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n".format( - test_loss, correct, len(test_loader.dataset), - 100. * correct / len(test_loader.dataset))) - # Use MLflow logging - mlflow.log_metric("average_loss", test_loss) - - -class Args(object): - pass - - -# Training settings -args = Args() -setattr(args, 'batch_size', 64) -setattr(args, 'test_batch_size', 1000) -setattr(args, 'epochs', 3) # Higher number for better convergence -setattr(args, 'lr', 0.01) -setattr(args, 'momentum', 0.5) -setattr(args, 'no_cuda', True) -setattr(args, 'seed', 1) -setattr(args, 'log_interval', 10) -setattr(args, 'save_model', True) - -use_cuda = not args.no_cuda and torch.cuda.is_available() - -torch.manual_seed(args.seed) - -device = torch.device("cuda" if use_cuda else "cpu") - -kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {} -train_loader = torch.utils.data.DataLoader( - datasets.MNIST('../data', train=True, download=True, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,)) - ])), - batch_size=args.batch_size, shuffle=True, **kwargs) -test_loader = torch.utils.data.DataLoader( - datasets.MNIST( - '../data', - train=False, - transform=transforms.Compose([ - transforms.ToTensor(), - transforms.Normalize((0.1307,), (0.3081,))])), - batch_size=args.test_batch_size, shuffle=True, **kwargs) - - -def driver(): - warnings.filterwarnings("ignore") - # Dependencies for deploying the model - pytorch_index = "https://download.pytorch.org/whl/" - pytorch_version = "cpu/torch-1.1.0-cp36-cp36m-linux_x86_64.whl" - deps = [ - "cloudpickle=={}".format(cloudpickle.__version__), - pytorch_index + pytorch_version, - "torchvision=={}".format(torchvision.__version__), - "Pillow=={}".format("6.0.0") - ] - with mlflow.start_run() as run: - model = Net().to(device) - optimizer = optim.SGD( - model.parameters(), - lr=args.lr, - momentum=args.momentum) - for epoch in range(1, args.epochs + 1): - train(args, model, device, train_loader, optimizer, epoch) - test(args, model, device, test_loader) - # Log model to run history using MLflow - if args.save_model: - model_env = _mlflow_conda_env(additional_pip_deps=deps) - mlflow.pytorch.log_model(model, "model", conda_env=model_env) - return run - - -if __name__ == "__main__": - driver() diff --git a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb deleted file mode 100644 index ccc038d34..000000000 --- a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb +++ /dev/null @@ -1,501 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Copyright (c) Microsoft Corporation. 
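A side note on the deleted `scripts/train.py` above (a sketch, not part of the original files): because `driver()` logs the network with `mlflow.pytorch.log_model` and returns the MLflow run, the trained model can be reloaded locally straight from the run's artifact store.

```python
import mlflow.pytorch

run = driver()  # driver() from scripts/train.py above returns the MLflow run
model_uri = "runs:/{}/model".format(run.info.run_id)

# Reload the exact network that was logged and put it in eval mode for scoring.
reloaded_model = mlflow.pytorch.load_model(model_uri)
reloaded_model.eval()
```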
All rights reserved.\n", - "\n", - "Licensed under the MIT License." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/using-mlflow/train-deploy-pytorch/train-deploy-pytorch.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Use MLflow with Azure Machine Learning to Train and Deploy PyTorch Image Classifier\n", - "\n", - "This example shows you how to use MLflow together with Azure Machine Learning services for tracking the metrics and artifacts while training a PyTorch model to classify MNIST digit images, and then deploy the model as a web service. You'll learn how to:\n", - "\n", - " 1. Set up MLflow tracking URI so as to use Azure ML\n", - " 2. Create experiment\n", - " 3. Instrument your model with MLflow tracking\n", - " 4. Train a PyTorch model locally\n", - " 5. Train a model on GPU compute on Azure\n", - " 6. View your experiment within your Azure ML Workspace in Azure Portal\n", - " 7. Create a Docker image from the trained model\n", - " 8. Deploy the model as a web service on Azure Container Instance\n", - " 9. Call the model to make predictions\n", - " \n", - "### Pre-requisites\n", - " \n", - "Make sure you have completed the [Configuration](../../../configuration.ipnyb) notebook to set up your Azure Machine Learning workspace and ensure other common prerequisites are met.\n", - "\n", - "Also, install mlflow-azureml package using ```pip install mlflow-azureml```. Note that mlflow-azureml installs mlflow package itself as a dependency, if you haven't done so previously.\n", - "\n", - "### Set-up\n", - "\n", - "Import packages and check versions of Azure ML SDK and MLflow installed on your computer. Then connect to your Workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import sys, os\n", - "import mlflow\n", - "import mlflow.azureml\n", - "import mlflow.sklearn\n", - "\n", - "import azureml.core\n", - "from azureml.core import Workspace\n", - "\n", - "\n", - "print(\"SDK version:\", azureml.core.VERSION)\n", - "print(\"MLflow version:\", mlflow.version.VERSION)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ws = Workspace.from_config()\n", - "ws.get_details()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Set tracking URI\n", - "\n", - "Set the MLFlow tracking URI to point to your Azure ML Workspace. The subsequent logging calls from MLFlow APIs will go to Azure ML services and will be tracked under your Workspace." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create Experiment\n", - "\n", - "In both MLflow and Azure ML, training runs are grouped into experiments. Let's create one for our experimentation." 
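A quick aside (not in this notebook): once the tracking URI points at the workspace, any plain MLflow call is recorded there, so a throwaway run is an easy way to confirm the wiring before the real training. A minimal sketch, assuming the `ws` object from the setup cells and the same experiment name used below:

```python
import mlflow

mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())
mlflow.set_experiment("pytorch-with-mlflow")

# Params and metrics from this throwaway run appear in the Azure ML workspace,
# just like the metrics logged from scripts/train.py.
with mlflow.start_run():
    mlflow.log_param("smoke_test", True)
    mlflow.log_metric("dummy_metric", 0.0)
```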
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "experiment_name = \"pytorch-with-mlflow\"\n", - "mlflow.set_experiment(experiment_name)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Train model locally while logging metrics and artifacts\n", - "\n", - "The ```scripts/train.py``` program contains the code to load the image dataset, and train and test the model. Within this program, the train.driver function wraps the end-to-end workflow.\n", - "\n", - "Within the driver, the ```mlflow.start_run``` starts MLflow tracking. Then, ```mlflow.log_metric``` functions are used to track the convergence of the neural network training iterations. Finally ```mlflow.pytorch.save_model``` is used to save the trained model in framework-aware manner.\n", - "\n", - "Let's add the program to search path, import it as a module, and then invoke the driver function. Note that the training can take few minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "lib_path = os.path.abspath(\"scripts\")\n", - "sys.path.append(lib_path)\n", - "\n", - "import train" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run = train.driver()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can view the metrics of the run at Azure Portal" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(azureml.mlflow.get_portal_url(run))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Train model on GPU compute on Azure\n", - "\n", - "Next, let's run the same script on GPU-enabled compute for faster training. If you've completed the the [Configuration](../../../configuration.ipnyb) notebook, you should have a GPU cluster named \"gpu-cluster\" available in your workspace. Otherwise, follow the instructions in the notebook to create one. For simplicity, this example uses single process on single VM to train the model.\n", - "\n", - "Create a PyTorch estimator to specify the training configuration: script, compute as well as additional packages needed. To enable MLflow tracking, include ```azureml-mlflow``` as pip package. The low-level specifications for the training run are encapsulated in the estimator instance." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.train.dnn import PyTorch\n", - "\n", - "pt = PyTorch(source_directory=\"./scripts\", \n", - " entry_script = \"train.py\", \n", - " compute_target = \"gpu-cluster\", \n", - " node_count = 1, \n", - " process_count_per_node = 1, \n", - " use_gpu=True,\n", - " pip_packages = [\"azureml-mlflow\", \"Pillow==6.0.0\"])\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Get a reference to the experiment you created previously, but this time, as Azure Machine Learning experiment object.\n", - "\n", - "Then, use ```Experiment.submit``` method to start the remote training run. Note that the first training run often takes longer as Azure Machine Learning service builds the Docker image for executing the script. Subsequent runs will be faster as cached image is used." 
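If the workspace does not yet have the "gpu-cluster" target the estimator refers to, it can also be created inline; a hedged sketch (VM size and node counts are assumptions, not taken from this notebook):

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cluster_name = "gpu-cluster"
try:
    gpu_cluster = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing compute target:", cluster_name)
except ComputeTargetException:
    # Small GPU cluster that scales to zero when idle; adjust sizes as needed.
    config = AmlCompute.provisioning_configuration(vm_size="STANDARD_NC6",
                                                   min_nodes=0,
                                                   max_nodes=2)
    gpu_cluster = ComputeTarget.create(ws, cluster_name, config)
    gpu_cluster.wait_for_completion(show_output=True)
```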
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Experiment\n", - "\n", - "exp = Experiment(ws, experiment_name)\n", - "run = exp.submit(pt)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can monitor the run and its metrics on Azure Portal." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Also, you can wait for run to complete." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.wait_for_completion(show_output=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Deploy model as web service\n", - "\n", - "To deploy a web service, first create a Docker image, and then deploy that Docker image on inferencing compute.\n", - "\n", - "The ```mlflow.azureml.build_image``` function builds a Docker image from saved PyTorch model in a framework-aware manner. It automatically creates the PyTorch-specific inferencing wrapper code and specififies package dependencies for you." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "run.get_file_names()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then build a docker image using *runs:/<run.id>/model* as the model_uri path.\n", - "\n", - "Note that the image building can take several minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "model_path = \"model\"\n", - "\n", - "\n", - "azure_image, azure_model = mlflow.azureml.build_image(model_uri='runs:/{}/{}'.format(run.id, model_path),\n", - " workspace=ws,\n", - " model_name='pytorch_mnist',\n", - " image_name='pytorch-mnist-img',\n", - " synchronous=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then, deploy the Docker image to Azure Container Instance: a serverless compute capable of running a single container. You can tag and add descriptions to help keep track of your web service. \n", - "\n", - "[Other inferencing compute choices](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where) include Azure Kubernetes Service which provides scalable endpoint suitable for production use.\n", - "\n", - "Note that the service deployment can take several minutes." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core.webservice import AciWebservice, Webservice\n", - "\n", - "aci_config = AciWebservice.deploy_configuration(cpu_cores=2, \n", - " memory_gb=5, \n", - " tags={\"data\": \"MNIST\", \"method\" : \"pytorch\"}, \n", - " description=\"Predict using webservice\")\n", - "\n", - "\n", - "# Deploy the image to Azure Container Instances (ACI) for real-time serving\n", - "webservice = Webservice.deploy_from_image(\n", - " image=azure_image, workspace=ws, name=\"pytorch-mnist-1\", deployment_config=aci_config)\n", - "\n", - "\n", - "webservice.wait_for_deployment()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Once the deployment has completed you can check the scoring URI of the web service." 
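The markdown above points to Azure Kubernetes Service as the production-grade alternative; purely as an illustration (not part of this notebook), the same image could be deployed to an already-provisioned AKS cluster. The cluster name "aks-cluster" is an assumption.

```python
from azureml.core.compute import AksCompute
from azureml.core.webservice import AksWebservice, Webservice

# Attach to an existing AKS cluster in the workspace (name is hypothetical).
aks_target = AksCompute(ws, "aks-cluster")
aks_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)

aks_service = Webservice.deploy_from_image(image=azure_image,
                                           workspace=ws,
                                           name="pytorch-mnist-aks",
                                           deployment_config=aks_config,
                                           deployment_target=aks_target)
aks_service.wait_for_deployment(show_output=True)
```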
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(\"Scoring URI is: {}\".format(webservice.scoring_uri))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In case of a service creation issue, you can use ```webservice.get_logs()``` to get logs to debug." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Make predictions using web service\n", - "\n", - "To make the web service, create a test data set as normalized PyTorch tensors. \n", - "\n", - "Then, let's define a utility function that takes a random image and converts it into format and shape suitable for as input to PyTorch inferencing end-point. The conversion is done by: \n", - "\n", - " 1. Select a random (image, label) tuple\n", - " 2. Take the image and converting the tensor to NumPy array \n", - " 3. Reshape array into 1 x 1 x N array\n", - " * 1 image in batch, 1 color channel, N = 784 pixels for MNIST images\n", - " * Note also ```x = x.view(-1, 1, 28, 28)``` in net definition in ```train.py``` program to shape incoming scoring requests.\n", - " 4. Convert the NumPy array to list to make it into a built-in type.\n", - " 5. Create a dictionary {\"data\", <list>} that can be converted to JSON string for web service requests." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from torchvision import datasets, transforms\n", - "import random\n", - "import numpy as np\n", - "\n", - "test_data = datasets.MNIST('../data', train=False, transform=transforms.Compose([\n", - " transforms.ToTensor(),\n", - " transforms.Normalize((0.1307,), (0.3081,))]))\n", - "\n", - "\n", - "def get_random_image():\n", - " image_idx = random.randint(0,len(test_data))\n", - " image_as_tensor = test_data[image_idx][0]\n", - " return {\"data\": elem for elem in image_as_tensor.numpy().reshape(1,1,-1).tolist()}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Then, invoke the web service using a random test image. Convert the dictionary containing the image to JSON string before passing it to web service.\n", - "\n", - "The response contains the raw scores for each label, with greater value indicating higher probability. Sort the labels and select the one with greatest score to get the prediction. Let's also plot the image sent to web service for comparison purposes." 
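Since the network's last layer is `F.log_softmax`, the raw scores returned by the service are log-probabilities; exponentiating them recovers class probabilities (over the full ten labels they sum to 1). A small illustration with made-up numbers, not actual notebook output:

```python
import numpy as np

# Hypothetical slice of response[0] from the service: label -> log-probability.
raw_scores = {"0": -4.1, "3": -0.2, "7": -2.3}

probabilities = {label: float(np.exp(score)) for label, score in raw_scores.items()}
predicted_label = max(probabilities, key=probabilities.get)
print(predicted_label, round(probabilities[predicted_label], 3))  # "3", ~0.819
```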
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "\n", - "import json\n", - "import matplotlib.pyplot as plt\n", - "\n", - "test_image = get_random_image()\n", - "\n", - "response = webservice.run(json.dumps(test_image))\n", - "\n", - "response = sorted(response[0].items(), key = lambda x: x[1], reverse = True)\n", - "\n", - "\n", - "print(\"Predicted label:\", response[0][0])\n", - "plt.imshow(np.array(test_image[\"data\"]).reshape(28,28), cmap = \"gray\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can also call the web service using a raw POST method against the web service" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "\n", - "response = requests.post(url=webservice.scoring_uri, data=json.dumps(test_image),headers={\"Content-type\": \"application/json\"})\n", - "print(response.text)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "authors": [ - { - "name": "shipatel" - } - ], - "category": "tutorial", - "celltoolbar": "Edit Metadata", - "compute": [ - "AML Compute" - ], - "datasets": [ - "MNIST" - ], - "deployment": [ - "Azure Container Instance" - ], - "exclude_from_index": false, - "framework": [ - "PyTorch" - ], - "friendly_name": "Use MLflow with Azure Machine Learning for training and deployment", - "index_order": 6, - "kernelspec": { - "display_name": "Python 3.6", - "language": "python", - "name": "python36" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.7.3" - }, - "name": "mlflow-sparksummit-pytorch", - "notebookId": 2495374963457641, - "tags": [ - "None" - ], - "task": "Use MLflow with Azure Machine Learning to train and deploy Pa yTorch image classifier model" - }, - "nbformat": 4, - "nbformat_minor": 1 -} \ No newline at end of file diff --git a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.yml b/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.yml deleted file mode 100644 index 182dc3cf3..000000000 --- a/how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.yml +++ /dev/null @@ -1,8 +0,0 @@ -name: train-and-deploy-pytorch -dependencies: -- matplotlib -- pip: - - azureml-sdk - - azureml-mlflow - - https://download.pytorch.org/whl/cpu/torch-1.1.0-cp35-cp35m-win_amd64.whl - - https://download.pytorch.org/whl/cpu/torchvision-0.3.0-cp35-cp35m-win_amd64.whl diff --git a/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.yml b/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.yml index 6f79f760c..17aa9f1d3 100644 --- a/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.yml +++ b/how-to-use-azureml/training-with-deep-learning/export-run-history-to-tensorboard/export-run-history-to-tensorboard.yml @@ -3,7 
+3,8 @@ dependencies: - pip: - azureml-sdk - azureml-tensorboard - - tensorflow<1.15.0 + - tensorflow - tqdm - scipy - sklearn + - setuptools>=41.0.0 diff --git a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb index b6764fcce..ed1519f29 100644 --- a/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb +++ b/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb @@ -157,10 +157,14 @@ "data_folder = os.path.join(os.getcwd(), 'data')\n", "os.makedirs(data_folder, exist_ok=True)\n", "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'train-images.gz'))\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'train-labels.gz'))\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))" + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'train-images.gz'))\n", + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'train-labels.gz'))\n", + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'test-images.gz'))\n", + "urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz',\n", + " filename=os.path.join(data_folder, 'test-labels.gz'))" ] }, { @@ -227,12 +231,10 @@ "outputs": [], "source": [ "from azureml.core.dataset import Dataset\n", - "\n", - "web_paths = [\n", - " 'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',\n", - " 'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',\n", - " 'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',\n", - " 'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'\n", + "web_paths = ['https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',\n", + " 'https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz',\n", + " 'https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz',\n", + " 'https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz'\n", " ]\n", "dataset = Dataset.File.from_files(path = web_paths)" ] @@ -241,7 +243,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Use the `register()` method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script." 
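After registration (see the updated cell below), any later session against the same workspace can pull the files by name instead of re-listing the web paths; a small sketch, not part of this notebook's diff:

```python
from azureml.core import Workspace
from azureml.core.dataset import Dataset

ws = Workspace.from_config()
mnist_ds = Dataset.get_by_name(ws, name='mnist-dataset')

# Download the four MNIST .gz files referenced by the registered FileDataset.
local_paths = mnist_ds.download(target_path='./mnist-data', overwrite=True)
print(local_paths)
```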
+ "Use the `register()` method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script.\n", + "You can try get the dataset first to see if it's already registered." ] }, { @@ -250,10 +253,18 @@ "metadata": {}, "outputs": [], "source": [ - "dataset = dataset.register(workspace = ws,\n", - " name = 'mnist dataset',\n", - " description='training and test dataset',\n", - " create_new_version=True)" + "dataset_registered = False\n", + "try:\n", + " temp = Dataset.get_by_name(workspace = ws, name = 'mnist-dataset')\n", + " dataset_registered = True\n", + "except:\n", + " print(\"The dataset mnist-dataset is not registered in workspace yet.\")\n", + "\n", + "if not dataset_registered:\n", + " dataset = dataset.register(workspace = ws,\n", + " name = 'mnist-dataset',\n", + " description='training and test dataset',\n", + " create_new_version=True)" ] }, { @@ -411,7 +422,7 @@ "metadata": {}, "outputs": [], "source": [ - "dataset = Dataset.get_by_name(ws, 'mnist dataset')\n", + "dataset = Dataset.get_by_name(ws, 'mnist-dataset')\n", "\n", "# list the files referenced by mnist dataset\n", "dataset.to_path()" @@ -799,6 +810,15 @@ "hdr.wait_for_completion(show_output=True)" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert(hdr.get_status() == \"Completed\")" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -925,6 +945,7 @@ "cd = CondaDependencies.create()\n", "cd.add_tensorflow_conda_package()\n", "cd.add_conda_package('keras==2.2.5')\n", + "cd.add_pip_package(\"azureml-defaults\")\n", "cd.save_to_file(base_directory='./', conda_file_path='myenv.yml')\n", "\n", "print(cd.serialize_to_string())" @@ -947,10 +968,11 @@ "from azureml.core.webservice import AciWebservice\n", "from azureml.core.model import InferenceConfig\n", "from azureml.core.model import Model\n", + "from azureml.core.environment import Environment\n", + "\n", "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " conda_file=\"myenv.yml\")\n", + "myenv = Environment.from_conda_specification(name=\"myenv\", file_path=\"myenv.yml\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n", "\n", "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,\n", " auth_enabled=True, # this flag generates API keys to secure access\n", @@ -1101,13 +1123,11 @@ "metadata": {}, "outputs": [], "source": [ - "models = ws.models\n", - "for name, model in models.items():\n", - " print(\"Model: {}, ID: {}\".format(name, model.id))\n", + "model = ws.models['keras-mlp-mnist']\n", + "print(\"Model: {}, ID: {}\".format('keras-mlp-mnist', model.id))\n", " \n", - "webservices = ws.webservices\n", - "for name, webservice in webservices.items():\n", - " print(\"Webservice: {}, scoring URI: {}\".format(name, webservice.scoring_uri))" + "webservice = ws.webservices['keras-mnist-svc']\n", + "print(\"Webservice: {}, scoring URI: {}\".format('keras-mnist-svc', webservice.scoring_uri))" ] }, { diff --git a/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb b/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb index b658eba05..3cacab4b9 100644 --- a/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb +++ b/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb @@ -149,6 +149,20 @@ " ssh_port=22, \n", " username=os.environ.get('hdiusername', ''), \n", " 
password=os.environ.get('hdipassword', ''))\n", + "\n", + "# The following Azure regions do not support attaching a HDI Cluster using the public IP address of the HDI Cluster.\n", + "# Instead, use the Azure Resource Manager ID of the HDI Cluster with the resource_id parameter:\n", + "# US East\n", + "# US West 2\n", + "# US South Central\n", + "# The resource ID of the HDI Cluster can be constructed using the\n", + "# subscription ID, resource group name, and cluster name using the following string format:\n", + "# /subscriptions//resourceGroups//providers/Microsoft.HDInsight/clusters/. \n", + "# If in US East, US West 2, or US South Central, use the following instead:\n", + "# attach_config = HDInsightCompute.attach_configuration(resource_id='',\n", + "# ssh_port=22,\n", + "# username=os.environ.get('hdiusername', ''),\n", + "# password=os.environ.get('hdipassword', ''))\n", " hdi_compute = ComputeTarget.attach(workspace=ws, \n", " name='myhdi', \n", " attach_configuration=attach_config)\n", @@ -272,7 +286,7 @@ "metadata": { "authors": [ { - "name": "aashishb" + "name": "sanpil" } ], "category": "training", diff --git a/how-to-use-azureml/training/train-on-local/train-on-local.ipynb b/how-to-use-azureml/training/train-on-local/train-on-local.ipynb index 5fe09adf1..11ef5a218 100644 --- a/how-to-use-azureml/training/train-on-local/train-on-local.ipynb +++ b/how-to-use-azureml/training/train-on-local/train-on-local.ipynb @@ -167,7 +167,10 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "name": "user_managed_env", + "msdoc": "how-to-track-experiments.md" + }, "outputs": [], "source": [ "from azureml.core import Environment\n", @@ -192,7 +195,10 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "name": "src", + "msdoc": "how-to-track-experiments.md" + }, "outputs": [], "source": [ "from azureml.core import ScriptRunConfig\n", @@ -204,7 +210,10 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "name": "run", + "msdoc": "how-to-track-experiments.md" + }, "outputs": [], "source": [ "run = exp.submit(src)" diff --git a/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb b/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb index 415fb061b..76a089068 100644 --- a/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb +++ b/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.ipynb @@ -23,7 +23,7 @@ "# 04. Train in a remote Linux VM\n", "* Create Workspace\n", "* Create `train.py` file\n", - "* Create and Attach a Remote VM (eg. DSVM) as compute resource.\n", + "* Create and Attach a Remote VM (eg. 
DSVM) as compute resource\n", "* Upload data files into default datastore\n", "* Configure & execute a run in a few different ways\n", " - Use system-built conda\n", @@ -126,7 +126,7 @@ "metadata": {}, "outputs": [], "source": [ - "# get the default datastore\n", + "# Get the default datastore\n", "ds = ws.get_default_datastore()\n", "print(ds.name, ds.datastore_type, ds.account_name, ds.container_name)" ] @@ -266,7 +266,23 @@ " ssh_port=22,\n", " username=username,\n", " private_key_file='./.ssh/id_rsa')\n", - " attached_dsvm_compute = ComputeTarget.attach(workspace=ws,\n", + "\n", + "\n", + "# The following Azure regions do not support attaching a virtual machine using the public IP address of the VM.\n", + "# Instead, use the Azure Resource Manager ID of the VM with the resource_id parameter:\n", + "# US East\n", + "# US West 2\n", + "# US South Central\n", + "# The resource ID of the VM can be constructed using the\n", + "# subscription ID, resource group name, and VM name using the following string format:\n", + "# /subscriptions//resourceGroups//providers/Microsoft.Compute/virtualMachines/. \n", + "# If in US East, US West 2, or US South Central, use the following instead:\n", + "# attach_config = RemoteCompute.attach_configuration(resource_id='',\n", + "# ssh_port=22,\n", + "# username='username',\n", + "# private_key_file='./.ssh/id_rsa')\n", + "\n", + " attached_dsvm_compute = ComputeTarget.attach(workspace=ws,\n", " name=compute_target_name,\n", " attach_configuration=attach_config)\n", " attached_dsvm_compute.wait_for_completion(show_output=True)" @@ -313,11 +329,11 @@ "from azureml.core import ScriptRunConfig\n", "from uuid import uuid4\n", "\n", + "script_arguments = ['--data-folder', dataset.as_named_input('diabetes').as_mount('/tmp/{}'.format(uuid4()))]\n", "src = ScriptRunConfig(source_directory=script_folder, \n", " script='train.py', \n", " # pass the dataset as a parameter to the training script\n", - " arguments=['--data-folder', \n", - " dataset.as_named_input('diabetes').as_mount('/tmp/{}'.format(uuid4()))]\n", + " arguments=script_arguments\n", " ) \n", "\n", "src.run_config.framework = \"python\"\n", @@ -392,14 +408,14 @@ "metadata": {}, "outputs": [], "source": [ - "run = exp.submit(config=src)\n", + " run = exp.submit(config=src)\n", "\n", - "from azureml.exceptions import ActivityFailedException\n", + " from azureml.exceptions import ActivityFailedException\n", "\n", - "try:\n", - " run.wait_for_completion(show_output=True)\n", - "except ActivityFailedException as ex:\n", - " print(ex)" + " try:\n", + " run.wait_for_completion(show_output=True)\n", + " except ActivityFailedException as ex:\n", + " print(ex)" ] }, { @@ -421,7 +437,8 @@ "with open(os.path.join(script_folder, './train2.py'), 'r') as training_script:\n", " print(training_script.read())\n", " \n", - "src.script = \"train2.py\"" + "src.script = \"train2.py\"\n", + "src.arguments = None" ] }, { @@ -493,6 +510,7 @@ "outputs": [], "source": [ "src.script = \"train.py\"\n", + "src.arguments = script_arguments\n", "\n", "run = exp.submit(config=src)\n", "\n", diff --git a/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb b/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb index 95b5df68e..7e3cbdde5 100644 --- a/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb +++ b/how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb @@ -80,7 +80,9 @@ "metadata": { "tags": [ "install" - ] + ], + "name": "load_ws", + 
"msdoc": "how-to-track-experiments.md" }, "outputs": [], "source": [ @@ -113,7 +115,10 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "name": "load_data", + "msdoc": "how-to-track-experiments.md" + }, "outputs": [], "source": [ "from sklearn.datasets import load_diabetes\n", @@ -155,7 +160,9 @@ "tags": [ "local run", "outputs upload" - ] + ], + "name": "create_experiment", + "msdoc": "how-to-track-experiments.md" }, "outputs": [], "source": [ diff --git a/how-to-use-azureml/work-with-data/README.md b/how-to-use-azureml/work-with-data/README.md index b5ed63401..06387de91 100644 --- a/how-to-use-azureml/work-with-data/README.md +++ b/how-to-use-azureml/work-with-data/README.md @@ -10,7 +10,7 @@ With Azure Machine Learning datasets, you can: ## Learn how to use Azure Machine Learning datasets * [Create and register datasets](https://aka.ms/azureml/howto/createdatasets) -* Use [Datasets in training](datasets-tutorial/train-with-datasets.ipynb) +* Use [Datasets in training](datasets-tutorial/train-with-datasets/train-with-datasets.ipynb) * Use TabularDatasets in [automated machine learning training](https://aka.ms/automl-dataset) * Use FileDatasets in [image classification](https://aka.ms/filedataset-samplenotebook) * Use FileDatasets in [deep learning with hyperparameter tuning](https://aka.ms/filedataset-hyperdrive) diff --git a/how-to-use-azureml/work-with-data/datadrift-tutorial/datadrift-tutorial.ipynb b/how-to-use-azureml/work-with-data/datadrift-tutorial/datadrift-tutorial.ipynb index d6f0f1ed6..8459192c7 100644 --- a/how-to-use-azureml/work-with-data/datadrift-tutorial/datadrift-tutorial.ipynb +++ b/how-to-use-azureml/work-with-data/datadrift-tutorial/datadrift-tutorial.ipynb @@ -206,7 +206,11 @@ { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "datadrift-remarks-sample" + ] + }, "outputs": [], "source": [ "from azureml.datadrift import DataDriftDetector, AlertConfiguration\n", @@ -290,32 +294,12 @@ "outputs": [], "source": [ "# backfill for one month\n", - "backfill = monitor.backfill(datetime(2019, 9, 1), datetime(2019, 10, 1))\n", + "backfill_start_date = datetime(2019, 9, 1)\n", + "backfill_end_date = datetime(2019, 10, 1)\n", + "backfill = monitor.backfill(backfill_start_date, backfill_end_date)\n", "backfill" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Enable the monitor's pipeline schedule\n", - "\n", - "Turn on a scheduled pipeline which will anlayze the target dataset for drift every `frequency`. Use the latency parameter to adjust the start time of the pipeline. For instance, if it takes 24 hours for my data processing pipelines for data to arrive in the target dataset, set latency to 24. 
" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# enable the pipeline schedule and recieve email alerts\n", - "monitor.enable_schedule()\n", - "\n", - "# disable the pipeline schedule \n", - "#monitor.disable_schedule()" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -332,8 +316,7 @@ "outputs": [], "source": [ "# make sure the backfill has completed\n", - "import time\n", - "time.sleep(1200)" + "backfill.wait_for_completion(wait_post_processing=True)" ] }, { @@ -353,16 +336,16 @@ "outputs": [], "source": [ "# plot the results from Python SDK \n", - "monitor.show()" + "monitor.show(backfill_start_date, backfill_end_date)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## See results in Azure Machine Learning studio (Enterprise only)\n", + "## Enable the monitor's pipeline schedule\n", "\n", - "The below cell will print a link to the monitor in the Azure Machine Learning studio, where the results can be viewed. Alertnatively, use the `show` or `get_results` to get and plot data drift results in Python." + "Turn on a scheduled pipeline which will anlayze the target dataset for drift every `frequency`. Use the latency parameter to adjust the start time of the pipeline. For instance, if it takes 24 hours for my data processing pipelines for data to arrive in the target dataset, set latency to 24. " ] }, { @@ -371,8 +354,11 @@ "metadata": {}, "outputs": [], "source": [ - "link = 'https://ml.azure.com/data/monitor/{}?wsid=/subscriptions/{}/resourcegroups/{}/workspaces/{}'.format(monitor.name, ws.subscription_id, ws.resource_group, ws.name)\n", - "print(link)" + "# enable the pipeline schedule and recieve email alerts\n", + "monitor.enable_schedule()\n", + "\n", + "# disable the pipeline schedule \n", + "#monitor.disable_schedule()" ] }, { diff --git a/how-to-use-azureml/work-with-data/dataset-api-change-notice.md b/how-to-use-azureml/work-with-data/dataset-api-change-notice.md index 46fdb8d8b..34c6fb97a 100644 --- a/how-to-use-azureml/work-with-data/dataset-api-change-notice.md +++ b/how-to-use-azureml/work-with-data/dataset-api-change-notice.md @@ -18,7 +18,7 @@ Methods to be deprecated|Replacement in the new version| [Dataset.from_parquet_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-parquet-files-path--include-path-false--partition-format-none-)|[Dataset.Tabular.from_parquet_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-parquet-files-path--validate-true--include-path-false--set-column-types-none-) [Dataset.from_sql_query()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-sql-query-data-source--query-)|[Dataset.Tabular.from_sql_query()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-sql-query-query--validate-true--set-column-types-none-) [Dataset.from_excel_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-excel-files-path--sheet-name-none--use-column-headers-false--skip-rows-0--include-path-false--infer-column-types-true--partition-format-none-)|We will support creating a TabularDataset from Excel files in a future release. 
-[Dataset.from_json_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-json-files-path--encoding--fileencoding-utf8--0---flatten-nested-arrays-false--include-path-false--partition-format-none-)| We will support creating a TabularDataset from json files in a future release. +[Dataset.from_json_files()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#from-json-files-path--encoding--fileencoding-utf8--0---flatten-nested-arrays-false--include-path-false--partition-format-none-)| [Dataset.Tabular.from_json_lines_files](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-json-lines-files-path--validate-true--include-path-false--set-column-types-none--partition-format-none-) [Dataset.to_pandas_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#to-pandas-dataframe--)|[TabularDataset.to_pandas_dataframe()](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py#to-pandas-dataframe--) [Dataset.to_spark_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#to-spark-dataframe--)|[TabularDataset.to_spark_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py#to-spark-dataframe--) [Dataset.head(3)](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#head-count-)|[TabularDataset.take(3).to_pandas_dataframe()](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py#take-count-) @@ -29,27 +29,13 @@ Methods to be deprecated|Replacement in the new version| ## Why should I use the new Dataset API if I'm only dealing with tabular data? The current Dataset will be kept around for backward compatibility, but we strongly encourage you to move to TabularDataset for the new capabilities listed below: +- You are able to version and track the new typed Datasets. [Learn How](https://aka.ms/azureml/howto/versiondata) - You are able to use TabularDatasets as automated ML input. [Learn How](https://aka.ms/automl-dataset) -- You are able to version the new typed Datasets. [Learn How](https://aka.ms/azureml/howto/createdatasets) -- You will be able to use the new typed Datasets as ScriptRun, Estimator, HyperDrive input. -- You will be able to use the new typed Datasets in Azure Machine Learning Pipelines. -- You will be able to track the lineage of new typed Datasets for model reproducibility. - +- You are able to use the new typed Datasets as ScriptRun, Estimator, HyperDrive input. [Learn How](https://aka.ms/train-with-datasets) +- You are be able to use the new typed Datasets in Azure Machine Learning Pipelines. [Learn How](https://aka.ms/pl-datasets) ## How to migrate registered Datasets to new typed Datasets? -If you have registered Datasets created using the old API, you can easily migrate these old Datasets to the new typed Datasets using the following code. -```Python -from azureml.core.workspace import Workspace -from azureml.core.dataset import Dataset - -# get existing workspace -workspace = Workspace.from_config() -# This method will convert old Dataset without type to either a TabularDataset or a FileDataset object automatically. 
-new_ds = Dataset.get_by_name(workspace, 'old_ds_name') - -# register the new typed Dataset with the workspace -new_ds.register(workspace, 'new_ds_name') -``` +We handled the migration for you. All legacy datasets are migrated to new typed Datasets automatically. To use registered datasets, simply call [Dataset.get_by_name](https://docs.microsoft.com/python/api/azureml-core/azureml.core.dataset.dataset?view=azure-ml-py#get-by-name-workspace--name--version--latest--). ## How to provide feedback? If you have any feedback about our product, or if there is any missing capability that is essential for you to use new Dataset API, please email us at [AskAzureMLData@microsoft.com](mailto:AskAzureMLData@microsoft.com). diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/labeled-datasets/labeled-datasets.ipynb b/how-to-use-azureml/work-with-data/datasets-tutorial/labeled-datasets/labeled-datasets.ipynb new file mode 100644 index 000000000..ed210f6ad --- /dev/null +++ b/how-to-use-azureml/work-with-data/datasets-tutorial/labeled-datasets/labeled-datasets.ipynb @@ -0,0 +1,403 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/datasets-tutorial/labeled-datasets/labeled-datasets.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Introduction to labeled datasets\n", + "\n", + "Labeled datasets are output from Azure Machine Learning [labeling projects](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-create-labeling-projects). It captures the reference to the data (e.g. image files) and its labels. \n", + "\n", + "This tutorial introduces the capabilities of labeled datasets and how to use it in training.\n", + "\n", + "Learn how-to:\n", + "\n", + "> * Set up your development environment\n", + "> * Explore labeled datasets\n", + "> * Train a simple deep learning neural network on a remote cluster\n", + "\n", + "## Prerequisite:\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* Go through Azure Machine Learning [labeling projects](https://docs.microsoft.com/azure/machine-learning/service/how-to-create-labeling-projects) and export the labels as an Azure Machine Learning dataset\n", + "* Go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the latest version of azureml-sdk\n", + " * install the latest version of azureml-contrib-dataset\n", + " * install [PyTorch](https://pytorch.org/)\n", + " * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up your development environment\n", + "\n", + "All the setup for your development work can be accomplished in a Python notebook. 
Setup includes:\n", + "\n", + "* Importing Python packages\n", + "* Connecting to a workspace to enable communication between your local computer and remote resources\n", + "* Creating an experiment to track all your runs\n", + "* Creating a remote compute target to use for training\n", + "\n", + "### Import packages\n", + "\n", + "Import Python packages you need in this session. Also display the Azure Machine Learning SDK version." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import azureml.core\n", + "import azureml.contrib.dataset\n", + "from azureml.core import Dataset, Workspace, Experiment\n", + "from azureml.contrib.dataset import FileHandlingOption\n", + "\n", + "# check core SDK version number\n", + "print(\"Azure ML SDK Version: \", azureml.core.VERSION)\n", + "print(\"Azure ML Contrib Version\", azureml.contrib.dataset.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Connect to workspace\n", + "\n", + "Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `workspace`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# load workspace\n", + "workspace = Workspace.from_config()\n", + "print('Workspace name: ' + workspace.name, \n", + " 'Azure region: ' + workspace.location, \n", + " 'Subscription id: ' + workspace.subscription_id, \n", + " 'Resource group: ' + workspace.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create experiment and a directory\n", + "\n", + "Create an experiment to track the runs in your workspace and a directory to deliver the necessary code from your computer to the remote resource." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create an ML experiment\n", + "exp = Experiment(workspace=workspace, name='labeled-datasets')\n", + "\n", + "# create a directory\n", + "script_folder = './labeled-datasets'\n", + "os.makedirs(script_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create or Attach existing compute resource\n", + "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you will create Azure Machine Learning Compute as your training environment. The code below creates the compute clusters for you if they don't already exist in your workspace.\n", + "\n", + "**Creation of compute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace the code will skip the creation process." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"openhack\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=workspace, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(workspace, cluster_name, compute_config)\n", + "\n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it uses the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Explore labeled datasets\n", + "\n", + "**Note**: How to create labeled datasets is not covered in this tutorial. To create labeled datasets, you can go through [labeling projects](https://docs.microsoft.com/azure/machine-learning/service/how-to-create-labeling-projects) and export the output labels as Azure Machine Lerning datasets. \n", + "\n", + "`animal_labels` used in this tutorial section is the output from a labeling project, with the task type of \"Object Identification\"." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get animal_labels dataset from the workspace\n", + "animal_labels = Dataset.get_by_name(workspace, 'animal_labels')\n", + "animal_labels" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can load labeled datasets into pandas DataFrame. There are 3 file handling option that you can choose to load the data files referenced by the labeled datasets:\n", + "* Streaming: The default option to load data files.\n", + "* Download: Download your data files to a local path.\n", + "* Mount: Mount your data files to a mount point. Mount only works for Linux-based compute, including Azure Machine Learning notebook VM and Azure Machine Learning Compute." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "animal_pd = animal_labels.to_pandas_dataframe(file_handling_option=FileHandlingOption.DOWNLOAD, target_path='./download/', overwrite_download=True)\n", + "animal_pd" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import matplotlib.pyplot as plt\n", + "import matplotlib.image as mpimg\n", + "\n", + "# read images from downloaded path\n", + "img = mpimg.imread(animal_pd.loc[0,'image_url'])\n", + "imgplot = plt.imshow(img)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can also load labeled datasets into [torchvision datasets](https://pytorch.org/docs/stable/torchvision/datasets.html), so that you can leverage on the open source libraries provided by PyTorch for image transformation and training." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from torchvision.transforms import functional as F\n", + "\n", + "# load animal_labels dataset into torchvision dataset\n", + "pytorch_dataset = animal_labels.to_torchvision()\n", + "img = pytorch_dataset[0][0]\n", + "print(type(img))\n", + "\n", + "# use methods from torchvision to transform the img into grayscale\n", + "pil_image = F.to_pil_image(img)\n", + "gray_image = F.to_grayscale(pil_image, num_output_channels=3)\n", + "\n", + "imgplot = plt.imshow(gray_image)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Train an image classification model\n", + "\n", + " `crack_labels` dataset used in this tutorial section is the output from a labeling project, with the task type of \"Image Classification Multi-class\". We will use this dataset to train an image classification model that classify whether an image has cracks or not." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get crack_labels dataset from the workspace\n", + "crack_labels = Dataset.get_by_name(workspace, 'crack_labels')\n", + "crack_labels" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Configure Estimator for training\n", + "\n", + "You can ask the system to build a conda environment based on your dependency specification. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "conda_env = Environment('conda-env')\n", + "conda_env.python.conda_dependencies = CondaDependencies.create(pip_packages=['azureml-sdk',\n", + " 'azureml-contrib-dataset',\n", + " 'torch','torchvision',\n", + " 'azureml-dataprep[pandas]'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as generic Estimator. Create a generic estimator for by specifying\n", + "\n", + "* The name of the estimator object, `est`\n", + "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n", + "* The training script name, train.py\n", + "* The input dataset for training\n", + "* The compute target. In this case you will use the AmlCompute you created\n", + "* The environment definition for the experiment" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.train.estimator import Estimator\n", + "\n", + "est = Estimator(source_directory=script_folder, \n", + " entry_script='train.py',\n", + " inputs=[crack_labels.as_named_input('crack_labels')],\n", + " compute_target=compute_target,\n", + " environment_definition= conda_env)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Submit job to run\n", + "\n", + "Submit the estimator to the Azure ML experiment to kick off the execution." 
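Once the run is submitted in the next cell, it can also be monitored inline; an optional sketch (azureml-widgets is assumed to be installed, which the configuration notebook normally takes care of):

```python
from azureml.widgets import RunDetails

# `run` is the object returned by exp.submit(est) in the following cell;
# the widget streams status, metrics and logs into the notebook.
RunDetails(run).show()
```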
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run = exp.submit(est)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "sihhu" + } + ], + "category": "tutorial", + "compute": [ + "Remote" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "Introduction to labeled datasets", + "index_order": 1, + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + }, + "nteract": { + "version": "nteract-front-end@1.0.0" + }, + "star_tag": [ + "featured" + ], + "tags": [ + "Dataset", + "label", + "Estimator" + ], + "task": "Train" + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/labeled-datasets/labeled-datasets/train.py b/how-to-use-azureml/work-with-data/datasets-tutorial/labeled-datasets/labeled-datasets/train.py new file mode 100644 index 000000000..a4bfc53e0 --- /dev/null +++ b/how-to-use-azureml/work-with-data/datasets-tutorial/labeled-datasets/labeled-datasets/train.py @@ -0,0 +1,106 @@ +import os +import torchvision +import torchvision.transforms as transforms +import torch +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim + +from azureml.core import Dataset, Run +import azureml.contrib.dataset +from azureml.contrib.dataset import FileHandlingOption, LabeledDatasetTask + +run = Run.get_context() + +# get input dataset by name +labeled_dataset = run.input_datasets['crack_labels'] +pytorch_dataset = labeled_dataset.to_torchvision() + + +indices = torch.randperm(len(pytorch_dataset)).tolist() +dataset_train = torch.utils.data.Subset(pytorch_dataset, indices[:40]) +dataset_test = torch.utils.data.Subset(pytorch_dataset, indices[-10:]) + +trainloader = torch.utils.data.DataLoader(dataset_train, batch_size=4, + shuffle=True, num_workers=0) + +testloader = torch.utils.data.DataLoader(dataset_test, batch_size=4, + shuffle=True, num_workers=0) + + +class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(3, 6, 5) + self.pool = nn.MaxPool2d(2, 2) + self.conv2 = nn.Conv2d(6, 16, 5) + self.fc1 = nn.Linear(16 * 71 * 71, 120) + self.fc2 = nn.Linear(120, 84) + self.fc3 = nn.Linear(84, 10) + + def forward(self, x): + x = self.pool(F.relu(self.conv1(x))) + x = self.pool(F.relu(self.conv2(x))) + x = x.view(x.size(0), 16 * 71 * 71) + x = F.relu(self.fc1(x)) + x = F.relu(self.fc2(x)) + x = self.fc3(x) + return x + + +net = Net() + +criterion = nn.CrossEntropyLoss() +optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9) + + +for epoch in range(2): # loop over the dataset multiple times + + running_loss = 0.0 + for i, data in enumerate(trainloader, 0): + # get the inputs; data is a list of [inputs, labels] + inputs, labels = data + + # zero the parameter gradients + optimizer.zero_grad() + + # forward + backward + optimize + outputs = net(inputs) + loss = criterion(outputs, labels) + loss.backward() + optimizer.step() + + # print statistics + running_loss 
+= loss.item() + if i % 5 == 4: # print every 5 mini-batches + print('[%d, %5d] loss: %.3f' % + (epoch + 1, i + 1, running_loss / 5)) + running_loss = 0.0 + +print('Finished Training') +classes = trainloader.dataset.dataset.labels +PATH = './cifar_net.pth' +torch.save(net.state_dict(), PATH) + +dataiter = iter(testloader) +images, labels = dataiter.next() + +net = Net() +net.load_state_dict(torch.load(PATH)) + +outputs = net(images) + +_, predicted = torch.max(outputs, 1) + +correct = 0 +total = 0 +with torch.no_grad(): + for data in testloader: + images, labels = data + outputs = net(images) + _, predicted = torch.max(outputs.data, 1) + total += labels.size(0) + correct += (predicted == labels).sum().item() + +print('Accuracy of the network on the 10 test images: %d %%' % (100 * correct / total)) +pass diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/prepare.py b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/prepare.py new file mode 100644 index 000000000..4e8c054a8 --- /dev/null +++ b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/prepare.py @@ -0,0 +1,35 @@ +import os + + +def convert(imgf, labelf, outf, n): + f = open(imgf, "rb") + l = open(labelf, "rb") + o = open(outf, "w") + + f.read(16) + l.read(8) + images = [] + + for i in range(n): + image = [ord(l.read(1))] + for j in range(28 * 28): + image.append(ord(f.read(1))) + images.append(image) + + for image in images: + o.write(",".join(str(pix) for pix in image) + "\n") + f.close() + o.close() + l.close() + + +mounted_input_path = os.environ['fashion_ds'] +mounted_output_path = os.environ['AZUREML_DATAREFERENCE_prepared_fashion_ds'] +os.makedirs(mounted_output_path, exist_ok=True) + +convert(os.path.join(mounted_input_path, 'train-images-idx3-ubyte'), + os.path.join(mounted_input_path, 'train-labels-idx1-ubyte'), + os.path.join(mounted_output_path, 'mnist_train.csv'), 60000) +convert(os.path.join(mounted_input_path, 't10k-images-idx3-ubyte'), + os.path.join(mounted_input_path, 't10k-labels-idx1-ubyte'), + os.path.join(mounted_output_path, 'mnist_test.csv'), 10000) diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/t10k-images-idx3-ubyte b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/t10k-images-idx3-ubyte new file mode 100644 index 000000000..37bac79bc Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/t10k-images-idx3-ubyte differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/t10k-labels-idx1-ubyte b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/t10k-labels-idx1-ubyte new file mode 100644 index 000000000..2195a4d09 Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/t10k-labels-idx1-ubyte differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/train-images-idx3-ubyte b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/train-images-idx3-ubyte new file mode 100644 index 000000000..ff2f5a963 Binary files /dev/null and 
b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/train-images-idx3-ubyte differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/train-labels-idx1-ubyte b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/train-labels-idx1-ubyte new file mode 100644 index 000000000..30424ca2e Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/train-labels-idx1-ubyte differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/train.py b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/train.py new file mode 100644 index 000000000..b0215ad1b --- /dev/null +++ b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/keras-mnist-fashion/train.py @@ -0,0 +1,120 @@ +import keras +from keras.models import Sequential +from keras.layers import Dense, Dropout, Flatten +from keras.layers import Conv2D, MaxPooling2D +from keras.layers.normalization import BatchNormalization +from keras.utils import to_categorical +from keras.callbacks import Callback + +import numpy as np +import pandas as pd +import os +import matplotlib.pyplot as plt +from sklearn.model_selection import train_test_split +from azureml.core import Run + +# dataset object from the run +run = Run.get_context() +dataset = run.input_datasets['prepared_fashion_ds'] + +# split dataset into train and test set +(train_dataset, test_dataset) = dataset.random_split(percentage=0.8, seed=111) + +# load dataset into pandas dataframe +data_train = train_dataset.to_pandas_dataframe() +data_test = test_dataset.to_pandas_dataframe() + +img_rows, img_cols = 28, 28 +input_shape = (img_rows, img_cols, 1) + +X = np.array(data_train.iloc[:, 1:]) +y = to_categorical(np.array(data_train.iloc[:, 0])) + +# here we split validation data to optimiza classifier during training +X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=13) + +# test data +X_test = np.array(data_test.iloc[:, 1:]) +y_test = to_categorical(np.array(data_test.iloc[:, 0])) + + +X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1).astype('float32') / 255 +X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1).astype('float32') / 255 +X_val = X_val.reshape(X_val.shape[0], img_rows, img_cols, 1).astype('float32') / 255 + +batch_size = 256 +num_classes = 10 +epochs = 10 + +# construct neuron network +model = Sequential() +model.add(Conv2D(32, kernel_size=(3, 3), + activation='relu', + kernel_initializer='he_normal', + input_shape=input_shape)) +model.add(MaxPooling2D((2, 2))) +model.add(Dropout(0.25)) +model.add(Conv2D(64, (3, 3), activation='relu')) +model.add(MaxPooling2D(pool_size=(2, 2))) +model.add(Dropout(0.25)) +model.add(Conv2D(128, (3, 3), activation='relu')) +model.add(Dropout(0.4)) +model.add(Flatten()) +model.add(Dense(128, activation='relu')) +model.add(Dropout(0.3)) +model.add(Dense(num_classes, activation='softmax')) + +model.compile(loss=keras.losses.categorical_crossentropy, + optimizer=keras.optimizers.Adam(), + metrics=['accuracy']) + +# start an Azure ML run +run = Run.get_context() + + +class LogRunMetrics(Callback): + # callback at the end of every epoch + def on_epoch_end(self, epoch, log): + # log a value repeated which creates a list + run.log('Loss', log['loss']) + run.log('Accuracy', 
log['accuracy']) + + +history = model.fit(X_train, y_train, + batch_size=batch_size, + epochs=epochs, + verbose=1, + validation_data=(X_val, y_val), + callbacks=[LogRunMetrics()]) + +score = model.evaluate(X_test, y_test, verbose=0) + +# log a single value +run.log("Final test loss", score[0]) +print('Test loss:', score[0]) + +run.log('Final test accuracy', score[1]) +print('Test accuracy:', score[1]) + +plt.figure(figsize=(6, 3)) +plt.title('Fashion MNIST with Keras ({} epochs)'.format(epochs), fontsize=14) +plt.plot(history.history['accuracy'], 'b-', label='Accuracy', lw=4, alpha=0.5) +plt.plot(history.history['loss'], 'r--', label='Loss', lw=4, alpha=0.5) +plt.legend(fontsize=12) +plt.grid(True) + +# log an image +run.log_image('Loss v.s. Accuracy', plot=plt) + +# create a ./outputs/model folder in the compute target +# files saved in the "./outputs" folder are automatically uploaded into run history +os.makedirs('./outputs/model', exist_ok=True) + +# serialize NN architecture to JSON +model_json = model.to_json() +# save model JSON +with open('./outputs/model/model.json', 'w') as f: + f.write(model_json) +# save model weights +model.save_weights('./outputs/model/model.h5') +print("model saved in ./outputs/model folder") diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.ipynb b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.ipynb new file mode 100644 index 000000000..739fca0ea --- /dev/null +++ b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.ipynb @@ -0,0 +1,488 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Copyright (c) Microsoft Corporation. All rights reserved.\n", + "\n", + "Licensed under the MIT License [2017] Zalando SE, https://tech.zalando.com" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Build a simple ML pipeline for image classification\n", + "\n", + "## Introduction\n", + "This tutorial shows how to train a simple deep neural network using the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset and Keras on Azure Machine Learning. Fashion-MNIST is a dataset of Zalando's article images\u00e2\u20ac\u201dconsisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.\n", + "\n", + "Learn how to:\n", + "\n", + "> * Set up your development environment\n", + "> * Create the Fashion MNIST dataset\n", + "> * Create a machine learning pipeline to train a simple deep learning neural network on a remote cluster\n", + "> * Retrieve input datasets from the experiment and register the output model with datasets\n", + "\n", + "## Prerequisite:\n", + "* Understand the [architecture and terms](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture) introduced by Azure Machine Learning\n", + "* If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, go through the [configuration notebook](../../../configuration.ipynb) to:\n", + " * install the latest version of AzureML SDK\n", + " * create a workspace and its configuration file (`config.json`)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up your development environment\n", + "\n", + "All the setup for your development work can be accomplished in a Python notebook. Setup includes:\n", + "\n", + "* Importing Python packages\n", + "* Connecting to a workspace to enable communication between your local computer and remote resources\n", + "* Creating an experiment to track all your runs\n", + "* Creating a remote compute target to use for training\n", + "\n", + "### Import packages\n", + "\n", + "Import Python packages you need in this session. Also display the Azure Machine Learning SDK version." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import azureml.core\n", + "from azureml.core import Workspace, Dataset, Datastore, ComputeTarget, RunConfiguration, Experiment\n", + "from azureml.core.runconfig import CondaDependencies\n", + "from azureml.pipeline.steps import PythonScriptStep, EstimatorStep\n", + "from azureml.pipeline.core import Pipeline, PipelineData\n", + "from azureml.train.dnn import TensorFlow\n", + "\n", + "# check core SDK version number\n", + "print(\"Azure ML SDK Version: \", azureml.core.VERSION)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Connect to workspace\n", + "\n", + "Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `workspace`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# load workspace\n", + "workspace = Workspace.from_config()\n", + "print('Workspace name: ' + workspace.name, \n", + " 'Azure region: ' + workspace.location, \n", + " 'Subscription id: ' + workspace.subscription_id, \n", + " 'Resource group: ' + workspace.resource_group, sep='\\n')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create experiment and a directory\n", + "\n", + "Create an experiment to track the runs in your workspace and a directory to deliver the necessary code from your computer to the remote resource." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create an ML experiment\n", + "exp = Experiment(workspace=workspace, name='keras-mnist-fashion')\n", + "\n", + "# create a directory\n", + "script_folder = './keras-mnist-fashion'\n", + "os.makedirs(script_folder, exist_ok=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create or Attach existing compute resource\n", + "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. The code below creates the compute clusters for you if they don't already exist in your workspace.\n", + "\n", + "**Creation of compute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace the code will skip the creation process." 
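Before provisioning a new cluster, you can check which compute targets are already attached to the workspace and reuse one of those instead. A small sketch, assuming the `workspace` object created from `Workspace.from_config()` above:

```python
# List compute targets already attached to the workspace so an existing
# cluster can be reused instead of provisioning a new one.
for name, ct in workspace.compute_targets.items():
    print(name, ct.type, ct.provisioning_state)
```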
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core.compute import ComputeTarget, AmlCompute\n", + "from azureml.core.compute_target import ComputeTargetException\n", + "\n", + "# choose a name for your cluster\n", + "cluster_name = \"gpu-cluster\"\n", + "\n", + "try:\n", + " compute_target = ComputeTarget(workspace=workspace, name=cluster_name)\n", + " print('Found existing compute target')\n", + "except ComputeTargetException:\n", + " print('Creating a new compute target...')\n", + " compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', \n", + " max_nodes=4)\n", + "\n", + " # create the cluster\n", + " compute_target = ComputeTarget.create(workspace, cluster_name, compute_config)\n", + "\n", + " # can poll for a minimum number of nodes and for a specific timeout. \n", + " # if no min node count is provided it uses the scale settings for the cluster\n", + " compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)\n", + "\n", + "# use get_status() to get a detailed status for the current cluster. \n", + "print(compute_target.get_status().serialize())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create the Fashion MNIST dataset\n", + "\n", + "By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred. \n", + "\n", + "Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and create a dataset from it. We will now upload the [Fashion MNIST](./keras-mnist-fashion) to the default datastore (blob) within your workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "datastore = workspace.get_default_datastore()\n", + "datastore.upload_files(files = ['keras-mnist-fashion/t10k-images-idx3-ubyte', 'keras-mnist-fashion/t10k-labels-idx1-ubyte',\n", + " 'keras-mnist-fashion/train-images-idx3-ubyte','keras-mnist-fashion/train-labels-idx1-ubyte'],\n", + " target_path = 'mnist-fashion',\n", + " overwrite = True,\n", + " show_progress = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then we will create an unregistered FileDataset pointing to the path in the datastore. You can also create a dataset from multiple paths. [Learn More](https://aka.ms/azureml/howto/createdatasets) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fashion_ds = Dataset.File.from_files([(datastore, 'mnist-fashion')])\n", + "\n", + "# list the files referenced by fashion_ds\n", + "fashion_ds.to_path()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Build 2-step ML pipeline\n", + "\n", + "The [Azure Machine Learning Pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-ml-pipelines) enables data scientists to create and manage multiple simple and complex workflows concurrently. A typical pipeline would have multiple tasks to prepare data, train, deploy and evaluate models. 
Individual steps in the pipeline can make use of diverse compute options (for example: CPU for data preparation and GPU for training) and languages. [Learn More](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/machine-learning-pipelines)\n", + "\n", + "\n", + "### Step 1: data preparation\n", + "\n", + "In step one, we will load the image and labels from Fashion MNIST dataset into mnist_train.csv and mnist_test.csv\n", + "\n", + "Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels in total. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255. Both mnist_train.csv and mnist_test.csv contain 785 columns. The first column consists of the class labels, which represent the article of clothing. The rest of the columns contain the pixel-values of the associated image." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# set up the compute environment to install required packages\n", + "conda = CondaDependencies.create(\n", + " pip_packages=['azureml-sdk','azureml-dataprep[fuse,pandas]'],\n", + " pin_sdk_version=False)\n", + "\n", + "conda.set_pip_option('--pre')\n", + "\n", + "run_config = RunConfiguration()\n", + "run_config.environment.python.conda_dependencies = conda" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Intermediate data (or output of a step) is represented by a `PipelineData` object. preprared_fashion_ds is produced as the output of step 1, and used as the input of step 2. PipelineData introduces a data dependency between steps, and creates an implicit execution order in the pipeline. You can register a `PipelineData` as a dataset and version the output data automatically. [Learn More](https://docs.microsoft.com/azure/machine-learning/service/how-to-version-track-datasets#version-a-pipeline-output-dataset) " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# define output data\n", + "prepared_fashion_ds = PipelineData('prepared_fashion_ds', datastore=datastore).as_dataset()\n", + "\n", + "# register output data as dataset\n", + "prepared_fashion_ds = prepared_fashion_ds.register(name='prepared_fashion_ds', create_new_version=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A **PythonScriptStep** is a basic, built-in step to run a Python Script on a compute target. It takes a script name and optionally other parameters like arguments for the script, compute target, inputs and outputs. If no compute target is specified, default compute target for the workspace is used. You can also use a [**RunConfiguration**](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.runconfiguration?view=azure-ml-py) to specify requirements for the PythonScriptStep, such as conda dependencies and docker image." 
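The prepared CSVs are written to the pipeline's output datastore rather than the local machine, but as a sanity check of the layout described above (class label in column 0, 784 pixel values after it) you could load a downloaded copy with pandas. The local path below is hypothetical:

```python
import pandas as pd

# Hypothetical local copy of the training CSV produced by prepare.py.
df = pd.read_csv('mnist_train.csv', header=None)

labels = df.iloc[:, 0]    # first column: class label (0-9)
pixels = df.iloc[:, 1:]   # remaining 784 columns: pixel values 0-255

print(df.shape)                   # expected (60000, 785)
print(sorted(labels.unique()))    # expected 10 classes
```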
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prep_step = PythonScriptStep(name='prepare step',\n", + " script_name=\"prepare.py\",\n", + " # mount fashion_ds dataset to the compute_target\n", + " inputs=[fashion_ds.as_named_input('fashion_ds').as_mount()],\n", + " outputs=[prepared_fashion_ds],\n", + " source_directory=script_folder,\n", + " compute_target=compute_target,\n", + " runconfig=run_config)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step 2: train CNN with Keras\n", + "\n", + "Next, we construct an `azureml.train.dnn.TensorFlow` estimator object. The TensorFlow estimator is providing a simple way of launching a TensorFlow training job on a compute target. It will automatically provide a docker image that has TensorFlow installed.\n", + "\n", + "[EstimatorStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.estimator_step.estimatorstep?view=azure-ml-py) adds a step to run Tensorflow Estimator in a Pipeline. It takes a dataset as the input." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# set up training step with Tensorflow estimator\n", + "est = TensorFlow(entry_script='train.py',\n", + " source_directory=script_folder, \n", + " pip_packages = ['azureml-sdk','keras','numpy','scikit-learn', 'matplotlib'],\n", + " compute_target=compute_target)\n", + "\n", + "est_step = EstimatorStep(name='train step',\n", + " estimator=est,\n", + " estimator_entry_script_arguments=[],\n", + " # parse prepared_fashion_ds into TabularDataset and use it as the input\n", + " inputs=[prepared_fashion_ds.parse_delimited_files()],\n", + " compute_target=compute_target)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Build the pipeline\n", + "Once we have the steps (or steps collection), we can build the [pipeline](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline.pipeline?view=azure-ml-py).\n", + "\n", + "A pipeline is created with a list of steps and a workspace. Submit a pipeline using [submit](https://docs.microsoft.com/python/api/azureml-core/azureml.core.experiment(class)?view=azure-ml-py#submit-config--tags-none----kwargs-). When submit is called, a [PipelineRun](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipelinerun?view=azure-ml-py) is created which in turn creates [StepRun](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.steprun?view=azure-ml-py) objects for each step in the workflow." 
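Optionally, the pipeline graph can be validated before submission; `Pipeline.validate()` returns a list of any problems it finds (for example, disconnected inputs), so issues surface before compute is provisioned. A minimal sketch, assuming the `prep_step` and `est_step` objects defined above:

```python
from azureml.pipeline.core import Pipeline

# Build the pipeline from the two steps and validate the graph;
# an empty list means no issues were detected.
pipeline = Pipeline(workspace=workspace, steps=[prep_step, est_step])
errors = pipeline.validate()
print('Validation errors:', errors)
```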
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# build pipeline & run experiment\n", + "pipeline = Pipeline(workspace, steps=[prep_step, est_step])\n", + "run = exp.submit(pipeline)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Monitor the PipelineRun" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "inputHidden": false, + "outputHidden": false + }, + "outputs": [], + "source": [ + "run.wait_for_completion(show_output=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.find_step_run('train step')[0].get_metrics()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Register the input dataset and the output model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Azure Machine Learning dataset makes it easy to trace how your data is used in ML. [Learn More](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-version-track-datasets#track-datasets-in-experiments)
\n", + "For each Machine Learning experiment, you can easily trace the datasets used as the input through `Run` object." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get input datasets\n", + "prep_step = run.find_step_run('prepare step')[0]\n", + "inputs = prep_step.get_details()['inputDatasets']\n", + "input_dataset = inputs[0]['dataset']\n", + "\n", + "# list the files referenced by input_dataset\n", + "input_dataset.to_path()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Register the input Fashion MNIST dataset with the workspace so that you can reuse it in other experiments or share it with your colleagues who have access to your workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "fashion_ds = input_dataset.register(workspace = workspace,\n", + " name = 'fashion_ds',\n", + " description = 'image and label files from fashion mnist',\n", + " create_new_version = True)\n", + "fashion_ds" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Register the output model with dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "run.find_step_run('train step')[0].register_model(model_name = 'keras-model', model_path = 'outputs/model/', \n", + " datasets =[('train test data',fashion_ds)])" + ] + } + ], + "metadata": { + "authors": [ + { + "name": "sihhu" + } + ], + "category": "tutorial", + "compute": [ + "Remote" + ], + "datasets": [ + "Fashion MNIST" + ], + "deployment": [ + "None" + ], + "exclude_from_index": false, + "framework": [ + "Azure ML" + ], + "friendly_name": "Datasets with ML Pipeline", + "index_order": 1, + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python36" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.9" + }, + "nteract": { + "version": "nteract-front-end@1.0.0" + }, + "star_tag": [ + "featured" + ], + "tags": [ + "Dataset", + "Pipeline", + "Estimator", + "ScriptRun" + ], + "task": "Train" + }, + "nbformat": 4, + "nbformat_minor": 2 +} \ No newline at end of file diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.yml b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.yml new file mode 100644 index 000000000..e6b3df702 --- /dev/null +++ b/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.yml @@ -0,0 +1,7 @@ +name: pipeline-for-image-classification +dependencies: +- pip: + - azureml-sdk + - azureml-dataprep + - pandas<=0.23.4 + - fuse diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/tabular-timeseries-dataset-filtering.ipynb b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.ipynb similarity index 73% rename from how-to-use-azureml/work-with-data/datasets-tutorial/tabular-timeseries-dataset-filtering.ipynb rename to how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.ipynb index f59ac29d4..f3269da3d 100644 --- 
a/how-to-use-azureml/work-with-data/datasets-tutorial/tabular-timeseries-dataset-filtering.ipynb +++ b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.ipynb @@ -23,8 +23,8 @@ "\n", "The detailed APIs to be demoed in this script are:\n", "- Create Tabular Dataset instance\n", - "- Assign fine timestamp column and coarse timestamp column for Tabular Dataset to activate Time Series related APIs\n", - "- Clear fine timestamp column and coarse timestamp column\n", + "- Assign timestamp column and partition timestamp column for Tabular Dataset to activate Time Series related APIs\n", + "- Clear timestamp column and partition timestamp column\n", "- Filter in data before a specific time\n", "- Filter in data after a specific time\n", "- Filter in data in a specific time range\n", @@ -32,8 +32,8 @@ "\n", "Besides above APIs, you'll also see:\n", "- Create and load a Workspace\n", - "- Load National Oceanic & Atmospheric (NOAA) weather data into Azure blob storage\n", - "- Create and register NOAA weather data as a Tabular dataset\n", + "- Load weather data into Azure blob storage\n", + "- Create and register weather data as a Tabular dataset\n", "- Re-load Tabular Dataset from your Workspace" ] }, @@ -91,8 +91,7 @@ "from calendar import monthrange\n", "from datetime import datetime, timedelta\n", "\n", - "from azureml.core import Dataset, Datastore, Workspace, Run\n", - "from azureml.opendatasets import NoaaIsdWeather" + "from azureml.core import Dataset, Datastore, Workspace, Run" ] }, { @@ -110,7 +109,7 @@ "metadata": {}, "outputs": [], "source": [ - "ws = Workspace.from_config()\n", + "ws = Workspace.from_config()\n", "dstore = ws.get_default_datastore()\n", "\n", "dset_name = 'weather-data-florida'\n", @@ -124,34 +123,7 @@ "source": [ "## Load Data to Blob Storage\n", "\n", - "This demo uses public NOAA weather data. You can replace this data with your own. The first cell below creates a Pandas Dataframe object with the first 6 months of 2019 NOAA weather data. The last cell saves the data to a CSV file and uploads the CSV file to Azure blob storage to the location specified in the datapath variable. Currently, the Dataset class only reads uploaded files from blob storage. \n", - "\n", - "**NOTE:** to reduce the size of data, we will only keep specific rows with a given stationName." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "target_years = [2019]\n", - "\n", - "for year in target_years:\n", - " for month in range(1, 12+1):\n", - " path = 'data/{}/{:02d}/'.format(year, month)\n", - " \n", - " try: \n", - " start = datetime(year, month, 1)\n", - " end = datetime(year, month, monthrange(year, month)[1]) + timedelta(days=1)\n", - " isd = NoaaIsdWeather(start, end).to_pandas_dataframe()\n", - " isd = isd[isd['stationName'].str.contains('FLORIDA', regex=True, na=False)]\n", - " \n", - " os.makedirs(path, exist_ok=True)\n", - " isd.to_parquet(path + 'data.parquet')\n", - " except Exception as e:\n", - " print('Month {} in year {} likely has no data.\\n'.format(month, year))\n", - " print('Exception: {}'.format(e))" + "This demo uses 2019 weather data under within weather-data folder. You can replace this data with your own." 
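The notebook now reads the bundled parquet files instead of downloading NOAA data. A quick way to confirm the year/month layout that the `partition_format` below depends on is to walk the local folder; this assumes the notebook is run from the timeseries-datasets directory where weather-data was added:

```python
import os

# Print every file under weather-data to confirm the
# weather-data/<yyyy>/<MM>/data.parquet layout used by partition_format.
for root, _, files in os.walk('weather-data'):
    for name in files:
        print(os.path.join(root, name))
```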
] }, { @@ -167,7 +139,7 @@ "metadata": {}, "outputs": [], "source": [ - "dstore.upload('data', dset_name, overwrite=True, show_progress=True)" + "dstore.upload('weather-data', dset_name, overwrite=True, show_progress=True)" ] }, { @@ -185,7 +157,7 @@ "source": [ "Create Tabular Dataset instance from blob storage datapath.\n", "\n", - "**TIP:** you can set virtual columns in the partition_format. I.e. if you partition the weather data by state and city, the path can be '/{STATE}/{CITY}/{coarse_time:yyy/MM}/data.parquet'. STATE and CITY would then appear as virtual columns in the dataset, allowing for efficient filtering by these grains. " + "**TIP:** you can set virtual columns in the partition_format. I.e. if you partition the weather data by state and city, the path can be '/{STATE}/{CITY}/{partition_time:yyy/MM}/data.parquet'. STATE and CITY would then appear as virtual columns in the dataset, allowing for efficient filtering by these timestamps. " ] }, { @@ -195,14 +167,14 @@ "outputs": [], "source": [ "datastore_path = [(dstore, dset_name + '/*/*/data.parquet')]\n", - "dataset = Dataset.Tabular.from_parquet_files(path=datastore_path, partition_format = dset_name + '/{coarse_time:yyyy/MM}/data.parquet')" + "dataset = Dataset.Tabular.from_parquet_files(path=datastore_path, partition_format = dset_name + '/{partition_time:yyyy/MM}/data.parquet')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Assign fine timestamp column for Tabular Dataset to activate Time Series related APIs. The column to be assigned should be a Date type, otherwise the assigning will fail." + "Assign timestamp column for Tabular Dataset to activate Time Series related APIs. The column to be assigned should be a Date type, otherwise the assigning will fail." ] }, { @@ -211,8 +183,8 @@ "metadata": {}, "outputs": [], "source": [ - "# for this demo, leave out coarse_time so fine_grain_timestamp is used\n", - "tsd = dataset.with_timestamp_columns(fine_grain_timestamp='datetime') # , coarse_grain_timestamp='coarse_time')" + "# for this demo, leave out partition_time so timestamp is used\n", + "tsd = dataset.with_timestamp_columns(timestamp='datetime') # partition_timestamp='partition_time')" ] }, { @@ -308,7 +280,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "**NOTE:** You must set the coarse_grain_timestamp to None to filter on the fine_grain_timestamp. The below cell will fail unless the second line is uncommented " + "**NOTE:** You must set the partition_timestamp to None to filter on the timestamp. The below cell will fail unless the second line is uncommented " ] }, { @@ -318,7 +290,7 @@ "outputs": [], "source": [ "# select data that occurs within a given time range\n", - "#tsd = tsd.with_timestamp_columns(fine_grain_timestamp='datetime', coarse_grain_timestamp=None)\n", + "#tsd = tsd.with_timestamp_columns(timestamp='datetime', partition_timestamp=None)\n", "tsd2 = tsd.time_after(datetime(2019, 1, 2)).time_before(datetime(2019, 1, 10))\n", "tsd2.to_pandas_dataframe().head(5)" ] @@ -393,7 +365,24 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The columns to be dropped should NOT include timstamp columns.
Below operation will lead to exception." + "If a timestamp column is dropped, the returned dataset loses its timeseries capabilities.
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tsd2 = tsd.drop_columns(columns=['snowDepth', 'version', 'datetime'])\n", + "tsd2.take(5).to_pandas_dataframe()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The exception is expected because dataset loses timeseries capabilities to do time travel." ] }, { @@ -403,7 +392,7 @@ "outputs": [], "source": [ "try:\n", - " tsd2 = tsd.drop_columns(columns=['snowDepth', 'version', 'datetime'])\n", + " tsd2.time_before(datetime(2019, 6, 12)).to_pandas_dataframe().tail(5)\n", "except Exception as e:\n", " print('Expected exception : {}'.format(str(e)))" ] @@ -412,7 +401,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Drop will succeed if modify column list to exclude timestamp columns." + "Drop will return dataset with timeseries capabilities if modify column list to exclude timestamp columns." ] }, { @@ -422,7 +411,16 @@ "outputs": [], "source": [ "tsd2 = tsd.drop_columns(columns=['snowDepth', 'version', 'upload_date'])\n", - "tsd2.take(5).to_pandas_dataframe().sort_values(by='datetime')" + "tsd2.take(5).to_pandas_dataframe()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tsd2.time_before(datetime(2019, 6, 12)).to_pandas_dataframe().tail(5)" ] }, { @@ -436,7 +434,24 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The columns to be kept should ALWAYS include timstamp columns.
Below operation will lead to exception." + "If the timestamp columns are not included, the returned dataset loses its timeseries capabilities.
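A simple way to tell whether the returned dataset kept its timeseries capabilities is to inspect its `timestamp_columns` property (the same property the clearing section below prints). A one-line sketch, assuming `tsd2` is the result of the `keep_columns`/`drop_columns` call:

```python
# An empty result means time_before/time_after will raise on tsd2.
print(tsd2.timestamp_columns)
```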
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tsd2 = tsd.keep_columns(columns=['snowDepth'], validate=False)\n", + "tsd2.to_pandas_dataframe().tail()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The exception is expected because dataset loses timeseries capabilities to do time travel." ] }, { @@ -446,7 +461,7 @@ "outputs": [], "source": [ "try:\n", - " tsd2 = tsd.keep_columns(columns=['snowDepth'], validate=False)\n", + " tsd2.time_before(datetime(2019, 6, 12)).to_pandas_dataframe().tail(5)\n", "except Exception as e:\n", " print('Expected exception : {}'.format(str(e)))" ] @@ -455,7 +470,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Keep will succeed if modify column list to include timestamp columns." + "Keep will return dataset with timeseries capabilities if modify column list to include timestamp columns." ] }, { @@ -464,10 +479,19 @@ "metadata": {}, "outputs": [], "source": [ - "tsd2 = tsd.keep_columns(columns=['snowDepth', 'datetime', 'coarse_time'], validate=False)\n", + "tsd2 = tsd.keep_columns(columns=['snowDepth', 'datetime', 'partition_time'], validate=False)\n", "tsd2.to_pandas_dataframe().tail()" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tsd2.time_before(datetime(2019, 6, 12)).to_pandas_dataframe().tail(5)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -480,9 +504,9 @@ "metadata": {}, "source": [ "Rules for reseting are:\n", - "- You cannot assign 'None' to fine_grain_timestamp while assign a valid column name to coarse_grain_timestamp because coarse_grain_timestamp is optional while fine_grain_timestamp is mandatory for Tabular time series data.\n", - "- If you assign 'None' to fine_grain_timestamp, then both fine_grain_timestamp and coarse_grain_timestamp will all be cleared.\n", - "- If you assign only 'None' to coarse_grain_timestamp, then only coarse_grain_timestamp will be cleared." + "- You cannot assign 'None' to timestamp while assign a valid column name to partition_timestamp because partition_timestamp is optional while timestamp is mandatory for Tabular time series data.\n", + "- If you assign 'None' to timestamp, then both timestamp and partition_timestamp will all be cleared.\n", + "- If you assign only 'None' to partition_timestamp, then only partition_timestamp will be cleared." 
] }, { @@ -493,17 +517,17 @@ "source": [ "# Illegal clearing, exception is expected.\n", "try:\n", - " tsd2 = tsd.with_timestamp_columns(fine_grain_timestamp=None, coarse_grain_timestamp='coarse_time')\n", + " tsd2 = tsd.with_timestamp_columns(timestamp=None, partition_timestamp='partition_time')\n", "except Exception as e:\n", " print('Cleaning not allowed because {}'.format(str(e)))\n", "\n", "# clear both\n", - "tsd2 = tsd.with_timestamp_columns(fine_grain_timestamp=None, coarse_grain_timestamp=None)\n", + "tsd2 = tsd.with_timestamp_columns(timestamp=None, partition_timestamp=None)\n", "print('after clean both with None/None, timestamp columns are: {}'.format(tsd2.timestamp_columns))\n", "\n", - "# clear coarse_grain_timestamp only and assign 'datetime' as fine timestamp column\n", - "tsd2 = tsd2.with_timestamp_columns(fine_grain_timestamp='datetime', coarse_grain_timestamp=None)\n", - "print('after clean coarse timestamp column, timestamp columns are: {}'.format(tsd2.timestamp_columns))" + "# clear partition_timestamp only and assign 'datetime' as timestamp column\n", + "tsd2 = tsd2.with_timestamp_columns(timestamp='datetime', partition_timestamp=None)\n", + "print('after clean partition timestamp column, timestamp columns are: {}'.format(tsd2.timestamp_columns))" ] }, { @@ -517,7 +541,7 @@ "metadata": { "authors": [ { - "name": "ylxiong" + "name": "jamgan" } ], "category": "tutorial", diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.yml b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.yml new file mode 100644 index 000000000..a3471ade9 --- /dev/null +++ b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.yml @@ -0,0 +1,6 @@ +name: tabular-timeseries-dataset-filtering +dependencies: +- pip: + - azureml-sdk + - azureml-dataprep + - pandas<=0.23.4 diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/01/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/01/data.parquet new file mode 100644 index 000000000..0f2e4be50 Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/01/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/02/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/02/data.parquet new file mode 100644 index 000000000..ff6b97afd Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/02/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/03/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/03/data.parquet new file mode 100644 index 000000000..b93cea1f7 Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/03/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/04/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/04/data.parquet new file mode 100644 index 000000000..257eedc50 Binary files /dev/null and 
b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/04/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/05/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/05/data.parquet new file mode 100644 index 000000000..9ea22a883 Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/05/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/06/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/06/data.parquet new file mode 100644 index 000000000..aa4d8a923 Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/06/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/07/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/07/data.parquet new file mode 100644 index 000000000..a92ccd0db Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/07/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/08/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/08/data.parquet new file mode 100644 index 000000000..8328d0397 Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/08/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/09/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/09/data.parquet new file mode 100644 index 000000000..9ee3faa07 Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/09/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/10/data.parquet b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/10/data.parquet new file mode 100644 index 000000000..7635534c6 Binary files /dev/null and b/how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/weather-data/2019/10/data.parquet differ diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/train-dataset/iris.csv b/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-dataset/iris.csv similarity index 100% rename from how-to-use-azureml/work-with-data/datasets-tutorial/train-dataset/iris.csv rename to how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-dataset/iris.csv diff --git a/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets.ipynb b/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.ipynb similarity index 87% rename from how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets.ipynb rename to how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.ipynb index 2a80008cf..9e477a3d8 100644 --- a/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets.ipynb +++ 
b/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.ipynb @@ -13,23 +13,23 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets.png)" + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Train with Azure Machine Learning Datasets\n", + "# Train with Azure Machine Learning datasets\n", "Datasets are categorized into TabularDataset and FileDataset based on how users consume them in training. \n", "* A TabularDataset represents data in a tabular format by parsing the provided file or list of files. TabularDataset can be created from csv, tsv, parquet files, SQL query results etc. For the complete list, please visit our [documentation](https://aka.ms/tabulardataset-api-reference). It provides you with the ability to materialize the data into a pandas DataFrame.\n", "* A FileDataset references single or multiple files in your datastores or public urls. This provides you with the ability to download or mount the files to your compute. The files can be of any format, which enables a wider range of machine learning scenarios including deep learning.\n", "\n", - "In this tutorial, you will learn how to train with Azure Machine Learning Datasets:\n", + "In this tutorial, you will learn how to train with Azure Machine Learning datasets:\n", "\n", - "☑ Use Datasets directly in your training script\n", + "☑ Use datasets directly in your training script\n", "\n", - "☑ Use Datasets to mount files to a remote compute" + "☑ Use datasets to mount files to a remote compute" ] }, { @@ -149,12 +149,12 @@ "metadata": {}, "source": [ "You now have the necessary packages and compute resources to train a model in the cloud.\n", - "## Use Datasets directly in training\n", + "## Use datasets directly in training\n", "\n", "### Create a TabularDataset\n", "By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred. \n", "\n", - "Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and create Dataset from it. We will now upload the [Iris data](./train-dataset/Iris.csv) to the default datastore (blob) within your workspace." + "Every workspace comes with a default [datastore](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data) (and you can register more) which is backed by the Azure blob storage account associated with the workspace. We can use it to transfer data from local to the cloud, and create dataset from it. We will now upload the [Iris data](./train-dataset/Iris.csv) to the default datastore (blob) within your workspace." ] }, { @@ -174,7 +174,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Then we will create an unregistered TabularDataset pointing to the path in the datastore. 
You can also create a Dataset from multiple paths. [learn more](https://aka.ms/azureml/howto/createdatasets) " + "Then we will create an unregistered TabularDataset pointing to the path in the datastore. You can also create a dataset from multiple paths. [learn more](https://aka.ms/azureml/howto/createdatasets) \n", + "\n", + "[TabularDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) represents data in a tabular format by parsing the provided file or list of files. This provides you with the ability to materialize the data into a Pandas or Spark DataFrame. You can create a TabularDataset object from .csv, .tsv, and parquet files, and from SQL query results. For a complete list, see [TabularDatasetFactory](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py) class." ] }, { @@ -260,41 +262,19 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Configure and use Datasets as the input to Estimator" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "You can ask the system to build a conda environment based on your dependency specification. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from azureml.core import Environment\n", - "from azureml.core.conda_dependencies import CondaDependencies\n", - "\n", - "conda_env = Environment('conda-env')\n", - "conda_env.python.conda_dependencies = CondaDependencies.create(pip_packages=['azureml-sdk',\n", - " 'azureml-dataprep[pandas,fuse]',\n", - " 'scikit-learn'])" + "### Configure and use datasets as the input to Estimator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as generic Estimator. Create a generic estimator for by specifying\n", + "An estimator is a configuration object you submit to Azure Machine Learning to instruct how to set up the remote environment. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as generic Estimator. Create a SKLearn estimator by specifying:\n", "\n", "* The name of the estimator object, `est`\n", "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n", "* The training script name, train_titanic.py\n", - "* The input Dataset for training\n", + "* The input dataset for training. `as_named_input()` is required so that the input dataset can be referenced by the assigned name in your training script. \n", "* The compute target. 
In this case you will use the AmlCompute you created\n", "* The environment definition for the experiment" ] @@ -305,14 +285,14 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.train.estimator import Estimator\n", + "from azureml.train.sklearn import SKLearn\n", "\n", - "est = Estimator(source_directory=script_folder, \n", - " entry_script='train_iris.py', \n", - " # pass dataset object as an input with name 'titanic'\n", - " inputs=[dataset.as_named_input('iris')],\n", - " compute_target=compute_target,\n", - " environment_definition= conda_env) " + "est = SKLearn(source_directory=script_folder, \n", + " entry_script='train_iris.py', \n", + " # pass dataset object as an input with name 'titanic'\n", + " inputs=[dataset.as_named_input('iris')],\n", + " pip_packages=['azureml-dataprep[fuse]'],\n", + " compute_target=compute_target) " ] }, { @@ -348,9 +328,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Use Datasets to mount files to a remote compute\n", + "## Use datasets to mount files to a remote compute\n", "\n", - "You can use the Dataset object to mount or download files referred by it. When you mount a file system, you attach that file system to a directory (mount point) and make it available to the system. Because mounting load files at the time of processing, it is usually faster than download.
\n", + "You can use the `Dataset` object to mount or download files referred by it. When you mount a file system, you attach that file system to a directory (mount point) and make it available to the system. Because mounting load files at the time of processing, it is usually faster than download.
\n", "Note: mounting is only available for Linux-based compute (DSVM/VM, AMLCompute, HDInsights)." ] }, @@ -396,7 +376,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Create a FileDataset" + "### Create a FileDataset\n", + "\n", + "[FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) references single or multiple files in your datastores or public URLs. Using this method, you can download or mount the files to your compute as a FileDataset object. The files can be in any format, which enables a wider range of machine learning scenarios, including deep learning." ] }, { @@ -481,6 +463,28 @@ "### Configure & Run" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can ask the system to build a conda environment based on your dependency specification. Once the environment is built, and if you don't change your dependencies, it will be reused in subsequent runs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from azureml.core import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", + "\n", + "conda_env = Environment('conda-env')\n", + "conda_env.python.conda_dependencies = CondaDependencies.create(pip_packages=['azureml-sdk',\n", + " 'azureml-dataprep[pandas,fuse]',\n", + " 'scikit-learn'])" + ] + }, { "cell_type": "code", "execution_count": null, @@ -492,7 +496,7 @@ "src = ScriptRunConfig(source_directory=script_folder, \n", " script='train_diabetes.py', \n", " # to mount the dataset on the remote compute and pass the mounted path as an argument to the training script\n", - " arguments =[dataset.as_named_input('diabetes').as_mount('tmp/dataset')])\n", + " arguments =[dataset.as_named_input('diabetes').as_mount()])\n", "\n", "src.run_config.framework = 'python'\n", "src.run_config.environment = conda_env\n", @@ -525,15 +529,16 @@ "metadata": {}, "outputs": [], "source": [ - "print(run.get_metrics())\n", - "metrics = run.get_metrics()" + "run.wait_for_completion()\n", + "metrics = run.get_metrics()\n", + "print(metrics)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Register Datasets\n", + "### Register datasets\n", "Use the register() method to register datasets to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script." ] }, @@ -553,10 +558,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Register models with Datasets\n", + "## Register models with datasets\n", "The last step in the training script wrote the model files in a directory named `outputs` in the VM of the cluster where the job is executed. `outputs` is a special directory in that all content in this directory is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.\n", "\n", - "You can register models with Datasets for reproducibility and auditing purpose." + "You can register models with datasets for reproducibility and auditing purpose." 
] }, { @@ -642,9 +647,11 @@ "featured" ], "tags": [ - "Dataset" + "Dataset", + "Estimator", + "ScriptRun" ], - "task": "Filtering" + "task": "Train" }, "nbformat": 4, "nbformat_minor": 2 diff --git a/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.yml b/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.yml similarity index 59% rename from how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.yml rename to how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.yml index d59d5b5e6..4f490f417 100644 --- a/how-to-use-azureml/training/train-on-remote-vm/train-on-remote-vm.yml +++ b/how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.yml @@ -1,12 +1,9 @@ -name: train-on-remote-vm +name: train-with-datasets dependencies: -- matplotlib -- tqdm -- scikit-learn - pip: - azureml-sdk - azureml-widgets - azureml-dataprep - - pandas + - pandas<=0.23.4 - fuse - scikit-learn diff --git a/index.md b/index.md index c9a2cef9c..061dfb381 100644 --- a/index.md +++ b/index.md @@ -12,7 +12,6 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an | [Using Azure ML environments](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/using-environments/using-environments.ipynb) | Creating and registering environments | None | Local | None | None | None | | [Estimators in AML with hyperparameter tuning](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/how-to-use-estimator.ipynb) | Use the Estimator pattern in Azure Machine Learning SDK | None | AML Compute | None | None | None | - ## Tutorials |Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags | @@ -25,13 +24,14 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an | :star:[Data drift on aks](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/monitor-models/data-drift/drift-on-aks.ipynb) | Filtering | NOAA | Remote | AKS | Azure ML | Dataset, Timeseries, Drift | | [Train and deploy a model using Python SDK](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training/train-within-notebook/train-within-notebook.ipynb) | Training and deploying a model from a notebook | Diabetes | Local | Azure Container Instance | None | None | | :star:[Data drift quickdemo](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datadrift-tutorial/datadrift-tutorial.ipynb) | Filtering | NOAA | Remote | None | Azure ML | Dataset, Timeseries, Drift | -| :star:[Filtering data using Tabular Timeseiries Dataset related API](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets-tutorial/tabular-timeseries-dataset-filtering.ipynb) | Filtering | NOAA | Local | None | Azure ML | Dataset, Tabular Timeseries | -| :star:[Train with Datasets (Tabular and File)](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets.ipynb) | Filtering | Iris, Diabetes | Remote | None | Azure ML | Dataset | -| [Forecasting away from training data](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/automl-forecasting-function.ipynb) | Forecasting | 
None | Remote | None | Azure ML AutoML | Forecasting, Confidence Intervals | +| :star:[Introduction to labeled datasets](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets-tutorial/labeled-datasets/labeled-datasets.ipynb) | Train | | Remote | None | Azure ML | Dataset, label, Estimator | +| :star:[Datasets with ML Pipeline](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.ipynb) | Train | Fashion MNIST | Remote | None | Azure ML | Dataset, Pipeline, Estimator, ScriptRun | +| :star:[Filtering data using Tabular Timeseiries Dataset related API](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets-tutorial/timeseries-datasets/tabular-timeseries-dataset-filtering.ipynb) | Filtering | NOAA | Local | None | Azure ML | Dataset, Tabular Timeseries | +| :star:[Train with Datasets (Tabular and File)](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/work-with-data/datasets-tutorial/train-with-datasets/train-with-datasets.ipynb) | Train | Iris, Diabetes | Remote | None | Azure ML | Dataset, Estimator, ScriptRun | +| [Forecasting away from training data](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-high-frequency/auto-ml-forecasting-function.ipynb) | Forecasting | None | Remote | None | Azure ML AutoML | Forecasting, Confidence Intervals | | [Automated ML run with basic edition features.](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb) | Classification | Bankmarketing | AML | ACI | None | featurization, explainability, remote_run, AutomatedML | | [Classification of credit card fraudulent transactions using Automated ML](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb) | Classification | Creditcard | AML Compute | None | None | remote_run, AutomatedML | -| [Automated ML run with featurization and model explainability.](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/regression-hardware-performance-explanation-and-featurization/auto-ml-regression-hardware-performance-explanation-and-featurization.ipynb) | Regression | MachineData | AML | ACI | None | featurization, explainability, remote_run, AutomatedML | -| [Use MLflow with Azure Machine Learning for training and deployment](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-deploy-pytorch/train-and-deploy-pytorch.ipynb) | Use MLflow with Azure Machine Learning to train and deploy Pa yTorch image classifier model | MNIST | AML Compute | Azure Container Instance | PyTorch | None | +| [Automated ML run with featurization and model explainability.](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/regression-explanation-featurization/auto-ml-regression-explanation-featurization.ipynb) | Regression | MachineData | AML | ACI | None | featurization, explainability, remote_run, AutomatedML | | :star:[Azure Machine Learning Pipeline 
with DataTranferStep](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-data-transfer.ipynb) | Demonstrates the use of DataTranferStep | Custom | ADF | None | Azure ML | None | | [Getting Started with Azure Machine Learning Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-getting-started.ipynb) | Getting Started notebook for ANML Pipelines | Custom | AML Compute | None | Azure ML | None | | [Azure Machine Learning Pipeline with AzureBatchStep](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-how-to-use-azurebatch-to-run-a-windows-executable.ipynb) | Demonstrates the use of AzureBatchStep | Custom | Azure Batch | None | Azure ML | None | @@ -47,7 +47,7 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an | :star:[How to use DatabricksStep with AML Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-use-databricks-as-compute-target.ipynb) | Demonstrates the use of DatabricksStep | Custom | Azure Databricks | None | Azure ML, Azure Databricks | None | | :star:[How to use AutoMLStep with AML Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-automated-machine-learning-step.ipynb) | Demonstrates the use of AutoMLStep | Custom | AML Compute | None | Automated Machine Learning | None | | :star:[Azure Machine Learning Pipelines with Data Dependency](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-data-dependency-steps.ipynb) | Demonstrates how to construct a Pipeline with data dependency between steps | Custom | AML Compute | None | Azure ML | None | - +| [How to use run a notebook as a step in AML Pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/aml-pipelines-with-notebook-runner-step.ipynb) | Demonstrates the use of NotebookRunnerStep | Custom | AML Compute | None | Azure ML | None | ## Training @@ -58,6 +58,7 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an | [Training with hyperparameter tuning using PyTorch](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/deployment/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) | Train an image classification model using transfer learning with the PyTorch estimator | ImageNet | AML Compute | Azure Container Instance | PyTorch | None | | [Distributed PyTorch](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-horovod/distributed-pytorch-with-horovod.ipynb) | Train a model using the distributed training via Horovod | MNIST | AML Compute | None | PyTorch | None | | [Distributed training with PyTorch](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/training/distributed-pytorch-with-nccl-gloo/distributed-pytorch-with-nccl-gloo.ipynb) | Train a model using distributed training via 
Nccl/Gloo | MNIST | AML Compute | None | PyTorch | None | +| [PyTorch object detection](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/pytorch/training/mask-rcnn-object-detection/pytorch-mask-rcnn.ipynb) | Fine-tune PyTorch object detection model with a custom dockerfile | Custom | AML Compute | None | PyTorch | remote run, docker | | [Training and hyperparameter tuning with Scikit-learn](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/scikit-learn/training/train-hyperparameter-tune-deploy-with-sklearn/train-hyperparameter-tune-deploy-with-sklearn.ipynb) | Train a support vector machine (SVM) to perform classification | Iris | AML Compute | None | Scikit-learn | None | | [Training and hyperparameter tuning using the TensorFlow estimator](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/deployment/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb) | Train a deep neural network | MNIST | AML Compute | Azure Container Instance | TensorFlow | None | | [Distributed training using TensorFlow with Horovod](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/ml-frameworks/tensorflow/training/distributed-tensorflow-with-horovod/distributed-tensorflow-with-horovod.ipynb) | Use the TensorFlow estimator to train a word2vec model | None | AML Compute | None | TensorFlow | None | @@ -76,7 +77,6 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an | [Use MLflow with AML for a remote training run](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/train-remote/train-remote.ipynb) | Use MLflow tracking APIs together with AML for storing your metrics and artifacts | Diabetes | AML Compute | None | None | None | - ## Deployment @@ -88,16 +88,14 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an | :star:[Deploy models to AKS using controlled roll out](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/deploy-with-controlled-rollout/deploy-aks-with-controlled-rollout.ipynb) | Deploy a model with Azure Machine Learning | Diabetes | None | Azure Kubernetes Service | Scikit-learn | None | | [Train MNIST in PyTorch, convert, and deploy with ONNX Runtime](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/onnx/onnx-train-pytorch-aml-deploy-mnist.ipynb) | Image Classification | MNIST | AML Compute | Azure Container Instance | ONNX | ONNX Converter | | [Deploy ResNet50 with ONNX Runtime](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/onnx/onnx-modelzoo-aml-deploy-resnet50.ipynb) | Image Classification | ImageNet | Local | Azure Container Instance | ONNX | ONNX Model Zoo | -| [Deploy a model as a web service using MLflow](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/using-mlflow/deploy-model/deploy-model.ipynb) | Use MLflow with AML | Diabetes | None | Azure Container Instance | Scikit-learn | None | | :star:[Convert and deploy TinyYolo with ONNX Runtime](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/onnx/onnx-convert-aml-deploy-tinyyolo.ipynb) | Object Detection | PASCAL VOC | local | Azure Container 
Instance | ONNX | ONNX Converter | - +| [Register Spark model and deploy as webservice](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/spark/model-register-and-deploy-spark.ipynb) | | Iris | None | Azure Container Instance | PySpark | | ## Other Notebooks |Title| Task | Dataset | Training Compute | Deployment Target | ML Framework | Tags | |:----|:-----|:-------:|:----------------:|:-----------------:|:------------:|:------------:| | [DNN Text Featurization](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.ipynb) | Text featurization using DNNs for classification | None | AML Compute | None | None | None | -| [Automated ML Grouping with Pipeline.](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/forecasting-grouping/auto-ml-forecasting-grouping.ipynb) | Use AzureML Pipeline to trigger multiple Automated ML runs. | Orange Juice Sales | AML Compute | Azure Container Instance | Scikit-learn, Pytorch | AutomatedML | | [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) | | | | | | | | [lightgbm-example](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/gbdt/lightgbm/lightgbm-example.ipynb) | | | | | | | | [azure-ml-with-nvidia-rapids](https://github.com/Azure/MachineLearningNotebooks/blob/master//contrib/RAPIDS/azure-ml-with-nvidia-rapids.ipynb) | | | | | | | @@ -107,7 +105,6 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an | [auto-ml-regression](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/automated-machine-learning/regression/auto-ml-regression.ipynb) | | | | | | | | [build-model-run-history-03](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/build-model-run-history-03.ipynb) | | | | | | | | [deploy-to-aci-04](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aci-04.ipynb) | | | | | | | -| [deploy-to-aks-05](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/deploy-to-aks-05.ipynb) | | | | | | | | [ingest-data-02](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/ingest-data-02.ipynb) | | | | | | | | [installation-and-configuration-01](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/amlsdk/installation-and-configuration-01.ipynb) | | | | | | | | [automl-databricks-local-01](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/azure-databricks/automl/automl-databricks-local-01.ipynb) | | | | | | | @@ -121,23 +118,21 @@ Machine Learning notebook samples and encourage efficient retrieval of topics an | [enable-app-insights-in-production-service](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/enable-app-insights-in-production-service/enable-app-insights-in-production-service.ipynb) | | | | | | | | [onnx-model-register-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/onnx/onnx-model-register-and-deploy.ipynb) | | | | | | | | 
[production-deploy-to-aks](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/production-deploy-to-aks/production-deploy-to-aks.ipynb) | | | | | | | -| [register-model-create-image-deploy-service](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/register-model-create-image-deploy-service/register-model-create-image-deploy-service.ipynb) | | | | | | | +| [production-deploy-to-aks-gpu](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/production-deploy-to-aks-gpu/production-deploy-to-aks-gpu.ipynb) | | | | | | | | [tensorflow-model-register-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/deployment/tensorflow/tensorflow-model-register-and-deploy.ipynb) | | | | | | | | [explain-model-on-amlcompute](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/azure-integration/remote-explanation/explain-model-on-amlcompute.ipynb) | | | | | | | | [save-retrieve-explanations-run-history](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/azure-integration/run-history/save-retrieve-explanations-run-history.ipynb) | | | | | | | | [train-explain-model-locally-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-locally-and-deploy.ipynb) | | | | | | | | [train-explain-model-on-amlcompute-and-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/explain-model/azure-integration/scoring-time/train-explain-model-on-amlcompute-and-deploy.ipynb) | | | | | | | +| [training_notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines/notebook_runner/training_notebook.ipynb) | | | | | | | | [nyc-taxi-data-regression-model-building](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/nyc-taxi-data-regression-model-building/nyc-taxi-data-regression-model-building.ipynb) | | | | | | | -| [pipeline-batch-scoring](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.ipynb) | | | | | | | -| [pipeline-style-transfer](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/machine-learning-pipelines/pipeline-style-transfer/pipeline-style-transfer.ipynb) | | | | | | | | [authentication-in-azureml](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/manage-azureml-service/authentication-in-azureml/authentication-in-azureml.ipynb) | | | | | | | | [Logging APIs](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/track-and-monitor-experiments/logging-api/logging-api.ipynb) | Logging APIs and analyzing results | None | None | None | None | None | | [distributed-cntk-with-custom-docker](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/distributed-cntk-with-custom-docker/distributed-cntk-with-custom-docker.ipynb) | | | | | | | | [notebook_example](https://github.com/Azure/MachineLearningNotebooks/blob/master//how-to-use-azureml/training-with-deep-learning/how-to-use-estimator/notebook_example.ipynb) | | | | | | | | 
[configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master//setup-environment/configuration.ipynb) | | | | | | | -| [img-classification-part1-training](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/img-classification-part1-training.ipynb) | | | | | | | -| [img-classification-part2-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/img-classification-part2-deploy.ipynb) | | | | | | | -| [regression-automated-ml](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/regression-automated-ml.ipynb) | | | | | | | -| [tutorial-1st-experiment-sdk-train](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/tutorial-1st-experiment-sdk-train.ipynb) | | | | | | | -| [tutorial-pipeline-batch-scoring-classification](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/tutorial-pipeline-batch-scoring-classification.ipynb) | | | | | | | - +| [tutorial-1st-experiment-sdk-train](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/create-first-ml-experiment/tutorial-1st-experiment-sdk-train.ipynb) | | | | | | | +| [img-classification-part1-training](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb) | | | | | | | +| [img-classification-part2-deploy](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/image-classification-mnist-data/img-classification-part2-deploy.ipynb) | | | | | | | +| [tutorial-pipeline-batch-scoring-classification](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/machine-learning-pipelines-advanced/tutorial-pipeline-batch-scoring-classification.ipynb) | | | | | | | +| [regression-automated-ml](https://github.com/Azure/MachineLearningNotebooks/blob/master//tutorials/regression-automl-nyc-taxi-data/regression-automated-ml.ipynb) | | | | | | | diff --git a/setup-environment/configuration.ipynb b/setup-environment/configuration.ipynb index 385473ce0..9ce0a661d 100644 --- a/setup-environment/configuration.ipynb +++ b/setup-environment/configuration.ipynb @@ -102,7 +102,7 @@ "source": [ "import azureml.core\n", "\n", - "print(\"This notebook was created using version 1.0.76.2 of the Azure ML SDK\")\n", + "print(\"This notebook was created using version 1.3.0 of the Azure ML SDK\")\n", "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")" ] }, diff --git a/tutorials/README.md b/tutorials/README.md index 07575bc0f..53b6e9750 100644 --- a/tutorials/README.md +++ b/tutorials/README.md @@ -1,27 +1,34 @@ -## Azure Machine Learning service Tutorial +# Azure Machine Learning Tutorials -Complete these tutorials to learn how to train and deploy models using Azure Machine Learning services and Python SDK. These Notebooks accompany the -two sets of tutorial articles for: +Azure Machine Learning is a cloud-based environment you can use to train, deploy, automate, manage, and track ML models. - * [Image classification using MNIST dataset](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-train-models-with-aml) - * [Regression using NYC Taxi dataset](https://docs.microsoft.com/en-us/azure/machine-learning/service/tutorial-data-prep) +Azure Machine Learning can be used for any kind of machine learning, from classical ML to supervised, unsupervised, and deep learning. -If you are using an Azure Machine Learning Notebook VM, you are all set. 
Otherwise, run the [configuration Notebook](../configuration.ipynb) notebook first to set up your Azure ML Workspace. Then, run the notebooks in following recommended order. +This folder contains a collection of Jupyter Notebooks with the code used in the accompanying step-by-step tutorials. -### Create first ML experiment +## Set up your environment -* [Part 1](https://docs.microsoft.com/azure/machine-learning/service/tutorial-quickstart-setup): Set up workspace & dev environment -* [Part 2](tutorial-quickstart-train-model.ipynb): Learn the foundational design patterns in Azure Machine Learning service, and train a simple scikit-learn model based on the diabetes data set +If you are using an Azure Machine Learning Notebook VM, everything is already set up for you. Otherwise, see the [get started creating your first ML experiment with the Python SDK tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-sdk-setup). -### Image classification +## Introductory Samples - * [Part 1](img-classification-part1-training.ipynb): Train an image classification model with Azure Machine Learning. - * [Part 2](img-classification-part2-deploy.ipynb): Deploy an image classification model from first tutorial in Azure Container Instance (ACI). +The following tutorials are intended to provide an introductory overview of Azure Machine Learning. - ### Regression - * [Part 1](regression-part1-data-prep.ipynb): Prepare the data using Azure Machine Learning Data Prep SDK. - * [Part 2](regression-part2-automated-ml.ipynb): Train a model using Automated Machine Learning. +| Tutorial | Description | Notebook | Task | Framework | +| --- | --- | --- | --- | --- | +| [Train your first ML Model](https://docs.microsoft.com/azure/machine-learning/tutorial-1st-experiment-sdk-train) | Learn the foundational design patterns in Azure Machine Learning and train a scikit-learn model based on a diabetes data set. | [tutorial-quickstart-train-model.ipynb](create-first-ml-experiment/tutorial-1st-experiment-sdk-train.ipynb) | Regression | Scikit-Learn +| [Train an image classification model](https://docs.microsoft.com/azure/machine-learning/tutorial-train-models-with-aml) | Train a scikit-learn image classification model. | [img-classification-part1-training.ipynb](image-classification-mnist-data/img-classification-part1-training.ipynb) | Image Classification | Scikit-Learn +| [Deploy an image classification model](https://docs.microsoft.com/azure/machine-learning/tutorial-deploy-models-with-aml) | Deploy a scikit-learn image classification model to Azure Container Instances. | [img-classification-part2-deploy.ipynb](image-classification-mnist-data/img-classification-part2-deploy.ipynb) | Image Classification | Scikit-Learn +| [Use automated machine learning to predict taxi fares](https://docs.microsoft.com/azure/machine-learning/tutorial-auto-train-models) | Train a regression model to predict taxi fares using Automated Machine Learning. | [regression-part2-automated-ml.ipynb](regression-automl-nyc-taxi-data/regression-automated-ml.ipynb) | Regression | Automated ML - Also find quickstarts and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/). +## Advanced Samples + +The following tutorials are intended to provide examples of more advanced features in Azure Machine Learning. 
+ +| Tutorial | Description | Notebook | Task | Framework | +| --- | --- | --- | --- | --- | +| [Build an Azure Machine Learning pipeline for batch scoring](https://docs.microsoft.com/azure/machine-learning/tutorial-pipeline-batch-scoring-classification) | Create an Azure Machine Learning pipeline to run batch scoring image classification jobs | [tutorial-pipeline-batch-scoring-classification.ipynb](machine-learning-pipelines-advanced/tutorial-pipeline-batch-scoring-classification.ipynb) | Image Classification | TensorFlow + +For additional documentation and resources, see the [official documentation site for Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/). ![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/README.png) \ No newline at end of file diff --git a/tutorials/create-first-ml-experiment/imgs/experiment_main.png b/tutorials/create-first-ml-experiment/imgs/experiment_main.png new file mode 100644 index 000000000..2419855bb Binary files /dev/null and b/tutorials/create-first-ml-experiment/imgs/experiment_main.png differ diff --git a/tutorials/create-first-ml-experiment/imgs/model_download.png b/tutorials/create-first-ml-experiment/imgs/model_download.png new file mode 100644 index 000000000..adcdf70ec Binary files /dev/null and b/tutorials/create-first-ml-experiment/imgs/model_download.png differ diff --git a/tutorials/tutorial-1st-experiment-sdk-train.ipynb b/tutorials/create-first-ml-experiment/tutorial-1st-experiment-sdk-train.ipynb similarity index 89% rename from tutorials/tutorial-1st-experiment-sdk-train.ipynb rename to tutorials/create-first-ml-experiment/tutorial-1st-experiment-sdk-train.ipynb index 354d23b5b..09ac2ca5b 100644 --- a/tutorials/tutorial-1st-experiment-sdk-train.ipynb +++ b/tutorials/create-first-ml-experiment/tutorial-1st-experiment-sdk-train.ipynb @@ -31,7 +31,7 @@ "\n", "> * Connect your workspace and create an experiment \n", "> * Load data and train a scikit-learn model\n", - "> * View training results in the portal\n", + "> * View training results in the studio\n", "> * Retrieve the best model" ] }, @@ -74,7 +74,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now create an experiment in your workspace. An experiment is another foundational cloud resource that represents a collection of trials (individual model runs). In this tutorial you use the experiment to create runs and track your model training in the Azure Portal. Parameters include your workspace reference, and a string name for the experiment." + "Now create an experiment in your workspace. An experiment is another foundational cloud resource that represents a collection of trials (individual model runs). In this tutorial you use the experiment to create runs and track your model training in the Azure Machine Learning studio. Parameters include your workspace reference, and a string name for the experiment." ] }, { @@ -171,7 +171,7 @@ "\n", "1. For each alpha hyperparameter value in the `alphas` array, a new run is created within the experiment. The alpha value is logged to differentiate between each run.\n", "1. In each run, a Ridge model is instantiated, trained, and used to run predictions. The root-mean-squared-error is calculated for the actual versus predicted values, and then logged to the run. At this point the run has metadata attached for both the alpha value and the rmse accuracy.\n", - "1. Next, the model for each run is serialized and uploaded to the run. 
This allows you to download the model file from the run in the portal.\n", + "1. Next, the model for each run is serialized and uploaded to the run. This allows you to download the model file from the run in the studio.\n", "1. At the end of each iteration the run is completed by calling `run.complete()`.\n", "\n" ] @@ -180,7 +180,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "After the training has completed, call the `experiment` variable to fetch a link to the experiment in the portal." + "After the training has completed, call the `experiment` variable to fetch a link to the experiment in the studio." ] }, { @@ -196,14 +196,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## View training results in portal" + "## View training results in studio" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Following the **Link to Azure Portal** takes you to the main experiment page. Here you see all the individual runs in the experiment. Any custom-logged values (`alpha_value` and `rmse`, in this case) become fields for each run, and also become available for the charts and tiles at the top of the experiment page. To add a logged metric to a chart or tile, hover over it, click the edit button, and find your custom-logged metric.\n", + "Following the **Link to Azure Machine Learning studio** takes you to the main experiment page. Here you see all the individual runs in the experiment. Any custom-logged values (`alpha_value` and `rmse`, in this case) become fields for each run, and also become available for the charts and tiles at the top of the experiment page. To add a logged metric to a chart or tile, hover over it, click the edit button, and find your custom-logged metric.\n", "\n", "When training models at scale over hundreds and thousands of runs, this page makes it easy to see every model you trained, specifically how they were trained, and how your unique metrics have changed over time." ] @@ -212,21 +212,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![Main Experiment page in Portal](imgs/experiment_main.png)" + "![Main Experiment page in the studio](../imgs/experiment_main.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Clicking on a run number link in the `RUN NUMBER` column takes you to the page for each individual run. The default tab **Details** shows you more-detailed information on each run. Navigate to the **Outputs** tab, and you see the `.pkl` file for the model that was uploaded to the run during each training iteration. Here you can download the model file, rather than having to retrain it manually." + "Select a run number link in the `RUN NUMBER` column to see the page for an individual run. The default tab **Details** shows you more-detailed information on each run. Navigate to the **Outputs + logs** tab, and you see the `.pkl` file for the model that was uploaded to the run during each training iteration. Here you can download the model file, rather than having to retrain it manually." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "![Run details page in Portal](imgs/model_download.png)" + "![Run details page in the studio](../imgs/model_download.png)" ] }, { @@ -240,7 +240,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In addition to being able to download model files from the experiment in the portal, you can also download them programmatically. 
The following code iterates through each run in the experiment, and accesses both the logged run metrics and the run details (which contains the run_id). This keeps track of the best run, in this case the run with the lowest root-mean-squared-error." + "In addition to being able to download model files from the experiment in the studio, you can also download them programmatically. The following code iterates through each run in the experiment, and accesses both the logged run metrics and the run details (which contains the run_id). This keeps track of the best run, in this case the run with the lowest root-mean-squared-error." ] }, { @@ -317,7 +317,9 @@ "\n", "If you used a cloud notebook server, stop the VM when you are not using it to reduce cost.\n", "\n", - "1. In your workspace, select **Notebook VMs**.\n", + "1. In your workspace, select **Compute**.\n", + "\n", + "1. Select the **Notebook VMs** tab in the compute page.\n", "\n", "1. From the list, select the VM.\n", "\n", @@ -350,7 +352,7 @@ "\n", "> * Connected your workspace and created an experiment\n", "> * Loaded data and trained scikit-learn models\n", - "> * Viewed training results in the portal and retrieved models\n", + "> * Viewed training results in the studio and retrieved models\n", "\n", "[Deploy your model](https://docs.microsoft.com/azure/machine-learning/service/tutorial-deploy-models-with-aml) with Azure Machine Learning.\n", "Learn how to develop [automated machine learning](https://docs.microsoft.com/azure/machine-learning/service/tutorial-auto-train-models) experiments." diff --git a/tutorials/tutorial-1st-experiment-sdk-train.yml b/tutorials/create-first-ml-experiment/tutorial-1st-experiment-sdk-train.yml similarity index 100% rename from tutorials/tutorial-1st-experiment-sdk-train.yml rename to tutorials/create-first-ml-experiment/tutorial-1st-experiment-sdk-train.yml diff --git a/tutorials/img-classification-part1-training.ipynb b/tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb similarity index 90% rename from tutorials/img-classification-part1-training.ipynb rename to tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb index db744bb42..8ce815b8c 100644 --- a/tutorials/img-classification-part1-training.ipynb +++ b/tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb @@ -30,7 +30,9 @@ "\n", "## Prerequisites\n", "\n", - "See prerequisites in the [Azure Machine Learning documentation](https://docs.microsoft.com/azure/machine-learning/service/tutorial-train-models-with-aml#prerequisites)." + "See prerequisites in the [Azure Machine Learning documentation](https://docs.microsoft.com/azure/machine-learning/service/tutorial-train-models-with-aml#prerequisites).\n", + "\n", + "On the computer running this notebook, conda install matplotlib, numpy, scikit-learn=0.22.1" ] }, { @@ -126,7 +128,8 @@ "metadata": {}, "source": [ "### Create or Attach existing compute resource\n", - "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. The code below creates the compute clusters for you if they don't already exist in your workspace.\n", + "By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. 
Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. You will submit Python code to run on this VM later in the tutorial. \n", + "The code below creates the compute clusters for you if they don't already exist in your workspace.\n", "\n", "**Creation of compute takes approximately 5 minutes.** If the AmlCompute with that name is already in your workspace the code will skip the creation process." ] @@ -236,12 +239,15 @@ "source": [ "# make sure utils.py is in the same directory as this code\n", "from utils import load_data\n", + "import glob\n", + "\n", "\n", "# note we also shrink the intensity values (X) from 0-255 to 0-1. This helps the model converge faster.\n", - "X_train = load_data(os.path.join(data_folder, \"train-images-idx3-ubyte.gz\"), False) / 255.0\n", - "X_test = load_data(os.path.join(data_folder, \"t10k-images-idx3-ubyte.gz\"), False) / 255.0\n", - "y_train = load_data(os.path.join(data_folder, \"train-labels-idx1-ubyte.gz\"), True).reshape(-1)\n", - "y_test = load_data(os.path.join(data_folder, \"t10k-labels-idx1-ubyte.gz\"), True).reshape(-1)\n", + "X_train = load_data(glob.glob(os.path.join(data_folder,\"**/train-images-idx3-ubyte.gz\"), recursive=True)[0], False) / 255.0\n", + "X_test = load_data(glob.glob(os.path.join(data_folder,\"**/t10k-images-idx3-ubyte.gz\"), recursive=True)[0], False) / 255.0\n", + "y_train = load_data(glob.glob(os.path.join(data_folder,\"**/train-labels-idx1-ubyte.gz\"), recursive=True)[0], True).reshape(-1)\n", + "y_test = load_data(glob.glob(os.path.join(data_folder,\"**/t10k-labels-idx1-ubyte.gz\"), recursive=True)[0], True).reshape(-1)\n", + "\n", "\n", "# now let's show some randomly chosen images from the traininng set.\n", "count = 0\n", @@ -263,7 +269,7 @@ "source": [ "## Train on a remote cluster\n", "\n", - "For this task, submit the job to the remote training cluster you set up earlier. To submit a job you:\n", + "For this task, you submit the job to run on the remote training cluster you set up earlier. To submit a job you:\n", "* Create a directory\n", "* Create a training script\n", "* Create an estimator object\n", @@ -308,7 +314,7 @@ "import glob\n", "\n", "from sklearn.linear_model import LogisticRegression\n", - "from sklearn.externals import joblib\n", + "import joblib\n", "\n", "from azureml.core import Run\n", "from utils import load_data\n", @@ -396,15 +402,20 @@ "source": [ "### Create an estimator\n", "\n", - "An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as generic Estimator. Create SKLearn estimator for scikit-learn model, by specifying\n", + "An estimator object is used to submit the run. Azure Machine Learning has pre-configured estimators for common machine learning frameworks, as well as generic Estimator. Create an estimator by specifying\n", "\n", "* The name of the estimator object, `est`\n", "* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n", "* The compute target. In this case you will use the AmlCompute you created\n", "* The training script name, train.py\n", - "* Parameters required from the training script \n", + "* An environment that contains the libraries needed to run the script\n", + "* Parameters required from the training script. \n", "\n", - "In this tutorial, the target is AmlCompute. 
All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the dataset." + "In this tutorial, the target is AmlCompute. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the dataset.\n", + "\n", + "First, create the environment that contains: the scikit-learn library, azureml-dataprep required for accessing the dataset, and azureml-defaults which contains the dependencies for logging metrics. The azureml-defaults also contains the dependencies required for deploying the model as a web service later in the part 2 of the tutorial.\n", + "\n", + "Once the environment is defined, register it with the Workspace to re-use it in part 2 of the tutorial." ] }, { @@ -417,10 +428,20 @@ "from azureml.core.conda_dependencies import CondaDependencies\n", "\n", "# to install required packages\n", - "env = Environment('my_env')\n", - "cd = CondaDependencies.create(pip_packages=['azureml-sdk','scikit-learn','azureml-dataprep[pandas,fuse]>=1.1.14'])\n", + "env = Environment('tutorial-env')\n", + "cd = CondaDependencies.create(pip_packages=['azureml-dataprep[pandas,fuse]>=1.1.14', 'azureml-defaults'], conda_packages = ['scikit-learn==0.22.1'])\n", + "\n", + "env.python.conda_dependencies = cd\n", "\n", - "env.python.conda_dependencies = cd" + "# Register environment to re-use later\n", + "env.register(workspace = ws)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then, create the estimator by specifying the training script, compute target and environment." ] }, { @@ -433,7 +454,7 @@ }, "outputs": [], "source": [ - "from azureml.train.sklearn import SKLearn\n", + "from azureml.train.estimator import Estimator\n", "\n", "script_params = {\n", " # to mount files referenced by mnist dataset\n", @@ -441,7 +462,7 @@ " '--regularization': 0.5\n", "}\n", "\n", - "est = SKLearn(source_directory=script_folder,\n", + "est = Estimator(source_directory=script_folder,\n", " script_params=script_params,\n", " compute_target=compute_target,\n", " environment_definition=env,\n", @@ -666,7 +687,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.9" + "version": "3.7.6" }, "msauthor": "roastala" }, diff --git a/tutorials/img-classification-part1-training.yml b/tutorials/image-classification-mnist-data/img-classification-part1-training.yml similarity index 100% rename from tutorials/img-classification-part1-training.yml rename to tutorials/image-classification-mnist-data/img-classification-part1-training.yml diff --git a/tutorials/img-classification-part2-deploy.ipynb b/tutorials/image-classification-mnist-data/img-classification-part2-deploy.ipynb similarity index 81% rename from tutorials/img-classification-part2-deploy.ipynb rename to tutorials/image-classification-mnist-data/img-classification-part2-deploy.ipynb index 8301618a7..cf8cc89a9 100644 --- a/tutorials/img-classification-part2-deploy.ipynb +++ b/tutorials/image-classification-mnist-data/img-classification-part2-deploy.ipynb @@ -39,11 +39,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "tags": [ - "register model from file" - ] - }, + "metadata": {}, "outputs": [], "source": [ "# If you did NOT complete the tutorial, you can instead run this cell \n", @@ -64,16 +60,17 @@ " description=\"Mnist handwriting recognition\",\n", " workspace=ws)\n", "\n", - "# download test data\n", - "import os\n", - "import urllib.request\n", + "from 
azureml.core.environment import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", "\n", - "data_folder = os.path.join(os.getcwd(), 'data')\n", - "os.makedirs(data_folder, exist_ok = True)\n", + "# to install required packages\n", + "env = Environment('tutorial-env')\n", + "cd = CondaDependencies.create(pip_packages=['azureml-dataprep[pandas,fuse]>=1.1.14', 'azureml-defaults'], conda_packages = ['scikit-learn==0.22.1'])\n", "\n", + "env.python.conda_dependencies = cd\n", "\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))\n", - "urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))" + "# Register environment to re-use later\n", + "env.register(workspace = ws)" ] }, { @@ -113,50 +110,23 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Retrieve the model\n", - "\n", - "You registered a model in your workspace in the previous tutorial. Now, load this workspace and download the model to your local directory." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "tags": [ - "load workspace", - "download model" - ] - }, - "outputs": [], - "source": [ - "from azureml.core import Workspace\n", - "from azureml.core.model import Model\n", - "import os \n", - "ws = Workspace.from_config()\n", - "model=Model(ws, 'sklearn_mnist')\n", + "## Deploy as web service\n", "\n", - "model.download(target_dir=os.getcwd(), exist_ok=True)\n", + "Deploy the model as a web service hosted in ACI. \n", "\n", - "# verify the downloaded model file\n", - "file_path = os.path.join(os.getcwd(), \"sklearn_mnist_model.pkl\")\n", + "To build the correct environment for ACI, provide the following:\n", + "* A scoring script to show how to use the model\n", + "* A configuration file to build the ACI\n", + "* The model you trained before\n", "\n", - "os.stat(file_path)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Test model locally\n", + "### Create scoring script\n", "\n", - "Before deploying, make sure your model is working locally by:\n", - "* Loading test data\n", - "* Predicting test data\n", - "* Examining the confusion matrix\n", + "Create the scoring script, called score.py, used by the web service call to show how to use the model.\n", "\n", - "### Load test data\n", + "You must include two required functions into the scoring script:\n", + "* The `init()` function, which typically loads the model into a global object. This function is run only once when the Docker container is started. \n", "\n", - "Load the test data from the **./data/** directory created during the training tutorial." + "* The `run(input_data)` function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.\n" ] }, { @@ -165,179 +135,169 @@ "metadata": {}, "outputs": [], "source": [ - "from utils import load_data\n", + "%%writefile score.py\n", + "import json\n", + "import numpy as np\n", "import os\n", + "import pickle\n", + "import joblib\n", "\n", - "data_folder = os.path.join(os.getcwd(), 'data')\n", - "# note we also shrink the intensity values (X) from 0-255 to 0-1. 
This helps the neural network converge faster\n", - "X_test = load_data(os.path.join(data_folder, 'test-images.gz'), False) / 255.0\n", - "y_test = load_data(os.path.join(data_folder, 'test-labels.gz'), True).reshape(-1)" + "def init():\n", + " global model\n", + " # AZUREML_MODEL_DIR is an environment variable created during deployment.\n", + " # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n", + " # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n", + " model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')\n", + " model = joblib.load(model_path)\n", + "\n", + "def run(raw_data):\n", + " data = np.array(json.loads(raw_data)['data'])\n", + " # make prediction\n", + " y_hat = model.predict(data)\n", + " # you can return any data type as long as it is JSON-serializable\n", + " return y_hat.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Predict test data\n", + "### Create configuration file\n", "\n", - "Feed the test dataset to the model to get predictions." + "Create a deployment configuration file and specify the number of CPUs and gigabyte of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models. If you feel you need more later, you would have to recreate the image and redeploy the service." ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "configure web service", + "aci" + ] + }, "outputs": [], "source": [ - "import pickle\n", - "from sklearn.externals import joblib\n", + "from azureml.core.webservice import AciWebservice\n", "\n", - "clf = joblib.load( os.path.join(os.getcwd(), 'sklearn_mnist_model.pkl'))\n", - "y_hat = clf.predict(X_test)" + "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", + " memory_gb=1, \n", + " tags={\"data\": \"MNIST\", \"method\" : \"sklearn\"}, \n", + " description='Predict MNIST with sklearn')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Examine the confusion matrix\n", + "### Deploy in ACI\n", + "Estimated time to complete: **about 2-5 minutes**\n", "\n", - "Generate a confusion matrix to see how many samples from the test set are classified correctly. Notice the mis-classified value for the incorrect predictions." + "Configure the image and deploy. The following code goes through these steps:\n", + "\n", + "1. Create environment object containing dependencies needed by the model using the environment file (`myenv.yml`)\n", + "1. Create inference configuration necessary to deploy the model as a web service using:\n", + " * The scoring file (`score.py`)\n", + " * envrionment object created in previous step\n", + "1. Deploy the model to the ACI container.\n", + "1. Get the web service HTTP endpoint." 
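One practical note the notebook does not spell out: if `wait_for_deployment` in the deployment cell that follows reports a failure, the service state and the container logs are the usual first diagnostics. A hedged sketch, assuming `service` is the `Webservice` object returned by `Model.deploy` below:

```python
# Hedged troubleshooting sketch, not part of the notebook being patched.
# Assumes `service` is the Webservice returned by Model.deploy() below.
print(service.state)       # e.g. 'Healthy' once the deployment succeeds
print(service.get_logs())  # container logs from the scoring image

# If the deployment cannot be repaired, delete the service and redeploy:
# service.delete()
```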
] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "configure image", + "create image", + "deploy web service", + "aci" + ] + }, "outputs": [], "source": [ - "from sklearn.metrics import confusion_matrix\n", + "%%time\n", + "from azureml.core.webservice import Webservice\n", + "from azureml.core.model import InferenceConfig\n", + "from azureml.core.environment import Environment\n", + "from azureml.core import Workspace\n", + "from azureml.core.model import Model\n", "\n", - "conf_mx = confusion_matrix(y_test, y_hat)\n", - "print(conf_mx)\n", - "print('Overall accuracy:', np.average(y_hat == y_test))" + "ws = Workspace.from_config()\n", + "model = Model(ws, 'sklearn_mnist')\n", + "\n", + "\n", + "myenv = Environment.get(workspace=ws, name=\"tutorial-env\", version=\"1\")\n", + "inference_config = InferenceConfig(entry_script=\"score.py\", environment=myenv)\n", + "\n", + "service = Model.deploy(workspace=ws, \n", + " name='sklearn-mnist-svc', \n", + " models=[model], \n", + " inference_config=inference_config, \n", + " deployment_config=aciconfig)\n", + "\n", + "service.wait_for_deployment(show_output=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Use `matplotlib` to display the confusion matrix as a graph. In this graph, the X axis represents the actual values, and the Y axis represents the predicted values. The color in each grid represents the error rate. The lighter the color, the higher the error rate is. For example, many 5's are mis-classified as 3's. Hence you see a bright grid at (5,3)." + "Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application." ] }, { "cell_type": "code", "execution_count": null, - "metadata": {}, + "metadata": { + "tags": [ + "get scoring uri" + ] + }, "outputs": [], "source": [ - "# normalize the diagonal cells so that they don't overpower the rest of the cells when visualized\n", - "row_sums = conf_mx.sum(axis=1, keepdims=True)\n", - "norm_conf_mx = conf_mx / row_sums\n", - "np.fill_diagonal(norm_conf_mx, 0)\n", - "\n", - "fig = plt.figure(figsize=(8,5))\n", - "ax = fig.add_subplot(111)\n", - "cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n", - "ticks = np.arange(0, 10, 1)\n", - "ax.set_xticks(ticks)\n", - "ax.set_yticks(ticks)\n", - "ax.set_xticklabels(ticks)\n", - "ax.set_yticklabels(ticks)\n", - "fig.colorbar(cax)\n", - "plt.ylabel('true labels', fontsize=14)\n", - "plt.xlabel('predicted values', fontsize=14)\n", - "plt.savefig('conf.png')\n", - "plt.show()" + "print(service.scoring_uri)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Deploy as web service\n", - "\n", - "Once you've tested the model and are satisfied with the results, deploy the model as a web service hosted in ACI. \n", - "\n", - "To build the correct environment for ACI, provide the following:\n", - "* A scoring script to show how to use the model\n", - "* An environment file to show what packages need to be installed\n", - "* A configuration file to build the ACI\n", - "* The model you trained before\n", - "\n", - "### Create scoring script\n", - "\n", - "Create the scoring script, called score.py, used by the web service call to show how to use the model.\n", - "\n", - "You must include two required functions into the scoring script:\n", - "* The `init()` function, which typically loads the model into a global object. 
This function is run only once when the Docker container is started. \n", - "\n", - "* The `run(input_data)` function uses the model to predict a value based on the input data. Inputs and outputs to the run typically use JSON for serialization and de-serialization, but other formats are supported.\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile score.py\n", - "import json\n", - "import numpy as np\n", - "import os\n", - "import pickle\n", - "from sklearn.externals import joblib\n", - "from sklearn.linear_model import LogisticRegression\n", - "\n", - "def init():\n", - " global model\n", - " # AZUREML_MODEL_DIR is an environment variable created during deployment.\n", - " # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)\n", - " # For multiple models, it points to the folder containing all deployed models (./azureml-models)\n", - " model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_mnist_model.pkl')\n", - " model = joblib.load(model_path)\n", - "\n", - "def run(raw_data):\n", - " data = np.array(json.loads(raw_data)['data'])\n", - " # make prediction\n", - " y_hat = model.predict(data)\n", - " # you can return any data type as long as it is JSON-serializable\n", - " return y_hat.tolist()" + "## Test the model\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Create environment file\n", - "\n", - "Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image. This model needs `scikit-learn` and `azureml-sdk`." + "### Download test data\n", + "Download the test data to the **./data/** directory" ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "tags": [ - "set conda dependencies" - ] - }, + "metadata": {}, "outputs": [], "source": [ - "from azureml.core.conda_dependencies import CondaDependencies \n", + "import os\n", + "from azureml.core import Dataset\n", + "from azureml.opendatasets import MNIST\n", "\n", - "myenv = CondaDependencies()\n", - "myenv.add_conda_package(\"scikit-learn\")\n", + "data_folder = os.path.join(os.getcwd(), 'data')\n", + "os.makedirs(data_folder, exist_ok=True)\n", "\n", - "with open(\"myenv.yml\",\"w\") as f:\n", - " f.write(myenv.serialize_to_string())" + "mnist_file_dataset = MNIST.get_file_dataset()\n", + "mnist_file_dataset.download(data_folder, overwrite=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Review the content of the `myenv.yml` file." + "### Load test data\n", + "\n", + "Load the test data from the **./data/** directory created during the training tutorial." ] }, { @@ -346,119 +306,106 @@ "metadata": {}, "outputs": [], "source": [ - "with open(\"myenv.yml\",\"r\") as f:\n", - " print(f.read())" + "from utils import load_data\n", + "import os\n", + "import glob\n", + "\n", + "data_folder = os.path.join(os.getcwd(), 'data')\n", + "# note we also shrink the intensity values (X) from 0-255 to 0-1. 
This helps the neural network converge faster\n", + "X_test = load_data(glob.glob(os.path.join(data_folder,\"**/t10k-images-idx3-ubyte.gz\"), recursive=True)[0], False) / 255.0\n", + "y_test = load_data(glob.glob(os.path.join(data_folder,\"**/t10k-labels-idx1-ubyte.gz\"), recursive=True)[0], True).reshape(-1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Create configuration file\n", + "### Predict test data\n", "\n", - "Create a deployment configuration file and specify the number of CPUs and gigabyte of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient for many models. If you feel you need more later, you would have to recreate the image and redeploy the service." + "Feed the test dataset to the model to get predictions.\n", + "\n", + "\n", + "The following code goes through these steps:\n", + "1. Send the data as a JSON array to the web service hosted in ACI. \n", + "\n", + "1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl." ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "tags": [ - "configure web service", - "aci" - ] - }, + "metadata": {}, "outputs": [], "source": [ - "from azureml.core.webservice import AciWebservice\n", - "\n", - "aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, \n", - " memory_gb=1, \n", - " tags={\"data\": \"MNIST\", \"method\" : \"sklearn\"}, \n", - " description='Predict MNIST with sklearn')" + "import json\n", + "test = json.dumps({\"data\": X_test.tolist()})\n", + "test = bytes(test, encoding='utf8')\n", + "y_hat = service.run(input_data=test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Deploy in ACI\n", - "Estimated time to complete: **about 7-8 minutes**\n", - "\n", - "Configure the image and deploy. The following code goes through these steps:\n", + "### Examine the confusion matrix\n", "\n", - "1. Build an image using:\n", - " * The scoring file (`score.py`)\n", - " * The environment file (`myenv.yml`)\n", - " * The model file\n", - "1. Register that image under the workspace. \n", - "1. Send the image to the ACI container.\n", - "1. Start up a container in ACI using the image.\n", - "1. Get the web service HTTP endpoint." + "Generate a confusion matrix to see how many samples from the test set are classified correctly. Notice the mis-classified value for the incorrect predictions." ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "tags": [ - "configure image", - "create image", - "deploy web service", - "aci" - ] - }, + "metadata": {}, "outputs": [], "source": [ - "%%time\n", - "from azureml.core.webservice import Webservice\n", - "from azureml.core.model import InferenceConfig\n", - "\n", - "inference_config = InferenceConfig(runtime= \"python\", \n", - " entry_script=\"score.py\",\n", - " conda_file=\"myenv.yml\")\n", - "\n", - "service = Model.deploy(workspace=ws, \n", - " name='sklearn-mnist-svc', \n", - " models=[model], \n", - " inference_config=inference_config, \n", - " deployment_config=aciconfig)\n", + "from sklearn.metrics import confusion_matrix\n", "\n", - "service.wait_for_deployment(show_output=True)" + "conf_mx = confusion_matrix(y_test, y_hat)\n", + "print(conf_mx)\n", + "print('Overall accuracy:', np.average(y_hat == y_test))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Get the scoring web service's HTTP endpoint, which accepts REST client calls. 
This endpoint can be shared with anyone who wants to test the web service or integrate it into an application." + "Use `matplotlib` to display the confusion matrix as a graph. In this graph, the X axis represents the actual values, and the Y axis represents the predicted values. The color in each grid represents the error rate. The lighter the color, the higher the error rate is. For example, many 5's are mis-classified as 3's. Hence you see a bright grid at (5,3)." ] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "tags": [ - "get scoring uri" - ] - }, + "metadata": {}, "outputs": [], "source": [ - "print(service.scoring_uri)" + "# normalize the diagonal cells so that they don't overpower the rest of the cells when visualized\n", + "row_sums = conf_mx.sum(axis=1, keepdims=True)\n", + "norm_conf_mx = conf_mx / row_sums\n", + "np.fill_diagonal(norm_conf_mx, 0)\n", + "\n", + "fig = plt.figure(figsize=(8,5))\n", + "ax = fig.add_subplot(111)\n", + "cax = ax.matshow(norm_conf_mx, cmap=plt.cm.bone)\n", + "ticks = np.arange(0, 10, 1)\n", + "ax.set_xticks(ticks)\n", + "ax.set_yticks(ticks)\n", + "ax.set_xticklabels(ticks)\n", + "ax.set_yticklabels(ticks)\n", + "fig.colorbar(cax)\n", + "plt.ylabel('true labels', fontsize=14)\n", + "plt.xlabel('predicted values', fontsize=14)\n", + "plt.savefig('conf.png')\n", + "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Test deployed service\n", - "\n", - "Earlier you scored all the test data with the local version of the model. Now, you can test the deployed model with a random sample of 30 images from the test data. \n", + "## Show predictions\n", "\n", - "The following code goes through these steps:\n", - "1. Send the data as a JSON array to the web service hosted in ACI. \n", + "Test the deployed model with a random sample of 30 images from the test data. \n", "\n", - "1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl.\n", "\n", "1. Print the returned predictions and plot them along with the input images. Red font and inverse image (white on black) is used to highlight the misclassified samples. 
\n", "\n", @@ -616,7 +563,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.6" + "version": "3.7.6" }, "msauthor": "sgilley" }, diff --git a/tutorials/img-classification-part2-deploy.yml b/tutorials/image-classification-mnist-data/img-classification-part2-deploy.yml similarity index 73% rename from tutorials/img-classification-part2-deploy.yml rename to tutorials/image-classification-mnist-data/img-classification-part2-deploy.yml index bc88852c7..b3d47ee5c 100644 --- a/tutorials/img-classification-part2-deploy.yml +++ b/tutorials/image-classification-mnist-data/img-classification-part2-deploy.yml @@ -4,3 +4,5 @@ dependencies: - azureml-sdk - matplotlib - sklearn + - pandas + - azureml-opendatasets diff --git a/tutorials/image-classification-mnist-data/sklearn_mnist_model.pkl b/tutorials/image-classification-mnist-data/sklearn_mnist_model.pkl new file mode 100644 index 000000000..611d69706 Binary files /dev/null and b/tutorials/image-classification-mnist-data/sklearn_mnist_model.pkl differ diff --git a/tutorials/utils.py b/tutorials/image-classification-mnist-data/utils.py similarity index 100% rename from tutorials/utils.py rename to tutorials/image-classification-mnist-data/utils.py diff --git a/tutorials/imgs/experiment_main.png b/tutorials/imgs/experiment_main.png deleted file mode 100644 index bb3e51af7..000000000 Binary files a/tutorials/imgs/experiment_main.png and /dev/null differ diff --git a/tutorials/imgs/model_download.png b/tutorials/imgs/model_download.png deleted file mode 100644 index e07fc1db6..000000000 Binary files a/tutorials/imgs/model_download.png and /dev/null differ diff --git a/tutorials/machine-learning-pipelines-advanced/scripts/batch_scoring.py b/tutorials/machine-learning-pipelines-advanced/scripts/batch_scoring.py new file mode 100644 index 000000000..3b5e3dbcf --- /dev/null +++ b/tutorials/machine-learning-pipelines-advanced/scripts/batch_scoring.py @@ -0,0 +1,83 @@ +# Copyright (c) Microsoft. All rights reserved. +# Licensed under the MIT license. 
+ +import os +import argparse +import datetime +import time +import tensorflow as tf +from math import ceil +import numpy as np +import shutil +from tensorflow.contrib.slim.python.slim.nets import inception_v3 + +from azureml.core import Run +from azureml.core.model import Model +from azureml.core.dataset import Dataset + +slim = tf.contrib.slim + +image_size = 299 +num_channel = 3 + + +def get_class_label_dict(): + label = [] + proto_as_ascii_lines = tf.gfile.GFile("labels.txt").readlines() + for l in proto_as_ascii_lines: + label.append(l.rstrip()) + return label + + +def init(): + global g_tf_sess, probabilities, label_dict, input_images + + parser = argparse.ArgumentParser(description="Start a tensorflow model serving") + parser.add_argument('--model_name', dest="model_name", required=True) + parser.add_argument('--labels_name', dest="labels_name", required=True) + args, _ = parser.parse_known_args() + + workspace = Run.get_context(allow_offline=False).experiment.workspace + label_ds = Dataset.get_by_name(workspace=workspace, name=args.labels_name) + label_ds.download(target_path='.', overwrite=True) + + label_dict = get_class_label_dict() + classes_num = len(label_dict) + + with slim.arg_scope(inception_v3.inception_v3_arg_scope()): + input_images = tf.placeholder(tf.float32, [1, image_size, image_size, num_channel]) + logits, _ = inception_v3.inception_v3(input_images, + num_classes=classes_num, + is_training=False) + probabilities = tf.argmax(logits, 1) + + config = tf.ConfigProto() + config.gpu_options.allow_growth = True + g_tf_sess = tf.Session(config=config) + g_tf_sess.run(tf.global_variables_initializer()) + g_tf_sess.run(tf.local_variables_initializer()) + + model_path = Model.get_model_path(args.model_name) + saver = tf.train.Saver() + saver.restore(g_tf_sess, model_path) + + +def file_to_tensor(file_path): + image_string = tf.read_file(file_path) + image = tf.image.decode_image(image_string, channels=3) + + image.set_shape([None, None, None]) + image = tf.image.resize_images(image, [image_size, image_size]) + image = tf.divide(tf.subtract(image, [0]), [255]) + image.set_shape([image_size, image_size, num_channel]) + return image + + +def run(mini_batch): + result_list = [] + for file_path in mini_batch: + test_image = file_to_tensor(file_path) + out = g_tf_sess.run(test_image) + result = g_tf_sess.run(probabilities, feed_dict={input_images: [out]}) + result_list.append(os.path.basename(file_path) + ": " + label_dict[result[0]]) + return result_list diff --git a/tutorials/tutorial-pipeline-batch-scoring-classification.ipynb b/tutorials/machine-learning-pipelines-advanced/tutorial-pipeline-batch-scoring-classification.ipynb similarity index 66% rename from tutorials/tutorial-pipeline-batch-scoring-classification.ipynb rename to tutorials/machine-learning-pipelines-advanced/tutorial-pipeline-batch-scoring-classification.ipynb index b0b42b8c3..7d067225e 100644 --- a/tutorials/tutorial-pipeline-batch-scoring-classification.ipynb +++ b/tutorials/machine-learning-pipelines-advanced/tutorial-pipeline-batch-scoring-classification.ipynb @@ -12,14 +12,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/machine-learning-pipelines/pipeline-batch-scoring/pipeline-batch-scoring.png)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "**Note**: Azure Machine Learning recently released ParallelRunStep for public preview, this will 
allow for parallelization of your workload across many compute nodes without the difficulty of orchestrating worker pools and queues. See the [batch inference notebooks](../contrib/batch_inferencing/) for examples on how to get started." + "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/tutorials/machine-learning-pipelines-advanced/tutorial-pipeline-batch-scoring-classification.png)" ] }, { @@ -28,6 +21,10 @@ "source": [ "# Use Azure Machine Learning Pipelines for batch prediction\n", "\n", + "## Note\n", + "This notebook uses public preview functionality (ParallelRunStep). Please install azureml-contrib-pipeline-steps package before running this notebook.\n", + "\n", + "\n", "In this tutorial, you use Azure Machine Learning service pipelines to run a batch scoring image classification job. The example job uses the pre-trained [Inception-V3](https://arxiv.org/abs/1512.00567) CNN (convolutional neural network) Tensorflow model to classify unlabeled images. Machine learning pipelines optimize your workflow with speed, portability, and reuse so you can focus on your expertise, machine learning, rather than on infrastructure and automation. After building and publishing a pipeline, you can configure a REST endpoint to enable triggering the pipeline from any HTTP library on any platform.\n", "\n", "\n", @@ -37,6 +34,7 @@ "> * Create data objects to fetch and output data\n", "> * Download, prepare, and register the model to your workspace\n", "> * Provision compute targets and create a scoring script\n", + "> * Use ParallelRunStep to do batch scoring\n", "> * Build, run, and publish a pipeline\n", "> * Enable a REST endpoint for the pipeline\n", "\n", @@ -111,14 +109,14 @@ "source": [ "## Create data objects\n", "\n", - "When building pipelines, `DataReference` objects are used for reading data from workspace datastores, and `PipelineData` objects are used for transferring intermediate data between pipeline steps.\n", + "When building pipelines, `Dataset` objects are used for reading data from workspace datastores, and `PipelineData` objects are used for transferring intermediate data between pipeline steps.\n", "\n", "This batch scoring example only uses one pipeline step, but in use-cases with multiple steps, the typical flow will include:\n", "\n", - "1. Using `DataReference` objects as **inputs** to fetch raw data, performing some transformations, then **outputting** a `PipelineData` object.\n", + "1. Using `Dataset` objects as **inputs** to fetch raw data, performing some transformations, then **outputting** a `PipelineData` object.\n", "1. Use the previous step's `PipelineData` **output object** as an *input object*, repeated for subsequent steps.\n", "\n", - "For this scenario you create `DataReference` objects corresponding to the datastore directories for both the input images and the classification labels (y-test values). You also create a `PipelineData` object for the batch scoring output data." + "For this scenario you create `Dataset` objects corresponding to the datastore directories for both the input images and the classification labels (y-test values). You also create a `PipelineData` object for the batch scoring output data." 
] }, { @@ -127,21 +125,11 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.data.data_reference import DataReference\n", + "from azureml.core.dataset import Dataset\n", "from azureml.pipeline.core import PipelineData\n", "\n", - "input_images = DataReference(datastore=batchscore_blob, \n", - " data_reference_name=\"input_images\",\n", - " path_on_datastore=\"batchscoring/images\",\n", - " mode=\"download\"\n", - " )\n", - "\n", - "label_dir = DataReference(datastore=batchscore_blob, \n", - " data_reference_name=\"input_labels\",\n", - " path_on_datastore=\"batchscoring/labels\",\n", - " mode=\"download\" \n", - " )\n", - "\n", + "input_images = Dataset.File.from_files((batchscore_blob, \"batchscoring/images/\"))\n", + "label_ds = Dataset.File.from_files((batchscore_blob, \"batchscoring/labels/*.txt\"))\n", "output_dir = PipelineData(name=\"scores\", \n", " datastore=def_data_store, \n", " output_path_on_compute=\"batchscoring/results\")" @@ -150,6 +138,25 @@ { "cell_type": "markdown", "metadata": {}, + "source": [ + "Next, we need to register the datasets with the workspace." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "input_images = input_images.register(workspace = ws, name = \"input_images\")\n", + "label_ds = label_ds.register(workspace = ws, name = \"label_ds\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], "source": [ "## Download and register the model" ] @@ -192,13 +199,17 @@ "metadata": {}, "outputs": [], "source": [ + "import shutil\n", "from azureml.core.model import Model\n", - " \n", + "\n", + "# register downloaded model \n", "model = Model.register(model_path=\"models/inception_v3.ckpt\",\n", " model_name=\"inception\",\n", " tags={\"pretrained\": \"inception\"},\n", " description=\"Imagenet trained tensorflow inception\",\n", - " workspace=ws)" + " workspace=ws)\n", + "# remove the downloaded dir after registration if you wish\n", + "shutil.rmtree(\"models\")" ] }, { @@ -244,142 +255,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "To do the scoring, you create a batch scoring script `batch_scoring.py`, and write it to the current directory. The script takes input images, applies the classification model, and outputs the predictions to a results file.\n", + "To do the scoring, you create a batch scoring script `batch_scoring.py`, and write it to the current directory. The script takes a minibatch of input images, applies the classification model, and outputs the predictions to a results file.\n", "\n", - "The script `batch_scoring.py` takes the following parameters, which get passed from the `PythonScriptStep` that you create later:\n", + "The script `batch_scoring.py` takes the following parameters, which get passed from the `ParallelRunStep` that you create later:\n", "\n", "- `--model_name`: the name of the model being used\n", - "- `--label_dir` : the directory holding the `labels.txt` file \n", - "- `--dataset_path`: the directory containing the input images\n", - "- `--output_dir` : the script will run the model on the data and output a `results-label.txt` to this directory\n", - "- `--batch_size` : the batch size used in running the model\n", + "- `--labels_name` : the name of the `Dataset` holding the `labels.txt` file \n", "\n", "The pipelines infrastructure uses the `ArgumentParser` class to pass parameters into pipeline steps. 
For example, in the code below the first argument `--model_name` is given the property identifier `model_name`. In the `main()` function, this property is accessed using `Model.get_model_path(args.model_name)`." ] }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%%writefile batch_scoring.py\n", - "\n", - "import os\n", - "import argparse\n", - "import datetime\n", - "import time\n", - "import tensorflow as tf\n", - "from math import ceil\n", - "import numpy as np\n", - "import shutil\n", - "from tensorflow.contrib.slim.python.slim.nets import inception_v3\n", - "from azureml.core.model import Model\n", - "\n", - "slim = tf.contrib.slim\n", - "\n", - "parser = argparse.ArgumentParser(description=\"Start a tensorflow model serving\")\n", - "parser.add_argument('--model_name', dest=\"model_name\", required=True)\n", - "parser.add_argument('--label_dir', dest=\"label_dir\", required=True)\n", - "parser.add_argument('--dataset_path', dest=\"dataset_path\", required=True)\n", - "parser.add_argument('--output_dir', dest=\"output_dir\", required=True)\n", - "parser.add_argument('--batch_size', dest=\"batch_size\", type=int, required=True)\n", - "\n", - "args = parser.parse_args()\n", - "\n", - "image_size = 299\n", - "num_channel = 3\n", - "\n", - "# create output directory if it does not exist\n", - "os.makedirs(args.output_dir, exist_ok=True)\n", - "\n", - "\n", - "def get_class_label_dict(label_file):\n", - " label = []\n", - " proto_as_ascii_lines = tf.gfile.GFile(label_file).readlines()\n", - " for l in proto_as_ascii_lines:\n", - " label.append(l.rstrip())\n", - " return label\n", - "\n", - "\n", - "class DataIterator:\n", - " def __init__(self, data_dir):\n", - " self.file_paths = []\n", - " image_list = os.listdir(data_dir)\n", - " self.file_paths = [data_dir + '/' + file_name.rstrip() for file_name in image_list]\n", - "\n", - " self.labels = [1 for file_name in self.file_paths]\n", - "\n", - " @property\n", - " def size(self):\n", - " return len(self.labels)\n", - "\n", - " def input_pipeline(self, batch_size):\n", - " images_tensor = tf.convert_to_tensor(self.file_paths, dtype=tf.string)\n", - " labels_tensor = tf.convert_to_tensor(self.labels, dtype=tf.int64)\n", - " input_queue = tf.train.slice_input_producer([images_tensor, labels_tensor], shuffle=False)\n", - " labels = input_queue[1]\n", - " images_content = tf.read_file(input_queue[0])\n", - "\n", - " image_reader = tf.image.decode_jpeg(images_content, channels=num_channel, name=\"jpeg_reader\")\n", - " float_caster = tf.cast(image_reader, tf.float32)\n", - " new_size = tf.constant([image_size, image_size], dtype=tf.int32)\n", - " images = tf.image.resize_images(float_caster, new_size)\n", - " images = tf.divide(tf.subtract(images, [0]), [255])\n", - "\n", - " image_batch, label_batch = tf.train.batch([images, labels], batch_size=batch_size, capacity=5 * batch_size)\n", - " return image_batch\n", - "\n", - "\n", - "def main(_):\n", - " label_file_name = os.path.join(args.label_dir, \"labels.txt\")\n", - " label_dict = get_class_label_dict(label_file_name)\n", - " classes_num = len(label_dict)\n", - " test_feeder = DataIterator(data_dir=args.dataset_path)\n", - " total_size = len(test_feeder.labels)\n", - " count = 0\n", - " \n", - " # get model from model registry\n", - " model_path = Model.get_model_path(args.model_name)\n", - " \n", - " with tf.Session() as sess:\n", - " test_images = test_feeder.input_pipeline(batch_size=args.batch_size)\n", - " with 
slim.arg_scope(inception_v3.inception_v3_arg_scope()):\n", - " input_images = tf.placeholder(tf.float32, [args.batch_size, image_size, image_size, num_channel])\n", - " logits, _ = inception_v3.inception_v3(input_images,\n", - " num_classes=classes_num,\n", - " is_training=False)\n", - " probabilities = tf.argmax(logits, 1)\n", - "\n", - " sess.run(tf.global_variables_initializer())\n", - " sess.run(tf.local_variables_initializer())\n", - " coord = tf.train.Coordinator()\n", - " threads = tf.train.start_queue_runners(sess=sess, coord=coord)\n", - " saver = tf.train.Saver()\n", - " saver.restore(sess, model_path)\n", - " out_filename = os.path.join(args.output_dir, \"result-labels.txt\")\n", - " with open(out_filename, \"w\") as result_file:\n", - " i = 0\n", - " while count < total_size and not coord.should_stop():\n", - " test_images_batch = sess.run(test_images)\n", - " file_names_batch = test_feeder.file_paths[i * args.batch_size:\n", - " min(test_feeder.size, (i + 1) * args.batch_size)]\n", - " results = sess.run(probabilities, feed_dict={input_images: test_images_batch})\n", - " new_add = min(args.batch_size, total_size - count)\n", - " count += new_add\n", - " i += 1\n", - " for j in range(new_add):\n", - " result_file.write(os.path.basename(file_names_batch[j]) + \": \" + label_dict[results[j]] + \"\\n\")\n", - " result_file.flush()\n", - " coord.request_stop()\n", - " coord.join(threads)\n", - "\n", - " shutil.copy(out_filename, \"./outputs/\")\n", - "\n", - "if __name__ == \"__main__\":\n", - " tf.app.run()" - ] - }, { "cell_type": "markdown", "metadata": {}, @@ -407,26 +292,23 @@ "metadata": {}, "outputs": [], "source": [ + "from azureml.core import Environment\n", + "from azureml.core.conda_dependencies import CondaDependencies\n", "from azureml.core.runconfig import DEFAULT_GPU_IMAGE\n", - "from azureml.core.runconfig import CondaDependencies, RunConfiguration\n", "\n", "cd = CondaDependencies.create(pip_packages=[\"tensorflow-gpu==1.13.1\", \"azureml-defaults\"])\n", "\n", - "amlcompute_run_config = RunConfiguration(conda_dependencies=cd)\n", - "amlcompute_run_config.environment.docker.enabled = True\n", - "amlcompute_run_config.environment.docker.base_image = DEFAULT_GPU_IMAGE\n", - "amlcompute_run_config.environment.spark.precache_packages = False" + "env = Environment(name=\"parallelenv\")\n", + "env.python.conda_dependencies=cd\n", + "env.docker.base_image = DEFAULT_GPU_IMAGE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Parameterize the pipeline\n", - "\n", - "Define a custom parameter for the pipeline to control the batch size. After the pipeline has been published and exposed via a REST endpoint, any configured parameters are also exposed and can be specified in the JSON payload when rerunning the pipeline with an HTTP request.\n", - "\n", - "Create a `PipelineParameter` object to enable this behavior, and define a name and default value." + "### Create the configuration to wrap the inference script\n", + "Create the configuration for the pipeline step using the entry script and environment, and specify the compute target you already attached to your workspace as the target of execution. Later, we will use `ParallelRunStep` to create the pipeline step from this configuration."
] }, { @@ -435,8 +317,19 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.pipeline.core.graph import PipelineParameter\n", - "batch_size_param = PipelineParameter(name=\"param_batch_size\", default_value=20)" + "from azureml.contrib.pipeline.steps import ParallelRunConfig\n", + "\n", + "parallel_run_config = ParallelRunConfig(\n", + " environment=env,\n", + " entry_script=\"batch_scoring.py\",\n", + " source_directory=\"scripts\",\n", + " output_action=\"append_row\",\n", + " mini_batch_size=\"20\",\n", + " error_threshold=1,\n", + " compute_target=compute_target,\n", + " process_count_per_node=2,\n", + " node_count=1\n", + ")" ] }, { @@ -452,7 +345,7 @@ "* input and output data, and any custom parameters\n", "* reference to a script or SDK-logic to run during the step\n", "\n", - "There are multiple classes that inherit from the parent class [`PipelineStep`](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.builder.pipelinestep?view=azure-ml-py) to assist with building a step using certain frameworks and stacks. In this example, you use the [`PythonScriptStep`](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py) class to define your step logic using a custom python script. Note that if an argument to your script is either an input to the step or output of the step, it must be defined **both** in the `arguments` array, **as well as** in either the `input` or `output` parameter, respectively. \n", + "There are multiple classes that inherit from the parent class [`PipelineStep`](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.builder.pipelinestep?view=azure-ml-py) to assist with building a step using certain frameworks and stacks. In this example, you use the [`ParallelRunStep`](https://docs.microsoft.com/en-us/python/api/azureml-contrib-pipeline-steps/azureml.contrib.pipeline.steps.parallelrunstep?view=azure-ml-py) class to define your step logic using a scoring script. \n", "\n", "An object reference in the `outputs` array becomes available as an **input** for a subsequent pipeline step, for scenarios where there is more than one step." 
] @@ -463,20 +356,20 @@ "metadata": {}, "outputs": [], "source": [ - "from azureml.pipeline.steps import PythonScriptStep\n", + "from azureml.contrib.pipeline.steps import ParallelRunStep\n", + "from datetime import datetime\n", "\n", - "batch_score_step = PythonScriptStep(\n", - " name=\"batch_scoring\",\n", - " script_name=\"batch_scoring.py\",\n", - " arguments=[\"--dataset_path\", input_images, \n", - " \"--model_name\", \"inception\",\n", - " \"--label_dir\", label_dir, \n", - " \"--output_dir\", output_dir, \n", - " \"--batch_size\", batch_size_param],\n", - " compute_target=compute_target,\n", - " inputs=[input_images, label_dir],\n", - " outputs=[output_dir],\n", - " runconfig=amlcompute_run_config\n", + "parallel_step_name = \"batchscoring-\" + datetime.now().strftime(\"%Y%m%d%H%M\")\n", + "\n", + "batch_score_step = ParallelRunStep(\n", + " name=parallel_step_name,\n", + " inputs=[input_images.as_named_input(\"input_images\")],\n", + " output=output_dir,\n", + " models=[model],\n", + " arguments=[\"--model_name\", \"inception\",\n", + " \"--labels_name\", \"label_ds\"],\n", + " parallel_run_config=parallel_run_config,\n", + " allow_reuse=False\n", ")" ] }, @@ -510,7 +403,7 @@ "from azureml.pipeline.core import Pipeline\n", "\n", "pipeline = Pipeline(workspace=ws, steps=[batch_score_step])\n", - "pipeline_run = Experiment(ws, 'batch_scoring').submit(pipeline, pipeline_parameters={\"param_batch_size\": 20})\n", + "pipeline_run = Experiment(ws, \"batch_scoring\").submit(pipeline)\n", "pipeline_run.wait_for_completion(show_output=True)" ] }, @@ -534,14 +427,20 @@ "metadata": {}, "outputs": [], "source": [ - "import pandas as pd\n", + "batch_run = next(pipeline_run.get_children())\n", + "batch_output = batch_run.get_output_data(\"scores\")\n", + "batch_output.download(local_path=\"inception_results\")\n", "\n", - "step_run = list(pipeline_run.get_children())[0]\n", - "step_run.download_file(\"./outputs/result-labels.txt\")\n", + "import pandas as pd\n", + "for root, dirs, files in os.walk(\"inception_results\"):\n", + " for file in files:\n", + " if file.endswith(\"parallel_run_step.txt\"):\n", + " result_file = os.path.join(root,file)\n", "\n", - "df = pd.read_csv(\"result-labels.txt\", delimiter=\":\", header=None)\n", + "df = pd.read_csv(result_file, delimiter=\":\", header=None)\n", "df.columns = [\"Filename\", \"Prediction\"]\n", - "df.head(10)" + "print(\"Prediction has \", df.shape[0], \" rows\")\n", + "df.head(10) " ] }, { @@ -599,7 +498,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Get the REST url from the `endpoint` property of the published pipeline object. You can also find the REST url in your workspace in the portal. Build an HTTP POST request to the endpoint, specifying your authentication header. Additionally, add a JSON payload object with the experiment name and the batch size parameter. As a reminder, the `param_batch_size` is passed through to your `batch_scoring.py` script because you defined it as a `PipelineParameter` object in the step configuration.\n", + "Get the REST url from the `endpoint` property of the published pipeline object. You can also find the REST url in your workspace in the portal. Build an HTTP POST request to the endpoint, specifying your authentication header. Additionally, add a JSON payload object with the experiment name and the batch size parameter. 
As a reminder, the `process_count_per_node` is passed through to `ParallelRunStep` because it is defined as a `PipelineParameter` object in the step configuration.\n", "\n", "Make the request to trigger the run. Access the `Id` key from the response dict to get the value of the run id." ] @@ -616,8 +515,25 @@ "response = requests.post(rest_endpoint, \n", " headers=auth_header, \n", " json={\"ExperimentName\": \"batch_scoring\",\n", - " \"ParameterAssignments\": {\"param_batch_size\": 50}})\n", - "run_id = response.json()[\"Id\"]" + " \"ParameterAssignments\": {\"process_count_per_node\": 6}})" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "try:\n", + " response.raise_for_status()\n", + "except Exception: \n", + " raise Exception(\"Received bad response from the endpoint: {}\\n\"\n", + " \"Response Code: {}\\n\"\n", + " \"Headers: {}\\n\"\n", + " \"Content: {}\".format(rest_endpoint, response.status_code, response.headers, response.content))\n", + "\n", + "run_id = response.json().get('Id')\n", + "print('Submitted pipeline run: ', run_id)" ] }, { @@ -652,7 +568,8 @@ "\n", "If you used a cloud notebook server, stop the VM when you are not using it to reduce cost.\n", "\n", - "1. In your workspace, select **Notebook VMs**.\n", + "1. In your workspace, select **Compute**.\n", + "1. Select the **Notebook VMs** tab in the compute page.\n", "1. From the list, select the VM.\n", "1. Select **Stop**.\n", "1. When you're ready to use the server again, select **Start**.\n", @@ -683,19 +600,16 @@ "\n", "See the [how-to](https://docs.microsoft.com/azure/machine-learning/service/how-to-create-your-first-pipeline?view=azure-devops) for additional detail on building pipelines with the machine learning SDK." ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { "authors": [ { - "name": "sanpil" + "name": [ + "sanpil", + "trmccorm", + "pansav" + ] } ], "kernelspec": { diff --git a/tutorials/tutorial-pipeline-batch-scoring-classification.yml b/tutorials/machine-learning-pipelines-advanced/tutorial-pipeline-batch-scoring-classification.yml similarity index 82% rename from tutorials/tutorial-pipeline-batch-scoring-classification.yml rename to tutorials/machine-learning-pipelines-advanced/tutorial-pipeline-batch-scoring-classification.yml index bb6402691..1e896b846 100644 --- a/tutorials/tutorial-pipeline-batch-scoring-classification.yml +++ b/tutorials/machine-learning-pipelines-advanced/tutorial-pipeline-batch-scoring-classification.yml @@ -3,7 +3,7 @@ dependencies: - pip: - azureml-sdk - azureml-pipeline-core - - azureml-pipeline-steps + - azureml-contrib-pipeline-steps - pandas - requests - azureml-widgets diff --git a/tutorials/regression-automated-ml.ipynb b/tutorials/regression-automl-nyc-taxi-data/regression-automated-ml.ipynb similarity index 99% rename from tutorials/regression-automated-ml.ipynb rename to tutorials/regression-automl-nyc-taxi-data/regression-automated-ml.ipynb index 6482feb1d..9b06e729f 100644 --- a/tutorials/regression-automated-ml.ipynb +++ b/tutorials/regression-automl-nyc-taxi-data/regression-automated-ml.ipynb @@ -564,7 +564,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "1. In your workspace, select **Notebook VMs**.\n", + "1. In your workspace, select **Compute**.\n", + "1. Select the **Notebook VMs** tab in the compute page.\n", "1. From the list, select the VM.\n", "1. Select **Stop**.\n", "1. 
When you're ready to use the server again, select **Start**." diff --git a/tutorials/regression-automated-ml.yml b/tutorials/regression-automl-nyc-taxi-data/regression-automated-ml.yml similarity index 100% rename from tutorials/regression-automated-ml.yml rename to tutorials/regression-automl-nyc-taxi-data/regression-automated-ml.yml diff --git a/tutorials/sklearn_mnist_model.pkl b/tutorials/sklearn_mnist_model.pkl deleted file mode 100644 index cec0edfef..000000000 Binary files a/tutorials/sklearn_mnist_model.pkl and /dev/null differ