Merge pull request #31 from getindata/release-0.3.1
Release 0.3.1
marrrcin committed Nov 18, 2022
2 parents c799481 + 16bb6c3 commit 03788f8
Showing 9 changed files with 81 additions and 70 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg

@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.3.0
+current_version = 0.3.1

 [bumpversion:file:pyproject.toml]
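The ``current_version`` bump above is the line the bumpversion tool rewrites on release. As a rough illustration only (not the tool itself — ``bump2version`` also updates the files listed in the config, handles tags, and more), the core substitution amounts to:

```python
import re

def bump_patch(cfg_text: str) -> str:
    """Increment the patch component of a bumpversion-style current_version line."""
    def repl(m: re.Match) -> str:
        major, minor, patch = map(int, m.group(1).split("."))
        return f"current_version = {major}.{minor}.{patch + 1}"
    return re.sub(r"current_version = (\d+\.\d+\.\d+)", repl, cfg_text)

print(bump_patch("current_version = 0.3.0"))  # current_version = 0.3.1
```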
2 changes: 1 addition & 1 deletion .copier-answers.yml

@@ -7,7 +7,7 @@ description: Kedro plugin with Azure ML Pipelines support
 docs_url: https://kedro-azureml.readthedocs.io/
 full_name: Kedro Azure ML Pipelines plugin
 github_url: https://github.com/getindata/kedro-azureml
-initial_version: 0.3.0
+initial_version: 0.3.1
 keywords:
   - kedro
   - mlops
11 changes: 9 additions & 2 deletions CHANGELOG.md

@@ -2,9 +2,14 @@

 ## [Unreleased]

+## [0.3.1] - 2022-11-18
+
+- Fix default configuration, to make code upload as default
+- Improved documentation and quickstart related to the code upload feature
+
 ## [0.3.0] - 2022-11-16

-- Added support for execution via code upload for faster development cycles https://github.com/getindata/kedro-azureml/pull/15
+- Added support for execution via code upload for faster development cycles <https://github.com/getindata/kedro-azureml/pull/15>
 - Quickstart documentation improvements

 ## [0.2.2] - 2022-10-26
@@ -23,7 +28,9 @@

 - Initial plugin release

-[Unreleased]: https://github.com/getindata/kedro-azureml/compare/0.3.0...HEAD
+[Unreleased]: https://github.com/getindata/kedro-azureml/compare/0.3.1...HEAD

+[0.3.1]: https://github.com/getindata/kedro-azureml/compare/0.3.0...0.3.1
+
 [0.3.0]: https://github.com/getindata/kedro-azureml/compare/0.2.2...0.3.0
6 changes: 5 additions & 1 deletion README.md

@@ -16,7 +16,11 @@
 </p>

 ## About
-Following plugin enables running Kedro pipelines on Azure ML Pipelines service
+Following plugin enables running Kedro pipelines on Azure ML Pipelines service.
+
+We support 2 native Azure Machine Learning types of workflows:
+* For Data Scientists: fast, iterative development with code upload
+* For MLOps: stable, repeatable workflows with Docker

 ## Documentation
122 changes: 61 additions & 61 deletions docs/source/03_quickstart.rst

@@ -19,7 +19,7 @@ created in Azure and have their **names** ready to input to the plugin:
 - Azure ML Compute Cluster
 - Azure Storage Account and Storage Container
 - Azure Storage Key (will be used to execute the pipeline)
-- Azure Container Registry (optional)
+- Azure Container Registry

 1. Make sure that you're logged into Azure (``az login``).
 2. Prepare new virtual environment with Python >=3.8. Install the
@@ -49,15 +49,15 @@
 3. Go to the project's directory: ``cd kedro-azureml-demo``
 4. Add ``kedro-azureml`` to ``src/requirements.txt``
 5. (optional) Remove ``kedro-telemetry`` from ``src/requirements.txt``
-   or set appopriate settings
+   or set appropriate settings
    (`https://github.com/kedro-org/kedro-plugins/tree/main/kedro-telemetry <https://github.com/kedro-org/kedro-plugins/tree/main/kedro-telemetry>`__).
 6. Install the requirements ``pip install -r src/requirements.txt``
 7. Initialize Kedro Azure ML plugin, it requires the Azure resource
    names as stated above. Experiment name can be anything you like (as
    long as it's allowed by Azure ML). The environment name is the name
-   of the Azure ML Environment to be created in the next step. You can
-   use the syntax <environment_name>@latest for the latest version or
-   <environment-name>:<version> for a specific version.
+   of the Azure ML Environment to be created in the next steps. You can
+   use the syntax ``<environment_name>@latest`` for the latest version or
+   ``<environment-name>:<version>`` for a specific version.

 .. code:: console

@@ -66,16 +66,38 @@
    # STORAGE_CONTAINER ENVIRONMENT_NAME
    kedro azureml init <resource-group-name> <workspace-name> <experiment-name> <compute-cluster-name> <storage-account-name> <storage-container-name> <environment-name>

+8. Adjust the Data Catalog - the default one stores all data locally,
+   whereas the plugin will automatically use Azure Blob Storage. Only
+   input data is required to be read locally. Final
+   ``conf/base/catalog.yml`` should look like this:
+
+.. code:: yaml
+
+   companies:
+     type: pandas.CSVDataSet
+     filepath: data/01_raw/companies.csv
+     layer: raw
+
+   reviews:
+     type: pandas.CSVDataSet
+     filepath: data/01_raw/reviews.csv
+     layer: raw
+
+   shuttles:
+     type: pandas.ExcelDataSet
+     filepath: data/01_raw/shuttles.xlsx
+     layer: raw
+
-8. Create an Azure ML Environment for the project:
+9. Prepare an Azure ML Environment for the project:

 For the project's code to run on Azure ML it needs to have an environment
-with the necessary dependencies. Here is it shown how to do this from a
-local Docker build context. Please refer to the
-`Azure ML CLI documentation <https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2#create-an-environment>`__
-for more options.
+with the necessary dependencies.
+
+You have 2 options for executing your pipeline in Azure ML
+
+1. Use code upload (default) - more suitable for Data Scientists' experimentation and pipeline development
+2. Use docker image flow (shown in the Quickstart video) - more suitable for MLOps processes with better experiment repeatability guarantees

 Start by executing the following command:

 .. code:: console

@@ -85,38 +107,47 @@
 This command creates a several files, including ``Dockerfile`` and
 ``.dockerignore``. These can be adjusted to match the workflow for
 your project.

-You have 3 options for executing your pipeline in Azure ML
-1. Use code upload (default) - more suitable for Data Scientists' experimentation and pipeline development
-2. Use docker image flow (shown in the Quickstart video) - more suitable for MLOps processes with better experiment repeatability guarantees
-3. Use docker flow with Azure ML CLI - suitable for workflows where docker is not available on the machine (Azure ML builds the image in this case)
-
 Depending on whether you want to use code upload when submitting an
 experiment or not, you would need to add the code and any possible input
 data to the Docker image.

-8.1. **If using code upload (default)**
+9.1. **If using code upload** (default)

 Everything apart from the section "install project requirements"
 can be removed from the ``Dockerfile``. This plugin automatically creates empty ``.amlignore`` file (`see the official docs <https://learn.microsoft.com/en-us/azure/machine-learning/how-to-save-write-experiment-files#storage-limits-of-experiment-snapshots>`__)
 which means that all of the files (including potentially sensitive ones!) will be uploaded to Azure ML. Modify this file if needed.

-Set ``code_directory: "."`` in the ``azureml.yml`` config file.
+Ensure ``code_directory: "."`` is set in the ``azureml.yml`` config file (it's set by default).

+\Build the image:
+
+.. code:: console
+
+   kedro docker build --docker-args "--build-arg=BASE_IMAGE=python:3.9" --image=<acr repo name>.azurecr.io/kedro-base-image:latest
+
+\Login to ACR and push the image:
+
+.. warning::
+
+   | Make sure that you have the latest version of Azure CLI before running this command.
+   | We observed some issues with the command behaviour, so make sure that you have
+   | `azure-cli` >= 2.42.0 and `ml` extension >= 2.11.0.
+   | You can check installed versions by running `az --version`.
+
+.. code:: console
+
+   az acr login --name <acr repo name>
+   docker push <acr repo name>.azurecr.io/kedro-base-image:latest
+
-\Run the command:
+\Register the Azure ML Environment:

 .. code:: console

-   az ml environment create --name <environment-name> --version <version> --build-context . --dockerfile-path Dockerfile
+   az ml environment create --name <environment-name> --image <acr repo name>.azurecr.io/kedro-base-image:latest
 \
+Now you can re-use this environment and run the pipeline without the need to build the docker image again (unless you add some dependencies to your environment, obviously :-) ).

-8.2. **If using docker image flow**
+9.2. **If using docker image flow** (shown in the Quickstart video)
+
+.. note::
+
+   | Note that using docker image flow means that every time you change your pipeline's code,
+   | you will need to build and push the docker image to ACR again.
+   | We recommend this option for CI/CD-automated MLOps workflows.

 Ensure that in the ``azureml.yml`` you have ``code_directory`` set to null, and ``docker.image`` is filled:

@@ -132,55 +163,24 @@
 Keep the sections in the ``Dockerfile`` and adjust the ``.dockerignore``
 file to include any other files to be added to the Docker image,
 such as ``!data/01_raw`` for the raw data files.

-Invoke docker build
+Invoke docker build:

 .. code:: console

    kedro docker build --docker-args "--build-arg=BASE_IMAGE=python:3.9" --image=<image tag from conf/base/azureml.yml>

-Once finished, push the image:
+\Once finished, login to ACR:

 .. code:: console

-   docker push <image tag from conf/base/azureml.yml>
+   az acr login --name <acr repo name>

-(you will need to authorize to the ACR first, e.g. by
-``az acr login --name <acr repo name>`` ).
+\and push the image:

-8.3. **If using docker flow with Azure ML CLI**
-
-In this flow, the docker image will be built in the Azure, not locally.
-Keep the sections in the ``Dockerfile`` and adjust the ``.dockerignore``
-file to include any other files to be added to the Docker image,
-such as ``!data/01_raw`` for the raw data files.
-
 .. code:: console

-   az ml environment create --name <environment-name> --version <version> --build-context . --dockerfile-path Dockerfile
-\
+   docker push <image tag from conf/base/azureml.yml>

-9. Adjust the Data Catalog - the default one stores all data locally,
-   whereas the plugin will automatically use Azure Blob Storage. Only
-   input data is required to be read locally. Final
-   ``conf/base/catalog.yml`` should look like this:
-
-.. code:: yaml
-
-   companies:
-     type: pandas.CSVDataSet
-     filepath: data/01_raw/companies.csv
-     layer: raw
-
-   reviews:
-     type: pandas.CSVDataSet
-     filepath: data/01_raw/reviews.csv
-     layer: raw
-
-   shuttles:
-     type: pandas.ExcelDataSet
-     filepath: data/01_raw/shuttles.xlsx
-     layer: raw
-
 10. Run the pipeline on Azure ML Pipelines. Here, the *Azure Subscription ID* and *Storage Account Key* will be used:
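The quickstart above distinguishes two execution flows driven by ``azureml.yml``: code upload when ``code_directory`` is set, docker image flow when it is null and ``docker.image`` is filled. A minimal sketch of that decision, where a flat dict stands in for the parsed config (the plugin's real config layout and loader differ — this is illustrative only):

```python
from typing import Any, Dict

def select_flow(config: Dict[str, Any]) -> str:
    """Pick the execution flow described in the quickstart from (hypothetical) parsed azureml.yml keys."""
    if config.get("code_directory") is not None:
        # 9.1: code upload - fast, iterative development, no image rebuild per change
        return "code-upload"
    if config.get("docker", {}).get("image"):
        # 9.2: docker image flow - repeatable MLOps workflow; rebuild + push on each code change
        return "docker-image"
    raise ValueError("set either code_directory or docker.image in azureml.yml")

print(select_flow({"code_directory": "."}))  # code-upload (the 0.3.1 default)
print(select_flow({"code_directory": None,
                   "docker": {"image": "myacr.azurecr.io/proj:latest"}}))  # docker-image
```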
2 changes: 1 addition & 1 deletion kedro_azureml/__init__.py

@@ -1,4 +1,4 @@
-__version__ = "0.3.0"
+__version__ = "0.3.1"

 import warnings
2 changes: 1 addition & 1 deletion kedro_azureml/config.py

@@ -72,7 +72,7 @@ class KedroAzureRunnerConfig(BaseModel):
   # Azure ML Environment to use during pipeline execution
   environment_name: "{environment_name}"
   # Path to directory to upload, or null to disable code upload
-  code_directory: null
+  code_directory: "."
   # Path to the directory in the Docker image to run the code from
   # Ignored when code_directory is set
   working_directory: /home/kedro
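This one-line default change is the "make code upload as default" fix from the changelog. A dataclass sketch of the affected settings (field names taken from the YAML template in the diff; the real config in ``kedro_azureml/config.py`` is a pydantic model with more fields, so treat this as a hypothetical stand-in):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AzureCodeSettings:
    """Hypothetical stand-in for the settings this commit touches."""
    environment_name: str
    code_directory: Optional[str] = "."      # was null before 0.3.1; "." now uploads the project root
    working_directory: str = "/home/kedro"   # ignored when code_directory is set

cfg = AzureCodeSettings(environment_name="kedro-env@latest")
print(cfg.code_directory)  # .
```

Setting ``code_directory`` back to ``None`` explicitly restores the pre-0.3.1 docker-image behaviour.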
2 changes: 1 addition & 1 deletion pyproject.toml

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "kedro-azureml"
-version = "0.3.0"
+version = "0.3.1"
 description = "Kedro plugin with Azure ML Pipelines support"
 readme = "README.md"
 authors = ['marcin.zablocki <marcin.zablocki@getindata.com>']
2 changes: 1 addition & 1 deletion sonar-project.properties

@@ -6,7 +6,7 @@ sonar.tests=tests/
 sonar.python.coverage.reportPaths=coverage.xml
 sonar.python.version=3.9

-sonar.projectVersion=0.3.0
+sonar.projectVersion=0.3.1
 sonar.projectDescription=Kedro plugin with Azure ML Pipelines support
 sonar.links.homepage=https://kedro-azureml.readthedocs.io/
 sonar.links.ci=https://github.com/getindata/kedro-azureml/actions
