Merge pull request #31 from getindata/release-0.3.1
Release 0.3.1
marrrcin committed Nov 18, 2022
2 parents c799481 + 16bb6c3 commit 03788f8
Showing 9 changed files with 81 additions and 70 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg

@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.3.0
+current_version = 0.3.1

 [bumpversion:file:pyproject.toml]
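The ``current_version`` bump above is the line the bumpversion tool rewrites on release. As a rough illustration only (not the tool itself — ``bump2version`` also updates the files listed in the config, handles tags, and more), the core substitution amounts to:

```python
import re

def bump_patch(cfg_text: str) -> str:
    """Increment the patch component of a bumpversion-style current_version line."""
    def repl(m: re.Match) -> str:
        major, minor, patch = map(int, m.group(1).split("."))
        return f"current_version = {major}.{minor}.{patch + 1}"
    return re.sub(r"current_version = (\d+\.\d+\.\d+)", repl, cfg_text)

print(bump_patch("current_version = 0.3.0"))  # current_version = 0.3.1
```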
2 changes: 1 addition & 1 deletion .copier-answers.yml

@@ -7,7 +7,7 @@ description: Kedro plugin with Azure ML Pipelines support
 docs_url: https://kedro-azureml.readthedocs.io/
 full_name: Kedro Azure ML Pipelines plugin
 github_url: https://github.com/getindata/kedro-azureml
-initial_version: 0.3.0
+initial_version: 0.3.1
 keywords:
   - kedro
   - mlops
11 changes: 9 additions & 2 deletions CHANGELOG.md

@@ -2,9 +2,14 @@

 ## [Unreleased]

+## [0.3.1] - 2022-11-18
+
+- Fix default configuration, to make code upload as default
+- Improved documentation and quickstart related to the code upload feature
+
 ## [0.3.0] - 2022-11-16

-- Added support for execution via code upload for faster development cycles https://github.com/getindata/kedro-azureml/pull/15
+- Added support for execution via code upload for faster development cycles <https://github.com/getindata/kedro-azureml/pull/15>
 - Quickstart documentation improvements

 ## [0.2.2] - 2022-10-26
@@ -23,7 +28,9 @@

 - Initial plugin release

-[Unreleased]: https://github.com/getindata/kedro-azureml/compare/0.3.0...HEAD
+[Unreleased]: https://github.com/getindata/kedro-azureml/compare/0.3.1...HEAD

+[0.3.1]: https://github.com/getindata/kedro-azureml/compare/0.3.0...0.3.1
+
 [0.3.0]: https://github.com/getindata/kedro-azureml/compare/0.2.2...0.3.0
6 changes: 5 additions & 1 deletion README.md

@@ -16,7 +16,11 @@
 </p>

 ## About
-Following plugin enables running Kedro pipelines on Azure ML Pipelines service
+Following plugin enables running Kedro pipelines on Azure ML Pipelines service.
+
+We support 2 native Azure Machine Learning types of workflows:
+* For Data Scientists: fast, iterative development with code upload
+* For MLOps: stable, repeatable workflows with Docker

 ## Documentation
122 changes: 61 additions & 61 deletions docs/source/03_quickstart.rst

@@ -19,7 +19,7 @@ created in Azure and have their **names** ready to input to the plugin:
 - Azure ML Compute Cluster
 - Azure Storage Account and Storage Container
 - Azure Storage Key (will be used to execute the pipeline)
-- Azure Container Registry (optional)
+- Azure Container Registry

 1. Make sure that you're logged into Azure (``az login``).
 2. Prepare new virtual environment with Python >=3.8. Install the
@@ -49,15 +49,15 @@
 3. Go to the project's directory: ``cd kedro-azureml-demo``
 4. Add ``kedro-azureml`` to ``src/requirements.txt``
 5. (optional) Remove ``kedro-telemetry`` from ``src/requirements.txt``
-   or set appopriate settings
+   or set appropriate settings
    (`https://github.com/kedro-org/kedro-plugins/tree/main/kedro-telemetry <https://github.com/kedro-org/kedro-plugins/tree/main/kedro-telemetry>`__).
 6. Install the requirements ``pip install -r src/requirements.txt``
 7. Initialize Kedro Azure ML plugin, it requires the Azure resource
    names as stated above. Experiment name can be anything you like (as
    long as it's allowed by Azure ML). The environment name is the name
-   of the Azure ML Environment to be created in the next step. You can
-   use the syntax <environment_name>@latest for the latest version or
-   <environment-name>:<version> for a specific version.
+   of the Azure ML Environment to be created in the next steps. You can
+   use the syntax ``<environment_name>@latest`` for the latest version or
+   ``<environment-name>:<version>`` for a specific version.

 .. code:: console

@@ -66,16 +66,38 @@
    # STORAGE_CONTAINER ENVIRONMENT_NAME
    kedro azureml init <resource-group-name> <workspace-name> <experiment-name> <compute-cluster-name> <storage-account-name> <storage-container-name> <environment-name>

+8. Adjust the Data Catalog - the default one stores all data locally,
+   whereas the plugin will automatically use Azure Blob Storage. Only
+   input data is required to be read locally. Final
+   ``conf/base/catalog.yml`` should look like this:
+
+.. code:: yaml
+
+   companies:
+     type: pandas.CSVDataSet
+     filepath: data/01_raw/companies.csv
+     layer: raw
+
+   reviews:
+     type: pandas.CSVDataSet
+     filepath: data/01_raw/reviews.csv
+     layer: raw
+
+   shuttles:
+     type: pandas.ExcelDataSet
+     filepath: data/01_raw/shuttles.xlsx
+     layer: raw
+
-8. Create an Azure ML Environment for the project:
+9. Prepare an Azure ML Environment for the project:

 For the project's code to run on Azure ML it needs to have an environment
-with the necessary dependencies. Here is it shown how to do this from a
-local Docker build context. Please refer to the
-`Azure ML CLI documentation <https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-environments-v2#create-an-environment>`__
-for more options.
+with the necessary dependencies.
+
+You have 2 options for executing your pipeline in Azure ML
+
+1. Use code upload (default) - more suitable for Data Scientists' experimentation and pipeline development
+2. Use docker image flow (shown in the Quickstart video) - more suitable for MLOps processes with better experiment repeatability guarantees

 Start by executing the following command:

 .. code:: console

@@ -85,38 +107,47 @@
 This command creates a several files, including ``Dockerfile`` and
 ``.dockerignore``. These can be adjusted to match the workflow for
 your project.

-You have 3 options for executing your pipeline in Azure ML
-1. Use code upload (default) - more suitable for Data Scientists' experimentation and pipeline development
-2. Use docker image flow (shown in the Quickstart video) - more suitable for MLOps processes with better experiment repeatability guarantees
-3. Use docker flow with Azure ML CLI - suitable for workflows where docker is not available on the machine (Azure ML builds the image in this case)
-
 Depending on whether you want to use code upload when submitting an
 experiment or not, you would need to add the code and any possible input
 data to the Docker image.

-8.1. **If using code upload (default)**
+9.1. **If using code upload** (default)

 Everything apart from the section "install project requirements"
 can be removed from the ``Dockerfile``. This plugin automatically creates empty ``.amlignore`` file (`see the official docs <https://learn.microsoft.com/en-us/azure/machine-learning/how-to-save-write-experiment-files#storage-limits-of-experiment-snapshots>`__)
 which means that all of the files (including potentially sensitive ones!) will be uploaded to Azure ML. Modify this file if needed.

-Set ``code_directory: "."`` in the ``azureml.yml`` config file.
+Ensure ``code_directory: "."`` is set in the ``azureml.yml`` config file (it's set by default).

+\Build the image:
+
+.. code:: console
+
+   kedro docker build --docker-args "--build-arg=BASE_IMAGE=python:3.9" --image=<acr repo name>.azurecr.io/kedro-base-image:latest
+
+\Login to ACR and push the image:
+
+.. warning::
+
+   | Make sure that you have the latest version of Azure CLI before running this command.
+   | We observed some issues with the command behaviour, so make sure that you have
+   | `azure-cli` >= 2.42.0 and `ml` extension >= 2.11.0.
+   | You can check installed versions by running `az --version`.
+
+.. code:: console
+
+   az acr login --name <acr repo name>
+   docker push <acr repo name>.azurecr.io/kedro-base-image:latest
+
-\Run the command:
+\Register the Azure ML Environment:

 .. code:: console

-   az ml environment create --name <environment-name> --version <version> --build-context . --dockerfile-path Dockerfile
+   az ml environment create --name <environment-name> --image <acr repo name>.azurecr.io/kedro-base-image:latest
 \
+Now you can re-use this environment and run the pipeline without the need to build the docker image again (unless you add some dependencies to your environment, obviously :-) ).

-8.2. **If using docker image flow**
+9.2. **If using docker image flow** (shown in the Quickstart video)
+
+.. note::
+
+   | Note that using docker image flow means that every time you change your pipeline's code,
+   | you will need to build and push the docker image to ACR again.
+   | We recommend this option for CI/CD-automated MLOps workflows.

 Ensure that in the ``azureml.yml`` you have ``code_directory`` set to null, and ``docker.image`` is filled:

@@ -132,55 +163,24 @@
 Keep the sections in the ``Dockerfile`` and adjust the ``.dockerignore``
 file to include any other files to be added to the Docker image,
 such as ``!data/01_raw`` for the raw data files.

-Invoke docker build
+Invoke docker build:

 .. code:: console

    kedro docker build --docker-args "--build-arg=BASE_IMAGE=python:3.9" --image=<image tag from conf/base/azureml.yml>

-Once finished, push the image:
+\Once finished, login to ACR:

 .. code:: console

-   docker push <image tag from conf/base/azureml.yml>
+   az acr login --name <acr repo name>

-(you will need to authorize to the ACR first, e.g. by
-``az acr login --name <acr repo name>`` ).
+\and push the image:

-8.3. **If using docker flow with Azure ML CLI**
-
-In this flow, the docker image will be built in the Azure, not locally.
-Keep the sections in the ``Dockerfile`` and adjust the ``.dockerignore``
-file to include any other files to be added to the Docker image,
-such as ``!data/01_raw`` for the raw data files.
-
 .. code:: console

-   az ml environment create --name <environment-name> --version <version> --build-context . --dockerfile-path Dockerfile
-\
+   docker push <image tag from conf/base/azureml.yml>

-9. Adjust the Data Catalog - the default one stores all data locally,
-   whereas the plugin will automatically use Azure Blob Storage. Only
-   input data is required to be read locally. Final
-   ``conf/base/catalog.yml`` should look like this:
-
-.. code:: yaml
-
-   companies:
-     type: pandas.CSVDataSet
-     filepath: data/01_raw/companies.csv
-     layer: raw
-
-   reviews:
-     type: pandas.CSVDataSet
-     filepath: data/01_raw/reviews.csv
-     layer: raw
-
-   shuttles:
-     type: pandas.ExcelDataSet
-     filepath: data/01_raw/shuttles.xlsx
-     layer: raw
-
 10. Run the pipeline on Azure ML Pipelines. Here, the *Azure Subscription ID* and *Storage Account Key* will be used:
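The quickstart above distinguishes two execution flows driven by ``azureml.yml``: code upload when ``code_directory`` is set, docker image flow when it is null and ``docker.image`` is filled. A minimal sketch of that decision, where a flat dict stands in for the parsed config (the plugin's real config layout and loader differ — this is illustrative only):

```python
from typing import Any, Dict

def select_flow(config: Dict[str, Any]) -> str:
    """Pick the execution flow described in the quickstart from (hypothetical) parsed azureml.yml keys."""
    if config.get("code_directory") is not None:
        # 9.1: code upload - fast, iterative development, no image rebuild per change
        return "code-upload"
    if config.get("docker", {}).get("image"):
        # 9.2: docker image flow - repeatable MLOps workflow; rebuild + push on each code change
        return "docker-image"
    raise ValueError("set either code_directory or docker.image in azureml.yml")

print(select_flow({"code_directory": "."}))  # code-upload (the 0.3.1 default)
print(select_flow({"code_directory": None,
                   "docker": {"image": "myacr.azurecr.io/proj:latest"}}))  # docker-image
```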
2 changes: 1 addition & 1 deletion kedro_azureml/__init__.py

@@ -1,4 +1,4 @@
-__version__ = "0.3.0"
+__version__ = "0.3.1"

 import warnings
2 changes: 1 addition & 1 deletion kedro_azureml/config.py

@@ -72,7 +72,7 @@ class KedroAzureRunnerConfig(BaseModel):
   # Azure ML Environment to use during pipeline execution
   environment_name: "{environment_name}"
   # Path to directory to upload, or null to disable code upload
-  code_directory: null
+  code_directory: "."
   # Path to the directory in the Docker image to run the code from
   # Ignored when code_directory is set
   working_directory: /home/kedro
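This one-line default change is the "make code upload as default" fix from the changelog. A dataclass sketch of the affected settings (field names taken from the YAML template in the diff; the real config in ``kedro_azureml/config.py`` is a pydantic model with more fields, so treat this as a hypothetical stand-in):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AzureCodeSettings:
    """Hypothetical stand-in for the settings this commit touches."""
    environment_name: str
    code_directory: Optional[str] = "."      # was null before 0.3.1; "." now uploads the project root
    working_directory: str = "/home/kedro"   # ignored when code_directory is set

cfg = AzureCodeSettings(environment_name="kedro-env@latest")
print(cfg.code_directory)  # .
```

Setting ``code_directory`` back to ``None`` explicitly restores the pre-0.3.1 docker-image behaviour.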
2 changes: 1 addition & 1 deletion pyproject.toml

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "kedro-azureml"
-version = "0.3.0"
+version = "0.3.1"
 description = "Kedro plugin with Azure ML Pipelines support"
 readme = "README.md"
 authors = ['marcin.zablocki <marcin.zablocki@getindata.com>']
2 changes: 1 addition & 1 deletion sonar-project.properties

@@ -6,7 +6,7 @@ sonar.tests=tests/
 sonar.python.coverage.reportPaths=coverage.xml
 sonar.python.version=3.9

-sonar.projectVersion=0.3.0
+sonar.projectVersion=0.3.1
 sonar.projectDescription=Kedro plugin with Azure ML Pipelines support
 sonar.links.homepage=https://kedro-azureml.readthedocs.io/
 sonar.links.ci=https://github.com/getindata/kedro-azureml/actions
