STYLE: Remove trailing whitespaces #346

Merged: 2 commits, May 18, 2022
6 changes: 3 additions & 3 deletions .devcontainer/devcontainer.json
@@ -2,7 +2,7 @@
// https://github.com/microsoft/vscode-dev-containers/tree/v0.187.0/containers/python-3-miniconda
{
"name": "hi-ml",
"build": {
"build": {
"context": "..",
"dockerfile": "Dockerfile",
"args": {
@@ -12,7 +12,7 @@
},

// Set *default* container specific settings.json values on container create.
"settings": {
"settings": {
"python.pythonPath": "/opt/conda/bin/python",
"python.languageServer": "Pylance",
"python.linting.enabled": true,
@@ -42,7 +42,7 @@

// Comment out connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
// "remoteUser": "vscode"

// Extra settings to start the docker container in order to use libfuse, required for locally mounting datasets.
// More info: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.filedataset?view=azure-ml-py#mount-mount-point-none----kwargs-
"runArgs": [
2 changes: 1 addition & 1 deletion .devcontainer/noop.txt
@@ -1,3 +1,3 @@
This file is copied into the container along with environment.yml* from the
parent folder. This is done to prevent the Dockerfile COPY instruction from
failing if no environment.yml is found.
2 changes: 1 addition & 1 deletion .github/workflows/check-pr-title.yml
@@ -1,5 +1,5 @@
name: 'Check PR Title'
on:
pull_request:
types: [edited, opened, synchronize, reopened]

4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -85,7 +85,7 @@ class Foo:
This is the class description.

The following block will be pretty-printed by Sphinx. Note the space between >>> and the code!

Usage example:
>>> from module import Foo
>>> foo = Foo(bar=1.23)
@@ -107,7 +107,7 @@ class Foo:
if enclosed in double backtick.

This method can raise a :exc:`ValueError`.

:param arg: This is a description for the method argument.
Long descriptions should be indented.
"""
2 changes: 1 addition & 1 deletion SECURITY.md
@@ -14,7 +14,7 @@ Instead, please report them to the Microsoft Security Response Center (MSRC) at

If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).

You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).

Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:

30 changes: 15 additions & 15 deletions docs/source/authentication.md
@@ -2,14 +2,14 @@

## Authentication

The `hi-ml` package supports two ways of authenticating with Azure.
The default is what is called "Interactive Authentication". When you submit a job to Azure via `hi-ml`, this will
use the credentials you used in the browser when last logging into Azure. If there are no credentials yet, you should
see instructions printed out to the console about how to log in using your browser.

We recommend using Interactive Authentication.

Alternatively, you can use a so-called Service Principal, for example within build pipelines.


## Service Principal Authentication
@@ -19,7 +19,7 @@ training runs from code, for example from within an Azure pipeline. You can find
[here](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals).

If you would like to use a Service Principal, you will need to create it in Azure first, and then store 3 pieces
of information in 3 environment variables (see the instructions below). When all 3 environment variables are in place,
your Azure submissions will automatically use the Service Principal to authenticate.


@@ -29,28 +29,28 @@ your Azure submissions will automatically use the Service Principal to authentic
1. Navigate to `App registrations` (use the top search bar to find it).
1. Click on `+ New registration` on the top left of the page.
1. Choose a name for your application, e.g. `MyServicePrincipal`, and click `Register`.
1. Once it is created you will see your application in the list appearing under `App registrations`. This step might take
a few minutes.
1. Click on the resource to access its properties. In particular, you will need the application ID.
You can find this ID in the `Overview` tab (accessible from the list on the left of the page).
1. Create an environment variable called `HIML_SERVICE_PRINCIPAL_ID`, and set its value to the application ID you
just saw.
1. You need to create an application secret to access the resources managed by this service principal.
On the pane on the left, find `Certificates & Secrets`. Click on `+ New client secret` (bottom of the page) and note down your token.
Warning: this token is only displayed once, when it is created; you will not be able to display it again later.
1. Create an environment variable called `HIML_SERVICE_PRINCIPAL_PASSWORD`, and set its value to the token you just
added.
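
A minimal sketch, assuming you prefer to set both variables from Python rather than in your shell (the values shown are placeholders, not real credentials):

```python
import os

# Placeholder values: substitute the application ID and the client secret
# noted down when registering the Service Principal.
os.environ["HIML_SERVICE_PRINCIPAL_ID"] = "00000000-0000-0000-0000-000000000000"
os.environ["HIML_SERVICE_PRINCIPAL_PASSWORD"] = "<your-client-secret>"
```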

### Providing permissions to the Service Principal
Now that your service principal is created, you need to give permission for it to access and manage your AzureML workspace.
To do so:
1. Go to your AzureML workspace. To find it you can type the name of your workspace in the search bar above.
1. On the `Overview` page, there is a link to the Resource Group that contains the workspace. Click on that.
1. When on the Resource Group, navigate to `Access control`. Then click on `+ Add` > `Add role assignment`. A pane will appear on
the right. Select `Role > Contributor`. In the `Select` field, type the name
of your Service Principal and select it. Finish by clicking `Save` at the bottom of the pane.


### Azure Tenant ID
The last remaining piece is the Azure tenant ID, which also needs to be available in an environment variable. To get
that ID:
28 changes: 14 additions & 14 deletions docs/source/commandline_tools.md
@@ -6,8 +6,8 @@ From the command line, run the command

```himl-tb```

specifying one of
`[--experiment] [--latest_run_file] [--run]`

This will start a TensorBoard session, by default running on port 6006. To use an alternative port, specify this with `--port`.

@@ -21,16 +21,16 @@ If you choose to specify `--experiment`, you can also specify `--num_runs` to vi

If your AML config path is not `ROOT_DIR/config.json`, you must also specify `--config_file`.
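
For example, a hypothetical invocation that shows the two most recent runs of an experiment on a non-default port (the experiment name is a placeholder):

```
himl-tb --experiment my_experiment --num_runs 2 --port 6007
```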

To see an example of how to create TensorBoard logs using PyTorch on AML, see the
[AML submitting script](examples/9/aml_sample.rst) which submits the following [pytorch sample script](examples/9/pytorch_sample.rst). Note that to run this, you'll need to create an environment with at least pytorch and tensorboard as dependencies. See an [example conda environment](examples/9/tensorboard_env.rst). This will create an experiment named 'tensorboard_test' on your Workspace, with a single run. Go to outputs + logs -> outputs to see the tensorboard events file.

## Download files from AML Runs

From the command line, run the command

```himl-download```

specifying one of
`[--experiment] [--latest_run_file] [--run]`

If `--experiment` is provided, the most recent Run from this experiment will be downloaded.
If `--latest_run_file` is provided, the script will expect to find a RunId in this file.
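
For example, a hypothetical invocation that downloads the most recent run of an experiment (the experiment name is a placeholder):

```
himl-download --experiment my_experiment
```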
@@ -46,29 +46,29 @@ If your AML config path is not `ROOT_DIR/config.json`, you must also specify `--
## Creating your own command line tools

When creating your own command line tools that interact with the Azure ML ecosystem, you may wish to use the
`AmlRunScriptConfig` class for argument parsing. This gives you a quick way to accept command line arguments that
specify the following:

- experiment: a string representing the name of an Experiment, from which to retrieve AML runs
- tags: to filter the runs within the given experiment
- num_runs: to define the number of most recent runs to return from the experiment
- run: to instead define one or more run ids from which to retrieve runs (also supports the older format of run recovery ids, although these are now obsolete)
- latest_run_file: to instead provide a path to a file containing the id of your latest run, for retrieval.
- config_path: to specify a config.json file in which your workspace settings are defined

You can extend this list of arguments by creating a child class that inherits from `AmlRunScriptConfig`, as sketched below.
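
A minimal sketch of such a child class, assuming `AmlRunScriptConfig` can be imported from `health_azure` and follows the `param`-based declaration style used elsewhere in this package (both details are assumptions):

```python
import param

from health_azure import AmlRunScriptConfig  # import path assumed


class DownloadScriptConfig(AmlRunScriptConfig):
    # Hypothetical extra argument: the folder into which files are downloaded.
    output_folder = param.String(default="outputs", doc="Folder to download files into")
```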

### Defining your own argument types

Additional arguments can have any of the following types: `bool`, `integer`, `float`, `string`, `list`, `class/class instance`,
with no additional work required. You can also define your own custom type by providing a custom class in your code that
inherits from `CustomTypeParam`. It must define 2 methods:
1. `_validate(self, x: Any)`: which should raise a `ValueError` if x is not of the type you expect, and should also make a call to
`super()._validate(val)`
2. `from_string(self, y: string)` which takes in the command line arg as a string (`y`) and returns an instance of the type
that you want. For example, if your custom type is a tuple, this method should create a tuple from the input string and return that.
An example of a custom type can be seen in our own `RunIdOrListParam`, which accepts a string representing one or more
run ids (or run recovery ids) and returns either a List or a single RunId object (or a RunRecoveryId object where appropriate).

### Example:
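
A minimal sketch of a custom type, assuming `CustomTypeParam` can be imported from `health_azure.utils` (import path assumed) and that subclasses implement the two methods described above:

```python
from typing import Any

from health_azure.utils import CustomTypeParam  # import path assumed


class EvenIntParam(CustomTypeParam):
    """Hypothetical custom type that accepts only even integers."""

    def _validate(self, val: Any) -> None:
        if not isinstance(val, int) or val % 2 != 0:
            raise ValueError(f"{val} is not an even integer")
        super()._validate(val)

    def from_string(self, x: str) -> int:
        # Convert the raw command line string into the target type.
        return int(x)
```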

30 changes: 15 additions & 15 deletions docs/source/datasets.md
@@ -11,12 +11,12 @@ to one dataset.


### AzureML Data Stores
Secondly, there are data stores. This is a concept coming from Azure Machine Learning, described
[here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data). Data stores provide access to
one blob storage account. They exist so that the credentials to access blob storage do not have to be passed around
in the code - rather, the credentials are stored in the data store once and for all.

You can view all data stores in your AzureML workspace by clicking on one of the bottom icons in the left-hand
navigation bar of the AzureML studio.

One of these data stores is designated as the default data store.
@@ -27,11 +27,11 @@ Thirdly, there are datasets. Again, this is a concept coming from Azure Machine
* A data store
* A set of files accessed through that data store

You can view all datasets in your AzureML workspace by clicking on one of the icons in the left-hand
navigation bar of the AzureML studio.

### Preparing data
To simplify usage, the `hi-ml` package creates AzureML datasets for you. All you need to do is to
* Create a blob storage account for your data, and within it, a container for your data.
* Create a data store that points to that storage account, and store the credentials for the blob storage account in it.

@@ -54,7 +54,7 @@ What will happen under the hood?
is no dataset of that name, it will create one from all the files in blob storage in folder "my_folder". The dataset
will be created using the data store provided, "my_datastore".
* Once the script runs in AzureML, it will download the dataset "my_folder" to a temporary folder.
* You can access this temporary location by `run_info.input_datasets[0]`, and read the files from it.
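
Putting the bullet points above together, a minimal sketch of the submission call, assuming the datastore is supplied via a `default_datastore` argument (the dataset and datastore names are the placeholders used above; other arguments are elided):

```python
from health_azure import submit_to_azure_if_needed

run_info = submit_to_azure_if_needed(...,  # other submission arguments elided
                                     input_datasets=["my_folder"],
                                     default_datastore="my_datastore")
# Inside the AzureML run, this points to the temporary folder holding the dataset files.
input_folder = run_info.input_datasets[0]
```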

More complicated setups are described below.

@@ -77,19 +77,19 @@ output_folder = run_info.output_datasets[0]
Your script can now read files from `input_folder`, transform them, and write them to `output_folder`. The latter
will be a folder on the temp file system of the machine. At the end of the script, the contents of that temp folder
will be uploaded to blob storage, and registered as a dataset.

### Mounting and downloading
An input dataset can be downloaded before the start of the actual script run, or it can be mounted. When mounted,
the files are accessed via the network when needed - this is very helpful for large datasets, where downloading would
create a long wait before the job starts.

Similarly, an output dataset can be uploaded at the end of the script, or it can be mounted. Mounting here means that
all files will be written to blob storage while the script runs (rather than at the end).

Note: If you are using mounted output datasets, you should NOT rename files in the output folder.

Mounting and downloading can be triggered by passing in `DatasetConfig` objects for the `input_datasets` argument,
like this:

@@ -105,14 +105,14 @@ output_folder = run_info.output_datasets[0]

### Local execution
For debugging, it is essential to have the ability to run a script on a local machine, outside of AzureML.
Clearly, your script needs to be able to access data in those runs too.

There are two ways of achieving that: Firstly, you can specify an equivalent local folder in the
`DatasetConfig` objects:
```python
from pathlib import Path
from health_azure import DatasetConfig, submit_to_azure_if_needed
input_dataset = DatasetConfig(name="my_folder",
datastore="my_datastore",
local_folder=Path("/datasets/my_folder_local"))
run_info = submit_to_azure_if_needed(...,
```

@@ -134,8 +134,8 @@ AzureML has the capability to download/mount a dataset to such a fixed location.
trigger that behaviour via an additional option in the `DatasetConfig` objects:
```python
from health_azure import DatasetConfig, submit_to_azure_if_needed
input_dataset = DatasetConfig(name="my_folder",
datastore="my_datastore",
use_mounting=True,
target_folder="/tmp/mnist")
run_info = submit_to_azure_if_needed(...,
```

@@ -147,12 +147,12 @@ input_folder = run_info.input_datasets[0]
This is also true when running locally - if `local_folder` is not specified and an AzureML workspace can be found, then the dataset will be downloaded or mounted to the `target_folder`.

### Dataset versions
AzureML datasets can have versions, starting at 1. You can view the different versions of a dataset in the AzureML
workspace. In the `hi-ml` toolbox, you would always use the latest version of a dataset unless specified otherwise.
If you do need a specific version, use the `version` argument in the `DatasetConfig` objects:
```python
from health_azure import DatasetConfig, submit_to_azure_if_needed
input_dataset = DatasetConfig(name="my_folder",
datastore="my_datastore",
version=7)
run_info = submit_to_azure_if_needed(...,
```