Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Azbatch executor #1953

Merged
merged 106 commits into from May 4, 2023
Merged

Conversation

jakevc
Copy link
Contributor

@jakevc jakevc commented Nov 4, 2022

Description

Finish remaining issues with the original implementation of the Azure Batch Executor originally tackled by @andreas-wilm in #533.

  • simplified command line options
  • batch configuration as a class with sensible defaults, or environment variable overrides.
  • made use of existing AzBlob helper where possible.
  • updates dockerfile with necessary deps
  • container jakevc/snakemake:latest contains these changes for testing

Toyed with the idea of having more configuration from command line options, but this litters the codebase with redundant code (each command line option needs to be repeated a few times in init.py, scheduler.py, workflow.py, azure_batch.py). Env Variables seem appropriate with sensible defaults.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Example

Example invocation:

export AZ_BLOB_ACCOUNT_URL=*********
export AZ_BATCH_ACCOUNT_KEY=*********

snakemake \
    --jobs 3 \
    -rpf --verbose --default-remote-prefix snaketest \
    --use-conda \
    --default-remote-provider AzBlob \
    --envvars AZ_BLOB_ACCOUNT_URL \
    --az-batch \
    --container-image jakevc/snakemake \
    --az-batch-account-url **********

jakevc and others added 28 commits October 21, 2022 19:02
### Description

<!--Add envmodules keyword to vim syntax-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [ ] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).
…nakemake#1921)

### Description

This just removes a single period in the documentation section about the
multiext function, so that this, with a redundant period:

```
expand("some/plot.{ext}", ext=[".pdf", ".svg", ".png"])
```

becomes this, matching the expected output:

```
expand("some/plot{ext}", ext=[".pdf", ".svg", ".png"])
```

Fixes snakemake#1012.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [X] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [X] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).
🤖 I have created a release *beep* *boop*
---


##
[7.16.2](snakemake/snakemake@v7.16.1...v7.16.2)
(2022-10-26)


### Bug Fixes

* fix false rerun triggering downstream of checkpoints due to spurious
parameter, code or software env changes
([638ea86](snakemake@638ea86))
* remove redundant dot in expand call in multiext documentation
([snakemake#1921](snakemake#1921))
([278beaa](snakemake@278beaa))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…e software envs from the caching hash value, which can be handy e.g. for download rules where the software version does not affect the result) (snakemake#1933)

### Description

<!--Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).
### Description

The value of `os.pathconf(self._incomplete_path, "PC_NAME_MAX")` is
cached in an attribute `Persistence._max_len` -- this avoids repeatedly
querying the operating system for the same configuration information
every time `marked_incomplete()` is called on a file/job output. (Note:
this will dramatically speed up Snakemake DAG building on compute
clusters where `os.pathconf()` is slow to return and the number of files
checked for incompleteness is very large.)

### QC
<!-- Make sure that you can tick the boxes below. -->

* [X] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [X] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).
🤖 I have created a release *beep* *boop*
---


##
[7.17.0](snakemake/snakemake@v7.16.2...v7.17.0)
(2022-10-27)


### Features

* allow to define the cache mode per rule (this enables to exclude
software envs from the caching hash value, which can be handy e.g. for
download rules where the software version does not affect the result)
([snakemake#1933](snakemake#1933))
([715e618](snakemake@715e618))


### Performance Improvements

* cached os.pathconf() call in _record_path()
([snakemake#1920](snakemake#1920))
([551badb](snakemake@551badb))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
… structure, such that imports from e.g. scripts also work with remote modules (if specified as additional input files with workflow.source_path) (snakemake#1936)

### Description

<!--Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).
🤖 I have created a release *beep* *boop*
---


##
[7.17.1](snakemake/snakemake@v7.17.0...v7.17.1)
(2022-10-28)


### Bug Fixes

* change source cache entries to keep the original name and folder
structure, such that imports from e.g. scripts also work with remote
modules (if specified as additional input files with
workflow.source_path)
([snakemake#1936](snakemake#1936))
([c34f3f6](snakemake@c34f3f6))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…s. This allows to import from Python modules next to scripts while deploying the workflow as a snakemake module, even from remote locations. (snakemake#1940)

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).
🤖 I have created a release *beep* *boop*
---


##
[7.17.2](snakemake/snakemake@v7.17.1...v7.17.2)
(2022-10-28)


### Bug Fixes

* Consider source cache when setting search path for python scripts.
This allows to import from Python modules next to scripts while
deploying the workflow as a snakemake module, even from remote
locations. ([snakemake#1940](snakemake#1940))
([27be1d4](snakemake@27be1d4))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…sting wildcard values from the consuming job. This can dramatically reduce ambiuity problems. Thanks to @descostesn! (snakemake#1939)

Only fall back to generic regex based matching if the wildcard based
approach does not work. This is inspired by a chat with @descostesn.

<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).
🤖 I have created a release *beep* *boop*
---


##
[7.18.0](snakemake/snakemake@v7.17.2...v7.18.0)
(2022-10-31)


### Features

* first try to match output files against input files while persisting
wildcard values from the consuming job. This can dramatically reduce
ambiuity problems. Thanks to
[@descostesn](https://github.com/descostesn)!
([snakemake#1939](snakemake#1939))
([d093907](snakemake@d093907))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…o 2.3.0 (snakemake#1945)

Bumps
[marocchino/sticky-pull-request-comment](https://github.com/marocchino/sticky-pull-request-comment)
from 2.2.1 to 2.3.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/marocchino/sticky-pull-request-comment/releases">marocchino/sticky-pull-request-comment's
releases</a>.</em></p>
<blockquote>
<h2>v2.3.0</h2>
<ul>
<li>Support glob path</li>
<li>Add <code>follow_symbolic_links</code> for path</li>
<li>Add <code>ignore_empty</code> for skip empty body</li>
<li>Update README for new output syntax</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/adca94abcaf73c10466a71cc83ae561fd66d1a56"><code>adca94a</code></a>
Merge pull request <a
href="https://github-redirect.dependabot.com/marocchino/sticky-pull-request-comment/issues/799">#799</a>
from marocchino/dependabot/npm_and_yarn/octokit/graph...</li>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/af76947473de144ee8fd8a8a2a0e5b2f392f5626"><code>af76947</code></a>
build(deps): Bump <code>@​octokit/graphql-schema</code> from 12.11.0 to
12.21.0</li>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/e5883ea53c71dc8e52b346fd5c883eb1ea63ece7"><code>e5883ea</code></a>
🔀 Merge pull request <a
href="https://github-redirect.dependabot.com/marocchino/sticky-pull-request-comment/issues/782">#782</a>
from marocchino/dependabot/npm_and_yarn/eslint-plug...</li>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/2520dca5b1f09dab5376fc18715ff64dab2ceebb"><code>2520dca</code></a>
🔀 Merge pull request <a
href="https://github-redirect.dependabot.com/marocchino/sticky-pull-request-comment/issues/790">#790</a>
from marocchino/dependabot/npm_and_yarn/typescript-...</li>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/f4ae94070a1b2cc47b7a126d9271c48c3dd7e803"><code>f4ae940</code></a>
🔀 Merge pull request <a
href="https://github-redirect.dependabot.com/marocchino/sticky-pull-request-comment/issues/793">#793</a>
from marocchino/dependabot/npm_and_yarn/jest-circus...</li>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/97e16fd48982b5a81b0ef88b530b54e8995e27c4"><code>97e16fd</code></a>
Merge pull request <a
href="https://github-redirect.dependabot.com/marocchino/sticky-pull-request-comment/issues/792">#792</a>
from samkio/feature/glob</li>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/da2f89048f71a00bda8f6df93622691adf401be3"><code>da2f890</code></a>
build(deps-dev): Bump eslint-plugin-jest from 27.1.2 to 27.1.3</li>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/ba82cbc35c1c35bab4a3bca541920a5d9e4d6f60"><code>ba82cbc</code></a>
build(deps-dev): Bump <code>@​typescript-eslint/parser</code> from
5.40.1 to 5.41.0</li>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/b8644f1ac559a999648c96168f58c769bb1af49a"><code>b8644f1</code></a>
🔀 Merge pull request <a
href="https://github-redirect.dependabot.com/marocchino/sticky-pull-request-comment/issues/789">#789</a>
from marocchino/dependabot/npm_and_yarn/eslint-8.26.0</li>
<li><a
href="https://github.com/marocchino/sticky-pull-request-comment/commit/95c224a6ab058ad5a70e45bb6d16f8b221078bfb"><code>95c224a</code></a>
🔀 Merge pull request <a
href="https://github-redirect.dependabot.com/marocchino/sticky-pull-request-comment/issues/797">#797</a>
from marocchino/dependabot/npm_and_yarn/types/node-...</li>
<li>Additional commits viewable in <a
href="https://github.com/marocchino/sticky-pull-request-comment/compare/v2.2.1...v2.3.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=marocchino/sticky-pull-request-comment&package-manager=github_actions&previous-version=2.2.1&new-version=2.3.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@jakevc jakevc changed the title Azbatch executor feat: Azbatch executor Nov 4, 2022
@jakevc
Copy link
Contributor Author

jakevc commented Apr 27, 2023

This is the format of the secrets I am using when I run the test locally and it passes:

AZ_BLOB_PREFIX=snaketest
AZ_BLOB_ACCOUNT_URL=https://***********.blob.core.windows.net/
AZ_BATCH_ACCOUNT_URL=https://**********.westus2.batch.azure.com
AZ_BLOB_CREDENTIAL=$SAS
AZ_BATCH_ACCOUNT_KEY=$KEY

@johanneskoester
Copy link
Contributor

No clue, I just checked it again. The URL for the storage account seems follow the same pattern as you suggest.

@johanneskoester
Copy link
Contributor

Maybe you can add some parsing of those env vars to the remote provider and the executor, which gives a proper error message if something is invalid (without printing any secrets of course).

@vsmalladi
Copy link

I wonder if there is some special characters that aren't escaped correctly.

@jakevc can you try dummy variables and print those to stdout and see if there is something odd with the variables.

@vsmalladi
Copy link

@jakevc can you do this

      AZ_BLOB_PREFIX: '$${{ secrets.AZ_BLOB_PREFIX }}'
      AZ_BLOB_ACCOUNT_URL: '${{ secrets.AZ_STORAGE_ACCOUNT_URL }}'
      AZ_BLOB_CREDENTIAL: '${{ secrets.AZ_STORAGE_KEY }}'
      AZ_BATCH_ACCOUNT_URL: '${{ secrets.AZ_BATCH_ACCOUNT_URL }}'
      AZ_BATCH_ACCOUNT_KEY: '${{ secrets.AZ_BATCH_KEY }}'

@jakevc
Copy link
Contributor Author

jakevc commented Apr 28, 2023

Face palm: of course this is the issue. The funny thing is on my development environment zsh doesn't even let me set the blob credential without quoting it.

@jakevc
Copy link
Contributor Author

jakevc commented Apr 28, 2023

or not...

Still failing with the same error from the BlobServiceClient initialization:
ValueError: Invalid URL: https://

To me that looks like thee AZ_BLOB_ACCOUNT_URL is empty. I just tested that I get the same error when I specify an empty value for the AZ_BLOB_ACCOUNT_URL locally.

@johanneskoester is it correct that this is the name of the secrets? At one point it was, maybe you changed them at some point to match the environment variable names?

secrets.AZ_BLOB_PREFIX
secrets.AZ_STORAGE_ACCOUNT_URL
secrets.AZ_STORAGE_KEY
secrets.AZ_BATCH_ACCOUNT_URL
secrets.AZ_BATCH_KEY

@jakevc
Copy link
Contributor Author

jakevc commented May 1, 2023

Okay I'm not crazy.. I slightly more specific error and now we are failing with:

raise ValueError("Blob Account URL is None or empty string")
[220](https://github.com/snakemake/snakemake/actions/runs/4848828641/jobs/8640274478#step:20:221)
ValueError: Blob Account URL is None or empty string

@sonarcloud
Copy link

sonarcloud bot commented May 3, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
1.1% 1.1% Duplication

@johanneskoester johanneskoester changed the base branch from main to azbatch-test May 4, 2023 18:44
@johanneskoester
Copy link
Contributor

Understood the problem now! The issue is that as the PR comes from a fork, it does not have access to secrets.

@johanneskoester johanneskoester merged commit a7f2124 into snakemake:azbatch-test May 4, 2023
8 of 9 checks passed
@johanneskoester
Copy link
Contributor

I've merged your changes in a snakemake-owned branch and created a PR for merging into main here: #2246
Let us go on from there. As you are anyway already part of the Snakemake org, you should in the future work directly in our repo in order to ensure that your PRs always have the secrets.

bool: True if the environment variable is a valid Azure Storage Account URL, False otherwise.
"""
url_pattern = re.compile(
r"^https:\/\/[a-z0-9]+(\.[a-z0-9]+)*\.blob\.core\.windows\.net\/?$"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jakevc this url pattern fails when the sas token is added should be

r"^https:\/\/[a-z0-9]+(\.[a-z0-9]+)*\.blob\.core\.windows\.net\/(\?.+)?$"

johanneskoester added a commit that referenced this pull request Jun 11, 2023
### Description

Finish remaining issues with the original implementation of the Azure
Batch Executor originally tackled by @andreas-wilm in #533.

- simplified command line options
- batch configuration as a class with sensible defaults, or environment
variable overrides.
- made use of existing AzBlob helper where possible.
- updates dockerfile with necessary deps
- container jakevc/snakemake:latest contains these changes for testing

Toyed with the idea of having more configuration from command line
options, but this litters the codebase with redundant code (each command
line option needs to be repeated a few times in __init__.py,
scheduler.py, workflow.py, azure_batch.py). Env Variables seem
appropriate with sensible defaults.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


### Example

Example invocation:

```bash
export AZ_BLOB_ACCOUNT_URL=*********
export AZ_BATCH_ACCOUNT_KEY=*********

snakemake \
    --jobs 3 \
    -rpf --verbose --default-remote-prefix snaketest \
    --use-conda \
    --default-remote-provider AzBlob \
    --envvars AZ_BLOB_ACCOUNT_URL \
    --az-batch \
    --container-image jakevc/snakemake \
    --az-batch-account-url **********
```

---------

### Description

<!--Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jake VanCampen <jake.vancampen7@gmail.com>
Co-authored-by: Emmanouil "Manolis" Maragkakis <em.maragkakis@gmail.com>
Co-authored-by: Jesse Connell <jesse08@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: roshnipatel <roshnipatel@berkeley.edu>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mike Taves <mwtoews@gmail.com>
Co-authored-by: Rich Abdill <rabdill@users.noreply.github.com>
Co-authored-by: Koen van Greevenbroek <74298901+koen-vg@users.noreply.github.com>
Co-authored-by: Chang Y <yech1990@gmail.com>
johanneskoester added a commit that referenced this pull request Jun 11, 2023
🤖 I have created a release *beep* *boop*
---


##
[7.28.0](v7.27.0...v7.28.0)
(2023-06-11)


### Features

* Added native support for execution via Azure Batch
([#1953](#1953))
([#2246](#2246))
([0f9c49f](0f9c49f))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants