Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The phylogenetic github action cannot build from staged ingest data during dev-branch testing #49

Closed
j23414 opened this issue May 19, 2024 · 0 comments · Fixed by #50
Closed
Assignees
Labels
bug Something isn't working

Comments

@j23414
Copy link
Contributor

j23414 commented May 19, 2024

Current Behavior

The phylogenetic GitHub action (see this run) ignored the provided sequence and metadata URL configurations:

env:
    TRIAL_NAME: serogeno20240518
    SEQUENCES_URL: s3://nextstrain-data/files/workflows/dengue/trials/serogeno20240518/sequences_all.fasta.zst
    METADATA_URL: s3://nextstrain-data/files/workflows/dengue/trials/serogeno20240518/metadata_all.tsv.zst

These input fields are specified in the .github/workflows file:

sequences_url:
description: |
URL for a sequences.fasta.zst file.
If not provided, will use default sequences_url from phylogenetic/config/config_dengue.yaml
required: false
type: string
metadata_url:
description: |
URL for a metadata.tsv.zst file.
If not provided, will use default metadata_url from phylogenetic/config/config_dengue.yaml

However, they are not being used in the phylogenetic rule:

sequences_url = "https://data.nextstrain.org/files/workflows/dengue/sequences_{serotype}.fasta.zst",
metadata_url = "https://data.nextstrain.org/files/workflows/dengue/metadata_{serotype}.tsv.zst"

Expected Behavior

The phylogenetic GitHub action should accept sequences and metadata from specified URLs, especially when testing different features on dev branches. These URL datasets are often generated by the ingest GitHub action and should be a configurable-optional-input dataset during feature testing.

Possible Solution(s)

Consider implementing changes similar to the Zika repository, but with the addition of allowing for serotype expansion ( all, denv1, denv2, denv3, denv4).

Hopefully, then we could provide a config similar to:

env:
    TRIAL_NAME: serogeno20240518
    SEQUENCES_URL: https://data.nextstrain.org/files/workflows/dengue/trials/serogeno20240518/sequences_{serotype}.fasta.zst
    METADATA_URL: https://data.nextstrain.org/files/workflows/dengue/trials/serogeno20240518/metadata_{serotype}.tsv.zst

Which could be expanded across all serotypes. Otherwise, solving this issue might involve defining multiple sets of SEQUENCES_DENVX_URL and METADATA_DENVX_URL fields, which would be tedious during testing dev-branches. Alternatively, consider splitting phylogenetic into separate workflows (or workflow calls from a main workflow) for each serotype (phylogenetic_denv1 to phylogenetic_denv4). Open to discussion or suggestions.

@j23414 j23414 added the bug Something isn't working label May 19, 2024
@j23414 j23414 linked a pull request May 19, 2024 that will close this issue
2 tasks
@j23414 j23414 self-assigned this May 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant