-
Notifications
You must be signed in to change notification settings - Fork 403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pangolin flag #593
Pangolin flag #593
Conversation
Arguably should be going through "combine_metadata" but I can't quite figure out how to do that properly in this context.
Alright, this branch now: I have two main questions (probably for @huddlej or @jameshadfield ?) before I land this:
Would love suggestions on how to implement this metadata addendum (I can then remove the Sidenote: (D) is currently broken, but given that I don't think this is the path we want to go down, putting those questions aside for now.
I'm unfamiliar with the |
this looks mostly good, but the node data output needs be {
"produced_by":"pangolin",
"nodes": {
"tax1": {"pango": "B.34.2323.454"},
"tax2": {"pango": "X.3.3.4"},
...
}
} Your implementation is missing the explicit attribute name. besides, we would need to figure out how this interacts with the Pango lineage assignment we do from GISAID metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sidneymbell I made some specific comments below, but maybe most are not helpful if the goal is to not have a new node data JSON.
Is the most helpful outcome here to add a "Nextstrain-approved" pangolin lineage column to the combined metadata (local + GISAID, for example) prior to subsampling? Or is the goal more specifically for PHLs to reannotate their own metadata file(s) prior to filtering, subsampling, etc.? The specific implementation could be different depending on which of those use cases you need to support.
Thanks so much, @rneher and @huddlej! The goal for us is to enable our partner PHLs to see up-to-date pangolin lineage assignments for both the contextual gisaid data and their local data both (1) on the tree and (2, very impt) in the "download metadata" tsv file from the auspice frontend. Would reimplementing the Re: these assignments vs gisaid's, I'd propose having these pangolin assignments (if run) supercede the gisaid assignments for two reasons:
|
Perfect! It looks like most (maybe all?) node data annotations in the auspice JSON are available through Auspice's metadata download. Since the current workflow mainly flows from raw metadata and sequences toward node data JSONs aggregated into an auspice JSON, the simplest way to achieve the goal is to export the new Pangolin annotations in a node data JSON like you've setup here. You don't have to choose between existing annotations and the re-annotation. For instance, you can provide an argument like Someone from @nextstrain/core can add the |
Thanks for your help, @huddlej ! Recent changes:
Opening full PR for review, looking forward to any additional suggestions you may have. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This worked for me as it is, @sidneymbell, but I made a couple of requests for minor Snakemake surgery and a simpler conda environment file. Let me know if you have any questions. We can also chat online (offline?), if it helps to be more synchronous.
Next, we'll have to figure out the best way to get pangolin and pangolearn in the Nextstrain Docker image, but that can be a separate PR.
As always, thanks for such a detailed, thoughtful and constructive review, @huddlej ! I believe I've implemented all of the suggested changes, and this runs well on my box using the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, @sidneymbell. I had only two small edits:
conda-forge
has to appear first in the channel list of the conda environment- the
run_pangolin
rule needs to write out to the log file with2>&1 | tee {log}
After those changes, we can merge! 🎉
At the last Nextstrain meeting, we discussed adding the pangolin tools to the base Nextstrain Docker image vs. maintaining a separate image for the ncov workflow. The consensus was to add these tools to the base Docker image, so I'll make a separate issue/PR for that change once I figure out how to install pangolin without Conda. :)
Draft implementation for fixing #557.
I need to spend time debugging and testing, but would love any high-level feedback on the approach if you have a minute.
cc @ttung