adding microbiome analysis workflows to IWC with test data #182
Thanks, can you add README, Changelog and dockstore.yml files? (https://github.com/galaxyproject/iwc/blob/main/workflows/README.md#structure-of-the-directory)
@mvdbeek did I miss something else? Thanks a lot for helping me out :)
@wm75 Can you help me revise and merge this PR?
…underscore in its name before
To do, as discussed with @wm75: 1- remove "latest" from workflow names
@bebatut I have added the 5 workflows for the single-sample run and the 4 workflows for the collection-of-samples run, so 9 workflows in total, each with test data. I have chosen the smallest sample data that still contain VFs, contigs, etc. The largest file is 50 MB; the other files are in the byte or kilobyte range.
@EngyNasr @bebatut I don't see much value in offering the single-sample workflows, when collection-based flavors exist that could be run with 1-element collections. Having the single-sample WFs published would just mean more maintenance and synchronization efforts.
@EngyNasr can you please run a JSON reformatter tool over your workflows? Single-line JSON is just not very nice to review and prevents meaningful diffs. Use, e.g.,
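One way to do such reformatting, as a minimal sketch using only the Python standard library (not necessarily the tool suggested in the comment). The function name `reformat_workflow` and the 4-space indent are assumptions made for illustration:

```python
import json
from pathlib import Path


def reformat_workflow(path, indent=4):
    """Rewrite a single-line JSON file as indented, multi-line JSON in place."""
    path = Path(path)
    data = json.loads(path.read_text(encoding="utf-8"))
    # Keys keep the order in which they were loaded, so only whitespace changes.
    text = json.dumps(data, indent=indent, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
    return text
```

Calling `reformat_workflow("my-workflow.ga")` (a hypothetical file name) on each workflow file would then give line-by-line diffs in later commits.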
Some initial comments:
- At least some of your workflows are lacking a `release` attribute, which you need to add manually.
- In the Preprocessing workflow, the conversion of fastq.gz to plain fastq, just for the purpose of filtering reads by their ID, is rather unfriendly for a user's quota.
There is toolshed.g2.bx.psu.edu/repos/iuc/seqtk/seqtk_subseq/1.3.1, which can filter compressed fastqs directly (and which is probably faster in all cases). Its downside is that it only keeps matching IDs, but can't discard them or write both to separate files (like toolshed.g2.bx.psu.edu/repos/peterjc/seq_filter_by_id/seq_filter_by_id/0.2.7).
So if you want the non-host reads, you'll have to invert the action of Filter Tabular at the step before.
If you really also need the host reads as a separate file (which I'm not entirely convinced of), you would have to run Filter Tabular and seqtk subseq twice, but even that might still be better than the current way?
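To illustrate the idea of filtering a compressed FASTQ by read ID without first writing an uncompressed copy, here is a minimal pure-Python sketch. It is not how seqtk subseq or seq_filter_by_id are implemented; the function name `filter_fastq_gz` and its `keep` flag are assumptions for illustration, and it assumes strict 4-line FASTQ records:

```python
import gzip


def filter_fastq_gz(in_path, out_path, ids, keep=True):
    """Stream 4-line FASTQ records from one gzip file to another.

    keep=True writes only reads whose ID is in `ids`;
    keep=False writes only reads whose ID is NOT in `ids`.
    Returns the number of records written.
    """
    ids = set(ids)
    written = 0
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        while True:
            header = src.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # skip stray blank lines
            record = [header, src.readline(), src.readline(), src.readline()]
            # The read ID is the first whitespace-delimited token after '@'.
            read_id = header[1:].split()[0]
            if (read_id in ids) == keep:
                dst.writelines(record)
                written += 1
    return written
```

With a list of host read IDs, `keep=False` would yield the non-host reads and `keep=True` the host reads, which mirrors the "run it twice" option described above.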
It was just the old way we used to do the analysis, and we use these workflows in the current training material; that's why we wanted to have both as two versions of the workflow. But they are definitely useless now, since the collection version does exactly the same job, although it will never take a single file; the input always has to be a collection.
This tool should also work here: toolshed.g2.bx.psu.edu/repos/iuc/krakentools_extract_kraken_reads/krakentools_extract_kraken_reads/1.2+galaxy0
…sing, to be added once the PR of the tool update is merged
I need help in tests @wm75 @paulzierep @bebatut :
I noticed that the tool ID for these tools is not like that of the rest of the tools, e.g. toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_getfastabed/2.30.0+galaxy1. Is there a way to solve that so these tests succeed?
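The difference between the two kinds of IDs can be sketched as follows, assuming that ToolShed-installed tools use the six-part form seen in the bedtools example (host, `repos`, owner, repository, tool name, version) and that the other tools carry a short ID without slashes. The helper name `parse_tool_id` is hypothetical:

```python
def parse_tool_id(tool_id):
    """Split a ToolShed-style tool ID into its parts.

    Returns a dict for IDs of the form
    <host>/repos/<owner>/<repo>/<name>/<version>, or None for any other ID
    (for example a short ID without slashes).
    """
    parts = tool_id.split("/")
    if len(parts) == 6 and parts[1] == "repos":
        host, _, owner, repo, name, version = parts
        return {
            "toolshed": host,
            "owner": owner,
            "repo": repo,
            "name": name,
            "version": version,
        }
    return None
```

For the bedtools ID above this returns owner `iuc` and version `2.30.0+galaxy1`, while a short ID returns `None`, which makes it easy to list the workflow steps that do not come from a ToolShed.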
…with an update to Krakentool, and also pushing the latest updates to the workflows; the planemo tests still fail for the same reasons, I need help with that
@EngyNasr once galaxyproject/planemo#1377 is merged and a new version is released it should work.
thank you so much :)
@mvdbeek, would it be possible to do the same for FILTER_EMPTY_DATASETS?
I think it was closed by mistake
Looks like it worked and you only need to work on your test assertions.
@EngyNasr two questions on the latest preprocessing version:
The Pathogen-Detection-Nanopore-All-Samples-Analysis WF needs much better annotations and input dataset labels to make it understandable what it is good for. Right now, understanding the purpose without knowing the tutorial is really hard. Even the README file only talks about VF genes, but many people wouldn't know what that is.
…database size, where we now use an input parameter instead, thank you so much
The one error you've got is still the same:
You'd need a smaller database for testing. I think @wm75 or @paulzierep were interested in doing that?
I can use the Standard-8 or Standard-16 databases instead for now; I just need some administrative help to add them to the kraken2 tool on Galaxy EU and Org. @mvdbeek I don't understand this error, can you please check it? Or is it only the database size problem of the taxonomy profiling workflow?
Dears :) @wm75 This PR is finally ready for review, it would be great if we could merge it this week. Thanks a lot,
…es, to make it open to any microbiome workflows
Thanks @EngyNasr
I made some comments mostly for the 1st workflow, but they apply also to other workflows
workflows/microbiome/gene-based-pathogen-identification/README.md
workflows/microbiome/nanopore-allele-based-pathogen-identification/README.md
workflows/microbiome/gene-based-pathogen-identification/Gene-based-Pathogen-Identification.ga
Thank you so much, I will follow them all and ping you again, thanks a lot
…low, solving the linting issue
… and after host removal to the MultiQC output of the preprocessing workflow
…, now the table is included and MultiQC runs correctly :)
…the collections coming from gene-based pathogen identification, which may happen when no contigs are found by metaFlye for some samples
…ollections, to protect the workflow from the carry-over error that might occur for samples with fewer contigs, VF genes and AMR genes found
…o the user to choose
…lows of PathoGFAIR in one workflow
… to help users track their histories
I tried to reduce the test-data in this PR, hope it works.
Thanks a lot,
Engy <3