adding microbiome analysis workflows to IWC with test data #182

EngyNasr · 2023-03-07T12:30:09Z

I tried to reduce the test-data in this PR, hope it works.

Thanks a lot,
Engy <3

mvdbeek · 2023-03-07T14:32:50Z

Thanks, can you add README, Changelog and dockstore.yml files? (https://github.com/galaxyproject/iwc/blob/main/workflows/README.md#structure-of-the-directory)

EngyNasr · 2023-03-10T09:24:32Z

@mvdbeek did I miss anything else?

Thanks a lot for helping me out :)

EngyNasr · 2023-03-14T14:37:07Z

@wm75 Can you help me revise and merge this PR?
Thanks a lot <3

EngyNasr · 2023-03-27T08:40:04Z

To do as discussed with @wm75 :

1- remove "latest" from workflow names
2- keep only one file to test for all the workflows, except the pre-processing
3- for the pre-processing tests, we can use the txt file to compare content
4- make a readme inside each folder as well
5- make the changelog only inside each folder
6- remove the big docker.yml in the main folder and keep only the one in every folder

EngyNasr · 2023-05-31T20:32:55Z

@bebatut I have added the 5 workflows for single-sample runs and the 4 workflows for runs on a collection of samples, so 9 workflows in total, with their test data. I chose the smallest sample data that still contain VFs, contigs, etc. The largest file is 50 MB; the other files are only bytes or kilobytes in size.

wm75 · 2023-06-13T14:41:44Z

@EngyNasr @bebatut I don't see much value in offering the single-sample workflows, when collection-based flavors exist that could be run with 1-element collections. Having the single-sample WFs published would just mean more maintenance and synchronization efforts.
Or is there anything that can be achieved with the single-sample versions, that the collection-based versions don't do?

wm75 · 2023-06-13T14:52:04Z

@EngyNasr can you please run a JSON reformatter over your workflows? Single-line JSON is just not very nice to review and prevents meaningful diffs. Use, e.g.,

    python3 -m json.tool collection-version/pathogen-detection-nanopore-pre-processing-collection/Pathogen-Detection-Nanopore-Pre-Processing-collection.ga collection-version/pathogen-detection-nanopore-pre-processing-collection/Pathogen-Detection-Nanopore-Pre-Processing-collection_pretty.ga

or any other tool you like.
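
For reformatting several workflows in one go, a minimal Python sketch along the following lines may help. It assumes that `.ga` files are plain JSON and that a 4-space indent (the `json.tool` default) is acceptable; the function name `pretty_print_workflow` is made up for illustration.

```python
import json
from pathlib import Path


def pretty_print_workflow(path, indent=4):
    """Rewrite a single-line .ga (JSON) workflow file in place, indented."""
    path = Path(path)
    workflow = json.loads(path.read_text(encoding="utf-8"))
    # Key order is preserved; the content is re-serialized with indentation.
    path.write_text(json.dumps(workflow, indent=indent) + "\n", encoding="utf-8")
    return path
```

It could be applied to every workflow below a directory with something like `for ga in Path(".").rglob("*.ga"): pretty_print_workflow(ga)`. Unlike the command above, this overwrites the file instead of writing a `_pretty` copy.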

wm75

Some initial comments:

At least some of your workflows are lacking a release attribute, which you need to add manually.
In the Preprocessing workflow, the conversion of fastq.gz to plain fastq, just for the purpose of filtering reads by their ID, is rather unfriendly for a user's quota.
There is toolshed.g2.bx.psu.edu/repos/iuc/seqtk/seqtk_subseq/1.3.1 which can filter compressed fastqs directly (and which is probably faster in all cases). Its downside is that it only keeps matching IDs, but can't discard them, or write both to separate files (like toolshed.g2.bx.psu.edu/repos/peterjc/seq_filter_by_id/seq_filter_by_id/0.2.7).
So if you want the non-host reads, you'll have to invert the action of Filter Tabular at the step before.
If you really need also the host reads as a separate file (which I'm not entirely convinced of), you would have to run Filter Tabular and seqtk subseq twice, but even that might still be better than the current way?
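
To illustrate the quota point: filtering reads by ID does not need an uncompressed copy on disk. The following is only a Python sketch of the idea, not what seqtk subseq or seq_filter_by_id do internally. It assumes strict 4-line FASTQ records, and the function name and the `keep_matching` switch are hypothetical.

```python
import gzip


def filter_fastq_gz(in_path, out_path, read_ids, keep_matching=True):
    """Stream a gzipped FASTQ file and write the selected records, gzipped.

    Assumes strict 4-line FASTQ records. With keep_matching=False the
    selection is inverted: reads whose ID is in read_ids are discarded.
    """
    read_ids = set(read_ids)
    written = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            # The read ID is the first word of the header, without the "@".
            fields = record[0][1:].split()
            read_id = fields[0] if fields else ""
            if (read_id in read_ids) == keep_matching:
                fout.writelines(record)
                written += 1
    return written
```

Calling it with the host read IDs and `keep_matching=False` would yield the non-host reads; a second call with `keep_matching=True` would yield the host reads as a separate file.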

EngyNasr · 2023-06-14T09:10:44Z

> @EngyNasr @bebatut I don't see much value in offering the single-sample workflows, when collection-based flavors exist that could be run with 1-element collections. Having the single-sample WFs published would just mean more maintenance and synchronization efforts. Or is there anything that can be achieved with the single-sample versions, that the collection-based versions don't do?

It was just the old way we used to do the analysis, and we use these workflows in the current training material; that's why we wanted to have both as two versions of the workflow. But they are definitely redundant now, since the collection version does exactly the same job. The only difference is that it will never take a single file: the input always has to be a collection.

paulzierep · 2023-06-15T13:22:04Z

This tool should also work here: toolshed.g2.bx.psu.edu/repos/iuc/krakentools_extract_kraken_reads/krakentools_extract_kraken_reads/1.2+galaxy0
It can take fastq.gz input.

EngyNasr · 2023-06-22T11:56:23Z

I need help with the tests @wm75 @paulzierep @bebatut:

In the Pathogen-Detection-Nanopore-All-Samples-Analysis.ga workflow, I use __FILTER_EMPTY_DATASETS__ (version 1.0.0), which does not seem to be installed when I run planemo test.
The same happens for the Build list tool in the Pathogen-Detection-Nanopore-Gene-based-pathogenic-Identification-collection.ga workflow.

I noticed that the tool IDs of these tools do not look like those of the other tools, e.g. toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_getfastabed/2.30.0+galaxy1

Is there a way to solve this so that the tests succeed?
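
To see which steps are affected, the tool IDs in a workflow can be split into Tool Shed IDs and everything else. This is a rough Python sketch; it assumes the `.ga` JSON has a top-level `steps` mapping whose entries carry a `tool_id` (empty for input steps) and optionally an embedded `subworkflow`. The function name is hypothetical.

```python
def split_tool_ids(workflow):
    """Return (toolshed_ids, other_ids) for the tool steps of a workflow dict."""
    toolshed_ids, other_ids = set(), set()
    for step in workflow.get("steps", {}).values():
        # Recurse into embedded subworkflows, if any.
        if isinstance(step.get("subworkflow"), dict):
            shed, other = split_tool_ids(step["subworkflow"])
            toolshed_ids.update(shed)
            other_ids.update(other)
        tool_id = step.get("tool_id")
        if not tool_id:
            continue  # input steps carry no tool ID
        # Tool Shed IDs look like "<host>/repos/<owner>/<repo>/<tool>/<version>".
        if "/repos/" in tool_id:
            toolshed_ids.add(tool_id)
        else:
            other_ids.add(tool_id)
    return sorted(toolshed_ids), sorted(other_ids)
```

Loading a workflow with `json.load` and passing the resulting dict to `split_tool_ids` would list IDs such as `__FILTER_EMPTY_DATASETS__` in the second return value.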

mvdbeek · 2023-06-28T14:22:02Z

@EngyNasr once galaxyproject/planemo#1377 is merged and a new version is released it should work.

EngyNasr · 2023-06-28T15:08:17Z

> @EngyNasr once galaxyproject/planemo#1377 is merged and a new version is released it should work.

thank you so much :)

EngyNasr · 2023-06-28T15:10:29Z

> @EngyNasr once galaxyproject/planemo#1377 is merged and a new version is released it should work.

@mvdbeek, could the same also be done for __FILTER_EMPTY_DATASETS__?

bebatut · 2023-06-29T11:09:52Z

I think it was closed by mistake

mvdbeek · 2023-06-29T15:04:05Z

Looks like it worked and you only need to work on your test assertions.

wm75 · 2023-06-30T14:48:56Z

@EngyNasr two questions on the latest preprocessing version:

Why is the host ID no longer a WF parameter? You could use the "Map parameter value" expression tool to map a user-selected species name to the required taxid.
Is there a specific reason why you're using FastQC when you're already running fastp? MultiQC can also visualize the fastp JSON output dataset, and the result doesn't look too different from what two FastQC results (before and after trimming) would show.
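
The mapping suggested in the first question is conceptually just a lookup table. The Python sketch below only illustrates that idea and is not how the "Map parameter value" tool is configured; the choice of species is an arbitrary example, and the names `HOST_TAXIDS` and `host_taxid` are hypothetical.

```python
# NCBI Taxonomy IDs; the selection of species is only an example.
HOST_TAXIDS = {
    "Homo sapiens": 9606,
    "Gallus gallus": 9031,
    "Bos taurus": 9913,
}


def host_taxid(species_name):
    """Map a user-selected species name to its NCBI taxonomy ID."""
    try:
        return HOST_TAXIDS[species_name]
    except KeyError:
        raise ValueError(f"No taxid configured for {species_name!r}") from None
```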

wm75 · 2023-06-30T15:04:16Z

The Pathogen-Detection-Nanopore-All-Samples-Analysis WF needs much better annotations and input dataset labels to make it understandable what it is good for. Right now, understanding the purpose without knowing the tutorial is really hard. Even the README file only talks about VF genes, but many people wouldn't know what that is.

mvdbeek · 2023-08-16T08:11:09Z

The one error you've got is still the same:

Loading database information...Failed attempt to allocate 65426688000bytes;
you may not have enough free memory to load this database.
If your computer has enough RAM, perhaps reducing memory usage from
other programs could help you load this database?
classify: unable to allocate hash table memory

You'd need a smaller database for testing. I think @wm75 or @paulzierep were interested in doing that?
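
To put the number from the log into perspective, a quick Python conversion (the helper names and the `available_gib` parameter are made up for illustration; how much memory the test machine actually has is not stated in this thread):

```python
requested_bytes = 65_426_688_000  # taken from the error message above


def to_gib(n_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 1024**3


def fits_in_memory(n_bytes, available_gib):
    """True if the requested allocation fits into available_gib of RAM."""
    return to_gib(n_bytes) <= available_gib


print(f"{to_gib(requested_bytes):.1f} GiB")  # 60.9 GiB
```

So the database asks for roughly 61 GiB for its hash table alone, which explains why only a much smaller database is an option for automated tests.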

EngyNasr · 2023-08-16T10:44:56Z

> The one error you've got is still the same:
> Loading database information...Failed attempt to allocate 65426688000bytes;
> you may not have enough free memory to load this database.
> If your computer has enough RAM, perhaps reducing memory usage from
> other programs could help you load this database?
> classify: unable to allocate hash table memory
> you'd need a smaller database for testing, I think @wm75 or @paulzierep were interested in doing that ?

I can use the Standard-8 or Standard-16 databases instead for now; I just need some administrative help to add them to the kraken2 tool on Galaxy EU and Org.

@mvdbeek
Another question: the HTML output of the GitHub testing also shows an error in the gene-based pathogen identification workflow: Expecting value: line 1 column 1 (char 0), which actually blocks the runs of the pre-processing and the taxonomy profiling workflows.

I don't understand this error. Can you please check it, or is it only the database size problem of the taxonomy profiling workflow?
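
For context on the message itself: `Expecting value: line 1 column 1 (char 0)` is the text Python's `json` module produces when asked to parse input that does not start with a JSON value, for example an empty string or an HTML error page. Whether that is what happened in this CI run is an assumption; the sketch below only reproduces the message, and the helper name is hypothetical.

```python
import json


def json_error_message(text):
    """Return the JSONDecodeError message for text, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


print(json_error_message(""))
# Expecting value: line 1 column 1 (char 0)
print(json_error_message("<html>502 Bad Gateway</html>"))
# Expecting value: line 1 column 1 (char 0)
```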

EngyNasr · 2024-04-23T18:02:51Z

Dears :) @wm75

This PR is finally ready for review, it would be great if we could merge it this week

Thanks a lot,
Engy

bebatut

Thanks @EngyNasr
I made some comments mostly for the 1st workflow, but they apply also to other workflows

EngyNasr · 2024-05-02T12:20:51Z

> Thanks @EngyNasr I made some comments mostly for the 1st workflow, but they apply also to other workflows

Thank you so much. I will address them all and ping you again.

adding microbiome analysis workflows to IWC with test data

8340c8a

adding Changelog, REadme and dockstore yml file

56a6dbf

EngyNasr added 2 commits March 19, 2023 11:34

solving linting issues

e72aa56

solving linting error by correcting the file name since i forgot the underscore in its name before

a401cba

EngyNasr added 2 commits May 31, 2023 15:23

applying all comments

77e2f8c

adding workflows for the collection version

f8bcf61

wm75 reviewed Jun 13, 2023

applying wolfgang comments, still removing the decompress tool is missing, to be added once the PR of the tool update is merges

121bac4

EngyNasr mentioned this pull request Jun 20, 2023

adding gzipped files formates to inputs galaxyproject/tools-iuc#5360

Merged

updating the preprocessing workflow replacing the decompressing step with an update to Krakentool, and also pushing the latest updates to the workflows, still the planemo tests fails for the same reasons, I need help with that

4461da3

mvdbeek mentioned this pull request Jun 28, 2023

Update collection operation tool list galaxyproject/planemo#1377

Merged

nsoranzo closed this in galaxyproject/planemo#1377 Jun 29, 2023

bebatut reopened this Jun 29, 2023

solving the taxonomy profiling testing failure due to the standardPF database size, where we use now a input parameter instead, thank you so much

7feccfc

EngyNasr added 4 commits April 22, 2024 12:53

updating workflows based on our latest paper update April 2024

1c949e0

updating workflows to include the tools latest version in Galaxy eu

4267b04

correcting a test file

79f8553

updating all readme files for workflows

a011293

EngyNasr commented Apr 24, 2024 (three review threads on workflows/microbiome-analysis/README.md, outdated and resolved)

mvdbeek requested a review from wm75 April 24, 2024 15:21

adding workflow comments, arranging the main folder and workflows names, to make it opened for any microbiome workflows

dbaaa06

bebatut reviewed May 2, 2024

EngyNasr added 15 commits May 8, 2024 16:18

editing readme and workflows namings and tags based on Berenice's comments

f7b7f8a

solving linting problem

c1e6034

updating few typos

7f627a6

removing a test file which is no longer produced by the updated workflow, solving the linting issue

9aba574

i missed to change the test file correctly the last push, here we go this time :D

9d7a9cc

updated workflow reports, and adding the total number of reads before and after host removal to the Multiqc output of the preprocessing workflow

cc4aa97

solving the error of MultiQC including the host reads removal details, now the table is included and multiq runs correctly :)

4d0a969

adding a step in the 5th workflow to remove all failed datasets from the collections comming from genebased pathogen identification, which may happen when no contigs are found by metaFlye for some samples

13d3986

add more filteration steps of removing failed and empty datasets in collections, to protect the workflow from the carry on error that might occur for samples with fewer number of contigs, VF genes and AMR genes found

308402b

leaving the optional input to minimap2 selectng the samples profile to the user to choose

e6aaa69

correcting a comment typo in the preprocessing workflow

1191598

adding a 5in1 workflow named PathoGFAIR that groups all other 5 workflows of PathoGFAIR in one workflow

77ad03d

removing the 5 in 1 workflow and adding tags to main workflow outputs to help users track their histories

11dcc69

correcting typos in readmes

b6be8a5

removing hashtags from tags to make them normal ones not promoted ones

97d22f7
