Add to Galaxy #15

Closed
ewels opened this issue Sep 7, 2015 · 50 comments

@ewels
Member

ewels commented Sep 7, 2015

See the tutorial. Maybe wait a bit for it to settle down and become more stable first..

@ewels ewels modified the milestone: v0.4 Oct 19, 2015
@ewels ewels modified the milestones: v0.4, v0.5 Nov 23, 2015
@ewels ewels changed the title from "Add to Galaxy?" to "Add to Galaxy" Dec 8, 2015
@ewels
Member Author

ewels commented Mar 9, 2016

Quite a lot of work, as I need to set up a Galaxy instance. Dropping this for now, but it would be nice to come back to it at some point. If anyone out there is already used to Galaxy, it would be great to have some help!

@ewels ewels closed this as completed Mar 9, 2016
@yvanlebras
Contributor

Hi @ewels. Together with @abretaud, we have just begun evaluating the possibility of adding MultiQC to a Galaxy instance... Maybe something based on an "Interactive Environment" could be investigated with @bgruening...

@ewels
Member Author

ewels commented Apr 7, 2016

Hi @yvanlebras, that's great! I'll re-open this issue then. Do you think it will be a lot of work to write a galaxy wrapper for it?

@ewels ewels reopened this Apr 7, 2016
@ewels ewels modified the milestones: v0.6, v0.5 Apr 7, 2016
@yvanlebras
Contributor

I/We have to evaluate this! In fact, creating a "classical" Galaxy tool may not be the best way... As mentioned, creating a MultiQC Interactive Environment (like in this screencast: https://www.youtube.com/watch?v=mKmXSN1G-Po) could be a good approach... The idea would be to let MultiQC work with the logs from a history in which tools like FastQC, STAR, Cutadapt, ... have been executed. @bgruening & @erasche would be best placed to evaluate this...

@hexylena

hexylena commented Apr 7, 2016

@yvanlebras is the output more than just an HTML dataset? I'm just seeing HTML + JS there. If that's all it is, then this would be fine as a classical Galaxy tool, no IE needed

@yvanlebras
Contributor

True... I don't know why I absolutely wanted to propose an IE... Oh right, because I love this functionality ;) More seriously, you're right. It's more that I had a feeling that MultiQC is something that can be applied to an entire QC history, transiently, with dynamic visualization... so more "usable" as an IE than as a Galaxy tool...

@ewels
Member Author

ewels commented Apr 7, 2016

@erasche the main output from MultiQC is an HTML report as you say (a single file, everything is embedded). It does also save parsed data as tsv / yaml / json in a directory, which can be helpful sometimes. It's easy to disable this if needed with a config option or command line flag.

@ewels
Member Author

ewels commented Apr 7, 2016

The only thing that had worried me about running on Galaxy is that MultiQC needs to see all previous logs / standard out / stderr and so on (varies across tools). Not sure if different tasks are sandboxed in galaxy or not? Sorry for my unfamiliarity with it :)

@hexylena

hexylena commented Apr 7, 2016

@yvanlebras I understand the feeling, IEs are exciting, I want to turn lots of tools into them too. :) That's an interesting application though, run MultiQC on existing datasets. Interesting thought!

@ewels ok... If your tool needs access to stdout/stderr, I am absolutely sure we can find a way to make that possible if it isn't already. It would be easily possible to keep all of the tsv/yaml/json extra datasets.

@ewels
Member Author

ewels commented Apr 7, 2016

@erasche Not exactly - it just needs access to files made by other tools. Some of these files will have come from the stdout/stderr of that tool. My point was that the files it needs could be seen as intermediate files and deleted before MultiQC runs..? Not sure if Galaxy does that..

@yvanlebras
Contributor

@ewels, you're right, we have deployed a new cloud VM.. sorry ;)

You can find a new version here: Bowtie2 stat report

Cheers,

Yvan

@ewels
Member Author

ewels commented May 23, 2016

Thanks! What is this file called? As there's nothing else in the log, the bowtie2 module should take the filename as the sample name, and try to clean it up. I'll add something to the docs about this now.

Phil

@yvanlebras
Contributor

yvanlebras commented May 23, 2016

The dataset's name in the Galaxy history is "Bowtie2 on data 1, data 5, and data 4: mapping stats"

Not sure this helps, because you're looking for the name of the stats file as produced by the bowtie2 command line execution, aren't you? ;)

@ewels
Member Author

ewels commented May 24, 2016

Yeah exactly, MultiQC is looking at the filename on the disk - I guess it won't know the name in the galaxy history if that's kept separately. At least, not without writing a Galaxy-specific plugin for MultiQC.

@bgruening

bgruening commented May 24, 2016

@ewels @yvanlebras not following the problem closely, but do not forget you can always change the input file name on disk to anything you like by simply creating a symlink to $input.

ln -s $input ./my_smart_name.bam

This could also be the name of the history element, which you can access by $input.name afaik.
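
For anyone preparing inputs outside the tool wrapper, the symlink trick above could be sketched in Python roughly as follows. The helper name `link_with_name`, the `.txt` default and the sanitising rule are assumptions made for illustration only; they are not part of Galaxy or MultiQC, and inside a wrapper the `ln -s` line is all that is needed.

```python
import os
import re
import tempfile


def link_with_name(dataset_path, display_name, workdir, ext=".txt"):
    """Symlink dataset_path into workdir under a name derived from display_name.

    Hypothetical helper: every run of characters other than letters, digits,
    '.', '_' and '-' is replaced with a single '_' so the name is safe on disk.
    """
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", display_name).strip("_")
    link_path = os.path.join(workdir, safe + ext)
    os.symlink(os.path.abspath(dataset_path), link_path)
    return link_path


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    # Stand-in for an opaque on-disk dataset file (name is made up).
    source = os.path.join(workdir, "dataset_42.dat")
    with open(source, "w") as handle:
        handle.write("example log contents\n")
    print(link_with_name(source, "Bowtie2 on data 1: mapping stats", workdir))
```

The link keeps the original file untouched, so only the name seen by the downstream tool changes.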

@devengineson

@ewels can you give us the expected file name?
@bgruening we will not forget ;) Thanks!

@ewels
Member Author

ewels commented May 24, 2016

Absolutely - it can be whatever you like really. MultiQC will truncate from anything in the config.fn_clean_exts list.

So, if you call it sample_name.txt then the MultiQC report will show the name as sample_name.
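
As a rough illustration of that truncation, here is a simplified Python sketch. It is not MultiQC's actual code: the function name `clean_sample_name` and the example extension list are assumptions, and the real list is whatever `config.fn_clean_exts` contains in your installation.

```python
import os

# Assumed example values; the real list lives in MultiQC's config.fn_clean_exts.
FN_CLEAN_EXTS = [".gz", ".fastq", ".txt", ".log"]


def clean_sample_name(filename, clean_exts=FN_CLEAN_EXTS):
    """Drop the directory, then cut the name at the first match of each extension."""
    name = os.path.basename(filename)
    for ext in clean_exts:
        if ext in name:
            # Keep only what comes before the first occurrence of ext.
            name = name.split(ext, 1)[0]
    return name


if __name__ == "__main__":
    print(clean_sample_name("sample_name.txt"))      # sample_name
    print(clean_sample_name("/data/run1.fastq.gz"))  # run1
```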

@devengineson

Sorry @ewels for my unclear question (and for my very approximate Frenglish) ;) I understand that MultiQC will truncate sample_name.ext, but it seems that for Bowtie2, MultiQC looks for a specific filename (like bowtie2.log, for example)... Is that the case? If yes, we have to preprocess the bowtie2 log file to give it a suitable file name (using @bgruening's method, for example ;) ) before passing it to MultiQC... OR we don't have to change the name, because MultiQC looks at the content of the bowtie2 log file, and we just have to give the bowtie2 log file to MultiQC. We have tested this second approach, but it doesn't seem to work...

@ewels
Member Author

ewels commented May 24, 2016

Ah I see, sorry. MultiQC uses a config file called search_patterns.yaml to define the search parameters (these can be overwritten by the user, see the docs).

Bowtie 2 has no standardised filename for the output as it's just stderr, so instead MultiQC finds logs by searching for any file containing the string "reads; of these:", which is pretty rubbish, but the best I could manage. Other modules do search by filename as you say; these use fn: in the config instead of contents:. For example, FastQC uses fn: '*_fastqc.zip'.

So MultiQC should find the bowtie logs with any filename if they're there. Then sample names in the report will be chosen based on the filename of the file that is found.
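
To make the two kinds of search concrete, here is a minimal Python sketch of filename matching versus content matching. It is an illustration under assumptions, not MultiQC's implementation: the `SEARCH_PATTERNS` dictionary only mirrors the two examples quoted above, and the function names `matches` and `find_logs` are hypothetical.

```python
import fnmatch
import os
import tempfile

# Hypothetical patterns mirroring the two examples quoted above.
SEARCH_PATTERNS = {
    "bowtie2": {"contents": "reads; of these:"},
    "fastqc": {"fn": "*_fastqc.zip"},
}


def matches(path, pattern):
    """Return True if the file matches by name ('fn') or by content ('contents')."""
    if "fn" in pattern:
        return fnmatch.fnmatch(os.path.basename(path), pattern["fn"])
    if "contents" in pattern:
        try:
            with open(path, "r", errors="ignore") as handle:
                return any(pattern["contents"] in line for line in handle)
        except OSError:
            return False
    return False


def find_logs(directory, patterns=SEARCH_PATTERNS):
    """Walk directory and map each module name to the files it would claim."""
    found = {module: [] for module in patterns}
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(root, name)
            for module, pattern in patterns.items():
                if matches(path, pattern):
                    found[module].append(path)
    return found


if __name__ == "__main__":
    demo = tempfile.mkdtemp()
    with open(os.path.join(demo, "any_name_at_all.dat"), "w") as handle:
        handle.write("10000 reads; of these:\n")
    print(find_logs(demo))
```

With content matching, the file name is free to be anything, which is why it only affects the sample name shown in the report.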

Make sense?

Phil

@ewels
Member Author

ewels commented May 24, 2016

ps. Two things to consider - MultiQC will overwrite samples if it gives them the same name, so if all bowtie 2 logs are called bowtie2.log then your report will contain only the last file that was found. Also, if something goes wrong with the way that MultiQC parses the logs then there may be no results. It's usually possible to get a better idea about both of these by running in verbose mode (-v) or looking at the contents of multiqc_data/.multiqc.log
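
One way to catch the overwriting problem before running MultiQC is to check which input files would end up with the same sample name. The sketch below is only an illustration: `naive_sample_name` is a stand-in assumption (basename minus its last extension), not MultiQC's real name cleaning, and `find_name_collisions` is a hypothetical helper.

```python
import os
from collections import defaultdict


def naive_sample_name(path):
    """Stand-in for MultiQC's name cleaning: basename without its last extension."""
    return os.path.splitext(os.path.basename(path))[0]


def find_name_collisions(paths, name_of=naive_sample_name):
    """Group paths by derived sample name and return names shared by several files."""
    by_name = defaultdict(list)
    for path in paths:
        by_name[name_of(path)].append(path)
    return {name: files for name, files in by_name.items() if len(files) > 1}


if __name__ == "__main__":
    logs = ["run_a/bowtie2.log", "run_b/bowtie2.log", "run_c/sample3.log"]
    for name, files in find_name_collisions(logs).items():
        print(f"{name}: {len(files)} files would share this sample name")
```

Any name reported here is a candidate for renaming or symlinking under a unique name first.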

@devengineson

Ok @ewels, that makes sense ;) So, in our first test, we gave the previously mentioned bowtie 2 stat file to MultiQC, but it didn't do anything with it... maybe MultiQC doesn't like our file ;)

@ewels
Member Author

ewels commented May 24, 2016

Hmm, you're right. Debugging now - MultiQC finds your log but is looking for a handful of lines which aren't there. My fault - I thought that they were always printed to the log, but it seems the bowtie2 log is even more sparse than I thought.

ewels added a commit to MultiQC/test-data that referenced this issue May 24, 2016
ewels added a commit that referenced this issue May 24, 2016
@ewels
Member Author

ewels commented May 24, 2016

Ok, change pushed - hopefully this version should recognise your bowtie2 logs now..

@devengineson

Thank you! We are using the MultiQC 0.6 conda recipe. Can you tell me whether an update would fix this problem?

@ewels
Member Author

ewels commented May 24, 2016

Hi @devengineson,

No sorry, this update is currently in v0.7dev which is only on GitHub. It will end up in v0.7 on conda when I release it, but that may be a few weeks yet.

Phil

@devengineson

devengineson commented May 24, 2016

Ok! It was just to be sure ;)

rcavalcante added a commit to sartorlab/mint that referenced this issue Jun 4, 2016
* FastQCs, bismark, and bowtie2 outputs
* bowtie2 support is properly coming to multiqc v0.7.0 according to:
MultiQC/MultiQC#15
@ewels
Member Author

ewels commented Jun 21, 2016

Hi all,

I'm building up to a v0.7 release soon. How are you getting on? Is there anything I need to add to MultiQC? Can I claim on the readme that it works with Galaxy yet? 😉

Phil

@yvanlebras
Contributor

yvanlebras commented Jun 21, 2016

Hi Phil,

Yes, we have finished the tests and it seems OK for the 0.6 version. We have created a dedicated Galaxy Tool Shed repository here: https://toolshed.g2.bx.psu.edu/view/engineson/multiqc/ff22ea7aa6bb

Cheers,

Yvan

@ewels
Member Author

ewels commented Jun 21, 2016

Ok great! I'll add this to the changelog and readme files then if that's ok. Do you think you could write a sentence that I can copy describing how people can use it? Also maybe a slightly longer version that I can add to the docs or something.

Phil

@ewels
Member Author

ewels commented Jun 21, 2016

Also, I just noticed that there is a section in multiqc.xml that describes citations. MultiQC has just been published in Bioinformatics, so that would be a better citation (Epigenomics of Common Disease was a conference poster). See http://dx.doi.org/10.1093/bioinformatics/btw354

@devengineson

devengineson commented Jun 21, 2016

For sure! Thanks for the info

@ewels
Member Author

ewels commented Jun 21, 2016

Ok, I've mentioned the wrapper in the readme now. Let me know if you'd like me to change the text. If there's nothing else for me to do with MultiQC then I'll close this issue. Feel free to reopen it again if you feel the need.

Thanks again!

Phil

@ewels ewels closed this as completed Jun 21, 2016
woook pushed a commit to moka-guys/MultiQC that referenced this issue Jan 11, 2018
ewels pushed a commit that referenced this issue May 7, 2021
DRAGEN-12963 Fix incorrect label in GC-Quality plot

5 participants