-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rgcca #3179
Add rgcca #3179
Conversation
@ecamenen Can you fix the R linting and tool failure issues? |
@bernt-matthias it should work now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ecamenen! Small review inline!
tools/rgcca/TUTORIAL.md
Outdated
@@ -0,0 +1,236 @@ | |||
# TUTORIAL RGCCA GALAXY-TOOL | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be moved to the GTN? training.galaxyproject.org/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be the goal. After validation of the PR? I'm ending my contract soon without being sure that I'll be renewed and I don't know how much time I could still give for the new adventure which is creating a Galaxy tutorial. is it possible to keep that here at the moment ? if not, I can make a link in the help section to the tutorial of my git company .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you dump this file as PR against training.galaxyproject.org ... just as a WIP PR. And hopefully, you will get paid to work on it in the future. In this repo it will be overseen :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refresh problem, I hadn't seen the comment. OK, no worries, I will remove it and send it to a tutorial link of my institute.
tools/rgcca/rgcca.xml
Outdated
help="A block defined as the concatenation of all the other blocks. The space spanned by global components is viewed as a compromise space that integrated all the modalities and facilitates the visualization of the results and their interpretation. If disabled, all blocks are assumed to be connected or a connection file could be used."/> | ||
|
||
<conditional name="supervised"> | ||
<param name="bool" type="boolean" label="Use a block as response (supervised analysis)"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use a select here instead of a boolean? See https://galaxy-iuc-standards.readthedocs.io/en/latest/best_practices/tool_xml.html#booleans
tools/rgcca/rgcca.xml
Outdated
<conditional name="supervised"> | ||
<param name="bool" type="boolean" label="Use a block as response (supervised analysis)"/> | ||
<when value="true"> | ||
<param name="block_response" type="integer" value="0" min="0" max="10" label=" " help="@BLOCK_RULES@"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if you only have one parameter, you can remove the entire conditional I think and make the integer just optional="true"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, if this is the case, there is a bug in Galaxy frontend. ping @guerler is that something you could look at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you just clear the text input field?
tools/rgcca/rgcca.xml
Outdated
<data name="design" label="design.pdf" format="pdf"/> | ||
<data name="individual_table" label="individuals.tsv" format="tsv"/> | ||
<data name="variable_table" label="variables.tsv" format="tsv"/> | ||
<data name="R_results" label="rgcca.result.RData" format="rdata"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write all variable names in small letters
<!--</output>--> | ||
</test> | ||
|
||
<test expect_num_outputs="8" expect_exit_code="0"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super cool! Thanks for the great tests.
tools/rgcca/rgcca.xml
Outdated
<param name="bool" type="boolean" | ||
label="Load the design matrix (if superblock or supervised disabled)"/> | ||
<when value="true"> | ||
<param name="file" type="data" format="tsv,tabular,txt,csv" label=" " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tsv is not a csv? Does this matter? In genearl I would recommend to use only TSV in Galaxy. There are so many tools that can work with TSV.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default format when I upload my files to planemo is tabular. Do I leave only tsv as the accepted format for all file fields (and in this case, I will explain later in the documentation how to change the default format) or do I add the tabular format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tabular only is totally ok I think. The questions was more how does your tool decide between tabular and csv? Can it support both at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tools/rgcca/rgcca.xml
Outdated
#set $data_paths += ',' + $s.block_n.file_name | ||
#set $data_names += ',' + $s.block_n.name | ||
#end for | ||
Rscript $__tool_directory__/launcher.R |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every path should be single quoted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
tools/rgcca/rgcca.xml
Outdated
Rscript $__tool_directory__/launcher.R | ||
--datasets '${$data_paths}' | ||
--names '${$data_names}' | ||
--directory $__tool_directory__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this directory do? Please note that this directory is read only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remnants of codes from an old version without conda. removed.
tools/rgcca/rgcca.xml
Outdated
#set $data_names += ',' + $s.block_n.name | ||
#end for | ||
Rscript $__tool_directory__/launcher.R | ||
--datasets '${$data_paths}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a $ too many
@bgruening if I have understood everything correctly, all the modifications requested by your review have been made (remaining: GTN; optional: tabular format). |
hi @bgruening. i've made the last changes ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove the files under static/images/ please? They do not seems to be used.
Sorry, I have very limited time lately. @bernt-matthias could you help here please?
tools/rgcca/rgcca.xml
Outdated
#set $data_paths += ',' + $s.block_n.file_name | ||
#set $data_names += ',' + $s.block_n.name | ||
#end for | ||
Rscript $__tool_directory__/launcher.R |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
tools/rgcca/rgcca.xml
Outdated
<inputs> | ||
<param name="block_1" type="data" format="tsv" label="Load a block" | ||
help="TSV file containing a matrix with: (i) quantitative values only (decimal should be separated by '.'), (ii) the samples in lines (should be labelled in the 1rst column) and (iii) variables in columns (should have a header)."/> | ||
<repeat name="dataset" title="New block" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you simply use a <param name="block_n" type="data" format="tsv" multiple="true" optional="true" label="Dataset"/>
and remove the repeat?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better, merge block_1 and block_n and make it <param name="block_n" type="data" format="tsv" multiple="true" optional="false" label="Dataset"/>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bernt-matthias Sorry for the delay, in the meantime my contract was finished and I don't had much time for new features.
I don't know about multiple data, I think it's less intuitive for biologist users in my lab but if you want I can code that. [For example, I don't know how to select only certain data (Cf. image) and suppress the ones we don't want to. So the biologist don't either.]
As a result, I can't get my tool to work with multiple data anymore with this notation. I looked at the help sections I could find and the example (https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/multi_data_param.xml) but I m more a R programmer. Can you help me with that? I pushed a broken version for the example. The test on a single file works, but not multiple files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bernt-matthias Sorry for the delay, in the meantime my contract was finished and I don't had much time for new features.
No worries.
I don't know about multiple data, I think it's less intuitive for biologist users in my lab but if you want I can code that. [For example, I don't know how to select only certain data (Cf. image) and suppress the ones we don't want to. So the biologist don't either.]
You can add more data sets by holding the Ctrl
key while clicking data sets in the selcetor. You can select complete sets of files by holding Shift
and click.
As a result, I can't get my tool to work with multiple data anymore with this notation. I looked at the help sections I could find and the example (https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/multi_data_param.xml) but I m more a R programmer. Can you help me with that? I pushed a broken version for the example. The test on a single file works, but not multiple files.
For the test you can simple use a comma separated list of file named in the parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#set data_paths = ",".join([_.file_name for _ in $blocks])
#set data_names = ",".join([_.name for _ in $blocks])
might be a short cut to the for loop. But not sure if it works.
tools/rgcca/macro.xml
Outdated
|
||
<token name="@TOOL_VERSION@">3.0.0</token> | ||
|
||
<token name="@BLOCK_RULES@">0 corresponds to the superblock (or the last block loaded), 1 to the first block, 2 to the second one, etc. This number should not be greater than the number of blocks selected.</token> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add the default, as in COMP_RULES
tools/rgcca/rgcca.xml
Outdated
<inputs> | ||
<param name="block_1" type="data" format="tsv" label="Load a block" | ||
help="TSV file containing a matrix with: (i) quantitative values only (decimal should be separated by '.'), (ii) the samples in lines (should be labelled in the 1rst column) and (iii) variables in columns (should have a header)."/> | ||
<repeat name="dataset" title="New block" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better, merge block_1 and block_n and make it <param name="block_n" type="data" format="tsv" multiple="true" optional="false" label="Dataset"/>
tools/rgcca/rgcca.xml
Outdated
</inputs> | ||
|
||
<outputs> | ||
<data name="individual_plot" label="individuals.pdf" format="pdf"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- please add labels to the outputs that distinguishes them eg
${tool.name} on ${on_string}: individual plot
- are there some optional outputs? then you may add a select to the inputs and make the outputs conditional on the users choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. What exactly would you like to fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there is a way to display the real name of the data files (e.g. agriculture.txt and industry.txt)? data2 and data1 doesn't look great. Otherwise, I'll take on ${on_string}
off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strongly vote for leaving it as is, because that's Galaxy's default naming scheme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
tools/rgcca/rgcca.xml
Outdated
</conditional> | ||
|
||
<conditional name="tau"> | ||
<param name="bool" type="boolean" truevalue="optimal" falsevalue="false" checked="false" label="Use an optimal tau" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the IUC standards this should be a select.
tools/rgcca/rgcca.xml
Outdated
<requirement type="package" version="@TOOL_VERSION@">rgccacmd</requirement> | ||
</requirements> | ||
|
||
<stdio> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may just add detect_errors="exit_code"
to the command tag instead
tools/rgcca/macro.xml
Outdated
<token name="@COMP_RULES@">This number should not be greater than the selected number of component (2, by default).</token> | ||
|
||
<xml name="output_tests" token_path="" token_compx="1" token_compy="2"> | ||
<param name="block_1" value="agriculture.tsv" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add ftype
tools/rgcca/macro.xml
Outdated
|
||
<xml name="output_tests_3blocks"> | ||
<repeat name="dataset"> | ||
<param name="block_n" value="industry.tsv"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also ftype
@@ -0,0 +1,528 @@ | |||
# Author: Etienne CAMENEN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears to me that you copied this file from https://github.com/rgcca-factory/RGCCA/blob/master/inst/launcher.R which is for version 3.0? Do you know for sure if it works also for 2.1.2?
Would it be an option if I try to update the conda package to 3.0 and include the launcher directly there. Then we do not need to have it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The galaxy will not work with the latest version of CRAN which is 2.1.2. I worked with statisticians on the github, we had to make concessions and modifications and it's not always very clean :P We can talk about a pre-version tagged "3.0.0" that I packaged in its current conda (and finally, we decided to name it rather 2.9.0). But it's not yet available for the CRAN (WIP).
Not knowing in advance how Galaxy worked with Conda and learning little by little, I actually had to make a "patch" by copying and modifying a recent version of my command line interface (launcher.R) for this package (bioconda/bioconda-recipes#22918 (comment)).
You can indeed take the previous version of my conda to update the launcher with what your lintr asked for and change the path so that I can call it from the conda (I'm pretty new in conda). I'm used to put R-Shiny files and external scripts (as launcher.R) in an "inst" folder, but from memory I hadn't managed to access the CONDA_PREFIX variable via Galaxy. Hence my "copy and paste" patch.
Co-authored-by: M Bernt <m.bernt@ufz.de>
Co-authored-by: M Bernt <m.bernt@ufz.de>
I didn't understand too much. And it will make me change internal functions, tests and documentation on several deliverables derived from this RGCCA package (docker, shiny, command line). We have a lot of files with "_" in our lab. Under R, the file existence checks have been performed with the file.exists() function (https://github.com/rgcca-factory/RGCCA/blob/shiny/R/check_file.R). |
Actually only the |
Cool tests are passing :). Can the messages:
be ignored? |
tools/rgcca/rgcca.xml
Outdated
</inputs> | ||
|
||
<outputs> | ||
<data name="individual_plot" label="${tool.name} on ${on_string}: individuals.pdf" format="pdf"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how you use the output selector in the outputs:
<data name="individual_plot" label="${tool.name} on ${on_string}: individuals.pdf" format="pdf"/> | |
<data name="individual_plot" label="${tool.name} on ${on_string}: individuals.pdf" format="pdf"> | |
<filter>"individuals" in output_selector</filter> | |
</data> |
|
||
<test expect_num_outputs="8" expect_exit_code="0"> | ||
<expand macro="output_tests" path="1block"/> | ||
<param name="blocks" value="agriculture.tsv" ftype = "tsv"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in the tests you can then select the desired outputs:
<param name="blocks" value="agriculture.tsv" ftype = "tsv"/> | |
<param name="blocks" value="agriculture.tsv" ftype = "tsv"/> | |
<param name="output_selector" value="individual,corcircle,top_variables"> |
and you would need to change expect_num_outputs="3"
Ah ok, I didn't understand! It's managed internally in an R function, but if it's done in front-end, that's great. |
If I remember where it is, it's an internal function that has been fixed since then. If a new conda is made, I suppose we can start on the version under development (the "shiny" branch, let's say) which is the latest one in progress and which should not know any major modifications. |
Co-authored-by: M Bernt <m.bernt@ufz.de>
Co-authored-by: M Bernt <m.bernt@ufz.de>
I have not closed comments, but normally all suggestions made more than two days ago have been made. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgruening ready for merge?
Hello @bernt-matthias, we would like to present our galaxy tool in congress. I thought that for future updates the tool would appear in the toolbar here https://usegalaxy.eu/ , but it doesn't seem to be the case. Can you tell me how to access it from "your official server" ? |
Hi @ecamenen you need to add it to this list: https://github.com/usegalaxy-eu/usegalaxy-eu-tools/blob/master/tools_iuc.yaml by opening a PR over there. |
FOR CONTRIBUTOR: