Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add rgcca #3179

Merged
merged 23 commits into from
Jan 12, 2021
Merged

Add rgcca #3179

merged 23 commits into from
Jan 12, 2021

Conversation

ecamenen
Copy link
Contributor

@ecamenen ecamenen commented Sep 2, 2020

FOR CONTRIBUTOR:

  • - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • - License permits unrestricted use (educational + commercial)
  • - This PR adds a new tool or tool collection
  • - This PR updates an existing tool or tool collection
  • - This PR does something else (explain below)

@bernt-matthias
Copy link
Contributor

@ecamenen Can you fix the R linting and tool failure issues?

@ecamenen
Copy link
Contributor Author

@bernt-matthias it should work now

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ecamenen! Small review inline!

@@ -0,0 +1,236 @@
# TUTORIAL RGCCA GALAXY-TOOL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved to the GTN? training.galaxyproject.org/

Copy link
Contributor Author

@ecamenen ecamenen Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the goal. After validation of the PR? I'm ending my contract soon without being sure that I'll be renewed and I don't know how much time I could still give for the new adventure which is creating a Galaxy tutorial. is it possible to keep that here at the moment ? if not, I can make a link in the help section to the tutorial of my git company .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you dump this file as PR against training.galaxyproject.org ... just as a WIP PR. And hopefully, you will get paid to work on it in the future. In this repo it will be overseen :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refresh problem, I hadn't seen the comment. OK, no worries, I will remove it and send it to a tutorial link of my institute.

help="A block defined as the concatenation of all the other blocks. The space spanned by global components is viewed as a compromise space that integrated all the modalities and facilitates the visualization of the results and their interpretation. If disabled, all blocks are assumed to be connected or a connection file could be used."/>

<conditional name="supervised">
<param name="bool" type="boolean" label="Use a block as response (supervised analysis)"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<conditional name="supervised">
<param name="bool" type="boolean" label="Use a block as response (supervised analysis)"/>
<when value="true">
<param name="block_response" type="integer" value="0" min="0" max="10" label=" " help="@BLOCK_RULES@"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you only have one parameter, you can remove the entire conditional I think and make the integer just optional="true"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very keen about that. For integer, the user cannot go back if he has selected an integer for this parameter and then realizes that finally he doesn't want to select any. He has to refresh the analysis for that. If possible, I would prefer to stay on a conditional select option.
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, if this is the case, there is a bug in Galaxy frontend. ping @guerler is that something you could look at?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you just clear the text input field?

<data name="design" label="design.pdf" format="pdf"/>
<data name="individual_table" label="individuals.tsv" format="tsv"/>
<data name="variable_table" label="variables.tsv" format="tsv"/>
<data name="R_results" label="rgcca.result.RData" format="rdata"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write all variable names in small letters

<!--</output>-->
</test>

<test expect_num_outputs="8" expect_exit_code="0">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool! Thanks for the great tests.

<param name="bool" type="boolean"
label="Load the design matrix (if superblock or supervised disabled)"/>
<when value="true">
<param name="file" type="data" format="tsv,tabular,txt,csv" label=" "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsv is not a csv? Does this matter? In genearl I would recommend to use only TSV in Galaxy. There are so many tools that can work with TSV.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default format when I upload my files to planemo is tabular. Do I leave only tsv as the accepted format for all file fields (and in this case, I will explain later in the documentation how to change the default format) or do I add the tabular format?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tabular only is totally ok I think. The questions was more how does your tool decide between tabular and csv? Can it support both at the same time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I based myself on my R shiny app: you have to choose the column separator of the file. But if the csv (or txt) are a problem, I remove it.
image

#set $data_paths += ',' + $s.block_n.file_name
#set $data_names += ',' + $s.block_n.name
#end for
Rscript $__tool_directory__/launcher.R
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every path should be single quoted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

Rscript $__tool_directory__/launcher.R
--datasets '${$data_paths}'
--names '${$data_names}'
--directory $__tool_directory__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this directory do? Please note that this directory is read only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remnants of codes from an old version without conda. removed.

#set $data_names += ',' + $s.block_n.name
#end for
Rscript $__tool_directory__/launcher.R
--datasets '${$data_paths}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a $ too many

@ecamenen
Copy link
Contributor Author

ecamenen commented Oct 1, 2020

@bgruening if I have understood everything correctly, all the modifications requested by your review have been made (remaining: GTN; optional: tabular format).

@ecamenen
Copy link
Contributor Author

hi @bgruening. i've made the last changes ;)

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the files under static/images/ please? They do not seems to be used.

Sorry, I have very limited time lately. @bernt-matthias could you help here please?

#set $data_paths += ',' + $s.block_n.file_name
#set $data_names += ',' + $s.block_n.name
#end for
Rscript $__tool_directory__/launcher.R
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

<inputs>
<param name="block_1" type="data" format="tsv" label="Load a block"
help="TSV file containing a matrix with: (i) quantitative values only (decimal should be separated by '.'), (ii) the samples in lines (should be labelled in the 1rst column) and (iii) variables in columns (should have a header)."/>
<repeat name="dataset" title="New block"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you simply use a <param name="block_n" type="data" format="tsv" multiple="true" optional="true" label="Dataset"/> and remove the repeat?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better, merge block_1 and block_n and make it <param name="block_n" type="data" format="tsv" multiple="true" optional="false" label="Dataset"/>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bernt-matthias Sorry for the delay, in the meantime my contract was finished and I don't had much time for new features.
I don't know about multiple data, I think it's less intuitive for biologist users in my lab but if you want I can code that. [For example, I don't know how to select only certain data (Cf. image) and suppress the ones we don't want to. So the biologist don't either.]
As a result, I can't get my tool to work with multiple data anymore with this notation. I looked at the help sections I could find and the example (https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/multi_data_param.xml) but I m more a R programmer. Can you help me with that? I pushed a broken version for the example. The test on a single file works, but not multiple files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bernt-matthias Sorry for the delay, in the meantime my contract was finished and I don't had much time for new features.

No worries.

I don't know about multiple data, I think it's less intuitive for biologist users in my lab but if you want I can code that. [For example, I don't know how to select only certain data (Cf. image) and suppress the ones we don't want to. So the biologist don't either.]

You can add more data sets by holding the Ctrl key while clicking data sets in the selcetor. You can select complete sets of files by holding Shift and click.

As a result, I can't get my tool to work with multiple data anymore with this notation. I looked at the help sections I could find and the example (https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/multi_data_param.xml) but I m more a R programmer. Can you help me with that? I pushed a broken version for the example. The test on a single file works, but not multiple files.

For the test you can simple use a comma separated list of file named in the parameter.

Copy link
Contributor

@bernt-matthias bernt-matthias Dec 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#set data_paths = ",".join([_.file_name for _ in $blocks])
#set data_names = ",".join([_.name for _ in $blocks])

might be a short cut to the for loop. But not sure if it works.


<token name="@TOOL_VERSION@">3.0.0</token>

<token name="@BLOCK_RULES@">0 corresponds to the superblock (or the last block loaded), 1 to the first block, 2 to the second one, etc. This number should not be greater than the number of blocks selected.</token>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add the default, as in COMP_RULES

<inputs>
<param name="block_1" type="data" format="tsv" label="Load a block"
help="TSV file containing a matrix with: (i) quantitative values only (decimal should be separated by '.'), (ii) the samples in lines (should be labelled in the 1rst column) and (iii) variables in columns (should have a header)."/>
<repeat name="dataset" title="New block"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even better, merge block_1 and block_n and make it <param name="block_n" type="data" format="tsv" multiple="true" optional="false" label="Dataset"/>

</inputs>

<outputs>
<data name="individual_plot" label="individuals.pdf" format="pdf"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • please add labels to the outputs that distinguishes them eg ${tool.name} on ${on_string}: individual plot
  • are there some optional outputs? then you may add a select to the inputs and make the outputs conditional on the users choice.

Copy link
Contributor Author

@ecamenen ecamenen Dec 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, do you know how to fix this ('data_2 and data_1')?
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. What exactly would you like to fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there is a way to display the real name of the data files (e.g. agriculture.txt and industry.txt)? data2 and data1 doesn't look great. Otherwise, I'll take on ${on_string} off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly vote for leaving it as is, because that's Galaxy's default naming scheme.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

</conditional>

<conditional name="tau">
<param name="bool" type="boolean" truevalue="optimal" falsevalue="false" checked="false" label="Use an optimal tau"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the IUC standards this should be a select.

<requirement type="package" version="@TOOL_VERSION@">rgccacmd</requirement>
</requirements>

<stdio>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may just add detect_errors="exit_code" to the command tag instead

<token name="@COMP_RULES@">This number should not be greater than the selected number of component (2, by default).</token>

<xml name="output_tests" token_path="" token_compx="1" token_compy="2">
<param name="block_1" value="agriculture.tsv" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add ftype


<xml name="output_tests_3blocks">
<repeat name="dataset">
<param name="block_n" value="industry.tsv"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also ftype

@@ -0,0 +1,528 @@
# Author: Etienne CAMENEN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears to me that you copied this file from https://github.com/rgcca-factory/RGCCA/blob/master/inst/launcher.R which is for version 3.0? Do you know for sure if it works also for 2.1.2?

Would it be an option if I try to update the conda package to 3.0 and include the launcher directly there. Then we do not need to have it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The galaxy will not work with the latest version of CRAN which is 2.1.2. I worked with statisticians on the github, we had to make concessions and modifications and it's not always very clean :P We can talk about a pre-version tagged "3.0.0" that I packaged in its current conda (and finally, we decided to name it rather 2.9.0). But it's not yet available for the CRAN (WIP).

Not knowing in advance how Galaxy worked with Conda and learning little by little, I actually had to make a "patch" by copying and modifying a recent version of my command line interface (launcher.R) for this package (bioconda/bioconda-recipes#22918 (comment)).

You can indeed take the previous version of my conda to update the launcher with what your lintr asked for and change the path so that I can call it from the conda (I'm pretty new in conda). I'm used to put R-Shiny files and external scripts (as launcher.R) in an "inst" folder, but from memory I hadn't managed to access the CONDA_PREFIX variable via Galaxy. Hence my "copy and paste" patch.

ecamenen and others added 3 commits December 7, 2020 17:09
Co-authored-by: M Bernt <m.bernt@ufz.de>
Co-authored-by: M Bernt <m.bernt@ufz.de>
@ecamenen
Copy link
Contributor Author

ecamenen commented Dec 7, 2020

        #set data_paths = ",".join([str(_.file_name) for _ in $blocks])
        #set data_names = ",".join([str(_.name) for _ in $blocks])

Note that since data set and file names may contain commas this may fail. We could search and replace , by _

I didn't understand too much. And it will make me change internal functions, tests and documentation on several deliverables derived from this RGCCA package (docker, shiny, command line). We have a lot of files with "_" in our lab. Under R, the file existence checks have been performed with the file.exists() function (https://github.com/rgcca-factory/RGCCA/blob/shiny/R/check_file.R).

@bernt-matthias
Copy link
Contributor

        #set data_paths = ",".join([str(_.file_name) for _ in $blocks])
        #set data_names = ",".join([str(_.name) for _ in $blocks])

Note that since data set and file names may contain commas this may fail. We could search and replace , by _

I didn't understand too much. And it will make me change internal functions, tests and documentation on several deliverables derived from this RGCCA package (docker, shiny, command line). We have a lot of files with "_" in our lab. Under R, the file existence checks have been performed with the file.exists() function (https://github.com/rgcca-factory/RGCCA/blob/shiny/R/check_file.R).

Actually only the data_names are a problem. Assume that you have a data set with a , in the name. Then the splitting will be wrong. But we can fix this on the Galaxy side. I will suggest a fix once the tests are passing.

@bernt-matthias
Copy link
Contributor

Cool tests are passing :). Can the messages:

Warning message:
In if (class(y) %in% c("matrix", "data.frame")) { :
  the condition has length > 1 and only the first element will be used

be ignored?

tools/rgcca/rgcca.xml Show resolved Hide resolved
</inputs>

<outputs>
<data name="individual_plot" label="${tool.name} on ${on_string}: individuals.pdf" format="pdf"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how you use the output selector in the outputs:

Suggested change
<data name="individual_plot" label="${tool.name} on ${on_string}: individuals.pdf" format="pdf"/>
<data name="individual_plot" label="${tool.name} on ${on_string}: individuals.pdf" format="pdf">
<filter>"individuals" in output_selector</filter>
</data>


<test expect_num_outputs="8" expect_exit_code="0">
<expand macro="output_tests" path="1block"/>
<param name="blocks" value="agriculture.tsv" ftype = "tsv"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in the tests you can then select the desired outputs:

Suggested change
<param name="blocks" value="agriculture.tsv" ftype = "tsv"/>
<param name="blocks" value="agriculture.tsv" ftype = "tsv"/>
<param name="output_selector" value="individual,corcircle,top_variables">

and you would need to change expect_num_outputs="3"

@ecamenen
Copy link
Contributor Author

ecamenen commented Dec 8, 2020

        #set data_paths = ",".join([str(_.file_name) for _ in $blocks])
        #set data_names = ",".join([str(_.name) for _ in $blocks])

Note that since data set and file names may contain commas this may fail. We could search and replace , by _

I didn't understand too much. And it will make me change internal functions, tests and documentation on several deliverables derived from this RGCCA package (docker, shiny, command line). We have a lot of files with "_" in our lab. Under R, the file existence checks have been performed with the file.exists() function (https://github.com/rgcca-factory/RGCCA/blob/shiny/R/check_file.R).

Actually only the data_names are a problem. Assume that you have a data set with a , in the name. Then the splitting will be wrong. But we can fix this on the Galaxy side. I will suggest a fix once the tests are passing.

Ah ok, I didn't understand! It's managed internally in an R function, but if it's done in front-end, that's great.

@ecamenen
Copy link
Contributor Author

ecamenen commented Dec 8, 2020

Cool tests are passing :). Can the messages:

Warning message:
In if (class(y) %in% c("matrix", "data.frame")) { :
  the condition has length > 1 and only the first element will be used

be ignored?

If I remember where it is, it's an internal function that has been fixed since then. If a new conda is made, I suppose we can start on the version under development (the "shiny" branch, let's say) which is the latest one in progress and which should not know any major modifications.
Note for me : check_size_blocks.R l5:
if (any(class(y) %in% c("matrix", "data.frame"))) {

ecamenen and others added 3 commits December 8, 2020 15:28
Co-authored-by: M Bernt <m.bernt@ufz.de>
Co-authored-by: M Bernt <m.bernt@ufz.de>
@ecamenen
Copy link
Contributor Author

ecamenen commented Dec 8, 2020

I have not closed comments, but normally all suggestions made more than two days ago have been made.

Copy link
Contributor

@bernt-matthias bernt-matthias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bgruening ready for merge?

@bgruening bgruening merged commit 00f9e92 into galaxyproject:master Jan 12, 2021
@ecamenen
Copy link
Contributor Author

Hello @bernt-matthias, we would like to present our galaxy tool in congress. I thought that for future updates the tool would appear in the toolbar here https://usegalaxy.eu/ , but it doesn't seem to be the case. Can you tell me how to access it from "your official server" ?

@bernt-matthias
Copy link
Contributor

Hi @ecamenen you need to add it to this list: https://github.com/usegalaxy-eu/usegalaxy-eu-tools/blob/master/tools_iuc.yaml by opening a PR over there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants