Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding a ballgown wrapper (test not working) #1366

Open
wants to merge 11 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@CollardT
Copy link

commented Jun 14, 2017

The tests of the tool do not run on either planemo or the official Galaxy docker. I'm open to any critique.

Fix #1183.

Adding a ballgown wrapper (test not working)
The tests of the tool does not run on either planemo or the official Galaxy docker
@bgruening
Copy link
Member

left a comment

@CollardT Thanks so much for this tool. A short review inline.

@@ -0,0 +1,73 @@
#!/usr/bin/Rscript

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

!#/usr/bin/env Rscript

@@ -0,0 +1,294 @@
<tool id="ballgown" name="Ballgown" version="0.5.0" workflow_compatible="true">

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

Shouldn't be the version 2.2.0?

<requirement type="package" version="1.3.2">r-optparse</requirement>

</requirements>
<command interpreter="Rscript" detect_errors="aggressive">

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

interpreter is depreacted. Can you use Rscript $__tool_directory__/script.r instead.

## ...
## }
##------------------------------------------------------------------------------------
#def read_sample_mapping_file(sample_mapping_file):

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

This is a lot of magic, could you use configfiles instead? https://github.com/bgruening/galaxytools/blob/master/tools/sklearn/nn_classifier.xml#L15

Or a separate python file?

#end for

## Run the R script with the location of the linked files and the name for outpot file
ballgown.R --directory $output.files_path --outputtranscript $output --outputgenes $outputgn --texpression $trexpression --phendat $phendata --bgout $bgo

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

please single quote all paths and text parameters.

ballgown.R --directory $output.files_path --outputtranscript $output --outputgenes $outputgn --texpression $trexpression --phendat $phendata --bgout $bgo
</command>
<inputs>
<param name="e_data" type="data_collection" collection_type="list" format="tabular" label="Exon-level expression measurements" help="One row per exon. See below for more details."/>

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

indention looks wrong here.


.. class:: infomark

'''TIP''' *Note* Here's an example of a good phenotype data file for your expirement.

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

Nice docu!!!

<param argument="--texpression" name="trexpression" type="float" value="0.5" label="minimal transcript expression to appear in the results"/>
</inputs>
<outputs>
<data name="bgo" format="rda" label="${tool.name} on ${on_string}: ballgown object (R data file)"/>

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

Can we make this optional? Maybe via a select box or a boolean?

This comment has been minimized.

Copy link
@CollardT

CollardT Jun 15, 2017

Author

What should be optional exactly? The last input or the first output?
Because in the case of the output I cannot do much. When I tried to have an input that is also an output, Galaxy made my tool disappear from my toolshed.

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

The last output. Similar to here:

<filter>image_format['format']=="png"</filter>

</collection>
</param>
<param name="phendata" value="pdata.csv"/>
<output name="outputgn" file="genes_expression(tabular).csv"/>

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

I don't think it is a good idea to have ( in file names, can you use _ or just name is genes_expression.tabular?

This comment has been minimized.

Copy link
@bgruening

bgruening Jun 15, 2017

Member

Btw. this file is really a CSV file. I think in the Galaxy context it would be better to output a tsv file. You can post-process this with some shell magic if you like.

@bgruening

This comment has been minimized.

Copy link
Member

commented Jun 25, 2017

@CollardT please also add a .shed.yml file to the repo. This way the tests are triggered and we can upload it automatically to the TS.
Thanks.

CollardT and others added some commits Jun 26, 2017

Modifications following the differents comments made
added a .shed.yml
changed the some of the elements:
- parameters are now single quote
- rewritten the env informations
- corrected some errors made.
Adding a Ballgown wrapper
Latest modifications:
- added a .shed.yml file
- added a condition to the tool to create tsv or csv files
- changed some little things around: env in the R script, single quotes,
 etc
@bgruening

This comment has been minimized.

Copy link
Member

commented Aug 5, 2017

@CollardT I fixed a few linting errors the tests are now running and you have not specified the e_data in your tests. Let us know if you need help.

@CollardT

This comment has been minimized.

Copy link
Author

commented Aug 8, 2017

@bgruening Can you add this wrapper to the Tool shed of Galaxy? If it needs any further enhancements, I'm open to it.
Also, I don't see my e_data not being specified in the test.
And finally, you said in one of your previous comment that an external python script can be used to move the huge command block in the XML file. But I don't know how to use the Galaxy API outside of the XML file. Do you have by chance an example or a tutorial explaining how to do so?

@bgruening

This comment has been minimized.

Copy link
Member

commented Aug 9, 2017

@CollardT at the moment we can not merge this as the test are failing. See here for more details: https://travis-ci.org/galaxyproject/tools-iuc/jobs/261304875

Using external python/r/... scripts is not a problem and you don't need to use the Galaxy API here. Have a look at this example for a simple r script: https://github.com/galaxyproject/tools-iuc/tree/master/tools/goseq

Essentially, just write a python script that you can call from your command section if this is really needed.

CollardT added some commits Aug 14, 2017

@nsoranzo

This comment has been minimized.

Copy link
Member

commented Aug 14, 2017

@CollardT I'm not sure that commit f8f58a6 is correct, we generally prefer to use multiple="true" for multiple inputs, which accommodates both multiple datasets and a list collection. Am I missing something?

CollardT and others added some commits Aug 15, 2017

@nsoranzo

This comment has been minimized.

Copy link
Member

commented Sep 6, 2017

I've added some fixes to @CollardT branch, but the way multiple inputs are handled still need to be fixed and I'm not sure which is the best among list collections, multiple="true" or repeats for this specific tool.

</param>
<when value="tsv"/>
<when value="csv"/>
</conditional>

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 9, 2017

Member

The wrapper can be simplified a lot if we remove this conditional and just use tsv (i.e. tabular Galaxy datatype). Is csv format somehow useful?

#else:
#set dir_name = str($output.files_path) + '/' + $key + '/'
#end if
$os.makedirs($dir_name)

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

You shouldn't use $output.files_path as working directory, just create a subdir in the present working directory, e.g.:

mkdir 'work_dir/$key'
#set $result = $read_input_files("i2t.ctab", $i2t, $result, $sample_mapping, False)
## For each input sample, create a directory and link the input files for ballgown
#import os

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

Remove #import os and add:

mkdir workd_dir
#end if
$os.makedirs($dir_name)
#for $file_name, $file_path in $value.iteritems():
$os.symlink($file_path, $dir_name + $file_name)

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

ln -s '$file_path' 'workd_dir/$key/$file_name'

#if str($file_format.format) == 'tsv':
--tsvoutputtranscript $toutputtranscript
--tsvoutputgenes $toutput
--directory $toutput.files_path

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

--directory work_dir

#else:
--outputtranscript $output
--outputgenes $outputgn
--directory $output.files_path

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

--directory work_dir

<data name="outputgn" format="csv" from_work_dir="output_genes.csv" label="${tool.name} on ${on_string}: genes_expression_tabular">
<filter>file_format['format']=="csv"</filter>
</data>
<data name="toutputtranscript" format="tabular" from_work_dir="output_transcript.tsv" label="${tool.name} on ${on_string}: transcripts_expression_tabular">

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

Remove unused from_work_dir.

</inputs>
<outputs>
<data name="bgo" format="rdata" from_work_dir="ballgown_object.rda" label="${tool.name} on ${on_string}: ballgown_object_R_data_file"/>
<data name="output" format="csv" from_work_dir="output_transcript.csv" label="${tool.name} on ${on_string}: transcripts_expression_tabular">

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

Remove unused from_work_dir.

<data name="output" format="csv" from_work_dir="output_transcript.csv" label="${tool.name} on ${on_string}: transcripts_expression_tabular">
<filter>file_format['format']=="csv"</filter>
</data>
<data name="outputgn" format="csv" from_work_dir="output_genes.csv" label="${tool.name} on ${on_string}: genes_expression_tabular">

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

Remove unused from_work_dir.

<data name="toutputtranscript" format="tabular" from_work_dir="output_transcript.tsv" label="${tool.name} on ${on_string}: transcripts_expression_tabular">
<filter>file_format['format']=="tsv"</filter>
</data>
<data name="toutput" format="tabular" from_work_dir="output_genes.tsv" label="${tool.name} on ${on_string}: genes_expression_tabular">

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

Remove unused from_work_dir.

</conditional>
</inputs>
<outputs>
<data name="bgo" format="rdata" from_work_dir="ballgown_object.rda" label="${tool.name} on ${on_string}: ballgown_object_R_data_file"/>

This comment has been minimized.

Copy link
@nsoranzo

nsoranzo Nov 14, 2017

Member

Remove unused from_work_dir.

@bimbam23

This comment has been minimized.

Copy link
Contributor

commented Apr 10, 2018

Maybe the "$" needs to be escaped? I had a similar problem before. But I do not know much about cheetah code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.