STAR 2.7.0x update #2316

bgruening · 2019-02-26T23:23:25Z

STAR changed the index layout in >2.7.0. So I think we need to have a new loc file or add a version column to the already existing loc file.

In addition, STAR added a few new features for scRNA libraries.

Does anyone have time to work on these updates?

ping @bwlang @yhoogstrate

wm75 · 2019-03-17T10:14:27Z

@bgruening I can spend some time on it this week

wm75 · 2019-03-18T19:09:46Z

After a first look at the new scRNA functionality I'm almost tempted to split this out into a separate tool.
Specifically, my idea would be to have two xmls living side-by-side sharing most of their code via macros, but getting packaged into two toolshed repos.

Advantages I can see:

- less nested conditionals
- no changes to the regular STAR wrapper -> makes it easier to update the tool version used in workflows
- could offer STAR solo in an scRNA category, while regular STAR could continue living in the RNA-Seq category of the Tools bar

Disadvantages:

- harder to understand that this is the same underlying command-line tool
- some wrapper code duplication (though of course, macros will help a lot)

This is really just an idea, but I thought I'd bring it up early before beginning any serious work on the wrapper. Opinions @bgruening, @bebatut, @mtekman, everyone?

bgruening · 2019-03-18T19:13:06Z

My initial feeling was a separate tool as well. But I stopped looking at it deeply.

mtekman · 2019-03-18T19:15:26Z

I've played around only slightly with STAR solo, but wouldn't the scRNA-only wrapper you are suggesting still share a ton of overlap with the vanilla tool? As in, doesn't STAR solo extend functionality, instead of providing completely separate functionality? So all the options from RNA-seq will still apply to the scRNA stuff?


wm75 · 2019-03-18T19:17:31Z

Sure, for users of STAR solo the benefits of a separate tool would be minimal.
For the (many more) users of regular STAR, however, we would avoid additional complexity, which they just don't need.

mtekman · 2019-03-18T19:23:28Z

Oh right, yes I guess for better UX this would definitely be the way forward


wm75 · 2019-03-20T09:54:46Z

Looking at the STARsolo option a bit more I'm starting to wonder whether it is really mature enough to offer it through Galaxy. I've found:

at least, which give me the impression that it may be better to wait one or two minor releases longer.
These aren't show-stoppers, i.e., they are not preventing me from working on a tool wrapper and a DM update, but maybe it's not yet time for a MTS release? In particular, I'm worried about genome index changes, which already seem to have occurred within the 2.7.0 series (i.e. with patch releases! - have to confirm this though) and we may not want to force Admins to keep on rebuilding indexes on every update.

wm75 · 2019-03-20T09:56:16Z

Another issue I've come across is alexdobin/STAR#594, which may require a bioconda recipe fix if reproducible.

wm75 · 2019-03-20T11:12:47Z

I cannot reproduce the bioconda issue mentioned above. Maybe it was fixed by upgrading to v2.7.0e.

alexdobin · 2019-03-20T22:58:41Z

Hi All,

interesting discussion here, I would like to put in my 2 cents and hear your recommendations.

The STARsolo releases are still somewhat buggy, but it seems to be stabilizing.
However, I will be adding more solo* options in the next 1-2 months.

The genome index - is it too painful for Galaxy users to re-generate the index?
I was conservative for a while trying to preserve the compatibility with old indexes.
I could, in principle, maintain compatibility and require new indexes only for new features (e.g. STARsolo), but that makes development more complicated.

STARsolo is still STAR - it just takes a few extra parameters and generates a few more outputs.

The segfault on "STAR --version" (with bioconda) looked strange to me. I will have to look into what bioconda is doing.

Cheers
Alex

wm75 · 2019-03-21T09:59:00Z

Hi Alex,
great to see you here!

I guess most of the points discussed so far are things I've raised simply to get some feedback, not because they are a serious concern. It's helpful to hear that you're sharing my opinion about stability of the STARsolo mode. It's not a problem to wait just a little bit longer until you have resolved the majority of the issues around it. I can prepare the Galaxy wrappers for STAR2.7, then release them only once we can couple them to a stable version. The bioconda issue could potentially be a serious problem, but as I commented above, I could not reproduce it so things look good on that side.

The thing that I'm most concerned about right now really is the index building:

If you want to understand why we have issues with rebuilding indexes, it may be useful to provide you with a bit of background on how Galaxy handles them (feel free to skip this if you aren't interested). Essentially, you can think of the Galaxy community as consisting of three spheres (with a rather huge overlap between them though):

- tool developers who work on wrapping tools (like STAR) for Galaxy and offering them for installation through Galaxy toolsheds (the most important one of these being the Main Toolshed or MTS)
- server administrators who manage a Galaxy server and install tools (from toolsheds) into it so that they can be used on that particular server instance
- end users who work on a Galaxy server and run installed tools on it

Now who is responsible for building the indexes and/or providing additional data that tools may require to function properly? This is addressed by a separate class of tools called data managers or DMs.
Data managers are special tools because they can only be run by server admins. A data manager knows how/where to obtain the data required by a specific regular tool, or in the case of indexes, how to build them. So Galaxy admins only have to execute the DM through Galaxy's UI and the DM will handle everything for them (build an index, for example, and put it in the right place on the server where the dependent tool can find it).
So like a regular tool, a data manager is written by tool developers and installed by admins. What's special is that it's also used by admins, while end users should not need to know anything about it.

One reason why this approach exists is that admins may install tools because users ask them to, but they may not know much about what the tool does and how it functions. In fact, they may have installed hundreds to thousands of tools on their server and simply cannot keep an overview of all of them. So making it easy for server admins to install everything that's needed for a particular tool to work is crucial.

So what are the problems with this approach:

- Server admins have to be aware that a tool they just installed requires a data manager to be run before the tool can be used. If they forget to run the DM, end users will, for example, not be able to select an index from the tool's user interface, so users will complain to their admin. It's particularly easy for admins to forget to run the DM if they only update to a newer version of a tool because they think that they have installed the index previously.
- It means extra effort for tool developers. Writing DMs is relatively complicated and, in cases like STAR, the DM needs to be adapted with every change to the STAR index structure.
  Why is that? Galaxy puts a lot of emphasis on reproducibility so, typically, when a server admin installs a new version of a tool they won't delete the old one, but keep it around, so that users of the earlier version can come back and reproduce their results with that version. Clearly though, it does not make any sense to offer end users an index that is not compatible with the version of the tool they are running. So it becomes the tool developers' responsibility to adapt the DM to store some kind of version info alongside the indexes, and to write the regular tool in a way that it only offers the right indexes when a user selects a particular version of the tool.
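
As a rough illustration of that last point, here is a minimal Python sketch of version-aware index selection. It is not Galaxy's actual implementation: the tab-separated layout, the column names, the `read_index_table` and `indexes_for` helpers, and the version labels are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical column layout for a loc-style table with an extra version column.
COLUMNS = ["value", "dbkey", "name", "path", "index_version"]


def read_index_table(text):
    """Parse tab-separated index entries, skipping blank and comment lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(io.StringIO("\n".join(lines)), delimiter="\t")
    return [dict(zip(COLUMNS, row)) for row in reader]


def indexes_for(entries, required_version):
    """Keep only the entries built in the index format a given tool version needs."""
    return [e for e in entries if e["index_version"] == required_version]


# Placeholder data: paths and version labels are illustrative only.
EXAMPLE = (
    "# value\tdbkey\tname\tpath\tindex_version\n"
    "hg38_old\thg38\tHuman (old layout)\t/data/star/hg38_old\told-layout\n"
    "hg38_new\thg38\tHuman (new layout)\t/data/star/hg38_new\tnew-layout\n"
)

entries = read_index_table(EXAMPLE)
selectable = indexes_for(entries, "new-layout")
print([e["value"] for e in selectable])
```

The DM would be the part that writes the extra column when it builds an index, and each version of the regular tool would ask only for the label it is compatible with.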

Why am I telling you all of this? Because it is relatively rare that an original tool author is aware of or cares about all of it, but you can make our lives quite a bit easier if you do :)

Of course, we don't want to (nor can we) put any restriction on your development process. If there is a good reason why you think you need to change the index structure between any versions of STAR, then go for it. Just keep in mind that this generates quite some overhead here on the Galaxy side of things. It's simply not as easy for us to cope with it as it may be for the average command line user.
If you change the index structure between versions, then you can help us a lot by announcing the change clearly and by stating explicitly which versions of your software are compatible with which index version. It is complicated enough to get the logic right if we know this association, but it's really bad if we have to guess, or have to study the source code of different versions to find out.
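
To show why an explicit statement helps, here is a minimal Python sketch of the lookup we would like to be able to write on the wrapper side. The table contents are placeholders, not real STAR compatibility data, and `parse_version` and `index_format_for` are hypothetical helpers. The point is only that the mapping has to come from an authoritative source rather than from guessing.

```python
# Placeholder table: (first tool version that uses a format, format label),
# sorted oldest to newest. NOT actual STAR compatibility information.
FORMAT_INTRODUCED = [
    ((2, 0, 0), "format-A"),
    ((3, 0, 0), "format-B"),
]


def parse_version(text):
    """Turn a dotted numeric version such as '2.5.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def index_format_for(tool_version):
    """Return the newest index format introduced at or before tool_version."""
    version = parse_version(tool_version)
    chosen = None
    for introduced, label in FORMAT_INTRODUCED:
        if version >= introduced:
            chosen = label
    if chosen is None:
        raise ValueError(f"no known index format for version {tool_version}")
    return chosen


print(index_format_for("2.5.1"))
print(index_format_for("3.1.0"))
```

This sketch only handles purely numeric versions. Release names with a letter suffix would need extra parsing, which is one more reason an announced compatibility list is easier to work with than inferred rules.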

Puh, a lengthy comment, sorry. Let me add to it that all the time you invest into developing STAR is really appreciated, and that it's very encouraging to see a tool author respond to requests as quickly as you.

abretaud · 2022-04-05T11:58:01Z

We have >2.7 now

alexdobin mentioned this issue Mar 21, 2019

Exact meaning of versionGenome in parametersDefault alexdobin/STAR#596

Closed

abretaud closed this as completed Apr 5, 2022
