STAR 2.7.0x update #2316

Closed
bgruening opened this issue Feb 26, 2019 · 12 comments

Comments

@bgruening
Member

STAR changed the index layout in the 2.7.0x releases, so I think we need either a new loc file or a version column in the existing loc file.
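To illustrate the version-column option, here is a minimal Python sketch of reading such a loc file and keeping only the entries built for one index layout. The column order, the paths, and the "old"/"new" version labels are assumptions made up for this example; they are not the layout of the existing STAR loc file.

```python
import csv
import io

# HYPOTHETICAL loc file contents: value, dbkey, name, path, version.
# Column order, paths and version labels are assumptions for illustration.
LOC_TEXT = (
    "# value\tdbkey\tname\tpath\tversion\n"
    "hg38_old\thg38\tHuman (hg38)\t/data/star/hg38_old\told\n"
    "hg38_new\thg38\tHuman (hg38)\t/data/star/hg38_new\tnew\n"
)

COLUMNS = ["value", "dbkey", "name", "path", "version"]


def read_loc(text):
    """Parse tab-separated loc lines, skipping blank lines and comments."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def indexes_for(rows, index_version):
    """Keep only the entries whose version column matches index_version."""
    return [row for row in rows if row["version"] == index_version]


rows = read_loc(LOC_TEXT)
for row in indexes_for(rows, "new"):
    print(row["value"], row["path"])
```

With a column like this, each wrapper version would only need to know which index layout it expects.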

In addition, STAR added a few new features for scRNA libraries.

Does anyone have time to work on these updates?

ping @bwlang @yhoogstrate

@wm75
Contributor

wm75 commented Mar 17, 2019

@bgruening I can spend some time on it this week

@wm75
Contributor

wm75 commented Mar 18, 2019

After a first look at the new scRNA functionality I'm almost tempted to split this out into a separate tool.
Specifically, my idea would be to have two XML wrappers living side by side, sharing most of their code via macros but packaged into two separate toolshed repos.

Advantages I can see:

  • less nested conditionals
  • no changes to the regular STAR wrapper -> makes it easier to update the tool version used in workflows
  • could offer STAR solo in an scRNA category, while regular STAR could continue living in the RNA-Seq category of the Tools bar

Disadvantages:

  • harder to understand that this is the same underlying command-line tool
  • some wrapper code duplication (though of course, macros will help a lot)

This is really just an idea, but I thought I'd bring it up early before beginning any serious work on the wrapper. Opinions @bgruening, @bebatut, @mtekman, everyone?

@bgruening
Member Author

My initial feeling was a separate tool as well, but I stopped before looking into it in depth.

@mtekman
Contributor

mtekman commented Mar 18, 2019 via email

@wm75
Contributor

wm75 commented Mar 18, 2019

Sure, for users of STAR solo the benefits of a separate tool would be minimal.
For the (many more) users of regular STAR, however, we would avoid additional complexity, which they just don't need.

@mtekman
Contributor

mtekman commented Mar 18, 2019 via email

@wm75
Contributor

wm75 commented Mar 20, 2019

Looking at the STARsolo option a bit more I'm starting to wonder whether it is really mature enough to offer it through Galaxy. I've found:

at least, which give me the impression that it may be better to wait one or two minor releases longer.
These aren't show-stoppers, i.e., they are not preventing me from working on a tool wrapper and a DM update, but maybe it's not yet time for a MTS release? In particular, I'm worried about genome index changes, which already seem to have occurred within the 2.7.0 series (i.e. with patch releases! - have to confirm this though) and we may not want to force Admins to keep on rebuilding indexes on every update.

@wm75
Contributor

wm75 commented Mar 20, 2019

Another issue I've come across is alexdobin/STAR#594, which may require a bioconda recipe fix if reproducible.

@wm75
Contributor

wm75 commented Mar 20, 2019

I cannot reproduce the bioconda issue mentioned above. Maybe it was fixed by upgrading to v2.7.0e.

@alexdobin

Hi All,

interesting discussion here. I would like to add my 2 cents and hear your recommendations.

The STARsolo releases are still somewhat buggy, but they seem to be stabilizing.
However, I will be adding more solo* options in the next 1-2 months.

About the genome index: is it too painful for Galaxy users to re-generate the index?
For a while I was conservative and tried to preserve compatibility with old indexes.
I could, in principle, maintain compatibility and require new indexes only for new features (e.g. STARsolo), but that makes development more complicated.

STARsolo is still STAR - it just takes a few extra parameters and generates a few more outputs.

The "STAR --version" segfault (with bioconda) looked strange to me. I will have to look into what bioconda is doing.

Cheers
Alex

@wm75
Contributor

wm75 commented Mar 21, 2019

Hi Alex,
great to see you here!

I guess most of the points discussed so far are things I've raised simply to get some feedback, not because they are a serious concern. It's helpful to hear that you share my opinion about the stability of the STARsolo mode. It's not a problem to wait a little longer until you have resolved the majority of the issues around it. I can prepare the Galaxy wrappers for STAR 2.7, then release them only once we can couple them to a stable version. The bioconda issue could potentially be a serious problem, but as I commented above, I could not reproduce it, so things look good on that side.

The thing that I'm most concerned about right now really is the index building:

If you want to understand why we have issues with rebuilding indexes, it may be useful to provide you with a bit of background on how Galaxy handles them (feel free to skip this if you aren't interested). Essentially, you can think of the Galaxy community as consisting of three spheres (with a rather huge overlap between them though):

  • tool developers who work on wrapping tools (like STAR) for Galaxy and offering them for installation through Galaxy toolsheds (the most important one of these being the Main Toolshed or MTS)
  • server administrators who manage a Galaxy server and install tools (from toolsheds) into it so that they can be used on that particular server instance
  • end users who work on a Galaxy server and run installed tools on it

Now who is responsible for building the indexes and/or providing additional data that tools may require to function properly? This is addressed by a separate class of tools called data managers or DMs.
Data managers are special tools because they can only be run by server admins. A data manager knows how/where to obtain the data required by a specific regular tool, or in the case of indexes, how to build them. So Galaxy admins only have to execute the DM through Galaxy's UI and the DM will handle everything for them (build an index, for example, and put it in the right place on the server where the dependent tool can find it).
So like a regular tool, a data manager is written by tool developers and installed by admins. What's special is that it's also used by admins, while end users should not need to know anything about it.

One reason why this approach exists is that admins may install tools because users ask them to, but they may not know much about what the tool does and how it functions. In fact, they may have installed hundreds to thousands of tools on their server and simply cannot keep track of all of them. So making it easy for server admins to install everything that's needed for a particular tool to work is crucial.

So what are the problems with this approach?

  • Server admins have to be aware that a tool they just installed requires a data manager to be run before the tool can be used. If they forget to run the DM, end users will, for example, not be able to select an index from the tool's user interface, so users will complain to their admin. It's particularly easy for admins to forget to run the DM if they only update to a newer version of a tool because they think that they have installed the index previously.
  • It means extra effort for tool developers. Writing DMs is relatively complicated and, in cases like STAR, the DM needs to be adapted with every change to the STAR index structure.
    Why is that? Galaxy puts a lot of emphasis on reproducibility, so when a server admin installs a new version of a tool they typically won't delete the old one but keep it around, so that users of the earlier version can come back and reproduce their results with that version. Clearly, though, it does not make sense to let end users select an index that is incompatible with the version of the tool they are running. So it becomes the tool developer's responsibility to adapt the DM to store some kind of version info alongside the indexes, and to write the regular tool so that it only offers the right indexes when a user selects a particular version of the tool.
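As a rough sketch of the DM side of this, the Python below builds a data table row that carries an index version and wraps it in the kind of JSON document a data manager hands back to Galaxy. The table name `star_indexes`, the `version` column, and the way the `value` is composed are hypothetical choices for illustration, not what the existing STAR data manager does.

```python
import json


def make_index_entry(dbkey, name, path, index_version):
    """Build one data table row that records which index layout was built.

    The "version" column and the value naming scheme are assumptions.
    """
    return {
        "value": f"{dbkey}_{index_version}",
        "dbkey": dbkey,
        "name": f"{name} (index {index_version})",
        "path": path,
        "version": index_version,
    }


def data_manager_output(table_name, entries):
    """Serialize the rows under a "data_tables" key, one list per table."""
    document = {"data_tables": {table_name: entries}}
    return json.dumps(document, indent=2, sort_keys=True)


entry = make_index_entry("hg38", "Human", "/data/star/hg38_new", "new")
print(data_manager_output("star_indexes", [entry]))  # table name is made up
```

Putting the version into `value` as well keeps entries for the same genome unique when old and new indexes sit side by side.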

Why am I telling you all of this? Because it is relatively rare that an original tool author is aware of, or cares about, all of this, but you can make our lives quite a bit easier if you do :)

  1. Of course, we don't want to (nor can we) put any restriction on your development process. If there is a good reason why you think you need to change the index structure between any versions of STAR, then go for it. Just keep in mind that this generates quite some overhead here on the Galaxy side of things. It's simply not as easy for us to cope with it as it may be for the average command line user.
  2. If you change the index structure between versions, then you can help us a lot by announcing the change clearly and by stating explicitly which versions of your software are compatible with which index version. It is complicated enough to get the logic right if we know this association, but it's really bad if we have to guess, or have to study the source code of different versions to find out.
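To show why an explicit association would help, here is a minimal Python sketch of how a wrapper could look up the index format for a given STAR version once that association is published. The table contents and the "format-A"/"format-B" labels are purely hypothetical placeholders, not actual STAR compatibility data.

```python
import bisect
import re

# HYPOTHETICAL table: (first STAR release using a format, format label).
# Both the release numbers and the labels are placeholders, not real data.
INDEX_FORMAT_CHANGES = [
    ((2, 5, 0, ""), "format-A"),
    ((2, 7, 0, "a"), "format-B"),
]


def parse_version(text):
    """Turn a STAR-style version such as '2.7.0e' into a sortable tuple."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch, letter = match.groups()
    return (int(major), int(minor), int(patch), letter)


def index_format_for(star_version):
    """Return the label of the latest format change at or before star_version."""
    starts = [start for start, _ in INDEX_FORMAT_CHANGES]
    pos = bisect.bisect_right(starts, parse_version(star_version))
    if pos == 0:
        raise ValueError(f"no index format known for {star_version}")
    return INDEX_FORMAT_CHANGES[pos - 1][1]


print(index_format_for("2.7.0e"))
```

With a published table of this kind, the lookup is a few lines; without it, every row has to be reconstructed from release notes or source code.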

Phew, a lengthy comment, sorry. Let me add that all the time you invest in developing STAR is really appreciated, and that it's very encouraging to see a tool author respond to requests as quickly as you do.

@abretaud
Contributor

abretaud commented Apr 5, 2022

We have >2.7 now

@abretaud abretaud closed this as completed Apr 5, 2022