Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

package design suggestion #6

Open
jorainer opened this issue Oct 20, 2017 · 9 comments
Open

package design suggestion #6

jorainer opened this issue Oct 20, 2017 · 9 comments

Comments

@jorainer
Copy link

@stanstrup, very nice work indeed! Always wanted to have such a packages and also started implementing something (https://github.com/jotsetung/xcmsExtensions), but never too serious.

My suggestion(s):

Keep functionality separate from the data:
Have dedicated data packages. This allows to have data packages from different sources or from different versions. See e.g. ensembldb and the EnsDb.Hsapiens.v75 package, or GenomicFeatures and the separate TxDb packages.

Define a CompoundDb class with main methods to query and access the database. E.g. have a method compound that retrieves compounds from the database and supports multiple filters. I know that's a little different setup (Bioconductor's rich, S4-based) than the tidyverse one, still, I think one could combine both worlds.

One could then define e.g. a HMDB class that simply extends the CompoundDb to accommodate HMDB-specific fields and attributes.

What might also be interesting is to implement the filters (or create filters that extend) AnnotationFilter (e.g. MzFilter or MassFilter). The MzFilter would have to calculate the mass for the provided mz. Ideal would be to have a MassFilter that takes a MzFilter as input, calculated the theoretical mass for the mz and returns a MassFilter.

The big advantages of having this setup would be:

  • Versioned data packages.
  • Data packages could be added to AnnotationHub.
  • Using the same interface (methods and filters) for different data resources
    simplifies the use for the user (e.g. use the same method to retrieve
  • Integration into Bioconductor. Common concept for annotation resources.
  • You don't have to worry about licensing of the data resource in the main
    package. Each data package could/should have its own version. For those that
    don't allow sharing the data you could just provide the functionality to
    create the resource in the package and let it to the user to create the
    package for themselfs (if they have the license to do so).

I would be happy to contribute here (especially related to the database class,
interface methods and filters as I did all this already in ensembldb).

open for discussion

@stanstrup
Copy link
Owner

Ad data package: Yes I think you are definitely right that this is the way to go. I will do that when I have parsed HMDB (need to ask if I can be allowed to include it) and pubchem.

Ad the the S4 stuff. I am not against that at all. But I think I have to say that that is not priority for me as I have not yet dwelled into the whole S4-thing yet. So right now it will be some investment of time to learn that stuff and less easy for me to maintain. I would also not want to drop the tidyverse style code as I find this so much easier to work with. But I guess that can be done as you said.
That said if you want to start on this I would be grateful and more than happy to change the structure. It is something I really want but I also need to priorities actually using the tool to do the projects I am supposed to be working on...

@jorainer
Copy link
Author

If it's OK for you I would fork your repo and add the code related to the S4 approach - it's easier to describe/discuss if you can really get your hands onto it.

@stanstrup
Copy link
Owner

Yes of course. I think it would be less messy if you hold on for now though. I am adding the last things I have. The peaklist annotation and the browser itself.

@stanstrup
Copy link
Owner

stanstrup commented Oct 20, 2017

@jotsetung I added the last functions and added full example to the README. I don't plan to do much more right now. Perhaps get Pubchem and HMDB parsers done.

@jorainer
Copy link
Author

Cool! thanks! Could you please make a branch named e.g. S4? So, if I make a pull request at some time we don't mess with the master branch?

@stanstrup
Copy link
Owner

Sure! Done.

@rsalek
Copy link

rsalek commented Oct 20, 2017

I need to catchup, nice work so quickly put together!

@stanstrup
Copy link
Owner

I am starting to think that it might make sense to lock it down to defining an S4 object that takes a CAMERA object and adds the annotation (or extends the CAMERA object to allow this).
That way all the peaktable manipulation can be abstracted away. Downside is that you can then basically only use xcms/CAMERA (though I do have a function to fake a CAMERA object from an MzMine peaklist).

@jorainer
Copy link
Author

Supporting to perform the annotation on a CAMERA object makes sense, but I would definitely keep low-level functions that don't required special S4 objects as input, so that it can be applied in general too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants