package design suggestion #6

jorainer · 2017-10-20T05:41:33Z

@stanstrup, very nice work indeed! Always wanted to have such a packages and also started implementing something (https://github.com/jotsetung/xcmsExtensions), but never too serious.

My suggestion(s):

Keep functionality separate from the data:
Have dedicated data packages. This allows to have data packages from different sources or from different versions. See e.g. ensembldb and the EnsDb.Hsapiens.v75 package, or GenomicFeatures and the separate TxDb packages.

Define a CompoundDb class with main methods to query and access the database. E.g. have a method compound that retrieves compounds from the database and supports multiple filters. I know that's a little different setup (Bioconductor's rich, S4-based) than the tidyverse one, still, I think one could combine both worlds.

One could then define e.g. a HMDB class that simply extends the CompoundDb to accommodate HMDB-specific fields and attributes.

What might also be interesting is to implement the filters (or create filters that extend) AnnotationFilter (e.g. MzFilter or MassFilter). The MzFilter would have to calculate the mass for the provided mz. Ideal would be to have a MassFilter that takes a MzFilter as input, calculated the theoretical mass for the mz and returns a MassFilter.

The big advantages of having this setup would be:

Versioned data packages.
Data packages could be added to AnnotationHub.
Using the same interface (methods and filters) for different data resources
simplifies the use for the user (e.g. use the same method to retrieve
Integration into Bioconductor. Common concept for annotation resources.
You don't have to worry about licensing of the data resource in the main
package. Each data package could/should have its own version. For those that
don't allow sharing the data you could just provide the functionality to
create the resource in the package and let it to the user to create the
package for themselfs (if they have the license to do so).

I would be happy to contribute here (especially related to the database class,
interface methods and filters as I did all this already in ensembldb).

open for discussion

The text was updated successfully, but these errors were encountered:

stanstrup · 2017-10-20T07:44:31Z

Ad data package: Yes I think you are definitely right that this is the way to go. I will do that when I have parsed HMDB (need to ask if I can be allowed to include it) and pubchem.

Ad the the S4 stuff. I am not against that at all. But I think I have to say that that is not priority for me as I have not yet dwelled into the whole S4-thing yet. So right now it will be some investment of time to learn that stuff and less easy for me to maintain. I would also not want to drop the tidyverse style code as I find this so much easier to work with. But I guess that can be done as you said.
That said if you want to start on this I would be grateful and more than happy to change the structure. It is something I really want but I also need to priorities actually using the tool to do the projects I am supposed to be working on...

jorainer · 2017-10-20T08:03:32Z

If it's OK for you I would fork your repo and add the code related to the S4 approach - it's easier to describe/discuss if you can really get your hands onto it.

stanstrup · 2017-10-20T08:20:22Z

Yes of course. I think it would be less messy if you hold on for now though. I am adding the last things I have. The peaklist annotation and the browser itself.

stanstrup · 2017-10-20T11:06:39Z

@jotsetung I added the last functions and added full example to the README. I don't plan to do much more right now. Perhaps get Pubchem and HMDB parsers done.

jorainer · 2017-10-20T11:28:02Z

Cool! thanks! Could you please make a branch named e.g. S4? So, if I make a pull request at some time we don't mess with the master branch?

stanstrup · 2017-10-20T11:32:44Z

Sure! Done.

rsalek · 2017-10-20T14:07:20Z

I need to catchup, nice work so quickly put together!

stanstrup · 2017-10-23T08:15:04Z

I am starting to think that it might make sense to lock it down to defining an S4 object that takes a CAMERA object and adds the annotation (or extends the CAMERA object to allow this).
That way all the peaktable manipulation can be abstracted away. Downside is that you can then basically only use xcms/CAMERA (though I do have a function to fake a CAMERA object from an MzMine peaklist).

jorainer · 2017-10-23T08:46:31Z

Supporting to perform the annotation on a CAMERA object makes sense, but I would definitely keep low-level functions that don't required special S4 objects as input, so that it can be applied in general too.

jorainer mentioned this issue Oct 20, 2017

Databases to convert to tables #2

Closed

12 tasks

stanstrup added enhancement help wanted labels Oct 20, 2017

stanstrup mentioned this issue Oct 23, 2017

Where to distribute the package? #11

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

package design suggestion #6

package design suggestion #6

jorainer commented Oct 20, 2017

stanstrup commented Oct 20, 2017

jorainer commented Oct 20, 2017

stanstrup commented Oct 20, 2017

stanstrup commented Oct 20, 2017 •

edited

Loading

jorainer commented Oct 20, 2017

stanstrup commented Oct 20, 2017

rsalek commented Oct 20, 2017

stanstrup commented Oct 23, 2017

jorainer commented Oct 23, 2017

package design suggestion #6

package design suggestion #6

Comments

jorainer commented Oct 20, 2017

stanstrup commented Oct 20, 2017

jorainer commented Oct 20, 2017

stanstrup commented Oct 20, 2017

stanstrup commented Oct 20, 2017 • edited Loading

jorainer commented Oct 20, 2017

stanstrup commented Oct 20, 2017

rsalek commented Oct 20, 2017

stanstrup commented Oct 23, 2017

jorainer commented Oct 23, 2017

stanstrup commented Oct 20, 2017 •

edited

Loading