Assays in MSnExperiments #444

Open
lgatto opened this issue Feb 21, 2019 · 46 comments

Comments

@lgatto
Owner

commented Feb 21, 2019

I'm thinking out loud here, but I am wondering if it would be a good move to fuse the low-level MSnExperiment and the other classes that contain processed data, such as features (XCMSFeatureSet) and PSMs/peptides/proteins (MSnSet). The main reason for this is that having everything in one structure would support cross-level visualisation. (Now I remember we briefly talked about a super-class idea that we ditched, so maybe this is leading us back there, but let's see.)

At this stage, the raw MS data is stored in the Backend slot. We don't have an assayData slot any more. My idea would be to bring it back (as an environment, for example) to hold processed data originating from the backend. That assayData could contain a matrix with quantitation values for PSMs/peptides/proteins and whatever is used for features (I can't remember right now if that was also a matrix).

Using [ (with x, i, j = missing, drop = missing), [[ and spectra would access the spectra in the raw data (as always), and data in the assayData slot would be accessed with assay(x, "assay") or [ (with x, i, j, drop = missing), for example

  • assay(x, "exprs") or exprs(x) for the quantitation matrix
  • assay(x, "count") or count(x) for the count data (if we wanted to distinguish this explicitly from the quant matrix)
  • assay(x, "features") or features(x) to access the feature data
  • x[i, j] if there's only one assay, possibly with a default one
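A minimal sketch of this accessor split, in plain R with the methods package. It assumes a hypothetical ToyExperiment class whose assays slot is a named list of matrices (rather than an environment) and whose spectra slot is a plain list standing in for the Backend; the class, its slots and the "exprs" default are illustrative only, and [ subsetting is left out.

```r
library(methods)

## Hypothetical toy class: raw data stand-in plus a named list of assays
setClass("ToyExperiment",
         slots = c(spectra = "list",          # stand-in for the Backend
                   assays = "list",           # named list of matrices
                   defaultAssay = "character"),
         prototype = prototype(defaultAssay = "exprs"))

setGeneric("assay", function(x, i, ...) standardGeneric("assay"))

## assay(x, "exprs"), assay(x, "count"), ...; assay(x) uses the default
setMethod("assay", "ToyExperiment", function(x, i, ...) {
    if (missing(i)) i <- x@defaultAssay
    if (!i %in% names(x@assays))
        stop("no assay named '", i, "'")
    x@assays[[i]]
})

## exprs(x) as a shortcut for assay(x, "exprs")
exprs <- function(x) assay(x, "exprs")

## [[ keeps its meaning: it returns a single spectrum from the raw data
setMethod("[[", "ToyExperiment", function(x, i, j, ...) x@spectra[[i]])

m <- matrix(1:6, nrow = 2,
            dimnames = list(c("pep1", "pep2"), paste0("S", 1:3)))
x <- new("ToyExperiment",
         spectra = list(sp1 = c(mz = 100.1), sp2 = c(mz = 200.2)),
         assays = list(exprs = m, count = m * 0L + 1L))

exprs(x)           # the quantitation matrix
assay(x, "count")  # the count matrix
x[["sp2"]]         # a single raw "spectrum"
```

Keeping the assays in a named list means the generic accessor and the convenience shortcuts (exprs, count, ...) all reduce to a lookup by name.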

Each of these rows in the assay would point to a collection of feature variables/spectra. This could be

  • a single feature data/spectrum (for a PSM, where we have a single match between an MS2 spectrum and a peptide), or
  • several feature data/spectra, for example for a peptide (identified by several PSMs) or a protein (identified by several peptides), or
  • an M/Z feature, matched by several spectra and their feature data over a retention time period.

We would need to find a clever way to manage these pointers.
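One simple way to hold such pointers, sketched here as an assumption rather than a decided design: a named list with one element per assay row, each storing the integer indices of the spectra (rows of the feature data) it derives from. The row names and indices are made up to cover the three cases listed above, and spectraFor/rowsFor are hypothetical helpers.

```r
## Each assay row stores the indices of the spectra it was derived from
spectraIdx <- list(
    psm1  = 1L,          # PSM: exactly one MS2 spectrum
    psm2  = 2L,
    pepA  = c(1L, 2L),   # peptide: supported by several PSMs/spectra
    mzrt1 = 3:5          # m/z feature: several spectra over an RT window
)

## Spectra behind a given assay row
spectraFor <- function(idx, row) idx[[row]]

## Assay rows that a given spectrum contributes to (reverse lookup)
rowsFor <- function(idx, spectrum) {
    names(idx)[vapply(idx, function(k) spectrum %in% k, logical(1))]
}

spectraFor(spectraIdx, "pepA")  # 1 2
rowsFor(spectraIdx, 2L)         # "psm2" "pepA"
```

The reverse lookup scans the whole list, which is fine for a sketch but would likely need a dedicated index for large experiments.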

One drawback I see is that the rows of an assay no longer match those of the feature data one-to-one, but we still have a 1-to-1 match between spectra and feature data. I briefly considered having two feature data slots, but I don't like this at all.

The assays would be populated during processing, for example

## proposed behaviour, x being an experiment with raw data in its backend
length(x) ## 12345 spectra
x <- addIdentificationData(x, 'id.mzid') ## adding feature data
## at this stage, no dims
dim(x) ## 0 0
x <- quantify(x, reporters = TMT10)
length(x) ## still 12345
dim(x) ## for example, 2981 by 10, as here we quantify only MS2 spectra
exprs(x)  ## a matrix

With this object, we could extract an M/Z feature/peptide/protein..., and get all its corresponding spectra. Alternatively, extract a spectrum, and get the M/Z features/peptides associated with it.

There are other points that need to be clarified here, but before considering this more seriously, I wanted to pass it along to you.

@sgibb @jotsetung - comments, suggestions, clarifications?

@jorainer

Collaborator

commented Feb 21, 2019

I like the idea - but am wondering if it makes sense to define a single superclass that covers all possible usage scenarios. I'm afraid that this will overcomplicate it. Also, I would not add an assayData slot to MSnExperiment. IMHO MSnExperiment should be as lightweight as possible - in principle just a list of spectrum objects with additional spectrum annotations.

I however totally agree that the MSnSet could be replaced by an object that extends MSnExperiment (and has an @assayData slot or something similar). That is exactly what I have in xcms. The XCMSnExp object has a slot @msFeatureData which is essentially an environment containing certain data (adjusted retention times, identified chromatographic peaks or feature definitions), so this is somewhat equivalent to an assayData. Currently this object extends OnDiskMSnExp, but I am really hoping to change it to MSnExperiment/Spectra to be able to use all backends.
A reason I don't want to deviate too much from my implementation of the XCMSnExp: it took me quite some time to convince the xcms users to accept the new object - some will definitely be upset if I change it again.

My suggestion would be that for now we have one object extending MSnExperiment in each domain (proteomics and metabolomics (= the XCMSnExp)) and we then compare them and think how they can be fused.

@lgatto

Owner Author

commented Feb 22, 2019

Point taken - I will experiment with this in another class that inherits from MSnExperiment.

I think, however, that your "as lightweight as possible" argument is wrong. Adding that one slot won't make it heavy (or slow, or large, ...) - I think this is a biased developer's perspective.

@jorainer

Collaborator

commented Feb 22, 2019

I think experimenting with an object extending the MSnExperiment is the right way to go for now - we can always merge them later.

And yes, you're absolutely right, lightweight is the wrong term. I just wanted to keep it as simple and generic as possible. For use cases where we just need a list of Spectrum objects (such as with the CompoundDb or representing MS2 spectra from an annotation database) the assayData would not be needed.

Actually, what if we renamed the current MSnExperiment then to Spectra (remove the old implementation) and call the object with the assayData slot MSnExperiment?

@lgatto

Owner Author

commented Feb 22, 2019

I think the notion of experiment is essential and shouldn't be understated by calling the object simply Spectra. There's also metadata, annotations...

I do struggle to find a name for the subclass with an assay slot though 🤔

@jorainer

Collaborator

commented Feb 22, 2019

You think so? Hm, I actually liked the Chromatograms/Spectra names - simple, short and tell what their main data type is. Both have row and column annotations, and in fact can represent experiments but don't have to.

Maybe, after sleeping on it, you might also like renaming MSnExperiment to Spectra 😄
The good sides of it: no more struggling to find a name for the subclass, and we can replace the current Spectra object in MSnbase! Otherwise we really have quite some redundancy (MSnExp, OnDiskMSnExp, Spectra, MSnExperiment).

And the downside of not calling it Spectra: to be consistent we would have to rename Chromatograms into something like ChromatogramExperiment or MrmExperiment - and this will be troublesome for many xcms users as Chromatograms and Chromatogram are currently heavily used result objects in xcms...

@lgatto

Owner Author

commented Feb 22, 2019

Still thinking out loud here...

Taking the Spectra vs. MSnExperiment discussion further, what if we had a Spectra that contained only the backend (with the raw data) and the feature metadata - a bit like the Spectra we have now, but basically with a flexible/proper backend.

MSnExperiment has

  • an (optional) spectraData slot
  • an (optional) assayData slot
  • a mandatory (but possibly empty) sample metadata slot
  • a mandatory general metadata list slot
  • a mandatory (but possibly empty) feature metadata slot (*) that would relate to elements in the assay and/or spectra slots.

So that an MSnExperiment with a spectraData slot would be the equivalent of the current MSnExp (but better) and an MSnExperiment with a assayData slot would be the equivalent of the current MSnSet (but better).

(*) Could we simply move the colData from a Spectra object to the MSnExperiment feature metadata slot?

The official data structure would be MSnExperiment, and Spectra would be more low-level, development-oriented? The long-term plan would be to discontinue MSnExp and OnDiskMSnExp (as long as this doesn't break xcms, of course).

@jorainer

Collaborator

commented Feb 22, 2019

Yes, I was thinking along the same lines too. Only, I would keep the @spectraData (aka featureData) slot in the Spectra object. This would then contain the header information from the spectra, so it is crucial to have it in there - and it allows arbitrary additional spectrum metadata columns (the reason why I implemented the current Spectra object).

So, my slightly modified suggestion:
Spectra has

  • spectraData slot.
  • backend slot.
  • processingQueue slot.
  • metadata slot.
  • optional processing slot to keep track of processings - would be better to have that in MSnExperiment, but then we would have to implement methods twice, just to add an entry to this slot.

The sampleData goes to the MSnExperiment:

  • sampleData slot.
  • assayData slot (an environment)?

Would be nice if the MSnExperiment with the assayData slot would be kept generic enough for me to extend it in xcms...

Regarding breaking xcms - my plan was to change XCMSnExp to extend MSnExperiment/Spectra instead of OnDiskMSnExp - hope that will work out.

@lgatto

Owner Author

commented Feb 22, 2019

I would prefer the generic metadata to be at the experiment level. That's part of the experiment data, rather than the actual data.

Not sure if we understood spectraData the same way - I was referring to the actual Spectra object, that would be in a slot called spectraData. I think you (correctly) referred to the Spectra mcols slot, which of course is needed in the Spectra.

So my question is, could that mcols from Spectra be moved/linked to the new feature metadata slot at the experiment level, so that (1) it can also be used in cases where there are no spectra but only an assay in addition to cases with (2) only spectra and (3) cases with both spectra and assays.

A similar question arises for the processing slot - there can be processing on the spectra and/or the assays. How can we keep these apart? A processing list with elements spectra and assays?

Or would we want two data-only classes: Spectra and MSnAssay with their own processing and feature/spectra metadata slots, that are brought into an MSnExperiment?

And yes, we should definitely make sure that the assay slot is generic enough to accommodate your XCMSnExp use case.

@jorainer

Collaborator

commented Feb 22, 2019

Re metadata: also OK for me to go to the experiment level.

Re spectraData: you're right. I was referring to the featureData that we currently have in MSnExp. Only, in the current MSnExperiment it is called spectraData. Confusing. In the future Spectra implementation we don't need a @spectraData slot. The spectra data is in fact in the @backend slot (or in the mzML or HDF5 files).

I would suggest the following:

library(S4Vectors)  ## DataFrame
## Backend is the virtual backend class from the backends branch

setClass(
    "Spectra",
    slots = c(
        backend = "Backend",
        ## was featureData in MSnExp, could also be called spectraMetaData
        featureData = "DataFrame",
        spectraProcessingQueue = "list",
        ## logging
        processing = "character",
        version = "character"
    ),
    prototype = prototype(version = "0.1")
)

setClass(
    "MSnExperiment",
    slots = c(
        ## was phenoData in MSnExp
        sampleData = "DataFrame",
        ## metadata
        metadata = "list",
        assayData = "environment",
        ## needed, as the prototype below sets it
        version = "character"
    ),
    prototype = prototype(version = "0.1")
)

The Spectra@featureData (although I would prefer to call it spectraMetaData) is in fact equivalent to the mcols we have in the current Spectra implementation. I would not necessarily link that to features in the assay slot (that's also a reason I would not call it featureData), although we could. But then there is the question what if you have spectra and want to annotate features in assayData? In the XCMSnExp I keep feature annotations in an environment slot. Users access it with the featureDefinitions method.

To keep the processing steps for spectra and assays apart, I would probably use a spectraProcessing slot in Spectra and an assayProcessing slot in MSnExperiment.

The rest should work. All methods (accessors, filters etc) could be implemented for the Spectra object without any need to have an additional implementation for MSnExperiment - with the exception of [ and filterFile that need also to subset sampleData.

I have implemented already almost all accessor etc methods, so, once we have pull request #440 in, I could split the current MSnExperiment we have in the backends branch into Spectra and the new MSnExperiment, adapt the methods, remove the old Spectra implementation. You could then work on the new MSnExperiment object and I could check how much I would break and have to fix in xcms.

@lgatto

Owner Author

commented Feb 22, 2019

We need to clarify how to deal with the feature meta data in the Spectra and in the Assays. Ideally, we don't want to duplicate these DataFrames. I have several related questions:

  • Implementation: how do we refer to the same feature metadata DataFrame and keep track of the one-to-one and one-to-many relations?
  • Where would that feature metadata DataFrame live - at the experiment-level ideally, I suppose?
  • A Spectra object on its own has its own feature metadata DataFrame, but when it exists in an MSnExperiment, does that feature metadata DataFrame then 'move up' to the experiment level?

These questions don't really apply to the assays because there are two pipelines

  1. start from raw data (or Spectra) to create an experiment with a Spectra slot; the experiment inherits the feature (spectra) metadata DataFrame.
  2. an assay is computed from the raw data, the assayData slot gets populated, and the new features point to the existing feature metadata DataFrame
    or
  3. create a new experiment with only quantitative data (equivalent to the current MSnSet)

Populating raw data (i.e. Spectra) for an experiment that already has an assay makes less sense.

@lgatto

Owner Author

commented Feb 22, 2019

Last reflection before I switch off for the night, referring to your Chromatogram(s) naming conflict. An MSnExperiment could possibly also contain Chromatogram(s), just like it can contain Spectra and Assays...

@jorainer

Collaborator

commented Feb 23, 2019

I'm not familiar enough with the MSnSet so I don't quite understand what type of features you have and how they should/could be linked between the Spectra and Assays. But according to point 3 it would make sense that Assays has its own feature DataFrame, and the one in Spectra would be empty, because there are no spectra.

I can tell you how I organize data in xcms, maybe it helps: in XCMSnExp I have a matrix of detected chromatographic peaks as a first set of results (one row per peak, one matrix for all samples, to know in which sample the peak was identified I have a column with the sample index). Then, after correspondence, I have a DataFrame called featureDefinitions that essentially groups peaks across samples. To link the feature to the peaks I have a column peak_idx which is essentially a list, each element with the indices of the chromatographic peaks in the peaks matrix assigned to that feature (allows n:m mapping). Both these results go into an environment in XCMSnExp. I have no direct link to the spectra, but it is just very handy to have them: using the retention time range and m/z range for the chromatographic peaks I can always extract Chromatograms on the fly from the raw data.
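The peak/feature layout described above can be mimicked in a few lines of base R. This is a toy sketch, not xcms code: the column names loosely follow xcms's peak matrix and featureDefinitions, the values are invented, and a base data.frame with a list column replaces the DataFrame.

```r
## Toy chromatographic peak matrix: one row per peak, across all samples;
## the "sample" column records in which sample the peak was found.
peaks <- cbind(mz     = c(200.10, 200.11, 200.09, 350.20, 350.21),
               rt     = c(61.2, 62.0, 60.8, 120.5, 121.0),
               into   = c(1500, 1720, 1610, 800, 950),
               sample = c(1, 2, 3, 1, 3))

## Toy feature definitions; peak_idx is a list column holding, for each
## feature, the row indices of its peaks in `peaks` (allows n:m mapping).
features <- data.frame(mzmed = c(200.10, 350.205), rtmed = c(61.2, 120.75))
features$peak_idx <- list(c(1L, 2L, 3L), c(4L, 5L))

## Peaks grouped into the second feature
peaks[features$peak_idx[[2]], , drop = FALSE]

## Features a given peak was assigned to (reverse direction)
which(vapply(features$peak_idx, function(k) 4L %in% k, logical(1)))
```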

So from my point of view it makes sense to have a feature data frame in Spectra, possibly naming it spectraMetaData to avoid confusion with higher-level features in Assays. Also, for me the current implementation of MSnExperiment would be ideal, because it contains everything I need: a DataFrame containing sample descriptions (@sampleData) and the spectra information.

What I would propose (as it would fit best into my use cases):

  1. Keep the MSnExperiment implementation as it is now.
  2. My XCMSnExp will extend MSnExperiment - everything should then work out of the box.
  3. You could have a QMSnExperiment (quantified MSn experiment?) with an assayData etc.

I know you don't like the idea, but still, I think it would be handy to rename MSnExperiment to Spectra - that way we could silently replace the current Spectra implementation without having to go through deprecation cycles etc. Just an idea - I can also live with keeping it as MSnExperiment.

@lgatto

Owner Author

commented Feb 23, 2019

Here's a schema of what my current thoughts are.

  • If you only want spectra, you could have a Spectra object.
  • If you only have quantitative data, you could have an Assays (could be a SummarizedExperiment or an MSnSet).
  • If you only have chromatograms, you could have a Chromatograms object.

All, some or only one of these can be encapsulated into an experiment, which would make sure that the links between spectra, chromatograms and assay features are tracked when the quantitative features are computed from the spectra or chromatograms.

An *experiment* (tentatively named `MSnExperiment`) contains
- general metadata
- feature metadata (optional, if available elsewhere) <-?-?-?-?-?-?-?-?-?-?-?-?--+
- sample metadata (optional, if available elsewhere) <------------------------+  |
- *assays* i.e. quantitative features (MS1 features, as in xcms, <----------+ |  |
  peptides, proteins) (optional)                                            | |  |
- *spectra* (optional) <-------------------------------------------------+  | |  |
- *chromatograms* (optional)                                             |  | |  |
                                                                         |  | |  |
*Spectra* contain -------------------------------------------------------+  | |  |
- spectra (how these are stored depending on the backend)                   | |  |
- feature (spectra) metadata <-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?--|-|--+
- processing queue (for lazy eval)                                          | |  |
- processing log                                                            | |  |
                                                                            | |  |
*Assays* contain <----------------------------------------------------------+ |  |
- an assay of quantitative features (let's call them *qfeatures* for          |  |
  now)                                                                        |  |
- qfeatures metadata <-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?-?--|--+
- sample metadata <-----------------------------------------------------------+
- processing log

*Chromatogram* contains ... (see the class definition, but will possibly
need updates to fit with this general idea at some point)

What I currently don't know, is how to integrate the different spectra/assays (and later chromatogram) feature metadata. We don't want to have individual feature metadata at the experiment level.

But maybe this isn't an issue after all. I think that the most general pipeline for an experiment would be to

  1. Create an experiment from spectra (or from chromatograms, or both)
  2. Compute the quantitative features. The qfeatures metadata would then, by definition, match the spectra metadata.

or

  1. I have some spectra (in a Spectra object), and I decide to convert these into an experiment to benefit from extra metadata (sample and generic) and because I want to quantify these data.
  2. Compute the quantitative features to populate the assay.

but the following situation would be problematic

  • I have some spectra
  • I have quantified features
  • I want to combine it all as an experiment
    How would we (1) make the link between the spectra and qfeatures metadata, and (2) where would we store these?
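For this problematic case, one possible approach (an assumption, not something decided in this thread) is to join the two metadata tables on an identifier both carry, such as a spectrum ID reported in the identification file. The column names and values below are invented; unmatched qfeatures come out as NA, which makes missing links explicit.

```r
## Hypothetical spectra metadata (one row per spectrum) and
## independently obtained qfeature (PSM) metadata
spectraMeta <- data.frame(
    spectrumId = c("scan=10", "scan=11", "scan=12", "scan=13"),
    msLevel = c(1L, 2L, 2L, 2L),
    stringsAsFactors = FALSE)
qfeatMeta <- data.frame(
    psm = c("psm1", "psm2", "psm3"),
    spectrumId = c("scan=12", "scan=11", "scan=99"),
    stringsAsFactors = FALSE)

## Link each qfeature to a spectrum row via the shared identifier;
## NA flags qfeatures whose spectrum is not among the available spectra
qfeatMeta$spectrumIdx <- match(qfeatMeta$spectrumId, spectraMeta$spectrumId)
qfeatMeta
```

This only answers the linking question; where the resulting index column should live is still the open design point.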

Looking forward to our next dev cell to discuss these ideas directly by voice.

@jorainer

Collaborator

commented Feb 23, 2019

Yes, that sounds good! 👍
I was however wondering if it wouldn't make sense to also have @sampleData in Spectra? The backend needs to track anyway whether the data come from a single file or from multiple files. So IMHO it would make sense to put sample/file annotations there too.

With that I would be perfectly happy!

BTW, wouldn't a MultiAssayExperiment make more sense than a SummarizedExperiment? Never used that object myself but it could suit heterogeneous data such as Spectra and Chromatograms.

@lgatto

Owner Author

commented Feb 23, 2019

The exact reason there is an experiment class that goes beyond spectra, is because we have samples and an experimental design.

As far as I remember, MAE is for matrix-type data from different types of assays related to the same/similar sets of samples.

@jorainer

Collaborator

commented Feb 23, 2019

OK, so what you suggest is to remove the sampleData and metadata slots from the current MSnExperiment to make it the new Spectra object.

For my XCMSnExp it will then however be tricky. I cannot extend Spectra, because then I will need my own sampleData slot - and along with that also my own function to read the data. Extending the future MSnExperiment does not sound like a good idea either, because there will be slots and functionality I will never need (and which can confuse my users - I will not need any functionality of an MSnSet).

We definitely need to discuss this in the next call.

@jorainer

Collaborator

commented Feb 23, 2019

Can you sketch the class prototypes that you are thinking of? That would help me follow your line of thought. From the above I did not quite understand, e.g., whether you plan to put Spectra into a slot or have MSnExperiment extend Spectra.

@lgatto

Owner Author

commented Feb 24, 2019

I don't want to define any class prototypes at this stage. I am trying to discuss possible plans for the future, without ties to any fixed implementations or (lack of) backward compatibility. The idea is to have low-level support for various types of data such as spectra, assays, chromatograms, ... that materialise (individually or together) into experiments. I'll think a bit more about it, toy with some test code, and present some ideas during the next dev call.

@jorainer

Collaborator

commented Feb 24, 2019

👍

In the meantime, may I propose the following:

  1. we remove the sampleData and metadata slots from the current MSnExperiment and rename that object to Spectra (hence replacing the obsolete List of Spectrum objects).
  2. create a new SpectraSet object that has sampleData and metadata and extends Spectra. This would be equivalent to the current MSnExp and OnDiskMSnExp.

With this new SpectraSet I could then play around to see how and if I can make the transition in xcms. We could even replace MSnExp/OnDiskMSnExp with this object (with a soft transition in which readMSData would return a SpectraSet). We would then have more time to design your envisioned MSnExperiment.

@lgatto

Owner Author

commented Feb 24, 2019

No, let's leave things like that, please. I don't want to make plans at the moment to accommodate the current situation. I think we should discuss things in the light of where we want to go, and then consider advantages and costs of backward compatibility.

@lgatto

Owner Author

commented Mar 8, 2019

Here's a summary from our last call (#445) concerning this topic.

library(S4Vectors)  ## DataFrame
## Backend and Chromatograms (MSnbase) and Assays (SummarizedExperiment)
## also need to be defined/loaded before these definitions.

setClass("Spectra", ## former MSnExperiment
         slots = c(
             backend = "Backend",
             spectraData = "DataFrame",
             processingQueue = "list",
             processing = "character"))

## class unions make each data slot optional
setClassUnion("SpectraOrNULL", c("NULL", "Spectra"))
setClassUnion("ChromatogramsOrNULL", c("NULL", "Chromatograms"))
setClassUnion("AssaysOrNULL", c("NULL", "Assays"))

setClass("MSnExperiment",
         slots = c(
             ## metadata
             metadata = "list",
             sampleMetadata = "DataFrame",
             dataIndices = "DataFrame",
             ## Data
             spectra = "SpectraOrNULL",
             assays = "AssaysOrNULL",
             chromatograms = "ChromatogramsOrNULL"))
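The "OrNULL" class unions are what make each data slot optional. Below is a self-contained toy version of the pattern, with hypothetical ToySpectra/ToyAssays placeholders standing in for the real Spectra and Assays, and a base data.frame instead of S4Vectors' DataFrame; it is meant to illustrate the mechanism, not the final classes.

```r
library(methods)

## Placeholder data classes (hypothetical stand-ins)
setClass("ToySpectra", slots = c(spectraData = "data.frame"))
setClass("ToyAssays", slots = c(assays = "list"))

## NULL in a slot means "this type of data is not available"
setClassUnion("ToySpectraOrNULL", c("NULL", "ToySpectra"))
setClassUnion("ToyAssaysOrNULL", c("NULL", "ToyAssays"))

setClass("ToyMSnExperiment",
         slots = c(metadata = "list",
                   sampleMetadata = "data.frame",
                   spectra = "ToySpectraOrNULL",
                   assays = "ToyAssaysOrNULL"),
         prototype = prototype(spectra = NULL, assays = NULL))

## An experiment holding only quantitative data (MSnSet-like) ...
qOnly <- new("ToyMSnExperiment",
             sampleMetadata = data.frame(sample = c("A", "B")),
             assays = new("ToyAssays", assays = list(exprs = diag(2))))

## ... and one holding only spectra (MSnExp-like)
sOnly <- new("ToyMSnExperiment",
             spectra = new("ToySpectra",
                           spectraData = data.frame(msLevel = c(1L, 2L))))

is.null(qOnly@spectra)  # TRUE
is.null(sOnly@assays)   # TRUE
```

Anything outside the union (e.g. a plain number in the assays slot) is rejected by new(), so methods only ever have to test for NULL.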

@jotsetung @sgibb, please let me know if that is indeed what transpired from the discussion, and whether we want to take it on from there.

@jorainer

Collaborator

commented Mar 11, 2019

👍 from my side.

I have however a question (since I'm not familiar with your type of data), what would Assays be? What type of data would we store into that?

@sgibb

Collaborator

commented Mar 11, 2019

Absolutely agree 👍 ;

I am not sure whether each data type (Spectra, Assays, ...) needs a processing slot or if we use a "global" one in MSnExperiment.

@lgatto

Owner Author

commented Mar 15, 2019

@jotsetung - assays would typically be matrices or lists of matrices, or maybe even an environment. I think in this context, we would use this for quantitative data. It must fit your definition too.

@sgibb - I thought it would be good to keep them separate, but they could be combined with a data-specific tag to group them.
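A minimal sketch of the "combined, with a data-specific tag" option, assuming a single experiment-level log kept as a data.frame; the column names and the logProcessing helper are hypothetical. The per-type view then becomes a simple subset.

```r
## Hypothetical experiment-level processing log: each entry is tagged with
## the data type it applies to
logProcessing <- function(procLog, target, step) {
    rbind(procLog,
          data.frame(target = target, step = step,
                     date = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                     stringsAsFactors = FALSE))
}

procLog <- data.frame(target = character(), step = character(),
                      date = character(), stringsAsFactors = FALSE)
procLog <- logProcessing(procLog, "spectra", "Centroided spectra")
procLog <- logProcessing(procLog, "assays", "Log2-transformed exprs")
procLog <- logProcessing(procLog, "spectra", "Filtered MS level 2")

## Processing history of the spectra only
procLog[procLog$target == "spectra", "step"]
```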

@jorainer

Collaborator

commented Mar 15, 2019

Ideally assays should be an environment so we're not restricted on its content.
But in the end I plan (at least for the beginning) to use an XCMSnExp object extending the MSnExperiment. I can put all the information I need (matrix with identified chromatographic peaks, vector with adjusted retention times and DataFrame with feature definitions) into an environment - if it is assays or a new slot does not matter.

@sgibb

Collaborator

commented Mar 17, 2019

Ideally assays should be an environment so we're not restricted on its content.

IMHO that is a bad design because in all methods we would have to check whether assays is a matrix, a list of matrices/data.frames or whatever.

Is there any use case you can think of that could not be represented with a (list of) matrix?

In this context we should look at DelayedArray. It provides a similar structure as our Backend classes for array-like (matrix, data.frame) objects.
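To illustrate why a constrained slot simplifies things, here is a sketch assuming a hypothetical ToyAssayList class whose validity method only accepts a named list of two-dimensional objects. Checking length(dim(x)) == 2 rather than is.matrix() is a deliberate choice, so that data.frames and other array-like classes with a dim() (such as DelayedArray matrices) would also pass.

```r
library(methods)

## Hypothetical class: assays must be a named list of 2D objects
setClass("ToyAssayList", slots = c(assays = "list"),
         validity = function(object) {
             a <- object@assays
             if (length(a) > 0L &&
                 (is.null(names(a)) || any(names(a) == "")))
                 return("all assays must be named")
             is2d <- vapply(a, function(m) length(dim(m)) == 2L, logical(1))
             if (!all(is2d))
                 return(paste("not two-dimensional:",
                              paste(names(a)[!is2d], collapse = ", ")))
             TRUE
         })

good <- new("ToyAssayList",
            assays = list(exprs = matrix(1:4, 2),
                          meta = data.frame(a = 1:2)))

## A plain vector is rejected when the object is created
bad <- tryCatch(new("ToyAssayList", assays = list(exprs = 1:4)),
                error = function(e) conditionMessage(e))
```

Once an object is valid, every method can rely on dim(), nrow() and ncol() working on each assay.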

@lgatto

Owner Author

commented Mar 17, 2019

Yes, I think @sgibb has an excellent point here. A list of arrays would possibly work in my cases.

@jotsetung, would this be possible for your use cases?

Would we want any constraints on these?

@jorainer

Collaborator

commented Mar 18, 2019

Well, I have no direct use of the assays slot if it was a matrix or list of matrices. But that should not be a problem.

I could simply extend the MSnExperiment object in xcms and add the slot I'll need. That's how I do it with the OnDiskMSnExp at present.
Let me explain the xcms pre-processing and the corresponding results of each step: the first step in the xcms pre-processing identifies chromatographic peaks. The result is a matrix, each row being the definition of a chromatographic peak in one sample. The next step is to align the samples (aka retention time correction). The result is a list of numeric vectors with the updated retention times for the spectra of each file. The final step groups chromatographic peaks across samples. The result is a DataFrame with the definitions of the m/z-rt features. So, that's the data I need to store in my result object.
The final two-dimensional matrix with rows being m/z-rt features and columns samples is generated on-the-fly from the chromatographic peak matrix and feature definition DataFrame.

This one could indeed go into the assays slot (although for large experiments it makes more sense to create it always on-the-fly to keep memory demand low).
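The on-the-fly step can be sketched as follows. This is a toy illustration with invented values, not xcms's implementation (xcms has its own featureValues function for this); the featureMatrix helper is hypothetical and, unlike xcms, simply keeps the last value when a feature has several peaks in the same sample.

```r
## Toy peak matrix (intensity + sample index) and feature -> peak links
peaks <- cbind(into   = c(1500, 1720, 1610, 800, 950),
               sample = c(1, 2, 3, 1, 3))
peakIdx <- list(FT1 = c(1L, 2L, 3L), FT2 = c(4L, 5L))
nSamples <- 3L

## Build the feature x sample matrix on demand; samples in which a feature
## has no peak stay NA
featureMatrix <- function(peaks, peakIdx, nSamples, value = "into") {
    res <- matrix(NA_real_, nrow = length(peakIdx), ncol = nSamples,
                  dimnames = list(names(peakIdx),
                                  paste0("S", seq_len(nSamples))))
    for (f in seq_along(peakIdx)) {
        pk <- peaks[peakIdx[[f]], , drop = FALSE]
        res[f, pk[, "sample"]] <- pk[, value]
    }
    res
}

featureMatrix(peaks, peakIdx, nSamples)
```

Since the matrix is fully determined by the peak matrix and the feature definitions, it can be recomputed whenever needed instead of being stored.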

While I could imagine extracting the ion chromatograms for the identified chromatographic peaks and putting them into the @chromatograms slot (so that the chromatographic peak matrix described above would not be needed), in practice this would not work, because a) the memory demand would be far too large and b) for diagnostic purposes it is better to create such chromatograms dynamically, allowing the user to also extract signal left and/or right (in the retention time dimension) of the actually detected peak. I would thus prefer to keep the definition of identified chromatographic peaks as a matrix (with an arbitrary number of rows).

Summarizing, an MSnExperiment object with an assays slot being a list of matrices (à la SummarizedExperiment) would be kind of OK for me. I would then extend this object in xcms to add any additional slots I might need.

@sgibb

Collaborator

commented Mar 25, 2019

BTW: Should we inherit from SummarizedExperiment? AFAIK the suggested MSnExperiment above is a SummarizedExperiment with additional slots for spectra, chromatograms (if we really need that) and dataIndices.

@jorainer

Collaborator

commented Mar 25, 2019

Would make sense to me.

@jorainer

Collaborator

commented Apr 1, 2019

just linking @nilshoffmann here - mzTab-m (mzTab 2.0 for metabolomics): https://github.com/HUPO-PSI/mzTab

@lgatto

Owner Author

commented Apr 1, 2019

I had a look recently, and I wasn't sure about SummarizedExperiment. I think it will very much depend on the requirements we want to set on the assays with regard to @jotsetung's current uses.

@lgatto

Owner Author

commented Apr 8, 2019

Re mzTab: there is read support in MSnbase. Write support was dropped after the mzTab specifications changed, but it would be good to get it back.

@jorainer

Collaborator

commented Apr 8, 2019

Yes, @nilshoffmann is working on an R-package that enables mgf export (and import).

@lgatto

Owner Author

commented Apr 8, 2019

mgf or mzTab?

@jorainer

Collaborator

commented Apr 8, 2019

Sorry, got things mixed up because I'm working with GNPS mgf files at present - it's mzTab 2.0. The package is https://github.com/lifs-tools/rmzTab-m

@lgatto

Owner Author

commented Apr 9, 2019

And why limit this to metabolomics (assuming that's what the M stands for)? Are the specs for proteomics and metabolomics that different?

@nilshoffmann


commented Apr 9, 2019

@jorainer

Collaborator

commented Apr 9, 2019

This should go into xcms instead then. Sorry, my bad for linking it here.

@lgatto

Owner Author

commented Apr 9, 2019

Ok, thanks for the update on mzTab-P and mzTab-M.

I think MSnbase will then depend on these respective packages to create the appropriate data structures.

@lgatto

Owner Author

commented Apr 10, 2019

@sgibb @jorainer - I looked back at SummarizedExperiment and its Assays class. The issue I see is that the elements in Assays must all have the same ncol and nrow. It would have been great to be able to store matrices of PSMs, peptides and proteins, but these will by definition have different nrow.

@jorainer

Collaborator

commented Apr 10, 2019

OK, so it might be better a list (or List???).

@lgatto

Owner Author

commented Apr 10, 2019

The underlying implementation of an Assays (a slot in a SummarizedExperiment) is a SimpleList (actually a ShallowSimpleList, among other possibilities). It would be worth checking with Martin to see if we could possibly have an additional implementation that only checks if ncol are identical.
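The relaxed check proposed here could look roughly like the following. It is a hypothetical helper, not part of SummarizedExperiment: assays may differ in nrow (PSMs, peptides, proteins) but must share the same number of columns and the same column names (samples).

```r
## Hypothetical relaxed check: same columns (samples), any number of rows
sameColumns <- function(assays) {
    if (length(assays) < 2L) return(TRUE)
    nc <- vapply(assays, ncol, integer(1))
    if (length(unique(nc)) > 1L) return(FALSE)
    cn <- lapply(assays, colnames)
    all(vapply(cn, identical, logical(1), cn[[1]]))
}

smp <- paste0("S", 1:4)
psms     <- matrix(rnorm(20), nrow = 5, dimnames = list(NULL, smp))
peptides <- matrix(rnorm(12), nrow = 3, dimnames = list(NULL, smp))
proteins <- matrix(rnorm(8),  nrow = 2, dimnames = list(NULL, smp))

sameColumns(list(psms = psms, peptides = peptides, proteins = proteins))  # TRUE
sameColumns(list(psms = psms, other = matrix(0, 2, 3)))                   # FALSE
```

As the next comment points out, relaxing nrow means a single rowData can no longer describe all assays at once.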

@jorainer

This comment has been minimized.

Copy link
Collaborator

commented Apr 11, 2019

The reason they check for ncol and nrow is presumably the rowData and colData. If we have arrays with different dimensions in an assays slot, we would also need a specific rowData and colData for each of them... tricky

@lgatto

This comment has been minimized.

Copy link
Owner Author

commented Apr 11, 2019

Previously in this issue, we mentioned we didn't want this, but wanted to track the links between these assay rows and a single colData, matching the longest assay. If this isn't possible at all, we won't be able to reuse any of that infrastructure. Hence my desire to chat with Martin. We still have some work to do before we get there anyway.

@sgibb

This comment has been minimized.

Copy link
Collaborator

commented Apr 11, 2019

Regarding tracking the links between these assays and the dataIndices slot, it may be worth looking into https://github.com/LTLA/IndexedRelations (even if we just use it as inspiration).
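To show why such index bookkeeping needs care, here is a toy sketch of one way a dataIndices-like table could store links: one row per (peptide, spectrum) pair of integer indices. This is loosely inspired by the idea of relations stored as indices into separate collections and is NOT the IndexedRelations API; subsetSpectra is a hypothetical helper.

```r
## One row per (peptide, spectrum) link, as integer indices
links <- data.frame(peptide  = c(1L, 1L, 2L, 3L, 3L, 3L),
                    spectrum = c(4L, 7L, 5L, 1L, 2L, 9L))

## Subsetting the spectra must remap the indices; links to dropped
## spectra disappear
subsetSpectra <- function(links, keep) {
    newIdx <- match(links$spectrum, keep)  # old index -> new position or NA
    res <- links[!is.na(newIdx), , drop = FALSE]
    res$spectrum <- newIdx[!is.na(newIdx)]
    rownames(res) <- NULL
    res
}

subsetSpectra(links, keep = c(9L, 4L, 5L))
```

Every subsetting or reordering of either collection needs such a remapping step, which is the kind of logic a dedicated relations class would encapsulate.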

4 participants