Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Easier way to import (spectra) data from an experiment #32

Closed
jorainer opened this issue Dec 7, 2022 · 11 comments
Closed

Easier way to import (spectra) data from an experiment #32

jorainer opened this issue Dec 7, 2022 · 11 comments

Comments

@jorainer
Copy link
Member

jorainer commented Dec 7, 2022

The current way to create a MsExperiment and adding data to it is IMHO a bit cumbersome. In most of my use cases (e.g. in a typical xcms-based analysis workflow) I have a set of mzML files and a data.frame with sample annotations. Importing that data in the current way involves many code of lines with the possibility of mistakes or misunderstandings (especially for the linking between samples and spectra). I think it would be nice (especially also for the less experienced user) to provide a simple import functions like the one below:

readMsExperiment <- function(files = character(),
                             sampleData = data.frame(), ...) {
    files <- normalizePath(files)
    if (!nrow(sampleData))
        sampleData <- data.frame(sample_index = seq_along(files))
    if (nrow(sampleData) != length(files))
        stop("Number of rows in 'sampleData' have to match the number of files")
    sampleData$dataOrigin <- files
    if (!inherits(sampleData, "DataFrame"))
        sampleData <- DataFrame(sampleData)
    x <- MsExperiment()
    sampleData(x) <- sampleData
    spectra(x) <- Spectra(files, ...)
    linkSampleData(x, with = "sampleData.dataOrigin = spectra.dataOrigin")
}

I currently have that function in xcms but maybe we could add that directly to MsExperiment. @lgatto @sgibb any opinion on that?

@lgatto
Copy link
Member

lgatto commented Dec 7, 2022

Yes, indeed, that would be useful. Why not use the MsExperiment() constructor, and allow to pass populate slots, such as spectra?

@jorainer
Copy link
Member Author

jorainer commented Dec 7, 2022

The problem with that is (IMO) that passing a pre-loaded Spectra do MsExperiment does not simplify the linking step that is needed to assign samples (rows in sampleData to spectra from e.g. one file):

sps <- Spectra(fls)
mse <- MsExperiment(sps, sampleData = pd)

that would then pre-fill the slots with the spectra and the sample data, but then an additional linkSampleData is required - which needs the user to define which spectra belong to which sample. And that's actually the step I find pretty cumbersome.

Importing the data using file names and the sample data (like with the readMsExperiment above) would have the advantage that (in most experiments) one file contains data from one sample (row in the sample data) and the link could be automatically established.

@lgatto
Copy link
Member

lgatto commented Dec 7, 2022

Of course, I understand, but why no adding the logic of linking in the constructor. That same logic could be applied if the object is constructed from spectra or chromatograms (probably not relevant for quantitative data). Otherwise, we'll need to have a readMsData() for the former and a readChromData() for the latter. And what if we have both?

@jorainer
Copy link
Member Author

jorainer commented Dec 9, 2022

That's why I didn't put it in the MsExperiment constructor :)

OK, a solution for having all in the MsExperiment constructor could be:

sps <- Spectra(pd$dataOrigin)
mse <- MsExperiment(spectra = sps, sampleData = pd)

requiring that pd contains a column pd$spectraOrigin which contains the file names of the original data files (including the full path). Then the code within MsExperiment could establish the link using that column and sps$dataOrigin. Would that be acceptable or do you have maybe a better/cleaner solution @lgatto ?

@lgatto
Copy link
Member

lgatto commented Dec 12, 2022

Or, let's start with the simplest solution, which is the one you suggested first. Once it has matured and been tested, that function could simply be called from within the constructor when both spectra and a valid pd are provided. What do you think?

@jorainer
Copy link
Member Author

Or what about this:

MsExperiment <- function(spectra, sampleData, ...) {
    if (is.character(spectra)) {
        if (!nrow(sampleData))
            sampleData <- data.frame(spectraOrigin = spectra)
        if (!any(colnames(sampleData) == "spectraOrigin") && nrow(sampleData) == length(spectra))
            sampleData$spectraOrigin <- spectra
        spectra <- Spectra(spectra, ...)
    }
    x <- MsExperiment()
    x@spectra <- spectra
    x@sampleData <- sampleData
    if (any(colnames(sampleData) == "spectraOrigin"))
        linkSampleData(x, with = "sampleData.spectraOrigin = spectra.dataOrigin")
    x
}

this would be a combination of both variants... if a character vector of files is provided the data is imported and (if no sampleData is provided) linked to one sample per file. If a Spectra is provided it will be linked, but only if the sampleData contains a dedicated column (which I would suggest should be called "spectraOrigin", to specify that the column contains the original file names of spectra, not chromatograms or other). Obviously, this function needs also to allow passing values for all other slots.

@sgibb
Copy link
Member

sgibb commented Dec 14, 2022

I don't like to overload the meaning of the spectra argument with Spectra or character type (and different behaviour of the constructor; and it will just work for file based backends). What about a new files argument: MsExperiment(spectra, sampleData, ..., files) (it is done in a similar way in read.table: read.table(file, ..., text)).

In general I think we should follow your first thought and @lgatto's suggestion to provide a new function and if it has matured it could be called by the constructor, or?

@jorainer
Copy link
Member Author

Agree - let's not overdo it now. The problem with MsExperiment(spectra, sampleData, ..., files) would be that it is not clear that these files should be used to read/import Spectra from them - could also be any of the other slots (such as Chromatograms ) in future...

Anyway, I'll make a PR for a new simple import function as you suggest.

jorainer added a commit that referenced this issue Dec 16, 2022
@lgatto
Copy link
Member

lgatto commented Jan 3, 2023

The problem still persists in readMsExperiment(files)... the files could be read to import Chromatograms.

Do we need

  • a spectraFiles argument, or
  • a readMsExperimentSpectra() function?

@jorainer
Copy link
Member Author

jorainer commented Jan 9, 2023

Hm, I would prefer a readMsExperiment with a spectraFiles argument over a readMsExperimentSpectra function. I'll amend the PR accordingly.

@jorainer
Copy link
Member Author

jorainer commented Jan 9, 2023

In commit 70d8c0d (PR #33) I changed files -> spectraFiles.

jorainer added a commit that referenced this issue Feb 13, 2023
feat: add readMsExperiment (issue #32)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants