Prepare Seurat for CellRanger 3.0 #708

Closed


evolvedmicrobe
Contributor

10X is releasing a new version of CellRanger that changes the output format. This pull request makes Seurat forward compatible with the new version. In particular, it makes the following changes:

HDF5 Format - The format of this file is changing. Because Seurat is making hdf5r optional, the Read10X_h5 function will no longer be the preferred way to load data, and we have added an error message that is shown if newer (and incompatible) files are loaded.

Text File Formats - To save disk space, the sparse matrix and barcode text files will now be gzipped. Because R automatically identifies and correctly reads gzipped files, no changes were needed to account for this other than appending the .gz suffix to the file name when necessary. Additionally, to account for experiments that have "multimodal" datasets, the genes.tsv file will be replaced by the features.tsv file. This file will contain an additional column describing the type of feature referred to in that row of the matrix.
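As a rough, language-neutral illustration of the file handling described above (this is a Python sketch, not the R code in this pull request), the following reads a feature table from either the new gzipped file or the old plain-text one. The helper names `open_text` and `read_features`, the sample rows, and the "Gene Expression" default for two-column files are all assumptions made for the example.

```python
import csv
import gzip
import os
import tempfile


def open_text(path):
    """Open a plain or gzipped text file for reading (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def read_features(directory):
    """Return (id, name, feature_type) tuples from an output directory.

    Looks for the new features.tsv.gz first and falls back to the old
    genes.tsv. Old files have only two columns, so a default type is
    assumed for them.
    """
    for name in ("features.tsv.gz", "genes.tsv"):
        path = os.path.join(directory, name)
        if os.path.exists(path):
            break
    else:
        raise FileNotFoundError("no features.tsv.gz or genes.tsv in " + directory)

    rows = []
    with open_text(path) as handle:
        for fields in csv.reader(handle, delimiter="\t"):
            # Assumption: a missing third column means gene expression data.
            feature_type = fields[2] if len(fields) > 2 else "Gene Expression"
            rows.append((fields[0], fields[1], feature_type))
    return rows


# Small demonstration with made-up feature rows.
with tempfile.TemporaryDirectory() as tmp:
    with gzip.open(os.path.join(tmp, "features.tsv.gz"), "wt") as out:
        out.write("GENE1\tGeneOne\tGene Expression\n")
        out.write("AB1\tAntibodyOne\tAntibody Capture\n")
    features = read_features(tmp)
```

Trying the new file name first and falling back to the old one keeps a single entry point working for output from both CellRanger versions.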

To help users follow the multimodal vignette (https://satijalab.org/seurat/multimodal_vignette.html) and analyze this type of data, multimodal data is returned as a list of matrices rather than a single matrix, with each element of the list named after the type of data it holds. Because this data is distinct from the older types of data shown in past tutorials, we print a message when this occurs so that users are aware they have several data types.
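The return behavior described above can be sketched as follows. This is a minimal Python illustration, not Seurat's R implementation: the function name `split_by_feature_type`, the dense list-of-rows matrix, and the sample feature types are assumptions made for the example.

```python
from collections import defaultdict


def split_by_feature_type(feature_types, matrix_rows):
    """Group matrix rows by the feature type of each row.

    Returns a single matrix when only one type is present, otherwise a
    dict mapping each feature type to its own matrix.
    """
    if len(feature_types) != len(matrix_rows):
        raise ValueError("need exactly one feature type per matrix row")

    groups = defaultdict(list)
    for feature_type, row in zip(feature_types, matrix_rows):
        groups[feature_type].append(row)

    if len(groups) == 1:
        # Single data type: keep the old behavior and return one matrix.
        return next(iter(groups.values()))

    # Several data types: tell the user the return value has changed shape.
    print("Multiple data types found, returning one matrix per type: "
          + ", ".join(groups))
    return dict(groups)


# Made-up example: two gene rows and one antibody row, two cells each.
types = ["Gene Expression", "Gene Expression", "Antibody Capture"]
counts = [[1, 0], [0, 2], [5, 7]]
result = split_by_feature_type(types, counts)
single = split_by_feature_type(types[:2], counts[:2])
```

Returning a plain matrix in the single-type case means existing analysis code that expects one matrix keeps working, and only multimodal runs see the new structure.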

Very small test data files and associated tests were added to verify the expected behavior.

evolvedmicrobe and others added 5 commits July 16, 2018 16:51
If the data used for the regression contains NA values, this will lead to errors downstream. For example, if one row contains an NA value, that row is removed, which creates a size mismatch when a data.frame is later constructed from the residuals of all rows; if a linear model is used, it causes a problem when the QR values are reused.
Warn on NA data being used for regression
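The size mismatch described in this commit message can be illustrated with a small Python sketch (not the R code from the commit). The helper names `rows_with_missing` and `complete_rows` and the sample data are assumptions made for the example; NaN stands in for R's NA.

```python
import math
import warnings


def rows_with_missing(rows):
    """Return the indices of rows that contain at least one NaN."""
    return [i for i, row in enumerate(rows)
            if any(math.isnan(value) for value in row)]


def complete_rows(rows):
    """Drop rows with NaN values, warning when any are removed."""
    bad = set(rows_with_missing(rows))
    if bad:
        warnings.warn(
            f"{len(bad)} row(s) contain NA values and would be dropped "
            "from the regression"
        )
    return [row for i, row in enumerate(rows) if i not in bad]


# Made-up data: the second row has a missing value.
data = [[1.0, 2.0], [float("nan"), 3.0], [4.0, 5.0]]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kept = complete_rows(data)

# A fit on `kept` would yield len(kept) residuals, which no longer lines
# up with the len(data) rows the caller started with.
```

Warning up front, before the fit, points the user at the missing values rather than at a confusing length error later on.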
CellRanger 3.0 is going to use a different version of the hdf5 file, which will no longer include this PYTABLES attribute.
@satijalab
Collaborator

Hi Nigel - thanks very (very!) much. We reached out to you on your gmail to discuss this and additional future plans offline. Let us know if you didn't receive it or if there is a better address to reach you at.

@evolvedmicrobe
Contributor Author

Hi @satijalab, sorry for the slow response, I just wrote back from my more "official" email address after rebooting following a family vacation. Hope to continue the dialog.

Warm wishes,
Nigel

@evolvedmicrobe evolvedmicrobe changed the base branch from develop to release/3.0 September 28, 2018 22:05