Prepare Seurat for CellRanger 3.0 #933

Merged
merged 7 commits into from Nov 15, 2018

evolvedmicrobe
Contributor

10X is releasing a new version of CellRanger that is changing the output format. This pull request makes Seurat forward compatible with the new version. In particular, the following changes are made:

HDF5 Format - The format of this file is changing. As Seurat is making hdf5r optional, the Read10X_h5 function will no longer be the preferred way to load data, and we have added an error message that is raised if newer (and incompatible) files are loaded.

Text File Formats - To save disk space, the sparse matrix and barcode text files will now be gzipped. Because R automatically detects and correctly reads gzipped files, the only change needed was to append the .gz suffix to file names when necessary. Additionally, to support experiments that produce "multimodal" datasets, the genes.tsv file will be replaced by the features.tsv file. The new file contains an additional column describing the type of feature referred to in that row of the matrix.
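As a rough illustration of the text-format handling described above, here is a minimal base R sketch. It is not the code in this PR: the helper name `read_feature_table`, the column names, and the default feature type string `"Gene Expression"` assigned to older two-column files are assumptions made for the example.

```r
# Hypothetical helper (not Seurat's actual implementation): read the feature
# table from a 10x output directory, accepting either the old or new layout.
read_feature_table <- function(data_dir) {
  new_path <- file.path(data_dir, "features.tsv.gz")
  old_path <- file.path(data_dir, "genes.tsv")
  path <- if (file.exists(new_path)) new_path else old_path
  if (!file.exists(path)) {
    stop("No features.tsv.gz or genes.tsv found in ", data_dir)
  }
  # read.delim() opens gzipped files transparently, so no explicit
  # decompression step is needed.
  features <- read.delim(path, header = FALSE, stringsAsFactors = FALSE)
  # Older files have two columns (id, name); newer ones add a feature type.
  # Assumption: treat every row of an older file as gene expression.
  if (ncol(features) < 3) {
    features$V3 <- "Gene Expression"
  }
  colnames(features)[1:3] <- c("id", "name", "feature_type")
  features
}
```

Calling `read_feature_table("path/to/matrix_dir")` on either layout would then give a data frame with the same three columns, which keeps downstream code independent of the CellRanger version.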

To help users follow the multimodal vignette and analyze this type of data, a list of matrices is returned for multimodal data instead of a single matrix, with the name of each list element corresponding to the type of data it holds. Because this data is distinct from the older types of data shown in past tutorials, we print a message when this occurs so that users are aware they have multiple data types.
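The return behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the function name `split_by_feature_type` and the wording of the message are hypothetical.

```r
# Hypothetical sketch: split a feature-by-cell matrix into one matrix per
# feature type, returning a plain matrix when only one type is present.
split_by_feature_type <- function(mat, feature_type) {
  stopifnot(nrow(mat) == length(feature_type))
  types <- unique(feature_type)
  if (length(types) == 1) {
    # Single data type: keep the familiar single-matrix return value.
    return(mat)
  }
  message("Multiple feature types found; returning a list of matrices: ",
          paste(types, collapse = ", "))
  # drop = FALSE keeps a matrix even when a type has a single row.
  out <- lapply(types, function(t) mat[feature_type == t, , drop = FALSE])
  names(out) <- types
  out
}

counts <- matrix(1:6, nrow = 3,
                 dimnames = list(c("GENE1", "GENE2", "CD3"), c("cell1", "cell2")))
result <- split_by_feature_type(
  counts, c("Gene Expression", "Gene Expression", "Antibody Capture"))
names(result)
```

Returning a bare matrix in the single-type case means existing single-modality workflows keep working unchanged, while multimodal users index the list by data type.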

Very small test data files and associated tests were added to verify the expected behavior.

Note, this was originally PR #708 but was modified to merge into the latest release/3.0 branch.

evolvedmicrobe and others added 6 commits July 16, 2018 16:51
If the data used for the regression contains NA values, this will lead to errors downstream.  For example, if one row contains an NA value, it will be removed and create a size mismatch when a data.frame is later constructed using the residuals of all rows, or if a linear model is used, it will create a problem when the QR values are reused.
Warn on NA data being used for regression
CellRanger 3.0 is going to use a different version of the hdf5 file, which will no longer include this PYTABLES attribute.
10X is releasing a new version of CellRanger that is changing the output format.  This pull request makes Seurat forward compatible with the new version.  In particular, the following changes are made:

**HDF5 Format** - The format of this file is changing. As Seurat is making hdf5r optional, the `Read10X_h5` function will no longer be the preferred way to load data, and we have added an error message that is raised if newer (and incompatible) files are loaded.

**Text File Formats** - To save disk space, the sparse matrix text file will now be gzipped. Because R automatically detects and correctly reads gzipped files, the only change needed was to append the `.gz` suffix to file names when necessary. Additionally, to support experiments that produce "multimodal" datasets, the `genes.tsv` file will be replaced by the `features.tsv` file. The new file contains an additional column describing the type of feature referred to in that row of the matrix.

To help users follow the [multimodal vignette](https://satijalab.org/seurat/multimodal_vignette.html) and analyze this type of data, a list of matrices is returned for multimodal data instead of a single matrix, with the name of each list element corresponding to the type of data it holds. Because this data is distinct from the older types of data shown in past tutorials, we print a message when this occurs so that users are aware they have multiple data types.
@evolvedmicrobe
Contributor Author

Hi Seurat Team,

With the new 3.0 branch, I thought it would be easier to close the old PR and open a new one with the changes already merged into the new release/3.0 branch. I'll circle back to make sure it passes tests, but let me know if you'd like to see any other changes.

Out of curiosity, is there any estimate of when 3.0 will be released? I've shown it to a few people, who were all enthusiastic about it.

Cheers,
Nigel

@mojaveazure mojaveazure merged commit 711f486 into satijalab:release/3.0 Nov 15, 2018
@mojaveazure
Member

Thanks Nigel!
