Prepare Seurat for CellRanger 3.0 #708

evolvedmicrobe · 2018-08-18T00:42:09Z

10X is releasing a new version of CellRanger that is changing the output format. This pull request makes Seurat forward compatible with the new version. In particular, the following changes are made:

**HDF5 Format** - The format of this file is changing. As Seurat is making hdf5r optional, the `Read10X_h5` function will no longer be the preferred way to load data, and we have added an error message that is shown if newer (and incompatible) files are loaded.

**Text File Formats** - To save disk space, the sparse matrix and barcode text files will now be gzipped. Because R automatically detects and correctly reads gzipped files, the only change needed was appending a suffix to the file names where necessary. Additionally, to support experiments that have "multimodal" datasets, `genes.tsv` is replaced by a `features.tsv` file. This file contains an additional column describing the type of feature referred to in each row of the matrix.
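For readers who want to see the file-layout change concretely, here is a minimal sketch in Python (not Seurat's R code) of reading the feature table from either layout. The function name `read_features`, the exact file names tried, and the assumption that the feature type sits in the third column are illustrative; the `gene_column` parameter only loosely mirrors the `gene.column` option mentioned in this PR's commits.

```python
import csv
import gzip
import os


def read_features(data_dir, gene_column=2):
    """Return (names, types) for a 10X-style output directory.

    gene_column is 1-based. types is None for the older two-column
    genes.tsv layout, which has no feature-type column.
    """
    # Assumed file names: prefer the newer gzipped file, then fall back.
    for filename in ("features.tsv.gz", "features.tsv", "genes.tsv"):
        path = os.path.join(data_dir, filename)
        if os.path.exists(path):
            break
    else:
        raise FileNotFoundError("no features or genes file in " + data_dir)

    # gzip.open and open both accept text mode, so one code path reads both.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))

    names = [row[gene_column - 1] for row in rows]
    # Assumption: the third column of the newer file holds the feature type.
    has_types = filename.startswith("features") and all(len(r) >= 3 for r in rows)
    types = [row[2] for row in rows] if has_types else None
    return names, types
```

Calling `read_features(path)` on an older directory returns `None` for the types, which lets a caller decide whether any multimodal handling is needed.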

To help users follow the [multimodal vignette](https://satijalab.org/seurat/multimodal_vignette.html) and analyze this type of data, multimodal data is returned as a list of matrices rather than a single matrix, with each element of the list named after its type of data. Because this data is distinct from the older types of data shown in past tutorials, we print a message when this occurs so that users are aware that they have different data types.
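The return-value behavior described above can be sketched as follows. This is a Python illustration under stated assumptions, not the R implementation in this PR: the matrix is represented as plain `(row, col, value)` triplets with 0-based indices, and the function names, the feature tuple layout, and the message text are hypothetical.

```python
def split_by_feature_type(features, triplets):
    """Group matrix rows by feature type.

    features: list of (feature_id, name, feature_type), one per matrix row.
    triplets: list of (row, col, value) entries of a sparse matrix.
    Returns {feature_type: {"features": [...], "triplets": [...]}} with
    rows re-indexed from zero within each type.
    """
    groups = {}
    row_map = {}
    for row, (_feature_id, name, feature_type) in enumerate(features):
        group = groups.setdefault(feature_type, {"features": [], "triplets": []})
        row_map[row] = (feature_type, len(group["features"]))
        group["features"].append(name)
    for row, col, value in triplets:
        feature_type, new_row = row_map[row]
        groups[feature_type]["triplets"].append((new_row, col, value))
    return groups


def load_counts(features, triplets):
    """Return one matrix for a single feature type, else one per type."""
    groups = split_by_feature_type(features, triplets)
    if len(groups) == 1:
        return next(iter(groups.values()))
    # Tell the user that the result is no longer a single matrix.
    print("Multiple feature types found, returning one matrix per type: "
          + ", ".join(groups))
    return groups
```

Keeping the single-type case as a bare matrix means existing single-modality workflows see no change, while multimodal callers index the result by type name.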

Very small test data files and associated tests were added to verify the expected behavior.

If the data used for regression contains NA values, errors occur downstream. For example, a row containing an NA value is removed, which creates a size mismatch when a data.frame is later constructed from the residuals of all rows; if a linear model is used, it also creates a problem when the QR values are reused. This branch therefore warns when NA data is used for regression.
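The failure mode is easy to reproduce outside R. The sketch below is a Python illustration with hypothetical names (`warn_on_missing`, `complete_rows`) and an invented warning message: it checks the regression variables up front and shows why silently dropped rows lead to a length mismatch.

```python
import math
import warnings


def is_missing(value):
    """True for None or a float NaN, the stand-ins for R's NA here."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def warn_on_missing(columns):
    """Warn about, and return, the names of columns with missing values.

    columns: dict mapping variable name to a list of per-cell values.
    """
    bad = [name for name, values in columns.items()
           if any(is_missing(v) for v in values)]
    if bad:
        warnings.warn("NA values found in regression variables: " + ", ".join(bad))
    return bad


def complete_rows(columns):
    """Indices of rows that have no missing value in any column."""
    n_rows = len(next(iter(columns.values())))
    return [i for i in range(n_rows)
            if not any(is_missing(values[i]) for values in columns.values())]
```

If a fit only uses `complete_rows(...)`, it yields fewer residuals than there are cells, which is the size mismatch described above.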



satijalab · 2018-08-22T15:34:27Z

Hi Nigel - thanks very (very!) much. We reached out to you on your Gmail to discuss this and additional future plans offline. Let us know if you didn't receive it or if there is a better address to reach you at.

evolvedmicrobe · 2018-09-12T23:53:09Z

Hi @satijalab, sorry for the slow response, I just wrote back from my more "official" email address after rebooting following a family vacation. Hope to continue the dialog.

Warm wishes,
Nigel

evolvedmicrobe and others added 5 commits July 16, 2018 16:51

Merge pull request satijalab#619 from evolvedmicrobe/warn_na

9b8a628

Warn on NA data being used for regression

Add gene.column option to Read10X to let the user specify which column of genes.tsv to use as gene names

3cd9be0

Check HDF5 Format Version

5612fac

CellRanger 3.0 is going to use a different version of the hdf5 file, which will no longer include this PYTABLES attribute.

evolvedmicrobe force-pushed the cellranger3.0 branch from 331f318 to 9b91694 Compare August 18, 2018 02:58

evolvedmicrobe changed the base branch from develop to release/3.0 September 28, 2018 22:05

evolvedmicrobe mentioned this pull request Nov 14, 2018

Prepare Seurat for CellRanger 3.0 #933

Merged

evolvedmicrobe closed this Nov 14, 2018
