--useMtx is ignored if too.big == TRUE in runSeurat.R #262

Open
RoganGrant opened this issue Feb 8, 2023 · 6 comments

Comments

@RoganGrant

First of all, thank you for this incredibly useful package. We get a lot of use out of it.

For large matrices (where too.big is TRUE), I've run into an issue where --useMtx cannot be turned off: the matrix is always written as .mtx, whether or not the flag is given. This is because the condition on the first line of this chunk of runSeurat.R always evaluates to TRUE:

if (use.mtx || too.big) {
        # we have to write the matrix to an mtx file
        matrixPath <- file.path(dir, paste(prefix, "matrix.mtx", sep=""))
        genesPath <- file.path(dir, paste(prefix, "features.tsv", sep=""))
        barcodesPath <- file.path(dir, paste(prefix, "barcodes.tsv", sep=""))
        message("Writing expression matrix to ", matrixPath)
        writeMM(counts, matrixPath)
        # easier to load if the genes file has at least two columns. Even though seurat objects
        # don't have yet explicit geneIds/geneSyms data, we just duplicate whatever the matrix has now
        write.table(as.data.frame(cbind(rownames(counts), rownames(counts))), file=genesPath, sep="\t", row.names=F, col.names=F, quote=F)
        write(colnames(counts), file = barcodesPath)
        message("Gzipping expression matrix")
        gzip(matrixPath)
        gzip(genesPath)
        gzip(barcodesPath)
  } else {
      # we can write the matrix as a tsv file
      gzPath <- file.path(dir, paste(prefix, "exprMatrix.tsv.gz", sep=""))
      if (too.big) {
          if (.Platform$OS.type=="windows")
              error("Cannot write very big matrices to a text file on Windows. Please use the --useMtx (R: use.mtx) option")
          writeSparseTsvChunks(counts, gzPath);
      } else {
          mat = as.matrix(counts)

Would it be possible to let the user force a TSV instead, for example by changing (use.mtx || too.big) to (use.mtx || (.Platform$OS.type=="windows" && too.big))? I ask largely because cbBuild consistently fails for me with .mtx files, and I can't figure out how to configure cellbrowser.conf to fix this.
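To make the suggested change concrete, here is a minimal Python sketch of the two conditions (the real code is R; the function names `current_format` and `proposed_format` are hypothetical and only model the checks quoted above):

```python
def current_format(use_mtx: bool, too_big: bool) -> str:
    """Model of the existing check: use.mtx || too.big."""
    return "mtx" if (use_mtx or too_big) else "tsv"


def proposed_format(use_mtx: bool, too_big: bool, is_windows: bool) -> str:
    """Model of the suggested check: use.mtx || (windows && too.big)."""
    return "mtx" if (use_mtx or (is_windows and too_big)) else "tsv"


# Print only the input combinations where the two conditions disagree.
for use_mtx in (False, True):
    for too_big in (False, True):
        for is_windows in (False, True):
            old = current_format(use_mtx, too_big)
            new = proposed_format(use_mtx, too_big, is_windows)
            if old != new:
                print(f"use_mtx={use_mtx} too_big={too_big} "
                      f"windows={is_windows}: {old} -> {new}")
```

Under this model the only case that changes is a large matrix on a non-Windows system without --useMtx, which would then fall through to the writeSparseTsvChunks branch instead of being written as .mtx.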

Thank you!

@RoganGrant
Author

I realize now that the --useMtx flag does not take a text argument; it is true if specified and false if not. In any case, it would be great to have an equivalent --forceTSV flag.
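As an illustration of the flag behaviour described above, here is a small sketch using Python's argparse. It is not the tool's actual option parser, and `--forceTSV` is the hypothetical flag requested in this comment, not an existing option:

```python
import argparse

# Hypothetical parser: both options are plain switches that take no value.
parser = argparse.ArgumentParser(description="sketch of exporter options")
group = parser.add_mutually_exclusive_group()
group.add_argument("--useMtx", action="store_true",
                   help="write the matrix as .mtx (true only if given)")
group.add_argument("--forceTSV", action="store_true",
                   help="hypothetical: always write a .tsv.gz matrix")


def parse(argv):
    """Parse a list of command-line tokens into a namespace."""
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse(["--forceTSV"])
    print(args.useMtx, args.forceTSV)
```

Putting the two switches in a mutually exclusive group is one possible design choice: it rejects a command line that asks for both formats at once.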

@maximilianh
Owner

maximilianh commented Feb 8, 2023 via email

@maximilianh
Owner

maximilianh commented Feb 8, 2023 via email

@RoganGrant
Author

RoganGrant commented Feb 8, 2023

Thank you for the quick response! I have personally converted this matrix to non-sparse in R in the course of certain function calls without issue, but the documentation agrees with you. I honestly don't know how much of a risk this poses in terms of the function failing for others.

In any case, I have no objection to .mtx files, but I can't get them to work at all with cbBuild. The cellbrowser.conf file still points to a single TSV file that does not exist, and manually supplying each individual file does not seem to work (cbBuild then asks for a barcodes.tsv, and the per-assay files I specify are ignored). My ultimate solution (which worked very well) was to run mtx2tsv on each assay before deployment.
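For readers unfamiliar with that conversion step, the following is a rough standard-library sketch of what turning a gzipped .mtx triplet into a gene-by-cell TSV involves. It is not the mtx2tsv implementation; the function names, the `gene` header label, and the assumption of a general coordinate-format MatrixMarket file are mine:

```python
import gzip


def read_lines(path):
    """Return the non-empty lines of a gzipped text file."""
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n") for line in fh if line.strip()]


def mtx_to_tsv(mtx_path, features_path, barcodes_path, out_path):
    """Write a dense gene-by-cell TSV from a gzipped MatrixMarket triplet."""
    # The features file has tab-separated columns; use the first as the gene ID.
    genes = [line.split("\t")[0] for line in read_lines(features_path)]
    barcodes = read_lines(barcodes_path)

    rows = {}  # 0-based gene index -> {0-based cell index: value as text}
    with gzip.open(mtx_path, "rt") as fh:
        # Skip the banner and comment lines, which start with '%'.
        data = (line for line in fh if line.strip() and not line.startswith("%"))
        n_rows, n_cols, _n_entries = (int(x) for x in next(data).split())
        if (n_rows, n_cols) != (len(genes), len(barcodes)):
            raise ValueError("matrix dimensions do not match features/barcodes")
        for line in data:
            i, j, value = line.split()  # MatrixMarket indices are 1-based
            rows.setdefault(int(i) - 1, {})[int(j) - 1] = value

    with gzip.open(out_path, "wt") as out:
        out.write("gene\t" + "\t".join(barcodes) + "\n")
        for i, gene in enumerate(genes):
            row = rows.get(i, {})
            cells = (row.get(j, "0") for j in range(n_cols))
            out.write(gene + "\t" + "\t".join(cells) + "\n")
```

This version holds every non-zero entry in memory, so it is only meant to show the file layout; for matrices large enough to trigger too.big, a streaming tool is the better choice.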

@maximilianh
Owner

maximilianh commented Feb 11, 2023 via email

@RoganGrant
Author

Sorry, I should have waited to give more concrete examples. My object has three assays (counts, data, and scale). As far as I can tell cbBuild does not handle this correctly if a .mtx file is used. If I run cbBuild without any conversion, I initially get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '[path]/counts_exprMatrix.tsv.gz'

Full trace:

INFO:root:dataRoot is not set in ~/.cellbrowser.conf or via $CBDATAROOT. Dataset hierarchies are not supported.
INFO:root:Creating [path]
INFO:root:Determining if [path]/exprMatrix.tsv.gz needs to be created
INFO:root:[path]/exprMatrix.tsv.gz does not exist. Must build matrix now.
INFO:root:Creating [path]/metaFields
INFO:root:Checking and reordering meta data to [path]/meta.tsv
INFO:root:Reading sample names from [path]/meta.tsv
INFO:root:Reading headers from file [path]/counts_exprMatrix.tsv.gz
ERROR:root:Unexpected error: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x7f2afdea0f48>)
Traceback (most recent call last):
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4783, in cbBuildCli
    build(confFnames, outDir, port, redo=options.redo)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4598, in build
    convertDataset(inDir, inConf, outConf, datasetDir, redo)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3944, in convertDataset
    sampleNames, needFilterMatrix = convertMeta(inDir, inConf, outConf, datasetDir, outMetaFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3539, in convertMeta
    sampleNames, needFilterMatrix = metaReorder(matrixFname, metaFname, finalMetaFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 2296, in metaReorder
    matrixSampleNames = readMatrixSampleNames(matrixFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 2288, in readMatrixSampleNames
    return readHeaders(fname)[1:]
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3135, in readHeaders
    ifh = openFile(fname, "rtU")
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 807, in openFile
    fh = gzip.open(fname, mode, encoding=encoding)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '[path]/counts_exprMatrix.tsv.gz'

The initial cellbrowser.conf file is as follows:

# This is a bare-bones cellbrowser config file auto-generated by the command-line tool cbImportSeurat
# or directly from R with SeuratWrappers::ExportToCellbrowser().
# Look at https://github.com/maximilianh/cellBrowser/blob/master/src/cbPyLib/cellbrowser/sampleConfig/cellbrowser.conf
# for a full file that shows all possible options
name="name"
shortLabel="name"
exprMatrix="counts_exprMatrix.tsv.gz"
matrices=[ {'label':'counts','fileName':'counts_exprMatrix.tsv.gz'},
 {'label':'data','fileName':'data_exprMatrix.tsv.gz'},
 {'label':'scale','fileName':'scale_exprMatrix.tsv.gz'}]
#tags = ["10x", "smartseq2"]
meta="meta.tsv"
# possible values: "gencode-human", "gencode-mouse", "symbol" or "auto"
geneIdType="auto"
# file with gene,description (one per line) with highlighted genes, called "Dataset Genes" in the user interface
# quickGenesFile="quickGenes.csv"
clusterField="typestate"
labelField="typestate"
enumFields=["orig.ident", "HTO_maxID", "HTO_secondID", "HTO_classification", "HTO_classification.global", "hash.ID", "MULTI_ID", "MULTI_classification"$
markers = [{"file": "markers.tsv", "shortLabel": "Seurat Cluster Markers"}]
coords=[{"file": "umap.coords.tsv", "shortLabel": "Seurat umap"},
{"file": "SCVI.coords.tsv", "shortLabel": "Seurat SCVI"}]
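Since the first error comes from the conf pointing at a file that was never written, a quick pre-flight check can list every referenced file that is missing before cbBuild runs. The sketch below assumes the conf consists of plain Python-style `name = literal` assignments, as the sample above suggests; the helper names are hypothetical and this is not how cbBuild itself reads the file:

```python
import ast
import os


def load_conf(text):
    """Parse simple `name = literal` assignments into a dict, without executing code."""
    conf = {}
    for node in ast.parse(text).body:
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            conf[node.targets[0].id] = ast.literal_eval(node.value)
    return conf


def referenced_files(conf):
    """Collect the file names that the settings shown above point to."""
    names = [conf[key] for key in ("exprMatrix", "meta") if key in conf]
    names += [m["fileName"] for m in conf.get("matrices", [])]
    names += [m["file"] for m in conf.get("markers", [])]
    names += [c["file"] for c in conf.get("coords", [])]
    return list(dict.fromkeys(names))  # drop duplicates, keep order


def missing_files(conf_path):
    """Return referenced files that do not exist next to the conf file."""
    with open(conf_path) as fh:
        conf = load_conf(fh.read())
    base = os.path.dirname(os.path.abspath(conf_path))
    return [n for n in referenced_files(conf)
            if not os.path.exists(os.path.join(base, n))]
```

Calling `missing_files("cellbrowser.conf")` in the export directory would return the unresolved names. Note that the `enumFields` line as pasted here is cut off, so the conf text exactly as shown would not parse; the complete file on disk is what the check needs.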

If I modify the matrices and exprMatrix settings in cellbrowser.conf as follows (note that scale is a smaller matrix, so it is still written as a TSV):

exprMatrix="counts_matrix.mtx.gz"
matrices=[ {'label':'counts','fileName':'counts_matrix.mtx.gz'},
 {'label':'data','fileName':'data_matrix.mtx.gz'},
 {'label':'scale','fileName':'scale_exprMatrix.tsv.gz'}]

I run into a new error, where it seems cbBuild does not recognize the additional assays:

FileNotFoundError: [Errno 2] No such file or directory: '[path]/barcodes.tsv.gz'

Full trace:

INFO:root:dataRoot is not set in ~/.cellbrowser.conf or via $CBDATAROOT. Dataset hierarchies are not supported.
INFO:root:Determining if /var/www/apps/test/name/matrix.mtx.gz needs to be created
INFO:root:/var/www/apps/test/name/matrix.mtx.gz does not exist. Must build matrix now.
INFO:root:Checking and reordering meta data to /var/www/apps/test/name/meta.tsv
INFO:root:Reading sample names from [path]/meta.tsv
INFO:root:Reading sample names for [path] -> [path]/barcodes.tsv.gz
ERROR:root:Unexpected error: (<class 'FileNotFoundError'>, FileNotFoundError(2, 'No such file or directory'), <traceback object at 0x7fdd7ffa56c8>)
Traceback (most recent call last):
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4783, in cbBuildCli
    build(confFnames, outDir, port, redo=options.redo)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 4598, in build
    convertDataset(inDir, inConf, outConf, datasetDir, redo)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3944, in convertDataset
    sampleNames, needFilterMatrix = convertMeta(inDir, inConf, outConf, datasetDir, outMetaFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3539, in convertMeta
    sampleNames, needFilterMatrix = metaReorder(matrixFname, metaFname, finalMetaFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 2296, in metaReorder
    matrixSampleNames = readMatrixSampleNames(matrixFname)
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 2281, in readMatrixSampleNames
    lines = openFile(barcodePath).read().splitlines()
  File "/home/deploy/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 807, in openFile
    fh = gzip.open(fname, mode, encoding=encoding)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '[path]/barcodes.tsv.gz'

Finally, if I add additional fields to specify the naming structure, it seems to be ignored (but perhaps I am using the wrong arguments):

exprMatrix="counts_matrix.mtx.gz"
matrices=[ {'label':'counts','fileName':'counts_matrix.mtx.gz'},
 {'label':'data','fileName':'data_matrix.mtx.gz'},
 {'label':'scale','fileName':'scale_exprMatrix.tsv.gz'}]
barcodes=[ {'label':'counts','fileName':'counts_barcodes.tsv.gz'},
 {'label':'data','fileName':'data_barcodes.tsv.gz'}]
features=[ {'label':'counts','fileName':'counts_features.tsv.gz'},
 {'label':'data','fileName':'data_features.tsv.gz'}]

Same error:

FileNotFoundError: [Errno 2] No such file or directory: '[path]/barcodes.tsv.gz'
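The log line suggests cbBuild looked for an unprefixed barcodes.tsv.gz, while the exporter code quoted at the top of this issue writes prefixed names (prefix + "barcodes.tsv", then gzipped). A small diagnostic like the following can show which of those names actually exist. The helper names are hypothetical, and the naming rules are inferred only from the snippets and logs in this thread:

```python
import os


def exporter_sidecars(matrix_fname):
    """Names the exporter would write beside `<prefix>matrix.mtx.gz`."""
    suffix = "matrix.mtx.gz"
    if not matrix_fname.endswith(suffix):
        raise ValueError("not an exporter-style .mtx.gz name: " + matrix_fname)
    prefix = matrix_fname[: -len(suffix)]
    return [prefix + "barcodes.tsv.gz", prefix + "features.tsv.gz"]


def report(data_dir, matrix_fnames):
    """Map each candidate sidecar file to whether it exists in data_dir."""
    candidates = ["barcodes.tsv.gz"]  # unprefixed name seen in the cbBuild log
    for fname in matrix_fnames:
        if fname.endswith(".mtx.gz"):
            candidates += exporter_sidecars(fname)
    return {c: os.path.exists(os.path.join(data_dir, c)) for c in candidates}
```

For example, `report("[path]", ["counts_matrix.mtx.gz", "data_matrix.mtx.gz", "scale_exprMatrix.tsv.gz"])` would be expected to show the prefixed files as present and the unprefixed one as absent, which matches the error above.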

Thank you for your help with this.
