integer error for Read10X #4030

Ting-PKU · 2021-02-05T05:18:55Z

I want to load a large-scale single-cell dataset using Read10X, but I get the following error message:

Error in scan(file, nmax = 1, what = what, quiet = TRUE, ...) :
scan() expected 'an integer', got '2319108599'

Do you have any idea how to solve this problem?

saketkc · 2021-02-05T18:31:26Z

Are you able to read the mtx file on its own?

> library(Matrix)
> counts <- readMM("matrix.mtx")

AlexStewart25 · 2021-02-09T11:08:35Z

I'm having exactly the same problem (given the identical integer value, I suspect we are both looking at the new scCovid data from the Cell journal). I'm running @saketkc's fread suggestion on the matrix now; it's a 9 GB file but it seems to be working. How do you then integrate the barcodes and features once the counts are loaded? I get the following output:

Warning messages:
1: In fread("~/Data/matrix.mtx.gz") :
Detected 1 column names but the data has 3 columns (i.e. invalid file). Added 2 extra default column names at the end.
2: In setattr(ans, "row.names", .set_row_names(nr)) :
NAs introduced by coercion to integer range
<simpleError in dim.data.table(obj): long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522>

covid<-CreateSeuratObject(counts, project = "covid", min.cells = 3, min.features = 200)
Error in dim.data.table(x) :
long vectors not supported yet: ../../src/include/Rinlinedfuns.h:52

jonasns · 2021-02-10T07:57:53Z

I am also trying to load this dataset, and I found that the data structure is different from a normal 10X structure:

head GSE158055_covid19_counts.mtx
%%MatrixMarket matrix coordinate real general
%
1462702 27943 2319108599
1 18558 6.2000000e+01
1 18565 8.0000000e+00
1 18564 2.9000000e+01
1 18562 9.6000000e+01
1 18563 8.0000000e+00
1 18561 4.6000000e+01
1 18557 1.3600000e+02

normal 10X structure:
head matrix.mtx
%%MatrixMarket matrix coordinate integer general
%metadata_json: {"format_version": 2, "software_version": "3.0.2"}
58051 737280 10601056
41 2 1
3122 2 1
3125 2 3
3133 2 1
3506 2 1
6649 2 1
7271 2 1

I would suspect that this is the reason why the Read10X command doesn't work.

Ting-PKU · 2021-02-10T08:13:34Z

Yes, the data structure is different from the 10X structure. Also, the dataset is too big: the number of entries (2319108599) is beyond the limit of 2147483647. Therefore, I split it into several parts and swapped the first and second columns. Finally, it worked.
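
The limit quoted above, 2147483647, is 2^31 - 1, the largest value of a signed 32-bit integer. As a minimal sketch (assuming an uncompressed Matrix Market file; the function name `mtx_nnz_check` is made up for this illustration), the size line can be checked before trying to load the file:

```shell
# Print the declared size of a Matrix Market file and whether the number of
# entries fits in a signed 32-bit integer. Hypothetical helper, not part of
# Seurat or Matrix. Reads stdin when no file is given.
mtx_nnz_check() {
  awk '
    /^%/ { next }                 # skip the banner and comment lines
    {
      limit = 2147483647          # 2^31 - 1
      verdict = ($3 + 0 > limit) ? "exceeds" : "fits"
      printf "rows=%s cols=%s nnz=%s %s 32-bit limit\n", $1, $2, $3, verdict
      exit                        # only the size line is needed
    }
  ' "${1:--}"
}
```

For a compressed file, something like `gzip -dc matrix.mtx.gz | mtx_nnz_check` should work, since only the first few lines are read.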

AlexStewart25 · 2021-02-10T08:19:03Z

Any chance you could post the script you used @Ting-PKU ?

Ting-PKU · 2021-02-10T09:21:43Z

#remove header (first three lines) and split into two parts
sed -n '4,1159554303p' matrix.mtx |awk '{print $2"\t"$1"\t"$3}' > part1
sed -n '1159554304,2319108602p' matrix.mtx |awk '{print $2"\t"$1"\t"$3}' > part2
#add header to each part
part1:
%%MatrixMarket matrix coordinate real general
%
27943 1462702 1159554300
part2:
%%MatrixMarket matrix coordinate real general
%
27943 1462702 1159554299
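
The same steps can be sketched as one function that swaps the columns, splits the entries into a chosen number of parts, and writes each header from the number of entries actually written. This is only an illustration under assumptions: the function name `split_mtx`, the `.body`/`.head` temporary files, and the hard-coded `real` banner are not from the thread, and the declared entry count is assumed to match the data.

```shell
# Usage: split_mtx IN.mtx N_PARTS OUT_PREFIX
# Writes OUT_PREFIX1.mtx ... OUT_PREFIXN.mtx, each with columns 1 and 2
# swapped and a header carrying the full dimensions. Hypothetical helper.
split_mtx() {
  src=$1; parts=$2; prefix=$3
  awk -v parts="$parts" -v prefix="$prefix" '
    /^%/ { next }                              # drop banner and comments
    !seen {                                    # size line: rows cols nnz
      seen = 1; nr = $1; nc = $2
      per = int(($3 + parts - 1) / parts)      # entries per part, rounded up
      next
    }
    {
      k = int(n / per) + 1; n++                # part this entry belongs to
      if (k != cur) { if (cur) close(prefix cur ".body"); cur = k }
      print $2 "\t" $1 "\t" $3 > (prefix k ".body")
      count[k]++
    }
    END {
      for (k = 1; k <= parts; k++) if (count[k] > 0) {
        head = prefix k ".head"
        print "%%MatrixMarket matrix coordinate real general" > head
        print "%" > head
        printf "%s %s %d\n", nc, nr, count[k] > head   # swapped dimensions
        close(head)
      }
    }
  ' "$src" || return 1
  k=1
  while [ "$k" -le "$parts" ]; do              # prepend each header
    if [ -f "$prefix$k.head" ]; then
      cat "$prefix$k.head" "$prefix$k.body" > "$prefix$k.mtx" &&
        rm -f "$prefix$k.head" "$prefix$k.body"
    fi
    k=$((k + 1))
  done
}
```

With 2319108599 entries and two parts this gives 1159554300 and 1159554299 entries, the same counts as in the headers above. Note that the temporary body files need roughly as much free disk space as the input.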

xiaotong-666 · 2021-02-12T13:58:41Z

Hello, does barcodes.tsv also need to be split into two corresponding parts? Thank you very much! @Ting-PKU

Ting-PKU · 2021-02-12T14:55:23Z

No, that is not necessary.

xiaotong-666 · 2021-02-12T14:57:45Z

Thanks! @Ting-PKU

saketkc · 2021-02-13T18:27:52Z

The number of non-zero entries in this dataset exceeds the integer limit of R, making it impossible to create a dgCMatrix. You can split the matrix file into two (or more) separate mtx files as done above and proceed. This will also require you to split the barcodes accordingly.

I have created an anndata object of the complete matrix here.

suyanxun · 2021-02-26T01:34:35Z

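Hi OP, I split the file into two parts following your commands, but I ran into a different error when reading them:

Error in [.data.frame(feature.names, , gene.column) :
undefined columns selected

Have you run into a similar problem?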

Ting-PKU · 2021-02-26T02:16:03Z

When you use the Read10X() function, add gene.column = 1, because the features.tsv file has only one column.

suyanxun · 2021-02-26T08:15:25Z

Awesome 👍, thanks a lot!

GAgafencu · 2021-03-22T21:16:48Z

@saketkc thank you for helping with the issue raised by Ting. I'm currently using the anndata object you generated from the dataset that Ting was trying to load. Could you please let me know how you generated it, starting from the data available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158055 ? I'm a novice at manipulating single-cell RNA-seq data in Python and your insights would be greatly appreciated.

Bon-jour · 2021-03-27T08:41:54Z

Thank you for your proposal. I have another question: how do I read part1 and part2 with Read10X?

Bon-jour · 2021-03-27T08:42:14Z

@Ting-PKU

Ting-PKU · 2021-03-28T08:36:49Z

> Thank you for your proposal. I have another question: how do I read part1 and part2 with Read10X?

You have to follow the layout that the Read10X() function expects, that is, organize the two matrices as
part1/
  barcodes.tsv.gz
  features.tsv.gz
  matrix.mtx.gz
part2/
  barcodes.tsv.gz
  features.tsv.gz
  matrix.mtx.gz
Then read them in R with Read10X("dir/to/part1", gene.column = 1) or Read10X("dir/to/part2", gene.column = 1).
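
The directory layout described above can be assembled with a few commands. A minimal sketch, assuming each part already has its three-line header and that the same barcodes and features files are reused for every part (as stated earlier in the thread, since each part's header keeps the full dimensions); the function name `make_10x_dir` is made up for this illustration:

```shell
# Usage: make_10x_dir PART.mtx BARCODES.tsv FEATURES.tsv OUTDIR
# Compresses the three inputs into OUTDIR under the file names that the
# thread says Read10X() expects. Hypothetical helper; inputs are left intact.
make_10x_dir() {
  mtx=$1; barcodes=$2; features=$3; outdir=$4
  mkdir -p "$outdir" || return 1
  gzip -c "$mtx"      > "$outdir/matrix.mtx.gz"   || return 1
  gzip -c "$barcodes" > "$outdir/barcodes.tsv.gz" || return 1
  gzip -c "$features" > "$outdir/features.tsv.gz" || return 1
}
```

For example, `make_10x_dir part1.mtx barcodes.tsv features.tsv part1` followed by the same call for part2 would produce the two directories, where `part1.mtx` and `part2.mtx` are the headed part files.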

Bon-jour · 2021-03-28T09:56:20Z

Thank you for your reply 🌹

StevenDaChui · 2021-04-21T03:00:18Z

I used the split script posted above by @Ting-PKU (the sed/awk commands plus the two headers) on Linux, but it didn't work. Could you please explain this code again? I'm new to Linux. Thank you very much!

Ting-PKU · 2021-04-23T11:52:38Z

> I used this code on Linux but it didn't work. Could you please explain this code again? I'm new to Linux. Thank you very much!

Maybe you can paste your code and the error output here, then we can help figure it out.

SeZuNo · 2021-05-14T08:56:54Z

Hi there! Is it possible to look at this issue again? Especially you, @Ting-PKU, since you seem to have a handle on the problem 👍
Once the matrix has been split and I try to use Read10X, I get the error in readMM() with an additional warning: "In scan(file, nmax = nz, quiet = TRUE, what = list(i = integer(), : number of items read is not a multiple of the number of columns" (same as in #2946).

For part1: there are 1159554300 entries indicated in the matrix header and there are 1159554303 lines in the file.
For part2: there are 1159554299 entries in the header, whereas the file has 1159554302 lines. That seems to make sense given that each part has 3 lines of header in Matrix Market format.

In one of the answers @saketkc mentioned that the other files (barcodes and features) should be split "accordingly". Is this split taken into account when splitting the matrix? If so, why were only the entries modified and not the features (27943) or barcodes (1462702)?

This is my first use of the Seurat Package and any pointers are greatly appreciated :)
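
The line counts above are consistent (entries + 3 header lines), so a warning about items not being a multiple of the number of columns may point to individual lines that do not have exactly three fields. A minimal sketch for checking both things in one pass; the function name `mtx_count_check` is made up for this illustration and an uncompressed file (or stdin) is assumed:

```shell
# Compare the entry count declared in the size line with the number of data
# lines, and count data lines that do not have exactly three fields.
# Hypothetical helper. %.0f is used because some awk builds may clip %d
# for values above the 32-bit range.
mtx_count_check() {
  awk '
    /^%/ { next }                                  # banner and comments
    !seen { seen = 1; declared = $3; next }        # size line
    { found++; if (NF != 3) malformed++ }          # data lines
    END {
      printf "declared=%s found=%.0f malformed=%.0f\n", declared, found, malformed
    }
  ' "${1:--}"
}
```

A healthy file should report `found` equal to `declared` and `malformed=0`; anything else suggests a truncated or damaged part.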

Yale73 · 2021-05-21T02:28:01Z

Hi @saketkc,

Your anndata object is really helpful. But it will be great if you can match your sample ID to their raw sample ID, then we can subset cells we are interested in.
I checked your sample IDs are d01_sample_A_xxx, d01_sample_B_xxx, d01_sample_C_xxx, ..., d02_200114A_12_xxx, d02_200114B_23_xxx, ..., d17_9_xxx , while their sample IDs are S-HC001, S-HC002, etc

Thanks,
Yale

li-xuyang28 · 2021-06-03T18:24:00Z

> The number of non-zero entries in this dataset exceeds the integer limit of R making it impossible to make a dgCMatrix. You can either split the matrix file into two (or more) separate mtx files as done above and proceed. This will also require you to split the barcodes accordingly.
>
> I have created an anndata object of the complete matrix here.

Hi,

Thank you for sharing the h5ad file. Could you please provide some guidance on how to load it into Seurat? I tried to follow the tutorial from SeuratDisk:

Convert("covid19.h5ad", dest = "h5seurat", overwrite = TRUE)
so <- LoadH5Seurat("covid19.h5seurat")

And got the following error:

Validating h5Seurat file
Initializing RNA with data
Error in if ((lp <- length(p)) < 1 || p[1] != 0 || any((dp <- p[-1] -  : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In sparseMatrix(i = x[["indices"]][] + 1, p = x[["indptr"]][], x = x[["data"]][],  :
  NAs introduced by coercion to integer range

Best,

saketkc · 2021-06-03T18:41:19Z

Sorry, the current R limitations make it impossible to load the entire dataset in R.

MitsuhaMiyamizu · 2021-06-16T09:34:11Z

> And got the following error:
> Validating h5Seurat file
> Initializing RNA with data
> Error in if ((lp <- length(p)) < 1 || p[1] != 0 || any((dp <- p[-1] -  :
>   missing value where TRUE/FALSE needed
> In addition: Warning message:
> In sparseMatrix(i = x[["indices"]][] + 1, p = x[["indptr"]][], x = x[["data"]][],  :
>   NAs introduced by coercion to integer range

Note that the warning says "NAs introduced by coercion to integer range", which suggests you have hit the integer limit of R. In case you wonder what that limit is, see the discussion here:

https://stackoverflow.com/questions/28398007/support-for-bit-vector-size-limit-231-1-in-r

wanghao98 · 2021-12-04T19:53:05Z

Hello, I just want to follow up on the splitting method. After we finish splitting, how should we combine the two parts? Thanks a lot!

Lilab-SYSU · 2022-02-14T08:39:40Z

I split the large dataset into two parts. Part 1 can be read into R correctly using Read10X, but part 2 cannot: its expression matrix is entirely zero. Does anyone have the same problem?

clc37 · 2022-03-20T09:26:44Z

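@Ting-PKU Hi OP, I'm a beginner. I split the file into two parts following your commands, but the resulting part1 and part2 files are not in the right format; they are not mtx files. The output is as follows:

sed -n '1159554304,2319108602p' GSE158055_covid19_counts.mtx |awk '{print $2"\t"$1"\t"$3}' > part2
head part2
712621 22373 10
712621 13908 1
712621 5156 1
712621 17251 1
712621 17732 1
712621 8376 1
I hope you can help. Thank you!

2022.03.21
I understand now; my original question has been resolved.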

Lilab-SYSU · 2022-03-20T09:28:48Z

I couldn't get the split to work either. I read the data with the scanpy Python package and then split it by sample. I was not able to read it into R.

Ting-PKU · 2022-03-21T12:56:19Z

> @Ting-PKU Hi OP, I'm a beginner. I split the file into two parts following your commands, but the resulting part1 and part2 files are not in the right format; they are not mtx files.
>
> 2022.03.21 I understand now; my original question has been resolved.

👍 The header was probably missing.

clc37 · 2022-03-22T13:20:28Z

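OP @Ting-PKU, sorry to bother you again. Following your method I split the data into three parts and read them successfully, but a later analysis step in R failed with this error: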
r$> all.genes <- rownames(sce1)
r$> sce1 <- ScaleData(sce1, features = all.genes)
Centering and scaling data matrix
| 0%Error in (function (mat, scale = TRUE, center = TRUE, scale_max = 10, :
std::bad_alloc
Does this still mean that memory was exceeded? Does that mean the analysis can only be done in Python? It also failed when I tried scanpy:
r$>adata = sc.read_10x_mtx('./GSE158055/', var_names='gene_symbols', cache=True)
Traceback (most recent call last):
File "", line 1, in
File "/home/user03/clc/app/miniconda3/envs/python3.8/lib/python3.8/site-packages/scanpy/readwrite.py", line 481, in read_10x_mtx
adata = read(
File "/home/user03/clc/app/miniconda3/envs/python3.8/lib/python3.8/site-packages/scanpy/readwrite.py", line 545, in _read_v3_10x_mtx
adata = read(
File "/home/user03/clc/app/miniconda3/envs/python3.8/lib/python3.8/site-packages/scanpy/readwrite.py", line 112, in read
return _read(
File "/home/user03/clc/app/miniconda3/envs/python3.8/lib/python3.8/site-packages/scanpy/readwrite.py", line 728, in _read
raise FileNotFoundError(f'Did not find file {filename}.')
FileNotFoundError: Did not find file /home/user03/clc/Yip/sepsis/GSE158055/first_analysis/0/matrix.mtx.gz.

clc37 · 2022-03-23T12:06:06Z

> @saketkc Your anndata object is really helpful. But it will be great if you can match your sample ID to their raw sample ID, then we can subset cells we are interested in.
> I checked your sample IDs are d01_sample_A_xxx, d01_sample_B_xxx, d01_sample_C_xxx, ..., d02_200114A_12_xxx, d02_200114B_23_xxx, ..., d17_9_xxx , while their sample IDs are S-HC001, S-HC002, etc.

Or could you please explain this code? Thanks.

aditya-sarkar441 · 2022-03-25T20:40:05Z

Is this the .obs of the adata attached above ? Can someone confirm ? @saketkc

saketkc · 2022-03-25T21:14:05Z

The data is also downloadable from here: http://covid19.cancer-pku.cn/#/metadata

saketkc · 2022-03-25T21:15:13Z

See this gist for a discussion about the metadata: https://gist.github.com/saketkc/31f26c71d61dbe1cf8bb9cff6af5d04f

aditya-sarkar441 · 2022-03-26T03:20:57Z

Thanks @saketkc . That's helpful. I was looking for this only.

Diennguyen8290 · 2022-04-04T19:02:12Z

> The data is also downloadable from here: http://covid19.cancer-pku.cn/#/metadata

@saketkc: I'm trying this one and could not read the provided h5ad object (the details can be seen here: scverse/anndata#753). May I ask whether you have had a chance to work with this object? Thanks.

Zjianglin · 2023-03-09T06:54:50Z

Hi @Ting-PKU , I split the counts.mtx.gz into two parts like you. However, I cannot load them into R environment, below is the error output:

 exp_matrix1 <- Read10X(file.path(wkdir, 'part1'), gene.column = 1)
Error: readMM(): row	 values 'i' are not in 1:nr

Here is the data content in part1:

$ ll
-rw-rw-r-- 1 zjl zjl    30306249 Mar  9 08:37 barcodes.tsv
-rw-rw-r-- 1 zjl zjl      221641 Mar  9 08:36 features.tsv
lrwxrwxrwx 1 zjl zjl          12 Mar  9 11:55 genes.tsv -> features.tsv
-rw-rw-r-- 1 zjl zjl 17046103030 Mar  8 22:17 matrix.mtx

$ head matrix.mtx 
%%MatrixMarket matrix coordinate real general
%
27943 1462702 1159554300
1	18558	62
1	18565	8
1	18564	29
1	18562	96
1	18563	8
1	18561	46
1	18557	136

$ wc -l *.tsv
 1462702 barcodes.tsv
   27943 features.tsv

Could you please help me? Did I do something wrong or miss some necessary steps? Thanks.
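
One thing that stands out in the `head` output above is that the entries are still in the original column order (`1 18558 ...`), while the header declares 27943 rows. If the first column is the cell index, it will eventually exceed 27943, which would be consistent with the readMM() message. A minimal sketch for locating the first entry whose indices fall outside the declared dimensions; the function name `mtx_range_check` is made up for this illustration:

```shell
# Report the first data line whose row or column index lies outside the
# dimensions declared in the size line. Hypothetical helper; returns
# non-zero when such a line is found.
mtx_range_check() {
  awk '
    /^%/ { next }                                       # banner and comments
    !seen { seen = 1; nr = $1 + 0; nc = $2 + 0; next }  # size line
    ($1 + 0 < 1 || $1 + 0 > nr || $2 + 0 < 1 || $2 + 0 > nc) {
      printf "line %.0f out of range for %d x %d: %s\n", NR, nr, nc, $0
      bad = 1
      exit                                              # stop at first hit
    }
    END {
      if (!bad) print "all indices within declared dimensions"
      exit bad + 0
    }
  ' "${1:--}"
}
```

If this reports a first-column value larger than the declared row count, the columns were probably not swapped to match the swapped header.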

jianrong0520 · 2023-07-28T09:41:44Z

Hi Zjianglin, did you solve your problem? I have the same error as you. If you solved it, please give me some help with this problem.

zochzh · 2023-09-05T10:49:34Z

Did you solve your problem? I have the same error as you. I wonder if I could get your help?

xiangmingcai · 2023-10-05T19:52:17Z

I more or less solved this problem. The file is too large for me to handle, and I only need a subset of the data, so I split the .mtx file into hundreds of smaller .mtx files that I can process in batches in R.
Here is my code:
# split the file by lines

split -l 5000000 -d --verbose /media/sf_share/filtered_feature_bc_matrix/GSE158055_covid19_counts.mtx counts.mtx

# rename the split files (move the numeric suffix in front of .mtx)

for i in `ls|grep counts`; do a=`echo $i|awk -F '.mtx' '{print $1$2".mtx"}'`; mv $i $a; done

# correct the first file (it already has the header, so only fix the entry count)

head counts00.mtx
sed -i '3c 27943 1462702 4999997' counts00.mtx
head counts00.mtx

# correct the last file (it is shorter than the others)

wc -l counts9373.mtx

head counts9373.mtx
sed -i '1i\%%MatrixMarket matrix coordinate integer general\
%\
27943 1462702 4108602' counts9373.mtx
head counts9373.mtx

# move counts00.mtx and counts9373.mtx to another folder, then add a header to all other files

for file in `ls`
	do
		echo $file
		sed -i '1i\%%MatrixMarket matrix coordinate integer general\
%\
27943 1462702 5000000' $file
	done

bellayqian · 2024-01-14T07:38:52Z

Hi everyone, sorry to bother you, but I have a new problem related to this question.
I followed these instructions exactly, but I still get this error. This time, the value reported after "got" comes from the first line of the matrix.mtx.gz file. Please take a look at my code and the error:

This is the original matrix.mtx file:

head matrix.mtx
%%MatrixMarket matrix coordinate real general
%
45107 100940 2447013140

Here is my code to separate matrix.mtx file to part 1 and part 2:

Remove header (first three lines) and split into two parts

sed -n '4,1223506572p' matrix.mtx |awk '{print $2"\t"$1"\t"$3}' > part1
sed -n '1223506573, 2447013140p' matrix.mtx |awk '{print $2"\t"$1"\t"$3}' > part2

Add header to each part

# part1:
echo "%%MatrixMarket matrix coordinate real general" > temp_header
echo "%" >> temp_header
echo "100940 45107 1223506570 1223506569" >> temp_header
cat temp_header part1.mtx > part1_new.mtx
# check head of part1.mtx and part1_new.mtx
mv part1_new.mtx matrix.mtx
rm temp_header part1.mtx

# part2:
echo "%%MatrixMarket matrix coordinate real general" > temp_header
echo "%" >> temp_header
echo "100940 45107 1223506570 1223506568" >> temp_header
cat temp_header part2.mtx > part2_new.mtx
mv part2_new.mtx matrix.mtx
rm temp_header part2.mtx

Here is the error I got:

part1 <- Read10X(data.dir = './part1/', gene.column = 1)
Error in scan(file, nmax = nz, quiet = TRUE, what = list(i = integer(), :
scan() expected 'an integer', got '8.1000000e-02'
part2 <- Read10X(data.dir = './part2/', gene.column = 1)
Error in scan(file, nmax = nz, quiet = TRUE, what = list(i = integer(), :
scan() expected 'an integer', got '4.1600001e-01'

The `readMM` function gives exactly the same error:

counts <- readMM("part2/matrix.mtx.gz")
Error in scan(file, nmax = nz, quiet = TRUE, what = list(i = integer(), :
scan() expected 'an integer', got '4.1600001e-01'

Thank you so much for your help!!
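
One detail in the script above may be worth checking: the size line written to `temp_header` has four numbers, whereas the Matrix Market coordinate format expects exactly three (rows, columns, entries). An extra number would shift every following value by one position, which would be consistent with a decimal value such as 8.1000000e-02 turning up where an integer index is expected. The second `sed` range also ends at line 2447013140, although, following the earlier example in this thread, the data would extend three lines further. A minimal sketch that avoids this arithmetic by counting the lines of the part actually written; the function name `mtx_add_header` is made up for this illustration, and the body is assumed to end with a newline:

```shell
# Usage: mtx_add_header BODY ROWS COLS OUT
# Prepends a three-line Matrix Market header to a headerless body file,
# taking the entry count from the body itself. Hypothetical helper.
mtx_add_header() {
  body=$1; rows=$2; cols=$3; out=$4
  nnz=$(wc -l < "$body" | tr -d ' ')          # one entry per line
  {
    printf '%%%%MatrixMarket matrix coordinate real general\n'
    printf '%%\n'
    printf '%s %s %s\n' "$rows" "$cols" "$nnz"   # exactly three fields
    cat "$body"
  } > "$out"
}
```

For example, `mtx_add_header part1 100940 45107 part1/matrix.mtx` would write the header and data in one step, assuming the `part1` directory already exists and 100940 and 45107 are the intended swapped dimensions.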


CroixJeremy2 · 2024-02-09T12:40:55Z

Hello everyone,

I had a relatively similar problem when trying to read a scRNA-seq dataset in R (GEO accession numbers GSE242615, the data can be downloaded here). It was not so tricky to solve the problem in the end, but for new R and UNIX users (which is my case), it could be difficult and frightening, so I put here the whole situation that I dealt with hoping it could help other people in the future :)

I first downloaded the files and had to rename them for Read10X():
- GSM7766340_barcodes.tsv.gz -> barcodes.tsv.gz
- GSM7766340_genes.tsv.gz -> features.tsv.gz
- GSM7766340_matrix.mtx.gz -> matrix.mtx.gz

Otherwise, I get errors like this:
Error in Read10X(data.dir = "~/Downloads/GSE242615_RAW"): Barcode file missing. Expecting barcodes.tsv.gz

I then tried to read the files with:
counts_fail_1 = Read10X(data.dir = "~/Downloads/GSE242615_RAW")

But I got the following error messages:
Error: readMM(): column indices 'j' are not in 1:ncol[=4115]
In addition: Warning messages:
1: In scan(file, nmax = nz, quiet = TRUE, what = list(i = integer(), : number of items read is not a multiple of the number of columns
2: readMM(): expected 7693754 entries but found only 6682659

So, I went online and found this issue (integer error for Read10X #4030), as well as these forums that helped (here and here). I decompressed the matrix.mtx.gz file and looked at what was inside using a text editor. It looked like this at the beginning of the file:

Therefore, according to line 3, matrix.mtx should contain 7693754 entries. However, it looked like this at the end of the file:

So, the file contains 6682662 lines, which is very close to the number of entries reported in the second warning message. In fact, if you subtract the first 2 header lines and the last incomplete line, you get exactly the 6682659 entries reported in that warning.

Therefore, I modified the matrix.mtx file. 7693754 became 6682659, and I removed the last incomplete line.
I saved the matrix.mtx file, and compressed it with the command gzip matrix.mtx in a Terminal (sorry I don't know how to generate .gz compressed files in Windows, but you should be able to find it online elsewhere).
Read the folder in R with Read10X(data.dir = "~/Downloads/GSE242615_RAW_modified"). And it works like a charm
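
The manual edit described above can also be sketched as a script, which may be easier than a text editor for large files. This is only an illustration: the function name `mtx_repair_truncated` is made up, and it can only detect a last line that is missing a field; a line cut off in the middle of its third value would still look complete.

```shell
# Usage: mtx_repair_truncated IN.mtx OUT.mtx
# Keeps the banner/comment lines, drops data lines that do not have exactly
# three fields, and rewrites the entry count in the size line to match.
# Hypothetical helper for an uncompressed file.
mtx_repair_truncated() {
  src=$1; out=$2
  body=$(mktemp) || return 1
  # Print "rows cols" from the size line; send complete entries to $body.
  dims=$(awk -v body="$body" '
    /^%/ { next }
    !seen { seen = 1; print $1, $2; next }
    NF == 3 { print > body }
  ' "$src")
  nnz=$(wc -l < "$body" | tr -d ' ')
  {
    grep '^%' "$src"                      # original banner and comments
    printf '%s %s\n' "$dims" "$nnz"       # corrected size line
    cat "$body"
  } > "$out"
  rm -f "$body"
}
```

After that, `gzip` the result as matrix.mtx.gz as described above. The original file is left untouched, so the two can be compared before anything is deleted.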

Remark: I hope the authors of the article did generate the dataset correctly and uploaded the raw files properly, but I suspect that something went wrong in the process because why 7693754 lines were expected but only 6682659 lines were present in the file? Plus, why was the last line incomplete? I guess we will never know.

I hope it will help you all.
Best regards,

Marwansha · 2024-03-06T10:54:05Z

Hey, I have the same error when reading the matrix. I split the matrix into 3 different parts, but I can't merge them back. Any advice on this?

x=readMM("/pasteur/zeus/projets/p02/evo_immuno_pop/single_cell/project/CZI_AfricanCellAtlas/data/GEX/11__cell_type_assignment_full/untitled_folder/counts_v2.mtx")
Error in scan(file, nmax = 1, what = what, quiet = TRUE, ...) :
scan() expected 'an integer', got '2267347504'

The head of the file has the format that people here mentioned is problematic:
(base) [masharaw@maestro-submit ~]$ head /pasteur/zeus/projets/p02/evo_immuno_pop/single_cell/project/CZI_AfricanCellAtlas/data/GEX/11__cell_type_assignment_full/untitled_folder/counts_v2.mtx
%%MatrixMarket matrix coordinate real general
%
697321 14329 2267347504
1 3 1
1 5 6
1 6 1.05E2
1 7 1
1 9 2
1 10 3
1 11 3

I split the matrix into 3 files (m1, m2, m3), but I still can't merge them. Do you know of a way to do this?

matrices_123 <- rbind(m1, m2, m3)
Error in .rbind2Csp(x, y) :
Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 89

saketkc added the more-information-needed label Feb 5, 2021

no-response bot removed the more-information-needed label Feb 10, 2021

satijalab deleted a comment from xiaotong-666 Feb 13, 2021

saketkc closed this as completed Feb 13, 2021

daccachejoe mentioned this issue Apr 9, 2021

SplitObject induces integer(0) in some RNA assays #4346

Closed

SpiffyLab mentioned this issue May 24, 2023

Integer error Read10X and ReadMM #7374

Closed

Tijs-dot mentioned this issue Sep 12, 2023

Preparing COVID data for loading in R noobCoding/Benchmarking-integration-of-differential-expression#1

Open

bellayqian mentioned this issue Jan 14, 2024

integer error for Read10X no.2 #8320

Open
