integer error for Read10X #4030
Comments
Are you able to read the
|
I'm having exactly the same problem (given the identical integer value, I suspect we are both looking at the new scCovid data from the Cell journal). I'm running @saketkc's fread on the matrix now; it's a 9 GB file but seems to be working. How do you then integrate the barcodes and features once the counts are loaded? You get the following outputs: Warning messages: covid<-CreateSeuratObject(counts, project = "covid", min.cells = 3, min.features = 200) |
I am also trying to load this dataset, and I found that the data structure is different from a normal 10X structure: head GSE158055_covid19_counts.mtx normal 10X structure: I would suspect that this is the reason why the Read10X command doesn't work. |
Yes, the data structure is different from the 10X structure. Also, the dataset is too big: its size is beyond the integer limit of 2147483647. Therefore, I split it into several parts and swapped the first and second columns. Finally, it worked. |
Any chance you could post the script you used @Ting-PKU ? |
#remove header (first three lines) and split into two parts |
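A minimal Python sketch of the split described above, under stated assumptions: it is not the poster's original shell script, the function name `split_mtx` and the output file naming are hypothetical, and the input is assumed to be a MatrixMarket coordinate file whose size line follows the `%` header lines. Each part keeps the full matrix dimensions and only its entry count changes.

```python
import os
import tempfile


def split_mtx(src, out_prefix, n_parts, swap_columns=False):
    """Split a MatrixMarket coordinate file into n_parts files, each with its own header.

    split_mtx and its arguments are hypothetical names used for this sketch.
    """
    paths = []
    with open(src) as fh:
        banner = fh.readline()            # "%%MatrixMarket matrix coordinate ..."
        comments = []
        line = fh.readline()
        while line.startswith("%"):       # optional comment lines
            comments.append(line)
            line = fh.readline()
        n_rows, n_cols, n_entries = (int(x) for x in line.split())
        if swap_columns:                  # swapping columns also swaps the dimensions
            n_rows, n_cols = n_cols, n_rows
        base, extra = divmod(n_entries, n_parts)
        for part in range(n_parts):
            count = base + (1 if part < extra else 0)
            path = f"{out_prefix}{part + 1}.mtx"
            with open(path, "w") as out:
                out.write(banner)
                out.writelines(comments)
                out.write(f"{n_rows} {n_cols} {count}\n")   # per-part entry count
                for _ in range(count):
                    i, j, value = fh.readline().split()
                    if swap_columns:
                        i, j = j, i
                    out.write(f"{i} {j} {value}\n")
            paths.append(path)
    return paths


# Tiny demonstration: a 3 x 4 matrix with 5 non-zero entries, split into 2 parts.
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, "counts.mtx")
with open(demo_src, "w") as fh:
    fh.write("%%MatrixMarket matrix coordinate integer general\n%\n3 4 5\n")
    fh.write("1 1 7\n2 1 1\n3 2 4\n1 3 2\n2 4 9\n")

demo_parts = split_mtx(demo_src, os.path.join(demo_dir, "part"), n_parts=2)
for p in demo_parts:
    with open(p) as fh:
        print(os.path.basename(p), fh.read().splitlines()[2])
```

Because every part keeps the full dimensions and holds a disjoint set of entries, the original matrix is the element-wise sum of the parts rather than a row-wise or column-wise concatenation.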
Hello, does barcodes.tsv also need to be split into the corresponding two parts? Thank you very much! @Ting-PKU |
No, it doesn't. |
Thanks! @Ting-PKU |
The number of non-zero entries in this dataset exceeds the integer limit of R making it impossible to make a I have created a |
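Hi OP, I split the file into two following your commands, but I ran into another error when reading it: Error in Did you run into a similar problem? |
When you use the Read10X() function, you should add gene.column = 1, since the features.tsv file has only one column. |
Awesome 👍, thanks a lot |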
@saketkc thank you for helping with the issue raised by Ting. I'm currently using the anndata object you generated from the dataset that Ting was trying to load. Could you please let me know how you generated it, starting from the data available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158055 ? I'm a novice at manipulating single-cell RNA-seq data in Python and your insights would be greatly appreciated. |
Thank you for your proposal. I have another question: how do I read part1 and part2 with Read10X? |
You have to follow the requirements of the Read10X() function, that is, organize the two matrices as |
Thank you for your reply 🌹
On Sun, Mar 28, 2021, Ting wrote:
Thank you for your proposal. I have another question, How to read part1 and part2 with Read10X?
You have to follow the requirement of Read10X() function. That is organizing the two matrices as
part1/
  barcodes.tsv.gz
  features.tsv.gz
  matrix.mtx.gz
part2/
  barcodes.tsv.gz
  features.tsv.gz
  matrix.mtx.gz
Then read them in R with Read10X("dir/to/part1", gene.column = 1) or Read10X("dir/to/part2", gene.column = 1)
|
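The part1/part2 directory layout described in the reply above could be assembled with a short Python sketch. The helper name `make_read10x_dir` and the input file names are hypothetical; the sketch assumes, as stated earlier in the thread, that both parts share the same unsplit barcodes and features files. It only arranges and gzip-compresses files; reading still happens in R with Read10X(..., gene.column = 1).

```python
import gzip
import os
import shutil
import tempfile


def make_read10x_dir(out_dir, matrix, barcodes, features):
    """Copy three plain-text files into out_dir under the gzip names Read10X looks for.

    make_read10x_dir is a hypothetical helper name used for this sketch.
    """
    os.makedirs(out_dir, exist_ok=True)
    targets = {
        "matrix.mtx.gz": matrix,
        "barcodes.tsv.gz": barcodes,
        "features.tsv.gz": features,
    }
    for name, src in targets.items():
        with open(src, "rb") as fin, gzip.open(os.path.join(out_dir, name), "wb") as fout:
            shutil.copyfileobj(fin, fout)   # stream, so large files are not held in memory
    return sorted(os.listdir(out_dir))


# Demonstration with placeholder files (contents are made up for the sketch).
work = tempfile.mkdtemp()
placeholders = [
    ("part1.mtx", "%%MatrixMarket matrix coordinate integer general\n2 2 1\n1 1 5\n"),
    ("barcodes.tsv", "cellA\ncellB\n"),
    ("features.tsv", "gene1\ngene2\n"),
]
for fname, text in placeholders:
    with open(os.path.join(work, fname), "w") as fh:
        fh.write(text)

layout = make_read10x_dir(
    os.path.join(work, "part1"),
    os.path.join(work, "part1.mtx"),
    os.path.join(work, "barcodes.tsv"),
    os.path.join(work, "features.tsv"),
)
print(layout)
```

The same call would be repeated for part2 with the second matrix file and the same barcodes and features files.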
I used this code on Linux but it didn't work. Could you please explain this code again? I'm new to Linux. Thank you very much! |
Maybe you can paste your code and error reporting here then we can help to figure it out. |
Hi there! Is it possible to revisit this issue, especially you @Ting-PKU? It seems like you got a handle on the problem 👍 For part1: there are 1159554300 entries indicated in the matrix header and 1159554303 lines in the file. In one of the answers @saketkc mentioned that the other files (barcodes and features) should be "split accordingly". Is this split taken into account when splitting the matrix? If so, why were only the entries modified and not the features (27943) or barcodes (1462702)? This is my first use of the Seurat package and any pointers are greatly appreciated :) |
Hi @saketkc, Your Thanks, |
Hi, Thank you for sharing the h5ad file, but could you please provide some guidance on how to load this into Seurat? I tried to follow the tutorial from SeuratDisk
And got the following error:
Best, |
Sorry, the current R limitations make it impossible to load the entire dataset in R. |
Note that the error says "NAs introduced by coercion to integer range", which means you might hit the limitation of R. In case you might wonder what that limitation is, see discussion here: https://stackoverflow.com/questions/28398007/support-for-bit-vector-size-limit-231-1-in-r |
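To make the limit concrete, here is a small Python check using the two numbers that appear in this thread: the entry count from the scan() error (2319108599) and the integer limit quoted earlier (2147483647, which is 2^31 - 1). The variable names are illustrative only.

```python
R_INT_MAX = 2**31 - 1          # 2147483647, the limit quoted earlier in the thread
n_entries = 2319108599         # value from the scan() error message

exceeds = n_entries > R_INT_MAX
overshoot = n_entries - R_INT_MAX
min_parts = -(-n_entries // R_INT_MAX)   # ceiling division

print(exceeds, overshoot, min_parts)     # True 171624952 2
```

At least two parts are therefore needed for each part's entry count to fit under the limit, which matches the two-way split used earlier in the thread.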
Hello, I just want to follow up on the splitting method. After we finish splitting, how should we combine the two parts? Thanks a lot! |
I split the large dataset into two parts. Part1 can be read into R correctly using Read10X, but part2 can't: the expression matrix of part2 was entirely zero. Do you have the same problem? |
@Ting-PKU Hi OP, I'm a beginner. I split the file into two following your commands, but the resulting part1 and part2 files have the wrong format; they are not mtx files. The result is as follows: sed -n '1159554304,2319108602p' GSE158055_covid19_counts.mtx |awk '{print $2"\t"$1"\t"$3}' > part2 2022.03.21 |
I didn't manage to split it either. I read it with the scanpy Python package and then split it by sample. I couldn't read it into R.
On 2022-03-20, clc37 wrote:
@Ting-PKU Hi OP, I'm a beginner. I split the file into two following your commands, but the resulting part1 and part2 files have the wrong format; they are not mtx files. The result is as follows:
sed -n '1159554304,2319108602p' GSE158055_covid19_counts.mtx |awk '{print $2"\t"$1"\t"$3}' > part2
head part2
712621 22373 10
712621 13908 1
712621 5156 1
712621 17251 1
712621 17732 1
712621 8376 1
I hope you can help. Thanks!
|
👍 The header probably wasn't added. |
OP @Ting-PKU, sorry to bother you again. Following your method I split the file into three parts and read them successfully, but I got another error in the downstream analysis in R, as follows: |
@saketkc Your anndata object is really helpful. It would be great if you could match your sample IDs to their raw sample IDs, so that we can subset the cells we are interested in. |
Is this the .obs of the adata attached above? Can someone confirm? @saketkc |
The data is also downloadable from here: http://covid19.cancer-pku.cn/#/metadata |
See this gist for a discussion about the metadata: https://gist.github.com/saketkc/31f26c71d61dbe1cf8bb9cff6af5d04f |
Thanks @saketkc. That's helpful. This is exactly what I was looking for. |
@saketkc: I'm trying this one and could not read the provided h5ad object (the details can be seen here: scverse/anndata#753). May I ask whether you have had a chance to look at this object? Thanks |
Hi @Ting-PKU, I split the
exp_matrix1 <- Read10X(file.path(wkdir, 'part1'), gene.column = 1)
Error: readMM(): row values 'i' are not in 1:nr
Here is the data content in
$ ll
-rw-rw-r-- 1 zjl zjl 30306249 Mar 9 08:37 barcodes.tsv
-rw-rw-r-- 1 zjl zjl 221641 Mar 9 08:36 features.tsv
lrwxrwxrwx 1 zjl zjl 12 Mar 9 11:55 genes.tsv -> features.tsv
-rw-rw-r-- 1 zjl zjl 17046103030 Mar 8 22:17 matrix.mtx
$ head matrix.mtx
%%MatrixMarket matrix coordinate real general
%
27943 1462702 1159554300
1 18558 62
1 18565 8
1 18564 29
1 18562 96
1 18563 8
1 18561 46
1 18557 136
$ wc -l *.tsv
1462702 barcodes.tsv
27943 features.tsv
Could you please help me? Did I do something wrong or miss some necessary steps? Thanks. |
Hi Zjianlin, did you solve your problem? I have the same error as you. If you solved it, please give me some help with this problem. |
|
I kind of solved this problem. This file is indeed too huge for me, and I only need a subset of data from it. So, I split this .mtx file into hundreds of .mtx files. So that I can process them in batch using R.
// modify split file names
// correct first file
// correct last file
// move counts00.mtx and counts9373.mtx to another folder. And correct other files
|
Hi everyone, sorry to bother but I have a new problem with this question. This is the original matrix.mtx file:
Here is my code to separate the matrix.mtx file into part 1 and part 2: Remove header (first three lines) and split into two parts
Add header to each part
Here is the error I got:
|
Hello everyone, I had a fairly similar problem when trying to read a scRNA-seq dataset in R (GEO accession number GSE242615; the data can be downloaded here). In the end it was not so tricky to solve, but for new R and UNIX users (like me) it can be difficult and frightening, so I'm describing the whole situation here in the hope that it helps other people in the future :)
Therefore, according to line 3, matrix.mtx should contain 7693754 lines; however, it looked like this at the end of the file: So, the file contains 6682662 lines, which is very close to the number of lines found in the 1st warning message. Actually, if you subtract the first 2 header lines and the last incomplete line, you get exactly the 6682659 lines found in the first error message.
Remark: I hope the authors of the article generated the dataset correctly and uploaded the raw files properly, but I suspect that something went wrong in the process: why were 7693754 lines expected when only 6682659 lines were present in the file? And why was the last line incomplete? I guess we will never know. I hope this helps you all. |
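The check described above (comparing the entry count declared in the header with the lines actually present, and spotting an incomplete last line) can be sketched in Python. The function name `check_mtx` and the returned keys are hypothetical, and the sketch assumes a MatrixMarket coordinate file with three fields per entry.

```python
import os
import tempfile


def check_mtx(path):
    """Compare the entry count declared in an .mtx header with the lines actually present.

    check_mtx and the result keys are hypothetical names used for this sketch.
    """
    declared = None
    header_lines = 0
    complete = 0
    incomplete = 0
    with open(path) as fh:
        for line in fh:                      # banner and comments start with "%"
            header_lines += 1
            if not line.startswith("%"):     # first non-comment line is the size line
                declared = int(line.split()[2])
                break
        for line in fh:                      # remaining lines are entries
            fields = line.split()
            if len(fields) == 3:
                complete += 1
            elif fields:                     # non-empty but missing fields: truncated entry
                incomplete += 1
    return {
        "header_lines": header_lines,
        "declared": declared,
        "complete": complete,
        "incomplete": incomplete,
        "ok": declared == complete and incomplete == 0,
    }


# Demonstration: the header declares 5 entries, but the file was cut off mid-line.
tmp = tempfile.mkdtemp()
truncated = os.path.join(tmp, "matrix.mtx")
with open(truncated, "w") as fh:
    fh.write("%%MatrixMarket matrix coordinate integer general\n3 3 5\n")
    fh.write("1 1 2\n2 1 4\n3 2 1\n2 3")

report = check_mtx(truncated)
print(report)
```

A mismatch between `declared` and `complete`, or a non-zero `incomplete` count, points to a truncated or corrupted download rather than a problem in the reading code.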
Hey, I have the same error when reading the matrix. I split the matrix into 3 different parts but can't merge them back. Any advice on this? x=readMM("/pasteur/zeus/projets/p02/evo_immuno_pop/single_cell/project/CZI_AfricanCellAtlas/data/GEX/11__cell_type_assignment_full/untitled_folder/counts_v2.mtx") The head of the file has the format that people here mentioned is problematic. I split the matrix into 3 files: m1, m2, m3. matrices_123 <- rbind(m1, m2, m3)Error in .rbind2Csp(x, y) : |
I want to load a large-scale single-cell dataset using Read10X, but I get the following error message:
Error in scan(file, nmax = 1, what = what, quiet = TRUE, ...) :
scan() expected 'an integer', got '2319108599'
Do you have any idea how to solve this problem?