Merging Matrices in libTSVAggrTransByGene fails? #36
Comments
Hi! |
Hi, or can you give me a suggestion on which files are best to use? |
It seems that Ensembl recently decided that the transcript IDs in the cDNA file do not need to be exactly the same as the transcript IDs in the GTF file (e.g., ENST00000434970.2 in one file and ENST00000434970 in the other). This is basically a data issue. In any case, I pushed a change to libTSVAggrTransByGene to detect this issue and try to fix it. Please let me know if it fixes the issue. |
This somewhat relates to the discussion we had in Palo Alto: the "real" ID
is ENST00000434970. If you ignore the version number, then Gencode and
Ensembl are exactly the same transcript set, as far as I know.
On differences between Ensembl and Gencode see their FAQ:
https://www.gencodegenes.org/faq.html
|
Yes, it is related. |
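To illustrate the version-suffix mismatch discussed above, here is a minimal Python sketch that compares transcript IDs while ignoring a trailing `.N` version. It is not the change that was pushed to libTSVAggrTransByGene; the helper names `strip_version` and `ids_match` are hypothetical, and the only assumption about the data is that the version is a dot followed by digits at the end of the ID.

```python
import re

# Hypothetical helpers, not part of iRAP.
# Assumption: the version is a trailing "." followed by digits.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(transcript_id: str) -> str:
    """Return the ID without a trailing version suffix, if one is present."""
    return _VERSION_SUFFIX.sub("", transcript_id)


def ids_match(cdna_id: str, gtf_id: str) -> bool:
    """Compare two transcript IDs while ignoring their version suffixes."""
    return strip_version(cdna_id) == strip_version(gtf_id)
```

With this, `ENST00000434970.2` from the cDNA file and `ENST00000434970` from the GTF file are treated as the same transcript.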
A quick update on this issue. There is another problem with the latest versions of Ensembl: the cDNA file contains transcripts that are not in the GTF file or, from another perspective, the GTF file has missing transcripts. For instance, in Ensembl v90, there are 16k transcripts found in the human cDNA file and not found in the matching GTF (e.g., ENST00000631435). Currently iRAP will fail because it checks for consistency between the two files (which no longer seems to exist). To work around this data issue, I'll change the code today/tomorrow to warn about these inconsistencies and carry on instead of exiting. Cheers. |
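The "warn and carry on" behaviour described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions, not iRAP's code: the function name `check_consistency` is hypothetical, and both inputs are assumed to be collections of transcript IDs that have already been read from the cDNA and GTF files.

```python
import warnings


def check_consistency(cdna_ids, gtf_ids):
    """Return the transcript IDs found in the cDNA set but not in the GTF set.

    Hypothetical helper: it emits a warning instead of raising, so the
    caller can continue with the remaining transcripts.
    """
    missing = sorted(set(cdna_ids) - set(gtf_ids))
    if missing:
        warnings.warn(
            f"{len(missing)} transcript(s) in the cDNA file are not in the GTF "
            f"(e.g., {missing[0]}); continuing."
        )
    return missing
```

Returning the list of missing IDs lets the caller decide whether to drop those transcripts or report them separately.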
Hey there, thank you very much for your efforts! I can try your patch tomorrow and send the results to you next week. Cheers. |
Hi, |
Hey, I tried it today and got the following error:
|
Hi, it should already be fixed in the latest version? |
Oh, thanks. |
Is there a possibility to filter out lowly expressed genes? I have a lot of genes with zero expression, and if I include them in DE, edgeR fails. I removed the genes whose expression sum over all samples is below 10 in transcripts.raw.kallisto.tsv, and now the DE analysis continues. |
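The manual filtering step described above (dropping features whose summed expression over all samples is below 10) can be sketched in Python as follows. This is an illustration only, not an iRAP option: the function name `filter_low_counts` and the in-memory layout (a dict from feature ID to a list of per-sample counts) are assumptions, and the threshold of 10 is the one mentioned in the comment.

```python
def filter_low_counts(counts_by_feature, min_total=10):
    """Keep features whose summed count over all samples is at least min_total.

    Hypothetical helper; counts_by_feature maps a feature ID to a list of
    per-sample counts.
    """
    return {
        feature_id: counts
        for feature_id, counts in counts_by_feature.items()
        if sum(counts) >= min_total
    }
```

For example, a feature with counts `[0, 0, 0]` is dropped, while one with counts `[4, 3, 5]` is kept.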
Report generation still fails with the following error: |
You may use the parameter |
Hey, if I use this parameter, I get an error message. Is this a bug?
|
Hi, |
Hi, It seems like not all values appear in the opt$label2group object. |
Hey there,
But colnames(data.f) contains So either remove the data.f columns that are not needed (which I would not do), or add the missing items to opt$label2group, use them in the design matrix, and take care to set the correct contrast. I think that removing the unneeded columns discards information that is helpful for testing. Maybe change this so that all of the available information is used? Edit: Further investigation led me to the error: in |
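The mismatch described above (sample columns with no entry in the label-to-group mapping) can be detected with a check like the following Python sketch. It is not iRAP's R code: `missing_group_labels` is a hypothetical name, `column_names` stands in for `colnames(data.f)`, and `label2group` stands in for `opt$label2group`, assumed here to be a mapping from sample label to group.

```python
def missing_group_labels(column_names, label2group):
    """Return the column names that have no entry in label2group.

    Hypothetical helper: an empty result means every sample column
    can be assigned to a group in the design matrix.
    """
    return [name for name in column_names if name not in label2group]
```

Running such a check before building the design matrix would report the unmapped columns by name instead of failing later during testing.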
Hi, thank you for the Rdata file. I'll look into this issue in the coming days - last week I did not have the time. |
Hi, indeed the problem was on the line that you mentioned. It should now be fixed in the latest release (0.8.5p5). Many thanks again for the report and the fix ;-) |
Hey there,
I am trying to use kallisto to quantify the reads, but it fails during execution of the mentioned script.
The first new.mat is full of information, but afterwards (after filtering, I think?) new.mat is completely empty. Could this be due to wrong files in the reference folder?