Merging Matrices in libTSVAggrTransByGene fails? #36

Niklas123Niklas · 2017-10-17T13:43:38Z

Hey there,

I try to use kallisto to quant the reads, but it failes during execution of mentioned script.

new.mat<-merge(tsv.mat,g2t,by.x=colnames(tsv.mat)[1],by.y="transcript_id",all.x=TRUE,sort=F)
new.mat <- new.mat[new.mat$transcript_id %in% as.character(g2t$transcript_id),,drop=FALSE]

The first new.mat is full with information, but after, I think filtering?, the new.mat is totally empty. May it be to wrong files in reference folder?

The text was updated successfully, but these errors were encountered:

nunofonseca · 2017-10-18T12:07:52Z

Hi!
Maybe the cdna file that you used contains transcript ids that are not in the GTF? What files cdna and gtf files are you using?
Cheers.

Niklas123Niklas · 2017-10-18T12:58:12Z

Hi,
thanks for the hint!
Homo_sapiens.GRCh38.cdna.all.fa is the cdna file and Homo_sapiens.GRCh38.90.gtf is the gtf file, but I don't remember if I got it from the same source. I try to get new ones and retry it.

Or can you give me a suggetion which files are best to use?

nunofonseca · 2017-10-18T16:08:07Z

It seems that Ensembl recently decided that the transcript ids in the cdna file do not need to be exactly the same as the transcript ids in the gtf file (e.g., ENST00000434970.2 in one file and in the other file you get ENST00000434970). This is basically a data issue, anyway I pushed a change to libTSVAggrTransByGene to detect this issue and try to fix it. Please let me know if it fixes the issue.
Cheers.

maximilianh · 2017-10-18T16:32:16Z

This somewhat relates to the discussion we had in Palo Alto: the "real" ID is ENST00000434970. If you ignore the version number, then Gencode and Ensembl are exactly the same transcript set, as far as I know. On differences between Ensembl and Gencode see their FAQ: https://www.gencodegenes.org/faq.html

nunofonseca · 2017-10-18T16:35:45Z

Yes, it is related.
As I wrote above, I added a workaround for that in the script but I just realized that the problem pops up in another point of the pipeline - thank you Ensembl for making my day!

nunofonseca · 2017-10-19T09:15:33Z

A quick update on this issue. There is another problem with the latest versions of Ensembl - the cdna file contains transcripts that are not in the GTF file or, from another perspective, the GTF file has missing transcripts. For instance, in the Ensembl v90, there are 16k transcripts found in the human cdna file and not found in the matching GTF (e.g., ENST00000631435). Currently iRAP will fail because it checks for consistency between the two files (which no longer seems to exist). To work around this data issue, I' ll change the code today/tomorrow to inform and warn about these inconsistencies and carry on (but not to exit).

Cheers.

Niklas123Niklas · 2017-10-19T11:34:24Z

Hey there,

Thank you very much for your efforts! I can try your patch tomorrorw and send results to you next week.

Cheers.

nunofonseca · 2017-10-20T08:22:35Z

Hi,
The code should now be able to cope with the inconsistencies. Please let me know if it works or not for you.
Cheers.

Niklas123Niklas · 2017-10-25T10:02:12Z

Hey, i tried it today and I got following error

[DONE] Assembly and quantification
make: *** No rule to make target 'Master2/none/kallisto/transcripts.raw.kallisto.irap.tsv', needed by 'stage3'. Schluss.

nunofonseca · 2017-10-25T10:08:37Z

Hi, it should already be fixed in the latest version?
Cheers

Niklas123Niklas · 2017-10-25T11:15:18Z

Oh, thanks.
It works so far now.

Niklas123Niklas · 2017-10-25T11:23:49Z

Is there a possibillity to filter low expressed genes? I have a lot of genes that have zero expression and if I include them in DE, edgeR will fail. I removed those genes with expression sums over all samples below 10 in transcripts.raw.kallisto.tsv and now it continues DEA.

Niklas123Niklas · 2017-10-25T12:16:56Z

In report generation it still fails with following error:
make: *** No rule to make target 'Master2/report/riq//raw_data/C1_0s1.f.fastqc.tsv', needed by 'Master2/report/fastq_qc_report.tsv'. Schluss.

nunofonseca · 2017-10-25T14:42:49Z

You may use the parameter
de_min_count=10
in the configuration file to remove genes with total expression below 10.

Niklas123Niklas · 2017-11-03T11:53:55Z

Hey,

If I use this parameter, I get an error message. Is this a bug?

[INFO] Filtering out genes with low counts (<=5)...
[INFO] Filtering out genes with low counts (<=5)...done.
Fehler in conds[i] <- label2group[[conds[i]]] : Ersetzung hat Länge 0
Ruft auf: map.conds2cols
Ausführung angehalten
/home/niklas/irap_install/scripts/../aux/mk/irap_de.mk:161: recipe for target 'Master/tophat2/htseq2/edger/G0SvsG20S.genes_de.tsv' failed

nunofonseca · 2017-11-07T10:06:22Z

Hi,
Could you rerun the ./scripts/irap_DE_edger script with --debug option and share with me the irap_DE_edgeR.Rdata file that is generated? This will allow me to reproduce the error.
Thanks
PS: you should get the irap_DE_edger command by rerunning irap as follows
irap conf=path2your_conf_file [any options that you are passing in the command line] Master/tophat2/htseq2/edger/G0SvsG20S.genes_de.tsv -n

Niklas123Niklas · 2017-11-07T15:48:42Z

Hi,
here you go. Thank you very much!

Rdata.zip

It seems like not all values appear in the opt$label2group object.

Niklas123Niklas · 2017-11-20T13:17:57Z

Hey there,
any progress here?
I searched a bit in the R-script. The problem is: The opt$label2group variable does only contain the values for the selected groups.

"C1_0s1"  "C1_0s2"  "C2_0s1"  "C2_0s2"  "C1_90s1" "C1_90s2" "C2_90s1"
"G0S"     "G0S"     "G0S"     "G0S"     "G90S"    "G90S"    "G90S"

But colnames (data.f) contains
"C1_0s1" "C1_0s2" "C1_H2O22" "C1_20s1" "C1_90s1" "C1_90s2" "C2_0s1" "C2_0s2" "C2_H2O21" "C2_H2O22" "C2_20s1" "C2_20s2" "C2_90s1"

So either remove the data.f columns not needed (what I would not do), or add the missing items to the opt$label2group and use them in the design matrix and care for the correct contrast. I think if you remove the not needed columns from testing, you remove information from testing that are helpful. Maybe change this that the whole available information are used?

edit: Further investigation lead me to the error: in irap_de.R in line 195 it says
data.f <- data[rows.sel,],
but it should be:
data.f <- data.f[rows.sel,]

nunofonseca · 2017-11-20T14:35:06Z

Hi, thank you for the Rdata file. I'll look into this issue in the coming days - last week I did not have the time.
Cheers.

nunofonseca · 2017-11-26T22:49:54Z

Hi, indeed the problem was on the line that you mentioned. It should now be fixed in the latest release (0.8.5p5). Many thanks again for the report and the fix ;-)

Niklas123Niklas changed the title ~~Merging Matrices in libTSVAggrTransByGene failes=~~ Merging Matrices in libTSVAggrTransByGene failes? Oct 17, 2017

nunofonseca self-assigned this Oct 18, 2017

nunofonseca changed the title ~~Merging Matrices in libTSVAggrTransByGene failes?~~ Merging Matrices in libTSVAggrTransByGene fails? Oct 18, 2017

nunofonseca closed this as completed Nov 26, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merging Matrices in libTSVAggrTransByGene fails? #36

Merging Matrices in libTSVAggrTransByGene fails? #36

Niklas123Niklas commented Oct 17, 2017 •

edited

Loading

nunofonseca commented Oct 18, 2017

Niklas123Niklas commented Oct 18, 2017 •

edited

Loading

nunofonseca commented Oct 18, 2017

maximilianh commented Oct 18, 2017 via email

nunofonseca commented Oct 18, 2017 •

edited

Loading

nunofonseca commented Oct 19, 2017

Niklas123Niklas commented Oct 19, 2017

nunofonseca commented Oct 20, 2017

Niklas123Niklas commented Oct 25, 2017

nunofonseca commented Oct 25, 2017

Niklas123Niklas commented Oct 25, 2017

Niklas123Niklas commented Oct 25, 2017

Niklas123Niklas commented Oct 25, 2017

nunofonseca commented Oct 25, 2017

Niklas123Niklas commented Nov 3, 2017

nunofonseca commented Nov 7, 2017

Niklas123Niklas commented Nov 7, 2017 •

edited

Loading

Niklas123Niklas commented Nov 20, 2017 •

edited

Loading

nunofonseca commented Nov 20, 2017

nunofonseca commented Nov 26, 2017

Merging Matrices in libTSVAggrTransByGene fails? #36

Merging Matrices in libTSVAggrTransByGene fails? #36

Comments

Niklas123Niklas commented Oct 17, 2017 • edited Loading

nunofonseca commented Oct 18, 2017

Niklas123Niklas commented Oct 18, 2017 • edited Loading

nunofonseca commented Oct 18, 2017

maximilianh commented Oct 18, 2017 via email

nunofonseca commented Oct 18, 2017 • edited Loading

nunofonseca commented Oct 19, 2017

Niklas123Niklas commented Oct 19, 2017

nunofonseca commented Oct 20, 2017

Niklas123Niklas commented Oct 25, 2017

nunofonseca commented Oct 25, 2017

Niklas123Niklas commented Oct 25, 2017

Niklas123Niklas commented Oct 25, 2017

Niklas123Niklas commented Oct 25, 2017

nunofonseca commented Oct 25, 2017

Niklas123Niklas commented Nov 3, 2017

nunofonseca commented Nov 7, 2017

Niklas123Niklas commented Nov 7, 2017 • edited Loading

Niklas123Niklas commented Nov 20, 2017 • edited Loading

nunofonseca commented Nov 20, 2017

nunofonseca commented Nov 26, 2017

Niklas123Niklas commented Oct 17, 2017 •

edited

Loading

Niklas123Niklas commented Oct 18, 2017 •

edited

Loading

nunofonseca commented Oct 18, 2017 •

edited

Loading

Niklas123Niklas commented Nov 7, 2017 •

edited

Loading

Niklas123Niklas commented Nov 20, 2017 •

edited

Loading