Input for the import-rna command #479

SJRussell · 2019-11-15T13:16:29Z

1. It's not clear what gene-resource data to download from BioMart. I'm using the built-in hg38 gene info as a template, but BioMart doesn't have "NCBI gene ID" or "Transcript support level (TSL)" available for hg19. I'm going to try to get the original fasta files for the project and remap to hg38, but it would be great to have a brief overview of what "Gene-resource" info is required when working with non-hg38 genomes.
2. I'd also like to build my own CNV-expression correlates, but the input requirements for cnv_expression_correlate.py are not clear. Could you point me to a resource for building these inputs?
3. There are several issues/questions about using counts as input for import-rna. Have those issues been fixed, or should I run RSEM instead of HTSeq-count to generate my sample input?

Thanks for your great tool and in advance for your help.

etal · 2019-11-18T21:14:52Z

Thanks for checking in, and sorry for the trouble.

1. Offhand I'm not sure what to do about BioMart's lack of support for hg19 etc. Do you know of another BioMart source that might have these fields, or a way to do it in R?
2. For cnv_expression_correlate.py, at the moment the code is the spec, and I agree some proper docs are necessary here. (Tagging this ticket.) Some manual wrangling of the tables is needed.
3. There's a chance the counts-input issues have been fixed -- if it's quick to check, then try it; otherwise RSEM would be a viable workaround.
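
Until the gene-resource format is documented, one way to see what a custom BioMart export lacks is to compare its header against the bundled hg38 table. The following is a minimal sketch, assuming both files are tab-separated with a single header row; the function name and the file paths are hypothetical, and this is not part of CNVkit:

```python
import csv


def missing_columns(template_path, custom_path, delimiter="\t"):
    """Return template header names that are absent from the custom file.

    Hypothetical helper: it only compares header rows and does not
    validate the contents of either table.
    """
    def read_header(path):
        with open(path, newline="") as handle:
            return next(csv.reader(handle, delimiter=delimiter))

    custom = set(read_header(custom_path))
    # Report in the template's own column order
    return [name for name in read_header(template_path) if name not in custom]
```

Calling `missing_columns("hg38-template.tsv", "hg19-export.tsv")` (both paths are placeholders) would list the columns that still need to be filled in by hand or from another source.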

SJRussell · 2019-11-26T16:44:30Z

Thanks for the response. For now:

1. I've mapped to hg38 instead of trying to get the right output from BioMart. The difficulty was that I didn't know which fields were required for the gene-resource file.
2. If you update the documentation to include details on how to build custom expression correlates, please mention it in this ticket. It seems to me that the accuracy of the algorithm depends on experiment-specific expression correlates.
3. After installing CNVkit with conda, running any cnvkit.py command gave this error:
Traceback (most recent call last):
  File "/home/stewart/anaconda3/envs/cnvkit/bin/cnvkit.py", line 8, in <module>
    from cnvlib import commands
  File "/home/stewart/cnvkit/cnvlib/__init__.py", line 4, in <module>
    from .cmdutil import read_cna as read
  File "/home/stewart/cnvkit/cnvlib/cmdutil.py", line 7, in <module>
    from .cnary import CopyNumArray as CNA
  File "/home/stewart/cnvkit/cnvlib/cnary.py", line 9, in <module>
    from . import core, descriptives, params, smoothing
  File "/home/stewart/cnvkit/cnvlib/smoothing.py", line 152
    x, wing, *padded = check_inputs(x, width, False, weights)
             ^
SyntaxError: invalid syntax
After reinstalling with pip, the issue seems to be resolved. I also ran with -f counts and it appears to give log2 values, suggesting that counts from STAR or HTSeq-count can be used.
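
The SyntaxError in the traceback points at a starred assignment target (`*padded`), which is valid only in Python 3 (PEP 3132). That would be consistent with the conda environment running the script under Python 2, although the thread does not confirm this. A small sketch of the construct follows; `check_inputs_demo` is a hypothetical stand-in and not CNVkit's `check_inputs`:

```python
import sys


def check_inputs_demo(values):
    """Hypothetical stand-in that returns a tuple of three or more items."""
    return tuple(values)


# Extended unpacking: the starred name collects all remaining items in a list.
# Python 2 rejects this file at parse time with "SyntaxError: invalid syntax",
# before any line of it runs.
x, wing, *padded = check_inputs_demo([10, 3, 0.5, 0.7])

# Show which major Python version is executing, alongside the unpacked values
print(sys.version_info[0], x, wing, padded)
```

Running `python --version` inside the activated environment shows which interpreter is actually in use.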

etal · 2019-11-29T18:44:30Z

Thanks for the feedback. I'll roll another release for the sake of getting the latest fixes out to the world, and then see about replicating and documenting the process of creating the gene resource and cnv-expression correlates.

SJRussell · 2019-12-06T12:11:04Z

Much appreciated.
Do you have any suggestions for cleaning up the calls I'm getting? So far I've tried specifying normal samples and using --no-txlen, --max-log2 2, and segment -m none. The attached PDFs were generated with 3 normal samples specified, using counts, and with the rest of the parameters at their defaults. The total input was 15 RNA-seq samples. As you can see, the XY normal sample segments are still quite variable. In the -16 samples, there is a clear decrease in log2 for chromosome 16. However, in the XO samples there is no clear decrease for chromosome X (this could be due to dosage compensation or the fact that the population contains both XX and XY samples). Any suggestions on how to bring the baseline closer to 0 and reduce variability? Thanks!

normal1.pdf
normal2.pdf
normal3.pdf
XO-1.pdf
XO-2.pdf
XO-3.pdf
minus16-1.pdf
minus16-2.pdf
minus16-3.pdf
plus16-1.pdf
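
One way to put a number on how far each sample's baseline sits from 0 is to summarize log2 per chromosome from the tables behind these plots. This is a minimal sketch, assuming a tab-separated file with `chromosome` and `log2` columns (as in CNVkit's .cnr output); the function name is hypothetical, and it only measures the baseline rather than correcting it:

```python
import csv
from collections import defaultdict
from statistics import median


def median_log2_by_chromosome(path):
    """Return {chromosome: median log2} for a tab-separated bin-level table.

    Assumes the header row contains 'chromosome' and 'log2' columns.
    """
    by_chrom = defaultdict(list)
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            by_chrom[row["chromosome"]].append(float(row["log2"]))
    return {chrom: median(values) for chrom, values in by_chrom.items()}
```

Comparing, for example, the chromosome 16 and chromosome X medians between the normal and aneuploid samples gives a value to track while changing options such as --max-log2 or --no-txlen.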

etal added the rna label Nov 18, 2019

etal added the documentation label Nov 18, 2019
