How to run post analysis steps such as copynumber and novel SNP detection? #11

yingchen69 · 2023-05-17T05:56:38Z

Hi,

The readme mentions that t1k post analysis steps can do copynumber and novel SNP detection. Is there any detail regarding how to do the tasks? There is a t1k-copynumber.py, but I am not sure if python t1k-copynumber.py -g T1K_genotyping_result_file can work.

Thanks a lot for the help!

Ying

mourisl · 2023-05-17T15:43:35Z

Novel SNP detection is included automatically in the workflow/wrapper. The vcf file holds the SNP detection results, and its coordinates are positions on the concatenated exons of each allele.
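To make the concatenated-exon coordinate concrete, here is a minimal sketch that maps such a position back to an exon number and an offset within that exon. The function name `locate_in_exons` and the exon lengths are hypothetical, and the positions are assumed to be 1-based as in the VCF convention; this is an illustration, not part of T1K.

```python
from bisect import bisect_left
from itertools import accumulate


def locate_in_exons(pos, exon_lengths):
    """Map a 1-based position on the concatenated exons to
    (exon_number, offset_in_exon), both 1-based.

    exon_lengths lists the exon lengths of one allele in order
    (hypothetical values in the example below).
    """
    # Cumulative end position of each exon on the concatenated sequence.
    ends = list(accumulate(exon_lengths))
    if not ends or pos < 1 or pos > ends[-1]:
        raise ValueError(f"position {pos} is outside the concatenated exons")
    # Index of the first exon whose end is at or after pos.
    idx = bisect_left(ends, pos)
    exon_start = ends[idx] - exon_lengths[idx]  # bases before this exon
    return idx + 1, pos - exon_start


# Hypothetical allele with three exons of 73, 270 and 276 bases.
print(locate_in_exons(74, [73, 270, 276]))
```

With these made-up lengths, position 74 is the first base of the second exon, because the first exon covers positions 1 to 73.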

For the t1k-copynumber.py script, the "-g" option takes the XXX_genotype.tsv file generated by T1K. You may also pass gene names to the "--nomissing" option as a comma-separated list, naming the genes that are expected to be present on every chromosome. For example, we know there are four KIR framework genes, so the copy number inference command should be:

python3 t1k-copynumber.py -g XXX_genotype.tsv --nomissing KIR3DL3,KIR2DL4,KIR3DP1,KIR3DL2
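If the command needs to be run for many samples, it can be assembled programmatically. The sketch below only uses the "-g" and "--nomissing" options shown above; the helper name `copynumber_command` and the default script path are assumptions for illustration.

```python
import subprocess

# The four KIR framework genes named in the command above.
FRAMEWORK_GENES = ["KIR3DL3", "KIR2DL4", "KIR3DP1", "KIR3DL2"]


def copynumber_command(genotype_tsv, nomissing=(), script="t1k-copynumber.py"):
    """Build the argument list for t1k-copynumber.py (hypothetical helper).

    genotype_tsv: path to the XXX_genotype.tsv file produced by T1K.
    nomissing:    gene names expected on every chromosome.
    script:       path to t1k-copynumber.py (assumed to be in the working directory).
    """
    cmd = ["python3", script, "-g", genotype_tsv]
    if nomissing:
        # The option takes a single comma-separated list.
        cmd += ["--nomissing", ",".join(nomissing)]
    return cmd


cmd = copynumber_command("XXX_genotype.tsv", FRAMEWORK_GENES)
print(" ".join(cmd))
# To actually run it, uncomment the next line:
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when sample names contain unusual characters.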

Hope this helps and please let me know if you have further questions.

yingchen69 · 2023-05-17T18:39:54Z

Hi,

Thanks a lot for the quick reply!

I am still a bit confused by the SNP detection. Here is my command to get the genotype.tsv:

run-t1k -1 my_R1.fq.gz -2 my_R2.fq.gz --preset hla -f /hlaidx/hlaidx_dna_seq.fa --od outputdir -o mysampleid

I do not see any vcf in the output folder. Should I set additional parameters to get the SNP results in the output?

Best,

Ying

mourisl · 2023-05-17T18:47:15Z

Can you show me the output on the screen?

Another way to call the variants is through the "./analyzer" command, which in your case can be run as:

./analyzer -f /hlaidx/hlaidx_dna_seq.fa -a outputdir/mysampleid_allele.tsv -1 outputdir/mysampleid_aligned_1.fa -2 outputdir/mysampleid_aligned_2.fa -s 0.97 -o outputdir/mysampleid

After that you should see the outputdir/mysampleid_allele.vcf file.
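A quick way to check whether a run left the expected files in the output folder is sketched below. The suffix list covers only the genotype, allele and vcf files named in this thread and is an assumption about what a complete run produces; `missing_outputs` is a hypothetical helper, not a T1K tool.

```python
from pathlib import Path

# Output suffixes mentioned in this thread (assumed, not an exhaustive list).
EXPECTED_SUFFIXES = ("_genotype.tsv", "_allele.tsv", "_allele.vcf")


def missing_outputs(outdir, prefix, suffixes=EXPECTED_SUFFIXES):
    """Return the expected file names that are absent from outdir.

    outdir and prefix correspond to the --od and -o values of the run.
    """
    base = Path(outdir)
    return [
        f"{prefix}{suffix}"
        for suffix in suffixes
        if not (base / f"{prefix}{suffix}").is_file()
    ]


# Example with the names used above; prints the files that were not found.
print(missing_outputs("outputdir", "mysampleid"))
```

An empty list means all three files are present; any name in the list is a file to look for in the screen output or to regenerate.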

yingchen69 · 2023-05-17T19:05:27Z

Hi,

Our server is currently down for monthly maintenance. I will get the list of outputs of my T1K jobs tomorrow once it's back online.

Thanks a lot for the help!

Ying


yingchen69 · 2023-05-18T03:46:51Z

Hi,

I just got our server back and I checked that we do have the allele.vcf files. Sorry for the confusion :(

I got t1k-copynumber.py working, but I am not sure about the column names. From the code it seems the column names should be gene, number of alleles, allele 1, allele 1 cn, allele 1 ratio, allele 2, allele 2 cn, allele 2 ratio. What is the 'ratio'? Is it in log2?

Another question: how does T1K deal with duplicated reads? My data are all from whole-exome sequencing, and flagstat reports a ~20% duplicate rate. I tried several tools for HLA typing, such as polysolver, OptiType, hisat-genotype, HLAHD, and HLALA. I always got totally different HLA typing results if I removed duplicated reads from the bam files first.

Thanks a lot!

Ying

mourisl · 2023-05-18T03:53:47Z

Ratio is the log-ratio of the likelihoods of the most likely copy number and the second most likely copy number. I'm still trying to optimize t1k-copynumber.py, so please interpret its results with caution.
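As an illustration of what such a ratio measures, the sketch below picks the most likely copy number from a set of log-likelihoods and reports its margin over the runner-up. It is not the code in t1k-copynumber.py: the likelihood values are made up, and the use of natural logarithms is an assumption, since the log base is not stated here.

```python
import math


def best_copy_number(log_likelihoods):
    """Return (best_copy_number, log_ratio).

    log_likelihoods maps each candidate copy number to its log-likelihood.
    log_ratio is the best log-likelihood minus the second best, i.e. the
    log of the likelihood ratio between the top two candidates.
    """
    if len(log_likelihoods) < 2:
        raise ValueError("need at least two candidate copy numbers")
    ranked = sorted(log_likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    (best_cn, best_ll), (_, second_ll) = ranked[0], ranked[1]
    return best_cn, best_ll - second_ll


# Hypothetical likelihoods for copy numbers 1, 2 and 3 (natural log assumed).
candidates = {1: math.log(0.02), 2: math.log(0.90), 3: math.log(0.08)}
cn, ratio = best_copy_number(candidates)
print(cn, round(ratio, 3))
```

A ratio near zero means the top two copy numbers are almost equally supported, so the call deserves less trust than one with a large ratio.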

We don't remove the duplicated reads. The duplicated reads contribute to the allele abundance estimation (or to other types of allele scores in other HLA genotypers), so deduplication is expected to affect the genotyping results. Hope this helps.

yingchen69 · 2023-05-18T15:17:58Z

Hi,

Thanks a lot for the details!

Regarding the duplicate reads, is there a reason to keep them for analysis? Normally we think duplicate reads are artifacts due to PCR, optical...

Best,

Ying

mourisl · 2023-05-18T18:19:19Z

In RNA-seq data, I have seen that some genes are very highly expressed, so some reads can be duplicated by chance. I think deduplication should be left to the user when preprocessing the fastq files, if they feel the duplication has become a severe issue.
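For users who do want to deduplicate before running T1K, one simple option is to drop read pairs whose two sequences exactly match a pair already seen. The sketch below assumes gzip-compressed paired FASTQ files with four lines per record; the function names are hypothetical, and exact-sequence matching is a simplification compared with the alignment-based duplicate marking that flagstat reports.

```python
import gzip
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as (header, sequence, plus, quality) tuples."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in record)


def dedup_pairs(pairs):
    """Keep the first read pair for each distinct (seq1, seq2) combination."""
    seen = set()
    for r1, r2 in pairs:
        key = (r1[1], r2[1])  # sequences only; headers and qualities ignored
        if key not in seen:
            seen.add(key)
            yield r1, r2


def dedup_fastq_pair(in1, in2, out1, out2):
    """Write deduplicated copies of two paired, gzip-compressed FASTQ files.

    Assumes both inputs list the mates in the same order; zip() stops at
    the shorter file if they differ in length.
    """
    with gzip.open(in1, "rt") as f1, gzip.open(in2, "rt") as f2, \
            gzip.open(out1, "wt") as o1, gzip.open(out2, "wt") as o2:
        for r1, r2 in dedup_pairs(zip(read_fastq(f1), read_fastq(f2))):
            o1.write("\n".join(r1) + "\n")
            o2.write("\n".join(r2) + "\n")
```

This keeps every unique sequence pair in memory, so it suits targeted or exome-sized inputs better than very large data sets. As noted above, deduplication changes the read counts that feed the abundance estimate, so the genotyping results may differ from a run on the original reads.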

fernandogs97BR mentioned this issue Jul 3, 2023

Ratio is the log-ratio of the likelihood between the most likely copy number and the second likely copy number. I'm still trying to optimize t1k-copynumber.py, so please interpret its result with caution. #12

Open

mourisl closed this as completed Nov 26, 2023
