How to generate the reference genome for heterozygous HiC data? #116

xinkwu · 2017-11-30T02:43:15Z

Hi nservant,

The HiC data I am using come from a hybrid fly, produced by crossing A and B. I mapped the reads against three reference genomes in turn: genome A, genome B, and a combined A+B genome (I just renamed the scaffolds in each of the two genomes and used the cat command to concatenate them into a single FASTA file).
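The combination step described above (rename the scaffolds, then concatenate) could be sketched roughly as follows. This is only an illustration, not the commands actually used in this issue: the function names and the `A_`/`B_` prefixes are hypothetical, and the input is assumed to be plain uncompressed FASTA.

```python
import io


def tag_fasta_headers(lines, tag):
    """Yield FASTA lines with `tag` and an underscore prepended to each sequence name."""
    for line in lines:
        if line.startswith(">"):
            # ">chr2L ..." becomes ">A_chr2L ..." so that names from the
            # two parental genomes cannot collide after concatenation.
            yield ">" + tag + "_" + line[1:]
        else:
            yield line


def combine_genomes(genome_a, genome_b, out, tag_a="A", tag_b="B"):
    """Write both genomes to `out` as one FASTA with parent-tagged names."""
    for handle, tag in ((genome_a, tag_a), (genome_b, tag_b)):
        for line in tag_fasta_headers(handle, tag):
            # Make sure every line is newline-terminated so the last record
            # of genome A does not run into the first header of genome B.
            out.write(line if line.endswith("\n") else line + "\n")


if __name__ == "__main__":
    # Tiny in-memory demonstration; real genomes would be opened with open().
    a = io.StringIO(">chr2L\nACGT\n")
    b = io.StringIO(">chr2L\nACGA\n")
    combined = io.StringIO()
    combine_genomes(a, b, combined)
    print(combined.getvalue(), end="")
```

The parent tag in each sequence name is what keeps the two copies of a chromosome distinguishable after mapping.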

The result confused me a lot. As expected, I got more mapped pairs with the combined genome, but I also got more multiply aligned pairs and fewer valid pairs.

How should I generate the reference genome for my data properly? Do you have any suggestions?

Thanks a lot,
Kai

nservant · 2017-11-30T08:39:58Z

Hi Kai,
If I understood correctly, you generated a kind of diploid reference genome, with 2N chromosomes?
In this case, you should indeed expect many more multiple alignments, as every read that does not overlap a SNP between A and B will be mapped twice (once on each of the two parental chromosomes).
What exactly do you want to do? Do you want to generate allele-specific maps, to explore the chromosome organization of each parental chromosome?
If so, you can have a look at the manual: http://nservant.github.io/HiC-Pro/AS.html
You will need to generate an N-masked genome, where all SNPs between A and B are replaced by an N.
Best
N
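The N-masking idea in the reply above can be illustrated with a minimal sketch. It is not HiC-Pro's own tooling, and the manual linked above should be followed for the actual workflow; the function name `mask_snps`, the in-memory dictionary of sequences, and the 1-based `(chromosome, position)` SNP list are all assumptions made for this example.

```python
def mask_snps(sequences, snp_positions):
    """Return a copy of `sequences` with every listed SNP position set to 'N'.

    sequences:     dict mapping chromosome name to sequence string
    snp_positions: iterable of (chromosome, position) pairs, with 1-based
                   positions (an assumption; VCF files use the same convention)
    """
    # Work on mutable lists of bases so single positions can be replaced.
    masked = {name: list(seq) for name, seq in sequences.items()}
    for chrom, pos in snp_positions:
        if chrom not in masked:
            raise KeyError("unknown chromosome: " + chrom)
        if not 1 <= pos <= len(masked[chrom]):
            raise ValueError("position %d is outside %s" % (pos, chrom))
        masked[chrom][pos - 1] = "N"  # convert 1-based position to 0-based index
    return {name: "".join(bases) for name, bases in masked.items()}


if __name__ == "__main__":
    genome = {"chr2L": "ACGTACGT"}
    snps = [("chr2L", 3), ("chr2L", 8)]
    print(mask_snps(genome, snps))
```

Because both parental alleles mismatch an N equally, reads from A and from B align to the single masked reference without a bias toward either parent, and each read maps once rather than twice.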

xinkwu · 2017-11-30T10:18:52Z

Yes, I generated a diploid reference genome.
The first thing I want to do now is quality control of my HiC data, to make sure that the data can support the next step of the analysis, with RNA-seq data.
Because of the hybrid background, the result was not very good when I used only genome A as the reference. I expected to get more valid pairs by combining the two parental genomes.
Maybe I could delete the sequences that are identical between the two genomes (leaving only one copy).
Do you think that would work?
Best,
Kai

nservant · 2017-12-19T14:22:23Z

Hi,
I don't know. I have never tested HiC-Pro with a diploid mapping. I'm not sure that this is a good approach; I'm only sure that there are many details that HiC-Pro will not check, such as repeated reads shared between the A and B genomes. If you want to perform allele-specific analysis with HiC-Pro, please use the N-masked strategy already mentioned, instead of a diploid mapping.
Best

nservant closed this as completed Dec 19, 2017
