How to generate the reference genome for heterozygous HiC data? #116

Closed
xinkwu opened this issue Nov 30, 2017 · 3 comments

xinkwu commented Nov 30, 2017

Hi nservant,

My HiC data come from a hybrid fly produced by crossing strains A and B. I tried three reference genomes in turn: genome A, genome B, and a combined A+B genome (I renamed the scaffolds in each genome and used the cat command to concatenate them into a single FASTA file).
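
For readers who want to reproduce the combined reference described above, the rename-and-concatenate step could be sketched in Python roughly as follows. The prefixes `A_` and `B_` and the function names are illustrative assumptions, not part of the original workflow or of HiC-Pro.

```python
import io


def prefix_fasta_headers(lines, prefix):
    """Yield FASTA lines with `prefix` prepended to every sequence name."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def combine_genomes(genome_a, genome_b, out, prefix_a="A_", prefix_b="B_"):
    """Write both genomes to `out`, keeping scaffold names unique per parent."""
    for handle, prefix in ((genome_a, prefix_a), (genome_b, prefix_b)):
        for line in prefix_fasta_headers(handle, prefix):
            # Guard against a missing final newline in the first file.
            out.write(line if line.endswith("\n") else line + "\n")


# Toy in-memory example; real use would pass open file handles instead.
a = io.StringIO(">scaffold1\nACGT\n")
b = io.StringIO(">scaffold1\nACGA\n")
combined = io.StringIO()
combine_genomes(a, b, combined)
print(combined.getvalue())
```

Prefixing the names matters because both parental assemblies may use the same scaffold names, and a mapper index needs every sequence name to be unique.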

The results confused me a lot. As expected, I got more mapped pairs with the combined genome, but I also got more multi-mapped pairs and fewer valid pairs.

How should I generate the reference genome for my data? Do you have any suggestions?

Thanks a lot,
Kai

@nservant
Owner

Hi Kai,
If I understood correctly, you generated a kind of diploid reference genome, with 2N chromosomes?
In that case, you should indeed expect many more multi-mapped pairs, because every read that does not overlap a SNP between A and B will map twice (once on each parental chromosome).
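
To illustrate the point about reads that do not overlap a SNP, here is a toy Python sketch that counts how many read-length windows are identical in two parental sequences. It assumes the sequences are already aligned and differ only by substitutions (no indels), which is a strong simplification; the function name and the example sequences are made up for illustration.

```python
def ambiguous_read_fraction(seq_a, seq_b, read_len):
    """Fraction of read-length windows that are identical in both parents.

    Assumes seq_a and seq_b are aligned and of equal length (SNPs only).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    n_windows = len(seq_a) - read_len + 1
    if n_windows <= 0:
        raise ValueError("read_len is longer than the sequence")
    identical = sum(
        seq_a[i:i + read_len] == seq_b[i:i + read_len]
        for i in range(n_windows)
    )
    return identical / n_windows


seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGTACTTACGTACGT"  # one SNP at 0-based index 10
frac = ambiguous_read_fraction(seq_a, seq_b, read_len=5)
print(frac)  # 11 of the 16 windows miss the SNP -> 0.6875
```

Every window counted here has an identical copy on both parental chromosomes of a concatenated reference, so a read drawn from it cannot be placed uniquely.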
What exactly do you want to do? Do you want to generate allele-specific maps, to explore the chromosome organization of each parental chromosome?
If so, have a look at the manual (http://nservant.github.io/HiC-Pro/AS.html).
You will need to generate an N-masked genome, in which every SNP between A and B is replaced by an N.
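
As a rough illustration of what N-masking means (not the procedure from the HiC-Pro manual), the following Python sketch replaces a list of SNP positions in a sequence with N. The helper name `n_mask` and the 0-based position list are assumptions made for this example; in practice the SNP list would typically come from a variant file for the two parental strains.

```python
def n_mask(sequence, snp_positions):
    """Return `sequence` with each 0-based position in `snp_positions` set to N."""
    masked = list(sequence)
    for pos in snp_positions:
        if not 0 <= pos < len(masked):
            raise IndexError(f"SNP position {pos} is outside the sequence")
        masked[pos] = "N"
    return "".join(masked)


# Toy example: two SNPs between the parental strains at positions 2 and 5.
reference = "ACGTACGT"
masked_reference = n_mask(reference, [2, 5])
print(masked_reference)  # ACNTANGT
```

With the SNP positions masked, reads from either parent align to a single haploid reference without a bias toward the allele that happens to be in that reference.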
Best
N

xinkwu commented Nov 30, 2017

Yes, I generated a diploid reference genome.
For now, the first thing I want to do is quality control of my HiC data, to make sure it can support the next step of the analysis with RNA-seq data.
Because of the hybrid background, the results were not very good when I used only the A genome as the reference. I expected to get more valid pairs by combining the parental genomes.
Maybe I could remove the sequences that are identical between the two genomes (keeping only one copy).
Do you think that would work?
Best,
Kai

@nservant
Owner

Hi,
I don't know. I have never tested HiC-Pro with a diploid mapping. I'm not sure that this is a good approach; I'm only sure that there are many details that HiC-Pro will not check, such as reads shared between the A and B genomes. If you want to perform allele-specific analysis with HiC-Pro, please use the N-masked strategy already mentioned instead of a diploid mapping.
Best
