
Can hickit deal with nanopore long reads and generate phasing results properly? #42

@Wong718

Hello Professor Li. hickit is a wonderful tool for 3D genome analysis.
Recently I have been processing scNanoHiC data with bwa-sw and hickit, and I found the haplotype phasing results confusing. I would like to ask whether hickit can handle nanopore long-read sequencing data as reliably as it handles NGS data.
The SAM file generated by bwa-sw looks as follows: a single read may map to multiple positions, producing several records with the same read ID.

d3d59d85-f117-406d-93e3-4901250df094    0       chr10   118628182       42      344S305M441S    *       0       0       GTTCAGTTACGTATTGCTAGCTCTTTCCCTACACGACGCTCTTCCGATCTGAGATTAAAAAAAAAAAAAAAAACATTTTAACCTAGGTGGAAGTGGAGGGAGGAGGGGACGAAGGAGAGAATAAGAAATTTCTGGAGCTTTTAACAAGGGGAGTGTGAGGGTAATCCAGCAATTCAGAAGCCGGGCGCGGTGGCTCATGCCTATAATCCCAGCACTTATTGGGAGGCCGAGGCAGGTGGATAGCTTGAGCCCAAGAGTTCGAGACCACCCTGGCCAACATAGTGAGAACCCCCCATCTCTATTTAAACAACAACAAAAAAGAAATTTGAGAACAACTGCCCCCATAGCTGGGCATGGTGGCACACGCCTGTAATCCCAGCTACTCAGAGGGCTGAGGCAGGAGAATCACTTGGACCCAGGAGGCAGAGGTTGTAGCGAGCCAAGATCATACCACTGCATGCCAGCCTGGGAAGGAGAGTGAGATTCCATCTTGGGGAGGGGGGAGGAACCTTGCAAGGTAGATAACAGTAGCCCCTATTTGGAAGGTGGCACAGCTGGGGTCCAGATAGATGAAGTAACTTGCCCAAGGTCACACAGTTAATAAATGGCAAAGCTTGGATTGGAGCCCACATCTTTTGATTATACCACATGAGCATGGCTTTAGACACGCTGGTGCAAGGATCTGTGTGACCTCTAATCTCACAAGAGTCCTTGCTCAGACCCAGAAAGGGCTTCTCTACAGTATAGGAGAGGAATACCTCCAGGTTGCATGTGGGCAGCTGCCAACGTGAATGGCTTGGTCCTCAGCCTATAGAGCTTAAAGGTATTTTGTATAAGCCTAGTTTCCTCCTGTATAAAAAGGGATAAAACATGAACCCTATGTGGTTGTTGCAGGAGGATGTGAAAGTGCTGCCCCAGTACTTGGTATTAAGAATATCAATAAATCATTAGGACTATGATCTATTTTTAAACAATTTTCAAACAAAGTATTACCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACAAAGACACCAACAACTTTCTTATCTCGTATGCCGTCTTCTGCTTGAGCAATACGTGGG      
(+,./.--,-+++,/'(&$%&'&(/215982110+*)())*10,-'&&&1%%%(--126;<<?CBAA>;::;;943348644520-,-00000012255/),/-255552+++-5532.,,)&%%&'')...034543423422100122531100.--.2100/0017754240///.22377787656:<@?>>>6666@6666;>=@@?@9940*((*159;?@64448?==<87556.--.8;<?>;;;544499////79993333>:;;;?>@532144/0..26>=;87789<211//0331201101169:8511442213211.//-///34>>91//268:;A=;;;=?>@::954335A>><<==BBAEA@@?A:84//2/:<>>?@@???@>@=><4210,++,10:<>C?>99::>;;:***'***<@?>=<<<710008<;;;;33348;<===<841////0324449----65656653410148<875502.--.135344100/.,,,22....45873224577645443455655566452244:5210033232100011014575554445445410/0.//043335222255111///012457365556200/20/.---/4542101221225543210221...065433457754546524334354210000220.-.///0-2..433233358764444655566543323/-++,,12122.20//005555510/1100023564444776221/0//00335443322225444320,,++002432232000148620//./11332123498779621.-./543212/-..2/--011233743334411103333323442112221112457666587600//23310/0/4//6322223322222200./0123222111233348975543104598434433430/..//02410/.+++,12123241.-./4534347761//0667655433453((()0.--4343//./1,,,,/00///1000241/--./01000''%%%      AS:i:299        XS:i:58 XF:i:3  XE:i:1  NM:i:1
d3d59d85-f117-406d-93e3-4901250df094    0       chr2    164815549       39      828S167M95S     *       0       0       GTTCAGTTACGTATTGCTAGCTCTTTCCCTACACGACGCTCTTCCGATCTGAGATTAAAAAAAAAAAAAAAAACATTTTAACCTAGGTGGAAGTGGAGGGAGGAGGGGACGAAGGAGAGAATAAGAAATTTCTGGAGCTTTTAACAAGGGGAGTGTGAGGGTAATCCAGCAATTCAGAAGCCGGGCGCGGTGGCTCATGCCTATAATCCCAGCACTTATTGGGAGGCCGAGGCAGGTGGATAGCTTGAGCCCAAGAGTTCGAGACCACCCTGGCCAACATAGTGAGAACCCCCCATCTCTATTTAAACAACAACAAAAAAGAAATTTGAGAACAACTGCCCCCATAGCTGGGCATGGTGGCACACGCCTGTAATCCCAGCTACTCAGAGGGCTGAGGCAGGAGAATCACTTGGACCCAGGAGGCAGAGGTTGTAGCGAGCCAAGATCATACCACTGCATGCCAGCCTGGGAAGGAGAGTGAGATTCCATCTTGGGGAGGGGGGAGGAACCTTGCAAGGTAGATAACAGTAGCCCCTATTTGGAAGGTGGCACAGCTGGGGTCCAGATAGATGAAGTAACTTGCCCAAGGTCACACAGTTAATAAATGGCAAAGCTTGGATTGGAGCCCACATCTTTTGATTATACCACATGAGCATGGCTTTAGACACGCTGGTGCAAGGATCTGTGTGACCTCTAATCTCACAAGAGTCCTTGCTCAGACCCAGAAAGGGCTTCTCTACAGTATAGGAGAGGAATACCTCCAGGTTGCATGTGGGCAGCTGCCAACGTGAATGGCTTGGTCCTCAGCCTATAGAGCTTAAAGGTATTTTGTATAAGCCTAGTTTCCTCCTGTATAAAAAGGGATAAAACATGAACCCTATGTGGTTGTTGCAGGAGGATGTGAAAGTGCTGCCCCAGTACTTGGTATTAAGAATATCAATAAATCATTAGGACTATGATCTATTTTTAAACAATTTTCAAACAAAGTATTACCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACAAAGACACCAACAACTTTCTTATCTCGTATGCCGTCTTCTGCTTGAGCAATACGTGGG      
(+,./.--,-+++,/'(&$%&'&(/215982110+*)())*10,-'&&&1%%%(--126;<<?CBAA>;::;;943348644520-,-00000012255/),/-255552+++-5532.,,)&%%&'')...034543423422100122531100.--.2100/0017754240///.22377787656:<@?>>>6666@6666;>=@@?@9940*((*159;?@64448?==<87556.--.8;<?>;;;544499////79993333>:;;;?>@532144/0..26>=;87789<211//0331201101169:8511442213211.//-///34>>91//268:;A=;;;=?>@::954335A>><<==BBAEA@@?A:84//2/:<>>?@@???@>@=><4210,++,10:<>C?>99::>;;:***'***<@?>=<<<710008<;;;;33348;<===<841////0324449----65656653410148<875502.--.135344100/.,,,22....45873224577645443455655566452244:5210033232100011014575554445445410/0.//043335222255111///012457365556200/20/.---/4542101221225543210221...065433457754546524334354210000220.-.///0-2..433233358764444655566543323/-++,,12122.20//005555510/1100023564444776221/0//00335443322225444320,,++002432232000148620//./11332123498779621.-./543212/-..2/--011233743334411103333323442112221112457666587600//23310/0/4//6322223322222200./0123222111233348975543104598434433430/..//02410/.+++,12123241.-./4534347761//0667655433453((()0.--4343//./1,,,,/00///1000241/--./01000''%%%      AS:i:167        XS:i:0  XF:i:3  XE:i:1  NM:i:0
d3d59d85-f117-406d-93e3-4901250df094    0       chr11   61056073        39      177S167M746S    *       0       0       GTTCAGTTACGTATTGCTAGCTCTTTCCCTACACGACGCTCTTCCGATCTGAGATTAAAAAAAAAAAAAAAAACATTTTAACCTAGGTGGAAGTGGAGGGAGGAGGGGACGAAGGAGAGAATAAGAAATTTCTGGAGCTTTTAACAAGGGGAGTGTGAGGGTAATCCAGCAATTCAGAAGCCGGGCGCGGTGGCTCATGCCTATAATCCCAGCACTTATTGGGAGGCCGAGGCAGGTGGATAGCTTGAGCCCAAGAGTTCGAGACCACCCTGGCCAACATAGTGAGAACCCCCCATCTCTATTTAAACAACAACAAAAAAGAAATTTGAGAACAACTGCCCCCATAGCTGGGCATGGTGGCACACGCCTGTAATCCCAGCTACTCAGAGGGCTGAGGCAGGAGAATCACTTGGACCCAGGAGGCAGAGGTTGTAGCGAGCCAAGATCATACCACTGCATGCCAGCCTGGGAAGGAGAGTGAGATTCCATCTTGGGGAGGGGGGAGGAACCTTGCAAGGTAGATAACAGTAGCCCCTATTTGGAAGGTGGCACAGCTGGGGTCCAGATAGATGAAGTAACTTGCCCAAGGTCACACAGTTAATAAATGGCAAAGCTTGGATTGGAGCCCACATCTTTTGATTATACCACATGAGCATGGCTTTAGACACGCTGGTGCAAGGATCTGTGTGACCTCTAATCTCACAAGAGTCCTTGCTCAGACCCAGAAAGGGCTTCTCTACAGTATAGGAGAGGAATACCTCCAGGTTGCATGTGGGCAGCTGCCAACGTGAATGGCTTGGTCCTCAGCCTATAGAGCTTAAAGGTATTTTGTATAAGCCTAGTTTCCTCCTGTATAAAAAGGGATAAAACATGAACCCTATGTGGTTGTTGCAGGAGGATGTGAAAGTGCTGCCCCAGTACTTGGTATTAAGAATATCAATAAATCATTAGGACTATGATCTATTTTTAAACAATTTTCAAACAAAGTATTACCAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACAAAGACACCAACAACTTTCTTATCTCGTATGCCGTCTTCTGCTTGAGCAATACGTGGG      
(+,./.--,-+++,/'(&$%&'&(/215982110+*)())*10,-'&&&1%%%(--126;<<?CBAA>;::;;943348644520-,-00000012255/),/-255552+++-5532.,,)&%%&'')...034543423422100122531100.--.2100/0017754240///.22377787656:<@?>>>6666@6666;>=@@?@9940*((*159;?@64448?==<87556.--.8;<?>;;;544499////79993333>:;;;?>@532144/0..26>=;87789<211//0331201101169:8511442213211.//-///34>>91//268:;A=;;;=?>@::954335A>><<==BBAEA@@?A:84//2/:<>>?@@???@>@=><4210,++,10:<>C?>99::>;;:***'***<@?>=<<<710008<;;;;33348;<===<841////0324449----65656653410148<875502.--.135344100/.,,,22....45873224577645443455655566452244:5210033232100011014575554445445410/0.//043335222255111///012457365556200/20/.---/4542101221225543210221...065433457754546524334354210000220.-.///0-2..433233358764444655566543323/-++,,12122.20//005555510/1100023564444776221/0//00335443322225444320,,++002432232000148620//./11332123498779621.-./543212/-..2/--011233743334411103333323442112221112457666587600//23310/0/4//6322223322222200./0123222111233348975543104598434433430/..//02410/.+++,12123241.-./4534347761//0667655433453((()0.--4343//./1,,,,/00///1000241/--./01000''%%%      AS:i:161        XS:i:0  XF:i:3  XE:i:1  NM:i:1
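To make the multi-record layout easier to inspect, here is a minimal Python sketch (my own helper, not part of hickit or bwa) that groups SAM records by read ID and converts each CIGAR string into the interval of the read that the alignment covers. The function names `query_interval` and `group_by_read` are hypothetical, and the sketch assumes standard SAM with one record per line and at least 11 fields.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def query_interval(cigar, reverse=False):
    """Return (start, end) of the aligned part in original read
    coordinates (0-based, end-exclusive)."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Clipping at either end of the CIGAR is read sequence outside the alignment.
    lead = ops[0][0] if ops and ops[0][1] in "SH" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] in "SH" else 0
    # M, I, = and X are the operations that consume read bases.
    aligned = sum(n for n, op in ops if op in "MI=X")
    if reverse:
        # On the reverse strand the CIGAR describes the reverse-complemented read.
        lead, trail = trail, lead
    return lead, lead + aligned

def group_by_read(sam_lines):
    """Map read ID -> list of (read_start, read_end, chrom, pos, strand),
    sorted by position along the read."""
    reads = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        f = line.split()
        if len(f) < 11:
            continue  # not a complete SAM record
        name, flag, chrom, pos, cigar = f[0], int(f[1]), f[2], int(f[3]), f[5]
        if flag & 4 or cigar == "*":
            continue  # unmapped
        rev = bool(flag & 16)
        qs, qe = query_interval(cigar, rev)
        reads[name].append((qs, qe, chrom, pos, "-" if rev else "+"))
    for segs in reads.values():
        segs.sort()
    return dict(reads)
```

For the three records above, the CIGAR strings 344S305M441S, 828S167M95S and 177S167M746S correspond to read intervals 344-649, 828-995 and 177-344 of the same 1090-base read, so the order along the read is chr11, chr10, chr2.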

Then I applied the sam2seg function to generate the .seg file as follows, providing the corresponding .vcf file with the -v parameter.

0a0a0f39-8470-464b-aea8-ae41b7967128    chrX!57850012!57850118!+!.!32!1 chrX!118125267!118125631!+!.!47!1       chrX!118200629!118201192!-!.!49!1
0a0a6e33-7628-475a-a306-12ba28ca555d    chr15!94570904!94571063!-!.!36!1        chr15!97971692!97972740!-!1!54!1
0a0a7a06-e53d-4c7a-8675-ff8bf1b74ec3    chr7!7146551!7147017!-!.!48!1   chr10!89043022!89043154!-!1!35!1        chr7!7143778!7144116!+!.!42!1
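For reference, this is how I read the .seg lines above, as a small Python sketch. The field layout (chromosome, start, end, strand, phase, mapping quality, with `.` meaning unphased) is my assumption from looking at the output, not something I have confirmed in the hickit source, and the names `Segment` and `parse_seg_line` are hypothetical. The seventh `!`-separated field is ignored because I do not know what it means.

```python
from typing import List, NamedTuple, Optional, Tuple

class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str
    phase: Optional[int]  # None when the phase field is "."
    mapq: int

def parse_seg_line(line: str) -> Tuple[str, List[Segment]]:
    """Split one .seg line into the read ID and its segments.
    Assumed layout per segment: chrom!start!end!strand!phase!mapq!..."""
    name, *fields = line.split()
    segments = []
    for field in fields:
        chrom, start, end, strand, phase, mapq = field.split("!")[:6]
        segments.append(Segment(
            chrom, int(start), int(end), strand,
            None if phase == "." else int(phase),
            int(mapq),
        ))
    return name, segments
```

Under this reading, the third read above has three segments, of which only the chr10 segment carries a phase (1); both chr7 segments are unphased.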

However, after generating the .pairs file with a modified seg2pairs function (I changed it to emit multiple contacts per read), I observed more trans-parental contacts within the same chromosome than expected, and I would like to understand why.
In short, I want to ask whether the hickit::sam2seg function can handle .sam files generated by bwa-sw and make correct phasing decisions, and what sam2seg does when a long mapped read carries SNPs from opposite phases.
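To quantify the observation, a rough check can be run directly on the .seg file, before any pairs are generated. The Python sketch below is my own diagnostic, not hickit code. It assumes each segment is written as `chrom!start!end!strand!phase!mapq!...` with `.` for an unphased segment, and the function names are hypothetical. For every read it compares all pairs of phased segments on the same chromosome and counts them as cis (same phase) or trans (different phase).

```python
from collections import Counter
from itertools import combinations

def phased_segments(seg_line):
    """Yield (chrom, phase) for each segment with a known phase.
    Assumes the phase is the fifth '!'-separated field and '.' means unphased."""
    _, *fields = seg_line.split()
    for field in fields:
        parts = field.split("!")
        chrom, phase = parts[0], parts[4]
        if phase != ".":
            yield chrom, int(phase)

def tally_intra_chrom(seg_lines):
    """Count cis and trans pairs among phased segments of the same read
    that fall on the same chromosome."""
    counts = Counter()
    for line in seg_lines:
        if not line.strip():
            continue
        segs = list(phased_segments(line))
        for (c1, p1), (c2, p2) in combinations(segs, 2):
            if c1 == c2:
                counts["cis" if p1 == p2 else "trans"] += 1
    return counts
```

If the trans fraction is already high at this stage, the excess comes from segment-level phasing rather than from my seg2pairs modification.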
