could you give us an example of cactus? #4
Sorry to bother you, could you give me an example for cactus? I downloaded the pangenome constructed with cactus from HPRC, and it seems to behave differently from a pggb graph when used with this software. As I am new to this field, I don't know how to solve this.
Sure! I guess it breaks when grouping paths together, because cactus uses both "P" and "W" lines. I'll soon make some convenience options available in panacus that will allow grouping by sample/haplotype without providing an explicit group list. But until then, here's a recipe that works with the HPRC MC graph and other graphs that use W-lines:
https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.0-mc-grch38.gfa.gz
gunzip hprc-v1.0-mc-grch38.gfa.gz
grep '^P' hprc-v1.0-mc-grch38.gfa | cut -f2 > hprc-v1.0-mc-grch38.paths.txt
grep -e '^W' hprc-v1.0-mc-grch38.gfa | cut -f2-6 | awk '{ print $1 "#" $2 "#" $3 ":" $4 "-" $5 }' >> hprc-v1.0-mc-grch38.paths.txt
cut -f1 -d\# hprc-v1.0-mc-grch38.paths.txt > hprc-v1.0-mc-grch38.groupnames.txt
paste hprc-v1.0-mc-grch38.paths.txt hprc-v1.0-mc-grch38.groupnames.txt > hprc-v1.0-mc-grch38.groups.txt
grep -ive 'grch38\|chm13' hprc-v1.0-mc-grch38.paths.txt > hprc-v1.0-mc-grch38.paths.haplotypes.txt
RUST_LOG=info panacus histgrowth -t16 -q 0,1,0.5,0.1 -g hprc-v1.0-mc-grch38.groups.txt -s hprc-v1.0-mc-grch38.paths.haplotypes.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.histgrowth.node.txt
Plot: hprc-v1.0-mc-grch38.histgrowth.node.pdf
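To see what the path list and group file end up looking like, the same text-processing steps can be tried on a tiny made-up GFA first. This is only a sketch: `toy.gfa`, its contents, and the `toy.*` file names are invented for illustration and are not taken from the HPRC graph. It assumes a grep that accepts `\|` for alternation (as GNU grep does).

```shell
# Build a tiny, made-up GFA with one P-line (reference) and two W-lines (haplotypes).
{
  printf 'H\tVN:Z:1.1\n'
  printf 'S\t1\tACGT\n'
  printf 'S\t2\tGG\n'
  printf 'L\t1\t+\t2\t+\t0M\n'
  printf 'P\tGRCh38#0#chr1\t1+,2+\t*\n'
  printf 'W\tHG002\t1\tchr1\t0\t6\t>1>2\n'
  printf 'W\tHG002\t2\tchr1\t0\t4\t>1\n'
} > toy.gfa

# P-lines already carry a full path name in column 2.
grep '^P' toy.gfa | cut -f2 > toy.paths.txt

# W-lines store sample, haplotype, sequence id, start and end in columns 2-6;
# join them into the sample#haplotype#sequence:start-end form.
grep '^W' toy.gfa | cut -f2-6 \
  | awk '{ print $1 "#" $2 "#" $3 ":" $4 "-" $5 }' >> toy.paths.txt

# Group name = everything before the first '#', i.e. the sample name.
cut -f1 -d'#' toy.paths.txt > toy.groupnames.txt
paste toy.paths.txt toy.groupnames.txt > toy.groups.txt

# Keep only non-reference paths for the subset list.
grep -ive 'grch38\|chm13' toy.paths.txt > toy.paths.haplotypes.txt

cat toy.groups.txt
```

Each line of `toy.groups.txt` is a path name followed by a tab and its group, so both HG002 haplotypes end up in one group.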
Thank you for your answer.
No processing required:
grep -ie 'grch38' hprc-v1.0-mc-grch38.paths.txt > hprc-v1.0-mc-grch38.paths.grch38.txt
RUST_LOG=info panacus histgrowth -t16 -c bp -q 0,1,0.5,0.1 -e hprc-v1.0-mc-grch38.paths.grch38.txt -g hprc-v1.0-mc-grch38.groups.txt -s hprc-v1.0-mc-grch38.paths.haplotypes.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.txt
In ‘A draft human pangenome reference’, Fig. 3g shows that when the first sample is added, the total length is about 30 Mb. When I used the method above, the length at the first sample seemed to be a full haplotype length. How was it processed?
Fig. 3g is an ordered growth histogram (which you can also produce with this tool). But you can also do this by looking at all possible permutations, which is what hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.pdf
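As a rough illustration of what "ordered growth" means, the toy sketch below adds samples in one fixed order and accumulates the length of nodes not already seen in an earlier sample. The three-column table (sample, node, length in bp) and the file names are made up, the rows are assumed to be grouped by sample in the desired order, and this is not how panacus computes its curves.

```shell
# Made-up coverage table: sample, node id, node length in bp.
{
  printf 'S1\tn1\t10\n'
  printf 'S1\tn2\t5\n'
  printf 'S2\tn1\t10\n'
  printf 'S2\tn3\t7\n'
  printf 'S3\tn3\t7\n'
  printf 'S3\tn4\t2\n'
} > toy.cov.tsv

# Walk the rows in file order; a node contributes its length only the
# first time it is seen, and each sample records the running total.
awk -F'\t' '
  !($2 in seen) { seen[$2] = 1; total += $3 }
  {
    if (!($1 in idx)) { idx[$1] = ++n; order[n] = $1 }
    cum[$1] = total
  }
  END { for (i = 1; i <= n; i++) print order[i] "\t" cum[order[i]] }
' toy.cov.tsv > toy.growth.tsv

cat toy.growth.tsv
```

Reordering the samples changes the individual steps of the curve but not its final value, which is why an ordered curve and a curve summarised over all orderings can look different at the first few samples.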
A new release is out that fixes several bugs in the software. If your issue still persists, please open a new issue. I'm closing this one.