could you give us an example of cactus? #4

xuxingyubio · 2023-05-22T02:57:18Z

No description provided.

xuxingyubio · 2023-05-22T02:58:05Z

Sorry to bother you, but could you give me an example for a cactus graph? I downloaded the pangenome that HPRC constructed with cactus, and this software seems to behave differently on it than on a pggb graph. I am new to this field and don't know how to solve this.

danydoerr · 2023-05-22T08:59:35Z

Sure! I guess it breaks when grouping paths together, because cactus uses both "P" and "W" lines. I'll soon make some convenience options available in panacus that will allow grouping by sample/haplotype without providing an explicit group list. But until then, here's a recipe that works with the HPRC MC graph and other graphs that use W-lines:

Download and unpack the graph:

wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.0-mc-grch38.gfa.gz
gunzip hprc-v1.0-mc-grch38.gfa.gz

Prepare file to group paths by sample: <-- this is the crucial step!

grep '^P' hprc-v1.0-mc-grch38.gfa | cut -f2 > hprc-v1.0-mc-grch38.paths.txt
grep -e '^W' hprc-v1.0-mc-grch38.gfa | cut -f2-6 | awk '{ print $1 "#" $2 "#" $3 ":" $4 "-" $5 }' >> hprc-v1.0-mc-grch38.paths.txt
cut -f1 -d\# hprc-v1.0-mc-grch38.paths.txt > hprc-v1.0-mc-grch38.groupnames.txt
paste hprc-v1.0-mc-grch38.paths.txt hprc-v1.0-mc-grch38.groupnames.txt > hprc-v1.0-mc-grch38.groups.txt
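
To see what the grouping recipe above produces, here is a minimal sketch on a toy GFA. The file names (`toy.gfa`, `toy.groups.txt`, ...) and the graph content are made up for illustration and are not taken from the HPRC graph; only the commands mirror the recipe.

```shell
# Toy GFA (hypothetical content): one segment, one P-line, one W-line, tab-separated
printf 'S\t1\tACGT\n' > toy.gfa
printf 'P\tGRCh38#chr1\t1+\t*\n' >> toy.gfa
printf 'W\tHG00438\t1\tJAHBCB010000001.1\t0\t4\t>1\n' >> toy.gfa

# P-lines: the path name is already in column 2
grep '^P' toy.gfa | cut -f2 > toy.paths.txt
# W-lines: build sample#haplotype#seqid:start-end from columns 2-6
grep '^W' toy.gfa | cut -f2-6 | awk '{ print $1 "#" $2 "#" $3 ":" $4 "-" $5 }' >> toy.paths.txt
# Group name = everything before the first '#', i.e. the sample
cut -f1 -d'#' toy.paths.txt > toy.groupnames.txt
paste toy.paths.txt toy.groupnames.txt > toy.groups.txt

cat toy.groups.txt
```

Each line of `toy.groups.txt` holds a path name and, after a tab, the sample it is grouped under, so both haplotypes of a sample end up in the same group.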

Prepare file to select subset of paths corresponding to haplotypes: <-- in the MC graph, reference paths are upper-cased

grep -ive 'grch38\|chm13' hprc-v1.0-mc-grch38.paths.txt > hprc-v1.0-mc-grch38.paths.haplotypes.txt
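
A small sketch of the same filter on a toy path list, in case the `\|` alternation does not work with your grep (it is a GNU extension to basic regular expressions). The path names and the `toyhap.*` file names are hypothetical; repeating `-e` is the portable way to give grep several patterns.

```shell
# Toy path list (hypothetical names in the sample#haplotype#contig pattern)
printf '%s\n' \
    'GRCh38#chr1' \
    'CHM13#chr1' \
    'HG00438#1#ctgA:0-4' \
    'HG00438#2#ctgB:0-4' > toyhap.paths.txt

# Drop every path that matches grch38 or chm13, ignoring case
grep -iv -e 'grch38' -e 'chm13' toyhap.paths.txt > toyhap.paths.haplotypes.txt

cat toyhap.paths.haplotypes.txt
```

Only the two HG00438 paths should remain in `toyhap.paths.haplotypes.txt`.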

Run panacus histgrowth to calculate pangenome growth for nodes (default) with quorum thresholds 0, 1, 0.5, and 0.1 using up to 16 threads:

RUST_LOG=info panacus histgrowth -t16 -q 0,1,0.5,0.1 -g hprc-v1.0-mc-grch38.groups.txt -s hprc-v1.0-mc-grch38.paths.haplotypes.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.histgrowth.node.txt
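
Before starting a long run on the full graph, a quick consistency check of the two input files can save time. The sketch below uses toy files with hypothetical names (`toychk.*`) and made-up paths. It assumes, without confirmation from this thread, that a malformed group line or a subset path without a group points to a mistake in the preparation steps.

```shell
# Toy inputs (hypothetical); for a real run, point these commands at the prepared files
printf 'GRCh38#chr1\tGRCh38\nHG00438#1#ctgA:0-4\tHG00438\nHG00438#2#ctgB:0-4\tHG00438\n' > toychk.groups.txt
printf 'HG00438#1#ctgA:0-4\nHG00438#2#ctgB:0-4\n' > toychk.subset.txt

# 1) Lines of the groups file that do not have exactly two tab-separated fields
awk -F'\t' 'NF != 2' toychk.groups.txt > toychk.bad.txt

# 2) Subset paths that never appear in column 1 of the groups file
cut -f1 toychk.groups.txt | sort > toychk.known.txt
sort toychk.subset.txt | comm -23 - toychk.known.txt > toychk.missing.txt

echo "malformed group lines: $(wc -l < toychk.bad.txt | tr -d ' ')"
echo "subset paths without a group: $(wc -l < toychk.missing.txt | tr -d ' ')"
```

Both counts should be zero; any line listed in `toychk.bad.txt` or `toychk.missing.txt` is worth a look before running histgrowth.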

hprc-v1.0-mc-grch38.histgrowth.node.pdf
hprc-v1.0-mc-grch38.histgrowth.node.txt

xuxingyubio · 2023-05-22T12:31:30Z

Thank you for your answer.
How to quantify the amount of non-reference (GRCh38) sequence has always troubled me.
Do the steps above already provide this information, or does the constructed pangenome need some further processing?

danydoerr · 2023-05-22T13:11:31Z

No processing required; panacus can do that for you if you hand it an exclusion file. The part of the graph that is traversed by any of the paths listed in the exclusion file will be omitted from the computation. So, staying with this example, to quantify non-reference (GRCh38) sequence (bp), do the following:

Produce exclusion file:

grep -ie 'grch38' hprc-v1.0-mc-grch38.paths.txt > hprc-v1.0-mc-grch38.paths.grch38.txt
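
As a sketch of how the exclusion list relates to the haplotype subset, the toy example below (hypothetical path names and `toyex.*` file names) builds both lists and checks that they share no path. Note that a CHM13 path ends up in neither list here, so with these filters it is neither excluded nor counted as a haplotype.

```shell
# Toy path list (hypothetical names)
printf '%s\n' 'GRCh38#chr1' 'CHM13#chr1' 'HG00438#1#ctgA:0-4' > toyex.paths.txt

# Exclusion list: GRCh38 paths only
grep -ie 'grch38' toyex.paths.txt > toyex.paths.grch38.txt
# Haplotype subset: everything except the two references
grep -iv -e 'grch38' -e 'chm13' toyex.paths.txt > toyex.paths.haplotypes.txt

# Paths present in both lists (comm needs sorted input)
sort toyex.paths.grch38.txt > toyex.excluded.sorted.txt
sort toyex.paths.haplotypes.txt > toyex.haplotypes.sorted.txt
comm -12 toyex.excluded.sorted.txt toyex.haplotypes.sorted.txt > toyex.overlap.txt

echo "paths in both lists: $(wc -l < toyex.overlap.txt | tr -d ' ')"
```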

Run panacus histgrowth:

RUST_LOG=info panacus histgrowth -t16 -c bp -q 0,1,0.5,0.1 -e hprc-v1.0-mc-grch38.paths.grch38.txt -g hprc-v1.0-mc-grch38.groups.txt -s hprc-v1.0-mc-grch38.paths.haplotypes.txt hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.txt

xuxingyubio · 2023-05-22T14:01:37Z

In ‘A draft human pangenome reference’, Fig. 3g shows that the total length is about 30 Mb when the first sample is added. When I used the above method, the length seemed to be that of a whole haplotype. How was it processed?

danydoerr · 2023-05-22T16:49:10Z

Fig. 3g is an ordered growth histogram (which you can also produce with this tool). But you can also do this by looking at all possible permutations, which is what panacus histgrowth does. If you follow the steps from above, you should be able to get the attached data / plot. As expected, the average 1st genome adds about 30Mb of non-reference sequence to the pangenome. The data is consistent with Fig. 3g from the paper.
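
To illustrate the idea of averaging over all orderings, here is a rough sketch that derives the expected growth from a coverage histogram with the standard closed form: a node covered by c of n genomes is missed by a random choice of k genomes with probability C(n-c, k) / C(n, k). This is an assumption about the general technique, not a statement of how panacus implements it internally. The histogram values and the `toygrowth.*` file names are made up, and quorum thresholds are not modeled.

```shell
# Toy coverage histogram (hypothetical): coverage <TAB> number of nodes, for n = 3 genomes
printf '1\t4\n2\t2\n3\t1\n' > toygrowth.hist.txt

awk -F'\t' -v n=3 '
# Binomial coefficient C(a, b); returns 0 when b is out of range
function choose(a, b,    i, r) {
    if (b < 0 || b > a) return 0
    r = 1
    for (i = 1; i <= b; i++) r = r * (a - b + i) / i
    return r
}
{ h[$1] = $2 }
END {
    for (k = 1; k <= n; k++) {
        g = 0
        # Expected number of nodes seen by at least one of k randomly chosen genomes
        for (c in h) g += h[c] * (1 - choose(n - c, k) / choose(n, k))
        printf "%d\t%.2f\n", k, g
    }
}' toygrowth.hist.txt > toygrowth.txt

cat toygrowth.txt
```

For this toy histogram the sketch gives about 3.67, 5.67, and 7.00 nodes for k = 1, 2, and 3. The first value equals the average genome size, which is the analogue of "the average 1st genome adds about 30Mb" in the real data.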

hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.pdf
hprc-v1.0-mc-grch38.histgrowth.nogrch38.bp.txt

danydoerr · 2023-06-09T15:00:02Z

A new release is out that fixes several bugs in the software. If your issue still persists, please open a new issue. I'm closing this one.

danydoerr pinned this issue May 22, 2023

danydoerr closed this as completed Jun 9, 2023

danydoerr unpinned this issue Jun 15, 2023

ld9866 mentioned this issue Dec 1, 2023

How to Visualize the results of the minigraph-cactus? #14

Closed
