Adding input files for Panache #197

mictadlo · 2024-04-11T00:41:41Z

Description of feature

Hi,
I found Panache, a web-based interface designed for the visualization of linearized pangenomes. It can be used to show presence/absence information of pangenomic blocks of sequence or genes in a browser-like display. This documentation shows how to create the input files for Panache.

Thank you for your consideration.

Michal

subwaystation · 2024-04-11T08:49:12Z

Hi @mictadlo,

I am aware of Panache, but it does not seem straightforward to get the files right. See SouthGreenPlatform/panache#32. It depends on the input data.
If @SingingMeerkat or @brettChapman can share all the steps necessary from a pggb graph to the actual visualization, I would have a starting point, though.

@mictadlo There is also https://github.com/chfi/waragraph. You can directly plug in the 2D TSV layout from nf-core/pangenome and interactively explore the graph, including a 1D viz!

SingingMeerkat · 2024-04-16T09:22:31Z

Hi @mictadlo and @subwaystation,

Thanks for your interest in Panache! I agree that for now the bridge between pggb and Panache is difficult to cross, especially as Panache has been built to be usable for pangenome graphs and pan-gene atlases alike, and assumes nothing about how the input blocks are obtained.

Unfortunately I can no longer dedicate as much time to Panache as I would like, but I would be happy to help make it more accessible. I may have more time in 2 weeks; in the meantime I opened a dedicated issue at SouthGreenPlatform/panache#38 to keep it in mind.

brettChapman · 2024-04-17T04:33:01Z

Hi @mictadlo

I'd be happy to share my steps for creating a PAV matrix here, generated from the PGGB graphs:

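# Placeholder: set this to the name (or prefix) of the reference path in the graph
reference_prefix="REFERENCE_ID"
# Extract the reference path as FASTA (assumes each sequence is on a single line)
odgi paths -i pangenome_chr1.og -f | grep -A 1 "${reference_prefix}" > reference.fa
samtools faidx reference.fa
cut -f 1,2 reference.fa.fai > genome.txt
# Split the reference into 1000 bp windows
bedtools makewindows -g genome.txt -w 1000 > pangenome_chr1.w1000bp.bed
# Binary PAV matrix per window, using a threshold of 0.5
odgi pav -i pangenome_chr1.og -b pangenome_chr1.w1000bp.bed -M -B 0.5 > pangenome_chr1.pavs.txt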

You then correct any header names, remove the reference column from the PAV file (odgi pav produces PAV for every path in the graph), add the additional columns described in the Panache wiki (https://github.com/SouthGreenPlatform/panache/wiki/Files-&-formats), and merge the PAV matrices across all chromosomes. I usually use pandas to merge the dataframes, since some matrices have their columns in different positions, and write the result to a BED file called pav.bed. Then you want to intersect it with the gene coordinates so that only PAV windows overlapping genes are shown. This reduces the file size, as a large PAV matrix can hurt Panache's performance.
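The pandas merge described above could be sketched roughly as follows. This is only an illustration under assumptions, not the exact script used in this thread: the coordinate column names (`chrom`, `start`, `end`), the genome names, the reference column name `ref`, and the choice to fill a genome that is missing from one matrix with 0 are all hypothetical. The extra columns Panache requires are not added here; see the wiki page linked above for those.

```python
import pandas as pd

# Assumed coordinate column names; adjust to the real header of your PAV files.
COORD_COLS = ["chrom", "start", "end"]


def merge_pav_matrices(frames, reference_column):
    """Stack per-chromosome PAV matrices, aligning genome columns by name."""
    # pd.concat matches columns by label, so a different column order
    # between chromosomes does not matter.
    merged = pd.concat(frames, ignore_index=True, sort=False)
    # Drop the reference genome's own column if it is present.
    merged = merged.drop(columns=[reference_column], errors="ignore")
    genome_cols = sorted(c for c in merged.columns if c not in COORD_COLS)
    # Assumption: a genome missing from one matrix is treated as absent (0).
    merged[genome_cols] = merged[genome_cols].fillna(0).astype(int)
    merged = merged.sort_values(["chrom", "start"]).reset_index(drop=True)
    return merged[COORD_COLS + genome_cols]


# Tiny in-memory stand-ins for two per-chromosome matrices (hypothetical data).
chr1 = pd.DataFrame({
    "chrom": ["chr1", "chr1"], "start": [0, 1000], "end": [1000, 2000],
    "ref": [1, 1], "genomeA": [1, 0], "genomeB": [0, 1],
})
chr2 = pd.DataFrame({
    "chrom": ["chr2"], "start": [0], "end": [1000],
    "genomeB": [1], "ref": [1], "genomeA": [1],  # columns in a different order
})

pav = merge_pav_matrices([chr1, chr2], reference_column="ref")
print(pav.to_csv(sep="\t", index=False))
```

In practice the frames would come from `pd.read_csv(path, sep="\t")` on each per-chromosome odgi pav output, and the merged table would be written out as pav.bed for the bedtools step below.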

bedtools intersect -wa -a pav.bed -b genome.gff | sort | uniq | sort -k1,1 -k2,2n > overlaps.bed
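To make explicit what the intersect command above keeps, here is a small pure-Python sketch of the same idea: retain each PAV window that overlaps at least one gene, once, sorted by chromosome and start. It is a simplified stand-in for illustration only (quadratic time, coordinates only, hypothetical function and variable names) and not a replacement for bedtools. The one detail worth noting is the coordinate convention: BED intervals are 0-based and half-open, while GFF intervals are 1-based and inclusive.

```python
def overlapping_windows(windows, genes):
    """Return the windows that overlap at least one gene.

    windows: (chrom, start, end) tuples in BED convention (0-based, half-open).
    genes:   (chrom, start, end) tuples in GFF convention (1-based, inclusive).
    """
    kept = set()
    for w_chrom, w_start, w_end in windows:
        for g_chrom, g_start, g_end in genes:
            # Convert the GFF interval to half-open form: [g_start - 1, g_end).
            if w_chrom == g_chrom and w_start < g_end and g_start - 1 < w_end:
                kept.add((w_chrom, w_start, w_end))
                break  # one overlap is enough; this also avoids duplicates
    # Sort by chromosome, then numerically by start.
    return sorted(kept)


# Hypothetical toy data: three windows and one gene.
windows = [("chr1", 0, 1000), ("chr1", 1000, 2000), ("chr2", 0, 1000)]
genes = [("chr1", 1001, 1500)]  # GFF coordinates
kept_windows = overlapping_windows(windows, genes)
print(kept_windows)
```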

The resulting BED file and the GFF file can then be converted to JSON format using the Panache conversion script.

You'll also want to generate a Newick file of all genomes except the reference genome used, which will be added for sorting by phylogeny. I use mashtree for this.

mictadlo added the enhancement Improvement for existing functionality label Apr 11, 2024

SingingMeerkat mentioned this issue Apr 16, 2024

Linking pggb output to Panache input SouthGreenPlatform/panache#38

Open
