Adding input files for Panache #197

Open
mictadlo opened this issue Apr 11, 2024 · 3 comments
Labels
enhancement Improvement for existing functionality

Comments

@mictadlo

Description of feature

Hi,
I found Panache, a web-based interface designed for the visualization of linearized pangenomes. It can be used to show presence/absence information of pangenomic blocks of sequence or genes in a browser-like display. This documentation shows how to create the input files for Panache.

Thank you for your consideration.

Michal

@mictadlo added the enhancement label (Improvement for existing functionality) on Apr 11, 2024
@subwaystation
Collaborator

Hi @mictadlo,

I am aware of Panache, but it does not seem straightforward to get the files right. See SouthGreenPlatform/panache#32. It depends on the input data.
If @SingingMeerkat or @brettChapman can share all the steps necessary from a pggb graph to the actual visualization, I would have a starting point, though.

@mictadlo There is also https://github.com/chfi/waragraph. You can directly plug in the 2D TSV layout from nf-core/pangenome and interactively explore the graph, including a 1D viz!

@SingingMeerkat

Hi @mictadlo and @subwaystation,

Thanks for your interest in Panache! I agree that for now the bridge between pggb and Panache is difficult to cross, especially as Panache has been built to be usable for pangenome graphs and pan-gene atlases alike, and assumes nothing about how the input blocks are obtained.

Unfortunately I cannot dedicate as much time as I would like to Panache anymore, but I would be happy to help make it more accessible. I may have more time in 2 weeks; in the meantime, I opened a dedicated issue at SouthGreenPlatform/panache#38 to keep it in mind.

@brettChapman

brettChapman commented Apr 17, 2024

Hi @mictadlo

I'd be happy to share my steps for creating a PAV matrix here, generated from the PGGB graphs:

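# Placeholder: set this to the reference path name (or its prefix) in the graph
reference_prefix="<reference ID name>"
# Write all paths as FASTA, keeping the reference header and the line after it
odgi paths -i pangenome_chr1.og -f | grep -A 1 "${reference_prefix}" > reference.fa
samtools faidx reference.fa
cut -f 1,2 reference.fa.fai > genome.txt
# Tile the reference into 1000 bp windows, then compute a binary PAV matrix over them
bedtools makewindows -g genome.txt -w 1000 > pangenome_chr1.w1000bp.bed
odgi pav -i pangenome_chr1.og -b pangenome_chr1.w1000bp.bed -M -B 0.5 > pangenome_chr1.pavs.txt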
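
To make the windowing step concrete, here is a minimal Python sketch of what `bedtools makewindows -w 1000` produces from a `genome.txt` of name/length pairs. The function name `make_windows` and the sequence name `ref#chr1` are made up for illustration; this only mimics the tiling and is not a replacement for bedtools.

```python
def make_windows(chrom_sizes, width=1000):
    """Yield (chrom, start, end) BED windows (0-based, half-open) tiling each sequence."""
    for chrom, size in chrom_sizes:
        for start in range(0, size, width):
            # The last window is truncated at the end of the sequence.
            yield chrom, start, min(start + width, size)


# Hypothetical reference of 2500 bp, as it would be listed in genome.txt.
windows = list(make_windows([("ref#chr1", 2500)], width=1000))
for chrom, start, end in windows:
    print(f"{chrom}\t{start}\t{end}")
```

Each of these windows then becomes one row of the PAV matrix written by odgi pav.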

You then correct any header names, remove the reference column from the PAV file (odgi pav produces PAV for every path in the graph), add the additional columns described in the Panache Wiki (https://github.com/SouthGreenPlatform/panache/wiki/Files-&-formats), and merge the PAV matrices from all chromosomes into a single BED file called pav.bed. I usually use pandas for the merge, as some matrices have their columns in different positions. Then you want to intersect pav.bed with the gene coordinates so that only PAV windows overlapping genes are shown. This helps reduce the file size, as a large PAV matrix can hurt Panache's performance.
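
The merge across chromosomes can be sketched roughly as follows. This is not the pandas code mentioned above: it uses only the Python standard library and aligns genome columns by name rather than by position, which is the same idea. The column names `chrom`, `start`, `end`, `genomeA` and `genomeB` are assumptions for illustration, and filling a missing genome column with `0` is also an assumption you may want to change.

```python
import csv
import io

KEY_COLUMNS = ["chrom", "start", "end"]  # assumed names of the coordinate columns


def merge_pav_tables(tables, fill="0"):
    """Stack per-chromosome PAV tables, aligning genome columns by name."""
    genomes = []
    rows = []
    for handle in tables:
        reader = csv.DictReader(handle, delimiter="\t")
        for name in reader.fieldnames:
            if name not in KEY_COLUMNS and name not in genomes:
                genomes.append(name)
        rows.extend(reader)
    header = KEY_COLUMNS + genomes
    # Look each value up by column name, so column order in the inputs does not matter.
    return [header] + [[row.get(col, fill) for col in header] for row in rows]


# Two toy matrices whose genome columns are in a different order.
chr1 = "chrom\tstart\tend\tgenomeA\tgenomeB\nchr1\t0\t1000\t1\t0\n"
chr2 = "chrom\tstart\tend\tgenomeB\tgenomeA\nchr2\t0\t1000\t1\t0\n"
merged = merge_pav_tables([io.StringIO(chr1), io.StringIO(chr2)])
for line in merged:
    print("\t".join(line))
```

In practice you would pass open file handles for each per-chromosome matrix and write the result to pav.bed.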

bedtools intersect -wa -a pav.bed -b genome.gff | sort | uniq | sort -k1,1 -k2,2n > overlaps.bed
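
For readers unfamiliar with the intersect step, here is a small Python sketch of the filtering it performs: keep each PAV window that overlaps at least one gene, deduplicate, and sort by chromosome and start. The function name and the toy coordinates are hypothetical, and the sketch assumes BED windows are 0-based half-open while GFF features are 1-based inclusive. Use bedtools for real data.

```python
def overlapping_windows(windows, genes):
    """Return sorted, unique BED windows that overlap at least one GFF feature."""
    kept = set()
    for chrom, start, end in windows:
        for g_chrom, g_start, g_end in genes:
            # Convert the GFF interval to half-open [g_start - 1, g_end) before comparing.
            if chrom == g_chrom and start < g_end and g_start - 1 < end:
                kept.add((chrom, start, end))
                break
    return sorted(kept)


# Toy data: three 1000 bp windows and one gene at GFF positions 1001-1500.
toy_windows = [("chr1", 0, 1000), ("chr1", 1000, 2000), ("chr1", 2000, 3000)]
toy_genes = [("chr1", 1001, 1500)]
overlaps = overlapping_windows(toy_windows, toy_genes)
print(overlaps)
```

The `sort | uniq` in the command above plays the role of the set here, since `-wa` reports a window once for every feature it overlaps.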

The resulting BED file and the GFF file can then be converted to JSON format using the Panache conversion script.

You'll also want to generate a Newick file of all genomes except the reference genome used, which will be added for sorting by phylogeny. I use mashtree for this.
