Uncollapsed gene families #223

nicola-palmieri · 2023-03-09T15:50:41Z

I am running Panaroo on 100 E. coli isolates with different parameter settings, and I am validating the gene presence/absence matrix using a subset of genes (those whose names start with tra). The same gene often seems to be split across several rows; ideally, I would expect only 15 rows for all the tra genes: traA, traC, traD, traG and so on. This has important implications because I want to use these data for a GWAS. Do you have any recommendations on how to minimize this issue? I attach part of the gene presence/absence table from one of the runs, filtered for the tra genes. I have experimented with the -c, -f, -mode and merge_paralogs parameters without success.

Thank you.
Nicola

gtonkinhill · 2023-03-15T00:22:14Z

Hi Nicola,

This is most likely due mainly to the diversity and mobility of the gene.

If you have diverse versions of a gene occurring in different regions of the genome, it can be very challenging (and sometimes not desirable) to cluster them together. In my experience, annotations of gene families like this one are often inconsistent, so I would also expect instances where annotations of different tra* genes are clustered together.

By design, Panaroo is cautious about clustering diverse copies of a gene that occur in different locations, as these can often have different functions or phenotypes. I would recommend performing two separate GWAS: one using the gene clusters from Panaroo, which also encode information about location and diversity, and one using unitigs, as described in the pyseer documentation.

gtonkinhill closed this as completed Apr 12, 2023
