Add the protein sequence to gene family when reading clustering #238

jpjarnoux · 2024-06-10T21:12:14Z

When external clustering is provided, gene family sequences were missing from the pangenome file. As a result, some output files could not be generated. In particular, the FASTA file of RGP borders (created by ppanggolin write_pangenome --borders) contained only sequence IDs without the actual sequences.

This PR translates the representative gene of each gene family during the cluster step, so that the protein sequence can be written to the HDF5 pangenome file.

This change should resolve the issue of missing sequence outputs.
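As a rough illustration of the translation step described above, here is a minimal, self-contained sketch. It is not the PPanGGOLiN implementation: `translate` and `family_proteins` are hypothetical helpers, and families are modelled as a plain dict mapping a family name to the DNA sequence of its representative gene. It uses the standard genetic code and ignores alternative start codons, which real translation code may need to handle.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence into a protein sequence.

    Hypothetical helper: unknown codons become 'X', an incomplete trailing
    codon is ignored, and trailing stop codons are dropped.
    """
    dna = dna.upper()
    usable_length = len(dna) - len(dna) % 3
    protein = [
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable_length, 3)
    ]
    return "".join(protein).rstrip("*")


def family_proteins(representatives: dict[str, str]) -> dict[str, str]:
    """Map each family name to the translated sequence of its representative gene."""
    return {family: translate(dna) for family, dna in representatives.items()}


# Toy input: one family whose representative gene is ATG GCC AAA TAA.
proteins = family_proteins({"famA": "ATGGCCAAATAA"})
print(proteins)
```

With a protein sequence attached to every family, outputs that depend on family sequences (such as the RGP borders FASTA) have something to write instead of bare IDs.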

ppanggolin/formats/writeFlatPangenome.py

ppanggolin/workflow/all.py

…nto ReadClustSeq

jpjarnoux added 2 commits June 10, 2024 23:05

Add the protein sequence to gene family when reading clustering

6ddc973

Add a warning message if gene families are without sequences

9e62cd2

JeanMainguy self-requested a review June 11, 2024 13:09

JeanMainguy reviewed Jun 11, 2024

ppanggolin/formats/writeFlatPangenome.py (outdated, resolved)

ppanggolin/workflow/all.py (outdated, resolved)

Translate representative gene when read clustering everytime

96653a9

Base automatically changed from subprocessErr to dev June 11, 2024 16:19

jpjarnoux and others added 5 commits June 11, 2024 18:26

Merge branch 'dev' into ReadClustSeq

283428e

Merge branch 'ReadClustSeq' of https://github.com/labgem/PPanGGOLiN i…

bf44030

…nto ReadClustSeq

get_genes return one gene if begin==end

b9a542c

remove deprecated option

edb09e6

fix remaining deprecated args

ed629cf

jpjarnoux merged commit d9563a5 into dev Jun 12, 2024
4 checks passed

jpjarnoux deleted the ReadClustSeq branch June 12, 2024 06:48

jpjarnoux commented Jun 10, 2024 • edited by JeanMainguy