Add the protein sequence to gene family when reading clustering #238

Merged
merged 8 commits into from
Jun 12, 2024

jpjarnoux
Member

@jpjarnoux jpjarnoux commented Jun 10, 2024

When external clustering was provided, gene family sequences were missing from the pangenome file. As a result, some output files could not be generated. In particular, we noticed that the FASTA file of RGP borders (created by ppanggolin write_pangenome --borders) contained only sequence IDs, without the actual sequences.

This PR adds functionality to translate the representative gene of each gene family during the cluster step, so that its protein sequence can be written to the HDF5 pangenome file.
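
To illustrate the idea, here is a minimal sketch of translating each family's representative gene into a protein sequence. It is not the code from this PR: the names `translate_cds` and `family_proteins`, and the two input dictionaries, are hypothetical. It assumes the standard codon table and ignores alternative bacterial start codons (for example, a leading GTG or TTG is not converted to M).

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(dna: str) -> str:
    """Translate a coding sequence into a protein (hypothetical helper)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    # Unknown codons (e.g. containing N) are rendered as 'X'.
    protein = [CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3)]
    # Drop the trailing stop codon, if any.
    if protein and protein[-1] == "*":
        protein.pop()
    return "".join(protein)


def family_proteins(representatives: dict[str, str],
                    gene_dna: dict[str, str]) -> dict[str, str]:
    """Map each family ID to the translated sequence of its representative gene.

    `representatives` maps family ID -> representative gene ID and
    `gene_dna` maps gene ID -> nucleotide sequence (both assumed inputs).
    """
    return {
        family: translate_cds(gene_dna[gene_id])
        for family, gene_id in representatives.items()
    }
```

For example, `family_proteins({"fam1": "geneA"}, {"geneA": "ATGGCCAAATAA"})` yields one protein per family, which could then be stored alongside the family in the pangenome file.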

This change should resolve the issue of missing sequence outputs.

@JeanMainguy JeanMainguy self-requested a review June 11, 2024 13:09
ppanggolin/formats/writeFlatPangenome.py (outdated, resolved)
ppanggolin/workflow/all.py (outdated, resolved)
Base automatically changed from subprocessErr to dev June 11, 2024 16:19
@jpjarnoux jpjarnoux merged commit d9563a5 into dev Jun 12, 2024
4 checks passed
@jpjarnoux jpjarnoux deleted the ReadClustSeq branch June 12, 2024 06:48

2 participants