training needs to shorten contig names for snap/augustus training #2

@hyphaltip

The train step should probably shorten the contig names automatically: even with the header length set to 20, long names still cause a problem. Either add a test that arrests training when the names are too long or, better, just shrink the names, since the sequence IDs are presumably not relevant internally for training?

To reproduce the problem, run the training step on this genome, which has long accessions as sequence names: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/030/734/395/GCA_030734395.1_ASM3073439v1/GCA_030734395.1_ASM3073439v1_genomic.fna.gz
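
For illustration, here is a minimal sketch of the "just shrink the names" idea, not the tool's actual code. It rewrites FASTA headers as short sequential IDs and returns a map back to the original names. The function name `shorten_fasta_ids`, the `ctg` prefix, and the `max_len=16` default are all hypothetical; the real length limit that snap/augustus training tolerates would need to be confirmed.

```python
import io


def shorten_fasta_ids(in_handle, out_handle, prefix="ctg", max_len=16):
    """Copy a FASTA stream, replacing each header with a short sequential ID.

    Returns a dict {new_id: old_id} so results can be mapped back later.
    `prefix` and `max_len` are illustrative assumptions, not known tool limits.
    """
    mapping = {}
    for line in in_handle:
        if line.startswith(">"):
            # The original ID is the first whitespace-delimited token.
            fields = line[1:].split()
            old_id = fields[0] if fields else ""
            new_id = f"{prefix}{len(mapping) + 1}"
            if len(new_id) > max_len:
                raise ValueError(f"generated ID {new_id!r} exceeds {max_len} chars")
            mapping[new_id] = old_id
            out_handle.write(f">{new_id}\n")
        else:
            # Sequence lines are passed through unchanged.
            out_handle.write(line)
    return mapping


if __name__ == "__main__":
    # Made-up contig names, used only to exercise the function.
    src = io.StringIO(
        ">a_very_long_contig_name_number_one some description\nACGT\n"
        ">a_very_long_contig_name_number_two\nGGCC\n"
    )
    dst = io.StringIO()
    id_map = shorten_fasta_ids(src, dst)
    print(dst.getvalue(), end="")
    print(id_map)
```

Because training only needs the renamed copy, the mapping could be discarded afterwards, or kept if any intermediate files need to be traced back to the original contigs.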

Labels: bug, enhancement
