The train step should probably shorten contig names automatically: it seems that even with the header length set to 20, long names still cause a problem. Could a check be added that stops training early when the names are too long, or, better, could the names simply be shortened internally, since the sequence IDs are not relevant for training?
To reproduce the problem: run the training step on this genome, which uses long accessions as sequence names: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/030/734/395/GCA_030734395.1_ASM3073439v1/GCA_030734395.1_ASM3073439v1_genomic.fna.gz
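
As a rough illustration of the "just shrink the names" idea, here is a minimal Python sketch of what such a renaming step could look like. It is not taken from the tool's code base: the function names `shorten_headers` and `shorten_fasta`, the `contig_` prefix, and the output path are all hypothetical. It replaces every FASTA header with a short sequential ID and keeps a mapping so the original names can be restored afterwards.

```python
import gzip


def shorten_headers(lines, mapping, prefix="contig_"):
    """Yield FASTA lines with each header replaced by a short sequential ID.

    `mapping` is filled in place as short ID -> original ID (the first
    whitespace-delimited token of the original header).
    """
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            fields = line[1:].split()
            original = fields[0] if fields else ""
            short = f"{prefix}{count}"
            mapping[short] = original
            yield f">{short}\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


def shorten_fasta(in_path, out_path, prefix="contig_"):
    """Write a copy of a (optionally gzipped) FASTA file with short headers."""
    opener = gzip.open if in_path.endswith(".gz") else open
    mapping = {}
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(shorten_headers(src, mapping, prefix))
    return mapping
```

Something like `mapping = shorten_fasta("GCA_030734395.1_ASM3073439v1_genomic.fna.gz", "short_names.fna")` could be run before training as a workaround, with `mapping` used to translate IDs back in any downstream output.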