Add detection of pseudo genes #4
Comments
idea for workflow:
* Add initial pseudogene detection
* Fix type hints for Python 3.8
* Add pseudogene prediction tests
* Refactor pseudogene prediction functions
* Add pseudogene metrics always to the Genome Annotation Summary
* Reanable preliminary pseudogene-candidate cutoffs
* Set alignment length based on protein level as default
* Add minor improvements
* Add missing docstrings
* Refactor --pseudo argument to --skip-pseudo
* Add unused loss of stop codon test, comment loss of start codon
* Add unused 'loss of stop codon' case, comment 'loss of start codon'
* Refactor missed '--pseudo' argument to '--skip-pseudo'
* Refactor pseudogene constants
* Refactor unused code and inconsistencies
* Add test for get_elongated_cds()
* Fix get_elongated_cds() fill calculation
* Refactor constant variables
* Refactor subprocess.run(cmd)
* Refactor pseudogene variable names
* Add pseudogene count to annotation summary
* Change: drop unused pseudogene causes
* Refactor cause variable to causes
Hi there,
Hi @amvarani, thanks for reaching out on this. We're actively working on a first pseudogene detection/annotation feature as a default step in the Bakta workflow. We're currently fixing the last little things and look forward to releasing a new version very soon (i.e. within the next few weeks). There are different strategies for detecting pseudogenes; the most promising would be to use a closely related genome. However, as such a genome is often not available in a de novo assembly/annotation workflow, we address this without external genome information. I hope this answers your question. Best regards!
* Fix pseudogene position calculation #4
* Refactor pseudogene logging #4
* Change pseudogene tests to use new positioning logic
* Rewrite pseudogene positioning logic
* Add pseudogene stop codon point mutation cause
* Add pseudogene stop codon point mutation cause test
* Fix type hint TypeError
* Refactor variable names
* Refactor missed variable names
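One of the causes named in the commits above is a stop codon introduced by a point mutation. As a rough illustration only (this is not Bakta's implementation, and the function name `find_internal_stops` is hypothetical), a minimal sketch of spotting in-frame stop codons that occur before the final codon of a coding sequence might look like this, assuming the three standard stop codons and a sequence that starts in frame:

```python
from typing import List

# Assumption: the three standard stop codons; alternative genetic codes are not handled.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_internal_stops(cds: str) -> List[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons
    that occur before the last complete codon of the sequence."""
    seq = cds.upper()
    # Ignore trailing nucleotides that do not form a complete codon.
    usable = len(seq) - len(seq) % 3
    # The last complete codon is the expected terminal stop, so it is skipped.
    last_codon_start = usable - 3
    return [
        i
        for i in range(0, last_codon_start, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    # ATG AAA TAG CCC TAA: the TAG at offset 6 is a premature stop.
    print(find_internal_stops("ATGAAATAGCCCTAA"))
```

A non-empty result only flags a candidate. A real workflow would still need to compare the truncated product against a reference protein, as the thread describes, before calling the gene a pseudogene.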
First hints for ideas: