[WIP] Annotation File Support #67
base: master
Hello @AbhinavChede, this is wonderful work! I can see that you have carefully researched the formats of the example files and written the regexes deliberately. I can only comment on the Greengenes file:
Hello @pavia27, please work with Abhinav to sort out the KEGG file format. It may be helpful if we have a small sample set of files placed under
@pavia27 any thoughts? Thanks!
Hi @qiyunzhu,
I have added functionality to BinaRena for reading annotation files. This is only a preliminary version. I think it gets the job done, but it may not be the most efficient approach in terms of run time. I am also not sure whether this is how you imagined the code should check if the current file is an annotation file. See here. I also need to find a better regex for the Greengenes file to account for exceptions in the order of the taxa. For now, it recognizes most of the taxa in the test cases.
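One possible way to make the Greengenes parsing independent of the order of the taxa is to match each rank-tagged field on its own rather than the whole lineage at once. The snippet below is only a minimal sketch in Python, not the code in this PR. It assumes the usual Greengenes lineage style with rank prefixes (`k__`, `p__`, `c__`, `o__`, `f__`, `g__`, `s__`) separated by semicolons; the names `RANK_FIELD`, `RANK_NAMES`, `parse_lineage` and `looks_like_lineage` are made up for illustration.

```python
import re

# Matches one rank-tagged field such as "p__Proteobacteria".
# The rank letter is captured separately, so fields may appear in any order.
RANK_FIELD = re.compile(r"\b([kpcofgs])__([^;]*)")

# Hypothetical mapping from Greengenes rank prefix to a readable rank name.
RANK_NAMES = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_lineage(lineage):
    """Return a {rank name: taxon} dict, skipping ranks with an empty name."""
    ranks = {}
    for code, name in RANK_FIELD.findall(lineage):
        name = name.strip()
        if name:  # a bare "s__" means the rank is unassigned
            ranks[RANK_NAMES[code]] = name
    return ranks


def looks_like_lineage(text):
    """Rough check that a value contains at least one rank-tagged field."""
    return RANK_FIELD.search(text) is not None
```

For example, `parse_lineage("g__Escherichia; k__Bacteria; s__")` would give the genus and the kingdom and ignore the empty species field, regardless of the order in which the fields appear. A check like `looks_like_lineage` could also serve as one signal when deciding whether a file is an annotation file, though that is a design choice for the maintainers.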
Also, @pavia27, is this how you imagined the KEGG support should work? Right now, the code reads the KEGG annotation file and outputs which genes each contig has. It does not indicate the position of each gene, nor does it show the missing genes. I only tested the KEGG support with a very small dataset because I lack proper test samples. Let me know what you think and whether there is anything I should add.
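To make the question concrete, the behavior described above (listing which genes each contig has, without positions or missing genes) could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code in this PR: it assumes a two-column, tab-separated file with a gene ID followed by a KO identifier, and it assumes gene IDs follow a `<contig>_<index>` naming pattern. Neither assumption has been confirmed for the files in this project, and the function name `kos_per_contig` is hypothetical.

```python
from collections import defaultdict


def kos_per_contig(lines):
    """Group KO identifiers by contig.

    Assumes each line is "<gene_id>\t<KO>" and that the gene ID is
    "<contig>_<index>". Lines without a KO are skipped.
    """
    result = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1]:
            continue  # gene has no KO assignment
        gene_id, ko = fields[0], fields[1]
        # Split off the trailing gene index to recover the contig name.
        contig, sep, _ = gene_id.rpartition("_")
        if not sep:
            continue  # gene ID does not follow the assumed pattern
        result[contig].add(ko)
    return dict(result)
```

Passing an open file handle or a list of lines would return a dictionary mapping each contig to the set of KOs found on it. Showing positions or missing genes would need additional inputs (gene coordinates, a reference list of expected KOs) that this sketch does not cover.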