Conversion between GenBank and SBOL3 #183
Comments
Hi @jakebeal, I'm Gonzalo Vidal, a PhD candidate in biological and medical engineering from Chile. I have 3 years of experience in Python and 1 in SBOL. I would like to contribute to this project for GSoC 2022. Any guidance on where to begin and where I can learn Biopython would be encouraging and helpful.
Hi, @Gonza10V : I'd be happy to supervise you on this project. If you want to get started playing with biopython, I would suggest:
|
@ArchitJain1201 also expressed interest in this project. I sent the following background information in response to an email from @ArchitJain1201 requesting suggestions for where to begin. I am posting it here so others can clarify, elaborate, or correct this response, as well as for the benefit of others who might be interested in working on this task. My reply:

See https://github.com/SynBioDex/SBOL-utilities

That repository is a collection of utility programs for SBOL, particularly SBOL3. In the file sbol_utilities/conversion.py you will find two functions: convert_from_genbank and convert_to_genbank. convert_to_genbank currently works by converting SBOL3 files to SBOL2 files, then uploading the files to an online SBOL2-to-GenBank converter. convert_from_genbank goes the opposite way, converting GenBank to SBOL2 and then SBOL2 to SBOL3. It's a lossy process in both directions.

What is desired in a conversion between GenBank and SBOL3 is a more direct conversion, and one written entirely in Python so that it can be run locally, without the need for an online converter, and without the need to convert to/from SBOL2.

As I understand it, GenBank is a very loose format. I don't think there is a specification, or if there is, it is minimal. I might be wrong about that.

There are sample SBOL files, for both SBOL2 and SBOL3, in https://github.com/SynBioDex/SBOLTestSuite. You could try those out. The online converter can be found at https://validator.sbolstandard.org/validate/

If you plan to work on this, it would be a good idea to open an issue on SBOL-utilities for it so that you can ask questions, get answers, and so forth. That will also prevent duplication of effort. Please let us know via a GitHub issue if you need additional assistance: https://github.com/SynBioDex/SBOL-utilities/issues

I'm not the best person to answer all the questions for this task. There are others who monitor the issues there who will have additional information.
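To make the idea of a direct, local conversion a bit more concrete, here is a minimal sketch of the data that has to move from a GenBank flat file into an SBOL3-shaped structure. It deliberately uses only the Python standard library, so it is not the BioPython/pySBOL3 implementation the project calls for. The names `Feature`, `Record`, `parse_genbank`, and `to_sbol3_like` are hypothetical, as are the dictionary keys, and the parser assumes simple `start..end` and `complement(start..end)` locations only.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One FEATURES-table entry (hypothetical helper type)."""
    kind: str      # GenBank feature key, e.g. "CDS" or "promoter"
    start: int     # 1-based inclusive start, as written in the file
    end: int       # 1-based inclusive end
    reverse: bool  # True when the location is wrapped in complement(...)


@dataclass
class Record:
    """The parts of a GenBank record this sketch keeps (hypothetical)."""
    name: str
    sequence: str = ""
    features: list = field(default_factory=list)


# Assumption: only plain ranges and complemented ranges are recognized.
LOCATION = re.compile(r"^(complement\()?(\d+)\.\.(\d+)\)?$")


def parse_genbank(text: str) -> Record:
    """Read LOCUS, FEATURES and ORIGIN from a single-record flat file."""
    record = Record(name="")
    section = None
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            record.name = line.split()[1]
        elif line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "origin"
        elif line.startswith("//"):
            break  # end-of-record marker
        elif section == "features" and not line.startswith(" " * 21):
            # Feature-key lines are indented less than qualifier lines,
            # which start at column 22; qualifiers are ignored here.
            parts = line.split()
            if len(parts) != 2:
                continue
            match = LOCATION.match(parts[1])
            if match:  # joins, fuzzy ends, etc. are skipped in this sketch
                record.features.append(
                    Feature(parts[0], int(match[2]), int(match[3]), bool(match[1]))
                )
        elif section == "origin":
            # Drop the leading base counter and the spaces between blocks.
            record.sequence += "".join(line.split()[1:])
    return record


def to_sbol3_like(record: Record, namespace: str) -> dict:
    """Arrange the record as a Sequence plus a Component that refers to it."""
    sequence_uri = f"{namespace}/{record.name}_sequence"
    return {
        "Sequence": {"identity": sequence_uri, "elements": record.sequence},
        "Component": {
            "identity": f"{namespace}/{record.name}",
            "sequences": [sequence_uri],
            "features": [
                {
                    "type": f.kind,
                    "start": f.start,
                    "end": f.end,
                    "orientation": "reverseComplement" if f.reverse else "inline",
                }
                for f in record.features
            ],
        },
    }


SAMPLE = """\
LOCUS       demo_part                 12 bp    DNA     linear   SYN 01-JAN-2022
FEATURES             Location/Qualifiers
     promoter        1..3
                     /label="p1"
     CDS             complement(4..12)
ORIGIN
        1 atgaaacgtt aa
//
"""

if __name__ == "__main__":
    print(to_sbol3_like(parse_genbank(SAMPLE), "https://example.org"))
```

In a real implementation, BioPython would take over the parsing and pySBOL3 would supply the actual Component, Sequence, and feature objects. The sketch also drops qualifiers such as `/label`, which is exactly the kind of information a direct converter would need to carry across to avoid being lossy.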
Hi @jakebeal @tcmitchell @cjmyers @bbartley, I took a genetics course at college and did a project using some ML libraries, Biopython, Py3Dmol, and nglview, which you can find here. I'll start studying the resources you attached above about SBOL (the SBOL tutorial material on the data model and Python library that was presented at IWBDA 2021) so that I can start working on this project for GSoC 2022. Thanks for your time.
NRNB has officially been accepted as a mentoring organization for GSoC 2022! Here are some useful links: |
Hi @tcmitchell @jakebeal @bbartley @cjmyers, I have read the SBOL tutorial material on the data model and Python library that was presented at IWBDA 2021, and I now have a good understanding of SBOL, the SBOL data model, SBOL composition, and the differences between SBOL, FASTA, and GenBank. I have also watched some videos from the IWBDA 2021 playlist, and I opened the repo and read through the code of the important files. Finally, it's great that NRNB has officially been accepted. I'll start working on my proposal for this project as soon as possible. Could you tell me what the next step is? Thanks for your time.
Hi @ahmedtarek26, thanks for your interest! Here are some links that should help you with next steps:
We are happy to answer any questions that you might have while you develop your proposal/application. Please post those here so we can maintain a level playing field for all potential contributors. Thanks! |
Hi @tcmitchell, |
Here are some links from the GSoC Mentors mailing list that might be generally helpful to all who are interested in this project: |
A reminder that the application period opens on Monday April 4. Proposals to NRNB must be submitted on the official GSoC Site (https://summerofcode.withgoogle.com/) before April 19, 18:00 UTC to be considered, and contributors are encouraged to submit proposals in draft format early, so that mentors can give feedback directly at the GSoC site. |
IMPORTANT REMINDER: GSoC 2022 is for “beginners” who are new to open source. Applicants are expected to review the eligibility requirements before applying. We cannot accept applications from contributors with prior open source development experience. From the GSoC FAQ https://developers.google.com/open-source/gsoc/faq:
|
Closing because this is an active project for GSoC 2022. |
Background
SBOL3 can currently be converted to GenBank only by first being downconverted to SBOL2, and vice versa. We would like to be able to convert directly between the two formats. This would be implemented as part of sbol-utilities, using BioPython and pySBOL3.
Goal
Equivalent conversion of a set of test GenBank files.
Difficulty Level: Easy
There is a well-defined and existing two-step conversion, and the project just needs to build an equivalent direct conversion.
Size and Length of Project
Skills
Essential skills: Python
Will be learned if not known: SBOL, BioPython
Public Repository
https://github.com/SynBioDex/SBOL-utilities
Potential Mentors
jakebeal@ieee.org, tom.mitchell@raytheon.com, Bryan.A.Bartley@raytheon.com, Chris.Myers@colorado.edu