Conversion between GenBank and SBOL3 #183

Closed
jakebeal opened this issue Oct 15, 2021 · 12 comments

Comments

@jakebeal

jakebeal commented Oct 15, 2021

Background

SBOL3 can currently be converted to GenBank only by first being downconverted to SBOL2, and vice versa. We would like to be able to convert directly between the two formats. This would be implemented as part of SBOL-utilities, using BioPython and pySBOL3.

Goal

Equivalent conversion of a set of test GenBank files.

Difficulty Level: Easy

There is a well-defined and existing two-step conversion, and the project just needs to build an equivalent direct conversion.
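One way to read "equivalent" is that the direct converter should give the same result as the existing two-step route on every test file. The sketch below shows that check under stated assumptions: the `Converter` signature, the stub converters, and the sample names are all hypothetical, and the output is assumed to be a line-oriented serialization (such as N-Triples) so that sorting lines is a fair normalization. Since the two-step route is lossy, a real comparison may need to tolerate cases where the direct route keeps more information.

```python
from typing import Callable, Dict, List

# Hypothetical signature: a converter takes GenBank text and returns
# serialized SBOL3 text (assumed line-oriented, e.g. N-Triples).
Converter = Callable[[str], str]


def normalize(serialized: str) -> List[str]:
    """Drop blank lines and sort the rest so line order does not matter."""
    return sorted(line.strip() for line in serialized.splitlines() if line.strip())


def find_mismatches(samples: Dict[str, str],
                    two_step: Converter,
                    direct: Converter) -> List[str]:
    """Return the names of the samples on which the two converters disagree."""
    return [name for name, text in samples.items()
            if normalize(two_step(text)) != normalize(direct(text))]


# Stand-in converters so the sketch runs without any SBOL tooling.
def stub_two_step(text: str) -> str:
    return text.upper()


def stub_direct(text: str) -> str:
    # Same lines as stub_two_step, emitted in reverse order.
    return "\n".join(reversed(text.upper().splitlines()))


samples = {"first.gb": "a\nb\nc", "second.gb": "d\n\ne"}
print(find_mismatches(samples, stub_two_step, stub_direct))
```

The two stubs agree up to line order, so the printed list is empty. In practice the stubs would be replaced by the existing two-step conversion and the new direct one, and `samples` by the contents of the test GenBank files.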

Size and Length of Project

  • 350 hours
  • 12 week project, timeline is flexible.

Skills

Essential skills: Python
Will be learned if not known: SBOL, BioPython

Public Repository

https://github.com/SynBioDex/SBOL-utilities

Potential Mentors

jakebeal@ieee.org, tom.mitchell@raytheon.com, Bryan.A.Bartley@raytheon.com, Chris.Myers@colorado.edu

@Gonza10V

Gonza10V commented Jan 4, 2022

Hi @jakebeal, I'm Gonzalo Vidal, a PhD candidate in biological and medical engineering from Chile. I have 3 years of experience in Python and 1 in SBOL. I would like to contribute to this project for GSoC 2022; any guidance on where to begin and where I can learn Biopython would be encouraging and helpful.

@jakebeal
Author

jakebeal commented Jan 4, 2022

Hi, @Gonza10V : I'd be happy to supervise you on this project. If you want to get started playing with biopython, I would suggest:

  1. Looking at how it's already used in SBOL-utilities, and
  2. Spending some time with the BioPython Cookbook

@tcmitchell

@ArchitJain1201 also expressed interest in this project. I sent the following background information in response to an email from @ArchitJain1201 requesting suggestions for where to begin. I am posting it here so others can clarify, elaborate, or correct this response, as well as for the benefit of others who might be interested in working on this task.

My reply:

See https://github.com/SynBioDex/SBOL-utilities

That repository is a collection of utility programs for SBOL, particularly SBOL3.

In the file sbol_utilities/conversion.py you will find two functions: convert_from_genbank and convert_to_genbank.

convert_to_genbank currently works by converting SBOL3 files to SBOL2 files, then uploading the files to an online SBOL2-to-GenBank converter. convert_from_genbank goes the opposite way, converting GenBank to SBOL2 and then SBOL2 to SBOL3. It's a lossy process in both directions.

What is desired in a conversion between GenBank and SBOL3 is a more direct conversion, and one entirely written in Python so that it can be run locally, without the need for an online converter, and without the need to convert to/from SBOL2.
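One detail a direct converter has to get right is feature coordinates. BioPython exposes feature locations as 0-based, end-exclusive positions (like Python slices), while an SBOL3 Range uses 1-based, inclusive start and end. A minimal sketch of that mapping follows; `SbolRange` and both helper functions are hypothetical stand-ins written for illustration, not part of pySBOL3 or SBOL-utilities.

```python
from typing import NamedTuple, Tuple


class SbolRange(NamedTuple):
    """Hypothetical stand-in for an SBOL3 Range: 1-based, end inclusive."""
    start: int
    end: int


def to_sbol_range(py_start: int, py_end: int) -> SbolRange:
    """Convert a 0-based, end-exclusive span to a 1-based, inclusive range."""
    if not 0 <= py_start < py_end:
        raise ValueError(f"invalid span: [{py_start}, {py_end})")
    # Only the start shifts: the exclusive end equals the inclusive 1-based end.
    return SbolRange(py_start + 1, py_end)


def from_sbol_range(sbol_range: SbolRange) -> Tuple[int, int]:
    """Convert a 1-based, inclusive range back to a 0-based, end-exclusive span."""
    if not 1 <= sbol_range.start <= sbol_range.end:
        raise ValueError(f"invalid range: {sbol_range}")
    return (sbol_range.start - 1, sbol_range.end)


# A GenBank feature written as 1..3 covers the first three bases.
sequence = "ATGCCC"
span = (0, 3)                       # how a 0-based, end-exclusive API reports it
sbol_range = to_sbol_range(*span)   # start=1, end=3
assert sequence[span[0]:span[1]] == "ATG"
assert from_sbol_range(sbol_range) == span
```

Keeping the conversion in one pair of functions makes the off-by-one handling easy to test in both directions.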

As I understand it, GenBank is a very loose format. I don't think there is a specification, or if there is, it is minimal. I might be wrong about that.

There are sample SBOL files, for both SBOL2 and SBOL3, in https://github.com/SynBioDex/SBOLTestSuite. You could try those out. The online converter can be found at https://validator.sbolstandard.org/validate/

If you plan to work on this it would be a good idea to open an issue on SBOL-utilities for it so that you can ask questions, get answers, and so forth. That will also prevent duplication of effort.

Please let us know via a GitHub issue if you need additional assistance. https://github.com/SynBioDex/SBOL-utilities/issues

I'm not the best person to answer all the questions for this task. There are others who monitor the issues there that will have additional information.

@ahmedtarek26

Hi @jakebeal @tcmitchell @cjmyers @bbartley
I am Ahmed Tarek, a 3rd-year undergraduate student in medical informatics. I have two years of experience with Python.
I am interested in machine learning and deep learning, so I joined Neuromatch Academy as an interactive student, where we used PyTorch.
I am working as a research assistant on an NLP research paper, and we expect to publish our work soon.

I took a Genetics course at college and did a project using some ML libraries, Biopython, Py3Dmol, and nglview, which you can find here.
In that project I used Biopython to read FASTA files, transcribe and translate the sequences, and then analyze the protein sequences and compare the genes.
I used the PDB ID of each gene to visualize it with Py3Dmol and nglview.

I'll start studying from the resources you attached above about SBOL (the SBOL tutorial material on the data model and Python library that was presented at IWBDA 2021) to start working on this project for GSOC 22.

Thanks for your time

@khanspers
Contributor

NRNB has officially been accepted as a mentoring organization for GSoC 2022! Here are some useful links:

@ahmedtarek26

Hi @tcmitchell @jakebeal @bbartley @cjmyers,

I have read the SBOL tutorial material on the data model and Python library that was presented at IWBDA 2021, and I now have a good understanding of SBOL, the SBOL data model, what SBOL composition is, and the differences between SBOL, FASTA, and GenBank.

Also, I have watched some videos from this playlist IWBDA 2021.

I have gone through the repo and read the code in the important files.

Finally, it's great that NRNB has officially been accepted. I'll start working on my proposal for this project as soon as possible.

Could you tell me what the next step is?

Thanks for your time.

@tcmitchell

Hi @ahmedtarek26, thanks for your interest! Here are some links that should help you with next steps:

We are happy to answer any questions that you might have while you develop your proposal/application. Please post those here so we can maintain a level playing field for all potential contributors.

Thanks!

@ahmedtarek26

Hi @tcmitchell,
I'm working on the proposal and still have some details to add, but I have a lot of college work these days, so I'll continue with the proposal soon. I hope to share a draft via email next Thursday, if possible.
Thanks for your time and help

@tcmitchell

Here are some links from the GSoC Mentors mailing list that might be generally helpful to all who are interested in this project:

@khanspers
Contributor

A reminder that the application period opens on Monday April 4. Proposals to NRNB must be submitted on the official GSoC Site (https://summerofcode.withgoogle.com/) before April 19, 18:00 UTC to be considered, and contributors are encouraged to submit proposals in draft format early, so that mentors can give feedback directly at the GSoC site.

@AlexanderPico
Member

IMPORTANT REMINDER: GSoC 2022 is for new “beginners” to open source.

Applicants are expected to review eligibility requirements prior to applying. We cannot accept applications from contributors with prior open source development experience. From the GSoC FAQ https://developers.google.com/open-source/gsoc/faq:

Can someone already participating in open source be a GSoC Contributor?

The goal of GSoC is to bring new contributors into open source organizations. GSoC can also help beginner contributors learn the ins and outs of open source while being mentored by experienced community members.
GSoC is for new and beginner contributors to open source, it is not for experienced contributors to open source.

@khanspers
Contributor

Closing because this is an active project for GSoC 2022.


8 participants