Import and export FASTA format for RNA, DNA and peptides #1755
Background
This task covers import/export of sequences with defined monomers.
In FASTA, these sequences are represented as a header, comments and a plain string.
The plain string is a combination of the following symbols:
For peptides:
A - Alanine
C - Cysteine
D - Aspartic acid
E - Glutamic acid
F - Phenylalanine
G - Glycine
H - Histidine
I - Isoleucine
K - Lysine
L - Leucine
M - Methionine
N - Asparagine
O - Pyrrolysine
P - Proline
Q - Glutamine
R - Arginine
S - Serine
T - Threonine
U - Selenocysteine
V - Valine
W - Tryptophan
Y - Tyrosine
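The peptide alphabet above maps directly onto a lookup table. The following is a minimal sketch, assuming a sequence is decoded one character at a time and that lookups are case-sensitive. `PEPTIDE_CODES` and `peptide_residue_names` are hypothetical names used for illustration, not part of Indigo.

```python
# Hypothetical lookup table built from the one-letter peptide codes listed above.
PEPTIDE_CODES = {
    "A": "Alanine", "C": "Cysteine", "D": "Aspartic acid", "E": "Glutamic acid",
    "F": "Phenylalanine", "G": "Glycine", "H": "Histidine", "I": "Isoleucine",
    "K": "Lysine", "L": "Leucine", "M": "Methionine", "N": "Asparagine",
    "O": "Pyrrolysine", "P": "Proline", "Q": "Glutamine", "R": "Arginine",
    "S": "Serine", "T": "Threonine", "U": "Selenocysteine", "V": "Valine",
    "W": "Tryptophan", "Y": "Tyrosine",
}


def peptide_residue_names(sequence: str) -> list[str]:
    """Translate a one-letter peptide string into residue names.

    Raises ValueError on any symbol outside the table; the special
    symbols '*' and '-' are deliberately not handled in this sketch.
    """
    names = []
    for position, symbol in enumerate(sequence, start=1):
        try:
            names.append(PEPTIDE_CODES[symbol])
        except KeyError:
            raise ValueError(
                f"unknown peptide symbol {symbol!r} at position {position}"
            ) from None
    return names


print(peptide_residue_names("MAG"))  # ['Methionine', 'Alanine', 'Glycine']
```

Reporting the 1-based position of the first unknown symbol keeps import errors actionable for long sequences.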
For RNA nucleotides:
A - AMP (Adenosine monophosphate)
C - CMP (Cytidine monophosphate)
G - GMP (Guanosine monophosphate)
U - UMP (Uridine monophosphate)
T - rTMP (Ribothymidine monophosphate)
For DNA nucleotides:
A - dAMP (Deoxyadenosine monophosphate)
C - dCMP (Deoxycytidine monophosphate)
G - dGMP (Deoxyguanosine monophosphate)
U - dUMP (Deoxyuridine monophosphate)
T - TMP (Thymidine monophosphate)
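The same letter resolves to a different monomer depending on whether the sequence is loaded as RNA or DNA. For example, "T" is rTMP in RNA but TMP in DNA, so a reader must carry the sequence type alongside each symbol. The following is a minimal sketch, assuming the sequence types are spelled "RNA" and "DNA". `NUCLEOTIDE_CODES` and `resolve_nucleotides` are hypothetical names.

```python
# Hypothetical tables built from the RNA and DNA symbol lists above.
NUCLEOTIDE_CODES = {
    "RNA": {"A": "AMP", "C": "CMP", "G": "GMP", "U": "UMP", "T": "rTMP"},
    "DNA": {"A": "dAMP", "C": "dCMP", "G": "dGMP", "U": "dUMP", "T": "TMP"},
}


def resolve_nucleotides(sequence: str, sequence_type: str) -> list[str]:
    """Map each nucleotide letter to its monomer for the given sequence type."""
    try:
        table = NUCLEOTIDE_CODES[sequence_type]
    except KeyError:
        raise ValueError(f"unsupported sequence type {sequence_type!r}") from None
    # Collect every offending symbol at once instead of failing on the first.
    unknown = sorted(set(sequence) - set(table))
    if unknown:
        raise ValueError(f"symbols not allowed in {sequence_type}: {unknown}")
    return [table[symbol] for symbol in sequence]


print(resolve_nucleotides("ACGT", "RNA"))  # ['AMP', 'CMP', 'GMP', 'rTMP']
print(resolve_nucleotides("ACGT", "DNA"))  # ['dAMP', 'dCMP', 'dGMP', 'TMP']
```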
Additional symbols:
* - translation stop
- - gap of indeterminate length
Requirements
Each sequence in FASTA format is expressed in two or more lines of text. The first line is an identifying header; the remaining lines (one or more) represent the sequence itself.
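Because the sequence may span several lines, an exporter has to choose a line width. The following is a minimal sketch of the export side, assuming a fixed width of 60 characters. That width is a common convention, not something this issue specifies, and `format_fasta_record` is a hypothetical helper.

```python
def format_fasta_record(header: str, sequence: str, width: int = 60) -> str:
    """Render one record: a '>' header line, then the sequence split into
    lines of at most `width` characters, each terminated by a newline."""
    lines = [">" + header]
    # Plain slicing keeps every line exactly `width` long (except the last).
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

Plain slicing is used rather than `textwrap.wrap`, which treats hyphens as break opportunities and could therefore break early at a "-" gap symbol. For example, `format_fasta_record("x", "AC-GU", width=2)` yields the header line followed by the lines `AC`, `-G` and `U`.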
The header line starts with a greater-than symbol (">") and ends with a newline. Allowed characters are "A" to "Z", "a" to "z", "0" to "9", "_", "-", ".", ",", ";", "|" and spaces.
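The header rule can be checked with a single regular expression. The following is a minimal sketch, assuming the rule applies to everything after the leading ">" and that an empty header is acceptable. `HEADER_RE` and `is_valid_header` are hypothetical names.

```python
import re

# Characters allowed after '>' per the requirement above: letters, digits,
# '_', '-', '.', ',', ';', '|' and spaces ('-' is escaped inside the class).
HEADER_RE = re.compile(r">[A-Za-z0-9_\-.,;| ]*")


def is_valid_header(line: str) -> bool:
    """Return True if `line` (without its trailing newline) is an allowed header."""
    return HEADER_RE.fullmatch(line) is not None


print(is_valid_header(">seq1|RNA sample, run 2"))  # True
print(is_valid_header(">seq#1"))  # False: '#' is not in the allowed set
```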
A comment line starts with a semicolon (";") and ends with a newline. It may contain any symbol, including ">".
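Taken together, these line rules let a reader classify each line by its first character. A ">" starts a new record, a ";" marks a comment whose body may itself contain ">", and anything else is appended to the current sequence. The following is a minimal sketch, assuming comments are simply collected and blank lines are ignored. `FastaRecord` and `parse_fasta` are hypothetical names, not Indigo's parser.

```python
from dataclasses import dataclass, field


@dataclass
class FastaRecord:
    header: str
    comments: list[str] = field(default_factory=list)
    sequence: str = ""


def parse_fasta(text: str) -> list[FastaRecord]:
    """Split FASTA text into records of header, comments and joined sequence."""
    records: list[FastaRecord] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines carry no information
        if line.startswith(">"):
            records.append(FastaRecord(header=line[1:]))
        elif not records:
            raise ValueError("FASTA data must start with a '>' header line")
        elif line.startswith(";"):
            # Only the first character decides the line kind, so a '>'
            # later in a comment does not start a new record.
            records[-1].comments.append(line[1:])
        else:
            # Sequence lines (one or more) are concatenated.
            records[-1].sequence += line
    return records
```

For example, parsing `">seq1\n;made by hand > test\nACGU\nACG\n"` gives one record whose sequence is `ACGUACG` and whose single comment is `made by hand > test`.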
Solution
Add language bindings for Python, Java and C#.
Python binding functions:
def loadFASTA(self, input_string: str, sequence_type: str): ...
def loadFASTAFromFile(self, input_file: str, sequence_type: str): ...
def FASTA(self, sequence_type: str): ...
def saveFASTA(self, output_file: str, sequence_type: str): ...
Add the following content types to WASM "loadMoleculeOrReaction" and Indigo service "convert" API:
chemical/x-rna-fasta, chemical/x-dna-fasta, chemical/x-peptide-fasta
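Each new content type identifies both the format and the sequence type, so a service-side dispatcher can derive the `sequence_type` argument from it. The following is a minimal sketch, assuming the sequence types are spelled "RNA", "DNA" and "PEPTIDE" (the issue does not fix these values). `FASTA_CONTENT_TYPES` and `sequence_type_for` are hypothetical names.

```python
# Hypothetical mapping from the content types listed above to a sequence_type value.
FASTA_CONTENT_TYPES = {
    "chemical/x-rna-fasta": "RNA",
    "chemical/x-dna-fasta": "DNA",
    "chemical/x-peptide-fasta": "PEPTIDE",
}


def sequence_type_for(content_type: str) -> str:
    """Return the sequence type for a FASTA content type."""
    # MIME type names are case-insensitive, so normalize before the lookup.
    key = content_type.strip().lower()
    try:
        return FASTA_CONTENT_TYPES[key]
    except KeyError:
        raise ValueError(f"not a FASTA content type: {content_type!r}") from None
```

Keeping the mapping in one table means the WASM entry point and the service "convert" API can share a single source of truth for the three new types.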