# Sugar Basic Tutorial

### BioBasket and BioSeq objects

Read sequences with the `read()` function. The format is detected automatically unless you specify it. Currently, sugar can read and write FASTA, Stockholm, and a custom JSON format; GenBank files can be read. `read()` also accepts glob expressions and web resources. Additionally, any file readable by Biopython can be read with the call `read(filename, fmt, tool='biopython')`. Calling `read()` without an argument returns example sequences.

In [1]:
from sugar import read
seqs = read()
print(seqs)

2 seqs in basket
AB047639  9678  ACCTGCCCCTAATAGGGGCGACACTCCGCCATGAATCACTCCCCTGTGA...  GC:58.26%
AB677533  9471  GCCCGCCCCCTGATGGGGGCGACACTCCGCCATGAATCACTCCCCTGTG...  GC:57.46%
  customize output with BioBasket.tostr() method

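The `GC` column in the printed summary presumably reports the share of G and C bases in each sequence. The following is a minimal pure-Python sketch of that calculation, not sugar's own code; the helper name `gc_percent` is hypothetical, and sugar may treat gaps or ambiguous bases differently.

```python
def gc_percent(seq):
    """Return the percentage of G and C characters in seq (hypothetical helper)."""
    seq = seq.upper()
    if not seq:
        # Avoid division by zero for empty input
        return 0.0
    return 100 * (seq.count('G') + seq.count('C')) / len(seq)


# First ten bases of the first example sequence shown above
print(f"{gc_percent('ACCTGCCCCT'):.2f}%")  # prints 70.00%
```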

A collection of sequences is stored in a `BioBasket` object. The `BioBasket` object behaves like a list in which each item is a single sequence, and it has some useful biomethods attached. Write sequences out with the `write()` method; the format can be detected automatically from the file extension. Each individual sequence is stored in a `BioSeq` object, which behaves like a `str`, again with some biomethods attached. Metadata is stored in the `meta` attribute.

In [2]:
print('Ids', seqs.ids)
seq1 = seqs[0]
seq2 = seqs.d['AB677533']  # Select sequence by id
print(f'First sequence {seq1.id} starts with {seq1[:10]}.')
print(f'\nMetadata:\n{seq1.meta}')
print('\nMetadata can be accessed with keys or as attributes:', seq1.meta.id, seq1.meta['id'])

Ids ['AB047639', 'AB677533']
First sequence AB047639 starts with ACCTGCCCCT.

Metadata:
      id: AB047639
    _fmt: genbank
_genbank: Attr(locus='AB047639, 9678, bp, RNA, linear, VRL, 12-NOV-2005', def...
features:
source   0+ 9_678  seqid=AB047639;_genbank=Attr(organism='Hepatitis C virus J...
   CDS 340+ 9_102  seqid=AB047639;_genbank=Attr(codon_start=1, product='polyp...

Metadata can be accessed with keys or as attributes: AB047639 AB047639

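To illustrate how one object can offer both key and attribute access, as `seq1.meta` does above, here is a small sketch of a dict subclass. The class name `AttrDict` is hypothetical, and this is not sugar's actual implementation, only one common way to get this behaviour.

```python
class AttrDict(dict):
    """Hypothetical dict whose keys can also be read and set as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as ordinary dict items
        self[name] = value


meta = AttrDict(id='AB047639', _fmt='genbank')
print(meta.id, meta['id'])  # prints AB047639 AB047639
```

Raising `AttributeError` for missing keys keeps `hasattr()` and `getattr()` with a default working as expected.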

There are shortcuts for the id and feature metadata: `seq1.id` and `seq1.fts`. Use the `get()` method to fetch the first feature of a given type, or the `select()` method to select all matching features, e.g. `seq1.fts.select('cds')`. The `__getitem__` method is overloaded, so you can write things like `seqs[:, 10:20]` or `seq1['cds']`. Translation can be done with the `translate()` method.

In [3]:
print(seqs[:, 'cds'].translate())

2 seqs in basket
AB047639  3033  MSTNPKPQRKTKRNTNRRPEDVKFPGGGQIVGGVYLLPRRGPRLGVRTTRKTSERSQPRG...
AB677533  3014  MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRG...
  customize output with BioBasket.tostr() method

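Two-part indexing such as `seqs[:, 10:20]` works because Python passes a tuple to `__getitem__`. The sketch below shows the idea with a hypothetical `MiniBasket` class holding plain strings; it is an illustration under that assumption and does not reproduce sugar's implementation or its feature-based indexing.

```python
class MiniBasket(list):
    """Hypothetical list of strings that accepts basket[rows, cols] indexing."""

    def __getitem__(self, index):
        if isinstance(index, tuple):
            rows, cols = index
            picked = super().__getitem__(rows)
            if isinstance(rows, slice):
                # Several sequences selected: apply cols to each of them
                return MiniBasket(seq[cols] for seq in picked)
            # A single sequence selected: apply cols to it directly
            return picked[cols]
        result = super().__getitem__(index)
        if isinstance(index, slice):
            # Keep the container type when slicing rows
            return MiniBasket(result)
        return result


basket = MiniBasket(['ACGTACGT', 'TTGGCCAA'])
print(basket[:, 2:5])  # prints ['GTA', 'GGC']
print(basket[0, :3])   # prints ACG
```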

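For readers curious what translation involves, the following sketch translates a nucleotide string with the standard genetic code using only the standard library. The function and constant names are hypothetical, the input is a made-up short sequence, and sugar's `translate()` offers its own options that are not modelled here.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G
BASES = 'TCAG'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {
    ''.join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    nt = nt.upper().replace('U', 'T')
    usable = len(nt) - len(nt) % 3  # ignore a trailing partial codon
    return ''.join(CODON_TABLE.get(nt[i:i + 3], 'X') for i in range(0, usable, 3))


# Made-up example sequence, ending in a stop codon
print(translate('ATGAGCACGAATCCTAAACCTCAAAGATAA'))  # prints MSTNPKPQR*
```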
Other things to discover:
* The `sugar.web` module provides a simple Entrez client.
* The `sugar.data` module provides scoring matrices and translation tables.
* The `sugar` command provides some useful CLI tools. Run the tests with `sugar test`.
* You can convert a `BioBasket` object into the equivalent Biopython object and vice versa. Support for other bio modules is planned.

In [4]:
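# Fetch a sequence by its accession via the Entrez client (needs network access)
from sugar.web import Entrez
client = Entrez()
seq = client.get_seq('AB677533')

# Load the BLOSUM62 scoring matrix
from sugar.data import submat
sm = submat('blosum62')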