![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

![filo_virion](https://user-images.githubusercontent.com/22747792/73687685-7111bc00-467f-11ea-906e-e16132529840.png)

# Python for Genomics 
## Section 4.1: SeqFeature Objects, Part 1 

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

### To recap, we've learned about:

* Seq objects
* SeqRecord objects, which hold:
    * Seq objects,
    * sequence name,
    * sequence description,
    * ID, etc.

SeqRecord objects also have an attribute called `features`.

In this first part on SeqFeatures, we'll cover the basics.

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)




Let's investigate:


```python
from Bio import SeqIO

ebola_gb = SeqIO.read('data/KM034562.gb', 'genbank')
ebola_gb
```

```
SeqRecord(seq=Seq('CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTA...GTC', IUPACAmbiguousDNA()), id='KM034562.1', name='KM034562', description='Zaire ebolavirus isolate Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3686.1, complete genome', dbxrefs=['BioProject:PRJNA257197', 'BioSample:SAMN02951978'])
```

```python
ebola_gb.features
```

```
[SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(18957), strand=1), type='source'),
 SeqFeature(FeatureLocation(ExactPosition(55), ExactPosition(3026), strand=1), type='gene'),
 SeqFeature(FeatureLocation(ExactPosition(55), ExactPosition(3026), strand=1), type='mRNA'),
 SeqFeature(FeatureLocation(ExactPosition(55), ExactPosition(67), strand=1), type='regulatory'),
 SeqFeature(FeatureLocation(ExactPosition(469), ExactPosition(2689), strand=1), type='CDS'),
 SeqFeature(FeatureLocation(ExactPosition(3014), ExactPosition(3026), strand=1), type='regulatory'),
 SeqFeature(FeatureLocation(ExactPosition(3031), ExactPosition(4407), strand=1), type='gene'),
 SeqFeature(FeatureLocation(ExactPosition(3031), ExactPosition(4407), strand=1), type='mRNA'),
 SeqFeature(FeatureLocation(ExactPosition(3031), ExactPosition(3043), strand=1), type='regulatory'),
 SeqFeature(FeatureLocation(ExactPosition(3128), ExactPosition(4151), strand=1), type='CDS'),
 SeqFeature(FeatureLocation(ExactPosition(4
```

### The `.features` attribute holds a list of SeqFeature objects.
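Because `.features` is an ordinary Python list, the usual list tools apply. The sketch below builds a small hypothetical record in memory (toy sequence and coordinates, not the Ebola data) so it runs without the GenBank file, then counts features by type and keeps only the CDS entries. It assumes a Biopython version that still exports `FeatureLocation`.

```python
from collections import Counter

from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord

# A tiny hypothetical record standing in for ebola_gb (not real Ebola data)
record = SeqRecord(Seq("ATGAAATTTGGGTAA" * 4), id="toy1")
record.features = [
    SeqFeature(FeatureLocation(0, 60, strand=1), type="source"),
    SeqFeature(FeatureLocation(0, 30, strand=1), type="gene"),
    SeqFeature(FeatureLocation(0, 15, strand=1), type="CDS"),
    SeqFeature(FeatureLocation(30, 45, strand=1), type="CDS"),
]

# .features is a plain list, so len(), indexing and comprehensions all work
print(len(record.features))

# Count how many features of each type the record carries
type_counts = Counter(feat.type for feat in record.features)
print(type_counts)

# Keep only the coding sequences
cds_features = [feat for feat in record.features if feat.type == "CDS"]
print(len(cds_features))
```

On the real record, the same comprehension over `ebola_gb.features` would collect every CDS without knowing their positions in the list.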


Here is what the features look like when you view this record on GenBank
(KM034562, a complete Zaire ebolavirus genome, isolate Makona-G3686.1):

```python
from IPython.display import IFrame

url = "https://www.ncbi.nlm.nih.gov/nuccore/KM034562"
IFrame(url, width=800, height=400)
```


## What is a SeqFeature Object?

👉 SeqFeatures are objects that describe a region of a sequence: what it is (its type, e.g., a gene or a CDS), where it is (its location in the parent sequence), and what else we know about it (qualifiers such as the gene name).
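To make those three parts concrete, here is a minimal sketch that builds one feature by hand. The gene name and product are hypothetical placeholders; the qualifier values are lists of strings because that is how Biopython's GenBank parser stores them.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation

# A hypothetical CDS annotation (made-up coordinates and names)
feat = SeqFeature(
    FeatureLocation(10, 40, strand=1),
    type="CDS",
    qualifiers={"gene": ["toyA"], "product": ["toy protein"]},
)

print(feat.type)                   # what it is
print(feat.location)               # where it is
print(feat.qualifiers["gene"][0])  # what else we know about it
```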

Since the `features` attribute is a list, we can access an individual feature by its index:

```python
my_feat = ebola_gb.features[9]
my_feat
```

```
SeqFeature(FeatureLocation(ExactPosition(3128), ExactPosition(4151), strand=1), type='CDS')
```

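But indexing by position isn't very helpful on its own: the number and order of features differ from record to record, so we rarely know in advance which feature sits at a given index.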

So let's use `dir()` again to explore a single SeqFeature object:

```python
dir(my_feat)
```

```
['__bool__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_flip',
 '_get_location_operator',
 '_get_ref',
 '_get_ref_db',
 '_get_strand',
 '_set_location_operator',
 '_set_ref',
 '_set_ref_db',
 '_set_strand',
 '_shift',
 'extract',
 'id',
 'location',
 'location_operator',
 'qualifiers',
 'ref',
 'ref_db',
 'strand',
 'translate',
 'type']
```
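Most of that listing is Python's own underscore-prefixed machinery. A one-line filter keeps only the public names, which are the ones worth exploring. This sketch uses a hypothetical stand-in feature so it runs on its own; the exact names printed depend on your Biopython version.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation

# A hypothetical feature standing in for my_feat
feat = SeqFeature(FeatureLocation(3128, 4151, strand=1), type="CDS")

# Drop dunder and private names, which all start with an underscore
public = [name for name in dir(feat) if not name.startswith("_")]
print(public)
```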

```python
my_feat.location
```

```
FeatureLocation(ExactPosition(3128), ExactPosition(4151), strand=1)
```
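Biopython stores locations as 0-based, end-exclusive positions, the same convention as Python slicing, while GenBank displays 1-based, inclusive ranges. So `ExactPosition(3128)` to `ExactPosition(4151)` corresponds to what GenBank writes as `3129..4151`. A minimal pure-Python sketch of the conversion follows; both helper names are hypothetical, not Biopython functions.

```python
def to_genbank_range(start, end):
    """Hypothetical helper: 0-based, end-exclusive -> 1-based, inclusive."""
    return start + 1, end


def to_python_range(first, last):
    """Hypothetical helper: 1-based, inclusive -> 0-based, end-exclusive."""
    return first - 1, last


start, end = 3128, 4151  # the positions shown in my_feat.location above

print(to_genbank_range(start, end))  # (3129, 4151)
print(to_python_range(3129, 4151))   # (3128, 4151)
print(end - start)                   # 1023 nucleotides
```

A nice side effect of the half-open convention is that the feature length is just `end - start`; 1023 is also what `len(my_feat)` reports, and it divides evenly into 341 codons.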

```python
my_feat.type
```

```
'CDS'
```

```python
my_feat.id
```

```
'<unknown id>'
```
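Since `id` is only a placeholder here, a sturdier way to locate a feature than its list index is to search by `type` and `qualifiers`. The sketch below uses hypothetical features and a hypothetical helper, `find_cds`; which qualifiers a real record carries varies, so it uses `.get()` to avoid a `KeyError` on features without a `gene` qualifier.

```python
from Bio.SeqFeature import SeqFeature, FeatureLocation

# Hypothetical features with made-up gene names and coordinates
features = [
    SeqFeature(FeatureLocation(0, 30, strand=1), type="gene", qualifiers={"gene": ["toyA"]}),
    SeqFeature(FeatureLocation(0, 30, strand=1), type="CDS", qualifiers={"gene": ["toyA"]}),
    SeqFeature(FeatureLocation(40, 70, strand=1), type="CDS", qualifiers={"gene": ["toyB"]}),
]


def find_cds(features, gene_name):
    """Hypothetical helper: return the first CDS whose 'gene' qualifier matches, else None."""
    for feat in features:
        # .get() returns an empty list when the qualifier is missing
        if feat.type == "CDS" and gene_name in feat.qualifiers.get("gene", []):
            return feat
    return None


hit = find_cds(features, "toyB")
print(hit.location)
```

The same loop works on `ebola_gb.features`, provided you pass a gene name that actually appears in that record's qualifiers.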

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## How do we get the sequence contained in a SeqFeature object?

Say we've identified a feature and need to grab the sequence associated with it.

SeqFeature objects have a handy `.extract()` method for exactly this: it returns the part of a parent sequence that the feature covers.

All we need to do is pass the parent sequence (a Seq object) as the argument.


```python
# parent sequence
ebola_gb.seq
```

```
Seq('CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTA...GTC', IUPACAmbiguousDNA())
```

```python
my_feat.extract(ebola_gb.seq)
```

```
Seq('ATGACAACTAGAACAAAGGGCAGGGGCCATACTGTGGCCACGACTCAAAACGAC...TGA', IUPACAmbiguousDNA())
```
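`.extract()` does more than a plain slice: it respects the feature's strand, returning the reverse complement for a feature on the minus strand. A small sketch with a made-up parent sequence and two hypothetical features covering the same positions on opposite strands:

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation

parent = Seq("AAAATGCCCTAGTTTT")  # made-up parent sequence

plus_feat = SeqFeature(FeatureLocation(3, 12, strand=1), type="CDS")
minus_feat = SeqFeature(FeatureLocation(3, 12, strand=-1), type="CDS")

# On the + strand, extract() gives the same result as slicing the parent
print(plus_feat.extract(parent))   # ATGCCCTAG
# On the - strand, extract() returns the reverse complement of that slice
print(minus_feat.extract(parent))  # CTAGGGCAT
```

For a simple plus-strand feature like `my_feat`, `my_feat.extract(ebola_gb.seq)` matches `ebola_gb.seq[3128:4151]`; the method earns its keep on minus-strand and multi-part (join) locations, where slicing by hand is easy to get wrong.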

```python
my_feat.translate(ebola_gb.seq)
```

```
Seq('MTTRTKGRGHTVATTQNDRMPGPELSGWISEQLMTGRIPVNDIFCDIENNPGLC...LKI', ExtendedIUPACProtein())
```
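`.translate()` is a shortcut that extracts the feature's sequence and translates it in one call, taking the `codon_start` and `transl_table` qualifiers into account when they are present. For features of type CDS it defaults to `cds=True`, which checks for a valid start codon, a final stop codon and no internal stops, and drops that final stop; that is why the protein above ends in `LKI` even though the extracted DNA ends in `TGA`. A minimal sketch with a made-up three-codon CDS, comparing the shortcut with the two-step version:

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation

parent = Seq("AAAATGCCCTAGTTTT")  # made-up parent; CDS is ATG CCC TAG
cds = SeqFeature(FeatureLocation(3, 12, strand=1), type="CDS")

# One step: extract + translate, with complete-CDS checks for type "CDS"
protein = cds.translate(parent)

# Two steps by hand, asking for the same complete-CDS behaviour explicitly
manual = cds.extract(parent).translate(cds=True)

print(protein, manual)  # MP MP
```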