#### Function parameters
When a function is called, Python evaluates each of the call's argument expressions. It then binds the first parameter name to the first argument's value, the second parameter to the second value, and so on.


In [4]:
def validation_base_sequence(base_sequence):
    seq = base_sequence.upper()
    return len(seq) == (seq.count('T') + seq.count('C') +
                        seq.count('G') + seq.count('A'))

seq = 'AAAT'
validation_base_sequence('tacgcatgatcg')

True
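
To make the binding of arguments to parameters concrete, here is a minimal sketch. The function `describe_pair` and its parameter names are hypothetical, introduced only for illustration.

```python
# Hypothetical helper used only to show how arguments bind to parameters.
def describe_pair(first, second):
    return first + '-' + second

prefix = 'AT'
# Each argument expression is evaluated first ('AT'.lower() gives 'at',
# 'G' * 2 gives 'GG'); the results are then bound to first and second in order.
positional = describe_pair(prefix.lower(), 'G' * 2)

# Keyword arguments bind by name, so their order in the call does not matter.
keyword = describe_pair(second='GG', first='at')

print(positional)   # at-GG
print(keyword)      # at-GG
```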

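#### Comments and documentation
The # character starts a comment: Python ignores the rest of the line. A string placed as the first statement of a function body is its *docstring*, which help displays.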

In [5]:
help(validation_base_sequence)

Help on function validation_base_sequence in module __main__:

validation_base_sequence(base_sequence)
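
The help output above shows only the signature because the function has no docstring. As a sketch of how documentation reaches help, the hypothetical function `is_dna` below carries a docstring, which Python stores in the function's `__doc__` attribute.

```python
def is_dna(base_sequence):
    """Return True if base_sequence contains only A, C, G and T."""
    seq = base_sequence.upper()   # a comment: Python ignores this text
    return all(base in 'ACGT' for base in seq)

# The docstring is stored on the function object; help(is_dna) would show it.
print(is_dna.__doc__)
```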



In [11]:
# defining a function to compute GC content
def gc_content1(base_seq):
    """Return the fraction of G and C bases in base_seq."""
    seq = base_seq.upper()
    return (seq.count('G') + seq.count('C'))/len(seq)
seq50 = "AATCAGCATGACT"
seq65 = "ATCCCGCGCCATA"
seq45 = "ATCAGCATCAGACTACGATACACACACAC"
gc_content1(seq50)
gc_content1(seq65)
gc_content1(seq45)

0.4482758620689655

An assertion statement tests whether an expression is true or false, raising an error if it is false. In the second form, the value of the second expression is attached to the error as its message.
```
assert expression
assert expression1, expression2
```
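
A minimal sketch of the two-expression form in action follows. The function `require_positive` is a hypothetical example, not part of the notebook. Note that assertions are skipped entirely when Python runs with the `-O` option, so they suit internal checks rather than user-facing validation.

```python
def require_positive(value):
    # Two-expression form: the second expression becomes the error message.
    assert value > 0, 'value must be positive'
    return value

require_positive(3)          # passes silently

try:
    require_positive(-1)     # the assertion fails here
except AssertionError as error:
    message = str(error)

print(message)               # value must be positive
```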

In [17]:
# adding an assertion to the gc_content function
def gc_content2(base_seq):
    assert validation_base_sequence(base_seq),\
            'argument has invalid characters'
    seq = base_seq.upper()
    return (seq.count('G') + seq.count('C'))/len(seq)
seq63 = "ATCCCGCGCCATA"
seq44 = "ATCAGCATCAGACTACGATACACACACAC"
gc_content2(seq63)
gc_content2(seq44)

0.4482758620689655

#### Default parameter values
Python lets a parameter be given a default value, which is used whenever a call to the function supplies no explicit value for that parameter.

In [82]:
def validation_base_sequence2(base_sequence, RNAflag=False):
    seq = base_sequence.upper()
    return len(seq) == (seq.count('U' if RNAflag else 'T') +
                        seq.count('C') +
                        seq.count('G') + 
                        seq.count('A'))
validation_base_sequence2('ATCG',False)
validation_base_sequence2('AUCG',False)
validation_base_sequence2('ATCG',True)
validation_base_sequence2('AUCG',False)

False

In [26]:
# adding an assertion and default parameter to the gc_content function
def gc_content3(base_seq, RNAflag=False):
    assert validation_base_sequence2(base_seq,RNAflag),\
            'argument has invalid characters'
    seq = base_seq.upper()
    return (seq.count('G') + seq.count('C'))/len(seq)

gc_content3('ACCCUUUGG',True)
gc_content3('ACCCTTTGG',False)

0.5555555555555556
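
As a sketch of the default-value syntax, the hypothetical function `is_valid_sequence` below declares `rna_flag=False`, so callers can omit the second argument when checking DNA. The name and the membership-based check are assumptions made for illustration.

```python
def is_valid_sequence(base_sequence, rna_flag=False):
    """Return True if base_sequence holds only valid DNA (or RNA) bases."""
    seq = base_sequence.upper()
    allowed = 'UCAG' if rna_flag else 'TCAG'
    return all(base in allowed for base in seq)

print(is_valid_sequence('ATCG'))                 # True: rna_flag defaults to False
print(is_valid_sequence('AUCG'))                 # False: U is not a DNA base
print(is_valid_sequence('AUCG', True))           # True: positional override
print(is_valid_sequence('AUCG', rna_flag=True))  # True: keyword override
```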

#### Using modules
Python offers a large selection of optional types, functions, and methods. These are defined in *module* files placed in a *library* directory as part of Python's installation.

**Importing**  A module's contents are brought into the interpreter's environment by an *import statement*.
```
import modulename
```
For example, the module *os* provides an interface to the computer's operating system.

**Selective import**  Specific names can be imported directly from a module, or all of its names at once with *.
```
from modulename import name1, name2, ...
from modulename import *
```

One useful module is *random*, which provides various ways to generate random numbers.

In [55]:
import os
os
os.getcwd()
os.getlogin()
import sys
sys
import random
random.gauss(2,6)
from random import randint
randint(0,3)
'UACG'[0]

'U'
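
The difference between the two import forms can be sketched as follows. The standard `math` module is used here only because its results are deterministic; that choice is an assumption for illustration.

```python
import math                 # plain import: names are reached through the module
from math import sqrt, pi   # selective import: names are available directly

print(math.sqrt(16))        # 4.0
print(sqrt(16))             # 4.0
print(math.pi == pi)        # True
```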

In [67]:
# generating a random codon or sequence
from random import randint
def random_base(RNAflag = False):
    return('UCAG' if RNAflag else 'TACG')[randint(0,3)]
def random_codon(RNAflag = False):
    return random_base(RNAflag) + random_base(RNAflag) + random_base(RNAflag)
random_base()
random_codon(True)

'UCG'
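
An alternative sketch of the same idea uses `random.choice`, which returns one randomly selected element of a non-empty sequence. This is not the approach taken in the cell above, and the function names are hypothetical. The printed codon differs from run to run.

```python
import random

def random_base_choice(rna_flag=False):
    # random.choice picks one character from the string of allowed bases.
    return random.choice('UCAG' if rna_flag else 'TCAG')

def random_codon_choice(rna_flag=False):
    # A codon is three bases; join three independent draws.
    return ''.join(random_base_choice(rna_flag) for _ in range(3))

codon = random_codon_choice(rna_flag=True)
print(codon)
```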

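The function below picks a random position within the input sequence, takes the base at that position, and removes it from the four bases A, C, T and G. It then chooses one of the remaining three bases at random and joins the part of the sequence before the position, the new base, and the part after it, returning a sequence with a single-base mutation.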

In [78]:
# shows a function that simulates single-base mutation
from random import randint
def replace_base_randomly_using_names(base_seq):
    """Return a sequence with the base at a randomly selected position of base_seq replaced by a base chosen randomly from the three bases that are not at that position"""
    position = randint(0, len(base_seq)-1)
    base = base_seq[position]
    bases = 'TACG'
    bases = bases.replace(base, '')   # strings are immutable, so keep the returned copy
    newbase = bases[randint(0,2)]
    beginning = base_seq[0:position]
    end = base_seq[position+1:]
    return beginning + newbase + end

replace_base_randomly_using_names("ACGTACGT")

'ATGTACGT'

In [79]:
# a better version of the same function
from random import randint
def replace_base_randomly_using_names2(base_seq):
    position = randint(0,len(base_seq)-1)
    return (base_seq[0:position] +
           'ATCG'.replace(base_seq[position],'')[randint(0,2)]
           +base_seq[position+1:])

replace_base_randomly_using_names2("ACGT")

'AAGT'
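
One way to gain confidence in a randomized function like those above is to check a property that must hold on every call. The sketch below assumes uppercase DNA input; `mutate_one_base` is a hypothetical stand-in that follows the same steps, and the loop confirms that exactly one position changes each time.

```python
from random import randint

def mutate_one_base(base_seq):
    """Replace the base at one random position with a different base."""
    position = randint(0, len(base_seq) - 1)
    # Remove the current base so that only the three alternatives remain.
    alternatives = 'TACG'.replace(base_seq[position], '')
    new_base = alternatives[randint(0, 2)]
    return base_seq[:position] + new_base + base_seq[position + 1:]

original = 'ACGTACGT'
for _ in range(1000):
    mutated = mutate_one_base(original)
    differences = sum(1 for a, b in zip(original, mutated) if a != b)
    # Same length, and exactly one position has changed.
    assert len(mutated) == len(original) and differences == 1
print('every mutation changed exactly one base')
```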

#### Python files

In [87]:
def validation_base_sequence2(base_sequence, RNAflag):
    seq = base_sequence.upper()
    return len(seq) == (seq.count('U' if RNAflag else 'T') +
                        seq.count('C') +
                        seq.count('G') + 
                        seq.count('A'))

def gc_content3(base_seq,RNAflag):
    assert validation_base_sequence2(base_seq,RNAflag),\
            'argument has invalid characters'
    seq = base_seq.upper()
    return (seq.count('G') + seq.count('C'))/len(seq)

def recognition_site(base_seq, recognition_seq):
    return base_seq.find(recognition_seq)

def test():
    #assert validation_base_sequence2("ACUG")
    #assert validation_base_sequence2("")
    #assert not validation_base_sequence2("ACTG")
    
    assert validation_base_sequence2("ACTG", False)
    #assert not validation_base_sequence2("ACUG", True)
    assert validation_base_sequence2("ACUG",True)
    
    assert .5 == gc_content3("ACTG",False)
    assert 1.0 == gc_content3("CCGG",False)
    assert 0.25 == gc_content3("ACTT",False)
    
    print("All tests passed")

test()

All tests passed
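
If definitions like these are saved to a file, a common convention is to guard the test call so that it runs only when the file is executed directly, not when it is imported. The sketch below assumes a hypothetical file name, `sequence_tools.py`, and a simplified hypothetical function, `gc_fraction`.

```python
# Sketch of a module file, for example a hypothetical sequence_tools.py.
def gc_fraction(base_seq):
    """Return the fraction of G and C bases in base_seq."""
    seq = base_seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq)

def test():
    assert gc_fraction('ACTG') == 0.5
    assert gc_fraction('CCGG') == 1.0
    assert gc_fraction('ACTT') == 0.25
    print('All tests passed')

# __name__ equals '__main__' only when the file is run directly,
# so importing the module does not trigger the tests.
if __name__ == '__main__':
    test()
```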
