# Introduction

In this session we are going to look at how ```python``` handles different types of objects. What are objects? Well, in a nutshell, the data that we let python loose on. Objects can be text, numbers or more exotic things as we'll see. When we write programs we generally want to manipulate some **objects**. In the python world objects have different **types**. The type of an object allows python to decide what can be done with that object. So in this section we'll look at some of the different types of objects; specifically we'll look at text (type ```str```, i.e. strings of characters), numbers (types ```int```, ```long```, ```float``` and ```complex```) and Booleans (type ```bool```). We'll also look at some basic **operators**, which are the symbols that indicate a particular operation we'd like to carry out (e.g. + is the operator for addition, just as it is in maths).

There are a few differences between 2.x and 3.x versions of python here but they're mostly minimal, so if you keep this up and transition to 3.x python in the future you should be able to adjust without any great issues.

# Text (strings)

In the world of computers sequences of characters (such as you're reading now) are generally referred to as **strings**. In python strings are indicated by the use of quotes. You can use single or double quotes and sometimes you may have good reason to use one and not the other. They are often interchangeable though.

In [1]:
my_name = 'Iain J Gallagher'
your_name = 'Sheba von Strachling'

print my_name
print your_name

Iain J Gallagher
Sheba von Strachling


In the code above each line is called a [```statement```](http://en.wikipedia.org/wiki/Statement_%28computer_science%29) in programmer speak. A statement is the smallest unit of code that expresses some action for the computer to take. So what actions were we dealing with above?

The first two lines were variable assignment. This just means that we asked python to put some values somewhere in the computer memory and label these values. There's more on this below.

Specifically ```my_name = 'Iain J Gallagher'``` takes the value 'Iain J Gallagher', puts it in the computer memory and labels it with ```my_name```. In the second two lines we asked python to ```print``` our variables to the screen so we could see them. We'll talk more about variables and variable assignment shortly but for now let's get back to strings.

If you start a string with a single or double quote you must end (or **close**) that string with the same kind of quote. This is enforced because otherwise python could not reliably decide where a string ends. For example:

In [2]:
my_annoying_string = 'Iain's pet monkey, Fang.'
print my_annoying_string

SyntaxError: invalid syntax (<ipython-input-2-718ba5a3cfc1>, line 1)

That didn't go so well. Firstly let's notice that python has told us there is a screw-up here. The ```SyntaxError``` message is python's way of asking for clarification. We've issued a command that python doesn't understand - that's the specific meaning of ```SyntaxError```. Computers are very fast but very literal - idiot savants. We have to be absolutely explicit about what we want or the computer (and in this case python which is our proxy for direct communication with the computer) won't do what we ask. I'm sure you've experienced this with other programmes (Word, stop moving figures around in my document you insolent cur!). So don't be downhearted by error messages like this - they're a cry for help.

So what's the problem in the example above? It's to do with the pattern of the quotes. We've opened a string with a single quote, used a single quote as an apostrophe and then closed the string with a single quote. However that's not the way python sees it. As far as python's concerned we've opened a string, closed it at the 'n' at the end of 'Iain' and then typed some random characters from the 's' onwards.

We can sort this problem out by using double quotes i.e. ".

In [3]:
my_improved_string = "Iain's pet monkey, Fang."
print my_improved_string

Iain's pet monkey, Fang.


In this example we have deliberately opened our string with double quotes because we know there's going to be a single quote in there serving as an apostrophe. Sorted.

Strings indicated by a single set of quotes cannot span lines as the following demonstrates.

In [4]:
spanner = "She cannae
take it captain!"
print spanner

SyntaxError: EOL while scanning string literal (<ipython-input-4-e29d29a6679c>, line 1)

Once again python asks for some clarification. Recall that when we discussed why you can't write scripts for python in Word one of the reasons was the insertion of a bunch of invisible formatting and layout characters. Well, there's a special character combination to indicate 'new line please': ```\n``` (backslash n). In the ```SyntaxError``` above python complains about ```EOL``` in the string. ```EOL``` stands for End Of Line and it's this ```\n``` combination. It's more commonly known as the 'newline'.

So how do we allow strings to span lines? There are two methods. Firstly we could insert a newline character into our string where we wanted the line span. Secondly we can use triple quotes. This can be triple single or triple double (example below) quotes. Let's see both of these.

In [1]:
scotty_says = "She cannae\ntake it captain!"
print scotty_says

She cannae
take it captain!


In [2]:
scotty_says = """She cannae
take it captain!"""
print scotty_says

She cannae
take it captain!


The other common string spacing code you might use is ```\t``` (backslash t) which inserts tab stops and is really handy for creating delimited files for writing data into spreadsheets etc. More on that later in the course though.
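As a rough sketch of how ```\t``` and ```\n``` work together, the snippet below builds a tiny two-line, tab-delimited table. The column names and values are made up purely for illustration, and ```print``` is written with parentheses only so that the example runs unchanged under both python 2.x and 3.x.

```python
# Hypothetical column names and values, used only for illustration.
header = "gene\tlength\tgc_percent"
row = "abc1\t1200\t46.7"

# \n ends the first line; \t separates the columns within each line.
table = header + "\n" + row
print(table)

# Each \t is a single character, so we can count the tab stops.
n_tabs = table.count("\t")
print(n_tabs)
```

Pasting output like this into a spreadsheet should place each value in its own column.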

## Strings as a "type"

Strings are only one of several **object types** python can handle. In the above examples we often assigned a string to a variable (e.g. ```scotty_says = "She cannae\ntake it captain!"```). Variable assignment in python is done using the ```=``` sign. Note that ```=``` does not mean 'equal to' (although you might think of it that way). In python you might like to think of the ```=``` sign as meaning 'points to'.

Once you've written your program you might lose track of which variables are of which type so it can be useful to check variable types. It's also a really useful thing to be able to do to stop other people using your program from entering the wrong kind of data (e.g. trying to enter a number where a string should go). So how do we check the type of a variable? Simple - we use the ```type``` function.

In [3]:
scotty_says = "She cannae\ntake it captain!"
type(scotty_says)

str

The output, ```str```, tells us that ```scotty_says``` is a string and once we know this we can make assumptions about the kinds of things python will let us do to the variable ```scotty_says``` because ```scotty_says``` is an object of type ```str```.
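One possible way to put a type check to work is to compare the result of ```type``` with the type we expect. This is only a minimal sketch: ```warp_factor``` is a hypothetical variable, and the ```==``` comparison (which gives back ```True``` or ```False```) is covered properly later in the course.

```python
scotty_says = "She cannae\ntake it captain!"
warp_factor = 9  # hypothetical variable, for illustration only

# type() gives back the type itself, so it can be compared with str or int.
is_text = type(scotty_says) == str
is_number = type(warp_factor) == int
number_is_text = type(warp_factor) == str

print(is_text)
print(is_number)
print(number_is_text)
```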

## Finding out what we can do - methods and functions

We know that python can handle several different object types and we'll deal with numbers shortly. But what do we mean by 'handle'? Well there are a bunch of 'default' things that are built into python for manipulating these specific objects (e.g. strings, integers, floats etc). How do we find out what these are? We can use the ```dir``` function.

In [8]:
dir(scotty_says)

['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']


For the moment you can ignore the items that begin with the double underscores at the beginning of this list (e.g. ```__add__```). Let's start instead with ```capitalize```. This is an example of a **method** that acts on a string object to produce a changed copy of that string. What do you think it does?

There is a specific 'dot notation' to use methods like this on strings (or use methods on other objects).

In [9]:
lcs = "i'm a lowercase string."
lcs.capitalize()

"I'm a lowercase string."

So the ```capitalize``` method takes a string (note that the string is the whole sentence) and ```capitalize```s (note US spelling) the first letter of that string. Try it out now using your name as an example.

In [10]:
my_name = 'iain'
my_name.capitalize()

'Iain'
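It's worth noticing that the method hands back a new string rather than altering the original. The sketch below stores the result under a hypothetical name, ```capitalised_name```, so both versions can be printed side by side (```print``` is written with parentheses so the code runs under python 2.x and 3.x alike).

```python
my_name = 'iain'
capitalised_name = my_name.capitalize()  # hypothetical variable name

print(my_name)           # the original string is unchanged
print(capitalised_name)  # the new, capitalised copy
```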

As you can see above there are a bunch of methods available for manipulating strings. How do you get help on what they do? The ```help``` function is your first friend.

In [11]:
help(my_name.capitalize)

Help on built-in function capitalize:

capitalize(...)
    S.capitalize() -> string
    
    Return a copy of the string S with only its first character
    capitalized.



This is fine, but it's a bit terse and this can be a problem for beginners. Briefly breaking down the information given above, the ```S.capitalize()``` is the important bit. It tells you the syntax to use (note those empty parentheses) and that the output or **return value** is a string (```-> string```). Finally there's a brief textual description of the function. This one is pretty obvious.

Your next best friend is the Python [documentation](https://www.python.org/doc/versions/). This can also be rather terse. There are many web resources though and if the documentation doesn't clear things up then Google should be your next port of call. Finally if these don't clear things up you can simply try applying the method to the object you're dealing with and see what happens.

As well as the object specific methods for manipulating data there are also a set of [default](https://docs.python.org/2/library/functions.html) **functions** which can be applied to objects of different types. Rather confusingly (but you'll get used to it) these have a different syntax from the object specific methods. We don't use the dot notation; instead we enclose the variable we want to operate on in parentheses after **calling** the function. An example will help.

The ```len``` function returns the length of a variable. For strings this is the number of individual characters in the string. This includes spaces (called *whitespace* in programmer speak).

In [12]:
my_string = 'a'
len(my_string)

1

In [13]:
my_other_string = 'Iain J'
len(my_other_string)

6

Notice that even though 'Iain J' only contains 5 letters the string is 6 characters long because of the space between 'Iain' and 'J'. Getting help on these functions is simple enough (see below). The input to ```len``` (what goes between the parentheses) is referred to as an **argument**. The  ```len``` function can act on objects of types other than ```str```- as we'll see later.

In [14]:
help(len)

Help on built-in function len in module __builtin__:

len(...)
    len(object) -> integer
    
    Return the number of items of a sequence or mapping.



In the documentation above the ```len(...)``` is your clue to the syntax. The ```(...)``` means "you might need to put some argument(s) here, in these parentheses". The ```-> integer``` part means that the output of the function is an integer.

## Slicing strings up

Right at the beginning of this unit we described strings as sequences of characters. This suggests that we can get at either the individual characters of a string or at substrings i.e. just some of the characters. Quick aside - variables that can be sub-divided are referred to as *non-scalar* variables. String **slicing** provides the mechanism by which we can sub-divide strings. Each individual character in a string can be referred to by its place in the string. We count the individual characters starting at zero. Why do we start at zero, that's silly?! Well there are good reasons you can read about [here](http://gestaltrevision.be/wiki/python/zerobased). For now best just accept it and move on. Many other programming languages also start counting at zero but some (e.g. [R](http://cran.r-project.org/), Iain's other weapon of choice) start at 1. The take home message is that we start counting the letters in a string (or values in any other object) at zero in python.

So in 'Iain J':

* I = 0
* a = 1
* i = 2
* n = 3
* the space = 4
* J = 5

Note that although there are 6 characters in total the highest index is 6-1 = 5. That's true of all of the datatypes that are countable sequences in so-called zero-indexed languages like python; if the sequence has *n* elements the indices run from 0 to *n-1*. This will become more apparent in a moment.

So how do we use this to our advantage to extract bits of strings? In python the syntax for accessing any sequence by index is the square bracket. Recall that the first character in our string is numbered ```0```. We simply put the index of the character we require into square brackets after our string or variable.

In [15]:
my_other_string = 'Iain J'
my_other_string[0]

'I'

In [16]:
'This is a string not a variable'[3]

's'

In the second example above we get back 's' because 
* 'T' = 0
* 'h' = 1
* 'i' = 2
* 's' = 3
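The relationship between the length of a string and its highest index can be checked directly. This is a small sketch using ```len``` and square bracket indexing together; ```n_chars```, ```last_index``` and ```last_char``` are hypothetical names chosen for the example.

```python
my_other_string = 'Iain J'

n_chars = len(my_other_string)   # 6 characters in total
last_index = n_chars - 1         # so the highest valid index is 5
last_char = my_other_string[last_index]

print(n_chars)
print(last_index)
print(last_char)
```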

All fine and well but what if we want more than one character? Well python has that covered as well. We can index a range by specifying two numbers separated by a colon ':'.

In [17]:
my_other_string[1:4]

'ain'

In [18]:
'This is another string not a variable'[8:14]

'anothe'

The second example is informative. We wanted to isolate the word 'another' so we *sliced* the string from character 8 to character 14 (count the characters to confirm the indexing here - remember spaces count!) but we didn't get the whole word back. This is because the *slice* **includes the first value but not the last**. This will initially annoy you but again, you'll get used to it. 

So what we needed to do was ```'This is another string not a variable'[8:15]```. Let's see:

In [19]:
'This is another string not a variable'[8:15]

'another'

There is some shorthand for slicing to the end of a string (or any other *sequence*). We specify the position we want to start the slice at and then we leave the second number out. Python interprets this as 'go to the end'.

In [20]:
'another string'[2:]

'other string'

We didn't specify an endpoint so python just gave us back everything from our start point ([2]) onwards. 

Similarly we can go from the start to a certain point.

In [21]:
'other string'[:5]

'other'

By omitting the first number python has assumed we want to go from the first (zeroth) element (letter in this case) to - but not including - the [fifth element](http://www.imdb.com/title/tt0119116/). Note if we had typed ```'other string'[:6]``` we'd have got the space between other and string back as well (try it and see). That's enough on slicing just now but we'll return to this.
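One consequence of 'include the first value but not the last' is that slicing up to a position and slicing from that same position give two pieces that fit back together with nothing lost or duplicated. A minimal sketch, where ```phrase```, ```first_part``` and ```second_part``` are names invented for the example:

```python
phrase = 'another string'  # hypothetical variable name

first_part = phrase[:7]    # indices 0 to 6
second_part = phrase[7:]   # index 7 to the end (starts with the space)

print(first_part)
print(second_part)
print(first_part + second_part == phrase)
```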

## String formatting

A really handy technique for providing informative text outputs from your programmes is called [string formatting](http://www.learnpython.org/en/String_Formatting). String formatting uses a combination of the % symbol and a letter (and possibly numbers) in print statements. This combination of symbols serves as a placeholder for some variable value (see below) that gets inserted into the printed text. There are three commonly used combinations:

* ```%s``` a place holder for a string variable
* ```%d``` a place holder for an integer variable
* ```%f``` a place holder for a float variable

You choose the value to insert by placing it at the end of the ```print``` statement, outside your quotes and after a % sign.

In [2]:
str_var = 'preposterous pixies'
int_var = 3
float_var = 4.678286359464263

print 'My mind is full of %s.' % str_var
print 'A use of string formatting is to insert whole numbers (e.g. %d) and floats (e.g. %.2f).' % (int_var, float_var)

My mind is full of preposterous pixies.
A use of string formatting is to insert whole numbers (e.g. 3) and floats (e.g. 4.68).


As you can see from the above, if we have to insert more than one value we provide the formatting operator (```%```) with a comma separated list of variables in parentheses. The variables are inserted in the order they are given in the parentheses. Also the value of floating point variables can be rounded by inserting a dot and the desired number of decimal places between the ```%``` and the letter f. In the example above we rounded our float to 2 decimal places. Try changing the ```%.2f``` to ```%.5f``` or ```%.1f``` (or removing the ```.2``` to leave a plain ```%f```) to see the effect.
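To compare rounding levels side by side, the formatting operator can also be used to build a string that is stored in a variable before printing. The sketch below assumes that usage; the names ```one_place```, ```five_places``` and ```default_places``` are made up for the example.

```python
float_var = 4.678286359464263

one_place = '%.1f' % float_var
five_places = '%.5f' % float_var
default_places = '%f' % float_var   # no rounding level given: 6 decimal places

print(one_place)
print(five_places)
print(default_places)
```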

## Putting it together!

Create a variable called ```my_seq``` and assign it to the string 'acctgtagctgaatcgtgtgttcgatcgat'. If you use the ```dir``` function on ```my_seq``` you'll see that there are methods called ```upper``` and ```count```.

Use the ```upper``` method to print an uppercase version of ```my_seq```. Use the ```count``` method to print the number of cytosine residues and, separately the number of guanine residues. Use the ```help``` function to see the syntax for these methods if you need to. 

Use string formatting to print an informative message about these numbers to the user.

Bonus: Can you subset this string to print out the substring 'GAAT'?

In [4]:
# solution remove from student doc
my_seq = 'acctgtagctgaatcgtgtgttcgatcgat'
my_seq = my_seq.upper()
cy = my_seq.count('C')
gu = my_seq.count('G')
print cy, gu
my_seq[10:14]

6 8


'GAAT'

## Summary

In this part we have introduced the idea of variables and types. We have specifically introduced *strings* as a data type made up of a sequence of characters. Some *methods* only act on strings. We apply methods using *dot notation*. 

e.g. ```my_string.upper()```

These methods can be listed using the ```dir()``` function e.g. ```dir(my_string)``` so you can see what's available for a particular data type.

We can get help on these using e.g. ```help(my_string.strip)``` where we use the ```help``` function putting the specific method we want help on in parentheses after ```help```.

Other *functions* in python can act on many object types. ```len``` was the example we showed. These are applied by placing the object you want the function to act on in parentheses after the function name e.g. ```len(my_string)```.

Again you can get help on these using the ```help``` function e.g. ```help(len)```.

Since strings are sequences python can get at the individual components. Counting the individual components starts at zero and all of the sequence data types in python (as we'll see) are indexed using a square bracket notation (e.g. ```my_string[1]```). Ranges can be defined for *slicing* using a colon. This slicing is inclusive at the lower bound but not at the higher bound so ```my_string[0:3]``` will return elements 0, 1 and 2 of the ```my_string``` variable. This can be further shortened to ```my_string[:3]``` if we want to go from the start whilst ```my_string[3:]``` will go from element [3] to the end of the sequence.

We saw how string formatting can be used to insert the value of variables into printed output to provide information for the user.

# Numbers

We have described how strings are one of several object types that python can handle. Python wouldn't be much use if it could only handle text and not numbers. Well, lucky us! Not only can python handle numbers it divides numbers into different types so we have exquisite control over our grand World domination schemes.

In the python view of the universe numbers are divided into:  

* ```int``` (signed integers): often called just integers or ints, are positive or negative whole numbers with no decimal point.

* ```long``` (long integers ): or longs, are integers of (nearly) unlimited size, written like integers and followed by an uppercase or lowercase L.

* ```float``` (floating point real values) : or floats, represent real numbers (i.e. from the set of real numbers - $\mathbb{R}$) and are written with a decimal point dividing the integer and fractional parts. Floats may also be in scientific notation, with E or e indicating the power of 10 (2.5e2 = 2.5 x 10$^2$ = 250).

* ```complex``` (complex numbers) : are of the form a + bJ, where a and b are floats and J (or j) represents the square root of -1 (which is an imaginary number). a is the real part of the number, and b is the imaginary part. We won't be using these but they're useful for solving differential equations.

The reasons for these subdivisions are to do with the efficient allocation of computer memory assigned to different data types. We'll be using only the ```int``` and ```float``` types.

One of the first questions I expect you have is 'Can I do simple arithmetic thus turning my expensive computer into a cheap calculator?'. Well, yes you can! The main arithmetic operations are:

* Addition ```+```
* Subtraction ```-```
* Multiplication ```*```
* Division ```/```

In [22]:
x = 4
y = 3

print x+y
print x-y
print x*y
print x/y


7
1
12
1


Eh? ```4/3=1```? What? 

Well, you've come across a feature of python 2.x that many people consider a bug (i.e. a feature that doesn't do what's expected). In python 2.x division of an ```int``` by an ```int``` results in what's called *floor division*  - the quotient (the result of the division) is rounded down and any remainder thrown away. What kind of tomfoolery is this? No wonder we can't get back to the Moon!

In this case python has allocated both the variables ```x``` and ```y``` the type ```int``` because they have no fractional component.

In [23]:
print type(x)
print type(y)

<type 'int'>
<type 'int'>


If both numbers are ```int``` then floor division takes place. This has been changed in python 3.x but how do we deal with it here using python 2.x? Actually it's easy (if inconvenient) - we just convert one (or both) of our ```int``` variables to a ```float```. If we put a ```float``` into the division the quotient is a ```float```. We can do the conversion simply as seen below.

In [24]:
print float(x)/y

1.33333333333


Well, that's more sensible!

There is another way round this. If you're using python 2.x and you put the following line in at the start of any python scripts (we'll get to that shortly):

```from __future__ import division```

you can divide ```int``` by ```int``` and get back ```float```.

We'll talk more about lines like this later in the course.

'What are the other arithmetic operators?' I hear you ask. In addition to those detailed above we also have:

* Exponent ```**```
* Modulo ```%```
* Floor division ```//```

Exponent is one you are already likely familiar with. A given number is raised to a power.

In [25]:
x = 2
print x**2
print x**3

4
8


The modulus (or modulo - never quite sure) returns the remainder. This is useful for e.g. checking if numbers are odd or even. If the number is even and you use 2 in modulo the remainder will always be zero. Later we'll see how we can act on conditions like this but for now an example will suffice.

In [26]:
x = 2
print x%2
print (x+1)%2

0
1


Note the use of brackets above to maintain the correct order of operations for arithmetic. Python obeys the same order of operations you learned in school - BODMAS or PEMDAS (hopefully these mean something to you!): (B)rackets, (O)rder, (D)ivision, (M)ultiplication, (A)ddition, (S)ubtraction *or* (P)arentheses, (E)xponents, (M)ultiplication, (D)ivision, (A)ddition, (S)ubtraction.
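A short sketch of the order of operations in action. The variable names are invented for the example and the numbers are arbitrary.

```python
without_brackets = 2 + 3 * 4     # multiplication first: 2 + 12
with_brackets = (2 + 3) * 4      # brackets first: 5 * 4
power_first = 2 * 3 ** 2         # exponent first: 2 * 9

print(without_brackets)
print(with_brackets)
print(power_first)
```

If in doubt, adding brackets costs nothing and makes the intended order obvious to the next reader.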

We've already talked about floor division (```//```) but just to recap this is when the quotient is rounded down to the nearest whole number, so for positive numbers the digits after the decimal point are simply discarded. Notice in the example below that even though the variable ```y``` is a ```float``` (has a decimal component) the floor division operation still strips out the fractional part of the quotient.

In [27]:
x = 9
y = 4.0
print x//y
print x/y

2.0
2.25


Just as we saw with strings there are a bunch of methods that are specific to numbers.

In [4]:
x = 1
dir(x)

['__abs__',
 '__add__',
 '__and__',
 '__class__',
 '__cmp__',
 '__coerce__',
 '__delattr__',
 '__div__',
 '__divmod__',
 '__doc__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__getattribute__',
 '__getnewargs__',
 '__hash__',
 '__hex__',
 '__index__',
 '__init__',
 '__int__',
 '__invert__',
 '__long__',
 '__lshift__',
 '__mod__',
 '__mul__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__oct__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 '__xor__',
 'bit_length',
 'conjugate',
 'denominator',
 'imag',
 'numerator',
 'real']

As you can see most of these begin with double underscores (and we won't go into those just now). The example above was for the data of type ```int```. See for yourself what you get using ```dir``` on a ```float```, ```long``` or ```complex``` type.

## Putting it together

The GC percentage is a common statistic generated for DNA sequences. 

Apply the ```count``` method and ```len``` function to your DNA character string - ```my_seq``` ( 'acctgtagctgaatcgtgtgttcgatcgat') - to calculate the GC percentage. Assign this percentage to a variable ```gc_perc``` and print the value out using string formatting. Assign the AT percentage to a variable ```at_perc``` and print this out as well.

Bonus - Using the ```len``` function and the modulo operator decide whether ```my_seq``` has an odd or even number of DNA bases.

In [7]:
# solution remove from student doc
my_seq = 'acctgtagctgaatcgtgtgttcgatcgat'

cyt = my_seq.count('c')
gn = my_seq.count('g')

gc_perc = (cyt+gn)/float(len(my_seq))*100
at_perc = 100-gc_perc

print 'The GC is %.2f. The AT is %.2f' % (gc_perc, at_perc)

seq_l = len(my_seq)
if seq_l%2==0:
    print 'The sequence has even length.'
else:
    print 'The sequence has odd length.'

The GC is 46.67. The AT is 53.33
The sequence has even length.


## Summary

In this section we've learned about how python handles numbers. Python characterises numbers as ```int```, ```long```, ```float``` and ```complex```.  You can carry out simple arithmetic with numbers using the standard arithmetic operators $+, -, *, /$. In addition python provides operators for exponents (```**```), for the modulo (```%``` - returns the remainder from a division) and for floor division (```//``` - returns only the whole-number part of the quotient). Complex arithmetic procedures are best broken up using brackets to ensure the correct order of operations. Python obeys the standard order of operations for arithmetic. Just like text there are some ```methods``` that are specific to numbers and each number type has specific ```methods```. These can be listed using the ```dir``` function.

# Use of the ```+``` and ```*``` operators on different data types

The arithmetic operators ```+``` and ```*``` can be used on both strings and numbers. 

In [28]:
x = 2
y = 3
print x + y
print x * y

5
6


In [29]:
my_string = 'Merry '
my_other_string = 'Christmas'

print my_string + my_other_string
print my_string * x

Merry Christmas
Merry Merry 


Here we can see that if we 'multiply' a string by a number python interprets that as a request for 'x' copies of the string. But what happens if we try to add a string to a number?

In [30]:
my_string + x

TypeError: cannot concatenate 'str' and 'int' objects

Python chokes on this. It's not sensible to add strings to numbers even though you might have wanted to add the 2 (as a string) to the string 'Merry '. Notice again that the *Traceback* or error we get here is informative (if you know what you're looking for). Here we have a ```TypeError``` - the object types being operated on are incompatible with the desired operation. Here, specifically, we can't carry out addition on a string and a number. You can't add or join (*concatenate* - see traceback above) the string 'Merry' and integer 2 - that has no meaning.


So how would we print out the string with the number in it? We've already seen that we can join two strings so we need to convert our integer to a string. You can convert a number to its textual representation using the ```str``` function.

In [31]:
print my_string + str(x)

Merry 2


Similarly if you have a textual representation of a number (and we'll see this shortly) you can convert that string to the number type you want.

In [32]:
my_number_string = '65'
print type(my_number_string) # is a string
print float(my_number_string)
print type(float(my_number_string)) # has been converted to a float

<type 'str'>
65.0
<type 'float'>


## Comments

You might wonder what the # symbol used in the code above is for. This is a comment and it's one way to add reminders or brief explanations to your python code. This is very useful when you have to go back to code you wrote a few days, weeks, months ago and try and remember what it does or how it works. Anything after the # sign on the same line is ignored by python. Comments are also really useful for those who follow you on working on a project or with whom you share your code.

## Summary

In this section we've seen how the ```+``` and ```*``` operators can act on both strings and numbers and how to convert a number to a string and back again. Finally we've also seen how to put comments in our code to improve readability for ourselves and others.

# Variables

So far in this document I've been referring to variables without any explanation. So what are variables and why are they useful? In programming variables have the same purpose they serve in algebra - they are placeholders for some value. That value may stay constant or it may change. The use of the variable means we can refer to the variable and not have to independently keep track of the exact value. Variables have three important attributes - name, type and value.

The *name* of the variable is the label we, the programmers, give it. It's important to give your variables informative names. If you've written a program and left the code for a while it's really difficult to come back to it and remember what a bunch of variables called ```a```, ```a1```, ```r```, ```t1``` etc are accounting for or how they're being changed. Imagine you're writing a small utility program to convert degrees Celsius to degrees Fahrenheit (and you will be soon). Far better to name your variables e.g. ```deg_c``` and ```deg_f``` than ```value1``` and ```value2```. The former variable names immediately give you some idea of what those variables refer to. The latter names are pretty anonymous.
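To see how much the names help, here is a minimal sketch of the temperature conversion using ```deg_c``` and ```deg_f```. The starting temperature and the ```message``` variable are assumptions made for the example; the arithmetic is the usual Celsius to Fahrenheit formula (multiply by 9/5, then add 32), written with floats to sidestep integer division in python 2.x.

```python
deg_c = 37.0                        # temperature in degrees Celsius
deg_f = deg_c * 9.0 / 5.0 + 32.0    # the same temperature in Fahrenheit

message = '%.1f C is %.1f F' % (deg_c, deg_f)
print(message)
```

Try reading the same three lines with ```value1``` and ```value2``` swapped in and see how much harder it is to tell what's going on.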

The second attribute of a variable is the *type* and we have discussed that above for ```str```, ```int``` and ```float``` (whilst touching on ```long``` and ```complex```). There is another basic type in python and that is the ```bool``` or Boolean value. We'll talk about those in a future unit.

Finally we have the *value* itself. This is the information the variable points to or contains (technically the variable points to a specific place in computer memory). The value and the type are closely related and a variable of a certain type can only contain certain values. However as the program runs the variable may change type (e.g. from a ```str``` to an ```int``` and you'll write such programs as we go on).
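A small sketch of a variable changing type as a program runs. The name ```answer``` and its value are made up for the example; the types are compared rather than printed because python 2.x and 3.x display types slightly differently.

```python
answer = '42'                  # starts life as a string
started_as_str = type(answer) == str

answer = int(answer)           # the same name now points to an int
ended_as_int = type(answer) == int

print(started_as_str)
print(ended_as_int)
print(answer + 1)              # arithmetic now works
```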

One final attribute of variables worth touching on is *scope*. This is, in some sense, where the variable can be seen from by python. We'll talk more about this when we talk about writing our own functions (instead of relying on the in-built python ones). For now you can let the idea of scope percolate in your brain.

So how do we link a variable name to its value and type? Well, we've already been doing that with the '=' sign. Once we've assigned a piece of information a name python automatically assigns the variable type for us - which is nice! Let's break down a variable assignment.

In [6]:
my_float = 3.14
print my_float
print type(my_float)

3.14
<type 'float'>


The assignment of variable to value takes place in the ```my_float = 3.14``` statement. The name of the variable is on the left hand side, the = sign links the left hand side and the right hand side. The value we want our variable pointing to is on the right hand side. 

What's happening under the hood? Well essentially we're asking python to take the ```float``` 3.14, find a location in the computer memory, put the value 3.14 there and mark that location with the label ```my_float```. Python recognises 3.14 as a float. In some languages you have to tell the computer up front what kind of variable you're dealing with.

This marking a location for each variable can cause some problems until you are used to it (and even afterwards). If you assign a variable, then assign another variable to the first variable both the labels point to the same place.

In [34]:
y = 7
x = y
print y
print x

7
7


In the above example ```y``` gets the value 7 and ```x``` is assigned the value of ```y```, i.e. ```x = y```.

If your intention was that these variables should change together you'll be surprised. If you change ```x``` then ```y``` doesn't change - even though you set ```x = y```. Here there aren't two 7s in memory, just one 7 with two markers (or labels). If we change the value of one label it doesn't change the other label - it still points to where it was told to originally.

In [35]:
x = 8
print y
print x

7
8


In the re-assignment of ```x``` to 8 you haven't changed ```y``` at all - even though you set ```x=y``` above. The take home message is **you did not create a copy of the variable ```y```**. Instead you just created another pointer (named ```x```) to the value 7, and when you change the label ```x``` the label ```y``` doesn't change.

When you re-assigned the 'name' ```x``` to the value 8 the link between ```x``` and 7 was broken and a new link between ```x``` and 8 was established. You should be aware of this issue but it shouldn't bother you too much. There's a nice (and very visual) explanation of the issue [here](http://foobarnbaz.com/2012/07/08/understanding-python-variables/) comparing python to C++ (a compiled language). Don't lose any sleep if this all goes above your head though!
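If you'd like to see the labels at work, the sketch below uses the ```is``` operator, which hasn't been covered in this document: it asks whether two names point to the very same object. Treat this as an illustration rather than something you need to memorise.

```python
y = 7
x = y            # x is a second label for the same object as y

print(x is y)    # True

x = 8            # x now points somewhere else; y is untouched

print(x is y)    # False
print(y)         # 7
```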

In variable assignment it's important to note that the ```=``` does not mean 'equals'. You can think of the ```=``` sign as meaning 'points to'. Python has a different operator, ```==``` that tests whether two things are equal to each other.
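A small sketch of the difference between the two operators (the values are made up for illustration):

```python
x = 7            # assignment: the name x now points to 7

print(x == 7)    # True
print(x == 8)    # False
```

The ```==``` test gives back a Boolean value (```True``` or ```False```) and leaves ```x``` unchanged.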

## Summary

Variables are used to 'name' objects. This makes it easier to track specific objects in our program. You should name your variables with meaningful names - this stuff is tricky enough without making it harder! The ```=``` sign is used in python to assign a variable name to an object. The name points to a place in the computer memory.

## Putting it together

Create a ```string```, an ```int``` and a ```float``` variable. Check the variable types using the ```type()``` function. We discussed above how you can multiply a ```string``` by an ```int``` and get back copies of the string. What happens if you multiply a ```string``` by a ```float```? What happens to the value of the ```float``` if you convert it to an ```int```? And what about if you convert it back to a ```float``` afterwards? 

In [3]:
# solution - remove from student doc
my_str = 'Iain'
my_int = 3
my_float = 7.668

print 'The variable types are %s, %s and %s' % (type(my_str), type(my_int), type(my_float))
my_str * my_float

The variable types are <type 'str'>, <type 'int'> and <type 'float'>


TypeError: can't multiply sequence by non-int of type 'float'

In [6]:
my_new_int = int(my_float)
print my_new_int
my_new_float = float(my_new_int)
print my_new_float

7
7.0
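Note that ```int()``` throws away the fractional part rather than rounding to the nearest whole number, which is why 7.668 became 7 and not 8. A brief sketch with made-up values (the variable names here are only for illustration):

```python
positive = 7.668
negative = -7.668

# int() drops everything after the decimal point
print(int(positive))          # 7
print(int(negative))          # -7

# converting back to float does not recover the lost digits
print(float(int(positive)))   # 7.0
```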


## Homework

The DNA sequence below is the [FASTA](http://en.wikipedia.org/wiki/FASTA_format) representation of the human [PPARG](http://www.ncbi.nlm.nih.gov/nuccore/NM_005037.5) gene. As you know, the [translation](https://en.wikipedia.org/wiki/Translation_%28biology%29) from DNA/RNA to protein is brought about by a triplet code whereby triplets of nucleotide bases code for a particular amino acid in the protein sequence. Furthermore, translation always starts at the 'ATG' codon which codes for the amino acid methionine (M). As you can see in the sequence below the first codon is not ATG.

For this assignment you'll need to look up the help for the ```index()``` method for strings.
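To give a flavour of what ```index()``` does, here is a minimal sketch on a made-up string (not the homework sequence). Remember that python counts positions from 0.

```python
word = 'banana'

# index() returns the position where the first match starts
pos = word.index('na')
print(pos)              # 2

# that position can then be used to slice the string
print(word[pos:pos + 2])   # na
```

If the text you ask for isn't in the string at all, ```index()``` raises a ```ValueError``` rather than returning a number.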

Cut and paste the sequence (only the sequence, not the FASTA header, i.e. paste from GGC onwards) into a string variable. Note that the sequence will span several lines so think about how to deal with that. Use the ```index``` method to find the position of the first ATG codon. Print out that position. Use python to extract the next three triplet codons and print these along with the amino acids in the protein sequence that these code for. You don't have to use python for this bit - there is a genetic code table [here](http://www.google.co.uk/imgres?imgurl=http://www.geek.com/wp-content/uploads/2013/12/genetic-code.jpg&imgrefurl=http://www.geek.com/science/scientists-discover-a-second-genetic-code-except-not-really-1579496/&h=1000&w=1172&tbnid=pvhXPeyS1CPIdM:&zoom=1&tbnh=156&tbnw=184&usg=__--K3O7YCY8zGaQbDTw400OngOhM=&docid=m1SWW2CQ_miS9M&itg=1). 

Hand in the code you used to find the first ATG and extract the next three codons. Include brief comments in the code explaining what it does.

In [13]:
# one potential solution

seq = '''GGCGCCCGCGCCCGCCCCCGCGCCGGGCCCGGCTCGGCCCGACCCGGCTCCGCCGCGGGCAGGCGGGGCC
CAGCGCACTCGGAGCCCGAGCCCGAGCCGCAGCCGCCGCCTGGGGCGCTTGGGTCGGCCTCGAGGACACC
GGAGAGGGGCGCCACGCCGCCGTGGCCGCAGAAATGACCATGGTTGACACAGAGATGCCATTCTGGCCCA
CCAACTTTGGGATCAGCTCCGTGGATCTCTCCGTAATGGAAGACCACTCCCACTCCTTTGATATCAAGCC
CTTCACTACTGTTGACTTCTCCAGCATTTCTACTCCACATTACGAAGACATTCCATTCACAAGAACAGAT
CCAGTGGTTGCAGATTACAAGTATGACCTGAAACTTCAAGAGTACCAAAGTGCAATCAAAGTGGAGCCTG
CATCTCCACCTTATTATTCTGAGAAGACTCAGCTCTACAATAAGCCTCATGAAGAGCCTTCCAACTCCCT
CATGGCAATTGAATGTCGTGTCTGTGGAGATAAAGCTTCTGGATTTCACTATGGAGTTCATGCTTGTGAA
GGATGCAAGGGTTTCTTCCGGAGAACAATCAGATTGAAGCTTATCTATGACAGATGTGATCTTAACTGTC
GGATCCACAAAAAAAGTAGAAATAAATGTCAGTACTGTCGGTTTCAGAAATGCCTTGCAGTGGGGATGTC
TCATAATGCCATCAGGTTTGGGCGGATGCCACAGGCCGAGAAGGAGAAGCTGTTGGCGGAGATCTCCAGT
GATATCGACCAGCTGAATCCAGAGTCCGCTGACCTCCGGGCCCTGGCAAAACATTTGTATGACTCATACA
TAAAGTCCTTCCCGCTGACCAAAGCAAAGGCGAGGGCGATCTTGACAGGAAAGACAACAGACAAATCACC
ATTCGTTATCTATGACATGAATTCCTTAATGATGGGAGAAGATAAAATCAAGTTCAAACACATCACCCCC
CTGCAGGAGCAGAGCAAAGAGGTGGCCATCCGCATCTTTCAGGGCTGCCAGTTTCGCTCCGTGGAGGCTG
TGCAGGAGATCACAGAGTATGCCAAAAGCATTCCTGGTTTTGTAAATCTTGACTTGAACGACCAAGTAAC
TCTCCTCAAATATGGAGTCCACGAGATCATTTACACAATGCTGGCCTCCTTGATGAATAAAGATGGGGTT
CTCATATCCGAGGGCCAAGGCTTCATGACAAGGGAGTTTCTAAAGAGCCTGCGAAAGCCTTTTGGTGACT
TTATGGAGCCCAAGTTTGAGTTTGCTGTGAAGTTCAATGCACTGGAATTAGATGACAGCGACTTGGCAAT
ATTTATTGCTGTCATTATTCTCAGTGGAGACCGCCCAGGTTTGCTGAATGTGAAGCCCATTGAAGACATT
CAAGACAACCTGCTACAAGCCCTGGAGCTCCAGCTGAAGCTGAACCACCCTGAGTCCTCACAGCTGTTTG
CCAAGCTGCTCCAGAAAATGACAGACCTCAGACAGATTGTCACGGAACACGTGCAGCTACTGCAGGTGAT
CAAGAAGACGGAGACAGACATGAGTCTTCACCCGCTCCTGCAGGAGATCTACAAGGACTTGTACTAGCAG
AGAGTCCTGAGCCACTGCCAACATTTCCCTTCTTCCAGTTGCACTATTCTGAGGGAAAATCTGACACCTA
AGAAATTTACTGTGAAAAAGCATTTTAAAAAGAAAAGGTTTTAGAATATGATCTATTTTATGCATATTGT
TTATAAAGACACATTTACAATTTACTTTTAATATTAAAAATTACCATATTATGAAATTGCTGATAGTA'''

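# get ATG position
# (seq is triple-quoted, so its newline characters are counted in this position)
atg_pos = seq.index('ATG')
print 'The ATG codon begins at %d.' % atg_pos

# next 9 bases (3 codons) after the ATG
three_codons = seq[atg_pos + 3:atg_pos + 12]

codon1 = three_codons[:3]
codon2 = three_codons[3:6]
codon3 = three_codons[6:]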

# print stuff out
print 'The next three codons are: %s, %s, %s.' % (codon1, codon2, codon3)
print 'The amino acids are Thr, Met and Val.'

The ATG codon begins at 175.
The next three codons are: ACC, ATG, GTT.
The amino acids are Thr, Met and Val.
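One thing to be aware of in the solution above: a triple-quoted string keeps its newline characters, so the reported position counts those as well as the bases. One possible way to deal with this, sketched here on a short made-up sequence (```toy_seq``` is hypothetical and is not the PPARG sequence), is to remove the newlines with the ```replace()``` method before calling ```index()```.

```python
# toy_seq is a made-up two-line sequence, used only for illustration
toy_seq = '''GGCC
CATGACC'''

# the newline between the two lines counts as a character
print(toy_seq.index('ATG'))    # 6

# remove the newlines so that positions refer to bases only
clean_seq = toy_seq.replace('\n', '')
atg_pos = clean_seq.index('ATG')
print(atg_pos)                 # 5

# slice out the codon that follows the ATG
next_codon = clean_seq[atg_pos + 3:atg_pos + 6]
print(next_codon)              # ACC
```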
