### What are variables and why do we need them?

When you're writing a program to perform a task, you usually need to store some information in the computer's memory that can be read or updated again later in the program. 

If you were asked to count the number of times the DNA sequence GAT occurs in the sequence below, you would probably work your way along the sequence and add 1 to your count every time you encountered GAT.

```
AGATGCTAGCTGAGATATCGATCG
```

You can think of the number you are storing in your head as a variable - somewhere to store the current count of GAT sequences. 
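The counting procedure described above could be sketched in Python roughly as follows. This is only an illustration: the variable names `sequence`, `motif` and `gat_count` are made up for this example, and the `for` loop it relies on has not been introduced yet. The point is simply that `gat_count` plays the role of the number you keep in your head.

```python
# Illustrative sketch - the variable names here are hypothetical
sequence = "AGATGCTAGCTGAGATATCGATCG"
motif = "GAT"

gat_count = 0  # the variable that stores the running count

# Work along the sequence one position at a time
for start in range(len(sequence) - len(motif) + 1):
    # Compare the three letters starting at this position with the motif
    if sequence[start:start + len(motif)] == motif:
        gat_count = gat_count + 1  # update the stored count

print(gat_count)
```

Each time a match is found, the value stored in `gat_count` is read, increased by 1 and stored again.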

**Variables** are simply reserved computer memory locations for storing information and you can store many different types of data in them, such as numbers, text or lists of items. Each of these different types of data is stored in a different type of variable so that Python knows how much memory to allocate to it and what kind of operations to allow on the data. The table below shows a list of the available variable types and what kinds of data they can contain.

### Variable Types

All variables have a type - this indicates the kind of data that is stored in the variable, e.g. string, integer, float, list, dictionary.

Over the next few notebooks we will look at each of the variable types in the table below. 

|Variable type|Description|Example|
|:-------:|:-------|:-------|
|Int | Integer: a positive or negative whole number (e.g. 32 or -7)|243|
|Float | A floating point number - i.e. a number containing a decimal point (e.g. 4.89)|3.14159|
|String | Text - for example DNA sequences, gene names or accession numbers|'Gastropoda'|
|List | A data structure containing a sequence of items (of any variable type(s))|[2.17, 4, "GCATCGATCG"]|
|Dictionary | A data structure composed of pairs of unique keys and (not necessarily unique) values| {"Scientific Name":"Mus musculus", "Common Name":"Mouse", "Genome size (Gb)":2.8}|
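
As a rough illustration of the table, the sketch below stores each example value in a variable and asks Python what type it has been given. The variable names are invented for this example, and the `type()` function it uses is explained later in this section.

```python
# Illustrative sketch - the variable names are hypothetical
gene_count = 243                                   # Int
pi_estimate = 3.14159                              # Float
taxon = 'Gastropoda'                               # String
mixed_items = [2.17, 4, "GCATCGATCG"]              # List
organism = {"Scientific Name": "Mus musculus",
            "Common Name": "Mouse",
            "Genome size (Gb)": 2.8}               # Dictionary

# Print the type that Python assigned to each variable
print(type(gene_count))
print(type(pi_estimate))
print(type(taxon))
print(type(mixed_items))
print(type(organism))
```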

In Python, we can simply create and set a variable by typing its name then assigning a value to it using the `=` sign. You can decide what to call your variable as long as you don't use words already reserved in Python - there is advice on picking good variable names later in this section.

To create a variable called `sequence_length`, you would type:

In [3]:
sequence_length = 431

Since sequence_length is stored in memory, we can now recall it and print it to screen using the `print()` function:

In [4]:
sequence_length = 431
print(sequence_length)

431


You can change the value of a variable at any point. In the example below we first set `sequence_length` to 431 and print its value, then we change the value to 104 and print the value stored in `sequence_length` again.

In [5]:
sequence_length = 431
print(sequence_length)
sequence_length = 104
print(sequence_length)

431
104


Python also allows you to simultaneously assign the same value to multiple variables, or to assign multiple values to multiple variables at the same time. In the example code below, `num1`, `num2` and `num3` are all set to 0.

In [6]:
num1 = num2 = num3 = 0
print(num1)
print(num2)
print(num3)

0
0
0


Similarly, the code below simultaneously sets the variables `num1` and `num2` to the integers 7 and 14 respectively, and assigns a string containing the DNA sequence "ACGTAGCTATTCG" to the variable `seq1`.


In [7]:
num1, num2, seq1 = ( 7, 14, "ACGTAGCTATTCG" )
print(num1)
print(num2)
print(seq1)

7
14
ACGTAGCTATTCG


### Basic Mathematical Operations

We can also perform manipulations or calculations using our variables. In the first example below, we set a variable called `number_of_gastropods` to 1042 and another variable named `garden_area` to 168. We then set a third variable, `gastropods_per_sqm`, to the result of dividing `number_of_gastropods` by `garden_area`. Finally, we print the value stored in `gastropods_per_sqm`: the average number of gastropods (snails and slugs) per square metre.

In [9]:
number_of_gastropods = 1042
garden_area = 168
gastropods_per_sqm = number_of_gastropods / garden_area
print(gastropods_per_sqm)

6.2023809523809526


In this second example, we directly print the result of adding the lengths of two oligonucleotide primers, each 28 nucleotides long, to a DNA sequence that is 341 nucleotides long. The next section of the tutorial will cover arithmetic operations like this in more detail.

In [1]:
sequence_length = 341
primer_length = 28
print(sequence_length + 2*primer_length)

397


#### Assigning different variable types

Python automatically infers what type of variable should be created according to the contents and the way you type it. If you surround a value with single or double quotation marks (' or "), Python will know that the variable type should be a string. To set a string (i.e. text) variable called `cell_type` to 'Hepatocyte', simply type:


In [24]:
cell_type = 'Hepatocyte'
print(cell_type)

Hepatocyte


We can check what kind of variable `cell_type` is by using the `type()` function. `type(cell_type)` returns the kind of variable, and we can use this along with the `print()` function to print this information to screen:

In [12]:
cell_type = 'Hepatocyte'
print(type(cell_type))

<class 'str'>


### Controlling variable type

When we assign a variable, Python sets the variable type based on how the value is written. If we assign a number with no decimal point to a variable, Python will automatically make the variable type an integer (int). If we add a decimal point, Python will make the variable a float, and if we enclose the value in quotation marks, Python will make the variable type a string.

In [13]:
sequence_length_as_int = 9 
sequence_length_as_float = 9.0
sequence_length_as_string = "9"

print(type(sequence_length_as_int))
print(type(sequence_length_as_float))
print(type(sequence_length_as_string))

<class 'int'>
<class 'float'>
<class 'str'>


#### Changing Variable Types

The variable `cell_type` is a string variable (i.e. it contains text), therefore you cannot perform operations such as arithmetic on it in the same way that you could on an integer or float variable:


In [14]:
cell_type = 'Hepatocyte'
print(cell_type + 3)

TypeError: Can't convert 'int' object to str implicitly

We have our first error: this is telling us that we have tried to arithmetically add an integer (3) on to a string (Hepatocyte), and as a mathematical operation this is nonsense. This is what is known as a 'TypeError', and if you are handling scientific datasets you will probably encounter them frequently. It simply means that the variable type does not match that which the code is expecting, either because you have manually entered it wrongly or because Python has misinterpreted what type of data is stored. As you will see in future sections, when you import data from files, Python sometimes stores numbers as strings rather than int or float.
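
The sketch below shows one way to look at this error without stopping the program, and one way to avoid it. It uses `try` and `except`, which have not been covered yet, and the variable name `label` is made up for this example. The exact wording of the error message depends on your Python version, so it may differ from the output shown above.

```python
cell_type = 'Hepatocyte'

# Attempting to add an integer to a string raises a TypeError
try:
    print(cell_type + 3)
except TypeError as error:
    print("TypeError:", error)  # wording varies between Python versions

# Converting the integer to a string first makes the operation valid:
# for strings, + joins the two pieces of text together
label = cell_type + str(3)  # 'label' is a hypothetical variable name
print(label)
```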

As we previously learned, we can force Python to make a variable a string, even when entering a number:

In [None]:
sequence_length = 431 #sequence_length is stored as a number - you can perform mathematics operations on it
sequence_length = '431' #sequence_length is stored and treated as a string (text)

The following division therefore won't work and will raise a `TypeError`, because sequence\_length is a string variable, not an int or float.

In [None]:
number_of_mismatches = 34
sequence_length = '431'
proportion_mismatched = number_of_mismatches / sequence_length
print(proportion_mismatched)

However, you can convert any variable on-the-fly to a different type (assuming the data within makes sense as that type). To convert a string to an integer, we pass it to the `int()` function:

In [None]:
number_of_mismatches = 34
sequence_length = '431'
proportion_mismatched = number_of_mismatches / int(sequence_length)
print(proportion_mismatched)

To convert to:
* integer, use `int()`
* string, use `str()`
* float, use `float()`
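
A short sketch of the three conversion functions listed above is given below. The variable names are invented for this example. The last part shows what happens when the data does not make sense as the requested type: Python raises a `ValueError` rather than guessing.

```python
# Illustrative sketch - the variable names are hypothetical
read_count_text = "431"                    # a number stored as text

read_count = int(read_count_text)          # string -> integer
read_depth = float(read_count_text)        # string -> float
read_count_label = str(read_count)         # integer -> string
whole_part = int(9.7)                      # float -> integer (decimal part is dropped)

print(read_count, read_depth, read_count_label, whole_part)

# "4.31" is not a whole number, so int() cannot convert it directly
try:
    int("4.31")
except ValueError as error:
    print("ValueError:", error)
```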

### What should I call my variable?

Variables should generally be lowercase with words separated by underscores as necessary to improve readability. Some Python coders prefer mixedCase, where capital letters denote word boundaries. 

It is useful to choose mnemonic variable names - that is, names that help us remember why we created the variable in the first place and what value(s) it contains. The following three blocks of code produce the same results when run by the computer, but vary in how easy they are for a human reader to understand.


In [1]:
#The variable names in this example don't tell you anything useful about what they contain
a = 670
b = 100
c = a / b
print(c)

6.7


In [3]:
#These variable names are obscure, irrelevant and therefore also unhelpful, but the mathematical operation is identical
longjing = 670
london = 100
lemur = longjing / london
print(lemur)

6.7


In [4]:
#These variable names make it clear that we must be calculating a concentration
#This is much easier when you revisit your code after a couple of weeks
milligrams = 670
millilitres = 100
concentration = milligrams / millilitres
print(concentration) 

6.7


### Variable naming guidelines

You can call your variable almost anything you like, although it is useful to follow the guide below:

* You cannot use any of Python's keywords - i.e. words that already have a meaning in the Python language (see table below). Also avoid words that can be confused with these terms (such as capitalized equivalents e.g. Yield), although doing so would not raise an error. 

* You should also avoid using function names as variable names. Later in this course we will learn more about functions - so far we have only used `print()`. If you give your variable the same name as a function, then when you try to use that function you will encounter an error. Therefore avoid variable names such as sum (there is a function called `sum()`), and instead be more specific, e.g. sum\_of\_squares.

* Variable names cannot start with a number.

* For single-character names, avoid using l, O or I (lowercase el, uppercase oh or uppercase eye), as these may be indistinguishable from the numbers 1 or 0 or may be mixed up with each other.

* Variable names are case-sensitive, so dna\_seq, dna\_SEQ, and dna\_Seq are all separate variables. Although technically we could use all of these combinations, it would be confusing both to the coder and anyone reading or editing the code.

* Avoid names that are too wordy (e.g. number\_of\_amino\_acid\_residues\_in\_protein\_motif) but do try to make it obvious what they are from the name (e.g. motif\_residue\_count) 
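
Some of these guidelines can be checked in code. The sketch below is only one possible approach, and the candidate names in it are made up for this example: it uses the string method `isidentifier()` to test whether a name is made of allowed characters and does not start with a number, and the standard `builtins` module to test whether a name is already used by a built-in function. It also shows that names differing only in case are separate variables.

```python
import builtins

# Hypothetical candidate names to test
candidate_names = ["motif_residue_count", "2nd_primer", "sum"]

allowed = {}
clashes_with_builtin = {}
for name in candidate_names:
    # True if the name uses only allowed characters and does not start with a digit
    allowed[name] = name.isidentifier()
    # True if Python already has a built-in (such as a function) with this name
    clashes_with_builtin[name] = hasattr(builtins, name)
    print(name, allowed[name], clashes_with_builtin[name])

# Names that differ only in case refer to different variables
dna_seq = "ACGT"
dna_SEQ = "TTGA"
print(dna_seq == dna_SEQ)
```

Note that `isidentifier()` does not check for reserved keywords, so a keyword such as `yield` would still pass this particular test.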

#### Reserved keywords in Python

<table>
<tr>
<td>False</td>
<td>class</td>
<td>finally</td>
<td>is</td>
<td>return</td>
</tr>
<tr>
<td>None</td>
<td>continue</td>
<td>for</td>
<td>lambda</td>
<td>try</td>
</tr>
<tr>
<td>True</td>
<td>def</td>
<td>from</td>
<td>nonlocal</td>
<td>while</td>
</tr>
<tr>
<td>and</td>
<td>del</td>
<td>global</td>
<td>not</td>
<td>with</td>
</tr>
<tr>
<td>as</td>
<td>elif</td>
<td>if</td>
<td>or</td>
<td>yield</td>
</tr>
<tr>
<td>assert</td>
<td>else</td>
<td>import</td>
<td>pass</td>
</tr>
<tr>
<td>break</td>
<td>except</td>
<td>in</td>
<td>raise</td>
</tr>
</table>
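
The exact set of reserved keywords depends on the version of Python you are running, so the table above may not match your installation exactly. As a quick check, the standard `keyword` module can list the keywords for your version, as in this short sketch (the variable name `reserved_words` is made up for this example):

```python
import keyword

# The list of reserved keywords for the Python version running this code
reserved_words = keyword.kwlist
print(reserved_words)

# Test individual words: keywords are case-sensitive
print(keyword.iskeyword("yield"))
print(keyword.iskeyword("Yield"))
```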


## Exercise

Each strand of DNA is made of nucleotides, which are also sometimes referred to as bases. There are four different bases in DNA: thymine (T), adenine (A), guanine (G) and cytosine (C). 
There are chemical cross-links between the two strands in DNA, formed by pairs of bases. They always pair up in a particular way, called complementary base pairing:
                    adenine pairs with thymine  (A-T)
                    guanine pairs with cytosine (G-C)
                    
The GC content is the percentage of bases in a DNA or RNA molecule that are either guanine or cytosine. The GC content can vary widely between different regions of an organism's genome and also varies widely between different organisms. GC content is often of interest to biologists as it can indicate the presence of DNA that may have originated from another organism's genome.

Complete the following code to calculate the percentage GC content and print the value.


In [None]:
dna_length = 14512 #length of DNA sequence in base pairs
number_GC_bases = 8534 #this is the number of bases that were either G or C
percentage_GC = #complete the code from here

* Correct the code below so that it can run without error. 


In [None]:
max_response = '5.3' #don't edit this line
half_max = max_response / 2 #max_response was declared as a string - add code to convert it to a float
print("50% of max dose response was " + half_max) #half_max is a float - convert it to a string