# 1. Data types and operators
This section will look at the basic building blocks of Python code - the data itself. We'll look at how different types of data are stored in variables, and how we can manipulate those variables using operators.

We'll explore the following topics:

1.1 - *Variables and Operators*

1.2 - *Basic Types: Booleans, Numbers and Strings*

1.3 - *Lists and Dictionaries*




## 1.1 Variables and operators

* We can think of a **variable** as a container which stores a piece of data for later use. 

* We **define** variables using the `=` operator.

* Operators such as `+`, `*` and `/` allow us to manipulate those variables. 

In the cell below, try writing the following three lines of code:

```python
a = 3
b = 7
a + b
```

When you have finished, hit `shift + enter` or `ctrl + enter` on your keyboard to run the code. Bonus points if you can work out what the difference between the two is!

**Bonus**: Replace `+` with other operators like `*`, `-`, `/`, `==` and `!=`. 


In [None]:
# Code Block 1



Now try the following:

```python
c = a * b
c - 1
c / 2
```

**Before hitting `shift + enter`**: Discuss with your neighbour what you think the output will be.

In [None]:
# Code Block 2


**Printing data**

You can see that the variables `a` and `b` have been stored and can be reused. However, the second line `c - 1` did not update the variable `c`, and its result was not printed. Instead, we just see the value of `c / 2` below the cell. In general, a Jupyter notebook only displays the value of the final line of a cell. To print intermediate lines, we can use the `print()` function:

Write the following code in the code block below:
```python
print(c)
print(c-1)
print(c/2)
c += 1
print(c/2)
```

In [8]:
# Code Block 3


We now have a print out for each `print()` statement in order. Later, we'll see how to make these print statements more expressive. 

### 🧠 Think-Pair-Share
The block above introduced the `+=` operator. Have a think about the following questions, and discuss with a neighbour:

* Can you explain how the line `c += 1` manipulated the variable `c`?
* What might the operators `-=` and `*=` do?
* Can you write a line of code that is equivalent to `c *= 2` without using the `*=` operator?


In [9]:
# Code Block 4


## 1.2 Basic Types: Booleans, Numbers and Strings
This section takes a hands-on look at three different types of data (booleans, numbers and strings) and explores how they act under different operators. 

### 1.2.1 Booleans
Booleans are perhaps the simplest data type, but they are often extremely useful. A Boolean can take one of two values: `True` or `False`.

We'll consider a scenario where we are trying to classify different peptides. The peptide might be associated with two variables: `contains_alanine` and `contains_asparagine`.

Use the built-in `type()` function to explore these variables:

```python

contains_alanine = True
contains_asparagine = False

print(contains_alanine)
print(type(contains_alanine))
print(contains_asparagine)
print(type(contains_asparagine))
```

In [None]:
# Code Block 5


**Booleans, `==` and `!=`**

We typically encounter booleans when we use the `==` and `!=` operators. These operators compare two variables to see if they have the same value. 
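
A common early stumbling block is mixing up `=` (assignment) with `==` (comparison). The short sketch below uses a made-up variable, `n_residues`, purely for illustration:

```python
# `=` stores a value in a variable; it asks no question
n_residues = 10

# `==` and `!=` compare two values and give back a boolean
is_ten = n_residues == 10
is_not_ten = n_residues != 10

print(is_ten)
print(is_not_ten)
```

Here the results of the comparisons are themselves stored in variables, which shows that a boolean is a piece of data like any other.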

In the following example, we use variables `n_alanine`, `n_asparagine` and `n_glutamine` to store the number of each amino acid present in a peptide. 

Explore these variables with the following code:

```python

n_alanine = 4
n_asparagine = 4
n_glutamine = 5

print(n_alanine == n_asparagine)
print(n_alanine == n_glutamine)
print(n_alanine != n_glutamine)
print(type(n_alanine == n_asparagine))
```

In [198]:
# Code Block 6


### 1.2.2 Numbers
There are three basic types of number Python can handle: `int`, `float` and `complex`:

* An `int` is defined by a whole number with no decimal point, for example `a=123`,

* A `float` is a decimal number, defined using `.`, for example `b=5.` or `c=5.4`,

* A `complex` is a complex number, and is defined using `j`, for example `d = 4j`. We aren't going to worry too much about those. 

Try the following code to use `float` and `int` numbers to define some characteristics of a peptide. We'll then use `==` and `type()` to explore the differences between these types of number:

```python

length = 5
hydrophobicity = 5.0

print(length==hydrophobicity)

print(type(length))
print(type(hydrophobicity))

```
💡 Extend this code to explore what happens when we combine `float` and `int` numbers with the `+`, `*` and `/` operators. What types of data are produced?

In [119]:
# Code Block 7


### 1.2.3 Strings
Not every piece of data we want to manipulate will be a number. When we want to handle characters, such as in plain text, we can store our data as a `string`.

We define strings using either `""` or `''`. We can also convert a variable into a string by using the built-in function `str()`. Try the following code to explore defining strings:

```python
a = "123"
b = '123'
c = 123
d = str(c)

print(a==b)
print(b==c)
print(b==d)
```


In [39]:
# Code Block 8


**Manipulating strings**

There are various operators and functions which allow us to manipulate strings. We'll take a string representation of the peptide chain: "EMAEVHCGFN" 

Explore the different ways of manipulating this string in code block 9 using the examples below.

```python

peptide = "EMAEVHCGFN"

# concatenating
print(peptide + peptide)
print(peptide + 'FRRA')
# indexing
print(len(peptide))
print(peptide[3])
print(peptide[:3])
print(peptide[3:6])
print(peptide[-1])
print(peptide[-5:])
print(peptide[100])  # index 100 is beyond the end of the string: what happens?
print(peptide[::-1])
# manipulating
print(peptide.lower())
print(peptide.replace('V', 'I'))
print(peptide.replace('V', ''))
print(peptide.split('V'))
```
💡 What do you notice about the way we index into strings in Python? What will the value of `peptide[0]` be?
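
Slices follow the pattern `sequence[start:stop:step]`, where the element at `start` is included and the element at `stop` is not. A minimal sketch, using a made-up string `letters` rather than the peptide itself:

```python
letters = "ABCDEFGH"

first_three = letters[0:3]   # start at 0, stop before 3
every_other = letters[::2]   # whole string, taking every second character
last_two = letters[-2:]      # negative indices count back from the end

print(first_three)
print(every_other)
print(last_two)
```

Leaving out `start` or `stop` means "from the beginning" or "to the end" respectively.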


In [19]:
# Code Block 9
peptide = "EMAEVHCGFN"


**The f-string**

We can insert variables into strings using an f-string (*formatted* string). To define an f-string, we simply place an `f` directly before the opening quotation mark. We can then use `{}` to place previously defined variables in our string. 

Use the code below to explore how f-strings work:

```python
peptide = "EMAEVHCGFN"
length = len(peptide)
mass = 1135.29

print(f'The peptide chain {peptide} contains {length} amino acids and has mass {mass} g/mol.')
```
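
F-strings can also control how a value is displayed, by adding a format specifier after a colon inside the braces. A small sketch that redefines the same variables so it runs on its own (the `.1f` specifier is a standard Python formatting option, chosen here just as an example):

```python
peptide = "EMAEVHCGFN"
mass = 1135.29

# .1f displays the number rounded to one decimal place
rounded = f"{peptide} has mass {mass:.1f} g/mol"

# any expression can go inside the braces, not just a variable name
summary = f"{peptide} contains {len(peptide)} amino acids"

print(rounded)
print(summary)
```

Note that the format specifier only changes how the number is shown; the value stored in `mass` is unchanged.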

💡 **Task**: Go back to your previous `print` statements to make them more informative!

**Don't forget to include the `f` before the quotation mark!**

In [152]:
# Code Block 10


### 🧠 1.2.4 Exercise: Combining Types
We are going to explore how three different types of variables behave under the binary operators `+`, `*`, and `==`.

In code block 11 (below), define the following three variables:

* `a = 3`
* `b = "AGCH"`
* `c = True`

Then use code block 12 and the file `binary_operators.xlsx` to complete the following steps:

1. Open the Excel file `binary_operators.xlsx`. This file contains three 3×3 tables—one for each operator.

2. In code block 12, try out all combinations of the variables `a`, `b`, and `c` using the relevant operator (e.g. `a * b`, `b * b`, etc.).

3. For each combination, fill in the table with a short description of the result. You might write something like:
   - `"10"` (if it returns a value)
   - A description of the value (if it is too long)
   - `"Error"` (if it throws an error)

**Don't be afraid of errors!** They are a normal and useful part of coding. If you get one, just copy the **last line** of the error message into your table and try to think about what it tells you.

**Think–Pair–Share**
  
Take a moment to reflect on the following questions, then discuss them with someone next to you:

- What might be the purpose of having different *data types* in a programming language?
- How does the function of an operator like `+` or `*` change depending on the data type?
- How would the third table change if we had used `!=` instead?
- How would the tables change if `c = False`?

In [29]:
# Code Block 11 (define variables a, b and c)


In [33]:
# Code Block 12 (explore different combinations of a, b and c with +, * and ==)


## 1.3 Lists and Dictionaries
When dealing with data, it is rare that we only want to handle a single value at a time. Lists and dictionaries (and other types like tuples or sets) let us store collections of data in a single variable.

These can be particularly useful in data science, where a list might hold ordered observations over time, or the sequence of amino acids in a peptide.

### 1.3.1 Lists
Lists are defined using the `[]` symbols, with individual elements separated by a `,`.

Like with strings, we can index into a list using `[]` after the list. Try out the following code to see this in action:

```python
peptide = ['A','G','T','H']
last_aa = peptide[-1]

print(f'The last value of the peptide {peptide} is {last_aa}')
```
💡 **Task**: Have a look at some of the other indexing techniques we used with strings - do they still apply to lists?

In [None]:
# Code Block 12


**Measuring Lists**

We can also use the `len()` function with lists to find how many elements they have.

💡 **Task:** Use the `len()` function and an f-string to print out a statement describing the number of amino acids in `peptide`. 

In [None]:
# Code Block 13


**Lists and Data Types**

Lists don't just have to hold strings; they can hold any type of data!

💡 **Task:** Create a list which contains a mix of `string`, `int` and `float` elements.  

In [13]:
# code block 14


**Listception**

Lists can even contain other lists. Here, we are taking two short peptides, Oxytocin and Bradykinin, and creating a new list containing both:

```python
oxytocin = ['C', 'Y', 'I', 'Q', 'N', 'C', 'P', 'L', 'G']
bradykinin = ['R', 'P', 'P', 'G', 'F', 'S', 'P', 'F', 'R']

peptides = [oxytocin, bradykinin]

print(peptides)
```

💡 **Task:** Copy the code above into Code Block 15, and extend it to answer the following:
* What is the length of the list `peptides`? 
* How might you get the second amino acid in `bradykinin` from the list `peptides`? 

*hint: You can stack up list indices, so if `x=[1,2,3]`, `y=[4,5,6]` and `z=[x,y]` then `z[0][0] == x[0] == 1`.* 
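
The hint can be sketched in code as follows, using the same `x`, `y` and `z` (the names `inner` and `value` are added here for illustration):

```python
x = [1, 2, 3]
y = [4, 5, 6]
z = [x, y]

inner = z[1]      # the first index picks out a whole inner list
value = z[1][2]   # the second index then picks an element from that list

print(inner)
print(value)
```

Reading the indices from left to right mirrors how the lists are nested: outer list first, then inner list.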

In [None]:
# Code Block 15


**Modifying lists**

We can use indexing to update the value of an element in a list. 

In the example below, the Oxytocin peptide has the wrong value for the first amino acid. We can fix this by indexing into the list. 

Try the following code:

```python

oxytocin = ['Y', 'Y', 'I', 'Q', 'N', 'C', 'P', 'L', 'C']
print(oxytocin)
# amend the first value
oxytocin[0] = 'C'
print(oxytocin)
```

💡 **Task**: The last amino acid should be `G`. How might you change the last element of a list?

In [None]:
# Code Block 16


**Appending and Extending lists**

We can add new data to a list using the `.append()` method. Because this method belongs to lists specifically (unlike, say, `print()`, which can be used on many types), the syntax for appending variable `y` to list `x` is: 

```python
x.append(y)
```
It's important to note that this is an *in place* method, which means the variable is updated when we call it, like we saw with `+=`. Hence, the following code would print out `[1,2,3]` then `[1,2,3,4]`:

```python
x = [1,2,3]
print(x) # [1,2,3]
x.append(4)
print(x) # [1,2,3,4]
```
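
One consequence of `.append()` working in place is that it does not hand back a new list. A short sketch of this common pitfall (the variable name `result` is illustrative only):

```python
x = [1, 2, 3]

# .append() changes x itself and returns None,
# so there is no need to assign its result to anything
result = x.append(4)

print(x)
print(result)
```

Writing `x = x.append(4)` would therefore replace the list with `None`, which is rarely what is wanted.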

In the following example, the `append()` method is used to add a $CONH_2$ C-terminal group to our oxytocin peptide.

```python
oxytocin = ['C', 'Y', 'I', 'Q', 'N', 'C', 'P', 'L', 'G']
c_terminal = 'CONH_2'
print(oxytocin)

oxytocin.append(c_terminal)

print(oxytocin)
```

💡 **Task**: The `extend()` method works in a similar way, but concatenates one list with another. Use the `<list>.extend(<list>)` method to prepend the peptide chain `['C', 'Y', 'T']` to our oxytocin chain. 

In [None]:
# Code block 17


### 1.3.2 Dictionaries
Dictionaries allow us to store data in *key-value* pairs. They allow us to have easily accessible information, and to link variables with specific values. We define a dictionary using the `{<key>:<value>}` syntax. 

We will use a dictionary to store data about the amino acid Cysteine:

```python

cysteine = {"name":"Cysteine", "abbreviation_3":"Cys", "abbreviation_1":"C", "molar_mass":121.15}

print(cysteine)
```

The objects to the left of each `:` are known as *keys*. These allow us to access the data stored in the *values* to the right of the `:`. Different key-value pairs are separated by a `,`. 


💡 **Notes**:
- Keys MUST be unique. We can't put in an additional `"molar_mass"` key pointing to a different value. 
- Values can be objects of any *type* (e.g. `string`, `int` etc). There are some restrictions on key types: for example, a `list` cannot be used as a dictionary key. 
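
As a rough illustration of these notes, the sketch below tries out a few kinds of key. The dictionary names are made up, and `try`/`except` (not covered in this section) is used only so that the cell keeps running after the error:

```python
# Strings, numbers and tuples can all be used as keys
masses = {"Cys": 121.15, 1: "one", ("C", "Y"): "pair"}

# A list is mutable, so Python refuses to use it as a key
try:
    bad = {["C", "Y"]: "pair"}
    list_key_allowed = True
except TypeError as error:
    list_key_allowed = False
    print(f"TypeError: {error}")

# Repeating a key does not store two entries: the last value wins
repeated = {"molar_mass": 1.0, "molar_mass": 121.15}
print(repeated)
```

A tuple is a reasonable substitute whenever a sequence is needed as a key.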

Use the code box below to define the `cysteine` variable as a dictionary:

In [205]:
# code block 18


**Accessing data from dictionaries**

We can access the data stored in a dictionary using the syntax `dictionary[key]`. For example, to get the 3 letter abbreviation from our `cysteine` dictionary, we would use:

```python
cysteine_abbr = cysteine['abbreviation_3']
```

Try using an f-string and the `cysteine` dictionary to fill in the missing information in the string below and print out the result:

```python

"Cysteine, abbreviated as ___ or _, has molar mass ___ g/mol"
```

In [None]:
# Code Block 19


**Accessing data using `.get()`**

If you try to access a dictionary using a key it does not contain, Python will raise an error. In more complex programs this might break your code or cause unwanted bugs. We can use the `.get()` method to protect our code against these types of errors. 

The `.get()` method allows you to specify a return value for keys which do not appear in the dictionary, using the syntax `dictionary.get(<key>, <null_return>)`.

Try the following code in the code block below:

```python
cys_solubility = cysteine['solubility']
print(cys_solubility)
```
And compare the results to the following:

```python
cys_solubility = cysteine.get("solubility", "Not yet defined")
print(cys_solubility)
```
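
If the second argument is left out, `.get()` falls back to `None` rather than raising an error. A minimal sketch with a made-up dictionary, `sample`:

```python
sample = {"name": "Cysteine"}

present = sample.get("name", "Not yet defined")        # key exists
missing = sample.get("solubility", "Not yet defined")  # key absent, default used
no_default = sample.get("solubility")                  # key absent, no default given

print(present)
print(missing)
print(no_default)
```

Note that `.get()` never adds the missing key to the dictionary; it only changes what is returned.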

In [None]:
# Code Block 20


**Adding new data to a dictionary**

We can add the `"solubility"` key (and associated value) to our dictionary using the code below:

```python

cysteine["solubility"] = 227
```

We can also use the `.update()` method for dictionaries to do the same thing. This is similar to the `.append()` method for lists:

```python
cysteine.update({"solubility":227})
```
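
One practical difference between the two approaches is that `.update()` can add or overwrite several entries in a single call. A sketch with a made-up dictionary, `record`, holding placeholder values rather than real chemical data:

```python
record = {"name": "example"}

# bracket syntax adds one key at a time
record["colour"] = "white"

# .update() can add new keys and overwrite existing ones in one call
record.update({"state": "solid", "colour": "off-white"})

print(record)
```

Both approaches modify the dictionary in place, so there is no need to reassign the result.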

💡 **Task**: Use either method to add both the solubility and chemical formula ($C_3H_7NO_2S$) to our `cysteine` dictionary.

In [206]:
# Code Block 21


**Updating existing entries**

There was a mistake in the solubility of Cysteine - it should read "277". Luckily, we can use exactly the same methods to update existing entries.

Use either the `dictionary[<key>] = <value>` or `dictionary.update({<key>:<value>})` methods to fix the error in `cysteine`.

In [None]:
# Code Block 21


**Turning `dict` into `list`**

Sometimes, we might want to iterate over the entries in a dictionary. We have three methods to help us do this: `.keys()`, `.values()` and `.items()`.

Use code block 22 and the `cysteine` dictionary to explore these built-in dictionary methods. 
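
On their own, these methods return special "view" objects rather than lists, so wrapping them in `list()` is one way to get an ordinary list. A sketch using a made-up dictionary, `record`:

```python
record = {"name": "example", "state": "solid"}

keys = list(record.keys())
values = list(record.values())
items = list(record.items())   # each item is a (key, value) tuple

print(keys)
print(values)
print(items)
```

The entries come out in the order in which they were added to the dictionary.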

In [None]:
# code block 22


### 1.3.3 Lists of Dictionaries

It is common to structure data in the form of a list of dictionaries. This is the fundamental *data structure* which we will use later to build dataframes.

A list of dictionaries would be structured something like this:

```python

list_of_dicts = [
    {"key_1a": "value_1a", "key_1b": "value_1b"},  # further key-value pairs can follow
    {"key_2a": "value_2a", "key_2b": "value_2b"},  # further dictionaries can follow
]
```
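
Reading a value back out of this kind of structure takes two steps: a list index, then a dictionary key. A minimal sketch with made-up placeholder data (`records` and its contents are illustrative only):

```python
records = [
    {"name": "first", "score": 1},
    {"name": "second", "score": 2},
]

# index into the list first, then look up a key in that dictionary
second_score = records[1]["score"]

print(second_score)
```

This is the same idea as stacking list indices, except that the second lookup uses a key instead of a position.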

We are going to use this data structure to store information about different amino acids. We'll use the same dictionary structure that we used for `cysteine`, so we will need the 1- and 3-letter abbreviations, along with the molar mass, chemical formula and solubility (at 25 degrees).  

1. Using the internet, find the relevant data for the amino acids "Tyrosine", "Asparagine" and "Proline". 

2. Store the data for each amino acid in separate dictionaries. 

3. Create a list of dictionaries called `amino_acids` containing those amino acids plus Cysteine.

How would you get the mass of Proline from your list of dictionaries?

How might you structure your data differently if you wanted to use the syntax `amino_acids["proline"]` to access this data?

In [11]:
# code block 23
