# Introduction to Python

MiCM Workshop - February 11, 2026

Benjamin Z. Rudski, PhD Candidate, Quantitative Life Sciences, McGill University

Dear `Reader | Workshop Attendee`,  
Welcome! In this interactive Jupyter Notebook, I will introduce you to the [Python](https://www.python.org) programming language. In this workshop, we'll journey from the beginnings of storing data in variables and doing simple calculations to writing powerful functions to perform repeatable operations.

This notebook is the **student version**, which contains several blanks where I will write code during the workshop and where you can fill out exercises. There is a [**solution version**](../results/IntroToPython.ipynb) in the `results` folder. I recommend trying the exercises yourself before looking at the solutions. There is often more than one way to answer a programming question, so focus on understanding the code that you are writing instead of just copying my answers. You may come up with an answer better than the one I've provided!

Here's the outline of this workshop:

1. **Module 1 -- Python Basics (1 hour)**
    1. Hello, World!
        1. Into the Python-verse!
        2. Python Tools
    2. Variables
        1. Variable names
        2. Variable Assignment
    3. Numbers and Comparisons
        1. Mathematical operations
        2. Booleans and logical operators
    4. Intro to strings
        1. String slicing and indexing
        2. String formatting
        3. String methods
    5. **Exercises:** Temperature conversion and DNA GC content
2. **Module 2 -- Collections (1 hour)**
    1. Lists and List Methods
        1. Length of a list
        2. List slicing
        3. Adding elements
        4. Removing elements
        5. Other useful list methods
    2. Tuples
        1. Tuple unpacking
    3. Dictionaries
        1. Retrieving elements
        2. Adding and modifying elements
        3. Removing elements
    4. **Exercises:** Working with collections
3. **Module 3 -- Intro to Control Flow and Loops (40 minutes)**
    1. Control Flow: the if statement
    2. Loops
        1. while loops
        2. Iteration with for loops
        3. Interrupting loops
    3. **Exercises:** Working with Strings and Collections for DNA and Protein Processing
4. **Module 4 -- Introduction to Functions (30 minutes)**
    1. Function Overview
        1. What is a function?
        2. Calling built-in functions
    2. Writing Custom Functions
        1. Function definition
        2. Function parameters
        3. Function Return values
    3. Documenting Functions
        1. Defining function docstrings
    4. To script, or not to script?
    5. **Exercises:** Writing functions for biological sequences
5. **Module 5 -- Where to go from here (10 minutes)**
    1. What to learn next? How?
    2. How to get help and how not to get help
    3. Glimpse of other cool programming topics


When this workshop is over, you should be able to write simple Python scripts. More importantly, I am hoping to give you *the tools* so that you can learn new Python skills and read documentation to find what you need. **In my opinion, the most important part of programming is knowing how to get help when you need it.**

This workshop material is based on my previous workshop material, as well as workshop material by Najia Bouaddouch.

# Module 1 - Python Basics

In this section, we'll see the basic, foundational concepts of programming in Python. We'll start with the basics of mathematical operations and we'll see variables for storing data. Then, we'll also start seeing how to get things done in Python. Along the way, I'll also point out possible places where users of different programming languages need to pay special attention.

**Topics:**

1. Hello, World!
    1. Into the Python-verse!
    2. Python Tools
2. Variables
    1. Variable names
    2. Variable Assignment
3. Numbers and Comparisons
    1. Mathematical operations
    2. Booleans and logical operators
4. Intro to strings
    1. String slicing and indexing
    2. String formatting
    3. String methods
5. **Exercises:** Temperature conversion and DNA GC content

## Hello, World!

Welcome to programming in Python! Before we get too far into the material, there are a couple of things we'll cover...

### Into the Python-verse!

When learning a new programming language, it's conventional that the first program we write is the "Hello, World!" program. This is a simple program that writes the text "Hello, World!" to the screen. In Python, it's quite easy to do:

In [None]:
# Your exciting first line of Python code here!
# In this line of code, we'll display, or print, the text string "Hello, World!"


This very simple program introduces a few important ideas. You'll notice that the first line doesn't really look like code. Actually, it's not! The first line is a **comment**. We (and the computer) know this because the line starts with the symbol `#`. Python ignores that symbol and everything that comes after it, letting you write notes about what your code is doing. It's very important to put comments in your code, especially if you're going to need to come back to it after a few weeks or if you're going to share it with other people.

On the second line, we have two things:
* the `print` *function*
* the *string* "Hello, World!"

The `print` function displays output to the screen or to the console. While it's not necessary in this Jupyter notebook (which automatically outputs the result of the last line of code), it's very helpful if you're ever writing code in a different program, like *PyCharm* or *Spyder*. We'll discuss functions in more detail later, but the idea is that functions take inputs, known as **arguments**, do operations on them and optionally return some sort of modified result. Here, the `print` function takes in the **string** of text "Hello, World!", writes it onto the screen and doesn't return any new data.

The **argument** that we pass to this function is the text **string** `"Hello, World!"`. We'll discuss strings in more detail later. The important thing is that a string is a group of characters surrounded by quotation marks (either single quotes `'Hello'` or double quotes `"Hello"`).
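
The small sketch below puts these ideas in one place. It has comments both on their own line and after code, and it calls `print` with several arguments. All of the text is just illustrative. It also checks the claim above that `print` doesn't return any new data: in Python, it gives back the special value `None`.

```python
# A comment on its own line: Python skips this whole line.
print("Hello again!")  # A comment can also follow code on the same line.

# print can take several arguments, separated by commas;
# it displays them with a single space between each one.
print("Python", "is", "fun")

# print doesn't return any new data: it gives back None.
result = print("Printing also gives back None")
print(result)
```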

Don't worry if this doesn't make sense! We'll explain each part of it as we go along. By the end of this workshop, you'll understand what this line does!

Congratulations! You've now passed an important milestone!

### Python Tools

Based on our first example, you can see that we've managed to embed some code **directly** into our learning material text... I owe you a bit more of an explanation...

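Python is an **interpreted** programming language. Basically, that means your code isn't turned into a standalone program ahead of time. Instead, another program, the Python *interpreter*, reads your code and runs it statement by statement. As it goes, it translates your code into instructions that your computer understands.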

How can we write this code? Well, there are two main categories of software we can use.

#### Read-Eval-Print-Loop (REPL)

Since Python is interpreted, a single line (or short block) of code can be run on its own. So, we can work in an environment where we write small chunks of code and run them immediately.

Here are some examples of such tools:
* the `python` console on the command line (offers the bare minimum)
* the IPython command line console (offers enhanced completion features, as well as syntax highlighting and history)
* Jupyter notebooks (like this one, which offer the ability to mix text and code)

These tools offer certain advantages:
* it's easy to get results quickly
* you can test out small snippets of code
* your environment is gradually filled with data that you can interact with on-the-fly

So, REPLs offer a great experience if you're learning, exploring a new idea, or if you want to take advantage of Jupyter notebooks and their ability to mix code and explanations.

But... what if you don't want to be limited to writing small blocks of code?

#### Integrated Development Environment (IDE)

An IDE is a software tool that enables you to not only write code, but also manage your code files.

Here are some examples:
* Spyder (similar user interface to MATLAB and RStudio)
* PyCharm
* Microsoft Visual Studio Code

These tools are extremely powerful, and almost universally offer the following features:
* ability to work on large projects containing multiple files
* rich support for code completion (including using LLMs)
* code analysis and error detection
* integration with version control systems (like Git)

Using an IDE helps with larger tasks, like package development or writing complicated scripts that you want to run on large datasets. Keeping your code organized in files and projects also makes it easier to reuse and share.

#### What should you use?

Great question! You tell me!

The answer depends on your use case. I've had projects where I've used both. Here are some things to consider:

* What's your major focus? Code or analysis of results?
    * If code (analysis can wait) - consider writing a script in an IDE.
    * If analysis (you want to provide lots of descriptions and tell your reader about what they're seeing in the results) - consider using a Jupyter notebook.
* Do you want to share what you're working on?
    * Yes - either IDE or Jupyter notebook (see above).
* Are you just testing something out?
    * Yes - use a REPL
    * No - store the code in a file (don't rely on being able to go back through the console history... you never know when it might disappear).
* Are you developing a package?
    * Yes - find a good IDE

This isn't an exhaustive list, but hopefully you get the picture. There's no one-size-fits-all solution. You need to find the setup that works best for your uses.

**We're sticking with a Jupyter notebook for this workshop, as it allows us to easily try out many different things and it lets me combine code examples with my explanations.** As you work on your own projects, decide what tools will best help you to achieve **your specific goals**.

And with that, let's dive into programming!

## Variables

To store data, we use **variables**. A **variable** gives a *name* to a piece of data stored in memory so that you can easily access it later. The information stored in a variable can change (or **vary**).

**Note:** Python has no constants. Only variables. If you come from a language that has constants, I'm sorry.

### Variable Names
There are rules for naming a variable:
* Variable names are **case-sensitive**.
* A variable name must contain only letters, numbers and underscores.
* A variable name cannot start with a number.
* A variable name cannot be the same as a reserved word in Python (see [here](https://docs.python.org/3/reference/lexical_analysis.html#keywords) for list).

A variable name may consist of multiple words combined. There are a few different conventions for putting words together. Two common ones are known as `snake_case` and `camelCase`:
* In `snake_case`, all letters are lowercase and words are separated by underscores.
* In `camelCase`, words are combined with no spaces or underscores, and every word after the first starts with a capital letter.

Different people use different conventions. Your code editor may suggest one over the other (for example, PyCharm prefers `snake_case`). The choice depends on your project setup and any existing code you may be adding to.

**Notes:** 

* Although you can combine words together, try to keep variable names reasonably concise.
* Although Python has no constants, `ALL_CAPS_NAMES` are sometimes used to denote variables that shouldn't change.
* Variable names *can* start with underscores, but this often has a special meaning.
* By **strongly encouraged** convention, we start variable names with a **lowercase** letter (or an underscore followed by a lowercase letter).

Let's see some examples of valid and invalid variable names:

| Invalid | Valid |
| ------- | ----- |
| `my_variable12.3` | `my_variable12_3` |
| `-myVariableName2`| `myVariableName2` |
| `@myVariable`| `myVariable` |
| `my-variable&` | `my_variable` | 
| `my+variable` | `my_variable` | 
| `23variable` | `my_variaBle_32` |
| `myV#ariable` | `myVariable` |
| `import` | `my_import` |
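
You can ask Python itself to double-check a name from the table. The string method `isidentifier()` and the standard-library `keyword` module both help here. This sketch assumes a reasonably recent Python 3. Note that `isidentifier()` alone does not catch reserved words like `import`.

```python
import keyword

# str.isidentifier() checks the letters/numbers/underscores rule,
# including "can't start with a number".
print("23variable".isidentifier())       # False: starts with a number
print("my-variable&".isidentifier())     # False: contains - and &
print("my_variable12_3".isidentifier())  # True

# isidentifier() does NOT know about reserved words, so "import" passes...
print("import".isidentifier())           # True
# ...which is why we also ask the keyword module.
print(keyword.iskeyword("import"))       # True: reserved, can't be a variable name
print(keyword.iskeyword("my_import"))    # False: fine to use
```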

### Variable Assignment

The way that we assign a variable is easy. We just use the `=` sign. That's it. We can also change the value of a variable by just assigning a new value using the equal sign (and so, the value **varies**).
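
Here's a quick sketch of assignment, reassignment and a change of type. It uses a made-up variable `temperature` and the built-in `type` function, which tells you what type of data a variable currently holds.

```python
# Assignment: store 21 under the name temperature
temperature = 21
print(temperature, type(temperature))   # 21 <class 'int'>

# Reassignment: the same name now holds a different value
temperature = 25
print(temperature, type(temperature))   # 25 <class 'int'>

# The new value doesn't have to be the same type as the old one
temperature = "warm"
print(temperature, type(temperature))   # warm <class 'str'>
```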

Now, let's do a few examples of variable assignment. Here, we'll make use of the `print` function to track the value of the variables.

1. **Assignment**  
 Let's create a variable called `my_variable` with the value `42`.

In [None]:
# Your code here for variable assignment


2. **Reassignment**  
 We can easily reassign a value using the equal sign again. Let's re-assign our variable `my_variable` to have the value `16`.

In [None]:
# Your code here for variable reassignment


3. **Changing type**  
 There's no requirement for the new value to be of the same type as the original. Let's assign the string `"Hello"` to our variable `my_variable`:

In [None]:
# Your code here to assign a string


We can now store basic information... but we can't really do anything yet...

Let's kick things up a notch...

## Numbers and Comparisons

We've seen how we can store data in variables. But just storing the data is boring! In this section, we'll start talking about things that we can *do* with the data. Specifically, we'll see operations that we can do on numbers and Booleans.

### Mathematical Operations

Python lets you perform simple mathematical operations on numbers. You already know these ones very well:
* **Addition** is performed using the `+` operator
* **Subtraction** is performed using the `-` operator
* **Multiplication** is performed using the `*` operator
* **Division** is performed using the `/` operator (does not round, and always gives a decimal result, even for `8 / 2`)

Python offers a few other operations as well:
* **Exponents** can be taken using the `**` operator (**NOT** `^`)
* **Modulus** (remainder) can be taken using the `%` operator (**Warning:** for anyone who uses MATLAB, this is **not** a comment!)
* **Integer division** (dividing and rounding down) can be performed using the `//` operator (**Warning:** for anyone who knows Java or C or any number of other languages, this is **not** a comment in Python!)
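
Here's a sketch of each operator in action, using two arbitrary example values. The last lines show two surprises worth knowing. First, `//` rounds *down*, so negative results move away from zero. Second, `^` is a completely different operation.

```python
x = 7
y = 2

print(x + y)    # 9
print(x - y)    # 5
print(x * y)    # 14
print(x / y)    # 3.5 (division never rounds)
print(x ** y)   # 49 (7 to the power of 2)
print(x % y)    # 1 (7 divided by 2 leaves a remainder of 1)
print(x // y)   # 3 (3.5 rounded down)

print(8 / 2)    # 4.0 (a decimal result even when it divides evenly)
print(-x // y)  # -4 (-3.5 rounded DOWN is -4, not -3)
print(x ^ y)    # 5 (^ is "bitwise XOR", NOT a power!)
```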

To perform a basic mathematical operation, all you need to do is type in the numbers, along with the operator, in the same way that you'd write the expression on paper. For example, to add 5 and 4, we would write the following:

In [None]:
# Put your code here


We can also chain operations together. Remember that the usual order of operations (**BEDMAS**: brackets, exponents, division and multiplication, addition and subtraction) applies.

This example contained integers, known in Python as `int`s.

We can also do calculations that involve decimal numbers, known as **floating point numbers** or simply, `float`s. We can also mix the two different types of numbers.
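
Here's a short sketch of chaining operations and mixing `int`s with `float`s. The values are arbitrary.

```python
print(2 + 3 * 4)     # 14: multiplication before addition
print((2 + 3) * 4)   # 20: brackets come first
print(2 * 3 ** 2)    # 18: exponents before multiplication

# Mixing an int and a float gives a float
mixed = 3 + 1.5
print(mixed, type(mixed))   # 4.5 <class 'float'>
```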

These rules don't only apply when working with numbers. We can also plug in **variables** that hold numeric values. We can also **store** the result in a new variable using the assignment operator `=`.

For example, let's set `a=5`, `b=4`, `c=2`. Let's compute the following:

* $a\times b - c$
* $\text{floor}(b^2 / a)$
* $(a + b) \mod c$

In [None]:
# Your code here

# Example 1
ans1 = ...
print("First answer:", ans1)

# Example 2
ans2 = ...
print("Second answer:", ans2)

# Example 3
ans3 = ...
print("Third answer:", ans3)

Well, let's say we want to update the original variable...

Let's create a variable called `my_variable` with the value `35`. Let's then multiply it by `2` and store this result in the same `my_variable` variable.

In [None]:
# Your code here


This assignment looked a bit bulky! For some of these operations, we have a shortcut so that we don't have to rewrite the variable name twice. For each operation, we can use a new assignment operator:
* We replace assignment and `+` with `+=`
* We replace assignment and `-` with `-=`
* We replace assignment and `*` with `*=`
* We replace assignment and `/` with `/=`
* We replace assignment and `**` with `**=`
* We replace assignment and `%` with `%=`
* We replace assignment and `//` with `//=`

So, we can rewrite the last example we did:

In [None]:
# Your code here
my_variable = 35

# Multiply by 2 and store in same variable
my_variable = my_variable * 2

# Show the result
my_variable

For more information on `int`s and `float`s and the numeric types in Python, see [this page](https://docs.python.org/3/library/stdtypes.html#typesnumeric) from the official Python documentation.
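
As a sketch of the other shortcuts, here's a made-up variable `count` updated several times in a row. Each comment shows the longer form and the new value.

```python
count = 10
count += 5    # same as count = count + 5   -> 15
count -= 3    # same as count = count - 3   -> 12
count //= 5   # same as count = count // 5  -> 2
count **= 3   # same as count = count ** 3  -> 8
count %= 3    # same as count = count % 3   -> 2
print(count)  # 2
```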

### Booleans and logical operators

A **boolean** represents a value that is either `True` or `False`.

In Python, these are represented as... either `True` or `False`.

That's it.

> **Warning**
>
> `True` and `False` are **case-sensitive**: `true` and `FALSE` won't work.

In this section, we'll see how to generate them, and then we'll see fun things we can do with them!

#### Comparisons

Think back to when you were starting to learn math... What was one of the first things they taught you? For me, it was **comparisons** and **inequalities**. We had two numbers, and we had to put the correct sign, `>,<,=` in between (some of you were maybe also told to think of a crocodile opening its mouth to the bigger number...).

Well, this is an important idea in programming too! We can use the following operations to generate boolean values. Let's say that `a` and `b` are both numbers (either `int`s or `float`s):
* `a > b` -- **greater than**, evaluates to `True` if `a` is bigger than `b`, otherwise evaluates to `False`
* `a >= b` -- **greater than or equal to**
* `a < b` -- **less than**
* `a <= b` -- **less than or equal to**
* `a == b` -- **equal** -- ***NOTE:*** there are ***TWO*** equal signs!!!!!
* `a != b` -- **not equal**

Again, I want to emphasize that for the equals comparison, you must must must put two equal signs `==`! Otherwise, Python will think you're trying to assign a variable and it will get mad at you and give you an error!

Also, for `>=` and `<=`, the order of the two signs matters! Do **NOT** write `=>` or `=<`! If you forget, remember that the order is the same as we read it. **Less than or equal to** is first *less than*, so `<`, and then *equal to*, so `=`, so the order is `<=`.
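
Here's a small sketch that puts assignment and comparison side by side. The value of `x` is arbitrary.

```python
x = 5            # ONE equal sign: assignment, stores 5 in x
print(x == 5)    # TWO equal signs: comparison -> True
print(x == 6)    # False
print(x == 5.0)  # True: an int and a float with the same value are equal

print(x <= 5)    # True: "less than, or equal to"
print(x >= 6)    # False
# Writing x =< 5 instead would stop with a SyntaxError.
```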

Now, let's see some examples:

In [None]:
# Your code here


# Complete these lines: # Your code here
print("a is greater than b:", ...)
print("a is less than b:", ...)
print("a is equal to b:", ...)
print("a is not equal to b:", ...)
print("a is greater than or equal to b:", ...)
print("a is less than or equal to b:", ...)

Feel free to change the values of `a` and `b` and see how the output changes!

These operations don't only work on numbers! We can use `==` and `!=` on just about any other data. Let's see some examples on strings:

In [None]:
# Your code here for string comparisons


These types of comparisons are very important. We'll see why in a bit... But first, let's see some other cool things we can do with Booleans.

#### Boolean Operations

We've seen how to generate booleans using numbers and strings. We can also perform operations on booleans to get... more booleans! These three operations are **logical operations**:
* `and`
* `or`
* `not`

#### The `and` operation
The `and` operation takes **two** boolean values `a` and `b`. If **both** `a` and `b` are `True`, then `a and b` is also `True`. Otherwise, `a and b` is `False`. People coming from other programming languages may know `and` as `&&` or `&`. We can represent this operation using a **truth table**:

| `a` | `b` | `a and b` |
| --- | --- | --- |
| `False` | `False` | `False` |
| `False` | `True` | `False` |
| `True` | `False` | `False` |
| `True` | `True` | `True` |

In practice, you'll often work with Booleans that you've generated using comparisons. Now, let's do a more complicated example. Let's set `a=4`, `b=5` and `c=6` and evaluate `(a < b) and (c > b)`:

In [None]:
# Your code here for the example


Let's think about that last example: we have `a=4`, `b=5`, `c=6`. We're looking at the logical expression
```python
a < b and c > b
```

So, we start by breaking it up into the two parts:
* `a < b`
* `c > b`

Now, we look at each part separately:
* `a < b`: well, we have `a=4` and `b=5`, so we have `4 < 5`, which is `True`
* `c > b`: we have `c=6` and `b=5`, so we have `6 > 5`, which is `True`

Now, we can put these two back together: for `a < b and c > b` both the left and the right are `True`, which makes the whole expression `True`!
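
Combining two comparisons with `and` is handy for checking whether a value falls inside a range. In this sketch, the 18 to 25 degree "comfortable" range is an arbitrary choice for illustration.

```python
temperature = 22
# Both comparisons must be True for the whole expression to be True
is_comfortable = temperature >= 18 and temperature <= 25
print(is_comfortable)   # True

temperature = 30
is_comfortable = temperature >= 18 and temperature <= 25
print(is_comfortable)   # False: 30 <= 25 is False
```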

#### The `or` operation

The `or` operation also takes **two** boolean values `a` and `b`, but it evaluates to `True` if **at least one** of `a` or `b` is `True`. If both values are `False`, then `a or b` is `False`. Otherwise, `a or b` is `True`. In other programming languages, the `or` operation is represented as `a || b` or `a | b`.

To help visualise, here's the truth table:

| `a` | `b` | `a or b` |
| --- | --- | --- |
| `False` | `False` | `False` |
| `False` | `True` | `True` |
| `True` | `False` | `True` |
| `True` | `True` | `True` |

Once again, you'll often work directly with numbers and comparisons. Let's do an example where we have `a = 5`, `b = 6`, `c = 7` and let's evaluate `a > b or c > b`:

In [None]:
# Your code here for a numeric example:


Let's go through that last example. We have `a=5`, `b=6`, `c=7`. Let's again break up our expression into two parts:
* `a > b`
* `c > b`

Let's look at each one:
* `a > b` --> `5 > 6` --> `False`
* `c > b` --> `7 > 6` --> `True`

Since at least one of the two boolean values is `True`, then `a > b or c > b` is `True`.
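
A common use of `or` is accepting any one of several values. Here's a sketch with a made-up variable `day`.

```python
day = "Sunday"
# Only one of the two comparisons needs to be True
is_weekend = day == "Saturday" or day == "Sunday"
print(is_weekend)   # True

day = "Monday"
is_weekend = day == "Saturday" or day == "Sunday"
print(is_weekend)   # False: both comparisons are False
```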

The comparisons that we combine don't all have to involve the same type of data. Let's set `n1 = 5`, `n2 = 6` and `password = "Hello"`. Let's now check to see if the product of the two numbers is less than 28 **or** the password is equal to `"World"`:

In [None]:
# Your code here for a more complicated example:


Let's now change the password to `"World"` and try again.

Hopefully, you're starting to see that we can use these booleans to make decisions. We'll come back to this idea **really soon**.

#### The `not` operation

The `not` operation only takes in **one** boolean value `a` and flips its value. If `a` is `True`, then `not a` is `False` and if `a` is `False`, then `not a` is `True`. In other languages, it may be represented by `!a` or `~a`.

Here's the truth table:

| `a` | `not a` |
| --- | --- |
|`False` | `True`|
|`True` | `False` |

The easy way to understand it is that it's opposite day! When you add the `not` operator, everything that is usually `True` becomes `False` and everything that is usually `False` becomes `True`.


Let's do a numeric example. Let's set `a=6` and `b=8`. Let's evaluate `not a > b`:

In [None]:
# Your code here for a numeric example


Let's look a bit more closely at this last example. We have `a=6` and `b=8`.

Python evaluates comparisons before `not`, so `not a > b` means `not (a > b)`. The value of `a > b` is `6 > 8`, which is `False`, and the `not` operation flips this from `False` to `True`.

**Note:** When you want to invert equality, *DO NOT* do `not a == b`. We have an operation that does this in one step, called `!=`. So, you should do `a != b` instead. It's cleaner and simpler.
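
This sketch uses the same values as the example above and checks two points. First, `not a > b` gives the same result as `not (a > b)`. Second, `not a == b` gives the same result as the cleaner `a != b`.

```python
a = 6
b = 8

# Comparisons happen before `not`, so these two lines are equivalent
print(not a > b)     # True
print(not (a > b))   # True

# Same result, but a != b is the clearer way to write "not equal"
print(not a == b)    # True
print(a != b)        # True
```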

## Intro to Strings

We've seen basic operations with numbers and booleans... but what about processing text?

A **string** is a sequence of text characters, surrounded by quotation marks. We saw an example above when we wrote the "Hello, World!" program.

We can use either single quotes or double quotes:

In [None]:
# Your code here


We can also use triple-quotes to have a longer string that has line breaks in it:

In [None]:
# Your code here


> **Attention**
> 
> It's very very very important that you remember the quotation marks! Otherwise, Python will think you're talking about variables.
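
Having two kinds of quotation marks is handy when the text itself contains one of them. In this sketch, each string uses the *other* kind of quote on the outside. The sentences are just examples.

```python
# Double quotes on the outside make room for an apostrophe inside...
sentence = "It's a DNA sequence."
# ...and single quotes on the outside make room for double quotes inside.
quote = 'She said "Hello, World!"'

print(sentence)   # It's a DNA sequence.
print(quote)      # She said "Hello, World!"
```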

What if we want to indent, or to add a new line? Well, we can add special characters through **escape sequences**:

* `\t` - Tab, indent.
* `\n` - New line.
* `\\` - Backslash.
* `\uXXXX` - Insert the Unicode character whose hexadecimal code is `XXXX` (for example, `\u00e9` is `é`).
* `\'` - Insert an apostrophe (useful if your string is defined with `'`).
* `\"` - Insert a quotation mark (useful if your string is defined with `"`).

If you look at the value of the string, you will see these escape sequences, but if you print them, they get converted into their actual meaning:

In [None]:
my_string = "This \"string\"\nhas\n\ta\n\t\tlot\n\t\t\tof\n\t\t\t\tlines."

# Your code here to print the string


See these pages for more information about escape sequences:

* [Python documentation](https://docs.python.org/3/reference/lexical_analysis.html#grammar-token-python-grammar-stringescapeseq)
* [W3Schools tutorial](https://www.w3schools.com/python/gloss_python_escape_characters.asp)
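
Here's a short sketch of some of the escape sequences listed above, using made-up text. When printed, each one turns into the character it stands for.

```python
tabbed = "Name:\tAda"        # \t inserts a tab
two_lines = "Line 1\nLine 2" # \n starts a new line
path = "C:\\Users"           # \\ gives a single backslash
accented = "caf\u00e9"       # \u00e9 is the Unicode character é

print(tabbed)
print(two_lines)
print(path)                  # C:\Users
print(accented)              # café
print(accented == "café")    # True
```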

So, we can print strings... But what else?

There's lots of stuff that we can do with these strings. Let's discuss a few operations on strings.

The absolute most basic thing we can do is get the **length**, or number of characters, in a string. We can use the built-in `len` function to get this information, passing the string as the argument:

In [None]:
my_string = "I like Python!"

# Your code here


**Note:** `""` defines the **empty string**. This string has no characters in it, and so it has a length of zero.

In [None]:
# Your code here to check the length of the empty string


### String slicing and indexing

We can access individual characters or substrings using the **bracket operator** `[]`. But first, we need to talk about **indexing**. In a Python string, every character has a numbered position. It's **extremely** important to remember that in Python, the first position is indexed with the number **0**.

Again, I'll repeat that...

***The first character in a Python string has index 0.***

So, you can also figure out that the last character in a string with *n* characters has index *n-1*, **not** *n*.

This diagram should help clarify it:

![string indexing](../assets/StringIndexingPositive.png)

Note that blank spaces are counted! To get the character at the index stored in variable `i`, we'd write the following:

```python
character_of_interest = my_string[i]
```

To get a substring starting at index `i` and going to the character at index `j` (**excluding** that character), we write:
```python
my_substring = my_string[i:j]
```

If we omit `i`, then we get everything from the beginning up to (but **excluding**) `j`. If we omit `j`, then we get the substring starting at index `i` and continuing to the end of the string.

We can even take only every `k`-th character by adding a third number, called the **step**:
```python
my_substring = my_string[i:j:k]
```
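
Here's a sketch of these rules on a short example string, `"Python"`. The index of each character is noted in a comment.

```python
word = "Python"
# Indices:  P=0, y=1, t=2, h=3, o=4, n=5

print(word[0])     # P    (first character)
print(word[5])     # n    (last character: index 6 - 1 = 5)
print(word[1:4])   # yth  (indices 1, 2, 3; index 4 is excluded)
print(word[:2])    # Py   (from the start up to, not including, index 2)
print(word[2:])    # thon (from index 2 to the end)
print(word[::2])   # Pto  (every 2nd character: indices 0, 2, 4)
```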

Now, let's see some examples of string indexing and taking substrings. In Python, this process is commonly referred to as *slicing*.

In [None]:
my_string = "my string text"

# Your code here

# Let's look at single characters
print("The first character in the string is:", ...)
print("The last character in the string is:", ...)

# Now, let's look at substrings
print("The substring from index 3 to index 12 is:", ...)

# Now, let's skip a few characters
print("The substring from 5 to the end, skipping every 2 is:", ...)

Python also has a great feature where we can use **negative** indices! The last character has an index of -1 and the values go back to -n, where n is the length of the string. Here's an updated diagram:

![Negative indices](../assets/StringIndexingNegative.png)
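
Here's the same `"Python"` sketch using negative indices. The last line uses a step of `-1`, which walks through the string backwards and so gives a reversed copy.

```python
word = "Python"
# Negative indices: P=-6, y=-5, t=-4, h=-3, o=-2, n=-1

print(word[-1])     # n      (last character)
print(word[-3:])    # hon    (last 3 characters)
print(word[:-2])    # Pyth   (everything except the last 2)
print(word[1:-1])   # ytho   (a positive and a negative index combined)
print(word[::-1])   # nohtyP (step of -1: the string reversed)
```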

Now, it's your turn! Let's do some string indexing with negative indices. **Note:** We *can* combine positive and negative indices.

In [None]:
# Reproduce the above strings using negative indexing where convenient
my_string = "my string text"

# Your code here

# Let's look at single characters
print("The last character in the string is:", ...)

# Now, let's look at substrings
print("The last 3 characters in the string are:", ...)
print("The first 7 characters in the string are:", ...)

One last note on string slicing and indexing: Strings are **immutable**, meaning that you can't change any of the individual characters or substrings. You can create a new string using existing strings, but you **cannot** change the content of a string.
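
Here's a sketch of what that means in practice, using a made-up DNA string. The commented-out line would stop with an error. Instead, we build a new string by joining a new first letter onto a slice of the old one with `+`, which glues strings together.

```python
dna = "ATGC"

# dna[0] = "G"   # would raise TypeError: 'str' object does not support item assignment

# Build a NEW string from a new first letter plus a slice of the old one
new_dna = "G" + dna[1:]
print(new_dna)   # GTGC
print(dna)       # ATGC (the original is unchanged)
```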

**Note:** These indexing rules are **super important** to remember! They aren't just useful for strings. When we get to lists, we'll see these slicing tools again.

### String formatting

Let's say you have information in a variable and you'd like to embed it in a string...

Well, you can do it easily using **string formatting**.

All you need to do is:

* put `f` **before** the opening quotation mark
* put the variable in `{}` brace brackets

and then the variable will be embedded.

Let's see an example:

In [None]:
meaning_of_life = 42

# Your code here for string formatting


Again, don't forget the `f` before the opening quotation mark and the curly braces.
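
Here's one more sketch with made-up values. The braces can hold any expression, not just a variable name. The last line shows what happens if you forget the `f`.

```python
cells = 120
wells = 4

message = f"{cells} cells across {wells} wells: {cells / wells} per well"
print(message)           # 120 cells across 4 wells: 30.0 per well

# Forgetting the f: the braces are printed as plain text
print("{cells} cells")   # {cells} cells
```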


There are also cool ways of formatting numbers with extra zeros and spaces... We won't see them today, but if you're curious, feel free to check out [this page](https://docs.python.org/3/library/string.html#formatspec).

### String methods

Let's say we want to take a string and make a new string out of it...

Well, we can do this using **string methods**.

But first, what's a method?

You can think of a method as a sort of "machine" associated with a variable of a certain type. This machine can take in optional **inputs** (arguments) and produces some sort of **output**.

Here's the important syntax:

```python
my_output = variable_name.method_name(arguments)
```

In the case of string methods, we're using methods associated with strings to produce new data, which may be a new string, or a number, or something completely different.
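
Here's a sketch using a few real `str` methods (`upper`, `startswith` and `find`) on a made-up lowercase sequence. Notice that the outputs have different types, and that the original string is never changed, since strings are immutable.

```python
sequence = "atgcgt"

upper_seq = sequence.upper()         # a method with no arguments
print(upper_seq)                     # ATGCGT (a new string)
print(sequence)                      # atgcgt (the original is unchanged)

print(upper_seq.startswith("ATG"))   # True (the output can be a boolean...)
print(upper_seq.find("G"))           # 2 (...or a number: index of the first "G")
print(upper_seq.find("N"))           # -1 (find returns -1 when there's no match)
```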

Let's look at an example using DNA.

Let's say we want to find the index of the first `T` nucleotide. We're going to use the [`find`](https://docs.python.org/3/library/stdtypes.html#str.find) string method.

In [None]:
dna_sequence = "AAGGACCTTAGAAGGGGACCATTATTAAATTCCCGCA"

# Put in your code to find the index of the first T nucleotide


#### Replacing Characters

Well, let's say we want to replace this `T` nucleotide with a `G` nucleotide. We can use another useful method: `replace`. As the name suggests, this method replaces specified characters or substrings with the provided new ones. Its documentation is [here](https://docs.python.org/3/library/stdtypes.html#str.replace).

The syntax is:
```python
new_string = my_string.replace("old", "new", optional_count)
```
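
Here's a sketch on a non-DNA word, so you can still try the DNA version yourself.

```python
fruit = "banana"

print(fruit.replace("a", "o"))      # bonono (every "a" is replaced)
print(fruit.replace("a", "o", 1))   # bonana (only the first "a")
print(fruit.replace("an", "AN"))    # bANANa (substrings work too)
print(fruit)                        # banana (the original is unchanged)
```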

Let's go back to our DNA sequence and replace only the first `T` with `G`:

In [None]:
# Your code here


There are many more methods we can call for strings. To learn more, see the [`str` reference](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) in the Python documentation.

## Exercises

Now that we've seen these basic operations, let's do some exercises!

### Temperature Conversion

Most of the world reports temperature in Celsius... most of the world. One place in particular reports the temperature in Fahrenheit.

To make it easier to understand, we can convert between units. Here's the equation for converting a temperature in Fahrenheit into Celsius:

$$
\text{C} = \frac{5}{9}\left(\text{F} - 32\right)
$$

Turn this equation into code and convert the following temperatures from Fahrenheit to Celsius:
* -40
* 32
* 0
* 212

After converting, print the result using string formatting.

In [None]:
# Your code here to convert the temperatures


Got extra time? Derive the equation for converting from Celsius to Fahrenheit and convert the following temperatures:
* 25
* 34
* 5
* -10

As before, use string formatting to print a nice string.

In [None]:
# Your code here to convert the temperatures


### DNA GC content

When analysing biological sequences, like DNA, we may be interested in the relative proportion of different nucleotides.

I've given you a string containing a sequence of DNA. Write code to determine the proportion of nucleotides that are either `G` or `C`.

You should not need to manually count anything.

**Hint:** the string method [`count()`](https://docs.python.org/3/library/stdtypes.html#str.count) may come in handy here...

In [None]:
seq = "TGGGTAATGCGCTTAAGTCGGGTGGGGAGGCTCTGGGTCCCTCCTCGCCGTGTGACTCCG"

# Your code here to find the proportion of GC

Want to try more sequences? Here's some code that randomly generates sequences biased towards `C` and `G` (each is twice as likely to be picked as `A` or `T`). Uncomment the lines to run it.

In [None]:
# import random

# nts = ["A", "C", "G", "T"]
# n = 60

# selected_nts = random.choices(nts, weights=[1, 2, 2, 1], k=n)
# new_seq = "".join(selected_nts)

# print(new_seq)

## Module Summary

With that, we've reached the end of our first module on Python Basics. Here's what we've covered:

* How to *store* different *types* of data in **variables**.
* How to perform *basic mathematical operations* on **integers** and **floating-point numbers**.
* How to perform **boolean operations**.
* A **string** represents *text* in Python. We can use **slicing** to access its elements. We can also perform operations, like **string formatting**, and use **methods** to get extra info about a string or create modified versions of it.


# Module 2 - Collections

In this module, we'll take things up to a new level. We've seen how to write code that does stuff with basic data, such as numbers and booleans. We've also played a bit with strings. 

But we've only dealt with **one** piece of information at a time.

Now, let's see how to group multiple values together using **collections**.

Here's the outline for this section:

1. Lists and List Methods
    1. Length of a list
    2. List slicing
    3. Adding elements
    4. Removing elements
    5. Other useful list methods
2. Tuples
    1. Tuple unpacking
3. Dictionaries
    1. Retrieving elements
    2. Adding and modifying elements
    3. Removing elements
4. **Exercises:** Working with collections

For more information on tuples and lists, see [this page](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range) of the Python documentation. For more info about dictionaries, see [here](https://docs.python.org/3/library/stdtypes.html#mapping-types-dict).

### Lists and List Methods

Lists contain many entries, typically of the same type.

Lists are **mutable**, meaning we can add new entries, remove entries, and change the entries in a list. Lists are represented as comma-separated values in square brackets -- `[]`.
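A small sketch of list syntax, with made-up names and values:

```python
# A list of (made-up) sample names, all strings
samples = ["control", "treated", "washout"]

# A list of numbers
counts = [12, 5, 36]

# An empty list is just a pair of square brackets
empty = []

print(samples, counts, empty)
print(type(samples))  # <class 'list'>
```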

In [None]:
# Your code here for an example list


**Note:** While you *can* put elements of different types in a list, ask yourself whether in a given scenario you *should* and make sure that your code is prepared to handle the different types of data.

Now, I've told you all these great things that we can do with lists... but how do we do them?

#### Length of a List
Well, let's start with the simplest thing... taking the **length** of a list. We do this in the exact same way that we took the length of a string! We use the `len` function.
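A tiny sketch with a made-up list:

```python
# A made-up list of nucleotides
nucleotides = ["A", "C", "G", "T"]

# len() works on lists just like on strings
print(len(nucleotides))  # 4
print(len([]))           # 0: an empty list has length zero
```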

In [None]:
# Your code here


#### List Slicing

We can obtain individual items (indexing) and sublists (slicing) in exactly the same way that we did with strings.

And, in case you forgot, indexing starts at zero 😉.
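A quick refresher sketch on a made-up list of letters (deliberately not the exercise list):

```python
letters = ["a", "b", "c", "d", "e", "f"]

print(letters[0])    # a  (first element: index 0)
print(letters[-2])   # e  (negative indices count from the end)
print(letters[1:4])  # ['b', 'c', 'd']  (the stop index is excluded)
print(letters[::2])  # ['a', 'c', 'e']  (every second element)
```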

Here's an exercise to test your skills with this...

I'm giving you this list: `[1, 1, 2, 3, 5, 8, 13, 21, 34]`

Using slicing, find:
* the last element
* the values `3, 5, 8`
* the values `1, 2, 5, 13`

In [None]:
my_list = [1, 1, 2, 3, 5, 8, 13, 21, 34]

# Your code here
print("The last element in the list is:", ...)
print("The sublist is:", ...)
print("The sublist is:", ...)

But there's more that we can do with slicing! Since lists are mutable (unlike strings), we can update values using the `=` sign. We can do this for both individual elements and for sublists!

Let's take this example: `[1, 2, 4, 9, 16, 32, 64, 129, 257]`

Any idea what this sequence is? There are three mistakes that we need to correct!

So... Where are the mistakes? How do we correct them?

In [None]:
# Here is our error-filled list:
powers_of_two = [1, 2, 4, 9, 16, 32, 64, 129, 257]

# Your code here to correct

print("The corrected list is:", powers_of_two)

So, we can replace single elements by assigning a new value, or an entire sub-range by passing a list!
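Here's a minimal sketch of both kinds of assignment on a made-up list (not the powers of two):

```python
scores = [10, 20, 0, 0, 50]

# Replace a single element by assigning to an index
scores[0] = 15

# Replace a sub-range by assigning a list to a slice
scores[2:4] = [30, 40]

print(scores)  # [15, 20, 30, 40, 50]
```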

#### Adding Elements

Now for the fun part! Let's insert new items! Remember that the list is **mutable**, so when we add new items, we are actually *changing* the list. We are **not** creating a new list and we are **not** creating a new variable (there's no `=` sign). To change the list, we use **methods** from the list object.

Let's start with adding a new item at the **end** of the list. This process is known as *appending* to a list. So, naturally, the method to do this is called `append`:
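A minimal sketch of `append` on a made-up list of codons. Note that `append` changes the list and returns `None` (Python's "no value"), so we don't assign its result back to the list:

```python
codons = ["AUG", "GCC"]

# append() adds one item to the end, changing the list in place
codons.append("UAA")
print(codons)    # ['AUG', 'GCC', 'UAA']

# append() itself returns None, so `codons = codons.append(...)`
# would replace our list with None
returned = codons.append("GGG")
print(returned)  # None
print(codons)    # ['AUG', 'GCC', 'UAA', 'GGG']
```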

In [None]:
# Example using our powers of two
# Your code here to continue the list


print("Powers of two is now:", powers_of_two)

We can also insert at any index `i` using the method called... [`insert`](https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range:~:text=(6)-,s.insert(i%2C%20x),-inserts%20x%20into)! This method takes **two** arguments: the index `i` *before which* we want to insert the new element and the new element that we want to insert:

```python
my_list.insert(i, new_element)
```

***NOTE:*** You must respect this order of arguments.
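Here's a sketch of `insert` on a made-up list, including a negative index (which, as with slicing, counts from the end):

```python
planets = ["Mercury", "Earth", "Jupiter"]

# Insert "Venus" before index 1 (before "Earth")
planets.insert(1, "Venus")

# Insert "Mars" before the last element using a negative index
planets.insert(-1, "Mars")

print(planets)  # ['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter']
```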

Here's an example:

In [None]:
days_of_the_week = ["Sunday", "Tuesday", "Wednesday", "Thursday", "Saturday"]

# Your code here to add Monday in the correct spot


# Your code here to add Friday in the right spot (hint: negative indexing)



print(f"The {len(days_of_the_week)} days of the week are: {days_of_the_week}")

How can we learn more about these methods? We can check out the [documentation](https://docs.python.org/3/library/stdtypes.html#list).

#### Removing Elements

Sometimes, we want to delete elements from a list. There are a few ways to do this:
- using the `del` keyword
- using an assignment
- using the `pop` method
- using the `clear` method

Here are the details:
* The `del` keyword can be used to get rid of single elements or a range. `del` is **not** a function, so we **don't** need round brackets (parentheses): we just write `del my_list[i]`.
* To remove a range, we can alternatively just use slicing and assign an empty list to the desired range (see [here](https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types)).
* We can use the `pop` method without an argument to remove the last item from a list, or with an index `i` as an argument to remove the item at that index. The `pop` method returns the removed element, so we can store it in a variable.
* We can use the `clear` method to remove **all** items from a list.
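Here's a sketch that walks through each of these on one made-up list, so you can follow the list shrinking step by step:

```python
letters = ["a", "b", "c", "d", "e", "f", "g"]

# del removes by index or by slice (no round brackets needed)
del letters[0]          # removes "a"
del letters[1:3]        # removes "c" and "d"

# Assigning an empty list to a slice also removes that range
letters[0:1] = []       # removes "b"

# pop() removes and returns an item (the last one by default)
last = letters.pop()    # "g"
first = letters.pop(0)  # "e"

print(letters, last, first)  # ['f'] g e

# clear() empties the whole list
letters.clear()
print(letters)  # []
```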

In [None]:
test_list = [2, 3, 5, 7, 9, 11, 13, 17, 19, 23]

# Your code here to remove the number which doesn't belong using `pop`...


print("Test list is now:", test_list, "since we removed item:", removed_element)

In [None]:
test_list_2 = [1, 2, 2, 1, 2, 3, 1, 2, 2, 1, 1, 2, 2]

# Your code here to remove the numbers that disrupt the pattern using assignment.

print("Test list 2 is now:", test_list_2)

In [None]:
# Your code here to remove all elements using `clear`

print("The test list 2 is now:", test_list_2)

**Extra:** Using `del` to delete elements. Remember, `del` doesn't need round brackets.

In [None]:
test_list = [2, 3, 5, 7, 9, 11, 13, 17, 19, 23]

# Your code here
my_index = 4

# Remove the number which doesn't belong using `del`...


print("Test list is now:", test_list)

In [None]:
test_list_2 = [1, 2, 2, 1, 2, 3, 1, 2, 2, 1, 1, 2, 2]

# Your code here to remove the numbers that disrupt the pattern using `del`.


print("Test list 2 is now:", test_list_2)

#### Other useful list methods

There's more that we can do with lists using other list methods.

* [`sort()`](https://docs.python.org/3/library/stdtypes.html#list.sort) - name seems pretty intuitive... (sorts the list in place)
* [`reverse()`](https://docs.python.org/3/library/stdtypes.html#sequence.reverse) - flip the order of the list
* [`extend()`](https://docs.python.org/3/library/stdtypes.html#sequence.extend) - add all the elements from another list to the end
* [`index()`](https://docs.python.org/3/library/stdtypes.html#sequence.index) - find the position of the first occurrence of an element
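A quick sketch of these four methods on made-up numbers. Note that `sort`, `reverse` and `extend` change the list in place, and `index` raises a `ValueError` if the element isn't in the list:

```python
numbers = [5, 2, 9, 1]
more_numbers = [7, 3]

# extend() adds every element of another list to the end
numbers.extend(more_numbers)
print(numbers)           # [5, 2, 9, 1, 7, 3]

# index() returns the position of the first matching element
print(numbers.index(9))  # 2

# sort() orders the list in place (ascending by default)
numbers.sort()
print(numbers)           # [1, 2, 3, 5, 7, 9]

# reverse() flips the current order in place
numbers.reverse()
print(numbers)           # [9, 7, 5, 3, 2, 1]
```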

Let's see an example... 

In [None]:
codons_1 = ["UGA", "GCG", "UUG", "AUG", "CGC", "AAA"]
codons_2 = ["GAA", "GCC", "AUU", "UAA"]

# Your code here to combine the codon lists and find the start codon


Remember, to learn about all these different methods, we can consult the **Python documentation**.

### Tuples

What if we want something simpler than a list?

A tuple is a way of packaging a fixed number of values together. The number of values can't be changed, and neither can the values themselves. Tuples are **immutable**.

Tuples are represented using multiple values separated by commas within round brackets (parentheses) -- `()`.
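A minimal sketch of a tuple, using a made-up position. The `try`/`except` lines (which we haven't covered yet) just catch the error Python raises when we try to change a tuple, so the cell keeps running:

```python
# A tuple holding a made-up (x, y, z) position
position = (1.5, -2.0, 3.0)

print(position[0])    # 1.5
print(len(position))  # 3

# Tuples are immutable: assigning to an element raises a TypeError
try:
    position[0] = 0.0
except TypeError as error:
    print("Can't modify a tuple:", error)

print(position)       # (1.5, -2.0, 3.0), unchanged
```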

In [None]:
# Your code here


There are two different ways to access individual elements in a tuple:
* Slicing
* Unpacking

When working with tuples, **slicing** works the *exact same way* that it did with strings, described above. And since it works the same way, we won't do an example right here.

But, even without an example, I'll give you my usual reminder... ***INDEXING STARTS AT ZERO***.

#### Tuple Unpacking
**Unpacking** is a different process. Let's say we have a tuple with 2 elements in it. We can assign each one of these elements to a variable, like this:

In [None]:
my_point = (-3, 5)

# Your code here to assign x and y


print("The value of my point is:", ...)
print("The value of x is:", ...)
print("The value of y is:", ...)

**NOTE:** The number of variables **MUST** match the number of elements in the tuple. Otherwise, unpacking won't work and you'll get an error (a `ValueError`) from Python.
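Here's a sketch of unpacking a made-up colour tuple, along with the error you get when the counts don't match (caught with `try`/`except` so the cell keeps running):

```python
# A made-up RGB colour
colour = (255, 128, 0)

# Unpacking: one variable per element, assigned in order
red, green, blue = colour
print(red, green, blue)  # 255 128 0

# Too few variables for a 3-element tuple raises a ValueError
try:
    first, second = colour
except ValueError as error:
    print("Unpacking failed:", error)
```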

### Dictionaries

So... How many of you can remember using a paper dictionary? What's the idea behind them?

Well, we're not going to be defining words... but think about the **structure** of a dictionary. You look up a word and you get an associated valuable piece of information, a definition. Let's call the word a **key** and the associated information a **value**. A **dictionary** is a collection that stores **Key-Value** pairs.

Now, for the syntax... Well, tuples involved round brackets, and lists involved square brackets... so it's only natural that the syntax for dictionaries uses curly brackets, or brace brackets `{}`. But, there's another twist here. 

We need both keys and values! The **values** can be any type, but the **keys** must be **immutable**. So, the keys can be numbers, tuples or strings (or booleans, I guess, but that may not be useful), but they **cannot** be lists. In addition, keys **cannot** be duplicated, but values can. If you try to duplicate a key, only the last value given for that key is kept.
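A small sketch of these rules, using made-up sample IDs as keys (again with `try`/`except` to catch the error):

```python
# Made-up dictionary: sample IDs (keys) mapped to cell counts (values)
cell_counts = {"S1": 120, "S2": 95, "S3": 120}  # values may repeat

# Duplicate keys: only the last value given for "S1" is kept
duplicated = {"S1": 120, "S1": 300}
print(duplicated)  # {'S1': 300}

# Lists are mutable, so they can't be keys: Python raises a TypeError
try:
    bad_dictionary = {["S1", "S2"]: 0}
except TypeError as error:
    print("Invalid key:", error)
```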

Let's make a simple dictionary with keys for `microCT`, `FIB-SEM`, `confocal`, `STORM` and `cryoTEM`:

In [None]:
# Your code here: dictionary example for image_counts


Now, there are lots of operations that we can do on dictionaries!

#### Retrieving elements

Recall that in strings, tuples and lists we used the square brackets `[]` for indexing. We're still going to use them here, but instead of using a *numeric* index, we put a key in the brackets instead. We can then perform our usual operations of retrieving and replacing values.
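A minimal sketch of retrieving a value by key, with a made-up dictionary of reagent volumes. Asking for a key that doesn't exist raises a `KeyError`:

```python
# A made-up dictionary of reagent volumes (in mL)
stocks = {"ethanol": 500, "PBS": 1000}

# Retrieve a value by putting its key in square brackets
pbs_volume = stocks["PBS"]
print(pbs_volume)  # 1000

# Asking for a key that isn't in the dictionary raises a KeyError
try:
    agarose_volume = stocks["agarose"]
except KeyError as error:
    print("Missing key:", error)
```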

In [None]:
# Your code here to access the number of microCT scans and store it in micro_ct_scans


**Note:** Remember, unless your keys are numbers, you **cannot** use numerical indexing to select elements. You **must** use a valid key.

**Warning:** You **cannot** do slicing on a dictionary!

#### Adding and modifying elements

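Adding new elements to a dictionary and modifying existing ones are both easy tasks!

We just need the key and the new value. If the key is already in the dictionary, its value is replaced; otherwise, a new key-value pair is added. Either way, we write: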

```python
my_dictionary[key] = new_value
```

For example:

In [None]:
# Your code here to modify the number of confocal images


# Your code here to add TEM to our imaging database


print(f"Our imaging database now has the following datasets available: {image_counts}")
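Here's the same idea on a separate, made-up dictionary, showing that one assignment syntax both modifies an existing key and adds a new one. (Dictionaries keep keys in insertion order, so the new key appears last.)

```python
# A separate, made-up dictionary of reagent volumes (in mL)
stocks = {"ethanol": 500, "PBS": 1000}

# "ethanol" is already a key, so its value is modified
stocks["ethanol"] = 450

# "agarose" is a new key, so a new key-value pair is added
stocks["agarose"] = 25

print(stocks)  # {'ethanol': 450, 'PBS': 1000, 'agarose': 25}
```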

#### Removing Entries

To remove an entry, we can again use the `del` keyword, or we can use `pop`. Like with lists, `pop` returns the value that we removed, so we can store it in a variable. Unlike with lists, we must tell `pop` which key to remove.
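A sketch of both approaches on a made-up dictionary of reagent volumes:

```python
stocks = {"ethanol": 450, "PBS": 1000, "agarose": 25}

# del removes an entry by its key
del stocks["agarose"]

# pop() removes an entry by its key and returns the value
ethanol_volume = stocks.pop("ethanol")

print(ethanol_volume)  # 450
print(stocks)          # {'PBS': 1000}
```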

In [None]:
# Your code here to remove the STORM datasets and store them in a variable storm_datasets


print("The number of STORM datasets was:", ...)

print("Our dictionary is now:", image_counts)

## Exercise: Working with collections

Biological sequences each have their own alphabets!

What do I mean?

Well, let's consider some sequences:

* DNA is composed of **A**, **T**, **G** and **C**.
* RNA is composed of **A**, **U**, **G** and **C**.
* Proteins are composed of many different amino acids, each represented by a one-letter code.

### Base Pairing

Let's think about DNA base pairing. DNA is double-stranded, with the following pairing rules:
* `A` pairs with `T`
* `C` pairs with `G`

Using one of the collection types we've seen, find a way of storing this information to be able to easily get the appropriate base pair for a specific nucleotide.

**Hint:** you're trying to associate two strings to each other...

In [None]:
# Your code here


### Months

Remember what we did with the days of the week? Well, now I've incorrectly written the months.

Fix the mistakes in the list of months.

In [None]:
months = [
    "January",
    "April",
    "May",
    "June",
    "Julie",
    "September",
    "Octobre",
    "Construction"
]

# Your code here to fix the months


## Module Summary

Yay! We've made it through our second content module! Here, we've explored the basics of collection types. Here are the main points that we saw:

* A **list** represents a *variable-length collection* of objects. We can add or remove objects from the list using **list methods**, such as `append`, `insert` and `pop`.
* A **tuple** represents a *small number of objects grouped together*. To access elements, we can either use slicing, or we can **unpack** its contents into the corresponding number of variables. Tuples can't be modified.
* A **dictionary** represents *key-value storage*. Instead of having a numeric index, we access **values** using a **key**. We can add or remove elements using keys and we can modify the dictionary using **dictionary methods**, such as `pop`.

For more information about any of these objects, check out the official Python documentation. There's a lot of detail about each type:
* Strings: https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str
* Tuples: https://docs.python.org/3/library/stdtypes.html#tuple
* Lists: https://docs.python.org/3/library/stdtypes.html#list
* Dictionaries: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict

Finally, there's another collection type that I didn't discuss, called a *set*. If you want to learn about it, check out this page: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset.

Don't worry, there's more that we can do with these collections. We'll come back to them a bit more in the next section.

# Module 3 - Intro to Control Flow and Loops

Up until now, we've seen basics of storing data. We've seen how to store numbers, text and collections. We've also seen how to perform basic operations, using either arithmetic or methods.

But what if we want to do something more exciting?

What if we want to make decisions and repeat certain lines?

In this module, we'll explore these topics. Here's the outline of what we'll cover:

1. Control Flow: the if statement
2. Loops
    1. while loops
    2. Iteration with for loops
    3. Interrupting loops
3. **Exercises:** Working with Strings and Collections for DNA and Protein Processing

### Control Flow: the `if` Statement

Let's say you're coming to this workshop. You take the metro and get off at Peel. You get out at the corner of Metcalfe and de Maisonneuve and look around. In your head, you're thinking, `if` Metcalfe is open, I'll walk up there, otherwise (`else`), I'll go to Peel. **Congratulations!!!** You've just done control flow!

Control flow is about **making decisions** using boolean values. The important keyword here is `if`. Here is the syntax of control flow in Python:

```python
if some_boolean:
    do_something
elif some_other_boolean:
    do_something_else
elif yet_another_boolean:
    do_another_something_else
...
else:
    all_else_has_failed_so_lets_do_this

some_other_code_here
```

Here's the idea: 

* If the value of `some_boolean` is `True`, then the line `do_something` runs.
* If the first branch isn't run, then we test to see if `some_other_boolean` is `True`. If it is, then we run `do_something_else`.
* We keep checking all the branches until one of the conditions is met (evaluates to `True`).
* If all conditions are `False`, then the code under `else` runs.

Here are a few things to note about the syntax:
* there is a **colon** (:) after the boolean value.
* the line `do_something` only runs if `some_boolean` evaluates to `True`.
* the line `do_something` is **indented**. In other languages, you might be used to curly brackets. Python **DOES NOT** use these. In Python, blocks of code are marked by indentation. Also, note that in Python, we don't need to write `end` when we're done! It's enough to stop indenting.
* the line `some_other_code_here` runs *regardless* of which branch (if any) ran. We can tell because it's **not** indented.

And now, some notes about the different branches:

* The `if` branch is **required** (otherwise, there's not much of a point here...).
* There is no limit to the number of `elif` clauses you can have. You can have zero, one, or as many as you want.
* There is no requirement to add an `else` clause. You can have lots of `elif` clauses without a final `else`.
* You can only have at most **one** `else` clause.

It's **SUPER IMPORTANT** to remember that **only one branch is run**. Once Python finds a condition that matches, it stops checking all the other branches.
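Here's a small sketch (with a made-up reading) where two conditions are `True` at once, to show that only the first matching branch runs:

```python
# A made-up temperature reading, in °C
reading = 30

if reading > 25:
    description = "hot"
elif reading > 15:
    # 30 > 15 is also True, but this branch is skipped:
    # the first branch already matched
    description = "mild"
else:
    description = "cold"

print(description)  # hot
```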

Let's see some examples to help illustrate.

In [None]:
# Your code here


Try changing the variable `metcalfe_is_open` to `False` and see what happens... Try playing with both boolean values.

You may have noticed that the lines under the `if` statement are indented. That tells Python that they will only run when the `if` condition is met. The lines underneath that aren't indented tell Python that they run no matter what.

You may have also noticed that I didn't write:
```python
if metcalfe_is_open == True:
```

This isn't necessary, since we already have a boolean. Putting in the extra comparison makes our code less clean. Also, just try reading the code like it's a sentence. It even sounds like a conversation:
"If Metcalfe is open, [print] I'm going up Metcalfe".

If you think about all the branches, it still seems intuitive:
* If Metcalfe is open, I'm taking Metcalfe.
* If it's not open, but Peel is open, then I'm taking Peel.
* Otherwise, find another way...

We typically won't plug in a raw boolean value. To see something more practical, let's replace the boolean with one of the comparisons we have above...

Let's write a simple program that takes the current outdoor temperature in a variable, and tells us if we're below freezing, above freezing or at freezing.

In [None]:
# Freezing point example: Your code here


In this example, we put an expression that evaluates to a boolean after the `if`. Try setting the value of `current_temperature` to be above zero and see what happens.

## Loops: Repeating code


### `while` loops

So, control flow is great for choosing which lines of code to run, but what if we want to run a line more than once? To do this, we can use **loops**. There are two main kinds of loops in Python:
* `while` loops
* `for` loops

They are similar, but a `for` loop runs once for each item in a predetermined sequence, while a `while` loop keeps running for as long as its condition is `True`, which can be an arbitrary number of iterations. We'll start with `while` loops.

Syntax:

```python
while some_boolean:
    do_some_code

code_after_loop...
```

Now, you'll pretty much **NEVER** want to put a raw boolean value in the `while`. You'll instead want to use some sort of operation that returns a boolean. This operation usually involves a variable that you update in the loop. Again, notice the indent!
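A minimal sketch of a `while` loop whose condition depends on a variable that we update inside the loop:

```python
# A made-up countdown
countdown = 3

while countdown > 0:
    print("Launch in", countdown)
    countdown -= 1  # update the variable so the condition eventually becomes False

print("Liftoff!")
```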

Sticking with our temperature theme... Let's write an example where the temperature starts at -15° and increases by 2° at every iteration until it passes 10°. At each iteration, we print a message saying the current temperature and whether we are below, at or above freezing:

In [None]:
current_temperature = -15

# Your code here


Try changing the increment or the starting value to see the differences in the output.

**Remember:** It is **CRITICALLY IMPORTANT** to update the variable in the loop. Otherwise, the condition will always be true and the loop will run forever.

### Iteration with `for` loops

`for` loops are a bit simpler, since they run a pre-determined number of times: once for each item we give them. To use a `for` loop, we need something to iterate over. One basic example comes from the `range` function.

The `range` function takes **up to** three arguments:
```python
range(a, b, c)
```

The behaviour changes depending on how many arguments you give:

* `range(a)` - produce all numbers from `0` up to, but *excluding* `a`.
* `range(a, b)` - produce all numbers from `a` up to, but *excluding* `b`.
* `range(a, b, c)` - produce all numbers from `a` up to, but *excluding* `b`, incrementing by `c`.

**Note:** In the last case, the numbers can be decreasing if `b < a` and `c < 0`.
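To see which numbers a `range` produces, we can wrap it in `list()`, which collects them into a list. A quick sketch:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 6)))       # [2, 3, 4, 5]
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]
print(list(range(10, 0, -4)))  # [10, 6, 2]  (decreasing: b < a and c < 0)
```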

Now that we know about ranges, let's look at the `for` loop!

Here is the `for` loop syntax:
```python
for var_name in iterable:
    some_code

code_when_finished
```

At each step, the next item from our iterable is stored in `var_name`. If we have a list, then the next list item is considered. If we have a `range`, then the next number in the `range` is considered.
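A minimal sketch of a `for` loop over a `range`, adding up numbers:

```python
total = 0

# number takes the values 1, 2, 3 and 4 in turn
for number in range(1, 5):
    total += number

print(total)  # 10
```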

Let's see an example where we're calculating the squares of all numbers between 1 and 10 (excluding 10):

In [None]:
# Your code here


We can use many different types of objects in the `for` loop. These types are known as **iterables**.

We've seen a bunch of these iterables already!

#### String Iteration

We can easily use a **string** as an iterable in the `for` loop, like so:

```python
for c in my_string:
    do_something
```

**Note:** `c` is a single character in the string. 

Let's see a DNA example:

In [None]:
my_dna_sequence = "ACGGACAGGAGCGAGATTTGACAGCATTA"

number_of_purines = 0
number_of_pyrimidines = 0

# Your code here

print(f"In our sequence there are {number_of_purines} purines and {number_of_pyrimidines} pyrimidines.")

There's actually an easy way to clean up our boolean conditions. Instead of using string equality, we can check if the nucleotide is contained in a string using the `in` keyword.
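A sketch of the `in` keyword on strings. Since `in` checks for substrings, `"AG" in "AG"` is also `True`; that's fine here because we only test one character at a time:

```python
# `in` checks whether the left string appears inside the right string
print("A" in "AG")  # True
print("C" in "AG")  # False

# So a single condition covers both purines
nucleotide = "G"
if nucleotide in "AG":
    print(nucleotide, "is a purine")
```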

#### List Iteration

We can also easily iterate through a list, where we run code on each element in the list:

```python
for item in my_list:
    do_something...
```

But let's say we want to loop through the list **and** get the index of the element...

Well, we can use the `enumerate` function. At each step of the loop, it gives us a **tuple** containing the index and the item from the list.

**Note:** In the `for` loop, we can **immediately unpack** the tuple!
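A sketch of `enumerate` with immediate unpacking, on a made-up list of codons:

```python
codons = ["AUG", "GCC", "UAA"]

# enumerate() produces (index, item) tuples, which we unpack right away
for index, codon in enumerate(codons):
    print(index, codon)

# Looking at the tuples directly
pairs = list(enumerate(codons))
print(pairs)  # [(0, 'AUG'), (1, 'GCC'), (2, 'UAA')]
```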

In [None]:
my_list = [2, 4, 6, 5, 8, 7, 1, 3, 5, 7, 8, 9, 10, 22, 11, 95]

number_of_even = 0
number_of_odd = 0

last_even_index = -1
last_odd_index = -1

# Your code here to extract the number of odd and even and get the final indices of each


print("Our list has", number_of_even, "even numbers and", number_of_odd, "odd numbers.")
print("The last even number was at index", last_even_index, "and the last odd number was at index", last_odd_index)


#### Dictionary Iteration

We can also iterate over dictionaries!

Here's an example:

In [None]:
image_counts = {
    "microCT": 12,
    "FIB-SEM": 5,
    "confocal": 36,
    "STORM": 6,
    "cryoTEM": 2
}

# Compute the average number of datasets per modality and store it in average_count
number_of_datasets = 0

for modality in image_counts:
    n = image_counts[modality]
    number_of_datasets += n

average_count = number_of_datasets / len(image_counts)

print("The average number of image datasets is", average_count)

### Interrupting Loops

Sometimes, you may want to interrupt a loop early, or skip one iteration. For this, we have the keywords `break` and `continue`.

#### Stopping early using `break`

We use `break` if we want to stop going through a loop.
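A minimal sketch of `break` on a made-up list of readings: we stop at the first negative value. `None` is Python's way of saying "no value yet":

```python
readings = [4, 7, 2, -1, 8, 3]

first_negative = None  # no value yet

for value in readings:
    if value < 0:
        first_negative = value
        break  # leave the loop immediately; 8 and 3 are never checked
    print("Checked", value)

print("First negative reading:", first_negative)  # -1
```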

Let's say we want to iterate through a string and capitalise everything until we get to a period.

In [None]:
my_string = "Hello, world. This is code in Python."

new_string = ""

# Your code here

print(new_string)

If you have disregarded my advice and you've put a pure boolean into a `while` loop, then **make sure** that you have a `break` somewhere to get out of the infinite loop.

**Hint for later:** Think about a common biological system where this ability to interrupt might be helpful...

#### Skipping a beat using `continue`

But what if we don't want to stop early? What if we just want to skip an iteration?

The `continue` keyword skips the current iteration and moves on to the next step in the loop.
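A sketch of `continue`: we skip even numbers and keep only the odd ones. The `%` operator gives the remainder of a division:

```python
odd_numbers = []

for number in range(7):
    if number % 2 == 0:
        continue  # skip the rest of this iteration for even numbers
    odd_numbers.append(number)

print(odd_numbers)  # [1, 3, 5]
```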

For example, let's say we want to remove all the vowels from a string.

In [None]:
# Here's our string
my_string = "Hello, world! Our string has many vowels!"

VOWELS = ["a", "e", "i", "o", "u"]

new_string = ""
# Your code here

print(new_string)

These two keywords can also be used in `while` loops.

## Exercises: Working with Strings and Collections for DNA and Protein Processing

Earlier, we were looking at temperatures. Now, let's do some more biological exercises. DNA, RNA and proteins can be easily represented using Python collections. In these exercises, we're going to implement the fundamental gene expression steps: **transcription** and **translation**.

DNA and RNA are both **nucleic acids** that are composed of sequences of **nucleotides**. We won't go into the chemical details here (there are plenty of biology textbooks and Wikipedia pages for that), but here are the important ideas:

* DNA is composed of **adenine** (`A`), **thymine** (`T`), **guanine** (`G`) and **cytosine** (`C`).
  * `A` and `G` are known as **purines** and `T` and `C` are known as **pyrimidines**.
* DNA is double-stranded. Each strand consists of a sequence of nucleotides. These two strands interact with each other through **base pairing**. The rules for base pairing are:
  * `A` always pairs with `T`.
  * `C` always pairs with `G`.
* DNA and RNA share *most* of their nucleotides, but they differ in one of the pyrimidines. DNA has thymine `T` while RNA has uracil `U`.
* RNA is single-stranded.

DNA and RNA are (of course) more complicated, but these are the basics that will be helpful in these exercises.

### Transcription

**Transcription** is the process by which messenger RNA (mRNA) is produced based on DNA. Recall that DNA is **double-stranded**. One strand serves as the **template** for the mRNA, and the new nucleotides forming the mRNA base-pair with this template strand. To obtain an mRNA sequence based on a DNA sequence, we have two possibilities:

* If we are considering the *template* strand, we go backwards along the sequence, base-pairing each nucleotide to build up the mRNA sequence. Since RNA has `U` instead of `T`, an `A` in the template pairs with a `U` in the mRNA.
* If we are considering the *non-template* strand, we go along the sequence, replacing each `T` with a `U`.

Let's consider the following sequence:

```
AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA
```

1. Assume that this is the **non-template strand**. Transcribe this sequence into mRNA.

In [None]:
dna_sequence = "AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA"

# Put your code here


2. Now, let's assume it is the **template strand**. Perform the transcription.

In [None]:
dna_sequence = "AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA"

# Put your code here


### Translation (Part I)

After the DNA is transcribed into mRNA, the mRNA travels to the ribosomes, where it is translated into an amino acid sequence. Translation reads the mRNA in **codons** of 3 nucleotides each. But, remember, we need to look for a **start codon**, `AUG`.

I've given you an mRNA sequence. To prepare for translation, convert the mRNA sequence into a list of codons **starting with the start codon**. Print the number of codons you've found.

In [None]:
my_rna = "AGCAGCAUGACCGAGUCAGUCAGCUUGCGGCUACGUACUGGCCAUUAGCAGUACAGU"

# Your code here

In [None]:
my_rna = "AGCAGCAUGACCGAGUCAGUCAGCUUGCGGCUACGUACUGGCCAUUAGCAGUACAGU"

# Your code here

# Here are a few hints ...

# 1. Create an empty codon list


# 2. Find the start codon


# 3. Iterate over the string
for i in range(..., ..., ...):
    # 4. Get the codon...
    

    # 5. Add codon to list
    


### Translation (Part II)

Now that we have a list of codons, we can convert them to amino acids using a codon table. To make things a bit more interesting, I've given you the inverse codon table from https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables. This table has the amino acids as keys and the list of corresponding codons as the values.

**Hint:** As a first step, you may want to create the forward dictionary, with the codons as keys and the amino acid as value. This step isn't *necessary* but it will make your code more efficient (and look nicer).

**Recall:** Your list of codons from the mRNA sequence earlier should still be in the variable `my_codons`.

In [None]:
amino_acid_to_codon_table = {
    "F": ["UUU", "UUC"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "I": ["AUU", "AUC", "AUA"],
    "M": ["AUG"],
    "V": ["GUU", "GUC", "GUA", "GUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "T": ["ACU", "ACC", "ACA", "ACG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "Y": ["UAU", "UAC"],
    "STOP": ["UAA", "UAG", "UGA"],
    "H": ["CAU", "CAC"],
    "Q": ["CAA", "CAG"],
    "N": ["AAU", "AAC"],
    "K": ["AAA", "AAG"],
    "D": ["GAU", "GAC"],
    "E": ["GAA", "GAG"],
    "C": ["UGU", "UGC"],
    "W": ["UGG"],
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "G": ["GGU", "GGC", "GGA", "GGG"]
}
# Your code here


### Advanced Freezing Point Thermometer

We've seen a lot of examples of control flow using numbers. Well, we can also use strings.

For this exercise, let's update our temperature detector to be helpful for people who use Fahrenheit or Kelvin. Write code that takes the current temperature and a variable `units`. The `units` can be equal to `C`, `F` or `K`. Using these two variables and some control flow, determine whether the provided temperature is above, below or at freezing and print a message for each case.

In [None]:
# Store the current temperature and the units
current_temperature = 10
units = "F"

# Your code here for exercise with units


### Temperature Conversions

Recall that earlier we worked on converting between Celsius and Fahrenheit.

The conversion to Fahrenheit from Celsius is given by:
$$
    \text{F} = \frac{9}{5}\text{C} + 32
$$

To convert from Fahrenheit back to Celsius, we use the equation:
$$
    \text{C} = \frac{5}{9}(\text{F} - 32)
$$

(P.S. If you ever forget, here's an easy way to remember: the relationship is linear, and we know three points on the line. The two scales agree at -40° (-40°C = -40°F), water freezes at 32°F and 0°C, and water boils at 212°F and 100°C. With any two of these three points, you can find the line.)
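For example, taking the freezing and boiling points of water as our two points, the slope is the change in °F per °C and the intercept is the freezing point in °F. Plugging in $-40$ then checks the result:

$$
\text{slope} = \frac{212 - 32}{100 - 0} = \frac{9}{5},
\qquad
\text{F} = \frac{9}{5}\text{C} + 32,
\qquad
\frac{9}{5}(-40) + 32 = -72 + 32 = -40
$$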

Now, let's make our code a bit more complicated...

I've given you two variables: an input temperature and a string containing the input units.

Write code that uses the proper conversion based on the input units.

In [None]:
input_temperature = 60
input_units = "F"

# Your code here


Now, for another example, let's find the temperature in Fahrenheit for all Celsius temperatures from $-40^\circ \text{C}$ to $+35^\circ \text{C}$ (inclusive), incrementing by $5^\circ$.

**BONUS:** Write this code twice: once using a `for` loop and once using a `while` loop.

In [None]:
# Put your code here...


## BONUS: Replacing `for` Loops with `while` Loops

Any time that you use a `for` loop, you can actually use a `while` loop instead. It's just not always as nice and clean:

In [None]:
# Done using a `for` loop
for i in range(10):
    print("The value of i is now", i)
    # print("The operation 2 * i gives us:", 2 * i)


# Done using a `while` loop.
i = 0

while i < 10:
    print("The value of i is now", i)
    i += 1
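The same idea applies when a `for` loop iterates over a string or a list instead of a `range`. With a `while` loop, we have to create, check and update the index ourselves. Here is a minimal sketch using an arbitrary example sequence:

```python
sequence = "GATTACA"

# `for` loop version: Python hands us each character directly
for nt in sequence:
    print(nt)

# `while` loop version: we create, check and update the index ourselves
index = 0
while index < len(sequence):
    nt = sequence[index]
    print(nt)
    index += 1  # forgetting this line would create an infinite loop
```

The `for` version is shorter and can't accidentally loop forever, which is why it's usually preferred when you already know what you're iterating over.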

## Module Summary

Congratulations! You've made it through the control flow and loops module! Here's what we've seen:

* We can decide which lines of code to run (**control flow**) using **`if` statements**.
* We can repeat code using **`for` and `while` loops**.
* We can **iterate** over strings and collections using `for` loops.
* We can **interrupt** loops using `break` and `continue`.

# Module 4 - Introduction to Functions

In this module, we'll explore functions. By now, we've used existing functions, like `print`, as well as methods like `str.replace`. Here, we'll formally define what a function is and we'll see how to *define* new functions.

Here's the outline for this module:

1. Function Overview
    1. What is a function?
    2. Calling built-in functions
2. Writing Custom Functions
    1. Function definition
    2. Function parameters
    3. Function Return values
3. Documenting Functions
    1. Defining function docstrings
4. To script, or not to script?
5. **Exercises:** Writing functions for biological sequences


## Function Overview

Everything we've looked at so far has involved running a block of code. If we want to run it more than once, we need to put it into a loop...

But what if we want to run it once now on one input, and then later on another input?

Well... we'd have to copy the code.

But what if we then realise that there's a mistake we have to correct?

And then we realise we have the same code in 50 different places!

That's a lot of find-and-replace!

If only we could make the process easier...

### What is a Function?

Introducing **functions**!

We can think of functions as **machines** that take in **inputs**, run code (do calculations, magic or a bit of both), and then produce an **output** that can be used.

The inputs are known as *parameters* or *arguments* and the outputs are known as *return values*.

Here's a diagram to illustrate this.

![Function as a machine](../assets/function/Function.png)

Like anything in Python, a function has a **name**. When we use a function, this is known as **calling** the function. When we *call a function*, we tell it what data to use to perform the operations and where to store the result (typically in a variable). The syntax to call a function and store its result in a variable `x` is as follows:

```python
x = function_name(arguments_here)
```

Let's now see some examples with built-in functions.

### Calling built-in functions

We've actually already seen a function! We used the `print` function a while ago. This function takes a string that we want to display, shows it on the screen and doesn't return any output. Let's call this function:

In [None]:
# Your code here to call the `print` function


Python has many other **built-in functions**. You can find the full list in the [official documentation](https://docs.python.org/3/library/functions.html).

For example, we can use the `abs` function to take the absolute value of a number:

In [None]:
# Your code here to call the absolute value function on some input


Functions can also take multiple inputs. For example, we can round numbers using the `round` function, described [here](https://docs.python.org/3/library/functions.html#round):

In [None]:
# Your code here to call the round function on a decimal number 2.95 to 1 decimal place.


Here, `ndigits` is an optional **keyword argument**. Functions often have several of these; each one has a default value that is used if you don't provide your own. To specify what value a keyword argument should take, you write it inside the brackets like a variable assignment. In this case, since the keyword argument is called `ndigits` in the documentation and we want to set it to `1`, we write `ndigits=1` in the function call.

To run the function, we must **call it** by writing its name, and then including the arguments in brackets.

**Remember! Even if the function has no arguments, you must put the brackets!**

If the function **returns** a value, we can store it in a variable using the typical `=` assignment.
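To make this concrete, here is a short sketch using `round` and `abs` with arbitrary input values. It shows the same `ndigits` value passed by position and by keyword, and what happens if we forget the brackets:

```python
# Only the number is given: `ndigits` keeps its default, so round returns an int
whole = round(3.14159)

# The same `ndigits` value passed by position and by keyword
by_position = round(3.14159, 2)
by_keyword = round(3.14159, ndigits=2)

print(whole, by_position, by_keyword)

# Without the brackets, we get the function object itself, not a result
print(abs)        # <built-in function abs>
print(abs(-7))    # calling it gives the absolute value
```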

Let's explore the built-in [`round`](https://docs.python.org/3/library/functions.html#round) function:

In [None]:
# Your code here


We can learn more about any function using the built-in `help` function:

In [None]:
# Your code here


This help documentation, known as a **docstring**, tells us important information about the function. It describes the parameters and return values, as well as any quirks that the function may have.

In addition to using the `help` function, we can also read the docstring online, at the official Python documentation: https://docs.python.org/3/library/functions.html#round.
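As an example of such a quirk, the documentation for `round` notes that halfway values are rounded to the nearest *even* choice, and that some decimal numbers can't be stored exactly as floats. A small sketch (the `__doc__` attribute holds the docstring text itself):

```python
# Ties are rounded to the nearest even number
print(round(0.5), round(1.5), round(2.5))   # 0 2 2

# 2.675 is stored as a float slightly below 2.675, so it rounds down
print(round(2.675, 2))                       # 2.67

# The raw docstring text for `round`
print(round.__doc__)
```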

## Writing Custom Functions

Now that we've seen what functions are and how to *use* them, let's dive into **defining** our own.

Why write our own functions?

It's all good and fun to write all the steps you want to do line-by-line. But, let's say you want to run the same set of steps multiple times, potentially on different inputs. You could just copy-paste the code... but what happens if you have to change it? You'll have to change all the copies!

Instead of copying the code, we can write new **functions**.

### Function definition

In Python, functions are defined using the `def` keyword. The syntax is:

```python
def function_name(argument1, argument2, argument3, ..., argumentN):
    """
    documentation here
    """

    your_code_here...

    return some_value

```
Here are the important elements to notice when **defining** a function:

* The function definition begins with the `def` keyword. This is similar to the `function` keyword in JavaScript, or the `func` keyword in Swift.
* The **function name** follows the same rules as variable names. There are different naming conventions for names that consist of multiple words (`snake_case` vs `camelCase`). In Python, the style guide (PEP 8) recommends lowercase `snake_case` for function names, so by common convention the function name starts with a **lowercase** letter.
* After the function name, you can include a list of parameters in parentheses. **If your function takes no arguments, you must still put the brackets.** Each argument in the list must have a valid variable name. We'll discuss these in more detail later.
* After closing the argument list bracket, we put a **colon** (`:`).
* After the first line, we must **indent**. This tells Python where the function body begins and ends.
* We can start the body with a **docstring**, which describes the function. We'll discuss these more later.
* Then, you write your code as normal. In this function body, treat the arguments **like normal variables**.
* To **output** a result that can be used later, use the keyword `return`, followed by the result. We'll discuss this more later.
* Once you've finished defining the function, simply stop indenting. There's no need to close any brackets or type `end`.
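For reference, here is one small, complete function with each of these elements labelled. `count_codons` is a hypothetical example chosen for illustration: it counts how many complete three-letter codons fit in an RNA sequence. Running this cell only *defines* the function; nothing is printed until we call it.

```python
def count_codons(rna):              # `def`, function name, parameter list, colon
    """Return the number of complete codons in an RNA sequence."""  # docstring
    # The body is indented; `rna` behaves like a normal variable here
    n_codons = len(rna) // 3        # integer division drops any incomplete codon
    return n_codons                 # hand the result back to the caller

# Back at the original indentation level: the definition is finished
```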

To demonstrate, let's write a function with no arguments that simply prints a string onto the screen:

In [None]:
# Your code here


Wait! What happened? Or well, what didn't happen? We didn't see any string... What's going on?

Well, we only **defined** the function. To actually run the function we must *call* it. To call a function, simply write the name of the function, followed by the desired arguments in brackets. **If the function takes no arguments, you must still type the empty brackets.**

Let's call our function we just defined:

In [None]:
# Your code here


### Function Parameters

This function worked, but we didn't really put the *fun* in *function*.

We said that a function takes input and produces output... This does neither!!! So, let's create a function with some parameters! Let's look at the specific syntax:

```python
def my_function(arg1, arg2, arg3, ..., argN):
    my_code...

```

We separate the parameters with **commas (,)**. In the function body, we can then refer to each parameter just like any other variable; for example, `arg1` behaves like a normal variable.
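As a minimal sketch with two parameters, here is a hypothetical `count_motif` function built on the string method `str.count`, which counts *non-overlapping* occurrences. Arguments are matched to parameters by position, so swapping them changes the result:

```python
def count_motif(sequence, motif):
    """Count non-overlapping occurrences of `motif` in `sequence`, ignoring case."""
    # Both parameters are used just like ordinary variables
    return sequence.upper().count(motif.upper())

print(count_motif("ttgacaTTGACA", "TTGACA"))   # 2: sequence first, motif second
print(count_motif("TTGACA", "ttgacaTTGACA"))   # 0: swapping the arguments changes the meaning
```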

As an example, let's write a function that takes a DNA sequence as input and prints the transcribed RNA. To make it more interesting, let's add an extra parameter that indicates whether we are considering the sequence to be on the template strand or not. For simplicity, let's ignore the directionality of DNA.

Remember, if the DNA is on the template strand, we must perform base-pairing!

In [None]:
# Your code here


And now, let's call this function using a specific sequence.

In [None]:
my_sequence = "AATTAGCGAGCCGAATATATAGCCGCGATTCAGACAGTTCCAGCGCA"

# Your code here


This works well! Except, what if most of the time, we're going to call the function on the template strand? It would be nice if we didn't have to specify this argument every time we call the function and if we could give it a default value.

#### Keyword Arguments

Good news! We can set default values for function parameters. Parameters with a default value are known as *keyword* arguments, while parameters without one are known as *positional* arguments. To specify the default value, simply assign the value with `=`:

```python
def my_function(my_positional_arg, my_kw_arg=default_value):
    ...
```

Let's extend our transcription example to set a default value for the `is_template_strand` parameter:

In [None]:
# Your code here to modify the function

def transcribe(dna, is_template_strand):
    if not is_template_strand:
        rna_sequence = dna.replace("T", "U")
    else:
        base_pairs = {"A": "U", "C": "G", "T": "A", "G": "C"}
        rna_sequence = ""
        for nt in reversed(dna):
            rna_sequence += base_pairs[nt]
    print(rna_sequence)

Once this default is in place, we can call the function without specifying a value for the second parameter.

There are a few **important rules** to remember about positional and keyword arguments:

1. Positional arguments **always** come first, both when defining and when calling functions.
2. When calling a function, you **must** include **all** positional arguments, but you can omit keyword arguments (since they have default values).
3. Keyword arguments can be passed in **any order**, but positional arguments must be kept in the same order.
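The sketch below illustrates these rules with a hypothetical `describe_sequence` function (not part of the exercises), which has one positional parameter and two parameters with default values. The last two calls are commented out because Python would reject them:

```python
def describe_sequence(seq, name="unnamed", molecule="DNA"):
    """Print a one-line summary of a sequence."""
    print(f"{name} ({molecule}): {len(seq)} nt")

describe_sequence("ATGGCC")                                # both defaults used
describe_sequence("AUGGCC", molecule="RNA", name="mRNA-1") # keywords in any order
describe_sequence("ATGGCC", "gene-1")                      # extra values fill parameters in order

# describe_sequence(name="gene-1")            # TypeError: missing positional argument `seq`
# describe_sequence(name="gene-1", "ATGGCC")  # SyntaxError: positional argument after keyword
```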

### Function Return Values

So, we've seen how to pass information into functions, but now, how do we get information out? The answer is **return values**. These return values let us capture the result of a function, which we can then use like a normal variable in code. To return a value, we simply type `return` followed by the value we want to return.

Here's the syntax:
```python
def my_function(...):
    ...

    my_result = ...

    ...

    return my_result
```
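To see why returning is different from printing, compare these two hypothetical functions. A function with no `return` statement gives back `None`, Python's special value for "nothing", so storing its result isn't useful:

```python
def print_length(seq):
    print(len(seq))        # shows the value, but the caller can't reuse it

def get_length(seq):
    return len(seq)        # hands the value back to the caller

printed = print_length("ATGC")   # displays 4
returned = get_length("ATGC")    # displays nothing

print(printed)         # None
print(returned * 2)    # 8: a returned value can be used in further calculations
```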

Let's now switch our previous transcription function to *return* the mRNA instead of simply printing it:

In [None]:
# Your code here to modify the function to return a result

def transcribe(dna, is_template_strand = True):
    if not is_template_strand:
        rna_sequence = dna.replace("T", "U")
    else:
        base_pairs = {"A": "U", "C": "G", "T": "A", "G": "C"}
        rna_sequence = ""
        for nt in reversed(dna):
            rna_sequence += base_pairs[nt]
    print(rna_sequence)

So, this is how to return the value. Now, let's see how to capture and use it. To capture the value, we simply assign it to a variable, like normal, using the equal sign `=`.

In [None]:
# Your code here


**Note:** If your code has multiple branches, you can put multiple return statements in your code. **But**, once your code reaches the `return` line, the function **stops** and returns to the code that called it. Any code that you've written after the `return` statement **will not run**.

Let's just repeat that again: **Code underneath a `return` statement WILL NOT RUN.**

If you're using a good code editor, it will give you a warning about this "dead code".
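Here is a minimal sketch with one `return` per branch, using a hypothetical `classify_base` function. Each call stops at the first `return` it reaches, so the final `print` inside the function can never run:

```python
def classify_base(nt):
    """Return "purine", "pyrimidine" or "unknown" for a single nucleotide."""
    nt = nt.upper()
    if nt in ["A", "G"]:
        return "purine"
    elif nt in ["C", "T", "U"]:
        return "pyrimidine"
    return "unknown"
    print("This line is dead code and never runs")

for nt in "AcUN":
    print(nt, classify_base(nt))
```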

We can also return *multiple* values using tuples, lists or dictionaries. For example, let's say we want to count the number of each type of nucleotide in a sequence of DNA:
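As a generic illustration, separate from the nucleotide-counting example, here are two hypothetical functions. The first returns two values as a tuple, which the caller can unpack directly into two variables. The second returns a dictionary, so the results can be looked up by name instead of by position:

```python
def length_range(sequences):
    """Return the lengths of the shortest and longest sequences as a tuple.

    Assumes `sequences` contains at least one sequence.
    """
    lengths = []
    for seq in sequences:
        lengths.append(len(seq))
    return min(lengths), max(lengths)    # the comma packs both values into a tuple

def length_summary(sequences):
    """Return the same information as a dictionary with named entries."""
    shortest, longest = length_range(sequences)
    return {"shortest": shortest, "longest": longest}

reads = ["ATG", "ATGCCGT", "AT", "ATGCA"]

shortest, longest = length_range(reads)   # tuple unpacking
print(shortest, longest)
print(length_summary(reads)["longest"])
```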

In [None]:
# Your code here


Now, let's run this code on our example sequence:

In [None]:
# Your code here


This is great! But let's say you get this function from someone else to import and use in your own code. You don't want to have to find this function and read all the code just to use it... But how do we know what parameters this function takes and what values it returns?

## Documenting Functions

The answer to this question is **documentation**. Remember how we looked at the `help` for the `round` function earlier? We can do the same thing for our custom functions!

### Defining Function Docstrings

When defining a function, we can provide a *docstring*, which describes the important information about a function in a **human-readable** form. The docstring is just a string that a person can read to learn more about a function. If you're using a code editor or IDE, like VS Code or PyCharm, this string appears when you hover your mouse over a function. The information contained in this docstring can include:

* A brief description of the function.
* A longer description of the function. If you're implementing an existing approach, it could be good to include a citation here. You can also include equations here. In some documentation formats, some of this information may be better placed in a separate **Notes** section.
* A description of the function parameters, including their types.
* A description of the function return values, as well as their types. This is especially useful if you are returning multiple values and need to include their order.
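Here is a sketch of a docstring covering these points, written in the numpydoc style (with `Parameters` and `Returns` sections) mentioned below. `gc_fraction` is a hypothetical example function; because the docstring is stored with the function, `help` can display it:

```python
def gc_fraction(seq):
    """Compute the fraction of G and C bases in a DNA sequence.

    The calculation is case-insensitive. An empty sequence is treated
    as having a GC fraction of zero.

    Parameters
    ----------
    seq : str
        DNA sequence written with the letters A, C, G and T.

    Returns
    -------
    float
        Number of G and C bases divided by the sequence length.
    """
    seq = seq.upper()
    if len(seq) == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

help(gc_fraction)
```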

Let's clarify our previous example by adding a docstring:


In [None]:
# Your code here to add a docstring to our function

def count_nucleotides(dna_sequence):
    number_of_a = 0
    number_of_t = 0
    number_of_c = 0
    number_of_g = 0

    dna_sequence = dna_sequence.upper()

    for nt in dna_sequence:
        if nt == "A":
            number_of_a += 1
        elif nt == "T":
            number_of_t += 1
        elif nt == "C":
            number_of_c += 1
        elif nt == "G":
            number_of_g += 1
        # Any other character (e.g. "N" for an unknown base) is not counted

    return number_of_a, number_of_t, number_of_c, number_of_g

Now that we have a docstring, we can actually read it using the `help` function!

In [None]:
# Your code here to look at the help for count_nucleotides.


While there are not many rules for how to write docstrings, there are some guidelines laid out in the Python documentation in [PEP 257](https://peps.python.org/pep-0257/). There are also a number of common conventions used. One is the **numpydoc** style, which is used by the developers of the NumPy project. This style is described online [here](https://numpydoc.readthedocs.io/) and is integrated into some code editors.

## To script, or not to script?

That is the question...

Remember our discussion earlier about REPLs and IDEs...

Well, there's another related question... We've been working in a Jupyter notebook up until now...

But this may not always be the best choice.

Let's say you want to analyse a lot of DNA (thousands of sequences) and you don't need to see the results as they happen.

Or what if you want to process images in bulk and only look at the output files at the end?

In this case, you don't need a Jupyter notebook. Instead, you'll probably want to write a **script**.

A **script** is a Python file (`.py`) that contains all the code you want to run. It may also contain comments explaining what certain lines are doing.

But unlike this notebook, the code is really the star of the show.
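As a rough sketch, a small script (saved, for example, as a hypothetical `report_lengths.py`) might look like this, and would be run from a terminal with `python report_lengths.py`. The `if __name__ == "__main__":` check is a common Python convention: the code under it runs when the file is executed directly, but not when the file is imported from other code. (In a Jupyter notebook, `__name__` is also `"__main__"`, so running this as a cell calls `main()` too.)

```python
"""Hypothetical script: report the length of a few DNA sequences."""


def report_lengths(sequences):
    """Print each sequence followed by its length."""
    for seq in sequences:
        print(seq, len(seq))


def main():
    # In a real script, these might be read from a file instead
    sequences = ["ATGC", "GGCCAT", "TTAGGC"]
    report_lengths(sequences)


# Runs when the file is executed directly, but not when it is imported
if __name__ == "__main__":
    main()
```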

So, when should you use a Jupyter notebook and when should you write a script? Here are some things to consider:

* How much data do you want to process and how do you want to analyse it?
    * A lot of data, quickly, with minimal in-place analysis - script
    * A few specific examples, showing and commenting on specific processing steps - notebook
* Is your workflow well-established?
    * Yes - write a script to automate the process
    * No - start off with a notebook so that you can see the results of individual steps and easily tweak them and go back to specific cells

Why am I talking about this here and now?

We just finished talking about functions. Functions offer a great way to package up behaviour. But sometimes, you may want to fiddle around with the code before you package it up into a function, like we did with transcription in this workshop.

One philosophy for developing your code could be:

![development philosophy](../assets/scripts/ScriptProgression.png)

Following this philosophy, you can use a Jupyter notebook to figure out the logic for your code, then wrap it up in a function to make sure it works on a small set of data, and then move it into a script and/or a package to be able to easily run it on a large dataset and share it with others.

> **Note**
>
> This is not the only design philosophy. For example, you may approach a problem and immediately think about it in terms of which functions you need, and then do some iterative work where you test out little snippets in a REPL at the same time as writing your functions.

## Exercises: Writing Functions for Biological Sequences

### Amino Acid Properties

Proteins are made up of amino acids, linked together in polypeptide chains. There are 20 common amino acids, which have different properties. We'll focus on polarity and charge, grouping the amino acids into four categories:
1. Non-polar
2. Polar
3. Acidic
4. Basic

Let's write a function called `compute_amino_acid_properties` that takes a peptide sequence and returns the number of amino acids falling into each category. I've given you a dictionary with the amino acids and their properties as a starting point (obtained from [Wikipedia](https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables)).

To make things easier, I've also given you the function signature and the docstring.

In [None]:
AMINO_ACID_PROPERTIES = {
    "NON_POLAR": ["F", "L", "I", "M", "V", "P", "A", "W", "G"],
    "POLAR": ["S", "T", "Y", "Q", "N", "C"],
    "ACIDIC": ["D", "E"],
    "BASIC": ["H", "K", "R"]
}

# Your code here
def compute_amino_acid_properties(seq: str) -> dict[str, int]:
    """Compute the number of amino acids having different properties.

    Parameters
    ----------
    seq : str
        A peptide sequence, with each amino acid represented by its
        one-letter code.

    Returns
    -------
    dict[str, int]
        Dictionary mapping each amino acid property (``NON_POLAR``,
        ``POLAR``, ``ACIDIC`` and ``BASIC``) to the number of amino
        acids in ``seq`` that have this property.
    
    """
    

In [None]:
# Here's an artificial amino acid sequence to test with:
test_peptide = ("EDEQLPAMFYDHSRMGQDCTIQYRAFFKFKCDEVVICPRMCRFDM"
                "GYLSCNWPDQWQFWPPNPHTDSTWVSLDYPLRWDCCRKPHTFEPY"
                "TMHASWCTERDPDIWACIKDSWMSPFEPQGSWGSTELVKEDPGFF"
                "SVFALRPCVWAAPTT")

test_peptide_properties = compute_amino_acid_properties(test_peptide)

print(test_peptide_properties)

### Translation

Earlier we wrote code to perform translation. This code worked well, but it would be more helpful if we wrapped it into a function. In this exercise, write and document functions for translation based on the code from the previous module. Then test this function on some artificial mRNA sequences.

**BONUS:** Make the function a bit more robust to the input. Use string methods to make the function case-insensitive.

**Note:** You may choose to break down the process into *multiple* functions.

In [None]:
# Create the codon table
amino_acid_to_codon_table = {
    "F": ["UUU", "UUC"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "I": ["AUU", "AUC", "AUA"],
    "M": ["AUG"],
    "V": ["GUU", "GUC", "GUA", "GUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "T": ["ACU", "ACC", "ACA", "ACG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "Y": ["UAU", "UAC"],
    "STOP": ["UAA", "UAG", "UGA"],
    "H": ["CAU", "CAC"],
    "Q": ["CAA", "CAG"],
    "N": ["AAU", "AAC"],
    "K": ["AAA", "AAG"],
    "D": ["GAU", "GAC"],
    "E": ["GAA", "GAG"],
    "C": ["UGU", "UGC"],
    "W": ["UGG"],
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "G": ["GGU", "GGC", "GGA", "GGG"]
}

# Your code here


And now for the testing...

In [None]:
my_sequence = "AGCAGATGCATTAGCCATTAGTTTGCACCAGTATATGCAGAGTTTAGGAGACCATAATTAACGAGAGCCGATAGCTAGA"

my_mrna = transcribe(my_sequence)
my_peptide = translate(my_mrna)

print(my_peptide)

## Module Summary

In this module, we've explored **functions**. Specifically, we've seen:

* What functions are and how to **call** them.
* How to **define new functions**, which take in **parameters** and **return** results.
* How to **document** functions using **docstrings** to make them easier to understand and reuse.

That brings us to the end of the programming examples and theory... Now, let's look a bit to the future.

# Module 5 - Where to Go From Here

We're just about at the end of our workshop! Over the course of these few hours, we've seen the basics of variables and numbers, Booleans and strings, as well as more complicated collection types. We've also seen how to package up our code into functions to have repeatable units of behaviour.

So... what comes next?

## What to Learn Next? How?

What great questions! Well, there are still a bunch of topics that I didn't cover today.

We saw functions today. Functions are great and fun. You can write so much stuff... but what if you don't want to reinvent the wheel and rewrite everything from scratch?

Good news! You don't have to! With Python, it's very easy to install **packages** and import code from **modules**.

These topics will be covered in my upcoming workshop **Data Processing in Python**. Here's a little preview of what will be covered:

* Modules and packages: how to use code written by other people
* Array programming with NumPy: how to store large arrays of data
* Plotting with Matplotlib: how to make nice data plots
* Tabular data with pandas: how to process data in tables

*If you're interested in packages more specific to biology and bioinformatics, make sure to check out [**Biopython**](https://biopython.org/).*
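As a small preview that needs no installation, here is a sketch using two modules from Python's standard library. `import` loads a whole module, while `from ... import ...` pulls out a specific name. The example sequence is arbitrary:

```python
import math
from collections import Counter

# Counter tallies how many times each item appears
counts = Counter("GATTACA")
print(counts["A"], counts["T"])   # 3 2

# Functions inside a module are accessed with a dot
print(math.sqrt(16))              # 4.0
```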

How can you learn about other Python topics? There are plenty of resources out there. Keep your eyes open for other workshops! And check online for tutorials and videos. I'll talk a bit more about these soon.

## How to Get Help and How NOT to Get Help?

This section is based on the corresponding sections in two of my previous workshops (see [Intro to Python - Summer 2024](https://github.com/bzrudski/micm_intro_to_python_summer_2024) and [Intermediate Python - Summer 2024](https://github.com/bzrudski/micm_intermediate_python_summer_2024)).

When writing code, there are a bunch of resources that can help you!

### Your Code Editor

Yes! That's right! The software you're using to write code can give you lots of help. It can suggest completions, tell you when there are errors, and even help you reformat your files and restructure your code. So, please, please, please, **DO NOT** write your code in a simple text editor that has no additional features. And ***PLEASE*** don't use word processing software. Use software that is made for coding!

### Documentation

I mentioned this one earlier. Documentation isn't just something that you should write; it's also something you should read. Big established projects have big documentation. Take a look at their guides for getting started. For example, [Pandas](https://pandas.pydata.org/) has a [10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html) tutorial. Other packages, like [NumPy](https://numpy.org) and [Matplotlib](https://matplotlib.org) have very thorough guides and/or examples. Use these resources! If you want to learn how to use a function, **look it up** and read the paragraph about it. It will tell you how to use the arguments, any quirks to expect, and in some cases it will give you references to the papers behind the function. This is especially true in image processing and other fields that rely heavily on algorithms. So, the documentation will tell you not only how to use the code, but also **where it comes from**. And make sure to check out the official Python docs at https://docs.python.org/3/.

### Books

Books, books, books! There are tons and tons of books out there! For example, here are a few general books that are freely available online:
* *Think Python 2e* by Allen B. Downey (FREE book): https://greenteapress.com/wp/think-python-2e/
* *Data Structures and Information Retrieval in Python* also by Allen B. Downey (FREE book): https://greenteapress.com/wp/data-structures-and-information-retrieval-in-python/
* *Introduction to Python Programming* by Udayan Das et al., published by OpenStax: https://openstax.org/details/books/introduction-python-programming
* *The Hitchhiker's Guide to Python* by Kenneth Reitz and Tanya Schlusser: https://docs.python-guide.org/

There are also books online about more specialised topics, such as:

* Package development: *Python Packages* by Tomas Beuzen and Tiffany Timbers -- https://py-pkgs.org/
* Data science:
    * *Python for Data Analysis, 3E* by Wes McKinney -- https://wesmckinney.com/book/
    * *Python Data Science Handbook* by Jake VanderPlas -- https://jakevdp.github.io/PythonDataScienceHandbook/

Another book, which covers software development for research more generally and puts more emphasis on the tools used, is:

* *Research Software Engineering with Python* by Damien Irving, et al.: https://third-bit.com/py-rse/index.html

Through the databases at the McGill Library, we also have access to lots of books **for free**. Check out the library's online catalogue to see more.

### Tutorials

Tutorials are also great! And very much abundant! From more formal ones on sites like [freeCodeCamp](https://www.freecodecamp.org/) and [W3Schools](https://www.w3schools.com/python/default.asp) to less formal ones on [DEV](https://dev.to/), you can get lots of insight from these. There are also lots posted on Medium that you can check out. In addition to text-based tutorials, there are also videos on YouTube. And don't forget the official tutorials in the documentation! Tutorials are a very valuable resource that can help you see how to put pieces of code together in real-world examples.

**Want to try some bioinformatics examples?** Check out the Rosalind platform at <https://rosalind.info/problems/locations/>.

### Stack Overflow (and Pitfalls)

If you have a Python question, chances are that someone, somewhere has asked it on [Stack Overflow](https://stackoverflow.com/). Stack Overflow is a **great** resource for finding answers to real questions about programming. **But** make sure that you're using it properly. Try the other resources before going to Stack Overflow. The answer may turn out to be on the documentation page for the function you're looking for. If there's a link to the docs in a Stack Overflow answer, **use it**, and read the docs to learn more. Make sure that you understand the code that you're about to add to your project and don't just copy-paste it. Coding is a thinking game. Make sure that you have thought about all the code that you're putting in and that you understand why it's there. And use your judgement and intuition when borrowing that code. If it looks sketchy, it could very well be sketchy and there may be a better way.

### ChatGPT (and Pitfalls)

Everything I said above about Stack Overflow applies here. And more. Answers on Stack Overflow are written by humans who have usually written, tested and run the code themselves. Be careful when using ChatGPT for code (if you're allowed to use it at all). You don't have the same guarantee that a human has used this exact code in their own work, so make extra sure that the code makes sense and runs properly: test it! Don't trust it just because AI wrote it for you. Use your coding judgement and intuition.

Again, ALWAYS remember to **read the documentation**. Often, if you're stuck, the answer is **right there**. If it's not, then it's probably on Stack Overflow. It's often a good idea to check the documentation **first** to see if there's an official explanation or an official example. And don't just copy a Stack Overflow answer or sample code. Think about what the code is doing. Does it make sense? Is there a better way? Try to look line by line to understand what is going on (play around in the IPython interpreter or in a Jupyter notebook!).

## Other Cool Programming Topics

So, we've talked a bit about functions, but there's much more that you can look into to help build your programming skills and write code that others will want to use.

### Writing Packages

I mentioned earlier that you can install and use packages written by others. But you can also **write your own packages**. There are many great resources online about writing packages. The one that I most recommend is [this free online book](https://py-pkgs.org/): *Python Packages* by Tomas Beuzen and Tiffany Timbers. It's an easy read and helps you learn not only how to organise your code, but how to publish it, too. The authors also walk through how to render your own nice-looking documentation and host that online.

### Object-Oriented Programming

Writing code with loops and control flow is fun, but it's even better when we can combine everything into functions and classes and work in an **object-oriented** manner. This paradigm helps you organise your code differently, constructing building blocks that can work together to build elaborate programs.
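As a very small taste of this style (a hypothetical example, beyond what this workshop covers), a *class* bundles data together with the functions, called *methods*, that act on it:

```python
class DNASequence:
    """A DNA sequence together with the operations we want to run on it."""

    def __init__(self, bases):
        # Store the data on the object itself
        self.bases = bases.upper()

    def gc_count(self):
        """Return the number of G and C bases."""
        return self.bases.count("G") + self.bases.count("C")

    def transcribe_coding_strand(self):
        """Return the mRNA, treating the sequence as the coding strand."""
        return self.bases.replace("T", "U")


seq = DNASequence("atgcgt")
print(seq.gc_count())                  # 3
print(seq.transcribe_coding_strand())  # AUGCGU
```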

### Developing Graphical User Interfaces

Jupyter notebooks and command line scripts are powerful, but they aren't accessible for people who don't know how to code. Solution: build a graphical user interface! Using PyQt, the process is quite straightforward. Check out [this online tutorial series](https://www.pythonguis.com/) by Martin Fitzpatrick to learn about developing GUIs in Python.

### Hosting Projects on GitHub

What fun is a project if other people can't use it? By hosting your project on GitHub, you let others easily contribute to your project and build on it. Learning Git and GitHub is essential! And so are a few other skills along the way, like writing documents in Markdown. MiCM often has Git and GitHub workshops, so check out their workshop schedule!

## The End

We've reached the end of our workshop! For those of you who have previous programming experience, congratulations on adding another language to your repertoire. For those of you who are new, welcome to the world of programming! Just remember, programming is like art: you start with an empty text file and soon enough, you have hundreds (or thousands) of lines of code!

Don't hesitate to reach out if you have any further questions. Happy coding!

In [None]:
from time import sleep


print("Good luck with your programming future!", end=" ")

i = 1
s = "/-\\|"

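print(s[0], end="", flush=True)

while i < 10:
    # "\b" moves the cursor back one character, so the next symbol overwrites the last
    print("\b" + s[i % len(s)], end="", flush=True)
    i += 1
    sleep(0.5)

print("\b🎉")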