All notes are based on notes produced by Mike White for WashU Bio 5075 Fundamentals of Biostatistics, a first year DBBS graduate student course.

# 1. Using Markdown

Jupyter notebooks consist of a set of "cells." In these cells, you can write code or type text. You execute each cell by pressing shift-enter. In a markdown cell, you can write formatted text. In a code cell, you can write Python code.

This is useful because it's like a lab notebook: you can write down experimental methods and notes about the data, then analyze the data in the same place.

Key things to know:
1. Use the dropdown menu to select "code" or "markdown" cell type. (We won't use the other cell types.)
2. Text can be formatted using what's called **markdown**.
3. Type text and format it, then press shift-enter to execute the cell (display the formatted text).

---

## Select this cell. 

1. Now create your own new cell below it using the + button.
2. Change the cell type to Markdown using the drop down menu.
3. Write a sentence and press shift-enter.
4. Double click on the new cell to edit your sentence, then press shift-enter again.

---

Hello there -- how are you? Great

## Markdown format examples

(Double click the cell to make it editable, which allows you to see markdown formatting symbols.)


# Big Header
## Smaller Header
### Even Smaller Header

*Italic* text
**Bold** text

Use backticks to show short code examples as plain text:
`print('Hello World\n')`

Code example with syntax coloring:

```python
data = [1,2,3]
print(data)
```

Lists are created by typing a number followed by a period or a closing parenthesis:
1. Adenine
2. Cytosine
3. Guanine
4. Thymine

Math (LaTeX format) is enclosed by dollar signs:

$\textit{sample_mean} = \frac{\textit{sum}}{\textit{n}}$


# 2. Variables

An important concept in any programming language is the idea of a variable. Variables in programming languages are similar to, but not the same as, variables in math.

Variables are names referring to objects stored in the computer's memory. An object or a type of data is *assigned* to a variable (using, as we'll see, the `=` symbol, called the *assignment operator*). Once an object has been assigned to a variable, you work with that object by invoking the variable name in your code.

An important feature of variables is that they can be assigned different *types* of data: integers (`int`), decimal numbers (`float`), text (`str`, short for *strings* of characters), and boolean values (`bool`, either `True` or `False`).

In [1]:
# Assigning values to a variable 
# NOTE: the # symbol denotes comments in Python code. Comments are ignored by the Python interpreter

n = 25 # assigning an integer
o = 'adenine' # assigning a character string

In [2]:
# To see the value assigned to the variable n, use the print() function:

print(n)

25


In [3]:
# In a Jupyter notebook, a variable placed on the last line will be displayed automatically without using print()

n = 50
n

50

In [4]:
# Notebook displays only the LAST statement as output

n # won't be displayed
o # will be displayed

'adenine'

In [5]:
# To show both variables use print()

print(n, o)

50 adenine


In [6]:
# Variables can be assigned new values of any type - hence the name variable

n = 'cysteine' # n is now a text string
n

'cysteine'

## Know the type of value assigned to a variable

What you can do with a variable depends on the value assigned to it. You can add ten to a numerical variable, but not to a string.

One common cause of buggy code is trying to perform an operation on a variable with the wrong data type.

In [7]:
a = 50
a + 10 # you can add 10 to a number

60

In [8]:
b = 'adenine'
b + 10 # will give an error

TypeError: can only concatenate str (not "int") to str
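
One way to resolve this kind of error, shown here as a minimal sketch, is to convert one of the values so that both sides have the same type. `str()` and `int()` are built-in Python functions; the variable names `label` and `total` are made up for this illustration.

```python
b = 'adenine'

# Convert the number to a string, so + joins two strings
label = b + str(10)
print(label)

# Convert a numeric string to an integer, so + adds two numbers
total = int('50') + 10
print(total)
```

Which conversion is appropriate depends on what you actually want: text with a number attached, or arithmetic.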

In [9]:
# Use the type function to find out what type a variable is

c = True
d = 12.2

print(type(a), type(b), type(c), type(d)) # integer, string, boolean, floating point number

<class 'int'> <class 'str'> <class 'bool'> <class 'float'>


In [10]:
# A value that looks like a number can be stored as a number or as a string

n = 1
print(type(n))
n = '1'
print(type(n))


<class 'int'>
<class 'str'>
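
The distinction matters because the same operator behaves differently on each type. A small sketch (the names `as_number` and `as_string` are chosen only for illustration):

```python
as_number = 1 + 1      # two integers: + means addition
as_string = '1' + '1'  # two strings: + joins them end to end

print(as_number, as_string)  # prints: 2 11
```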


## Variables "capture" the result of operations

A line in the middle of your program might do something that produces an output. If you don't capture that output by assigning it to a variable, it will disappear.

In [11]:
# multiply `d` by 10:

print(d)
d*10 # not really useful because the result isn't assigned to a variable

12.2


122.0

In [12]:
# save the result:

answer = d*10

In [13]:
# now we can refer to that value later:

print(answer)

122.0


## Python reads assignment statements from right to left

Note how variable assignment works in the cell above. The variable name is to the left of the assignment operator. An operation that produces some value is on the right side.

Python takes a statement like the one above and works from right to left: the operation on the right is evaluated, and the result is assigned to the variable on the left.

You can put any sort of operation on the right side of the assignment operator, including other variables, as in the example below:

In [14]:
# Create some variables

data = 12
scale_factor = 2

In [15]:
scaled_data = data * scale_factor
print(scaled_data)

24
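
Because the right side is evaluated first, the same variable can appear on both sides of the assignment operator. A minimal sketch, using a hypothetical variable named `count`:

```python
count = 5
count = count + 1  # the right side (5 + 1) is evaluated first, then assigned back to count
print(count)       # prints: 6
```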


## Interlude: Don't get confused by the order of cells in the notebook

A Jupyter notebook consists of a set of cells that you generally run from top to bottom. But Python doesn't care about the order of the cells in your browser, only about the order in which the code was actually run.

In Jupyter, you can edit and rerun previous cells, but beware: if you change the value of a variable in a lower cell, and then rerun an upper cell, the upper cell will use the new value.

As an example, run the cell below, then rerun the cell above. Does the result make sense?

Notice the number in brackets to the left of each code cell. That refers to the order the cells were run in.

In [16]:
# When you rerun the previous cell, it will use the new value of scale_factor.
# Pay attention to the execution numbers in brackets!

scale_factor = 3
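
The sequence of events can be sketched as a single script. This is only an illustration: the lines are written top to bottom in the order the code would actually run, not in the order the cells appear in the notebook.

```python
data = 12
scale_factor = 2
scaled_data = data * scale_factor  # first run of the scaling cell
print(scaled_data)                 # prints: 24

scale_factor = 3                   # the lower cell runs next
scaled_data = data * scale_factor  # rerunning the upper cell now uses 3
print(scaled_data)                 # prints: 36
```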

In [17]:
# Variables need to be defined before you use them in an expression
# This produces an error because data2 is not defined:

scaled_data = data2 * scale_factor

NameError: name 'data2' is not defined

## Defining variables in terms of other variables

You can define one variable in terms of another variable, but know that variables defined by other variables are NOT dynamically updated.

In [19]:
# Define some variables

replicates = 4
mice_per_rep = 3

# Define new variable from existing variables
total_mice = mice_per_rep * replicates # math with variables
print(total_mice)

12


In [20]:
# Now let's change the number of replicates. What happens to the value of total_mice?

replicates = 5
print(total_mice) # Does the value change? What should you do to update it?

12
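
To bring `total_mice` up to date, the assignment has to be run again after `replicates` changes. A minimal sketch:

```python
replicates = 4
mice_per_rep = 3
total_mice = mice_per_rep * replicates

replicates = 5                          # total_mice still holds the old result here
total_mice = mice_per_rep * replicates  # run the assignment again to recompute it
print(total_mice)                       # prints: 15
```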


# 3. Key variable type: strings

To define a string, you put a *string* of characters between single or double quotes:

In [21]:
dna1 = 'ACGT'
print(dna1)
dna2 = "TCGA"
print(dna2)

ACGT
TCGA


In [22]:
# use triple quotes to span multiple lines in a string
rna = '''GACUAGCUA
GGGCUACAG
CCAGCAGCA'''

print(rna)

GACUAGCUA
GGGCUACAG
CCAGCAGCA


In [23]:
# Normally we just use single quotes, but sometimes double quotes are useful.
# This won't work - Python ends the string at the second quote mark:

word = 'don't'

SyntaxError: invalid syntax (<ipython-input-23-9e33455cc3ab>, line 4)

In [24]:
# This works

word = "don't"
print(word)

# You can also use the escape character \ to achieve the same thing:

word = 'don\'t'
print(word)

don't
don't


In [25]:
# DON'T CONFUSE VARIABLE NAMES WITH STRINGS!

print(word) # print the value of the variable word
print('word') # print the string word

don't
word


In [26]:
# Some math operators do double duty as operators on strings.
# We'll see more examples later on.
print(dna1)
print(dna2)
print(dna1 + dna2) # concatenation

ACGT
TCGA
ACGTTCGA


In [27]:
# The basic simple variable types

gene = "CDC28" # strings go between single or double quotes
n_mice = 13 # integer
protein_level = 1.76 # float
is_present = True # boolean can be True or False
is_present = False

# ACTIVITY 1: Variables

## Rules for naming variables:

1. Names are case sensitive: `DNA`, `Dna`, and `dna` are all distinct.
2. Don't begin with a number
3. Generally stick to upper and lower case letters, and numbers
4. Avoid these reserved names that are already used by Python:

`and, as, assert, async, await, break, class, continue, def, del, elif,
else, except, False, finally, for, from, global, if, import, in, is,
lambda, None, nonlocal, not, or, pass, raise, return, True, try,
while, with, yield`
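
Python can also report its reserved words itself, which is handy because the list varies slightly between Python versions. The sketch below uses the standard-library `keyword` module and the string method `isidentifier()`; the candidate names are made up for illustration.

```python
import keyword

# All reserved words for the Python version you are running
print(keyword.kwlist)

# Is a candidate name reserved?
lambda_is_reserved = keyword.iskeyword('lambda')
dna_is_reserved = keyword.iskeyword('dna')
print(lambda_is_reserved, dna_is_reserved)  # prints: True False

# Is a candidate name well formed? (names may not begin with a number)
# Note: isidentifier() checks only the form, not whether the name is reserved.
bad_name_ok = '2nd_sample'.isidentifier()
good_name_ok = 'sample_2'.isidentifier()
print(bad_name_ok, good_name_ok)  # prints: False True
```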


## Problem 1.1: Variable assignment

In the cell below, take 10 to the 4th power, and assign the result to a variable named `answer1`. To raise a number to a power, use `**`.

In [28]:
# Write your statement here
answer1 = 10**4
# Then print the answer
print(answer1)

10000


## Problem 1.2: Define variables using other variables

Variables can be defined by other variables. In the cell below, divide `answer1` by 2, and assign the result to `answer2`, and print the answer.

In [29]:
answer2 = answer1/2
print(answer2)

5000.0


## Problem 1.3: Variables from other variables are not dynamically updated

A variable whose value comes from another variable is **not** altered when the first variable is changed. 

To practice this, in the cell below re-assign `answer1` the value of 10 to the 5th power. Run the cell with *shift-enter*. Then print `answer2`. Did the value of `answer2` change?

(Bonus question: What data type is `answer2`?)

In [30]:
answer1 = 10**5
print(answer2)

type(answer2)

5000.0


float

## Problem 1.4: Copy a variable

You can make a copy of a variable by assigning one variable to a new variable. What happens to the copy when you change the value of the original variable?

In the cell below create a copy of `answer2` by assigning to a new variable, `answer3`. When you execute the cell, you see that both variables have the same value.

In the subsequent cell, change the value of `answer2` to `20`, then print both variables. What happened to `answer3` when the value of `answer2` was changed?

In [31]:
answer3 = answer2

In [32]:
answer2 = 20
print(answer2, answer3)

20 5000.0


## Problem 1.5: Capture the output of a function by assigning it to a variable

Functions like `type()` take an input and return an output. That output can be "captured" by a variable using the assignment operator. Use `type()` to determine the data type of `answer2`. Assign the result to `answer4` and print it out.

In [33]:
answer4 = type(answer2)
print(answer4)

<class 'int'>


In [34]:
# End of ACTIVITY 1

# 4. Key variable type: lists

A list is defined by square brackets and contains an ordered sequence of elements. A list can be *initialized* (created) as an empty list, like this: `[]`, or with values inside, like this: `[2,"cat",True]`.

In [35]:
# A list of floats - say four measurements from an experiment.

experiment1 = [10.2, 11.1, 11.0, 9.5]

You have a list. Now what? Here are some things you can do with lists:

In [36]:
# How many items are in the list? Use the length function len() to find out.

len(experiment1)

4

In [37]:
# Access individual values in a list using indexing. Remember, lists are ordered!
# COMPUTER SCIENTISTS BEGIN COUNTING FROM 0
# Rerun this cell a few times with different position values.

experiment1[2]

11.0

In [38]:
experiment1[5] # FAIL

IndexError: list index out of range

In [39]:
# Access the last element in the list by counting backwards from the end.

experiment1[-1]

9.5

In [40]:
# Extract a range of positions. This is called a "slice".
# To take a slice, the first number is the start position, the second is the *last position plus one*:

print(experiment1)
experiment1[1:3] # position 1 and 2

[10.2, 11.1, 11.0, 9.5]


[11.1, 11.0]

In [41]:
experiment1[:3] # position 0, 1, and 2

[10.2, 11.1, 11.0]
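
One consequence of the "last position plus one" rule is that two slices sharing a boundary split a list cleanly, with no overlap and nothing left out. A small sketch (the names `first_two` and `last_two` are chosen for illustration):

```python
experiment1 = [10.2, 11.1, 11.0, 9.5]

first_two = experiment1[:2]  # positions 0 and 1
last_two = experiment1[2:]   # position 2 through the end
print(first_two, last_two)

# Joining the two slices with + gives back the original list
print(first_two + last_two == experiment1)  # prints: True
```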

In [42]:
print(experiment1)
experiment1[::-1] # wait, what? use this funky syntax to reverse the order of the list

[10.2, 11.1, 11.0, 9.5]


[9.5, 11.0, 11.1, 10.2]

In [43]:
# You can change what's inside a list (lists are *mutable*).
# Replace values in a list using indexing.

print(experiment1)
experiment1[1] = 11.3
print(experiment1)

[10.2, 11.1, 11.0, 9.5]
[10.2, 11.3, 11.0, 9.5]


In [44]:
# Test for membership in a list with 'in'

11.3 in experiment1

True

In [45]:
# Add items to a list using append() - note the dot notation as we use this function.

print(experiment1)
experiment1.append(10.6)
print(experiment1)

[10.2, 11.3, 11.0, 9.5]
[10.2, 11.3, 11.0, 9.5, 10.6]


In [46]:
# Very useful function - count occurrences of a value

count_data = [0,1,4,2,4,0,0,2,6,3]
count_data.count(0) # how many zeros in our data?

3

In [47]:
# Math operators have a meaning with lists too - concatenate

experiment2 = [9.8, 12.1, 11.0, 10.3, 12.0]

print(experiment1)
print(experiment2)
experiments = experiment1 + experiment2
print(experiments)

[10.2, 11.3, 11.0, 9.5, 10.6]
[9.8, 12.1, 11.0, 10.3, 12.0]
[10.2, 11.3, 11.0, 9.5, 10.6, 9.8, 12.1, 11.0, 10.3, 12.0]


# 5. For loops

For loops are a common structure in computer code.
One example of when to use a for loop is when we want to run a few lines of code on every element in a list.
So, we say `for each_thing in my_list, do SOMETHING`:

```python
my_list = ['A', 'C', 'G', 'T']  # any list works here

for each_thing in my_list:
    # do something with the value of each_thing
    print(each_thing)
    # keep doing more stuff, whatever
    # (then Python moves on to the next thing in the list)
```

Notice how the body of the for loop is indented four spaces. A *for* loop gets *four* spaces: the pun isn't intentional, but you can remember it that way.

In [48]:
print(experiments)

# Now access each item in the list one at a time and print it out
for data_point in experiments:
    print(data_point)

[10.2, 11.3, 11.0, 9.5, 10.6, 9.8, 12.1, 11.0, 10.3, 12.0]
10.2
11.3
11.0
9.5
10.6
9.8
12.1
11.0
10.3
12.0


In [49]:
for x in [1,2,3,4,5]:
    print(x**2)

1
4
9
16
25


In [50]:
# Example of a for loop

total = 0
for i in experiments:
    total = total + i
print(total) # the total sum of experiments
print(len(experiments)) # the number of observations in experiments
print(total/len(experiments)) # the mean value of experiments

107.8
10
10.78
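
The loop above spells out every step. As a cross-check, Python's built-in `sum()` function adds up the items of a list in one call. This is a sketch, with the name `mean_value` chosen for illustration; depending on the Python version, the last decimal digits may differ slightly from the loop result because of floating-point rounding.

```python
experiments = [10.2, 11.3, 11.0, 9.5, 10.6, 9.8, 12.1, 11.0, 10.3, 12.0]

total = sum(experiments)               # add up all the values
mean_value = total / len(experiments)  # divide by the number of observations
print(mean_value)
```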


## Range function
The `range()` function lets a for loop iterate a fixed number of times.

`range(a,b)` represents numbers `[a, a+1, ... , b-1]`.

`range(b)` represents numbers `[0, 1, ... , b-1]`.

In [51]:
for n in range(3,10):
    print(n)

3
4
5
6
7
8
9
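
To see all the numbers a `range()` represents at once, it can be converted to a list with the built-in `list()` function. A short sketch (the variable names are arbitrary):

```python
from_three = list(range(3, 10))
from_zero = list(range(4))

print(from_three)  # prints: [3, 4, 5, 6, 7, 8, 9]
print(from_zero)   # prints: [0, 1, 2, 3]
```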


In [52]:
print(experiments)
for n in range(len(experiments)):
    print(experiments[n])

[10.2, 11.3, 11.0, 9.5, 10.6, 9.8, 12.1, 11.0, 10.3, 12.0]
10.2
11.3
11.0
9.5
10.6
9.8
12.1
11.0
10.3
12.0


# ACTIVITY 2: Lists and for loops

## Problem 2.1: Putting data into lists.

We're going to walk through an example of how to use lists to process some experimental data. Imagine that you performed a two-color luciferase reporter gene experiment. In the red channel, you measured the activity of a reporter gene under six different drug concentrations. In the green channel, you measured the expression of your control reporter at each drug concentration. Below are the results:

**Red:** 23, 145, 203, 235, 354, 456

**Green:** 5, 11, 6, 9, 8, 4

In the cell below, create two lists with these values - one called `red`, the other called `green`. 

In [53]:
red = [23, 145, 203, 235, 354, 456]
green = [5, 11, 6, 9, 8, 4]

## Problem 2.2: Write a for loop to print all values of the list red.

Print each value from the `red` list using a for loop.

In [54]:
for x in red:
    print(x)

23
145
203
235
354
456


## Problem 2.3: Create an empty list to hold normalized values

In the cell below create a blank list called `normalized` by assigning an empty list `[]` to it.
Later, we will use a `for` loop to add values to this empty list using the `.append()` function.
This empty list **must** be defined before we start our for loop. We create it first, and then add to it each time we move through the for loop.

In [59]:
# Create the empty list [] and assign it (=) to a variable called "normalized"
normalized = []

## Problem 2.4: Write a for loop to normalize red values by green

We want to use Python to normalize the red values by the green values. For each sample, we will divide the red value by the green value, then assign the answer to the next position in a list called `normalized`.

Hint:
1. Write a for loop that goes over the range (`range()`) of the length (`len()`) of the red list (same length as the green list).
2. Use the index of the for loop to get values from the red and green lists.
3. Then divide the red value by the green value, then append (`append()`) that ratio to the list `normalized`.

But first, read through this code and think how you can use this structure to solve the problem.

In [60]:
n_obs = len(red) # the length of the red list, aka the number of observations in our experiment

for x in range(n_obs): # for each value in the range [0, 1, ..., n_obs -1]
    print(red[x]) # print the red value corresponding to the index value x

23
145
203
235
354
456


Now use that structure to solve the problem!

In [61]:
normalized = [] # useful to redefine this as an empty list every time

n_obs = len(red)

for x in range(n_obs):
    normalized.append(red[x]/green[x])

print(normalized)

[4.6, 13.181818181818182, 33.833333333333336, 26.11111111111111, 44.25, 114.0]
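
The index-based loop above follows the hints. As an alternative sketch (not the approach this activity asks for), Python's built-in `zip()` function pairs up the items of two lists, so the loop needs no index. The loop variable names `r` and `g` are arbitrary.

```python
red = [23, 145, 203, 235, 354, 456]
green = [5, 11, 6, 9, 8, 4]

normalized = []
for r, g in zip(red, green):  # r and g are the red and green values for one sample
    normalized.append(r / g)

print(normalized)
```

Both versions assume the two lists have the same length. If they differ, `zip()` stops at the end of the shorter list, while the index-based loop raises an `IndexError` when `green` is the shorter one.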


In [36]:
# End of ACTIVITY 2