# Scientific Python Basics

*Prepared by: Tige Rustad*

*Heavily modified from the Software Carpentry IPython [notebook](https://swcarpentry.github.io/2014-03-17-ucb/lessons/tkcm-python/index.html) developed by Cindee Madison and Thomas Kluyver*

## 0. Python and Jupyter notebooks

Why Python and Jupyter? Python syntax is simpler than that of most other languages, and a wealth of libraries is available for free. Jupyter notebooks are a convenient way to run Python from inside a web browser while adding commentary.

 - What is the kernel?
 - Command mode
 - Keyboard commands
 - Cell types
 - Adding, moving, editing, and running cells
 - Simple markdown [(using this cheatsheet)](https://medium.com/ibm-data-science-experience/markdown-for-jupyter-notebooks-cheatsheet-386c05aeebed)

In [1]:
# This is a code cell, where we can type in and run chunks of code
# Anything that comes after a hash symbol (#) is a comment, and won't be run
# You can use this to leave notes, or to 'disable' a line of code without deleting it

# Click the run button 
# Notice how this cell now has an index number (e.g. In [1])
# While Jupyter is running code it will show In [*], but it's often too fast to see 

__TIP1:__ To run the code in a cell quickly, press Ctrl-Enter.

__TIP2:__ To enter command mode, press `Esc`. You can then add cells by pressing `b` for below or `a` for above. Click on the keyboard icon above to see more shortcut keys.

## 1. Variables

The most basic components of any programming language are "things", AKA variables.

The most common basic "things" in Python are integers, floats, strings, booleans, and various other things called "objects".

Try entering various things in the code cells below and then run the cell with Ctrl-Enter.

In [2]:
None   # An integer

In [3]:
None   # A floating point number

In [4]:
None   # A boolean value (True or False, And Capitalization Matters!)

In [5]:
None   # A string (A series of characters surrounded by single or double quotes)

In [6]:
# Try to run this cell with two of the things from above
None
None

In [7]:
# You can show multiple results using the print() command
# Try the same two variables using print()
print(None)
print(None)

None
None


#### Making named variables
You can assign variables using the equal sign. The name of the variable can only begin with an underscore or a letter (a-z, A-Z) which can be followed by letters, numbers (0-9) and underscores. For example: user_name, count, __name, pw123, etc. Use descriptive names, and don't worry too much about length! You can use tab autocomplete for variables. 

#### Python Variable Naming Conventions
Apart from the above-listed rules, there are some variable naming conventions the python community follows.

 - Variable names should be readable and descriptive.
 - Avoid using single letters like a, b, x, y, etc. as variable names: they don’t really explain anything about the nature of the variables.
 - Even though you can create variable names of any length, keep them short enough to be easy to read.

In [8]:
# Let's store some things as variables
observed_fold_change = 1.972
significant_fold_change = 2
pathogen = 'Mycobacterium tuberculosis'
acr_induced_hypoxia = True  # Remember, this is case sensitive

print(observed_fold_change, significant_fold_change, pathogen, acr_induced_hypoxia)

1.972 2 Mycobacterium tuberculosis True


In [9]:
# The type() function tells us the type of thing we have
# Check the type of the variables from above
# Try out tab autocompletion

print(type(None))
print(type(None))
print(type(None))
print(type(None))

<class 'NoneType'>
<class 'NoneType'>
<class 'NoneType'>
<class 'NoneType'>


In [10]:
# What happens when a new variable points to a previous variable?
prior_sig_fc = significant_fold_change

significant_fold_change = 1.9

# When we change significant_fold_change what happens to prior_sig_fc?

In [11]:
print ("prior_sig_fc:", None)
print ("significant_fold_change", None)

prior_sig_fc: None
significant_fold_change None


## 2. Commands that operate on things

Just storing data in variables isn't much use to us. Right away, we'd like to start performing
operations and manipulations on data and variables.

There are three very common means of performing an operation on a thing.

### 2.1 Use an operator

Most of the basic math operators (+  -  *  /  > %) work like you think they should for numbers. There are at least two exceptions. The equal sign is used in Python for assigning variables, so when you are asking if two things are equal you use a double equal sign, `==`. In Excel and many other programming languages you use the `^` symbol for exponents. Python instead uses `**`, so $9^3$ is `9**3` (in Python, `^` is the bitwise XOR operator).
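
Here is a small sketch of those two exceptions, using made-up numbers rather than anything from the exercises. The variable names `is_nine`, `cubed`, and `xor_result` are just for illustration.

```python
x = 9                 # a single = assigns the value 9 to x
is_nine = (x == 9)    # a double == asks "is x equal to 9?"
cubed = x ** 3        # exponent: 9 * 9 * 9
xor_result = x ^ 3    # NOT an exponent: bitwise XOR of 9 and 3

print(is_nine)        # True
print(cubed)          # 729
print(xor_result)     # 10
```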

In [12]:
# Standard math operators work as expected on numbers
# Set a and b to small integers and then run this cell
a = None
b = None
print(a + b)
print(a * b)
print(a ** b)  # a to the power of b (a^b does something completely different!)
print(a / b)  
print(a > b)

TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'

Whenever you make a comparison in Python, the result is a Boolean value that can only be `True` or `False`.

In [None]:
# Boolean operators compare two things
a = (1 > 3)
b = (3 == 3)
print(a)
print(b)
print(a or b)
print(a and b)

# Note that True and False will be interpreted as 1 and 0
print(True * 5)
print(False * 5)

Mathematical operators can also do some useful operations on other things, like strings.

In [None]:
# Some operators also work on strings
full_phrase = 'hello' + 'world'

print(None)
print('hello' * 3)
#print('hello' / 3)  # You can't do this!

***
### <font color=brown>Hands on practice</font>
Use the cell below to calculate $y = 3x^2 + 5x + 13$ when x = 9

In [None]:
x = 9
y = None

print (y)

In [None]:
# Note that parentheses change the order of operations as you would expect
y = (8-2*x)-(5*x-1)**2
print (y)

***
### 2.2 Use a function

These will be very familiar to anyone who has used Excel or programmed in any language, and work like you
would expect. You type the function name and then feed it the things that function will operate on inside parentheses. 

In [None]:
# There are thousands of functions that operate on things
# We've already used the type() and print() functions
print(type(3))

# Let's take a look at three more useful functions 
# len() gives you the length of a string 
# round() rounds to a number of decimal places
# sum() works on lists 

print(len('hello'))
print(round(3.3))
a = [4, 5, 6]
print(sum(a))

__TIP:__ To find out what a function does, you can type its name and then a question mark to
get a pop up help window. Or, to see what arguments it takes, you can type its name, an open
parenthesis, and hit tab.

In [None]:
round?
#round(
round(3.14159, 2)

**TIP:** If you need to find the right function or how to use a function, Google is your friend. I often look for answers on [Google](http://lmgtfy.com/?q=how+to+round+numbers+in+python)

***
### <font color=brown>Hands on practice</font>
Use the len(), sum() and round() functions to calculate the average of the following list of numbers, rounded to three decimal places.

In [None]:
test_list = [-11.33832235, 1.65312546, 2.42441466, 8.39480206, 2.62371567]
### Start code here### 
test_length = None
test_sum = None
test_ave = None
test_ave = None    #Round the test_ave

print(test_ave)

### 2.3 Use a method

Before we get any farther into the Python language, we have to say a word about "objects". We
will not be teaching object oriented programming in this class, but you will encounter objects
throughout Python. In fact, even seemingly simple things like `ints` and `strings` are actually
objects in Python.

In the simplest terms, you can think of an object as a small bundled "thing" that contains within
itself both data and functions that operate on that data, often called "methods". For example, strings in Python are
objects that contain a set of characters and also various functions that operate on the set of
characters.

Instead of the "normal" `function(arguments)` syntax, methods are used by typing a period after a variable, like this `variable.method(arguments)`.
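
To make the difference in syntax concrete, here is a minimal sketch that applies one function and two methods to the same string. The variable `species` is a made-up example and is not used elsewhere in this notebook.

```python
species = 'escherichia coli'   # hypothetical example string

# Function syntax: the thing goes inside the parentheses
print(len(species))            # 16

# Method syntax: the method comes after a period following the variable
print(species.capitalize())    # Escherichia coli
print(species.count('c'))      # 3
```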

In [None]:
# A string is actually an object
# Set the variable gene to the string '16s ribosomal rna'
gene = None

In [None]:
# Let's try out some of the bundled methods for strings

print(gene.upper())
print(gene.replace('rna', 'dna'))


***
### <font color=brown>Hands on practice</font>
1. Use the string method `upper()` to capitalize the string
2. Replace '...' with ', ' using the `replace` string method

In [None]:
fly_genes = "i'm not dead yet...ken and barbie...bride of sevenless"

# Capitalize the string
fly_genes = None

# Replace '...' with ', '
fly_genes = None

print (fly_genes)

## 3. Collections of things

Python provides us with a number of objects to handle collections of things.

Probably 99% of your work in scientific Python will use one of four types of collections:
`lists`, `tuples`, `dictionaries`, and `numpy arrays`. We'll look quickly at each of these and what
they can do for you.

### 3.1 Lists

Lists are probably the handiest and most flexible type of container. You can store any type of data in a list, in any combination.

Lists are declared with square brackets: `listname = [thing1, thing2]`.

Individual elements of a list can be selected using the syntax `listname[index]`.

In [None]:
# Let's say we want to store metadata for YourFavoriteGene1, YFG1
# We can make a list of the full name, gene length, genome position, and a brief description
YFG1 = ["YouFavoriteGene1", 1002, 1272872 , "The best. gene. ever."]
print(YFG1, type(YFG1))

#Note the different types of data

In [None]:
# All collections of things use square brackets to call for specific indices
# Use indices to print out the full name and description of YFG1
# REMEMBER: The first index is zero, not one

print(YFG1[None])
print(YFG1[None])

In [None]:
# You can count from the end of the list with negative indices
print('last item is:', YFG1[-1])
print('second to last item is:', YFG1[None])

#### Slicing off parts of collections
For any type of collection we use square brackets and index numbers to access specific values. We can also select a range of indices using what is known as 'slice notation'. The basic syntax is `collection[startIndex:endIndex]`. This will give you a slice of the collection from the startIndex to the endIndex, not including the end.
![image.png](attachment:image.png)
If you leave out an index value, Python reads that as the start or end of the collection, so for the string `'Monty Python'` we can select "Monty" with `[:5]` and "Python" with `[6:]`.
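
The "Monty" and "Python" slices mentioned above can be tried directly. This is a short sketch that assumes the string being sliced is `'Monty Python'`; the variable names are only for illustration.

```python
phrase = 'Monty Python'

first_word = phrase[:5]     # from the start up to, but not including, index 5
second_word = phrase[6:]    # from index 6 to the end
middle = phrase[3:8]        # indices 3, 4, 5, 6, 7

print(first_word)   # Monty
print(second_word)  # Python
print(middle)       # ty Py
```

Slicing works the same way on lists and tuples as it does on strings.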

In [None]:
# You can access multiple items from a list by slicing, using a colon between indexes
# REMEMBER: The end value is not inclusive
print('YFG name, size, and location =', None)
print('YFG location and function', None)

In [None]:
# Lists are objects, like everything else, and have methods such as append()
# Let's add the 54.6 kDa molecular weight of YFG1 to the end of our list

print(YFG1)
YFG1.append(54.6)
print(YFG1)

# Lists can hold nearly any other thing, including other lists
YFG1.append([1,2])
print(YFG1)

### 3.2 Tuples

We won't say a whole lot about tuples except to mention that they basically work just like lists, with
two major exceptions:

1. You declare tuples using () instead of []
1. Once you make a tuple, you can't change what's in it (referred to as immutable)

You'll see tuples come up throughout the Python language, and over time you'll develop a feel for when
to use them. 

In general, they're often used instead of lists:

1. To group items when the position in the collection is critical, such as coord = (x,y)
1. When you want to prevent accidental modification of the items, e.g. shape = (12,23)

In [None]:
# Let's make a tuple using the same items as the list above
YFGset = ("YouFavoriteGene1", 1002, 1272872 , "The best. gene. ever.")
print(type(YFGset))

# Try and change the description at index 3 
#YFGset[3] = "this won't work with a tuple"

### Anatomy of a traceback error

Traceback errors are `raised` when you try to make code do something it isn't meant to do. The traceback is meant to be informative, but like many things, it is not always as informative as we would like.

Looking at our error:

    TypeError                                 Traceback (most recent call last)
    <ipython-input-25-ed149295af18> in <module>()
          4 
          5 # Try and change the description at index 3
    ----> 6 YFGset[3] = "this won't work with a tuple"

    TypeError: 'tuple' object does not support item assignment
    
1. The command you tried to run raised a **TypeError**. This suggests you are using a variable in a way that its **type** doesn't support.
2. The arrow `---->` points to the line where the error occurred, in this case line 6 of the cell above.
3. Learning how to **read** a traceback error is an important skill to develop, and helps you know how to ask questions about what has gone wrong in your code.


### 3.3 Dictionaries

One problem with lists and tuples is that you have to remember what you stored at each position. Dictionaries are the collection to use when you want to store and retrieve things by their names (or some other kind of key) instead of by their position in the collection. 

Dictionaries are declared using {"thing1key" : thing1value, "thing2key" : thing2value}.

Let's make a dictionary of our YFG1 metadata.

In [None]:
# Make a dictionary of YFG1 metadata
YFG1meta = {'full_name' : 'YouFavoriteGene1',
            'gene_length' : 1002, 
            'genome_position' : 1272872, 
            'brief_description': 'The best. gene. ever.'
            }


print(YFG1meta)

print(YFG1meta['genome_position'])

# This won't work because there is no key 2 (dictionaries are looked up by key, not by position)
print(YFG1meta[2])

In [None]:
# Like lists, dictionaries are 'mutable', so you can add, remove, and change values
# Add a new key:value pair
YFG1meta['readDepth'] = 571
print(YFG1meta)
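
The cell above adds a key. Changing and removing values follow the same pattern, as in this rough sketch. It uses a hypothetical dictionary, `gene_meta`, so that `YFG1meta` is left untouched.

```python
# gene_meta is a hypothetical dictionary used only for this illustration
gene_meta = {'full_name': 'ExampleGene', 'gene_length': 900}

gene_meta['gene_length'] = 903        # change the value stored under an existing key
gene_meta['strand'] = '+'             # add a new key:value pair
del gene_meta['full_name']            # remove a key and its value

print(gene_meta)                      # {'gene_length': 903, 'strand': '+'}
print('strand' in gene_meta)          # True
print('full_name' in gene_meta)       # False
```

The `in` operator checks whether a key is present, which is a handy way to avoid the error you get when asking for a key that doesn't exist.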



***


# End of Class 2
We will pick up here on Monday. Today's homework is a separate Jupyter notebook in this same folder. **Don't over-complicate the homework**. There are only a few lines of code for you to write, and they are all variations of things we've done in this class.


***

# Class 3

Welcome back! 

This class will finish our tour of `python` by covering three of the most important tools for programming:
 - `for` loops
 - `if` and `else` for making choices
 - making your own functions 

Let's do a quick review of how to store and operate on variables.

***
### <font color=brown>Hands on practice</font>


Let's make and store some variables like we did last week. Let's say you want to know if YFG1 expression goes up when you add the drug deathamous trioxide, or DMT. You have an experiment where you treated three samples with DMT and three samples with no DMT. You've extracted RNA and now have read counts for YFG1 with and without drug. Before you can compare the results you have to normalize for the total number of read counts. I've given you a list variable `reads` that has the six YFG1 read counts followed by the six total read counts, in the same sample order.

Before we can work with the data, however, we need to split the list so we have one list for the YFG1 read counts and one for the total reads.

1. Split the `reads` list into `read_counts` and `total_reads` using slicing.
1. For the first drug treated sample, calculate the counts per million total reads (CPM). The equation for that is:
##  $CPM = \frac {10^6}{totalreads} * readcounts$ 
1. Print the result


In [13]:
reads = [2259, 3011, 2792, 4028, 3235, 3719, 3042555, 2637390, 3196661, 3387062, 1957626, 2076348]
samplenames = ["DMT1", "DMT2", "DMT3", "NoDrug1", "NoDrug2", "NoDrug3"]

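In [15]:
### Start code here ### (~4 lines)
read_counts = reads[:6]   # the first six values are the YFG1 read counts
total_reads = reads[6:]   # the last six values are the total reads per sample
CPM_first_sample = 10**6 / total_reads[0] * read_counts[0]
print(round(CPM_first_sample, 2))
### End code here ###

742.47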

***
Now let's use these variables to learn how to make for loops

## 4. Repeating yourself

The two most common ways of repeating operations are for loops and while loops. For loops in Python are useful when you want to cycle over all of the items in a collection (such as all of the elements of a list), and while loops are useful when you want to cycle for an indefinite amount of time until some condition is met.

The basic examples below will work for looping over lists, tuples, and arrays, which we will cover next class. 

In [16]:
# A basic for loop - don't forget the white space!
for sample in samplenames:
    print(sample + ':')

DMT1:
DMT2:
DMT3:
NoDrug1:
NoDrug2:
NoDrug3:


**Note on indentation**: Notice the indentation once we enter the for loop. Every indented statement after the for loop declaration is part of the for loop. This rule holds true for while loops, if statements, functions, etc. Required indentation is one of the reasons Python is such a beautiful language to read.

If you do not have consistent indentation you will get an `IndentationError`. Fortunately, most code editors will ensure your indentation is correct.

__NOTE:__ In Python the convention is to use four (4) spaces for each level of indentation; most editors can be configured to follow this convention.

In [17]:
# Indentation error: Fix it!
for sample in samplenames:
    lowercase = sample.lower()
    print(lowercase + '!') # Bad indent

dmt1!
dmt2!
dmt3!
nodrug1!
nodrug2!
nodrug3!


In [19]:
# You can sum all of the values in a collection using a for loop
numlist = [1, 4, 77, 3]

total = 0
for num in numlist:
    total = total + num
    
    print("Sum is", total)

Sum is 1
Sum is 5
Sum is 82
Sum is 85
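
Because `print` is indented in the cell above, it is part of the loop and runs on every pass. As a sketch of how indentation changes behavior, here is the same loop with the `print` moved out of the loop body, so that only the final sum is shown.

```python
numlist = [1, 4, 77, 3]

total = 0
for num in numlist:
    total = total + num      # indented: runs once per item

print("Sum is", total)       # not indented: runs once, after the loop ends
```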


In [20]:
# While loops are useful when you don't know how many steps you will need,
# and want to stop once a certain condition is met.
step = 0
prod = 1
while prod < 100:
    step = step + 1
    prod = prod * 2
    print(step, prod)
    
print('Reached a product of', prod, 'at step number', step)

1 2
2 4
3 8
4 16
5 32
6 64
7 128
Reached a product of 128 at step number 7


In [21]:
# Often we want to loop over the indexes of a collection, not just the items
# python gives us a handy way to do that: enumerate()

for i, sample in enumerate(samplenames):
    print(i+1, sample, samplenames[i])

1 DMT1 DMT1
2 DMT2 DMT2
3 DMT3 DMT3
4 NoDrug1 NoDrug1
5 NoDrug2 NoDrug2
6 NoDrug3 NoDrug3
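
The cell above prints `i+1` so that the numbering starts at one. As an alternative sketch, `enumerate()` also accepts a `start` argument that sets the first number. The list `sample_labels` is a shortened, made-up stand-in for `samplenames`.

```python
sample_labels = ["DMT1", "DMT2", "NoDrug1"]   # shortened, hypothetical list

# start=1 changes the counter only; the items themselves are unaffected
for number, label in enumerate(sample_labels, start=1):
    print(number, label)
```

Note that with `start=1` the counter no longer matches the list index, so `sample_labels[number]` would point at the wrong item.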


In [22]:
# Sometimes we want to loop through multiple lists
# you can use zip() to zip together lists
a = [1, 2, 3]
b = [3, 4, 5]
for i, j in zip(a,b):
    print (i, j, i*j)

1 3 3
2 4 8
3 5 15


***
### <font color=brown>Hands on practice</font>

1. Make an empty new list `CPM` for storing the results
2. Write a for loop that calculates the CPM for all of the samples and stores those values in `CPM` 

**Hint**: you'll need to use zip

3. Use enumerate to print a little table of the sample names and CPM


In [None]:
# To make an empty list, make a list but don't put anything in it...
CPM = None

# use the same calculation from above to calculate the sample_cpm
# (the loop variable is named count so it doesn't overwrite the reads list)
for count, total in zip(None, None):
    sample_cpm = None
    CPM.append(sample_cpm)
    
# Now let's print out a little table
for i, cpm in enumerate(None):
    # print out the index i, the i-th samplename, and the cpm
    print(None, "\t", None, "\t", None)

## 5. Making choices

Often we want to check if a condition is True and take one action if it is, and another action if the
condition is False. We can achieve this in Python with an if statement.

__TIP:__ You can use any expression that returns a boolean value (True or False) in an if statement.
Common boolean operators are:
- equals        ==
- doesn't equal !=
- <, <=, >, >=

To do this in Python we use `if` followed by the conditional expression and then a `:`. When you hit enter, Jupyter will automatically indent four spaces. Everything indented after the if statement will run if the condition is `True`, and won't if it's `False`.



In [None]:
# Simple if statements
x = 3
if x > 0:
    print('x is positive')
    
if x < 0:
    print('x is negative')
if x == 0:
    print('x is zero')

If you want to have some code run if the result is False, you can use `else`. And you can chain together a whole string of conditionals using `elif`.

In [None]:
# An if / elif / else chain: only the first branch whose condition is True will run
x = 3
if x > 0:
    print('x is positive')
elif x < 0:
    print('x is negative')
else:
    print('x is zero')

In [None]:
# You can also calculate your boolean value, store it as a variable, and then use that instead of writing out the conditional

x = -1
test = x > 0

if test:
    print('Test was true')
else:
    print('Test was false')

***
### <font color=brown>Hands on practice</font>

For loops and if statements work great together. Let's go back to our example data set and make a for loop that prints all of the samples that have more than 3,000 reads for YFG1.

In [None]:
# Let's look at the data again
print (samplenames)
print (read_counts)

In [None]:
for sample, count in zip(samplenames, read_counts):
    if None:
        print (sample, count)

## 6. Creating chunks with functions and modules

One way to write a program is to simply string together commands, like the ones described above, in a long
file, and then to run that file to generate your results. This may work, but it can be cognitively difficult
to follow the logic of programs written in this style. Also, it does not allow you to reuse your code
easily.

The most important ways to "chunk" code into more manageable pieces are to create functions and then
to gather these functions into modules. Below we will discuss how to create
functions and modules. A third common type of "chunk" in Python is classes, but we will not be covering
object-oriented programming in this workshop.

We've used many built-in Python functions in the class already, and we have used methods, which are functions built into objects. In the final part of this class we will build our own functions using `def`.

In [None]:
# It's very easy to write your own functions
def multiply(x, y):
    return x*y

With `def` you define a new function. After the name of the function you put what arguments you'd like to feed that function inside parentheses, in this case `x` and `y`. There are a lot of fancy things you can do with these arguments, including presetting values. 

Once you've done whatever you want the function to do, you send back the results using `return`. 
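
Returning a value is not the same as printing it. The sketch below contrasts the two. `show_product` is a hypothetical function written only for this comparison.

```python
def multiply(x, y):
    return x * y          # hands the result back to whoever called the function


def show_product(x, y):
    print(x * y)          # displays the result but returns nothing


result = multiply(4, 3)        # result now holds 12 and can be reused
nothing = show_product(4, 3)   # prints 12, but nothing is set to None

print(result + 1)              # 13
print(nothing)                 # None
```

A function without a `return` value still runs, but it hands back `None`, so its result can't be used in later calculations.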

In [None]:
# Once a function is defined and saved in memory, it's available just like any other function
print(type(multiply))
print(multiply(4, 3))

In [None]:
# It's useful to include docstrings to describe what your function does
def say_hello(time, people):
    '''
    Function says a greeting. Useful for engendering goodwill
    '''
    return 'Good ' + time + ', ' + people

**Docstrings**: A docstring is a string placed at the very top of a function body that describes what the function does. You can see it when you ask for help about the function.
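
As a quick sketch of where a docstring shows up, the example below defines a hypothetical function, `fold_change`, and then displays its docstring in two ways.

```python
def fold_change(treated, control):
    '''
    Return the ratio of treated to control expression.
    '''
    return treated / control


print(fold_change.__doc__)   # the raw docstring text
help(fold_change)            # the formatted help page, which includes the docstring
```

In a Jupyter notebook, typing `fold_change?` in a cell shows the same information in the pop up help window.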

In [None]:
say_hello('afternoon', 'friends')

In [None]:
# All arguments must be present, or calling the function will raise an error
say_hello('afternoon')

In [None]:
# Keyword arguments can be used to make some arguments optional by giving them a default value
# All mandatory arguments must come first, in order
def say_hello(time, people='friends'):
    return 'Good ' + time + ', ' + people

In [None]:
say_hello('afternoon')

In [None]:
say_hello('afternoon', 'students')
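
Arguments can also be passed by name instead of by position. The sketch below repeats the `say_hello` definition from above so that it runs on its own; the variable names holding the results are only for illustration.

```python
def say_hello(time, people='friends'):
    return 'Good ' + time + ', ' + people


by_position = say_hello('evening', 'students')
by_name = say_hello(people='students', time='evening')   # order no longer matters
default_used = say_hello(time='evening')                 # people falls back to 'friends'

print(by_position)
print(by_name)
print(default_used)
```

Naming the arguments makes a call easier to read when a function takes several inputs.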

### Class Three Homework

Let's make a new function that calculates the mean and variance using our sample data above. There are much better ways of doing this with `numpy` and `pandas`, which we'll cover in the next class. But for now, let's do it the hard way so we really appreciate the easy way.

The formula for calculating variance is:
###  $variance = \frac{\Sigma{(x-mean)^2}}{n-1}$

For this function you will need to:
1. Calculate the mean of the cpm values using the sum() and len() functions
2. Write a for loop that calculates $(x-mean)^2$ for each sample and keeps a cumulative sum of the results
3. Divide the result by the number of samples minus one
4. Print out the mean and variance 
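
One way to check your function once it is written: Python's built-in `statistics` module has a `variance()` function that also divides by n - 1. The sketch below compares it with a calculation worked by hand on a few made-up numbers (not the CPM data).

```python
import statistics

values = [2, 4, 4, 6]   # made-up numbers, not the CPM data

# Worked by hand: the mean is 4, the squared deviations are 4, 0, 0, 4,
# so the variance is 8 / (4 - 1)
by_hand = 8 / 3

print(by_hand)
print(statistics.variance(values))   # sample variance (divides by n - 1)
```

If your own function gives the same answer as `statistics.variance()` on the same list, the loop is probably doing the right thing.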



In [None]:
def variance (CPM):
    '''
    A function for calculating variance.
    '''
    # Calculate the sample size and the mean
    sample_size = None
    mean = None

    sq_dev_sum = 0

    # Calculate the squared deviation from the mean for each sample
    for cpm in CPM:
        # Handy shortcut: instead of a = a + 3 you can use a += 3 
        sq_dev_sum += None

    variance = None
    
    print ("The mean is:", mean)
    print ("The variance is:", variance)

    return 

### Putting the `variance` function in a module

We can make our functions more easily reusable by placing them into modules that we can import, just
like `numpy`, which we'll use in the next class. It's pretty simple to do this.

1. Copy your function(s) into a new text file, in the same directory as this notebook,
called `var.py`.
1. In the cell below, type `import var` to import the module. Type `var.` and hit tab to see the available
functions in the module.
1. Now use your variance function for all of the samples together, and the drug treated and control samples separately.