Author: **David Heathcote**

# 1. First steps with *Python* on *Jupyter*

### Goals


- learn how to use *Jupyter Lab* notebooks
- learn basics of the *Python* syntax

Recommended resource: https://www.kaggle.com

## 1.1 Usage of ```Jupyter Lab```

- Notebooks are made of a sequence of cells
- Cells can contain different content such as Python code or Markdown ([Markdown basics](https://www.markdownguide.org/basic-syntax))
- You can change the cell type in the toolbar
- To execute a cell press "Shift+Return"
- The value of the last line is displayed below the cell (to suppress this, end the last line with a semicolon)
- Use the toolbar to add, delete, copy, or insert cells
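
The display rule from the list above can be tried out with a small cell like the following. This is only a sketch; the variable name `subtotal` is made up for the illustration.

```python
# In a notebook cell, only the value of the LAST expression is displayed automatically.
subtotal = 2 + 3      # an assignment displays nothing
subtotal * 2          # a bare expression in the middle of a cell is not displayed either
print(subtotal)       # print() always writes its argument to the output
subtotal * 10;        # the trailing semicolon suppresses the automatic display
```

Removing the semicolon from the last line should make the notebook show the value of `subtotal * 10` below the cell.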

### *1.1 TASK*

1. Edit the first line of this notebook and enter your own name.
2. Python was named for the British comedy troupe Monty Python, so why not make our first Python program an homage to their famous Spam skit? Just for fun, try reading over the code below and predicting together with your neighbor what it's going to do when run. (If you have no idea, that's fine!)
Then execute the cell to see the results of our little program.

In [None]:
spam_amount = 0
print(spam_amount)

# Ordering Spam, egg, Spam, Spam, bacon and Spam (4 more servings of Spam)
spam_amount = spam_amount + 4

if spam_amount > 0:
    print("But I don't want ANY spam!")

viking_song = "Spam " * spam_amount
print(viking_song)

## 1.2 Python

A variable is a name that holds a value, and that value may change. In the simplest terms, a variable is a box that you can put things in. Variables can store all kinds of values, for example the integer 123456:

In [None]:
variable_0 = 123456
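
To illustrate that the content of the "box" may change, here is a short sketch that re-assigns the same name several times. The new values are arbitrary examples.

```python
# A variable is a name bound to a value; it can be re-bound at any time.
variable_0 = 123456
print(variable_0, type(variable_0))

variable_0 = variable_0 + 1      # use the old value to compute a new one
print(variable_0)

variable_0 = "now a string"      # the same name may later hold a value of another type
print(variable_0, type(variable_0))
```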

#### Data types / objects

A selection of frequently used Python data types / objects is given here:

|Data type   | Examples                                                 |
|------------|----------------------------------------------------------|
|```bool```  |either ```True``` or ```False```                          |
|```int```   |1, 6, -1, 0, 3244, ...                                    |
|```float``` |3.14, -43535.345, 0.0, ...                                |
|```str```   |"Hello world!", "nothing", ...                            |
|```tuple``` |(1,2), (1231.32, ```True```, "Hello world!", None), ...    |
|```list```  |[```True```, 1, 3.14, "Hello!", [1,2,34]], ...            |
|```dict```  |{"some key": 1.24233, "another key": "anything"}, ...     |

*Examples:*

In [None]:
# bool 
a = True
b = False

In [None]:
# int
c = -2
d = 3

In [None]:
# float
e = 3.1
f = -2342.4324

In [None]:
# str
g = "Hello!"

In [None]:
# list
h = [1,5,23,-1]
j = [a,b,c,d,e,f,g,h,"I can put anything into a list! :)"]
k = list(range(10))  # range(10) alone is a lazy sequence, not a list

In [None]:
# dict
l = {"key1": "any content", "key2": 13, "key1000": [1,4,6]}
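
The container types from the table are mostly useful once you read from them and modify them. The following sketch shows indexing, appending, and key lookup; the names `point`, `numbers`, and `record` are chosen only for this illustration.

```python
point = (1.5, -2.0)                            # tuple: fixed length, cannot be modified
numbers = [1, 5, 23, -1]                       # list: ordered, can be modified
record = {"key1": "any content", "key2": 13}   # dict: maps keys to values

first_number = numbers[0]     # indexing starts at 0
last_number = numbers[-1]     # negative indices count from the end
numbers.append(42)            # lists can grow

value = record["key2"]        # look up a value by its key
record["key3"] = [1, 4, 6]    # add a new key/value pair

x, y = point                  # unpack the tuple into two variables

print(first_number, last_number, numbers)
print(value, sorted(record))
print(x, y)
```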

#### Operators

- Comparisons: "==", ">", "<", ">=", "<=", ...
- Arithmetic: "+", "-", "*", "/", "//", ...

In [None]:
# Comparisons
a == b

In [None]:
b == False

In [None]:
c > d

In [None]:
c >= c

In [None]:
# If statements
if g == "Hello!":
    print("'g' equals the string 'Hello!'")
else:
    print("'g' does not equal the string 'Hello!'.")
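
When there are more than two cases, `elif` can be placed between `if` and `else`. A minimal sketch, using the value of `c` from the examples above and a made-up variable `sign`:

```python
c = -2

# Only the first branch whose condition is True is executed.
if c > 0:
    sign = "positive"
elif c == 0:
    sign = "zero"
else:
    sign = "negative"

print(f"c is {sign}")
```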

In [None]:
# Arithmetic
e + f

In [None]:
h + j

In [None]:
c/d

In [None]:
c//d
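
The difference between `/` and `//` is easiest to see with a negative number, because `//` rounds down (towards minus infinity) rather than towards zero. The sketch below repeats the values of `c`, `d`, and `h` from above so that it runs on its own; the remainder operator `%` is an addition not listed in the text.

```python
c = -2
d = 3

true_quotient = c / d       # "/" always returns a float
floor_quotient = c // d     # "//" rounds down to the next lower integer
remainder = c % d           # "%" gives the remainder that matches "//"

print(true_quotient, floor_quotient, remainder)

# The two results always fit together like this:
print(d * floor_quotient + remainder == c)

# "+" and "*" also work on sequences such as lists and strings
h = [1, 5, 23, -1]
joined = h + ["x"]          # concatenation creates a new list
repeated = "ab" * 3         # repetition
print(joined, repeated)
```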

### Functions

Functions allow you to separate and re-use a piece of code.

In [None]:
def my_function(arg1, arg2, arg3=True):
    result = arg1 + arg2
    if arg3:
        result = result * 2
    return result
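
The cell above only defines the function. To see what it does, it has to be called. The sketch below repeats the definition so that it runs on its own and shows how the default value of `arg3` can be overridden; the argument values are arbitrary.

```python
def my_function(arg1, arg2, arg3=True):
    result = arg1 + arg2
    if arg3:
        result = result * 2
    return result

doubled_sum = my_function(1, 2)                 # arg3 keeps its default value True
plain_sum = my_function(1, 2, arg3=False)       # keyword argument overrides the default
doubled_text = my_function("ab", "cd")          # "+" and "*" also work for strings

print(doubled_sum, plain_sum, doubled_text)
```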

### Loops

Using _for_-loops allows you to run tasks repeatedly on a data sequence.

In [None]:
# Example 1
for letter in "abcde":
    print(letter)

In [None]:
# Example 2
for h_i in h:
    print(h_i)

In [None]:
# Example 3
for i in range(len(h)):
    print(i, h[i])

In [None]:
# Example 4
for key in l.keys():
    print(key, l[key])
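
Examples 3 and 4 can also be written with the built-in `enumerate(...)` and the dictionary method `items()`, which many Python programmers find easier to read. This is an alternative sketch, not a replacement for the examples above; `h` and `l` are repeated here so that the cell runs on its own, and `pairs` is a made-up name.

```python
h = [1, 5, 23, -1]
l = {"key1": "any content", "key2": 13, "key1000": [1, 4, 6]}

# enumerate yields (index, element) pairs, so no range(len(...)) is needed
pairs = []
for i, h_i in enumerate(h):
    print(i, h_i)
    pairs.append((i, h_i))

# items() yields (key, value) pairs, so no second lookup l[key] is needed
for key, value in l.items():
    print(key, value)
```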

### Classes

Class definitions are the core of the concept of object oriented programming. Classes allow you to link functions and data that belong together in objects. Classes are templates for the creation of objects (or class instances). Objects have attributes (i.e. variables/data) and methods (i.e. functions).

In [None]:
pi = 3.1416
class Circle:
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return pi*(self.radius)**2
    def diameter(self):
        return self.radius*2
    def circumference(self):
        return 2*pi*self.radius

C = Circle(2.)
print(f"The circle has an area of {C.area()}")
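
The distinction between a class (the template) and its objects (the instances) becomes clearer with more than one object. The sketch below is a shortened variant of the `Circle` class; as an assumption of this example it uses `math.pi` from the standard library instead of the hand-typed value, and the names `small` and `large` are made up.

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius              # attribute: data stored in each object
    def area(self):                       # method: function that works on that data
        return math.pi * self.radius ** 2
    def circumference(self):
        return 2 * math.pi * self.radius

# Two objects created from the same template
small = Circle(1.0)
large = Circle(2.0)

# Changing an attribute affects only the object it belongs to
large.radius = 3.0
print(small.radius, large.radius)

area_ratio = large.area() / small.area()
print(f"The large circle has {area_ratio:.1f} times the area of the small one")
```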

### *1.2 TASKS*

1. Define a string variable that holds the peptide sequence of human Ubiquitin-1.
2. Write a program that counts the number of alanines in the sequence.
3. Write a program that creates a new sequence in which all alanines are replaced by cysteines.
4. Copy your code from 2. and 3. into two new cells and re-organise the code into functions. 
5. Copy your functions into a new cell and make a class definition Sequence for it.

In [1]:
cur_sequence = 'MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG'
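
Before wrapping everything in a class, steps 2 to 4 of the task can be sketched with plain functions. The function and parameter names (`count_residue`, `replace_residue`, `residue`, `old`, `new`) are chosen for this illustration only; this is one possible solution among many.

```python
cur_sequence = 'MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG'

def count_residue(sequence, residue='A'):
    """Count how often the one-letter code `residue` occurs in `sequence`."""
    count = 0
    for element in sequence:
        if element == residue:
            count += 1
    return count

def replace_residue(sequence, old='A', new='C'):
    """Return a new sequence in which every `old` is replaced by `new`."""
    new_sequence = ''
    for element in sequence:
        if element == old:
            new_sequence += new
        else:
            new_sequence += element
    return new_sequence

print(f'A count = {count_residue(cur_sequence)}')
print(replace_residue(cur_sequence))
```

Making the residue a parameter means the same functions can be reused for other amino acids without copying the code.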

In [2]:
# Method 1 - the downright simple
class process_sequence_method_one():
    
    def __init__(self, sequence):
        #Assign the passed sequence to the instance
        self.sequence = sequence
    
    def count_a(self):
        return self.sequence.count('A')
    
    def replace_a_with_c(self):
        return self.sequence.replace('A', 'C')
    
instance_of_method_one = process_sequence_method_one(cur_sequence)
print(f'A count = {instance_of_method_one.count_a()}')
print(f'New sequence with A replaced with C\n{instance_of_method_one.replace_a_with_c()}')

A count = 2
New sequence with A replaced with C
MQIFVKTLTGKTITLEVEPSDTIENVKCKIQDKEGIPPDQQRLIFCGKQLEDGRTLSDYNIQKESTLHLVLRLRGG


In [3]:
# Method 2 - string based
class process_sequence_method_one():
    
    def __init__(self, sequence):
        #Assign the passed sequence to the instance
        self.sequence = sequence
    
    def count_a(self):
        #Define counter variable
        count = 0
        
        #Loop through each element, increment count if A
        for element in self.sequence:
            if element == 'A':
                count += 1
        return count
    
    def replace_a_with_c(self):
        #Strings are immutable, so build a new string instead of editing in place
        new_sequence = ''
        
        #When an A is found at index i, copy the unprocessed stretch before it
        #(starting at len(new_sequence)) and append C in its place
        for i, element in enumerate(self.sequence):
            if element == 'A':
                new_sequence += (self.sequence[len(new_sequence):i] + 'C')
                
        #Add whatever follows the last A
        new_sequence += self.sequence[len(new_sequence):]
        return new_sequence
    
instance_of_method_one = process_sequence_method_one(cur_sequence)
print(f'A count = {instance_of_method_one.count_a()}')
print(f'New sequence with A replaced with C\n{instance_of_method_one.replace_a_with_c()}')

A count = 2
New sequence with A replaced with C
MQIFVKTLTGKTITLEVEPSDTIENVKCKIQDKEGIPPDQQRLIFCGKQLEDGRTLSDYNIQKESTLHLVLRLRGG


In [4]:
# Method 3 - the list method
class process_sequence_method_one():
    
    def __init__(self, sequence):
        #Assign the passed sequence to the instance after converting to a char list
        self.sequence = list(sequence)
    
    def count_a(self):
        #The same as the previous method for this
        #Define variable
        count = 0
    
        #Loop through each element, increment count if A
        for element in self.sequence:
            if element == 'A':
                count += 1
        return count
    
    def replace_a_with_c(self):
        #Loop through each element of the sequence. Replace the element with C when A
        #Work on a copy so that the sequence stored in the instance is not modified
        new_sequence = list(self.sequence)
        for i, element in enumerate(self.sequence):
            if element == 'A':
                new_sequence[i] = 'C'
        return ''.join(new_sequence)
    
instance_of_method_one = process_sequence_method_one(cur_sequence)
print(f'A count = {instance_of_method_one.count_a()}')
print(f'New sequence with A replaced with C\n{instance_of_method_one.replace_a_with_c()}')

A count = 2
New sequence with A replaced with C
MQIFVKTLTGKTITLEVEPSDTIENVKCKIQDKEGIPPDQQRLIFCGKQLEDGRTLSDYNIQKESTLHLVLRLRGG


In [5]:
# Method 4 - both at once
class process_sequence_method_one():
    
    def __init__(self, sequence):
        #Assign the passed sequence to the instance
        self.sequence = sequence
        #Calculate the A count and the new sequence
        self.calculate()
    
    def calculate(self):
        #Split at A
        temp = self.sequence.split('A')
        self.count = len(temp) - 1
        self.new_sequence = 'C'.join(temp)
    
    def count_a(self):
        return self.count
    
    def replace_a_with_c(self):
        return self.new_sequence
    
instance_of_method_one = process_sequence_method_one(cur_sequence)
print(f'A count = {instance_of_method_one.count_a()}')
print(f'New sequence with A replaced with C\n{instance_of_method_one.replace_a_with_c()}')

A count = 2
New sequence with A replaced with C
MQIFVKTLTGKTITLEVEPSDTIENVKCKIQDKEGIPPDQQRLIFCGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
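
Method 4 relies on the fact that splitting a string at a character yields one more piece than there are occurrences of that character. The following sketch shows the idea on a short made-up string (`"BANANA"` is not a peptide sequence, it just makes the pieces easy to see).

```python
toy_sequence = "BANANA"

# Splitting at 'A' removes every 'A' and returns the pieces in between.
# An 'A' at the very end leaves an empty string as the last piece.
pieces = toy_sequence.split('A')
print(pieces)

# n occurrences of 'A' produce n + 1 pieces
a_count = len(pieces) - 1
print(a_count)

# Joining the pieces with 'C' puts a 'C' wherever an 'A' used to be
replaced = 'C'.join(pieces)
print(replaced)
```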


## 1.3 Using external packages

There are a huge number of useful Python packages. Many are shipped with your Python installation; others need to be installed, for example with the command-line package manager _pip_.

Before using them in your code you must "_import_" the package.

In [None]:
# We import the standard "time" package
import time
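
There are a few common ways to write an import. The sketch below uses the standard `math` package as an example; which form to use is mostly a matter of readability.

```python
# 1. Import the whole package and access its contents with the package name
import math
print(math.sqrt(16.0))

# 2. Import the package under a shorter alias
import math as m
print(m.sqrt(16.0))

# 3. Import a single name directly into your own namespace
from math import sqrt
print(sqrt(16.0))
```

Note that the third form can hide where a function comes from, and a later import of the same name silently replaces an earlier one.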

Look up the help message for the function _time.time(...)_ by executing:

In [None]:
help(time.time)

In [None]:
t1 = time.time()
time.sleep(1.)
t2 = time.time()
print(f"This took {t2-t1} seconds.")
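
As an alternative for measuring short durations, the `time` package also offers `time.perf_counter()`, which is intended for timing and is not affected by changes to the system clock. The sketch below mirrors the cell above with a shorter pause; the names `start` and `elapsed` are made up.

```python
import time

start = time.perf_counter()        # value is only meaningful as a difference
time.sleep(0.1)                    # stand-in for the work you want to time
elapsed = time.perf_counter() - start

print(f"This took {elapsed:.3f} seconds.")
```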

### *1.3 TASKS*

Use _time.time(...)_ and _pandas.read_excel(...)_ to measure how long it takes to open one of your Excel sheets with the pandas package.

In [15]:
from time import time
import pandas as pd

t1 = time()
print(pd.read_excel(r'C:\Users\cvgroup\Documents\David Heathcote\BEB Calculations\Comparison for Propane.xlsx'))
t2 = time()
print(f"Time elapsed = {t2-t1}s")

        1  -11.213842  16.020477  Unnamed: 3       1.1  -11.213851  16.020485  \
0       2  -11.205111  16.018732         NaN         2  -11.205143  16.018742   
1       3  -11.205081  16.018784         NaN         3  -11.205112  16.018795   
2       4   -1.046416   1.255775         NaN         4   -1.046323   1.255607   
3       5   -0.927342   1.277817         NaN         5   -0.927328   1.277727   
4       6   -0.805148   1.224537         NaN         6   -0.805161   1.224636   
5       7   -0.619860   0.868960         NaN         7   -0.619817   0.868998   
6       8   -0.598844   0.931636         NaN         8   -0.598822   0.931564   
7       9   -0.541647   1.057753         NaN         9   -0.541638   1.057397   
8      10   -0.540197   0.971707         NaN        10   -0.540202   0.971698   
9      11   -0.476253   1.143448         NaN        11   -0.476224   1.143494   
10     12   -0.473048   1.245599         NaN        12   -0.473037   1.245638   
11     13   -0.459715   1.08