Author: **Michael Antys**

# 1. First steps with *Python* on *Jupyter*

### Goals


- learn how to use *Jupyter Lab* notebooks
- learn basics of the *Python* syntax

Recommended resource: https://www.kaggle.com

## 1.1 Usage of ```Jupyter Lab```

- Notebooks are made of a sequence of cells
- Cells can contain different content such as Python code or Markdown ([Markdown basics](https://www.markdownguide.org/basic-syntax))
- You can change the cell type in the toolbar
- To execute a cell, press "Shift+Return"
- The result of the last line is printed below the cell (this behavior can be disabled by adding a semicolon to the end of the last line)
- Use the toolbar to add, delete, copy, or insert cells

### *1.1 TASK*

1. Edit the first line of this notebook and enter your own name.
2. Python was named for the British comedy troupe Monty Python, so why not make our first Python program an homage to their famous Spam skit? Just for fun, try reading over the code below and predicting together with your neighbor what it's going to do when run. (If you have no idea, that's fine!)
Then execute the cell to see the results of our little program.

In [1]:
spam_amount = 0
print(spam_amount)

# Ordering Spam, egg, Spam, Spam, bacon and Spam (4 more servings of Spam)
spam_amount = spam_amount + 4

if spam_amount > 0:
    print("But I don't want ANY spam!")

viking_song = "Spam " * spam_amount
print(viking_song)

0
But I don't want ANY spam!
Spam Spam Spam Spam 


## 1.2 Python

A variable is a name that holds a value, and that value may change. In the simplest terms, a variable is a box that you can put things in. You can use variables to store all kinds of data, for example the integer 123456:

In [2]:
variable_0 = 123456
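
As a small illustration of the "box" picture, the sketch below reassigns a variable and inspects what it currently holds with the built-in `type()`. The name `box` is made up for this example only.

```python
# 'box' is a hypothetical name used only for this illustration
box = 123456
print(box, type(box))

# The same name can later hold a different value, even of a different type
box = "now a string"
print(box, type(box))
```

Python does not require you to declare a type in advance; the type belongs to the value, not to the name.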

#### Data types / objects

A selection of frequently used Python data types / objects is given here:

|Data type   | Examples                                                 |
|------------|----------------------------------------------------------|
|```bool```  |either ```True``` or ```False```                          |
|```int```   |1, 6, -1, 0, 3244, ...                                    |
|```float``` |3.14, -43535.345, 0.0, ...                                |
|```str```   |"Hello world!", "nothing", ...                            |
|```tuple``` |(1,2), (1231.32, ```True```, "Hello world!", None)...     |
|```list```  |[```True```, 1, 3.14, "Hello!", [1,2,34]], ...            |
|```dict```  |{"some key": 1.24233, "another key": "anything"}, ...     |

*Examples:*

In [3]:
# bool 
a = True
b = False

In [4]:
# int
c = -2
d = 3

In [5]:
# float
e = 3.1
f = -2342.4324

In [6]:
# str
g = "Hello!"

In [7]:
# list
h = [1,5,23,-1]
j = [a,b,c,d,e,f,g,h,"I can put anything into a list! :)"]
k = list(range(10))  # range(10) by itself is a lazy range object, not a list

In [8]:
# dict
l = {"key1": "any content", "key2": 13, "key1000": [1,4,6]}
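
Once data is stored in a list or dict, you usually want to get it back out. The following sketch reuses the example values of `h` and `l` and shows element access and in-place modification; the names `first`, `last`, and `value` are introduced here for illustration.

```python
# Same example data as in the cells above
h = [1, 5, 23, -1]
l = {"key1": "any content", "key2": 13, "key1000": [1, 4, 6]}

first = h[0]        # list indices start at 0
last = h[-1]        # negative indices count from the end
value = l["key2"]   # dict entries are looked up by key

h.append(42)              # lists can grow in place
l["key3"] = "new entry"   # assigning to a new key adds an entry

print(first, last, value)
print(h)
print(l)
```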

#### Operators

- Comparisons: "==", ">", "<", ">=", "<=", ...
- Arithmetic: "+", "-", "*", "/", "//", ...

In [9]:
# Comparisons
a == b

False

In [10]:
b == False

True

In [11]:
c > d

False

In [12]:
c >= c

True

In [13]:
# If statements
if g == "Hello!":
    print("'g' equals the string 'Hello!'")
else:
    print("'g' does not equal the string 'Hello!'.")

'g' equals the string 'Hello!'


In [14]:
# Arithmetic
e + f

-2339.3324000000002

In [15]:
h + j

[1,
 5,
 23,
 -1,
 True,
 False,
 -2,
 3,
 3.1,
 -2342.4324,
 'Hello!',
 [1, 5, 23, -1],
 'I can put anything into a list! :)']

In [16]:
c/d

-0.6666666666666666

In [17]:
c//d

-1
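
The last two results differ because `/` always performs true division and returns a float, whereas `//` is floor division and rounds down towards negative infinity. That is why `-2 // 3` gives `-1` rather than `0`. A minimal sketch with the same values, which also uses the remainder operator `%` (not covered above):

```python
c = -2
d = 3

quotient = c // d    # floor division: rounds down to -1
remainder = c % d    # the remainder takes the sign of the divisor: 1

# Floor division and remainder always satisfy this identity
assert quotient * d + remainder == c

print(c / d)         # true division, the result is a float
print(quotient, remainder)
```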

### Functions

Functions allow you to separate and re-use a piece of code.

In [18]:
def my_function(arg1, arg2, arg3=True):
    result = arg1 + arg2
    if arg3:
        result = result * 2
    return result
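
Defining a function does not run it; the body only executes when the function is called. The sketch below repeats the definition so that it is self-contained and then calls it in three ways. The names `doubled`, `plain`, and `text` are chosen here for illustration.

```python
def my_function(arg1, arg2, arg3=True):
    result = arg1 + arg2
    if arg3:
        result = result * 2
    return result

doubled = my_function(1, 2)              # arg3 defaults to True: (1 + 2) * 2
plain = my_function(1, 2, arg3=False)    # a keyword argument overrides the default
text = my_function("ab", "cd")           # + and * also work on strings

print(doubled, plain, text)
```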

### Loops

_for_-loops allow you to run a task repeatedly, once for each element of a data sequence.

In [19]:
# Example 1
for letter in "abcde":
    print(letter)

a
b
c
d
e


In [20]:
# Example 2
for h_b in h:
    print(h_b)

1
5
23
-1


In [21]:
# Example 3
for i in range(len(h)):
    print(i, h[i])

0 1
1 5
2 23
3 -1


In [22]:
# Example 4
for key in l.keys():
    print(key, l[key])

key1 any content
key2 13
key1000 [1, 4, 6]
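
Examples 3 and 4 can also be written with the built-in `enumerate` and the dict method `items`, which hand you the index or key together with the value. This is one common alternative, not a replacement for the forms shown above; the name `pairs` is introduced here for illustration.

```python
h = [1, 5, 23, -1]
l = {"key1": "any content", "key2": 13, "key1000": [1, 4, 6]}

# enumerate yields (index, element) pairs, so range(len(h)) is not needed
for i, h_b in enumerate(h):
    print(i, h_b)

# dict.items() yields (key, value) pairs directly
for key, value in l.items():
    print(key, value)

pairs = list(enumerate(h))
```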


### Classes

Class definitions are the core of object-oriented programming. Classes allow you to link functions and data that belong together in objects. A class is a template for creating objects (also called class instances). Objects have attributes (i.e. variables/data) and methods (i.e. functions).

In [23]:
pi = 3.1416
class Circle:
    def __init__(self, radius):
        self.radius = radius
    def area(self):
        return pi*(self.radius)**2
    def diameter(self):
        return self.radius*2
    def circumference(self):
        return 2*pi*self.radius

C = Circle(4)
print(f"The circle has an area of {C.area()}")
print(f"The circle has a circumference of {C.circumference()}")
print(f"The circle has a diameter of {C.diameter()}")

The circle has an area of 50.2656
The circle has a circumference of 25.1328
The circle has a diameter of 8
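
To make the distinction between attributes and methods concrete, here is a reduced sketch of the same idea with two instances. It assumes the standard-library constant `math.pi` instead of a hand-written value; the instance names `small` and `large` are illustrative.

```python
import math

class Circle:
    def __init__(self, radius):
        self.radius = radius          # attribute: data stored on the object

    def area(self):                   # method: a function that uses the object's data
        return math.pi * self.radius ** 2

small = Circle(1)
large = Circle(4)

# Each instance keeps its own radius
print(small.radius, large.radius)
print(large.area())

# Attributes can be changed after creation; methods see the new value
small.radius = 2
print(small.area())
```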


### *1.2 TASKS*

1. Define a string variable that holds the peptide sequence of human Ubiquitin-1.
2. Write a program that counts the number of alanines in the sequence.
3. Write a program that creates a new sequence in which all alanines are replaced by cysteines.
4. Copy your code from 2. and 3. into two new cells and reorganise the code into functions.
5. Copy your functions into a new cell and turn them into methods of a class named Sequence.

In [24]:
alanine_count = 0
ubiquitin = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
# Task 2: count the alanines (one-letter code "A")
for amino in ubiquitin:
    if amino == "A":
        alanine_count = 1 + alanine_count
print(alanine_count)

# Task 3: strings are immutable, so convert to a list before replacing residues
ubiquitin = list(ubiquitin)
for amino in range(len(ubiquitin)):
    if ubiquitin[amino] == "A":
        ubiquitin[amino] = "C"
print(ubiquitin)

2
['M', 'Q', 'I', 'F', 'V', 'K', 'T', 'L', 'T', 'G', 'K', 'T', 'I', 'T', 'L', 'E', 'V', 'E', 'P', 'S', 'D', 'T', 'I', 'E', 'N', 'V', 'K', 'C', 'K', 'I', 'Q', 'D', 'K', 'E', 'G', 'I', 'P', 'P', 'D', 'Q', 'Q', 'R', 'L', 'I', 'F', 'C', 'G', 'K', 'Q', 'L', 'E', 'D', 'G', 'R', 'T', 'L', 'S', 'D', 'Y', 'N', 'I', 'Q', 'K', 'E', 'S', 'T', 'L', 'H', 'L', 'V', 'L', 'R', 'L', 'R', 'G', 'G']
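
For task 4, one possible way to organise the counting and replacing into functions is sketched below. It relies on the built-in string methods `str.count` and `str.replace` rather than explicit loops; the function names `count_residue` and `replace_residue` and their default arguments are assumptions made for this example.

```python
ubiquitin = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"

def count_residue(sequence, residue="A"):
    """Return how often the one-letter code `residue` occurs in `sequence`."""
    return sequence.count(residue)

def replace_residue(sequence, old="A", new="C"):
    """Return a new string with every `old` residue replaced by `new`."""
    return sequence.replace(old, new)

n_alanines = count_residue(ubiquitin)
mutated = replace_residue(ubiquitin)
print(n_alanines)
print(mutated)
```

Because both functions take the sequence as an argument and return a new value, the original string is left unchanged and the functions can be reused for other residues.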


In [25]:
sequence = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
class Protein:
    def __init__(self, sequence):
        self.sequence = sequence
    def count(self):
        variable = 0
        for amino in self.sequence:  # use the instance's own data, not the global variable
            if amino == "A":
                variable = 1 + variable
        return variable
P = Protein(sequence)
print(f"Ubiquitin-1 has {P.count()} alanines in the sequence.")

Ubiquitin-1 has 2 alanines in the sequence.


In [26]:
sequence = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
class Protein:
    def __init__(self, sequence):
        self.sequence = sequence
    def replace(self):
        # Strings are immutable, so edit a list copy of the instance's own sequence
        residues = list(self.sequence)
        for amino in range(len(residues)):
            if residues[amino] == "A":
                residues[amino] = "C"
        return ''.join(residues)
P = Protein(sequence)
print(f"Ubiquitin-1 after alanines are replaced by cysteines: {P.replace()}")

Ubiquitin-1 after alanines are replaced by cysteines: MQIFVKTLTGKTITLEVEPSDTIENVKCKIQDKEGIPPDQQRLIFCGKQLEDGRTLSDYNIQKESTLHLVLRLRGG


## 1.3 Using external packages

There are a huge number of useful Python packages. Many already ship with your Python installation; others need to be installed, for example with the command-line package manager _pip_.

Before using a package in your code, you must _import_ it.

In [27]:
# We import the standard "time" package
import time
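
Python offers a few variants of the import statement. The sketch below shows three of them, using the standard `math` module as a stand-in; the alias `m` is an arbitrary choice for this example.

```python
import math                # import the whole module
import math as m           # the same module under a shorter alias
from math import sqrt      # import a single name from the module

print(math.sqrt(16.0))
print(m.sqrt(16.0))
print(sqrt(16.0))
```

All three calls reach the same function; the variants differ only in how much you have to type and how visible the origin of a name is.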

Look up the help message for the function _time.time(...)_ by executing:

In [28]:
help(time.time)

Help on built-in function time in module time:

time(...)
    time() -> floating point number
    
    Return the current time in seconds since the Epoch.
    Fractions of a second may be present if the system clock provides them.



In [29]:
t1 = time.time()
time.sleep(1.)
t2 = time.time()
print(f"This took {t2-t1} seconds.")

This took 1.0001001358032227 seconds.


### *1.3 TASKS*

Use _time.time(...)_ and _pandas.read_excel(...)_ to measure how long it takes to open one of your Excel sheets with the Pandas package.

In [30]:
import pandas

In [31]:
help(pandas.read_excel)

Help on function read_excel in module pandas.io.excel:

read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, **kwds)
    Read an Excel table into a pandas DataFrame
    
    Parameters
    ----------
    io : string, path object (pathlib.Path or py._path.local.LocalPath),
        file-like object, pandas ExcelFile, or xlrd workbook.
        The string could be a URL. Valid URL schemes include http, ftp, s3,
        and file. For file URLs, a host is expected. For instance, a local
        file could be file://localhost/path/to/workbook.xlsx
    sheet_name : string, int, mixed list of strings/ints, or None, default 0
    
        Strings are used for sheet names, Integers are used in zero-indexed
        sheet positions.
    

In [32]:
ta=time.time()
pandas.read_excel('session1_antysBook1.xlsx')
tb=time.time()
print(f"This took {tb-ta} seconds.")

This took 0.07800769805908203 seconds.
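
Elapsed time is the end time minus the start time. If you time things often, the pattern can be wrapped in a small helper. The sketch below is one possible way to do this; the function name `timed` is hypothetical, and it uses `time.perf_counter()`, a standard-library clock intended for measuring short intervals, in place of `time.time()`.

```python
import time

def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start   # end minus start, so never negative
    return result, elapsed

# Stand-in workload; in the task this could be pandas.read_excel with your own file
result, elapsed = timed(sum, range(1_000_000))
print(f"This took {elapsed} seconds.")
```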
