In [1]:
from IPython.display import Image
from IPython.display import clear_output
from IPython.display import FileLink, FileLinks

<img src="img/python-logo-master-flat.png" alt="Python Logo" style="width: 120px; float: right; margin: 0 0 10px 10px;" />

## Introduction to Python with Application in Bioinformatics



### Nanjiang Shu

#### 2024-07-17 (Day 3)

## Review of Day 2

- How to put your code in a Python script and how to run the Python script
  - File extension: `.py`
  - Run in the command line with `python myscript.py` or `./myscript.py`
- How to read and write files
- Practised file processing and data manipulation with a VCF file
- Introduced the course project

## Review of quiz from yesterday

## Tuples

__1. Which of the following variables are of the type tuple?__  
`a = (1, 2, 3, 4)`  
`a = ([1, 2], 'a', 'b')`

__2. What is the difference between a tuple and a list?__  
A tuple is immutable while a list is mutable

In [1]:
myTuple    = (1, 2, 3, 'a', 'b', [4,5,6])
myList     = [1, 2 ,3]
myList[2]  = 4
#myTuple[2] = 4
myList

[1, 2, 4]
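The commented-out line above can be tried safely by catching the error. This is a minimal sketch; the variable names `my_tuple` and `message` are only illustrative. It also shows that a list stored inside a tuple can still be changed, because only the tuple itself is immutable.

```python
my_tuple = (1, 2, 3, 'a', 'b', [4, 5, 6])

try:
    my_tuple[2] = 4              # item assignment is not allowed on a tuple
except TypeError as err:
    message = str(err)
    print(message)

# The list stored at index 5 is a mutable object, so it can still be modified
my_tuple[5].append(7)
print(my_tuple)
```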

## How to structure code

__3. What does pseudocode mean?__  
Writing down the steps you intend to include in your code in more general language

- Decide on what output you want
- What input files do you have?
- How is the input structured, can you iterate over it?
- Where is the information you need located?
- Do you need to save a lot of information while iterating?
  - Lists are good for ordered data
  - Sets are good for non-duplicate single entry information
  - Dictionaries are good for a lot of structured information
- When you have collected the data needed, decide on how to process it
- Are you writing your results to a file?

__Always start with writing pseudocode!__
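As an illustration of going from pseudocode to code, the sketch below counts variants per chromosome. The input lines are made-up, VCF-like example data held in a list rather than read from a file, and the pseudocode is kept as comments above the code.

```python
# Pseudocode:
#   for each line in the input
#       skip header lines (they start with '#')
#       take the chromosome name from the first column
#       add one to the counter for that chromosome
#   print the counters

# Made-up example lines, standing in for the contents of a file
lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t10492\t.\tC\tT",
    "1\t10583\t.\tG\tA",
    "2\t10180\t.\tT\tC",
]

counts = {}                          # dictionary: chromosome -> number of variants
for line in lines:
    if line.startswith("#"):
        continue                     # header line, nothing to count
    chrom = line.split("\t")[0]
    if chrom not in counts:
        counts[chrom] = 1
    else:
        counts[chrom] += 1

print(counts)
```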

## Functions and methods

__4. What are the following examples of?__  
`len([1, 2, 3, 4])`  
`print("my text")`

Functions

__5. What are the following examples of?__  
`"my\ttext".split("\t")`  
`[1, 2, 3].pop()`

Methods

General syntax of Functions and Methods

`functionName()`  &emsp; &emsp;  `<object>.methodName()`

<br>
A method always belongs to an object of a specific class, a function does not have to.

__6. Calculate the average of the list `[1,2,3.5,5,6.2]` to one decimal, using Python__

In [6]:
myList = [1, 2, 3.5, 5 ,6.2]
round(sum(myList)/len(myList),1)

3.5

__7. Take the list `['I','know','Python']` as input and output the string 'I KNOW PYTHON'__

In [7]:
my_list   = ['I','know','Python']
my_string =' '.join(my_list).upper()
print(my_string)

I KNOW PYTHON


## Day 3

- __Session 1__
    - Functions and methods
    - Difference between functions and methods
    - Introduction to some useful functions and methods
- __Session 2__    
    - How to write your own functions
    - How to pass arguments from command line using `sys.argv`
    - String formatting

## Session 1: Functions and Methods

### What is a function

- A function is a block of code that performs a specific task.
- It can take input (parameters) and return output (results).
- Functions help to reuse code and make programs more modular and readable.

```python
def function_name(parameters):
    # Block of code
    return result
```

In [14]:
print("Hello Python")

Hello Python


In [16]:
length = len("ACCCCTTGAACCCC")
length

14


In [17]:
max([87, 131, 69, 112, 147, 55, 68, 130, 119, 50])

147

In [85]:
def gc_content(seq):
    seq = seq.upper()
    gc_count = seq.count("G") + seq.count("C")
    return (gc_count / len(seq)) * 100

sequence = "ACCCCTTGAACCCC"
gc_percent = gc_content(sequence)   # use a new name so the function is not overwritten
print(gc_percent)

64.28571428571429


## Note: how to define your own functions is covered in more detail in Session 2

### What is a method
- A method is a function that is associated with objects (instances of classes).

In [86]:
"ACCCGGGT".lower()

'acccgggt'

In [97]:
mylist = [5, 13, 1, 13, 25]
mylist.sort()
print(mylist)

[1, 5, 13, 13, 25]




### What is the difference between a `function` and a `method`?

- A `method` always belongs to an object of a specific class, a `function` does not have to. For example:

    - `print('a string')` and `print(42)` both work, even though one is a string and one is an integer

    - `'a string '.strip()` works, but `[1,2,3,4].strip()` does not work. `strip()` is a method that only works on strings
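The two bullet points above can be checked directly. In this small sketch the failing call is wrapped in `try`/`except` so that the cell keeps running; the names `cleaned` and `error_text` are only illustrative.

```python
# A function works on objects of different types
print('a string')
print(42)

# A method belongs to a specific type: strip() exists for strings
cleaned = ' a string '.strip()
print(cleaned)

# Lists have no strip() method, so this raises an AttributeError
try:
    [1, 2, 3, 4].strip()
except AttributeError as err:
    error_text = str(err)
    print(error_text)
```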


### What does it matter to me?

For now, you only need to know the different syntaxes of using a function and a method:

__A function:__  
`functionName()`

__A method:__  
```<object>.methodName()```


### Introduction to some useful functions
[Python Built-in functions](https://docs.python.org/3/library/functions.html#)

<img src="img/built-in_functions.png" alt="Drawing" style="width: 800px;"/> 

### `help`

In [43]:
help(max)

Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.



### `range`

In [42]:
for i in range(5):
    print(i)

0
1
2
3
4


In [45]:
# range(start, stop, step)
for i in range(1, 10, 2):
    print(i)

1
3
5
7
9


### `sorted`

In [46]:
li = [5, 4, 1, 3, 10, 13, 13]
print(sorted(li))

[1, 3, 4, 5, 10, 13, 13]


In [47]:
print(sorted(li, reverse=True))

[13, 13, 10, 5, 4, 3, 1]
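The function `sorted()` and the list method `sort()` seen earlier differ in one important way. `sorted()` returns a new list and leaves the original untouched, while `sort()` changes the list in place and returns `None`. A short sketch, with illustrative variable names:

```python
li = [5, 4, 1, 3, 10, 13, 13]

new_list = sorted(li)     # function: returns a new, sorted list
print(new_list)
print(li)                 # the original list is unchanged

result = li.sort()        # method: sorts the list in place
print(result)             # the method itself returns None
print(li)                 # now the original list is sorted
```

A common mistake is writing `li = li.sort()`, which replaces the list with `None`.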


### `dir`: return a list of names comprising the attributes of the given object

In [53]:
dir()

['In',
 'Out',
 '_',
 '_1',
 '_10',
 '_11',
 '_12',
 '_15',
 '_17',
 '_18',
 '_19',
 '_2',
 '_20',
 '_21',
 '_23',
 '_25',
 '_3',
 '_4',
 '_48',
 '_5',
 '_50',
 '_52',
 '_6',
 '_7',
 '_8',
 '_9',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i18',
 '_i19',
 '_i2',
 '_i20',
 '_i21',
 '_i22',
 '_i23',
 '_i24',
 '_i25',
 '_i26',
 '_i27',
 '_i28',
 '_i29',
 '_i3',
 '_i30',
 '_i31',
 '_i32',
 '_i33',
 '_i34',
 '_i35',
 '_i36',
 '_i37',
 '_i38',
 '_i39',
 '_i4',
 '_i40',
 '_i41',
 '_i42',
 '_i43',
 '_i44',
 '_i45',
 '_i46',
 '_i47',
 '_i48',
 '_i49',
 '_i5',
 '_i50',
 '_i51',
 '_i52',
 '_i53',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'atexit',
 'exit',
 'get_ipython',
 'historyPath',
 'i',
 'length',
 'li',
 'open',
 'os',
 'quit',
 'readline',
 'rlcompleter',
 'save_history',
 'sys']
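Called without an argument, `dir()` lists the names in the current session, as above. Called with an object, it lists that object's attributes, which is a quick way to find out which methods are available. The sketch below filters out the names starting with an underscore; the variable names are only illustrative.

```python
all_names = dir("ACGT")           # attributes of a string object

public_names = []
for name in all_names:
    if not name.startswith("_"):  # skip internal names such as __len__
        public_names.append(name)

print(public_names)
```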

## Introduction to some useful methods

###  Operations on strings

<img src="img/string_methods.png" alt="Drawing" style="width: 600px;"/> 

<img src="img/strip.png" alt="Drawing" style="width: 800px;"/> 

<img src="img/lstrip.png" alt="Drawing" style="width: 800px;"/>  

<img src="img/rstrip.png" alt="Drawing" style="width: 800px;"/> 
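The three stripping methods can be compared on one string. This is a small sketch with a made-up, tab-separated line; `repr()` is used so that the remaining whitespace is visible in the output.

```python
line = "  chr1\t12345\tA\tG \n"    # made-up line with whitespace at both ends

both  = line.strip()     # removes whitespace from both ends
left  = line.lstrip()    # removes whitespace from the left end only
right = line.rstrip()    # removes whitespace from the right end only

print(repr(both))
print(repr(left))
print(repr(right))

# With an argument, the given characters are stripped instead of whitespace
header = "###header###".strip("#")
print(header)
```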

<img src="img/split.png" alt="Drawing" style="width: 800px;"/> 

In [26]:
a = '  split a string into a list '
a.split(maxsplit=3)

['split', 'a', 'string', 'into a list ']

<img src="img/join.png" alt="Drawing" style="width: 800px;"/> 

In [27]:
'|'.join('a string already')
#'|'.join(['a', 'b', 'c', 'd'])

'a| |s|t|r|i|n|g| |a|l|r|e|a|d|y'

<img src="img/startswith.png" alt="Drawing" style="width: 800px;"/>  

<img src="img/endswith.png" alt="Drawing" style="width: 800px;"/> 

In [12]:
'long string'.startswith('ng', 2)

True

In [129]:
'long string'.endswith('nt')

False

<img src="img/upper.png" alt="Drawing" style="width: 800px;"/>  

<img src="img/lower.png" alt="Drawing" style="width: 800px;"/>  

In [29]:
'LongRandomString'.lower()
'LongRandomString'.upper()

'LONGRANDOMSTRING'

### Useful operations on Mutable sequences

<br>

<img src="img/list_methods.png" alt="Drawing" style="width: 400px;"/> 

In [150]:
a = [1, 2, 3, 4, 5, 5, 5, 5]
a.append(6)
a

[1, 2, 3, 4, 5, 5, 5, 5, 6]

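In [149]:
a.pop(3)   # removes (and returns) the item at index 3, here the value 4
a

[1, 2, 3, 5, 5, 5, 5, 6]

In [153]:
a.reverse()
a

[6, 5, 5, 5, 5, 3, 2, 1]

In [154]:
a.remove(5)   # removes the first occurrence of the value 5
a

[6, 5, 5, 5, 3, 2, 1]

In [159]:
a.insert(0, 13)   # inserts the value 13 at index 0
a

[13, 6, 5, 5, 5, 3, 2, 1]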

In [162]:
### Tuples are immutable
a = (2, 3, 4, 15)
a.pop()

AttributeError: 'tuple' object has no attribute 'pop'

### Dictionary and its operations

- A dictionary is a mapping of unique keys to values
- Dictionaries are mutable

<br>
<img src="img/key_values.png" alt="Drawing" style="width: 1000px;"/>  

Syntax:  
`a = {}` (create empty dictionary)  
`d = {'key1':1, 'key2':2, 'key3':3}`

In [23]:
myDict = {'drama': 4,
          'thriller': 2,
          'romance': 5}
myDict


{'drama': 4, 'thriller': 2, 'romance': 5}

### Useful operations on Dictionaries

<img src="img/dictionary.png" alt="Drawing" style="width: 600px;"/>  


In [33]:
myDict = {'drama': 4, 
          'thriller': 2, 
          'romance': 5}
len(myDict)
myDict['drama']
myDict['horror'] = 2
myDict
del myDict['horror']
myDict
'drama' in myDict
myDict.keys()
list(myDict.items())
list(myDict.values())

[4, 2, 5]
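The same operations apply to any dictionary. Below is a sketch using gene expression values; the gene names are real, but the numbers are invented for illustration only.

```python
# Hypothetical expression values (arbitrary units), for illustration only
expression = {'BRCA1': 12.5,
              'TP53': 30.1,
              'EGFR': 8.7}

n_genes = len(expression)          # number of key-value pairs
tp53 = expression['TP53']          # look up a value by its key
expression['MYC'] = 45.0           # add a new key-value pair
del expression['EGFR']             # delete an entry
has_brca1 = 'BRCA1' in expression  # test whether a key exists

# Loop over keys and values together
for gene, value in expression.items():
    print(gene, value)
```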

### Exercise

In [39]:
myDict = {'drama': 182, 
          'war': 30, 
          'adventure': 55, 
          'comedy': 46, 
          'family': 24, 
          'animation': 17, 
          'biography': 25}

- How many entries are there in this dictionary?
- How do you find out how many movies are in the genre 'comedy'?
- You're not interested in biographies, delete this entry
- You are however interested in fantasy, add that we have 29 movies of the genre fantasy to the dictionary
- What genres are listed in this dictionary?
- You remembered another comedy movie, increase the number of comedies by one

## Summary

- A method always belongs to an object of a specific class, a function does not have to
- The official Python documentation describes the syntax for all built-in functions and methods
  - https://docs.python.org/3.9/

## Day 3, Exercise 1
- Take a break after the exercise

## Session 2: 
- Functions - how we define our own functions
- sys.argv - how we pass arguments from the command line
- Formatting

## Functions

<img src="img/for_function.png" alt="Drawing" style="width: 900px;"/>  

A lot of ugly formatting for calculating hours and minutes from seconds...

In [None]:
def FormatSec(genre):   # input: a genre name; the runtimes are read from the global genreDict
    average   = sum(genreDict[genre])/len(genreDict[genre])
    hours     = int(average/3600)
    minutes   = (average - (3600*hours))/60   
    return str(hours)+'h'+str(round(minutes))+'min'


fh        = open('../downloads/250.imdb', 'r', encoding = 'utf-8')
genreDict = {}

for line in fh:
    if not line.startswith('#'):
        cols    = line.strip().split('|')
        genre   = cols[5].strip()
        glist   = genre.split(',')
        runtime = cols[3]      # length of movie in seconds
        for entry in glist:
            if not entry.lower() in genreDict:
                genreDict[entry.lower()] = [int(runtime)]   # add a list with the runtime
            else:
                genreDict[entry.lower()].append(int(runtime))   # append runtime to existing list
fh.close()
                
for genre in genreDict:
    print('The average length for movies in genre '+genre\
          +' is '+FormatSec(genre))

### Function structure

<img src="img/function_structure.png" alt="Drawing" style="width: 800px;"/>  

### Function structure

<img src="img/function_structure_explained.png" alt="Drawing" style="width: 800px;"/>  

In [55]:
def addFive(number):
    print(number)
    final = number + 5
    return final

my_result = addFive(4)
my_result

4


9

In [57]:
from datetime import datetime

def whatTimeIsIt():
    time = 'The time is: ' + str(datetime.now().time())
    return time

whatTimeIsIt()


'The time is: 13:11:13.671880'

In [62]:
def addFive(number):
    final = number + 5
    return final

#addFive(4)

res = addFive(4)
print(res)
#print(final)   # NameError: 'final' only exists inside addFive


9


### Scope 

- Variables within functions
- Global variables

In [2]:

def someFunction(input_value):
    res = input_value + 4
    return res
    

#print(s)
#someFunction(a)
#print(a)
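A minimal sketch of the two kinds of variables, assuming a hypothetical global variable called `offset`. The function can read the global variable, but the local variable `res` does not exist outside the function.

```python
offset = 4                           # global variable, defined outside any function

def some_function(input_value):
    res = input_value + offset       # 'res' is local; 'offset' is read from the global scope
    return res

output = some_function(10)
print(output)

# 'res' only exists while the function runs, so this raises a NameError
try:
    print(res)
except NameError as err:
    scope_error = str(err)
    print(scope_error)
```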

## Why use functions?

- Cleaner code
- Better defined tasks in code
- Re-usability
- Better structure

## Importing functions

- Collect all your functions in another file
- Keeps main code cleaner
- Easy to use across different code

Example:
1. Create a file called myFunctions.py, located in the same folder as your script
2. Put a function called `formatSec()` in the file
3. Start writing your code in a separate file and `import` the function

In [72]:
from myFunctions import formatSec

seconds = 32154

formatSec(seconds)


'8h56min'

In [3]:
from myFunctions import  formatSec, toSec

seconds = 21154
print(formatSec(seconds))

days    = 0
hours   = 21
minutes = 56
seconds = 45

print(toSec(days, hours, minutes, seconds))

5h53min
79005s


### myFunctions.py

<img src="img/myFunctions.png" alt="Drawing" style="width: 600px;"/>  
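The file content is only shown as an image here. One possible version of `myFunctions.py` that reproduces the outputs in the cells above is sketched below; the actual file used in the course may differ in details.

```python
# myFunctions.py (a sketch, not necessarily identical to the course file)

def formatSec(seconds):
    """Convert a number of seconds to a string such as '8h56min'."""
    hours = int(seconds / 3600)
    minutes = (seconds - 3600 * hours) / 60
    return str(hours) + 'h' + str(round(minutes)) + 'min'


def toSec(days, hours, minutes, seconds):
    """Convert days, hours, minutes and seconds to a string such as '79005s'."""
    total = days * 24 * 3600 + hours * 3600 + minutes * 60 + seconds
    return str(total) + 's'
```

Because the minutes are rounded, a value just below a full hour (for example 3599 seconds) is shown as `0h60min` in this simple version.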

## Summary

- A function is a block of organized, reusable code that is used to perform a single, related action
- Variables within a function are local variables
- Functions can be organized in separate files and imported to the main code

## NEW TOPIC: `sys.argv`

- Avoid hardcoding the filename in the code
- Easier to re-use code for different input files
- Uses command-line arguments
- Input is a list of strings:
    - Position 0: the program name
    - Position 1: the first argument

<b>The `sys.argv` list</b>

Python script called `print_argv.py`:

<img src="img/argv_print.png" alt="Drawing" style="width: 300px;"/> 

Running the script with command line arguments as input:

<img src="img/argv_print_output.png" alt="Drawing" style="width: 800px;"/> 

Instead of:

<img src="img/non-sysargv.png" alt="Drawing" style="width: 700px;"/>  

do:

<img src="img/sysargv.png" alt="Drawing" style="width: 800px;"/>  

Run with:

<img src="img/sysargv_run.png" alt="Drawing" style="width: 900px;"/>  
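The scripts are shown as images here, so below is a minimal sketch of the same idea. The helper `get_filename` and the usage message are hypothetical and not taken from the course scripts. The helper takes the argument list as a parameter so that it can also be tried in a notebook with a hand-made list.

```python
import sys

def get_filename(argv):
    """Return the first command-line argument, or None if it is missing."""
    if len(argv) < 2:        # position 0 is always the program name
        return None
    return argv[1]           # position 1 is the first argument

filename = get_filename(sys.argv)
if filename is None:
    print('Usage: python myscript.py <input_file>')
else:
    print('Input file:', filename)
```

All items in `sys.argv` are strings, so a number passed on the command line must be converted with `int()` or `float()` before calculating with it.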

## Formatting

Format text for printing or for writing to file.

What we have been doing so far:

In [None]:
title  = 'Toy Story'
rating = 10
print('The result is: ' + title + ' with rating: ' + str(rating))

Other (better) ways of formatting strings:

<br>

__f-strings (since Python 3.6)__

In [None]:
title  = 'Toy Story'
rating = 10
print(f'The result is: {title} with rating: {rating}')

__format method__

In [None]:
title  = 'Toy Story'
rating = 10
print('The result is: {} with rating: {}'.format(title, rating))

__The old way (`%` formatting, common in Python 2 and still valid in Python 3)__

In [None]:
title  = 'Toy Story'
rating = 10
print('The result is: %s with rating: %s' % (title, rating))
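f-strings and `format` also accept a format specification after a colon, which is handy when writing numbers or aligned columns to a file. A short sketch; the rating value here is made up.

```python
title  = 'Toy Story'
rating = 8.333                             # made-up value, for illustration only

one_decimal = f'{rating:.1f}'              # round to one decimal when formatting
padded      = f'{title:>12}'               # right-align within 12 characters
row         = f'{title}\t{rating:.1f}'     # tab-separated line for an output file
same_thing  = '{:.1f}'.format(rating)      # the same specification with format()

print(one_decimal)
print(padded)
print(row)
print(same_thing)
```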

### IMDb - Pair-programming exercise

__How would you re-structure and write the output to a new file as below?__

<img src="img/re-structured.png" alt="Drawing" style="width: 400px;"/>  

Normal level:  
- Write pseudocode for how you would do this

Normal+ level:
- Write a script that takes the file as input on the command line and writes a new file with one line per movie:
"The movie \<movie\> has the rating \<rating\>"


Insane level (danger zone!):  
- Write the code

### Answer -  Example
<img src="img/reformat_imdb.png" alt="Drawing" style="width: 1000px;"/>  

Run with:
<img src="img/run_reformat.png" alt="Drawing" style="width: 700px;"/>  

## Day 3, Exercise 2
- Take a short break after the exercise
- Quiz for Day 3

## Lunch 

## Project time after lunch