# Introduction to Python - Session 1.1
1. Programming and Python
    - What is Python?
    - What are Jupyter Notebooks?
    - Paths and directories
    - Creating a Python script/notebook
2. Basics of Python: objects, syntax, functions
3. Data types
    - Numbers
    - Lists
    - Strings
    - Dictionaries

SLIDES [HERE](https://docs.google.com/presentation/d/1OHdfw7pi4ntcolRgFoeL1c8y9vxH20EqNgJKZhvMxpo/export/pdf)

**Getting help**
The built-in function `help()` shows interactive documentation for most Python objects.

In [66]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.
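
The keyword arguments listed in the help output above can be tried directly. The following is a minimal sketch, not part of the original exercises: it sends the output to an `io.StringIO` buffer (an illustrative stand-in for `sys.stdout`) so that the effect of `sep` and `end` can be inspected as a string. The values printed are made up.

```python
import io

# Illustrative buffer standing in for sys.stdout, so the output can be inspected.
buffer = io.StringIO()

# sep replaces the default space between values; end replaces the default newline.
print("A", "C", "D", sep="-", end="!\n", file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```

Leaving out `file=buffer` prints straight to the notebook output, which is the usual way to call `print`.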



## EXERCISE 1 - Getting started
Exercises are explained in this notebook. Use the cells below each question to write and run your own answers. Remember, you can add text cells as you wish and use `#` to comment the code.

In [1]:
# This is a comment
print("Hello world")

Hello world


**1. Get familiar with the notebook environment. What is the working directory?**

**2. Using Python, calculate the percentage of males and females currently present in the training.**

**3. Create a new object `myobject` with value 60. Print `myobject` in the console.**

**4. Reassign `myobject` with value 87.**

**5. Subtract 1 from `myobject` and reassign the result.**

## EXERCISE 2 - Data types: strings

**1. Create the string `mystring` with value "Hello world!", and print it.**

**2. Print "This is my string: Hello world!".**

**3. How long is the string?**

**4. Get only the word "world" from `mystring` using numeric indexes.**

**5. Split `mystring` word by word. Use the method `split`. Then `join` the words with slashes.**

**6. Print `mystring` in upper case. Use the method `upper`.**

## EXERCISE 3 - Data types: numbers

**1. Create three variables `a`, `b`, and `c` with values 2, 5 and 3 respectively.**

**2. Is `a` greater than or equal to `c`? And is `b`?**

**3. Try `(a+c)*5`. Assign it to `d`.**

**4. Try `b**a`. Is it the same as `d`? Compare them with `==` (equal values) and with `is` (same object).**

**5. What is `d` divided by `a`? If not exact, what is the remainder?**

## EXERCISE 4 - Data types: lists

**1. Create a list `y` which contains the numbers from 2 to 11, both included. Print `y` in the console.**

**2. How many elements are in `y`? I.e., what is the length of the list `y`?**

**3. Show the 2nd element of `y`. Remember that Python uses 0-based indexing.**

**4. Show the 5th to 7th elements of `y`.**

**5. Show the last element of `y`.**

**6. Remove the 4th element of `y` and reassign. Use the methods `pop` and `remove`. What is the length of `y` now?**

**7. What are the minimum, maximum and sum of the values in `y`?**

**8. Check whether 1 and 9 are present in the list `y`. Use the `in` operator.**

**9. Create a list `x=[1,2,3,1,2,3,1,2,3]`, but expressed as a repetition of `[1,2,3]`. Use `*`.**

**10. Add an additional element `15` to the list `x`. Use `append`.**

**11. Add four additional elements `[45,72,4,6]` to the list `x`. Use `extend`.**

**12. Sort the elements of `x` using the method `sort`.**

**13. Remove duplicated numbers from the list using the function `set`.**

Sets are like lists, but they are unordered and unindexed, and they cannot contain duplicated items. Items can be added to or removed from a set, but an item already in the set cannot be modified in place.

**14. Find the items in common between the previous list and this one: `[5, 3, 7, 4, 10, 11, 12, 72]`. Use `set.intersection()`.**

**15. Given the protein sequence "MPISEPTFFEIF", split the sequence into its component amino acid codes and count how many phenylalanines (F) there are.**
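
The note on sets above can be illustrated with a short sketch. The list `codes` is a made-up example and is not part of the exercises. It shows that duplicates collapse, that a set has no positions to index, and that items can still be added or removed even though an individual item cannot be edited in place.

```python
# Hypothetical example data with repeated values.
codes = ["Ala", "Cys", "Ala", "Glu", "Cys"]

unique_codes = set(codes)  # duplicates collapse to a single item
print(len(codes), len(unique_codes))

# Sets have no positions, so unique_codes[0] would raise a TypeError.
# Items can still be added or removed.
unique_codes.add("Asp")
unique_codes.discard("Glu")

# sorted() returns a list, which gives a predictable order for printing.
print(sorted(unique_codes))
```

Because a set is unordered, printing it directly may show the items in a different order from one run to the next. Use `sorted()` when the order matters.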

## EXERCISE 5 - Data types: dictionaries

**1. Create a dictionary `mydict` that maps the one-letter amino acid codes `A`, `C`, `D` and `E` to the three-letter codes `Ala`, `Cys`, `Asp` and `Glu`.**

**2. Print the keys and values of the dictionary. Use the methods `keys` and `values`. Print the three-letter code of `C`.**

**3. Add phenylalanine to `mydict`.**

**4. Add a fake amino acid `A` that maps to the value `Fake`. Print `mydict`.**

# Introduction to Python - Session 1.2
1. Flow control
    - if...else statements
    - for and while loops
2. List comprehension
3. Reading and writing files
4. Functions

SLIDES [HERE](https://docs.google.com/presentation/d/1SMnQmfrMgdD-81KdkxVsbGHcEoJ-Qsz0Y_NQIfUziz0/export/pdf)

## EXERCISE 1 - if...else statements

In [4]:
fruit = ["kiwi", "apple", "pear", "grape"]
fruit2 = ["cherry", "strawberry", "blueberry", "peach"]

**1. Use an `if` statement and the `in` operator to check whether "apple" is present in `fruit`. If yes, print "There is an apple!".**

**2. Use an `if` statement and the `in` operator to check whether "grapefruit" is present in `fruit`. If not, test whether "pear" is found. Print accordingly.**

**3. Add an `else` section in case neither "grapefruit" nor "pear" is found in `fruit`. Also test your `if` statement on `fruit2`.**

**4. Check whether the sequence "ATGGCGGTCGAATAG" contains a stop codon (TAG, TAA, TGA). Ignore the frame.**

## EXERCISE 2 - For loops

**1. Write a `for` loop that iterates over the sequence of numbers 2 to 10 (both included), and prints the square of each number (use the `**` operator, as in `n**2`).**

**2. Write a `for` loop that iterates over the sequence of numbers 5 to 15 (both included), and prints a new list of 2 elements containing the number and its square.**

**3. Write a program where you print out all positive numbers up to 1000 that can be divided by 13, or 17, or both.**

**4. Write a program where, for each positive number up to 50, you find all the numbers that divide it exactly.**

**5. Write a program where you start with a list of numbers from 1 to 100, and you then remove every number from this list that can be divided by 3 or by 5. Print the result.**

## EXERCISE 3 - List comprehension
Remember that every list comprehension can also be expressed as a `for` loop. Please formulate the answers as list comprehensions for practice.
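
The equivalence between the two forms can be sketched as follows. The list `values` and the filter condition are made up for illustration and are not taken from the questions below.

```python
# Hypothetical example data.
values = [3, 8, 5, 12, 7]

# Version 1: a for loop that builds the list step by step.
loop_result = []
for n in values:
    if n > 4:
        loop_result.append(n * 2)

# Version 2: the same logic as a list comprehension.
# Pattern: [expression for item in iterable if condition]
comp_result = [n * 2 for n in values if n > 4]

print(loop_result)
print(comp_result)
```

Both versions produce the same list. The comprehension simply puts the expression, the loop and the condition on one line.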

**1. Create a list of integers which specify the length of each word in `mysentence`, but only if the word is not the word "the".**

In [1]:
mysentence = "the quick brown fox jumps over the lazy dog"


**2. Create a new list called `newlist` out of the list `numbers`, which contains only the positive numbers from the list, as integers.**

In [2]:
numbers = [34.6, -203.4, 44.9, 68.3, -12.2, 44.6, 12.7]


**3. Remove all of the vowels in `mysentence`.**

In [3]:
mysentence = "the quick brown fox jumps over the lazy dog"


**4. Use a nested list comprehension to find all of the numbers from 1-1000 that are divisible by any single digit besides 1 (2-9).**

**5. Print out the amino acid sequence that would be produced by the DNA sequence "GTTGCACCACAACCG". Use the genetic code below.**

In [1]:
GENETIC_CODE = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W'}

## EXERCISE 4 - While loops

**1. Write a program that prints all the powers of 2 that are smaller than 1000.**

**2. Write a program where you ask the user for an integer using the `input()` function. Keep on asking if they give a wrong input (not an integer). Check whether the number can be divided by 7, and print the result.** HINT: the string method `isdigit()` checks whether a string consists only of digits.
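
To see what the hint means in practice, the sketch below applies `isdigit()` to a few made-up strings of the kind a user might type. It is only an illustration of the method's behaviour, not a solution to the question.

```python
# Made-up answers a user might type at an input() prompt.
candidates = ["42", "-7", "3.5", "seven", ""]

for text in candidates:
    # isdigit() is True only for a non-empty string made up entirely of digits.
    print(repr(text), text.isdigit())

# The same checks collected in a dictionary.
results = {text: text.isdigit() for text in candidates}
```

Note that a minus sign or a decimal point makes `isdigit()` return `False`, so a check based on it alone rejects negative integers such as "-7".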

## EXERCISE 5 - Reading and writing files

**1. Read the "TestFile.pdb" file within the directory "data" (PDB coordinate file for a 5 residue peptide). Print the number of lines of the file (length).**

**2. Loop over the lines, and use the method `strip` to remove leading and trailing spaces/tabs/newlines. As PDB files are delimited by spaces, use the method `split` to separate the elements in each line. Print the result.**

**3. Using the "TestFile.pdb", write out all lines that contain 'VAL' to a new file "TestFile_VAL.pdb" in "results" directory.**

**4. Read in the "TestFile.pdb" atom coordinate file, print out the title of the file, and find all atoms that have coordinates closer than 2 angstrom to the (x,y,z) coordinate (-8.7,-7.7,4.7). Print out the model number, residue number, atom name and atom serial for each.**

The model is indicated by:
```
MODEL        1
```
The atom coordinate information is: 
```
ATOM      1  N   ASP A   1     -10.341  -9.922   9.398  1.00  0.00           N
```
where column 1 is always ATOM, column 2 is the atom serial, column 3 the atom name, column 4 the residue name, column 5 the chain code, and column 6 the residue number, followed by the x, y and z coordinates in angstrom in columns 7, 8 and 9.
Note that the distance between two coordinates is calculated as the square root of (x1-x2)²+(y1-y2)²+(z1-z2)². 
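
The column layout and the distance formula above can be sketched on the single example ATOM line. This assumes that splitting on whitespace is enough to separate the fields. The helper `distance` and the `reference` point are hypothetical names and values chosen for illustration only.

```python
import math

# The example ATOM line shown above.
line = "ATOM      1  N   ASP A   1     -10.341  -9.922   9.398  1.00  0.00           N"

# Assumption: every field is separated from the next by at least one space.
fields = line.split()
serial, atom_name, res_name, chain, res_number = fields[1:6]
x, y, z = (float(value) for value in fields[6:9])


def distance(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical reference point, chosen only for illustration.
reference = (-10.341, -9.922, 6.398)
print(atom_name, res_name, res_number, distance((x, y, z), reference))
```

The values returned by `split` are strings, so the coordinates need to be converted with `float()` before any arithmetic.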

## EXERCISE 6 - Functions

**1. Write a function to multiply all the numbers in a list.**

**2. Write a function that computes the absolute value of a number. Compare it to the built-in function `abs()`.**

**3. Write a function that computes the mean of a list of numbers.**

**4. Write a function `complementDNA` that produces the complement of a DNA sequence (converts A to T and C to G, and vice versa).**

**5. Add a second argument to `complementDNA` called `reverse`, which defaults to `False`. Change the function so that it prints the reversed sequence when `reverse=True`.**
NOTE: In biology, you may want to work with the reverse-complement of a sequence if it contains an ORF on the reverse strand.

**6. Write a function that calculates the GC content of a sequence. Test it with the sequence "ATCGGCTTTC".**

**7. Write a function that reads a FASTA file and outputs a dictionary with the sequence names as keys and the sequences as values.** Remember that each sequence in a FASTA file starts with a header line beginning with ">". Test the function with the "TestFasta.fa" file.

**8. Read the lyrics of "Imagine" (John Lennon, 1971) from the file "imagine.txt". Split the text into words. Print the total number of words, and the number of distinct words. Calculate the frequency of each distinct word and store the result in a dictionary. Find the most frequent word longer than 3 characters in the song, and print it with its frequency.**
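
As a reminder of the FASTA layout mentioned in question 7, the sketch below walks through a small FASTA-formatted string held in memory. The text, the sequence names and the variable names are made up for illustration. Reading from a file and wrapping the logic in a function are left to the exercise.

```python
# Made-up FASTA text; the names and sequences are illustrative only.
fasta_text = """>seq1 hypothetical example
ATGGCG
GTCGAA
>seq2
TTGACC
"""

records = {}
name = None
for line in fasta_text.splitlines():
    line = line.strip()
    if not line:
        continue  # skip empty lines
    if line.startswith(">"):
        name = line[1:]  # header line: drop the leading ">"
        records[name] = ""
    elif name is not None:
        records[name] += line  # a sequence may span several lines

print(records)
```

The key point is that a header line opens a new record, and every line that follows belongs to that record until the next header.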