# Lab 1: Python basics

__Student I:__ abcde123 (John Doe)

__Student II:__ abcde123 (Jane Doe)

### A word of caution

There are currently two versions of Python in common use, Python 2 and Python 3, which are not 100% compatible. Python 2 is slowly being phased out but has a large enough install base to still be relevant. This course uses the more modern Python 3, but when searching for help online it is not uncommon to find answers written for Python 2. Older posts in particular, on sources such as Stack Exchange, might refer to Python 2 as simply "Python". This should not cause any serious problems, but keep it in mind whenever googling. With regard to this lab, the largest differences are how `print` works and the best-practice recommendations for string formatting.

### References to R

Most students taking this course who are not already familiar with Python will probably have some experience of the R programming language. For this reason, there will be intermittent references to R throughout this lab. For those of you with a background in R (or MATLAB/Octave, or Julia) the most important thing to remember is that indexing starts at 0, not at 1.
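As a small illustration of zero-based indexing, here is a minimal sketch. The list `letters` is a made-up example and not part of the lab.

```python
# Hypothetical example list, used only for illustration.
letters = ['a', 'b', 'c', 'd']

first = letters[0]      # the first element has index 0, not 1
last = letters[-1]      # negative indices count from the end
middle = letters[1:3]   # slices include the start index but exclude the stop index

print(first, last, middle)
```

Note that a negative index counts from the end in Python, whereas in R it drops elements.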

### Recommended Reading

This course is not built on any specific source and no specific literature is required. However, for those who prefer to have a printed reference book, we recommend the books by Mark Lutz:

* Learning Python by Mark Lutz, 5th edition, O'Reilly. Recommended for those who have no experience of Python. This book is called LP in the text below.

* Programming Python by Mark Lutz, 4th edition, O'Reilly. Recommended for those who have some experience with Python. It generally covers more advanced topics than this course does, but gives you a chance to dig a bit deeper if you're already comfortable with the basics. This book is called PP in the text.

For the student interested in Python as a language, it is worth mentioning:

* Fluent Python by Luciano Ramalho (also O'Reilly). Note that it is, at the time of writing, still in its first edition, from 2015. Thus newer features will be missing.

### A note about notebooks

When using this notebook, you can enter Python code in the empty cells, then press ctrl-enter. The code in the cell is executed and any output is displayed below the cell. Code executed in this manner uses the same environment regardless of where in the notebook document it is placed. This means that variables and functions defined in one cell are thereafter accessible from all other cells in your notebook session.

Note that the programming environments described in section 1 of LP are not applicable when you run Python in this notebook.

### A note about the structure of this lab

This lab will contain tasks of varying difficulty. There might be cases when the solution seems too simple to be true (in retrospect), and cases where you have seen similar material elsewhere in the course. Don't be fooled by this. In many cases, the task might just serve to remind us of things that are worthwhile to check out, or to find out how to use a specific method.

We will be returning to, and using, several of the concepts in this lab.

### 1. Strings and string handling

The primary datatype for storing raw text in Python is the string. Note that there is no character datatype, only strings of length 1. This can be compared to how there are no atomic numbers in R, only vectors of length 1. A reference to the string datatype can be found __[here](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)__.

[Literature: LP: Part II, especially Chapter 4, 7.]

a) Define the variable `parrot` as the string containing the sentence _It is dead, that is what is wrong with it. This is an ex-"Parrot"._. 

[Note: If you have been programming in a language such as C or Java, you might be a bit confused about the term "define". Different languages use different terms when creating variables, such as "define", "declare", "initialize", etc. with slightly different meanings. In statically typed languages such as C or Java, declaring a variable creates a name connected to a container which can contain data of a specific type, but does not put a value in that container. Initialization is then the act of putting an initial value in such a container. Defining a variable is often used as a synonym for declaring a variable in statically typed languages, but as Python is dynamically typed, i.e. variables can contain values of any type, there is no need to declare variables before initializing them. Thus, defining a variable in Python entails simply assigning a value to a new name, at which point the variable is both declared and initialized. This works exactly as in R.]
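The point about dynamic typing can be sketched as follows. The name `value` is hypothetical and chosen only for this illustration.

```python
# `value` is a made-up name, not one used elsewhere in the lab.
value = 42                  # the name now refers to an int
type_before = type(value)

value = "forty-two"         # rebinding the same name to a str is allowed
type_after = type(value)

print(type_before, type_after)
```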

In [96]:
parrot='It is dead, that is what is wrong with it. This is an "ex-Parrot"'

b) What methods does the string now called `parrot` (or indeed any string) seem to support? Write Python commands below to find out.

In [121]:
dir(parrot)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


c) Count the number of characters (letters, blank spaces, commas, periods, etc.) in the sentence.

In [22]:
len(parrot)

65

e) If we type `parrot + parrot`, should it change the string itself, or merely produce a new string? How would you test your intuition? Write expressions below.

In [28]:
parrot+parrot

'It is dead, that is what is wrong with it. This is an "ex-Parrot"It is dead, that is what is wrong with it. This is an "ex-Parrot"'

In [29]:
'''The string does not change itself, it merely concatenates the strings specified to produce a new string.'''

'The string does not change itself, it merely concatenates the strings specified to produce a new string.'
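One possible way to test this intuition with expressions is sketched below. The names `before` and `doubled` are introduced here only for illustration.

```python
parrot = 'It is dead, that is what is wrong with it. This is an "ex-Parrot"'

before = parrot              # keep a second reference to the original string
doubled = parrot + parrot    # concatenation builds a new string object

# If concatenation had modified parrot, it would no longer equal `before`.
unchanged = (parrot == before) and (parrot is before)

print(unchanged)
print(len(parrot), len(doubled))
```

Since strings are immutable, no operation can change `parrot` in place; only rebinding the name (for example `parrot = parrot + parrot`) makes it refer to a different string.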

f) Separate the sentence into a list of words (possibly including separators) using a built-in method. Call the list `parrot_words`.

In [117]:
parrot_words=parrot.split()

In [118]:
print(parrot_words)

['It', 'is', 'dead,', 'that', 'is', 'what', 'is', 'wrong', 'with', 'it.', 'This', 'is', 'an', '"ex-Parrot"']


g) Merge (concatenate) `parrot_words` into a string again.

In [45]:
' '.join(parrot_words)

'It is dead, that is what is wrong with it. This is an "ex-Parrot"'

### 2. Iteration, sequences and string formatting

Loops are not as painfully slow in Python as they are in R and thus, not as critical to avoid. However, for many use cases, _comprehensions_, like _list comprehensions_ or _dict comprehensions_ are faster. In this assignment we will see both traditional loop constructs and comprehensions. For an introduction to comprehensions, __[this](https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html)__ might be a good place to start.

It should also be noted that what Python calls lists are unnamed sequences. As in R, a Python list can contain elements of many types. However, the elements can only be accessed by index or slice, not by name as in R.
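A minimal sketch of these points, assuming a made-up list `mixed`: elements of different types, access by index and by slice, and the same result built first with a loop and then with a list comprehension.

```python
# A hypothetical list with elements of different types.
mixed = [1, 'two', 3.0, [4, 5]]

second = mixed[1]    # access by index (zero-based)
tail = mixed[2:]     # access by slice

# The same result computed with a traditional loop ...
types_loop = []
for item in mixed:
    types_loop.append(type(item).__name__)

# ... and with a list comprehension.
types_comp = [type(item).__name__ for item in mixed]

print(second, tail)
print(types_comp)
```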

a) Write a `for`-loop that produces the following output on the screen:<br>
> `The next number in the loop is 5`<br>
> `The next number in the loop is 6`<br>
> ...<br>
> `The next number in the loop is 10`<br>

[Hint: the `range` function has more than one argument.]<br>
[Literature: For the range construct see LP part II chapter 4 (p.112).]

In [5]:
for i in range(5, 11):  # the stop value is exclusive, so 11 is needed to include 10
    print("The next number in the loop is {}".format(i))

The next number in the loop is 5
The next number in the loop is 6
The next number in the loop is 7
The next number in the loop is 8
The next number in the loop is 9
The next number in the loop is 10


b) Write a `for`-loop that for a given `n` sets `first_n_squared` to the sum of squares of the first `n` numbers (0..n-1).

In [18]:
n = 100  # If we change this and run the code, the value of first_n_squared should change afterwards!
first_n_squared = 0
for i in range(n):            # i takes the values 0, 1, ..., n-1
    first_n_squared += i**2   # ** is exponentiation; ^ is bitwise XOR in Python
print(first_n_squared)        # should return 0^2 + 1^2 + ... + 99^2 = 328350 if n = 100

328350



c) Write a code snippet that counts the number of __letters__ in `parrot` (as defined above). Use a `for` loop.

In [57]:
letter_count = 0
for ch in parrot:
    if ch.isalpha():    # True for letters only, so spaces and punctuation are skipped
        letter_count += 1
letter_count

47

d) Write a for-loop that iterates over the list `names` below and presents them on the screen in the following fashion:

> `The name Tesco is nice`<br>
> ...<br>
> `The name Zeno is nice`<br>

Use Python's string formatting capabilities (the `format` function in the string class) to solve the problem.

[Warning: The best practices for string formatting differ between Python 2 and 3; make sure you use the Python 3 approach.]<br>
[Literature: String formatting is covered in LP part II chapter 7.]

In [22]:
names = ['Tesco', 'Forex', 'Alonzo', 'Zeno']
for name in names:
    print("The name {} is nice".format(name))

The name Tesco is nice
The name Forex is nice
The name Alonzo is nice
The name Zeno is nice


e) Write a for-loop that iterates over the list `names` and produces the list `n_letters` (`[5,5,6,4]`) with the length of each name.

In [84]:
n_letters=[]
for name in names:
    n_letters.append(len(name))
n_letters

[5, 5, 6, 4]

f) How would you - in a Python interpreter/REPL or in this Notebook - retrieve the help for the built-in function `max`?

In [85]:
help(max)

Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.



g) Show an example of how `max` can be used with an iterable of your choice.

In [2]:
mix_list = (0, '1','100',90, '111','2', 102.5)
 
mx_val=max(mix_list, key=lambda x:int(x))
 
print("The max value in the list: ",mx_val)

The max value in the list:  111


h) Use a comprehension (or generator) to calculate the sum 0^2 + ... + (n-1)^2 as above.

In [19]:
n = 100
first_n_squared = sum(i**2 for i in range(n))  # generator expression; ** is exponentiation, ^ is bitwise XOR
first_n_squared # Should return the same result as your for-loop.

328350

i) Solve assignment e) using a list comprehension.

[Literature: Comprehensions are covered in LP part II chapter 4.]

In [93]:
n_letters = [len(name) for name in names]
n_letters

[5, 5, 6, 4]

j) Use a list comprehension to produce a list `short_long` that indicates if the name (in the list `names`) has more than four letters. The answer should be `['long', 'long', 'long', 'short']`.

In [94]:
print(names)
short_long = ['long' if len(name) > 4 else 'short' for name in names]
print(short_long)

['Tesco', 'Forex', 'Alonzo', 'Zeno']
['long', 'long', 'long', 'short']


k) Use a comprehension to count the number of letters in `parrot`. You may not use a `for`-loop. (The comprehension will contain the word `for`, but it isn't a `for ... in ...:`-statement.)

In [120]:
letters = [ch for ch in parrot if ch.isalpha()]  # keep only the letters
len(letters)

47

[Note: this is fairly similar to the long/short task, but note how we access member functions of the values.]

l) Below we have the string `datadump`. Retrieve the substring starting at character 27 and ending at character 34 by means of slicing.

In [122]:
datadump = "The name of the game is <b>old html</b>. That is <b>so cool</b>."
datadump[27:35]

'old html'

l) Write a loop that uses indices to __simultaneously__ loop over the lists `names` and `short_long` to write the following to the screen:

> `The name Tesco is a long name`<br>
> ...<br>
> `The name Zeno is a short name`<br>

In [123]:
for x in range(len(names)):
    print("The name {0} is a {1} name".format(names[x], short_long[x]))

The name Tesco is a long name
The name Forex is a long name
The name Alonzo is a long name
The name Zeno is a short name


Note: this is a common programming pattern, though not particularly Pythonic in this use case. We do however need to know how to use indices in lists to work properly with Python.

m) Do the task above once more, but this time without the use of indices.


[Hint: Use the `zip` function.]<br>
[Literature: zip usage with dictionary is found in LP part II chapter 8 and dictionary comprehensions in the same place.]
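A minimal sketch of the index-free version, following the hint about `zip`. The lists are repeated here so the example is self-contained, and the names `pairs`, `name` and `length` are chosen only for illustration.

```python
names = ['Tesco', 'Forex', 'Alonzo', 'Zeno']
short_long = ['long', 'long', 'long', 'short']

# zip pairs up the elements of both lists, so no index variable is needed.
pairs = list(zip(names, short_long))

for name, length in pairs:
    print("The name {} is a {} name".format(name, length))
```

Note that `zip` stops at the end of the shorter input, so lists of unequal length are silently truncated.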

n) Among the built-in datatypes, it is also worth mentioning the tuple. Construct two tuples, `one` containing the number one and `two` containing the number 1 and the number 2. What happens if you add them? Name some method that a list with similar content (such as `two_list` below) would support, that `two` doesn't and explain why this makes sense.

In [126]:
one = (1,)    # The trailing comma makes this a tuple; (1) alone is just the integer 1.
two = (1, 2)
two_list = [1, 2]

print(one + two)
print("Adding one and two produces a new tuple (1, 1, 2) that contains the elements of both tuples. A list such as two_list supports append(), but two does not, because tuples are immutable.")

(1, 1, 2)
Adding one and two produces a new tuple (1, 1, 2) that contains the elements of both tuples. A list such as two_list supports append(), but two does not, because tuples are immutable.


### 3. Conditionals, logic and while loops

a) Below we have an integer called `n`. Write code that prints "It's even!" if it is even, and "It's odd!" if it's not.

In [140]:
n = 5 # Change this to other values and run your code to test.
if n%2==0:
    print("It's even!")
else:
    print("It's odd!")

It's odd!


b) Below we have the list `options`. Write code (including an `if` statement) that ensures that the boolean variable `OPTIMIZE` is True _if and only if_ the list contains the string `--optimize` (exactly like that).

In [145]:
OPTIMIZE = None       # Or some value which we are unsure of.
options = ['--print-results', '--optimize', '-x']  # This might have been generated by a GUI or command line option

OPTIMIZE = False  # start from a known value, so the flag is False when '--optimize' is absent
if "--optimize" in options:
    OPTIMIZE = True
# Here OPTIMIZE should be True if and only if we found '--optimize' in the list.

OPTIMIZE

True

Note: It might be tempting to use a `for` loop. In this case, we will not be needing this, and you may _not_ use it. Python has some useful built-ins to test for membership.

You may use an `else`-free `if` statement if you like.

c) Redo the task above, but now consider the case where the boolean `OPTIMIZE` is True _if and only if_ the `options` list contains either `--optimize` or `-o` (or both).

In [151]:
OPTIMIZE = None       # Or some value which we are unsure of.
options = ['--print-results', '-o', '-x']  # This might have been generated by a GUI or command line option

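OPTIMIZE = False  # start from a known value
if "--optimize" in options or "-o" in options:  # test membership separately for each string
    OPTIMIZE = True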

# Here OPTIMIZE should be True if and only if we found '--optimize' or '-o' in the list.
OPTIMIZE

True

[Hint: Don't forget to test your code with different versions of the options list! 

If you find something that seems strange, you might want to check what the value of the _condition itself_ is.]

[Note: This extension of the task is included as it includes a common source of hard-to-spot bugs.]
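The kind of bug the note refers to can be made visible by inspecting the value of the condition itself. The sketch below assumes an `options` list that contains neither flag. The names `buggy` and `correct` are introduced only for illustration.

```python
options = ['--print-results', '-x']   # contains neither '--optimize' nor '-o'

# `in` binds tighter than `or`, so this is parsed as
# "--optimize" or ("-o" in options), and a non-empty string is always truthy.
buggy = "--optimize" or "-o" in options

# Testing membership separately for each string gives the intended result.
correct = "--optimize" in options or "-o" in options

print(repr(buggy))    # '--optimize'
print(correct)        # False
```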

d) Write out *a few* good tests that should (in an *extremely* informal sense) illustrate the correctness of your solution. Make sure that you actually try them out with your code. 

Also make sure to write what the cases actually tell you, and why they are useful inputs to your set of tests. If you already have the test `options = ["hey"]`, adding the test `options = ["hello"]` doesn't add any possible ways of failing (or any coverage).

In [None]:
# Test 0: if options is the list below, OPTIMIZE should be True after. 
# This test demonstrates that ...
options0 = None       

[Note: This way of testing your code is very primitive, but it's good to get used to constructing test cases. We will be discussing procedures and functions in the next lab.]
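One informal way to organise such tests is sketched below, assuming the condition `"--optimize" in options or "-o" in options` as the solution under test. The list `test_cases` and its contents are hypothetical examples, each chosen to exercise a different way of failing.

```python
# Each pair is (options list, expected value of OPTIMIZE).
test_cases = [
    (['--optimize'], True),               # long form only
    (['-o'], True),                       # short form only
    (['--optimize', '-o'], True),         # both forms present
    (['--print-results', '-x'], False),   # other options, but neither form
    ([], False),                          # empty list
    (['--optimized'], False),             # near miss: an exact match is required
]

results = []
for options, expected in test_cases:
    OPTIMIZE = "--optimize" in options or "-o" in options
    results.append(OPTIMIZE == expected)

print(all(results))
```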

d) Sometimes we can avoid using an `if` statement altogether. The task above is a prime example of this (and was introduced to get some practice with the `if` statement). Solve the task above in a one-liner without resorting to an `if` statement. (You may use an `if` expression, but you don't have to.)

In [None]:
options = ['--print-results', '-o', '-x']  # This might have been generated by a GUI or command line option

OPTIMIZE = "--optimize" in options or "-o" in options  # each membership test already yields a boolean

# Here OPTIMIZE should be True if and only if we found '--optimize' or '-o' in the list.

[Hint: What should the value of the condition be when you enter the then-branch of the `if`? When you enter the else-branch?]

e) Write a `while`-loop that repeatedly generates a random number from a uniform distribution over the interval [0,1], and prints the sentence 'The random number is smaller than 0.9' on the screen until the generated random number is greater than 0.9.

[Hint: Python has a `random` module with basic random number generators.]<br/>

[Literature: Introduction to the Random module can be found in LP part III chapter 5 (Numeric Types). Importing modules is introduced in part I chapter 3  and covered in depth in part IV.]
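A minimal sketch of such a loop, using `random.random()` from the standard library. Note that `random.random()` draws from the half-open interval [0.0, 1.0). The fixed seed is an assumption added here only to make the run repeatable.

```python
import random

random.seed(0)  # hypothetical fixed seed, only so that repeated runs behave the same

number = random.random()          # uniform on [0.0, 1.0)
while number <= 0.9:
    print('The random number is smaller than 0.9')
    number = random.random()      # draw again until the number exceeds 0.9

print('Done, the last number was', number)
```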

### 4. Dictionaries

Dictionaries are association tables, or maps, connecting a key to a value. For instance, a key could be a name represented by a string, and the value a number representing some attribute. Dictionaries can themselves be values in other dictionaries, creating nested or hierarchical data structures. This is similar to named lists in R, but keys in Python dictionaries can be more complex than just strings.
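The following sketch illustrates these properties. All names and data are made up for illustration.

```python
# A simple mapping from string keys to string values.
capitals = {'Sweden': 'Stockholm', 'Norway': 'Oslo'}

# Keys do not have to be strings: any hashable value works, for example a tuple.
grid = {(0, 0): 'origin', (1, 0): 'east'}

# Values can themselves be dictionaries, which gives a nested structure.
countries = {'Sweden': {'capital': 'Stockholm', 'currency': 'SEK'}}

print(capitals['Norway'])
print(grid[(1, 0)])
print(countries['Sweden']['currency'])
```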

[Literature: Dictionaries are found in LP section II chapter 4.]

a) Make a dictionary named `amadeus` containing the information that the student Amadeus is a male, scored 8 on the Algebra exam and 13 on the History exam. The dictionary should NOT include a name entry.

In [127]:
amadeus = {'Gender': 'Male', 'Algebra': 8, 'History': 13}
amadeus

{'Gender': 'Male', 'Algebra': 8, 'History': 13}

b) Make three more dictionaries, one for each of the students: Rosa, Mona and Ludwig, from the information in the following table:

| Name          | Gender        | Algebra       | History | 
| :-----------: | :-----------: |:-------------:| :------:|
| Rosa          | Female        | 19            | 22      |
| Mona          | Female        | 6             | 27      |
| Ludwig        | Other         | 12            | 18      |

In [129]:
rosa = {'Gender': 'Female', 'Algebra': 19, 'History': 22}
mona = {'Gender': 'Female', 'Algebra': 6, 'History': 27}
ludwig = {'Gender': 'Other', 'Algebra': 12, 'History': 18}

c) Combine the four students in a dictionary named `students` such that a user of your dictionary can type `students['Amadeus']['History']` to retrieve Amadeus's score on the history test.

[HINT: The values in a dictionary can be dictionaries.]

In [130]:
students = {"Amadeus": amadeus, "Rosa": rosa, "Mona": mona, "Ludwig":ludwig}
students["Amadeus"]["History"]

13

d) Add the new male student Karl to the dictionary `students`. Karl scored 14 on the Algebra exam and 10 on the History exam.

In [131]:
students["Karl"] = {'Gender': 'Male', 'Algebra': 14, 'History': 10}
students

{'Amadeus': {'Gender': 'Male', 'Algebra': 8, 'History': 13},
 'Rosa': {'Gender': 'Female', 'Algebra': 19, 'History': 22},
 'Mona': {'Gender': 'Female', 'Algebra': 6, 'History': 27},
 'Ludwig': {'Gender': 'Other', 'Algebra': 12, 'History': 18},
 'Karl': {'Gender': 'Male', 'Algebra': 14, 'History': 10}}

e) Use a `for`-loop to print out the names and scores of all students on the screen. The output should look something like this (the order of the students doesn't matter):

> `Student Amadeus scored 8 on the Algebra exam and 13 on the History exam`<br>
> `Student Rosa scored 19 on the Algebra exam and 22 on the History exam`<br>
> ...

[Hint: Dictionaries are iterables, also, check out the `items` function for dictionaries.]

In [132]:
for k, v in students.items():
    print("Student {0} scored {1} on the Algebra exam and {2} on the History exam".format(k, v['Algebra'], v['History']))

Student Amadeus scored 8 on the Algebra exam and 13 on the History exam
Student Rosa scored 19 on the Algebra exam and 22 on the History exam
Student Mona scored 6 on the Algebra exam and 27 on the History exam
Student Ludwig scored 12 on the Algebra exam and 18 on the History exam
Student Karl scored 14 on the Algebra exam and 10 on the History exam


f) Use a dict comprehension and the lists `names` and `short_long` from assignment 2 to create a dictionary of names and whether they are short or long. The result should be a dictionary equivalent to {'Forex':'long', 'Tesco':'long', ...}.

[Note: Dictionaries compare equal regardless of the order of their pairs (since Python 3.7 they keep insertion order, older versions give no ordering guarantee), so the order of the pairs in the dictionary above is arbitrary. You might not get the same order; this is fine.]<br>

In [133]:
length_of_name = {name: length for name, length in zip(names, short_long)}
length_of_name

{'Tesco': 'long', 'Forex': 'long', 'Alonzo': 'long', 'Zeno': 'short'}

### 5. Introductory file I/O

File I/O in Python is a bit more general than what most R programmers are used to. In R, reading and writing files is usually performed using file-type-specific functions such as `read.csv`, while in Python we usually start with reading standard text files. However, there are lots of specialized functions for different file types in Python as well, especially when using the __[pandas](http://pandas.pydata.org/)__ library, which is built around a datatype similar to R data frames. Pandas will not be covered in this course, though.

[Literature: Files are introduced in LP part II chapter 4 and chapter 9.]
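As a rough sketch of what working with a plain text file looks like, the example below writes a small tab-separated file to a temporary directory and reads it back line by line. The file name `example.tsv` and its contents are hypothetical and unrelated to the lab's `students.tsv`.

```python
import os
import tempfile

# The file name and contents are made up for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.tsv')

    with open(path, 'w') as out_file:
        out_file.write('Ada\t36\nGrace\t85\n')

    rows = []
    with open(path) as in_file:
        for line in in_file:                              # a file object yields one line at a time
            rows.append(line.rstrip('\n').split('\t'))    # drop the newline, then split on tabs

print(rows)
```

All values read this way are strings; converting them to numbers (for example with `int`) is a separate step.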

The file `students.tsv` contains tab separated values corresponding to the students in previous assignments.

a) Iterate over the file, line by line, and print each line. The result should be something like this:

> `Amadeus	Male	8	13`<br>
> `Rosa	Female	19	22`<br>
> ...

The file should be closed when reading is complete.

[Hint: Files are iterable in Python.]

In [135]:
a = open("students.tsv", "r")
for eachLine in a:  # a file object can be iterated over line by line
    print(eachLine, end = "")
    
a.close()

Amadeus	Male	8	13
Rosa	Female	19	22
Mona	Female	6	27
Ludwig	Other	12	18
Karl	Male	14	10

b) Working with many files can be problematic, especially when you forget to close files or errors interrupt programs before files are closed. Python thus has a special `with` statement which automatically closes files for you, even if an error occurs. Redo the assignment above using the `with` statement.

[Literature: With is introduced in LP part II chapter 9 page 294.]

In [136]:
with open("students.tsv") as a:
    for eachLine in a:  # a file object can be iterated over line by line
        print(eachLine, end = "")

Amadeus	Male	8	13
Rosa	Female	19	22
Mona	Female	6	27
Ludwig	Other	12	18
Karl	Male	14	10

c) Recreate the dictionary from the previous assignment by reading the data from the file. Using a dedicated csv-reader is not permitted.

In [137]:
students_1 = {}
with open("students.tsv") as a:
    for eachLine in a:
        student = eachLine.rstrip("\n").split("\t")  # the file is tab separated
        students_1[student[0]] = {'Gender': student[1], 'Algebra': int(student[2]), 'History': int(student[3])}

students_1

{'Amadeus': {'Gender': 'Male', 'Algebra': 8, 'History': 13},
 'Rosa': {'Gender': 'Female', 'Algebra': 19, 'History': 22},
 'Mona': {'Gender': 'Female', 'Algebra': 6, 'History': 27},
 'Ludwig': {'Gender': 'Other', 'Algebra': 12, 'History': 18},
 'Karl': {'Gender': 'Male', 'Algebra': 14, 'History': 10}}

d) Using the dictionary above, write the sentences from task 4e above to a new file, called `students.txt`.

In [139]:
with open("students.txt", "w") as a:
    for k, v in students_1.items():
        a.write("Student {0} scored {1} on the Algebra exam and {2} on the History exam\n".format(k, v["Algebra"], v["History"]))