# Lab 1: Python basics

__Student I:__ Damian Mon Ke (damke753)

__Student II:__ Kyriakos Papadopoulos (kyrpa853)

### A word of caution

There are currently two versions of Python in common use, Python 2 and Python 3, which are not 100% compatible. Python 2 is slowly being phased out but has a large enough install base to still be relevant. This course uses the more modern Python 3, but when searching for help online it is not uncommon to find help for Python 2. Older posts in particular, on sources such as Stack Exchange, might refer to Python 2 as simply "Python". This should not cause any serious problems, but keep it in mind whenever googling. With regard to this lab, the largest differences are how `print` works and the best practice recommendations for string formatting.
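
As a rough illustration of the two differences mentioned above, the sketch below shows `print` used as a function and three common ways of formatting the same string. The variable names are made up for this example, and the f-string line assumes Python 3.6 or later.

```python
# In Python 3, print is an ordinary function that takes arguments.
language = "Python"
version = 3

# Three ways to build the same string.
with_percent = "%s %d" % (language, version)     # old style, common in Python 2 code
with_format = "{} {}".format(language, version)  # str.format
with_fstring = f"{language} {version}"           # f-string, Python 3.6 or later

print(with_percent, with_format, with_fstring, sep=" | ")
```

Code found online that writes `print "text"` without parentheses is Python 2 and will not run in this notebook.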

### References to R

Most students taking this course who are not already familiar with Python will probably have some experience of the R programming language. For this reason, there will be intermittent references to R throughout this lab. For those of you with a background in R (or MATLAB/Octave, or Julia) the most important thing to remember is that indexing starts at 0, not at 1.
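
A small sketch of zero-based indexing, using a made-up list called `letters`:

```python
letters = ["a", "b", "c", "d"]

first = letters[0]     # the first element has index 0, not 1
last = letters[-1]     # negative indices count from the end
middle = letters[1:3]  # slices include the start index but exclude the stop index

print(first, last, middle)  # a d ['b', 'c']
```

Note that `letters[1:3]` contains two elements, whereas the R expression `letters[1:3]` would select three.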

### Recommended Reading

This course is not built on any specific source and no specific literature is required. However, for those who prefer to have a printed reference book, we recommend the books by Mark Lutz:

* Learning Python by Mark Lutz, 5th edition, O'Reilly. Recommended for those who have no experience of Python. This book is called LP in the text below.

* Programming Python by Mark Lutz, 4th edition, O'Reilly. Recommended for those who have some experience with Python, it generally covers more advanced topics than what is included in this course but gives you a chance to dig a bit deeper if you're already comfortable with the basics. This book is called PP in the text.

For the student interested in Python as a language, it is worth mentioning:

* Fluent Python by Luciano Ramalho (also O'Reilly). Note that it is - at the time of writing - still in its first edition, from 2015. Thus newer features will be missing.

### A note about notebooks

When using this notebook, you can enter Python code in the empty cells, then press Ctrl+Enter. The code in the cell is executed and any output is displayed below the cell. Code executed in this manner uses the same environment regardless of where in the notebook document it is placed. This means that variables and functions defined in one cell will thereafter be accessible from all other cells in your notebook session.

Note that the programming environments described in section 1 of LP are not applicable when you run Python in this notebook.

### A note about the structure of this lab

This lab will contain tasks of varying difficulty. There might be cases when the solution seems too simple to be true (in retrospect), and cases where you have seen similar material elsewhere in the course. Don't be fooled by this. In many cases, the task might just serve to remind us of things that are worthwhile to check out, or to find out how to use a specific method.

We will be returning to, and using, several of the concepts in this lab.

### 1. Strings and string handling

The primary datatype for storing raw text in Python is the string. Note that there is no character datatype, only strings of length 1. This can be compared to how there are no atomic numbers in R, only vectors of length 1. A reference to the string datatype can be found __[here](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str)__.
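
To make the point about the missing character type concrete, here is a minimal sketch. The string `word` is a made-up example.

```python
word = "parrot"            # hypothetical example string
first = word[0]            # indexing a string yields another string

print(type(first))         # <class 'str'>
print(len(first))          # 1
print(first[0] == first)   # True: indexing a one-character string gives it back
```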

[Literature: LP: Part II, especially Chapter 4, 7.]

a) Define the variable `parrot` as the string containing the sentence _It is dead, that is what is wrong with it. This is an ex-"Parrot"!_. 

[Note: If you have been programming in a language such as C or Java, you might be a bit confused about the term "define". Different languages use different terms when creating variables, such as "define", "declare", "initialize", etc. with slightly different meanings. In statically typed languages such as C or Java, declaring a variable creates a name connected to a container which can contain data of a specific type, but does not put a value in that container. Initialization is then the act of putting an initial value in such a container. Defining a variable is often used as a synonym for declaring a variable in statically typed languages, but as Python is dynamically typed, i.e. variables can contain values of any type, there is no need to declare variables before initializing them. Thus, defining a variable in Python entails simply assigning a value to a new name, at which point the variable is both declared and initialized. This works exactly as in R.]
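
The note above can be sketched as follows. The names `answer` and `undefined_name` are made up for this illustration.

```python
answer = 42                # the name is created and bound in a single step
print(type(answer))        # <class 'int'>

answer = "forty-two"       # the same name may later refer to a value of another type
print(type(answer))        # <class 'str'>

try:
    print(undefined_name)  # hypothetical name that was never assigned
except NameError as error:
    message = str(error)
    print("NameError:", message)
```

Using a name before it has been assigned is an error at run time, since there is no separate declaration step.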

In [2]:
parrot = 'It is dead, that is what is wrong with it. This is an ex-"Parrot"!'


b) What methods does the string now called `parrot` (or indeed any string) seem to support? Write Python commands below to find out.

<b> Lecturer's comment: </b> `help` does not work on a string.

In [7]:
parrot.capitalize()
parrot.casefold()
parrot.find("a")
parrot.islower()

help(type(parrot))
dir(type(parrot))

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(self, format_spec, /)
 |      Return a formatted version of the string as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',


c) Count the number of characters (letters, blank space, commas, periods
etc) in the sentence.

In [4]:
print(len(parrot))

66


d) If we type `parrot + parrot`, should it change the string itself, or merely produce a new string? How would you test your intuition? Write expressions below.

In [5]:
parrot + parrot
print(parrot)

It is dead, that is what is wrong with it. This is an ex-"Parrot"!


e) Separate the sentence into a list of words (possibly including separators) using a built-in method. Call the list `parrot_words`.

In [6]:
parrot_words = parrot.split()
print(parrot_words)

['It', 'is', 'dead,', 'that', 'is', 'what', 'is', 'wrong', 'with', 'it.', 'This', 'is', 'an', 'ex-"Parrot"!']


f) Merge (concatenate) `parrot_words` into a string again.

In [7]:
print(' '.join(parrot_words))

It is dead, that is what is wrong with it. This is an ex-"Parrot"!


g) Create a string `parrot_info` which consists of "The length of parrot_info is 66." (the length of the string should be calculated automatically, and you may not write any numbers in the string). Use f-string syntax!

In [8]:
parrot_info = f"The length of parrot_info is {len(parrot)}."
print(parrot_info)

The length of parrot_info is 66.


### 2. Iteration, sequences and string formatting

Loops are not as painfully slow in Python as they are in R and thus, not as critical to avoid. However, for many use cases, _comprehensions_, like _list comprehensions_ or _dict comprehensions_ are faster. In this assignment we will see both traditional loop constructs and comprehensions. For an introduction to comprehensions, __[this](https://python-3-patterns-idioms-test.readthedocs.io/en/latest/Comprehensions.html)__ might be a good place to start.
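
As a minimal sketch of how a loop and the corresponding comprehensions relate, consider the made-up temperature data below. It makes no claim about speed, only about form.

```python
celsius = [0, 21.5, 37, 100]   # hypothetical example data

# Traditional loop: create an empty list and append to it.
fahrenheit_loop = []
for c in celsius:
    fahrenheit_loop.append(c * 9 / 5 + 32)

# List comprehension: the same result as a single expression.
fahrenheit_comp = [c * 9 / 5 + 32 for c in celsius]

# Dict comprehension: map each input to its converted value.
conversion = {c: c * 9 / 5 + 32 for c in celsius}

print(fahrenheit_loop == fahrenheit_comp)
print(conversion[100])
```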

It should also be noted that what Python calls lists are unnamed sequences. As in R, a Python list can contain elements of many types. However, these can only be accessed by position (indexing or slicing), not by name as in R.
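
A short sketch of positional access, using a made-up list `mixed`:

```python
mixed = [1, "two", 3.0, [4, 5]]   # hypothetical list with elements of several types

print(mixed[1])      # access by position
print(mixed[-1][0])  # nested lists are indexed step by step
print(mixed[:2])     # a slice returns a new list

try:
    mixed["two"]     # lists cannot be indexed by name
except TypeError as error:
    print("TypeError:", error)
```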

a) Write a `for`-loop that produces the following output on the screen:<br>
> `The next number in the loop is 5`<br>
> `The next number in the loop is 6`<br>
> ...<br>
> `The next number in the loop is 10`<br>

[Hint: the `range` function has more than one argument.]<br>
[Literature: For the range construct see LP part II chapter 4 (p.112).]

In [12]:
for x in range(5, 11):
    print(f"The next number in the loop is {x}")

The next number in the loop is 5
The next number in the loop is 6
The next number in the loop is 7
The next number in the loop is 8
The next number in the loop is 9
The next number in the loop is 10


b) Write a `for`-loop that for a given `n` sets `first_n_squared` to the sum of squares of the first `n` numbers (0..n-1). Call the iteration variable `i`.

In [13]:
n = 100  # If we change this and run the code, the value of first_n_squared should change afterwards!

first_n_squared = 0  # should end up as 0^2 + 1^2 + ... + 99^2 = 328350 if n = 100

for i in range(n):
    first_n_squared += i ** 2

print(first_n_squared)

328350


Hint (not mandatory): iteration is often about a gradual procedure of updating or computing. Write out, on paper, how you would compute $0^2$, $0^2 + 1^2$, $0^2 + 1^2 + 2^2$, and consider what kinds of gradual updates you might want to perform.

c) It is often worth considering what a piece of code actually contributes. Think about a single loop iteration (when we go through the body of the loop). What should the variable `first_n_squared` contain _before_ a loop iteration? What should the loop iteration contribute? What does it contain _after_ ? A sentence or two for each is enough. Write this as a code comment in the box below:

<b> Lecturer's comment: </b> You are confusing a loop and loop iteration. The variable should be 0 before the <b> first </b> loop iteration. But not before the subsequent loop iteration. Please reexplain.

In [None]:
"""
Before a loop iteration:
    first_n_squared holds the sum of the squares of all numbers handled so far,
    i.e. 0^2 + ... + (i-1)^2. Before the first iteration this is 0.
Contributed by the iteration:
    The square of the current number i is added to first_n_squared.
After the iteration:
    first_n_squared holds 0^2 + ... + i^2. Once the last iteration (i = n-1)
    is done, it therefore holds the sum of all squares from 0 to n-1.
"""

after 1 iterations first_n_squared is 0
after 2 iterations first_n_squared is 1
after 3 iterations first_n_squared is 5
after 4 iterations first_n_squared is 14
after 5 iterations first_n_squared is 30
after 6 iterations first_n_squared is 55
after 7 iterations first_n_squared is 91
after 8 iterations first_n_squared is 140
after 9 iterations first_n_squared is 204
after 10 iterations first_n_squared is 285
after 11 iterations first_n_squared is 385
after 12 iterations first_n_squared is 506
after 13 iterations first_n_squared is 650
after 14 iterations first_n_squared is 819
after 15 iterations first_n_squared is 1015
after 16 iterations first_n_squared is 1240
after 17 iterations first_n_squared is 1496
after 18 iterations first_n_squared is 1785
after 19 iterations first_n_squared is 2109
after 20 iterations first_n_squared is 2470
after 21 iterations first_n_squared is 2870
after 22 iterations first_n_squared is 3311
after 23 iterations first_n_squared is 3795
after 24 iterations f

Hint: 
* Your answer might involve the iteration variable `i` (informally: the current number we're looking at in the loop).
* After all the loop iterations are done (and your iteration variable has reached _n - 1_ ), it should contain the sum $0^2 + 1^2 + ... + (n-1)^2$. Does your explanation suggest that this should be the case?

[Tangent: this form of reasoning can form the basis of a mathematical correctness proof for algorithms, which enables us to formally check that code does what it should. This is quite beyond the scope of this course, but the (CS-)interested reader might want to consider reading up on e.g. [loop invariants](https://en.wikipedia.org/wiki/Loop_invariant). We only go into it at the level of detail that actually forces us to think about what our (simple) code does.]
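
One informal way to check such reasoning is to state the before and after conditions as `assert` statements inside the loop. The sketch below does this for the sum of squares. The small value of `n` and the name `total` are choices made for this illustration, and the assertions are a sanity check rather than a proof.

```python
n = 10                      # hypothetical small value, to keep the check cheap
total = 0
for i in range(n):
    # Invariant: before this iteration, total == 0^2 + ... + (i-1)^2.
    assert total == sum(k ** 2 for k in range(i))
    total += i ** 2         # the contribution of this iteration
    # After this iteration, total == 0^2 + ... + i^2.
    assert total == sum(k ** 2 for k in range(i + 1))

print(total)                # 285
```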

d) Write a code snippet that counts the number of __letters__ (alphabetic characters) in `parrot` (as defined above). Use a `for` loop.

In [14]:
count = 0
for ch in parrot:
    count += 1 if ch.isalpha() else 0
print(count)

47


e) Explain your letter-counting code in the same terms as above (before, after, contributed).

<b> Lecturer's comment: </b> Same remark: you are confusing a loop and a loop iteration. Please reexplain.

In [9]:
"""
Before a loop iteration:
    count is the number of letters among the characters before the current one.

Contributed:
    The iteration adds 1 to count if the current character is a letter, otherwise 0.

After:
    count is the number of letters up to and including the current character.
    After the last iteration it is the total number of letters in parrot.
"""


f) Write a for-loop that iterates over the list `names` below and presents them on the screen in the following fashion:

> `The name Tesco is nice`<br>
> ...<br>
> `The name Zeno is nice`<br>

Use Python's string formatting capabilities (the `format` function in the string class) to solve the problem.

[Warning: The best practices for string formatting differ between Python 2 and 3; make sure you use the Python 3 approach.]<br>
[Literature: String formatting is covered in LP part II chapter 7.]

In [15]:
names = ['Tesco', 'Forex', 'Alonzo', 'Zeno']

for name in names:
    print("The name {} is nice".format(name))

The name Tesco is nice
The name Forex is nice
The name Alonzo is nice
The name Zeno is nice


g) Write a for-loop that iterates over the list `names` and produces the list `n_letters` (`[5,5,6,4]`) with the length of each name.

In [16]:
n_letters = []
for name in names:
    n_letters.append(len(name))

print(n_letters)

[5, 5, 6, 4]


h) How would you - in a Python interpreter/REPL or in this Notebook - retrieve the help for the built-in function `max`?

In [16]:
help(max)

Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.



i) Show an example of how `max` can be used with an iterable of your choice.

In [17]:
numbers = [1,2,3,4,5]
max(numbers)

5

j) Use a comprehension (or generator) to calculate the sum 0^2 + ... + (n-1)^2 as above.

In [18]:
n = 100
first_n_squared = sum(x**2 for x in range(n))  # generator expression over 0..n-1
first_n_squared # Should return the same result as your for-loop.

328350

l) Use a list comprehension to produce a list `short_long` that indicates if the name (in the list `names`) has more than four letters. The answer should be `['long', 'long', 'long', 'short']`.

In [19]:
short_long = ["long" if len(name) > 4 else "short" for name in names]
short_long

['long', 'long', 'long', 'short']

m) Use a comprehension to count the number of letters in `parrot`. You may not use a `for`-loop. (The comprehension will contain the word `for`, but it isn't a `for ... in ...:`-statement.)

In [20]:
sum([1 if character.isalpha() else 0 for character in parrot])

47

[Note: this is fairly similar to the long/short task, but note how we access member functions of the values.]

n) Below we have the string `datadump`. Retrieve the substring starting at character 27 (that is "o") and ending at character 34 ("l") by means of slicing.

In [21]:
datadump = "The name of the game is <b>old html</b>. That is <b>so cool</b>."
datadump[27:35]

'old html'

o) Write a loop that uses indices to __simultaneously__ loop over the lists `names` and `short_long` to write the following to the screen:

> `The name Tesco is a long name`<br>
> ...<br>
> `The name Zeno is a short name`<br>

In [22]:
for i in range(len(names)):
    print(f"The name {names[i]} is a {short_long[i]} name")

The name Tesco is a long name
The name Forex is a long name
The name Alonzo is a long name
The name Zeno is a short name


Note: this is a common programming pattern, though not particularly Pythonic in this use case. We do however need to know how to use indices in lists to work properly with Python.
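
When both the position and the element are needed, the built-in `enumerate` is a common alternative to `range(len(...))`. A minimal sketch with made-up data:

```python
colours = ["red", "green", "blue"]   # hypothetical example data

# Index-based loop: works, but the index is only used to look elements up.
for i in range(len(colours)):
    print(i, colours[i])

# enumerate yields (index, element) pairs directly.
numbered = [f"{i}: {colour}" for i, colour in enumerate(colours)]
print(numbered)
```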

p) Do the task above once more, but this time without the use of indices.

<b> Lecturer's comment: </b> The answer provided is correct. However, note that it is clearer to use a `for` loop instead of list comprehension + f-string.

In [24]:
#print("".join([f"The name {name} is a {lngth} name\n" for name, lngth in zip(names, short_long)]))
for name, length in zip(names, short_long):
    print(f"The name {name} is a {length} name")

The name Tesco is a long name
The name Forex is a long name
The name Alonzo is a long name
The name Zeno is a short name


[Hint: Use the `zip` function.]<br>
[Literature: zip usage with dictionary is found in LP part II chapter 8 and dictionary comprehensions in the same place.]
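
A short sketch of what `zip` does, using made-up lists `keys` and `values`:

```python
keys = ["a", "b", "c"]            # hypothetical example data
values = [1, 2, 3, 4]             # one element longer than keys

pairs = list(zip(keys, values))   # zip stops at the shortest input
print(pairs)                      # [('a', 1), ('b', 2), ('c', 3)]

lookup = dict(zip(keys, values))  # pairs can be turned into a dictionary
print(lookup["b"])                # 2
```

Note that the extra element `4` is silently dropped, which is worth keeping in mind when the inputs may differ in length.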

q) Among the built-in datatypes, it is also worth mentioning the tuple. Construct two tuples, `one` containing the number one and `two` containing the number 1 and the number 2. What happens if you add them? Name some method that a list with similar content (such as `two_list` below) would support, that `two` doesn't and explain why this makes sense.

In [25]:
one = (1,)    # Change this.
two = (1, 2)    # Change this
two_list = [1, 2]
print(type(one + two))
print(one + two)
"""
Adding two tuples concatenates them into a new tuple; neither operand is changed.
Tuples are immutable, so they lack the mutating methods that lists have, such as
append, extend or sort. two_list.append(3) works, but two.append(3) raises AttributeError.
"""

<class 'tuple'>
(1, 1, 2)


'\nAdding two tuples concatenates them into a new tuple; neither operand is changed.\nTuples are immutable, so they lack the mutating methods that lists have, such as\nappend, extend or sort. two_list.append(3) works, but two.append(3) raises AttributeError.\n'

### 3. Conditionals, logic and while loops

a) Below we have an integer called `n`. Write code that prints "It's even!" if it is even, and "It's odd!" if it's not.

In [26]:
n = 5 # Change this to other values and run your code to test.
# Your code here.
print("It's even!" if n % 2 == 0 else "It's odd!")

It's odd!


b) Below we have the list `options`. Write code (including an `if` statement) that ensures that the boolean variable `OPTIMIZE` is True _if and only if_ the list contains the string `--optimize` (exactly like that).

In [27]:
OPTIMIZE =  None       # Or some value which we are unsure of.
options = ['--print-results', '--optimize', '-x']  # This might have been generated by a GUI or command line option

# Your code goes here.
OPTIMIZE = False
if "--optimize" in options:
    OPTIMIZE = True

# Here OPTIMIZE should be True if and only if we found '--optimize' in the list.
OPTIMIZE

True

Note: It might be tempting to use a `for` loop. In this case, we will not be needing this, and you may _not_ use it. Python has some useful built-ins to test for membership.

You may use an `else`-free `if` statement if you like.

c) Redo the task above, but now consider the case where the boolean `OPTIMIZE` is True _if and only if_ the `options` list contains either `--optimize` or `-o` (or both). **You may only use one if-statement**.

In [28]:
OPTIMIZE = None       # Or some value which we are unsure of.
options = ['--print-results', '-o', '-x']  # This might have been generated by a GUI or command line option

# Your code goes here.
OPTIMIZE = False
if "--optimize" in options or "-o" in options:
    OPTIMIZE = True

# Here OPTIMIZE should be True if and only if we found '--optimize' or '-o' in the list.
OPTIMIZE

True

[Hint: Don't forget to test your code with different versions of the options list! 

If you find something that seems strange, you might want to check what the value of the _condition itself_ is.]

[Note: This extension of the task is included as it includes a common source of hard-to-spot bugs.]
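
One likely candidate for the kind of bug hinted at above is writing the condition as `"--optimize" or "-o" in options`. The sketch below, with a made-up `options` list containing neither flag, shows why that condition is always truthy.

```python
options = ["--print-results", "-x"]   # neither optimisation flag is present

# Pitfall: `or` binds looser than `in`, so this is parsed as
# "--optimize" or ("-o" in options). A non-empty string is truthy,
# so the whole condition evaluates to the string "--optimize".
wrong = "--optimize" or "-o" in options

# Each membership test has to be written out in full.
right = "--optimize" in options or "-o" in options

print(repr(wrong), bool(wrong))
print(right)
```

Printing the value of the condition itself, as the hint suggests, makes the problem visible immediately.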

d) Sometimes we can avoid using an `if` statement altogether. The task above is a prime example of this (and was introduced to get some practice with the `if` statement). Solve the task above in a one-liner without resorting to an `if` statement. (You may use an `if` expression, but you don't have to.)

In [29]:
options = ['--print-results', '-o', '-x']  # This might have been generated by a GUI or command line option

OPTIMIZE = "--optimize" in options or "-o" in options

# Here OPTIMIZE should be True if and only if we found '--optimize' or '-o' in the list.
OPTIMIZE


True

[Hint: What should the value of the condition be when you enter the then-branch of the `if`? When you enter the else-branch?]

e) Write a `while`-loop that repeatedly generates a random number from a uniform distribution over the interval [0,1], and prints the sentence 'The random number is smaller than 0.9' on the screen until the generated random number is greater than 0.9.

[Hint: Python has a `random` module with basic random number generators.]<br/>

[Literature: Introduction to the Random module can be found in LP part III chapter 5 (Numeric Types). Importing modules is introduced in part I chapter 3  and covered in depth in part IV.]

In [31]:
import random

x = random.uniform(0, 1)
while x < 0.9:
    print(f"The random number is smaller than 0.9. The value is {x}")
    x = random.uniform(0, 1)

print(f"The random number is larger, it's {x}")

The random number is smaller than 0.9. The value is 0.502668954005783
The random number is smaller than 0.9. The value is 0.49319964785525616
The random number is smaller than 0.9. The value is 0.031759863363965746
The random number is smaller than 0.9. The value is 0.38880158482047
The random number is smaller than 0.9. The value is 0.5198441264676404
The random number is smaller than 0.9. The value is 0.36780958365888083
The random number is smaller than 0.9. The value is 0.34062328455738466
The random number is smaller than 0.9. The value is 0.4212691868478414
The random number is smaller than 0.9. The value is 0.05923798381526968
The random number is larger, it's 0.901103986029735


### 4. Dictionaries

Dictionaries are association tables, or maps, connecting a key to a value. For instance a name represented by a string as key with a number representing some attribute as a value. Dictionaries can themselves be values in other dictionaries, creating nested or hierarchical data structures. This is similar to named lists in R but keys in Python dictionaries can be more complex than just strings.

[Literature: Dictionaries are found in LP section II chapter 4.]
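
A minimal sketch of nested dictionaries with non-string keys. The data (grid coordinates as tuple keys) is made up for this illustration.

```python
# Hypothetical data: grid coordinates as tuple keys, nested dictionaries as values.
cells = {
    (0, 0): {"label": "origin", "visits": 3},
    (1, 2): {"label": "corner", "visits": 1},
}

print(cells[(1, 2)]["label"])          # chained lookup into the nested dictionary
cells[(0, 0)]["visits"] += 1           # nested values can be updated in place
print(cells.get((5, 5), "no entry"))   # get returns a default instead of raising KeyError
print((0, 0) in cells)                 # membership tests look at the keys
```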

a) Make a dictionary named `amadeus` containing the information that the student Amadeus is a male, scored 8 on the Algebra exam and 13 on the History exam. The dictionary should NOT include a name entry.

In [25]:
amadeus = {
    "Gender" : "Male",
    "Algebra" : 8,
    "History" : 13,
}

b) Make three more dictionaries, one for each of the students: Rosa, Mona and Ludwig, from the information in the following table:

| Name          | Gender        | Algebra       | History | 
| :-----------: | :-----------: |:-------------:| :------:|
| Rosa          | Female        | 19            | 22      |
| Mona          | Female        | 6             | 27      |
| Ludwig        | Other         | 12            | 18      |

In [26]:
rosa = {
    "Gender" : "Female",
    "Algebra" : 19,
    "History" : 22,
}

mona = {
    "Gender" : "Female",
    "Algebra" : 6,
    "History" : 27,
}

ludwig = {
    "Gender" : "Other",
    "Algebra" : 12,
    "History" : 18,
}

c) Combine the four students in a dictionary named `students` such that a user of your dictionary can type `students['Amadeus']['History']` to retrieve Amadeus's score on the History test.

[HINT: The values in a dictionary can be dictionaries.]

In [27]:
students = {
    "Amadeus" : amadeus,
    "Rosa" : rosa,
    "Mona" : mona,
    "Ludwig" : ludwig}
print(students['Amadeus']['History'])  

13


d) Add the new male student Karl to the dictionary `students`. Karl scored 14 on the Algebra exam and 10 on the History exam.

In [28]:
karl = {
    "Gender" : "Male",
    "Algebra" : 14,
    "History" : 10,
}

students["Karl"] = karl

e) Use a `for`-loop to print out the names and scores of all students on the screen. The output should look something like this (the order of the students doesn't matter):

> `Student Amadeus scored 8 on the Algebra exam and 13 on the History exam`<br>
> `Student Rosa scored 19 on the Algebra exam and 22 on the History exam`<br>
> ...

[Hint: Dictionaries are iterables, also, check out the `items` function for dictionaries.]

<b> Lecturer's comment: </b> The variable names are self-explanatory, which makes your code very clear. Well done! A side note: the use of `get` is a bit heavy and you could access the value more directly: `subject_grade["Algebra"]`. But that is a detail.

In [29]:
for name, subject_grade in students.items():
    print(f"Student {name} scored {subject_grade['Algebra']} on the Algebra exam and {subject_grade['History']} on the History exam")

Student Amadeus scored 8 on the Algebra exam and 13 on the History exam
Student Rosa scored 19 on the Algebra exam and 22 on the History exam
Student Mona scored 6 on the Algebra exam and 27 on the History exam
Student Ludwig scored 12 on the Algebra exam and 18 on the History exam
Student Karl scored 14 on the Algebra exam and 10 on the History exam


f) Use a dict comprehension and the lists `names` and `short_long` from assignment 2 to create a dictionary of names and whether they are short or long. The result should be a dictionary equivalent to {'Forex':'long', 'Tesco':'long', ...}.

In [37]:
{name: length for name, length in zip(names, short_long)}

{'Tesco': 'long', 'Forex': 'long', 'Alonzo': 'long', 'Zeno': 'short'}

### 5. Introductory file I/O

File I/O in Python is a bit more general than what most R programmers are used to. In R, reading and writing files are usually performed using file type specific functions such as `read.csv` while in Python we usually start with reading standard text files. However, there are lots of specialized functions for different file types in Python as well, especially when using the __[pandas](http://pandas.pydata.org/)__ library which is built around a datatype similar to R DataFrames. Pandas will not be covered in this course though.

[Literature: Files are introduced in LP part II chapter 4 and chapter 9.]
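
As a rough sketch of plain text file handling, the example below writes a small tab-separated file and reads it back line by line. The file name `example.tsv` and its contents are made up, and the file lives in a temporary directory so that nothing else on disk is touched.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.tsv")

    with open(path, "w", encoding="utf-8") as out_file:
        out_file.write("x\t1\ny\t2\n")

    with open(path, "r", encoding="utf-8") as in_file:
        # Each line ends with a newline; strip it before splitting on tabs.
        rows = [line.rstrip("\n").split("\t") for line in in_file]

print(rows)   # [['x', '1'], ['y', '2']]
```

Everything read from a text file is a string, so numeric fields such as `'1'` need an explicit `int(...)` or `float(...)` conversion.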

The file `students.tsv` contains tab separated values corresponding to the students in previous assignments.

a) Iterate over the file, line by line, and print each line. Do NOT use a CSV reader.

The result should be something like this:

> `Amadeus	Male	8	13`<br>
> `Rosa	Female	19	22`<br>
> ...

The file should be closed when reading is complete.

[Hint: Files are iterable in Python.]

In [39]:
student_file = open("students.tsv", "r")
for line in student_file:
    print(line, end="")  # each line already ends with a newline
student_file.close()

Amadeus	Male	8	13
Rosa	Female	19	22
Mona	Female	6	27
Ludwig	Other	12	18
Karl	Male	14	10


b) Working with many files can be problematic, especially when you forget to close files or errors interrupt programs before files are closed. Python thus has a special `with` statement which automatically closes files for you, even if an error occurs. Redo the assignment above using the `with` statement.

[Literature: With is introduced in LP part II chapter 9 page 294.]
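
The claim that `with` closes the file even when an error occurs can be checked with a small experiment. The sketch below uses a made-up scratch file and a deliberately raised `ValueError`.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
handle, path = tempfile.mkstemp(suffix=".txt")
os.close(handle)

try:
    with open(path, "w") as scratch:
        scratch.write("first line\n")
        raise ValueError("simulated failure")   # deliberately interrupt the block
except ValueError as error:
    print("caught:", error)

print(scratch.closed)   # True: the with statement closed the file despite the error
os.remove(path)
```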

<b> Lecturer's comment: </b> The iteration variable could be named better. Something like `line` would make your code more readable.

In [30]:
with open("students.tsv", "r") as student_file:
    print("\n".join([" ".join(line.split()) for line in student_file.read().splitlines()]))

Amadeus Male 8 13
Rosa Female 19 22
Mona Female 6 27
Ludwig Other 12 18
Karl Male 14 10


c) If you are going to open text files that might have different character encodings, a useful habit might be to use the [`codecs`](https://docs.python.org/3/library/codecs.html) module. Redo the task above, but using codecs.open. You might want to find out the character encoding of the file (for instance in an edit

<b> Lecturer's comment: </b> Same comment as above.

In [31]:
import codecs
with codecs.open("students.tsv", "r") as student_file:
    print("\n".join([" ".join(line.split()) for line in student_file.read().splitlines()]))

Amadeus Male 8 13
Rosa Female 19 22
Mona Female 6 27
Ludwig Other 12 18
Karl Male 14 10


d) Recreate the dictionary from the previous assignment by reading the data from the file. Using a dedicated CSV reader is not permitted.

<b> Lecturer's comment: </b> Same remark as above. Please redo the task.

In [40]:
import codecs
with codecs.open("students.tsv", "r") as student_file:
    students = {
        words[0]: {
            "Gender": words[1],
            "Algebra": int(words[2]),  # convert scores to integers, as in assignment 4
            "History": int(words[3]),
        }
        for words in [line.split() for line in student_file.read().splitlines()]
    }

e) Using the dictionary above, write sentences from task 4e above to a new file, called `students.txt`.

<b> Lecturer's comment: </b> Same remark as above. `k` could be called `name`, for example. Please redo the task.

In [41]:
with codecs.open("students.txt", "w") as new_file:
    new_file.write(
        "".join(
            f"Student {name} scored {student['Algebra']} on the Algebra exam and {student['History']} on the History exam\n"
            for name, student in students.items()
        )
    )