In [1]:
%load_ext tutorial.tests.testsuite

# Input and output in Python

## Input and output

A program is useless without *interaction* with the outside world. This includes:

- Showing text to the user
- Asking for user input
- Accessing the file system
- Reading files
- Writing files
- Connecting to networks

and much more.

## User output

To output text to the python console, we can use the `print` function.


In [None]:
print("a")

If we want to insert a variable in a string to print, we can use *f-strings*:

- We prepend `f` to the string
- We reference the variable in the string using `{name_of_variable}`

In [None]:
x = 4
print(f"x is {x}")

## Asking for user input 
If we want to get text input from the console, we use the `input` function.

⚠️ `input` is a *blocking* function. It stops the execution of the program until the input is available. **Do not** use it in the middle of a computation unless you want a bad surprise.


In [14]:
print("Enter your name")
name = input()
print(f"Hello, {name}")

Enter your name
Simone
Hello, Simone


## Accessing the filesystem

To find and access files in the filesystem, we use the `pathlib` module of the standard library. 

This allows us to represent and manipulate paths to files in a portable and intuitive manner.

In [4]:
import pathlib as pl
# Here we define a path to the current directory
my_path = pl.Path("./")


In [5]:
# We can find all files in `my_path` using `glob`. The "*" is a pattern which means find all files. 
all_my_files = my_path.glob("*")

In [7]:
print(list(all_my_files))

[PosixPath('tutorial'), PosixPath('index.ipynb'), PosixPath('.git'), PosixPath('FAQ.md'), PosixPath('.gitignore'), PosixPath('intro.ipynb'), PosixPath('README.md'), PosixPath('.vscode'), PosixPath('input_output_solutions.py'), PosixPath('functional_programming.html'), PosixPath('pytest.ini'), PosixPath('functional_programming.slides.html'), PosixPath('.ipynb_checkpoints'), PosixPath('magic_example.ipynb'), PosixPath('LICENSE'), PosixPath('.pytest_cache'), PosixPath('data'), PosixPath('.pre-commit-config.yaml'), PosixPath('tests'), PosixPath('utils'), PosixPath('input_output.ipynb'), PosixPath('environment.yml')]


In [15]:
#With `/` we can combine paths:

data_dir = (pl.Path("./tutorial/tests") / pl.Path("./data"))

#With the pattern `"*.csv"` we look for all files that end in ".csv"

print(list(data_dir.glob("*.csv")))

[PosixPath('tutorial/tests/data/example1.csv'), PosixPath('tutorial/tests/data/example.csv'), PosixPath('tutorial/tests/data/english.csv'), PosixPath('tutorial/tests/data/dict.csv')]


## Reading text from files

Now we know how to locate files and we want to learn how to *read* from files.



To do so, we use the `open` function. This *opens* a file for reading, returning a *file handle*.

In [5]:
input_file = open(pl.Path("./tutorial/tests/data/hello.txt"))


In [11]:
print(input_file)
input_file.close()

<_io.TextIOWrapper name='data/me.txt' mode='r' encoding='UTF-8'>


A file handle is like a door: we should never forget to close it with `close`.

Why: text passed to `write` may be held in a buffer and is only guaranteed to reach the file once it is closed.

If we want to get the *content* of the file, we need to use `read` or `readlines`:

In [12]:
input_file = open(pl.Path("./tutorial/tests/data/hello.txt"))
print(input_file.readlines())
input_file.close()


['Simone']


## Writing to files
Obviously we want to perform the opposite action: we want to *write* text to files.


The process is similar to reading;  we obtain a *file handle*, but we change its *mode* to writing:

In [11]:
# `w` opens a file for writing
output_file = open(pl.Path("./tutorial/tests/data/me.txt"), "w")


Then we can write to the file using `write` or `writelines`. `write` takes a `str` and writes it as it is; `writelines` takes a list or another iterable of `str` and writes each entry in turn, without adding any line separator.

Let's test `write`:

In [12]:
output_file.write("Simone")
output_file.close()

Now if we read from it, we will see the new content:

In [13]:
input_file = open(pl.Path("./tutorial/tests/data/me.txt"))
print(input_file.readlines())
input_file.close()


['Simone']


## Context managers

Opened files must be closed to avoid problems like:

- Inconsistent state and data corruption
- Running out of file handles (if we open many files)

This is a **bad** idea on many operating systems:
```python
f = []
for i in range(10000):
    f.append(open(f"{i}", "w"))
print(f)
```
You will eventually run out of file handles.

To avoid these kinds of problems, Python offers *context managers*. They are used through the `with` statement, which automatically closes the file for us as soon as we leave the block:

In [9]:
import pathlib as pl
with open(pl.Path("./tutorial/tests/data/hello.txt")) as input_file:
    # This is the scope of the context manager.
    # As long as we stay inside it, we can access `input_file`.
    text = input_file.readlines()

In [10]:
print(text)

['Hello, I am a file with some text.\n']


Indeed, if we try to access `input_file` outside of the scope of the context manager, we run into an error:

In [None]:
print(input_file.readlines())

ValueError: I/O operation on closed file.

## References
- [input and output tutorial](https://docs.python.org/3/tutorial/inputoutput.html) from the official python tutorial.
- [Is explicitly closing files important?](https://stackoverflow.com/questions/7395542/is-explicitly-closing-files-important)
- [context managers](https://docs.python.org/3/library/contextlib.html) from the python standard library

## Introduction
An important aspect to learn in a new programming language is input/output (or I/O). For our program to be useful, we need to be able to interact with the outside world. This means performing operations like:
- reading and writing data to a file
- printing text output
- asking for user input
- connecting to databases or other network services


The majority of these operations are covered by the python standard library. We are going to see how to use them in this chapter.

<div class="alert alert-block alert-info">
⚠️ In reference to the chapter on functional programming, it is interesting to note that these functions perform *side-effects*. Therefore any code containing these operations is no longer *pure* and is not referentially transparent. The same function can return different values for the same argument if called multiple times, and the function can have *long-distance* effects: it can modify the state of the program somewhere else, leading to unexpected results.
    
Therefore, I always suggest trying to separate input and output from the other computations in your program. For example, if you have a complex calculation requiring several user inputs at several stages of the process, consider writing a function that only performs the calculation given all inputs, and collect all inputs separately, for example through a single file. This makes your code easier to debug, test and understand.
</div>
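As a minimal sketch of this separation (the function names `parse_numbers`, `mean` and `report` are made up for this illustration), the calculation below is kept free of any I/O, and the only side effect, the `print` call, sits at the very edge of the program:

```python
def parse_numbers(text):
    """Pure step: convert a whitespace-separated string into a list of floats."""
    return [float(token) for token in text.split()]


def mean(values):
    """Pure step: the same input always gives the same output."""
    return sum(values) / len(values)


def report(text):
    """Combine the pure steps; still no printing, reading or prompting."""
    return f"The mean is {mean(parse_numbers(text))}"


# I/O happens only here. In a real program `raw_text` could come from a file.
raw_text = "1 2 3 4"
print(report(raw_text))
```

Because `report` has no side effects, it can be tested by simply comparing its return value with the expected string.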




## String input and output 
The most basic I/O operation in python is displaying text on the python console. This is done using the `print` function:

In [1]:
print("I am some text")

I am some text


The complementary operation, prompting for *user input*, is done through `input`.
The function takes one string as argument, which is the prompt displayed on the console. The return value of the function is the text the user types:

In [3]:
user_input = input("Write some text")
print(f"The user wrote {user_input}")

The user wrote fasdf



<div class="alert alert-block alert-info">
⚠️ Note that `input` is a **blocking operation**. The program evaluation stops and waits until the user provides an input. Therefore, use `input` **very carefully** and only when truly necessary. A common issue is that someone first writes an interactive program expecting user input through `input` and later integrates it into a larger application that is supposed to run automatically without any user interaction. Suddenly, the application stops somewhere and does not run further because of a well-hidden `input` call...
</div>

If you really have to use it, do not do so in the middle of a long-running computation or your model code, but ask for all user inputs upfront. If you need to write an interactive application, however, there are other solutions that are more robust.
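One possible way to ask for all inputs upfront is sketched below. The names `collect_inputs` and `build_greetings` are hypothetical, and passing the prompting function as the argument `ask` is just one design option: it defaults to `input`, but can be replaced by scripted answers so that the program also runs without a user.

```python
def collect_inputs(ask=input):
    """Ask for every value the computation needs, before it starts."""
    name = ask("Enter your name: ")
    repetitions = int(ask("How many greetings? "))
    return name, repetitions


def build_greetings(name, repetitions):
    """The computation itself never calls `input`, so it cannot block."""
    return [f"Hello, {name}" for _ in range(repetitions)]


# Non-interactive run: the answers come from a list instead of the keyboard.
scripted_answers = iter(["Simone", "2"])
name, repetitions = collect_inputs(ask=lambda prompt: next(scripted_answers))
greetings = build_greetings(name, repetitions)
print(greetings)
```

Calling `collect_inputs()` without arguments falls back to the interactive `input` prompt.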

## File I/O
A second very common I/O operation in most programming languages is reading from and writing to files. This is more complex than writing to the console and consists of several steps:

1. You need to find *where* in the operating system the file is located. This gives you the so-called *path* to the file
2. You need to *open* the file for reading or writing. In some operating systems, this *locks* the file, so that other users or processes cannot write simultaneously with you. This operation gives you a *file handle* that you can use to read or write.
3. Now, you can read (or write) the contents of the file.
4. Finally, you need to *close* the file to make it accessible to other processes again and to finalise the writing: in some implementations, `write` only writes the text to a temporary location in memory (a so-called *buffer*) and only writes the content to the file when you close it. This is done to improve performance, as writing to memory is faster than writing to a file on the disk.


As a trick to remember closing files: *a file handle is like a door handle*. 

### Paths
To handle *where* a file is located, Python offers the [`pathlib`](https://docs.python.org/3/library/pathlib.html) module, which is used to represent *paths* to files in a portable manner. This means that you can use a single Python object to represent the location of a file regardless of the operating system, and Python will convert it into the representation specific to the operating system where you are executing the program. For example, Windows uses `\` as a separator for path elements, while Unix/Linux use `/`. If you use `pathlib`, you can write code using only `/` and `pathlib` will automatically select the correct path separator when the program runs.
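To make the separator handling visible on any machine, the following sketch uses the *pure* path classes `PurePosixPath` and `PureWindowsPath` of `pathlib`, which only manipulate path strings and never touch the file system. The path components are taken from this repository's layout purely as an illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("tutorial", "tests", "data", "hello.txt")

# The same components, rendered with the conventions of each system
posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)
print(windows_path)
```

`Path`, the class used in the rest of this chapter, picks the flavour matching the system the program runs on.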

Let's see how to generate a `path` object pointing to the text file [`hello.txt`](./tutorial/tests/data/hello.txt) in the `tutorial/tests/data` directory of this repository. To do so, we use the [`Path`](https://docs.python.org/3/library/pathlib.html#basic-use) class of `pathlib`, whose constructor takes the location of the file as a string:



In [17]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/hello.txt")


The `./` at the beginning of the path indicates that this path is a *relative path*. This is just a path that points to a file *relative* to the directory we are working in. 

The `path` object has a method `absolute` that returns the *absolute* full path of the object in the OS:

In [4]:
print(path.absolute())

/home/basi/python-tutorial/data/hello.txt


If we prefer, we can also construct paths by combining their components with the `/` operator. For example, we can do:

In [10]:
import pathlib as pl

path = pl.Path("./tutorial/tests") / pl.Path("data")
print(path.absolute())


/home/basi/python-tutorial/data


We can also list all files matching a certain pattern in a given directory using the `glob` method:

In [11]:
[f for f in path.glob("*")]

[PosixPath('data/example1.csv'),
 PosixPath('data/lines.txt'),
 PosixPath('data/me.txt'),
 PosixPath('data/secret_message.dat'),
 PosixPath('data/numbers.txt'),
 PosixPath('data/example.csv'),
 PosixPath('data/english.csv'),
 PosixPath('data/hello.txt'),
 PosixPath('data/dict.csv'),
 PosixPath('data/output.txt')]

The `glob` method takes a `pattern` argument that expresses the form of the file names to look for. The `*` (star) means *match everything*. For more information on *glob patterns*, see the documentation of [fnmatch](https://docs.python.org/3/library/fnmatch.html#module-fnmatch).
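To get a feeling for such patterns without needing any files on disk, this small sketch applies `fnmatch.fnmatchcase` to a hand-written list of names (borrowed from the listing above). For simple patterns like these, it is assumed to behave like `glob` does on a single directory.

```python
from fnmatch import fnmatchcase

names = ["example.csv", "example1.csv", "hello.txt", "dict.csv"]

# `*` matches any run of characters (including none)
csv_files = [name for name in names if fnmatchcase(name, "*.csv")]
# `?` matches exactly one character
numbered = [name for name in names if fnmatchcase(name, "example?.csv")]

print(csv_files)
print(numbered)
```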



### Reading from a file

Let's see how to do this with an example: we want to open the file [hello.txt](./data/hello.txt) and read its contents.

1. The path is already identified: we know the file is in `./tutorial/tests/data/hello.txt`. We save this in a variable `path`.
2. We can now use the built-in function [`open`](https://docs.python.org/3/library/functions.html#open) to open the file. This function returns a [file object](https://docs.python.org/3/glossary.html#term-file-object) that we can use to further manipulate the file. To ensure we only open the file for reading, we pass the string `"r"` as the second argument of `open`.
3. Now we can read the contents using `read`, `readline` or `readlines`. `read(n)` reads at most `n` characters from the file (or everything that is left if `n` is omitted), `readline` reads a single line, while `readlines` reads the whole remaining content as a list of strings, one item per line in the file. Reading piece by piece is useful when we only want part of a file or when the file is too big to fit in memory.
4. Finally, we close the file using the `close` method on the file object.




In [4]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/hello.txt")
file_ob = open(path, 'r')
contents = file_ob.readlines()
file_ob.close()
print(contents)

['Hello, I am a file with some text.\n']


Notice that calling `read`, `readline` or `readlines` *consumes* the file, either fully or to the corresponding location. This means that if we call `readlines` twice, we will get an empty list the second time:

In [20]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/hello.txt")
file_ob = open(path, 'r')
contents = file_ob.readlines()
print(contents)
other_contents = file_ob.readlines()
print(other_contents)
file_ob.close()


['Hello, I am a file with some text.\n']
[]
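If the content is needed a second time, the position in the file can be moved back to the beginning with the `seek` method of the file object. The sketch below assumes a throw-away file created in a temporary directory, so that it does not depend on the files of this repository.

```python
import pathlib as pl
import tempfile

# Hypothetical temporary file, created only to keep the example self-contained
tmp_dir = tempfile.TemporaryDirectory()
path = pl.Path(tmp_dir.name) / "hello.txt"
path.write_text("first line\nsecond line\n")

file_ob = open(path, "r")
first_pass = file_ob.readlines()   # reads everything
second_pass = file_ob.readlines()  # nothing left to read
file_ob.seek(0)                    # move back to the beginning of the file
third_pass = file_ob.readlines()   # the content is available again
file_ob.close()
tmp_dir.cleanup()

print(first_pass, second_pass, third_pass)
```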


Because reading advances the position in the file, we can read a file line by line simply by iterating over it with a `for` loop or a list comprehension. The file object implements the [iterator](https://docs.python.org/3/glossary.html#term-iterator) protocol:

In [11]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/lines.txt")
file_ob = open(path, 'r')
for line in file_ob:
    print(line)
file_ob.close()

this

file

has multiple

lines

how many

lines

does this file

have?


This is the most *pythonic* way to read a file line-by-line instead of reading the full contents at once.
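The empty lines in the output above appear because every line read from the file still ends with `\n`, and `print` adds a newline of its own. A small sketch of one way to remove it follows; `io.StringIO` is used here as an in-memory stand-in for a real file, which is an assumption made only to keep the example self-contained.

```python
import io

# In-memory text that behaves like an opened text file
file_ob = io.StringIO("this\nfile\nhas multiple\nlines\n")

# Strip the trailing newline from each line while iterating
stripped_lines = [line.rstrip("\n") for line in file_ob]
file_ob.close()

for line in stripped_lines:
    print(line)
```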

### Writing to a file
The process to write data to a file is very similar. The main differences are:
- We use `w` as the second argument of `open` to specify that we want to write to the file. If the file already exists, it will be erased before we write something else to it. If you want to append to the file, you should use `a` instead.
- We use `write` to write a *string* to the file. Other types of objects should be converted to strings before being written.


Let's see this in action by writing your name in a file called `me.txt` in [data](./data/)

In [5]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/me.txt")
file_ob = open(path, "w")
file_ob.write("Simone")
file_ob.close()

Congratulations! Your name is now written in stone. 
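The difference between the `w` and `a` modes mentioned above can be checked with a short experiment. This is only a sketch: it writes to a hypothetical file in a temporary directory rather than to the `data` folder, and uses `Path.read_text` as a shortcut to look at the result.

```python
import pathlib as pl
import tempfile

tmp_dir = tempfile.TemporaryDirectory()
path = pl.Path(tmp_dir.name) / "me.txt"

file_ob = open(path, "w")
file_ob.write("first\n")
file_ob.close()

file_ob = open(path, "w")  # "w" erases the previous content
file_ob.write("second\n")
file_ob.close()
after_w = path.read_text()

file_ob = open(path, "a")  # "a" keeps the content and adds to the end
file_ob.write("third\n")
file_ob.close()
after_a = path.read_text()

tmp_dir.cleanup()
print(repr(after_w), repr(after_a))
```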

If we want to write the contents of an *iterable* to a file, we can use the `writelines` method:

In [28]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/numbers.txt")
file_ob = open(path, "w")
file_ob.writelines([str(i) + "\n" for i in range(10)])
file_ob.close()

Notice that we append the newline character `\n` to each string: `writelines` does not add line separators by itself, so without it all the numbers would end up on the same line.
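The effect of the newline character can be seen in this small sketch, where `io.StringIO` is assumed as an in-memory replacement for a file opened for writing:

```python
import io

# Without "\n", the entries are simply written one after the other
buffer = io.StringIO()
buffer.writelines(["1", "2", "3"])
without_newlines = buffer.getvalue()

# With "\n" appended, each entry ends up on its own line
buffer = io.StringIO()
buffer.writelines(f"{i}\n" for i in [1, 2, 3])
with_newlines = buffer.getvalue()

print(repr(without_newlines))
print(repr(with_newlines))
```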




### Context managers
As you can see, after *opening* and performing operations on a file, we always have to remember to *close* it. If we forget to close it, unexpected behavior can happen. If the program crashes later on, for example, the text may never be written to the file. If you open many files and you don't close them, the program can run out of file handles. On some operating systems, the file contents are only updated after closing, and so on.

This pattern is very common when dealing with many kinds of *resources*: files, connections, threads, servers, etc. You acquire access to the resource, do some work on it and finally you clean up after yourself by closing it again. Because of this, Python offers a construct called [*context manager*](https://docs.python.org/3/reference/datamodel.html#context-managers) which implements exactly this behavior:
- Get access to a resource
- Do some work
- Release this resource

In the case of files, we can replace the open-read-close or open-write-close sequence  with a context manager. Context managers are used inside  the `with` statement:

In [17]:
import pathlib as pl

hello_file = pl.Path("./tutorial/tests/data/hello.txt")
with open(hello_file, "r") as file_ob: 
    contents = file_ob.readlines()
    print(contents)

['Hello, I am a file with some text.\n']


`with open(path) as name` opens the file in `path` and assigns the resulting file object to `name`. The file is only open inside the *scope* of the context manager, that is the indented block of code that follows the `:`. Once the Python interpreter leaves this block, `file_ob.close()` is called automatically, ensuring the file is properly closed even if an exception is raised inside the block.
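For files, the `with` statement behaves roughly like a `try`/`finally` block that always calls `close`. The sketch below compares the two forms on a hypothetical temporary file and checks the `closed` attribute of the file object afterwards.

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = pl.Path(tmp_dir) / "hello.txt"
    path.write_text("Hello\n")

    # Manual version: `finally` runs even if reading raises an exception
    file_ob = open(path, "r")
    try:
        contents = file_ob.readlines()
    finally:
        file_ob.close()
    closed_after_try = file_ob.closed

    # Context manager version: closing is handled for us
    with open(path, "r") as file_ob:
        same_contents = file_ob.readlines()
    closed_after_with = file_ob.closed

print(contents, closed_after_try, closed_after_with)
```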

This pattern can be extended to any other resource that should be managed in a similar way,  for example database connections. 
Any object that implements `__enter__` and `__exit__` can be used with the context manager syntax.

If you want to learn how to implement context managers for other types of objects, please refer to the `contextlib` [documentation](https://docs.python.org/3/library/contextlib.html) in the python standard library.
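As a minimal illustration of the `__enter__`/`__exit__` protocol, here is a sketch of a hand-written context manager. The class `ManagedResource` is invented for this example and only records whether it is currently "open"; a real resource would acquire and release something in these two methods.

```python
class ManagedResource:
    """Hypothetical resource that records whether it is currently open."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __enter__(self):
        # Acquire the resource; the return value is bound by `as`.
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release the resource; returning False lets exceptions propagate.
        self.is_open = False
        return False


with ManagedResource("database") as resource:
    open_inside = resource.is_open
open_outside = resource.is_open

print(open_inside, open_outside)
```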



### Binary I/O

Another aspect of file I/O is accessing files in [*binary mode*](https://docs.python.org/3/library/io.html#binary-i-o); that means that instead of writing and reading *text*, we manipulate `bytes` in order to represent non-textual data. This is useful for interacting with measurement data and other non-textual information like images, machine learning model parameters and other complex structures, although in most cases you won't need the low-level control of binary I/O and will use libraries instead.

To look at an example, let's write a sequence of `int` to `output.dat`[^1] as a sequence of bytes. Because one byte corresponds to 8 bits, using one byte per integer means we can unambiguously store `2**8 = 256` different values.


[^1]: `.dat` is a typical "generic" extension to indicate that the file contains some sort of data. Filename extensions do not have any binding meaning by themselves; they are simply a convention for users to quickly see what contents to expect.

In [20]:
import pathlib as pl

bin_file = pl.Path("./tutorial/tests/data/output.txt")
with open(bin_file, "wb") as out_file:
    bs = b"".join([i.to_bytes(1, 'little') for i in  range(10)])
    print(bs)
    out_file.write(bs)

b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'


Note that we used the mode `wb`, which stands for *write, binary*. The `write` method expects a [`bytes`](https://docs.python.org/3/library/stdtypes.html#bytes) object. Because of the historical connection between bytes and strings, a bytes literal is written by prepending `b` to a string literal. Here, we convert each integer to a single byte using [`to_bytes`](https://docs.python.org/3/library/stdtypes.html#int.to_bytes) and combine the results with the [`join`](https://docs.python.org/3/library/stdtypes.html#str.join) method of the empty bytes literal `b""`.
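For integers between 0 and 255, the same bytes object can also be built directly with the `bytes` constructor. The following sketch compares this shortcut with the `join` approach and with the literal printed above:

```python
# One byte per integer, joined together as in the cell above
joined = b"".join([i.to_bytes(1, "little") for i in range(10)])

# The constructor accepts any iterable of integers in range(256)
direct = bytes(range(10))

# The literal form; byte 9 is displayed as the tab escape `\t`
literal = b"\x00\x01\x02\x03\x04\x05\x06\x07\x08\t"

print(joined == direct == literal)
```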

Now that we wrote out our sequence, we can try to read it back from the file:

In [35]:
import pathlib as pl

bin_file = pl.Path("./tutorial/tests/data/output.txt")
with open(bin_file, "rb") as in_file:
    data = in_file.read()
    seq = [b for b in data]
    print(seq)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


And, surprise! We obtain our original sequence.
Note that, because of a quirk of Python, the `read` method returns a `bytes` object, but when we access a single element (as we would do with a string), we get an `int` between 0 and 255 holding the value of that byte.
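The difference between indexing and slicing a `bytes` object is illustrated by this short sketch (the values are arbitrary):

```python
data = bytes([0, 1, 2, 255])

single = data[1]     # indexing gives an int
piece = data[1:2]    # slicing gives a bytes object of length one
values = list(data)  # iterating gives one int per byte

print(single, piece, values)
```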


If instead we want to store our numbers using two bytes per number, we do:


In [62]:
import pathlib as pl
import struct

bin_file = pl.Path("./tutorial/tests/data/output1.txt")
with open(bin_file, "wb") as out_file:
    bs = b"".join([struct.pack(">H", i) for i in range(1024)])
    out_file.write(bs)

We do this by using the [`struct`](https://docs.python.org/3/library/struct.html) module of the python standard library, which offers methods to represent built-in data as different bytes formats.

The first argument to `struct.pack` is the format the data should be converted to: `">"` means big endian, that is, the most significant byte of each number is stored first (`"<"` would select little endian). `"H"` means an *unsigned short*, which corresponds to two bytes.
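The byte order is easiest to see on a number whose two bytes differ. In this sketch, the value `258` (hexadecimal `0x0102`) is chosen only for illustration:

```python
import struct

number = 258  # 0x0102: high byte 0x01, low byte 0x02

big = struct.pack(">H", number)     # most significant byte first
little = struct.pack("<H", number)  # least significant byte first

print(big, little)

# Unpacking with the matching format recovers the number
(restored,) = struct.unpack(">H", big)
```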

If we want to read the data, we can do:

In [68]:
import pathlib as pl
import struct

bin_file = pl.Path("./tutorial/tests/data/output1.txt")
with open(bin_file, "rb") as in_file:
    data = in_file.read()
    seq = [i for i, *rest in struct.iter_unpack(">H", data)]
    print(seq)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221,

#### Bytes and strings
Earlier, we mentioned the connection between bytes and strings in Python. To learn more about this, let's briefly review Unicode and its UTF-8 encoding.

Unicode is a standard to represent text: every character is assigned a number (the *code point*) that gives the location of this symbol in the table of all symbols. UTF-8 is one way to store these code points as bytes: it uses one to four bytes per character (each byte is a group of 8 bits).

Let's see this more clearly with an example:

In [32]:
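# Show the first 8**3 = 512 code points
for i in range(0, 8**3):
    # Four-byte big-endian representation of the integer (not its UTF-8 encoding)
    bt = i.to_bytes(4, "big")
    print(f"bytes: {bt}, integer: {i}, string: {chr(i)}")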

bytes: b'\x00\x00\x00\x00', integer: 0, string:  
bytes: b'\x00\x00\x00\x01', integer: 1, string: 
bytes: b'\x00\x00\x00\x02', integer: 2, string: 
bytes: b'\x00\x00\x00\x03', integer: 3, string: 
bytes: b'\x00\x00\x00\x04', integer: 4, string: 
bytes: b'\x00\x00\x00\x05', integer: 5, string: 
bytes: b'\x00\x00\x00\x06', integer: 6, string: 
bytes: b'\x00\x00\x00\x07', integer: 7, string: 
bytes: b'\x00\x00\x00\x08', integer: 8, string: 
bytes: b'\x00\x00\x00\t', integer: 9, string: 	
bytes: b'\x00\x00\x00\n', integer: 10, string: 

bytes: b'\x00\x00\x00\x0b', integer: 11, string: 
bytes: b'\x00\x00\x00\x0c', integer: 12, string: 
bytes: b'\x00\x00\x00\r', integer: 13, string: 
bytes: b'\x00\x00\x00\x0e', integer: 14, string: 
bytes: b'\x00\x00\x00\x0f', integer: 15, string: 
bytes: b'\x00\x00\x00\x10', integer: 16, string: 
bytes: b'\x00\x00\x00\x11', integer: 17, string: 
bytes: b'\x00\x00\x00\x12', integer: 18, string: 
bytes: b'\x00\x00\x00\x13', integer: 19, string

The `chr` function takes an integer and returns the corresponding unicode character.
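To connect code points with the one to four bytes used by UTF-8, the sketch below encodes four sample characters (chosen arbitrarily, written as escapes to avoid encoding problems) and records the code point and the number of bytes of each:

```python
# "a", "é", "€" and a smiley, written as escape sequences
characters = ["a", "\u00e9", "\u20ac", "\U0001F600"]

# `ord` is the inverse of `chr`: it returns the code point of a character
code_points = [ord(ch) for ch in characters]

# `str.encode` returns the UTF-8 bytes of the character
byte_lengths = [len(ch.encode("utf-8")) for ch in characters]

for ch, cp, n in zip(characters, code_points, byte_lengths):
    print(f"character: {ch}, code point: {cp}, UTF-8 bytes: {n}")
```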

#### Converting bytes to text 

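If we receive a `bytes` object and we want to transform it to text, we can iterate over it to obtain the integer value of each byte and convert each value with the `chr` function. Going byte by byte only works for characters that UTF-8 encodes as a single byte, such as the ASCII letters below: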

In [77]:
message = "Ciao"
message_secret = bytes(message, "utf-8")
# Iterating over a bytes object yields the integer value of each byte
for enc in message_secret:
    print(f"The `utf8` codepoint is = {enc}, the bytes representation = {enc.to_bytes(4, 'big')}, the representation is {chr(enc)}")

The `utf8` codepoint is = 67, the bytes representation = b'\x00\x00\x00C', the representation is C
The `utf8` codepoint is = 105, the bytes representation = b'\x00\x00\x00i', the representation is i
The `utf8` codepoint is = 97, the bytes representation = b'\x00\x00\x00a', the representation is a
The `utf8` codepoint is = 111, the bytes representation = b'\x00\x00\x00o', the representation is o

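For text that may contain characters outside ASCII, a more general route is the `decode` method of `bytes`, which interprets multi-byte sequences correctly. The sketch below contrasts it with the byte-by-byte `chr` approach; the word "Città" is just a sample containing one two-byte character.

```python
message_secret = "Ciao".encode("utf-8")

# For pure ASCII text, both approaches agree
per_byte = "".join(chr(b) for b in message_secret)
decoded = message_secret.decode("utf-8")

# "à" (written as an escape) takes two bytes in UTF-8
accented_secret = "Citt\u00e0".encode("utf-8")
accented_per_byte = "".join(chr(b) for b in accented_secret)
accented_decoded = accented_secret.decode("utf-8")

print(decoded, accented_decoded, len(accented_secret))
```

Here `accented_per_byte` no longer matches the original word, because each of the two bytes of "à" is turned into a separate character.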

### Reading/Writing CSV files
If you have ever worked with tabular data, you have surely encountered CSV (comma-separated values) files. These files store tables in text format row by row: each row is on its own line and the columns inside a row are separated by commas `,` or by semicolons `;`. The first line in the file usually contains the header giving the names of the columns:

```
first_column,second_column
1,2
2,3
```

As CSV is a very common format to exchange tabular data such as statistics and time series, the Python standard library offers facilities to read and write CSV files[^3] through the [csv](https://docs.python.org/3/library/csv.html) module. Despite this, today most people prefer using [pandas](https://pandas.pydata.org/) or [polars](https://www.pola.rs/) to manipulate tabular data because they offer more convenience and faster handling of large datasets. These packages are outside of the scope of this tutorial and will not be covered here.


Let's see how to read csv files using `csv` with an example by reading [example.csv](./data/example.csv):






In [39]:
import csv
import pathlib as pl

csv_file = pl.Path("./tutorial/tests/data/example.csv")
with open(csv_file) as input_file:
    reader = csv.reader(input_file)
    #Get the header
    header = next(reader)
    #Iterate over lines
    for line in reader:
        print(line)


['1', '2', ' some']
['3', '4', ' numbers']
['5', '6', ' here']


The `next` function called on an iterator returns the next value and advances the iterator. In the case of `csv.reader`, we use this to read the header.

The `csv` module does not interpret the data; by default everything is read as `str`.
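If numbers are needed, the conversion has to be done explicitly. A minimal sketch, assuming a small CSV text held in memory with `io.StringIO` instead of a file on disk, and assuming every data column contains integers:

```python
import csv
import io

csv_text = "a,b\n1,2\n3,4\n"

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)      # the first row holds the column names
string_rows = list(reader)  # every value is still a `str`

# Explicit conversion of each value to `int`
int_rows = [[int(value) for value in row] for row in string_rows]

print(header, string_rows, int_rows)
```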

Similarly, you can use the `csv` module to write a CSV file using the [`writer`](https://docs.python.org/3/library/csv.html#csv.writer) function. The `writer` object it returns has a `writerow` method, which takes an iterable of values to write as the current row:

In [3]:
import csv
import pathlib as pl

csv_file = pl.Path("./tutorial/tests/data/example1.csv")

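# newline="" lets the csv module control the line endings, as its documentation recommends
with open(csv_file, "w", newline="") as output_file:
    writer = csv.writer(output_file)
    # Write the header
    writer.writerow(["this", "is", "data"])
    # Write the rows
    for i in range(10):
        writer.writerow((i, i + 1, i + 2))
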

With this program, we write a CSV file with three columns named `this`, `is` and `data`, whose rows contain `i, i+1, i+2` for `i` ranging from 0 to 9.

## Exercises


### Exercise 1: CSV to dictionary 🌶️🌶️
Write a function that reads a CSV from the file [example](./tutorial/tests/data/example.csv) and returns a `dict` (dictionary) where the keys are the column names and the values are the list of values, without converting any data types.
If the file contains the following lines:
```
a, b
1, 2
3, 4
5, 6
```
this function should return:
```python
{"a":["1", "3", "5"], "b": ["2", "4", "6"]}
```

<div class="alert alert-block alert-info">
    Hints:
    <ul>
        <li>
            To simplify your solution, all values of the dictionary should be lists of `str`. This is because
            `csv.reader` reads each column as a string by default.
        </li>
        <li>
            Calling `next(csv_reader)` immediately after creating the `csv.reader` object `csv_reader` returns the header of the file and advances the reader to the first data row.
        </li>
        <li>
            Consider the function `itertools.zip_longest`.
        </li>
        <li>
            You receive the file to test as the first argument, `f`, of the function skeleton `solution_exercise1` below.
        </li>
    </ul>
</div>






In [2]:
%%ipytest input_output
import pathlib as pl
def solution_exercise1(f: pl.Path) -> "dict[str, list[str]]":
    """
    Write your solution here. 
    f is the path to the file to read from
    """

    

F                                                                        [100%]
______________________ test_exercise1[solution_exercise1] ______________________
tutorial/tests/test_input_output.py:21: in test_exercise1
    assert function_to_test(f) == reference_solution_exercise1(f)
tutorial/tests/test_input_output.py:12: in reference_solution_exercise1
    with open(f) as lines:
E   FileNotFoundError: [Errno 2] No such file or directory: '/home/basi/python-tutorial/data/example.csv'
FAILED tutorial/tests/test_input_output.py::test_exercise1[solution_exercise1] - FileNotFoundError: [Errno 2] No such file or directory: '/home/basi/python-...
1 failed in 0.04s



### Exercise 2: Counting words 🌶️
Write a function to read all the lines from [`lines.txt`](./tutorial/tests/data/lines.txt) and count the number of words in the file. The solution should be a single number.

For example, for the file
```
this 
file 
has 
three
lines
```
the result should be `5`.

<div class="alert alert-block alert-info">
    Hints:
    <ul>
        <li>
          The file is available as the input `f` of `solution_exercise2`.
        </li>
    </ul>
</div>




In [1]:
%reload_ext tutorial.tests.testsuite

In [2]:

%%ipytest input_output
import pathlib as pl
def solution_exercise2(f: pl.Path) -> int:
    """
    Write your solution here. 
    f is the path to the file to read from
    """
    return 5


F                                                                        [100%]
______________________ test_exercise2[solution_exercise2] ______________________
tutorial/tests/test_input_output.py:35: in test_exercise2
    assert function_to_test(f) == reference_solution_exercise2(f)
E   AssertionError: assert 5 == 4
E    +  where 5 = <function solution_exercise2 at 0x7f3c70589940>(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/example.csv'))
E    +  and   4 = reference_solution_exercise2(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/example.csv'))
----------------------------- Captured stdout call -----------------------------
/home/basi/python-tutorial/tutorial/tests/test_input_output.py
FAILED tutorial/tests/test_input_output.py::test_exercise2[solution_exercise2] - AssertionError: assert 5 == 4
1 failed in

 
### Exercise 3: Letter statistics 🌶️🌶️
Write a function that reads all the lines from [`lines.txt`](./data/lines.txt) and outputs the letter statistics in this form:
- An **alphabetically sorted** dictionary with `letter: count` for each letter in the words, for example `{"a": 5}` means that the letter `a` appears five times in the file.



<div class="alert alert-block alert-info">
    Hints:
    <ul>
        <li>
          The file is available as the input `f` of `solution_exercise3`.
        </li>
        <li>
            You can use functions from `itertools` to group your strings by letters and to combine all lines in the file in a single string.
    Consider `chain` and `groupby`. Be careful that `groupby` requires the input iterable to be sorted. You can do this by using the `sorted` function.
        </li>
        <li>
            To verify whether a character is a letter, you can use the `isalpha` method, for example
    `'a'.isalpha()`.
        </li>
    </ul>
</div>
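
The hints above can be tried out on a small in-memory example before touching the file. This is only a sketch: the list `lines` is a hypothetical stand-in for the file contents, and it does not deal with questions such as upper versus lower case.

```python
from itertools import chain, groupby

# Hypothetical in-memory lines standing in for the contents of a text file.
lines = ["ab a", "b-a!"]

# chain.from_iterable walks through every line character by character;
# isalpha() keeps only the letters.
letters = [c for c in chain.from_iterable(lines) if c.isalpha()]

# groupby only merges *adjacent* equal items, so we sort first.
counts = {letter: len(list(group)) for letter, group in groupby(sorted(letters))}
print(counts)  # {'a': 3, 'b': 2}
```

Because the letters are sorted before grouping, the keys are inserted into the dictionary in alphabetical order.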

In [5]:
%%ipytest input_output
import pathlib as pl
def solution_exercise3(f: pl.Path) -> "dict[str, int]":
    pass


F                                                                        [100%]
______________________ test_exercise3[solution_exercise3] ______________________
tutorial/tests/test_input_output.py:40: in test_exercise3
    f = get_data("lines.txt")
tutorial/tests/test_input_output.py:34: in reference_solution_exercise3
    def reference_solution_exercise3(f: pl.Path) -> "dict[str, int]":
E   FileNotFoundError: [Errno 2] No such file or directory: '/home/basi/python-tutorial/data/lines.txt'
FAILED tutorial/tests/test_input_output.py::test_exercise3[solution_exercise3] - FileNotFoundError: [Errno 2] No such file or directory: '/home/basi/python-...
1 failed in 0.00s



### Exercise 4: Translating words 🌶️🌶️
Write a function that takes the words from `english.csv` and translates them to Italian using the dictionary file `dict.csv`. The output should be a list of tuples with the pair `italian, english` if the word is found, and nothing otherwise.
For example, given the `english.csv` file:

```
bread
cat
```

and the `dict.csv` file:

```
120, pane, bread
121, sole, sun
```

the result should be:

`[("bread", "pane")]`


<div class="alert alert-block alert-info">
    Hints:
    <ul>
        <li>
            Try to avoid loading the dictionary more than once. The phrase **dictionary file** is itself a hint at the Python data structure to use for storing the translations.
        </li>
        <li>
            The path to the input file `english.csv` is available as the argument `english` of the function `solution_exercise4`, and the path to `dict.csv` as the argument `dictionary`.
        </li>
    </ul>
</div>
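
To illustrate the first hint, here is a minimal sketch of reading comma-separated rows once and keeping them in a lookup table. The text in `dictionary_text` is a hypothetical in-memory stand-in for a file, and the column layout `id, italian, english` is an assumption based on the example rows above.

```python
import csv
import io

# Hypothetical stand-in for a dictionary file; we assume rows of the
# form "id, italian, english", as in the example above.
dictionary_text = "120, pane, bread\n121, sole, sun\n"

# Read the rows once and build an english -> italian lookup table.
# skipinitialspace=True drops the blank that follows each comma.
translations = {}
for _id, italian, english in csv.reader(io.StringIO(dictionary_text), skipinitialspace=True):
    translations[english] = italian

print(translations.get("bread"))  # pane
print(translations.get("cat"))    # None
```

Using `get` returns `None` for a missing word instead of raising a `KeyError`, which makes it easy to skip words that have no translation.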


In [2]:
%%ipytest input_output
import pathlib as pl
def solution_exercise4(english: pl.Path, dictionary: pl.Path) -> "list[tuple[str, str]]":
    """
    Write your solution here
    """
    pass

F                                                                        [100%]
______________________ test_exercise4[solution_exercise4] ______________________
tutorial/tests/test_input_output.py:56: in test_exercise4
    assert function_to_test(words, dictionary) == reference_solution_exercise4(words, dictionary)
tutorial/tests/test_input_output.py:44: in reference_solution_exercise4
    with open(english) as english_file:
E   FileNotFoundError: [Errno 2] No such file or directory: '/home/basi/python-tutorial/fdata/english.csv'
FAILED tutorial/tests/test_input_output.py::test_exercise4[solution_exercise4] - FileNotFoundError: [Errno 2] No such file or directory: '/home/basi/python-...
1 failed in 0.04s



### Exercise 5: Binary format 🌶️🌶️🌶️
The file `secret_message.dat` contains a secret message. We know that the message is stored in binary format as a sequence of bytes. The message starts with the byte sequence `b'\xff\xee\xdd\xcc\xbb\xaa'` and finishes with `b'\xaa\xbb\xcc\xdd\xee\xff'`.
Write a function that reads the file and returns **only** the secret message as a string.


<div class="alert alert-block alert-info">
    <b>Hint:</b> The path to the input file is available as the argument `secret_file` of the function `solution_exercise5`.
</div>
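
As a warm-up, the sketch below shows one possible way to read a file as raw bytes and cut out the part between two markers. The markers `START` and `END`, the file name `example.dat` and the UTF-8 encoding are all assumptions made up for this illustration; they are not the ones used in the exercise.

```python
import pathlib as pl
import tempfile

# Hypothetical markers and payload, chosen only for this illustration.
START, END = b"\x01\x02", b"\x03\x04"
payload = b"noise" + START + "hello".encode("utf-8") + END + b"more noise"

with tempfile.TemporaryDirectory() as tmp:
    path = pl.Path(tmp) / "example.dat"
    path.write_bytes(payload)
    data = path.read_bytes()  # the whole file as a bytes object

# bytes.index returns the offset of the first match (ValueError if absent).
begin = data.index(START) + len(START)
end = data.index(END, begin)

# Decoding turns the bytes into a str; UTF-8 is assumed here.
message = data[begin:end].decode("utf-8")
print(message)  # hello
```

Searching for the end marker only from `begin` onwards avoids matching bytes that appear before the start marker.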

In [2]:
%%ipytest input_output
import pathlib as pl
def solution_exercise5(secret_file: pl.Path) -> str:
    """
    Write your solution here
    """
    pass

F                                                                        [100%]
______________________ test_exercise5[solution_exercise5] ______________________
tutorial/tests/test_input_output.py:62: in test_exercise5
    assert function_to_test(message) == reference_solution_exercise5(message)
E   AssertionError: assert None == 'Congratulations, you found the secret message!'
E    +  where None = <function solution_exercise5 at 0x7f7340105580>(PosixPath('data/secret_message.dat'))
E    +  and   'Congratulations, you found the secret message!' = reference_solution_exercise5(PosixPath('data/secret_message.dat'))
FAILED tutorial/tests/test_input_output.py::test_exercise5[solution_exercise5] - AssertionError: assert None == 'Congratulations, you found the secret messa...
1 failed in 0.04s

