## Table of contents

- [String input and output](#String-input-and-output)
- [File I/O](#File-I/O)
- [Writing to a file](#Writing-to-a-file)
- [Context managers](#Context-managers)
- [Binary I/O](#Binary-I/O)
    - [Bytes and strings](#Bytes-and-strings)
    - [Converting bytes to text](#Converting-bytes-to-text)
- [Reading/Writing CSV files](#Reading/Writing-CSV-files)
- [Exercises](#Exercises)
    - [Exercise 1: CSV to dictionary 🌶️🌶️](#Exercise-1:-CSV-to-dictionary-🌶️🌶️)
    - [Exercise 2: Counting words 🌶️](#Exercise-2:-Counting-words-🌶️)
    - [Exercise 3: Letter statistics 🌶️🌶️](#Exercise-3:-Letter-statistics-🌶️🌶️)
    - [Exercise 4: Translating words 🌶️🌶️](#Exercise-4:-Translating-words-🌶️🌶️)
    - [Exercise 5: Binary format 🌶️🌶️🌶️](#Exercise-5:-Binary-format-🌶️🌶️🌶️)




## References
- [Input and output tutorial](https://docs.python.org/3/tutorial/inputoutput.html) from the official Python tutorial.
- [Is explicitly closing files important?](https://stackoverflow.com/questions/7395542/is-explicitly-closing-files-important)
- [Context managers](https://docs.python.org/3/library/contextlib.html) from the Python standard library.

---

## Introduction
An important aspect to learn in a new programming language is input/output (or I/O). For our program to be useful, we need to be able to interact with the outside world. This means performing operations like:
- reading and writing data to a file
- printing text output
- asking for user input
- connecting to databases or other network services


The majority of these operations are covered by the Python standard library. We are going to see how to use them in this chapter.

<div class="alert alert-block alert-info">
⚠️ In reference to the chapter on functional programming, it is interesting to note that these functions perform <b>side-effects</b>. Therefore, any code containing these operations is no longer <b>pure</b> and is not referentially transparent. The same function can return different values for the same argument if called multiple times, and the function can have <b>long-distance</b> effects. That means they can modify the program state elsewhere, leading to unexpected results.<br><br>
Therefore, we suggest separating input and output from the other computations in your program. For example, if you have a complex calculation requiring several user inputs at several stages of the process, consider writing a function that only performs the calculation given all inputs, and gather all inputs separately beforehand, for example, from a single file. This makes your code easier to debug, test and understand.
</div>
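As a minimal sketch of this advice, the hypothetical functions below split the work in two: `compute_total` only does the calculation, while `load_inputs` is the only place that touches a file. The function names and the one-number-per-line input format are assumptions for illustration, and the input file is created in a temporary directory so that nothing in the repository is modified.

```python
import pathlib as pl
import tempfile


def compute_total(values: list[float], factor: float) -> float:
    """Pure computation: the result depends only on the arguments."""
    return sum(values) * factor


def load_inputs(path: pl.Path) -> list[float]:
    """All the I/O lives here: read one number per line (assumed format)."""
    lines = path.read_text().splitlines()
    return [float(line) for line in lines if line.strip()]


# Usage: write a small input file, read it once, then compute
with tempfile.TemporaryDirectory() as tmp:
    input_path = pl.Path(tmp) / "inputs.txt"
    input_path.write_text("1.5\n2.5\n")
    values = load_inputs(input_path)

total = compute_total(values, factor=2.0)
print(total)  # 8.0
```

Because `compute_total` never reads anything itself, it can be tested with plain lists, without preparing any file.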




## String input and output 

### String output
The most basic I/O operation in Python is displaying text on the Python console. This is done using the `print` function:

In [1]:
print("I am some text")

I am some text


### String input
The complementary operation, prompting for *user input*, is done through `input`.
The function takes one string as an argument, which gives the prompt displayed on the console.
The return value of the function contains the input the user types:

In [2]:
user_input = input("Write some text")
print(f"The user wrote {user_input}")



<div class="alert alert-block alert-info">
⚠️ Note that <code>input</code> is a <b>blocking operation</b>. The program evaluation stops and waits until the user provides an input. Therefore, use <code>input</code> <b>carefully</b> and only when truly necessary. A common issue is that someone first writes an interactive program expecting user input through <code>input</code> and later integrates it into a larger application that is supposed to run automatically without any user interaction. Suddenly, the application stops somewhere and does not run further because of a well-hidden <code>input</code> call...


If you have to use it, do not do so in the middle of a long-running computation or inside your model code; instead, ask for all user inputs upfront. If you need to write a truly interactive application, there are more robust solutions.

</div>
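One way to ask for all inputs upfront is sketched below. The names `ask_parameters` and `area` are hypothetical; the idea is to collect every answer in one function and to pass `input` in as a parameter, so that an automated run can substitute canned answers instead of blocking on the keyboard.

```python
from typing import Callable


def ask_parameters(prompt: Callable[[str], str] = input) -> dict[str, float]:
    """Collect every user input upfront, before any computation starts."""
    width = float(prompt("Width: "))
    height = float(prompt("Height: "))
    return {"width": width, "height": height}


def area(width: float, height: float) -> float:
    """No I/O here, so this can run without any user interaction."""
    return width * height


# In an automated run (e.g. a test), canned answers replace the keyboard
answers = iter(["3", "4"])
params = ask_parameters(prompt=lambda message: next(answers))
print(area(**params))  # 12.0
```

In an interactive session, calling `ask_parameters()` without arguments falls back to the real `input`.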

## File I/O
A second common I/O operation in most programming languages is reading from and writing to files. This is more complex than just writing on the console and consists of several steps:

1. You need to find *where* in the operating system the file is located. This gives you the so-called *path* to the file.
2. You need to *open* the file for reading or writing. In some operating systems, this *locks* the file, so that other users or processes cannot write simultaneously with you. This operation gives you a *file handle* that you can use to read or write.
3. Now, you can read (or write) the contents of the file. 
4. Finally, you need to *close* the file to make it accessible to other processes again and to finalize the writing. In many implementations, `write` only writes the text to a temporary location in memory (a so-called *buffer*) and only transfers the content to the file when the buffer is full, when it is flushed explicitly, or when you close the file. This is done to improve performance, as writing to memory is much faster than writing to a file on the disk.


As a trick to remember closing files: *a file handle is like a door handle*. 

### Paths
To handle *where* a file is located, Python offers the `pathlib` module, which represents *paths* to files in a portable manner.
This means that you can use one Python object to represent the location of a file regardless of the operating system, and Python will convert it into the representation specific to the operating system where you are executing the program.
For example, Windows uses `\` as a separator for path elements, while Unix/Linux uses `/`.
If you use `pathlib`, you can write code using only `/`, and `pathlib` will automatically select the correct path separator where the program runs.
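To see this separator handling without switching machines, the sketch below uses `PurePosixPath` and `PureWindowsPath`, two `pathlib` classes that manipulate paths of a given operating-system flavour without ever accessing the file system, so both work on any OS.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same components, joined with `/` in both cases
posix = PurePosixPath("tutorial") / "tests" / "data" / "hello.txt"
windows = PureWindowsPath("tutorial") / "tests" / "data" / "hello.txt"

# Each flavour renders the path with its own separator
print(posix)    # tutorial/tests/data/hello.txt
print(windows)  # tutorial\tests\data\hello.txt
```

In everyday code you simply use `pl.Path`, which picks the flavour matching the operating system the program runs on.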

Let's see how to generate a `path` object pointing to the text file [`hello.txt`](./tutorial/tests/data/hello.txt) in the `tutorial/tests/data` directory of this repository.
To do so, we use the [`Path`](https://docs.python.org/3/library/pathlib.html#basic-use) class of the `pathlib` module, whose constructor takes the location of the file as a string:



In [None]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/hello.txt")

The `./` at the beginning of the path indicates that this path is a *relative path*. This is just a path that points to a file *relative* to the directory we are working in. 

The `path` object has a method `absolute` that returns the *absolute* full path of the object in the OS:

In [None]:
print(path.absolute())

`path` objects also have useful attributes and methods, such as:
1. `name` to get only the name of the file
2. `parent` to get the parent folder of the current file
3. `suffix` to get the suffix (extension) of a file
4. `is_dir()` to check whether the path is a directory
5. `is_file()` to check whether the path is a file
6. `iterdir()` to iterate over all files (and directories) in the current path, if the path is a directory
7. Many more methods are listed [here](https://docs.python.org/3/library/pathlib.html#general-properties)

Below, we see some of these methods at work:

In [7]:
import pathlib as pl
p = pl.Path("tutorial/tests/data/hello.txt")
print(p.name)
print(p.parent)
print(p.suffix) 
print(p.is_dir())
print(list(p.parent.iterdir()))
print(p.is_file())

hello.txt
tutorial/tests/data
.txt
False
[PosixPath('tutorial/tests/data/longest_10000.txt'), PosixPath('tutorial/tests/data/buckets_2.txt'), PosixPath('tutorial/tests/data/example1.csv'), PosixPath('tutorial/tests/data/lines.txt'), PosixPath('tutorial/tests/data/me.txt'), PosixPath('tutorial/tests/data/intcode_1.txt'), PosixPath('tutorial/tests/data/output1.txt'), PosixPath('tutorial/tests/data/trees_2.txt'), PosixPath('tutorial/tests/data/secret_message.dat'), PosixPath('tutorial/tests/data/buckets_1.txt'), PosixPath('tutorial/tests/data/numbers.txt'), PosixPath('tutorial/tests/data/intcode_2.txt'), PosixPath('tutorial/tests/data/universe_1.txt'), PosixPath('tutorial/tests/data/example.csv'), PosixPath('tutorial/tests/data/universe_2.txt'), PosixPath('tutorial/tests/data/english.csv'), PosixPath('tutorial/tests/data/hello.txt'), PosixPath('tutorial/tests/data/2020_2.txt'), PosixPath('tutorial/tests/data/trees_1.txt'), PosixPath('tutorial/tests/data/dict.csv'), PosixPath('tutorial/tes

If we prefer, we can also construct paths by combining their components with the `/` operator. For example, we can do:

In [None]:
import pathlib as pl

path = pl.Path("./tutorial/tests") / pl.Path("data")
print(path.absolute())


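This also works to combine strings and paths, and even plain strings, provided that at least one operand of every `/` is a `Path` object. Because `/` is evaluated from left to right, this means that a `Path` must appear among the first two elements of the chain: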

In [None]:
# This works
p1 = pl.Path(".") / "tutorial/tests/data/hello.txt"
# This too
p2 = pl.Path(".") / "tutorial" / "tests" / "data" / "hello.txt"
# This as well
p3 = "." / pl.Path("tutorial") / "tests" / "data" / "hello.txt"

print(p1, p2, p3)
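Since `/` is evaluated from left to right, each step of the chain needs a `Path` on at least one side. The sketch below shows what happens when the first two operands are both strings: the string division fails before the `Path` further right is ever reached.

```python
import pathlib as pl

# "tutorial" / "tests" is computed first: that is str / str,
# which Python does not support, so a TypeError is raised.
raised = False
try:
    "tutorial" / "tests" / pl.Path("data")
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")

# Starting the chain with a Path makes every intermediate result a Path
good = pl.Path("tutorial") / "tests" / "data"
print(good)
```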

We can also list all files matching a certain pattern in a given directory using the `glob` method:

In [None]:
[f for f in path.glob("*")]

The `glob` method takes a `pattern` argument that expresses the form of the file names to look for.
The `*` star means *match everything*. For more information on *glob patterns*, see the documentation of [fnmatch](https://docs.python.org/3/library/fnmatch.html#module-fnmatch).
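Patterns can be more specific than `*`. The sketch below builds a small throwaway directory tree with `tempfile` (an assumption made so that it does not depend on the repository's data folder) and compares a plain pattern with the recursive `**` pattern, which also descends into subdirectories; `rglob("*.txt")` is equivalent to `glob("**/*.txt")`.

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pl.Path(tmp)
    # Create a small directory tree to search in
    (root / "sub").mkdir()
    for name in ["a.txt", "b.csv", "sub/c.txt"]:
        (root / name).write_text("example")

    # "*.txt" only matches .txt files directly inside root
    top_level = sorted(p.name for p in root.glob("*.txt"))
    # "**/*.txt" matches .txt files in root and in all its subdirectories
    recursive = sorted(p.name for p in root.glob("**/*.txt"))

print(top_level)  # ['a.txt']
print(recursive)  # ['a.txt', 'c.txt']
```

The results are sorted because `glob` does not guarantee any particular order.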


### Exercises:
1. Modify the function `solution_find_all_files` to find all files in the [data](./tutorial/tests/data/) directory and return them as a list
    **Hints**: <div class="alert alert-block alert-info">
            <ul>
                <li>
                The path to the input directory is available as the argument <code>current_path</code> of the function <code>solution_find_all_files</code>
                </li>
            </ul>
        </div>
2. Modify the function `solution_count_parents` to count all *directories* inside the *directory* where `input_path` is located (hint: consider `parent`) and return the count as an `int`

In [None]:
%%ipytest
from pathlib import Path
def solution_find_all_files(current_path: Path) -> list[Path]: 
    """Write your solution here"""
    pass

[31mF[0m[31m                                                                        [100%][0m
[31m[1m_________________ test_find_all_files[solution_find_all_files] _________________[0m
[1m[31mtutorial/tests/test_input_output.py[0m:20: in test_find_all_files
    [94massert[39;49;00m function_to_test(f) == reference_solution_find_all_files(f)[90m[39;49;00m
[1m[31mE   AssertionError: assert None == [][0m
[1m[31mE    +  where None = <function solution_find_all_files at 0x7f562817b100>(PosixPath('data'))[0m
[1m[31mE    +  and   [] = reference_solution_find_all_files(PosixPath('data'))[0m
[31mFAILED[0m tutorial/tests/test_input_output.py::[1mtest_find_all_files[solution_find_all_files][0m - AssertionError: assert None == []
[31m[31m[1m1 failed[0m[31m in 0.04s[0m[0m



In [4]:
%%ipytest
from pathlib import Path
def solution_count_parents(input_path: Path) -> int: 
    """Write your solution here"""
    pass




### Reading from a file

Let's see how to do this with an example: we want to open the file [hello.txt](./tutorial/tests/data/hello.txt) and read its contents.

1. The path is already identified: we know the file is in `./tutorial/tests/data/hello.txt`. We save this in a variable `path`.
2. We can now use the built-in function [`open`](https://docs.python.org/3/library/functions.html#open) to open the file. This function returns a [file object](https://docs.python.org/3/glossary.html#term-file-object) that we can use to further manipulate the file. To ensure we only open the file for reading, we pass the string `"r"` as the second argument of `open`.
3. Now we can read the contents using `read`, `readline` or `readlines`. `read` reads the whole file content into a single string, `readline` reads one line, while `readlines` reads the whole file content as a list of strings, one item per line in the file. Reading line by line is useful when we only want part of a file, or when the file is too big to fit in memory and we can only read it in parts.
4. Finally, we close the file using the `close` method on the file object.


In [None]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/hello.txt")
file_ob = open(path, 'r')
contents = file_ob.readlines()
file_ob.close()
print(contents)

Notice that calling `read`, `readline` or `readlines` *consumes* the file, either fully or up to the corresponding position. This means that if we call `readlines` twice, we will get an empty list the second time:

In [None]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/hello.txt")
file_ob = open(path, 'r')
contents = file_ob.readlines()
print(contents)
other_contents = file_ob.readlines()
print(other_contents)
file_ob.close()
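If you do need to go over the contents again, you can move the read position back to the start with the file object's `seek` method instead of reopening the file. The sketch below uses a throwaway file in a temporary directory (our assumption, so the repository data stays untouched).

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = pl.Path(tmp) / "two_lines.txt"
    path.write_text("first\nsecond\n")

    file_ob = open(path, "r")
    first = file_ob.readline()   # 'first\n'; the position is now after it
    rest = file_ob.readlines()   # only what is left: ['second\n']
    file_ob.seek(0)              # move the position back to the beginning
    again = file_ob.readlines()  # the whole file once more
    file_ob.close()

print(first, rest, again)
```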


We can use this to read a file line-by-line by simply iterating over it with a `for` loop or a list comprehension, because file objects implement the [iterator](https://docs.python.org/3/glossary.html#term-iterator) protocol:

In [None]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/lines.txt")
file_ob = open(path, 'r')
for line in file_ob:
    print(line, end="")  # each line keeps its trailing "\n", so don't add another
file_ob.close()

This is the most *pythonic* way to read a file line-by-line instead of reading the full contents at once.

### Exercises:
1. Modify the function `solution_read_file` to read the file passed as the `input_file` argument and return its contents as a list of strings, one string per line


In [2]:
%%ipytest
from pathlib import Path
def solution_read_file(input_file: Path) -> "list[str]": 
    """Write your solution here"""
    pass

[31mF[0m[31m                                                                        [100%][0m
[31m[1m______________________ test_read_file[solution_read_file] ______________________[0m
[1m[31mtutorial/tests/test_input_output.py[0m:37: in test_read_file
    [94massert[39;49;00m function_to_test(data) == reference_solution_read_file(data)[90m[39;49;00m
[1m[31mE   AssertionError: assert None == ['this\n', 'file\n', 'has multiple\n', 'lines\n', 'how many\n', 'lines\n', ...][0m
[1m[31mE    +  where None = <function solution_read_file at 0x7f32f0f0b060>(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/lines.txt'))[0m
[1m[31mE    +  and   ['this\n', 'file\n', 'has multiple\n', 'lines\n', 'how many\n', 'lines\n', ...] = reference_solution_read_file(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/lines.txt'))[0m
[31mFAILED[0m tutorial/tests/test_input_output.py::[1mtest_read_file[solution_read_file][0m - AssertionError: assert None == ['this\n', 'fi

### Writing to a file
The process to write data to a file is very similar; the main differences are:
- We pass `"w"` as the second argument of `open` to specify that we want to write to the file. If the file already exists, its contents are erased as soon as it is opened. If you want to add to the end of an existing file instead, use `"a"` (append).
- We use `write` to write a *string* to the file. Other types of objects must be converted to strings before being written.
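The difference between the two modes is easiest to see side by side. The sketch below writes to a throwaway file in a temporary directory (an assumption, to avoid touching the repository) and reads it back with `Path.read_text` after each step.

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = pl.Path(tmp) / "log.txt"

    f = open(path, "w")   # creates the file (or empties an existing one)
    f.write("first\n")
    f.close()

    f = open(path, "a")   # keeps the existing content, writes at the end
    f.write("second\n")
    f.close()
    appended = path.read_text()

    f = open(path, "w")   # "w" again: the previous content is erased
    f.write("third\n")
    f.close()
    overwritten = path.read_text()

print(repr(appended))     # 'first\nsecond\n'
print(repr(overwritten))  # 'third\n'
```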


Let's see this in action by writing your name in a file called `me.txt` in [data](./tutorial/tests/data/):

In [3]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/me.txt")
file_ob = open(path, "w")
file_ob.write("Simone")
file_ob.close()

Congratulations! Your name is now written in stone. 

If we want to write the contents of an *iterable* of strings to a file, we can use the `writelines` method:

In [4]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/numbers.txt")
file_ob = open(path, "w")
file_ob.writelines([str(i) + "\n" for i in range(10)])
file_ob.close()

Notice that `writelines` does not add any line separator by itself: we append the newline character `\n` to each string so that every number is written on its own line.
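As an alternative, the built-in `print` function accepts a `file` argument. Since `print` converts its arguments with `str` and ends with a newline by default, neither step has to be done by hand. The sketch below writes to a temporary file, an assumption made to keep the example self-contained.

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = pl.Path(tmp) / "numbers.txt"
    file_ob = open(path, "w")
    for i in range(3):
        # print converts i to str and appends "\n" after it
        print(i, file=file_ob)
    file_ob.close()
    lines = path.read_text().splitlines()

print(lines)  # ['0', '1', '2']
```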


### Exercises
1. Modify the function `solution_write_file` to write the sentence "python tutorial 2023" to the file `output_file` passed as argument to the function

In [24]:
%%ipytest
from pathlib import Path
def solution_write_file(output_file: Path) -> None:
    """Write your solution here"""

### Context managers
As you can see, after *opening* and performing operations on a file, we always have to remember to *close* it.
If we forget to close it, unexpected behaviour can occur.
For example, if the program crashes later on, text still sitting in the write buffer might never reach the file.
If you open many files and never close them, the program can run out of file handles, because the operating system limits how many files a process can keep open at once.
On some operating systems, other processes cannot access the file, or do not see its updated contents, until it is closed.

This pattern is common when dealing with many *resources*: files, connections, threads, servers, etc.
You acquire access to the resource, do some work on it, and, finally, clean up after yourself by closing it again.
Because of this, Python offers a construct called [*context manager*](https://docs.python.org/3/reference/datamodel.html#context-managers) which implements exactly this behavior:
- Get access to a resource.
- Do some work.
- Release this resource.

In the case of files, we can replace the open-read-close or open-write-close sequence with a context manager.
Context managers are used inside the `with` statement:

In [6]:
import pathlib as pl

hello_file = pl.Path("./tutorial/tests/data/hello.txt")
with open(hello_file, "r") as file_ob: 
    contents = file_ob.readlines()
    print(contents)

['Hello, I am a file with some text.\n']


`with open(path) as name:` opens the file in `path` and assigns the resulting file object to `name`.
The file is meant to be used only within the *scope* of the context manager, that is, the indented block of code that follows the `:`.
Once the Python interpreter leaves this block, `name.close()` (here, `file_ob.close()`) is called automatically, ensuring the file is properly closed even if an exception is raised inside the block.
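The sketch below checks this guarantee with the file object's `closed` attribute: an exception is raised on purpose inside the `with` block, yet the file is closed afterwards. The temporary file and the `ValueError` are assumptions made for the demonstration.

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = pl.Path(tmp) / "data.txt"
    path.write_text("some text\n")

    try:
        with open(path, "r") as file_ob:
            first_line = file_ob.readline()
            raise ValueError("something went wrong while processing")
    except ValueError:
        pass

    # The name is still bound, but the file was closed on the way out
    print(file_ob.closed)  # True
```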

This pattern can be extended to any other resource that should be managed in a similar way, for example database connections. 
Any object that implements `__enter__` and `__exit__` can be used with the context manager syntax.

If you want to learn how to implement context managers for other types of objects, please refer to the `contextlib` [documentation](https://docs.python.org/3/library/contextlib.html) in the Python standard library.
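As a small illustration, here is a sketch of both approaches for a hypothetical resource that only records when it is acquired and released: a class implementing `__enter__` and `__exit__`, and a generator decorated with `contextlib.contextmanager`.

```python
from contextlib import contextmanager

events = []


class Resource:
    """Hypothetical resource that records when it is acquired and released."""

    def __enter__(self):
        events.append("acquire")
        return self  # this value is bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        events.append("release")
        return False  # do not suppress exceptions raised in the block


@contextmanager
def resource():
    """The same behaviour, written as a generator."""
    events.append("acquire")
    try:
        yield "handle"  # this value is bound to the name after `as`
    finally:
        events.append("release")  # runs even if the block raises


with Resource():
    events.append("work")

with resource() as handle:
    events.append(f"work with {handle}")

print(events)
# ['acquire', 'work', 'release', 'acquire', 'work with handle', 'release']
```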

### Binary I/O

Another aspect of file I/O is accessing files in [*binary mode*](https://docs.python.org/3/library/io.html#binary-i-o): instead of writing and reading *text*, we manipulate `bytes` to represent non-textual data. This is useful for interacting with measurement data and other non-textual information like images, machine learning model parameters and other complex structures. In most cases, however, you won't need the low-level control of binary I/O and will use libraries instead.

To look at an example, let's write a sequence of `int` to `output.txt`[^1] as a sequence of bytes. Because one byte corresponds to 8 bits, using one byte per integer means we can unambiguously store `2^8 = 256` distinct values (from 0 to 255).


[^1]: Filename extensions do not have any binding meaning by themselves; they are simply a convention that lets users quickly see what contents to expect. Here the file ends in `.txt` even though it contains raw bytes; `.dat` is a more typical "generic" extension for files containing some sort of data.

In [7]:
import pathlib as pl

bin_file = pl.Path("./tutorial/tests/data/output.txt")
with open(bin_file, "wb") as out_file:
    bs = b"".join([i.to_bytes(1, 'little') for i in  range(10)])
    print(bs)
    out_file.write(bs)

b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'


Note that we used the mode `wb` for *write, binary*. In this mode, the `write` method expects a [`bytes`](https://docs.python.org/3/library/stdtypes.html#bytes) object rather than a string. Because of the historical connection between bytes and strings, we can write a bytes literal by prepending `b` to a string literal. We therefore convert each integer to a one-byte `bytes` object using [`to_bytes`](https://docs.python.org/3/library/stdtypes.html#int.to_bytes) and concatenate them with the [`join`](https://docs.python.org/3/library/stdtypes.html#bytes.join) method of the empty bytes literal `b""`.
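The same bytes can also be produced more compactly. As a short sketch: the `bytes` constructor accepts any iterable of integers between 0 and 255, and `int.from_bytes` performs the inverse conversion of `to_bytes`.

```python
# bytes() accepts any iterable of ints between 0 and 255
bs = bytes(range(10))
print(bs)  # b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'

# int.from_bytes is the inverse of to_bytes
value = int.from_bytes(b"\x01\x00", "big")
print(value)  # 256

# An int above 255 does not fit into a single byte
try:
    bytes([256])
except ValueError as err:
    print(f"ValueError: {err}")
```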

Now that we wrote out our sequence, we can try to read it back from the file:

In [8]:
import pathlib as pl

bin_file = pl.Path("./tutorial/tests/data/output.txt")
with open(bin_file, "rb") as in_file:
    data = in_file.read()
    seq = [b for b in data]
    print(seq)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


And, surprise! We obtain our original sequence.
Note that, because of a quirk of Python, the `read` method returns a `bytes` object, but when we access a single element (as we would do with a string), the entry is already an `int` between 0 and 255: the numeric value of that byte.


If instead we want to store our numbers using two bytes per number, for example because they exceed 255, we can do:

In [9]:
import pathlib as pl
import struct

bin_file = pl.Path("./tutorial/tests/data/output1.txt")
with open(bin_file, "wb") as out_file:
    bs = b"".join([struct.pack(">H", i) for i in range(1024)])
    out_file.write(bs)

We do this by using the [`struct`](https://docs.python.org/3/library/struct.html) module of the Python standard library, which converts Python values to and from packed binary representations in a variety of formats.

The first argument to `struct.pack` is the format the data should be converted into. `">"` means big-endian, that is, the byte holding the most significant part of each number is stored first (`"<"` would mean little-endian, with the least significant byte first). `"H"` means an *unsigned short*, which corresponds to two bytes.
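To make the byte order concrete, this sketch packs the number 258 (hexadecimal `0x0102`) in both orders and shows that unpacking with the wrong order silently yields a different number.

```python
import struct

value = 258  # 0x0102: most significant byte 0x01, least significant 0x02
big = struct.pack(">H", value)     # big-endian: most significant byte first
little = struct.pack("<H", value)  # little-endian: least significant byte first

print(big)     # b'\x01\x02'
print(little)  # b'\x02\x01'

# Unpacking must use the same byte order that was used for packing
print(struct.unpack(">H", big))     # (258,)
print(struct.unpack(">H", little))  # (513,): wrong order, wrong number
```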

If we want to read the data, we can do:

In [10]:
import pathlib as pl
import struct

bin_file = pl.Path("./tutorial/tests/data/output1.txt")
with open(bin_file, "rb") as in_file:
    data = in_file.read()
    seq = [i for i, *rest in struct.iter_unpack(">H", data)]
    print(seq)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221,

#### Bytes and strings
Earlier, we mentioned the connection between bytes and strings in Python. To learn more about it, let's briefly review Unicode and the UTF-8 encoding.

Unicode is a standard to represent text: every character is assigned a number (the *codepoint*) that gives the location of this symbol in the table of all symbols. UTF-8 is a widely used *encoding* of Unicode, that is, a rule to store these codepoints as bytes: it stores each character as one to four bytes (each byte being a group of 8 bits).

Let's see this more clearly with an example:

In [11]:
for i in range(0, 8**3):
    bt = i.to_bytes(4, "big")  # explicit byte order, required before Python 3.11
    print(f"bytes: {bt}, integer: {i}, string: {chr(i)}")

bytes: b'\x00\x00\x00\x00', integer: 0, string:  
bytes: b'\x00\x00\x00\x01', integer: 1, string: 
bytes: b'\x00\x00\x00\x02', integer: 2, string: 
bytes: b'\x00\x00\x00\x03', integer: 3, string: 
bytes: b'\x00\x00\x00\x04', integer: 4, string: 
bytes: b'\x00\x00\x00\x05', integer: 5, string: 
bytes: b'\x00\x00\x00\x06', integer: 6, string: 
bytes: b'\x00\x00\x00\x07', integer: 7, string: 
bytes: b'\x00\x00\x00\x08', integer: 8, string:
bytes: b'\x00\x00\x00\t', integer: 9, string: 	
bytes: b'\x00\x00\x00\n', integer: 10, string: 

bytes: b'\x00\x00\x00\x0b', integer: 11, string: 
bytes: b'\x00\x00\x00\x0c', integer: 12, string: 
bytes: b'\x00\x00\x00\r', integer: 13, string: 
bytes: b'\x00\x00\x00\x0e', integer: 14, string: 
bytes: b'\x00\x00\x00\x0f', integer: 15, string: 
bytes: b'\x00\x00\x00\x10', integer: 16, string: 
bytes: b'\x00\x00\x00\x11', integer: 17, string: 
bytes: b'\x00\x00\x00\x12', integer: 18, string: 
bytes: b'\x00\x00\x00\x13', integer: 19, string: 

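The `chr` function takes an integer (a codepoint) and returns the corresponding Unicode character. Note that `to_bytes(4)` simply shows the integer as four big-endian bytes; this is not its UTF-8 encoding.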
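The difference between a codepoint and its UTF-8 bytes becomes visible outside the ASCII range. In the sketch below, `ord` (the inverse of `chr`) gives the codepoint and `str.encode("utf-8")` gives the actual UTF-8 bytes; the sample characters are our own choice.

```python
# ord is the inverse of chr: it maps a character to its codepoint
for char in ["A", "é", "€", "😀"]:
    encoded = char.encode("utf-8")
    print(f"{char}: codepoint {ord(char)}, UTF-8 bytes {encoded}, {len(encoded)} byte(s)")
```

The four characters need one, two, three and four bytes respectively, even though each one is a single codepoint.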

#### Converting bytes to text 

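If we receive a `bytes` object and want to transform it back to text, the general solution is the `decode` method, for example `message_secret.decode("utf-8")`. For plain ASCII text, where every character takes exactly one byte, we can also iterate over the bytes, which yields one `int` per byte, and convert each of them with the `chr` function: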

In [12]:
message = "Ciao"
message_secret = bytes(message, "utf-8")
# Iterating over a bytes object yields one int per byte
for enc in message_secret:
    print(f"UTF-8 byte value = {enc}, bytes representation = {enc.to_bytes(4, 'big')}, character = {chr(enc)}")

UTF-8 byte value = 67, bytes representation = b'\x00\x00\x00C', character = C
UTF-8 byte value = 105, bytes representation = b'\x00\x00\x00i', character = i
UTF-8 byte value = 97, bytes representation = b'\x00\x00\x00a', character = a
UTF-8 byte value = 111, bytes representation = b'\x00\x00\x00o', character = o
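The byte-by-byte approach breaks as soon as a character needs more than one byte. In this sketch (the word "Caffè" is our own example), `decode` restores the text correctly, while applying `chr` to each byte turns the two bytes of "è" into two unrelated characters.

```python
text = "Caffè"
encoded = text.encode("utf-8")  # "è" takes two bytes, so 6 bytes in total

# decode reverses encode, handling multi-byte characters correctly
print(encoded.decode("utf-8"))  # Caffè

# chr on each byte only works for ASCII: here it produces garbled text
print("".join(chr(b) for b in encoded))  # CaffÃ¨
```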

### Reading/Writing CSV files
If you have ever worked with tabular data, you have surely encountered CSV (comma-separated values) files.
These files store a table in text format, row by row: rows are separated by new lines, and the columns inside a row are separated by commas `,` or sometimes by semicolons `;`.
The first line in the file usually contains the header giving the names of the columns:

```
first_column,second_column
1,2
2,3
```

As CSV is a very common format for exchanging tabular data such as statistics and time series, the Python standard library offers facilities to read and write CSV files through the [csv](https://docs.python.org/3/library/csv.html) module.
Despite this, today most people prefer using [pandas](https://pandas.pydata.org/) or [polars](https://www.pola.rs/) to manipulate tabular data because they offer more convenience and faster handling of large datasets.
These packages are outside of the scope of this tutorial and will not be covered here.


Let's see how to read CSV files using `csv` with an example, by reading [example.csv](./tutorial/tests/data/example.csv):

In [13]:
import csv
import pathlib as pl

csv_file = pl.Path("./tutorial/tests/data/example.csv")
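# newline="" lets the csv module handle line endings, as the csv docs recommend
with open(csv_file, newline="") as input_file:
    reader = csv.reader(input_file)
    # Get the header
    header = next(reader)
    # Iterate over the remaining lines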
    for line in reader:
        print(line)

['1', '2', ' some']
['3', '4', ' numbers']
['5', '6', ' here']


The `next` function, called on an iterator, returns the next value and advances the iterator. In the case of `csv.reader`, we use this to read the header.

The `csv` module does not interpret the data: by default, every value is read as a `str`.
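Any conversion is therefore up to us. The sketch below uses made-up sample data in an `io.StringIO` object, which behaves like an open text file. It shows the `skipinitialspace` option, which drops the whitespace right after each delimiter (the leading spaces visible in the output above), and an explicit conversion of the numeric columns to `int`.

```python
import csv
import io

# io.StringIO stands in for an open text file; note the spaces after commas
data = io.StringIO("first, second, third\n1, 2, some\n3, 4, numbers\n")

# skipinitialspace=True drops the whitespace right after each delimiter
reader = csv.reader(data, skipinitialspace=True)
header = next(reader)
# csv hands back strings; converting the types is up to us
rows = [(int(a), int(b), word) for a, b, word in reader]

print(header)  # ['first', 'second', 'third']
print(rows)    # [(1, 2, 'some'), (3, 4, 'numbers')]
```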

Similarly, you can use the `csv` module to write a CSV file using the [`writer`](https://docs.python.org/3/library/csv.html#csv.writer) function. The resulting object has a `writerow` method, which takes an *iterable* of values to write as the current row:

In [14]:
import csv
import pathlib as pl

csv_file = pl.Path("./tutorial/tests/data/example1.csv")

# newline="" lets the csv module write its own line endings
with open(csv_file, "w", newline="") as output_file:
    writer = csv.writer(output_file)
    # Write the header
    writer.writerow(["this", "is", "data"])
    # Write one row per value of i
    for i in range(10):
        writer.writerow((i, i + 1, i + 2))
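The writer object also has a `writerows` method that writes every row of an iterable in one call. The sketch below writes into an `io.StringIO` buffer instead of a file, so we can inspect exactly what gets written. By default, the `csv` module ends each row with `"\r\n"`, which is why the csv documentation recommends opening files with `newline=""`.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["this", "is", "data"])
# writerows writes every row from an iterable of rows
writer.writerows((i, i + 1, i + 2) for i in range(3))

print(repr(buffer.getvalue()[:14]))  # 'this,is,data\r\n'
print(buffer.getvalue().splitlines())
# ['this,is,data', '0,1,2', '1,2,3', '2,3,4']
```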
        

## Exercises

In [15]:
%reload_ext tutorial.tests.testsuite

### Exercise 1: CSV to dictionary 🌶️🌶️
Write a function that reads a CSV from the file [example](./tutorial/tests/data/example.csv) and returns a `dict` (dictionary) where the keys are the column names and the values are the lists of values in each column, without converting any data types.
If the file contains the following lines:
```
a,b
1,2
3,4
5,6
```
this function should return:
```python
{"a":["1", "3", "5"], "b": ["2", "4", "6"]}
```

<div class="alert alert-block alert-info">
    Hints:
    <ul>
        <li>
            To facilitate your solution, all values of the dictionary should be lists of <code>str</code>. This is because
            <code>csv.reader</code> reads each column as string by default.
        </li>
        <li>
            Calling <code>next(csv_reader)</code> immediately after you have created the <code>csv.reader</code> object <code>csv_reader</code> returns the header of the file and advances the reader to the first data row.
        </li>
        <li>
            Consider the function <code>itertools.zip_longest</code>.
        </li>
        <li>
            You receive the file to test as the first argument, <code>f</code>, of the function skeleton <code>solution_exercise1</code> below.
        </li>
    </ul>
</div>






The `csv.writer` example in the previous section wrote a CSV file with three columns named `this`, `is` and `data`, and with the entries `i, i+1, i+2`, where `i` ranges from 0 to 9.

In [16]:
%%ipytest input_output
import pathlib as pl
def solution_exercise1(f: pl.Path) -> "dict[str, list[str]]":
    """
    Write your solution here. 
    f is the path to the file to read from
    """
    pass

[31mF[0m[31m                                                                        [100%][0m
[31m[1m______________________ test_exercise1[solution_exercise1] ______________________[0m
[1m[31mtutorial/tests/test_input_output.py[0m:60: in test_exercise1
    [94massert[39;49;00m function_to_test(f) == reference_solution_exercise1(f)[90m[39;49;00m
[1m[31mE   AssertionError: assert None == {' second': ['2', '4', '6'], ' third': [' some', ' numbers', ' here'], 'first': ['1', '3', '5']}[0m
[1m[31mE    +  where None = <function solution_exercise1 at 0x7f32f0b59f80>(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/example.csv'))[0m
[1m[31mE    +  and   {' second': ['2', '4', '6'], ' third': [' some', ' numbers', ' here'], 'first': ['1', '3', '5']} = reference_solution_exercise1(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/example.csv'))[0m
[31mFAILED[0m tutorial/tests/test_input_output.py::[1mtest_exercise1[solution_exercise1][0m - AssertionErro

### Exercise 2: Counting words 🌶️
Write a function to read all the lines from [`lines.txt`](./tutorial/tests/data/lines.txt) and count the number of words in the file. The function should return this count as a single `int`.

For example, for the file
```
this 
file 
has 
five
lines
```
the result should be `5`. 

<div class="alert alert-block alert-info">
    Hints:
    <ul>
        <li>
          The file is available as the input <code>f</code> of <code>solution_exercise2</code>.
        </li>
    </ul>
</div>




In [17]:
%%ipytest input_output
import pathlib as pl
def solution_exercise2(f: pl.Path) -> int:
    """
    Write your solution here. 
    f is the path to the file to read from
    """
    pass

[31mF[0m[31m                                                                        [100%][0m
[31m[1m______________________ test_exercise2[solution_exercise2] ______________________[0m
[1m[31mtutorial/tests/test_input_output.py[0m:69: in test_exercise2
    [94massert[39;49;00m function_to_test(f) == reference_solution_exercise2(f)[90m[39;49;00m
[1m[31mE   AssertionError: assert None == 12[0m
[1m[31mE    +  where None = <function solution_exercise2 at 0x7f32f0b58900>(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/lines.txt'))[0m
[1m[31mE    +  and   12 = reference_solution_exercise2(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/lines.txt'))[0m
[31mFAILED[0m tutorial/tests/test_input_output.py::[1mtest_exercise2[solution_exercise2][0m - AssertionError: assert None == 12
[31m[31m[1m1 failed[0m[31m in 0.01s[0m[0m



 
### Exercise 3: Letter statistics 🌶️🌶️
Write a function that reads all the lines from [`lines.txt`](./tutorial/tests/data/lines.txt) and returns statistics in this form:
- An **alphabetically sorted** dictionary with `letter: count` for each letter in the words; for example, `{'a': 5}` means that the letter `a` appeared five times in this file.



<div class="alert alert-block alert-info">
    Hints:
    <ul>
        <li>
          The path to the file is available as the argument <code>f</code> of <code>solution_exercise3</code>.
        </li>
        <li>
            You can use functions from <code>itertools</code> to combine all lines of the file into a single stream of characters and to group identical letters together.
    Consider <code>chain</code> and <code>groupby</code>. Be careful: <code>groupby</code> only groups <i>consecutive</i> equal elements, so the input iterable must be sorted first, for example with the <code>sorted</code> function.
        </li>
        <li>
            To check whether a character is a letter, use the <code>isalpha</code> method:
    <code>'a'.isalpha()</code> returns <code>True</code>.
        </li>
    </ul>
</div>
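Below is a small sketch of how `chain`, `sorted` and `groupby` from the hints fit together. The list `lines` is a hypothetical stand-in for the lines of the file. The sketch counts uppercase and lowercase letters separately, because the exercise does not say whether case should be ignored.

```python
from itertools import chain, groupby

# Hypothetical stand-in for the lines read from the file
lines = ["banana\n", "apple\n"]

# chain.from_iterable flattens the lines into one stream of characters;
# isalpha() drops the newlines and anything else that is not a letter
letters = sorted(c for c in chain.from_iterable(lines) if c.isalpha())

# groupby only merges *consecutive* equal items, hence the sorted() call above
counts = {letter: len(list(group)) for letter, group in groupby(letters)}
print(counts)  # {'a': 4, 'b': 1, 'e': 1, 'l': 1, 'n': 2, 'p': 2}

# Without sorting, the same letter can end up in several separate groups
unsorted_groups = [(k, len(list(g))) for k, g in groupby("aba")]
print(unsorted_groups)  # [('a', 1), ('b', 1), ('a', 1)]
```

Python dictionaries preserve insertion order, so building `counts` from sorted input already produces an alphabetically ordered dictionary.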

In [18]:
%%ipytest input_output
import pathlib as pl
def solution_exercise3(f: pl.Path) -> "dict[str, int]":
    pass

[31mF[0m[31m                                                                        [100%][0m
[31m[1m______________________ test_exercise3[solution_exercise3] ______________________[0m
[1m[31mtutorial/tests/test_input_output.py[0m:79: in test_exercise3
    [94massert[39;49;00m function_to_test(f) == reference_solution_exercise3(f)[90m[39;49;00m
[1m[31mE   AssertionError: assert None == {'a': 3, 'd': 1, 'e': 7, 'f': 2, ...}[0m
[1m[31mE    +  where None = <function solution_exercise3 at 0x7f32f0b5aa20>(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/lines.txt'))[0m
[1m[31mE    +  and   {'a': 3, 'd': 1, 'e': 7, 'f': 2, ...} = reference_solution_exercise3(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/lines.txt'))[0m
[31mFAILED[0m tutorial/tests/test_input_output.py::[1mtest_exercise3[solution_exercise3][0m - AssertionError: assert None == {'a': 3, 'd': 1, 'e': 7, 'f': 2, ...}
[31m[31m[1m1 failed[0m[31m in 0.00s[0m[0m



### Exercise 4: Translating words 🌶️🌶️
Write a function that takes the words from `english.csv` and translates them to Italian using the dictionary file `dict.csv`. The output should be a list of `(english, italian)` tuples, one for each word that has a translation. Words that are not found in the dictionary are skipped.
For example, given the `english.csv` file:

```
bread
cat
```

and the `dict.csv` file:

```
120, pane, bread
121, sole, sun
```

the result should be:

`[("bread", "pane")]`


<div class="alert alert-block alert-info">
    Hints:
    <ul>
        <li>
            Try to load the dictionary file only once. The expression <i>dictionary file</i> hints at which Python data structure is best suited to store the translations.
        </li>
        <li>
            The path to the input file <code>english.csv</code> is available as the argument <code>english</code> of the function <code>solution_exercise4</code>, and the path to <code>dict.csv</code> as the argument <code>dictionary</code>.
        </li>
    </ul>
</div>
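Below is a minimal sketch of loading a dictionary file once into a Python `dict` with the standard `csv` module. It assumes that `dict.csv` uses the `id, italian, english` layout shown in the example above, with a space after each comma and no header row. An in-memory `io.StringIO` replaces the real file.

```python
import csv
import io

# In-memory stand-in for dict.csv, assuming the "id, italian, english" layout above
dict_csv = io.StringIO("120, pane, bread\n121, sole, sun\n")

# skipinitialspace=True drops the blank that follows each comma
reader = csv.reader(dict_csv, skipinitialspace=True)

# Map english -> italian; the leading id column is ignored
translations = {english: italian for _id, italian, english in reader}
print(translations)          # {'bread': 'pane', 'sun': 'sole'}
print("cat" in translations)  # False
```

A dictionary lookup takes roughly constant time on average, so checking every English word against `translations` does not require rereading the file.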


In [19]:
%%ipytest input_output
import pathlib as pl
def solution_exercise4(english: pl.Path, dictionary: pl.Path) -> "list[tuple[str, str]]":
    """
    Write your solution here
    """
    pass

[31mF[0m[31m                                                                        [100%][0m
[31m[1m______________________ test_exercise4[solution_exercise4] ______________________[0m
[1m[31mtutorial/tests/test_input_output.py[0m:95: in test_exercise4
    [94massert[39;49;00m function_to_test(words, dictionary) == reference_solution_exercise4(words, dictionary)[90m[39;49;00m
[1m[31mE   AssertionError: assert None == [('the', 'il'), ('of', 'di'), ('to', 'a'), ('and', 'e'), ('a', 'un'), ('in', 'in'), ...][0m
[1m[31mE    +  where None = <function solution_exercise4 at 0x7f32f0b5b740>(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/english.csv'), PosixPath('/home/basi/python-tutorial/tutorial/tests/data/dict.csv'))[0m
[1m[31mE    +  and   [('the', 'il'), ('of', 'di'), ('to', 'a'), ('and', 'e'), ('a', 'un'), ('in', 'in'), ...] = reference_solution_exercise4(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/english.csv'), PosixPath('/home/basi/python-

### Exercise 5: Binary format 🌶️🌶️🌶️
The file `secret_message.dat` contains a secret message, stored in binary format as a sequence of bytes. The message is preceded by the byte sequence `b'\xff\xee\xdd\xcc\xbb\xaa'` and followed by the byte sequence `b'\xaa\xbb\xcc\xdd\xee\xff'`.
Write a function that reads the file and returns **only** the secret message as a string.


<div class="alert alert-block alert-info">
    <ul>
        <li>
        The path to the input file is available as the argument <code>secret_file</code> of the function <code>solution_exercise5</code>
        </li>
        <li>
            Every <code>bytes</code> object has a <code>decode</code> method that converts it into a <code>str</code>. For example,
            <code>b"\x48\x69".decode()</code> returns <code>'Hi'</code>.
        </li>
    </ul>

</div>
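Below is a sketch of locating data between two byte markers, using the markers from the exercise on a made-up payload. It assumes that the file contains exactly one start marker followed by one end marker, and that the message is UTF-8 text. In the exercise, the bytes would come from the file, read in binary mode with `open(secret_file, "rb")` or `secret_file.read_bytes()`.

```python
START = b"\xff\xee\xdd\xcc\xbb\xaa"
END = b"\xaa\xbb\xcc\xdd\xee\xff"

# Hypothetical payload: some noise, the start marker, a message, the end marker, more noise
data = b"\x00\x01" + START + "Ciao!".encode("utf-8") + END + b"\x02"

# bytes.index raises ValueError if a marker is missing
begin = data.index(START) + len(START)
# Search for END only after the start marker, since both markers share the byte \xaa
end = data.index(END, begin)

message = data[begin:end].decode("utf-8")
print(message)  # Ciao!
```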

In [20]:
(3).to_bytes(1, "big").decode("utf-8")

'\x03'

In [21]:
%%ipytest input_output
import pathlib as pl
def solution_exercise5(secret_file: pl.Path) -> str:
    """
    Write your solution here
    """
    pass

[31mF[0m[31m                                                                        [100%][0m
[31m[1m______________________ test_exercise5[solution_exercise5] ______________________[0m
[1m[31mtutorial/tests/test_input_output.py[0m:104: in test_exercise5
    [94massert[39;49;00m function_to_test(message) == reference_solution_exercise5(message)[90m[39;49;00m
[1m[31mE   AssertionError: assert None == 'Congratulations, you found the secret message!'[0m
[1m[31mE    +  where None = <function solution_exercise5 at 0x7f32f0b5a2a0>(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/secret_message.dat'))[0m
[1m[31mE    +  and   'Congratulations, you found the secret message!' = reference_solution_exercise5(PosixPath('/home/basi/python-tutorial/tutorial/tests/data/secret_message.dat'))[0m
[31mFAILED[0m tutorial/tests/test_input_output.py::[1mtest_exercise5[solution_exercise5][0m - AssertionError: assert None == 'Congratulations, you found the secret messa...
[3