# Input / Output

# Table of Contents
  - [Input / Output](#Input-/-Output)
  - [Table of Contents](#Table-of-Contents)
    - [References](#References)
    - [Introduction](#Introduction)
    - [String input and output ](#String-input-and-output)
      - [String output](#String-output)
        - [Quiz on string output:](#Quiz-on-string-output:)
      - [String input](#String-input)
        - [Exercises on string input and output](#Exercises-on-string-input-and-output)
    - [File I/O](#File-I/O)
      - [Paths](#Paths)
        - [Quiz on Paths](#Quiz-on-Paths)
        - [Exercises on paths](#Exercises-on-paths)
      - [Reading from a file](#Reading-from-a-file)
        - [Quiz on file reading](#Quiz-on-file-reading)
        - [Exercises on file reading](#Exercises-on-file-reading)
      - [Writing to a file](#Writing-to-a-file)
        - [Quiz on file writing](#Quiz-on-file-writing)
        - [Exercises on file writing](#Exercises-on-file-writing)
      - [Context managers](#Context-managers)
        - [Quiz on context managers](#Quiz-on-context-managers)
      - [Binary I/O](#Binary-I/O)
        - [Bytes and strings](#Bytes-and-strings)
        - [Converting bytes to text ](#Converting-bytes-to-text)
      - [Reading/Writing CSV files](#Reading/Writing-CSV-files)
        - [Quiz on CSV](#Quiz-on-CSV)
    - [Exercises](#Exercises)
      - [Exercise 1: CSV to dictionary (easy)](#Exercise-1:-CSV-to-dictionary-(easy))
      - [Exercise 2: Counting words (easy)](#Exercise-2:-Counting-words-(easy))
      - [Exercise 3: Letter statistics (medium)](#Exercise-3:-Letter-statistics-(medium))
      - [Exercise 4: Translating words (medium)](#Exercise-4:-Translating-words-(medium))
      - [Exercise 5: Binary format (hard)](#Exercise-5:-Binary-format-(hard))



## References
- [Input and output tutorial](https://docs.python.org/3/tutorial/inputoutput.html) from the official Python tutorial.
- [Is explicitly closing files important?](https://stackoverflow.com/questions/7395542/is-explicitly-closing-files-important)
- [Context managers](https://docs.python.org/3/library/contextlib.html) from the Python standard library.

---

## Introduction
An important aspect to learn in a new programming language is input/output (or I/O). For our program to be useful, we need to be able to interact with the outside world. This means performing operations like:
- reading and writing data to a file
- printing text output
- asking for user input
- connecting to databases or other network services


The majority of these operations are covered by the Python standard library. We are going to see how to use them in this chapter.

## String input and output 

### String output
The most basic I/O operation in Python is displaying text on the Python console. This is done using the `print` function:

In [None]:
print("I am some text")

It is also possible to print any other Python object using `print`.

In [None]:
print([])
print([1,2,3])
print({"key": "value"})
print(lambda x: x)
print(("some", "tuple"))

If we want to display the value of a variable in a string, the most convenient way is **string interpolation**.
To do so, we prepend `f` to a string literal, which can then contain references to the variables we want to print, enclosed in `{}`:

In [None]:
my_var = 3
my_string = f"my_var is {my_var}"
print(my_string)
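
Beyond plain variable names, the braces of an f-string accept any expression, optionally followed by a format specifier after a colon. The following is a minimal sketch; the variable names and values are made up for illustration only.

```python
# Hypothetical values, chosen only for illustration
radius = 2.5
items = ["a", "b", "c"]

# Any expression can appear inside the braces
area_text = f"area is {3.14159 * radius ** 2:.2f}"  # ":.2f" keeps two decimals
count_text = f"there are {len(items)} items"
debug_text = f"{radius=}"  # Python 3.8+: shows both the name and the value

print(area_text)
print(count_text)
print(debug_text)
```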

#### Quiz on string output:


In [None]:
import tutorial.quiz.input_output as op
op.StringOutput()

### String input

The complementary operation, prompting for *user input*, is done through `input`.
The function takes one string as an argument, which gives the prompt displayed on the console.
The return value of the function contains the input the user typed:

In [None]:
user_input = input("Write some text")
print(f"The user wrote {user_input}")


<div class="alert alert-block alert-danger">
    <h4><b>Important</b></h4>
    ⚠️ Note that <code>input</code> is a <b>blocking operation</b>. The program execution stops and waits until the user provides an input. Therefore, use <code>input</code> <b>carefully</b> and only when truly necessary. A common issue is that someone first writes an interactive program expecting user input through <code>input</code> and later integrates it into a larger application that is supposed to run automatically without any user interaction. Suddenly, the application stops somewhere and does not run further because of a well-hidden <code>input</code> call...<br><br>

When a program needs user-supplied values, it is more common to read them from an input file (reading from a file will come later). If you have to use <code>input</code>, do not call it in the middle of a long-running computation or in your model code; ask for all user inputs upfront. For an interactive application, there are more robust solutions.
</div>
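
One possible way to follow this advice is to keep all `input` calls in a small function at the start of the program and to pass the collected values to the rest of the code as ordinary arguments. The sketch below uses hypothetical names (`ask_settings`, `run_model`) and lets the prompt function be swapped out, so that the computation can also run unattended.

```python
from typing import Callable


def ask_settings(ask: Callable[[str], str] = input) -> dict:
    """Collect all user inputs upfront (hypothetical helper)."""
    name = ask("Name: ")
    steps = int(ask("Number of steps: "))
    return {"name": name, "steps": steps}


def run_model(name: str, steps: int) -> str:
    """The computation never calls input(); it only uses its arguments."""
    total = sum(range(steps))
    return f"{name}: {total}"


# Non-interactive use: the answers come from a list instead of the keyboard
answers = iter(["Ada", "5"])
settings = ask_settings(lambda prompt: next(answers))
result = run_model(**settings)
print(result)
```

Calling `ask_settings()` without arguments falls back to the built-in `input` and prompts on the console as usual.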

#### Quiz on string input

In [None]:
import tutorial.quiz.input_output as op
op.StringInput()

#### Exercises on string input and output

In [None]:
%reload_ext tutorial.tests.testsuite

1. Complete the function `solution_print_odd` to **print** all the *odd* numbers between 0 and `n`, **without `n`**. The value of `n` is provided as the parameter of the function `solution_print_odd`.

In [None]:
%%ipytest

def solution_print_odd(n: int) -> None: 
    """Prints all odd numbers from 1 to n
    
    Args:
        n : The maximum number to print (exclusive)

    Returns:
        - None (prints to console)
    """

2. Complete the function `solution_print_salutation`. This function prompts the user for their name using `input` and should **print** the text "Hello, `<name>`", where `<name>` is replaced with the value read from the input.

<div class="alert alert-block alert-warning">
    <h4><b>Note</b></h4>
    There is no test for this exercise because we cannot easily interact with <code>input</code> in our testing framework. You can check visually if the input does what is expected.
</div>

In [None]:
def solution_print_salutation() -> None:
    """Prints a salutation to the console

    Takes no arguments but takes input with the input() function
    The salutation is Hello, {name} where {name} is the user input

    Returns:
        - None (prints to console)
    """

solution_print_salutation()

## File I/O
A second common I/O operation in most programming languages is reading from and writing to files. This is more complex than just writing on the console and consists of several steps:

1. You need to find *where* in the operating system the file is located. This gives you the so-called *path* to the file.
2. You need to *open* the file for reading or writing. In some operating systems, this *locks* the file, so that other users or processes cannot write simultaneously with you. This operation gives you a *file handle* that you can use to read or write.
3. Now, you can read (or write) the contents of the file. 
4. Finally, you need to *close* the file to make it accessible to other processes again and to finalize the writing. In some implementations, `write` only writes the text to a temporary location in memory (a so-called *buffer*) and only writes the content to the file when you close it. This is done to improve performance, as writing to memory is faster than writing to a file on the disk.


As a trick to remember closing files: *a file handle is like a door handle*. 
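
The four steps above can be sketched as follows. This is only an illustration: it writes to a temporary directory (an assumption made to keep the example self-contained) and uses `try`/`finally` so that the file is closed even if writing fails.

```python
import pathlib as pl
import tempfile

# Step 1: build the path (a temporary directory keeps the example self-contained)
tmp_dir = tempfile.TemporaryDirectory()
path = pl.Path(tmp_dir.name) / "steps.txt"  # hypothetical file name

# Step 2: open the file for writing; this returns the file handle
handle = open(path, "w")
try:
    # Step 3: write; the text may sit in a buffer for now
    handle.write("some text\n")
finally:
    # Step 4: close, which flushes the buffer to the file
    handle.close()

written = path.read_text()
print(handle.closed)
print(repr(written))

# Remove the temporary directory again
tmp_dir.cleanup()
```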

### Paths
To describe *where* a file is located, Python offers the `pathlib` module, which represents *paths* to files in a portable manner.
This means that you can use one Python object to represent the location of a file regardless of the operating system, and Python will convert it into a representation specific to the operating system where you are executing the program.
For example, Windows uses `\` as a separator for path elements, while Unix/Linux use `/`.
If you use `pathlib`, you can write code using only `/`, and `pathlib` will automatically select the correct path separator for the system where the program runs.
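
To see the separator handling without switching operating systems, one option is to use the "pure" path classes of `pathlib`, which only manipulate path strings and never access the disk. A small sketch:

```python
from pathlib import PurePosixPath, PureWindowsPath

parts = ("tutorial", "tests", "data", "hello.txt")

# Pure paths never touch the file system, so both flavours
# can be created on any operating system.
posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)    # uses "/" between the components
print(windows_path)  # uses "\" between the components
```

In everyday code you would normally use `Path`, which picks the flavour matching the system the program runs on.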

Let's see how to generate a `path` object pointing to the text file [`hello.txt`](./tutorial/tests/data/hello.txt) in the `tutorial/tests/data` directory of this repository.
To do so, we use the [`Path`](https://docs.python.org/3/library/pathlib.html#basic-use) class of `pathlib` module, whose constructor takes the location of the file as a string:



In [None]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/hello.txt")

The `./` at the beginning of the path indicates that this path is a *relative path*. The `.` represents the directory we are working in, so where we are executing the program. This is just a path that points to a file *relative* to the directory we are working in. 

The `path` object has a method `absolute` that returns the *absolute* full path of the object in the OS:

In [None]:
print(path.absolute())

`path` objects also offer additional attributes and methods, such as:
1. `name` to get only the name of the file
2. `parent` to get the parent folder of the current file
3. `suffix` to get the suffix (file extension) of a file
4. `is_dir()` to check whether the path is a directory
5. `is_file()` to check whether the path is a file
6. `iterdir()` to iterate over all files (and directories) in the current path, if the path is a directory
7. Many more methods are listed [here](https://docs.python.org/3/library/pathlib.html#general-properties)

Below, we see some of these methods at work:

In [None]:
import pathlib as pl
p = pl.Path("tutorial/tests/data/hello.txt")
print(p.name)
print(p.parent)
print(p.suffix) 
print(p.is_dir())
print(list(p.parent.iterdir()))
print(p.is_file())

If we prefer, we can also construct paths by combining their components with the `/` operator. For example, we can do:

In [None]:
import pathlib as pl

path = pl.Path("./tutorial/tests") / pl.Path("data")
print(path.absolute())


This also works to combine strings and paths, and even just strings, provided that the first or second operand in the chain of `/` is a `Path` object:

In [None]:
#This works
p1 = pl.Path(".") / "tutorial/tests/data/hello.txt"
#This too
p2 = pl.Path(".") / "tutorial" / "tests" / "data" / "hello.txt"
#This as well
p3 = "." / pl.Path("tutorial") / "tests" / "data" / "hello.txt"

print(p1, p2, p3)
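
The condition matters because Python evaluates a chain of `/` from left to right. If the first two operands are both plain strings, Python attempts `str / str` before any `Path` is involved, and that operation is not defined. A short sketch of both cases:

```python
import pathlib as pl

# Works: the second operand is a Path, so the first "/" already yields a Path
ok = "tutorial" / pl.Path("tests") / "data"
print(ok)

try:
    # Fails: the first "/" only sees two strings
    broken = "tutorial" / "tests" / pl.Path("data")
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")
```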

Sometimes one must refer to the parent folder of a file. A `..` in a path refers to the parent folder of the current location.

In [None]:
#Although different paths, these lead to the same file:
p1 = pl.Path("./tutorial/tests/data/hello.txt")
p2 = pl.Path("./tutorial/tests/../../tutorial/tests/data/hello.txt")

print(f"The paths {p1} and {p2} lead to the same file: {p1.samefile(p2)}")

We can also list all files matching a certain pattern in a given directory using the `glob` method. 
The method returns an `iterable` of paths that can be turned into a list like this:

In [None]:
[f for f in path.glob("*")]

The `glob` method takes a `pattern` argument that expresses the form of the file names to look for.
The `*` star means *match everything*. For more information on *glob patterns*, see the documentation of [fnmatch](https://docs.python.org/3/library/fnmatch.html#module-fnmatch).
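
As an illustration of a more selective pattern, the sketch below creates a few empty files with made-up names in a temporary directory and then lists only those ending in `.txt`.

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = pl.Path(tmp)
    # Create a few empty files with hypothetical names
    for name in ("a.txt", "b.txt", "c.csv"):
        (folder / name).touch()

    # "*.txt" matches every name ending in .txt; sorted() gives a stable order
    txt_files = sorted(p.name for p in folder.glob("*.txt"))
    all_files = sorted(p.name for p in folder.glob("*"))

print(txt_files)
print(all_files)
```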

#### Quiz on Paths


In [1]:
import tutorial.quiz.input_output as op
op.Paths()


#### Exercises on paths

In [2]:
%reload_ext tutorial.tests.testsuite


1. Modify the function `solution_find_all_files` to find all files and directories in the [data](./tutorial/tests/data/) directory (`./tutorial/tests/data/`) and return them as a list of `Path` objects.

<div class="alert alert-block alert-info">
    <b>Hint:</b> The path to the data directory is available as the argument <code>current_path</code> of the function <code>solution_find_all_files</code>
</div>

In [3]:
%%ipytest

from pathlib import Path

def solution_find_all_files(current_path: Path) -> list[Path]: 
    """Finds all files in a directory

    Args:
        current_path : The directory to search for files

    Returns:
        - A list of all files in the directory
    """
    return


2. Modify the function `solution_check_for_file` to check if there is a file named `file_name` in the directory `current_path`. Return `True` if there is one and `False` if there isn't.

In [9]:
%%ipytest

from pathlib import Path

def solution_check_for_file(current_path: Path, file_name: str) -> bool: 
    """Checks if a file exists in a directory

    Args:
        current_path : The directory to search for the file
        file_name : The name of the file to search for

    Returns:
        - True if the file exists, False otherwise    
    """
    
    return



3. Modify the function `solution_count_dirs` to count **all the directories** (including hidden ones) in the `input_path`. Return the total number of directories found as an integer.

In [10]:
%%ipytest

from pathlib import Path

def solution_count_dirs(input_path: Path) -> int: 
    """Counts the number of directories in a directory

    Args:
        input_path : The directory to search for directories

    Returns:
        - The number of directories in the directory
    """




### Reading from a file

We now want to learn how to read text from a file. 
Let's see how to do this with an example: we want to open the file [hello.txt](./data/hello.txt) and read its contents.

1. The path is already identified: we know the file is in `./tutorial/tests/data/hello.txt`. We save this in a variable `path`.
2. We now can use the built-in function [`open`](https://docs.python.org/3/library/functions.html#open) to open the file. This function returns a [file object](https://docs.python.org/3/glossary.html#term-file-object) that we can use to further manipulate the file. To ensure we only open the file for reading, we pass the string "r" to the second argument of `open`.
3. Now we can read the contents using `read`, `readline` or `readlines`. `read` reads the whole file content into a single string, `readline` reads one line, while `readlines` reads the whole file content as a list of strings, one item per line in the file. This knowledge is useful when we only want to read part of a file or when the file is too big to fit in memory and we can only read parts.
4. Finally, we close the file using the `close` method on the file object.


In [11]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/hello.txt")
file_ob = open(path, 'r')
contents = file_ob.readlines()
file_ob.close()
print(contents)

['Hello, I am a file with some text.\n']
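
To compare `read`, `readline` and `readlines` side by side, here is a small sketch. It first creates a three-line file in a temporary directory; the file name and its content are assumptions made for the example.

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = pl.Path(tmp) / "three_lines.txt"  # hypothetical file name
    path.write_text("one\ntwo\nthree\n")

    file_ob = open(path, "r")
    whole = file_ob.read()      # the entire content as one string
    file_ob.close()

    file_ob = open(path, "r")
    first = file_ob.readline()  # only the first line
    rest = file_ob.readlines()  # the remaining lines as a list
    file_ob.close()

print(repr(whole))
print(repr(first))
print(rest)
```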


Notice that calling `read`, `readline` or `readlines` *consumes* the file, either fully or to the corresponding location. This means that if we call `readlines` twice, we will get an empty list the second time:

In [12]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/hello.txt")
file_ob = open(path, 'r')
contents = file_ob.readlines()
print(contents)
other_contents = file_ob.readlines()
print(other_contents)
file_ob.close()


['Hello, I am a file with some text.\n']
[]


We can use this to read a file line-by-line by just iterating over the file using a `for` loop or a list comprehension. The `file` object implements the [iterator](https://docs.python.org/3/glossary.html#term-iterator) protocol:

In [13]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/lines.txt")
file_ob = open(path, 'r')
for line in file_ob:
    print(line)
file_ob.close()

this

file

has multiple

lines

how many

lines

does this file

have?



This is the most *pythonic* way to read a file line-by-line instead of reading the full contents at once.
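
The blank lines in the output above appear because each line read from the file keeps its trailing `\n`, and `print` adds another one. A possible way to avoid this is to strip the newline while iterating, as in the following sketch (the file is created on the fly in a temporary directory, with made-up content):

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = pl.Path(tmp) / "lines.txt"
    path.write_text("this\nfile\nhas multiple\nlines\n")

    file_ob = open(path, "r")
    # rstrip("\n") drops the trailing newline kept by the file iterator
    cleaned = [line.rstrip("\n") for line in file_ob]
    file_ob.close()

for line in cleaned:
    print(line)
```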

#### Quiz on file reading

In [14]:
import tutorial.quiz.input_output as op
op.ReadFiles()


#### Exercises on file reading

In [15]:
%reload_ext tutorial.tests.testsuite

1. Modify the function `solution_read_file` to return the content of the file passed as the `input_file` argument. Return the content as a **list of strings**, one string per line.


In [18]:
%%ipytest

from pathlib import Path

def solution_read_file(input_file: Path) -> list[str]: 
    """Reads the contents of a file

    Args:
        input_file : The file to read

    Returns:
        - A list of strings, each representing a line in the file
    """
    return


### Writing to a file
The process to write data to a file is very similar, the main differences being that:
- We use `w` as the second argument of `open` to specify that we want to write to the file. If the file already exists, it will be erased before we write something else to it. If you want to append to the file, you should use `a` instead.
- We use `write` to write a *string* to the file. Other types of object should be converted to string before being written.


Let's see this in action by writing your name in a file called `me.txt` in [data](./data/):

In [19]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/me.txt")
file_ob = open(path, "w")
file_ob.write("Simone")
file_ob.close()

Congratulations! Your name is now written in stone. 

If we want to write the contents of an *iterable* to a file, we can use the `writelines` method:

In [20]:
import pathlib as pl

path = pl.Path("./tutorial/tests/data/numbers.txt")
file_ob = open(path, "w")
file_ob.writelines([str(i) + "\n" for i in range(10)])
file_ob.close()

Notice that we append the newline character `\n` to each string, so that every item is written on its own line; `writelines` does not add line separators itself.
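
The difference between the `w` and `a` modes can be checked with a small experiment. The sketch below works in a temporary directory, and the file name is an assumption made for the example.

```python
import pathlib as pl
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = pl.Path(tmp) / "log.txt"  # hypothetical file name

    file_ob = open(path, "w")
    file_ob.write("first\n")
    file_ob.close()

    # "w" again: the previous content is erased
    file_ob = open(path, "w")
    file_ob.write("second\n")
    file_ob.close()

    # "a": new text is added after the existing content
    file_ob = open(path, "a")
    file_ob.write("third\n")
    file_ob.close()

    content = path.read_text()

print(repr(content))
```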


#### Quiz on file writing

In [21]:
import tutorial.quiz.input_output as op
op.WriteFiles()


#### Exercises on file writing

In [22]:
%reload_ext tutorial.tests.testsuite

1. Modify the function `solution_write_file` to write the sentence "python tutorial 2023" (**without quotes**) to the file `output_file`, which is available as a `Path` object as argument to the function.

In [23]:
%%ipytest

from pathlib import Path

def solution_write_file(output_file: Path) -> None:
    """Writes a string to a file

    String to be written is "python tutorial 2023"

    Args:
        output_file : The file to write to

    Returns:
        - None (writes to file)
    """


2. Modify the function `solution_read_write_file` to read the lines from the file `input_file` and write them in the form `line, length` to the file `output_file`. Here `line` is the line of text in `input_file` **without the line ending**, and `length` is the **number of characters** in that line. Line endings are stored in the file itself and depend on the platform where it was written: Windows uses **\r\n**, whereas Linux and macOS use only **\n**. To remove the line-ending characters, use ```strip("\r\n")```, e.g. ```line.strip("\r\n")```.

    If `input_file` contains these lines:
    
    ```
    first
    second
    ```
    
    we expect the output file to contain these lines:
    
    ```
    first, 5
    second, 6
    ```
    
    Do not forget to add a **\n** when writing lines in the file.

In [31]:
%%ipytest

from pathlib import Path

def solution_read_write_file(input_file: Path, output_file: Path) -> None:
    """Reads the contents of a file and writes it to another file adding a length count

    The length of each line is added to the end of the line with a comma and a space separating the line and the length

    Args:
        input_file : The file to read
        output_file : The file to write to

    Returns:
        - None (writes to file)
    """


### Context managers
As you can see, after *opening* a file and performing operations on it, we always have to remember to *close* it.
If we forget, unexpected behaviour can follow.
For example, if the program crashes later on, the text we wrote might never reach the file.
If you open many files without closing them, the process can hit the operating system's limit on open files, and each open file also holds on to memory.
On some operating systems, the file contents are only updated after closing.

This pattern is common when using many different types of *resources*: files, connections, threads, servers, etc.
You acquire access to the resource, do some work on it, and, finally, clean up after yourself by closing it again.
Because of this, Python offers a construct called [*context manager*](https://docs.python.org/3/reference/datamodel.html#context-managers) which implements exactly this behavior:
- Get access to a resource, e.g. open a file.
- Do some work with the resource.
- Release this resource, e.g close the file or the connection.

In the case of files, we can replace the open-read-close or open-write-close sequence  with a context manager.
Context managers are used in a `with` statement:

In [32]:
import pathlib as pl

hello_file = pl.Path("./tutorial/tests/data/hello.txt")
with open(hello_file, "r") as file_ob: 
    contents = file_ob.readlines()
    print(contents)

['Hello, I am a file with some text.\n']


`with open(path) as name` opens the file at `path` and binds the resulting file object to `name`.
The file stays open only inside the body of the `with` statement, that is, the indented block of code that follows the `:`.
Once the Python interpreter leaves this block, `file_ob.close()` is called automatically, even if an exception was raised inside it.
This means that you don't have to manually call `close` after `open` anymore, avoiding many potential bugs.

This pattern can be extended to any other resource that should be managed in a similar way, for example database connections. 
Any object that implements `__enter__` and `__exit__` can be used with the context manager syntax.

If you want to learn how to implement context managers for other types of objects, please refer to the `contextlib` [documentation](https://docs.python.org/3/library/contextlib.html) in the Python standard library.
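
As a rough illustration of the `__enter__`/`__exit__` protocol, the sketch below defines a toy class (the name `ManagedResource` is hypothetical) that records when it is acquired and released. It also shows that `__exit__` runs even when the body of the `with` statement raises an exception.

```python
class ManagedResource:
    """A toy resource that records what happens to it (hypothetical class)."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def __enter__(self) -> "ManagedResource":
        self.events.append("acquired")
        return self  # this is the object bound by "as"

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        self.events.append("released")
        return False  # False means: do not swallow exceptions


resource = ManagedResource()
try:
    with resource as res:
        res.events.append("working")
        raise ValueError("something went wrong")
except ValueError:
    pass  # the exception still reaches us, but only after __exit__ has run

print(resource.events)
```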

#### Quiz on context managers

In [33]:
import tutorial.quiz.input_output as op
op.ContextManagers()


### Binary I/O

Another aspect of file I/O is accessing files in [*binary mode*](https://docs.python.org/3/library/io.html#binary-i-o); that means that instead of writing and reading *text*, we manipulate `bytes` in order to represent non-textual data. This is useful for interacting with measurement data and other non-textual information like images, machine learning model parameters and other complex structures, although in most cases you won't need the low-level control of binary I/O and will use libraries instead.

To look at an example, let's write a sequence of `int` to `output.dat` as a sequence of bytes. Because one byte corresponds to 8 bits, using one byte per integer means we can unambiguously store `2**8 = 256` different values.

<div class="alert alert-block alert-warning">
    <h4><b>Note</b></h4>
    <code>.dat</code> is a typical generic extension to indicate that the file contains some sort of data. Filename extensions do not have any binding meaning by themselves; they are simply a convention for users to quickly see what content to expect.
</div>

In [34]:
import pathlib as pl

bin_file = pl.Path("./tutorial/tests/data/output.txt")
with open(bin_file, "wb") as out_file:
    bs = b"".join([i.to_bytes(1, 'little') for i in  range(10)])
    print(bs)
    out_file.write(bs)

b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t'


Note that we used the mode `wb`, which stands for *write, binary*. In this mode, the `write` method expects a [`bytes`](https://docs.python.org/3/library/stdtypes.html#bytes) object. A bytes literal is written by prepending `b` to a string literal. Here, we convert each integer to a single byte using [`to_bytes`](https://docs.python.org/3/library/stdtypes.html#int.to_bytes) and concatenate the results with the [`join`](https://docs.python.org/3/library/stdtypes.html#str.join) method called on the empty bytes literal `b""`.

Now that we wrote out our sequence, we can try to read it back from the file:

In [35]:
import pathlib as pl

bin_file = pl.Path("./tutorial/tests/data/output.txt")
with open(bin_file, "rb") as in_file:
    data = in_file.read()
    seq = [b for b in data]
    print(seq)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


And, surprise! We obtain our original sequence.
Note that the `read` method returns a `bytes` object, but accessing a single element of it (as we would do with a string) gives an `int` between 0 and 255, the numerical value of that byte.
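
The difference between indexing, slicing and iterating over a `bytes` object can be checked directly. A small sketch with arbitrary example values:

```python
data = bytes([72, 105, 255])

first = data[0]      # indexing gives an int between 0 and 255
head = data[:2]      # slicing gives another bytes object
values = list(data)  # iterating gives one int per byte

print(first, head, values)
```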


If instead we want to store our numbers using two bytes per number, we do:

In [36]:
import pathlib as pl
import struct

bin_file = pl.Path("./tutorial/tests/data/output1.txt")
with open(bin_file, "wb") as out_file:
    bs = b"".join([struct.pack(">H", i) for i in range(1024)])
    out_file.write(bs)

We do this by using the [`struct`](https://docs.python.org/3/library/struct.html) module of the Python standard library, which offers functions to convert built-in data types to and from different byte formats.

The first argument to `struct.pack` is the format used to encode the data. `>` means big-endian, that is, the most significant byte of each number comes first (`<` would select little-endian, with the least significant byte first). `H` means an *unsigned short*, which corresponds to two bytes.
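
In the `struct` format syntax, `>` selects big-endian and `<` selects little-endian byte order. The effect is easy to see by packing the same number both ways; the value 258 is chosen here only because its two bytes differ.

```python
import struct

number = 258  # 0x0102: high byte 0x01, low byte 0x02

big = struct.pack(">H", number)     # big-endian: most significant byte first
little = struct.pack("<H", number)  # little-endian: least significant byte first

print(big, little)

# Unpacking with the matching format recovers the original number
print(struct.unpack(">H", big), struct.unpack("<H", little))
```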

If we want to read the data, we can do:

In [37]:
import pathlib as pl
import struct

bin_file = pl.Path("./tutorial/tests/data/output1.txt")
with open(bin_file, "rb") as in_file:
    data = in_file.read()
    seq = [i for i, *rest in struct.iter_unpack(">H", data)]
    print(seq)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221,

#### Bytes and strings
Earlier, we mentioned the connection between bytes and strings in Python. To learn more about this, let's briefly review Unicode and its UTF-8 encoding.

Unicode is a standard for representing text: every character is assigned a number (the *codepoint*) that gives the location of this symbol in the table of all symbols. UTF-8 is one way of storing these codepoints as bytes: it encodes each character using one to four bytes (each byte is a group of 8 bits).

Let's see this more clearly with an example:

In [38]:
for i in range(0, 6):
    bt = i.to_bytes(4, 'little')
    print(f"bytes: {[hex(b) for b in bt]}, integer: {i}, string: {chr(i)}")
for i in range(60, 74):
    bt = i.to_bytes(4, 'little')
    print(f"bytes: {[hex(b) for b in bt]}, integer: {i}, string: {chr(i)}")
for i in range(8**3, 8**3 + 3):
    bt = i.to_bytes(4, 'little')
    print(f"bytes: {[hex(b) for b in bt]}, integer: {i}, string: {chr(i)}")

bytes: ['0x0', '0x0', '0x0', '0x0'], integer: 0, string:  
bytes: ['0x1', '0x0', '0x0', '0x0'], integer: 1, string: 
bytes: ['0x2', '0x0', '0x0', '0x0'], integer: 2, string: 
bytes: ['0x3', '0x0', '0x0', '0x0'], integer: 3, string: 
bytes: ['0x4', '0x0', '0x0', '0x0'], integer: 4, string: 
bytes: ['0x5', '0x0', '0x0', '0x0'], integer: 5, string: 
bytes: ['0x3c', '0x0', '0x0', '0x0'], integer: 60, string: <
bytes: ['0x3d', '0x0', '0x0', '0x0'], integer: 61, string: =
bytes: ['0x3e', '0x0', '0x0', '0x0'], integer: 62, string: >
bytes: ['0x3f', '0x0', '0x0', '0x0'], integer: 63, string: ?
bytes: ['0x40', '0x0', '0x0', '0x0'], integer: 64, string: @
bytes: ['0x41', '0x0', '0x0', '0x0'], integer: 65, string: A
bytes: ['0x42', '0x0', '0x0', '0x0'], integer: 66, string: B
bytes: ['0x43', '0x0', '0x0', '0x0'], integer: 67, string: C
bytes: ['0x44', '0x0', '0x0', '0x0'], integer: 68, string: D
bytes: ['0x45', '0x0', '0x0', '0x0'], integer: 69, string: E
bytes: ['0x46', '0x0', '0x0', '0x0']

The `chr` function takes an integer and returns the corresponding Unicode character.
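
The inverse of `chr` is `ord`, and the `encode` method of `str` shows how many bytes a character occupies in UTF-8. The sketch below uses four sample characters, written as escape sequences so that the example does not depend on the console encoding.

```python
# Sample characters (written as escapes) with a short description each
samples = {
    "A": "Latin capital letter A",
    "\u00e9": "e with acute accent",
    "\u20ac": "euro sign",
    "\U0001F600": "grinning face emoji",
}

byte_counts = []
for ch, description in samples.items():
    encoded = ch.encode("utf-8")
    byte_counts.append(len(encoded))
    print(f"U+{ord(ch):04X} ({description}): {len(encoded)} byte(s) {encoded}")
```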

#### Converting bytes to text 

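If we receive a `bytes` object and want to transform it to text, we can iterate over it to get each byte as an `int` and pass that value to the `chr` function. This works here because every character of the message is ASCII and occupies a single byte; for general text, use the `decode` method of `bytes` instead: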

In [46]:
message = "Ciao"
message_secret = bytes(message, "utf-8")
print_messages = [f"The `utf8` codepoint is = {enc}, the bytes representation = {enc.to_bytes(4, 'little')}, the representation is {chr(enc)}" for plain, enc in zip(message, message_secret)]
for msg in print_messages:
    print(msg)

The `utf8` codepoint is = 67, the bytes representation = b'C\x00\x00\x00', the representation is C
The `utf8` codepoint is = 105, the bytes representation = b'i\x00\x00\x00', the representation is i
The `utf8` codepoint is = 97, the bytes representation = b'a\x00\x00\x00', the representation is a
The `utf8` codepoint is = 111, the bytes representation = b'o\x00\x00\x00', the representation is o
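
For text that is not pure ASCII, converting byte by byte with `chr` no longer works, because one character can span several bytes. In that case the `encode` method of `str` and the `decode` method of `bytes` do the conversion in both directions. A minimal sketch with a made-up message:

```python
message = "citt\u00e0"         # the last letter (a with grave accent) is not ASCII
raw = message.encode("utf-8")  # text -> bytes
decoded = raw.decode("utf-8")  # bytes -> text

print(len(message), len(raw))  # number of characters, number of bytes
print(decoded == message)
```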


### Reading/Writing CSV files
If you have ever worked with tabular data, you have surely encountered CSV (comma-separated values) files.
These files store tables in text format row by row: each row is on its own line, and the columns inside a row are separated by commas `,` or by semicolons `;`.
The first line in the file usually contains the header giving the names of the columns:

```
first_column,second_column
1,2
2,3
```

As CSV is very common to exchange tabular data such as statistics and time series, the Python standard library offers facilities to read and write CSV files through the [csv](https://docs.python.org/3/library/csv.html) module.
Despite this, today most people prefer using [pandas](https://pandas.pydata.org/) or [polars](https://www.pola.rs/) to manipulate tabular data because they offer more convenience and faster handling of large datasets.
These packages are outside of the scope of this tutorial and will not be covered here.


Let's see how to read csv files using `csv` with an example by reading [example.csv](./data/example.csv):

In [47]:
import csv
import pathlib as pl

csv_file = pl.Path("./tutorial/tests/data/example.csv")
with open(csv_file) as input_file:
    reader = csv.reader(input_file)
    #Get the header
    header = next(reader)
    #Iterate over lines
    for line in reader:
        print(line)

['1', '2', 'some']
['3', '4', 'numbers']
['5', '6', 'here']


The `next` function, called on an iterator, returns the next value and advances the iterator. In the case of `csv.reader`, we use this to read the header.

The `csv` module does not interpret the data; by default, everything is read as `str`.
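
Since every field arrives as a string, numeric columns have to be converted explicitly. The sketch below uses `io.StringIO` in place of an open file so that it is self-contained; the column names and values are assumptions for illustration, not the actual content of `example.csv`.

```python
import csv
import io

# io.StringIO stands in for an open file; the content is made up
text = "x,y,label\n1,2,some\n3,4,numbers\n"
reader = csv.reader(io.StringIO(text))
header = next(reader)

rows = []
for x, y, label in reader:
    # Convert the numeric columns explicitly; csv leaves them as str
    rows.append((int(x), int(y), label))

print(header)
print(rows)
```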

Similarly, you can use the `csv` module to write a CSV file using the [`writer`](https://docs.python.org/3/library/csv.html#csv.writer) function. The `writer` object it returns has a `writerow` method, which takes an `iterable` of values to write as the current row:

In [None]:
import csv
import pathlib as pl

csv_file = pl.Path("./tutorial/tests/data/example1.csv")

# newline="" lets the csv module control line endings itself
with open(csv_file, "w", newline="") as output_file:
    writer = csv.writer(output_file)

    # Write the header
    writer.writerow(["this", "is", "data"])

    # Write the data rows
    for i in range(10):
        writer.writerow((i, i + 1, i + 2))
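
To inspect what the writer produces without touching the disk, you can write to an in-memory buffer. The following is a sketch under that assumption: `io.StringIO` stands in for the output file, and `writerows` is used to write several rows in one call.

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for a file opened with "w"
writer = csv.writer(buffer)

writer.writerow(["this", "is", "data"])
# writerows takes an iterable of rows and writes each of them
writer.writerows((i, i + 1, i + 2) for i in range(3))

written = buffer.getvalue()
print(written)
```

The numbers are converted to text on the way out, so the buffer contains the lines `this,is,data`, `0,1,2`, `1,2,3` and `2,3,4`.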
        

#### Quiz on CSV

In [None]:
import tutorial.quiz.input_output as op
op.CSV()

## Exercises

In [None]:
%reload_ext tutorial.tests.testsuite

### Exercise 1: CSV to dictionary (easy)

Write a function that reads a CSV file and returns a dictionary.

- The dictionary keys are in the first **column**
- The dictionary values are in the second **column**

For example, if the file contains the following lines:

```
key,value
a,2
b,4
c,6
```

this function should return the dictionary:

```python
{"a": "2", "b": "4", "c": "6"}
```

<div class="alert alert-block alert-info">
    <h4><b>Hint</b></h4>
    <ul>
        <li>
            The <code>csv.reader</code> reads each column entry as a string by default.
        </li>
        <li>
            Calling <code>next(csv_reader)</code> immediately after you have created the <code>csv.reader</code> object returns the header of the file and advances the reader to the first data row.
        </li>
        <li>
            The parameter <code>input_file</code> of the function skeleton <code>solution_exercise1</code> is the path of the input file. Read the file from that location.
        </li>
    </ul>
</div>






In [None]:
%%ipytest

import csv
import pathlib as pl

def solution_exercise1(input_file: pl.Path) -> dict[str, str]:
    """Reads a CSV file and returns a dictionary with each row representing a key, value pair

    The key is the first element of the row and the value is the second value of the row

    Args:
        input_file : The csv file to read

    Returns:
        - A dictionary with each row represented as a key, value pair
    """
    return

### Exercise 2: Counting words (easy)

Write a function that reads all the lines from `input_file` and counts the number of words in the file. The result should be a single number.

For example, for the file
```
this 
file 
has 
five
lines
```
the result should be `5`. 

<div class="alert alert-block alert-info">
    <h4><b>Hint</b></h4>
    <ul>
        <li>
            The file is available as the parameter <code>input_file</code> of the <code>solution_exercise2</code> function.
        </li>
        <li>
           A word consists of <b>printable</b> characters and contains no whitespace, line breaks, etc. Have a look at the basic_datatypes notebook if you forgot how to split a text into its words.
        </li>
    </ul>
</div>




In [None]:
%%ipytest

import pathlib as pl

def solution_exercise2(input_file: pl.Path) -> int:
    """Reads a file and returns the number of words in the file

    Args:
        input_file : The file to read

    Returns:
        - The number of words in the file
    """
    return

### Exercise 3: Letter statistics (medium)

Write a function that reads all the lines from `lines.txt` and counts the occurrences of every letter present in all the words.

The result should be a dictionary where each key-value pair has the form `{letter: count}`. For example, `{"a": 5}` means that the letter `a` appears five times in the file.

<div class="alert alert-block alert-info">
    <h4><b>Hints</b></h4>
    <ul>
        <li>
          The file is available as the <code>input_file</code> parameter of the <code>solution_exercise3</code> function.
        </li>
        <li>
            You may assume all letters are lower case and contained in the collection <code>string.ascii_lowercase</code>.
        </li>
        <li>
            To verify if a character is a letter, you can use the <code>isalpha()</code> string method. Example: 
    <code>'a'.isalpha()</code> will return <code>True</code>.
        </li>
        <li>
            As a side note, before Python 3.7 the order of entries in a dictionary was not guaranteed, much like the order of a set. Since Python 3.7, a dictionary preserves the order in which its entries were inserted. Comparing two dictionaries ignores the order of the entries, so you may ignore ordering for this exercise. In some applications it is helpful to insert the entries in sorted order, so that extracting the key-value pairs gives a nicely ordered list.
        </li>
    </ul>
</div>
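
The ordering behaviour described in the last hint can be checked with a short sketch. The two dictionaries are made up for illustration:

```python
first = {"a": 1, "b": 2}
second = {"b": 2, "a": 1}

# Equality compares keys and values, not the order of insertion
same = first == second
print(same)

# Iteration follows insertion order
keys = list(second)
print(keys)

# sorted() on the items gives an alphabetically ordered list of pairs
pairs = sorted(second.items())
print(pairs)
```

This prints `True`, then `['b', 'a']`, then `[('a', 1), ('b', 2)]`.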

In [None]:
%%ipytest input_output

import pathlib as pl
import string

def solution_exercise3(input_file: pl.Path) -> dict[str, int]:
    """Reads a file and returns a dictionary with a count of each letter

    We consider only the letters a-z

    Args:
        input_file : The file to read

    Returns:
        - A dictionary with a count of each letter
    """
    return

### Exercise 4: Translating words (medium)

Write a function that takes the words from the file `english.txt` and translates them into Italian using the dictionary file `dict.csv`. The output should be a **list of tuples**, each containing the pair `(english, italian)`; words that are not found in the dictionary are skipped.

For example, given the `english.txt` file:

```
bread
cat
```

and the `dict.csv` file:

```
120, pane, bread
121, sole, sun
```

the result should be:

`[("bread", "pane")]`


<div class="alert alert-block alert-info">
    <h4><b>Hints</b></h4>
    <ul>
        <li>
            Try to avoid loading the dictionary more than once. <b>"Dictionary file"</b> should suggest the appropriate Python data structure for storing the translations.
        </li>
        <li>
            The path to the input file <code>english.txt</code> is available as the argument <code>english</code> of the function <code>solution_exercise4</code>, the file <code>dict.csv</code> as the argument <code>dictionary</code>.
        </li>
    </ul>
</div>


In [None]:
%%ipytest

import pathlib as pl

def solution_exercise4(english: pl.Path, dictionary: pl.Path) -> list[tuple[str, str]]:
    """Reads two files and returns a list of translation tuples

    Each word in the english file is translated using the dictionary file to an italian word
    and a tuple (english, italian) is added to the list (or not if the word is not in the dictionary)

    Args:
        english : The file with the english words
        dictionary : The file with the dictionary

    Returns:
        - A list of tuples with the english / italian words
    """
    return

### Exercise 5: Binary format (hard)


The file `super_secret.dat` contains a secret message. We know that the message is stored in binary format as a sequence of bytes. The message starts with the byte sequence `b'\xff\xee\xdd\xcc\xbb\xaa'` and finishes with `b'\xaa\xbb\xcc\xdd\xee\xff'`. 
Write a function that reads the file and returns **only** the secret message as a string.


<div class="alert alert-block alert-info">
    <h4><b>Hints</b></h4>
    <ul>
        <li>
        The path to the input file is available as the argument <code>secret_file</code> of the function <code>solution_exercise5</code>.
        </li>
        <li>
            Every <code>bytes</code> object has a <code>decode()</code> method that converts it to a string. For example, <code>b"hello".decode()</code> returns <code>"hello"</code>.
        </li>
    </ul>

</div>

In [None]:
%%ipytest

import pathlib as pl

def solution_exercise5(secret_file: pl.Path) -> str:
    r"""Reads a file and returns the secret message

    The file contains a secret message encoded in binary.
    The start and end of the message are marked by the byte sequences
    b'\xff\xee\xdd\xcc\xbb\xaa' and b'\xaa\xbb\xcc\xdd\xee\xff'
    
    Args:
        secret_file : The file with the secret message

    Returns:
        - The secret message
    """
    return