# Files

## Average of the grades

The file `grades.txt` contains the students’ grades. Each line of the file contains a single grade.

The next cell uses the `%%writefile` cell magic (an IPython feature available in Colab) to create this file; after execution, the file will be located in the current working directory.


In [None]:
%%writefile grades.txt
13.5
17
9.5
12
14
6
5.5
8.5
10.5
29
14
9
15.5
11.5
16
18
13
12.5
15.5
17


Read each line of this file, extract each grade as a *float* and store it in a list.

Finish by computing and displaying the average of the grades with two decimal places.
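As a hint for the display step, here is a minimal sketch of the `.2f` format specifier applied to an average. The list `sample` is made up for illustration and has nothing to do with the contents of `grades.txt`.

```python
# Hypothetical values, chosen only to illustrate the formatting
sample = [12.0, 15.5, 9.0]

mean = sum(sample) / len(sample)  # 36.5 / 3
formatted = f"{mean:.2f}"         # keep two digits after the decimal point
print(formatted)
```

The specifier only affects how the number is displayed; `mean` itself keeps its full precision.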


In [None]:
# Your code here

### Solution

In [None]:
import pathlib

# With the open function
grades = []
with open("grades.txt", encoding="utf8") as file:
    for line in file:
        grades.append(float(line))

# With pathlib by opening the file "manually"
grades = []
with pathlib.Path("grades.txt").open(encoding="utf8") as file:
    for line in file:
        grades.append(float(line))

# With pathlib by reading the contents directly
grades = []
for grade in pathlib.Path("grades.txt").read_text(encoding="utf8").split():
    grades.append(float(grade))

# With a list comprehension
grades = [float(n)
          for n in pathlib.Path("grades.txt").read_text(encoding="utf8").split()]

# Displaying the average
print(f"{sum(grades) / len(grades):.2f}")


## Passed or failed

Rewrite the grades into the file `grades2.txt` with one grade per line, followed by whether the student passed (grade of at least 10) or failed. The first three expected lines for the file `grades2.txt` are:

```default
13.5 passed
17.0 passed
9.5 failed
```
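Note that the grade `17` from `grades.txt` appears as `17.0` in the expected output. The sketch below shows one way this can happen, assuming the grade is converted with `float` and then inserted into an f-string without a format specifier; the variable names are illustrative.

```python
# float() ignores surrounding whitespace, including the trailing newline
grade = float("17\n")

# Without a format specifier, a float is displayed with at least one decimal
line = f"{grade} passed"
print(line)
```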


In [None]:
# Your code here

### Solution

In [None]:
import pathlib


# See the solutions to the previous exercise for other versions
grades = [float(n)
          for n in pathlib.Path("grades.txt").read_text(encoding="utf8").split()]

with pathlib.Path("grades2.txt").open(mode="w", encoding="utf8") as file:
    for grade in grades:
        status = "passed" if grade >= 10 else "failed"
        file.write(f"{grade} {status}\n")

# Iterating over both at the same time
with open("grades.txt", encoding="utf8") as file_read, open(
    "grades2.txt", mode="w", encoding="utf8") as file_write:
    for line in file_read:
        grade = float(line)
        # Equivalent to: status = "passed" if grade >= 10 else "failed"
        if grade < 10:
            status = "failed"
        else:
            status = "passed"
        # One decimal and no zero-padding, so that 9.5 is written as "9.5"
        file_write.write(f"{grade:.1f} {status}\n")

with open("grades2.txt", encoding="utf8") as file_read2:
    print(file_read2.read())


## `.csv` files

We are going to work with a `.csv` file that contains information about movies extracted from Wikipedia.

Let’s start by retrieving it:


In [None]:
!wget https://query.data.world/s/e7xmrdjqkosrt7fkh54qalp6fajbwz?dws=00000 -O wiki_movie_plots_deduped.csv
!head wiki_movie_plots_deduped.csv


Use the [`reader`](https://docs.python.org/fr/3/libra...?highlight=csv#csv.reader) function from the `csv` module to read the file.

Once the reader has been created, you can iterate over it with a `for` loop; each iteration yields a list representing one row.

You can also retrieve a single row by calling the [`next`](https://docs.python.org/fr/3/library/functions.html#next) function on the reader.

- Retrieve the first line (the column names) and print it.
- Print the first 10 movie titles.
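To see how `next` and iteration interact on a reader, here is a minimal sketch that runs on a small in-memory CSV instead of the downloaded file. The content of `fake_file` is invented for the example.

```python
import csv
import io

# Made-up CSV content standing in for a real file
fake_file = io.StringIO('Title,Year\n"Hello, World",1999\nSecond Film,2001\n')

reader = csv.reader(fake_file)
header = next(reader)  # consumes the first row only
rows = list(reader)    # the remaining rows, each as a list of strings

print(header)
print(rows)
```

Because `next` advances the reader, a later loop over the same reader starts at the second row. Every value comes back as a string, including the years.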


In [None]:
# Your code here

### Solution

In [None]:
# Solution with enumerate to control iteration
import csv

# newline="" lets the csv module handle line breaks inside quoted fields
with open("wiki_movie_plots_deduped.csv", encoding="utf8", newline="") as file:
    reader = csv.reader(file)
    columns = next(reader)
    print(f"The columns are {columns}")
    title_index = columns.index("Title")
    for i, line in enumerate(reader):
        if i == 10:
            break
        print(line[title_index])


In [None]:
# Solution with zip to control iteration
import csv

with open("wiki_movie_plots_deduped.csv", encoding="utf8", newline="") as fh:
    reader = csv.reader(fh)
    columns = next(reader)
    print(f"The columns are {columns}")
    title_index = columns.index("Title")
    for line, _iteration_limit in zip(reader, range(10)):
        print(line[title_index])


In [None]:
# Solution with zip to control iteration and DictReader to read the CSV file
import csv

with open("wiki_movie_plots_deduped.csv", encoding="utf8", newline="") as fh:
    reader = csv.DictReader(fh)
    for line, _iteration_limit in zip(reader, range(10)):
        print(line["Title"])


## Spiral

In this exercise, you will compute the Cartesian coordinates of a two-dimensional spiral.

The Cartesian coordinates $x_A$ and $y_A$ of a point $A$ located at angle $\theta$ on a circle of radius $r$ are given by the formulas
$x_A = \cos(\theta) \times r$ and $y_A = \sin(\theta) \times r$.

![Point A on the circle of radius r.](https://raw.githubuserconte.../main/img/spirale-coord.png "Point A on the circle of radius r.")
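As a quick check of these formulas, the sketch below computes the coordinates of a single point. The values of `r` and `theta` are arbitrary and chosen so that the result is easy to verify by hand: a quarter turn on a circle of radius 2 should land on the point $(0, 2)$.

```python
import math

r = 2.0
theta = math.pi / 2  # a quarter turn, in radians

x = math.cos(theta) * r
y = math.sin(theta) * r
print(f"{x:.5f} {y:.5f}")
```

The value of `x` is not exactly zero because of floating-point rounding, which is one reason to format the coordinates with a fixed number of decimals.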

To compute the Cartesian coordinates that describe the spiral, you will make two variables change at the same time:

- the angle $\theta$, which will take values from $0$ to $4\pi$ radians in steps of $0.1$, which corresponds to two complete turns;
- the radius of the circle $r$, whose initial value is $0.5$ and which you will increment (that is, increase) in steps of $0.1$.

The trigonometric functions sine and cosine are available in the `math` module. To use it, you will add at the beginning of your script the instruction:

```python
import math
```

You will then write the coordinates of each point of the spiral into a text file named `spiral.dat`, one point per line, with the $x$ coordinate, a space, then the $y$ coordinate.

The first lines of `spiral.dat` should look like:

```default
   0.50000    0.00000
   0.59700    0.05990
   0.68605    0.13907
   0.76427    0.23642
   0.82895    0.35048
   0.87758    0.47943
[...]
```
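The layout of these sample lines appears consistent with each coordinate being written with a width of 10 characters and 5 decimals. This is inferred from the sample, not stated by the exercise. A minimal sketch of that format specifier on the first point:

```python
x, y = 0.5, 0.0

# 10.5f: pad to a total width of 10 characters, with 5 digits after the point
line = f"{x:10.5f} {y:10.5f}"
print(repr(line))
```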

Once you have generated the file `spiral.dat`, visualize your spiral with the code in the next cell.

You can then try playing with the parameters $\theta$ and $r$, and their increment step, to construct new spirals.


In [None]:
# Your code here

In [None]:
# To run after filling spiral.dat above
import matplotlib.pyplot as plt


xs = []
ys = []
with open("spiral.dat", "r") as f_in:
    for line in f_in:
        x, y = map(float, line.split())
        xs.append(x)
        ys.append(y)

plt.figure(figsize=(8, 8))
plt.axis("off")
plt.plot(xs, ys)
plt.show()


### Solution

In [None]:
import math
import pathlib

theta = 0
r = 0.5
with pathlib.Path("spiral.dat").open(mode="w", encoding="utf8") as file:
    while theta < math.pi * 4:
        x = math.cos(theta) * r
        y = math.sin(theta) * r
        # Width 10 and 5 decimals, to match the expected layout of spiral.dat
        file.write(f"{x:10.5f} {y:10.5f}\n")
        theta += 0.1
        r += 0.1


In [None]:
# To run after filling spiral.dat above
import matplotlib.pyplot as plt


xs = []
ys = []
with open("spiral.dat", "r") as f_in:
    for line in f_in:
        x, y = map(float, line.split())
        xs.append(x)
        ys.append(y)

plt.figure(figsize=(8, 8))
plt.plot(xs, ys)
plt.axis("off")
plt.show()


## Format conversion

Take the file `wiki_movie_plots_deduped.csv` from the "`.csv` files" exercise again and write it as a `.json` file and a `.jsonl` file.

For the `.json` file:

1. Create a list that will contain as many elements as there are rows in the CSV file.
2. Each element will be a dictionary that contains as keys the column names from the CSV (the headers) and as values the values from the CSV row.
3. Call [`json.dump`](https://docs.python.org/fr/3/library/json.html#json.dump) to write the object into a file.

For the `.jsonl` file, use the same idea but write one JSON object per line, using [`json.dumps`](https://docs.python.org/fr/3/library/json.html#json.dumps).
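To make the difference between the two formats concrete, here is a minimal sketch on two made-up records. It writes to an in-memory buffer rather than to a file; `records` and the other names are illustrative only.

```python
import io
import json

# Hypothetical records, shaped like rows of a CSV with two columns
records = [{"Title": "A", "Year": "1999"}, {"Title": "B", "Year": "2001"}]

# .json: a single document that holds the whole list
json_buffer = io.StringIO()
json.dump(records, json_buffer)

# .jsonl: one independent JSON object per line
jsonl_text = "".join(json.dumps(record) + "\n" for record in records)

print(json_buffer.getvalue())
print(jsonl_text)
```

A `.json` file has to be parsed as a whole, whereas a `.jsonl` file can be read back line by line with `json.loads`, which is convenient for large datasets.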


In [None]:
# Your code here

### Solution

`.json` file

In [None]:
import csv
import json


data = []

with open("wiki_movie_plots_deduped.csv", newline="", encoding="utf8") as f:
    reader = csv.reader(f)
    headers = next(reader)
    for row in reader:
        # Method 1, using the indices of the headers and row lists
        line_dict = {}
        for j in range(len(headers)):
            key = headers[j]
            value = row[j]
            line_dict[key] = value

        # Method 2, using enumerate to link headers and row
        line_dict = {}
        for j, value in enumerate(row):
            key = headers[j]
            line_dict[key] = value

        # Method 3, using zip
        line_dict = {}
        for key, value in zip(headers, row):
            line_dict[key] = value

        # Method 4, using zip and a dictionary comprehension
        line_dict = {key: value for key, value in zip(headers, row)}

        # Method 5, using zip and a call to dict
        line_dict = dict(zip(headers, row))

        data.append(line_dict)

with open("wiki_movie_plots_deduped.json", "w", encoding="utf8") as fh:
    json.dump(data, fh)


`.jsonl` file

In [None]:
import csv
import json


with open("wiki_movie_plots_deduped.csv", newline="", encoding="utf8") as fhr, \
     open("wiki_movie_plots_deduped.jsonl", "w", encoding="utf8") as fhw:
    reader = csv.reader(fhr)
    headers = next(reader)
    for row in reader:
        line_dict = dict(zip(headers, row))
        fhw.write(f"{json.dumps(line_dict)}\n")


In [None]:
!head wiki_movie_plots_deduped.jsonl

Using `DictReader`

In [None]:
import csv
import json


with open("wiki_movie_plots_deduped.csv", newline="", encoding="utf8") as fhr, \
     open("wiki_movie_plots_deduped.jsonl", "w", encoding="utf8") as fhw:
    reader = csv.DictReader(fhr)
    for row in reader:
        fhw.write(f"{json.dumps(row)}\n")
