# Python for everyone -- 09 Reading and writing files

<a href="https://classroom-40p3.onrender.com/" target="_blank">Classroom sign-in</a>

## Reading files

Most of the data that we plan to work with comes in files, typically csv files.

A csv (comma separated values) is a simple text file that looks something like this:
```
Name,Height
Adam,180
Belzebub,220
Papa Smurf,10
...
```
By simple text file we mean that there are no formatting options and no way to insert an image: it is just plain text. Note that an xls or doc file is not plain text, and such files are more complicated to read or write with Python.


Files are typically stored on your computer's hard drive and they are a sequence of 1s and 0s. There are two main types of files:
* Simple text files: when the 1s and 0s translate to characters (examples: csv, txt, html,...)
* Binary files: 1s and 0s could mean anything, including non-textual data (examples: executable file, image, audio, zip,...)

In most cases, working with text files is enough, so here we focus on text files.

To be able to store text as 1s and 0s or recover text from a sequence of 1s and 0s, we need to be able to translate between the two. This is called character encoding ([slides](https://posfaim.github.io/dnds5027/slides/text_encoding.pdf)).
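
As a small illustration of character encoding, the sketch below converts a string to bytes and back using UTF-8, one common encoding. The variable names are chosen for illustration only.

```python
# Encode text into bytes (the 1s and 0s stored on disk) and decode it back.
text = "Papa Smurf,10"
encoded = text.encode("utf-8")      # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

print(encoded)
print(decoded == text)

# Characters outside ASCII may take more than one byte in UTF-8.
accented = "\u00e9"                 # the letter e with an acute accent
print(len(accented), len(accented.encode("utf-8")))
```

If a file is read with a different encoding than the one it was written with, some characters can come out garbled.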

In [None]:
f = open('example.csv')
# or
#f = open('example.csv', 'rt') #'r' stands for read and t stands for text file

This created a variable `f` which contains a *file handle*.

In [None]:
contents = f.read()
contents

Your notebook has a working directory. In a way, the notebook is running in that folder on your computer. The file that you want to open has to be in the working directory, unless you specify otherwise.

You can check the working directory using the following IPython magic command:

In [None]:
%pwd

(Magic commands are not part of Python, they are just a convenient feature supported by Jupyter Notebook.)
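
Outside a notebook, magic commands are not available. A plain-Python equivalent, as a minimal sketch, uses the standard `os` or `pathlib` modules:

```python
import os
from pathlib import Path

cwd = os.getcwd()        # working directory as a string
cwd_path = Path.cwd()    # the same location as a Path object

print(cwd)
print(cwd_path)
```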

What happens if I read the file again?

In [None]:
contents = f.read()
contents

The contents now seem to be empty. The file handle `f` has a pointer that indicates where we are in the file while reading it. When we open the file, the pointer starts at the beginning, and as we read the file it moves toward the end.

The second time we executed the `f.read()` method, we were at the end of the file, so we didn't find anything.

To read it again, we have to open it again:

In [None]:
f = open('example.csv')
f.read()
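
Reopening is not the only option. As a sketch, file handles also provide a `seek()` method that moves the pointer, and `seek(0)` returns it to the beginning. The example first writes a small temporary file so that it runs on its own; the file name `pointer_demo.csv` is made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "pointer_demo.csv")
f = open(path, "w")
f.write("Name,Height\nAdam,180\n")
f.close()

f = open(path)
first = f.read()    # reads everything; the pointer is now at the end
second = f.read()   # nothing left to read, so this is an empty string
f.seek(0)           # move the pointer back to the start of the file
third = f.read()    # the full contents again
f.close()

print(repr(first))
print(repr(second))
print(first == third)
```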

### Reading line-by-line

Read a single line:

In [None]:
f = open('example.csv')

In [None]:
line = f.readline()
line

In [None]:
line2 = f.readline()
line2

Iterate over file line-by-line:

In [None]:
f = open('example.csv')

In [None]:
for line in f:
    print(line)

Q: why do we have the empty lines in the output?

After using a file: close it

In [None]:
f.close()

In [None]:
f.read()

Forgetting to close a file mainly causes trouble when you are writing to it, because some of the data may not reach the disk until the file is closed.

### `with` context manager

Better way to open files: the `with` context manager.

In [None]:
with open('example.csv') as f:
    for line in f:
        print(line, end="")

The file is open inside the indented block below the `with` statement. When the program exits the `with` block, it properly closes the file:

In [None]:
f.read()
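
To check this without triggering an error, a file handle has a `closed` attribute. The sketch below uses a temporary file (the name `closed_demo.txt` is hypothetical) and also shows that reading from a closed file raises a `ValueError`.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "closed_demo.txt")
with open(path, "w") as f:
    f.write("hello\n")

with open(path) as f:
    inside = f.closed     # False: the file is open inside the block
outside = f.closed        # True: leaving the block closed the file

try:
    f.read()              # reading from a closed file is an error
    error_message = None
except ValueError as err:
    error_message = str(err)

print(inside, outside)
print(error_message)
```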

#### 🔴 Exercise -- read file

Using a text editor, create a file named `olympics.csv` with the following contents:
```
Paris,2024
Tokyo,2020
Rio de Janeiro,2016
London,2012
Beijing,2008
```

Read the file and print out its contents on the screen.

<details><summary><u>Solution.</u></summary>
<p>
    
```python

with open('olympics.csv') as f:
    for line in f:
        print(line, end="")
```
    
</p>
</details>

### Relative and absolute file paths

If the file is not in the working directory, you need to specify its location. For example, if you have a separate `data` folder:

In [None]:
absolute_file_path = '/home/posfaim/Dropbox/teaching/python_for_everyone/yearly_stuff/2025-26/notebooks/data/example.csv'
with open(absolute_file_path) as f:
    for line in f:
        print(line, end="")

This seems to work, but it is not good practice:
* If I copy my project to a different location on my computer, the code will break
* If you download the project, it will break, because you have a different folder structure than I do.

It is better to use a relative path (relative compared to the current working directory):

In [None]:
relative_file_path = 'data/example.csv'
with open(relative_file_path) as f:
    for line in f:
        print(line, end="")
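
One possible way to build such paths is the standard `pathlib` module, which joins folder and file names with `/` and works the same way across operating systems. This is only a sketch: it creates a temporary project folder with a `data` subfolder so that it can run anywhere, and that folder layout is an assumption for illustration.

```python
import tempfile
from pathlib import Path

# Build a throwaway project folder containing data/example.csv.
project = Path(tempfile.mkdtemp())
(project / "data").mkdir()
(project / "data" / "example.csv").write_text("Name,Height\nAdam,180\n")

relative = Path("data") / "example.csv"   # relative: depends on where we start
absolute = project / relative             # absolute: the full location

print(relative.is_absolute())
print(absolute.is_absolute())

with open(absolute) as f:
    for line in f:
        print(line, end="")
```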

### Parsing the csv example

A csv file is like a table:
* Each row in the file is a row in the table
* In each row, the columns are separated by a ','

To work with the data from the file, it has to be parsed, e.g., converted into numbers and stored in variables.

To parse each row, we will use the `.split()` method of strings.

In [None]:
s = "This is a sentence."
words_list=s.split()
words_list

In [None]:
row = "apple,12,42,48"
row_list = row.split(',')
row_list
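
Lines read from a file end with a newline character, which stays attached to the last entry after splitting. Below is a minimal sketch of removing it with `.strip()` before splitting; the sample line is made up.

```python
line = "Adam,180\n"               # a line as it would come from the file

raw = line.split(",")             # the newline stays attached to the last entry
clean = line.strip().split(",")   # strip the newline first, then split

print(raw)
print(clean)

name = clean[0]
height = float(clean[1])          # convert the text "180" to the number 180.0
print(name, height)
```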

Let's store the names and heights of the people in our file in lists:

In [None]:
names = []
heights = []
with open('example.csv') as f:
    f.readline() # the first line in the file is a header, let's skip that
    for line in f:
        entries = line.split(',') # split the row into entries
        names.append(entries[0])
        heights.append(float(entries[1])) # we need to convert the height to a number

In [None]:
names

In [None]:
heights

This example already shows that things can get complicated even when parsing csv files. There are Python packages that help with this, e.g., the standard-library `csv` module and `pandas`.
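
As a hedged sketch, the `csv` module can do the splitting for us. The example reads from an in-memory string via `io.StringIO` so that it runs without a file on disk; with a real file you would pass the file handle to `csv.reader` instead.

```python
import csv
import io

# Stand-in for a file on disk, so the example is self-contained.
fake_file = io.StringIO("Name,Height\nAdam,180\nPapa Smurf,10\n")

names = []
heights = []
reader = csv.reader(fake_file)
next(reader)                      # skip the header row
for row in reader:                # each row is already a list of strings
    names.append(row[0])
    heights.append(float(row[1]))

print(names)
print(heights)
```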

#### 🔴 Exercise -- read file

Read the previous file about the Olympics and save its contents in a dictionary where the keys are the city names and the values are the years.

<details><summary><u>Solution.</u></summary>
<p>
    
```python

olympics = {}
with open('olympics.csv') as f:
    # this file has no header row, so we start reading at the first line
    for line in f:
        l = line.split(',')
        olympics[l[0]]=int(l[1])
olympics 
```
    
</p>
</details>

### Writing files

We can also write text files:

In [None]:
names

In [None]:
with open('output.csv', 'w') as f:
    for name in names:
        f.write(name+'\n')

The `'w'` passed as argument to the `open()` function means that we are writing, not reading, the file.

When we call `open('output.csv', 'w')`, Python creates the `output.csv` file. If the file already exists, Python overwrites it (so the previous version is lost!).

In [None]:
!cat output.csv

You can also append to the end of an existing file:

In [None]:
new_names = ['Humpty Dumpty', 'Ms. Frizzle']
with open('output.csv', 'a') as f:
    for name in new_names:
        f.write(name+'\n')

In [None]:
!cat output.csv
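
`!cat` relies on a shell command that is not available on every system. A portable alternative is to read the file back with Python. The sketch below also shows the difference between `'w'` and `'a'`; it writes to a temporary folder, and the file name `modes_demo.csv` is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.csv")

with open(path, "w") as f:        # 'w' creates the file (or empties it)
    f.write("Adam\n")
with open(path, "a") as f:        # 'a' keeps the contents and adds to the end
    f.write("Humpty Dumpty\n")
with open(path) as f:
    after_append = f.read()

with open(path, "w") as f:        # opening with 'w' again discards everything
    f.write("Ms. Frizzle\n")
with open(path) as f:
    after_overwrite = f.read()

print(after_append, end="")
print(after_overwrite, end="")
```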

#### 🔴 Exercise -- write file

Save the contents of the following dictionary as a csv file.

Hint: you can iterate over the key-value pairs of the dictionary using `penguin_weights.items()`.

In [None]:
penguin_weights = {
    "Emperor": 30.0,
    "King": 13.5,
    "Adelie": 4.5,
    "Chinstrap": 4.0,
    "Gentoo": 5.5,
    "Little Blue": 1.2,
    "Magellanic": 4.8
}


<details><summary><u>Solution.</u></summary>
<p>
    
```python

with open('penguins.csv', 'w') as f:
    for key,value in penguin_weights.items():
        f.write(f"{key},{value}\n") # no space after the comma, as in the csv files above
```
    
</p>
</details>

Some new information about penguins has arrived! Quickly add it to the existing file.

In [None]:
new_penguin_weights = {
    "African": 3.0,
    "Rockhopper": 2.5,
    "Macaroni": 5.5
}

 

<details><summary><u>Solution.</u></summary>
<p>
    
```python

with open('penguins.csv', 'a') as f:
    for key,value in new_penguin_weights.items():
        f.write(f"{key},{value}\n") # no space after the comma, as in the csv files above
```
    
</p>
</details>
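
For comparison, here is a sketch of writing a dictionary with `csv.writer` from the standard library, which also takes care of quoting entries that contain commas. It writes to a temporary folder and uses a shortened dictionary; the header names `species` and `weight` are assumptions for illustration.

```python
import csv
import os
import tempfile

penguin_weights = {"Emperor": 30.0, "King": 13.5, "Adelie": 4.5}

path = os.path.join(tempfile.mkdtemp(), "penguins.csv")
# The csv documentation recommends newline="" when opening files for csv use.
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["species", "weight"])        # header row
    for species, weight in penguin_weights.items():
        writer.writerow([species, weight])

# Read the file back to check what was written.
with open(path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Note that everything read back from the file is a string, so the weights would need to be converted with `float()` before doing arithmetic.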