# File input / output

In this section you will learn how to read and write ASCII data (pure text data).

Content:

+ Open
+ Close
+ Read
+ Write
+ Read CSV file
+ Write CSV file

Before we begin, it should be clear what a file is. Conceptually, a file is like a sheet of paper with a header, the data in the middle, and an end marker. The header stands for the metadata (the file name, its size and type, and more), which the operating system keeps track of. The data is whatever you are working with. The end marker tells the system 'here is the end'; on most modern systems the end is determined by the stored file size rather than by a special character.

Depending on the system you are working on, lines end with different special characters, which are not displayed by default (most editors offer a 'show invisible characters' option).

```
Windows:        CR+LF which is equal to \r\n
Linux and Mac:  LF which is equal to \n
```

<br />

## Open a file for reading

Opening the file with the Python `open` function is the first step. It returns a file object that provides methods for reading and writing.

Reference:
```
open(file, mode='r', newline=None)
```



A file can be opened for reading or writing ASCII or binary data by setting the mode in the function call.

Modes:
```
'r'       open for reading (default, text mode)
'w'       open for writing, overwriting an existing file (text mode)
'b'       binary
't'       text (default)
'rb'      read binary
'wb'      write binary
'a'       appending at the end of the file
'+'       reading and writing
```

**Note:** For text files, the system-dependent line endings (\n on Unix, \r\n on Windows) are converted to \n when reading, and converted back to the system-specific line ending when writing. Nice, you don't have to care about it.
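
The conversion can be made visible with a small experiment. The following sketch is not part of the course material; it assumes a throwaway file (the name `line_endings_demo.txt` and the temporary directory are made up for the demo), writes Windows line endings in binary mode and then reads the file back in binary and in text mode.

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'line_endings_demo.txt')

# Binary mode writes the bytes exactly as given (Windows line endings here)
f = open(path, 'wb')
f.write(b'first line\r\nsecond line\r\n')
f.close()

# Binary mode returns the stored bytes unchanged
f = open(path, 'rb')
raw = f.read()
f.close()

# Text mode with the default newline=None converts the line endings to '\n'
f = open(path, 'r')
text = f.read()
f.close()

print(repr(raw))
print(repr(text))
```

The binary read still contains `\r\n`, while the text read only contains `\n`.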

Now, we want to open the file World_Nations_population_land_area.txt which is stored in the material/data folder in read only mode. Because it is a text file (ASCII) we can use the short mode 'r' instead of 'rt'.

```python
f = open('../data/World_Nations_population_land_area.txt', 'r')

print(f)
```


In [None]:
f = open('../data/World_Nations_population_land_area.txt', 'r')

print(f)

<br>
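
Besides printing the file object, you can inspect some of its attributes. A minimal sketch, assuming a small throwaway file (the name `open_demo.txt` is made up for this example) instead of the course data:

```python
import os
import tempfile

# Hypothetical demo file, created first so that it can be opened for reading
path = os.path.join(tempfile.mkdtemp(), 'open_demo.txt')
f = open(path, 'w')
f.write('some text\n')
f.close()

f = open(path, 'r')
name = f.name    # the path that was passed to open
mode = f.mode    # the mode the file was opened with
print(name)
print(mode)
f.close()
```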

## Close a file

After the work is done you can close the file object with the `close` method.

```python
f.close()
```

You can check whether a file is closed or not with the `closed` attribute.

```python
f.closed
```


In [None]:
f.close()
f.closed

<br>

If you try to close an already closed file nothing will happen (return value None).

In [None]:
f.close()

<br>

<h2 style="color:red"> Exercise </h2>

What happens when you try to read from a closed file? This is only about the message you get; the read function will be introduced in the next section.

Enter

```
f.read()
```

<br>


<br >

## Reading functions

Python file objects provide a few methods to read data from a file.

1. read()
2. readline()
3. readlines()

<br>

#### 1. read()

The function `read` can be used to read the content of the file object. It reads the complete file content as a string at once.

The following code prints the complete content of the input file:

```python
f = open('../data/World_Nations_population_land_area.txt', 'r')

data = f.read()
print(data)
```

Output:

```bash
# Country	Population (2020)	Land Area (Km²)	Density(P/Km²)
1    Afghanistan    38928346    652860    60
2    Albania    2877797    27400    105
3    Algeria    43851044    2381740    18
4    Andorra    77265    470    164
5    Angola    32866272    1246700    26
6    Antigua and Barbuda    97929    440    223
7    Argentina    45195774    2736690    17
8    Armenia    2963243    28470    104
9    Australia    25499884    7682300    3
10    Austria    9006398    82409    109
...
195	Zimbabwe	14862924	386850	38
```


In [None]:
f = open('../data/World_Nations_population_land_area.txt', 'r')

data = f.read()
print(data)

<br>

If you iterate over the `read()` output, you iterate over every character in the file, because the output is a single string.

For example, retrieve the length of the string `content` and print its first 120 characters:

```python
f = open('../data/World_Nations_population_land_area.txt', 'r')

content = f.read()

print(len(content))

print(content[:120])
```

Output:

```bash
7521
# Country	Population (2020)	Land Area (Km²)	Density(P/Km²)
1	Afghanistan	38928346	652860	60
2	Albania	2877797	27400	105
```


In [None]:
f = open('../data/World_Nations_population_land_area.txt', 'r')

content = f.read()

print(len(content))

print(content[:120])

<br>
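
To see the difference between iterating over characters and over lines, a short string is enough. The sketch below uses a made-up two-line string in place of the real file content; `splitlines()` is a string method that cuts the text at the line endings.

```python
# Made-up stand-in for the string returned by f.read()
content = '1\tAfghanistan\n2\tAlbania\n'

# Iterating over a string yields single characters
first_chars = [char for char in content[:4]]
print(first_chars)

# splitlines() returns the lines without their line endings
lines = content.splitlines()
print(lines)
```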

A better way is to use the `with` statement, because the file is closed automatically at the end of the `with` block.

```python
with open('../data/World_Nations_population_land_area.txt', 'r') as f:
    print(f.read())
```

Output:

```bash
# Country	Population (2020)	Land Area (Km²)	Density(P/Km²)
1	Afghanistan	38928346	652860	60
2	Albania	2877797	27400	105
3	Algeria	43851044	2381740	18
4	Andorra	77265	470	164
5	Angola	32866272	1246700	26
6	Antigua and Barbuda	97929	440	223
7	Argentina	45195774	2736690	17
8	Armenia	2963243	28470	104
9	Australia	25499884	7682300	3
10	Austria	9006398	82409	109
...
195	Zimbabwe	14862924	386850	38
```


In [None]:
with open('../data/World_Nations_population_land_area.txt', 'r') as f:
    print(f.read())

<br>
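
That the file really is closed after the `with` block can be checked with the `closed` attribute. A minimal sketch, assuming a throwaway file (the name `with_demo.txt` is made up) rather than the course data; it also shows that the file is closed when the block is left because of an error.

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')
with open(path, 'w') as f:
    f.write('some text\n')

with open(path, 'r') as f:
    closed_inside = f.closed     # still open inside the block
closed_after = f.closed          # closed once the block is left

# The file is also closed when the block is left because of an error
try:
    with open(path, 'r') as f:
        raise RuntimeError('simulated problem')
except RuntimeError:
    closed_after_error = f.closed

print(closed_inside, closed_after, closed_after_error)
```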

#### 2. readline()

The function `readline` reads the file content line by line: each call returns the next line. For demonstration purposes we open the file here without `with`, so at the end we have to `close` the file manually.

Open the file again:

```python
f = open('../data/World_Nations_population_land_area.txt', 'r')
```

Read the first line and print it:

```python
line = f.readline()

print(line)
```

Output:

```bash
# Country	Population (2020)	Land Area (Km²)	Density(P/Km²)
```

Read and print next line.

```python
line = f.readline()
print(line)
```

Output:

```bash
1	Afghanistan	38928346	652860	60
```

And so on. A loop over the needed lines would be a better way.
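
A file object can itself be used in a `for` loop, which yields one line per iteration. A minimal sketch of such a loop, assuming a small throwaway file (name and content are made up for the demo) instead of the course data:

```python
import os
import tempfile

# Hypothetical three-line demo file
path = os.path.join(tempfile.mkdtemp(), 'loop_demo.txt')
with open(path, 'w') as f:
    f.write('# Country\n1\tAfghanistan\n2\tAlbania\n')

lines = []
with open(path, 'r') as f:
    for line in f:                   # one line per iteration
        line = line.rstrip('\n')     # remove the trailing newline
        lines.append(line)
        print(line)
```

Removing the trailing newline avoids the empty line that `print` would otherwise add after each line.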

<br>
<h2 style="color:red"> Exercise </h2>

Print the first 10 countries from the file to stdout.

Hint: Don't forget to close the file.

<br>


In [None]:
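f = open('../data/World_Nations_population_land_area.txt', 'r')

# 11 lines: the header plus the first 10 countries
for i in range(11):
    print(f.readline(), end='')   # the line already ends with '\n'

f.close()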

<br>

#### 3. readlines()

The function `readlines` reads all lines and stores each line as an element in a list (of type string in this case).

```python
content = open('../data/World_Nations_population_land_area.txt', 'r').readlines()

print(content)
```

Output:

```bash
['# Country\tPopulation(2020)\tLand Area (Km²)\tDensity(P/Km²)\n', '1\tAfghanistan\t38928346\t652860\t60\n', '2\tAlbania\t2877797\t27400\t105\n', ... , '195\tZimbabwe\t14862924\t386850\t38']
```

The output shows that between the line components a tab '\t' is used as separator.

<br>



In [None]:
content = open('../data/World_Nations_population_land_area.txt', 'r').readlines()

print(content)

<br>
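
Because the components of a line are separated by tabs, a line can be cut into its fields with the string method `split`. A minimal sketch, using one line copied from the output above as a plain string so that no file is needed:

```python
# One line as returned by readline() or readlines()
line = '1\tAfghanistan\t38928346\t652860\t60\n'

# Remove the newline, then cut the line at every tab
fields = line.rstrip('\n').split('\t')
number, country, population, area, density = fields

# The fields are strings; convert them before calculating
population = int(population)
area = int(area)
print(country, round(population / area))
```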

When you know the number of a specific line of interest, you can directly read the line by indexing.

The 65th Nation is Germany and we know that the first line is a header. So we can use the index 65 (line 66) to get the information about the German population.

```python
f = open('../data/World_Nations_population_land_area.txt', 'r')

print(f.readlines()[65])
```

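Both versions print the same line. The difference is that in the first version the file object `f` stays open and should be closed once it is no longer needed, whereas in the second version no name refers to the file object, so it cannot be closed explicitly and is only closed when Python discards the object. `myline` is an object of type string.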

```python
myline = open('../data/World_Nations_population_land_area.txt', 'r').readlines()[65]

print(myline)
```


<br>

<h2 style="color:red"> Exercise </h2>

1. From file World_Nations_population_land_area.txt read the 125th line and tell us which country you have found.
2. Print the countries #45 to #50

<br>


In [None]:
# 1:


In [None]:
# 2:


<br>

## Open a file for writing

When writing to a file, a distinction is made between creating a new file, appending data to an existing file, or overwriting an existing file. This is done by selecting the appropriate mode in the `open` call (see above).

```python
fout = open('myoutput.txt', 'w')
```
It creates a file object in text write mode linked to the file _myoutput.txt_. If _myoutput.txt_ exists it will be overwritten.
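
The difference between the modes 'w' and 'a' can be seen in a short experiment. A minimal sketch, assuming a throwaway file in a temporary directory (the name `modes_demo.txt` is made up for this example):

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as fout:      # 'w' creates the file
    fout.write('first\n')

with open(path, 'a') as fout:      # 'a' appends at the end
    fout.write('second\n')

with open(path, 'r') as fin:
    after_append = fin.read()

with open(path, 'w') as fout:      # 'w' again: the old content is lost
    fout.write('third\n')

with open(path, 'r') as fin:
    after_overwrite = fin.read()

print(repr(after_append))
print(repr(after_overwrite))
```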

<br>


## Write to file

Corresponding to the `read` method, file objects provide a `write` method as well. Every call of `write` appends its argument to the file until the file is closed. `write` adds neither blanks nor newlines on its own, so it is important to add blanks and newlines '\n' yourself to get the wanted text result.

```python
fout = open('myoutput.txt', 'w')

x = 10
y = 5

fout.write('This is our first write call without a newline special character.')
fout.write('And here it comes - the second write call without a newline special character.')
fout.write(str(x))
fout.write(str(y))

fout.write('\n---------\n')
fout.write('We should use a newline character. \n')
fout.write(str(x) + '\n')
fout.write(str(y) + '\n')

fout.write('\n---------\n')
fout.write('Or a blank character.\n')
fout.write(str(x) + ' ' + str(y) + '\n')
fout.close()
```
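
Formatted string literals (f-strings) are a convenient way to combine text, values, blanks and newlines in one `write` call. A minimal sketch, assuming a throwaway output file (the name `format_demo.txt` is made up) and the same values `x` and `y` as above:

```python
import os
import tempfile

# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'format_demo.txt')

x = 10
y = 5

with open(path, 'w') as fout:
    # Values are inserted into the text; the newline is still added by hand
    fout.write(f'x = {x}, y = {y}\n')
    # '>6d' right-aligns an integer in a column that is 6 characters wide
    fout.write(f'{x:>6d}{y:>6d}\n')

with open(path, 'r') as fin:
    written = fin.read()

print(written)
```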


<br>

<h2 style="color:red"> Exercise </h2>

1. Open a new file
2. Write some text and data to the file. Use newline and blanks to format the output.
3. Close the file
4. Open the file again, this time in append mode.
5. Add some text and data to the file.

In [None]:
# 1.


In [None]:
# 2. 


In [None]:
# 3.


In [None]:
# 4.


In [None]:
# 5.


<br>

## Read binary data

There are different packages available for reading and writing binary data. For example, the package NumPy can be used to write arrays (sequences of values) to a file.

For simplicity we skip the binary part, since it is rarely used.

<br>


## Read CSV file

The CSV (comma-separated values) format is very often used to store ASCII data in a file. Programs like Excel can import and export this format, which makes it easy to exchange data. The values are separated by a delimiter (default: comma), which can be set or changed by the user.

The Python module **csv** provides classes for reading and writing this form of data. 

```python
import csv
```

The `reader()` function returns an iterable _reader_ object that yields the file content row by row. A `delimiter` can be set and, if needed, blanks directly after a delimiter can be skipped with `skipinitialspace=True`.

To print only the first 6 lines of the CSV file, we use _enumerate_ to get the index of each row:

```python
with open('../data/World_Nations_population_land_area.txt') as csvfile:
    data = csv.reader(csvfile, delimiter='\t')
    for index, row in enumerate(data):
        if index <=5:
            print(row)
```

```bash
['# Country', 'Population(2020)', 'Land Area (Km²)', 'Density(P/Km²)']
['1', 'Afghanistan', '38928346', '652860', '60']
['2', 'Albania', '2877797', '27400', '105']
['3', 'Algeria', '43851044', '2381740', '18']
['4', 'Andorra', '77265', '470', '164']
['5', 'Angola', '32866272', '1246700', '26']
```


In [None]:
import csv

with open('../data/World_Nations_population_land_area.txt') as csvfile:
    data = csv.reader(csvfile, delimiter='\t')
    for index, row in enumerate(data):
        if index <=5:
            print(row)

<br>
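
All values returned by the reader are strings. A common next step is to skip the header row and convert the numbers. A minimal sketch, assuming an in-memory sample (three lines in the same layout as the file, wrapped in `io.StringIO`) so that the example runs without the data file:

```python
import csv
import io

# In-memory stand-in for the tab-separated file
sample = ('# Country\tPopulation(2020)\tLand Area (Km²)\tDensity(P/Km²)\n'
          '1\tAfghanistan\t38928346\t652860\t60\n'
          '2\tAlbania\t2877797\t27400\t105\n')

reader = csv.reader(io.StringIO(sample), delimiter='\t')
header = next(reader)              # consume the header row

populations = {}
for number, country, population, area, density in reader:
    populations[country] = int(population)   # convert string to integer

print(populations)
```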

When the CSV file has a less common format, for example because the delimiter is surrounded by blanks and is therefore not a single character, some extra work has to be done.

Example input file:

```
"Number" | "Name" | "Type"
1 | "Balu" | "Bear"
2 | "Groot" | "Tree"
3 | "Chewbacca" | "Wookiee"
4 | "Duffy" | "Duck"

```

The initial blank character can be skipped by setting _skipinitialspace_ to True.

```python
with open('../data/species_not_real.csv') as infile:
    species = csv.reader(infile, delimiter='|', skipinitialspace=True)
    for row in species:
        print(row)
```

Output:

```bash
['Number ', 'Name ', 'Type']
['1 ', 'Balu ', 'Bear']
['2 ', 'Groot ', 'Tree']
['3 ', 'Chewbacca ', 'Wookiee']
['4 ', 'Duffy ', 'Duck']
```

The trailing blank character of the first two elements of each row remains. The string method `strip()` removes it when applied to each element of the row.

```python
with open('../data/species_not_real.csv') as infile:
    species = csv.reader(infile, delimiter='|', skipinitialspace=True)
    for row in species:
        row = [x.strip() for x in row]
        print(row)
```

The output is

```bash
['Number', 'Name', 'Type']
['1', 'Balu', 'Bear']
['2', 'Groot', 'Tree']
['3', 'Chewbacca', 'Wookiee']
['4', 'Duffy', 'Duck']
```

<br>

In [None]:
with open('../data/species_not_real.csv') as infile:
    species = csv.reader(infile, delimiter='|', skipinitialspace=True)
    for row in species:
        row = [x.strip() for x in row]
        print(row)

<br>

With the class `DictReader()` the content of the CSV file can be read as dictionaries, one per row, with the keys taken from the header line. But we are still stuck with the initial and trailing blank problem. To solve this issue we define a function which removes the blanks.

```python
def strip_dict(d):
    return { key.strip() : strip_dict(value)
             if isinstance(value, dict)
             else value.strip()
             for key, value in d.items() }

with open('../data/species_not_real.csv') as infile:
    species = csv.DictReader(infile, delimiter='|', skipinitialspace=True)
    for row in species:
        row1 = strip_dict(row)
        print(row1)
```

```bash
{'Number': '1', 'Name': 'Balu', 'Type': 'Bear'}
{'Number': '2', 'Name': 'Groot', 'Type': 'Tree'}
{'Number': '3', 'Name': 'Chewbacca', 'Type': 'Wookiee'}
{'Number': '4', 'Name': 'Duffy', 'Type': 'Duck'}
```


In [None]:
def strip_dict(d):
    return { key.strip() : strip_dict(value)
             if isinstance(value, dict)
             else value.strip()
             for key, value in d.items() }

with open('../data/species_not_real.csv') as infile:
    species = csv.DictReader(infile, delimiter='|', skipinitialspace=True)
    for row in species:
        row1 = strip_dict(row)
        print(row1)

<br>
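
When the file contains no extra blanks, `DictReader` can be used directly and the values are accessed by column name. A minimal sketch, assuming a cleaned-up in-memory version of the species data (no quotes, no blanks around the delimiter):

```python
import csv
import io

# In-memory stand-in for a species file without extra blanks
sample = 'Number|Name|Type\n1|Balu|Bear\n2|Groot|Tree\n'

rows = list(csv.DictReader(io.StringIO(sample), delimiter='|'))

# Each row is a dictionary with the header entries as keys
names = [row['Name'] for row in rows]
print(names)
```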

## Write CSV file

As you might expect, there is a `writer()` function for writing data to a file in CSV format.

Let's assume we want to write the following values to a file with a comma as delimiter.

```python
a = [1,2,3,4,5,6]

with open('./test.csv', 'w', encoding='UTF8', newline='') as fout:
    writer = csv.writer(fout, delimiter=',')
    writer.writerow(a)
```

File content test.csv:

```bash
1,2,3,4,5,6
```

<br>


In [None]:
a = [1,2,3,4,5,6]

with open('./test.csv', 'w', encoding='UTF8', newline='') as fout:
    writer = csv.writer(fout, delimiter=',')
    writer.writerow(a)

<br>
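
The documentation of the csv module recommends opening files for `csv.writer` with `newline=''`. Otherwise an extra `\r` is added on platforms that use `\r\n` line endings, which can show up as blank lines between the rows. A minimal sketch, assuming a throwaway file in a temporary directory (the name `newline_demo.csv` is made up), that looks at the bytes actually written:

```python
import csv
import os
import tempfile

# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'newline_demo.csv')

a = [1, 2, 3, 4, 5, 6]

# newline='' leaves the line ending to the csv writer
with open(path, 'w', encoding='UTF8', newline='') as fout:
    writer = csv.writer(fout, delimiter=',')
    writer.writerow(a)

# Read the raw bytes to see which line ending was written
with open(path, 'rb') as fin:
    raw = fin.read()

print(raw)
```

With its default settings the csv writer ends each row with `\r\n`, independent of the operating system.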

You can write several lists to the same file, using `writerow()` for a single list and `writerows()` for a nested list (a list of rows).

A good example of use is to save a header and its associated data as follows:

```python
header = ['CONTINENT', 'NAME', 'ISO2', 'ISO3']

data = [['Africa', 'Algeria', 'DZ', 'DZA'],
        ['Africa', 'Angola', 'AO', 'AGO'],
        ['Africa', 'Benin', 'BJ', 'BEN'],
        ['Africa', 'Botswana', 'BW', 'BWA'],
        ['Africa', 'Burkina Faso', 'BF', 'BFA']]

with open('./countries.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f, delimiter=',' )
    writer.writerow(header)
    writer.writerows(data)
```

File content of countries.csv:

```bash
CONTINENT,NAME,ISO2,ISO3
Africa,Algeria,DZ,DZA
Africa,Angola,AO,AGO
Africa,Benin,BJ,BEN
Africa,Botswana,BW,BWA
Africa,Burkina Faso,BF,BFA
```

<br>


In [None]:
header = ['CONTINENT', 'NAME', 'ISO2', 'ISO3']

data = [['Africa', 'Algeria', 'DZ', 'DZA'],
        ['Africa', 'Angola', 'AO', 'AGO'],
        ['Africa', 'Benin', 'BJ', 'BEN'],
        ['Africa', 'Botswana', 'BW', 'BWA'],
        ['Africa', 'Burkina Faso', 'BF', 'BFA']]

with open('./countries.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f, delimiter=',' )
    writer.writerow(header)
    writer.writerows(data)

<br>
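
A simple way to check a written CSV file is to read it back and compare the result with the original lists. A minimal sketch, assuming a throwaway file in a temporary directory (the name `countries_roundtrip.csv` is made up) and only the first two data rows from above:

```python
import csv
import os
import tempfile

header = ['CONTINENT', 'NAME', 'ISO2', 'ISO3']
data = [['Africa', 'Algeria', 'DZ', 'DZA'],
        ['Africa', 'Angola', 'AO', 'AGO']]

# Hypothetical demo file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'countries_roundtrip.csv')

with open(path, 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow(header)     # one list  -> one row
    writer.writerows(data)      # nested list -> one row per inner list

with open(path, encoding='UTF8', newline='') as f:
    rows = list(csv.reader(f, delimiter=','))

read_header = rows[0]
read_data = rows[1:]
print(read_header == header, read_data == data)
```

The comparison works here because all values are strings; numbers would come back from the reader as strings and would need to be converted first.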

<h2 style="color:red"> Exercise </h2>

Input data: https://www.worldometers.info/world-population/population-by-country/

1. Create a list that contains the header information: name, population, density
1. Create a list of the 5 countries with the largest population
1. Write the CSV data with a semicolon as delimiter to the file _country_population.csv_

<br>


In [None]:
# 1.


In [None]:
# 2.


In [None]:
# 3.
