# <div class = "alert alert-info"> <font color = purple> Chapter 06 - File Handling 1

## 6.1 Data Input and Output

In **Chapter 05**, we looked at how to write functions that take in input data and return output data.

The input data has to come from some source, such as a text file, a database or a server on the internet.

Usually the output will also need to be stored somewhere such as a text file or a database.

As such operations involve **in**put and/or **out**put of data, they are referred to as **In/Out** operations, or **IO** in short.

Python provides a built-in function, `open()`, that lets you open text files for reading, writing, or appending (adding new data to the end).

## 6.2 The `open()` function

The `open()` function requires at least one argument: the filename. A second argument is optional; if it is left out, the file is opened for reading.

`open('data.txt', 'r')` will look in the current directory for a file named `data.txt` and open it for **r**eading; you will not be able to write data to it.

`open('data.txt', 'w')` will open the `data.txt` file and **w**rite new data to it, overwriting any existing data. If `data.txt` does not exist, an empty file will first be created.

`open('data.txt', 'a')` will open the `data.txt` file for **a**ppending, causing new data to be written after the last line of the existing file. If `data.txt` does not exist, an empty file will first be created.

You can see that the second argument specifies the **mode** to open the file in.

Arguments and their associated modes

- `'r'` opens the file in **read** mode (no writing); this is the default
- `'w'` opens the file in **write** mode (no reading), erasing any existing contents
- `'x'` creates a new file and opens it for writing; it fails with an error if the file already exists
- `'a'` opens the file for writing in **append** mode
- `'+'` is added to another mode (e.g. `'r+'`, `'w+'`) to open the file for both reading and writing

There are also two modes that differ in the way data is read: text or binary.

- `'t'` opens the file in **text** mode (this is the default way to read data)
- `'b'` opens the file in **binary** mode

There are other optional arguments for advanced users, which we will not need.
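
The behaviour of the main modes can be checked with a small experiment. The sketch below is only an illustration: it works in a temporary folder so that no real files are touched, and the file name `demo.txt` is made up for the example.

```python
import os
import shutil
import tempfile

# Work in a throwaway folder so that no real files are touched
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

f = open(path, 'w')    # 'w' creates the file (or empties an existing one)
f.write('first\n')
f.close()

f = open(path, 'a')    # 'a' keeps the existing data and writes after it
f.write('second\n')
f.close()

f = open(path)         # no mode given: defaults to 'r'
after_append = f.read()
f.close()

try:
    f = open(path, 'x')    # 'x' refuses to open a file that already exists
    f.close()
    x_failed = False
except FileExistsError:
    x_failed = True

f = open(path, 'w')    # opening with 'w' again wipes the old contents
f.close()

f = open(path, 'r')
after_rewrite = f.read()
f.close()

shutil.rmtree(tmp_dir)  # clean up the throwaway folder

print(repr(after_append))   # 'first\nsecond\n'
print(x_failed)             # True
print(repr(after_rewrite))  # ''
```

The last two lines of output show why the choice between `'w'` and `'a'` matters: simply opening an existing file with `'w'` is enough to erase it.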

<font color=red>**CAUTION**</font>
<br>
<font color=red>Be very careful when deciding whether data should be **w**ritten or **a**ppended. A wrong decision may result in loss of existing data!</font>

See `help(open)` for more details on usage.

In [None]:
help(open)

In [49]:
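import os

# 'w+' creates abc.txt (or empties it) and opens it for reading and writing
file_handle = open('abc.txt', 'w+')
file_handle.close()

# Raw string (r'...') so the backslashes in the Windows path are kept literally
os.remove(r'D:\computing-notes\Computing-notes\Computing_notes\Chapter-06\words_2.txt')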

## 6.3 Opening a text file for reading

In more advanced Python programs, we may have multiple files open at once. For example, there may be one file to store settings (`settings.ini`), one file to store logs for troubleshooting (`errors.log`), and other files which users of the program may be working on (`mydocument.docx`).

Each file that is opened has a **file handle** associated with it. 

A file handle is another Python object, with its own attributes and methods.

Run the code cell below to examine the output from `type()` and `dir()`:

In [3]:
file_handle = open('data.txt', 'w')

print(f'type: {type(file_handle)}')
print(f'attributes and methods: {dir(file_handle)}')
file_handle.close()

type: <class '_io.TextIOWrapper'>
attributes and methods: ['_CHUNK_SIZE', '__class__', '__del__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_finalizing', 'buffer', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'line_buffering', 'mode', 'name', 'newlines', 'read', 'readable', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']


The methods `read()`, `readline()`, and `readlines()` in `file_handle` each serve a different purpose.

Run each code cell below. Using any necessary code, examine the `data` variable and explain what each method does.

What argument(s) does it require? What output does it produce?

You can open `words.txt` in another window and compare the output to discern the function of `read()`.

In [1]:
file_handle = open('words.txt', 'r')
data = file_handle.read()
file_handle.close()
# Write any additional code you need below this line, to examine data
print(data)

a
aa
aaa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
aam
aani
aardvark
aardvarks
aardwolf
aardwolves
aargh
aaron
aaronic
aaronical
aaronite
aaronitic
aarrgh
aarrghh
aaru
aas
aasvogel
aasvogels
ab
aba
ababdeh
ababua
abac
abaca
abacay
abacas
abacate
abacaxi
abaci
abacinate
abacination
abacisci
abaciscus
abacist
aback
abacli
abacot
abacterial
abactinal
abactinally
abaction
abactor
abaculi
abaculus
abacus
abacuses
abada
abaddon
abadejo
abadengo
abadia
abadite
abaff
abaft
abay
abayah
abaisance
abaised
abaiser
abaisse
abaissed
abaka
abakas
abalation
abalienate
abalienated
abalienating
abalienation
abalone
abalones
abama
abamp
abampere
abamperes
abamps
aband
abandon
abandonable
abandoned
abandonedly
abandonee
abandoner
abandoners
abandoning
abandonment
abandonments
abandons
abandum
abanet
abanga
abanic
abannition
abantes
abapical
abaptiston
abaptistum
abarambo
abaris
abarthrosis
abarticular
abarticulation
abas
abase
abased
abasedly
abasedness
abasement
abasements
abaser
abasers
abases
abasgi


In [40]:
data

['a\n', 'aa\n', 'aaa\n', 'aah\n', 'aahed\n']

Q1: What does the `read()` method of `file_handle` do?

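A1: Reads the entire contents of the file and returns them as a single string, with the newline characters (`\n`) included. It takes one optional argument, the maximum number of characters to read.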

In [2]:
data = [] # Initialise empty list
file_handle = open('words.txt', 'r')
data.append(file_handle.readline())
data.append(file_handle.readline())
data.append(file_handle.readline())
data.append(file_handle.readline())
data.append(file_handle.readline())
file_handle.close()

# Write any additional code you need below this line, to examine data
print(data)

['a\n', 'aa\n', 'aaa\n', 'aah\n', 'aahed\n']


Q2: What does the `readline()` method of `file_handle` do?

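A2: Reads one line from the file and returns it as a string, including the newline character (`\n`) at the end of the line. Each call continues from where the previous one stopped, and an empty string is returned once the end of the file is reached.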

In [70]:
file_handle = open('words.txt', 'r')
data = []

# Write any additional code you need below this line, to examine data
for line in file_handle:
    data.append(line.strip().strip('a'))

file_handle.close()
data

['',
 '',
 '',
 'h',
 'hed',
 'hing',
 'hs',
 'l',
 'lii',
 'liis',
 'ls',
 'm',
 'ni',
 'rdvark',
 'rdvarks',
 'rdwolf',
 'rdwolves',
 'rgh',
 'ron',
 'ronic',
 'ronical',
 'ronite',
 'ronitic',
 'rrgh',
 'rrghh',
 'ru',
 's',
 'svogel',
 'svogels',
 'b',
 'b',
 'babdeh',
 'babu',
 'bac',
 'bac',
 'bacay',
 'bacas',
 'bacate',
 'bacaxi',
 'baci',
 'bacinate',
 'bacination',
 'bacisci',
 'baciscus',
 'bacist',
 'back',
 'bacli',
 'bacot',
 'bacterial',
 'bactinal',
 'bactinally',
 'baction',
 'bactor',
 'baculi',
 'baculus',
 'bacus',
 'bacuses',
 'bad',
 'baddon',
 'badejo',
 'badengo',
 'badi',
 'badite',
 'baff',
 'baft',
 'bay',
 'bayah',
 'baisance',
 'baised',
 'baiser',
 'baisse',
 'baissed',
 'bak',
 'bakas',
 'balation',
 'balienate',
 'balienated',
 'balienating',
 'balienation',
 'balone',
 'balones',
 'bam',
 'bamp',
 'bampere',
 'bamperes',
 'bamps',
 'band',
 'bandon',
 'bandonable',
 'bandoned',
 'bandonedly',
 'bandonee',
 'bandoner',
 'bandoners',
 'bandoning',
 'bando

Q3: What does the `readlines()` method of `file_handle` do?

A3: Reads all remaining lines of the file and returns them as a list of strings, one string per line, each still ending with its newline character (`\n`).
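
The three reading methods can be compared side by side on a tiny file. This is a minimal sketch: the file `sample.txt` and its three lines are made up for the example and created in a temporary folder.

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file

file_handle = open(path, 'w')
file_handle.write('a\naa\naaa\n')
file_handle.close()

file_handle = open(path, 'r')
whole = file_handle.read()        # the entire file as one string
file_handle.close()

file_handle = open(path, 'r')
first = file_handle.readline()    # one line, including its '\n'
second = file_handle.readline()   # the next call carries on from there
rest = file_handle.readlines()    # all remaining lines, as a list
at_end = file_handle.readline()   # nothing left to read: empty string
file_handle.close()

shutil.rmtree(tmp_dir)  # clean up the throwaway folder

print(repr(whole))   # 'a\naa\naaa\n'
print(repr(first))   # 'a\n'
print(repr(second))  # 'aa\n'
print(rest)          # ['aaa\n']
print(repr(at_end))  # ''
```

Using `repr()` when printing makes the `\n` characters visible; a plain `print(whole)` would show three separate lines instead.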

### Python conventions: file handle naming

If you are going to keep a file open for a long time and need to use the file handle in multiple functions, it is good practice to give the file handle a useful, descriptive variable name, such as `data_file`, `settings_file`, etc.

However, if you are only opening the file to read in data and then closing the file immediately (with the `close()` method of the file handle), such as in the examples above, it is acceptable (and common) to use a short variable name, often `f` or `f_handle`.

In the rest of the examples, we will be using `f` as a file handle name for quick file reading/writing.

## Opening a text file for writing

The file handle has two methods for writing data to a file: `write()` and `writelines()`.

Run each code cell below. Using any necessary code, examine the file `fruits.txt` that is created in the same directory, and explain what each method does. What argument(s) does it require? What output does it produce?

In [73]:
fruit = ['apple', 'banana', 'cherry', 'durian', 'elderberry']

f = open('fruits.txt', 'w')
for each in fruit:
    f.write(each + '\n')
f.close()

Q4: What does the `write()` method of `f` do?

A4: Writes its string argument to the file exactly as given, without adding a newline, and returns the number of characters written.

<font color = red>Note:</font>
- `'\n'` starts a new line for the next item
- `'\t'` keeps items on the same line, separated by a tab
- `' ' * n` puts `n` spaces between items on the same line

In [76]:
help(f.writelines)

Help on built-in function writelines:

writelines(lines, /) method of _io.TextIOWrapper instance



In [77]:
fruit = ['apple', 'banana', 'cherry', 'durian', 'elderberry']

f = open('fruits.txt', 'w', newline = '\n')
f.writelines(fruit)
f.close()

Q5: What does the `writelines()` method of `f` do?

A5: Writes each string in the given list to the file, one after another, without adding any newline or other separator between them.
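
One way to confirm what `writelines()` does is to read the file back straight after writing it. The sketch below assumes a made-up file name, `fruits_demo.txt`, in a temporary folder, so that `fruits.txt` is left alone.

```python
import os
import shutil
import tempfile

fruit = ['apple', 'banana', 'cherry']

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'fruits_demo.txt')  # hypothetical file name

# writelines() writes the strings back to back, adding nothing in between
f = open(path, 'w')
f.writelines(fruit)
f.close()

f = open(path, 'r')
joined = f.read()
f.close()

# To get one fruit per line, each string must already end with '\n'
lines = []
for each in fruit:
    lines.append(each + '\n')

f = open(path, 'w')
f.writelines(lines)
f.close()

f = open(path, 'r')
separated = f.read()
f.close()

shutil.rmtree(tmp_dir)  # clean up the throwaway folder

print(repr(joined))     # 'applebananacherry'
print(repr(separated))  # 'apple\nbanana\ncherry\n'
```

Despite its name, `writelines()` does not create lines by itself; it only writes what it is given.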

## Special characters: `\n` et al.

From the above tasks, you would have noticed the `\n`s in the data returned by `readline()` and `readlines()`.

When Python reads in data from a text file, the data is just one long string of text. How does it know where the next line begins?

Text format follows a specification known as [ASCII](http://www.asciitable.com/), which not only specifies how to store letters, numbers, and symbols, but also special function characters. We will learn more about ASCII in a future lesson. Some of these special function characters are listed below.

- newline character (`\n`)
- tab character (`\t`)
- backspace (`\b`)
- carriage return (`\r`)

These special function characters can be invoked through the use of the **escape character** '`\`'. This enables us to use symbols that we wouldn't otherwise be able to use.

Try each of the following lines of code one by one in the code cell below to see examples of how to print special characters using the escape character. Doing so is known as **escaping** the character.

1. `print('That\'s great!')`  
   (To use a single-quote or double-quote in a string, you have to escape the quote so that it does not get interpreted as ending the string.)
2. `print('Great job!\b?')`
   (The backspace character deletes the previous character and moves the cursor back by one.)
3. `print('Heading 1\tHeading 2\tHeading 3')`  
   (The tab character moves the cursor to the next alignment mark. Useful for printing text-only tables.)
4. `print('Line 1\nLine 2\nLine 3')`
   (The newline character moves the cursor to the next line and resets its horizontal position to the left.)
5. `print('The special character \\')`
   (Since `\` itself is a special character, if we want to print it in a string, we need to escape it too.)

In [94]:
# Try your code here
print("That's great!")
print("-"*10)
print("That\n's great!")
print("-"*10)
print("That\t 's great!")
print("-"*10)
print("That\b's great!")
print("-"*10)
print("That\r's great!")

That's great!
----------
That
's great!
----------
That	 's great!
----------
Tha's great!
----------
's great!
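
An escape sequence such as `\n` is typed as two symbols but stored as a single character. A quick way to check this, sketched below, is to compare `print()` with `repr()` and to measure the string with `len()`. The variable names are only for illustration.

```python
text = 'Line 1\nLine 2'

print(text)          # the newline takes effect: two lines are printed
shown = repr(text)   # repr() shows the escape sequence instead
print(shown)         # 'Line 1\nLine 2'

newline_length = len('\n')     # backslash + n is ONE character
backslash_length = len('\\')   # an escaped backslash is also ONE character
print(newline_length)          # 1
print(backslash_length)        # 1
```

This is also why evaluating a string in a cell (which displays its `repr()`) shows `\n`, while `print()` shows an actual line break.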


All those `\n`s in the data that was read from the file? Those mark where each line of the file ends and the next one begins.

Did you wonder why the `writelines()` method wrote all 5 fruits to `fruits.txt` on a single line? That's because neither `write()` nor `writelines()` adds a `\n` for you. To have each fruit on a new line, we need to write a `\n` after each fruit.

Complete the program code below by replacing the underscores (`_____`) with an appropriate expression or variable, so that it writes each fruit to a separate line in the text file:

In [None]:
fruit = ['apple', 'banana', 'cherry', 'durian', 'elderberry']

f = open('fruits.txt', 'w')
for each in fruit:
    f.write(each)
    f.write('\n')
f.close()


### Task 1: Tab-delimited output

Modify the code cell below so that it produces a text file with the fruits separated by tabs instead of newlines.

#### Example output

    apple	banana	cherry	durian	elderberry	

In [None]:
fruit = ['apple', 'banana', 'cherry', 'durian', 'elderberry']

f = open('fruits.txt', 'w')
for each in fruit:
    f.write(each)
    f.write('\t')
f.close()


### Task 2: Comma-delimited output

Modify the code cell below so that it produces a text file with the fruits separated by commas instead of newlines.

#### Example output

    apple,banana,cherry,durian,elderberry

In [None]:
fruit = ['apple', 'banana', 'cherry', 'durian', 'elderberry']

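f = open('fruits.txt', 'w')
# ','.join() puts a comma between the fruits only, so the line
# does not end with an extra comma (matching the example output)
f.write(','.join(fruit))
f.close()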


## Iterating over the file handle

The `readline()` method only reads in one line and moves the cursor to the next line. That makes it difficult to read in all data unless we know how many lines there are in the file.

For convenience, when in text mode (`'t'`), Python file handles can also be iterated over using a `for` loop. Each iteration returns a value that is equivalent to the output from `readline()`.  
(Run the code cell below.)

In [None]:
data = [] # Initialise empty list
f = open('words.txt', 'r')
for line in f:
    data.append(line)
f.close()

data

Each `line` is equivalent to the output from `f.readline()`, and the final list object `data` is equivalent to the output from `f.readlines()`. 
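
The equivalence can be checked directly. The sketch below reads the same file in three ways: with a `for` loop, with `readlines()`, and with repeated `readline()` calls that stop at the empty string marking the end of the file. The file `words_demo.txt` is a made-up three-line stand-in for `words.txt`, created in a temporary folder.

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'words_demo.txt')  # hypothetical stand-in file

f = open(path, 'w')
f.write('a\naa\naaa\n')
f.close()

# 1. Iterate over the file handle
by_loop = []
f = open(path, 'r')
for line in f:
    by_loop.append(line)
f.close()

# 2. Read every line in one call
f = open(path, 'r')
by_readlines = f.readlines()
f.close()

# 3. Call readline() until it returns an empty string (end of file)
by_readline = []
f = open(path, 'r')
line = f.readline()
while line != '':
    by_readline.append(line)
    line = f.readline()
f.close()

shutil.rmtree(tmp_dir)  # clean up the throwaway folder

print(by_loop)                                 # ['a\n', 'aa\n', 'aaa\n']
print(by_loop == by_readlines == by_readline)  # True
```

The third approach shows that `readline()` can still read a whole file of unknown length, but the `for` loop is shorter and easier to read.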

## Stripping whitespace

To get rid of the trailing newline `\n`, use the `strip()` string method. It removes whitespace from both ends of a string; spaces, newlines (`\n`) and tabs (`\t`) all count as whitespace. Whitespace in the middle of the string is left untouched.  
(Run the code cell below.)

In [None]:
data = [] # Initialise empty list
f = open('words.txt', 'r')
for line in f:
    data.append(line.strip())
f.close()

data
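
A few quick experiments show exactly what `strip()` removes. The strings below are made up for illustration; `rstrip()` is a related string method that strips the right-hand end only.

```python
line = '  \tapple pie \n'

both_ends = line.strip()          # whitespace removed from both ends
newline_only = line.rstrip('\n')  # only the trailing newline removed
print(repr(both_ends))            # 'apple pie'
print(repr(newline_only))         # '  \tapple pie '

# With an argument, strip() removes those characters instead of whitespace,
# again from both ends only
word = 'aardvark\n'
no_a = word.strip().strip('a')
print(repr(no_a))                 # 'rdvark'
```

Note that the space inside `'apple pie'` survives, and that `strip('a')` removes the leading `a`s of `aardvark` but not the one in the middle.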

## Appending to a file

Notice that each time you open `fruits.txt` in `'w'` mode, the old data is wiped out. You don't always want this; sometimes you just want to add more data at the end.

To do so, you need to open the file in `'a'` mode. Run the code cell below to see how it works.

In [None]:
# Let's reset the state of fruits.txt first
fruit = ['apple', 'banana', 'cherry', 'durian', 'elderberry']

# Write these fruits to fruits.txt
f = open('fruits.txt', 'w')
for each in fruit:
    f.write(each)
    f.write('\n')
f.close()

# Let's append some data after the last line
more_fruits = ['figs', 'guava', 'honeydew']
f = open('fruits.txt', 'a')
for each in more_fruits:
    f.write(each)
    f.write('\n')
f.close()

# File locks and file handles

## Context managers: the `with` keyword

In Python, you often work with items that need to be closed properly, such as file handles. There are other things you may need to work with in future that also need proper closing, such as

- network sockets and connections
- databases
- execution threads for parallel processing

If you forget to `close()` the file handle, you may end up hogging resources, or cause a program to hang.

Python has a feature, called a **context manager**, to help you automatically close any open resources once your code is done with them. This feature can be used for file handles too.

The context manager is invoked using the `with` keyword.

### Using `with` for opening files

This block of code:

    f = open('words.txt', 'r')
    data = f.readlines()
    f.close()

is equivalent to this:

    with open('words.txt', 'r') as f:
        data = f.readlines()

`f.close()` does not need to be called explicitly, because upon reaching the end of the `with` code block, Python will call it automatically.

To ensure you do not run into any problems due to file-locking, it is good practice to always open files using the `with` statement, unless you have good reasons not to do so.
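
The `closed` attribute of a file handle (listed in the `dir()` output earlier) can be used to confirm that `with` really does close the file, even when an error interrupts the block. This is a minimal sketch using a made-up file, `demo.txt`, in a temporary folder.

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

with open(path, 'w') as f:
    f.write('hello\n')
closed_after_block = f.closed   # the file was closed when the block ended

try:
    with open(path, 'r') as f:
        data = f.read()
        int(data)               # raises ValueError: 'hello\n' is not a number
except ValueError:
    pass
closed_after_error = f.closed   # still closed, although the block was interrupted

shutil.rmtree(tmp_dir)  # clean up the throwaway folder

print(closed_after_block)  # True
print(closed_after_error)  # True
```

With a plain `open()` and `close()` pair, the error in the second block would have skipped the `close()` call and left the file open.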

### Task 3: Refactoring

The act of rewriting code to better organise it is known as **refactoring**.

Refactor the code cell below to use the `with` statement: 

In [None]:
# Let's reset the state of fruits.txt first
fruit = ['apple', 'banana', 'cherry', 'durian', 'elderberry']

f = open('fruits.txt', 'w')
for each in fruit:
    f.write(each)
    f.write('\n')
f.close()

# Let's append some data after the last line
more_fruits = ['figs', 'guava', 'honeydew']

f = open('fruits.txt', 'a')
for each in more_fruits:
    f.write(each)
    f.write('\n')
f.close()


## Useful string methods: `split()`, `join()`

The [Comma-Separated Values](https://en.wikipedia.org/wiki/Comma-separated_values) format (**CSV**) is a popular format for storing organised data, such as those from spreadsheets or databases.

In this format, each row of data is stored on a new line, while values from each cell within the row are separated by commas (`,`). In some variations, tabs (`\t`) or semicolons (`;`) may also be used. The separating symbol is known as a **delimiter** (_delimit_ (verb): "determine the limits or boundaries of").

The `split()` method from strings comes in very handy for parsing each item of data from the entire line.

### The `split()` string method

Run the code cell below to understand how the `split()` method works:

In [None]:
line = 'data1,data2,data3,data4,data5'

line.split(',')

Notice that the output of `split()` is a `list`, not a string. Each item that is separated by a comma (`,`) is now an element of the list.

### The `join()` string method

Run the code cell below to understand how the `join()` method works:

In [None]:
line = ['data1', 'data2', 'data3', 'data4', 'data5']

','.join(line)

With `split()`, we called the method from `line` which was a string. However, with `join()`, the variable `line` is not a string but a list! It would not have access to the `join()` method.

So we call `join()` from the string that we want to place between the elements of the list (the separator). `join()` takes a list of strings (or any other iterable of strings) as its argument and returns a single string.
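
`split()` and `join()` undo each other, which makes them handy for converting between delimiters. The sketch below uses made-up sample values. It also shows that `join()` only accepts strings, so numbers have to be converted with `str()` first.

```python
line = 'data1,data2,data3\n'

fields = line.strip().split(',')   # ['data1', 'data2', 'data3']
rejoined = ','.join(fields)        # back to 'data1,data2,data3'
tabbed = '\t'.join(fields)         # same items, tab-delimited instead

# join() only works on strings
scores = [70, 60]
try:
    ','.join(scores)
    join_failed = False
except TypeError:
    join_failed = True

# Convert each number to a string before joining
score_strings = []
for score in scores:
    score_strings.append(str(score))
score_line = ','.join(score_strings)

print(fields)        # ['data1', 'data2', 'data3']
print(rejoined)      # data1,data2,data3
print(join_failed)   # True
print(score_line)    # 70,60
```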

## Reading CSV data into a nested list

One way to parse CSV data into Python is in the form of a **nested list**.  
(Run the code cell below.)

In [None]:
data = []
with open('grades.csv', 'r') as f:
    for line in f:
        row_list = line.strip().split(',')
        data.append(row_list)
        
data

Notice the **two levels of lists**. Each inner list represents one line of data, while the outer list contains each inner list as an element.

How are we going to get data out of this nested list? We will need two indexes.

The first index selects the inner list to access:  
(Run the code cell below.)

In [None]:
# Replace the underscores (_____) with an appropriate expression
# to print each line of data

for _____ in range(_____):
    print(_____)

`data[0]` is the first element in the outer list, `data`.

```
>>> data[0]
['A', '70']
```

The first element in `data`, `data[0]`, is a list containing two elements. To access the first element of this list, we add another index after the first:

```
>>> data[0][0]
'A'
>>> data[0][1]
'70'
```

To access the elements of the **next row**, we change the **first index**:

```
>>> data[1][0]
'B'
>>> data[1][1]
'60'
```

We can then print `data` into a custom format in a `for` loop:  
(Run the code cell below.)

In [None]:
for row in data:
    print(f'{row[0]}: {row[1]}')

Of course, you do not need to name the loop variable `row`; you could use `i` too. But it is good practice to write code and name your variables as intuitively and systematically as possible, so that it is easy for other programmers (including yourself) to understand the code when you have to read it again.
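
Everything read from a text file is a string, including values that look like numbers, so `'70'` has to be converted with `int()` before any arithmetic. The sketch below assumes `data` holds just the two rows shown in the indexing examples above; the real `grades.csv` may contain more.

```python
# Assumed sample, in the same shape as the nested list parsed from grades.csv
data = [['A', '70'], ['B', '60']]

total = 0
for row in data:
    total += int(row[1])   # convert the score from string to integer

average = total / len(data)

print(total)    # 130
print(average)  # 65.0
```

Without the `int()` conversion, `total += row[1]` would fail with a `TypeError`, because Python cannot add a string to a number.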