# Introduction to Computer Programming

## Week 7.1: Reading & Writing Files

* * *

<img src="img/full-colour-logo-UoB.png" alt="Bristol" style="width: 300px;"/>

Open Spyder now 

__Reading files__ : Importing data (e.g. experiment results) into a program

__Writing files__ : Exporting data - storing data outside of the program. <br>(e.g. output of a calculation) 

__Delimited file:__<br>
Uses a set character (tab, space, vertical bar etc) to separate values.

__CSV (comma-separated values):__ <br>A *delimited* text file that uses a comma to separate values.

The CSV file is a widely used format for storing tabular data in plain text and is supported by software applications such as Microsoft Excel and Google Sheets.
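
As a minimal sketch of what "delimited" means in practice, the snippet below splits one line of text into its values. The strings `'Elena,550\n'` and `'Elena|550'` are illustrative data only, not taken from a real file.

```python
# One line of a CSV file is just a string with commas between the values.
line = 'Elena,550\n'

# strip() removes the trailing newline; split(',') breaks the string at each comma.
values = line.strip().split(',')
print(values)

# A different delimiter (here a vertical bar) only changes the argument to split().
bar_values = 'Elena|550'.split('|')
print(bar_values)
```

Note that the values are read as strings, so a score would need converting with `int()` before doing arithmetic with it.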

<img src="img/csv_file_example.png" alt="Bristol" style="width: 400px;"/>

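Python tools for reading and writing text data files (.txt, .csv, .dat):
- `open()`: built-in function that opens a file and returns a file object
- `write()`: file object method that writes a string to the file
- `close()`: file object method that closes the file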

Before a file can be read or written to, it must be opened using the `open()` function. 

```python
open(file_path, mode_specifier)
```



__Mode specifier:__ <br> An open file can be read, overwritten, or added to, depending on the mode specifier used to open it. 


| Mode specifier | Read (R) / Write (W) | File must already exist | If no file exists | Effect of `write()` | Stream position when opened |
| :-: | :-: | :-: | :-: | :-: | :-: | 
| `r` | R |Yes | N/A | N/A | start |
| `w` | W |No | Creates new file | overwrites all previous contents | start |
| `a` | W |No | Creates new file | appends text to end of file| end |
| `r+` | R+W |Yes | N/A | overwrites previous contents | start |
| `w+` | R+W |No | Creates new file | overwrites all previous contents | start |
| `a+` | R+W | No | Creates new file | appends text to end of file | end |

__append__: keep the existing contents and start writing at the end of the file<br>
__write__: erase the existing contents and start writing at the beginning of the file
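
The difference between `w` and `a` can be sketched as follows. This is an illustrative example only: it uses a temporary directory (via the standard `tempfile` module) and a hypothetical file name `demo.txt` so that no real files are changed.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')   # hypothetical file name

    f = open(path, 'w')      # 'w': creates the file (or erases an existing one)
    f.write('first\n')
    f.close()

    f = open(path, 'a')      # 'a': keeps the contents and writes at the end
    f.write('second\n')
    f.close()

    f = open(path, 'r')
    after_append = f.read()  # both lines are present
    f.close()

    f = open(path, 'w')      # 'w' again: the previous contents are erased
    f.write('third\n')
    f.close()

    f = open(path, 'r')
    after_write = f.read()   # only the new line is present
    f.close()

print(repr(after_append))
print(repr(after_write))
```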

# Importing a file from a different directory

When using `open` we must give the path to the file, either in full (absolute) or relative to the current working directory.

Like when importing Python files/modules, often we want to read/write a file located in a different directory. 

The directory must already exist. 

The file does not need to already exist if writing (`a`, `a+`, `w`, `w+`)
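
The two rules above can be checked with a short sketch. The folder and file names (`missing_folder`, `new_file.txt`, `no_such_file.txt`) are hypothetical, and `try`/`except` is used here only to catch the error so that the example runs to the end.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # The directory exists, the file does not: 'w' creates the file.
    new_file = os.path.join(tmp_dir, 'new_file.txt')
    open(new_file, 'w').close()
    file_created = os.path.exists(new_file)

    # The directory 'missing_folder' does not exist: open() raises an error.
    bad_path = os.path.join(tmp_dir, 'missing_folder', 'new_file.txt')
    try:
        open(bad_path, 'w').close()
        directory_error = False
    except FileNotFoundError:
        directory_error = True

    # Reading ('r') a file that does not exist also raises FileNotFoundError.
    try:
        open(os.path.join(tmp_dir, 'no_such_file.txt'), 'r').close()
        read_error = False
    except FileNotFoundError:
        read_error = True

print(file_created, directory_error, read_error)
```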

## Downstream file location

`/` is used to indicate a sub-directory downstream of the current location.


```text

Documents/
│
├── Folder_1/
│   └── myScores.txt
│
├── Folder_2/
│   └── scores.txt
│
└── read_write.py

```

__Example:__ Open a downstream file within `read_write.py`:

    f = open('Folder_1/myScores.txt', 'w')

## Upstream file location
`../` is used to indicate a location one directory upstream of the current location.


```text

Documents/
│
├── Folder_1/
│   └── read_write.py
│
├── Folder_2/
│   └── scores.txt
│
└── myScores.txt

```

__Example:__ Open an upstream file within `read_write.py`:

    f = open('../myScores.txt', 'w')

__Example:__ Open a file in a different directory at the same level as the directory containing `read_write.py`:

    f = open('../Folder_2/scores.txt', 'w')
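
To see which file each relative path points to, the path can be joined to the starting directory and simplified. The sketch below assumes the program is run with `Documents/Folder_1` as the current working directory (relative paths are resolved from the working directory, which is usually, but not always, the folder containing the script). It uses the standard `posixpath` module so that the `/` separators are handled the same way on every operating system.

```python
import posixpath

# Assumed working directory, matching the directory tree above.
script_dir = 'Documents/Folder_1'

# Join the starting directory to each relative path, then simplify the result.
upstream = posixpath.normpath(posixpath.join(script_dir, '../myScores.txt'))
sibling = posixpath.normpath(posixpath.join(script_dir, '../Folder_2/scores.txt'))

print(upstream)   # the '../' step moves up from Folder_1 to Documents
print(sibling)    # up to Documents, then down into Folder_2
```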

`open()` returns a *file object* (an instance of a file class).

An object (an instance of a class) can have methods.

Methods are actions or functions that the file object is able to perform (`write`, `close`, ...).
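
As a small illustration, a file object also carries information about itself. The sketch below inspects a file object opened on a temporary file; the variable names are chosen for this example only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_file.txt')

    f = open(path, 'w')
    type_name = type(f).__name__   # name of the class the file object belongs to
    mode = f.mode                  # the mode specifier used to open the file
    closed_before = f.closed       # False: the file is still open
    f.write('hello world')         # write is a method of the file object
    f.close()                      # so is close
    closed_after = f.closed        # True: the file has been closed

print(type_name, mode, closed_before, closed_after)
```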

# Writing files `w`

`write` can be used to write string data to a text file.

```python
f = open('my_file.txt', 'w') # mode specifier to write

f.write('hello world')

f.close()
```

A file type that is often used to store tabulated data is the .csv file.

.csv files can be opened in spreadsheet programs such as Excel.

A .csv file is simply a text file, with row items separated (or *delimited*) by commas.

__Example:__ <br>
Write the high score table shown to a new file, `sample_data/scores.txt`, with each name and score separated by a space.

| |  | 
| :-: | :-: |  
| Elena | 550 | 
| Sajid | 480 | 
| Tom | 380 | 
| Farhad | 305 | 
| Manesha | 150 | 

```python
names = ['Elena', 'Sajid', 'Tom', 'Farhad', 'Manesha']
scores = [550, 480, 380, 305, 150]
```

```python
f = open('sample_data/scores.txt', 'w')

for n, s in zip(names, scores):
    f.write(n + ' ' + str(s) + '\n')   # one 'name score' pair per line

f.close()
```
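
The loop above separates each name and score with a space. As a sketch of the comma-delimited alternative, the same loop with `','` in place of `' '` produces .csv-style text. A temporary directory and a shortened table are used here so the example does not depend on the `sample_data` folder.

```python
import os
import tempfile

names = ['Elena', 'Sajid', 'Tom']
scores = [550, 480, 380]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scores.csv')

    f = open(path, 'w')
    for n, s in zip(names, scores):
        f.write(n + ',' + str(s) + '\n')   # comma between the two values
    f.close()

    f = open(path, 'r')
    csv_text = f.read()                    # read the file back to check it
    f.close()

print(csv_text)
```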

##### Try it yourself
__Example:__ <br>
Write the high score table shown to a new file with the filename scores.txt 

| |  | 
| :-: | :-: |  
| Elena | 550 | 
| Sajid | 480 | 
| Tom | 380 | 
| Farhad | 305 | 
| Manesha | 150 | 

### Closing Files
Why do we need to close a file?
1. Files opened with `open()` are not closed automatically.
2. Closing saves any pending changes to the file.
3. A file left open can prevent other programs from accessing it.

`close` is just a method, belonging to the file object. 

The simplest open-close process is shown below.

This erases the contents of `file.txt` in the folder `sample_data`, or creates the file if it does not already exist.

```python
open('sample_data/file.txt', 'w').close()
```
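
A short sketch of what closing does: after `close()`, the file object reports that it is closed and can no longer be written to. The example assumes a temporary file, and uses `try`/`except` only to catch the resulting error.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'file.txt')

    f = open(path, 'w')
    f.write('hello')
    f.close()
    is_closed = f.closed          # True once close() has been called

    try:
        f.write(' world')         # writing to a closed file is not allowed
        write_failed = False
    except ValueError:
        write_failed = True

    f = open(path, 'r')
    contents = f.read()           # only the text written before close()
    f.close()

print(is_closed, write_failed, contents)
```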


# Appending files `a`

__Example:__ Append (add a new entry to the end of) scores.txt so that the table reads as shown below.

The code structure is identical to writing, apart from the mode specifier.
  
| |  | 
| :-: | :-: |  
| Elena | 550 | 
| Sajid | 480 | 
| Tom | 380 | 
| Farhad | 305 | 
| Manesha | 150 | 
| Jen | 100 |


```python
f = open('sample_data/scores.txt', 'a')   # mode specifier to append

f.write('Jen 100\n')

f.close()
```


# Reading Files `r`

`r` is the default mode, so the mode specifier can be omitted when reading.

A file object is:
- iterable (can be used in a `for` loop etc.)
- not subscriptable (individual elements cannot be accessed by index)

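```python
f = open('sample_data/scores.txt', 'r')

for line in f:      # iterable: loop over the lines of the file
    print(line)

f.close()
```

Trying to index the file object directly (e.g. `f[0]`) raises an error:

```text
TypeError: '_io.TextIOWrapper' object is not subscriptable
```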



The __stream position__:

- can be thought of as a cursor.
- moves forward as the file is read or written, so it is at the end of the file once the whole file has been read.
- can be returned to the start (or moved to any position) with `seek`.

```python
f = open('sample_data/scores.txt', 'r')

for line in f:              # iterable
    print(line, end='')     # each line is a string

#f.seek(0)      # stream position is at end of file after the loop above
                # can be returned to start with seek

for line in f:              # prints nothing unless f.seek(0) is uncommented
    print(line, end='')

f.close()
```

```text
Elena 550
Sajid 480
Tom 380
Farhad 305
Manesha 150
Jen 100
```
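
The stream position can also be inspected with the file object's `tell` method. The following is a minimal sketch using a temporary file with two illustrative lines; it shows that a second `read()` returns nothing until `seek(0)` moves the position back to the start.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scores.txt')

    f = open(path, 'w')
    f.write('Elena 550\nSajid 480\n')
    f.close()

    f = open(path, 'r')
    start = f.tell()          # position when opened with 'r'
    first_pass = f.read()     # reads everything; position is now at the end
    second_pass = f.read()    # nothing left to read
    f.seek(0)                 # return to the start
    third_pass = f.read()     # the whole file can be read again
    f.close()

print(start)
print(repr(second_pass))
print(third_pass == first_pass)
```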


If we convert the file object to a list:
- it is subscriptable
- the list can be read as many times as needed, with no stream position to reset
- the stream position of the file object is at the end of the file after the list conversion operation

__Example:__ 

Print the list of names and a list of scores from the file `'sample_data/scores.txt'`

Print the name and score of the winner from the file `'scores.txt'`

```python
f = open('sample_data/scores.txt')

file = list(f)          # list of strings, one per line

#print(file[0:2])

for line in file[:2]:   # the list can be sliced, unlike the file object
    print(line)

f.close()
```

```text
Elena 550

Sajid 480

```
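
One possible way to build the list of names and the list of scores, and to find the winner, is sketched below. It assumes every line has the form `name score`, and it writes a shortened copy of the table to a temporary file so that it can be run on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scores.txt')

    f = open(path, 'w')
    f.write('Elena 550\nSajid 480\nTom 380\n')
    f.close()

    f = open(path)
    lines = list(f)               # list of strings, one per line
    f.close()

names = []
scores = []
for line in lines:
    name, score = line.split()    # split at the space; the newline is dropped
    names.append(name)
    scores.append(int(score))     # convert the score from string to integer

print(names)
print(scores)

winner_index = scores.index(max(scores))   # position of the highest score
print(names[winner_index], scores[winner_index])
```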



##### Try it yourself

__Example:__ 

Print the first three names and scores from the file you created earlier `'scores.txt'`


# Reading and Writing with `r+`, `w+`, `a+`

All three modes can be used to both read and write files.

Differences that determine which to use:
- Stream position when opened
- How the stream position when opened affects `write()`
- Whether the existing contents are erased when the file is opened (`w+`)


<br>

<br>


| Mode specifier | Read (R)/Write (W)| File must already exist | If no file exists  | `write()`| Stream position when opened|
| :-: | :-: | :-: | :-: | :-: | :-: | 
| `r+` | R+W |Yes | N/A | overwrites previous contents | start |
| `w+` | R+W |No | Creates new file | overwrites all previous contents | start |
| `a+` | R+W | No | Creates new file | appends text to end of file | end |
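
The "stream position when opened" column can be checked with `tell`. This is an illustrative sketch only, using a temporary file that contains the three characters `abc`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')

    f = open(path, 'w')
    f.write('abc')                 # file now holds 3 characters
    f.close()

    f = open(path, 'r+')
    r_plus_start = f.tell()        # start of file
    f.close()

    f = open(path, 'a+')
    a_plus_start = f.tell()        # end of file (after the 3 characters)
    f.close()

    f = open(path, 'w+')
    w_plus_start = f.tell()        # start of file
    w_plus_contents = f.read()     # empty: 'w+' erased the contents on opening
    f.close()

print(r_plus_start, a_plus_start, w_plus_start, repr(w_plus_contents))
```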



# `a+`
__Example__: When we want to edit (append only) and read.


The stream position is: 
- at the *end* when opened (must be moved to the start to read). 
- always moved to the *end* before writing when `write` is called (previous contents never overwritten).
- at the *end* after writing.  


```python
f = open('sample_data/scores.txt', 'a+')

f.write('Tim 50\nMajid 500\n')   # added to the end of the file

f.seek(0)                        # return to start before reading

for line in f:
    print(line, end='')

f.close()
```

```text
Elena 550
Sajid 480
Tom 380
Farhad 305
Manesha 150
Jen 100
Tim 50
Majid 500
```
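
A small sketch of the "never overwritten" behaviour of `a+`: even if the stream position is moved to the start before writing, the new text still goes to the end of the file. The file contents here are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')

    f = open(path, 'w')
    f.write('AAA\n')
    f.close()

    f = open(path, 'a+')
    f.seek(0)             # move to the start of the file...
    f.write('BBB\n')      # ...but the text is still written at the end
    f.seek(0)
    contents = f.read()
    f.close()

print(repr(contents))
```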


# `r+`
__Example__: When we want to read and edit.

The stream position is: 
- at the *start* when opened  
- at the *end* after reading 

```python
f = open('sample_data/scores.txt', 'r+')

for line in f:              # read: stream position ends up at end of file
    print(line, end='')

f.write('Ben 50\n')         # written at the end of the file
f.write('Ola 500\n')

f.seek(0)                   # return to start before reading again

for line in f:
    print(line, end='')

f.close()
```

```text
Elena 550
Sajid 480
Tom 380
Farhad 305
Manesha 150
Jen 100
Tim 50
Majid 500
Elena 550
Sajid 480
Tom 380
Farhad 305
Manesha 150
Jen 100
Tim 50
Majid 500
Ben 50
Ola 500
```


# `w+`
__Example__: When we want to overwrite a file, then read it.

The stream position is:
- at the *start* when opened (previous contents are erased).
- at the *end* after writing (subsequent lines added using `write` are appended to the file, not written over the earlier lines, until the file is closed).

Writing *must* happen before reading, because the file is empty when opened.


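```python
f = open('sample_data/scores.txt', 'w+')

f.write('Tim 50\nMajid 500\n')

for line in f:          # prints nothing: stream position is at end of file
    print(line)

f.write('Ola 500\n')    # added after the first two lines

f.seek(0)               # return to start before reading

for line in f:
    print(line)

f.close()
```

```text
Tim 50

Majid 500

Ola 500

```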




# Editing file contents  - a word of warning! 

Unlike the `a+` mode specifier, `r+` and `w+` allow writing from *anywhere* in the file.

When characters are overwritten, this can lead to unexpected behaviour.  

__Example:__ Replace the first two items in the table in scores.txt with two new entries. 

```python
f = open('sample_data/scores.txt', 'r+') # stream position at start of file

f.write('Sid 50\n')                      # overwrite
f.write('Jo 20\n')

f.seek(0)

for line in f:                           # read
    print(line, end='')

f.close()
```

```text
Sid 50
Jo 20
500
Ola 500
```


The original data is longer than the replacement data.

Some original characters are overwritten with new letters and `'\n'` characters.

The extra characters are left as they were in the original file.

```text
    Tim 50\nMajid 500\n
    Sid 50\nJo 20\n
```
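
The character-by-character overwrite can be reproduced without touching a file. The sketch below uses `io.StringIO`, an in-memory text stream from the standard library, as a stand-in for a file opened with the stream position at the start. This substitution is an assumption made for illustration; the example in these notes uses a real file.

```python
import io

original = 'Tim 50\nMajid 500\n'
replacement = 'Sid 50\nJo 20\n'

stream = io.StringIO(original)   # stream position starts at 0
stream.write(replacement)        # overwrites the first len(replacement) characters
result = stream.getvalue()

print(len(original), len(replacement))   # the replacement is 4 characters shorter
print(repr(result))                      # the last 4 original characters remain
```

The four leftover characters (`500` and a newline) are what appear as the stray line `500` in the output above.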
   

It is advisable to:
1. read the data you want to edit into an easy-to-edit Python data structure (e.g. a list) and make the changes there
2. overwrite the original file with the edited data

__Example__: Edit the file to remove the unwanted line that reads `500` (between Jo and Ola).  

The file can be erased from a given position onwards with `truncate()` (the default position is the current stream position).

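```python
f = open('sample_data/scores.txt', 'r+')

file = list(f)      # read all lines into a list

del file[2]         # remove the unwanted third line

print(file)

f.seek(0)           # return to start of file

for line in file:   # overwrite with the edited lines
    f.write(line)

f.truncate()        # erase leftover characters after the current position

f.close()
```

```text
['Sid 50\n', 'Jo 20\n', 'Ola 500\n']
```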


```python
f = open('sample_data/scores.txt', 'r+')

for line in f:               # read file contents
    print(line, end='')

f.close()
```

```text
Sid 50
Jo 20
Ola 500
```
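
An alternative sketch of the same edit, which avoids `seek` and `truncate` by opening the file twice: once with `r` to read the lines into a list, and once with `w` to overwrite the file with the edited list. This is one possible approach rather than the method used above, and it runs on a temporary copy of the data.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scores.txt')

    f = open(path, 'w')
    f.write('Sid 50\nJo 20\n500\nOla 500\n')   # includes the unwanted line
    f.close()

    f = open(path, 'r')
    lines = list(f)          # read every line into a list
    f.close()

    del lines[2]             # remove the unwanted third line

    f = open(path, 'w')      # 'w' erases the old contents on opening
    for line in lines:
        f.write(line)
    f.close()

    f = open(path, 'r')
    result = f.read()
    f.close()

print(result)
```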


# Automatically closing files
It can be easy to forget to close a file with `close()`.

Opening the file with `with open()` instead of `open()` removes the need for `close()`, because the file is closed automatically when the indented block ends:

```python
with open('sample_data/scores.txt', 'a') as f:
    f.write('Ria 460 \n')

print('next bit of the program') # Code unindents. File automatically closed
```

```text
next bit of the program
```
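
The automatic closing can be confirmed with the file object's `closed` attribute. The sketch below also shows that the file is closed even if an error occurs inside the `with` block. The `RuntimeError` is raised deliberately for illustration, and a temporary file is used.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scores.txt')

    with open(path, 'w') as f:
        f.write('Ria 460\n')
        closed_inside = f.closed     # False: still inside the with block

    closed_after = f.closed          # True: closed when the block ended

    try:
        with open(path, 'a') as g:
            g.write('partial\n')
            raise RuntimeError('something went wrong')   # deliberate error
    except RuntimeError:
        pass

    closed_after_error = g.closed    # True: closed despite the error

print(closed_inside, closed_after, closed_after_error)
```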


```python
with open('sample_data/scores.txt', 'r') as f:
    print(f.read())
```

```text
Sid 50
Jo 20
Ola 500
Ria 460 

```



# Summary 
- Python functions for reading and writing files: `open()`, `read()`, `write()`, `close()`
- The __mode specifier__ defines operations that can be performed on the opened file 
- Files must always be closed after opening
- Files can be automatically closed by opening with `with open`


__Extra Example:__ Change the first row of scores.txt to `Mia 700`

```python
with open('sample_data/scores.txt', 'r+') as f:

    file = list(f)                        # convert to list of strings (lines)

    L = [line.split() for line in file]   # list of lists
    print(L)
    names = [i[0] for i in L]             # names and scores
    scores = [i[1] for i in L]

    ##################################

    names[0] = 'Mia'                      # replace element 0
    scores[0] = '700'

    f.seek(0)                             # go to start

    for n, s in zip(names, scores):
        f.write(n + ' ' + s + '\n')       # overwrite original contents

    f.truncate()                          # trim any trailing characters
```

```text
[['Sid', '50'], ['Jo', '20'], [], ['Ola', '500'], ['Ria', '460'], ['Ria', '460']]

IndexError: list index out of range
```

This run fails because the file contained a blank line: `split()` returns an empty list `[]` for a blank line, so `i[0]` raises an `IndexError`. Skipping blank lines when building `L` (for example with `if line.strip()`) avoids the error.
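
A possible blank-line-tolerant version is sketched below. It is an illustration under stated assumptions rather than a definitive fix: it runs on a temporary file whose contents (including a blank third line) are made up for the example, and the name `rows` is introduced here.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'scores.txt')

    with open(path, 'w') as f:
        f.write('Sid 50\nJo 20\n\nOla 500\n')   # note the blank third line

    with open(path, 'r+') as f:
        # split each line into [name, score], skipping blank lines
        rows = [line.split() for line in f if line.strip()]

        rows[0] = ['Mia', '700']                 # replace the first row

        f.seek(0)                                # go to start
        for name, score in rows:
            f.write(name + ' ' + score + '\n')   # overwrite original contents
        f.truncate()                             # trim any trailing characters

    with open(path, 'r') as f:
        result = f.read()

print(result)
```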