# Computer Programming and Algorithms

## Week 6.1: Reading Files

<img src="img/full-colour-logo-UoB.png" alt="Bristol" style="width: 300px;"/>

# Aims

In this video we will learn:

* Opening and closing a file using a computer program
* Reading different types of file

In the following examples we will:
* use the __local__ (relative) path, not the __global__ (absolute) path
* use file path notation for Unix systems (forward slash `/`). On Windows, paths are usually written with a backslash `\`, although Python's `open` also accepts forward slashes there
* work on reading files located in the same directory as the computer program, or downstream of it
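As a rough illustration of the difference between a local (relative) path and a global (absolute) path, the sketch below uses Python's standard `pathlib` module. The folder name `Example_1` is only an illustrative choice, and no file needs to exist for the code to run.

```python
from pathlib import Path

# A relative (local) path is interpreted from the current working directory
relative = Path('README.txt')

# Joining it to the current working directory gives the absolute (global) path
absolute = Path.cwd() / relative

print(relative.is_absolute())
print(absolute.is_absolute())

# The / operator joins path components using the separator of the current system
nested = Path('Example_1') / 'README.txt'
print(nested)
```

The last line prints the path with `/` on Unix systems and `\` on Windows, which is one way to avoid writing the separator by hand.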

# Opening and closing a file using a computer program

Consider the file system below

```
Week_6/
|
|--- Example_1/
        |
        |--- program_1.py
        |--- README.txt 
```


We create a *file object* in program_1.py using:

```python
file = open('README.txt')
```

__Object__: A piece of data that has its own attributes and behaviour (e.g. an int, a string, a list...)

Just like other objects, you can give the file object a name of your choosing

```python
reader = open('README.txt')
```

```python
my_data = open('README.txt')
```

Some methods that belong to the `file` object:

`read`: reads the contents of the file

`close`: closes the file <br>(Before the program exits, the file must be closed)

```python
file = open('README.txt')

print(file.read())

file.close()
```

Output:

```
Computer programming and algorithms
```



Another way to create a file object:

```python
with open('README.txt') as file:
    print(file.read())
```

Output:

```
Computer programming and algorithms
```



Notice that the second line is indented with respect to the first. 

The `with` statement closes the file at the end of the indented block of code.

There is no need to use `close`.

This avoids the situation where the file is left open because:
- you forget to include `close`
- the program terminates due to an error before `close` is executed
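The behaviour described above can be checked with a small sketch. It assumes nothing about your own files: it first creates a throwaway `README.txt` in a temporary folder (writing with the `'w'` mode is an addition not covered in this section), then deliberately raises an error inside the `with` block.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'README.txt')
with open(path, 'w') as file:
    file.write('Computer programming and algorithms')

try:
    with open(path) as file:
        contents = file.read()
        raise ValueError('something went wrong')  # simulate an error
except ValueError:
    pass

# The name `file` still exists, but the with statement has closed the file
print(file.closed)
```

The `closed` attribute is `True` even though the block ended with an error and `close` was never called explicitly.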

We will use the `with open()` structure in all following examples 

# What is a file?

A file is a sequence of bytes (a byte is a unit of data that is eight binary digits long) used to store data.

The file type determines what these bytes represent.

In a `.txt` file, the bytes represent characters (for plain English text, one byte per character).

This mapping of bytes to characters is called an *encoding*.

UTF-8 is a widely used encoding, e.g.

A = 01000001 (65 in decimal, 41 in hexadecimal)

B = 01000010 (66 in decimal, 42 in hexadecimal)

C = 01000011 (67 in decimal, 43 in hexadecimal)
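The byte values of individual characters can be inspected directly in Python. The sketch below uses the built-in `str.encode` method and `format`; the variable names are illustrative only.

```python
rows = []
for character in 'ABC':
    encoded = character.encode('utf-8')   # str -> bytes
    value = encoded[0]                    # integer value of the single byte
    rows.append((character, format(value, '08b'), value, hex(value)))

for row in rows:
    print(row)
```

Each row shows the character, its byte in binary, and the same value in decimal and hexadecimal.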

When `open` is used to read a text file, the bytes are decoded into characters (UTF-8 is the default encoding on most systems), so we see characters, not bytes.

```python
with open('README.txt') as file:
    print(file.read())
```

Output:

```
Computer programming and algorithms
```



The first 3 characters of 'README.txt' are decoded from their byte values as:

C = 01000011 (67 in decimal)

o = 01101111 (111 in decimal)

m = 01101101 (109 in decimal) 
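Because the default encoding can depend on the system, one option is to state it explicitly with the `encoding` argument of `open`. The sketch below is a self-contained illustration: it writes a throwaway `README.txt` to a temporary folder first (the `'w'` mode is an addition not covered in this section), then reads back the first three characters and lists their byte values.

```python
import os
import tempfile

# Throwaway file so the example does not depend on your own files
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'README.txt')

with open(path, 'w', encoding='utf-8') as file:
    file.write('Computer programming and algorithms')

with open(path, encoding='utf-8') as file:
    first_three = file.read(3)   # when reading text, read(3) returns 3 characters

# Convert the characters back to their byte values
byte_values = list(first_three.encode('utf-8'))
print(first_three, byte_values)
```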


# Reading different types of data

Files fall into two broad categories:

__Text files__: Human-readable data. <br>Bytes represent plain text characters <br>e.g. .py, .csv, .json, .txt

__Binary files__: Data that is not intended to be human-readable. <br>Bytes do not represent plain text characters; their meaning is defined by the file format. <br>e.g. executable programs (.exe, .bin), images (.jpg, .png, .gif), audio (.mp3, .wav), video (.mp4, .avi), compressed files (.zip).

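```python
with open('README.txt') as file:
    print(file.read())
```

Output:

```
Computer programming and algorithms
```

Trying to read an image file in the same way raises an error, because its bytes are not valid text:

```python
with open('snake.png') as file:
    print(file.read())
```

Output:

```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
```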

To open a binary file, we give a second argument within the parentheses of the `open` function. 

This is called the *mode*.

The mode `'rb'` stands for `r`ead and `b`inary.
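To see what the mode changes, the sketch below reads the same file twice, once as text and once as binary. It creates its own throwaway file in a temporary folder, so the file name and contents are illustrative assumptions.

```python
import os
import tempfile

# Throwaway file so the example is self-contained
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'README.txt')
with open(path, 'w', encoding='utf-8') as file:
    file.write('Computer programming')

# Text mode ('r' is the default): bytes are decoded into a string
with open(path, 'r', encoding='utf-8') as file:
    as_text = file.read()

# Binary mode: the raw bytes are returned without decoding
with open(path, 'rb') as file:
    as_bytes = file.read()

print(type(as_text), as_text)
print(type(as_bytes), as_bytes)
```

The text read gives a `str`, while the binary read gives a `bytes` object, printed with a leading `b`.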

```python
with open('snake.png', 'rb') as file:
    print(file.read())
```

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01\\\x00\x00\x01\xa2\x08\x03\x00\x00\x00\x9cP\x8ew\x00\x00\x00TPLTEx\xbc>)F\t\x80\xc7D\xfb\xf9\x9fJ\x81\x0f\xcb\xcbd\x05\x06\x03\xef\xefv\xef\xee\xef\xff\xff\xff]\x9b!\xa8\xeee $\x18\xdc\xdcn\xdb\xdb\xdcj\xaa0\xde\x0bHL_#\x98\xdbY\r7[LLT\x9e\x9f\xa4wy\x80\xc0\xc1\xc4\x83\x82C\xa9\xa9V\x81\r-\xfba\x92_\xb6\xb4\xa5\x00\x00 \x00IDATx\xda\xec\x9d\x89v\xe3\xaa\x12E\x85\xa2`I.\xc1\x92\xc2`\xff\xff\x8f>\x8a\x19\r\xb6\xf3\xec\xbe\xdd\t&\xe9L\x02k\xf5\xe6\xe8P\x8cn\x862\xf5ez_}\xe6j\xf3F\xf0\x86\xfb\x86\xfb\xbe\xfa\x86\xfb\x86\xfb\x86\xfb\xbe\xfa#\xe0\xae\xf2\xbd\xe1\xbe\xf0j\xaf\xa5T\x98$\xb39\xdfpo_\xf5\xcc\x10\x98\xd4\xfd\xe6z\x0eV\t\xa0\x14\x80\x10\x02\x14\x946\x7fz\xc3\xbduu\x18\x98\xb4\xc8\x90\x18~\x17\n\to\xcbb.\xd25\xcb\xd9\'N\xa8\xd0\xc3\x1b\xee\xf1U\xa3Fa\x948zf\xcb\xd2\x8c\xb3\x17\xe5P\x94\x95@gn3-\xfe\xe3|n\x0c^\xf6\x86{pu\xe8\r3\xc2=\xd8 \xc9s\xdb\xe5|=\xda\xce\x91\xf5`\x17\xfc\xc9\xe0\x05\x90o\xb8{W\x87A\x0b:\xb7\x9e\xec\x82\xa9]\x16\

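As with text data, the characters printed here represent bytes of data, and each byte has a meaning. Bytes that correspond to printable ASCII characters are shown as those characters (e.g. `PNG`); other bytes are shown as escape sequences such as `\x89` or `\n`.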

Examples:

`\x89` = start of PNG file = `0x89` = 137 (decimal) = 10001001 (binary byte)

`PNG` = `0x50 0x4E 0x47` (PNG in ASCII encoding) = 80 78 71 (decimal) = 01010000 01001110 01000111 (binary bytes)

`\n` = Unix-style line ending (Windows line ending is `\r\n`) = `0x0A` = 10 (decimal) = 00001010 (binary byte)
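These conversions can be reproduced in Python. Looping over a `bytes` object gives the integer value of each byte, which the sketch below formats as hexadecimal, decimal and binary. The variable names are illustrative.

```python
# The first 8 bytes of a PNG file, as seen in the output above
signature = b'\x89PNG\r\n\x1a\n'

rows = []
for byte in signature:               # each item is an integer from 0 to 255
    rows.append((hex(byte), byte, format(byte, '08b')))

for row in rows:
    print(row)
```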

Passing a number to `read` reads only that many bytes, continuing from where the previous `read` stopped:

```python
with open('snake.png', 'rb') as file:
    print(file.read(1))
    print(file.read(3))
    print(file.read(2))
    print(file.read(1))
    print(file.read(1))
```

Output:

```
b'\x89'
b'PNG'
b'\r\n'
b'\x1a'
b'\n'
```
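One practical use of reading the first few bytes is to check a file's type from its contents instead of its extension. The sketch below is a minimal example under stated assumptions: the function name `looks_like_png` and the file names are hypothetical, and the test files are created in a temporary folder using the `'wb'` (write binary) and `'w'` modes, which are additions not covered in this section.

```python
import os
import tempfile

# The 8 bytes that begin every PNG file
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def looks_like_png(path):
    """Return True if the file at `path` starts with the PNG signature."""
    with open(path, 'rb') as file:
        return file.read(8) == PNG_SIGNATURE


# Throwaway files so the example runs anywhere (names are hypothetical)
folder = tempfile.mkdtemp()
fake_image = os.path.join(folder, 'fake.png')
notes = os.path.join(folder, 'notes.txt')

with open(fake_image, 'wb') as file:      # 'wb' = write binary
    file.write(PNG_SIGNATURE + b'...')
with open(notes, 'w') as file:
    file.write('Computer programming and algorithms')

print(looks_like_png(fake_image))
print(looks_like_png(notes))
```

Only the first file begins with the PNG signature, so the two calls print `True` and `False`.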


A `.docx` file is also a binary file, so it is opened with the `'rb'` mode:

```python
with open('Document.docx', 'rb') as file:
    print(file.read())
```

b'PK\x03\x04\x14\x00\x00\x00\x08\x00\xaf\x98IJ\x8d\x130\xea:\x01\x00\x00\xa7\x02\x00\x00\x10\x00\x00\x00docProps/app.xml\xad\x92\xcdN\x021\x14\x85_\xa5\xe9\xde\xe9\xe0\x82\x18\xc2\x0c1\xb0p\xa1\xc6\x04\xc4um\xef0\x8d\xfdK{A\xe6\xd9\\\xf8H\xbe\x82\xed \x0c\xea\xce\xd8]\xcf\xf9z\xeeO\xfa\xf1\xf6>\x9d\xed\x8d&;\x08Q9[\xd1QQR\x02V8\xa9\xec\xa6\xa2[l.\xae(\x89\xc8\xad\xe4\xdaY\xa8h\x07\x91\xce\xea)\xf7\x93\x87\xe0<\x04T\x10I\xca\xb0\xb1\xa2-\xa2\x9f0\x16E\x0b\x86\xc7"\xd969\x8d\x0b\x86c\xba\x86\rsM\xa3\x04,\x9c\xd8\x1a\xb0\xc8.\xcbr\xcc`\x8f`%\xc8\x0b\x7f\n\xa4\x87\xc4\xc9\x0e\xff\x1a*\x9d\xc8\xfd\xc5\xf5\xaa\xf3C\x1e\xf7\xff\xd9d\xbf\x85k\xef\xb5\x12\x1c\xd3\xfa\xea;%\x82\x8b\xaeA\xf2\xe4\x82$)\x93`\x0b\xe4\x15\x9e\xa7\xec\x07\x9a\x9f\xa6\nK\x10\xdb\xa0\xb0\xab\xcb\x9e8W2\xb1\x14\\\xc3<U\xac\x1b\xae#\xf4\xcc\xa0eb\xee\x8c\xe7\xb6#\xec\xc0\xb7<\x80L)\xe7\xfcI\xcb\xc4MZG\xd0\xca\xbe\xc4y\xcb\xed\x06\xe4\x19\xf9\xdb\xfb\x1ap}\xf8\x1e\xf5h\\\x94\xe9\x1c\x879\xca\x99Z\x81\xf1\x9a#\xd4\xf7y\x91\

### Need to see some more examples? 
https://w3schoolsua.github.io/python/python_file_handling_en.html#gsc.tab=0
<br>https://www.geeksforgeeks.org/file-handling-python/
<br>https://realpython.com/read-write-files-python/#file-paths

### Want to take a quiz?
https://realpython.com/quizzes/read-write-files-python/
<br>https://pynative.com/python-file-handling-quiz/

### Want some more advanced information?
https://pynative.com/python/file-handling/#:~:text=To%20read%20or%20write%20a,It%20returns%20the%20file%20object.