## Files 

### Reading text files

- Prepare a text file, e.g. one containing 30 decimal places of pi:
    
```
pi_digits.txt 

3.1415926535
8979323846
2643383279
```  

- A program that opens this file, reads it, and prints its contents:

```
with open('pi_digits.txt') as file_object:
    contents = file_object.read()

print(contents)
```

- Note:
    - open(): opens the file so that you can access it.
    - open() needs at least one argument, the name of the file. A relative path is resolved against the current working directory, which is usually the directory the program is run from.
    - open() returns an object representing the file.
    - **with** keyword: closes the file automatically once you leave the with block, so there is no need to call close().
        - You could call close() yourself, but if an error occurs between open() and close(), the file may never be closed.
    - read(size):
        - Reads some quantity of data from the file and returns it as a string (in text mode) or a bytes object (in binary mode).
        - size is an optional numeric argument. When size is omitted or negative, the entire contents of the file are read and returned. Otherwise, at most size characters (in text mode) or size bytes (in binary mode) are read and returned.
        - If the end of the file has been reached, read() returns **an empty string ('')** (or an empty bytes object in binary mode).
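
The notes on close() and read(size) can be sketched as follows. This is a minimal illustration, not the only way to do it: the temporary directory and the sample content are assumptions made so the snippet runs on its own.

```python
import os
import tempfile

# Create a small sample file in a temporary directory (assumed content).
path = os.path.join(tempfile.mkdtemp(), 'pi_digits.txt')
with open(path, 'w') as f:
    f.write('3.1415926535\n')

# Manual equivalent of a with block: close() runs even if read() raises.
file_object = open(path)
try:
    first = file_object.read(4)    # at most 4 characters
    rest = file_object.read()      # everything that is left
    at_end = file_object.read()    # already at the end of the file
finally:
    file_object.close()

print(repr(first), repr(rest), repr(at_end))
print(file_object.closed)
```

The with statement does the same try/finally bookkeeping for you, which is why it is the preferred form.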
    

In [None]:
with open('myfiles/pi_digits.txt') as file_object:
    contents=file_object.read()
print(contents)

In [None]:
with open('myfiles/pi_digits.txt') as file_object:
    # read 5 characters at a time until read() returns an empty string
    while (contents := file_object.read(5)) != '':
        print(contents, end='')

### Read lines
- For reading lines from a file, you can loop over the file object.
- This is memory efficient, fast, and leads to simple code:

```
with open('myfiles/pi_digits.txt') as file_object:
    for line in file_object:
        print(line, end='')
``` 

- If you want to read all the lines of a file into a list, you can also use list(f) or f.readlines(), where f is the file object.

```
with open('myfiles/pi_digits.txt') as file_object:
    #read lines into list: two ways
    #lines = file_object.readlines()
    lines = list(file_object)
```

In [None]:
with open('myfiles/pi_digits.txt') as file_object:
    for line in file_object:
        print(line, end='')

In [None]:
with open('myfiles/pi_digits.txt') as file_object:
    #read lines into list: two ways
    #lines = file_object.readlines()
    lines = list(file_object)
for line in lines:
    print(line,end='')
digits=''
for line in lines:
    digits+=line.rstrip() #string concatenation
print(digits)

### Writing text files

```
filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.\n")
```

- Note:
    - The second argument of open(), 'w', opens the file in **write mode**.
    - The possible modes include **read mode ('r'), write mode ('w'), append mode ('a')**, and a mode that allows you to **read and write** to the file **('r+')**.
    - If you omit the mode argument, Python opens the file in read-only mode by default.
    - In write mode, open() automatically creates the file you're writing to if it doesn't already exist.
    - However, be careful when opening a file in write mode ('w'): if the file already exists, Python erases its contents before returning the file object.
    - write() method: writes a string to the file and returns the number of characters written.
        - Python can only write strings to a text file.
        - To store other types of data in a text file, convert the data to a string first (e.g. using the **str() function**).
        - write() doesn't add any newlines to the text you write.
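
A short sketch of the write() notes above: the return value, the need for str(), and the explicit newline. The file name numbers.txt and the temporary directory are assumptions for illustration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')  # hypothetical file name

with open(path, 'w') as file_object:
    count = file_object.write('answer: ')   # returns the number of characters written
    try:
        file_object.write(42)               # not a string, so this fails
        int_accepted = True
    except TypeError:
        int_accepted = False
    file_object.write(str(42))              # convert to a string first
    file_object.write('\n')                 # newlines must be added explicitly

with open(path) as file_object:
    contents = file_object.read()

print(count, int_accepted, repr(contents))
```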

In [None]:
filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.\n")
    #write a tuple
    value = ('the answer', 42)
    s = str(value)  # convert the tuple to string
    file_object.write(s)

### Append to a file

- If you want to add content to a file instead of writing over existing content, you can open the file in append mode. 
- When you open a file in append mode, Python doesn’t erase the contents of the file before returning the file object.
- Any lines you write to the file will be added at the end of the file. 
- If the file doesn’t exist yet, Python will create an empty file for you.

```
filename = 'programming.txt'

with open(filename, 'a') as file_object:
    file_object.write("I also love finding meaning in large datasets.\n")
    file_object.write("I love creating apps that can run in a browser.\n")

```

In [None]:
filename = 'programming.txt'

with open(filename, 'a') as file_object:
    file_object.write("I also love finding meaning in large datasets.\n")
    file_object.write("I love creating apps that can run in a browser.\n")


### Difference between r+ , w+ and a+ for open()

- r+, w+ and a+ all open a file for both reading and writing.
- r+ doesn't delete the content of the file, and doesn't create the file if it doesn't exist (open() raises FileNotFoundError instead).
- w+ deletes the content of the file if it exists (i.e. truncates the file to zero length) and creates the file if it doesn't exist.
- a+ doesn't delete the content of the file if it exists, and creates the file if it doesn't exist.
    - You can use seek() to move the read position anywhere in the file, but every write is appended at the end of the file regardless of the current position.
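
The differences can be checked with a small experiment. This is a sketch under the assumption of a scratch file in a temporary directory (modes.txt is a hypothetical name). The append behaviour after seek() is what common platforms do, though the Python documentation describes it as system dependent.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical file name

# r+ does not create a missing file.
try:
    open(path, 'r+')
    r_plus_created = True
except FileNotFoundError:
    r_plus_created = False

# w+ creates the file (or truncates an existing one).
with open(path, 'w+') as f:
    f.write('abc')
    f.seek(0)
    after_w_plus = f.read()

# r+ keeps the content; writing starts at position 0 and overwrites in place.
with open(path, 'r+') as f:
    f.write('X')
    f.seek(0)
    after_r_plus = f.read()

# a+ keeps the content; the write lands at the end even after seek(0).
with open(path, 'a+') as f:
    f.seek(0)
    f.write('Z')
    f.seek(0)
    after_a_plus = f.read()

print(r_plus_created, after_w_plus, after_r_plus, after_a_plus)
```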

![file mode comparison](open_file_mode_2.png "Open() file mode comparison")

![file mode comparison](open_file_mode.png "Open() file mode comparison")



### Binary files

- When a file is opened in text mode, you read and write strings, which are decoded from and encoded to bytes using a specific encoding. If the encoding is not specified, the default is platform dependent.
- You can append **'b'** to the file mode to open the file in binary mode, in which data is read and written as bytes objects. Use this mode for all files that don't contain text.
- Both read() and write() can be used for binary data.
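
To make the text/binary distinction concrete, the sketch below writes one string and reads it back in both modes. The file name greeting.txt and the choice of UTF-8 are assumptions. Passing encoding explicitly avoids the platform-dependent default mentioned above.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'greeting.txt')  # hypothetical file name

# Text mode with an explicit encoding.
with open(path, 'w', encoding='utf-8') as f:
    f.write('café')

with open(path, encoding='utf-8') as f:
    as_text = f.read()      # str: 4 characters

with open(path, 'rb') as f:
    as_bytes = f.read()     # bytes: the raw encoded data, 5 bytes

print(type(as_text), len(as_text))
print(type(as_bytes), len(as_bytes), as_bytes)
```

The character 'é' takes two bytes in UTF-8, which is why the two lengths differ.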

```
# 'rb+' requires that the file already exists; use 'wb+' to create it
with open('myfiles/binary.bin', 'rb+') as file_object:
    file_object.write(b'0123456789abcdef')
```
![binary_viewer_result](binary_viewer.PNG "binary_viewer_result")

In [None]:
with open('myfiles/binary.bin','rb+') as file_object:
    file_object.write(b'0123456789abcdef')

In [None]:
with open('myfiles/binary.bin','wb+') as file_object:
    file_object.write(b'0123456789abcdef')

### Bytes vs Strings

- Bytes are bytes; characters are an abstraction. 
- An immutable sequence of Unicode characters is called a string. 
- An immutable sequence of numbers-between-0-and-255 is called a bytes object.
- To define a bytes object, use the **b'' “byte literal” syntax**. Each byte within the byte literal can be **an ASCII character** or a hexadecimal escape from **\x00 to \xff (0–255)**.
- The type of a bytes object is bytes.
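
A brief sketch of these points, using made-up sample values: a bytes literal mixing ASCII characters with a hex escape, the integer view of its contents, immutability, and the encode()/decode() bridge between str and bytes.

```python
data = b'AB\xff'                 # two ASCII characters and one hex escape
print(type(data))                # the bytes type
print(list(data))                # the underlying numbers, each in 0..255

# bytes objects are immutable, just like strings
try:
    data[0] = 0
    mutable = True
except TypeError:
    mutable = False

# str <-> bytes conversion always goes through an encoding
text = 'é'
encoded = text.encode('utf-8')   # str -> bytes
decoded = encoded.decode('utf-8')  # bytes -> str
print(encoded, decoded)
```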

### Return the hexadecimal representation of the binary data. 

- Reference: https://docs.python.org/3/library/binascii.html
- binascii.b2a_hex(data[, sep[, bytes_per_sep=1]]) or binascii.hexlify(data[, sep[, bytes_per_sep=1]])
- Every byte of data is converted into the corresponding 2-digit hex representation. The returned bytes object is therefore twice as long as data.
    
### Return the binary data represented by the hexadecimal string 

- binascii.a2b_hex(hexstr) or binascii.unhexlify(hexstr)
- This function is the inverse of b2a_hex().
- hexstr must contain an even number of hexadecimal digits (upper or lower case); otherwise a binascii.Error exception is raised.
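
The round trip and the odd-length error case can be sketched like this, with arbitrary sample bytes:

```python
import binascii

data = b'\xb9\x01\xef'
hex_bytes = binascii.hexlify(data)           # two hex digits per input byte
round_trip = binascii.unhexlify(hex_bytes)   # back to the original bytes

# An odd number of hex digits cannot be decoded.
try:
    binascii.unhexlify(b'abc')
    odd_length_ok = True
except binascii.Error:
    odd_length_ok = False

print(hex_bytes, round_trip, odd_length_ok)
```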

In [None]:
import binascii
b1 = binascii.b2a_hex(b'\xb9\x01\xef')
b1

In [None]:
b2 = binascii.a2b_hex(b1)
b2

### Example
- Checking whether a given image is a JPEG
- Reference: https://www.geeksforgeeks.org/working-with-binary-data-in-python/

In [None]:
import binascii
  
# use binascii.a2b_hex() to generate the bytes values, or
# write them directly as bytes literals (see the alternative below)

jpeg_signatures = [
    binascii.a2b_hex(b'FFD8FFD8'),
    binascii.a2b_hex(b'FFD8FFE0'),
    binascii.a2b_hex(b'FFD8FFE1')
]

'''
jpeg_signatures = [
    b'\xFF\xD8\xFF\xD8',
    b'\xFF\xD8\xFF\xE0',
    b'\xFF\xD8\xFF\xE1',
]
''' 

with open('metaverse.jpg', 'rb') as file:
    first_four_bytes = file.read(4)
  
    if first_four_bytes in jpeg_signatures:
        print("JPEG detected.")
    else:
        print("File does not look like a JPEG.")

### Python bytes concatenation

- Bytes don't work quite like strings.
    - When you index with a single value (rather than a slice), you get an integer rather than a length-one bytes instance.
    - For a = b'\x14\xf6', a[0] gives you the int 20 (hex 0x14).
- bytes constructor.
    - If you pass a single integer as the argument (rather than an iterable), you get a bytes instance consisting of that many null bytes (b'\x00').
    - Wrapping the integer in curly brackets works because it creates a set, which is iterable: bytes({a[0]}) is a one-byte bytes object.
- To concatenate, use slicing rather than indexing with a single value: a += a[0:1]
    - The slice gives you a bytes instance that you can concatenate onto your existing value.
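
The same points in one runnable sketch. The list form bytes([a[0]]) is shown as an alternative to the set form; it is an addition for illustration and is not used elsewhere in these notes.

```python
a = b'\x14\xf6'

first_int = a[0]             # indexing gives an int
first_slice = a[0:1]         # slicing gives a length-one bytes object
zeros = bytes(3)             # an int argument gives that many null bytes
from_list = bytes([a[0]])    # an iterable of ints gives those bytes
joined = a + a[0:1]          # bytes + bytes concatenates

print(first_int, first_slice, zeros, from_list, joined)
```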


In [None]:
a = b'\x14\xf6'
a += a[0]  # raises TypeError: a[0] is an int, which cannot be concatenated to bytes

In [None]:
bytes(a[0])

In [None]:
bytes({a[0]})

In [None]:
a += a[0:1]
a