## Files 

### Reading text files

- Prepare a text file, e.g. one containing 30 decimal places of pi:
    
```
pi_digits.txt 

3.1415926535
8979323846
2643383279
```  

- A program that opens this file, reads it, and prints its contents:

```
with open('pi_digits.txt') as file_object:
    contents = file_object.read()

print(contents)
```

- Note:
    - open(): opens the file so that you can access it.
    - open() needs at least one argument, the name of the file. A relative path is resolved against the current working directory (usually the directory the program is run from).
    - open() returns an object representing the file.
    - **with** keyword: closes the file automatically once you have finished accessing it (i.e. when you leave the with block). No need to call close().
        - You could call close() yourself, but if an error happens between open() and close(), the file may never be closed.
    - read(size): reads some quantity of data and returns it as a string (in text mode) or a bytes object (in binary mode).
        - size is an optional numeric argument. When size is omitted or negative, the entire contents of the file are read and returned. Otherwise, at most size characters (in text mode) or size bytes (in binary mode) are read and returned.
        - If the end of the file has been reached, read() returns **an empty string ('')** in text mode (or an empty bytes object b'' in binary mode). An empty string is falsy, i.e. it is treated as False in a condition.
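
The notes above can be checked with a small sketch. It assumes a throwaway copy of the file created with `tempfile` (the directory and variable names are illustrative, not part of the original example), so it runs without any existing files.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'pi_digits.txt')
with open(path, 'w') as f:
    f.write('3.1415926535\n')

with open(path) as file_object:
    first = file_object.read(4)    # at most 4 characters
    rest = file_object.read()      # everything that is left
    at_end = file_object.read()    # end of file reached: empty string

print(repr(first))          # '3.14'
print(repr(rest))           # '15926535\n'
print(repr(at_end))         # ''
print(bool(at_end))         # False, because an empty string is falsy
print(file_object.closed)   # True, the with block closed the file
```

Because the empty string is falsy, a loop such as `while contents:` stops naturally at the end of the file.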
    

In [None]:
with open('myfiles/pi_digits.txt') as file_object: # a file name that doesn't exist raises FileNotFoundError
    contents = file_object.read()
print(contents)

In [None]:
with open('myfiles/pi_digits.txt') as file_object:
    while True:
        contents = file_object.read(5)
        if contents:
            print(contents,end='')
        else:
            break

In [None]:
# or use an assignment expression (also called a named expression or the walrus operator, Python 3.8+)
with open('myfiles/pi_digits.txt') as file_object:
    while contents := file_object.read(5):
        print(contents,end='')

In [None]:
# or use iter() and partial()
from functools import partial
with open('myfiles/pi_digits.txt') as file_object:
    for contents in iter(partial(file_object.read,5), ''):
        print(contents,end='')

### Read lines
- For reading lines from a file, you can loop over the file object.
- This is memory efficient, fast, and leads to simple code:

```
with open('myfiles/pi_digits.txt') as file_object:
    for line in file_object:
        print(line, end='')
``` 

- If you want to read all the lines of a file into a list, you can also use list(f) or f.readlines().

```
with open('myfiles/pi_digits.txt') as file_object:
    #read lines into list: two ways
    #lines = file_object.readlines()
    lines = list(file_object)
```

In [None]:
with open('myfiles/pi_digits.txt') as file_object:
    for line in file_object:
        print(line, end='')

In [None]:
with open('myfiles/pi_digits.txt') as file_object:
    #read lines into list: two ways
    #lines = file_object.readlines()
    lines = list(file_object)
for line in lines:
    print(line,end='')
digits=''
for line in lines:
    digits+=line.rstrip() #string concatenation
print(digits)

### Writing text files

```
filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.\n")
```

- Note:
    - The second argument to open(), 'w', opens the file in **write mode**.
    - The possible modes are **read mode ('r'), write mode ('w'), append mode ('a')**, or a mode that allows you to **read and write** to the file **('r+')**.
    - If you omit the mode argument, Python opens the file in read-only mode by default.
    - In write mode, open() automatically creates the file you’re writing to if it doesn’t already exist.
    - However, be careful when opening a file in write mode: if the file does exist, Python erases its contents before returning the file object.
    - write() method: writes a string to the file and returns the number of characters written.
        - Python can only write strings to a text file.
        - If you want to store other types of data in a text file, you’ll have to convert the data to a string first (e.g. using the **str() function**).
        - write() doesn’t add any newlines to the text you write, so include '\n' yourself where you need one.
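
A minimal sketch of these points, assuming a temporary file named `numbers.txt` (a hypothetical name used only for this illustration): write() returns a character count, rejects non-string values, and adds no newline of its own.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'numbers.txt')

with open(path, 'w') as file_object:
    count = file_object.write('total: ')   # returns 7
    count += file_object.write(str(42))    # a number must be converted with str() first
    count += file_object.write('\n')       # the newline has to be written explicitly

    # Passing a non-string value raises TypeError in text mode.
    try:
        file_object.write(42)
        type_error_raised = False
    except TypeError:
        type_error_raised = True

with open(path) as file_object:
    contents = file_object.read()

print(count)               # 10
print(repr(contents))      # 'total: 42\n'
print(type_error_raised)   # True
```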

In [None]:
filename = 'programming.txt'

with open(filename, 'w') as file_object:
    file_object.write("I love programming.\n")
    #write a tuple
    value = ('the answer', 42)
    s = str(value)  # convert the tuple to string
    file_object.write(s)

In [None]:
# write Unicode text to a text file by specifying the encoding explicitly
filename = 'chinese_text.txt'
with open(filename, 'w+', encoding='utf8') as file_object: # 'w+' also allows reading back what was written
    file_object.write("世界大同!\n")
    file_object.seek(0)
    contents = file_object.read()
    print(contents)

### Append to a file

- If you want to add content to a file instead of writing over existing content, you can open the file in **append mode ('a')**. 
- When you open a file in append mode, Python doesn’t erase the contents of the file before returning the file object.
- Any lines you write to the file will be added at the end of the file. 
- If the file doesn’t exist yet, Python will create an empty file for you.

```
filename = 'programming.txt'

with open(filename, 'a') as file_object:
    file_object.write("I also love finding meaning in large datasets.\n")
    file_object.write("I love creating apps that can run in a browser.\n")

```

In [None]:
filename = 'programming.txt'

with open(filename, 'a') as file_object:
    file_object.write("I also love finding meaning in large datasets.\n")
    file_object.write("I love creating apps that can run in a browser.\n")


### Difference between r+ , w+ and a+ for open()

- All three modes, r+, w+ and a+, can both read from and write to a file.
- r+ doesn't delete the content of the file and doesn't create a new file if it doesn't exist (open() raises FileNotFoundError instead).
- w+ deletes the content of the file if it exists (i.e. truncates the file to zero length) and creates it if it doesn't exist.
- a+ doesn't delete the content of the file if it exists and creates it if it doesn't exist.
    - You can use seek() to move the read position anywhere in the file, but every write operation goes to the end of the file and appends the content, regardless of the current position.
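
The differences can be observed directly. The sketch below assumes a temporary file named `modes.txt` (an illustrative name) and walks through the three modes in turn.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

# 'r+' does not create a missing file.
try:
    with open(path, 'r+'):
        r_plus_created = True
except FileNotFoundError:
    r_plus_created = False

# 'w+' creates the file.
with open(path, 'w+') as f:
    f.write('abc')

# 'r+' keeps the content; writing starts at position 0 and overwrites.
with open(path, 'r+') as f:
    f.write('X')
    f.seek(0)
    after_r_plus = f.read()

# 'a+' keeps the content; the write lands at the end even after seek(0).
with open(path, 'a+') as f:
    f.seek(0)
    f.write('Z')
    f.seek(0)
    after_a_plus = f.read()

# 'w+' truncates the existing file to zero length.
with open(path, 'w+') as f:
    after_w_plus = f.read()

print(r_plus_created)        # False
print(repr(after_r_plus))    # 'Xbc'
print(repr(after_a_plus))    # 'XbcZ'
print(repr(after_w_plus))    # ''
```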

![file mode comparison](open_file_mode_2.png "Open() file mode comparison")

![file mode comparison](open_file_mode.png "Open() file mode comparison")



### Binary files

- When a file is opened in text mode, you read and write strings from and to the file, which are encoded in a specific encoding. 
    - If encoding is not specified, the default is platform dependent, so pass it explicitly (e.g. encoding='utf-8') when the file must be portable.
- You can append **'b'** to the file mode, which opens the file in binary mode - it means the data is read or written in the form of **bytes** objects. 
    - This mode should be used for all files that don’t contain text, eg. images, audio files.
- Both read() and write() can be used for binary data.

```
with open('myfiles/binary.bin','rb+') as file_object:
    file_object.write(b'0123456789abcdef')
```

![binary_viewer_result](binary_viewer.PNG "binary_viewer_result")
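
To see the difference between the two modes, the following sketch writes one non-ASCII character in text mode and reads it back both ways. The file name `accent.txt` and the choice of UTF-8 are assumptions made for the illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'accent.txt')

# Text mode: the str is encoded to bytes using the given encoding.
with open(path, 'w', encoding='utf-8') as file_object:
    file_object.write('é')

# Text mode: the bytes are decoded back into a str.
with open(path, 'r', encoding='utf-8') as file_object:
    as_text = file_object.read()

# Binary mode: the raw bytes are returned unchanged.
with open(path, 'rb') as file_object:
    as_bytes = file_object.read()

print(type(as_text).__name__, len(as_text))   # str 1
print(as_bytes)                               # b'\xc3\xa9'
print(as_bytes.decode('utf-8') == as_text)    # True
```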

### Change file position

- To change the file object’s position, use f.seek(offset, whence). 
- The position is computed from adding offset to a reference point, ie. the whence argument. 
    - A whence value of 0 measures from the beginning of the file, 
    - A whence value of 1 uses the current file position 
    - A whence value of 2 uses the end of the file as the reference point. 
    - whence can be omitted and defaults to 0, using the beginning of the file as the reference point.
- For text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed.
    - The one exception is seeking to the very end of the file with **seek(0, 2)**.
    - The only valid offset values are those returned by **f.tell()**, or zero.
    - Any other offset value produces ***undefined behaviour***.
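
The three whence values can be tried on a binary file, where all of them are allowed. This is a small sketch on a temporary file (`seek_demo.bin` is an illustrative name); the constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END are the named equivalents of 0, 1 and 2.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'seek_demo.bin')

with open(path, 'wb+') as file_object:
    file_object.write(b'0123456789abcdef')   # 16 bytes, positions 0..15

    file_object.seek(5)            # whence defaults to 0: 5 bytes from the start
    a = file_object.read(1)        # position is now 6

    file_object.seek(2, 1)         # whence=1: 2 bytes forward from position 6
    b = file_object.read(1)        # position is now 9

    file_object.seek(-3, 2)        # whence=2: 3 bytes before the end
    c = file_object.read(1)

    position = file_object.tell()  # current position after the last read

print(a, b, c, position)   # b'5' b'8' b'd' 14
```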

In [None]:
with open('myfiles/binary.bin','rb+') as file_object: # if file does not exist, 'r+' will throw FileNotFoundError
    file_object.write(b'0123456789abcdef')
    #rewind
    file_object.seek(0,0)
    #read the first 5 bytes
    contents_by = file_object.read(5)
    #transform bytes into characters using 'ascii' encoding
    contents_str = contents_by.decode('ascii')
    print(contents_str)

In [None]:
with open('myfiles/binary.bin','wb+') as file_object: # if file does not exist, 'w+' will create it
    file_object.write(b'0123456789abcdef')

### Write an integer to a binary file

1. First use the ```int.to_bytes()``` method to get the bytes that represent the integer
    - ```int.to_bytes(length, byteorder, *, signed=False)``` : Return an array of bytes representing an integer.
        - The integer is represented using **length** bytes. An OverflowError is raised if the integer is not representable with the given number of bytes.
        - The **byteorder** argument determines the byte order used to represent the integer. If byteorder is **"big"**, the most significant byte is at the beginning of the byte array. If byteorder is **"little"**, the most significant byte is at the end of the byte array. To request the native byte order of the host system, use **sys.byteorder** as the byte order value.
        - The **signed** argument determines whether two’s complement is used to represent the integer. If signed is False and a negative integer is given, an OverflowError is raised. The default value for signed is False.

2. Call file.write() with the resulting bytes object
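
A short sketch of both steps, showing how byteorder and signed affect the result and when OverflowError is raised. The file name `int.bin` and the variable names are chosen only for this illustration.

```python
import os
import tempfile

n = 1024  # 0x0400

# Step 1: convert the integer to bytes.
big = n.to_bytes(2, byteorder='big')        # most significant byte first
little = n.to_bytes(2, byteorder='little')  # most significant byte last

# A negative integer needs signed=True; otherwise OverflowError is raised.
try:
    (-1).to_bytes(1, byteorder='big')
    negative_ok = True
except OverflowError:
    negative_ok = False
minus_one = (-1).to_bytes(1, byteorder='big', signed=True)

# 256 is not representable in a single byte.
try:
    (256).to_bytes(1, byteorder='big')
    fits = True
except OverflowError:
    fits = False

# Step 2: write the bytes, then read them back with the same byte order.
path = os.path.join(tempfile.mkdtemp(), 'int.bin')
with open(path, 'wb') as file_object:
    file_object.write(big)
with open(path, 'rb') as file_object:
    restored = int.from_bytes(file_object.read(), byteorder='big')

print(big, little)        # b'\x04\x00' b'\x00\x04'
print(minus_one)          # b'\xff'
print(negative_ok, fits)  # False False
print(restored)           # 1024
```

The byte order used for reading must match the one used for writing, otherwise a different integer is reconstructed.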

In [2]:
import sys
very_long_int = 2147483647
print(bin(very_long_int))
with open('myfiles/binary_int.bin','wb+') as file_object:
    #file_object.write(very_long_int.to_bytes(4, byteorder="little", signed=True))
    file_object.write(very_long_int.to_bytes(4, byteorder=sys.byteorder, signed=True))
    file_object.seek(0)
    bs = file_object.read()
    for b in bs:
        print(bin(b))
    #call the class method of int : from_bytes()
    original = int.from_bytes(bs, byteorder=sys.byteorder, signed=True)
    print(original)

0b1111111111111111111111111111111
0b11111111
0b11111111
0b11111111
0b1111111
2147483647


In [6]:
x = (-1024).to_bytes(4, byteorder='big', signed=True)
print(x)

b'\xff\xff\xfc\x00'


In [None]:
y = int.from_bytes(x, byteorder='big', signed=True) # from_bytes() is a class method; use the same byte order as in to_bytes()
print(y)

### JSON (JavaScript Object Notation)
- JSON is an open standard file format and data interchange format.
- It uses human-readable text to *store and transmit* data objects consisting of **attribute–value pairs and arrays** (or other serializable values).
- It is a common data format with diverse uses in electronic data interchange, e.g. between web applications and servers.
- JSON is a language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON-format data. 
- JSON filenames use the extension .json

```
{
  "firstName": "John",
  "lastName": "Smith",
  "isAlive": true,
  "age": 27,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021-3100"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null
}
```
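
In Python, data like this can be written to and read from a file with the standard library's `json` module. The sketch below assumes a shortened version of the record above and a temporary file named `person.json`; both are illustrative.

```python
import json
import os
import tempfile

# A shortened version of the record above, as a Python dict.
person = {
    "firstName": "John",
    "isAlive": True,
    "age": 27,
    "children": [],
    "spouse": None,
}

path = os.path.join(tempfile.mkdtemp(), 'person.json')

# json.dump() serializes the object and writes it to an open text file.
with open(path, 'w', encoding='utf-8') as file_object:
    json.dump(person, file_object, indent=2)

# json.load() parses the file back into Python objects.
with open(path, encoding='utf-8') as file_object:
    loaded = json.load(file_object)

# json.dumps() returns the JSON text as a string instead of writing it.
text = json.dumps({"isAlive": True, "spouse": None})

print(loaded == person)   # True
print(text)               # {"isAlive": true, "spouse": null}
```

Python's True, False and None map to JSON's true, false and null, and dicts and lists map to JSON objects and arrays.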