# Files, Unicode, Bytes, Strings

## Reading Files, File I/O

### Use built-in function open

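`open` is usually called with 2 args (it accepts others, such as `encoding`):

* `file_name` - `str` ... absolute or relative path of the file to open
* `mode` - `str` ... usually a single chr: r, w, a (defaults to r)
    * `r`: read.... file must exist, otherwise `FileNotFoundError`
    * `w`: write.... if the file exists, overwrite it; if it doesn't, create it
    * `a`: append.... add to the end of an existing file (create it if it doesn't exist)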

returns: `file` object
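
A minimal sketch of how the three modes behave. The file name `modes_demo.txt` is hypothetical, and it lives in a temporary directory so that no real file gets overwritten:

```python
import os
import tempfile

# hypothetical scratch file in a temp directory, so no real file is touched
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'r' on a file that doesn't exist raises FileNotFoundError
try:
    open(path, 'r')
    missing = False
except FileNotFoundError:
    missing = True

# 'w' creates the file because it doesn't exist yet
f = open(path, 'w')
f.write('first\n')
f.close()

# 'a' adds to the end of the existing contents
f = open(path, 'a')
f.write('second\n')
f.close()

f = open(path, 'r')
after_append = f.read()
f.close()

# 'w' on an existing file throws away the old contents
f = open(path, 'w')
f.write('third\n')
f.close()

f = open(path, 'r')
after_overwrite = f.read()
f.close()

print(missing, repr(after_append), repr(after_overwrite))
```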

file object has the following methods:
* `close`... closes the file handle, no args
* `read`... reads the entire contents and returns them as a `str`
* `readlines`... also reads the entire contents, but returns them as a `list` of lines
* `readline`... reads one line at a time; the location in the file is saved between calls
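
To compare the three reading methods side by side, here is a small sketch that assumes a three-line scratch file (the name `methods_demo.txt` is made up for this example):

```python
import os
import tempfile

# hypothetical three-line scratch file in a temp directory
path = os.path.join(tempfile.mkdtemp(), 'methods_demo.txt')
f = open(path, 'w')
f.write('one\ntwo\nthree\n')
f.close()

f = open(path, 'r')
everything = f.read()      # entire contents as a single str
f.close()

f = open(path, 'r')
as_list = f.readlines()    # list of lines, each keeping its newline
f.close()

f = open(path, 'r')
first = f.readline()       # reads just the first line
second = f.readline()      # picks up where the previous call stopped
f.close()

print(repr(everything))
print(as_list)
print(repr(first), repr(second))
```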


### Reading a file

File object can be looped over using `for`:

In [34]:
f = open('1342-0.txt', 'r')
for line in f:
    # commented out print to prevent entire file being printed out
    # print(line)
    pass
f.close()

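You can actually `enumerate` a file object (in this instance, we're only printing out the first 7 lines, since the `break` only happens once `i` is past 5; each line keeps its own newline, so `print` adds a blank line after it):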

In [35]:
f = open('1342-0.txt', 'r')
for i, line in enumerate(f):
    print(line)
    if i > 5:
        break
f.close()

﻿The Project Gutenberg EBook of Pride and Prejudice, by Jane Austen



This eBook is for the use of anyone anywhere at no cost and with

almost no restrictions whatsoever.  You may copy it, give it away or

re-use it under the terms of the Project Gutenberg License included

with this eBook or online at www.gutenberg.org
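
The extra blank lines in the output come from each line ending in its own newline, with `print` adding another. A possible variation, sketched against a made-up four-line file rather than the book, strips the newline and starts the count at 1:

```python
import os
import tempfile

# hypothetical scratch file standing in for the book
path = os.path.join(tempfile.mkdtemp(), 'enumerate_demo.txt')
f = open(path, 'w')
f.write('alpha\nbeta\ngamma\ndelta\n')
f.close()

shown = []
f = open(path, 'r')
for i, line in enumerate(f, start=1):   # count from 1 instead of 0
    if i > 2:                           # stop before printing a third line
        break
    text = line.rstrip()                # drop the trailing newline
    shown.append((i, text))
    print(i, text)
f.close()
```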





### Using `with` and Using `readlines`

Now let's try using `with`... when the block exits, the file object is automatically cleaned up (`close` is called), even if an error occurs inside the block

Additionally, we'll use `readlines` to read the contents of the file as a `list`

In [36]:
with open('1342-0.txt', 'r') as f:
    lines = f.readlines()
    print(lines[0])
print('done')

﻿The Project Gutenberg EBook of Pride and Prejudice, by Jane Austen

done
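
One way to check that `with` really closes the file is to look at the file object's `closed` attribute. This sketch uses a hypothetical scratch file (`with_demo.txt`) instead of the book:

```python
import os
import tempfile

# hypothetical scratch file in a temp directory
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')
with open(path, 'w') as f:
    f.write('line 1\nline 2\n')

with open(path, 'r') as f:
    lines = f.readlines()
    open_inside = not f.closed   # the file is still open inside the block

closed_after = f.closed          # the file is closed once the block exits
print(lines, open_inside, closed_after)
```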


### Handling Exceptions

Catch a FileNotFoundError to deal with a file that doesn't exist:

In [37]:
def read_stuff():
    with open('dne.txt', 'r') as f:
        lines = f.readlines()
        print(lines[0])
try:
    read_stuff()
except FileNotFoundError:
    print('Could not find the file!')
    


Could not find the file!
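
The same idea can be wrapped up in a helper. `read_first_line` below is a hypothetical function written for this example (not something built in), and returning `None` for a missing file is just one possible choice:

```python
import os
import tempfile


def read_first_line(path):
    """Return the first line of the file at path, or None if it doesn't exist."""
    try:
        with open(path, 'r') as f:
            return f.readline()
    except FileNotFoundError:
        return None


# a path inside a brand new temp directory, so the file can't exist
missing_path = os.path.join(tempfile.mkdtemp(), 'dne.txt')
result = read_first_line(missing_path)
print(result)
```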


## Representation of Data

Use bits! ... 0's and 1's

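* straightforward for numbers... use the number in binary, perhaps in a set number of bits
* representing text.... mapping of numbers to characters, e.g. 65... A... in ASCII and in Unicode
* unicode is just the mapping (of numbers, called code points, to characters)
* what about encoding? ...an encoding determines how those numbers are stored as bytes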
    * utf-8
    * utf-16
    * utf-32
  
### utf-8

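Variable length encoding. Even though the basic unit is only 8 bits / 1 byte (enough for ASCII), it can represent other unicode characters by using additional bytes, up to 4 per character (the high bits of each byte specify whether it stands alone, starts a multi-byte sequence, or continues one).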
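
A quick way to see the variable length is to print the bits of each encoded byte. This is only an illustration with four sample characters picked for this example:

```python
# sample characters that need 1, 2, 3 and 4 bytes in utf-8
lengths = {}
for ch in ['A', 'é', '☃', '🙃']:
    encoded = ch.encode('utf-8')
    lengths[ch] = len(encoded)
    # show every byte as 8 bits: a single byte starts with 0,
    # a lead byte starts with 110, 1110 or 11110,
    # and a continuation byte starts with 10
    bits = ' '.join(format(byte, '08b') for byte in encoded)
    print(ch, len(encoded), bits)
```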

* sometimes the encoding of a file is not known
* ...if you have a series of bytes, you can decode it with a scheme of your choice (utf-8, latin-1, etc.), but the wrong choice gives garbled text or an error
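
As a sketch of what guessing the scheme looks like, here are the same bytes decoded three ways (the sample word is arbitrary):

```python
data = 'café'.encode('utf-8')        # b'caf\xc3\xa9'

as_utf8 = data.decode('utf-8')       # the right guess
as_latin1 = data.decode('latin-1')   # wrong guess: no error, but garbled

try:
    data.decode('ascii')             # wrong guess: bytes above 127 are rejected
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(as_utf8, as_latin1, ascii_failed)
```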

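If using mostly ASCII characters, then utf-8 is a great choice, since each one takes a single byte. However, if most characters need three bytes in utf-8 (Chinese or Japanese text, for example), utf-16 might be a better option, since it stores them in two. utf-32 uses four bytes for every character, which usually takes up too much space to be practical.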
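
To put rough numbers on that trade-off, this sketch counts the bytes for two arbitrary sample strings. It uses the `-le` variants of utf-16 and utf-32 so that no byte order mark is added to the count:

```python
samples = {'ascii': 'hello', 'chinese': '你好世界'}

sizes = {}
for label, text in samples.items():
    for encoding in ['utf-8', 'utf-16-le', 'utf-32-le']:
        # number of bytes needed to store the text in this encoding
        sizes[(label, encoding)] = len(text.encode(encoding))
        print(label, encoding, sizes[(label, encoding)])
```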

### Strings in Python

From the Python docs: "The default encoding for Python source code is UTF-8, so you can simply include a Unicode character in a string literal"... see below!

In [38]:
s = "this is clearly a string"
s2 = "also a string ☃"
print(s2)

also a string ☃


In [39]:
s3 = "not sure 🙃"
print(s3)

not sure 🙃
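
A `str` is a sequence of Unicode characters (code points), not bytes. A small sketch using `len`, `ord` and `chr` on the snowman string from above:

```python
s2 = "also a string ☃"
snowman = s2[-1]

length = len(s2)              # counts characters, not bytes
code_point = ord(snowman)     # the number Unicode assigns to the character
print(length)
print(code_point, hex(code_point))
print(chr(0x2603))            # and back from the number to the character
print("\u2603" == snowman)    # an escape sequence gives the same character
```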


### bytes Objects

From Python docs: "Bytes objects are immutable sequences of single bytes." ...

* sequence of ints 0 - 255
* can be created from a literal of ascii characters prefixed with `b`

In [40]:
b = b"hello"

In [41]:
b[0]

104

In [42]:
ord('h')

104

In [43]:
try:
    b + '!!!!!'
except TypeError as e:
    print(type(e), e)

<class 'TypeError'> can't concat str to bytes


In [44]:
type(b)

bytes
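
A few more things that can be tried with a `bytes` object, as a rough sketch: building one from a list of ints, adding `bytes` to `bytes`, and checking that it really is immutable.

```python
b = b"hello"

from_ints = bytes([104, 101, 108, 108, 111])   # same bytes, built from ints
as_ints = list(b)                              # and back to a list of ints
joined = b + b'!!!!!'                          # bytes + bytes is fine

try:
    b[0] = 72                                  # immutable, so this fails
    assigned = True
except TypeError as e:
    assigned = False
    print(type(e), e)

print(from_ints == b, as_ints, joined)
```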

### Use `decode` Method to Convert to a String

* interpret a series of bytes as utf-8

In [45]:
b.decode('utf-8')

'hello'
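
Going the other way, the `encode` method of `str` turns a string into `bytes`. A short round trip, reusing the snowman string as sample data:

```python
s = "also a string ☃"

encoded = s.encode('utf-8')           # str -> bytes
round_trip = encoded.decode('utf-8')  # bytes -> str

print(encoded)
print(len(s), len(encoded))           # the snowman takes 3 bytes in utf-8
print(round_trip == s)
```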

### Now Let's Try utf-16

In [46]:
b = b'hello!'
b.decode('utf-8')
# ... works as you expect!

'hello!'

In [47]:
b.decode('utf-16')
# ... how about same bytes as utf-16

'敨汬Ⅿ'
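
Why those characters? utf-16 reads the bytes two at a time. The sketch below assumes little-endian byte order, which matches the output above (as far as I know, plain `'utf-16'` without a byte order mark falls back to the machine's native order, so `'utf-16-le'` is spelled out here to keep the result the same everywhere):

```python
b = b'hello!'

# split the 6 bytes into 3 pairs: b'he', b'll', b'o!'
pairs = [b[i:i + 2] for i in range(0, len(b), 2)]

code_points = []
for pair in pairs:
    # little-endian: the second byte is the high byte of the number
    code_point = int.from_bytes(pair, 'little')
    code_points.append(code_point)
    print(pair, hex(code_point), chr(code_point))

decoded = b.decode('utf-16-le')
print(decoded)
```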

# String Formatting

## Using the `format` method

In [48]:
name = 'joe'
num = 20
food = 'apple pies'
"Hi, my name is {}, and I have {} {}!".format(name, num, food)

'Hi, my name is joe, and I have 20 apple pies!'

### Using `format` with Positions

In [49]:
"Hi, my name is {2}, and I have {0} {1}!".format(num, food, name)

'Hi, my name is joe, and I have 20 apple pies!'

### Using `format` with Positions and Format Specifier

In [50]:
"Hi, my name is {2:s}, and I have {0:.2f} {1:s}!".format(num, food, name)

'Hi, my name is joe, and I have 20.00 apple pies!'
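
A few other format specifiers that may come in handy, shown as a sketch with the same `num` and `food` values (the extra numbers are made up for the example):

```python
num = 20
food = 'apple pies'

right = "[{:>8}]".format(num)        # right-align in 8 columns
left = "[{:<12}]".format(food)       # left-align in 12 columns
center = "[{:^12}]".format(food)     # center in 12 columns
zeros = "{:05d}".format(num)         # pad with zeros to 5 digits
commas = "{:,}".format(1234567)      # thousands separator
percent = "{:.1%}".format(0.256)     # as a percentage with 1 decimal place

for example in [right, left, center, zeros, commas, percent]:
    print(example)
```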

### Formatted String Literals (f-strings)

In [51]:
f"Hi, my name is {name}, and I have {num} {food}!"

'Hi, my name is joe, and I have 20 apple pies!'

In [52]:
f"Hi, my name is {name}, and I have {num:.2f} {food}!"

'Hi, my name is joe, and I have 20.00 apple pies!'
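
The braces in an f-string can hold any expression, not just a variable name. A small sketch with the same `name` and `num`:

```python
name = 'joe'
num = 20

doubled = f"{num * 2} pies"     # arithmetic inside the braces
shouted = f"{name.upper()}!"    # method calls work too
quoted = f"{name!r}"            # !r uses repr, so the quotes show up
aligned = f"[{name:>6}]"        # format specifiers go after the colon

print(doubled, shouted, quoted, aligned)
```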

### Using the % Operator

In [53]:
result = "%s %s" % (num, food)
print(result)

20 apple pies
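
The `%` operator also takes format specifiers. As a sketch, here is the earlier sentence again, plus a padded version:

```python
name = 'joe'
num = 20
food = 'apple pies'

# %s for strings, %.2f for a float with 2 decimal places
sentence = "Hi, my name is %s, and I have %.2f %s!" % (name, num, food)

# %5d right-aligns an int in 5 columns, %-12s left-aligns a str in 12
padded = "%5d|%-12s|" % (num, food)

print(sentence)
print(padded)
```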
