##### Diving In

Files are the primary storage paradigm for every major operating system. Your computer is, metaphorically, drowning in files.

##### Reading from text files

Opening a file in Python couldn't be easier:

```python

a_file = open('examples/chinese.txt', encoding='utf-8')

```

Python has a built-in `open()` function, which takes a path (directory and filename) and, optionally, an encoding as arguments.

Python handles path separators for you: you can always write `/` in a path, irrespective of the operating system's native path separator.

If the path does not start with a `/`, it is a relative path, resolved against the current working directory.
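As a small illustration of the relative versus absolute distinction, the sketch below uses `pathlib.PurePosixPath`, which only inspects the path string and never touches the disk. The absolute path is a made-up example, not a file from this notebook.

```python
from pathlib import PurePosixPath

# A relative path: no leading slash, resolved against the working directory.
relative = PurePosixPath('examples/chinese.txt')

# A hypothetical absolute path, used only for illustration.
absolute = PurePosixPath('/home/user/examples/chinese.txt')

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True
print(relative.parts)          # ('examples', 'chinese.txt')
```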

The path is a string. Most operating systems support Unicode file and directory names, and Python fully supports Unicode pathnames.

The second argument specifies the encoding to use when opening the file. Recall from an earlier notebook that bytes on disk have to be interpreted as characters, which requires specifying an encoding (character set).

In [55]:
a_file = open('examples/chinese.txt', encoding='cp1252')
a_string = a_file.read()
a_string

UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 50: character maps to <undefined>

In the above example we specify an incorrect encoding (character set) that is unable to decode the bytes in the file. The default encoding on my laptop is `utf-8`, the same as the encoding of the file, so simply reading the file without specifying an encoding works. The default depends on the platform and locale, so it is safer to pass `encoding` explicitly.

In [56]:
a_file = open('examples/chinese.txt')
a_string = a_file.read()
a_string

'這是一個很長的文本，我想將其翻譯成中文，以便在我打算在線閱讀有關python的資料時，可以將其稱為chinese.txt並將其用於python筆記本。\nDive Into Python 是为有经验的程序员编写的一本 Python 书。\n'
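The decoding failure shown earlier can be reproduced without a file. The sketch below encodes a single character from the sample text to UTF-8 and then tries to decode the bytes with both encodings. The variable names are illustrative only.

```python
text = '成'                  # one character from the sample text
raw = text.encode('utf-8')   # the bytes as they would sit on disk
print(raw)                   # b'\xe6\x88\x90'

# Decoding with the encoding that was used to write the bytes works.
print(raw.decode('utf-8'))

# cp1252 has no character assigned to byte 0x90, so decoding fails.
try:
    raw.decode('cp1252')
    decoded_ok = True
except UnicodeDecodeError as err:
    decoded_ok = False
    print(err)
```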

##### Stream Objects

The `open` function returns a _stream object_, which has methods and attributes for getting information about, and manipulating, a stream of characters.

In [57]:
a_file = open('examples/chinese.txt')
print(a_file.name)
print(a_file.encoding)
print(a_file.mode)

examples/chinese.txt
UTF-8
r


In [58]:
import locale
locale.getpreferredencoding()

'UTF-8'

The `mode` attribute tells you in which mode the file was opened. You can pass an optional `mode` parameter to the `open` function. Python defaults to `r`, which means "open for reading only, in text mode". The file mode serves several purposes: different modes let you write to a file, append to a file, or open a file in binary mode (in which case you deal with bytes instead of characters).
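To make the modes concrete, here is a minimal sketch that opens the same scratch file in several modes and records what `mode` reports each time. The file name `modes.txt` and the temporary directory are assumptions made so the example runs anywhere. It is not one of the notebook's example files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes.txt')  # hypothetical scratch file

    f = open(path, mode='w', encoding='utf-8')  # write, text mode
    f.write('hello')
    f.close()

    f = open(path, mode='a', encoding='utf-8')  # append, text mode
    f.write(' world')
    f.close()

    f = open(path, encoding='utf-8')            # default: read, text mode
    default_mode = f.mode
    text = f.read()
    f.close()

    f = open(path, mode='rb')                   # read, binary mode
    binary_mode = f.mode
    data = f.read()
    f.close()

print(default_mode, repr(text))  # r 'hello world'
print(binary_mode, data)         # rb b'hello world'
```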

##### Reading data from a text file

After you open a file for reading, you'll probably want to read from it at some point.

In [59]:
a_file = open('examples/chinese.txt', encoding = 'utf-8')
print(a_file.read())
print(a_file.read())
print(a_file.read())

這是一個很長的文本，我想將其翻譯成中文，以便在我打算在線閱讀有關python的資料時，可以將其稱為chinese.txt並將其用於python筆記本。
Dive Into Python 是为有经验的程序员编写的一本 Python 书。





Perhaps somewhat surprisingly, reading the file past the end does not raise an exception. Python does not consider reading past end-of-file to be an error; it simply returns an empty string.
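Because `read` signals end-of-file with an empty string rather than an exception, a common pattern is to read fixed-size chunks until an empty string comes back. The sketch below uses an in-memory `io.StringIO` as a stand-in for an open text file. The chunk size of 6 is arbitrary.

```python
import io

stream = io.StringIO('Dive Into Python')  # stand-in for an open text file

chunks = []
while True:
    chunk = stream.read(6)   # read up to 6 characters
    if chunk == '':          # empty string means end-of-file
        break
    chunks.append(chunk)

print(chunks)  # ['Dive I', 'nto Py', 'thon']
```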

To reread a file, you can use the `seek` method.

In [60]:
a_file.seek(0)
print(a_file.read(16))
print(a_file.read(1))
print(a_file.read(1))
print(a_file.tell())

這是一個很長的文本，我想將其翻譯
成
中
54


The `seek` and `tell` methods count bytes, but since we opened the file in text mode, the `read` method reads characters. Chinese characters require multiple bytes to encode in UTF-8: the 18 characters read so far take 3 bytes each, which is why `tell` returns `54`.
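The arithmetic behind that `54` can be checked directly on the string, without opening the file. This sketch takes the 18 characters that the three `read` calls returned and compares the character count with the UTF-8 byte count.

```python
# The 16 + 1 + 1 characters returned by the three read() calls above.
text = '這是一個很長的文本，我想將其翻譯成中'

char_count = len(text)                  # characters, what read() counts
byte_count = len(text.encode('utf-8'))  # bytes, what tell() reported

print(char_count)  # 18
print(byte_count)  # 54

# Every character in this sample needs exactly 3 bytes in UTF-8.
print({len(ch.encode('utf-8')) for ch in text})  # {3}
```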

In [61]:
a_file.seek(1)
print(a_file.read(1))

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Why does this fail? Because no character starts at byte `1`: the first character occupies bytes `0` to `2`, and the next one starts at byte `3`. Trying to read a character from the middle of its byte sequence fails with a `UnicodeDecodeError`.
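The same failure can be reproduced on a byte string, which makes the character boundaries visible. This is a sketch using the first two characters of the sample text. Slicing the bytes plays the role of `seek`.

```python
raw = '這是'.encode('utf-8')
print(raw)  # b'\xe9\x80\x99\xe6\x98\xaf'

# Offset 3 is the start of the second character, so decoding works.
second_char = raw[3:].decode('utf-8')
print(second_char)  # 是

# Offset 1 lands on a continuation byte (0x80), which cannot start a character.
try:
    raw[1:].decode('utf-8')
    mid_char_ok = True
except UnicodeDecodeError as err:
    mid_char_ok = False
    print(err)
```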

##### Closing a file

Opening files consumes system resources, and depending on the file mode, other programs may not be able to access an open file. It is important to close files as soon as you are finished using them.

In [62]:
a_file.close()
a_file.closed

True

Closed stream objects do have one useful attribute: `closed`, which confirms that the file is closed.

##### Closing files automatically

Stream objects have an explicit `close` method, but what happens if your code has a bug and crashes before you call `close`? That file could theoretically stay open for much longer than necessary.

Python's preferred way to close a stream object is the `with` statement.

In [63]:
with open('examples/chinese.txt', encoding='utf-8') as f:
    f.seek(3)
    print(f.read(1))

是


This code calls `open`, but it never calls `close`. The `with` statement starts a code block. Inside this block you can use the variable `f` as the stream object returned by `open`. When the `with` block ends, Python calls `f.close()` automatically.

No matter how you exit the `with` block, Python will close the file ... even if you exit via an unhandled exception.
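A quick way to convince yourself of this is to raise an exception inside a `with` block and inspect `closed` afterwards. The sketch below does that with a scratch file in a temporary directory. The file name and the simulated error are assumptions for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical scratch file
    try:
        with open(path, mode='w', encoding='utf-8') as f:
            f.write('partial output')
            raise RuntimeError('simulated bug')  # leave the block abruptly
    except RuntimeError:
        pass

    # The with statement closed the file even though the block raised.
    closed_after_error = f.closed

print(closed_after_error)  # True
```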

The `with` statement creates a runtime context. In the above example the stream object acts as a context manager. Python creates the stream object `f` and tells it that it is entering a runtime context. When the `with` code block is completed, Python tells the stream object that it is exiting the runtime context, and the stream object calls its own `close` method.

There is nothing `file` specific about the `with` statement. It's just a generic framework for creating runtime contexts and telling objects that they're entering and exiting a runtime context.

If the object is a stream object, then it does useful file-like things (like closing the file automatically). But that behavior is defined in the stream object, not in the `with` statement. You can even create your own context managers that have nothing to do with files.
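As a minimal sketch of a context manager that has nothing to do with files, the hypothetical `Announcer` class below simply records when the runtime context is entered and exited.

```python
class Announcer:
    """Hypothetical context manager that records enter and exit events."""

    def __init__(self, events):
        self.events = events

    def __enter__(self):
        self.events.append('enter')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False  # do not swallow exceptions raised in the block


events = []
with Announcer(events):
    events.append('body')

print(events)  # ['enter', 'body', 'exit']
```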

##### Reading data one line at a time

A line of text is just what you think - you type a few words and press `RETURN`.

Text files can use several different characters to mark the end of a line. Every operating system has its own convention, built from some combination of carriage return (`CR`) and line feed (`LF`).

Python handles line endings automatically by default. If you read a line of text from a file, Python will figure out which kind of line ending the text file uses and it will all *Just Work*.
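The following sketch shows that translation in action: it writes Windows-style `CR LF` line endings as raw bytes, then reads the file back in text mode, where each line ends in a plain `\n`. The scratch file name is an assumption.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'crlf.txt')  # hypothetical scratch file

    # Write CR LF line endings as raw bytes so nothing is translated.
    with open(path, mode='wb') as f:
        f.write(b'one\r\ntwo\r\n')

    # In text mode, the default newline handling converts them to '\n'.
    with open(path, encoding='utf-8') as f:
        lines = f.readlines()

print(lines)  # ['one\n', 'two\n']
```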

In [64]:
line_number = 0
with open('examples/favorite-people.txt') as a_file:
    for a_line in a_file:
        line_number += 1
        print("{:>4} {}".format(line_number, a_line.rstrip()))

   1 Dora
   2 Ethan
   3 Wesley
   4 John
   5 Anne
   6 Mike
   7 Chris
   8 Sarah
   9 Alex
  10 Lizzie


* Using the `with` pattern, you can safely open the file. Python will close it for you.
* To read a file one line at a time, use a `for` loop. Besides having explicit methods like `read`, the stream object is also an iterator that yields a single line every time you ask for the next value.
* Using the `format` method, you can print the line number right-justified within 4 spaces. The `a_line` variable contains the complete line, line ending and all; the `rstrip` method removes the trailing whitespace, including the newline character.
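The manual counter in the loop above can also be written with the built-in `enumerate`, which pairs each line with its number. This is a sketch of an alternative, not a correction. It uses `io.StringIO` with three of the names as a stand-in for the real file.

```python
import io

# Stand-in for open('examples/favorite-people.txt'), with three sample lines.
a_file = io.StringIO('Dora\nEthan\nWesley\n')

numbered = []
for line_number, a_line in enumerate(a_file, start=1):
    # Same layout as before: number right-justified in 4 columns.
    numbered.append('{:>4} {}'.format(line_number, a_line.rstrip()))

print('\n'.join(numbered))
```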

##### Writing to text files

You can write to files in much the same way that you read from them. First you open a file and get a stream object, then you use methods on the stream object to write data to the file, and finally you close the file.

To open a file for writing, use the `open` function and specify the write mode. There are two file modes for writing:

* "Write" mode (`mode='w'`) will overwrite the file.
* "Append" mode (`mode='a'`) will add data to the end of the file.

Either mode will create the file automatically if it doesn't exist. You should always close a file as soon as you're done writing to it, to release the file handle and ensure the contents are actually written to disk.

As with opening a file for reading, you can call the stream object's `close` method, or you can use the `with` statement and let Python close the file for you.

In [65]:
with open('examples/test.log', mode='w', encoding='utf-8') as a_file:
    a_file.write("test succeeded")
    
with open('examples/test.log', encoding='utf-8') as a_file:
    print(a_file.read())
    
with open('examples/test.log', mode='a', encoding='utf-8') as a_file:
    a_file.write(" and again")
    
with open('examples/test.log', encoding='utf-8') as a_file:
    print(a_file.read())

test succeeded
test succeeded and again


##### Binary files

Not all files contain text. Some of them contain pictures.

In [66]:
an_image = open('examples/beauregard.jpg', mode='rb')

print(an_image.mode)
print(an_image.name)
print(an_image.encoding)

rb
examples/beauregard.jpg


AttributeError: '_io.BufferedReader' object has no attribute 'encoding'

* Opening a file in binary mode is similar to opening a file in text mode, with the subtle difference of using mode `rb`, where the `r` indicates read and the `b` indicates binary.
* The stream object you get from opening a file in binary mode has many of the same attributes, including `mode`.
* Binary stream objects also have a `name`, just like text stream objects.
* There is one difference though: a binary stream has no `encoding` attribute. That makes sense, because you're reading bytes, not characters, so there is no conversion to do. What you get out of a binary file is exactly what you put into it.

In [67]:
print(an_image.tell())
data = an_image.read(3)
print(data)
print(type(data))
print(an_image.tell())
an_image.seek(0)
data = an_image.read()
len(data)

0
b'\x89PN'
<class 'bytes'>
3


1246129

As with text files, you can read a binary file a bit at a time, but here the argument to `read` is a byte count instead of a character count. That means there is never an unexpected mismatch between the number you passed into `read` and the position index you get out of `tell`.

The `read` method reads bytes, and the `seek` and `tell` methods track the number of bytes read. For binary files, they'll always agree.
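To contrast this with the text-mode behavior seen earlier, the sketch below writes the UTF-8 bytes of two Chinese characters to a scratch file and reads part of them back in binary mode. The file name is an assumption.

```python
import os
import tempfile

payload = '這是'.encode('utf-8')  # 6 bytes of sample data

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.bin')  # hypothetical scratch file
    with open(path, mode='wb') as f:
        f.write(payload)

    with open(path, mode='rb') as f:
        chunk = f.read(4)    # ask for 4 bytes
        position = f.tell()  # the position is also counted in bytes

print(len(chunk), position)  # 4 4
```

Reading 4 bytes splits the second character in half, which is fine in binary mode because no decoding takes place.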

##### Stream Objects From Non-File Sources

Imagine you are writing a library, and one of your library functions is going to read some data from a file. The function could simply take a filename as a string, go open the file for reading, read it and close it before exiting. 

If instead your API took an arbitrary _stream object_, it would work with any source of stream objects and not just with files.

A _stream object_ is anything with a `read` method that takes an optional `size` parameter and returns data. When called with no `size` parameter, the `read` method should read everything there is to read from the source and return all the data as a single value. When called with a `size` parameter, it reads that much from the input source and returns that much data. When called again, it picks up where it left off and returns the next chunk of data.

That sounds exactly like the stream object you get from opening a file. However, by programming the API to the stream object, you are not restricting yourself to files. The input source could be anything: a web page, a string in memory, even the output of another program.

As long as your API takes a _stream object_ and calls the `read` method, you can handle any input source that acts like a file, without specific code to handle each kind of input.

In [68]:
a_string = "PapayaWhip is the new black"

import io
a_stream = io.StringIO(a_string)
print(a_stream.read())
print(a_stream.read())
a_stream.seek(0)
print(a_stream.read(10))
print(a_stream.tell())
a_stream.seek(18)
print(a_stream.read())

PapayaWhip is the new black

PapayaWhip
10
new black


`io.StringIO` lets you access a string as a _stream object_ of characters. There is also `io.BytesIO` which lets you treat a byte array as a _stream object_ of bytes.
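To tie this back to the library scenario, here is a sketch of a function that only relies on `read`, so it accepts a text stream or a byte stream interchangeably. The function name `count_words` is hypothetical and not part of any library.

```python
import io


def count_words(stream):
    """Hypothetical library function: accepts anything with a read() method."""
    data = stream.read()     # str for text streams, bytes for binary streams
    return len(data.split()) # split() on whitespace works for both types


text_stream = io.StringIO('PapayaWhip is the new black')
byte_stream = io.BytesIO(b'PapayaWhip is the new black')

print(count_words(text_stream))  # 5
print(count_words(byte_stream))  # 5
```

The same call would work on an object returned by `open`, since it exposes the same `read` method.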

##### Handling Compressed Files

The Python standard library contains modules that support reading and writing compressed files created with a number of different compression schemes.

The `gzip` module lets you create a _stream object_ for reading and writing gzip compressed files.

In [69]:
import gzip

with gzip.open("examples/out.log.gz", mode= 'wb') as z_file:
    z_file.write('A nine mile walk is no joke, especially in the rain'.encode('utf-8'))
    

* Open gzipped files in binary mode, as here, and encode the text to bytes yourself. Note the `b` character in the `mode` argument.
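Reading works the same way in reverse. The sketch below writes the same sentence to a gzip file in a temporary directory (an assumption, so the example does not depend on the `examples` folder), reads it back through `gzip.open`, and also peeks at the raw file to show that what is on disk really is compressed data.

```python
import gzip
import os
import tempfile

message = 'A nine mile walk is no joke, especially in the rain'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'out.log.gz')  # hypothetical scratch file

    with gzip.open(path, mode='wb') as z_file:
        z_file.write(message.encode('utf-8'))

    # gzip.open decompresses transparently; decode the bytes back to text.
    with gzip.open(path, mode='rb') as z_file:
        round_trip = z_file.read().decode('utf-8')

    # Opening the same file with plain open() exposes the gzip header.
    with open(path, mode='rb') as raw_file:
        magic = raw_file.read(2)

print(round_trip == message)  # True
print(magic)                  # b'\x1f\x8b'
```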

##### Standard Input, Output and Error

Standard output and standard error (commonly abbreviated `stdout` and `stderr`) are _pipes_ that are built into every UNIX-like operating system. When you call the `print` function, the thing you're printing is sent to the `stdout` pipe.

When your program crashes, Python prints the traceback to `stderr`. By default, both of these pipes are just connected to the terminal window where you are working.

In [70]:
for i in range(3):
    print('PapayaWhip')

PapayaWhip
PapayaWhip
PapayaWhip


In [71]:
import sys

for i in range(3):
    sys.stdout.write(' is the ')
    
for i in range(3):
    sys.stderr.write(' new black ')

 is the  is the  is the 

 new black  new black  new black 

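`sys.stdout` and `sys.stderr` are stream objects, but they are write-only. Attempting to call their `read` method raises an error: in a standard interpreter this is `io.UnsupportedOperation`, a subclass of `OSError` (of which `IOError` is an alias in Python 3).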

##### Redirecting standard output

`sys.stdout` and `sys.stderr` are _stream objects_ that support only writing. But they're not constants; they're variables. That means you can assign them a new value - any other stream object - to _redirect_ their output.

In [72]:
import sys

class RedirectStdoutToFile:
    
    def __init__(self, file):
        self.file = file
        
    def __enter__(self):
        self.prev_out = sys.stdout
        sys.stdout = self.file
        
    def __exit__(self, *args):
        sys.stdout = self.prev_out
        
print('A goes to console')

with open('examples/output.log', mode="w", encoding='utf-8') as file, RedirectStdoutToFile(file):
    print('B goes to file')
    
print('C goes to console')

A goes to console
C goes to console


Let's look at the `with` statement. The context manager created by instantiating `RedirectStdoutToFile` is not assigned to a local variable using `as`, because we are using the context manager only for its side effect. The `with` statement does not require us to assign the context manager to a variable.

A context manager is any class that implements the `__enter__` and `__exit__` methods. These are "special methods" invoked by the Python runtime. The `__enter__` method is invoked when entering a `with` block and `__exit__` is invoked when exiting a `with` block.

The `with` statement takes a _comma-separated list of contexts_. The comma-separated list acts like a series of nested `with` blocks. The first context listed is the "outer" block, the last one is the "inner" block.

The first context opens a file, the second context redirects `sys.stdout` to the stream object that was created in the first context.

The context managers form a last-in-first-out stack. Upon exiting, the second context changes `sys.stdout` back to its original value, then the first context closes the file named `examples/output.log`.
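The enter and exit order can be made visible with a small logging context manager. The `Step` class below is hypothetical and exists only to record the order in which Python calls `__enter__` and `__exit__`.

```python
class Step:
    """Hypothetical context manager that logs when it is entered and exited."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append('enter ' + self.name)
        return self

    def __exit__(self, *args):
        self.log.append('exit ' + self.name)


log = []
with Step('outer', log), Step('inner', log):
    log.append('body')

# Entered left to right, exited right to left.
print(log)
```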

Since `stdout` has been restored to its original value, calling the `print` function once again prints to the screen.
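For comparison, the standard library offers `contextlib.redirect_stdout`, which follows the same pattern as the hand-written class. This sketch redirects into an in-memory `io.StringIO` buffer instead of a file so that the captured text can be inspected directly.

```python
import contextlib
import io

buffer = io.StringIO()  # in-memory target instead of examples/output.log

print('A goes to console')

with contextlib.redirect_stdout(buffer):
    print('B goes to the buffer')

print('C goes to console')

captured = buffer.getvalue()
print(repr(captured))  # 'B goes to the buffer\n'
```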