# Files

Files are one of the most ubiquitous abstractions for computers.  As users, we constantly interact with files to store our documents and other data.  We organize these files into directories (folders).  Directories can contain subdirectories to provide a hierarchical structure for their contents.

As with other programming languages, Python provides a rich set of functionality to interact with files and directories. Interacting with files will also be necessary to allow us to persist data. So far, we have just used variables that hold data in the computer's memory - such data will be lost when the program terminates.  By storing data in files, the information is placed (stored/persisted) on [secondary storage devices](https://en.wikipedia.org/wiki/Computer_data_storage#Secondary_storage) such as hard drives and USB sticks. 

We'll also use files to share data with other individuals and systems.  Such data is usually defined in common formats such as tab-delimited text, comma-separated values (CSV), and JavaScript Object Notation (JSON).


## Working with Files
Generally, to read or write files, you'll follow these steps:
1. Open the file.
2. Read or write to the file.
3. Close the file once you are finished.

To open a file, use the built-in function [open()](https://docs.python.org/3/library/functions.html#open):

```
f = open(filename, mode)
```
`open()` returns a file object.  By default, a file is opened for reading in text mode, i.e., `mode='r'` (equivalent to `'rt'`).

This table contains the different modes that may be specified:

| Character | Meaning |
| :-------: | :------ |
| `'r'` | open for reading (default) |
| `'w'` | open for writing, truncating the file first |
| `'x'` | open for exclusive creation, failing if the file already exists |
| `'a'` | open for writing, appending to the end of the file if it exists |
| `'b'` | binary mode.  Specify in conjunction with `'r'`, `'w'`, `'x'`, or `'a'` |
| `'t'` | text mode (default).  Specify in conjunction with `'r'`, `'w'`, `'x'`, or `'a'` |
| `'+'` | open for updating (reading and writing).  Rarely used.  See [open()](https://docs.python.org/3/library/functions.html#open) |


Read or write data to the file as necessary.

Finally, call `close()` to notify the operating system and the interpreter that we are done with the file, so that any allocated resources can be released.
```
f.close()
```
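The effect of the most common modes can be compared with a minimal sketch. The file name `modes_demo.txt` and the temporary directory are assumptions made for illustration so that no existing file is touched.

```python
import os
import tempfile

# Work in a throwaway directory (an assumption for this sketch).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes_demo.txt")  # hypothetical file name

f = open(path, "w")      # 'w' creates the file (or truncates an existing one)
f.write("first\n")
f.close()

f = open(path, "a")      # 'a' keeps the existing contents and appends
f.write("second\n")
f.close()

f = open(path, "r")      # 'r' is the default mode
after_append = f.read()
f.close()

try:
    f = open(path, "x")  # 'x' refuses to open a file that already exists
    f.close()
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

f = open(path, "w")      # reopening with 'w' discards the old contents
f.close()
size_after_truncate = os.path.getsize(path)

print(repr(after_append), exclusive_failed, size_after_truncate)
```

Note that simply opening a file with `'w'` is enough to empty it, even if nothing is written afterwards.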

## Text Files
### Creating a new text file
The following code block opens a file called "test.txt" in the current directory for writing text.  The code block then shows two different ways of putting a string into the file.

In [None]:
f = open("test.txt", "wt")
print('String message, print built-function, but specify the file', file=f)
f.write('Another message, uses the write method of the file object')
f.write("test")
f.close()

If you examine the file in a text editor, you'll notice the file contains:
```
String message, print built-function, but specify the file
Another message, uses the write method of the file objecttest
```
By default, `print()` adds a newline at the end of each call unless you specify a different value in the `end` parameter.

The `write()` method does not add any newline characters, so you will need to add these manually as needed.
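As a minimal sketch of this difference, the following writes three pieces of text and then inspects the raw result. The file name and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")  # hypothetical file name

f = open(path, "wt")
f.write("line one\n")                     # write() needs an explicit newline
print("line two", file=f)                 # print() appends a newline by default
print("no newline here", end="", file=f)  # end="" suppresses the newline
f.close()

f = open(path, "rt")
demo_contents = f.read()
f.close()

# repr() makes the newline characters visible.
print(repr(demo_contents))
```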

### Reading a text file
To read a text file, we can use several different methods:
- `read()`
- `readline()`
- `readlines()`
- an iterator

`read()` with no arguments will read the entire contents of the file into a string.  As such, you'll need to be careful with large files as you may exhaust the available memory in the computer.  

In [None]:
f = open("test.txt", "rt")
contents = f.read()
f.close()
print(contents)

To limit the number of characters read from the file in one method call, you can specify the maximum number of characters to read at a time.

In [None]:
f = open("test.txt", "rt")
numCharacters = 20
message = ""
while True:
    text = f.read(numCharacters)
    if not text:    #string is empty, nothing else to read in the file
        break
    print(text,end="###")
    message += text
f.close()
print("\n\nNow, display the message:")
print(message)

`readline()` will read a line at a time, returning the contents in a string.  If the end of the file is reached, an empty string is returned.  If a blank line exists, a string with a newline character is returned.

In [None]:
f = open("test.txt", "rt")
while True:
    line = f.readline()
    if not line:    #string is empty, nothing else to read in the file
        break
    print(line)
f.close()

Notice that in the above output, the newlines stored in the file are kept in the returned string.  If they were simply stripped from the return value, it would not be possible to distinguish between an empty line and the end of the file.

`readlines()` will read the entire contents of the file at once and return a list in which each line is a separate string.

In [None]:
f = open("test.txt", "rt")
lines = f.readlines()
f.close()
for line in lines:
    print(line.strip())  # strip the newline character from the end of the string


Probably the most conventional way to read a text file in Python is to use an iterator:

In [None]:
f = open("test.txt", "rt")
for line in f:
    print(line.strip())
f.close()

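While we noted that calling `read()` without a limit can lead to memory issues, the other methods can have the same problem.  `readlines()` always loads the whole file, and both `readline()` and iteration read up to the next newline character, so a file with very long lines (or no newlines at all) may still be pulled into memory in large pieces.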

## Closing Files Automatically
Unlike some other programming languages, Python will close a file once it is no longer referenced (e.g., the file was opened in a function and the function has ended), although exactly when this happens depends on the interpreter.  Closing a file explicitly still serves two important purposes:
1. Forces any remaining writes to be completed / "flushed" to the file.
2. Clears any resources allocated to managing the open file

Python utilizes context managers to automatically take action when a code block is entered and exited.  A context manager is an object that defines `__enter__()` and `__exit__()` methods.  Such objects can then be used in a `with` statement of the form:
<pre>
with <i>expression</i> as <i>variable</i>:
    <i>code block</i>
</pre>
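To make the mechanism concrete, here is a minimal sketch of a user-defined context manager. The class `Tracker` and the file name are hypothetical and exist only for illustration; file objects returned by `open()` already provide their own `__enter__()` and `__exit__()` methods.

```python
import os
import tempfile


class Tracker:
    """Hypothetical context manager that records when the block is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self                # this value is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False               # do not suppress exceptions raised in the block


with Tracker() as t:
    t.events.append("body")
print(t.events)

# A file object behaves the same way: leaving the block closes the file.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical file name
with open(path, "w") as f:
    f.write("hello\n")
closed_after_block = f.closed
print(closed_after_block)
```

`__exit__()` runs even when the block raises an exception, which is why the `with` form is generally preferred over calling `close()` by hand.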


In [None]:
with open("test.txt") as f:
    for line in f:
        print(line.strip())

## Binary Files
During this course, we'll primarily use text files, but binary files are used constantly: images, videos, executables, specialized data files, etc.

To read and write data to binary files, we will manipulate binary data through [bytes](https://docs.python.org/3/library/stdtypes.html#bytes-objects) and [bytearray](https://docs.python.org/3/library/stdtypes.html#bytearray-objects) objects.  Other APIs have been written on top of these types to provide richer capabilities.

Bytes literals can also be written directly in code by prefixing a string literal with `b`, e.g., `b'abc'`.
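A short sketch of how `bytes` and `bytearray` behave; the sample values are arbitrary and chosen only for illustration.

```python
data = b"ABC\x00\xff"          # a bytes literal: note the b prefix

first = data[0]                # indexing returns an integer, not a character
part = data[1:3]               # slicing returns another bytes object

mutable = bytearray(data)      # bytearray is the mutable counterpart of bytes
mutable[0] = 0x5A              # 0x5A is the ASCII code for 'Z'

try:
    data[0] = 0x5A             # bytes objects are immutable, so this fails
    bytes_are_mutable = True
except TypeError:
    bytes_are_mutable = False

print(first, part, bytes(mutable), bytes_are_mutable)
```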

### Binary Example: IP Addresses

The following code fragment resolves a domain name into an IP address.  As you visit various websites on the Internet, the computer performs this resolution such that it can send your request to the appropriate server.

In this example, `socket.gethostbyname()` returns a string representation of the IP address.  For IPv4, addresses are composed of 4 parts, each with a value between 0 and 255.  So, each value fits in a single byte and an IPv4 address can be represented with 4 bytes.  (IPv6 addresses are 128 bits long and are represented by 16 bytes.)   After printing out the value, the code converts it to `bytes`, which is an immutable sequence of byte values similar to a tuple. As such, we can index and slice it just as we can with other Python sequences such as strings and lists.

When displaying bytes objects, Python shows a byte as an ASCII character if its value corresponds to a printable ASCII character; otherwise, it displays the value as a hexadecimal escape. Recall that in one of the earlier notebooks, we presented the built-in function `chr()` to convert a number to the corresponding ASCII (Unicode) character.
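The display rule can be checked without any network access. In this sketch the addresses are made up for illustration: `65.66.67.68` was picked because each part happens to be the code of a printable ASCII character, and `192.0.2.1` comes from a range reserved for documentation.

```python
import socket

packed = socket.inet_aton("65.66.67.68")   # 4 parts -> 4 bytes
print(packed)            # every byte is printable ASCII, so letters are shown
print(len(packed))
print(list(packed))      # the underlying integer values
print(packed[-1])        # indexing yields an integer
print(packed[:2])        # slicing yields bytes

other = socket.inet_aton("192.0.2.1")
print(other)             # non-printable values are shown as \x.. escapes

round_trip = socket.inet_ntoa(packed)      # back to the dotted string form
print(round_trip)
```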

In [None]:
import socket
addr = socket.gethostbyname('wsj.com')   # requires network access
print(addr)
ba = socket.inet_aton(addr)              # pack the dotted string into 4 bytes
print(ba)
print(ba[-1])
print(chr(ba[0]), chr(ba[1]), chr(ba[2]), chr(ba[3]))

### Writing to a Binary File

In [None]:
with open("test_binary.dat", 'wb') as f:
    f.write(ba)

### Reading from a Binary File

In [None]:
with open("test_binary.dat", 'rb') as f:
    ip_address = f.read()
print(ip_address)
print(type(ip_address))
print(socket.inet_ntoa(ip_address))    # convert the bytes object to a string representation

## File Operations
Python offers a variety of operations on files. 

### Existence
To see whether or not a given file or directory exists, call `os.path.exists()` with the name to check.

In [None]:
import os
print("test_binary.dat",os.path.exists("test_binary.dat"))
print("binary.dat",os.path.exists("binary.dat"))
print(".",os.path.exists("."))        # current directory
print("..",os.path.exists(".."))      # parent directory

### Checking Filetype
Use `os.path.isfile()` to return a Boolean indicating whether the argument is a file.

Use `os.path.isdir()` to return a Boolean indicating whether the argument is a directory.

In [None]:
print("isfile: test_binary.dat", os.path.isfile("test_binary.dat"))
print("isdir: test_binary.dat", os.path.isdir("test_binary.dat"))

### Deleting Files
To delete a file, use `os.remove()`.

In [None]:
os.remove("test_binary.dat")
os.path.exists("test_binary.dat")

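## Directory Operations
- `os.mkdir()`: create a directory
- `os.rmdir()`: remove an empty directory
- `os.listdir()`: list the entries in a directory
- `os.chdir()`: change the current working directory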

For other file and directory operations look at the [os](https://docs.python.org/3/library/os.html#module-os) module.
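A minimal sketch of the directory operations listed above, run inside a temporary directory so that nothing outside it is changed. The directory name `demo_dir` is hypothetical.

```python
import os
import tempfile

base = tempfile.mkdtemp()                  # throwaway parent directory
original_cwd = os.getcwd()                 # remember where we started

new_dir = os.path.join(base, "demo_dir")   # hypothetical directory name
os.mkdir(new_dir)                          # create the directory
listing_after_mkdir = os.listdir(base)     # names of the entries in base

os.chdir(new_dir)                          # change the current directory
inside_name = os.path.basename(os.getcwd())
os.chdir(original_cwd)                     # go back before removing it

os.rmdir(new_dir)                          # rmdir only removes empty directories
listing_after_rmdir = os.listdir(base)
os.rmdir(base)                             # clean up the parent as well

print(listing_after_mkdir, inside_name, listing_after_rmdir)
```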

## Pathnames

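A pathname can be relative or absolute.  A relative path is interpreted starting from the current working directory, while an absolute path starts at the root of the file system.

- `os.path.abspath()` converts a relative path into an absolute one.
- `os.path.join()` combines path components using the separator for the current platform.

To get the size of a file, see [How to get file size in Python](https://www.geeksforgeeks.org/how-to-get-file-size-in-python/).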
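The following sketch illustrates these path functions together with `os.path.getsize()`. The path components and the file name are made up for illustration, and the relative path does not need to exist for `abspath()` to work.

```python
import os
import tempfile

# Build a relative path from its components (hypothetical names).
relative = os.path.join("data", "reports", "2024.csv")
absolute = os.path.abspath(relative)       # resolved against the current directory

print(relative, os.path.isabs(relative))
print(absolute, os.path.isabs(absolute))

# Create a small file so that there is something to measure.
size_path = os.path.join(tempfile.mkdtemp(), "size_demo.dat")
with open(size_path, "wb") as f:
    f.write(b"12345")

file_size = os.path.getsize(size_path)     # size in bytes
print("Size of file is:", file_size, "bytes")
```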

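The [pathlib](https://docs.python.org/3/library/pathlib.html) module provides an object-oriented alternative.  For example, `Path.stat()` returns information about a file, including its size in bytes:

```
from pathlib import Path

Path(r'd:/file.jpg').stat()                  # file metadata
size = Path(r'd:/file.jpg').stat().st_size   # st_size holds the size in bytes
print("Size of file is :", size, "bytes")
```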

Outline:

- directory / file representation

- Issues
  - line endings
  - character encodings

Support for files is defined in Python's [io](https://docs.python.org/3/library/io.html) module.

## Tricky Issues

### Newline Characters
One of the common issues when dealing with text files is that different platforms use different characters to signify a new line.  On Linux and macOS, newlines are represented with just the byte `0x0a <LF> \n`, while on Windows, the two bytes `0x0d0a <CR><LF> \r\n` represent a new line.

Within Python 3, the `open()` function has a parameter `newline` which controls how newlines are processed when reading in text files.  By default, universal newlines are enabled.  In this mode, lines can end with `\n`, `\r`, or `\r\n`.  Python will translate all of these to `\n` before returning a value to the caller.

When writing output to a file, any `\n` characters are translated to the system default line separator, `os.linesep`, as the output is sent to the file.

For both reading and writing, other values of `newline` (`''`, `'\n'`, `'\r'`, and `'\r\n'`) can be used to force a different behavior if required.
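A minimal sketch of the reading side: the file is written in binary mode so that the mixed line endings are stored exactly as given, and then read back with and without translation. The file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed_newlines.txt")  # hypothetical name

# Binary mode performs no newline translation, so these bytes are stored as-is.
with open(path, "wb") as f:
    f.write(b"unix\nwindows\r\nold mac\r")

# Default (universal newlines): every line ending is translated to \n.
with open(path, "rt") as f:
    translated = f.read()

# newline="" still recognizes all three endings but returns them untranslated.
with open(path, "rt", newline="") as f:
    untranslated = f.read()

print(repr(translated))
print(repr(untranslated))
```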

## Encodings


The `encoding` parameter of `open()` is the name of the encoding used to decode or encode the file.  It should only be used in text mode.  The default encoding is platform dependent (whatever `locale.getpreferredencoding()` returns), but any text encoding supported by Python can be used.  See the [codecs](https://docs.python.org/3/library/codecs.html) module for the list of supported encodings.
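Because the default is platform dependent, specifying the encoding explicitly is a common way to keep files portable. The sketch below, which uses a hypothetical file name and sample text, writes a string as UTF-8 and then shows what happens when the same bytes are decoded with a different encoding.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "encoding_demo.txt")  # hypothetical name
text = "caf\u00e9"     # "café": the last character is outside the ASCII range

with open(path, "wt", encoding="utf-8") as f:
    f.write(text)

with open(path, "rb") as f:
    raw = f.read()     # the accented character is stored as two bytes in UTF-8

with open(path, "rt", encoding="utf-8") as f:
    same = f.read()    # decoding with the matching encoding restores the text

with open(path, "rt", encoding="latin-1") as f:
    garbled = f.read() # the wrong encoding turns each byte into its own character

print(raw, same, garbled)
```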

In [None]:
import locale
locale.getpreferredencoding() 

## Exercises 
- Recursively list all files and directories...  This may be a good topic for a lecture ...

- Print the size of the files in a directory.

- Write a function with two parameters: filename and countdown.  Countdown is a positive integer.  Using the `range()` function, write a file that looks like 10 9 8 7 6 5 4 3 2 1 BlastOff
- Write a function that reads the file produced in the previous exercise.  It will read all of the numbers and compute their sum, printing the result to the console.