# I/O Operations
## Basic file operations

`open()` returns a file object, and is most commonly used with two arguments: `open(filename, mode)`.

The first argument is a string containing the filename. The second argument is another string containing a few characters that describe how the file will be used. `mode` can be `'r'` when the file will only be read, `'w'` for only writing (an existing file with the same name will be erased), and `'a'` for appending (any data written to the file is automatically added to the end). `'r+'` opens the file for both reading and writing. The `mode` argument is optional; `'r'` will be assumed if it’s omitted.

Normally, files are opened in *text mode*: you read and write strings, which are encoded to and decoded from bytes using a specific encoding. If `encoding` is not specified, the default is platform dependent (`locale.getpreferredencoding(False)` is called to get the current locale encoding). Appending `'b'` to the mode opens the file in *binary mode*: data is then read and written as `bytes` objects. Use this mode for all files that don’t contain text.
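The difference between the two modes is easiest to see by reading the same file both ways. The following is a minimal sketch, assuming a throwaway file named `greeting.txt` (a hypothetical name) and an explicit UTF-8 encoding so the result does not depend on the platform default:

```python
import os
import tempfile

# Work in a throwaway directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'greeting.txt')  # hypothetical file name

    # Text mode: we write a str, which is encoded as UTF-8 on disk.
    with open(path, 'w', encoding='utf-8') as f:
        f.write('café')

    # Text mode again: the bytes are decoded back into a str.
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()

    # Binary mode: we get the raw bytes, with no decoding applied.
    with open(path, 'rb') as f:
        raw = f.read()

print(repr(text))           # 'café'
print(raw)                  # b'caf\xc3\xa9'
print(len(text), len(raw))  # 4 5
```

The string has four characters, but the file holds five bytes, because `é` takes two bytes in UTF-8.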

| Character | Meaning                                                         |
|-----------|-----------------------------------------------------------------|
| `r`       | open for reading (default)                                      |
| `w`       | open for writing, truncating the file first                     |
| `x`       | open for exclusive creation, failing if the file already exists |
| `a`       | open for writing, appending to the end of the file if it exists |
| `b`       | binary mode                                                     |
| `t`       | text mode (default)                                             |
| `+`       | open for updating (reading and writing)                         |
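As a rough illustration of how `'x'`, `'a'`, and `'w'` differ, the sketch below applies each of them in turn to the same temporary file. The file name `notes.txt` and the variable names are made up for this example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'notes.txt')  # hypothetical file name

    # 'x' creates the file and fails if it already exists.
    with open(path, 'x') as f:
        f.write('first\n')

    try:
        open(path, 'x')
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    # 'a' keeps the existing content and adds to the end.
    with open(path, 'a') as f:
        f.write('second\n')
    with open(path) as f:
        after_append = f.read()

    # 'w' truncates the file before writing.
    with open(path, 'w') as f:
        f.write('third\n')
    with open(path) as f:
        after_write = f.read()

print(second_create_failed)  # True
print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_write))     # 'third\n'
```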

### Context managers (`with` statement)

It is good practice to use the `with` keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using `with` is also much shorter than writing equivalent `try-finally` blocks:

```python
with open('workfile') as f:
    read_data = f.read()
```

The equivalent `try-finally` block of the `with` block above would be:

```python
f = open('workfile')
try:
    read_data = f.read()
finally:
    f.close()
```

Compared with the `try-finally` version, the `with` statement removes most of the boilerplate. More importantly, it guarantees that the file is closed no matter how the nested block exits.

The `with` statement is used to wrap the execution of a block with methods defined by a context manager. A context manager is an object that defines the runtime context to be established when executing a `with` statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code. 

Typical uses of context managers include saving and restoring various kinds of global state, locking and unlocking resources, closing opened files, etc.
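To make the entry and exit steps concrete, here is a small sketch of a hand-written context manager that saves and restores one piece of global state, the current working directory. The class name `WorkingDirectory` is hypothetical; the `__enter__` and `__exit__` methods are the ones the `with` statement calls:

```python
import os
import tempfile


class WorkingDirectory:
    """Temporarily switch the current working directory (illustrative helper)."""

    def __init__(self, path):
        self.path = path
        self.previous = None

    def __enter__(self):
        # Called on entry to the with block: save state, then change it.
        self.previous = os.getcwd()
        os.chdir(self.path)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on exit, whether the block finished normally or raised.
        os.chdir(self.previous)
        return False  # do not suppress exceptions


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with WorkingDirectory(tmp):
        inside = os.getcwd()
after = os.getcwd()

print(before == after)   # True
print(inside != before)  # True
```

Returning `False` from `__exit__` lets any exception raised inside the block propagate after the directory has been restored.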

### Methods of File Objects

To read a file’s contents, call `f.read(size)`, which reads some quantity of data and returns it as a string (in text mode) or bytes object (in binary mode). `size` is an optional numeric argument. When `size` is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most `size` characters (in text mode) or `size` bytes (in binary mode) are read and returned. If the end of the file has been reached, `f.read()` will return an empty string (`''`) in text mode, or an empty bytes object (`b''`) in binary mode.



In [1]:
f = open('file_example.txt', 'r+')
f.read(20)

'Beautiful is better '

`f.readline()` reads a single line from the file:

In [2]:
print(f.readline(), end='')
print(f.readline(), end='')

than ugly.
Explicit is better than implicit.


For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code:

In [3]:
for line in f:
    print(line, end='')

Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.


If you want to read all the lines of a file into a list, you can also use `list(f)` or `f.readlines()`.
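A short sketch of both approaches, using a temporary file with made-up contents. Note that each line keeps its trailing newline unless you strip it:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'lines.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('one\ntwo\nthree\n')

    # readlines() returns every line, newline characters included.
    with open(path) as f:
        lines = f.readlines()

    # Looping over the file lets us process each line as it is read.
    with open(path) as f:
        stripped = [line.rstrip('\n') for line in f]

print(lines)     # ['one\n', 'two\n', 'three\n']
print(stripped)  # ['one', 'two', 'three']
```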

`f.write(string)` writes the contents of string to the file, returning the number of characters written.

In [4]:
f.write('Readability counts.\n')

20

`f.tell()` returns an integer giving the file object’s current position in the file. In binary mode this is the number of bytes from the beginning of the file; in text mode it is an opaque number.

In [5]:
f.tell()

229

To change the file object’s position, use `f.seek(offset, whence)`. The position is computed by adding `offset` to a reference point; the reference point is selected by the `whence` argument. A `whence` value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. `whence` can be omitted and defaults to 0, using the beginning of the file as the reference point. In text mode, only seeks relative to the beginning of the file are allowed (the exception being a seek to the very end with `f.seek(0, 2)`).

In [6]:
f.seek(0)
f.read(10)

'Beautiful '
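The three `whence` values are easiest to try out in binary mode, where offsets are plain byte counts. The sketch below assumes a small temporary file with made-up contents and uses the named constants `os.SEEK_CUR` and `os.SEEK_END` in place of the literal values 1 and 2:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'digits.bin')  # hypothetical file name
    with open(path, 'wb') as f:
        f.write(b'0123456789abcdef')

    with open(path, 'rb') as f:
        f.seek(5)                  # whence defaults to 0: 5 bytes from the start
        from_start = f.read(1)

        f.seek(2, os.SEEK_CUR)     # skip 2 bytes from the current position (6)
        from_current = f.read(1)

        f.seek(-3, os.SEEK_END)    # 3 bytes before the end of the file
        from_end = f.read(1)

        position = f.tell()

print(from_start)    # b'5'
print(from_current)  # b'8'
print(from_end)      # b'd'
print(position)      # 14
```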

If you’re not using the `with` keyword, then you should call `f.close()` to close the file and immediately free up any system resources used by it. If you don’t explicitly close a file, Python’s garbage collector will eventually destroy the object and close the open file for you, but the file may stay open for a while.

In [7]:
f.close()

## Working with the file system (`os`, `os.path`, `glob`)

### `os`

The `os` module contains functions to get information on local directories, files, processes, and environment variables.

`os.getcwd()` - returns the current working directory

In [1]:
import os
current_path = os.getcwd()
print(current_path)

/Users/iulia/PycharmProjects/python-training-oct2020/docs


`os.listdir(path)` - returns a list of all the entries in the directory given by `path`

In [2]:
os.listdir(current_path)

['13. Web Applications. Django Framework.ipynb',
 '07. Organizing and reusing code.ipynb',
 '01. Introduction.ipynb',
 '10. Object-Oriented Programming.ipynb',
 'file_example.txt',
 '04. Built-in Types (I).ipynb',
 '03. Everything in Python is an object.ipynb',
 'webapp.png',
 '06. Built-in Types (II).ipynb',
 '12. Testing your code.ipynb',
 '09. Decorators.ipynb',
 '.ipynb_checkpoints',
 '05. Control Flow.ipynb',
 '11. Input-Output Operations.ipynb',
 '02. The Language.ipynb',
 '08. Iterables.ipynb']

`os.mkdir(path)` - creates a directory

`os.makedirs(path)` - creates a directory recursively, creating any missing intermediate directories along the way

In [10]:
os.mkdir('testdir')
assert 'testdir' in os.listdir(current_path)
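The difference between `os.mkdir` and `os.makedirs` shows up as soon as a parent directory is missing. A minimal sketch, run inside a temporary directory with hypothetical directory names `a/b/c`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, 'a', 'b', 'c')  # parents 'a' and 'b' do not exist yet

    # mkdir creates only the last component, so it fails here.
    try:
        os.mkdir(nested)
        mkdir_failed = False
    except FileNotFoundError:
        mkdir_failed = True

    # makedirs creates 'a', then 'a/b', then 'a/b/c'.
    os.makedirs(nested)
    created = os.path.isdir(nested)

    # exist_ok=True makes a repeated call a no-op instead of an error.
    os.makedirs(nested, exist_ok=True)

print(mkdir_failed)  # True
print(created)       # True
```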

`os.chdir(path)` - changes the current working directory to `path`

In [11]:
os.chdir('testdir')
print('Items in testdir:', os.listdir())
os.chdir(current_path)

Items in testdir: []


`os.rename(source, dest)` - renames the file or directory 

In [12]:
os.rename('testdir', 'new_testdir')
assert 'testdir' not in os.listdir(current_path)
assert 'new_testdir' in os.listdir(current_path)

`os.remove(path)` - removes a file

`os.rmdir(path)` - removes the directory `path`, which must be empty

`os.removedirs(path)` - removes the leaf directory `path`, then removes each parent directory in turn for as long as it is empty

In [13]:
os.rmdir('new_testdir')
assert 'new_testdir' not in os.listdir(current_path)
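The three removal functions can be combined as in the following sketch. All names here (`outer`, `inner`, `data.txt`, `keep.txt`) are made up; the sentinel file keeps the temporary directory itself non-empty so that `os.removedirs` stops before deleting it:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Sentinel file: keeps tmp non-empty, so removedirs() stops at tmp.
    with open(os.path.join(tmp, 'keep.txt'), 'w') as f:
        f.write('sentinel')

    leaf = os.path.join(tmp, 'outer', 'inner')
    os.makedirs(leaf)
    file_path = os.path.join(leaf, 'data.txt')
    with open(file_path, 'w') as f:
        f.write('x')

    # rmdir refuses to remove a directory that still has entries.
    try:
        os.rmdir(leaf)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    os.remove(file_path)   # delete the file first
    os.removedirs(leaf)    # removes 'inner', then the now-empty 'outer'

    outer_exists = os.path.exists(os.path.join(tmp, 'outer'))
    tmp_exists = os.path.isdir(tmp)

print(rmdir_failed)  # True
print(outer_exists)  # False
print(tmp_exists)    # True
```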

`os.walk(path)` - directory tree generator. For each directory in the directory tree rooted at `path` (including `path` itself), yields a 3-tuple `(dirpath, dirnames, filenames)`:
    
* `dirpath` is a string, the path to the directory.
* `dirnames` is a list of the names of the subdirectories in `dirpath` (excluding '.' and '..').
* `filenames` is a list of the names of the non-directory files in `dirpath`.

In [14]:
for dirpath, dirnames, filenames in os.walk('.'):
    print(dirpath, dirnames, filenames)

. ['.ipynb_checkpoints'] ['07. Organizing and reusing code.ipynb', '01. Introduction.ipynb', '10. Object-Oriented Programming.ipynb', 'file_example.txt', '04. Built-in Types (I).ipynb', '03. Everything in Python is an object.ipynb', '06. Built-in Types (II).ipynb', '09. Decorators.ipynb', '05. Control Flow.ipynb', '11. Input-Output Operations.ipynb', '02. The Language.ipynb', '08. Iterables.ipynb']
./.ipynb_checkpoints [] ['07. Organizing and reusing code-checkpoint.ipynb', '10. Object-Oriented Programming-checkpoint.ipynb', '09. Decorators-checkpoint.ipynb', '06. Built-in Types (II)-checkpoint.ipynb', '01. Introduction-checkpoint.ipynb', '02. The Language-checkpoint.ipynb', '03. Everything in Python is an object-checkpoint.ipynb', '05. Control Flow-checkpoint.ipynb', '11. Input-Output Operations-checkpoint.ipynb', '08. Iterables-checkpoint.ipynb', '04. Built-in Types (I)-checkpoint.ipynb']
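A common use of `os.walk` is collecting every file that matches some condition. The sketch below builds a small temporary tree (the file and directory names are invented for the example) and gathers the `.txt` files, joining `dirpath` and each file name to get a usable path:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: a.txt, pkg/b.txt, pkg/sub/c.log
    os.makedirs(os.path.join(tmp, 'pkg', 'sub'))
    for relative in ('a.txt',
                     os.path.join('pkg', 'b.txt'),
                     os.path.join('pkg', 'sub', 'c.log')):
        with open(os.path.join(tmp, relative), 'w') as f:
            f.write('')

    txt_files = []
    for dirpath, dirnames, filenames in os.walk(tmp):
        for name in filenames:
            if name.endswith('.txt'):
                full_path = os.path.join(dirpath, name)
                # Store paths relative to the root to keep the output short.
                txt_files.append(os.path.relpath(full_path, tmp))
    txt_files.sort()

print(txt_files)  # ['a.txt', 'pkg/b.txt'] on POSIX systems
```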


### `os.path`

`os.path` contains functions for manipulating filenames and directory names.


`os.path.exists(path)` - test whether a path exists

In [15]:
os.path.exists(current_path)

True

`os.path.isfile(path)` - test whether a path is a regular file

In [16]:
os.path.isfile(current_path)

False

`os.path.isdir(path)` - return true if the pathname refers to an existing directory

In [17]:
os.path.isdir(current_path)

True

`os.path.islink(path)` - test whether a path is a symbolic link

In [18]:
os.path.islink(current_path)

False

`os.path.split(path)` - split a pathname; returns a tuple `(head, tail)` where `tail` is everything after the final slash

In [3]:
os.path.split(current_path)

('/Users/iulia/PycharmProjects/python-training-oct2020', 'docs')

`os.path.dirname(path)` - returns the directory component of a pathname

In [4]:
os.path.dirname(current_path)

'/Users/iulia/PycharmProjects/python-training-oct2020'

`os.path.basename(path)` - returns the final component of a pathname

In [21]:
os.path.basename(current_path)

'docs'

`os.path.join(path, *paths)` - join two or more pathname components, inserting `os.sep` as needed.

In [22]:
os.path.join(current_path, 'testdir', 'innerdir')

'/Users/iulia/PycharmProjects/python-training-oct2020/docs/testdir/innerdir'
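The functions above fit together: `split` returns the same two pieces as `dirname` and `basename`, and `join` puts them back together. A small sketch with a made-up relative path:

```python
import os

path = os.path.join('projects', 'demo', 'report.txt')  # hypothetical path

head, tail = os.path.split(path)

# split() is equivalent to calling dirname() and basename() separately.
same_head = head == os.path.dirname(path)
same_tail = tail == os.path.basename(path)

# Joining the two halves gives back the original path.
rebuilt = os.path.join(head, tail)

print(tail)             # report.txt
print(same_head)        # True
print(same_tail)        # True
print(rebuilt == path)  # True
```

Building paths with `os.path.join` instead of string concatenation keeps the code portable, since the separator is chosen for the current platform.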

### `glob`

The `glob` module is another tool in the Python standard library. It offers an easy way to list the contents of a directory programmatically, using the kind of wildcards that may already be familiar from the command line.

`glob.glob(pathname, recursive=False)` - Return a list of paths matching a `pathname` pattern. The pattern may contain simple shell-style wildcards. If `recursive` is true, the pattern `'**'` will match any files and zero or more directories and subdirectories.

`glob.iglob(pathname, recursive=False)` - Return an iterator which yields the paths matching a pathname pattern.

In [23]:
import glob
glob.glob('*Types*')

['04. Built-in Types (I).ipynb', '06. Built-in Types (II).ipynb']
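To show what `recursive=True` changes, the sketch below builds a small temporary tree (with invented file names) and compares a flat pattern with a `'**'` pattern. The results are sorted because `glob` does not promise any particular order:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny tree: top.txt, top.md, sub/nested.txt
    os.makedirs(os.path.join(tmp, 'sub'))
    for relative in ('top.txt', 'top.md', os.path.join('sub', 'nested.txt')):
        with open(os.path.join(tmp, relative), 'w') as f:
            f.write('')

    # '*' does not cross directory boundaries.
    flat = sorted(glob.glob(os.path.join(tmp, '*.txt')))

    # '**' with recursive=True also descends into subdirectories.
    deep = sorted(glob.iglob(os.path.join(tmp, '**', '*.txt'), recursive=True))

    flat_names = [os.path.relpath(p, tmp) for p in flat]
    deep_names = [os.path.relpath(p, tmp) for p in deep]

print(flat_names)  # ['top.txt']
print(deep_names)  # ['sub/nested.txt', 'top.txt'] on POSIX systems
```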

## Parsing command line arguments (`argparse`)

`argparse` is the recommended command-line parsing module in the Python standard library.

The first step in using `argparse` is creating an `ArgumentParser` object:

In [24]:
import argparse
parser = argparse.ArgumentParser(description='Argparse example')

Filling an `ArgumentParser` with information about program arguments is done by making calls to the `add_argument()` method.

In [25]:
parser.add_argument('number', type=int, help='do something with number')
parser.add_argument('--flag', help='flag stored true if present', action='store_true')

_StoreTrueAction(option_strings=['--flag'], dest='flag', nargs=0, const=True, default=False, type=None, choices=None, help='flag stored true if present', metavar=None)

#### `add_argument` parameters:

`name or flags` - Either a name or a list of option strings, e.g. `foo` or `-f, --foo`.

`action` - The basic type of action to be taken when this argument is encountered at the command line.

`nargs` - The number of command-line arguments that should be consumed.

`const` - A constant value required by some action and nargs selections.

`default` - The value produced if the argument is absent from the command line.

`type` - The type to which the command-line argument should be converted.

`choices` - A container of the allowable values for the argument.

`required` - Whether or not the command-line option may be omitted (optionals only).

`help` - A brief description of what the argument does.

`metavar` - A name for the argument in usage messages.

`dest` - The name of the attribute to be added to the object returned by `parse_args()`.
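Several of these parameters can be seen working together in the following sketch. The program name `resize` and all of its arguments are hypothetical, chosen only to exercise `nargs`, `metavar`, `type`, `default`, `choices`, `dest`, `action`, and `const`:

```python
import argparse

parser = argparse.ArgumentParser(prog='resize')  # hypothetical program name

# nargs='+' collects one or more values into a list.
parser.add_argument('files', nargs='+', metavar='FILE',
                    help='one or more input files')

# type converts the string from the command line; default is used when absent.
parser.add_argument('-s', '--scale', type=float, default=1.0,
                    help='scaling factor')

# choices restricts the allowed values; dest renames the resulting attribute.
parser.add_argument('--format', choices=['png', 'jpg'], default='png',
                    dest='output_format', help='output format')

# store_const saves the const value when the option is present.
parser.add_argument('-v', '--verbose', action='store_const', const=2,
                    default=0, help='enable verbose output')

args = parser.parse_args(['a.png', 'b.png', '--scale', '0.5', '--format', 'jpg'])

print(args.files)          # ['a.png', 'b.png']
print(args.scale)          # 0.5
print(args.output_format)  # jpg
print(args.verbose)        # 0
```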


`ArgumentParser` parses arguments through the `parse_args()` method. This inspects the command line, converts each argument to the appropriate type, and then invokes the appropriate action.

In [26]:
args = parser.parse_args(['--flag', '99'])

`parse_args()` returns an object which will have the arguments as attributes:

In [27]:
args.number

99

In [28]:
args.flag

True
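Two further behaviours are worth knowing about. The returned object can be turned into a plain dictionary with `vars()`, and invalid input makes `parse_args()` print a usage message to standard error and raise `SystemExit`. The sketch below rebuilds the same parser as above so that it runs on its own:

```python
import argparse

parser = argparse.ArgumentParser(description='Argparse example')
parser.add_argument('number', type=int, help='do something with number')
parser.add_argument('--flag', action='store_true',
                    help='flag stored true if present')

# Omitted optional arguments fall back to their defaults.
args = parser.parse_args(['7'])
as_dict = vars(args)

# A value that cannot be converted to int makes argparse exit with status 2.
try:
    parser.parse_args(['not-a-number'])
    exit_code = None
except SystemExit as exc:
    exit_code = exc.code

print(as_dict)    # {'number': 7, 'flag': False}
print(exit_code)  # 2
```

Catching `SystemExit` is done here only to show the exit status; a real command-line script would normally let the exception end the program.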