# File objects

First of all - **File object and file content are NOT the same**.

A [file object][File] is the Pythonic way of "communicating" with a file: querying its properties, managing its attributes, and so on. Reading and writing the file's content is just one of the many actions a file object supports. The object itself is created by the built-in function [open()][open], which also sets some of its initial properties, such as the mode.

This distinction will become more intuitive once we are better acquainted with the object-oriented approach.
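As a minimal sketch of this distinction, assuming Python 3 and a throwaway file created only for the demonstration (the name `demo.txt` is hypothetical), the file object exposes properties such as `name`, `mode` and `closed`, while the content is an ordinary string obtained by reading from that object:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # Create a small file to work with
    with open(demo_path, 'w') as f:
        f.write("Hello, file!\n")

    with open(demo_path, 'r') as f:
        # Properties of the file object itself
        file_name = f.name
        file_mode = f.mode
        is_closed = f.closed
        # The content is a separate thing: a plain string
        content = f.read()

print(file_mode, is_closed)
print(repr(content))
```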

[File]: https://docs.python.org/2/library/stdtypes.html#file-objects "File object"
[open]: https://docs.python.org/2/library/functions.html#open "open() documentation"

## Open and close

_File_ objects are created by the _open(name[, mode])_ built-in function, where _name_ is the path to the file (absolute, or relative to the current working directory) and _mode_ is the mode in which the file is opened. Several modes are available, but the most common ones are **'r'** for reading (default), **'w'** for writing and **'a'** for appending.

It is not a healthy habit to leave open _File_ objects "hanging" in the file system, so we make sure to close them once we are done with them. The following three scripts illustrate progressively better syntaxes for handling a file.

In [1]:
fname = "example.txt"

#### open() 1

In [3]:
my_file = open(fname, 'r')
# Here do something with the file...

# Do not forget to close the file explicitly when done
my_file.close()

#### open() 2

To make sure one does not forget to close the file, Python provides the **_with_** block, which **automatically closes the corresponding file** when the block ends. It is highly recommended to use it.

In [3]:
my_file = open(fname, 'r')
with my_file:
    # Here do something with the file...
    pass
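For comparison, the _with_ block behaves roughly like an explicit try/finally around the file handling. The sketch below assumes Python 3 and uses a temporary file with the hypothetical name `demo.txt`; it is meant only to illustrate what _with_ saves us from writing by hand.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    # Create a small file to work with
    with open(demo_path, 'w') as f:
        f.write("some text\n")

    # Manual equivalent of "with my_file:"
    my_file = open(demo_path, 'r')
    try:
        content = my_file.read()
    finally:
        my_file.close()  # runs even if read() raises an exception

print(my_file.closed)
```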

#### open() 3

Finally, Python supports the following syntax to wrap it all compactly. **This is how it is usually done.**

In [4]:
with open(fname, 'r') as my_file:
    # Here do something with the file...
    pass

#### closed

For monitoring the status of the file, the attribute _closed_ is available.

In [5]:
with open(fname, 'r') as my_file:
    print(my_file.closed)
print(my_file.closed)

False
True
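Once a file is closed, its object still exists, but reading from it is no longer possible. The following sketch (Python 3, with a temporary file named `demo.txt` chosen here for illustration) shows that such an attempt raises a `ValueError`:

```python
import os
import tempfile

error_message = None

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, 'w') as f:
        f.write("hello\n")

    with open(demo_path, 'r') as my_file:
        first_read = my_file.read()  # works: the file is open

    # The with block has ended, so my_file is closed now
    try:
        my_file.read()
    except ValueError as err:
        error_message = str(err)
        print("Cannot read:", error_message)
```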


## Reading from files

There are four ways to read the data of a file:

* Iteratively:
    * with a _**for**_ loop
    * with the _**readline()**_ method
    
* As a whole:
    * with the _**read()**_ method
    * with the _**readlines()**_ method

### Read with a _for_ loop

_File_ objects are their own iterators, and their "elements" are their lines. Iterating over a _File_ object with a _for_ loop will iterate over the lines of the file. Note that each line includes the "\n" at its end, which is why the printed output is double-spaced.

In [9]:
fname = "example.txt"

with open(fname) as f:
    for line in f:
        print(line)

This is the first line.

This is the second line.

This is the third and last line.This is the first line.

This is the second line.

This is the third and last line.
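To avoid the double-spaced output, one common option is to strip the trailing "\n" from each line before printing it. The sketch below assumes Python 3 and recreates a three-line file in a temporary location (the name `demo.txt` is hypothetical) so that it can run on its own:

```python
import os
import tempfile

clean_lines = []

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, 'w') as f:
        f.write("This is the first line.\n"
                "This is the second line.\n"
                "This is the third and last line.")

    with open(demo_path) as f:
        for line in f:
            clean = line.rstrip('\n')  # drop the trailing newline, if any
            clean_lines.append(clean)
            print(clean)
```

An alternative with the same visual effect is `print(line, end='')`, which keeps the line intact and only tells _print_ not to add its own newline.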


### Read with _readline()_

The method _readline()_ reads the next **single** line from the file. It is useful in specific scenarios, but is generally less convenient to work with than a _for_ loop.

In [10]:
fname = "example.txt"

with open(fname) as f:
    print(f.readline(), end=' ')
    f.readline()
    print(f.readline(), end=' ')

This is the first line.
 This is the third and last line.This is the first line.
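One detail worth knowing when using _readline()_ in a loop: at the end of the file it returns an empty string, whereas a blank line inside the file is returned as "\n". A minimal sketch, assuming Python 3 and a temporary file with the hypothetical name `demo.txt`:

```python
import os
import tempfile

line_count = 0

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, 'w') as f:
        f.write("first\nsecond\nthird\n")

    with open(demo_path) as f:
        while True:
            line = f.readline()
            if line == '':  # an empty string signals the end of the file
                break
            line_count += 1

print(line_count)
```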
 

### Read with _read()_

This method is the simplest one, as it reads the entire content of the file into a single string.

In [8]:
fname = "example.txt"

with open(fname) as f:
    print(f.read())

This is the first line.
This is the second line.
This is the third and last line.


### Read with _readlines()_

This method also reads the entire content of the file, but it creates a list whose elements are the lines of the file as strings.

In [11]:
fname = "example.txt"

with open(fname) as f:
    print(f.readlines())

['This is the first line.\n', 'This is the second line.\n', 'This is the third and last line.This is the first line.\n', 'This is the second line.\n', 'This is the third and last line.']
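If the trailing "\n" characters are not wanted, one option is to combine _read()_ with the string method `splitlines()`. The sketch below (Python 3, temporary file named `demo.txt` for illustration only) compares the two results:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, 'w') as f:
        f.write("first\nsecond\nthird")

    # readlines() keeps the newline at the end of each line
    with open(demo_path) as f:
        with_newlines = f.readlines()

    # read().splitlines() drops the newlines
    with open(demo_path) as f:
        without_newlines = f.read().splitlines()

print(with_newlines)
print(without_newlines)
```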


## Writing to files

### Writing methods

Similar to the _read()_ and _readlines()_ methods for reading, there are _**write()**_ and _**writelines()**_ methods for writing. _write()_ expects a single string and writes it directly to the file, while _writelines()_ expects a list (or any other iterable) of strings and writes them consecutively to the file. It should be noted that _writelines()_ behaves as if _write()_ were applied to each string in turn, and no "\n"s are added in the process.

In [10]:
my_data_str = "This is the first line.\nThis is the second line.\nThis is the third and last line."
my_data_list = ['This is the first line.\n', 'This is the second line.\n', 'This is the third and last line.']

In [11]:
fname = "example.txt"

with open(fname, 'w') as f:
    f.write(my_data_str)

In [12]:
with open(fname, 'w') as f:
    f.writelines(my_data_list)   
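When the strings at hand do not already end with "\n", the newlines have to be added explicitly. A minimal sketch, assuming Python 3; the list `names` and the file name `demo.txt` are made up for the illustration:

```python
import os
import tempfile

names = ["first", "second", "third"]  # strings without trailing newlines

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, 'w') as f:
        # Append '\n' to each string on the fly
        f.writelines(name + '\n' for name in names)

    with open(demo_path) as f:
        written = f.read()

print(repr(written))
```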

> **NOTE:** While _readlines()_ automatically splits the text by \\n, _writelines()_ does not add \\n automatically.

### Writing modes

In standard writing mode, indicated by 'w', the file is created if it does not exist, and its previous content is overwritten if it does. If we want to add data to what is already in the file, we should use the append mode, indicated by 'a'.

In [13]:
fname = "example.txt"

with open(fname, 'a') as f:
    f.write(my_data_str)

Now the content of the example.txt file is:

In [14]:
with open(fname, 'r') as f:
    print(f.read())

This is the first line.
This is the second line.
This is the third and last line.This is the first line.
This is the second line.
This is the third and last line.
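The difference between the two modes can be seen side by side in the following sketch (Python 3, using a temporary file with the hypothetical name `demo.txt`): the second 'w' discards what the first one wrote, while 'a' keeps the existing content and adds to it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, 'w') as f:
        f.write("one\n")

    with open(demo_path, 'w') as f:  # 'w' overwrites the previous content
        f.write("two\n")

    with open(demo_path, 'a') as f:  # 'a' appends to the existing content
        f.write("three\n")

    with open(demo_path) as f:
        content = f.read()

print(repr(content))
```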


## Example

The file "players.txt" contains the names and ages of seven band members. Use the data of the file to create a new file called "sorted players.txt", in which the members are listed by the alphabetical order of their names.

We note that for sorting, it is easier to have the entire data in our hands.

### Solution 1 - _read()_ and _write()_

In [16]:
# Get the data
with open("players.txt", 'r') as f:
    data = f.read()

# Manipulate the data
data = data.split('\n')
sorted_data = sorted(data)
sorted_data = '\n'.join(sorted_data)

# Create the new file
with open("sorted players.txt", 'w') as f:
    f.write(sorted_data)

### Solution 2 - _readlines()_ and _writelines()_

In [17]:
# Get the data
with open("players.txt", 'r') as f:
    data = f.readlines()

# Manipulate the data
data[-1] += '\n' # Last line does not contain '\n'
sorted_data = sorted(data)

# Create the new file
with open("sorted players.txt", 'w') as f:
    f.writelines(sorted_data)
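Both solutions depend on whether the last line of the file ends with "\n". A possible variant that does not, sketched here under the assumption of Python 3, reads the lines with `splitlines()` and adds the newlines back when writing. Since "players.txt" itself is not shown, the sample content below is invented purely for the demonstration, and both files are created in a temporary directory.

```python
import os
import tempfile

# Invented sample data standing in for "players.txt"
sample = "Paul 27\nJohn 29\nGeorge 26"

with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "players.txt")
    dst = os.path.join(tmp_dir, "sorted players.txt")

    with open(src, 'w') as f:
        f.write(sample)

    # Get the data, without the newline characters
    with open(src, 'r') as f:
        lines = f.read().splitlines()

    # Manipulate the data
    sorted_lines = sorted(lines)

    # Create the new file, one player per line
    with open(dst, 'w') as f:
        f.write('\n'.join(sorted_lines) + '\n')

    with open(dst, 'r') as f:
        result = f.read()

print(result)
```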

## Example

The file "Christmas.txt" contains some data about the customers of an online shop in the days before Christmas. Each customer is represented by two lines: the first records his login time to the site and the second his logout time and his total purchase. You may assume that no customer appears twice in the data.

Create a new file called "buyers.txt" that includes only the data about customers who made purchases. Each line in the new file should contain the customer id and his purchase.

In [18]:
with open("buyers.txt", 'w') as f_buyers:
    with open("Christmas.txt", 'r') as f_customers:
        for line in f_customers:
            if "logout" in line:
                buyer = line.split()
                revenue = float(buyer[-1])
                if revenue > 0:
                    f_buyers.write("{:<7} - {}\n".format(buyer[3], buyer[6]))
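The two nested _with_ blocks can also be written as a single _with_ statement that opens both files. The sketch below assumes Python 3 and shows the idea on invented data: the actual layout of "Christmas.txt" is not given here, so the line format in `sample` (customer id as the fourth field, purchase as the last field) is only an assumption made to keep the example runnable.

```python
import os
import tempfile

# Invented sample lines; the real "Christmas.txt" format may differ
sample = (
    "20/12 18:01 login 1001\n"
    "20/12 18:25 logout 1001 total purchase 59.90\n"
    "20/12 18:03 login 1002\n"
    "20/12 18:04 logout 1002 total purchase 0\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    customers_path = os.path.join(tmp_dir, "Christmas.txt")
    buyers_path = os.path.join(tmp_dir, "buyers.txt")

    with open(customers_path, 'w') as f:
        f.write(sample)

    # One with statement manages both files
    with open(customers_path, 'r') as f_customers, \
         open(buyers_path, 'w') as f_buyers:
        for line in f_customers:
            if "logout" in line:
                fields = line.split()
                revenue = float(fields[-1])
                if revenue > 0:
                    f_buyers.write("{:<7} - {}\n".format(fields[3], fields[-1]))

    with open(buyers_path, 'r') as f:
        buyers = f.read()

print(buyers)
```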

## Additional notes

* Many other file-related functionalities (copying, deleting, checking existence, etc.) are available in other modules, and we will see some of them later.

* The concept of "opening" is very general and is used by many other **file-like** objects, including web pages, I/O streams and others.

* Two other common aspects of working with files are not covered here, and the reader is encouraged to explore them further by referring to the following:
    * Buffering - the _open()_ argument _buffering_ and the _File_ method _flush()_ 
    * Position - the _File_ methods _seek()_ and _tell()_

* File extensions (e.g. txt, csv, html, etc.) are irrelevant to the _open()_ functionality. They are used by the OS to associate files with their relevant applications.

* For dealing with paths, see this [blog post about the Python pathlib module][1].
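As a small taste of the position methods mentioned in the notes, the following sketch (Python 3, temporary file with the hypothetical name `demo.txt`) remembers a position with _tell()_ and later returns to it with _seek()_, so that the same part of the file can be read twice:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    with open(demo_path, 'w') as f:
        f.write("abcdef")

    with open(demo_path, 'r') as f:
        first_part = f.read(3)   # read the first three characters
        position = f.tell()      # remember where we are now
        rest = f.read()          # read everything that is left
        f.seek(position)         # go back to the remembered position
        rest_again = f.read()    # read the same part once more

print(first_part, rest, rest_again)
```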


[1]: https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/