<a href="https://www.hydroffice.org/epom/"><img src="images/000_000_epom_logo.png" alt="ePOM" title="Open ePOM home page" align="center" width="12%"></a>

<img align="center" width="10%" style="padding-right:10px;" src="images/work.png">

# Read and Write Text Files

Now that you understand `list` and `dict`, as well as how to write your own functions with loops and conditional statements, you can already write simple programs that perform quite useful operations.

However, as a research assistant, you will likely need to access data that are stored (locally or remotely) in files. A file provides a mechanism for **permanently storing** information so that it can be retrieved after your program and/or your machine are restarted.

There are two main types of files:

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

A **text file** is a file that only contains plain or marked-up text.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

A **binary file** is any other type of file that does not fit the previous definition of text file.

You can often recognize a text file by looking at the [file extension](https://en.wikipedia.org/wiki/Filename_extension). Extensions commonly in use for text files are: `.txt`, `.asc`, `.xyz`.

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

A very simple test to evaluate whether a given file is a text file is to try opening it with a text editor. If you can understand the displayed content, then that file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*

We will first introduce some file-management capabilities of the `os` [Python module](https://docs.python.org/3.6/tutorial/modules.html#modules), then we will describe how to use the functions that Python natively provides for [reading and writing the content of a text file](https://docs.python.org/3.6/tutorial/inputoutput.html).

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

In Python, a **module** is a file containing definitions and statements. The module name is given by the file name without the suffix `.py`.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

Python has native functions to read and write text files.

## The `os` module

The `os` module provides a **portable** way of using several functionalities [across different operating systems](https://en.wikipedia.org/wiki/Cross-platform_software) (i.e., the same code can run on [Linux Ubuntu](https://en.wikipedia.org/wiki/Ubuntu) and [Microsoft Windows 10](https://en.wikipedia.org/wiki/Windows_10)).

In particular, we will explore the `os.path` sub-module to retrieve some data files that are stored on the server's hard disk.

The first required operation is to **import** the `os` module. Then, we will use some of the `os.path` sub-module functionalities and variables to write a function that returns the full path of the server's folder where this notebook is located:

- `curdir`: The constant string used by the operating system to refer to the current directory. E.g., `.` for Windows and Linux.
- `abspath()`: A function that returns the full, absolute version of a path.

In [None]:
import os

def get_current_folder():
    cur_folder = os.path.abspath(os.path.curdir)  # 'path' alone is not defined: use the full 'os.path' name
    return cur_folder

print("The current folder is: " + get_current_folder())
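As a quick sanity check, here is a minimal sketch using `os.getcwd()`, another function of the `os` module that returns the current working directory. The two approaches should print the same path. Keep in mind that the "current directory" is the folder the code runs from: in a Jupyter session this is usually, but not necessarily, the folder that contains the notebook.

```python
import os

# Two ways of obtaining the absolute path of the current working directory
via_curdir = os.path.abspath(os.path.curdir)  # the approach used in get_current_folder()
via_getcwd = os.getcwd()  # a direct call to the os module

print("abspath(curdir): " + via_curdir)
print("getcwd():        " + via_getcwd)
print("Same path? " + str(via_curdir == via_getcwd))
```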

The data are inside a `data` sub-folder. We now extend the previous code using the `os.path.join()` and `os.path.exists()` functions to:

- Create the full path to the `data` sub-folder.
- Check whether the resulting path actually exists.

In case the `data` sub-folder does not exist, we raise an error using the `raise` keyword.

In [None]:
def get_data_folder():
    cur_folder = os.path.abspath(os.path.curdir)
    data_folder = os.path.join(cur_folder, 'data')
    if os.path.exists(data_folder):
        return data_folder
    else:  # in case the data folder does not exist, we raise a meaningful error
        raise RuntimeError("Unable to locate the data folder: " + data_folder)

print("The data folder is: " + get_data_folder())
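Note that `os.path.exists()` returns `True` for files as well as for folders. If you need to be sure that a path points to a folder, `os.path.isdir()` is stricter. The sketch below uses the standard `tempfile` module to create a throwaway folder, so it does not touch your real `data` sub-folder; the names `data` and `notes.txt` inside it are only illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_folder:
    # create an illustrative sub-folder and file inside the temporary folder
    sub_folder = os.path.join(tmp_folder, 'data')
    os.mkdir(sub_folder)
    file_path = os.path.join(tmp_folder, 'notes.txt')
    tmp_file = open(file_path, mode='w')
    tmp_file.write('hello\n')
    tmp_file.close()

    # exists() is True for both the folder and the file ...
    folder_exists = os.path.exists(sub_folder)
    file_exists = os.path.exists(file_path)
    # ... while isdir() is True only for the folder
    folder_is_dir = os.path.isdir(sub_folder)
    file_is_dir = os.path.isdir(file_path)

print(folder_exists, file_exists)  # True True
print(folder_is_dir, file_is_dir)  # True False
```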

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

We did not import the `os` module since it was already imported in the previous cell. Re-importing a module does not break your code, but makes it more verbose. 

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

If you want to know more about error handling and exceptions, read [Errors and Exceptions](https://docs.python.org/3.6/tutorial/errors.html).

We will now retrieve the paths of all the files in the `data` sub-folder. Specifically, we will create a function `get_data_paths()` that returns a list containing the full paths of all the files in that folder, using the `os.listdir()` function.

In [None]:
def get_data_paths():
    data_paths = list()  # create an empty list that will be populated and returned
    data_folder = get_data_folder()  # use the previously created function to get the data folder path
    data_filenames = os.listdir(data_folder)  # call listdir() to get all the filenames in the data folder
    
    # Since we want to return the full file path, not only the filename, we will loop through the filenames
    for data_filename in data_filenames:
        data_path = os.path.join(data_folder, data_filename)  # we join the data folder path with the filename
        data_paths.append(data_path)  # we append the resulting data file path
    
    return data_paths  # we return a list of all the file paths

retrieved_paths = get_data_paths()
print("The data paths are: " + str(retrieved_paths))
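Since `os.listdir()` does not guarantee any particular order, the position of a given file in `retrieved_paths` may vary between machines. Below is a minimal sketch of a more predictable variant that sorts the names alphabetically and keeps only regular files (not sub-folders). The function name `get_sorted_file_paths()` and the file names in the usage example are hypothetical and not used elsewhere in this notebook.

```python
import os
import tempfile

def get_sorted_file_paths(folder):
    """Return the full paths of the regular files in `folder`, sorted by name."""
    file_paths = list()
    for filename in sorted(os.listdir(folder)):  # sort the names alphabetically
        file_path = os.path.join(folder, filename)
        if os.path.isfile(file_path):  # skip sub-folders
            file_paths.append(file_path)
    return file_paths

# Usage example on a temporary folder with illustrative (empty) files
with tempfile.TemporaryDirectory() as tmp_folder:
    for name in ['temp.txt', 'depth.txt', 'sal.txt']:
        tmp_file = open(os.path.join(tmp_folder, name), mode='w')
        tmp_file.close()
    os.mkdir(os.path.join(tmp_folder, 'extra'))  # a sub-folder that should be skipped

    sorted_names = list()
    for sorted_path in get_sorted_file_paths(tmp_folder):
        sorted_names.append(os.path.basename(sorted_path))  # keep only the file name

print(sorted_names)  # ['depth.txt', 'sal.txt', 'temp.txt']
```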

In the above code, we wrote a function in which:

- We created and populated a list: `data_paths`
- We reused a function that we previously created: `get_data_folder()`.
- We used several Python functions from the `os` module: e.g., `listdir()`, `join()`.
- We executed a `for` loop to populate the `data_paths` list.
- We returned the populated list.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

You don't need to remember the names of all the available Python functions, but you do need to learn how to search for them. The [official Python documentation](https://docs.python.org/3.6/index.html) is usually a good place to start.

From the [Lists of Variables notebook](002_Lists_of_Variables.ipynb), you should remember how to access a value in a list by its index. 

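Thus, if `sal.txt` is the **second** element in the printed list, we can access it using `1` as index. (The order of the names returned by `os.listdir()` is arbitrary, so check the printed list before relying on a specific index.)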

In [None]:
sal_path = retrieved_paths[1]
print("The file path with index 1 is: " + sal_path)

In the next section, you will learn how to open and read the content of these text files.

***

## Read a Text File

As discussed above, a text file is a sequence of characters stored on a permanent medium (e.g., a flash memory).

The Python `open()` function takes the name of the file (as a parameter) and returns a file object. 

This object can be used to read the sequence of characters in a few ways:

- The `readline()` method reads characters up to and including the next newline character.

In [None]:
sal_file = open(sal_path)

sal_line = sal_file.readline()
print(sal_line)

sal_file.close()  # the close() method closes the file
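`readline()` keeps the trailing newline character, which is why `print()` shows an extra empty line after the value. The minimal sketch below uses `io.StringIO`, an in-memory object from the standard `io` module that behaves like an opened text file, with illustrative values instead of the real `sal.txt` content.

```python
import io

fake_file = io.StringIO("30.8\n31.2\n")  # stands in for an opened text file

first_line = fake_file.readline()
second_line = fake_file.readline()
third_line = fake_file.readline()  # at the end of the file, readline() returns ''

print(repr(first_line))               # '30.8\n'
print(repr(first_line.rstrip('\n')))  # '30.8' (newline removed)
print(repr(third_line))               # ''
```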

- The `read()` method reads all the characters in the file.  

In [None]:
sal_file = open(sal_path)

sal_content = sal_file.read()
print(sal_content)

sal_file.close()  # the close() method closes the file

The execution of the above code will print the 20 salinity values in the text file. Although they look like numbers, they are actually just a single `str` of 100 characters!

In [None]:
type(sal_content)

In [None]:
len(sal_content)

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

Why 100 characters? Each row has 4 visible characters (e.g., `30.8`) plus an invisible newline character (`\n`) that the text editor interprets as a line break. Thus, `(4+1) * 20 = 100` characters.
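The arithmetic can be checked with a string built in memory. The repeated value `30.8` is only illustrative: the real file holds different values, each 4 characters wide. When reading in text mode, Python translates the Windows line ending `\r\n` into a single `\n`, so the count is the same on Windows and Linux.

```python
# Illustrative content: 20 lines, each with 4 visible characters plus '\n'
line = "30.8\n"
content = line * 20

print(len(line))            # 5
print(len(content))         # 100
print(content.count('\n'))  # 20
```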

We will now write a function that reads the sequence of characters, splits it by line (using the `str` method named `splitlines()`), and converts each line into the corresponding `float` value.

In [None]:
def read_salinity_values(input_path):
    sal_list = list()
    
    sal_file = open(input_path)
    sal_content = sal_file.read()
    sal_file.close()
    
    sal_lines = sal_content.splitlines()  # split the string retrieved from the file by new line
    for sal_line in sal_lines:
        sal_list.append(float(sal_line))  # convert the string in each line to float, then append to the list
    
    return sal_list

sal_path = retrieved_paths[1]
sal_values = read_salinity_values(input_path=sal_path)
print(sal_values)
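Why `splitlines()` rather than `split('\n')`? When the content ends with a newline, `split('\n')` produces an empty last element, and `float('')` would raise a `ValueError`. `splitlines()` also recognizes other line endings such as `\r\n`. A short comparison on an illustrative string:

```python
content = "30.8\n31.2\n"  # illustrative content that ends with a newline

print(content.split('\n'))   # ['30.8', '31.2', ''] -> the last element is empty
print(content.splitlines())  # ['30.8', '31.2']

values = list()
for line in content.splitlines():
    values.append(float(line))
print(values)  # [30.8, 31.2]
```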

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

There are more efficient ways to read a text file. We adopted an approach that is simple to understand for beginners.
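One common alternative is the `with` statement, which closes the file automatically (even if an error occurs), combined with iterating directly over the file object line by line. Below is a minimal sketch, assuming the file contains one number per line. The function name `read_values_with()` is hypothetical, and the usage example writes a temporary file with illustrative values.

```python
import os
import tempfile

def read_values_with(input_path):
    """Read one float per line, skipping empty lines."""
    values = list()
    with open(input_path) as input_file:  # the file is closed when the block ends
        for line in input_file:  # iterate over the lines one at a time
            line = line.strip()  # remove the trailing newline and spaces
            if line:  # skip empty lines
                values.append(float(line))
    return values

# Usage example with a temporary file
with tempfile.TemporaryDirectory() as tmp_folder:
    tmp_path = os.path.join(tmp_folder, 'values.txt')
    with open(tmp_path, mode='w') as tmp_file:
        tmp_file.write("30.8\n31.2\n\n")
    read_back = read_values_with(tmp_path)

print(read_back)  # [30.8, 31.2]
```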

<img align="left" width="6%" style="padding-right:10px;" src="images/test.png">

Write code similar to the previous **Code** cell, but that reads the temperature values in the `temp.txt` file. *Hint: the path is at index `2` in the `retrieved_paths` list.*

In [None]:
def read_temperature_values(input_path):
    temp_list = list()
    
    temp_file = open(input_path)
    temp_content = temp_file.read()
    temp_file.close()
    
    temp_lines = temp_content.splitlines()
    for temp_line in temp_lines:
        temp_list.append(float(temp_line))
    
    return temp_list

temp_path = retrieved_paths[2]
temp_values = read_temperature_values(input_path=temp_path)
print(temp_values)
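The functions `read_salinity_values()` and `read_temperature_values()` differ only in their variable names. A possible refactoring, sketched here with the hypothetical name `read_float_values()`, is a single function that works for any text file with one number per line. The usage example runs on a temporary file with illustrative values.

```python
import os
import tempfile

def read_float_values(input_path):
    """Return a list with the float value stored on each line of a text file."""
    value_list = list()

    value_file = open(input_path)
    value_content = value_file.read()
    value_file.close()

    for value_line in value_content.splitlines():
        value_list.append(float(value_line))

    return value_list

# Usage example with a temporary file
with tempfile.TemporaryDirectory() as tmp_folder:
    tmp_path = os.path.join(tmp_folder, 'example.txt')
    tmp_file = open(tmp_path, mode='w')
    tmp_file.write("12.5\n13.0\n")
    tmp_file.close()
    example_values = read_float_values(tmp_path)

print(example_values)  # [12.5, 13.0]
```

With this function, a call such as `read_float_values(retrieved_paths[2])` could replace `read_temperature_values()`.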

***

## Write a Text File

The first decision is where to store the text file. For this collection of notebooks, we will use the `output` sub-folder, whose path can be retrieved by running the following code:

In [None]:
def get_output_folder():
    cur_folder = os.path.abspath(os.path.curdir)
    output_folder = os.path.join(cur_folder, 'output')
    if os.path.exists(output_folder):  # 'path' alone is not defined: use the full 'os.path' name
        return output_folder
    else:  # in case the output folder does not exist, we raise a meaningful error
        raise RuntimeError("Unable to locate the output folder: " + output_folder)

output_folder = get_output_folder()
print("The output folder is: " + output_folder)
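If you would rather create the `output` sub-folder when it is missing instead of raising an error, `os.makedirs()` with `exist_ok=True` (available since Python 3.2) does nothing when the folder already exists. Below is a minimal sketch with the hypothetical name `get_or_create_folder()`, demonstrated on a temporary folder.

```python
import os
import tempfile

def get_or_create_folder(parent_folder, name):
    """Return the path of `name` inside `parent_folder`, creating it if missing."""
    folder = os.path.join(parent_folder, name)
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return folder

with tempfile.TemporaryDirectory() as tmp_folder:
    created = get_or_create_folder(tmp_folder, 'output')
    created_again = get_or_create_folder(tmp_folder, 'output')  # a second call is harmless
    is_folder = os.path.isdir(created)

print(is_folder)  # True
```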

We then use the `join()` function to build the path of the output file: e.g., `depths.txt`.

In [None]:
depths_path = os.path.join(output_folder, 'depths.txt')
print("The output file path is: " + depths_path)

To write a file, you have to call `open()` passing the mode `'w'` (write) as the second parameter. We wrap this call in a function that takes a list as its second parameter and writes each of its values into the text file, one per line.

In [None]:
def write_list_to_disk(output_path, input_list):
    
    output_file = open(output_path, mode='w')
    
    for value in input_list:
        line_content = str(value) + '\n'  # the '\n' is the character for the new line
        output_file.write(line_content)
        
    output_file.close()
    
depths_path = os.path.join(output_folder, 'depths.txt')
depths_list = [1943.2, 1232.1, 132.2, 2.42, 123.5, 1093.2]
write_list_to_disk(output_path=depths_path, input_list=depths_list)
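Opening an existing file with mode `'w'` erases its previous content, whereas mode `'a'` (append) adds new content at its end. A short comparison on a temporary file with illustrative values; the helper names `write_lines()` and `read_text()` are hypothetical.

```python
import os
import tempfile

def write_lines(output_path, values, mode):
    """Write one value per line, opening the file with the given mode."""
    output_file = open(output_path, mode=mode)
    for value in values:
        output_file.write(str(value) + '\n')
    output_file.close()

def read_text(input_path):
    """Return the whole content of a text file as a single string."""
    input_file = open(input_path)
    content = input_file.read()
    input_file.close()
    return content

with tempfile.TemporaryDirectory() as tmp_folder:
    tmp_path = os.path.join(tmp_folder, 'depths.txt')

    write_lines(tmp_path, [1.5, 2.5], mode='w')
    write_lines(tmp_path, [3.5], mode='w')  # overwrites: only 3.5 is left
    after_overwrite = read_text(tmp_path)

    write_lines(tmp_path, [4.5], mode='a')  # appends after 3.5
    after_append = read_text(tmp_path)

print(repr(after_overwrite))  # '3.5\n'
print(repr(after_append))     # '3.5\n4.5\n'
```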

<img align="left" width="6%" style="padding-right:10px;" src="images/test.png">

Write the code required to visualize the content of the file written in the above *Code* cell.

In [None]:
depths_path = os.path.join(output_folder, 'depths.txt')  # the file path is the same as before

depths_file = open(depths_path)  # we don't need to add the `mode` parameter since the default value is 'r' for read
depths_content = depths_file.read()  # we retrieve the full content of the file as a single, multi-line string
depths_file.close()  # we close the file since we have already read its content

print(depths_content)  # we print the string content of the file
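In Python 3, `str()` of a `float` produces the shortest text that converts back to the same value, so writing the list and reading it back with `float()` should give an identical list. Below is a self-contained round-trip check, assuming the one-value-per-line layout used by `write_list_to_disk()` (reproduced here so that the block runs on its own, on a temporary file).

```python
import os
import tempfile

def write_list_to_disk(output_path, input_list):
    output_file = open(output_path, mode='w')
    for value in input_list:
        output_file.write(str(value) + '\n')  # one value per line
    output_file.close()

depths_list = [1943.2, 1232.1, 132.2, 2.42, 123.5, 1093.2]

with tempfile.TemporaryDirectory() as tmp_folder:
    tmp_path = os.path.join(tmp_folder, 'depths.txt')
    write_list_to_disk(output_path=tmp_path, input_list=depths_list)

    # read the file back and convert each line to float
    tmp_file = open(tmp_path)
    read_list = list()
    for line in tmp_file.read().splitlines():
        read_list.append(float(line))
    tmp_file.close()

print(read_list == depths_list)  # True
```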

***

## Read and Write Binary Files

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

Although Python also provides access to the content of a [binary file](https://en.wikipedia.org/wiki/Binary_file), you need to know its format specification (i.e., how the content is organized) to properly interpret it. This kind of task is outside the scope of this collection of notebooks.
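Even without a format specification, you can see the difference between the two kinds of access by adding `b` to the mode (e.g., `'rb'`): `read()` then returns a `bytes` object holding the raw content instead of a `str`. A minimal sketch on a temporary file with an illustrative value:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_folder:
    tmp_path = os.path.join(tmp_folder, 'sample.txt')

    binary_file = open(tmp_path, mode='wb')  # 'b' = binary mode: write raw bytes
    binary_file.write(b"30.8\n")
    binary_file.close()

    binary_file = open(tmp_path, mode='rb')
    raw_content = binary_file.read()  # a bytes object
    binary_file.close()

    text_file = open(tmp_path)  # default text mode
    text_content = text_file.read()  # a str object
    text_file.close()

print(type(raw_content), raw_content)          # <class 'bytes'> b'30.8\n'
print(type(text_content), repr(text_content))  # <class 'str'> '30.8\n'
```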

***

<img align="left" width="6%" style="padding-right:10px; padding-top:10px;" src="images/refs.png">

## Useful References

* [The official Python 3.6 documentation](https://docs.python.org/3.6/index.html)
  * [The os module](https://docs.python.org/3.6/library/os.html)
  * [Input and Output](https://docs.python.org/3.6/tutorial/inputoutput.html)
* [Cross-platform software](https://en.wikipedia.org/wiki/Cross-platform_software)
* [Text file](https://en.wikipedia.org/wiki/Text_file)
* [Binary file](https://en.wikipedia.org/wiki/Binary_file)
* [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)

<img align="left" width="5%" style="padding-right:10px;" src="images/email.png">

*For issues or suggestions related to this notebook, write to: gmasetti@ccom.unh.edu*

<!--NAVIGATION-->
[< Dictionaries](006_Dictionaries.ipynb) | [Contents](index.ipynb) | [First Steps of a Class >](008_First_Steps_of_a_Class.ipynb)