<a href="https://www.hydroffice.org/epom/"><img src="images/000_000_epom_logo.png" alt="ePOM" title="Open ePOM home page" align="center" width="12%"></a>

<a href="https://piazza.com/class/js5dnu0q39n6qe"><img src="images/help.png" alt="Piazza.com" title="Ask questions on Piazza.com" align="right" width="10%"></a>
# Read and Write Text Files

You have learned about lists and how to [write your own functions](005_Write_Your_Own_Functions.ipynb) with [loops](004_Loops.ipynb) and [conditional statements](003_Conditional_Execution.ipynb). This allows you to write programs performing a variety of tasks. 

However, a convenient mechanism to access data that you want to analyze is currently missing. In this notebook, we will explore the use of [files](https://en.wikipedia.org/wiki/Computer_file) since they are a common way to access stored data.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

A **file** provides a mechanism for **permanently storing information**.

There are two main types of files: text files and binary files.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

A **text file** is a file that only contains plain or marked-up text.<br>A **binary file** is any other type of computer file that does not fit the previous definition of a text file.

You can often recognize a text file by looking at the [file extension](https://en.wikipedia.org/wiki/Filename_extension). Commonly used text file extensions include `.txt`, `.asc`, and `.xyz`.
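
As a minimal sketch of this idea, the code below checks a file name against the three extensions listed above. The helper name `has_text_extension()` and the example file names are assumptions made for illustration, and the check relies on `os.path.splitext()`, a function of the `os.path` module. An extension is only a hint: it does not guarantee that the content is actually text.

```python
import os.path

# the extensions mentioned above (an illustrative, incomplete list)
text_extensions = [".txt", ".asc", ".xyz"]

def has_text_extension(file_name):
    # splitext() returns a pair: (name without extension, extension)
    extension = os.path.splitext(file_name)[1].lower()
    return extension in text_extensions

print(has_text_extension("sal.txt"))   # True
print(has_text_extension("logo.png"))  # False
```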

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

A very simple test to evaluate whether a given file is a text file is to open it in a text editor. If you can recognize the displayed content as text, then the file is likely a text file. *(Be warned that opening a file in this way can take a long time depending on the size of the file.)*

We will first introduce some file management capabilities of the `os.path` [Python module](https://docs.python.org/3.9/tutorial/modules.html#modules), then we will use the functions that Python provides for [reading and writing the content of a text file](https://docs.python.org/3.9/tutorial/inputoutput.html).

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

In Python, a **module** is a file containing code (e.g., definitions and statements). 

The module name is given by the file name without the [file extension](https://en.wikipedia.org/wiki/Filename_extension). For example, a file `example.py` identifies the module `example`.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

Python has native functions to read and write text files.

## The `os.path` module

Here you will explore the `os.path` module to retrieve data files that are stored on the server's hard disk.

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

Several functions in the `os.path` module are **portable**. This means that they can be used [across different operating systems](https://en.wikipedia.org/wiki/Cross-platform_software). For example, you can use `os.path` functionality in code that runs on [Linux Ubuntu](https://en.wikipedia.org/wiki/Ubuntu) and [Microsoft Windows 10](https://en.wikipedia.org/wiki/Windows_10).

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

To have access to the `os.path` module functionality, it first must be made available using the **import** statement.

In the example below, we write a `get_current_folder()` function that returns the path of the folder where *this* notebook is located.

To achieve this task, we will use one variable and one function from `os.path`:

- `curdir`: The string used by the [operating system](https://en.wikipedia.org/wiki/Operating_system) to refer to the current directory.
- `abspath()`: A function that returns the [absolute path](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths).

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

An [**absolute path**](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths) points to the same location in a file system, regardless of the current working directory. In contrast, a [**relative path**](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths) starts from a given working directory.
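
The difference can be sketched with a few lines of code. The relative path `data/sal.txt` is only an example and the file does not need to exist for this check; `os.path.isabs()` reports whether a path is absolute.

```python
import os.path

# a relative path: its meaning depends on the current working directory
relative_path = os.path.join("data", "sal.txt")

# abspath() anchors the relative path at the current working directory
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True
```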

In [None]:
import os.path

def get_current_folder():
    cur_folder = os.path.abspath(os.path.curdir) # get a string containing the absolute path to the current directory
    return cur_folder 

print("The current folder is: " + get_current_folder())

The data that we will use for the following examples are contained in the `data` sub-folder (see image below):

![The data sub-folder](images/007_100_data_folder.png)

To be able to access the `data` sub-folder, we extend the previous code using `os.path.join()` and `os.path.exists()` functions to:

- Create the absolute path to the `data` sub-folder.
- Check whether the resulting path actually exists.

If the `data` sub-folder does not exist, we raise an error using the [`raise`](https://docs.python.org/3.9/tutorial/errors.html#raising-exceptions) keyword.

In [None]:
def get_data_folder():
    cur_folder = os.path.abspath(os.path.curdir) # get a string containing the absolute path to the current directory
    data_folder = os.path.join(cur_folder, "data") # augment the cur_folder path with the data directory
    if os.path.exists(data_folder): 
        return data_folder
    else:  # raise a meaningful error if the data folder does not exist
        raise RuntimeError("Unable to locate the data folder: " + data_folder)

print("The data folder is: " + get_data_folder())

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

We did not import the `os.path` module since it was previously imported in this notebook. Re-importing a module does not break your code, but makes it more verbose. 

If you decide to [clear the results of this notebook](000_Welcome_on_Board.ipynb#How-to-Clear-the-Results-of-a-Notebook?), you will need to re-execute the code cell with the `import` statement.

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

If you want to know more about error handling and exceptions, read [Errors and Exceptions](https://docs.python.org/3.9/tutorial/errors.html).

We will now retrieve all the paths to the files in the `data` folder. Specifically, we will create a function `get_data_paths()` that returns a list containing all the files in that folder, using the `os.listdir()` function from the `os` module.

In [None]:
import os

def get_data_paths():
    data_paths = list()  # create an empty list to be populated and returned
    data_folder = get_data_folder() # call the function you created to return the data directory path
    data_filenames = os.listdir(data_folder) # call listdir() to get all the filenames in your data directory

    # Combine each data file name in the data_filenames list with the full path
    for data_filename in data_filenames:
        data_path = os.path.join(data_folder, data_filename) # join the absolute path with the file name
        data_paths.append(data_path)  # add the data_path for this file to a list of data paths
    
    data_paths.sort()  # sort the paths in alphabetical order

    return data_paths  # return the list of all the file paths


retrieved_paths = get_data_paths()
print("The data paths are: " + str(retrieved_paths))

In the above code, we wrote a function in which a list, `data_paths`, is created, populated, and returned. The function:

- Creates an empty list.
- Calls the previously created function `get_data_folder()`.
- Uses the `listdir()` function from the `os` module and the `join()` function from the `os.path` module.
- Executes a `for` loop to populate the `data_paths` list.
- Returns the populated list.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

You do not need to remember all the names of the available Python functions, but you need to learn how to search for them. The [official Python documentation](https://docs.python.org/3.9/index.html) is a good place to start. You can also get a list of the functions in the `os.path` module by entering `dir(os.path)` in a code cell.
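
As a small sketch, the output of `dir(os.path)` can be filtered to skip the names that start with an underscore, which are usually meant for internal use. The variable name `public_names` is just an illustrative choice.

```python
import os.path

public_names = list()
for name in dir(os.path):
    if not name.startswith("_"):  # skip internal names such as __doc__
        public_names.append(name)

print(public_names)
```

The printed list should include names used in this notebook, such as `abspath`, `exists`, and `join`.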

From the [Lists of Variables notebook](002_Lists_of_Variables.ipynb), you know how to access an item in a list by its index. 

In this particular case, to access the path of the `sal.txt` file, we will use `1` as index since it is the **second** element in the `retrieved_paths` list.

In [None]:
sal_path = retrieved_paths[1]
print("The file path with index 1 is: " + sal_path)

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

**CAUTION**: The above solution works **specifically** with the current directory content. For instance, changing the directory content may alter the index of the salinity file in the list, thus breaking your code.
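
One possible way to avoid depending on the index is to search the list for the file name. The sketch below is not part of the original notebook code: the function name `find_path_by_name()` and the `example_paths` list (with a made-up `other.txt` entry) are assumptions used to keep the example self-contained. It uses `os.path.basename()`, which returns the last component of a path.

```python
import os.path

def find_path_by_name(paths, file_name):
    for path in paths:
        if os.path.basename(path) == file_name:  # compare only the file name
            return path
    # raise a meaningful error if no path matches
    raise RuntimeError("Unable to locate the file: " + file_name)

# hypothetical list standing in for retrieved_paths
example_paths = [
    os.path.join("data", "other.txt"),
    os.path.join("data", "sal.txt"),
    os.path.join("data", "temp.txt"),
]

print(find_path_by_name(example_paths, "sal.txt"))
```

With this approach, adding or removing other files in the folder does not change which path is returned.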

In the next section, you will learn how to open and read the content of `sal_path`.

***

## Read a Text File

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

The Python `open()` function takes the name of the file (as a parameter) and returns a [file object](https://docs.python.org/3.9/glossary.html#term-file-object). <br> The file object's `close()` method closes the open file.

This file object can be used to read the sequence of characters contained in a text file in a few different ways:

- The `readline()` method reads a single line from the text file.

In [None]:
sal_file = open(sal_path)

sal_line = sal_file.readline()
print(sal_line)

sal_file.close()  # the close() method closes the file

- The `read()` method reads the entire text file.  

In [None]:
sal_file = open(sal_path)

sal_content = sal_file.read() # assigns the contents of the file to a string variable
print(sal_content)

sal_file.close()  # the close() method closes the file

The execution of the above code will print the 20 salinity values in the text file. Although they look like numbers, they are **actually** a single `str` consisting of 100 characters!

In [None]:
type(sal_content)

In [None]:
len(sal_content)

You may ask why there are 100 characters instead of 80. Each of the 20 rows has 4 visible characters (e.g., `30.8`), but each row also ends with an invisible [newline character](https://en.wikipedia.org/wiki/Newline) (i.e., `\n`) that text editors treat as a break between two lines. Thus, `(4+1) * 20 = 100` characters.
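
The count can be reproduced without the data file. In the sketch below, the 20 rows are all set to the made-up value `30.8` (the real file holds different values), and each row is followed by a newline character.

```python
rows = ["30.8"] * 20  # 20 rows of 4 visible characters each (made-up values)

text = ""
for row in rows:
    text += row + "\n"  # every row ends with one newline character

print(len(text))         # 100
print(text.count("\n"))  # 20
```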

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

The **newline character** is used to end a line of text and to start a new one.

In the code above, the `sal_content` variable holds the content of the file as a single sequence of characters.

We will now write a function that not only reads the sequence of characters, but also splits it into multiple lines based on the **newline character** (using the `str` method named `splitlines()`). Finally, we convert each line to the corresponding `float` value and append it to `sal_list`.

In [None]:
def read_salinity_values(input_path):
    sal_list = list()
    
    sal_file = open(input_path)
    sal_content = sal_file.read()
    sal_file.close()
    
    sal_lines = sal_content.splitlines()  # split the string sal_content by the newline characters in this file
    for sal_line in sal_lines:
        sal_list.append(float(sal_line))  # convert the string in each line to float, then append to the list
    
    return sal_list

sal_path = retrieved_paths[1]
sal_values = read_salinity_values(input_path=sal_path) # does the same as sal_values = read_salinity_values(sal_path)
print(sal_values)

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

In Python, there are more efficient methods to read a text file. We adopted an approach here that is simple to understand for a first-time learner.

<img align="left" width="6%" style="padding-right:10px;" src="images/test.png">

Write code similar to the previous **Code** cell, but that reads the temperature values in the `temp.txt` file. *Hint: the path is at index `2` in the `retrieved_paths` list.*

In [None]:
def read_temperature_values(input_path):
    temp_list = list()
    
    temp_file = open(input_path)
    temp_content = temp_file.read()
    temp_file.close()
    
    temp_lines = temp_content.splitlines()
    for temp_line in temp_lines:
        temp_list.append(float(temp_line))
    
    return temp_list

temp_path = retrieved_paths[2]
temp_values = read_temperature_values(input_path=temp_path)
print(temp_values)
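
Both reading functions above call `close()` explicitly. A common alternative, sketched here, is the `with` statement, which closes the file automatically when the indented block ends, and a `for` loop that reads the file one line at a time. The function name `read_float_values()` and the demo file (written to a temporary folder with three made-up values) are assumptions for illustration only.

```python
import os
import tempfile

def read_float_values(input_path):
    values = list()
    with open(input_path) as input_file:  # the file is closed when this block ends
        for line in input_file:           # iterate over the file one line at a time
            values.append(float(line))    # float() ignores the trailing newline
    return values

# create a small demo file in a temporary folder
demo_folder = tempfile.mkdtemp()
demo_path = os.path.join(demo_folder, "demo_values.txt")
with open(demo_path, mode="w") as demo_file:
    demo_file.write("30.8\n31.2\n30.5\n")

demo_values = read_float_values(demo_path)
print(demo_values)  # [30.8, 31.2, 30.5]
```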

***

## Write a Text File

If you want to write a text file, you need to decide where to store it. For this collection of notebooks, we will use the `output` directory, whose path can be retrieved by running the following code:

In [None]:
def get_output_folder():
    cur_folder = os.path.abspath(os.path.curdir)  # the absolute path to the current directory
    output_folder = os.path.join(cur_folder, "output")  # the absolute path to the output folder (may or may not exist)
    if os.path.exists(output_folder):
        return output_folder
    else:  # raise a meaningful error if the output folder does not exist
        raise RuntimeError("Unable to locate the output folder: " + output_folder)

output_folder = get_output_folder()
print("The output folder is: " + output_folder)

You can then use the `join()` function to assign an output file name: e.g., `depths.txt`.

In [None]:
depths_path = os.path.join(output_folder, "depths.txt")
print("The output file path is: " + depths_path)

In the code below, the `write_list_to_disk` function has two parameters:

* `output_path` as the location where the output file is to be written. 
* `input_list` as a list containing the data to be written to the output file.

The `write_list_to_disk` function below opens a file in the `w` mode (`w` is for *write*). If a file already exists at that path, opening it in this mode erases its previous content.

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

You may learn about other modes for opening a file from the official [Python documentation](https://docs.python.org/3.9/library/functions.html?#open).
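
As a brief sketch of two of these modes, the code below contrasts `w` (write, which erases any existing content) with `a` (append, which adds to the end of the file). The helper `read_text()` and the demo file in a temporary folder are assumptions made to keep the example self-contained.

```python
import os
import tempfile

def read_text(path):
    text_file = open(path)  # default mode is "r" (read)
    content = text_file.read()
    text_file.close()
    return content

demo_folder = tempfile.mkdtemp()
demo_path = os.path.join(demo_folder, "modes_demo.txt")

demo_file = open(demo_path, mode="w")  # "w" creates the file
demo_file.write("first\n")
demo_file.close()

demo_file = open(demo_path, mode="a")  # "a" keeps the content and appends to it
demo_file.write("second\n")
demo_file.close()
after_append = read_text(demo_path)

demo_file = open(demo_path, mode="w")  # "w" again erases the previous content
demo_file.write("third\n")
demo_file.close()
after_rewrite = read_text(demo_path)

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_rewrite))  # 'third\n'
```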

In [None]:
def write_list_to_disk(output_path, input_list):
    
    output_file = open(output_path, mode="w")  # mode="w" to open the file in writing mode
    
    for value in input_list:
        line_content = str(value) + "\n"  # the "\n" is the newline character
        output_file.write(line_content)
        
    output_file.close()
    
depths_path = os.path.join(output_folder, "depths.txt")
depths_list = [1943.2, 1232.1, 132.2, 2.42, 123.5, 1093.2]
write_list_to_disk(output_path=depths_path, input_list=depths_list)

<img align="left" width="6%" style="padding-right:10px;" src="images/test.png">

Write code that prints the contents of the `depths.txt` file written in the above **Code** cell.

In [None]:
depths_path = os.path.join(output_folder, "depths.txt")  # the file path is the same as before

depths_file = open(depths_path)  # we don't need to add the `mode` parameter since the default value is 'r' for read
depths_content = depths_file.read()  # we retrieve the full content of the file as a single, multi-line string
depths_file.close()  # we close the file since we have already read its content

print(depths_content)  # we print the string content of the file

***

## Read and Write Binary Files

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

Although Python also provides access to the content of a [binary file](https://en.wikipedia.org/wiki/Binary_file), you need to know the format specification, which describes how the content is organized, to interpret it properly. This kind of task is outside the scope of this collection of notebooks.

***

<img align="left" width="6%" style="padding-right:10px; padding-top:10px;" src="images/refs.png">

## Useful References

* [The official Python 3.9 documentation](https://docs.python.org/3.9/index.html)
  * [The os module](https://docs.python.org/3.9/library/os.html)
  * [Input and Output](https://docs.python.org/3.9/tutorial/inputoutput.html)
* [Cross-platform software](https://en.wikipedia.org/wiki/Cross-platform_software)
* [Computer file](https://en.wikipedia.org/wiki/Computer_file)
  * [Text file](https://en.wikipedia.org/wiki/Text_file)
  * [Binary file](https://en.wikipedia.org/wiki/Binary_file)
  * [Filename extension](https://en.wikipedia.org/wiki/Filename_extension)
* [Absolute and relative paths](https://en.wikipedia.org/wiki/Path_(computing)#Absolute_and_relative_paths)

<img align="left" width="5%" style="padding-right:10px;" src="images/email.png">

*For issues or suggestions related to this notebook, write to: epom@ccom.unh.edu*

<!--NAVIGATION-->
[< Write Your Own Functions](005_Write_Your_Own_Functions.ipynb) | [Contents](index.ipynb) | [Dictionaries and Metadata >](007_Dictionaries_and_Metadata.ipynb)