# Introduction to python for hydrologists &mdash; file input and output

In this exercise we will learn to use Python to work with file input and output. We will also learn a little about reading and writing formatted ASCII files and binary files, and about retrieving data from the web to create plots and data files.

### Initialization of Notebook
Populate the interactive namespace from numpy and matplotlib.

In [None]:
import sys
import os
import traceback
import numpy as np
import matplotlib.pyplot as plt

pthnb = os.path.join('data', 'fileio')
if not os.path.exists(pthnb):
    os.makedirs(pthnb)

### Exercise sources
http://openbookproject.net/thinkcs/python/english3e/files.html

### About files
While a program is running, its data is stored in random access memory (RAM). RAM is fast and inexpensive, but it is also **volatile**, which means that when the program ends, or the computer shuts down, data in RAM disappears. To make data available the next time the computer is turned on and the program is started, it has to be written to a non-volatile storage medium, such as a hard drive, USB drive, or CD-RW.

Data on **non-volatile** storage media is stored in named locations on the media called **files**. By reading and writing files, programs can save information between program runs.

Working with files is a lot like working with a notebook. To use a notebook, it has to be opened. When done, it has to be closed. While the notebook is open, it can either be read from or written to. In either case, the notebook holder knows where they are. They can read the whole notebook in its natural order or they can skip around.

All of this applies to files as well. To open a file, we specify its name and indicate whether we want to read or write.

### Writing our first file
Let’s begin with a simple program that writes three lines of text into a file:

In [None]:
fname = os.path.join(pthnb,'test.txt')
myfile = open(fname, 'w')
myfile.write('My first file written from Python\n')
myfile.write('---------------------------------\n')
myfile.write('Hello, world!\n')
myfile.close()

Opening a file creates what we call a file **handle**. In this example, the variable myfile refers to the new handle object. Our program calls methods on the handle, and this makes changes to the actual file which is usually located on our disk.

On line 2, the `open` function takes two arguments. The first is the name of the file, and the second is the mode. Mode "w" means that we are opening the file for writing.

With mode "w", if there is no file named test.txt on the disk, it will be created. If there already is one, it will be replaced by the file we are writing.

To put data in the file we invoke the `write` method on the handle, shown in lines 3, 4 and 5 above. In bigger programs, lines 3–5 will usually be replaced by a loop that writes many more lines into the file.

Closing the file handle (line 6) tells the system that we are done writing and makes the disk file available for reading by other programs (or by our own program).
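
It is easy to forget the call to `close()`. The following is a minimal sketch of the same idea using Python's `with` statement, which closes the handle automatically when the indented block ends. The temporary directory and the file name `with_example.txt` are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical location used only for this sketch
tmpdir = tempfile.mkdtemp()
example_name = os.path.join(tmpdir, 'with_example.txt')

# The handle is closed automatically when the with block ends
with open(example_name, 'w') as myfile:
    myfile.write('My first file written from Python\n')
    myfile.write('Hello, world!\n')

print(myfile.closed)

# Opening the same file again with mode 'w' replaces its contents
with open(example_name, 'w') as myfile:
    myfile.write('Replaced\n')

with open(example_name, 'r') as myfile:
    content = myfile.read()
print(repr(content))
```

The second `open` with mode `'w'` shows the replacement behavior described above: only the last line written survives.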

### Reading a file line-at-a-time
Now that the file exists on our disk, we can open it, this time for reading, and read all the lines in the file, one at a time. This time, the mode argument is `'r'` for reading:

In [None]:
mynewhandle = open(fname, 'r')
while True:                            # Keep reading forever
    theline = mynewhandle.readline()   # Try to read next line
    if len(theline) == 0:              # If there are no more lines
        break                          #     leave the loop
    # Now process the line we've just read
    print(theline.rstrip())
mynewhandle.close()

In bigger programs, we’d squeeze more extensive logic into the body of the loop at the `print` statement. For example, if each line of the file contained the name and email address of one of our friends, perhaps we’d split the line into some pieces and call a function to send the friend a party invitation.

We strip the trailing newline character (`'\n'`) from the string `theline` using `.rstrip()`, which removes all trailing whitespace. Why?

This is because the string already has its own newline, and `print` would add a second one: the `readline` method in line 3 returns everything up to and including the newline character. This also explains the end-of-file detection logic: when there are no more lines to be read from the file, `readline` returns an empty string, one that does not even have a newline at the end, hence its length is 0.
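
The behavior of `readline` can be checked directly. This sketch uses `io.StringIO`, an in-memory text stream from the standard library, as a stand-in for a file on disk (an assumption made so that the example runs without any file). It also shows the more compact `for` loop over a handle, which yields the same lines.

```python
import io

text = 'first line\n\nthird line\n'

# Read with readline until the empty string signals end of file
handle = io.StringIO(text)
lines = []
while True:
    theline = handle.readline()
    if len(theline) == 0:
        break
    lines.append(theline)
print(lines)   # the blank line is '\n', which has length 1

# Iterating over a handle gives the same lines
same_lines = [line for line in io.StringIO(text)]
stripped = [line.rstrip() for line in same_lines]
print(stripped)
```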

See https://docs.python.org/2/library/string.html for more information on common string operations.

### What if we open a file that does not exist
If we try to open a file that does not exist, we get an error:

In [None]:
try:
    mynewhandle = open('wharrah.txt', 'r')
except Exception as e:
    traceback.print_exc()    
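
In Python 3 the error raised in this situation is `FileNotFoundError`. Below is a minimal sketch of catching it specifically and carrying on with a fallback value. The helper name `read_or_default` and the file name are hypothetical.

```python
def read_or_default(path, default=''):
    """Return the contents of path, or default if the file does not exist."""
    try:
        with open(path, 'r') as f:
            return f.read()
    except FileNotFoundError:
        return default

# Hypothetical file name that is not expected to exist
result = read_or_default('this_file_should_not_exist_12345.txt', default='no data')
print(result)
```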

### Class Activity 1

Write and read your own file. Use the completed code blocks above as a template. **Don't be ashamed to adapt code you got from someone else or on the internet to accomplish something useful.**

### File paths
Files on non-volatile storage media are organized by a set of rules known as a **file system**. File systems are made up of files and **directories**, which are containers for both files and other directories.

By default, when we create a new file by opening it, the file goes in the current directory (wherever we were when we ran the program). Similarly, when we open a file for reading, Python looks for it in the current directory. In the above example, we have used `os.path.join()` to write the file to a specific directory rather than the directory we are running this notebook from. If `'test.txt'` had been used in the `open` statement, the file would have been written to the current working directory. The current working directory can be determined using:

In [None]:
os.getcwd()

Determine the directory that `'test.txt'` (`fname`) was written to using `os.path.abspath()`:

In [None]:
os.path.abspath(fname)

On Windows, a full path could look like `'C:\\temp\\somefile.txt'`, while on MacOSX, Linux, and Unix systems the full path could be `'/home/jimmy/somefile.txt'`. Because backslashes are used to escape things like newlines and tabs, we need to write two backslashes in a literal string to get one! So the Windows string above contains 20 characters, not 22.

We cannot use `/` or `\` as part of a filename; they are reserved as a delimiter between directory and filenames.

`os.path` includes a number of useful functions for manipulating pathnames. For example, on Windows `os.path.normpath(path)` converts the forward slashes of a Unix-style path to backslashes.

Take a look at https://docs.python.org/2/library/os.path.html for more information on `os.path` functions.
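
A few of these functions are sketched below. To keep the output the same on every operating system, the sketch calls the standard-library modules `posixpath` (Unix rules) and `ntpath` (Windows rules) directly. In everyday code you would call the same functions through `os.path`, which follows the rules of the platform you are running on.

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # Unix path rules, importable on any platform

p = posixpath.join('data', 'fileio', 'test.txt')
print(p)                        # data/fileio/test.txt
print(posixpath.dirname(p))     # data/fileio
print(posixpath.basename(p))    # test.txt
print(posixpath.splitext(p))    # ('data/fileio/test', '.txt')

# normpath under Windows rules turns forward slashes into backslashes
w = ntpath.normpath('data/fileio/test.txt')
print(w)                        # data\fileio\test.txt
```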

### Turning a file into a list of lines
It is often useful to fetch data from a disk file and turn it into a list of lines. Suppose we have a file containing our friends and their email addresses, one per line in the file. But we’d like the lines sorted into alphabetical order. A good plan is to read everything into a list of lines, then sort the list, then write the sorted list back to another file, and then read the sorted file and print the data from the file:

In [None]:
fname = os.path.join(pthnb, 'friends.txt')
f = open(fname, 'r')
xs = f.readlines()
f.close()

xs.sort()

gname = os.path.join(pthnb, 'sortedfriends.txt')
g = open(gname, 'w')
for v in xs:
    g.write(v)
g.close()

mynewhandle = open(gname, 'r')
while True:                            # Keep reading forever
    theline = mynewhandle.readline()   # Try to read next line
    if len(theline) == 0:              # If there are no more lines
        break                          #     leave the loop
    # Now process the line we've just read
    print(theline.rstrip())
mynewhandle.close()

### Adding data from a file to a list of select data
It is also useful to fetch data from a disk file, extract the data from the lines read, and add selected data to a list. We will read the names of our friends from the sorted file we just created, and add the last name, the first name, and the email address to a list. We will then print the list:

In [None]:
mynewhandle = open(gname, 'r')
mylist = []
for theline in mynewhandle: 
    t = theline.strip().split(',')
    mylist.append([t[0].strip(), t[1].strip(), t[3].strip()])
mynewhandle.close()

for [ln, fn, e] in mylist:
    print('{0:20s}, {1:1s}.: {2}'.format(ln, fn[0], e))

### Class Activity 2

1. What does the `split()` method do?
2. What does `fn[0]` do? What would `fn[:2]` do? What is `mylist[2]`?

Use the empty code blocks below to answer these questions. **Remember it is OK to use the internet as a crutch.**

### String format statements

“Format specifications” are used within replacement fields contained within a format string to define how individual values are presented. They can also be passed directly to the built-in `format()` function. Each formattable type may define how the format specification is to be interpreted.

Most built-in types implement the following options for format specifications, although some of the formatting options are only supported by the numeric types.

A general convention is that an empty format string (`''`) produces the same result as if you had called `str()` on the value. A non-empty format string typically modifies the result.

**The general form of a standard format specifier is:**

* format_spec :   `[[fill]align][sign][#][0][width][,][.precision][type]`

* fill        :   `<any character> `

* align       :   `'<', '>', '=', '^'`

* sign        :   `'+', '-', ' '` 

* width       :   `integer`

* precision   :   `integer`

* type        :  `'b', 'c', 'd', 'e', 'E', 'f', 'F', 'g', 'G', 'n', 'o', 's', 'x', 'X', '%'`

If a valid *align* value is specified, it can be preceded by a *fill* character that can be any character and defaults to a space if omitted. Note that it is not possible to use { and } as *fill* `char` while using the `str.format()` method; this limitation however doesn’t affect the `format()` function.

**The meaning of the various alignment options is as follows:**

**Option**

* `'<'`	: Forces the field to be left-aligned within the available space (this is the default for most objects).
* `'>'`	: Forces the field to be right-aligned within the available space (this is the default for numbers).
* `'='`	: Forces the padding to be placed after the sign (if any) but before the digits. This is used for printing fields in the form ‘+000000120’. This alignment option is only valid for numeric types.
* `'^'`	: Forces the field to be centered within the available space.

Note that unless a minimum field width is defined, the field width will always be the same size as the data to fill it, so that the alignment option has no meaning in this case.
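
The `'='` option and the note about field width can be illustrated with a short sketch. The values are arbitrary and chosen only for the illustration.

```python
padded = '{:0=+10d}'.format(120)    # zeros go between the sign and the digits
spaced = '{:=+10d}'.format(-120)    # the default fill character is a space
no_width = '{:>}'.format('abc')     # no width given, so alignment has no effect
builtin = format(120, '0=+10d')     # the same specification passed to format()

print(padded)
print(repr(spaced))
print(repr(no_width))
print(builtin)
```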

The sign option is only valid for number types, and can be one of the following:

**Option**

* `'+'`	: indicates that a sign should be used for both positive as well as negative numbers.
* `'-'`	: indicates that a sign should be used only for negative numbers (this is the default behavior).
* `space`	: indicates that a leading space should be used on positive numbers, and a minus sign on negative numbers.

The `','` option signals the use of a comma for a thousands separator.


Accessing arguments by position:

In [None]:
print('{0}, {1}, {2}'.format('a', 'b', 'c'))
print('{}, {}, {}'.format('a', 'b', 'c'))  # 2.7+ only
print('{2}, {1}, {0}'.format('a', 'b', 'c'))
print('{2}, {1}, {0}'.format(*'abc'))      # unpacking argument sequence
print('{0}{1}{0}'.format('abra', 'cad'))   # arguments' indices can be repeated

Accessing arguments by name:

In [None]:
print('Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W'))
coord = {'latitude': '37.24N', 'longitude': '-115.81W'}
print('Coordinates: {latitude}, {longitude}'.format(**coord))

Accessing arguments’ items:

In [None]:
coord = (3, 5)
print('X: {0[0]};  Y: {0[1]}'.format(coord))

Aligning the text and specifying a width:

In [None]:
print('{:<30}'.format('left aligned'))
print('{:>30}'.format('right aligned'))
print('{:^30}'.format('centered'))
print('{:*^30}'.format('centered'))  # use '*' as a fill char

Replacing %+f, %-f, and % f and specifying a sign:

In [None]:
print('{:+f}; {:+f}'.format(3.14, -3.14))  # show it always
print('{: f}; {: f}'.format(3.14, -3.14))  # show a space for positive numbers
print('{:-f}; {:-f}'.format(3.14, -3.14))  # show only the minus -- same as '{:f}; {:f}'

Using the comma as a thousands separator:

In [None]:
print('{:,}'.format(1234567890))

Expressing a percentage:

In [None]:
points = 19.5
total = 22
print('Correct answers: {:.2%}'.format(points/total))

Using type-specific formatting:

In [None]:
import datetime
d = datetime.datetime(2010, 7, 4, 12, 15, 58)
print('{:%Y-%m-%d %H:%M:%S}'.format(d))

These string formatting examples are from https://docs.python.org/2/library/string.html#formatstrings.

### Class Activity 3
Use the empty code blocks below to write the following:

1. print `'MODFLOW', 1., 1999`. Enter them in this order in the format method but print them in reverse order.
2. print `'MODFLOW', 'SUTRA'` left and right justified in 25 character strings.
3. print `'MODFLOW'` three times and centered in a 10 character string but only include it once in the format method.

#### Reading and interpreting fixed format data from a string

The following example shows how python can be used to parse a fixed format string with touching numbers. This can be difficult to do in other programming languages.

In [None]:
d = '01.1102.2203.3304.4405.5506.6607.7708.8809.9910.1011.1112.1213.1314.1415.1516.1617.1718.1819.1920.20'
rawdata = []
width = 5
istart, istop = 0, width
for idx in range(0, len(d), width):
    rawdata.append(d[istart:istop])
    istart = istop
    istop += width
fd = np.empty(len(rawdata), dtype=float)  # np.float was removed from recent NumPy releases
for idx, raw in enumerate(rawdata):
    fd[idx] = float(raw)
print(fd)
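
The same slicing can be written more compactly with a list comprehension. This is a sketch of an alternative, not a replacement for the loop above; the shortened sample string is an assumption made to keep the output small.

```python
import numpy as np

d = '01.1102.2203.3304.44'   # four touching 5-character fields
width = 5

# Slice the string every `width` characters, then convert each piece
fields = [d[i:i + width] for i in range(0, len(d), width)]
fd = np.array([float(s) for s in fields])

print(fields)
print(fd)
```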

### Reading the whole file at once
Another way of working with text files is to read the complete contents of the file into a string, and then to use our string-processing skills to work with the contents.

We’d normally use this method of processing files if we were not interested in the line structure of the file. Prior to the `split()`, we replace all commas (`','`) with an empty string (`''`) and double spaces (`'  '`) with a single space (`' '`). Finally, we replace the line termination string (`'\n'`) with a comma and a space (`', '`). So here is how we might count the number of words in a file:

In [None]:
f = open(gname)
content = f.read()
f.close()

# remove commas and double spaces from line
content = content.replace(',','')
content = content.replace('  ',' ')
# replace line ending with ", "
content = content.replace('\n',', ')

words = content.split()
print('There are {0} words in the file.'.format(len(words)))

print('{0}'.format(content))

Notice here that we left out the `'r'` mode in the `open()` statement. By default, if we don’t supply the mode, Python opens the file for reading.

### Working with binary files
Files that hold photographs, videos, zip files, executable programs, etc. are called binary files. Binary files are not organized into lines, and cannot be opened with a normal text editor. MODFLOW and MT3DMS commonly write output data as binary files. So being able to use python to work with binary files can be a useful skill.

We will use the numpy library to create a random data array (100 rows x 50 columns), write the binary data, and read the binary data back in. We will learn more about the numpy library in subsequent exercises.

In [None]:
# create the random array
arr = np.random.random(size=(100, 50))
# create a file name
fname = os.path.join(pthnb, 'random.dat')

# open the file for writing
f = open(fname, 'wb')
# save arr to a binary file
arr.tofile(f)
# close the file
f.close()


# open the file for reading
f = open(fname, 'rb')
# read the data back in
arr2 = np.fromfile(f)
# reshape for comparison
arr2 = arr2.reshape((100, 50))
print(arr2.shape)
# close the file
f.close()

In the open statement we added a **"b"** to the mode to tell Python that the files are binary rather than text files. The `np.fromfile()` function returns a one-dimensional vector, which must be reshaped in order to compare it with the original two-dimensional array.

The `np.ndarray.tofile()` method and the `np.fromfile()` function can also handle the file opening and closing. A simpler version of the code block above is:

```python
# create the random array
arr = np.random.random(size=(100, 50))
# create a file name
fname = os.path.join(pthnb, 'random.dat')

# save arr to a binary file
arr.tofile(fname)

# read the data back in and restore the original shape
arr2 = np.fromfile(fname).reshape(arr.shape)
```
You can compare `arr` and `arr2` to confirm that the contents written to and read from `random.dat` are the same by entering the following in the blank code block below.

```python
np.array_equal(arr, arr2)
```
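
A file written with `tofile()` holds only the raw values. The shape and data type are not stored, which is why a reshape is needed after `np.fromfile()`. As an aside, the sketch below contrasts this with NumPy's own `.npy` format, written and read with `np.save` and `np.load`, which records both. The temporary directory is an assumption made only for the illustration.

```python
import os
import tempfile
import numpy as np

arr = np.random.random(size=(100, 50))
tmpdir = tempfile.mkdtemp()   # hypothetical scratch location

# Raw binary: values only, read back as a flat vector
raw_name = os.path.join(tmpdir, 'random.dat')
arr.tofile(raw_name)
raw = np.fromfile(raw_name)   # dtype defaults to float
print(raw.shape)

# .npy format: shape and data type are stored with the values
npy_name = os.path.join(tmpdir, 'random.npy')
np.save(npy_name, arr)
arr3 = np.load(npy_name)
print(arr3.shape)
print(np.array_equal(arr, arr3))
```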

For working with MODFLOW-based model results, flopy includes methods for reading data from binary files. We will learn more about flopy post-processing functionality in subsequent exercises.

### An example
Many useful line-processing programs will read a text file line-at-a-time and do some minor processing as they write the lines to an output file. They might number the lines in the output file, or insert extra blank lines after every 60 lines to make it convenient for printing on sheets of paper, or extract some specific columns only from each line in the source file, or only print lines that contain a specific substring. We call this kind of program a filter.

Here is a filter that copies one file to another, omitting any lines that begin with #:

In [None]:
def filter(oldfile, newfile):
    infile = open(oldfile, 'r')
    outfile = open(newfile, 'w')
    while True:
        text = infile.readline()
        if len(text) == 0:
            break
        if text[0] == '#':
            continue

        # Put any more processing logic here
        outfile.write(text)

    infile.close()
    outfile.close()

The `continue` statement skips over the remaining lines in the current iteration of the loop, but the loop will still iterate. This style looks a bit contrived here, but it is often useful to say *“get the lines we’re not concerned with out of the way early, so that we have cleaner more focused logic in the meaty part of the loop that might be written around line 11.”*

Thus, if `text` is the empty string, the loop exits. If the first character of `text` is a hash mark, the flow of execution goes to the top of the loop, ready to start processing the next line. Only if both conditions fail do we fall through to do the processing at line 11, in this example, writing the line into the new file.

Let’s consider one more case: suppose our original file contained empty lines. At the  `if len(text) == 0` line, would this program find the first empty line in the file, and terminate immediately? No! Recall that `readline` always includes the newline character in the string it returns. It is only when we try to read beyond the end of the file that we get back the empty string of length 0.
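
The same filter can be sketched with a `with` statement and a `for` loop over the input handle, which removes the explicit end-of-file test. The function name `filter_comments`, the file names, and the sample text are hypothetical and used only to show that blank lines are copied through.

```python
import os
import tempfile

def filter_comments(oldfile, newfile):
    """Copy oldfile to newfile, skipping lines that begin with '#'."""
    with open(oldfile, 'r') as infile, open(newfile, 'w') as outfile:
        for text in infile:
            if text.startswith('#'):
                continue
            # Put any more processing logic here
            outfile.write(text)

# Build a small input file that contains comments and a blank line
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'with_comments.txt')
dst = os.path.join(tmpdir, 'without_comments.txt')
with open(src, 'w') as f:
    f.write('# a comment\nkeep me\n\n# another comment\nkeep me too\n')

filter_comments(src, dst)
with open(dst, 'r') as f:
    result = f.read()
print(repr(result))
```

The blank line survives because it is read as `'\n'`, which is neither empty nor a comment.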

### Class Activity 4
Let's use the `filter` function to remove the comment lines from `'FileWithComments.txt'` and create `'FileWithOutComments.txt'`. Use one of the approaches discussed above to open, read, and print data in both files after calling the `filter` function. Remember to use `os.path.join` to access the file in the `pthnb` directory.

An example of how to do this is:

    fname = os.path.join(pthnb, 'FileWithComments.txt')
    gname = os.path.join(pthnb, 'FileWithOutComments.txt')
    filter(fname,gname)

    names = [fname, gname]
    for name in names:
        print('processing...{0}'.format(os.path.basename(name)))
        mynewhandle = open(name, 'r')
        for theline in mynewhandle:
            print(theline.rstrip())
        mynewhandle.close()

Use the blank code block below to complete this activity.

### Retrieving data from the web
You can access the web to copy content from a web URL file into memory or to a local file.

First we will install a new python package from the web (more on this later). To do this open a command line and type

```
pip install hydrofunctions
```

This Python package is a simple wrapper around the JSON API of the USGS National Water Information System (NWIS).

Next we will import the hydrofunctions package and pandas so we can retrieve NWIS data and plot it. We will be using pandas since it provides an easy way to plot the data with matplotlib. We will learn more about these two Python libraries in subsequent exercises.

In [None]:
import hydrofunctions as hf
import pandas as pd

Next we will pull discharge data for the Mississippi River at St. Louis, MO (site number 07010000) for water year 2016 and then plot the data.

In [None]:
site = '07010000'
start = '2015-10-01'
end = '2016-09-30'
stl = hf.get_nwis(site, 'dv', start, end)
stl.ok

In [None]:
# get the NWIS discharge data 
Q = hf.extract_nwis_df(stl.json())

# plot the discharge data
Q.plot()

We can get stage data for the same gage by calling `hf.get_nwis` again and passing `'00065'` as the `parameterCd` URL parameter.

In [None]:
stl = hf.get_nwis(site, 'dv', start, end, parameterCd='00065')
stl.ok

In [None]:
# get the NWIS stage data 
h = hf.extract_nwis_df(stl.json())

# plot the stage data
h.plot()

### Class Activity 5 - *get data for another gage*
In the blank code block below retrieve discharge and/or stage from the **CLEAR CREEK NEAR LAWSON, CO (USGS 06716500)** gage (or another gage of your choosing) and plot it up.

Save the discharge or stage data for the gage you extracted to a csv file. One approach for doing this is:

```python
fname = os.path.join(pthnb, 'ClearCreekNearLawsonStage.csv')
f = open(fname, 'w')
for idx in range(h['value'].shape[0]):
    f.write('{},{}\n'.format(h.index[idx], h['value'].iloc[idx]))
f.close()
```
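
Because the data are held in a pandas DataFrame, another approach is the DataFrame's own `to_csv` method. The sketch below uses a small made-up DataFrame (`h_demo`, with invented values and a hypothetical column name `value`) in place of the retrieved data, and writes to a temporary directory, so that it runs without a web request.

```python
import os
import tempfile
import pandas as pd

# Made-up values standing in for a retrieved stage series
idx = pd.date_range('2016-01-01', periods=3, freq='D')
h_demo = pd.DataFrame({'value': [4.2, 4.5, 4.1]}, index=idx)

# Write the index and the data column to a csv file
csv_name = os.path.join(tempfile.mkdtemp(), 'demo_stage.csv')
h_demo.to_csv(csv_name, index_label='datetime')

# Read the file back to confirm its contents
check = pd.read_csv(csv_name, index_col='datetime', parse_dates=True)
print(check)
```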

Information on how to specify different lengths of time to extract data using the `period` string can be found in the USGS time period documentation at <http://waterservices.usgs.gov/rest/IV-Service.html#Specifying>. The `period` parameter only accepts "period" values as explained in that documentation. A time range can be specified using `startDT` and `endDT` items in the `url_params` argument. See <http://waterservices.usgs.gov/rest/IV-Service.html> for a full list of parameters that can be specified in the `url_params` argument.

Modify your script to extract 5 years of data using either the `period` or `url_params` argument.