# Introduction to python for hydrologists: sys, os, shutil, and subprocess
These four modules are part of the standard python library and provide very useful functionality for working with your operating system and files.  This notebook explores these modules and demonstrates some of their functionality.  Online documentation is at [sys](https://docs.python.org/2/library/sys.html "sys doc"), [os](https://docs.python.org/2/library/os.html "os doc"), [shutil](https://docs.python.org/2/library/shutil.html "shutil doc"), and [subprocess](https://docs.python.org/2/library/subprocess.html "subprocess doc").

Important things to cover:
* sys: path
* os: path, chdir, getcwd, listdir
* shutil: copy, copytree, rmtree
* subprocess: check_call, check_output

## sys Module

System-specific parameters and functions.

The following cells simply print some of the sys methods and attributes that you might find useful.

In [None]:
import sys
import os
import shutil
import subprocess
import traceback
import zipfile

In [None]:
print('sys.argv: ', sys.argv)

In [None]:
print('sys.byteorder: ', sys.byteorder)

In [None]:
print('sys.copyright: ', sys.copyright)

In [None]:
print('sys.float_info: ', sys.float_info)

In [None]:
# sys.getsizeof reports the memory used by the python object itself, including interpreter overhead
print('The integer object 1 uses ', sys.getsizeof(1), ' bytes.')
print('The float object 1.0 uses ', sys.getsizeof(1.0), ' bytes.')
print('The string "Goldschlager" uses ', sys.getsizeof('Goldschlager'), ' bytes.')

In [None]:
# sys.getwindowsversion() only exists on Windows
try:
    print(sys.getwindowsversion())
except AttributeError:
    print('Why are you against windows?')

In [None]:
print(sys.prefix)

In [None]:
print(sys.version_info)

In [None]:
sys.platform

## sys.path

If you haven't seen `sys.path` mentioned in a python script already, you will soon.  `sys.path` is a list of directories that python searches, in order, when you import modules and packages.  If, for some reason, you want to use a python package that is not installed in the main python folder, you can add the directory containing your module to `sys.path`.

In [None]:
print(sys.path)

# Or more elegantly
for pth in sys.path:
    print(pth)

A common way that we add a folder to sys.path is as follows:

    pathtomymodule = os.path.join('..')
    if pathtomymodule not in sys.path:
        sys.path.append(pathtomymodule)

This will allow us to import any modules or packages that are up one directory from the current working directory.  Keep this in mind as we use this throughout the class exercises.
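To see the mechanism in isolation, here is a minimal sketch that writes a throwaway module into a temporary folder, appends that folder to `sys.path`, and then imports it. The module name `my_hydro_helpers` and its `greet()` function are hypothetical, created only for this demonstration.

```python
import importlib
import os
import sys
import tempfile

# Create a temporary folder that python does not search by default
tmpdir = tempfile.mkdtemp()

# Write a tiny, hypothetical module into that folder
with open(os.path.join(tmpdir, 'my_hydro_helpers.py'), 'w') as f:
    f.write("def greet():\n    return 'hello from my_hydro_helpers'\n")

# Before this step, "import my_hydro_helpers" would fail with an ImportError
if tmpdir not in sys.path:
    sys.path.append(tmpdir)

# Clear cached directory listings so the newly written file is noticed
importlib.invalidate_caches()
import my_hydro_helpers

print(my_hydro_helpers.greet())
```

The `if ... not in sys.path` guard mirrors the pattern above: re-running the cell does not keep appending duplicate entries.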

## os Module
Module for providing portable operating system functionality.

In [None]:
print('os.name: ', os.name)

In [None]:
# environment variables are stored in a dictionary-like mapping
print('os.environ: ', os.environ)
print('\n')

#or we can look at them in a nicer format
for k, v in os.environ.items():
    print('{0} : {1}'.format(k, v))

In [None]:
cwd = os.getcwd()
print(cwd)

In [None]:
#list all the entries in the specified directory. 
mylistofitems = os.listdir(os.getcwd())
for thingy in mylistofitems:
    if os.path.isdir(thingy):
        print('directory: ', thingy)
    else:
        print('file: ', thingy)
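One caveat about the cell above: `os.listdir()` returns bare names, not full paths, so `os.path.isdir(thingy)` only gives the right answer because we listed the current working directory. When listing some other folder, a minimal sketch is to join each name back onto its parent before testing it. The helper name `list_entries` is hypothetical, and `'..'` is just an example directory.

```python
import os

def list_entries(parent):
    """Return (kind, name) pairs for every entry in `parent`, sorted by name."""
    entries = []
    for name in sorted(os.listdir(parent)):
        # os.listdir gives bare names, so rebuild the full path before testing it
        full_path = os.path.join(parent, name)
        kind = 'directory' if os.path.isdir(full_path) else 'file'
        entries.append((kind, name))
    return entries

# Example: look one level up from the current working directory
for kind, name in list_entries('..'):
    print('{}: {}'.format(kind, name))
```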

In [None]:
# Example of changing the working directory
old_wd = os.getcwd()

# Go up one directory
os.chdir('..')
cwd = os.getcwd()
print('Now in: ', cwd)

# Change back to original
os.chdir(old_wd)
cwd = os.getcwd()
print('Switched back to: ', cwd)
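If an error is raised between the two `os.chdir()` calls above, the notebook is left sitting in the wrong directory. A common safeguard is a small context manager that always changes back, even when an exception occurs. The name `working_directory` is our own hypothetical helper, not part of `os`; Python 3.11 and later also ship a similar `contextlib.chdir`.

```python
import contextlib
import os

@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory to `path`."""
    old_wd = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        # runs on normal exit *and* if the body raises an exception
        os.chdir(old_wd)

with working_directory('..'):
    print('Now in: ', os.getcwd())
print('Switched back to: ', os.getcwd())
```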

## Glob
The glob module provides handy shorthand for listing files using patterns and wildcard (`*`) characters.

https://en.wikipedia.org/wiki/Glob_(programming)

**Note!** The order of the files returned by `glob` is arbitrary and can differ between platforms and file systems. In general, if your code depends on a specific ordering of a list, it is best to sort it explicitly yourself using `sorted()` or `.sort()`, instead of depending on the behavior of an imported module.  
https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/

In [None]:
import glob

In [None]:
# list all of the Jupyter notebooks in the current working directory
glob.glob('*.ipynb')

In [None]:
sorted(glob.glob('*.ipynb'))
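`glob` can also search subfolders: with `recursive=True` (Python 3.5+), the pattern `**` matches zero or more nested directories. A minimal sketch, using a throwaway directory tree with made-up file names, that finds every `.dat` file at any depth and sorts the result explicitly:

```python
import glob
import os
import tempfile

# Build a small, hypothetical directory tree to search
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'site_b', 'raw'))
for rel in ['site_a.dat', os.path.join('site_b', 'raw', 'flow.dat'), 'notes.txt']:
    open(os.path.join(root, rel), 'w').close()

# '**' with recursive=True descends into every subfolder (and the top folder itself);
# sorted() gives the same order on every platform
pattern = os.path.join(root, '**', '*.dat')
dat_files = sorted(glob.glob(pattern, recursive=True))
for f in dat_files:
    print(os.path.relpath(f, root))
```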

## os.path

os.path is a very widely used submodule of os.  In fact, we use it in almost all of the class notebooks and scripts to deal with file system paths.  Some common os.path functions are:

    os.path.join()
    os.path.abspath()
    os.path.exists()
    os.path.isdir()
    os.path.normpath()
    os.path.split()
    os.path.splitext()
    
A common attribute of os.path is:

    os.path.sep
    
Experiment with these functions to gain a better understanding of what they do.
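Here is a quick tour to start from. The path `data/model/freyberg.nam` is made up and need not exist; `os.path.exists()` simply returns `False` if it does not.

```python
import os

# A made-up path to a MODFLOW name file
fname = os.path.join('data', 'model', 'freyberg.nam')
print('joined:    ', fname)                      # separators follow os.path.sep
print('split:     ', os.path.split(fname))       # (head, tail)
print('splitext:  ', os.path.splitext(fname))    # (root, extension)
print('normpath:  ', os.path.normpath(os.path.join('data', '..', 'data', 'model')))
print('abspath:   ', os.path.abspath(fname))     # resolved against the current working directory
print('exists:    ', os.path.exists(fname))
print('isdir:     ', os.path.isdir('data'))
print('separator: ', os.path.sep)
```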

## os.walk

os.walk() is a great way to recursively generate all the file names and folders in a directory tree.  The following shows how it can be used to report how many megabytes of files sit directly in each directory, which helps identify large directories.

In [None]:
pth = os.path.join('..')
for root, dirs, files in os.walk(pth):
    mbytes = sum(os.path.getsize(os.path.join(root, name)) for name in files) / 1.e6
    print('{:<50} --> {:10.2f} megabytes.'.format(root, mbytes))
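To get the total size of everything beneath each directory (subfolders included), one option is to walk the tree bottom-up with `topdown=False`, so each subfolder's total is already known when its parent is visited. A minimal sketch; the helper name `cumulative_sizes` is hypothetical, and it is pointed at `'.'` here (swap in `'..'` to look one level up).

```python
import os

def cumulative_sizes(top):
    """Return {directory: total bytes of all regular files beneath it}."""
    totals = {}
    # topdown=False yields subdirectories before their parents
    for root, dirs, files in os.walk(top, topdown=False):
        # skip broken links and anything else that is not a regular file
        size = sum(os.path.getsize(os.path.join(root, name))
                   for name in files
                   if os.path.isfile(os.path.join(root, name)))
        # add the totals already computed for each immediate subdirectory
        size += sum(totals.get(os.path.join(root, d), 0) for d in dirs)
        totals[root] = size
    return totals

totals = cumulative_sizes('.')
# print the five largest directories
for root, nbytes in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:5]:
    print('{:<50} --> {:10.2f} megabytes.'.format(root, nbytes / 1.e6))
```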

## shutil Module
shutil is a high-level file management module for copying, moving, and deleting files and directories.

The functions from shutil that you may find useful are:

    shutil.copy()
    shutil.copytree()
    shutil.move()
    shutil.rmtree()  #obviously, you need to be careful with this one!
    
Give these guys a shot and see what they do.  Remember, you can always get help by typing:

    help(shutil.copy)

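If you would rather not experiment on real files, here is a minimal sketch that exercises all four functions inside a throwaway temporary directory, so `rmtree()` can only delete what the sketch itself created. The file and folder names are made up.

```python
import os
import shutil
import tempfile

sandbox = tempfile.mkdtemp()          # a fresh, empty scratch folder
src_dir = os.path.join(sandbox, 'model')
os.makedirs(src_dir)
with open(os.path.join(src_dir, 'heads.txt'), 'w') as f:
    f.write('10.0 9.5 9.1\n')

# copy a single file (the destination may be a file name or a folder)
shutil.copy(os.path.join(src_dir, 'heads.txt'), os.path.join(sandbox, 'heads_copy.txt'))

# copy a whole folder tree; by default the destination must not already exist
shutil.copytree(src_dir, os.path.join(sandbox, 'model_backup'))

# move (or rename) a file or folder
shutil.move(os.path.join(sandbox, 'heads_copy.txt'),
            os.path.join(sandbox, 'model_backup', 'heads_old.txt'))

top_level = sorted(os.listdir(sandbox))
backup = sorted(os.listdir(os.path.join(sandbox, 'model_backup')))
print(top_level)
print(backup)

# delete the scratch folder and everything in it
shutil.rmtree(sandbox)
print('sandbox still exists?', os.path.exists(sandbox))
```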

In [None]:
#try them here.  Be careful!


## subprocess Module

The subprocess module offers a way to execute system commands.  This is how we will run MODFLOW, for example, but you can also use it to run any operating system command that you can type at the command line.

subprocess.Popen is the underlying class that actually runs system commands; however, for most tasks it is simpler to use the convenience functions subprocess.check_output and subprocess.check_call, both of which use Popen internally.  (In Python 3.5 and later, subprocess.run is the recommended general-purpose interface.)

Take a look at the following help descriptions for check_output, and check_call.

Note that on Windows, built-in shell commands such as `dir` are part of the command interpreter (cmd.exe) rather than separate programs, so you need to specify `shell=True` to run them.

In [None]:
help(subprocess.check_output)

In [None]:
help(subprocess.check_call)

In [None]:
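# if on mac/unix (a list of arguments needs no shell=True; with shell=True, the '-l' would be ignored)
output = subprocess.check_output(['ls', '-l'])
print(output.decode())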

In [None]:
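# if on Windows (dir is built into cmd.exe, so shell=True is required)
try:
    output = subprocess.check_output(['dir'], shell=True)
    # a bare expression inside a try block is not displayed, so print it
    print(output.decode(errors='replace'))
except (subprocess.CalledProcessError, OSError):
    print('Why are you against windows?')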

In [None]:
# What is going on here?
try:
    subprocess.check_call(['dir'], shell=True)
except Exception as e:
    traceback.print_exc()
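To compare the two functions without depending on any operating-system command, this sketch runs the current python interpreter (`sys.executable`) as the external program. `check_output()` returns what the program printed (as bytes), `check_call()` returns only the exit code, and both raise `subprocess.CalledProcessError` when the program exits with a nonzero code. The last call shows `subprocess.run()` (Python 3.5+) doing the same job.

```python
import subprocess
import sys

# check_output: capture what the program prints (bytes; decode to get a str)
out = subprocess.check_output([sys.executable, '-c', 'print("hello from a subprocess")'])
print(out.decode().strip())

# check_call: the output is not captured, only the return code comes back
rc = subprocess.check_call([sys.executable, '-c', 'import sys; sys.exit(0)'])
print('return code:', rc)

# a nonzero exit status raises CalledProcessError
failed_code = None
try:
    subprocess.check_call([sys.executable, '-c', 'import sys; sys.exit(3)'])
except subprocess.CalledProcessError as err:
    failed_code = err.returncode
    print('failed with return code', failed_code)

# subprocess.run captures stdout when asked; check=True raises on failure
result = subprocess.run([sys.executable, '-c', 'print(2 + 2)'],
                        stdout=subprocess.PIPE, universal_newlines=True, check=True)
print(result.stdout.strip())
```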

## Zipfiles

#### zip up one of the files in data/

In [None]:
with zipfile.ZipFile('junk.zip', 'w') as dest:
    dest.write('data/430429089230301.dat')

#### now extract it

In [None]:
with zipfile.ZipFile('junk.zip') as src:
    src.extract('data/430429089230301.dat', path='data/extracted_data')
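Notice that the name stored inside `junk.zip` includes the `data/` folder, so the extracted copy lands in `data/extracted_data/data/`. A minimal sketch, in a temporary folder with a made-up file `gage.dat`, showing how `namelist()` reveals the stored names and how the `arcname` argument of `write()` controls them:

```python
import os
import tempfile
import zipfile

work = tempfile.mkdtemp()
src = os.path.join(work, 'data', 'gage.dat')   # hypothetical data file
os.makedirs(os.path.dirname(src))
with open(src, 'w') as f:
    f.write('date,flow\n2020-01-01,12.3\n')

zip_path = os.path.join(work, 'junk.zip')
with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as dest:
    dest.write(src, arcname='gage.dat')        # store the file without any folder prefix

out_dir = os.path.join(work, 'extracted')
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()                 # names exactly as stored in the archive
    print(names)
    archive.extractall(path=out_dir)           # extract every member

print(os.listdir(out_dir))
```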

## Testing Your Skills with a truly awful example:

#### the problem:
Pretend that the file `data/fileio/netcdf_data.zip` contains some climate data that we downloaded. If you open `data/fileio/netcdf_data.zip`, you'll see that within a subfolder `zipped` are a bunch of additional subfolders, each for a different year. Within each subfolder is another zipfile. Within each of these zipfiles is yet another subfolder, inside of which is the actual data file we want (`prcp.nc`). 

#### the goal:
To extract all of these `prcp.nc` files into a single folder, after renaming them with their respective years (obtained from their enclosing folders or zip files). For example:  
```
prcp_1980.nc
prcp_1981.nc
...
```
This will allow us to open them together as a dataset in `xarray` (more on that later). Does this sound awful? I'm not making this up. This is the kind of structure you get when downloading tiles of climate data with the [Daymet Tile Selection Tool](https://daymet.ornl.gov/gridded/).

#### hint:
You might find these functions helpful:
```
glob.glob
os.path.isdir
os.makedirs
zipfile.ZipFile
os.path.split
os.path.splitext
os.path.join
shutil.move
os.rename
os.rmdir
```

In [None]:
# Write the code here!


## Bonus -- Determining the location of an executable

There are often times when you run an executable that is nested somewhere deep within your system path.  It is a good idea to know exactly where that executable is located.  This may one day save you from accidentally using an older version of an executable, such as MODFLOW.

In [None]:
# Define two functions to help determine 'which' program you are using
def is_exe(fpath):
    """
    Return True if fpath is an executable, otherwise return False
    """
    return os.path.isfile(fpath) and os.access(fpath, os.X_OK)

def which(program):
    """
    Locate the program and return its full path.  Return
    None if the program cannot be located.
    """
    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        # test for exe in current working directory
        if is_exe(program):
            return program
        # test for exe in path statement
        for path in os.environ["PATH"].split(os.pathsep):
            path = path.strip('"')
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return exe_file
    return None

In [None]:
which('MODFLOW-NWT_64.exe')
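Since Python 3.3, the standard library also provides `shutil.which()`, which does the same job and returns `None` when the program cannot be found; on Windows it additionally consults `PATHEXT`, so the `.exe` suffix can be omitted there. A minimal comparison; `MODFLOW-NWT_64.exe` will only be found if it is actually on your PATH.

```python
import shutil
import sys

# shutil.which returns the full path, or None if the program cannot be found
print(shutil.which('MODFLOW-NWT_64.exe'))
print(shutil.which('python'))

# a command that already contains a directory is checked directly
print(shutil.which(sys.executable))
```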