# Introduction to Python for hydrologists &mdash; sys, os, shutil, and subprocess
These four modules are part of the Python standard library and provide very useful functionality for working with your operating system and files.  This notebook explores these modules and demonstrates some of their functionality.  Online documentation is at [sys](https://docs.python.org/3/library/sys.html "sys doc"), [os](https://docs.python.org/3/library/os.html "os doc"), [shutil](https://docs.python.org/3/library/shutil.html "shutil doc"), and [subprocess](https://docs.python.org/3/library/subprocess.html "subprocess doc").

Important things to cover:
* sys: path
* os: path, chdir, getcwd, listdir
* shutil: copy, copytree, rmtree
* subprocess: check_call, check_output

## Sys Module

System-specific parameters and functions.

The following cells simply print some of the sys methods and attributes that you might find useful.

In [1]:
import sys
import os
import shutil
import subprocess
import traceback

In [2]:
print('sys.argv: ', sys.argv)

sys.argv:  ['/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7/site-packages/ipykernel_launcher.py', '-f', '/Users/aleaf/Library/Jupyter/runtime/kernel-5ab3e387-06c1-420f-a023-ff852f05dce6.json']


In [3]:
print('sys.byteorder: ', sys.byteorder)

sys.byteorder:  little


In [4]:
print('sys.copyright: ', sys.copyright)

sys.copyright:  Copyright (c) 2001-2019 Python Software Foundation.
All Rights Reserved.

Copyright (c) 2000 BeOpen.com.
All Rights Reserved.

Copyright (c) 1995-2001 Corporation for National Research Initiatives.
All Rights Reserved.

Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
All Rights Reserved.


In [5]:
print('sys.float_info: ', sys.float_info)

sys.float_info:  sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)


In [6]:
print('The size of an integer is ', sys.getsizeof(1), ' bytes.')
print('The size of a float is ', sys.getsizeof(1.0), ' bytes.')
print('The size of the string "Goldschlager" is ', sys.getsizeof('Goldschlager'), ' bytes.')

The size of an integer is  28  bytes.
The size of a float is  24  bytes.
The size of the string "Goldschlager" is  61  bytes.


In [7]:
try:
    print(sys.getwindowsversion())
except AttributeError:
    # sys.getwindowsversion() only exists on Windows
    print('Why are you against windows?')

Why are you against windows?


In [8]:
print(sys.prefix)

/Users/aleaf/anaconda3/envs/pyclass


In [9]:
print(sys.version_info)

sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)


In [10]:
sys.platform

'darwin'

## sys.path

If you haven't seen `sys.path` mentioned in a Python script already, you will soon.  `sys.path` is a list of directories that Python searches, in order, when you import a module or package.  If, for some reason, you want to use a Python package that is not installed in the main Python folder, you can add the directory containing your module to `sys.path`.

In [11]:
print(sys.path)

# Or more elegantly
for pth in sys.path:
    print(pth)

['/Users/aleaf/Documents/GitHub/python-usgs-training/notebooks/part1_python_intro', '/Users/aleaf/anaconda3/envs/pyclass/lib/python37.zip', '/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7', '/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7/lib-dynload', '', '/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7/site-packages', '/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7/site-packages/IPython/extensions', '/Users/aleaf/.ipython']
/Users/aleaf/Documents/GitHub/python-usgs-training/notebooks/part1_python_intro
/Users/aleaf/anaconda3/envs/pyclass/lib/python37.zip
/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7
/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7/lib-dynload

/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7/site-packages
/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7/site-packages/IPython/extensions
/Users/aleaf/.ipython


A common way that we add a folder to sys.path is as follows:

    pathtomymodule = os.path.join('..')
    if pathtomymodule not in sys.path:
        sys.path.append(pathtomymodule)

This will allow us to import any modules or packages that are up one directory from the current working directory.  Keep this pattern in mind, because we use it throughout the class exercises.
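
As a minimal sketch of the same idea, the cell below writes a tiny module into a temporary folder, adds that folder to `sys.path`, and then imports it.  The module name `my_hydro_tools` and the function `cfs_to_cms` are made up for illustration and are not part of the class materials.

```python
import os
import sys
import tempfile

# Create a throwaway folder holding a tiny module (hypothetical names)
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'my_hydro_tools.py'), 'w') as f:
    f.write('def cfs_to_cms(q):\n    return q * 0.0283168\n')

# Python can only import the module once its folder is on sys.path
if module_dir not in sys.path:
    sys.path.append(module_dir)

import my_hydro_tools
print(my_hydro_tools.cfs_to_cms(100.0))
```

Checking `if module_dir not in sys.path` first keeps the list from growing each time the cell is rerun.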

## os Module
Module for providing portable operating system functionality.

In [12]:
print('os.name: ', os.name)

os.name:  posix


In [13]:
#environment variables stored in a dictionary
print('os.environ: ', os.environ)
print('\n')

#or we can look at them in a nicer format
for k, v in os.environ.items():
    print('{0} : {1}'.format(k, v))

os.environ:  environ({'PROJ_LIB': '/Users/aleaf/anaconda3/envs/pyclass/share/proj', 'TERM_PROGRAM': 'Apple_Terminal', 'SSL_CERT_FILE': '/Users/aleaf/cert.pem', 'TERM': 'xterm-color', 'SHELL': '/bin/bash', 'TMPDIR': '/var/folders/4x/bmhyjcdn3mgfdvkk_jgz6bsr0028s1/T/', 'CONDA_SHLVL': '2', 'Apple_PubSub_Socket_Render': '/private/tmp/com.apple.launchd.wGF13df8Wr/Render', 'CONDA_PROMPT_MODIFIER': '(pyclass) ', 'TERM_PROGRAM_VERSION': '421.2', 'OLDPWD': '/Users/aleaf/Documents/GitHub/python-usgs-training/notebooks', 'TERM_SESSION_ID': '3E1BFE83-D194-424D-B25C-D24FF67F417C', 'USER': 'aleaf', 'CONDA_EXE': '/Users/aleaf/anaconda3/bin/conda', 'SSH_AUTH_SOCK': '/private/tmp/com.apple.launchd.H3l2ZVDS4W/Listeners', '_CE_CONDA': '', 'CONDA_PREFIX_1': '/Users/aleaf/anaconda3/envs/gis', 'CPL_ZIP_ENCODING': 'UTF-8', 'PATH': '/Users/aleaf/anaconda3/envs/pyclass/bin:/Users/aleaf/anaconda3/condabin:/Library/Frameworks/Python.framework/Versions/3.6/bin:/Users/aleaf/anaconda3/bin:/Users/aleaf/anaconda3/bin
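
Because `os.environ` behaves like a dictionary, individual variables can be looked up by name.  The sketch below assumes a made-up variable called `MODFLOW_BIN`, which is probably not set on your machine, to show how `.get()` avoids a `KeyError`.

```python
import os

# .get() returns a default instead of raising KeyError when a variable is not set.
# 'MODFLOW_BIN' is a hypothetical variable name used only for illustration.
modflow_dir = os.environ.get('MODFLOW_BIN', 'not set')
print('MODFLOW_BIN:', modflow_dir)

# PATH is one long string; os.pathsep (':' on mac/unix, ';' on Windows)
# separates the individual directories
path_dirs = os.environ.get('PATH', '').split(os.pathsep)
for d in path_dirs[:5]:
    print(d)
```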

In [14]:
cwd = os.getcwd()
print(cwd)

/Users/aleaf/Documents/GitHub/python-usgs-training/notebooks/part1_python_intro


In [15]:
#list all the entries in the specified directory. 
mylistofitems = os.listdir(os.getcwd())
for thingy in mylistofitems:
    if os.path.isdir(thingy):
        print('directory: ', thingy)
    else:
        print('file: ', thingy)

file:  09_sys-os.ipynb
file:  02_functions.ipynb
file:  TheisExercise.pdf
file:  06_numpy.ipynb
file:  .DS_Store
file:  08_namespace.ipynb
file:  Pandas_weather_timeseries_Wunderground.ipynb
directory:  images
file:  Pandas_NWIS.ipynb
file:  04_objects.ipynb
file:  Pandas_ColoradoRiver-FFT.ipynb
file:  05_files.ipynb
file:  TheisExercise.tex
file:  03_scripts.ipynb
file:  mtsthelens.pdf
directory:  .ipynb_checkpoints
file:  Matplotlib_StHelens.ipynb
file:  Pandas_ColoradoRiver.ipynb
directory:  data
file:  01_basics.ipynb


In [16]:
# Example of changing the working directory
old_wd = os.getcwd()

# Go up one directory
os.chdir('..')
cwd = os.getcwd()
print('Now in: ', cwd)

# Change back to original
os.chdir(old_wd)
cwd = os.getcwd()
print('Switched back to: ', cwd)

Now in:  /Users/aleaf/Documents/GitHub/python-usgs-training/notebooks
Switched back to:  /Users/aleaf/Documents/GitHub/python-usgs-training/notebooks/part1_python_intro


## Glob
The glob module provides handy shorthand for listing files using patterns and wildcard (*) characters.

https://en.wikipedia.org/wiki/Glob_(programming)

**Note!** The order of the files returned by `glob` is platform-dependent. In general, if your code depends on a specific ordering of a list, it is best to explicitly sort it yourself using `sorted()` or `.sort()`, instead of depending on the behavior of an imported module.  
https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/

In [17]:
import glob

In [18]:
# list all of the Jupyter notebooks in the current working directory
glob.glob('*.ipynb')

['09_sys-os.ipynb',
 '02_functions.ipynb',
 '06_numpy.ipynb',
 '08_namespace.ipynb',
 'Pandas_weather_timeseries_Wunderground.ipynb',
 'Pandas_NWIS.ipynb',
 '04_objects.ipynb',
 'Pandas_ColoradoRiver-FFT.ipynb',
 '05_files.ipynb',
 '03_scripts.ipynb',
 'Matplotlib_StHelens.ipynb',
 'Pandas_ColoradoRiver.ipynb',
 '01_basics.ipynb']

In [19]:
sorted(glob.glob('*.ipynb'))

['01_basics.ipynb',
 '02_functions.ipynb',
 '03_scripts.ipynb',
 '04_objects.ipynb',
 '05_files.ipynb',
 '06_numpy.ipynb',
 '08_namespace.ipynb',
 '09_sys-os.ipynb',
 'Matplotlib_StHelens.ipynb',
 'Pandas_ColoradoRiver-FFT.ipynb',
 'Pandas_ColoradoRiver.ipynb',
 'Pandas_NWIS.ipynb',
 'Pandas_weather_timeseries_Wunderground.ipynb']

## os.path

os.path is a very widely used submodule of os.  In fact we use it in almost all of the class notebooks and scripts to deal with file system paths.  Some common os.path functions are:

    os.path.join()
    os.path.abspath()
    os.path.exists()
    os.path.isdir()
    os.path.normpath()
    os.path.split()
    os.path.splitext()
    
A common attribute of os.path is:

    os.path.sep
    
Experiment with these functions to gain a better understanding of what they do.
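
As a starting point for experimenting, here is a short sketch that runs each of these functions on a path.  The folder `data` and the file `streamflow.csv` are hypothetical names; the file does not need to exist for most of these calls.

```python
import os

# Build a path in a platform-independent way (hypothetical names)
pth = os.path.join('..', 'data', 'streamflow.csv')
print(pth)

folder, fname = os.path.split(pth)     # folder part and file name part
base, ext = os.path.splitext(fname)    # file name without extension, and the extension
print(folder, fname, base, ext)

# normpath collapses redundant pieces such as 'data/..'
print(os.path.normpath(os.path.join('data', '..', 'images')))

# abspath builds a full path starting from the current working directory
print(os.path.abspath(pth))

print(os.path.exists(pth), os.path.isdir(os.getcwd()))
print(repr(os.path.sep))
```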

## os.walk

os.walk() is a great way to recursively generate all the file names and folders in a directory.  The following shows how it can be used to identify large directories.

In [20]:
pth = os.path.join('..')
for root, dirs, files in os.walk(pth):
    mbytes = sum(os.path.getsize(os.path.join(root, name)) for name in files) / 1.e6
    print('{:<50} --> {:10.2f} megabytes.'.format(root, mbytes))

..                                                 -->       0.01 megabytes.
../part1_python_intro                              -->       2.22 megabytes.
../part1_python_intro/images                       -->       0.01 megabytes.
../part1_python_intro/.ipynb_checkpoints           -->       0.72 megabytes.
../part1_python_intro/data                         -->       1.50 megabytes.
../part1_python_intro/data/04_numpy                -->       2.71 megabytes.
../part1_python_intro/data/fileio                  -->       0.00 megabytes.
../part1_python_intro/data/pandas                  -->       9.97 megabytes.
../part2_flopy                                     -->       0.81 megabytes.
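
Another common use of `os.walk()` is collecting every file of a given type below a folder.  The sketch below builds a small throwaway directory tree first so that it runs anywhere; the folder and file names are invented for illustration.

```python
import os
import shutil
import tempfile

# Build a small throwaway tree (hypothetical names)
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'model', 'output'))
relpaths = ['notes.txt',
            os.path.join('model', 'run.nam'),
            os.path.join('model', 'output', 'run.hds')]
for rel in relpaths:
    with open(os.path.join(top, rel), 'w') as f:
        f.write('placeholder')

# Walk the tree and keep every file that is not a .txt file
found = []
for root, dirs, files in os.walk(top):
    for name in files:
        if not name.endswith('.txt'):
            found.append(os.path.relpath(os.path.join(root, name), top))
print(sorted(found))

# Remove the throwaway tree
shutil.rmtree(top)
```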


## shutil Module
shutil is a high-level file management module for copying, moving, and deleting files and directories.

The functions from shutil that you may find useful are:

    shutil.copy()
    shutil.copytree()
    shutil.move()
    shutil.rmtree()  #obviously, you need to be careful with this one!
    
Give these guys a shot and see what they do.  Remember, you can always get help by typing:

    help(shutil.copy)


In [21]:
#try them here.  Be careful!
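
If you would like a safe place to start, the sketch below runs all four functions inside a temporary folder, so nothing important can be deleted by mistake.  The folder and file names (`original`, `backup`, `a.txt`, and so on) are hypothetical.

```python
import os
import shutil
import tempfile

# Work inside a temporary folder
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'original')
os.mkdir(src)
with open(os.path.join(src, 'a.txt'), 'w') as f:
    f.write('hello')

# Copy one file, copy a whole folder, then move (rename) a file
shutil.copy(os.path.join(src, 'a.txt'), os.path.join(src, 'b.txt'))
shutil.copytree(src, os.path.join(workdir, 'backup'))
shutil.move(os.path.join(src, 'b.txt'), os.path.join(workdir, 'moved.txt'))

top_items = sorted(os.listdir(workdir))
backup_items = sorted(os.listdir(os.path.join(workdir, 'backup')))
print(top_items)
print(backup_items)

# rmtree removes the folder and everything in it, with no undo
shutil.rmtree(workdir)
print(os.path.exists(workdir))
```

Note that `shutil.copytree()` raises an error by default if the destination folder already exists.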


## subprocess Module

The subprocess module offers a way to execute system commands.  This is how we will run MODFLOW, for example, but you can also use it to run any operating system command that you can type at the command line.

subprocess.Popen() is the primary underlying function for running system commands.  However, it is recommended that you use subprocess.check_output and subprocess.check_call instead.  Both of these functions use Popen.

Take a look at the following help descriptions for check_output, and check_call.

Note that on Windows you may often have to specify `shell=True` in order to access system commands.
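
Before looking at the help text, here is a minimal sketch of both functions that should run on any platform.  Instead of an operating system command such as `ls` or `dir`, it assumes the command to run is Python itself, located through `sys.executable`.

```python
import subprocess
import sys

# sys.executable is the full path to the running python interpreter
cmd = [sys.executable, '-c', 'print("hello from a subprocess")']

# check_output returns whatever the command wrote to standard output, as bytes
out = subprocess.check_output(cmd)
print(out.decode().strip())

# check_call returns 0 on success and raises CalledProcessError otherwise
retcode = subprocess.check_call(cmd)
print(retcode)
```

Passing the command as a list, with one item per argument, avoids the need for `shell=True` here.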

In [22]:
help(subprocess.check_output)

Help on function check_output in module subprocess:

check_output(*popenargs, timeout=None, **kwargs)
    Run command with arguments and return its output.
    
    If the exit code was non-zero it raises a CalledProcessError.  The
    CalledProcessError object will have the return code in the returncode
    attribute and output in the output attribute.
    
    The arguments are the same as for the Popen constructor.  Example:
    
    >>> check_output(["ls", "-l", "/dev/null"])
    b'crw-rw-rw- 1 root root 1, 3 Oct 18  2007 /dev/null\n'
    
    The stdout argument is not allowed as it is used internally.
    To capture standard error in the result, use stderr=STDOUT.
    
    >>> check_output(["/bin/sh", "-c",
    ...               "ls -l non_existent_file ; exit 0"],
    ...              stderr=STDOUT)
    b'ls: non_existent_file: No such file or directory\n'
    
    There is an additional optional argument, "input", allowing you to
    pass a string to the subprocess's stdin.  If

In [23]:
help(subprocess.check_call)

Help on function check_call in module subprocess:

check_call(*popenargs, **kwargs)
    Run command with arguments.  Wait for command to complete.  If
    the exit code was zero then return, otherwise raise
    CalledProcessError.  The CalledProcessError object will have the
    return code in the returncode attribute.
    
    The arguments are the same as for the call function.  Example:
    
    check_call(["ls", "-l"])



In [24]:
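# if on mac/unix
# Note: with shell=True and a list, only the first item ('ls') is run as the
# command on mac/unix; '-l' is handed to the shell itself, which is why the
# output below is a plain listing.  To get the long format, use ['ls', '-l']
# without shell=True, or the single string 'ls -l' with shell=True.
subprocess.check_output(['ls', '-l'], shell=True)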

b'01_basics.ipynb\n02_functions.ipynb\n03_scripts.ipynb\n04_objects.ipynb\n05_files.ipynb\n06_numpy.ipynb\n08_namespace.ipynb\n09_sys-os.ipynb\nMatplotlib_StHelens.ipynb\nPandas_ColoradoRiver-FFT.ipynb\nPandas_ColoradoRiver.ipynb\nPandas_NWIS.ipynb\nPandas_weather_timeseries_Wunderground.ipynb\nTheisExercise.pdf\nTheisExercise.tex\ndata\nimages\nmtsthelens.pdf\n'

In [25]:
# if on Windows
try:
    output = subprocess.check_output(['dir'], shell=True)
    print(output)
except subprocess.CalledProcessError:
    # 'dir' is not available on this mac, so the command fails
    print('Why are you against windows?')

Why are you against windows?


In [27]:
# What is going on here?
try:
    subprocess.check_call(['dir'], shell=True)
except Exception as e:
    traceback.print_exc()

Traceback (most recent call last):
  File "<ipython-input-27-aa18cf219b2e>", line 3, in <module>
    subprocess.check_call(['dir'], shell=True)
  File "/Users/aleaf/anaconda3/envs/pyclass/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['dir']' returned non-zero exit status 127.
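
One likely reading of the traceback above: on mac/unix shells, exit status 127 conventionally means "command not found", and `check_call` turns any non-zero exit status into a `CalledProcessError`.  The sketch below reproduces that behavior in a platform-independent way, assuming a small Python command that deliberately exits with status 3.

```python
import subprocess
import sys

# A command that deliberately exits with a non-zero status
cmd = [sys.executable, '-c', 'import sys; sys.exit(3)']
try:
    subprocess.check_call(cmd)
except subprocess.CalledProcessError as e:
    # The exception carries the exit status of the failed command
    failed_code = e.returncode
    print('Command failed with exit status', failed_code)
```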


## Testing Your Skills

1.  Create a new folder called mynumpyarrays
2.  Create 20 random numpy arrays of shape (100, 100)
3.  Save each array in a different file in the mynumpyarrays folder
4.  Make a copy of the mynumpyarrays folder and call it mynumpyarrays2
5.  Time permitting, save an image file of the arrays also.

In [25]:
# Write the code here!


## Bonus -- Determining the location of an executable

You will often run an executable that is nested somewhere deep within your system path.  It is a good idea to know exactly where that executable is located.  This might one day keep you from accidentally using an older version of an executable, such as MODFLOW.

In [26]:
# Define two functions to help determine 'which' program you are using
def is_exe(fpath):
    """
    Return True if fpath is an executable, otherwise return False
    """
    return os.path.isfile(fpath) and os.access(fpath, os.X_OK)

def which(program):
    """
    Locate the program and return its full path.  Return
    None if the program cannot be located.
    """
    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        # test for exe in current working directory
        if is_exe(program):
            return program
        # test for exe in path statement
        for path in os.environ["PATH"].split(os.pathsep):
            path = path.strip('"')
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return exe_file
    return None

In [27]:
which('MODFLOW-NWT_64.exe')
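
The standard library also includes `shutil.which()` (Python 3.3 and later), which performs a similar search.  The sketch below is an alternative to the functions above, not a replacement for them; to be sure something is found, it looks for the running Python interpreter in its own folder.

```python
import os
import shutil
import sys

# Look for the running python by name, searching only its own folder
exe_name = os.path.basename(sys.executable)
exe_dir = os.path.dirname(sys.executable)
found = shutil.which(exe_name, path=exe_dir)
print(found)

# With no path argument, shutil.which searches the PATH environment variable
# and returns None when nothing is found
print(shutil.which('MODFLOW-NWT_64.exe'))
```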

## TYS answer

In [28]:
#Create the mynumpyarrays directory
import numpy as np
print(os.getcwd())
dname = 'mynumpyarrays'

if os.path.exists(dname):
    print('Using shutil.rmtree to remove: ', dname)
    shutil.rmtree(dname)

if not os.path.isdir(dname):
    os.mkdir(dname)

# Create and write 20 arrays to the dname folder
for i in range(20):
    print('Creating array: ', i)
    a = np.random.random((100,100))
    fname = 'array' + str(i) + '.dat'
    fnamewithpath = os.path.join(dname, fname)
    np.savetxt(fnamewithpath, a)
    
# Now let's make a copy
# First delete the folder if it already exists
dname2 = 'mynumpyarrays2'
if os.path.exists(dname2):
    print('Using shutil.rmtree to remove: ', dname2)
    shutil.rmtree(dname2)

print('Using shutil.copytree to copy: ', dname, 'to ', dname2)
shutil.copytree(dname, dname2)

/Users/aleaf/Documents/GitHub/python-usgs-training/notebooks/part1_python_intro
Creating array:  0
Creating array:  1
Creating array:  2
Creating array:  3
Creating array:  4
Creating array:  5
Creating array:  6
Creating array:  7
Creating array:  8
Creating array:  9
Creating array:  10
Creating array:  11
Creating array:  12
Creating array:  13
Creating array:  14
Creating array:  15
Creating array:  16
Creating array:  17
Creating array:  18
Creating array:  19
Using shutil.copytree to copy:  mynumpyarrays to  mynumpyarrays2


'mynumpyarrays2'

In [29]:
# clean it up
dnamelist = [dname, dname2]
for dn in dnamelist:
    print('Attempting to delete: ', dn)
    if os.path.isdir(dn):
        print('Using shutil.rmtree to remove: ', dn)
        shutil.rmtree(dn)

Attempting to delete:  mynumpyarrays
Using shutil.rmtree to remove:  mynumpyarrays
Attempting to delete:  mynumpyarrays2
Using shutil.rmtree to remove:  mynumpyarrays2
