**`os`** and **`glob`** are two modules from Python's standard library that make it more convenient to manage files, directories and paths.

## Package "os"

In [1]:
import os

Assume we have a local project folder `project_OO` that looks like this:

***
* project_OO
    * data
        * input
            * raster
                * net_001.tif
                * net_011.tif
                * net_112.tif
                * vessel_001.tif
                * vessel_002.tif
                * vessel_003.tif
            * vector
                * basin.geojson
                * basin_prj.cpg
                * basin_prj.dbf
                * basin_prj.prj
                * basin_prj.shp
                * basin_prj.shx
    * script
***

It contains the input raster and vector data we need for the analysis, so first we ***set the working directory*** to this folder:

In [5]:
# Set / Change working directory
os.chdir("I:/project_OO")

We can also ***check the current working directory***, before or after changing it:

In [6]:
# Check current working directory
os.getcwd()

'I:\\project_OO'
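`os.chdir()` changes the working directory for the whole Python process, so in a longer script it can be safer to switch only temporarily and switch back afterwards. A minimal sketch, assuming a hypothetical helper named `working_directory` (Python 3.11+ also ships an equivalent `contextlib.chdir`):

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch to `path`, then restore the previous directory.

    `working_directory` is a hypothetical helper name used for illustration.
    """
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        # Restore the original directory even if the block raised an error
        os.chdir(previous)


# Usage with the project folder from above:
# with working_directory("I:/project_OO"):
#     print(os.getcwd())
```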

We usually prefer to keep input and output data in separate folders. With `os`, we don't have to type out the full path every time; instead, we can ***join paths*** from their components with `os.path.join("dirA", "dirB")`. For example:

In [9]:
# Join path
path_input = os.path.join(os.getcwd(), "data", "input")
path_output = os.path.join(os.getcwd(), "data", "output")
print(path_input, '\n', path_output)

I:\project_OO\data\input 
 I:\project_OO\data\output


Here, the current working directory `os.getcwd()` is joined with the folder names `data` and `input` to build the new path `path_input` (and likewise with `output` to build `path_output`).

***Note: `os.path.join()` uses the path separator of the operating system it runs on (`\` on Windows, `/` on Linux and macOS).***
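To see this behaviour without switching machines, the sketch below calls `ntpath` and `posixpath` directly. These are the Windows and POSIX implementations that `os.path` resolves to, and both can be imported on any OS. The last line shows a common pitfall: a component that is itself an absolute path discards everything joined before it.

```python
import ntpath     # Windows flavour of os.path
import posixpath  # Linux/macOS flavour of os.path

# Same components, different separators depending on the OS flavour
win_path = ntpath.join("project_OO", "data", "input")
posix_path = posixpath.join("project_OO", "data", "input")
print(win_path)    # project_OO\data\input
print(posix_path)  # project_OO/data/input

# Pitfall: an absolute component resets the path built so far
print(posixpath.join("/project_OO", "data", "/input"))  # /input
```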

To ***check if the path exists***, we do:

In [10]:
# Check if the path 'path_input' exists
os.path.exists(path_input)

True

In [11]:
# Check if the path 'path_output' exists
os.path.exists(path_output)

False

If it returns `False`, we can ***create the directory*** with `os.mkdir()`:

In [12]:
# Create new directory
os.mkdir(path_output)
# Check again if the 'path_output' exists
os.path.exists(path_output)

True
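`os.mkdir()` creates only a single directory level and raises `FileExistsError` if the directory already exists (or `FileNotFoundError` if its parent is missing). `os.makedirs()` with `exist_ok=True` creates any missing parent folders and silently accepts an existing directory, so the check-then-create steps above can collapse into one call; `os.path.isdir()` then confirms the result is a directory rather than a file. A sketch in a temporary folder, where the nested `output/raster` layout is a hypothetical example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as project:
    # Nested output folder; neither "output" nor "raster" exists yet
    path_output_raster = os.path.join(project, "data", "output", "raster")

    os.makedirs(path_output_raster, exist_ok=True)  # creates all missing levels
    os.makedirs(path_output_raster, exist_ok=True)  # second call is a no-op
    created = os.path.isdir(path_output_raster)

    try:
        os.mkdir(path_output_raster)  # plain mkdir refuses an existing directory
        mkdir_raised = False
    except FileExistsError:
        mkdir_raised = True

print(created, mkdir_raised)  # True True
```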

## Package "glob"

***glob*** finds all paths that match a Unix shell-style wildcard pattern (such as `*.tif`), which makes it a handy tool for filtering large datasets down to only the files of interest.

In [17]:
from glob import glob

### Find all files in a directory

In [34]:
# Find files in directory "path_input/raster"
glob(os.path.join(path_input, 'raster', '*'))

['I:\\project_OO\\data\\input\\raster\\vessel_003.tif',
 'I:\\project_OO\\data\\input\\raster\\net_001.tif',
 'I:\\project_OO\\data\\input\\raster\\net_011.tif',
 'I:\\project_OO\\data\\input\\raster\\net_112.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_001.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_002.tif']

In [35]:
# Find all files in directory "path_input/vector"
glob(os.path.join(path_input, 'vector', '*'))

['I:\\project_OO\\data\\input\\vector\\basin.geojson',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.dbf',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.cpg',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.prj',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.shp',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.shx']

***Tip: chain several `*` components to match the contents of every subdirectory one level down.***

In [36]:
# Find all files inside the subdirectories of "path_input"
glob(os.path.join(path_input, '*', '*'))

['I:\\project_OO\\data\\input\\raster\\vessel_003.tif',
 'I:\\project_OO\\data\\input\\raster\\net_001.tif',
 'I:\\project_OO\\data\\input\\raster\\net_011.tif',
 'I:\\project_OO\\data\\input\\raster\\net_112.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_001.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_002.tif',
 'I:\\project_OO\\data\\input\\vector\\basin.geojson',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.dbf',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.cpg',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.prj',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.shp',
 'I:\\project_OO\\data\\input\\vector\\basin_prj.shx']
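Chaining `*` only reaches a fixed depth. For folders nested arbitrarily deep, `glob` accepts `**` together with `recursive=True`, which matches zero or more directory levels. A sketch that rebuilds a small hypothetical copy of the `input` folder in a temporary directory (only a few of the file names above are reused):

```python
import os
import tempfile
from glob import escape, glob

with tempfile.TemporaryDirectory() as root:
    # Hypothetical miniature copy of the input folder
    for rel in [("raster", "net_001.tif"), ("raster", "vessel_001.tif"),
                ("vector", "basin.geojson")]:
        full = os.path.join(root, "input", *rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()  # create an empty placeholder file

    # '**' plus recursive=True matches zero or more directory levels;
    # escape() keeps any wildcard characters in the temp path literal
    pattern = os.path.join(escape(root), "**", "*.tif")
    tifs = sorted(os.path.basename(p) for p in glob(pattern, recursive=True))

print(tifs)  # ['net_001.tif', 'vessel_001.tif']
```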

### Find files based on a filter condition

In [37]:
# Find only .shp files
glob(os.path.join(path_input, 'vector', '*.shp'))

['I:\\project_OO\\data\\input\\vector\\basin_prj.shp']
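Python's `glob` has no `{a,b}` alternation, so a single pattern cannot ask for `.shp` *or* `.geojson`. One option is to run one pattern per extension and merge the results. A sketch, where `find_by_extensions` is a hypothetical helper name:

```python
import os
from glob import glob


def find_by_extensions(folder, extensions):
    """Return a sorted list of files in `folder` ending with any of `extensions`.

    `find_by_extensions` is a hypothetical helper used for illustration.
    """
    matches = []
    for ext in extensions:
        matches.extend(glob(os.path.join(folder, "*" + ext)))
    return sorted(matches)


# Usage with the vector folder from above:
# find_by_extensions(os.path.join(path_input, "vector"), [".shp", ".geojson"])
```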

In [38]:
# Find all raster files where the filename begins with 'vessel'
glob(os.path.join(path_input, 'raster', 'vessel*'))

['I:\\project_OO\\data\\input\\raster\\vessel_003.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_001.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_002.tif']

In [39]:
# Find all raster files with '2' somewhere in the filename
glob(os.path.join(path_input, 'raster', '*2*'))

['I:\\project_OO\\data\\input\\raster\\net_112.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_002.tif']

***Tip: filter using character ranges in square brackets, e.g. `[2-3]`.***

In [41]:
# Find all raster files with '2' or '3' anywhere in the filename
glob(os.path.join(path_input, 'raster', '*[2-3]*'))

['I:\\project_OO\\data\\input\\raster\\vessel_003.tif',
 'I:\\project_OO\\data\\input\\raster\\net_112.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_002.tif']
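`glob` matches names using the standard `fnmatch` module, so patterns can be tried on a plain list of names without touching the disk. The sketch below reuses the raster file names from above and also shows `[!...]`, which matches any single character *not* in the set:

```python
import fnmatch

# Raster file names from the project folder above
names = ["net_001.tif", "net_011.tif", "net_112.tif",
         "vessel_001.tif", "vessel_002.tif", "vessel_003.tif"]

# [2-3] matches a single character in the range 2..3
in_range = fnmatch.filter(names, "*[2-3]*")
print(in_range)  # ['net_112.tif', 'vessel_002.tif', 'vessel_003.tif']

# [!1] matches a single character that is NOT '1'
not_one = fnmatch.filter(names, "vessel_00[!1].tif")
print(not_one)   # ['vessel_002.tif', 'vessel_003.tif']
```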

***Tip: use `?` to match exactly one arbitrary character.***

In [43]:
# Find all raster files whose number has '1' in the tens place
glob(os.path.join(path_input, 'raster', '*_?1?*'))

['I:\\project_OO\\data\\input\\raster\\net_011.tif',
 'I:\\project_OO\\data\\input\\raster\\net_112.tif']
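The difference between `?` and `*` is that `?` consumes exactly one character, while `*` consumes any number of characters, including none. A quick check with `fnmatch.fnmatch`, which applies the same wildcard rules as `glob`:

```python
import fnmatch

name = "net_011.tif"

print(fnmatch.fnmatch(name, "net_??1.tif"))  # True: the two '?' match '0' and '1'
print(fnmatch.fnmatch(name, "net_?1.tif"))   # False: one '?' cannot cover '01'
print(fnmatch.fnmatch(name, "net_*1.tif"))   # True: '*' covers '01'
```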

### Sort glob List

The list returned by `glob` is in arbitrary order (whatever order the file system reports), so let's sort it:

In [45]:
rasters = glob(os.path.join(path_input, 'raster', '*'))
rasters

['I:\\project_OO\\data\\input\\raster\\vessel_003.tif',
 'I:\\project_OO\\data\\input\\raster\\net_001.tif',
 'I:\\project_OO\\data\\input\\raster\\net_011.tif',
 'I:\\project_OO\\data\\input\\raster\\net_112.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_001.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_002.tif']

In [47]:
rasters.sort()
rasters

['I:\\project_OO\\data\\input\\raster\\net_001.tif',
 'I:\\project_OO\\data\\input\\raster\\net_011.tif',
 'I:\\project_OO\\data\\input\\raster\\net_112.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_001.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_002.tif',
 'I:\\project_OO\\data\\input\\raster\\vessel_003.tif']
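`list.sort()` compares strings character by character. That works here because the numbers are zero-padded (`001`, `011`, `112`); without padding, `img_10` would sort before `img_2`. A common workaround is a "natural" sort key that compares runs of digits as integers. A sketch with hypothetical, unpadded file names:

```python
import re


def natural_key(path):
    """Sort key that compares runs of digits numerically ('img_2' < 'img_10')."""
    # re.split with one capturing group alternates text (even index) and digits (odd index)
    parts = re.split(r"([0-9]+)", path)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


files = ["img_10.tif", "img_2.tif", "img_1.tif"]  # hypothetical names
print(sorted(files))                   # ['img_1.tif', 'img_10.tif', 'img_2.tif']
print(sorted(files, key=natural_key))  # ['img_1.tif', 'img_2.tif', 'img_10.tif']
```

Since `sorted()` returns a new list while `rasters.sort()` sorts in place, globbing and sorting can also be combined in one expression, e.g. `sorted(glob(pattern), key=natural_key)`.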

## Other "os" / "glob" functionality

**`os.path.commonpath()`** returns the longest common sub-path of a sequence of paths; applied to a `glob` result, it gives the deepest directory that all matched files share.

In [49]:
# The deepest directory all our input data have in common is:
inputData = glob(os.path.join(path_input, "*", "*"))
os.path.commonpath(inputData)

'I:\\project_OO\\data\\input'
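`os.path.commonpath()` works on whole path components. The related `os.path.commonprefix()` compares character by character and can therefore return a string that is not a real directory. The sketch uses `posixpath` so the result does not depend on the OS, and the two `input_20xx` folder names are hypothetical:

```python
import posixpath

paths = ["/project_OO/data/input_2020/a.tif", "/project_OO/data/input_2021/b.tif"]

print(posixpath.commonpath(paths))    # /project_OO/data
print(posixpath.commonprefix(paths))  # /project_OO/data/input_202
```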

**`os.path.basename()`** returns the final component of a path.

In [51]:
os.path.basename(path_input)

'input'
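One pitfall: if a path ends with a separator, `os.path.basename()` returns an empty string. Normalising the path first with `os.path.normpath()` removes the trailing separator. A sketch using a relative, forward-slash path so it behaves the same on Windows and POSIX:

```python
import os

path = "project_OO/data/input/"  # note the trailing slash

print(repr(os.path.basename(path)))                    # ''
print(repr(os.path.basename(os.path.normpath(path))))  # 'input'
```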

**`os.path.split()`** returns a tuple `(head, tail)`: the ***tail*** is the last component of the path and the ***head*** is everything before it. Each part can be accessed by ***index***.

In [55]:
print(os.path.split(path_input), '\n')
print(os.path.split(path_input)[0], '\n')
print(os.path.split(path_input)[1])

('I:\\project_OO\\data', 'input') 

I:\project_OO\data 

input
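Because `os.path.split()` returns a 2-tuple, it is often clearer to unpack it into named variables than to index it. The related `os.path.splitext()` splits off a file extension in the same way. A sketch with one of the raster paths from above, written with forward slashes so it runs on any OS:

```python
import os

raster = "project_OO/data/input/raster/net_001.tif"

folder, filename = os.path.split(raster)      # unpack (head, tail)
stem, extension = os.path.splitext(filename)  # unpack (root, ext)

print(folder)           # project_OO/data/input/raster
print(filename)         # net_001.tif
print(stem, extension)  # net_001 .tif
```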


Using the string method **`str.split()`** with the operating system's path separator `os.sep` (`'\\'` on Windows), we can split a path into its individual components.

In [62]:
print(path_input.split(os.sep))

['I:', 'project_OO', 'data', 'input']
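`str.split()` only knows about the single separator it is given, whereas Windows paths may mix `\` and `/`. A more robust alternative is to call `os.path.split()` repeatedly until nothing is left. A sketch, where `split_all` is a hypothetical helper written for this purpose:

```python
import os


def split_all(path):
    """Split `path` into all of its components using os.path.split().

    `split_all` is a hypothetical helper written for illustration.
    """
    parts = []
    while True:
        head, tail = os.path.split(path)
        if tail:
            parts.append(tail)
        if head == path:
            # Reached a root such as '/' or 'I:\\' that cannot be split further
            if head:
                parts.append(head)
            break
        if not head:
            # Relative path fully consumed
            break
        path = head
    return parts[::-1]


print(split_all(os.path.join("project_OO", "data", "input")))
# ['project_OO', 'data', 'input']
```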
