# Notebook 1: Subsetting the National Water Model domain input files

    
Authors: Tony Castronova <acastronova@cuahsi.org>, Danielle Tijerina <dtijerina@cuahsi.org>  
Date Modified: 06/20/2019  
Affiliation: CUAHSI


## 1. Goal
This notebook walks you through subsetting the National Water Model Hydro-Fabric for an area of interest, producing domain files that can be used to run a WRF-Hydro simulation. The steps are:
1. going to subset.cuahsi.org
2. selecting a location of interest and requesting a subset
3. downloading the data via wget
4. inspecting the contents


## 2. Background
The purpose of this application is to introduce a collaborative effort for preparing, publishing, and sharing subsets of the National Water Model input data and geo-fabric at watershed scales. Our hope is that these efforts will lower the barrier to entry for using and applying these models and engage a wide variety of scientists with diverse expertise. By combining modern cyberinfrastructure techniques with state-of-the-science modeling tools, researchers gain access to subsets of National Water Model information that would otherwise require extensive computational resources to obtain. This work provides a foundation on which similar efforts can be built for other large-scale model simulations and input data.

## 3. Subsetting Exercise

In [None]:
%pip install python3-wget

In [None]:
# import Python libraries 

import os
import wget
import glob
import shutil
import tarfile


### Step 1.  
Go to http://subset.cuahsi.org/, select an area of interest, and click submit to run the subsetter. If you already know the bounding box of your area of interest, the subsetter can also be invoked programmatically through its REST web services: http://subset.cuahsi.org/api.

### Step 2.  
When the job is complete (about 5 minutes), you'll be presented with a download link. Run the cell below and enter the URL of the domain archive that was subsetted in Step 1, for example: http://subset.cuahsi.org/data/7c3b61d79999e5b8b96dab41b0b12d0b6975c166.tar.gz

In [None]:
# prompt for the archive URL, then derive the archive file name and the job identifier
domain_archive = input("Enter link to domain: ").strip()
archive_name = domain_archive.split('/')[-1]
guid = archive_name.split('.')[0]
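
The string splitting above assumes the link ends in `<guid>.tar.gz`, with no query string or trailing slash. As a hedged alternative, the sketch below pulls the same two values out of the URL with the standard library's `urllib.parse` and rejects links that do not look like a subset archive. The helper name `parse_domain_url` is hypothetical and not part of the subsetter.

```python
import posixpath
from urllib.parse import urlparse


def parse_domain_url(url):
    """Return (archive_name, guid) for a subset archive URL.

    Hypothetical helper: assumes the URL path ends in '<guid>.tar.gz'.
    """
    # keep only the path component, so a query string or fragment is ignored
    path = urlparse(url.strip()).path
    archive_name = posixpath.basename(path.rstrip('/'))
    if not archive_name.endswith('.tar.gz'):
        raise ValueError('expected a link ending in .tar.gz, got: %r' % url)
    guid = archive_name[:-len('.tar.gz')]
    return archive_name, guid


# example URL taken from Step 2
example = 'http://subset.cuahsi.org/data/7c3b61d79999e5b8b96dab41b0b12d0b6975c166.tar.gz'
archive_name, guid = parse_domain_url(example)
print(archive_name)
print(guid)
```

Failing early on a malformed link avoids a confusing error later, when the download or extraction step runs.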

### Step 3. 
Run the following cell to download your subset data and move it into a directory called `DOMAIN`.

In [None]:
domain_dir = 'DOMAIN'
if os.path.exists(domain_dir):
    while True:
        res = input("Directory 'DOMAIN' already exists. Do you wish to remove it [Y/N]? ")
        if res.lower() == 'y':
            shutil.rmtree(domain_dir)
            os.mkdir(domain_dir)
            break
        elif res.lower() == 'n':
            # keep the existing directory; the move below fails if a file with the same name is already there
            break
        else:
            print('\nInvalid input. Please answer either Y or N')
else:
    os.mkdir(domain_dir)
            
# download the subset archive
print('Downloading domain from subset.cuahsi.org')
archive_name = wget.download(domain_archive)

# untar the archive
print('Extracting archive contents')
with tarfile.open(archive_name) as tar:
    tar.extractall()

# move the domain files into the DOMAIN directory
print('Organizing domain data')
extracted_folder = archive_name.split('.')[0]
for f in glob.glob(os.path.join(extracted_folder, '*')):
    shutil.move(f, domain_dir)

print('Cleaning up')
os.remove(archive_name)
shutil.rmtree(extracted_folder)
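
The extract-and-move step above can also be sketched as a reusable function. This is a minimal sketch, assuming (as the cell above does) that the archive contains a single top-level folder. The name `extract_into` is hypothetical. It extracts into a temporary directory, so a failed run leaves no partial folder behind in the working directory.

```python
import os
import shutil
import tarfile
import tempfile


def extract_into(archive_path, dest_dir):
    """Extract a tar archive and move the contents of its top-level folder into dest_dir.

    Hypothetical helper; assumes the archive holds exactly one top-level folder.
    Returns the sorted names of the entries placed in dest_dir.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        # extract into a scratch directory that is removed automatically
        with tarfile.open(archive_path) as tar:
            tar.extractall(tmp)
        top_level = os.listdir(tmp)
        if len(top_level) != 1 or not os.path.isdir(os.path.join(tmp, top_level[0])):
            raise ValueError('expected one top-level folder, found: %s' % top_level)
        folder = os.path.join(tmp, top_level[0])
        moved = []
        for name in os.listdir(folder):
            shutil.move(os.path.join(folder, name), os.path.join(dest_dir, name))
            moved.append(name)
    return sorted(moved)
```

With the variables defined earlier in this notebook, a call would look like `extract_into(archive_name, domain_dir)`.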


### Step 4.  
View the contents of `DOMAIN` to see the files that were downloaded. 

In [None]:
# look at the DOMAIN data that was downloaded
!du -h DOMAIN/*
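
If `du` is not available (for example, on Windows), a rough equivalent can be sketched in plain Python. The helpers `human_size` and `list_sizes` below are hypothetical names. They report file sizes rather than disk blocks, so the numbers may differ slightly from what `du` prints.

```python
import os


def human_size(num_bytes):
    """Format a byte count with binary units (B, K, M, G, T)."""
    size = float(num_bytes)
    for unit in ('B', 'K', 'M', 'G'):
        if size < 1024:
            return '%.1f%s' % (size, unit)
        size /= 1024
    return '%.1f%s' % (size, 'T')


def list_sizes(directory):
    """Return (name, size_in_bytes) pairs for the files directly inside directory."""
    entries = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries


# only list the directory if it exists, so the cell is safe to run on its own
if os.path.isdir('DOMAIN'):
    for name, size in list_sizes('DOMAIN'):
        print('%8s  %s' % (human_size(size), name))
```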

Inspect the header metadata of `Fulldom_hires.nc` (dimensions, variables, and attributes) with `ncdump -h`.

In [None]:
!ncdump -h DOMAIN/Fulldom_hires.nc

You have now successfully subsetted the domain data for running the WRF-Hydro model using the National Water Model Hydro-Fabric. The next notebook will walk you through the process of collecting and preparing meteorological forcing data: [WRFHydro-Forcing.ipynb](./WRFHydro-Forcing.ipynb)
    