# Python script to extract 1-hourly HYCOM model surface ('sur') diagnostic output

This script downloads 1-hourly HYCOM surface ('sur') diagnostic data of water_flux_into_ocean, ocean_mixed_layer_thickness, surface_downward_heat_flux_in_air, sea_surface_elevation, steric SSH, surface_boundary_layer_thickness, barotropic_eastward_sea_water_velocity, and barotropic_northward_sea_water_velocity for any selected ocean sub-area and any selected time period from 2019 to the present. HYCOM data are downloaded from www.hycom.org and saved as netCDF files in your Google Drive.
___

Adapted from a LiveOcean script by Parker MacCready (https://github.com/parkermac/LiveOcean)

Modified by Greg Pelletier (gjpelletier@gmail.com) for standalone use to download 1-hourly hycom data (https://github.com/gjpelletier/get_hycom)

___

INSTRUCTIONS

Specify the following in the code sections below:
  - the list of variables to extract, as any combination of the names in var_list = 'emp,mixed_layer_thickness,qtot,ssh,steric_ssh,surface_boundary_layer_thickness,u_barotropic_velocity,v_barotropic_velocity'
  - the west, east, south, and north extent of the ocean sub-area where data will be extracted
  - the name of the resultDirectory where the hycom data will be saved as output
  - the date_start and number_of_days of the time period to be extracted, and the corresponding hycom codes for the model glb and expt

During execution you should see the progress of each 1-hourly file as it is extracted, from the beginning to the end of the period of interest. Each nc file name has the format yyyyMMdd_HH.nc, indicating the datetime stamp in UTC.


___

Import the required python packages:


In [1]:
import os
import sys
from datetime import datetime, timedelta   # the only names from datetime used below
import time
from urllib.request import urlretrieve
from urllib.error import URLError
from socket import timeout

Mount your Google Drive folder so that the output nc files of 1-hourly data can be stored in your Google Drive:

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
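The mount cell above only works inside Google Colab. If you also want to run the notebook locally, a minimal sketch is to mount Drive only when `google.colab` can be imported and otherwise fall back to a local folder. The helper `choose_result_directory` and the local folder name `hycom_output/` are hypothetical, not part of the original script; a cell like this could stand in for the mount cell above and the resultDirectory cell that follows.

```python
def choose_result_directory(colab_dir, local_dir):
    """Mount Google Drive and return colab_dir inside Colab; otherwise return local_dir."""
    try:
        from google.colab import drive  # importable only inside Google Colab
    except ImportError:
        return local_dir
    drive.mount('/content/drive')
    return colab_dir


resultDirectory = choose_result_directory(
    'drive/MyDrive/Colab Notebooks/hycom/',  # folder in the mounted Drive, ending in '/'
    'hycom_output/',                         # hypothetical local folder, ending in '/'
)
print(resultDirectory)
```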


Specify the name of the resultDirectory folder where the output files will be saved in your mounted Google Drive. You can edit resultDirectory below to use any name you want, as long as the name starts with 'drive/MyDrive/' if you are using Google Colab. The script creates this subfolder later if it does not already exist in your Google Drive:

In [3]:
resultDirectory = 'drive/MyDrive/Colab Notebooks/hycom/'   # include the ending '/' 

Specify date_start, the starting datetime of the data to be extracted, in ISO format (yyyy-mm-dd HH:MM:SS).

The starting hour may be any hour from 00 to 23, except that it must be 12 if date_start is January 1 of the year or the first day of the expt. The date_start must be within the range of dates for the glb and expt as described at www.hycom.org.
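The start-hour rule above can be checked before any download begins. A minimal sketch, assuming a hypothetical helper `check_date_start`; it checks only the January 1 case, because the first day of each expt is not stated here, and it does not check the glb/expt date range.

```python
from datetime import datetime


def check_date_start(date_start):
    """Parse date_start and apply the start-hour rule; raise ValueError if it is violated."""
    base = datetime.fromisoformat(date_start)
    # any whole hour from 00 to 23 is allowed, so minutes and seconds must be zero
    if base.minute != 0 or base.second != 0:
        raise ValueError('date_start must fall on a whole hour: ' + date_start)
    # on January 1 the starting hour must be 12
    if base.month == 1 and base.day == 1 and base.hour != 12:
        raise ValueError('on January 1 the starting hour must be 12: ' + date_start)
    return base


print(check_date_start('2020-01-01 12:00:00'))
print(check_date_start('2021-06-15 07:00:00'))
```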

Also specify number_of_days, the number of days of 1-hourly data to download. The script downloads a separate output nc file for each of the 24 consecutive 1-hourly datetimes in each day, so number_of_days=1 gives 24 nc files, number_of_days=7 gives 168 nc files, and so on (12 fewer files when date_start is at hour 12 and number_of_days is 365 or more).

Each output nc file name will be generated by the script and will have the format yyyyMMdd_HH.nc to indicate the datetime stamp in UTC 
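To preview which files a run will produce, the names can be generated with the same strftime pattern the script uses. This sketch ignores the 12-hour trim that the script applies to runs of 365 days or more starting at hour 12:

```python
from datetime import datetime, timedelta

date_start = '2020-01-01 12:00:00'
number_of_days = 1

base = datetime.fromisoformat(date_start)
# one output file per hour, named yyyyMMdd_HH.nc (UTC)
file_names = [(base + timedelta(hours=h)).strftime('%Y%m%d_%H') + '.nc'
              for h in range(number_of_days * 24)]
print(len(file_names))   # 24
print(file_names[0])     # 20200101_12.nc
print(file_names[-1])    # 20200102_11.nc
```

Note that a run starting at hour 12 ends at hour 11 of the following day.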

You can download up to one year of 1-hourly data at a time from any given calendar year. Each 1-hourly file can take anywhere from a few seconds to over a minute to download (see the elapsed times in the example output below). In other words, number_of_days=1 (24 files) can take from a few minutes to over half an hour, and number_of_days=365 (8,760 files) can take from many hours to several days to download all of the 1-hourly nc files.
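The total run time is roughly the number of files times the time per file. A back-of-the-envelope sketch; the 10 s and 60 s per-file values are illustrative assumptions, not measured limits, and `estimate_hours` is a hypothetical helper:

```python
def estimate_hours(number_of_days, seconds_per_file):
    """Rough wall-clock estimate, in hours, for number_of_days of 1-hourly files."""
    n_files = number_of_days * 24
    return n_files * seconds_per_file / 3600


# bracket the per-file download time between 10 s and 60 s
for days in (1, 365):
    print(days, 'day(s):', round(estimate_hours(days, 10), 1), 'to',
          round(estimate_hours(days, 60), 1), 'hours')
```

For one day this gives about 0.1 to 0.4 hours (4 to 24 minutes); for 365 days it gives about 24 to 146 hours.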

Also note that some experiments have missing 1-hourly times, which can cause the script to get stuck at those places. If that happens, you can restart the script with a new date_start set to the first non-missing datetime.
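To find where to resume, one option is to scan the output folder for the first hour that has no nc file yet. A minimal sketch with a hypothetical helper `first_missing_datetime`; it only checks that each file exists (not that it is a complete netCDF file), and it ignores the 12-hour trim for long runs. The returned hour is the one to retry, or to skip past if HYCOM has no data for it.

```python
import os
from datetime import datetime, timedelta


def first_missing_datetime(out_dir, date_start, number_of_days):
    """Return the first hourly datetime with no yyyyMMdd_HH.nc file in out_dir, or None."""
    base = datetime.fromisoformat(date_start)
    for h in range(number_of_days * 24):
        dt = base + timedelta(hours=h)
        if not os.path.isfile(os.path.join(out_dir, dt.strftime('%Y%m%d_%H') + '.nc')):
            return dt
    return None
```

For example, `first_missing_datetime(resultDirectory, date_start, number_of_days)` returns a datetime whose `.isoformat(sep=' ')` can be pasted back in as the new date_start.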

In [4]:
date_start = '2020-01-01 12:00:00'      
number_of_days  = 1                     

Specify the HYCOM codes for glb and expt corresponding to the time period that will be downloaded.

The following shows the correct glb and expt to use for the dates to be downloaded (more information on glb and expt is available at www.hycom.org if needed):
*   Use glb = 'GLBy0.08' and expt = '93.0' for dates from 12/4/2018 or 1/1/2019 to the present



In [5]:
glb = 'GLBy0.08'                     
expt = '93.0'                        

Specify spatial limits (default below is Parker MacCready's HYCOM bounding box for the boundary of the LiveOcean model):

In [6]:
north = 53              # -80 to 80 degN          
south = 39              # -80 to 80 degN
west = -131 + 360       # 0 to 360 degE
east = -121 + 360       # 0 to 360 degE
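The cell above converts west longitudes to the 0 to 360 degE convention by adding 360 by hand. A minimal sketch of the same conversion using Python's modulo operator, which returns a value in [0, 360) for any longitude; `to_degrees_east` is a hypothetical helper, not part of the original script:

```python
def to_degrees_east(lon):
    """Convert a longitude in -180..180 degrees to the 0..360 degE convention."""
    return lon % 360


west = to_degrees_east(-131)   # 229, same as -131 + 360
east = to_degrees_east(-121)   # 239, same as -121 + 360
```

Longitudes that are already east of Greenwich pass through unchanged, for example `to_degrees_east(10)` is 10.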

Edit the following var_list as needed to download any subset of these available variables:

In [7]:
var_list = 'emp,mixed_layer_thickness,qtot,ssh,steric_ssh,surface_boundary_layer_thickness,u_barotropic_velocity,v_barotropic_velocity'
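A misspelled variable name is only discovered when the server request fails. A minimal sketch of a check before downloading, assuming a hypothetical helper `check_var_list` whose allowed names are copied from the list above; it also strips spaces, which would otherwise end up inside the request URL:

```python
AVAILABLE_VARS = {
    'emp', 'mixed_layer_thickness', 'qtot', 'ssh', 'steric_ssh',
    'surface_boundary_layer_thickness', 'u_barotropic_velocity', 'v_barotropic_velocity',
}


def check_var_list(var_list):
    """Remove spaces and raise ValueError for any name not in AVAILABLE_VARS."""
    names = [v.strip() for v in var_list.split(',') if v.strip()]
    unknown = [v for v in names if v not in AVAILABLE_VARS]
    if unknown:
        raise ValueError('unknown variable(s): ' + ', '.join(unknown))
    return ','.join(names)


var_list = check_var_list('ssh, u_barotropic_velocity, v_barotropic_velocity')
print(var_list)
```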

Make a function to create a directory if it does not already exist:

In [9]:
def ensure_dir(file_path):
    # create a folder if it does not already exist
    directory = os.path.dirname(file_path)
    if not os.path.exists(directory):
        os.makedirs(directory)
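Since Python 3.2, `os.makedirs` accepts `exist_ok=True`, which removes the separate existence check and makes the function safe to call repeatedly. A sketch of an equivalent version; like the original, it relies on the trailing '/' in the path, because without it `os.path.dirname` drops the last folder name:

```python
import os


def ensure_dir(file_path):
    """Create the folder part of file_path (everything up to the last '/') if it is missing."""
    directory = os.path.dirname(file_path)
    if directory:  # dirname is '' for a bare file name, and os.makedirs('') would fail
        os.makedirs(directory, exist_ok=True)
```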

Make a function that downloads the hycom data for one datetime; it is called in the loop over the list of all datetimes to be extracted:

In [10]:
def get_extraction(dt, out_fn, var_list):
    dstr0 = dt.strftime('%Y-%m-%d-T%H:00:00Z')
    print(dstr0)
    if expt == '53.X':
        url = ('http://ncss.hycom.org/thredds/ncss/' + glb + '/expt_' + expt + '/data/' + dt.strftime('%Y') + 
            '/sur?var='+var_list +
            '&north='+str(north)+'&south='+str(south)+'&west='+str(west)+'&east='+str(east) +
            '&disableProjSubset=on&horizStride=1' +
            '&time_start='+dstr0+'&time_end='+dstr0+'&timeStride=8' +
            '&vertCoord=&addLatLon=true&accept=netcdf4')
    else:
        url = ('http://ncss.hycom.org/thredds/ncss/' + glb + '/expt_' + expt + 
            '/sur?var='+var_list +
            '&north='+str(north)+'&south='+str(south)+'&west='+str(west)+'&east='+str(east) +
            '&disableProjSubset=on&horizStride=1' +
            '&time_start='+dstr0+'&time_end='+dstr0+'&timeStride=8' +
            '&vertCoord=&addLatLon=true&accept=netcdf4')
    # get the data and save as a netcdf file
    counter = 1
    got_file = False
    while counter <= 10 and not got_file:     # up to 10 attempts per file
        print('  Attempting to get data, counter = ' + str(counter))
        tt0 = time.time()
        try:
            (a,b) = urlretrieve(url, out_fn)
            # a is the output file name
            # b is a message you can see with b.as_string()
        except URLError as ee:
            # HTTPError (a subclass of URLError) has both .code and .reason, so test .code first
            if hasattr(ee, 'code'):
                print('  *The server could not fulfill the request.')
                print('  -Error code: ', ee.code)
            elif hasattr(ee, 'reason'):
                print('  *We failed to reach a server.')
                print('  -Reason: ', ee.reason)
        except timeout:
            print('  *Socket timed out')
        else:
            got_file = True
            print('  Downloaded data')
        print('  Time elapsed: %0.1f seconds' % (time.time() - tt0))
        counter += 1
    if got_file:
        result = 'success'
    else:
        result = 'fail'
    return result
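By default Python sockets have no timeout, so a stalled connection can make `urlretrieve` wait indefinitely and the `except timeout` branch above may never run; this is one way the script can get stuck. A minimal sketch of a retry helper with a global socket timeout and a growing pause between attempts. The 300 s timeout, the wait times, and the name `retry` are assumptions, not part of the original script:

```python
import socket
import time
from urllib.error import URLError

# Assumed value: give up on a stalled connection after 300 s. urlretrieve uses the
# global default socket timeout, which is None (wait forever) unless it is set.
socket.setdefaulttimeout(300)


def retry(fetch, max_attempts=10, first_wait=5.0, max_wait=60.0):
    """Call fetch() until it returns without error; return True on success, False otherwise.

    The pause between attempts starts at first_wait seconds and doubles, capped at max_wait.
    """
    wait = first_wait
    for attempt in range(1, max_attempts + 1):
        try:
            fetch()
        except (URLError, socket.timeout) as err:
            print('  attempt %d of %d failed: %s' % (attempt, max_attempts, err))
            if attempt < max_attempts:
                time.sleep(wait)
                wait = min(wait * 2, max_wait)
        else:
            return True
    return False
```

Inside get_extraction, the while loop could then be replaced by `got_file = retry(lambda: urlretrieve(url, out_fn))`.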

Make a dt_list of all of the 1-hourly datetimes to extract from hycom:

In [11]:
base = datetime.fromisoformat(date_start)
if base.strftime('%H') == '12' and number_of_days >= 365:
    ndt = number_of_days * 24 - 12
else:
    ndt = number_of_days * 24
dt_list = [base + timedelta(hours=x) for x in range(ndt)]
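The 12-hour trim in the cell above matters for year-long runs that start at 12:00 on January 1: without it, the last hours would spill into the next calendar year. The same rule wrapped in a hypothetical function `make_dt_list`, applied to the non-leap year 2021:

```python
from datetime import datetime, timedelta


def make_dt_list(date_start, number_of_days):
    """Same rule as the cell above: hourly datetimes, 12 fewer for long runs starting at hour 12."""
    base = datetime.fromisoformat(date_start)
    ndt = number_of_days * 24
    if base.hour == 12 and number_of_days >= 365:
        ndt -= 12  # drop the last 12 hours so the run ends at 23:00 on the last day
    return [base + timedelta(hours=x) for x in range(ndt)]


year = make_dt_list('2021-01-01 12:00:00', 365)
print(len(year), year[0], year[-1])
```

This prints 8748 hourly datetimes from 2021-01-01 12:00:00 to 2021-12-31 23:00:00. In a leap year such as 2020, the same run ends one day earlier, at 2020-12-30 23:00:00.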

Loop through all of the datetimes in dt_list and download all of the nc files for the number_of_days of 1-hourly data:

In [12]:
out_dir = resultDirectory                  # specify output directory adding the ending '/'
ensure_dir(out_dir)                        # make sure the output directory exists, make one if not
force_overwrite = True                     # True: overwrite existing nc files; False: keep them (useful when resuming)
tt1 = time.time()                          # tic for total elapsed time
with open(out_dir + 'log.txt', 'w') as f:  # log of each download result ('success' or 'fail'), closed automatically
    print('\n** Working on ' + glb + '/expt_' + expt + ' **')
    f.write('\n\n** Working on ' + glb + '/expt_' + expt + ' **')
    for dt in dt_list:
        out_fn = out_dir + dt.strftime('%Y%m%d_%H') + '.nc'
        print(out_fn)
        if os.path.isfile(out_fn) and force_overwrite:
            os.remove(out_fn)
        if not os.path.isfile(out_fn):
            result = get_extraction(dt, out_fn, var_list)
            f.write('\n ' + dt.strftime('%Y%m%d_%H') + ' ' + result)

totmin = (time.time() - tt1)/60             # total time elapsed for loop over all datetimes in minutes
print('')
print('All downloads are completed.')
print('Total time elapsed: %0.1f minutes' % totmin)


** Working on GLBy0.08/expt_93.0 **
drive/MyDrive/Colab Notebooks/hycom/20200101_12.nc
2020-01-01-T12:00:00Z
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 44.8 seconds
drive/MyDrive/Colab Notebooks/hycom/20200101_13.nc
2020-01-01-T13:00:00Z
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 92.2 seconds
drive/MyDrive/Colab Notebooks/hycom/20200101_14.nc
2020-01-01-T14:00:00Z
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 50.9 seconds
drive/MyDrive/Colab Notebooks/hycom/20200101_15.nc
2020-01-01-T15:00:00Z
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 48.3 seconds
drive/MyDrive/Colab Notebooks/hycom/20200101_16.nc
2020-01-01-T16:00:00Z
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 80.1 seconds
drive/MyDrive/Colab Notebooks/hycom/20200101_17.nc
2020-01-01-T17:00:00Z
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 3.6 seconds
drive/MyDrive/Co