# Extract monthly outputs from the GLORYS "Global Ocean Biogeochemistry Analysis and Forecast"

This script downloads selected variables from the GLORYS "Global Ocean Biogeochemistry Analysis and Forecast" data product from 11/2020 to present. 

GLORYS data are downloaded from the following link and saved as NetCDF (nc) files on your Google Drive:

https://data.marine.copernicus.eu/product/GLOBAL_ANALYSIS_FORECAST_BIO_001_028/download

___

by Greg Pelletier | gjpelletier@gmail.com | https://github.com/gjpelletier/get_glorys

___

INSTRUCTIONS

Before using this script, it is first necessary to establish a free account with https://data.marine.copernicus.eu/ 
Your username will be assigned when you establish your account. Your password should not include special characters.

Specify the following in the code sections below:
  - list of variables to be extracted in any combination, e.g. var_list = ["dissic","talk","si","po4","ph","spco2","o2","no3","fe","phyc","chl","nppv"] 
  - west, east, south, and north extent of the ocean sub-area where data will be extracted
  - name of the OUTPUT_DIRECTORY where the GLORYS data will be saved as output
  - the date_start and number_of_months of the time period to be extracted (between 11/2020 and present)
  - the min and max depths (dep_min and dep_max) (between 0 and 5728m)

During execution you should see the progress of each monthly file as it is extracted, from the beginning to the end of the period of interest. Each nc file name has the format glorys_biogeochem_yyyy_MM.nc, where yyyy_MM is the year and month of the data.


___

Install the motuclient package that is needed for this script:


In [None]:
pip install motuclient==1.8.8

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting motuclient==1.8.8
  Downloading motuclient-1.8.8.tar.gz (30 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: motuclient
  Building wheel for motuclient (setup.py) ... done
  Created wheel for motuclient: filename=motuclient-1.8.8-py3-none-any.whl size=34214 sha256=c2f9cf1718fdbd8c2c852b4968830414a1524d8fcdc4059935e0670807547ab6
  Stored in directory: /root/.cache/pip/wheels/d1/b5/6e/0ce6aa0aa4a126905874a0a1378fa4872e2b12c69295b410da
Successfully built motuclient
Installing collected packages: motuclient
Successfully installed motuclient-1.8.8


Import the required packages for this script:

In [None]:
import os
import sys
from datetime import datetime   # only the datetime class is used in this script
import time
from socket import timeout
from dateutil.relativedelta import relativedelta
import calendar     # calendar.monthrange(year, month) returns weekday (0-6 ~ Mon-Sun) and number of days (28-31) for year, month.

# additional packages needed per https://help.marine.copernicus.eu/en/articles/5211063-how-to-use-the-motuclient-within-python-environment:
import getpass
import motuclient

Mount your google drive folder to make it possible to store the output nc files in your google drive:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Specify the name of the OUTPUT_DIRECTORY folder where the output files will be saved in your mounted Google Drive folder. You can use any name you want, as long as it starts with 'drive/MyDrive/' if you are using Google Colab and ends with '/'. The script will create this folder later if it does not already exist in your Google Drive:

In [None]:
OUTPUT_DIRECTORY = 'drive/MyDrive/Colab Notebooks/glorys/'   # include the ending '/' 

Specify date_start, the starting datetime of the data to be extracted (between 11/2020 and present), in ISO format.

The datetime format must be 'YYYY-MM-DD hh:mm:ss'. The starting day should be 01, and the starting hh:mm:ss should be 00:00:00

Also specify the number_of_months to be downloaded. A separate output nc file will be downloaded for each month. For example, if number_of_months=12 there will be 12 nc files.

Each output nc file name will be generated by the script and will have the format glorys_biogeochem_yyyy_MM.nc to indicate the date stamp. 

Note that the download time for each nc file will depend on the size of the area being extracted. The default area below typically takes less than 1 minute to download each monthly nc file.



In [None]:
# Specify the date_start and number_of_months from 11/2020 - present
date_start = '2022-01-01 00:00:00'      
number_of_months  = 12                     
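
A quick check of these two settings can catch format mistakes before any download starts. The sketch below uses only the standard library and assumes the file-name pattern described above; `preview_file_names` is a hypothetical helper for illustration and is not part of the original script.

```python
from datetime import datetime

def preview_file_names(date_start, number_of_months):
    """Return the nc file names expected for the requested period (hypothetical helper)."""
    base = datetime.fromisoformat(date_start)
    # the script expects the first day of a month at midnight
    if (base.day, base.hour, base.minute, base.second) != (1, 0, 0, 0):
        raise ValueError("date_start should have the form 'YYYY-MM-01 00:00:00'")
    names = []
    for i in range(number_of_months):
        # step forward i months with integer arithmetic on a month index
        year, month0 = divmod(base.year * 12 + (base.month - 1) + i, 12)
        names.append('glorys_biogeochem_%04d_%02d.nc' % (year, month0 + 1))
    return names

print(preview_file_names('2022-01-01 00:00:00', 3))
```

A date_start that does not fall on the first of a month at midnight raises a ValueError instead of silently producing unexpected dates.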

Specify spatial limits (default below is Parker MacCready's boundary of the LiveOcean model):

In [None]:
north = 53              # -90 to 90 degN          
south = 39              # -90 to 90 degN
west = -131             # -180 to 180 degE
east = -122             # -180 to 180 degE
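
The ranges noted in the comments above can be checked before a request is sent. This is a minimal sketch, assuming the box does not cross the 180 degree meridian (so west is always less than east); `check_bounds` is a hypothetical helper, not part of the original script.

```python
def check_bounds(north, south, west, east):
    """Raise ValueError if the box is outside the stated ranges (hypothetical helper)."""
    if not (-90 <= south < north <= 90):
        raise ValueError('need -90 <= south < north <= 90')
    # assumption: the box does not wrap across the 180 degree meridian
    if not (-180 <= west < east <= 180):
        raise ValueError('need -180 <= west < east <= 180')
    return True

# the default LiveOcean boundary from this notebook passes the check
print(check_bounds(north=53, south=39, west=-131, east=-122))
```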

Specify the min and max depths to download (between 0 and 5728 m):

In [None]:
# Specify the min and max depths (between 0 - 5728m)
dep_min = 0
dep_max = 5728

Edit the following var_list as needed to download any subset of these available variables. Any of the following variables may be included in the var_list:

- dissic = Mole concentration of dissolved inorganic carbon in sea water [mol/m3]
- talk = Sea water alkalinity expressed as mole equivalent [mol/m3]
- si = Mole concentration of silicate in sea water [mmol/m3]
- po4 = Mole concentration of phosphate in sea water [mmol/m3]
- ph = Sea water pH reported on total scale
- spco2 = Surface partial pressure of carbon dioxide in sea water [Pa]
- o2 = Mole concentration of dissolved molecular oxygen in sea water [mmol/m3]
- no3 = Mole concentration of nitrate in sea water [mmol/m3]
- fe = Mole concentration of dissolved iron in sea water [mmol/m3] 
- phyc = Mole concentration of phytoplankton expressed as carbon in sea water [mmol/m3]
- chl = Mass concentration of chlorophyll a in sea water [mg/m3]
- nppv = Net primary production of biomass expressed as carbon per unit volume in sea water [mg/m3/day]


In [None]:
var_list = ["dissic","talk","si","po4","ph","spco2","o2","no3","fe","phyc","chl","nppv"]
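
A misspelled variable name is easy to miss in a long list. The sketch below compares a var_list against the names listed above; `AVAILABLE_VARS` and `unknown_variables` are hypothetical names introduced only for this illustration.

```python
# variable names copied from the list above
AVAILABLE_VARS = {"dissic", "talk", "si", "po4", "ph", "spco2",
                  "o2", "no3", "fe", "phyc", "chl", "nppv"}

def unknown_variables(var_list):
    """Return the entries of var_list that are not in AVAILABLE_VARS (hypothetical helper)."""
    return [v for v in var_list if v not in AVAILABLE_VARS]

print(unknown_variables(["ph", "o2", "chla"]))   # 'chla' is a deliberate misspelling of 'chl'
```

An empty list means every requested name matches one of the listed variables.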

Make a function to create a directory if it does not already exist:

In [None]:
def ensure_dir(file_path):
    # create a folder if it does not already exist
    directory = os.path.dirname(file_path)
    if not os.path.exists(directory):
        os.makedirs(directory)
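
Because ensure_dir passes its argument through os.path.dirname, the trailing '/' on OUTPUT_DIRECTORY matters. The short illustration below, using only the standard library, shows which folder would be created with and without it; the variable names are chosen for this example only.

```python
import os

with_slash = os.path.dirname('drive/MyDrive/Colab Notebooks/glorys/')
without_slash = os.path.dirname('drive/MyDrive/Colab Notebooks/glorys')

print(with_slash)      # the glorys folder itself
print(without_slash)   # only the parent folder; 'glorys' is treated as a file name
```

Without the trailing '/', only the parent folder would be created and the glorys folder itself would be missing.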

Define a class MotuOptions that wraps the request dictionary so that motuclient can read its entries as attributes (entries that are not in the dictionary return None):

In [None]:
class MotuOptions:
    def __init__(self, attrs: dict):
        super(MotuOptions, self).__setattr__("attrs", attrs)

    def __setattr__(self, k, v):
        self.attrs[k] = v

    def __getattr__(self, k):
        try:
            return self.attrs[k]
        except KeyError:
            return None
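
The behavior of this wrapper is easiest to see on a small dictionary. The sketch below repeats the class so that it runs on its own; the keys and values are made up for illustration and `not_set_option` is a hypothetical name for an entry that was never set.

```python
class MotuOptions:
    def __init__(self, attrs: dict):
        # bypass the custom __setattr__ so the dictionary itself is stored on the instance
        super(MotuOptions, self).__setattr__("attrs", attrs)

    def __setattr__(self, k, v):
        self.attrs[k] = v

    def __getattr__(self, k):
        try:
            return self.attrs[k]
        except KeyError:
            return None

opts = MotuOptions({"user": "someone", "out_name": "example.nc"})
print(opts.user)              # someone
print(opts.not_set_option)    # None, because the key is not in the dictionary
opts.out_name = "other.nc"    # attribute assignment writes into the dictionary
print(opts.attrs["out_name"]) # other.nc
```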

Make a function to extract the glorys data during the loop through a list of all datetimes to be extracted:

In [None]:
def get_extraction(data_request_options_dict):
    # get the data and save as a netcdf file
    counter = 1
    got_file = False
    while counter <= 10 and not got_file:
        print('  Attempting to get data, counter = ' + str(counter))
        tt0 = time.time()
        try:
            motuclient.motu_api.execute_request(MotuOptions(data_request_options_dict))
        except timeout:
            print('  *Socket timed out, trying again')
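        except Exception as err:
            # catch ordinary errors only, so that a keyboard interrupt can still stop the loop
            print('  *Something went wrong, trying again: ' + str(err))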
        else:
            got_file = True
            print('  Downloaded data')
        print('  Time elapsed: %0.1f seconds' % (time.time() - tt0))
        counter += 1
    if got_file:
        result = 'success'
    else:
        result = 'fail'
    return result
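
The retry logic can be tried without a network connection by substituting a stand-in for the download call. The following is a minimal sketch of the same pattern, not the function above: `retry` and `flaky_download` are hypothetical names, and the simulated failure takes the place of a real timeout.

```python
def retry(action, max_attempts=10):
    """Call action() until it succeeds or max_attempts is reached (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            action()
        except Exception as err:
            print('  attempt %d failed: %s' % (attempt, err))
        else:
            return 'success'
    return 'fail'

# stand-in for the download call: fails twice, then succeeds
calls = {'n': 0}

def flaky_download():
    calls['n'] += 1
    if calls['n'] < 3:
        raise TimeoutError('simulated timeout')

result = retry(flaky_download)
print(result, 'after', calls['n'], 'attempts')
```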

Make a dt_list of all of the dates to extract from glorys:

In [None]:
base = datetime.fromisoformat(date_start)
ndt = number_of_months
dt_list = [base + relativedelta(months=x) for x in range(ndt)]
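
Each entry of dt_list is later expanded into a date_min and date_max that span the whole month. The sketch below shows that step in isolation, using calendar.monthrange to find the last day of the month; `month_window` is a hypothetical helper name used only here.

```python
import calendar
from datetime import datetime

def month_window(dt):
    """Return (date_min, date_max) strings covering the month of dt (hypothetical helper)."""
    last_day = calendar.monthrange(dt.year, dt.month)[1]   # 28, 29, 30 or 31
    date_min = dt.strftime('%Y-%m-01 00:00:00')
    date_max = dt.strftime('%Y-%m-') + '%02d 23:59:59' % last_day
    return date_min, date_max

print(month_window(datetime(2024, 2, 1)))   # leap year, so February ends on the 29th
```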

Enter your username and password at the prompts (press enter after inputting your username to get the password prompt, then press enter again after inputting the password). 

In [None]:
USERNAME = input('Enter your username: ')
PASSWORD = getpass.getpass('Enter your password: ')

Enter your username: gpelletier
Enter your password: ··········


Make a template dictionary that will be used by motuclient and will be updated with new dates and output filenames during each iteration of the loop through dates:

In [None]:
data_request_options_dict_manual = {
    "service_id": "GLOBAL_ANALYSIS_FORECAST_BIO_001_028-TDS",
    "product_id": "global-analysis-forecast-bio-001-028-monthly",
    "date_min": " ",
    "date_max": " ",
    "longitude_min": float(west),
    "longitude_max": float(east),
    "latitude_min": float(south),
    "latitude_max": float(north),
    "depth_min": float(dep_min),
    "depth_max": float(dep_max),
    "variable": var_list,
    "motu": "https://nrt.cmems-du.eu/motu-web/Motu",
    "out_dir": OUTPUT_DIRECTORY,
    "out_name": " ",
    "auth_mode": "cas",
    "user": USERNAME,
    "pwd": PASSWORD
}
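
The script updates this one dictionary in place on every pass through the loop. An alternative, sketched below under the assumption that only the three placeholder entries change from month to month, is to build a fresh copy for each request so the template itself is never modified. The small `template` and the helper `request_for_month` are hypothetical and stand in for the full dictionary above.

```python
# reduced stand-in for the template above, with the same placeholder entries
template = {"date_min": " ", "date_max": " ", "out_name": " ", "variable": ["ph", "o2"]}

def request_for_month(template, date_min, date_max, out_name):
    """Return a copy of template with the per-month entries filled in (hypothetical helper)."""
    # dict(...) makes a shallow copy: the "variable" list is shared, which is fine while it is only read
    return dict(template, date_min=date_min, date_max=date_max, out_name=out_name)

req = request_for_month(template, '2022-01-01 00:00:00', '2022-01-31 23:59:59',
                        'glorys_biogeochem_2022_01.nc')
print(req["out_name"])
print(repr(template["out_name"]))   # the template still holds its placeholder
```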

Loop through all of the dates in dt_list and download all of the nc files for the number_of_months of data:

In [None]:
out_dir = OUTPUT_DIRECTORY                   # specify output directory adding the ending '/'
ensure_dir(out_dir)                         # make sure the output directory exists, make one if not
f = open(out_dir + 'log.txt', 'w+')         # open log of successful downloads
print('\n** Working on GLORYS extraction **')
f.write('\n\n** Working on GLORYS extraction **')
tt1 = time.time()                           # tic for total elapsed time
force_overwrite = True                      # overwrite any already existing nc files in the output folder that have the same names
for dt in dt_list:
    out_fn = datetime.strftime(dt, 'glorys_biogeochem_%Y_%m') + '.nc'
    dstr_min = dt.strftime('%Y-%m-01 00:00:00')
    dd_str = str(calendar.monthrange(dt.year, dt.month)[1])  # string of last day in this month
    dstr_max = dt.strftime('%Y-%m-'+dd_str+' 23:59:59')
    data_request_options_dict_manual["out_name"] = out_fn
    data_request_options_dict_manual["date_min"] = dstr_min
    data_request_options_dict_manual["date_max"] = dstr_max
    print(out_dir + out_fn)
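    out_path = out_dir + out_fn                 # full path of the output file inside the output folder
    if os.path.isfile(out_path) and force_overwrite:
        os.remove(out_path)                     # remove the existing file so that it is downloaded again
    if not os.path.isfile(out_path):
        result = get_extraction(data_request_options_dict_manual)
        f.write('\n ' + datetime.strftime(dt, '%Y_%m') + ' ' + result)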
           
# - - -
# final message
totmin = (time.time() - tt1)/60             # total time elapsed for loop over all datetimes in minutes
print('')
print('All downloads are completed.')
print('Total time elapsed: %0.1f minutes' % totmin)
f.close()       # close log of successful downloads


** Working on GLORYS extraction **
drive/MyDrive/Colab Notebooks/glorys/glorys_biogeochem_2022_01.nc
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 20.7 seconds
drive/MyDrive/Colab Notebooks/glorys/glorys_biogeochem_2022_02.nc
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 21.0 seconds
drive/MyDrive/Colab Notebooks/glorys/glorys_biogeochem_2022_03.nc
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 20.7 seconds
drive/MyDrive/Colab Notebooks/glorys/glorys_biogeochem_2022_04.nc
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 21.0 seconds
drive/MyDrive/Colab Notebooks/glorys/glorys_biogeochem_2022_05.nc
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 20.6 seconds
drive/MyDrive/Colab Notebooks/glorys/glorys_biogeochem_2022_06.nc
  Attempting to get data, counter = 1
  Downloaded data
  Time elapsed: 20.5 seconds
drive/MyDrive/Colab Notebooks/glorys/glorys_biogeochem_202