Here we download the CPC data from the IRI Data Library.
We only retrieve the rainfall estimate; this neglects the information on the number of stations used in the analysis.
Please see the caveats and documentation on the IRI server at http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.UNIFIED_PRCP/.GAUGE_BASED/.GLOBAL/.v1p0/

In [1]:
import pandas as pd
import numpy as np
import datetime
import os
import xarray as xr
from paraguayfloodspy.xrutil import *
from paraguayfloodspy.pars  import GetPars

Get the parameters we need (the start and end years).

In [2]:
pars = GetPars('time')
syear, eyear = pars['syear'], pars['eyear']
overwrite = False

Define a function to do the heavy lifting.
For each year, the function:

1. Sets the URL to the "retro" data if the year is 2005 or earlier, or to the "realtime" data if the year is after 2005
2. Reads that year of data
3. Converts the IRI Data Library values of T to a more standard date format

Each year of data is then saved to its own file.
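The year-to-URL choice in step 1 can be isolated as a small pure function, which makes it easy to check without touching the network. This is a minimal sketch: `cpc_url_for_year` and `CPC_BASE` are hypothetical names, and the year bounds [1979, 2016] simply mirror the ones hard-coded in `IRICPCYear`.

```python
# Base path of the CPC unified gauge-based dataset on the IRI Data Library
CPC_BASE = ('http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/'
            '.UNIFIED_PRCP/.GAUGE_BASED/.GLOBAL/.v1p0/')


def cpc_url_for_year(year):
    """Return the OPeNDAP (dods) URL for one year of CPC rainfall (hypothetical helper)."""
    if 1979 <= year <= 2005:
        stream = '.RETRO'      # retrospective analysis
    elif 2006 <= year <= 2016:
        stream = '.REALTIME'   # real-time analysis
    else:
        raise ValueError('{} is outside range [1979, 2016]'.format(year))
    return CPC_BASE + stream + '/.rain/dods'


print(cpc_url_for_year(1995))
```

Keeping the URL logic separate means a new data stream or a changed year range only has to be edited in one place.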

In [3]:
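# IRI has strange time conventions: here T is treated as days since 1960-01-01
def convert_t_to_time(Tvec):
    # int() replaces np.int, which was removed in NumPy 1.24
    times = [datetime.date(1960, 1, 1) + datetime.timedelta(days=int(ti)) for ti in Tvec]
    return np.array(times)

def convert_time_to_t(date):
    # Inverse of convert_t_to_time: number of whole days since 1960-01-01
    date_diff = date - datetime.date(1960, 1, 1)
    return date_diff.days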
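If `T` is whole days since 1960-01-01 (the convention the two functions above assume), the same conversion can also be done in vectorized form with NumPy `datetime64` arithmetic instead of a Python loop. This is an alternative sketch, not the approach used in this notebook; `t_to_datetime64`, `datetime64_to_t` and `EPOCH` are hypothetical names, and fractional `T` values are truncated toward zero, as `int()` does in `convert_t_to_time`.

```python
import numpy as np

EPOCH = np.datetime64('1960-01-01', 'D')  # assumed IRI origin, as in convert_t_to_time


def t_to_datetime64(Tvec):
    """Days since 1960-01-01 -> datetime64[D] array (fractional days truncated)."""
    days = np.asarray(Tvec).astype('int64')
    return EPOCH + days.astype('timedelta64[D]')


def datetime64_to_t(dates):
    """datetime64 values (or ISO date strings) -> integer days since 1960-01-01."""
    return (np.asarray(dates, dtype='datetime64[D]') - EPOCH).astype('int64')


T = [6940, 7304]
print(t_to_datetime64(T))  # ['1979-01-01' '1979-12-31']
```

For example, 1979-01-01 is 19 years after the origin, including the 5 leap days of 1960, 1964, 1968, 1972 and 1976, so its T value is 19 × 365 + 5 = 6940.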

In [4]:
def IRICPCYear(year, verbose=True):
    """Download one calendar year of CPC unified gauge-based rainfall from the IRI Data Library."""
    if verbose:
        print('Downloading data for {}...'.format(year))

    # realtime or retro?
    base = 'http://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.UNIFIED_PRCP/.GAUGE_BASED/.GLOBAL/.v1p0/'
    if 1979 <= year <= 2005:
        url = base + '.RETRO/.rain/dods'
    elif 2006 <= year <= 2016:
        url = base + '.REALTIME/.rain/dods'
    else:
        raise ValueError('You have entered an invalid year. {} is outside range [1979, 2016]'.format(year))

    # get the data: select this year's T range, then download only that slice
    Tstart = convert_time_to_t(datetime.date(year, 1, 1))
    Tend = convert_time_to_t(datetime.date(year, 12, 31))
    ds = xr.open_dataarray(url, decode_times=False)
    ds = ds.sel(T=slice(Tstart, Tend))
    ds.load()  # force it to download

    # convert to more standard format
    ds = ds.rename({'X': 'lon', 'Y': 'lat', 'T': 'time'})
    # pd.to_datetime yields datetime64[ns]; a bare astype('datetime64') fails on recent NumPy/pandas
    ds['time'] = pd.to_datetime(convert_t_to_time(ds['time'].values))
    return ds
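The post-processing at the end of `IRICPCYear` (renaming `X`/`Y`/`T` and turning `T` into real dates) can be exercised offline on a tiny synthetic array laid out like the IRI download. This is a sketch only: the grid and rainfall values are made-up placeholders, and `tidy_iri_dataarray` is a hypothetical helper that reuses the days-since-1960-01-01 assumption.

```python
import numpy as np
import pandas as pd
import xarray as xr


def tidy_iri_dataarray(da):
    """Rename IRI X/Y/T dims to lon/lat/time and decode T as days since 1960-01-01."""
    da = da.rename({'X': 'lon', 'Y': 'lat', 'T': 'time'})
    days = da['time'].values.astype('int64')
    da['time'] = pd.to_datetime('1960-01-01') + pd.to_timedelta(days, unit='D')
    return da


# Synthetic stand-in for one slice of the IRI download (values are placeholders)
raw = xr.DataArray(
    np.zeros((2, 3, 4)),
    dims=('T', 'Y', 'X'),
    coords={'T': [6940.0, 6941.0],
            'Y': [-25.0, -24.5, -24.0],
            'X': [-58.0, -57.5, -57.0, -56.5]},
    name='rain',
)
tidy = tidy_iri_dataarray(raw)
print(tidy.dims)  # ('time', 'lat', 'lon')
```

Testing on a synthetic array keeps the format conversion checkable without waiting on the OPeNDAP server.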

Each year of data that we download needs its own file name.

In [5]:
def GetFileName(year):
    fn = "../_data/rainfall/raw/cpc_{}.nc".format(year)
    return(fn)
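The download loop that follows skips any year whose file already exists, which amounts to a simple file cache. With the download step passed in as a function, that logic can be tested without network access. A minimal sketch: `download_missing_years`, `fetch` and `path_for` are hypothetical names, where `path_for` plays the role of `GetFileName` and `fetch` stands in for `IRICPCYear` followed by `to_netcdf`.

```python
import os
import tempfile


def download_missing_years(years, fetch, path_for, overwrite=False):
    """Call fetch(year, path) only for years whose file is absent (or all, if overwrite).

    Returns the list of years that were actually fetched.
    """
    fetched = []
    for year in years:
        path = path_for(year)
        if os.path.isfile(path) and not overwrite:
            print('\tData for {} already exists -- not re-downloading'.format(year))
            continue
        os.makedirs(os.path.dirname(path), exist_ok=True)  # ensure output folder exists
        fetch(year, path)
        fetched.append(year)
    return fetched


def fake_fetch(year, path):
    """Placeholder for a real download: just create an empty file."""
    with open(path, 'w'):
        pass


tmp = tempfile.mkdtemp()


def path_for(year):
    return os.path.join(tmp, 'raw', 'cpc_{}.nc'.format(year))


print(download_missing_years([1979, 1980], fake_fetch, path_for))        # [1979, 1980]
print(download_missing_years([1979, 1980, 1981], fake_fetch, path_for))  # [1981]
```

Injecting `fetch` keeps the caching rule separate from the slow network call, so the rule can be checked in seconds.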

Now loop through the years and download the data to file.

In [6]:
for year_i in range(int(syear), int(eyear) + 1):
    fn = GetFileName(year=year_i)
    if os.path.isfile(fn) and not overwrite:
        print("\tData for {} already exists -- not re-downloading".format(year_i))
    else:
        os.makedirs(os.path.dirname(fn), exist_ok=True)  # create ../_data/rainfall/raw/ if needed
        ds = IRICPCYear(year=year_i, verbose=True)
        ds.to_netcdf(fn, format='NETCDF4')

Downloading data for 1979...
Downloading data for 1980...
Downloading data for 1981...
Downloading data for 1982...
Downloading data for 1983...
Downloading data for 1984...
Downloading data for 1985...
Downloading data for 1986...
Downloading data for 1987...
Downloading data for 1988...
Downloading data for 1989...
Downloading data for 1990...
Downloading data for 1991...
Downloading data for 1992...
Downloading data for 1993...
Downloading data for 1994...
Downloading data for 1995...
Downloading data for 1996...
Downloading data for 1997...
Downloading data for 1998...
Downloading data for 1999...
Downloading data for 2000...
Downloading data for 2001...
Downloading data for 2002...
Downloading data for 2003...
Downloading data for 2004...
Downloading data for 2005...
Downloading data for 2006...
Downloading data for 2007...
Downloading data for 2008...
Downloading data for 2009...
Downloading data for 2010...
Downloading data for 2011...
Downloading data for 2012...
Downloading da