Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected behaviour of climpred.utils.add_time_from_init_lead() #698

Closed
dougiesquire opened this issue Dec 2, 2021 · 14 comments · Fixed by #700
Closed

Unexpected behaviour of climpred.utils.add_time_from_init_lead() #698

dougiesquire opened this issue Dec 2, 2021 · 14 comments · Fixed by #700
Labels

Comments

@dougiesquire
Copy link
Collaborator

Describe the bug
The function climpred.utils.add_time_from_init_lead (which adds the "valid_time" coordinate) seems to return unusual values when the lead frequency is "yearly" and the init frequency is higher than annual (e.g. 6 monthly). The "valid_time" coordinate is used heavily by climpred internals so this causes strange behaviour/bugs further downstream.

Code Sample

# Hindcasts initialised every 6 months with yearly lead
init = xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="6MS")
lead = range(0, 5)
data = np.random.random((len(init), len(lead)))
hind = xr.DataArray(data, coords=[init, lead], dims=["init", "lead"], name="var")
hind["lead"].attrs["units"] = "years"

# Add "valid_time" coordinate using `climpred.utils.add_time_from_init_lead()`
hind = climpred.utils.add_time_from_init_lead(hind)
print(hind["valid_time"])

Returns

<xarray.DataArray 'valid_time' (lead: 4, init: 5)>
array([[cftime.DatetimeGregorian(2000, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2001, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2001, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2002, 10, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2000, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2000, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2001, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2001, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2002, 10, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2001, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2001, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2002, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2002, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2003, 10, 1, 0, 0, 0, 0, has_year_zero=False)],
       [cftime.DatetimeGregorian(2002, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2002, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2003, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2003, 10, 1, 0, 0, 0, 0, has_year_zero=False),
        cftime.DatetimeGregorian(2004, 10, 1, 0, 0, 0, 0, has_year_zero=False)]],
      dtype=object)
Coordinates:
  * init        (init) object 2000-01-01 00:00:00 ... 2002-01-01 00:00:00
  * lead        (lead) int64 0 1 2 3
    valid_time  (lead, init) object 2000-10-01 00:00:00 ... 2004-10-01 00:00:00
Attributes:
    long_name:      validity time
    standard_name:  time
    description:    time for which the forecast is valid
    calculate:      init + lead

You can see that the "valid_time"s at the first two leads are identical and that all "valid_times" are at month 10, regardless of initialisation month (which alternates between 1 and 7).

Expected behavior
"valid_time"s should correspond to L years after the initialisation date, where L is the lead.

Output of climpred.show_versions()

# Paste the output here climpred.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:13:33) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.18.0-305.19.1.el8.nci.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: None LOCALE: en_US.UTF-8

climpred: 2.1.7.dev10+gb8bf61b
xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.6.3
cftime: 1.5.0
netcdf4: None
nc_time_axis: 1.4.0
matplotlib: 3.4.2
cf_xarray: 0.6.1
xclim: 0.31.0
dask: 2021.11.2
distributed: 2021.11.2
setuptools: 49.6.0.post20210108
pip: 21.1.2
conda: 4.11.0
IPython: 7.24.0
sphinx: None

@dougiesquire
Copy link
Collaborator Author

P.S. other combinations of init and lead frequencies also produce unusual results, e.g. monthly init and "seasons" lead (what is a seasonal lead time anyway? Is it 3 months?)

@aaronspring
Copy link
Collaborator

Thanks for the bug report.

Yes. Seasonal means 3 months.

@aaronspring
Copy link
Collaborator

Instead of lead attrs years pleas try months and range(0,12*5,12). Cftimeindex.shift on init causes the problems I think and I should implement it better.

@aaronspring
Copy link
Collaborator

aaronspring commented Dec 3, 2021

I will look into your example tomorrow.

Thanks for trying to use climpred and the reproducible issue with sample code @dougiesquire

@aaronspring
Copy link
Collaborator

The freq is strange:

xr.cftime_range(start="2000-01-01", end="2002-01-01", freq="6MS")
CFTimeIndex([2000-01-01 00:00:00, 2000-07-01 00:00:00, 2001-01-01 00:00:00,
             2001-07-01 00:00:00, 2002-01-01 00:00:00],
            dtype='object', length=5, calendar='gregorian', freq='2QS-OCT')

totally unrelated to OCT IMO

@aaronspring
Copy link
Collaborator

pandas version

pd.date_range(start="2000-01-01", end="2002-01-01", freq="6MS")
DatetimeIndex(['2000-01-01', '2000-07-01', '2001-01-01', '2001-07-01',
               '2002-01-01'],
              dtype='datetime64[ns]', freq='6MS')

@dougiesquire
Copy link
Collaborator Author

Yes that does seem odd - good catch. Perhaps worth opening an xarray issue?

@aaronspring
Copy link
Collaborator

aaronspring commented Dec 3, 2021

either xarray or cftime, would you raise the issue? - still trying to fix a quick fix for your climpred timings

EDIT: https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/coding/frequencies.py#L58

@dougiesquire
Copy link
Collaborator Author

Thanks, I'll post an issue, but probably won't get to it till Monday

@aaronspring
Copy link
Collaborator

I started digging into xarray, must be somewhere here: https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/coding/frequencies.py#L149

do you mind if I post the issue?

@aaronspring
Copy link
Collaborator

aaronspring commented Dec 3, 2021

removing mod_dict in https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/coding/frequencies.py#L161 would yield freq 2QS which is identical to 6MS.

i = pd.date_range(start="2000-01-01", end="2002-01-01", freq="6MS")
i
DatetimeIndex(['2000-01-01', '2000-07-01', '2001-01-01', '2001-07-01',
               '2002-01-01'],
              dtype='datetime64[ns]', freq='6MS')

from xarray.coding.frequencies import (
    _CFTimeFrequencyInferer,
    _ONE_DAY,
    _is_multiple,
    _MONTH_ABBREVIATIONS,
    _maybe_add_count,
)

fi = _CFTimeFrequencyInferer(i)
fi.month_deltas  # array([6])

fi.get_freq()  # 2QS-OCT

fi._infer_daily_rule()  # 2QS-OCT

_is_multiple(fi.deltas, _ONE_DAY)  # TRUE

fi._get_annual_rule()  # None

quartely_rule = fi._get_quartely_rule()  # QS

nquarters = fi.month_deltas[0] / 3  # 2

mod_dict = {0: 12, 2: 11, 1: 10}
month = _MONTH_ABBREVIATIONS[mod_dict[fi.index[0].month % 3]]  # OCT

alias = f"{quartely_rule}-{month}"
_maybe_add_count(alias, nquarters) # 2QS-OCT

fi.index[0].month  # 1

fi.index[0].month % 3  # 1

_MONTH_ABBREVIATIONS[fi.index[0].month % 3] # JAN when removing mod_dict

@aaronspring
Copy link
Collaborator

ok. the above should be an xarray issue, but there is still an unrelated issue in climpred. .utils.my_shift just needs the quartely_rule and nquarters or monthly_rule and nmonths and so far doesnt use nquarters afaik:

def my_shift(init, lead):

@dougiesquire
Copy link
Collaborator Author

I started digging into xarray, must be somewhere here: https://github.com/pydata/xarray/blob/eea76733770be03e78a0834803291659136bca31/xarray/coding/frequencies.py#L149

do you mind if I post the issue?

Please go ahead

@aaronspring
Copy link
Collaborator

aaronspring commented Dec 4, 2021

very likely not an xarray/pandas bug but me/us not understanding the docs

nevertheless I found a fix in #700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants