Regression: time attributes on PeriodIndex #1565

Open
fmaussion opened this issue Sep 10, 2017 · 12 comments

@fmaussion
Member

The following used to work with xarray 0.9.5 but doesn't anymore with 0.9.6 or master:

import xarray as xr
import pandas as pd
import numpy as np
time = pd.period_range('2000-01', '2000-12', freq='M')
da = xr.DataArray(np.arange(12), dims=['time'], coords={'time':time})
da['time.month']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
    458         try:
--> 459             var = self._coords[key]
    460         except KeyError:

KeyError: 'time.month'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-1-41829b924596> in <module>()
      4 time = pd.period_range('2000-01', '2000-12', freq='M')
      5 da = xr.DataArray(np.arange(12), dims=['time'], coords={'time':time})
----> 6 da['time.month']

~/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
    467     def __getitem__(self, key):
    468         if isinstance(key, basestring):
--> 469             return self._getitem_coord(key)
    470         else:
    471             # orthogonal array indexing

~/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
    461             dim_sizes = dict(zip(self.dims, self.shape))
    462             _, key, var = _get_virtual_variable(
--> 463                 self._coords, key, self._level_coords, dim_sizes)
    464 
    465         return self._replace_maybe_drop_dims(var, name=key)

~/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
     82             data = getattr(ref_var.dt, var_name).data
     83         else:
---> 84             data = getattr(ref_var, var_name).data
     85         virtual_var = Variable(ref_var.dims, data)
     86 

AttributeError: 'IndexVariable' object has no attribute 'month'
@shoyer
Member

shoyer commented Sep 11, 2017

I never intended to support time attributes on PeriodIndex objects so there was never any test coverage here, but I guess it has been working! Nonetheless this should be pretty easy to fix.

@fmaussion
Member Author

I was wondering which PR was responsible for the regression, but I guess I don't understand the internals well enough to classify this as "easy fix" for me ;)

@shoyer
Member

shoyer commented Sep 11, 2017

Looks like #1356, which introduced an is_datetime_like() check on the dtype of a variable before adding time components.

@fmaussion
Member Author

Indeed! Will look into it.

@fmaussion
Member Author

OK, so the problem is that by the time it is passed to is_datetime_like, the pd.PeriodIndex has already been converted into an array of dtype object.
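
To make the failure mode concrete, here is a minimal sketch using only pandas and NumPy. The `looks_datetime_like` helper is a hypothetical stand-in for a dtype-based check, not xarray's actual code; it only assumes that the check inspects the NumPy dtype.

```python
import numpy as np
import pandas as pd


def looks_datetime_like(dtype):
    """Hypothetical stand-in for a dtype-based datetime check (not xarray's code)."""
    return bool(
        np.issubdtype(dtype, np.datetime64) or np.issubdtype(dtype, np.timedelta64)
    )


periods = pd.period_range('2000-01', '2000-12', freq='M')

# Converting to a plain NumPy array yields Period objects with dtype=object,
# so a check based on the NumPy dtype cannot recognise the values as time-like.
as_object = np.asarray(periods)
object_check = looks_datetime_like(as_object.dtype)

# The PeriodIndex itself still exposes the time components.
months = list(periods.month)
print(as_object.dtype, object_check, months)
```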

Before @darothen's #1356, the piece of code that was executed was date = ref_var.to_index(), which has the nice property of returning a PeriodIndex, as shown here in my debugger:

>>> ref_var
Out[1]: 
<xarray.IndexVariable 'time' (time: 12)>
array([Period('2000-01', 'M'), Period('2000-02', 'M'), Period('2000-03', 'M'),
       Period('2000-04', 'M'), Period('2000-05', 'M'), Period('2000-06', 'M'),
       Period('2000-07', 'M'), Period('2000-08', 'M'), Period('2000-09', 'M'),
       Period('2000-10', 'M'), Period('2000-11', 'M'), Period('2000-12', 'M')], dtype=object)
>>> ref_var.to_index()
Out[3]: 
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06',
             '2000-07', '2000-08', '2000-09', '2000-10', '2000-11', '2000-12'],
            dtype='period[M]', name='time', freq='M')
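
As a possible workaround in the meantime, the public `to_index()` method seems to give access to the same index from user code. This is only a sketch: it assumes that the installed xarray version accepts a PeriodIndex as a coordinate and returns it unchanged from `to_index()`, as in the debugger output above.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.period_range('2000-01', '2000-12', freq='M')
da = xr.DataArray(np.arange(12), dims=['time'], coords={'time': time})

# to_index() hands back the pandas index, so the time components can be
# read from pandas instead of going through da['time.month'].
index = da['time'].to_index()
months = np.asarray(index.month)
print(type(index).__name__, months)
```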

@shoyer
Member

shoyer commented Sep 11, 2017

It's only a dtype=object array externally: internally, the data is still stored as a pandas.PeriodIndex. I think you'll find it inside ref_var._data.array.

Potentially we could add some sort of API for surfacing this information, e.g., a pandas_dtype property on xarray.Variable.

@fmaussion
Member Author

Potentially we could add some sort of API for surfacing this information, e.g., a pandas_dtype property on xarray.Variable.

That's probably cleaner, because the dtype of a PeriodIndex can be several things, e.g. 'period[M]' in my case.
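
A small pandas-only sketch of what "several things" means here: the frequency is part of the period dtype, so two PeriodIndex objects can both be period-typed without sharing a dtype. The variable names are just for illustration.

```python
import pandas as pd

monthly = pd.period_range('2000-01', periods=3, freq='M')
daily = pd.period_range('2000-01-01', periods=3, freq='D')

# Both are period dtypes, but the frequency is part of the dtype itself,
# so there is no single "period" dtype to compare against.
both_period = all(isinstance(idx.dtype, pd.PeriodDtype) for idx in (monthly, daily))
same_dtype = monthly.dtype == daily.dtype
print(monthly.dtype, daily.dtype, both_period, same_dtype)
```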

@stale

stale bot commented Aug 14, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Aug 14, 2019
@stale stale bot closed this as completed Sep 13, 2019
@dcherian dcherian reopened this Sep 13, 2019
@stale stale bot removed the stale label Sep 13, 2019
@hding1981

I have a similar problem with the time coordinate. How did you solve your problem in the end?

I also list my problem here.

The following are my commands in a jupyter notebook

import xarray as xr
import numpy as np
import pandas as pd
import sys

dset=xr.open_dataset("input/ts_Amon_CNRM-CM6-1_piControl_r1i1p1f2_gr_185001-234912.nc",decode_times=False)
dset['time'] = pd.period_range(start='1850-01-15', end='2349-12-15', freq='M')
varname="ts"
anom = dset[varname].groupby('time.month')-dset[varname].groupby('time.month').mean('time', keep_attrs=True)

Then, I got the following error message.


KeyError Traceback (most recent call last)
~/miniconda3/envs/python_tutorial/lib/python3.9/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
692 try:
--> 693 var = self._coords[key]
694 except KeyError:

KeyError: 'time.month'

During handling of the above exception, another exception occurred:

AttributeError Traceback (most recent call last)
in
7 dset['time'] = pd.period_range(start='1850-01-15', end='2349-12-15', freq='M')
8 varname="ts"
----> 9 anom = dset[varname].groupby('time.month')-dset[varname].groupby('time.month').mean('time', keep_attrs=True)

~/miniconda3/envs/python_tutorial/lib/python3.9/site-packages/xarray/core/common.py in groupby(self, group, squeeze, restore_coord_dims)
703 )
704
--> 705 return self._groupby_cls(
706 self, group, squeeze=squeeze, restore_coord_dims=restore_coord_dims
707 )

~/miniconda3/envs/python_tutorial/lib/python3.9/site-packages/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins, restore_coord_dims, cut_kwargs)
313 f"Received {group!r} instead."
314 )
--> 315 group = obj[group]
316 if len(group) == 0:
317 raise ValueError(f"{group.name} must not be empty")

~/miniconda3/envs/python_tutorial/lib/python3.9/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
702 def __getitem__(self, key: Any) -> "DataArray":
703 if isinstance(key, str):
--> 704 return self._getitem_coord(key)
705 else:
706 # xarray-style array indexing

~/miniconda3/envs/python_tutorial/lib/python3.9/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
694 except KeyError:
695 dim_sizes = dict(zip(self.dims, self.shape))
--> 696 _, key, var = _get_virtual_variable(
697 self._coords, key, self._level_coords, dim_sizes
698 )

~/miniconda3/envs/python_tutorial/lib/python3.9/site-packages/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
179 data = getattr(ref_var.dt, var_name).data
180 else:
--> 181 data = getattr(ref_var, var_name).data
182 virtual_var = Variable(ref_var.dims, data)
183

AttributeError: 'IndexVariable' object has no attribute 'month'

@max-sixty
Collaborator

@hding1981 IIUC this isn't supported at the moment, though with the approaching index refactor we may be able to support it more easily.
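
Until this is supported, one possible workaround is to compute the month from the PeriodIndex in pandas and attach it as an ordinary coordinate to group on. A minimal sketch with synthetic data follows; the values, the short date range, and the `month` coordinate name are assumptions for illustration, not the CMIP6 file above. The time axis is left as a plain integer dimension to keep the example independent of PeriodIndex support.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of monthly periods; the month numbers come straight from pandas.
periods = pd.period_range('1850-01', '1851-12', freq='M')
month = np.asarray(periods.month)

# Synthetic data with a regular (non-virtual) 'month' coordinate along 'time'.
da = xr.DataArray(
    np.arange(24, dtype=float),
    dims=['time'],
    coords={'month': ('time', month)},
)

# Monthly climatology and anomalies, grouping on the explicit coordinate
# instead of the virtual 'time.month' variable.
climatology = da.groupby('month').mean('time')
anom = da.groupby('month') - climatology
```

Grouping on an explicit coordinate avoids the virtual-variable lookup that raises the AttributeError in the traceback above.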

@hding1981

I am calculating monthly mean anomalies from a CMIP6 file, which is attached with a .docx suffix (please remove .docx from its name after downloading it).

This file has a reasonable time coordinate, which I have confirmed with ncdump and cdo. But when the file is read with xr.open_dataset, the time coordinate shows NaT after a certain time point. I saved the output of processing this file with some commands in a Jupyter notebook to a PDF file, which is also attached. I really have no idea why xr.open_dataset cannot read the time coordinate properly.

Then I thought maybe I could redefine its time axis with pd.date_range, but that does not work either.

Thank you so much!

Untitled5 - Jupyter Notebook.pdf
ts_Amon_CNRM-CM6-1_piControl_r1i1p1f2_gr_185001-234912.tmp.nc.docx

@max-sixty
Collaborator

@hding1981 you would need to open a new issue with an MCVE like the one described here: https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports. It's unlikely someone can help you debug your data.


5 participants