ENH: add to_xarray conversion method #11972

Closed. jreback wants to merge 1 commit.

jreback (Contributor) commented Jan 6, 2016

supersedes #11950
xref #10000

using xarray >= 0.7.0

In [10]: Series([1,2,3]).to_xarray()
Out[10]: 
<xarray.DataArray (index: 3)>
array([1, 2, 3])
Coordinates:
  * index    (index) int64 0 1 2

In [11]: DataFrame({'A' : [1,2,3], 'B' : ['foo','bar','baz']}).to_xarray()
Out[11]: 
<xarray.Dataset>
Dimensions:  (index: 3)
Coordinates:
  * index    (index) int64 0 1 2
Data variables:
    A        (index) int64 1 2 3
    B        (index) object 'foo' 'bar' 'baz'

In [14]: Panel(np.random.randn(2, 3, 4)).to_xarray()
Out[14]: 
<xarray.DataArray (items: 2, major_axis: 3, minor_axis: 4)>
array([[[ 0.23726039,  0.44636322,  0.04425575,  1.06178388],
        [-0.58236405,  0.62602167,  0.36156612, -0.12687913],
        [-0.67854107,  0.72270844, -1.15402631, -1.89758909]],

       [[ 0.42948622,  0.93227075, -0.13943692,  0.83043343],
        [ 1.0355157 , -0.30532004, -0.16337369, -0.29283026],
        [ 0.65199775, -0.11131818,  0.37853134,  1.56620844]]])
Coordinates:
  * items       (items) int64 0 1
  * major_axis  (major_axis) int64 0 1 2
  * minor_axis  (minor_axis) int64 0 1 2 3

TODO:

  • examples in doc-string
  • how-to-convert/use xarray

jreback added the Enhancement, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Compat (pandas objects compatibility with Numpy or Python functions) labels Jan 6, 2016
jreback added this to the 0.18.0 milestone Jan 6, 2016

jreback (Contributor Author) commented Jan 6, 2016

since I renamed the branch, not really sure if there was a way to avoid creating a new PR. oh well.

if self.ndim == 1:
    return xray.DataArray.from_series(self)
elif self.ndim == 2:
    return xray.Dataset.from_dataframe(self)

Contributor

@shoyer should this be different from just xray.DataArray(series) or xray.Dataset(df)?
This might be an xray change; seems a bit off to have special handling in pandas though

Contributor Author

best would be: xarray.from_pandas(...) (and you guys handle the construction).

Though it's ok here as well, since for > 2 dims we want an easy transition for current Panel users.
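
For reference, the dispatch discussed in this thread can be sketched as a standalone function. This is an illustration only, not the code that was merged: `to_xarray_sketch` is a hypothetical name, and the sketch assumes a recent pandas and xarray in which `xarray.DataArray.from_series` and `xarray.Dataset.from_dataframe` are available.

```python
import pandas as pd
import xarray as xr


def to_xarray_sketch(obj):
    """Convert a pandas object to xarray based on its dimensionality.

    Hypothetical helper for illustration; not the merged pandas method.
    """
    if obj.ndim == 1:
        # A Series maps to a single labeled array.
        return xr.DataArray.from_series(obj)
    elif obj.ndim == 2:
        # A DataFrame maps to a Dataset with one variable per column.
        return xr.Dataset.from_dataframe(obj)
    raise NotImplementedError("only 1-d and 2-d objects are handled here")


da = to_xarray_sketch(pd.Series([1, 2, 3]))
ds = to_xarray_sketch(pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]}))
```

Keeping the dispatch on the pandas side, as here, is the option the thread settles on; a single `xarray.from_pandas(...)` entry point was only floated as an alternative.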

jreback (Contributor Author) commented Jan 6, 2016

@shoyer ping when releasing new xarray. I am going to min-version this to what you release.

jreback (Contributor Author) commented Jan 6, 2016

@MaximilianR it would be quite helpful if you can post / write a mini-doc (where we can incorporate in a doc-string / document) on how to migrate Panel -> xarray, with use-cases / how-tos etc.

can obviously also add to here: http://xray.readthedocs.org/en/stable/pandas.html

max-sixty (Contributor) commented

@jreback Yes I can have a go at that. I could see that being fairly short given the existing docs - just a couple of examples of Panel migration. Is that what you envision?

jreback (Contributor Author) commented Jan 6, 2016

@MaximilianR yep, the migration part as well as 'working' with them (which might require another section). E.g. imagine users had Panels and now have xarray objects; they will want to perform 'typical' operations on them. We will probably need a section in the pandas docs as well when I do the deprecation PRs (which will effectively just copy what is there).

jreback force-pushed the xarray branch 2 times, most recently from c03bdd4 to 2a007b5, January 7, 2016 01:17

# > 2 dims
coords = [(a, self._get_axis(a)) for a in self._AXIS_ORDERS]
return xray.DataArray(self,

Contributor Author

@shoyer you didn't like this?

as an aside, do you want to put this routine in xarray and I'll just call xray.DataArray.from_pandas(self)?

jreback (Contributor Author) commented Jan 26, 2016

@shoyer @MaximilianR

updated to use xarray 0.7.0

works with all index types including MultiIndex! nice!
Panel <-> xarray <-> Panel works nicely!
>3 dim doesn't convert back to pandas (but no problem)
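
A small round trip along these lines can be sketched as follows. It uses a MultiIndex-ed Series rather than a Panel, and assumes a recent pandas with `Series.to_xarray` plus an installed xarray; the index values and level names are made up for illustration.

```python
import numpy as np
import pandas as pd

# A full-product MultiIndex with two named levels (illustrative values).
index = pd.MultiIndex.from_product([[10, 20], [0, 1, 2]], names=["one", "two"])
s = pd.Series(np.arange(6.0), index=index)

# Each index level becomes a dimension of the resulting DataArray.
da = s.to_xarray()

# Converting back yields a Series indexed by the same two levels.
roundtrip = da.to_series()
```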

jreback (Contributor Author) commented Jan 26, 2016

I think you need to have RTD use xarray? (and have a pointer from xray)?

shoyer (Member) commented Jan 26, 2016

@jreback I'm suggesting http://xarray.pydata.org/ instead... it turns out you can't change RTD stubs, so we're stuck with http://xray.readthedocs.org for now.


# > 2 dims
coords = [(a, self._get_axis(a)) for a in self._AXIS_ORDERS]
return xarray.DataArray(self,

Member

I wonder if we should add Panel4D support to the DataArray constructor so this could just be xarray.DataArray(self)... OTOH I'm pretty sure Panel4D is barely used

Contributor Author

yeh it's no big deal. you could always add it later if it's really useful.
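
The coords construction for more than 2 dimensions can be sketched with a plain NumPy array standing in for the pandas object. This is an assumption-laden illustration: the axis names are taken from the Panel output earlier in the thread, the labels are simple integer ranges, and it relies on `xarray.DataArray` accepting `coords` as a list of `(dimension name, labels)` pairs.

```python
import numpy as np
import xarray as xr

values = np.random.randn(2, 3, 4)

# One (dimension name, labels) pair per axis, in axis order.
coords = [
    ("items", np.arange(2)),
    ("major_axis", np.arange(3)),
    ("minor_axis", np.arange(4)),
]

da = xr.DataArray(values, coords=coords)
```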

jreback (Contributor Author) commented Jan 27, 2016

expected,
check_index_type=False)

# not implemented

Contributor Author

@shoyer

what am I doing wrong here?

In [1]:        df = DataFrame({'a': list('abc'),
   ...:                         'b': list(range(1, 4)),
   ...:                         'c': np.arange(3, 6).astype('u1'),
   ...:                         'd': np.arange(4.0, 7.0, dtype='float64'),
   ...:                         'e': [True, False, True],
   ...:                         'f': pd.Categorical(list('abc')),
   ...:                         'g': pd.date_range('20130101', periods=3),
   ...:                         'h': pd.date_range('20130101',
   ...:                                            periods=3,
   ...:                                            tz='US/Eastern')}
   ...:                        )

In [2]: df.to_xarray()
Out[2]: 
<xarray.Dataset>
Dimensions:  (index: 3)
Coordinates:
  * index    (index) int64 0 1 2
Data variables:
    a        (index) object 'a' 'b' 'c'
    b        (index) int64 1 2 3
    c        (index) uint8 3 4 5
    d        (index) float64 4.0 5.0 6.0
    e        (index) bool True False True
    f        (index) category 'a' 'b' 'c'
    g        (index) datetime64[ns] 2013-01-01 2013-01-02 2013-01-03
    h        (index) datetime64[ns] 2013-01-01T05:00:00 2013-01-02T05:00:00 ...

In [3]:        df = DataFrame({'a': list('abc'),
   ...:                         'b': list(range(1, 4)),
   ...:                         'c': np.arange(3, 6).astype('u1'),
   ...:                         'd': np.arange(4.0, 7.0, dtype='float64'),
   ...:                         'e': [True, False, True],
   ...:                         'f': pd.Categorical(list('abc')),
   ...:                         'g': pd.date_range('20130101', periods=3),
   ...:                         'h': pd.date_range('20130101',
   ...:                                            periods=3,
   ...:                                            tz='US/Eastern')}
   ...:                        )

In [4]:        df.index = pd.MultiIndex.from_product([['a'], range(3)],
   ...:                                               names=['one', 'two'])

In [5]: df
Out[5]: 
         a  b  c  d      e  f          g                         h
one two                                                           
a   0    a  1  3  4   True  a 2013-01-01 2013-01-01 00:00:00-05:00
    1    b  2  4  5  False  b 2013-01-02 2013-01-02 00:00:00-05:00
    2    c  3  5  6   True  c 2013-01-03 2013-01-03 00:00:00-05:00

In [6]: df.to_xarray()
ValueError: dimensions ('one', 'two') must have the same length as the number of data dimensions, ndim=1

Member

I'm going to blame this one (in part) on pandas's Categorical:

ipdb> series.values
[a, b, c]
Categories (3, object): [a, b, c]
ipdb> series.values.reshape(shape)
[a, b, c]
Categories (3, object): [a, b, c]
ipdb> shape
[1, 3]

Instead of erroring, it ignores the reshape argument (to 2D).

This certainly needs a fix in xarray, too, though -- we should use np.asarray when converting DataFrame columns rather than .values.
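
The `np.asarray` approach mentioned here can be illustrated as below. This is a sketch of the idea only, not the fix that went into xarray: converting the categorical to a plain NumPy array first means the reshape follows ordinary ndarray semantics.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(list("abc"))

# np.asarray materializes the categorical as a regular NumPy array.
arr = np.asarray(cat)

# Reshaping the plain array to 2D behaves as expected.
reshaped = arr.reshape((1, 3))
```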

Contributor Author

ok, going to merge as is then. do you want me to create an issue on xarray for this?

Member

pydata/xarray#737

yes, let's merge -- I'll fix that in the next xarray bug fix release

Contributor Author

yep thxs

jreback (Contributor Author) commented Jan 27, 2016

@shoyer note that conda has updated to include xarray

@@ -245,6 +245,7 @@ Optional Dependencies
* `Cython <http://www.cython.org>`__: Only necessary to build development
version. Version 0.19.1 or higher.
* `SciPy <http://www.scipy.org>`__: miscellaneous statistical functions
* `xarray <http://xarray.readthedocs.org>`__: pandas like handling for > 2 dims. Version 0.7.0 or higher is recommeded.

Member

recommeded -> recommended

Member

maybe also mention for what functionality this optional dependency is needed
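
One way an optional dependency can announce what it is needed for is a guarded import that names the feature in its error message. The helper below is hypothetical (its name, signature, and message wording are assumptions, not pandas code) and uses only the standard library.

```python
import importlib


def import_optional(name, purpose):
    """Import an optional dependency or fail with a descriptive message.

    Hypothetical helper; the name and message format are assumptions.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        # Tell the user which feature needs the missing package.
        raise ImportError(
            f"{name} is required for {purpose}; install it to use this feature"
        ) from err
```

For example, `import_optional("xarray", "to_xarray()")` would return the module when xarray is installed and raise a descriptive `ImportError` otherwise.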

Contributor Author

done

max-sixty (Contributor) commented

@jreback @shoyer I've done a (currently very short & basic) draft of how to move from pandas to xarray, with the intention you guys can offer some feedback and we can iterate.

Do you want me to do a separate PR into pandas? Or should this go into the 'from Pandas' section of the xarray docs, with a link from pandas' What's New?

jreback (Contributor Author) commented Jan 28, 2016

@MaximilianR I think best is to add to the xray docs, which we will link to.

jorisvandenbossche (Member) commented

Personally I think this maybe belongs more in the pandas docs. I mean, the xarray docs could certainly have a section about how it interplays with pandas (as it already has, http://xarray.pydata.org/en/stable/pandas.html), but as this will also be about how to move from the deprecated Panel to xarray, it feels more in place in the pandas docs (I don't think the xarray docs should deal with deprecated pandas features).

Anyhow, not that important, as we can just link to wherever it is located.

jreback (Contributor Author) commented Jan 28, 2016

@jorisvandenbossche right, I have a note about that above, e.g. about how to transition from using a Panel to a DataArray.

So @MaximilianR can certainly add to the pandas docs (but should obviously add to xarray as well)

@@ -271,6 +271,7 @@ In addition, ``.round()``, ``.floor()`` and ``.ceil()`` will be available thru t
s
s.dt.round('D')

<<<<<<< 6693a723aa2a8a53a071860a43804c173a7f92c6

Member

merge conflict

Contributor Author

yep fixing as I am merging
thxs

jreback (Contributor Author) commented Feb 10, 2016

@shoyer was just about to merge. anything else besides those 2 comments?

(and the linked issue for more docs)

jreback closed this in 358da56 Feb 10, 2016

jreback (Contributor Author) commented Feb 10, 2016

@jorisvandenbossche I think I broke the doc build, but not sure how: https://travis-ci.org/pydata/pandas/jobs/108337474


See Also
--------
`xarray docs <http://xarray.pydata.org/en/stable/>`__

Member

Possibly it is choking on this line, as a See Also section should contain a Python object.

I would just make a 'Note' section of it instead of See also, and then say "See also the xarray docs .."
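
The suggested layout would look roughly like the docstring below. This is a sketch only: the summary line is an assumption rather than the merged docstring text, and the function body is a placeholder.

```python
def to_xarray(self):
    """
    Return an xarray object from the pandas object.

    Notes
    -----
    See also the `xarray docs <http://xarray.pydata.org/en/stable/>`__
    """
    # Placeholder body: only the docstring layout matters in this sketch.
    raise NotImplementedError("docstring layout sketch only")
```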

cldy pushed a commit to cldy/pandas that referenced this pull request Feb 11, 2016
supersedes pandas-dev#11950
xref pandas-dev#10000

Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#11972 from jreback/xarray and squashes the following commits:

85de0b7 [Jeff Reback] ENH: add to_xarray conversion method