Closed
Description
We have a pandas DataFrame whose columns are ordered differently from the coordinates of an xray Dataset:
In [34]:
da = xray.DataArray(
    np.random.rand(5,2), 
    coords=(
    ('date', pd.date_range(start='2000', periods=5)),
    ('company', list('ab')),
        )
    )
da
Out[34]:
<xray.DataArray (date: 5, company: 2)>
array([[ 0.82168647,  0.93097023],
       [ 0.34928855,  0.23245631],
       [ 0.32857461,  0.12554705],
       [ 0.44983381,  0.27182767],
       [ 0.31063147,  0.52894834]])
Coordinates:
  * date     (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * company  (company) |S1 'a' 'b'
In [35]:
ds = xray.Dataset({'returns': da})
ds
Out[35]:
<xray.Dataset>
Dimensions:  (company: 2, date: 5)
Coordinates:
  * date     (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * company  (company) |S1 'a' 'b'
Data variables:
    returns  (date, company) float64 0.8217 0.931 0.3493 0.2325 0.3286 ...
In [36]:
df=da.to_pandas()
df
Out[36]:
company a   b
date        
2000-01-01  0.821686    0.930970
2000-01-02  0.349289    0.232456
2000-01-03  0.328575    0.125547
2000-01-04  0.449834    0.271828
2000-01-05  0.310631    0.528948
In [41]:
rank = df.rank()
rank
Out[41]:
company a   b
date        
2000-01-01  5   5
2000-01-02  3   2
2000-01-03  2   1
2000-01-04  4   3
2000-01-05  1   4
In [42]:
rank=rank.reindex(columns=list('ba'))
rank
Out[42]:
company b   a
date        
2000-01-01  5   5
2000-01-02  2   3
2000-01-03  1   2
2000-01-04  3   4
2000-01-05  4   1
When we add it to a Dataset, it ignores the index on the columns:
In [49]:
ds['rank'] = (('date','company'),rank)
ds['rank'].to_pandas()
Out[49]:
company a   b
date        
2000-01-01  5   5
2000-01-02  2   3
2000-01-03  1   2
2000-01-04  3   4
2000-01-05  4   1
Adding the DataFrame without supplying dims doesn't work. One solution is to construct a DataArray out of the pandas object:
In [45]:
ds['rank'] = xray.DataArray(rank)
ds
Out[45]:
<xray.Dataset>
Dimensions:  (company: 2, date: 5)
Coordinates:
  * date     (date) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ...
  * company  (company) object 'a' 'b'
Data variables:
    returns  (date, company) float64 0.8217 0.931 0.3493 0.2325 0.3286 ...
    rank     (date, company) float64 5.0 5.0 3.0 2.0 2.0 1.0 4.0 3.0 1.0 4.0
Possible additions to make this easier:
- Align pandas objects that are passed in, when dims are supplied
- Allow adding pandas objects to Datasets with labelled axes without supplying dims, and align those (similar to wrapping them in a DataArray constructor)
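As a rough illustration of the first suggestion, here is a minimal sketch of aligning a DataFrame by label before passing it in with dims. It uses only pandas and NumPy; `align_to_coords` is a hypothetical helper made up for this example, not an existing xray function, and the data simply mirrors the example above.

```python
import numpy as np
import pandas as pd


def align_to_coords(frame, index_labels, column_labels):
    """Hypothetical helper (not part of xray): reorder a DataFrame by
    label so its values line up positionally with the target coordinates."""
    return frame.reindex(index=index_labels, columns=column_labels)


# Same shape of data as in the example above.
dates = pd.date_range(start='2000-01-01', periods=5)
companies = list('ab')
df = pd.DataFrame(np.random.rand(5, 2), index=dates, columns=companies)

# Rank each column, then deliberately put the columns out of order.
rank = df.rank().reindex(columns=list('ba'))

# Reorder by label to match the Dataset's coordinate order.
aligned = align_to_coords(rank, dates, companies)

# The aligned frame could then be assigned positionally, for example:
# ds['rank'] = (('date', 'company'), aligned)
```

With this, the values end up under the right `company` labels even though dims are supplied explicitly. Doing the equivalent reindex inside the Dataset setter would be one way to implement the suggestion.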
What are your thoughts?