Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should series have an is_view method? #5856

Closed
ghost opened this issue Jan 5, 2014 · 10 comments · Fixed by #5859
Closed

Should series have an is_view method? #5856

ghost opened this issue Jan 5, 2014 · 10 comments · Fixed by #5859
Milestone

Comments

@ghost
Copy link

ghost commented Jan 5, 2014

See reverted PR in #5853 and discussion on synergy with changes to pandas
re SetttingWithCopy.

@jtratner
Copy link
Contributor

jtratner commented Jan 5, 2014

This would be helped by understanding what code actually needs this on the
user side. @jseabold ?

@jtratner
Copy link
Contributor

jtratner commented Jan 5, 2014

I just played around with this and I don't understand the original logic - numpy doesn't raise on sorting a view and pandas doesn't either (in fact you get into some weird behavior)...

In [9]: ser = pd.Series(range(10)[::-1]); ser
Out[9]:
9    9
8    8
7    7
6    6
5    5
4    4
3    3
2    2
1    1
0    0
dtype: int64

In [11]: view = ser.view()

In [12]: view.sort()

In [13]: view
Out[13]:
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64

In [14]: ser
Out[14]:
9    0
8    1
7    2
6    3
5    4
4    5
3    6
2    7
1    8
0    9
dtype: int64

In [15]: vals = ser.values.copy()

In [16]: vals
Out[16]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]: vals = vals[::-1]

In [18]: np_view = vals.view()

In [19]: np_view.sort()

In [20]: np_view
Out[20]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]: vals
Out[21]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

@ghost
Copy link
Author

ghost commented Jan 5, 2014

numpy sort returns a copy, series.sort sorts in place, hence the hazard.

In [53]: df=mkdf(10,4,data_gen_f=lambda r,c:np.random.random())
    ...: ss= df.iloc[:,0]
    ...: ss.sort()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-53-27670adc5cb8> in <module>()
      1 s=mkdf(10,4,data_gen_f=lambda r,c:np.random.random())
      2 ss= s.iloc[:,0]
----> 3 ss.sort()

/home/user1/src/pandas/pandas/core/series.py in sort(self, axis, kind, order, ascending)
   1674         if (true_base is not None and
   1675                 (true_base.ndim != 1 or true_base.shape != self.shape)):
-> 1676             raise TypeError('This Series is a view of some other array, to '
   1677                             'sort in-place you must create a copy')
   1678 

TypeError: This Series is a view of some other array, to sort in-place you must create a copy

ss.base/ss.values.base points at the underlying ndarray held by df's block manager if I understand correctly.

@jtratner
Copy link
Contributor

jtratner commented Jan 5, 2014

Except Series.view() creates an object that doesn't have is_view() == True... (or at least that's what I think I found on my computer).

@ghost
Copy link
Author

ghost commented Jan 5, 2014

If the method was more accurately called is_shared, would you still object? I admit I still don't understand the objections.

@jseabold
Copy link
Contributor

jseabold commented Jan 5, 2014

I'm not sure I understand the objections either. This doesn't actually change anything. I just exposed existing code in the sort method to avoid having to blindly copy Series. So any objections I think are to the status quo.

I wanted to avoid having to call .sort() to find out that I should do .copy() then .sort() since sort on a Series will sometimes raise an error. Without having this method exposed, to write a function that takes and sorts a Series (knowingly in place) I'd have to always copy even when it might not strictly be necessary. But now I can do if is_view; copy; sort.

I'm not going to go to bat for whether or not this actually works, but I thought it would be nice to have exposed (even if in a private method).

@jreback
Copy link
Contributor

jreback commented Jan 5, 2014

see #5859

you can do

s._is_cacher if you really want to

why wouldn't you just always do

s.order() in any event? which always gives u back a new series

sort is really

s.order(inplace=True)

@jreback
Copy link
Contributor

jreback commented Jan 5, 2014

the API will change in #5190 (still somewhat to be determined), to unify sort/order/sort_index across objects in any event.

(but will prob be deprecated in 0.14 at the earlierst; not actually changed till later)

@jseabold
Copy link
Contributor

jseabold commented Jan 5, 2014

Because I didn't know about that? order is non-obvious to me. I also didn't want to copy unless I am forced to. I expected to find a sort(copy=True) but didn't find it.

@jreback
Copy link
Contributor

jreback commented Jan 5, 2014

IIRC order has been around quite a long time (I think this is the R name for sorting); was the original name. Not sure where/how sort came about.

I agree should prob just be sort (but then the complication of sort_index; as sort/order operate on the values).

could add a copy kw, but no current logic to figure out if its sorted already (its not hard, but doesn't exist). You always need to copy unless its already sorted (as the actual algos are inplace).

pls add that as a request to #5190

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants