Feedback on v0.14.0 RC1 #7146

Closed
jreback opened this issue May 16, 2014 · 30 comments
Labels
Testing pandas testing functions or related to the test suite
Milestone

Comments

@jreback
Contributor

jreback commented May 16, 2014

Issue for feedback / comments on release candidate 1, esp. on the significant display changes.

@jreback jreback added this to the 0.14.0 milestone May 16, 2014
@michaelaye
Contributor

Love the display of dataframes with cut-out middle parts.
In Safari on OS X (10.9.3), though, I find it irritating that I can't see the right end of the table: the border on the right side of the table is not displayed (note that I made sure I scrolled all the way to the right).
Here's a screenshot:
https://www.dropbox.com/s/5kwpzsae4r14akh/Screenshot%202014-05-17%2003.43.19.pdf

@jreback
Contributor Author

jreback commented May 17, 2014

I think this has been the case since before 0.14.0.

@cpcloud want to take a look?

maybe this needs some hint in the html to not cut it off

@cpcloud
Member

cpcloud commented May 17, 2014

@michaelaye can you post the code that generated this frame? i can't reproduce this.... maybe a safari-only problem?

@TomAugspurger
Contributor

I'll try to figure out what's going on tomorrow.

@michaelaye
Contributor

df = pd.DataFrame(np.random.randn(100, 30))

I confirmed that the same happens on Chrome@OSX. At least here.
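
For anyone who wants to look at the truncated view outside the notebook, here is a minimal sketch. It assumes NumPy's `np.random.randn` as the random source, and the `max_rows` / `max_cols` values are arbitrary picks for illustration, not the defaults discussed in this thread.

```python
import numpy as np
import pandas as pd

# Same shape as the frame reported above: 100 rows, 30 columns.
df = pd.DataFrame(np.random.randn(100, 30))

# Render with explicit limits so the middle rows and columns are cut out.
# The limits (10 rows, 6 columns) are assumptions chosen for this sketch.
text = df.to_string(max_rows=10, max_cols=6)
html = df.to_html(max_rows=10, max_cols=6)

print(text)
```

The `html` string is what a browser would end up drawing, so saving it to a file and opening it in Safari or Chrome may help narrow down whether the missing right border comes from the markup or from the notebook styling.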

@patrick-russell

The central truncation was a little weird at first but I like it. Being able to see the head and tail in one view is really nice. I could see myself changing my default display options to get this view regularly (I keep it set really wide). It also saves a df.head() and df.tail() when pulling in data to make sure it's all as I expect.

@rosnfeld
Contributor

Love the central truncation. I've also done the same performance tests I mentioned on the mailing list back when 0.13/0.13.1 came out and have found that 0.14 comes in very similar for my use cases to 0.13.1.

@jreback
Contributor Author

jreback commented May 18, 2014

@rosnfeld if you have a self-contained perf test - pls open an issue for us to look at (I only vaguely remember your previous one)

@rosnfeld
Contributor

Sadly not very self-contained, just the same end-to-end test I mentioned in https://groups.google.com/forum/#!topic/pydata/OHutLByJvK0 which was very "apply"-heavy, and I think you incorporated tests to address that ( #6013 ).

I'm just saying "first impressions are that 0.14 perf is pretty similar to 0.13.1 perf, for my use cases" - would you have expected big gains?

@jreback
Contributor Author

jreback commented May 18, 2014

now I remember - I think most of the perf issues were addressed as much as possible in 0.13.1

0.14.0 should be similar to prior versions (except for some specific perf improvements)

but always working on that as a goal

@cpcloud
Member

cpcloud commented May 19, 2014

there's an issue with the IPython notebook repr of a MultiIndexed DataFrame when the number of columns exceeds pd.options.display.max_columns.
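
A rough sketch of the kind of frame involved, in case it helps whoever picks this up. The column labels, the frame size and the `max_cols=4` limit are all made up for illustration, and `to_html` is used here as a stand-in for the notebook repr, so this may not trigger the exact problem seen in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level column index with 6 columns in total.
columns = pd.MultiIndex.from_product([["a", "b", "c"], ["x", "y"]])
df = pd.DataFrame(np.arange(18).reshape(3, 6), columns=columns)

# Ask for fewer columns than the frame has, so the HTML output must be
# truncated horizontally (4 is an arbitrary limit below the 6 columns).
html = df.to_html(max_cols=4)

print(html)
```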

@jreback
Contributor Author

jreback commented May 19, 2014

@cpcloud can you spin that off as a sep issue?

@cpcloud
Member

cpcloud commented May 19, 2014

yep

@miketkelly

One bit of feedback unrelated to display issues -- I'm not a fan of the change to "allow a Series to utilize index methods depending on its index type" (#6380). In particular, I'd rather the date methods act on the values of a datetime series, rather than the index. I know that isn't implemented, but could we preserve the option?

I don't mind having to say s.index to operate directly on index values. On the other hand, I do find it awkward to manipulate date values in a datetime series without these methods being available.

I also think this will be error prone when applied to a datetime series:

>>> dates = pd.date_range('2014-01-01', '2014-01-03')
>>> s = pd.Series(dates, index=dates.shift(1, 'D'))
>>> s
2014-01-02   2014-01-01
2014-01-03   2014-01-02
2014-01-04   2014-01-03
Freq: D, dtype: datetime64[ns]

>>> s.day
2014-01-02    2
2014-01-03    3
2014-01-04    4
Freq: D, dtype: int64

It'd be nice to have both s.day and s.index.day available, one operating on values, the other on the index. In any case, thanks for all the hard work. I understand if the ship has sailed on this one.

@jreback
Contributor Author

jreback commented May 22, 2014

You can do this

In [16]: DatetimeIndex(s).day
Out[16]: array([1, 2, 3])

In [17]: s.index.day
Out[17]: array([2, 3, 4])

s.day is very problematic because of the ambiguity (e.g. am I working on the values or the index?) and because having a datelike index and non-datelike values (e.g. float) is by far the most common case.

maybe having s.to_index().day is possible, i.e. creating an index from the values would solve this?
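
To make the values-versus-index distinction concrete, here is a small sketch. `value_field` is a hypothetical helper written for this example, not a pandas function; it just wraps the `DatetimeIndex(s)` trick and keeps the original index on the result.

```python
import pandas as pd

dates = pd.date_range("2014-01-01", "2014-01-03")
# Values are the dates themselves; the index is shifted one day later.
s = pd.Series(dates, index=dates.shift(1, "D"))


def value_field(series, field):
    """Hypothetical helper: read a datetime field (e.g. 'day') from the
    VALUES of a datetime Series, returning a Series on the same index."""
    return pd.Series(getattr(pd.DatetimeIndex(series), field), index=series.index)


value_days = value_field(s, "day")   # computed from the values
index_days = list(s.index.day)       # computed from the index

print(list(value_days))  # days taken from the values: 1, 2, 3
print(index_days)        # days taken from the index: 2, 3, 4
```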

@miketkelly

Yes, I do DatetimeIndex(s).day all the time. It's awkward and confusing to beginners. I don't want to create an index just to calculate a value. to_index() helps, but it still injects the concept of an index into an expression. Could it be simplified to s.date.day, so we have date functions in a namespace analogous to the s.str.* functions?

And I agree, s.day is very problematic because of the ambiguity. That's my point! It's more natural to assume functions operate on the values of a series.

@rosnfeld
Contributor

Just as a side-observer on this thread: +1 on the namespace idea (maybe it could be "datetime"?)

@jreback
Contributor Author

jreback commented May 22, 2014

so to replace the DatetimeIndex(s) call we could allow to_index()
I suppose a namespace is fine too; I don't think .datetime is any more intuitive though
what you really want is something like .values_as_index (but this is too long IMHO).

this was originally motivated because Series.weekday existed (since as far back as I can remember); so we needed either to remove it or to add all the rest, see here: #5519

In [2]: s = Series(np.arange(20),index=pd.date_range('20130101',periods=20))

In [3]: s
Out[3]: 
2013-01-01     0
2013-01-02     1
2013-01-03     2
2013-01-04     3
2013-01-05     4
2013-01-06     5
2013-01-07     6
2013-01-08     7
2013-01-09     8
2013-01-10     9
2013-01-11    10
2013-01-12    11
2013-01-13    12
2013-01-14    13
2013-01-15    14
2013-01-16    15
2013-01-17    16
2013-01-18    17
2013-01-19    18
2013-01-20    19
Freq: D, dtype: int64

In [4]: s[s.weekday==5]
Out[4]: 
2013-01-05     4
2013-01-12    11
2013-01-19    18
dtype: int64

In [5]: s[s.index.weekday==5]
Out[5]: 
2013-01-05     4
2013-01-12    11
2013-01-19    18
dtype: int64

In [6]: s = Series(pd.date_range('20130104',periods=20),index=pd.date_range('20130101',periods=20))

In [7]: s
Out[7]: 
2013-01-01   2013-01-04
2013-01-02   2013-01-05
2013-01-03   2013-01-06
2013-01-04   2013-01-07
2013-01-05   2013-01-08
2013-01-06   2013-01-09
2013-01-07   2013-01-10
2013-01-08   2013-01-11
2013-01-09   2013-01-12
2013-01-10   2013-01-13
2013-01-11   2013-01-14
2013-01-12   2013-01-15
2013-01-13   2013-01-16
2013-01-14   2013-01-17
2013-01-15   2013-01-18
2013-01-16   2013-01-19
2013-01-17   2013-01-20
2013-01-18   2013-01-21
2013-01-19   2013-01-22
2013-01-20   2013-01-23
Freq: D, dtype: datetime64[ns]

In [8]: s[DatetimeIndex(s).weekday==5]
Out[8]: 
2013-01-02   2013-01-05
2013-01-09   2013-01-12
2013-01-16   2013-01-19
dtype: datetime64[ns]

@jreback
Contributor Author

jreback commented May 22, 2014

It's actually pretty trivial to raise a helpful error message, e.g.

s.weekday could raise something like:

TypeError('cannot perform weekday directly on a Series, use series.index.weekday or series.to_index().weekday instead')
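
A minimal sketch of how such a message could be wired up, using a plain Python class rather than the real Series. `SeriesLike` and `_index_only` are hypothetical names for this illustration and say nothing about how pandas itself would implement it.

```python
def _index_only(name):
    """Hypothetical factory: build a property that always raises a TypeError
    pointing the user at the index-based alternatives."""
    def getter(self):
        raise TypeError(
            "cannot perform {0} directly on a Series, use "
            "series.index.{0} or series.to_index().{0} instead".format(name)
        )
    return property(getter)


class SeriesLike(object):
    """Stand-in for Series, only here to demonstrate the error."""
    weekday = _index_only("weekday")


try:
    SeriesLike().weekday
    raised = False
except TypeError as exc:
    raised = True
    message = str(exc)
    print(message)
```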

@jreback
Contributor Author

jreback commented May 22, 2014

see #7206. Going to remove these for now; relative to what existed in 0.13.1, the net effect is just to remove Series.weekday (and to add support for these methods on Index types). Will revisit in #7207

@jreback
Contributor Author

jreback commented May 23, 2014

@mtkni @rosnfeld can you give a test with master (I mean test whatever you were testing to discover the 'issues' with the new series properties, which are now removed)

@rosnfeld
Contributor

I can confirm that in similar investigations with the latest changes, Series operations perform as expected, including no "series.day" (new) or "series.weekday" (preexisting). Doing those operations on series.index still seems to work fine.

@jreback
Contributor Author

jreback commented May 25, 2014

ok master is clean!

pls test it out one more time everyone

if anything comes up pls post here

anyone think we need an rc2?

@jreback
Contributor Author

jreback commented May 29, 2014

going to tag 0.14.0 tomorrow after #7275 is merged. speak now or hold your peace (at least until 0.14.1)

@cpcloud
Member

cpcloud commented May 29, 2014

👍

@jreback
Contributor Author

jreback commented May 30, 2014

release is posted, pending builds/uploads to PyPI: https://github.com/pydata/pandas/releases

@jreback jreback closed this as completed May 30, 2014
@cpcloud
Member

cpcloud commented May 30, 2014

Round of 👏 for @jreback!

@TomAugspurger
Contributor

👏

@michaelaye
Contributor

👏 I'm quite impressed by the number of bugs you guys squashed. Congrats to everyone!

@rosnfeld
Contributor

👏 it's pretty awesome
