Feedback on v0.14.0 RC1 #7146

jreback · 2014-05-16T22:14:25Z

Issue for feedback / comments on release candidate 1, esp. on the significant display changes.

michaelaye · 2014-05-17T10:46:24Z

Love the display of dataframes with cut-out middle parts.
In Safari on OSX (10.9.3), though, I find it irritating that I can't see the right end of the table: the right-hand border of the table is not displayed (note that I made sure I scrolled all the way to the right).
Here's a screenshot:
https://www.dropbox.com/s/5kwpzsae4r14akh/Screenshot%202014-05-17%2003.43.19.pdf

jreback · 2014-05-17T12:53:42Z

I think this has been the case since before 0.14.0.

@cpcloud want to take a look?

maybe this needs some hint in the html to not cut it off

cpcloud · 2014-05-17T12:57:58Z

@michaelaye can you post the code that generated this frame? i can't reproduce this.... maybe a safari-only problem?

TomAugspurger · 2014-05-17T13:52:10Z

I'll try to figure out what's going on tomorrow.

michaelaye · 2014-05-17T17:05:31Z

df = pd.DataFrame(np.random.randn(100, 30))

I confirmed that the same happens on Chrome@OSX. At least here.

patrick-russell · 2014-05-17T17:39:43Z

The central truncation was a little weird at first but I like it. Being able to see the head and tail in one view is really nice. I could see myself changing my default display options to get this view regularly (I keep it set really wide). It also saves a df.head() and df.tail() when pulling in data to make sure it's all as I expect.
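
For reference, a minimal sketch of how the truncated view can be triggered through the display options. It assumes the `display.max_rows` and `display.max_columns` options and uses made-up limits (10 and 8) purely for illustration; the exact layout of the output depends on the pandas version and terminal width.

```python
import numpy as np
import pandas as pd

# Same shape as the frame posted earlier in this thread.
df = pd.DataFrame(np.random.randn(100, 30))

# Temporarily lower the limits so the repr is truncated in the middle.
# The values 10 and 8 are arbitrary choices for this sketch.
with pd.option_context("display.max_rows", 10, "display.max_columns", 8):
    text = repr(df)

# The truncated repr shows the head and the tail of the frame, plus a
# footer with the full dimensions.
print(text)
```

With the limits restored after the `with` block, `df.head()` and `df.tail()` remain available for a closer look at either end.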

rosnfeld · 2014-05-18T18:51:26Z

Love the central truncation. I've also done the same performance tests I mentioned on the mailing list back when 0.13/0.13.1 came out and have found that 0.14 comes in very similar for my use cases to 0.13.1.

jreback · 2014-05-18T18:55:01Z

@rosnfeld if u have a self contained perf test - pls open an issue for us to look at (only vaguely remember your previous one)

rosnfeld · 2014-05-18T19:01:23Z

Sadly not very self-contained, just the same end-to-end test I mentioned in https://groups.google.com/forum/#!topic/pydata/OHutLByJvK0 which was very "apply"-heavy, and I think you incorporated tests to address that ( #6013 ).

I'm just saying "first impressions are that 0.14 perf is pretty similar to 0.13.1 perf, for my use cases" - would you have expected big gains?

jreback · 2014-05-18T19:05:36Z

now I remember - I think most of the perf issues were addressed as much as possible in 0.13.1

0.14.0 should be similar to prior versions (except for some specific perf improvements)

but always working on that as a goal

cpcloud · 2014-05-19T15:30:30Z

there's an issue with the IPython notebook repr of a MultiIndexed DataFrame when the number of columns exceeds pd.options.display.max_columns.

jreback · 2014-05-19T15:33:18Z

@cpcloud can you spin that off as a sep issue?

cpcloud · 2014-05-19T15:33:54Z

yep

miketkelly · 2014-05-22T12:36:03Z

One bit of feedback unrelated to display issues -- I'm not a fan of the change to "allow a Series to utilize index methods depending on its index type" (#6380). In particular, I'd rather the date methods act on the values of a datetime series, rather than the index. I know that isn't implemented, but could we preserve the option?

I don't mind having to say s.index to operate directly on index values. On the other hand, I do find it awkward to manipulate date values in a datetime series without these methods being available.

I also think this will be error prone when applied to a datetime series:

>>> dates = pd.date_range('2014-01-01', '2014-01-03')
>>> s = pd.Series(dates, index=dates.shift(1, 'D'))
>>> s
2014-01-02   2014-01-01
2014-01-03   2014-01-02
2014-01-04   2014-01-03
Freq: D, dtype: datetime64[ns]

>>> s.day
2014-01-02    2
2014-01-03    3
2014-01-04    4
Freq: D, dtype: int64

It'd be nice to have both s.day and s.index.day available, one operating on values, the other on the index. In any case, thanks for all the hard work. I understand if the ship has sailed on this one.

jreback · 2014-05-22T12:47:02Z

You can do this

In [16]: DatetimeIndex(s).day
Out[16]: array([1, 2, 3])

In [17]: s.index.day
Out[17]: array([2, 3, 4])

s.day is very problematic because of the ambiguity, i.e. am I working on the values or the index, and because having a datelike index and non-datelike values (e.g. float) is by far the most common case.

maybe having s.to_index().day is possible, i.e. creating an index from the values would solve this?

miketkelly · 2014-05-22T13:31:05Z

Yes, I do DatetimeIndex(s).day all the time. It's awkward and confusing to beginners. I don't want to create an index just to calculate a value. to_index() helps, but it still injects the concept of an index into an expression. Could it be simplified to s.date.day, so we have date functions in a namespace analogous to the s.str.* functions?

And I agree, s.day is very problematic because of the ambiguity. That's my point! It's more natural to assume functions operate on the values of a series.
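
As an illustration of the namespace idea, here is a short sketch using the `.dt` accessor that later pandas releases (0.15.0 onward) provide on datetime-valued Series. It is not available in 0.14.0 and is shown only to make the values-versus-index distinction concrete, reusing the series from the example above.

```python
import pandas as pd

dates = pd.date_range("2014-01-01", "2014-01-03")
# Values are the original dates; the index is shifted forward by one day.
s = pd.Series(dates, index=dates.shift(1, "D"))

# The accessor namespace operates on the values of the series...
value_days = s.dt.day.tolist()

# ...while going through s.index operates on the index.
index_days = list(s.index.day)

print(value_days)
print(index_days)
```

Keeping the date functions behind a namespace leaves no plain `s.day` attribute, so there is nothing ambiguous to guess about.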

rosnfeld · 2014-05-22T13:52:52Z

Just as a side-observer on this thread: +1 on the namespace idea (maybe it could be "datetime"?)

jreback · 2014-05-22T14:27:24Z

so to replace the DatetimeIndex(s) call we could allow to_index()
I suppose a namespace is fine too; I don't think .datetime is any more intuitive though
what you really want is something like .values_as_index (but this is too long IMHO).

this was originally motivated because Series.weekday existed (since as far back as I can remember); so needed either to remove it, or all the rest, see here: #5519

In [2]: s = Series(np.arange(20),index=pd.date_range('20130101',periods=20))

In [3]: s
Out[3]: 
2013-01-01     0
2013-01-02     1
2013-01-03     2
2013-01-04     3
2013-01-05     4
2013-01-06     5
2013-01-07     6
2013-01-08     7
2013-01-09     8
2013-01-10     9
2013-01-11    10
2013-01-12    11
2013-01-13    12
2013-01-14    13
2013-01-15    14
2013-01-16    15
2013-01-17    16
2013-01-18    17
2013-01-19    18
2013-01-20    19
Freq: D, dtype: int64

In [4]: s[s.weekday==5]
Out[4]: 
2013-01-05     4
2013-01-12    11
2013-01-19    18
dtype: int64

In [5]: s[s.index.weekday==5]
Out[5]: 
2013-01-05     4
2013-01-12    11
2013-01-19    18
dtype: int64

In [6]: s = Series(pd.date_range('20130104',periods=20),index=pd.date_range('20130101',periods=20))

In [7]: s
Out[7]: 
2013-01-01   2013-01-04
2013-01-02   2013-01-05
2013-01-03   2013-01-06
2013-01-04   2013-01-07
2013-01-05   2013-01-08
2013-01-06   2013-01-09
2013-01-07   2013-01-10
2013-01-08   2013-01-11
2013-01-09   2013-01-12
2013-01-10   2013-01-13
2013-01-11   2013-01-14
2013-01-12   2013-01-15
2013-01-13   2013-01-16
2013-01-14   2013-01-17
2013-01-15   2013-01-18
2013-01-16   2013-01-19
2013-01-17   2013-01-20
2013-01-18   2013-01-21
2013-01-19   2013-01-22
2013-01-20   2013-01-23
Freq: D, dtype: datetime64[ns]

In [8]: s[DatetimeIndex(s).weekday==5]
Out[8]: 
2013-01-02   2013-01-05
2013-01-09   2013-01-12
2013-01-16   2013-01-19
dtype: datetime64[ns]

jreback · 2014-05-22T14:45:38Z

It's actually pretty trivial to raise a helpful error message, e.g.

s.weekday could raise something like:

TypeError('cannot perform weekday directly on a Series, use series.index.weekday or series.to_index().weekday instead')
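
A minimal sketch of that idea, using a hypothetical stand-in class rather than the real pandas Series. `SeriesLike` is invented for this illustration, and `to_index()` is only the method proposed in this thread, not an existing API.

```python
class SeriesLike:
    """Hypothetical stand-in for a Series; this is not pandas code."""

    def __init__(self, values, index):
        self.values = values
        self.index = index

    @property
    def weekday(self):
        # Refuse the ambiguous attribute and point at the explicit spellings.
        # to_index() is the method proposed in this thread (an assumption).
        raise TypeError(
            "cannot perform weekday directly on a Series, "
            "use series.index.weekday or series.to_index().weekday instead"
        )


s = SeriesLike(values=[1.0, 2.0], index=["2013-01-01", "2013-01-02"])

try:
    s.weekday
except TypeError as exc:
    message = str(exc)
    print(message)
```

Raising from a property keeps the attribute name reserved, so users get the hint immediately rather than a bare AttributeError.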

jreback · 2014-05-22T15:15:52Z

see #7206. Going to remove these for now; relative to what existed in 0.13.1, the net effect is just to remove Series.weekday (and to add support for these methods on Index types). Will revisit in #7207.

jreback · 2014-05-23T14:33:00Z

@mtkni @rosnfeld can you give a test with master (I mean test whatever you were testing to discover the 'issues' with the new series properties, which are now removed)

rosnfeld · 2014-05-24T00:26:34Z

I can confirm that in similar investigations with the latest changes, Series operations perform as expected, including no "series.day" (new) or "series.weekday" (preexisting). Doing those operations on series.index still seems to work fine.

jreback · 2014-05-25T16:08:16Z

ok master is clean!

pls give a test out one more time everyone

if anything comes up pls post here

anyone think we need an rc2?

jreback · 2014-05-29T16:01:59Z

going to tag 0.14.0 tomorrow after #7275 is merged. speak now or hold your peace (at least until 0.14.1)

cpcloud · 2014-05-29T16:34:48Z

👍

jreback · 2014-05-30T12:17:46Z

release is posted, pending builds/uploads to PyPI: https://github.com/pydata/pandas/releases

cpcloud · 2014-05-30T13:27:18Z

Round of 👏 for @jreback!

TomAugspurger · 2014-05-30T13:50:38Z

👏

michaelaye · 2014-05-30T20:48:47Z

👏 I'm quite impressed by the number of bugs you guys squashed. Congrats to everyone!

rosnfeld · 2014-05-30T20:57:21Z

👏 it's pretty awesome

jreback added this to the 0.14.0 milestone May 16, 2014

jreback added the Testing label May 16, 2014

cpcloud mentioned this issue May 19, 2014

MultiIndex notebook repr is incorrect when number of columns is > max_columns #7174

Closed

This was referenced May 22, 2014

API: remove datetime-like index ops from Series #7206

Merged

API: revisit adding datetime-like ops in Series #7207

Closed

jreback closed this as completed May 30, 2014
