Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: including offset/freq in Timestamp repr (#4553) #6575

Merged
merged 1 commit into from
Mar 10, 2014

Conversation

rosnfeld
Copy link
Contributor

@rosnfeld rosnfeld commented Mar 7, 2014

fixes #4553

Hopefully the test_repr() implementation I included was sufficient. I see some test_repr() instances in the code that are more minimal - literally just making sure that repr(obj) doesn't blow up. And some which are pretty detailed, which makes me worry they might be a bit fragile. I went more middle-of-the-road.

In [1]: from pandas import Timestamp

In [2]: date = '2014-03-07'

In [3]: tz = 'US/Eastern'

In [4]: freq = 'M'

In [5]: repr(Timestamp(date))  # date_only_repr
Out[5]: "Timestamp('2014-03-07 00:00:00')"

In [6]: repr(Timestamp(date, tz=tz))  # date_tz_repr
Out[6]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern')"

In [7]: repr(Timestamp(date, offset=freq))  # date_freq_repr
Out[7]: "Timestamp('2014-03-07 00:00:00', offset='M')"

In [8]: repr(Timestamp(date, tz=tz, offset=freq))  # date_tz_freq_repr
Out[8]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='M')"

In [9]: repr(Timestamp('2014-03-13 00:00:00-0400', tz=None))  # date_with_utc_offset_repr
Out[9]: "Timestamp('2014-03-13 00:00:00-0400')"

@hayd
Copy link
Contributor

hayd commented Mar 8, 2014

Thanks for putting this together! The tests look good.

One suggestion/tweak is to I have tz and freq in the repr only if they're not None? I think that would be an improvement (less noisy and still having eval(repr(t)) == t)... ?

@rosnfeld
Copy link
Contributor Author

rosnfeld commented Mar 8, 2014

I don't have strong feelings on it, I just followed the pattern already there with tz. On the one hand, it's nice to be explicit and show users something they may have missed ("oh, I forgot to set time zone on this thing"), on the other hand, it's noisy/less elegant.

I'll follow whatever you senior pandas folks deem most desirable.

@jreback jreback added this to the 0.14.0 milestone Mar 9, 2014
@@ -215,7 +219,9 @@ class Timestamp(_Timestamp):
pass
zone = "'%s'" % zone if zone else 'None'

return "Timestamp('%s', tz=%s)" % (result, zone)
freq = "'%s'" % self.offset.freqstr if self.offset else 'None'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conditionally add tz and offset if they are not None

also use the new style formatting

e.g. something like

freq = ", offset={0}".format(self.offset.freqstr) if self.offset is not None else ""
tz = ", tz={0}".format(self.tz) if self.tz is not None else ""

"Timestamp('{stamp}'{zone}{freq}})".format(stamp=stamp,zone=zone,freq=freq)

@jreback
Copy link
Contributor

jreback commented Mar 9, 2014

also pls do a perf check

@rosnfeld
Copy link
Contributor Author

rosnfeld commented Mar 9, 2014

Good catch on if self.offset not being the same as if self.offset is not None.

Playing around with it, I've just found one case where a repr can be generated that isn't executable:

In [4]: pd.Timestamp('2014-03-05 00:00:00-0500')
Out[4]: Timestamp('2014-03-05 00:00:00-0500', tz='tzoffset(None, -18000)')

This is a pre-existing bug. I'll need to do another pass to handle that.

By "do a perf check" do you mean add a new benchmark to vbench or just run the vbench suite? (not that I've ever done either)

@jreback
Copy link
Contributor

jreback commented Mar 9, 2014

I mean just run

test_perf.sh -b master -t HEAD

and report if anything > 1.2 or so

@rosnfeld
Copy link
Contributor Author

rosnfeld commented Mar 9, 2014

Oddly it took 2 tries for test_perf.sh to work, initially I got the same exact error as in:
#5978 (comment)
and
#5283 (comment)

All seems okay on the second run though, and nothing came in over ~1.15.

I'll try and handle the tzoffset case and test again.

@rosnfeld
Copy link
Contributor Author

Well, if I include code like:

        tz = ""
        from dateutil.tz import tzoffset
        if isinstance(zone, tzoffset):
            tz = ", tz={0}".format(zone)
        elif zone is not None:
            tz = ", tz='{0}'".format(zone)

which I find somewhat gross/inelegant, then we can get the following:

In [3]: pd.Timestamp('2014-03-05 00:00:00-0500')
Out[3]: Timestamp('2014-03-05 00:00:00-0500', tz=tzoffset(None, -18000))

which if we have tzoffset imported, would be eval-able.

Yes? No? If people like it I can add a test to exercise this case and then hopefully we are done.

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

tz should appear as a common tz string not an offset

@rosnfeld
Copy link
Contributor Author

I just want to make sure we're on the same page here. My present understanding is there are 2 different types of objects which can be found stored in Timestamp.tzinfo:

  1. pytz timezones, which str() nicely as strings like "US/Eastern". These strings can be passed back into the Timestamp constructor and it converts them back to the pytz instance.
  2. dateutil tzoffsets, which are represented as seconds from UTC, and don't have a nice str() implementation. They just show a repr() value like "tzoffset(None, -18000)".

Are you saying that Timezone's repr() should display the 2nd type as if it was the 1st type? (so that if someone eval'd the string, they'd get an instance with different internals than what was originally repr'd)

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

@rosnfeld forgot about that...ok

so the repr (and str) for a tzoffset DOES not include the tz then (its embedded in the actual time field), while for a pytz it is in the tz field.

So I think these needed to be treated separately, so only print the tz field if its not None and not tzoffset (or you can do it if only a pytz)....annoying that this has to be done....

@rosnfeld
Copy link
Contributor Author

Ok, the latest commit is rebased on top of master, I handled this new tzoffset case and fleshed out the tests a bit more, Travis passes (after an initial network failure accessing yahoo) and a perf test came up clean.

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

gr8!

can you post the tests reprs (the 4 or 5 cases that you have in the test) just to see

@rosnfeld
Copy link
Contributor Author

In [1]: from pandas import Timestamp

In [2]: date = '2014-03-07'

In [3]: tz = 'US/Eastern'

In [4]: freq = 'M'

In [5]: repr(Timestamp(date))  # date_only_repr
Out[5]: "Timestamp('2014-03-07 00:00:00')"

In [6]: repr(Timestamp(date, tz=tz))  # date_tz_repr
Out[6]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern')"

In [7]: repr(Timestamp(date, offset=freq))  # date_freq_repr
Out[7]: "Timestamp('2014-03-07 00:00:00', offset='M')"

In [8]: repr(Timestamp(date, tz=tz, offset=freq))  # date_tz_freq_repr
Out[8]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='M')"

In [9]: repr(Timestamp("2014-03-13 00:00:00-0500"))  # date_with_utc_offset_repr
Out[9]: "Timestamp('2014-03-13 00:00:00-0500')"

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

hmm

isn't it more correct for

In [6]: repr(Timestamp(date, tz=tz))  # date_tz_repr
Out[6]: "Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern')"

to be

In [6]: repr(Timestamp(date, tz=tz))  # date_tz_repr
Out[6]: "Timestamp('2014-03-07 00:00:00', tz='US/Eastern')"

as the original is showing essentially double information?

@rosnfeld
Copy link
Contributor Author

Well, somewhat. US/Eastern does not always mean -0500, like today and the next few weeks for example:

In [4]: repr(Timestamp('2014-03-13', tz='US/Eastern'))
Out[4]: "Timestamp('2014-03-13 00:00:00-0400', tz='US/Eastern')"

IMO it's handy to see the UTC offset separately.

@rosnfeld
Copy link
Contributor Author

Ahem, and more than just a few weeks, it's not like UTC has DST.

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

ahh..ok...that makes sense

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

cc @rockg

comments?

@rockg
Copy link
Contributor

rockg commented Mar 10, 2014

@jreback I see your comment about it being a little confusing about having the duplicate information but I think it's complete this way. For the last test, it might be informative to have tz=None or something (maybe tzoffset(...)) to inform that there actually isn't a tz, but just a straight offset.

@rosnfeld
Copy link
Contributor Author

Sure, I like passing tz=None. I'll also update the offset to -0400 in that test so that it is consistent with US/Eastern as used in the earlier tests.

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

@rosnfeld
can you put the reprs (the above post with your most recent changes) in the top of the PR?

@rosnfeld
Copy link
Contributor Author

Done

tz = 'US/Eastern'
freq = 'M'

date_only_repr = repr(Timestamp(date))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these need a round-trip check for each of these, e.g.

ts = Timestamp(date)
self.assertEqual(ts,eval(repr(ts)))

normally frown on using eval but that is the point here

(leave all the other tests as well) as they are checking what the repr looks like

@rosnfeld
Copy link
Contributor Author

Good idea, I added the round-trip check. While playing with it, I noted that offset is not being used in equality testing, e.g.

In [2]: Timestamp('2014-03-13') == Timestamp('2014-03-13', offset='D')
Out[2]: True

Not sure what your thoughts are on that.

@rosnfeld
Copy link
Contributor Author

(Whoops, tiny stylistic inconsistency in the previous push. Fixed.)

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

equality in Timestamp space is only dependent on the UTC (e.g. the value).

I think they are equal regardless of the freq/offset as they represent a single instant in time. You could argue otherwise I suppose. I am not sure of the implications of changing this.

@rosnfeld
Copy link
Contributor Author

Yeah, on reflection I think that makes sense.

This 670edaa commit has the desired code. Hopefully this is it.

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

ok...this looks good..ping when travis is all green

@rosnfeld
Copy link
Contributor Author

ping


return "Timestamp('%s', tz=%s)" % (result, zone)
from dateutil.tz import tzoffset
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can move this particular import to the very top of tslib.pyx

@rosnfeld
Copy link
Contributor Author

Ok, done. I feel I have already seen a few import patterns within pandas, and have read there is a tiny perf boost to doing a function-local import... are there policies on this?

@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

in general try not to import locally if at all possible; cython is a bit tricky because you can't you can't directly import from the pandas namespace. If the function is called a lot their IS a big perf drag (as an import is several lookups), I don't think its an issue here. but just trying to be clean.

generally you only import localy if you you can't otherwise, e.g. its a circular-import problem, otherwise need to refactor into different modules, but sometimes that's too complicated . again doesnt matter here

jreback added a commit that referenced this pull request Mar 10, 2014
ENH: including offset/freq in Timestamp repr (#4553)
@jreback jreback merged commit 26e8fa8 into pandas-dev:master Mar 10, 2014
@jreback
Copy link
Contributor

jreback commented Mar 10, 2014

thanks!

@rosnfeld rosnfeld deleted the issue_4553 branch March 10, 2014 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FORMAT: __repr__ of Timestamp should include the offset (freq) info
4 participants