CLN/ENH: Stop instantiating all offsets on load. #5189

jtratner · 2013-10-12T04:28:45Z

Implements @wesm's suggestion and clears the way for new offsets.

Very simple to extend - map a string to your class in mapping, define a from_name
classmethod, then class gets passed everything after the - (if
anything) to its method from_name(). Caches to dict afterwards (right
now doesn't use cacheable, but could take advantage of that).

cc @cancan101

For reference, here are all the existing offset: output aliases (using
types because otherwise prints exactly the same).
[(None, NoneType),
('A-APR', pandas.tseries.offsets.YearEnd),
('A-AUG', pandas.tseries.offsets.YearEnd),
('A-DEC', pandas.tseries.offsets.YearEnd),
('A-FEB', pandas.tseries.offsets.YearEnd),
('A-JAN', pandas.tseries.offsets.YearEnd),
('A-JUL', pandas.tseries.offsets.YearEnd),
('A-JUN', pandas.tseries.offsets.YearEnd),
('A-MAR', pandas.tseries.offsets.YearEnd),
('A-MAY', pandas.tseries.offsets.YearEnd),
('A-NOV', pandas.tseries.offsets.YearEnd),
('A-OCT', pandas.tseries.offsets.YearEnd),
('A-SEP', pandas.tseries.offsets.YearEnd),
('AS-APR', pandas.tseries.offsets.YearBegin),
('AS-AUG', pandas.tseries.offsets.YearBegin),
('AS-DEC', pandas.tseries.offsets.YearBegin),
('AS-FEB', pandas.tseries.offsets.YearBegin),
('AS-JAN', pandas.tseries.offsets.YearBegin),
('AS-JUL', pandas.tseries.offsets.YearBegin),
('AS-JUN', pandas.tseries.offsets.YearBegin),
('AS-MAR', pandas.tseries.offsets.YearBegin),
('AS-MAY', pandas.tseries.offsets.YearBegin),
('AS-NOV', pandas.tseries.offsets.YearBegin),
('AS-OCT', pandas.tseries.offsets.YearBegin),
('AS-SEP', pandas.tseries.offsets.YearBegin),
('B', pandas.tseries.offsets.BusinessDay),
('BA-APR', pandas.tseries.offsets.BYearEnd),
('BA-AUG', pandas.tseries.offsets.BYearEnd),
('BA-DEC', pandas.tseries.offsets.BYearEnd),
('BA-FEB', pandas.tseries.offsets.BYearEnd),
('BA-JAN', pandas.tseries.offsets.BYearEnd),
('BA-JUL', pandas.tseries.offsets.BYearEnd),
('BA-JUN', pandas.tseries.offsets.BYearEnd),
('BA-MAR', pandas.tseries.offsets.BYearEnd),
('BA-MAY', pandas.tseries.offsets.BYearEnd),
('BA-NOV', pandas.tseries.offsets.BYearEnd),
('BA-OCT', pandas.tseries.offsets.BYearEnd),
('BA-SEP', pandas.tseries.offsets.BYearEnd),
('BAS-APR', pandas.tseries.offsets.BYearBegin),
('BAS-AUG', pandas.tseries.offsets.BYearBegin),
('BAS-DEC', pandas.tseries.offsets.BYearBegin),
('BAS-FEB', pandas.tseries.offsets.BYearBegin),
('BAS-JAN', pandas.tseries.offsets.BYearBegin),
('BAS-JUL', pandas.tseries.offsets.BYearBegin),
('BAS-JUN', pandas.tseries.offsets.BYearBegin),
('BAS-MAR', pandas.tseries.offsets.BYearBegin),
('BAS-MAY', pandas.tseries.offsets.BYearBegin),
('BAS-NOV', pandas.tseries.offsets.BYearBegin),
('BAS-OCT', pandas.tseries.offsets.BYearBegin),
('BAS-SEP', pandas.tseries.offsets.BYearBegin),
('BM', pandas.tseries.offsets.BusinessMonthEnd),
('BMS', pandas.tseries.offsets.BusinessMonthBegin),
('BQ', pandas.tseries.offsets.BQuarterEnd),
('BQ-APR', pandas.tseries.offsets.BQuarterEnd),
('BQ-AUG', pandas.tseries.offsets.BQuarterEnd),
('BQ-DEC', pandas.tseries.offsets.BQuarterEnd),
('BQ-FEB', pandas.tseries.offsets.BQuarterEnd),
('BQ-JAN', pandas.tseries.offsets.BQuarterEnd),
('BQ-JUL', pandas.tseries.offsets.BQuarterEnd),
('BQ-JUN', pandas.tseries.offsets.BQuarterEnd),
('BQ-MAR', pandas.tseries.offsets.BQuarterEnd),
('BQ-MAY', pandas.tseries.offsets.BQuarterEnd),
('BQ-NOV', pandas.tseries.offsets.BQuarterEnd),
('BQ-OCT', pandas.tseries.offsets.BQuarterEnd),
('BQ-SEP', pandas.tseries.offsets.BQuarterEnd),
('BQS', pandas.tseries.offsets.BQuarterBegin),
('BQS-APR', pandas.tseries.offsets.BQuarterBegin),
('BQS-AUG', pandas.tseries.offsets.BQuarterBegin),
('BQS-DEC', pandas.tseries.offsets.BQuarterBegin),
('BQS-FEB', pandas.tseries.offsets.BQuarterBegin),
('BQS-JAN', pandas.tseries.offsets.BQuarterBegin),
('BQS-JUL', pandas.tseries.offsets.BQuarterBegin),
('BQS-JUN', pandas.tseries.offsets.BQuarterBegin),
('BQS-MAR', pandas.tseries.offsets.BQuarterBegin),
('BQS-MAY', pandas.tseries.offsets.BQuarterBegin),
('BQS-NOV', pandas.tseries.offsets.BQuarterBegin),
('BQS-OCT', pandas.tseries.offsets.BQuarterBegin),
('BQS-SEP', pandas.tseries.offsets.BQuarterBegin),
('Q-APR', pandas.tseries.offsets.QuarterEnd),
('Q-AUG', pandas.tseries.offsets.QuarterEnd),
('Q-DEC', pandas.tseries.offsets.QuarterEnd),
('Q-FEB', pandas.tseries.offsets.QuarterEnd),
('Q-JAN', pandas.tseries.offsets.QuarterEnd),
('Q-JUL', pandas.tseries.offsets.QuarterEnd),
('Q-JUN', pandas.tseries.offsets.QuarterEnd),
('Q-MAR', pandas.tseries.offsets.QuarterEnd),
('Q-MAY', pandas.tseries.offsets.QuarterEnd),
('Q-NOV', pandas.tseries.offsets.QuarterEnd),
('Q-OCT', pandas.tseries.offsets.QuarterEnd),
('Q-SEP', pandas.tseries.offsets.QuarterEnd),
('QS', pandas.tseries.offsets.QuarterBegin),
('QS-APR', pandas.tseries.offsets.QuarterBegin),
('QS-AUG', pandas.tseries.offsets.QuarterBegin),
('QS-DEC', pandas.tseries.offsets.QuarterBegin),
('QS-FEB', pandas.tseries.offsets.QuarterBegin),
('QS-JAN', pandas.tseries.offsets.QuarterBegin),
('QS-JUL', pandas.tseries.offsets.QuarterBegin),
('QS-JUN', pandas.tseries.offsets.QuarterBegin),
('QS-MAR', pandas.tseries.offsets.QuarterBegin),
('QS-MAY', pandas.tseries.offsets.QuarterBegin),
('QS-NOV', pandas.tseries.offsets.QuarterBegin),
('QS-OCT', pandas.tseries.offsets.QuarterBegin),
('QS-SEP', pandas.tseries.offsets.QuarterBegin),
('S', pandas.tseries.offsets.Second),
('T', pandas.tseries.offsets.Minute),
('U', pandas.tseries.offsets.Micro),
('W-FRI', pandas.tseries.offsets.Week),
('W-MON', pandas.tseries.offsets.Week),
('W-SAT', pandas.tseries.offsets.Week),
('W-SUN', pandas.tseries.offsets.Week),
('W-THU', pandas.tseries.offsets.Week),
('W-TUE', pandas.tseries.offsets.Week),
('W-WED', pandas.tseries.offsets.Week),
('WOM-1FRI', pandas.tseries.offsets.WeekOfMonth),
('WOM-1MON', pandas.tseries.offsets.WeekOfMonth),
('WOM-1THU', pandas.tseries.offsets.WeekOfMonth),
('WOM-1TUE', pandas.tseries.offsets.WeekOfMonth),
('WOM-1WED', pandas.tseries.offsets.WeekOfMonth),
('WOM-2FRI', pandas.tseries.offsets.WeekOfMonth),
('WOM-2MON', pandas.tseries.offsets.WeekOfMonth),
('WOM-2THU', pandas.tseries.offsets.WeekOfMonth),
('WOM-2TUE', pandas.tseries.offsets.WeekOfMonth),
('WOM-2WED', pandas.tseries.offsets.WeekOfMonth),
('WOM-3FRI', pandas.tseries.offsets.WeekOfMonth),
('WOM-3MON', pandas.tseries.offsets.WeekOfMonth),
('WOM-3THU', pandas.tseries.offsets.WeekOfMonth),
('WOM-3TUE', pandas.tseries.offsets.WeekOfMonth),
('WOM-3WED', pandas.tseries.offsets.WeekOfMonth),
('WOM-4FRI', pandas.tseries.offsets.WeekOfMonth),
('WOM-4MON', pandas.tseries.offsets.WeekOfMonth),
('WOM-4THU', pandas.tseries.offsets.WeekOfMonth),
('WOM-4TUE', pandas.tseries.offsets.WeekOfMonth),
('WOM-4WED', pandas.tseries.offsets.WeekOfMonth)]

jreback · 2013-10-12T14:17:01Z

pandas/tseries/frequencies.py

+_offset_map = {}
+# TODO: Incorporate this into the make_offset thing in offsets
+# And what, Nano doesn't exist with np >= 1.7??
+# if not _np_version_under1p7:


Nano is not supported under < 1.7 so it exists, but he rule-code should error should raise

oh, lol it's not under.

anyways, just need to fix up offsets.py for this.

cancan101 · 2013-10-12T19:26:09Z

@wesm @jtratner This might be a good time to answer: #5015

cancan101 · 2013-10-12T19:33:06Z

pandas/tseries/tests/test_offsets.py

+                                       for day in days]
+        #singletons
+        names += ['S', 'T', 'U', 'BM', 'BMS', 'BQ', 'QS'] # No 'Q'
+        _offset_map.clear()


Maybe put this in a setUp method?

If you mean for _offset_map, I thought about it, but only matters for certain methods to actually clear the offset map - we can revisit as necessary. The rest we only need to test once - if more testing needed, can refactor then.

Yea, I meant the call: _offset_map.clear(). Fair enough

cancan101 · 2013-10-12T19:41:51Z

@jtratner I wonder if this is anything worth cherry picking from #5148

jtratner · 2013-10-12T19:47:56Z

I'll make it _named, that seems reasonable.

jtratner · 2013-10-12T19:54:59Z

Looking through, the rule code is pretty clear, it's prefix-OffsetSpecificEnding, where prefix is something like B, Q, etc. and the OffsetSpecificEnding is something like month, day, etc. There are a bunch of aliases for backwards compatibility and that's about it.

cancan101 · 2013-10-12T19:56:13Z

Looking at the regex you can see there are a few more rules: for example numbers are allowed only right after the dash.

jtratner · 2013-10-12T19:59:49Z

Yeah, but that's really just for validation: it covers the set of rules that already exist (as well as the legacy rules from scikits.timeseries, etc). (since you only end up with #s in the string for WOM) If we wanted to add some crazy additional rule_code, you'd change the regex to make it pass the validation.

cancan101 · 2013-10-12T20:05:23Z

I agree that we can always change the regex since the regex is just used for validation and not for parsing. We just want to make sure that we can express the parameterization for a given offset in a reasonable manner.

jtratner · 2013-10-12T20:06:29Z

Places where rule_code actually matters:

is_subperiod/is_superperiod (I guess different WOM can't downsample to each other)
tseries/tools - where it checks that rule_code == 'M' (so would be fine with static rule_code)
Timegrouper - which actually has to hack around rule_code to remove the suffix and just check for the prefix
generic - used for an error message
_get_freq in tseries/plotting - used to get period alias [probably could just have offsets know their period aliases)

jtratner · 2013-10-12T20:07:32Z

Okay, I have to go, but I think it's simple: you split on '-', first element needs to define class for lookup, you can have whatever else you want can go afterwards (and it's up to the dateoffset to handle it)

cancan101 · 2013-10-12T20:07:56Z

Agreed.

cancan101 · 2013-10-12T20:09:11Z

I would like to cleanup the is_subperiod/is_superperiod eventually. One of the nice features of the 52-53 week fiscal years is that by design they upsample to weeks.

jtratner · 2013-10-12T21:44:00Z

actually not sure whether we should deprecate rule_code, just want to work around it.

jtratner · 2013-10-12T21:55:43Z

@cancan101 good suggestion about adding a prefix attribute, ended up meaning that I could remove most of the special rule_codes and simplify things! (I ended up leaving comments with the actual prefixes just for clarity 😄)

cancan101 · 2013-10-12T23:42:23Z

@jtratner An additional location where rule_code matters is _FrequencyInferer. See for example get_freq, _get_wom_rule, _get_monthly_rule.

jtratner · 2013-10-12T23:43:10Z

okay well this doesn't change rule_code.

cancan101 · 2013-10-12T23:46:52Z

That may be the case but I would be in favor of at the very least moving these over to using constants (along lines of #5189 (comment)) if not writing this code to check against a DateOffset object rather than the freq_str / rule_code.

jtratner · 2013-10-12T23:48:00Z

I want to push that to a separate refactor. This keeps the behavior relatively the same and will enable you to merge your 52-53 week PR.

cancan101 · 2013-10-12T23:49:11Z

That's totally fine. I was just going through looking at the upsample/downsample code and was reminded of those string literals...

jtratner · 2013-10-12T23:50:33Z

frankly, you probably have a better sense of this than I, just figured I could simplify a bit for you first, make sure we have a working framework, then it's easier to make more advanced tweaks later.

cancan101 · 2013-10-13T19:17:49Z

pandas/tseries/frequencies.py

        return offset
-    else:
-        raise ValueError('Bad rule name requested: %s.' % name)
+    except KeyError:


Under what conditions would a KeyError be raised (ie which lines could raise the error)?

cancan101 · 2013-10-13T22:03:28Z

pandas/tseries/offsets.py

@@ -683,6 +711,9 @@ def rule_code(self):
    6: 'SUN'
 }

+# reversed
+_weekday_dict.update([(v, k) for k, v in _weekday_dict.items()])


What is the point of adding the reversed map to the same dict?

Means that _weekday_dict['SUN'] is 6, while _weekday_dict[6] is 'SUN' and that way you don't have to maintain two separate dicts of the same information (have to lookup the first way in from_name and the second in rule_code.

what does Week.from_name(suffix=1) give you?

It'll raise an error because you can't slice it. I'll go change all of them to private methods now, thanks for the heads up.

I think it will actually create the Week but when that Week is used it will raise an Exception.

Actually, this makes me realize that my implementation is wrong.

Fixed it - need to catch the TypeError from getting an offset with too many -

I guess I don't see the benefit of sharing the same dict for forward and reverse lookups. It seems to leave open the possibility of not fully checking for invalid inputs. Admittedly the dict is private.

okay, that's reasonable and trivial to implement - I'll add it.

jtratner · 2013-10-13T22:28:05Z

Okay, I think this is in a pretty good place at this point. Thanks for the feedback @cancan101! Any other comments? @wesm - does this match what you were thinking?

cancan101 · 2013-10-13T22:36:43Z

@jtratner NP.

jreback · 2013-10-13T23:01:16Z

pandas/tseries/frequencies.py

-    else:
+    # what the heck is this Exception supposed to be for???
+    try:
+        return offset.freqstr


any idea about this?

I still don't know...all I know is that BusinessDay was treated separately and explicitly tested to be this way. I think I might be able to remove the try/except though.

Yep, can remove the try/except now because of the check - is this considered public? I.e., does it need to validate anything? If it's not I might put the try/except back.

I mean rather, if it is, would prefer to have error be "Bad rule" rather than "No attribute freqstr"

Put it back for just that reason, catching any Exception

jtratner · 2013-10-14T01:39:33Z

@jreback and others - okay?

jreback · 2013-10-14T01:40:31Z

ok by me

Use a more flexible lookup based on prefix then delegate creation to class' `from_name()` method. Totally get rid of hasOffsetName (not necessary). Finally, make some inheriting go on to simplify things.

jtratner · 2013-10-14T21:59:03Z

okay, going to merge this soon? any other comments?

cancan101 · 2013-10-15T00:55:56Z

Is there a vbench for pandas load time?

jtratner · 2013-10-15T00:58:24Z

No, I don't think so (hard to time), but this is mostly to make your PR
simpler and easier to support.

…load CLN/ENH: Stop instantiating all offsets on load.

cancan101 · 2013-10-15T01:00:39Z

@jtratner I'm totally fine with commit. I was just curious as to start up gain.

jtratner · 2013-10-15T01:34:16Z

It's a good point, I've definitely thought about it too. Not sure how you'd
test within Python, don't know how you could unload it to load again.
Memory footprint might be a better metric, but even that isn't so high.

cancan101 · 2013-10-15T01:36:36Z

Maybe something from:
http://stackoverflow.com/questions/437589/how-do-i-unload-reload-a-python-module
On Oct 14, 2013 9:34 PM, "Jeff Tratner" notifications@github.com wrote:

It's a good point, I've definitely thought about it too. Not sure how you'd
test within Python, don't know how you could unload it to load again.
Memory footprint might be a better metric, but even that isn't so high.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/5189#issuecomment-26303269
.

jtratner · 2013-10-15T03:09:09Z

Go for it - I'd certainly be interested to see the result.

cancan101 · 2013-10-15T03:34:00Z

@jtratner Any reason for this code to still exist in frequencies:

for _i, _weekday in enumerate(['MON', 'TUE', 'WED', 'THU', 'FRI']):
    for _iweek in range(4):
        _name = 'WOM-%d%s' % (_iweek + 1, _weekday)
        _offset_map[_name] = offsets.WeekOfMonth(week=_iweek, weekday=_i)
        _rule_aliases[_name.replace('-', '@')] = _name

jtratner · 2013-10-15T03:35:34Z

shoot, the _offset_map part shouldn't be there. _rule_aliases needs to be though (I think). I didn't cover everything, there's still a bunch of work for you to do after this.

jtratner · 2013-10-15T03:36:11Z

I mean in the internal cleanup you previously proposed.

cancan101 · 2013-10-15T03:36:53Z

I dont think you need the alias part either. You handle that:

def _make_offset(key):
    """Gets offset based on key. KeyError if prefix is bad, ValueError if
    suffix is bad. All handled by `get_offset` in tseries/frequencies. Not
    public."""
    if key is None:
        return None
    split = key.replace('@', '-').split('-')

cancan101 · 2013-10-15T03:42:47Z

I will leave the alias in for now but I will remove the _offset_map?

jtratner · 2013-10-15T03:53:31Z

yes

On Mon, Oct 14, 2013 at 11:42 PM, Alex Rothberg notifications@github.comwrote:

I will leave the alias in for now but I will remove the _offset_map ?

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/5189#issuecomment-26307329
.

cancan101 · 2014-04-14T19:31:49Z

pandas/tseries/offsets.py

+    klass = prefix_mapping[split[0]]
+    # handles case where there's no suffix (and will TypeError if too many '-')
+    obj = klass._from_name(*split[1:])
+    obj._named = key


What is the reason for setting the _named field here? This leads to some printing weirdness: #5148 (comment)

to preserve the special case existing behavior for shorter offset names (i.e. there are offsets that are synonyms and we wanted to preserve their repr output in the console).

The issue is depending how the same offset is created , it will print
differently.
On Apr 14, 2014 5:49 PM, "Jeff Tratner" notifications@github.com wrote:

In pandas/tseries/offsets.py:

+if not _np_version_under1p7:

Only 1.7+ supports nanosecond resolution

prefix_mapping['N'] = Nano

+def _make_offset(key):

"""Gets offset based on key. KeyError if prefix is bad, ValueError if

suffix is bad. All handled by get_offset in tseries/frequencies. Not

public."""

if key is None:

return None

split = key.replace('@', '-').split('-')

klass = prefix_mapping[split[0]]

handles case where there's no suffix (and will TypeError if too many '-')

obj = klass._from_name(*split[1:])

obj._named = key

to preserve the special case existing behavior for shorter offset names
(i.e. there are offsets that are synonyms and we wanted to preserve their
repr output in the console).

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/5189/files#r11610049
.

okay. 6 months later, I agree with you that we should just have canonicalized offset names.

looking back, that was probably a mistake on my part, I'm okay with changing it

ghost assigned jtratner Oct 12, 2013

jreback reviewed Oct 12, 2013
View reviewed changes

cancan101 reviewed Oct 12, 2013
View reviewed changes

cancan101 mentioned this pull request Oct 12, 2013

ENH: Support for "52–53-week fiscal year" / "4–4–5 calendar" and LastWeekOfMonth DateOffset #5004

Merged

cancan101 reviewed Oct 13, 2013
View reviewed changes

jreback reviewed Oct 13, 2013
View reviewed changes

CLN/ENH: Stop instantiating all offsets on load.

9c1bae4

Use a more flexible lookup based on prefix then delegate creation to class' `from_name()` method. Totally get rid of hasOffsetName (not necessary). Finally, make some inheriting go on to simplify things.

jtratner added a commit that referenced this pull request Oct 15, 2013

Merge pull request #5189 from jtratner/stop-instantiating-offsets-on-…

4db583d

…load CLN/ENH: Stop instantiating all offsets on load.

jtratner merged commit 4db583d into pandas-dev:master Oct 15, 2013

jtratner deleted the stop-instantiating-offsets-on-load branch October 15, 2013 03:35

cancan101 reviewed Apr 14, 2014
View reviewed changes

sinhrks mentioned this pull request Sep 27, 2015

BUG: pd.frequency.get_offset cache may be overwritten #11192

Closed

Uh oh!

CLN/ENH: Stop instantiating all offsets on load. #5189

CLN/ENH: Stop instantiating all offsets on load. #5189

Uh oh!

Conversation

jtratner commented Oct 12, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cancan101 commented Oct 12, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cancan101 commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

cancan101 commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

cancan101 commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

cancan101 commented Oct 12, 2013

Uh oh!

cancan101 commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

cancan101 commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

cancan101 commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

cancan101 commented Oct 12, 2013

Uh oh!

jtratner commented Oct 12, 2013

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jtratner commented Oct 13, 2013

Uh oh!

cancan101 commented Oct 13, 2013