Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Support for "52–53-week fiscal year" / "4–4–5 calendar" and LastWeekOfMonth DateOffset #5004

Merged
merged 1 commit into from Oct 22, 2013

Conversation

@cancan101
Copy link
Contributor

commented Sep 27, 2013

Closes #4511 and #4637

  • Added LastWeekOfMonth DateOffset #4637
  • Added FY5253, and FY5253Quarter DateOffsets #4511
  • Improved error handling in get_freq_code, _period_str_to_code and _base_and_stride
  • Fix issue with datetime - Timestamp
@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Sep 27, 2013

Right now I am using the variable startingMonth to mean the month in which the fiscal year ends. This seems silly but IS consistent with BQuarterEnd, for example.

@cpcloud

This comment has been minimized.

Copy link
Member

commented Sep 27, 2013

@cancan101 Looks like u need a rebase. Can you also squash/reword most of these commits? Maybe leave 1 or 2 with a moderate amount of detail in the commit message. Thanks

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Sep 27, 2013

@cpcloud Rebased and squashed.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 27, 2013

@cancan101 can this be done as a parameter instead of creating new classes?

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Sep 27, 2013

Do you mean instead of creating five new public classes and two private classes, just create 3 public classes? If so, probably, there is minimal code in two of the classes. That being said, downstream use of parameters seem clunky. See for example #5015 and some of our discussions here: #4511 (comment)

Or do you mean using parameters instead of creating any new classes?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2013

@cancan101 I like your `LastWeekInMonthoffset, but I think that they other 4 offsets are quite specialized. Would it not be better to haveFYear``(fiscal year), and just parameterize, with a``typ`` parameter? that would control the prefix, qtr_week_in and so on ?

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Sep 28, 2013

You could also try an enum-like class (or toplevel defitinition) that has members that define each frequency in translation, so you'd do this:

class Offsets(object):
    LastWeekOfMonth = 'LWM'
    Day = 'D'
    Week = 'W'
    BusinessDay = 'B'

and then put the class in the toplevel namespace of pandas. Not sure if it would be better, but it might help resolve this. I too don't like putting so many new classes in.

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Sep 28, 2013

@jreback @jtratner Reducing the number of classes should not be too hard. That being said I still think it makes sense to add three classes: LastWeekOfMonth, FYear and FQuarter. I feel that there is a significant difference in functionality and meaning between fiscal quarters and fiscal years. Thoughts?

Also to be clear, these new Offsets are not the standard fiscal year/ quarter but rather the "retail fiscal year" aka 52-53 week years.

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2013

bump

@cpcloud

This comment has been minimized.

Copy link
Member

commented Oct 2, 2013

needs a rebase .. some changes in offsets.py and the usual release notes gnat

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2013

There should be release notes. Are more needed?
What changes to offsets.py I can reduce the number of classes, but I still think it makes sense to add three classes: LastWeekOfMonth, FYear and FQuarter.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2013

I ok with those classes.

@cpcloud

This comment has been minimized.

Copy link
Member

commented Oct 2, 2013

@cancan101 I meant that there were recently some changes in offsets.py....(now we have nanosecond date_ranges! yay!)

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2013

@jreback I assume by those you mean the three I listed?

GIven that I am going to reduce the number of classes, what should the rule_code look like? It already looks ugly as is. I posted: #5015

@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2013

can you put a couple of possibilites up?

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2013

@cancan101 @jreback did you mean reduce the number of classes altogether or just reduce the number of public classes?

If you combine classes, use a classmethod that takes a rulecode and spits out the correct offset.

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2013

@jreback

For the yearly offset:
Currently:
For last X day of month: RL-%{month}%{day} for example RL-DECSAT
For nearest X to last day of month: RN-%{month}%{day} for example RN-DECSAT

For the quarterly-offset:
For last X day of month: RLQ-%{qtr}%{month}%{day} for example RLQ-2DECSAT
For nearest X to last day of month: RNQ-%{qtr}%{month}%{day} for example RNQ-1DECSAT
where qtr is the quarter with the leap week.

These are hard to read since after the initial dash, there is no other delimitation. Further because of the regex I referred to in #5015, the number must be the first character after the dash.

Question: Will anything break if one DateOffset class produces rule codes with different prefixes?

@jtratner What do you mean by creating a "classmethod that takes a rulecode and spits out the correct offset."? Are your referring to the idea I suggested in #5028?

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2013

Given that I am going to add just FYear and FQuarter, how should the end be specified? One options would be a boolean (for example use_closest). Another would be an enum (which currently only takes on two values).

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2013

No, that's not what I mean. What I mean is that you might want to let people use a class without having to pass all the arguments to the constructor. For example, here's what we did in TimeOp so that __init__ could take all different arguments and then return different types or None or whatever: https://github.com/pydata/pandas/blob/master/pandas/core/ops.py#L382

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2013

@jtratner I don't really follow what would be the point of that method here.

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2013

Didn't realize you only had two at this point.

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2013

@jtratner Theoretically these could be merged into BYear and BQuarter. I am hesitant to do so because there are more parameters for these new classes than apply to BYear and BQuarter.

You could then merge BYear and BQuarter into Year and Quarter and just use an enum like:

{LastDay, LastBDay, ClosestToLastX, LastX} but that seems like overkill.

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2013

Otherwise as far as my reading of Wikipedia and subsequent research has gone, there are only two variations of the 52-53 week calendar.

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2013

Sounds like you've spent a lot of time thinking about this!

@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2013

@cancan101 can you rebase and update?

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 4, 2013

@jreback Can I get some feedback on the rule code?

@@ -367,9 +369,27 @@ def get_period_alias(offset_str):

for _i, _weekday in enumerate(['MON', 'TUE', 'WED', 'THU', 'FRI']):

This comment has been minimized.

Copy link
@cpcloud

cpcloud Oct 4, 2013

Member

just define a no argument function (add add global declarations where necessary) and call it. then you don't need to have all these underscored variables (obviously keep the ones that were there before)

This comment has been minimized.

Copy link
@cancan101

cancan101 Oct 5, 2013

Author Contributor

@cpcloud What do you think about moving this logic to a classmethod on the respective offsets?
Also I am not quite sure what you mean by the function.

This comment has been minimized.

Copy link
@cpcloud

cpcloud Oct 5, 2013

Member

classmethod probably ok....

by function i just meant something like

def initializer():
    # your init code here
initializer()

This comment has been minimized.

Copy link
@cancan101

cancan101 Oct 5, 2013

Author Contributor

The idea with classmethods is to put the logic for generating all instances of a given offset with the offset itself rather than having the code in separate files

This comment has been minimized.

Copy link
@cancan101

cancan101 Oct 5, 2013

Author Contributor

initializer for now

_name = offset.rule_code
_offset_map[_name] = offset

for _weekday in range(7):

This comment has been minimized.

Copy link
@cpcloud

cpcloud Oct 4, 2013

Member

same as above, a no argument function

@@ -744,7 +761,7 @@ def _period_str_to_code(freqstr):
try:
freqstr = freqstr.upper()
return _period_code_map[freqstr]
except:
except KeyError:
alias = _period_alias_dict[freqstr]
return _period_code_map[alias]

This comment has been minimized.

Copy link
@cpcloud

cpcloud Oct 4, 2013

Member

can you catch the KeyError here as well and provide a more informative error message? it's slightly annoying when you pass a frequency that isn't defined and you get a KeyError. there is a test for this, so it's acknowledged that it does raise but it's not very user-friendly

This comment has been minimized.

Copy link
@cancan101

cancan101 Oct 5, 2013

Author Contributor

Fixed.


return self.getOffsetOfMonth(other + relativedelta(months=months, day=1))

def getOffsetOfMonth(self, dt):

This comment has been minimized.

Copy link
@cpcloud

cpcloud Oct 4, 2013

Member

instead of awesomeFunction use awesome_function, ie separate words by underscores, same for below

This comment has been minimized.

Copy link
@cancan101

cancan101 Oct 5, 2013

Author Contributor

@cpcloud

Which method are your referring to? If it is getOffsetOfMonth, i was jsut being consistent with existing code: https://github.com/cancan101/pandas/blob/6789ef89977e4b357a86a1d07d781c7cff703711/pandas/tseries/offsets.py#L738

This comment has been minimized.

Copy link
@cpcloud

cpcloud Oct 5, 2013

Member

okay no problemo then

@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 21, 2013

dt is a datetime, not a Timestamp..., you prov want to convert it at the top of year_has_extra_week

-> week_in_year = (next_year_end - prev_year_end).days/7
(Pdb) l
1242                next_year_end = dt
1243            else:    
1244                next_year_end = dt + self._offset
1245                prev_year_end = dt - self._offset
1246            
1247 ->         week_in_year = (next_year_end - prev_year_end).days/7
1248 
1249            return week_in_year == 53     
1250        
1251        def onOffset(self, dt):
1252            if self._offset.onOffset(dt):
(Pdb) l
1253                return True
1254            
1255            next_year_end = dt - self._offset
1256 
1257            qtr_lens = self.get_weeks(dt)
1258                
1259            current = next_year_end
1260            for qtr_len in qtr_lens[0:4]:
1261                current += relativedelta(weeks=qtr_len)
1262                if dt == current:
1263                    return True
(Pdb) (Timestamp(next_year_end)-Timestamp(prev_year_end)).days/7
52
@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 21, 2013

Any idea what changed?

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 21, 2013

Also, what is wrong with doing the math on a datetime?

(datetime.datetime(2013,10,24)-datetime.datetime(2013,10,21)).days
@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 21, 2013

this is your code, you probably have some relativedelata stuff in there which is not returning the correct type

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 21, 2013

Okay, I can take a closer look. I assume relativedelata returns a datetime?
Also this worked until I rebased onto master. I had rebased relatively recently too. I wonder what changed?

@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 21, 2013

not much has been merged recently....can you pinpoint what commit it last worked?

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 21, 2013

I'll see what I can come up with

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Oct 21, 2013

Use git reflog to grab your pre-rebase branch. Can you try doing arithmetic
with datetimes and Timestamp and see if you can reproduce it? If you can
get that specific error, then there is probably a bug with Timestamp.

That is a very weird error to get - suggests there is (potentially) some
part of that code that flips the order of args to__sub__

@wuan

This comment has been minimized.

Copy link
Contributor

commented Oct 21, 2013

The error occurs when subtracting a Timestamp from a datetime type.

git bisect found that the problem was introduced to master with commit 02158e6 on September 30th.

I will just have a close look on that one.

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 21, 2013

That weird. I'm pretty sure I had rebased more recently to get @jtratner's change for Offset construction.

@wuan

This comment has been minimized.

Copy link
Contributor

commented Oct 21, 2013

in tslib.pyx class _Timestamp(datetime), method sub

    datetime.__sub__(self, other)

was replaced by

    super(_Timestamp, self).__sub__(other)

As _Timestamp is a direct subtype of datetime.datetime, the result should be the same. But the latter case leads to the error

TypeError: super(type, obj): obj must be an instance or subtype of type

I have no clue why self in _Timestamp is not an instance or subtype ot _Timestamp.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Oct 21, 2013

@cancan101 just revert that one line

datetime.__sub__(other) and should be ok

they don't act the same because _Timestamp is a shadow class (of a c-lass) AFAIK..

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 21, 2013

@cpcloud Thoughts?

@wuan

This comment has been minimized.

Copy link
Contributor

commented Oct 21, 2013

@cancan101 Could you add a test for subtracting a datetime from a timestamp to test_tslib.py as well?

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 21, 2013

Sure.

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Oct 21, 2013

Hurray for git bisect - which is why you want every commit to at least
build, if not pass the test suite. Btw, I think @jreback's replacement
should be datetime.__sub__(self, other)

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 22, 2013

Issues should be fixed.
On Oct 21, 2013 5:09 PM, "Jeff Tratner" notifications@github.com wrote:

Hurray for git bisect - which is why you want every commit to at least
build, if not pass the test suite. Btw, I think @jreback's replacement
should be datetime.__sub__(self, other)


Reply to this email directly or view it on GitHubhttps://github.com//pull/5004#issuecomment-26756646
.

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Oct 22, 2013

Okay, please rebase on top of latest master if you haven't already
On Oct 21, 2013 9:30 PM, "Alex Rothberg" notifications@github.com wrote:

Issues should be fixed.
On Oct 21, 2013 5:09 PM, "Jeff Tratner" notifications@github.com wrote:

Hurray for git bisect - which is why you want every commit to at least
build, if not pass the test suite. Btw, I think @jreback's replacement
should be datetime.__sub__(self, other)


Reply to this email directly or view it on GitHub<
https://github.com/pydata/pandas/pull/5004#issuecomment-26756646>
.


Reply to this email directly or view it on GitHubhttps://github.com//pull/5004#issuecomment-26771149
.

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 22, 2013

Is it not merge able? If not I'll rebate.
On Oct 21, 2013 9:38 PM, "Jeff Tratner" notifications@github.com wrote:

Okay, please rebase on top of latest master if you haven't already
On Oct 21, 2013 9:30 PM, "Alex Rothberg" notifications@github.com
wrote:

Issues should be fixed.
On Oct 21, 2013 5:09 PM, "Jeff Tratner" notifications@github.com
wrote:

Hurray for git bisect - which is why you want every commit to at least
build, if not pass the test suite. Btw, I think @jreback's replacement
should be datetime.__sub__(self, other)


Reply to this email directly or view it on GitHub<
https://github.com/pydata/pandas/pull/5004#issuecomment-26756646>
.


Reply to this email directly or view it on GitHub<
https://github.com/pydata/pandas/pull/5004#issuecomment-26771149>
.


Reply to this email directly or view it on GitHubhttps://github.com//pull/5004#issuecomment-26771408
.

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Oct 22, 2013

You need to rebase because this previously failed after rebasing (wasn't
this PR's fault) and I want to make sure I'm not merging something that's
going to break the builds/tests.

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 22, 2013

Np. Rebased and pushed. Easy enough. Should be running tests now.

@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 22, 2013

passed

ENH: Support for "52–53-week fiscal year" / "4–4–5 calendar" and
LastWeekOfMonth DateOffset.
- Added ``LastWeekOfMonth`` DateOffset
- Added ``FY5253 and ``FY5253Quarter`` DateOffsets
@cancan101

This comment has been minimized.

Copy link
Contributor Author

commented Oct 22, 2013

@jtratner What can I do to get this merged in? I rebased earlier today

jtratner added a commit that referenced this pull request Oct 22, 2013
Merge pull request #5004 from cancan101/retail
ENH: Support for "52–53-week fiscal year" / "4–4–5 calendar" and LastWeekOfMonth DateOffset

@jtratner jtratner merged commit b55c790 into pandas-dev:master Oct 22, 2013

@jtratner

This comment has been minimized.

Copy link
Contributor

commented Oct 22, 2013

there you go :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.