Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_offset_map appears to have redundant data #5028

Closed
cancan101 opened this issue Sep 28, 2013 · 11 comments
Closed

_offset_map appears to have redundant data #5028

cancan101 opened this issue Sep 28, 2013 · 11 comments
Labels
Frequency DateOffsets Refactor Internal refactoring of code

Comments

@cancan101
Copy link
Contributor

Many / most of the entries in _offset_map look like:

_offset_map['X'] = OffsetX()

where X is a string literal for what would have been returned by OffsetX().rule_code

See for example: https://github.com/pydata/pandas/blob/master/pandas/tseries/frequencies.py#L132

This makes adding to and maintaining the list of Offsets more cumbersome.

Manually populating _offset_map could be reserved for cases in which the entry in _offset_map is not the same as the rule_code. The remaining cases would be auto-populated using a classmethod on the Offset to return all possible Offsets.

A related map is _offset_names

One suggestion would be to change get_offset_name to return offset.rule_code in the else case rather than raising an exception. This would remove the need for _offset_names

@cancan101
Copy link
Contributor Author

I was able to verify that ALL entries in _offset_map have the form _offset_map[OffsetX().rule_code] = OffsetX():

    def test_offset_map(self):
        for name,offset in pandas.tseries.frequencies._offset_map.iteritems():
            assert name == None if offset is None else offset.rule_code

Given that is there any reason to keep the code looking like this:

_offset_map = {
    'D': Day(),
    'C': cday,
    'B': BDay(),
    'H': Hour(),
    'T': Minute(),
    'S': Second(),
    'L': Milli(),
    'U': Micro(),
...

@cpcloud
Copy link
Member

cpcloud commented Oct 2, 2013

Instead of assert name == None do assert name is None because there is only ever a single object with the value None (i.e., you can't make distinct None objects like you can with almost every other kind of object). It's also standard practice when writing Python.

@cancan101
Copy link
Contributor Author

@cpcloud I actually think you are wrong there :-)

@cancan101
Copy link
Contributor Author

On second look, my precedence is wrong.

@cancan101
Copy link
Contributor Author

The == was intentional with the None as it was supposed to say:

assert name == (None if offset is None else offset.rule_code)

@cpcloud
Copy link
Member

cpcloud commented Oct 2, 2013

@cancan101 Not a big deal. You can also use a plain ol' if..else suite here; no need to scrunch things up if you don't need to.

@cancan101
Copy link
Contributor Author

That is what happens when you write the code initially without any if/else and then realize there are Nones in there.

@cancan101
Copy link
Contributor Author

Also now that the test is fixed, it DOES fail... Looking into that.

@cancan101
Copy link
Contributor Author

Okay. So it looks like that test will fail with 'QS', 'BQS', 'BQ', not surprisingly since they are constructed like:

'QS': QuarterBegin(startingMonth=1),
'QS-JAN': QuarterBegin(startingMonth=1),

Any reason not to just put the 'QS' et. al. entries into the _rule_aliases?

As a side note:

This code is not well defined for the above situations:

_offset_names = {}
for name, offset in compat.iteritems(_offset_map):
    if offset is None:
        continue
    offset.name = name
    _offset_names[offset] = name

@cancan101
Copy link
Contributor Author

#5148 is my WIP for this. So far the above unit test passes and the values in _offset_names are no longer ambiguous.

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 28, 2014
@jreback jreback modified the milestones: 0.14.1, 0.14.0 Apr 27, 2014
@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jun 10, 2014
@jreback jreback modified the milestones: 0.15.0, 0.15.1 Jul 6, 2014
@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 8, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@mroeschke
Copy link
Member

Looks like _offset_map is a dynamic cache now. There's probably been a lot of refactoring since this issue so closing unless I have misunderstand this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Frequency DateOffsets Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants