Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: pd.frequency.get_offset cache may be overwritten #11192

Closed
sinhrks opened this issue Sep 26, 2015 · 2 comments
Closed

BUG: pd.frequency.get_offset cache may be overwritten #11192

sinhrks opened this issue Sep 26, 2015 · 2 comments
Labels
Bug Frequency DateOffsets
Milestone

Comments

@sinhrks
Copy link
Member

sinhrks commented Sep 26, 2015

Because frequencies.get_offset may be broken by user operation because of its cache.

https://github.com/pydata/pandas/blob/master/pandas/tseries/frequencies.py#L515

import pandas as pd
x = pd.tseries.frequencies.get_offset('D')
x.__dict__
# {'normalize': False, 'kwds': {}, 'n': 1, '_offset': datetime.timedelta(1), '_named': 'D', '_use_relativedelta': False}

# modify its property
x.n = 5

# NG, modified instance is returned
y = pd.tseries.frequencies.get_offset('D')
y.__dict__
# {'normalize': False, 'kwds': {}, 'n': 5, '_offset': datetime.timedelta(1), '_named': 'D', '_use_relativedelta': False}

It should return a copy of the cache.

@sinhrks sinhrks added Bug Frequency DateOffsets labels Sep 26, 2015
@sinhrks sinhrks added this to the 0.17.1 milestone Sep 26, 2015
@jreback
Copy link
Contributor

jreback commented Sep 26, 2015

yeh these are class properties, but get_offset should prob return a copy. Though need to check for perf impacts on this.

@sinhrks
Copy link
Member Author

sinhrks commented Sep 27, 2015

n is an instance property specify its length/span. I don't think user touch it actually though...

From what I've seen, there is no function uses freq taken from get_offset as it is. Applying numeric ops (like multiplying freq mult) will results in copy. Thus, user ops should not affect to internal cache.

pd.tseries.frequencies.get_offset('A')
# A-DEC
pd.tseries.frequencies.get_offset('A') * 1
# <YearEnd: month=12>

Above repr difference is caused by _named property, which looks to be attached by _make_offset. It is introduced in #5189 for repr backcompat. I don't think it is needed anymore (because user can't get freq from get_offset).

Maybe we can cleanup freq-retrieving functions in freqnencies.py and guarantee user can't touch the internal cache.

@jreback jreback modified the milestones: Next Major Release, 0.17.1 Nov 13, 2015
@jreback jreback modified the milestones: 0.18.0, Next Major Release Dec 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Frequency DateOffsets
Projects
None yet
Development

No branches or pull requests

2 participants