New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Refactor series datetime arithmetic #4613

Merged
merged 1 commit into from Aug 27, 2013

Conversation

Projects
None yet
3 participants
@jtratner
Contributor

jtratner commented Aug 20, 2013

This is a prelude to the larger refactoring of arithmetic into core/ops,
etc. It doesn't change any external behavior (and I'm relatively confident that
the test suite covers all the branches).

The complexity of the time arithmetic code was making it hard to tweak,
so I decided to change it up. It's not that complicated, but wanted to
run it by you all in an easy-to-read diff before I change its files.

Specifically it:

  1. Moves Series time arithmetic out of _arith_method into a encapsulated _TimeOp class.
  2. Consolidates the previously repeated type conversions into one.
  3. Separates validations from type conversions, which makes the function
    easier to follow and write.
  4. Changes all obscure character codes to clear ones.
  5. Removes the unnecessary set() calls in convert_to_array.
@jtratner

This comment has been minimized.

Show comment
Hide comment
@jtratner

jtratner Aug 20, 2013

Contributor

@cpcloud it probably should be a style convention going forward to avoid using character codes in favor of the clearer, more verbose names.

For example, this previously had two separate conditional branches that did:

if cond1:
    lvalues.astype('i8')
    rvalues.astype('i8')
elif cond2:
    lvalues.astype(np.int64)
    rvalues.astype(np.int64)

Written like the above, it's harder to see they do the same thing and can be combined.

I'd also argue that dtypes like m8[ns] and M8[ns] should be written as timedelta64[ns]and datetime64[ns] respectively. Much clearer that way.

Contributor

jtratner commented Aug 20, 2013

@cpcloud it probably should be a style convention going forward to avoid using character codes in favor of the clearer, more verbose names.

For example, this previously had two separate conditional branches that did:

if cond1:
    lvalues.astype('i8')
    rvalues.astype('i8')
elif cond2:
    lvalues.astype(np.int64)
    rvalues.astype(np.int64)

Written like the above, it's harder to see they do the same thing and can be combined.

I'd also argue that dtypes like m8[ns] and M8[ns] should be written as timedelta64[ns]and datetime64[ns] respectively. Much clearer that way.

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Aug 20, 2013

Contributor

would it make sense to have a class hold the self and other then have the functions as methods of the class? (maybe in core/ops)

something like

t = _TimeOps(self, other)
t.get_result()

this patter is pretty common
eg concat, groupby etc

makes things into nice packages
rther than just collectives of functions

Contributor

jreback commented Aug 20, 2013

would it make sense to have a class hold the self and other then have the functions as methods of the class? (maybe in core/ops)

something like

t = _TimeOps(self, other)
t.get_result()

this patter is pretty common
eg concat, groupby etc

makes things into nice packages
rther than just collectives of functions

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Aug 20, 2013

Contributor

agree about dtype codes (though i8 is very common)

Contributor

jreback commented Aug 20, 2013

agree about dtype codes (though i8 is very common)

@cpcloud

This comment has been minimized.

Show comment
Hide comment
@cpcloud

cpcloud Aug 20, 2013

Member

i really don't like that m8[unit] business so +1 on the type code stuff

Member

cpcloud commented Aug 20, 2013

i really don't like that m8[unit] business so +1 on the type code stuff

@jtratner

This comment has been minimized.

Show comment
Hide comment
@jtratner

jtratner Aug 20, 2013

Contributor

@jreback that's a good idea - it was definitely getting awkward to pass everything around like that.

on type codes, it's not that it's wrong, it's just confusing for those who aren't in the know (or are skimming) 😄

Contributor

jtratner commented Aug 20, 2013

@jreback that's a good idea - it was definitely getting awkward to pass everything around like that.

on type codes, it's not that it's wrong, it's just confusing for those who aren't in the know (or are skimming) 😄

@cpcloud

This comment has been minimized.

Show comment
Hide comment
@cpcloud

cpcloud Aug 20, 2013

Member

OTOH it's extremely cumbersome to type out datetime64[ns] all the time (esp. in ipython)....hopefully pandas takes a lot of the pain away of having to manually cast things like that 😄 however a simple td64[ns] or dt64[ns] would have sufficed. Wonder where the m comes from...

Member

cpcloud commented Aug 20, 2013

OTOH it's extremely cumbersome to type out datetime64[ns] all the time (esp. in ipython)....hopefully pandas takes a lot of the pain away of having to manually cast things like that 😄 however a simple td64[ns] or dt64[ns] would have sufficed. Wonder where the m comes from...

@cpcloud

This comment has been minimized.

Show comment
Hide comment
@cpcloud

cpcloud Aug 20, 2013

Member

let me not derail. i'm also +1 on the _TimeOps encapsulation.

god forbid we started using the ? type code (bool). kind of weird. i guess it's a tip of the hat to its C roots a la a == b ? a : b

Member

cpcloud commented Aug 20, 2013

let me not derail. i'm also +1 on the _TimeOps encapsulation.

god forbid we started using the ? type code (bool). kind of weird. i guess it's a tip of the hat to its C roots a la a == b ? a : b

@jtratner

This comment has been minimized.

Show comment
Hide comment
@jtratner

jtratner Aug 20, 2013

Contributor

I'm slightly mystified by the test failure here - I assume that it's a real failure, because it's failing on the same test (and no output should change from current master). Maybe I introduced a type mismatch? I also don't get why 2.7 and 3.3 would pass but not 2.6 or 3.2.

Contributor

jtratner commented Aug 20, 2013

I'm slightly mystified by the test failure here - I assume that it's a real failure, because it's failing on the same test (and no output should change from current master). Maybe I introduced a type mismatch? I also don't get why 2.7 and 3.3 would pass but not 2.6 or 3.2.

@cpcloud

This comment has been minimized.

Show comment
Hide comment
@cpcloud

cpcloud Aug 20, 2013

Member

sounds like a numpy 1.6 issue...2.6 and 3.2 both have np 1.6.1 and 1.6.2 respectively while 2.7 and 3.3 have 1.7.1

Member

cpcloud commented Aug 20, 2013

sounds like a numpy 1.6 issue...2.6 and 3.2 both have np 1.6.1 and 1.6.2 respectively while 2.7 and 3.3 have 1.7.1

@jtratner

This comment has been minimized.

Show comment
Hide comment
@jtratner

jtratner Aug 20, 2013

Contributor

whelp I'll figure it out...

Contributor

jtratner commented Aug 20, 2013

whelp I'll figure it out...

@jtratner

This comment has been minimized.

Show comment
Hide comment
@jtratner

jtratner Aug 27, 2013

Contributor

@jreback @cpcloud finally got this all cleaned up, working and refactored into a class. I'll squash the commits down into 1 big commit after any comments on this are addressed. (much easier to tweak with the split out commits). Plus, the Series arithmetic method is now much cleaner.

After this is finished, will be much easier to do the ops refactor.

Contributor

jtratner commented Aug 27, 2013

@jreback @cpcloud finally got this all cleaned up, working and refactored into a class. I'll squash the commits down into 1 big commit after any comments on this are addressed. (much easier to tweak with the split out commits). Plus, the Series arithmetic method is now much cleaner.

After this is finished, will be much easier to do the ops refactor.

@jtratner

This comment has been minimized.

Show comment
Hide comment
@jtratner

jtratner Aug 27, 2013

Contributor

okay, I squashed this such that it's ready to go - obviously this will later be moved into core/ops, but, as I said, I wanted to do this separately because it's more complicated than the other refactoring.

Contributor

jtratner commented Aug 27, 2013

okay, I squashed this such that it's ready to go - obviously this will later be moved into core/ops, but, as I said, I wanted to do this separately because it's more complicated than the other refactoring.

Move time operations for Series into class _TimeOp
* clarify/simplify time arithmetic-specific code into a separate class
* cleanup extraneous `set` and containment checks in branches
* refactor validation checks to separate function
* combine branches and dry out repetitive code
* move convert_to_array out of arith method
* self --> left; other --> right [possibly clearer for future refacotr]
* change handling of DateTimeIndex (no longer need additional return)
* add np1.6 check for integer --> timedelta conversion under the hood
@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Aug 27, 2013

Contributor

@jtratner looks good to me....your first merge (that you do yourself)!

Contributor

jreback commented Aug 27, 2013

@jtratner looks good to me....your first merge (that you do yourself)!

@jreback

This comment has been minimized.

Show comment
Hide comment
@jreback

jreback Aug 27, 2013

Contributor

maye should create a core/types.py where
all type stuff is consolidated ? (pretty much in common.py now)

Contributor

jreback commented Aug 27, 2013

maye should create a core/types.py where
all type stuff is consolidated ? (pretty much in common.py now)

jtratner added a commit that referenced this pull request Aug 27, 2013

Merge pull request #4613 from jtratner/refactor-datetimes
CLN: Refactor series datetime arithmetic

@jtratner jtratner merged commit c472099 into pandas-dev:master Aug 27, 2013

@jtratner

This comment has been minimized.

Show comment
Hide comment
@jtratner

jtratner Aug 27, 2013

Contributor

Yeah, that might make sense - I'll keep that in mind. Now on to finishing up the arithmetic refactoring!

Contributor

jtratner commented Aug 27, 2013

Yeah, that might make sense - I'll keep that in mind. Now on to finishing up the arithmetic refactoring!

@jtratner jtratner deleted the jtratner:refactor-datetimes branch Aug 28, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment