vectorised setting of timestamp columns fails with python datetime and numpy datetime64 #10408

Closed
seanv507 opened this Issue Jun 22, 2015 · 13 comments

import pandas as pd
import numpy as np
import datetime as dt
z=dt.date(2010,11,1)
zs=[z+dt.timedelta(days=r) for r in range(5)]
df=pd.DataFrame({'obj':zs, 'b':pd.Timestamp('2010-10-01'),'c':pd.Timestamp('2010-10-01')})
df.dtypes

#df.loc[0:2,'c']=dt.date(2010,10,12) # causes error: long() argument must be a string or a number, not 'datetime.date'
df.loc[0:2,'c']=np.datetime64('2010-10-12') # sets to 1970...
df.at[4,'c']=np.datetime64('2010-10-12') # works
df.loc[0:2,'obj']=np.datetime64('2010-10-12') #works

df.loc[0:2,'obj']=dt.date(2010,10,12)

df
ind           b           c         obj
0    2010-10-01  1970-01-01  2010-10-12
1    2010-10-01  1970-01-01  2010-10-12
2    2010-10-01  1970-01-01  2010-10-12
3    2010-10-01  2010-10-01  2010-11-04
4    2010-10-01  2010-10-12  2010-11-05

I am using Pandas 0.16.2

Contributor

jreback commented Jun 22, 2015

datetime.date objects are normally not supported for most datetime operations; simply use datetime.datetime and this will work. datetime.date values are stored as object dtype and thus are not very efficient. If you want to use a pure date object, then you might find Period objects useful.

I will mark this as a bug, and if you'd like to dig in you are welcome. The fix is actually pretty straightforward.
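
A minimal sketch of the workaround suggested above, assuming a recent pandas release. The frame and the names `d`, `value` and `p` are illustrative only and are not taken from the original report; the idea is simply to convert the `datetime.date` to a full `datetime.datetime` before the vectorised assignment.

```python
import datetime as dt
import pandas as pd

# Illustrative frame with a single datetime column (not the reporter's exact frame).
df = pd.DataFrame({"c": [pd.Timestamp("2010-10-01")] * 5})

# Convert the pure date to a full datetime (midnight) before assigning.
d = dt.date(2010, 10, 12)
value = dt.datetime.combine(d, dt.time())

# Label-based slicing with .loc is inclusive, so rows 0, 1 and 2 are set.
df.loc[0:2, "c"] = value
print(df)

# If only the calendar date matters, a daily Period is one alternative.
p = pd.Period("2010-10-12", freq="D")
print(p)
```

Whether a Period column fits depends on the operations needed afterwards; the conversion to `datetime.datetime` keeps the column as a regular datetime column.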

jreback added this to the Someday milestone Jun 22, 2015

jreback added the Bug label Jun 22, 2015

@jreback, did you see that there is also an issue with numpy datetime64 'dates', not just Python datetime.date objects? (i.e. I get 1970 when I use a numpy datetime64 date)

Contributor

jreback commented Jun 23, 2015

@seanv507 you must have an older version of pandas/numpy. In the current version this works with np.datetime64 (see issue #9516); that fix is in 0.16.0.

In [13]: df.loc[0:2,'c']=np.datetime64('2010-10-12') # sets to 1970...

In [14]: df
Out[14]: 
           b                             c         obj
0 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-01
1 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-02
2 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-03
3 2010-10-01 2010-10-01 00:00:00.000000000  2010-11-04
4 2010-10-01 2010-10-01 00:00:00.000000000  2010-11-05

@jreback I can confirm this also with pandas 0.16.2 / numpy 1.9.2

In [36]: np.__version__
Out[36]: '1.9.2'

In [37]: pd.__version__
Out[37]: '0.16.2'

In [38]: df.loc[0:2,'c'] = np.datetime64('2010-10-12')

In [39]: df
Out[39]:
           b                             c         obj
0 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-01
1 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-02
2 2010-10-01 1970-01-01 00:00:00.000014894  2010-11-03
3 2010-10-01 2010-10-01 00:00:00.000000000  2010-11-04
4 2010-10-01 2010-10-01 00:00:00.000000000  2010-11-05
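
A note on the odd value in column `c`: 14894 happens to be the number of days between 1970-01-01 and 2010-10-12. That suggests (an inference from the output, not something confirmed in this thread) that the day count of the assigned value was read as a nanosecond count. The arithmetic can be checked with the standard library alone; the variable names below are illustrative.

```python
import datetime as dt

epoch = dt.date(1970, 1, 1)
assigned = dt.date(2010, 10, 12)

# Number of whole days between the Unix epoch and the assigned date.
days_since_epoch = (assigned - epoch).days
print(days_since_epoch)  # 14894

# Formatting that count as a nanosecond fraction reproduces the displayed value.
as_nanoseconds = f"1970-01-01 00:00:00.{days_since_epoch:09d}"
print(as_nanoseconds)  # 1970-01-01 00:00:00.000014894
```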

which is the version I am using, as I mentioned in my original bug report
(I forgot to state the numpy version, 1.9.2)


Contributor

jreback commented Jun 23, 2015

Ahh, ok, it seems that the test was insufficient; e.g. we are testing the equivalent of [21], whereas you are doing [22]

In [21]: np.datetime64(Timestamp('2010-10-12'))
Out[21]: numpy.datetime64('2010-10-11T20:00:00.000000-0400')

In [22]: np.datetime64('2010-10-12')
Out[22]: numpy.datetime64('2010-10-12')

ok, easy enough prob to fix, want to take a crack at it?
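
The difference between the two constructions above can be sketched as follows. This is an illustration only: it uses a plain `datetime.datetime` as a stand-in for the `Timestamp` in [21], and the variable names are hypothetical. A `datetime64` built from a date-only string carries a day unit, while one built from a datetime object carries a finer unit, so the raw integers are only comparable after converting to a common unit.

```python
import datetime as dt
import numpy as np

# Built from a date-only string: the unit is days.
from_string = np.datetime64("2010-10-12")
# Built from a datetime object (stand-in for a Timestamp): the unit is microseconds.
from_datetime = np.datetime64(dt.datetime(2010, 10, 12))

print(np.datetime_data(from_string.dtype))    # ('D', 1)
print(np.datetime_data(from_datetime.dtype))  # ('us', 1)

# Converting both to nanoseconds explicitly puts them on the same scale.
a = from_string.astype("datetime64[ns]")
b = from_datetime.astype("datetime64[ns]")
print(a == b)  # True
```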

jreback modified the milestone: 0.17.0, Someday Jun 23, 2015

Contributor

jreback commented Jun 23, 2015

fix was in #9522 (original)

@jreback the PR you link to is about a datetime64 on the left-hand side (inside the loc), while here it is the value being assigned, so I don't know whether this is related

Contributor

jreback commented Jun 23, 2015

ahh right - ok should be straightforward in any event

yep, indeed. But just to be sure this does not get lost, I will open a separate issue, and leave this one for the date error problem

-> #10412

@jreback - yes I will give it a go!

seanv507 closed this Jun 23, 2015

seanv507 reopened this Jun 23, 2015

Contributor

jreback commented Jun 23, 2015

@seanv507 gr8! here are the contributing docs. shout if you need help.

yarikoptic added a commit to neurodebian/pandas that referenced this issue Jul 2, 2015

Merge commit 'v0.16.2-42-g383865f' into debian
* commit 'v0.16.2-42-g383865f': (72 commits)
  BUG: provide categorical concat always on axis 0, #10430     numpy 1.10 makes this an error for 1-d on axis != 0
  DOC: update missing.rst with ref to groupby.rst
  BUG: Timedeltas with no specified units (and frac) should raise, #10426
  BUG: using .loc[:,column] fails when the object is a multi-index, #10408
  Removed scikit-timeseries migration docs from FAQ
  BUG: GH10395 bug in DataFrame.interpolate with axis=1 and inplace=True
  BUG: GH10392 bug where Table.select_column does not preserve column name
  TST: Use unicode literals in string test
  PERF: fix _get_level_indexer to accept an intermediate indexer result
  PERF: bench for #10287
  BUG: drop_duplicates drops name(s).
  ENH: Enable ExcelWriter to construct in-memory sheets
  BLD: remove support for 3.2, #9118
  PERF: timedelta and datetime64 ops improvements
  PERF: parse timedelta strings in cython #6755
  closes bug in reset_index when index contains NaT
  Check for size=0 before setting item Fixes #10193
  closes bug in apply when function returns categorical
  BUG: frequencies.get_freq_code raises an error against offset with n != 1
  CI: run doc-tests always
  ...
be8c77a
Contributor

schettino72 commented Jul 21, 2015

PR #10644

jreback closed this in #10644 Jul 24, 2015
