pd.date_range Attribute_Error #5200

Closed
vipatel10 opened this issue Oct 12, 2013 · 9 comments

@vipatel10

I'm using pandas 0.12 and numpy 1.7.1.

I'm trying to calculate the number of business days between two columns of dates. Not all rows have valid dates, though, and pd.date_range has no option to ignore errors, so I get the error below:

In [439]: td = pd.DataFrame({'a':[pd.Timestamp('2010-1-1'), pd.Timestamp('2010-2-1')], 'b':[pd.Timestamp('2010-1-10'), pd.Timestamp('2010-2-9')]})

In [440]: td.apply(lambda row: len(pd.date_range(row['a'], row['b'], freq='B')), axis=1)
Out[440]:
0    6
1    7
dtype: int64

In [441]: td = pd.DataFrame({'a':[pd.Timestamp('2010-1-1'), pd.Timestamp('2010-2-1')], 'b':[pd.Timestamp('2010-1-10'), pd.NaT]})

In [442]: td.apply(lambda row: len(pd.date_range(row['a'], row['b'], freq='B')), axis=1)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-442-30b1500c1594> in <module>()
----> 1 td.apply(lambda row: len(pd.date_range(row['a'], row['b'], freq='B')), axis=1)

C:\Python27\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds)
   4414                     return self._apply_raw(f, axis)
   4415                 else:
-> 4416                     return self._apply_standard(f, axis)
   4417             else:
   4418                 return self._apply_broadcast(f, axis)

C:\Python27\lib\site-packages\pandas\core\frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4489                     # no k defined yet
   4490                     pass
-> 4491                 raise e
   4492
   4493

AttributeError: ("'NaTType' object has no attribute 'tz'", u'occurred at index 1')

@jreback
Contributor

jreback commented Oct 12, 2013

That is quite inefficient, as you'd have to construct the entire series. Try this instead.
Period handling of NaT is not currently supported, but it's easy enough to roll your own.

In [27]: def f(row):
   ....:     if pd.isnull(row['a']) or pd.isnull(row['b']):
   ....:         return np.nan
   ....:     return pd.Period(row['a'],freq='B')-pd.Period(row['b'],freq='B')
   ....: 

In [29]: td.apply(f,axis=1)
Out[29]: 
0    -5
1   NaN
dtype: float64
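
As a rough, vectorized alternative to the row-wise function above, the count can also be sketched with NumPy's `np.busday_count`. This assumes a recent pandas and NumPy rather than the 0.12 / 1.7.1 versions mentioned in this thread, and the helper name `business_days_between` is made up for illustration. Because `np.busday_count` excludes the end date, the sketch shifts the end by one day so the result matches the inclusive count that `len(pd.date_range(..., freq='B'))` gives.

```python
import numpy as np
import pandas as pd

td = pd.DataFrame({
    'a': [pd.Timestamp('2010-1-1'), pd.Timestamp('2010-2-1')],
    'b': [pd.Timestamp('2010-1-10'), pd.NaT],
})

def business_days_between(start, end):
    """Hypothetical helper: Mon-Fri days in [start, end], NaN where a date is missing."""
    valid = start.notna() & end.notna()
    out = pd.Series(np.nan, index=start.index)
    begin = start[valid].to_numpy().astype('datetime64[D]')
    # np.busday_count counts days in [begin, stop), so add one day
    # to make the end date inclusive, like pd.date_range.
    stop = end[valid].to_numpy().astype('datetime64[D]') + np.timedelta64(1, 'D')
    out.loc[valid] = np.busday_count(begin, stop)
    return out

result = business_days_between(td['a'], td['b'])
print(result)
```

Rows with a missing date come back as NaN instead of raising. Note that this counts both endpoints, so it will differ by one from a plain difference of two business-day periods.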

@vipatel10
Author

Thanks so much!


@vipatel10
Author

Actually, I just noticed the behavior below. Does it make sense to you? I would expect all three results to be 3, but one of them is 4.

In [475]: a = pd.Period(pd.Timestamp('2013-10-1'), freq='B')

In [476]: b = pd.Period(pd.Timestamp('2013-10-4'), freq='B')

In [477]: b - a
Out[477]: 3L

In [478]: b = pd.Period(pd.Timestamp('2013-10-5'), freq='B')

In [479]: b - a
Out[479]: 4L

In [480]: b = pd.Period(pd.Timestamp('2013-10-6'), freq='B')

In [481]: b - a
Out[481]: 3L


@vipatel10
Author

This I think is the problem:

In [485]: b = pd.Period(pd.Timestamp('2013-10-4'), freq='B')

In [486]: b
Out[486]: Period('2013-10-04', 'B')

In [487]: b = pd.Period(pd.Timestamp('2013-10-5'), freq='B')

In [488]: b
Out[488]: Period('2013-10-07', 'B')

In [489]: b = pd.Period(pd.Timestamp('2013-10-6'), freq='B')

In [490]: b
Out[490]: Period('2013-10-04', 'B')


@jreback
Contributor

jreback commented Oct 12, 2013

That is correct: business days do not fall on weekends, and 10/6 is a Sunday.

@vipatel10
Author

Right, but why does b = pd.Period(pd.Timestamp('2013-10-5'), freq='B') round up to 2013-10-07, while

b = pd.Period(pd.Timestamp('2013-10-6'), freq='B') rounds down to 2013-10-04?

@jreback
Contributor

jreback commented Oct 12, 2013

Might be a bug. I'll create an issue about it.
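
Until the rounding question is settled, one way to avoid depending on how `Period` treats weekend dates is to pick the rounding direction explicitly before counting. The following is only a sketch using NumPy's `np.busday_offset` with an offset of 0 and an explicit `roll`; it is not what `pd.Period` does internally.

```python
import numpy as np

# Friday, Saturday and Sunday from the examples above
dates = np.array(['2013-10-04', '2013-10-05', '2013-10-06'], dtype='datetime64[D]')

# An offset of 0 with an explicit roll snaps weekend dates to a business day
# and leaves dates that are already business days unchanged.
forward = np.busday_offset(dates, 0, roll='forward')
backward = np.busday_offset(dates, 0, roll='backward')

print(forward)   # ['2013-10-04' '2013-10-07' '2013-10-07']
print(backward)  # ['2013-10-04' '2013-10-04' '2013-10-04']
```

With either choice, Saturday and Sunday land on the same business day, so differences against a fixed start date come out consistent.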

@jreback jreback closed this as completed Oct 12, 2013
@vipatel10
Author

Thanks!


@jreback
Contributor

jreback commented Oct 12, 2013

see #5203
