When instantiating a Series with a DatetimeIndex but no data value (data=None), a very slow code path is used (involving Timestamp creation) compared with supplying a NaN value. The results are identical.
from datetime import datetime

import numpy as np
import pandas as pd

dr = pd.date_range(
    start=datetime(2015, 10, 26),
    end=datetime(2016, 1, 1),
    freq='10s'
)  # ~500k long
pd.Series(index=dr)  # slow, 2.78 s
pd.Series(np.nan, index=dr)  # fast, 0.01 s
pandas version 0.17.0
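To check the claim that both constructions give the same result, and to time them on another pandas version, a minimal sketch follows. The helper name `compare_constructors` is an assumption made for illustration and is not part of pandas. Timings depend on the machine and pandas version, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd


def compare_constructors(index):
    """Time both constructions and report whether they hold the same values.

    Hypothetical helper for illustration only; not part of pandas.
    """
    t0 = time.perf_counter()
    s_none = pd.Series(index=index)          # no data passed
    t1 = time.perf_counter()
    s_nan = pd.Series(np.nan, index=index)   # explicit NaN scalar
    t2 = time.perf_counter()

    # Both should be all-NaN over the same index. The dtype is deliberately
    # not compared, since the default may differ across pandas versions.
    same = bool(
        s_none.index.equals(s_nan.index)
        and s_none.isna().all()
        and s_nan.isna().all()
    )
    return same, t1 - t0, t2 - t1


dr = pd.date_range(start="2015-10-26", end="2016-01-01", freq="10s")
same, t_none, t_nan = compare_constructors(dr)
print(f"identical: {same}, index only: {t_none:.3f} s, NaN scalar: {t_nan:.3f} s")
```

Comparing only the index and the NaN mask keeps the check independent of whichever default dtype the installed pandas version picks.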
lenolib changed the title from "Series(index=DatetimeIndex(...)) very slow" to "Series(index=DatetimeIndex(...)) constructor very slow" on Oct 26, 2015
This patch will fix it. It needs an asv benchmark, though. Want to do a pull request?
diff --git a/pandas/core/series.py b/pandas/core/series.py
index 2fc90ef..9103cdf 100644
--- a/pandas/core/series.py
+++ b/pandas/core/series.py
@@ -169,19 +169,24 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
                         index = Index(data)
                     else:
                         index = Index(_try_sort(data))
+
                 try:
                     if isinstance(index, DatetimeIndex):
-                        # coerce back to datetime objects for lookup
-                        data = _dict_compat(data)
-                        data = lib.fast_multiget(data, index.astype('O'),
-                                                 default=np.nan)
+                        if len(data):
+                            # coerce back to datetime objects for lookup
+                            data = _dict_compat(data)
+                            data = lib.fast_multiget(data, index.astype('O'),
+                                                     default=np.nan)
+                        else:
+                            data = np.nan
                     elif isinstance(index, PeriodIndex):
-                        data = [data.get(i, nan) for i in index]
+                        data = [data.get(i, nan) for i in index] if len(data) else np.nan
+
                     else:
                         data = lib.fast_multiget(data, index.values,
                                                  default=np.nan)
                 except TypeError:
-                    data = [data.get(i, nan) for i in index]
+                    data = [data.get(i, nan) for i in index] if len(data) else np.nan

             elif isinstance(data, SingleBlockManager):
                 if index is None:
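The asv benchmark requested alongside the patch might look roughly like the sketch below. It relies on the asv convention that a benchmark is a plain class whose `setup` method runs untimed and whose `time_*` methods are timed. The class and method names are assumptions chosen for illustration, not names taken from the pandas benchmark suite.

```python
import numpy as np
import pandas as pd


class SeriesConstructorDatetimeIndex:
    """Hypothetical asv benchmark; asv times each method named time_*."""

    def setup(self):
        # Same range as in the report; built in setup so it is not timed.
        self.dr = pd.date_range(start="2015-10-26", end="2016-01-01", freq="10s")

    def time_index_only(self):
        # Path reported as slow: index given, no data.
        pd.Series(index=self.dr)

    def time_nan_scalar(self):
        # Reference path: explicit NaN scalar broadcast over the index.
        pd.Series(np.nan, index=self.dr)
```

Keeping both cases in one class lets the two timings be read side by side, so a regression in the index-only path would show up as a gap between them.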