Series(index=DatetimeIndex(...)) constructor very slow #11433

lenolib · 2015-10-26T13:42:56Z

When instantiating a Series with a DatetimeIndex but no data (data=None), a very slow code path is used (it involves Timestamp creation) compared with supplying a NaN value. The results are identical.

from datetime import datetime

import numpy as np
import pandas as pd

dr = pd.date_range(
    start=datetime(2015, 10, 26),
    end=datetime(2016, 1, 1),
    freq='10s'
)  # ~500k long
pd.Series(index=dr)  # slow, 2.78 s
pd.Series(np.nan, index=dr)  # fast, 0.01 s

pandas version 0.17.0
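
For anyone wanting to check the numbers on their own install, here is a minimal timing sketch. The helper names `time_call` and `compare_constructors` are made up for illustration and are not part of pandas. `dtype="float64"` is passed explicitly as an assumption, so that the comparison does not depend on version-specific default dtypes; the original report did not pass it.

```python
import time
from datetime import datetime

import numpy as np
import pandas as pd


def time_call(func):
    """Run func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def compare_constructors(index):
    """Time both constructor forms on the same index (hypothetical helper).

    Returns (seconds without data, seconds with NaN, whether results match).
    """
    # No data supplied: the code path reported as slow in this issue.
    no_data, t_no_data = time_call(lambda: pd.Series(index=index, dtype="float64"))
    # Scalar NaN supplied: the code path reported as fast.
    with_nan, t_with_nan = time_call(lambda: pd.Series(np.nan, index=index))
    return t_no_data, t_with_nan, no_data.equals(with_nan)


if __name__ == "__main__":
    dr = pd.date_range(
        start=datetime(2015, 10, 26),
        end=datetime(2016, 1, 1),
        freq="10s",
    )
    t_no_data, t_with_nan, identical = compare_constructors(dr)
    print(f"no data: {t_no_data:.3f} s, NaN: {t_with_nan:.3f} s, identical: {identical}")
```

A single run like this is noisy, so treat the output as a rough indication only. On versions that include the fix, the two timings should be close to each other.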


jreback · 2015-10-26T21:05:23Z

This patch will fix it. It needs an asv benchmark, though. Want to do a pull request?

diff --git a/pandas/core/series.py b/pandas/core/series.py
index 2fc90ef..9103cdf 100644
--- a/pandas/core/series.py
+++ b/pandas/core/series.py
@@ -169,19 +169,24 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
                         index = Index(data)
                     else:
                         index = Index(_try_sort(data))
+
                 try:
                     if isinstance(index, DatetimeIndex):
-                        # coerce back to datetime objects for lookup
-                        data = _dict_compat(data)
-                        data = lib.fast_multiget(data, index.astype('O'),
-                                                 default=np.nan)
+                        if len(data):
+                            # coerce back to datetime objects for lookup
+                            data = _dict_compat(data)
+                            data = lib.fast_multiget(data, index.astype('O'),
+                                                     default=np.nan)
+                        else:
+                            data = np.nan
                     elif isinstance(index, PeriodIndex):
-                        data = [data.get(i, nan) for i in index]
+                        data = [data.get(i, nan) for i in index] if len(data) else np.nan
+
                     else:
                         data = lib.fast_multiget(data, index.values,
                                                  default=np.nan)
                 except TypeError:
-                    data = [data.get(i, nan) for i in index]
+                    data = [data.get(i, nan) for i in index] if len(data) else np.nan

             elif isinstance(data, SingleBlockManager):
                 if index is None:
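
A rough sketch of what the asv benchmark mentioned above could look like. The class and method names are hypothetical and are not taken from the pandas benchmark suite; only the `setup` / `time_*` naming convention comes from asv itself. As an assumption, `dtype="float64"` is passed so the no-data case yields the same dtype on any pandas version.

```python
import numpy as np
import pandas as pd


class SeriesConstructorDatetimeIndex:
    """Hypothetical asv-style benchmark for the case in this issue."""

    def setup(self):
        # Same range as in the report: 10-second frequency, roughly 500k entries.
        self.dr = pd.date_range(
            start="2015-10-26",
            end="2016-01-01",
            freq="10s",
        )

    def time_no_data(self):
        # Constructor call without data, the slow case before the patch.
        pd.Series(index=self.dr, dtype="float64")

    def time_nan_scalar(self):
        # Baseline: scalar NaN broadcast over the same index.
        pd.Series(np.nan, index=self.dr)
```

asv calls `setup` before each timed method, so building the index is excluded from the measurement. Keeping the NaN case next to the no-data case makes a regression in either path easy to spot.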


lenolib changed the title ~~Series(index=DatetimeIndex(...)) very slow~~ Series(index=DatetimeIndex(...)) constructor very slow Oct 26, 2015

jreback added the Datetime, Performance, and Effort Low labels and removed the Effort Low label Oct 26, 2015

jreback added this to the Next Major Release milestone Oct 26, 2015

jreback added the Effort Low label Oct 26, 2015

lexual mentioned this issue Nov 13, 2015

PERF: Faster Series construction with no data and DatetimeIndex. #11598

Merged

jreback modified the milestones: 0.17.1, Next Major Release Nov 13, 2015

lexual added a commit to lexual/pandas that referenced this issue Nov 15, 2015

PERF: Faster Series construction with no data and DatetimeIndex.

20511dd

ref pandas-dev#11433 Code. taken from @jreback comment on pandas-dev#11433

jreback closed this as completed in #11598 Nov 15, 2015
