Series(index=DatetimeIndex(...)) constructor very slow #11433

Closed
lenolib opened this Issue Oct 26, 2015 · 1 comment

lenolib commented Oct 26, 2015

When a Series is instantiated with a DatetimeIndex but no data (data=None), a very slow code path is taken (it involves creating Timestamp objects), compared with the case where a NaN value is supplied. The results are identical.

from datetime import datetime

import numpy as np
import pandas as pd

dr = pd.date_range(
    start=datetime(2015, 10, 26),
    end=datetime(2016, 1, 1),
    freq='10s'
)  # ~500k long
pd.Series(index=dr)  # slow, 2.78 s
pd.Series(np.nan, index=dr)  # fast, 0.01 s

pandas version 0.17.0
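
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. The helper `time_call` is a hypothetical name introduced only for this illustration, and the numbers it prints depend on the machine and the pandas version, so they will not necessarily match the figures above. Recent pandas versions may also emit a warning for `pd.Series(index=dr)`.

```python
import time

import numpy as np
import pandas as pd


def time_call(fn, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls of fn.

    Hypothetical helper for this illustration; not part of pandas.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Same range as in the report: one timestamp every 10 seconds.
dr = pd.date_range(start="2015-10-26", end="2016-01-01", freq="10s")

t_index_only = time_call(lambda: pd.Series(index=dr))
t_nan_scalar = time_call(lambda: pd.Series(np.nan, index=dr))
print(f"Series(index=dr):         {t_index_only:.4f} s")
print(f"Series(np.nan, index=dr): {t_nan_scalar:.4f} s")

# Both constructions should yield an all-NaN Series of the same length.
no_data = pd.Series(index=dr)
nan_data = pd.Series(np.nan, index=dr)
assert len(no_data) == len(nan_data) == len(dr)
assert no_data.isna().all() and nan_data.isna().all()
```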

lenolib changed the title from Series(index=DatetimeIndex(...)) very slow to Series(index=DatetimeIndex(...)) constructor very slow Oct 26, 2015

jreback commented Oct 26, 2015

This patch will fix it. It needs an asv benchmark, though. Want to do a pull request?

diff --git a/pandas/core/series.py b/pandas/core/series.py
index 2fc90ef..9103cdf 100644
--- a/pandas/core/series.py
+++ b/pandas/core/series.py
@@ -169,19 +169,24 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
                         index = Index(data)
                     else:
                         index = Index(_try_sort(data))
+
                 try:
                     if isinstance(index, DatetimeIndex):
-                        # coerce back to datetime objects for lookup
-                        data = _dict_compat(data)
-                        data = lib.fast_multiget(data, index.astype('O'),
-                                                 default=np.nan)
+                        if len(data):
+                            # coerce back to datetime objects for lookup
+                            data = _dict_compat(data)
+                            data = lib.fast_multiget(data, index.astype('O'),
+                                                     default=np.nan)
+                        else:
+                            data = np.nan
                     elif isinstance(index, PeriodIndex):
-                        data = [data.get(i, nan) for i in index]
+                        data = [data.get(i, nan) for i in index] if len(data) else np.nan
+
                     else:
                         data = lib.fast_multiget(data, index.values,
                                                  default=np.nan)
                 except TypeError:
-                    data = [data.get(i, nan) for i in index]
+                    data = [data.get(i, nan) for i in index] if len(data) else np.nan

             elif isinstance(data, SingleBlockManager):
                 if index is None:
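
The asv benchmark mentioned above could look roughly like the sketch below. It follows the usual asv convention of a class with a `setup` method and `time_*` methods. The class and method names are assumptions made for this illustration, and this is not necessarily the benchmark that was eventually added to pandas.

```python
import numpy as np
import pandas as pd


class SeriesConstructorDatetimeIndex:
    """asv-style benchmark sketch (hypothetical class name).

    asv calls setup() before timing each time_* method.
    """

    def setup(self):
        # Same ~500k-element index as in the original report.
        self.dr = pd.date_range(
            start="2015-10-26", end="2016-01-01", freq="10s"
        )

    def time_index_only(self):
        # The slow path from the report: no data, only an index.
        pd.Series(index=self.dr)

    def time_nan_scalar(self):
        # The fast path from the report: NaN scalar plus the index.
        pd.Series(np.nan, index=self.dr)
```

With the patch applied, the two timings would be expected to end up in the same range, since the empty-data case then short-circuits to `np.nan`.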

jreback added this to the Next Major Release milestone Oct 26, 2015

jreback added the Effort Low label Oct 26, 2015

jreback modified the milestone: 0.17.1, Next Major Release Nov 13, 2015

jreback closed this in #11598 Nov 15, 2015
