Series(index=DatetimeIndex(...)) constructor very slow #11433

Closed
lenolib opened this issue Oct 26, 2015 · 1 comment · Fixed by #11598
Labels: Datetime (datetime data dtype), Performance (memory or execution speed performance)

@lenolib
Contributor

lenolib commented Oct 26, 2015

When a Series is instantiated with a DatetimeIndex but no data (data=None), a very slow code path is used (involving Timestamp creation), compared with supplying a NaN value. The results are identical.

from datetime import datetime

import numpy as np
import pandas as pd

dr = pd.date_range(
    start=datetime(2015, 10, 26),
    end=datetime(2016, 1, 1),
    freq='10s'
)  # ~500k long
pd.Series(index=dr)  # slow, 2.78 s
pd.Series(np.nan, index=dr)  # fast, 0.01 s

pandas version 0.17.0
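
For anyone who wants to reproduce the comparison, here is a minimal timing sketch. The helper `time_call` is hypothetical (not part of pandas), and `dtype="float64"` is passed explicitly as an assumption so the two calls build the same kind of Series on newer pandas releases, where the default dtype for a Series without data may differ. Absolute timings will vary by machine and pandas version.

```python
from datetime import datetime
import time

import numpy as np
import pandas as pd


def time_call(func, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


# Same range as in the report: 10-second frequency, more than 500k entries.
dr = pd.date_range(start=datetime(2015, 10, 26), end=datetime(2016, 1, 1), freq="10s")

# dtype is given explicitly (an assumption, not in the original report).
t_no_data = time_call(lambda: pd.Series(index=dr, dtype="float64"))
t_nan = time_call(lambda: pd.Series(np.nan, index=dr, dtype="float64"))

print(f"no data: {t_no_data:.4f} s, NaN scalar: {t_nan:.4f} s")
```

On a pandas version that includes the fix, the two timings should be of similar magnitude; on 0.17.0 the first call is the slow one described above.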

@lenolib lenolib changed the title from "Series(index=DatetimeIndex(...)) very slow" to "Series(index=DatetimeIndex(...)) constructor very slow" Oct 26, 2015
@jreback
Contributor

jreback commented Oct 26, 2015

This patch will fix it. It needs an asv benchmark, though. Want to do a pull request?

diff --git a/pandas/core/series.py b/pandas/core/series.py
index 2fc90ef..9103cdf 100644
--- a/pandas/core/series.py
+++ b/pandas/core/series.py
@@ -169,19 +169,24 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
                         index = Index(data)
                     else:
                         index = Index(_try_sort(data))
+
                 try:
                     if isinstance(index, DatetimeIndex):
-                        # coerce back to datetime objects for lookup
-                        data = _dict_compat(data)
-                        data = lib.fast_multiget(data, index.astype('O'),
-                                                 default=np.nan)
+                        if len(data):
+                            # coerce back to datetime objects for lookup
+                            data = _dict_compat(data)
+                            data = lib.fast_multiget(data, index.astype('O'),
+                                                     default=np.nan)
+                        else:
+                            data = np.nan
                     elif isinstance(index, PeriodIndex):
-                        data = [data.get(i, nan) for i in index]
+                        data = [data.get(i, nan) for i in index] if len(data) else np.nan
+
                     else:
                         data = lib.fast_multiget(data, index.values,
                                                  default=np.nan)
                 except TypeError:
-                    data = [data.get(i, nan) for i in index]
+                    data = [data.get(i, nan) for i in index] if len(data) else np.nan

             elif isinstance(data, SingleBlockManager):
                 if index is None:
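
The asv benchmark asked for above could look roughly like the sketch below. It relies only on the asv convention of a class with a `setup` method and `time_*` methods; the class name, method names, and the explicit `dtype="float64"` are assumptions for illustration, not the benchmark that was eventually merged.

```python
import numpy as np
import pandas as pd


class SeriesConstructorDatetimeIndex:
    """Hypothetical asv-style benchmark; all names here are illustrative."""

    def setup(self):
        # asv runs setup() before each timed method, so the index is
        # built outside the measured code.
        self.dr = pd.date_range(start="2015-10-26", end="2016-01-01", freq="10s")

    def time_series_no_data(self):
        # The slow path from the report: index supplied, no data.
        pd.Series(index=self.dr, dtype="float64")

    def time_series_nan_scalar(self):
        # The fast path used for comparison: scalar NaN broadcast over the index.
        pd.Series(np.nan, index=self.dr, dtype="float64")
```

The class can also be exercised by hand, without asv, by creating an instance, calling `setup()`, and then calling either `time_*` method.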

@jreback jreback added the Datetime, Performance, and Effort Low labels and removed the Effort Low label Oct 26, 2015
@jreback jreback added this to the Next Major Release milestone Oct 26, 2015
@jreback jreback modified the milestones: 0.17.1, Next Major Release Nov 13, 2015
lexual added a commit to lexual/pandas that referenced this issue Nov 15, 2015