PERF: pandas' import time #16764

Closed
chris-b1 opened this Issue Jun 23, 2017 · 9 comments

@chris-b1
Contributor

chris-b1 commented Jun 23, 2017

I wouldn't normally be concerned about this, as of course it only happens once, but our import time has gotten quite long, to the point that I notice it hanging my IPython startup.

I don't have a good sense of what would be required to improve this; probably deferring more imports so they happen just in time?

On 0.20.2, with each import in a separate process:

>>> import timeit
>>> timeit.timeit('import pandas', number=1)
1.0524120664765442
>>> quit()

>>> import timeit
>>> timeit.timeit('import numpy', number=1)
0.1550516492424085
>>> quit()

>>> import timeit
>>> timeit.timeit('import matplotlib', number=1)
0.24022248792225612

Below is a single process, importing the dependencies first:

>>> import timeit
>>> timeit.timeit('import matplotlib', number=1)
0.2508611853454641
>>> timeit.timeit('import numpy', number=1)
1.2033075643458346e-05
>>> timeit.timeit('import pandas', number=1)
0.840005673777485
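
The single-process numbers above show why each measurement needs a fresh interpreter: numpy drops to about 1.2e-05 s once matplotlib has already pulled it in. A minimal sketch of automating the separate-process timing, using only the standard library; the helper name `time_import` is hypothetical and not part of pandas:

```python
import subprocess
import sys


def time_import(module_name):
    """Time `import module_name` in a fresh interpreter so sys.modules starts cold."""
    snippet = (
        "import timeit; "
        "print(timeit.timeit('import {0}', number=1))".format(module_name)
    )
    # sys.executable keeps the child process on the same Python as the caller.
    # A module that cannot be imported makes the child exit non-zero, which
    # surfaces here as subprocess.CalledProcessError.
    output = subprocess.check_output([sys.executable, "-c", snippet])
    return float(output.decode().strip())
```

Calling `time_import('pandas')`, `time_import('numpy')` and `time_import('matplotlib')` one after another would give independent timings, though the results will still vary with disk cache state and the installed versions.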
@jreback

Contributor

jreback commented Jun 23, 2017

do these in independent processes

@chris-b1

Contributor

chris-b1 commented Jun 23, 2017

Oh, right - matplotlib one already was, updated the top for numpy.

@jreback

Contributor

jreback commented Jun 23, 2017

or better yet import numpy and matplotlib first
then run it

@TomAugspurger

Contributor

TomAugspurger commented Jun 23, 2017

We do attempt to import matplotlib at import time. We could delay that with something like

diff --git a/pandas/plotting/__init__.py b/pandas/plotting/__init__.py
index c3cbedb0f..8f98e297e 100644
--- a/pandas/plotting/__init__.py
+++ b/pandas/plotting/__init__.py
@@ -4,12 +4,6 @@ Plotting api
 
 # flake8: noqa
 
-try:  # mpl optional
-    from pandas.plotting import _converter
-    _converter.register()  # needs to override so set_xlim works with str/number
-except ImportError:
-    pass
-
 from pandas.plotting._misc import (scatter_matrix, radviz,
                                    andrews_curves, bootstrap_plot,
                                    parallel_coordinates, lag_plot,
diff --git a/pandas/plotting/_core.py b/pandas/plotting/_core.py
index 391fa377f..9821c89c4 100644
--- a/pandas/plotting/_core.py
+++ b/pandas/plotting/_core.py
@@ -37,12 +37,7 @@ from pandas.plotting._tools import (_subplots, _flatten, table,
                                     _get_xlim, _set_ticks_props,
                                     format_date_labels)
 
-
-if _mpl_ge_1_5_0():
-    # Compat with mp 1.5, which uses cycler.
-    import cycler
-    colors = mpl_stylesheet.pop('axes.color_cycle')
-    mpl_stylesheet['axes.prop_cycle'] = cycler.cycler('color', colors)
+_registered = False
 
 
 def _get_standard_kind(kind):
@@ -92,6 +87,7 @@ class MPLPlot(object):
                  secondary_y=False, colormap=None,
                  table=False, layout=None, **kwds):
 
+        self._setup()
         self.data = data
         self.by = by
 
@@ -175,6 +171,20 @@ class MPLPlot(object):
 
         self._validate_color_args()
 
+    def _setup(self):
+        global _registered
+        if not _registered:
+            from pandas.plotting import _converter
+            _converter.register()
+
+            if _mpl_ge_1_5_0():
+                # Compat with mp 1.5, which uses cycler.
+                import cycler
+                colors = mpl_stylesheet.pop('axes.color_cycle')
+                mpl_stylesheet['axes.prop_cycle'] = cycler.cycler('color', colors)
+
+            _registered = True
+
     def _validate_color_args(self):
         if 'color' not in self.kwds and 'colors' in self.kwds:
             warnings.warn(("'colors' is being deprecated. Please use 'color'"

That covers all the .plot methods. Would need a decorator or something to cover the plotting methods not attached to NDFrame.

@TomAugspurger

Contributor

TomAugspurger commented Jun 23, 2017

Looks like get_versions takes up about 25% of the import time for pandas.__init__.py; that could easily be delayed.

@TomAugspurger

Contributor

TomAugspurger commented Jun 23, 2017

Oh, sorry, I was thinking of show_versions, not get_versions. get_versions would be a bit harder to fix... I did try out https://github.com/pypa/setuptools_scm instead of versioneer, and it worked well. May be worth looking into.
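
If the version lookup itself turns out to be the slow part, one way to delay it is to compute `__version__` on first access. The sketch below relies on module-level `__getattr__` (PEP 562), which needs Python 3.7 or later, so treat it as an assumption about what a fix could look like rather than something pandas does. `_slow_get_version` is a hypothetical stand-in for versioneer's `get_versions()`, and the module is built in memory purely to keep the example self-contained.

```python
import types

calls = []


def _slow_get_version():
    # Hypothetical stand-in for an expensive lookup such as get_versions().
    calls.append(1)
    return "0.0.0+sketch"


pkg = types.ModuleType("pkg")  # in-memory module, for illustration only


def _module_getattr(name):
    # PEP 562: called only when normal attribute lookup on the module fails.
    if name == "__version__":
        version = _slow_get_version()
        pkg.__version__ = version  # cache so later lookups skip this hook
        return version
    raise AttributeError("module 'pkg' has no attribute {!r}".format(name))


pkg.__getattr__ = _module_getattr
```

In a real package the hook would simply be a function named `__getattr__` defined at the top level of `__init__.py`.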

@jorisvandenbossche

Member

jorisvandenbossche commented Jun 23, 2017

I did a profile with https://github.com/cournape/import-profiler

From a very quick skim:

  • s3fs (boto3) also takes a lot of time (140.4 of 786.7 ms). Can this maybe be delayed?
  • Do we need to import pytest? (43 ms) (This is in pandas.util._tester; I think we can easily move the pytest import inside the test function.)
  • xlsxwriter (22 ms) probably doesn't need to be imported on pandas import (I didn't look into it, but it is imported in the config files).

Not huge, but those two would already remove ca 20% of the import time.
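
As a quick check of that estimate, here is the arithmetic using only the figures quoted in the list above, assuming "those two" means s3fs and pytest (the variable names are just for illustration):

```python
total_ms = 786.7      # cumulative time for `import pandas`
s3fs_ms = 140.4       # s3fs (boto3)
pytest_ms = 43.0      # pytest via pandas.util._tester
xlsxwriter_ms = 22.0  # xlsxwriter

share_two = (s3fs_ms + pytest_ms) / total_ms
share_three = (s3fs_ms + pytest_ms + xlsxwriter_ms) / total_ms

print("s3fs + pytest: {:.1%}".format(share_two))
print("all three:     {:.1%}".format(share_three))
```

That works out to roughly 23% for s3fs plus pytest, and roughly 26% with xlsxwriter included.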

However, I don't see get_versions anywhere in there, so I'm not sure how reliable the results are.
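
Moving the pytest import inside the test function, as suggested in the list above, might look like the following. It is only a sketch: `run_test_suite` is a hypothetical name standing in for the function in pandas.util._tester, and the argument handling is simplified.

```python
def run_test_suite(extra_args=None):
    """Run the test suite, importing pytest only when this is actually called."""
    try:
        import pytest  # deferred: costs nothing at `import pandas` time
    except ImportError:
        raise ImportError("pytest is required to run the tests")
    args = list(extra_args) if extra_args else []
    return pytest.main(args)
```

The same pattern would apply to s3fs and xlsxwriter: import them inside the code path that needs them, so users who never touch S3 or Excel output never pay for them.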

Full output:

In [1]: from import_profiler import profile_import
   ...: 
   ...: with profile_import() as context:
   ...:     # Anything expensive in here
   ...:     import pandas
   ...: 

In [2]: context.print_info()
  cumtime (ms)    intime (ms)  name
         786.7           48.4  pandas
         196.9            3.3  +numpy
           1.8            1.8  ++_globals
           1.5            1.5  ++numpy.__config__
           1.9            1.9  ++version
           1.4            1.3  ++_import_tools
         156.9            0    ++
         156.9            3.4  +++numpy.add_newdocs
         150.9            0.8  ++++numpy.lib
         101.6            0.7  +++++type_check
          96.6            2.9  ++++++numpy.core.numeric
          93.7            1.2  +++++++numpy.core
          19              0.1  ++++++++
          18.9           18.8  +++++++++numpy.core.multiarray
           2.4            0.1  ++++++++
           2.3            2.2  +++++++++numpy.core.umath
          17.5            0    ++++++++
          17.5            2.7  +++++++++numpy.core._internal
           2.5            0.6  ++++++++++numpy.compat
           1              0    +++++++++++
           1              0.9  ++++++++++++numpy.compat._inspect
           6.2            3.4  ++++++++++ctypes
           1.5            1.5  +++++++++++_ctypes
           6              3.4  ++++++++++numerictypes
           2.3            2.2  +++++++++++numbers
           9              0    ++++++++
           8.9            2.2  +++++++++numpy.core.numeric
           6.3            1.5  ++++++++++arrayprint
           4.7            1    +++++++++++fromnumeric
           3.5            0    ++++++++++++
           3.4            3.2  +++++++++++++numpy.core._methods
           2.1            0    ++++++++
           2.1            1.7  +++++++++numpy.core.defchararray
           1.3            0    ++++++++
           1.2            1.1  +++++++++numpy.core.records
          36.4            0.1  ++++++++numpy.testing.nosetester
          36.3            0.5  +++++++++numpy.testing
          22.6            0.7  ++++++++++unittest
           3.2            0.8  +++++++++++result
           2.3            0    ++++++++++++
           2.2            2.1  +++++++++++++unittest.util
           3.7            3.4  +++++++++++case
           4.9            4.8  +++++++++++suite
           6.4            6.2  +++++++++++loader
           3.7            1.1  +++++++++++main
           2.5            0    ++++++++++++
           2.4            1.4  +++++++++++++unittest.runner
          13.1            0    ++++++++++
          13.1            2.1  +++++++++++numpy.testing.decorators
          10.9            2.2  ++++++++++++utils
           5              4.9  +++++++++++++nosetester
           3.5            3.3  +++++++++++++numpy.lib.utils
           4.2            4.1  ++++++ufunclike
          27.3            2.5  +++++index_tricks
          15.7            0    ++++++
          15.6           11.2  +++++++numpy.lib.function_base
           3.9            3.8  ++++++++numpy.lib.twodim_base
           6.8            2.7  ++++++numpy.matrixlib
           3.9            3.8  +++++++defmatrix
           2.2            2.2  ++++++numpy.lib.stride_tricks
           1.4            1.3  +++++nanfunctions
           7.1            1.4  +++++polynomial
           1              1    ++++++numpy.lib.twodim_base
           4.5            0.6  ++++++numpy.linalg
           3.4            1.1  +++++++linalg
           2.1            0    ++++++++numpy.linalg
           1.2            1.1  +++++++++numpy.linalg._umath_linalg
           4.6            1.5  +++++npyio
           1.1            1    ++++++_iotools
           2.6            2.5  +++++financial
           3.5            0    ++
           3.4            0.5  +++numpy.fft
           1.6            0.6  ++++fftpack
          10              0    ++
           9.9            0.6  +++numpy.polynomial
           3.4            1.4  ++++polynomial
           1.1            1    ++++chebyshev
           1              0.9  ++++legendre
           1.1            1    ++++hermite
           1.3            1.1  ++++hermite_e
           1.3            1.1  ++++laguerre
           7              0    ++
           7              1.2  +++numpy.random
           4.7            4.6  ++++mtrand
           2.2            0    ++
           2.2            2.1  +++numpy.ctypeslib
           6.9            0    ++
           6.8            0.6  +++numpy.ma
           4.7            0    ++++
           4.6            4.4  +++++numpy.ma.core
           1.5            0    ++++
           1.5            1.3  +++++numpy.ma.extras
           5.6            1.9  +pytz
           1.4            0.7  ++pytz.lazy
          25.5            0.8  +pandas.compat.numpy
          24.7            1.1  ++pandas.compat
           1.9            1.4  +++distutils.version
          14.6            2    +++http.client
           2              2    ++++http
          10.5            4.3  ++++ssl
           4              4    +++++ipaddress
           1.7            1.7  +++++_ssl
           5.3            0    +++dateutil
           5.2            1.6  ++++dateutil.parser
           2.9            0    +++++
           2.9            0.4  ++++++dateutil.tz
           2.5            1.5  +++++++tz
          26.2            0.4  +pandas._libs
          11.8           10.6  ++tslib
          14              1.9  ++pandas._libs.hashtable
          11.3            6.8  +++pandas._libs.lib
           2.7            2.6  ++++_decimal
          30.6            1.8  +pandas.core.config_init
           4.6            2.3  ++pandas.core.config
           2.1            0.5  +++pandas.io.formats.printing
          22.6            0.4  ++xlsxwriter
          22.2            1    +++workbook
           2.5            0.3  ++++compatibility
           1.7            1.7  +++++fractions
           9.3            5    ++++xlsxwriter.worksheet
           3.1            0.6  +++++drawing
           1.8            1.8  ++++++shape
           4              0.4  ++++xlsxwriter.packager
           1.6            0.3  ++++xlsxwriter.chart_area
           1.3            0    +++++
           1.3            1.2  ++++++xlsxwriter.chart
         369.9            0.4  +pandas.core.api
           9.9            1    ++pandas.core.algorithms
           6              0.6  +++pandas.core.dtypes.cast
           5              0.6  ++++common
           2.2            0    +++++pandas._libs
           2.2            2    ++++++pandas._libs.algos
           1.3            1.3  +++++dtypes
           2.7            0    +++pandas.core
           2.7            0.8  ++++pandas.core.common
           1.4            0.2  +++++pandas.api
           1.2            0.4  ++++++pandas.api.types
          15.4            1.1  ++pandas.core.categorical
          13.6            1.4  +++pandas.core.base
           2.9            0.3  ++++pandas.util._validators
           2.6            0.3  +++++pandas.util
           1.7            0.5  ++++++pandas.core.util.hashing
           7.6            0.8  ++++pandas.core.nanops
           6.7            0.4  +++++bottleneck
           1.7            0    ++++++
           1.7            0.4  +++++++bottleneck.slow
           1              1    ++++++reduce
           1              0.5  ++++++bottleneck.benchmark.bench
           1.7            0    ++++pandas.compat.numpy
           1.7            1.6  +++++pandas.compat.numpy.function
         330.5            6.6  ++pandas.core.groupby
          81.5            0.3  +++pandas.core.index
          81.2            0.6  ++++pandas.core.indexes.api
          27.1            2.8  +++++pandas.core.indexes.base
           4.9            0    ++++++pandas._libs
           2.3            1.8  +++++++pandas._libs.index
           2.6            1.9  +++++++pandas._libs.join
          16.2            0.9  ++++++pandas.core.ops
          15.1            0.6  +++++++pandas.core.computation.expressions
          14.5            0.3  ++++++++pandas.core.computation
          14.1            0.6  +++++++++numexpr
           4.6            4.6  ++++++++++cpuinfo
           4.7            1.2  ++++++++++numexpr.expressions
           3.5            0    +++++++++++numexpr
           3.4            3.3  ++++++++++++numexpr.interpreter
           1.4            1    ++++++++++numexpr.necompiler
           2.1            0.2  ++++++++++numexpr.tests
           1.8            1.7  +++++++++++numexpr.tests.test_numexpr
           2.4            2.2  ++++++pandas.core.strings
           2.2            2    +++++pandas.core.indexes.category
          32.4           32.2  +++++pandas.core.indexes.multi
           1.4            1.3  +++++pandas.core.indexes.interval
           1.6            1.4  +++++pandas.core.indexes.numeric
           1              0.9  +++++pandas.core.indexes.range
          11.5            1    +++++pandas.core.indexes.timedeltas
           6.3            1.8  ++++++pandas.tseries.frequencies
           4.1            2.6  +++++++pandas.tseries.offsets
           1.1            0.7  ++++++++pandas.core.tools.datetimes
           3.5            1    ++++++pandas.core.indexes.datetimelike
           2.2            1.6  +++++++pandas._libs.period
           3              1.2  +++++pandas.core.indexes.period
           1.6            1.4  ++++++pandas.core.indexes.datetimes
         235.6            7.2  +++pandas.core.frame
         161.6            3.7  ++++pandas.core.generic
           1.3            1.1  +++++pandas.core.indexing
           7.1            3.4  +++++pandas.core.internals
           3.4            1.1  ++++++pandas.core.sparse.array
           1.9            1.7  +++++++pandas._libs.sparse
         149.3            1.5  +++++pandas.io.formats.format
         147.2            0.7  ++++++pandas.io.common
           1.3            0.6  +++++++csv
         140.4            0.4  +++++++s3fs
         139.5            0.9  ++++++++core
         128              0.5  +++++++++boto3
         127.5            0.4  ++++++++++boto3.session
         116.9            0.7  +++++++++++botocore.session
          57.3            0.3  ++++++++++++botocore.configloader
           3.1            0    +++++++++++++six.moves
           3.1            3    ++++++++++++++configparser
          53.8            1.7  +++++++++++++botocore.exceptions
          52.1            0    ++++++++++++++botocore.vendored.requests.exceptions
          52              0.6  +++++++++++++++botocore.vendored.requests
          25.6            0.7  ++++++++++++++++packages.urllib3.contrib
          22.7            0.1  +++++++++++++++++botocore.vendored.requests.packages.urllib3
          22.6            0.4  ++++++++++++++++++botocore.vendored.requests.packages
          22.2            0    +++++++++++++++++++
          22.2            0.7  ++++++++++++++++++++botocore.vendored.requests.packages.urllib3
          20.2            0.7  +++++++++++++++++++++connectionpool
           1.1            1.1  ++++++++++++++++++++++exceptions
           3.8            0.4  ++++++++++++++++++++++connection
           3.3            0    +++++++++++++++++++++++util.ssl_
           3.3            0.2  ++++++++++++++++++++++++botocore.vendored.requests.packages.urllib3.util
           1.1            1    +++++++++++++++++++++++++url
          11.1            0.3  ++++++++++++++++++++++request
          10.8            0.3  +++++++++++++++++++++++filepost
           9.8            9.4  ++++++++++++++++++++++++uuid
           1.6            0.8  ++++++++++++++++++++++response
           1.1            0.9  +++++++++++++++++++++poolmanager
           2.2            1.5  +++++++++++++++++botocore.vendored.requests.packages.urllib3.contrib.pyopenssl
          20.8            0    ++++++++++++++++
          20.8            0.7  +++++++++++++++++botocore.vendored.requests.utils
           3.3            0.8  ++++++++++++++++++cgi
           2.5            0.7  +++++++++++++++++++html
           1.7            1.7  ++++++++++++++++++++html.entities
          13.6            0.4  ++++++++++++++++++compat
           4              3.2  +++++++++++++++++++urllib.request
           5.2            0    +++++++++++++++++++http
           5.2            5.1  ++++++++++++++++++++http.cookiejar
           3.3            3.3  +++++++++++++++++++http.cookies
           1.2            1.1  ++++++++++++++++++cookies
           2.3            0.8  ++++++++++++++++models
           1.1            0.5  +++++++++++++++++auth
           2.4            0.4  ++++++++++++++++api
           2.1            0    +++++++++++++++++
           2              1.1  ++++++++++++++++++botocore.vendored.requests.sessions
          12.4            2.2  ++++++++++++botocore.credentials
           8.8            0.8  +++++++++++++botocore.compat
           3.3            0    ++++++++++++++botocore.vendored
           3.2            3.1  +++++++++++++++botocore.vendored.six
           4.2            0.6  ++++++++++++++xml.etree.cElementTree
           3              1.1  +++++++++++++++xml.etree.ElementTree
           1.2            1.1  +++++++++++++botocore.utils
          41.8            1    ++++++++++++botocore.client
          29.6            0    +++++++++++++botocore
          29.6            1    ++++++++++++++botocore.waiter
           9.4            0.8  +++++++++++++++jmespath
           8.6            0.1  ++++++++++++++++jmespath
           8.6            2.1  +++++++++++++++++jmespath.parser
           3.4            0.1  ++++++++++++++++++jmespath
           3.3            1.1  +++++++++++++++++++jmespath.lexer
           2.2            1.3  ++++++++++++++++++++jmespath.exceptions
           2.3            0    ++++++++++++++++++jmespath
           2.3            0.8  +++++++++++++++++++jmespath.visitor
           1.4            0    ++++++++++++++++++++jmespath
           1.4            1.3  +++++++++++++++++++++jmespath.functions
          19.2            0.5  +++++++++++++++botocore.docs.docstring
          18.6            0.4  ++++++++++++++++botocore.docs
          18.1            0.7  +++++++++++++++++botocore.docs.service
           1.9            1.8  ++++++++++++++++++botocore.docs.utils
           3.3            0.5  ++++++++++++++++++botocore.docs.client
           2.1            0.5  +++++++++++++++++++botocore.docs.method
          10.9            0.8  ++++++++++++++++++botocore.docs.bcdoc.restdoc
           7.3            0.8  +++++++++++++++++++botocore.docs.bcdoc.docstringparser
           6.5            4.9  ++++++++++++++++++++html.parser
           1.6            1.5  +++++++++++++++++++++_markupbase
           2.3            2.2  +++++++++++++++++++botocore.docs.bcdoc.style
           1.4            0.9  +++++++++++++botocore.auth
           1.2            1    +++++++++++++botocore.awsrequest
           1.4            1.3  +++++++++++++botocore.hooks
           5.6            0.3  +++++++++++++botocore.args
           1.3            0.6  ++++++++++++++botocore.serialize
           3.3            0.3  ++++++++++++++botocore.config
           3              0.8  +++++++++++++++botocore.endpoint
           1.3            0.3  ++++++++++++++++botocore.response
           2.6            0    ++++++++++++botocore
           2.5            1.4  +++++++++++++botocore.handlers
           1              1    +++++++++++boto3.utils
           8.5            0.7  +++++++++++resources.factory
           6.6            0.4  ++++++++++++action
           4.7            0.4  +++++++++++++boto3.docs.docstring
           4.2            0.2  ++++++++++++++boto3.docs
           4              0.5  +++++++++++++++boto3.docs.service
           3.1            0.5  ++++++++++++++++boto3.docs.resource
           1.1            0.3  +++++++++++++++++boto3.docs.action
           9.9            1.1  +++++++++boto3.s3.transfer
           8.4            0.3  ++++++++++concurrent
           8.1            0.3  +++++++++++concurrent.futures
           1.4            1.3  ++++++++++++concurrent.futures._base
           6              0.7  ++++++++++++concurrent.futures.process
           2.3            0.4  +++++++++++++multiprocessing
           1.9            0    ++++++++++++++
           1.8            0.8  +++++++++++++++multiprocessing.context
           2.9            1    +++++++++++++multiprocessing.connection
           1.3            0    +++++++py.path
           1.3            0.8  ++++++++py
           3              1.9  +++++++py._path.local
          58.5            5.2  ++++pandas.core.series
           5.8            0    +++++pandas.core
           5.8            3.6  ++++++pandas.core.window
           2              1.9  +++++++pandas._libs.window
          46.3            0    +++++pandas.plotting._core
          46.2            0.6  ++++++pandas.plotting
          41              0    +++++++pandas.plotting
          40.9            1.3  ++++++++pandas.plotting._converter
          23.7            0.5  +++++++++matplotlib.units
          23.2            9    ++++++++++matplotlib
           2.5            1.5  +++++++++++distutils.sysconfig
           1              1    ++++++++++++errors
           2.9            2    +++++++++++matplotlib.cbook
           7.1            1.2  +++++++++++matplotlib.rcsetup
           2.3            2.3  ++++++++++++matplotlib.fontconfig_pattern
           2.9            1.5  ++++++++++++matplotlib.colors
           1.3            1.3  +++++++++++++_color_data
          15.3            1.3  +++++++++matplotlib.dates
           1.1            0.9  ++++++++++dateutil.rrule
          12.7            1.8  ++++++++++matplotlib.ticker
          10.9            0    +++++++++++matplotlib
          10.8            8.6  ++++++++++++matplotlib.transforms
           1.1            1    +++++++++++++path
           1.6            0.5  +++++++pandas.plotting._misc
           3              2.8  +++++++pandas.plotting._core
           7.8            0.4  ++++pandas.core.computation.eval
           6.4            3.9  +++++pandas.core.computation.expr
           1.6            0.7  ++++++pandas.core.computation.ops
           4.7            4.4  +++pandas.core.panel
           1.3            0    +++pandas._libs
           1.3            1.2  ++++pandas._libs.groupby
           2.1            1.8  ++pandas.core.panel4d
           8.9            0.9  ++pandas.core.reshape.reshape
           7              0.3  +++pandas.core.sparse.api
           3.2            2.2  ++++pandas.core.sparse.series
           3              2.9  ++++pandas.core.sparse.frame
           1.8            1.6  ++pandas.core.resample
           1.6            0.4  +pandas.stats.api
           1              1    ++pandas.stats.moments
           2.8            0.2  +pandas.core.reshape.api
           1.1            0.8  ++pandas.core.reshape.merge
          29              0.3  +pandas.io.api
           5.9            2.7  ++pandas.io.parsers
           2.6            2    +++pandas._libs.parsers
           3              1.5  ++pandas.io.excel
           1.1            1    +++pandas._libs.json
           5.8            3.5  ++pandas.io.pytables
           2              1.9  +++pandas.core.computation.pytables
           2.4            0.5  ++pandas.io.json
           1.8            1    +++json
           2              1.8  ++pandas.io.stata
           5.3            0.7  ++pandas.io.packers
           3.8            1.1  +++pandas.io.msgpack
          43.9            0.2  +pandas.util._tester
          43.7            6.1  ++pytest
           8.8            1.3  +++_pytest.config
           2.9            0.3  ++++_pytest._code
           2              1.2  +++++code
           2              0.4  ++++_pytest.hookspec
           1.5            0.2  +++++_pytest._pluggy
           1.3            1.1  ++++++_pytest.vendored_packages.pluggy
           2.3            0.5  ++++_pytest.assertion
           1.1            0    +++++_pytest.assertion
           1.1            1    ++++++_pytest.assertion.rewrite
           1.9            0.9  +++_pytest.main
           5.7            1.3  +++_pytest.python
           4.3            0    ++++_pytest
           4.3            1.8  +++++_pytest.fixtures
           1.8            1.1  ++++++py._code.code
           1.2            0.5  +++_pytest.unittest
           2.5            0.9  +++_pytest.capture
           1.4            0.7  ++++py._io.capture
           1              0.4  +++_pytest.tmpdir
          10.1            9.1  +++_pytest.junitxml
           3.3            0.3  +pandas.testing
           3              1.7  ++pandas.util.testing
           1.1            0    +++pandas._libs
           1.1            0.9  ++++pandas._libs.testing
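
One possible way to cross-check a table like the one above is CPython's built-in `-X importtime` option. It is only available on Python 3.7 and later, so this sketch assumes a newer interpreter than the one profiled here. The helper name `import_timings` is hypothetical. It runs the import in a fresh interpreter, so modules that are already loaded do not hide any cost.

```python
import subprocess
import sys


def import_timings(module):
    """Return (cumulative_us, self_us, name) rows for importing `module`
    in a fresh interpreter, slowest first.

    Assumes Python 3.7+, where `-X importtime` writes one line per
    imported module to stderr.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", "import " + module],
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue  # unrelated stderr output
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us, cumulative_us = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # the column header line
        rows.append((cumulative_us, self_us, parts[2].strip()))
    return sorted(rows, reverse=True)
```

Calling `import_timings("pandas")[:10]` would then list the ten imports with the largest cumulative time, in microseconds rather than the milliseconds shown above.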
@rockg

rockg Jun 23, 2017

Contributor

Also see #7282, but seems like already more attention here.

@jorisvandenbossche

jorisvandenbossche Jun 23, 2017

Member

Also see #7282, but seems like already more attention here.

It's a slightly different issue: this one is about reducing import time in general, while the other is about a specific case where the import takes many seconds (but numpy also takes seconds to import there, so IMO it's not a pandas-specific issue).

@TomAugspurger TomAugspurger added this to the Next Major Release milestone Jul 12, 2017

@jreback jreback modified the milestones: Next Major Release, 0.21.0 Oct 2, 2017

jreback added a commit that referenced this issue Oct 2, 2017

kchomski-reef added a commit to reef-technologies/pandas that referenced this issue Oct 16, 2017

alanbato added a commit to alanbato/pandas that referenced this issue Nov 10, 2017

No-Stream added a commit to No-Stream/pandas that referenced this issue Nov 28, 2017
