BUG: Subsequent calls to df.sub() are much faster than the first call #34297

Closed
2 of 3 tasks
philippegr opened this issue May 21, 2020 · 8 comments · Fixed by #34354
Labels
MultiIndex; Numeric Operations (Arithmetic, Comparison, and Logical operations); Performance (Memory or execution speed performance)

@philippegr

philippegr commented May 21, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy as np
import pandas as pd

# Building some general structure 
my_date_range = pd.date_range('20200101 00:00', '20200102 0:00', freq='S')
level_0_names = list(str(i) for i in range(30))
#level_0_names = list(range(30))
index = pd.MultiIndex.from_product([level_0_names, my_date_range])
column_names = ['col_1', 'col_2']

# Building a df that represents some value over time (think sensors)
# Indexed by sensor and time 
value_df = pd.DataFrame(np.random.rand(len(index),2), index=index, columns=column_names)

# Build a reference df for the reference value the sensor can take (like its max)
# Indexed by sensor
ref_df = pd.DataFrame(np.random.randint(1, 10, (len(level_0_names), 2)), 
                   index = level_0_names, 
                   columns=column_names)

# For each time index in value_df, we now want the deviation of the observed value from the ref value

# In a notebook, this first execution will be slow: 8-10s on my machine
# %%time 
value_df.sub(ref_df, level=0)

# This second execution will be fast: 100-150ms 
# %%time 
value_df.sub(ref_df, level=0)

# For reference, the computation itself is NOT the problem: the following lines produce the same output
# On my machine this takes ~2s
# %%time 
same_w_merge = pd.merge(left = value_df.reset_index(level=1), right = ref_df, right_index=True, left_index=True) 
same_w_merge['col_1_x'] -= same_w_merge['col_1_y']
same_w_merge['col_2_x'] -= same_w_merge['col_2_y']
same_w_merge = same_w_merge.drop(columns = ['col_1_y', 'col_2_y'])
# rename() needs columns=...; a bare mapping is applied to the index labels instead
same_w_merge = same_w_merge.rename(columns={'col_1_x': 'col_1', 'col_2_x': 'col_2'})
same_w_merge = same_w_merge.set_index('level_1', append=True).sort_index()

Problem description

There is a significant difference in speed between the first and second calls to sub in the code above, even though they are the same instruction. I don't understand where this comes from, and in particular why the first call is notably slower than the merge-based approach (whose performance remains consistent).

Upon investigation, I noticed that the difference between runs is much smaller if value_df.index.levels[0] is of integer type (80 ms for the first run, 60 ms for subsequent ones).
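
To make the comparison above easier to repeat outside a notebook, here is a minimal sketch that times two consecutive calls to sub on a scaled-down version of the frames and checks the result against a plain NumPy computation. The helpers build_frames and timed_sub, the frame sizes, and the seed are assumptions made for illustration; they are not part of the original report. At this size the timing gap will be tiny, so increase n_sensors and n_steps to approach the reported numbers.

```python
import time

import numpy as np
import pandas as pd


def build_frames(n_sensors=3, n_steps=5, seed=0):
    """Scaled-down stand-ins for value_df and ref_df (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    times = pd.date_range("2020-01-01", periods=n_steps, freq=pd.Timedelta(seconds=1))
    sensors = [str(i) for i in range(n_sensors)]
    index = pd.MultiIndex.from_product([sensors, times])
    cols = ["col_1", "col_2"]
    value_df = pd.DataFrame(rng.rand(len(index), 2), index=index, columns=cols)
    ref_df = pd.DataFrame(rng.randint(1, 10, (n_sensors, 2)), index=sensors, columns=cols)
    return value_df, ref_df


def timed_sub(value_df, ref_df):
    """Run value_df.sub(ref_df, level=0) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = value_df.sub(ref_df, level=0)
    return result, time.perf_counter() - start


value_df, ref_df = build_frames()
first, t_first = timed_sub(value_df, ref_df)
second, t_second = timed_sub(value_df, ref_df)
print(f"first call: {t_first:.4f}s, second call: {t_second:.4f}s")

# Reference result: look up each row's sensor in ref_df and subtract with NumPy.
level0 = value_df.index.get_level_values(0)
expected = value_df.to_numpy() - ref_df.loc[level0].to_numpy()
print(np.allclose(first.reindex(value_df.index).to_numpy(), expected))
```

Both calls return the same frame; only the elapsed time is expected to differ.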

Expected Output

The current output is correct; the speed of the first call is the issue here.

Output of pd.show_versions()

Bug reproduced here on a conda / OS X install for simplicity, but I can confirm it also exists in an Ubuntu-based Docker container.

INSTALLED VERSIONS

commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_CA.UTF-8
LOCALE : en_CA.UTF-8

pandas : 1.0.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.4.0.post20200518
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

philippegr added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on May 21, 2020
@TomAugspurger
Contributor

Is it from allocating the hashtable for the index(es)?

Can you profile the first and subsequent calls with snakeviz and see what differs?
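
One possible way to capture the two profiles, sketched with the standard-library cProfile module on a smaller frame. The helper name profile_call, the frame sizes, and the output file names are assumptions for illustration. The dumped .prof files can then be opened with the snakeviz command-line tool if it is installed.

```python
import cProfile
import os
import pstats
import tempfile

import numpy as np
import pandas as pd


def profile_call(func, path):
    """Profile a zero-argument callable, dump stats to path, return pstats.Stats."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    profiler.dump_stats(path)
    return pstats.Stats(path)


# Smaller frames than in the report so the sketch runs quickly (assumed sizes).
times = pd.date_range("2020-01-01", periods=1000, freq=pd.Timedelta(seconds=1))
sensors = [str(i) for i in range(5)]
index = pd.MultiIndex.from_product([sensors, times])
cols = ["col_1", "col_2"]
value_df = pd.DataFrame(np.random.rand(len(index), 2), index=index, columns=cols)
ref_df = pd.DataFrame(np.random.randint(1, 10, (len(sensors), 2)), index=sensors, columns=cols)

out_dir = tempfile.mkdtemp()
first_path = os.path.join(out_dir, "first_call.prof")
second_path = os.path.join(out_dir, "second_call.prof")

first_stats = profile_call(lambda: value_df.sub(ref_df, level=0), first_path)
second_stats = profile_call(lambda: value_df.sub(ref_df, level=0), second_path)

print("function calls, first call: ", first_stats.total_calls)
print("function calls, second call:", second_stats.total_calls)
print("to inspect: snakeviz", first_path)
```

Comparing the two files side by side should show which functions appear only in the first call.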

@philippegr
Author

philippegr commented May 21, 2020

Output of %prun for first call

        12965040 function calls (12964951 primitive calls) in 11.296 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  2592030    7.151    0.000    8.919    0.000 datetimes.py:476(<lambda>)
        1    1.316    1.316    1.316    1.316 {pandas._libs.lib.fast_zip}
  2592035    0.917    0.000    1.472    0.000 datetimes.py:500(tz)
        1    0.784    0.784    9.704    9.704 {pandas._libs.lib.map_infer}
  2592038    0.299    0.000    0.299    0.000 datetimes.py:478(dtype)
  2592036    0.296    0.000    0.296    0.000 datetimelike.py:969(freq)
2592487/2592481    0.256    0.000    0.256    0.000 {built-in method builtins.getattr}
        1    0.039    0.039    0.039    0.039 {built-in method _operator.sub}
        5    0.035    0.007   11.118    2.224 base.py:853(_ndarray_values)
        1    0.034    0.034    0.034    0.034 {pandas._libs.algos.take_2d_axis0_float64_float64}
       13    0.034    0.003    0.034    0.003 {pandas._libs.algos.ensure_int64}
        3    0.032    0.011    0.032    0.011 {pandas._libs.algos.take_1d_int64_int64}
        1    0.031    0.031    0.031    0.031 {pandas._libs.algos.take_2d_axis0_int64_int64}
       17    0.016    0.001    0.016    0.001 {built-in method numpy.empty}
        1    0.011    0.011   11.296   11.296 <string>:1(<module>)
        5    0.011    0.002    0.011    0.002 {pandas._libs.algos.take_1d_object_object}
        1    0.006    0.006    0.050    0.050 base.py:3445(_join_level)
        2    0.004    0.002    0.061    0.030 multi.py:1535(_get_level_values)
        1    0.004    0.004   11.242   11.242 generic.py:8408(align)
        4    0.003    0.001    0.003    0.001 {built-in method numpy.arange}
       10    0.002    0.000    0.161    0.016 algorithms.py:1565(take_nd)
        1    0.002    0.002    0.002    0.002 {pandas._libs.algos.ensure_int8}
        1    0.002    0.002   11.083   11.083 multi.py:1339(values)
      2/1    0.002    0.001    0.052    0.052 base.py:3243(join)
        2    0.001    0.001    0.001    0.001 {built-in method numpy.core._multiarray_umath.implement_array_function}
        5    0.001    0.000    0.001    0.000 {method 'reduce' of 'numpy.ufunc' objects}
      737    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
       13    0.000    0.000    0.000    0.000 {built-in method posix.stat}
    42/40    0.000    0.000    0.000    0.000 {built-in method numpy.array}
      278    0.000    0.000    0.000    0.000 generic.py:10(_check)
       10    0.000    0.000    0.000    0.000 algorithms.py:1436(_get_take_nd_function)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:914(get_data)
       32    0.000    0.000    0.000    0.000 _dtype.py:333(_name_get)
      3/1    0.000    0.000   11.296   11.296 {built-in method builtins.exec}
       55    0.000    0.000    0.000    0.000 common.py:1565(is_extension_array_dtype)
        4    0.000    0.000    0.000    0.000 cast.py:347(maybe_promote)
  248/178    0.000    0.000    0.000    0.000 {built-in method builtins.len}
       74    0.000    0.000    0.000    0.000 common.py:1708(_is_dtype_type)
        9    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1356(find_spec)
        3    0.000    0.000    0.000    0.000 managers.py:212(_rebuild_blknos_and_blklocs)
       60    0.000    0.000    0.000    0.000 dtypes.py:75(find)
        2    0.000    0.000    0.000    0.000 {built-in method marshal.loads}
      3/2    0.000    0.000    0.000    0.000 base.py:276(__new__)
      100    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        4    0.000    0.000    0.001    0.000 series.py:183(__init__)
        4    0.000    0.000    0.000    0.000 base.py:1588(is_monotonic_increasing)
        4    0.000    0.000    0.001    0.000 generic.py:5519(dtypes)
        5    0.000    0.000    0.000    0.000 {pandas._libs.lib.infer_dtype}
       59    0.000    0.000    0.000    0.000 base.py:247(is_dtype)
      264    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
       31    0.000    0.000    0.000    0.000 common.py:1844(pandas_dtype)
        2    0.000    0.000    0.002    0.001 dispatch.py:48(should_series_dispatch)
        3    0.000    0.000    0.000    0.000 {method 'get_indexer' of 'pandas._libs.index.IndexEngine' objects}
        2    0.000    0.000    0.068    0.034 generic.py:4584(_reindex_with_indexers)
        1    0.000    0.000   11.239   11.239 generic.py:8491(_align_frame)
        2    0.000    0.000    0.068    0.034 blocks.py:1271(take_nd)
       24    0.000    0.000    0.000    0.000 {pandas._libs.lib.is_list_like}
        3    0.000    0.000    0.000    0.000 blocks.py:343(ftype)
       32    0.000    0.000    0.000    0.000 _dtype.py:319(_name_includes_bit_suffix)
       17    0.000    0.000    0.000    0.000 frozen.py:66(__getitem__)
        3    0.000    0.000    0.000    0.000 base.py:2706(get_indexer)
        3    0.000    0.000    0.000    0.000 missing.py:402(array_equivalent)
        1    0.000    0.000    0.001    0.001 base.py:2277(_union)
       33    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
       18    0.000    0.000    0.000    0.000 {method 'view' of 'numpy.ndarray' objects}
       24    0.000    0.000    0.000    0.000 common.py:222(is_object_dtype)
       24    0.000    0.000    0.000    0.000 common.py:1672(_get_dtype)
       13    0.000    0.000    0.000    0.000 numerictypes.py:365(issubdtype)
        5    0.000    0.000    0.000    0.000 blocks.py:2981(get_block_type)
        7    0.000    0.000    0.000    0.000 blocks.py:3027(make_block)
       26    0.000    0.000    0.000    0.000 numerictypes.py:293(issubclass_)
       45    0.000    0.000    0.000    0.000 base.py:615(__len__)
        9    0.000    0.000    0.000    0.000 generic.py:5276(__setattr__)
        3    0.000    0.000    0.000    0.000 managers.py:329(_verify_integrity)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:793(get_code)
        7    0.000    0.000    0.000    0.000 blocks.py:118(__init__)
        4    0.000    0.000    0.000    0.000 cast.py:1209(maybe_cast_to_datetime)
        7    0.000    0.000    0.000    0.000 generic.py:190(__init__)
        4    0.000    0.000    0.000    0.000 construction.py:388(sanitize_array)
        4    0.000    0.000    0.000    0.000 managers.py:248(get_dtypes)
       18    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
       47    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:58(<listcomp>)
       47    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:56(_path_join)
        3    0.000    0.000    0.000    0.000 frame.py:414(__init__)
       13    0.000    0.000    0.000    0.000 dtypes.py:917(is_dtype)
       21    0.000    0.000    0.000    0.000 common.py:403(is_datetime64tz_dtype)
        4    0.000    0.000    0.000    0.000 managers.py:798(as_array)
       24    0.000    0.000    0.000    0.000 {built-in method _abc._abc_instancecheck}
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:882(_find_spec)
       12    0.000    0.000    0.000    0.000 dtypes.py:1124(is_dtype)
        1    0.000    0.000    0.000    0.000 base.py:1656(is_unique)
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1240(_get_spec)
       29    0.000    0.000    0.000    0.000 _asarray.py:16(asarray)
       62    0.000    0.000    0.000    0.000 common.py:208(<lambda>)
        4    0.000    0.000    0.000    0.000 construction.py:506(_try_cast)
        9    0.000    0.000    0.000    0.000 managers.py:163(shape)
        8    0.000    0.000    0.000    0.000 managers.py:199(_is_single_block)
        1    0.000    0.000    0.000    0.000 {built-in method posix.getcwd}
        1    0.000    0.000    0.013    0.013 algorithms.py:1276(wrapper)
        1    0.000    0.000   11.285   11.285 __init__.py:751(f)
        1    0.000    0.000    0.000    0.000 datetimes.py:1976(_validate_dt64_dtype)
        2    0.000    0.000    0.000    0.000 frequencies.py:74(to_offset)
        1    0.000    0.000    0.041    0.041 frame.py:5281(_combine_frame)
        2    0.000    0.000    0.068    0.034 managers.py:1224(reindex_indexer)
        1    0.000    0.000    0.001    0.001 expressions.py:7(<module>)
      3/1    0.000    0.000    0.001    0.001 <frozen importlib._bootstrap>:978(_find_and_load)
        4    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:271(cache_from_source)
        3    0.000    0.000    0.000    0.000 _dtype.py:46(__str__)
        1    0.000    0.000    0.000    0.000 {pandas._libs.lib.get_reverse_indexer}
        4    0.000    0.000    0.000    0.000 cast.py:1088(maybe_castable)
        1    0.000    0.000    9.704    9.704 datetimelike.py:616(astype)
        8    0.000    0.000    0.000    0.000 generic.py:5331(_protect_consolidate)
        4    0.000    0.000    0.000    0.000 generic.py:5412(values)
        2    0.000    0.000    0.000    0.000 datetimes.py:259(_simple_new)
        2    0.000    0.000    0.000    0.000 datetimelike.py:598(_shallow_copy)
        4    0.000    0.000    0.000    0.000 base.py:472(_simple_new)
        3    0.000    0.000    0.001    0.000 managers.py:122(__init__)
        2    0.000    0.000    0.000    0.000 {method 'read' of '_io.FileIO' objects}
      3/1    0.000    0.000    0.001    0.001 <frozen importlib._bootstrap>:948(_find_and_load_unlocked)
        2    0.000    0.000    0.000    0.000 base.py:526(_shallow_copy)
       11    0.000    0.000    0.000    0.000 base.py:5393(maybe_extract_name)
       10    0.000    0.000    0.000    0.000 multi.py:1175(__len__)
        1    0.000    0.000   11.118   11.118 multi.py:3101(equals)
        4    0.000    0.000    0.000    0.000 managers.py:1467(__init__)
       27    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
       32    0.000    0.000    0.000    0.000 _dtype.py:36(_kind_name)
        4    0.000    0.000    0.000    0.000 _ufunc_config.py:39(seterr)
       62    0.000    0.000    0.000    0.000 common.py:206(classes)
       14    0.000    0.000    0.000    0.000 common.py:441(is_timedelta64_dtype)
        2    0.000    0.000    0.000    0.000 datetimes.py:286(_simple_new)
        1    0.000    0.000    0.002    0.002 multi.py:240(__new__)
        1    0.000    0.000    0.000    0.000 multi.py:1288(_get_level_number)
        7    0.000    0.000    0.000    0.000 blocks.py:251(mgr_locs)
       98    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
        7    0.000    0.000    0.000    0.000 _internal.py:830(npy_ctypes_check)
       15    0.000    0.000    0.000    0.000 common.py:372(is_datetime64_dtype)
        2    0.000    0.000    0.000    0.000 base.py:4046(equals)
        4    0.000    0.000    0.000    0.000 indexing.py:1754(__getitem__)
       15    0.000    0.000    0.000    0.000 common.py:542(is_categorical_dtype)
       13    0.000    0.000    0.000    0.000 base.py:3638(values)
       21    0.000    0.000    0.000    0.000 base.py:5293(ensure_index)
        4    0.000    0.000    0.000    0.000 indexing.py:2116(_getitem_axis)
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:157(_get_module_lock)
       20    0.000    0.000    0.000    0.000 inference.py:358(is_hashable)
        5    0.000    0.000    0.000    0.000 common.py:252(is_sparse)
       13    0.000    0.000    0.000    0.000 common.py:506(is_interval_dtype)
        9    0.000    0.000    0.000    0.000 common.py:1401(is_float_dtype)
        1    0.000    0.000    9.705    9.705 extension.py:285(astype)
        1    0.000    0.000    0.039    0.039 expressions.py:65(_evaluate_standard)
       14    0.000    0.000    0.000    0.000 common.py:472(is_period_dtype)
        9    0.000    0.000    0.000    0.000 common.py:685(is_dtype_equal)
        1    0.000    0.000    0.000    0.000 algorithms.py:1940(safe_sort)
        3    0.000    0.000    0.000    0.000 managers.py:655(_consolidate_check)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:504(_init_module_attrs)
        6    0.000    0.000    0.000    0.000 {method 'fill' of 'numpy.ndarray' objects}
        4    0.000    0.000    0.000    0.000 _ufunc_config.py:139(geterr)
        1    0.000    0.000    0.000    0.000 {method 'to_datetime64' of 'pandas._libs.tslibs.nattype._NaT' objects}
        2    0.000    0.000    0.000    0.000 {pandas._libs.lib.array_equivalent_object}
       14    0.000    0.000    0.000    0.000 construction.py:337(extract_array)
        1    0.000    0.000    0.000    0.000 frequencies.py:204(_get_offset)
        2    0.000    0.000    0.002    0.001 multi.py:3499(_coerce_indexer_frozen)
        2    0.000    0.000    0.068    0.034 managers.py:1260(<listcomp>)
        4    0.000    0.000    0.000    0.000 common.py:99(is_bool_indexer)
        3    0.000    0.000    0.000    0.000 common.py:219(asarray_tuplesafe)
        1    0.000    0.000    0.000    0.000 datetimes.py:211(__init__)
        4    0.000    0.000    0.000    0.000 base.py:509(<dictcomp>)
        4    0.000    0.000    0.000    0.000 {built-in method pandas._libs.index.get_value_at}
        1    0.000    0.000    0.000    0.000 multi.py:668(_set_levels)
       15    0.000    0.000    0.000    0.000 multi.py:794(codes)
        1    0.000    0.000    0.000    0.000 range.py:83(__new__)
       17    0.000    0.000    0.000    0.000 {function FrozenList.__getitem__ at 0x11466cb90}
      2/1    0.000    0.000    0.001    0.001 <frozen importlib._bootstrap>:663(_load_unlocked)
       24    0.000    0.000    0.000    0.000 abc.py:137(__instancecheck__)
        4    0.000    0.000    0.000    0.000 cast.py:503(_ensure_dtype_type)
        2    0.000    0.000    0.002    0.001 cast.py:757(coerce_indexer_dtype)
        1    0.000    0.000    0.000    0.000 offsets.py:2580(__eq__)
        8    0.000    0.000    0.000    0.000 base.py:592(_reset_identity)
       31    0.000    0.000    0.000    0.000 blocks.py:247(mgr_locs)
        1    0.000    0.000    0.000    0.000 construction.py:123(init_ndarray)
       49    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:222(_verbose_message)
        1    0.000    0.000    0.000    0.000 {method 'argsort' of 'numpy.ndarray' objects}
        4    0.000    0.000    0.000    0.000 {built-in method numpy.seterrobj}
        5    0.000    0.000    0.000    0.000 {pandas._libs.lib.values_from_object}
        6    0.000    0.000    0.000    0.000 common.py:775(is_integer_dtype)
        1    0.000    0.000    0.040    0.040 array_ops.py:126(na_arithmetic_op)
        1    0.000    0.000    0.000    0.000 base.py:5387(default_index)
        4    0.000    0.000    0.000    0.000 multi.py:1178(_get_names)
        4    0.000    0.000    0.000    0.000 series.py:376(_set_axis)
        4    0.000    0.000    0.000    0.000 series.py:844(_ixs)
       51    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       24    0.000    0.000    0.000    0.000 {method 'rpartition' of 'str' objects}
       20    0.000    0.000    0.000    0.000 {built-in method builtins.hash}
        6    0.000    0.000    0.000    0.000 {built-in method posix.fspath}
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:78(acquire)
       13    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:74(_path_stat)
        2    0.000    0.000    0.000    0.000 common.py:608(is_excluded_dtype)
        3    0.000    0.000    0.000    0.000 common.py:1435(is_bool_dtype)
        1    0.000    0.000    0.000    0.000 frame.py:5320(_construct_result)
        1    0.000    0.000    0.000    0.000 multi.py:1181(_set_names)
        4    0.000    0.000    0.000    0.000 indexing.py:627(_get_loc)
        4    0.000    0.000    0.000    0.000 indexing.py:2045(_validate_integer)
        4    0.000    0.000    0.000    0.000 blocks.py:2591(__init__)
       10    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x102423568}
        5    0.000    0.000    0.000    0.000 {built-in method builtins.any}
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:58(__init__)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:369(_get_cached)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:574(spec_from_file_location)
       10    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1203(_path_importer_cache)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1351(_get_spec)
        1    0.000    0.000    0.000    0.000 {method 'nonzero' of 'numpy.ndarray' objects}
        4    0.000    0.000    0.000    0.000 base.py:505(_get_attributes_dict)
        3    0.000    0.000    0.000    0.000 base.py:573(is_)
        3    0.000    0.000    0.000    0.000 base.py:1730(inferred_type)
        1    0.000    0.000    0.001    0.001 base.py:2217(union)
        4    0.000    0.000    0.000    0.000 generic.py:396(_get_axis_number)
        4    0.000    0.000    0.000    0.000 generic.py:409(_get_axis_name)
        4    0.000    0.000    0.000    0.000 generic.py:422(_get_axis)
        4    0.000    0.000    0.000    0.000 generic.py:426(_get_block_manager_axis)
        4    0.000    0.000    0.000    0.000 generic.py:5257(__getattr__)
        1    0.000    0.000    0.002    0.002 multi.py:798(_set_codes)
        9    0.000    0.000    0.000    0.000 blocks.py:339(dtype)
       11    0.000    0.000    0.000    0.000 managers.py:647(is_consolidated)
        4    0.000    0.000    0.000    0.000 series.py:428(name)
        1    0.000    0.000    0.000    0.000 {method 'split' of 're.Pattern' objects}
        6    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:51(_r_long)
        4    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:62(_path_split)
        1    0.000    0.000    0.000    0.000 re.py:271(_compile)
        2    0.000    0.000    0.000    0.000 _ufunc_config.py:446(__exit__)
       10    0.000    0.000    0.000    0.000 {pandas._libs.lib.is_integer}
        3    0.000    0.000    0.000    0.000 missing.py:132(_isna_new)
        5    0.000    0.000    0.000    0.000 common.py:339(is_categorical)
        4    0.000    0.000    0.000    0.000 common.py:330(apply_if_callable)
        4    0.000    0.000    0.000    0.000 construction.py:570(is_empty_data)
        1    0.000    0.000    9.704    9.704 datetimelike.py:434(_box_values)
        3    0.000    0.000    0.000    0.000 base.py:1667(is_boolean)
        1    0.000    0.000    0.000    0.000 base.py:2351(_wrap_setop_result)
        4    0.000    0.000    0.000    0.000 generic.py:5235(__finalize__)
        3    0.000    0.000    0.000    0.000 multi.py:684(<genexpr>)
        7    0.000    0.000    0.000    0.000 blocks.py:129(_check_ndim)
        2    0.000    0.000    0.000    0.000 blocks.py:275(make_block_same_class)
       11    0.000    0.000    0.000    0.000 managers.py:232(items)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.sorted}
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:103(release)
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:147(__enter__)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:576(module_from_spec)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:438(_classify_pyc)
        1    0.000    0.000    0.001    0.001 numeric.py:283(full)
        1    0.000    0.000    0.000    0.000 _methods.py:47(_all)
        4    0.000    0.000    0.000    0.000 {method 'transpose' of 'numpy.ndarray' objects}
       12    0.000    0.000    0.000    0.000 common.py:216(<lambda>)
        6    0.000    0.000    0.000    0.000 common.py:613(<genexpr>)
        6    0.000    0.000    0.000    0.000 common.py:987(is_datetime64_any_dtype)
        1    0.000    0.000    0.000    0.000 common.py:183(all_none)
        1    0.000    0.000    0.000    0.000 __init__.py:625(_align_method_FRAME)
        1    0.000    0.000    9.704    9.704 datetimes.py:579(astype)
        1    0.000    0.000    0.000    0.000 base.py:602(_engine)
       10    0.000    0.000    0.000    0.000 base.py:1182(name)
        3    0.000    0.000    0.000    0.000 base.py:1186(name)
        5    0.000    0.000    0.000    0.000 base.py:3700(_internal_get_values)
        1    0.000    0.000    0.000    0.000 base.py:5463(_maybe_cast_data_without_dtype)
        2    0.000    0.000    0.000    0.000 generic.py:219(_init_mgr)
        4    0.000    0.000    0.000    0.000 generic.py:255(_validate_dtype)
        4    0.000    0.000    0.000    0.000 generic.py:5371(_is_mixed_type)
        2    0.000    0.000    0.000    0.000 datetimelike.py:895(_delegate_property_get)
        1    0.000    0.000    0.000    0.000 range.py:131(_simple_new)
       27    0.000    0.000    0.000    0.000 managers.py:165(<genexpr>)
        4    0.000    0.000    0.000    0.000 managers.py:249(<listcomp>)
        1    0.000    0.000    0.000    0.000 managers.py:1643(create_block_manager_from_blocks)
        4    0.000    0.000    0.000    0.000 series.py:480(_values)
        1    0.000    0.000    0.000    0.000 check.py:1(<module>)
        4    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
       16    0.000    0.000    0.000    0.000 {built-in method _imp.release_lock}
       10    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:859(__exit__)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:523(_compile_bytecode)
      2/1    0.000    0.000    0.001    0.001 <frozen importlib._bootstrap_external>:722(exec_module)
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1272(find_spec)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(min_scalar_type)
        1    0.000    0.000    0.001    0.001 <__array_function__ internals>:2(copyto)
        4    0.000    0.000    0.000    0.000 {method 'any' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 _ufunc_config.py:437(__init__)
        3    0.000    0.000    0.000    0.000 {built-in method pandas._libs.missing.checknull}
        2    0.000    0.000    0.000    0.000 common.py:575(is_string_dtype)
        3    0.000    0.000    0.000    0.000 common.py:887(is_unsigned_integer_dtype)
        4    0.000    0.000    0.000    0.000 cast.py:1483(construct_1d_ndarray_preserving_na)
        1    0.000    0.000    0.000    0.000 __init__.py:101(get_op_result_name)
        2    0.000    0.000    0.000    0.000 base.py:621(__array__)
        1    0.000    0.000   11.242   11.242 frame.py:3810(align)
        4    0.000    0.000    0.000    0.000 generic.py:491(_info_axis)
        4    0.000    0.000    0.000    0.000 generic.py:5373(<lambda>)
        2    0.000    0.000    0.000    0.000 multi.py:999(dtype)
        4    0.000    0.000    0.000    0.000 blocks.py:225(get_values)
        3    0.000    0.000    0.000    0.000 managers.py:128(<listcomp>)
       15    0.000    0.000    0.000    0.000 managers.py:167(ndim)
        4    0.000    0.000    0.000    0.000 managers.py:660(is_mixed_type)
        4    0.000    0.000    0.000    0.000 managers.py:1565(internal_values)
        3    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        9    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:36(_relax_case)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:318(__exit__)
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:792(find_spec)
       10    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:855(__enter__)
        1    0.000    0.000    0.000    0.000 _collections_abc.py:72(_check_methods)
        1    0.000    0.000    0.000    0.000 re.py:205(split)
        1    0.000    0.000    0.000    0.000 {method 'all' of 'numpy.ndarray' objects}
        8    0.000    0.000    0.000    0.000 {built-in method numpy.geterrobj}
        2    0.000    0.000    0.000    0.000 _ufunc_config.py:441(__enter__)
        9    0.000    0.000    0.000    0.000 {pandas._libs.lib.is_scalar}
        3    0.000    0.000    0.000    0.000 common.py:830(is_signed_integer_dtype)
        2    0.000    0.000    0.000    0.000 common.py:1647(_is_dtype)
        1    0.000    0.000    0.000    0.000 __init__.py:681(_should_reindex_frame_op)
        2    0.000    0.000    0.000    0.000 missing.py:136(dispatch_fill_zeros)
        2    0.000    0.000    0.000    0.000 accessor.py:84(_getter)
        4    0.000    0.000    0.000    0.000 base.py:1581(is_monotonic)
        3    0.000    0.000    0.000    0.000 base.py:4505(_maybe_promote)
        2    0.000    0.000    0.000    0.000 base.py:5409(_maybe_cast_with_dtype)
        4    0.000    0.000    0.000    0.000 generic.py:5344(f)
        3    0.000    0.000    0.002    0.001 multi.py:809(<genexpr>)
        4    0.000    0.000    0.000    0.000 indexing.py:93(iloc)
        4    0.000    0.000    0.000    0.000 managers.py:927(consolidate)
        2    0.000    0.000    0.000    0.000 {method 'endswith' of 'str' objects}
        2    0.000    0.000    0.000    0.000 {built-in method _imp._fix_co_filename}
        1    0.000    0.000    0.000    0.000 {built-in method _abc._abc_subclasscheck}
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:35(_new_module)
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:176(cb)
        8    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:321(<genexpr>)
        4    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:403(cached)
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:719(find_spec)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:994(_gcd_import)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:471(_validate_timestamp_pyc)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:951(path_stats)
        1    0.000    0.000    0.000    0.000 __init__.py:109(import_module)
        2    0.000    0.000    0.000    0.000 _dtype.py:190(_datetime_metadata_str)
        4    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {method 'take' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 config.py:83(_get_single_key)
        3    0.000    0.000    0.000    0.000 missing.py:49(isna)
        4    0.000    0.000    0.000    0.000 inference.py:96(is_iterator)
        2    0.000    0.000    0.000    0.000 common.py:179(ensure_python_int)
        2    0.000    0.000    0.000    0.000 common.py:605(condition)
        4    0.000    0.000    0.000    0.000 indexers.py:23(is_list_like_indexer)
        1    0.000    0.000    0.000    0.000 __init__.py:124(_maybe_match_name)
        4    0.000    0.000    0.000    0.000 missing.py:73(clean_fill_method)
        3    0.000    0.000    0.000    0.000 missing.py:601(clean_reindex_fill_method)
        1    0.000    0.000    0.000    0.000 _optional.py:47(import_optional_dependency)
        1    0.000    0.000    0.000    0.000 datetimes.py:549(__array__)
        4    0.000    0.000    0.000    0.000 base.py:638(dtype)
        2    0.000    0.000    0.000    0.000 generic.py:515(ndim)
        1    0.000    0.000    0.000    0.000 generic.py:3294(_clear_item_cache)
        4    0.000    0.000    0.000    0.000 generic.py:5341(_consolidate_inplace)
        4    0.000    0.000    0.000    0.000 multi.py:1881(nlevels)
        6    0.000    0.000    0.000    0.000 range.py:675(__len__)
        8    0.000    0.000    0.000    0.000 managers.py:314(__len__)
        3    0.000    0.000    0.000    0.000 managers.py:656(<listcomp>)
        7    0.000    0.000    0.000    0.000 managers.py:943(_consolidate_inplace)
        1    0.000    0.000    0.000    0.000 construction.py:260(prep_ndarray)
        1    0.000    0.000    0.000    0.000 construction.py:417(_get_axes)
        1    0.000    0.000    0.039    0.039 expressions.py:191(evaluate)
        1    0.000    0.000    0.000    0.000 {method 'index' of 'list' objects}
        6    0.000    0.000    0.000    0.000 {built-in method from_bytes}
        2    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        2    0.000    0.000    0.000    0.000 {method 'strip' of 'str' objects}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.setattr}
       16    0.000    0.000    0.000    0.000 {built-in method _imp.acquire_lock}
        1    0.000    0.000    0.000    0.000 {built-in method _imp.is_builtin}
        3    0.000    0.000    0.000    0.000 {built-in method _imp.is_frozen}
        6    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}
        6    0.000    0.000    0.000    0.000 {built-in method _thread.get_ident}
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:143(__init__)
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:151(__exit__)
        3    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:369(__init__)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:416(parent)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:424(has_location)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:873(_find_spec_legacy)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:84(_path_is_mode_type)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:93(_path_isfile)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:401(_check_name_wrapper)
        4    0.000    0.000    0.000    0.000 _methods.py:44(_any)
        2    0.000    0.000    0.000    0.000 {built-in method numpy.datetime_data}
        1    0.000    0.000    0.000    0.000 config.py:101(_get_option)
        1    0.000    0.000    0.000    0.000 config.py:230(__call__)
        1    0.000    0.000    0.000    0.000 config.py:533(_select_options)
        1    0.000    0.000    0.000    0.000 config.py:551(_get_root)
        1    0.000    0.000    0.000    0.000 config.py:607(_warn_if_deprecated)
        4    0.000    0.000    0.000    0.000 {pandas._libs.lib.item_from_zerodim}
        1    0.000    0.000    0.000    0.000 common.py:1027(is_datetime64_ns_dtype)
        2    0.000    0.000    0.000    0.000 common.py:187(<genexpr>)
        1    0.000    0.000    0.040    0.040 array_ops.py:120(na_op)
        1    0.000    0.000    0.000    0.000 datetimelike.py:443(asi8)
        1    0.000    0.000    0.000    0.000 datetimelike.py:484(__array__)
        3    0.000    0.000    0.000    0.000 base.py:609(<lambda>)
        1    0.000    0.000    0.000    0.000 base.py:2194(_is_compatible_with_other)
        7    0.000    0.000    0.000    0.000 base.py:3670(_values)
        3    0.000    0.000    0.000    0.000 frame.py:399(_constructor)
        4    0.000    0.000    0.000    0.000 generic.py:238(attrs)
        1    0.000    0.000    0.000    0.000 generic.py:660(_set_axis)
        4    0.000    0.000    0.000    0.000 extension.py:57(fget)
        4    0.000    0.000    0.000    0.000 blocks.py:213(internal_values)
        3    0.000    0.000    0.000    0.000 blocks.py:335(shape)
        1    0.000    0.000    0.000    0.000 managers.py:171(set_axis)
        6    0.000    0.000    0.000    0.000 managers.py:331(<genexpr>)
        4    0.000    0.000    0.000    0.000 managers.py:1520(_block)
        4    0.000    0.000    0.000    0.000 series.py:403(_set_subtyp)
        4    0.000    0.000    0.000    0.000 series.py:432(name)
        1    0.000    0.000    0.000    0.000 expressions.py:160(_has_bool_dtype)
        1    0.000    0.000    0.000    0.000 {method 'count' of 'list' objects}
        2    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
        3    0.000    0.000    0.000    0.000 {method 'keys' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'clear' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
        2    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.all}
      2/1    0.000    0.000    0.001    0.001 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:307(__init__)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:311(__enter__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:929(_sanity_check)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:884(__init__)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:909(get_filename)
        1    0.000    0.000    0.000    0.000 abc.py:141(__subclasscheck__)
        1    0.000    0.000    0.000    0.000 _collections_abc.py:252(__subclasshook__)
        1    0.000    0.000    0.000    0.000 six.py:184(find_module)
        1    0.000    0.000    0.000    0.000 multiarray.py:1043(copyto)
        2    0.000    0.000    0.000    0.000 config.py:566(_get_deprecated_option)
        1    0.000    0.000    0.000    0.000 config.py:594(_translate_key)
        4    0.000    0.000    0.000    0.000 {pandas._libs.lib.is_float}
        5    0.000    0.000    0.000    0.000 {pandas._libs.algos.ensure_platform_int}
        1    0.000    0.000    0.000    0.000 {pandas._libs.algos.ensure_int32}
        4    0.000    0.000    0.000    0.000 inference.py:220(is_array_like)
       12    0.000    0.000    0.000    0.000 common.py:211(classes_and_not_datetimelike)
        2    0.000    0.000    0.000    0.000 datetimes.py:55(tz_to_dtype)
        3    0.000    0.000    0.000    0.000 base.py:61(_reset_cache)
        1    0.000    0.000    0.000    0.000 datetimes.py:474(_box_func)
        1    0.000    0.000    0.000    0.000 datetimelike.py:495(__len__)
        1    0.000    0.000    0.000    0.000 base.py:498(_constructor)
        1    0.000    0.000    0.000    0.000 base.py:2581(_assert_can_do_setop)
        1    0.000    0.000    0.000    0.000 base.py:5382(_validate_join_method)
        2    0.000    0.000    0.000    0.000 blocks.py:243(fill_value)
        1    0.000    0.000    0.000    0.000 expressions.py:40(set_use_numexpr)
        1    0.000    0.000    0.000    0.000 expressions.py:169(_bool_arith_check)
        2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        4    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:719(create_module)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 multiarray.py:584(min_scalar_type)
        4    0.000    0.000    0.000    0.000 {pandas._libs.algos.ensure_object}
        1    0.000    0.000    0.000    0.000 base.py:2210(_validate_sort_keyword)
        1    0.000    0.000    0.000    0.000 numeric.py:83(_validate_dtype)
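
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of collecting one profile per call with the standard-library `cProfile` and `pstats` modules, sorted by internal time like the listing above. The helper `profile_call` and the stand-in function `slow_first_time` are hypothetical names made up for illustration; they are not part of pandas or of the original report, and the stand-in only imitates "first call does extra work, later calls do not".

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func once under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is what pstats prints as "Ordered by: internal time".
    stats.sort_stats("tottime").print_stats(15)
    return result, buffer.getvalue()


# Hypothetical stand-in for an operation that does extra work the first
# time it runs (an assumption for illustration, not the pandas code path).
_cache = {}


def slow_first_time(n):
    if n not in _cache:
        _cache[n] = sum(i * i for i in range(n))
    return _cache[n]


first_result, first_report = profile_call(slow_first_time, 200_000)
second_result, second_report = profile_call(slow_first_time, 200_000)

print(first_report)
print(second_report)
```

To apply this to the report, the callable passed to `profile_call` would be the `df.sub()` call from the code sample, invoked twice on the same frames. Functions that only appear in the first report are the candidates for one-off work.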

@philippegr
Author

And here is the profile for the second call:

         3591 function calls (3517 primitive calls) in 0.100 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.018    0.018    0.018    0.018 {pandas._libs.algos.take_2d_axis0_float64_float64}
        1    0.017    0.017    0.017    0.017 {pandas._libs.algos.take_2d_axis0_int64_int64}
        1    0.013    0.013    0.013    0.013 {built-in method _operator.sub}
        1    0.011    0.011    0.100    0.100 <string>:1(<module>)
        2    0.010    0.005    0.010    0.005 {pandas._libs.algos.take_1d_int64_int64}
       11    0.007    0.001    0.007    0.001 {pandas._libs.algos.ensure_int64}
        1    0.006    0.006    0.030    0.030 base.py:3445(_join_level)
        1    0.004    0.004    0.074    0.074 generic.py:8408(align)
        4    0.003    0.001    0.003    0.001 {built-in method numpy.arange}
        1    0.002    0.002    0.002    0.002 {pandas._libs.algos.ensure_int8}
        8    0.002    0.000    0.055    0.007 algorithms.py:1565(take_nd)
      2/1    0.002    0.001    0.032    0.032 base.py:3243(join)
        5    0.001    0.000    0.001    0.000 {method 'reduce' of 'numpy.ufunc' objects}
      610    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
      225    0.000    0.000    0.000    0.000 generic.py:10(_check)
  234/165    0.000    0.000    0.000    0.000 {built-in method builtins.len}
       26    0.000    0.000    0.000    0.000 _dtype.py:333(_name_get)
        8    0.000    0.000    0.000    0.000 algorithms.py:1436(_get_take_nd_function)
       33    0.000    0.000    0.000    0.000 {built-in method numpy.array}
        3    0.000    0.000    0.000    0.000 managers.py:212(_rebuild_blknos_and_blklocs)
  357/354    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
       44    0.000    0.000    0.000    0.000 common.py:1565(is_extension_array_dtype)
       63    0.000    0.000    0.000    0.000 common.py:1708(_is_dtype_type)
        4    0.000    0.000    0.001    0.000 series.py:183(__init__)
       14    0.000    0.000    0.000    0.000 {built-in method numpy.empty}
      219    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
        2    0.000    0.000    0.002    0.001 dispatch.py:48(should_series_dispatch)
       48    0.000    0.000    0.000    0.000 dtypes.py:75(find)
       46    0.000    0.000    0.000    0.000 base.py:247(is_dtype)
       54    0.000    0.000    0.000    0.000 common.py:208(<lambda>)
      2/1    0.000    0.000    0.000    0.000 base.py:276(__new__)
        2    0.000    0.000    0.038    0.019 generic.py:4584(_reindex_with_indexers)
        2    0.000    0.000    0.037    0.019 blocks.py:1271(take_nd)
       71    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        1    0.000    0.000    0.100    0.100 {built-in method builtins.exec}
        3    0.000    0.000    0.000    0.000 {method 'get_indexer' of 'pandas._libs.index.IndexEngine' objects}
       13    0.000    0.000    0.000    0.000 multi.py:794(codes)
        3    0.000    0.000    0.000    0.000 blocks.py:343(ftype)
        3    0.000    0.000    0.000    0.000 {pandas._libs.lib.infer_dtype}
        4    0.000    0.000    0.001    0.000 generic.py:5519(dtypes)
        4    0.000    0.000    0.000    0.000 {pandas._libs.algos.take_1d_object_object}
       13    0.000    0.000    0.000    0.000 frozen.py:66(__getitem__)
        7    0.000    0.000    0.000    0.000 blocks.py:3027(make_block)
        4    0.000    0.000    0.000    0.000 construction.py:388(sanitize_array)
        7    0.000    0.000    0.000    0.000 blocks.py:118(__init__)
        5    0.000    0.000    0.000    0.000 blocks.py:2981(get_block_type)
       26    0.000    0.000    0.000    0.000 common.py:1844(pandas_dtype)
       25    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1009(_handle_fromlist)
       24    0.000    0.000    0.000    0.000 {pandas._libs.lib.is_list_like}
       22    0.000    0.000    0.000    0.000 common.py:222(is_object_dtype)
        9    0.000    0.000    0.000    0.000 generic.py:5276(__setattr__)
       26    0.000    0.000    0.000    0.000 _dtype.py:319(_name_includes_bit_suffix)
       11    0.000    0.000    0.000    0.000 numerictypes.py:365(issubdtype)
        1    0.000    0.000    0.070    0.070 generic.py:8491(_align_frame)
        2    0.000    0.000    0.000    0.000 frequencies.py:74(to_offset)
        3    0.000    0.000    0.000    0.000 base.py:2706(get_indexer)
        7    0.000    0.000    0.000    0.000 generic.py:190(__init__)
        8    0.000    0.000    0.000    0.000 managers.py:199(_is_single_block)
        2    0.000    0.000    0.000    0.000 cast.py:347(maybe_promote)
        1    0.000    0.000    0.001    0.001 base.py:2277(_union)
        4    0.000    0.000    0.000    0.000 construction.py:506(_try_cast)
        9    0.000    0.000    0.000    0.000 managers.py:163(shape)
        4    0.000    0.000    0.000    0.000 managers.py:798(as_array)
       24    0.000    0.000    0.000    0.000 {built-in method _abc._abc_instancecheck}
        4    0.000    0.000    0.000    0.000 _ufunc_config.py:39(seterr)
        7    0.000    0.000    0.000    0.000 blocks.py:251(mgr_locs)
       22    0.000    0.000    0.000    0.000 numerictypes.py:293(issubclass_)
       44    0.000    0.000    0.000    0.000 base.py:615(__len__)
        3    0.000    0.000    0.000    0.000 frame.py:414(__init__)
        4    0.000    0.000    0.000    0.000 indexing.py:2116(_getitem_axis)
        3    0.000    0.000    0.001    0.000 managers.py:122(__init__)
        1    0.000    0.000    0.014    0.014 frame.py:5281(_combine_frame)
        4    0.000    0.000    0.000    0.000 managers.py:248(get_dtypes)
        3    0.000    0.000    0.000    0.000 managers.py:329(_verify_integrity)
        3    0.000    0.000    0.000    0.000 missing.py:402(array_equivalent)
       17    0.000    0.000    0.000    0.000 common.py:403(is_datetime64tz_dtype)
       20    0.000    0.000    0.000    0.000 common.py:1672(_get_dtype)
        4    0.000    0.000    0.000    0.000 cast.py:1209(maybe_cast_to_datetime)
       13    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        3    0.000    0.000    0.000    0.000 _dtype.py:46(__str__)
       10    0.000    0.000    0.000    0.000 dtypes.py:917(is_dtype)
       10    0.000    0.000    0.000    0.000 dtypes.py:1124(is_dtype)
        4    0.000    0.000    0.000    0.000 cast.py:1088(maybe_castable)
       24    0.000    0.000    0.000    0.000 _asarray.py:16(asarray)
       10    0.000    0.000    0.000    0.000 base.py:3638(values)
        9    0.000    0.000    0.000    0.000 base.py:5393(maybe_extract_name)
       10    0.000    0.000    0.000    0.000 multi.py:1175(__len__)
        2    0.000    0.000    0.038    0.019 managers.py:1224(reindex_indexer)
        5    0.000    0.000    0.000    0.000 common.py:252(is_sparse)
        8    0.000    0.000    0.000    0.000 generic.py:5331(_protect_consolidate)
        4    0.000    0.000    0.000    0.000 managers.py:1467(__init__)
       12    0.000    0.000    0.000    0.000 {method 'view' of 'numpy.ndarray' objects}
       12    0.000    0.000    0.000    0.000 common.py:441(is_timedelta64_dtype)
       21    0.000    0.000    0.000    0.000 base.py:5293(ensure_index)
        1    0.000    0.000    0.003    0.003 multi.py:240(__new__)
       26    0.000    0.000    0.000    0.000 _dtype.py:36(_kind_name)
        4    0.000    0.000    0.000    0.000 _ufunc_config.py:139(geterr)
       13    0.000    0.000    0.000    0.000 common.py:372(is_datetime64_dtype)
        8    0.000    0.000    0.000    0.000 common.py:685(is_dtype_equal)
        4    0.000    0.000    0.000    0.000 indexing.py:1754(__getitem__)
        1    0.000    0.000    0.013    0.013 expressions.py:65(_evaluate_standard)
       54    0.000    0.000    0.000    0.000 common.py:206(classes)
       11    0.000    0.000    0.000    0.000 common.py:542(is_categorical_dtype)
       12    0.000    0.000    0.000    0.000 construction.py:337(extract_array)
        1    0.000    0.000    0.000    0.000 datetimes.py:211(__init__)
        1    0.000    0.000    0.000    0.000 frequencies.py:204(_get_offset)
       31    0.000    0.000    0.000    0.000 blocks.py:247(mgr_locs)
       11    0.000    0.000    0.000    0.000 common.py:472(is_period_dtype)
        1    0.000    0.000    0.000    0.000 datetimes.py:1976(_validate_dt64_dtype)
        1    0.000    0.000    0.000    0.000 base.py:526(_shallow_copy)
        1    0.000    0.000    0.000    0.000 datetimes.py:259(_simple_new)
        4    0.000    0.000    0.000    0.000 series.py:844(_ixs)
       19    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {pandas._libs.lib.get_reverse_indexer}
        1    0.000    0.000    0.089    0.089 __init__.py:751(f)
        2    0.000    0.000    0.000    0.000 base.py:472(_simple_new)
        1    0.000    0.000    0.000    0.000 multi.py:668(_set_levels)
       17    0.000    0.000    0.000    0.000 inference.py:358(is_hashable)
        2    0.000    0.000    0.002    0.001 cast.py:757(coerce_indexer_dtype)
        1    0.000    0.000    0.000    0.000 offsets.py:2580(__eq__)
        2    0.000    0.000    0.000    0.000 base.py:4046(equals)
        1    0.000    0.000    0.000    0.000 construction.py:123(init_ndarray)
       24    0.000    0.000    0.000    0.000 abc.py:137(__instancecheck__)
        7    0.000    0.000    0.000    0.000 common.py:1401(is_float_dtype)
        1    0.000    0.000    0.013    0.013 array_ops.py:126(na_arithmetic_op)
        4    0.000    0.000    0.000    0.000 generic.py:409(_get_axis_name)
        3    0.000    0.000    0.000    0.000 multi.py:684(<genexpr>)
        1    0.000    0.000    0.000    0.000 multi.py:1288(_get_level_number)
        2    0.000    0.000    0.002    0.001 multi.py:3499(_coerce_indexer_frozen)
        4    0.000    0.000    0.000    0.000 blocks.py:2591(__init__)
        3    0.000    0.000    0.000    0.000 managers.py:655(_consolidate_check)
        4    0.000    0.000    0.000    0.000 series.py:376(_set_axis)
        4    0.000    0.000    0.000    0.000 series.py:428(name)
        4    0.000    0.000    0.000    0.000 {method 'any' of 'numpy.ndarray' objects}
        6    0.000    0.000    0.000    0.000 {method 'fill' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 base.py:5387(default_index)
        4    0.000    0.000    0.000    0.000 generic.py:396(_get_axis_number)
        4    0.000    0.000    0.000    0.000 multi.py:1178(_get_names)
        1    0.000    0.000    0.000    0.000 multi.py:3101(equals)
        4    0.000    0.000    0.000    0.000 indexing.py:2045(_validate_integer)
       27    0.000    0.000    0.000    0.000 managers.py:165(<genexpr>)
       11    0.000    0.000    0.000    0.000 common.py:506(is_interval_dtype)
        1    0.000    0.000    0.000    0.000 algorithms.py:1940(safe_sort)
        4    0.000    0.000    0.000    0.000 common.py:99(is_bool_indexer)
        2    0.000    0.000    0.000    0.000 common.py:219(asarray_tuplesafe)
        1    0.000    0.000    0.000    0.000 frame.py:5320(_construct_result)
        4    0.000    0.000    0.000    0.000 generic.py:426(_get_block_manager_axis)
        4    0.000    0.000    0.000    0.000 generic.py:5412(values)
        1    0.000    0.000    0.000    0.000 datetimelike.py:598(_shallow_copy)
        1    0.000    0.000    0.002    0.002 multi.py:798(_set_codes)
        1    0.000    0.000    0.000    0.000 range.py:83(__new__)
        7    0.000    0.000    0.000    0.000 blocks.py:129(_check_ndim)
        3    0.000    0.000    0.000    0.000 managers.py:128(<listcomp>)
        2    0.000    0.000    0.037    0.019 managers.py:1260(<listcomp>)
        1    0.000    0.000    0.000    0.000 {method 'split' of 're.Pattern' objects}
        1    0.000    0.000    0.000    0.000 re.py:271(_compile)
        1    0.000    0.000    0.000    0.000 {method 'argsort' of 'numpy.ndarray' objects}
        5    0.000    0.000    0.000    0.000 {pandas._libs.lib.values_from_object}
        3    0.000    0.000    0.000    0.000 datetimes.py:500(tz)
        4    0.000    0.000    0.000    0.000 generic.py:255(_validate_dtype)
        4    0.000    0.000    0.000    0.000 generic.py:5257(__getattr__)
        4    0.000    0.000    0.000    0.000 generic.py:5371(_is_mixed_type)
        2    0.000    0.000    0.000    0.000 blocks.py:275(make_block_same_class)
        6    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x102423568}
        1    0.000    0.000    0.000    0.000 _methods.py:47(_all)
        4    0.000    0.000    0.000    0.000 {method 'transpose' of 'numpy.ndarray' objects}
        4    0.000    0.000    0.000    0.000 inference.py:96(is_iterator)
        2    0.000    0.000    0.000    0.000 common.py:608(is_excluded_dtype)
        1    0.000    0.000    0.000    0.000 common.py:183(all_none)
        1    0.000    0.000    0.001    0.001 base.py:2217(union)
        4    0.000    0.000    0.000    0.000 generic.py:422(_get_axis)
        1    0.000    0.000    0.000    0.000 multi.py:1181(_set_names)
        9    0.000    0.000    0.000    0.000 blocks.py:339(dtype)
       15    0.000    0.000    0.000    0.000 managers.py:167(ndim)
       11    0.000    0.000    0.000    0.000 managers.py:647(is_consolidated)
        4    0.000    0.000    0.000    0.000 managers.py:660(is_mixed_type)
        4    0.000    0.000    0.000    0.000 managers.py:1565(internal_values)
        1    0.000    0.000    0.000    0.000 managers.py:1643(create_block_manager_from_blocks)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.sorted}
        3    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 re.py:205(split)
        2    0.000    0.000    0.000    0.000 _ufunc_config.py:437(__init__)
        2    0.000    0.000    0.000    0.000 _ufunc_config.py:446(__exit__)
        2    0.000    0.000    0.000    0.000 _internal.py:830(npy_ctypes_check)
        2    0.000    0.000    0.000    0.000 missing.py:132(_isna_new)
        5    0.000    0.000    0.000    0.000 common.py:339(is_categorical)
        6    0.000    0.000    0.000    0.000 common.py:613(<genexpr>)
        2    0.000    0.000    0.000    0.000 common.py:575(is_string_dtype)
        5    0.000    0.000    0.000    0.000 common.py:775(is_integer_dtype)
        4    0.000    0.000    0.000    0.000 cast.py:1483(construct_1d_ndarray_preserving_na)
        4    0.000    0.000    0.000    0.000 common.py:330(apply_if_callable)
        2    0.000    0.000    0.000    0.000 base.py:505(_get_attributes_dict)
        5    0.000    0.000    0.000    0.000 base.py:592(_reset_identity)
        4    0.000    0.000    0.000    0.000 generic.py:5235(__finalize__)
        4    0.000    0.000    0.000    0.000 generic.py:5341(_consolidate_inplace)
        1    0.000    0.000    0.000    0.000 datetimelike.py:895(_delegate_property_get)
        3    0.000    0.000    0.002    0.001 multi.py:809(<genexpr>)
        4    0.000    0.000    0.000    0.000 managers.py:249(<listcomp>)
        8    0.000    0.000    0.000    0.000 managers.py:314(__len__)
        4    0.000    0.000    0.000    0.000 managers.py:927(consolidate)
        4    0.000    0.000    0.000    0.000 series.py:432(name)
        4    0.000    0.000    0.000    0.000 series.py:480(_values)
        1    0.000    0.000    0.000    0.000 <__array_function__ internals>:2(min_scalar_type)
        4    0.000    0.000    0.000    0.000 _methods.py:44(_any)
        4    0.000    0.000    0.000    0.000 {built-in method numpy.seterrobj}
        2    0.000    0.000    0.000    0.000 {pandas._libs.lib.array_equivalent_object}
        2    0.000    0.000    0.000    0.000 cast.py:503(_ensure_dtype_type)
        1    0.000    0.000    0.000    0.000 __init__.py:625(_align_method_FRAME)
        1    0.000    0.000    0.000    0.000 __init__.py:681(_should_reindex_frame_op)
        3    0.000    0.000    0.000    0.000 base.py:573(is_)
        1    0.000    0.000    0.000    0.000 base.py:2351(_wrap_setop_result)
        5    0.000    0.000    0.000    0.000 base.py:3700(_internal_get_values)
        4    0.000    0.000    0.000    0.000 {built-in method pandas._libs.index.get_value_at}
        2    0.000    0.000    0.000    0.000 generic.py:219(_init_mgr)
        4    0.000    0.000    0.000    0.000 generic.py:5344(f)
        1    0.000    0.000    0.000    0.000 range.py:131(_simple_new)
        4    0.000    0.000    0.000    0.000 indexing.py:627(_get_loc)
        4    0.000    0.000    0.000    0.000 blocks.py:225(get_values)
        7    0.000    0.000    0.000    0.000 managers.py:943(_consolidate_inplace)
       13    0.000    0.000    0.000    0.000 {function FrozenList.__getitem__ at 0x11466cb90}
        3    0.000    0.000    0.000    0.000 {built-in method builtins.any}
       17    0.000    0.000    0.000    0.000 {built-in method builtins.hash}
        1    0.000    0.000    0.000    0.000 {method 'all' of 'numpy.ndarray' objects}
        4    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
        2    0.000    0.000    0.000    0.000 _ufunc_config.py:441(__enter__)
        9    0.000    0.000    0.000    0.000 {pandas._libs.lib.is_integer}
        4    0.000    0.000    0.000    0.000 inference.py:220(is_array_like)
        4    0.000    0.000    0.000    0.000 common.py:987(is_datetime64_any_dtype)
        4    0.000    0.000    0.000    0.000 construction.py:570(is_empty_data)
        2    0.000    0.000    0.000    0.000 missing.py:136(dispatch_fill_zeros)
        3    0.000    0.000    0.000    0.000 missing.py:601(clean_reindex_fill_method)
        1    0.000    0.000    0.000    0.000 datetimes.py:286(_simple_new)
        2    0.000    0.000    0.000    0.000 base.py:509(<dictcomp>)
        3    0.000    0.000    0.000    0.000 base.py:1667(is_boolean)
        1    0.000    0.000    0.000    0.000 base.py:5463(_maybe_cast_data_without_dtype)
        1    0.000    0.000    0.074    0.074 frame.py:3810(align)
        2    0.000    0.000    0.000    0.000 generic.py:515(ndim)
        1    0.000    0.000    0.000    0.000 generic.py:3294(_clear_item_cache)
        1    0.000    0.000    0.000    0.000 managers.py:171(set_axis)
       11    0.000    0.000    0.000    0.000 managers.py:232(items)
        1    0.000    0.000    0.000    0.000 construction.py:417(_get_axes)
        1    0.000    0.000    0.013    0.013 expressions.py:191(evaluate)
        8    0.000    0.000    0.000    0.000 {built-in method numpy.geterrobj}
        2    0.000    0.000    0.000    0.000 {built-in method pandas._libs.missing.checknull}
        6    0.000    0.000    0.000    0.000 {pandas._libs.lib.is_scalar}
        2    0.000    0.000    0.000    0.000 missing.py:49(isna)
        2    0.000    0.000    0.000    0.000 common.py:179(ensure_python_int)
        9    0.000    0.000    0.000    0.000 common.py:216(<lambda>)
        9    0.000    0.000    0.000    0.000 common.py:211(classes_and_not_datetimelike)
        2    0.000    0.000    0.000    0.000 common.py:605(condition)
        2    0.000    0.000    0.000    0.000 common.py:1435(is_bool_dtype)
        1    0.000    0.000    0.000    0.000 accessor.py:84(_getter)
        3    0.000    0.000    0.000    0.000 base.py:61(_reset_cache)
        4    0.000    0.000    0.000    0.000 base.py:853(_ndarray_values)
        4    0.000    0.000    0.000    0.000 datetimelike.py:969(freq)
        2    0.000    0.000    0.000    0.000 base.py:1186(name)
        4    0.000    0.000    0.000    0.000 base.py:1588(is_monotonic_increasing)
        4    0.000    0.000    0.000    0.000 generic.py:491(_info_axis)
        1    0.000    0.000    0.000    0.000 generic.py:660(_set_axis)
        4    0.000    0.000    0.000    0.000 generic.py:5373(<lambda>)
        1    0.000    0.000    0.000    0.000 multi.py:999(dtype)
        3    0.000    0.000    0.000    0.000 blocks.py:335(shape)
        1    0.000    0.000    0.000    0.000 construction.py:260(prep_ndarray)
        4    0.000    0.000    0.000    0.000 series.py:403(_set_subtyp)
        1    0.000    0.000    0.000    0.000 expressions.py:160(_has_bool_dtype)
        2    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'nonzero' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.000    0.000 {method 'take' of 'numpy.ndarray' objects}
        4    0.000    0.000    0.000    0.000 {pandas._libs.lib.item_from_zerodim}
        2    0.000    0.000    0.000    0.000 common.py:1647(_is_dtype)
        4    0.000    0.000    0.000    0.000 indexers.py:23(is_list_like_indexer)
        1    0.000    0.000    0.000    0.000 __init__.py:124(_maybe_match_name)
        1    0.000    0.000    0.013    0.013 array_ops.py:120(na_op)
        7    0.000    0.000    0.000    0.000 base.py:1182(name)
        4    0.000    0.000    0.000    0.000 base.py:1581(is_monotonic)
        1    0.000    0.000    0.000    0.000 base.py:1730(inferred_type)
        3    0.000    0.000    0.000    0.000 base.py:4505(_maybe_promote)
        3    0.000    0.000    0.000    0.000 frame.py:399(_constructor)
        3    0.000    0.000    0.000    0.000 multi.py:1881(nlevels)
        4    0.000    0.000    0.000    0.000 indexing.py:93(iloc)
        4    0.000    0.000    0.000    0.000 blocks.py:213(internal_values)
        2    0.000    0.000    0.000    0.000 blocks.py:243(fill_value)
        6    0.000    0.000    0.000    0.000 managers.py:331(<genexpr>)
        3    0.000    0.000    0.000    0.000 managers.py:656(<listcomp>)
        1    0.000    0.000    0.000    0.000 expressions.py:169(_bool_arith_check)
        1    0.000    0.000    0.000    0.000 {method 'index' of 'list' objects}
        2    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        2    0.000    0.000    0.000    0.000 {method 'strip' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.all}
        1    0.000    0.000    0.000    0.000 multiarray.py:584(min_scalar_type)
        2    0.000    0.000    0.000    0.000 {pandas._libs.lib.is_float}
        5    0.000    0.000    0.000    0.000 {pandas._libs.algos.ensure_platform_int}
        4    0.000    0.000    0.000    0.000 {pandas._libs.algos.ensure_object}
        2    0.000    0.000    0.000    0.000 common.py:830(is_signed_integer_dtype)
        2    0.000    0.000    0.000    0.000 common.py:187(<genexpr>)
        1    0.000    0.000    0.000    0.000 __init__.py:101(get_op_result_name)
        3    0.000    0.000    0.000    0.000 datetimes.py:478(dtype)
        1    0.000    0.000    0.000    0.000 datetimelike.py:495(__len__)
        1    0.000    0.000    0.000    0.000 base.py:638(dtype)
        1    0.000    0.000    0.000    0.000 base.py:2194(_is_compatible_with_other)
        1    0.000    0.000    0.000    0.000 base.py:2581(_assert_can_do_setop)
        1    0.000    0.000    0.000    0.000 base.py:5382(_validate_join_method)
        1    0.000    0.000    0.000    0.000 base.py:5409(_maybe_cast_with_dtype)
        4    0.000    0.000    0.000    0.000 generic.py:238(attrs)
        2    0.000    0.000    0.000    0.000 extension.py:57(fget)
        1    0.000    0.000    0.000    0.000 numeric.py:83(_validate_dtype)
        1    0.000    0.000    0.000    0.000 multi.py:1339(values)
        4    0.000    0.000    0.000    0.000 managers.py:1520(_block)
        1    0.000    0.000    0.000    0.000 {method 'count' of 'list' objects}
        3    0.000    0.000    0.000    0.000 {method 'keys' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'clear' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
        4    0.000    0.000    0.000    0.000 {built-in method builtins.callable}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {pandas._libs.algos.ensure_int32}
        2    0.000    0.000    0.000    0.000 common.py:887(is_unsigned_integer_dtype)
        4    0.000    0.000    0.000    0.000 missing.py:73(clean_fill_method)
        1    0.000    0.000    0.000    0.000 datetimes.py:55(tz_to_dtype)
        1    0.000    0.000    0.000    0.000 base.py:2210(_validate_sort_keyword)
        3    0.000    0.000    0.000    0.000 base.py:3670(_values)
        6    0.000    0.000    0.000    0.000 range.py:675(__len__)

@philippegr
Author

In a more graphical view, this is the part inside _align_frame that seems to differ between the two calls:
First call: [screenshot, 2020-05-21 12:27:28 PM]
Second call: [screenshot, 2020-05-21 12:27:43 PM]

@philippegr
Author

philippegr commented May 21, 2020

This definitely has something to do with the ref_df index being of type str:

# Run this before any of the sub() calls from the first code snippet.
# Change the index type of ref_df to category.
ref_df.index = ref_df.index.astype('category')
# With %%time this takes about 150 ms; the output is MultiIndexed (str, datetime).
res_1 = value_df.sub(ref_df, level=0)
# Change the index back to str and check that the result is the same.
ref_df.index = ref_df.index.astype('str')
res_2 = value_df.sub(ref_df, level=0)
assert res_1.equals(res_2)

@jbrockmendel
Member

Looks like there is some low-hanging fruit in MultiIndex.equals, since it is going through:

        if not isinstance(other, MultiIndex):
            # d-level MultiIndex can equal d-tuple Index
            if not is_object_dtype(other.dtype):
                # other cannot contain tuples, so cannot match self
                return False

            return array_equivalent(self._values, other._values)

It's the self._values that is taking up most of the time here. This could be pre-empted by any number of things, for starters a length check.
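To illustrate the length-check idea, here is a minimal sketch written as a standalone wrapper. The helper name `equals_with_length_check` is hypothetical and this is not the pandas implementation; it only shows how comparing lengths first can avoid materializing the MultiIndex values when the two indexes cannot possibly be equal.

```python
import pandas as pd


def equals_with_length_check(mi: pd.MultiIndex, other: pd.Index) -> bool:
    """Hypothetical helper: bail out early when the lengths differ."""
    # Indexes of different lengths can never be equal, so there is no
    # need to build the (expensive) array of tuples for the MultiIndex.
    if len(mi) != len(other):
        return False
    # Otherwise fall back to the regular comparison.
    return bool(mi.equals(other))


# Small stand-in for the indexes in the report: a (sensor, time)
# MultiIndex and a flat string index of sensor names.
names = [str(i) for i in range(30)]
dates = pd.date_range("2020-01-01", periods=1000, freq="D")
mi = pd.MultiIndex.from_product([names, dates])
flat = pd.Index(names)

print(equals_with_length_check(mi, flat))       # lengths differ: 30000 vs 30
print(equals_with_length_check(mi, mi.copy()))  # same length, full comparison
```

In the scenario from this issue the flat index has 30 entries while the MultiIndex has one entry per sensor and timestamp, so the length comparison alone would settle the question.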

@philippegr
Author

Is self._values what gets computed (and cached?) on the first call, which would explain the delay and why subsequent calls are faster?
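One way to probe this question is to time two consecutive accesses to the public `.values` attribute of a MultiIndex. This is only a rough sketch: the helper `time_values_access` is hypothetical, the index is a smaller stand-in for the one in the report, and the timings will vary by machine and pandas version.

```python
import time

import pandas as pd


def time_values_access(mi: pd.MultiIndex):
    """Hypothetical helper: return (elapsed seconds, values) for one access."""
    start = time.perf_counter()
    values = mi.values  # array of tuples, one per row of the MultiIndex
    elapsed = time.perf_counter() - start
    return elapsed, values


names = [str(i) for i in range(30)]
dates = pd.date_range("2020-01-01", periods=2000, freq="D")
mi = pd.MultiIndex.from_product([names, dates])

first_s, first_vals = time_values_access(mi)
second_s, second_vals = time_values_access(mi)

print(f"first access:  {first_s:.6f} s")
print(f"second access: {second_s:.6f} s")
```

If the second access is much faster than the first, that is consistent with the tuple array being built once and then reused.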

@jreback added the MultiIndex, Numeric Operations (arithmetic, comparison, and logical operations), and Performance (memory or execution speed performance) labels and removed the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels on May 25, 2020
@jreback added this to the 1.1 milestone on May 25, 2020
@philippegr
Author

Thank you very much, looking forward to 1.1!
