DataFrame.groupby() is terribly slow when grouping by multiple columns and one of them is Categorical

#### Code Sample, a copy-pastable example if possible

```python

import pandas as pd

df_str = pd.DataFrame(dict(key_cat=['a', 'b', 'c', 'd'] * 250, key_num1=range(1000), key_num2=range(1000), val=[1]*1000))
df_cat = df_str.copy() 
df_cat.key_cat = pd.Categorical(df_cat.key_cat) 

%%time
df_cat.groupby(["key_cat", "key_num1", "key_num2"]).count()
# CPU times: user 10.8 s, sys: 3 s, total: 13.8 s
# Wall time: 13.8 s # 14 sec for a dataframe with 1000 elements?!!

%%time
df_str.groupby(["key_cat", "key_num1", "key_num2"]).count()
# CPU times: user 0 ns, sys: 4 ms, total: 4 ms
# Wall time: 4.05 ms # expected performance when using a column with string type
```
#### Problem description

Changing column dtype to categorical makes `groupby()` operation 3500 times slower.

The problem occurs both in pandas-0.23.4 and in pandas-0.24.0 (untagged.1.g216986d)

#### Expected Output

Similar execution times.

#### Output of ``pd.show_versions()``

<details>

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-35-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.6.3
pip: 18.0
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.15.1
scipy: 1.1.0
pyarrow: 0.8.0
xarray: None
IPython: 6.3.1
sphinx: None
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.0
pymysql: None
psycopg2: 2.7.3.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: 0.1.2
fastparquet: None
pandas_gbq: 0.2.1
pandas_datareader: None

</details>

### Profiling output

The output of `%%prun -s cumtime`:

```
         14574 function calls (14438 primitive calls) in 14.267 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   14.267   14.267 {built-in method builtins.exec}
        1    0.140    0.140   14.267   14.267 <string>:2(<module>)
        1    0.000    0.000   14.125   14.125 groupby.py:4843(count)
        1    0.000    0.000   14.122   14.122 groupby.py:4739(_wrap_agged_blocks)
        1    0.000    0.000   14.118   14.118 groupby.py:4757(_reindex_output)
        2    0.000    0.000   13.833    6.917 _decorators.py:185(wrapper)
        2    0.000    0.000   13.833    6.917 frame.py:3552(reindex)
        2    0.000    0.000   13.833    6.917 generic.py:3647(reindex)
        2    0.000    0.000   13.833    6.917 frame.py:3489(_reindex_axes)
        1    0.000    0.000   13.833   13.833 frame.py:3505(_reindex_index)
        1    0.000    0.000   13.823   13.823 multi.py:2023(reindex)
        1    0.000    0.000   13.823   13.823 multi.py:1981(get_indexer)
        1    3.043    3.043   13.823   13.823 {method 'get_indexer' of 'pandas._libs.index.BaseMultiIndexCodesEngine' objects}
        1    0.000    0.000    7.333    7.333 category.py:560(get_indexer)
        1    1.277    1.277    6.440    6.440 {method 'get_indexer_non_unique' of 'pandas._libs.index.IndexEngine' objects}
      400    0.029    0.000    5.163    0.013 fromnumeric.py:1173(resize)
      403    5.120    0.013    5.120    0.013 {built-in method numpy.core.multiarray.concatenate}
        1    0.000    0.000    2.788    2.788 base.py:912(__iter__)
        1    0.000    0.000    2.788    2.788 base.py:893(tolist)
    33/29    0.009    0.000    2.743    0.095 base.py:677(_values)
        3    0.000    0.000    2.734    0.911 multi.py:807(values)
        1    2.621    2.621    2.621    2.621 {pandas._libs.lib.fast_zip}
       12    0.000    0.000    0.863    0.072 base.py:3219(get_indexer)
        1    0.008    0.008    0.687    0.687 category.py:247(equals)
       40    0.030    0.001    0.680    0.017 base.py:4914(_ensure_index)
    13/10    0.000    0.000    0.651    0.065 base.py:255(__new__)
        1    0.000    0.000    0.628    0.628 category.py:218(_is_dtype_compat)
        1    0.000    0.000    0.491    0.491 base.py:3471(isin)
      2/1    0.000    0.000    0.491    0.491 algorithms.py:384(isin)
        1    0.010    0.010    0.491    0.491 categorical.py:2292(isin)
        2    0.230    0.115    0.371    0.185 base.py:850(_try_convert_to_int_index)
       12    0.271    0.023    0.271    0.023 {method 'get_indexer' of 'pandas._libs.index.IndexEngine' objects}
  504/500    0.202    0.000    0.263    0.001 {built-in method numpy.core.multiarray.array}
        1    0.000    0.000    0.248    0.248 multi.py:1878(sortlevel)
        1    0.008    0.008    0.238    0.238 algorithms.py:426(<lambda>)
        1    0.005    0.005    0.230    0.230 arraysetops.py:438(in1d)
        1    0.000    0.000    0.227    0.227 sorting.py:177(indexer_from_factorized)
        2    0.000    0.000    0.225    0.112 arraysetops.py:121(unique)
        2    0.049    0.024    0.225    0.112 arraysetops.py:268(_unique1d)
    84/82    0.000    0.000    0.209    0.003 numeric.py:433(asarray)
       11    0.208    0.019    0.208    0.019 {pandas._libs.lib.infer_dtype}
        8    0.000    0.000    0.151    0.019 common.py:301(_asarray_tuplesafe)
       14    0.000    0.000    0.150    0.011 category.py:139(_create_categorical)
       25    0.143    0.006    0.143    0.006 {method 'astype' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.143    0.143 sorting.py:321(get_group_index_sorter)
        1    0.143    0.143    0.143    0.143 {pandas._libs.algos.groupsort_indexer}
       12    0.000    0.000    0.138    0.012 categorical.py:267(__init__)
        2    0.000    0.000    0.136    0.068 categorical.py:2420(_get_codes_for_values)
    23/22    0.010    0.000    0.130    0.006 algorithms.py:1548(take_nd)
        2    0.001    0.000    0.117    0.058 missing.py:189(_isna_ndarraylike)
        2    0.000    0.000    0.117    0.058 missing.py:32(isna)
        2    0.000    0.000    0.117    0.058 missing.py:112(_isna_new)
        1    0.116    0.116    0.116    0.116 {pandas._libs.missing.isnaobj}
        8    0.000    0.000    0.081    0.010 algorithms.py:224(_get_data_algo)
        2    0.062    0.031    0.080    0.040 sorting.py:20(get_group_index)
       11    0.077    0.007    0.077    0.007 {method 'argsort' of 'numpy.ndarray' objects}
        3    0.000    0.000    0.062    0.021 categorical.py:1248(__array__)
        3    0.000    0.000    0.060    0.020 multi.py:1007(_get_level_values)
      437    0.055    0.000    0.055    0.000 {method 'reduce' of 'numpy.ufunc' objects}
      406    0.001    0.000    0.055    0.000 fromnumeric.py:49(_wrapfunc)
        1    0.000    0.000    0.053    0.053 category.py:302(get_values)
        1    0.000    0.000    0.053    0.053 categorical.py:1451(get_values)
        2    0.053    0.026    0.053    0.026 {method 'lookup' of 'pandas._libs.hashtable.StringHashTable' objects}
        3    0.000    0.000    0.051    0.017 missing.py:376(array_equivalent)
        1    0.000    0.000    0.051    0.051 category.py:342(__array__)
       14    0.000    0.000    0.047    0.003 algorithms.py:48(_ensure_data)
        1    0.045    0.045    0.045    0.045 {method 'sort' of 'numpy.ndarray' objects}
        1    0.044    0.044    0.044    0.044 {method 'tolist' of 'numpy.ndarray' objects}
       36    0.000    0.000    0.040    0.001 common.py:1578(is_bool_dtype)
        3    0.000    0.000    0.040    0.013 base.py:1964(inferred_type)
        1    0.000    0.000    0.037    0.037 multi.py:1332(from_product)
        3    0.034    0.011    0.034    0.011 {pandas._libs.algos.take_1d_object_object}
       38    0.033    0.001    0.033    0.001 {built-in method pandas._libs.algos.ensure_int64}
        2    0.011    0.006    0.033    0.016 multi.py:58(_codes_to_ints)
        2    0.000    0.000    0.032    0.016 fromnumeric.py:2092(cumsum)
        2    0.032    0.016    0.032    0.016 {method 'cumsum' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.024    0.024 util.py:16(cartesian_product)
        1    0.000    0.000    0.023    0.023 util.py:63(<listcomp>)
        5    0.022    0.004    0.022    0.004 {method 'repeat' of 'numpy.ndarray' objects}
        2    0.022    0.011    0.022    0.011 {method 'flatten' of 'numpy.ndarray' objects}
        3    0.000    0.000    0.020    0.007 fromnumeric.py:404(repeat)
        7    0.020    0.003    0.020    0.003 {method 'take' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.020    0.020 multi.py:1940(<listcomp>)
        6    0.000    0.000    0.018    0.003 {method 'max' of 'numpy.ndarray' objects}
        6    0.000    0.000    0.018    0.003 _methods.py:26(_amax)
        1    0.000    0.000    0.018    0.018 categorical.py:1774(take_nd)
       39    0.015    0.000    0.015    0.000 {built-in method numpy.core.multiarray.empty}
        4    0.013    0.003    0.014    0.004 sorting.py:55(maybe_lift)
        4    0.000    0.000    0.013    0.003 multi.py:212(__new__)
       14    0.000    0.000    0.013    0.001 base.py:510(_shallow_copy)
        5    0.000    0.000    0.013    0.003 category.py:193(_shallow_copy)
        9    0.000    0.000    0.013    0.001 category.py:178(_simple_new)
        5    0.000    0.000    0.013    0.003 categorical.py:717(_set_dtype)
       10    0.013    0.001    0.013    0.001 {pandas._libs.algos.take_1d_int64_int64}
        6    0.000    0.000    0.013    0.002 categorical.py:2437(_recode_for_categories)
        1    0.000    0.000    0.012    0.012 algorithms.py:1454(take)
```



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

DataFrame.groupby() is terribly slow when grouping by multiple columns and one of them is Categorical #24333

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of `pd.show_versions()`

INSTALLED VERSIONS

Profiling output

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

DataFrame.groupby() is terribly slow when grouping by multiple columns and one of them is Categorical #24333

Description

Code Sample, a copy-pastable example if possible

Problem description

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

Profiling output

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Output of `pd.show_versions()`