The select_dtypes() method implements subsetting of columns based on their dtype.

First, let’s create a DataFrame with a slew of different dtypes:

In [11]:
import numpy as np
import pandas as pd

In [12]:
df = pd.DataFrame({'string': list('pqr'),
                   'int64': list(range(1, 4)),
                   'uint8': np.arange(3, 6).astype('u1'),
                   'float64': np.arange(2.0, 5.0),
                   'bool1': [True, False, True],
                   'bool2': [False, True, False],
                   'dates': pd.date_range('now', periods=3),
                   'category': pd.Series(list("PQR")).astype('category')})

In [13]:
df['tdeltas'] = df.dates.diff()

In [14]:
df['uint64'] = np.arange(3, 6).astype('u8')

In [15]:
df['other_dates'] = pd.date_range('20190101', periods=3)

In [17]:
df['tz_aware_dates'] = pd.date_range('20190101', periods=3, tz='US/Eastern')
df

   string  int64  uint8  float64  bool1  bool2                       dates  category  tdeltas  uint64  other_dates             tz_aware_dates
0       p      1      3      2.0   True  False  2019-09-07 11:05:09.878205         P      NaT       3   2019-01-01  2019-01-01 00:00:00-05:00
1       q      2      4      3.0  False   True  2019-09-08 11:05:09.878205         Q   1 days       4   2019-01-02  2019-01-02 00:00:00-05:00
2       r      3      5      4.0   True  False  2019-09-09 11:05:09.878205         R   1 days       5   2019-01-03  2019-01-03 00:00:00-05:00


And the dtypes:

In [18]:
df.dtypes

string                                object
int64                                  int64
uint8                                  uint8
float64                              float64
bool1                                   bool
bool2                                   bool
dates                         datetime64[ns]
category                            category
tdeltas                      timedelta64[ns]
uint64                                uint64
other_dates                   datetime64[ns]
tz_aware_dates    datetime64[ns, US/Eastern]
dtype: object

select_dtypes() has two parameters, include and exclude, that let you say “give me the columns with these dtypes” (include) and/or “give me the columns without these dtypes” (exclude).
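As a minimal sketch of how the two parameters combine, the block below uses a small made-up frame (the name frame and its columns are illustrative only) and shows include alone, exclude alone, and both together. A column is kept when it matches include (or include is omitted) and does not match exclude.

```python
import pandas as pd

# Hypothetical example data, not the df built above
frame = pd.DataFrame({
    "a": [1, 2, 3],            # int64
    "b": [0.5, 1.5, 2.5],      # float64
    "c": [True, False, True],  # bool
})

# include only: keep columns whose dtype matches one of the listed dtypes
only_floats = frame.select_dtypes(include=["float64"])

# exclude only: keep every column whose dtype does not match the listed dtypes
no_bools = frame.select_dtypes(exclude=["bool"])

# both: keep columns that match include and do not match exclude
ints_only = frame.select_dtypes(include=["number"], exclude=["floating"])

print(list(only_floats.columns))  # ['b']
print(list(no_bools.columns))     # ['a', 'b']
print(list(ints_only.columns))    # ['a']
```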

For example, to select bool columns:

In [20]:
df.select_dtypes(include=[bool])

   bool1  bool2
0   True  False
1  False   True
2   True  False


You can also pass the name of a dtype in the NumPy dtype hierarchy:

In [21]:
df.select_dtypes(include=['bool'])

   bool1  bool2
0   True  False
1  False   True
2   True  False
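The Python type bool, its string name 'bool', and the NumPy scalar type np.bool_ should all resolve to the same dtype. A quick check on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one boolean and one float column
frame = pd.DataFrame({"flag": [True, False], "x": [1.0, 2.0]})

# Three spellings of the same dtype: Python type, dtype name, NumPy scalar type
by_type = frame.select_dtypes(include=[bool])
by_name = frame.select_dtypes(include=["bool"])
by_numpy = frame.select_dtypes(include=[np.bool_])

# All three select the same columns
assert by_type.columns.equals(by_name.columns)
assert by_name.columns.equals(by_numpy.columns)
print(list(by_type.columns))  # ['flag']
```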


select_dtypes() also works with generic dtypes, which match every dtype beneath them in the NumPy hierarchy.

For example, to select all numeric and boolean columns while excluding unsigned integers:

In [23]:
df.select_dtypes(include=['number', 'bool'], exclude=['unsignedinteger'])

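   int64  float64  bool1  bool2  tdeltas
0      1      2.0   True  False      NaT
1      2      3.0  False   True   1 days
2      3      4.0   True  False   1 days

Note that tdeltas is included: numpy.timedelta64 is a subclass of numpy.signedinteger (see the dtype tree below), so 'number' matches it.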
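A quick way to check which NumPy types a generic name such as 'number' covers is issubclass. The sketch below, on a small hypothetical frame, confirms that numpy.timedelta64 falls under numpy.number and shows one way to leave timedelta columns out by passing 'timedelta' to exclude.

```python
import numpy as np
import pandas as pd

# numpy.timedelta64 derives from numpy.signedinteger, hence from numpy.number
print(issubclass(np.timedelta64, np.signedinteger))  # True
print(issubclass(np.timedelta64, np.number))         # True

# Hypothetical frame: one integer column, one timedelta column
frame = pd.DataFrame({"n": [1, 2], "td": pd.to_timedelta([1, 2], unit="D")})

numeric = frame.select_dtypes(include=["number"])
print(list(numeric.columns))  # ['n', 'td']

# Exclude timedeltas explicitly to keep only "plain" numbers
plain = frame.select_dtypes(include=["number"], exclude=["timedelta"])
print(list(plain.columns))  # ['n']
```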


To select string columns, you must use the object dtype:

In [24]:
df.select_dtypes(include=['object'])

   string
0       p
1       q
2       r
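Because object is a catch-all dtype, include=['object'] returns every column holding arbitrary Python objects, not only strings. The sketch below assumes a hypothetical frame with a column of lists; the explicit dtype=object on the string column is an assumption made so the result does not depend on whether your pandas version infers a dedicated string dtype. The final list comprehension is just one illustrative way to narrow the result to columns whose values are all str.

```python
import pandas as pd

frame = pd.DataFrame({
    # dtype=object set explicitly; some pandas versions may otherwise
    # infer a dedicated string dtype for this column
    "name": pd.Series(["p", "q", "r"], dtype=object),
    "payload": [[1], [2, 3], []],  # lists of Python objects are stored as object too
    "count": [1, 2, 3],
})

objects = frame.select_dtypes(include=["object"])
print(list(objects.columns))  # ['name', 'payload']

# Hypothetical post-filter: keep object columns whose every value is a str
only_str = [
    col for col in objects.columns
    if objects[col].map(lambda v: isinstance(v, str)).all()
]
print(only_str)  # ['name']
```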


To see all the child dtypes of a generic dtype like numpy.number, you can define a function that returns a tree of child dtypes:

In [25]:
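def subdtypes(dtype):
    """Return dtype itself if it has no subclasses, else [dtype, [child trees]]."""
    subs = dtype.__subclasses__()
    if not subs:
        return dtype  # leaf: no further subclasses
    return [dtype, [subdtypes(dt) for dt in subs]]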

All NumPy dtypes are subclasses of numpy.generic:

In [26]:
subdtypes(np.generic)

[numpy.generic,
 [[numpy.number,
   [[numpy.integer,
     [[numpy.signedinteger,
       [numpy.int8,
        numpy.int16,
        numpy.int32,
        numpy.int32,
        numpy.int64,
        numpy.timedelta64]],
      [numpy.unsignedinteger,
       [numpy.uint8,
        numpy.uint16,
        numpy.uint32,
        numpy.uint32,
        numpy.uint64]]]],
    [numpy.inexact,
     [[numpy.floating,
       [numpy.float16, numpy.float32, numpy.float64, numpy.float64]],
      [numpy.complexfloating,
       [numpy.complex64, numpy.complex128, numpy.complex128]]]]]],
  [numpy.flexible,
   [[numpy.character, [numpy.bytes_, numpy.str_]],
    [numpy.void, [numpy.record]]]],
  numpy.bool_,
  numpy.datetime64,
  numpy.object_]]
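If you only need the concrete leaf types under a generic dtype, rather than the nested tree, the same recursion can collect them into a set. leaf_dtypes is a hypothetical helper name, not part of NumPy or pandas:

```python
import numpy as np


def leaf_dtypes(dtype):
    """Return the set of subclasses under dtype that have no subclasses themselves."""
    subs = dtype.__subclasses__()
    if not subs:
        return {dtype}
    leaves = set()
    for sub in subs:
        leaves |= leaf_dtypes(sub)  # merge leaves from each child subtree
    return leaves


number_leaves = leaf_dtypes(np.number)
print(np.float64 in number_leaves)      # True
print(np.timedelta64 in number_leaves)  # True
print(np.bool_ in number_leaves)        # False: bool_ is not under number
```

Using a set also collapses any entries that would otherwise appear twice in the tree.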

**Note:** Pandas also defines the types category and datetime64[ns, tz], which are not integrated into the normal NumPy hierarchy and won’t show up with the above function.
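Those pandas-specific dtypes can still be selected by name. The sketch below, on a small hypothetical frame, uses the string aliases 'category' and 'datetimetz' that select_dtypes accepts; note that 'datetime' matches only the timezone-naive column.

```python
import pandas as pd

# Hypothetical frame mirroring three of the column kinds used above
frame = pd.DataFrame({
    "cat": pd.Series(list("PQR")).astype("category"),
    "naive": pd.date_range("20190101", periods=3),
    "aware": pd.date_range("20190101", periods=3, tz="US/Eastern"),
})

cats = frame.select_dtypes(include=["category"])
aware = frame.select_dtypes(include=["datetimetz"])
naive = frame.select_dtypes(include=["datetime"])

print(list(cats.columns))   # ['cat']
print(list(aware.columns))  # ['aware']
print(list(naive.columns))  # ['naive']
```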