## Selecting Columns

In [1]:
import pandas as pd
import janitor
import numpy as np
import datetime
import re
from janitor import patterns
from pandas.api.types import is_datetime64_dtype

In [2]:
df = pd.DataFrame(
        {
            "id": [0, 1],
            "Name": ["ABC", "XYZ"],
            "code": [1, 2],
            "code1": [4, np.nan],
            "code2": ["8", 5],
            "type": ["S", "R"],
            "type1": ["E", np.nan],
            "type2": ["T", "U"],
            "code3": pd.Series(["a", "b"], dtype="category"),
            "type3": pd.to_datetime([np.datetime64("2018-01-01"),
                                    datetime.datetime(2018, 1, 1)]),
        }
    )

df



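|   | id | Name | code | code1 | code2 | type | type1 | type2 | code3 | type3 |
|---|----|------|------|-------|-------|------|-------|-------|-------|-------|
| 0 | 0 | ABC | 1 | 4.0 | 8 | S | E | T | a | 2018-01-01 |
| 1 | 1 | XYZ | 2 | NaN | 5 | R | NaN | U | b | 2018-01-01 |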


- Select by string:

In [3]:
df.select_columns("id")

['id']

- Select by shell-like glob string (`*`):

In [4]:
df.select_columns("type*")

['type', 'type1', 'type2', 'type3']
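
The glob pattern above follows the usual shell wildcard rules. As a rough illustration only (this is not claimed to be how `select_columns` works internally), the standard-library `fnmatch` module gives the same selection on a plain list of column names. The names `columns` and `matched` are made up for this sketch.

```python
from fnmatch import fnmatchcase

# Column names of the example DataFrame, in their original order.
columns = ["id", "Name", "code", "code1", "code2",
           "type", "type1", "type2", "code3", "type3"]

# Keep every name that matches the shell-style pattern "type*"
# (fnmatchcase compares case-sensitively on every platform).
matched = [name for name in columns if fnmatchcase(name, "type*")]
print(matched)
```

Because `*` matches any run of characters, including none, the bare name `type` is selected along with `type1`, `type2`, and `type3`.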

- Select by slice:

In [5]:
df.select_columns(slice("code1", "type1"))

['code1', 'code2', 'type', 'type1']
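
Note that the slice is label-based and, as the output shows, includes both endpoints (`type1` is part of the result), unlike positional slicing of a Python list. A minimal sketch of that behaviour on a plain list of names, where `label_slice` is a hypothetical helper written for this illustration:

```python
# Column names of the example DataFrame, in their original order.
columns = ["id", "Name", "code", "code1", "code2",
           "type", "type1", "type2", "code3", "type3"]

def label_slice(names, start, stop):
    """Return the names from `start` through `stop`, both inclusive."""
    first = names.index(start)
    last = names.index(stop)
    # +1 so that the stop label itself is kept.
    return names[first:last + 1]

selected = label_slice(columns, "code1", "type1")
print(selected)
```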

- Select by `Callable` (the callable is applied to every column and should return a single `True` or `False` per column):

In [6]:
df.select_columns(is_datetime64_dtype)

['type3']

In [7]:
df.select_columns(lambda x: x.name.startswith("code") or
                            x.name.endswith("1"))

['code', 'code1', 'code2', 'type1', 'code3']

In [8]:
df.select_columns(lambda x: x.isna().any())

['code1', 'type1']

- Select by regular expression:

In [9]:
df.select_columns(re.compile("\\d+"))

['code1', 'code2', 'type1', 'type2', 'code3', 'type3']

In [10]:
# same as above, with janitor.patterns
# simply a wrapper around re.compile

df.select_columns(patterns("\\d+"))


['code1', 'code2', 'type1', 'type2', 'code3', 'type3']
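
The result above suggests that the pattern may match anywhere in a column name, since none of the selected names starts with a digit. The sketch below contrasts `re.search` (match anywhere) with `re.match` (match at the start only) on a plain list of names. It illustrates the difference and is not taken from the library's source.

```python
import re

# Column names of the example DataFrame, in their original order.
columns = ["id", "Name", "code", "code1", "code2",
           "type", "type1", "type2", "code3", "type3"]

pattern = re.compile(r"\d+")

# search: the digits may appear anywhere in the name.
searched = [name for name in columns if pattern.search(name)]

# match: the digits would have to appear at the very start of the name.
anchored = [name for name in columns if pattern.match(name)]

print(searched)
print(anchored)
```

With `search`, the six names containing a digit are kept; with `match`, nothing is kept, because no column name begins with a digit.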

- Select a combination of the above (you can combine any of the previous options):

In [11]:
df.select_columns("id", "code*", slice("code", "code2"))

array(['id', 'code', 'code1', 'code2', 'code3'], dtype=object)

- You can also pass a sequence of booleans:

In [12]:
df.select_columns([True, False, True, True, True,
                   False, False, False, True, False])

Index(['id', 'code', 'code1', 'code2', 'code3'], dtype='object')
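
The boolean sequence in this example has one entry per column, and a column is kept where the entry is `True`. A minimal sketch of the same positional masking on a plain list, using `itertools.compress` from the standard library (the names `columns`, `mask`, and `kept` are made up for this illustration):

```python
from itertools import compress

# Column names of the example DataFrame, in their original order.
columns = ["id", "Name", "code", "code1", "code2",
           "type", "type1", "type2", "code3", "type3"]

# One flag per column, in the same order as `columns`.
mask = [True, False, True, True, True,
        False, False, False, True, False]

# compress() yields the items whose corresponding flag is truthy.
kept = list(compress(columns, mask))
print(kept)
```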

- Setting `invert` to `True` returns the complement of the columns provided:

In [13]:
df.select_columns("id", "code*", slice("code", "code2"),
                  invert=True)

array(['Name', 'type', 'type1', 'type2', 'type3'], dtype=object)
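
With `invert=True`, the selection is the set of columns that the arguments do not match. A minimal sketch of that complement on plain lists, assuming the remaining columns keep their original order (the names `selected` and `complement` are made up for this illustration):

```python
# Column names of the example DataFrame, in their original order.
columns = ["id", "Name", "code", "code1", "code2",
           "type", "type1", "type2", "code3", "type3"]

# Columns matched by "id", "code*" and slice("code", "code2").
selected = ["id", "code", "code1", "code2", "code3"]

# The complement: every column that was not matched.
complement = [name for name in columns if name not in selected]
print(complement)
```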