## [Regular expression replacement](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#regular-expression-replacement)

Python strings prefixed with the `r` character, such as `r'hello world'`, are [“raw” strings](https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals). They treat backslashes differently from strings without this prefix: a backslash in a raw string is kept as a literal backslash instead of starting an escape sequence, e.g., `r'\n' == '\\n'`. (A raw string cannot end in a single backslash, so `r'\'` is a syntax error.)
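As a quick check of the raw-string behavior described above, here is a minimal sketch that uses only the standard library `re` module. The variable names are illustrative and not part of the original notebook.

```python
import re

# A regex for a literal dot needs one backslash before the dot.
plain = '\\.'   # ordinary literal: the backslash itself must be escaped
raw = r'\.'     # raw literal: the backslash is kept exactly as typed

same_pattern = plain == raw          # both hold the two characters "\" and "."
lengths = (len('\n'), len(r'\n'))    # a newline character vs. backslash + "n"
matches_dot = re.fullmatch(raw, '.') is not None
matches_letter = re.fullmatch(raw, 'a') is not None

print(same_pattern, lengths, matches_dot, matches_letter)
```

Writing patterns as raw strings keeps them identical to what the regex engine sees, which is why the examples in this section use the `r` prefix.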

Replace the literal string `'.'` with `NaN`. No regular expression is needed for an exact match.

In [1]:
import pandas as pd
import numpy as np

In [3]:
d = {'a': list(range(4)), 'b': list('ab..'), 'c': ['a','b',np.nan,'d']}
d

{'a': [0, 1, 2, 3], 'b': ['a', 'b', '.', '.'], 'c': ['a', 'b', nan, 'd']}

In [4]:
df = pd.DataFrame(d)
df

Unnamed: 0,a,b,c
0,0,a,a
1,1,b,b
2,2,.,
3,3,.,d


In [5]:
df.replace('.', np.nan)

Unnamed: 0,a,b,c
0,0,a,a
1,1,b,b
2,2,,
3,3,,d


Replace the ‘`.`’ with `NaN` using a regular expression that also matches any surrounding whitespace.

In [6]:
df

Unnamed: 0,a,b,c
0,0,a,a
1,1,b,b
2,2,.,
3,3,.,d


In [10]:
df.replace(r'\s*\.\s*', np.nan, regex=True)

Unnamed: 0,a,b,c
0,0,a,a
1,1,b,b
2,2,,
3,3,,d
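The sample `df` has no whitespace around its dots, so the literal and regex calls give the same result there. The sketch below uses a hypothetical column `padded` (not from the original notebook) in which some placeholders carry stray spaces, to show where the two calls differ.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which some '.' placeholders carry stray whitespace.
padded = pd.DataFrame({'b': ['a', '.', ' . ', '.  ']})

# Literal replacement only matches cells that are exactly '.'.
literal = padded.replace('.', np.nan)

# The regex also matches the dot when whitespace surrounds it.
with_regex = padded.replace(r'\s*\.\s*', np.nan, regex=True)

print(literal['b'].isna().tolist())
print(with_regex['b'].isna().tolist())
```

Only the regex version should turn the padded cells into `NaN`. The literal version leaves `' . '` and `'.  '` untouched.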


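Replace using a list of regexes, each paired by position with a replacement value. In the replacement string, `\1` refers to the first capture group of the matching pattern.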

In [19]:
df.replace([r'\.', r'(a)'], ['dot', r'\1_stuff'], regex=True)

Unnamed: 0,a,b,c
0,0,a_stuff,a_stuff
1,1,b,b
2,2,dot,
3,3,dot,d
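To make the pairing and the backreference more concrete, here is a small sketch on a hypothetical Series `s` (an assumption for illustration). It includes a value, `'cab'`, where the pattern matches only part of the string.

```python
import pandas as pd

# Hypothetical values: an exact 'a', a dot, and a word that contains 'a'.
s = pd.Series(['a', '.', 'cab'])

# Patterns and replacements are paired by position:
#   r'\.'  -> 'dot'
#   r'(a)' -> the captured 'a' followed by '_stuff'
out = s.replace([r'\.', r'(a)'], ['dot', r'\1_stuff'], regex=True)

print(out.tolist())
```

Because the replacement is a string, only the matched part of each cell is substituted, so `'cab'` keeps its `c` and `b`.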


Replace with a regex given in a dict that maps column names to patterns, so that only the named column is searched.

In [13]:
df

Unnamed: 0,a,b,c
0,0,a,a
1,1,b,b
2,2,.,
3,3,.,d


In [14]:
df.replace({'b': r'\s*\.\s*'}, {'b':np.nan}, regex=True)

Unnamed: 0,a,b,c
0,0,a,a
1,1,b,b
2,2,,
3,3,,d
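In the sample `df` only column `b` contains dots, so the column scoping is not visible in the output. The following sketch uses a hypothetical frame (an assumption, not from the original) with a dot in two columns to show that the dict form leaves the other column alone.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where '.' appears in both columns.
frame = pd.DataFrame({'b': ['a', '.', ' . '], 'c': ['.', 'x', 'y']})

# The dict keys name the column to search; only column 'b' is touched.
scoped = frame.replace({'b': r'\s*\.\s*'}, {'b': np.nan}, regex=True)

print(scoped['b'].isna().tolist())
print(scoped['c'].tolist())
```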


Pass nested dictionaries of the form `{column: {pattern: replacement}}`, either together with `regex=True` or through the `regex` keyword.

In [15]:
df.replace({'b': {'b': r""}}, regex=True)

Unnamed: 0,a,b,c
0,0,a,a
1,1,,b
2,2,.,
3,3,.,d


In [18]:
df.replace(regex={'b': {r'\s*\.\s*': np.nan}})

Unnamed: 0,a,b,c
0,0,a,a
1,1,b,b
2,2,,
3,3,,d


In [22]:
df.replace({'b': {r'\s*(\.)\s*': r'\1ty'}}, regex=True)

Unnamed: 0,a,b,c
0,0,a,a
1,1,b,b
2,2,.ty,
3,3,.ty,d
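The two spellings of the nested-dict form used above, `regex=True` with a positional dict and the dict passed as `regex=`, should produce the same frame. A minimal sketch that rebuilds the sample `df` and compares the results with `DataFrame.equals`:

```python
import numpy as np
import pandas as pd

# Same data as the sample frame used throughout this section.
df = pd.DataFrame({'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']})
pattern = r'\s*\.\s*'

# Nested dict passed positionally, with regex matching switched on.
via_flag = df.replace({'b': {pattern: np.nan}}, regex=True)

# The same nested dict passed through the regex keyword.
via_keyword = df.replace(regex={'b': {pattern: np.nan}})

identical = via_flag.equals(via_keyword)
print(identical)
```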


Pass a list of regular expressions to replace every match with the same scalar value.

In [25]:
df.replace([r'\s*\.\s*', r'a|b'], "placeholder", regex=True)

Unnamed: 0,a,b,c
0,0,placeholder,placeholder
1,1,placeholder,placeholder
2,2,placeholder,
3,3,placeholder,d


All of the regular expression examples can also be written by passing the pattern(s) through the `regex` argument instead of `to_replace`. In this case, either the `value` argument must be passed explicitly by name or `regex` must be a nested dictionary.

In [27]:
df.replace(regex=[r"\s*\.\s*", r"a|b"], value='placeholder')

Unnamed: 0,a,b,c
0,0,placeholder,placeholder
1,1,placeholder,placeholder
2,2,placeholder,
3,3,placeholder,d
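As a sanity check on the equivalence described above, this sketch rebuilds the sample `df` and compares the positional form with the `regex=` keyword form. The names `positional` and `keyword` are illustrative.

```python
import numpy as np
import pandas as pd

# Same data as the sample frame used throughout this section.
df = pd.DataFrame({'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']})
patterns = [r'\s*\.\s*', r'a|b']

# Patterns passed as to_replace, with regex=True.
positional = df.replace(patterns, 'placeholder', regex=True)

# Patterns passed as regex=, so value has to be named explicitly.
keyword = df.replace(regex=patterns, value='placeholder')

print(positional.equals(keyword))
```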
