Dear developers,
I find it confusing how DataFrame.replace decides whether the to_replace values are plain strings or regular expressions.
When you pass a flat dictionary of from/to values, the keys are interpreted as literal strings and work as expected. If you pass a nested dictionary with the same values, the keys are interpreted as regexes, even if regex=False is passed explicitly. This causes a problem if you want to replace values that contain special characters.
Let's see an example:
As you can see, in the first case the () got replaced as expected. In the second case, even though the only thing I changed was to specify the column in which the replacement should occur, the parentheses were treated as regex syntax. The same happens if I add regex=False to the arguments.
My current workaround is to make sure that the from/to values are valid regexes, with any special characters escaped.
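The two call forms compared above, plus the workaround, can be sketched roughly as follows. The DataFrame, its column names, and its values are made up for illustration and are not the data from the original report. The comments only restate what the report describes for pandas 0.13.1; other versions may behave differently.

```python
import re

import pandas as pd

# Hypothetical data: cell values that contain regex metacharacters.
df = pd.DataFrame({"a": ["(1)", "(2)"], "b": ["(1)", "x"]})

# Flat dict: keys are matched as literal cell values in every column.
flat = df.replace({"(1)": "one"})

# Nested dict: restrict the replacement to column "a".
# This is the form the report says was treated as a regex on 0.13.1,
# even with regex=False.
nested = df.replace({"a": {"(1)": "one"}}, regex=False)

# Workaround: escape the key and opt in to regex matching explicitly,
# so the parentheses are matched literally.
escaped = df.replace({"a": {re.escape("(1)"): "one"}}, regex=True)

print(flat, nested, escaped, sep="\n\n")
```

Note that with regex=True the escaped pattern matches anywhere inside a cell, so if only whole-cell matches are wanted, the pattern would probably need `^` and `$` anchors around the escaped text.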
Cheers,
Adam
P.S.: Pandas is awesome, thanks a lot for all the effort!
@Buddha[J:T26]|24> pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: x86
processor: x86 Family 6 Model 23 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
pandas: 0.13.1
Cython: None
numpy: 1.8.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 1.2.0
sphinx: 1.2.1
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2013.9
bottleneck: 0.8.0
tables: 3.1.0
numexpr: 2.3
matplotlib: 1.3.1
openpyxl: None
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
sqlalchemy: 0.8.4
lxml: 3.3.1
bs4: 4.3.2
html5lib: 0.999
bq: None
apiclient: None