Implement vectorized, NA-friendly string utils, a la R's stringr #620

wesm · 2012-01-12T20:42:57Z

cc @hammer, @arthurgerigk. Not sure when I will be able to make this happen, but this would be a very nice addition. I've often found myself doing stuff like:

df[col].map(lambda x: x[:10])

or various other forms of string munging / regex-processing.

Obviously that would fail if any of df[col] is NA. And having to write this kinda sucks:

df[col].map(lambda x: x[:10] if notnull(x) else x)

If multiple columns were involved in some kind of string processing exercise, you'd just want the whole operation to short circuit and be NA if an NA is encountered.
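A rough sketch of that short-circuit behaviour in plain Python, for illustration only. The names `is_na` and `na_safe` are hypothetical helpers invented here, not pandas functions, and the sketch assumes missing values show up as `None` or a float NaN:

```python
import math


def is_na(value):
    """Return True for None or a float NaN (assumed missing-value markers)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def na_safe(func):
    """Wrap func so that it returns the NA itself if any argument is NA."""
    def wrapper(*args):
        for arg in args:
            if is_na(arg):
                return arg  # short-circuit: propagate the missing value
        return func(*args)
    return wrapper


# Single-column case: the slicing lambda no longer needs its own null check.
first10 = na_safe(lambda s: s[:10])
column = ["vectorized strings", None, "short"]
truncated = [first10(x) for x in column]

# Multi-column case: the result is NA as soon as either input is NA.
join = na_safe(lambda a, b: a + "-" + b)
joined = [join(a, b) for a, b in zip(["a", None], ["b", "c"])]
```

With something like this, `df[col].map(first10)` would read the same as the original one-liner while leaving NA entries untouched.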

wesm · 2012-02-07T16:20:33Z

from @gdraps in #746

Hi Wes,

First off, thanks for Pandas and your recent talk at the NYC Python meetup.  On the topic of filtering, I found that NumPy vector filters are awesome for numeric data, but I have found myself reaching for the following idioms when dealing with alpha-numeric columns:

   df[df.method.contains('abc')]
   df[df.method.startswith('ghi')]
   df[df.method.endswith('xyz')]

Would you consider the addition of these methods to the Series class, not only to complement the existing `isin()` method, but to bridge the gap with SQL libraries, such as SQLAlchemy (http://docs.sqlalchemy.org/en/latest/core/expression_api.html#sqlalchemy.sql.operators.ColumnOperators), and improve expressiveness of string queries in Pandas?
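To make the request concrete, here is a minimal plain-Python sketch of how such element-wise string predicates could produce a boolean mask for filtering. `str_mask` and its `na` parameter are hypothetical names used only for this illustration, not an existing Series method, and missing values are assumed to be `None`:

```python
def str_mask(values, predicate, na=False):
    """Apply predicate to each string; use `na` for missing entries.

    A filter mask has to be strictly boolean, so missing entries are
    replaced by the `na` fill value instead of being passed to predicate.
    """
    return [na if v is None else predicate(v) for v in values]


methods = ["abc-get", None, "ghi-post", "put-xyz"]

# Equivalents of the contains / startswith / endswith idioms above.
contains_abc = str_mask(methods, lambda s: "abc" in s)
starts_ghi = str_mask(methods, lambda s: s.startswith("ghi"))
ends_xyz = str_mask(methods, lambda s: s.endswith("xyz"))

# Filtering keeps only the entries whose mask value is True.
selected = [m for m, keep in zip(methods, contains_abc) if keep]
```

Treating a missing entry as "no match" by default is one possible design choice; a caller who wants to keep NA rows could pass `na=True` instead.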

* string-methods:
  TST: added unit tests from PR #1179 and copied docstrings
  ENH: additional unicode handling
  ENH: don't repeat numerical types
  TST: mixed types for string methods
  ENH: finish docs
  TST: add testing module for string methods #620
  ENH: continue filling out string methods + tests #620
  ENH: get working on vectorized string methods #620

wesm · 2012-07-15T22:50:48Z

Better late than never. Excited to finish this feature.

wesm mentioned this issue Feb 7, 2012

Improve docs about filtering #746

Closed

wesm added a commit that referenced this issue Jul 13, 2012

ENH: get working on vectorized string methods #620

408ee6f

wesm added a commit that referenced this issue Jul 14, 2012

ENH: continue filling out string methods + tests #620

dfb5343

wesm added a commit that referenced this issue Jul 14, 2012

TST: add testing module for string methods #620

01e0ff3

wesm closed this as completed Jul 15, 2012
