Implement vectorized, NA-friendly string utils, a la R's stringr #620

Closed
wesm opened this Issue Jan 12, 2012 · 2 comments

@wesm
Member
wesm commented Jan 12, 2012

cc @hammer, @arthurgerigk. Not sure when I will be able to make this happen, but this would be a very nice addition. I've often found myself doing stuff like:

df[col].map(lambda x: x[:10]) 

or various other forms of string munging / regex-processing.

Obviously that would fail if any of df[col] is NA. And having to write this kinda sucks:

df[col].map(lambda x: x[:10] if notnull(x) else x) 

If multiple columns were involved in some kind of string processing exercise, you'd just want the whole operation to short circuit and be NA if an NA is encountered.
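
One way to avoid repeating the `notnull` check would be to wrap the string function once. The sketch below is only an illustration in plain Python, not the implementation that was eventually merged: `is_na` and `na_safe` are hypothetical names, and NA is assumed to mean `None` or a float NaN.

```python
import math
from functools import wraps


def is_na(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def na_safe(func):
    """Wrap func so that any missing argument short-circuits to NaN."""
    @wraps(func)
    def wrapper(*args):
        if any(is_na(arg) for arg in args):
            return float("nan")
        return func(*args)
    return wrapper


# Single-input case: the slicing example from above.
first_ten = na_safe(lambda s: s[:10])

# Multi-input case: the result is NA if either input is NA.
concat = na_safe(lambda a, b: a + b)

values = ["vectorized strings", None, "short"]
truncated = [first_ten(v) for v in values]
```

A wrapper like `first_ten` could then be passed to `df[col].map` in place of the lambda with the inline `notnull` check.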

@wesm
Member
wesm commented Feb 7, 2012

from @gdraps in #746

Hi Wes,

First off, thanks for Pandas and your recent talk at the NYC Python meetup.  On the topic of filtering, I found that NumPy vector filters are awesome for numeric data, but I have found myself reaching for the following idioms when dealing with alpha-numeric columns:

   df[df.method.contains('abc')]
   df[df.method.startswith('ghi')]
   df[df.method.endswith('xyz')]

Would you consider the addition of these methods to the Series class, not only to complement the existing `isin()` method, but to bridge the gap with SQL libraries, such as SQLAlchemy (http://docs.sqlalchemy.org/en/latest/core/expression_api.html#sqlalchemy.sql.operators.ColumnOperators), and improve expressiveness of string queries in Pandas?
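
For illustration only, the three predicates could be sketched in plain Python as functions that return a boolean mask. The names `str_contains`, `str_startswith` and `str_endswith` are hypothetical and are not pandas API; missing values are assumed to map to `False` so that the mask can be used directly for filtering.

```python
import math


def _is_na(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def _mask(values, predicate):
    """Apply predicate to each value; missing values yield False."""
    return [False if _is_na(v) else predicate(v) for v in values]


def str_contains(values, pat):
    """Mask of values that contain pat as a literal substring."""
    return _mask(values, lambda s: pat in s)


def str_startswith(values, prefix):
    """Mask of values that start with prefix."""
    return _mask(values, lambda s: s.startswith(prefix))


def str_endswith(values, suffix):
    """Mask of values that end with suffix."""
    return _mask(values, lambda s: s.endswith(suffix))


# Stand-in for the df.method column used in the idioms above.
method = ["abc-get", "ghi-put", None, "post-xyz"]

# Keep only the rows whose value contains 'abc'.
rows = [m for m, keep in zip(method, str_contains(method, "abc")) if keep]
```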

wesm added a commit that referenced this issue Jul 15, 2012
Merge branch 'string-methods'
* string-methods:
  TST: added unit tests from PR #1179 and copied docstrings
  ENH: additional unicode handling
  ENH: don't repeat numerical types
  TST: mixed types for string methods
  ENH: finish docs
  TST: add testing module for string methods #620
  ENH: continue filling out string methods + tests #620
  ENH: get working on vectorized string methods #620
5a64a12
@wesm
Member
wesm commented Jul 15, 2012

Better late than never. Excited to finish this feature

wesm closed this Jul 15, 2012