Implement vectorized, NA-friendly string utils, a la R's stringr #620

Closed
wesm opened this issue Jan 12, 2012 · 2 comments

wesm commented Jan 12, 2012

cc @hammer, @arthurgerigk. Not sure when I'll be able to make this happen, but this would be a very nice addition. I've often found myself doing stuff like:

df[col].map(lambda x: x[:10]) 

or various other forms of string munging / regex-processing.

Obviously that would fail if any value in df[col] is NA, since an NA can't be sliced like a string. And having to write this kinda sucks:

df[col].map(lambda x: x[:10] if notnull(x) else x) 

If multiple columns were involved in some kind of string processing exercise, you'd just want the whole operation to short circuit and be NA if an NA is encountered.
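To make the short-circuit idea concrete, here is a minimal pure-Python sketch, not the pandas implementation. The names `NA`, `is_na`, `na_safe`, and `vectorize` are hypothetical helpers invented for illustration, and plain lists stand in for columns.

```python
import math
from functools import wraps

NA = float("nan")  # assumption: NaN stands in for a missing value in this sketch


def is_na(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def na_safe(func):
    """Wrap func so it returns NA as soon as any positional argument is NA."""
    @wraps(func)
    def wrapper(*args):
        if any(is_na(arg) for arg in args):
            return NA  # short-circuit: func is never called on missing data
        return func(*args)
    return wrapper


def vectorize(func, *columns):
    """Apply an NA-safe version of func element-wise across equal-length columns."""
    safe = na_safe(func)
    return [safe(*row) for row in zip(*columns)]


names = ["alexander", None, "bo"]
suffixes = ["_x", "_y", NA]

# Single column: the NA entry passes through instead of raising TypeError.
truncated = vectorize(lambda s: s[:4], names)

# Two columns: a row is NA if either input is NA.
joined = vectorize(lambda a, b: a + b, names, suffixes)
```

With this wrapper the lambda no longer needs its own `notnull` check, and the same rule covers operations that combine several columns.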

wesm commented Feb 7, 2012

from @gdraps in #746

Hi Wes,

First off, thanks for Pandas and your recent talk at the NYC Python meetup. On the topic of filtering, I found that NumPy vector filters are awesome for numeric data, but I have found myself reaching for the following idioms when dealing with alpha-numeric columns:

   df[df.method.contains('abc')]
   df[df.method.startswith('ghi')]
   df[df.method.endswith('xyz')]

Would you consider the addition of these methods to the Series class, not only to complement the existing `isin()` method, but to bridge the gap with SQL libraries, such as SQLAlchemy (http://docs.sqlalchemy.org/en/latest/core/expression_api.html#sqlalchemy.sql.operators.ColumnOperators), and improve expressiveness of string queries in Pandas?
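As a rough illustration of what such predicates need to do in the presence of missing values, the following pure-Python sketch builds boolean masks over a list standing in for a column. The helper names (`is_na`, `str_mask`, `contains`, `startswith`, `endswith`) and the `na` fill parameter are assumptions made for this example, not the API requested above.

```python
import math


def is_na(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def str_mask(values, predicate, na=False):
    """Build a boolean mask; NA entries get the `na` fill value instead of raising."""
    return [na if is_na(v) else bool(predicate(v)) for v in values]


def contains(values, sub, na=False):
    return str_mask(values, lambda s: sub in s, na)


def startswith(values, prefix, na=False):
    return str_mask(values, lambda s: s.startswith(prefix), na)


def endswith(values, suffix, na=False):
    return str_mask(values, lambda s: s.endswith(suffix), na)


# Hypothetical column of method names with one missing entry.
methods = ["abc_get", "ghi_put", None, "post_xyz", "ghi_abc"]

mask = startswith(methods, "ghi")
selected = [m for m, keep in zip(methods, mask) if keep]
```

Filling NA with `False` keeps the mask strictly boolean, so it can be used directly for row selection. pandas offers comparable element-wise operations through `Series.str.contains`, `Series.str.startswith`, and `Series.str.endswith`, each of which accepts an `na` argument.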

wesm added a commit that referenced this issue Jul 15, 2012
* string-methods:
  TST: added unit tests from PR #1179 and copied docstrings
  ENH: additional unicode handling
  ENH: don't repeat numerical types
  TST: mixed types for string methods
  ENH: finish docs
  TST: add testing module for string methods #620
  ENH: continue filling out string methods + tests #620
  ENH: get working on vectorized string methods #620

wesm commented Jul 15, 2012

Better late than never. Excited to finish this feature.

wesm closed this as completed Jul 15, 2012
dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019
This makes `VersionedItem` backwards compatible, so any third-party
libraries creating their own `VersionedItem`s won't be affected.