Implement StringMethod slice_replace #8888

Closed
billletson opened this issue Nov 24, 2014 · 4 comments · Fixed by #9014
Labels: Enhancement, Strings

@billletson
Contributor

The StringMethod slice_replace currently raises NotImplementedError. Couldn't find an issue for it, so opening this. PR with implementation in the works.

Could use thoughts on how to handle slices that return an empty string on some subset of strings - should the replacement string appear or not? This could happen when the input string is empty, the stop index is less than or equal to the start index (separate cases?), or both start and stop are greater than the length of the string.

@jreback
Contributor

jreback commented Nov 24, 2014

can you show a pseudo-example of what you are contemplating?

@jreback added the Strings and Enhancement labels on Nov 24, 2014
@billletson
Contributor Author

Sure thing. It looks like the skeleton was added in #1621, but the behavior was never specified. (I think the docs show the info for StringMethods.slice because of a copy-and-paste error.)

Basic functionality is straightforward: remove the specified slice and replace it with a specified string. A test:

values = Series(['short', 'a bit longer', 'evenlongerthanthat', '', NA])
result = values.str.slice_replace(2, 3, 'z')
exp = Series(['shzrt', 'a zit longer', 'evznlongerthanthat', '', NA])
tm.assert_series_equal(result, exp)
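
For readers without a pandas checkout, the basic case above can be sketched in plain Python. This is only an illustration, not the eventual pandas implementation: the helper name `slice_replace_basic` is hypothetical, `None` stands in for NA, and the empty-string input is left out because its behavior is the open question in this issue.

```python
def slice_replace_basic(strings, start, stop, repl):
    """Replace s[start:stop] with repl in each string (hypothetical helper).

    Assumes 0 <= start < stop, i.e. the straightforward case only.
    None is used here as a stand-in for NA and is passed through unchanged.
    """
    out = []
    for s in strings:
        if s is None:
            out.append(None)
        else:
            # Keep everything before the slice, add the replacement,
            # then keep everything after the slice.
            out.append(s[:start] + repl + s[stop:])
    return out


values = ['short', 'a bit longer', 'evenlongerthanthat', None]
result = slice_replace_basic(values, 2, 3, 'z')
print(result)
```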

Tricky cases are when the slice returns an empty string. I could see two ways of doing it: leave the string as is, or force the replacement string in there, depending on exactly why the slice returned nothing. I think leave alone is the right way to go, but maybe I'm wrong.

  1. Slice doesn't define a non-zero length range:

    • Leave as is
    >>> values.str.slice_replace(2, 2, 'z')[0]
    'short'
    
    • Insert at start index
    >>> values.str.slice_replace(2, 2, 'z')[0]
    'shzort'
    
  2. Slice spans a non-zero range, but doesn't overlap with 0:len(string). An empty string would be a special case of this.

    • Leave alone
    >>> values.str.slice_replace(6, 8, 'z')[0]
    'short'
    
    • Add replacement string to end:
    >>> values.str.slice_replace(6, 8, 'z')[0]
    'shortz'
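
To make the two options concrete, here is a small plain-Python sketch that switches between them. The function name `slice_replace_policy` and the `insert_when_empty` flag are hypothetical and exist only for this comparison; nothing here is meant as the final pandas behavior.

```python
def slice_replace_policy(s, start, stop, repl, insert_when_empty):
    """Compare the two candidate behaviors for slices that select nothing.

    Hypothetical helper: insert_when_empty=False leaves the string alone,
    insert_when_empty=True forces the replacement in, the way list slice
    assignment would.
    """
    if s[start:stop] == '' and not insert_when_empty:
        # Option "leave as is": the slice selected nothing, so do nothing.
        return s
    # Otherwise go through a list so that slice assignment handles
    # empty ranges (insert at start) and out-of-bounds ranges (append).
    chars = list(s)
    chars[start:stop] = list(repl)
    return ''.join(chars)


# Case 1: zero-length range inside the string.
print(slice_replace_policy('short', 2, 2, 'z', insert_when_empty=False))
print(slice_replace_policy('short', 2, 2, 'z', insert_when_empty=True))

# Case 2: non-zero range that lies entirely past the end of the string.
print(slice_replace_policy('short', 6, 8, 'z', insert_when_empty=False))
print(slice_replace_policy('short', 6, 8, 'z', insert_when_empty=True))
```

Under these assumptions the ordinary case is the same for both options; they differ only when the slice selects nothing.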
    

@immerrr
Contributor

immerrr commented Nov 28, 2014

I think this should be a natural extension to x.str slicing. We already support x.str[:5] as a shortcut for applying s[:5] for all strings in a given container:

In [6]: s = pd.Series(['foo', 'bar', 'baz']); s
Out[6]: 
0    foo
1    bar
2    baz
dtype: object

In [7]: s.str[:2]
Out[7]: 
0    fo
1    ba
2    ba
dtype: object

It's quite natural to also support setting them all at once, e.g. s.str[:2] = 'zz' should replace first two characters on all elements. If we do that with broadcasting/alignment of the value, we'll get a way to provide a replacement for each element for free: s.str[:2] = ['aa', 'bb', 'cc'].

And this would allow us to deprecate and drop slice and slice_replace methods completely and use their pythonic counterparts.
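
As a rough sketch of what such an assignment might do, the following plain-Python function imitates `s.str[key] = value` on a list of strings. Everything here is an assumption for illustration: the name `str_slice_assign` is hypothetical, the `s.str[...] = ...` syntax is the proposal above rather than an existing pandas feature, and values are matched by position only, not by index alignment.

```python
def str_slice_assign(strings, key, value):
    """Imitate the proposed `s.str[key] = value` (hypothetical helper).

    A single string is broadcast to every element; a list supplies one
    replacement per element. Returns new strings, since str is immutable.
    """
    if isinstance(value, str):
        value = [value] * len(strings)  # broadcast the scalar
    if len(value) != len(strings):
        raise ValueError("need one replacement per string")
    out = []
    for s, v in zip(strings, value):
        chars = list(s)
        chars[key] = list(v)  # list slice assignment does the replacement
        out.append(''.join(chars))
    return out


s = ['foo', 'bar', 'baz']
first_two = slice(None, 2)  # equivalent to [:2]
print(str_slice_assign(s, first_two, 'zz'))
print(str_slice_assign(s, first_two, ['aa', 'bb', 'cc']))
```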

@billletson
Contributor Author

Well, since strings are immutable, there is no slice_replace-like behavior for str itself to replicate, but we can replicate the behavior of lists, which is the second option in each case (insert at start index or append to end). Implementing __setitem__ is fairly easy as well.
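
The list behavior referred to here can be checked directly in plain Python. The sketch below uses a hypothetical helper, `assign_like_list`, and also covers the case where the stop index is less than the start index.

```python
word = 'short'

# Strings reject slice assignment outright.
try:
    word[2:3] = 'z'
    str_assignment_works = True
except TypeError:
    str_assignment_works = False
print(str_assignment_works)


def assign_like_list(s, start, stop, repl):
    """Apply list slice-assignment semantics to a string (hypothetical helper)."""
    chars = list(s)
    chars[start:stop] = list(repl)
    return ''.join(chars)


print(assign_like_list(word, 2, 3, 'z'))  # ordinary replacement
print(assign_like_list(word, 2, 2, 'z'))  # empty range: inserts at start index
print(assign_like_list(word, 3, 1, 'z'))  # stop < start: also inserts at start index
print(assign_like_list(word, 6, 8, 'z'))  # past the end: appends
```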

3 participants