ENH: str accessor #2991

Merged
merged 1 commit into from Jun 10, 2019


0x0L
Contributor

@0x0L 0x0L commented May 25, 2019

Hello,

This adds some of the pandas str functionality. Instead of wrapping pandas internals as in #2983, I copy/pasted the code, since it's simple and tiny.

Currently it's a bit more restrictive than pandas, since it expects all elements to be string-like.

  • Closes string accessor #2983
  • Tests added
  • Fully documented, including whats-new.rst for all changes and api.rst for new API

@pep8speaks

pep8speaks commented May 25, 2019

Hello @0x0L! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-01 17:10:58 UTC

@0x0L 0x0L force-pushed the str_accessor branch 2 times, most recently from 0d8f4e9 to d7dc294, May 25, 2019 21:33
Member

@shoyer shoyer left a comment

This needs tests, but looks like a great start!

This is getting a little long, so maybe it would make sense to split this into another module, e.g., xarray/core/str_accessor.py? The existing xarray/core/accessors.py could be renamed to xarray/core/dt_accessor.py.

        f = lambda x: x[s]
        return self._apply(f)

    def slice_replace(self, start=None, stop=None, repl=None):
Member

Should this just be repl=''?
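
For context on this default, here is a minimal sketch of the per-element logic. The helper name `slice_replace_one` is hypothetical and does not come from the PR. It assumes the pandas convention that an unspecified replacement behaves like an empty string, which is why `repl=''` reads as a natural default.

```python
def slice_replace_one(text, start=None, stop=None, repl=''):
    """Replace text[start:stop] with repl (hypothetical scalar helper)."""
    if text[start:stop] == '':
        # Empty slice: insert repl at `start` without dropping characters.
        local_stop = start
    else:
        local_stop = stop

    result = ''
    if start is not None:
        result += text[:start]      # keep everything before the slice
    result += repl                  # '' simply deletes the slice
    if stop is not None:
        result += text[local_stop:] # keep everything after the slice
    return result


print(slice_replace_one("abcdef", 1, 3, "X"))  # replace "bc" with "X"
print(slice_replace_one("abcdef", 1, 3))       # default repl deletes "bc"
```

With `repl=''` as the default, the function body needs no special case for `None`.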


    def _apply(self, f):
        # TODO handling of na values ?
        return apply_ufunc(np.vectorize(f), self._obj)
Member

Can you add dask='parallelized' to add dask support? I'm pretty sure that should "just work" with these functions.
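
A rough sketch of what this suggestion could look like, written as a standalone function rather than as the accessor method. The helper name `apply_str` is hypothetical. Passing the input dtype as `otypes` and `output_dtypes` is an assumption made here so that dask would know the result type up front; it suits functions that return strings of the same kind as the input.

```python
import numpy as np
import xarray as xr


def apply_str(obj, func):
    """Apply a scalar str -> str function elementwise (hypothetical helper)."""
    return xr.apply_ufunc(
        # np.vectorize turns the scalar function into an array function.
        np.vectorize(func, otypes=[obj.dtype]),
        obj,
        # Lets apply_ufunc map the function over dask chunks when the
        # input is dask-backed; NumPy-backed inputs are handled directly.
        dask="parallelized",
        output_dtypes=[obj.dtype],
    )


names = xr.DataArray(np.array(["alpha", "beta"]), dims="x")
upper = apply_str(names, str.upper)
print(upper.values)
```

The same call accepts a chunked array (for example `names.chunk()`), provided dask is installed.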

@0x0L
Contributor Author

0x0L commented May 26, 2019

@shoyer
The API reference does not generate docs for the accessor methods. They are also missing for .dt.
Do you know how to do this?

@shoyer
Member

shoyer commented May 27, 2019

You will need to reference the underlying class, e.g., xarray.core.str_accessor.StringAccessor. We do something similar for GroupBy objects: http://xarray.pydata.org/en/stable/api.html#groupby-objects

@shoyer
Member

shoyer commented May 27, 2019

@0x0L did you copy any of this code or tests from pandas? If so, that's totally fine, but we should include the original pandas copyright notice. See what we did in cftimeindex.py for an example: https://github.com/pydata/xarray/blob/master/xarray/coding/cftimeindex.py

@0x0L 0x0L changed the title from "[WIP] str accessor" to "ENH: str accessor" Jun 1, 2019
@0x0L
Contributor Author

0x0L commented Jun 8, 2019

@shoyer it should be all good now

@shoyer shoyer merged commit fa55060 into pydata:master Jun 10, 2019
@shoyer
Member

shoyer commented Jun 10, 2019

thanks @0x0L !

dcherian added a commit to dcherian/xarray that referenced this pull request Jun 24, 2019
* master: (31 commits)
  Add quantile method to GroupBy (pydata#2828)
  rolling_exp (nee ewm) (pydata#2650)
  Ensure explicitly indexed arrays are preserved (pydata#3027)
  add back dask-dev tests (pydata#3025)
  ENH: keepdims=True for xarray reductions (pydata#3033)
  Revert cmap fix (pydata#3038)
  Add "errors" keyword argument to drop() and drop_dims() (pydata#2994) (pydata#3028)
  More consistency checks (pydata#2859)
  Check types in travis (pydata#3024)
  Update issue templates (pydata#3019)
  Add pytest markers to avoid warnings (pydata#3023)
  Feature/merge errormsg (pydata#2971)
  More support for missing_value. (pydata#2973)
  Use flake8 rather than pycodestyle (pydata#3010)
  Pandas labels deprecation (pydata#3016)
  Pytest capture uses match, not message (pydata#3011)
  dask-dev tests to allowed failures in travis (pydata#3014)
  Fix 'to_masked_array' computing dask arrays twice (pydata#3006)
  str accessor (pydata#2991)
  fix safe_cast_to_index (pydata#3001)
  ...