ENH: Added np.char.slice_ #20694
Open: madphysicist wants to merge 9 commits into numpy:main from madphysicist:char_slice
Commits (9), all by madphysicist:
4d0db71 ENH: Added np.char.slice_
4d410ce MAINT: Fixed linter errors
4a1167b MAINT: Fixed remaining linter issues
430adcb ENH: Added support for non-numpy buffers
0056fda MAINT: Fixed another linter mistake
da404de MAINT: Added release note
89efa96 MAINT: I just want the linter to be happy!
3fd48e9 BUG: Added missing stacklevel
01dc74c MAINT: Removed annoying swap of start and stop
@@ -0,0 +1,9 @@
Added `numpy.char.slice_`
-------------------------
Allows slicing of strings within character arrays of type `np.str_` and
`np.unicode_`. In addition to the normal slice parameters of `start`,
`stop` and `step`, this function supports a `chunksize` parameter. Slices
with `step == chunksize == 1` are treated slightly differently than other
slices, in that they do not add an extra dimension to the array. This
oft-requested feature is conceptually simple, but ran afoul of the dtype
conversion checks in view creation until now.
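For readers unfamiliar with the underlying idea, here is a minimal sketch of substring extraction from a fixed-width string array using only existing NumPy calls. It is not the code in this PR: `slice_strings_copy` is a hypothetical helper, it handles only `start` and `stop`, and it deliberately makes a copy so that the final dtype change is always applied to contiguous data.

```python
import numpy as np

def slice_strings_copy(arr, start, stop):
    """Return each string of arr sliced as [start:stop] (hypothetical helper, copies)."""
    arr = np.ascontiguousarray(arr)
    if arr.dtype.kind != "U":
        raise TypeError("this sketch only handles unicode ('U') arrays")
    # Number of characters stored per element (4 bytes per character for 'U').
    width = arr.dtype.itemsize // np.dtype("U1").itemsize
    # View every element as a row of single characters.
    chars = arr.view("U1").reshape(arr.shape + (width,))
    # Slicing along the character axis, then copying to contiguous memory.
    piece = np.ascontiguousarray(chars[..., start:stop])
    n = piece.shape[-1]
    if n == 0:
        raise ValueError("empty slices are not handled in this sketch")
    # Merge the n characters of each row back into one string element.
    return piece.view(f"U{n}").reshape(arr.shape)

a = np.array(["hello", "world"])
print(slice_strings_copy(a, 1, 4))
```

The copy in `np.ascontiguousarray` is the cost that a view-based implementation would aim to avoid.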
It should be possible to achieve this with `stride_tricks` so that a copy is never returned.
Good point. Let me see what stride tricks is doing.
@eric-wieser. I have to ask. How is this line in `as_strided` fundamentally different from what you marked above: `array = np.asarray(DummyArray(interface, base=x))`? Is it because `DummyArray` is "trusted" because it's already an array subclass, vs what I have here, which is attempting to make a new one from scratch? The only thing that won't work here is that `as_strided` does not accept an offset. I can hack in another parameter to do that.
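As a rough illustration of the offset limitation mentioned above: one way to get the effect of an offset with the existing `as_strided` signature is to slice the input first, which moves the data pointer. The sketch below assumes a contiguous 1-D unicode array; the variable names are illustrative only and this is not the approach taken in the PR. The result stays a 2-D array of single characters, because `as_strided` does not change the dtype.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.array(["hello", "world"])   # dtype '<U5', 20 bytes per element
chars = a.view("U1")               # flat view, one character per element
start, length = 1, 3

# as_strided takes no offset, so slice first to shift the starting address.
windows = as_strided(
    chars[start:],
    shape=(a.size, length),
    strides=(a.strides[0], chars.strides[0]),
    writeable=False,               # guard against accidental writes through the view
)
print(windows)
```

Each row of `windows` holds characters 1 through 3 of the corresponding string, and no data is copied.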
@eric-wieser I tried it your way: madphysicist@e422186. However, as it turns out, it makes no difference. The issue is with changing dtype: both `ndarray` and `as_strided` complain equally when you try to do that unless the array is contiguous. Maybe that's something worth bothering about, maybe not. Within the scope of this PR, doing what I'm doing now seems to be a fix. I do need to add a test for an array created as a non-contiguous view into a `bytes` object or something.
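For the contiguous case, a dtype-changing view with an offset can be sketched directly with the `np.ndarray` constructor, which accepts `buffer`, `offset` and `strides`. This is only an illustration of the idea under discussion, assuming a C-contiguous unicode array as the buffer; it is not necessarily how the PR implements it, and it does not cover the non-contiguous case the comment describes.

```python
import numpy as np

a = np.array(["hello", "world"])            # contiguous, dtype '<U5'
char_size = np.dtype("U1").itemsize          # bytes per stored character
start, length = 1, 3

# Reinterpret a's memory as shorter strings that begin `start` characters in.
# The element-to-element stride is unchanged, so each new element stays
# inside the corresponding original element.
sub = np.ndarray(
    shape=a.shape,
    dtype=f"U{length}",
    buffer=a,
    offset=start * char_size,
    strides=a.strides,
)
print(sub)
```

Because `sub` shares memory with `a`, writing to `sub` modifies the original strings.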
@eric-wieser Please take a look at the changes and additional tests I made to deal with non-contiguous base arrays. I basically made the assumption that somewhere there is a contiguous block of memory available, got its address and calculated the size out to the last element I need, then used that to create my view. This is a deficiency in the API: I left a ranty comment about it in the code. There is no reason a responsible user shouldn't be able to use `as_strided` to view with a new dtype. In the meantime, please enjoy my hack.
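The "size out to the last element" calculation can be sketched as follows. `extent_nbytes` is a hypothetical helper written for this illustration, not a function from the PR, and it assumes non-negative strides.

```python
import numpy as np

def extent_nbytes(arr):
    """Bytes from arr's first element through the end of its last element.

    Hypothetical helper; assumes all strides are non-negative.
    """
    if any(s < 0 for s in arr.strides):
        raise ValueError("negative strides are not handled in this sketch")
    if arr.size == 0:
        return 0
    # Byte offset of the last element relative to the first one.
    last_offset = sum((n - 1) * s for n, s in zip(arr.shape, arr.strides))
    return last_offset + arr.itemsize

base = np.arange(10, dtype=np.int32)
view = base[::3]                 # non-contiguous: elements 0, 3, 6, 9
print(extent_nbytes(view))       # 3 * 12 + 4 = 40 bytes
```

A contiguous block of that many bytes, starting at the view's first element, is enough to cover every element the view can reach.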
@eric-wieser Given that I basically agree with your preference for using `as_strided`, would you prefer that I move the dtype-modifying code out to `as_strided`, similarly to the private branch I linked above? It would definitely make this function cleaner and less error prone.