Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] NDArray Slice with Indices Needs Bounds Checks #9223

Merged

Conversation

johnc1231
Copy link
Contributor

@johnc1231 johnc1231 commented Aug 6, 2020

CHANGELOG: Fixed issue where ndarrays being sliced and indexed into in one expression didn't have sufficient bounds checks

Fixes #9144 by checking if indices provided in a slicing expression that mixes slices and single indices are out of bounds.

Alex, please tell me if you're able to get anymore errors from #9144 with this change. I wasn't able to, but that issue was not super well defined at first so it's unclear to me if this covers all of your problems. I retitled the issue to reflect my understanding of the problems.

More detail:

In numpy, it is ok for the upper bound of a slice to go past the end of an array, but it is not ok for an indexing operation to do the same. For example:

n = np.array([[1,2,3,4], [1,2,3,4]])
n[0:1000, 0:1000]

is allowed, the bounds just get clamped. However, this is not allowed:

n[1000, 1000]

nor is:

n[1000, 0:1000]

Hail was not handling that last case with a mix of slices and indices correctly. Namely, it was not throwing an out bounds error on the first axis index.

@akotlar
Copy link
Contributor

akotlar commented Aug 6, 2020

Nice!

@@ -3617,8 +3617,12 @@ def __getitem__(self, item):
stop = hl.cond(step > 0, dlen, to_expr(-1, tint64))

slices.append(hl.tuple((start, stop, step)))
else:
slices.append(s)
else: # Not a slice, just an int index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid this comment; the typechecker already requires this to be either a slice or an int64

@akotlar
Copy link
Contributor

akotlar commented Aug 6, 2020

This also matches the corresponding Numpy error message almost precisely, which is nice. Difference is that "Index" is lowercase for Numpy. It's too bad that we can't keep the error message in python land, and the stack trace is utterly useless to a user. I'll make an issue for this

akotlar
akotlar previously requested changes Aug 6, 2020
Copy link
Contributor

@akotlar akotlar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove comment and it's perfect

else: # Not a slice, just an int index
adjusted_index = hl.if_else(s < 0, s + dlen, s)
checked_int = hl.case().when((adjusted_index < dlen) & (adjusted_index >= 0), adjusted_index).or_error(
hl.str("Index ") + hl.str(s) + hl.str(f" is out of bounds for axis {i} with size ") + hl.str(dlen)
Copy link
Contributor

@akotlar akotlar Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to a single string expression I think

.or_error(f"Index {hl.str(s)} is out of bounds for axis {i} with size {hl.str(dlen)}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't do this. f strings get turned into python strings at python runtime, which means you get:

Index <StringExpression of type str> is out of bounds for axis 0 with size <StringExpression of type str>'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need a special hail string formatting thing. Maybe that exists, I didn't look hard for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, cool, thanks for the explanation

@johnc1231
Copy link
Contributor Author

I agree with you about the stack traces though. If something expected happens, I don't want one.

@akotlar
Copy link
Contributor

akotlar commented Aug 6, 2020

needs bump, looks like random failure (shuffler was last test run)

@danking danking merged commit 9e42047 into hail-is:main Aug 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ndarray: Bounds checking on slicing is not working
3 participants