Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Optimize can_cast_dtype. #2726

Closed
wants to merge 5 commits into from

Conversation

groutr
Copy link
Contributor

@groutr groutr commented Jan 23, 2023

  1. Don't cast to array
  2. Avoid trial casting and comparing. Use numpy type hierarchy to determine safety.

These two changes make can_cast_array smarter for larger datasets or quick checks.

1. Don't cast to array
2. Avoid trial casting and comparing. Use numpy type hierarchy to determine safety.
@groutr
Copy link
Contributor Author

groutr commented Jan 23, 2023

Surprisingly, accessing the name attribute of a numpy dtype is extremely slow.

alignment
96.1 ns ± 3.26 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
base
94.5 ns ± 2.34 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
byteorder
104 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
char
101 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
descr
445 ns ± 4.26 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
fields
91.7 ns ± 0.72 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
flags
94.2 ns ± 0.296 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
hasobject
92.1 ns ± 0.838 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
isalignedstruct
92 ns ± 0.331 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
isbuiltin
95 ns ± 0.448 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
isnative
98.3 ns ± 2.84 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
itemsize
97.5 ns ± 0.578 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
kind
100 ns ± 0.625 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
metadata
92.6 ns ± 0.844 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
name  <----------------- ?????
3.32 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
names
93.3 ns ± 2.38 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
ndim
96.1 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
newbyteorder
119 ns ± 0.363 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
num
99.5 ns ± 4.11 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
shape
96.5 ns ± 3.25 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
str
360 ns ± 2.29 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
subdtype
96.8 ns ± 0.666 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
type
92.8 ns ± 0.799 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

else:
return np.array_equal(values, values.astype(dtype))
min_dtype = get_minimum_dtype(values)
return np.can_cast(min_dtype, dtype_name, casting='safe')
Copy link
Member

@snowman2 snowman2 Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see in https://numpy.org/doc/stable/reference/generated/numpy.can_cast.html, that it can accept an array:

Suggested change
return np.can_cast(min_dtype, dtype_name, casting='safe')
return np.can_cast(values, dtype_name, casting='safe')

Have you tried this out to see what the performance is like versus get_minimum_dtype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't do this because it won't allow you to downcast (ie go from 'float64' to 'float32' if all your data fits in float32).

@snowman2 snowman2 changed the title Optimize can_cast_dtype. PERF: Optimize can_cast_dtype. Jan 25, 2023
@snowman2
Copy link
Member

Does this PR depend on #2724?

@snowman2 snowman2 added this to the 1.4.0 milestone Jan 25, 2023
@groutr
Copy link
Contributor Author

groutr commented Jan 25, 2023

Does this PR depend on #2724?

At the end of the day, it depends on the updated behavior of that PR, though there is no direct dependency (as in one PR must be merged before the other).

EDIT: Looks like the failing test does depend on the fixes in #2724.

This is essentially what dtype.name does, but avoids all the overhead in numpy for accessing dtype.name.
@groutr
Copy link
Contributor Author

groutr commented Jan 25, 2023

Some benchmarks:

# a = np.array([1, 2, 3])
In [7]: %timeit dtypes.can_cast_dtype(a, 'float32')    # old
14.2 µs ± 277 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [10]: %timeit dtypes.can_cast_dtype(a, 'float32')   # new
7.28 µs ± 33.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# a = np.random.rand(512, 512)  # dtype is float64
In [9]: %timeit dtypes.can_cast_dtype(a, 'float32')   # old
4.48 ms ± 51.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [16]: %timeit dtypes.can_cast_dtype(a, 'float32')   # new
147 µs ± 3.73 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

@sgillies sgillies removed this from the 1.4.0 milestone Dec 21, 2023
@sgillies
Copy link
Member

sgillies commented Jan 3, 2024

@groutr @snowman2 what about just using np.can_cast() in features.py?

@groutr
Copy link
Contributor Author

groutr commented Jan 4, 2024

@sgillies See #2726 (comment)

get_minimum_dtype is an important piece of rasterio's can_cast_dtype to allow demoting to a smaller type if possible.

@sgillies
Copy link
Member

sgillies commented Jan 4, 2024

@groutr yes, I realized yesterday that rasterize() is fancy and gets its output dtype from 4 different options and tries to minimize the dtype. I seriously think that this is too fancy. I prefer we match the simpler behavior of numpy.array(numbers), which is to create either an int64 or float64 array depending on whether the numbers include floats regardless of the values, with an option to explicitly choose. Doing so cuts the complexity of that function by more than half and breaks very little. It could produce unexpectedly large arrays, but only by a factor of 4 in the worst case, and we're not removing the option to explicitly clamp the type.

@sgillies
Copy link
Member

sgillies commented Jan 5, 2024

Closing in favor of #3003, which eliminates internal use of can_cast_dtype() and get_minimum_dtype().

@sgillies sgillies closed this Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants