PERF: Optimize can_cast_dtype. #2726

groutr · 2023-01-23T19:17:33Z

Don't cast to array
Avoid trial casting and comparing. Use numpy type hierarchy to determine safety.

These two changes make can_cast_array smarter for larger datasets or quick checks.

1. Don't cast to array 2. Avoid trial casting and comparing. Use numpy type hierarchy to determine safety.

groutr · 2023-01-23T19:41:13Z

Surprisingly, accessing the name attribute of a numpy dtype is extremely slow.

alignment
96.1 ns ± 3.26 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
base
94.5 ns ± 2.34 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
byteorder
104 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
char
101 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
descr
445 ns ± 4.26 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
fields
91.7 ns ± 0.72 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
flags
94.2 ns ± 0.296 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
hasobject
92.1 ns ± 0.838 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
isalignedstruct
92 ns ± 0.331 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
isbuiltin
95 ns ± 0.448 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
isnative
98.3 ns ± 2.84 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
itemsize
97.5 ns ± 0.578 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
kind
100 ns ± 0.625 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
metadata
92.6 ns ± 0.844 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
name  <----------------- ?????
3.32 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
names
93.3 ns ± 2.38 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
ndim
96.1 ns ± 0.425 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
newbyteorder
119 ns ± 0.363 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
num
99.5 ns ± 4.11 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
shape
96.5 ns ± 3.25 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
str
360 ns ± 2.29 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
subdtype
96.8 ns ± 0.666 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
type
92.8 ns ± 0.799 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

snowman2 · 2023-01-25T04:07:18Z

rasterio/dtypes.py

-    else:
-        return np.array_equal(values, values.astype(dtype))
+    min_dtype = get_minimum_dtype(values)
+    return np.can_cast(min_dtype, dtype_name, casting='safe')


I see in https://numpy.org/doc/stable/reference/generated/numpy.can_cast.html, that it can accept an array:

Suggested change

return np.can_cast(min_dtype, dtype_name, casting='safe')

return np.can_cast(values, dtype_name, casting='safe')

Have you tried this out to see what the performance is like versus get_minimum_dtype?

I didn't do this because it won't allow you to downcast (ie go from 'float64' to 'float32' if all your data fits in float32).

snowman2 · 2023-01-25T04:08:47Z

Does this PR depend on #2724?

groutr · 2023-01-25T16:44:12Z

Does this PR depend on #2724?

At the end of the day, it depends on the updated behavior of that PR, though there is no direct dependency (as in one PR must be merged before the other).

EDIT: Looks like the failing test does depend on the fixes in #2724.

This is essentially what dtype.name does, but avoids all the overhead in numpy for accessing dtype.name.

groutr · 2023-01-25T17:17:03Z

Some benchmarks:

# a = np.array([1, 2, 3])
In [7]: %timeit dtypes.can_cast_dtype(a, 'float32')    # old
14.2 µs ± 277 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [10]: %timeit dtypes.can_cast_dtype(a, 'float32')   # new
7.28 µs ± 33.1 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# a = np.random.rand(512, 512)  # dtype is float64
In [9]: %timeit dtypes.can_cast_dtype(a, 'float32')   # old
4.48 ms ± 51.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [16]: %timeit dtypes.can_cast_dtype(a, 'float32')   # new
147 µs ± 3.73 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

sgillies · 2024-01-03T22:01:15Z

@groutr @snowman2 what about just using np.can_cast() in features.py?

groutr · 2024-01-04T00:06:49Z

@sgillies See #2726 (comment)

get_minimum_dtype is an important piece of rasterio's can_cast_dtype to allow demoting to a smaller type if possible.

sgillies · 2024-01-04T15:59:36Z

@groutr yes, I realized yesterday that rasterize() is fancy and gets its output dtype from 4 different options and tries to minimize the dtype. I seriously think that this is too fancy. I prefer we match the simpler behavior of numpy.array(numbers), which is to create either an int64 or float64 array depending on whether the numbers include floats regardless of the values, with an option to explicitly choose. Doing so cuts the complexity of that function by more than half and breaks very little. It could produce unexpectedly large arrays, but only by a factor of 4 in the worst case, and we're not removing the option to explicitly clamp the type.

sgillies · 2024-01-05T15:40:56Z

Closing in favor of #3003, which eliminates internal use of can_cast_dtype() and get_minimum_dtype().

Optimize can_cast_dtype.

be04105

1. Don't cast to array 2. Avoid trial casting and comparing. Use numpy type hierarchy to determine safety.

groutr mentioned this pull request Jan 23, 2023

Enhancements to validate_dtype #2725

Closed

Restore missing check for ndarray.

0f95db7

snowman2 reviewed Jan 25, 2023

View reviewed changes

snowman2 changed the title ~~Optimize can_cast_dtype.~~ PERF: Optimize can_cast_dtype. Jan 25, 2023

snowman2 added the enhancement label Jan 25, 2023

snowman2 added this to the 1.4.0 milestone Jan 25, 2023

groutr added 3 commits January 25, 2023 10:51

Add faster method for dtype str.

b890368

This is essentially what dtype.name does, but avoids all the overhead in numpy for accessing dtype.name.

Handle bool special case.

f6c49fc

Test _dtype_name.

7f6844e

groutr mentioned this pull request Feb 8, 2023

rasterio.dtypes improvements #2763

Open

sgillies removed this from the 1.4.0 milestone Dec 21, 2023

sgillies closed this Jan 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF: Optimize can_cast_dtype. #2726

PERF: Optimize can_cast_dtype. #2726

groutr commented Jan 23, 2023

groutr commented Jan 23, 2023

snowman2 Jan 25, 2023 •

edited

Loading

groutr Jan 25, 2023

snowman2 commented Jan 25, 2023

groutr commented Jan 25, 2023 •

edited

Loading

groutr commented Jan 25, 2023

sgillies commented Jan 3, 2024

groutr commented Jan 4, 2024 •

edited

Loading

sgillies commented Jan 4, 2024

sgillies commented Jan 5, 2024

	return np.can_cast(min_dtype, dtype_name, casting='safe')
	return np.can_cast(values, dtype_name, casting='safe')

PERF: Optimize can_cast_dtype. #2726

PERF: Optimize can_cast_dtype. #2726

Conversation

groutr commented Jan 23, 2023

groutr commented Jan 23, 2023

snowman2 Jan 25, 2023 • edited Loading

Choose a reason for hiding this comment

groutr Jan 25, 2023

Choose a reason for hiding this comment

snowman2 commented Jan 25, 2023

groutr commented Jan 25, 2023 • edited Loading

groutr commented Jan 25, 2023

sgillies commented Jan 3, 2024

groutr commented Jan 4, 2024 • edited Loading

sgillies commented Jan 4, 2024

sgillies commented Jan 5, 2024

snowman2 Jan 25, 2023 •

edited

Loading

groutr commented Jan 25, 2023 •

edited

Loading

groutr commented Jan 4, 2024 •

edited

Loading