Port ScalarField, AsyncFieldMixin and friends to rust#23204
Port ScalarField, AsyncFieldMixin and friends to rust#23204tobni merged 1 commit intopantsbuild:mainfrom
Conversation
| def create(cls, field: type[Field], *, provider: str) -> TargetFieldHelpInfo: | ||
| raw_value_type = get_type_hints(field.compute_value)["raw_value"] | ||
| type_hint = pretty_print_type_hint(raw_value_type) | ||
| hints = get_type_hints(field.compute_value) |
There was a problem hiding this comment.
We cannot rely on typehints for help info on fields that are implemented in pyo3. I find this solution more appropriate than trying to update __annotations__ in other ways.
There was a problem hiding this comment.
Would be good to comment here about when we expect raw_value to exist and when we don't. I.e., that that it's not arbitrary, but that we have two well-defined cases.
| } | ||
|
|
||
| fn __hash__(self_: &Bound<'_, Self>, py: Python) -> PyResult<isize> { | ||
| Ok(self_.get_type().hash()? & self_.borrow().value.bind(py).hash()?) |
There was a problem hiding this comment.
Field's __hash__ combined two hashes with bitwise AND. AND biases every bit toward 0 (75% chance per bit), so the output clusters near zero. Since CPython picks hash table buckets from the low-order bits, this clustering means more entries land in the same buckets, turning O(1) lookups into chain walks. With two AND'd hashes, you lose ~12 bits, roughly equivalent to a hash table running at 4096x its expected collision rate.
There was a problem hiding this comment.
Yikes, that is a huge facepalm
There was a problem hiding this comment.
How much of the perf benefit might be attributable to fixing this?
There was a problem hiding this comment.
I haven't measured, this was a drive by when implementing AsyncFieldMixin's hash.
8c53d44 to
fd60669
Compare
fd60669 to
1875d57
Compare
benjyw
left a comment
There was a problem hiding this comment.
LGTM!
To avoid CI churn, feel free to merge as-is and jam that comment I asked for on top of a future follow up PR.
| def create(cls, field: type[Field], *, provider: str) -> TargetFieldHelpInfo: | ||
| raw_value_type = get_type_hints(field.compute_value)["raw_value"] | ||
| type_hint = pretty_print_type_hint(raw_value_type) | ||
| hints = get_type_hints(field.compute_value) |
There was a problem hiding this comment.
Would be good to comment here about when we expect raw_value to exist and when we don't. I.e., that that it's not arbitrary, but that we have two well-defined cases.
The real juice is in porting
SourcesFieldand what that unlocks. This is a stepping stone, hopefully worth the squeeze.This additionally marks Field as frozen, allowing to avoid additional ref-counting overhead.
Still noticable in our 26k python, 100 JS file file repo