Port ScalarField, AsyncFieldMixin and friends to rust #23204

Merged
tobni merged 1 commit into pantsbuild:main from tobni:add/port-scalar-and-async-mixin-fields on Apr 2, 2026

Contributor

@tobni tobni commented Mar 31, 2026

The real juice is in porting SourcesField and what that unlocks. This is a stepping stone, hopefully worth the squeeze.

This additionally marks Field as frozen, which lets us avoid additional ref-counting overhead.

Still noticeable in our repo with 26k Python files and 100 JS files:

hyperfine --warmup 1 --runs 3 \
        -n 'main' \
        --prepare 'git -C <redacted>/pants checkout main --quiet' \
        'PANTS_SOURCE=<redacted>/pants PYENV_VERSION=pants@3.14.3 pants --no-pantsd dependencies :: > /dev/null' \
        -n 'branch' \
        --prepare 'git -C <redacted>/pants checkout add/port-scalar-and-async-mixin-fields --quiet' \
        'PANTS_SOURCE=<redacted>/pants PYENV_VERSION=pants@3.14.3 pants --no-pantsd dependencies :: > /dev/null' \
        2>&1

Benchmark 1: main
  Time (mean ± σ):     35.261 s ±  1.261 s    [User: 40.704 s, System: 12.371 s]
  Range (min … max):   34.021 s … 36.542 s    3 runs

Benchmark 2: branch
  Time (mean ± σ):     33.615 s ±  0.202 s    [User: 38.675 s, System: 12.403 s]
  Range (min … max):   33.478 s … 33.847 s    3 runs
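To put the two means in proportion, here is a small sketch that only restates the hyperfine figures quoted above. The constant and function names are illustrative and not part of the PR, and with 3 runs per side the result should be read as indicative rather than conclusive.

```python
# Means and standard deviations copied from the hyperfine output above.
MAIN_MEAN_S, MAIN_SD_S = 35.261, 1.261
BRANCH_MEAN_S, BRANCH_SD_S = 33.615, 0.202


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction of the baseline wall time saved by the candidate."""
    return (baseline - candidate) / baseline


saved_s = MAIN_MEAN_S - BRANCH_MEAN_S
reduction = relative_reduction(MAIN_MEAN_S, BRANCH_MEAN_S)

# How large the saving is compared with the noisier side's spread.
saving_in_main_sd = saved_s / MAIN_SD_S

print(f"saved {saved_s:.3f} s ({reduction:.1%}), "
      f"{saving_in_main_sd:.2f}x main's standard deviation")
```

The saving is about 1.6 s, or roughly 4.7% of main's mean, which is only a little larger than main's own run-to-run spread.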

@tobni tobni added the category:performance, release-notes:not-required and category:internal labels Mar 31, 2026
@tobni tobni changed the title from "Port ScalarField and AsyncFieldMixin to rust" to "Port ScalarField, AsyncFieldMixin and friends to rust" Mar 31, 2026
def create(cls, field: type[Field], *, provider: str) -> TargetFieldHelpInfo:
    raw_value_type = get_type_hints(field.compute_value)["raw_value"]
    type_hint = pretty_print_type_hint(raw_value_type)
    hints = get_type_hints(field.compute_value)
Contributor Author

@tobni tobni Mar 31, 2026

We cannot rely on type hints for help info on fields that are implemented in pyo3. I find this solution more appropriate than trying to update __annotations__ in other ways.
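As a rough illustration of the two cases, the sketch below looks up the `raw_value` annotation and returns `None` when it is absent instead of indexing into the hints directly. The helper `raw_value_hint` and the two sample functions are hypothetical stand-ins, not the code in this PR, and a plain unannotated Python function is used here in place of a pyo3-implemented method.

```python
from typing import Any, Callable, Optional, get_type_hints


def raw_value_hint(compute_value: Callable[..., Any]) -> Optional[Any]:
    """Return the annotation of `raw_value`, or None if there is none."""
    try:
        hints = get_type_hints(compute_value)
    except TypeError:
        # Raised for objects get_type_hints does not know how to inspect.
        return None
    # .get() instead of ["raw_value"]: a missing annotation is an expected case.
    return hints.get("raw_value")


def annotated(raw_value: Optional[str], address: str) -> Optional[str]:
    """Stand-in for a compute_value written in Python with type hints."""
    return raw_value


def unannotated(raw_value, address):
    """Stand-in for a compute_value that carries no annotations."""
    return raw_value
```

A caller can then fall back to some other source for the help text whenever `raw_value_hint` returns `None`.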

Contributor

Would be good to comment here about when we expect raw_value to exist and when we don't. I.e., that it's not arbitrary, but that we have two well-defined cases.

    }

    fn __hash__(self_: &Bound<'_, Self>, py: Python) -> PyResult<isize> {
        Ok(self_.get_type().hash()? & self_.borrow().value.bind(py).hash()?)
Contributor Author

@tobni tobni Mar 31, 2026

Field's __hash__ combined two hashes with bitwise AND. AND biases every bit toward 0 (75% chance per bit), so the output clusters near zero. Since CPython picks hash table buckets from the low-order bits, this clustering means more entries land in the same buckets, turning O(1) lookups into chain walks. With two AND'd hashes, you lose ~12 bits, roughly equivalent to a hash table running at 4096x its expected collision rate.
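The per-bit bias can be worked through with a short sketch. It assumes an idealized model in which both input hashes have independent, uniformly distributed bits, and the function names are illustrative only. Under that model the Shannon entropy lost over a 64-bit hash comes out close to the "~12 bits" figure, while the effect on bucket collisions depends on how many low-order bits the table actually uses.

```python
import math

# Under the uniform-bit assumption, a bit of (a & b) is 1 only when both
# input bits are 1, so it is 0 with probability 0.75.
P_ONE_AFTER_AND = 0.5 * 0.5


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a single bit that is 1 with probability p."""
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def entropy_bits_lost(width: int = 64) -> float:
    """Entropy lost over `width` bits compared with a uniform hash."""
    return width * (1.0 - binary_entropy(P_ONE_AFTER_AND))


def bucket_collision_ratio(low_bits: int) -> float:
    """How much likelier two AND-combined hashes agree on their `low_bits`
    low-order bits than two uniform hashes would."""
    p_match = P_ONE_AFTER_AND**2 + (1.0 - P_ONE_AFTER_AND) ** 2  # 0.625 per bit
    return (p_match / 0.5) ** low_bits


# Exhaustive check of the bias over all pairs of 8-bit values.
ones = sum(bin(a & b).count("1") for a in range(256) for b in range(256))
fraction_of_ones = ones / (8 * 256 * 256)
```

In this model each extra low-order bit used for bucket selection multiplies the pairwise collision probability by 1.25, so the penalty grows with the size of the table.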

Contributor

Yikes, that is a huge facepalm

Contributor

How much of the perf benefit might be attributable to fixing this?

Contributor Author

I haven't measured; this was a drive-by fix while implementing AsyncFieldMixin's hash.

@tobni tobni force-pushed the add/port-scalar-and-async-mixin-fields branch 5 times, most recently from 8c53d44 to fd60669 April 1, 2026 07:59
@tobni tobni force-pushed the add/port-scalar-and-async-mixin-fields branch from fd60669 to 1875d57 April 1, 2026 08:20
@tobni tobni requested a review from benjyw April 1, 2026 08:36
Contributor

@benjyw benjyw left a comment

LGTM!

To avoid CI churn, feel free to merge as-is and jam that comment I asked for on top of a future follow-up PR.

@tobni tobni merged commit a87eeb2 into pantsbuild:main Apr 2, 2026
25 checks passed