Skip to content

Conversation

@ldematte
Copy link
Contributor

The C and C++ standards do not prescribe a fixed size for integral and floating-point primitive data types (e.g. int or float); those depends on the platform (compiler + OS + architecture).
Java, on the other end, has a fixed and precise width for each primitive type. Getting those to match is important for interoperability between Java and the native libraries via FFI.

The newest C/C++ standards define new types with explicit width, e.g. int8_t or float32_t; this PR updates the function definitions of the simdvec native library to use these types.

@elasticsearchmachine elasticsearchmachine added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.3.0 labels Nov 21, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, yes, yes. Thank you.

@ChrisHegarty
Copy link
Contributor

I assume that we'll bump the native lib version in this PR too.

@ldematte
Copy link
Contributor Author

I assume that we'll bump the native lib version in this PR too

I'm torn; yes, I think it's worth it. At least, it will help us testing it going through CI.

@ldematte
Copy link
Contributor Author

ldematte commented Nov 21, 2025

float32_t is part of C++23, but not of C23 unfortunately; clang on ARM seems to not care and defines it anyway, but on x64 does not.
On the other hand, we should be OK with float: while it's true that float, double, and long double, have very lenient requirements, those are actually OK in the case of float - it is basically constrained to be 32 bits. See C23 Standard - E [Implementation limits]
I've reverted the float32_t changes for now, but I'll keep investigating for a bit before surrendering :)

@ldematte ldematte added the test-arm Pull Requests that should be tested against arm agents label Nov 24, 2025
@ldematte ldematte requested a review from a team as a code owner November 24, 2025 09:18
@ldematte
Copy link
Contributor Author

OK, I have re-introduced the fixed 32-bit float type, but this time it's our own type defined in the best possible way:

  • std::float32_t if we are compiling C++ code with a C++23 standard compile
  • _Float32 if we are compiling C code with a C23 standard compile
  • otherwise float, but we check it is 32-bit wide

@ldematte
Copy link
Contributor Author

I've tested this with a variety of compilers: clang19, clang17, gcc12 and gcc13 (supporting the 23 standards and not), and it seems to work well. We use clang18/17 atm, and that is well covered.

@ldematte
Copy link
Contributor Author

@ChrisHegarty let me know if still LGTM - I have addressed the float situation and bumped versions.

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SLGTM

@ldematte ldematte merged commit eba1e2a into elastic:main Nov 24, 2025
40 checks passed
szybia added a commit to szybia/elasticsearch that referenced this pull request Nov 24, 2025
…-json

* upstream/main: (247 commits)
  Mute org.elasticsearch.xpack.inference.integration.SemanticTextIndexOptionsIT testValidateIndexOptionsWithBasicLicense elastic#138513
  Mute org.elasticsearch.xpack.esql.heap_attack.HeapAttackLookupJoinIT testLookupExplosionBigString elastic#138510
  This shouldn't be zero (elastic#138501)
  sum of empty histogram is now null (elastic#138378)
  Test ES|QL bfloat16 support (elastic#138499)
  Fix exception handling in S3 `compareAndExchangeRegister` (elastic#138488)
  Mute org.elasticsearch.xpack.exponentialhistogram.ExponentialHistogramFieldMapperTests testFormattedDocValues elastic#138504
  Mute org.elasticsearch.ingest.geoip.IngestGeoIpClientYamlTestSuiteIT test {yaml=ingest_geoip/60_ip_location_databases/Test adding, getting, and removing ip location databases} elastic#138502
  ESQL: Refactor HeapAttackIT (elastic#138432)
  [Inference API] Add ElasticInferenceServiceDenseTextEmbeddingsServiceSettings to InferenceNamedWriteablesProvider (elastic#138484)
  Store split indices (elastic#138396)
  ES|QL Update CHUNK to support chunking_settings as optional argument (elastic#138123)
  Extract common blob-update logic in `S3HttpHandler` (elastic#138490)
  Cleanup esql request building api (elastic#138398)
  Round sum and avg in exponential_histogram CSV tests (elastic#138472)
  ESQL: load exponential_histogram total count as double instead of long (elastic#138417)
  [SIMD] Use fixed width native types for better Java interoperability (elastic#138429)
  Do not use Min or Max as Top's surrogate when there is an outputField (elastic#138380)
  ES|QL: Fix generative tests (elastic#138478)
  Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>refactoring :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch test-arm Pull Requests that should be tested against arm agents v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants