Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime detection of CPU features #53

Merged
merged 12 commits into from Apr 30, 2020
Merged

Conversation

Julow
Copy link
Contributor

@Julow Julow commented Apr 25, 2020

Hi !

This PR adds runtime detection of CPU features.
Both accelerated and generic functions are built and linked. This is implemented with a global variable checked on every call.
The detection is done at initialization time using the library cpuid.

Running the all the benchmarks in bench/ either shows no difference or a slight decrease in performances (consistent 1% on sha1, too much noise on others).

I first tried using modules (on this branch) and selecting the right function at the beginning.
This was slower because calls to external functions were no longer inlined (a closure and caml_apply indirections)

The cpuid library allocates (around 3Kb) data structures that are only needed once but not collected.

I didn't remove the MIRAGE_CRYPTO_ACCELERATE env variable, the acceleration is only enabled on x86_64 systems so removing the variable would not remove any code.

Partially solves issue #45.
The RDRAND/RDSEED flags are still detected at build time but this may not be a problem for deployments as only architecture is checked, which is still in cross-compilation's scope.

@hannesm
Copy link
Member

hannesm commented Apr 25, 2020

thanks @Julow, nice work.
some comments:

  • I'd be in favour to remove the MIRAGE_CRYPTO_ACCELERATE environment variable entirely (from cfg.ml and README)
  • as you mention, the aesni should only be enabled on x86 -- so could we in cfg.ml guard these flags behind the uname -m? (so they won't be enabled on ARM)
  • will cpuid (which includes C code) fine on mirage / solo5?
  • I'm not convinced of the extra C file misc_sse.c, at least the export_counter can be shared with misc.c
  • the rdseed & rdrand CPU features are already detected at run-time, whether the compiler supported -mrdseed is discovered (by uname -m) at compile time (thus, this is fine -- it could now use cpuid in a similar fashion as you do for aes/sse/pcmulq)

| "false" -> []
| _ -> flags
| exception Not_found -> flags
in
let ent_flags =
let c = Configurator.V1.create "mirage-crypto" in
let arch = Configurator.V1.Process.run c "uname" ["-m"] in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just two lines below, the accelerate_flags should be passed -- i.e only on x86 platform, not on arm (this is as far as I can tell what cpuid would have done as well)... or am I misguided and ARM nowadays supports -mssse3 -maes -mpclmul?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering the same. Fixed !
Should it be passed for amd64 and x86 ? It seems only x86_64 was selected in mirage-crypto.h. I removed that check so we have only one place to worry about this.

@Julow
Copy link
Contributor Author

Julow commented Apr 27, 2020

I'd be in favour to remove the MIRAGE_CRYPTO_ACCELERATE environment variable entirely (from cfg.ml and README)

Done.

as you mention, the aesni should only be enabled on x86 -- so could we in cfg.ml guard these flags behind the uname -m? (so they won't be enabled on ARM)

Currently this is guarded behind the preprocessor macro __x86_64__, which is defined by the compiler. Is the uname check necessary to build on other architectures ? (eg. what happen if we pass the -mssse3 to the compiler on ARM?)

will cpuid (which includes C code) fine on mirage / solo5?

I didn't test that yet. How can I ensures that ? Building a simple unikernel on my machine is enough ?

I'm not convinced of the extra C file misc_sse.c, at least the export_counter can be shared with misc.c

The export_counter macro is different too. I kept it to keep similarity between the two files.
I created an other file for similarity with other "accelerated" files (aes_aesni and ghash_pclmul). Should I join the files ?

the rdseed & rdrand CPU features are already detected at run-time, whether the compiler supported -mrdseed is discovered (by uname -m) at compile time (thus, this is fine -- it could now use cpuid in a similar fashion as you do for aes/sse/pcmulq)

It is also possible to do the opposite, checking all features in C, removing the cpuid library if it is problematic for portability. (or memory)

@avsm
Copy link
Member

avsm commented Apr 27, 2020

I think merging this may need to be dependent on @samoht's port to dune, as the cpuid code will just work then in all supported Mirage backends (since C code is compiled in a workspace automatically in @samoht's patches).

@hannesm
Copy link
Member

hannesm commented Apr 27, 2020

Thanks @Julow

How can I ensures that ? Building a simple unikernel on my machine is enough ?

yes, that should be enough. to properly run on mirageos, C sources need to be "cross-"compiled with different headers (see the freestanding and xen subdirectories in this repository).

The export_counter macro is different too.

hmm, at a second look, it is fine to have the sources as you have in this PR.

It is also possible to do the opposite, checking all features in C

looking at entropy_cpu_stubs.c (which already reads the CPUID and evaluates the feature bits), I'm in favour of this approach: less code, less dependencies, and it doesn't look too horrible.

@Julow
Copy link
Contributor Author

Julow commented Apr 28, 2020

Added a commit that implements cpuid call without using the cpuid OCaml library. No longer needed to worry about cross compilation.
The code look the same and detection is still called from OCaml initialization.

@hannesm
Copy link
Member

hannesm commented Apr 29, 2020

thanks again Julow, I pushed a single commit onto your branch, which:

  • does acceleration as well if uname -m is "amd64" (as BSDs do it)
  • extend and use the cpu feature detection in entropy_cpu_stubs (and guard the feature detection only by i386 || x86_64, not -DACCELERATE))

I hope these changes are fine with you, if that is the case, and CI is green, I'm eager to merge this (all squashed into a single commit). 🌴

…res) also in mirage-crypto-entropy (entropy_cpu_stubs)
Copy link
Member

@hannesm hannesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

hannesm and others added 3 commits April 30, 2020 10:38
Call [caml_entroy_detect] even when it's not needed. This ensures it is always called after
cpu features detection.
@hannesm hannesm merged commit a7ced09 into mirage:master Apr 30, 2020
@hannesm
Copy link
Member

hannesm commented Apr 30, 2020

merged, the CI failure was a transient network failure, and other macOS jobs were successful.

@hannesm hannesm mentioned this pull request Apr 30, 2020
hannesm added a commit to hannesm/opam-repository that referenced this pull request May 18, 2020
…0.7.0)

CHANGES:

* CPU feature detection (AESNI, SSE3, PCLMULQ) at runtime instead of compile
  time (mirage/mirage-crypto#53 @Julow, fixed MirageOS support mirage/mirage-crypto#61, review by @hannesm)
  performance hit up to 5%
* Revise entropy collection (mirage/mirage-crypto#64 @hannesm review by @dinosaure @cfcs)
  mirage-crypto-entropy has been folded into mirage-crypto-rng.{unix,lwt,mirage}
  - the RNG is no longer fork() safe, if you use fork in your code, be sure to
    reseed the RNG in the child process
  - on Unix and Lwt, the used RNG is Fortuna, seeded by getrandom(),
    rdrand/rdseed, and whirlwind
  - Mirage_crypto_rng_lwt does entropy collection for Lwt applications
  - entropy collection is now similar to FreeBSD:
    - rdrand/rdseed is executed in a separate task (by default every second)
    - on Unix, getrandom() is executed in another separate task (by default
      every 10 seconds)
    - on every enter of the Lwt event loop, some bits of rdtsc are collected
      (rdrand/rdseed is not on each even loop enter anymore)
  - Fortuna only uses entropy pools if the given period is exhausted (defaults
    to 1s), and the pool size exceeds 64 bytes
  - The unseeded generator exception prints instructions how to seed the RNG
* 32 bit support (for ghash), requested by @TImada in mirage/mirage-crypto#60, mirage/mirage-crypto#65 @hannesm
* use Eqaf_cstruct.find_uint8 instead of Cs.ct_find_uint8 (mirage/mirage-crypto#52 @dinosaure)
* add (:standard) in C flags to allow cross-compilation mirage/mirage-crypto#47 @samoht
* Mirage_crypto.Uncommon: remove several functions (Cs.create, Option),
  requires OCaml 4.08 (mirage/mirage-crypto#49 mirage/mirage-crypto#51 @hannesm)
* remove ocplib-endian dependency, use Bytes directly (since 4.07) mirage/mirage-crypto#51 @hannesm
* bitfn.h cleanup (mirage/mirage-crypto#56 mirage/mirage-crypto#58 @hannesm)
* fix build if opam is not available (mirage/mirage-crypto#66 @hannesm)
* update test.yml GitHub actions (mirage/mirage-crypto#44 mirage/mirage-crypto#57 @imbsky)
* Travis CI for arm64 (mirage/mirage-crypto#55 @hannesm)
@hannesm hannesm mentioned this pull request Nov 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants