`lto = "fat"` causes doctest to generate invalid code for Apple M1 (and potentially x86) #116941
Have you run the tests with Miri to make sure that no UB is happening?
Hello @Nilstrieb, not yet. I guess it's the occasion to use it; I'll report back.
Should I try it on the example reproducing the doctest, or is there a way to run it on doctests?
I forgot whether Miri runs doctests. But you can just extract the doctest into a binary and run that.
Hello @Nilstrieb, no undefined behavior found by Miri. It seems rayon does not terminate its threads, so Miri detects that, but the doctest extracted as an example is UB-free (I adapted our crypto parameters to be very small but still trigger the crashing doctest). Cheers
Adding I-unsound, as this looks like a miscompilation.
Agreed. For now it's a bit hard, as we used to hit this somewhat randomly until we identified that it seemed linked to doctests + LTO. We will try to find a way to minify it; in the meantime, the assembly in the original report should already contain the miscompiled code (though I understand it's likely way too big for a reasonable analysis). Cheers
FWIW, a pattern I've seen a few times with doctests + fat LTO is that doctests are unoptimized, so you get optimized IR from LTO fed into the backend with optimizations disabled. This can expose bugs in FastISel that we don't otherwise hit. The key to reproducing it outside doctests may be to use an lto=fat opt-level=0 configuration.
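The suggested configuration can be expressed as a Cargo profile; a sketch, assuming the standard `Cargo.toml` profile syntax (the values come from the comment above):

```toml
# Fat LTO combined with a fully unoptimized backend, to mimic
# optimized LTO IR being fed into codegen with optimizations disabled.
[profile.release]
lto = "fat"
opt-level = 0
```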
Thanks, will give it a shot. I'm surprised doctests don't honor the optimization/configuration from the Cargo profile though 🤔 any reason for this? If the various parts are not supposed to have mismatched opt levels, then yes, I can see how some hypotheses would break at the junction of the optimized and unoptimized code.
For the doctest built as an example binary with the lto=fat opt-level=0 configuration, I cannot get it to crash.
Setting opt-level=0 in release allows the doctest to pass as well, so mixing LTO with a non-zero opt level seems to trigger the issue for the doctest. Trying to minimize the repro case.
Setting opt-level=1 in release with lto=fat crashes the doctest.
Hello again, some news: I have a state where I'm able to trigger the crash at will with a single line being uncommented. The repro branch is here: https://github.com/zama-ai/tfhe-rs/tree/am/doctest_bug Miri is still OK on the doctest taken as an example. The interesting part is in here; the line below nok_function_call.S has both the call to the function and the inlined version:

```rust
use tfhe::core_crypto::prelude::*;
use tfhe::shortint::engine::ShortintEngine;
use tfhe::shortint::parameters::PARAM_MESSAGE_2_CARRY_2_KS_PBS;
use tfhe::shortint::server_key::{MaxDegree, ShortintCompressedBootstrappingKey};
use tfhe::shortint::{
    ClientKey as ShortintClientKey, CompressedServerKey as ShortintCompressedServerKey,
};

fn main() {
    {
        let cks = ShortintClientKey::new(PARAM_MESSAGE_2_CARRY_2_KS_PBS);
        // let compressed_sks = ShortintCompressedServerKey::new(&cks);
        let mut engine = ShortintEngine::new();
        // let compressed_sks = engine.new_compressed_server_key(&cks).unwrap();

        // Plaintext Max Value
        let max_value = cks.parameters.message_modulus().0 * cks.parameters.carry_modulus().0 - 1;

        // The maximum number of operations before we need to clean the carry buffer
        let max_degree = MaxDegree(max_value);

        // UNCOMMENT TO PRODUCE THE MISCOMPILE
        let compressed_sks = engine.new_compressed_server_key_with_max_degree(&cks, max_degree);

        // THIS BELOW IS THE SAME AS THE ABOVE FUNCTION INLINED
        let compressed_sks = {
            let bootstrapping_key = match cks.parameters.pbs_parameters().unwrap() {
                tfhe::shortint::PBSParameters::PBS(pbs_params) => {
                    let bootstrapping_key = allocate_and_generate_new_seeded_lwe_bootstrap_key(
                        &cks.small_lwe_secret_key,
                        &cks.glwe_secret_key,
                        pbs_params.pbs_base_log,
                        pbs_params.pbs_level,
                        pbs_params.glwe_modular_std_dev,
                        pbs_params.ciphertext_modulus,
                        &mut engine.seeder,
                    );
                    ShortintCompressedBootstrappingKey::Classic(bootstrapping_key)
                }
                tfhe::shortint::PBSParameters::MultiBitPBS(pbs_params) => {
                    let bootstrapping_key =
                        par_allocate_and_generate_new_seeded_lwe_multi_bit_bootstrap_key(
                            &cks.small_lwe_secret_key,
                            &cks.glwe_secret_key,
                            pbs_params.pbs_base_log,
                            pbs_params.pbs_level,
                            pbs_params.glwe_modular_std_dev,
                            pbs_params.grouping_factor,
                            pbs_params.ciphertext_modulus,
                            &mut engine.seeder,
                        );
                    ShortintCompressedBootstrappingKey::MultiBit {
                        seeded_bsk: bootstrapping_key,
                        deterministic_execution: pbs_params.deterministic_execution,
                    }
                }
            };

            // Creation of the key switching key
            let key_switching_key = allocate_and_generate_new_seeded_lwe_keyswitch_key(
                &cks.large_lwe_secret_key,
                &cks.small_lwe_secret_key,
                cks.parameters.ks_base_log(),
                cks.parameters.ks_level(),
                cks.parameters.lwe_modular_std_dev(),
                cks.parameters.ciphertext_modulus(),
                &mut engine.seeder,
            );

            // Pack the keys in the server key set:
            ShortintCompressedServerKey {
                key_switching_key,
                bootstrapping_key,
                message_modulus: cks.parameters.message_modulus(),
                carry_modulus: cks.parameters.carry_modulus(),
                max_degree,
                ciphertext_modulus: cks.parameters.ciphertext_modulus(),
                pbs_order: cks.parameters.encryption_key_choice().into(),
            }
        };
    }
    println!("MIRI run done");
}
```
Still feels like the wrong part of the stack gets read in the function, where a value should be 0 and something else is read from who knows where. Edit: will try to minify some more.
With some logging, from the inlined non-bugged version:

```
allocate_and_generate_new_seeded_lwe_bootstrap_key=CiphertextModulus(2^64)
generate_seeded_lwe_bootstrap_key=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
MIRI run done
```

Edit: the "MIRI run done" was a check for the Miri example; I share the code between both. Both runs here are normal runs.

From the function call bugged version:

```
new_compressed_server_key_with_max_degree=CiphertextModulus(2^64)
allocate_and_generate_new_seeded_lwe_bootstrap_key=CiphertextModulus(2^64)
generate_seeded_lwe_bootstrap_key=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(79395112631681133340136570880)
```

Looks like the first iteration of a loop is fine, but then the data gets corrupted. Here the native modulus 2^64 is encoded as 0 in a u128, so the 2^64 values are valid, but 79395112631681133340136570880 is random data and seems to change from run to run.
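To make the encoding concrete, here is a small sketch; the `Modulus` enum and `decode` function below are hypothetical illustrations, not tfhe-rs API. The only grounded fact is that a raw value of 0 in the u128 denotes the native 2^64 modulus:

```rust
// Hypothetical illustration of the modulus encoding described above:
// 2^64 does not fit in a u64, so the native modulus is stored as 0
// in a u128 field; any other raw value is a custom modulus.
#[derive(Debug, PartialEq)]
enum Modulus {
    Native,       // represents 2^64, encoded as raw 0
    Custom(u128), // any non-zero raw value
}

fn decode(raw: u128) -> Modulus {
    if raw == 0 {
        Modulus::Native
    } else {
        Modulus::Custom(raw)
    }
}

fn main() {
    // The valid log lines correspond to a raw 0:
    assert_eq!(decode(0), Modulus::Native);
    // The corrupted read decodes to a garbage custom modulus, which
    // downstream assertions on the modulus value then reject:
    let garbage = 79395112631681133340136570880u128;
    assert_eq!(decode(garbage), Modulus::Custom(garbage));
    println!("ok");
}
```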
Looks similar in spirit to #112548 with the sensitivity to opt levels, though I'm not familiar with the MIR opt level.
Hello, I minified the example down to an iterator call returning corrupted data when a specific feature (on which the code does not depend) is enabled. Disabling said feature makes the code run properly, and changing the iterator to an immutable iter does not cause the issue (https://github.com/zama-ai/tfhe-rs/blob/73bf8af9ec7eca7b36f016b2bbfeccfd3b1ac7d2/tfhe/src/lib.rs#L103).

The iterator in question is a wrapping lending iterator defined in https://github.com/zama-ai/tfhe-rs/blob/73bf8af9ec7eca7b36f016b2bbfeccfd3b1ac7d2/tfhe/src/core_crypto/commons/traits/contiguous_entity_container.rs#L326; the immutable variant is here: https://github.com/zama-ai/tfhe-rs/blob/73bf8af9ec7eca7b36f016b2bbfeccfd3b1ac7d2/tfhe/src/core_crypto/commons/traits/contiguous_entity_container.rs#L127

Available here: https://github.com/zama-ai/tfhe-rs/tree/am/doctest_bug_minify

Run the doctest and crash it:

Run the doctest and it does not crash:

Tried to take the code out to a different repo; it did not repro. Let me know if there is something more I can do, but I can't seem to minify it any more than that at the moment.
I can reproduce the crash on aarch64-linux after also enabling the seeder_unix feature. I'm not sure how to debug, though -- it doesn't seem like rustdoc has any support for that at all? Passing
I do the following (notice RUSTDOCFLAGS):
then
and copy the only executable found in there. Yes, I agree with the O3 thing; I just don't quite get why rustdoc does not use the configuration from the Cargo profile provided on the command line. Edit: I'm guessing it's not an easy problem and there may be a reason for this.
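For reference, a sketch of the kind of invocation involved: the RUSTDOCFLAGS values are the ones quoted in the original report further down, the doctestbins directory name is arbitrary, and the exact feature flags are omitted here:

```shell
# Nightly-only: persist the compiled doctest binaries instead of
# discarding them, so they can be inspected or re-run afterwards.
RUSTDOCFLAGS="-Z unstable-options --persist-doctests doctestbins" \
cargo +nightly test --profile release --doc -p tfhe
# Then copy the executable(s) found under doctestbins/.
```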
Should there be a rustdoc-specific bug report somewhere? I have to say I posted here mainly because I found an old issue that looked similar.
Thanks. I've used this LLVM patch (https://gist.github.com/nikic/7fd69aef8f3bb8401db508e5ff08324d) to identify
Yes, I think that would be a good idea. I think this is probably a cargo bug, as |
Thanks a lot 🙏 and great that you could find the faulty function!
Alright, then you advise opening an issue on https://github.com/rust-lang/cargo to let them know of the opt level not being forwarded?
Upstream issue: llvm/llvm-project#70207
Yeah. It might be intentional, but it seems suspicious to forward |
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-high |
Update to LLVM 17.0.4 Fixes rust-lang#116668. Fixes rust-lang#116941. Fixes rust-lang#116976. r? `@cuviper`
- following merge of 17.0.4 in rust stable, the bug uncovered by LTO on aarch64 has been fixed (rust-lang/rust#116941), so we remove the hard-coded override
- update nightly toolchain to have fixed LLVM as well
- fix lints linked to latest nightly
We have this code in our https://github.com/zama-ai/tfhe-rs project, on commit f1c21888a762ddf9de017ae52dc120c141ec9c02, from tfhe/docs/how_to/compress.md line 44 and beyond:
I expected to see this happen: running the doctest with the following command should work (note that we modify the release profile to have lto = "fat" enabled):

```shell
RUSTFLAGS="-C target-cpu=native" cargo +nightly-2023-10-17 test --profile release --doc --features=aarch64-unix,boolean,shortint,integer,internal-keycache -p tfhe -- test_user_docs::how_to_compress
```
Instead, this happened: the program crashes. Compiling the same code in a separate example with the same cargo configuration results in an executable that works. Turning LTO off also yields a doctest that compiles properly, indicating LTO is at fault, or is part of the problem when combined with doctests.
It has been happening randomly for doctests on a lot of Rust versions, but we could not identify what the issue was. It looks like enabling LTO creates a miscompile where a value that is provably 0 (as it's never modified by the code) is asserted to be != 0, crashing the program; sometimes different things error out, and it looks like the program is reading at the wrong location on the stack. The value being asserted != 0 is in https://github.com/zama-ai/tfhe-rs/blob/f1c21888a762ddf9de017ae52dc120c141ec9c02/tfhe/src/core_crypto/algorithms/ggsw_encryption.rs#L551
Unfortunately, we are not able to minify this issue at the moment, as it's not happening reliably across doctests.
Meta
`rustc --version --verbose`:

Unfortunately, on nightly (used to recover the doctest binaries via RUSTDOCFLAGS="-Z unstable-options --persist-doctests doctestbins") the crash only shows for the parallel version of an encryption algorithm used with rayon (on current stable we can also get the crash with a serial algorithm, but we don't seem to be able to recover the doctest binary).
doctest_miscompile.zip

The archive contains the `objdump --disassemble` output for the code compiled as an example (running fine) and for the code compiled as a doctest exhibiting the miscompilation. If needed I can provide the binaries, but I would understand if nobody wants to run a binary coming from a bug report.

Here is a snippet of a backtrace with two threads erroring on two different issues (while there is no problem having the same code compiled as an example):
Backtrace
We have also seen some flaky doctests on x86_64 and could not narrow down the issue. We have turned off LTO for all of our doctests for now and will monitor how things evolve. The reason for suspecting an issue on x86 as well is that M1 builds have been running with LTO off for months and have never exhibited the flaky doctests we saw on x86_64; though, given that the compiled code in that case is significantly different (intrinsics usage being one factor), we can't yet be sure a similar issue is happening on x86_64.
Cheers