
lto = "fat" causes doctest to generate invalid code for Apple M1 (and potentially x86) #116941

Closed · IceTDrinker opened this issue Oct 19, 2023 · 24 comments · Fixed by #117436 or zama-ai/tfhe-rs#721
Labels: A-LLVM, C-bug, E-needs-mcve, I-unsound, O-AArch64, P-high, T-compiler

IceTDrinker commented Oct 19, 2023

We have this code in our https://github.com/zama-ai/tfhe-rs project, at commit f1c21888a762ddf9de017ae52dc120c141ec9c02, in tfhe/docs/how_to/compress.md from line 44 onward:

use tfhe::prelude::*;
use tfhe::{
    generate_keys, set_server_key, ClientKey, CompressedServerKey, ConfigBuilder, FheUint8,
};

fn main() {
    let config = ConfigBuilder::all_disabled()
        .enable_default_integers()
        .build();

    let cks = ClientKey::generate(config);
    let compressed_sks = CompressedServerKey::new(&cks);

    println!(
        "compressed size  : {}",
        bincode::serialize(&compressed_sks).unwrap().len()
    );

    let sks = compressed_sks.decompress();

    println!(
        "decompressed size: {}",
        bincode::serialize(&sks).unwrap().len()
    );

    set_server_key(sks);

    let clear_a = 12u8;
    let a = FheUint8::try_encrypt(clear_a, &cks).unwrap();

    let c = a + 234u8;
    let decrypted: u8 = c.decrypt(&cks);
    assert_eq!(decrypted, clear_a.wrapping_add(234));
}

I expected to see this happen: running the doctest with the following command should work (note that we modify the release profile to have lto = "fat" enabled):

RUSTFLAGS="-C target-cpu=native" cargo +nightly-2023-10-17 test --profile release --doc --features=aarch64-unix,boolean,shortint,integer,internal-keycache -p tfhe -- test_user_docs::how_to_compress

Instead, this happened: the program crashes. Compiling the same code as a separate example with the same Cargo configuration produces an executable that works, and turning LTO off also yields a doctest that runs properly, indicating LTO is at fault, or at least part of the problem, when combined with doctests.

It has been happening randomly for doctests across a lot of Rust versions, but we could not identify what the issue was. It looks like enabling LTO creates a miscompilation where a value that is provably 0 (as it is never modified by the code) is asserted to be != 0, which crashes the program; sometimes different things error out, and it looks like the program is reading from the wrong location on the stack. The value being asserted != 0 is in https://github.com/zama-ai/tfhe-rs/blob/f1c21888a762ddf9de017ae52dc120c141ec9c02/tfhe/src/core_crypto/algorithms/ggsw_encryption.rs#L551

Unfortunately we are not able to minimize this issue at the moment, as it does not happen reliably across doctests.

Meta

rustc --version --verbose:

rustc 1.75.0-nightly (49691b1f7 2023-10-16)
binary: rustc
commit-hash: 49691b1f70d71dd7b8349c332b7f277ee527bf08
commit-date: 2023-10-16
host: aarch64-apple-darwin
release: 1.75.0-nightly
LLVM version: 17.0.2

Unfortunately, nightly (which we used to recover the doctest binaries via RUSTDOCFLAGS="-Z unstable-options --persist-doctests doctestbins") only exhibits the crash for the parallel version of an encryption algorithm used with rayon. On current stable we can also get the crash with a serial algorithm, but there we don't seem to be able to recover the doctest binary.

doctest_miscompile.zip
The archive contains the objdump --disassemble output for the code compiled as an example (running fine) and for the code compiled as a doctest exhibiting the miscompilation. If needed I can provide the binaries, but I would understand if nobody wanted to run a binary coming from a bug report.

objdump --version
Apple LLVM version 14.0.3 (clang-1403.0.22.14.1)
  Optimized build.
  Default target: arm64-apple-darwin22.5.0
  Host CPU: apple-m1l

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    armeb      - ARM (big endian)
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64

Here is a snippet of a backtrace with two threads erroring with two different issues (there is no problem when the same code is compiled as an example).

Backtrace

stack backtrace:
   0:        0x102712f6c - thread '<<unnamed>std' panicked at tfhe/src/core_crypto/algorithms/ggsw_encryption.rs:551:::5sys_common:
::assertion failed: ciphertext_modulus.is_compatible_with_native_modulus()backtrace
::_print::DisplayBacktrace as core::fmt::Display>::fmt::h06ea57ce7b13512d
   1:        0x10268b4f8 - core::fmt::write::h4d15d254ca20c331
   2:        0x1026c6a68 - std::io::Write::write_fmt::hfdc8b2852a9a03fa
   3:        0x102715ea0 - std::sys_common::backtrace::print::h139bbaa51f48014c
   4:        0x102715a08 - std::panicking::default_hook::{{closure}}::hbbb7d85a61092397
   5:        0x1027157cc - std::panicking::default_hook::hb0db088803baef11
   6:        0x102717234 - std::panicking::rust_panic_with_hook::h78dc274574606137
   7:        0x102716da8 - std::panicking::begin_panic_handler::{{closure}}::h2905be29dbe9281c
   8:        0x102716c88 - std::sys_common::backtrace::__rust_end_short_backtrace::h2a15f4fd2d64df91
   9:        0x102716c7c - _rust_begin_unwind
  10:        0x1027fe624 - core::panicking::panic_fmt::hd8e61ff6f38230f9
  11:        0x1027fe7b0 - core::panicking::panic::h4a945e52b5fb1050
  12:        0x1027990bc - tfhe::core_crypto::algorithms::glwe_encryption::encrypt_seeded_glwe_ciphertext_assign_with_existing_generator::hb32b93df2aa13c6e
  13:        0x1027d8d44 - <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter::h6b9d6bce496a26b2
  14:        0x10277099c - rayon::iter::plumbing::Producer::fold_with::h3252c105ae5580f0
  15:        0x10278c92c - rayon::iter::plumbing::bridge_producer_consumer::helper::h516df06807eeed76
  16:        0x10271ff70 - rayon_core::join::join_context::{{closure}}::h7ecf44f403b2e94c
  17:        0x102729d00 - rayon_core::registry::in_worker::hb2d005d9f62ec9b8
  18:        0x10278c918 - rayon::iter::plumbing::bridge_producer_consumer::helper::h516df06807eeed76
  19:        0x102792d0c - <<rayon::iter::map::Map<I,F> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB,F> as rayon::iter::plumbing::ProducerCallback<T>>::callback::h282ea6fb42ca6c2b
  20:        0x10276aaa0 - <<rayon::iter::zip::Zip<A,B> as rayon::iter::IndexedParallelIterator>::with_producer::CallbackB<CB,A> as rayon::iter::plumbing::ProducerCallback<ITEM>>::callback::h6c6ab19b4791d17e
  21:        0x1027dcc88 - <<rayon::iter::enumerate::Enumerate<I> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB> as rayon::iter::plumbing::ProducerCallback<I>>::callback::h62504345ff3d393a
  22:        0x10278f38c - rayon::iter::plumbing::bridge::h142cac5b932df279
  23:        0x1027de84c - rayon::iter::plumbing::Producer::fold_with::hda6c429fb67861a6
  24:        0x10278b204 - rayon::iter::plumbing::bridge_producer_consumer::helper::ha97da0be53d3520b
  25:        0x1027930fc - <<rayon::iter::map::Map<I,F> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB,F> as rayon::iter::plumbing::ProducerCallback<T>>::callback::h5caece096ea77aa2
  26:        0x102768cdc - <<rayon::iter::zip::Zip<A,B> as rayon::iter::IndexedParallelIterator>::with_producer::CallbackA<CB,B> as rayon::iter::plumbing::ProducerCallback<ITEM>>::callback::h9c59859a5ada9da8
  27:        0x102790548 - rayon::iter::plumbing::bridge::h691ef483cd06a966
  28:        0x1027d896c - tfhe::core_crypto::algorithms::ggsw_encryption::par_encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator::h1092854bcdddc1c5
  29:        0x1027d8540 - <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter::h58460779da245a1d
  30:        0x102771604 - rayon::iter::plumbing::Producer::fold_with::h5c2dab692eefc651
  31:        0x10278a424 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  32:        0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
  33:        0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  34:        0x10271ec34 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  35:        0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  36:        0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  37:        0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
  38:        0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  39:        0x10280004c - rayon_core::join::join_recover_from_panic::hac430d1fb14e684b
  40:        0x10271eb10 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  41:        0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  42:        0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  43:        0x10271eac8 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  44:        0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  45:        0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  46:        0x1027306d4 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  47:        0x102750400 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h5752c5eaefb098bd
  48:        0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  49:        0x1026a9300 - rayon_core::registry::ThreadBuilder::run::h03f0186f2f91b865
  50:        0x1026b1ee4 - std::sys_common::backtrace::__rust_begin_short_backtrace::hf857650a9dcd5e44
  51:        0x1026ac8c8 - core::ops::function::FnOnce::call_once{{vtable.shim}}::heab0ff5ef27f89d0
  52:        0x1027183c4 - std::sys::unix::thread::Thread::new::thread_start::h2ab8753089ede7d0
  53:        0x19832bfa8 - __pthread_joiner_wake
stack backtrace:
   0:        0x102712f6c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h06ea57ce7b13512d
   1:        0x10268b4f8 - core::fmt::write::h4d15d254ca20c331
   2:        0x1026c6a68 - std::io::Write::write_fmt::hfdc8b2852a9a03fa
   3:        0x102715ea0 - std::sys_common::backtrace::print::h139bbaa51f48014c
   4:        0x102715a08 - std::panicking::default_hook::{{closure}}::hbbb7d85a61092397
   5:        0x1027157cc - std::panicking::default_hook::hb0db088803baef11
   6:        0x102717234 - std::panicking::rust_panic_with_hook::h78dc274574606137
   7:        0x102716da8 - std::panicking::begin_panic_handler::{{closure}}::h2905be29dbe9281c
   8:        0x102716c88 - std::sys_common::backtrace::__rust_end_short_backtrace::h2a15f4fd2d64df91
   9:        0x102716c7c - _rust_begin_unwind
  10:  thread ' <unnamed> ' panicked at  /rustc/49691b1f70d71dd7b8349c332b7f277ee527bf08/library/core/src/num/mod.rs : 1166 :0x51027fe624:
 - attempt to calculate the remainder with a divisor of zerocore
::panicking::panic_fmt::hd8e61ff6f38230f9
  11:        0x1027fe7b0 - core::panicking::panic::h4a945e52b5fb1050
  12:        0x1027990bc - tfhe::core_crypto::algorithms::glwe_encryption::encrypt_seeded_glwe_ciphertext_assign_with_existing_generator::hb32b93df2aa13c6e
  13:        0x1027d8d44 - <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter::h6b9d6bce496a26b2
  14:        0x10277099c - rayon::iter::plumbing::Producer::fold_with::h3252c105ae5580f0
  15:        0x10278c92c - rayon::iter::plumbing::bridge_producer_consumer::helper::h516df06807eeed76
  16:        0x102756c50 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::hb4b2cce923b187bc
  17:        0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  18:        0x10280004c - rayon_core::join::join_recover_from_panic::hac430d1fb14e684b
  19:        0x10271eb10 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  20:        0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  21:        0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  22:        0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
  23:        0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  24:        0x10280004c - rayon_core::join::join_recover_from_panic::hac430d1fb14e684b
  25:        0x10271eb10 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  26:        0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  27:        0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  28:        0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
  29:        0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  30:        0x1026a9300 - rayon_core::registry::ThreadBuilder::run::h03f0186f2f91b865
  31:        0x1026b1ee4 - std::sys_common::backtrace::__rust_begin_short_backtrace::hf857650a9dcd5e44
  32:        0x1026ac8c8 - core::ops::function::FnOnce::call_once{{vtable.shim}}::heab0ff5ef27f89d0
  33:        0x1027183c4 - std::sys::unix::thread::Thread::new::thread_start::h2ab8753089ede7d0
  34:        0x19832bfa8 - __pthread_joiner_wake

We have also seen some flaky doctests on x86_64 but could not narrow down the issue; we have turned off LTO for all of our doctests for now and will monitor how things evolve. The reason for suspecting an issue on x86 as well is that M1 builds have been running with LTO off for months and have never exhibited the flaky doctests we saw on x86_64. Given that the compiled code in that case is significantly different (intrinsics usage being one factor), we can't yet be sure a similar issue is happening on x86_64.

Cheers

@IceTDrinker IceTDrinker added the C-bug Category: This is a bug. label Oct 19, 2023
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 19, 2023
@Noratrieb (Member) commented:

Have you run the tests with Miri to make sure that no UB is happening?

@IceTDrinker (Author) commented:

Hello @Nilstrieb, not yet. I guess it's the occasion to use it; I'll report back.

@IceTDrinker (Author) commented:

Should I try it on the example reproducing the doctest, or is there a way to run it on doctests?

@Noratrieb (Member) commented:

I forgot whether Miri runs doctests. But you can just extract the doctest into a binary and run that.


IceTDrinker commented Oct 20, 2023

Hello @Nilstrieb, no undefined behavior found by Miri. It seems rayon does not terminate its threads, so Miri detects that, but the doctest extracted as an example is UB-free (I adapted our crypto parameters to be very small while still triggering the crashing doctest).

Cheers

@Noratrieb (Member) commented:

Adding I-unsound, as this looks like a miscompilation.
The next step would be to minimize the issue and create a smaller reproducer.

@Noratrieb Noratrieb added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-AArch64 Armv8-A or later processors in AArch64 mode and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Oct 20, 2023
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 20, 2023
@IceTDrinker (Author) commented:

Agreed. For now it's a bit hard, as we used to hit this somewhat randomly until we identified that it seemed linked to doctests + LTO. We will try to find a way to minimize it; in the meantime, the assembly in the original report should already contain the miscompiled code (though I understand it's likely far too big for a reasonable analysis).

Cheers


nikic commented Oct 20, 2023

FWIW, a pattern I've seen a few times with doctests + fat lto is that doctests are unoptimized, so you get optimized IR from LTO fed into the backend with disabled optimization. This can expose bugs in FastISel that we don't otherwise hit.

The key to reproducing it outside doc tests may be to use an lto=fat opt-level=0 configuration.
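For illustration, the configuration nikic describes corresponds to a release profile like the following in Cargo.toml (a sketch; the profile name and the rest of the project's configuration are assumed from the report):

[profile.release]
lto = "fat"
opt-level = 0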

@IceTDrinker (Author) commented:

FWIW, a pattern I've seen a few times with doctests + fat lto is that doctests are unoptimized, so you get optimized IR from LTO fed into the backend with disabled optimization. This can expose bugs in FastISel that we don't otherwise hit.

The key to reproducing it outside doc tests may be to use an lto=fat opt-level=0 configuration.

Thanks, will give it a shot. I'm surprised doctests don't honor the optimization configuration from the Cargo profile though 🤔 any reason for this? If the various parts are not supposed to have mismatched opt levels, then yes, I can see how some assumption would break at the junction of the optimized and unoptimized code.


IceTDrinker commented Oct 20, 2023

With the doctest compiled as an example binary under the lto=fat opt-level=0 configuration, I cannot get it to crash.

@apiraino apiraino added the E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example label Oct 23, 2023
@IceTDrinker (Author) commented:

Setting opt-level=0 in the release profile allows the doctest to pass as well, so mixing LTO with a non-zero opt level seems to be what triggers the issue for the doctest. Trying to minimize the repro case.

@IceTDrinker (Author) commented:

Setting opt-level=1 in release with lto=fat crashes the doctest.


IceTDrinker commented Oct 23, 2023

Hello again, some news: I have a state where I'm able to trigger the crash at will by uncommenting a single line.

The repro branch is here: https://github.com/zama-ai/tfhe-rs/tree/am/doctest_bug

Miri is still OK with the doctest extracted as an example.

The interesting part is in here: the line below // UNCOMMENT TO PRODUCE THE MISCOMPILE will cause the doctest to crash when uncommented, i.e. that function call has the issue. When it is commented out and the function is inlined instead (i.e. the rest of the doctest below), execution goes well. I will attach the dump of the assembly for both.

doctest_minified.zip

nok_function_call.S has both the call to the function and the inlined version
ok_function_inlined.S only has the inlined version

use tfhe::core_crypto::prelude::*;
use tfhe::shortint::engine::ShortintEngine;
use tfhe::shortint::parameters::PARAM_MESSAGE_2_CARRY_2_KS_PBS;
use tfhe::shortint::server_key::{MaxDegree, ShortintCompressedBootstrappingKey};
use tfhe::shortint::{
    ClientKey as ShortintClientKey, CompressedServerKey as ShortintCompressedServerKey,
};

fn main() {
    {
        let cks = ShortintClientKey::new(PARAM_MESSAGE_2_CARRY_2_KS_PBS);
        // let compressed_sks = ShortintCompressedServerKey::new(&cks);
        let mut engine = ShortintEngine::new();
        // let compressed_sks = engine.new_compressed_server_key(&cks).unwrap();

        // Plaintext Max Value
        let max_value = cks.parameters.message_modulus().0 * cks.parameters.carry_modulus().0 - 1;

        // The maximum number of operations before we need to clean the carry buffer
        let max_degree = MaxDegree(max_value);
        // UNCOMMENT TO PRODUCE THE MISCOMPILE
        let compressed_sks = engine.new_compressed_server_key_with_max_degree(&cks, max_degree);

        // THIS BELOW IS THE SAME AS THE ABOVE FUNCTION INLINED
        let compressed_sks = {
            let bootstrapping_key = match cks.parameters.pbs_parameters().unwrap() {
                tfhe::shortint::PBSParameters::PBS(pbs_params) => {
                    let bootstrapping_key = allocate_and_generate_new_seeded_lwe_bootstrap_key(
                        &cks.small_lwe_secret_key,
                        &cks.glwe_secret_key,
                        pbs_params.pbs_base_log,
                        pbs_params.pbs_level,
                        pbs_params.glwe_modular_std_dev,
                        pbs_params.ciphertext_modulus,
                        &mut engine.seeder,
                    );

                    ShortintCompressedBootstrappingKey::Classic(bootstrapping_key)
                }
                tfhe::shortint::PBSParameters::MultiBitPBS(pbs_params) => {
                    let bootstrapping_key =
                        par_allocate_and_generate_new_seeded_lwe_multi_bit_bootstrap_key(
                            &cks.small_lwe_secret_key,
                            &cks.glwe_secret_key,
                            pbs_params.pbs_base_log,
                            pbs_params.pbs_level,
                            pbs_params.glwe_modular_std_dev,
                            pbs_params.grouping_factor,
                            pbs_params.ciphertext_modulus,
                            &mut engine.seeder,
                        );

                    ShortintCompressedBootstrappingKey::MultiBit {
                        seeded_bsk: bootstrapping_key,
                        deterministic_execution: pbs_params.deterministic_execution,
                    }
                }
            };

            // Creation of the key switching key
            let key_switching_key = allocate_and_generate_new_seeded_lwe_keyswitch_key(
                &cks.large_lwe_secret_key,
                &cks.small_lwe_secret_key,
                cks.parameters.ks_base_log(),
                cks.parameters.ks_level(),
                cks.parameters.lwe_modular_std_dev(),
                cks.parameters.ciphertext_modulus(),
                &mut engine.seeder,
            );

            // Pack the keys in the server key set:
            ShortintCompressedServerKey {
                key_switching_key,
                bootstrapping_key,
                message_modulus: cks.parameters.message_modulus(),
                carry_modulus: cks.parameters.carry_modulus(),
                max_degree,
                ciphertext_modulus: cks.parameters.ciphertext_modulus(),
                pbs_order: cks.parameters.encryption_key_choice().into(),
            }
        };
    }

    println!("MIRI run done");
}

Run Miri:

MIRIFLAGS="-Zmiri-disable-isolation" RUSTFLAGS="-C target-cpu=native" cargo +nightly-2023-10-17 miri run --release --example debug_minify --features=seeder_unix,shortint -p tfhe

Run the doctest:

RUST_BACKTRACE=full RUSTFLAGS="-C target-cpu=native" RUSTDOCFLAGS="-Z unstable-options --persist-doctests doctestbins" cargo +nightly-2023-10-17 test --profile release --doc --features=seeder_unix,shortint -p tfhe -- test_user_docs::how_to_compress


IceTDrinker commented Oct 23, 2023

It still feels like the wrong part of the stack gets read in the function: a value should be 0, and something else is read from who knows where.

Edit: will try to minimize some more.


IceTDrinker commented Oct 23, 2023

With some logging:

From the inlined, non-bugged version:

allocate_and_generate_new_seeded_lwe_bootstrap_key=CiphertextModulus(2^64)
generate_seeded_lwe_bootstrap_key=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
MIRI run done

Edit: the "MIRI run done" print was a check for the Miri example; I share the code between both, and both runs here are normal runs.

From the bugged version with the function call:

new_compressed_server_key_with_max_degree=CiphertextModulus(2^64)
allocate_and_generate_new_seeded_lwe_bootstrap_key=CiphertextModulus(2^64)
generate_seeded_lwe_bootstrap_key=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(2^64)
encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator=CiphertextModulus(79395112631681133340136570880)

Looks like the first iteration of a loop is fine, but then the data gets corrupted.

Here the native modulus 2^64 is encoded as 0 in a u128, so the 2^64 values are valid, but 79395112631681133340136570880 is random data and seems to change from run to run.
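To make that encoding concrete, here is a minimal Rust sketch of the convention described above; the type and method names mirror the report, but the struct layout and the exact compatibility rule are illustrative assumptions, not the real tfhe-rs implementation:

// Hypothetical mirror of the tfhe-rs type: 0 in the u128 encodes the native
// modulus 2^64, which cannot be represented in a u64.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CiphertextModulus(u128);

impl CiphertextModulus {
    fn is_compatible_with_native_modulus(self) -> bool {
        // Illustrative rule: 0 is the native 2^64 modulus itself; otherwise
        // require a power of two.
        self.0 == 0 || self.0.is_power_of_two()
    }
}

fn main() {
    // The valid encoding seen in the healthy log lines:
    assert!(CiphertextModulus(0).is_compatible_with_native_modulus());
    // The corrupted value from the bugged run fails the check, matching the
    // observed assertion panic:
    assert!(!CiphertextModulus(79395112631681133340136570880).is_compatible_with_native_modulus());
}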

@IceTDrinker (Author) commented:

Looks similar in spirit to #112548 with the sensitivity to opt levels, though I'm not familiar with the MIR opt level.

@IceTDrinker (Author) commented:

Hello, I minimized the example to an iterator call returning corrupted data when a specific feature (on which the code does not depend) is enabled. Disabling said feature makes the code run properly, and changing the iterator to an immutable iter does not trigger the issue (https://github.com/zama-ai/tfhe-rs/blob/73bf8af9ec7eca7b36f016b2bbfeccfd3b1ac7d2/tfhe/src/lib.rs#L103).

The iterator in question is a wrapping lending iterator defined in https://github.com/zama-ai/tfhe-rs/blob/73bf8af9ec7eca7b36f016b2bbfeccfd3b1ac7d2/tfhe/src/core_crypto/commons/traits/contiguous_entity_container.rs#L326; the immutable variant is here: https://github.com/zama-ai/tfhe-rs/blob/73bf8af9ec7eca7b36f016b2bbfeccfd3b1ac7d2/tfhe/src/core_crypto/commons/traits/contiguous_entity_container.rs#L127
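For readers unfamiliar with the term, here is a minimal, self-contained sketch of the lending-iterator pattern over mutable sub-slices (hypothetical names and a deliberately simplified shape; the real tfhe-rs trait is more involved):

// A lending iterator hands out borrows tied to `&mut self`, so only one
// mutable chunk can be alive at a time; this cannot be expressed with
// std's Iterator trait, hence the inherent `next` method.
struct ChunksMutLender<'data, T> {
    data: &'data mut [T],
    chunk_size: usize,
    pos: usize,
}

impl<'data, T> ChunksMutLender<'data, T> {
    fn next(&mut self) -> Option<&mut [T]> {
        if self.pos + self.chunk_size > self.data.len() {
            return None;
        }
        let start = self.pos;
        self.pos += self.chunk_size;
        Some(&mut self.data[start..self.pos])
    }
}

fn main() {
    let mut data = [1u64, 2, 3, 4];
    let mut lender = ChunksMutLender { data: &mut data, chunk_size: 2, pos: 0 };
    while let Some(chunk) = lender.next() {
        for x in chunk.iter_mut() {
            *x *= 10;
        }
    }
    assert_eq!(data, [10, 20, 30, 40]);
}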

The repro is available here: https://github.com/zama-ai/tfhe-rs/tree/am/doctest_bug_minify

Run the doctest and crash it:

RUST_BACKTRACE=full RUSTFLAGS="-C target-cpu=native" cargo +nightly-2023-10-17 test --profile release --doc \
                --features=shortint -p tfhe \
                -- test_user_docs::how_to_compress

Run the doctest without a crash:

RUST_BACKTRACE=full RUSTFLAGS="-C target-cpu=native" cargo +nightly-2023-10-17 test --profile release --doc \
                -p tfhe \
                -- test_user_docs::how_to_compress

I tried taking the code out to a different repo; it did not repro.

Let me know if there is more I can do, but I can't seem to minimize it any further at the moment.


nikic commented Oct 24, 2023

I can reproduce the crash on aarch64-linux after also enabling the seeder_unix feature.

I'm not sure how to debug though -- it doesn't seem like rustdoc has any support for that at all? -vv does nothing and neither does -C save-temps. How do you even get the executable it generates?

Passing -Cllvm-args=-global-isel=0 to rustdoc does fix the issue, so this looks like another "passing O3 IR to O0 backend" issue.


IceTDrinker commented Oct 24, 2023

I do the following (notice RUSTDOCFLAGS):

RUST_BACKTRACE=full RUSTFLAGS="-C target-cpu=native" RUSTDOCFLAGS="-Z unstable-options --persist-doctests doctestbins" cargo +nightly-2023-10-17 test --profile release --doc \
                --features=shortint -p tfhe \
                -- test_user_docs::how_to_compress

then

find doctestbins -type f -name '*out'

and copy the only executable found in there

Yes, I agree about the O3 thing; I just don't quite get why rustdoc does not use the configuration from the Cargo profile provided on the command line.

Edit: I'm guessing it's not an easy problem and there may be a reason for this.

@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 24, 2023
@IceTDrinker (Author) commented:

Should there be a rustdoc-specific bug report somewhere? I have to say I posted here mainly because I found an old issue that looked similar.


nikic commented Oct 25, 2023

Thanks, using --persist-doctests worked!

I've used this LLVM patch (https://gist.github.com/nikic/7fd69aef8f3bb8401db508e5ff08324d) to identify _ZN4core3ops8function6FnOnce9call_once17h881b5e27c390a63eE.llvm.424339700988180217 as the miscompiled function. This is the extracted IR: https://gist.github.com/nikic/ad91e65c3332717d2e0855b4bf81734f

Should there be a rustdoc-specific bug report somewhere? I have to say I posted here mainly because I found an old issue that looked similar.

Yes, I think that would be a good idea. I think this is probably a cargo bug, as -C opt-level doesn't get passed to rustdoc.

@nikic nikic self-assigned this Oct 25, 2023
@IceTDrinker (Author) commented:

Thanks a lot 🙏 and great that you could find the faulty function!

Should there be a rustdoc-specific bug report somewhere? I have to say I posted here mainly because I found an old issue that looked similar.

Yes, I think that would be a good idea. I think this is probably a cargo bug, as -C opt-level doesn't get passed to rustdoc.

Alright, then you advise opening an issue on https://github.com/rust-lang/cargo to let them know about the opt level not being forwarded?


nikic commented Oct 25, 2023

Upstream issue: llvm/llvm-project#70207

Alright, then you advise opening an issue on https://github.com/rust-lang/cargo to let them know about the opt level not being forwarded?

Yeah. It might be intentional, but it seems suspicious to forward -C lto=fat but not -C opt-level=3.
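For reference, an untested workaround sketch along those lines: since cargo does forward RUSTDOCFLAGS, the opt level can be passed to rustdoc explicitly so the doctest build matches the profile:

RUSTDOCFLAGS="-C opt-level=3" RUSTFLAGS="-C target-cpu=native" cargo +nightly-2023-10-17 test --profile release --doc --features=shortint -p tfhe -- test_user_docs::how_to_compress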

@apiraino (Contributor) commented:

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Oct 26, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 1, 2023
@bors bors closed this as completed in d1611e3 Nov 1, 2023
3tilley pushed a commit to 3tilley/rust that referenced this issue Nov 1, 2023
IceTDrinker added commits to zama-ai/tfhe-rs that referenced this issue (Nov 30 – Dec 1, 2023):
- following merge of 17.0.4 in rust stable the bug uncovered by lto on aarch64 has been fixed rust-lang/rust#116941 so we remove the hard coded override
- update nightly toolchain to have fixed LLVM as well
- fix lints linked to latest nightly