
error: casting &T to &mut T is undefined behavior #1485

Closed

Jipok opened this issue Apr 3, 2024 · 7 comments

Jipok commented Apr 3, 2024

ERROR: Failed building wheel for tokenizers:

Building wheels for collected packages: tokenizers
  Building wheel for tokenizers (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for tokenizers (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [592 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-312
      creating build/lib.linux-x86_64-cpython-312/tokenizers
      copying py_src/tokenizers/__init__.py -> build/lib.linux-x86_64-cpython-312/tokenizers
      creating build/lib.linux-x86_64-cpython-312/tokenizers/models
      copying py_src/tokenizers/models/__init__.py -> build/lib.linux-x86_64-cpython-312/tokenizers/models
      creating build/lib.linux-x86_64-cpython-312/tokenizers/decoders
      copying py_src/tokenizers/decoders/__init__.py -> build/lib.linux-x86_64-cpython-312/tokenizers/decoders
      creating build/lib.linux-x86_64-cpython-312/tokenizers/normalizers
      copying py_src/tokenizers/normalizers/__init__.py -> build/lib.linux-x86_64-cpython-312/tokenizers/normalizers
      creating build/lib.linux-x86_64-cpython-312/tokenizers/pre_tokenizers
      copying py_src/tokenizers/pre_tokenizers/__init__.py -> build/lib.linux-x86_64-cpython-312/tokenizers/pre_tokenizers
      creating build/lib.linux-x86_64-cpython-312/tokenizers/processors
      copying py_src/tokenizers/processors/__init__.py -> build/lib.linux-x86_64-cpython-312/tokenizers/processors
      creating build/lib.linux-x86_64-cpython-312/tokenizers/trainers
      copying py_src/tokenizers/trainers/__init__.py -> build/lib.linux-x86_64-cpython-312/tokenizers/trainers
      creating build/lib.linux-x86_64-cpython-312/tokenizers/implementations
      copying py_src/tokenizers/implementations/__init__.py -> build/lib.linux-x86_64-cpython-312/tokenizers/implementations
      copying py_src/tokenizers/implementations/base_tokenizer.py -> build/lib.linux-x86_64-cpython-312/tokenizers/implementations
      copying py_src/tokenizers/implementations/bert_wordpiece.py -> build/lib.linux-x86_64-cpython-312/tokenizers/implementations
      copying py_src/tokenizers/implementations/byte_level_bpe.py -> build/lib.linux-x86_64-cpython-312/tokenizers/implementations
      copying py_src/tokenizers/implementations/char_level_bpe.py -> build/lib.linux-x86_64-cpython-312/tokenizers/implementations
      copying py_src/tokenizers/implementations/sentencepiece_bpe.py -> build/lib.linux-x86_64-cpython-312/tokenizers/implementations
      copying py_src/tokenizers/implementations/sentencepiece_unigram.py -> build/lib.linux-x86_64-cpython-312/tokenizers/implementations
      creating build/lib.linux-x86_64-cpython-312/tokenizers/tools
      copying py_src/tokenizers/tools/__init__.py -> build/lib.linux-x86_64-cpython-312/tokenizers/tools
      copying py_src/tokenizers/tools/visualizer.py -> build/lib.linux-x86_64-cpython-312/tokenizers/tools
      copying py_src/tokenizers/__init__.pyi -> build/lib.linux-x86_64-cpython-312/tokenizers
      copying py_src/tokenizers/models/__init__.pyi -> build/lib.linux-x86_64-cpython-312/tokenizers/models
      copying py_src/tokenizers/decoders/__init__.pyi -> build/lib.linux-x86_64-cpython-312/tokenizers/decoders
      copying py_src/tokenizers/normalizers/__init__.pyi -> build/lib.linux-x86_64-cpython-312/tokenizers/normalizers
      copying py_src/tokenizers/pre_tokenizers/__init__.pyi -> build/lib.linux-x86_64-cpython-312/tokenizers/pre_tokenizers
      copying py_src/tokenizers/processors/__init__.pyi -> build/lib.linux-x86_64-cpython-312/tokenizers/processors
      copying py_src/tokenizers/trainers/__init__.pyi -> build/lib.linux-x86_64-cpython-312/tokenizers/trainers
      copying py_src/tokenizers/tools/visualizer-styles.css -> build/lib.linux-x86_64-cpython-312/tokenizers/tools
      running build_ext
      running build_rust
          Updating crates.io index
      cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --
         Compiling libc v0.2.153
         Compiling proc-macro2 v1.0.79
         Compiling unicode-ident v1.0.12
         Compiling autocfg v1.2.0
         Compiling pkg-config v0.3.30
         Compiling cfg-if v1.0.0
         Compiling typenum v1.17.0
         Compiling memchr v2.7.2
         Compiling version_check v0.9.4
         Compiling once_cell v1.19.0
         Compiling syn v1.0.109
         Compiling pin-project-lite v0.2.14
         Compiling target-lexicon v0.12.14
         Compiling vcpkg v0.2.15
         Compiling bitflags v2.5.0
         Compiling bytes v1.6.0
         Compiling itoa v1.0.11
         Compiling subtle v2.5.0
         Compiling futures-core v0.3.30
         Compiling serde v1.0.197
         Compiling crossbeam-utils v0.8.19
         Compiling openssl v0.10.64
         Compiling fnv v1.0.7
           Running `rustc --crate-name build_script_build --edition=2021 /home/sd/.cargo/registry/src/index.crates.io-6f17d22bba15001f/proc-macro2-1.0.79/build.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type bin --emit=dep-info,link -C embed-bitcode=no -C debug-assertions=off --cfg 'feature="default"' --cfg 'feature="proc-macro"' -C metadata=b4ee986c80539004 -C extra-filename=-b4ee986c80539004 --out-dir /tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/build/proc-macro2-b4ee986c80539004 -L dependency=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps --cap-lints allow`
  ...
         Compiling tokenizers v0.13.3 (/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/tokenizers-lib)
           Running `rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="cached-path"' --cfg 'feature="clap"' --cfg 'feature="cli"' --cfg 'feature="default"' --cfg 'feature="dirs"' --cfg 'feature="esaxx_fast"' --cfg 'feature="http"' --cfg 'feature="indicatif"' --cfg 'feature="onig"' --cfg 'feature="progressbar"' --cfg 'feature="reqwest"' -C metadata=7328a86746abf437 -C extra-filename=-7328a86746abf437 --out-dir /tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps -L dependency=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps --extern aho_corasick=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libaho_corasick-4b3322f33dd90c4d.rmeta --extern cached_path=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libcached_path-5ed5128f026500fa.rmeta --extern clap=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libclap-75fa0696f6a35286.rmeta --extern derive_builder=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libderive_builder-8a346deffc2ebe1e.rmeta --extern dirs=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libdirs-b91e2b12ef7b3a26.rmeta --extern esaxx_rs=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libesaxx_rs-ec5f02997062ab07.rmeta --extern getrandom=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libgetrandom-93dfba45fc2b8e30.rmeta --extern indicatif=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libindicatif-3730620e4a8e6215.rmeta --extern 
itertools=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libitertools-2aa7fd4d247f314a.rmeta --extern lazy_static=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/liblazy_static-d60e4dd36b567e7f.rmeta --extern log=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/liblog-e7360a68c8f9fdb6.rmeta --extern macro_rules_attribute=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libmacro_rules_attribute-0c5e18dae1223f79.rmeta --extern monostate=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libmonostate-cd0f5691d941ce54.rmeta --extern onig=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libonig-18382c6494a03e83.rmeta --extern paste=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libpaste-24a5791047389e1c.so --extern rand=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/librand-c6f59ec8eb990809.rmeta --extern rayon=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/librayon-d58498cddcaf228e.rmeta --extern rayon_cond=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/librayon_cond-81749514621b9292.rmeta --extern regex=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libregex-0296cd254892a4e0.rmeta --extern regex_syntax=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libregex_syntax-899dc8a5500cf19b.rmeta --extern reqwest=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libreqwest-e23a1f2e52abfda1.rmeta --extern serde=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libserde-86a61dd17283abc2.rmeta --extern 
serde_json=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libserde_json-4224df968acf122a.rmeta --extern spm_precompiled=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libspm_precompiled-a430935e8998b536.rmeta --extern thiserror=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libthiserror-8c7345053709316a.rmeta --extern unicode_normalization_alignments=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libunicode_normalization_alignments-da6b9466ba182084.rmeta --extern unicode_segmentation=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libunicode_segmentation-71eb6260cf3865ac.rmeta --extern unicode_categories=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libunicode_categories-166b45fdd025eb04.rmeta -L native=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/build/bzip2-sys-503e67e92eb0af78/out/lib -L native=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/build/zstd-sys-70e1cad5ab897179/out -L native=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/build/esaxx-rs-f91c3d0a3966aca1/out -L native=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/build/onig_sys-20fbd3f96135b358/out`
      warning: variable does not need to be mutable
         --> tokenizers-lib/src/models/unigram/model.rs:265:21
          |
      265 |                 let mut target_node = &mut best_path_ends_at[key_pos];
          |                     ----^^^^^^^^^^^
          |                     |
          |                     help: remove this `mut`
          |
          = note: `#[warn(unused_mut)]` on by default

      warning: variable does not need to be mutable
         --> tokenizers-lib/src/models/unigram/model.rs:282:21
          |
      282 |                 let mut target_node = &mut best_path_ends_at[starts_at + mblen];
          |                     ----^^^^^^^^^^^
          |                     |
          |                     help: remove this `mut`

      warning: variable does not need to be mutable
         --> tokenizers-lib/src/pre_tokenizers/byte_level.rs:200:59
          |
      200 |     encoding.process_tokens_with_offsets_mut(|(i, (token, mut offsets))| {
          |                                                           ----^^^^^^^
          |                                                           |
          |                                                           help: remove this `mut`

      error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
         --> tokenizers-lib/src/models/bpe/trainer.rs:526:47
          |
      522 |                     let w = &words[*i] as *const _ as *mut _;
          |                             -------------------------------- casting happend here
      ...
      526 |                         let word: &mut Word = &mut (*w);
          |                                               ^^^^^^^^^
          |
          = note: for more information, visit <https://doc.rust-lang.org/book/ch15-05-interior-mutability.html>
          = note: `#[deny(invalid_reference_casting)]` on by default

      warning: `tokenizers` (lib) generated 3 warnings
      error: could not compile `tokenizers` (lib) due to 1 previous error; 3 warnings emitted

      Caused by:
        process didn't exit successfully: `rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="cached-path"' --cfg 'feature="clap"' --cfg 'feature="cli"' --cfg 'feature="default"' --cfg 'feature="dirs"' --cfg 'feature="esaxx_fast"' --cfg 'feature="http"' --cfg 'feature="indicatif"' --cfg 'feature="onig"' --cfg 'feature="progressbar"' --cfg 'feature="reqwest"' -C metadata=7328a86746abf437 -C extra-filename=-7328a86746abf437 --out-dir /tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps -L dependency=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps --extern aho_corasick=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libaho_corasick-4b3322f33dd90c4d.rmeta --extern cached_path=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libcached_path-5ed5128f026500fa.rmeta --extern clap=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libclap-75fa0696f6a35286.rmeta --extern derive_builder=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libderive_builder-8a346deffc2ebe1e.rmeta --extern dirs=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libdirs-b91e2b12ef7b3a26.rmeta --extern esaxx_rs=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libesaxx_rs-ec5f02997062ab07.rmeta --extern getrandom=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libgetrandom-93dfba45fc2b8e30.rmeta --extern indicatif=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libindicatif-3730620e4a8e6215.rmeta --extern 
itertools=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libitertools-2aa7fd4d247f314a.rmeta --extern lazy_static=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/liblazy_static-d60e4dd36b567e7f.rmeta --extern log=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/liblog-e7360a68c8f9fdb6.rmeta --extern macro_rules_attribute=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libmacro_rules_attribute-0c5e18dae1223f79.rmeta --extern monostate=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libmonostate-cd0f5691d941ce54.rmeta --extern onig=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libonig-18382c6494a03e83.rmeta --extern paste=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libpaste-24a5791047389e1c.so --extern rand=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/librand-c6f59ec8eb990809.rmeta --extern rayon=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/librayon-d58498cddcaf228e.rmeta --extern rayon_cond=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/librayon_cond-81749514621b9292.rmeta --extern regex=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libregex-0296cd254892a4e0.rmeta --extern regex_syntax=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libregex_syntax-899dc8a5500cf19b.rmeta --extern reqwest=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libreqwest-e23a1f2e52abfda1.rmeta --extern serde=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libserde-86a61dd17283abc2.rmeta --extern 
serde_json=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libserde_json-4224df968acf122a.rmeta --extern spm_precompiled=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libspm_precompiled-a430935e8998b536.rmeta --extern thiserror=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libthiserror-8c7345053709316a.rmeta --extern unicode_normalization_alignments=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libunicode_normalization_alignments-da6b9466ba182084.rmeta --extern unicode_segmentation=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libunicode_segmentation-71eb6260cf3865ac.rmeta --extern unicode_categories=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/deps/libunicode_categories-166b45fdd025eb04.rmeta -L native=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/build/bzip2-sys-503e67e92eb0af78/out/lib -L native=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/build/zstd-sys-70e1cad5ab897179/out -L native=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/build/esaxx-rs-f91c3d0a3966aca1/out -L native=/tmp/pip-install-bvuco56u/tokenizers_8fea27572d074ce5977e58f1408074ea/target/release/build/onig_sys-20fbd3f96135b358/out` (exit status: 1)
      error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --` failed with code 101
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tokenizers

Full log: https://snips.sh/f/XPSNFeHd9Q
Python 3.12.2
rustc 1.76.0 (07dca489a 2024-02-04) (Void Linux)
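For context, the rejected line in `tokenizers-lib/src/models/bpe/trainer.rs` derives a `*mut` pointer by casting away a shared reference, a pattern the deny-by-default `invalid_reference_casting` lint (introduced in Rust 1.73) now treats as undefined behavior. Below is a minimal sketch of the sound alternative the compiler note points to, interior mutability via `UnsafeCell`; the `Word` struct here is purely illustrative and is not the crate's actual type.

```rust
use std::cell::UnsafeCell;

// Illustrative stand-in for the crate's data; not the real `Word` type.
struct Word(u32);

fn main() {
    // The rejected pattern looked roughly like this (now a hard error):
    //
    //     let w = &words[*i] as *const _ as *mut _;
    //     let word: &mut Word = &mut (*w);
    //
    // Casting `&T` to `&mut T` is UB even if the result is never used.

    // Sound alternative: wrap elements in `UnsafeCell`, which explicitly
    // permits obtaining a `*mut T` from a shared reference.
    let words: Vec<UnsafeCell<Word>> = vec![UnsafeCell::new(Word(0))];

    let w: *mut Word = words[0].get(); // legally derived mutable pointer
    unsafe { (*w).0 += 1 };            // caller must still ensure no aliasing

    println!("{}", unsafe { &*words[0].get() }.0); // prints 1
}
```

In the crate itself the mutation happens across rayon worker threads, so a real fix also has to guarantee that no two threads touch the same element; `UnsafeCell` only makes the pointer derivation legal, it does not make the access thread-safe by itself.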

@austinleroy

While the code in this repo should be fixed, a temporary workaround is to use an older version of the rust toolchain (I had success with rust 1.72.0, installing version 0.13.2):

RUSTUP_TOOLCHAIN=1.72.0 pip install tokenizers==0.13.2

Originally I was trying to install 0.13.3, but ran into issues because the clap dependency requires rust 1.74 or newer.


github-actions bot commented May 6, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions bot added the Stale label May 6, 2024
@github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 11, 2024
@fcolecumberri

Having github-actions close an obvious bug just because no one commented on it doesn't make the bug go away.

@ArthurZucker
Collaborator

Pretty sure this was fixed

@Arondight

rust 1:1.78.0-1

         Compiling tokenizers v0.13.3 (/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/tokenizers-lib)
           Running `rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="cached-path"' --cfg 'feature="clap"' --cfg 'feature="cli"' --cfg 'feature="default"' --cfg 'feature="dirs"' --cfg 'feature="esaxx_fast"' --cfg 'feature="http"' --cfg 'feature="indicatif"' --cfg 'feature="onig"' --cfg 'feature="progressbar"' --cfg 'feature="reqwest"' -C metadata=8900dc2403a2c8dd -C extra-filename=-8900dc2403a2c8dd --out-dir /tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps -C strip=debuginfo -L dependency=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps --extern aho_corasick=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libaho_corasick-6aa983f83cc1d860.rmeta --extern cached_path=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libcached_path-3ec46145dd1d130e.rmeta --extern clap=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libclap-b6f996e8d27659fd.rmeta --extern derive_builder=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libderive_builder-9b96e240da4197d9.rmeta --extern dirs=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libdirs-5779e67b580d982d.rmeta --extern esaxx_rs=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libesaxx_rs-a4589fe58879f69e.rmeta --extern getrandom=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libgetrandom-a1725e4f12011643.rmeta --extern indicatif=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libindicatif-5a25b637c223a512.rmeta --extern 
itertools=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libitertools-25bf26bb9d7012e3.rmeta --extern lazy_static=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/liblazy_static-efe629d64d1e110a.rmeta --extern log=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/liblog-aefa68b3bb6aa74b.rmeta --extern macro_rules_attribute=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libmacro_rules_attribute-905f7969e6855dc7.rmeta --extern monostate=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libmonostate-979232758d229ae8.rmeta --extern onig=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libonig-d57fa18c6b270e69.rmeta --extern paste=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libpaste-dcd1fc4ea32404f5.so --extern rand=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/librand-14a9ba308db49e20.rmeta --extern rayon=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/librayon-bbce6394af2ecdb4.rmeta --extern rayon_cond=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/librayon_cond-1a99da87a6ad378d.rmeta --extern regex=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libregex-abe854aac7680929.rmeta --extern regex_syntax=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libregex_syntax-bf2c82fdea1a20c9.rmeta --extern reqwest=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libreqwest-08741808824a1069.rmeta --extern serde=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libserde-48f9ebe75a8f3233.rmeta --extern 
serde_json=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libserde_json-722fc36ce3c5e169.rmeta --extern spm_precompiled=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libspm_precompiled-4b735c268352039e.rmeta --extern thiserror=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libthiserror-0a045910d95e7f7c.rmeta --extern unicode_normalization_alignments=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libunicode_normalization_alignments-72a662c4885161d8.rmeta --extern unicode_segmentation=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libunicode_segmentation-d45cbfa0bdea00fb.rmeta --extern unicode_categories=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libunicode_categories-c386831dea5a0d6e.rmeta -L native=/usr/lib -L native=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/build/zstd-sys-5958720fa03c9e44/out -L native=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/build/esaxx-rs-cd4e20ef7e068fc7/out -L native=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/build/onig_sys-2153c850ad2e752d/out`
      warning: variable does not need to be mutable
         --> tokenizers-lib/src/models/unigram/model.rs:265:21
          |
      265 |                 let mut target_node = &mut best_path_ends_at[key_pos];
          |                     ----^^^^^^^^^^^
          |                     |
          |                     help: remove this `mut`
          |
          = note: `#[warn(unused_mut)]` on by default

      warning: variable does not need to be mutable
         --> tokenizers-lib/src/models/unigram/model.rs:282:21
          |
      282 |                 let mut target_node = &mut best_path_ends_at[starts_at + mblen];
          |                     ----^^^^^^^^^^^
          |                     |
          |                     help: remove this `mut`

      warning: variable does not need to be mutable
         --> tokenizers-lib/src/pre_tokenizers/byte_level.rs:200:59
          |
      200 |     encoding.process_tokens_with_offsets_mut(|(i, (token, mut offsets))| {
          |                                                           ----^^^^^^^
          |                                                           |
          |                                                           help: remove this `mut`

      error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
         --> tokenizers-lib/src/models/bpe/trainer.rs:526:47
          |
      522 |                     let w = &words[*i] as *const _ as *mut _;
          |                             -------------------------------- casting happend here
      ...
      526 |                         let word: &mut Word = &mut (*w);
          |                                               ^^^^^^^^^
          |
          = note: for more information, visit <https://doc.rust-lang.org/book/ch15-05-interior-mutability.html>
          = note: `#[deny(invalid_reference_casting)]` on by default

      warning: `tokenizers` (lib) generated 3 warnings
      error: could not compile `tokenizers` (lib) due to 1 previous error; 3 warnings emitted

      Caused by:
        process didn't exit successfully: `rustc --crate-name tokenizers --edition=2018 tokenizers-lib/src/lib.rs --error-format=json --json=diagnostic-rendered-ansi,artifacts,future-incompat --crate-type lib --emit=dep-info,metadata,link -C opt-level=3 -C embed-bitcode=no --cfg 'feature="cached-path"' --cfg 'feature="clap"' --cfg 'feature="cli"' --cfg 'feature="default"' --cfg 'feature="dirs"' --cfg 'feature="esaxx_fast"' --cfg 'feature="http"' --cfg 'feature="indicatif"' --cfg 'feature="onig"' --cfg 'feature="progressbar"' --cfg 'feature="reqwest"' -C metadata=8900dc2403a2c8dd -C extra-filename=-8900dc2403a2c8dd --out-dir /tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps -C strip=debuginfo -L dependency=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps --extern aho_corasick=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libaho_corasick-6aa983f83cc1d860.rmeta --extern cached_path=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libcached_path-3ec46145dd1d130e.rmeta --extern clap=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libclap-b6f996e8d27659fd.rmeta --extern derive_builder=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libderive_builder-9b96e240da4197d9.rmeta --extern dirs=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libdirs-5779e67b580d982d.rmeta --extern esaxx_rs=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libesaxx_rs-a4589fe58879f69e.rmeta --extern getrandom=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libgetrandom-a1725e4f12011643.rmeta --extern 
indicatif=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libindicatif-5a25b637c223a512.rmeta --extern itertools=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libitertools-25bf26bb9d7012e3.rmeta --extern lazy_static=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/liblazy_static-efe629d64d1e110a.rmeta --extern log=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/liblog-aefa68b3bb6aa74b.rmeta --extern macro_rules_attribute=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libmacro_rules_attribute-905f7969e6855dc7.rmeta --extern monostate=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libmonostate-979232758d229ae8.rmeta --extern onig=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libonig-d57fa18c6b270e69.rmeta --extern paste=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libpaste-dcd1fc4ea32404f5.so --extern rand=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/librand-14a9ba308db49e20.rmeta --extern rayon=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/librayon-bbce6394af2ecdb4.rmeta --extern rayon_cond=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/librayon_cond-1a99da87a6ad378d.rmeta --extern regex=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libregex-abe854aac7680929.rmeta --extern regex_syntax=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libregex_syntax-bf2c82fdea1a20c9.rmeta --extern reqwest=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libreqwest-08741808824a1069.rmeta 
--extern serde=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libserde-48f9ebe75a8f3233.rmeta --extern serde_json=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libserde_json-722fc36ce3c5e169.rmeta --extern spm_precompiled=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libspm_precompiled-4b735c268352039e.rmeta --extern thiserror=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libthiserror-0a045910d95e7f7c.rmeta --extern unicode_normalization_alignments=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libunicode_normalization_alignments-72a662c4885161d8.rmeta --extern unicode_segmentation=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libunicode_segmentation-d45cbfa0bdea00fb.rmeta --extern unicode_categories=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/deps/libunicode_categories-c386831dea5a0d6e.rmeta -L native=/usr/lib -L native=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/build/zstd-sys-5958720fa03c9e44/out -L native=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/build/esaxx-rs-cd4e20ef7e068fc7/out -L native=/tmp/pip-install-_9lczfk8/tokenizers_a626b57540ed48ed8ef6ce337e9f06c5/target/release/build/onig_sys-2153c850ad2e752d/out` (exit status: 1)
      error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module --crate-type cdylib --` failed with code 101
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

@Arondight

Oh, I see this fix in the latest code.

@robert-irelan-tiktokusds

For future reference, I think this was fixed in commit 4322056
