bc3ec39d breaks the compilation (as noted in #1355) #1359

Closed
baptisterajaut opened this issue Oct 8, 2023 · 13 comments

Comments

@baptisterajaut

baptisterajaut commented Oct 8, 2023

As stated in the title, this commit breaks building tokenizers on modern toolchains, even stable ones.

error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
         --> tokenizers-lib/src/models/bpe/trainer.rs:526:47
          |
      522 |                     let w = &words[*i] as *const _ as *mut _;
          |                             -------------------------------- casting happend here
      ...
      526 |                         let word: &mut Word = &mut (*w);
          |                                               ^^^^^^^^^
          |

% rustc -V
rustc 1.73.0 (cc66ad468 2023-10-03)
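
For readers unfamiliar with the lint, here is a minimal sketch of the pattern that rustc 1.73 rejects, next to a variant that derives the raw pointer from a mutable borrow instead. The `Word` struct and the `bump_all` helper are hypothetical stand-ins for illustration, not the actual tokenizers code, and this is not necessarily how the library resolved it.

```rust
// Hypothetical stand-in for the trainer's `Word` type.
#[derive(Debug, PartialEq)]
struct Word {
    len: usize,
}

// Rejected since rustc 1.73 (`invalid_reference_casting` is deny-by-default):
//
//     let w = &words[i] as *const Word as *mut Word;
//     let word: &mut Word = unsafe { &mut *w };
//
// The pointer originates from a shared reference, so writing through it is
// undefined behavior.

/// Increments `len` for every word at the given indices.
/// Panics if an index is out of bounds.
fn bump_all(words: &mut [Word], indices: &[usize]) {
    let len = words.len();
    // Deriving the pointer from a mutable borrow keeps write permission.
    let base: *mut Word = words.as_mut_ptr();
    for &i in indices {
        assert!(i < len, "index out of bounds");
        // SAFETY: `i` is in bounds and no other reference into `words`
        // is alive while `word` is in use.
        let word: &mut Word = unsafe { &mut *base.add(i) };
        word.len += 1;
    }
}

fn main() {
    let mut words = vec![Word { len: 0 }, Word { len: 5 }];
    bump_all(&mut words, &[1, 0, 1]);
    println!("{:?}", words);
}
```

The key difference is where the pointer comes from: `as_mut_ptr()` on a `&mut [Word]` carries permission to write, while a pointer cast from `&Word` never does, regardless of how it is cast afterwards.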

@adwaraki

adwaraki commented Oct 28, 2023

I cannot install tokenizers either. It is being installed as part of the Allen-NLP package, and the new version of the Rust compiler breaks it.

Installing Rust from the Rust site using their shell script installs 1.73.0, I presume, which breaks the tokenizers compilation. Installing it via Homebrew gives 1.72.1, which works.

@Narsil
Collaborator

Narsil commented Oct 30, 2023

Which version are you using?

This was already fixed on main and in 0.14.1.

https://github.com/huggingface/tokenizers/blob/main/tokenizers/src/models/bpe/trainer.rs#L541-L546

@Songcheng-Xie

Songcheng-Xie commented Nov 6, 2023

To get around this error, I installed transformers with conda, using the command 'conda install -c huggingface transformers'. Then it worked.

github-actions bot commented Dec 7, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Dec 7, 2023
@DavidAdamczyk

I have the same problem with Python 3.11. Do you need more information about this issue?

@Narsil
Collaborator

Narsil commented Dec 9, 2023

@DavidAdamczyk Use a more recent tokenizers version, or an older Rust compiler version.

@github-actions github-actions bot removed the Stale label Dec 10, 2023
@DavidAdamczyk

I use the latest version of tokenizers and the most recent stable version of the Rust compiler. Additionally, I follow the installation instructions available here. Could someone update the installation instructions and include information about the supported versions of all dependencies?

@Mr-AniP

Mr-AniP commented Dec 23, 2023

Hi,
I hit the same error while trying to install transformers v4.6.1 on a PYNQ-Z2 board (v2.5, armv7l) with Rust v1.74.1.

Edit: The way to work around this error is to use an older Rust version. This is what I did:

  1. Install Rust v1.72.1 and make it the default:
    rustup default 1.72.1
  2. Remove the stable toolchain, or set an environment variable so that the compilation does not use it:
    rustup toolchain remove stable
    or
    export RUSTUP_TOOLCHAIN=1.72.1

After this, it should work properly.

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 25, 2024
@davehorner

davehorner commented Jan 25, 2024

pip3 install transformers==4.15.0 timm==0.4.12 fairscale==0.4.4

  error: casting `&T` to `&mut T` is undefined behavior, even if the reference is unused, consider instead using an `UnsafeCell`
     --> tokenizers-lib\src\models\bpe\trainer.rs:517:47
      |
  513 |                     let w = &words[*i] as *const _ as *mut _;
      |                             -------------------------------- casting happend here
  ...
  517 |                         let word: &mut Word = &mut (*w);
      |                                               ^^^^^^^^^
      |
      = note: for more information, visit <https://doc.rust-lang.org/book/ch15-05-interior-mutability.html>
      = note: `#[deny(invalid_reference_casting)]` on by default

running into this tonight too.

Requirement already satisfied: requests in c:\users\dhorner\anaconda3\envs\hotz\lib\site-packages (from transformers==4.15.0->-r requirements.txt (line 2)) (2.31.0)
Collecting sacremoses (from transformers==4.15.0->-r requirements.txt (line 2))
Using cached sacremoses-0.1.1-py3-none-any.whl.metadata (8.3 kB)
Collecting tokenizers<0.11,>=0.10.1 (from transformers==4.15.0->-r requirements.txt (line 2))
Using cached tokenizers-0.10.3.tar.gz (212 kB)

The solution for me was to set RUSTFLAGS="-A invalid_reference_casting".
This worked for me with Rust 1.75.0.

@github-actions github-actions bot removed the Stale label Jan 26, 2024
@athewsey

I also ran into this issue last week when installing transformers==4.22.1, which is pinned by a different project. tokenizers resolved to v0.12.1. The platform was macOS Sonoma on an M2 chip.

I also worked around by running:

export RUSTFLAGS="-A invalid_reference_casting"

...before installing, but it'd be great if the problem could be tackled at source!

@davehorner

I would love to be the one to help resolve this with something better than an environment flag.

tokenizers-lib/src/models/bpe/trainer.rs:526

I do not see tokenizers-lib in tree.
rg "let w = &words[*i] as *const _ as *mut _;" finds nothing

The error guidance is not clear. GPT says:
This error message indicates that you're attempting to cast a shared reference (&T) into a mutable reference (&mut T), which is considered undefined behavior in Rust, even if the mutable reference is not actually used. Rust's safety guarantees rely on preventing such unsound operations.

To resolve this issue, you should use appropriate safe patterns for mutable access, such as Cell, RefCell, or UnsafeCell for interior mutability, depending on your specific use case.

In your case, since you're dealing with mutable access to data through raw pointers, you should consider using UnsafeCell. Here's how you can adjust your code:

use std::cell::UnsafeCell;

// Assuming Word is some struct or type you're working with
struct Word {
    // fields of Word
}

// Assuming words is some collection of Word, with each element stored in an
// UnsafeCell. Casting a plain &Word to *mut UnsafeCell<Word> would still be
// undefined behavior, so the cell has to wrap the data itself.
let words: Vec<UnsafeCell<Word>> = /* initialization of words */;

// Assuming i is some index into the words vector
let i = /* index */;

// Accessing the word at index i in a mutable way.
// UnsafeCell::get takes &self and returns a *mut Word.
let w: *mut Word = words[i].get();
// SAFETY: the caller must ensure no other reference to words[i] is live
let word_mut: &mut Word = unsafe { &mut *w };

However, using UnsafeCell requires careful handling as it bypasses Rust's safety checks. Make sure you understand the implications of using UnsafeCell and ensure that your code is correct and safe.

Alternatively, consider restructuring your code to avoid mutable raw pointer access if possible, as raw pointer manipulation can be error-prone and harder to reason about compared to safe Rust constructs.
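
As a rough illustration of that restructuring, the sketch below mutates selected elements through ordinary `&mut` indexing, with no raw pointers at all. `Word`, its `merge` method, and `apply_merge` are hypothetical names made up for this example. It is a sequential sketch and makes no claim about how the trainer itself is organized.

```rust
// Hypothetical stand-in for the trainer's `Word` type.
#[derive(Debug, PartialEq)]
struct Word {
    symbols: Vec<u32>,
}

impl Word {
    /// Replaces every adjacent pair (a, b) with `merged`; returns the count.
    fn merge(&mut self, a: u32, b: u32, merged: u32) -> usize {
        let mut out = Vec::with_capacity(self.symbols.len());
        let mut count = 0;
        let mut i = 0;
        while i < self.symbols.len() {
            if i + 1 < self.symbols.len() && self.symbols[i] == a && self.symbols[i + 1] == b {
                out.push(merged);
                count += 1;
                i += 2;
            } else {
                out.push(self.symbols[i]);
                i += 1;
            }
        }
        self.symbols = out;
        count
    }
}

/// Applies one merge to the words at `positions` using plain `&mut` access.
fn apply_merge(words: &mut [Word], positions: &[usize], a: u32, b: u32, merged: u32) -> usize {
    let mut total = 0;
    for &i in positions {
        // IndexMut yields `&mut Word` directly; the borrow checker verifies it.
        let word: &mut Word = &mut words[i];
        total += word.merge(a, b, merged);
    }
    total
}

fn main() {
    let mut words = vec![
        Word { symbols: vec![1, 2, 3] },
        Word { symbols: vec![1, 2, 1, 2] },
    ];
    let merges = apply_merge(&mut words, &[0, 1], 1, 2, 9);
    println!("{merges} merges: {:?}", words);
}
```

Taking `words` as `&mut [Word]` moves the exclusivity requirement into the function signature, so the compiler checks it and no `unsafe` block is needed.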

So this is Rustonomicon territory.

Could someone point me to where this code lives? I can't find it.

@ArthurZucker
Collaborator

I'll close this, as I believe the latest releases no longer have this issue.


9 participants