
Cannot load tokenizer from_pretrained through http_proxy since 0.14.0 #1373

Closed

jtsai-quid opened this issue Oct 25, 2023 · 7 comments

@jtsai-quid

Hi hf,

I encountered an issue where I couldn't load the tokenizer using from_pretrained via the http_proxy in version 0.14.0, while it worked successfully in version 0.13.3.
This caused the fast tokenizer initialization issue in TGI 1.1.0.
huggingface/text-generation-inference#1108

Here is the code snippet I used for testing.

//# tokenizers = { version = "0.14.0", features = ["http"] }

use tokenizers::tokenizer::{Result, Tokenizer};
use tokenizers::FromPretrainedParameters;

fn main() -> Result<()> {
    // Pick up an auth token from the environment, if one is set.
    let authorization_token = std::env::var("HUGGING_FACE_HUB_TOKEN").ok();
    let params = FromPretrainedParameters {
        revision: "main".to_string(),
        auth_token: authorization_token,
        ..Default::default()
    };

    // This download is what times out behind the proxy in 0.14.0.
    let tokenizer = Tokenizer::from_pretrained("TheBloke/Llama-2-13B-chat-GPTQ", Some(params))?;

    let encoding = tokenizer.encode("Hey there!", false)?;
    println!("{:?}", encoding.get_tokens());
    Ok(())
}

Error output

> http_proxy=http://squid:3128 https_proxy=http://squid:3128 cargo play run.rs
   Compiling p4u7iybabtwyzvxf2zdtkustjgod2 v0.1.0 (/tmp/cargo-play.4U7iybABTwyZVxF2ZDTKUstjgod2)
    Finished dev [unoptimized + debuginfo] target(s) in 3.14s
     Running `/tmp/cargo-play.4U7iybABTwyZVxF2ZDTKUstjgod2/target/debug/p4u7iybabtwyzvxf2zdtkustjgod2`
Error: RequestError(Transport(Transport { kind: Io, message: None, url: Some(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("huggingface.co")), port: None, path: "/TheBloke/Llama-2-13B-chat-GPTQ/resolve/main/tokenizer.json", query: None, fragment: None }), source: Some(Custom { kind: TimedOut, error: "timed out reading response" }) }))

I suspect this is related to the client refactoring here.
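
For anyone hitting the same wall, here is a minimal sketch of a possible workaround, assuming ureq 2.x: configure the proxy on the agent explicitly, fetch tokenizer.json directly, and load it with Tokenizer::from_bytes, bypassing from_pretrained entirely. The squid proxy URL is just the one from my setup.

//# tokenizers = { version = "0.14.0", features = ["http"] }
//# ureq = "2"

use std::io::Read;

use tokenizers::tokenizer::{Result, Tokenizer};

fn main() -> Result<()> {
    // Set the proxy on the agent directly instead of relying on the
    // http_proxy/https_proxy environment variables.
    let agent = ureq::AgentBuilder::new()
        .proxy(ureq::Proxy::new("http://squid:3128")?)
        .build();

    // Fetch tokenizer.json straight from the Hub through the proxy...
    let url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ/resolve/main/tokenizer.json";
    let mut bytes = Vec::new();
    agent.get(url).call()?.into_reader().read_to_end(&mut bytes)?;

    // ...and build the tokenizer from the raw bytes, skipping from_pretrained.
    let tokenizer = Tokenizer::from_bytes(&bytes)?;
    let encoding = tokenizer.encode("Hey there!", false)?;
    println!("{:?}", encoding.get_tokens());
    Ok(())
}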

Thanks, and I appreciate any help!

@ArthurZucker
Collaborator

Indeed. Could you try with the latest release? Otherwise I'll have a look at what I can do!

@jtsai-quid
Author

Just tried version 0.14.1 and the error still occurs. 😞

@jtsai-quid
Author

Hi @ArthurZucker,
Would this PR fix this issue?
huggingface/hf-hub#34

@ArthurZucker
Collaborator

Ah! Yeah, most probably: we now use the hf-hub API to load files, so if the proxy is an issue there, it will affect us as well.
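
For reference, here is roughly what the new load path looks like, as a minimal sketch assuming hf-hub 0.3's blocking API (model and file names taken from the report above); the download in repo.get is where a proxy-blind HTTP client would time out:

//# hf-hub = "0.3"
//# tokenizers = "0.14"

use hf_hub::api::sync::Api;
use tokenizers::tokenizer::{Result, Tokenizer};

fn main() -> Result<()> {
    // from_pretrained now delegates downloads to hf-hub, so hf-hub's own
    // HTTP client has to honor http_proxy/https_proxy for this to work.
    let api = Api::new()?;
    let repo = api.model("TheBloke/Llama-2-13B-chat-GPTQ".to_string());

    // Downloads (and caches) tokenizer.json, returning the local path.
    let tokenizer_file = repo.get("tokenizer.json")?;
    let tokenizer = Tokenizer::from_file(tokenizer_file)?;
    println!("{:?}", tokenizer.encode("Hey there!", false)?.get_tokens());
    Ok(())
}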


github-actions bot commented Dec 6, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Dec 6, 2023
@jtsai-quid
Author

Hi @ArthurZucker,
I noticed that hf-hub has fixed this issue:
huggingface/hf-hub#34
Would it be possible to use the latest version of hf-hub in tokenizers?
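
Until a release picks that up, a Cargo [patch] override might work as a stopgap; this is hypothetical and assumes the fix is on hf-hub's main branch with a compatible API:

# In the consuming project's Cargo.toml (hypothetical override).
[patch.crates-io]
hf-hub = { git = "https://github.com/huggingface/hf-hub" }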
Thanks~

@github-actions github-actions bot removed the Stale label Dec 8, 2023

github-actions bot commented Jan 7, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 7, 2024
@github-actions github-actions bot closed this as not planned Jan 13, 2024