
Cannot load tokenizer from_pretrained through http_proxy since 0.14.0 #1373

Closed

jtsai-quid opened this issue Oct 25, 2023 · 7 comments

@jtsai-quid

Hi hf,

I encountered an issue where I couldn't load the tokenizer using from_pretrained via the http_proxy in version 0.14.0, while it worked successfully in version 0.13.3.
This caused the fast tokenizer initialization issue in TGI 1.1.0.
huggingface/text-generation-inference#1108

Here is the code snippet I used for testing.

//# tokenizers = { version = "0.14.0", features = ["http"] }

use tokenizers::tokenizer::{Result, Tokenizer};
use tokenizers::FromPretrainedParameters;

fn main() -> Result<()> {
    // Pick up an auth token from the environment, if one is set.
    let authorization_token = std::env::var("HUGGING_FACE_HUB_TOKEN").ok();
    let params = FromPretrainedParameters {
        revision: "main".to_string(),
        auth_token: authorization_token,
        ..Default::default()
    };

    // This download is what times out behind the proxy in 0.14.0.
    let tokenizer = Tokenizer::from_pretrained("TheBloke/Llama-2-13B-chat-GPTQ", Some(params))?;

    let encoding = tokenizer.encode("Hey there!", false)?;
    println!("{:?}", encoding.get_tokens());
    Ok(())
}

Error output

> http_proxy=http://squid:3128 https_proxy=http://squid:3128 cargo play run.rs
   Compiling p4u7iybabtwyzvxf2zdtkustjgod2 v0.1.0 (/tmp/cargo-play.4U7iybABTwyZVxF2ZDTKUstjgod2)
    Finished dev [unoptimized + debuginfo] target(s) in 3.14s
     Running `/tmp/cargo-play.4U7iybABTwyZVxF2ZDTKUstjgod2/target/debug/p4u7iybabtwyzvxf2zdtkustjgod2`
Error: RequestError(Transport(Transport { kind: Io, message: None, url: Some(Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("huggingface.co")), port: None, path: "/TheBloke/Llama-2-13B-chat-GPTQ/resolve/main/tokenizer.json", query: None, fragment: None }), source: Some(Custom { kind: TimedOut, error: "timed out reading response" }) }))

I suspect this is related to the client refactoring here.
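
For anyone hitting the same wall, here is a minimal sketch of a possible workaround, assuming ureq 2.x: configure the proxy on the agent explicitly, fetch tokenizer.json directly, and load it with Tokenizer::from_bytes, bypassing from_pretrained entirely. The squid proxy URL is just the one from my setup.

//# tokenizers = { version = "0.14.0", features = ["http"] }
//# ureq = "2"

use std::io::Read;

use tokenizers::tokenizer::{Result, Tokenizer};

fn main() -> Result<()> {
    // Set the proxy on the agent directly instead of relying on the
    // http_proxy/https_proxy environment variables.
    let agent = ureq::AgentBuilder::new()
        .proxy(ureq::Proxy::new("http://squid:3128")?)
        .build();

    // Fetch tokenizer.json straight from the Hub through the proxy...
    let url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ/resolve/main/tokenizer.json";
    let mut bytes = Vec::new();
    agent.get(url).call()?.into_reader().read_to_end(&mut bytes)?;

    // ...and build the tokenizer from the raw bytes, skipping from_pretrained.
    let tokenizer = Tokenizer::from_bytes(&bytes)?;
    let encoding = tokenizer.encode("Hey there!", false)?;
    println!("{:?}", encoding.get_tokens());
    Ok(())
}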

Thanks, and I appreciate any help!

@ArthurZucker
Collaborator

Indeed. Could you try with the latest release? Otherwise I'll have a look at what I can do!

@jtsai-quid
Author

Just tried version 0.14.1 and the error still occurs. 😞

@jtsai-quid
Author

Hi @ArthurZucker,
Would this PR fix this issue?
huggingface/hf-hub#34

@ArthurZucker
Collaborator

Ah! Yeah, most probably: we now use the hf-hub API to load files, so if the proxy is an issue there, it will affect us as well.
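
For reference, here is roughly what the new load path looks like, as a minimal sketch assuming hf-hub 0.3's blocking API (model and file names taken from the report above); the download in repo.get is where a proxy-blind HTTP client would time out:

//# hf-hub = "0.3"
//# tokenizers = "0.14"

use hf_hub::api::sync::Api;
use tokenizers::tokenizer::{Result, Tokenizer};

fn main() -> Result<()> {
    // from_pretrained now delegates downloads to hf-hub, so hf-hub's own
    // HTTP client has to honor http_proxy/https_proxy for this to work.
    let api = Api::new()?;
    let repo = api.model("TheBloke/Llama-2-13B-chat-GPTQ".to_string());

    // Downloads (and caches) tokenizer.json, returning the local path.
    let tokenizer_file = repo.get("tokenizer.json")?;
    let tokenizer = Tokenizer::from_file(tokenizer_file)?;
    println!("{:?}", tokenizer.encode("Hey there!", false)?.get_tokens());
    Ok(())
}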


github-actions bot commented Dec 6, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Dec 6, 2023
@jtsai-quid
Author

Hi @ArthurZucker,
I noticed that hf-hub has fixed this issue:
huggingface/hf-hub#34
Would it be possible to use the latest version of hf-hub in tokenizers?
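
Until a release picks that up, a Cargo [patch] override might work as a stopgap; this is hypothetical and assumes the fix is on hf-hub's main branch with a compatible API:

# In the consuming project's Cargo.toml (hypothetical override).
[patch.crates-io]
hf-hub = { git = "https://github.com/huggingface/hf-hub" }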
Thanks~

@github-actions github-actions bot removed the Stale label Dec 8, 2023

github-actions bot commented Jan 7, 2024

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jan 7, 2024
@github-actions github-actions bot closed this as not planned Jan 13, 2024