
prop:Help:StartUp #159

Closed
remotejob opened this issue Jun 10, 2021 · 12 comments

Comments
@remotejob

I have a simple GPT2 model at
https://huggingface.co/remotejob/tweetsGPT2fi_v1/tree/main
with rust_model.ot uploaded.
Please give me some idea of how to get started.

My Python version works correctly:

from transformers import AutoModelWithLMHead, AutoTokenizer, pipeline
model = AutoModelWithLMHead.from_pretrained("remotejob/tweetsGPT2fi_v1")
tokenizer = AutoTokenizer.from_pretrained("remotejob/tweetsGPT2fi_v1")
generator= pipeline('text-generation', model=model, tokenizer=tokenizer)
res = generator("Kuka sei")
print(res)

@guillaume-be
Owner

Hello,

For a model available online, I'd recommend using RemoteResource, which allows automatic download and caching of the files. The example below illustrates how your example would work:

use rust_bert::pipelines::common::ModelType;
use rust_bert::pipelines::text_generation::{TextGenerationConfig, TextGenerationModel};
use rust_bert::resources::{RemoteResource, Resource};

fn main() -> anyhow::Result<()> {
    //    Set-up model
    let custom_gpt2_model = Resource::Remote(RemoteResource::new(
        "https://huggingface.co/remotejob/tweetsGPT2fi_v1/resolve/main/rust_model.ot",
        "tweetsGPT2fi_v1/model",
    ));

    let custom_gpt2_vocab = Resource::Remote(RemoteResource::new(
        "https://huggingface.co/remotejob/tweetsGPT2fi_v1/resolve/main/vocab.json",
        "tweetsGPT2fi_v1/vocab",
    ));

    let custom_gpt2_merges = Resource::Remote(RemoteResource::new(
        "https://huggingface.co/remotejob/tweetsGPT2fi_v1/resolve/main/merges.txt",
        "tweetsGPT2fi_v1/merges",
    ));

    let custom_gpt2_config = Resource::Remote(RemoteResource::new(
        "https://huggingface.co/remotejob/tweetsGPT2fi_v1/resolve/main/config.json",
        "tweetsGPT2fi_v1/config",
    ));

    let generate_config = TextGenerationConfig {
        model_resource: custom_gpt2_model,
        vocab_resource: custom_gpt2_vocab,
        merges_resource: custom_gpt2_merges,
        config_resource: custom_gpt2_config,
        model_type: ModelType::GPT2,
        max_length: 64,
        do_sample: true,
        num_beams: 5,
        temperature: 1.1,
        num_return_sequences: 3,
        ..Default::default()
    };
    let model = TextGenerationModel::new(generate_config)?;

    let input_context = "Hello";
    let output = model.generate(&[input_context], None);

    for sentence in output {
        println!("{:?}", sentence);
    }
    Ok(())
}

I tried running this example and unfortunately the tokenizer raises an exception because it expects an unknown token aligned with the standard GPT2 token <|endoftext|>. I see you added <|endoftext|> to added_tokens.json - this is not yet supported by the Rust crate; all tokens are expected in the vocab.json file.

You would need to add "<|endoftext|>": 50000 at the end of vocab.json to load the model in Rust.
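
For illustration, a minimal sketch (not part of the original comment) of patching the vocabulary programmatically; it assumes vocab.json is a flat token-to-id map, that 50000 is the next free index as suggested above, and that serde_json and anyhow are available as dependencies:

use std::fs;

fn main() -> anyhow::Result<()> {
    let path = "vocab.json";
    // Read the existing vocabulary (a flat {"token": id} map).
    let mut vocab: serde_json::Map<String, serde_json::Value> =
        serde_json::from_str(&fs::read_to_string(path)?)?;
    // Register the end-of-text token that the Rust tokenizer expects in vocab.json.
    vocab.insert("<|endoftext|>".to_string(), serde_json::json!(50000));
    fs::write(path, serde_json::to_string_pretty(&vocab)?)?;
    Ok(())
}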

@remotejob
Author

remotejob commented Jun 12, 2021

My tentative attempt to resolve the issue:

  1. I tried the original GPT2 model:

let custom_gpt2_model = Resource::Remote(RemoteResource::new(
    "https://huggingface.co/gpt2/resolve/main/rust_model.ot",
    "gpt2/model",
));

let custom_gpt2_vocab = Resource::Remote(RemoteResource::new(
    "https://huggingface.co/gpt2/resolve/main/vocab.json",
    "gpt2/vocab",
));

let custom_gpt2_merges = Resource::Remote(RemoteResource::new(
    "https://huggingface.co/gpt2/resolve/main/merges.txt",
    "gpt2/merges",
));

let custom_gpt2_config = Resource::Remote(RemoteResource::new(
    "https://huggingface.co/gpt2/resolve/main/config.json",
    "gpt2/config",
));

It works fine.

  2. I created my model by fine-tuning instead of training from scratch (I suppose <|endoftext|> is included in that case?)
    (it's uploaded at https://huggingface.co/remotejob/tweetsGPT2fi_v1)

I substituted the original model with my own model, but now I get an error:

Error: Tch tensor error: Internal torch error: [enforce fail at inline_container.cc:110] . file in archive is not in a subdirectory: transformer.wte.weight.npy
What is wrong?

PS: the Python version works, by the way!

@guillaume-be
Owner

Hello @remotejob,

The vocabulary is now fine. I tried loading the weights and could reproduce the error. I realized the latest version of the utilities script was performing an optimization of the model size that is not compatible with the GPT2 implementation. I just pushed some changes to master (#160) that allow leveraging this optimization or turning it off.

Could you please pull the latest version of master and try converting the Rust weights again? You have 2 options to do so:

  1. python path/to/convert_model.py pytorch_model.bin will result in a set of weights that is compatible with the current version of the library published on crates.io
  2. python path/to/convert_model.py pytorch_model.bin --skip_embeddings will result in a smaller model size/memory footprint. These weights will be compatible with the current version of master, but you would have to wait for the next release if you want to use binaries published on crates.io

I ran the conversion on my end and the model is running:

"Kuka sei oo kuullu väärinymmärrystä? Onko kyseessä täm"
"Kuka sei vaan ole mitään tekemistä niin mikä tässä on ongelma? Ei"

@remotejob
Author

remotejob commented Jun 12, 2021

Unfortunately, python path/to/convert_model.py pytorch_model.bin fails with:
Segmentation fault (core dumped)
My Python environment:

autopep8 1.4.4
certifi 2021.5.30
chardet 4.0.0
click 8.0.1
filelock 3.0.12
huggingface-hub 0.0.8
idna 2.10
importlib-metadata 4.5.0
joblib 1.0.1
numpy 1.20.3
packaging 20.9
pip 21.1.2
pycodestyle 2.7.0
pyparsing 2.4.7
regex 2021.4.4
requests 2.25.1
sacremoses 0.0.45
setuptools 52.0.0.post20210125
six 1.16.0
tokenizers 0.10.3
torch 1.8.1
tqdm 4.61.0
transformers 4.7.0.dev0
typing-extensions 3.10.0.0
urllib3 1.26.5
wheel 0.36.2
zipp 3.4.1


Python 3.7.0

@remotejob
Author

My solution:

  1. Downgrade the Python environment (transformers==4.6.1 instead of transformers==4.7.0.dev0)

Also Python 3.6.8 instead of Python 3.7.0 (not sure whether that matters):

certifi 2021.5.30
chardet 4.0.0
click 8.0.1
dataclasses 0.8
filelock 3.0.12
huggingface-hub 0.0.8
idna 2.10
importlib-metadata 4.5.0
joblib 1.0.1
numpy 1.19.5
packaging 20.9
pip 21.1.2
pyparsing 2.4.7
regex 2021.4.4
requests 2.25.1
sacremoses 0.0.45
setuptools 52.0.0.post20210125
six 1.16.0
tokenizers 0.10.3
torch 1.8.1
tqdm 4.61.1
transformers 4.6.1
typing-extensions 3.10.0.0
urllib3 1.26.5
wheel 0.36.2
zipp 3.4.1

In my case I also had to:
unset LIBTORCH
unset LD_LIBRARY_PATH

After that, python path/to/convert_model.py pytorch_model.bin creates a correct model.

Great!!!

@guillaume-be
Owner

Awesome! I was trying to reproduce the issue, and I had no problem with the following environment:
Python 3.7.1 in a fresh environment:

Package            Version    
------------------ ---------- 
autopep8           1.4.4      
certifi            2021.5.30  
chardet            4.0.0      
click              8.0.1      
colorama           0.4.4      
filelock           3.0.12     
huggingface-hub    0.0.8      
idna               2.10       
importlib-metadata 4.5.0      
joblib             1.0.1      
numpy              1.20.3     
packaging          20.9       
pip                21.1.2     
pycodestyle        2.7.0      
pyparsing          2.4.7      
regex              2021.4.4   
requests           2.25.1     
sacremoses         0.0.45     
setuptools         39.0.1     
six                1.16.0     
tokenizers         0.10.3     
torch              1.8.1      
tqdm               4.61.0     
transformers       4.7.0.dev0  (current master version)
typing-extensions  3.10.0.0   
urllib3            1.26.5     
wheel              0.36.2     
zipp               3.4.1      

Glad it works now!

@remotejob
Author

This is my first approach to Rust.
The goal is to increase performance; we use GPT2 in production.
For now, I don't notice any particular improvement (only a feeling after some tests).
Question:
Will there be a speed improvement? I think there must be.

As I said, I am new to Rust.
I will wait for a new release to use --skip_embeddings.

I would appreciate some hints on making the program faster.

Thanks.

@guillaume-be
Owner

Hello @remotejob,

The speed-up that can be expected from the Rust implementation depends on the use case. For text generation using GPT2, I did notice a significant performance improvement (especially when top-k/p sampling is enabled). I have published some benchmarks at https://guillaume-be.github.io/2020-11-21/generation_benchmarks, where I observed around a 2x speed improvement of the Rust version compared to Python for GPT2 generation.

A few questions/comments that may be useful:

  • Are you compiling and running your program with all optimizations turned on? You'd need to pass the --release flag when building or running it (see https://doc.rust-lang.org/book/ch14-01-release-profiles.html)
  • Do you have a GPU available? In the absence of a GPU, the bottleneck of your application will be the forward pass through the model. This library relies on Libtorch and shares the same backend as the Python API to Torch, so in that case no benefit from the Rust implementation will be observed.
  • If you have a GPU, do you use the correct version of Libtorch? Have you installed and pointed to a GPU-enabled version of Libtorch? If you installed automatically (unset LIBTORCH; unset LD_LIBRARY_PATH), the CPU version is downloaded by default. You would have to set these environment variables back to a valid location, or set the environment variable TORCH_CUDA_VERSION=cu111 to automatically download a GPU-enabled version.
  • If you have a GPU, is it used? You could monitor your GPU usage or insert println!("Using CUDA: {:?}", Cuda::is_available()). If it returns false while you'd expect CUDA to be available, you can try to insert
unsafe {
    torch_sys::dummy_cuda_dependency();
}

in your code and add torch-sys to your dependencies (see the combined sketch after the snippet below):

[dependencies]
torch-sys = "0.4.1"
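
Putting the two snippets above together, a minimal sketch of the check (not from the original reply); it assumes both tch and torch-sys are listed as dependencies and match the Libtorch version used by rust-bert:

use tch::Cuda;

fn main() {
    // Force the linker to keep the CUDA parts of Libtorch; without this call a
    // GPU build can silently fall back to the CPU kernels.
    unsafe {
        torch_sys::dummy_cuda_dependency();
    }
    // Confirm that Libtorch actually sees the GPU before building any pipeline.
    println!("Using CUDA: {:?}", Cuda::is_available());
}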

@remotejob
Author

  1. I can't use a GPU in production
  2. The --release flag doesn't help. I think the bottleneck is the model size.

Question:
Is it possible to substitute GPT2 with DistilGPT2?

let generate_config = TextGenerationConfig {
    model_resource: custom_gpt2_model,
    vocab_resource: custom_gpt2_vocab,
    merges_resource: custom_gpt2_merges,
    config_resource: custom_gpt2_config,
    model_type: ModelType::DISTILGPT2,
    max_length: 100,
    do_sample: true,
    num_beams: 5,
    temperature: 1.1,
    num_return_sequences: 30,
    ..Default::default()
};

Probably it is not yet supported?
I will create a small DistilGPT2 model and upload it to huggingface.co anyway.

@guillaume-be
Owner

Hello @remotejob,

DistilGPT2 is supported and uses the same architecture / ModelType as GPT2: ModelType::GPT2
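
For illustration, a minimal sketch (not part of the original reply) of the configuration with ModelType::GPT2 pointing at a DistilGPT2 checkpoint; it assumes the checkpoint exposes the same four files as the earlier examples (rust_model.ot, vocab.json, merges.txt, config.json), using the https://huggingface.co/distilgpt2 repository as an example:

use rust_bert::pipelines::common::ModelType;
use rust_bert::pipelines::text_generation::{TextGenerationConfig, TextGenerationModel};
use rust_bert::resources::{RemoteResource, Resource};

fn main() -> anyhow::Result<()> {
    // Same resource pattern as the earlier GPT2 example, pointing at a
    // DistilGPT2 checkpoint (assumed to host a rust_model.ot).
    let distilgpt2_model = Resource::Remote(RemoteResource::new(
        "https://huggingface.co/distilgpt2/resolve/main/rust_model.ot",
        "distilgpt2/model",
    ));
    let distilgpt2_vocab = Resource::Remote(RemoteResource::new(
        "https://huggingface.co/distilgpt2/resolve/main/vocab.json",
        "distilgpt2/vocab",
    ));
    let distilgpt2_merges = Resource::Remote(RemoteResource::new(
        "https://huggingface.co/distilgpt2/resolve/main/merges.txt",
        "distilgpt2/merges",
    ));
    let distilgpt2_config = Resource::Remote(RemoteResource::new(
        "https://huggingface.co/distilgpt2/resolve/main/config.json",
        "distilgpt2/config",
    ));

    let generate_config = TextGenerationConfig {
        model_resource: distilgpt2_model,
        vocab_resource: distilgpt2_vocab,
        merges_resource: distilgpt2_merges,
        config_resource: distilgpt2_config,
        // DistilGPT2 shares the GPT2 architecture, so the GPT2 model type is used.
        model_type: ModelType::GPT2,
        max_length: 64,
        do_sample: true,
        num_return_sequences: 3,
        ..Default::default()
    };
    let model = TextGenerationModel::new(generate_config)?;

    let output = model.generate(&["Kuka sei"], None);
    for sentence in output {
        println!("{:?}", sentence);
    }
    Ok(())
}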

@remotejob
Author

remotejob commented Jun 15, 2021

Yes, I tested it on https://huggingface.co/remotejob/tweetsDISTILGPT2fi_v1. It works a little better; size matters. But pytorch_model.bin is smaller than rust_model.ot. Unfortunately, in my case it looks like the Python version still performs better, and I don't see a particular improvement. I am waiting for a new release; probably the --skip_embeddings flag will help.

Maybe the parameters matter. Python version:

beam_output = model.generate(
    input_ids,
    max_length=100,
    top_k=50,
    top_p=0.95,
    no_repeat_ngram_size=2,
    num_return_sequences=225,  # 200
    early_stopping=True,
    do_sample=True,
)

Is it possible to set the same parameters in the Rust version?

OK, it's working, but there is no improvement in speed :(((

@guillaume-be
Owner

Hello,

The generation parameters can be changed in the TextGenerationConfig. The defaults differ between the Python implementation and Rust (Rust uses 5 beams by default), but they can be changed. If you want the same settings as the example you provided, you could use the configuration below:

let generate_config = TextGenerationConfig {
    model_resource: custom_gpt2_model,
    vocab_resource: custom_gpt2_vocab,
    merges_resource: custom_gpt2_merges,
    config_resource: custom_gpt2_config,
    model_type: ModelType::GPT2,
    max_length: 100,
    top_k: 50,
    top_p: 0.95,
    do_sample: true,
    num_beams: 1,
    no_repeat_ngram_size: 2,
    num_return_sequences: 225,
    ..Default::default()
};
let model = TextGenerationModel::new(generate_config)?;

Note that setting num_return_sequences to 225 is very large and could explain why the computation is so expensive. I would recommend lowering this value to 5-10 at most, which should help the computation speed.

The next release will slightly reduce the model size and memory footprint, but will have no impact on the computation speed.
