add context_length to model configs that use HF models (and as param to HFTokenizer) #222

Closed
iejMac opened this issue Nov 13, 2022 · 4 comments

Comments

@iejMac
Contributor

iejMac commented Nov 13, 2022

No description provided.

@iejMac iejMac mentioned this issue Nov 13, 2022
@usuyama

usuyama commented Nov 16, 2022

How about something like this?

# assuming this would live next to get_model_config in open_clip/factory.py;
# HFTokenizer and tokenize come from open_clip.tokenizer
def get_tokenizer(model_name):
    config = get_model_config(model_name)
    text_cfg = config['text_cfg']
    # fall back to CLIP's default context length of 77 when the config omits it
    context_length = text_cfg.get('context_length', 77)
    if 'hf_tokenizer_name' in text_cfg:
        # HF-backed text tower: wrap the matching HuggingFace tokenizer
        tokenizer = HFTokenizer(text_cfg['hf_tokenizer_name'], context_length)
    else:
        # default CLIP BPE tokenizer
        tokenizer = lambda texts: tokenize(texts, context_length)
    return tokenizer
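
For illustration, calling the returned tokenizer might look like this (the model name below is only an example of a config that sets hf_tokenizer_name, not something specified in this thread):

# hypothetical usage; 'roberta-ViT-B-32' is just an example config name
tokenizer = get_tokenizer('roberta-ViT-B-32')   # HF-backed branch
tokens = tokenizer(["a photo of a cat", "a photo of a dog"])
# tokens is a LongTensor of shape [2, context_length], same as the default tokenize() path
print(tokens.shape)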

@rom1504
Collaborator

rom1504 commented Nov 20, 2022

Sure, why not.

@iejMac
Contributor Author

iejMac commented Nov 20, 2022

btw, I'm not sure if this is urgent, so I've been putting it off. If people need this soon, I can push it up the priority queue.

@rom1504
Collaborator

rom1504 commented Nov 21, 2022

We noticed that some models that use relative positional embeddings (e.g. mT5) do not have a fixed context length.

That's an edge case to take into account here.
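
One way that edge case could be covered (a sketch only, assuming a convention where context_length=None means "no fixed limit", which the thread does not settle): the HF wrapper would skip truncation and fixed-length padding when no context length is configured.

from transformers import AutoTokenizer

class HFTokenizer:
    """Sketch of an HF tokenizer wrapper, not the actual open_clip class."""
    def __init__(self, tokenizer_name, context_length=77):
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
        # None would signal models like mT5 (relative positional embeddings, no hard limit)
        self.context_length = context_length

    def __call__(self, texts):
        if isinstance(texts, str):
            texts = [texts]
        if self.context_length is None:
            # no fixed limit: pad only to the longest text in the batch
            return self.tokenizer(texts, return_tensors='pt', padding=True).input_ids
        return self.tokenizer(
            texts,
            return_tensors='pt',
            max_length=self.context_length,
            padding='max_length',
            truncation=True,
        ).input_ids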
