Fail to train openai-community / gpt2 model for custom NER on SpaCy framework #13334
Labels: feat/llm (Feature: LLMs, incl. spacy-llm)
I want to train a custom NER model using GPT-2 (openai-community/gpt2). I am familiar with the spaCy custom training framework, and I formatted the config.cfg file as required. The config.cfg file is as follows:
I received an error "ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as pad_token (tokenizer.pad_token = tokenizer.eos_token e.g.) or add a new pad token via tokenizer.add_special_tokens({'pad_token': '[PAD]'})."
I posted this issue to the OpenAI forum and learned from @younesbelkada that I need to set tokenizer.pad_token = tokenizer.eos_token before launching the training.
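To make the suggested fix concrete, here is a minimal, library-free sketch of why batching needs a padding token and why reusing the end-of-sequence token works. The `pad_batch` helper and the example token ids are hypothetical and only for illustration; this is not the transformers implementation. The only real value is 50256, the id of `<|endoftext|>` in the GPT-2 vocabulary.

```python
from typing import List, Optional

# Id of the <|endoftext|> (EOS) token in the GPT-2 vocabulary.
GPT2_EOS_ID = 50256


def pad_batch(batch: List[List[int]], pad_id: Optional[int]) -> List[List[int]]:
    """Hypothetical helper: right-pad every sequence to the longest one."""
    if pad_id is None:
        # Mirrors the situation in the error above: no pad token is defined.
        raise ValueError("Asking to pad but no padding token is set.")
    width = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (width - len(seq)) for seq in batch]


# Two sequences of different length (arbitrary illustrative token ids).
batch = [[15496, 995], [40, 716, 257, 1332]]

# Without a pad token, padding fails.
try:
    pad_batch(batch, pad_id=None)
except ValueError as err:
    failure = str(err)

# Reusing the EOS id as the pad id lets the batch be made rectangular.
padded = pad_batch(batch, pad_id=GPT2_EOS_ID)
print(padded)
```

In transformers, the assignment tokenizer.pad_token = tokenizer.eos_token is made in Python on the tokenizer object itself, which is why it cannot simply be pasted into a config file as a line of code.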
I then modified the config.cfg file again. The modified portion is as follows:
Now I receive the error "✘ Config validation error
nlp -> tokenizer.pad_token extra fields not permitted
{'lang': 'en', 'pipeline': ['transformer', 'ner'], 'batch_size': 128, 'disabled': [], 'before_creation': None, 'after_creation': None, 'after_pipeline_creation': None, 'tokenizer': {'@Tokenizers': 'spacy.Tokenizer.v1'}, 'vectors': {'@vectors': 'spacy.Vectors.v1'}, 'tokenizer.pad_token': 'tokenizer.eos_token'}"
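The validation error suggests that the added line was read as an ordinary key in the [nlp] block rather than executed as Python. As a rough illustration using only Python's standard configparser (which spaCy's config system builds on), the sketch below parses a hypothetical fragment and compares its keys with the field names printed in the error message. The fragment and the `ALLOWED_NLP_KEYS` set are assumptions reconstructed from that message, not the actual config file or spaCy's schema code.

```python
from configparser import ConfigParser

# Hypothetical fragment: assumes the assignment was pasted into [nlp].
CFG_FRAGMENT = """
[nlp]
lang = en
pipeline = ["transformer","ner"]
batch_size = 128
tokenizer.pad_token = tokenizer.eos_token
"""

# Field names copied from the dict printed in the error message
# (assumption: these are the only keys the [nlp] block accepts).
ALLOWED_NLP_KEYS = {
    "lang", "pipeline", "batch_size", "disabled", "before_creation",
    "after_creation", "after_pipeline_creation", "tokenizer", "vectors",
}

parser = ConfigParser()
parser.read_string(CFG_FRAGMENT)
nlp_section = dict(parser["nlp"])

# The line is stored as a plain string key with a plain string value;
# nothing is executed, so no tokenizer attribute is ever set.
print(repr(nlp_section["tokenizer.pad_token"]))

# Any key outside the allowed set is what a strict schema would reject.
extra_fields = set(nlp_section) - ALLOWED_NLP_KEYS
print(extra_fields)
```

Under these assumptions, the only extra field is the literal key tokenizer.pad_token, which matches the "extra fields not permitted" message above.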
Could you please let me know how I can fix this issue and train the GPT-2 model?
Thank you in advance.