
LayoutLM Token Classification not learning #8524

Closed
AntPeixe opened this issue Nov 13, 2020 · 4 comments
@AntPeixe

Environment info

  • transformers version: 3.4.0
  • Platform: in docker based on image: nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
  • Python version: 3.7.9
  • PyTorch version (GPU?): 1.5.1+cu92 (True)
  • Tensorflow version (GPU?): 2.2.0-rc0 (True)
  • Using GPU in script?: True
  • Using distributed or parallel set-up in script?:

Information

Model I am using (Bert, XLNet ...): LayoutLMForTokenClassification

The problem arises when using: my own scripts

The task I am working on is: my own task (NER). I've reproduced the implementation of the Dataset, compute_metrics, and other helper functions as in the original microsoft/layoutlm repo.
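For reference, a minimal sketch of the usual token-classification metric recipe (argmax over the logits, skipping positions labeled -100); the exact function name, shapes, and metric here are illustrative, not necessarily what my code does:

```python
IGNORE_INDEX = -100  # label id typically assigned to padding / non-first subword tokens

def compute_metrics(logits, label_ids):
    """Token-level accuracy, ignoring IGNORE_INDEX positions.

    logits:    nested lists shaped (batch, seq_len, num_labels)
    label_ids: nested lists shaped (batch, seq_len)
    """
    correct = total = 0
    for sent_logits, sent_labels in zip(logits, label_ids):
        for token_logits, label in zip(sent_logits, sent_labels):
            if label == IGNORE_INDEX:
                continue  # padding or subword continuation: not scored
            pred = max(range(len(token_logits)), key=token_logits.__getitem__)
            correct += int(pred == label)
            total += 1
    return {"accuracy": correct / total if total else 0.0}
```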

When initially trying the original repo and its training script, the model managed to learn and gave reasonable results after very few epochs. After reimplementing with HuggingFace, the model doesn't learn at all, even after a much higher number of epochs.

To reproduce

Model loading and trainer configuration:

config = LayoutLMConfig.from_pretrained(
    <path_layoutlm_base_uncased>,
    num_labels=<num_labels>,
    cache_dir=None
)
model = LayoutLMForTokenClassification.from_pretrained(
    <path_layoutlm_base_uncased>,
    from_tf=bool(".ckpt" in <path_layoutlm_base_uncased>),
    config=config,
    cache_dir=None,
)

device = torch.device("cuda")
model.train().to(device)

training_args = TrainingArguments(
    output_dir=<pytorch_model_dir>,  # output directory
    do_train=True,
    do_eval=True,
    do_predict=False,
    evaluation_strategy=EvaluationStrategy.EPOCH,
    num_train_epochs=<epochs>,  # total # of training epochs
    per_device_train_batch_size=<batch_size>,  # batch size per device during training
    per_device_eval_batch_size=<batch_size>,  # batch size for evaluation
    weight_decay=<weight_decay>,  # strength of weight decay
    learning_rate=<learning_rate>,
    adam_epsilon=<adam_epsilon>,
    logging_dir=<profile_logs>,  # Tensorboard log directory
    logging_steps=0,  # it logs when running evaluation so no need to log on step interval
    save_steps=0,
    seed=seed,
    overwrite_output_dir=True,
    disable_tqdm=False,
    load_best_model_at_end=True,
    save_total_limit=10,
    fp16=True,
)

trainer = MetaMazeTrainer(
    model=model,  # the instantiated 🤗 Transformers model to be trained
    args=training_args,  # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=test_dataset,  # evaluation dataset
    compute_metrics=compute_metrics,
)

Expected behavior

Similar results to the original repo, given that the trainer receives the same parameters and the Dataset is identical after the data is processed.

Is this due to the ongoing integration of this model? Is the setup wrong?

@aleksandra-sp

Is there any update on this issue?

@NielsRogge
Contributor

NielsRogge commented Jan 4, 2021

Hi there!

I have been investigating the model by writing integration tests, and it turns out it outputs the same tensors as the original repository on the same input data, so there are no issues. I tested this both for the base model (LayoutLMModel) and for the models with heads on top (LayoutLMForTokenClassification and LayoutLMForSequenceClassification).

However, the model is poorly documented in my opinion; I first had to look at the original repository to understand everything. I made a demo notebook that showcases how to fine-tune HuggingFace's LayoutLMForTokenClassification on the FUNSD dataset (a sequence labeling task): https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb
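One preprocessing step that distinguishes LayoutLM from vanilla BERT, and which the original repo handles, is scaling each token's bounding box into the 0-1000 coordinate grid the model expects. A minimal sketch, where `normalize_box` is an illustrative helper name:

```python
def normalize_box(box, width, height):
    """Scale an absolute (x0, y0, x1, y1) pixel box to LayoutLM's 0-1000 grid.

    width/height are the page dimensions in the same pixel units as the box.
    """
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]
```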

Let me know if this helps you!

@hasansalimkanmaz
Contributor

I have experienced the same issue. I realized that the model files hosted here are different from the weights in the original repo. I was using weights from the original repo and the model couldn't load them at the start of training, so I was starting from a randomly initialized model instead of a pre-trained one. That's why it wasn't learning much on a downstream task.

I solved the issue by using the model files from HuggingFace.
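As a sanity check, `from_pretrained` logs a warning listing weights that were newly initialized when checkpoint keys don't match the model. Independent of transformers, the underlying name-level comparison can be sketched like this (`diff_checkpoint_keys` is a hypothetical helper, shown on plain key lists):

```python
def diff_checkpoint_keys(model_keys, checkpoint_keys):
    """Report parameter names a checkpoint is missing or adds.

    A large 'missing' set means those weights stay randomly initialized,
    which silently turns fine-tuning into training from scratch.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }
```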

@github-actions

github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.
