Model not training beyond 1st epoch #10146

Closed
neel04 opened this issue Feb 11, 2021 · 16 comments

@neel04

neel04 commented Feb 11, 2021

Environment info

  • transformers version: 4.4.0.dev0
  • Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.7.0+cu101 (True)
  • Tensorflow version (GPU?): 2.4.1 (True)
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No (Single GPU) --> COLAB


Information

Model I am using (Bert, XLNet ...): RoBERTa

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

First off, this issue is essentially a continuation of #10055, but since that error was mostly resolved, I have opened a new issue. I am using a private dataset, so I am not at liberty to share it. However, I can give an idea of what the CSV looks like:


,ID,Text,Label
......................
Id_1, "Lorem Ipsum", 14

This is the code:-


!git clone https://github.com/huggingface/transformers.git
%cd transformers  # a plain `!cd` does not persist in Colab, so use `%cd` before the editable install
!pip install -e .

train_text = list(train['Text'].values)
train_label = list(train['Label'].values)

val_text = list(val['Text'].values)
val_label = list(val['Label'].values)

from transformers import RobertaTokenizer, TFRobertaForSequenceClassification
import tensorflow as tf

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base')

train_encodings = tokenizer(train_text, truncation=True, padding=True)
val_encodings = tokenizer(val_text, truncation=True, padding=True)

train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_label
))
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_label
))

#----------------------------------------------------------------------------------------------------------------------
# Since the Trainer does not work, I will use the native one (see the second block below)
from transformers import TFTrainingArguments, TFTrainer

training_args = TFTrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

with training_args.strategy.scope():
    model = TFRobertaForSequenceClassification.from_pretrained("roberta-base")

trainer = TFTrainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=val_dataset             # evaluation dataset
)

trainer.train()
#----------------------------------------------------------------------------------------------------------------------
# Using native TensorFlow

from transformers import TFRobertaForSequenceClassification
import tensorflow as tf

model = TFRobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=1)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-18)

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy']) # can also use any keras loss fn
model.fit(train_dataset.batch(8), validation_data = val_dataset.batch(64), epochs=15, batch_size=8)

The Problems:

  • Cannot train using the Trainer() method. The cell executes successfully, but it does nothing - training never starts. This is not a major issue on its own, but it may be a factor in this problem.
  • The model does not train beyond the 1st epoch. I have shared the log below, where you can clearly see that the model does not learn past the 1st epoch; the subsequent epochs just repeat what the first one accomplished:
All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Epoch 1/5
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:AutoGraph could not transform <bound method Socket.send of <zmq.sugar.socket.Socket object at 0x7f5b14f1b6c8>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: <cyfunction Socket.send at 0x7f5b323fb2a0> is not a module, class, method, function, traceback, frame, or code object
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <bound method Socket.send of <zmq.sugar.socket.Socket object at 0x7f5b14f1b6c8>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: <cyfunction Socket.send at 0x7f5b323fb2a0> is not a module, class, method, function, traceback, frame, or code object
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert

WARNING:tensorflow:AutoGraph could not transform <function wrap at 0x7f5b301d3c80> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform <function wrap at 0x7f5b301d3c80> and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
180/180 [==============================] - ETA: 0s - loss: 0.0000e+00 - accuracy: 0.0022WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
180/180 [==============================] - 150s 589ms/step - loss: 0.0000e+00 - accuracy: 0.0022 - val_loss: 0.0000e+00 - val_accuracy: 0.0077
Epoch 2/5
180/180 [==============================] - 105s 582ms/step - loss: 0.0000e+00 - accuracy: 0.0022 - val_loss: 0.0000e+00 - val_accuracy: 0.0077
Epoch 3/5
180/180 [==============================] - 105s 582ms/step - loss: 0.0000e+00 - accuracy: 0.0022 - val_loss: 0.0000e+00 - val_accuracy: 0.0077

I think the problem may be that the activation function is wrong. For CategoricalCrossentropy we need a sigmoid/softmax on the outputs, but maybe the activation used in my code is not that.

Can anyone tell me how exactly to change the activation function, or share other thoughts on the potential problem? I have tried changing the learning rate with no effect.

@NielsRogge
Contributor

Could you please post this on the forum, rather than here? The authors of HuggingFace like to keep this place for bugs or feature requests, and they're more than happy to help you on the forum.

Looking at your code, this seems more like an issue with preparing the data correctly for the model.

Take a look at this example in the docs on how to perform text classification with the Trainer.

@neel04
Author

neel04 commented Feb 11, 2021

@NielsRogge Not very pleased with your reply, please ask someone a question if you are unclear about something rather than trying to just close an issue.

As regards the data, I can assure you it is in the format specified by your guide - it is NumPy arrays converted to lists and then made into a tf.data.Dataset object, and it has all the correct parts. The conversion to lists was made because an error clearly specified that lists should be passed.

This is a bug because the model does appear to be training, just with extremely low accuracy (which may be because of the activation function, but I am not sure), and it won't train any further than the 1st epoch - subsequent epochs don't pick up where the previous one left off.

@NielsRogge
Contributor

NielsRogge commented Feb 11, 2021

I've created a Google Colab that will hopefully resolve your issue:

https://colab.research.google.com/drive/1azTvNc0AZeN5JMyzPnOGic53jddIS-QK?usp=sharing

What I did was create some dummy data based on the format of your data, and then check whether the model is able to overfit it (one of the most common first steps when debugging a neural network). As you can see in the notebook, it appears to do so, so everything seems to be working fine. Let me know if this helps.

UPDATE: looking at your code, it appears that the learning rate is way too low in your case. A typical value for Transformers is 5e-5.
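
For reference, a minimal sketch of that suggested fix: only the optimizer line from the earlier snippet would change, using the 5e-5 value mentioned here; everything else stays as in the original code.

import tensorflow as tf

# Replace the effectively-zero learning rate (1e-18) with a typical
# fine-tuning value for Transformer models
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)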

@neel04
Author

neel04 commented Feb 11, 2021

@NielsRogge Thanx a lot for the advice, I will surely update you regarding any solution.

I have been trying to apply this to my own code, but I am still reproducing the bug - the warnings are there (unlike in yours), and I am using the latest version of transformers. The problem is that it doesn't learn - whatever progress it makes in the 1st epoch is simply repeated in the rest of them. As an example, using this dummy dataset:

train_text = ['a', 'b']
train_label = [0,1]
val_text = ['b']
val_label = [1]

even after 35 epochs, the model does not overfit; the same accuracy/loss is maintained irrespective of the loss function.


from transformers import TFRobertaForSequenceClassification
import tensorflow as tf

model = TFRobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=1)

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

model.compile(optimizer=optimizer, loss=loss_fn, metrics=['accuracy']) # can also use any keras loss fn
model.fit(train_dataset.batch(16), validation_data = val_dataset.batch(64), epochs=5, batch_size=1)

UPDATE: You might have missed this line, @NielsRogge, about using the Keras loss function rather than the default one:
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
Can you try to reproduce the issue with that?

@sgugger
Collaborator

sgugger commented Feb 11, 2021

Not very pleased with your reply, please ask someone a question if you are unclear about something rather than trying to just close an issue.

I want to jump in here and let you know that this kind of behavior is inappropriate. @NielsRogge is doing his best to help you here, and he is doing this in his own free time. "My model is not training" is very vague and doesn't seem like a bug, so suggesting to take this to the forums is very appropriate: more people will be able to help you there.

Please respect that this is an open-source project. No one has to help you solve your bug, so staying open-minded and kind will go a long way toward getting the help you need.

@neel04
Author

neel04 commented Feb 11, 2021

@sgugger With all due respect, my model was training; it just lost all the progress it had made in one epoch before the next one, starting and ending with the exact same numbers. And this is very much a bug.

And about the open-source project: I do understand that this is voluntary, but if someday you need help and someone tells you, without reading your question, that whatever you have done is wrong (without any proof) and suggests you ask your question somewhere else that I know for a fact is not that active, I would like to see your response.

We have many projects that are not backed by a company - look at TPOT, for instance. Its maintainer (weixuanfu) does this mostly as a hobby and for learning, but if there is something he does not know, he wouldn't say "ask your question somewhere else" and not fully try to solve the problem.

If you don't want to spend time solving my problem, that's fine; I have no issue with that. But if you do not want to solve my problem and just want to close down the list of issues, then it feels pretty bad. I do know that I don't understand ML very deeply, and certainly not enough to make a project of my own, but I do know the difference between someone actually trying to help me and someone just trying to reduce the number of open GitHub issues.

@NielsRogge
Contributor

NielsRogge commented Feb 12, 2021

I do think there's a bit of a misunderstanding with what we mean by a bug.

Of course, since your model isn't training properly, there's a bug in your code. But in this case, it's a bug probably caused by the user (these bugs include setting hyperparameters like learning rate too low, not setting your model in training mode, improper use of the Trainer, etc.). These things are bugs, but they are caused by the user. And for such cases, the forum is the ideal place to seek help.

Github issues are mostly for bugs caused by the Transformers library itself, i.e. caused by the authors (these bugs include implementations of models which are incorrect, a bug in the implementation of the Trainer, etc.).

So the issue you're posting here is a perfect use case for the forum! It's not that we want to close issues as soon as possible, and it's also not the case that we don't want to help you. It's just a difference between bugs due to the user/bugs due to the library itself, and there are 2 different places for this.

@jplu
Contributor

jplu commented Feb 12, 2021

What @NielsRogge said is correct: your way of training your model is not correct (and your data might also be malformed). As far as I can see, if your data really looks like:

ID,Text,Label
......................
Id_1, "Lorem Ipsum", 14

I would guess that if you have label ids going up to at least 14, you certainly have more than one label, and then the line
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=1) is wrong - the 1 should be replaced by the proper number of labels.

Nevertheless, if you really have only one label, your loss must be tf.keras.losses.MeanSquaredError and not tf.keras.losses.CategoricalCrossentropy. But if you have more than one label, your loss must be tf.keras.losses.SparseCategoricalCrossentropy.

So as far as I can say, I second what has been said before and this post should be on the forum, not here.
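
For reference, a minimal sketch of what that advice could look like applied to the earlier code, assuming (hypothetically) that the labels are integer class ids going up to 14, i.e. 15 classes, and that train_dataset/val_dataset are the tf.data datasets built above:

import tensorflow as tf
from transformers import TFRobertaForSequenceClassification

NUM_LABELS = 15  # hypothetical: set this to the actual number of distinct classes in the data

model = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=NUM_LABELS)

# Integer class ids + raw logits -> sparse categorical cross-entropy
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)

model.compile(optimizer=optimizer, loss=loss_fn, metrics=["accuracy"])
model.fit(train_dataset.batch(16), validation_data=val_dataset.batch(64), epochs=3)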

@neel04
Author

neel04 commented Feb 12, 2021

@jplu Hmm.. I had thought that num_labels was the number of labels to be predicted by the model (as in multi-label classification). As for the data, I am importing it into NumPy arrays after preprocessing, so I don't see why the structure of the data frame would be a problem.

@NielsRogge You may be right that the bug is a hyperparameter issue (I tried all sorts of learning rates and it didn't work), but the reason I think it is a bug in transformers is that if the loss starts at 100 and ends at 70 in the 1st epoch, it is the exact same story in the rest of the epochs (they start and end with the same numbers):

.................
accuracy: 0.0025 - val_loss: 87.4479 - val_accuracy: 0.0077
accuracy: 0.0047 - val_loss: 87.4479 - val_accuracy: 0.0077
accuracy: 0.0049 - val_loss: 87.4479 - val_accuracy: 0.0077
accuracy: 0.0043 - val_loss: 87.4479 - val_accuracy: 0.0077
accuracy: 0.0052 - val_loss: 87.4479 - val_accuracy: 0.0077
.................

Another reason was that trying to train the model using Trainer() did not work - the cell executes successfully but does not start training, nor does it report an error. Can you tell me whether this is a bug or not? I had put it in the list above, and this is the output of the cell (just normal warnings, but training never starts):

All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

UPDATE: After quite some fixing, the model is now training and seems to be learning (I am still confused about what exactly num_labels is supposed to mean - the total number of labels present in the data OR the number of labels the model has to predict [multi-label classification]). Anyways, It still doesn't train with Trainer() which means I can't do Hyperparameter tuning :(

@sgugger
Collaborator

sgugger commented Feb 12, 2021

Anyways, It still doesn't train with Trainer() which means I can't do Hyperparameter tuning :(

As mentioned before TFTrainer does not have hyper-parameter tuning. You should try the Keras one.

@neel04
Author

neel04 commented Feb 13, 2021

@sgugger I don't get what you mean - should I use the PyTorch trainer? I can't find any trainer for Keras in the docs, only for native TensorFlow. In the example here they just use Trainer. Is there any way to do hyperparameter tuning with Keras/TF only, and not use PyTorch?

@sgugger
Collaborator

sgugger commented Feb 14, 2021

This example is using PyTorch, not TensorFlow. There is no hyper-parameter tuning implemented in Transformers in TensorFlow, which is why I was recommending Keras Tuner.
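
As a rough illustration of that recommendation, a sketch of hyper-parameter search with Keras Tuner (imported as keras_tuner in recent releases, kerastuner in older ones); the search space, trial count, and num_labels=15 are made up for this example, and train_dataset/val_dataset are the tf.data datasets from earlier:

import keras_tuner as kt
import tensorflow as tf
from transformers import TFRobertaForSequenceClassification

def build_model(hp):
    # Rebuild the model for each trial so every run starts from the pretrained checkpoint
    model = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=15)
    lr = hp.Choice("learning_rate", values=[1e-5, 3e-5, 5e-5])  # hypothetical search space
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=3)
tuner.search(train_dataset.batch(16), validation_data=val_dataset.batch(64), epochs=2)
best_model = tuner.get_best_models(num_models=1)[0]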

@neel04
Author

neel04 commented Feb 14, 2021

Alright. Thanx a ton!

@neel04 neel04 closed this as completed Feb 14, 2021
@liaocs2008

Anyways, It still doesn't train with Trainer() which means I can't do Hyperparameter tuning :(

As mentioned before TFTrainer does not have hyper-parameter tuning. You should try the Keras one.

Do you plan to add this support for TFTrainer?

@LysandreJik
Member

@liaocs2008 the TFTrainer is now deprecated in favor of Keras, which is now the default in all of our examples.

@mrinalTheCoder

After quite some fixing, the model is now training and seems to be learning

@neel04 I am facing the same issue, the model seems to be resetting after each epoch. Could you please share what fixes you implemented?
