Same data gives very different accuracy when evaluated during training vs. standalone after loading the fine-tuned model #37265

Hi team,
I have been stuck on this problem for a whole week and still cannot figure out the cause.
Env: Python 3.8, transformers 4.28
I am fine-tuning XLM-RoBERTa Base for multi-class classification.
During training, trainer.evaluate() reports an accuracy of 68%. When I evaluate the same data standalone, in a separate script that reads the base model, makes the predictions, and evaluates them, the accuracy drops to 30%. Is there a reason this happens, or is it a bug?
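
A minimal sketch of a standalone evaluation that should be comparable to trainer.evaluate(). It is an illustration under assumptions, not the code from the run described above: `checkpoint_dir`, `max_length`, and the label names are hypothetical, and the checkpoint directory is assumed to contain both the fine-tuned weights and the tokenizer files. It highlights two things that may be worth checking: loading the fine-tuned checkpoint rather than the base model (a base model loaded for sequence classification gets a freshly initialized classification head), and reusing the exact label-to-id mapping from training.

```python
from typing import Dict, List, Sequence


def accuracy(pred_ids: Sequence[int], gold_ids: Sequence[int]) -> float:
    """Fraction of positions where the predicted id equals the gold id."""
    if len(pred_ids) != len(gold_ids):
        raise ValueError("predictions and labels differ in length")
    if not gold_ids:
        return 0.0
    return sum(p == g for p, g in zip(pred_ids, gold_ids)) / len(gold_ids)


def encode_labels(labels: Sequence[str], label2id: Dict[str, int]) -> List[int]:
    """Map string labels to ids; pass the SAME mapping that training used."""
    return [label2id[label] for label in labels]


def predict_ids(texts: List[str], checkpoint_dir: str, max_length: int = 128) -> List[int]:
    """Predict class ids with the fine-tuned checkpoint (hypothetical path)."""
    # Imported lazily so the pure-Python helpers above run without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint_dir)
    model.eval()  # turn off dropout for inference

    # Single batch for brevity; a real script would iterate in mini-batches.
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return logits.argmax(dim=-1).tolist()


# Toy illustration (made-up labels): identical predictions, two label mappings.
train_label2id = {"neg": 0, "neu": 1, "pos": 2}    # mapping used in training
rebuilt_label2id = {"pos": 0, "neg": 1, "neu": 2}  # mapping rebuilt in another order
gold = ["neg", "pos", "neu", "pos"]
pred_ids = [0, 2, 1, 2]

acc_same = accuracy(pred_ids, encode_labels(gold, train_label2id))
acc_rebuilt = accuracy(pred_ids, encode_labels(gold, rebuilt_label2id))
```

In the toy data the predictions are identical in both cases; only the label encoding differs, which is enough to change the reported accuracy. Tokenization settings such as `max_length` would also need to match the ones used during training.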
