Object of type 'int64' is not JSON serializable in Trainer.save_checkpoint #10299
I too ran into this problem. It's caused by turning on the evaluation strategy, which adds metrics to the `log_history` of the model's state using NumPy data types, and that triggers the JSON encoder error. That was the case with 4.3.3. There appear to be a number of changes to the Trainer in the works; I haven't checked whether this has been fixed as a result of those.
As a temporary workaround you can modify trainer.py at line 1260 (`output = {**logs, **{"step": self.state.global_step}}`) and add the following three lines after it. If the metrics are calculated the same way in the latest code as in 4.3.3, something like this may also be needed going forward; otherwise, anything calling the `log` method will need to safely cast data points beforehand if they are still going to be added to the trainer state.
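The exact three lines are not shown in the thread, but a sketch of the kind of cast the comment describes might look like the helper below. The function name `sanitize_for_json` is hypothetical; the idea is simply to convert NumPy scalars and arrays to native Python types before they land in the trainer state.

```python
import numpy as np

def sanitize_for_json(obj):
    """Recursively cast NumPy scalars/arrays to native Python types.

    Hypothetical helper illustrating the workaround described above;
    it is not part of the transformers library.
    """
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, np.floating):
        return float(obj)
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, dict):
        return {k: sanitize_for_json(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sanitize_for_json(v) for v in obj]
    return obj
```

Applied to the `output` dict built at the quoted line, the result would serialize cleanly with the standard `json` module.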
I confirm I can reproduce on master. Will investigate more tomorrow.
My only comment on the submitted fix is that it targets the metrics output, but it will not stop other code from putting values into the log history in the model state that later cause the same problem when the state is serialized to JSON.
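A more defensive variant of what this comment suggests would be to guard the serialization step itself rather than each producer of log entries. A minimal sketch, assuming a stray NumPy value can reach the state dict (the class name `NumpySafeEncoder` is hypothetical):

```python
import json
import numpy as np

class NumpySafeEncoder(json.JSONEncoder):
    """Fallback encoder so stray NumPy values in a state dict
    serialize instead of raising TypeError. Hypothetical sketch,
    not part of the transformers library."""

    def default(self, o):
        if isinstance(o, np.integer):
            return int(o)
        if isinstance(o, np.floating):
            return float(o)
        if isinstance(o, np.ndarray):
            return o.tolist()
        # fall back to the default behavior (raises TypeError)
        return super().default(o)

# usage: json.dumps(trainer_state_dict, cls=NumpySafeEncoder)
```

This catches the failure at the single point where the state is written out, regardless of which caller added the offending value.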
I am using the recent run_ner.py example script to train an NER model. I want to evaluate the performance of the model during training and use the following command for training:
I run the command in the current Docker image `huggingface/transformers-pytorch-gpu`.
However, I get the following error:
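The error text is elided from this thread, but the failure named in the title can be reproduced with the standard library alone, since `json` does not know how to encode NumPy scalar types:

```python
import json
import numpy as np

# A metric value of NumPy type, as the evaluation strategy produces:
try:
    json.dumps({"eval_accuracy": np.int64(42)})
except TypeError as e:
    print(e)  # Object of type int64 is not JSON serializable
```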