
Object of type 'int64' is not JSON serializable in Trainer.save_checkpoint #10299

Closed
arthurbra opened this issue Feb 20, 2021 · 4 comments · Fixed by #10632
arthurbra commented Feb 20, 2021

I am using the recent run_ner.py example script to train an NER model. I want to evaluate the performance of the model during training and use the following command for training:

python3 run_ner.py \
  --model_name_or_path bert-base-uncased \
  --dataset_name conll2003 \
  --return_entity_level_metrics \
  --output_dir conll-tmp \
  --overwrite_output_dir \
  --do_train \
  --do_eval \
  --do_predict \
  --evaluation_strategy steps \
  --logging_steps 10 \
  --eval_steps 10 \
  --load_best_model_at_end

I run the command inside the current Docker image huggingface/transformers-pytorch-gpu. However, I get the following error:

Traceback (most recent call last):
  File "run_ner.py", line 470, in <module>
    main()
  File "run_ner.py", line 404, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 983, in train
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1062, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1126, in _save_checkpoint
    self.state.save_to_json(os.path.join(output_dir, "trainer_state.json"))
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer_callback.py", line 95, in save_to_json
    json_string = json.dumps(dataclasses.asdict(self), indent=2, sort_keys=True) + "\n"
  File "/usr/lib/python3.6/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/lib/python3.6/json/encoder.py", line 430, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.6/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/usr/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.6/json/encoder.py", line 437, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'int64' is not JSON serializable
@antonyscerri

I ran into this problem too. It is caused by turning on an evaluation strategy, which adds metrics to the log_history of the trainer state; those metrics use numpy data types, which trips up the JSON encoder. That was the case with 4.3.3. There appear to be a bunch of changes to the trainer in the works; I haven't checked whether any of them fix this.
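For reference, the failure can be reproduced outside the Trainer with a minimal sketch (the entries below are made up; the point is that json.dumps rejects numpy scalars):

    import json

    import numpy as np

    # Entries like these end up in the trainer state's log_history when metrics
    # are computed with numpy: the step is an np.int64, not a plain int.
    log_history = [{"step": np.int64(10), "eval_loss": 0.25}]

    # The stock JSON encoder has no handler for numpy types, so this raises
    # TypeError: Object of type 'int64' is not JSON serializable
    json.dumps(log_history, indent=2, sort_keys=True)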

@antonyscerri

As a temporary workaround you can modify trainer.py at line 1260, "output = {**logs, **{"step": self.state.global_step}}", and add the following three lines after it. If the metrics are computed the same way in the latest code as in 4.3.3, something like this may also be needed going forward; otherwise, anything calling the log method will need to cast data points to native types beforehand if they are still going to be added to the trainer state.

        for k, v in output.items():
            # Cast numpy scalars (e.g. np.int64, np.float64) to native Python types
            # so the trainer state can later be serialized to JSON.
            if isinstance(v, np.generic):
                output[k] = v.item()
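If patching trainer.py is inconvenient, an equivalent approach (just a sketch, not part of the transformers API) is to serialize with a JSON encoder that knows about numpy scalars:

    import json

    import numpy as np


    class NumpyJSONEncoder(json.JSONEncoder):
        # Fall back to .item() for numpy scalar types (np.int64, np.float64, ...).
        def default(self, obj):
            if isinstance(obj, np.generic):
                return obj.item()
            return super().default(obj)


    # Succeeds where the plain encoder raises the TypeError above.
    print(json.dumps({"step": np.int64(10)}, cls=NumpyJSONEncoder))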

@sgugger
Collaborator

sgugger commented Mar 10, 2021

I confirm I can reproduce in master. Will investigate more tomorrow.

@antonyscerri

My only comment on the submitted fix is that it targets the metrics output, but it will not stop other code from putting values into the log history of the model state that later cause the same problem when the state is serialized to JSON.
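A more defensive option, sketched below (the helper name is hypothetical, not something in transformers), would be to sanitize values recursively right before the state is written out, so it would not matter who added them to the log history:

    import numpy as np


    def sanitize_for_json(obj):
        # Recursively convert numpy scalars and arrays into native Python types
        # so the resulting structure is safe to pass to json.dumps.
        if isinstance(obj, dict):
            return {k: sanitize_for_json(v) for k, v in obj.items()}
        if isinstance(obj, (list, tuple)):
            return [sanitize_for_json(v) for v in obj]
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.generic):
            return obj.item()
        return obj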
