When using train.py in scripts/train with our config, we get the error
Traceback (most recent call last):
File "/home/paperspace/llm-foundry/scripts/train/train.py", line 376, in <module>
main(cfg)
File "/home/paperspace/llm-foundry/scripts/train/train.py", line 365, in main
trainer.fit()
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/trainer/trainer.py", line 1766, in fit
self._train_loop()
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/trainer/trainer.py", line 1996, in _train_loop
self.engine.run_event(Event.BATCH_CHECKPOINT)
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/engine.py", line 293, in run_event
self._run_nonlogger_callbacks(event)
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/engine.py", line 475, in _run_nonlogger_callbacks
self._run_callbacks(event, callbacks)
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/engine.py", line 467, in _run_callbacks
cb.run_event(event, self.state, self.logger)
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/callback.py", line 96, in run_event
return event_cb(state, logger)
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/callbacks/checkpoint_saver.py", line 346, in batch_checkpoint
self._save_checkpoint(
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/callbacks/checkpoint_saver.py", line 384, in _save_checkpoint
saved_path = checkpoint.save_checkpoint(
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/utils/checkpoint.py", line 518, in save_checkpoint
'state': state.state_dict(),
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/state.py", line 838, in state_dict
state_dict['integrations'] = self._get_integrations_state_dict()
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/core/state.py", line 727, in _get_integrations_state_dict
integrations['huggingface'] = self.model.get_metadata()
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/composer/models/huggingface.py", line 404, in get_metadata
self.model.config.save_pretrained(model_dir)
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/transformers/configuration_utils.py", line 456, in save_pretrained
self.to_json_file(output_config_file, use_diff=True)
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/transformers/configuration_utils.py", line 845, in to_json_file
writer.write(self.to_json_string(use_diff=use_diff))
File "/home/paperspace/llm-foundry/env/lib/python3.9/site-packages/transformers/configuration_utils.py", line 831, in to_json_string
return json.dumps(config_dict, indent=2, sort_keys=True) + "\n"
File "/usr/lib/python3.9/json/__init__.py", line 234, in dumps
return cls(
File "/usr/lib/python3.9/json/encoder.py", line 201, in encode
chunks = list(chunks)
File "/usr/lib/python3.9/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/usr/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.9/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/usr/lib/python3.9/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type DictConfig is not JSON serializable
when a checkpoint is being saved. There's a chance I'm passing the info in incorrectly, though.
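
From the bottom of the trace, the failure looks like json.dumps being handed an omegaconf DictConfig rather than a plain dict when the Hugging Face config is serialized for the checkpoint metadata. A minimal sketch of that behavior (the key names here are placeholders, not the actual config):

```python
# Minimal sketch, assuming a nested override survives as an omegaconf DictConfig.
import json

from omegaconf import OmegaConf

cfg = OmegaConf.create({"some_override": {"nested_key": "nested_value"}})  # placeholder keys

try:
    # cfg.some_override is a DictConfig, not a dict, so json.dumps rejects it
    json.dumps({"some_override": cfg.some_override})
except TypeError as err:
    print(err)  # Object of type DictConfig is not JSON serializable

# Converting back to builtin containers first serializes cleanly
print(json.dumps(OmegaConf.to_container(cfg, resolve=True)))
```

If that is what is happening here, converting any overrides with OmegaConf.to_container before they are attached to the HF config should avoid the TypeError.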
@vchiley This problem will still occur when the default value for a specific config key is not a mapping but the override value is. This is the case, for example, with llama-2-hf scaled rope embeddings: the relevant entry in config.json is null by default, and to activate the scaling you replace it with a map like so:
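
A sketch of that case, assuming the usual transformers rope_scaling map of {"type": ..., "factor": ...} (the values and the override loop below are illustrative, not llm-foundry's actual code):

```python
# Illustrative sketch: a mapping override for a key whose default is null
# still lands on the HF config as a DictConfig.
from omegaconf import OmegaConf
from transformers import LlamaConfig

hf_config = LlamaConfig()          # default llama-2 config has "rope_scaling": null
assert hf_config.rope_scaling is None

# Override as it would arrive from a YAML config parsed by omegaconf:
overrides = OmegaConf.create({"rope_scaling": {"type": "dynamic", "factor": 2.0}})

for key, value in overrides.items():
    setattr(hf_config, key, value)  # value is a DictConfig, not a plain dict

# Checkpointing later calls save_pretrained -> to_json_string -> json.dumps,
# which fails exactly as in the traceback above.
hf_config.save_pretrained("ckpt_dir")  # TypeError: Object of type DictConfig is not JSON serializable
```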