
Issue when resuming fine-tuning of the Llama 3.1 Instruct model #36035

@Rishav-hub

Description


When I run the script below, training fails to resume from the checkpoint:

trainer.train(resume_from_checkpoint = "/content/drive/MyDrive/001_projects/exigent/Contract Management and Extraction (CME)/main_data/001_CML_3/CMS_models/phase_6/llama_models/tuning_Llama_3-1_8B_instruct_phase_6_exp_with_sys_prompt/checkpoint-21450/") # tuning_Llama_3-1_8B_instruct_phase_6_exp_with_sys_prompt

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-61-9126eb228050> in <cell line: 0>()
----> 1 trainer.train(resume_from_checkpoint = "/content/drive/MyDrive/001_projects/exigent/Contract Management and Extraction (CME)/main_data/001_CML_3/CMS_models/phase_6/llama_models/tuning_Llama_3-1_8B_instruct_phase_6_exp_with_sys_prompt/checkpoint-21450/") # tuning_Llama_3-1_8B_instruct_phase_6_exp_with_sys_prompt

2 frames
/usr/local/lib/python3.11/dist-packages/trl/trainer/sft_trainer.py in train(self, *args, **kwargs)
    449             self.model = self._trl_activate_neftune(self.model)
    450 
--> 451         output = super().train(*args, **kwargs)
    452 
    453         # After training we make sure to retrieve back the original forward pass method

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   2134         if resume_from_checkpoint is not None:
   2135             if not is_sagemaker_mp_enabled() and not self.is_deepspeed_enabled and not self.is_fsdp_enabled:
-> 2136                 self._load_from_checkpoint(resume_from_checkpoint)
   2137             # In case of repeating the find_executable_batch_size, set `self._train_batch_size` properly
   2138             state = TrainerState.load_from_json(os.path.join(resume_from_checkpoint, TRAINER_STATE_NAME))

/usr/local/lib/python3.11/dist-packages/transformers/trainer.py in _load_from_checkpoint(self, resume_from_checkpoint, model)
   2842                     if hasattr(model, "active_adapters"):
   2843                         active_adapters = model.active_adapters
-> 2844                         if len(active_adapters) > 1:
   2845                             logger.warning("Multiple active adapters detected will only consider the first adapter")
   2846                         active_adapter = active_adapters[0]

TypeError: object of type 'method' has no len()
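
My reading of the traceback (a guess, not a confirmed diagnosis): `Trainer._load_from_checkpoint` calls `len()` on and indexes into `model.active_adapters` as if it were a list, but on a transformers model with a PEFT adapter attached, `active_adapters` is a method provided by `PeftAdapterMixin`, so `len()` receives a bound method and raises. A minimal sketch of the suspected mismatch (the model name and LoRA config here are placeholders, not my exact setup):

```python
# Sketch of the suspected type mismatch; model and config are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model.add_adapter(LoraConfig(task_type="CAUSAL_LM"))  # attaches a PEFT adapter

print(callable(model.active_adapters))  # True: it is a bound method here
print(model.active_adapters())          # e.g. ['default']
len(model.active_adapters)              # TypeError: object of type 'method' has no len()
```

If that is the cause, one untested stopgap is to shadow the bound method with its result before resuming, so the Trainer's `len()` check sees a list. This reuses the `trainer` object from the snippet above, and it may break other code that expects `active_adapters` to still be callable:

```python
# Untested local hack, not a proper fix: replace the bound method with the
# list it returns so Trainer._load_from_checkpoint can call len() on it.
if callable(getattr(trainer.model, "active_adapters", None)):
    trainer.model.active_adapters = trainer.model.active_adapters()

trainer.train(resume_from_checkpoint="/content/drive/MyDrive/001_projects/exigent/Contract Management and Extraction (CME)/main_data/001_CML_3/CMS_models/phase_6/llama_models/tuning_Llama_3-1_8B_instruct_phase_6_exp_with_sys_prompt/checkpoint-21450/")
```

If the hack above misbehaves, pinning transformers/TRL to the versions the run was originally started with would be the safer workaround.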
