Releases: hiyouga/LLaMA-Factory
v0.7.0: LLaVA Multimodal LLM Support
Congratulations on 20k stars 🎉 We were No. 1 on GitHub Trending on Apr. 23rd 🔥 Follow us on X
New features
- Support SFT/PPO/DPO/ORPO for the LLaVA-1.5 model by @BUAADreamer in #3450
- Support inferring the LLaVA-1.5 model with both native Transformers and vLLM by @hiyouga in #3454
- Support vLLM+LoRA inference for partial models (see support list)
- Support 2x faster generation of the QLoRA model based on UnslothAI's optimization
- Support adding new special tokens to the tokenizer via the `new_special_tokens` argument
- Support choosing the device to merge LoRA in LlamaBoard via the `export_device` argument
- Add a Colab notebook for getting started with fine-tuning the Llama-3 model on a free T4 GPU
- Automatically enable SDPA attention and fast tokenizer for higher performance
New models
- Base models
- OLMo-1.7-7B
- Jamba-v0.1-51B
- Qwen1.5-110B
- DBRX-132B-Base
- Instruct/Chat models
- Phi-3-mini-3.8B-instruct (4k/128k)
- LLaVA-1.5-7B
- LLaVA-1.5-13B
- Qwen1.5-110B-Chat
- DBRX-132B-Instruct
New datasets
- Supervised fine-tuning datasets
- LLaVA mixed (en&zh) by @BUAADreamer in #3471
- Preference datasets
- DPO mixed (en&zh) by @hiyouga
Bug fix
v0.6.3: Llama-3 and 3x Longer QLoRA
New features
- Support Meta Llama-3 (8B/70B) models
- Support UnslothAI's long-context QLoRA optimization (56,000 context length for Llama-2 7B in 24GB)
- Support previewing local datasets in directories in LlamaBoard by @codemayq in #3291
New algorithms
New models
- Base models
- CodeGemma (2B/7B)
- CodeQwen1.5-7B
- Llama-3 (8B/70B)
- Mixtral-8x22B-v0.1
- Instruct/Chat models
- CodeGemma-7B-it
- CodeQwen1.5-7B-Chat
- Llama-3-Instruct (8B/70B)
- Command R (35B) by @marko1616 in #3254
- Command R+ (104B) by @marko1616 in #3254
- Mixtral-8x22B-Instruct-v0.1
Bug fix
- Fix full-tuning batch prediction examples by @khazic in #3261
- Fix output_router_logits of Mixtral by @liu-zichen in #3276
- Fix `AutoModel.from_pretrained` with an attention implementation specified (see huggingface/transformers#30298)
- Fix a non-convergence issue in the layer-wise GaLore optimizer (see huggingface/transformers#30371)
- Fix #3184 #3238 #3247 #3273 #3316 #3317 #3324 #3348 #3352 #3365 #3366
v0.6.2: ORPO and Qwen1.5-32B
New features
- Support ORPO algorithm by @hiyouga in #3066
- Support inferring BNB 4-bit models on multiple GPUs via the `quantization_device_map` argument
- Reorganize README files, move example scripts to the `examples` folder
- Support saving & loading arguments quickly in LlamaBoard by @hiyouga and @marko1616 in #3046
- Support loading alpaca-format datasets from the hub without `dataset_info.json` by specifying `--dataset_dir ONLINE`
- Add a `moe_aux_loss_coef` parameter to control the coefficient of the auxiliary loss in MoE models
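ORPO needs no separate reference model: it adds an odds-ratio penalty on top of the plain SFT loss for the chosen response. A minimal numeric sketch of the objective (pure Python; `lam` and the probabilities are made-up illustrative values, not the trainer's defaults):

```python
import math

def orpo_loss(p_chosen, p_rejected, lam=0.1):
    """ORPO objective sketch: SFT loss on the chosen response plus a
    log-sigmoid penalty on the log odds ratio of chosen vs. rejected."""
    odds = lambda p: p / (1.0 - p)
    sft_loss = -math.log(p_chosen)
    log_odds_ratio = math.log(odds(p_chosen) / odds(p_rejected))
    or_loss = -math.log(1.0 / (1.0 + math.exp(-log_odds_ratio)))  # -log sigmoid
    return sft_loss + lam * or_loss

# Assigning higher probability to the chosen response lowers the loss.
print(round(orpo_loss(p_chosen=0.6, p_rejected=0.2), 4))  # 0.5262
print(orpo_loss(0.6, 0.2) < orpo_loss(0.3, 0.5))          # True
```

The single coefficient `lam` plays the role of the preference-strength weight; the real trainer computes the probabilities from sequence log-likelihoods.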
New models
- Base models
- Breeze-7B-Base
- Qwen1.5-MoE-A2.7B (14B)
- Qwen1.5-32B
- Instruct/Chat models
- Breeze-7B-Instruct
- Qwen1.5-MoE-A2.7B-Chat (14B)
- Qwen1.5-32B-Chat
Bug fix
- Fix pile dataset download config by @lealaxy in #3053
- Fix model generation config by @marko1616 in #3057
- Fix qwen1.5 models DPO training by @changingivan and @hiyouga in #3083
- Support Qwen1.5-32B by @sliderSun in #3160
- Support Breeze-7B by @codemayq in #3161
- Fix `additional_target` in Unsloth by @kno10 in #3201
- Fix #2807 #3022 #3023 #3046 #3077 #3085 #3116 #3200 #3225
v0.6.1: Patch release
This patch mainly fixes #2983
In commit 9bec3c9, we moved the optimizer and scheduler construction into our trainers, which inadvertently introduced a bug: when DeepSpeed was enabled, the trainers in transformers would build an optimizer and scheduler before calling the create_optimizer_and_scheduler method [1]. The optimizer created by our method then overwrote the original one, while the scheduler did not. Consequently, the scheduler no longer affected the learning rate of the optimizer actually used, leading to a regression in training results. We have fixed this bug in 3bcd41b and 8c77b10. Thanks to @HideLord for helping us identify this critical bug.
[1] https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/trainer.py#L1877-L1881
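The failure mode can be reproduced with plain Python stand-ins (toy classes, not the actual transformers/DeepSpeed objects): the scheduler stays bound to the first optimizer, so replacing the optimizer afterwards leaves the training learning rate frozen.

```python
class Optimizer:
    def __init__(self, lr):
        self.lr = lr

class Scheduler:
    """Toy LR scheduler bound to one optimizer instance."""
    def __init__(self, optimizer, decay=0.5):
        self.optimizer = optimizer
        self.decay = decay

    def step(self):
        self.optimizer.lr *= self.decay

# 1. The outer trainer builds an optimizer and a scheduler bound to it.
original = Optimizer(lr=1.0)
scheduler = Scheduler(original)

# 2. Later, our code replaced the optimizer but NOT the scheduler
#    (the bug described above).
replacement = Optimizer(lr=1.0)

# 3. Stepping the scheduler decays the *original* optimizer's LR only;
#    the replacement used for training keeps a constant LR.
scheduler.step()
print(original.lr)     # 0.5 -- decayed, but no longer used for training
print(replacement.lr)  # 1.0 -- used for training, never decayed
```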
v0.6.0: Paper Release, GaLore and FSDP+QLoRA
We released our paper on arXiv! Thanks to all co-authors and AK's recommendation
New features
- Support GaLore algorithm, allowing full-parameter learning of a 7B model using less than 24GB VRAM
- Support FSDP+QLoRA that allows QLoRA fine-tuning of a 70B model on 2x24GB GPUs
- Support LoRA+ algorithm for better LoRA fine-tuning by @qibaoyuan in #2830
- LLaMA Factory 🤝 vLLM: enjoy 270% inference speed with `--infer_backend vllm`
- Add a Colab notebook for easily getting started
- Support pushing fine-tuned models to the Hugging Face Hub in the web UI
- Support `apply_chat_template` by adding a chat template to the tokenizer after fine-tuning
- Add Docker support by @S3Studio in #2743 #2849
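GaLore's memory saving comes from keeping optimizer state in a low-rank subspace of the gradient rather than at the full weight shape. A single-step sketch with NumPy (an illustrative rank-`r` projection, not the library's actual implementation):

```python
import numpy as np

def galore_step(weight, grad, rank=4, lr=0.1):
    """One illustrative GaLore-style update: project the gradient into a
    low-rank subspace, take the step there, and project the update back."""
    # Projection from the top-`rank` left singular vectors of the gradient.
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    proj = u[:, :rank]                 # (m, rank)
    low_rank_grad = proj.T @ grad      # (rank, n) -- optimizer state lives here
    update = proj @ low_rank_grad      # project back up to (m, n)
    return weight - lr * update

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
g = rng.standard_normal((64, 32))
w_new = galore_step(w, g)
print(w_new.shape)  # (64, 32)
```

The real method refreshes the projection only every few hundred steps and feeds the low-rank gradient to a stateful optimizer such as Adam, which is where the VRAM reduction for 7B full-parameter training comes from.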
New models
- Base models
- OLMo (1B/7B)
- StarCoder2 (3B/7B/15B)
- Yi-9B
- Instruct/Chat models
- OLMo-7B-Instruct
New datasets
- Supervised fine-tuning datasets
- Cosmopedia (en)
- Preference datasets
- Orca DPO (en)
Bug fix
- Fix flash_attn in web UI by @cx2333-gt in #2730
- Fix deepspeed runtime error in PPO by @stephen-nju in #2746
- Fix readme ddp instruction by @khazic in #2903
- Fix environment variable in datasets by @SirlyDreamer in #2905
- Fix readme information by @0xez in #2919
- Fix generation config validation by @marko1616 in #2945
- Fix requirements by @rkinas in #2963
- Fix bitsandbytes windows version by @Tsumugii24 in #2967
- Fix #2346 #2642 #2649 #2732 #2735 #2756 #2766 #2775 #2777 #2782 #2798 #2802 #2803 #2817 #2895 #2928 #2936 #2941
v0.5.3: DoRA and AWQ/AQLM QLoRA
New features
- Support DoRA (Weight-Decomposed LoRA)
- Support QLoRA for the AWQ/AQLM quantized models, now 2-bit QLoRA is feasible
- Provide some example scripts in https://github.com/hiyouga/LLaMA-Factory/tree/main/examples
New models
- Base models
- Gemma (2B/7B)
- Instruct/Chat models
- Gemma-it (2B/7B)
Bug fix
v0.5.2: Block Expansion, Qwen1.5 Models
New features
- Support block expansion in LLaMA Pro, see `tests/llama_pro.py` for usage
- Add the `use_rslora` option for the LoRA method
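Block expansion (LLaMA Pro) grows a model by interleaving freshly added, identity-initialized transformer blocks among the frozen original ones, so the expanded model initially computes the same function. A toy sketch of the interleaving step (plain Python with string placeholders for layers; names are hypothetical):

```python
def expand_blocks(layers, num_expand):
    """Interleave `num_expand` identity-initialized copies into `layers`.

    Splits the stack into `num_expand` equal groups and appends one new
    block after each group.
    """
    assert len(layers) % num_expand == 0
    group = len(layers) // num_expand
    expanded = []
    for i in range(0, len(layers), group):
        expanded.extend(layers[i:i + group])
        # In the real method the copied block's output projection is
        # zero-initialized, making it an identity mapping at the start.
        expanded.append(f"{layers[i + group - 1]}_expanded")
    return expanded

layers = [f"layer{i}" for i in range(8)]
print(expand_blocks(layers, num_expand=2))
```

During fine-tuning, only the expanded blocks are trained while the original ones stay frozen, which preserves the base model's abilities.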
New models
- Base models
- Qwen1.5 (0.5B/1.8B/4B/7B/14B/72B)
- DeepSeekMath-7B-Base
- DeepSeekCoder-7B-Base-v1.5
- Orion-14B-Base
- Instruct/Chat models
- Qwen1.5-Chat (0.5B/1.8B/4B/7B/14B/72B)
- MiniCPM-2B-SFT/DPO
- DeepSeekMath-7B-Instruct
- DeepSeekCoder-7B-Instruct-v1.5
- Orion-14B-Chat
- Orion-14B-Long-Chat
- Orion-14B-RAG-Chat
- Orion-14B-Plugin-Chat
New datasets
- Supervised fine-tuning datasets
- SlimOrca (en)
- Dolly (de)
- Dolphin (de)
- Airoboros (de)
- Preference datasets
- Orca DPO (de)
Bug fix
- Fix `torch_dtype` check in export model by @fenglui in #2262
- Add Russian locale to LLaMA Board by @seoeaa in #2264
- Remove manually set `use_cache` in export model by @yhyu13 in #2266
- Fix DeepSpeed ZeRO-3 training with MoE models by @A-Cepheus in #2283
- Add a patch for full training of the Mixtral model using DeepSpeed ZeRO-3 by @ftgreat in #2319
- Fix bug in data pre-processing by @lxsyz in #2411
- Add German sft and dpo datasets by @johannhartmann in #2423
- Add version checking in `test_toolcall.py` by @mini-tiger in #2435
- Enable parsing of the SlimOrca dataset by @mnmueller in #2462
- Add tags for models when pushing to hf hub by @younesbelkada in #2474
- Fix #2189 #2268 #2282 #2320 #2338 #2376 #2388 #2394 #2397 #2404 #2412 #2420 #2421 #2436 #2438 #2471 #2481
v0.5.0: Agent Tuning, Unsloth Integration
Congratulations on 10k stars 🎉 Make LLM fine-tuning easier and faster together with LLaMA-Factory ✨
New features
- Support agent tuning for most models: fine-tune any LLM for tool use with `--dataset glaive_toolcall` #2226
- Support function calling in both API and web mode with fine-tuned models, following the OpenAI format
- LLaMA Factory 🤝 Unsloth: enjoy 170% LoRA training speed with `--use_unsloth`, see the benchmark here
- Support fine-tuning models on MPS devices #2090
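For reference, an assistant message carrying a function call in the OpenAI-compatible format looks roughly like the following (a hand-written sketch; the function name and argument values are made up):

```python
import json

# Hypothetical assistant reply containing one tool call, shaped like the
# OpenAI chat-completions format that the fine-tuned models emit.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_weather",
                # Arguments are a JSON-encoded string, not a nested object.
                "arguments": json.dumps({"location": "Paris", "unit": "celsius"}),
            },
        }
    ],
}

args = json.loads(message["tool_calls"][0]["function"]["arguments"])
print(args["location"])  # Paris
```

Because the shape matches the OpenAI schema, existing OpenAI-client tooling can consume the fine-tuned model's tool calls unchanged.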
New models
- Base models
- Phi-2 (2.7B)
- InternLM2 (7B/20B)
- SOLAR-10.7B
- DeepseekMoE-16B-Base
- XVERSE-65B-2
- Instruct/Chat models
- InternLM2-Chat (7B/20B)
- SOLAR-10.7B-Instruct
- DeepseekMoE-16B-Chat
- Yuan (2B/51B/102B)
New datasets
- Supervised fine-tuning datasets
- deepctrl dataset
- Glaive function calling dataset v2
Core updates
- Refactor data engine: clearer dataset alignment, easier templating and tool formatting
- Refactor saving logic for models with value head #1789
- Use ruff code formatter for stylish code
Bug fix
- Bump transformers version to 4.36.2 by @ShaneTian in #1932
- Fix requirements by @dasdristanta13 in #2117
- Add Machine-Mindset project by @JessyTsu1 in #2163
- Fix typo in readme file by @junuMoon in #2194
- Support resize token embeddings with ZeRO3 by @liu-zichen in #2201
- Fix #1073 #1462 #1617 #1735 #1742 #1789 #1821 #1875 #1895 #1900 #1908 #1907 #1909 #1923 #2014 #2067 #2081 #2090 #2098 #2125 #2127 #2147 #2161 #2164 #2183 #2195 #2249 #2260
v0.4.0: Mixtral-8x7B, DPO-ftx, AutoGPTQ Integration
🚨🚨 Core refactor
- Deprecate `checkpoint_dir` and use `adapter_name_or_path` instead
- Replace `resume_lora_training` with `create_new_adapter`
- Move the patches in model loading to `llmtuner.model.patcher`
- Bump to Transformers 4.36.1 to adapt to the Mixtral models
- Wide adaptation for FlashAttention2 (LLaMA, Falcon, Mistral)
- Temporarily disable LongLoRA due to breaking changes; it will be supported again later
The above changes were made by @hiyouga in #1864
New features
- Add DPO-ftx: mixing fine-tuning gradients into DPO via the `dpo_ftx` argument, suggested by @lylcst in #1347 (comment)
- Integrate AutoGPTQ into model export via the `export_quantization_bit` and `export_quantization_dataset` arguments
- Support loading datasets from the ModelScope Hub by @tastelikefeet and @wangxingjun778 in #1802
- Support resizing token embeddings with the noisy mean initialization by @hiyouga in a66186b
- Support system column in both alpaca and sharegpt dataset formats
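The noisy mean initialization mentioned above can be sketched as follows: each new embedding row starts from the mean of the existing rows plus small Gaussian noise (an illustrative NumPy version, not the library's exact code; `scale` is an assumed hyperparameter):

```python
import numpy as np

def resize_embeddings(embeddings, num_new_tokens, scale=1e-3, seed=0):
    """Append rows for new tokens, each initialized to the mean of the
    existing embeddings plus small noise (noisy mean initialization)."""
    rng = np.random.default_rng(seed)
    mean = embeddings.mean(axis=0, keepdims=True)  # (1, dim)
    noise = scale * rng.standard_normal((num_new_tokens, embeddings.shape[1]))
    return np.concatenate([embeddings, mean + noise], axis=0)

emb = np.random.default_rng(1).standard_normal((100, 16))
resized = resize_embeddings(emb, num_new_tokens=2)
print(resized.shape)  # (102, 16)
```

Starting near the mean keeps the new tokens' logits in a reasonable range at step zero, while the noise breaks symmetry between them.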
New models
- Base models
- Mixtral-8x7B-v0.1
- Instruct/Chat models
- Mixtral-8x7B-Instruct-v0.1
- Mistral-7B-Instruct-v0.2
- XVERSE-65B-Chat
- Yi-6B-Chat
Bug fix
v0.3.3: ModelScope Integration, Reward Server
New features
- Support loading pre-trained models from ModelScope Hub by @tastelikefeet in #1700
- Support launching a reward model server in the demo API by specifying `--stage=rm` in `api_demo.py`
- Support using a reward model server in PPO training by specifying `--reward_model_type api`
- Support adjusting the shard size of exported models via the `export_size` argument
New models
- Base models
- DeepseekLLM-Base (7B/67B)
- Qwen (1.8B/72B)
- Instruct/Chat models
- DeepseekLLM-Chat (7B/67B)
- Qwen-Chat (1.8B/72B)
- Yi-34B-Chat
New datasets
- Supervised fine-tuning datasets
- Preference datasets