Qwen bug fixes (unslothai#639)
* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (unslothai#630)

* Support revision parameter in FastLanguageModel.from_pretrained (unslothai#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (unslothai#609)

* Update __init__.py (unslothai#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (unslothai#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (unslothai#345)

* Add save-to-llama.cpp-GGML support in save.py.

* Fix conversion command and path of the convert-to-GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix unslothai#2 for saving lora

* Test fix unslothai#3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to GGML and add a few print statements.

* Tokenizer was needed for saving, so added it back; also made it more unsloth-y in style by using positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to an older version of the code and added a few comments.

* Test fix 1 for arch

* Test fix 2 for a new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix lil variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (unslothai#619)

* llama.cpp failing (unslothai#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it with `--recursive`, it works.
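
A plausible reconstruction of the working invocation from the report above, where cloning with `--recursive` also fetches llama.cpp's submodules before the build step the error message prescribes:

```bash
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
```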

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (unslothai#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

* Update qwen2.py

* docs

* Update README.md

* Update qwen2.py

* README: Fix minor typo. (unslothai#559)

* README: Fix minor typo.

One-character typo fix while reading.

* Update README.md

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update mistral.py

* Update qwen2.py

* Update qwen2.py

* Update qwen2.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update README.md

* FastMistralModel

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
Co-authored-by: Walter Korman <lemurware@gmail.com>
11 people committed Jun 14, 2024
1 parent 70244d8 commit 911feb4
Showing 4 changed files with 16 additions and 101 deletions.
87 changes: 0 additions & 87 deletions PARAMETERS.md

This file was deleted.

18 changes: 10 additions & 8 deletions README.md
````diff
@@ -35,7 +35,7 @@ All notebooks are **beginner friendly**! Add your dataset, click "Run All", and
 - Run [Llama 3 conversational notebook](https://colab.research.google.com/drive/1XamvWYinY6FOSX9GLvnqSjjsNflxdhNc?usp=sharing) and [Mistral 7B v3 ChatML](https://colab.research.google.com/drive/15F1xyn8497_dUbxZP4zWmPZ3PJx1Oymv?usp=sharing)
 - This [text completion notebook](https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing) is for continued pretraining / raw text
 - This [continued pretraining notebook](https://colab.research.google.com/drive/1tEd1FrOXWMnCU9UIvdYhs61tkxdMuKZu?usp=sharing) is for learning another language
-
+- Click [here](https://github.com/unslothai/unsloth/wiki) for detailed documentation for Unsloth.
 
 ## 🦥 Unsloth.ai News
 - 📣 NEW! Continued Pretraining [notebook](https://colab.research.google.com/drive/1tEd1FrOXWMnCU9UIvdYhs61tkxdMuKZu?usp=sharing) for other languages like Korean!
@@ -76,7 +76,7 @@ model = FastLanguageModel.get_peft_model(
 
 
 ## 🥇 Performance Benchmarking
-- For the full list of **reproducable** benchmarking tables, [go to our website](https://unsloth.ai/blog/mistral-benchmark#Benchmark%20tables)
+- For the full list of **reproducible** benchmarking tables, [go to our website](https://unsloth.ai/blog/mistral-benchmark#Benchmark%20tables)
 
 | 1 A100 40GB | 🤗Hugging Face | Flash Attention | 🦥Unsloth Open Source | 🦥[Unsloth Pro](https://unsloth.ai/pricing) |
 |--------------|--------------|-----------------|---------------------|-----------------|
@@ -100,14 +100,16 @@ model = FastLanguageModel.get_peft_model(
 ### Conda Installation
 Select either `pytorch-cuda=11.8` for CUDA 11.8 or `pytorch-cuda=12.1` for CUDA 12.1. If you have `mamba`, use `mamba` instead of `conda` for faster solving. See this [Github issue](https://github.com/unslothai/unsloth/issues/73) for help on debugging Conda installs.
 ```bash
-conda create --name unsloth_env python=3.10
+conda create --name unsloth_env \
+    python=3.10 \
+    pytorch-cuda=<11.8/12.1> \
+    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
+    -y
 conda activate unsloth_env
 
-conda install pytorch-cuda=<12.1/11.8> pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers
-
 pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
 
-pip install --no-deps trl peft accelerate bitsandbytes
+pip install --no-deps "trl<0.9.0" peft accelerate bitsandbytes
 ```
 
 ### Pip Installation
@@ -162,7 +164,7 @@ pip install --no-deps packaging ninja einops flash-attn xformers trl peft accele
 
 # Pre Ampere RTX 2080, T4, GTX 1080 GPUs:
 pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
-pip install --no-deps xformers trl peft accelerate bitsandbytes
+pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes
 ```
 7. For Pytorch 2.3.0: Use the `"ampere"` path for newer RTX 30xx GPUs or higher.
 ```bash
@@ -257,7 +259,7 @@ trainer.train()
 # (1) Saving to GGUF / merging to 16bit for vLLM
 # (2) Continued training from a saved LoRA adapter
 # (3) Adding an evaluation loop / OOMs
-# (4) Cutomized chat templates
+# (4) Customized chat templates
 ```
 
 <a name="DPO"></a>
````
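
For instance, picking CUDA 12.1, the new multi-line `conda create` above instantiates to the following (a sketch that only substitutes the `<11.8/12.1>` placeholder):

```bash
conda create --name unsloth_env \
    python=3.10 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
conda activate unsloth_env
```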
4 changes: 3 additions & 1 deletion unsloth/models/mistral.py
```diff
@@ -512,7 +512,7 @@ def from_pretrained(
         if "n_total_devices >" not in inner_training_loop:
             raise RuntimeError(
                 "Our OSS was designed for people with few GPU resources to level the playing field.\n"
-                "The OSS Apache 2 license only supports four GPUs - please obtain a commercial license from our website.\n"
+                "The OSS Apache 2 license only supports one GPU - please obtain a commercial license.\n"
                 "We're a 2 person team, so we still have to fund our development costs - thanks!\n"
                 "If you don't, please consider at least sponsoring us through Ko-fi! Appreciate it!",
             )
@@ -521,6 +521,7 @@
             "is_sagemaker_mp_enabled()",
             "False",
         )
+        exec(inner_training_loop, globals())
         Trainer._inner_training_loop = _fast_inner_training_loop
 
         # Save max_seq_length
@@ -560,6 +561,7 @@
 
         # Add save modules
         patch_saving_functions(model)
+        Trainer._inner_training_loop = _fast_inner_training_loop
 
         # Save tokenizer for inference purposes
         tokenizer.padding_side = "left" # Force inference
```
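
The two added lines reflect the patching pattern this code relies on: `Trainer._inner_training_loop`'s source is edited as a string, and the edit only takes effect once that string is re-executed into module globals with `exec`, after which the rebuilt function can be bound. A minimal, self-contained sketch of the pattern (illustrative names, not Unsloth's actual code):

```python
import inspect

def greet():
    return "hello from the original"

# Grab the function's source, edit it as text, then re-execute it so module
# globals hold the rebuilt function. This mirrors what the diff does with
# inner_training_loop before binding Trainer._inner_training_loop.
source = inspect.getsource(greet)
source = source.replace("original", "patched version")
exec(source, globals())

print(greet())  # -> "hello from the patched version"
```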
8 changes: 3 additions & 5 deletions unsloth/models/qwen2.py
```diff
@@ -12,9 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-from .llama import *
-import os
-from ._utils import __version__
+from .mistral import *
 
 from transformers.models.qwen2.modeling_qwen2 import (
     Qwen2Attention,
@@ -34,7 +32,7 @@
 pass
 
 
-class FastQwen2Model(FastLlamaModel):
+class FastQwen2Model(FastMistralModel):
 
     @staticmethod
     def pre_patch():
@@ -72,7 +70,7 @@ def from_pretrained(
         trust_remote_code = False,
         **kwargs,
     ):
-        return FastLlamaModel.from_pretrained(
+        return FastMistralModel.from_pretrained(
             model_name = model_name,
             max_seq_length = max_seq_length,
             dtype = dtype,
```
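
The net effect of this file is that `FastQwen2Model` now rides on the Mistral fast path (`FastMistralModel`) instead of the Llama one, while the user-facing loading API is unchanged. A typical call, sketched with an illustrative checkpoint name:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen2-7B",  # illustrative checkpoint
    max_seq_length = 2048,
    dtype = None,         # None auto-detects (bfloat16 where supported)
    load_in_4bit = True,  # 4-bit loading for QLoRA-style finetuning
)
```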
