Nightly (unslothai#632)

* Update llama.py

* offload

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* Update llama.py

* continued pretraining trainer

* Update trainer.py

* Update trainer.py

* Update trainer.py

* Update trainer.py

* is_bfloat16_supported

* Update __init__.py

* Update README.md

* Update llama.py

* is_bfloat16_supported

* Update __init__.py

* Mistral v3

* Phi 3 medium

* Update chat_templates.py

* Update chat_templates.py

* Phi-3

* Update save.py

* Update README.md

Mistral v3 to Mistral v0.3

* Untrained tokens

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update save.py

* Update save.py

* Update save.py

* checkpoint

* Update _utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update tokenizer_utils.py

* Update llama.py

* accelerate

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update _utils.py

* Update tokenizer_utils.py

* train_dataloader

* Update llama.py

* Update llama.py

* Update llama.py

* use_fast_convert

* Update save.py

* Update save.py

* Update save.py

* Update save.py

* remove_special_tokens

* Ollama

* Update chat_templates.py

* Update chat_templates.py

* Update chat_templates.py

* Update llama.py

* Update chat_templates.py

* Support bfloat16 GGUF

* Update save.py

* Update llama.py

* fast_forward_inference

* Update mapper.py

* Update loader.py

* Update llama.py

* Update tokenizer_utils.py

* info

* edits

* Create chat template

* Fix tokenizer

* Update tokenizer_utils.py

* fix case where gguf saving fails due to first_conversion dtype (unslothai#630)

* Support revision parameter in FastLanguageModel.from_pretrained (unslothai#629)

* support `revision` parameter

* match unsloth formatting of named parameters

* clears any selected_adapters before calling internal_model.save_pretrained (unslothai#609)

* Update __init__.py (unslothai#602)

Check for incompatible modules before importing unsloth

* Fixed unsloth/tokenizer_utils.py for chat training (unslothai#604)

* Add GGML saving option to Unsloth for easier Ollama model creation and testing. (unslothai#345)

* Add save to llama.cpp GGML to save.py.

* Fix conversion command and path of convert to GGML function.

* Add autosaving lora to the GGML function

* Create lora save function for conversion to GGML

* Test fix unslothai#2 for saving lora

* Test fix unslothai#3 to save the lora adapters to convert to GGML

* Remove unwanted tokenizer saving for conversion to GGML and add a few print statements.

* The tokenizer was needed for saving, so added it back; also made it more Unsloth-style by using positional arguments, and added a few messages.

* Positional arguments didn't work out, so reverted to the older version of the code and added a few comments.

* Test fix 1 for arch

* Test fix 2 for a new Mistral error.

* Test fix 3

* Revert to old version for testing.

* Upload issue test fix 1

* Fix 2 uploading ggml

* Positional args added.

* Temporarily remove positional args

* Fix upload again!!!

* Add print statements and fix link

* Make the calling name better

* Create local saving for GGML

* Add choosing directory to save local GGML.

* Fix a small variable error in the save_to_custom_dir func

* docs: Add LoraConfig parameters documentation (unslothai#619)

* llama.cpp failing (unslothai#371)

llama.cpp is failing to generate quantized versions of the trained models.

Error:

```bash
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && LLAMA_CUDA=1 make all -j
Once that's done, redo the quantization.
```

But when I clone it recursively, it works.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* fix libcuda_dirs import for triton 3.0 (unslothai#227)

* fix libcuda_dirs import for triton 3.0

* Update __init__.py

* Update __init__.py

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Update save.py

* Update __init__.py

* Update fast_lora.py

* Update save.py

* Update save.py

* Update save.py

* Update loader.py

* Update save.py

* Update save.py

* quantize now llama-quantize

* Update chat_templates.py

* Update loader.py

* Update mapper.py

* Update __init__.py

* embedding size

---------

Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com>
Co-authored-by: Eliot Hall <60240707+chrehall68@users.noreply.github.com>
Co-authored-by: Rickard Edén <rickardeden@gmail.com>
Co-authored-by: XiaoYang <xyangk@gmail.com>
Co-authored-by: Oseltamivir <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: mahiatlinux <110882203+mahiatlinux@users.noreply.github.com>
Co-authored-by: Sébastien De Greef <sebdg@binarycompany.com>
Co-authored-by: Alberto Ferrer <albertof@barrahome.org>
Co-authored-by: Thomas Viehmann <tv.github-private@beamnet.de>
10 people committed Jun 14, 2024
1 parent 1cd72b2 commit b0648a0
Showing 8 changed files with 255 additions and 159 deletions.
29 changes: 25 additions & 4 deletions unsloth/__init__.py
@@ -14,8 +14,20 @@
 import os
 import warnings
 import importlib
+import sys
+from packaging.version import Version
 
-# Currently only supports 1 GPU, or else seg faults will occur.
+# Define a list of modules to check
+MODULES_TO_CHECK = ["peft", "bitsandbytes"]
+
+# Check if any of the modules in the list have been imported
+for module in MODULES_TO_CHECK:
+    if module in sys.modules:
+        raise ImportError(f"Unsloth: Please import Unsloth before {module}.")
+    pass
+pass
+
+# Currently only supports 1 GPU, or else seg faults will occur.
 if "CUDA_VISIBLE_DEVICES" in os.environ:
     os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
     devices = os.environ["CUDA_VISIBLE_DEVICES"]
@@ -66,8 +78,14 @@ def is_bf16_supported(): return SUPPORTS_BFLOAT16
 
 # Try loading bitsandbytes and triton
 import bitsandbytes as bnb
+
 import triton
-from triton.common.build import libcuda_dirs
+libcuda_dirs = lambda: None
+if Version(triton.__version__) >= Version("3.0.0"):
+    try: from triton.backends.nvidia.driver import libcuda_dirs
+    except: pass
+else: from triton.common.build import libcuda_dirs
+
 import os
 import re
 import numpy as np
@@ -103,8 +121,11 @@ def is_bf16_supported(): return SUPPORTS_BFLOAT16
     importlib.reload(bnb)
     importlib.reload(triton)
     try:
-        import bitsandbytes as bnb
-        from triton.common.build import libcuda_dirs
+        libcuda_dirs = lambda: None
+        if Version(triton.__version__) >= Version("3.0.0"):
+            try: from triton.backends.nvidia.driver import libcuda_dirs
+            except: pass
+        else: from triton.common.build import libcuda_dirs
         cdequantize_blockwise_fp32 = bnb.functional.lib.cdequantize_blockwise_fp32
         libcuda_dirs()
     except:
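The new guard above means `peft` and `bitsandbytes` must not be imported before Unsloth. A minimal sketch of the import order this enforces (illustrative user code, not part of the diff):

```python
# Importing Unsloth first satisfies the MODULES_TO_CHECK guard;
# importing peft or bitsandbytes beforehand would raise the new ImportError.
import unsloth                      # must come first
from unsloth import FastLanguageModel

import peft                         # safe: Unsloth is already loaded
import bitsandbytes as bnb          # safe: Unsloth is already loaded
```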
2 changes: 1 addition & 1 deletion unsloth/chat_templates.py
@@ -1286,7 +1286,7 @@ def test_hf_gguf_equivalence(tokenizer, gguf_model = "./model-unsloth.F16.gguf")
     pass
 
     for prompt in prompts:
-        command = f"./llama.cpp/main -m {gguf_model} -n 0 --temp 0.0 --verbose-prompt "\
+        command = f"./llama.cpp/llama-cli -m {gguf_model} -n 0 --temp 0.0 --verbose-prompt "\
            f"--check-tensors -p '{prompt}'"
 
        datas = []
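The only change here is the renamed llama.cpp binary (`main` → `llama-cli`). A small sketch, assuming a locally built llama.cpp and an existing GGUF file, of invoking it the same way the test helper above does (paths and prompt are illustrative):

```python
# Sketch only: run the renamed llama.cpp CLI against a GGUF file,
# mirroring the command string built in test_hf_gguf_equivalence above.
import subprocess

gguf_model = "./model-unsloth.F16.gguf"   # illustrative path
prompt = "Hello!"                          # illustrative prompt
command = (
    f"./llama.cpp/llama-cli -m {gguf_model} -n 0 --temp 0.0 --verbose-prompt "
    f"--check-tensors -p '{prompt}'"
)
subprocess.run(command, shell = True, check = True)
```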
1 change: 1 addition & 0 deletions unsloth/kernels/__init__.py
@@ -24,6 +24,7 @@
 )
 from .fast_lora import (
     get_lora_parameters,
+    get_lora_parameters_bias,
     apply_lora_mlp_swiglu,
     apply_lora_mlp_geglu_exact,
     apply_lora_mlp_geglu_approx,
8 changes: 7 additions & 1 deletion unsloth/kernels/fast_lora.py
@@ -13,7 +13,13 @@
 # limitations under the License.
 
 import torch
-from .utils import fast_dequantize, QUANT_STATE, get_lora_parameters, matmul_lora
+from .utils import (
+    fast_dequantize,
+    QUANT_STATE,
+    get_lora_parameters,
+    get_lora_parameters_bias,
+    matmul_lora,
+)
 
 
 class LoRA_MLP(torch.autograd.Function):
68 changes: 38 additions & 30 deletions unsloth/models/loader.py
@@ -33,11 +33,8 @@
 
 def _get_model_name(model_name, load_in_4bit = True):
 
-    # First try replacing lowercase 'b' with uppercase 'B'
-    model_name = model_name.lower()
-
     if not SUPPORTS_FOURBIT and model_name in INT_TO_FLOAT_MAPPER:
-        model_name = INT_TO_FLOAT_MAPPER[model_name]
+        model_name = INT_TO_FLOAT_MAPPER[model_name.lower()]
         logger.warning_once(
             f"Unsloth: Your transformers version of {transformers_version} does not support native "\
             f"4bit loading.\nThe minimum required version is 4.37.\n"\
@@ -47,15 +44,15 @@ def _get_model_name(model_name, load_in_4bit = True):
         )
 
     elif not load_in_4bit and model_name in INT_TO_FLOAT_MAPPER:
-        new_model_name = INT_TO_FLOAT_MAPPER[model_name]
+        new_model_name = INT_TO_FLOAT_MAPPER[model_name.lower()]
         # logger.warning_once(
         #     f"Unsloth: You passed in `{model_name}` which is a 4bit model, yet you set\n"\
         #     f"`load_in_4bit = False`. We shall load `{new_model_name}` instead."
         # )
         model_name = new_model_name
 
     elif load_in_4bit and SUPPORTS_FOURBIT and model_name in FLOAT_TO_INT_MAPPER:
-        new_model_name = FLOAT_TO_INT_MAPPER[model_name]
+        new_model_name = FLOAT_TO_INT_MAPPER[model_name.lower()]
         # logger.warning_once(
         #     f"Unsloth: You passed in `{model_name}` and `load_in_4bit = True`.\n"\
         #     f"We shall load `{new_model_name}` for 4x faster loading."
Expand All @@ -70,17 +67,18 @@ def _get_model_name(model_name, load_in_4bit = True):
class FastLanguageModel(FastLlamaModel):
@staticmethod
def from_pretrained(
model_name = "unsloth/llama-3-8b-bnb-4bit",
max_seq_length = None,
dtype = None,
load_in_4bit = True,
token = None,
device_map = "sequential",
rope_scaling = None,
fix_tokenizer = True,
trust_remote_code = False,
use_gradient_checkpointing = True,
resize_model_vocab = None,
model_name = "unsloth/llama-3-8b-bnb-4bit",
max_seq_length = None,
dtype = None,
load_in_4bit = True,
token = None,
device_map = "sequential",
rope_scaling = None,
fix_tokenizer = True,
trust_remote_code = False,
use_gradient_checkpointing = "unsloth",
resize_model_vocab = None,
revision = None,
*args, **kwargs,
):
if token is None and "HF_TOKEN" in os.environ:
@@ -95,12 +93,12 @@ def from_pretrained(
         # First check if it's a normal model via AutoConfig
         is_peft = False
         try:
-            model_config = AutoConfig.from_pretrained(model_name, token = token)
+            model_config = AutoConfig.from_pretrained(model_name, token = token, revision = revision)
             is_peft = False
         except:
             try:
                 # Most likely a PEFT model
-                peft_config = PeftConfig.from_pretrained(model_name, token = token)
+                peft_config = PeftConfig.from_pretrained(model_name, token = token, revision = revision)
             except:
                 raise RuntimeError(f"Unsloth: `{model_name}` is not a full model or a PEFT model.")
 
@@ -143,22 +141,24 @@ def from_pretrained(
         pass
 
         model, tokenizer = dispatch_model.from_pretrained(
-            model_name = model_name,
-            max_seq_length = max_seq_length,
-            dtype = dtype,
-            load_in_4bit = load_in_4bit,
-            token = token,
-            device_map = device_map,
-            rope_scaling = rope_scaling,
-            fix_tokenizer = fix_tokenizer,
-            model_patcher = dispatch_model,
-            tokenizer_name = tokenizer_name,
+            model_name        = model_name,
+            max_seq_length    = max_seq_length,
+            dtype             = dtype,
+            load_in_4bit      = load_in_4bit,
+            token             = token,
+            device_map        = device_map,
+            rope_scaling      = rope_scaling,
+            fix_tokenizer     = fix_tokenizer,
+            model_patcher     = dispatch_model,
+            tokenizer_name    = tokenizer_name,
+            trust_remote_code = trust_remote_code,
+            revision          = revision if not is_peft else None,
             *args, **kwargs,
         )
 
         if resize_model_vocab is not None:
             model.resize_token_embeddings(resize_model_vocab)
         pass
 
         # In case the model supports tagging, add the unsloth tag.
         if hasattr(model, "add_model_tags"):
@@ -188,8 +188,16 @@ def from_pretrained(
         pass
 
         if is_peft:
+            # From https://github.com/huggingface/peft/issues/184
             # Now add PEFT adapters
-            model = PeftModel.from_pretrained(model, old_model_name, token = token)
+            model.enable_input_require_grads()
+            model = PeftModel.from_pretrained(
+                model,
+                old_model_name,
+                token = token,
+                revision = revision,
+                is_trainable = True,
+            )
             # Patch it as well!
             model = dispatch_model.patch_peft_model(model, use_gradient_checkpointing)
         pass
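Taken together, these changes let callers pin a checkpoint with `revision` and make `"unsloth"` the default gradient-checkpointing mode. A minimal usage sketch (the model name and revision value are illustrative):

```python
# Sketch of calling the updated FastLanguageModel.from_pretrained signature.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = 2048,
    load_in_4bit   = True,
    revision       = "main",                     # branch, tag, or commit hash to pin the checkpoint
    use_gradient_checkpointing = "unsloth",      # the new default shown above
)
```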
3 changes: 3 additions & 0 deletions unsloth/models/mapper.py
@@ -186,6 +186,9 @@
     "unsloth/Qwen2-70B-Instruct-bnb-4bit" : (
         "Qwen/Qwen2-70B-Instruct",
     ),
+    "mistralai/Codestral-22B-v0.1" : (
+        "mistral-community/Codestral-22B-v0.1",
+    ),
 }
 
 INT_TO_FLOAT_MAPPER = {}
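For context, the loader change earlier in this commit lowercases the model name before consulting these mapper tables, so lookups are case-insensitive. An illustrative sketch of that behavior (the table entry and helper function are assumptions, not the library's actual code):

```python
# Hypothetical, simplified view of the case-insensitive mapper lookup used in loader.py.
INT_TO_FLOAT_MAPPER = {
    # illustrative entry: lowercased 4-bit repo id -> full-precision repo id
    "unsloth/qwen2-70b-instruct-bnb-4bit": "Qwen/Qwen2-70B-Instruct",
}

def to_float_name(model_name: str) -> str:
    # Mirrors `INT_TO_FLOAT_MAPPER[model_name.lower()]`, with a fallback to the input name.
    return INT_TO_FLOAT_MAPPER.get(model_name.lower(), model_name)

print(to_float_name("Unsloth/Qwen2-70B-Instruct-BNB-4bit"))  # resolves despite mixed case
```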