Updated to custom SGMV kernel to fix issue with certain ranks #70

tgaddair · 2023-11-27T18:53:03Z

The existing CUTLASS-based SGMV kernel produces erroneous output for certain LoRA ranks (e.g., 32). More recent versions of Punica have introduced a custom shrink kernel that resolves the issue, but it only supports ranks in the range [16, 128]. This PR therefore uses the custom kernel when the rank falls within that range and otherwise falls back to the legacy kernel, which works for rank 8 but has not been verified for other ranks and needs additional testing.
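The rank-based selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name, the constant names, and the string return values are hypothetical, and the [16, 128] bounds are assumed to be inclusive.

```python
# Illustrative sketch only: all names here are hypothetical and are not
# taken from this PR's diff.
MIN_CUSTOM_RANK = 16   # lower bound supported by the custom shrink kernel
MAX_CUSTOM_RANK = 128  # upper bound, assumed to be inclusive


def select_sgmv_kernel(rank: int) -> str:
    """Return which SGMV shrink kernel to use for a given LoRA rank."""
    if rank <= 0:
        raise ValueError(f"LoRA rank must be positive, got {rank}")
    if MIN_CUSTOM_RANK <= rank <= MAX_CUSTOM_RANK:
        # Rank is inside the range the custom kernel supports.
        return "custom"
    # Outside the supported range: fall back to the legacy CUTLASS-based kernel.
    return "legacy"


if __name__ == "__main__":
    for r in (8, 16, 32, 128, 256):
        print(r, select_sgmv_kernel(r))
```

Under these assumptions, rank 32 (the failing case from the description) is routed to the custom kernel, while rank 8 keeps using the legacy kernel.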

Example:

curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs": "<|system|> You are a helpful assistant <|user|> What is deep learning? </s> <|assistant|>", "parameters": {"max_new_tokens": 64, "adapter_id": "qblocks/mistral_7b_norobots"}}' \
    -H 'Content-Type: application/json'

Compared with transformers:

from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

config = PeftConfig.from_pretrained("qblocks/mistral_7b_norobots")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")
model = PeftModel.from_pretrained(model, "qblocks/mistral_7b_norobots")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "<|system|> You are a helpful assistant <|user|> What is deep learning? </s> <|assistant|>"
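# Move the inputs to the model's device, since device_map="auto" may place the model on GPU
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)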
generation_output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))

Output:

Deep learning is a subset of machine learning that uses artificial neural networks to learn from data. It is a powerful tool for solving complex problems in fields such as natural language processing, computer vision, and speech recognition. Deep learning algorithms can learn from large amounts of data and make predictions or decisions based on that data. They can

Updated vendored kernels to punica-ai/punica@87cb9f5.

Related: #68.

tgaddair added 9 commits November 27, 2023 10:14

Added flashinfer submodule

39c3c11

Removed submodule

297f604

Revert

2202869

Submodule

766b631

Kernel header

7c3c715

Fixed prefix

98009bc

Fallback to legacy SGMV

6129465

Revert

6a22a80

README

8409227

geoffreyangus approved these changes Nov 27, 2023

tgaddair mentioned this pull request Nov 27, 2023

adapters produce unk tokens only #68

Closed

4 tasks

magdyksaleh approved these changes Nov 27, 2023

tgaddair added 2 commits November 27, 2023 11:20

Merge branch 'main' into fix-sgmv-ranks

d366e54

Fixed

d6b85b3

tgaddair merged commit f9409af into main Nov 27, 2023

tgaddair deleted the fix-sgmv-ranks branch November 27, 2023 20:41
