
Updated to custom SGMV kernel to fix issue with certain ranks #70

Merged: tgaddair merged 11 commits on Nov 27, 2023

@tgaddair (Contributor) commented on Nov 27, 2023

The existing CUTLASS-based SGMV kernel produces erroneous output for certain LoRA ranks (e.g., 32). More recent versions of Punica introduce a custom shrink kernel that resolves the issue, but it only supports ranks in the range [16, 128]. This PR therefore uses the custom kernel when the rank falls within that range and otherwise falls back to the legacy kernel. The legacy kernel works for rank 8 but may not work for other ranks outside the range; this needs additional testing.
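The rank-based dispatch described above can be sketched as follows. This is a minimal, hypothetical sketch, not the code in this PR: the names `select_shrink_kernel`, `reference_sgmv_shrink`, `MIN_RANK_CUSTOM`, and `MAX_RANK_CUSTOM` are illustrative, and the [16, 128] range is assumed to be inclusive. The pure-Python reference only illustrates what a segmented shrink is assumed to compute (each adapter's segment of tokens multiplied by that adapter's `hidden x rank` LoRA `A` matrix); it is not the CUDA kernel.

```python
from typing import List, Sequence

Matrix = List[List[float]]

# Range stated in the PR for the custom shrink kernel (assumed inclusive).
MIN_RANK_CUSTOM = 16
MAX_RANK_CUSTOM = 128


def select_shrink_kernel(lora_rank: int) -> str:
    """Hypothetical dispatch: pick which shrink implementation handles a rank."""
    if lora_rank <= 0:
        raise ValueError(f"LoRA rank must be positive, got {lora_rank}")
    if MIN_RANK_CUSTOM <= lora_rank <= MAX_RANK_CUSTOM:
        return "custom"
    # Outside [16, 128]: fall back to the legacy CUTLASS-based kernel,
    # which the PR reports as working for rank 8 (other ranks untested).
    return "cutlass"


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain row-major matrix product x @ w."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def reference_sgmv_shrink(
    x: Matrix, lora_a: Sequence[Matrix], seg_starts: Sequence[int]
) -> Matrix:
    """Reference semantics of a segmented shrink.

    Rows seg_starts[i]:seg_starts[i + 1] of x (the tokens routed to adapter i)
    are multiplied by adapter i's (hidden x rank) A matrix. All adapters in one
    call are assumed to share the same rank.
    """
    out: Matrix = []
    for i, a in enumerate(lora_a):
        start, end = seg_starts[i], seg_starts[i + 1]
        out.extend(matmul(x[start:end], a))
    return out


if __name__ == "__main__":
    for rank in (8, 16, 32, 64, 128, 256):
        print(rank, select_shrink_kernel(rank))
```

A slow reference like `reference_sgmv_shrink` could serve as ground truth when checking a kernel's output at ranks the legacy kernel has not been verified on.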

Example:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs": "<|system|> You are a helpful assistant <|user|> What is deep learning? </s> <|assistant|>", "parameters": {"max_new_tokens": 64, "adapter_id": "qblocks/mistral_7b_norobots"}}' \
    -H 'Content-Type: application/json'
```
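The same request can be sent from Python using only the standard library. This is a sketch that assumes a server listening on `127.0.0.1:8080` as in the curl example; `build_generate_request` and `generate` are hypothetical helper names, and the payload contains only the fields the curl example uses. The response is returned as parsed JSON without assuming a particular schema.

```python
import json
import urllib.request


def build_generate_request(
    prompt: str,
    adapter_id: str,
    max_new_tokens: int = 64,
    base_url: str = "http://127.0.0.1:8080",
) -> urllib.request.Request:
    """Build the same POST /generate request as the curl example."""
    payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "adapter_id": adapter_id},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(req: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_generate_request(
        "<|system|> You are a helpful assistant <|user|> What is deep learning? </s> <|assistant|>",
        "qblocks/mistral_7b_norobots",
    )
    print(generate(req))  # requires a running server
```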

For comparison, the same generation using `transformers` and `peft`:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")
model = PeftModel.from_pretrained(model, "qblocks/mistral_7b_norobots")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

prompt = "<|system|> You are a helpful assistant <|user|> What is deep learning? </s> <|assistant|>"
# Move the inputs to the model's device (device_map="auto" may place it on a GPU)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generation_output = model.generate(**inputs, max_new_tokens=64)
# The decoded text begins with the prompt, followed by the generated completion
print(tokenizer.decode(generation_output[0], skip_special_tokens=True))
```

Output:

Deep learning is a subset of machine learning that uses artificial neural networks to learn from data. It is a powerful tool for solving complex problems in fields such as natural language processing, computer vision, and speech recognition. Deep learning algorithms can learn from large amounts of data and make predictions or decisions based on that data. They can
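When checking a given rank against the `transformers` reference, a simple check is to locate where the two generated texts first differ. The helper below is a hypothetical sketch, not part of this PR. Note that the decoded `transformers` output above starts with the prompt, so only the completion portion should be compared.

```python
from typing import Optional


def first_divergence(a: str, b: str) -> Optional[int]:
    """Return the index of the first differing character, or None if identical.

    If one string is a prefix of the other, the divergence point is the end
    of the shorter string.
    """
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

Greedy decoding on both sides makes an exact match the expected outcome for a correct kernel; an early divergence at a particular rank would point to a kernel problem like the one this PR fixes.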

Updated the vendored kernels to punica-ai/punica@87cb9f5.

Related: #68.

@tgaddair mentioned this pull request on Nov 27, 2023
@tgaddair merged commit f9409af into main on Nov 27, 2023
@tgaddair deleted the fix-sgmv-ranks branch on November 27, 2023 at 20:41