[Local App Snippet] support non conversational LLMs #954
Merged

Commits (18):
8239d41 [Local App Snippet] support non conversational LLMs (mishig25)
36761d5 llama_cpp_python (mishig25)
c94a986 Apply suggestions from code review (mishig25)
f61ac3c Apply suggestions from code review (mishig25)
bd09de1 Add test cases (mishig25)
968bc02 prefer to use array const (mishig25)
524e965 real examples (mishig25)
de46212 "once upon a time," example (mishig25)
6761b20 fix rebase (mishig25)
04e8f0c fix imports (mishig25)
d4e7fcd simplify strings (mishig25)
0485bd1 format (mishig25)
2e7c080 use shared example message (mishig25)
31632a1 vLLM VLM snippet support (mishig25)
d6c5b5b match naming (mishig25)
fa745eb llama_cpp_python use same snippet (mishig25)
6bc8650 Merge branch 'main' into non_conv_models (mishig25)
0e7cf59 lint (mishig25)
@@ -0,0 +1,123 @@
import { describe, expect, it } from "vitest";
import { LOCAL_APPS } from "./local-apps.js";
import type { ModelData } from "./model-data.js";

describe("local-apps", () => {
	it("llama.cpp conversational", async () => {
		const { snippet: snippetFunc } = LOCAL_APPS["llama.cpp"];
		const model: ModelData = {
			id: "bartowski/Llama-3.2-3B-Instruct-GGUF",
			tags: ["conversational"],
			inference: "",
		};
		const snippet = snippetFunc(model);

		expect(snippet[0].content).toEqual(`# Load and run the model:
llama-cli \\
--hf-repo "bartowski/Llama-3.2-3B-Instruct-GGUF" \\
--hf-file {{GGUF_FILE}} \\
-p "You are a helpful assistant" \\
--conversation`);
	});

	it("llama.cpp non-conversational", async () => {
		const { snippet: snippetFunc } = LOCAL_APPS["llama.cpp"];
		const model: ModelData = {
			id: "mlabonne/gemma-2b-GGUF",
			tags: [],
			inference: "",
		};
		const snippet = snippetFunc(model);

		expect(snippet[0].content).toEqual(`# Load and run the model:
llama-cli \\
--hf-repo "mlabonne/gemma-2b-GGUF" \\
--hf-file {{GGUF_FILE}} \\
-p "Once upon a time,"`);
	});

	it("vLLM conversational llm", async () => {
		const { snippet: snippetFunc } = LOCAL_APPS["vllm"];
		const model: ModelData = {
			id: "meta-llama/Llama-3.2-3B-Instruct",
			pipeline_tag: "text-generation",
			tags: ["conversational"],
			inference: "",
		};
		const snippet = snippetFunc(model);

		expect((snippet[0].content as string[]).join("\n")).toEqual(`# Load and run the model:
vllm serve "meta-llama/Llama-3.2-3B-Instruct"
# Call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \\
-H "Content-Type: application/json" \\
--data '{
"model": "meta-llama/Llama-3.2-3B-Instruct",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'`);
	});

	it("vLLM non-conversational llm", async () => {
		const { snippet: snippetFunc } = LOCAL_APPS["vllm"];
		const model: ModelData = {
			id: "meta-llama/Llama-3.2-3B",
			tags: [""],
			inference: "",
		};
		const snippet = snippetFunc(model);

		expect((snippet[0].content as string[]).join("\n")).toEqual(`# Load and run the model:
vllm serve "meta-llama/Llama-3.2-3B"
# Call the server using curl:
curl -X POST "http://localhost:8000/v1/completions" \\
-H "Content-Type: application/json" \\
--data '{
"model": "meta-llama/Llama-3.2-3B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'`);
	});

	it("vLLM conversational vlm", async () => {
		const { snippet: snippetFunc } = LOCAL_APPS["vllm"];
		const model: ModelData = {
			id: "meta-llama/Llama-3.2-11B-Vision-Instruct",
			pipeline_tag: "image-text-to-text",
			tags: ["conversational"],
			inference: "",
		};
		const snippet = snippetFunc(model);

		expect((snippet[0].content as string[]).join("\n")).toEqual(`# Load and run the model:
vllm serve "meta-llama/Llama-3.2-11B-Vision-Instruct"
# Call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \\
-H "Content-Type: application/json" \\
--data '{
"model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in one sentence."
},
{
"type": "image_url",
"image_url": {
"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
}
}
]
}
]
}'`);
	});
});
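For orientation, a minimal sketch of the branching these llama.cpp tests exercise: the snippet builder presumably checks the model's "conversational" tag and either emits a chat-style prompt with --conversation or a plain text-completion prompt. The function name and exact string assembly below are illustrative assumptions, not the actual local-apps.ts implementation.

import type { ModelData } from "./model-data.js";

// Illustrative sketch only; the real builder lives in local-apps.ts and may differ.
function llamaCppSnippetSketch(model: ModelData): string {
	// Conversational models get a chat prompt plus --conversation;
	// base (non-conversational) models get a plain completion prompt.
	const isConversational = model.tags?.includes("conversational");
	const lines = [
		"# Load and run the model:",
		"llama-cli \\",
		`--hf-repo "${model.id}" \\`,
		"--hf-file {{GGUF_FILE}} \\",
	];
	if (isConversational) {
		lines.push('-p "You are a helpful assistant" \\', "--conversation");
	} else {
		lines.push('-p "Once upon a time,"');
	}
	return lines.join("\n");
}

The vLLM cases above follow the same idea: the "conversational" tag selects /v1/chat/completions with a "messages" payload (multimodal content for image-text-to-text models), while non-conversational models fall back to /v1/completions with a plain "prompt".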
@@ -0,0 +1,54 @@
import { describe, expect, it } from "vitest";
import type { ModelData } from "./model-data.js";
import { llama_cpp_python } from "./model-libraries-snippets.js";

describe("model-libraries-snippets", () => {
	it("llama_cpp_python conversational", async () => {
		const model: ModelData = {
			id: "bartowski/Llama-3.2-3B-Instruct-GGUF",
			pipeline_tag: "text-generation",
			tags: ["conversational"],
			inference: "",
		};
		const snippet = llama_cpp_python(model);

		expect(snippet.join("\n")).toEqual(`from llama_cpp import Llama

llm = Llama.from_pretrained(
repo_id="bartowski/Llama-3.2-3B-Instruct-GGUF",
filename="{{GGUF_FILE}}",
)

llm.create_chat_completion(
messages = [
{
"role": "user",
"content": "What is the capital of France?"
}
]
)`);
	});

	it("llama_cpp_python non-conversational", async () => {
		const model: ModelData = {
			id: "mlabonne/gemma-2b-GGUF",
			tags: [""],
			inference: "",
		};
		const snippet = llama_cpp_python(model);

		expect(snippet.join("\n")).toEqual(`from llama_cpp import Llama

llm = Llama.from_pretrained(
repo_id="mlabonne/gemma-2b-GGUF",
filename="{{GGUF_FILE}}",
)

output = llm(
"Once upon a time,",
max_tokens=512,
echo=True
)
print(output)`);
	});
});
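Along the same lines, a hedged sketch of what llama_cpp_python appears to do per these tests: emit a shared Llama.from_pretrained loading block, then either a create_chat_completion call or a plain completion call depending on the "conversational" tag. The helper name and formatting here are assumptions for illustration; the authoritative code is in model-libraries-snippets.ts.

import type { ModelData } from "./model-data.js";

// Illustrative sketch only; see model-libraries-snippets.ts for the real implementation.
function llamaCppPythonSketch(model: ModelData): string[] {
	// Loading preamble shared by both variants.
	const load = `from llama_cpp import Llama

llm = Llama.from_pretrained(
repo_id="${model.id}",
filename="{{GGUF_FILE}}",
)
`;
	// Chat-style call for conversational models, plain completion otherwise.
	const inference = model.tags?.includes("conversational")
		? `llm.create_chat_completion(
messages = [
{"role": "user", "content": "What is the capital of France?"}
]
)`
		: `output = llm(
"Once upon a time,",
max_tokens=512,
echo=True
)
print(output)`;
	return [load, inference];
}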
Review comment: ah nice!