
Conversation

Collaborator

@mishig25 mishig25 commented Oct 7, 2024

Description

Most GGUF files on the Hub are instruct/conversational, but not all of them. Previously, the local-app snippets assumed that every GGUF is instruct/conversational.
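At a high level, the snippet logic now branches on whether the model is conversational. A rough TypeScript sketch of the idea (the interface and the `tags` check below are simplified assumptions for illustration, not the exact implementation):

// Hypothetical sketch: pick a chat-style snippet only for conversational models,
// otherwise fall back to a plain text-completion prompt.
interface ModelDataLike {
	id: string;
	tags: string[];
}

function isConversational(model: ModelDataLike): boolean {
	// Assumption: conversational/instruct models carry a "conversational" tag on the Hub.
	return model.tags.includes("conversational");
}

function vllmEndpoint(model: ModelDataLike): string {
	return isConversational(model)
		? "http://localhost:8000/v1/chat/completions" // chat template applied server-side
		: "http://localhost:8000/v1/completions"; // raw prompt, no chat template
}

The examples below show the snippets now generated for non-conversational models.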

vLLM

https://huggingface.co/meta-llama/Llama-3.2-3B?local-app=vllm

mishig@machine:~$ curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
                "model": "meta-llama/Llama-3.2-3B",
                "prompt": "Once upon a time",
                "max_tokens": 150,
                "temperature": 0.5
        }'

{"id":"cmpl-157aad50ba6d45a5a7e2641a3c8157dd","object":"text_completion","created":1728293162,"model":"meta-llama/Llama-3.2-3B","choices":[{"index":0,"text":" there was a man who was very generous and kind to everyone. He was a good man and a good person. One day he was walking down the street and he saw a man who was very poor and starving. The man was so hungry that he was crying and shaking. The man was so hungry that he was crying and shaking. The man was so hungry that he was crying and shaking. The man was so hungry that he was crying and shaking. The man was so hungry that he was crying and shaking. The man was so hungry that he was crying and shaking. The man was so hungry that he was crying and shaking. The man was so hungry that he was crying and shaking. The man was so hungry that he was crying and shaking","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":5,"total_tokens":155,"completion_tokens":150}}

llama.cpp

https://huggingface.co/mlabonne/gemma-2b-GGUF?local-app=llama.cpp

llama-cli \
  --hf-repo "mlabonne/gemma-2b-GGUF" \
  --hf-file gemma-2b.Q2_K.gguf \
  -p "Once upon a time "

llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
        repo_id="mlabonne/gemma-2b-GGUF",
        filename="gemma-2b.Q2_K.gguf",
)

output = llm(
        "Once upon a time ",
        max_tokens=512,
        echo=True
)

print(output)

@mishig25 mishig25 marked this pull request as ready for review October 7, 2024 10:07
@mishig25 mishig25 requested review from ngxson and Vaibhavs10 October 7, 2024 10:07
Base automatically changed from fix_vlmm_snippet to main October 7, 2024 10:08
Member

@Vaibhavs10 Vaibhavs10 left a comment

Minor nit, but important, especially wrt llama.cpp.

@mishig25
Collaborator Author

mishig25 commented Oct 7, 2024

Added test cases, since the examples are getting more complex and we want to make sure we don't break any existing ones:

packages/tasks/src/local-apps.spec.ts & packages/tasks/src/model-libraries-snippets.spec.ts
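A rough sketch of the kind of assertion these specs make, using the vLLM example from the description (this assumes the `LOCAL_APPS` export from local-apps; the fixture shape and exact assertions are simplified for illustration, not the spec verbatim):

import { describe, expect, it } from "vitest";
import { LOCAL_APPS } from "./local-apps";

describe("local-apps snippets", () => {
	it("uses /v1/completions for non-conversational models", () => {
		// Simplified fixture; the real ModelData type has more fields.
		const model = { id: "meta-llama/Llama-3.2-3B", tags: [], inference: "" };
		// eslint-disable-next-line @typescript-eslint/no-explicit-any
		const snippet = LOCAL_APPS["vllm"].snippet(model as any);
		const rendered = JSON.stringify(snippet);
		// Non-conversational models should hit the plain completions endpoint.
		expect(rendered).toContain("/v1/completions");
		expect(rendered).not.toContain("/v1/chat/completions");
	});
});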

`	--data '{`,
`		"model": "${model.id}",`,
`		"messages": [`,
`			{"role": "user", "content": "Hello!"}`,
Member

Suggested change
` {"role": "user", "content": "Hello!"}`,
` {"role": "user", "content": "What is the capital of France?"}`,

Minor suggestion: "Hello!" looks a bit too terse. Perhaps we can unify the instruct examples so they match llama-cpp-python and the others.

Collaborator Author

handled in 2e7c080

@mishig25
Collaborator Author

This PR is finally ready to be reviewed.

Besides the changes described above, the vLLM snippet now also supports vision models:

vllm serve "meta-llama/Llama-3.2-11B-Vision-Instruct"
# Call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

@Vaibhavs10 @pcuenca @julien-c

Member

@pcuenca pcuenca left a comment

Conceptually looks great!

`		]`,
`	}'`,
];
const messages = getModelInputSnippet(model) as ChatCompletionInputMessage[];
Member

ah nice!

Member

@Vaibhavs10 Vaibhavs10 left a comment

Niceee! All good wrt llama.cpp + llama-cpp-python + vllm snippets. Do we need to standardise TGI snippets too?

@mishig25
Collaborator Author

mishig25 commented Nov 18, 2024

Do we need to standardise TGI snippets too?

Yes, but let's do it in a subsequent PR once this one gets merged into main.

All good wrt llama.cpp + llama-cpp-python + vllm snippets

approve?

Member

@Vaibhavs10 Vaibhavs10 left a comment

Merci!

@mishig25 mishig25 merged commit f83bbe6 into main Nov 20, 2024
4 of 5 checks passed
@mishig25 mishig25 deleted the non_conv_models branch November 20, 2024 09:44