Pad left for decode-only architecture llm models #3537

sivanantha321 · 2024-03-21T04:03:07Z

/kind bug

What steps did you take and what happened:
With decoder-only models such as GPT-2, padding should be applied on the left. The output is a continuation of the input prompt, so with right padding the pad tokens sit between the prompt and the generated tokens and leave gaps in the output.
huggingface/transformers#18388 (comment)
Hugging Face Transformers logs the following warning when it detects right padding with a decoder-only model:

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set padding_side='left' when initializing the tokenizer
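
To make the effect concrete, here is a minimal pure-Python sketch of left versus right padding for a batch of two prompts. It does not use Transformers or KServe code; `pad_batch`, `PAD`, and the token ids are hypothetical names and values chosen only for illustration. In Transformers itself, the equivalent switch is the `padding_side='left'` tokenizer setting named in the warning.

```python
from typing import List, Tuple

PAD = 0  # hypothetical pad token id, used only in this illustration


def pad_batch(seqs: List[List[int]], side: str = "left") -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences to equal length and build the attention mask."""
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    width = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        pad = [PAD] * (width - len(s))
        mask_pad = [0] * len(pad)
        if side == "left":
            input_ids.append(pad + s)
            attention_mask.append(mask_pad + [1] * len(s))
        else:
            input_ids.append(s + pad)
            attention_mask.append([1] * len(s) + mask_pad)
    return input_ids, attention_mask


prompts = [[11, 12, 13], [21]]  # made-up token ids for two prompts
left_ids, left_mask = pad_batch(prompts, side="left")
right_ids, right_mask = pad_batch(prompts, side="right")

# Generation appends new tokens after the last column of each row.
# Left padding: the last column of every row is a real prompt token.
# Right padding: the short prompt ends in PAD, so its continuation
# would be separated from the prompt by pad tokens.
print(left_ids)
print(right_ids)
```

With left padding, every row ends in a real token, so the generated tokens directly follow the prompt for every item in the batch.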

What did you expect to happen:
The tokenizer should pad on the left for decoder-only models.
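
The expected behaviour could be sketched as a small selection rule. This is an assumption about the shape of a fix, not the code merged in KServe; the `Architecture` enum and `padding_side_for` helper are hypothetical names introduced for this example.

```python
from enum import Enum


class Architecture(Enum):
    # Hypothetical labels for this sketch; not KServe or Transformers names.
    ENCODER_ONLY = "encoder_only"
    DECODER_ONLY = "decoder_only"
    ENCODER_DECODER = "encoder_decoder"


def padding_side_for(arch: Architecture) -> str:
    """Return 'left' only for decoder-only models, 'right' otherwise."""
    return "left" if arch is Architecture.DECODER_ONLY else "right"


print(padding_side_for(Architecture.DECODER_ONLY))
```

The returned string would then be passed as the tokenizer's `padding_side`, so that other architectures keep right padding.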

What's the InferenceService yaml:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-gpt2
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
        - "--model_name=gpt2"
        - "--model_id=openai-community/gpt2"
        - "--tensor_input_names=input_ids"
        - "--disable_vllm"
        - "--task=6"
      resources:
        limits:
          nvidia.com/gpu: "1"
        requests:
          nvidia.com/gpu: "1"

Anything else you would like to add:

Environment:

KServe Version: 0.12.0


oss-prow-bot bot added the kind/bug label Mar 21, 2024

sivanantha321 mentioned this issue Mar 21, 2024 in "Only pad left for decode-only architecture models." #3534 (merged)

oss-prow-bot bot closed this as completed in #3534 Mar 30, 2024
