
Explicitly specify pad token id when generating tokens #3565

Merged
merged 4 commits into kserve:master from support-to-specify-pad-token on May 2, 2024

Conversation

sivanantha321 (Member)

What this PR does / why we need it:
We added a fallback pad token, used when one is not already present in the tokenizer, as part of PR #3459. However, that change does not explicitly specify pad_token_id when invoking the generate method, which leads to Hugging Face using eos_token_id as the pad_token_id. The log from the Hugging Face server is shown below. To avoid this, we should explicitly specify the pad_token_id.

```
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
```
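For illustration, the fix amounts to threading the tokenizer's pad token id through to generate. A minimal sketch of the idea, assuming a generic causal LM (the model name and variable names are illustrative, not the exact KServe code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Fallback pad token from #3459: add one only if the tokenizer lacks it,
# then grow the embedding table so the new id is valid for the model.
if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    model.resize_token_embeddings(len(tokenizer))

inputs = tokenizer(["my name is teven and i am"], return_tensors="pt", padding=True)
# Without an explicit pad_token_id, transformers falls back to eos_token_id
# and emits the "Setting `pad_token_id` to `eos_token_id`" log shown above.
outputs = model.generate(**inputs, pad_token_id=tokenizer.pad_token_id)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```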

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3536

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and summarize the relevant results. Provide instructions so the tests can be reproduced.
Please also list any relevant details of your test configuration.

  • Test A

  • Test B

  • Logs

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Release note:

Explicitly specify pad token id when generating tokens

@sivanantha321 force-pushed the support-to-specify-pad-token branch 4 times, most recently from c926e10 to 8589658, on April 3, 2024 09:32
sivanantha321 (Member, Author)

supersedes #3535

```
@@ -198,6 +197,16 @@ def load(self) -> bool:
            raise ValueError(
                f"Unsupported task {self.task}. Please check the supported `task` option."
            )
        if not self.tokenizer.pad_token:
```
Member
It looks like this has been moved so it only applies to the predictor case; for the transformer mode, do we still need to apply the padding?

sivanantha321 (Member, Author)

In transformer mode we won't have access to the model to update the vocabulary size. So even if we add the pad token, we will get an index-out-of-range error.
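To illustrate why (a hedged sketch, not the actual code path): a newly added pad token receives an id one past the end of the model's vocabulary, so a tokenizer-only component cannot introduce it safely; only the side holding the model can resize the embedding table.

```python
from transformers import AutoTokenizer

# GPT-2's vocabulary covers ids 0..50256.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # new pad id: 50257

batch = tokenizer(["hi", "hello there"], return_tensors="pt", padding=True)
# If this batch reaches a model whose embeddings were never resized, the pad
# id (50257) indexes past the embedding table and raises an index-out-of-range
# error; the fix requires model.resize_token_embeddings(len(tokenizer)),
# which the transformer (tokenizer-only) mode cannot call.
```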

```python
# request_one is defined just above this hunk (not shown here).
request_two = "my name is teven and i am"
response = asyncio.run(model({"instances": [request_one, request_two]}, headers={}))
assert request_one in response["predictions"][0]
assert request_two in response["predictions"][1]
```
sivanantha321 (Member, Author)
I am asserting it this way because the generated output varies between runs.

Member
I think we can set the temperature to 0 to get a deterministic response.
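As a hedged aside on the mechanics: in Hugging Face transformers, a literal sampling temperature of 0 is typically rejected; the deterministic behavior that "temperature 0" asks for comes from greedy decoding via do_sample=False. An illustrative sketch (model name is arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # demo-only fallback

inputs = tokenizer("my name is teven and i am", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=False,                      # greedy search -> reproducible output
    max_new_tokens=20,
    pad_token_id=tokenizer.pad_token_id,  # passed explicitly, per this PR
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```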

@sivanantha321 force-pushed the support-to-specify-pad-token branch 2 times, most recently from ae81834 to 3b91b2b, on April 6, 2024 16:32
saileshd1402 (Contributor) commented May 2, 2024

Greetings!
I ran into a bug while testing the master branch that might be related to this PR:

Command used to run the HuggingFace backend:

```bash
python -m huggingfaceserver --model_id=microsoft/phi-2 --model_name=phi --backend=huggingface
```

cURL request:

```bash
curl -v -H "Content-Type: application/json" http://localhost:8080/openai/v1/completions -d '{"model":"phi", "prompt":"Hello give me a hello world python program", "stream": true, "max_tokens": 5}'
```

Error:

```
Exception in thread Thread-4 (generate):
Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/.testenv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.testenv/lib/python3.11/site-packages/transformers/generation/utils.py", line 1527, in generate
    result = self._greedy_search(
             ^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/.testenv/lib/python3.11/site-packages/transformers/generation/utils.py", line 2452, in _greedy_search
    raise ValueError("If `eos_token_id` is defined, make sure that `pad_token_id` is defined.")
ValueError: If `eos_token_id` is defined, make sure that `pad_token_id` is defined.
```

After displaying this error, the runtime freezes and won't accept any further requests.

CC: @johnugeorge
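A hedged reconstruction of what is likely going on (names are illustrative, not the actual huggingfaceserver code): in the streaming path, generate runs on a background thread, and if its kwargs omit pad_token_id while the model defines eos_token_id, transformers raises the ValueError above inside that thread.

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Hello give me a hello world python program", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)
kwargs = {
    **inputs,
    "streamer": streamer,
    "max_new_tokens": 5,
    # Passing the pad token id explicitly avoids the ValueError raised when
    # eos_token_id is defined but pad_token_id is not.
    "pad_token_id": tokenizer.pad_token_id or tokenizer.eos_token_id,
}
Thread(target=model.generate, kwargs=kwargs).start()
for text in streamer:
    print(text, end="", flush=True)
```

Because the exception occurs on the generation thread, a request handler waiting on the streamer would never complete, which would explain the observed freeze.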

yuzisun (Member) commented May 2, 2024

@sivanantha321 can you help resolve the merge conflict?

@spolti (Contributor) left a comment:

/lgtm

@oss-prow-bot (bot) removed the lgtm label May 2, 2024
sivanantha321 (Member, Author)

/rerun-all

sivanantha321 (Member, Author)

/rerun-workflow test-llm

sivanantha321 (Member, Author)

/rerun-workflow E2E Tests

sivanantha321 (Member, Author)

/rerun-all

yuzisun (Member) commented May 2, 2024

/approve

oss-prow-bot (bot) commented May 2, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sivanantha321, spolti, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@oss-prow-bot (bot) added the approved label May 2, 2024
yuzisun (Member) commented May 2, 2024

/lgtm

@oss-prow-bot (bot) added the lgtm label May 2, 2024
@yuzisun merged commit 9c6a6b8 into kserve:master May 2, 2024
57 of 58 checks passed
asd981256 pushed a commit to asd981256/kserve that referenced this pull request May 14, 2024
* Add fall back pad token for tokenizer

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Make linter happy

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Update test

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

* Rebase master

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>

---------

Signed-off-by: Sivanantham Chinnaiyan <sivanantham.chinnaiyan@ideas2it.com>
Signed-off-by: asd981256 <asd981256@gmail.com>