Use a custom exception ValueError subclass for the special tokens warning #290

Open
simonw opened this issue May 2, 2024 · 0 comments

This code here:

tiktoken/tiktoken/core.py

Lines 375 to 383 in 39f29ce

def raise_disallowed_special_token(token: str) -> NoReturn:
    raise ValueError(
        f"Encountered text corresponding to disallowed special token {token!r}.\n"
        "If you want this text to be encoded as a special token, "
        f"pass it to `allowed_special`, e.g. `allowed_special={{{token!r}, ...}}`.\n"
        f"If you want this text to be encoded as normal text, disable the check for this token "
        f"by passing `disallowed_special=(enc.special_tokens_set - {{{token!r}}})`.\n"
        "To disable this check for all special tokens, pass `disallowed_special=()`.\n"
    )

I wanted to do something special on this exception in my own code, so I had to write this:

try:
    tokens = encoding.encode(text, **kwargs)
except ValueError as ex:
    if 'disallowed special token' in str(ex):
        ...  # Do something special
    else:
        # Any other ValueError is not ours to handle
        raise

I suggest having a custom exception class for this instead:

class DisallowedSpecialTokenError(ValueError):
    pass

Raising that class instead would let people like me catch it explicitly, and since it's a subclass of ValueError it should not break any existing code that currently catches ValueError directly.
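
To illustrate the compatibility claim, here is a minimal self-contained sketch. It does not import tiktoken: `raise_disallowed_special_token` below is a stand-in for the helper quoted above with a shortened message, and `handle_new_style` and `handle_old_style` are hypothetical caller functions made up for this example. `DisallowedSpecialTokenError` is the proposed class, which I am assuming does not exist in tiktoken yet.

```python
from typing import NoReturn


class DisallowedSpecialTokenError(ValueError):
    """Proposed subclass; assumed not to exist in tiktoken yet."""


def raise_disallowed_special_token(token: str) -> NoReturn:
    # Stand-in for the tiktoken helper, raising the proposed subclass
    # with a shortened version of the original message.
    raise DisallowedSpecialTokenError(
        f"Encountered text corresponding to disallowed special token {token!r}."
    )


def handle_new_style(token: str) -> str:
    # Hypothetical caller: catches the specific class, no string matching.
    try:
        raise_disallowed_special_token(token)
    except DisallowedSpecialTokenError:
        return "special-token"
    except ValueError:
        return "other-value-error"


def handle_old_style(token: str) -> bool:
    # Hypothetical existing caller: still works, because the new class
    # is a ValueError and the message text is unchanged.
    try:
        raise_disallowed_special_token(token)
    except ValueError as ex:
        return "disallowed special token" in str(ex)
```

With this shape, new code can use `except DisallowedSpecialTokenError`, while code that already catches `ValueError` and inspects the message keeps behaving the same way.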
