
Upstream additions #1

Merged (8 commits into rom1504:main, Jul 24, 2021)
Conversation

afiaka87 commented:

No description provided.

jongwook and others added 7 commits July 18, 2021 18:45
* Add truncate_text option to tokenize

  This makes it possible to run tokenize on texts that are longer than the number of tokens that fit in the context length, without having to guess beforehand where to cut the text in characters.

* Add docs, rename the option to just "truncate", use eot_token

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
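The commit above explains the motivation for truncation; as a reference for reviewers, here is a minimal sketch of what that logic looks like, assuming CLIP's standard BPE tokenizer from clip/simple_tokenizer.py and this branch's truncate_text keyword (upstream later named the flag truncate):

```python
from typing import List, Union

import torch
from clip.simple_tokenizer import SimpleTokenizer

_tokenizer = SimpleTokenizer()


def tokenize(texts: Union[str, List[str]], context_length: int = 77,
             truncate_text: bool = False) -> torch.LongTensor:
    """Tokenize texts, optionally truncating anything longer than context_length."""
    if isinstance(texts, str):
        texts = [texts]

    sot_token = _tokenizer.encoder["<|startoftext|>"]
    eot_token = _tokenizer.encoder["<|endoftext|>"]
    all_tokens = [[sot_token] + _tokenizer.encode(text) + [eot_token] for text in texts]
    result = torch.zeros(len(all_tokens), context_length, dtype=torch.long)

    for i, tokens in enumerate(all_tokens):
        if len(tokens) > context_length:
            if truncate_text:
                # Cut the token sequence to the context length and keep the
                # end-of-text marker as the final token, instead of making the
                # caller guess a character cutoff beforehand.
                tokens = tokens[:context_length]
                tokens[-1] = eot_token
            else:
                raise RuntimeError(
                    f"Input {texts[i]} is too long for context length {context_length}"
                )
        result[i, :len(tokens)] = torch.tensor(tokens)

    return result
```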
@afiaka87 (Author) commented:
OpenAI just released two new checkpoints. I haven't tested this branch; I just merged in the upstream changes. It may be a good idea to look them over, since you're more familiar with the codebase.

Looks like they liked the idea of having a truncate option as well, which I refactored back to truncate_text.
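A quick usage sketch of the option (the caption string is just an illustration, and this assumes the branch is installed as the clip package):

```python
import clip

# A deliberately long caption: its BPE encoding far exceeds the 77-token
# context length, so tokenize would raise a RuntimeError without truncation.
long_caption = "a photograph of " + "a very " * 100 + "long caption"

tokens = clip.tokenize(long_caption, truncate_text=True)
print(tokens.shape)  # torch.Size([1, 77])
```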

clip/clip.py (Outdated)

@@ -161,7 +182,7 @@ def patch_float(module):
     return model, _transform(model.input_resolution.item())


-def tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate_text = False) -> torch.LongTensor:
+def tokenize(texts: Union[str, List[str]], context_length: int = 77, truncate_text: bool = False) -> torch.LongTensor:
rom1504 (Owner) commented:

I think they named this truncate; let's do the same here?

rom1504 merged commit 169f6cb into rom1504:main on Jul 24, 2021