Hangs on input longer than 8192 characters #4

Closed
simonw opened this issue Oct 26, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@simonw
Owner

simonw commented Oct 26, 2023

I tried running this in the https://github.com/simonw/hmb-map folder (to get all the node_modules READMEs):

llm embed-multi jina-readmes \
      -m jina-embeddings-v2-small-en \
      --files . '**/README.md' --store \
      --database test.db

I got this:

Embedding [####################################] 100%
Token indices sequence length is longer than the specified maximum sequence length for this model (8367 > 8192). Running this sequence through the model will result in indexing errors
/Users/simon/.cache/huggingface/modules/transformers_modules/jinaai/jina-bert-implementation/a9db86227f71a0bd7bc05e5dda0359f1e09abb0f/modeling_bert.py:774: UserWarning: Increasing alibi size from 8192 to 8367.
warnings.warn(

The process seemed to hang - I had to Ctrl+Z and then kill %1 to exit it.

@simonw simonw added the bug Something isn't working label Oct 26, 2023
@simonw
Owner Author

simonw commented Oct 26, 2023

Truncating to 8192 characters before passing it to the model seemed to fix it.
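
A minimal sketch of that kind of truncation, not the actual change in the commit. The names `MAX_LENGTH` and `truncate_texts` are hypothetical, and the limit is counted in characters (as described above), not in tokens:

```python
# Hypothetical sketch: cap each input at 8192 characters before embedding.
# MAX_LENGTH and truncate_texts are illustrative names, not the plugin's API.
MAX_LENGTH = 8192  # figure taken from the warning message in this issue


def truncate_texts(texts, max_length=MAX_LENGTH):
    """Yield each text cut down to at most max_length characters."""
    for text in texts:
        yield text[:max_length]


if __name__ == "__main__":
    long_readme = "x" * 9000
    for item in truncate_texts([long_readme, "short README"]):
        print(len(item))
```

Cutting by characters is a blunt but cheap guard: it needs no tokenizer, at the cost of silently dropping the tail of very long files.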

@simonw
Owner Author

simonw commented Oct 26, 2023

I couldn't get the new test to fail, but I did see it fail and then work when testing manually on the CLI.

@simonw simonw closed this as completed in 2edd77e Oct 26, 2023
simonw added a commit that referenced this issue Oct 26, 2023