Support rate limiting in embeddings API #579

Closed
AyushExel opened this issue Oct 17, 2023 · 0 comments · Fixed by #614

Comments

@AyushExel
Contributor

Most LLM APIs and their derivatives have some form of rate limiting, and the trial versions for testing have smaller limits. When using the new embeddings API, the calls to model APIs are made implicitly when data is added to the tables. There are ways to manually rate limit by sleeping for a few seconds when adding data in smaller batches or individual rows. But when adding data in larger batches using the registered EmbeddingFunction instance, there is no way to prevent hitting the rate limit.

There are 2 types of rate limits that we could potentially support:

  • Request level - Allow the user to set the RPM (requests per minute) when initializing the EmbeddingFunction instance. It keeps a rolling count of requests made in the last 60 seconds and sleeps when the limit is reached. Each call to EmbeddingFunction.generate_embeddings can be assumed to be one batched request (a minimal sketch follows this list).
  • Token level - We should simply provide an interface for the user to handle this case themselves, as handling it at a lower level can be tricky because there are two cases. Case 1: the token rate limit is exceeded because the combined tokens of multiple texts exceed the limit; this can be handled the same way as the request limit, by waiting it out. Case 2: the token limit is exceeded by a single text, so the solution is to either chunk it or simply truncate everything after the token limit (see the truncation sketch after the API example below). The token limit can also be applied in combination with the request limit.
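
As a rough illustration of the request-level idea, here is a minimal sketch of a rolling-window limiter. The RequestRateLimiter class and its usage are hypothetical, not LanceDB's actual implementation; it assumes one generate_embeddings call equals one provider request.

import time
from collections import deque

class RequestRateLimiter:
    """Allow at most `rpm` calls in any rolling 60-second window."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.calls = deque()  # monotonic timestamps of recent calls

    def wait(self):
        now = time.monotonic()
        # Drop timestamps that have fallen out of the 60-second window.
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        if len(self.calls) >= self.rpm:
            # Sleep until the oldest call in the window expires.
            time.sleep(60 - (now - self.calls[0]))
        self.calls.append(time.monotonic())

limiter = RequestRateLimiter(rpm=10)
for batch in range(12):
    limiter.wait()  # each iteration stands in for one batched embedding request
    print(f"request {batch} sent")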

Something like this:

cohere = EmbeddingFunctionRegistry().get_instance("model_name", rate_limit=10, token_limit=1000000)
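
For the token-level case, here is a minimal sketch of what "chunk or truncate" could look like on the user's side. The whitespace tokenizer and the max_tokens value are illustrative assumptions; real providers count tokens with their own tokenizers.

def truncate_to_token_limit(text: str, max_tokens: int) -> str:
    # Crude whitespace "tokenization", for illustration only.
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

def chunk_by_token_limit(text: str, max_tokens: int) -> list[str]:
    # Split one oversized text into several requests instead of dropping content.
    tokens = text.split()
    return [" ".join(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]

long_text = "word " * 5000
print(len(truncate_to_token_limit(long_text, 1000).split()))  # 1000
print(len(chunk_by_token_limit(long_text, 1000)))             # 5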
AyushExel added a commit that referenced this issue Oct 18, 2023
Sets things up for this -> #579
- Just separates out the registry/ingestion code from the function
implementation code
- adds a `get_registry` util
- package name "open-clip" -> "open-clip-torch"
AyushExel added a commit that referenced this issue Nov 2, 2023
…g functions (#614)

Users ingesting data using rate-limited APIs don't need to manually make
the process sleep to counter rate limits
resolves #579