Support rate limiting in embeddings API #579
AyushExel added a commit that referenced this issue on Oct 18, 2023:
Sets things up for this -> #579 - Just separates out the registry/ingestion code from the function implementation code - adds a `get_registry` util - package name "open-clip" -> "open-clip-torch"
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this issue on Apr 5, 2024:
Sets things up for this -> lancedb#579 - Just separates out the registry/ingestion code from the function implementation code - adds a `get_registry` util - package name "open-clip" -> "open-clip-torch"
raghavdixit99 pushed a commit to raghavdixit99/lancedb that referenced this issue on Apr 5, 2024:
…g functions (lancedb#614) Users ingesting data using rate limited apis don't need to manually make the process sleep for counter rate limits resolves lancedb#579
westonpace pushed a commit that referenced this issue on Apr 5, 2024:
Sets things up for this -> #579 - Just separates out the registry/ingestion code from the function implementation code - adds a `get_registry` util - package name "open-clip" -> "open-clip-torch"
alexkohler pushed a commit to alexkohler/lancedb that referenced this issue on Apr 20, 2024.
Most LLM APIs and their derivatives have some form of rate limiting, and the trial versions used for testing have smaller limits. With the new embeddings API, the calls to model APIs are made implicitly when data is added to the tables. There are some ways to rate limit manually, such as sleeping for a few seconds when adding data in smaller batches or individual rows. But when adding data in larger batches using the ingested `EmbeddingFunction` instance, there is no way to prevent hitting the rate limit.

There are 2 types of rate limits that we could potentially support. One of them can be handled inside the `EmbeddingFunction` instance: it keeps a rolling count of requests made in the last 60 seconds and sleeps if the limit is reached. Each call to `EmbeddingFunction.generate_embeddings` can be assumed to be 1 batched request. Something like this:
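The rolling-count idea above could be sketched roughly as follows. This is a minimal illustration of the approach, not LanceDB's actual implementation: the wrapper class name and the `requests_per_minute` parameter are hypothetical, and only the `generate_embeddings` method name comes from the issue.

```python
import time
from collections import deque


class RateLimitedEmbeddingFunction:
    """Hypothetical wrapper that enforces a requests-per-minute limit
    around an embedding function, using a rolling 60-second window."""

    def __init__(self, inner, requests_per_minute=60):
        self.inner = inner  # the wrapped embedding function
        self.requests_per_minute = requests_per_minute
        self._timestamps = deque()  # monotonic times of recent requests

    def generate_embeddings(self, texts):
        now = time.monotonic()
        # Drop timestamps older than 60 seconds from the rolling window.
        while self._timestamps and now - self._timestamps[0] > 60:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.requests_per_minute:
            # Sleep until the oldest request ages out of the window.
            time.sleep(60 - (now - self._timestamps[0]))
        self._timestamps.append(time.monotonic())
        # Each call counts as one batched request against the limit.
        return self.inner.generate_embeddings(texts)
```

With a high enough limit the wrapper is a transparent pass-through; once `requests_per_minute` calls land inside a 60-second window, the next call sleeps just long enough for the oldest request to expire.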