Content-Encoding: gzip #136
Comments
Have you considered the gRPC protocol? If you fork the project and start building, that's something I would potentially consider pulling in. Questions:
Does your FastAPI server accept gRPC? I am using your docker container, behind nginx terminating TLS as a reverse proxy. Nginx apparently can proxy gRPC.
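Here is an example of decompression middleware for FastAPI:

    from fastapi import FastAPI
    import gzip

    class GZipRequestMiddleware:
        # Pure ASGI middleware: the downstream app reads the body from `receive`,
        # so the decompressed bytes are replayed through a replacement callable.
        def __init__(self, app):
            self.app = app

        async def __call__(self, scope, receive, send):
            headers = dict(scope.get("headers", []))
            if scope["type"] != "http" or headers.get(b"content-encoding") != b"gzip":
                await self.app(scope, receive, send)
                return
            # Read the complete compressed request body, then decompress it
            body, more_body = b"", True
            while more_body:
                message = await receive()
                body += message.get("body", b"")
                more_body = message.get("more_body", False)
            decompressed_body = gzip.decompress(body)
            replayed = False

            # Hand over the decompressed body once, then defer to the real channel
            async def decompressed_receive():
                nonlocal replayed
                if replayed:
                    return await receive()
                replayed = True
                return {"type": "http.request", "body": decompressed_body, "more_body": False}

            await self.app(scope, decompressed_receive, send)

    app = FastAPI()
    # Add the middleware to the app
    app.add_middleware(GZipRequestMiddleware)

After that, request.body is used just as before. I'll look into gRPC. I need speed.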
@peebles Thanks for the extensive example. https://stackoverflow.com/questions/43628605/does-the-zlib-module-release-the-global-interpreter-lock-gil-in-python-3 -> I assume this will not affect the GIL or performance. Thoughts:
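On the GIL question, one option is to move decompression off the event loop thread. The following is a minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`; `decompress_off_loop` is a hypothetical helper name and not part of Infinity or FastAPI. Whether it helps in practice depends on the zlib module releasing the GIL during decompression, as the linked answer discusses, and would need to be measured.

```python
import asyncio
import gzip


async def decompress_off_loop(body: bytes) -> bytes:
    """Decompress a gzip payload in a worker thread (hypothetical helper)."""
    # asyncio.to_thread runs the blocking call in the default thread pool,
    # so the event loop can keep serving other requests in the meantime.
    return await asyncio.to_thread(gzip.decompress, body)


if __name__ == "__main__":
    # Illustrative payload only: repeated text standing in for a large request
    payload = gzip.compress(b"some document text " * 1000)
    restored = asyncio.run(decompress_off_loop(payload))
    print(len(payload), len(restored))
```

For small bodies the thread hand-off may cost more than the decompression itself, so a size threshold before offloading could be worth considering.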
I am using /rerank, where the input (to you) is a potentially large amount of text and the output is a very small summary: no floats, all text. For /rerank it may make sense to compress the input but not the output, since the output is too small to benefit. As for "I assume this will not affect the GIL or performance" with respect to decompressed_body = gzip.decompress(body): I don't know. I come from more of a Node.js background, where everything is async. On past projects I have seen significant performance improvements once I started compressing large network requests, for example between clients on AWS and MongoDB servers at Atlas. That is why I looked into this for Infinity in the first place.
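For the client side of this, compressing only the request could look roughly like the sketch below. It uses just the Python standard library and only builds the request without sending it. The URL, the `build_gzip_request` helper, and the `query`/`documents` field names are illustrative assumptions, not taken from Infinity's documentation.

```python
import gzip
import json
import urllib.request


def build_gzip_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request whose JSON body is gzip-compressed (hypothetical helper)."""
    raw = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(raw)
    return urllib.request.Request(
        url,
        data=compressed,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Tells the server the body must be gunzipped before parsing
            "Content-Encoding": "gzip",
        },
    )


# Hypothetical payload and URL: field names and port are assumptions
payload = {"query": "what is gzip?", "documents": ["some chunk of text " * 100] * 20}
request = build_gzip_request("http://localhost:8000/rerank", payload)
```

Sending it would be `urllib.request.urlopen(request)`, which only works if the server decompresses request bodies, and that is exactly what this issue asks for. The response is left uncompressed, matching the point that the /rerank output is too small to be worth compressing.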
What is the difference between Infinity and https://github.com/huggingface/text-embeddings-inference?
@peebles It is the most similar project out there. I think TEI is an exciting project showcasing a new framework in Rust (I like Rust). Here are a couple of key differences:
Re: Routing: see https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-gzip-compression-decompression.html, e.g. via AWS API Gateway and similar. @peebles Feel free to PR the gzip compression; I can add a unit test if needed.
I'll look into doing the PR.
I wonder if it would make sense to support compressed requests, especially for /rerank, where the query and document list could be many 1k or 2k chunks of text. The incoming request could easily exceed 20 or 30k. The HTTP server does not appear to handle gzipped request bodies when they are present.