Idea: add a parameter to configure number of decimals in JSON output #64

Open
lasttero opened this issue Jan 17, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@lasttero

Please consider adding a parameter to set the number of decimals in the JSON output. This would reduce network bandwidth requirements and the time needed to parse the output. It is relevant for users who do not need or want full accuracy, e.g. if the embedding values are quantized and/or they have latency-critical applications.

@michaelfeil
Owner

Good idea, I assume because the embeddings are stringified and sent as the payload.

On the other hand, JSON encoding took around 20% of the CPU time and in some cases was responsible for up to half of the latency. I solved that issue by switching to orjson. I do not think that https://github.com/ijl/orjson supports such a feature.

So pro:

  • no need for more than 4/6 digits; might reduce latency and will marginally reduce network usage (if that's a bottleneck)

Con:

  • no implementation available in orjson afaik
  • switching to a different
  • additional source of error
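
To make the trade-off concrete, here is a minimal sketch of one possible workaround: round the values before they reach the encoder, so the encoder itself needs no precision option. The names `round_embeddings`, `encode_response`, and the `decimals` parameter are hypothetical and not part of infinity's API. The sketch uses the standard-library `json` module to stay self-contained; the same pre-rounding step should work in front of `orjson.dumps`, but that is an assumption and not tested here.

```python
import json


def round_embeddings(embeddings, decimals):
    """Round every value in every embedding vector to `decimals` places."""
    return [[round(float(x), decimals) for x in vec] for vec in embeddings]


def encode_response(embeddings, decimals=None):
    """Serialize embeddings to compact JSON bytes.

    `decimals` is a hypothetical parameter: None keeps full precision.
    """
    if decimals is not None:
        embeddings = round_embeddings(embeddings, decimals)
    # Compact separators avoid spending bytes on whitespace.
    return json.dumps({"data": embeddings}, separators=(",", ":")).encode("utf-8")


# Made-up example values, for illustration only.
embeddings = [
    [0.123456789012, -0.987654321098, 0.5],
    [0.333333333333, 0.666666666667, -0.25],
]

full = encode_response(embeddings)
short = encode_response(embeddings, decimals=4)
print(len(full), len(short))
```

Rounding adds a pass over every value, so whether it lowers end-to-end latency depends on how the cost of that pass compares with the time saved in encoding, transfer, and parsing.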

@michaelfeil added the enhancement label Jan 17, 2024
@lasttero
Author

Thank you for responding quickly. Inspired by the comment above, I realized I had a sub-optimal implementation for JSON parsing and replaced it with a hand-coded parser for the fastest processing. This feature would still be beneficial, but it is no longer critical. Background: we run a number of infinity processes locally on the same GPU, as that seems to stochastically interleave GPU usage and maximize GPU utilization and total throughput. Again, thank you for the convenient application.

@michaelfeil
Owner

I slightly optimized queueing. I don't think the number of decimals in the JSON would significantly influence the throughput.

2 participants