Please consider adding a parameter to set the number of decimals in the JSON output. This would reduce network bandwidth requirements and the time needed to parse the output. It is relevant for users who do not need or want full accuracy, e.g. if the embedding values are quantized and/or the application is latency-critical.
Good idea. I assume this matters because the embeddings are stringified before being sent as the payload.
On the other hand, JSON encoding took around 20% of the CPU time and in some cases was responsible for up to half of the latency. I solved that issue by switching to orjson. I do not think that https://github.com/ijl/orjson supports such a feature.
So, pros:
No need for more than 4/6 digits, it might reduce latency, and it will reduce network usage marginally (if that's a bottleneck).
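To get a rough feel for the payload savings, here is a minimal sketch using only the standard library `json` module. It is not how infinity serializes responses; the helper name `round_embedding`, the `decimals` parameter, and the 1024-dimensional random vector are all assumptions made for illustration.

```python
import json
import random


def round_embedding(values, decimals=4):
    """Hypothetical helper: round every float to `decimals` places."""
    return [round(v, decimals) for v in values]


# Assumed stand-in for a real embedding: 1024 random floats in [-1, 1].
random.seed(0)
embedding = [random.uniform(-1.0, 1.0) for _ in range(1024)]

# Serialize once at full precision and once rounded to 4 decimals.
full = json.dumps({"embedding": embedding})
short = json.dumps({"embedding": round_embedding(embedding, 4)})

# Compare the size of the two JSON strings in characters.
print(len(full), len(short))
```

Rounding happens before serialization, so the same idea should carry over to other encoders, though the extra pass over the values has its own CPU cost that would need to be measured.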
Thank you for responding quickly. Inspired by the comment above, I realized I had a sub-optimal implementation for JSON parsing and replaced it with a hand-coded parser for the fastest processing. It would still be beneficial to have this, but it is no longer critical. Background: we run a number of infinity processes locally on the same GPU (as that seems to stochastically interleave GPU usage to maximize GPU utilization and total throughput). Again, thank you for the convenient application.