Q: Is this happening with other endpoints as well? Let's round them up and fix 🙏
logs
- llama_engine.cc:443
20240620 06:47:06.363966 UTC 1296850 INFO Request 7: Streamed, waiting for respone - llama_engine.cc:565
20240620 06:47:06.364026 UTC 1296850 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
20240620 06:47:06.364179 UTC 1297078 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 10] - llama_server_context.cc:602
20240620 06:47:06.404551 UTC 1297078 INFO kv cache rm [p0, end) - id_slot: 0, task_id: 10, p0: 0 - llama_server_context.cc:1522
20240620 06:47:35.917041 UTC 1297078 DEBUG [PrintTimings] PrintTimings: prompt eval time = 9200.033ms / 596 tokens (15.4362969799 ms per token, 64.7823763241 tokens per second) - llama_client_slot.cc:79
20240620 06:47:35.917858 UTC 1297078 DEBUG [PrintTimings] PrintTimings: eval time = 20352.056 ms / 425 runs (47.8871905882 ms per token, 20.882411094 tokens per second) - llama_client_slot.cc:86
20240620 06:47:35.917861 UTC 1297078 DEBUG [PrintTimings] PrintTimings: total time = 29552.089 ms - llama_client_slot.cc:92
20240620 06:47:35.917914 UTC 1297078 INFO slot released: id_slot: 0, id_task: 10, n_ctx: 2048, n_past: 1021, n_system_tokens: 0, n_cache_tokens: 0, truncated: 0 - llama_server_context.cc:1282
20240620 06:47:35.917957 UTC 1297079 INFO Request 7: End of result - llama_engine.cc:596
20240620 06:47:35.917994 UTC 1297079 INFO Request 7: Task completed, release it - llama_engine.cc:629
20240620 06:47:35.917996 UTC 1297079 INFO Request 7: Inference completed - llama_engine.cc:643
20240620 06:51:58.998618 UTC 1296851 INFO Request 8, model llama3-8b-instruct: Generating reponse for inference request - llama_engine.cc:428
20240620 06:51:58.999279 UTC 1296851 INFO Request 8: Stop words:[ "<|end_of_text|>", "<|eot_id|>" ] - llama_engine.cc:443
20240620 06:51:59.000155 UTC 1296851 INFO Request 8: Streamed, waiting for respone - llama_engine.cc:565
20240620 06:51:59.000455 UTC 1297078 INFO slot released: id_slot: 0, id_task: 10, n_ctx: 2048, n_past: 1021, n_system_tokens: 0, n_cache_tokens: 0, truncated: 0 - llama_server_context.cc:1282
20240620 06:51:59.000975 UTC 1296851 DEBUG [makeHeaderString] send stream with transfer-encoding chunked - HttpResponseImpl.cc:535
20240620 06:51:59.000995 UTC 1297078 DEBUG [LaunchSlotWithData] slot 0 is processing [task id: 12] - llama_server_context.cc:602
20240620 06:51:59.037585 UTC 1297078 INFO kv cache rm [p0, end) - id_slot: 0, task_id: 12, p0: 0 - llama_server_context.cc:1522
20240620 06:52:27.962016 UTC 1297078 DEBUG [PrintTimings] PrintTimings: prompt eval time = 9886.169ms / 1089 tokens (9.07820844812 ms per token, 110.153892777 tokens per second) - llama_client_slot.cc:79
20240620 06:52:27.962061 UTC 1297078 DEBUG [PrintTimings] PrintTimings: eval tim
2024-06-20T08:01:59.130Z [CORTEX]::Debug: Request to kill cortex
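For rounding up the other endpoints, here is a minimal sketch for reproducing the streamed request shown in the logs above. The base URL http://localhost:1337 and the OpenAI-compatible /v1/chat/completions route are assumptions about the local server setup (adjust to yours); the model name llama3-8b-instruct is taken from the logs.

```python
# Minimal repro sketch for a streamed chat-completion request.
# Assumptions: local server at http://localhost:1337 exposing an
# OpenAI-compatible /v1/chat/completions route.
import json
import requests

BASE_URL = "http://localhost:1337"  # assumed default; change if your server differs

payload = {
    "model": "llama3-8b-instruct",  # model name taken from the logs above
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,  # exercise the streamed (transfer-encoding: chunked) path
}

with requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = line.decode("utf-8")
        # SSE chunks are prefixed with "data: "; the stream ends with "data: [DONE]"
        if chunk.startswith("data: "):
            chunk = chunk[len("data: "):]
        if chunk == "[DONE]":
            break
        delta = json.loads(chunk)["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
```

The same pattern (with `"stream": False` or a non-chat route) can be repeated against the other endpoints to check whether they show the same hang before the "Request to kill cortex" message.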
Resolved in Jan 0.5.1.