From 7dee5e359e738cf09f1d76e90f3d1511422b5aea Mon Sep 17 00:00:00 2001
From: martini
Date: Fri, 6 Sep 2024 14:06:08 +0200
Subject: [PATCH 1/4] Add links to Adyen blogpost

---
 README.md                           | 2 +-
 docs/source/conceptual/external.md  | 4 ++++
 docs/source/conceptual/streaming.md | 4 ----
 3 files changed, 5 insertions(+), 5 deletions(-)
 create mode 100644 docs/source/conceptual/external.md

diff --git a/README.md b/README.md
index cf6a30dbf82..cc9d523f0ae 100644
--- a/README.md
+++ b/README.md
@@ -189,7 +189,7 @@ overridden with the `--otlp-service-name` argument
 
 ![TGI architecture](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/TGI.png)
 
-Detailed blogpost by Adyen on TGI inner workings: [LLM inference at scale with TGI](https://www.adyen.com/knowledge-hub/llm-inference-at-scale-with-tgi)
+Detailed blogpost by Adyen on TGI inner workings: [LLM inference at scale with TGI (Martin Iglesias Goyanes - Adyen, 2024)](https://www.adyen.com/knowledge-hub/llm-inference-at-scale-with-tgi)
 
 ### Local install
 
diff --git a/docs/source/conceptual/external.md b/docs/source/conceptual/external.md
new file mode 100644
index 00000000000..8168cbc59be
--- /dev/null
+++ b/docs/source/conceptual/external.md
@@ -0,0 +1,4 @@
+# External sources
+
+- Adyen wrote a detailed article about the interplay between TGI's main components: router and server.
+[LLM inference at scale with TGI (Martin Iglesias Goyanes - Adyen, 2024)](https://www.adyen.com/knowledge-hub/llm-inference-at-scale-with-tgi)
diff --git a/docs/source/conceptual/streaming.md b/docs/source/conceptual/streaming.md
index f1f37f2a980..b8154ba4355 100644
--- a/docs/source/conceptual/streaming.md
+++ b/docs/source/conceptual/streaming.md
@@ -155,7 +155,3 @@ SSEs are different than:
 * Webhooks: where there is a bi-directional connection. The server can send information to the client, but the client can also send data to the server after the first request. 
Webhooks are more complex to operate as they don’t only use HTTP.
 
 If there are too many requests at the same time, TGI returns an HTTP Error with an `overloaded` error type (`huggingface_hub` returns `OverloadedError`). This allows the client to manage the overloaded server (e.g., it could display a busy error to the user or retry with a new request). To configure the maximum number of concurrent requests, you can specify `--max_concurrent_requests`, allowing clients to handle backpressure.
-
-## External sources
-
-Adyen wrote a nice recap of how TGI streaming feature works. [LLM inference at scale with TGI](https://www.adyen.com/knowledge-hub/llm-inference-at-scale-with-tgi)

From 4f41db604aa63679193b3ba02ad086d637d94ad1 Mon Sep 17 00:00:00 2001
From: Nicolas Patry
Date: Fri, 6 Sep 2024 15:33:37 +0200
Subject: [PATCH 2/4] Adding to toctree.

---
 docs/source/_toctree.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index f52fa2ec2a5..8770a348816 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -71,6 +71,8 @@
     title: How Guidance Works (via outlines)
   - local: conceptual/lora
     title: LoRA (Low-Rank Adaptation)
+  - local: conceptual/external
+    title: External sources
   title: Conceptual Guides

From 0dfb012ffebf3af9bfb00060ac98c589e18cee45 Mon Sep 17 00:00:00 2001
From: Martin Iglesias Goyanes
Date: Fri, 6 Sep 2024 15:46:40 +0200
Subject: [PATCH 3/4] Update external.md

---
 docs/source/conceptual/external.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/conceptual/external.md b/docs/source/conceptual/external.md
index 8168cbc59be..9cbe1b5aee9 100644
--- a/docs/source/conceptual/external.md
+++ b/docs/source/conceptual/external.md
@@ -1,4 +1,4 @@
-# External sources
+# External Resources
 
 - Adyen wrote a detailed article about the interplay between TGI's main components: router and server. 
 [LLM inference at scale with TGI (Martin Iglesias Goyanes - Adyen, 2024)](https://www.adyen.com/knowledge-hub/llm-inference-at-scale-with-tgi)

From 5d9d3717b293ee3bb6814766da38d92d1048b96c Mon Sep 17 00:00:00 2001
From: Martin Iglesias Goyanes
Date: Fri, 6 Sep 2024 15:47:02 +0200
Subject: [PATCH 4/4] Update _toctree.yml

---
 docs/source/_toctree.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 8770a348816..b883b36d6c8 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -72,7 +72,7 @@
   - local: conceptual/lora
     title: LoRA (Low-Rank Adaptation)
   - local: conceptual/external
-    title: External sources
+    title: External Resources
   title: Conceptual Guides