# Hugging Face TextGen Inference

[Text Generation Inference](https://github.com/huggingface/text-generation-inference) is a Rust, Python, and gRPC server for text generation inference. It is used in production at [Hugging Face](https://huggingface.co/) to power the LLM api-inference widgets.

This notebook goes over how to use a self-hosted LLM using `Text Generation Inference`.

To use it, you should have the `text_generation` Python package installed.

In [None]:
!pip3 install text_generation  

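Before wiring a self-hosted server into LangChain, it can help to confirm that it responds on its own. The following is a minimal sketch, not part of the original notebook: it assumes a Text Generation Inference server is already listening at `http://localhost:8010` (the URL used in the example in this notebook) and talks to its `/generate` REST route with the standard library only. The names `TGI_URL`, `build_payload`, and `generate` are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.request

# Assumption: the same local server URL that the example in this notebook uses.
TGI_URL = "http://localhost:8010"


def build_payload(prompt: str, **parameters) -> dict:
    """Build the JSON body for the `/generate` route (hypothetical helper)."""
    return {"inputs": prompt, "parameters": parameters}


def generate(prompt: str, url: str = TGI_URL, **parameters) -> str:
    """POST the prompt to `/generate` and return the generated text."""
    request = urllib.request.Request(
        url.rstrip("/") + "/generate",
        data=json.dumps(build_payload(prompt, **parameters)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["generated_text"]


# Building the payload needs no running server, so it can be inspected offline.
payload = build_payload("What is deep learning?", max_new_tokens=20, temperature=0.01)
print(json.dumps(payload, indent=2))

# Uncomment once the server is running:
# print(generate("What is deep learning?", max_new_tokens=20))
```

If the commented-out call returns text, the server is reachable and the same URL can be passed as `inference_server_url`.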
## Example

In [None]:
from langchain.docstore.document import Document

In [None]:
example_doc_1 = """
Peter and Elizabeth took a taxi to attend the night party in the city. While at the party, Elizabeth collapsed and was rushed to the hospital.
Since she was diagnosed with a brain injury, the doctor told Peter to stay beside her until she gets well.
Therefore, Peter stayed with her at the hospital for 3 days without leaving.
"""

docs = [
    Document(
        page_content=example_doc_1,
    )
]

In [None]:
from langchain import PromptTemplate, HuggingFaceTextGenInference
from langchain.chains.question_answering import load_qa_chain

In [None]:
query = """How long was Elizabeth hospitalized?
"""

prompt_template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

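# Build a question-answering chain backed by the self-hosted TGI server.
chain = load_qa_chain(
    llm=HuggingFaceTextGenInference(
        # URL of the running Text Generation Inference server.
        inference_server_url="http://localhost:8010/",
        max_new_tokens=512,
        top_k=10,
        top_p=0.95,
        typical_p=0.95,
        # A near-zero temperature keeps sampling close to greedy decoding.
        temperature=0.01,
        repetition_penalty=1.03,
    ),
    prompt=PROMPT,
)

# Run the chain on the example documents and return only the answer.
chain({"input_documents": docs, "question": query}, return_only_outputs=True)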
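
To see roughly what text reaches the model, the prompt can be rendered by hand. This is a sketch under assumptions, not LangChain's implementation: it assumes `load_qa_chain` is using its default "stuff" behavior, where the document texts are joined and substituted into `{context}`. The helper `render_prompt` is hypothetical, and the whitespace stripping is added here only for readability.

```python
from typing import List

# Same template text as in the cell above.
prompt_template = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}
Answer:"""


def render_prompt(documents: List[str], question: str) -> str:
    """Join the document texts and fill in the template (hypothetical helper)."""
    # Assumption: documents are separated by a blank line, as a "stuff"-style chain would do.
    context = "\n\n".join(doc.strip() for doc in documents)
    return prompt_template.format(context=context, question=question.strip())


rendered = render_prompt(
    ["Peter stayed with her at the hospital for 3 days without leaving."],
    "How long was Elizabeth hospitalized?\n",
)
print(rendered)
```

Printing the rendered prompt is a quick way to check that the context and question land where the template expects them before sending anything to the server.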