## RAG example with LangChain, Redis, and Hugging Face TGI

Requirements:
- A Redis cluster and database into which the documents have already been ingested
- The connection details for that Redis cluster and database, plus the index name and the schema file
- An inference endpoint served by the Hugging Face Text Generation Inference (TGI) server

#### Base parameters: inference server and Redis info

In [1]:
inference_server_url = "http://hf-tgi.llm-hosting.svc.cluster.local:3000/"
redis_url = "redis://server:port"
index_name = "docs"
schema_name = "redis_schema.yaml"
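The `redis_url` value above is a placeholder and has to be replaced with a real host and port before the later cells can connect. As a minimal sketch using only the standard library, the check below rejects a URL that still has the placeholder shape. The helper name `check_redis_url` and the `redis://localhost:6379` example are hypothetical and not part of the original notebook.

```python
from typing import Tuple
from urllib.parse import urlparse


def check_redis_url(url: str) -> Tuple[str, int]:
    """Return (host, port) if `url` looks like a usable Redis URL, else raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    # Accessing .port raises ValueError when the port is not numeric, which is
    # what happens with the unedited "redis://server:port" placeholder.
    port = parsed.port
    if parsed.hostname is None or port is None:
        raise ValueError("host and port are both required")
    return parsed.hostname, port


# Hypothetical local instance, for illustration only.
print(check_redis_url("redis://localhost:6379"))
```

This only validates the shape of the URL; it does not open a connection to Redis.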

#### Imports

In [None]:
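# These import paths match the LangChain version this notebook was written for;
# newer releases expose the same classes from the langchain_community package.
from langchain.embeddings.huggingface import HuggingFaceEmbeddings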
from langchain.vectorstores.redis import Redis
from langchain.chains import RetrievalQA
from langchain.llms import HuggingFaceTextGenInference
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

#### Initialize the connection

In [None]:
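# Use the same embedding model that was used when the documents were ingested.
# With no arguments, HuggingFaceEmbeddings loads sentence-transformers/all-mpnet-base-v2.
embeddings = HuggingFaceEmbeddings()

# Connect to the existing index; the schema file describes its metadata fields.
rds = Redis.from_existing_index(
    embeddings,
    redis_url=redis_url,
    index_name=index_name,
    schema=schema_name,
)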

#### Initialize query chain

In [None]:


# NOTE: This template syntax is specific to Llama 2 chat models
template = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
You will be given a question you need to answer, and a context to provide you with information. You must answer the question based as much as possible on this context.
Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Question: {question}
Context: {context} [/INST]
"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

llm = HuggingFaceTextGenInference(
    inference_server_url=inference_server_url,
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.1,
    repetition_penalty=1.175,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

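qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=rds.as_retriever(
        search_type="similarity",
        # Return at most 4 chunks, and only those within distance 0.5 of the query
        search_kwargs={"k": 4, "distance_threshold": 0.5},
    ),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    return_source_documents=True,  # keep the retrieved documents in the result
)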
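To make the roles of `k`, `distance_threshold`, and the prompt template more concrete, here is a toy sketch of the retrieve-then-fill step in plain Python. It is an illustration, not LangChain's or Redis's implementation. The three chunks, their 3-dimensional vectors, the shortened template, and the use of cosine distance with an inclusive threshold are all assumptions made for the example; the actual metric depends on how the Redis index was created.

```python
import math

# Toy stand-ins for indexed chunks: (text, embedding) pairs. Real embeddings
# come from the embedding model and have hundreds of dimensions.
CHUNKS = [
    ("Taints keep Pods off GPU nodes unless they tolerate the taint.", [0.9, 0.1, 0.0]),
    ("Redis can store vectors alongside metadata.", [0.0, 0.2, 0.9]),
    ("Tolerations are set in the Pod spec.", [0.8, 0.3, 0.1]),
]

# Shortened, hypothetical version of the notebook's prompt template.
TEMPLATE = "Question: {question}\nContext: {context}"


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def retrieve(query_vec, k=4, distance_threshold=0.5):
    """Return up to k chunk texts, nearest first, whose distance is within the threshold."""
    scored = sorted((cosine_distance(query_vec, vec), text) for text, vec in CHUNKS)
    return [text for dist, text in scored[:k] if dist <= distance_threshold]


def build_prompt(question, query_vec):
    """Join the retrieved chunks into one context string and fill the template."""
    context = "\n\n".join(retrieve(query_vec))
    return TEMPLATE.format(question=question, context=context)


# The query vector is made up; in the notebook it would be the embedded question.
prompt = build_prompt("How can I work with GPU and taints?", [1.0, 0.2, 0.0])
print(prompt)
```

In this toy data the Redis-related chunk is far from the query vector, so the threshold filters it out and only the two Kubernetes chunks reach the prompt. Lowering `distance_threshold` makes retrieval stricter, while `k` only caps how many chunks are returned.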

#### Query example

In [None]:
question = "How can I work with GPU and taints?"
result = qa_chain({"query": question})

Metadata key page not found in metadata. Setting to None. 
Metadata fields defined for this instance: ['source', 'page']
Metadata key page not found in metadata. Setting to None. 
Metadata fields defined for this instance: ['source', 'page']


  Hello! I'm here to assist you with your question about working with GPUs and taints in Kubernetes. Based on the provided context, I understand that you want to use taints to restrict access to GPUs and provide choice between different types of GPUs.

To start, it's important to note that the NVIDIA Operator has a built-in tolerance for the nvidia.com/gpu taint, so you don't need to add it explicitly. However, if you want to schedule Pods on nodes with other types of GPUs, you'll need to apply the appropriate taints to those nodes.

When applying taints, it's essential to pay close attention to avoid installing the NVIDIA drivers on non-NVIDIA GPU nodes. To do this, you can use the `tolerations` field in your Pod spec to specify which taints the Pod can tolerate. For example, if you want to schedule a Pod on a node with an NVIDIA GPU, you can set the `tolerations` field to include the nvidia.com/gpu taint.

Here's an example of how you could apply taints to a Pod spec:
```yaml
apiVers
```

#### Retrieve source

In [None]:
def remove_duplicates(input_list):
    """Return the unique 'source' metadata values of the documents, in first-seen order."""
    unique_list = []
    for item in input_list:
        if item.metadata['source'] not in unique_list:
            unique_list.append(item.metadata['source'])
    return unique_list

results = remove_duplicates(result['source_documents'])

for s in results:
    print(s)

https://ai-on-openshift.io/odh-rhods/nvidia-gpus/
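The same de-duplication can be written more compactly with `dict.fromkeys`, which keeps the first occurrence of each key in insertion order. The sketch below is an alternative, not the notebook's original code. The `Doc` class is a hypothetical stand-in that models only the `metadata` attribute of a retrieved document, and the `example.com` URLs are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document: only `metadata` is modelled."""
    metadata: dict = field(default_factory=dict)


def unique_sources(docs):
    """Return each document's 'source' once, in first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(d.metadata["source"] for d in docs))


docs = [
    Doc({"source": "https://example.com/a"}),
    Doc({"source": "https://example.com/b"}),
    Doc({"source": "https://example.com/a"}),
]
print(unique_sources(docs))
```

With only four retrieved chunks per query either version is fast enough; the dictionary form mainly avoids the repeated list scans if `k` is raised.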
