In [28]:
from IPython.display import HTML

HTML('''
<iframe width="720" height="380" src="https://www.youtube.com/embed/qCNSSAzWx8U" frameborder="0" allowfullscreen></iframe>
''')

**Objective:** To reduce hallucinations (false or fabricated information produced by a language model) and so improve the precision and reliability of LLM outputs, using several complementary techniques.

**Understanding Hallucinations:**

*  **Cause Analysis:** Hallucinations occur when an LLM confidently produces wrong information. LLMs depend heavily on their training data, which may be outdated or incomplete. They generate text that looks plausible given that data, but they cannot verify whether it is true and have no access to events after their training cutoff. Overfitting is another contributor: a model that fits its training set too closely tends to reproduce memorized patterns instead of producing new, original content. When a question requires knowledge the model never acquired, it may still answer by stitching together similar patterns it has seen before, and the result can be misleading. In short, LLMs generate responses by following statistical regularities in their training data, not by checking whether a statement is true.

*  **Solutions:**
Several strategies can help LLMs generate more accurate responses.

 *    **Context Injection & Advanced Prompt Engineering:** Adding more information to the prompt, known as "context injection," can improve LLM output. This means supplying additional text, code, or data so the LLM has enough context to answer; for example, including relevant examples in the prompt helps the model produce higher-quality, more accurate content. Well-constructed prompts, combined with grounding (retrieving information from external data sources) and dynamic prompt generation, can reduce hallucinations. Because grounding draws on up-to-date information from external sources such as documents or databases, the LLM does not have to rely solely on its training data.
Prompt augmentation, which enriches the user's input with descriptive and clarifying details, can also lead to better results. For example, a chatbot on a retail website can automatically augment each user query with specific product information, which improves the model's reply.

 *   **Retrieval-Augmented Generation (RAG) with Vector Databases:** RAG is a powerful technique for combating hallucinations, especially when combined with a vector database. Before the LLM generates a response, the system retrieves relevant information from an external knowledge source, such as a web search or a document collection, and passes it to the model. Grounding the answer in retrieved data from trustworthy sources reduces the chance of hallucination. Vector databases strengthen this approach: they store text as numeric vectors (embeddings) that encode its meaning. Queries are converted into vectors too, so the database can find relevant documents even when they do not use the same words as the query.
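
The retrieval step described above can be sketched without any API calls. This is a minimal illustration under stated assumptions, not how a production vector database works: `toy_embed`, `cosine`, and `ToyVectorStore` are hypothetical names, and the "embedding" here is only a bag-of-words term count. Unlike a real embedding model, it cannot match documents that express the same meaning with different words.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase term counts.

    Real embedding models produce dense vectors that also capture synonyms;
    this toy version only matches exact tokens.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: dot product over shared terms divided by both norms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: embeds documents once, ranks them per query."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs
        # Embed each document once at insertion time, as a vector database would.
        self.vectors = {doc_id: toy_embed(text) for doc_id, text in docs.items()}

    def search(self, query: str, k: int = 1) -> list[tuple[str, float]]:
        q = toy_embed(query)
        scored = [(doc_id, cosine(q, vec)) for doc_id, vec in self.vectors.items()]
        # Highest cosine similarity first; keep the top k.
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


store = ToyVectorStore({
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
})
print(store.search("How many days does shipping take?"))  # top hit: "shipping"
```

Embedding the documents once in the constructor mirrors the main design point of a vector database: the expensive embedding work happens at indexing time, and each query only pays for one embedding plus a similarity scan.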

Developers can also adjust generation parameters. Lowering the `temperature`, for example, makes the output less random and more focused, which suits factual question answering, although it cannot supply knowledge the model lacks. Prompt engineering and context injection likewise make LLMs more precise and reliable without retraining or fine-tuning. Combined with RAG, these techniques help an LLM produce relevant, fact-based output in a way that is scalable and affordable.
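
The effect of temperature can be illustrated with the standard softmax-with-temperature formula, $p_i = \exp(z_i / T) / \sum_j \exp(z_j / T)$. The sketch below assumes that textbook formulation and uses made-up logits for three candidate tokens; it is not taken from any provider's implementation. Dividing the logits by a small $T$ sharpens the distribution toward the highest-scoring token.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}:", [round(p, 3) for p in probs])
```

A low temperature makes the model more consistent, not more knowledgeable: if the highest-scoring token is wrong, lowering $T$ only makes that wrong answer more likely.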


In [None]:
!pip install openai==0.28

Collecting openai==0.28
  Downloading openai-0.28.0-py3-none-any.whl.metadata (13 kB)
Downloading openai-0.28.0-py3-none-any.whl (76 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.5/76.5 kB 3.0 MB/s eta 0:00:00
Installing collected packages: openai
Successfully installed openai-0.28.0


***Example of LLM Hallucination***

In [None]:
import os
import openai

# Read the OpenAI API key from an environment variable instead of hardcoding it
openai.api_key = os.environ.get("OPENAI_API_KEY")

def get_response(prompt):
    try:
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are an AI assistant."},
                {"role": "user", "content": prompt}
            ]
        )
        # Accessing the response data correctly
        return response.choices[0].message['content']

    except Exception as e:
        print(f"Error: {e}")
        return None

# A prompt that could lead to a hallucination
prompt = 'What happened to KFC Company on 25th of July, 2015?'

response = get_response(prompt)

print(f"Question: {prompt}")
print(f"Hallucinated Response: {response}")

Question: What happened to KFC Company on 25th of July, 2015?
Hallucinated Response: On July 25, 2015, KFC faced a shortage of chicken in many of its restaurants across the United Kingdom. The shortage was a result of operational issues with their new delivery partner, DHL, which led to a disruption in the supply chain. As a result, many KFC restaurants had to close temporarily or operate with a limited menu until the issue was resolved. This incident caused significant inconvenience to customers and drew media attention at the time.


***Solution Using Prompt Engineering***

In [None]:
def get_response(prompt):
    try:
        # Use a system prompt that tells the model to admit uncertainty instead of guessing
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "If no reliable source is found for a query, say 'I don't have that information' rather than guessing."},
                {"role": "user", "content": prompt}
            ]
        )
        # Accessing the response data correctly
        return response.choices[0].message['content']

    except Exception as e:
        print(f"Error: {e}")
        return None

# A prompt that could lead to a hallucination
prompt = 'What happened to KFC Company on 25th of July, 2015?'

response = get_response(prompt)

print(f"Question: {prompt}")
print(f"Improved Response: {response}")

Question: What happened to KFC Company on 25th of July, 2015?
Improved Response: I don't have that information.
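
The system-prompt technique from the cell above can be combined with context injection by building the `messages` list in one place. This is a minimal sketch: `build_messages` and `ABSTAIN_INSTRUCTION` are hypothetical names, and the "answer only from the context" wording is an assumption, not a guaranteed safeguard. It makes no API call, so it runs offline; its return value has the list-of-dicts shape that the `messages=` argument of `openai.ChatCompletion.create` takes in the cells above.

```python
from typing import Optional

# Abstention instruction, copied from the system prompt used above.
ABSTAIN_INSTRUCTION = (
    "If no reliable source is found for a query, say "
    "'I don't have that information' rather than guessing."
)


def build_messages(question: str, context: Optional[list[str]] = None) -> list[dict]:
    """Hypothetical helper: chat messages with an abstention rule and optional injected context."""
    system = ABSTAIN_INSTRUCTION
    if context:
        # Context injection: number each snippet so the model can refer to it.
        snippets = "\n".join(f"[{i}] {text}" for i, text in enumerate(context, start=1))
        system += "\nAnswer only from the context below.\nContext:\n" + snippets
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


messages = build_messages(
    "What happened to KFC Company on 25th of July, 2015?",
    context=["There is no known information about KFC on July 25th, 2015."],
)
for m in messages:
    print(m["role"], "->", m["content"])
```

Keeping the abstention rule and the injected context in a single system message keeps the prompt structure identical across calls, which makes it easier to compare answers with and without context.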


***Solution Using RAG***

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

In [None]:

# Dummy database of documents for retrieval (for illustration)
documents = {
    "doc1": "On 19 February 2018, KFC has closed more than half of its 900 UK outlets after delivery problems meant they ran out of chicken.",
    "doc2": "KFC is launching an unexpected new product: fried-chicken scented sunscreen on Aug 22, 2016, but nothing significant occurred on July 25th.",
    "doc3": "There is no known information about KFC on July 25th, 2015.",
}

def embed_text(text):
    """Get the embedding of a given text using OpenAI's embedding model."""
    response = openai.Embedding.create(
        input=text,
        model="text-embedding-ada-002"
    )
    return response['data'][0]['embedding']

def retrieve_relevant_documents(query, documents):
    """Return the ID of the document whose embedding is most similar to the query (re-embeds every document on each call)."""
    query_embedding = embed_text(query)
    document_embeddings = {doc_id: embed_text(text) for doc_id, text in documents.items()}

    # Compute cosine similarity between query and documents
    similarities = {doc_id: cosine_similarity([query_embedding], [embedding])[0][0]
                    for doc_id, embedding in document_embeddings.items()}

    # Sort documents by similarity score
    ranked_documents = sorted(similarities, key=similarities.get, reverse=True)

    # Return the most relevant document
    return ranked_documents[0]

def get_response_with_rag(prompt):
    try:
        # Step 1: Retrieve relevant document based on the query
        relevant_doc_id = retrieve_relevant_documents(prompt, documents)
        relevant_document = documents[relevant_doc_id]

        # Step 2: Pass the relevant document as context to the language model
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a factual assistant. Use the following document to assist in your answer:"},
                {"role": "system", "content": relevant_document},  # Retrieved document as context
                {"role": "user", "content": prompt}
            ],
            temperature=0.2
        )
        return response.choices[0].message['content']

    except Exception as e:
        print(f"Error: {e}")
        return None

# Example prompt that previously led to hallucination
prompt = "What happened to KFC Company on 25th of July, 2015?"

response = get_response_with_rag(prompt)
print(f"Question: {prompt}")
print(f"Improved Response: {response}")


Question: What happened to KFC Company on 25th of July, 2015?
Improved Response: There is no specific information available about any significant event or incident related to KFC on July 25th, 2015. If you have any other questions or need information on a different date, feel free to ask.
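
The retrieval function above has two weaknesses: it re-embeds every document on each query, and it always returns the top match, even when nothing is relevant. The sketch below caches L2-normalized document embeddings once and returns `None` when the best cosine score falls below a threshold, so the caller can abstain instead of passing irrelevant context to the model. `build_index`, `retrieve`, the 0.8 threshold, and the 3-dimensional `fake_vectors` are all hypothetical; in this notebook you would pass `embed_text` as `embed` and tune `min_score` on your own data.

```python
import numpy as np


def build_index(documents: dict, embed) -> tuple[list[str], np.ndarray]:
    """Embed every document once and L2-normalize each row."""
    doc_ids = list(documents)
    matrix = np.array([embed(documents[d]) for d in doc_ids], dtype=float)
    matrix /= np.linalg.norm(matrix, axis=1, keepdims=True)
    return doc_ids, matrix


def retrieve(query: str, doc_ids, matrix, embed, min_score: float = 0.8):
    """Return (doc_id, score) for the best match, or None if nothing clears min_score."""
    q = np.asarray(embed(query), dtype=float)
    q = q / np.linalg.norm(q)  # new array, so the caller's vector is not modified
    scores = matrix @ q  # cosine similarity, since all rows are unit length
    best = int(np.argmax(scores))
    if scores[best] < min_score:
        return None  # no sufficiently relevant document: the caller should abstain
    return doc_ids[best], float(scores[best])


# Hypothetical 3-dimensional "embeddings" so the example runs without an API key.
fake_vectors = {
    "chicken shortage": [1.0, 0.1, 0.0],
    "sunscreen launch": [0.0, 1.0, 0.1],
    "KFC supply problems": [0.9, 0.2, 0.0],
    "stock market news": [0.0, 0.0, 1.0],
}


def fake_embed(text: str) -> list[float]:
    return fake_vectors[text]


docs = {"d1": "chicken shortage", "d2": "sunscreen launch"}
ids, mat = build_index(docs, fake_embed)
print(retrieve("KFC supply problems", ids, mat, fake_embed))  # matches d1
print(retrieve("stock market news", ids, mat, fake_embed))    # None
```

Normalizing once at indexing time turns each query into a single matrix-vector product, and the `None` result gives the calling code an explicit signal to fall back to an "I don't have that information" reply.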
