<a href="https://colab.research.google.com/github/rahiakela/genai-research-and-practice/blob/main/rag-system-notebooks/01_advanced_rag_system_with_HyDE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

In [None]:
!pip -q install langchain huggingface_hub chromadb tiktoken faiss-cpu
!pip -q install sentence_transformers
!pip -q install -U FlagEmbedding

In [None]:
# You can also download the blog posts from https://www.dropbox.com/scl/fi/ulbt145sthizf2nazey49/langchain_blog_posts.zip?rlkey=9unhw0vukhlwacahmpnk5m591&dl=0
!wget https://github.com/rahiakela/genai-research-and-practice/raw/main/rag-system-notebooks/langchain_blog_posts.zip

!mkdir -p blog_posts
!unzip -q langchain_blog_posts.zip -d blog_posts

In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import LLMChain, HypotheticalDocumentEmbedder
from langchain.prompts import PromptTemplate

from langchain.document_loaders import TextLoader
import langchain

**BGE Embeddings**

In [None]:
from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-small-en-v1.5"
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity

bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs={'device': 'cuda'},
    encode_kwargs=encode_kwargs
)
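The `normalize_embeddings` flag matters because, for unit-length vectors, the dot product and the cosine similarity are the same number. A minimal pure-Python sketch of that identity, using made-up 2-D vectors rather than real BGE output (the helper names are illustrative, not part of any library):

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Toy vectors (assumed values, chosen only for easy arithmetic)
a = [3.0, 4.0]
b = [4.0, 3.0]

cos_raw = cosine_similarity(a, b)                       # cosine on raw vectors
dot_normalized = dot(l2_normalize(a), l2_normalize(b))  # dot product after normalizing

print(cos_raw, dot_normalized)
```

Both values agree up to floating-point error, which is why a vector store can score normalized embeddings with a simple dot product.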

**Llama 2**

In [None]:
!pip install -qqq optimum==1.13.1 --progress-bar off
!pip install -qqq auto-gptq==0.4.2 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ --progress-bar off

In [None]:
import torch
from langchain import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, pipeline

MODEL_NAME = "TheBloke/Llama-2-13b-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True, device_map="auto"
)

generation_config = GenerationConfig.from_pretrained(MODEL_NAME)
generation_config.max_new_tokens = 1024
generation_config.temperature = 0.0001
generation_config.top_p = 0.95
generation_config.do_sample = True
generation_config.repetition_penalty = 1.15

text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=generation_config,
)

llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0})

## HyDE

In [None]:
# Load with `web_search` prompt
embeddings = HypotheticalDocumentEmbedder.from_llm(llm,
                                                   bge_embeddings,
                                                   prompt_key="web_search"
                                                   )

In [None]:
embeddings.llm_chain.prompt

PromptTemplate(input_variables=['QUESTION'], template='Please write a passage to answer the question \nQuestion: {QUESTION}\nPassage:')

In [None]:
langchain.debug = True

In [None]:
# Now we can use it as any embedding class!
result = embeddings.embed_query("What items does McDonalds make?")

[32;1m[1;3m[llm/start][0m [1m[1:llm:HuggingFacePipeline] Entering LLM run with input:
[0m{
  "prompts": [
    "Please write a passage to answer the question \nQuestion: What items does McDonalds make?\nPassage:"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[1:llm:HuggingFacePipeline] [211.80s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": "\nMcDonald's is one of the largest fast-food chains in the world, known for its signature menu items such as the Big Mac sandwich, Chicken McNuggets, and French Fries. The company also offers a variety of drinks, including soft drinks, milkshakes, and coffee. In addition to these classic items, McDonald's frequently introduces limited-time offerings and seasonal specials to keep their menu fresh and exciting for customers. Whether you're looking for a quick breakfast on-the-go or a satisfying meal for dinner, McDonald's has something for everyone.",
        "generation_info": null,
        "type": "Generation"
     

In [None]:
# result

## Multiple generations
We can also generate multiple hypothetical documents and combine their embeddings. By default, the embeddings are combined by taking their average. To do this, we swap in an LLM configured to return several completions per prompt.
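To make the averaging step concrete, here is a small sketch of an element-wise mean over several embedding vectors. It is written in plain Python with toy numbers; `average_embeddings` is a hypothetical helper for illustration, not the library's own implementation.

```python
def average_embeddings(vectors):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one embedding to average")
    dim = len(vectors[0])
    if any(len(vec) != dim for vec in vectors):
        raise ValueError("all embeddings must have the same dimension")
    count = len(vectors)
    return [sum(vec[i] for vec in vectors) / count for i in range(dim)]

# Two toy "hypothetical document" embeddings (assumed values)
doc_vectors = [
    [1.0, 0.0],
    [3.0, 2.0],
]

combined = average_embeddings(doc_vectors)
print(combined)
```

The combined vector is then used as the query embedding, so no single generated passage dominates the search.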

In [None]:
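# OpenAI was not imported above; this also requires the `openai` package
# and an OPENAI_API_KEY environment variable.
from langchain.llms import OpenAI

multi_llm = OpenAI(n=4, best_of=4)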

In [None]:
embeddings = HypotheticalDocumentEmbedder.from_llm(
    multi_llm, bge_embeddings, "web_search"
)

In [None]:
result = embeddings.embed_query("What is McDonalds best selling item?")

[32;1m[1;3m[llm/start][0m [1m[1:llm:OpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Please write a passage to answer the question \nQuestion: What is McDonalds best selling item?\nPassage:"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[1:llm:OpenAI] [4.03s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": " McDonalds is one of the most popular fast food restaurants in the world with its iconic golden arches logo. Its menu includes a variety of items, but one item stands out as the best seller. The Big Mac, introduced in 1968 and now one of the most iconic items in McDonalds history, is the best selling item on the menu. It is a two-patty hamburger made with a special sauce, lettuce, cheese, pickles, and onions on a sesame seed bun. The Big Mac is a classic that has stood the test of time and continues to be a favorite among customers. In 2020, McDonalds sold over 1 billion Big Macs worldwide, making it the clear best selling item in the McD

## Using our own prompts
Besides using preconfigured prompts, we can also construct our own prompt and use it in the LLMChain that generates the documents. This is useful when we know the domain of our queries, because we can condition the prompt so that the generated text resembles documents from that domain.

In the example below, we condition it to answer with a single food item, so the hypothetical document is short and focused.
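Conceptually, a custom prompt slots into HyDE in three steps: fill the template with the question, let the LLM write a hypothetical answer, then embed that answer instead of the raw question. The sketch below shows that flow with stand-ins; `fake_llm` and `fake_embed` are hypothetical placeholders for a real model and a real embedding function, and the returned vector is meaningless beyond the demo.

```python
def hyde_embed_query(question, prompt_template, generate, embed):
    """Embed a hypothetical answer to `question` rather than the question itself."""
    prompt = prompt_template.format(question=question)  # 1. fill the template
    hypothetical_doc = generate(prompt)                  # 2. LLM writes a passage
    return embed(hypothetical_doc)                       # 3. embed the passage

def fake_llm(prompt):
    """Stand-in for a real LLM call (assumption: always returns the same text)."""
    return "The Big Mac."

def fake_embed(text):
    """Stand-in embedding: [character count, space count]. Not a real embedding."""
    return [float(len(text)), float(text.count(" "))]

TEMPLATE = (
    "Please answer the user's question as a single food item\n"
    "Question: {question}\n"
    "Answer:"
)

vector = hyde_embed_query(
    "What is McDonalds best selling item?", TEMPLATE, fake_llm, fake_embed
)
print(vector)
```

Changing only `TEMPLATE` changes what kind of text gets embedded, which is the lever a domain-specific prompt gives us.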

In [None]:
prompt_template = """Please answer the user's question as a single food item
Question: {question}
Answer:"""

prompt = PromptTemplate(input_variables=["question"], template=prompt_template)

llm_chain = LLMChain(llm=llm, prompt=prompt)

In [None]:
embeddings = HypotheticalDocumentEmbedder(
    llm_chain=llm_chain,
    base_embeddings=bge_embeddings
)

In [None]:
result = embeddings.embed_query(
    "What is is McDonalds best selling item?"
)

[32;1m[1;3m[llm/start][0m [1m[1:llm:HuggingFacePipeline] Entering LLM run with input:
[0m{
  "prompts": [
    "Please answer the user's question as a single food item\nQuestion: What is is McDonalds best selling item?\nAnswer:"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[1:llm:HuggingFacePipeline] [7.67s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": " The Big Mac.",
        "generation_info": null,
        "type": "Generation"
      }
    ]
  ],
  "llm_output": null,
  "run": null
}


In [None]:
result

## Using HyDE

Now that we have HyDE, we can use it like any other embedding class. Here we use it to find similar passages in a few of the downloaded LangChain blog posts.
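As a rough picture of what a similarity search does with the HyDE query vector, the sketch below ranks a few documents by dot product against it. The document ids and vectors are invented toy values, and `top_k` is a hypothetical helper; a real vector store uses its own index and distance metric.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def top_k(query_vector, doc_vectors, k=2):
    """Return the ids of the k documents with the highest dot-product score."""
    scored = sorted(
        ((dot(query_vector, vec), doc_id) for doc_id, vec in doc_vectors.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]

# Toy document embeddings keyed by a made-up id (assumed values)
doc_vectors = {
    "chat_loaders": [0.9, 0.1],
    "langsmith": [0.2, 0.8],
    "csv_benchmark": [0.5, 0.5],
}

# Pretend this came from embedding the hypothetical answer to the query
hyde_query_vector = [1.0, 0.0]

best = top_k(hyde_query_vector, doc_vectors, k=2)
print(best)
```

Because the query vector comes from a generated passage rather than the short question, it tends to sit closer to passage-like chunks in the store.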

In [None]:
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# with open("../../state_of_the_union.txt") as f:
#     state_of_the_union = f.read()

loaders = [
    TextLoader('/content/blog_posts/blog.langchain.dev_announcing-langsmith_.txt'),
    TextLoader('/content/blog_posts/blog.langchain.dev_benchmarking-question-answering-over-csv-data_.txt'),
    TextLoader('/content/blog_posts/blog.langchain.dev_chat-loaders-finetune-a-chatmodel-in-your-voice_.txt'),
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

texts = text_splitter.split_documents(docs) #split_text

In [None]:
texts

In [None]:
prompt_template = """Please answer the user's question as related to Large Language Models
Question: {question}
Answer:"""

prompt = PromptTemplate(input_variables=["question"], template=prompt_template)

llm_chain = LLMChain(llm=llm, prompt=prompt)

In [None]:
embeddings = HypotheticalDocumentEmbedder(
    llm_chain=llm_chain,
    base_embeddings=bge_embeddings
)

In [None]:
docsearch = Chroma.from_documents(texts, embeddings)

query = "What are chat loaders?"
docs = docsearch.similarity_search(query)

[32;1m[1;3m[llm/start][0m [1m[1:llm:OpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Please answer the user's question as related to Large Language Models\nQuestion: What are chat loaders?\nAnswer:"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[1:llm:OpenAI] [1.17s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": " Chat loaders are software tools used to load large language models into chatbot applications. They help to optimize the performance of the chatbot by enabling it to access large language models quickly and efficiently.",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "prompt_tokens": 24,
      "completion_tokens": 39,
      "total_tokens": 63
    },
    "model_name": "text-davinci-003"
  },
  "run": null
}


In [None]:
print(docs[0].page_content)

URL: https://blog.langchain.dev/chat-loaders-finetune-a-chatmodel-in-your-voice/
Title: Chat Loaders: Fine-tune a ChatModel in your Voice

Summary

We are adding a new integration type, ChatLoaders, to make it easier to fine-tune models on your own unique writing style. These utilities help convert data from popular messaging platforms to chat messages compatible with fine-tuning formats like that supported by OpenAI.

Thank you to Greg Kamradt for Misbah Syed for their thought leadership on this.

Important Links:

Context

On Tuesday, OpenAI announced improved fine-tuning support, extending the service to larger chat models like GPT-3.5-turbo. This enables anyone to customize these larger, more capable models for their own use cases. They also teased support for fine-tuning GPT-4 later this year.

While fine-tuning is typically not advised for teaching an LLM substantially new knowledge or for factual recall; it is good for style transfer.
