<a href="https://colab.research.google.com/github/gulabpatel/LLMs/blob/main/LangChain/Hacks/02_HyDE_RAG_tips_and_Tricks_BGE_Embeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Code Walkthrough video: https://www.youtube.com/watch?v=v_BnBEubv58&t=125s

In [None]:
!pip -q install langchain huggingface_hub openai chromadb tiktoken faiss-cpu
!pip -q install sentence_transformers
!pip -q install -U FlagEmbedding


In [None]:
!mkdir -p blog_posts
!unzip -q /content/langchain_blog_posts.zip -d blog_posts

In [None]:
import os

# Paste your OpenAI API key here (left blank in this notebook)
os.environ["OPENAI_API_KEY"] = ""

# Hypothetical Document Embeddings (HyDE)

Modified from: https://github.com/langchain-ai/langchain/tree/master/cookbook

HyDE uses an LLM to write a hypothetical answer to the query, then embeds that answer instead of the raw query and uses the resulting vector for similarity search.

HyDE = base embedding model + LLM chain (with a prompt)
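The idea can be sketched without any library. This is a toy, not LangChain's implementation: `fake_llm` is a hypothetical stand-in that returns a canned passage instead of calling a model, and `toy_embed` is a hypothetical bag-of-words vector used in place of BGE. The sketch compares the passage retrieved by the raw question with the passage retrieved by its hypothetical answer.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for BGE: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for the LLM; a real model would write a passage answering `question`."""
    return "The Big Mac burger is the top seller on the McDonalds menu."


def hyde_embed_query(question: str) -> Counter:
    # HyDE step: embed the generated passage, not the question itself
    return toy_embed(fake_llm(question))


def best_match(query_vec: Counter, passages: list[str]) -> str:
    """Return the passage whose embedding is most similar to query_vec."""
    return max(passages, key=lambda p: cosine(query_vec, toy_embed(p)))


passages = [
    "The Big Mac burger is the top seller on the menu at McDonalds restaurants.",
    "What is the most popular food? Surveys say pizza is the most popular food.",
]
question = "What is McDonalds most popular food?"

raw_hit = best_match(toy_embed(question), passages)
hyde_hit = best_match(hyde_embed_query(question), passages)
print("raw query ->", raw_hit)
print("HyDE      ->", hyde_hit)
```

Here the raw question shares more words with the pizza distractor, while the hypothetical answer lands on the Big Mac passage because it is phrased like the relevant document. Real dense embeddings are far less literal than word counts, so this only illustrates the mechanism.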

In [None]:
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import LLMChain, HypotheticalDocumentEmbedder
from langchain.prompts import PromptTemplate

from langchain.document_loaders import TextLoader
import langchain

## BGE Embeddings

In [None]:
from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-small-en-v1.5"
# L2-normalize each embedding so that the dot product equals cosine similarity
encode_kwargs = {'normalize_embeddings': True}

bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs={'device': 'cuda'},  # requires a GPU; use 'cpu' otherwise
    encode_kwargs=encode_kwargs
)
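`normalize_embeddings=True` scales every embedding to unit L2 norm. For unit vectors the dot product equals cosine similarity, and the squared Euclidean distance equals `2 - 2 * dot`, so inner-product, cosine and L2 rankings all agree. A quick NumPy check, with random vectors standing in for real BGE outputs (384 is the output size of `bge-small-en-v1.5`):

```python
import numpy as np

rng = np.random.default_rng(0)
# Random vectors stand in for raw (unnormalized) embeddings
a, b = rng.normal(size=384), rng.normal(size=384)

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Conceptually what normalize_embeddings=True does: scale each vector to unit length
a_unit, b_unit = a / np.linalg.norm(a), b / np.linalg.norm(b)
dot = a_unit @ b_unit
sq_l2 = np.sum((a_unit - b_unit) ** 2)

print(f"cosine={cosine:.6f} dot={dot:.6f}")
print(np.isclose(sq_l2, 2 - 2 * dot))
```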

In [None]:
# Set up the LLM
llm = OpenAI()

In [None]:
# Load with `web_search` prompt
embeddings = HypotheticalDocumentEmbedder.from_llm(llm,
                                                   bge_embeddings,
                                                   prompt_key="web_search"
                                                   )

In [None]:
embeddings.llm_chain.prompt

PromptTemplate(input_variables=['QUESTION'], template='Please write a passage to answer the question \nQuestion: {QUESTION}\nPassage:')

In [None]:
langchain.debug = True

In [None]:
# Now we can use it as any embedding class!
result = embeddings.embed_query("What items does McDonalds make?")

[32;1m[1;3m[llm/start][0m [1m[1:llm:OpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Please write a passage to answer the question \nQuestion: What items does McDonalds make?\nPassage:"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[1:llm:OpenAI] [2.81s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": " \nMcDonalds is a fast food restaurant chain that is known for making a variety of items. These items include their signature burgers such as the Big Mac and Quarter Pounder, as well as chicken sandwiches, wraps, salads, breakfast items, desserts, and more. They also offer a range of side items, including fries, onion rings, mozzarella sticks, and more. In addition, McDonalds also offers a variety of drinks, including soft drinks, smoothies, milkshakes, coffee, and more. All of these items can be found at any McDonalds restaurant.",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        }
      }
   

In [None]:
# result

## Multiple generations
We can also generate several hypothetical documents and combine their embeddings. By default, `HypotheticalDocumentEmbedder` combines them by taking the element-wise average. To do this, we configure the LLM that generates the documents to return multiple completions.
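The averaging step can be sketched with NumPy. The four 3-dimensional vectors below are made up to stand in for BGE embeddings of four generated passages. The mean of unit vectors is generally shorter than unit length; when every chunk vector is normalized, this scales all query-chunk dot products by the same factor, so the ranking does not change.

```python
import numpy as np

# Made-up, unit-length embeddings of four generated passages (stand-ins for BGE output)
doc_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.8, 0.6, 0.0],
])

# Combine by taking the element-wise mean across the generated passages
query_vec = doc_embeddings.mean(axis=0)
print(query_vec, np.linalg.norm(query_vec))  # the mean is shorter than unit length

# Unit-length chunk embeddings: rank them by dot product and by cosine similarity
chunks = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
chunks /= np.linalg.norm(chunks, axis=1, keepdims=True)
dot_scores = chunks @ query_vec
cos_scores = dot_scores / np.linalg.norm(query_vec)
print(np.argsort(-dot_scores), np.argsort(-cos_scores))  # same order either way
```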

In [None]:
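# n=4 asks for four completions per prompt; best_of=4 sets how many candidates are generated server-side
multi_llm = OpenAI(n=4, best_of=4)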

In [None]:
embeddings = HypotheticalDocumentEmbedder.from_llm(
    multi_llm, bge_embeddings, "web_search"
)

In [None]:
result = embeddings.embed_query("What is McDonalds best selling item?")

[32;1m[1;3m[llm/start][0m [1m[1:llm:OpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Please write a passage to answer the question \nQuestion: What is McDonalds best selling item?\nPassage:"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[1:llm:OpenAI] [4.03s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": " McDonalds is one of the most popular fast food restaurants in the world with its iconic golden arches logo. Its menu includes a variety of items, but one item stands out as the best seller. The Big Mac, introduced in 1968 and now one of the most iconic items in McDonalds history, is the best selling item on the menu. It is a two-patty hamburger made with a special sauce, lettuce, cheese, pickles, and onions on a sesame seed bun. The Big Mac is a classic that has stood the test of time and continues to be a favorite among customers. In 2020, McDonalds sold over 1 billion Big Macs worldwide, making it the clear best selling item in the McD

## Using our own prompts
Besides using the preconfigured prompts, we can also construct our own prompt and use it in the `LLMChain` that generates the documents. This is useful when we know the domain our queries will come from, because we can condition the prompt to generate text that resembles documents in that domain.

In the example below, we condition it to answer with a single food item.

In [None]:
prompt_template = """Please answer the user's question as a single food item
Question: {question}
Answer:"""

prompt = PromptTemplate(input_variables=["question"], template=prompt_template)

llm_chain = LLMChain(llm=llm, prompt=prompt)

In [None]:
embeddings = HypotheticalDocumentEmbedder(
    llm_chain=llm_chain,
    base_embeddings=bge_embeddings
)

In [None]:
result = embeddings.embed_query(
    "What is is McDonalds best selling item?"
)

[32;1m[1;3m[llm/start][0m [1m[1:llm:OpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Please answer the user's question as a single food item\nQuestion: What is is McDonalds best selling item?\nAnswer:"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[1:llm:OpenAI] [211ms] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": " Big Mac",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "prompt_tokens": 26,
      "completion_tokens": 2,
      "total_tokens": 28
    },
    "model_name": "text-davinci-003"
  },
  "run": null
}


In [None]:
result

[-0.044095832854509354,
 -0.06810590624809265,
 0.004882005043327808,
 -0.07507439702749252,
 0.06125683709979057,
 -0.007851513102650642,
 0.018615737557411194,
 -0.03252461180090904,
 -0.006711804773658514,
 -0.032447636127471924,
 0.0023166481405496597,
 -0.017370186746120453,
 0.0515742152929306,
 -0.01829112134873867,
 0.053303007036447525,
 0.04405609890818596,
 0.08549801260232925,
 -0.05806007981300354,
 -0.048251908272504807,
 -0.008680134080350399,
 0.03713998943567276,
 -0.047634974122047424,
 -0.05635959655046463,
 0.004725407809019089,
 -0.030568445101380348,
 -0.001297995913773775,
 -0.012270244769752026,
 -0.0032627666369080544,
 -0.0542445033788681,
 -0.17446546256542206,
 -0.013889017514884472,
 -0.017733868211507797,
 0.09076353907585144,
 -0.0187506265938282,
 0.025643033906817436,
 0.011270520277321339,
 0.020745573565363884,
 -0.0077807423658668995,
 -0.02745167911052704,
 -0.01441169809550047,
 0.09279394894838333,
 0.032126061618328094,
 0.0019034186843782663,
 -

## Using HyDE

Now that we have HyDE, we can use it like any other embedding class. Here we use it to find passages relevant to a query in a small collection of LangChain blog posts.
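Conceptually, `similarity_search` embeds the query (here via HyDE) and returns the stored chunks whose vectors are closest to it. The brute-force NumPy sketch below illustrates that retrieval step using cosine similarity; Chroma itself uses an approximate nearest-neighbour (HNSW) index with its own distance settings, so this is not its implementation. The random matrix is a stand-in for chunk embeddings, and `k=4` mirrors LangChain's default for `similarity_search`.

```python
import numpy as np


def top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 4) -> np.ndarray:
    """Return indices of the k rows of doc_matrix most similar to query_vec (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q
    k = min(k, len(scores))
    # argsort on negated scores puts the highest similarity first
    return np.argsort(-scores)[:k]


rng = np.random.default_rng(42)
doc_matrix = rng.normal(size=(10, 384))                  # stand-in chunk embeddings
query_vec = doc_matrix[7] + 0.1 * rng.normal(size=384)   # a query close to chunk 7
print(top_k(query_vec, doc_matrix))
```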

In [None]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Three LangChain blog posts extracted from langchain_blog_posts.zip
loaders = [
    TextLoader('/content/blog_posts/blog.langchain.dev_announcing-langsmith_.txt'),
    TextLoader('/content/blog_posts/blog.langchain.dev_benchmarking-question-answering-over-csv-data_.txt'),
    TextLoader('/content/blog_posts/blog.langchain.dev_chat-loaders-finetune-a-chatmodel-in-your-voice_.txt'),
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

# Split on blank lines, aiming for chunks of at most 1000 characters, with no overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

# split_documents keeps each chunk's metadata (e.g. source path); split_text works on raw strings
texts = text_splitter.split_documents(docs)
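`CharacterTextSplitter` splits on a separator (a blank line, `"\n\n"`, by default) and then merges adjacent pieces into chunks no longer than `chunk_size` characters where possible. The function below is a simplified, hypothetical re-implementation of that packing idea with no overlap; LangChain's own code also handles overlap, whitespace stripping and warnings for oversized pieces, so treat this as an illustration only.

```python
def split_by_separator(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> list[str]:
    """Simplified sketch: split on `separator`, then greedily pack pieces into chunks
    of at most `chunk_size` characters (no overlap). A single piece longer than
    chunk_size becomes its own oversized chunk rather than being cut."""
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in pieces:
        # Length added by this piece, counting the joining separator if needed
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # Current chunk is full: emit it and start a new one with this piece
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


text = "A" * 600 + "\n\n" + "B" * 300 + "\n\n" + "C" * 500 + "\n\n" + "D" * 1200
chunks = split_by_separator(text)
for chunk in chunks:
    print(len(chunk), chunk[0])
```

With these inputs the first two pieces (600 + 2 + 300 characters) fit into one chunk, while the 1200-character piece exceeds `chunk_size` and is kept whole as its own oversized chunk.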

In [None]:
texts

[Document(page_content='URL: https://blog.langchain.dev/announcing-langsmith/\nTitle: Announcing LangSmith, a unified platform for debugging, testing, evaluating, and monitoring your LLM applications\n\nLangChain exists to make it as easy as possible to develop LLM-powered applications.\n\nWe started with an open-source Python package when the main blocker for building LLM-powered applications was getting a simple prototype working. We remember seeing Nat Friedman tweet in late 2022 that there was “not enough tinkering happening.” The LangChain open-source packages are aimed at addressing this and we see lots of tinkering happening now (Nat agrees)–people are building everything from chatbots over internal company documents to an AI dungeon master for a Dungeons and Dragons game.', metadata={'source': '/content/blog_posts/blog.langchain.dev_announcing-langsmith_.txt'}),
 Document(page_content='The blocker has now changed. While it’s easy to build a prototype of an application in ~5 lin

In [None]:
prompt_template = """Please answer the user's question as related to Large Language Models
Question: {question}
Answer:"""

prompt = PromptTemplate(input_variables=["question"], template=prompt_template)

llm_chain = LLMChain(llm=llm, prompt=prompt)

In [None]:
embeddings = HypotheticalDocumentEmbedder(
    llm_chain=llm_chain,
    base_embeddings=bge_embeddings
)

In [None]:
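# Chunks are embedded via embeddings.embed_documents, which HyDE delegates to the
# base BGE model; only the query passes through the LLM before being embedded
docsearch = Chroma.from_documents(texts, embeddings)

query = "What are chat loaders?"
docs = docsearch.similarity_search(query)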

[32;1m[1;3m[llm/start][0m [1m[1:llm:OpenAI] Entering LLM run with input:
[0m{
  "prompts": [
    "Please answer the user's question as related to Large Language Models\nQuestion: What are chat loaders?\nAnswer:"
  ]
}
[36;1m[1;3m[llm/end][0m [1m[1:llm:OpenAI] [1.17s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": " Chat loaders are software tools used to load large language models into chatbot applications. They help to optimize the performance of the chatbot by enabling it to access large language models quickly and efficiently.",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "prompt_tokens": 24,
      "completion_tokens": 39,
      "total_tokens": 63
    },
    "model_name": "text-davinci-003"
  },
  "run": null
}


In [None]:
print(docs[0].page_content)

URL: https://blog.langchain.dev/chat-loaders-finetune-a-chatmodel-in-your-voice/
Title: Chat Loaders: Fine-tune a ChatModel in your Voice

Summary

We are adding a new integration type, ChatLoaders, to make it easier to fine-tune models on your own unique writing style. These utilities help convert data from popular messaging platforms to chat messages compatible with fine-tuning formats like that supported by OpenAI.

Thank you to Greg Kamradt for Misbah Syed for their thought leadership on this.

Important Links:

Context

On Tuesday, OpenAI announced improved fine-tuning support, extending the service to larger chat models like GPT-3.5-turbo. This enables anyone to customize these larger, more capable models for their own use cases. They also teased support for fine-tuning GPT-4 later this year.

While fine-tuning is typically not advised for teaching an LLM substantially new knowledge or for factual recall; it is good for style transfer.
