# Retrieval-augmented generation with LangChain

## References: 
1. [LangChain docs - question-answering](https://python.langchain.com/docs/use_cases/question_answering/)
1. [Medium/the-techlife - Using HF and Cohere models](https://medium.com/the-techlife/using-huggingface-openai-and-cohere-models-with-langchain-db57af14ac5b)
1. [Lightning AI - Comparing different language models for question-answering](https://lightning.ai/pages/community/community-discussions/the-ultimate-battle-of-language-models-lit-llama-vs-gpt3.5-vs-bloom-vs/)

In [8]:
!pip install langchain
# !pip install openai
!pip install sentence_transformers
!pip install chromadb  # required by the Chroma vector store used below
!pip install llama-cpp-python

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.1.77.tar.gz (1.6 MB)
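
The fork warning in the output above appears when a shell command such as `!pip install` runs after `tokenizers` has already used parallelism in the kernel. As the message itself suggests, one way to silence it is to set `TOKENIZERS_PARALLELISM` explicitly. A minimal sketch, assuming it runs in the first cell of the notebook, before any tokenizer is used:

```python
import os

# Set this before importing transformers or sentence_transformers so the
# tokenizers library sees the value on first use.
# "false" disables tokenizer parallelism, which avoids the fork warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```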

In [2]:
from langchain import HuggingFaceHub
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import FAISS, Chroma
from langchain.text_splitter import CharacterTextSplitter
from langchain import HuggingFacePipeline
from sentence_transformers import SentenceTransformer



In [6]:
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# all-MiniLM-L6-v2 is a sentence-embedding model, so it is loaded with
# SentenceTransformer; a transformers pipeline object has no .encode() method.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Sentences are encoded by calling model.encode()
emb1 = model.encode("This is a red cat with a hat.")
emb2 = model.encode("Have you seen my red cat?")

cos_sim = util.cos_sim(emb1, emb2)
print("Cosine-Similarity:", cos_sim)

# An embedding model cannot generate text, so it is not wrapped in an LLMChain;
# that requires a generative model such as flan-t5.
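
For reference, the cosine similarity reported by `util.cos_sim` is the dot product of the two vectors divided by the product of their lengths. The sketch below spells that out in plain Python; the function name `cosine_similarity` and the 3-dimensional toy vectors are illustrative stand-ins, not real sentence embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (assumption, for illustration only).
v1 = [1.0, 0.0, 1.0]
v2 = [1.0, 0.0, 0.0]

# Identical directions give 1.0, orthogonal directions give 0.0.
print(cosine_similarity(v1, v2))
```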

In [None]:
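import os
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline
from transformers import pipeline

# Python does not expand "~" in paths, so resolve the home directory explicitly.
local_dir = os.path.expanduser("~/flan-t5-large")

# Download flan-t5-large and save the model and tokenizer locally.
model = pipeline(task="text2text-generation", model="google/flan-t5-large")
model.save_pretrained(local_dir)
# temperature only takes effect when sampling is enabled; 9 is the original value.
llm = HuggingFacePipeline.from_model_id(model_id=local_dir, task="text2text-generation", model_kwargs={"temperature": 9})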

In [None]:
template = PromptTemplate(input_variables=["input"], template="{input}")
chain = LLMChain(llm=llm, verbose=True, prompt=template)
chain("What is the meaning of life?")

In [None]:
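# https://github.com/AndreasFischer1985/code-snippets/blob/master/py/LangChain_HuggingFace_examples.py
import os
from transformers import pipeline

# all-MiniLM-L6-v2 is an embedding model, so the task is set explicitly to "feature-extraction".
model = pipeline(task="feature-extraction", model="sentence-transformers/all-MiniLM-L6-v2")
# Python does not expand "~", so resolve the home directory before saving.
model.save_pretrained(os.path.expanduser("~/all-MiniLM-L6-v2"))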


In [None]:
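import os

# Read the Hugging Face token from the environment instead of hard-coding it in the notebook.
HUGGING_FACE_API_KEY = os.environ.get("HUGGINGFACEHUB_API_TOKEN")

# all-MiniLM-L6-v2 is an embedding model and cannot generate text, so a generative
# model (flan-t5-large, used earlier in this notebook) answers the questions instead.
model = HuggingFacePipeline.from_model_id(
    model_id="google/flan-t5-large",
    task="text2text-generation",
    model_kwargs={"max_length": 512},  # greedy decoding is the default
)

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
# encode_kwargs are forwarded to SentenceTransformer.encode(), which has no
# max_new_tokens argument, so none are passed here.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")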

In [None]:
embeddings

In [None]:
index = (
    VectorstoreIndexCreator(
        embedding=embeddings, 
        vectorstore_cls=Chroma,
        text_splitter=CharacterTextSplitter(chunk_size=100,chunk_overlap=0)
    ).from_loaders([loader])
)
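
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is only a sketch: `CharacterTextSplitter` itself first splits on a separator and then merges the pieces up to `chunk_size`, so its chunk boundaries will differ. The helper name `split_fixed` is hypothetical.

```python
def split_fixed(text, chunk_size=100, chunk_overlap=0):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# 250 characters with no overlap give windows starting at 0, 100 and 200.
chunks = split_fixed("a" * 250, chunk_size=100, chunk_overlap=0)
print([len(c) for c in chunks])  # [100, 100, 50]
```

With a small `chunk_size` such as 100, each retrieved chunk carries little context, so a larger value may give the language model more to work with.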

In [None]:
model

In [None]:
index

In [None]:
index.query("Compare chain of thought vs tree of thought methods", llm=model)
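
Conceptually, a retrieval query embeds the question, ranks the stored chunks by similarity, and places the best ones in the prompt sent to the language model. The sketch below illustrates that flow in plain Python. The chunks, the hand-made 2-dimensional vectors, the prompt wording, and the helper names `cosine` and `top_k` are all assumptions for illustration; it does not reproduce LangChain's internals.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scores = [cosine(query_vec, v) for v in chunk_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Hypothetical chunks with toy "embeddings" (a real index would use the embedding model).
chunks = [
    "Chain of thought prompts the model to reason step by step.",
    "Tree of thoughts explores several reasoning paths at each step.",
    "Unrelated text about memory.",
]
chunk_vecs = [[1.0, 0.1], [0.9, 0.3], [0.0, 1.0]]
query_vec = [1.0, 0.2]

best = top_k(query_vec, chunk_vecs, k=2)
context = "\n".join(chunks[i] for i in best)
question = "Compare chain of thought vs tree of thought methods"
prompt = f"Use the context to answer.\n\n{context}\n\nQuestion: {question}"
print(prompt)
```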

In [19]:
# https://github.com/Lightning-Universe/Comparing_LLM_Blogpost/blob/master/Comparing_LLM.py

import torch 
from pathlib import Path
from langchain.llms.base import LLM
from transformers import T5Tokenizer,T5ForConditionalGeneration
from pydantic import BaseModel
from typing import Optional, List

class CustomPipeline(LLM):
    """LangChain LLM wrapper around a locally loaded T5 model."""

    def __init__(self, model_id):
        super().__init__()
        # LLM is a pydantic model and rejects undeclared attributes, so the
        # model and tokenizer are kept in module-level globals instead.
        global model, tokenizer, model_name

        model_name = model_id
        # device_map="auto" spreads the weights across the available devices.
        model = T5ForConditionalGeneration.from_pretrained(
            model_id, torch_dtype=torch.bfloat16, device_map="auto"
        )
        tokenizer = T5Tokenizer.from_pretrained(model_id)

    @property
    def _llm_type(self) -> str:
        return "custom_pipeline"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # `stop` is accepted for interface compatibility but is not applied.
        with torch.no_grad():
            input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
            outputs = model.generate(input_ids, max_new_tokens=70)
            return tokenizer.decode(outputs[0], skip_special_tokens=True)
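
The `_call` method above accepts a `stop` argument but never uses it. One possible way to honor it is to truncate the decoded text at the first stop string. The helper below, `truncate_at_stop`, is a hypothetical sketch and not part of LangChain; it could be applied to the decoded response before returning it.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Cut text at the earliest occurrence of any stop string."""
    if not stop:
        return text
    cut = len(text)
    for s in stop:
        if not s:
            continue  # an empty stop string would match at position 0
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


print(truncate_at_stop("Answer: 42\nQuestion: next", stop=["\nQuestion:"]))
```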
        

In [20]:
model_id = 'google/flan-t5-xxl'
llm = CustomPipeline(model_id)

In [22]:
template = """
We have provided context information below.

Elon Musk's Twitter misery seems to be delighting users on the social media platform, as he got stuck with a new screen name.

Using only this information, please answer the question: {text}
"""

prompt_template = PromptTemplate(input_variables=["text"], template=template)
# Build the chain once and reuse it for every question.
answer_chain = LLMChain(llm=llm, prompt=prompt_template)

questions = [
    "what’s Elon's new Twitter username?",
    "why is it funny that he cannot change it?",
    "make a joke about this",
    "How did this get started?",
]

for question in questions:
    answer = answer_chain.run(question)
    print(f"\nThe question is: {question}")
    print(f"\n {answer.strip()}")


The question is: what’s Elon's new Twitter username?

 @elonmusk

The question is: why is it funny that he cannot change it?

 he got stuck with a new screen name

The question is: make a joke about this

 Elon Musk is a twit

The question is: How did this get started?

 Elon Musk got stuck with a new screen name
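
The template in this cell hard-codes its context. In a retrieval setting the context would normally be a second variable that is filled with the retrieved chunks. The sketch below shows the idea with plain `str.format`, which behaves like `PromptTemplate`'s default f-string style for simple named placeholders. The names `TEMPLATE` and `build_prompt` are hypothetical.

```python
TEMPLATE = """We have provided context information below.

{context}

Using only this information, please answer the question: {text}
"""


def build_prompt(context: str, question: str) -> str:
    """Fill both the context and the question into the template."""
    return TEMPLATE.format(context=context, text=question)


context = (
    "Elon Musk's Twitter misery seems to be delighting users on the social "
    "media platform, as he got stuck with a new screen name."
)
print(build_prompt(context, "How did this get started?"))
```

Note that this context never states the new screen name, so a question about it cannot be answered from the provided information alone.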
