# Objective: Use a GGUF model locally with LangChain

    embedding: HuggingFaceEmbeddings (sentence-transformers/all-mpnet-base-v2)
    
    vector store: Chroma
    
    retriever: created from the vector store
    
    llm:
        1) mistral_7B_v0.3
        2) capybarahermes-2.5-mistral-7b.Q3_K_L.gguf

## Document Loader
    PDF loader: LangChain's built-in PyPDFLoader

In [35]:
from langchain_community.document_loaders import PyPDFLoader

file_path = r"D:\OneDrive - Adani\Desktop\LEARNING_FOLDER\_Kolkata_2024\1_LLM\3_Text_query_bot\_docs\Leave_Policy_2024.pdf"
loader = PyPDFLoader(file_path)
# load_and_split() loads the PDF and splits it with a default text splitter
pages = loader.load_and_split()
len(pages)


4

## Split
    smaller chunks

In [36]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(pages)
len(splits)


15
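To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window character splitter. It is a sketch, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter first tries to break on separators such as blank lines, newlines and spaces, while this one simply starts a new window every `chunk_size - chunk_overlap` characters. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so text that
    straddles a boundary still appears whole in one chunk when it is
    shorter than the overlap.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window moves forward
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


for chunk in split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3):
    print(chunk)
```

This prints `abcdefghij`, `hijklmnopq`, `opqrstuvwx` and `vwxyz`: each chunk repeats the last 3 characters of the previous one. With the notebook's settings (`chunk_size=1000`, `chunk_overlap=200`), each window in this simplified version would start 800 characters after the previous one.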

In [37]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

## Create vector store
    embed each chunk and store the embeddings (together with the chunk text) in Chroma

In [39]:
new_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vectorstore = Chroma.from_documents(documents=splits, embedding=new_embeddings)
vectorstore



<langchain_chroma.vectorstores.Chroma at 0x1e72ab625d0>
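Conceptually, the vector store keeps one embedding per chunk and, for a query, returns the chunks whose vectors are closest to the query's vector. The sketch below shows that idea with cosine similarity over tiny hand-made 3-dimensional vectors. It is an illustration under assumptions: real `all-mpnet-base-v2` embeddings have 768 dimensions, Chroma uses its own distance metric and an approximate nearest-neighbour index rather than this brute-force loop, and the names `cosine_similarity` and `top_k` are hypothetical. The default `k=4` mirrors what I understand to be the LangChain retriever's default.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 4) -> list[str]:
    """Return the ids of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda doc_id: cosine_similarity(query_vec, store[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" for three chunks (hand-made, not real model output).
store = {
    "sick_leave": [0.9, 0.1, 0.0],
    "privilege_leave": [0.1, 0.9, 0.1],
    "travel_policy": [0.0, 0.1, 0.9],
}
query = [0.2, 0.8, 0.0]  # pretend embedding of "Tell me more about PL"
print(top_k(query, store, k=2))
```

The query vector points mostly along the second axis, so `privilege_leave` ranks first, followed by `sick_leave`.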

## Create retriever

In [40]:
retriever = vectorstore.as_retriever(search_type="similarity")

In [41]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

### LLMs Used:

## 1) Model: mistral_7B_v0.3
    LangChain + CTransformers
    Size: 4.07 GB 
    RAM usage: not measured yet (open question: how to measure it)
    Response time:
    
- run a quantized model with LangChain
- on CPU, without an NVIDIA GPU
- CPU machine with 16 GB RAM
- download the GGUF model file
- What is quantization? --> storing the model weights with fewer bits; a higher-bit quant --> better quality but more RAM
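A rough rule behind the last bullet: weight memory is about (number of parameters × bits per weight) ÷ 8 bytes. The sketch below assumes a round 7 billion parameters and nominal bit widths; real GGUF files differ somewhat because K-quants mix bit widths across tensors and the file also stores metadata, and running the model needs extra RAM beyond the weights (for example for the context). The function name is hypothetical.

```python
def approx_weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # assumption: round figure for a "7B" model
for bits in (16, 8, 4, 3):
    print(f"{bits:>2}-bit: ~{approx_weight_size_gb(N_PARAMS, bits):.1f} GB")
```

Halving the bits roughly halves the memory, which is consistent with the 4-bit Mistral file (4.07 GB) being larger than the 3-bit CapybaraHermes file (3.82 GB).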

In [52]:
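from langchain_community.llms import CTransformers
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# Folder that contains the downloaded GGUF file, and the file name inside it
model_dir = r"D:\OneDrive - Adani\Desktop\LEARNING_FOLDER\_Kolkata_2024\1_LLM\local_downloaded_models\mistral_7B_v0.3"
model_file = "Mistral-7B-Instruct-v0.3.Q4_K_M.gguf"

# context_length: prompt + generated tokens must fit in this window
# max_new_tokens: upper limit on the number of generated tokens
config = {'context_length': 16000, 'max_new_tokens': 16000}

# StreamingStdOutCallbackHandler prints tokens to stdout as they are generated
llm = CTransformers(
    model=model_dir,
    model_file=model_file,
    callbacks=[StreamingStdOutCallbackHandler()],
    config=config,
)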

## 2) Model: capybarahermes-2.5-mistral-7b.Q3_K_L.gguf

    LangChain + CTransformers
    Size: 3.82 GB 
    RAM usage: 6.02 GB
    Response time:


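Model Specs:
- context_length: 32768

def => the maximum number of tokens (not words) the model can take into account at once: the prompt plus the tokens it generates

- max_new_tokens: 32768 

def => the maximum number of tokens the model may generate in one response; use it to control the length of the output
    
(+) Note:
- a lower max_new_tokens caps the output length, so long answers finish sooner (generation also stops earlier if the model emits its end-of-sequence token)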
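Because the prompt and the generated answer share the same context window, the room left for generation is `context_length` minus the prompt's tokens, whatever `max_new_tokens` is set to. A small sketch of that budget check; the function name and the 1,200-token prompt size are hypothetical (a real count would come from the model's tokenizer).

```python
def generation_budget(context_length: int, prompt_tokens: int, max_new_tokens: int) -> int:
    """Tokens the model can actually generate for one prompt."""
    room_left = context_length - prompt_tokens
    if room_left <= 0:
        raise ValueError("the prompt alone does not fit in the context window")
    # Generation stops at whichever limit is reached first.
    return min(max_new_tokens, room_left)


# Settings used in this notebook (context_length=16000, max_new_tokens=16000)
# with a hypothetical 1,200-token RAG prompt:
print(generation_budget(16000, 1200, 16000))  # 14800: the context window is the limit
print(generation_budget(16000, 1200, 512))    # 512: max_new_tokens is the limit
```

So with both values set to 16000, `max_new_tokens` never actually binds; the prompt length decides the real ceiling.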


In [43]:
from langchain_community.llms import CTransformers
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate

model_dir = r"D:\OneDrive - Adani\Desktop\LEARNING_FOLDER\_Kolkata_2024\1_LLM\local_downloaded_models"
# Note: the heading names the Q3_K_L file, but this loads the Q3_K_M file;
# make sure the name matches the GGUF file actually present in model_dir
model_file = "capybarahermes-2.5-mistral-7b.Q3_K_M.gguf"

# Same generation settings as for the Mistral model
config = {'context_length': 16000, 'max_new_tokens': 16000}

# Reassigns llm: the chain uses whichever model cell was executed last
llm = CTransformers(
    model=model_dir,
    model_file=model_file,
    callbacks=[StreamingStdOutCallbackHandler()],
    config=config,
)

## Creating Prompt

In [53]:

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)


### New Terms:
- MultiQueryRetriever:
    - automates prompt tuning by using an LLM
    - to generate multiple queries from different perspectives for one user question
    - for each query, retrieves the relevant documents, then takes the unique union across all queries
    - helps overcome the limitations of distance-based similarity search
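The "union across all queries" step can be sketched in plain Python: retrieve for each generated query, then keep each document once, in first-seen order. The query variants and the `retrieve` lookup below are hypothetical stand-ins for the LLM output and the vector-store search. In LangChain, this retrieve-and-merge is handled by `MultiQueryRetriever` (for example via `MultiQueryRetriever.from_llm(retriever=..., llm=llm, prompt=QUERY_PROMPT)`); its internals may differ from this sketch.

```python
def unique_union(results_per_query: list[list[str]]) -> list[str]:
    """Merge per-query results, keeping each document once in first-seen order."""
    seen = set()
    merged = []
    for results in results_per_query:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical lookup: query variant -> ids of the chunks a retriever returned.
retrieve = {
    "What is privilege leave?": ["pl_rules", "pl_encashment"],
    "How many PL days do employees earn?": ["pl_rules", "pl_new_joiners"],
    "Can unused PL be encashed?": ["pl_encashment", "leave_lapse"],
}
queries = list(retrieve)  # stand-in for the LLM-generated query variants
print(unique_union([retrieve[q] for q in queries]))
```

Each variant surfaces a slightly different set of chunks, and the merged list (`pl_rules`, `pl_encashment`, `pl_new_joiners`, `leave_lapse`) is wider than any single query's result.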


In [54]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.prompts import ChatPromptTemplate

## Exploring Retrievers

In [55]:
# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [56]:
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [57]:
retriever = vectorstore.as_retriever()
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
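Before the LLM is called, the first two steps of the chain turn the retrieved chunks into one context string and fill the RAG prompt. The plain-Python sketch mirrors that with `str.format` instead of `ChatPromptTemplate`; the chunk texts are hypothetical, plain strings stand in for `Document.page_content`, and `join_chunks` is a hypothetical name chosen so it does not overwrite the notebook's `format_docs`.

```python
TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def join_chunks(chunks: list[str]) -> str:
    # Same joining rule as format_docs: a blank line between chunks.
    return "\n\n".join(chunks)


retrieved = [  # hypothetical retriever output
    "Privilege Leave (PL) can be availed in units of half a day.",
    "The maximum PL balance at the start of a leave year is 90 days.",
]
filled = TEMPLATE.format(context=join_chunks(retrieved), question="Tell me more about PL")
print(filled)
```

One caveat, as far as I understand LangChain's behaviour: `ChatPromptTemplate` produces chat messages, and when they are piped into a plain text LLM such as `CTransformers` they are converted to text with a role prefix (such as `Human:`), so the exact string the model sees may differ slightly from this sketch.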

## Chain

In [58]:
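chain = (
    # Retrieve chunks for the question and join them into one context string;
    # pass the question through unchanged
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

result = chain.invoke("Tell me more about PL")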


Answer: Privilege Leave (PL) is an earned leave that can be availed by employees in units of half a day through appropriate channels. The balance of PL earned will be credited to the employee's account annually. Any PL that is taken as advance will be settled against the PL balance credited in the next leave year. If an employee leaves the company, any PL availed as an advance prior to leaving will need to be recovered from them unless they have already earned that leave. For employees who have recently joined, the first block of 2 years starts from January, during which they earn 21 days of PL. Any unavailed compulsory PL will lapse in the Leave Year following the year in which it was due. The maximum balance of leave at the beginning of any leave year cannot exceed 90 days; any excess amount will be automatically encashed as per procedures and rules on leave encashment. An employee may opt to encash their leave credit, subject to maintaining a balance of 30 days PL at credit. Employe

In [50]:
result


'Answering option ONLY based on the text given: Privilege Leave is an earned leave that can be availed in advance and settled against the PL balance in the next leave year.'