<a href="https://colab.research.google.com/github/hodzicc/RAG/blob/main/RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Requirements

In [None]:
!pip install -Uqqq pip
!pip install torch==2.0.1
!pip install transformers==4.31.0
!pip install langchain==0.0.266
!pip install chromadb==0.4.5
!pip install pypdf==3.15.0
!pip install xformers==0.0.20
!pip install sentence_transformers==2.2.2
!pip install InstructorEmbedding==1.0.1
!pip install pdf2image==1.16.3

In [None]:
!pip install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/

RAG

In [None]:
import torch
from auto_gptq import AutoGPTQForCausalLM
from langchain import HuggingFacePipeline, PromptTemplate
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from transformers import AutoTokenizer, TextStreamer, pipeline

DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

In [None]:
loader = PyPDFDirectoryLoader("/content/drive/MyDrive/c book1/")
doc = loader.load()
doc

[Document(page_content='1', metadata={'source': '/content/drive/MyDrive/c book1/The C Programming Language (Kernighan Ritchie).pdf', 'page': 0}),
 Document(page_content='2\nPreface .................................................................................................................................... 6\nPreface to the first edition ........................................................................................................ 8\nChapter 1 - A Tutorial Introduction ......................................................................................... 9\n1.1 Getting Started .............................................................................................................. 9\n1.2 Variables and Arithmetic Expressions .......................................................................... 11\n1.3 The for statement ......................................................................................................... 15\n1.4 Symbolic Constants ........

In [None]:
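# Alternative (larger) embedding model, kept for reference:
# embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")

# Smaller model so that embedding can run on CPU.
# Note: this checkpoint is a plain sentence-transformers model rather than an
# instructor-tuned one, so HuggingFaceEmbeddings may be the more natural wrapper.
model_name = "sentence-transformers/quora-distilbert-base"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}  # return unit-length vectors
hf = HuggingFaceInstructEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)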
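The `normalize_embeddings` option asks for unit-length vectors. A minimal sketch of why that matters, using made-up two-dimensional vectors rather than real model output: once vectors are normalized, a plain dot product gives the same value as cosine similarity.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors (illustrative only, not produced by any embedding model).
a = [3.0, 4.0]
b = [4.0, 3.0]

print(round(cosine(a, b), 6))                           # 0.96
print(round(dot(l2_normalize(a), l2_normalize(b)), 6))  # 0.96
```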

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(doc)
len(texts)

1420
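To illustrate what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window sketch. It is an assumption-laden stand-in: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs, lines, and spaces, while the hypothetical `sliding_chunks` helper below cuts at fixed character offsets only.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where consecutive windows share
    chunk_overlap characters. Simplified: ignores separators entirely."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.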

In [None]:
db = Chroma.from_documents(texts, hf, persist_directory="db")

In [None]:

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "model"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-128g-actorder_True",
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    device=DEVICE,
    inject_fused_attention=False,
    quantize_config=None,
)



INFO - The layer lm_head is not quantized.


In [None]:

DEFAULT_SYSTEM_PROMPT = """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
""".strip()


def generate_prompt(prompt: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    return f"""
[INST] <<SYS>>
{system_prompt}
<</SYS>>

{prompt} [/INST]
""".strip()

In [None]:
from torch import FloatTensor, LongTensor
from transformers import StoppingCriteria, StoppingCriteriaList

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Strings that signal the model has started writing the user's next turn.
stop_list = [" Human: ", " \nHuman:"]
stop_token_ids = [
    tokenizer(x, return_tensors="pt", add_special_tokens=False)["input_ids"][0].to(device)
    for x in stop_list
]


class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: LongTensor, scores: FloatTensor, **kwargs) -> bool:
        for stop_ids in stop_token_ids:
            # As in the original comparison, the first stop token is skipped
            # (with SentencePiece it typically encodes the leading space).
            suffix = stop_ids[1:]
            # Guard against empty suffixes and outputs shorter than the suffix.
            if len(suffix) == 0 or input_ids.shape[1] < len(suffix):
                continue
            if (input_ids[0][-len(suffix):] == suffix).all():
                return True
        return False


stopping_criteria = StoppingCriteriaList([StopOnTokens()])
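The stopping check boils down to asking whether the generated token IDs end with one of the stop sequences. A minimal sketch on plain Python lists, assuming made-up token IDs (the `ends_with_any` helper and the numbers are hypothetical and do not come from the Llama tokenizer):

```python
def ends_with_any(generated_ids, stop_sequences):
    """Return True if generated_ids ends with any non-empty stop sequence."""
    for stop in stop_sequences:
        if not stop or len(generated_ids) < len(stop):
            continue  # nothing to match, or output is still too short
        if generated_ids[-len(stop):] == stop:
            return True
    return False


# Hypothetical token IDs, for illustration only.
stops = [[12, 7], [99]]

print(ends_with_any([1, 2, 12, 7], stops))  # True: ends with [12, 7]
print(ends_with_any([1, 12, 7, 3], stops))  # False: [12, 7] is not at the end
print(ends_with_any([7], stops))            # False: shorter than [12, 7], not 99
```

The explicit length check matters: without it, comparing a short output against a longer stop sequence would either fail or match incorrectly.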

In [None]:
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=4000,
    do_sample=False,  # Greedy decoding; set to True to enable sampling
    temperature=0.7,  # Sampling temperature (only used when do_sample=True)
    top_p=0.85,  # Nucleus sampling threshold (only used when do_sample=True)
    repetition_penalty=1.15,
    streamer=streamer,
    stopping_criteria=stopping_criteria,
    add_special_tokens=False
)
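For intuition about the decoding parameters, the sketch below contrasts greedy decoding with temperature plus nucleus (top-p) sampling on a toy logit vector. It is a simplified illustration under stated assumptions, not the `transformers` implementation; the function names and logits are made up.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def nucleus_candidates(probs, top_p):
    """Smallest set of most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_top_p(logits, temperature, top_p, rng):
    """Sample one token index from the nucleus, weighted by probability."""
    probs = softmax(logits, temperature)
    kept = nucleus_candidates(probs, top_p)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [2.0, 1.0, 0.1, -1.0]  # toy scores for a four-token vocabulary
rng = random.Random(0)

print(greedy(logits))                                      # 0
print(nucleus_candidates(softmax(logits, 0.7), top_p=0.85))  # [0, 1]
print(sample_top_p(logits, 0.7, 0.85, rng) in (0, 1))      # True
```

Greedy decoding is deterministic, which suits question answering over retrieved context; sampling trades that determinism for variety.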


In [None]:
llm = HuggingFacePipeline(pipeline=text_pipeline)

In [None]:
SYSTEM_PROMPT = "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say 'I don't know', don't try to make up an answer."

template = generate_prompt(
    """
{context}

Question: {question}
""",
    system_prompt=SYSTEM_PROMPT,
)


prompt = PromptTemplate(template=template, input_variables=["context", "question"])

retriever=db.as_retriever(search_kwargs={"k": 2})

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)
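Conceptually, the chain above retrieves the `k` most similar chunks and pastes ("stuffs") them into the `{context}` slot of the prompt. A minimal sketch of those two steps, assuming hand-written toy vectors in place of real embeddings; `top_k` and `stuff_prompt` are hypothetical helpers, not LangChain functions.

```python
def top_k(query_vec, doc_vecs, k):
    """Return indices of the k documents with the highest dot-product score."""
    scored = [
        (sum(q * d for q, d in zip(query_vec, vec)), index)
        for index, vec in enumerate(doc_vecs)
    ]
    scored.sort(reverse=True)
    return [index for _, index in scored[:k]]


def stuff_prompt(template, chunks, question):
    """Join the retrieved chunks and fill the template's placeholders."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


# Toy corpus with made-up 3-dimensional "embeddings" (illustration only).
chunks = [
    "External variables are defined outside any function.",
    "printf writes formatted output.",
    "A pointer holds an address.",
]
vectors = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.2, 0.1, 0.9]]
query_vec = [1.0, 0.0, 0.1]

selected = top_k(query_vec, vectors, k=2)
final_prompt = stuff_prompt(
    "{context}\n\nQuestion: {question}",
    [chunks[i] for i in selected],
    "What is an external variable?",
)
print(selected)  # [0, 2]
print(final_prompt)
```

Because everything retrieved is concatenated into one prompt, `k` and the chunk size together determine how much of the model's context window the retrieved text consumes.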

In [None]:

qa_chain(
    "What is an external variable?"
)


") [INST] <<SYS>>
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say 'I don't know', don't try to make up an answer.
<</SYS>>


define external variables andfunctions thatarevisible only within asingle source file.Because
external variables areglobally accessible, they provide analternative tofunction arguments and
return values forcommunicating data between functions. Any function may access anexternal
variable by referring to it by name, if the name has been declared somehow. 
Ifalarge number ofvariables must beshared among functions, external variables aremore

64
variables andfunctions have theproperty that allreferences tothem bythesame name, even
from functions compiled separately, arereferences tothesame thing. (The standard calls this
property external linkage .)Inthis sense, external variables are analogous toFortran
COMMON blocks orvariables intheoutermost block inPascal. Wewillseelater how to
define external variables andfuncti



[/INST]  Based on the provided context, I can answer the question as follows:

An external variable is a variable that is defined outside of a function and is visible only within a single source file. It can be accessed by any function within the same source file by referring to it by name, and all references to the same name refer to the same thing, even if the functions are compiled separately.


{'query': 'What is an external variable?',
 'result': "[INST] <<SYS>>\nUse the following pieces of context to answer the question at the end. If you don't know the answer, just say 'I don't know', don't try to make up an answer.\n<</SYS>>\n\n\ndefine external variables andfunctions thatarevisible only within asingle source file.Because\nexternal variables areglobally accessible, they provide analternative tofunction arguments and\nreturn values forcommunicating data between functions. Any function may access anexternal\nvariable by referring to it by name, if the name has been declared somehow. \nIfalarge number ofvariables must beshared among functions, external variables aremore\n\n64\nvariables andfunctions have theproperty that allreferences tothem bythesame name, even\nfrom functions compiled separately, arereferences tothesame thing. (The standard calls this\nproperty external linkage .)Inthis sense, external variables are analogous toFortran\nCOMMON blocks orvariables intheoutermost block 

Chat history awareness - in progress



In [None]:
# These imports need a newer LangChain release (one that ships langchain_core)
# than the langchain==0.0.266 pinned in the requirements cell.
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.chains.history_aware_retriever import create_history_aware_retriever

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain


def create_chain():
    # Rewrites a follow-up question into a standalone one before retrieval.
    retriever_prompt = ChatPromptTemplate.from_messages([
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
        ("system", "Given the above conversation and a follow up question, rephrase the follow up question to be a standalone question. If it's not a follow up question return it as is. Follow Up Input: {input} Standalone question:"),
    ])

    history_aware_retriever = create_history_aware_retriever(
        llm=llm,
        retriever=retriever,
        prompt=retriever_prompt,
    )

    # Answers from the retrieved context only; {context} is filled by the stuff chain.
    # Adjacent string literals are used so no stray indentation ends up in the prompt.
    prompt = ChatPromptTemplate.from_messages([
        ("system",
         "You are a helpful, honest AI assistant that uses only information provided here. "
         "Provide an answer to this question: {input} based on the following context only: {context}. "
         "If the context does not contain information to answer the question, reply with 'I don't know.' "
         "Do not use any other sources or your own knowledge. "
         "DO NOT continue the conversation, DO NOT act as Human. "
         "User question: {input} "
         "AI assistant answer: "),
        MessagesPlaceholder(variable_name="chat_history"),
    ])

    chain = create_stuff_documents_chain(llm=llm, prompt=prompt)

    # Retrieve with the history-aware retriever, then answer with the stuff chain.
    return create_retrieval_chain(history_aware_retriever, chain)





In [None]:
chain = create_chain()
chat_history = []

while True:
    user_input = input("You: ")
    if user_input.lower() == "exit":
        break

    # Invoke the chain with the prior turns only: the current question is
    # passed separately as "input", so it must not also sit in the history.
    response = chain.invoke({
        "chat_history": chat_history[-4:],  # last two exchanges
        "input": user_input,
    })["answer"]

    # The TextStreamer attached to the pipeline already prints the generated
    # text, so the turn is only recorded here.
    chat_history.append(HumanMessage(content=user_input))
    chat_history.append(AIMessage(content=response))


You: What is a pointer variable?

What is a pointer variable?

AI Assistant: A pointer variable is a variable that stores the memory address of another variable.
You: Can you give me an example?

Standalone Question: What is an example of a pointer variable?

AI Assistant: Sure! Here's an example: int x = 5; double *y = &x; In this example, y is a pointer variable that points to the memory location of x.
You: What is a global variable?

AI Assistant: A global variable is a variable that can be accessed from anywhere in the program. It is declared outside of any function and has a scope that spans the entire program.
You: What is a neural network?

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset

AI Assistant: I don't know.
You: How could we print text in C?

This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.

AI Assistant: To print text in C, you can use the printf function, which takes a format string and arguments to display. The format string can include conversion specifiers such as %d, %c, etc. to specify how the arguments should be formatted.
You: Example?

AI Assistant: Sure! Here's an example: #include <stdio.h> int main() { printf("Hello, world!\n"); return 0; } This program prints "Hello, world!" to the console.
You: exit
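The session above relies on passing a bounded window of earlier turns with each question. A minimal sketch of that bookkeeping, independent of LangChain: `chat_step`, `recent_history`, and `fake_answer` are hypothetical names, and the answer function is a stub standing in for the retrieval chain.

```python
def recent_history(history, max_turns):
    """Return the last max_turns (human, ai) exchanges as a flat list."""
    return history[-2 * max_turns:] if max_turns > 0 else []


def chat_step(history, user_input, answer_fn, max_turns=2):
    """Answer using prior turns only, then record the new exchange."""
    window = recent_history(history, max_turns)
    answer = answer_fn(window, user_input)
    history.append(("human", user_input))
    history.append(("ai", answer))
    return answer


def fake_answer(window, question):
    """Stub for the retrieval chain: reports how much history it received."""
    return f"seen {len(window)} prior messages"


history = []
print(chat_step(history, "What is a pointer variable?", fake_answer))  # seen 0 prior messages
print(chat_step(history, "Can you give me an example?", fake_answer))  # seen 2 prior messages
print(chat_step(history, "What is a global variable?", fake_answer))   # seen 4 prior messages
print(chat_step(history, "Example?", fake_answer))                     # seen 4 prior messages
```

Capping the window keeps the prompt from growing without bound, which helps avoid warnings about exceeding the model's maximum length, though a follow-up that refers to an older turn loses its referent.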
