## Concepts

### LLM (Large Language Models)
Large Language Models refer to advanced machine learning models that are trained on vast amounts of textual data to understand and generate human-like text. 
These models are pre-trained on a diverse range of internet text and then fine-tuned for specific tasks.

### Prompt Engineering
Prompt engineering involves crafting effective prompts or input queries to elicit desired responses from language models. 
This technique is particularly useful when interacting with large language models for specific tasks or generating targeted outputs. 
Prompt engineering aims to guide the model in producing the desired type of information or response.
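
As a minimal sketch of the idea, the snippet below assembles a structured prompt from an instruction, an optional few-shot example, and the user's question. The helper name `build_prompt` and the example question/answer pair are made up for illustration; they are not part of any library.

```python
from typing import List, Optional, Tuple


def build_prompt(instruction: str, question: str,
                 examples: Optional[List[Tuple[str, str]]] = None) -> str:
    """Assemble a prompt from an instruction, optional few-shot examples, and a question."""
    parts = [instruction.strip()]
    # Each few-shot example shows the model the expected question/answer shape.
    for example_question, example_answer in examples or []:
        parts.append(f"Q: {example_question}\nA: {example_answer}")
    # End with the real question and an open "A:" for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


engineered_prompt = build_prompt(
    instruction="You are a support assistant. Answer in one sentence.",
    question="What is your refund policy?",
    # Hypothetical example pair, used only to demonstrate the format.
    examples=[("Do you offer EMI?", "Yes, EMI options are available at checkout.")],
)
print(engineered_prompt)
```

Compared with a bare question, the instruction and example constrain the tone, length, and format of the response.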

### RAG (Retrieval-Augmented Generation)
RAG is a framework for natural language processing tasks that combines retrieval-based models with generative models, so that generated text is grounded in retrieved information. A RAG pipeline has two main components: a retriever and a generator.

- Retriever
  The retriever is responsible for selecting relevant information from a large dataset or knowledge base.
  It does this by retrieving a subset of documents that are likely to contain relevant information for a given input or query.
  One common approach for retrieval is to use dense vector representations of documents and queries.
  Similarity metrics, such as cosine similarity, can then be employed to identify the most relevant documents.

- Generator
  The generator is a language model capable of producing coherent and contextually relevant text.
  It takes the information retrieved by the retriever and generates a response, completing a natural language understanding or generation task.

  <img src="./rag.png" alt="Vector" width="600" />
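
The retriever's ranking step can be sketched in plain Python. This is an illustrative toy, not how a production vector store is implemented: the three-dimensional vectors and document names below are made up, and real embeddings have hundreds of dimensions.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: List[float], doc_vecs: Dict[str, List[float]],
             k: int = 2) -> List[Tuple[str, float]]:
    """Return the k documents whose vectors are most similar to the query vector."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors standing in for document embeddings.
doc_vectors = {
    "refund_policy": [0.9, 0.1, 0.0],
    "job_placement": [0.1, 0.9, 0.2],
    "course_syllabus": [0.0, 0.3, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # stand-in for an embedded refund question

top_docs = retrieve(query_vector, doc_vectors, k=2)
print(top_docs)  # refund_policy ranks first, job_placement second
```

The generator would then receive the text of the top-ranked documents alongside the original question.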

### Why Should One Use RAG?
There are three ways to give an LLM access to new data.

- **Training:** Large Language Models are neural networks with billions of parameters, trained on up to trillions of tokens. The parameters of a deep learning model are the weights that hold everything the model has learned. Training a model like GPT-4 reportedly costs upwards of a hundred million dollars, so re-training such a model on new data is not feasible for most teams.
- **Fine-tuning:** Another option is to fine-tune a model on your own data. Fine-tuning uses a pre-trained model as the starting point and continues training it on a different data set. Although powerful, it is expensive in both time and money, so it rarely makes sense unless there is a specific requirement.
- **Prompting:** Prompting is the method where we fit new information within the context window of an LLM and make it answer the queries from the information given in the prompt. It may not be as effective as knowledge learned during training or fine-tuning, but it is sufficient for many real-life use cases, such as document Q&A.
<br><br>
Prompting for answers from text documents is effective, but these documents are often much larger than the context window of an LLM. Retrieval-Augmented Generation (RAG) pipelines address this by processing, storing, and retrieving only the relevant document sections, so the LLM can answer queries efficiently. The following sections cover the crucial components of a RAG pipeline.
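
One common way to make a long document fit is to split it into overlapping chunks before embedding. The sketch below is a simplified character-based splitter; the function name `split_into_chunks` and the sizes are assumptions for illustration, and real pipelines usually split on tokens or sentence boundaries.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("chunk_size must be positive and overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# A made-up long document built by repeating one sentence.
document = "RAG pipelines split long documents into smaller pieces. " * 20
chunks = split_into_chunks(document, chunk_size=200, overlap=50)
print(len(chunks), "chunks; longest is", max(len(c) for c in chunks), "characters")
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, at the cost of storing some text twice.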

### Embeddings
An embedding represents an object or entity (such as a word, image, or document) as a vector of numbers in a lower-dimensional space.
In other words, it is the data converted into numeric (vector) form.
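
To make "data converted into numbers" concrete, here is a deliberately crude toy that counts how often each word from a tiny, made-up vocabulary appears in a text. Real embedding models produce learned dense vectors that capture meaning; this sketch only illustrates the text-to-vector step, and `VOCABULARY` and `toy_embed` are hypothetical names.

```python
from typing import List

# Hypothetical five-word vocabulary; each word maps to one vector position.
VOCABULARY = ["refund", "policy", "course", "javascript", "placement"]


def toy_embed(text: str) -> List[float]:
    """Map text to a vector of word counts over VOCABULARY (not a learned embedding)."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [float(words.count(term)) for term in VOCABULARY]


print(toy_embed("What is your refund policy?"))  # [1.0, 1.0, 0.0, 0.0, 0.0]
```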

### Vectorization
Vectorization is the process of converting data into vector form.
In the context of this pipeline, it also covers converting data into embeddings and storing them in a vector database.

<img src="./vector.jpg" alt="Vector" width="400" />


### Demo Pipeline
<img src="./pipeline.png" alt="pipeline" width="600" />


Requirements

In [None]:
%pip install langchain==0.0.284 google-generativeai sentence-transformers langchain_community python-dotenv==1.0.0 streamlit==1.22.0 tiktoken==0.4.0 faiss-cpu==1.7.4 protobuf~=3.19.0

Importing LLM (Google Palm)

In [None]:
import os
from langchain.llms import GooglePalm

# Read the key from the environment rather than hard-coding it (GOOGLE_API_KEY is an assumed name).
api_key = os.environ["GOOGLE_API_KEY"]

llm = GooglePalm(google_api_key=api_key, temperature=0.7)



Testing Prompt

In [None]:
poem = llm("Write a four-line poem about India.")
poem

Loading the company's data from a CSV file

Note: The data can be loaded and updated in any format, as needed

In [None]:
from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(file_path="sample.csv", source_column="prompt")
data = loader.load()
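
For intuition, the sketch below mimics with the standard library what a row-per-document CSV loader roughly produces: one document per row, with the row's columns as text and the chosen source column kept as metadata. It is an approximation for illustration, not LangChain's actual implementation. The `prompt` and `response` column names follow the usage elsewhere in this notebook, and the sample row is made up.

```python
import csv
import io
from typing import Dict, List


def rows_to_documents(csv_text: str, source_column: str) -> List[Dict]:
    """Turn each CSV row into a dict with 'page_content' and 'metadata' keys."""
    documents = []
    for row_index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Each column becomes a "name: value" line in the document text.
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append({
            "page_content": page_content,
            "metadata": {"source": row[source_column], "row": row_index},
        })
    return documents


# Made-up sample row; a real sample.csv would hold the company's own Q&A pairs.
sample_csv = "prompt,response\nDo you have a javascript course?,Yes.\n"
docs = rows_to_documents(sample_csv, source_column="prompt")
print(docs[0]["page_content"])
```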

Creating embeddings using LangChain's Instruct Embeddings

Doc: [Instruct Embeddings on Hugging Face](https://python.langchain.com/docs/integrations/text_embedding/instruct_embeddings)

Note: Prefer a top-ranked embedding model, for example one listed on a public embedding leaderboard such as MTEB on Hugging Face

In [None]:
from langchain_community.embeddings import HuggingFaceInstructEmbeddings


instructor_embeddings = HuggingFaceInstructEmbeddings()

Testing Embedding Query

In [None]:
q = instructor_embeddings.embed_query("What is your refund policy?")
q

Storing Embedding in Vector Database (FAISS) 

Vectorization

Note: Chroma is another vector database that could be used here

In [None]:
from langchain.vectorstores import FAISS

vector_db = FAISS.from_documents(documents=data, embedding=instructor_embeddings)

Retrieval

Note: A CSV-specific retriever may be more efficient if the data is a CSV file

This is the most important step

In [None]:
# Create a retriever object. It:
# - takes an input question,
# - compares the question's embedding with the embeddings stored in the database,
# - returns the most similar documents (L2 distance by default in LangChain's FAISS wrapper;
#   cosine similarity is another common choice).
retriever = vector_db.as_retriever(search_kwargs={"k": 3})  # return the top 3 matches

# Show the relevant docs
rdocs = retriever.get_relevant_documents("how about job placement support?")
rdocs

Extracting the final response from the relevant docs

In [None]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

prompt_template = """Given the following context and a question, generate an answer based on this context only.
    In the answer, try to provide as much text as possible from the "response" section in the source document context without making many changes.
    If the answer is not found in the context, then give the most suitable answer from your general knowledge, but if it's a yes-or-no question, then answer "I don't know".

    CONTEXT: {context}

    QUESTION: {question}"""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"],
)

chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type = "stuff",
    retriever=retriever,
    input_key="query",
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)


# chain("can i take this bootcamp without any prior knowledge of coding?")
# chain("is javascript important for frontend development?")
chain("do you have javascript course?")
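
The "stuff" chain type can be pictured as simple string assembly: the retrieved documents are concatenated and placed into the prompt's `{context}` slot next to the question. The sketch below illustrates that idea with a shortened template and made-up document text; the function name `stuff_documents` is hypothetical, and this is not LangChain's internal code.

```python
from typing import List

# Shortened stand-in for the prompt template used above.
PROMPT_TEMPLATE = "CONTEXT: {context}\n\nQUESTION: {question}"


def stuff_documents(docs: List[str], question: str) -> str:
    """Join all retrieved documents into one context string and fill the template."""
    context = "\n\n".join(docs)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Made-up retrieved documents, formatted like the CSV rows loaded earlier.
retrieved = [
    "prompt: Do you have a javascript course?\nresponse: Yes.",
    "prompt: Is javascript important for frontend development?\nresponse: Yes.",
]
final_prompt = stuff_documents(retrieved, "do you have javascript course?")
print(final_prompt)
```

Because everything is stuffed into a single prompt, the number of retrieved documents has to stay small enough to fit the model's context window. With `return_source_documents=True`, the chain call is expected to return a dict whose `result` key holds the answer and whose `source_documents` key holds the retrieved documents.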