# RAG

---

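### Configure Ollama

Install Ollama, then open a terminal and start the server (skip this step if the Ollama desktop app is already running):

`ollama serve`

In a second terminal, download and run the model:

`ollama run phi3:3.8b`

Then set up a chat function that talks to Ollama: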

In [1]:
import json
import requests


def chat_ollama(text):
    url = "http://localhost:11434/api/chat"

    payload = {
        "model": "phi3:3.8b",
        "messages": [{"role": "user", "content": text}],
        "stream": False,
    }

    response = requests.post(url, json=payload)

    # Parse the response
    if response.status_code == 200:
        return response.json()["message"]["content"]

    return None
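
Each call to `chat_ollama` sends a single user message, so the model sees no earlier turns. The chat endpoint accepts a whole message list, so a follow-up question only has context if the previous messages are sent again. A minimal sketch of that idea, assuming a hypothetical `ChatSession` helper and `post_to_ollama` function (neither is part of this notebook) and an arbitrary 120-second timeout; it uses only the standard library:

```python
import json
import urllib.request
from typing import Callable, Dict, List

Message = Dict[str, str]


def post_to_ollama(messages: List[Message], model: str = "phi3:3.8b") -> str:
    """Send the full message history to the local Ollama chat endpoint."""
    body = json.dumps(
        {"model": model, "messages": messages, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # Assumed timeout: local generation can be slow on CPU.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["message"]["content"]


class ChatSession:
    """Hypothetical helper that keeps the conversation history on the client."""

    def __init__(self, send: Callable[[List[Message]], str] = post_to_ollama):
        self.send = send
        self.messages: List[Message] = []

    def ask(self, text: str) -> str:
        # Append the new user turn, send everything so far, store the reply.
        self.messages.append({"role": "user", "content": text})
        answer = self.send(list(self.messages))
        self.messages.append({"role": "assistant", "content": answer})
        return answer
```

Passing a different `send` callable makes the class usable without a running server, which is handy for checking the history logic on its own.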

Alternatively, use the `generate` endpoint, which takes a single prompt string instead of a list of messages:

In [2]:
url = "http://localhost:11434/api/generate"


def prompt_ollama(text):
    payload = {
        "model": "phi3:3.8b",
        "prompt": text,
        "context": [],
        "options": {"top_k": 10, "temperature": 0},
        "stream": False,
    }

    response = requests.post(url, data=json.dumps(payload))

    # Parse the response
    if response.status_code == 200:
        return response.json()["response"]

    return None

In [3]:
prompt_ollama("Briefly explain what RAG is")

'RAG stands for "Rapid Application Development." It\'s a software development methodology that emphasizes quick, iterative prototyping and testing. The goal of Rapid Application Development (RAD) is to produce applications faster by involving the client in every stage of development through rapid prototyping and continuous user feedback. This approach allows for adjustments based on real-time input from users or stakeholdselicitating changes throughout the project, which can lead to a more refined final product that better meets their needs. RAD typically involves less planning but requires strong teamwork and collaboration between developers, clients, and end-users.'

### Configure Qdrant

Pull the image:

```
docker pull qdrant/qdrant
```


Then run the service (on Windows):
```
docker run -p 6333:6333 -p 6334:6334 -v .\qdrant_storage:/qdrant/storage:z qdrant/qdrant
```

In [4]:
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

In [5]:
from qdrant_client.http.models import VectorParams, Distance

collection_name = "rag_collection_nlp_course"
client.delete_collection(collection_name=collection_name)
client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

True
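
The collection is created with `size=384`, which matches the output dimension of the `all-MiniLM-L6-v2` embedding model used below, and with `Distance.COSINE`, which ranks vectors by the angle between them rather than by their length. A minimal sketch of cosine similarity in plain Python, using made-up 3-dimensional vectors instead of real embeddings (`cosine_similarity` is a hypothetical helper, not a Qdrant function):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


query = [1.0, 0.0, 1.0]
# Same direction, different length: similarity close to 1.
print(cosine_similarity(query, [2.0, 0.0, 2.0]))
# Orthogonal vectors: similarity of 0.
print(cosine_similarity(query, [0.0, 3.0, 0.0]))
```

The dimension check mirrors why `size` has to agree with the embedding model: vectors of a different length cannot be compared with the stored ones.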

### Load the PDF

The example paper is [this one](https://doi.org/10.1016/j.softx.2024.101944), about efficient molecular fingerprint computation.

In [6]:
# Document loaders live in the langchain_community package in current LangChain releases
from langchain_community.document_loaders import PyPDFLoader

documents = PyPDFLoader("paper.pdf").load()

### Split the PDF

In [7]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, separators=["\n", " ", ""]
)

documents = splitter.split_documents(documents)
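
With `chunk_size=1000` and `chunk_overlap=200`, consecutive chunks share up to 200 characters, so a sentence cut at a chunk boundary still appears whole in one of the two chunks. The sketch below shows the overlap idea with a fixed sliding window. It is a simplification: `RecursiveCharacterTextSplitter` also tries to break at the given separators, so its real chunk boundaries differ. `sliding_chunks` is a hypothetical helper.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


for chunk in sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(chunk)
```

Each printed window repeats the last two characters of the previous one, which is the same effect `chunk_overlap` has at a larger scale.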

### Store the vectors

In [8]:
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import Qdrant

embeddings_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vector_store = Qdrant(
    client=client, collection_name=collection_name, embeddings=embeddings_model
)
vector_store.add_documents(documents)
pass





### Create RAG pipeline

In [9]:
from langchain.chains.retrieval_qa.base import RetrievalQA
from langchain_ollama import OllamaLLM

retriever = vector_store.as_retriever(search_type="similarity")
llm = OllamaLLM(model="phi3:3.8b")
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
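
`RetrievalQA.from_chain_type` defaults to the "stuff" chain type: the retrieved chunks are placed together into one prompt and the LLM is called once. A rough sketch of that flow, assuming hypothetical `retrieve` and `generate` callables; the prompt wording here is illustrative and is not LangChain's actual template:

```python
from typing import Callable, List


def answer_with_context(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Retrieve chunks for the question, stuff them into one prompt, call the LLM."""
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    # Illustrative prompt; the real chain uses its own template.
    prompt = (
        "Use the following context to answer the question. "
        "If the context does not contain the answer, say that you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return generate(prompt)
```

With the objects from this notebook, something like `answer_with_context(q, lambda q: [d.page_content for d in retriever.invoke(q)], prompt_ollama)` should behave similarly to `rag_chain`, although the answers will differ because the prompt is different.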

### Compare answers

In [10]:
def compare_answers(question):
    print(question)

    response_llm = prompt_ollama(question)
    print("\nLLM answer:")
    print(response_llm)

    response_rag = rag_chain.invoke(question)["result"]
    print("\nRAG answer:")
    print(response_rag)

In [11]:
compare_answers("What are molecular fingerprints?")

What are molecular fingerprints?

LLM answer:
Molecular fingerprints, also known as chemical descriptors or structural representations of a compound's structure in cheminformatics and computational chemistry, provide a unique identifier for the molecde based on its atomic composition and arrangement. They serve to represent molecules numerically so that they can be easily compared using computer algorithms without needing their full 3D structures. Molecular fingerprints are typically binary or count-based vectors where each element corresponds to the presence (or absence, in some cases) of specific substructures within a molecule such as functional groups, rings, and other chemical features. These representations facilitate rapid similarity searching among large databases of compounds for drug discovery, virtual screening, and predictive modeling applications by allowing quick comparison based on structural or property similarities without the need to visualize complex 3D structures di

Both responses are good and accurate. I don't know where the word "sole0nally" came from.

In [12]:
compare_answers("List out popular molecular fingerprints.")

List out popular molecular fingerprints.

LLM answer:
1. Extended-Connectivity Fingerprint (ECFP4) - A widely used circular substructure based on Morgan algorithm, which considers the connectivity of atoms within a certain radius around each atom in the molecule and generates bit vectors representing these patterns. It is commonly employed for drug discovery applications due to its ability to capture essential chemical information about compounds effectively.

2. Extended-Connectivity Fingerprint (ECFP6) - Similar to ECFP4, but with a larger radius of 6 Ångström and uses the same Morgan algorithm approach; it provides more detailed molecular representations by considering six atoms around each atom in the compound's structure.

3. Mixed-Mass Fingerprint (MMF) - This fingerprint considers both atomic mass and connectivity, capturing information about isotopes as well as chemical bonds within a given radius of an atom to create its bit vector representation. It has been used in various c

The LLM gives incorrect descriptions and hallucinates non-existent fingerprints. RAG lists actual fingerprints and is more accurate overall, although still not fully accurate.

In [13]:
compare_answers("What is chemoinformatics?")

What is chemoinformatics?

LLM answer:
Chemoinformatics, also known as cheminformatic or chemical informatics, refers to the use of computer and informational techniques for managing data related to molecular and chemical compounds. It involves methods that facilitate the creation, storage, retrieval, analysis, visualization, and sharing of information about chemicals in databases such as their structures, properties, biological activities, and reactions they undergo or participate in within a laboratory setting. Chemoinformatics tools are used to predict molecular behavior, design new compounds with desired traits for drug discovery, optimize synthetic routes, understand structure-activity relationships (SAR), and more efficiently navigate the vast chemical space of potential therapenergic agents or other bioactive substances.

Chemoinformatics combines aspects from chemistry, computer science, mathematics, engineering, and biology to handle complex data sets that are often too large 

The RAG answer contains some odd text fragments, such as "cheminformaticScience". It also adds a reference that exists and is related to the topic.

In [14]:
compare_answers("What is molecular property prediction?")

What is molecular property prediction?

LLM answer:
Molecular property prediction refers to the computational process of predicting physical, chemical, and biological properties of a molecule based on its structure. This can include attributes such as solubility, stability, reactivity, bioactivity, or even more complex characteristics like pharmacokinetics in drug discovery applications. The goal is often to identify promising compounds for further experimental testing without the need for synthesizing and testing each molecule physically. This process typically involves using computational methods such as quantum mechanics calculations, machine learning algorithms, or empirical rules derived from known data sets of similar molecules (quantitative structure-property relationships).

RAG answer:
Molecular property prediction refers to the use of computational methods and algorithms to predict physical-chemical properties (such as solubility, melting point, or biological activity) of a g

Both answers are okay.

In [15]:
compare_answers("What are the uses of machine learning in chemoinformatics?")

What are the uses of machine learning in chemoinformatics?

LLM answer:
Machine learning has become an integral part of modern chemoinformatics, which is a field that combines chemical analysis with computational techniques to understand and predict molecular properties. Here's how it contributes:

1. **Drug Discovery**: Machine learning algorithms can analyze large datasets of compounds to identify potential drug candidates by finding patterns correlating structure-activity relationships (SAR). This accelerates the identification of new drugs with desired biological activities and reduces reliance on traditional high-throughput screening methods, which are time-consuminous and costly.

2. **Predictive Modeling**: Machine learning can predict properties such as solubility, permeability, stability, or toxicity of molecules based on their chemical structure using quantitative structure-activity relationship (QSAR) models. This helps in the early stages of drug design and development by f

The RAG answer was more precise, but it did not format the references in [square brackets] properly.

Only one of the referenced articles exists. As far as I know, Mr. Adamczyk has not published anything in the Journal of Cheminformatics as of today.

### Conclusion 

RAG gave us more specific responses. It anchors the system in a specific domain and a specific set of information.

### Questions

1. How does RAG improve the quality and reliability of LLM responses compared to pure LLM generation?
    - RAG searches a document store for the information the question needs and adds it to the prompt, so the model answers from the supplied context rather than only from what it learned during training.
2. What are the key factors affecting RAG performance (chunk size, embedding quality, prompt design)?
    - Chunk size: larger chunks put more information into the context, but a single chunk may then mix several pieces of information, some of them unrelated to the query.
    - Embedding quality: better embeddings improve retrieval. A domain-specific embedding model usually produces better embeddings than a general-purpose one. Models with a larger embedding dimension can be more accurate, but they are heavier and slower.
    - Prompt design: as with any LLM, a better prompt can yield better results.
3. How does the choice of vector database and embedding model impact system performance?
    - The vector database affects retrieval latency and resource consumption.
    - The embedding model affects the quality of the retrieved context. A model trained on tasks specific to the target domain of the RAG system can give better results.
4. What are the main challenges in implementing a production-ready RAG system?
    - New data: some applications need knowledge of the most recent trends and events. A RAG system used for news, for example, needs the newest information, so its document store has to be kept up to date.
    - Scalability: a system with many users has to scale, so retrieval and generation need to be efficient and well optimized.
5. How can the system be improved to handle complex queries requiring multiple document lookups?
    - Several retrievals can be run for one query, each adding more to the combined context.
    - The query can be broken down into parts or rewritten. Some models may be able to identify what information they lack and generate a set of sub-queries.
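
The idea of running several retrievals for one complex query can be sketched as follows. This is a minimal illustration, assuming the sub-queries are already available (how they are produced, for example by an LLM, is left out) and that `retrieve` is a hypothetical callable returning text chunks; `retrieve_for_subqueries` and `max_chunks` are made-up names.

```python
from typing import Callable, Iterable, List


def retrieve_for_subqueries(
    sub_queries: Iterable[str],
    retrieve: Callable[[str], List[str]],
    max_chunks: int = 8,
) -> List[str]:
    """Run one retrieval per sub-query and merge the results without duplicates."""
    seen = set()
    merged: List[str] = []
    for query in sub_queries:
        for chunk in retrieve(query):
            # The same chunk can match several sub-queries; keep it only once.
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    # Cap the context so the final prompt stays within the model's limits.
    return merged[:max_chunks]
```

The merged list can then be used as the context of a single prompt. The cap of 8 chunks is an arbitrary assumption and would need tuning against the context window of the model.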