# Large Language Model - Retrieval Augmented Generation


LLM RAG (Large Language Model Retrieval-Augmented Generation) is a technique that enhances large language models by incorporating information retrieval mechanisms. It involves retrieving relevant information from a database or document corpus and combining it with the original query to provide additional context. This augmented input is then processed by the large language model to generate more accurate and contextually relevant responses. LLM RAG offers benefits such as improved accuracy, access to up-to-date information, and better contextual understanding, making it useful for applications like question answering, summarization, and conversational AI.
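
The retrieve-then-augment idea described above can be illustrated with a toy sketch. It is not the pipeline used in this example (which relies on LangChain and Chroma); the `DOCUMENTS` list, the bag-of-words cosine similarity, and the helper names `retrieve` and `build_prompt` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus standing in for a real document store.
DOCUMENTS = [
    "The cluster login node is reachable over SSH on port 22.",
    "GPU nodes are reserved through the batch scheduler.",
    "Home directories are backed up every night.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents,
                    key=lambda d: cosine(q, Counter(tokenize(d))),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the original question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


prompt = build_prompt("How do I reserve GPU nodes?", DOCUMENTS)
print(prompt)
```

A real system would typically replace the word-count vectors with learned embeddings stored in a vector database, but the flow stays the same: rank documents against the query, then pass the best matches to the model together with the question.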

In this notebook, we will create a workflow that interacts with a local LLM, first asking a question without RAG and then asking the same question with RAG. The output of the workflow consists of those replies.

The example was inspired by a [Medium post by Duy Huynh](https://medium.com/@vndee.huynh/build-your-own-rag-and-run-it-locally-langchain-ollama-streamlit-181d42805895) and uses the following components:

 - [Ollama](https://python.langchain.com/v0.1/docs/integrations/llms/ollama/)
 - [Mistral LLM](https://docs.mistral.ai/)
 - [LangChain](https://python.langchain.com/v0.1/docs/integrations/llms/ollama/)
 - [Chroma Vector Storage](https://github.com/chroma-core/chroma)


## Container

An Apptainer image is used to install all the software pieces. The definition of the container can be found in `containers/llm-rag.def`. To make this example run quickly, an already built version of the image is downloaded.

## Compute Job

The compute job will run on a remote host, using a GPU. The job consists of two files, both of which can be found in the `bin/` directory of this example:

  - `wrapper.sh` - this script picks a random port, starts an ollama instance on that port, and then runs the `llm-rag.py` script
  - `llm-rag.py` - this script uses `LangChain` to interact with the ollama instance. This is where you will find the loading of the additional data, and the prompts to the LLM.
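
The contents of `wrapper.sh` are not shown in this notebook, so the following is only a Python sketch of one common way to pick an unused port: bind a socket to port 0 and let the operating system choose. The helper name `pick_free_port` and the `base_url` variable are hypothetical.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to assign any free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = pick_free_port()
# Hypothetical URL a client script could use to reach the local server.
base_url = f"http://127.0.0.1:{port}"
print(base_url)
```

Note that the port is released when the socket closes, so another process could claim it before the server starts. For a short-lived job this race is usually acceptable, and a per-job port presumably helps avoid collisions when several jobs share a host.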
  
In the script, we will first ask the LLM to describe itself:

```python
answer = model.invoke("Please tell me what kind of LLM you are, and describe what data you were trained on.").content
print(wrap(answer))
```

We will then ask our main question, but without using RAG. The question in this example is about the [IU Jetstream2](https://docs.jetstream-cloud.org/) resource, which is one of the target execution environments for this workflow:

```python
answer = model.invoke(f"{instructions_regular} Please provide a summary of the GPU capabilities of the IU Jetstream2 system. Include the instance flavors, and any details about the GPUs.").content
print(wrap(answer))
```

The script will then load the data for RAG, consisting of the full IU Jetstream2 documentation in PDF format, and ask the same question again. At the end, we should be able to compare the quality of the answers without and with RAG, and hopefully see a big improvement in the latter.
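
RAG pipelines commonly split a long document into overlapping chunks before indexing it, so that each retrieved piece fits in the prompt and text near a boundary is not lost. How `llm-rag.py` does this is not shown here; the sketch below is a plain-Python illustration, and the function name `split_into_chunks` and the chunk sizes are assumptions.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap their neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Stand-in for text extracted from a PDF (50 characters).
sample = "abcdefghij" * 5
chunks = split_into_chunks(sample, chunk_size=20, overlap=5)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk would then be embedded and stored in a vector store such as Chroma, so that the chunks most relevant to a question can be retrieved and added to the prompt.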

## Workflow

The Pegasus workflow is simple in this case. There is only one job, which takes the `llm-rag.py` script and `js2-documentation.pdf` as inputs, and produces `answers.txt` and `ollama.log` as outputs.

In [None]:
import os
import logging
from pathlib import Path

# --- Import Pegasus API -----------------------------------------------------------
from Pegasus.api import *
logging.basicConfig(level=logging.DEBUG)

# --- Main workflow class ----------------------------------------------------------
class LLMRAGWorkflow():
    wf = None
    sc = None
    tc = None
    rc = None
    props = None

    dagfile = None
    wf_dir = None
    shared_scratch_dir = None
    local_storage_dir = None
    wf_name = "llm-rag"
    
    
    # --- Init ---------------------------------------------------------------------
    def __init__(self, dagfile="workflow.yml"):
        self.dagfile = dagfile
        self.wf_dir = str(Path(".").resolve())
        self.shared_scratch_dir = os.path.join(self.wf_dir, "scratch")
        self.local_storage_dir = os.path.join(self.wf_dir, "output")

    
    # --- Write files in directory -------------------------------------------------
    def write(self):
        if self.sc is not None:
            self.sc.write()
        self.props.write()
        self.rc.write()
        self.tc.write()
        
        try:
            self.wf.write()
        except PegasusClientError as e:
            print(e)


    # --- Plan and Submit the workflow ----------------------------------------------
    def plan_submit(self):
        try:
            self.wf.plan(submit=True)
        except PegasusClientError as e:
            print(e)
            
            
    # --- Get status of the workflow -----------------------------------------------
    def status(self):
        try:
            self.wf.status(long=True)
        except PegasusClientError as e:
            print(e)
            
    # --- Wait for the workflow to finish -----------------------------------------------
    def wait(self):
        try:
            self.wf.wait()
        except PegasusClientError as e:
            print(e)
            
    # --- Get statistics of the workflow -----------------------------------------------
    def statistics(self):
        try:
            self.wf.statistics()
        except PegasusClientError as e:
            print(e)
            

    # --- Configuration (Pegasus Properties) ---------------------------------------
    def create_pegasus_properties(self):
        self.props = Properties()
        self.props["pegasus.integrity.checking"] = "none"
        return


    # --- Site Catalog -------------------------------------------------------------
    def create_sites_catalog(self, exec_site_name="condorpool"):
        self.sc = SiteCatalog()

        local = (Site("local")
                    .add_directories(
                        Directory(Directory.SHARED_SCRATCH, self.shared_scratch_dir)
                            .add_file_servers(FileServer("file://" + self.shared_scratch_dir, Operation.ALL)),
                        Directory(Directory.LOCAL_STORAGE, self.local_storage_dir)
                            .add_file_servers(FileServer("file://" + self.local_storage_dir, Operation.ALL))
                    )
                )

        exec_site = (Site(exec_site_name)
                        .add_condor_profile(universe="vanilla")
                        .add_pegasus_profile(
                            style="condor"
                        )
                    )
        self.sc.add_sites(local, exec_site)
        

    # --- Transformation Catalog (Executables and Containers) ----------------------
    def create_transformation_catalog(self, exec_site_name="condorpool"):
        self.tc = TransformationCatalog()
        
        llm_rag_container = Container("llm_rag_container",
            container_type = Container.SINGULARITY,
            image = "https://download.pegasus.isi.edu/containers/llm-rag/llm-rag.sif",
            image_site = "web"
        )
        
        # main job wrapper
        # note how gpus and other resources are requested
        wrapper = Transformation("wrapper", 
                                 site="local", 
                                 pfn=self.wf_dir+"/bin/wrapper.sh", 
                                 is_stageable=True, 
                                 container=llm_rag_container)\
                  .add_pegasus_profiles(cores=4, gpus=1, memory="20 GB", diskspace="15 GB")
                  
        self.tc.add_containers(llm_rag_container)
        self.tc.add_transformations(wrapper)

    
    # --- Replica Catalog ----------------------------------------------------------
    def create_replica_catalog(self):
        self.rc = ReplicaCatalog()

        # Add inference dependencies
        self.rc.add_replica("local", "llm-rag.py",
                            os.path.join(self.wf_dir, "bin/llm-rag.py"))
        self.rc.add_replica("local", "js2-documentation.pdf",
                            os.path.join(self.wf_dir, "pdfs/js2-documentation.pdf"))
     

    # --- Create Workflow ----------------------------------------------------------
    def create_workflow(self):
        self.wf = Workflow(self.wf_name, infer_dependencies=True)
        
        llm_rag_py = File("llm-rag.py")
        pdf = File("js2-documentation.pdf")
        answers_txt = File("answers.txt")
        ollama_log = File("ollama.log")
        
        job = (Job("wrapper")
                  .add_inputs(llm_rag_py, pdf)
                  .add_outputs(answers_txt, stage_out=True)
                  .add_outputs(ollama_log, stage_out=True)
              )
        
        self.wf.add_jobs(job)

            
dagfile = 'workflow.yml'

workflow = LLMRAGWorkflow(dagfile=dagfile)

print("Creating execution sites...")
workflow.create_sites_catalog("condorpool")

print("Creating workflow properties...")
workflow.create_pegasus_properties()

print("Creating transformation catalog...")
workflow.create_transformation_catalog("condorpool")

print("Creating replica catalog...")
workflow.create_replica_catalog()

print("Creating workflow dag...")
workflow.create_workflow()

workflow.write()
print("Workflow has been generated!")

## Plan and Submit the Workflow

We will now plan and submit the workflow for execution. By default, jobs run on the **condorpool** site, i.e., the selected ACCESS resource.

In [None]:
workflow.plan_submit()

After the workflow has been successfully planned and submitted, you can use the Python `Workflow` object to monitor its status. The status output shows the number of jobs in each state, including whether jobs are idle or running.

In [None]:
workflow.status()

## Wait for the workflow to finish, and then display the results

In [None]:
workflow.wait()

In [None]:
!cat output/answers.txt