# Retrieval Augmented Generation using ChromaDB and FALCON_7B

## Overview

This notebook builds a retrieval-based question answering system over the Research Computing documentation pages at `rc-docs.northeastern.edu`. It uses `tiktoken` to count tokens and a text splitter to cut the loaded pages into chunks that fit within model token limits. Pre-trained transformer models, `SBERT MPNet` for embeddings and `FALCON_7B` for generation, are created by small helper functions and selected through a configuration dictionary. Finally, the embedding-based retriever is combined with the initialized language model so that user queries are answered from the retrieved chunks.



## 1. Tokenization and Document Splitting

### Token Counting
A function named `num_tokens_from_string` utilizes `tiktoken` to calculate and return the number of tokens in a given string. It accepts the text and an encoding name as input arguments, using them to encode the text and return its token length.

### Text Splitting
- `TokenTextSplitter`: Splits text into chunks of a fixed token count, with a token overlap between consecutive chunks. The notebook uses `chunk_size=500` and `chunk_overlap=25`.
- `RecursiveCharacterTextSplitter`: A character-based alternative (`chunk_size=700`, `chunk_overlap=70`, `add_start_index=True`) that also tags each chunk with a `source` metadata field. It appears only in a disabled block.

Both splitters break text into pieces small enough for the embedding model and the LLM to handle within their token limits.
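
The chunk-size and overlap behaviour can be illustrated with a minimal sketch. It works on a plain list of tokens rather than on `tiktoken` output, and the function name `split_with_overlap` is hypothetical. It is a simplified illustration of the idea, not LangChain's implementation.

```python
from typing import List, Sequence


def split_with_overlap(
    tokens: Sequence[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Cut `tokens` into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
        start += step
    return chunks


# Ten single-character "tokens", windows of 4 with 1 token shared between neighbours
for chunk in split_with_overlap(list("abcdefghij"), chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each chunk starts with the last token of the previous one, which is what lets a sentence that straddles a boundary appear in full in at least one chunk when the overlap is large enough.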

## 2. Model Definitions and Setup

The code incorporates various pre-trained transformer models for embeddings and Language Model (LM) generation. Model identifiers and a caching directory are specified at the beginning of this section.

### Models Used
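- `EMB_SBERT_MPNET_BASE` (`sentence-transformers/all-mpnet-base-v2`): Sentence transformer model for embeddings.
- `EMB_INSTRUCTOR_XL` (`hkunlp/instructor-xl`): Defined but not used in the provided code.
- `LLM_FALCON_7B` and `LLM_FALCON_40B` (`tiiuae/falcon-7b-instruct`, `tiiuae/falcon-40b-instruct`): Pre-trained transformer models for text generation.
- `LLM_FLAN_T5_*` and `LLM_FASTCHAT_T5_XL`: Identifiers for T5-family models; only `google/flan-t5-base` has a creator function.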

### Cache Directory Setup
The cache directory (`/work/rc/projects/chatbot/models`) is exported through the `TRANSFORMERS_CACHE` and `SENTENCE_TRANSFORMERS_HOME` environment variables, so downloaded model weights are stored there and reused on subsequent runs.

## 3. Model Creators

The code defines several functions to create models and pipelines, notably:
- `create_sbert_mpnet()`: Initializes the SBERT MPNet model.
- `create_falcon_40b_instruct()` and `create_falcon_7b_instruct()`: Set up models for text generation via Hugging Face’s pipeline, configuring tokenizers and various model arguments.
- `create_flan_t5_base()`: Sets up a T5 model pipeline from Google for text-to-text generation.

These functions handle the instantiation and configuration of the models, ensuring they are set up with the appropriate parameters and caching.

## 4. Model and Pipeline Configuration 

A configuration dictionary `config` holds keys for adjusting model parameters and selection. Depending on this configuration:
- The corresponding embedding model is initialized.
- One of the LLM models (Falcon or T5) is chosen and instantiated based on the specified parameters.
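
One way to express this configuration-driven selection is a lookup table from model identifier to creator function. The sketch below is only an illustration: the notebook itself uses a series of `if` statements, the names `LLM_CREATORS`, `build_llm`, and `_stub` are hypothetical, and the creators are placeholders that return strings instead of loading real models.

```python
from typing import Callable, Dict

# Identifiers copied from the notebook
LLM_FLAN_T5_BASE = "google/flan-t5-base"
LLM_FALCON_7B = "tiiuae/falcon-7b-instruct"
LLM_FALCON_40B = "tiiuae/falcon-40b-instruct"


def _stub(name: str) -> Callable[..., str]:
    """Return a placeholder creator standing in for the real create_* functions."""
    def create(load_in_8bit: bool = False) -> str:
        return f"{name}(load_in_8bit={load_in_8bit})"
    return create


# Hypothetical registry: model identifier -> function that builds the pipeline
LLM_CREATORS: Dict[str, Callable[..., str]] = {
    LLM_FLAN_T5_BASE: _stub("create_flan_t5_base"),
    LLM_FALCON_7B: _stub("create_falcon_7b_instruct"),
    LLM_FALCON_40B: _stub("create_falcon_40b_instruct"),
}


def build_llm(config: dict) -> str:
    """Look up the creator for config['llm'] and call it with the 8-bit flag."""
    try:
        creator = LLM_CREATORS[config["llm"]]
    except KeyError:
        raise ValueError(f"Unsupported LLM: {config['llm']!r}") from None
    return creator(load_in_8bit=config["load_in_8bit"])


config = {"persist_directory": None, "load_in_8bit": False, "llm": LLM_FALCON_7B}
print(build_llm(config))  # create_falcon_7b_instruct(load_in_8bit=False)
```

Compared with separate `if` blocks, a table like this fails loudly on an unknown identifier instead of silently leaving `llm` undefined.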

## 5. Data Processing and Question Answering Setup 

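Here, the documentation pages listed in `urls` are loaded into `data` with `WebBaseLoader` and split into documents using the text splitter described earlier. The chunks are then embedded and indexed in a local Qdrant collection (an earlier Chroma-based version is left commented out), and a retrieval-based Question Answering (QA) system is built on top of that index.

### RetrievalQA Setup
- An instance of `HuggingFacePipeline` wraps the previously created Hugging Face pipeline so that LangChain can call it as an LLM.
- The vector store is exposed as a retriever using MMR search (`k=4`, `fetch_k=10`), and its results are passed through Cohere's reranker.
- The QA chain, a `ConversationalRetrievalChain` with token-limited conversation memory, is built from the reranking retriever, a question-generation chain, and a sources-aware answer chain.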
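
To make the retrieve-then-answer flow concrete, the sketch below replaces every real component with a toy stand-in: word overlap instead of embeddings, and a prompt string instead of an LLM call. The names `score`, `retrieve`, and `build_prompt`, as well as the sample chunks, are made up for illustration. Only the overall shape (pick the most relevant chunks, then place them in the prompt next to the question) reflects the setup described here.

```python
from typing import List


def score(query: str, chunk: str) -> int:
    """Toy relevance score: number of distinct query words that occur in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words)


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks with the highest toy score."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]


def build_prompt(question: str, sources: List[str]) -> str:
    """Place all retrieved sources into one prompt next to the question."""
    joined = "\n".join(f"- {s}" for s in sources)
    return f"Question: {question}\nSources:\n{joined}"


# Made-up sample chunks, standing in for the split documentation pages
chunks = [
    "squeue shows the status of a job",
    "modules manage software environments",
    "sbatch submits a batch script",
]
top = retrieve("how do I check job status", chunks, k=1)
print(build_prompt("How do I check job status?", top))
```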


In [1]:
#!pip install pinecone-client

In [2]:
#!pip install langchain

In [3]:
#!pip install tiktoken

In [4]:
#!pip install cohere

In [5]:
#!pip install openai

In [6]:
#!pip install chromadb

In [7]:
import torch

In [8]:
import langchain

In [9]:
import os
import getpass

os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

Cohere API Key: ········································


In [10]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI API Key:")

OPENAI API Key: ···················································


In [11]:
from langchain.embeddings import HuggingFaceEmbeddings




In [12]:
from langchain.document_loaders import WebBaseLoader


urls = ["https://rc-docs.northeastern.edu/en/latest/welcome/index.html",
"https://rc-docs.northeastern.edu/en/latest/welcome/welcome.html",
"https://rc-docs.northeastern.edu/en/latest/welcome/services.html",
"https://rc-docs.northeastern.edu/en/latest/welcome/gettinghelp.html",
"https://rc-docs.northeastern.edu/en/latest/welcome/introtocluster.html",
"https://rc-docs.northeastern.edu/en/latest/welcome/casestudiesandtestimonials.html",
"https://rc-docs.northeastern.edu/en/latest/gettingstarted/index.html",
"https://rc-docs.northeastern.edu/en/latest/gettingstarted/get_access.html",
"https://rc-docs.northeastern.edu/en/latest/gettingstarted/accountmanager.html",
"https://rc-docs.northeastern.edu/en/latest/gettingstarted/connectingtocluster/index.html",
"https://rc-docs.northeastern.edu/en/latest/gettingstarted/connectingtocluster/mac.html",
"https://rc-docs.northeastern.edu/en/latest/gettingstarted/connectingtocluster/windows.html",
"https://rc-docs.northeastern.edu/en/latest/first_steps/index.html",
"https://rc-docs.northeastern.edu/en/latest/first_steps/passwordlessssh.html",
"https://rc-docs.northeastern.edu/en/latest/first_steps/shellenvironment.html",
"https://rc-docs.northeastern.edu/en/latest/first_steps/usingbash.html",
"https://rc-docs.northeastern.edu/en/latest/hardware/index.html",
"https://rc-docs.northeastern.edu/en/latest/hardware/hardware_overview.html",
"https://rc-docs.northeastern.edu/en/latest/hardware/partitions.html",
"https://rc-docs.northeastern.edu/en/latest/using-ood/index.html",
"https://rc-docs.northeastern.edu/en/latest/using-ood/introduction.html",
"https://rc-docs.northeastern.edu/en/latest/using-ood/accessingood.html",
"https://rc-docs.northeastern.edu/en/latest/using-ood/interactiveapps/index.html",
"https://rc-docs.northeastern.edu/en/latest/using-ood/interactiveapps/desktopood.html",
"https://rc-docs.northeastern.edu/en/latest/using-ood/interactiveapps/fileexplore.html",
"https://rc-docs.northeastern.edu/en/latest/using-ood/interactiveapps/jupyterlab.html",
"https://rc-docs.northeastern.edu/en/latest/runningjobs/index.html",
"https://rc-docs.northeastern.edu/en/latest/runningjobs/understandingqueuing.html",
"https://rc-docs.northeastern.edu/en/latest/runningjobs/jobscheduling.html",
"https://rc-docs.northeastern.edu/en/latest/runningjobs/workingwithgpus.html",
"https://rc-docs.northeastern.edu/en/latest/runningjobs/recurringjobs.html",
"https://rc-docs.northeastern.edu/en/latest/runningjobs/debuggingjobs.html",
"https://rc-docs.northeastern.edu/en/latest/datamanagement/index.html",
"https://rc-docs.northeastern.edu/en/latest/datamanagement/discovery_storage.html",
"https://rc-docs.northeastern.edu/en/latest/datamanagement/transferringdata.html",
"https://rc-docs.northeastern.edu/en/latest/datamanagement/globus.html",
"https://rc-docs.northeastern.edu/en/latest/datamanagement/databackup.html",
"https://rc-docs.northeastern.edu/en/latest/datamanagement/securityandcompliance.html",
"https://rc-docs.northeastern.edu/en/latest/software/index.html",
"https://rc-docs.northeastern.edu/en/latest/software/systemwide/index.html",
"https://rc-docs.northeastern.edu/en/latest/software/systemwide/modules.html",
"https://rc-docs.northeastern.edu/en/latest/software/systemwide/mpi.html",
"https://rc-docs.northeastern.edu/en/latest/software/systemwide/r.html",
"https://rc-docs.northeastern.edu/en/latest/software/systemwide/matlab.html",
"https://rc-docs.northeastern.edu/en/latest/software/packagemanagers/index.html",
"https://rc-docs.northeastern.edu/en/latest/software/packagemanagers/conda.html",
"https://rc-docs.northeastern.edu/en/latest/software/packagemanagers/spack.html",
"https://rc-docs.northeastern.edu/en/latest/software/fromsource/index.html",
"https://rc-docs.northeastern.edu/en/latest/software/fromsource/makefile.html",
"https://rc-docs.northeastern.edu/en/latest/software/fromsource/cmake.html",
"https://rc-docs.northeastern.edu/en/latest/slurmguide/index.html",
"https://rc-docs.northeastern.edu/en/latest/slurmguide/introductiontoslurm.html",
"https://rc-docs.northeastern.edu/en/latest/slurmguide/slurmcommands.html",
"https://rc-docs.northeastern.edu/en/latest/slurmguide/slurmrunningjobs.html",
"https://rc-docs.northeastern.edu/en/latest/slurmguide/slurmmonitoringandmanaging.html",
"https://rc-docs.northeastern.edu/en/latest/slurmguide/slurmscripts.html",
"https://rc-docs.northeastern.edu/en/latest/slurmguide/slurmarray.html",
"https://rc-docs.northeastern.edu/en/latest/slurmguide/slurmbestpractices.html",
"https://rc-docs.northeastern.edu/en/latest/classroom/index.html",
"https://rc-docs.northeastern.edu/en/latest/classroom/class_use.html",
"https://rc-docs.northeastern.edu/en/latest/classroom/cps_ood.html",
"https://rc-docs.northeastern.edu/en/latest/classroom/classroomexamples.html",
"https://rc-docs.northeastern.edu/en/latest/best-practices/index.html",
"https://rc-docs.northeastern.edu/en/latest/best-practices/homequota.html",
"https://rc-docs.northeastern.edu/en/latest/best-practices/checkpointing.html",
"https://rc-docs.northeastern.edu/en/latest/best-practices/optimizingperformance.html",
"https://rc-docs.northeastern.edu/en/latest/best-practices/software.html",
"https://rc-docs.northeastern.edu/en/latest/tutorialsandtraining/index.html",
"https://rc-docs.northeastern.edu/en/latest/tutorialsandtraining/canvasandgithub.html",
"https://rc-docs.northeastern.edu/en/latest/faq.html",
"https://rc-docs.northeastern.edu/en/latest/glossary.html",
]
loader = WebBaseLoader(urls)
data = loader.load()



In [13]:
import tiktoken

# Name of the tiktoken encoding; the function below expects the name, not an Encoding object
encoding_name = "cl100k_base"

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Returns the number of tokens in a text string."""
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

In [14]:
from langchain.text_splitter import TokenTextSplitter
text_splitter = TokenTextSplitter(chunk_size=500, chunk_overlap=25)
docs = text_splitter.split_documents(data)

'''
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 700,
    chunk_overlap  = 70,
    length_function = len,
    add_start_index = True,
)
docs = text_splitter.create_documents([data])

for idx, text in enumerate(docs):
    docs[idx].metadata['source'] = "RCDocs"
'''


In [15]:
type(docs[0])

langchain.schema.document.Document

In [16]:
docs[0]

Document(page_content='\n\n\n\n\n\n\nResearch Computing - RC RTD\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nContents\n\n\n\n\n\nMenu\n\n\n\n\n\n\n\nExpand\n\n\n\n\n\nLight mode\n\n\n\n\n\n\n\n\n\n\n\n\n\nDark mode\n\n\n\n\n\n\nAuto light/dark mode\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHide navigation sidebar\n\n\nHide table of contents sidebar\n\n\n\n\n\nToggle site navigation sidebar\n\n\n\n\nRC RTD\n\n\n\n\nToggle Light / Dark / Auto color theme\n\n\n\n\n\n\nToggle table of contents sidebar\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nResearch ComputingToggle child pages in navigation\nWelcome\nServices We Provide\nGetting Help\nIntroduction to HPC and Slurm\nCase Studies and User Testimonials\n\n\n\n\nGetting StartedToggle child pages in navigation\nGetting Access\nAccount Manager\nConnecting To ClusterToggle child pages in navigation\nMac\nWindows\n\n\n\n\nFirst StepsToggle child pages in navigation\nPasswordless SSH\nShell Environment on the Cluster\nCluster via Command-Line\n\n\n\nUser Guides\n

In [17]:
EMB_INSTRUCTOR_XL = "hkunlp/instructor-xl"
EMB_SBERT_MPNET_BASE = "sentence-transformers/all-mpnet-base-v2"


In [18]:
LLM_FLAN_T5_XXL = "google/flan-t5-xxl"
LLM_FLAN_T5_XL = "google/flan-t5-xl"
LLM_FASTCHAT_T5_XL = "lmsys/fastchat-t5-3b-v1.0"
LLM_FLAN_T5_SMALL = "google/flan-t5-small"
LLM_FLAN_T5_BASE = "google/flan-t5-base"
LLM_FLAN_T5_LARGE = "google/flan-t5-large"
LLM_FALCON_7B = "tiiuae/falcon-7b-instruct"
LLM_FALCON_40B = "tiiuae/falcon-40b-instruct"

In [19]:
cache_dir='/work/rc/projects/chatbot/models'


In [20]:
config = {"persist_directory":None,
          "load_in_8bit":False,
          "embedding" : EMB_SBERT_MPNET_BASE,
          "llm":LLM_FALCON_7B,
          }

In [21]:
    
import os

# Point the Hugging Face and sentence-transformers caches at the shared model directory
os.environ['TRANSFORMERS_CACHE'] = cache_dir
os.environ['SENTENCE_TRANSFORMERS_HOME'] = cache_dir

In [22]:
def create_sbert_mpnet():
    """Create the SBERT MPNet embedding model, on GPU when one is available."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # cache_folder keeps the downloaded weights in the shared model directory
    return HuggingFaceEmbeddings(
        model_name=EMB_SBERT_MPNET_BASE,
        cache_folder=cache_dir,
        model_kwargs={"device": device},
    )




#tokenizer = AutoTokenizer.from_pretrained("roberta-base", cache_dir="new_cache_dir/")

#model = AutoModelForMaskedLM.from_pretrained("roberta-base", cache_dir="new_cache_dir/")


In [23]:
from transformers import AutoTokenizer
from transformers import pipeline

2023-10-13 20:41:09.541010: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-10-13 20:41:09.593162: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


In [24]:
def create_falcon_40b_instruct(load_in_8bit=False):
    """Build a Hugging Face text-generation pipeline for Falcon-40B-Instruct."""
    model = LLM_FALCON_40B

    # Tokenizer files are read from (or downloaded to) the shared cache directory
    tokenizer = AutoTokenizer.from_pretrained(model, cache_dir=cache_dir)
    hf_pipeline = pipeline(
        task="text-generation",
        model=model,
        do_sample=True,
        tokenizer=tokenizer,
        #trust_remote_code = True,
        max_new_tokens=100,
        #cache_dir=cache_dir,
        model_kwargs={
            "device_map": "auto",
            "load_in_8bit": load_in_8bit,
            "max_length": 512,
            "temperature": 0.01,
            "torch_dtype": torch.bfloat16,
        },
    )
    return hf_pipeline


In [25]:
def create_falcon_7b_instruct(load_in_8bit=False):
    """Build a Hugging Face text-generation pipeline for Falcon-7B-Instruct."""
    model = LLM_FALCON_7B

    # Tokenizer files are read from (or downloaded to) the shared cache directory
    tokenizer = AutoTokenizer.from_pretrained(model, cache_dir=cache_dir)
    hf_pipeline = pipeline(
        task="text-generation",
        model=model,
        do_sample=True,
        tokenizer=tokenizer,
        #trust_remote_code = True,
        max_new_tokens=100,
        #cache_dir=cache_dir,
        model_kwargs={
            "device_map": "auto",
            "load_in_8bit": load_in_8bit,
            "max_length": 512,
            "temperature": 0.01,
            "torch_dtype": torch.bfloat16,
        },
    )
    return hf_pipeline



In [26]:

def create_flan_t5_base(load_in_8bit=False):
        # Wrap it in HF pipeline for use with LangChain
        model="google/flan-t5-base"
        tokenizer = AutoTokenizer.from_pretrained(model, cache_dir=cache_dir)
        return pipeline(
            task="text2text-generation",
            model=model,
            tokenizer = tokenizer,
            max_new_tokens=100,
            model_kwargs={"device_map": "auto", "load_in_8bit": load_in_8bit, "max_length": 512, "temperature": 0.}
        )
        
'''
 WARNING: You are currently loading Falcon using legacy code contained in the model repository. 
 Falcon has now been fully ported into the Hugging Face transformers library. 
 For the most up-to-date and high-performance version of the Falcon model code, 
 please update to the latest version of transformers and then load the model without the trust_remote_code=True argument.
'''




In [27]:
if config["embedding"] == EMB_SBERT_MPNET_BASE:
    embedding = create_sbert_mpnet()

In [28]:
load_in_8bit = config["load_in_8bit"]
if config["llm"] == LLM_FLAN_T5_BASE:
    llm = create_flan_t5_base(load_in_8bit=load_in_8bit)

In [29]:
load_in_8bit = config["load_in_8bit"]

if config["llm"] == LLM_FALCON_40B:
    llm = create_falcon_40b_instruct(load_in_8bit=load_in_8bit)
    


In [30]:
load_in_8bit = config["load_in_8bit"]

if config["llm"] == LLM_FALCON_7B:
    llm = create_falcon_7b_instruct(load_in_8bit=load_in_8bit)
    






In [33]:
#from langchain.vectorstores import Chroma



In [34]:
#persist_directory = config["persist_directory"]
#vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)

In [35]:
# Local mode: without a Qdrant server, the vectors are stored on disk at `path`, so they persist between runs.
from langchain.vectorstores import Qdrant

Qdrantdb = Qdrant.from_documents(
    docs,
    embedding,
    path="/work/rc/projects/chatbot/chatbotrc/notebooks/RAG/tmp/local_qdrant",
    collection_name="RC_documents",
)

In [36]:
from langchain.schema.vectorstore import VectorStoreRetriever
retriever = VectorStoreRetriever(vectorstore=Qdrantdb, search_type="mmr", search_kwargs={'k': 4, 'fetch_k': 10},)
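
The retriever is configured with `search_type="mmr"` (maximal marginal relevance), which trades relevance to the query against similarity to documents already selected. The following is a minimal pure-Python sketch of that selection rule, not the vector store's implementation. The function names are hypothetical, and the `lambda_mult` weight with a default of 0.5 is an assumption modelled on LangChain's option of the same name.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    k: int = 4,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedily pick k document indices, balancing relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already chosen (0.0 for the first pick)
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is less relevant but different
query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.91], [0.2, 1.0]]
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # diversity-aware choice
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # pure relevance ranking
```

In the notebook's configuration, `fetch_k=10` candidates would be fetched by plain similarity first, and a rule of this kind would then reduce them to `k=4`.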



In [37]:
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    PromptTemplate,
)
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CohereRerank


In [39]:
compressor = CohereRerank()  # alternatives: LLMChainExtractor, LLMChainFilter, EmbeddingsFilter
# CohereRerank re-scores the initially retrieved documents against the query and keeps only the highest-ranked ones.

In [40]:
# Set up Cohere's reranker.
# Instead of returning the retrieved documents as-is, the wrapper passes them
# through the compressor using the query as context, so that only the most
# relevant results are returned.
reranker = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
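
The two-stage idea behind this wrapper (a base retriever proposes candidates, then a second scorer reorders them and keeps the best few) can be sketched without any external service. In the sketch below, `word_overlap` is a toy stand-in for Cohere's hosted reranking model, and the function names, the `top_n` parameter, and the sample candidates are all hypothetical.

```python
from typing import Callable, List


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> List[str]:
    """Reorder candidates by score_fn(query, doc) and keep the top_n best."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def word_overlap(query: str, doc: str) -> float:
    """Toy scorer: fraction of query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


# Made-up candidates, as if returned by the base retriever in arbitrary order
candidates = [
    "Partitions group nodes with similar hardware",
    "Use squeue to check job status",
    "Job status codes are listed in the Slurm guide",
    "Globus transfers data between endpoints",
]
print(rerank("check job status", candidates, word_overlap, top_n=2))
```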



In [41]:
from langchain.memory import ConversationTokenBufferMemory



In [42]:
hf_llm = HuggingFacePipeline(pipeline=llm)



In [43]:
# ConversationTokenBufferMemory keeps a buffer of recent interactions in memory
# and uses token length, rather than the number of interactions, to decide when
# to flush old interactions.
memory = ConversationTokenBufferMemory(
    llm=hf_llm,
    memory_key="chat_history",
    return_messages=True,
    input_key='question',
    max_token_limit=1000,
)
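
The token-based flushing can be pictured with a small stand-alone sketch. The class `TokenBufferMemory` and the whitespace token counter are hypothetical simplifications: the real memory counts tokens with the LLM's tokenizer, and the limit of 6 below is chosen only to keep the example short.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


class TokenBufferMemory:
    """Keep recent messages, dropping the oldest once a token budget is exceeded."""

    def __init__(self, max_token_limit: int, count_tokens: Callable[[str], int]):
        self.max_token_limit = max_token_limit
        self.count_tokens = count_tokens
        self.messages: List[Message] = []

    def _total_tokens(self) -> int:
        return sum(self.count_tokens(content) for _, content in self.messages)

    def save_context(self, question: str, answer: str) -> None:
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))
        # Drop the oldest messages until the buffer fits the token budget again
        while self.messages and self._total_tokens() > self.max_token_limit:
            self.messages.pop(0)


def whitespace_tokens(text: str) -> int:
    """Assumed token counter for the sketch: one token per whitespace-separated word."""
    return len(text.split())


toy_memory = TokenBufferMemory(max_token_limit=6, count_tokens=whitespace_tokens)
toy_memory.save_context("first question here", "first answer")  # 3 + 2 = 5 tokens
toy_memory.save_context("second question", "second answer")     # 9 tokens, over budget
print(toy_memory.messages)
```

After the second exchange the oldest message is dropped, which brings the buffer back to 6 tokens, so the history can start with an answer whose question is already gone.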

In [44]:
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

from langchain import PromptTemplate

# Prompt that turns the chat history plus the new question into a search query
CONDENSE_QUESTION_PROMPT = """
Below is a summary of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base.
Generate a search query based on the conversation and the new question.

Chat History:
{chat_history}

Question:
{question}

Search query:"""

condense_question_prompt = PromptTemplate(
    input_variables=["chat_history", "question"],
    template=CONDENSE_QUESTION_PROMPT,
)

system_message_prompt = SystemMessagePromptTemplate(prompt=condense_question_prompt)

chat_prompt_for_ques = ChatPromptTemplate.from_messages(
    [system_message_prompt])




In [45]:
from langchain.chains import LLMChain



In [46]:
question_generator = LLMChain(llm=hf_llm, prompt=chat_prompt_for_ques, verbose=True)



In [47]:
Answer_Generator_Prompt= '''
<Instructions>
Important:
Answer with the facts listed in the list of sources below. If there isn't enough information below, say you don't know.
If asking a clarifying question to the user would help, ask the question.
ALWAYS return a "SOURCES" part in your answer, except for small-talk conversations.

Question: {question}
Sources:
---------------------
    {summaries}
---------------------

Chat History:
{chat_history}
'''

In [49]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain

chat_prompt = PromptTemplate(template=Answer_Generator_Prompt, input_variables=["question", "summaries","chat_history"])

answer_chain = load_qa_with_sources_chain(hf_llm, chain_type="stuff", verbose=True,prompt=chat_prompt)



In [50]:
from langchain.chains import ConversationalRetrievalChain


chain = ConversationalRetrievalChain(
            retriever=reranker,
            question_generator=question_generator,
            combine_docs_chain=answer_chain,
            verbose=True,
            memory=memory,
            rephrase_question=False
)
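
The control flow that this chain wires together can be sketched with plain callables. This is a simplified illustration under stated assumptions, not LangChain's code: `conversational_answer` and the three stub functions are hypothetical, and the history is kept as a simple list of (question, answer) pairs instead of the token-limited memory. It mirrors two behaviours visible in the run that follows: the question generator is skipped while the history is empty, and with `rephrase_question=False` the original question, not the generated search query, is passed to the answer step.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (question, answer) pairs


def conversational_answer(
    question: str,
    chat_history: History,
    condense: Callable[[History, str], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str], History], str],
) -> str:
    # With no history there is nothing to condense, so the question is used as-is
    search_query = condense(chat_history, question) if chat_history else question
    docs = retrieve(search_query)
    # rephrase_question=False: answer the original question, not the search query
    reply = answer(question, docs, chat_history)
    chat_history.append((question, reply))
    return reply


calls: List[str] = []  # records which stubs ran, and with what query


def condense(history: History, question: str) -> str:
    calls.append("condense")
    return f"{history[-1][0]} {question}"


def retrieve(query: str) -> List[str]:
    calls.append(f"retrieve:{query}")
    return [f"doc about {query}"]


def answer(question: str, docs: List[str], history: History) -> str:
    return f"answer to {question!r} using {len(docs)} doc(s)"


history: History = []
conversational_answer("What is Slurm?", history, condense, retrieve, answer)
conversational_answer("How do I use it?", history, condense, retrieve, answer)
print(calls)
```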



In [51]:
query = "What is the Scheduling Policies for HPC cluster?"
result = chain({"question": query})


print("Question from user : " , query ,"\n")
print("Reply from ChatBot : " , result['answer'])







> Entering new ConversationalRetrievalChain chain...


Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
The current implementation of Falcon calls `torch.scaled_dot_product_attention` directly, this will be deprecated in the future in favor of the `BetterTransformer` API. Please install the latest optimum library with `pip install -U optimum` and call `model.to_bettertransformer()` to benefit from `torch.scaled_dot_product_attention` and future performance optimizations.




> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

<Instructions>
Important:
Answer with the facts listed in the list of sources below. If there isn't enough information below, say you don't know.
If asking a clarifying question to the user would help, ask the question.
ALWAYS return a "SOURCES" part in your answer, except for small-talk conversations.

Question: What is the Scheduling Policies for HPC cluster?
Sources:
---------------------
    Content:  larger resource requirements may be assigned higher priority, as they require more significant resources to execute efficiently.


Walltime Limit#
Jobs with shorter estimated execution times may receive higher priority, ensuring they are executed promptly and freeing up resources for other jobs.



Balancing Policies#

Backfilling#
This policy allows smaller jobs to “backfill” into available resources ahead of larger jobs, optimizing resource utilizat


Question from user :  What is the Scheduling Policies for HPC cluster? 

Reply from ChatBot :  <p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>



In [52]:
query = "How do I check Job Status?"
result = chain({"question": query})


print("Question from user : " , query ,"\n")
print("Reply from ChatBot : " , result['answer'])





Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.




> Entering new ConversationalRetrievalChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
System:
Below is a summary of the conversation so far, and a new question asked by the user that needs to be answered by searching in a knowledge base.
Generate a search query based on the conversation and the new question.

Chat History:

Human: What is the Scheduling Policies for HPC cluster?
Assistant: <p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>


Question:
How do I check Job Status?

Search query:

> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

<Instructions>
Important:
Answer with the facts listed in the list of sources below. If there isn't enough information below, say you don't know.
If asking a clarifying question to the user would help, ask the question.
ALWAYS 

Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.



> Finished chain.

> Finished chain.

> Finished chain.
Question from user :  How do I check Job Status? 

Reply from ChatBot :  <p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>
<p>



In [53]:
memory

ConversationTokenBufferMemory(chat_memory=ChatMessageHistory(messages=[HumanMessage(content='What is the Scheduling Policies for HPC cluster?'), AIMessage(content='<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n'), HumanMessage(content='How do I check Job Status?'), AIMessage(content='<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n<p>\n')]), input_key='question', return_messages=True, llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x2b705c0d87c0>), memory_key='chat_history', max_token_limit=1000)