# Monster API LLM Integration into LlamaIndex

MonsterAPI hosts a wide range of popular LLMs as an inference service. This notebook is a tutorial on using llama-index to access MonsterAPI LLMs.


Check us out here: https://monsterapi.ai/


Install Required Libraries

In [3]:
!python3 -m pip install llama-index --quiet
!python3 -m pip install monsterapi --quiet
!python3 -m pip install sentence_transformers --quiet

Import required modules

In [16]:
import os

from llama_index.llms import MonsterLLM
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings import LangchainEmbedding
from langchain.embeddings import HuggingFaceEmbeddings
from sentence_transformers import SentenceTransformer

Set the Monster API key environment variable

In [17]:
os.environ["MONSTER_API_KEY"] = "{}"
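Pasting the key directly into a notebook cell makes it easy to leak when the notebook is shared. As an alternative, here is a minimal sketch that reuses a key already present in the environment and otherwise prompts for it without echoing. The helper name `ensure_api_key` is hypothetical and is not part of llama-index or monsterapi.

```python
import os
from getpass import getpass


def ensure_api_key(name="MONSTER_API_KEY", prompt_fn=getpass):
    """Return the key stored in the environment variable `name`.

    If the variable is unset or empty, ask for it with `prompt_fn`
    (getpass by default, so the key is not echoed) and store it.
    Hypothetical helper, for illustration only.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        key = prompt_fn(f"Enter {name}: ").strip()
        if not key:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = key
    return key
```

Calling `ensure_api_key()` once before creating the LLM leaves `MONSTER_API_KEY` set for the rest of the session.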

## Basic Usage Pattern

Set the model

In [18]:
model = "llama2-7b-chat"

Instantiate the LLM module and call its complete method with an input prompt

In [19]:
llm = MonsterLLM(model=model, temperature=0.75)
result = llm.complete("Who are you?")
print(result)

 Hello! I'm just an AI assistant, here to help you with any questions or tasks you may have. My purpose is to provide helpful and respectful responses that are both safe and positive in nature. I strive to be socially unbiased and free from harmful content, ensuring that my answers promote inclusivity and diversity. If a question does not make sense or is factually incorrect, I will explain why instead of providing false information. Please feel free to ask me anything, and I'll do my best to assist you!


## RAG Approach: Importing External Knowledge into the LLM as Context

Source Paper: https://arxiv.org/pdf/2005.11401.pdf

Retrieval-Augmented Generation (RAG) combines a pre-trained sequence-to-sequence language model, whose knowledge is stored in its weights (parametric memory), with a retriever over an external document index such as Wikipedia (non-parametric memory). Retrieved passages are supplied to the model as additional context when it generates an answer.
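The retrieve-then-generate idea can be sketched in a few lines of plain Python. This is a toy illustration, not the method from the paper: simple word overlap stands in for the dense retriever, and the function names (`tokenize`, `retrieve`, `build_prompt`) and sample passages are made up for this example. The resulting prompt is what would be sent to the LLM.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, passages, top_k=1):
    """Rank passages by the number of words shared with the query.

    Toy stand-in for a dense retriever; ties keep their original order.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        passages,
        key=lambda p: len(query_tokens & tokenize(p)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages, top_k=1):
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(retrieve(query, passages, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical mini knowledge base (the non-parametric memory).
passages = [
    "RAG pairs a seq2seq generator with a dense index of Wikipedia.",
    "Bananas are rich in potassium.",
]
prompt = build_prompt("What does RAG pair a generator with?", passages)
print(prompt)
```

Only the passage that overlaps with the question ends up in the prompt, so the model answers from the supplied context rather than from its weights alone.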

Install the pypdf library, which is needed to parse PDF files

In [20]:
!python3 -m pip install pypdf --quiet

Let's augment our LLM with the RAG source paper PDF as external information.
First, download the PDF into the data directory.

In [21]:
!rm -rf ./data && mkdir -p data && cd data && curl 'https://arxiv.org/pdf/2005.11401.pdf' -o "RAG.pdf"

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  864k  100  864k    0     0   598k      0  0:00:01  0:00:01 --:--:--  599k


Load the document

In [22]:
documents = SimpleDirectoryReader("./data").load_data()

Instantiate the LLM and the embedding model

In [23]:
llm = MonsterLLM(model=model, temperature=0.75, context_window=1024)
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model=embed_model
)
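The `chunk_size` setting controls how the loaded document is cut into pieces before embedding. A simplified sketch of fixed-size chunking follows, assuming whitespace-separated words as the unit; llama-index measures chunks in tokens and uses its own splitting logic, so the helper `chunk_words` is purely illustrative.

```python
def chunk_words(text, chunk_size=1024, overlap=0):
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words. Simplified illustration;
    this is not llama-index's splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Smaller chunks give more precise retrieval hits but carry less surrounding context; larger chunks do the opposite and consume more of the model's context window.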

Build the vector index (embedding store) and create a query engine

In [24]:
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
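Conceptually, a vector index answers a query by embedding the question and returning the chunks whose embeddings lie closest to it. Here is a minimal sketch of that lookup with hand-written 2-D vectors (real embeddings have hundreds of dimensions). Cosine similarity is assumed as the metric, and `cosine_similarity` and `top_k` are hypothetical names, not llama-index API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k document vectors most similar to query_vec."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" for three chunks and one query.
doc_vecs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
best = top_k([1.0, 0.0], doc_vecs, k=2)
```

The chunks at the returned indices are the ones that would be passed to the LLM as context.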

LLM output without RAG:

In [25]:
llm.complete("What is Retrieval-Augmented Generation?")

CompletionResponse(text=" Thank you for asking! Retrieval-Augmented Generation (RAG) is a machine learning technique that combines the strengths of two popular AI models: Generative Models and Language Retrieval Systems.\nGenerative Models, such as Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs), are designed to generate new, synthetic data that resembles existing examples. In the context of text generation, these models can produce coherent and often realistic sentences or paragraphs. However, they may struggle with generating novel ideas or exploring complex topics beyond what they have been trained on.\nLanguage Retrieval Systems, on the other hand, are designed to retrieve relevant information from a large dataset given a query or prompt. These systems use various techniques like keyword extraction, entity recognition, and semantic search to identify the most relevant passages or sentences in response to a user's request. While language retrieval systems a

LLM output with RAG:

In [26]:
response = query_engine.query("What is Retrieval-Augmented Generation?")
print(response)

 Thank you for providing additional context! Based on the information provided, I can refine the original answer to better address your question. Here's my revised response:
Retrieval-Augmented Generation (RAG) is a technique used in Natural Language Processing (NLP) that leverages pre-trained language models like BART and FAISS to improve the quality of generated text. The basic idea behind RAG is to use a parameterized memory component, such as BART, to store and retrieve knowledge from a large corpus of text, which can then be utilized to generate new text that is similar in style and content to the original training data.
In more detail, when generating text using RAG, the model first generates incomplete or partial sentences or phrases, and then uses the stored knowledge in the parameterized memory component to complete them. This process helps guide the generation, drawing out specific knowledge stored in the parametric memory, resulting in more accurate and informative responses