# MonsterAPI LLM Integration into LlamaIndex

MonsterAPI hosts a wide range of popular LLMs as an inference service. This notebook is a tutorial on how to use llama-index to access MonsterAPI LLMs.


Check us out here: https://monsterapi.ai/


Install Required Libraries

In [1]:
#!python3 -m pip install llama-index --quiet
# Before merging, switch back to the released llama-index package (commented-out line above)
!python3 -m pip install git+https://github.com/vikasqblocks/llama_index.git@monsterapi --no-cache
!python3 -m pip install monsterapi --quiet
!python3 -m pip install sentence_transformers --quiet

Import required modules

In [2]:
import os

from llama_index.llms import MonsterLLM
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.embeddings import LangchainEmbedding
from langchain.embeddings import HuggingFaceEmbeddings
from sentence_transformers import SentenceTransformer

Set the MonsterAPI key as an environment variable

In [3]:
os.environ["MONSTER_API_KEY"] = "<your-monsterapi-key>"  # replace with your own key; never commit a real key
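Hard-coding the key in a notebook cell exposes it to anyone who sees or receives the notebook. A minimal alternative, assuming you export `MONSTER_API_KEY` in your shell before starting Jupyter, is to read it from the environment and prompt for it only when it is missing. The helper name `ensure_monster_api_key` is hypothetical, not part of llama-index or monsterapi:

```python
import os
from getpass import getpass


def ensure_monster_api_key(var_name: str = "MONSTER_API_KEY") -> str:
    """Return the API key from the environment, prompting only if it is unset.

    Hypothetical helper: keeps the key out of the notebook source so it is
    not leaked when the notebook is shared or committed.
    """
    key = os.environ.get(var_name)
    if not key:
        # getpass hides the typed key instead of echoing it into the cell output
        key = getpass(f"Enter {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_monster_api_key()` once before creating `MonsterLLM` leaves the key in `os.environ`, exactly where the cell above puts it.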

## Basic Usage Pattern

Set the model

In [4]:
model = "llama2-7b-chat"

Initialize the LLM and call its complete method with an input prompt

In [5]:
llm = MonsterLLM(model=model, temperature=0.75)
result = llm.complete("Who are you?")
print(result)

 Hello! I'm just an AI assistant, here to help you with any questions or concerns you may have. My purpose is to provide helpful and informative responses while adhering to ethical standards and promoting positivity. I strive to be respectful, honest, and safe in my answers, and avoid any content that could be harmful, unethical, racist, sexist, toxic, dangerous, or illegal. If a question does not make sense or is not factually coherent, I will explain why instead of providing an incorrect answer. And if I don't know the answer to a question, I will politely let you know rather than sharing false information. Is there anything else I can assist you with?


## RAG Approach: Importing External Knowledge into the LLM as Context

Source Paper: https://arxiv.org/pdf/2005.11401.pdf

Retrieval-Augmented Generation (RAG) combines parametric memory (the knowledge stored in a pre-trained language model's weights) with non-parametric memory (an external document index, in the source paper a dense vector index of Wikipedia searched by a neural retriever) to generate answers to questions or to generate new questions. By lever
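In the source paper's RAG-Sequence variant, the two memories meet in one marginal likelihood. The retriever $p_\eta$ (the non-parametric side) scores passages $z$ by the inner product of a document encoder $\mathbf{d}$ and a query encoder $\mathbf{q}$, and the generator $p_\theta$ (the parametric side, a pre-trained seq2seq model) produces the output $y$ token by token, conditioned on each of the top-$k$ passages:

```latex
p_\eta(z \mid x) \propto \exp\!\big(\mathbf{d}(z)^\top \mathbf{q}(x)\big)

p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \operatorname{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})
```

The LlamaIndex query engine used in this notebook follows the same retrieve-then-generate idea, but it places the retrieved chunks directly in the prompt rather than marginalizing over them.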

Install the pypdf library, which is needed to parse the PDF

In [6]:
!python3 -m pip install pypdf --quiet


Let's augment our LLM with the RAG source paper (PDF) as external information.
First, download the PDF into the data directory.

In [9]:
!rm -rf ./data
!mkdir -p data && curl -L 'https://arxiv.org/pdf/2005.11401.pdf' -o data/RAG.pdf

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  864k  100  864k    0     0   763k      0  0:00:01  0:00:01 --:--:--  764k
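If shell commands are unavailable (for example, outside Jupyter), the same step can be done in pure Python. This is a minimal sketch using only the standard library; the function name `download_pdf` and its defaults are hypothetical. Like the shell cell above, it removes `./data` if present, recreates it, and downloads the paper into it:

```python
import shutil
import urllib.request
from pathlib import Path


def download_pdf(url: str, data_dir: str = "data", filename: str = "RAG.pdf") -> Path:
    """Recreate data_dir from scratch and save the file at url into it."""
    target_dir = Path(data_dir)
    # ignore_errors=True means no error is raised when the directory is absent
    shutil.rmtree(target_dir, ignore_errors=True)
    target_dir.mkdir(parents=True, exist_ok=True)
    dest = target_dir / filename
    # urlretrieve uses urlopen, which follows HTTP redirects, and writes the body to dest
    urllib.request.urlretrieve(url, dest)
    return dest
```

For example, `download_pdf("https://arxiv.org/pdf/2005.11401.pdf")` produces the same `data/RAG.pdf` that `SimpleDirectoryReader("./data")` loads in the next step.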


Load the document

In [10]:
documents = SimpleDirectoryReader("./data").load_data()

Initialize the LLM and the embedding model

In [11]:
llm = MonsterLLM(model=model, temperature=0.75, context_window=1024)
# HuggingFaceEmbeddings() downloads a default sentence-transformers model on first use
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
service_context = ServiceContext.from_defaults(
    chunk_size=1024, llm=llm, embed_model=embed_model
)

[nltk_data] Downloading package punkt to /tmp/llama_index...
[nltk_data]   Unzipping tokenizers/punkt.zip.
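`chunk_size=1024` controls how large each piece of the PDF is before it is embedded and stored in the index. The sketch below illustrates fixed-size chunking with overlap. It is a simplification under stated assumptions: it counts whitespace-separated words, whereas LlamaIndex's `chunk_size` is measured in tokens, and the helper `chunk_words` is hypothetical:

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size words.

    Each chunk after the first repeats the last `overlap` words of the
    previous chunk, so a sentence cut at a boundary still appears whole
    in at least one chunk more often.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # stop once the current window already reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks
```

For example, `chunk_words("a b c d e f g", 3, overlap=1)` yields `["a b c", "c d e", "e f g"]`. Larger chunks give the LLM more context per retrieved piece but take up more of its `context_window`.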


Create the embedding store and build the index

In [12]:
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
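`VectorStoreIndex.from_documents` embeds every chunk, and at query time the query engine embeds the question and retrieves the most similar chunks. A minimal pure-Python sketch of that retrieval step follows, assuming cosine similarity and toy 2-D vectors in place of real embeddings. The helper names are hypothetical, and LlamaIndex's own vector store is implemented differently:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # a zero vector has no direction, so treat it as unrelated
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings": the query points mostly in the direction of the "rag" chunk
chunk_vecs = {"rag": [1.0, 0.0], "cats": [0.0, 1.0], "mixed": [0.7, 0.7]}
print(top_k([0.9, 0.1], chunk_vecs, k=2))
```

In LlamaIndex, the number of retrieved chunks can be set with `index.as_query_engine(similarity_top_k=...)`.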

LLM output without RAG:

In [13]:
llm.complete("What is Retrieval-Augmented Generation?")

CompletionResponse(text=' Retrieval-Augmented Generation (RAG) is a type of artificial intelligence (AI) technology that combines the capabilities of natural language generation (NLG) and language retrieval. It involves using a large, pre-trained language model to retrieve relevant information from a knowledge base or corpus, and then generating text based on that retrieved information.\nIn other words, RAG is a technique that leverages the strengths of both NLG and language retrieval to generate high-quality, informative texts. The process typically works as follows:\n1. Knowledge Retrieval: The AI system retrieves relevant information from a knowledge base or corpus using a language model. This could involve searching for keywords, phrases, or concepts related to the topic at hand.\n2. Text Generation: Once the relevant information has been retrieved, the AI system generates text based on that information. This may involve using templates, rules, or algorithms to structure the text i
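The call above sends the bare question, so the model can only answer from its weights. A query engine instead retrieves chunks from the index and places them in the prompt ahead of the question. A minimal sketch of that prompt assembly follows; `build_rag_prompt` and its template wording are hypothetical stand-ins, not LlamaIndex's exact default prompt:

```python
def build_rag_prompt(question: str, contexts: list[str]) -> str:
    """Wrap retrieved chunks and the user question into a single prompt string.

    The template wording is illustrative only.
    """
    context_block = "\n\n".join(contexts)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context_block}\n"
        "---------------------\n"
        "Using the context above and not prior knowledge, answer the question.\n"
        f"Question: {question}\n"
        "Answer: "
    )


print(build_rag_prompt("What is Retrieval-Augmented Generation?", ["<chunk 1>", "<chunk 2>"]))
```

Every retrieved chunk adds to the prompt length, so the chunks, the question, and the generated answer must together fit within the model's `context_window`.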

LLM output with RAG:

In [14]:
response = query_engine.query("What is Retrieval-Augmented Generation?")
print(response)

 Thank you for providing additional context! Based on the information provided, it seems that Retrieval-Augmented Generation (RAG) is a machine learning approach in natural language processing that combines parametric and non-parametric memories to generate text. The approach uses retrieval-based methods to augment the model's knowledge base with specific information from external sources, which helps improve its performance in tasks such as open-domain question answering, Open-MSMarco, and Jeopardy question generation.
In more detail, RAG works by using a parametric memory to store general knowledge about a topic or domain, and then augmenting this memory with specific information retrieved through retrieval-based methods. This allows the model to generate more informative and accurate responses than it would be able to on its own. The use of both parametric and non-parametric components helps guide the generation process, resulting in better performance overall.
It appears that the a