In [None]:
!pip install llama-index
!pip install llama-index-llms-huggingface
!pip install llama-index-embeddings-huggingface

# Build a RAG application

In [4]:
# Libraries for the RAG system
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.core.prompts.prompts import SimpleInputPrompt

# Libraries for data indexing
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

system_prompt = "You are a Q&A assistant. Your goal is to answer questions as accurately as possible based on the context provided."
# Wraps each user query in chat tags before it is passed to the model
query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")
type(query_wrapper_prompt)

llama_index.core.prompts.base.PromptTemplate
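The wrapper template is plain placeholder substitution: `{query_str}` is replaced by the user's question before the text reaches the model. The sketch below mimics that with `str.format`, so it runs without LlamaIndex. The `<|USER|>`/`<|ASSISTANT|>` tags used above follow StableLM's chat format; the second template is an assumption based on the Zephyr-style format shown on the TinyLlama-1.1B-Chat-v1.0 model card. `wrap_query` is a hypothetical helper, not a LlamaIndex function.

```python
# Template used in this notebook (StableLM-style tags).
STABLELM_STYLE = "<|USER|>{query_str}<|ASSISTANT|>"

# Assumption: Zephyr-style tags, as shown on the TinyLlama-1.1B-Chat-v1.0 model card.
ZEPHYR_STYLE = "<|user|>\n{query_str}</s>\n<|assistant|>\n"


def wrap_query(template: str, query_str: str) -> str:
    """Hypothetical helper: substitute the user query into a wrapper template."""
    return template.format(query_str=query_str)


question = "What are Evolutionary Algorithms?"
print(wrap_query(STABLELM_STYLE, question))
print(wrap_query(ZEPHYR_STYLE, question))
```

Passing `ZEPHYR_STYLE` to `SimpleInputPrompt` would be one way to try the model's documented format; the answers further down were generated with the original template.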

**Load the language model (TinyLlama-1.1B-Chat from the Hugging Face Hub)**

In [5]:
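llm = HuggingFaceLLM(
    context_window=2048,  # TinyLlama's maximum context length (prompt + generated tokens)
    max_new_tokens=256,   # upper bound on tokens generated per answer
    # Greedy decoding: with do_sample=False, transformers ignores `temperature`,
    # so the original temperature=0.25 had no effect and is omitted here.
    generate_kwargs={"do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
)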
type(llm)

llama_index.llms.huggingface.base.HuggingFaceLLM
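`context_window` bounds the prompt and the generated answer together, so reserving `max_new_tokens=256` for the answer leaves roughly 2048 - 256 = 1792 tokens for the system prompt, the retrieved context, and the question. The sketch below is a simplified budget check; `prompt_budget` and `fits_in_context` are hypothetical helpers, they do not reproduce LlamaIndex's own prompt-packing logic, and the token counts passed in are made-up inputs rather than measurements.

```python
def prompt_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt after reserving room for the answer."""
    if not 0 < max_new_tokens < context_window:
        raise ValueError("need 0 < max_new_tokens < context_window")
    return context_window - max_new_tokens


def fits_in_context(prompt_tokens: int, context_window: int = 2048,
                    max_new_tokens: int = 256) -> bool:
    """Hypothetical check: does a prompt of this size still leave room for the answer?"""
    return prompt_tokens <= prompt_budget(context_window, max_new_tokens)


print(prompt_budget(2048, 256))  # → 1792
print(fits_in_context(1500))     # → True
print(fits_in_context(1900))     # → False
```

With a small 2048-token model, this budget is the main reason to keep retrieved chunks few and short.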

**Register the LLM and the embedding model in the global settings**

In [6]:
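embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")  # embeds chunks and queries
Settings.llm = llm  # default LLM for all LlamaIndex components
Settings.embed_model = embed_model  # default embedding model for indexing and retrieval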

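The embedding model turns each text chunk and each query into a fixed-length vector (384 dimensions for `bge-small-en-v1.5`); retrieval then ranks chunks by how close their vectors are to the query vector, by default with cosine similarity. A pure-Python sketch of that measure follows; the three-dimensional vectors are toy values for illustration, not real embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy vectors (illustrative only, not outputs of any embedding model).
query_vec = [0.9, 0.1, 0.0]
chunk_similar = [0.8, 0.2, 0.1]
chunk_unrelated = [0.1, 0.2, 0.9]

print(round(cosine_similarity(query_vec, chunk_similar), 3))
print(round(cosine_similarity(query_vec, chunk_unrelated), 3))
```

The first score is much higher than the second, so the first chunk would be retrieved first.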

# Download the document

In [7]:
!mkdir -p 'data/'
!wget 'https://arxiv.org/pdf/2401.11963.pdf' -O 'data/paper.pdf'

--2024-08-20 10:02:33--  https://arxiv.org/pdf/2401.11963.pdf
Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.3.42, 151.101.195.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://arxiv.org/pdf/2401.11963 [following]
--2024-08-20 10:02:33--  http://arxiv.org/pdf/2401.11963
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 672616 (657K) [application/pdf]
Saving to: 'data/paper.pdf'


2024-08-20 10:02:33 (32.6 MB/s) - 'data/paper.pdf' saved [672616/672616]



**Index the documents**

In [8]:
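documents = SimpleDirectoryReader("./data/").load_data()  # load every file in ./data/ as Document objects
index = VectorStoreIndex.from_documents(documents)  # chunk, embed with Settings.embed_model, store in memory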

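Before embedding, `VectorStoreIndex.from_documents` splits each page into overlapping chunks. LlamaIndex's default splitter is sentence-aware, and its defaults have been a chunk size of 1024 tokens with 200 tokens of overlap (these may differ across versions). The sketch below swaps in a deliberately simpler word-window splitter to show what overlap means; `chunk_words` is a hypothetical helper, and it counts words rather than tokens.

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# → ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing and embedding some text twice.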

# Query the index

In [10]:
query_engine = index.as_query_engine()  # retrieve similar chunks, then let the LLM answer from them
type(query_engine)

llama_index.core.query_engine.retriever_query_engine.RetrieverQueryEngine
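A `RetrieverQueryEngine` answers in two steps: the retriever fetches the chunks most similar to the question (by default the top 2), and a response synthesizer places them in a QA prompt for the LLM. The sketch below imitates those two steps with precomputed similarity scores. The scores, chunk labels, `top_k_chunks`, `build_qa_prompt`, and the template wording are all hypothetical simplifications, not LlamaIndex's exact default prompt.

```python
import heapq

# Simplified QA template (assumption: not LlamaIndex's exact wording).
QA_TEMPLATE = (
    "Context information is below.\n"
    "{context_str}\n"
    "Given the context information, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)


def top_k_chunks(scored: list[tuple[float, str]], k: int = 2) -> list[str]:
    """Return the texts of the k highest-scoring chunks, best first."""
    return [text for _, text in heapq.nlargest(k, scored, key=lambda pair: pair[0])]


def build_qa_prompt(query: str, chunks: list[str]) -> str:
    """Place the retrieved chunks into the QA template."""
    return QA_TEMPLATE.format(context_str="\n\n".join(chunks), query_str=query)


# Hypothetical (similarity score, chunk text) pairs.
scored_chunks = [(0.82, "chunk A"), (0.35, "chunk B"), (0.77, "chunk C")]

best = top_k_chunks(scored_chunks, k=2)  # → ['chunk A', 'chunk C']
print(build_qa_prompt("What are Evolutionary Algorithms?", best))
```

Raising `k` gives the LLM more evidence but consumes more of the limited prompt budget of a 2048-token model.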

In [18]:
response = query_engine.query("What are Evolutionary Algorithms?")
print(response)

Evolutionary Algorithms (EAs) are a class of biologically inspired gradient-free optimization methods that emulate biological evolution processes. They are based on the principles of natural selection, mutation, and crossover, and are designed to solve complex optimization problems by evolving a population of solutions. EAs can be used in various fields, including machine learning, robotics, and finance.


In [21]:
response.source_nodes[0].metadata

{'page_label': '17',
 'file_name': 'paper.pdf',
 'file_path': '/kaggle/working/data/paper.pdf',
 'file_type': 'application/pdf',
 'file_size': 672616,
 'creation_date': '2024-08-20',
 'last_modified_date': '2024-06-25'}
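Each entry in `response.source_nodes` carries the metadata of a retrieved chunk, so an answer can be traced back to a page of the paper. A minimal sketch of turning such a metadata dictionary into a citation string follows; `format_citation` is a hypothetical helper, and the sample dictionary is copied from the output above.

```python
def format_citation(metadata: dict) -> str:
    """Build a short 'file, p. N' citation from a source node's metadata."""
    file_name = metadata.get("file_name", "unknown file")
    page = metadata.get("page_label")
    return f"{file_name}, p. {page}" if page is not None else file_name


# Metadata of the first source node, as printed above.
sample_metadata = {
    "page_label": "17",
    "file_name": "paper.pdf",
    "file_path": "/kaggle/working/data/paper.pdf",
    "file_type": "application/pdf",
    "file_size": 672616,
    "creation_date": "2024-08-20",
    "last_modified_date": "2024-06-25",
}

print(format_citation(sample_metadata))  # → paper.pdf, p. 17
```

In the notebook, the helper could be applied to every retrieved node, e.g. `for node in response.source_nodes: print(node.score, format_citation(node.metadata))`, where `score` is the retrieval similarity.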

In [19]:
response = query_engine.query("Why is interpretable AI important?")
print(response)

Interpretable AI is important because it allows for the understanding and analysis of complex systems, which is crucial for making informed decisions and improving the efficiency and effectiveness of AI systems. By providing insights into the behavior of AI systems, interpretable AI can help to identify potential issues and optimize the performance of AI systems. This can lead to more effective and efficient AI systems that can better serve the needs of society. Additionally, interpretable AI can help to promote transparency and accountability in AI systems, as it allows for the public to understand the reasoning behind decisions made by AI systems. This can help to build trust in AI systems and promote responsible AI use. Overall, interpretable AI is a critical component of a comprehensive approach to AI that aims to ensure that AI systems are designed and operated in a way that is both safe and effective.


In [27]:
response = query_engine.query("Please summarize the comprehensive survey on hybrid algorithms presented in this document.")
print(response)

The comprehensive survey on hybrid algorithms presented in this document aims to furnish a more systematic and comprehensive survey to fill the gap in the literature. The survey covers various research directions, including EA-assisted optimization of RL, RL-assisted optimization of EA, and Synergistic optimization of EA and RL. The survey also presents fundamental issues to be addressed and the related algorithms for each branch. The survey provides an in-depth analysis of the fundamental issues to be addressed and the related algorithms for each branch. The survey further subdivides the works into dissimilar research branches, outlines the specific problems they address, and proposes potential research directions to address these challenges. The survey concludes with a summary of open challenges and discusses potential future directions.


# Inspect the loaded document (a scientific paper)

In [28]:
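print(len(documents))  # the default PDF reader typically yields one Document per page
print(documents[0].text[:500])  # preview of the extracted text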