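# **FLARE**
FLARE (Forward-Looking Active Retrieval Augmented Generation) is a retrieval-augmented generation method that decides, during generation, both when to retrieve and what to retrieve.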

* **Deciding when to retrieve**: The authors argue that an LLM should retrieve only when it is uncertain about its prediction. Assuming LLMs are well calibrated, meaning that the probability assigned to a token reflects how confident the model actually is, they adopt an active retrieval strategy that retrieves only when the generated tokens have a low probability.
* **What to retrieve**: The authors argue that it is important to consider what the LM intends to generate next, since the goal of active retrieval is to benefit future generation. They therefore propose generating a temporary next sentence, using it as a query to retrieve relevant documents, and then regenerating the next sentence conditioned on the retrieved documents.
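
The two ideas above can be sketched as a single loop. This is a minimal illustration and not the paper's exact algorithm: `flare_generate`, `generate`, and `retrieve` are hypothetical names, and the generator is assumed to return one probability per generated token.

```python
from typing import Callable, List, Tuple

# Hypothetical shape: a generator returns a sentence plus one probability per token.
Generation = Tuple[str, List[float]]


def flare_generate(
    question: str,
    generate: Callable[[str, List[str]], Generation],
    retrieve: Callable[[str], List[str]],
    min_prob: float = 0.45,
    max_sentences: int = 5,
) -> str:
    """Sketch of forward-looking active retrieval with pluggable components."""
    answer: List[str] = []
    for _ in range(max_sentences):
        context = question + " " + " ".join(answer)
        # 1. Draft a temporary next sentence without any retrieved documents.
        sentence, token_probs = generate(context, [])
        if not sentence:
            break  # the generator has nothing more to say
        # 2. Retrieve only if at least one token in the draft is low-confidence.
        if token_probs and min(token_probs) < min_prob:
            docs = retrieve(sentence)  # the draft itself is used as the query
            # 3. Regenerate the sentence conditioned on the retrieved documents.
            sentence, _ = generate(context, docs)
        answer.append(sentence)
    return " ".join(answer)
```

Passing `generate` and `retrieve` as callables keeps the control flow visible without tying the sketch to any particular LLM or vector store.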

## Installing all the Libraries

In [1]:
!pip install openai==0.28
!pip install lancedb
!pip install langchain==0.0.354
!pip install tiktoken
!pip install sentence_transformers
!pip install arxiv
!pip install pymupdf
!pip install gradio


### Import the libraries

In [2]:
import getpass
import os

import gradio as gr
import lancedb
import langchain
from langchain.chains import FlareChain
from langchain.document_loaders import ArxivLoader
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import LanceDB

# Uncomment the below if you want to see all the intermediate steps
# langchain.verbose=True

In [None]:
# Prompt for the key at runtime instead of hardcoding it in the notebook
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

### Instantiate LLM

In [None]:
llm = OpenAI()

### Initialize the Embeddings model

In [None]:
model_name = "BAAI/bge-large-en"
model_kwargs = {"device": "cuda"}
encode_kwargs = {"normalize_embeddings": False}
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

### Load the data

In [None]:
# Fetch the FLARE paper from arXiv by its ID.
# The ID is the number at the end of the paper URL, e.g. https://arxiv.org/pdf/2305.06983.pdf
docs = ArxivLoader(query="2305.06983", load_max_docs=1).load()

### Text splitter

In [None]:
# instantiate text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)

# split the document into chunks
doc_chunks = text_splitter.split_documents(docs)
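
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a simplified sliding-window chunker. It is only an illustration: `sliding_chunks` is a hypothetical helper, and it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` tries to split on separators such as paragraph and line breaks first.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# prints ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role `chunk_overlap=150` plays above: text that straddles a boundary still appears whole in at least one chunk.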

### Instantiate the Database

In [None]:
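# Connect to a local LanceDB directory and create a table seeded with one
# placeholder row, then index the document chunks into it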
db = lancedb.connect("/tmp/lancedb")
table = db.create_table(
    "documentsai",
    data=[
        {
            "vector": embeddings.embed_query("Hello World"),
            "text": "Hello World",
            "id": "1",
        }
    ],
    mode="overwrite",
)
vector_store = LanceDB.from_documents(doc_chunks, embeddings, connection=table)

vector_store_retriever = vector_store.as_retriever()
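
Conceptually, the retriever embeds the query and returns the chunks whose vectors are closest to it. The sketch below shows that idea in plain Python. It assumes cosine similarity and a brute-force scan, which may differ from the metric and index the vector store actually uses, and `cosine` and `top_k` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(
    query_vec: Sequence[float],
    rows: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[str]:
    """Return the texts of the k rows most similar to the query vector."""
    ranked = sorted(rows, key=lambda row: cosine(query_vec, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy two-dimensional "embeddings" for illustration only
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print(top_k([1.0, 0.1], rows, k=2))  # prints ['a', 'c']
```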

### Define the FLARE chain

In [None]:
flare = FlareChain.from_llm(
    llm=llm, retriever=vector_store_retriever, max_generation_len=300, min_prob=0.45
)
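
The `min_prob` threshold can be illustrated on its own. The sketch below assumes the LLM reports a log probability for each generated token and flags the tokens whose probability falls under the threshold. `low_confidence_tokens` is a hypothetical helper and is not part of the LangChain API; the token values are made up for illustration.

```python
import math
from typing import List


def low_confidence_tokens(
    tokens: List[str], logprobs: List[float], min_prob: float
) -> List[str]:
    """Return the tokens whose probability exp(logprob) is below min_prob."""
    return [
        token
        for token, logprob in zip(tokens, logprobs)
        if math.exp(logprob) < min_prob
    ]


# Made-up tokens and probabilities: only the last token is uncertain
tokens = ["FLARE", " retrieves", " when", " unsure"]
logprobs = [math.log(0.90), math.log(0.80), math.log(0.95), math.log(0.20)]

uncertain = low_confidence_tokens(tokens, logprobs, min_prob=0.45)
print(uncertain)        # prints [' unsure']
print(bool(uncertain))  # prints True, so this draft would trigger retrieval
```

A higher `min_prob` flags more tokens and therefore triggers retrieval more often; a lower value trusts the model's draft more.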

### Create a gradio generate function for interactive UI

In [5]:
# Define a function to generate FLARE output based on user input
def generate_flare_output(input_text):
    output = flare.run(input_text)
    return output


# Named prompt_box to avoid shadowing Python's built-in input()
prompt_box = gr.Text(
    label="Prompt",
    show_label=False,
    max_lines=1,
    placeholder="Enter your prompt",
    container=False,
)

iface = gr.Interface(
    fn=generate_flare_output,
    inputs=prompt_box,
    outputs="text",
    title="My AI bot",
    description="FLARE implementation with lancedb & bge embedding.",
)


iface.launch(debug=True, share=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://3611dcbabf749dee48.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)






Thanks