## Generative AI PDF information extractor
This use case shows how to extract information from a PDF file by querying an LLM. It requires a vector database (here, FAISS), text embeddings, and calls to an OpenAI model.

In [1]:
from dotenv import load_dotenv
import os

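load_dotenv("apis.env")  # should define OPENAI_API_KEY (read implicitly by the OpenAI classes) and HF_API_KEY
# Alternative: search for a .env file automatically
# from dotenv import load_dotenv, find_dotenv
# _ = load_dotenv(find_dotenv()) # read local .env file
hf_api_key = os.environ['HF_API_KEY']  # not used by the OpenAI-based pipeline below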
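
Only `HF_API_KEY` is read explicitly here; the OpenAI classes used later (`OpenAIEmbeddings`, `OpenAI`) look up `OPENAI_API_KEY` in the environment on their own, so a missing key would only surface at the first API call. A minimal fail-fast check, assuming `apis.env` defines both variables; `require_env` is a hypothetical helper, not part of python-dotenv:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing fast if any is missing or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage, right after load_dotenv("apis.env"):
# keys = require_env("OPENAI_API_KEY", "HF_API_KEY")
```
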

In [2]:
# pdf_path = "TM6_digital_manual_MES-es-ES_prefill_20190121.pdf"
pdf_path = "BOE-A-1978-31229-consolidado.pdf"

# Load a PDF containing external information the LLM did not see during training
from langchain.document_loaders import PyPDFLoader
loader = PyPDFLoader(pdf_path)
pages = loader.load_and_split()  # one Document per page, split further into chunks if a page is long
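
`load_and_split()` first creates one `Document` per PDF page and then passes them through LangChain's default text splitter, so a long page can end up as several chunks. The idea behind chunking can be sketched with a plain fixed-size splitter with overlap. This is a simplified illustration (hypothetical `chunk_text`, arbitrary default sizes), not LangChain's recursive splitter, which prefers to cut at paragraph, line and word boundaries:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# Usage (hypothetical): chunks = chunk_text(pages[0].page_content, chunk_size=500, overlap=50)
```

The overlap keeps text that straddles a cut visible in both neighbouring chunks, which helps retrieval when the answer sits near a boundary.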

In [3]:
pages[0]

Document(page_content='Constitución Española.\nCortes Generales\n«BOE» núm. 311, de 29 de diciembre de 1978\nReferencia: BOE-A-1978-31229\nÍNDICE\n   \nPreámbulo ................................................................ 3\nTÍTULO PRELIMINAR ........................................................... 3\nTÍTULO I. De los derechos y deberes fundamentales ........................................ 4\nCAPÍTULO PRIMERO. De los españoles y los extranjeros ................................... 5\nCAPÍTULO SEGUNDO. Derechos y libertades .......................................... 5\nSección 1.ª De los derechos fundamentales y de las libertades públicas ........................ 5\nSección 2.ª De los derechos y deberes de los ciudadanos ................................. 8\nCAPÍTULO TERCERO. De los principios rectores de la política social y económica ................... 10\nCAPÍTULO CUARTO. De las garantías de las libertades y derechos fundamentales .................. 12\nCAPÍTULO QUINTO. De la

In [4]:
# Embedding model that maps each text chunk to a vector
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

# Alternative: a local Hugging Face embedding model
# from langchain.embeddings import HuggingFaceEmbeddings
# import transformers
# embeddings = HuggingFaceEmbeddings()
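
LangChain embedding classes expose two methods: `embed_documents` for a batch of texts (used when building the index) and `embed_query` for a single question. The toy class below mimics that interface with a hashed bag-of-words vector so the data flow can be inspected offline. `HashingEmbeddings` is a hypothetical stand-in: it captures only word overlap, not meaning, and it does not subclass LangChain's `Embeddings` base class:

```python
import hashlib
import math
import re


class HashingEmbeddings:
    """Toy stand-in for OpenAIEmbeddings: hashed bag-of-words vectors scaled to unit length."""

    def __init__(self, dim: int = 256):
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in re.findall(r"\w+", text.lower()):  # \w also matches accented letters
            # Stable hash; Python's built-in hash() is salted per process.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:8], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)
```

Whatever model is chosen, the same one must embed both the pages and the questions: vectors from different models live in different spaces and cannot be compared.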


In [5]:
# Embed every chunk and store the vectors in a FAISS index
from langchain.vectorstores import FAISS
db = FAISS.from_documents(pages, embeddings)
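
`FAISS.from_documents` embeds every chunk and, by default in LangChain's wrapper, stores the vectors in a flat index that compares a query against all of them by Euclidean (L2) distance; `db.as_retriever()` then returns the closest chunks (4 by default). A brute-force sketch of that exact search in plain Python; the function names are illustrative, not FAISS or LangChain APIs:

```python
import heapq
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest(query: list[float], vectors: list[list[float]], k: int = 4) -> list[tuple[int, float]]:
    """Exact k-NN: score every stored vector, return the k closest as (index, distance) pairs."""
    scored = ((i, l2_distance(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])
```

According to OpenAI's documentation its embeddings are normalised to length 1, and for unit vectors ||a - b||² = 2 - 2·cos(a, b), so ranking by L2 distance gives the same order as ranking by cosine similarity.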

In [49]:
# Answer questions with an LLM, using the chunks retrieved from the vector store as context
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

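model_name = "gpt-3.5-turbo-instruct"
temperature = 0    # always prefer the most likely tokens
max_tokens = 16    # maximum length of the generated answer; longer answers are cut off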

llm = OpenAI(
    model_name=model_name,
    temperature=temperature,
    max_tokens=max_tokens
    )
chain = RetrievalQA.from_llm(llm=llm, retriever=db.as_retriever())
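
`RetrievalQA.from_llm` builds a "stuff" style chain: the retrieved chunks are concatenated into a single prompt together with the question, and the LLM is called once. A minimal sketch of that prompt assembly; `build_stuff_prompt` is hypothetical and its template wording is illustrative, not LangChain's exact default prompt:

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate the retrieved chunks into one context block, followed by the question."""
    context = "\n\n".join(chunk.strip() for chunk in chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Note that `max_tokens` limits only the generated answer; the whole assembled prompt, including every retrieved chunk, must still fit in the model's context window.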

In [50]:
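# Spanish for "Is there a quick way to chop food?". This question targets the TM6 appliance
# manual (the commented-out pdf_path above), not the Constitution PDF.
query = "Existe algún modo rápido para triturar alimentos?"
chain(query, return_only_outputs=True)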

{'result': ' Sí, se puede utilizar el modo "Turbo" para triturar alimentos'}

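## Add a UI with Gradio
Next, I add a web UI built with the Gradio library, so the retrieval chain can be queried from a browser with adjustable temperature and maximum answer length.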

In [6]:
import gradio as gr
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

def model_hyperparameters(temperature, max_tokens):
    # Build a RetrievalQA chain over the FAISS index with the chosen LLM settings
    llm = OpenAI(
        model_name="gpt-3.5-turbo-instruct",
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return RetrievalQA.from_llm(llm=llm, retriever=db.as_retriever())

def generate(query, temperature, max_tokens):
    # Slider values arrive as floats; the API expects an integer token count
    chain = model_hyperparameters(temperature, int(max_tokens))
    output = chain(query, return_only_outputs=True)
    return output["result"]  # show only the answer text, not the whole dict

demo = gr.Interface(fn=generate,
                    inputs=[gr.Textbox(label="Prompt"),
                            gr.Slider(label="Temperature",
                                      value=0.1,
                                      maximum=1,
                                      minimum=0.1),
                            gr.Slider(label="Max tokens",
                                      value=16,
                                      maximum=64,
                                      minimum=8,
                                      step=1)],
                    outputs=[gr.Textbox(label="Completion")])

demo.launch(share=True)  # alternatively: demo.launch(server_port=int(os.environ['PORT1']))

Running on local URL:  http://127.0.0.1:7860
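
`generate` rebuilds the `OpenAI` wrapper and the `RetrievalQA` chain on every request, even when the sliders have not moved (the FAISS index itself is built only once). One option is to memoise the chain per `(temperature, max_tokens)` pair with `functools.lru_cache`. In this sketch `build_chain` is a hypothetical stand-in for `model_hyperparameters` that only records its calls, so the effect of caching is visible:

```python
from functools import lru_cache

build_calls = []  # records every real build, to show when the cache is bypassed


def build_chain(temperature: float, max_tokens: int) -> dict:
    """Hypothetical stand-in for model_hyperparameters(); returns a dummy 'chain'."""
    build_calls.append((temperature, max_tokens))
    return {"temperature": temperature, "max_tokens": max_tokens}


@lru_cache(maxsize=32)
def get_chain(temperature: float, max_tokens: int) -> dict:
    """Build a chain once per (temperature, max_tokens) pair and reuse it afterwards."""
    return build_chain(temperature, max_tokens)
```

Because the temperature slider is continuous, almost every position is a new cache key; the `maxsize` bound keeps the cache from growing without limit.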


In [55]:
gr.close_all()

Closing server running on port: 7864
Closing server running on port: 7863
Closing server running on port: 7860
