In [1]:
pip install unstructured



# About

- Use [Langchain](https://python.langchain.com/en/latest/index.html) to build a chatbot that can answer questions about [Harry Potter books](https://www.kaggle.com/datasets/hinepo/harry-potter-books-in-pdf-1-7)
- Experiment with various LLMs (Large Language Models)
- Use a [FAISS vector store](https://python.langchain.com/docs/integrations/vectorstores/faiss) to store text embeddings created with [Sentence Transformers](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) from [Hugging Face](https://huggingface.co/hkunlp/instructor-large). FAISS runs on GPU and was much faster than Chroma in my runs (~3 min vs ~35 min to create the embeddings)
- Use [Retrieval chain](https://python.langchain.com/docs/modules/data_connection/retrievers/) to retrieve relevant passages from embedded text
- Summarize retrieved passages
- Leverage Kaggle dual GPU (2 * T4) with [Hugging Face Accelerate](https://huggingface.co/docs/accelerate/index)
- Chat UI with [Gradio](https://www.gradio.app/guides/quickstart)

No need to create any API key to use this notebook! Everything is open source.

Upvote the notebook if you learn from it or use it! :)

This will help me keep experimenting with new models as soon as they are released

### Models

- [WizardLM](https://huggingface.co/TheBloke/wizardLM-7B-HF)
- [Falcon](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2)
- [Llama 2-7b](https://huggingface.co/daryl149/llama-2-7b-chat-hf)
- [Llama 2-13b](https://huggingface.co/daryl149/llama-2-13b-chat-hf)
- [Bloom](https://huggingface.co/bigscience/bloom-7b1)

![image.png](attachment:cdc462a7-e241-4332-821a-fa369a853128.png)

img source: HinePo

In [2]:
! nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-c27dd88e-e1f8-8741-43c7-0b6b6516be01)


# Installs

This step takes
- ~16 s on Colab
- ~4 min on Kaggle

In [3]:
%%time

! pip install -qq -U langchain tiktoken pypdf faiss-gpu
! pip install -qq -U transformers InstructorEmbedding sentence_transformers
! pip install -qq -U accelerate bitsandbytes xformers einops

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.1.0+cu121 requires torch==2.1.0, but you have torch 2.1.2 which is incompatible.
torchdata 0.7.0 requires torch==2.1.0, but you have torch 2.1.2 which is incompatible.
torchtext 0.16.0 requires torch==2.1.0, but you have torch 2.1.2 which is incompatible.
torchvision 0.16.0+cu121 requires torch==2.1.0, but you have torch 2.1.2 which is incompatible.
CPU times: user 363 ms, sys: 48.7 ms, total: 412 ms
Wall time: 1min 4s


# Imports

In [4]:
# Useful for suppressing messages that may not be essential or relevant to the user.
import warnings
warnings.filterwarnings("ignore")

import os
# glob provides pattern matching on file and directory paths.
# It is commonly used to list the files in a directory that match a wildcard pattern.
import glob
import textwrap
import time

import langchain

# loaders
from langchain.document_loaders import PyPDFLoader
from langchain.document_loaders import DirectoryLoader

# splits
from langchain.text_splitter import RecursiveCharacterTextSplitter

# prompts
from langchain import PromptTemplate, LLMChain

# vector stores
from langchain.vectorstores import FAISS

# models
from langchain.llms import HuggingFacePipeline
from InstructorEmbedding import INSTRUCTOR
from langchain.embeddings import HuggingFaceInstructEmbeddings

# retrievers
from langchain.chains import RetrievalQA

import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

print('LangChain:', langchain.__version__)

LangChain: 0.1.1


In [5]:
glob.glob('/content/data/*')

['/content/data/df_100_PDF.pdf']

# CFG

- CFG class enables easy and organized experimentation

In [6]:
class CFG:
    # LLMs
    model_name = 'llama2-13b' # wizardlm, bloom, falcon, llama2-7b, llama2-13b
    temperature = 0
    top_p = 0.95
    repetition_penalty = 1.15

    # splitting
    split_chunk_size = 800
    split_overlap = 0

    # embeddings
    embeddings_model_repo = 'sentence-transformers/all-MiniLM-L6-v2'

    # similar passages
    k = 3

    # paths
    PDFs_path = '/content/data/'
    Embeddings_path =  '/content/faiss_index_hp'
    Persist_directory = './celanese-vectordb'
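
One pitfall when editing a config class like `CFG`: a trailing comma after a value turns it into a one-element tuple, which downstream code may accept silently or reject with a confusing error. A minimal sketch of a sanity check, assuming a hypothetical `DemoCFG` class and a hypothetical helper `find_tuple_fields` (neither is part of this notebook):

```python
class DemoCFG:
    temperature = 0,      # trailing comma: this is the tuple (0,), not the number 0
    top_p = 0.95          # plain float
    k = 3


def find_tuple_fields(cfg):
    """Return the names of public class attributes whose value is a tuple."""
    return sorted(
        name
        for name, value in vars(cfg).items()
        if not name.startswith("_") and isinstance(value, tuple)
    )


print(find_tuple_fields(DemoCFG))
```

Running a check like this right after defining the config flags accidental tuples before they reach the generation pipeline.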

# Define model

In [7]:
def get_model(model = CFG.model_name):

    print('\nDownloading model: ', model, '\n\n')

    if model == 'wizardlm':
        model_repo = 'TheBloke/wizardLM-7B-HF'

        tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True
        )

        max_len = 1024

    elif model == 'llama2-7b':
        model_repo = 'daryl149/llama-2-7b-chat-hf'

        tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            trust_remote_code=True
        )

        max_len = 2048

    elif model == 'llama2-13b':
        model_repo = 'daryl149/llama-2-13b-chat-hf' # from Hugging Face

        tokenizer = AutoTokenizer.from_pretrained(model_repo, use_fast=True)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,

#             bnb_4bit_quant_type='nf4',  # Normalized float 4
#             bnb_4bit_use_double_quant=True,  # Second quantization after the first
#             bnb_4bit_compute_dtype=bfloat16,  # Computation type

            device_map='auto',
            torch_dtype=torch.float16,
#             low_cpu_mem_usage=True,
            trust_remote_code=True
        )

        max_len = 4096  # Llama 2 has a 4096-token context window

    elif model == 'bloom':
        model_repo = 'bigscience/bloom-7b1'

        tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
        )

        max_len = 1024

    elif model == 'falcon':
        model_repo = 'h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v2'

        tokenizer = AutoTokenizer.from_pretrained(model_repo)

        model = AutoModelForCausalLM.from_pretrained(
            model_repo,
            load_in_4bit=True,
            device_map='auto',
            torch_dtype=torch.float16,
            low_cpu_mem_usage=True,
            trust_remote_code=True
        )

        max_len = 1024

    else:
        # Fail early: otherwise the return below hits undefined variables
        raise ValueError(f"Model not implemented (tokenizer and backbone): {model}")

    return tokenizer, model, max_len
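
A rough back-of-the-envelope estimate shows why `load_in_4bit=True` matters on a 16 GB T4. This sketch counts weight storage only and assumes nominal parameter counts (7e9 and 13e9). It ignores quantization metadata, activations, and the KV cache, so real usage is higher. `weight_memory_gb` is a hypothetical helper, not a library function.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, n_params in [("llama2-7b", 7e9), ("llama2-13b", 13e9)]:
    fp16 = weight_memory_gb(n_params, 16)
    int4 = weight_memory_gb(n_params, 4)
    print(f"{name}: fp16 ~ {fp16:.1f} GB, 4-bit ~ {int4:.1f} GB")
```

Under these assumptions the 13B weights need about 26 GB in fp16, which does not fit on a single T4, but only about 6.5 GB in 4-bit.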

This step takes
- ~3-9 min on Colab
- ~5-8 min on Kaggle, sometimes much more, up to 35 min

In [8]:
%%time

tokenizer, model, max_len = get_model(model = CFG.model_name)


Downloading model:  llama2-13b 




Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

CPU times: user 10.4 s, sys: 31.2 s, total: 41.6 s
Wall time: 2min 21s


In [9]:
model.eval()

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 5120, padding_idx=0)
    (layers): ModuleList(
      (0-39): 40 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (k_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (v_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (o_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (up_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (down_proj): Linear4bit(in_features=13824, out_features=5120, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )


In [10]:
### check how Accelerate split the model across the available devices (GPUs)
model.hf_device_map

OrderedDict([('', 0)])
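
The output above maps the empty module name `''` (meaning the whole model) to device `0`, so everything sits on a single GPU. When a model is sharded, the map has one entry per module instead. A small sketch for summarizing such a map; the sharded `example_map` is made up for illustration and `modules_per_device` is a hypothetical helper:

```python
from collections import Counter


def modules_per_device(device_map):
    """Count how many entries of a device map live on each device."""
    return dict(Counter(device_map.values()))


single_gpu = {"": 0}  # same shape as the output above
example_map = {       # hypothetical two-GPU split
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "model.norm": 1,
    "lm_head": 1,
}

print(modules_per_device(single_gpu))
print(modules_per_device(example_map))
```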

# 🤗 pipeline

- Hugging Face pipeline

In [11]:
pipe = pipeline(
    task = "text-generation",
    model = model,
    tokenizer = tokenizer,
    pad_token_id = tokenizer.eos_token_id,
    max_length = max_len,
    temperature = CFG.temperature,
    top_p = CFG.top_p,
    repetition_penalty = CFG.repetition_penalty
)

llm = HuggingFacePipeline(pipeline = pipe)
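
To make the `top_p` parameter concrete: nucleus (top-p) sampling keeps the smallest set of most-probable tokens whose cumulative probability reaches `top_p`, then samples among them. A minimal pure-Python sketch of that filtering step with a made-up token distribution; it illustrates the idea and is not the transformers implementation:

```python
def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their cumulative mass reaches top_p,
    then renormalize the kept probabilities so they sum to 1."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


toy_probs = {"dew": 0.5, "point": 0.3, "glycol": 0.15, "owl": 0.05}
print(top_p_filter(toy_probs, top_p=0.9))
```

With these toy numbers the least likely token is dropped and the remaining three are rescaled.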

In [12]:
llm

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7ac9833bc790>)

In [13]:
%%time
### testing model, not using the harry potter books yet
### answer is not necessarily related to harry potter
#query = "Give me 5 examples of Pressure"
#llm(query)

CPU times: user 1 µs, sys: 2 µs, total: 3 µs
Wall time: 5.72 µs


# 🦜🔗 Langchain

- Multiple document retriever with LangChain

In [14]:
CFG.model_name

'llama2-13b'

# Loader

- [Directory loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/file_directory) for multiple files
- This step is not necessary if you are just loading the vector database
- This step is necessary if you are creating embeddings. In this case you need to:
    - load the PDF files
    - split into chunks
    - create embeddings
    - save the embeddings in a vector store
    - After that you can just load the saved embeddings to do similarity search with the user query, and then use the LLM to answer the question
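
The "create once, load afterwards" workflow described above can be sketched without any ML libraries. This is a toy illustration: `build_index` is a stand-in for the expensive embedding step (it just stores each chunk's length), and the JSON file and both function names are hypothetical, not what FAISS or LangChain do internally.

```python
import json
import os
import tempfile


def build_index(chunks):
    """Stand-in for the expensive step: each 'embedding' is just the chunk length."""
    return [{"text": chunk, "vector": [len(chunk)]} for chunk in chunks]


def load_or_build_index(path, chunks):
    """Load the saved index if it exists; otherwise build it once and save it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f), "loaded"
    index = build_index(chunks)
    with open(path, "w") as f:
        json.dump(index, f)
    return index, "built"


with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "toy_index.json")
    _, first = load_or_build_index(index_path, ["chunk one", "chunk two"])
    _, second = load_or_build_index(index_path, ["chunk one", "chunk two"])
    print(first, second)
```

The first call builds and saves the index; the second call finds the file and only loads it.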

In [15]:
pip install unstructured[pdf]

Collecting torch (from layoutparser[layoutmodels,tesseract]->unstructured-inference==0.7.21->unstructured[pdf])
  Using cached torch-2.1.0-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 2.1.2
    Uninstalling torch-2.1.2:
      Successfully uninstalled torch-2.1.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
xformers 0.0.23.post1 requires torch==2.1.2, but you have torch 2.1.0 which is incompatible.
Successfully installed torch-2.1.0


In [16]:
%%time

loader = DirectoryLoader(
    CFG.PDFs_path,
    glob="./*.pdf",
    loader_cls=PyPDFLoader,
    show_progress=True,
    use_multithreading=True
)

documents = loader.load()

100%|██████████| 1/1 [00:01<00:00,  1.27s/it]

CPU times: user 1.18 s, sys: 34.9 ms, total: 1.22 s
Wall time: 1.28 s





In [17]:
print(f'We have {len(documents)} pages in total')

We have 1 pages in total


In [18]:
#documents[8].page_content

IndexError: list index out of range

# Splitter

- Splitting the text into chunks so its passages are easily searchable for similarity
- This step is also only necessary if you are creating the embeddings
- [RecursiveCharacterTextSplitter](https://python.langchain.com/en/latest/reference/modules/document_loaders.html?highlight=RecursiveCharacterTextSplitter#langchain.document_loaders.MWDumpLoader)
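
To see what chunk size and overlap do, here is a deliberately simplified fixed-window splitter. It is only a sketch: `RecursiveCharacterTextSplitter` prefers to cut on separators such as paragraph and line breaks before falling back to smaller pieces, which this toy `split_with_overlap` function (a hypothetical name) does not attempt.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of at most chunk_size characters, each starting
    chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghijklmnopqrstuvwxyz"
print(split_with_overlap(sample, chunk_size=10, chunk_overlap=0))
print(split_with_overlap(sample, chunk_size=10, chunk_overlap=3))
```

With an overlap of 0 the chunks are disjoint; with an overlap of 3 each chunk repeats the last three characters of the previous one, which helps when a relevant passage straddles a chunk boundary.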

In [19]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = CFG.split_chunk_size,
    chunk_overlap = CFG.split_overlap
)

texts = text_splitter.split_documents(documents)

print(f'We have created {len(texts)} chunks from {len(documents)} pages')

We have created 46 chunks from 1 pages


# Create Embeddings


- Embed and store the texts in a vector database (FAISS)
- [LangChain Vector Stores docs](https://python.langchain.com/docs/modules/data_connection/vectorstores/)
- [FAISS - langchain](https://python.langchain.com/docs/integrations/vectorstores/faiss)
- [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks - paper Aug/2019](https://arxiv.org/pdf/1908.10084.pdf)
- [This is a nice 4 minutes video about vector stores](https://www.youtube.com/watch?v=dN0lsF2cvm4)
- [Chroma - Persist and load the vector database](https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/chroma.html)

___

- If you use Chroma vector store it will take ~35 min to create embeddings
- If you use FAISS vector store on GPU it will take just ~3 min

___

We need to create the embeddings only once, and then we can just load the vector store and query the database using similarity search.

Loading the embeddings takes only a few seconds.
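
The similarity search mentioned above boils down to ranking stored vectors by how close they are to the query vector. A minimal sketch with hand-written 3-dimensional vectors, assuming cosine similarity as the metric; FAISS uses optimized index structures and, depending on configuration, a different distance, so treat this as an illustration only. `top_k` and `toy_store` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


toy_store = [
    ("dew point reading", [0.9, 0.1, 0.0]),
    ("glycol moisture reading", [0.1, 0.9, 0.0]),
    ("contactor pressure reading", [0.0, 0.2, 0.9]),
]
print(top_k([1.0, 0.2, 0.0], toy_store, k=2))
```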

I uploaded the embeddings to a Kaggle Dataset so we just load it from [here](https://www.kaggle.com/datasets/hinepo/faiss-hp-sentence-transformers).

## Create vector database


In [20]:
# %%time

### download embeddings model
embeddings = HuggingFaceInstructEmbeddings(
    model_name = CFG.embeddings_model_repo,
    model_kwargs = {"device": "cuda"}
)

### create embeddings and DB
vectordb = FAISS.from_documents(
    documents = texts,
    embedding = embeddings
)

### persist vector database to the same path that is loaded later
vectordb.save_local(CFG.Embeddings_path)

load INSTRUCTOR_Transformer
max_seq_length  512


# Load vector database

- After saving the vector database, we just load it from the Kaggle Dataset I mentioned
- The embedding function used when loading must be the same one that was used to create the embeddings; otherwise query vectors and stored vectors are not comparable

In [21]:
%%time

### download embeddings model
embeddings = HuggingFaceInstructEmbeddings(
    model_name = CFG.embeddings_model_repo,
    model_kwargs = {"device": "cuda"}
)



load INSTRUCTOR_Transformer
max_seq_length  512
CPU times: user 115 ms, sys: 143 ms, total: 258 ms
Wall time: 277 ms


In [22]:
### load vector DB embeddings
vectordb = FAISS.load_local(
    CFG.Embeddings_path,
    embeddings
)

In [23]:
### test if vector DB was loaded correctly
vectordb.similarity_search('magic creatures')

[Document(page_content='Timestamp Time Day period Day of Week Month Day Month Dew Point Process Dew Point Contactor Pressure Process Contactor Pressure Natural Gas Moisture Process Natural Gas Moisture Contactor T emperature Process Contactor T emperature Glycol Moisture Process Glycol Moisture Water Inlet T emperature Process Water Inlet T emperature Glycol Inlet T emperature Process Glycol Inlet T emperature Out Glycol T emperature Process Out Glycol T emperature T emperature Process T emperature Out Water T emperature Process Out Water T emperature Stripping Gas Process Stripping Gas Pressure Process Pressure Dry Glycol Process Dry Glycol Glycol Flow Process Glycol Flow', metadata={'source': '/content/data/df_100_PDF.pdf', 'page': 0}),
 Document(page_content="31-10-2023 00:52:00 Night Tuesday 31 October -26.287443 Criticize 162.879585 Low 2.82 Normal 41.9 Low 0.86 Efficient 35.32 Normal 65.11 Normal 47.63 Good 180.739647 Keep 44.472493 Bad 84.881613 Normal 29.032323 Normal Pressure 

# Prompt Template

- Custom prompt

In [44]:
prompt_template = """
Timestamp Time Day period Day of Week Month Day Month
 Dew Point Process Dew Point Contactor Pressure Process
 Contactor Pressure Natural.

{context}

Question: {question}
Answer:"""


PROMPT = PromptTemplate(
    template = prompt_template,
    input_variables = ["context", "question"]
)

In [45]:
llm_chain = LLMChain(prompt=PROMPT, llm=llm)
llm_chain

LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template='\nTimestamp Time Day period Day of Week Month Day Month\n Dew Point Process Dew Point Contactor Pressure Process \n Contactor Pressure Natural.\n\n{context}\n\nQuestion: {question}\nAnswer:'), llm=HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7ac9833bc790>))

# Retriever chain

- Retriever to retrieve relevant passages
- Chain to answer questions
- [RetrievalQA: Chain for question-answering](https://python.langchain.com/docs/modules/data_connection/retrievers/)
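
Conceptually, the "stuff" chain type puts all retrieved passages into the prompt's `{context}` slot and sends a single prompt to the LLM. A minimal sketch of that assembly step using plain string formatting; the template text, the blank-line separator, and the `build_stuff_prompt` helper are assumptions for illustration, not LangChain internals:

```python
TEMPLATE = """Use the context to answer.

{context}

Question: {question}
Answer:"""


def build_stuff_prompt(template, passages, question):
    """Join all retrieved passages and place them in the template's {context} slot."""
    context = "\n\n".join(passages)
    return template.format(context=context, question=question)


retrieved = ["passage A", "passage B", "passage C"]  # stand-ins for the k retrieved chunks
prompt = build_stuff_prompt(TEMPLATE, retrieved, "What is the dew point?")
print(prompt)
```

Because every passage lands in one prompt, `k` times the chunk size has to fit inside the model's context window together with the question and the answer.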

In [46]:
retriever = vectordb.as_retriever(search_type = "similarity", search_kwargs = {"k": CFG.k})

qa_chain = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = "stuff", # map_reduce, map_rerank, stuff, refine
    retriever = retriever,
    chain_type_kwargs = {"prompt": PROMPT},
    return_source_documents = True,
    verbose = False
)

In [47]:
### testing MMR search
question = "how to use Dew point on 01-09-2023?"
vectordb.max_marginal_relevance_search(question, k = CFG.k)

[Document(page_content="19-05-2022 03:18:00 Night Thursday 19 May -6.492189 Don't Criticize 181.395438 High 2.96 Normal 48.8 High 1.67 Inefficient 30.86 Normal 64.25 Normal 47.65 Good 183.655265 Keep 50.027672 Bad 105.156978 Normal 25.791931 Normal Pressure 0.95 Not Worrying 1.454781 Changed", metadata={'source': '/content/data/df_100_PDF.pdf', 'page': 0}),
 Document(page_content='Timestamp Time Day period Day of Week Month Day Month Dew Point Process Dew Point Contactor Pressure Process Contactor Pressure Natural Gas Moisture Process Natural Gas Moisture Contactor T emperature Process Contactor T emperature Glycol Moisture Process Glycol Moisture Water Inlet T emperature Process Water Inlet T emperature Glycol Inlet T emperature Process Glycol Inlet T emperature Out Glycol T emperature Process Out Glycol T emperature T emperature Process T emperature Out Water T emperature Process Out Water T emperature Stripping Gas Process Stripping Gas Pressure Process Pressure Dry Glycol Process D

In [48]:
### testing similarity search
question = "how to use Dew point on 01-09-2023?"
vectordb.similarity_search(question, k = CFG.k)

[Document(page_content="19-05-2022 03:18:00 Night Thursday 19 May -6.492189 Don't Criticize 181.395438 High 2.96 Normal 48.8 High 1.67 Inefficient 30.86 Normal 64.25 Normal 47.65 Good 183.655265 Keep 50.027672 Bad 105.156978 Normal 25.791931 Normal Pressure 0.95 Not Worrying 1.454781 Changed", metadata={'source': '/content/data/df_100_PDF.pdf', 'page': 0}),
 Document(page_content='Timestamp Time Day period Day of Week Month Day Month Dew Point Process Dew Point Contactor Pressure Process Contactor Pressure Natural Gas Moisture Process Natural Gas Moisture Contactor T emperature Process Contactor T emperature Glycol Moisture Process Glycol Moisture Water Inlet T emperature Process Water Inlet T emperature Glycol Inlet T emperature Process Glycol Inlet T emperature Out Glycol T emperature Process Out Glycol T emperature T emperature Process T emperature Out Water T emperature Process Out Water T emperature Stripping Gas Process Stripping Gas Pressure Process Pressure Dry Glycol Process D
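
The difference between the two searches tested above: similarity search ranks purely by closeness to the query, while MMR (maximal marginal relevance) also penalizes candidates that resemble passages already selected. A minimal greedy sketch on toy 2-dimensional vectors, assuming cosine similarity and a weighting parameter named `lambda_mult` here; this is an illustration, not the library's implementation:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedy MMR: repeatedly pick the document index that maximizes
    lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, selected)."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]  # docs 0 and 1 are near-duplicates
query = [1.0, 0.2]
print(mmr(query, docs, k=2))                   # diversity-aware selection
print(mmr(query, docs, k=2, lambda_mult=1.0))  # pure relevance ranking
```

With `lambda_mult=1.0` the redundancy term vanishes and the result matches a plain similarity ranking; lower values trade relevance for diversity.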

# Post-process outputs

- Format llm response
- Cite sources (PDFs)
- Change `width` parameter to format the output

In [49]:
def wrap_text_preserve_newlines(text, width=700):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')

    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]

    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)

    return wrapped_text


def process_llm_response(llm_response):
    ans = wrap_text_preserve_newlines(llm_response['result'])

    sources_used = ' \n'.join(
        [
            source.metadata['source'].split('/')[-1][:-4] + ' - page: ' + str(source.metadata['page'])
            for source in llm_response['source_documents']
        ]
    )

    ans = ans + '\n\nSources: \n' + sources_used
    return ans

In [50]:
def llm_ans(query):
    start = time.time()
    llm_response = qa_chain(query)
    ans = process_llm_response(llm_response)
    end = time.time()

    time_elapsed = int(round(end - start, 0))
    time_elapsed_str = f'\n\nTime elapsed: {time_elapsed} s'
    return ans + time_elapsed_str
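
When several retrieved chunks come from the same page, the sources list repeats the same label, as in the answers further down. One optional refinement is to remove duplicates while keeping the original order. A small sketch, where `unique_sources` is a hypothetical helper that is not used elsewhere in this notebook:

```python
def unique_sources(sources):
    """Remove repeated source labels while keeping first-seen order."""
    return list(dict.fromkeys(sources))


labels = ["df_100_PDF - page: 0", "df_100_PDF - page: 0", "df_100_PDF - page: 0"]
print(unique_sources(labels))
```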

# Ask questions

- Question Answering from multiple documents
- Run QA Chain
- Talk to your data

In [51]:
CFG.model_name

'llama2-13b'

In [52]:
model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 5120, padding_idx=0)
    (layers): ModuleList(
      (0-39): 40 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (k_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (v_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (o_proj): Linear4bit(in_features=5120, out_features=5120, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (up_proj): Linear4bit(in_features=5120, out_features=13824, bias=False)
          (down_proj): Linear4bit(in_features=13824, out_features=5120, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )


In [53]:
query = "how to use Dew point on 01-09-2023?"
print(llm_ans(query))

 Based on the data provided, the dew point on 01-09-2023 was -9.462748 degrees Celsius. To determine if it is safe to use the system, you would need to consult the manufacturer's guidelines for the specific system and the ambient conditions on that day. However, based on the data provided, the temperature was below freezing and the humidity was low, which may indicate that the system was not operating efficiently or effectively. It is important to consider all relevant factors before making any decisions about using the system.

Sources: 
df_100_PDF - page: 0 
df_100_PDF - page: 0 
df_100_PDF - page: 0

Time elapsed: 28 s


In [54]:
query = "Give me the information about Dew Point in 01-09-2023?"
print(llm_ans(query))

 Sure! Here is the information about Dew Point on 01-09-2023 based on the data provided:

Dew Point (°C): 10.587682

Note that this value is for the specific date and time of 01-09-2023 at 01:24:00, and may not be representative of other times or dates.

Sources: 
df_100_PDF - page: 0 
df_100_PDF - page: 0 
df_100_PDF - page: 0

Time elapsed: 21 s


In [55]:
query = "Give me the information about Glycol Moisture in 01-09-2023?"
print(llm_ans(query))

 Sure! Here is the information about Glycol Moisture on 01-09-2023 based on the data provided:

Glycol Moisture (%): 34.26

Note that this value is normal, and there is no critical or highly critical condition detected.

Sources: 
df_100_PDF - page: 0 
df_100_PDF - page: 0 
df_100_PDF - page: 0

Time elapsed: 17 s


In [56]:
query = "Give me 5 examples of df_100_PDF"
print(llm_ans(query))

 Sure! Here are five examples of df_100_PDF, each with a different timestamp and set of values:

Example 1 (timestamp: 17-04-2023 01:22:00)
df_100_PDF = [183.835195, 3.73, 'High', 44.7, 'Normal', 1.39, 'Inefficient', 34.26, 'Normal', 64.39, 'Good']

Example 2 (timestamp: 15-12-2023 01:24:00)
df_100_PDF = [173.619968, 3.38, 'Critical', 43.3, 'Low', 0.62, 'Efficient', 34.41, 'Normal', 60.9, 'Good']

Example 3 (timestamp: 10-10-2023 02:56:00)
df_100_PDF = [161.333793, 3.51, 'Highly Critical', 47.0, 'High', 1.33, 'Inefficient', 37.42, 'Normal', 70.73, 'Good']

Example 4 (timestamp: 11-11-2022 02:58:00)
df_100_PDF = [180.076508, 3.29, 'Critical', 49.5, 'High', 0.47, 'Efficient', 41.82, 'Problem', 69.66, 'Normal']

Example 5 (timestamp: 01-01-2023 00:00:00)
df_100_PDF = [100.0, 0.0, 'Not Applicable', 0.0, 'Not Applicable', 0.0, 'Not Applicable', 0.0, 'Not Applicable', 0.0, 'Not Applicable']

Sources: 
df_100_PDF - page: 0 
df_100_PDF - page: 0 
df_100_PDF - page: 0

Time elapsed: 94 s


# Gradio Chat UI

- Create a chat UI with [Gradio](https://www.gradio.app/guides/quickstart)
- [ChatInterface docs](https://www.gradio.app/docs/chatinterface)
- The notebook must be running for the chat interface to work
- Screenshots of the chat UI are shown below
- **Gradio has better compatibility with Colab than with Kaggle. If you plan to use the interface, it is preferable to do so in Google Colab**

In [57]:
### necessary for google colab
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [58]:
! pip install --upgrade gradio -qq




In [60]:
import gradio as gr

print(gr.__version__)

4.14.0


In [61]:
def predict(message, history):
    # output = message # debug mode

    output = str(llm_ans(message)).replace("\n", "<br/>")
    return output

demo = gr.ChatInterface(
    predict,
    title = f' Open-Source LLM ({CFG.model_name}) for Celanese Question Answering'
)

demo.queue()
demo.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://a077dfa60d60500dec.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
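
The `predict` function above converts newlines to `<br/>` tags. If the model output can contain characters such as `<` or `&`, escaping them first keeps them from being read as markup. Whether this is needed depends on how the installed Gradio version renders messages, so treat the following as an optional sketch; `to_chat_html` is a hypothetical helper:

```python
import html


def to_chat_html(text):
    """Escape HTML special characters, then turn newlines into <br/> tags."""
    return html.escape(text).replace("\n", "<br/>")


print(to_chat_html("Dew Point < 0\nStatus: <Low>"))
```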




![image.png](attachment:413fe7a3-6534-45b5-b6e3-7fc86e982cf1.png)

![image.png](attachment:976f4bf4-7626-4d4a-b773-3eebd7e9f000.png)

# Conclusions

- Feel free to fork and optimize the code. Lots of things can be improved.

- Things I found had the most impact on model output quality in my experiments:
    - Prompt engineering
    - Bigger models
    - Other model families
    - Splitting: chunk size, overlap
    - Search: Similarity, MMR, k
    - Pipeline parameters (temperature, top_p, penalty)
    - Embeddings function
    - LLM parameters (max len)


- LangChain, Hugging Face and Gradio are awesome libs!

- Upvote if you liked it or want me to keep updating this with new models and functionalities!

🦜🔗🤗

![image.png](attachment:68773819-4358-4ded-be3e-f1d275103171.png)