# ChatBot

> Konstantinos Mpouros <br>
> GitHub: https://github.com/konstantinosmpouros <br>
> Year: 2025 <br>

This project focuses on building an intelligent chatbot capable of providing accurate and context-aware responses. Designed to simulate human-like conversations, the chatbot is versatile enough for applications such as customer support, personal assistants, or educational tools.  

The project leverages state-of-the-art large language models (LLMs), offering the flexibility to choose between an open-source model from Hugging Face and GPT-4o. This dual approach ensures adaptability to different requirements, balancing performance, cost, and customization potential.  

## 1. Libraries

In [1]:
import os
from dotenv import load_dotenv
from huggingface_hub import login
import gc

import gradio as gr

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig

## 2. Implementation

* Load environment variables

In [2]:
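load_dotenv()  # read key=value pairs from a local .env file into os.environ
if not os.getenv('HUGGINGFACE_TOKEN'):
    raise RuntimeError('HUGGINGFACE_TOKEN is not set; add it to your .env file')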
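
A missing token tends to surface later as a confusing login error. A minimal sketch of failing early, using only the standard library; `require_env` is a hypothetical helper, not part of `python-dotenv` or `huggingface_hub`, and the token value is a placeholder:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; call it after load_dotenv().
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Example with a placeholder value (not a real token)
os.environ["DEMO_TOKEN"] = "hf_placeholder"
token = require_env("DEMO_TOKEN")
```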

* Log in to the Hugging Face Hub

In [3]:
login(token=os.getenv('HUGGINGFACE_TOKEN'))

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /home/kostasbouros/.cache/huggingface/token
Login successful


* Define initial prompt

In [4]:
def init_prompts():
    system_message = """
        You are a helpful AI assistant for an airline company named Aegean. Answer only in English, and help with anything the user asks about the company and nothing else.
        You must always remain kind, and first try to understand what the user wants before responding.
    """

    messages = [
        {"role": "system", "content": system_message},
    ]
    return messages

* Define a helper that appends the user prompt and the model's response to the history

In [5]:
def add_to_history(history, role, prompt):
    history.append({'role': role, 'content': prompt})
    return history
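
Because every turn is appended, the history grows without bound and will eventually exceed the model's context window. One possible mitigation is sketched below, under the assumption that dropping the oldest turns is acceptable; `trim_history` and `max_messages` are hypothetical names that do not appear elsewhere in this notebook:

```python
def trim_history(history, max_messages=6):
    """Keep the leading system message plus the most recent messages.

    Hypothetical helper: returns a new list and leaves `history` untouched.
    """
    if not history:
        return []
    head = history[:1] if history[0]["role"] == "system" else []
    tail = history[len(head):]
    recent = tail[-max_messages:] if max_messages > 0 else []
    return head + recent


demo = [{"role": "system", "content": "You are an Aegean assistant."}]
for i in range(5):
    demo.append({"role": "user", "content": f"question {i}"})
    demo.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(demo, max_messages=4)
```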

* Define a function that reminds the model not to respond to anything except the topics in the system message

In [6]:
def add_remider():
    prompt = """
    Remember, you are an AI assistant of the Aegean company and answer only questions that have to do with the company's info.
    If the user wants to know something different, answer kindly that you can't help with topics not relevant to Aegean.
    Answer only in English, briefly and clearly.
    """
    return prompt
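
If the reminder is appended to the stored history on every turn, it accumulates: after ten user messages the history holds ten copies. A possible alternative, shown as a sketch with a hypothetical helper (`with_reminder`), attaches the reminder only to the copy that is sent to the model:

```python
def with_reminder(history, user_prompt, reminder):
    """Return the messages to send to the model for this turn.

    The user prompt is stored in `history`; the reminder is added only to the
    returned copy, so it never accumulates. Hypothetical helper for illustration.
    """
    history.append({"role": "user", "content": user_prompt})
    return history + [{"role": "system", "content": reminder}]


stored = [{"role": "system", "content": "You are an Aegean assistant."}]
first = with_reminder(stored, "Hi!", "Answer only about Aegean.")
stored.append({"role": "assistant", "content": "Hello!"})
second = with_reminder(stored, "Tell me about your fleet.", "Answer only about Aegean.")
```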

* Initialize the model

In [7]:
def load_model(model_name):
    try:
        # Load tokenizer
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        tokenizer.pad_token = tokenizer.eos_token
        
        # Set quantization method
        quant_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.bfloat16,
            bnb_4bit_quant_type="nf4"
        )
        
        # Load model
        model = AutoModelForCausalLM.from_pretrained(model_name,
                                                     device_map="cuda",
                                                     quantization_config=quant_config)

        return tokenizer, model

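    except Exception as ex:
        # Report the failure and release any GPU memory that was already allocated
        print(f"Failed to load {model_name}: {ex}")
        gc.collect()
        torch.cuda.empty_cache()
        return None, None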
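
To see why 4-bit loading matters on a single GPU, a back-of-the-envelope estimate of the weight memory helps. The sketch below assumes roughly 8 billion parameters and counts weights only; it ignores quantization constants, any layers that stay unquantized, activations, and the KV cache, so real usage will be higher:

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to store the weights, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NUM_PARAMS = 8e9  # assumption: rough parameter count of an 8B model

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # unquantized bfloat16
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # 4-bit NF4

print(f"bf16: {bf16_gib:.1f} GiB, 4-bit: {nf4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights need about a quarter of the bfloat16 footprint.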

* Define a method to chat with the model

In [8]:
def llm_response(tokenizer, model, history, prompt):
    # Add the user turn and the reminder, then render the history with the chat template
    add_to_history(history, role='user', prompt=prompt)
    add_to_history(history, role='system', prompt=add_remider())
    inputs = tokenizer.apply_chat_template(
        history,
        add_generation_prompt=True,  # end the prompt with an open assistant turn
        return_tensors="pt",
        return_dict=True,            # also return the attention mask
    ).to('cuda')

    # Generate response
    response = model.generate(
        **inputs,
        do_sample=True,              # temperature only applies when sampling
        temperature=0.85,
        max_new_tokens=10000,
        pad_token_id=tokenizer.eos_token_id,
    )
    generated_tokens = response[0][inputs["input_ids"].shape[1]:]

    # Decode only the newly generated tokens
    output = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
    history = add_to_history(history, role='assistant', prompt=output)

    return output, history
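
`apply_chat_template` turns the list of messages into a single prompt. The sketch below is a simplified, hand-written imitation of a Llama-3-style layout, not the template shipped with the model (the real one may add further content). It illustrates the effect of ending the prompt with an open assistant header: with it, the model can start its reply directly; without it, the model tends to generate the header itself first.

```python
def render_llama3_style(messages, add_generation_prompt=False):
    """Simplified illustration of a Llama-3-style chat layout (assumption)."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


messages = [
    {"role": "system", "content": "You are an Aegean assistant."},
    {"role": "user", "content": "Hi!"},
]
closed = render_llama3_style(messages)
opened = render_llama3_style(messages, add_generation_prompt=True)
print(opened)
```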

In [9]:
def chat(tokenizer, model, history, prompt):
    llm_response(tokenizer, model, history, prompt)
    print(history[-1]['content'])
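
The `temperature=0.85` passed to `generate` rescales the model's logits before sampling (in `transformers` it only has an effect when sampling is enabled). A minimal pure-Python sketch of that rescaling, with made-up logits:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up values for three candidate tokens
neutral = softmax_with_temperature(logits, temperature=1.0)
sharper = softmax_with_temperature(logits, temperature=0.85)
```

Values below 1 concentrate probability on the most likely token; values above 1 flatten the distribution.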

## 3. Chatting

* Let's chat with the model

In [10]:
history = init_prompts()

In [11]:
tokenizer, model = load_model('meta-llama/Meta-Llama-3.1-8B-Instruct')

In [12]:
chat(tokenizer, model, history, 'Hi!!, l am konstantinos, you?')

Hello Konstantinos! I'm an assistant for Aegean Airlines. How can I help you today?


In [13]:
chat(tokenizer, model, history, 'l would like to know where the name aegean come from!!')

The name "Aegean" originates from the Aegean Sea, which is a significant body of water in Greece, surrounding many islands where the airline operates. This geographical connection reflects the airline's Greek roots and its primary service area.


In [14]:
chat(tokenizer, model, history, 'l would like to know if it a good company')

As a reliable and reputable airline, Aegean has received numerous awards and accolades, including "Best Regional Airline in Europe" from the Air Transport Awards. We strive to provide excellent service and safe travels to our passengers. Would you like to know more about our services or routes?


In [15]:
chat(tokenizer, model, history, 'Ok thats some nice info!')

I'm glad you found that information helpful, Konstantinos. If you're ready to move forward, what else would you like to know about Aegean Airlines? Our fleet, destinations, or something else?


In [16]:
chat(tokenizer, model, history, 'l would like to ask you about football actually, can you assist?')

I'd be happy to help with Aegean-related topics, Konstantinos, but unfortunately, I'm not able to assist with football-related questions as it's not related to Aegean Airlines. If you'd like to know something about our flight routes, services, or more, I'd be happy to help.


In [17]:
chat(tokenizer, model, history, 'tell me about panathinaikos')

Panathinaikos is a well-known Greek sports club, but unfortunately, it's not directly related to Aegean Airlines. I'd be happy to help with a different question about Aegean Airlines, though. Perhaps you'd like to know about our partnership with Panathinaikos or any other Aegean-related topic?


In [18]:
chat(tokenizer, model, history, 'l dont want that, l want to know about the football only pls')

I'd be happy to clarify, but unfortunately, I still can't assist with information about Panathinaikos' football team. My expertise lies within Aegean Airlines, and I'd be happy to provide information on our routes, services, or more. If you'd like to know something about Aegean's partnership with a sports club, I could try to provide that information.


In [19]:
chat(tokenizer, model, history, 'l want to know more about panathinaikos not aegean pls')

I'm afraid I'm not able to provide information about Panathinaikos, as it's not related to Aegean Airlines. My purpose is to assist with questions about Aegean Airlines, and I'd be happy to help with that. If you'd like to know something about our services, routes, or more, I'd be happy to help.


In [20]:
chat(tokenizer, model, history, 'ok but do you remember my name?')

Yes, Konstantinos. I remember your name. How can I help you with an Aegean Airlines-related question now?


## 4. UI Chatbot

In [57]:
# Define the chat callback
def gradio_chat(user_input, history):
    if not tokenizer or not model:
        # gr.Chatbot expects a list of (user, assistant) pairs, not a bare string
        return [(None, "Model not loaded. Please check the configuration.")], history

    response, history = llm_response(tokenizer, model, history, user_input)
    chat_history = [
        (entry["content"], None) if entry["role"] == "user" else (None, entry["content"])
        for entry in history if entry["role"] != "system"
    ]
    return chat_history, history

with gr.Blocks() as chat_interface:
    gr.Markdown("### Aegean AI Chat Assistant", elem_id="title")

    chatbot = gr.Chatbot()
    user_input = gr.Textbox(label="Your Message", placeholder="Type your message here...", lines=1)
    send_button = gr.Button("Send")

    # One shared State component, so the history persists between messages
    history_state = gr.State(init_prompts())

    # Logic for handling user input
    def handle_chat(input_text, history):
        chat_output, updated_history = gradio_chat(input_text, history)
        return chat_output, updated_history, ""

    # Bind send button and text box to chat logic
    send_button.click(handle_chat, inputs=[user_input, history_state], outputs=[chatbot, history_state, user_input])
    user_input.submit(handle_chat, inputs=[user_input, history_state], outputs=[chatbot, history_state, user_input])

# Load the model
tokenizer, model = load_model('meta-llama/Meta-Llama-3.1-8B-Instruct')

# Launch the Gradio interface
chat_interface.launch()



* Running on local URL:  http://127.0.0.1:7875

To create a public link, set `share=True` in `launch()`.
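
The list comprehension in `gradio_chat` puts every message in its own row, with the other side of the pair set to `None`. If paired rows are preferred, one possible conversion is sketched below; `to_chatbot_pairs` is a hypothetical helper and assumes the tuple-style `(user, assistant)` rows that the code above already uses:

```python
def to_chatbot_pairs(history):
    """Group user/assistant messages into (user, assistant) rows, skipping system messages."""
    pairs = []
    for entry in history:
        if entry["role"] == "user":
            pairs.append((entry["content"], None))
        elif entry["role"] == "assistant":
            if pairs and pairs[-1][1] is None:
                # Attach the reply to the pending user message
                pairs[-1] = (pairs[-1][0], entry["content"])
            else:
                pairs.append((None, entry["content"]))
    return pairs


demo_history = [
    {"role": "system", "content": "You are an Aegean assistant."},
    {"role": "user", "content": "Hi!"},
    {"role": "system", "content": "Reminder: answer only about Aegean."},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "Where do you fly?"},
]
rows = to_chatbot_pairs(demo_history)
```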




## 5. Streaming Chat

In [45]:
def llm_response_stream(tokenizer, model, history, prompt):
    # Add the user turn and the reminder, then render the history with the chat template
    add_to_history(history, role='user', prompt=prompt)
    add_to_history(history, role='system', prompt=add_remider())
    inputs = tokenizer.apply_chat_template(
        history,
        add_generation_prompt=True,  # end the prompt with an open assistant turn
        return_tensors="pt",
        return_dict=True,            # also return the attention mask
    ).to('cuda')

    # Generate the full response (tokens are replayed afterwards, not streamed live)
    response = model.generate(
        **inputs,
        do_sample=True,              # temperature only applies when sampling
        temperature=0.85,
        max_new_tokens=10000,
        pad_token_id=tokenizer.eos_token_id,
    )
    generated_tokens = response[0][inputs["input_ids"].shape[1]:]

    # Decode the response token by token and yield each piece
    output = ""
    for token_id in generated_tokens:
        token_text = tokenizer.decode(token_id, skip_special_tokens=True)
        output += token_text
        yield token_text  # Stream each token back

    # Once the replay is complete, add the full output to history
    history = add_to_history(history, role='assistant', prompt=output.strip())

    # A generator's return value is only delivered via StopIteration; callers that
    # use a for loop should read the updated history list directly
    return history
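
One subtlety of the generator above: a value passed to `return` inside a generator is not seen by a plain `for` loop; Python attaches it to the `StopIteration` exception. The sketch below uses a stand-in generator (`fake_stream`, hypothetical, no model involved) to show how the yielded pieces and the return value can both be captured:

```python
def fake_stream(pieces):
    """Stand-in for a token stream: yields pieces, then returns the joined text."""
    output = ""
    for piece in pieces:
        output += piece
        yield piece
    return output


def consume(generator):
    """Collect yielded items and the generator's return value."""
    items = []
    while True:
        try:
            items.append(next(generator))
        except StopIteration as stop:
            return items, stop.value


items, final_text = consume(fake_stream(["Hello", "!", " Welcome"]))
```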

In [51]:
tokenizer, model = load_model('meta-llama/Meta-Llama-3.1-8B-Instruct')

In [52]:
history = init_prompts()

In [53]:
prompt = 'hello!!'

response = []
for token in llm_response_stream(tokenizer, model, history, prompt):
    response.append(token)
    print(token)

Hello
!
 Welcome
 to
 Ae
ge
an
 Airlines
.
 How
 can
 I
 assist
 you
 today
?
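
The pieces printed above are sub-word tokens, which is why a word such as "Aegean" is split over several lines. Concatenating them without a separator reconstructs the reply:

```python
# Token pieces copied from the output above
pieces = ["Hello", "!", " Welcome", " to", " Ae", "ge", "an", " Airlines", ".",
          " How", " can", " I", " assist", " you", " today", "?"]

reply = "".join(pieces)
print(reply)  # Hello! Welcome to Aegean Airlines. How can I assist you today?
```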



In [54]:
prompt = 'tell me now about aegean!!'

response = []
for token in llm_response_stream(tokenizer, model, history, prompt):
    response.append(token)
    print(token)

['I',
 "'d",
 ' be',
 ' happy',
 ' to',
 ' tell',
 ' you',
 ' about',
 ' Ae',
 'ge',
 'an',
 ' Airlines',
 '!\n\n',
 'A',
 'e',
 'ge',
 'an',
 ' Airlines',
 ' is',
 ' a',
 ' Greek',
 ' airline',
 ' that',
 ' operates',
 ' a',
 ' fleet',
 ' of',
 ' modern',
 ' aircraft',
 ',',
 ' offering',
 ' scheduled',
 ' and',
 ' charter',
 ' flights',
 ' to',
 ' over',
 ' ',
 '150',
 ' destinations',
 ' in',
 ' Europe',
 ',',
 ' Asia',
 ',',
 ' and',
 ' the',
 ' Middle',
 ' East',
 '.',
 ' The',
 ' airline',
 ' is',
 ' headquartered',
 ' at',
 ' Athens',
 ' Ele',
 'f',
 'ther',
 'ios',
 ' Ven',
 'iz',
 'el',
 'os',
 ' International',
 ' Airport',
 ' and',
 ' has',
 ' a',
 ' strong',
 ' focus',
 ' on',
 ' customer',
 ' service',
 ' and',
 ' safety',
 '.\n\n',
 'Would',
 ' you',
 ' like',
 ' to',
 ' know',
 ' something',
 ' specific',
 ' about',
 ' Ae',
 'ge',
 'an',
 ' Airlines',
 ',',
 ' such',
 ' as',
 ' our',
 ' history',
 ',',
 ' fleet',
 ',',
 ' or',
 ' destinations',
 '?',
 '']

## 6. Streaming UI Chatbot

In [56]:
models_available = {
    'Llama 3.1': 'meta-llama/Meta-Llama-3.1-8B-Instruct',
    'Gemma 2': 'google/gemma-2-9b-it'
}

# Load the model and initialize the chat history
history = init_prompts()
tokenizer, model = load_model(models_available["Llama 3.1"])

# Initialize a method for chatting with the LLM using streaming
def gradio_chat_stream(user_input, history):
    if not tokenizer or not model:
        # gr.Chatbot expects a list of (user, assistant) pairs, not a bare string
        yield [(None, "Model not loaded. Please check the configuration.")], history, ""
        return

    # Start streaming tokens from the model
    stream = llm_response_stream(tokenizer, model, history, user_input)
        
    # Initialize the chat history for display
    chat_history = [
        (entry["content"], None) if entry["role"] == "user" else (None, entry["content"]) 
        for entry in history if entry["role"] != "system"
    ]

    # Add the user's input to the chat display
    chat_history.append((user_input, None))
    yield chat_history, history, ""  # Show the user's input immediately

    # Stream the assistant's response token by token
    assistant_response = ""
    for token in stream:
        assistant_response += token
        chat_history[-1] = (user_input, assistant_response)  # Update the assistant's response in the last chat entry
        yield chat_history, history, ""  # Update the chatbot dynamically

    # Finalize the assistant's response in the chat history
    chat_history[-1] = (user_input, assistant_response)
    yield chat_history, history, ""  # Ensure the final state is displayed

# Set the Gradio UI
with gr.Blocks() as chat_interface:
    # Title
    gr.Markdown("### Aegean AI Chat Assistant", elem_id="title")

    # Add a dropdown for selecting models (displayed only; not yet wired to load_model)
    model_dropdown = gr.Dropdown(
        label="Select Model",
        choices=list(models_available.keys()),
        value=list(models_available.keys())[0],
        interactive=True
    )

    # Chatbox, Input, Send Button
    chatbot = gr.Chatbot()
    user_input = gr.Textbox(label="Your Message", placeholder="Type your message here...", lines=1)
    send_button = gr.Button("Send")

    # One shared State component, so the history persists between messages
    history_state = gr.State(history)

    # Bind send button and text box to chat logic
    send_button.click(
        gradio_chat_stream,
        inputs=[user_input, history_state],
        outputs=[chatbot, history_state, user_input]
    )
    user_input.submit(
        gradio_chat_stream,
        inputs=[user_input, history_state],
        outputs=[chatbot, history_state, user_input]
    )

# Launch the Gradio interface
chat_interface.launch()



* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




