<a href="https://colab.research.google.com/github/heinohen/Textual-Data-Analysis/blob/main/TDA_exercise11_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# Textual data analysis exercise 11 - basic chatbot



## Setup

### installs

In [1]:
%%bash

pip3 install -q transformers torch

In [2]:
!nvidia-smi

Thu Feb 20 13:21:57 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   69C    P0             24W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

### imports

In [10]:
import transformers
# transformers.logging.set_verbosity_error()  # intentionally left disabled

from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import login
from google.colab import userdata


In [13]:
hf_token = userdata.get('huggingface')
login(token=hf_token, add_to_git_credential=True)

## model

In [14]:
MODEL_NAME = 'HuggingFaceTB/SmolLM2-1.7B-Instruct'

In [15]:
import torch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# For multiple GPUs, install accelerate and load with
# `AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")` instead.
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)

# TASK

## description

* At least 5 factual questions about the world (e.g. the capital of a country)
* At least 5 arithmetic questions, ranging from trivial ("what is 1+1?") to more complex
* Tell the system a secret word (e.g. "zebra"), then after a few other questions ask it what the secret word is. Make sure you understand where the memory of that secret word is stored.
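
The secret-word item comes down to where conversation state lives. A minimal sketch, assuming a stand-in `fake_model` function (hypothetical, not part of transformers) that sees only the messages passed to it on each call: the secret word is recoverable only while the earlier turn is still in the list that gets re-sent.

```python
PREFIX = "The secret word is "


# Hypothetical stand-in for a stateless language model: it sees only the
# messages passed in on this call and keeps nothing between calls.
def fake_model(messages):
    for m in messages:
        if m["role"] == "user" and m["content"].startswith(PREFIX):
            word = m["content"][len(PREFIX):].rstrip(".")
            return f"The secret word is {word}."
    return "I don't know the secret word."


history = [
    {"role": "user", "content": "The secret word is zebra."},
    {"role": "assistant", "content": "Got it."},
    {"role": "user", "content": "What is 1+1?"},
    {"role": "assistant", "content": "2"},
    {"role": "user", "content": "What is the secret word?"},
]

with_memory = fake_model(history)          # full history re-sent
without_memory = fake_model(history[-1:])  # only the latest message sent

print(with_memory)
print(without_memory)
```

In this sketch the "memory" is nothing more than the Python list; dropping the earlier turns from the list is enough to make the word unavailable.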

# PROGRAM

## data storage

In [21]:
message_history = []

## actual functionality

In [22]:
while True:
    user_input = input('Say something ("exit" to quit): ')
    if user_input == 'exit':
        break

    message_history.append({"role": "user", "content": user_input})
    # Render the whole conversation so far; add_generation_prompt=True appends the
    # assistant header so the model starts its reply instead of writing the header itself.
    input_text = tokenizer.apply_chat_template(message_history, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs,
                             max_new_tokens=50,  # replies longer than 50 tokens are cut off
                             temperature=0.2,
                             top_p=0.9,
                             do_sample=True)
    # Decode only the newly generated tokens, i.e. everything after the prompt.
    answer = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True).strip()
    message_history.append({"role": "assistant", "content": answer})
    print(answer)

Say something ("exit" to quit): Hello!
Hello! How can I assist you today?
Say something ("exit" to quit): What is the capital of finland
The capital of Finland is Helsinki.
Say something ("exit" to quit): What is the capital of sweden
The capital of Sweden is Stockholm.
Say something ("exit" to quit): Who is the president of finland
The President of Finland is Sauli Niinisto.
Say something ("exit" to quit): Didn't Finland have presidential elections in 2024
Yes, Finland held presidential elections in 2024. The incumbent President Sauli Niinisto was re-elected for a second term.
Say something ("exit" to quit): OK, what is the capital of endor
Endor is not a recognized country. It is a fictional location from the video game series Final Fantasy. The capital of Endor is not explicitly mentioned in the games.
Say something ("exit" to quit): OK then, lets have some math!
I'm an AI trained to provide information on a wide range of topics, including history, geography, and culture. I'm not eq
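
The loop relies on the tokenizer's chat template to turn `message_history` into a single prompt string. A minimal sketch of what such a template might do, assuming a ChatML-style layout (an assumption; the real template ships with the tokenizer and may differ, for example by inserting a default system message). `render_chat` is a hypothetical helper written for illustration only.

```python
# Assumed ChatML-style layout, written out by hand for illustration.
def render_chat(messages, add_generation_prompt=False):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model only has to write the reply text.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Hello!"}]
prompt_without = render_chat(messages)
prompt_with = render_chat(messages, add_generation_prompt=True)
print(prompt_with)
```

With the real tokenizer, the corresponding option is `add_generation_prompt=True` on `apply_chat_template`. When the assistant header is missing from the prompt, the model tends to generate the role name itself before the reply.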

# MESSAGE LOG

In [23]:
length = 10  # column width for the role label

for m in message_history:
    print(f"{m['role']:<{length}}", "=>\t", m['content'])

user       =>	 Hello!
assistant  =>	 Hello! How can I assist you today?
user       =>	 What is the capital of finland
assistant  =>	 The capital of Finland is Helsinki.
user       =>	 What is the capital of sweden
assistant  =>	 The capital of Sweden is Stockholm.
user       =>	 Who is the president of finland
assistant  =>	 The President of Finland is Sauli Niinisto.
user       =>	 Didn't Finland have presidential elections in 2024
assistant  =>	 Yes, Finland held presidential elections in 2024. The incumbent President Sauli Niinisto was re-elected for a second term.
user       =>	 OK, what is the capital of endor
assistant  =>	 Endor is not a recognized country. It is a fictional location from the video game series Final Fantasy. The capital of Endor is not explicitly mentioned in the games.
user       =>	 OK then, lets have some math!
assistant  =>	 I'm an AI trained to provide information on a wide range of topics, including history, geography, and culture. I'm not equipped to perf
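
For checking the answers against the task list, it can help to pair each question with its reply. A minimal sketch, assuming the history strictly alternates user and assistant messages as in the log above; `pair_turns` is a hypothetical helper, not part of the notebook.

```python
def pair_turns(history):
    """Group an alternating user/assistant history into (question, answer) pairs."""
    pairs = []
    for user_msg, assistant_msg in zip(history[0::2], history[1::2]):
        if user_msg["role"] != "user" or assistant_msg["role"] != "assistant":
            raise ValueError("history does not alternate user/assistant")
        pairs.append((user_msg["content"], assistant_msg["content"]))
    return pairs


# Sample data copied from the first turns of the log.
sample_history = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "What is the capital of finland"},
    {"role": "assistant", "content": "The capital of Finland is Helsinki."},
]

pairs = pair_turns(sample_history)
for question, answer in pairs:
    print(f"Q: {question}\nA: {answer}")
```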