


# Textual data analysis exercise 11 - basic chatbot



## Setup

### installs

In [1]:
%%bash

pip3 install -q transformers torch

In [2]:
!nvidia-smi

Thu Feb 20 13:21:57 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   69C    P0             24W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

### imports

In [25]:
import transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import login
from google.colab import userdata
import datetime

In [13]:
hf_token = userdata.get('huggingface')
login(token=hf_token, add_to_git_credential=True)

## model

In [14]:
MODEL_NAME = 'HuggingFaceTB/SmolLM2-1.7B-Instruct'

In [15]:
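import torch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)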

# TASK

## description

* At least 5 questions about basic facts about the world (e.g. the capital of a country)
* At least 5 arithmetic questions ranging from trivial ("what is 1+1?") to more complex
* Inform the system of a secret word (e.g. "zebra"), then after a few other questions ask it what the secret word is. Make sure you understand where the memory of that secret word is stored.
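
As a hint for the last point, here is a minimal sketch of where the "memory" could live. The model keeps no state between calls; whatever it "remembers" has to be in the prompt that is sent on each turn. `render_prompt` is a hypothetical stand-in for `tokenizer.apply_chat_template`, and the ChatML-like markers are an assumption used only for illustration.

```python
# Hypothetical stand-in for tokenizer.apply_chat_template.
# The ChatML-like markers are an assumption, used for illustration only.
def render_prompt(messages):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    return "".join(parts)


history = [
    {"role": "user", "content": "The secret word is zebra."},
    {"role": "assistant", "content": "Got it."},
    {"role": "user", "content": "What is 1+1?"},
    {"role": "assistant", "content": "2"},
    {"role": "user", "content": "What is the secret word?"},
]

# Prompt built from the whole conversation vs. from the last message only
full_prompt = render_prompt(history)
last_turn_only = render_prompt(history[-1:])

print("zebra" in full_prompt)     # True: the secret word is re-sent every turn
print("zebra" in last_turn_only)  # False: without the history it is gone
```

If only the latest message were sent, the model would have nothing to recall the word from. The message list is the memory.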

# PROGRAM

## data storage

In [32]:
message_history = []

## actual functionality

In [33]:
while True:
    user_input = input('Say something ("exit" to quit): ')
    if user_input == 'exit':
        break
    message_history.append({"role": "user", "content": user_input})
    # Render the whole conversation; add_generation_prompt=True appends the
    # assistant header, so the model starts its reply directly
    input_text = tokenizer.apply_chat_template(message_history,
                                               tokenize=False,
                                               add_generation_prompt=True)
    # The rendered template already contains the special tokens
    inputs = tokenizer.encode(input_text, return_tensors="pt", add_special_tokens=False).to(device)
    outputs = model.generate(inputs,
                             max_new_tokens=50,
                             temperature=0.2,
                             top_p=0.9,
                             do_sample=True)
    # generate() returns prompt + continuation; decode only the newly generated tokens
    answer = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True).strip()
    message_history.append({"role": "assistant", "content": answer})
    print(answer)

Say something ("exit" to quit): hello
Hello! How can I assist you today?
Say something ("exit" to quit): What is the capital of Finland
The capital of Finland is Helsinki.
Say something ("exit" to quit): What is the capital of Sweden
The capital of Sweden is Stockholm.
Say something ("exit" to quit): Who is the president of Finland
The current President of Finland is Sauli Niinistö.
Say something ("exit" to quit): Didn't we have elections in 2024
Yes, Finland held presidential elections in 2024. Sauli Niinistö was elected as the new president.
Say something ("exit" to quit): What is the ocean dividing europe and americas
The ocean dividing Europe and America is the Atlantic Ocean.
Say something ("exit" to quit): ok then some math!
Sure, I'd be happy to help with some math! What's your question?
Say something ("exit" to quit): 1+1
The answer to 1+1 is 2.
Say something ("exit" to quit): 2-2
The answer to 2-2 is 0.
Say something ("exit" to quit): 3 / 3
The answer to 3/3 is 1.
Say somethin
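
The loop above decodes only the part of the `generate()` output that comes after the prompt. For a decoder-only model the returned sequence begins with the prompt tokens, so slicing at the prompt length leaves just the reply. A minimal sketch with made-up token ids (the numbers are placeholders, not real tokenizer output):

```python
# Toy token ids; real ids would come from the tokenizer.
prompt_ids = [101, 7, 8, 9]           # what was passed to generate()
generated = prompt_ids + [42, 43, 2]  # returned sequence: prompt, then new tokens

# Keep only the continuation by skipping the prompt length
new_ids = generated[len(prompt_ids):]
print(new_ids)  # [42, 43, 2]
```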

# MESSAGE LOG

In [34]:
length = 10  # column width for the role name

for m in message_history:
    print(f"{m['role']:<{length}}", "=>\t", m['content'])

user       =>	 hello
assistant  =>	 Hello! How can I assist you today?
user       =>	 What is the capital of Finland
assistant  =>	 The capital of Finland is Helsinki.
user       =>	 What is the capital of Sweden
assistant  =>	 The capital of Sweden is Stockholm.
user       =>	 Who is the president of Finland
assistant  =>	 The current President of Finland is Sauli Niinistö.
user       =>	 Didn't we have elections in 2024
assistant  =>	 Yes, Finland held presidential elections in 2024. Sauli Niinistö was elected as the new president.
user       =>	 What is the ocean dividing europe and americas
assistant  =>	 The ocean dividing Europe and America is the Atlantic Ocean.
user       =>	 ok then some math!
assistant  =>	 Sure, I'd be happy to help with some math! What's your question?
user       =>	 1+1
assistant  =>	 The answer to 1+1 is 2.
user       =>	 2-2
assistant  =>	 The answer to 2-2 is 0.
user       =>	 3 / 3
assistant  =>	 The answer to 3/3 is 1.
user       =>	 4 * 4
assistant  