<a href="https://colab.research.google.com/github/heinohen/Textual-Data-Analysis/blob/main/TDA_exercise11_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>



# Textual data analysis exercise 11 - basic chatbot



## Setup

### installs

In [1]:
%%bash

pip3 install -q transformers torch



In [2]:
!nvidia-smi

Fri Feb 21 08:24:52 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   49C    P8             12W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

### imports

In [3]:
from google.colab import userdata
from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

In [4]:
hf_token = userdata.get('huggingface')
login(token=hf_token, add_to_git_credential=True)

## model

In [5]:
MODEL_NAME = 'HuggingFaceTB/SmolLM2-1.7B-Instruct'

In [10]:
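import torch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Use the GPU when one is available; fall back to the CPU otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).to(device)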

# TASK

## description

* At least 5 questions about basic facts about the world (e.g. the capital of a country)
* At least 5 arithmetic questions ranging from trivial ("what is 1+1?") to more complex
* Tell the system a secret word (e.g. "zebra"), then after a few other questions ask it what the secret word is. Make sure you understand where the memory of that secret word is stored.
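
One way to reason about the secret-word item: the model keeps no state between calls, so anything it "remembers" has to be present in the messages that are sent with each request. The sketch below illustrates this with `toy_reply`, a hypothetical stand-in for the real model (an assumption for illustration only, it does not call `transformers`), so it runs without a GPU.

```python
SECRET_PREFIX = "the secret word is "

def toy_reply(messages):
    """Hypothetical stand-in for the model: it sees only the messages passed in."""
    for m in messages:
        if m["role"] == "user" and m["content"].startswith(SECRET_PREFIX):
            return m["content"][len(SECRET_PREFIX):]
    return "I don't know."

history = [
    {"role": "user", "content": "the secret word is zebra"},
    {"role": "assistant", "content": "Got it."},
    {"role": "user", "content": "what is 1+1"},
    {"role": "assistant", "content": "2"},
    {"role": "user", "content": "what is the secret word"},
]

with_history = toy_reply(history)          # the full conversation is passed in
without_history = toy_reply(history[-1:])  # only the latest question is passed in
print(with_history)     # zebra
print(without_history)  # I don't know.
```

When only the last question is passed, the secret word is gone, which suggests the memory lives in the conversation history rather than in the model.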

# PROGRAM

## data storage

In [11]:
message_history = []
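
Each entry stored in `message_history` is a dictionary with a `role` and a `content` key, and the whole list is flattened into a single prompt string before tokenization. The sketch below shows roughly what that flattening looks like, assuming a ChatML-style layout with `<|im_start|>` and `<|im_end|>` markers. `render_chatml` is a hypothetical helper, not part of `transformers`; the model's real template is applied by `tokenizer.apply_chat_template` and may differ, for example by adding a default system message.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Flatten role/content dicts into one ChatML-style prompt string (simplified)."""
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model only has to write the reply text.
        prompt += "<|im_start|>assistant\n"
    return prompt

example_history = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! How can I assist you today?"},
    {"role": "user", "content": "what is the capital of finland"},
]
prompt = render_chatml(example_history, add_generation_prompt=True)
print(prompt)
```

The `add_generation_prompt` flag mirrors the parameter of the same name in `apply_chat_template`. When it is off, the model has to produce the assistant header itself, so a decoded reply can begin with the word "assistant". Because every earlier turn is part of the string, the prompt grows with each exchange.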

## actual functionality

In [12]:
while True:
    user_input = input('Say something ("exit" to quit): ')
    if user_input == 'exit':
        break
    message_history.append({"role": "user", "content": user_input})
    # Render the whole conversation as one prompt; add_generation_prompt=True
    # appends the assistant header so the model only generates the reply text
    input_text = tokenizer.apply_chat_template(message_history,
                                               tokenize=False,
                                               add_generation_prompt=True)
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(inputs,
                             max_new_tokens=50,
                             temperature=0.2,
                             top_p=0.9,
                             do_sample=True)
    # generate() returns the prompt tokens followed by the new ones,
    # so skip the prompt and decode only the reply
    text = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    answer = " ".join(text.split())  # collapse whitespace so each reply prints on one line
    message_history.append({"role": "assistant", "content": answer})
    print(answer)

Say something ("exit" to quit): hello
Hello! How can I assist you today?
Say something ("exit" to quit): i have some questions
Of course, I'm here to help. What questions do you have?
Say something ("exit" to quit): what is the capital of finland
The capital of Finland is Helsinki.
Say something ("exit" to quit): what is the capital of sweden
The capital of Sweden is Stockholm.
Say something ("exit" to quit): what is the name of the ocean that divides europe and america
The ocean that divides Europe and America is the Atlantic Ocean.
Say something ("exit" to quit): what are the planets of our solar system
The planets of our solar system are: 1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune Please note that Pluto
Say something ("exit" to quit): ok then on to some math
I'm ready to help with math questions. What topic or problem would you like to discuss?
Say something ("exit" to quit): 1+1
The sum of 1 and 1 is 2.
Say something ("exit" to quit): 2-2
The dif
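
The slicing step in the loop relies on `generate` returning the prompt tokens followed by the newly generated ones, so dropping the first `len(prompt)` entries leaves only the reply. The sketch below illustrates this with plain Python lists and made-up token ids (no model involved), and also shows why splitting on whitespace and re-joining puts a multi-line reply, such as the planet list above, on a single line.

```python
# Made-up ids standing in for the encoded prompt.
prompt_ids = [101, 7, 42, 9]
# Shape of the generate() output: the prompt followed by the new tokens.
generated_ids = prompt_ids + [55, 13, 2]
# Keep only what was generated after the prompt.
reply_ids = generated_ids[len(prompt_ids):]
print(reply_ids)  # [55, 13, 2]

# Splitting on whitespace and re-joining collapses newlines into single spaces.
raw_reply = "1. Mercury\n2. Venus\n3. Earth"
one_line = " ".join(raw_reply.split())
print(one_line)  # 1. Mercury 2. Venus 3. Earth
```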

# MESSAGE LOG

In [13]:
length = 10  # column width for the role label

for m in message_history:
    print(f"{m['role']:<{length}}", "=>\t", m['content'])

user       =>	 hello
assistant  =>	 Hello! How can I assist you today?
user       =>	 i have some questions
assistant  =>	 Of course, I'm here to help. What questions do you have?
user       =>	 what is the capital of finland
assistant  =>	 The capital of Finland is Helsinki.
user       =>	 what is the capital of sweden
assistant  =>	 The capital of Sweden is Stockholm.
user       =>	 what is the name of the ocean that divides europe and america
assistant  =>	 The ocean that divides Europe and America is the Atlantic Ocean.
user       =>	 what are the planets of our solar system
assistant  =>	 The planets of our solar system are: 1. Mercury 2. Venus 3. Earth 4. Mars 5. Jupiter 6. Saturn 7. Uranus 8. Neptune Please note that Pluto
user       =>	 ok then on to some math
assistant  =>	 I'm ready to help with math questions. What topic or problem would you like to discuss?
user       =>	 1+1
assistant  =>	 The sum of 1 and 1 is 2.
user       =>	 2-2
assistant  =>	 The difference of 2 and 2