# tinyChat

### A RAG-integrated chatbot powered by TinyLlama 1.1B as the main LM
Built By: [Mohammad Ali](https://github.com/mohammad17ali)

## 1. Checking and Importing Requirements

In [2]:
%pip install llama-index transformers torch sentence-transformers


In [3]:
%pip install faiss-gpu

Successfully installed faiss-gpu-1.7.2
Note: you may need to restart the kernel to use updated packages.
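Before importing, it can help to confirm which packages the kernel can actually see (especially after the restart note above). A minimal sketch using the standard library's `importlib.util.find_spec`; the module names are the import names assumed for the packages installed above (`faiss` for faiss-gpu, `sentence_transformers` for sentence-transformers).

```python
import importlib.util

# Import names assumed for the packages installed above.
REQUIRED_MODULES = ["llama_index", "transformers", "torch", "faiss", "sentence_transformers"]

def check_requirements(modules):
    """Return {module_name: True/False} without actually importing the modules."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    return status

if __name__ == "__main__":
    for name, ok in check_requirements(REQUIRED_MODULES).items():
        print(f"{name:25s} {'OK' if ok else 'MISSING'}")
```

Any `MISSING` entry usually means the install ran in a different environment or the kernel has not been restarted yet.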


In [55]:
import torch
import numpy as np
import faiss
from typing import List
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from sentence_transformers import SentenceTransformer

In [56]:
# Load the tokenizer and model directly (the pipeline in the next cell loads its own copy)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")


#### Testing simple context passing and generation with TinyLlama, using made-up facts the model cannot have seen during training

In [57]:
context = ['Kvaratskhelia is a good midfielder.','Kadambaragu brother is Kvaratskhelia.', 'Kvaratskhelia plays for FC Barcelona.']

In [58]:
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {
        "role": "system",
        "content": f"The following is relevant context for your response. Use this information to help answer the user's question:\n\n{context}"
    },
    {"role": "user", "content": "What does Kadambaragu's brother do?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])

Device set to use cuda:0


<|system|>
The following is relevant context for your response. Use this information to help answer the user's question:

['Kvaratskhelia is a good midfielder.', 'Kadambaragu brother is Kvaratskhelia.', 'Kvaratskhelia plays for FC Barcelona.']</s>
<|user|>
What does Kadambaragu's brother do?</s>
<|assistant|>
Kadambaragu's brother is Kvaratskhelia. Kvaratskhelia is not mentioned in the given context, so we do not know what his role or responsibilities are as a player for FC Barcelona.
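The printed prompt shows that the system message contains the Python `repr` of the list (brackets and quotes included), because the f-string interpolates `context` directly. A small sketch that renders one fact per line before building the messages; `format_context` and `build_messages` are hypothetical helper names, not part of transformers.

```python
from typing import Dict, List

SYSTEM_TEMPLATE = (
    "The following is relevant context for your response. "
    "Use this information to help answer the user's question:\n\n{context}"
)

def format_context(facts: List[str]) -> str:
    """Render each non-empty fact on its own '- ' bullet line instead of a list repr."""
    return "\n".join(f"- {fact.strip()}" for fact in facts if fact.strip())

def build_messages(facts: List[str], question: str) -> List[Dict[str, str]]:
    """Build the messages list that tokenizer.apply_chat_template expects."""
    messages = []
    if facts:
        messages.append({"role": "system", "content": SYSTEM_TEMPLATE.format(context=format_context(facts))})
    messages.append({"role": "user", "content": question})
    return messages
```

Usage: `messages = build_messages(context, "What does Kadambaragu's brother do?")`, then pass `messages` to `pipe.tokenizer.apply_chat_template` exactly as in the cell above.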


## 2. Building the Bot

### 2.1 Configuration

In [140]:
class Config:
    def __init__(self):
        self.model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
        self.embedding_model = 'all-MiniLM-L6-v2'
        self.max_context_length = 512
        self.max_new_tokens = 256
        self.vector_dim = 384
        self.top_k = 3
        self.chunk_size = 256
        self.stop_word = 'STOP!'
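The same settings can also be written as a frozen `dataclass`, which provides a readable `repr`, keyword overrides via `dataclasses.replace`, and protection against accidental mutation. This is a sketch of an alternative, not a required change; the positivity check in `__post_init__` is an added assumption about which values make sense.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Config:
    model_name: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    embedding_model: str = "all-MiniLM-L6-v2"
    max_context_length: int = 512
    max_new_tokens: int = 256
    vector_dim: int = 384   # all-MiniLM-L6-v2 produces 384-dimensional embeddings
    top_k: int = 3
    chunk_size: int = 256   # approximate characters per chunk
    stop_word: str = "STOP!"

    def __post_init__(self):
        # Reject non-positive sizes that would break chunking, retrieval, or generation.
        for name in ("max_context_length", "max_new_tokens", "vector_dim", "top_k", "chunk_size"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive, got {getattr(self, name)}")
```

Attribute names match the class above, so `config = Config()` works unchanged, and `replace(config, top_k=1)` yields a tweaked copy without touching the original.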
        

### 2.2 Vector Database and Functionalities

In [141]:
class VectorDB:
    def __init__(self, vector_dim: int, chunk_size: int = 256,
                 embedding_model: str = 'sentence-transformers/all-MiniLM-L6-v2'):
        self.vector_dim = vector_dim
        self.chunk_size = chunk_size  # same default as Config.chunk_size
        # Embeddings are L2-normalized, so L2 distance ranks results like cosine similarity
        self.index = faiss.IndexFlatL2(vector_dim)
        self.texts = []
        self.embedding_model = SentenceTransformer(embedding_model)
    
    def add_text(self, text: str) -> None:
        # Split the text into ~chunk_size-character chunks and index each one
        for chunk in self._create_chunks(text):
            self._add_chunk(chunk)
    
    def _create_chunks(self, text: str) -> List[str]:
        words = text.split()
        chunks = []
        current_chunk = []
        current_length = 0
        
        for word in words:
            current_chunk.append(word)
            current_length += len(word) + 1  # +1 for the joining space
            
            if current_length >= self.chunk_size:
                chunks.append(' '.join(current_chunk))
                current_chunk = []
                current_length = 0
                
        if current_chunk:
            chunks.append(' '.join(current_chunk))
        return chunks

    def _add_chunk(self, chunk: str) -> None:
        embedding = self.embedding_model.encode([chunk])[0]
        embedding = np.array([embedding], dtype=np.float32)
        faiss.normalize_L2(embedding)  # normalizes in place, so add this same array
        self.index.add(embedding)
        self.texts.append(chunk)

    def DB(self, limit: int = 1) -> List[str]:
        # Return up to `limit` stored chunks, for inspection
        if self.index.ntotal == 0:
            return ['empty DataBase']
        return self.texts[:limit]
    
    def search(self, query: str, top_k: int = 3) -> List[str]:
        if self.index.ntotal == 0:
            return []  # nothing indexed yet, so skip encoding the query
        
        query_embedding = self.embedding_model.encode([query])[0]
        query_embedding = np.array([query_embedding], dtype=np.float32)
        faiss.normalize_L2(query_embedding)  # normalize the same way as the stored vectors
        
        distances, indices = self.index.search(query_embedding, min(top_k, self.index.ntotal))
        # faiss uses -1 for missing results, so keep only valid ids
        return [self.texts[i] for i in indices[0] if 0 <= i < len(self.texts)]
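`VectorDB` normalizes every embedding before storing it in an `IndexFlatL2`. For unit vectors the squared L2 distance is $\|a-b\|^2 = 2 - 2\,a\cdot b$, so ranking by smallest L2 distance is the same as ranking by largest cosine similarity. The pure-Python sketch below mirrors the class's chunking rule and an exact (brute-force) search by squared L2 distance, using small hand-made vectors in place of real sentence embeddings, so the idea can be checked without faiss or a model.

```python
import math
from typing import List, Sequence, Tuple

def create_chunks(text: str, chunk_size: int = 256) -> List[str]:
    """Greedy word chunking: close a chunk once it reaches ~chunk_size characters."""
    chunks, current, length = [], [], 0
    for word in text.split():
        current.append(word)
        length += len(word) + 1  # +1 for the joining space
        if length >= chunk_size:
            chunks.append(" ".join(current))
            current, length = [], 0
    if current:
        chunks.append(" ".join(current))
    return chunks

def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (the role faiss.normalize_L2 plays in VectorDB)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def search(query: Sequence[float], vectors: List[Sequence[float]], top_k: int = 3) -> List[Tuple[int, float]]:
    """Exact nearest neighbours by squared L2 distance over normalized vectors."""
    q = normalize(query)
    scored = [(i, squared_l2(q, normalize(v))) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[: min(top_k, len(vectors))]
```

With `vectors = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]` and `query = [1.0, 0.1]`, the top two hits are indices 0 and 2, the same order cosine similarity gives.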

### 2.3 Inference

In [150]:
class TinyLlamaChatModel:
    def __init__(self, model_name: str, stop_word: str = 'STOP!'):
        self.pipe = pipeline(
            "text-generation", 
            model=model_name, 
            torch_dtype=torch.bfloat16, 
            device_map="auto"
        )
        self.tokenizer = self.pipe.tokenizer  # reuse the pipeline's tokenizer instead of loading a second copy
        self.stop_word = stop_word  # same default as Config.stop_word

    def generate_response(self, user_message: str, context: str = "", max_new_tokens: int = 256) -> str:
        messages = []
        
        if context:  # prepend retrieved context as a system message
            messages.append({
                "role": "system",
                "content": f"The following is relevant context for your response. Use this information to help answer the user's question:\n\n{context}"
            })
        
        messages.append({
            "role": "user",
            "content": user_message
        })
        
        # Render the messages with the model's own chat template
        prompt = self.tokenizer.apply_chat_template(
            messages, 
            tokenize=False, 
            add_generation_prompt=True
        )
        
        outputs = self.pipe(
            prompt, 
            max_new_tokens=max_new_tokens, 
            do_sample=True, 
            temperature=0.7, 
            top_k=50, 
            top_p=0.95
        )
        
        # generated_text echoes the prompt, so keep only what follows the last <|assistant|> tag
        full_response = outputs[0]["generated_text"]
        assistant_part = full_response.split("<|assistant|>")[-1].strip()

        # Truncate at the stop word if the model emits it
        stop_idx = assistant_part.find(self.stop_word)
        if stop_idx != -1:
            assistant_part = assistant_part[:stop_idx].strip()
            
        return assistant_part
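Judging by the prompt printed in section 1, TinyLlama's chat template wraps each turn as `<|role|>`, a newline, the content, and `</s>`, then appends `<|assistant|>` when `add_generation_prompt=True`. The sketch below reproduces that layout and the reply post-processing (split on `<|assistant|>`, then cut at the stop word) so the string handling can be tested without a GPU. The format is inferred from that printed output only; real code should keep calling `apply_chat_template`, which uses the template shipped with the tokenizer.

```python
from typing import Dict, List

def format_chat_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Approximate the <|system|>/<|user|>/<|assistant|> layout seen in the printed prompt."""
    parts = [f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|assistant|>\n")
    return "".join(parts)

def extract_reply(generated_text: str, stop_word: str = "STOP!") -> str:
    """Keep only the text after the last <|assistant|> tag, truncated at stop_word."""
    reply = generated_text.split("<|assistant|>")[-1].strip()
    stop_idx = reply.find(stop_word)
    if stop_idx != -1:
        reply = reply[:stop_idx].strip()
    return reply
```

Because `extract_reply` splits on the last tag, it also works if the echoed prompt happens to contain earlier assistant turns.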

### 2.4 Bot Functionalities

In [169]:
class RAGChatbot:
    def __init__(self, config: Config):
        self.config = config
        self.vector_db = VectorDB(config.vector_dim)
        self.llm = TinyLlamaChatModel(config.model_name)
        self.conversation_history = []
        
    def chat(self, user_input: str) -> str:
        if self.config.stop_word in user_input:
            return "Chat ended." #stopping at trigger word
             
        relevant_chunks = self.vector_db.search(user_input, self.config.top_k)
        context = "\n".join(relevant_chunks)
        
        response = self.llm.generate_response( #output gen
            user_message=user_input,
            context=context,
            max_new_tokens=self.config.max_new_tokens
        )
        
        self.conversation_history.append({"user": user_input, "assistant": response})
        
        self.vector_db.add_text(f"User: {user_input}\nAssistant: {response}")
        
        return response 

    def load_initial_context(self):
        initial_context = """
        This is example context to be added to the vector database for retrieval. I can add any other relevant textual content here to provide context for the model.
        So this encourages context-rich generation. Huge step, eh?!
        """
        self.vector_db.add_text(initial_context)
    
    def display_conversation(self) -> None:
        for i, exchange in enumerate(self.conversation_history):
            print(f"User [{i+1}]: {exchange['user']}")
            print(f"Assistant [{i+1}]: {exchange['assistant']}")
            print("-" * 50)    
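The retrieve, generate, and write-back loop in `RAGChatbot.chat` can be exercised without loading TinyLlama or the embedding model by injecting stand-ins. In this sketch `KeywordRetriever` and `EchoLLM` are hypothetical stubs (keyword overlap instead of embeddings, an echo instead of generation); only the control flow mirrors the class above, including writing each exchange back into the store so later turns can retrieve it.

```python
from typing import Dict, List

class KeywordRetriever:
    """Hypothetical stand-in for VectorDB: ranks stored texts by shared lowercase words."""
    def __init__(self):
        self.texts: List[str] = []

    def add_text(self, text: str) -> None:
        self.texts.append(text)

    def search(self, query: str, top_k: int = 3) -> List[str]:
        q = set(query.lower().split())
        scored = [(len(q & set(t.lower().split())), t) for t in self.texts]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep insertion order
        return [t for _, t in scored[:top_k]]

class EchoLLM:
    """Hypothetical stand-in for TinyLlamaChatModel: reports how much context it received."""
    def generate_response(self, user_message: str, context: str = "", max_new_tokens: int = 256) -> str:
        n_lines = len(context.splitlines()) if context else 0
        return f"[{n_lines} context line(s)] {user_message}"

class StubRAGChatbot:
    """Same chat() control flow as RAGChatbot, with an injectable retriever and LLM."""
    def __init__(self, retriever, llm, top_k: int = 3, stop_word: str = "STOP!"):
        self.vector_db, self.llm = retriever, llm
        self.top_k, self.stop_word = top_k, stop_word
        self.conversation_history: List[Dict[str, str]] = []

    def chat(self, user_input: str) -> str:
        if self.stop_word in user_input:
            return "Chat ended."
        context = "\n".join(self.vector_db.search(user_input, self.top_k))
        response = self.llm.generate_response(user_message=user_input, context=context)
        self.conversation_history.append({"user": user_input, "assistant": response})
        # Write the exchange back so later turns can retrieve it.
        self.vector_db.add_text(f"User: {user_input}\nAssistant: {response}")
        return response
```

After one turn about a topic, a second question on the same topic retrieves both the seeded text and the stored exchange, which is the same write-back behavior the real bot has.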

## 3. Testing tinyChat

### 3.1 RUN Function

In [180]:
def tinyChat():
    print("TinyLlama RAG Chatbot Initialized!")
    print(f"Type '{config.stop_word}' to end the chat.")
    print("-" * 50)
    
    while True:
        user_input = input("You: ")
        if user_input.strip().lower() == config.stop_word.lower():
            print("Chat ended.")
            break
        
        response = chatbot.chat(user_input)
        print(f"Assistant: {response}\n")
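`tinyChat` reads from `input()` and relies on the global `config` and `chatbot`, which makes the loop hard to test. A sketch of the same loop with the input source, the responder, and the stop word passed in as arguments; `run_chat` is a hypothetical name, and passing `chatbot.chat` as `respond` would drive the real bot.

```python
from typing import Callable, Iterable, List, Tuple

def run_chat(inputs: Iterable[str], respond: Callable[[str], str],
             stop_word: str = "STOP!") -> List[Tuple[str, str]]:
    """Run the tinyChat loop over an iterable of user messages; return (user, reply) pairs."""
    transcript = []
    for user_input in inputs:
        # Same case-insensitive stop check as tinyChat().
        if user_input.strip().lower() == stop_word.lower():
            break
        transcript.append((user_input, respond(user_input)))
    return transcript
```

For example, `run_chat(["Who was Ali Goba?", "STOP!"], chatbot.chat, config.stop_word)` replays a scripted conversation and returns the exchanges instead of printing them.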

### 3.2 Initialising tinyChat

In [184]:
config = Config()
chatbot = RAGChatbot(config)

Device set to use cuda:0


### 3.3 Testing Retrieval-Augmented Generation
#### A fictional story about Ali Goba, a legendary Indian footballer
> We use this story as the context for interacting with the chatbot.

In [185]:
# One story string, written as two adjacent literals that Python concatenates
context = (
    'The Legend of Ali Goba: From Ladakh to Global Glory \n In the cold, rugged terrain of Ladakh, where oxygen was thin and dreams often seemed out of reach, a young boy named Ali Goba spent his days kicking a battered football against monastery walls. Born into a humble family in Kargil, Ali had nothing but his raw talent, an unbreakable spirit, and a dream—to play in the biggest stadiums of the world.\n The Rise from Ladakh \n Ali’s extraordinary footwork caught the attention of a visiting coach from the AIFF Elite Academy during a youth tournament in Delhi. By the time he was 16, his name was already whispered in Indian footballing circles. With his dazzling dribbles and pinpoint passing, he led the Indian U-17 team to an unexpected triumph at the AFC U-17 Championship, attracting the attention of European scouts. \n Tottenham Hotspur: The Breakthrough \n At 18, Ali Goba made history, becoming the first Indian footballer to sign for a Premier League club, joining Tottenham Hotspur. Under the guidance of Mauricio Pochettino, he honed his technical skills and adapted to the lightning-fast pace of the English game. His debut in the North London Derby against Arsenal was nothing short of spectacular—scoring a stunning goal from 30 yards out, making headlines across Europe. \n By his second season, he had formed a formidable midfield partnership with Christian Eriksen and Dele Alli. His performances against the likes of Manchester City and Liverpool earned him the PFA Young Player of the Year award, marking his arrival as a world-class talent. \n FC Barcelona: The Making of a Legend \n His meteoric rise led to a record-breaking €150 million transfer to FC Barcelona, where he donned the legendary number 10 shirt after Lionel Messi’s departure. Playing alongside Pedri and Frenkie de Jong, Ali became the architect of Barcelona’s attack, blending the tiki-taka style with his own Himalayan resilience. '
    '\n It was during El Clásico against Real Madrid that he cemented his place among the greats—scoring a last-minute bicycle kick winner past Thibaut Courtois in front of a roaring Camp Nou. His impact was immediate, leading Barcelona to back-to-back La Liga and Champions League titles. In 2029, he achieved what no Indian had before—winning the Ballon d’Or, beating Kylian Mbappé and Jude Bellingham to the prestigious award. \n Bringing the World Cup to India: \n Despite his club success, Ali Goba’s heart remained with his homeland. Under his captaincy, India qualified for the 2030 FIFA World Cup—a historic first. Against all odds, India, ranked 72nd in the world, shocked Germany in the quarter-finals, with Ali scoring an outrageous free-kick past Manuel Neuer’s successor. \n In the final at Maracanã Stadium, Brazil, facing an Argentina side led by Paulo Dybala and Alejandro Garnacho, Ali produced a masterclass. In the dying minutes, he nutmegged Enzo Fernández, dribbled past Lisandro Martínez, and chipped the ball over Emiliano Martínez, securing India’s first-ever World Cup trophy. \n The Immortal Legacy: \n Ali Goba returned to India as a national hero, inspiring millions. Stadiums were renamed after him, and football academies sprang up across the country. His autobiography, "From Ladakh to the World", became a bestseller, and his story was adapted into a Bollywood blockbuster starring Ranveer Singh. \n Even after retirement, Ali remained an ambassador for Indian football, mentoring young talents and ensuring that no child in Ladakh—or anywhere in India—ever had to give up on their dreams due to circumstances. \n To this day, football fans around the world remember the boy from the Himalayas who dared to dream, conquered the world, and changed Indian football forever. \n Ali Goba’s name stands alongside Pelé, Maradona, Messi, and Ronaldo, proving that legends can rise from anywhere—even the highest mountains of the world.'
)
chatbot.vector_db.add_text(context)


### 3.4 Chat Window

In [186]:
tinyChat()

TinyLlama RAG Chatbot Initialized!
Type 'STOP!' to end the chat.
--------------------------------------------------


You:  Who was Ali Goba?



Assistant: Ali Goba is a former Indian football player who is widely regarded as one of the greatest footballers in Indian history. He played for India at the 1974 FIFA World Cup, where he scored the first-ever Indian goal in the tournament. After the tournament, Goba retired from international football and returned to his hometown of Leh, where he continued to play for the local football team, winning several championships in the process. He also served as a coach for the Indian national team during the 1980s and 1990s, helping them to win the Asian Cup in 1996 and the Asian Games in 1998. Goba's impact on Indian football was significant, and he is often considered one of the greatest footballers of all time in India.



You:  Did Ali Goba ever win the ballon dor?



Assistant: Yes, Ali Goba won the FIFA World Cup Ballon d'Or award in 1974, which is considered to be the most prestigious individual award in football. This award is presented annually to the top goalkeeper or midfielder in the world, and Goba was the first Indian to win it, making him one of the greatest footballers in Indian history.



You:  STOP!


Chat ended.
