<a href="https://colab.research.google.com/github/hosein9574/My-agents/blob/main/RAG_chromadb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Install dependencies
!pip install transformers --upgrade
!pip install sentence-transformers --upgrade
!pip install chromadb --upgrade
!pip install gradio --upgrade

# Imports
import numpy as np
import pandas as pd
import transformers
import sentence_transformers
from sentence_transformers import SentenceTransformer
import chromadb
from datetime import datetime
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import gradio as gr

# Check versions
print("Transformers version:", transformers.__version__)
print("Sentence-Transformers version:", sentence_transformers.__version__)
print("ChromaDB version:", chromadb.__version__)

# Set up Kaggle API: upload kaggle.json when prompted (Colab only)
from google.colab import files
uploaded = files.upload()

!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Download and unzip BBC News dataset
!kaggle datasets download -d gpreda/bbc-news
!unzip -o bbc-news.zip

# Load data
news = pd.read_csv('./bbc_news.csv')
MAX_NEWS = 1000
DOCUMENT = "description"
TOPIC = "title"
news["id"] = news.index
subset_news = news.head(MAX_NEWS)

# Setup ChromaDB
chroma_client = chromadb.PersistentClient(path="./chromadb")
# strftime("%s") is a platform-specific extension; use a portable Unix timestamp
collection_name = f"news_collection_{int(datetime.now().timestamp())}"

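# Delete any existing collection with the same name.
# Depending on the ChromaDB version, list_collections() yields Collection objects or plain names.
existing_names = [getattr(col, "name", col) for col in chroma_client.list_collections()]
if collection_name in existing_names:
    chroma_client.delete_collection(name=collection_name)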

collection = chroma_client.create_collection(name=collection_name)

# Embed and add to ChromaDB
embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = embedding_model.encode(subset_news[DOCUMENT].tolist(), convert_to_numpy=True)

collection.add(
    documents=subset_news[DOCUMENT].tolist(),
    metadatas=[{TOPIC: topic} for topic in subset_news[TOPIC].tolist()],
    ids=[f"id{x}" for x in range(len(subset_news))],  # one id per row, even if fewer than MAX_NEWS
    embeddings=embeddings.tolist(),
)

# Load language model
import torch  # already installed as a dependency of sentence-transformers

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
lm_model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    "text-generation",
    model=lm_model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    # Select the device explicitly: GPU 0 if available, otherwise CPU
    device=0 if torch.cuda.is_available() else -1,
)

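# QA function
def answer_question(user_question):
    """Retrieve the five closest descriptions and answer the question from them."""
    try:
        # Embed the question with the same model that embedded the documents
        query_embedding = embedding_model.encode([user_question], convert_to_numpy=True)
        results = collection.query(query_embeddings=query_embedding.tolist(), n_results=5)
        context = "\n".join(results["documents"][0])
        context = context[:5120]  # Cap the context at 5120 characters

        prompt = (
            f"Relevant context: {context}\n"
            "Considering the relevant context, answer the question.\n"
            f"Question: {user_question}\n"
            "Answer:"
        )

        # return_full_text=False returns only the generated continuation, not the prompt
        response = pipe(prompt, return_full_text=False)
        return response[0]["generated_text"].strip()
    except Exception as e:
        return f"Error: {e}"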

# Gradio Chat Interface
def chat_function(message, history):
    return answer_question(message)

gr.ChatInterface(
    fn=chat_function,
    title="📰 News QA Bot",
    description="Ask anything about recent news articles.",
).launch()

Saving kaggle.json to kaggle.json
Dataset URL: https://www.kaggle.com/datasets/gpreda/bbc-news
License(s): CC0-1.0
Archive:  bbc-news.zip
  inflating: bbc_news.csv            


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cpu
It looks like you are running Gradio on a hosted a Jupyter notebook. For the Gradio app to work, sharing must be enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://c1d37f62334f3141a4.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
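
The prompt assembly and answer extraction inside `answer_question` can be tried without loading the language model or the vector store. The following is a minimal sketch, not the notebook's own code: `build_prompt` and `extract_answer` are hypothetical helper names, the two documents are placeholder strings rather than rows from the dataset, and `fake_output` stands in for what a text-generation pipeline might return. It mirrors the 5120-character truncation and the `Answer:` split used in the cell above.

```python
MAX_CONTEXT_CHARS = 5120  # same character budget as the notebook's truncation


def build_prompt(documents, question, max_chars=MAX_CONTEXT_CHARS):
    """Join retrieved documents, truncate them, and fill the prompt template."""
    context = "\n".join(documents)[:max_chars]
    return (
        f"Relevant context: {context}\n"
        "Considering the relevant context, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


def extract_answer(generated_text, prompt):
    """Return only the text that follows the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fallback: keep whatever follows the last "Answer:" marker
    return generated_text.split("Answer:")[-1].strip()


# Placeholder documents (assumed for illustration, not taken from bbc_news.csv)
docs = ["Inflation fell to 3.2% in March.", "The central bank held rates."]
prompt = build_prompt(docs, "What happened to inflation?")

# Simulated model output: the prompt followed by a continuation
fake_output = prompt + " It fell to 3.2% in March."
print(extract_answer(fake_output, prompt))
```

Stripping the prompt by prefix avoids cutting the answer short when the generated text itself contains the string `Answer:`; the split is kept only as a fallback.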


