# Telegram Channel Integration with RAG System

This notebook demonstrates how to:
1. Set up the Telegram client
2. Download messages from specified channels
3. Process these messages into a vector store
4. Query the processed messages using RAG

## Setup

First, get your Telegram API credentials:
1. Go to https://my.telegram.org/apps
2. Create a new application
3. Note down `api_id` and `api_hash`

Create a `.env` file in this directory with your credentials:
```
TELEGRAM_API_ID=your_api_id
TELEGRAM_API_HASH=your_api_hash
```
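Values loaded from `.env` reach the notebook as strings, while Telegram issues `api_id` as a number. This notebook doesn't show whether `TelegramChannelIngestion` converts or validates them. The following is a minimal sketch of a helper that fails early with a readable message when a variable is missing and converts the ID to an `int`. `load_telegram_credentials` is a hypothetical name, not part of `telegram_ingestion`, and it assumes `load_dotenv()` has already been called.

```python
import os


def load_telegram_credentials() -> tuple[int, str]:
    """Hypothetical helper: read and validate the Telegram API credentials.

    Assumes load_dotenv() has already copied .env into the environment.
    """
    api_id = os.getenv("TELEGRAM_API_ID")
    api_hash = os.getenv("TELEGRAM_API_HASH")
    # Report every missing variable at once instead of failing later inside the client
    missing = [
        name
        for name, value in (("TELEGRAM_API_ID", api_id), ("TELEGRAM_API_HASH", api_hash))
        if not value
    ]
    if missing:
        raise RuntimeError(f"Missing in environment or .env: {', '.join(missing)}")
    try:
        # api_id is issued as a number; convert it from the string env value
        return int(api_id), api_hash
    except ValueError:
        raise RuntimeError("TELEGRAM_API_ID must be an integer") from None
```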


In [None]:
%pip install -r requirements_telegram.txt




Note: you may need to restart the kernel to use updated packages.


In [None]:
import os
from dotenv import load_dotenv
from telegram_ingestion import TelegramChannelIngestion
from telegram_rag_integration import TelegramRAGIntegration
import asyncio

load_dotenv()


True

## Step 1: Download Messages from Telegram Channels

Specify the channels to download from, then fetch their recent messages.


In [None]:
# Channel usernames to download messages from
channels = ["guardian", "bloomberg"]

async def download_messages():
    ingestion = TelegramChannelIngestion(
        api_id=os.getenv("TELEGRAM_API_ID"),
        api_hash=os.getenv("TELEGRAM_API_HASH")
    )
    
    await ingestion.start()
    try:
        messages = await ingestion.process_channels(
            channels,
            limit_per_channel=100,  # Maximum number of messages to fetch per channel
            since_hours=24  # Only fetch messages from the last 24 hours
        )
        print(f"Downloaded {len(messages)} messages from {len(channels)} channels")
    finally:
        await ingestion.stop()

await download_messages()


Downloaded 200 messages from 2 channels
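The result, 200 messages from 2 channels, equals 2 × 100, which suggests that the `limit_per_channel=100` cap, rather than the 24-hour window, bounded this download. The notebook doesn't show how `TelegramChannelIngestion` applies the two limits internally. The sketch below shows one plausible reading, assuming each message is a dict with a `channel` name and a timezone-aware `date`. `select_recent` is a hypothetical helper, not part of `telegram_ingestion`.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def select_recent(messages, limit_per_channel=100, since_hours=24, now=None):
    """Hypothetical helper: keep at most `limit_per_channel` messages per
    channel, newest first, and drop anything older than `since_hours`.

    Assumes each message is a dict with a "channel" str and a
    timezone-aware "date" datetime.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=since_hours)
    per_channel = defaultdict(list)
    # Newest first, so the per-channel cap keeps the most recent messages
    for msg in sorted(messages, key=lambda m: m["date"], reverse=True):
        if msg["date"] < cutoff:
            continue  # outside the time window
        bucket = per_channel[msg["channel"]]
        if len(bucket) < limit_per_channel:
            bucket.append(msg)
    return [msg for bucket in per_channel.values() for msg in bucket]
```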


## Step 2: Process Messages into Vector Store

Process the downloaded messages and add them to the RAG system's vector store.

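`TelegramRAGIntegration` is configured with `chunk_size=500` and `chunk_overlap=50`. Whether these values count characters or tokens depends on the text splitter it uses internally, which isn't shown here. Assuming characters, the following is a minimal sketch of overlapping fixed-size chunking. Consecutive chunks share `chunk_overlap` characters, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap. `chunk_text` is a hypothetical helper, not the library's splitter.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Hypothetical helper: split text into overlapping character windows.

    Each window starts `chunk_size - chunk_overlap` characters after the
    previous one, so neighbouring chunks share `chunk_overlap` characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the values used here, a 1,200-character message would yield chunks of 500, 500 and 300 characters starting at offsets 0, 450 and 900.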

In [None]:
rag = TelegramRAGIntegration(
    embedding_model_name="BAAI/bge-small-en-v1.5",  # Embedding model; can be changed
    vector_store_path="telegram_vector_store",
    chunk_size=500,
    chunk_overlap=50
)

rag.process_telegram_data_dir()


With the vector store populated, you can also have a local LLM answer questions. The next cell loads an OpenVINO model and, for each question, retrieves the `k` most relevant messages to answer from.


In [None]:
from ov_langchain_helper import OpenVINOLLM

# Initialize the LLM from a local OpenVINO model directory
model_id = "qwen2.5-3b-instruct/INT4_compressed_weights"  # You can change this to any supported model
llm = OpenVINOLLM.from_model_path(
    model_path=model_id,
    device="CPU"
)

# Generation settings, set once and shared by all questions below
llm.config.temperature = 0.7
llm.config.top_p = 0.9
llm.config.top_k = 50
llm.config.repetition_penalty = 1.1

# Example questions
questions = [
    "What are the main topics discussed in the Bloomberg channel?",
    "What are the latest updates from The Guardian?",
    "Are there any discussions about technology or AI?"
]

for question in questions:
    print(f"\nQuestion: {question}")
    
    answer = rag.answer_question(
        question=question,
        llm=llm,
        k=5  # Number of relevant messages to retrieve
    )
    print(f"Answer: {answer}\n")
    print("-" * 80)


In [None]:
# Example channel-specific questions: retrieval is restricted to one channel
channel_questions = [
    ("bloomberg", "What are the latest economic updates?"),
    ("guardian", "What are the main political stories?")
]

for channel, question in channel_questions:
    print(f"\nChannel: {channel}")
    print(f"Question: {question}")
    
    # Reuses the generation settings configured in the previous cell
    answer = rag.answer_question(
        question=question,
        llm=llm,
        k=5,
        filter_dict={"channel": channel}
    )
    print(f"Answer: {answer}\n")
    print("-" * 80)
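`rag.answer_question(question=..., llm=..., k=5)` presumably retrieves the `k` most relevant messages and passes them to the LLM as context. The prompt it actually builds is internal to `telegram_rag_integration` and isn't shown here. The following is a minimal sketch of one common way to assemble such a prompt, where each retrieved message is tagged with its channel and date so the model can cite sources. `Doc` and `build_rag_prompt` are hypothetical stand-ins, not the module's API.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_rag_prompt(question: str, docs: list[Doc]) -> str:
    """Hypothetical helper: ground the question in the retrieved messages."""
    # Prefix every message with its channel and date so answers can be traced
    context = "\n\n".join(
        f"[{d.metadata.get('channel', '?')} | {d.metadata.get('date', '?')}]\n{d.page_content}"
        for d in docs
    )
    return (
        "Answer the question using only the Telegram messages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Messages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```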


## Step 3: Query the Processed Messages
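`rag.query_messages(query, k=k)` returns the `k` stored chunks closest to the query, presumably by embedding the query with the same model used for indexing (`BAAI/bge-small-en-v1.5` above). The vector store's actual index and distance metric aren't shown. As an illustration only, the following is a brute-force cosine-similarity top-`k` search over toy vectors. Real stores use approximate indexes to avoid scoring every vector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 5) -> list[int]:
    """Return indices of the k document vectors most similar to query_vec."""
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scores.sort(key=lambda s: s[0], reverse=True)  # highest similarity first
    return [i for _, i in scores[:k]]
```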

In [None]:
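def query_messages(query: str, k: int = 5):
    """Print the top-k messages matching `query` with their channel, date and a 200-character preview."""
    results = rag.query_messages(query, k=k)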
    
    print(f"Query: {query}\n")
    for i, doc in enumerate(results, 1):
        print(f"Result {i}:")
        print(f"Channel: {doc.metadata['channel']}")
        print(f"Date: {doc.metadata['date']}")
        print(f"Content: {doc.page_content[:200]}...\n")

query_messages("What are the latest announcements?")
query_messages("Any updates about new features?")


Query: What are the latest announcements?

Result 1:
Channel: bloomberg
Date: 2025-05-04T07:57:30+00:00
Content: 🎙 Trump wasn't on the ballot in Australia and Singapore elections, but his tariffs and policies loomed large over the results.

Bloomberg reporters take your questions on what's next - tune in on Mond...

Result 2:
Channel: bloomberg
Date: 2025-05-04T07:57:30+00:00
Content: 🎙 Trump wasn't on the ballot in Australia and Singapore elections, but his tariffs and policies loomed large over the results.

Bloomberg reporters take your questions on what's next - tune in on Mond...

Result 3:
Channel: bloomberg
Date: 2025-05-04T07:57:30+00:00
Content: 🎙 Trump wasn't on the ballot in Australia and Singapore elections, but his tariffs and policies loomed large over the results.

Bloomberg reporters take your questions on what's next - tune in on Mond...

Result 4:
Channel: bloomberg
Date: 2025-05-04T07:57:30+00:00
Content: 🎙 Trump wasn't on the ballot in Australia and Singapore electi
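All four results above are the same Bloomberg message. One likely cause is that identical chunks were stored more than once, for example if `rag.process_telegram_data_dir()` was run repeatedly over the same data directory. The notebook doesn't show whether `TelegramRAGIntegration` skips already-indexed messages. Until the store is rebuilt, a minimal workaround is to request more results than needed and drop repeats on the client side. `dedupe` is a hypothetical helper that treats two documents as identical when channel, date and text all match.

```python
from types import SimpleNamespace


def dedupe(docs):
    """Hypothetical helper: drop documents whose (channel, date, text) repeats,
    keeping the first occurrence and the original ranking order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.metadata.get("channel"), doc.metadata.get("date"), doc.page_content)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# Usage idea: over-fetch, then trim to the number of results you want
# results = dedupe(rag.query_messages("What are the latest announcements?", k=20))[:5]
```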

## Step 4: Filter by Channel

Results can be restricted to a single channel by passing a metadata filter:


In [None]:
specific_channel = "bloomberg"
results = rag.query_messages(
    "What are the latest updates?",
    k=5,
    filter_dict={"channel": specific_channel}
)

print(f"Results from {specific_channel}:")
for doc in results:
    print(f"\nDate: {doc.metadata['date']}")
    print(f"Content: {doc.page_content[:200]}...")


Results from bloomberg:

Date: 2025-04-30T04:08:47+00:00
Content: 🎙 LIVE NOW: Can Australia's next government fix its economy?

Ahead of the country's federal election on Saturday, Bloomberg reporters are taking your questions on the main parties' plans in a Live Q&...

Date: 2025-04-30T04:08:47+00:00
Content: 🎙 LIVE NOW: Can Australia's next government fix its economy?

Ahead of the country's federal election on Saturday, Bloomberg reporters are taking your questions on the main parties' plans in a Live Q&...

Date: 2025-04-30T04:08:47+00:00
Content: 🎙 LIVE NOW: Can Australia's next government fix its economy?

Ahead of the country's federal election on Saturday, Bloomberg reporters are taking your questions on the main parties' plans in a Live Q&...

Date: 2025-04-30T04:08:47+00:00
Content: 🎙 LIVE NOW: Can Australia's next government fix its economy?

Ahead of the country's federal election on Saturday, Bloomberg reporters are taking your questions on the main parties' plans in a Liv
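`filter_dict={"channel": specific_channel}` restricts results by metadata. Vector stores differ in how they apply such filters: some filter before the similarity search, while others filter afterwards and can return fewer than `k` hits. The backend behind `TelegramRAGIntegration` isn't shown here. The following is a minimal sketch that assumes simple exact-match semantics, where every key in the filter must be present in a document's metadata with an equal value. `matches` and `filter_docs` are hypothetical helpers operating on plain dicts.

```python
def matches(metadata: dict, filter_dict: dict) -> bool:
    """True if every filter key is present in metadata with an equal value.

    An empty filter matches everything.
    """
    return all(key in metadata and metadata[key] == value for key, value in filter_dict.items())


def filter_docs(docs: list[dict], filter_dict: dict) -> list[dict]:
    """Keep only documents (dicts with a "metadata" dict) that match the filter."""
    return [d for d in docs if matches(d["metadata"], filter_dict)]
```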