# Telegram Analyzer Notebook

This notebook demonstrates how to use the Telegram Analyzer package to process and analyze Telegram chat messages.

## Overview

The Telegram Analyzer lets you:

1. Parse Telegram chat exports in JSON format
2. Load messages into ChromaDB for semantic search
3. Query the database for messages relevant to a question
4. Generate answers with LLMs served by Ollama

This notebook will walk you through each step of the process.


## Setup

First, let's import the necessary modules from the Telegram Analyzer package.


In [None]:
# Import required modules
import os

# Import Telegram Analyzer modules
from telegram_analyzer import config
from telegram_analyzer.data_processing import TelegramDataProcessor
from telegram_analyzer.database import ChromaDBManager
from telegram_analyzer.logging import setup_logging
from telegram_analyzer.query import QueryProcessor, answer_question

# Setup logging
logger = setup_logging(log_level="INFO")
logger.info("Telegram Analyzer notebook initialized")


## Configuration

Let's examine the default configuration settings for the Telegram Analyzer.


In [None]:
# Display configuration settings
print(f"ChromaDB Settings:\n{config.CHROMADB_SETTINGS}\n")
print(f"Collection Name: {config.COLLECTION_NAME}\n")
print(f"Sentence Model:\n{config.SENTENCE_MODEL}\n")
print(f"Query Top K: {config.QUERY_TOP_K}\n")
print(f"Ollama Model:\n{config.OLLAMA_MODEL}\n")


## 1. Loading Telegram Data

The first step is to load and process the Telegram chat data from a JSON export file. 

### How to export Telegram chat data

1. Open Telegram Desktop
2. Select the chat you want to export
3. Click on the three dots in the top-right corner
4. Select "Export chat history"
5. Choose "Machine-readable JSON" as the format
6. Click "Export"
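
For orientation, a single-chat export from Telegram Desktop is usually a `result.json` file whose top-level object holds a `messages` array. Each entry carries fields such as `from`, `date`, and `text`, and `text` can be either a plain string or a list that mixes strings with formatted fragments. The sketch below uses only the standard library to show one way to flatten that field. The helper name `flatten_text` and the sample data are illustrative and are not part of the Telegram Analyzer package, whose `TelegramDataProcessor` may normalize messages differently.

```python
import json

# Illustrative sample shaped like a Telegram Desktop chat export (not real data).
sample_export = json.loads("""
{
  "name": "Demo chat",
  "type": "personal_chat",
  "messages": [
    {"id": 1, "type": "message", "date": "2023-01-01T12:34:56",
     "from": "Alice", "text": "Hello!"},
    {"id": 2, "type": "message", "date": "2023-01-01T12:35:10",
     "from": "Bob",
     "text": ["See ", {"type": "link", "text": "https://example.com"}, " for details"]},
    {"id": 3, "type": "service", "date": "2023-01-01T12:36:00",
     "action": "pin_message", "text": ""}
  ]
}
""")


def flatten_text(text):
    """Return message text as one string, whether it is a str or a list of parts."""
    if isinstance(text, str):
        return text
    parts = []
    for part in text:
        # List items are either plain strings or dicts with a "text" key.
        parts.append(part if isinstance(part, str) else part.get("text", ""))
    return "".join(parts)


# Keep only regular messages that contain some text.
flat_messages = [
    {"from": m.get("from"), "date": m["date"], "text": flatten_text(m["text"])}
    for m in sample_export["messages"]
    if m.get("type") == "message" and flatten_text(m["text"])
]

for m in flat_messages:
    print(m)
```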

Now, let's load the data from the exported JSON file.


In [None]:
# Set the path to your Telegram JSON export file
# Replace this with the path to your actual file
json_file_path = "path/to/your/telegram_export.json"

# Check if the file exists
if not os.path.exists(json_file_path):
    print(f"File not found: {json_file_path}")
    print("Please update the path to your Telegram JSON export file.")
else:
    # Load messages using the TelegramDataProcessor
    processor = TelegramDataProcessor(json_file_path)
    messages = processor.load_messages()

    print(f"Loaded {len(messages)} messages from {json_file_path}")

    # Display the first few messages
    print("\nSample messages:")
    for i, msg in enumerate(messages[:5]):
        print(f"\nMessage {i + 1}:")
        print(f"From: {msg['from']}")
        print(f"Date: {msg['date']}")
        print(f"Text: {msg['text'][:100]}..." if len(msg['text']) > 100 else f"Text: {msg['text']}")


## 2. Loading Data into ChromaDB

Now that the messages are loaded, let's store them in ChromaDB for semantic search. Each message is embedded with the configured sentence model (`config.SENTENCE_MODEL`), and ChromaDB stores the resulting vectors so that we can find semantically similar messages later.
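
The loading call in the next cell takes a `batch_size` argument. As a rough illustration of what such an argument usually means, the sketch below splits a list into fixed-size chunks. The helper `iter_batches` is hypothetical; how `ChromaDBManager.load_messages` batches internally is not shown in this notebook.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 2500 placeholder messages split into batches of 1000.
demo_messages = [f"message {i}" for i in range(2500)]
batch_sizes = [len(batch) for batch in iter_batches(demo_messages, 1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

Smaller batches keep peak memory lower while embedding, at the cost of more round trips to the database.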


In [None]:
# Initialize the ChromaDB manager
db_manager = ChromaDBManager(
    collection_name="notebook_demo",  # Use a separate collection for this notebook
    model_name=config.SENTENCE_MODEL["name"],
    device=config.SENTENCE_MODEL["device"]
)

# Check if we have messages to load
if 'messages' in locals() and messages:
    # Load messages into ChromaDB
    # Note: This may take some time depending on the number of messages
    try:
        count = db_manager.load_messages(
            messages=messages,
            batch_size=1000,  # Smaller batch size for demonstration
            reset_collection=True  # Reset the collection before loading
        )
        print(f"Successfully loaded {count} messages into ChromaDB collection 'notebook_demo'")
    except Exception as e:
        print(f"Error loading messages into ChromaDB: {e}")
else:
    print("No messages to load. Please run the previous cell with a valid JSON file path.")


## 3. Checking the Database

Let's check the status of our ChromaDB collection to ensure the data was loaded correctly.


In [None]:
# Get information about the collection
try:
    info = db_manager.get_collection_info()

    print(f"Collection: {info['collection_name']}")
    print(f"Document count: {info['document_count']}")
    print(f"Persist directory: {info['persist_directory']}")
    print(f"Directory size: {info['directory_size_gb']:.2f} GB")
    print(f"Model: {info['model_name']}")
    print(f"Device: {info['device']}")
except Exception as e:
    print(f"Error getting collection info: {e}")


## 4. Querying the Database

Now that we have loaded the data into ChromaDB, we can query it to find relevant messages based on semantic similarity.
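
Semantic similarity here means comparing embedding vectors rather than matching keywords. The toy example below ranks hand-made 3-dimensional vectors by cosine similarity to show the idea behind a `top_k` query. The vectors and the `top_k_by_cosine` helper are made up for illustration, and the distance metric ChromaDB actually uses depends on how the collection was configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec, docs, k):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings": the first and third messages point in a similar direction.
toy_docs = [
    ("Let's meet on Friday", [0.9, 0.1, 0.0]),
    ("The model overfits on small data", [0.0, 0.2, 0.9]),
    ("Is the meeting still on?", [0.8, 0.3, 0.1]),
]
toy_query = [1.0, 0.2, 0.0]

top_hits = top_k_by_cosine(toy_query, toy_docs, k=2)
for score, text in top_hits:
    print(f"{score:.3f}  {text}")
```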


In [None]:
# Define a query
query_text = "What are the main topics discussed in this chat?"

# Query the database
try:
    relevant_messages = db_manager.query(
        query_text=query_text,
        top_k=10  # Get the top 10 most relevant messages
    )

    print(f"Found {len(relevant_messages)} relevant messages for query: '{query_text}'\n")

    # Display the relevant messages
    for i, msg in enumerate(relevant_messages):
        print(f"\nRelevant Message {i + 1}:")
        print(f"From: {msg['metadata']['from']}")
        print(f"Date: {msg['metadata']['date']}")
        print(f"Text: {msg['text'][:150]}..." if len(msg['text']) > 150 else f"Text: {msg['text']}")
except Exception as e:
    print(f"Error querying the database: {e}")


## 5. Generating Answers with LLM

Next, let's use the `QueryProcessor` to answer questions about the chat with an LLM served by Ollama.
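
The general pattern is retrieval-augmented generation: retrieve the most relevant messages, place them in the prompt as context, and ask the model to answer from that context. The sketch below shows one plausible way to assemble such a prompt. The function `build_prompt`, its wording, and the `max_chars` budget are assumptions for illustration; the prompt that `QueryProcessor` actually builds may look different.

```python
def build_prompt(question, messages, max_chars=2000):
    """Build a prompt from retrieved messages, stopping once max_chars of context is used."""
    lines = []
    used = 0
    for msg in messages:
        line = f"[{msg['date']}] {msg['from']}: {msg['text']}"
        if used + len(line) > max_chars:
            break  # Context budget exhausted; drop the remaining messages.
        lines.append(line)
        used += len(line)
    context = "\n".join(lines)
    return (
        "Answer the question using only the chat messages below.\n\n"
        f"Messages:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up retrieved messages in the same shape as the loaded ones.
retrieved = [
    {"from": "Alice", "date": "2023-01-01T12:34:56", "text": "Meeting is on Friday at 10."},
    {"from": "Bob", "date": "2023-01-01T12:40:00", "text": "I'll bring the slides."},
]

prompt = build_prompt("When is the meeting?", retrieved)
print(prompt)
```

A character budget like this is a crude stand-in for a token limit, but it shows why a very large `top_k` does not automatically give the model more usable context.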

**Note**: This requires Ollama to be installed and running locally, with the model named in `config.OLLAMA_MODEL` already pulled.
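
One quick way to check both conditions is to ask the local Ollama server which models it has. The sketch below assumes Ollama's default address `http://localhost:11434` and its `/api/tags` endpoint, which lists locally available models. The helper name `ollama_has_model` is hypothetical, and the model name in the usage line is a placeholder; adjust the URL if your server listens elsewhere.

```python
import http.client
import json
import urllib.request


def ollama_has_model(model_name, base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at base_url and lists model_name."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Server not running, unreachable, or the response was not valid JSON.
        return False
    if not isinstance(payload, dict):
        return False
    names = [m.get("name", "") for m in payload.get("models", [])]
    # Tags look like "name:tag"; accept an exact match or a match on the base name.
    return any(n == model_name or n.split(":")[0] == model_name for n in names)


# Placeholder model name; replace it with the model you intend to use.
print(ollama_has_model("your-model-name"))
```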


In [None]:
# Initialize the QueryProcessor
query_processor = QueryProcessor(
    collection_name="notebook_demo",  # Use the same collection we created earlier
    model_name=config.OLLAMA_MODEL["name"],
    model_options=config.OLLAMA_MODEL["options"]
)

# Define a question
question = "What are the main topics discussed in this chat?"

# Generate an answer
try:
    result = query_processor.answer_question(
        question=question,
        top_k=1000  # Retrieve up to 1000 messages as context; lower this if the prompt exceeds the model's context window
    )

    print(f"Question: {result['question']}\n")
    print(f"Answer: {result['answer']}\n")
    print(f"Processing time: {result['metadata']['processing_time']:.2f} seconds")
    print(f"Relevant messages used: {result['metadata']['relevant_messages_count']}")
    print(f"Model used: {result['metadata']['model']}")
except Exception as e:
    print(f"Error generating answer: {e}")


## 6. Ask Multiple Questions

Let's ask a few more questions to demonstrate the system's capabilities.


In [None]:
# Define a list of questions
questions = [
    "Who are the most active participants in this chat?",
    "What was discussed about machine learning?",
    "Are there any important dates or events mentioned?",
    "What are the sentiments expressed in this conversation?"
]

# Process each question
for question in questions:
    try:
        print(f"\n{'=' * 80}\n")
        print(f"Question: {question}\n")

        # Generate answer
        answer = answer_question(
            question=question,
            collection_name="notebook_demo",
            model_name=config.OLLAMA_MODEL["name"],
            top_k=1000
        )

        print(f"Answer: {answer}")
        print(f"\n{'=' * 80}")
    except Exception as e:
        print(f"Error processing question '{question}': {e}")


## 7. Visualizing Message Activity

Let's create a simple visualization of message activity over time.
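
Before plotting, it can help to see the aggregation on its own. The sketch below counts messages per day and per sender with `collections.Counter`, using a few made-up messages in the same shape as those loaded earlier (`from`, `date`, `text`). It mirrors what the pandas `groupby` and `value_counts` calls in the next cell compute.

```python
from collections import Counter
from datetime import datetime

# Made-up messages; the last one has a timestamp that cannot be parsed.
sample_messages = [
    {"from": "Alice", "date": "2023-01-01T12:34:56", "text": "Hello"},
    {"from": "Bob", "date": "2023-01-01T12:40:00", "text": "Hi"},
    {"from": "Alice", "date": "2023-01-02T09:00:00", "text": "Morning"},
    {"from": "Alice", "date": "not-a-date", "text": "Broken timestamp"},
]

per_day = Counter()
per_sender = Counter()
for msg in sample_messages:
    try:
        day = datetime.fromisoformat(msg["date"]).date()
    except ValueError:
        continue  # Skip messages whose timestamp is not ISO 8601.
    per_day[day] += 1
    per_sender[msg["from"]] += 1

for day, count in sorted(per_day.items()):
    print(day.isoformat(), count)
print(per_sender.most_common(1))
```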


In [1]:
# Import visualization libraries
try:
    import pandas as pd
    import matplotlib.pyplot as plt
    from datetime import datetime
    import re

    # Check if we have messages
    if 'messages' in locals() and messages:
        # Extract dates from messages
        dates = []
        senders = []

        for msg in messages:
            # Extract date - assuming format like "2023-01-01T12:34:56"
            date_str = msg['date']
            if date_str and re.match(r'\d{4}-\d{2}-\d{2}', date_str):
                try:
                    # Extract just the date part
                    date = datetime.fromisoformat(date_str.split('T')[0])
                    dates.append(date)
                    senders.append(msg['from'])
                except (ValueError, IndexError):
                    pass

        if dates:
            # Create a DataFrame
            df = pd.DataFrame({'date': dates, 'sender': senders})

            # Group by date and count messages
            daily_counts = df.groupby(df['date'].dt.date).size()

            # Plot
            plt.figure(figsize=(12, 6))
            daily_counts.plot(kind='bar')
            plt.title('Message Activity by Date')
            plt.xlabel('Date')
            plt.ylabel('Number of Messages')
            plt.xticks(rotation=45)
            plt.tight_layout()
            plt.show()

            # Plot message count by sender
            plt.figure(figsize=(12, 6))
            sender_counts = df['sender'].value_counts().head(10)  # Top 10 senders
            sender_counts.plot(kind='bar')
            plt.title('Top 10 Most Active Participants')
            plt.xlabel('Sender')
            plt.ylabel('Number of Messages')
            plt.xticks(rotation=45)
            plt.tight_layout()
            plt.show()
        else:
            print("No valid dates found in the messages.")
    else:
        print("No messages available for visualization. Please run the data loading cell first.")
except ImportError:
    print("Visualization requires pandas and matplotlib. Install them with: pip install pandas matplotlib")


No messages available for visualization. Please run the data loading cell first.


## Conclusion

In this notebook, we've demonstrated how to use the Telegram Analyzer package to:

1. Load and process Telegram chat data from a JSON export
2. Store the messages in ChromaDB for semantic search
3. Query the database to find relevant messages
4. Generate answers to questions about the chat using an LLM
5. Visualize message activity

Together, these steps let you search a Telegram conversation by meaning rather than by keyword and get LLM-generated answers grounded in the retrieved messages.

### Next Steps

- Try with your own Telegram chat exports
- Experiment with different LLM models in Ollama
- Develop more advanced visualizations and analyses
- Integrate with other NLP tools for sentiment analysis, topic modeling, etc.