REFLEX: Log Summary Evaluation Framework

A modular Python framework for evaluating log summaries using Large Language Models (LLMs) and semantic similarity metrics.

Features

  • Multiple LLM Providers: Support for OpenAI GPT models and Hugging Face transformers (BART, Flan-T5, etc.)
  • Semantic Similarity: Uses sentence transformers for accurate similarity scoring
  • Additional Metrics: Optional support for ROUGE, BERTScore, TF-IDF, and KL divergence
  • Modular Design: Clean, extensible architecture following Python best practices
  • Easy to Use: Simple API for both single evaluations and batch processing
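
Under the hood, the similarity score is a cosine similarity between embedding vectors. A minimal sketch of that computation in plain Python — the toy three-dimensional vectors here stand in for real sentence-transformer embeddings, which have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for a user summary and an LLM summary;
# in REFLEX these come from a sentence-transformer model.
emb_user = [0.2, 0.8, 0.1]
emb_llm = [0.25, 0.75, 0.05]
score = cosine_similarity(emb_user, emb_llm)  # close to 1.0 for similar texts
```

Scores near 1.0 indicate semantically similar texts; orthogonal embeddings score near 0.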

Installation

Basic Installation

pip install -r requirements.txt

Or install as a package:

pip install -e .

Optional Dependencies

For OpenAI support:

pip install openai

For additional metrics (ROUGE, BERTScore, etc.):

pip install scikit-learn rouge-score bert-score nltk

Or install everything:

pip install -e ".[all]"

Quick Start

Basic Usage

from reflex import LogSummaryEvaluator, HuggingFaceProvider

# Initialize provider and evaluator
llm_provider = HuggingFaceProvider(model_name="facebook/bart-large-cnn")
evaluator = LogSummaryEvaluator(llm_provider)

# Evaluate a log summary
log = "2025-06-30 12:45:03 ERROR: AuthService failed to validate token. JWT expired."
user_summary = "JWT expired during token check in AuthService"

result = evaluator.evaluate(log, user_summary)
print(f"Similarity Score: {result['similarity_score']}")
print(f"LLM Summary: {result['llm_summary']}")

Using OpenAI

from reflex import LogSummaryEvaluator, OpenAIProvider
import os

# Read the API key from the OPENAI_API_KEY environment variable
# (or pass the key string directly)
llm_provider = OpenAIProvider(api_key=os.getenv("OPENAI_API_KEY"), model="gpt-4")
evaluator = LogSummaryEvaluator(llm_provider)

# log and user_summary as defined in the Basic Usage example above
result = evaluator.evaluate(log, user_summary)

Batch Evaluation

from reflex import LogSummaryEvaluator, HuggingFaceProvider
from reflex.utils import parse_log_summary_pairs
import csv

llm_provider = HuggingFaceProvider(model_name="facebook/bart-large-cnn")
evaluator = LogSummaryEvaluator(llm_provider)

# Parse log-summary pairs from file
pairs = parse_log_summary_pairs("data/logs.txt")

# Evaluate and save results
with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["llm_summary", "user_summary", "similarity_score"],
        extrasaction="ignore",  # tolerate any extra keys in the result dict
    )
    writer.writeheader()
    
    for log, user_summary in pairs:
        result = evaluator.evaluate(log, user_summary)
        writer.writerow(result)

Project Structure

Reflex/
├── reflex/                 # Main package
│   ├── core/              # Core evaluation components
│   ├── providers/         # LLM provider implementations
│   ├── metrics/           # Additional evaluation metrics
│   └── utils/             # Utility functions
├── examples/              # Example scripts
├── scripts/               # Utility scripts
├── tests/                 # Unit tests
├── data/                  # Sample data files
├── docs/                  # Documentation
├── requirements.txt       # Python dependencies
├── setup.py              # Package setup
└── README.md             # This file

Input File Format

The framework expects log-summary pairs in the following format:

#1#
<log text here>
#summary:#
<summary text here>
#2#
<log text here>
#summary:#
<summary text here>
...
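
The bundled reflex.utils.parse_log_summary_pairs reads this format for you. As an illustration of how the markers drive the parsing, here is a simplified stand-in (parse_pairs is a hypothetical name, not part of the package):

```python
import re

def parse_pairs(text):
    """Split #N# ... #summary:# ... records into (log, summary) tuples.

    Simplified sketch; use reflex.utils.parse_log_summary_pairs in practice.
    """
    pairs = []
    # Each record begins with a numbered #N# marker.
    for block in re.split(r"#\d+#", text):
        if "#summary:#" not in block:
            continue
        # The #summary:# divider separates log text from summary text.
        log, summary = block.split("#summary:#", 1)
        pairs.append((log.strip(), summary.strip()))
    return pairs

sample = """#1#
2025-06-30 12:45:03 ERROR: AuthService failed to validate token. JWT expired.
#summary:#
JWT expired during token check in AuthService
"""
pairs = parse_pairs(sample)  # one (log, summary) tuple
```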

Examples

See the examples/ directory for more detailed examples:

  • basic_usage.py: Simple evaluation examples
  • batch_evaluation.py: Batch processing from files
  • metrics_comparison.py: Comparing different metrics

API Reference

Core Classes

LogSummaryEvaluator

Main evaluator class: it generates an LLM summary of the log and scores the user's summary against it via embedding similarity.

evaluator = LogSummaryEvaluator(llm_provider, embedding_model="all-MiniLM-L6-v2")
result = evaluator.evaluate(log_text, user_summary)

OpenAIProvider

Provider for OpenAI models.

provider = OpenAIProvider(api_key="your-key", model="gpt-4")

HuggingFaceProvider

Provider for Hugging Face transformer models.

provider = HuggingFaceProvider(model_name="facebook/bart-large-cnn")

Additional Metrics

The SimilarityMetrics class provides additional evaluation metrics:

from reflex.metrics import SimilarityMetrics

# TF-IDF Cosine Similarity
tfidf_scores = SimilarityMetrics.tfidf_cosine(logs, summaries)

# ROUGE Scores
rouge_scores = SimilarityMetrics.rouge_scores(logs, summaries)

# BERTScore
bert_scores = SimilarityMetrics.bert_scores(logs, summaries)

# KL Divergence
kl_scores = SimilarityMetrics.kl_divergence(logs, summaries)

Configuration

Environment Variables

  • OPENAI_API_KEY: OpenAI API key (optional, can be passed directly to provider)

Model Selection

Hugging Face Models:

  • facebook/bart-large-cnn: Good for summarization
  • google/flan-t5-xl: General purpose
  • google/pegasus-xsum: Specialized for summarization

OpenAI Models:

  • gpt-4: Best quality (requires API key)
  • gpt-3.5-turbo: Faster and cheaper alternative

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details

Citation

If you use REFLEX in your research, please cite:

@INPROCEEDINGS{11405982,
  author={Mudgal, Priyanka},
  booktitle={2025 1st International Conference on Emerging Trends in Information Systems and Informatics (ICETISI)},
  title={REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment},
  year={2025},
  pages={1-7},
  keywords={Measurement;Training;Feedback loop;Protocols;Large language models;Perturbation methods;Semantics;Market research;Real-time systems;Informatics;LLM-as-a-judge;Log summarization;Log summary score;Log analysis},
  doi={10.1109/ICETISI67983.2025.11405982}}

Acknowledgments
