LogGen is an intelligent solution for generating logging statements in Java programs. Built using advanced Large Language Models (LLMs), LogGen identifies optimal logging positions and generates high-quality, contextually relevant log messages without altering the underlying code structure.
By improving logging practices, LogGen enhances software maintainability, observability, and developer productivity, addressing challenges like insufficient or overwhelming logging.
- Automated Logging: Predicts "what", "where", and "how" to log efficiently.
- Context-Aware Statements: Generates logs relevant to runtime data, improving debugging and system monitoring.
- Advanced LLM Integration: Leverages state-of-the-art models like CodeT5 and LLaMA.
- High Performance: Outperforms existing solutions with superior BLEU and ROUGE scores.
LogGen employs a two-stage pipeline:
-
Stage 1: Logging Position Prediction
- Uses an LLM to identify optimal positions for logging.
- Annotates code with
<FILL_ME>
placeholders.
-
Stage 2: Logging Statement Generation
- Replaces placeholders with semantically relevant log statements.
- Ensures clarity, brevity, and contextual accuracy.
LogGen achieves state-of-the-art performance using:
- Stage 1: Accuracy for predicting logging positions.
- Stage 2: BLEU & ROUGE scores to evaluate log quality and relevance.
Metric | Score |
---|---|
BLEU | 0.6464 |
ROUGE-1 | 0.8859 |
ROUGE-L | 0.8262 |
- Python 3.8+
- Java Development Kit (JDK) 8+
- Dependencies listed in
requirement.txt
-
Clone the repository:
git clone https://github.com/yourusername/LogGen-Intelligent_Logging_Statement_Generation_for_Java_Program_Files.git cd LogGen-Intelligent_Logging_Statement_Generation_for_Java_Program_Files
-
Install dependencies:
pip install -r requirement.txt
- Use the included dataset generation script to process Java repositories.
- Fine-tune Stage 1 and Stage 2 models using the provided training pipeline.
python3 stage1.py
python3 stage2.py
- Logs are automatically generated and inserted into the specified positions.
- LogGen achieves 36.84% accuracy in predicting file-level logging positions.
- Generated logs surpass existing methods with BLEU and ROUGE metrics significantly exceeding competitors like FastLog.
Model | BLEU | ROUGE-1 Recall | ROUGE-L Recall |
---|---|---|---|
LogGen | 0.6464 | 0.9116 | 0.8556 |
FastLog | 0.3343 | 0.5991 | 0.5953 |
- Enhanced Context Parsing: Integration with Abstract Syntax Trees (ASTs) and Code2Vec for better structural understanding.
- Multi-Language Support: Extend LogGen to support C, C++, and other programming languages.
- Prompt Optimization: Implement advanced prompt engineering techniques like Chain of Thought (CoT) for better results.
- Yogesh Hasabe - Yogesh31Hasabe
- Jay Patel - Jay Patel
- Ashaka Mehta - ashaka11
- Sagar Dama - sagar110599
This project is licensed under the MIT License. See LICENSE for details.