An LLM-powered tool for automated thematic analysis of qualitative data. Analyze PDFs, text files, or raw text to discover themes and patterns.
Built with the OpenHands Software Agent SDK.
pip install thematic-analysis
# Or from source
git clone https://github.com/neubig/thematic-analysis.git
cd thematic-analysis
pip install -e .from thematic_analysis import ThematicAnalysisPipeline
pipeline = ThematicAnalysisPipeline()
result = pipeline.run_from_directory("/path/to/pdfs")
# View discovered themes
for theme in result.themes.themes:
print(f"{theme.name}: {theme.description}")
print(f" Codes: {', '.join(theme.original_themes)}")from thematic_analysis import ThematicAnalysisPipeline
texts = [
"I feel overwhelmed by the amount of work...",
"The team collaboration has been excellent...",
"Deadlines are causing significant stress...",
]
pipeline = ThematicAnalysisPipeline()
result = pipeline.run_from_texts(texts)from thematic_analysis import ThematicAnalysisPipeline
pipeline = ThematicAnalysisPipeline()
result = pipeline.run_from_pdf("/path/to/document.pdf")Customize the analysis with PipelineConfig:
from thematic_analysis import ThematicAnalysisPipeline, PipelineConfig
config = PipelineConfig(
# Number of parallel coders (more = diverse perspectives)
num_coders=3,
# Number of theme developers
num_theme_coders=2,
# LLM model to use
model="anthropic/claude-sonnet-4-20250514",
# Processing batch size
batch_size=10,
)
pipeline = ThematicAnalysisPipeline(config=config)| Parameter | Default | Description |
|---|---|---|
num_coders |
3 | Number of independent coders. More coders = more diverse code suggestions |
num_theme_coders |
2 | Number of theme developers |
batch_size |
10 | Segments processed per batch |
execution_mode |
PARALLEL |
PARALLEL for speed, SEQUENTIAL for debugging |
from thematic_analysis.agents import CoderConfig
coder_config = CoderConfig(
model="anthropic/claude-sonnet-4-20250514",
max_codes_per_segment=5, # Max codes assigned to each text segment
temperature=0.7, # LLM temperature (higher = more creative)
)
config = PipelineConfig(coder_config=coder_config)Assign different analytical perspectives to coders for richer analysis:
from thematic_analysis import PipelineConfig
config = PipelineConfig(
num_coders=3,
coder_identities=[
"a healthcare professional focused on patient outcomes",
"an economist analyzing cost-effectiveness",
"a patient advocate prioritizing accessibility",
],
)The pipeline returns a PipelineResult with:
result = pipeline.run_from_texts(texts)
# Themes discovered
result.themes.themes # List of MergedTheme objects
# Each theme has:
# - name: str
# - description: str
# - original_themes: list[str] (the codes that were merged into this theme)
# The codebook with all codes
result.codebook
# Export to JSON
result.to_json()Set your LLM API credentials:
export LLM_API_KEY="your-api-key"
export LLM_MODEL="anthropic/claude-sonnet-4-20250514" # or other supported model- PDF (
.pdf) - Automatic text extraction - Text (
.txt) - Plain text files - Markdown (
.md) - Markdown files
The pipeline uses a multi-agent architecture:
- Coders: Multiple independent agents analyze text segments and assign codes
- Aggregator: Merges similar codes from different coders
- Reviewer: Validates and refines the codebook
- Theme Coders: Group codes into higher-level themes
- Theme Aggregator: Produces final consolidated themes
MIT License - See LICENSE for details.
This project is inspired by and based on concepts from:
Qiao, T., Walker, C., Cunningham, C., & Koh, Y. S. (2025). Thematic-LM: A LLM-based Multi-agent System for Large-scale Thematic Analysis. In Proceedings of the ACM Web Conference 2025 (WWW '25). https://doi.org/10.1145/3696410.3714595
@inproceedings{qiao2025thematiclm,
title={Thematic-LM: A LLM-based Multi-agent System for Large-scale Thematic Analysis},
author={Qiao, Tingrui and Walker, Caroline and Cunningham, Chris and Koh, Yun Sing},
booktitle={Proceedings of the ACM Web Conference 2025 (WWW '25)},
year={2025},
doi={10.1145/3696410.3714595}
}