Official code, prompts, and data for our ACL 2026 paper:
CAST: Achieving Stable LLM-based Text Analysis for Data Analytics
Jinxiang Xie*, Zihao Li*, Wei He*, Rui Ding†, Shi Han, Dongmei Zhang
ACL 2026
*Equal contribution. †Corresponding author.
Text analysis of tabular data relies on two core operations:
- Summarization — corpus-level theme extraction
- Tagging — row-level labeling
A critical limitation of using LLMs for these tasks is their inability to meet the high standards of output stability demanded by data analytics. We introduce CAST (Consistency via Algorithmic Prompting and Stable Thinking), a framework that enhances output stability by constraining the model's latent reasoning path. CAST combines:
- Algorithmic Prompting (AP) — a procedural scaffold over valid reasoning transitions.
- Thinking-before-Speaking (TbS) — explicit intermediate commitments before final generation.
To measure progress, we also introduce CAST-S and CAST-T, stability metrics for bulleted summarization and tagging, validated against human judgments. Across multiple LLM backbones, CAST consistently achieves the best stability among all baselines, improving the Stability Score by up to 16.2%, while maintaining or improving output quality.
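As a toy illustration of what "output stability" means here (this is *not* the paper's CAST-S/CAST-T definition, which is computed over bulleted summaries and tag assignments): rerun the same prompt several times and measure how much the outputs agree, e.g. via mean pairwise string similarity.

```python
from difflib import SequenceMatcher
from itertools import combinations

def pairwise_stability(outputs):
    """Mean pairwise similarity across repeated runs of the same prompt.

    Toy proxy only -- CAST-S / CAST-T are defined over structured summaries
    and tag assignments, not raw strings.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # a single run is trivially self-consistent
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

# Identical runs score 1.0; a divergent rerun pulls the score down.
print(pairwise_stability(["pricing; support", "pricing; support", "shipping delays"]))
```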
```
.
├── summarization/                        # Summarization (CAST-S) experiments
│   ├── summary_pipeline.py               # End-to-end summary generation + scoring
│   ├── llm_stability_pipeline.py         # Compare baseline / AP / TbS / CAST stability
│   ├── path_stability_pipeline.py        # Reasoning-path ablations
│   ├── distribution_analysis_pipeline.py # Output-distribution sharpness analysis
│   ├── AblationPrompt/                   # baseline / ap / tbs / cast prompts
│   ├── reasoning_path_prompt/            # Reasoning-path ablation prompts
│   ├── EvaluationPrompt/                 # Judge prompts for summary + stability
│   ├── Input/                            # Input datasets (xlsx)
│   │   ├── Summary-Input/                # Datasets for summary_pipeline.py
│   │   └── Stability-Input/              # Datasets for *_stability_pipeline.py
│   └── Output/Stability-Output/          # Reference stability scores + correlation analysis
│
├── tagging/                              # Tagging (CAST-T) experiments
│   ├── program.py                        # Tagging pipeline
│   ├── evaluation.ipynb                  # Evaluation / analysis notebook
│   └── AP.md / TbS.md / AP+TbS.md / none.md  # Prompt variants
│
├── data/                                 # Human annotations + supplementary data
│   ├── README.md                         # Full file-by-file description
│   ├── human_annotations/
│   │   ├── summarization/                # h1 / j2 / z3 stability score JSONs
│   │   └── tagging/                      # h1 / j1 / j2 annotators × 4 prompts
│   └── supplementary/                    # Additional input datasets
│
├── requirements.txt
├── .env.example                          # Copy to .env and fill in API keys
├── LICENSE                               # MIT
└── README.md
```
We recommend uv for Python environment and dependency management (it is significantly faster than pip + venv).
```
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# or via pipx / Homebrew
pipx install uv   # any platform
brew install uv   # macOS
```

```
git clone https://github.com/jxtse/CAST-text-analysis.git
cd CAST-text-analysis
uv sync                     # creates .venv and installs core dependencies
source .venv/bin/activate   # Windows: .venv\Scripts\activate
```

Need the heavy distribution-analysis extras (sentence-transformers, umap-learn)? Add the optional group:

```
uv sync --extra distribution   # or: --extra all
```

Python ≥ 3.9 is supported (uv defaults to 3.12).
Prefer plain pip?

```
python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install -e .                   # core dependencies
pip install -e ".[distribution]"   # optional: distribution-analysis extras
```

Copy the template and fill in whichever providers you intend to use:

```
cp .env.example .env
```

The pipelines read keys from environment variables (loaded via python-dotenv):
| Variable | Used by |
|---|---|
| `OPENAI_API_KEY` | summary_pipeline, path_stability, tagging |
| `OPENROUTER_API_KEY` | All summarization pipelines |
| `SiliconFlow_API_KEY` / `SILICONFLOW_API_KEY` | Stability + tagging pipelines |
| `Grok_API_KEY` / `GROK_API_KEY` | Stability + tagging pipelines |
| `Gemini_API_KEY` / `GEMINI_API_KEY` | Stability + tagging pipelines |
You only need keys for the providers you plan to call.
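Since some providers accept either capitalization, a small helper can resolve whichever spelling is set. This is a hypothetical sketch, not code from the repo (the pipelines do their own loading via python-dotenv):

```python
import os

def get_api_key(*names):
    """Return the first of the given environment variables that is set, else None."""
    for name in names:
        value = os.getenv(name)
        if value:
            return value
    return None

# Either spelling works, mirroring the table above:
key = get_api_key("SILICONFLOW_API_KEY", "SiliconFlow_API_KEY")
```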
All commands assume your current directory is the corresponding subfolder (`summarization/` or `tagging/`), since the scripts use relative paths (e.g. `Input/...`, `Output/...`).
End-to-end summary generation + LLM-judge scoring:

```
cd summarization
python summary_pipeline.py
```

Compare stability across baseline / ap / tbs / cast prompts:

```
python llm_stability_pipeline.py                      # all four
python llm_stability_pipeline.py --prompt_types cast  # subset
python llm_stability_pipeline.py --compare_only       # re-aggregate existing results
python llm_stability_pipeline.py --score_only         # rescore an existing results file
```

Reasoning-path ablations:

```
python path_stability_pipeline.py                  # default 4 paths
python path_stability_pipeline.py --extended_cast  # full 8-path study
python path_stability_pipeline.py --prompt_types perspective_prompt,domain_prompt
```

Output-distribution sharpness analysis:

```
python distribution_analysis_pipeline.py
```

```
cd tagging
# Edit the `dataset_path`, `sheet_names`, `llm_types`, and `prompt_files` lists
# at the bottom of program.py (in `async def main`) to control the run.
python program.py
```

Note: the `dataset_path` referenced in `program.py` points to the tagging dataset, which has been removed from this public release because it contained sensitive material. Replace it with your own input file using the same column layout to reproduce the pipeline.
Then open evaluation.ipynb for the post-hoc tagging analysis.
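For orientation, the run-control block at the bottom of `program.py` looks roughly like the following. All values here are placeholders, not the repo's defaults:

```python
# Hypothetical placeholder values -- edit the real lists at the bottom of
# program.py (inside `async def main`) to match your own data and providers.
dataset_path = "Input/my_tagging_data.xlsx"  # your replacement dataset
sheet_names  = ["Sheet1"]                    # worksheet(s) to tag
llm_types    = ["gpt-4o"]                    # backbone model name(s)
prompt_files = ["AP+TbS.md"]                 # prompt variant: AP / TbS / AP+TbS / none
```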
The reference outputs that back the figures and tables in the paper live under `summarization/Output/Stability-Output/`:

- `cast_stability_score_result.json` — model-judged CAST-S scores
- `human_cast_stability_score_result_anonymous.json` — anonymized human ratings
- `correlation_analysis.py` — Pearson/Spearman correlation between the two
- `stability_correlation_analysis_overall.png`, `stability_correlation_analysis_pair.png` — corresponding figures
The per-annotator raw inputs that produced the anonymized merge above are in
data/human_annotations/, together with the human tagging
annotations and supplementary multilingual datasets.
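The agreement check in `correlation_analysis.py` amounts to Pearson and Spearman correlation between model-judged and human stability scores. A stdlib-only sketch of the two coefficients (the score lists are illustrative, and the rank step does no tie correction):

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman correlation = Pearson on ranks (no tie handling here)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=vs.__getitem__)
        r = [0.0] * len(vs)
        for rank, idx in enumerate(order):
            r[idx] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))

model_scores = [0.9, 0.7, 0.8, 0.4]    # illustrative values, not the repo's data
human_scores = [0.85, 0.6, 0.75, 0.5]
```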
To regenerate the correlation figures:

```
cd summarization/Output/Stability-Output
python correlation_analysis.py
```

If CAST or CAST-S/CAST-T are useful in your work, please cite:
```bibtex
@misc{xie2026castachievingstablellmbased,
  title         = {CAST: Achieving Stable LLM-based Text Analysis for Data Analytics},
  author        = {Jinxiang Xie and Zihao Li and Wei He and Rui Ding and Shi Han and Dongmei Zhang},
  year          = {2026},
  eprint        = {2602.15861},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2602.15861}
}
```

The ACL Anthology entry will be linked here once the proceedings are published. See `paper/` for build notes on the arXiv submission.
This project is released under the MIT License.
This work was conducted in part during Jinxiang Xie's internship at Microsoft.
Correspondence: juding@microsoft.com.
Issues and pull requests are welcome.