This repository contains analysis artifacts used to evaluate and audit the results reported in:
Ramos, R. M., Brom, P. C., Souza, J. G. M., Weigang, L., Di Oliveira, V., Reis, S. A., Salm Junior, J. F., Freitas, V., Kimura, H., Cajueiro, D. O., Luiz da Silva, G. and Celestino, V. R. R.
“Collective Intelligence with Large Language Models for the Review of Public Service Descriptions on Gov.br.”
DOI: 10.5220/0013831100003985
In: Proceedings of the 21st International Conference on Web Information Systems and Technologies (WEBIST 2025), pages 301–312
ISBN: 978-989-758-772-6; ISSN: 2184-3252
Proceedings Copyright © 2025 by SCITEPRESS – Science and Technology Publications, Lda.
The paper is published under the Creative Commons license CC BY-NC-ND 4.0 (see the publication venue for the official license terms).
The analyses in this repository focus on:
- Per-document named-entity preservation as a count-based outcome: for a document with denominator `m_d` (reference entities) and preserved entities `k_d`, the preservation proportion is `Y_d = k_d / m_d`.
- Worst-case sampling guarantees for audit sizing (Wilks/order-1 style “exposure” guarantees).
- Bayesian A/B distributional comparison between two extraction/review methods (paired by document), using a binomial likelihood with a paired hierarchical logistic-normal structure.
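The paired hierarchical structure can be illustrated with a small generative simulation. All hyperparameter values below are hypothetical placeholders chosen for illustration, not the fitted posterior:

```python
import numpy as np

rng = np.random.default_rng(42)
n_docs = 300

# Hypothetical logit-scale hyperparameters for methods A and B.
mu = np.array([1.0, 1.5])
sigma = np.array([0.8, 0.8])
rho = 0.6  # within-document correlation between the two methods
cov = np.array([[sigma[0]**2, rho * sigma[0] * sigma[1]],
                [rho * sigma[0] * sigma[1], sigma[1]**2]])

# Document-level logits, paired via a bivariate normal prior.
logits = rng.multivariate_normal(mu, cov, size=n_docs)
theta = 1.0 / (1.0 + np.exp(-logits))  # inverse-logit, per method

# Binomial observation model: k_d ~ Binomial(m_d, theta_d) per method.
m = rng.integers(5, 40, size=n_docs)   # denominators m_d
k = rng.binomial(m[:, None], theta)    # shape (n_docs, 2)
y = k / m[:, None]                     # preservation proportions Y_d
```

Simulating from the prior like this is also a convenient sanity check that the pairing induces correlated `Y_d` values across methods.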
- `inference.ipynb`
  Main Python notebook. It:
  - Loads the A/B comparison sheet `compare_results/entidades_comparacao_langextract_regex.xlsx`.
  - Converts ratio columns to discrete binomial counts `k/m` (using rational approximation with bounded denominators).
  - Fits a paired hierarchical binomial + bivariate logistic-normal model.
  - Reports distribution-level estimands (mean/quantile shifts, tail-risk, superiority probability) and additional distribution-difference metrics (e.g., KS, quantile-W1).
  - Runs Wilks / worst-case sampling checks, including “math vs. empirical” verification via Monte Carlo resampling on the observed data.
- `inference_prior_sample.R`
  R script for sample-size planning based on the paper’s audit methodology. It:
  - Reads `data/entidades_extraidas.xlsx` (entity totals by document).
  - Computes per-document denominators `m_d` and exports `data/m_per_document.csv`.
  - Generates planning tables:
    - `data/sample_size_plan_direct.csv` for direct assumptions on tail probabilities `p_tail = P(Y < gamma)`.
    - `data/sample_size_plan_theta.csv` mapping an entity-level preservation rate `theta` to a document-level tail risk via `Binomial(m, theta)`.
- `data/entidades_extraidas.xlsx`
  Per-document entity counts (reference denominators). Columns:
  - `id`: document identifier (matches the text filenames in `dataset/` and the A/B sheet in `compare_results/`).
  - Entity-type counts: `institutions`, `dates`, `deadlines`, `costs`, `locations`, `urls`, `emails`, `phones`, `laws`, `ceps`, `addresses`.
- `data/m_per_document.csv`
  Single-column file `m` with `m_d = sum(entity-type counts)` per document (exported by `inference_prior_sample.R`).
- `data/total.csv`
  Single-column file `total`. In this repo, it duplicates `data/m_per_document.csv` (same values, different column name).
- `data/sample_size_plan_direct.csv`
  Sample-size planning table for the direct tail-probability approach.
- `data/sample_size_plan_theta.csv`
  Sample-size planning table for the theta-to-tail mapping approach.
- `compare_results/entidades_comparacao_langextract_regex.xlsx`
  A/B evaluation sheet with 301 paired documents. Columns:
  - `id`: document identifier.
  - `langextract`: preservation ratio for method A.
  - `REGEX`: preservation ratio for method B.
- `compare_results/abtest_posterior_predictive.png`
  Exported plot with posterior predictive comparisons from the Bayesian A/B analysis.
- `dataset/original.zip`
  ZIP archive containing original public service descriptions as plain text files under `original/` (e.g., `original/10022.txt`).
- `dataset/newtext.zip`
  ZIP archive containing revised/new versions as plain text files under `newtext/` (e.g., `newtext/10022.txt`).
  The filenames (IDs) align with `data/entidades_extraidas.xlsx` and `compare_results/entidades_comparacao_langextract_regex.xlsx`.
- `ci_stats.Rproj`
  RStudio project file (convenience for running the R script).
- `.venv/`
  A local Python virtual environment directory (not required if you manage your own environment).
- `.Rproj.user/`, `.Rhistory`
  Local RStudio/R session artifacts (not required for reproducibility; may be machine/user specific).
- `.gitignore`
  Git ignore rules.
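The `m_d` export described above is just the row sum of the entity-type columns. A minimal pandas sketch, using toy rows in place of the real `data/entidades_extraidas.xlsx` contents:

```python
import pandas as pd

# Toy rows mimicking the column layout of data/entidades_extraidas.xlsx.
df = pd.DataFrame({
    "id": ["10022", "10023"],
    "institutions": [3, 1], "dates": [2, 0], "deadlines": [1, 2],
    "costs": [0, 1], "locations": [2, 2], "urls": [1, 0],
    "emails": [0, 1], "phones": [1, 1], "laws": [2, 0],
    "ceps": [0, 0], "addresses": [1, 1],
})

# m_d = sum of all entity-type counts (every column except `id`).
m = df.drop(columns="id").sum(axis=1)
m_df = m.to_frame(name="m")  # same single-column shape as data/m_per_document.csv
```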
Open the notebook in Jupyter (or VS Code) using a Python environment that provides at least:

- `pandas`
- `numpy`
- `scipy`
- `matplotlib`
- an Excel engine (e.g., `openpyxl`)

Then run cells top-to-bottom.
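The notebook's ratio-to-count conversion (recovering integer `k/m` from a stored preservation ratio) can be sketched with the standard library; the denominator bound of 50 here is an assumption for illustration, not necessarily the notebook's actual setting:

```python
from fractions import Fraction

def ratio_to_counts(y, max_den=50):
    """Recover integer counts (k, m) from a stored ratio y,
    using rational approximation with a bounded denominator."""
    frac = Fraction(y).limit_denominator(max_den)
    return frac.numerator, frac.denominator

# e.g. a preservation ratio of 0.9375 recovers k=15, m=16
k, m = ratio_to_counts(0.9375)
```

`limit_denominator` returns the closest fraction whose denominator does not exceed the bound, which also absorbs small floating-point noise in ratios stored in the spreadsheet.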
In R (or RStudio), install dependencies and run:

```r
install.packages("readxl")
source("inference_prior_sample.R")
```

Outputs are written to `data/` (see “Repository structure” above).
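Although the planning script is written in R, the theta-to-tail mapping behind `data/sample_size_plan_theta.csv` can be illustrated in Python. The idea: if `K ~ Binomial(m, theta)` and `Y = K/m`, then `P(Y < gamma)` is a binomial CDF evaluated just below `gamma * m` (the specific numbers below are illustrative):

```python
from math import ceil, comb

def tail_prob(m, theta, gamma):
    """P(Y < gamma) for Y = K/m with K ~ Binomial(m, theta)."""
    k_max = ceil(gamma * m) - 1  # K/m < gamma  <=>  K <= k_max
    return sum(comb(m, k) * theta**k * (1 - theta)**(m - k)
               for k in range(k_max + 1))

# e.g. with m=20 entities and per-entity preservation rate 0.95,
# the chance a document falls below gamma=0.8 is small:
p = tail_prob(m=20, theta=0.95, gamma=0.8)  # ~ 0.0026
```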
- Observation model (counts, not continuous scores): preservation is treated as `k_d ~ Binomial(m_d, theta_d)` rather than as a continuous score, preserving the denominator information and uncertainty.
- Paired hierarchical A/B: document-level heterogeneity and correlation are captured by a bivariate normal prior on the logit scale.
- Distribution-level conclusions: the notebook reports not only the mean lift but also quantile shifts and tail-risk reductions, plus distribution-difference metrics that summarize how the entire distribution moves between A and B.
- Worst-case sampling (Wilks): the notebook includes checks and planning formulas for selecting audit sizes that guarantee a high probability of observing rare but relevant events under i.i.d. sampling assumptions.
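A common first-order form of this worst-case bound: if each sampled document independently falls in the tail with probability `p_tail`, then `P(at least one tail event in n draws) = 1 - (1 - p_tail)^n`, so the smallest adequate audit size is `n = ceil(log(1 - conf) / log(1 - p_tail))`. A sketch (this may differ in detail from the repository's exact planning formulas):

```python
from math import ceil, log

def audit_size(p_tail, conf=0.95):
    """Smallest n with P(at least one tail event in n i.i.d. draws) >= conf,
    where each draw falls in the tail with probability p_tail."""
    return ceil(log(1 - conf) / log(1 - p_tail))

n = audit_size(p_tail=0.05, conf=0.95)  # -> 59
```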
- Repository code and files: licensed under the terms in `LICENSE` (Apache License 2.0), unless otherwise indicated.
- Paper: published under CC BY-NC-ND 4.0 (per the conference proceedings). This repository is an evaluation/audit companion to the paper; it does not change the publication’s licensing terms.
If you use this repository as part of your work, please cite the paper (DOI: 10.5220/0013831100003985).