
CI Stats (ci_stats)

This repository contains analysis artifacts used to evaluate and audit the results reported in:

Ramos, R. M., Brom, P. C., Souza, J. G. M., Weigang, L., Di Oliveira, V., Reis, S. A., Salm Junior, J. F., Freitas, V., Kimura, H., Cajueiro, D. O., Luiz da Silva, G. and Celestino, V. R. R.
“Collective Intelligence with Large Language Models for the Review of Public Service Descriptions on Gov.br.”
DOI: 10.5220/0013831100003985
In: Proceedings of the 21st International Conference on Web Information Systems and Technologies (WEBIST 2025), pages 301–312
ISBN: 978-989-758-772-6; ISSN: 2184-3252
Proceedings Copyright © 2025 by SCITEPRESS – Science and Technology Publications, Lda.

The paper is published under the Creative Commons license CC BY-NC-ND 4.0 (see the publication venue for the official license terms).

What is evaluated here

The analyses in this repository focus on:

  • Per-document named-entity preservation as a count-based outcome: for a document with denominator m_d (reference entities) and preserved entities k_d, the preservation proportion is Y_d = k_d / m_d.
  • Worst-case sampling guarantees for audit sizing (Wilks/order-1 style “exposure” guarantees).
  • Bayesian A/B distributional comparison between two extraction/review methods (paired by document), using a binomial likelihood with a paired hierarchical logistic-normal structure.
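The worst-case sampling arithmetic behind the second bullet can be sketched as follows (an illustration of the general Wilks-style bound, not necessarily the notebook's exact code): under i.i.d. sampling, the probability of observing at least one document with Y_d < gamma in an audit of n documents is 1 - (1 - p_tail)^n, which yields a closed-form minimum audit size.

```python
import math

def wilks_audit_size(p_tail: float, alpha: float) -> int:
    """Smallest n so that an i.i.d. audit of n documents observes at least one
    tail event (Y_d < gamma) with probability >= 1 - alpha, when each document
    independently falls in the tail with probability p_tail."""
    # P(no tail event in n draws) = (1 - p_tail)^n <= alpha
    # => n >= log(alpha) / log(1 - p_tail)
    return math.ceil(math.log(alpha) / math.log(1.0 - p_tail))

# Example: a 5% tail risk, audited so the chance of missing it entirely is <= 5%.
print(wilks_audit_size(0.05, 0.05))  # 59 documents
```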

Repository structure (files and folders)

Notebooks and scripts

  • inference.ipynb
    Main Python notebook. It:

    • Loads the A/B comparison sheet compare_results/entidades_comparacao_langextract_regex.xlsx.
    • Converts ratio columns to discrete binomial counts k/m (using rational approximation with bounded denominators).
    • Fits a paired hierarchical binomial + bivariate logistic-normal model.
    • Reports distribution-level estimands (mean/quantile shifts, tail-risk, superiority probability) and additional distribution-difference metrics (e.g., KS, quantile-W1).
    • Runs Wilks / worst-case sampling checks, including “math vs. empirical” verification via Monte Carlo resampling on the observed data.
  • inference_prior_sample.R
    R script for sample-size planning based on the paper’s audit methodology. It:

    • Reads data/entidades_extraidas.xlsx (entity totals by document).
    • Computes per-document denominators m_d and exports data/m_per_document.csv.
    • Generates planning tables:
      • data/sample_size_plan_direct.csv for direct assumptions on tail probabilities p_tail = P(Y < gamma).
      • data/sample_size_plan_theta.csv mapping an entity-level preservation rate theta to a document-level tail risk via Binomial(m, theta).
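The ratio-to-count conversion that inference.ipynb performs can be sketched with Python's standard library (an illustrative approach under the bounded-denominator assumption; the notebook's exact implementation may differ): Fraction.limit_denominator recovers k/m from a stored decimal preservation ratio.

```python
from fractions import Fraction

def ratio_to_counts(ratio: float, max_m: int = 200) -> tuple[int, int]:
    """Recover binomial counts (k, m) from a preservation ratio,
    using the closest fraction with denominator m <= max_m."""
    frac = Fraction(ratio).limit_denominator(max_m)
    return frac.numerator, frac.denominator

print(ratio_to_counts(0.75))                # (3, 4)
print(ratio_to_counts(0.9166666666666666))  # (11, 12)
```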

Data

  • data/entidades_extraidas.xlsx
    Per-document entity counts (reference denominators). Columns:

    • id: document identifier (matches the text filenames in dataset/ and the A/B sheet in compare_results/).
    • Entity-type counts: institutions, dates, deadlines, costs, locations, urls, emails, phones, laws, ceps, addresses.
  • data/m_per_document.csv
    Single-column file m with m_d = sum(entity-type counts) per document (exported by inference_prior_sample.R).

  • data/total.csv
    Single-column file total. In this repo, it duplicates data/m_per_document.csv (same values, different column name).

  • data/sample_size_plan_direct.csv
    Sample-size planning table for the direct tail-probability approach.

  • data/sample_size_plan_theta.csv
    Sample-size planning table for the theta-to-tail mapping approach.
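The theta-to-tail mapping used for data/sample_size_plan_theta.csv can be sketched as follows (a stdlib-only illustration; function name and example values are hypothetical): given an entity-level preservation rate theta and a document denominator m, the document-level tail risk is P(Y_d < gamma) = P(Binomial(m, theta) < gamma * m).

```python
import math

def tail_risk(theta: float, m: int, gamma: float) -> float:
    """P(Y_d < gamma) when k_d ~ Binomial(m, theta) and Y_d = k_d / m."""
    # Strict inequality: count k with k < gamma * m, i.e. k <= ceil(gamma * m) - 1.
    k_max = math.ceil(gamma * m) - 1
    return sum(math.comb(m, k) * theta**k * (1.0 - theta)**(m - k)
               for k in range(k_max + 1))

# Example: theta = 0.95 entity-level preservation, m = 20 entities, gamma = 0.8.
print(tail_risk(0.95, 20, 0.8))
```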

A/B comparison inputs and outputs

  • compare_results/entidades_comparacao_langextract_regex.xlsx
    A/B evaluation sheet with 301 paired documents. Columns:

    • id: document identifier.
    • langextract: preservation ratio for method A.
    • REGEX: preservation ratio for method B.
  • compare_results/abtest_posterior_predictive.png
    Exported plot with posterior predictive comparisons from the Bayesian A/B analysis.

Text datasets (zipped)

  • dataset/original.zip
    ZIP archive containing original public service descriptions as plain text files under original/ (e.g., original/10022.txt).

  • dataset/newtext.zip
    ZIP archive containing revised/new versions as plain text files under newtext/ (e.g., newtext/10022.txt).

The filenames (IDs) align with data/entidades_extraidas.xlsx and compare_results/entidades_comparacao_langextract_regex.xlsx.

Project / environment files

  • ci_stats.Rproj
    RStudio project file (convenience for running the R script).

  • .venv/
    A local Python virtual environment directory (not required if you manage your own environment).

  • .Rproj.user/, .Rhistory
    Local RStudio/R session artifacts (not required for reproducibility; may be machine/user specific).

  • .gitignore
    Git ignore rules.

How to run

Python notebook (inference.ipynb)

Open the notebook in Jupyter (or VS Code) using a Python environment that provides at least:

  • pandas
  • numpy
  • scipy
  • matplotlib
  • an Excel engine (e.g., openpyxl)

Then run cells top-to-bottom.

R script (inference_prior_sample.R)

In R (or RStudio), install dependencies and run:

install.packages("readxl")
source("inference_prior_sample.R")

Outputs are written to data/ (see “Repository structure” above).

Notes on the statistical methodology implemented

  • Observation model (counts, not continuous scores): preservation is treated as k_d ~ Binomial(m_d, theta_d) rather than as a continuous score, preserving the denominator information and uncertainty.
  • Paired hierarchical A/B: document-level heterogeneity and correlation are captured by a bivariate normal prior on the logit scale.
  • Distribution-level conclusions: the notebook reports not only mean lift but also quantile shifts and tail-risk reductions, plus distribution-difference metrics to summarize how the entire distribution moves between A and B.
  • Worst-case sampling (Wilks): the notebook includes checks and planning formulas for selecting audit sizes that guarantee high probability of observing rare but relevant events under i.i.d. sampling assumptions.
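The paired hierarchical binomial + bivariate logistic-normal observation model can be sketched generatively (hyperparameter values here are illustrative, not the fitted posteriors): each document draws a correlated pair of logits from a bivariate normal, maps them through the inverse logit, and draws paired counts from binomials sharing the document's denominator m_d.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_paired_ab(m, mu=(1.5, 2.0), sd=(0.8, 0.8), rho=0.6):
    """Generative sketch: k_A, k_B ~ Binomial(m_d, theta_d) with paired
    document-level logits drawn from a bivariate normal."""
    m = np.asarray(m)
    cov = np.array([[sd[0]**2,          rho * sd[0] * sd[1]],
                    [rho * sd[0] * sd[1], sd[1]**2]])
    logits = rng.multivariate_normal(mu, cov, size=m.size)  # paired effects
    theta = 1.0 / (1.0 + np.exp(-logits))                   # inverse logit
    k_a = rng.binomial(m, theta[:, 0])
    k_b = rng.binomial(m, theta[:, 1])
    return k_a, k_b, theta

m_d = rng.integers(5, 40, size=301)  # 301 documents, as in the A/B sheet
k_a, k_b, theta = simulate_paired_ab(m_d)
# Fraction of documents where method B's latent rate exceeds method A's:
print(float(np.mean(theta[:, 1] > theta[:, 0])))
```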

Licensing and attribution

  • Repository code and files: licensed under the terms in LICENSE (Apache License 2.0), unless otherwise indicated.
  • Paper: published under CC BY-NC-ND 4.0 (per the conference proceedings). This repository is an evaluation/audit companion to the paper; it does not change the publication’s licensing terms.

Citation

If you use this repository as part of your work, please cite the paper (DOI: 10.5220/0013831100003985).
