GitHub - ghoshs/CardiO: Estimating Cardinality on the Web

CardiO: Predicting Cardinality from Online Sources

Count questions are an important type of information need. Their answers are in principle available on the Web, though often in noisy, contradictory, or semantically not fully aligned form. In this work, we propose CardiO, a lightweight framework for searching entity counts on the Web. CardiO extracts all counts from a set of relevant Web snippets and infers the most central count based on semantic and numeric distances from the other candidates. In the absence of supporting evidence, the system relies on closely related sets of similar size to provide an estimate. Experiments show that CardiO produces more accurate counts than small models based purely on LLMs, with answers that are easier to trace. Although larger models have higher precision, using them to enhance CardiO components does not improve the final precision or recall.
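
The idea of extracting candidate counts from snippets and choosing the most central one can be illustrated with a minimal sketch. This is not the code in this repository: the function names, the regular expression, and the relative-distance measure are all assumptions made for illustration. The sketch uses numeric distance only and leaves out the semantic distance that CardiO also considers. The example snippets are made up.

```python
import re
from typing import List, Optional

# Illustrative only: these helpers are hypothetical and are not part of CardiO.
# Matches integers with or without thousands separators, e.g. "27" or "1,200".
NUMBER = re.compile(r"\b\d{1,3}(?:,\d{3})+\b|\b\d+\b")


def extract_counts(snippet: str) -> List[int]:
    """Return every integer mentioned in a snippet, dropping thousands separators."""
    return [int(m.group().replace(",", "")) for m in NUMBER.finditer(snippet)]


def relative_distance(a: int, b: int) -> float:
    """Numeric distance scaled to [0, 1]; 0 means the two counts are identical."""
    if a == b:
        return 0.0
    return abs(a - b) / max(abs(a), abs(b))


def most_central_count(candidates: List[int]) -> Optional[int]:
    """Pick the medoid: the candidate with the smallest total distance to all others."""
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda c: sum(relative_distance(c, other) for other in candidates),
    )


# Made-up snippets for the question "How many member states does the EU have?"
snippets = [
    "The European Union has 27 member states.",
    "There are 27 countries in the EU.",
    "Membership peaked at 28 before 2020.",
]
candidates = [count for s in snippets for count in extract_counts(s)]
print(most_central_count(candidates))  # 27
```

Here the year 2020 is extracted as a candidate too, but it lies far from every other value and therefore does not become the central count. A real system would need a semantic signal as well, since numeric distance alone cannot tell a count from an unrelated number of similar size.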

This repository contains the benchmarks and the code used in the submission to the WebConf short papers track 2024.

Benchmarks

Cardinality Questions (CQ) benchmark
Natural Questions (NQ) benchmark
Count Question Answering Dataset (CoQuAD)

Citation

@inproceedings{ghosh2024cardio,
  title={CardiO: Predicting Cardinality from Online Sources},
  author={Ghosh, Shrestha and Razniewski, Simon and Graux, Damien and Weikum, Gerhard},
  booktitle={Companion Proceedings of the ACM Web Conference 2024},
  year={2024}
}

Folders and files

benchmark
count_prediction
question_modelling
question_reformulation
relevance_filter
retrieval
snippet_modelling
utils
.gitignore
LICENSE
README.md
cardinality_web.py
config.yml
pipeline.py
