HITSnDIFFs

This repository provides an implementation of the HITSnDIFFs approach proposed in the ICDE 2024 paper "HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property" (long version: arXiv:2401.00013), together with implementations of the competing methods and the experiments reported in the paper.

Programming Language and Libraries

The source code is written in Python and was tested on Python 3.8 and 3.10. Install all required packages (numpy, scipy, matplotlib, girth, func-timeout, jupyter) with:

pip install -r requirements.txt

Methods (`methods/`)

This folder contains all the methods: various implementations of HITSnDIFFs (hitsndiffs.py) and ABH (spectral.py), majority vote (majority.py), the "True-Answer" baseline (baseline.py), HITS-based approaches (hits.py), and a GRM estimator that uses the girth package (grm.py).
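To illustrate the kind of input these methods work on, here is a minimal sketch of a majority-vote baseline. It is not the code in majority.py: the function name `majority_vote`, the list-of-lists response format, and the use of `None` for unanswered questions are assumptions made only for this example. Each user is scored by the fraction of their answered questions on which they agree with the most popular option.

```python
from collections import Counter


def majority_vote(responses):
    """Score users by agreement with the per-question majority answer.

    `responses` is a hypothetical format: one row per user, one entry per
    question, holding the chosen option or None if the user did not answer.
    """
    num_questions = len(responses[0])

    # Most popular option per question (ties go to the option seen first).
    majority = []
    for q in range(num_questions):
        votes = Counter(row[q] for row in responses if row[q] is not None)
        majority.append(votes.most_common(1)[0][0] if votes else None)

    # Fraction of each user's answered questions that match the majority.
    scores = []
    for row in responses:
        answered = [(a, m) for a, m in zip(row, majority) if a is not None]
        agree = sum(a == m for a, m in answered)
        scores.append(agree / len(answered) if answered else 0.0)

    return majority, scores
```

Sorting users by the returned scores gives a simple ability ranking against which more refined methods can be compared.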

Experiments (`experiments/`)

This folder contains all the experiments reported in the paper: accuracy and efficiency experiments (experiment.py) on synthetic datasets (generated by synthetic.py), experiments on real-world datasets (multichoice.py), experiments on simulated datasets generated from realistic estimated parameters (simulation.py), and experiments that compare HnD and ABH (comparison.py). Synthetic data and intermediate results are also provided in subfolders.

Datasets (`datasets/`)

This folder contains the six real-world datasets used in the experiments. The datasets originally come from http://www.ml.ist.i.kyoto-u.ac.jp/en/en-research/li2017cikm and were used in the CIKM paper "Hyper Questions: Unsupervised Targeting of a Few Experts in Crowdsourcing" by J. Li, Y. Baba and H. Kashima. We thank the authors for sharing them!

Reproducibility

The experimental results can be reproduced by running the Jupyter notebook figures.ipynb. The loading indicator in the notebook controls whether it directly reads the stored experimental results (True) or re-runs the experiments (False). Note that re-running each experiment on the synthetic datasets can take up to several hours.

Various intermediate results are provided, including the generated synthetic data (experiments/synthetic), the user abilities returned by the different approaches for each run of the experiments (experiments/ability), the experimental results (experiments/result), and all figures (experiments/figures).
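The load-or-rerun behaviour of the loading indicator can be sketched as follows. This is only an illustration of the pattern, not the notebook's code: the helper `load_or_run`, the `loading` parameter name, and the JSON file format are all assumptions, and the notebook's actual variable names and storage format may differ.

```python
import json
from pathlib import Path


def load_or_run(result_path, run_experiment, loading=True):
    """Return a stored result if allowed and present, else recompute it.

    Hypothetical helper: `run_experiment` is any zero-argument callable
    that returns a JSON-serializable result.
    """
    result_path = Path(result_path)

    # loading=True: reuse the stored result when it exists.
    if loading and result_path.exists():
        return json.loads(result_path.read_text())

    # Otherwise re-run the (possibly slow) experiment and store its result.
    result = run_experiment()
    result_path.parent.mkdir(parents=True, exist_ok=True)
    result_path.write_text(json.dumps(result))
    return result
```

With such a helper, setting the flag to False forces a fresh run and overwrites the stored result, which is the case where runs on the synthetic datasets can take hours.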

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Contributors

For any questions on methods/, experiments/, datasets/, and the reproducibility of the experiments, please contact Zixuan. For any clarifications, comments, or suggestions on the IRT methods in IRT/, please contact Wolfgang.
