
northeastern-datalab/HITSnDIFFs


HITSnDIFFs


This repository provides an implementation of the HITSnDIFFs approach proposed in the ICDE 2024 paper "HITSNDIFFS: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property" (long version: arXiv:2401.00013), together with implementations of the competing methods and the experiments reported in the paper.

Programming Language and Libraries

The source code is written in Python and has been tested on Python 3.8 and 3.10. Install all required packages (numpy, scipy, matplotlib, girth, func-timeout, jupyter) with

pip install -r requirements.txt
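After installation, a quick check that the packages can be found may save time before launching a long experiment. The following is a minimal sketch, not part of the repository; the import names are assumptions based on the package list above (in particular, func-timeout is assumed to be imported as func_timeout), and missing_modules is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above;
# func-timeout is assumed to be importable as func_timeout.
REQUIRED_MODULES = ["numpy", "scipy", "matplotlib", "girth", "func_timeout", "jupyter"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages can be found.")
```

The check only looks the modules up and does not import them, so it runs quickly even for heavy packages.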

Methods (methods/)

All methods are in the methods/ folder, including various implementations of HITSnDIFFs (hitsndiffs.py) and ABH (spectral.py), majority vote (majority.py), the "True-Answer" baseline (baseline.py), HITS-based approaches (hits.py), and a GRM estimator that uses the girth package (grm.py).
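To give a feel for the simplest of these baselines, the following sketch ranks users by how often they agree with the majority answer. It is an illustration only and does not reproduce majority.py; the function names, the list-of-lists response format, and the toy data are assumptions made for this example.

```python
from collections import Counter


def majority_vote(responses):
    """Return the most frequent option per question.

    responses[u][q] is the option chosen by user u for question q,
    or None if the user did not answer. Ties go to the option seen first.
    """
    num_questions = len(responses[0])
    consensus = []
    for q in range(num_questions):
        votes = Counter(row[q] for row in responses if row[q] is not None)
        consensus.append(votes.most_common(1)[0][0] if votes else None)
    return consensus


def agreement_scores(responses, consensus):
    """Fraction of each user's answered questions that match the consensus."""
    scores = []
    for row in responses:
        answered = [(a, c) for a, c in zip(row, consensus) if a is not None]
        hits = sum(a == c for a, c in answered)
        scores.append(hits / len(answered) if answered else 0.0)
    return scores


# Toy data: 4 users answering 4 multiple-choice questions (hypothetical).
responses = [
    [0, 1, 2, 1],
    [0, 1, 2, 0],
    [0, 2, 2, 1],
    [1, 1, 0, 1],
]
consensus = majority_vote(responses)
scores = agreement_scores(responses, consensus)
# Users ordered from highest to lowest agreement with the majority.
ranking = sorted(range(len(responses)), key=lambda u: -scores[u])
```

Here the agreement score serves as a crude proxy for user ability; the other methods in this folder estimate abilities differently.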

Experiments (experiments/)

All experiments reported in the paper are in the experiments/ folder, including the accuracy and efficiency experiments (experiment.py) on synthetic datasets (generated by synthetic.py), experiments on real-world datasets (multichoice.py), experiments on simulated datasets that use realistic estimated parameters for data generation (simulation.py), and experiments that compare HITSnDIFFs (HnD) and ABH (comparison.py). Synthetic data and intermediate results are also provided in subfolders.

Datasets (datasets/)

This folder contains the six real-world datasets used in the experiments. The datasets originate from http://www.ml.ist.i.kyoto-u.ac.jp/en/en-research/li2017cikm and were used in the CIKM paper "Hyper Questions: Unsupervised Targeting of a Few Experts in Crowdsourcing" by J. Li, Y. Baba and H. Kashima. We thank the authors for sharing them!

Reproducibility

The experimental results can be reproduced by running the Jupyter notebook figures.ipynb. In the notebook, the loading indicator controls whether the stored experimental results are read directly (True) or the experiments are re-run (False). Note that re-running each experiment on the synthetic datasets can take up to several hours.
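The load-or-re-run behaviour described above can be pictured with a small caching helper. This is a sketch under stated assumptions and not the notebook's code: load_or_run, the JSON storage format, and the toy experiment are hypothetical, and the notebook may name and store things differently.

```python
import json
import tempfile
from pathlib import Path


def load_or_run(result_path, run_experiment, loading=True):
    """Read a stored result if loading is True and the file exists; otherwise re-run.

    Hypothetical helper: a freshly computed result is written to result_path
    so that a later call with loading=True can read it back.
    """
    result_path = Path(result_path)
    if loading and result_path.exists():
        return json.loads(result_path.read_text())
    result = run_experiment()
    result_path.parent.mkdir(parents=True, exist_ok=True)
    result_path.write_text(json.dumps(result))
    return result


calls = []


def toy_experiment():
    """Stand-in for a long-running experiment; records each invocation."""
    calls.append(1)
    return {"accuracy": [0.5, 0.75]}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "result" / "toy.json"
    first = load_or_run(path, toy_experiment, loading=False)  # runs and stores
    second = load_or_run(path, toy_experiment, loading=True)  # reads the stored file
```

In this sketch, a missing file falls back to re-running the experiment even when loading is True, which avoids a failure on a fresh checkout without stored results.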

Several intermediate results are provided: the generated synthetic data (experiments/synthetic), the user abilities returned by each approach for every run of the experiments (experiments/ability), the experimental results (experiments/result), and all figures (experiments/figures).

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Contributors

For questions about methods/, experiments/, datasets/, or the reproducibility of the experiments, please contact Zixuan. For clarifications, comments, or suggestions on the IRT methods in IRT/, please contact Wolfgang.
