Topic 11: Knowledge Fusion, Cleaning, Evaluation and Truth Discovery

Surveys

A Survey on Truth Discovery [Paper] 🌟
Truth Discovery Algorithms: An Experimental Evaluation [Paper]

Data Fusion

(I think this is a relatively old topic; people are moving to knowledge fusion.) (To be classified: single truth/multi-truth, copy detection, source reliability...)

Truth Discovery with Multiple Conflicting Information Providers on the Web (TKDE 2008), one of the classics. 🌟
Integrating conflicting data: the role of source dependence (VLDB 2009), one of the classics. 🌟
Fusing data with correlations (SIGMOD 2014) 🌟
Truth discovery and copying detection in a dynamic world (VLDB 2009) 🌟
Global detection of complex copying relationships between sources (VLDB 2010) 🌟
Online data fusion (VLDB 2011) 🌟
Compact explanation of data fusion decisions (WWW 2013)
Truth finding on the Deep Web: Is the problem solved? (VLDB 2013) 🌟
A Confidence-Aware Approach for Truth Discovery on Long-Tail Data (VLDB 2014) 🌟
Dynamic Truth Discovery on Numerical Data (ICDM 2018) 🌟
Scaling up Copy Detection (ICDE 2015) 🌟

Knowledge Fusion, Cleaning and Evaluation

Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion (KDD 2014) [Paper] 🌟
From data fusion to knowledge fusion (VLDB 2014) [Paper] [Slides] 🌟
Data X-Ray: A diagnostic tool for data errors (SIGMOD 2015) [Paper] [Slides] [Demo] 🌟
Knowledge-based trust: estimating the trustworthiness of web sources (VLDB 2015) [Paper] [Slides] 🌟
Knowledge verification for long tail verticals (VLDB 2017) 🌟
Efficient knowledge graph accuracy evaluation (VLDB 2019) [Link] 🌟
MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps (ICDE 2019) 🌟
Distilling relations using knowledge bases (VLDBJ 2018) 🌟

Given a relational table, we study the problem of detecting and repairing erroneous data, as well as marking correct data, using well curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rules that can make actionable decisions on relational data, by building connections between a relation and a KB.
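
To make the idea in the summary above concrete, here is a minimal sketch of a KB-backed cleaning rule that either marks a cell as correct or repairs it. It is a toy illustration only, not the detective-rule formalism from the paper: the `KB` dictionary, the `capitalOf` predicate, the column names, and the `apply_rule` helper are all hypothetical names chosen for this example.

```python
from typing import Dict, Tuple

# Hypothetical toy knowledge base: (subject, predicate) -> object.
KB: Dict[Tuple[str, str], str] = {
    ("Paris", "capitalOf"): "France",
    ("Rome", "capitalOf"): "Italy",
}


def apply_rule(
    row: Dict[str, str],
    kb: Dict[Tuple[str, str], str],
    evidence_col: str = "city",
    target_col: str = "country",
    predicate: str = "capitalOf",
) -> Tuple[str, Dict[str, str]]:
    """Check one target cell of a row against the KB.

    Returns a (status, row) pair where status is:
      "correct"  - the KB confirms the target value,
      "repaired" - the KB disagrees, and the returned row holds the KB value,
      "unknown"  - the KB has no fact for the evidence value, so no decision.
    """
    expected = kb.get((row[evidence_col], predicate))
    if expected is None:
        return "unknown", row
    if row[target_col] == expected:
        return "correct", row
    repaired = dict(row)  # copy so the input row is left untouched
    repaired[target_col] = expected
    return "repaired", repaired


rows = [
    {"city": "Paris", "country": "France"},
    {"city": "Rome", "country": "France"},
    {"city": "Oslo", "country": "Norway"},
]
results = [apply_rule(r, KB) for r in rows]
```

In this sketch the first row is confirmed, the second is repaired from the KB, and the third is left undecided because the toy KB has no matching fact. The rules described in the paper are richer than a single lookup, so treat this only as a way to picture the "detect, repair, or mark correct" outcomes.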

HoloDetect: Few-Shot Learning for Error Detection [PDF], from the same team as HoloClean (SIGMOD 2019) 🌟
Unsupervised String Transformation Learning for Entity Consolidation [PDF] (ICDE 2019) 🌟
Normalization of Duplicate Records from Multiple Sources (TKDE 2019) 🌟
Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise (VLDB 2020) 🌟
Learning Over Dirty Data Without Cleaning [Paper] (SIGMOD 2020) 🌟
CoClean: Collaborative Data Cleaning [Paper] (SIGMOD 2020, demo) 🌟
T-REx: Table Repair Explanations [Paper] (SIGMOD 2020, demo) 🌟

Datasets

Fusion Datasets [Link]

Notes

Data Fusion – Resolving Data Conflicts for Integration [Tutorial Proposal]
Data Integration and Machine Learning: A Natural Synergy
