
Topic 11: Knowledge Fusion, Cleaning, Evaluation and Truth Discovery

Sherry Lin edited this page Oct 9, 2020 · 1 revision

Surveys

  1. A Survey on Truth Discovery [Paper] 🌟
  2. Truth Discovery Algorithms: An Experimental Evaluation [Paper]

Data Fusion

(I think this is a relatively old topic; people are moving to knowledge fusion.) (To be classified: single-truth/multi-truth, copy detection, source reliability, ...)

  1. Truth Discovery with Multiple Conflicting Information Providers on the Web (TKDE 2008), one of the most classical papers. 🌟
  2. Integrating conflicting data: the role of source dependence (VLDB 2009), one of the most classical papers. 🌟
  3. Fusing data with correlations (SIGMOD 2014) 🌟
  4. Truth discovery and copying detection in a dynamic world (VLDB 2009) 🌟
  5. Global detection of complex copying relationships between sources (VLDB 2010) 🌟
  6. Online data fusion (VLDB 2011) 🌟
  7. Compact explanation of data fusion decisions (WWW 2013)
  8. Truth finding on the Deep Web: Is the problem solved? (VLDB 2013) 🌟
  9. A Confidence-Aware Approach for Truth Discovery on Long-Tail Data (VLDB 2014) 🌟
  10. Dynamic Truth Discovery on Numerical Data (ICDM 2018) 🌟
  11. Scaling up Copy Detection (ICDE 2015) 🌟
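Many of the single-truth methods above share the iterative framework described in the truth-discovery surveys: estimate each object's true value by reliability-weighted voting, then re-estimate each source's reliability from how often it agrees with those estimates, and repeat. The sketch below is a minimal, generic version of that loop, not the exact algorithm of any listed paper. The function name `truth_discovery`, the input format of (source, object, value) triples, the starting reliability `prior`, and the `smoothing` term are all assumptions for illustration; copy detection, multi-truth objects, and numerical values are not handled.

```python
from collections import defaultdict


def truth_discovery(claims, iterations=10, prior=0.8, smoothing=0.01):
    """Generic iterative truth discovery (hypothetical helper, single-truth only).

    claims: list of (source, object, value) triples.
    Returns (truths, reliability): the chosen value per object and a
    reliability score in (0, 1) per source.
    """
    sources = {s for s, _, _ in claims}
    reliability = {s: prior for s in sources}  # uniform starting reliability (assumption)
    truths = {}
    for _ in range(iterations):
        # Step 1: for each object, pick the value with the largest reliability-weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for s, o, v in claims:
            votes[o][v] += reliability[s]
        truths = {o: max(vals, key=vals.get) for o, vals in votes.items()}

        # Step 2: a source's reliability is the (smoothed) fraction of its
        # claims that agree with the current truth estimates.
        correct, total = defaultdict(int), defaultdict(int)
        for s, o, v in claims:
            total[s] += 1
            correct[s] += int(truths[o] == v)
        reliability = {
            s: (correct[s] + smoothing) / (total[s] + 2 * smoothing) for s in sources
        }
    return truths, reliability
```

For example, if sources A and B both claim "Paris" for some object while source C claims "Lyon", and C also disagrees with A and B elsewhere, the loop keeps "Paris" and assigns C a much lower reliability than A or B. The methods in the list differ mainly in how they replace these two steps (e.g. Bayesian source accuracy, dependence between copying sources, confidence intervals for sources with few claims).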

Knowledge Fusion, Cleaning and Evaluation

  1. Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion (KDD 2014) [Paper] 🌟
  2. From data fusion to knowledge fusion (VLDB 2014) [Paper] [Slides] 🌟
  3. Data X-Ray: A diagnostic tool for data errors (SIGMOD 2015) [Paper] [Slides] [Demo] 🌟
  4. Knowledge-based trust: estimating the trustworthiness of web sources [Paper] [Slides] 🌟
  5. Knowledge verification for long tail verticals (VLDB 2017) 🌟
  6. Efficient knowledge graph accuracy evaluation (VLDB 2019) [Link] 🌟
  7. MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps (ICDE 2019) 🌟
  8. Distilling relations using knowledge bases (VLDBJ 2018) 🌟

Given a relational table, we study the problem of detecting and repairing erroneous data, as well as marking correct data, using well-curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rules that can make actionable decisions on relational data, by building connections between a relation and a KB.
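The three outcomes described above (mark as correct, repair, or leave undecided) can be illustrated with a heavily simplified sketch. Here a rule names a key column, a target column, a KB predicate that yields the correct target value (positive evidence), and a KB predicate that identifies a value as a known wrong choice (negative evidence). The `DetectiveRule` class, `apply_rule`, the predicate names, and the representation of the KB as a set of (subject, predicate, object) triples are hypothetical; the DRs defined in the paper are richer than this and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectiveRule:
    """Hypothetical, simplified detective rule over a triple-set KB."""
    key_col: str        # column whose value is looked up in the KB (e.g. country)
    target_col: str     # column being checked (e.g. capital)
    positive_pred: str  # KB edge key -> correct target value
    negative_pred: str  # KB edge wrong_value -> key that signals a known error


def apply_rule(rule, row, kb):
    """Return (status, row) with status in {"correct", "repaired", "unknown"}."""
    key, value = row[rule.key_col], row[rule.target_col]
    expected = {o for s, p, o in kb if s == key and p == rule.positive_pred}
    if value in expected:
        # Positive evidence: the KB confirms the value, so mark it as correct.
        return "correct", row
    if len(expected) == 1 and (value, rule.negative_pred, key) in kb:
        # Negative evidence plus a unique KB answer: repair the cell.
        repaired = dict(row)
        repaired[rule.target_col] = next(iter(expected))
        return "repaired", repaired
    # Not enough KB evidence either way: leave the tuple untouched.
    return "unknown", row
```

With a KB containing ("France", "hasCapital", "Paris") and ("Lyon", "locatedIn", "France"), and the rule `DetectiveRule("country", "capital", "hasCapital", "locatedIn")`, a row with capital "Paris" is marked correct, a row with capital "Lyon" is repaired to "Paris", and a row with a value the KB knows nothing about is left as unknown. Requiring negative evidence before repairing, rather than overwriting every mismatch, is what keeps such a rule from "fixing" cells the KB simply does not cover.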

  1. HoloDetect: Few-Shot Learning for Error Detection [PDF] (SIGMOD 2019), from the same team as HoloClean 🌟
  2. Unsupervised String Transformation Learning for Entity Consolidation [PDF] (ICDE 2019) 🌟
  3. Normalization of Duplicate Records from Multiple Sources (TKDE 2019) 🌟
  4. Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise (VLDB 2020) 🌟
  5. Learning Over Dirty Data Without Cleaning [Paper] (SIGMOD 2020) 🌟
  6. CoClean: Collaborative Data Cleaning [Paper] (SIGMOD 2020, demo) 🌟
  7. T-REx: Table Repair Explanations [Paper] (SIGMOD 2020, demo) 🌟

Datasets

  1. Fusion Datasets [Link]

Notes

  1. Data Fusion – Resolving Data Conflicts for Integration [Tutorial Proposal]
  2. Data Integration and Machine Learning: A Natural Synergy