Knowledge Fusion, Data Fusion, and Truth Discovery

📝 Surveys

  1. A Survey on Truth Discovery [Paper] 🌟
  2. Truth Discovery Algorithms: An Experimental Evaluation [Paper]
  3. A survey on data fusion: what for? in what form? what is next? (Journal of Intelligent Information Systems, 2020) [Paper]

📝 General Papers

Data Fusion

I think this is a relatively old topic; people have been moving to knowledge fusion since 2018. Still, it contains many interesting subtopics, e.g., single-truth vs. multi-truth discovery, copy detection, and source reliability estimation. I will classify the following papers later. I also think data fusion and knowledge fusion will play an essential role in processing the pre-training data of LLMs/LMs.
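To make "source reliability" concrete, here is a minimal sketch of single-truth discovery by reliability-weighted voting. It is a simplified scheme in the spirit of iterative methods such as TruthFinder, not the exact algorithm of any paper listed here. It assumes one true value per object and independent sources (no copy detection). The function name `discover_truths`, the `smoothing` parameter, and the toy claims are hypothetical.

```python
from collections import defaultdict


def discover_truths(claims, iterations=10, smoothing=1.0):
    """Hypothetical single-truth discovery via reliability-weighted voting.

    claims: dict mapping source name -> dict mapping object -> claimed value.
    Returns (truths, reliability): the chosen value per object and an
    estimated reliability in (0, 1) per source.
    """
    # Start by trusting every source equally.
    reliability = {source: 0.5 for source in claims}
    truths = {}
    for _ in range(iterations):
        # Step 1: each source votes for its claimed value with its current weight.
        votes = defaultdict(lambda: defaultdict(float))
        for source, facts in claims.items():
            for obj, value in facts.items():
                votes[obj][value] += reliability[source]
        # The highest-weighted value wins (ties go to the first value seen).
        truths = {obj: max(cands, key=cands.get) for obj, cands in votes.items()}
        # Step 2: re-estimate reliability as the smoothed agreement rate
        # between a source's claims and the current truths.
        for source, facts in claims.items():
            correct = sum(truths[obj] == value for obj, value in facts.items())
            reliability[source] = (correct + smoothing) / (len(facts) + 2 * smoothing)
    return truths, reliability


# Toy example (hypothetical data): s1 and s2 agree on most objects,
# while s3 and s4 are noisy but happen to agree with each other on o4.
claims = {
    "s1": {"o1": "A", "o2": "B", "o3": "C", "o5": "E", "o4": "X"},
    "s2": {"o1": "A", "o2": "B", "o3": "C", "o5": "E"},
    "s3": {"o1": "Z1", "o2": "Z2", "o3": "Z3", "o4": "Y"},
    "s4": {"o1": "Z4", "o2": "Z5", "o3": "Z6", "o4": "Y"},
}
truths, reliability = discover_truths(claims)
print(truths["o4"])  # X (plain majority voting would pick "Y")
```

The two steps reinforce each other: accurate sources gain weight, so on `o4` the single reliable source outvotes two unreliable ones. Real systems add the refinements discussed in the papers below, such as copy detection between sources and multi-truth or numerical values.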

  1. Truth Discovery with Multiple Conflicting Information Providers on the Web (TKDE 2008), one of the classic papers. 🌟
  2. Integrating conflicting data: the role of source dependence (VLDB 2009), one of the classic papers. 🌟
  3. Fusing data with correlations (SIGMOD 2014) 🌟
  4. Truth discovery and copying detection in a dynamic world (VLDB 2009) 🌟
  5. Global detection of complex copying relationships between sources (VLDB 2010) [Paper] 🌟
  6. Online data fusion (VLDB 2011) 🌟
  7. Compact explanation of data fusion decisions (WWW 2013)
  8. Truth finding on the Deep Web: Is the problem solved? (VLDB 2013) 🌟
  9. A Confidence-Aware Approach for Truth Discovery on Long-Tail Data (VLDB 2014) 🌟
  10. Dynamic Truth Discovery on Numerical Data (ICDM 2018) 🌟
  11. Scaling up Copy Detection (ICDE 2015) 🌟

Knowledge Fusion, Cleaning and Evaluation

  1. Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion (KDD 2014) [Paper] 🌟
  2. From data fusion to knowledge fusion (VLDB 2014) [Paper] [Slides] 🌟
  3. Data X-Ray: A diagnostic tool for data errors (SIGMOD 2015) [Paper] [Slides] [Demo] 🌟
  4. Knowledge-based trust: estimating the trustworthiness of web sources [Paper] [Slides] 🌟
  5. Knowledge verification for long tail verticals (VLDB 2017) 🌟
  6. Efficient knowledge graph accuracy evaluation (VLDB 2019) [Link] 🌟
  7. MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps (ICDE 2019) 🌟
  8. Distilling relations using knowledge bases (VLDBJ 2018) 🌟

     Given a relational table, we study the problem of detecting and repairing erroneous data, as well as marking correct data, using well-curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rules that can make actionable decisions on relational data, by building connections between a relation and a KB.
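The idea of connecting a relation to a KB can be illustrated with a much-simplified, hand-written rule. This is not the formal DR definition from the paper; it only mimics the intuition that a value is marked correct when the KB confirms it, and repaired only when the KB also provides negative evidence (here: the value is a city of the country but not its capital). The table schema, the relation names `hasCapital` and `locatedIn`, and the function `apply_capital_rule` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def apply_capital_rule(rows: List[Dict[str, str]], kb: Set[Triple]) -> List[Dict[str, str]]:
    """Hypothetical KB-based check of the `capital` column of each row.

    Each output row gets a `status`: "correct", "repaired", or "unknown".
    """
    cleaned = []
    for row in rows:
        country, capital = row["country"], row["capital"]
        out = dict(row)
        if (country, "hasCapital", capital) in kb:
            # Positive evidence: the KB confirms the value, so mark it correct.
            out["status"] = "correct"
        else:
            # Sorted so the repair is deterministic if the KB lists several.
            true_capitals = sorted(o for (s, p, o) in kb if s == country and p == "hasCapital")
            # Negative evidence: the value is a city in that country but not its capital.
            if true_capitals and (capital, "locatedIn", country) in kb:
                out["capital"] = true_capitals[0]
                out["status"] = "repaired"
            else:
                # Not enough KB evidence either way: leave the value untouched.
                out["status"] = "unknown"
        cleaned.append(out)
    return cleaned


kb = {
    ("China", "hasCapital", "Beijing"),
    ("Shanghai", "locatedIn", "China"),
    ("Japan", "hasCapital", "Tokyo"),
}
rows = [
    {"country": "China", "capital": "Shanghai"},
    {"country": "Japan", "capital": "Tokyo"},
    {"country": "Peru", "capital": "Lima"},
]
result = apply_capital_rule(rows, kb)
print([r["status"] for r in result])  # ['repaired', 'correct', 'unknown']
```

Requiring both kinds of evidence before repairing is what keeps the rule conservative: the Peru row is left alone because the toy KB says nothing about it, rather than being treated as an error.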

  1. HoloDetect: Few-Shot Learning for Error Detection [PDF] (SIGMOD 2019), from the same team as HoloClean 🌟
  2. Unsupervised String Transformation Learning for Entity Consolidation [PDF] (ICDE 2019) 🌟
  3. Normalization of Duplicate Records from Multiple Sources (TKDE 2019) 🌟
  4. Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise (VLDB 2020) 🌟
  5. Learning Over Dirty Data Without Cleaning [Paper] (SIGMOD 2020) 🌟
  6. CoClean: Collaborative Data Cleaning [Paper] (SIGMOD 2020, demo) 🌟
  7. T-REx: Table Repair Explanations [Paper] (SIGMOD 2020, demo) 🌟
  8. Triple Trustworthiness Measurement for Knowledge Graph (WWW 2019)
  9. Tracy: Tracing Facts over Knowledge Graphs and Text (WWW 2019, short)
  10. Few-Shot Knowledge Validation using Rules (WWW 2021) [Paper]

Vandalism Detection

  1. Debiasing Vandalism Detection Models at Wikidata (WWW 2019)

Malicious Participant Detection

  1. Truth discovery for spatio-temporal events from crowdsourced data (VLDB 2017) [Paper] 🌟
  2. Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation (SIGMOD 2014) [Paper] 🌟 (mentions malicious sources in only one sentence)
  3. Reputation-Aware Data Fusion and Malicious Participant Detection in Mobile Crowdsensing (IEEE BigData 2018) [Paper]

📊 Datasets

  1. Fusion Datasets [Link]

💬 Notes

  1. Data Fusion – Resolving Data Conflicts for Integration [Tutorial Proposal]
  2. Data Integration and Machine Learning: A Natural Synergy