Entity Resolution, Entity Matching and Entity Alignment

📝 Surveys and Analysis

  1. End-to-End Entity Resolution for Big Data: A Survey (2019) [Paper]
  2. Blocking and Filtering Techniques for Entity Resolution: A Survey (ACM Computing Surveys 2020) [Paper]
  3. Comparative Analysis of Approximate Blocking Techniques for Entity Resolution (VLDB 2016) [Paper] 🌟
  4. Entity Resolution: Past, Present and Yet-to-Come (EDBT 2020) [Paper]
  5. A survey: knowledge graph entity alignment research based on graph embedding (Artificial Intelligence Review 2024) [Paper]

📝 Research Papers

General Topics

  1. ZeroER: Entity Resolution using Zero Labeled Examples (SIGMOD 2020)🌟
  2. A Comprehensive Benchmark Framework for Active Learning Methods in Entity Matching (SIGMOD 2020)🌟
  3. Synthesizing Entity Matching Rules by Examples (VLDB 2018) [PDF]🌟
  4. Distributed Representations of Tuples for Entity Resolution (VLDB 2018) [PDF]🌟
  5. A Demonstration of PERC: Probabilistic Entity Resolution With Crowd Errors (VLDB 2018) [PDF, demo]🌟
  6. The return of JedAI: End-to-End Entity Resolution for Structured and Semi-Structured Data (VLDB 2018) [PDF, demo] 🌟
  7. CloudMatcher: A Hands-Off Cloud/Crowd Service for Entity Matching (VLDB 2018) [PDF, demo]🌟
  8. Robust Entity Resolution using Random Graphs (SIGMOD 2018) [PDF] 🌟
  9. Deep Learning for Entity Matching: A Design Space Exploration (SIGMOD 2018) [PDF] [Code and Data] 🌟
  10. Schema-Agnostic Progressive Entity Resolution (ICDE 2018) [PDF] 🌟
  11. A Graph-Theoretic Fusion Framework for Unsupervised Entity Resolution (ICDE 2018) [PDF] 🌟
  12. Enabling Quality Control for Entity Resolution: A Human and Machine Cooperation Framework (ICDE 2018) [PDF] 🌟
  13. Simplifying Entity Resolution on Web Data with Schema-Agnostic, Non-Iterative Matching (ICDE 2018) [PDF, short paper] 🌟
  14. Rule-Based Entity Resolution on Database with Hidden Temporal Information (ICDE 2018) 🌟
  15. Matching Heterogeneous Event Data (ICDE 2018) 🌟
  16. Ontology-based Entity Matching in Attributed Graphs [PDF, more similar to a graph paper] (VLDB 2019) 🌟
  17. SystemER: A Human-in-the-loop System for Explainable Entity Resolution [PDF, demo] (VLDB 2019) 🌟
  18. OAG: Toward Linking Large-scale Heterogeneous Entity Graphs [Paper, applied science track] (KDD 2019) 🌟
  19. Entity Matching Meets Data Science: A Progress Report from the Magellan Project [PDF, industrial track] (SIGMOD 2019) 🌟
  20. EXPLAINER: Entity Resolution Explanations [PDF, best demo] (ICDE 2019) 🌟
  21. Top-K Entity Resolution with Adaptive Locality-Sensitive Hashing [Paper, short paper] (ICDE 2019) 🌟
  22. Schema-Agnostic Progressive Entity Resolution (TKDE 2019) 🌟
  23. Knowledge Translation [Technical Report] (VLDB 2020) 🌟
  24. Boosting the Speed of Entity Alignment 10×: Dual Attention Matching Network with Normalized Hard Sample Mining (WWW 2021) [Paper]
  • Addresses two existing problems: an overly complex graph encoder and an inefficient negative sampling strategy.
  1. Crowdsourced Collective Entity Resolution with Relational Match Propagation [Video][Slides][Paper] (ICDE 2020) 🌟
  2. Gradual Machine Learning for Entity Resolution (WWW 2019)
  3. Deep Entity Matching with Pre-Trained Language Models [Paper] (VLDB 2021) 🌟
  4. Towards Interpretable and Learnable Risk Analysis for Entity Resolution [Paper] (SIGMOD 2020) 🌟
  5. Entity Matching in the Wild: a Consistent and Versatile Framework to Unify Data in Industrial Applications [Paper] (SIGMOD 2020, industry track) 🌟
  6. REA: Robust Cross-lingual Entity Alignment Between Knowledge Graphs (KDD 2020)
  7. r-HUMO: A Risk-Aware Human-Machine Cooperation Framework for Entity Resolution with Quality Guarantees (TKDE 2020) 🌟
  8. Efficient Entity Resolution on Heterogeneous Records (TKDE 2020) 🌟 [Paper]
  9. Waldo: An Adaptive Human Interface for Crowd Entity Resolution (SIGMOD 2017) 🌟
  10. Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services (SIGMOD 2017) 🌟
  11. Generating Concise Entity Matching Rules (SIGMOD 2017) 🌟
  12. Knowledge Graph Alignment Network with Gated Multi-hop Neighborhood Aggregation (AAAI 2020) [Paper]
  13. Collective Multi-type Entity Alignment Between Knowledge Graphs (WWW 2020)
  14. Multi-Context Attention for Entity Matching (WWW 2020, short paper)
  15. End-to-end Task Based Parallelization for Entity Resolution on Dynamic Data (ICDE 2021) 🌟
  16. Auto-EM: End-to-end Fuzzy Entity-Matching using Pre-trained Deep Models and Transfer Learning (WWW 2019)
  17. Cost-effective Variational Active Entity Resolution (ICDE 2021) 🌟
  18. Automating Entity Matching Model Development (ICDE 2021) 🌟
  19. Efficient and effective ER with progressive blocking [Paper] (VLDBJ 2021) 🌟
  20. Online Topic-Aware Entity Resolution Over Incomplete Data Streams (SIGMOD 2021) 🌟
  21. Active Learning for Neural Entity Alignment (EMNLP 2021)
  • Uses a human in the loop to improve the quality of alignment seeds.
  1. Ensemble Semi-supervised Entity Alignment via Cycle-teaching (AAAI 2022)
  2. Informed Multi-context Entity Alignment (WSDM 2022)
  3. Deep Indexed Active Learning for Matching Heterogeneous Entity Representations [Paper] [Code]
  4. BrewER: Entity Resolution On-Demand (VLDB 2023) [Paper] 🌟
  5. FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data (VLDB 2024) [Paper] 🌟
  • FusionQuery has a query stage and a fusion stage. In the query stage, the paper frames the heterogeneous data query problem as a knowledge graph matching problem and presents a line graph-based method to accelerate it.
  1. KAE: A property-based method for knowledge graph alignment and extension (Journal of Web Semantics, 2024) [Paper]

Embedding/Knowledge Graph Representation Based Techniques

  1. A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs (VLDB 2020) 🌟 [Paper] [GitHub]
  2. Multi-view Knowledge Graph Embedding for Entity Alignment (IJCAI 2019) [Paper]
  3. Semi-Supervised Entity Alignment via Knowledge Graph Embedding with Awareness of Degree Difference (WWW 2019)
  4. Iterative Entity Alignment via Joint Knowledge Embeddings (IJCAI 2017) [Paper] [Slides]
  5. Jointly learning entity and relation representations for entity alignment (EMNLP 2019) [Paper]
  6. Aligning cross-lingual entities with multi-aspect information (EMNLP 2019)
  • Papers 5 and 6 consider the embedding of relations (edges) in GCNs.
  1. Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks (EMNLP 2018) [Paper] [Code]
  2. A Contextual Alignment Enhanced Cross Graph Attention Network for Cross-lingual Entity Alignment (COLING 2020) [Paper] [Notes]
  3. Representation Learning for Entity Alignment in Knowledge Graph: A Design Space Exploration 🌟 (ICDE 2024)
  4. Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment (WWW 2023) [Paper]

Language Model Based Techniques

  1. Deep Entity Matching with Pre-Trained Language Models (Tool: Ditto) [Blog]
  2. Deep Entity Matching: Challenges and Opportunities (ACM Journal of Data and Information Quality, 2021) [Paper]

LLM for Entity Matching

  1. Can Foundation Models Wrangle Your Data? (Stanford, Numbers Station, arXiv 2022) [Paper] [GitHub-"Foundation Models for Entity Matching in dbt and Snowflake"] 🌟
  2. Entity Matching using Large Language Models (arXiv, 2023 May) [Paper]
  • They had an informal version submitted earlier: Using ChatGPT for Entity Matching (arXiv, 2023 May) [Paper]

Blocking Techniques

  1. Blocking and Filtering Techniques for Entity Resolution: A Survey (ACM Computing Surveys 2020) [Paper] [arXiv version]
  2. A noise tolerant and schema-agnostic blocking technique for entity resolution (SAC 2019) [Paper]
  3. AutoBlock: A Hands-off Blocking Framework for Entity Matching (WSDM 2020) [Paper]
  4. JedAI3: beyond batch, blocking-based Entity Resolution [Paper] [GitHub]
  5. Scaling entity resolution: A loosely schema-aware approach [Paper]
  • An LSH-based attribute-match induction technique to extract loose schema information.
  • An unsupervised meta-blocking approach based on loose schema information.
  • An algorithm to scale any meta-blocking method on MapReduce-like systems.
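
To give a feel for the attribute-match induction idea mentioned above, here is a minimal sketch, not the paper's algorithm. It assumes each source is a plain dict mapping attribute names to lists of values, and it estimates the Jaccard similarity of attribute token sets with MinHash signatures (the hash family that LSH for Jaccard similarity builds on). For brevity the signatures are compared pairwise instead of being banded into LSH buckets. All function names, the sample data, and the 0.4 threshold are illustrative assumptions.

```python
import hashlib
from itertools import product


def attribute_tokens(values):
    """Collect the lower-cased whitespace tokens of all values of one attribute."""
    return {tok for value in values for tok in str(value).lower().split()}


def minhash_signature(token_set, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over the token set."""
    signature = []
    for seed in range(num_hashes):
        prefix = f"{seed}:".encode()
        signature.append(min(
            int.from_bytes(hashlib.md5(prefix + tok.encode()).digest()[:8], "big")
            for tok in token_set
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def induce_attribute_matches(source_a, source_b, threshold=0.4, num_hashes=64):
    """Return (attr_a, attr_b, similarity) for pairs at or above the threshold."""
    def signatures(source):
        sigs = {}
        for name, values in source.items():
            toks = attribute_tokens(values)
            if toks:  # attributes without any tokens cannot be compared
                sigs[name] = minhash_signature(toks, num_hashes)
        return sigs

    sigs_a, sigs_b = signatures(source_a), signatures(source_b)
    matches = []
    for (name_a, sig_a), (name_b, sig_b) in product(sigs_a.items(), sigs_b.items()):
        sim = estimated_jaccard(sig_a, sig_b)
        if sim >= threshold:
            matches.append((name_a, name_b, sim))
    return matches


# Hypothetical toy sources with different schemas.
source_a = {"name": ["canon eos 5d", "nikon d750"], "price": ["1200", "999"]}
source_b = {"title": ["canon eos 5d mark", "nikon d750 body"], "cost": ["999", "1200"]}

for attr_a, attr_b, sim in induce_attribute_matches(source_a, source_b):
    print(f"{attr_a} ~ {attr_b}: {sim:.2f}")
```

Attribute pairs that clear the threshold could then be treated as loosely equivalent, so that blocking keys are only compared within matched attribute groups. A production setup would typically replace the pairwise loop with banded LSH buckets to avoid comparing every attribute pair.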

🛠️ Awesome Tools

  1. JedAIToolkit [GitHub]

📊 Datasets or Benchmarks

  1. (Dataset) Clean-Clean ER datasets and Dirty ER datasets [GitHub]