Other Interesting Works.md

Data Integration and Knowledge Integration

  1. GeoFlux: Hands-Off Data Integration Leveraging Join Key Knowledge (SIGMOD 2018) [PDF, demo] 🌟
  2. Non-binary evaluation measures for big data integration (VLDBJ 2018) 🌟
  3. Meta-Mappings for Schema Mapping Reuse [PDF] (VLDB 2019) 🌟
  4. Representing Temporal Attributes for Schema Matching (KDD 2020) 🌟
  5. Bayesian Networks for Data Integration in the Absence of Foreign Keys (TKDE 2020) 🌟
  6. Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks [Paper] (SIGMOD 2020) 🌟

KG and Blockchains

  1. BlockChain + KG [Link]

Data Extraction or Knowledge Extraction from The Web

  1. When Open Information Extraction Meets the Semi-Structured Web (OpenCERES, NAACL 2019)
  2. CERES: Distantly Supervised Relation Extraction from the Semi-Structured Web (CERES, VLDB 2018) 🌟
  3. Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment [PDF] (SIGMOD 2019) 🌟
  4. RED: Redundancy-Driven Data Extraction from Result Pages (WWW 2019)
  5. TCN: Table Convolutional Network for Web Table Interpretation (WWW 2021) [Paper]

AIOps (Artificial Intelligence for IT Operations)

  1. WeBank project - AIOps + KG [Link]
  2. Paper Summary (Chinese) [Link]
  3. AIOps Papers and Summary [GitHub]

Multi-hop Reading

  1. Cognitive Graph for Multi-Hop Reading Comprehension at Scale (ACL 2019) [Paper]
     • BERT + GNN
  2. Is Graph Structure Necessary for Multi-hop Question Answering? (EMNLP 2020) [Paper] [Notes]
  3. Dynamically Fused Graph Network for Multi-hop Reasoning (ACL 2019)
  4. AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension (ACL 2022) [Paper] [[Github](https://github.com/nju-websoft/AdaLoGN)]

Graph Functional Dependencies

Note: For this topic, see Wenfei Fan's homepage for more related publications.

  1. Discovering Graph Functional Dependencies (SIGMOD 2018) [Paper]
  2. Functional Dependencies for Graphs (SIGMOD 2016) [Paper]
  3. Rule-Based Graph Repairing: Semantic and Efficient Repairing Methods (ICDE 2018) [Paper]

Subgraph Isomorphism

  1. An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases (VLDB 2013)
  2. On the equivalence between graph isomorphism testing and function approximation with GNNs (NeurIPS 2019) [Paper]

Fraud Detection

  1. https://info.tigergraph.com/graph-ai-world-fintell
  2. https://www.youtube.com/watch?v=Mf8PuOElGpg
  3. https://neo4j.com/use-cases/fraud-detection/

K-Core in Graphs

  1. Hierarchical Core Maintenance on Large Dynamic Graphs (VLDB 2021) [Paper]
  2. Efficient Progressive Minimum k-Core Search (VLDB 2020) [Paper]

Transformers!

  1. The Illustrated Transformer [GitHub]

BERT+KG

  1. ERNIE (Tsinghua) [References]
  2. ERNIE (Baidu) [References]

XAI and Explainable GNN

  1. On Explainability of Graph Neural Networks via Subgraph Explorations [Paper] 🌟
     • Shapley value --> taxi sharing
  2. ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction [Paper] [Some discussion]
  3. Evaluating XAI: A comparison of rule-based and example-based explanations [Paper]

HGNN

  1. Heterogeneous Graph Structure Learning for Graph Neural Networks (AAAI 2021) [Paper]

Others

  1. Neural Subgraph Isomorphism Counting (KDD 2020) [Paper] [Code] 🌟
  2. A Fresh Look on Knowledge Bases: Distilling Named Events from News (CIKM 2014) 🌟
     • Event KB. Each news article is regarded as an event. Builds the semantic similarity relations and the temporal relations between events.
  3. A Generic Ontology Framework for Indexing Keyword Search on Massive Graphs
  4. Extending Graph Patterns with Conditions
  5. LUSTRE: An Interactive System for Entity Structured Representation and Variant Generation (ICDE 2018) [PDF, demo] 🌟
  6. TableView: A Visual Interface for Generating Preview Tables of Entity Graphs (ICDE 2018) [PDF, demo] 🌟
  7. Mining Summaries for Knowledge Graph Search (TKDE 2018) [PDF, ICDM2016 version] 🌟
  8. Embedded Functional Dependencies and Data-completeness Tailored Database Design [PDF] (VLDB 2019) 🌟
  9. Tutorial: Combating Fake News: A Data Management and Mining Perspective [Link] (VLDB 2019) 🌟
  10. Tutorial: Data Lake Management: Challenges and Opportunities [Link] (VLDB 2019) 🌟
  11. Spade: A Modular Framework for Analytical Exploration of RDF Graphs [PDF, demo] (VLDB 2019) 🌟
  12. PivotE: Revealing and Visualizing the Underlying Entity Structures for Exploration [PDF, demo] (VLDB 2019) 🌟
  13. Estimating Node Importance in Knowledge Graphs Using Graph Neural Networks [Paper, Presentation] (KDD 2019) 🌟
     • Next work: MultiImport: Inferring node importance in a knowledge graph from multiple input signals [Paper] (KDD 2020) 🌟
  14. Embedding-based Retrieval in Facebook Search (KDD 2020) [Paper]
  15. Automatically Generating Interesting Facts from Wikipedia Tables [PDF, industrial track] (SIGMOD 2019) 🌟
  16. Knowledge Graphs and Enterprise AI: The Promise of an Enabling Technology [PDF, keynote] (ICDE 2019) 🌟
  17. Collective Keyword Query on a Spatial Knowledge Base (TKDE 2019) 🌟
  18. Distribution-Aware Crowdsourced Entity Collection (TKDE 2019) 🌟
  19. Effective and Efficient Relational Community Detection and Search in Large Dynamic Heterogeneous Information Networks (VLDB 2020) 🌟
  20. Obi-Wan: Ontology-Based RDF Integration of Heterogeneous Data (VLDB 2020) 🌟
  21. RDFFrames: Knowledge Graph Access for Machine Learning Tools (demo, VLDB 2020) 🌟
  22. SPHINX: A System for Metapath-based Entity Exploration in Heterogeneous Information Networks (demo, VLDB 2020) 🌟
  23. Dataset Discovery in Data Lakes [Video][Slides][Paper] (ICDE 2020) 🌟
  24. SLIM: Scalable Linkage of Mobility Data [Paper] (SIGMOD 2020) 🌟
  25. Creating Embeddings of Heterogeneous Relational Datasets for Data Integration Tasks [Paper] (SIGMOD 2020) 🌟
  26. A survey of community search over big graphs (VLDBJ 2020) 🌟
  27. An analytical study of large SPARQL query logs (VLDBJ 2020) 🌟
  28. Generalizing Tensor Decomposition for N-ary Relational Knowledge Bases (WWW 2020)
  29. Adaptive Low-level Storage of Very Large Knowledge Graphs (WWW 2020)
  30. Be Concise and Precise: Synthesizing Open-Domain Entity Descriptions from Facts (WWW 2019)
  31. Knowledge-Enhanced Ensemble Learning for Word Embeddings (WWW 2019)
  32. Effective and Scalable Clustering on Massive Attributed Graphs (WWW 2021)
     • k-attributed graph clustering (k-AGC) groups nodes in G into k disjoint clusters, such that nodes within the same cluster share similar topological and attribute characteristics, while those in different clusters are dissimilar.
  33. Trav-SHACL: Efficiently Validating Networks of SHACL Constraints (WWW 2021)
  34. Sampling from Large Graphs (KDD 2006) [Paper]
  35. A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective [Paper] (TKDE 2021) 🌟
  36. Learning Dynamic User Interest Sequence in Knowledge Graphs for Click-Through Rate Prediction [Paper] (TKDE 2021) 🌟
  37. Rotom: A Meta-Learned Data Augmentation Framework for Entity Matching, Data Cleaning, Text Classification, and Beyond (SIGMOD 2021) 🌟
  38. Alibaba - GraphScope (VLDB 2021/22 industrial) 🌟 Three independent engines: GAIA (NSDI 2021), GRAPE (SIGMOD 2017), AliGraph
  39. vertex central GNN (SIGMOD 2021) James Cheng, CUHK
  40. KungFu: Making Training in Distributed Machine Learning Adaptive (OSDI 2020)
  41. ArangoML Pipeline [GitHub]
  42. FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU (ICML 2023) [Paper]

LLM: From Beginner to Researcher(?)

  1. A Survey of Large Language Models (arXiv, 2023) [Paper] [A good summary with notes; Fig. 3 and Fig. 5 are quite useful]

Note: There are a few valuable survey collections on data processing, management, and collection for AI/LLMs, including:

  1. Awesome-Data-Centric-AI [Github]
  2. Data Management for LLM [Github]
  3. Data-Centric Multimodal LLM [Github]