Welcome to Spark Matcher's documentation!

Spark Matcher is a scalable entity matching algorithm implemented in PySpark. With Spark Matcher the user can train an algorithm to solve a custom matching problem. It uses active learning (modAL) to train a classifier (scikit-learn) that decides whether two records refer to the same entity. Comparing every record with every other record scales quadratically (O(N^2)) with table size, so blocking is used to reduce the number of candidate pairs. Because the implementation is done in PySpark, Spark Matcher can handle very large tables.
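To illustrate why blocking helps, here is a minimal sketch in plain Python. It is not Spark Matcher's API: the `blocking_key` rule (first letter of the name), the `candidate_pairs` helper, and the sample records are all assumptions made for illustration. The idea is simply that only records sharing a blocking key are paired, instead of every record with every other record.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking rule: first letter of the lower-cased name."""
    return record["name"].strip().lower()[:1]


def candidate_pairs(records):
    """Yield pairs of record ids that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    for ids in blocks.values():
        # Only records inside the same block are compared with each other.
        yield from combinations(sorted(ids), 2)


# Made-up sample data, for illustration only.
records = [
    {"id": 1, "name": "John Smith"},
    {"id": 2, "name": "Jon Smith"},
    {"id": 3, "name": "Maria Garcia"},
    {"id": 4, "name": "Mario Garcia"},
    {"id": 5, "name": "Wei Chen"},
]

# Without blocking: every record is paired with every other record.
all_pairs = list(combinations([r["id"] for r in records], 2))

# With blocking: only pairs within the same block remain.
blocked_pairs = list(candidate_pairs(records))

print(len(all_pairs), len(blocked_pairs))  # prints "10 2"
```

The trade-off is that a blocking rule which is too strict can separate true matches into different blocks, so the choice of rule matters as much as the reduction in pairs.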

  • installation_guide
  • example.ipynb
  • api/modules

Indices and tables

  • genindex
  • modindex
  • search