Welcome to Spark Matcher's documentation!

Spark Matcher is a scalable entity matching algorithm implemented in PySpark. With Spark Matcher, the user can train an algorithm to solve a custom matching problem. Spark Matcher uses active learning (modAL) to train a classifier (scikit-learn) that decides whether two records refer to the same entity. Comparing every record with every other record produces a number of pairs that grows quadratically (N^2) with table size, so blocking is implemented to reduce the number of candidate pairs. Because the implementation is done in PySpark, Spark Matcher can handle very large tables.
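To make the effect of blocking concrete, here is a minimal pure-Python sketch. It is not Spark Matcher's API: the toy records, the helper names (`all_pairs`, `blocked_pairs`, `last_token`) and the choice of blocking key are all assumptions made for illustration. It only shows how grouping records by a key before pairing cuts the number of candidate pairs.

```python
from collections import defaultdict
from itertools import combinations

# Toy records (id, name); purely illustrative data.
records = [
    (1, "john smith"),
    (2, "jon smith"),
    (3, "jane doe"),
    (4, "janet doe"),
    (5, "bob brown"),
    (6, "john smyth"),
]


def all_pairs(rows):
    """Every unordered pair of records: N * (N - 1) / 2 comparisons."""
    return list(combinations(rows, 2))


def blocked_pairs(rows, key):
    """Only pair records that share the same blocking key."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[key(row)].append(row)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


def last_token(row):
    """Hypothetical blocking rule: the last word of the name."""
    return row[1].split()[-1]


full = all_pairs(records)
blocked = blocked_pairs(records, last_token)
print(len(full), len(blocked))
```

With six records, the full comparison yields 15 pairs, while this blocking rule leaves 2. The trade-off is visible in the same data: "john smith" and "john smyth" land in different blocks and are never compared, so the choice of blocking rule determines which true matches can still be found.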

installation_guide
example.ipynb
api/modules

Indices and tables

genindex
modindex
search
