costezki
🍉 in progress ...

Stars

Deduplication

20 repositories

A powerful and modular toolkit for record linkage and duplicate detection in Python

Python 1,047 154 Updated Feb 21, 2024

🆔 A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.

Python 4,441 572 Updated Jul 29, 2025

Scalable identity resolution, entity resolution, data mastering and deduplication using ML

Java 1,161 157 Updated Mar 12, 2026

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

Python 2,004 218 Updated Mar 12, 2026

A list of free data matching and record linkage software.

401 42 Updated Feb 21, 2024

PyTorch library for transforming entities like companies, products, etc. into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors.

Jupyter Notebook 161 16 Updated Nov 18, 2022

Record Linkage ToolKit (Find and link entities)

Python 111 23 Updated Aug 14, 2023

ReFinED is an efficient and accurate entity linking (EL) system.

Python 234 52 Updated Dec 13, 2024

A Python script for generating duplicate data to test the performance of record linkage and master data management systems.

Python 7 2 Updated Jun 12, 2024

Entity resolution for Elasticsearch.

Java 167 29 Updated Mar 1, 2026

🐍 Python Implementation and Extension of RDF2Vec

Python 267 53 Updated Mar 1, 2026

PyTorch Implementation of RDF2Vec

Python 8 3 Updated Nov 2, 2021

euBusinessGraph Company Data Model

HTML 49 12 Updated May 26, 2025

Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for docum…

Shell 1,141 195 Updated Apr 19, 2025

OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more.

Java 124 24 Updated Jun 18, 2025

An interpretable machine learning pipeline over knowledge graphs

Jupyter Notebook 27 2 Updated Apr 30, 2025

Open Source REST API for named entity extraction, named entity linking, named entity disambiguation, recommendation & reconciliation of entities like persons, organizations and places for (semi)aut…

Python 200 35 Updated Oct 9, 2022

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

C++ 25,371 864 Updated Mar 11, 2026

Entity Disambiguation as text extraction (ACL 2022)

Python 182 13 Updated Apr 17, 2022

An open-source library that leverages Python’s data science ecosystem to build powerful end-to-end Entity Resolution workflows.

Python 89 12 Updated Nov 3, 2025