Data de-duplication tool
contrib
nomenklatura
.bowerrc
.gitignore
DESIGN.md
Dockerfile
LICENSE
Procfile
README.md
bower.json
docker-compose.yml
requirements.txt
setup.py


nomenklatura

Nomenklatura de-duplicates and integrates different names for entities - people, organisations or public bodies - to help you clean up messy data and to find links between different datasets.

The service will create references for all entities mentioned in a source dataset. It then helps you to define which of these entities are duplicates and what the canonical name for a given entity should be. This information is available in data cleaning tools like OpenRefine or in custom data processing scripts, so that you can automatically apply existing mappings in the future.
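As a rough illustration of the idea of applying existing mappings in a custom processing script, the sketch below resolves names against a small alias table. It does not use nomenklatura's API: the `ALIASES` table, the function names and the `funder` column are all hypothetical, and the normalization step is an assumption made for the example.

```python
# Hypothetical alias table: each known spelling points at its canonical name.
# In practice these mappings would come from the reviewed dataset.
ALIASES = {
    "Open Knowledge Foundation": "Open Knowledge Foundation",
    "OKF": "Open Knowledge Foundation",
    "Open Knowledge Fdn.": "Open Knowledge Foundation",
}


def normalize(name):
    """Collapse whitespace and case so trivially different spellings match."""
    return " ".join(name.split()).casefold()


# Index the alias table by normalized spelling for direct lookups.
LOOKUP = {normalize(alias): canonical for alias, canonical in ALIASES.items()}


def canonical_name(name):
    """Return the canonical name for a known alias, or None if it is unknown."""
    return LOOKUP.get(normalize(name))


def apply_mappings(rows, column):
    """Split rows into resolved ones and ones that still need manual review."""
    resolved, unmatched = [], []
    for row in rows:
        canonical = canonical_name(row[column])
        if canonical is None:
            unmatched.append(row)
        else:
            # Copy the row so the input data is left unchanged.
            resolved.append({**row, column: canonical})
    return resolved, unmatched


if __name__ == "__main__":
    rows = [{"funder": "okf "}, {"funder": "Unknown Org"}]
    resolved, unmatched = apply_mappings(rows, "funder")
    print(resolved)
    print(unmatched)
```

Names that do not resolve are returned separately rather than guessed at, so that they can be reviewed and added to the mappings later.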

nomenklatura focuses on data integration; it does not provide further functionality for the people and organisations that it helps to keep track of.

Contact, contributions etc.

nomenklatura is developed with generous support from Knight-Mozilla OpenNews and the Open Knowledge Foundation Labs. The codebase is licensed under the terms of an MIT license (see LICENSE).

We welcome contributions, bug fixes and feature suggestions; please use the GitHub issue tracker for this repository.