develop a new name resolution component #1748

Open
mhl opened this issue Aug 28, 2015 · 2 comments
@mhl
Contributor

mhl commented Aug 28, 2015

Our plan for improving name resolution in Pombola (which has been a persistent source of hard-to-fix bugs) was to add name resolution to PopIt. @struan worked on this (I think on this branch), but now that we've decided to stop development of PopIt, this isn't going to be a solution for Pombola any more.

I think the idea of a hosted service to provide a name resolution API based on Popolo data is a good one. (The popit-resolver package used in Pombola has something of the same philosophy.)

Both popit-resolver and @struan's work in popit-api use a similar approach - generate versions of a person's names based on the Popolo data for them (e.g. the initials field, the other names, their party membership, etc.) and store them in Elasticsearch. Retrieval of possible matching people is then a matter of an Elasticsearch query.
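To make the "generate versions of a person's names" step concrete, here is a minimal sketch in Python. It is not the code from popit-resolver or popit-api. It assumes a Popolo-style person dict with the standard `name`, `given_name`, `family_name`, `honorific_prefix` and `other_names` fields, plus an `initials` field, which I'm treating as an optional extra. The function name `name_variants` is made up for illustration.

```python
def name_variants(person):
    """Return sorted, lowercased name variants for a Popolo-style person dict.

    Illustrative only: the 'initials' key is an assumed, optional field.
    """
    variants = set()

    def add(*parts):
        # Join the non-empty parts with single spaces and store in lowercase.
        text = " ".join(p.strip() for p in parts if p and p.strip())
        if text:
            variants.add(text.lower())

    add(person.get("name"))

    family = person.get("family_name")
    if family:
        add(family)
        # Combine each available leading element with the family name.
        for lead in (
            person.get("given_name"),
            person.get("honorific_prefix"),
            person.get("initials"),
        ):
            if lead:
                add(lead, family)

    # Popolo other_names is a list of objects, each with a 'name' key.
    for other in person.get("other_names", []):
        add(other.get("name"))

    return sorted(variants)
```

Each variant would then be indexed against the person's ID, so that a query for any of the forms can lead back to the same person.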

A hosted version of this service would potentially be helpful to other groups as well: a new instance could be created from a URL serving Popolo data in some serialization, and would regularly sync from that URL. After that, they'd have a simple API to help with fuzzy matching of names from parliamentary transcripts.

This service could use django-popolo to store the Popolo data; if so, in a post-#1594 world, this service could have two modes of use like SayIt's - either as a Django application used directly, or the hosted service used over an HTTP-based API.

As a developer trying to build a parliamentary monitoring site
I want to be able to easily find the person referred to by a name (+ optional party) in a parliamentary transcript on a particular date
So that I can find all the speeches by particular politicians
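As a rough illustration of that story, the lookup could have a shape like the sketch below. This is only a sketch under stated assumptions: it uses Python's standard-library `difflib.SequenceMatcher` as a stand-in for Elasticsearch scoring, and the `resolve` function, the candidate dict layout (`id`, `variants`, `memberships`) and the `threshold` default are all hypothetical. Dates are assumed to be full ISO 8601 strings (`YYYY-MM-DD`), so they can be compared as plain strings.

```python
from difflib import SequenceMatcher


def active_on(membership, date):
    """True if the membership covers the given ISO date (open-ended if unset)."""
    start = membership.get("start_date") or "0000-01-01"
    end = membership.get("end_date") or "9999-12-31"
    return start <= date <= end


def resolve(name, date, candidates, party=None, threshold=0.8):
    """Return candidate IDs whose name variants fuzzily match, best first.

    If party is given, only people with a membership of that organization
    active on the date are considered.
    """
    query = name.strip().lower()
    results = []
    for cand in candidates:
        if party is not None:
            parties = {
                m["organization_id"]
                for m in cand["memberships"]
                if active_on(m, date)
            }
            if party not in parties:
                continue
        # Score the candidate by its best-matching name variant.
        score = max(
            (SequenceMatcher(None, query, v).ratio() for v in cand["variants"]),
            default=0.0,
        )
        if score >= threshold:
            results.append((score, cand["id"]))
    # Highest score first; ties broken by ID for stable output.
    return [pid for _, pid in sorted(results, key=lambda r: (-r[0], r[1]))]
```

The party and date act as a hard filter here, and the name match is the only thing that is scored. Whether the party should instead just boost the score is a design question this sketch doesn't try to answer.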

Related: #1535

@tmtmtmtm
Contributor

Is ElasticSearch actually the right tool for this, or was it just the most convenient originally, because all the PopIt data was already there? This feels like quite a heavy-weight approach, and I'm curious as to whether that's because a simpler version would just end up reimplementing lots of ElasticSearch anyway, or whether there might be a better approach now that we can rethink it from an empty slate.

@struan
Member

struan commented Sep 8, 2015

I'm not sure if it's the right tool, but it was certainly picked for PopIt because we were already using it. However, it does have a chunk of stuff for scoring matches built in, although it's not clear to me whether that's a blessing: it was fine until the magic didn't work as you expected, and then working out why was a pain.
