develop a new name resolution component #1748

mhl · 2015-08-28T17:20:32Z

Our plan for improving name resolution in Pombola (which has been a persistent source of hard-to-fix bugs) was to add name resolution to PopIt. @struan worked on this (I think on this branch), but now that we've decided to stop development of PopIt, this isn't going to be a solution for Pombola any more.

I think the idea of a hosted service to provide a name resolution API based on Popolo data is a good one. (The popit-resolver package used in Pombola has something of the same philosophy.)

Both popit-resolver and @struan's work in popit-api use a similar approach - generate versions of a person's names based on the Popolo data for them (e.g. the initials field, the other names, their party membership, etc.) and store them in Elasticsearch. Retrieval of possible matching people is then a matter of an Elasticsearch query.
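As a rough illustration of the variant-generation idea, here is a minimal Python sketch. It is not the popit-resolver or popit-api code. It assumes Popolo-style person dicts with optional `name`, `given_name`, `family_name`, `honorific_prefix` and `other_names` keys, and the helpers `normalise`, `name_variants`, `build_index` and `lookup` are hypothetical. A plain dict stands in for the Elasticsearch index, so only exact matches on a normalised variant are found.

```python
from collections import defaultdict


def normalise(name):
    """Lowercase, treat full stops as spaces and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def name_variants(person):
    """Generate normalised name variants for a Popolo-style person dict (assumed keys)."""
    variants = set()
    if person.get("name"):
        variants.add(person["name"])
    for other in person.get("other_names", []):
        if other.get("name"):
            variants.add(other["name"])
    given = person.get("given_name", "")
    family = person.get("family_name", "")
    if family:
        variants.add(family)
        if given:
            # One initial per given name, e.g. "Jane Mary" -> "J M"
            initials = " ".join(part[0] for part in given.split())
            variants.add(f"{initials} {family}")
            variants.add(f"{given} {family}")
        prefix = person.get("honorific_prefix")
        if prefix:
            variants.add(f"{prefix} {family}")
    return {normalise(v) for v in variants}


def build_index(people):
    """Map each normalised variant to the set of person ids that produce it."""
    index = defaultdict(set)
    for person in people:
        for variant in name_variants(person):
            index[variant].add(person["id"])
    return index


def lookup(index, name):
    """Return the ids of everyone with a variant exactly matching name."""
    return set(index.get(normalise(name), set()))
```

With a person whose `given_name` is "Jane Mary" and `family_name` is "Doe", `lookup(index, "J. M. Doe")` finds that person, while a bare "Doe" returns every Doe in the index; that ambiguity is what party and date information then has to resolve.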

A hosted version of this service would potentially be helpful to other groups as well: a new instance could be created from a URL serving Popolo data in some serialization (and regularly re-synced from that URL), and after that they'd have a simple API to help with fuzzy matching of names from parliamentary transcripts.

This service could use django-popolo to store the Popolo data; if so, in a post-#1594 world, this service could have two modes of use like SayIt's - either as a Django application used directly, or the hosted service used over an HTTP-based API.

As a developer trying to build a parliamentary monitoring site
I want to be able to easily find the person referred to by a name (+ optional party) in a parliamentary transcript on a particular date
So that I can find all the speeches by particular politicians
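One way to make the "on a particular date" part of this concrete is to filter candidate matches down to people with a membership active on the transcript's date, optionally restricted to a party. This is a sketch under assumptions, not an existing Pombola API: it assumes Popolo-style membership dicts with `person_id`, `organization_id` (taken here to identify the party) and full ISO 8601 `start_date`/`end_date` strings, and `active_on` and `filter_candidates` are hypothetical names.

```python
from datetime import date


def active_on(membership, on_date):
    """True if the membership covers on_date.

    Assumes full YYYY-MM-DD strings, which compare correctly as text.
    A missing bound is treated as open-ended. Reduced-precision Popolo
    dates such as "2009" would need extra handling.
    """
    start = membership.get("start_date")
    end = membership.get("end_date")
    day = on_date.isoformat()
    return (not start or start <= day) and (not end or day <= end)


def filter_candidates(candidate_ids, memberships, on_date, party_id=None):
    """Keep candidates with a membership active on on_date.

    If party_id is given, the active membership must also be with that
    organization.
    """
    kept = set()
    for membership in memberships:
        if membership["person_id"] not in candidate_ids:
            continue
        if not active_on(membership, on_date):
            continue
        if party_id is not None and membership.get("organization_id") != party_id:
            continue
        kept.add(membership["person_id"])
    return kept
```

Applied after a name lookup, a filter like this would rule out a same-surnamed politician whose membership ended years before the transcript date, the kind of mix-up described in #1740.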

Related: #1535

tmtmtmtm · 2015-08-31T14:54:37Z

Is Elasticsearch actually the right tool for this, or was it just the most convenient choice originally, because all the PopIt data was already there? This feels like quite a heavyweight approach, and I'm curious whether that's because a simpler version would just end up reimplementing lots of Elasticsearch anyway, or whether there might be a better approach now that we can rethink it from a clean slate.
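For comparison, a lighter-weight alternative might replace Elasticsearch's relevance scoring with a plain string-similarity score. The sketch below uses Python's standard-library `difflib.SequenceMatcher`, which is a swapped-in technique and not what popit-resolver or popit-api do. It assumes an index mapping normalised name variants to sets of person ids, and `fuzzy_lookup` and its `cutoff`/`limit` parameters are hypothetical.

```python
import difflib


def fuzzy_lookup(index, name, cutoff=0.8, limit=5):
    """Return up to `limit` (variant, score, ids) tuples, best score first.

    index maps normalised name variants to sets of person ids. Every
    variant is scored against the query with SequenceMatcher.ratio(),
    and only scores >= cutoff are kept.
    """
    query = " ".join(name.lower().replace(".", " ").split())
    results = []
    for variant, ids in index.items():
        score = difflib.SequenceMatcher(None, query, variant).ratio()
        if score >= cutoff:
            results.append((variant, score, ids))
    results.sort(key=lambda result: result[1], reverse=True)
    return results[:limit]
```

The score for each candidate is easy to inspect when a match looks wrong, but the linear scan over every variant is only practical for a legislature-sized list of names, and the ranking is far cruder than what a search engine provides.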

struan · 2015-09-08T09:59:13Z

I'm not sure if it's the right tool, but it was certainly picked for PopIt because we were already using it. That said, it does have a lot of match-scoring functionality built in, although I'm not sure that's a blessing: it was fine until the magic didn't work as you expected, and then working out why was a pain.

mhl mentioned this issue Aug 28, 2015

[ZA] Investigate why Andrew Ntwampe Morwamoche's profile is showing appearances from Mr Morwamoche of a different party from many years ago #1740

Closed

mhl added the Difficulty 5 label Sep 15, 2015

tmtmtmtm mentioned this issue Feb 24, 2016

ZA: go through most frequent name resolution failures for committees #1536

Open

mhl mentioned this issue Feb 24, 2016

[placeholder] scope what our ideal name resolution / scraping pipeline architecture would look like for Pombola #1909

Closed

tmtmtmtm mentioned this issue Feb 24, 2016

KE: add start and end dates to aliases #1583

Open

tmtmtmtm added the new app label Feb 24, 2016

chrismytton removed the Difficulty 5 label Nov 7, 2018
