Batch match organization names to ISIL #55

Closed
acka47 opened this Issue Jul 15, 2015 · 6 comments

@acka47
Contributor
acka47 commented Jul 15, 2015

@hauschke has a list of ~1100 organizations he wants to get the ISIL for. He would like to do this with OpenRefine, but we don't offer an OpenRefine reconciliation service (see also lobid/lodmill#168). I guess the API already supports this use case if you have the needed programming skills.

We will either have to provide a reconciliation service or create an example script that lets @hauschke run the matching. He would provide us with a CSV list of the names for this.

@acka47
Contributor
acka47 commented Jul 22, 2015

@hauschke sent a CSV with library names, postal codes and, where available, DBS IDs. You can find it here: https://gist.github.com/acka47/9bdc24359fe811e90026

@acka47
Contributor
acka47 commented Aug 6, 2015

@fsteeg: Do you want to take this one over?

@fsteeg fsteeg added the ready label Aug 6, 2015
@fsteeg fsteeg assigned fsteeg and unassigned philboeselager Aug 6, 2015
@fsteeg fsteeg added working and removed ready labels Aug 6, 2015
@fsteeg
Contributor
fsteeg commented Aug 11, 2015

Deployed a first take on a reconciliation service for lobid-organisations:

http://beta.lobid.org/organisations/reconcile

I have not worked with OpenRefine before, so I'm not sure this all makes sense, but here is what I did to test the service (in OpenRefine v2.6-rc1):

  • Open bib_ohne_sigel.csv (from https://gist.github.com/acka47/9bdc24359fe811e90026) -> Next -> Create Project
  • On the bibliothek column, drop down menu -> Reconcile -> Start Reconciling
  • Add Standard Service -> paste http://beta.lobid.org/organisations/reconcile -> Add Service
  • Click new lobid-organisations entry to close pane on the left
  • Check dbs-id, As Property: a (arbitrary, but needs to be set)
  • Check plz, As Property: b (arbitrary, but needs to be set)
  • Click Start Reconciling
  • For each bibliothek cell OpenRefine now lists the suggested candidates
  • Click on the candidates to view their lobid-organisations JSON content
  • Deselect the lowest scoring suggestions with the slider on the left (I selected 0.17 - 2.59, resulting in 1015 rows of 1027 total)
  • On the bibliothek column drop down menu -> Edit column -> Add column based on this column
  • New column name: hbz-id (entries with no Sigel get DBS-<dbs-id> ids in lobid-organisations)
  • Expression: cell.recon.best.id -> OK
  • On the left, select all again (to get back the rows where we did not add the id)
  • In the upper right, Export -> Comma-separated values
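For anyone who would rather script the matching than click through OpenRefine, here is a minimal sketch of how the same input could be turned into a batch of reconciliation queries. It assumes the service accepts the usual OpenRefine-style batch format (a JSON object of `q0`, `q1`, ... entries sent as a `queries` form parameter), which I have not verified outside OpenRefine. The column names `bibliothek`, `plz` and `dbs-id` and the property ids `a` and `b` come from the steps above; `build_queries`, the comma delimiter and the sample rows are made up for illustration.

```python
import csv
import io
import json


def build_queries(csv_text, name_col="bibliothek",
                  prop_cols=(("dbs-id", "a"), ("plz", "b"))):
    """Turn CSV rows into a reconciliation batch: {"q0": {...}, "q1": {...}}.

    Hypothetical helper: the property ids "a" and "b" are as arbitrary
    here as they are in the OpenRefine dialog.
    """
    queries = {}
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        # Only pass along properties that actually have a value in this row.
        props = [{"pid": pid, "v": row[col]}
                 for col, pid in prop_cols if row.get(col)]
        queries[f"q{i}"] = {"query": row[name_col], "properties": props}
    return queries


# Invented sample rows, not taken from bib_ohne_sigel.csv.
SAMPLE_CSV = (
    "bibliothek,plz,dbs-id\n"
    "Stadtbibliothek Beispielstadt,12345,\n"
    "Gemeindebücherei Musterdorf,54321,XY999\n"
)

queries = build_queries(SAMPLE_CSV)
# Assumed request shape: the batch goes out as one form field named "queries".
form_payload = {"queries": json.dumps(queries)}
print(json.dumps(queries, indent=2, ensure_ascii=False))
```

Sending `form_payload` to the service URL is left out on purpose, so the sketch runs without network access.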

The exported CSV file contains all original rows, with 1015 of 1027 now containing a hbz-id. I've uploaded it here: https://gist.github.com/fsteeg/df41a245b9ee404ef036
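The score filtering and the `cell.recon.best.id` step can also be approximated in a script. The sketch below assumes the response follows the usual reconciliation format, where each query key maps to a `result` list of candidates with `id` and `score` fields; `best_ids`, the default threshold (borrowed from the lower slider bound I used above) and the sample response are illustrative, not output from the service.

```python
def best_ids(response, min_score=0.17):
    """Pick the highest-scoring candidate id per query, or None.

    Hypothetical helper: roughly what the score slider plus
    cell.recon.best.id do in OpenRefine, under the assumed response shape.
    """
    best = {}
    for key, entry in response.items():
        candidates = entry.get("result", [])
        if not candidates:
            best[key] = None  # the service suggested nothing for this row
            continue
        top = max(candidates, key=lambda c: c["score"])
        # Drop matches that score below the chosen cut-off.
        best[key] = top["id"] if top["score"] >= min_score else None
    return best


# Invented response with placeholder ids, for illustration only.
SAMPLE_RESPONSE = {
    "q0": {"result": [{"id": "EXAMPLE-1", "score": 2.1},
                      {"id": "EXAMPLE-2", "score": 0.4}]},
    "q1": {"result": [{"id": "EXAMPLE-3", "score": 0.05}]},
    "q2": {"result": []},
}

print(best_ids(SAMPLE_RESPONSE))
```

Rows that end up with `None` correspond to the rows without an hbz-id in the exported file and would need a manual look.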

@hauschke Is this usable for your use case? Anything missing or wrong?

@fsteeg fsteeg assigned acka47 and unassigned fsteeg Aug 11, 2015
@fsteeg fsteeg added review and removed working labels Aug 11, 2015
@hauschke

Thank you very much, it works like a charm! 😄

@acka47
Contributor
acka47 commented Aug 24, 2015

As @hauschke is satisfied, a +1 from me. I haven't tried it myself, though.

@acka47 acka47 added deploy and removed review labels Aug 24, 2015
@acka47 acka47 assigned fsteeg and unassigned acka47 Aug 24, 2015
@fsteeg
Contributor
fsteeg commented Aug 24, 2015

Great, happy it works for you @hauschke. Closing.

@fsteeg fsteeg closed this Aug 24, 2015
@fsteeg fsteeg removed the deploy label Aug 24, 2015
@acka47 acka47 added the deploy label Aug 24, 2015