Batch match organization names to ISIL #55

Closed
acka47 opened this Issue Jul 15, 2015 · 6 comments

acka47 commented Jul 15, 2015

@hauschke has a list of ~1100 organizations he wants to get the ISIL for. He would like to do this with OpenRefine, but we don't offer an OpenRefine reconciliation service (see also lobid/lodmill#168). I guess the API already supports this use case if you have the necessary programming skills.

We will either have to provide a reconciliation service or create an example script that lets @hauschke run the matching himself. He would provide us with a CSV list of the names for this.

acka47 commented Jul 22, 2015

@hauschke sent a CSV with library names, postal codes and, where available, DBS IDs. You can find it here: https://gist.github.com/acka47/9bdc24359fe811e90026

acka47 commented Aug 6, 2015

@fsteeg: Do you want to take this one over?

fsteeg added the ready label Aug 6, 2015

fsteeg assigned fsteeg and unassigned philboeselager Aug 6, 2015

fsteeg added working and removed ready labels Aug 6, 2015

fsteeg added a commit that referenced this issue Aug 11, 2015

fsteeg commented Aug 11, 2015

Deployed a first take on a reconciliation service for lobid-organisations:

http://beta.lobid.org/organisations/reconcile

I have not worked with OpenRefine before, so I'm not sure this all makes sense, but here is what I did to test the service (in OpenRefine v2.6-rc1):

  • Open bib_ohne_sigel.csv (from https://gist.github.com/acka47/9bdc24359fe811e90026) -> Next -> Create Project
  • On the bibliothek column, drop down menu -> Reconcile -> Start Reconciling
  • Add Standard Service -> paste http://beta.lobid.org/organisations/reconcile -> Add Service
  • Click new lobid-organisations entry to close pane on the left
  • Check dbs-id, As Property: a (arbitrary, but needs to be set)
  • Check plz, As Property: b (arbitrary, but needs to be set)
  • Click Start Reconciling
  • For each bibliothek cell OpenRefine now lists the suggested candidates
  • Click on the candidates to view their lobid-organisations JSON content
  • Deselect the lowest scoring suggestions with the slider on the left (I selected 0.17 - 2.59, resulting in 1015 rows of 1027 total)
  • On the bibliothek column drop down menu -> Edit column -> Add column based on this column
  • New column name: hbz-id (entries with no Sigel get DBS-<dbs-id> ids in lobid-organisations)
  • Expression: cell.recon.best.id -> OK
  • On the left, select all again (to get back the rows where we did not add the id)
  • In the upper right, Export -> Comma-separated values
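
For anyone who wants to script this instead of clicking through OpenRefine, here is a minimal sketch of how the same batch could be built as a reconciliation request. It assumes the service accepts the standard OpenRefine reconciliation `queries` parameter, which this thread does not confirm. The property ids `a` and `b` mirror the arbitrary names chosen above. The sample rows and the helper names `build_queries` and `encode_request` are hypothetical.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the comment above; that it accepts standard
# reconciliation "queries" batches is an assumption.
SERVICE_URL = "http://beta.lobid.org/organisations/reconcile"


def build_queries(rows):
    """Turn CSV rows into one batch of reconciliation queries.

    Each row is a dict with the columns used above: bibliothek, dbs-id, plz.
    """
    queries = {}
    for i, row in enumerate(rows):
        properties = []
        # Property ids "a" and "b" mirror the arbitrary names set in OpenRefine.
        for pid, column in (("a", "dbs-id"), ("b", "plz")):
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells, e.g. libraries without a DBS ID
                properties.append({"pid": pid, "v": value})
        queries[f"q{i}"] = {"query": row["bibliothek"], "properties": properties}
    return queries


def encode_request(queries):
    """Form-encode the batch as the 'queries' parameter of a POST body."""
    return urlencode({"queries": json.dumps(queries)}).encode("utf-8")


# Hypothetical sample rows, not taken from the real CSV.
sample_rows = [
    {"bibliothek": "Stadtbibliothek Beispielstadt", "dbs-id": "XX001", "plz": "00000"},
    {"bibliothek": "Gemeindebücherei Musterdorf", "dbs-id": "", "plz": "11111"},
]
batch = build_queries(sample_rows)
body = encode_request(batch)
```

Posting `body` to `SERVICE_URL` (for example with `urllib.request.urlopen(SERVICE_URL, data=body)`) should then, if the service follows the protocol, return a JSON object keyed by `q0`, `q1`, ... with a `result` list of candidates for each query.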

The exported CSV file contains all original rows, with 1015 of 1027 now containing an hbz-id. I've uploaded it here: https://gist.github.com/fsteeg/df41a245b9ee404ef036
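
The score filter and the `cell.recon.best.id` expression could be approximated outside OpenRefine roughly as follows. This is only a sketch: it assumes each candidate is a dict with `id` and `score` keys, as in standard reconciliation results. The function names and the sample candidates are hypothetical. Only the score range 0.17 - 2.59 is taken from the steps above.

```python
def best_candidate(candidates):
    """Return the highest scoring candidate, or None if there are none."""
    return max(candidates, key=lambda c: c["score"], default=None)


def add_hbz_ids(rows, results, min_score=0.17, max_score=2.59):
    """Add an 'hbz-id' key to every row and return the number of matches.

    results maps a row index to its list of candidates. Rows without an
    acceptable best candidate keep an empty id, so no row is dropped.
    """
    matched = 0
    for i, row in enumerate(rows):
        best = best_candidate(results.get(i, []))
        if best is not None and min_score <= best["score"] <= max_score:
            row["hbz-id"] = best["id"]
            matched += 1
        else:
            row["hbz-id"] = ""
    return matched


# Hypothetical rows and candidates, for illustration only.
rows = [
    {"bibliothek": "Stadtbibliothek Beispielstadt"},
    {"bibliothek": "Gemeindebücherei Musterdorf"},
    {"bibliothek": "Bibliothek Ohne Treffer"},
]
results = {
    0: [{"id": "DBS-XX001", "score": 1.8}, {"id": "DBS-XX002", "score": 0.4}],
    1: [{"id": "DBS-XX003", "score": 0.05}],  # below the selected range
}
matched = add_hbz_ids(rows, results)
print(f"{matched} of {len(rows)} rows received an hbz-id")
```

As in the exported file, every input row is kept and only the rows with a best candidate inside the selected score range get an id.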

@hauschke Is this usable for your use case? Anything missing or wrong?

hauschke commented Aug 24, 2015

Thank you very much, it works like a charm! 😄

acka47 commented Aug 24, 2015

As @hauschke is satisfied, a +1 from me. I haven't tried it myself, though.

acka47 added deploy and removed review labels Aug 24, 2015

acka47 assigned fsteeg and unassigned acka47 Aug 24, 2015

fsteeg commented Aug 24, 2015

Great, happy it works for you @hauschke. Closing.
