Add script for unlinking some bad agents. #1153

jannistsiroyannis · 2022-09-14T14:06:44Z

No description provided.

niklasl · 2022-09-15T07:14:54Z

Is LXL-3919 the correct issue for this?

jannistsiroyannis · 2022-09-15T08:43:56Z

> Is LXL-3919 the correct issue for this?

Urgh.. nope :P It's LXL-3913

jannistsiroyannis · 2022-09-15T08:47:56Z

I also have a sneaking suspicion that maybe we shouldn't just remove them, but replace them with local entities?

niklasl · 2022-09-15T10:00:27Z

Ah, check. Yes, it's only the linkage that's to be removed; "some" local entity data should be kept (presumably only names and possibly lifeSpan, albeit that might be wrong if it was added on the linked entity after the original match). Ask for clarification/spec of fields.
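A rough sketch of what "keep some local entity data" could look like, assuming the linked agent is available as a plain map. `KEPT_KEYS`, `toLocalEntity`, the chosen property names, and the sample data are all hypothetical; the actual field spec was still to be clarified at this point.

```groovy
import groovy.transform.Field

// Hypothetical list of properties to carry over to the local entity.
// The real list depends on the spec asked for above.
@Field final List<String> KEPT_KEYS = ['@type', 'name', 'givenName', 'familyName', 'lifeSpan']

// Build a local (unlinked) entity: copy only the kept properties and
// leave out '@id', since the '@id' is what constitutes the link.
Map toLocalEntity(Map linkedAgent) {
    return linkedAgent.findAll { key, value -> key in KEPT_KEYS }
}

// Made-up sample data for illustration only.
def linked = [
    '@id'       : 'https://libris.kb.se/53hlt8kp5swj700#it',
    '@type'     : 'Person',
    'givenName' : 'Example',
    'familyName': 'Person',
    'lifeSpan'  : '1900-1980',
    'sameAs'    : [['@id': 'http://example.org/x']]
]
def local = toLocalEntity(linked)
assert !local.containsKey('@id')
assert local.lifeSpan == '1900-1980'
```

Whether `lifeSpan` belongs in the kept list is exactly the open question raised above, since it may have been added to the linked entity after the original match.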

niklasl · 2022-09-15T10:09:09Z

whelktool/scripts/cleanups/2022/09/lxl-3919-unlink-agents.groovy

+boolean isABadLink(String candidate) {
+    String s = candidate.substring(0, candidate.length()-3) // Trim off the #it
+    System.err.println(candidate + " -> " + s)
+    return s.endsWith("53hlt8kp5swj700") ||


niklasl · 2022-09-15

Nit-pick: safer (and slightly faster) to put these in a Set and use that here and to build the select string.

jannistsiroyannis · 2022-09-15

To use a Set I'd instead have to shave off the https://libris{-qa,-stg}/ prefix (which varies in length), which could go wrong. I think I'll keep this as is!

niklasl · 2022-09-15

Or use candidate.substring(candidate.lastIndexOf('/') + 1, candidate.lastIndexOf('#'))?

kwahlin · 2022-09-16

candidate.split('[#/]')[-2] should work too!

niklasl · 2022-09-16

Yes, that is certainly much more succinct, albeit somewhat less performant: it would parse a regexp (unless that is cached under the hood), split on every /, and build a new list of those substrings. Though I bet any performance loss is dwarfed by the I/O going on here.
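The suggestions in this thread could be combined roughly as follows. This is a minimal sketch, not the code that was merged: `BAD_IDS` and `extractId` are names made up here, the second ID in the set is a placeholder, and the example URIs are illustrative. Only `53hlt8kp5swj700` comes from the diff above.

```groovy
import groovy.transform.Field

// Hypothetical set of bad agent IDs. Only the first value appears in the
// diff; the second is a placeholder for illustration.
@Field final Set<String> BAD_IDS = ['53hlt8kp5swj700', 'placeholder-id'] as Set

// Take the segment between the last '/' and the last '#', so the varying
// host prefix (libris, libris-qa, libris-stg) never has to be measured.
String extractId(String candidate) {
    int slash = candidate.lastIndexOf('/')
    int hash = candidate.lastIndexOf('#')
    if (hash < 0 || hash < slash) {
        return null // no fragment such as '#it' after the last path segment
    }
    return candidate.substring(slash + 1, hash)
}

boolean isABadLink(String candidate) {
    return BAD_IDS.contains(extractId(candidate))
}

assert isABadLink('https://libris-qa.kb.se/53hlt8kp5swj700#it')
assert !isABadLink('https://libris.kb.se/someotherid#it')

// The regexp variant from the thread yields the same ID for this input.
assert 'https://libris-stg.kb.se/53hlt8kp5swj700#it'.split('[#/]')[-2] == '53hlt8kp5swj700'
```

The `lastIndexOf` variant avoids the regexp and the intermediate array, while the `split` variant is shorter; as noted above, the difference is unlikely to matter next to the I/O.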

Add script for unlinking some bad agents.

f35653d

jannistsiroyannis requested review from niklasl, andersju, klngwll, kaipoykio, lrosenstrom and kwahlin September 14, 2022 14:06

niklasl reviewed Sep 15, 2022


jannistsiroyannis added 2 commits September 15, 2022 13:13

Fix cleanup script after review.

bb91ce7

Consider 'sourceConsulted'-info when unlinking agents.

3232cd1

jannistsiroyannis requested a review from niklasl September 26, 2022 12:57

jannistsiroyannis merged commit 56efbcb into develop Sep 27, 2022

jannistsiroyannis deleted the feature/lxl-3919 branch September 27, 2022 09:44


niklasl Sep 15, 2022

jannistsiroyannis Sep 15, 2022

niklasl Sep 15, 2022

kwahlin Sep 16, 2022

niklasl Sep 16, 2022
