Add script for unlinking some bad agents. #1153

Merged: 3 commits, merged on Sep 27, 2022


jannistsiroyannis
Contributor

No description provided.

@niklasl
Member

niklasl commented Sep 15, 2022

Is LXL-3919 the correct issue for this?

@jannistsiroyannis
Contributor Author

jannistsiroyannis commented Sep 15, 2022

> Is LXL-3919 the correct issue for this?

Urgh... nope :P It's LXL-3913.

@jannistsiroyannis
Contributor Author

I also have a sneaking suspicion that maybe we shouldn't just remove them, but replace them with local entities?

@niklasl
Member

niklasl commented Sep 15, 2022

Ah, check. Yes, it's only the linkage that's to be removed; "some" local entity data should be kept, presumably only names and possibly lifeSpan (although lifeSpan might be wrong if it was added to the linked entity after the original match). Ask for clarification, i.e. a spec of which fields to keep.
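A minimal sketch of "remove the link, keep some local data", written in Java rather than the script's Groovy. It assumes the linked agent is available as a JSON-LD-style map and copies only a whitelist of fields into a new entity that has no `@id`, so the linkage is gone. `KEPT_FIELDS` and every field name in it other than `lifeSpan` are hypothetical placeholders pending the spec asked for above, and the sample agent is made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LocalEntitySketch {
    // HYPOTHETICAL whitelist pending a spec: names plus (tentatively) lifeSpan.
    static final List<String> KEPT_FIELDS =
            List.of("@type", "givenName", "familyName", "name", "lifeSpan");

    // Copies only whitelisted fields. "@id" is never copied, so the result
    // is a local (unlinked) entity rather than a reference.
    static Map<String, Object> toLocalEntity(Map<String, Object> linkedAgent) {
        Map<String, Object> local = new LinkedHashMap<>();
        for (String field : KEPT_FIELDS) {
            if (linkedAgent.containsKey(field)) {
                local.put(field, linkedAgent.get(field));
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        Map<String, Object> agent = new LinkedHashMap<>();
        agent.put("@id", "https://libris/53hlt8kp5swj700#it");
        agent.put("@type", "Person");
        agent.put("givenName", "Anna");
        agent.put("familyName", "Example");
        agent.put("lifeSpan", "1900-1980");
        agent.put("description", "not kept");
        System.out.println(toLocalEntity(agent));
        // prints {@type=Person, givenName=Anna, familyName=Example, lifeSpan=1900-1980}
    }
}
```

Whether `lifeSpan` belongs in the whitelist is exactly the open question in the comment, since it may have been enriched on the linked entity after the original match.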

```groovy
boolean isABadLink(String candidate) {
    String s = candidate.substring(0, candidate.length()-3) // Trim off the #it
    System.err.println(candidate + " -> " + s)
    return s.endsWith("53hlt8kp5swj700") ||
    // … (review excerpt ends here)
```
Member


Nitpick: it would be safer (and slightly faster) to put these IDs in a Set and use that both here and to build the select string.
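A minimal Java sketch of the "one Set as the single source of truth" idea: the same `BAD_IDS` set drives both the check and the query fragment, so the two lists cannot drift apart. To avoid parsing the varying base URI it keeps the original suffix match (so it gains safety, not the O(1) lookup; a `contains` lookup would first require extracting the bare ID). The second ID and the quoted `IN (...)`-style list are hypothetical; the script's real ID list and query format are not shown in the thread.

```java
import java.util.Set;
import java.util.stream.Collectors;

class BadLinkSet {
    // HYPOTHETICAL contents: the real list of bad IDs lives in the script.
    static final Set<String> BAD_IDS = Set.of("53hlt8kp5swj700", "hypotheticalid02");
    private static final String FRAGMENT = "#it";

    // Suffix match, so the varying base URI never has to be parsed.
    static boolean isABadLink(String candidate) {
        if (!candidate.endsWith(FRAGMENT)) {
            return false; // guard: blindly trimming 3 chars assumes a "#it" suffix
        }
        String s = candidate.substring(0, candidate.length() - FRAGMENT.length());
        // Requiring the '/' stops e.g. ".../x53hlt8kp5swj700" from matching.
        return BAD_IDS.stream().anyMatch(id -> s.endsWith("/" + id));
    }

    // The same Set feeds the query, e.g. a HYPOTHETICAL SQL "IN (...)" list.
    static String selectList() {
        return BAD_IDS.stream()
                .sorted()
                .map(id -> "'" + id + "'")
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(isABadLink("https://libris-qa/53hlt8kp5swj700#it")); // true
        System.out.println(selectList()); // '53hlt8kp5swj700', 'hypotheticalid02'
    }
}
```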

Contributor Author


To use a Set I'd instead have to shave off the `https://libris{-qa,-stg}/` prefix (which varies in length), which could go wrong. I think I'll keep this as is!

Member


Or use `candidate.substring(candidate.lastIndexOf('/') + 1, candidate.lastIndexOf('#'))`?
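A Java sketch of the `lastIndexOf` suggestion, showing that it extracts the bare ID regardless of how long the base URI is, so an exact `Set.contains` lookup becomes possible. The three base URIs are literal expansions of the thread's `https://libris{-qa,-stg}/` shorthand, not the real hostnames, and `BAD_IDS` is a hypothetical stand-in. The explicit guard is an addition: without it, a URI lacking `#` makes `substring` throw an index exception with a less helpful message.

```java
import java.util.Set;

class IdByIndex {
    // HYPOTHETICAL contents: the real list of bad IDs lives in the script.
    static final Set<String> BAD_IDS = Set.of("53hlt8kp5swj700");

    // Returns the text between the last '/' and the last '#'.
    static String extractId(String candidate) {
        int slash = candidate.lastIndexOf('/');
        int hash = candidate.lastIndexOf('#');
        if (hash <= slash) { // no '#' after the last path segment
            throw new IllegalArgumentException("Unexpected URI: " + candidate);
        }
        return candidate.substring(slash + 1, hash);
    }

    static boolean isABadLink(String candidate) {
        return BAD_IDS.contains(extractId(candidate));
    }

    public static void main(String[] args) {
        for (String base : new String[] {"https://libris/", "https://libris-qa/", "https://libris-stg/"}) {
            System.out.println(extractId(base + "53hlt8kp5swj700#it")); // 53hlt8kp5swj700 each time
        }
    }
}
```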

Contributor


`candidate.split('[#/]')[-2]` should work too!
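For readers less familiar with Groovy: the one-liner splits on every `#` and `/` and takes the second-to-last piece, which is the ID sitting just before the `#it` fragment. Java has no negative indexing, so `[-2]` becomes `parts[parts.length - 2]`. This sketch assumes every candidate ends in a fragment; for a URI without one, the second-to-last piece would be the host segment, not the ID. The example URI uses the thread's `libris-stg` shorthand, not a real hostname.

```java
import java.util.Arrays;

class IdBySplit {
    // Groovy's parts[-2] is parts[parts.length - 2] in Java.
    static String extractId(String candidate) {
        String[] parts = candidate.split("[#/]");
        return parts[parts.length - 2];
    }

    public static void main(String[] args) {
        String uri = "https://libris-stg/53hlt8kp5swj700#it";
        System.out.println(Arrays.toString(uri.split("[#/]")));
        // prints [https:, , libris-stg, 53hlt8kp5swj700, it]
        System.out.println(extractId(uri)); // 53hlt8kp5swj700
    }
}
```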

Member


Yes, that is certainly much more succinct, albeit somewhat less performant: it parses a regexp (unless that is cached under the hood), splits on every `#` and `/`, and builds a new list of those substrings. Though I bet any performance loss is dwarfed by the I/O going on here.
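On the "unless cached" question: Groovy's `split(String)` on a string is the JDK's `String.split`, and in OpenJDK that method only takes a fast path for simple one-character separators, so a character class like `[#/]` is compiled into a fresh `Pattern` on every call. If the cost ever mattered, a minimal Java sketch is to compile the pattern once and reuse it. As the comment says, any gain is likely negligible next to the I/O; this only illustrates the technique.

```java
import java.util.regex.Pattern;

class IdByPattern {
    // Compiled once and reused; String.split("[#/]") would compile it per call.
    private static final Pattern SEPARATORS = Pattern.compile("[#/]");

    // Pattern.split drops trailing empty strings, just like String.split.
    static String extractId(String candidate) {
        String[] parts = SEPARATORS.split(candidate);
        return parts[parts.length - 2];
    }

    public static void main(String[] args) {
        System.out.println(extractId("https://libris/53hlt8kp5swj700#it")); // 53hlt8kp5swj700
    }
}
```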

jannistsiroyannis merged commit 56efbcb into develop on Sep 27, 2022
jannistsiroyannis deleted the feature/lxl-3919 branch on September 27, 2022 09:44
Labels: None yet
Projects: None yet
3 participants