Update unicode fixer script #12435

Merged
mekarpeles merged 5 commits into internetarchive:master from jimchamp:update-unicode-fixer-script
Apr 27, 2026

Conversation

@jimchamp
Collaborator

Follows #12224

Makes the following updates to fix_unicode_html_entities.py:

  • Makes the script executable
  • Adds a Python 3 shebang to the script
  • Runs web.ctx.site.save as ImportBot
  • Adds the unicode-fixup event name to save transactions

Technical

RunAs may still be broken when used in a scripting context. If so, I may try using openlibrary/api.py to update records.
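
For readers unfamiliar with the pattern: a helper like RunAs is usually a context manager that swaps the acting user for the duration of a block and restores it afterward. The sketch below is a stand-in, not Open Library's implementation. The names `ctx`, `run_as`, and `current_author` are hypothetical and exist only for illustration. It shows that this kind of helper depends on whatever context state is present when it is called, which may be relevant to the scripting concern above.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Hypothetical stand-in for a per-request context object (not web.ctx itself).
ctx = SimpleNamespace(user=None)


@contextmanager
def run_as(username):
    """Temporarily act as `username`, restoring the previous user on exit."""
    previous = ctx.user
    ctx.user = username
    try:
        yield
    finally:
        # Restore even if the body raises, so later saves are not misattributed.
        ctx.user = previous


def current_author():
    """Return the user that a save would be attributed to in this sketch."""
    return ctx.user


with run_as("ImportBot"):
    author_inside = current_author()
author_after = current_author()
```

In this sketch the user is "ImportBot" only inside the `with` block. If the context object is never populated in a script, the real helper may have nothing meaningful to swap or restore.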

Testing

Screenshot

Stakeholders

Copilot AI review requested due to automatic review settings April 23, 2026 23:16
Contributor

Copilot AI left a comment

Pull request overview

Updates the one-off migration script that detects and fixes HTML-escaped Unicode entities in OL dump-derived records. The changes make writes easier to audit and ensure edits are attributed to a bot account.

Changes:

  • Adds a Python 3 shebang so the script can be executed directly.
  • Wraps record saves in RunAs('ImportBot') so changes are authored by ImportBot.
  • Adds a specific action="unicode-fixup" to saved transactions for easier filtering/analysis in change logs.
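
As a rough illustration of the kind of detection the overview describes, the sketch below uses the standard library's `html.unescape` to find text fields that contain HTML entities and to build an `updates` dict. It is not the logic of fix_unicode_html_entities.py: the function name `find_entity_fixes`, the sample record, and the choice to handle only top-level strings and lists of strings are all assumptions made for this example.

```python
import html


def find_entity_fixes(record):
    """Return {field: fixed_value} for fields whose text contains HTML entities."""
    updates = {}
    for field, value in record.items():
        if isinstance(value, str):
            fixed = html.unescape(value)
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            fixed = [html.unescape(v) for v in value]
        else:
            continue  # nested or non-text values are skipped in this sketch
        if fixed != value:
            updates[field] = fixed
    return updates


# Made-up sample record; the key and field values are illustrative only.
record = {
    "key": "/works/OL1W",
    "title": "Caf&eacute; &amp; Bar",
    "subjects": ["Po&#233;sie", "Art"],
}
updates = find_entity_fixes(record)
record.update(updates)
```

Only fields that actually change end up in `updates`, so a record with no entities would produce an empty dict and could be skipped without a save.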

Comment thread scripts/migrations/fix_unicode_html_entities.py
 data.update(updates)
-web.ctx.site.save(data, comment="Fix HTML entity encoding in Unicode fields")
+with RunAs("ImportBot"):
+    web.ctx.site.save(data, comment="Fix HTML entity encoding in Unicode fields", action="unicode-fixup")
Collaborator Author

Is unicode-fixup a suitable event name for these updates? We could also get the record type from the key and use something like f"edit-{type}".
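
A minimal sketch of the alternative raised in this question, deriving an `edit-{type}` name from the record key. The mapping from key prefix to type name, the helper name `action_for_key`, and the fallback to `unicode-fixup` are all hypothetical. The PR does not say which type names would be used.

```python
# Hypothetical mapping from key prefix to record type; not taken from the PR.
KEY_PREFIX_TO_TYPE = {"works": "work", "books": "edition", "authors": "author"}


def action_for_key(key, default="unicode-fixup"):
    """Build an event name like 'edit-work' from a key such as '/works/OL1W'."""
    prefix = key.strip("/").split("/")[0]
    record_type = KEY_PREFIX_TO_TYPE.get(prefix)
    # Fall back to the generic name when the prefix is not recognized.
    return f"edit-{record_type}" if record_type else default
```

Per-type names would split these fixes across several event names in change logs, whereas a single `unicode-fixup` name keeps them filterable as one batch.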

Comment thread scripts/migrations/fix_unicode_html_entities.py Outdated
Co-authored-by: Tom Morris <tfmorris@gmail.com>
@mekarpeles mekarpeles merged commit abdf170 into internetarchive:master Apr 27, 2026
3 checks passed

4 participants