Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add messages.pot update script to pre-commit. #8900

Conversation

rebecca-shoptaw
Copy link
Collaborator

@rebecca-shoptaw rebecca-shoptaw commented Mar 12, 2024

Closes #8818.

Feature. Automates generation of new messages.pot file for translation whenever Python or HTML files that affect i18n strings are changed.

Technical

The implementation is in two parts and mainly confined to two files:

  1. pre-commit-config.yaml -- Added a new pre-commit local hook that runs the ./scripts/i18n-messages extract script to generate a new messages.pot file and is activated by changes to Python or HTML files.

Note: This hook has a few dependencies will need to be updated to match the current version when we update Babel, multipart, and/or web.py.

  1. __init__.py in openlibrary/i18n -- Wrote new logic into the message extraction function, using a straightforward symmetric_difference compare between the old and new i18n messages to make sure that messages.pot only gets updated when i18n strings have been added/modified/deleted.

Note: This was done to prevent an infinite loop of failure within the pre-commit hook, as the hook will otherwise update the template time field in messages.pot on every run, and report a failed run as a result of the file modification, which prevents CI autofixing.

This extra logic also had the nice side effect of ensuring that there will be no messages.pot updates for totally unrelated files, as only actual relevant string changes will trigger the re-generation.

Testing

  1. Make a change to a Python or HTML file that adds, removes or modifies i18n-formatted text
  2. Install pre-commit if not installed
  3. Commit the change; pre-commit should automatically generate a new messages.pot file

Stakeholders

@cdrini

@cdrini cdrini marked this pull request as draft March 12, 2024 20:03
@rebecca-shoptaw rebecca-shoptaw force-pushed the 8818/feature/add-pot-updates-to-precommit branch from 40dff61 to 860b4c9 Compare March 12, 2024 20:12
Copy link
Collaborator

@scottbarnes scottbarnes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the SyntaxWarning: invalid escape sequence '\ ' is coming from something that i18n-messages extract is calling, rather than from pre-commit.

I don't think this is the solution, but with the change to use pass_filenames: false instead of files, and the following, pre-commit shows Passed for Generate POT:

diff --git a/scripts/i18n-messages b/scripts/i18n-messages
index 98f28e5d6..4085c927c 100755
--- a/scripts/i18n-messages
+++ b/scripts/i18n-messages
@@ -4,6 +4,10 @@ macros and write to openlibrary/i18n/messages.pot file.
 """
 import _init_path  # noqa: F401  Imported for its side effect of setting PYTHONPATH
 
+import warnings
+
+warnings.simplefilter('ignore')
+
 import sys
 from openlibrary import i18n

Of note, this SyntaxWarning only shows up with Python 3.12, and shows up regardless whether pre-commit is used. See also the second change listed here for Python 3.12: https://docs.python.org/3/whatsnew/3.12.html#other-language-changes.

@rebecca-shoptaw rebecca-shoptaw force-pushed the 8818/feature/add-pot-updates-to-precommit branch 3 times, most recently from b4876a7 to 9e0e962 Compare March 27, 2024 01:29
@rebecca-shoptaw rebecca-shoptaw marked this pull request as ready for review March 27, 2024 01:31
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you @rebecca-shoptaw !! I tested this in gitpod/locally, and:

  • ✅ When no html/python files staged it doesn't run
  • ✅ When HTML file staged with no i18n lines changed, it runs but doesn't save
  • ✅ When HTML file staged with i18n lines moved but not modified, it runs but doesn't save
  • ✅ When HTML file staged with i18n lines modified, it runs and saves

There's one case I realized would cause some issues; on my machine I have a bunch of untracked files, some of which are html/python! These are getting added to the messages.pot file when I do a commit, which is not what we want. I reckon we should try to exclude untracked/unstaged files when running from pre-commit. Maybe a --pre-commit option that adds these exclusions?

    # Exclude untracked git files.
    # git ls-files --others --exclude-standard
    untracked_files = {
        Path(line)
        for dir in dirs
        for line in subprocess.run(
            ['git', 'ls-files', '--others', '--exclude-standard', dir],
            stdout=subprocess.PIPE,
            text=True,
        ).stdout.split('\n')
        if line.endswith(('.py', '.html'))
    }

    # Also exclude unstaged git files.
    # git diff --name-only
    unstaged_files = {
        Path(line)
        for line in subprocess.run(
            ['git', 'diff', '--name-only'],
            stdout=subprocess.PIPE,
            text=True,
        ).stdout.split('\n')
        if line.endswith(('.py', '.html'))
    }

    excluded_files = untracked_files | unstaged_files

And then when it builds the catalog makes sure to toss these out?

Otherwise a few small code cleanups/fixes!

openlibrary/i18n/__init__.py Outdated Show resolved Hide resolved
openlibrary/i18n/__init__.py Outdated Show resolved Hide resolved
openlibrary/i18n/__init__.py Outdated Show resolved Hide resolved
scripts/i18n-messages Outdated Show resolved Hide resolved
openlibrary/i18n/__init__.py Outdated Show resolved Hide resolved
openlibrary/i18n/__init__.py Outdated Show resolved Hide resolved
openlibrary/i18n/__init__.py Outdated Show resolved Hide resolved
openlibrary/i18n/__init__.py Outdated Show resolved Hide resolved
.pre-commit-config.yaml Show resolved Hide resolved
@cdrini cdrini added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Apr 9, 2024
@rebecca-shoptaw rebecca-shoptaw force-pushed the 8818/feature/add-pot-updates-to-precommit branch 3 times, most recently from cae27a4 to afec73d Compare April 11, 2024 16:36
@rebecca-shoptaw rebecca-shoptaw removed the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Apr 16, 2024
@cdrini cdrini added the Priority: 1 Do this week, receiving emails, time sensitive, . [managed] label Apr 29, 2024
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! One small change

openlibrary/i18n/__init__.py Outdated Show resolved Hide resolved
openlibrary/i18n/__init__.py Outdated Show resolved Hide resolved
@rebecca-shoptaw rebecca-shoptaw force-pushed the 8818/feature/add-pot-updates-to-precommit branch from afa7d32 to 6564803 Compare April 30, 2024 14:32
@cdrini
Copy link
Collaborator

cdrini commented May 1, 2024

We talked Creation-Date potentially causing a few headaches via merge conflicts ; we decided to create a separate issue to fix the Creation-Date to the first day of the current year; this should hopefully not interfere with any translator flows, whilst reducing conflict headaches for us!

@cdrini cdrini merged commit b6dbb56 into internetarchive:master May 1, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: 1 Do this week, receiving emails, time sensitive, . [managed]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add messages.pot updates to pre-commit hook (if possible)
3 participants