Skip to content

feat: xtask gettext#12676

Merged
danielrainer merged 1 commit into
fish-shell:masterfrom
danielrainer:gettext_xtask
Apr 30, 2026
Merged

feat: xtask gettext#12676
danielrainer merged 1 commit into
fish-shell:masterfrom
danielrainer:gettext_xtask

Conversation

@danielrainer
Copy link
Copy Markdown

Rewrite the PO file handling logic in Rust and make it available via an
xtask. Replaces the
build_tools/{update_translations,fish_xgettext}.fish scripts.

Main benefits:

  • Better ergonomics
  • Better error handling
  • Eliminates the need for a fish executable for updating PO files,
    which is particularly useful in CI
  • Improved performance, mainly due to concurrent threads working on the
    PO files in parallel

The behavior is mostly unchanged, with the minor exception that section
headers for empty sections are now omitted in PO files.
The interface for invoking the tooling is quite different. Instead of
working with flags, cargo xtask gettext has 3 subcommands:

  • update modifies the PO files to match the current sources
  • check is like update, but instead of modifying the PO files, it
    shows diffs between the current version of the PO files and what they
    would look like after updating. When there is a difference, the xtask
    exits non-zero, making it useful for checks to detect outdated PO
    files.
  • new creates a new PO file for the given language.

Both the update and check command take any number of file paths to
specify the PO files to consider. If none are specified, all files in
localization/po/ are considered.

Extracting gettext messages from Rust still requires compiling with the
gettext-extract feature active. In situations where compilation is
needed for other purposes as well, it can make sense to only build once
and then tell the gettext xtask about the directory into which the
messages have been extracted. This can be done via the
--rust-extraction-dir flag. If we stop having gettext messages in
Rust, this logic can be removed.

Stacked on #12674, #12675

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • User-visible changes noted in CHANGELOG.rst

@krobelus krobelus added this to the fish 4.7 milestone Apr 25, 2026
Comment thread crates/xtask/src/gettext.rs Outdated
.args(["--no-wrap", "--no-obsolete", "-o"])
.arg(msgattrib_output_file.path())
.arg(file_to_update.as_ref())
.get_stdout_or_error()?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output of get_stdout_or_error() is never used, so we probably shouldn't add it as-is.

It also seems weird to call get_stdout_or_error() while ignoring stdout.
I guess the difference to run_or_bail() is that

  • get_stdout_or_error() captures stdout and stderr
  • run_or_bail() lets the child inherit stdout/stderr

but get_stdout_or_error() displays stderr of failure, so I'm not sure which one to use.
AFAICT, run_or_bail() semantics are better (like for slow child processes, or processes that print warnings to stderr but still succeed).

Maybe merge run_or_bail and get_stdout_or_error into one function while there's no significant difference.

If we capture stdout/stderr, we could bail or warn when the exit status is 0 but they are non-empty.
Here's how we propagate sphinx-build warnings in this case:

if !out.stderr.is_empty() {
    rsconf::warn!("sphinx-build: {}", String::from_utf8_lossy(&out.stderr));
}

Maybe we can put such warnings/assertions in a shared function, perhaps give it an optional input to stdin so that it can support msguniq.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these two functions used to have meaningful differences, but after switching error handling to anyhow, these are no longer present. Only the run_or_bail() functionality is needed at the moment, so I replaced the calls to get_stdout_or_error() and renamed run_or_bail() to just run() as you suggested.

The function for providing custom stdin and capturing stdout is useful, but it could be expanded, to remove duplication at the call sites, so that's what I did.

Right now, stderr isn't captured by any of the functions. I don't think we currently have a reason to check whether stderr is empty, the exit status should be enough to decide whether to error out. One thing we might want to consider is interleaving of the stderr output of concurrent threads.

Comment thread crates/xtask/src/gettext.rs Outdated
Comment thread crates/xtask/src/gettext.rs Outdated
Some(dir) => Template::new(dir)?,
None => {
let temp_dir = fish_tempfile::new_dir().context("Failed to create temp file")?;
Command::new(env!("CARGO"))
Copy link
Copy Markdown
Contributor

@krobelus krobelus Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we use env!("CARGO") instead of hardcoded cargo because the first is slightly more correct in edge cases. Not sure why exactly though.
Now that env!("CARGO") is used twice, maybe we should extract it as const CARGO (or const HOST_CARGO or const NATIVE_CARGO).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added CARGO. For our xtasks, I don't think it matters much which cargo is used (particularly host vs native).

}
let mut error = None;
for handle in thread_handles {
// SAFETY: `handle.join()` only returns `Err` if the thread panicked.
Copy link
Copy Markdown
Contributor

@krobelus krobelus Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if SAFETY is the correct word here because crashing via unwrap() should be safe (as in, not UB).
But I don't have a better suggestion.

Comment thread crates/xtask/src/gettext.rs Outdated
Comment thread crates/xtask/src/gettext.rs Outdated
.read(true)
.open(&entry_path)
.with_context(|| format!("Failed to open file {entry_path:?}"))?;
file.read_to_end(&mut concatenated_content)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now it seems a bit odd that messages extracted from Rust are in PO format, while messages from fish script are extracted on the fly,
which means we have logic to create msgfmt entries in both gettext-extraction and xtask gettext.
We could get rid of the one in gettext-extraction now, but we'd need to find a new file format that supports a list of (potentially multiline) messages.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we could change that, but I don't think it's a priority, since the plan is to rip out that code once we have fully switched to Fluent for Rust messages.

Comment thread crates/xtask/src/gettext.rs Outdated
}
// The Header entry needs to be removed again,
// because it is added outside of this function.
let expected_prefix = MINIMAL_HEADER.as_bytes();
Copy link
Copy Markdown
Contributor

@krobelus krobelus Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re the comment: then why are we adding MINIMAL_HEADER in the first place?
msguniq --no-wrap --sort-output shouldn't need it, no?
Asserting that the empty msgid sorts before anything doesn't seem important either.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem there is that gettext tools don't like non-ASCII msgids without an encoding specified, which was the reason why the header was introduced. That didn't change with the Rust rewrite, just the method for inserting and removing the header. Asserting that it comes first is important to ensure that we actually remove the header again, instead of some other stuff.

Comment thread crates/xtask/src/gettext.rs Outdated
danielrainer pushed a commit to danielrainer/fish-shell that referenced this pull request Apr 25, 2026
Rewrite the PO file handling logic in Rust and make it available via an
xtask. Replaces the
`build_tools/{update_translations,fish_xgettext}.fish` scripts.

Main benefits:
- Better ergonomics
- Better error handling
- Eliminates the need for a fish executable for updating PO files,
  which is particularly useful in CI
- Improved performance, mainly due to concurrent threads working on the
  PO files in parallel

The behavior is mostly unchanged, with the minor exception that section
headers for empty sections are now omitted in PO files.
The interface for invoking the tooling is quite different. Instead of
working with flags, `cargo xtask gettext` has 3 subcommands:
- `update` modifies the PO files to match the current sources
- `check` is like update, but instead of modifying the PO files, it
  shows diffs between the current version of the PO files and what they
  would look like after updating. When there is a difference, the xtask
  exits non-zero, making it useful for checks to detect outdated PO
  files.
- `new` creates a new PO file for the given language.

Both the `update` and `check` command take any number of file paths to
specify the PO files to consider. If none are specified, all files in
`localization/po/` are considered.

Extracting gettext messages from Rust still requires compiling with the
`gettext-extract` feature active. In situations where compilation is
needed for other purposes as well, it can make sense to only build once
and then tell the gettext xtask about the directory into which the
messages have been extracted. This can be done via the
`--rust-extraction-dir` flag. If we stop having gettext messages in
Rust, this logic can be removed.

Closes fish-shell#12676
Comment on lines +61 to +66
.arg(template.as_ref())
.run()?;
let msgattrib_output_file = fish_tempfile::new_file().context("Failed to create temp file")?;
Command::new("msgattrib")
.args(["--no-wrap", "--no-obsolete", "-o"])
.arg(msgattrib_output_file.path())
Copy link
Copy Markdown
Contributor

@krobelus krobelus Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I had known about the trick of repeatedly calling args()/arg(),
I probably woudn't have written as_os_strs.
I guess that one might still be a little bit nicer because it allows us to write all arguments in a single array (assuming there's no other array to add to the arguments).

If we use args/args, then I think it might make sense to put the -o in a separate .arg() call, so it's not too disconnected from its option argument.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these are two different ways of achieving the same thing. I think I slightly favor repeated .arg()/.args() calls in this case, mainly to avoid having to deal with degraded functionality of rust-analyzer in macros, but in other cases I think as_os_strs is the better option, e.g. for the sphinx-build invocation for the HTML docs stask. If we keep using as_os_strs, I think we should improve it via #12688.

.with_context(|| format!("Failed to write to temp file {:?}", template_file.path()))?;
template_file
.get_mut()
.flush()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I guess flush is enough to guarantee that our child processes see what we wrote.
So I guess we don't need to actually close the file here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, closing would be inconvenient here, since the tempfile API has custom Drop logic to delete the file on drop, so keeping template_file in scope and open until we no longer need the backing file is useful.

let tmp_copy =
fish_tempfile::new_file().context("Failed to create temp file")?;
crate::copy_file(&path, tmp_copy.path())?;
update_po_file(tmp_copy.path(), template_path_buf)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Threads are a bit of a heavy-weight way to parallelize things here
because all the heavy-lifting is already done by external processes
(msgmerge, msgattrib and diff).
The class way would be the Unix shell one:

  1. let job1proc1 = spawn "msgmerge de.po"
  2. let job2proc1 = spawn "msgmerge fr.po"
  3. let r = loop { if let Some(r) = job1proc1.try_wait().or(job2proc1.try_wait()) { break r; }; sleep 1 }
  4. assume, without loss of generality: r = job1proc1
  5. job1proc2 = spawn "msgattrib ..."
  6. ...

There's no strong need to address this comment
(especially if we later replace "diff" with a Rust-native, in-process solution),
but as written now, I think we should be able to get rid of threads with few changes by adding async/await to update_po_file() and other code without changing the structure.
Haven't tried it though.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that spawning threads is a bit heavy here, but it does have benefits. If we have single-threaded Rust code spawning processes as you suggest, it becomes much more annoying to write this code, especially if we don't want to lose out on parallelism compared to the threaded approach. Your syntax works for a small, fixed number of languages known when writing the code. I had a quick look how this approach would be implemented in idiomatic Rust, and it doesn't look like this approach is really used. Instead, threads or async is used. We could use async as well, but that would require introducing an async runtime dependency (tokio or some alternative), which I think is overkill if all we want from it is the fairly low overhead of spawning a few OS threads. I also think that async code would be harder to understand than the current threaded approach.

/// Create a gettext template.
/// `rust_extraction_dir` must be the path to a directory which contains the messages
/// extracted from the Rust sources.
pub fn new<P: AsRef<Path>>(rust_extraction_dir: P) -> Result<Self> {
Copy link
Copy Markdown
Contributor

@krobelus krobelus Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually we define constructors before other methods like serialize but I can see why you'd put the shorter one first

Comment thread crates/xtask/src/gettext.rs Outdated
path: P,
) -> Result<Option<LocalizationTier>> {
static L10N_ANNOTATION: LazyLock<Regex> =
LazyLock::new(|| Regex::new(r"(?:^|\n)# localization: (?<tier>.*)\n").unwrap());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess technically <tier> should be called annotation_value.
Or just annotation if we rename annotation to m or r#match.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll rename it to annotation_value.

Comment thread crates/xtask/src/gettext.rs Outdated
Comment thread crates/xtask/src/gettext.rs Outdated

fn format_msgid_for_po(msgid: &str) -> String {
let escaped_msgid = msgid.replace("\\", "\\\\").replace("\"", "\\\"");
format!("msgid \"{escaped_msgid}\"\nmsgstr \"\"\n\n")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like using the "indoc" crate for multiline strings, but actually native multiline string literals are nice to read too, we only need to add \n

```rust
    format!(
        "\
        msgid \"{escaped_msgid}\"\n\
        msgstr \"\"\n\
        \n\
        \n\
     "
    )
```

but what you wrote is fine too

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that using multi-line literals are better for this use case. Introducing a dependency for it seems overkill.

Comment thread crates/xtask/src/lib.rs Outdated
Comment thread crates/xtask/src/lib.rs Outdated
Comment thread crates/xtask/src/lib.rs Outdated
@krobelus
Copy link
Copy Markdown
Contributor

no requirement to add to changelog since devs can check git log build_tools/update_translations.sh but we could still add something like

diff --git a/CHANGELOG.rst b/CHANGELOG.rst
index 775007d147..8f00cae4f9 100644
--- a/CHANGELOG.rst
+++ b/CHANGELOG.rst
@@ -25,6 +25,7 @@
 For distributors and developers
 -------------------------------
 - When the default global config directory (``$PREFIX/etc/fish``) exists but has been overridden via ``-DCMAKE_INSTALL_SYSCONFDIR``, fish will now respect that override (:issue:`10748`).
+- ``build_tools/update_translations.fish`` has been replaced by ``cargo xtask gettext {new,update}`` (:issue:`12676`).
 
 Regression fixes:
 -----------------

Comment thread crates/xtask/src/lib.rs
for path in all_paths {
for dir_entry in WalkDir::new(path.as_ref()) {
let dir_entry = dir_entry
.with_context(|| format!("Failed to check paths at {:?}", path.as_ref()))?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh actually it looks like we don't need to specify the path,
because anyhow displays them already(?):

```shell
$ env -u RUST_BACKTRACE cargo xtask gettext update localization/po/fo.po
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.26s
     Running `target/debug/xtask gettext update localization/po/fo.po`
   Compiling fish-gettext-extraction v0.0.0 (/home/johannes/git/fish-shell/crates/gettext-extraction)
    Checking fish v4.6.0 (/home/johannes/git/fish-shell)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 9.49s
Error: Failed to check paths at "localization/po/fo.po"

Caused by:
    0: IO error for operation on localization/po/fo.po: No such file or directory (os error 2)
    1: No such file or directory (os error 2)
```

so in general we can be lazy about the "hard facts" and focus on saying what part went wrong.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, that's a good observation. What's missing is information about what the code tried to do. Especially in cases where more than one path is relevant, like copying or moving files, it might still be valuable to be explicit about what was the source and what was the target. What's your preference regarding this function? Should we leave it as is, return to plain std::fs::copy, or modify the error message somehow?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the part after "Caused by" does not come from anyhow but it's how walkdir's impl fmt::Display for walkdir::error::Error.
So in cases like std::fs::copy we need to keep paths

danielrainer pushed a commit to danielrainer/fish-shell that referenced this pull request Apr 29, 2026
Rewrite the PO file handling logic in Rust and make it available via an
xtask. Replaces the
`build_tools/{update_translations,fish_xgettext}.fish` scripts.

Main benefits:
- Better ergonomics
- Better error handling
- Eliminates the need for a fish executable for updating PO files,
  which is particularly useful in CI
- Improved performance, mainly due to concurrent threads working on the
  PO files in parallel

The behavior is mostly unchanged, with the minor exception that section
headers for empty sections are now omitted in PO files.
The interface for invoking the tooling is quite different. Instead of
working with flags, `cargo xtask gettext` has 3 subcommands:
- `update` modifies the PO files to match the current sources
- `check` is like update, but instead of modifying the PO files, it
  shows diffs between the current version of the PO files and what they
  would look like after updating. When there is a difference, the xtask
  exits non-zero, making it useful for checks to detect outdated PO
  files.
- `new` creates a new PO file for the given language.

Both the `update` and `check` command take any number of file paths to
specify the PO files to consider. If none are specified, all files in
`localization/po/` are considered.

Extracting gettext messages from Rust still requires compiling with the
`gettext-extract` feature active. In situations where compilation is
needed for other purposes as well, it can make sense to only build once
and then tell the gettext xtask about the directory into which the
messages have been extracted. This can be done via the
`--rust-extraction-dir` flag. If we stop having gettext messages in
Rust, this logic can be removed.

Closes fish-shell#12676
Comment thread crates/xtask/src/main.rs
/// Format files or check if they are correctly formatted.
Format(FormatArgs),
/// Update gettext files
Gettext(GettextArgs),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to call it translate? gettext is very implementation dependent (will that still be valid with fluent?) and not very telling to people who don't know what it is.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gettext is implementation-dependent because it does implementation-dependent things. The same goes for fluent. We don't have plans for moving the fish script message localization away from gettext, so both will coexist. For some workflows, the implementation details are not relevant. There it makes sense to have a shared xtask (maybe called localization) offering combined functionality (check, update, maybe new). But there is still a reason to have the implementation-specific xtasks, both to improve performance and to allow for subcommands that only make sense with one implementation like rename or show-missing for fluent.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from gettext the only thing that's needed in everyday scenarios is cargo xtask gettext update.
Wouldn't hurt to have cargo xtask fix someday, which would run this and other code-gen things and maybe formatting.

I think most users won't care about implementation details and about whether it takes a bit longer.
But for now this is fine, we can rename it later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gettext is implementation-dependent because it does implementation-dependent things

To be clear, I was only suggesting to rename the xtask name (enum), not the rest of the code. main.rs could still call gettext.rs directly, or via a very thin localization.rs shim layer if you absolutely want an .rs file matching the task name.
Basically, I was suggesting to start with xtask localization, and only add the implementation CLI API (whatever that maybe, an xtask, a parameter, ...) once there are implementation-specific operations available.

As-is, it means that contributors will have to learn to call xtask gettext instead of update_translations.fish, only to then have to learn xtask localization later. Might as well start with the latter, and let them learn the former only if/when needed for the specific ops later.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only suggesting to rename the xtask name

Yes, that's how I understood it.

start with xtask localization

My point is that the current operations are gettext-specific. For Fluent, we'll have a separate xtask fluent, which offers different functionality with a different CLI. This is still being designed, so it's too early to settle on a common xtask localization, because we don't know yet what that CLI would have to look like to support both Fluent and gettext. This might also depend on whether we introduce it before or after porting all Rust messages to Fluent.

Generally, I agree that we probably want some xtask localization CLI, but I think it's too early to build that. For now, I think the xtask gettext is fine, especially because the only subcommand that's really relevant for most contributors is update.

Comment thread crates/xtask/src/gettext.rs
@krobelus
Copy link
Copy Markdown
Contributor

I think this looks good, feel free to merge if you don't need feedback, else I can look at the range-diff also

Rewrite the PO file handling logic in Rust and make it available via an
xtask. Replaces the
`build_tools/{update_translations,fish_xgettext}.fish` scripts.

Main benefits:
- Better ergonomics
- Better error handling
- Eliminates the need for a fish executable for updating PO files,
  which is particularly useful in CI
- Improved performance, mainly due to concurrent threads working on the
  PO files in parallel

The behavior is mostly unchanged, with the minor exception that section
headers for empty sections are now omitted in PO files.
The interface for invoking the tooling is quite different. Instead of
working with flags, `cargo xtask gettext` has 3 subcommands:
- `update` modifies the PO files to match the current sources
- `check` is like update, but instead of modifying the PO files, it
  shows diffs between the current version of the PO files and what they
  would look like after updating. When there is a difference, the xtask
  exits non-zero, making it useful for checks to detect outdated PO
  files.
- `new` creates a new PO file for the given language.

Both the `update` and `check` command take any number of file paths to
specify the PO files to consider. If none are specified, all files in
`localization/po/` are considered.

Extracting gettext messages from Rust still requires compiling with the
`gettext-extract` feature active. In situations where compilation is
needed for other purposes as well, it can make sense to only build once
and then tell the gettext xtask about the directory into which the
messages have been extracted. This can be done via the
`--rust-extraction-dir` flag. If we stop having gettext messages in
Rust, this logic can be removed.

Closes fish-shell#12676
@danielrainer danielrainer merged commit 4e3898d into fish-shell:master Apr 30, 2026
18 checks passed
@danielrainer danielrainer deleted the gettext_xtask branch April 30, 2026 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants