feat: `xtask gettext` by danielrainer · Pull Request #12676 · fish-shell/fish-shell

danielrainer · 2026-04-25T01:44:32Z

Rewrite the PO file handling logic in Rust and make it available via an
xtask. Replaces the
build_tools/{update_translations,fish_xgettext}.fish scripts.

Main benefits:

Better ergonomics
Better error handling
Eliminates the need for a fish executable for updating PO files,
which is particularly useful in CI
Improved performance, mainly due to concurrent threads working on the
PO files in parallel

The behavior is mostly unchanged, with the minor exception that section
headers for empty sections are now omitted in PO files.
The interface for invoking the tooling is quite different. Instead of
working with flags, cargo xtask gettext has 3 subcommands:

update modifies the PO files to match the current sources
check is like update, but instead of modifying the PO files, it
shows diffs between the current version of the PO files and what they
would look like after updating. When there is a difference, the xtask
exits non-zero, making it useful for checks to detect outdated PO
files.
new creates a new PO file for the given language.

Both the update and check command take any number of file paths to
specify the PO files to consider. If none are specified, all files in
localization/po/ are considered.

Extracting gettext messages from Rust still requires compiling with the
gettext-extract feature active. In situations where compilation is
needed for other purposes as well, it can make sense to only build once
and then tell the gettext xtask about the directory into which the
messages have been extracted. This can be done via the
--rust-extraction-dir flag. If we stop having gettext messages in
Rust, this logic can be removed.

Stacked on #12674, #12675

TODOs:

Changes to fish usage are reflected in user documentation/manpages.
User-visible changes noted in CHANGELOG.rst

krobelus · 2026-04-25T09:03:01Z

+        .args(["--no-wrap", "--no-obsolete", "-o"])
+        .arg(msgattrib_output_file.path())
+        .arg(file_to_update.as_ref())
+        .get_stdout_or_error()?;


The output of get_stdout_or_error() is never used, so we probably shouldn't add it as-is.

It also seems weird to call get_stdout_or_error() while ignoring stdout.
I guess the difference to run_or_bail() is that

get_stdout_or_error() captures stdout and stderr

run_or_bail() lets the child inherit stdout/stderr

but get_stdout_or_error() displays stderr of failure, so I'm not sure which one to use.
AFAICT, run_or_bail() semantics are better (like for slow child processes, or processes that print warnings to stderr but still succeed).

Maybe merge run_or_bail and get_stdout_or_error into one function while there's no significant difference.

If we capture stdout/stderr, we could bail or warn when the exit status is 0 but they are non-empty.
Here's how we propagate sphinx-build warnings in this case:

if !out.stderr.is_empty() { rsconf::warn!("sphinx-build: {}", String::from_utf8_lossy(&out.stderr)); }

Maybe we can put such warnings/assertions in a shared function, perhaps give it an optional input to stdin so that it can support msguniq.

Yes, these two functions used to have meaningful differences, but after switching error handling to anyhow, these are no longer present. Only the run_or_bail() functionality is needed at the moment, so I replaced the calls to get_stdout_or_error() and renamed run_or_bail() to just run() as you suggested.

The function for providing custom stdin and capturing stdout is useful, but it could be expanded, to remove duplication at the call sites, so that's what I did.

Right now, stderr isn't captured by any of the functions. I don't think we currently have a reason to check whether stderr is empty, the exit status should be enough to decide whether to error out. One thing we might want to consider is interleaving of the stderr output of concurrent threads.

krobelus · 2026-04-25T09:03:01Z

+        Some(dir) => Template::new(dir)?,
+        None => {
+            let temp_dir = fish_tempfile::new_dir().context("Failed to create temp file")?;
+            Command::new(env!("CARGO"))


I guess we use env!("CARGO") instead of hardcoded cargo because the first is slightly more correct in edge cases. Not sure why exactly though.
Now that env!("CARGO") is used twice, maybe we should extract it as const CARGO (or const HOST_CARGO or const NATIVE_CARGO).

I added CARGO. For our xtasks, I don't think it matters much which cargo is used (particularly host vs native).

krobelus · 2026-04-25T09:03:01Z

+            }
+            let mut error = None;
+            for handle in thread_handles {
+                // SAFETY: `handle.join()` only returns `Err` if the thread panicked.


Not sure if SAFETY is the correct word here because crashing via unwrap() should be safe (as in, not UB).
But I don't have a better suggestion.

krobelus · 2026-04-25T09:03:01Z

+                .read(true)
+                .open(&entry_path)
+                .with_context(|| format!("Failed to open file {entry_path:?}"))?;
+            file.read_to_end(&mut concatenated_content)


now it seems a bit odd that messages extracted from Rust are in PO format, while messages from fish script are extracted on the fly,
which means we have logic to create msgfmt entries in both gettext-extraction and xtask gettext.
We could get rid of the one in gettext-extraction now, but we'd need to find a new file format that supports a list of (potentially multiline) messages.

Yes, we could change that, but I don't think it's a priority, since the plan is to rip out that code once we have fully switched to Fluent for Rust messages.

krobelus · 2026-04-25T09:03:01Z

+        }
+        // The Header entry needs to be removed again,
+        // because it is added outside of this function.
+        let expected_prefix = MINIMAL_HEADER.as_bytes();


re the comment: then why are we adding MINIMAL_HEADER in the first place?
msguniq --no-wrap --sort-output shouldn't need it, no?
Asserting that the empty msgid sorts before anything doesn't seem important either.

The problem there is that gettext tools don't like non-ASCII msgids without an encoding specified, which was the reason why the header was introduced. That didn't change with the Rust rewrite, just the method for inserting and removing the header. Asserting that it comes first is important to ensure that we actually remove the header again, instead of some other stuff.

Rewrite the PO file handling logic in Rust and make it available via an xtask. Replaces the `build_tools/{update_translations,fish_xgettext}.fish` scripts. Main benefits: - Better ergonomics - Better error handling - Eliminates the need for a fish executable for updating PO files, which is particularly useful in CI - Improved performance, mainly due to concurrent threads working on the PO files in parallel The behavior is mostly unchanged, with the minor exception that section headers for empty sections are now omitted in PO files. The interface for invoking the tooling is quite different. Instead of working with flags, `cargo xtask gettext` has 3 subcommands: - `update` modifies the PO files to match the current sources - `check` is like update, but instead of modifying the PO files, it shows diffs between the current version of the PO files and what they would look like after updating. When there is a difference, the xtask exits non-zero, making it useful for checks to detect outdated PO files. - `new` creates a new PO file for the given language. Both the `update` and `check` command take any number of file paths to specify the PO files to consider. If none are specified, all files in `localization/po/` are considered. Extracting gettext messages from Rust still requires compiling with the `gettext-extract` feature active. In situations where compilation is needed for other purposes as well, it can make sense to only build once and then tell the gettext xtask about the directory into which the messages have been extracted. This can be done via the `--rust-extraction-dir` flag. If we stop having gettext messages in Rust, this logic can be removed. Closes fish-shell#12676

krobelus · 2026-04-27T07:27:31Z

+        .arg(template.as_ref())
+        .run()?;
+    let msgattrib_output_file = fish_tempfile::new_file().context("Failed to create temp file")?;
+    Command::new("msgattrib")
+        .args(["--no-wrap", "--no-obsolete", "-o"])
+        .arg(msgattrib_output_file.path())


If I had known about the trick of repeatedly calling args()/arg(),
I probably woudn't have written as_os_strs.
I guess that one might still be a little bit nicer because it allows us to write all arguments in a single array (assuming there's no other array to add to the arguments).

If we use args/args, then I think it might make sense to put the -o in a separate .arg() call, so it's not too disconnected from its option argument.

Yes, these are two different ways of achieving the same thing. I think I slightly favor repeated .arg()/.args() calls in this case, mainly to avoid having to deal with degraded functionality of rust-analyzer in macros, but in other cases I think as_os_strs is the better option, e.g. for the sphinx-build invocation for the HTML docs stask. If we keep using as_os_strs, I think we should improve it via #12688.

krobelus · 2026-04-27T07:27:31Z

+        .with_context(|| format!("Failed to write to temp file {:?}", template_file.path()))?;
+    template_file
+        .get_mut()
+        .flush()


Ok I guess flush is enough to guarantee that our child processes see what we wrote.
So I guess we don't need to actually close the file here.

Yes, closing would be inconvenient here, since the tempfile API has custom Drop logic to delete the file on drop, so keeping template_file in scope and open until we no longer need the backing file is useful.

krobelus · 2026-04-27T07:27:31Z

+                    let tmp_copy =
+                        fish_tempfile::new_file().context("Failed to create temp file")?;
+                    crate::copy_file(&path, tmp_copy.path())?;
+                    update_po_file(tmp_copy.path(), template_path_buf)?;


Threads are a bit of a heavy-weight way to parallelize things here
because all the heavy-lifting is already done by external processes
(msgmerge, msgattrib and diff).
The class way would be the Unix shell one:

let job1proc1 = spawn "msgmerge de.po"

let job2proc1 = spawn "msgmerge fr.po"

let r = loop { if let Some(r) = job1proc1.try_wait().or(job2proc1.try_wait()) { break r; }; sleep 1 }

assume, without loss of generality: r = job1proc1

job1proc2 = spawn "msgattrib ..."

...

There's no strong need to address this comment
(especially if we later replace "diff" with a Rust-native, in-process solution),
but as written now, I think we should be able to get rid of threads with few changes by adding async/await to update_po_file() and other code without changing the structure.
Haven't tried it though.

I agree that spawning threads is a bit heavy here, but it does have benefits. If we have single-threaded Rust code spawning processes as you suggest, it becomes much more annoying to write this code, especially if we don't want to lose out on parallelism compared to the threaded approach. Your syntax works for a small, fixed number of languages known when writing the code. I had a quick look how this approach would be implemented in idiomatic Rust, and it doesn't look like this approach is really used. Instead, threads or async is used. We could use async as well, but that would require introducing an async runtime dependency (tokio or some alternative), which I think is overkill if all we want from it is the fairly low overhead of spawning a few OS threads. I also think that async code would be harder to understand than the current threaded approach.

krobelus · 2026-04-27T07:27:31Z

+        /// Create a gettext template.
+        /// `rust_extraction_dir` must be the path to a directory which contains the messages
+        /// extracted from the Rust sources.
+        pub fn new<P: AsRef<Path>>(rust_extraction_dir: P) -> Result<Self> {


usually we define constructors before other methods like serialize but I can see why you'd put the shorter one first

krobelus · 2026-04-27T07:27:31Z

+        path: P,
+    ) -> Result<Option<LocalizationTier>> {
+        static L10N_ANNOTATION: LazyLock<Regex> =
+            LazyLock::new(|| Regex::new(r"(?:^|\n)# localization: (?<tier>.*)\n").unwrap());


I guess technically <tier> should be called annotation_value.
Or just annotation if we rename annotation to m or r#match.

Good point, I'll rename it to annotation_value.

krobelus · 2026-04-27T07:27:31Z

+
+    fn format_msgid_for_po(msgid: &str) -> String {
+        let escaped_msgid = msgid.replace("\\", "\\\\").replace("\"", "\\\"");
+        format!("msgid \"{escaped_msgid}\"\nmsgstr \"\"\n\n")


I like using the "indoc" crate for multiline strings, but actually native multiline string literals are nice to read too, we only need to add \n

```rust format!( "\ msgid \"{escaped_msgid}\"\n\ msgstr \"\"\n\ \n\ \n\ " ) ```

but what you wrote is fine too

I agree that using multi-line literals are better for this use case. Introducing a dependency for it seems overkill.

krobelus · 2026-04-27T07:45:23Z

no requirement to add to changelog since devs can check git log build_tools/update_translations.sh but we could still add something like

diff --git a/CHANGELOG.rst b/CHANGELOG.rst
index 775007d147..8f00cae4f9 100644
--- a/CHANGELOG.rst
+++ b/CHANGELOG.rst
@@ -25,6 +25,7 @@
 For distributors and developers
 -------------------------------
 - When the default global config directory (``$PREFIX/etc/fish``) exists but has been overridden via ``-DCMAKE_INSTALL_SYSCONFDIR``, fish will now respect that override (:issue:`10748`).
+- ``build_tools/update_translations.fish`` has been replaced by ``cargo xtask gettext {new,update}`` (:issue:`12676`).
 
 Regression fixes:
 -----------------

krobelus · 2026-04-27T07:45:40Z

+    for path in all_paths {
+        for dir_entry in WalkDir::new(path.as_ref()) {
+            let dir_entry = dir_entry
+                .with_context(|| format!("Failed to check paths at {:?}", path.as_ref()))?;


oh actually it looks like we don't need to specify the path,
because anyhow displays them already(?):

```shell $ env -u RUST_BACKTRACE cargo xtask gettext update localization/po/fo.po Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.26s Running `target/debug/xtask gettext update localization/po/fo.po` Compiling fish-gettext-extraction v0.0.0 (/home/johannes/git/fish-shell/crates/gettext-extraction) Checking fish v4.6.0 (/home/johannes/git/fish-shell) Finished `dev` profile [unoptimized + debuginfo] target(s) in 9.49s Error: Failed to check paths at "localization/po/fo.po" Caused by: 0: IO error for operation on localization/po/fo.po: No such file or directory (os error 2) 1: No such file or directory (os error 2) ```

so in general we can be lazy about the "hard facts" and focus on saying what part went wrong.

Hm, that's a good observation. What's missing is information about what the code tried to do. Especially in cases where more than one path is relevant, like copying or moving files, it might still be valuable to be explicit about what was the source and what was the target. What's your preference regarding this function? Should we leave it as is, return to plain std::fs::copy, or modify the error message somehow?

actually the part after "Caused by" does not come from anyhow but it's how walkdir's impl fmt::Display for walkdir::error::Error.
So in cases like std::fs::copy we need to keep paths

Rewrite the PO file handling logic in Rust and make it available via an xtask. Replaces the `build_tools/{update_translations,fish_xgettext}.fish` scripts. Main benefits: - Better ergonomics - Better error handling - Eliminates the need for a fish executable for updating PO files, which is particularly useful in CI - Improved performance, mainly due to concurrent threads working on the PO files in parallel The behavior is mostly unchanged, with the minor exception that section headers for empty sections are now omitted in PO files. The interface for invoking the tooling is quite different. Instead of working with flags, `cargo xtask gettext` has 3 subcommands: - `update` modifies the PO files to match the current sources - `check` is like update, but instead of modifying the PO files, it shows diffs between the current version of the PO files and what they would look like after updating. When there is a difference, the xtask exits non-zero, making it useful for checks to detect outdated PO files. - `new` creates a new PO file for the given language. Both the `update` and `check` command take any number of file paths to specify the PO files to consider. If none are specified, all files in `localization/po/` are considered. Extracting gettext messages from Rust still requires compiling with the `gettext-extract` feature active. In situations where compilation is needed for other purposes as well, it can make sense to only build once and then tell the gettext xtask about the directory into which the messages have been extracted. This can be done via the `--rust-extraction-dir` flag. If we stop having gettext messages in Rust, this logic can be removed. Closes fish-shell#12676

Nahor · 2026-04-29T18:32:55Z

    /// Format files or check if they are correctly formatted.
    Format(FormatArgs),
+    /// Update gettext files
+    Gettext(GettextArgs),


Wouldn't it be better to call it translate? gettext is very implementation dependent (will that still be valid with fluent?) and not very telling to people who don't know what it is.

gettext is implementation-dependent because it does implementation-dependent things. The same goes for fluent. We don't have plans for moving the fish script message localization away from gettext, so both will coexist. For some workflows, the implementation details are not relevant. There it makes sense to have a shared xtask (maybe called localization) offering combined functionality (check, update, maybe new). But there is still a reason to have the implementation-specific xtasks, both to improve performance and to allow for subcommands that only make sense with one implementation like rename or show-missing for fluent.

from gettext the only thing that's needed in everyday scenarios is cargo xtask gettext update.
Wouldn't hurt to have cargo xtask fix someday, which would run this and other code-gen things and maybe formatting.

I think most users won't care about implementation details and about whether it takes a bit longer.
But for now this is fine, we can rename it later.

gettext is implementation-dependent because it does implementation-dependent things

To be clear, I was only suggesting to rename the xtask name (enum), not the rest of the code. main.rs could still call gettext.rs directly, or via a very thin localization.rs shim layer if you absolutely want an .rs file matching the task name.
Basically, I was suggesting to start with xtask localization, and only add the implementation CLI API (whatever that maybe, an xtask, a parameter, ...) once there are implementation-specific operations available.

As-is, it means that contributors will have to learn to call xtask gettext instead of update_translations.fish, only to then have to learn xtask localization later. Might as well start with the latter, and let them learn the former only if/when needed for the specific ops later.

I was only suggesting to rename the xtask name

Yes, that's how I understood it.

start with xtask localization

My point is that the current operations are gettext-specific. For Fluent, we'll have a separate xtask fluent, which offers different functionality with a different CLI. This is still being designed, so it's too early to settle on a common xtask localization, because we don't know yet what that CLI would have to look like to support both Fluent and gettext. This might also depend on whether we introduce it before or after porting all Rust messages to Fluent.

Generally, I agree that we probably want some xtask localization CLI, but I think it's too early to build that. For now, I think the xtask gettext is fine, especially because the only subcommand that's really relevant for most contributors is update.

krobelus · 2026-04-30T10:22:12Z

I think this looks good, feel free to merge if you don't need feedback, else I can look at the range-diff also

Rewrite the PO file handling logic in Rust and make it available via an xtask. Replaces the `build_tools/{update_translations,fish_xgettext}.fish` scripts. Main benefits: - Better ergonomics - Better error handling - Eliminates the need for a fish executable for updating PO files, which is particularly useful in CI - Improved performance, mainly due to concurrent threads working on the PO files in parallel The behavior is mostly unchanged, with the minor exception that section headers for empty sections are now omitted in PO files. The interface for invoking the tooling is quite different. Instead of working with flags, `cargo xtask gettext` has 3 subcommands: - `update` modifies the PO files to match the current sources - `check` is like update, but instead of modifying the PO files, it shows diffs between the current version of the PO files and what they would look like after updating. When there is a difference, the xtask exits non-zero, making it useful for checks to detect outdated PO files. - `new` creates a new PO file for the given language. Both the `update` and `check` command take any number of file paths to specify the PO files to consider. If none are specified, all files in `localization/po/` are considered. Extracting gettext messages from Rust still requires compiling with the `gettext-extract` feature active. In situations where compilation is needed for other purposes as well, it can make sense to only build once and then tell the gettext xtask about the directory into which the messages have been extracted. This can be done via the `--rust-extraction-dir` flag. If we stop having gettext messages in Rust, this logic can be removed. Closes fish-shell#12676

krobelus added this to the fish 4.7 milestone Apr 25, 2026

krobelus mentioned this pull request Apr 25, 2026

refactor: use anyhow for xtask errors #12674

Closed

krobelus reviewed Apr 25, 2026

View reviewed changes

danielrainer force-pushed the gettext_xtask branch from 9fe02a7 to fba0b9d Compare April 25, 2026 20:50

krobelus reviewed Apr 27, 2026

View reviewed changes

krobelus approved these changes Apr 27, 2026

View reviewed changes

krobelus reviewed Apr 27, 2026

View reviewed changes

danielrainer force-pushed the gettext_xtask branch from fba0b9d to e37bfb1 Compare April 29, 2026 16:10

Nahor reviewed Apr 29, 2026

View reviewed changes

krobelus reviewed Apr 30, 2026

View reviewed changes

Comment thread crates/xtask/src/gettext.rs

danielrainer force-pushed the gettext_xtask branch from e37bfb1 to 96ee8a2 Compare April 30, 2026 16:18

danielrainer merged commit 4e3898d into fish-shell:master Apr 30, 2026
18 checks passed

danielrainer deleted the gettext_xtask branch April 30, 2026 17:32

Uh oh!

Conversation

danielrainer commented Apr 25, 2026

TODOs:

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

krobelus Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

krobelus Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

krobelus Apr 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

krobelus Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

krobelus Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

krobelus commented Apr 27, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

krobelus Apr 25, 2026 •

edited

Loading

krobelus Apr 25, 2026 •

edited

Loading

krobelus Apr 25, 2026 •

edited

Loading

krobelus Apr 27, 2026 •

edited

Loading

krobelus Apr 27, 2026 •

edited

Loading