Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite it in Rust #9512

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

ridiculousfish
Copy link
Member

@ridiculousfish ridiculousfish commented Jan 28, 2023

(Editor's note - please read #9512 (comment) and #9512 (comment) before commenting if you are new to fish or not familiar with the context - @zanchey)

(Sorry for the meme; also this is obligatory.)

I think we should transition to Rust and aim to have it done by the next major release:

  • Nobody really likes C++ or CMake, and there's no clear path for getting off old toolchains. Every year the pain will get worse.
  • C++ is becoming a legacy language and finding contributors in the future will become difficult, while Rust has an active and growing community.
  • Rust is what we need to turn on concurrent function execution.
  • Being written in Rust will help fish continue to be perceived as modern and relevant.

This should be thought of as a "port" instead of a "rewrite" because we would not start from scratch; instead we would translate C++ to Rust, incrementally, module by module, in the span of one release. We'll use an FFI so the Rust and C++ bits can talk to each other until C++ is gone, and tests and CI keep passing at every commit.

To prove it can work, in this PR I've ported FLOG, topic monitor, wgetopt, builtin_wait, and some others to Rust. The Rust bits live in a crate that lives inside the C++ and links with it. You can just build it the usual way:

  1. Install Rust 1.67 or later
  2. cmake as usual, and it should work, by way of corrosion

It works in GH Actions CI too.

The Plan has a high level description, and the Development Guide has more technical details on how to proceed. Please consider both to be part of this PR.

@faho
Copy link
Member

faho commented Jan 28, 2023

So, a few quick comments:

  1. I basically don't know rust at all (I believe I've used it for a combined twelve minutes), so I can't speak to any of the code itself.
  2. Rust 1.67 was released literally two days ago. Compared to us sticking to C++11 that's a staggering difference in version support. Is it that much easier to get a new rust on old systems? Our typical targets are old CentOS and Debian.
  3. Following on from that, how easy is this to package from a distro perspective? We've expended quite a bit of effort on making that easy, and I'd hate to drop that aspect.

As of yet unknown compatibility story for Tier 2+ platforms (Cygwin, etc).

Cygwin is mostly a major PITA because of its wchar_t type. If we can get away from using libc that could make some things easier.

@schuelermine
Copy link
Contributor

schuelermine commented Jan 29, 2023

get away from using libc

Rust uses libc under the hood.

@faho
Copy link
Member

faho commented Jan 29, 2023

Rust uses libc under the hood.

Of course technically that's true, but that's not what I'm talking about.

What I mean is that we wouldn't use e.g. libc's "iswdigit" and friends, like we're forced to do now because C++ never got around to replacing them with ones that work with its strings.

And because we use those functions, we're beholden to the libc wchar_t type where we use them, which is everywhere, which means the difference between cygwin (2 byte wchar_t) and as best as I can tell any other system (4 byte wchar_t) can appear everywhere.

And we're beholden to platform differences like "does this provide a uselocale" or "does this have wcscasecmp or std::wcscasecmp", because those directly appear in libc.

In effect what I'm saying is that my hope is rust would not rely on libc for everything and abstract some of the other libc gunk away instead of throwing it in our face like C++ does. And so this historical wchar_t difference could just go away.

@zanchey
Copy link
Member

zanchey commented Jan 29, 2023

The progress to date is impressive, and the motivation to allow concurrent mode is compelling.

I agree that packaging is important but I'd be happy to tackle it. RHEL 7 is only supported for another 18 months, and I have managed to backport GCC to RHEL in the past so it might be possible to do the same. There's a couple of ways to generate Debian packages, which are worth exploring.

I don't know any rust, but I didn't know any C++ before I started work on fish, so I'd be keen to learn.

@zanchey
Copy link
Member

zanchey commented Jan 29, 2023

(Maybe a way forward to #972 etc?)

@@ -0,0 +1,48 @@
include(FetchContent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to make builds using OBS difficult; is it possible to include Corrosion in the tree and use it as a subdirectory instead?

Copy link

@Be-ing Be-ing Jan 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, distro packagers are not going to like this, but it's fine if you try find_package first:

find_package(Corrosion)
if (NOT Corrosion_FOUND)
   include(FetchContent)
   ...
endif()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, distro packagers are not going to like this, but it's fine if you try find_package first:

find_package(Corrosion)
if (NOT Corrosion_FOUND)
   include(FetchContent)
   ...
endif()

Distro packager for Arch Linux here, this would make our lives so much easier with the ability to use system libraries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to add that, but my hope is that no distro packages fish while it uses Corrosion. This use is meant to be temporary during a single development cycle; by the next release I hope to have no CMake at all, and therefore no Corrosion.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an ambitious goal. Good luck.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to make builds using OBS difficult; is it possible to include Corrosion in the tree and use it as a subdirectory instead?

Corrosion can be added as a subdirectory/ submodule.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the use of a fork of Corrosion, and that it is intended as a temporary measure that won't be released, I think using Corrosion as a subdirectory would make sense.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, throwing myself into this conversation with some additional info that distro maintainers might find useful. FetchContent has always supported placing the source directory for any dependency in any location and then setting -DFETCHCONTENT_SOURCE_DIR_<uppercase-name>=<path>.

In this case, this would allow package maintainers to do -DFETCHCONTENT_SOURCE_DIR_CORROSION=<path/to/corrosion/sources> and not interrupt the workflow. No conditional find_package call is required.

Additionally, starting in CMake 3.24, it's possible to tell FetchContent_MakeAvailable to call find_package first (and vice versa where find_package calls FetchContent_MakeAvailable if no file was found).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this is helpful. For Corrosion, my preferences are (in order): use FetchContent as in the PR (perhaps using bruxisma's tips), or add Corrosion as a fish-shell submodule, or directly check in Corrosion. Not sure what's best for OBS.

We shouldn't use heroics to keep fish working on older platforms mid-port. That may mean turning off builders for certain platforms, and then turning them back on later when we're no longer obligated to support Corrosion/autocxx/etc.

crate-type=["staticlib"]

[patch.crates-io]
cxx = { git = "https://github.com/ridiculousfish/cxx", branch = "fish" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes, all this is going to require internet access from build hosts, I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we need to tell cargo to pre-fetch all dependencies, like https://kressle.in/articles/2021/packaging-rust-apps-in-obs.php

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OBS can potentially do a lot of the heavy lifting - https://en.opensuse.org/openSUSE:Packaging_Rust_Software

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cargo can download all dependencies' source code prior to building. See cargo fetch and cargo vendor. The [patch] table in Cargo.toml is an easy way to tell Cargo to use a repository you've forked, which is very convenient before your changes are merged upstream.

@@ -148,6 +148,7 @@ Dependencies

Compiling fish requires:

- Rust (version 1.67 or later)
- a C++11 compiler (g++ 4.8 or later, or clang 3.3 or later)
- CMake (version 3.5 or later)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the use of FindContent bumps CMake to 3.11 (unless corrosion gets included in the tree as discussed elsewhere)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current versions of corrosion require at least CMake 3.15. Some features are only available with CMake 3.19 and newer.


## Risks

- Large amount of work with possible introduction of new bugs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing, thanks for spearheading this work.
The plan sounds all-around sensible to me.

The port sounds like a great investment while being easier to pull off than a rewrite.
There are many fresh shells these days but I don't think they offer a comparable interactive experience (most of them focus on inventing their language).

Still, the port is a lot of work with late payoff, so it might be hard to find persistent contributors.
Somehow Rust projects seem to be doing fine in this regard.
Granted most of them are written from scratch.
One incremental port that died a slow death is https://github.com/remacs/remacs - but I don't think it's comparable to fish, it's around 6x larger, and the authors didn't have ownership of the code. IIRC they used a simple bindgen build; our toolchain seems much more powerful.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"might be hard to find persistent contributions"

Just wanted to chime in and say that I love the fish shell and have been using it every day for about 6 or 7 years now. If this goes through, I will absolutely be contributing to the ongoing port and future features. I'm a full time Rust engineer and my main project at work includes quite a lot of C FFI, so I'm no stranger to that. I know I'm just one person, but figured I'd show support

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That remacs link is sobering, shanks for sharing that. I think the biggest difference is that emacs maintainers don't appear to have been involved, and the fork was scared of touching parts of the codebase.

fish gets a lot of mileage out of being written in C++ here: the types map better, the FFI tools are more powerful, and fish already uses techniques like RAII, scoped locks, shared_ptr etc which map cleanly to Rust. Still, food for thought.

FetchContent_Declare(
Corrosion
GIT_REPOSITORY https://github.com/ridiculousfish/corrosion
GIT_TAG fish
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably we want to have a SHA here, for reproducibility

- Other languages considered:
- Java, Python and the scripting family are ruled out for startup latency and memory usage reasons.
- Go would be an awkward fit. fork is [quite the problem](https://stackoverflow.com/questions/28370646/how-do-i-fork-a-go-process/28371586#28371586) in Go.
- Other system languages (D, Nim, Zig...) are too niche: fewer contributors, higher risk of the language becoming irrelevant.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FWIW Java startup time can be fixed by compiling a native image)

Copy link
Member

@faho faho Jan 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(edit for posterity: This was in response to a comment asking us to reconsider D, which has since been removed by the author)

Let me put a swift lid on that (and any other discussions of other programming languages, including swift).

Rust has the momentum, some of us already know it and others are interested in it.

This exists, it is reasonably far along, it works. The decision has essentially been made.

This comment was marked as off-topic.

This comment was marked as off-topic.


## Timeline

Handwaving, 6 months? Frankly unknown - there's 102 remaining .cpp files of various lengths. It'll go faster as we get better at it. Peter (ridiculous_fish) is motivated to work on this, other current contributors have some Rust as well, and we may also get new contributors from the Rust community. Part of the point is to make contribution easier.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm eager to contribute as well, probably at modest capacity.
Seems like you have already overcome the biggest initial hurdles.
I'm always motivated to help people get up to speed with fish and Rust.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me 3 hours to port util.cpp, which has 200 lines.
With my new knowledge I could probably do it in 1 hour.
Extrapolating the 1-hour estimate to the remaining 60k lines gives us some 300 hours of work.
I'm sure that other modules are harder and might take longer but this number leaves me optimistic.

The [autocxx guidance](https://google.github.io/autocxx/workflow.html#how-can-i-see-what-bindings-autocxx-has-generated) is helpful:

1. Install cargo expand (`cargo install cargo-expand`). Then you can use `cargo expand` to see the generated Rust bindings for C++. In particular this is useful for seeing failed expansions for C++ types that autocxx cannot handle.
2. In rust-analyzer, enable Proc Macro and Proc Macro Attributes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nowadays rust-analyzer enables proc macros by default (but LSP clients may override the default)

let fish_src_dir = format!("{}/{}", rust_dir, "../src/");

// Where cxx emits its header.
let cxx_include_dir = format!("{}/{}", target_dir, "cxxbridge/rust/");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this may work, I warn you that this relies on unstable implementation details of Cargo and the cxx maintainer has adamantly refused to accept any more reasonable solution for integrating cxx with any C++ build system besides Bazel or Buck: dtolnay/cxx#462

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads-up. I doubt fish will track upstream autocxx: its priorities are somewhat different (focused on long-term safe interop, instead of an aid to porting) so we'll continue to use our fork.

Copy link

@Be-ing Be-ing Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that you are using a fork of cxx, you may consider adding something like dtolnay/cxx#1120 to your fork. Though, if this existing code works, and it's intended to be temporary before the next release, this issue is kinda moot for fish's purposes.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

relies on unstable implementation details of Cargo

To clarify, Cargo only provides build.rs scripts with the OUT_DIR environment variable as a path to write files, which is a subdirectory of target with an unreliable path that includes a hash. To create target/cxxbridge, cxx-build uses a hacky method of walking up from OUT_DIR to find the root of the target directory using some files Cargo creates for unrelated reasons, so that could break in some future version of cargo. A few weeks ago, I tried to start a discussion upstream about a better way to handle this, but it didn't go anywhere.

@faho
Copy link
Member

faho commented Jan 30, 2023

Just throwing an idea out there: Given that this is a big underlying technical change and needs some larger changes in packaging, would it make sense to:

  1. Call fish-in-rust "fish 4"
  2. Support the last C++ series (3.6.x?) for a bit longer with bugfixes, added completions and such

This is, of course, a change in policy and more work, but I believe it may be warranted. These releases could be source-only (just tag a 3.6.3 and send out a mail about it), for those who are unable to install rust, especially distro packagers who find themselves looking at an incompatible rustc version.

It's of course possible I'm overestimating the difficulty here, maybe it's not needed because nobody has a problem switching to 4.0 immediately.

@faho
Copy link
Member

faho commented Jan 30, 2023

Building on that old NetBSD VM I have lying around currently fails with

running: "ar" "s" "/home/alfa/fish-shell/build-rust/./cargo/build/x86_64-unknown-netbsd/release/build/fish-rust-6df62e1ec51440cc/out/libfish-rust.a"
  exit status: 0
  cargo:rustc-link-lib=static=fish-rust
  cargo:rustc-link-search=native=/home/alfa/fish-shell/build-rust/./cargo/build/x86_64-unknown-netbsd/release/build/fish-rust-6df62e1ec51440cc/out
  cargo:rustc-env=AUTOCXX_RS=/home/alfa/fish-shell/build-rust/fish-autocxx-gen/rs

  --- stderr

  CXX include path:
    /home/alfa/fish-shell/build-rust/./cargo/build/x86_64-unknown-netbsd/release/build/fish-rust-6df62e1ec51440cc/out/cxxbridge/include
    /home/alfa/fish-shell/build-rust/./cargo/build/x86_64-unknown-netbsd/release/build/fish-rust-6df62e1ec51440cc/out/cxxbridge/crate
  /home/alfa/fish-shell/fish-rust/../src/proc.h:103:23: error: static_assert expression is not an integral constant expression
  /home/alfa/fish-shell/fish-rust/../src/proc.h:103:23: note: cast from 'void *' is not allowed in a constant expression
  Error: 
    × the include_cpp! macro couldn't be expanded into Rust bindings to C++:
    │ Bindgen was unable to generate the initial .rs bindings for this file.
    │ This may indicate a parsing problem with the C++ headers.

--- CMakeFiles/_cargo-build_fish-rust ---
*** [CMakeFiles/_cargo-build_fish-rust] Error code 101

Removing that static_assert (which is fine) makes it go on until it eventually ends in

[ 98%] Linking CXX executable fish
ld: libfish_rust.a(nix-e8ff18d973978598.nix.35ee0c1d-cgu.15.rcgu.o): in function `nix::sys::timer::Timer::get':
nix.35ee0c1d-cgu.15:(.text._ZN3nix3sys5timer5Timer3get17h045b7bbd1ff7f87dE+0x1b): warning: warning: reference to compatibility timer_gettime(); include <time.h> to generate correct reference
ld: libfish_rust.a(nix-e8ff18d973978598.nix.35ee0c1d-cgu.15.rcgu.o): in function `nix::sys::timer::Timer::set':
nix.35ee0c1d-cgu.15:(.text._ZN3nix3sys5timer5Timer3set17h47eed997322493a6E+0x4e): warning: warning: reference to compatibility timer_settime(); include <time.h> to generate correct reference

(and a bunch more nix::sys::timer and nix::sys::aio functions)

It also absolutely requires libclang, and will fail rather late in the build process.

@faho faho added this to the next-major milestone Jan 30, 2023
@Lokathor
Copy link

Rust 1.67 was released literally two days ago. Compared to us sticking to C++11 that's a staggering difference in version support. Is it that much easier to get a new rust on old systems? Our typical targets are old CentOS and Debian.

If you want to use the distro version of Rust then you'll need to use a much older rust than latest stable. You can look it up for a particular distro, but Debian's rustc (for example) is often 6 months or more behind the latest stable. If you want to ship your program in a distro you usually need to build the program using that distro's version of the compiler, so I suspect you'd need to push your "minimum supported rust verison" (MSRV) farther into the past if your project is supposed to be part of distros.

If you don't need to be shipped in a distro, it's as easy as rustup update to update to the latest compiler. It's so simple to update the compiler that it's honestly hard to convince people in the rust community to have a library support anything other than the latest stable.

@Shnatsel
Copy link

You can look it up for a particular distro, but Debian's rustc (for example) is often 6 months or more behind the latest stable.

AFAIK Debian updates both rustc and other packages for a while, and then freezes all of them. So it does have the latest rustc for the duration of the development cycle, and so this should not be an issue for packages shipped by the distro itself.

Third-party repos, however, will need either their own newer rustc for the build, or have all the code written in Rust support an older compiler version, including all dependencies.

@Lokathor
Copy link

Just taking a quick look at the versions in https://packages.debian.org/sid/rustc:

  • buster 1.41 (2020-01-30)
  • bullseye 1.48 (2020-11-19)
  • bookworm 1.63 (2022-08-11)
  • sid 1.63

I'm not sure what you mean by "for the duration of the development cycle", but even sid falls behind "latest stable" on a regular basis until someone goes and updates it.

@BurntSushi
Copy link

so I suspect you'd need to push your "minimum supported rust verison" (MSRV) farther into the past if your project is supposed to be part of distros.

You do not. See: BurntSushi/ripgrep#1019

Essentially, distros like Debian will just ship an older version of your program, just like they do for almost everything else.

@mqudsi
Copy link
Contributor

mqudsi commented Jan 31, 2023

I've been alternating between obsessed with and simply always entertaining in the back of my mind the idea of a rust-powered fish for many years now (and have discussed that with other team members), so I'm approaching this with an open mind, but also with concerns. I must say that I knew this was coming for many months now from following @ridiculousfish on GitHub (❤️), and have been wondering how it would the topic would be broached.

I passionately believe that rust is the way forward for writing any new code, especially in a cross-platform context. Quite apart from all the memory and concurrency safety it brings to the table, the general approach of the language and the community to always put correctness first and foremost has built a strong culture of correct, well-designed, and fairly maintainable software engineering. The package management is great, the team is fantastic, and support in the broader software world has been extremely positive and welcoming.

To go further, I have written my own fish-compatible rewrite (not port) of fish's tokenizer and was making progress on the AST and parser when I last had my "burning desire" to see something like fish but in rust. I was maintaining it on my private git server and not GitHub, but I've pushed it to GH in case anyone cares to take a look: https://github.com/mqudsi/velvet

Having said all that, I worry that rushing any part of this could be a death knell for the project. A quality rewrite (or port) takes time, on the magnitude of years rather than days. One good example to take inspiration from is tectonic, a xelatex port that started off with machine-rewritten code and then manually made progress on actual, idiomatic rust conversions... but even that was a greenfield project and not a first-party effort and so not saddled with some of the concerns we would be saddled by. The recommended module-by-module approach is definitely a great suggestion for a starting place, and I think it has great potential.

Off the top of my head, some concerns or points:

  • Compatibility, obviously. Fish runs on archaic hardware for fun. This would be the end of that. That's not a reason that should stand in the way of progress, but it's something we can't turn a blind eye to and would have to quite clearly signal that we're ok giving up all platforms that don't run (modern versions of) rust.
  • Idiomatic rust is too high-level for fish. You don't fork in rust, using the standard library interface doesn't give you access to poll/select/epoll/kqueue/whatever natively. async io is a horrible fake cludge with tokio/async_std faking it with thread pools or whatnot and performing far worse than truly native async implementations. We'd have to go that mostly alone to keep the same code we have (thinking "port" and not "rewrite" and aiming to preserve what we have).
  • Despite whether we call it a "port" or a "rewrite" there's going to be a heck of a lot of rewriting taking place. Subtle bugs will creep in. Subtle compatibility fixes that weren't properly documented or annotated will be lost.
  • We will be in a state of partial limbo until this completes one way or the other. We won't want to make any major changes to the C++ codebase while we're trying to get to feature parity in the rust one. Depending on how things unfold this could greatly hinder further progress in the project.
  • Not all core team members are fluent in rust. This seems unfair. (This isn't just something I'm saying in passing but rather something I've given a lot of thought to and broached with others before.)
  • C++ isn't really a "legacy" language (yet), no matter how much we wish the world would start treating it like that given all its warts and shortcomings.
  • I understand what @ridiculousfish meant by "Rust is what we need to turn on concurrent function execution" but respectfully disagree. A complete rewrite (or port, if you prefer) to another language is a great time to facilitate that, but it's hardly a requirement, though I guess it could functionally be considered one if you want to take into account correctness, maintainability, etc.
  • (Really minor compared to everything else, but if we're ultimately going to be a rust project, then I think it makes sense for all types coming from the rust codebase to use idiomatic rust naming conventions, but that would cause churn cpp-side. I suggest using idiomatic rust naming conventions and then using statements to alias them with cpp-style names in the cpp headers.)

@ondra05

This comment was marked as off-topic.

@malmod

This comment was marked as off-topic.

@gimbles

This comment was marked as off-topic.

@AshtonKem

This comment was marked as off-topic.

@hr8

This comment was marked as off-topic.

@9999years

This comment was marked as off-topic.

@mlindner
Copy link

mlindner commented Jan 31, 2023

Idiomatic rust is too high-level for fish. You don't fork in rust, using the standard library interface doesn't give you access to poll/select/epoll/kqueue/whatever natively. async io is a horrible fake cludge with tokio/async_std faking it with thread pools or whatnot and performing far worse than truly native async implementations. We'd have to go that mostly alone to keep the same code we have (thinking "port" and not "rewrite" and aiming to preserve what we have).

Note that in the Approach section of the documentation there is this:

We will not use tokio, serde, async, or other fancy Rust frameworks initially.

As a general comment to anyone reading this, it's worth reading the entire fish-riir-plan.md document before commenting.

https://github.com/ridiculousfish/fish-shell/blob/riir/doc_internal/fish-riir-plan.md

@Merthod

This comment was marked as off-topic.

@0xpr03
Copy link

0xpr03 commented Jan 31, 2023

Just leaving a comment here, based on your planning doc:

types cannot represent non-UTF8 filenames

This crate might be what you want to use instead.

@Slackadays

This comment was marked as off-topic.

@mqudsi
Copy link
Contributor

mqudsi commented Feb 1, 2023

We're going to have to do some hacking around (auto)cxx because of how it invokes the cxx compiler. It doesn't inherit the compiler configuration from CMake, so all the awesomeness (and blood, sweat, and tears) we put into making everything work over there doesn't apply.

The FreeBSD build is broken (I'm not talking about CI) because cxx invokes $CXX (clang++ on FreeBSD) but in a way that it can't even find the basic C++ standard library headers:

warning: In file included from src/cxx.cc:1:
warning: src/../include/cxx.h:2:10: fatal error: 'algorithm' file not found
warning: #include <algorithm>
warning:          ^~~~~~~~~~~
warning: 1 error generated.

error: failed to run custom build command for `cxx v1.0.81 (https://github.com/ridiculousfish/cxx?branch=fish#24d1bac1)`

It invokes the cpp build via

"clang++" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "--target=x86_64-unknown-freebsd" "-Wall" "-
Wextra" "-o" "/usr/home/mqudsi/fish-shell/./build/./cargo/build/x86_64-unknown-freebsd/release/build/cxx-777e53fd89b74d87/out/src/cxx.o" "-c"
"src/cxx.cc"

but the usage of --target=x86_64-unknown-freebsd causes these errors because clang++ doesn't recognize that target triple. Since we're not cross-compiling at all, the most sensible thing would be to drop it from the invocation altogether, but it's also possible to get the compilation to succeed by changing the triple to --target=x86_64-unknown-freebsd$(uname -r | sed -r 's/([^-]+)-.*/\1/g') also works (e.g. changing x86_64-unknown-freebsd to x86_64-unknown-freebsd13.1).

Edit: it also works with just the major version appended, e.g. x86_64-unknown-freebsd13.

@Be-ing
Copy link

Be-ing commented Feb 1, 2023

@mqudsi I suspect that's a bug in cc, which cxx-build uses under the hood.

@mulkieran
Copy link

This is my notes on MSRV and building platform packages. For development, rustup is clearly the preferred choice. However, on package builders Internet access is generally disallowed.

We currently provide packages for the following platforms:

* Debian Stable and Oldstable (11 and 10)

* CentOS (RHEL) 7/8

* Fedora 36 & 37

* openSUSE Leap 42.3 (could be dropped), 15.4 & Tumbleweed

* Ubuntu 16.04 (can probably be dropped), 18.04, 20.04, 22.04 and 22.10)

Ubuntu packages are provided by Launchpad PPAs. The mad lads at Canonical ship rustc 1.61 in the updates repositories for 18.04 onwards, to support Firefox ESR. It is possible (maybe not fun) to get 1.63 into a PPA which the fish builds can depend upon. I think the Launchpad conditions will require that it is compiled without external binaries, which means building 1.62 then using that to build 1.63. There is an official PPA which has more up-to-date binaries as well: https://launchpad.net/~ubuntu-mozilla-security/+archive/ubuntu/rust-updates/+packages

Other platforms are built by the openSUSE Open Build Service. There is a devel:languages:rust project which provides up-to-date packages for openSUSE supported versions, Fedora ships an up-to-date rust for Fedora and CentOS 7 (via EPEL), and the OBS rustc maintainer has indicated support for importing packages that use the compiled versions rather than having to chain compile from existing versions eg for Debian.

The other issue is our ongoing hope to have an AppImage (#6475), which I continue to poke at intermittently. This really needs to be built on the oldest version possible (currently Leap 42.3) but again I think it is possible to get a rustc for that platform without too much issue.

Packaging in Fedora will always require that fish can be built by the current stable Rust. Our project records in our Cargo.toml a rust-version value, which is our lowest supported Rust version. Sometimes the difference between the two versions is six months (4 versions), sometimes less. We make sure to run compilation tests and even to test using that lowest supported version. In practice, a difference of six months is at most a mild annoyance (sometimes we have to delay making use of some new feature or standard library inclusion for a bit).

Unlike some other distros, Fedora requires a separate package for every crate. Fundamental crates like nix and libc are already packaged and supported; but if you need to depend on an as yet unpackaged crate that will be extra work.

Generally speaking, in Fedora, it is expected that Rust packages will be released to the three leading Fedora releases, currently rawhide, f37, f36 in sync. The rust package itself is an example: https://src.fedoraproject.org/rpms/rust .

@mqudsi
Copy link
Contributor

mqudsi commented Feb 1, 2023

Thanks for the hint, @Be-ing.

I patched cc-rs to fix the FreeBSD issue and opened a PR upstream. It was reported a while back but I myself flagged it as no repro! 😅

@ridiculousfish would you mind if I committed the following to the riir branch?

diff --git a/fish-rust/Cargo.lock b/fish-rust/Cargo.lock
index 51ede1746..2b3cb4e0b 100644
--- a/fish-rust/Cargo.lock
+++ b/fish-rust/Cargo.lock
@@ -192,9 +192,8 @@ checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a"
 
 [[package]]
 name = "cc"
-version = "1.0.78"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "a20104e2335ce8a659d6dd92a51a767a0c062599c73b343fd152cb401e828c3d"
+version = "1.0.79"
+source = "git+https://github.com/mqudsi/cc-rs?branch=fish#f40d06b520129b08f19f0a1b00cf8f5b4623bd0f"
 
 [[package]]
 name = "cexpr"
diff --git a/fish-rust/Cargo.toml b/fish-rust/Cargo.toml
index d6e61d4e1..d21a02296 100644
--- a/fish-rust/Cargo.toml
+++ b/fish-rust/Cargo.toml
@@ -34,12 +34,19 @@ default = ["fish-ffi-tests"]
 fish-ffi-tests = ["inventory"]
 
 [patch.crates-io]
+cc = { git = "https://github.com/mqudsi/cc-rs", branch = "fish" }
 cxx = { git = "https://github.com/ridiculousfish/cxx", branch = "fish" }
 cxx-gen = { git = "https://github.com/ridiculousfish/cxx", branch = "fish" }
 autocxx = { git = "https://github.com/ridiculousfish/autocxx", branch = "fish" }
 autocxx-build = { git = "https://github.com/ridiculousfish/autocxx", branch = "fish" }
 autocxx-bindgen = { git = "https://github.com/ridiculousfish/autocxx-bindgen", branch = "fish" }
 
+[patch.'https://github.com/ridiculousfish/cxx']
+cc = { git = "https://github.com/mqudsi/cc-rs", branch = "fish" }
+
+[patch.'https://github.com/ridiculousfish/autocxx']
+cc = { git = "https://github.com/mqudsi/cc-rs", branch = "fish" }
+
 #cxx = { path = "../../cxx" }
 #cxx-gen = { path="../../cxx/gen/lib" }
 #autocxx = { path = "../../autocxx" }

inventory = { version = "0.3.3", optional = true}
lazy_static = "1.4.0"
libc = "0.2.137"
nix = "0.25.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to become

nix = { version = "0.25.0", default-features = false, features = [ ] }

(the features list can be left blank for now, it'll work)

otherwise there are failures linking the executables under BSD:

[5/6] Linking CXX executable fish
FAILED: fish
: && /usr/local/libexec/ccache/clang++ -Wredundant-move -fno-c++-static-destructors -iquote /usr/home/mqudsi/fish-shell/./src -O2 -g -DNDEBUG  CMakeFiles/fish.dir/src/fish.cpp.o -o fish  -Wl,-rpath,/usr/local/lib:  libfishlib.a  libfish_rust.a  /usr/lib/libcurses.so  /usr/local/lib/libtinfo.so  -pthread  /usr/local/lib/libpcre2-32.so  /usr/local/lib/libintl.so && :
ld: error: undefined symbol: mq_open
>>> referenced by nix.5cf3894a-cgu.1
>>>               nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_open::hb767fe6feefc2911) in archive libfish_rust.a
>>> referenced by nix.5cf3894a-cgu.1
>>>               nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_open::hb767fe6feefc2911) in archive libfish_rust.a
>>> did you mean: kmq_open
>>> defined in: /lib/libc.so.7

ld: error: undefined symbol: mq_unlink

referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_unlink::hc12ac9a7732e5462) in archive libfish_rust.a

ld: error: undefined symbol: mq_close

referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_close::h42a9979fa4422db4) in archive libfish_rust.a

ld: error: undefined symbol: mq_receive

referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_receive::h04907d9bfd2c8a95) in archive libfish_rust.a

ld: error: undefined symbol: mq_send

referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_send::hd8c21cd11854cd02) in archive libfish_rust.a

ld: error: undefined symbol: mq_getattr

referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_getattr::h4d358e6861ed64a2) in archive libfish_rust.a
referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_set_nonblock::h316ea553c44775ab) in archive libfish_rust.a
referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_remove_nonblock::hb9b045e36b48ad40) in archive libfish_rust.a

ld: error: undefined symbol: mq_setattr

referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_setattr::hec6e064078b9a25f) in archive libfish_rust.a
referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_set_nonblock::h316ea553c44775ab) in archive libfish_rust.a
referenced by nix.5cf3894a-cgu.1
nix-e298e46468831b57.nix.5cf3894a-cgu.1.rcgu.o:(nix::mqueue::mq_remove_nonblock::hb9b045e36b48ad40) in archive libfish_rust.a

ld: error: undefined symbol: fspacectl

referenced by nix.5cf3894a-cgu.10
nix-e298e46468831b57.nix.5cf3894a-cgu.10.rcgu.o:(nix::fcntl::fspacectl::h28103e804dc77cef) in archive libfish_rust.a
referenced by nix.5cf3894a-cgu.10
nix-e298e46468831b57.nix.5cf3894a-cgu.10.rcgu.o:(nix::fcntl::fspacectl_all::hca1ea434c6d304d2) in archive libfish_rust.a

ld: error: undefined symbol: timer_create

referenced by nix.5cf3894a-cgu.6
nix-e298e46468831b57.nix.5cf3894a-cgu.6.rcgu.o:(nix::sys::timer::Timer::new::h48accb555807a0b6) in archive libfish_rust.a

ld: error: undefined symbol: timer_settime

referenced by nix.5cf3894a-cgu.6
nix-e298e46468831b57.nix.5cf3894a-cgu.6.rcgu.o:(nix::sys::timer::Timer::set::hd9d5a1ad36c6f443) in archive libfish_rust.a

ld: error: undefined symbol: timer_gettime

referenced by nix.5cf3894a-cgu.6
nix-e298e46468831b57.nix.5cf3894a-cgu.6.rcgu.o:(nix::sys::timer::Timer::get::hc1b05f9df3be1863) in archive libfish_rust.a

ld: error: undefined symbol: timer_getoverrun

referenced by nix.5cf3894a-cgu.6
nix-e298e46468831b57.nix.5cf3894a-cgu.6.rcgu.o:(nix::sys::timer::Timer::overruns::he5a9db1fe6333d02) in archive libfish_rust.a

ld: error: undefined symbol: timer_delete

referenced by nix.5cf3894a-cgu.6
nix-e298e46468831b57.nix.5cf3894a-cgu.6.rcgu.o:(_$LT$nix..sys..timer..Timer$u20$as$u20$core..ops..drop..Drop$GT$::drop::hb9d3f92b12e09718) in archive libfish_rust.a
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

@JustinWick

This comment was marked as off-topic.

@fish-shell fish-shell deleted a comment from RayZ0rr Feb 2, 2023
matu3ba

This comment was marked as off-topic.

@zanchey
Copy link
Member

zanchey commented Feb 2, 2023

@mulkieran

Packaging in Fedora will always require that fish can be built by the current stable Rust.

Yes, and I should be clear that I'm aware that Fedora generally has up-to-date fish packages within a few days of a release. We build Fedora packages on the OBS for both continuous integration and actual releases, but do not link to them from the homepage - I will probably drop the release packages in the future because I feel like Fedora users can and should rely on the distribution. Having it built and packaged under CI is useful, I think.

Unlike some other distros, Fedora requires a separate package for every crate. Fundamental crates like nix and libc are already packaged and supported; but if you need to depend on an as yet unpackaged crate that will be extra work.

That's a choice I respect, although I do not envy the work it involves. There's a bigger conversation to be had about packaging and the upstream/distribution relationship but for now I would say our CI builds will probably just use cargo_vendor or whatever and leave the crate packaging to the Fedora maintainers for the official distribution packages.

@shaunsingh

This comment was marked as duplicate.

@fish-shell fish-shell deleted a comment from daixtrose Feb 2, 2023
@ridiculousfish
Copy link
Member Author

Well this blew up. There's some interest in contributing to this now, so I am going to shift my branch out of my repo to fish-shell.

Trying to catch up:

Responding to @faho:

my hope is rust would not rely on libc for everything and abstract some of the other libc gunk away instead of throwing it in our face like C++ does. And so this historical wchar_t difference could just go away.

Yes that is my hope as well and I think it is achievable - we will use none of libc's wide char functionality. wchar_t in fish-rust will be a Unicode codepoint, identical to Rust's char, 32 bits on all platforms. We'll rely on Rust's Unicode tables for character classification like iswdigit so they become locale independent to the extent possible (e.g. nobody is using CJK numerals for pids).

We'll continue to respect the user's locale for decimal separators, collation, etc. but do our best to avoid wcscmp and all of the wide-string C gunk. Locale-respecting collation is still a little hazy for me, interested in any suggestions here.

  1. Call fish-in-rust "fish 4"
    Support the last C++ series (3.6.x?) for a bit longer with bugfixes, added completions and such

Definitely agree with the major version bump. I'm not opposed to keeping a 3.6.x train either, we'll gauge interest in that.

Building on that old NetBSD VM I have lying around currently fails

Interesting finding. Looks like there's two errors: the libclang dependency and nix errors. libclang is only a temporary dependency, so we can just defer NetBSD support until after the port; we don't have to get it working on NetBSD (though if someone wants to that's fine of course). The nix warnings look to be more serious; I haven't dug in. But worst case we can just use libc directly.

To @mqudsi:

fish runs on archaic hardware for fun. This would be the end of that.

I'm not sure how to quantify this - Rust targets a lot of platforms. There may be platforms that support g++4.8 and not Rust, but I see no reason why fish-rust couldn't be made to run on AIX, Solaris, etc. maybe after some fixes. The cross-compilation story is also a bit better. So who knows?

Idiomatic rust is too high-level for fish. You don't fork in rust, using the standard library interface doesn't give you access to poll/select/epoll/kqueue/whatever natively.

Right - we'll use a mixture of libc and the nix crate which is thin. As you say we will not use Rust's async, at least initially. There will definitely be lots of code like unsafe { libc::select(...) } . That's OK and expected - fidelity to existing code is more important than idiomaticity. And safe Rust is just someone else's unsafe after all. Your description of "aiming to preserve what we have" is exactly right.

Subtle bugs will creep in. Subtle compatibility fixes that weren't properly documented or annotated will be lost.

Yes, that's an unavoidable cost. Part of how we mitigate it is the incremental port with fidelity to existing C++. If and when we discover bugs (and you're correct they will), we can able to bisect it to a commit which hopefully only changes one module, and do a nearly line-by-line comparison. But it won't be perfect.

We will be in a state of partial limbo until this completes one way or the other. We won't want to make any major changes to the C++ codebase while we're trying to get to feature parity in the rust one. Depending on how things unfold this could greatly hinder further progress in the project.

It's a fair point. We generally don't get a ton of C++ contributions except the project's committers, so the risk is low. But if something does come in, it's fine as long as it arrives before that part of the C++ gets ported.

Not all core team members are fluent in rust. This seems unfair. (This isn't just something I'm saying in passing but rather something I've given a lot of thought to and broached with others before.)

Right, but all of the core team members do not especially like C++.

C++ isn't really a "legacy" language (yet)

I originally wrote something more nuanced, and then went back and deleted all my weasel words. Maybe I cut too deep.

"Rust is what we need to turn on concurrent function execution" but respectfully disagree. A complete rewrite (or port, if you prefer) to another language is a great time to facilitate that, but it's hardly a requirement, though I guess it could functionally be considered one if you want to take into account correctness, maintainability, etc.

Yes it is the maintainability and complexity that worries me. We could turn it on now - it passes all tests including with TSAN - but there's way too many comments like this for my taste. There's no way in C++ to enforce constraints like "jobs cannot be shared across threads" and even if we get it right today, someone may introduce it accidentally tomorrow.

We're going to have to do some hacking around (auto)cxx because of how it invokes the cxx compiler

We can but do not have to. We should aim to support recent macOS and recent Linux for the duration of the port; others are nice to have, but can be deferred until after we drop the FFI. That said, if you want to fix autocxx, go for it. Maybe I'll move the repos into the fish-shell group.

To @zanchey and other packaging experts:

I don't know much about Linux packaging; however nothing I envision requires recent Rust features. Eyeballing it from this post I think we should be able to work as far back as Rust 1.31 from 2018. (We currently depend on 1.67 but this is purely for a nice-to-have "foo"L syntax which we can fix or remove. Anyways if you're satisfied that this is achievable then I'm convinced.

This adds an implementation of fish_wcstoi in Rust, mirroring the one in
fish. As Rust does not have a string to number which infers the radix
(i.e. looks for leading 0x or 0), we add that manually.
This allow testing Rust functions (from fish_tests.cpp) which need to
cross the FFI. See the example in smoke.rs.
This allows the wgettext! macro, which calls into C++.
This allows using existing format strings.
The implementation is adapted from https://github.com/tjol/sprintf-rs
This works around an autocxx limitations where different types cannot
have the same name even if they live in different namespace.

ast::job_t conflicts with job_t.
This implements builtin_wait in Rust.
@ridiculousfish
Copy link
Member Author

I've opened the riir branch in the fish-shell repo. We'll end up merging that. @krobelus @mqudsi and all, please feel free to add commits to that as if it were master branch; that way you won't be blocked while we can decide when and how to merge.

@thomcc
Copy link
Contributor

thomcc commented Feb 3, 2023

Eyeballing it from this post I think we should be able to work as far back as Rust 1.31 from 2018

Drive by comment, but I think this would be pretty difficult.

Even if you limit the features you use, you'll have dependencies that do not. For example, the libc crate is intending on updating its MSRV to a rolling N-5 or N-10 (see rust-lang/libs-team#72), although there's some work needed before that can happen), and a lot of other crates in the ecosystem are likely to follow suit when this happens. (For example, I'm the de-facto maintainer of the cc crate at the moment, and think that it will follow this as well)

Edit: Fixed link to libc issue.

@mqudsi
Copy link
Contributor

mqudsi commented Feb 3, 2023

Thanks, @ridiculousfish. I mainly just want the FreeBSD CI working as well, primarily because I think sorting out Linux-specific code or dependencies creeping in would be easier as we go along rather than after the fact since there are unfortunately a few crates that one would assume are cross-platform but actually end up have Linux-isms. I'm sure having macOS get tier-1 status helps with that, but then again, nix's default features already tripped up the FreeBSD builds even with the macOS builds passing.

I'm not sure how I feel about msrv. I was very much against bumping our C++ standard from C++11 to C++14 let alone C++20 but I don't feel as strongly about it for rust, primarily because we're running into constraints with the distro-supported rust versions as it is so we might as well just go all-in and also because, unlike with C++ and its C++11, rust makes it hard to pick that one "watershed" version that so dramatically improved quality-of-life compared to before and the features that shipped in later versions pale in comparison. With rust, each release is just incrementally chipping away at ergonomic issues.

(Note that we're currently targeting rust 2021 edition which requires rust 1.56.0, but 2021 doesn't bring anything that's a must-have to the table. A lot of crates (even some of the more "important" ones, but I won't name names) jumped to 2021 too soon imho, disregarding the msrv bump. I'm not sure if there are any of those in our current dependency tree.)

@zanchey
Copy link
Member

zanchey commented Feb 3, 2023

This (hopefully my final) note is about dependencies.

There's a tinyexpr rewrite which is documented as incomplete but looks like it would meet the current use in fish.

PCRE2 has a crate (from thread commenter @BurntSushi 😄) which exposes some of PCRE2 as idiomatic Rust, but not - by the look of it - substitution. So we might need to do stick with pcre2-sys, or do some work there, which perhaps can go upstream.

@BurntSushi
Copy link

PCRE2 has a crate (from thread commenter @BurntSushi 😄) which exposes some of PCRE2 as idiomatic Rust, but not - by the look of it - substitution. So we might need to do stick with pcre2-sys, or do some work there, which perhaps can go upstream.

I don't know what your requirements are, but substitution can almost certainly be implemented on top of the public API exposed by pcre2.

But yes, I'm also open to patches that expand the API to match the regex crate more closely.

@mqudsi
Copy link
Contributor

mqudsi commented Feb 3, 2023

Thanks for that info, @BurntSushi. As long as you're open to it, I am happy to put in a bit more work to upstream any functionality we end up needing to add rather than simply implementing it fish-side atop of the existing crate.

I've been refamiliarizing myself with the cc-rs crate because I think one of the gnarlier things we're going to have to do is port the CMake-based feature detection to a build.rs script so that we can get our rust code to properly conditionally target certain platforms or take advantage of certain APIs. (@thomcc has been an excellent custodian of the crate and has really been on top of it with reviewing the PRs .)

Until the checks are ported to cc-rs and used in build.rs to expose them as const XXX: bool values in a build.rs-generated config.h alternative, we can either grab them from cpp over the ffi bridge or add temporary "micro features" to the crate and have CMake invoke rust with the correct feature matrix. Given that it's only temporary, the former is probably the way to go to avoid unnecessary work that's going to ultimately be thrown away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment