Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrapping run_repl with catch_unwind and restarting the repl on panic #11860

Merged
merged 15 commits into from Feb 22, 2024

Conversation

ayax79
Copy link
Contributor

@ayax79 ayax79 commented Feb 15, 2024

Provides the ability to cleanly recover from panics, falling back to the last known good state of EngineState and Stack. This pull request also utilizes miette's panic handler for better formatting of panics.

Screenshot 2024-02-21 at 08 34 35

@kubouch
Copy link
Contributor

kubouch commented Feb 15, 2024

Seems like this simple implementation should be enough to recover from most error states for two reasons:

  1. EngineState is in most cases modified by the “delta” approach, first recording the changes to StateWorkingSet, then applying them to the EngineState. If panic happens during parsing, the StateWorkingSet is not merged to EngineState, thus no partial parse weirdness.
  2. Stack gets completely thrown away, and we start the REPL with a fresh new Stack. We lose local variables and any environment changes, but I think that's OK.

Once cleaned up, I'd be willing to merge this and see how it goes.

@kubouch
Copy link
Contributor

kubouch commented Feb 15, 2024

Actually, there are some init steps before running the actual REPL loop which would get re-run on the reset. From the top of my head, the side effects would be things like reset local config changes, both config files would re-run, etc. It might be safest to start with a fresh EngineState when we hit a panic. We'd lose everything, but we could be reasonably sure we don't cause any weirdness. It would be just like starting a fresh REPL, which is also a simple mental model to adopt.

@fdncred
Copy link
Collaborator

fdncred commented Feb 16, 2024

@ayax79 I'm not sure what I did but the shell is in a weird state now and I can't start nushell.
image

I'm kind of wondering if there should be some crossterm reset that happens on panic before the repl is restarted?

@IanManske
Copy link
Member

IanManske commented Feb 16, 2024

My initial theory was that since io::stdin(), io::stdout(), and io::stderr() are behind Mutexs in the Rust standard library, then the panic must be poisoning these Mutexs. This could later cause other io calls to fail (in this case, is_terminal). However, looking through the Rust std lib source (for unix and windows), I'm not sure this is actually the case. It looks like it either tries to recover from poisoned locks or uses raw system calls that don't interact with the stdio Mutexs. Anyways, I have not yet been able to reproduce @fdncred's issue on Linux.

On another note, nu -e "'2023-13-31' | into datetime" causes an infinite panic loop for me 😄

Error:   × Main thread panicked.
  ├─▶ at /home/iama/.local/share/cargo/registry/src/index.crates.io-6f17d22bba15001f/human-date-parser-0.1.1/src/lib.rs:186:64
  ╰─▶ called `Result::unwrap()` on an `Err` value: ParseError(OutOfRange)
  help: set the `RUST_BACKTRACE=1` environment variable to display a backtrace.

Caught panic, restarting REPL
Error:   × Main thread panicked.
  ├─▶ at /home/iama/.local/share/cargo/registry/src/index.crates.io-6f17d22bba15001f/human-date-parser-0.1.1/src/lib.rs:186:64
  ╰─▶ called `Result::unwrap()` on an `Err` value: ParseError(OutOfRange)
  help: set the `RUST_BACKTRACE=1` environment variable to display a backtrace.

Caught panic, restarting REPL
Error:   × Main thread panicked.
  ├─▶ at /home/iama/.local/share/cargo/registry/src/index.crates.io-6f17d22bba15001f/human-date-parser-0.1.1/src/lib.rs:186:64
  ╰─▶ called `Result::unwrap()` on an `Err` value: ParseError(OutOfRange)
  help: set the `RUST_BACKTRACE=1` environment variable to display a backtrace.

Caught panic, restarting REPL
Error:   × Main thread panicked.
  ├─▶ at /home/iama/.local/share/cargo/registry/src/index.crates.io-6f17d22bba15001f/human-date-parser-0.1.1/src/lib.rs:186:64
  ╰─▶ called `Result::unwrap()` on an `Err` value: ParseError(OutOfRange)
  help: set the `RUST_BACKTRACE=1` environment variable to display a backtrace.

@fdncred
Copy link
Collaborator

fdncred commented Feb 16, 2024

Suggestions from chats with other people.

  • I'd try spawning a fresh EngineState on restart. That might help.
  • Can we return to the EngineState before they ran the last command? Like don't restart everything, just get back to a known good state and try again

@ayax79
Copy link
Contributor Author

ayax79 commented Feb 16, 2024

Per @kubouch's earlier suggestion I started working on building up a new EngineState for the repl.

If I can get that to work, I am going see if I can at least copy some of the state from the previous EngineState.

@ayax79
Copy link
Contributor Author

ayax79 commented Feb 16, 2024

@fdncred,

I just committed a version that creates a new EngineState. Can you see if it works better.

@fdncred
Copy link
Collaborator

fdncred commented Feb 17, 2024

I get the same thing as Kubouch with this

nu -e "'2023-13-31' | into datetime"

it just panics and restarts the repl forever.

Some real-world panics.

  1. "2023-13-31" | into datetime
  2. seq date --begin-date '2024-01-01T00:00:00' --end-date '2024-01-02T00:00:00' --input-format '%Y-%m-%dT%H:%M:%S' --output-format '%Y-%m-%dT%H:%M:%S'
  3. nu -e "ls \\.\pipe\" <-- I think this one is fixed now by fixing the date before it's handed to chrono

I'll have to test next week on Windows more.

@ayax79
Copy link
Contributor Author

ayax79 commented Feb 19, 2024

I get the same thing as Kubouch with this

nu -e "'2023-13-31' | into datetime"

This is weird as I would've assumed that a command like that would only hit the run_commands call inside main.rs. The call_unwind only wraps the run_repl.

Nm... I just read help again, it's supposed to exit into an interactive shell. I think I can avoid this by just not passing in the command line args the second time, which is probably the correct behavior.

@ayax79
Copy link
Contributor Author

ayax79 commented Feb 19, 2024

The case of running with -e is fixed now.

I am going to spend some more time testing tomorrow. I also want to copy over environment variables from the old engine state, I think that would be safe.

@kubouch
Copy link
Contributor

kubouch commented Feb 19, 2024

One approach Sophia suggested was to only wrap the loop part of the REPL and the init would be wrapped separately. That way, we'd land in the last known engine state from the previous iteration if the REPL line fails. I'd try to avoid copying some data between the engine states because that can lead to corner cases and also complicates the mental model.

@ayax79
Copy link
Contributor Author

ayax79 commented Feb 20, 2024

The contents of the loop block in run_repl are now wrapped in a catch_unwind. It does seem to work better, though I haven't tested it too heavily yet.

@ayax79
Copy link
Contributor Author

ayax79 commented Feb 20, 2024

@kubouch, let me know what you think.

Also, to double check:

One approach Sophia suggested was to only wrap the loop part of the REPL and the init would be wrapped separately.

You are suggesting that we keep the catch_unwind that wraps run_repl in main.rs?

@fdncred
Copy link
Collaborator

fdncred commented Feb 20, 2024

I'm still having problems
image

This is how I repro on Windows.

  1. check out this pr
  2. cargo run
  3. nu -e "ls \\.\pipe\"
  4. exit
  5. exit
  6. now that i'm at my original prompt, let's try again with cargo run
  7. I'm stuck now without being able to do cargo run to start nushell nor can I just type nu and run my installed version of nushell and i get the errors in the above screenshot. I have to quit out of pwsh entirely and close the window to be able to launch nushell again.

@ayax79
Copy link
Contributor Author

ayax79 commented Feb 20, 2024

@fdncred
I don't currently have access to a windows box -- I'll try to set up a windows image to test with later today.

In the meantime, I'm wondering if I remove the now mostly redundant catch_unwind in main.rs if things start working. Once I'm back to my desk, I'll make that change and ping you.

@fdncred
Copy link
Collaborator

fdncred commented Feb 20, 2024

Once I'm back to my desk, I'll make that change and ping you.

Sounds good.

@ayax79
Copy link
Contributor Author

ayax79 commented Feb 21, 2024

This is how I repro on Windows.

  1. check out this pr
  2. cargo run
  3. nu -e "ls \.\pipe"
  4. exit
  5. exit
  6. now that i'm at my original prompt, let's try again with cargo run
  7. I'm stuck now without being able to do cargo run to start nushell nor can I just type nu and run my installed version of nushell and i get the errors in the above screenshot. I have to quit out of pwsh entirely and close the window to be able to launch nushell again.

This seems to work with the current changes for me on Windows.

@fdncred
Copy link
Collaborator

fdncred commented Feb 21, 2024

@ayax79 I still can't get it to work. My repro above still breaks it.
image
I can do ls from pwsh.exe as my shell but I cannot run nushell or cargo run.

It's trashing terminal.exe for some reason. I have to completely close out of Windows Terminal to restart. Even things like git log don't work right. I mean I will get a log that goes on forever but it usually uses less and pages the git log, but not after breaking the terminal.

@fdncred
Copy link
Collaborator

fdncred commented Feb 21, 2024

For those watching, jack, ian and i were playing around with this in open-mic on discord and discovered that this (my reported problem above) is indeed and issue but NOT related to jack's PR here. This PR seems to work just fine. However, if you do a ls of \.\pipe\ on windows in nushell, your Windows Terminal will be left in a broken state. Again, it's unrelated to this PR so I feel fine saying we could move forward with landing as long as people have tested on Mac and Linux.

Copy link
Member

@IanManske IanManske left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do some testing on Linux today or tomorrow. Just one thing to note in the mean time:

crates/nu-cli/src/repl.rs Show resolved Hide resolved
@ayax79
Copy link
Contributor Author

ayax79 commented Feb 22, 2024

@IanManske,

I added EnginedState::recover_from_panic(&mut self) where I reset the value of both Arc if poisoned. Let me know if this is what you were thinking.

@IanManske
Copy link
Member

IanManske commented Feb 22, 2024

Yep this is perfect, thanks @ayax79 !

There are few other things that I would like to see changed, but they're probably best left for separate PRs. For future reference if anyone is interested:

  • loop_iteration can return an Err which will end the repl.
  • loading the config can still panic and kill the repl.
  • this PR has to clone the engine state and stack each iteration, but fixing this is much more involved. Also, parent stacks Allow for stacks to have parents #11654 has to land to be able to fix the stack cloning.

Otherwise, I did some testing on Linux, and it looks good!

@fdncred
Copy link
Collaborator

fdncred commented Feb 22, 2024

Let's move forward with this PR and keep working on it.

@fdncred fdncred added pr:release-note-mention Addition/Improvement to be mentioned in the release notes pr:bugfix This PR fixes some bug pr:errors This PR improves our error messages labels Feb 22, 2024
@fdncred fdncred merged commit f17f857 into nushell:main Feb 22, 2024
20 checks passed
@ayax79 ayax79 deleted the global_cathch_unwind branch February 22, 2024 18:44
@ayax79
Copy link
Contributor Author

ayax79 commented Feb 22, 2024

@IanManske,

I have PR for the configuration case in #11935

I can definitely capture the Err, I opted not too in this request in case it changed any desired behavior. Should certain errors kill the shell?

@IanManske
Copy link
Member

@ayax79 Great, thanks for that PR!

Should certain errors kill the shell?

Ideally, no error in loop_iteration should kill the shell, since that could kill the shell on startup/first iteration and prevent people from logging in (if nushell is their login shell). But handling the Err cases is a little more involved. E.g., the error can just happen again if it is caught/discarded and loop_iteration is rerun. Additionally, I count 7 unique commands in loop_iteration that return a Result. Each of these would have to be changed to always succeed, or we would have to use some other default value in the case the function does fail.

There's no need for you to tackle this, but of course, feel free to give it a shot if you feel like it!

@hustcer hustcer added this to the v0.91.0 milestone Feb 23, 2024
IanManske pushed a commit that referenced this pull request Feb 24, 2024
Handle all errors that happen within the REPL loop, display warning or
error messages, and return defaults where necessary.

This addresses @IanManske [Comment Item
1](#11860 (comment))
in #11860

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
kik4444 pushed a commit to kik4444/nushell-fork that referenced this pull request Feb 28, 2024
…ushell#11860)

Provides the ability to cleanly recover from panics, falling back to the
last known good state of EngineState and Stack. This pull request also
utilizes miette's panic handler for better formatting of panics.

<img width="642" alt="Screenshot 2024-02-21 at 08 34 35"
src="https://github.com/nushell/nushell/assets/56345/f81efaba-aa45-4e47-991c-1a2cf99e06ff">

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
kik4444 pushed a commit to kik4444/nushell-fork that referenced this pull request Feb 28, 2024
Handle all errors that happen within the REPL loop, display warning or
error messages, and return defaults where necessary.

This addresses @IanManske [Comment Item
1](nushell#11860 (comment))
in nushell#11860

---------

Co-authored-by: Jack Wright <jack.wright@disqo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:bugfix This PR fixes some bug pr:errors This PR improves our error messages pr:release-note-mention Addition/Improvement to be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants