Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update nvim-rs, remove superflous arc-wrapping #2336

Merged
merged 4 commits into from
Feb 11, 2024

Conversation

KillTheMule
Copy link
Contributor

What kind of change does this PR introduce?

  • Codestyle
  • Other

Did this PR introduce a breaking change?

  • No

Updated nvim-rs to the latest release. I noticed you're wrapping the Neovim instance in Arc. That's not really necessary, since nvim-rs does Arc-wrapping for all components anyways.

There might be performance implications - this removes an indirection, but cloning the Neovim instance now needs to bump 3 counters instead of one. My assumption would be that this is a win, but there aren't any benchmarks in place to decide this.Your choice :)

@falcucci
Copy link
Member

falcucci commented Feb 3, 2024

@KillTheMule thank you! i'll just make sure to use it for a bit to certify everything works flawlessly before approving this.

Copy link
Member

@falcucci falcucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is only a clippy warning that would be nice to be fixed before a merge :)

@KillTheMule
Copy link
Contributor Author

Done.

@fredizzimo
Copy link
Member

fredizzimo commented Feb 7, 2024

@Zhniing, could you test this?
If I'm not mistaken (I haven't looked at the changes in detail) the file handles should no longer be asynchrounous, and run into the potential rust standard library bug I found here #2299 (comment). My analysis there is slightly wrong, thoughNtReadFile allows WaitForSingleObject in certain cases, which I think should have been fulfilled, but I see no other possible reason for your crash,

@sid-6581, can you revert your exit hang fix and test with this? I strongly believe that one was also caused by the same bug. And if not, so much has changed in nvim-rs that something else might have fixed it.

@sid-6581
Copy link
Contributor

sid-6581 commented Feb 7, 2024

@Zhniing, could you test this? If I'm not mistaken (I haven't looked at the changes in detail) the file handles should no longer be asynchrounous, and run into the potential rust standard library bug I found here #2299 (comment). My analysis there is slightly wrong, thoughNtReadFile allows WaitForSingleObject in certain cases, which I think should have been fulfilled, but I see no other possible reason for your crash,

@sid-6581, can you revert your exit hang fix and test with this? I strongly believe that one was also caused by the same bug. And if not, so much has changed in nvim-rs that something else might have fixed it.

I'll check! I'm also experiencing a lot of hangs now while I'm in the middle of using neovide (not when closing), so I'll be running in the debugger while working and checking on both issues.

@fredizzimo
Copy link
Member

@KillTheMule, thanks. I guess it didn't occur to us that it was cloneable.

@fredizzimo
Copy link
Member

@sid-6581, I think there's a temporary freeze caused by #2188, it might not skip ahead when the computer has been sleeping for example. But that one recovers after a while.

@sid-6581
Copy link
Contributor

sid-6581 commented Feb 7, 2024

@fredizzimo, the freezes I've observed happen a lot, and I haven't seen it recover. From my recollection it also happens without locking the computer or anything like that. If I'm not careful to kill the whole process tree, I get a lot of orphan nvim processes left behind.

I just hope it's easily reproducible in a debug build, and not just a race condition that mostly happens in the production build.

@fredizzimo
Copy link
Member

I wish I could help you with that, but all three of my Windows (11) machines are completely stable. One of them with Neovide uptimes of weeks, it's probably one week even now, but that's on the D3D branch.

@sid-6581
Copy link
Contributor

sid-6581 commented Feb 7, 2024

Oh no worries at all, I like debugging. There's probably just some system differences causing race conditions to trigger on my machines. As long as it's easily reproducible, I can track it down.

@sid-6581
Copy link
Contributor

sid-6581 commented Feb 7, 2024

OK, I tried reverting my fix on this PR and got neovide to hang on the first attempt, just like before. The nvim process is gone, and neovide still doesn't detect it as gone.

@KillTheMule have you ever seen this problem?

(Also, totally unrelated observation, but neovide no longer compiles on the latest nightly. Something has a dependency on ahash 0.7.7, which is where the problem is.)

@KillTheMule
Copy link
Contributor Author

@KillTheMule have you ever seen this problem?

I couldn't really follow along, discussions are quite long... can you spell that out for me? Or link to a specific comment that tells me what you're talking about?

@sid-6581
Copy link
Contributor

sid-6581 commented Feb 7, 2024

@KillTheMule have you ever seen this problem?

I couldn't really follow along, discussions are quite long... can you spell that out for me? Or link to a specific comment that tells me what you're talking about?

Apologies, I wasn't sure if you had followed the previous discussions. Here is some context:

#2130

TLDR; sometimes when the nvim process exits (cleanly, by all appearances), the stdin/stdout pipes aren't broken, and nvim-rs doesn't break out of its I/O loops. This causes the runtime to never shut down, because it's waiting for the nvim-rs task.

@fredizzimo
Copy link
Member

@sid-6581, here's a fix for the nightly toolchain:

@sid-6581
Copy link
Contributor

sid-6581 commented Feb 7, 2024

@sid-6581, here's a fix for the nightly toolchain:

Great, that was quick! Thanks!

@KillTheMule
Copy link
Contributor Author

KillTheMule commented Feb 7, 2024

TLDR; sometimes when the nvim process exits (cleanly, by all appearances), the stdin/stdout pipes aren't broken, and nvim-rs doesn't break out of its I/O loops. This causes the runtime to never shut down, because it's waiting for the nvim-rs task.

Ok, I have not seen this before. It's windows only, and you're the only one to reproduce it? I'll try to see if I can reproduce it... and then I need a windows dev environment... might take some time, but I'll look.

I do agree though that the loop @fredizzimo points to looks hellishly suspicious. As far as I can tell though that amounts to busy waiting, i.e. it should put some load on a core. Can you confirm this? Should also fill up the log with debug messages...

(e) I guess I can't even read my own loop, for now I think it's fine. Gonna look into the hang.

@fredizzimo
Copy link
Member

fredizzimo commented Feb 7, 2024

@KillTheMule,
This is what I'm most suspicious of currently. I don't think the fix here is correct:

Since @Zhniing encountered the exact same issue here #2299, with the hashes matching the 1.75.0 toolchain version.

IMO they should use an event object there.

Although the hang is different from the crash, it could be explained by the same thing, but in this case WaitForSingleObject never returning.

The last line in this image posted by @sid-6581, also seem to indicate that it's blocking exactly there (although cropped off, so it's impossible to see, but ArcFile gives a hint.
image

@sid-6581
Copy link
Contributor

sid-6581 commented Feb 8, 2024

TLDR; sometimes when the nvim process exits (cleanly, by all appearances), the stdin/stdout pipes aren't broken, and nvim-rs doesn't break out of its I/O loops. This causes the runtime to never shut down, because it's waiting for the nvim-rs task.

Ok, I have not seen this before. It's windows only, and you're the only one to reproduce it? I'll try to see if I can reproduce it... and then I need a windows dev environment... might take some time, but I'll look.

I do agree though that the loop @fredizzimo points to looks hellishly suspicious. As far as I can tell though that amounts to busy waiting, i.e. it should put some load on a core. Can you confirm this? Should also fill up the log with debug messages...

(e) I guess I can't even read my own loop, for now I think it's fine. Gonna look into the hang.

There have been a lot of reports of hangs on exit, so unfortunately, it's not just me. I can just assume other's hangs are caused by the same issue, but I can't say for sure without anyone firing up a debugger and checking. When it hangs there's no busy waiting, it's stuck in a Win32 wait function. No debug messages in the log, which makes sense, because it's just stuck in OS land.

@fredizzimo
Copy link
Member

@sid-6581

Is it stuck in this function if you are on nightly?
https://github.com/rust-lang/rust/blob/8ace7ea1f7cbba7b4f031e66c54ca237a0d65de6/library/std/src/sys/pal/windows/handle.rs#L250

Or here in 1.75
https://github.com/rust-lang/rust/blob/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys/windows/handle.rs#L250

That code should probably create an event pass that to NtReadFile, and wait for the event, not wait for the file handle directly.

There are some special situations where waiting for the handle is allowed, but I'm not sure they hold here, especially since we are dealing with pipes and not filea https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntreadfile

If the caller opened the file with the SYNCHRONIZE flag set in DesiredAccess, the calling thread can synchronize to the completion of the read operation by waiting on the file handle, FileHandle. The handle is signaled each time that an I/O operation that was issued on the handle completes. However, the caller must not wait on a handle that was opened for synchronous file access (FILE_SYNCHRONOUS_IO_NONALERT or FILE_SYNCHRONOUS_IO_ALERT). In this case, NtReadFile waits on behalf of the caller and does not return until the read operation is complete. The caller can safely wait on the file handle only if all three of the following conditions are met:
The handle was opened for asynchronous access (that is, neither FILE_SYNCHRONOUS_IO_XXX flag was specified).
The handle is being used for only one I/O operation at a time.
NtReadFile returned STATUS_PENDING.

@KillTheMule
Copy link
Contributor Author

@KillTheMule have you ever seen this problem?

I couldn't really follow along, discussions are quite long... can you spell that out for me? Or link to a specific comment that tells me what you're talking about?

Apologies, I wasn't sure if you had followed the previous discussions. Here is some context:

#2130

TLDR; sometimes when the nvim process exits (cleanly, by all appearances), the stdin/stdout pipes aren't broken, and nvim-rs doesn't break out of its I/O loops. This causes the runtime to never shut down, because it's waiting for the nvim-rs task.

I could not reproduce that. I'm starting neovide from powershell, when the window opens I hit 'a' (having nnoremap a :qa<Enter> in $MYVIMRC), and everything seems fine. Is that how I need to do it? How many tries should this roughly need to reproduce?

@sid-6581
Copy link
Contributor

@KillTheMule have you ever seen this problem?

I couldn't really follow along, discussions are quite long... can you spell that out for me? Or link to a specific comment that tells me what you're talking about?

Apologies, I wasn't sure if you had followed the previous discussions. Here is some context:
#2130
TLDR; sometimes when the nvim process exits (cleanly, by all appearances), the stdin/stdout pipes aren't broken, and nvim-rs doesn't break out of its I/O loops. This causes the runtime to never shut down, because it's waiting for the nvim-rs task.

I could not reproduce that. I'm starting neovide from powershell, when the window opens I hit 'a' (having nnoremap a :qa<Enter> in $MYVIMRC), and everything seems fine. Is that how I need to do it? How many tries should this roughly need to reproduce?

That's how I get it to hang easily as well. I start it up and immediately hit my shortcut that does :qa.

Are you running the latest commit, which includes my workaround for the hang? If so, try changing line 171 in src/bridge/mod.rs from self.runtime.take().unwrap().shutdown_background() to drop(self.runtime.take().unwrap()). I get it to hang repeatedly by quickly triggering a :qa after startup.

@KillTheMule
Copy link
Contributor Author

No I'm running 0.12.2 from the releases page. I also tried via neovide.exe -- -c ":qa", but no hang.

@sid-6581
Copy link
Contributor

No I'm running 0.12.2 from the releases page. I also tried via neovide.exe -- -c ":qa", but no hang.

0.12.2 has my workaround for the hang, so you won't be able to reproduce it with that version.

@KillTheMule
Copy link
Contributor Author

Oh, ok... so I'm gonna setup a rust dev environment on windows.

@sid-6581
Copy link
Contributor

Oh, ok... so I'm gonna setup a rust dev environment on windows.

You can also try 0.12.1, which was before the workaround.

@KillTheMule
Copy link
Contributor Author

Thanks for that hint, something on MS's end seems to be wrong, so I can't install right now...

That being said, I also could not reproduce with 0.12.1. Darn...

@Zhniing
Copy link

Zhniing commented Feb 11, 2024

@Zhniing, could you test this?
If I'm not mistaken (I haven't looked at the changes in detail) the file handles should no longer be asynchrounous, and run into the potential rust standard library bug I found here #2299 (comment). My analysis there is slightly wrong, thoughNtReadFile allows WaitForSingleObject in certain cases, which I think should have been fulfilled, but I see no other possible reason for your crash,

@fredizzimo, I built from this commit with Rust 1.76.0, and the Neovide crashed as before unfortunately. The call stack is the same as #2299 (comment)

@fredizzimo
Copy link
Member

Thank you, I will merge this now to get more testing before we make a new release.

It seems like it did not fix any of the hangs and crashes that we had, so I was a bit optimistic, but nevertheless removing the linebuffering seems like a good change.

Let's continue discussing the hang issues in:

And the Windows 10 crash in

I still suspect they are the same issue, but let's keep them separate until proven otherwise.

@fredizzimo fredizzimo merged commit 9fa8d93 into neovide:main Feb 11, 2024
2 checks passed
@fredizzimo fredizzimo mentioned this pull request Apr 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants