
Stuck on "Generating" #51

Open
mattbisme opened this issue Apr 15, 2022 · 23 comments

@mattbisme

This seems to happen almost every time I run a batch (although I may not have tried enough single conversions to know whether it happens there too). I haven't done nearly as much with photos; my input is primarily H.264 or HEVC. I've mostly been using the Real- models, but I remember it happening on PAN as well.

Output file sizes are usually less than 4GB. I always run with Parallel Processing using ANE + GPU. There is no consistency in which video gets stuck; it could be the first two in the batch, or somewhere in the middle. But one thing has been very consistent: there are always two videos stuck on "Generating." There's never just one video that didn't make it.

Some other information that might be useful: clicking the "Stop" button does not stop it from generating. I have to quit waifu2x in order to start another batch. Additionally, after quitting the app, the files that were being generated appear to have the expected file size for a completed render.

It seems like something may be getting stuck while finalizing the container, rather than in encoding the video itself? Although that doesn't explain why two always get stuck.

No error, but the next two jobs seem to be stuck with "Generating" as their status. I'm not sure how long it's been like this, but it doesn't look like it's going to change any time soon. What part of the encode does this status correspond to? How long should it take?

Originally posted by @mattbisme in #48 (comment)

@imxieyi (Owner) commented Apr 16, 2022

Does this issue reproduce if you disable parallel processing?

@mattbisme (Author)

I did another batch last night (eight videos in total) with parallel processing disabled, and it got stuck on the third video.

@imxieyi (Owner) commented Apr 17, 2022

The latest version (6.2.2) contains a potential fix for this issue. Feel free to update.

It's basically the same as the previous issue: I have no idea about the cause, since it does not reproduce on my device with the exact same setup. I just guessed at a possible reason and implemented something in the hope that it helps.

@mattbisme (Author)

I ran some more singles, this time on a ~90-second video. Out of maybe five or six encodes, one still got stuck on "Generating."

@imxieyi (Owner) commented Apr 22, 2022

I tried six 90-second videos using PAN to upscale 1080p -> 2160p. All of them completed successfully and took 2000 seconds. I tried again and it was still successful.

Now I strongly suspect this is an issue that requires a specific setup to reproduce (I haven't received any other reports of it), which will make it a huge pain to investigate. It probably makes sense to add a watchdog mechanism that detects such a hang and crashes the app intentionally, so that the app's state can be captured and reported.

I'd suggest you try the CLI version, which does not rely on the system's built-in encoder. It should also significantly speed up your workflow.

@sclsj commented May 6, 2022

I also experienced this issue on certain videos, and I'm able to reproduce the stuck-on-"Generating" problem reliably. Let me upgrade to the latest version and test again. Are you able to download and test a 1GB test file? I think that if I truncate it, the problem no longer reproduces reliably.

@imxieyi (Owner) commented May 6, 2022

Sure. If you don't feel like sharing the video publicly you can send a feedback email from the About section in the app.

@chase-cobb

Not sure what is special about my setup, but I get stuck at "Generating" almost 100% of the time when using the GUI. The models I have been trying are Real-ESRGAN and Real-CUGAN, targeting 2x scaling.

Using the CLI, I have never seen any lockup, and upscaled videos are processed correctly. My only issues are with the GUI.

The system is a Mac Studio running Monterey 12.3, with an M1 Max and 32GB of memory.

If any useful logs are generated, I am happy to include them here, because the GUI path fails very reliably for me.

@imxieyi (Owner) commented May 7, 2022

@chase-cobb Would you mind sharing a sample video that can reproduce the issue reliably? You can use the in-app feedback email to send the file if you don't want to share publicly.

BTW, it's preferable to use the CLI to process videos because it's far more flexible and efficient in terms of compression. You can even make full use of hardware encoding (via h264_videotoolbox and hevc_videotoolbox) if you still want it.
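
For illustration, a hardware-encode step with FFmpeg's VideoToolbox encoders might look like this (a sketch only: the frame pattern, framerate, and bitrate are placeholder assumptions, not actual waifu2x CLI flags):

```bash
# Encode upscaled frames with the macOS hardware HEVC encoder via FFmpeg.
# "frames/%06d.png" and the rate/bitrate values are hypothetical; substitute
# whatever your waifu2x CLI workflow actually produces.
ffmpeg -framerate 24000/1001 -i frames/%06d.png \
       -c:v hevc_videotoolbox -b:v 12M -tag:v hvc1 \
       -pix_fmt yuv420p upscaled.mp4
```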

@sclsj commented May 7, 2022

I think my MacBook Pro may have gone to sleep during the process, so I'm not sure. I'm trying it again now (any way to speed it up?), so while I wait, here are the videos: (1) (2). [Link sent through in-app feedback email.]
So far, it seems like only 1080p videos have this problem. I can only recall using PAN.
What do you mean by "use CLI to process videos because it's far more flexible and efficient in terms of compression"? Does waifu2x use software encoding by default?
And is it possible to add a .ts format option to the GUI, so that even if a video gets stuck, the majority of the content can still be watched/used?
I'm using an M1 Max 32-core 64GB 16" MBP on Monterey 12.3.1 (newest).

@imxieyi (Owner) commented May 7, 2022

What do you mean by "use CLI to process videos because it's far more flexible and efficient in terms of compression"?

Using the CLI you can bypass the system encoder that this app uses. Additionally, you can fine-tune compression parameters, 10-bit color, HDR, etc. A few useful documents:

Does waifu2x use software encoding by default?

No. It only supports hardware encoding provided by the OS.

And is it possible to add a ts format option for the GUI so that even if the video is stuck the majority of the content can still be watched/used?

It's not possible without very significant low-level work (not worth doing). The system encoder does not support .ts as an output format. With the CLI you can produce any format FFmpeg supports; for example, use mpegts for .ts.
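
As a concrete illustration (assuming an FFmpeg-based workflow, and using only standard FFmpeg options), remuxing an already-encoded file to MPEG-TS is a one-liner:

```bash
# Remux to MPEG-TS without re-encoding; a .ts file remains largely
# playable even if writing is interrupted partway through.
ffmpeg -i upscaled.mp4 -c copy -f mpegts upscaled.ts
```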

@imxieyi (Owner) commented May 8, 2022

I have figured out a way to dump process state while the app is running normally. I'd appreciate it if any of you could provide a "crash" report while the app is stuck at "Generating".

The steps are as follows, once the issue reproduces:

  1. Wait about 10 minutes to make sure the app is actually stuck.
  2. Find the PID of the app: open the Activity Monitor app, switch to the CPU tab, and find waifu2x. Then check the PID column for the app. (You may need to right-click the column header to make it appear.)
  3. Open the Terminal app and execute this command to force-crash the app: kill -SIGSEGV {PID}. Remember to replace {PID} with the PID from step 2.
  4. After a few seconds macOS will show the regular "App exited unexpectedly" dialog. Please click Report and copy the full content of the report.

The report shouldn't contain any sensitive information. But if you don't want to share publicly, please attach it to a feedback email sent from the app.
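
For convenience, steps 2 and 3 can also be done entirely from Terminal (assuming only one waifu2x instance is running; pgrep and kill are standard macOS tools):

```bash
# Find the PID and force-crash the stuck app so macOS writes a crash report.
pgrep waifu2x            # prints the PID of the running app
kill -SIGSEGV <PID>      # replace <PID> with the number printed above
```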

@chase-cobb

@imxieyi I am running some batches on different systems today (two M1 Max machines and a regular M1) with the same version of the software and the exact same files. The behavior I'm seeing seems somewhat random, and I would like to verify whether it is the file causing the issue before I dig further down that path. Either way, I expect to have a log to share using the method you mentioned above within the next few hours.

@mattbisme (Author)

"Random" is definitely how I would describe my experiences as well. I haven't been able to mess with it for a while, but as soon as I get the chance, I'll try to generate a crash report.

@sclsj commented May 9, 2022

I have figured out a way to dump process state while the app is running normally. [...]

Is this crash report enough? Do you also want a memory dump and/or a spindump?

Also, a faster version would be killall -SEGV waifu2x directly. No need to find the PID (as long as you are only running one instance of waifu2x on the whole system).

@imxieyi (Owner) commented May 9, 2022

Is this crash report enough?

Maybe. At least it can tell exactly where the app is stuck.

Do you also want memory dump and/or spindump?

If you can provide those as well, even better. Please send them via the feedback email, though, since they may contain your personal data.
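
To capture a spindump from Terminal, something like the following should work (a sketch; the 10-second sample duration is just a reasonable default, and sudo is required):

```bash
# Sample the hung process for 10 seconds and write the spindump to a file.
sudo spindump waifu2x 10 -file ~/Desktop/waifu2x-spindump.txt
```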

@chase-cobb

The tests I ran yesterday pretty much confirmed that this is a somewhat random issue, and my general suspicion is that it may be a race condition that is hit more reliably on faster machines. Yesterday I ended up running the same set of files on one of my M1 Max machines and the regular M1. Neither completed the batch, but the regular M1 made it through the files the M1 Max got stuck on. Exact same files, and all of them had been batch-converted with Handbrake prior to the test.

The regular M1 got stuck while processing, which I've never seen before; I will create a new issue with reports and spindumps for that problem.

The second M1 Max unit ran a different set of files, which had previously been batched via the CLI, and got stuck on the 4th file of 6 total.

This leads me to believe it is not the files themselves. Also, I have repeatedly batch-converted these same files with the waifu2x CLI without any issue while testing my own automation scripts.

I am working on gathering the necessary reports and spindumps to share shortly.

@chase-cobb

Is this a valid email where we can send logs and such? feedback@waifu2x.app

@imxieyi (Owner) commented May 9, 2022

Yes. Please use this email to send the files.

@chase-cobb

Email sent

@imxieyi (Owner) commented May 10, 2022

After analyzing your files, it looks like AVAssetWriter.finishWriting never calls its completionHandler, likely due to an internal deadlock inside AVAssetWriter. Since the writer encodes video asynchronously, I guess adding some delay before calling finishWriting will help. The next update will try this and see if it works.

@sclsj commented May 11, 2022

...Apparently I can't dump process memory without disabling SIP. How can I ensure that a core dump is written when I trigger the crash? From some research, it seems it's not as straightforward as sudo ulimit -c unlimited.
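
For reference, the commonly cited recipe on macOS looks like this (a sketch under assumptions: the app bundle path is hypothetical, hardened-runtime apps may still refuse to dump core, and core files in /cores can be enormous):

```bash
# "sudo ulimit" does nothing useful: ulimit is a shell builtin, so set the
# limit in the shell that will launch the app instead.
sudo sysctl kern.coredump=1          # ensure core dumps are enabled
sudo chmod 1777 /cores               # default target is /cores/core.<pid>
ulimit -c unlimited                  # applies to children of this shell
/Applications/waifu2x.app/Contents/MacOS/waifu2x   # hypothetical path
```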

@imxieyi (Owner) commented May 15, 2022

Thank you all for providing crash logs and spindumps.

The new version, 6.2.4, no longer waits for the AVAssetWriter.finishWriting callback; instead it polls the status with a timeout. It should no longer get stuck at "Generating" forever.

Note that this is not a proper fix, since the cause is still unknown. According to the spindumps you shared, it was not due to a deadlock. Most likely the app will still wait 60 seconds before throwing a "timed out" error and continuing with the next file. Error logs will now be reported automatically, unless you block Crashlytics via something like AdGuard.
