Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash with AMD GPU #27

Closed
xenrox opened this issue Jan 16, 2022 · 24 comments
Closed

Crash with AMD GPU #27

xenrox opened this issue Jan 16, 2022 · 24 comments

Comments

@xenrox
Copy link

xenrox commented Jan 16, 2022

Hey,

I wanted to try wluma with ddcutil support on my desktop but it crashes immediately (I tried both amdvlk and vulkan-radeon since it seems to be a vulkan issue).
On my notebook with an integrated Intel GPU it runs fine, unless I connect my external display, then it fails here:

|_| panic!("Did not receive initial brightness value in time"),

ddcutil by itself works completely fine as the same user that runs wluma:

ddcutil --mfg GSM getvcp 10
VCP code 0x10 (Brightness                    ): current value =    85, max value =   100

I appended my config and the backtrace.

config
[frame]
processor = "vulkan"
capturer = "wlroots"

[als.time]
thresholds = { 0 = "night", 7 = "dark", 9 = "dim", 11 = "normal", 13 = "bright", 16 = "normal", 18 = "dark", 20 = "night" }

[[output.ddcutil]]
name = "LG HDR 4K"
backtrace

Continue adjusting brightness and wluma will learn your preference over time.
thread 'predictor-LG HDR 4K' panicked at 'Unable to compute luma percent: ERROR_MEMORY_MAP_FAILED', src/frame/capturer/wlroots.rs:124:26
stack backtrace:
0: rust_begin_unwind
at /rustc/1.58.0/library/std/src/panicking.rs:498:5
1: core::panicking::panic_fmt
at /rustc/1.58.0/library/core/src/panicking.rs:107:14
2: core::result::unwrap_failed
at /rustc/1.58.0/library/core/src/result.rs:1613:5
3: core::result::Result<T,E>::expect
at /rustc/1.58.0/library/core/src/result.rs:1255:23
4: wluma::frame::capturer::wlroots::Capturer::capture_frame::{{closure}}
at ./src/frame/capturer/wlroots.rs:121:32
5: wayland_client::proxy::Main::quick_assign::{{closure}}
at /home/xenrox/.local/state/cargo/registry/src/github.com-1ecc6299db9ec823/wayland-client-0.29.4/src/proxy.rs:273:64
6: wayland_commons::filter::Filter::send
at /home/xenrox/.local/state/cargo/registry/src/github.com-1ecc6299db9ec823/wayland-commons-0.29.4/src/filter.rs:100:13
7: wayland_client::imp::make_dispatcher::{{closure}}
at /home/xenrox/.local/state/cargo/registry/src/github.com-1ecc6299db9ec823/wayland-client-0.29.4/src/rust_imp/mod.rs:194:49
8: <wayland_client::imp::ImplDispatcher<I,F> as wayland_client::imp::Dispatcher>::dispatch
at /home/xenrox/.local/state/cargo/registry/src/github.com-1ecc6299db9ec823/wayland-client-0.29.4/src/rust_imp/mod.rs:179:9
9: wayland_client::imp::queues::EventQueueInner::dispatch_buffer
at /home/xenrox/.local/state/cargo/registry/src/github.com-1ecc6299db9ec823/wayland-client-0.29.4/src/rust_imp/queues.rs:163:23
10: wayland_client::imp::queues::EventQueueInner::dispatch_pending
at /home/xenrox/.local/state/cargo/registry/src/github.com-1ecc6299db9ec823/wayland-client-0.29.4/src/rust_imp/queues.rs:198:31
11: wayland_client::imp::queues::EventQueueInner::dispatch
at /home/xenrox/.local/state/cargo/registry/src/github.com-1ecc6299db9ec823/wayland-client-0.29.4/src/rust_imp/queues.rs:102:28
12: wayland_client::event_queue::EventQueue::dispatch
at /home/xenrox/.local/state/cargo/registry/src/github.com-1ecc6299db9ec823/wayland-client-0.29.4/src/event_queue.rs:152:9
13: <wluma::frame::capturer::wlroots::Capturer as wluma::frame::capturer::Capturer>::run
at ./src/frame/capturer/wlroots.rs:60:13
14: wluma::main::{{closure}}::{{closure}}
at ./src/main.rs:99:21
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.

@cyrinux
Copy link
Sponsor Contributor

cyrinux commented Jan 16, 2022

Hi,
Thanks for your bug report.
Can you attach the output of your ddcutil detect and try to use the serial number in the "name" field please?

@xenrox
Copy link
Author

xenrox commented Jan 16, 2022

Sorry, seems like my display has no serial number 😆

Display 1
I2C bus: /dev/i2c-10
EDID synopsis:
Mfg id: GSM
Model: LG HDR 4K
Product code: 30471
Serial number:
Binary serial number: 544286 (0x00084e1e)
Manufacture year: 2021, Week: 2
VCP version: 2.1

And using some other identifier like GSM results in Unable to find display.

@cyrinux
Copy link
Sponsor Contributor

cyrinux commented Jan 16, 2022

Ok, LG HDR 4K is a good name value. What happen if you change capturer to none please?
Do you confirm also you run a wayland session (and not X11) ?

@xenrox
Copy link
Author

xenrox commented Jan 16, 2022

Yes, I use sway.
If I change capturer to none, it can no longer identify the correct display:

[2022-01-16T19:05:21Z DEBUG wluma] Using config: Config { frame: Frame { capturer: None, processor: Vulkan }, als: Time { thresholds_string_key: {}, thresholds: {11: "normal", 18: "dark", 13: "bright", 9: "dim", 7: "dark", 0: "night", 16: "normal", 20: "night"} }, output_by_type: OutputByType { backlight: [], ddcutil: [] }, keyboard: None, output: [DdcUtil(DdcUtilOutput { name: "LG HDR 4K" })] }
Continue adjusting brightness and wluma will learn your preference over time.
[2022-01-16T19:05:21Z WARN  wluma] Skipping output 'LG HDR 4K' as it might be disconnected: Unable to find display
thread 'predictor-LG HDR 4K' panicked at 'Did not receive initial brightness value in time', src/predictor/controller.rs:69:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@cyrinux
Copy link
Sponsor Contributor

cyrinux commented Jan 16, 2022

Ok thank you, we will back too you asap 👍🏻

@cyrinux
Copy link
Sponsor Contributor

cyrinux commented Jan 16, 2022

To be sure, can you rollback your config to the one attach in the beginning of the issue and give us the few first lines of RUST_LOG=debug wluma ?

@maximbaz
Copy link
Owner

Thanks for reporting @xenrox, and thanks to both of you for helping investigate the problem!

There seems to be two issues at hand, one about matching to a proper ddcutil when capturer is none (capturer should not really affect this, so the cause might be actually somewhere else), another about vulkan crash.

We'll try to get to the bottom of both issues, starting from us providing a dev build with extra debugging info, which should help us narrow down the problem. We'll ping you soon 🙂


Vulkan issue can be tough to fix without having a way to locally reproduce, so just wanted to ask if you would in theory feel adventurous and willing to try helping us fix it? 🙂

I'm almost sure that the issue is due to the fact that I'm using Vulkan API incorrectly, I must have hardcoded some assumptions, and it happens to work on all of our laptops simply because integrated GPU shares memory with CPU and/or RAM, so even "improper" params still allow code to work, while on your desktop you have a real independent GPU, which complains on allocating things that were not supposed to be allocated. Will do my best to help guiding you if you want to help with this, although I'm clearly no Vulkan expert myself, I just hack it until it works 😁

The only temporary alternative would be using capturer "none" on your desktop until this is properly solved, where wluma would simply use ALS (hour in your case), but not the actual screen contents.

@xenrox
Copy link
Author

xenrox commented Jan 16, 2022

To be sure, can you rollback your config to the one attach in the beginning of the issue and give us the few first lines of RUST_LOG=debug wluma ?

[2022-01-16T20:50:42Z DEBUG wluma] Using config: Config { frame: Frame { capturer: Wlroots, processor: Vulkan }, als: Time { thresholds_string_key: {}, thresholds: {9: "dim", 13: "bright", 7: "dark", 0: "night", 20: "night", 11: "normal", 16: "normal", 18: "dark"} }, output_by_type: OutputByType { backlight: [], ddcutil: [] }, keyboard: None, output: [DdcUtil(DdcUtilOutput { name: "LG HDR 4K" })] }
Continue adjusting brightness and wluma will learn your preference over time.
[2022-01-16T20:50:42Z DEBUG wluma::frame::capturer::wlroots] Using output 'Goldstar Company Ltd LG HDR 4K 0x00004E1E (DP-3)' for config 'LG HDR 4K'
thread 'predictor-LG HDR 4K' panicked at 'Unable to compute luma percent: ERROR_MEMORY_MAP_FAILED', src/frame/capturer/wlroots.rs:124:26
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@xenrox
Copy link
Author

xenrox commented Jan 16, 2022

@maximbaz
Sure I would love to help with debugging the vulkan crash.

I have applied #28 and here is the new output if that helps:

config
env RUST_LOG=debug ./target/debug/wluma
[2022-01-16T20:55:29Z DEBUG wluma] Using config: Config { frame: Frame { capturer: None, processor: Vulkan }, als: Time { thresholds_string_key: {}, thresholds: {9: "dim", 0: "night", 16: "normal", 7: "dark", 11: "normal", 18: "dark", 13: "bright", 20: "night"} }, output_by_type: OutputByType { backlight: [], ddcutil: [] }, keyboard: None, output: [DdcUtil(DdcUtilOutput { name: "LG HDR 4K" })] }
Continue adjusting brightness and wluma will learn your preference over time.
[2022-01-16T20:55:30Z DEBUG wluma::brightness::ddcutil] display found: DisplayInfo { backend: I2cDevice, id: "22794", manufacturer_id: Some("GSM"), model_id: Some(30471), version: Some((1, 4)), serial: Some(544286), manufacture_year: Some(31), manufacture_week: Some(2), model_name: Some("LG HDR 4K"), serial_number: None, edid_data: Some([0, 255, 255, 255, 255, 255, 255, 0, 30, 109, 7, 119, 30, 78, 8, 0, 2, 31, 1, 4, 181, 60, 34, 120, 158, 62, 49, 174, 80, 71, 172, 39, 12, 80, 84, 33, 8, 0, 113, 64, 129, 128, 129, 192, 169, 192, 209, 192, 129, 0, 1, 1, 1, 1, 77, 208, 0, 160, 240, 112, 62, 128, 48, 32, 101, 12, 88, 84, 33, 0, 0, 26, 40, 104, 0, 160, 240, 112, 62, 128, 8, 144, 101, 12, 88, 84, 33, 0, 0, 26, 0, 0, 0, 253, 0, 56, 61, 30, 135, 56, 0, 10, 32, 32, 32, 32, 32, 32, 0, 0, 0, 252, 0, 76, 71, 32, 72, 68, 82, 32, 52, 75, 10, 32, 32, 32, 1, 247, 2, 3, 28, 113, 68, 144, 4, 3, 1, 35, 9, 7, 7, 131, 1, 0, 0, 227, 5, 192, 0, 230, 6, 5, 1, 82, 72, 93, 2, 58, 128, 24, 113, 56, 45, 64, 88, 44, 69, 0, 88, 84, 33, 0, 0, 30, 86, 94, 0, 160, 160, 160, 41, 80, 48, 32, 53, 0, 88, 84, 33, 0, 0, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 44]), mccs_version: None, mccs_database: Database { entries: {} } }
[2022-01-16T20:55:30Z DEBUG wluma::brightness::ddcutil] display found: DisplayInfo { backend: I2cDevice, id: "22791", manufacturer_id: Some("BNQ"), model_id: Some(32561), version: Some((1, 3)), serial: Some(21573), manufacture_year: Some(26), manufacture_week: Some(8), model_name: Some("BenQ XL2411Z"), serial_number: Some("D2G05085SL0"), edid_data: Some([0, 255, 255, 255, 255, 255, 255, 0, 9, 209, 49, 127, 69, 84, 0, 0, 8, 26, 1, 3, 128, 53, 30, 120, 46, 157, 225, 166, 84, 84, 159, 38, 13, 80, 84, 165, 107, 128, 209, 192, 69, 124, 97, 124, 129, 128, 129, 188, 149, 60, 49, 124, 1, 1, 2, 58, 128, 24, 113, 56, 45, 64, 88, 44, 69, 0, 19, 42, 33, 0, 0, 30, 0, 0, 0, 255, 0, 68, 50, 71, 48, 53, 48, 56, 53, 83, 76, 48, 10, 32, 0, 0, 0, 253, 0, 56, 144, 30, 160, 33, 0, 10, 32, 32, 32, 32, 32, 32, 0, 0, 0, 252, 0, 66, 101, 110, 81, 32, 88, 76, 50, 52, 49, 49, 90, 10, 1, 222, 2, 1, 4, 0, 254, 91, 128, 160, 112, 56, 53, 64, 48, 32, 53, 0, 19, 42, 33, 0, 0, 26, 134, 111, 128, 160, 112, 56, 64, 64, 48, 32, 53, 0, 19, 42, 33, 0, 0, 26, 252, 126, 128, 136, 112, 56, 18, 64, 24, 32, 53, 0, 19, 42, 33, 0, 0, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 199]), mccs_version: None, mccs_database: Database { entries: {} } }
[2022-01-16T20:55:30Z WARN  wluma] Skipping output 'LG HDR 4K' as it might be disconnected: Unable to find display
thread 'predictor-LG HDR 4K' panicked at 'Did not receive initial brightness value in time', src/predictor/controller.rs:69:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Edit: This is with capturer none.

@xenrox
Copy link
Author

xenrox commented Jan 16, 2022

Made some progress:
One problem is, that my display has no serial. Because of that serial is not Some but None and the iteration does not even reach the display I actually want to select here:

.and_then(|merged| merged.contains(name).then(|| ()))

After fixing that (substituted serial_number with model_id) it can actually match my display (no more warning) but still fails at getting the intial brightness

env RUST_LOG=debug ./target/debug/wluma
[2022-01-16T23:07:41Z DEBUG wluma] Using config: Config { frame: Frame { capturer: None, processor: Vulkan }, als: Time { thresholds_string_key: {}, thresholds: {16: "normal", 13: "bright", 7: "dark", 0: "night", 18: "dark", 20: "night", 9: "dim", 11: "normal"} }, output_by_type: OutputByType { backlight: [], ddcutil: [] }, keyboard: None, output: [DdcUtil(DdcUtilOutput { name: "LG HDR 4K" })] }
Continue adjusting brightness and wluma will learn your preference over time.
thread 'predictor-LG HDR 4K' panicked at 'Did not receive initial brightness value in time', src/predictor/controller.rs:69:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrac

@xenrox
Copy link
Author

xenrox commented Jan 16, 2022

Which in turn gets fixed by increasing INITIAL_TIMEOUT_SECS (I set it to 20), so I guess my display is just slow?

[2022-01-16T23:09:43Z DEBUG wluma] Using config: Config { frame: Frame { capturer: None, processor: Vulkan }, als: Time { thresholds_string_key: {}, thresholds: {9: "dim", 7: "dark", 13: "bright", 18: "dark", 20: "night", 0: "night", 11: "normal", 16: "normal"} }, output_by_type: OutputByType { backlight: [], ddcutil: [] }, keyboard: None, output: [DdcUtil(DdcUtilOutput { name: "LG HDR 4K" })] }
Continue adjusting brightness and wluma will learn your preference over time.
[2022-01-16T23:10:03Z DEBUG wluma::predictor::controller] Learning Entry { lux: "night", luma: 0, brightness: 85 }

@maximbaz
Copy link
Owner

Very interesting, thanks for sharing! Will make the necessary adjustment tomorrow for the serial number, you are right we indeed didn't anticipate that it might be None in a valid scenario.

And as for timeout, interesting..... Could also be (at least partially) a debug build to blame, but still, never happened to us so far... Out of curiosity you can try to experiment by adding some logging in ddcutil.rs in get() method to see how long it executes - just to get an idea if your guess was correct.

@xenrox
Copy link
Author

xenrox commented Jan 16, 2022

Yes on a non debug build it works with default settings, on a debug build it takes slightly less than 4 seconds.
Now I have run into a new error, but thats for tomorrow to investigate.

[2022-01-16T23:39:49Z ERROR wluma::brightness::controller] Unable to get brightness value: DDC/CI I2C error: Remote I/O error (os error 121)
[2022-01-16T23:39:49Z ERROR wluma::brightness::controller] Unable to get brightness value: DDC/CI error: Expected DDC/CI length bit

@maximbaz
Copy link
Owner

With 4044541 the missing serial number should be fixed, together with having more debug logging to identify why displays are not being matched. Untested by hopefully will work 😛

DDC/I2C errors like the above is something @cyrinux experienced as well (I dont use external monitors so we've been testing on his setup), we concluded that it's just transient errors that we can't do anything about - although for him the errors only happened if he was messing with screen using ddcutil cli in parallel. So for you the errors occur even just by running wluma and nothing else?

To clarify, as long as the errors are transient, the app should just work, we log and ignore those errors.


Let me summarize to see if I remember it all correctly, the state with the latest main code:

  • On desktop with external GPU, only capturer=none now works
  • On laptop, capturer=wlroots now works with both only internal screen and with external display plugged in.

Everything correct?

@xenrox
Copy link
Author

xenrox commented Jan 17, 2022

With 4044541 the missing serial number should be fixed, together with having more debug logging to identify why displays are not being matched. Untested by hopefully will work 😛

Yeah matching works fine now, thanks a lot!

DDC/I2C errors like the above is something @cyrinux experienced as well (I dont use external monitors so we've been testing on his setup), we concluded that it's just transient errors that we can't do anything about - although for him the errors only happened if he was messing with screen using ddcutil cli in parallel. So for you the errors occur even just by running wluma and nothing else?

I was using ddcutil in parallel to train wluma. Thanks for the explanation.

To clarify, as long as the errors are transient, the app should just work, we log and ignore those errors.

Let me summarize to see if I remember it all correctly, the state with the latest main code:

* On desktop with external GPU, only `capturer=none` now works

* On laptop, `capturer=wlroots` now works with both only internal screen and with external display plugged in.

Everything correct?

Correct! (Just had to slightly increase INITIAL_TIMEOUT_SECS again)

@maximbaz
Copy link
Owner

Can you tell me by how much are you increasing INITIAL_TIMEOUT_SECS to make it work? Would 5 be enough? I wouldn't want it to be too long, like waiting for 10 seconds only to learn that there is some bug or misconfiguration would seem excessive.... But of course I also dont want to have such a small value that it causes random failures like you experience!

@xenrox
Copy link
Author

xenrox commented Jan 17, 2022

Yes 5 is enough, it only failed one or two times with 4.

maximbaz added a commit that referenced this issue Jan 17, 2022
@maximbaz
Copy link
Owner

I was using ddcutil in parallel to train wluma.

As long a you can use it to train wluma, yes just disregard the errors - maybe they should have been warnings really, but I wanted to see if someone experiences those issues in other scenarios.

We want to add CLI to wluma, so that it will be possible to train it natively, without any potential conflicts between different tools who all want exclusive access to the same resource. But until that happens, if ddcutil works for the purpose, just use it 🙂

I increased the timeout, of course just let me know if you still manage to see issues with 5 and we will increase more, after all this constant is not really based on anything else but the feedback like yours.


It seems there is just one outstanding issue left, proper support for independent GPU for capturer=wlroots. There are a few small things @cyrinux and I want to deal with first (that are already in progress), and then we'll get to this.

If you want to make a head start, here's what you can do:

  • install Vulkan validation layers (are you on Arch? extra/vulkan-validation-layers)
  • export VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation
  • cargo run

This is an extremely helpful tool, it will log all wrongdoings related to Vulkan. On my machine it produces no issues, but I'm willing to bet that on your desktop we will get some nice errors to fix.

@xenrox
Copy link
Author

xenrox commented Jan 17, 2022

If you want to make a head start, here's what you can do:

* install Vulkan validation layers (are you on Arch? `extra/vulkan-validation-layers`)

* `export VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation`

* `cargo run`

This is an extremely helpful tool, it will log all wrongdoings related to Vulkan. On my machine it produces no issues, but I'm willing to bet that on your desktop we will get some nice errors to fix.

Yes, I am on Arch.

VUID-vkMapMemory-memory-00682(ERROR / SPEC): msgNum: -330527817 - Validation Error: [ VUID-vkMapMemory-memory-00682 ] Object 0: handle = 0xfa21a40000000003, type = VK_OBJECT_TYPE_DEVICE_MEMORY; | MessageID = 0xec4c8bb7 | Mapping Memory without VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT set: VkDeviceMemory 0xfa21a40000000003[]. The Vulkan spec states: memory must have been created with a memory type that reports VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-vkMapMemory-memory-00682)
    Objects: 1
        [0] 0xfa21a40000000003, type: 8, name: NULL
thread 'predictor-LG HDR 4K' panicked at 'Unable to compute luma percent: ERROR_MEMORY_MAP_FAILED', src/frame/capturer/wlroots.rs:124:26

@maximbaz
Copy link
Owner

Perfect, at least we will not be walking in the dark, but we have precise description of what to fix, with helpful links 😁

Will share some hints in case you want to try to play around, if not - no big deal, we will get to this eventually 😉

It crashes on mapping the memory here:

let buffer_pointer = self.device.map_memory(

It tried to map the memory to the following allocated buffer:

let buffer_memory = unsafe { device.allocate_memory(&allocate_info, None)? };

Which was allocated in some memory that fits these requirements:

let allocate_info = vk::MemoryAllocateInfo {

Specifically, I'm nearly positive that the bug is here:

memory_type_index: 0,

Instead of picking the index of the right memory, we just pass 0, which on our laptop is the only available memory, so the choice is always right 😁

The fix could be as simple as trying to hardcode there 1 and seeing if it works, but I dont know if indices are actually incremental - but for sure there is an API call where we can get the right index for the memory that reports VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT (as per the error message), and then pass the right index there.

If you do look into it, please keep us posted how it goes!

@cyrinux
Copy link
Sponsor Contributor

cyrinux commented Jan 18, 2022

Guys, I reproduce the same bug and can confirm that putting 1 doesnt crash.

@maximbaz
Copy link
Owner

@cyrinux has fixed the last issue, we'll release v4 soon, but if you can confirm @xenrox it would be appreciated 😉

@xenrox
Copy link
Author

xenrox commented Jan 18, 2022

Yes it seems like I am all out of bugs.
Thanks a lot for your help @maximbaz , @cyrinux!

@maximbaz
Copy link
Owner

Thanks to you too 🙂
Will do some sanity checks and release v4 tomorrow, FYI the format of config.toml is slightly changed now 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants