Under-voltage detected! (0x00050005) spams dmesg on new kernel 4.14.30-v7+ #2512

Closed
E3V3A opened this Issue Apr 16, 2018 · 71 comments

@E3V3A

E3V3A commented Apr 16, 2018

After upgrading the kernel to 4.14, dmesg is now spammed by Under-voltage detected! (0x00050005) messages where no problem was shown previously. I've run this device non-stop for months without any problem until after the update, so the under-voltage level settings or some other config must have changed.
Spamming the dmesg or journalctl --system buffer certainly is not helping anyone.

kern  :crit  : [ 1701.464833 <    2.116656>] Under-voltage detected! (0x00050005)
kern  :info  : [ 1707.668180 <    6.203347>] Voltage normalised (0x00000000)

Also related to #2367
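
For readers wondering what the value in parentheses means: it looks like the same bitmask that vcgencmd get_throttled reports. Below is a minimal Python sketch of decoding it. It assumes the bit layout from the Raspberry Pi firmware documentation (bits 0-3 are the current state, bits 16-19 mean "has occurred since boot"); the names FLAGS and decode_throttled are made up for illustration.

```python
# Assumed bit layout of the firmware "throttled" value.
FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(value):
    """Return the names of all flags set in `value`, lowest bit first."""
    return [name for bit, name in sorted(FLAGS.items()) if value & (1 << bit)]


for name in decode_throttled(0x00050005):
    print(name)
```

Under that assumption, 0x00050005 means the board is under-voltage and throttled right now, and both conditions are also latched as having occurred since boot.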

@E3V3A E3V3A changed the title Under-voltage detected! (0x00050005) spams dmesg Under-voltage detected! (0x00050005) spams dmesg on new kernel 4.14.30-v7+ Apr 16, 2018

@pelwell

Contributor

pelwell commented Apr 16, 2018

The kernel under-voltage notification is new, but the threshold and detection mechanism are unchanged. You are now being made aware of the fact that your Pi is insufficiently powered for the load placed upon it. This is bad for performance and potentially harmful to system stability.

@rodizio1

rodizio1 commented Apr 16, 2018

I can tell from lots of testing and measuring with a multimeter, that 99% of the "typical" USB power supplies (phone chargers, gadget chargers, power banks, etc.) either don't supply the advertised amps, or they drop voltage too much. Or both. Even if the power supply itself delivers stable 5V at 2-2.5amps, often the voltage at the end of the cable has dropped too much because of too thin wires.
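
The cable effect described above can be estimated with Ohm's law. This is a rough Python sketch, not a measurement: it assumes plain copper wire at room temperature, uses the standard AWG diameter formula, and ignores connector and contact resistance (which only make things worse). The function names are hypothetical.

```python
import math

COPPER_RESISTIVITY = 1.68e-8  # ohm * metre, roughly at 20 degrees C


def awg_diameter_m(awg):
    """Conductor diameter in metres from the standard AWG formula."""
    return 0.127e-3 * 92 ** ((36 - awg) / 39)


def voltage_at_board(supply_v, current_a, cable_m, awg):
    """Estimate the voltage left at the far end of a USB power cable."""
    area = math.pi * (awg_diameter_m(awg) / 2) ** 2
    ohms_per_m = COPPER_RESISTIVITY / area
    # Current flows out on the 5 V wire and back on ground,
    # so the resistive path is twice the cable length.
    return supply_v - current_a * ohms_per_m * 2 * cable_m


for awg in (28, 24, 20):
    print(f"AWG {awg}: {voltage_at_board(5.0, 2.0, 1.0, awg):.2f} V at the board")
```

With these assumptions, a 1 m cable with thin AWG 28 power wires drawing 2 A ends up well below the 4.63 V threshold mentioned later in this thread even when the supply itself holds exactly 5.0 V, while AWG 20 wires stay comfortably above it.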

@E3V3A

Author

E3V3A commented Apr 16, 2018

I'm well aware of the working of the RPi power supply. But the fact of the matter is that:

  • It didn't happen with the earlier kernel, so what makes you think this is an improvement?
    (The flash icon was good and annoying enough!)
  • Spamming kernel messages (and all other related buffers) with repeated messages on a device which has proven to be working just fine under those conditions before now risks excessive SD card wear and harder debugging, because older and more relevant kernel messages eventually get FIFO'd out.
  • The current crit level doesn't even respect the printk settings and keeps spamming even after setting dmesg -n 1 or using sysctl -w kernel.printk='1 1 1 1'. So AFAICT, this is neither critical nor compliant with standard *nix behavior, and does not provide any improvement whatsoever.
@P33M

Contributor

P33M commented Apr 16, 2018

It didn't happen with the earlier kernel, so what makes you think this is an improvement?
(The flash icon was good and annoying enough!)

It was happening, you just didn't notice that it was. Headless users previously had no automatic notification that the undervoltage had occurred, relying on manually querying the firmware in order to find out that this was the case. Also, KERN_CRIT is "A critical condition occurred like a serious hardware/software failure" and so is the appropriate log level for a condition that is likely to cause system instability.

@JamesH65

Contributor

JamesH65 commented Apr 16, 2018

Note that the Pi is not guaranteed to work correctly when the power supply at the board is less than 4.63V (within +-10%), which is the point at which the icon is displayed and at which the new reporting adds a message to the log. Note that this doesn't mean that YOUR particular Pi will stop working, just that the voltage is low enough that there could be issues. If you are getting messages in the log, then the voltage IS dropping below 4.63V, and there is a risk of system instability. Just because you haven't actually seen any system instability does not mean the risk is not there.

I am not sure that there is an option to disable the warning, but tbh, that would be like putting tape over the engine warning light of your car. OK, you cannot see the nasty bright warning light, but you risk the engine blowing up.

@E3V3A

Author

E3V3A commented Apr 17, 2018

Come on guys! I'm sure you know what I meant.

It was happening, you just didn't notice that it was.

I noticed every time the bright yellow flash was blinking on my screen, but at the time (before this issue) usually only when the processor was under load. Now there is nothing running on it, and the kernel log is flooded for something that I no longer have any control over. That is the problem. And you still have not addressed why it is not possible to adjust that property with the standard sysctl tool that is meant for exactly that.

I have been running this particular configuration for almost 2 years non-stop and without any problems I could not deal with, until this last update. Only, to be told, "dude it's your power supply, get a new one." That surely cannot be your new marketing strategy!? I happen to know a lot about hardware, especially embedded hardware, so I wonder how the rest of the community will feel or respond to this, once they find out.

So clearly this is not remotely anything critical, and should warrant at most one notification in the kernel log (or whatever you see fit). The point again, is that in its current state, it is spamming any other problem out of existence. So no matter how proud you are having made this improvement, it is simply a poor decision, at least from an ethical standpoint.

One solution could be to set a time limit on that error.

@asavah

asavah commented Apr 17, 2018

@E3V3A

So no matter how proud you are having made this improvement, it is simply a poor decision, at least from an ethical standpoint.

IMHO this is a great decision from any standpoint.
Imagine a headless pi on which you'd never see the damn lightning bolt: you'd never know you have power problems until it was too late or unless you run vcgencmd get_throttled, which you wouldn't run unless you already suspect power/thermal problems.

Thanks to this I found on my headless pi2 that its PSU (cheap chinese crap) lost some juice and was no longer able to provide 5v under heavy load (it was OK for ~ 1 year), and I was able to replace it before something bad happened.

IMO this is a great improvement as it enhances kernel<->vc4 communications and provides an alert to the user.
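
For headless boxes on older kernels, the manual check mentioned above can be scripted. This is a hedged Python sketch: it assumes vcgencmd get_throttled prints a single line of the form throttled=0x50005, and that bit 0 and bit 16 are the "under-voltage now" and "under-voltage since boot" flags. The function names are invented for this example.

```python
import subprocess

UNDER_VOLTAGE_NOW = 1 << 0          # assumed: set while the rail is too low
UNDER_VOLTAGE_SINCE_BOOT = 1 << 16  # assumed: latched until reboot


def parse_throttled(output):
    """Turn a line like 'throttled=0x50005' into an integer."""
    return int(output.strip().split("=", 1)[1], 16)


def read_throttled():
    """Ask the firmware for the current value (only works on a Pi)."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"],
        capture_output=True, text=True, check=True,
    )
    return parse_throttled(result.stdout)


def power_report(value):
    """Summarise the under-voltage bits as a short message."""
    if value & UNDER_VOLTAGE_NOW:
        return "under-voltage right now"
    if value & UNDER_VOLTAGE_SINCE_BOOT:
        return "under-voltage has occurred since boot"
    return "no under-voltage recorded"
```

On a Pi, print(power_report(read_throttled())) could be run from cron and mailed or logged wherever is convenient.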

All your babbling about "ethics" makes no sense here.

So clearly this is not remotely anything critical,

AFAIK (at least in server world) power problems are considered critical.
There are servers (eg. DELL) which will refuse to boot if power budget is below what BIOS/UEFI calculates is needed.
If software alerts about a hardware problem one should fix the damn hw problem ASAP. Period.

The people here are the engineers who design and support the pis, I bet they know what power requirements are ...

If you are not happy with the log spam - learn how to filter specific syslog messages based on message content,
yes, you can do that and much more with rsyslog.

@ThomasKaiser

ThomasKaiser commented Apr 18, 2018

Now there is nothing running on it, and the kernel log is flooded for something that I no longer have any control over

@E3V3A Your powering sucks and you have to fix this (and your way of thinking). You're affected as so many other RPi users by a phenomenon called voltage drop. Replace the cable between your board and the PSU or get the official RPi PSU and you're done.

You even suffer from instabilities and still don't get it that you have a hardware problem? Just like this guy here: bamarni/pi64#66

PSUs show aging effects too and a voltage drop under load is one of the many symptoms.

@E3V3A

Author

E3V3A commented Apr 18, 2018

@asavah

All your babbling about "ethics" makes no sense here.

Yes, you're right. It doesn't belong here at all. It only reflects my frustration with useless answers from what I can only assume are your colleagues.

@ThomasKaiser

It's interesting that you refer to that exact issue, because the guy specifically says:

I use a pi3 with the “official“ 2.5A power supply. No issues with other builds / SD cards.

and then goes on saying that:

While trying various ways to decrease the "stress" on the pi one of my solutions was creating a swap file. This fixed the problem,

So it merely shows how you guys love to brush off any issues with a general answer:
"Your powering sucks and you have to fix this (and your way of thinking)."

So, yeah, then it makes sense to blink that under-voltage flash every few seconds, because if you do, no matter what issues people have, you can always refer back to that and repeat the sentence above and close the issue. I'd close this issue with the would-be-tag "We know it better and we know more about your PSU and the cable you're using, than you do."

So for future RPi sales, everyone would be much better off if you would just build your magic power supply directly into the device, that way there will never be any more issues and complaints and you could save 1000's of man-hours of work because of all these PSU related issues.

Then Nostradamus predicted back in the 1500's that a few months from now, there will be a storm of new issues regarding SD card failures due to excessive wear and failed SD writes... not to mention the performance overhead for spamming /var/log/.

In conclusion, the only serious solution for me (and you) seems to be to revert to kernel 4.9 and everyone will be happy again.

@ThomasKaiser

ThomasKaiser commented Apr 18, 2018

@E3V3A Just to be sure: What is printed on your PSU? 4.63V or something with a 5? If there's a 5 printed do you get that there's something wrong when the device to be powered by this setup reports less than 4.63V already without any load at all? Can you imagine how low voltage will drop with some load applied or some USB peripherals that need also some juice?

Do you think devices that have a power requirement of 5V work properly when you provide only 4V?

@ThomasKaiser

ThomasKaiser commented Apr 18, 2018

In conclusion, the only serious solution for me (and you) seems to be to revert to kernel 4.9 and everyone will be happy again.

Simply create /etc/rsyslog.d/ignore-underpowering.conf with :msg, contains, "oltage" ~ and you can enjoy an unstable system even with kernel 4.14 :)

BTW: Just found it. There are SBCs that allow for constant input voltage monitoring. What you can see here is a PSU that provided 5.25V in the beginning, now after approximately 1.5 years of constant operation: https://forum.armbian.com/topic/5699-how-to-provide-and-interpret-debug-output/?do=findComment&comment=44210 -- DC-IN dropped as low as 4.2V with some light load (this board also has a good PMIC and a large battery, and the power circuitry uses boost converters to provide stable voltages to all subsystems, USB and SATA included)

@E3V3A

Author

E3V3A commented Apr 18, 2018

@ThomasKaiser
I edited the rsyslog.d config files as you mentioned in the default /etc/rsyslog.conf with and without tabs, like this:

:msg, contains, "oltage" ~

Indeed this removes the voltage related logs from the /var/log/*.log files. 👍
But apparently dmesg, which uses /dev/kmsg and /proc/kmsg, seems independent of the syslogd and rsyslogd settings, and thus still shows all under-voltage entries as before with dmesg -e -x. But I guess I can live with that.
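
Since the rsyslog rule cannot touch the kernel ring buffer, one workaround is to filter at read time instead. A small Python sketch of that idea follows; it applies the same "oltage" substring match as the rsyslog rule. The function name and the third sample line are made up for illustration.

```python
def without_voltage_lines(lines):
    """Drop log lines that mention voltage, mirroring the 'oltage' match."""
    return [line for line in lines if "oltage" not in line]


sample = [
    "kern  :crit  : [ 1701.464833] Under-voltage detected! (0x00050005)",
    "kern  :info  : [ 1707.668180] Voltage normalised (0x00000000)",
    "kern  :info  : [ 1710.000000] some unrelated kernel message",  # hypothetical
]

for line in without_voltage_lines(sample):
    print(line)
```

For one-off use, piping through grep does the same job: dmesg | grep -v oltage. Either way this only hides the lines from view; the messages are still generated and still occupy the ring buffer.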

Regarding the input voltage, I am surprised that the detector is able to measure the voltage to the second decimal 4.63, but that there is no way to read it from /sys. What is that all about? How and what does the device actually measure when the voltage is lower than that threshold?

Either way I'll report back, once I have the values. In the process of all this investigation I've unfortunately found a wide range of other unpleasant surprises coming from this update. All sorts of things, like overwriting ALSA configurations, starting services that were never run before, automatically running apt upgrade, etc. :(

@pelwell

Contributor

pelwell commented Apr 18, 2018

Regarding the input voltage, I am surprised that the detector is able to measure the voltage to the second decimal 4.63, but that there is no way to read it from /sys. What is that all about? How and what does the device actually measure when the voltage is lower than that threshold?

It's a hard-wired threshold, implemented by the new PMIC on the 3B+ and using discrete components on older boards - we only know which side of the threshold the voltage is.

@JamesH65

Contributor

JamesH65 commented Apr 18, 2018

With regard to your other comments on the 4.14 update, it's quite a big move from 4.9, so I would expect some fairly obvious changes. Also note that the huge majority of changes are from the upstream kernel, not Raspberry Pi. However, automatically running apt update makes no sense. That should never happen by default, and I've certainly not seen it in any testing (we've had 4.14 in test for quite a few months).

@E3V3A

Author

E3V3A commented Apr 18, 2018

However, automatically running apt update makes no sense.

Nope.

# cat /etc/cron.daily/apt-compat
...
exec /usr/lib/apt/apt.systemd.daily

# Then in:
# cat /usr/lib/apt/apt.systemd.daily

#!/bin/sh
#set -e
#
# This file understands the following apt configuration variables:
# Values here are the default.
# Create /etc/apt/apt.conf.d/10periodic file to set your preference.
#
...
#
#  APT::Periodic::Enable "1";
#  - Enable the update/upgrade script (0=disable)
...
#  APT::Periodic::Download-Upgradeable-Packages-Debdelta "1";
#  - Use debdelta-upgrade to download updates if available (0=disable)
...

You can see it here:

# Check for APT services:
# systemctl --all |grep apt-

apt-daily-upgrade.service   loaded    inactive dead      Daily apt upgrade and clean activities
apt-daily.service           loaded    inactive dead      Daily apt download activities
apt-daily-upgrade.timer     loaded    active   waiting   Daily apt upgrade and clean activities                              
apt-daily.timer             loaded    active   waiting   Daily apt download activities

So it's possible it doesn't do anything, but it is still running everyday.
I found this by looking in the /var/log/daemon.log:

systemd[1]: Starting Daily apt upgrade and clean activities...
systemd[1]: Started Daily apt upgrade and clean activities.
systemd[1]: apt-daily-upgrade.timer: Adding 28min 11.764106s random time.
systemd[1]: apt-daily-upgrade.timer: Adding 19min 6.283733s random time.
systemd[1]: Stopped Daily apt upgrade and clean activities.
systemd[1]: Stopped Daily apt download activities.

I have not investigated further...

@ThomasKaiser

ThomasKaiser commented Apr 18, 2018

Indeed this removes the voltage related logs from the /var/log/*.log files.

I can't believe that you really did this instead of fixing the problem. Are you aware that you turned your Pi into a 600 MHz device by ignoring your under-voltage issues? You're running frequency capped all the time and based on your description your PSU will most probably die soon anyway (since what's the reason for under-voltage now occurring even with no load at all?)

@E3V3A

Author

E3V3A commented Apr 19, 2018

@ThomasKaiser

I can't believe that you really did this instead of fixing the problem.

There was no problem until I updated with this kernel!

So yeah, perhaps my power supply is not ideal and crappy, but the fact of the matter is that it was running on full speed, on medium load and everything else was working more or less fine before your kernel push. I still can't believe you pushed out that crappy Kernel update before proper testing or getting more community feedback. (Now I already have another kernel update waiting.) I've already spent days trying to repair and fix all the bloat and issues that resulted from this, and still seem to have a long way to go. In fact, at this point I would just like to downgrade! Unfortunately I don't see an easy way to do this, at this point. So thanks a lot.


And what makes you think that this setup is so much more reliable?
Last time I checked, capacitors are both unreliable and not very precise, unless you put military grade (Radio Shack ;) caps in there.

schematic 1

So if it is true that you are using the APX803-46, then V_th has a range of 4.56 / 4.63 / 4.70 V (min / typ / max). This is apparently a well-known issue and well documented here. There they propose that you should have used the APX803-44 instead, with a range of 4.31 / 4.38 / 4.45 V, and nobody would have had any problems! One of the main problems with your design is stated like this:

The power input circuit design is outside of the bounds of what we can control. This design forces businesses to create and customers to purchase power supplies that are out of compliance with industry standards. The reason some other power adapters do not experience this issue is because they provide dangerously high voltages that are not standards compliant. In our tests of this issue, we found power supplies delivering up to 5.7Vopen and 5.5V with an 0.5A load. These may fry sensitive USB electronics that do not have any protection built-in.

So, now please spare us all the PSU excuses, and revert the kernel & firmware to be a little more accepting.

One way you could do this, is by using a broader time constant for the under voltage. I.e. average the voltage for a minute or something.
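
To make the two threshold ranges quoted above concrete, here is a small Python sketch that classifies a rail voltage against each part's minimum and maximum trip point. The numbers are the ones given in this comment; the names and the three-way wording are illustrative only.

```python
# (min, max) trip thresholds in volts, as quoted in the comment above.
APX803_46 = (4.56, 4.70)
APX803_44 = (4.31, 4.45)


def trip_state(rail_v, limits):
    """Say whether a detector with these limits would flag `rail_v`."""
    low, high = limits
    if rail_v < low:
        return "trips on every part"
    if rail_v > high:
        return "never trips"
    return "depends on the individual part"


for volts in (4.75, 4.60, 4.50, 4.25):
    print(volts, "|", trip_state(volts, APX803_46), "|", trip_state(volts, APX803_44))
```

A rail sitting at 4.50 V would always be flagged by the -46 part and never by the -44 part, which is the gap this comment is arguing about.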

@E3V3A

Author

E3V3A commented Apr 19, 2018

And yes, I have mentioned it before, elsewhere. Please provide a proper CHANGELOG to your kernel releases, so people don't have to fall into this trap. Being able to use apt-get changelog raspberrypi-kernel would have been great, but I was told as an excuse that it is not maintained by you. But then you could always document it elsewhere... GitHub has Wiki pages you know!

@ThomasKaiser

ThomasKaiser commented Apr 19, 2018

So yeah, perhaps my power supply is not ideal and crappy, but the fact of the matter is that it was running on full speed

Nope. It seems you're relying on 'Linux standards' like

/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq

which you can't rely on with the Raspberry Pi, since it contains bogus values. When you're running undervolted the firmware caps the frequency to 600 MHz while cpuinfo_cur_freq only tells you an irrelevant number (900 MHz on RPi 2, 1200 MHz on RPi 3 and 1400 MHz on RPi 3+). The only way to check for the problem is currently

vcgencmd measure_clock arm | awk -F"=" '{printf ("%0.0f",$2/1000000); }'

Of course you're not alone. Especially headless users don't get it that they run at 600 MHz max all the time :) See for example bamarni/pi64#4 (comment)
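
The awk one-liner above splits the vcgencmd output on "=" and converts Hz to MHz. The same conversion as a Python sketch, assuming the output is a single key=value line such as frequency(45)=600000000 (the exact key text is an assumption and does not matter to the parser):

```python
def clock_mhz(output):
    """Convert a line like 'frequency(45)=600000000' to whole MHz."""
    hz = int(output.strip().split("=", 1)[1])
    return round(hz / 1_000_000)


print(clock_mhz("frequency(45)=600000000"))
```

Run it against the output captured while the system is under load, since an idle Pi clocks down anyway and the cap only shows up when the CPU tries to go faster.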

@E3V3A

Author

E3V3A commented Apr 19, 2018

cpuinfo_cur_freq is the HW clock (not from the kernel) and seems to give the same result as vcgencmd measure_clock.

@ThomasKaiser

ThomasKaiser commented Apr 19, 2018

Is the result of the following 600 or 1200?

sysbench --test=cpu --cpu-max-prime=5000 --num-threads=4 run && vcgencmd measure_clock arm | awk -F"=" '{printf ("%0.0f",$2/1000000); }'
@JamesH65

Contributor

JamesH65 commented Apr 19, 2018

There was no problem until I updated with your kernel!

Yes there was, your Pi was undervolted. The fact that there were no outward signs of corruption etc. doesn't obviate that fact. All the kernel is doing differently is REPORTING a problem. A problem that may always have been there, previously unnoticed.

Just a FYI, TKaiser is not an employee of the RPF, it's not 'his' kernel. He is a well informed community member trying to help you. I'm the only employee currently commenting on this thread.

@E3V3A

Author

E3V3A commented Apr 19, 2018

@JamesH65

Yes there was, your Pi was undervolted.

I'm not arguing that there was not an under-voltage problem before, but I am arguing that I had no other problems at all before this update from 4.9.80, except the very occasional (once every few days) issue #2510, which I posted and which still hasn't been addressed even though it seems to be from several years back.

There was an under-voltage problem right before, but it was under load and showing every few minutes at most. Now it is showing every few seconds under no other load than that provided by the update itself, and with USB sound-card disconnected. I am also arguing that when I last checked CPU frequency, probably >6 months ago it was running at 1200. So it's totally irrelevant for this issue, to repeatedly ask me to post the current speed, since the OP is already stating that I am being throttled.

So, yes, I'm also trying to help, by reporting my findings here. But I am now quite fed up with these discussions about the PSU. It's a horribly expensive scapegoat, no matter how you look at it. You made a design error, and we have to live with it. We still love our RPi3s, and when it works, it works. Mine was working, until this update.

My biggest mistake was not to first check all the issues here, before blindly updating. (Because it went well before.)

@pelwell

Contributor

pelwell commented Apr 19, 2018

Denial, as they say, isn't just a river in Egypt.

You seem like an intelligent, knowledgeable person, yet you can't get over the fact that everybody else on this post is actually trying to help you - consider it an intervention. By under-powering your Pi you are limiting its performance and risking corruption on a daily basis. @ThomasKaiser is attempting to tell you that the frequency throttling is managed by the firmware without the knowledge of the CPU governor, so unless you use vcgencmd measure_clock arm you aren't getting an accurate picture of the ARM's clock speed.

You have raised a valid point that the kernel messages are too frequent, and we are preparing a patch to limit the rate - an initial message immediately when it first happens (and after a long gap), then periodic digests with a count would be nice - but other than that we have no plan to change this mechanism because we consider it an important service to our users.

@ThomasKaiser

ThomasKaiser commented Apr 19, 2018

with USB sound-card disconnected

So this 'sound-card' is powered by the Pi or why do you mention this? Do you know how low voltage is allowed to drop for USB peripherals according to specs for this type of device? 4.4V or 4.75V?

@pelwell

Contributor

pelwell commented Apr 20, 2018

You've misunderstood the firmware driver code - old_uv and new_uv are effectively booleans, and the comparison acts as an edge detector - one message is for the rising edge, the other for the falling edge.
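
The edge-detector behaviour described here can be illustrated outside the kernel. This Python sketch is only a model of the idea, not the driver code: it assumes bit 0 of each polled value is the "under-voltage now" flag, and it reuses the two message texts from the log excerpt in the original report.

```python
def edge_messages(samples):
    """Emit one message per change of the under-voltage bit, none while it is steady."""
    messages = []
    old_uv = False
    for value in samples:
        new_uv = bool(value & 1)  # assumed: bit 0 is under-voltage right now
        if new_uv != old_uv:
            # Rising edge logs the warning, falling edge logs the all-clear.
            messages.append("Under-voltage detected!" if new_uv else "Voltage normalised")
        old_uv = new_uv
    return messages


polled = [0x0, 0x50005, 0x50005, 0x50000, 0x50000, 0x50005]
for message in edge_messages(polled):
    print(message)
```

Six polls produce three messages here: a steady state, good or bad, logs nothing, so the flood in the original report implies the supply is repeatedly crossing the threshold.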

@JamesH65

Contributor

JamesH65 commented Apr 20, 2018

We didn't listen to the community so much that at no point yesterday did I modify the driver to add rate limiting, spend some time testing, and then create a PR today.

Oh, hold on, here it is. Looks like we did listen after all.

#2520

And before accusing people of DD programming, probably best to understand what the code is doing before spouting off and making yourself look foolish. Remember, the people you seem so keen on pissing off are the people YOU need to fix things.

@lategoodbye

Contributor

lategoodbye commented Apr 20, 2018

From my point of view there are 2 use cases:

  1. Adjustable power supply
  2. Non-adjustable power supply

Case 1: The current logging behavior is helpful to find the correct settings at runtime.
Case 2: Since the user doesn't have the chance to change the PSU during runtime, this ping-pong behavior between under-voltage detected and "normalised" isn't helpful. It's sufficient to print the issue only once, because the provided power won't get better.

Here my suggestion:
Add a DT or kernel parameter to switch between the following modes:
a - current kernel log behavior
b - store the sticky bits and only add a new kernel message if a new sticky bit has been added

Just my two cents
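
Mode b could look roughly like the following Python model. It is a sketch of the suggestion, not of any existing driver: it assumes the sticky ("has occurred") flags live in bits 16-19 of the firmware value, and the class name and message format are invented.

```python
STICKY_MASK = 0x000F0000  # assumed: bits 16-19 are the "has occurred" flags


class StickyLogger:
    """Log only when a sticky bit shows up that has not been seen before."""

    def __init__(self):
        self.seen = 0
        self.messages = []

    def update(self, value):
        new_bits = (value & STICKY_MASK) & ~self.seen
        if new_bits:
            self.messages.append(f"new sticky bits: 0x{new_bits:08x}")
            self.seen |= new_bits
        return bool(new_bits)


log = StickyLogger()
for polled in (0x00050005, 0x00050000, 0x00050005, 0x00070005):
    log.update(polled)
print(log.messages)
```

The ping-pong between 0x00050005 and 0x00050000 produces a single message, and a second one appears only when the frequency-capping bit joins the set.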

@P33M

Contributor

P33M commented Apr 20, 2018

@lategoodbye I disagree with Case 2: if your power supply is "known good" yet you have a misbehaving peripheral connected either via GPIO header or USB port, having a timestamped log (albeit ratelimited) allows you to figure out which peripheral/usage condition is causing the undervoltage.

Components can fail in service, so it's a useful diagnostic aid.

@E3V3A

Author

E3V3A commented Apr 20, 2018

@lategoodbye
Your suggestion is great! Probably the only one that can satisfy everyone. I would also be very happy to see this added as a boot/config.txt or cmdline parameter. But I have clearly used up all my good Karma points here, so perhaps some other people could also chime in as well?

@P33M
To disagree with one of the cases is not helpful, if the other case is also available.
(This thread has already become an epitome of all sorts of disagreements, on all sort of levels.)
So how do you suggest to move ahead with this?

@JamesH65
PR 2520 is perhaps vanilla helpful, but seems redundant since we should already have the sysctl items for that. The following should accomplish the exact same thing:

sudo sysctl -w kernel.printk_devkmsg=ratelimit
sudo sysctl -w kernel.printk_ratelimit=300
sudo sysctl -w kernel.printk_ratelimit_burst=3

I use the word "should" here because, as I mentioned in a previous post, it seems that these are ignored, so if that PR enables them again, that is great.

The other problem with that PR, is that it would probably also throttle all other kernel log messages. That is also why I would vote for Stefan's suggestion.

@JamesH65

Contributor

JamesH65 commented Apr 20, 2018

The PR doesn't use vanilla rate logging - it uses its own interval (5 minutes) and burst (3) so is unaffected by the kernel rate logging settings (although it does use the same rate logging code)

It's been merged, I doubt I'll have time to make any more changes. Many more important things to do. We will consider PRs from elsewhere however.
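
For anyone unfamiliar with interval/burst rate limiting, here is a simplified Python model of the behaviour described above: at most 3 messages per 5-minute window, with the rest counted as suppressed. It is an approximation written for this thread, not the kernel's ratelimit code, and the class and attribute names are made up.

```python
class RateLimit:
    """Allow at most `burst` messages per `interval_s` seconds."""

    def __init__(self, interval_s=300.0, burst=3):
        self.interval_s = interval_s
        self.burst = burst
        self.window_start = None
        self.printed = 0
        self.suppressed = 0  # running total of dropped messages

    def allow(self, now):
        """Return True if a message at time `now` (in seconds) may be logged."""
        if self.window_start is None or now - self.window_start >= self.interval_s:
            # A new window begins: reset the per-window budget.
            self.window_start = now
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        self.suppressed += 1
        return False


limiter = RateLimit()
decisions = [limiter.allow(t) for t in (0, 10, 20, 30, 40, 300, 310)]
print(decisions, limiter.suppressed)
```

Because the limiter holds its own state, throttling the under-voltage message this way does not affect any other kernel log message.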

@E3V3A

Author

E3V3A commented Apr 23, 2018

I'm trying to better understand the PR code raspberrypi.c and how it interacts with the kernel, sysfs and logs, but I can't find where it goes. I can't see any trace of this code in the kernel.img nor in any of the modules, or libraries, even if:

# cat /lib/modules/4.14.34-v7+/modules.builtin |grep rasp
kernel/drivers/firmware/raspberrypi.ko

However, this file or its code is nowhere to be found. So where is it hiding?

  • What would it take to write a DT overlay to implement the suggestions by Stefan? @lategoodbye
@pelwell

Contributor

pelwell commented Apr 23, 2018

The overlay/parameter is trivial to add. Extending the driver to support it is slightly more work, but not much. The reason it hasn't been done is that we aren't convinced it is a sensible change - I think it sends the wrong message, that the problem is managing the warnings, when in fact the problem is an inadequate power delivery system.

@notro

Contributor

notro commented Apr 23, 2018

Feature tweaking like this is often done using a module parameter. Device Tree is primarily used to describe the hardware.

Maybe something like this:

static bool rpi_firmware_uvlog = true;
module_param_named(uvlog, rpi_firmware_uvlog, bool, 0600);
MODULE_PARM_DESC(uvlog, "Enable logging of Under-voltage [default=true]");

My take on whether it should be easy or not to disable safety guards is: let people walk a tightrope across Niagara Falls if they want to. We should do our best to inform of the dangers involved though.
Maybe a fat warning in probe():

	if (!rpi_firmware_uvlog)
		pr_warn("Under-voltage logging has been disabled. This is not recommended etc. etc.\n");

If we block actions like this we may find ourselves standing in the way of the hacker/maker spirit and I think that would be sad.

Edit: ovlog -> uvlog

@JamesH65

Contributor

JamesH65 commented Apr 23, 2018

I think that disabling safety guards should ALWAYS be difficult. Otherwise people will do it, and that will cause us more problems than we want to deal with. For example, you need all sorts of licences to be able to tightrope across Niagara Falls. People can still do it, it's just a PITA to arrange. Safety software is in the same boat. People can do it - all the source is available to change to your heart's content, but I do not believe we should make it easy.

@JamesH65

Contributor

JamesH65 commented Apr 23, 2018

Here's how it would go.

  1. Random user gets a voltage warning.
  2. Googles.
  3. Sees that he needs a better power supply.
  4. Can't get one straight away, so finds out how to ignore the message.
  5. Sets an easy-to-change module parameter to ignore the warning.
  6. 2 weeks later, SD card corrupts, some random USB error happens, or some other power related issue.
  7. Posts on forum. Doesn't mention he disabled warnings.
  8. Much time spent by people trying to find the problem that should have been obvious from the outset.

@notro

Contributor

notro commented Apr 23, 2018

@JamesH65:
My Niagara falls tightrope analogy turned out to be best used for why it should be hard to circumvent the safety checks :-)

As for how difficult it should be to circumvent, I have failed to factor in how easy it is to compile a kernel these days. Howto information is readily available and it's quite fast to do it on the Pi itself, compared to the overnight job it once was.

As for your troubleshooting-on-the-forum argument, it didn't cross my mind that it could be a problem, but I only scan a few forums and don't engage in helping people with SD card corruption issues so I really wouldn't know.

I see that there's a new Raspbian out which has this under-voltage logging, which probably means that the Debian kernel package has been updated too.
It will be interesting to see how this pans out over the next weeks.

It wasn't that long ago that I learned that the power cable gauge plays a factor in this, not only the power supply rating. A note about this in the Power supply section in the Documentation would be good I think.

@E3V3A

Author

E3V3A commented Apr 23, 2018

It's always very enlightening to see how we are so different in this DIY hack-your-own-device philosophy.

AFAICR RPi was based on the ideology of making cool HW easily accessible to the general public, including kids. So just as a kid with a hammer, knife or fire will learn early on how easily it is to destroy something, or get burnt, that should not prevent us from allowing kids to use those most basic tools of life. Or making it harder for them to use and learn about them. So IMO and in this particular case, I simply don't see how the cmdline option obtained from proper documentation (with all above and beyond warnings) would possibly make this worse, than for people trying to force feed extra voltage to their RPis, using for example the far more dangerous, USB back-powering method, or double feeding from different sources. Not to mention how easy it is to abuse the GPIO's. Thus I find the above arguments for "making it more difficult" to implement, as exceptionally lame.

As a side note, for whoever happens to come across this thread: I just added the boot config.txt option avoid_warnings=2 and, my god, finally all that kernel/dmesg garbage is gone! In addition, it seems that the device is running smoother. Yes, it is throttled to 600 MHz, which I guess is done by the firmware, but it is already running better. I still have to do some proper performance tests, but I really do think there is a performance hit, when those messages are enabled. The IO reaction just seems more jumpy and laggy while the kernel logs are spammed. (NB: I am still on 4.14.30 and not yet on 4.14.34, where there were some sysfs and log fixes.) What is mysterious, though, is why vcgencmd get_throttled is returning 0x0 when the device is clearly throttled. -- [EDIT] That option turns off the throttling too, so the normal ondemand kernel (?) CPU governor is working as it should.
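For readers trying to interpret values such as 0x00050005 or 0x0, here is a minimal sketch of a decoder for the throttling bitmask. It assumes the bit layout described in the Raspberry Pi documentation for `vcgencmd get_throttled` (bits 0-3 are the current state, bits 16-19 are sticky "has occurred" flags) and assumes the value in the kernel message uses the same layout. The function name `decode_throttled` and the flag labels are made up for illustration.

```shell
#!/bin/sh
# Decode a throttling bitmask such as the one printed by `vcgencmd get_throttled`.
# Assumed layout: bits 0-3 = current state, bits 16-19 = "has occurred since boot".
decode_throttled() {
    value=$1
    flags=""
    for entry in \
        "0:under-voltage-now" \
        "1:freq-capped-now" \
        "2:throttled-now" \
        "3:soft-temp-limit-now" \
        "16:under-voltage-occurred" \
        "17:freq-capped-occurred" \
        "18:throttled-occurred" \
        "19:soft-temp-limit-occurred"
    do
        bit=${entry%%:*}      # text before the colon: bit position
        label=${entry#*:}     # text after the colon: illustrative label
        if [ $(( ($value >> $bit) & 1 )) -eq 1 ]; then
            flags="${flags:+$flags }$label"
        fi
    done
    # Print "none" when no known bit is set
    echo "${flags:-none}"
}

decode_throttled 0x00050005
```

Under that assumed layout, 0x00050005 has bits 0, 2, 16 and 18 set: under-voltage and throttling are both active right now and have both been seen since boot. A value of 0x0 means none of these flags is set.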

And then of course we have the highly entertaining car analogy. Today all cars are using the CAN bus and most (even very old ones) have OBD2 access that can be used for all sorts of diagnostics, including to disable various warning lights. You can use your own $12 OBD2 BT dongle and disable any warning with your own phone. And anyone who has had an Audi, VW or BMW also knows that some of those engine warning lights come on for absolutely no other reason than annoyance, in order to ask the owner to take the car to their own service centers for a checkup after some X miles and force you to pump in extra $$$ for the vendors. (A strategy very similar to having to buy the RPi foundation's magic 5.4V/2.5A power supply.)

@ThomasKaiser

ThomasKaiser commented Apr 24, 2018

I really do think there is a performance hit, when those messages are enabled. The IO reaction just seems more jumpy and laggy while the kernel logs are spammed.

So not only do you try to power your Pi the worst way possible, but you also run off an SD card from hell? :)

The ext4 standard commit interval is 5 seconds. So if you really see your system lagging because of some laughable disk activity every 5 seconds, you should seriously consider replacing your SD card. Random IO performance is what matters if you suffer from such issues, and the vast majority of SD cards pretty much suck here, which is why it's important to only buy SD cards that are A1 or A2 compliant (last post of this thread contains numbers for SanDisk A1 cards). These perform orders of magnitude better than average SD cards. Random IO with small block sizes (writing some log contents) can be 100 to 500 times faster.

But given how you avoid fixing your underpowering situation, most probably you're only interested in masking this other problem too? Adding commit=600 to /etc/fstab will do the job.

If you're interested in diagnosing the problem:

sudo apt install sysstat
sudo iostat 10

(watch the %iowait percentage, since it tells you how much of the time your whole system is stuck waiting on IO)
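If installing sysstat is not an option, a similar figure can be approximated from /proc/stat directly. The following is a minimal sketch, not a replacement for iostat: it assumes the usual Linux field order on the summary "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal), and the function name `iowait_pct` is made up for illustration.

```shell
#!/bin/sh
# Print the iowait share (in percent) between two "cpu ..." summary lines
# taken from /proc/stat at different times.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal (fields 2-9) as the total tick count
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            t[NR] = total
            w[NR] = $6          # field 6 is the iowait counter
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}

# Usage on a live system (10-second window, comparable to `iostat 10`):
#   a=$(head -n 1 /proc/stat); sleep 10; b=$(head -n 1 /proc/stat)
#   iowait_pct "$a" "$b"
```

A value that stays high across several windows while only log lines are being written points at slow random IO on the card rather than at the logging itself.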

@E3V3A

Author

E3V3A commented Apr 24, 2018

FYI: This is a copy/paste excerpt from the USB 2.0 specifications:

  • Low-power functions must be capable of operating with input VBUS voltages as low as 4.40 V, measured at the plug end of the cable.

  • High-power functions must be capable of operating in their low-power (one unit load) mode with an input voltage as low as 4.40 V, so that it may be detected and enumerated even when plugged into a bus-powered hub. They must also be capable of operating at full power (up to five unit loads) with a VBUS voltage of 4.75 V, measured at the upstream plug end of the cable.

The power source and sink requirements of different device classes can be simplified with the introduction of the concept of a unit load. A unit load is defined to be 100 mA. The number of unit loads a device can draw is an absolute maximum, not an average over time. A device may be either low-power at one unit load or high-power, consuming up to five unit loads. All devices default to low-power. The transition to high-power is under software control. It is the responsibility of software to ensure adequate power is available before allowing devices to consume high-power.


My Measurements:

# Voltage across GPIO pins 4 & 6
Under no load:      4.86 V
Under CPU load:     4.46 V

# Voltage @ PSU:    
Under no load:      5.30 V @ ~300 mA
Under CPU load:     5.40 V @ ~950 mA  <-- I have a good PSU!

# Voltage with no load:
@ PP1/2 : 4.92 V
@ PP35  : 4.89 V
@ PP7   : 4.86 V

# Voltage with CPU load:
@ PP1/2 : 4.64 V
@ PP35  : 4.60 V
@ PP7   : 4.58 V

NOTE:
All tests were run with the following USB peripherals connected:

Bus 001 Device 005: ID 0d8c:000c C-Media Electronics, Inc. Audio Adapter    # USB Sound Card
Bus 001 Device 004: ID 05af:0906 Jing-Mold Enterprise Co., Ltd              # Wireless Keyboard

CPU stress load was performed with:
for ((i=0; i<$(nproc --all); i++)); do nice yes >/dev/null & done

Now this indicates that either I have a really shitty cable/connection (at the RPi end) or that there is something else wrong internally. It also explains the under-voltage warning ping-pong effect, because it is so close to the 4.63 V threshold.
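As a rough cross-check of the cable/connection theory, the readings above can be turned into an effective series resistance with Ohm's law, R = (V_supply - V_board) / I. This is only a sketch under assumptions: it treats the 5.40 V PSU reading, the 4.64 V PP1/2 reading and the ~950 mA figure as simultaneous, and it lumps cable, connectors and anything else in the path into one number. The function name `series_resistance` is made up for illustration.

```shell
#!/bin/sh
# Effective series resistance between supply and board, in ohms.
# $1 = volts at the supply, $2 = volts at the board, $3 = current in amps
series_resistance() {
    awk -v vs="$1" -v vb="$2" -v amps="$3" 'BEGIN {
        if (amps <= 0) exit 1            # avoid dividing by zero
        printf "%.2f\n", (vs - vb) / amps
    }'
}

# PSU 5.40 V, PP1/2 4.64 V under CPU load, ~0.95 A
series_resistance 5.40 4.64 0.95
```

With these inputs the estimate comes out at about 0.8 ohm, which would be a lot for a power path expected to carry 1 A or more, and is consistent with the suspicion that the cable or connector is the weak point.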


And my "SD card from hell" is doing just fine reading 20MB/s and writing ~8 MB/s... without any SD card reader performance hacks.

@ThomasKaiser


ThomasKaiser commented Apr 25, 2018

And my "SD card from hell" is doing just fine reading 20MB/s and writing ~8 MB/s... without any SD card reader performance hacks.

You never click on URLs and don't follow suggestions, right? :)

You are talking about sequential performance, which is 99% irrelevant on SBCs (it matters for digital cameras, video recorders and similar 'streaming' use cases). What's really important on an SBC is random IO, and here SD cards that show laughable ~8MB/s sequential writes are usually slow as hell. We've seen such cards being as slow as 2 IOPS (IO operations per second) with 16K access patterns, while good A1-rated cards are 250 to 500 times faster! It's all about IOPS; MB/s are somewhat irrelevant.

https://forum.armbian.com/topic/954-sd-card-performance/?page=3&tab=comments#comment-49811

And also again: Use iostat 10 in parallel and watch the %iowait percentage and the amount of data written. If this is constantly high your SD card needs a replacement.

It makes me really sad to see you behaving so ignorantly and even actively promoting such weird ideas as setting avoid_warnings=2. I hope RPi folks will remove this ability with next firmware update since as can be clearly seen it's a horrible idea increasing support efforts for no reason...

@jakemagee


jakemagee commented Apr 25, 2018

So... it sounds like you found your issue @E3V3A

Now this indicates that either I have a really shitty cable/connection (at the RPi end) or that there is something else wrong internally. It also explains the under-voltage warning ping-pong effect, because it is so close to the 4.63 V threshold.

Do you have different types of cables to test with? Do you have other RPi boards to test with?

@JamesH65

Contributor

JamesH65 commented Apr 25, 2018

It makes me really sad to see you behaving so ignorantly and even actively promoting such weird ideas as setting avoid_warnings=2. I hope RPi folks will remove this ability with next firmware update since as can be clearly seen it's a horrible idea increasing support efforts for no reason...

Had a quick look at the code that deals with "avoid_warnings", it's all a bit odd and difficult to follow, but I suspect you are right, that should not stop logging of low voltage issues. What it should do is stop display of the warnings (lightning bolt), but still do the logging. We'll discuss in house.

@ThomasKaiser


ThomasKaiser commented Apr 25, 2018

Now this indicates that either I have a really shitty cable

You simply described the average Micro USB cable out there. They were never intended to carry more than 500 mA, which pretty much explains why powering through Micro USB is such a mess if users do not spend the extra money on a quality PSU with a fixed, low-resistance cable. (PSUs with a fixed cable have to provide the advertised voltage at the connector side, so cable resistance is already taken into account. This makes a huge difference compared to a USB PSU with a separate Micro USB cable.)

I know you don't visit links, so here it is as an embedded table:
[Image: usb-cable-voltage-drop]

The following link explains, in a hopefully understandable way, the voltage drop challenge with average Micro USB cables (whose power lines are usually 26 or even 28 AWG): https://www.cnx-software.com/2017/04/27/selecting-a-micro-usb-cable-to-power-development-boards-or-charge-phones/
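To give a feel for the numbers, here is a minimal sketch that estimates the voltage drop over a cable from the wire gauge of its power conductors. It is an idealised calculation, not a measurement: it assumes plain copper at roughly 20 °C, uses the standard AWG diameter formula, counts both the 5 V and ground conductors, and ignores connector contact resistance. The function name `cable_drop` is made up for illustration.

```shell
#!/bin/sh
# Estimated voltage drop over a cable, in volts.
# $1 = AWG of the power conductors, $2 = cable length in metres, $3 = current in amps
cable_drop() {
    awk -v awg="$1" -v len="$2" -v amps="$3" 'BEGIN {
        pi  = atan2(0, -1)
        rho = 0.01724                                   # copper resistivity, ohm*mm^2/m
        d   = 0.127 * exp(log(92) * (36 - awg) / 39)    # AWG wire diameter in mm
        r_per_m = rho / (pi * d * d / 4)                # resistance of one conductor per metre
        printf "%.2f\n", 2 * len * r_per_m * amps       # factor 2: supply and return conductor
    }'
}

cable_drop 28 1 1    # thin 28 AWG power lines, 1 m, 1 A
cable_drop 20 1 1    # thicker 20 AWG power lines, same length and current
```

Under these assumptions a 1 m cable with 28 AWG power lines loses roughly 0.43 V at 1 A, while 20 AWG loses well under 0.1 V, which is the difference between staying above and dropping below the under-voltage threshold discussed in this thread.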

@E3V3A

Author

E3V3A commented Apr 25, 2018

@ThomasKaiser

First I would like to compliment you on your large effort in trying to help and convince us, above and beyond. Although I am often annoyed by your suggestions, I really appreciate them! Perhaps just because you are able to argue with decent proofs, even if you clearly disagree with most of my own ideology. :) Also, thanks for that cable table.

You never click on URLs and don't follow suggestions, right?

Not blindly clicking on URLs doesn't mean I do not follow or take well-founded advice and suggestions. I'm very well aware of the poor (and fake) SD cards out there, and have been for a long time. However, my everyday use very rarely needs the high performance you suggest. The way I use my RPi every day simply does not require that type of high IO R/W speed. So until I need better performance or start seeing serious errors, I'll keep using what I have. If you wanna send me a 100 EUR SD card, please do so.

...setting avoid_warnings=2. I hope RPi folks will remove this ability with next firmware update.

This must be the most idiotic suggestion I have seen from you so far. It goes against all the unwritten rules, and the fundamental nature, of DIY projects. If people wanna run their HW (or cars) until they break, they should be able to do so without someone like you pointing fingers at them for being "foolish". If you guys had spent even a fraction of the time actually fixing bugs and issues, instead of arguing against them and the people reporting them, we would be far better off, and not with serial bombardments of broken updates.

As I said above, thousands of people use their Pis for all sorts of small projects, and keep them running that way, until they get this great idea to update, just to find that all hell breaks loose. Close to every time!
People I talk to back in the MagicMirror community are all agreeing on one thing:
If your MM is working, never, ever update the firmware or kernel!

The bottom line. For someone who's running the PiHole or some other display app it doesn't matter very much. But when someone has integrated a multitude of peripherals such as face recognition, voice recognition, Cloud API, external sensors like PIR, Ultrasonics, light detection, IR remote, external GPIO controls, and various USB devices, like SDRs, all on the same device, then your broken updates are not so fun anymore. You do it once or twice, then you say "I will never update this again.".

@JamesH65

I suspect you are right, that should not stop logging of low voltage issues. What it should do is stop display of the warnings (lightning bolt), but still do the logging.

Your reasoning is very scary, especially considering you are part of the RPi foundation! Will you sleep better at night if you know that my SD card is getting filled up by repeated logs, all by my own choice? I think you have become very biased and I just don't see or hear any valid reasoning for doing what TK and you are suggesting. What exactly do you hope to gain by doing this?

Here is a free money making suggestion from me:
In your next iteration of any future Raspberry Pi N (4?) you should make sure to:

  1. Use 3A USB-C connectors
  2. Conform to the USB standards
  3. Include a low loss connector cable
  4. Include the magic power supply if the device doesn't comply with (2).
  5. Include the best suitable SD card
  6. Include a working built in soundcard with MIC/AUX

Each one of those alone, will save you tons of time and money since you will be able to put all your current efforts into sales, marketing, development and production, instead of arguing.

Just let us all know when this will happen, because that will be the day I will stop updating and buying your HW, permanently.


@jakemagee

Do you have different types of cables to test with?

Yes, in fact, I just tried today with another cable, and the improvement was small but clearly noticeable.

# Voltage with no load:
@ PP1/2 : 5.00 V
@ PP35  : 4.98 V
@ PP7   : 4.93 V

# Voltage with CPU load:
@ PP1/2 : 4.68 V
@ PP35  : 4.64 V
@ PP7   : 4.58 V

@ThomasKaiser

ThomasKaiser commented Apr 25, 2018

If people wanna run their HW (or cars) until they break, they should be able to do so without someone like you pointing fingers at them

If those people did not open support issues, believing they have to blame software for their hardware issues, everything would be fine. But they do, and they report the various unnecessary results of their underpowering adventures as 'bugs', wasting their own and others' time.

I'm contributing to an open source project dealing with all sorts of SBC (except Raspberries). The vast majority of 'software issues' people have are in reality

  • underpowering (affects almost only those boards with Micro USB -- those with barrel plugs where users had to buy a good PSU with a fixed cable are usually not affected)
  • something went wrong with the SD cards (we could eliminate a lot of these issues by only recommending Etcher any more and by educating users via motd if they run off a crappy SD card, we benchmark the card at first boot automagically)

Being able to differentiate between those hardware issues and real issues is essential if you want to spend time on software issues and not just deal with ignorance (such as yours -- I really can't believe you're still refusing to fix your underpowering situation). So now that undervoltage logging is in place, it would be fatal if users could mask this, since as we can see from you they're even encouraged to do so...

Since you seem to love using inappropriate hardware (be it powering or storage), I already recommended adding commit=600 to /etc/fstab to mitigate the crappy random IO performance of your SD card (seriously: if a few log lines every few seconds result in a laggy system, or you fear logging in general, this is a great idea that also drastically reduces wear on flash media -- 'my' distro implements log2ram for this purpose, writing log contents back to 'disk' just every hour by default).

@jacobq

jacobq commented Apr 25, 2018

I don't want to get mixed up in this very long-winded discussion, but FWIW I will say that I stumbled across it looking for a way to suppress kernel messages from the console (in my experience this has made bad problems worse, as I'm trying to triage things and shut down but get messages printed right over files in my editor, etc.), and there are some ways to do this, such as dmesg -n 1; see
https://superuser.com/questions/351387/how-to-stop-kernel-messages-from-flooding-my-console#answer-351402
A previous comment suggested that this does not work, but it seemed to work fine for my purposes (i.e. on RPi 3 B+ it stopped kernel messages from getting printed to my console though they still appear in the output of dmesg)
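The reason `dmesg -n 1` hides the warning from the console but not from `dmesg` can be sketched in a few lines. The kernel prints a message to the console only if its priority number is lower than the console log level, and the log excerpt at the top of this issue shows the under-voltage message at "crit", which is priority 2. The function name `reaches_console` is made up for illustration, and 7 is used below only as an assumed, commonly seen default console log level.

```shell
#!/bin/sh
# Would a message of priority $1 be printed on the console at console_loglevel $2?
# A message is shown only if its priority number is lower than console_loglevel.
reaches_console() {
    if [ "$1" -lt "$2" ]; then
        echo yes
    else
        echo no
    fi
}

reaches_console 2 1    # "crit" message after `dmesg -n 1`
reaches_console 2 7    # "crit" message at an assumed default level of 7

# On a live system the current console_loglevel is the first field of:
#   cat /proc/sys/kernel/printk
```

This only affects what is echoed to the console; the messages are still recorded in the kernel ring buffer, which matches the behaviour described above.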

@advcron

advcron commented Apr 26, 2018

I had the same problem. I used a 5V 2A charger and dmesg was flooded with:

Under-voltage detected! (0x00050005)
Voltage normalised (0x00000000)

So I bought a DC-DC converter module, hy196_0815 (I am stepping the voltage down from 12V to 5V).
[Image: hy196_0815]
The errors didn't disappear. Next I changed the micro USB cable (to one from a Huawei P9 lite) and I think that was it.
The RPi 3 B+ has now been running for 12 hours, and the errors have not appeared in dmesg.
So a good cable is very important.

@JamesH65

Contributor

JamesH65 commented Apr 26, 2018

Your reasoning is very scary, especially considering you are part of the RPi foundation! Will you sleep better at night if you know that my SD card is getting filled up by repeated logs, all by my own choice? I think you have become very biased and I just don't see or hear any valid reasoning for doing what TK and you are suggesting. What exactly do you hope to gain by doing this?

Here is a free money making suggestion from me:
In your next iteration of any future Raspberry Pi N (4?) you should make sure to:

  1. Use 3A USB-C connectors
  2. Conform to the USB standards
  3. Include a low loss connector cable
  4. Include the magic power supply if the device doesn't comply with (2).
  5. Include the best suitable SD card
  6. Include a working built in soundcard with MIC/AUX

Each one of those alone, will save you tons of time and money since you will be able to put all your current efforts into sales, marketing, development and production, instead of arguing.

Just let us all know when this will happen, because that will be the day I will stop updating and buying your HW, permanently.

Thanks for the advice, to a Foundation that's sold 19M devices, completely changed the SBC market, and provided millions of pounds to education from the profits. I'm sure your advice will completely change our business. I've given pointers below on where your advice would not actually make more money.

The problem here, is that you have an opinion and are unwilling to accept that opinion is wrong. Which in my opinion, it is. Not scary, just an opinion that is different to yours. You are also unwilling to fix your perennial under voltage problem, which would make all the messages go away, for reasons which are still unclear.

We have now added rate limiting to the messages. This limits them to three messages every 5 minutes. If your log is STILL filling up with messages FIX THE DAMN POWER SUPPLY! IT IS INADEQUATE! I cannot understand what is so difficult to understand about that.

Points in reply to some of the above, without actually giving away what the Pi4 will actually have on it.

USB standards. The SoC has an inbuilt USB device, which sadly is a bit crap, and we cannot do anything about that, but with the ARM FIQ we have made it work pretty well. The hub chip, which also provides the Ethernet, is USB compliant.

There is no point in providing an SD card, power supply, cable etc., because then every purchaser of a Pi would get a full set every time, and not everyone wants that. It would also make the headline price too expensive. Those are all terrible ideas. You can buy kits with all of those in, though...

The SoC also contains a decent sound system that outputs via HDMI, which for the vast majority of people is fine. There is, again, no point in making the Pi more expensive for everyone with a feature that only a few people use. You have fallen into the same trap as in your thinking on the logging: thinking your use case is the important one, when in fact you are just one of many millions of users. The needs of the many outweigh the needs of the few.

As for saving money - the point of logging in dmesg for low power is there for exactly that - to reduce support issues!

As for money for marketing, development, production - we have plenty.

@E3V3A

Author

E3V3A commented Apr 26, 2018

So I'm driving down the road in my 2 year old BMW. It is far from their top-of-the-line performance models, but neither is it the low-end one. It's a great car and it's been taking me back and forth to work, to weekend outings, and on occasional long tours, for 2 years. Since I do not intend to take it on top-speed daily rushes across Europe on the autobahn, it has served me perfectly and done a great job so far.

However, one day the service light comes on, indicating that it is time to take the car in for the 1000 mile service checkup. So, since winter is just about to roll around, I decide to take it in for a pre-autumn service. I do so, and the mechanic says all is in order, but that he has updated the system and reset the service indicator. All good.

Since the roads up here can get icy, I decide to put back on the winter tires I used last year, and after I do that, suddenly the critical Engine emergency warning light comes on! I'm confused, since I just got the car back from service! I call up the mechanic and tell him about my problem. He asks if I made any changes since visiting him last week. I say, "Not really, I just put on the winter tires I used last year." He says "Oh! What kind are they?", so I answer, "Well, I found these standard Bridgestone winter tires, and decided I did not need to spend the extra money on those custom BMW high-performance winter tires you sell." He answers: "Oh, that explains it! You are using shitty tires, and we just implemented a detection and warning system in the car that detects if you are not using our custom tires." So I say, "Ah, ok. Thank you for explaining the situation, but I'll keep the tires I already have for another season, since they worked just fine last year."

The next day, I'm driving down the highway, trying to overtake a slower truck. I speed up to 1200 Hm/h [1200 hectometers = 120 km], and just as I'm next to the truck, my engine suddenly stops accelerating and the car drops me to a grinding halt of 600 Hm/h. At that exact instant the interior compartment light starts blinking every few seconds, almost blinding me in the autumn dusk, while the critical engine light is doing the same. I almost have a head-on collision because of this episode, and decide to call my old colleague Mr. Ferrari and explain it to him.

The next day I go to his garage, where he has an OBD2 analyzer. He explains that it is not the car itself, nor the tires, that are the problem, but the new update to the car's software, and that he can disable it. So I decide to disable these dangerous (and now useless) warnings about my non-OEM tires. I go riding off happily into the sunset... or so I thought!

Then I drive into a cloud of mosquitoes, and my windshield is gored up. I turn on the wipers, only to find that one of the wiper blades is a bit worn out. I make a mental note to myself to fix it next time I have a chance. But before I even get to the end of that thought, suddenly, out of the back-seat mid compartment, a jack-in-the-box pops out like a spring-loaded bullet and screams "YOU HAVE SHIT TIRES!! REPLACE THEM AND LISTEN TO ME!" in my right ear. I veer off the road into a dirty field and decide that putting my money into such a company is not worth the tires it stands on. I hitchhike back to the reality of the super-competitive embedded universe of IoT and find myself a Tesla that can use any available tires and power sources, and can run at any current, on any road, at any speed you desire.

Peace at last!


The above excerpt was based on a true story, and will be written in the forthcoming book, "How big little companies become greedy and cocky, and then get replaced."

@JamesH65

Contributor

JamesH65 commented Apr 26, 2018

Cocky? Or just correct? I'm out.

@P33M P33M closed this Apr 26, 2018

@raspberrypi raspberrypi locked as off topic and limited conversation to collaborators Apr 26, 2018
