Official support of Raspberry Pi sensors possible? #10

Closed
graysky2 opened this Issue · 42 comments

2 participants

@graysky2

Is it possible for you to include support for the Raspberry Pi in the existing CPU temp and voltage plots?

The Raspberry Pi firmware comes with a utility that reports these data. By default it is installed at: /opt/vc/bin/vcgencmd

CPU Temperature

% vcgencmd measure_temp
temp=48.5'C

CPU Voltage

% vcgencmd measure_volts core
volt=1.20V

Thanks for the consideration!

@mikaku
Owner

Are these all the possible values shown?

@graysky2

Hi Jordi. The output of vcgencmd measure_temp always looks like that for me; only the temperature value changes. The output of vcgencmd measure_volts core can change if the user is overclocking, but it will always have the form x.xxV, where each x is a digit.

If it is easier for you, the temp command can also be read directly from the filesystem:

% cat /sys/class/thermal/thermal_zone0/temp
47078

Note that there is NO decimal point here: the value is in thousandths of a degree, so 47078 = 47.078 'C. One decimal place is probably enough.
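For illustration, the conversion could be sketched in shell as below. The function name millideg_to_celsius is made up for this example and is not part of Monitorix or the firmware tools; it only assumes the sysfs file holds an integer number of millidegrees, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: convert a millidegree reading (as found in the
# sysfs temp file) to degrees Celsius, rounded to one decimal place.
millideg_to_celsius() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

# Example on a Pi (path taken from the comment above):
#   millideg_to_celsius "$(cat /sys/class/thermal/thermal_zone0/temp)"
```

With the reading shown above, millideg_to_celsius 47078 prints 47.1.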

Does that answer your questions?

@mikaku
Owner

Hi graysky,

Well, I just wanted to know whether there are other devices in /sys/class/thermal/thermal_zone0/ that can be monitored, or whether there is only the file temp there?

Another question: is that directory /sys/class/thermal/thermal_zone0/ the same in all Raspberry Pi implementations, or could it change?

Thanks.

@graysky2

Jordi - No, that is the only device to my knowledge. I believe that is the same for all RPi Linuxes.

@mikaku
Owner

If there is only the file temp in the /sys/class/thermal/thermal_zone0/ directory, in which directory is the file with the voltage values located?

Also, can you paste the output of the command vcgencmd --help or the list of accepted arguments?
Thanks.

@graysky2

The voltage values are only accessible through the firmware util vcgencmd, but the temp is available in both places:

  1. cat /sys/class/thermal/thermal_zone0/temp
  2. vcgencmd measure_temp

Interestingly, there is no help output or man page for vcgencmd, but it is well documented on this wiki.

Here are the relevant ones with example output. Please let me know if you need anything else.

Output can fit into existing monitorix graphs

Temperature

vcgencmd measure_temp
temp=47.6'C

Core Voltage

vcgencmd measure_volts core
volt=1.20V

Output does not have a place in existing monitorix graphs but would be cool to have!

ARM Frequency

The CPU frequency is reported in Hz, so you need to divide by 1,000,000 to get MHz:
vcgencmd measure_clock arm
frequency(45)=200000000

Note that this changes if the system is under load or not. Above it is idle. Below it is doing work.
vcgencmd measure_clock arm
frequency(45)=800000000

Core Frequency

% vcgencmd measure_clock core
frequency(1)=200000000

Note that this changes if the system is under load or not. Above it is idle. Below it is doing work.
vcgencmd measure_clock arm
frequency(45)=300000000

SDRAM Voltage

vcgencmd measure_volts sdram_c
volt=1.20V

I/O Voltage

vcgencmd measure_volts sdram_i
volt=1.20V

PHY Voltage

vcgencmd measure_volts sdram_p
volt=1.23V

Note that I am not modifying the voltages on mine, but other users who overclock can, so not all will be 1.20V.
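As a rough sketch of how these outputs could be turned into plain numbers for graphing, the shell functions below strip the name= prefix and the unit suffix. The names vc_value and vc_clock_mhz are invented for this example, and the parsing assumes the output formats are exactly as pasted above; this is not how Monitorix itself parses them.

```shell
#!/bin/sh
# Hypothetical helper: strip the "name=" prefix and the trailing unit
# from one line of vcgencmd output.
#   "temp=47.6'C" -> 47.6      "volt=1.20V" -> 1.20
vc_value() {
    printf '%s\n' "$1" | sed -e 's/^[^=]*=//' -e "s/'*[A-Za-z]*\$//"
}

# Hypothetical helper: convert "frequency(45)=800000000" (Hz) to whole MHz.
vc_clock_mhz() {
    hz=$(vc_value "$1")
    echo $(( hz / 1000000 ))
}

# Example on a Pi:
#   vc_value "$(vcgencmd measure_temp)"
#   vc_clock_mhz "$(vcgencmd measure_clock arm)"
```

The clock helper uses integer division, which is enough for the round values shown here.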

@mikaku
Owner

This is really interesting. I think there is enough information to create a new Raspberry Pi sensors graph.
I'll check the wiki and start creating such a graph.

Many thanks!

@graysky2

Nice, thanks. I don't think there is much else of relevance to include. I think the main ones are CPU temp, ARM frequency, and core voltage. Let me know if you need any testing; I will gladly pull down the devel branch.

@graysky2

Just for kicks, I installed Raspbian to another SD card and can verify all functionality as described for Arch ARM above. Your RPi graph should work regardless of distro in my opinion.

@mikaku
Owner

> Just for kicks, I installed Raspbian to another SD card and can verify all functionality as described for Arch ARM above. Your RPi graph should work regardless of distro in my opinion.

This is indeed great news.

I've finished the new standalone Raspberry Pi graph, which includes up to 9 clock frequencies, up to 3 different temperatures, and up to 6 different voltages. The following is a screenshot:

[screenshot: raspberry]
(The values represented are fictional)

Please check the devel branch, download the raspberrypi.pm file and place it into your /usr/lib/monitorix/ directory. Then edit your /etc/monitorix.conf and add the new options accordingly.

And let me know if you find any issue or something needs improvement.

@graysky2

Very nice, Jordi. I am testing it out now!

@graysky2

Seems to be working but then it just stopped... not just the new pi graph but everything.

odd

% cat /var/log/monitorix
Mon May  6 18:10:03 2013 - Starting Monitorix version 3.1.900 (pid 25090).
Mon May  6 18:10:04 2013 - fs::fs_init: Unable to detect the device name of '/dev/root'. I/O stats for this filesystem won't be shown in graph.
Mon May  6 18:10:04 2013 - fs::fs_init: Unable to detect the device name of 'myth:/media'. I/O stats for this filesystem won't be shown in graph.
Mon May  6 18:10:05 2013 - Built-in HTTP server pid is '25104'.
HTTPServer: You can connect to your server at http://localhost:9000/
@mikaku
Owner

I left mine running all night and it's still working fine right now. I can't imagine what is happening there.

  • How many instances of Monitorix are running?
  • Are your .rrd files updating correctly? (Check their time stamps.)
  • Make sure that /var/log/monitorix has not been rotated and that you are not looking at an outdated file.
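One possible way to check the time stamps is sketched below. The function name stale_rrds is invented for this example, and it relies on the -mmin test, which GNU, BSD and BusyBox find provide but which is not in POSIX; treat that as an assumption about your system.

```shell
#!/bin/sh
# Hypothetical helper: print every .rrd file in the given directory whose
# modification time is more than the given number of minutes ago.
stale_rrds() {
    dir=$1
    minutes=${2:-5}   # default threshold of 5 minutes is an arbitrary choice
    find "$dir" -name '*.rrd' -mmin "+$minutes"
}

# Example (the directory is the usual Monitorix data directory):
#   stale_rrds /var/lib/monitorix 5
```

If the command prints nothing, every .rrd file was written within the threshold.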

Regarding the fs error messages, it looks like you have defined some mount points that do not exist in your system (Monitorix comes with the /boot mount point predefined).

Alternatively, paste your current <fs> section from your /etc/monitorix.conf and also the output of the df -P command.

Thanks.

@graysky2
  • Just one
  • Dunno, next time this happens I will check.
  • Yes, no rotation happened.

I have seen this happen before, but forgot to follow-up. Next time it happens I will open a new ticket with logs and info.

The fs error messages are probably due to the fact that one of the mounts is only there when the NAS is up. I am not concerned about that.

Now, since a reboot, it has been logging away nicely. The pi graphs are very nice by the way:
[screenshot: again]

@mikaku
Owner

> Now, since a reboot, it has been logging away nicely. The pi graphs are very nice by the way:

Glad to hear that it is working again.
And yeah, the graphs look very nice! :)

Please, don't forget to send me a complete daily graph to include it in the Screenshot section of the Monitorix web site.

@graysky2

Glad to... man, it happened again:
[screenshot]

Here are the three things you asked about:

% ps aux | grep monitorix
root       258  0.1  1.9  45260  9080 ?        Ss   May07   3:51 /usr/bin/monitorix -c /etc/monitorix.conf -p /run/monitorix.pid
nobody     293  0.0  1.7  46188  8396 ?        Ss   May07   0:04 monitorix-httpd listening on 9000

As you can see, the rrd files are indeed not getting updated:

% ls -l /var/lib/monitorix/
total 34468
drwxr-xr-x 2 root root      160 May  6 17:41 reports
-rw-r--r-- 1 root root  2253952 May  7 21:10 fs.rrd
-rw-r--r-- 1 root root 24022976 May  7 21:10 int.rrd
-rw-r--r-- 1 root root  1690960 May  7 21:11 kern.rrd
-rw-r--r-- 1 root root  5631904 May  7 21:10 net.rrd
-rw-r--r-- 1 root root  1690960 May  7 21:10 raspberrypi.rrd

When I inspect the log now, it is full of errors:

% head -n 30 /var/log/monitorix
...
Tue May  7 04:17:00 2013 - Starting Monitorix version 3.1.900 (pid 258).
Tue May  7 04:17:03 2013 - fs::fs_init: Unable to detect the device name of '/dev/root'. I/O stats for this filesystem won't be shown in graph.
Tue May  7 04:17:03 2013 - fs::fs_init: Unable to detect the device name of ''. I/O stats for this filesystem won't be shown in graph.
Tue May  7 04:17:06 2013 - Built-in HTTP server pid is '293'.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
...

I think the problem occurs when the NAS goes down, as it did last night around 21:00. Here is a little script I wrote that checks whether the NAS is up and, if it is, mounts the NFS export that the RPi uses:

#!/bin/bash

SERVER_EXPORT='NAS:/media'
MOUNT_TARGET='/mnt/media'

# Nothing to do if user does not have requisite binaries.
[[ -z $(which ping) ]] && echo 'Install iputils or whatever package provides ping' && exit 0
[[ -z $(which mountpoint) ]] && echo 'Install util-linux or whatever package provides mountpoint' && exit 0

ping -c 1 NAS &>/dev/null
if [ $? -ne 0 ]; then
    # server is down so unmount
    #
    # if we query the mount point and it was previously mounted, the script freezes
    # so just unmount forcing while lazy
    umount -l -f $MOUNT_TARGET &>/dev/null
else
    # server is up
    #
    # check if mount point is live and try to mount if not
    mountpoint -q $MOUNT_TARGET || mount -t nfs4 $SERVER_EXPORT $MOUNT_TARGET
fi

Can you speculate why monitorix would behave this way when the ping fails and the umount step is called? I will remove /mnt/media from the FS graph and see if it remains up.

@graysky2

I just emailed you the screenshots you requested. I will continue running v3.1.92 (dev) on the rpi. No errors to report after 24 h.

...should I open a new ticket for the crash I reported above that happens if users remove a mount point that is being monitored?

@mikaku
Owner

Looks like you are including a network filesystem in the fs graph, and when the NAS goes down Monitorix hangs on the df command.

So, try removing that mount point from the list in <fs> and see if your unexpected hangs are gone.
Let me know.
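To illustrate the general idea of not letting a hung df block everything, here is a minimal shell sketch built on the coreutils timeout command. It is not how Monitorix handles this, and run_with_timeout is a name invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: run a command, but give up after the given number
# of seconds. coreutils timeout exits with status 124 when the time limit
# is reached; otherwise the command's own exit status is returned.
run_with_timeout() {
    secs=$1
    shift
    timeout "$secs" "$@"
}

# Example: query one mount point without waiting forever on a dead NAS
#   run_with_timeout 15 df -P /mnt/media || echo "df failed or timed out"
```

One caveat: timeout sends SIGTERM by default, and a process stuck on an unreachable NFS server may not react to it. GNU timeout has a -k option to follow up with SIGKILL, which may or may not be enough depending on how the share is mounted.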

@graysky2

Yes, removing it from the <fs> list fixes the hangs. Can you think of a solution to this? Most people will never shut their raspberry pi machines off (1-2 Watts of power consumption), and most people running raspberry pi machines will use NFS or Samba shares since these PCs have small storage (just an SD card). The point is that I think this would affect other users too :)

@mikaku
Owner

This is a known problem that is still pending on my TODO list.
I'll try to give it a push in the next version.

@graysky2

Cool, just wanted to make sure you knew about it.

@mikaku mikaku referenced this issue from a commit
@mikaku Reimplemented the main loop with the sighandler alarm inside in order…
… to be able to control timeouts in the 'disk' graph. This should avoid a complete freeze if the network goes down when monitoring NFS filesystems. [#10]
e27dad0
@mikaku
Owner

graysky, I've included a modification to avoid these hangs when the network goes down while monitoring NFS file systems.

Please, check the devel branch and let me know if this fixed that issue.
Thanks.

@mikaku mikaku was assigned
@graysky2

Ouch... I do not use the nfs share in /etc/fstab anymore because it causes hangs. These hangs are bad for the system in general and go beyond the annoyance of monitorix graphs breaking. I use a trivial little shell script to ping the server once/min and mount if up then umount if down.


SERVER_EXPORT='10.1.10.101:/share'
MOUNT_TARGET='/mnt/share'

# Nothing to do if user does not have requisite binaries.
[[ -z $(which ping) ]] && echo 'Install iputils or whatever package provides ping' && exit 0
[[ -z $(which mountpoint) ]] && echo 'Install util-linux or whatever package provides mountpoint' && exit 0

ping -c 1 myth &>/dev/null
if [ $? -ne 0 ]; then
    # server is down so unmount
    #
    # if we query the mount point and it was previously mounted, the script freezes
    # so just unmount forcing while lazy
    umount -l -f $MOUNT_TARGET &>/dev/null
else
    # server is up
    #
    # check if mount point is live and try to mount if not
    mountpoint -q $MOUNT_TARGET || mount -t nfs4 $SERVER_EXPORT $MOUNT_TARGET
fi
@mikaku
Owner

Ok, anyway the new modification seems to be working fine here.
Feel free to use it!

@graysky2

I will disable my script and mount via /etc/fstab over the weekend as well as bring down the NAS and see if your modifications work as expected. Just can't do it until then.

@graysky2

...actually, wait a sec. I have my production http server that is running 3.2.1 right now (not the RPi). I will clone and install it now on that box.

@graysky2

OK... devel is running on the server. I added a quick samba share to /mnt/foo and am letting monitorix log for an hour or so, I will then umount /mnt/foo and see what happens to the fsusage graph.

@mikaku
Owner

Keep in mind that unmounting a filesystem might not be the same as the network going down on an NFS filesystem.

So it would be better to test it with your original settings, with an NFS filesystem.
Anyway, let me know how it works.

@graysky2

OK. I added /mnt/share (which is served up from a VM which I will bring up and down over the course of the day). We will see how the graphs are affected and if your code fixes the problem.

@mikaku
Owner

Ok, I'll cross my fingers!

@graysky2

OK, it does not work. Here I started with the NAS up and mounted via an /etc/fstab entry. After a few min, I shut down the NAS and all the graphs broke.

[screenshot: broke]

@mikaku
Owner

Make sure you are testing it using the devel branch. Also, can you please paste the current /var/log/monitorix?

Thanks.

@graysky2

I was on devel when I downloaded the src zip file. If I look at usr/lib/monitorix/fs.pm for example, it matches the one in e27dad0

  • 13:12 I did a fresh install of monitorix-devel from the commit above and started monitorix fresh with the mount /mnt/share up and running.
  • 13:22 I shut down the NAS.
  • 13:32 I started the NAS.

Here is the log you asked for and note that even though line 5 says that nas:/media is not shown in graph, it is in the graph (it is mounted as /mnt/share).

Thu Jun  6 13:12:27 2013 - Starting Monitorix version 3.2.1 (pid 625).
Thu Jun  6 13:12:27 2013 - Creating '/var/lib/monitorix/system.rrd' file.
Thu Jun  6 13:12:27 2013 - Creating '/var/lib/monitorix/proc.rrd' file.
Thu Jun  6 13:12:27 2013 - Creating '/var/lib/monitorix/fs.rrd' file.
Thu Jun  6 13:12:28 2013 - fs::fs_init: Unable to detect the device name of 'nas:/media'. I/O stats for this filesystem won't be shown in graph.
Thu Jun  6 13:12:28 2013 - Creating '/var/lib/monitorix/net.rrd' file.
Thu Jun  6 13:12:28 2013 - Creating '/var/lib/monitorix/user.rrd' file.
Thu Jun  6 13:12:28 2013 - Built-in HTTP server pid is '643'.
HTTPServer: You can connect to your server at http://localhost:8080/

Screenshot of several graphs but all have broken once the NAS went down:
[screenshot: broke2]

Here it is zoomed:
[screenshot: fs01z 1day]

@mikaku mikaku referenced this issue from a commit
@mikaku Reimplemented the main loop with the sighandler alarm inside in order…
… to be able to control timeouts in the 'disk' graph. This should avoid a complete freeze if the network goes down when monitoring NFS filesystems. [#10]
7212580
@mikaku
Owner

You're right, I forgot to push the last commit, which implements the whole new sighandler mechanism.
Please download the whole devel branch again, or just the monitorix file, which contained the pending commit.

Thanks, and sorry for the inconvenience.

@graysky2

OK. I pulled and built fe4b61b and am testing now.

@graysky2

Seems to be working...
[screenshot: good]

@mikaku
Owner

Great!
The /var/log/monitorix file should reflect the timeouts of the NFS filesystems.
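A quick way to check for those entries could look like the sketch below. The function name count_fs_timeouts is made up for this example, and the pattern assumes the timeout lines contain the text "fs::fs_update: Timeout!".

```shell
#!/bin/sh
# Hypothetical helper: count the fs::fs_update timeout entries in a
# Monitorix log file. grep -c prints the number of matching lines
# (and prints 0, with a non-zero exit status, when there are none).
count_fs_timeouts() {
    grep -c 'fs::fs_update: Timeout!' "$1"
}

# Example:
#   count_fs_timeouts /var/log/monitorix
```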

@graysky2

Sure does....

Fri Jun  7 15:58:16 2013 - Starting Monitorix version 3.2.1 (pid 802).
Fri Jun  7 15:58:16 2013 - Creating '/var/lib/monitorix/system.rrd' file.
Fri Jun  7 15:58:16 2013 - Creating '/var/lib/monitorix/proc.rrd' file.
Fri Jun  7 15:58:16 2013 - Creating '/var/lib/monitorix/fs.rrd' file.
Fri Jun  7 15:58:16 2013 - fs::fs_init: Unable to detect the device name of 'nas:/media'. I/O stats for this filesystem won't be shown in graph.
Fri Jun  7 15:58:17 2013 - Creating '/var/lib/monitorix/net.rrd' file.
Fri Jun  7 15:58:17 2013 - Creating '/var/lib/monitorix/user.rrd' file.
Fri Jun  7 15:58:17 2013 - Built-in HTTP server pid is '820'.
HTTPServer: You can connect to your server at http://localhost:8080/
Fri Jun  7 16:19:15 2013 - fs::fs_update: Timeout! Process with PID 1593 was hung after 15 secs. Killed.
Fri Jun  7 16:20:15 2013 - fs::fs_update: Timeout! Process with PID 1612 was hung after 15 secs. Killed.
Fri Jun  7 16:21:15 2013 - fs::fs_update: Timeout! Process with PID 1652 was hung after 15 secs. Killed.
Fri Jun  7 16:22:15 2013 - fs::fs_update: Timeout! Process with PID 1671 was hung after 15 secs. Killed.
Fri Jun  7 16:23:15 2013 - fs::fs_update: Timeout! Process with PID 1690 was hung after 15 secs. Killed.
Fri Jun  7 16:24:15 2013 - fs::fs_update: Timeout! Process with PID 1730 was hung after 15 secs. Killed.
Fri Jun  7 16:25:15 2013 - fs::fs_update: Timeout! Process with PID 1748 was hung after 15 secs. Killed.
Fri Jun  7 16:26:15 2013 - fs::fs_update: Timeout! Process with PID 1788 was hung after 15 secs. Killed.
Fri Jun  7 16:27:15 2013 - fs::fs_update: Timeout! Process with PID 1806 was hung after 15 secs. Killed.
Fri Jun  7 16:28:15 2013 - fs::fs_update: Timeout! Process with PID 1825 was hung after 15 secs. Killed.
Fri Jun  7 16:29:15 2013 - fs::fs_update: Timeout! Process with PID 1865 was hung after 15 secs. Killed.
Fri Jun  7 16:30:15 2013 - fs::fs_update: Timeout! Process with PID 1883 was hung after 15 secs. Killed.
Fri Jun  7 16:31:15 2013 - fs::fs_update: Timeout! Process with PID 1923 was hung after 15 secs. Killed.
Fri Jun  7 16:32:15 2013 - fs::fs_update: Timeout! Process with PID 1942 was hung after 15 secs. Killed.
Fri Jun  7 16:33:15 2013 - fs::fs_update: Timeout! Process with PID 1960 was hung after 15 secs. Killed.
Fri Jun  7 16:34:15 2013 - fs::fs_update: Timeout! Process with PID 2000 was hung after 15 secs. Killed.
Fri Jun  7 16:35:15 2013 - fs::fs_update: Timeout! Process with PID 2018 was hung after 15 secs. Killed.
Fri Jun  7 16:36:15 2013 - fs::fs_update: Timeout! Process with PID 2058 was hung after 15 secs. Killed.
Fri Jun  7 16:37:15 2013 - fs::fs_update: Timeout! Process with PID 2076 was hung after 15 secs. Killed.
Fri Jun  7 16:38:15 2013 - fs::fs_update: Timeout! Process with PID 2094 was hung after 15 secs. Killed.
Fri Jun  7 16:39:15 2013 - fs::fs_update: Timeout! Process with PID 2112 was hung after 15 secs. Killed.
Fri Jun  7 16:40:15 2013 - fs::fs_update: Timeout! Process with PID 2130 was hung after 15 secs. Killed.
Fri Jun  7 16:41:15 2013 - fs::fs_update: Timeout! Process with PID 2148 was hung after 15 secs. Killed.
Fri Jun  7 16:42:15 2013 - fs::fs_update: Timeout! Process with PID 2166 was hung after 15 secs. Killed.
Fri Jun  7 16:43:15 2013 - fs::fs_update: Timeout! Process with PID 2185 was hung after 15 secs. Killed.
Fri Jun  7 16:44:15 2013 - fs::fs_update: Timeout! Process with PID 2203 was hung after 15 secs. Killed.
Fri Jun  7 16:45:15 2013 - fs::fs_update: Timeout! Process with PID 2221 was hung after 15 secs. Killed.
Fri Jun  7 16:46:15 2013 - fs::fs_update: Timeout! Process with PID 2239 was hung after 15 secs. Killed.
Fri Jun  7 16:47:15 2013 - fs::fs_update: Timeout! Process with PID 2257 was hung after 15 secs. Killed.
Fri Jun  7 16:48:15 2013 - fs::fs_update: Timeout! Process with PID 2275 was hung after 15 secs. Killed.
Fri Jun  7 16:49:15 2013 - fs::fs_update: Timeout! Process with PID 2293 was hung after 15 secs. Killed.
Fri Jun  7 16:50:15 2013 - fs::fs_update: Timeout! Process with PID 2311 was hung after 15 secs. Killed.
Fri Jun  7 16:51:15 2013 - fs::fs_update: Timeout! Process with PID 2329 was hung after 15 secs. Killed.
Fri Jun  7 16:52:15 2013 - fs::fs_update: Timeout! Process with PID 2347 was hung after 15 secs. Killed.
Fri Jun  7 16:53:15 2013 - fs::fs_update: Timeout! Process with PID 2366 was hung after 15 secs. Killed.
Fri Jun  7 16:54:15 2013 - fs::fs_update: Timeout! Process with PID 2384 was hung after 15 secs. Killed.
Fri Jun  7 16:55:15 2013 - fs::fs_update: Timeout! Process with PID 2402 was hung after 15 secs. Killed.
Fri Jun  7 16:56:15 2013 - fs::fs_update: Timeout! Process with PID 2420 was hung after 15 secs. Killed.
Fri Jun  7 16:57:15 2013 - fs::fs_update: Timeout! Process with PID 2438 was hung after 15 secs. Killed.
Fri Jun  7 16:58:15 2013 - fs::fs_update: Timeout! Process with PID 2457 was hung after 15 secs. Killed.
Fri Jun  7 16:59:15 2013 - fs::fs_update: Timeout! Process with PID 2475 was hung after 15 secs. Killed.
Fri Jun  7 17:00:15 2013 - fs::fs_update: Timeout! Process with PID 2572 was hung after 15 secs. Killed.
Fri Jun  7 17:01:15 2013 - fs::fs_update: Timeout! Process with PID 2592 was hung after 15 secs. Killed.
Fri Jun  7 17:02:15 2013 - fs::fs_update: Timeout! Process with PID 2622 was hung after 15 secs. Killed.
Fri Jun  7 17:08:22 2013 - SIGTERM caught.
Fri Jun  7 17:08:23 2013 - Exiting.
@mikaku
Owner

Perfect! This proves that it works!
;)

@mikaku
Owner

I think we can close this issue.

@graysky2 graysky2 closed this
@graysky2

:p

..good job!

@mikaku
Owner

Thanks for your feedback!
;)
