Official support of Raspberry Pi sensors possible? #10

Closed
graysky2 opened this Issue Apr 7, 2013 · 42 comments

Contributor

graysky2 commented Apr 7, 2013

Is it possible for you to include support for the Raspberry Pi in the existing CPU temp and voltage plots?

The raspberry pi firmware comes with a util to generate these data which is by default installed to: /opt/vc/bin/vcgencmd

CPU

% vcgencmd measure_temp
temp=48.5'C

CPU Voltage

% vcgencmd measure_volts core
volt=1.20V

Thanks for the consideration!

Owner

mikaku commented Apr 8, 2013

Are these all the possible values shown?

Contributor

graysky2 commented Apr 8, 2013

Hi Jordi. The output of the vcgencmd measure_temp always looks like that for me. Of course, the temps change. The output of vcgencmd measure_volts core can change if the user is overclocking, but it will be x.xxV where x is a number.
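Since both commands print a single `key=value<unit>` line, the number can be pulled out with plain parameter expansion. This is only a minimal sketch of one way to do it, not what Monitorix does; the function name `vc_value` is made up for illustration.

```shell
# Extract the numeric part from a vcgencmd-style "key=value<unit>" line.
# vc_value is a hypothetical helper name, not part of vcgencmd or Monitorix.
vc_value() {
    v=${1#*=}            # drop everything up to and including the first '='
    v=${v%%[!0-9.-]*}    # drop the unit suffix (first non-numeric character onward)
    printf '%s\n' "$v"
}

vc_value "temp=48.5'C"   # prints 48.5
vc_value "volt=1.20V"    # prints 1.20
```

On a real Pi the argument would come from the tool itself, for example `vc_value "$(vcgencmd measure_temp)"`.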

If it is easier for you, the temp command can also be read directly from the filesystem:

% cat /sys/class/thermal/thermal_zone0/temp
47078

You see here that there is NO decimal place. In this case 47078 = 47.078 'C. Probably only need 1 decimal place.
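The conversion described above is a division by 1000 followed by rounding to one decimal place. A small sketch, assuming `awk` is available; the function name `millideg_to_c` is hypothetical.

```shell
# Convert a millidegree reading (as found in the sysfs temp file) to degrees
# Celsius with one decimal place. millideg_to_c is a hypothetical helper name.
millideg_to_c() {
    LC_ALL=C awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1000 }'
}

millideg_to_c 47078   # prints 47.1

# On a system that exposes the file, read the live value instead.
if [ -r /sys/class/thermal/thermal_zone0/temp ]; then
    millideg_to_c "$(cat /sys/class/thermal/thermal_zone0/temp)"
fi
```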

Does that answer your questions?

Owner

mikaku commented Apr 12, 2013

Hi graysky,

Well, I just wanted to know whether there are other devices in /sys/class/thermal/thermal_zone0/ that can be monitored, or whether the file temp is the only one there.

Another question: is that directory /sys/class/thermal/thermal_zone0/ the same in all Raspberry Pi implementations, or could it change?

Thanks.

Contributor

graysky2 commented Apr 12, 2013

Jordi - No, that is the only device to my knowledge. I believe that is the same for all RPi Linuxes.

Owner

mikaku commented Apr 15, 2013

If there is only the file temp in the /sys/class/thermal/thermal_zone0/ directory, in which directory is the file with the voltage values located?

Also, can you paste the output of the command vcgencmd --help or the list of accepted arguments?
Thanks.

Contributor

graysky2 commented Apr 15, 2013

The voltage values are only accessible through the firmware util vcgencmd, but the temp is available in both places:

  1. cat /sys/class/thermal/thermal_zone0/temp
  2. vcgencmd measure_temp

Interestingly, there is no help or a manpage for vcgencmd but it is well documented on this wiki.

Here are the relevant ones with example output. Please let me know if you need anything else.

Output can fit into existing monitorix graphs

Temperature

vcgencmd measure_temp
temp=47.6'C

Core Voltage

vcgencmd measure_volts core
volt=1.20V

Output does not have a place in existing monitorix graphs but would be cool to have!

ARM Frequency

The CPU frequency is reported in Hz, so you need to divide by 1,000,000 to get MHz:
vcgencmd measure_clock arm
frequency(45)=200000000

Note that this changes depending on whether the system is under load. Above it is idle; below it is doing work.
vcgencmd measure_clock arm
frequency(45)=800000000
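The Hz to MHz step can be done with integer arithmetic once the part before the `=` is stripped. A minimal sketch; `clock_mhz` is a made-up helper name.

```shell
# Turn a "frequency(N)=<Hz>" line into whole MHz.
# clock_mhz is a hypothetical helper name used only for this illustration.
clock_mhz() {
    hz=${1#*=}                   # keep only the value after '='
    echo $((hz / 1000000))       # integer division: Hz -> MHz
}

clock_mhz "frequency(45)=200000000"   # prints 200
clock_mhz "frequency(45)=800000000"   # prints 800
```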

Core Frequency

% vcgencmd measure_clock core
frequency(1)=200000000

Note that this changes depending on whether the system is under load. Above it is idle; below it is doing work.
vcgencmd measure_clock arm
frequency(45)=300000000

SDRAM Voltage

vcgencmd measure_volts sdram_c
volt=1.20V

I/O Voltage

vcgencmd measure_volts sdram_i
volt=1.20V

PHY Voltage

vcgencmd measure_volts sdram_p
volt=1.23V

Note that I am not modifying the voltages on mine but other users who overclock can so not all will be 1.20V.

Owner

mikaku commented Apr 19, 2013

This is really interesting. I think there is enough information to create a new Raspberry Pi sensors graph.
I'll check the wiki and start creating such a graph.

Many thanks!

Contributor

graysky2 commented Apr 19, 2013

Nice, thanks. I don't think there is much else of relevance to include. I think the main ones are CPU temp, ARM frequency, and core voltage. Let me know if you need any testing; I will gladly pull down the devel branch.

Contributor

graysky2 commented Apr 23, 2013

Just for kicks, I installed Raspbian to another SD card and can verify all functionality as described for Arch ARM above. Your RPi graph should work regardless of distro in my opinion.

Owner

mikaku commented May 6, 2013

> Just for kicks, I installed Raspbian to another SD card and can verify all functionality as described for Arch ARM above. Your RPi graph should work regardless of distro in my opinion.

This is indeed great news.

I've finished the new standalone Raspberry Pi graph, which includes up to 9 clock frequencies, up to 3 different temperatures, and up to 6 different voltages. The following is a screenshot:

raspberry
(The values represented are fictional)

Please check the devel branch, download the raspberrypi.pm file, and place it into your /usr/lib/monitorix/ directory. Then edit your /etc/monitorix.conf and add the new options accordingly.

And let me know if you find any issue or something needs improvement.

Contributor

graysky2 commented May 6, 2013

Very nice, Jordi. I am testing it out now!

Contributor

graysky2 commented May 7, 2013

Seems to be working but then it just stopped... not just the new pi graph but everything.

odd

% cat /var/log/monitorix
Mon May  6 18:10:03 2013 - Starting Monitorix version 3.1.900 (pid 25090).
Mon May  6 18:10:04 2013 - fs::fs_init: Unable to detect the device name of '/dev/root'. I/O stats for this filesystem won't be shown in graph.
Mon May  6 18:10:04 2013 - fs::fs_init: Unable to detect the device name of 'myth:/media'. I/O stats for this filesystem won't be shown in graph.
Mon May  6 18:10:05 2013 - Built-in HTTP server pid is '25104'.
HTTPServer: You can connect to your server at http://localhost:9000/
Owner

mikaku commented May 7, 2013

I left mine running all night and it's still working fine right now. I can't imagine what is happening there.

  • how many instances of Monitorix are running?
  • are your .rrd files updating correctly? (check their time stamps).
  • make sure that /var/log/monitorix has not been rotated and that you are not looking at an outdated file.
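The time stamp check in the second point can be scripted. The sketch below lists .rrd files that have not been modified for more than a given number of minutes. It assumes a `find` that supports `-maxdepth` and `-mmin` (GNU and BSD do), and `stale_rrds` is a made-up name.

```shell
# List .rrd files in a directory whose modification time is older than
# roughly N minutes. stale_rrds is a hypothetical helper; -maxdepth and
# -mmin are GNU/BSD find extensions, not POSIX.
stale_rrds() {
    dir=$1
    minutes=$2
    find "$dir" -maxdepth 1 -name '*.rrd' -mmin +"$minutes"
}

# Example: anything not updated in the last 5 minutes is suspicious.
if [ -d /var/lib/monitorix ]; then
    stale_rrds /var/lib/monitorix 5
fi
```

An empty result means every .rrd file was touched recently.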

Regarding the fs error messages, it looks like you have defined some mount points that do not exist on your system (Monitorix comes with the /boot mount point predefined).

Alternatively, paste your current <fs> section from your /etc/monitorix.conf and also the output of the df -P command.

Thanks.

Contributor

graysky2 commented May 7, 2013

  • Just one
  • Dunno, next time this happens I will check.
  • Yes, no rotation happened.

I have seen this happen before, but forgot to follow up. Next time it happens I will open a new ticket with logs and info.

The fs error messages are probably due to the fact that one of the mounts is only there if the NAS is up. I am not concerned about that.

Now, since a reboot, it has been logging away nicely. The pi graphs are very nice by the way:
again

Owner

mikaku commented May 8, 2013

> Now, since a reboot, it has been logging away nicely. The pi graphs are very nice by the way:

Glad to hear that is working again.
And yeah, the graphs look very nice! :)

Please, don't forget to send me a complete daily graph to include it in the Screenshot section of the Monitorix web site.

Contributor

graysky2 commented May 8, 2013

Glad to... man, it happened again:
fuck

Here are the three things you asked about:

% ps aux | grep monitorix
root       258  0.1  1.9  45260  9080 ?        Ss   May07   3:51 /usr/bin/monitorix -c /etc/monitorix.conf -p /run/monitorix.pid
nobody     293  0.0  1.7  46188  8396 ?        Ss   May07   0:04 monitorix-httpd listening on 9000

As you can see, the rrd files are indeed not getting updated:

% ls -l /var/lib/monitorix/
total 34468
drwxr-xr-x 2 root root      160 May  6 17:41 reports
-rw-r--r-- 1 root root  2253952 May  7 21:10 fs.rrd
-rw-r--r-- 1 root root 24022976 May  7 21:10 int.rrd
-rw-r--r-- 1 root root  1690960 May  7 21:11 kern.rrd
-rw-r--r-- 1 root root  5631904 May  7 21:10 net.rrd
-rw-r--r-- 1 root root  1690960 May  7 21:10 raspberrypi.rrd

When I inspect the log now, it is full of errors:

% head -n 30 /var/log/monitorix
...
Tue May  7 04:17:00 2013 - Starting Monitorix version 3.1.900 (pid 258).
Tue May  7 04:17:03 2013 - fs::fs_init: Unable to detect the device name of '/dev/root'. I/O stats for this filesystem won't be shown in graph.
Tue May  7 04:17:03 2013 - fs::fs_init: Unable to detect the device name of ''. I/O stats for this filesystem won't be shown in graph.
Tue May  7 04:17:06 2013 - Built-in HTTP server pid is '293'.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
Argument "mmcblk0p1" isn't numeric in multiplication (*) at /usr/lib/monitorix/fs.pm line 374.
...

I think the problem occurs when the NAS goes down as it did last night around 21:00. Here is a little script I wrote that checks to see if the NAS is up and if it is, to mount the NFS export that the RPi uses:

#!/bin/bash

SERVER_EXPORT='NAS:/media'
MOUNT_TARGET='/mnt/media'

# Nothing to do if user does not have requisite binaries.
[[ -z $(which ping) ]] && echo 'Install iputils or whatever package provides ping' && exit 0
[[ -z $(which mountpoint) ]] && echo 'Install util-linux or whatever package provides mountpoint' && exit 0

ping -c 1 NAS &>/dev/null
if [ $? -ne 0 ]; then
    # server is down so unmount
    #
    # if we query the mount point and it was previously mounted, the script freezes
    # so just unmount forcing while lazy
    umount -l -f $MOUNT_TARGET &>/dev/null
else
    # server is up
    #
    # check if mount point is live and try to mount if not
    mountpoint -q $MOUNT_TARGET || mount -t nfs4 $SERVER_EXPORT $MOUNT_TARGET
fi

Can you speculate why monitorix would behave this way when the ping fails and the umount step is called? I will remove /mnt/media from the FS graph and see if it remains up.

Contributor

graysky2 commented May 9, 2013

I just emailed you the screenshots you requested. I will continue running v3.1.92 (dev) on the rpi. No errors to report after 24 h.

...should I open a new ticket for the crash I reported above that happens if users remove a mount point that is being monitored?

Owner

mikaku commented May 14, 2013

Looks like you are including a network filesystem in the fs graph, and when the NAS goes down Monitorix hangs on the df command.

So, try removing that mount point from the list in <fs> and see if your unexpected hangs are gone.
Let me know.
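To confirm that df is what hangs, it can be run under a time limit from the shell. This is only a diagnostic sketch using the `timeout` utility from GNU coreutils, not how Monitorix itself handles the problem. `run_limited` is a made-up name, and a process stuck in uninterruptible I/O may not react to the default signal.

```shell
# Run a command with a time limit and report when the limit was hit.
# run_limited is a hypothetical helper; it relies on GNU coreutils `timeout`,
# which exits with status 124 when the command is terminated for running too long.
run_limited() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example: give df at most 15 seconds on the root filesystem.
run_limited 15 df -P / >/dev/null || echo "df failed or timed out"
```

Replacing `/` with the NFS mount point while the NAS is down should show whether df returns within the limit.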

Contributor

graysky2 commented May 14, 2013

Yes, removing it from <fs> fixes the hangs. Can you think of a solution to this? Most people will never shut their Raspberry Pi machines off (1-2 Watts of power consumption), and most people running them will use NFS or Samba shares since these PCs have little storage (just an SD card). My point is that this will affect other users too :)

Owner

mikaku commented May 15, 2013

This is a known problem that is still pending on my TODO list.
I'll try to give it a push in the next version.

Contributor

graysky2 commented May 15, 2013

Cool, just wanted to make sure you knew about it.

mikaku added a commit that referenced this issue Jun 4, 2013

Reimplemented the main loop with the sighandler alarm inside in order…
… to be able to control timeouts in the 'disk' graph. This should avoid a complete freeze if the network goes down when monitoring NFS filesystems. [#10]
Owner

mikaku commented Jun 4, 2013

graysky, I've included a modification to avoid these hangs when the network goes down while monitoring NFS file systems.

Please, check the devel branch and let me know if this fixed that issue.
Thanks.

@ghost ghost assigned mikaku Jun 4, 2013

Contributor

graysky2 commented Jun 4, 2013

Ouch... I do not use the nfs share in /etc/fstab anymore because it causes hangs. These hangs are bad for the system in general and go beyond the annoyance of monitorix graphs breaking. I use a trivial little shell script to ping the server once/min and mount if up then umount if down.


#!/bin/bash

SERVER_EXPORT='10.1.10.101:/share'
MOUNT_TARGET='/mnt/share'

# Nothing to do if user does not have requisite binaries.
[[ -z $(which ping) ]] && echo 'Install iputils or whatever package provides ping' && exit 0
[[ -z $(which mountpoint) ]] && echo 'Install util-linux or whatever package provides mountpoint' && exit 0

ping -c 1 myth &>/dev/null
if [ $? -ne 0 ]; then
    # server is down so unmount
    #
    # if we query the mount point and it was previously mounted, the script freezes
    # so just unmount forcing while lazy
    umount -l -f $MOUNT_TARGET &>/dev/null
else
    # server is up
    #
    # check if mount point is live and try to mount if not
    mountpoint -q $MOUNT_TARGET || mount -t nfs4 $SERVER_EXPORT $MOUNT_TARGET
fi
Owner

mikaku commented Jun 5, 2013

Ok, anyway the new modification seems to be working fine here.
Feel free to use it!

Contributor

graysky2 commented Jun 5, 2013

I will disable my script and mount via /etc/fstab over the weekend as well as bring down the NAS and see if your modifications work as expected. Just can't do it until then.

Contributor

graysky2 commented Jun 5, 2013

...actually, wait a sec. I have my production http server that is running 3.2.1 right now (not the RPi). I will clone and install it now on that box.

Contributor

graysky2 commented Jun 5, 2013

OK... devel is running on the server. I added a quick samba share to /mnt/foo and am letting monitorix log for an hour or so, I will then umount /mnt/foo and see what happens to the fsusage graph.

Owner

mikaku commented Jun 6, 2013

Keep in mind that unmounting a filesystem might not be the same as the network going down on an NFS filesystem.

So it would be better to test it with your original settings, using an NFS filesystem.
Anyway, let me know how it works.

Contributor

graysky2 commented Jun 6, 2013

OK. I added /mnt/share (which is served up from a VM which I will bring up and down over the course of the day). We will see how the graphs are affected and if your code fixes the problem.

Owner

mikaku commented Jun 6, 2013

Ok, I'll cross my fingers!

Contributor

graysky2 commented Jun 6, 2013

OK, it does not work. Here I started with the NAS up and mounted via an /etc/fstab entry. After a few min, I shut down the NAS and all the graphs broke.

broke

Owner

mikaku commented Jun 6, 2013

Make sure you are testing it using the devel branch, and also can you, please, paste the current /var/log/monitorix?

Thanks.

Contributor

graysky2 commented Jun 6, 2013

I was on devel when I downloaded the src zip file. If I look at usr/lib/monitorix/fs.pm for example, it matches the one in e27dad0

  • 13:12: I did a fresh install of monitorix-devel from the commit above and started monitorix fresh with the mount /mnt/share up and running.
  • 13:22: I shut down the NAS.
  • 13:32: I started the NAS.

Here is the log you asked for and note that even though line 5 says that nas:/media is not shown in graph, it is in the graph (it is mounted as /mnt/share).

Thu Jun  6 13:12:27 2013 - Starting Monitorix version 3.2.1 (pid 625).
Thu Jun  6 13:12:27 2013 - Creating '/var/lib/monitorix/system.rrd' file.
Thu Jun  6 13:12:27 2013 - Creating '/var/lib/monitorix/proc.rrd' file.
Thu Jun  6 13:12:27 2013 - Creating '/var/lib/monitorix/fs.rrd' file.
Thu Jun  6 13:12:28 2013 - fs::fs_init: Unable to detect the device name of 'nas:/media'. I/O stats for this filesystem won't be shown in graph.
Thu Jun  6 13:12:28 2013 - Creating '/var/lib/monitorix/net.rrd' file.
Thu Jun  6 13:12:28 2013 - Creating '/var/lib/monitorix/user.rrd' file.
Thu Jun  6 13:12:28 2013 - Built-in HTTP server pid is '643'.
HTTPServer: You can connect to your server at http://localhost:8080/

Screenshot of several graphs; all of them broke once the NAS went down:
broke2

Here it is zoomed:
fs01z 1day

mikaku added a commit that referenced this issue Jun 7, 2013

Reimplemented the main loop with the sighandler alarm inside in order…
… to be able to control timeouts in the 'disk' graph. This should avoid a complete freeze if the network goes down when monitoring NFS filesystems. [#10]
Owner

mikaku commented Jun 7, 2013

You're right, I forgot to push the last commit, which implements the whole new sighandler mechanism.
Please download the whole devel branch again, or just the monitorix file, which contains the pending commit.

Thanks, and sorry for the inconvenience.

Contributor

graysky2 commented Jun 7, 2013

OK. I pulled and built fe4b61b and am testing now.

Contributor

graysky2 commented Jun 7, 2013

Seems to be working...
good

Owner

mikaku commented Jun 7, 2013

Great!
The /var/log/monitorix file should reflect the timeouts of the NFS filesystems.

Contributor

graysky2 commented Jun 7, 2013

Sure does....

Fri Jun  7 15:58:16 2013 - Starting Monitorix version 3.2.1 (pid 802).
Fri Jun  7 15:58:16 2013 - Creating '/var/lib/monitorix/system.rrd' file.
Fri Jun  7 15:58:16 2013 - Creating '/var/lib/monitorix/proc.rrd' file.
Fri Jun  7 15:58:16 2013 - Creating '/var/lib/monitorix/fs.rrd' file.
Fri Jun  7 15:58:16 2013 - fs::fs_init: Unable to detect the device name of 'nas:/media'. I/O stats for this filesystem won't be shown in graph.
Fri Jun  7 15:58:17 2013 - Creating '/var/lib/monitorix/net.rrd' file.
Fri Jun  7 15:58:17 2013 - Creating '/var/lib/monitorix/user.rrd' file.
Fri Jun  7 15:58:17 2013 - Built-in HTTP server pid is '820'.
HTTPServer: You can connect to your server at http://localhost:8080/
Fri Jun  7 16:19:15 2013 - fs::fs_update: Timeout! Process with PID 1593 was hung after 15 secs. Killed.
Fri Jun  7 16:20:15 2013 - fs::fs_update: Timeout! Process with PID 1612 was hung after 15 secs. Killed.
Fri Jun  7 16:21:15 2013 - fs::fs_update: Timeout! Process with PID 1652 was hung after 15 secs. Killed.
Fri Jun  7 16:22:15 2013 - fs::fs_update: Timeout! Process with PID 1671 was hung after 15 secs. Killed.
Fri Jun  7 16:23:15 2013 - fs::fs_update: Timeout! Process with PID 1690 was hung after 15 secs. Killed.
Fri Jun  7 16:24:15 2013 - fs::fs_update: Timeout! Process with PID 1730 was hung after 15 secs. Killed.
Fri Jun  7 16:25:15 2013 - fs::fs_update: Timeout! Process with PID 1748 was hung after 15 secs. Killed.
Fri Jun  7 16:26:15 2013 - fs::fs_update: Timeout! Process with PID 1788 was hung after 15 secs. Killed.
Fri Jun  7 16:27:15 2013 - fs::fs_update: Timeout! Process with PID 1806 was hung after 15 secs. Killed.
Fri Jun  7 16:28:15 2013 - fs::fs_update: Timeout! Process with PID 1825 was hung after 15 secs. Killed.
Fri Jun  7 16:29:15 2013 - fs::fs_update: Timeout! Process with PID 1865 was hung after 15 secs. Killed.
Fri Jun  7 16:30:15 2013 - fs::fs_update: Timeout! Process with PID 1883 was hung after 15 secs. Killed.
Fri Jun  7 16:31:15 2013 - fs::fs_update: Timeout! Process with PID 1923 was hung after 15 secs. Killed.
Fri Jun  7 16:32:15 2013 - fs::fs_update: Timeout! Process with PID 1942 was hung after 15 secs. Killed.
Fri Jun  7 16:33:15 2013 - fs::fs_update: Timeout! Process with PID 1960 was hung after 15 secs. Killed.
Fri Jun  7 16:34:15 2013 - fs::fs_update: Timeout! Process with PID 2000 was hung after 15 secs. Killed.
Fri Jun  7 16:35:15 2013 - fs::fs_update: Timeout! Process with PID 2018 was hung after 15 secs. Killed.
Fri Jun  7 16:36:15 2013 - fs::fs_update: Timeout! Process with PID 2058 was hung after 15 secs. Killed.
Fri Jun  7 16:37:15 2013 - fs::fs_update: Timeout! Process with PID 2076 was hung after 15 secs. Killed.
Fri Jun  7 16:38:15 2013 - fs::fs_update: Timeout! Process with PID 2094 was hung after 15 secs. Killed.
Fri Jun  7 16:39:15 2013 - fs::fs_update: Timeout! Process with PID 2112 was hung after 15 secs. Killed.
Fri Jun  7 16:40:15 2013 - fs::fs_update: Timeout! Process with PID 2130 was hung after 15 secs. Killed.
Fri Jun  7 16:41:15 2013 - fs::fs_update: Timeout! Process with PID 2148 was hung after 15 secs. Killed.
Fri Jun  7 16:42:15 2013 - fs::fs_update: Timeout! Process with PID 2166 was hung after 15 secs. Killed.
Fri Jun  7 16:43:15 2013 - fs::fs_update: Timeout! Process with PID 2185 was hung after 15 secs. Killed.
Fri Jun  7 16:44:15 2013 - fs::fs_update: Timeout! Process with PID 2203 was hung after 15 secs. Killed.
Fri Jun  7 16:45:15 2013 - fs::fs_update: Timeout! Process with PID 2221 was hung after 15 secs. Killed.
Fri Jun  7 16:46:15 2013 - fs::fs_update: Timeout! Process with PID 2239 was hung after 15 secs. Killed.
Fri Jun  7 16:47:15 2013 - fs::fs_update: Timeout! Process with PID 2257 was hung after 15 secs. Killed.
Fri Jun  7 16:48:15 2013 - fs::fs_update: Timeout! Process with PID 2275 was hung after 15 secs. Killed.
Fri Jun  7 16:49:15 2013 - fs::fs_update: Timeout! Process with PID 2293 was hung after 15 secs. Killed.
Fri Jun  7 16:50:15 2013 - fs::fs_update: Timeout! Process with PID 2311 was hung after 15 secs. Killed.
Fri Jun  7 16:51:15 2013 - fs::fs_update: Timeout! Process with PID 2329 was hung after 15 secs. Killed.
Fri Jun  7 16:52:15 2013 - fs::fs_update: Timeout! Process with PID 2347 was hung after 15 secs. Killed.
Fri Jun  7 16:53:15 2013 - fs::fs_update: Timeout! Process with PID 2366 was hung after 15 secs. Killed.
Fri Jun  7 16:54:15 2013 - fs::fs_update: Timeout! Process with PID 2384 was hung after 15 secs. Killed.
Fri Jun  7 16:55:15 2013 - fs::fs_update: Timeout! Process with PID 2402 was hung after 15 secs. Killed.
Fri Jun  7 16:56:15 2013 - fs::fs_update: Timeout! Process with PID 2420 was hung after 15 secs. Killed.
Fri Jun  7 16:57:15 2013 - fs::fs_update: Timeout! Process with PID 2438 was hung after 15 secs. Killed.
Fri Jun  7 16:58:15 2013 - fs::fs_update: Timeout! Process with PID 2457 was hung after 15 secs. Killed.
Fri Jun  7 16:59:15 2013 - fs::fs_update: Timeout! Process with PID 2475 was hung after 15 secs. Killed.
Fri Jun  7 17:00:15 2013 - fs::fs_update: Timeout! Process with PID 2572 was hung after 15 secs. Killed.
Fri Jun  7 17:01:15 2013 - fs::fs_update: Timeout! Process with PID 2592 was hung after 15 secs. Killed.
Fri Jun  7 17:02:15 2013 - fs::fs_update: Timeout! Process with PID 2622 was hung after 15 secs. Killed.
Fri Jun  7 17:08:22 2013 - SIGTERM caught.
Fri Jun  7 17:08:23 2013 - Exiting.
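
A long run of timeout entries like the one above is easier to summarize than to read. A small sketch that counts them; `count_fs_timeouts` is a made-up name, and the match string is taken from the log lines shown in this thread.

```shell
# Count the fs_update timeout entries in a Monitorix log file.
# count_fs_timeouts is a hypothetical helper name.
count_fs_timeouts() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function from reporting failure.
    grep -c 'fs::fs_update: Timeout!' "$1" || true
}

if [ -r /var/log/monitorix ]; then
    count_fs_timeouts /var/log/monitorix
fi
```
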
Owner

mikaku commented Jun 7, 2013

Perfect! This proves that it works!
;)

Owner

mikaku commented Jun 7, 2013

I think we can close this issue.

@graysky2 graysky2 closed this Jun 7, 2013

Contributor

graysky2 commented Jun 7, 2013

:p

..good job!

Owner

mikaku commented Jun 8, 2013

Thanks for your feedback!
;)
