Crashes after 2k (or 3k) devices, before update no limit (30k+ devices) #191

Closed
designer2k2 opened this issue Nov 18, 2019 · 16 comments


@designer2k2
Contributor

Hello,
I'm running the Kismet release from the repo on an RPi3 with Kali.
In July this year I set up the whole thing, and after some fiddling with the config I was able to run up to 30k devices before the memory of the RPi3 was full.
The device ran like this over several sessions with no issue.

Then, a couple of days ago, I ran "apt upgrade" and got a newer Kismet version.

Since that day, Kismet stops after either ~2000 (somewhere between 2000 and 2200) or ~3000 (3000 to 3300) devices, with no visible error.

One time I was able to watch the CPU load: it went to 100% for a couple of seconds, then Kismet stopped. Memory at that point was at 25%.

Also, the Kismet databases are mostly around 30 MB, some of them at exactly 30 MB.

I have since changed the SD card and moved the logging location to a USB stick; no change.

Also, I cannot find any logs or entries; Kismet just "disappears", and syslog shows nothing.

How can I help debug this? I have no idea where to start looking.

@kismetwireless
Owner

kismetwireless commented Nov 18, 2019 via email

@kismetwireless
Owner

I'll try to do some tests as well and see if I can get it to ramp up CPU use.

@kismetwireless
Owner

So far the only high-CPU states I've observed are when it's updating the database log for all the devices that have changed; I'll keep monitoring, however.

@kismetwireless
Owner

The biggest changes I can think of that might impact this, since early summer, are the detector for when the SD card can't keep up, and the shortening of the write period to try to compensate for slow SD cards. If you're on a Pi 3 logging to SD, knowing the exact error it exits with would be quite helpful, since it would immediately show whether your SD card is taking too long to flush the db.
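
As a rough illustration of what that detector does (the names below are invented, and this is not the actual Kismet code), the check amounts to timing the flush against the write period:

#include <chrono>
#include <iostream>
#include <thread>

// Stand-in for committing the device-log transaction to the kismetdb file.
static void flush_devices_to_db() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // simulated I/O
}

static void timed_log_flush(std::chrono::seconds write_period) {
    const auto start = std::chrono::steady_clock::now();
    flush_devices_to_db();
    const auto elapsed = std::chrono::steady_clock::now() - start;

    // If committing the transaction takes longer than the configured write
    // period, the storage (SD card or USB stick) is falling behind.
    if (elapsed > write_period)
        std::cerr << "database flush exceeded the log write period; "
                     "storage cannot keep up\n";
}

int main() {
    timed_log_flush(std::chrono::seconds(30));
    return 0;
}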

@kismetwireless
Owner

I might have found the (or a) culprit - it wasn't updating the last-logged time, so as more and more devices were added, the amount of work it took to save them increased because it kept re-saving previously logged devices.
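
As a rough sketch of that fix (invented names, not the actual Kismet code): only devices modified since the last flush get re-written, and the last-logged timestamp is advanced afterwards; forgetting that last step means every flush re-saves everything logged so far.

#include <ctime>
#include <vector>

struct tracked_device {
    time_t last_modified = 0;
    // ... other tracked state ...
};

struct device_logger {
    time_t last_logged = 0;   // the timestamp that was not being advanced

    void write_to_db(const tracked_device&) { /* commit one row */ }

    void flush(const std::vector<tracked_device>& devices) {
        for (const auto& dev : devices) {
            if (dev.last_modified > last_logged)
                write_to_db(dev);          // only devices changed since last flush
        }
        last_logged = time(nullptr);       // the missing step: advance the marker
    }
};

int main() {
    device_logger logger;
    std::vector<tracked_device> devices(3);
    logger.flush(devices);
    return 0;
}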

The latest git has a fix in it, and it'll go into tonight's nightlies, if you get a chance to try them tomorrow or to build it yourself.

@designer2k2
Contributor Author

designer2k2 commented Nov 20, 2019

Your idea looks right: the more devices I have, the higher the CPU load gets, at roughly every 1k devices. At 1k the CPU just goes up a bit, at 2k quite a bit, and at 3k it always goes to 100%.

The output on stderr is then this:

Stack trace (most recent call last) in thread 3333:
#8 Object "[0xffffffffffffffff], at 0xffffffffffffffff, in
FATAL - Capture source did not get PING from Kismet for over 15 seconds; shutting down
FATAL - Capture source did not get PING from Kismet for over 15 seconds; shutting down
#7 Object "/lib/aarch64-linux-gnu/libc.so.6, at 0x7f956aa70b, in
#6 Object "/lib/aarch64-linux-gnu/libpthread.so.0, at 0x7f95d0b887, in
#5 Object "/lib/aarch64-linux-gnu/libgomp.so.1, at 0x7f957820b3, in
#4 Object "kismet, at 0x556777e9df, in
#3 Object "/lib/aarch64-linux-gnu/libstdc++.so.6, at 0x7f958f52cb, in __cxa_throw
#2 Object "/lib/aarch64-linux-gnu/libstdc++.so.6, at 0x7f958f4fff, in std::terminate()
#1 Object "/lib/aarch64-linux-gnu/libstdc++.so.6, at 0x7f958f4fab, in
#0 Object "kismet, at 0x5567909633, in TerminationHandler()
FATAL - Capture source did not get PING from Kismet for over 15 seconds; shutting down
FATAL - Capture source did not get PING from Kismet for over 15 seconds; shutting down
debug - seen multiple basicrates?

I made a script that fetches the top 5 CPU consumers during that time:

root 7099 12.2 20.1 1212908 189552 ? Sl 18:42 2:19 kismet
root 370 10.9 0.2 4948 2100 ? Ds 18:06 6:02 /sbin/mount.ntfs /dev/sda1 /media/usb -o rw,sync,noexec,nosuid,nodev,users
root 133 10.9 0.0 0 0 ? R 18:06 6:04 [usb-storage]
root 562 7.8 3.4 124592 32144 ? Ssl 18:06 4:17 /usr/bin/python3 /home/pi/warpigui.py
gpsd 7100 3.1 0.3 7220 3320 ? S<s 18:42 0:35 gpsd /dev/serial0

So yes, Kismet is writing to the USB stick, and that pushes it past the timeout.

I will switch to the nightly build and give it a test in the next few days!

@dragorn
Contributor

dragorn commented Nov 20, 2019 via email

@designer2k2
Contributor Author

I switched to the git version 2019-11-21-21r36edada5-1, and to a VFAT USB stick. The issue remains.

As I've seen this warning in the logs, I made a small pull request to fix a text bug: #192

I've also increased the write delay from 30 to 60; no change in the result.

@dragorn
Contributor

dragorn commented Nov 22, 2019 via email

@designer2k2
Contributor Author

Yes, the NTFS USB stick was not a good idea on my part, but it was an attempt to solve the issue, as I thought my SD card might be the problem.

With the "old" Kismet version from July I did not have this problem.

To clarify: the RPi is running at around 3-5% CPU load, but at roughly every 1.1k devices the CPU load spikes. At ~1.1k the spike reaches maybe 40%, then it returns to 3-5%; at ~2.2k it already hits 100% for a short time, then returns to 3-5% (if no timeout happens); and at around 3.3k devices at the latest it goes to 100% and hits the timeout.

Is there something I can turn off to reduce the load on the file system?

@kismetwireless
Owner

kismetwireless commented Nov 24, 2019 via email

@designer2k2
Contributor Author

OK, I take it the RPi is just not strong enough. Thanks for helping me!

I will then make a script that auto-restarts Kismet and merges the db files for me.

@kismetwireless
Owner

This looks like it could be related to the channel summary code. Why it's happening now and not before, I have no idea. Why it's linked to 1000 devices, also no idea - that makes no sense, but I can replicate SOME CPU jump at 1000 devices, so it looks like something is going on.

Manually forcing the channel calculations to a 10-second interval drops the load, so it's definitely something related to probing the channels in the device list. I'll work on optimizing it more.
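
As an illustration of that optimization (invented names, not the actual Kismet classes), throttling the per-device channel summary to a fixed interval looks roughly like this:

#include <cstddef>
#include <ctime>
#include <map>
#include <string>
#include <vector>

struct tracked_device {
    std::string channel;
};

struct channel_summarizer {
    time_t last_calc = 0;
    std::map<std::string, std::size_t> devices_per_channel;

    // Walk the full device list at most once per 'interval' seconds instead of
    // on every timer tick; with thousands of devices that keeps the periodic
    // spike short instead of pegging the CPU.
    void maybe_recalculate(const std::vector<tracked_device>& devices, time_t now,
                           time_t interval = 10) {
        if (now - last_calc < interval)
            return;

        devices_per_channel.clear();
        for (const auto& dev : devices)
            ++devices_per_channel[dev.channel];

        last_calc = now;
    }
};

int main() {
    channel_summarizer summary;
    std::vector<tracked_device> devices{{"1"}, {"6"}, {"11"}, {"6"}};
    summary.maybe_recalculate(devices, time(nullptr));
    return 0;
}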

@designer2k2
Contributor Author

OK, I was running with kis_log_channel_history=false, but a look at the code shows no use of that parameter: https://github.com/kismetwireless/kismet/search?q=kis_log_channel_history&unscoped_q=kis_log_channel_history

Also, kis_log_channel_history_rate is not used: https://github.com/kismetwireless/kismet/search?q=kis_log_channel_history_rate&type=Code

Is there a way for me to try this?

@kismetwireless
Owner

It looks like at some point in the past months, something may have changed in the gnu::parallel C++ code; I just pushed some changes that remove the use of the parallelization and saw some dramatic improvements in my test data. I'll have to wait until it runs through natural data collection to see if that still holds true.
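
As a generic illustration of that kind of change (not the actual Kismet diff; the device struct below is invented), a __gnu_parallel loop over the device list, which runs on libgomp/OpenMP threads as seen in the stack trace above, gets replaced by a plain serial loop:

#include <algorithm>
#include <vector>
// #include <parallel/algorithm>   // provides __gnu_parallel::for_each (libgomp-backed)

struct tracked_device {
    unsigned packets = 0;
};

void update_devices(std::vector<tracked_device>& devices) {
    // Before: parallelized update of every device
    // __gnu_parallel::for_each(devices.begin(), devices.end(),
    //                          [](tracked_device& d) { ++d.packets; });

    // After: plain serial loop; on a small quad-core board the threading
    // overhead can easily cost more than it saves.
    std::for_each(devices.begin(), devices.end(),
                  [](tracked_device& d) { ++d.packets; });
}

int main() {
    std::vector<tracked_device> devices(3000);
    update_devices(devices);
    return 0;
}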

The missing logging option is a bug, but it won't impact anything except logging size. All it would control is whether it dumps the channel status to the kismetdb - which it doesn't do currently anyhow.

@designer2k2
Contributor Author

I gave it an update today and was able to run 13k devices with no issue. I stopped because my battery was running low...

So this is now back to fully working for me 👍
