Crashes after 2k (or 3k) devices, before update no limit (30k+ devices) #191

Closed
designer2k2 opened this issue Nov 18, 2019 · 16 comments


@designer2k2
Contributor

Hello,
I'm running the Kismet release from the repo on an RPi3 with Kali.
In July this year I set up the whole thing, and after some fiddling with the config I was able to run up to 30k devices before the memory of the RPi3 was full.
The device ran like this over several sessions with no issue.

Then, a couple of days ago, I ran "apt upgrade" and got a newer Kismet version.

Since that day, Kismet stops after either ~2000 (somewhere between 2000 and 2200) or ~3000 (3000 to 3300) devices, with no visible error.

One time I was able to watch the CPU load: it went to 100% for a couple of seconds, then Kismet stopped. Memory at that point was at 25%.

Also, the Kismet databases are mostly around 30 MB, some of them at exactly 30 MB.

I have since changed the SD card and moved the logging location to a USB stick; no change.

Also, I cannot find any logs or entries; Kismet just "disappears", and syslog shows nothing.

How can I help debug this? I have no idea where to start looking.

@kismetwireless
Owner

kismetwireless commented Nov 18, 2019 via email

@kismetwireless
Owner

I'll try to do some tests as well and see if I can get it to ramp up CPU use.

@kismetwireless
Owner

So far the only high-CPU states I've observed are when it's updating the database log for all the devices that have changed; I'll keep monitoring, however.

@kismetwireless
Owner

The biggest changes I can think of that might impact this, since early summer, are the detector for when the SD card can't keep up, and the shortening of the write period to try to compensate for slow SD cards. If you're on a Pi 3 logging to SD, knowing the exact error it exits with would be quite helpful, since it would immediately show whether your SD card is taking too long to flush the db.
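
As a rough illustration of what that detector does (the names below are invented, and this is not the actual Kismet code), the check amounts to timing the flush against the write period:

#include <chrono>
#include <iostream>
#include <thread>

// Stand-in for committing the device-log transaction to the kismetdb file.
static void flush_devices_to_db() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));  // simulated I/O
}

static void timed_log_flush(std::chrono::seconds write_period) {
    const auto start = std::chrono::steady_clock::now();
    flush_devices_to_db();
    const auto elapsed = std::chrono::steady_clock::now() - start;

    // If committing the transaction takes longer than the configured write
    // period, the storage (SD card or USB stick) is falling behind.
    if (elapsed > write_period)
        std::cerr << "database flush exceeded the log write period; "
                     "storage cannot keep up\n";
}

int main() {
    timed_log_flush(std::chrono::seconds(30));
    return 0;
}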

@kismetwireless
Owner

I might have found the (or a) culprit - it wasn't updating the last-logged time, so as more and more devices were added, the amount of work it took to save them increased because it kept re-saving previously logged devices.
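
As a rough sketch of that fix (invented names, not the actual Kismet code): only devices modified since the last flush get re-written, and the last-logged timestamp is advanced afterwards; forgetting that last step means every flush re-saves everything logged so far.

#include <ctime>
#include <vector>

struct tracked_device {
    time_t last_modified = 0;
    // ... other tracked state ...
};

struct device_logger {
    time_t last_logged = 0;   // the timestamp that was not being advanced

    void write_to_db(const tracked_device&) { /* commit one row */ }

    void flush(const std::vector<tracked_device>& devices) {
        for (const auto& dev : devices) {
            if (dev.last_modified > last_logged)
                write_to_db(dev);          // only devices changed since last flush
        }
        last_logged = time(nullptr);       // the missing step: advance the marker
    }
};

int main() {
    device_logger logger;
    std::vector<tracked_device> devices(3);
    logger.flush(devices);
    return 0;
}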

The latest git has a fix in it, and it'll go into tonight's nightlies, if you get a chance to try them tomorrow or to build it yourself.

@designer2k2
Contributor Author

designer2k2 commented Nov 20, 2019

Your idea looks right: the more devices I have, the higher the CPU load gets, at roughly every 1k devices. At 1k the CPU just goes up a bit, at 2k quite a bit, and at 3k it always goes to 100%.

The output on stderr is then this:

Stack trace (most recent call last) in thread 3333:
#8 Object "[0xffffffffffffffff], at 0xffffffffffffffff, in
FATAL - Capture source did not get PING from Kismet for over 15 seconds; shutting down
FATAL - Capture source did not get PING from Kismet for over 15 seconds; shutting down
#7 Object "/lib/aarch64-linux-gnu/libc.so.6, at 0x7f956aa70b, in
#6 Object "/lib/aarch64-linux-gnu/libpthread.so.0, at 0x7f95d0b887, in
#5 Object "/lib/aarch64-linux-gnu/libgomp.so.1, at 0x7f957820b3, in
#4 Object "kismet, at 0x556777e9df, in
#3 Object "/lib/aarch64-linux-gnu/libstdc++.so.6, at 0x7f958f52cb, in __cxa_throw
#2 Object "/lib/aarch64-linux-gnu/libstdc++.so.6, at 0x7f958f4fff, in std::terminate()
#1 Object "/lib/aarch64-linux-gnu/libstdc++.so.6, at 0x7f958f4fab, in
#0 Object "kismet, at 0x5567909633, in TerminationHandler()
FATAL - Capture source did not get PING from Kismet for over 15 seconds; shutting down
FATAL - Capture source did not get PING from Kismet for over 15 seconds; shutting down
debug - seen multiple basicrates?

I made a script that fetches the top 5 CPU consumers during that time:

root 7099 12.2 20.1 1212908 189552 ? Sl 18:42 2:19 kismet
root 370 10.9 0.2 4948 2100 ? Ds 18:06 6:02 /sbin/mount.ntfs /dev/sda1 /media/usb -o rw,sync,noexec,nosuid,nodev,users
root 133 10.9 0.0 0 0 ? R 18:06 6:04 [usb-storage]
root 562 7.8 3.4 124592 32144 ? Ssl 18:06 4:17 /usr/bin/python3 /home/pi/warpigui.py
gpsd 7100 3.1 0.3 7220 3320 ? S<s 18:42 0:35 gpsd /dev/serial0

So yes, Kismet is writing to the USB stick, and that pushes it past the timeout.

I will switch to the nightly build and give it a test in the next few days!

@dragorn
Contributor

dragorn commented Nov 20, 2019 via email

@designer2k2
Contributor Author

I switched to the git version 2019-11-21-21r36edada5-1, and to a VFAT USB stick. The issue remains.

As I've seen this warning in the logs, I made a small pull request to fix a text bug: #192

I've also increased the write delay from 30 to 60; no change in the result.

@dragorn
Contributor

dragorn commented Nov 22, 2019 via email

@designer2k2
Contributor Author

Yes, the NTFS USB stick was not a good idea on my part, but it was an attempt to solve the issue, as I thought my SD card might be the problem.

With the "old" Kismet version from July I did not have this problem.

To clarify: the RPi is running at around 3-5% CPU load, but at roughly every 1.1k devices the CPU load spikes. At ~1.1k the spike reaches maybe 40%, then it returns to 3-5%; at ~2.2k it already hits 100% for a short time, then returns to 3-5% (if no timeout happens); and at around 3.3k devices at the latest it goes to 100% and hits the timeout.

Is there something I can turn off to reduce the load on the file system?

@kismetwireless
Owner

kismetwireless commented Nov 24, 2019 via email

@designer2k2
Contributor Author

OK, I take it the RPi is just not strong enough. Thanks for helping me!

I will then make a script that auto-restarts Kismet and merges the db files for me.

@kismetwireless
Owner

This looks like it could be related to the channel summary code. Why it's happening now and not before, I have no idea. Why it's linked to 1000 devices, also no idea - that makes no sense, but I can replicate SOME CPU jump at 1000 devices, so it looks like something is going on.

Manually forcing the channel calculations to a 10-second interval drops the load, so it's definitely something related to probing the channels in the device list. I'll work on optimizing it more.
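
As an illustration of that optimization (invented names, not the actual Kismet classes), throttling the per-device channel summary to a fixed interval looks roughly like this:

#include <cstddef>
#include <ctime>
#include <map>
#include <string>
#include <vector>

struct tracked_device {
    std::string channel;
};

struct channel_summarizer {
    time_t last_calc = 0;
    std::map<std::string, std::size_t> devices_per_channel;

    // Walk the full device list at most once per 'interval' seconds instead of
    // on every timer tick; with thousands of devices that keeps the periodic
    // spike short instead of pegging the CPU.
    void maybe_recalculate(const std::vector<tracked_device>& devices, time_t now,
                           time_t interval = 10) {
        if (now - last_calc < interval)
            return;

        devices_per_channel.clear();
        for (const auto& dev : devices)
            ++devices_per_channel[dev.channel];

        last_calc = now;
    }
};

int main() {
    channel_summarizer summary;
    std::vector<tracked_device> devices{{"1"}, {"6"}, {"11"}, {"6"}};
    summary.maybe_recalculate(devices, time(nullptr));
    return 0;
}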

@designer2k2
Contributor Author

OK, I was running with kis_log_channel_history=false, but a look at the code shows no use of that parameter: https://github.com/kismetwireless/kismet/search?q=kis_log_channel_history&unscoped_q=kis_log_channel_history

Also, kis_log_channel_history_rate is not used: https://github.com/kismetwireless/kismet/search?q=kis_log_channel_history_rate&type=Code

Is there a way for me to try this?

@kismetwireless
Owner

It looks like at some point in the past months, something may have changed in the gnu::parallel C++ code; I just pushed some changes that remove the use of the parallelization and saw some dramatic improvements in my test data. I'll have to wait until it runs through natural data collection to see if that still holds true.
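
As a generic illustration of that kind of change (not the actual Kismet diff; the device struct below is invented), a __gnu_parallel loop over the device list, which runs on libgomp/OpenMP threads as seen in the stack trace above, gets replaced by a plain serial loop:

#include <algorithm>
#include <vector>
// #include <parallel/algorithm>   // provides __gnu_parallel::for_each (libgomp-backed)

struct tracked_device {
    unsigned packets = 0;
};

void update_devices(std::vector<tracked_device>& devices) {
    // Before: parallelized update of every device
    // __gnu_parallel::for_each(devices.begin(), devices.end(),
    //                          [](tracked_device& d) { ++d.packets; });

    // After: plain serial loop; on a small quad-core board the threading
    // overhead can easily cost more than it saves.
    std::for_each(devices.begin(), devices.end(),
                  [](tracked_device& d) { ++d.packets; });
}

int main() {
    std::vector<tracked_device> devices(3000);
    update_devices(devices);
    return 0;
}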

The missing logging option is a bug, but it won't impact anything except logging size. All it would control is whether it dumps the channel status to the kismetdb - which it doesn't do currently anyhow.

@designer2k2
Contributor Author

I gave it an update today and was able to run 13k devices with no issue. I stopped because my battery was running low...

So this is now back to fully working for me 👍
