openvas finishes task, ospd-openvas keeps looking elsewhere... #259

Closed
wisukind opened this issue May 15, 2020 · 21 comments
Labels: bug (Something isn't working), feedback required (Additional information is required)

Comments

wisukind commented May 15, 2020

OpenVAS 7.0.1
gvm-libs 11.0.1
OSP Server for openvas: 1.0.1
OSP: 1.2
OSPd: 2.0.1
Python 3.6
Ubuntu 18.04 LTS
Redis 4.0.9 with the GVMd-tuned configuration file

Hello

I have a scan running on a fairly large task (3642 IPs, with many dead hosts). When I run this task, openvas is launched by ospd-openvas without problems. Both are located on the same machine.
After some time, openvas finishes scanning the task as it's supposed to:

sd main:MESSAGE:2020-05-14 20h42.23 utc:6675: Test complete
sd main:MESSAGE:2020-05-14 20h42.23 utc:6675: Total time to scan all hosts : 115770 seconds

However, ospd-openvas seems to have lost communication with openvas and gvmd partway through: its last log entry reads 2020-05-13 (while the last openvas log entry is 2020-05-14). No error is logged. The process is still running and loaded:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6669 gvm 20 0 725872 537896 5744 R 87.5 3.2 829:27.94 python3.6
6666 gvm 20 0 375068 178548 5120 S 6.2 1.1 8:30.31 python3.6

On the gvmd side, the task is still running but has been stuck at 7% for more than a day. The problem is not systematic: out of 10 launches it occurs around 5-6 times, and the other 4-5 times the scan finishes successfully.

My ospd-openvas process is still running, in case I can do anything to help investigate what's going on.

Thank you

wisukind (Author)

It's worth noting that this issue occurs on different tasks, using different scanners. It's not linked to a particular installation or platform. It's a general problem that happens on tasks with a lot of possible targets (I would say more than 1500 IPs), probably with many dead hosts.

I would really appreciate it if someone could look into this issue. I can provide all necessary help for real-time debugging.

Thank you

jjnicola (Member)

Hi @wisukind,
Thank you very much for reporting this issue.

I will try to reproduce this issue. Could you give some more information about the scan you are running? It would be nice if you could answer the following questions.

  • Scan config. Is it a Full and Fast scan?
  • Target: Is the target a list of networks or a large list of single hosts? Do the hosts have virtual hosts? Did you enable any option to scan vhosts?
  • Processes: please paste the output of
    ps --forest -aux |grep openvas
    so I can see the number of different processes that are running and whether the host processes are currently running child processes (plugins).

If you see that a host process is locked (the process name is 'openvas: testing ...'), a gdb backtrace would be very helpful if possible (attach to the process ID with gdb's -p option).
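A quick way to capture that non-interactively (the PID 12345 below is only illustrative; use the ID of the stuck openvas host process):

    gdb -batch -p 12345 -ex "thread apply all bt"

This attaches to the process, prints a backtrace of every thread, and detaches.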

Regards,

@bjoernricks bjoernricks added the bug Something isn't working label Jun 11, 2020
@jjnicola jjnicola added the feedback required Additional information is required label Jun 11, 2020

wisukind commented Jun 11, 2020

@jjnicola

Thanks for looking into this. The original task has been killed by now, but I have other tasks in more or less the same situation, on remote scanners, so I'll answer your questions using those.

Scan config: Full & Fast
Target: Networks. No particular options for vhosts enabled.
Processes:

gvm@ov-slave-boc:~$ ps --forest -aux |grep openvas
gvm 23456 0.0 0.0 13072 1036 pts/0 S+ 15:58 0:00 _ grep --color=auto openvas
gvm 11702 0.0 2.0 925564 343360 ? Sl May24 20:54 /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c /opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

gvm 11751 66.6 13.7 2741616 2252568 ? Sl May24 17197:44 _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c /opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

gvm 11757 0.1 1.0 375776 179340 ? Ss May24 49:00 _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c /opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

gvm 11760 33.0 5.0 968512 821672 ? S May24 8516:49 | _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c
/opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

gvm 20507 0.1 1.1 451072 196752 ? Ss May25 50:14 _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c /opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

gvm 20518 25.0 4.9 1027200 816652 ? S May25 6302:20 | _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c /opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

gvm 6159 0.1 1.7 928636 287040 ? Ss Jun03 21:31 _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c /opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

gvm 6164 15.0 2.2 1002232 371180 ? S Jun03 1802:34 | _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c /opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

gvm 28607 0.1 2.0 928892 342516 ? Ss Jun06 13:54 _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c /opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

gvm 28614 7.2 3.2 1105412 529780 ? S Jun06 552:52 _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas-slave.pid -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/clientkey.pem -c /opt/gvm/var/lib/gvm/CA/clientcert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --log-level debug -l /opt/gvm/var/log/gvm/ospd-openvas-slave.log --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/ --stream-timeout 100 --config /opt/gvm/etc/openvas/ospd.conf

All those scans actually finished days ago. Unfortunately I don't see any locked process now; all of them seem more or less busy:

28614 gvm 20 0 1038460 462596 5616 R 56,2 2,8 554:14.44 python3.6
11760 gvm 20 0 584060 437128 5680 R 50,0 2,7 8520:07 python3.6
20518 gvm 20 0 571088 360636 5680 R 50,0 2,2 6305:05 python3.6
11751 gvm 20 0 2738560 2,142g 6768 S 43,8 13,7 17207:14 python3.6

But on the gsad side, those scans are stuck at various percentages (13%, 15% and 70%), while all of them are actually done from the openvas perspective.

ospd-openvas still reports results occasionally for those three tasks, though, so from its perspective the scans are not finished. The period between two result reports is highly erratic; it can sometimes take days until new results are reported to gvmd, up to the point where result reporting stops completely.

It seems there is a bottleneck at the Redis level which prevents ospd-openvas from reporting results progressively; Redis is not very loaded, but it is taking a lot of memory.
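For reference, a quick way to see how much memory Redis is holding (just a rough check; the socket path is the one from my redis.conf):

    redis-cli -s /var/run/redis/redis.sock info memory | grep used_memory_human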

Let me know if you need anything else.

@jjnicola jjnicola removed the feedback required Additional information is required label Jun 12, 2020
jjnicola (Member)

Hi @wisukind,
Thanks for the quick feedback.
Is this happening only with a TLS connection between ospd-openvas and gvmd?
When you start ospd-openvas, you are using the -p and -b options together with the --unix-socket option. In this case, the TLS socket has priority. Is this happening only with this connection type, or is it reproducible with the unix socket too?
I was performing some tests and I was not able to reproduce this issue with unix sockets.
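For comparison, a socket-only invocation (just a sketch built from the options you already pass on your other scanner) would look like:

    ospd-openvas --unix-socket /opt/gvm/var/run/ospd.sock --pid-file /opt/gvm/var/run/ospd-openvas.pid --lock-file-dir /opt/gvm/var/run/ --log-file /opt/gvm/var/log/gvm/ospd-scanner.log --log-level DEBUG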

@jjnicola jjnicola added the feedback required Additional information is required label Jun 12, 2020

wisukind commented Jun 12, 2020

Hi,
It's occurring in both situations. I have another scanner, listening on the unix socket only, on which I had the same problem. The task was stuck at 52% for ages, while openvas did finish. Then suddenly, after at least a week, the scan task rose to 100%. It has now been blocked in that state for days. On the scanner side I see no activity, although the ospd-openvas process is still running (but idle):

gvm@ov-master-eqi:~$ ps --forest -aux | grep openvas
gvm 1179 0.0 0.0 13068 1076 pts/0 S+ 12:48 0:00 _ grep --color=auto openvas
gvm 23114 0.0 0.2 581428 287804 ? Sl May13 33:34 /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas.pid --unix-socket=/opt/gvm/var/run/ospd.sock --log-file /opt/gvm/var/log/gvm/ospd-scanner.log --log-level DEBUG --lock-file-dir /opt/gvm/var/run/
gvm 1966 29.0 0.5 913068 555008 ? Sl May19 10071:41 _ /opt/gvm/bin/ospd-scanner/bin/python3.6 /opt/gvm/bin/ospd-scanner/bin/ospd-openvas --pid-file /opt/gvm/var/run/ospd-openvas.pid --unix-socket=/opt/gvm/var/run/ospd.sock --log-file /opt/gvm/var/log/gvm/ospd-scanner.log --log-level DEBUG --lock-file-dir /opt/gvm/var/run/

jjnicola (Member)

If you have access to Redis, could you tell me how many keys are in use and the value of max databases you have set in redis.conf?

I have:

$ redis-cli -s /tmp/redis.sock
redis /tmp/redis.sock> info keyspace
# Keyspace
db0:keys=1,expires=0,avg_ttl=0
db1:keys=159796,expires=0,avg_ttl=0

redis /tmp/redis.sock> CONFIG GET databases
1) "databases"
2) "1025"

wisukind (Author)

redis /var/run/redis/redis.sock> info keyspace

# Keyspace

db0:keys=1,expires=0,avg_ttl=0
db1:keys=119378,expires=0,avg_ttl=0

redis /var/run/redis/redis.sock> config get databases

1) "databases"
2) "30026"

@jjnicola jjnicola removed the feedback required Additional information is required label Jun 15, 2020

jjnicola commented Jun 15, 2020

Hi @wisukind,

I was testing this with both tls and unix socket types. F&F scan configs against a /20 network (4092 hosts). I ran three scans in parallel. I was not able to reproduce this issue. I will keep an eye on this.

Did you check whether some other messages are being logged in other log files (in /var/log: syslog, messages, debug, kernel)?

Maybe it is some issue related to the number of open file descriptors? You can check with

lsof |grep ospd |wc -l
lsof |grep redis |wc -l
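You could also compare those numbers against the per-process limit of the running ospd-openvas (a sketch; pgrep just picks the first matching PID):

    grep "open files" /proc/$(pgrep -f ospd-openvas | head -n1)/limits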

As I am not able to reproduce this, it would be nice if you could check the ospd-openvas status for each scan with gvm-cli.
Use the following command to check the status of the task on the ospd-openvas side and see whether it matches what you see in the GUI, so we can confirm or rule out a communication issue between ospd and gvmd.

"<get_scans scan_id='TASK-ID' details='0'/>" 

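For example, over the ospd unix socket it could be sent with something like this (a sketch using the socket path from your command line; replace the scan ID):

    gvm-cli --protocol OSP socket --socketpath /opt/gvm/var/run/ospd.sock --xml "<get_scans scan_id='TASK-ID' details='0'/>"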
Regards,


wisukind commented Jun 15, 2020

Hi Juan,

Thanks for your follow-up. Here are my outputs.

root@ov-master-eqi:# lsof |grep ospd |wc -l
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/108/gvfs
Output information may be incomplete.
32
root@ov-master-eqi:# lsof |grep redis |wc -l
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/108/gvfs
Output information may be incomplete.
125

Then

gvm@ov-master-eqi:~$ gvm-cli --timeout -1 --gmp-username admin --gmp-password pass --socketpath /opt/gvm/var/run/gvmd.sock -X "<get_scans scan_id='2b1d4aec-361b-4c5a-9a79-1d4357b4e98c' details='0'/>"
Response Error 400. Bogus command name

At this point I have tasks stuck at 100% for several days. One task stopped without any reason during the night; the only thing I have in the log is:

md manage:WARNING:2020-06-12 09h48.30 UTC:31455: OSP get_osp_scan_status 56274eaa-54f9-49aa-be78-e105dd9273d2: Failed to find scan '56274eaa-54f9-49aa-be78-e105dd9273d2'
md manage:WARNING:2020-06-12 09h48.45 UTC:31455: OSP get_scan 56274eaa-54f9-49aa-be78-e105dd9273d2: Failed to find scan '56274eaa-54f9-49aa-be78-e105dd9273d2'
event task:MESSAGE:2020-06-12 10h03.34 UTC:31455: Status of task Marktheindenfeld, All Networks (5913f75c-ce59-467c-a3ce-548f1da719bf) has changed to Stopped

It may be a different issue, though. But I notice this generally happens after a task has been stuck for a long time.

One difficulty in tracking down issues like this is that the scan ID is not shared between gvmd and ospd-openvas: gvmd uses one ID for a task, and ospd uses another one for the same task. When you have multiple tasks running simultaneously, it's difficult to attribute events to a particular task.

Having said that, if you cannot reproduce the issue at all, it may be linked to the complexity of my targets. I'll split the targets that regularly cause problems into smaller targets and see if that makes any difference.

Thanks


jjnicola commented Jun 15, 2020

gvm-cli is also used for direct communication with ospd, using the OSP protocol instead of GMP. You sent the OSP <get_scans> command to the gvmd (GMP) socket, which is why you are getting the "bogus command" error.
As you run ospd-openvas with the TLS socket, please try something like this, connecting to the right socket:

 gvm-cli --protocol OSP tls --hostname <IP-ADDRESS> --port <PORT> --cafile  <path-to>/gvm/CA/cacert.pem --certfile <path-to>/gvm/CA/clientcert.pem --keyfile <path-to>/gvm/private/CA/clientkey.pem --xml "<get_scans details='0'/>"

wisukind (Author)

There is something wrong. I get a certificate verification error when trying to access ospd via TLS. I tried using the local ospd client and server certificates, but no luck.

I don't understand what's going on; these are the certificates I used with --create-scanner and my scanner works and is validated without issues.

jjnicola (Member)

Hi @wisukind,
As far as I can see, you are starting ospd with the client certs. You should start ospd with the server cert and key, and use the client cert and key on the client side (gvmd or gvm-cli). I also had trouble with the local certificates, so I ended up creating my own by hand. After that, communication between gvm-cli/gvmd and ospd-openvas was successful.
Regarding the original issue, I was not able to reproduce it, so I will wait until you can confirm there is no communication issue between gvmd and ospd. That is, try to connect via gvm-cli and call <get_scans> to be sure that the info shown in the GUI is the same as shown by ospd.
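To see which role a certificate was generated for, one rough check (the path is just an example taken from your ospd command line):

    openssl x509 -in /opt/gvm/var/lib/gvm/CA/clientcert.pem -noout -text | grep -A1 "Extended Key Usage"

If the extension is present, a server certificate lists "TLS Web Server Authentication" there and a client certificate lists "TLS Web Client Authentication".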

Regards.
Juan

@jjnicola jjnicola added the feedback required Additional information is required label Jun 16, 2020

wisukind commented Jun 16, 2020 via email

wisukind (Author)

I forgot to mention: all certificates were generated with gvm-manage-certs -a

jjnicola (Member)

Hi @wisukind !
I created a patch for gvm-manage-certs. PR greenbone/gvmd#1139
The newer GnuTLS version used in the manager is stricter regarding certificate usage. With this patch you should create new certs/keys. Use the server cert/key on the ospd side, and the client cert/key on the client side (gvm-cli and gvmd).
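For illustration, starting ospd-openvas with a server cert/key would then look roughly like this (the servercert.pem/serverkey.pem paths are assumptions; use whatever paths gvm-manage-certs wrote the new files to):

    ospd-openvas -p 9390 -b 0.0.0.0 -k /opt/gvm/var/lib/gvm/private/CA/serverkey.pem -c /opt/gvm/var/lib/gvm/CA/servercert.pem --ca-file /opt/gvm/var/lib/gvm/CA/cacert.pem --unix-socket /opt/gvm/var/run/ospd.sock --lock-file-dir /opt/gvm/var/run/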

jjnicola (Member)

Hi @wisukind,
I realized that you are using the option --stream-timeout 100 for ospd. I am not sure why you are using such a large value here; I would use 1 or 5 at most. I tried to reproduce this issue again with a value of 100, but was still not able to.
Setting a smaller value could probably improve the situation, also for your other reported issue greenbone/gvmd#1061.


wisukind commented Jun 22, 2020 via email


wisukind commented Jul 2, 2020

Hi Jose,

Just an update on this bug: I have another scan in the same situation:

  • Scan task started the 30th of June:
    2020-06-30 16:29:50,168 OSPD - openvas: INFO: (ospd.ospd) 2ee7ef43-2d4a-4402-8743-6b55132c9550: Scan started.
  • Openvas confirms the task finished successfully on July 2nd:
    sd main:MESSAGE:2020-07-02 00h55.39 utc:32766: Test complete
  • Ospd's last returned results are from the 30th of June, a few hours after the scan started:
    2020-06-30 20:05:17,606 OSPD - openvas: DEBUG: (ospd.ospd) Returning 4667 results
    After that time, no more results were returned to gvmd, although gvmd keeps reconnecting to ospd for updates (after a long silence, see below):
2020-06-30 20:05:17,606 OSPD - openvas: DEBUG: (ospd.ospd) Returning 4667 results
2020-07-01 14:23:43,028 OSPD - openvas: DEBUG: (ospd.server) New connection from ('10.194.157.7', 35272)
2020-07-01 14:23:43,120 OSPD - openvas: DEBUG: (ospd.server) New connection from ('10.194.157.7', 35490)
2020-07-01 14:24:04,471 OSPD - openvas: DEBUG: (ospd.server) New connection from ('10.194.157.7', 46487)

In gsad, the scan is stuck at 7%. On the scanner node side, Python is still consuming a lot of CPU, so it's doing something.

As I explained earlier, I can't remotely connect to the slave ospd due to the certificate issue, but hopefully there are other ways to investigate where the bottleneck is?

Please advise.
Thank you !

jjnicola (Member)

Hi @wisukind ,
Is this issue still valid or has it been solved with the new releases? Please feel free to close the issue if fixed.

wisukind (Author)

I think we can close this issue. I haven't seen this problem in more than 6 months.
