Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch up on missed scans #64

Closed
jvehent opened this issue Dec 4, 2015 · 7 comments · Fixed by #76
Closed

Catch up on missed scans #64

jvehent opened this issue Dec 4, 2015 · 7 comments · Fixed by #76
Assignees

Comments

@jvehent
Copy link
Contributor

jvehent commented Dec 4, 2015

If the scanners crash, they need to pick up incomplete work. One way to do that is to periodically run this query:

select pg_notify('scan_listener', ''||id ) from scans where completion_perc=0;

which will resend scan notifications to target that are still at 0% completion.

@0xdiba
Copy link
Contributor

0xdiba commented Dec 4, 2015

Yeah I also had in mind to do that also for non-acknowledged scans ( may happen in a period when the scanners are redeployed but the api is up and running. )

Both are postgres module internal changes so they will not affect the worker functions.

@0xdiba 0xdiba self-assigned this Dec 4, 2015
@jvehent
Copy link
Contributor Author

jvehent commented Dec 4, 2015

I'm tracking down a bug where all scans are acked, but very few complete:

observatory=> select count(*) from scans where ack='false';
 count 
-------
     0
(1 row)
 completion_perc | count  
-----------------+--------
               0 | 194690
              20 |    443
              40 |      2
             100 |  18520

The scanners are receiving notifications, but scan don't happen. The current load average on the scanners is close to 0%.

@jvehent
Copy link
Contributor Author

jvehent commented Dec 4, 2015

Nevermind that previous comment: I had an issue in my script and was calling the scan API with an empty target. Since the validatedomain function doesn't yet verify the target, the scanner were trying to scan empty targets and failing in an unexpected way. I'm preparing a patch for validate domain now.

This issue remains.

@jvehent
Copy link
Contributor Author

jvehent commented Dec 14, 2015

Now that it's been running for a while, here are some real world stats:

observatory=> select ack, count(*) from scans group by ack;
 ack |  count  
-----+---------
 f   |     281
 t   | 4189855
(2 rows)
observatory=> select completion_perc, count(*) from scans group by completion_perc;
 completion_perc |  count  
-----------------+---------
               0 | 3278345
              20 |    9169
              40 |    1428
             100 |  900753

So it seems like scans gets acknowledge, picked up by a scanner goroutine, but never complete. Do you think limiting the number of scanner in a sync group would help?

@0xdiba
Copy link
Contributor

0xdiba commented Dec 14, 2015

It depends on what the problem, preventing the scan from completing, is.
If it involves concurrent database connections the sync group would help.

Do we have the syslog files of the running containers to check if any errors have been logged?

Regardless of that I am preparing a patch which will catch up on both unacknowledged and half-complete scans and re-queue them.

@0xdiba
Copy link
Contributor

0xdiba commented Dec 14, 2015

check out 0c68439 .
What do you think we should do with the half-complete scans ( 0<completion_perc<100 )?

If we decide to re-queue them after a specific amount of time ( eg 5-6 mins )
we must take care ( delete or verify ) of the trusts and workers' analyses created by them.

@jvehent
Copy link
Contributor Author

jvehent commented Dec 14, 2015

I'd say abandon them. If a scanner starts and does some of the work, but crashed after completion_perc>0, there must be a reason and we should track those in the logs, and/or return feedback to the caller. That's a topic for another issue.

@0xdiba 0xdiba mentioned this issue Dec 14, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants