Empty Insight graphs from 16.1.16 #1017

Closed
oparoz opened this issue Jun 17, 2016 · 11 comments

Comments

@oparoz
Contributor

oparoz commented Jun 17, 2016

The netflow, flowd and flowd_aggregate services are up, but the Insight page shows empty graphs with the message "No data available".

On the cache tab of the Netflow configuration page, stats clearly show that data is captured.

Looking at system.log, I can see requests made for the stats data and no errors.

flowd.log is several hundred MB in size.

Analysing the log from the shell works.

@oparoz
Contributor Author

oparoz commented Jun 17, 2016

Just saw this in the logs, but I have no idea if it's directly related:

flowd_aggregate.py: flowd aggregate died with message Traceback (most recent call last):
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 145, in run
    aggregate_flowd(do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/flowd_aggregate.py", line 85, in aggregate_flowd
    stream_agg_object.cleanup(do_vacuum)
  File "/usr/local/opnsense/scripts/netflow/lib/aggregate.py", line 281, in cleanup
    self._update_cur.execute('vacuum')
DatabaseError: database disk image is malformed

I don't know which disk image is malformed. The log is fine since info can be extracted. No idea about the state of the various sqlite databases.
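One way to find out which database is affected is to run SQLite's `PRAGMA integrity_check` against each file. The following is a minimal sketch, not part of OPNsense: the function name `check_sqlite_files` and the `*.sqlite` file pattern are assumptions for illustration, and the directory to scan has to be supplied by whoever runs it.

```python
import sqlite3
from pathlib import Path


def check_sqlite_files(directory, pattern="*.sqlite"):
    """Run PRAGMA integrity_check on each matching file.

    Returns a dict mapping file name to "ok", a list of reported
    problems, or "error: ..." if the file cannot be read as a database.
    The "*.sqlite" pattern is an assumption; adjust it to the real names.
    """
    results = {}
    for path in sorted(Path(directory).glob(pattern)):
        # Open read-only so the check never creates or modifies a file.
        uri = path.resolve().as_uri() + "?mode=ro"
        try:
            conn = sqlite3.connect(uri, uri=True)
            try:
                rows = conn.execute("PRAGMA integrity_check").fetchall()
                results[path.name] = "; ".join(str(row[0]) for row in rows)
            finally:
                conn.close()
        except sqlite3.DatabaseError as exc:
            results[path.name] = f"error: {exc}"
    return results


# Example call (the directory is the one mentioned later in this thread):
# for name, status in check_sqlite_files("/var/netflow").items():
#     print(name, status)
```

A healthy database reports `ok`; anything else points at the file worth recreating.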

@fichtner
Member

fichtner commented Jun 17, 2016

Is this for all the time ranges or just the first one(s)? The later ones were always fine for me whatever happened.

I think that #983 will address this, because as long as the netflow capture is still there the database can always be recreated.

Stop flowd_aggregate, clear /var/netflow/*, start flowd_aggregate, wait a bit...

(Make a backup of /var/netflow first if you can)
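The backup-then-clear part of these steps could be scripted roughly as follows. This is only a sketch: `backup_and_clear` is a hypothetical helper, it assumes flowd_aggregate has already been stopped, and it leaves out stopping and starting the service because the thread does not give the commands for that.

```python
import shutil
import time
from pathlib import Path


def backup_and_clear(data_dir, backup_root):
    """Copy data_dir into a timestamped folder under backup_root, then empty data_dir.

    backup_root must be outside data_dir. Returns the backup location.
    """
    data_dir = Path(data_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{data_dir.name}-{stamp}"
    # copytree refuses to overwrite an existing target, so an earlier
    # backup is never silently replaced.
    shutil.copytree(data_dir, target)
    # Remove the contents but keep the directory itself.
    for entry in data_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return target


# Example call, with a hypothetical backup location:
# backup_and_clear("/var/netflow", "/root/netflow-backups")
```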

I'm on this, but haven't been able to reproduce the malformation lately.

@oparoz
Contributor Author

oparoz commented Jun 17, 2016

Both graphs are empty. /var used to be in memory, so that might be what corrupted the DB. I'm going to empty the stats as suggested and let you know.

@fichtner
Member

Wait, there are 10 time ranges... the top ones may be faulty (selected by default), but the later ones should be ok?

@oparoz
Contributor Author

oparoz commented Jun 17, 2016

Yep, I've just tested and the ranges from 7 days ago are OK, so historical data seem to be here. Anything after that is empty.

Since the top one is updated every 30s and more than that has passed, I'm guessing it's not related to a corrupted DB

@fichtner
Member

I suspect that's exactly the problem: the update frequency increases the window of opportunity for bad backups, so the less frequently updated ones end up being fine. At some point all the data in the shortest period database are flushed out and it's ok to write again. But as I said, I have been unable to get this to fail this week, for better or worse. ;)
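For context on the race described here: copying a SQLite file while another process is writing to it can capture a half-written state. One general alternative, which is not what this thread ends up doing, is SQLite's online backup API, available in Python 3.7+ as `Connection.backup`. A minimal sketch, with `snapshot_database` as a hypothetical helper name:

```python
import sqlite3


def snapshot_database(source_path, dest_path):
    """Copy a SQLite database through the online backup API (Python 3.7+).

    Unlike a plain file copy, this goes through SQLite's own locking,
    so the destination is a consistent snapshot even if the source is in use.
    """
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(dest_path)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```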

@oparoz
Contributor Author

oparoz commented Jun 17, 2016

OK, I'll wait for the next update to see if there are any improvements :)

@oparoz oparoz changed the title from "Empty Insight graphs from 16.1.17" to "Empty Insight graphs from 16.1.16" Jun 17, 2016
@oparoz
Contributor Author

oparoz commented Jun 17, 2016

There is definitely something wrong as flowd_aggregate is eating 90% of a CPU.

@fichtner
Member

High CPU may be a sign of a large flowd capture file under /var/log

I'm closing this with the next commit, fixing the race by stopping the service before and restarting it after backup.
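The stop, back up, restart pattern can be sketched as a small context manager. This is an illustration of the idea only, not the code from the referenced commit; `service_paused` and the `stop`/`start` callables are hypothetical and would need to wrap whatever actually controls flowd_aggregate.

```python
from contextlib import contextmanager


@contextmanager
def service_paused(stop, start):
    """Call stop(), run the with-body, then always call start().

    The finally clause restarts the service even if the body raises,
    so a failed backup does not leave the service down.
    """
    stop()
    try:
        yield
    finally:
        start()


# Example use, with hypothetical callables:
# with service_paused(stop_aggregator, start_aggregator):
#     run_backup()
```

Because nothing writes to the databases while the body runs, the backup cannot catch a file mid-update.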

@fichtner fichtner self-assigned this Jul 25, 2016
@fichtner fichtner added this to the 16.7 milestone Jul 25, 2016
@fichtner fichtner added the bug Production bug label Jul 25, 2016
@fichtner
Member

BTW, you can now also clear all the databases from the Reporting: Settings page.

fichtner added a commit that referenced this issue Jul 25, 2016
@oparoz
Contributor Author

oparoz commented Jul 25, 2016

Thank you!
