Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(snmp): Large SNMP improvement #304

Merged
merged 10 commits into from Sep 18, 2017
Merged

fix(snmp): Large SNMP improvement #304

merged 10 commits into from Sep 18, 2017

Conversation

themylogin
Copy link
Contributor

  • Use AgentX instead of pass command to simplify code and improve
    performance
  • Update the structure of MIB file to follow conventions for table
  • Expose zvols state

Tickets: #24402, #25734

* Use AgentX instead of `pass` command to simplify code and improve
  performance
* Update the structure of MIB file to follow conventions for table
* Expose zvols state

Tickets: #24402, #25734
@william-gr
Copy link
Member

Hmm I dont see any special daemon handling in snmp-agent.py

Does agent.start() turns the process into a daemon?

@themylogin
Copy link
Contributor Author

@william-gr I've forgot adding & at the end of /usr/local/bin/python /usr/local/bin/snmp-agent.py >/dev/null 2>&1, will this fix it or something else is also needed?

@william-gr
Copy link
Member

@themylogin Hmm I worry only about management of the process.
Ideally we would have to kill that process if SNMP is disabled/stopped.

@themylogin
Copy link
Contributor Author

@william-gr ok, I'll add that

@william-gr
Copy link
Member

Just for reference, previously this was handled in ix-snmp

{
         kill -TERM  $(cat /var/run/freenas-snmpd.pid) >/dev/null 2>&1
}

@darkfiberiru
Copy link
Contributor

I'm still reviewing code changes. Are we caching ZFS data maybe somewhere from 5 - 30 seconds. That way we're not constantly pulling ZFS data every oid query. From my work with networking devices and other appliances these kind of updates time is very normal for certain OIDs.

I potentially have a customer whose monitoring code is hanging from this but I'm still trying to lock down the issue as only seen it so far. When I started looking over old code it seemed like we are not caching zfs data

@darkfiberiru
Copy link
Contributor

Also do we have documentation stuff flagged to let customers know when we update mibs?
Are we versioning mibs?

I'm available for any snmp discussion if wanted as someone who heavily implemented monitoring systems facing snmp and snmp's quirks in the past. Of course only if helpful.

I much appreciate this work and think it will really benefit users

@themylogin
Copy link
Contributor Author

@darkfiberiru

Are we caching ZFS data maybe somewhere from 5 - 30 seconds

Yes, but current cache timeout is 1 second (if datetime.utcnow() - last_update_at > timedelta(seconds=1):). This can be customizable value if necessary. I have no serious experience with SNMP or ZFS, so I am very open to instructions on how things should work.

When I started looking over old code it seemed like we are not caching zfs data

Old code was starting new process for each OID query, of course, with no cache. Replacing this with SNMP agent approach was primary target of this PR. New code should work much better.

Are we versioning mibs?

No. I had no information if previous SNMP implementation was considered usable and stable. Do we need to support old MIB as well as new one?

@darkfiberiru
Copy link
Contributor

I more meant versioning not for support of old mibs but rather customer easily being able to figure out that there an update. I'm not sure on technical side of this but it may be nice to be able to say idk in 11.1 we have moved to FREENAS-MIB v4 or something similar in release notes. Maybe also a download button for mibs in snmp services tab(just tar the folder) but that may be a pipe dream and not fit current standards.

I have a feeling that 1 second will be a massive improvement. But having it tuneable might not be a bad idea. Some Freenas users go to extremes and very interesting deployments and future proofing that will probably save pain in future.

Overall I'm very excited about this work and thankful as I was going to try and work on a much less clean fix as a proof of concept.

@araujobsd
Copy link
Contributor

Jenkins: retest this please

@themylogin
Copy link
Contributor Author

Jenkins: retest this please

@themylogin
Copy link
Contributor Author

Jenkins: retest this please

2 similar comments
@themylogin
Copy link
Contributor Author

Jenkins: retest this please

@themylogin
Copy link
Contributor Author

Jenkins: retest this please

Random start failures occur when snmp-agent starts before snmpd
@themylogin
Copy link
Contributor Author

Jenkins: retest this please

3 similar comments
@themylogin
Copy link
Contributor Author

Jenkins: retest this please

@kmoore134
Copy link
Contributor

Jenkins: retest this please

@themylogin
Copy link
Contributor Author

Jenkins: retest this please

Copy link
Member

@william-gr william-gr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@themylogin themylogin merged commit cc93375 into master Sep 18, 2017
@themylogin themylogin deleted the FIX-24402 branch September 18, 2017 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants