Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full Python code fusion / refactor and hardening 2nd edition #13188

Merged
merged 53 commits into from
Sep 27, 2021

Conversation

deajan
Copy link
Contributor

@deajan deajan commented Aug 31, 2021

Please give a short description what your pull request is for

This PR is a full refactor of all the python code, which has grown to have many duplicates over time.
It replaces #13090, #13094 and #13125
The main aim is to make the codebase leaner and more maintainable, while hardening the code.
It should have no user visible effect hopefully.

Enhancements:

  • All system calls are now guaranteed to:
    • Have a timeout preventing hanging (for service and wrapper)
    • Always return an exit code
    • Have interpretable results (there's always a result, regardless of OS or script failure)
    • Work regardless of Python versions of subprocess module
  • discovery, service-poller and poller wrapping is now fully logged
  • Retain compatibility with earlier LibreNMS implementations
  • Add an log message when too old Redis is used

Fixes:

  • Removed duplicate code for
    • Database connection functionnality
    • Config fetching
  • Merged discovery, service-poller and poller wrapper code into a single wrapper
  • Linting fixes

Please note

Please read this information carefully. You can run ./lnms dev:check to check your code before submitting.

  • Have you followed our code guidelines?
  • If my Pull Request does some changes/fixes/enhancements in the WebUI, I have inserted a screenshot of it.

Testers

If you would like to test this pull request then please run: ./scripts/github-apply <pr_id>, i.e ./scripts/github-apply 5926
After you are done testing, you can remove the changes with ./scripts/github-remove. If there are schema changes, you can ask on discord how to revert.

@deajan
Copy link
Contributor Author

deajan commented Sep 9, 2021

I've spent another week trying to work down some strange race conditions that only happened on pypy 3.7 (?), where the thread that checked the timeout did not finish in time for the process monitor to be informed that a timeout had happened.
Strange issue that only happened on github actions, unable to replicate locally.
Anyway, I'm pretty sure the subprocess calls work good.

@ottorei Could you update the patch and report back please ?

@deajan
Copy link
Contributor Author

deajan commented Sep 9, 2021

Since the last 3 commits, all tests that succeed now fail with missing PyMySQL and redis python packages, but I didn't change anything related. @murrant could you please help me here ? I'm kinda lost

[EDIT] Seems to be okay now, something strange in the build farm I guess [/EDIT]

@deajan deajan requested a review from murrant September 10, 2021 09:37
@paulierco paulierco mentioned this pull request Sep 13, 2021
2 tasks
@ottorei
Copy link
Contributor

ottorei commented Sep 14, 2021

I've spent another week trying to work down some strange race conditions that only happened on pypy 3.7 (?), where the thread that checked the timeout did not finish in time for the process monitor to be informed that a timeout had happened.
Strange issue that only happened on github actions, unable to replicate locally.
Anyway, I'm pretty sure the subprocess calls work good.

@ottorei Could you update the patch and report back please ?

Sure, I'll update my servers tomorrow with the latest commits.

@murrant murrant added this to the 21.10.0 milestone Sep 14, 2021
@deajan
Copy link
Contributor Author

deajan commented Sep 21, 2021

@ottorei Any news ?

@ottorei
Copy link
Contributor

ottorei commented Sep 22, 2021

@ottorei Any news ?

Sorry for the delay, just updated today. The new code has been running a few hours now without noticeable issues. I will let it run for some time and check the journalctl logs for anything interesting.

@paulierco
Copy link
Contributor

I've been tested this for my dev server with 100 devices and i didn't had any issues.

@murrant
Copy link
Member

murrant commented Sep 24, 2021

@deajan ready to try to merge it?

@deajan
Copy link
Contributor Author

deajan commented Sep 24, 2021

@murrant I'm on the road right now.
If we schedule monday, I can keep an eye on the discord channel. Could this be okay ?

@ottorei
Copy link
Contributor

ottorei commented Sep 25, 2021

@murrant I'm on the road right now.
If we schedule monday, I can keep an eye on the discord channel. Could this be okay ?

I have been running this for the past few days now on a 5 server cluster with ~8000 devices. I checked the journalctl logs of the poller service on multiple servers matching error lines - did not see any noticeable issues other than planned service breaks.

Good job ;)

@deajan
Copy link
Contributor Author

deajan commented Sep 27, 2021

@ottorei Thanks for the feedback

@murrant I'll stick toi the discord this evening (CEST+1) which should be the time you are online hopefully

@murrant murrant merged commit bfa200f into librenms:master Sep 27, 2021
@librenms-bot
Copy link

This pull request has been mentioned on LibreNMS Community. There might be relevant details there:

https://community.librenms.org/t/21-10-0-changelog/17124/1

eskyuu pushed a commit to eskyuu/librenms that referenced this pull request Oct 21, 2021
…s#13188)

* New service/discovery/poller wrapper

* Convert old wrapper scripts to bootstrap loaders for wrapper.py

* Move wrapper.py to LibreNMS module directory

* Reformat files

* File reformatting

* bootstrap files reformatting

* Fusion service and wrapper database connections and get_config_data functions

* Moved subprocess calls to command_runner

* LibreNMS library and __init__ fusion

* Reformat files

* Normalize logging use

* Reformatting code

* Fix missing argument for error log

* Fix refactor typo in DBConfig class

* Add default timeout for config.php data fetching

* distributed discovery should finish with a timestamp instead of an epoch

* Fix docstring inside dict prevents service key to work

* Fix poller insert statement

* Fix service wrapper typo

* Update docstring since we changed function behavior

* Normalize SQL statements

* Convert optparse to argparse

* Revert discovery thread number

* Handle debug logging

* Fix file option typo

* Reformat code

* Add credits to source package

* Rename logs depending on the wrapper type

* Cap max logfile size to 10MB

* Reformat code

* Add exception for Redis < 5.0

* Make sure we always log something from service

* Fix bogus description

* Add an error message on missing config file

* Improve error message when .env file cannot be loaded

* Improve wrapper logging

* Fix cron run may fail when environment path is not set

* Add missing -wrapper suffix for logs

* Conform to prior naming scheme

* Linter fix

* Add inline copy of command_runner

* Another linter fix

* Raise exception after logging

* Updated inline command_runner

* Add command_runner to requirements

* I guess I love linter fixes ;)

* Don't spawn more threads than devices

* Fix typo in log call

* Add exit codes to log on error, add command line to debug log

* Add thread name to error message

* Log errors in end message for easier debugging

* Typo fix

* In love of linting
@deajan deajan mentioned this pull request Oct 22, 2021
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants