
DEVOPS-1973 improve stack health reporting #39

Merged
kkellerlbl merged 36 commits into master from DEVOPS-1973
Jul 23, 2024

@kkellerlbl
Member

The Rancher 1 API doesn't report stack health very granularly. A stack can report as degraded for many legitimate reasons while still working fine (for example, a new service is being spun up just as the health check runs).

These changes track which specific services (if any) are unhealthy when the stack reports unhealthy. If every service is healthy when checked, assume the stack is too and don't raise an alarm. If a service is unhealthy, record it in a SQLite database, and report the stack unhealthy only if some service has been unhealthy for longer than the stack_health_age parameter. This avoids the case where two otherwise healthy services, each unhealthy only briefly, make the entire stack appear unhealthy for longer than that parameter.
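
The tracking described above could look roughly like the sketch below. It is not the code from this PR: the table name `bad_services`, its columns, and the helper names are hypothetical, and `stack_health_age` is assumed to be in seconds.

```python
import sqlite3
import time


def open_db(path=":memory:"):
    """Open (or create) the database that tracks unhealthy services."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bad_services ("
        "service_id TEXT PRIMARY KEY, name TEXT, first_seen REAL)"
    )
    return conn


def record_unhealthy(conn, service_id, name, now=None):
    """Remember when a service was first seen unhealthy."""
    now = time.time() if now is None else now
    # INSERT OR IGNORE keeps the original first_seen on repeat sightings.
    conn.execute(
        "INSERT OR IGNORE INTO bad_services VALUES (?, ?, ?)",
        (service_id, name, now),
    )
    conn.commit()


def clear_healthy(conn, unhealthy_ids):
    """Drop rows for services that are no longer unhealthy."""
    rows = conn.execute("SELECT service_id FROM bad_services").fetchall()
    for (service_id,) in rows:
        if service_id not in unhealthy_ids:
            conn.execute(
                "DELETE FROM bad_services WHERE service_id = ?", (service_id,)
            )
    conn.commit()


def stale_services(conn, stack_health_age, now=None):
    """Return ids of services unhealthy for longer than stack_health_age."""
    now = time.time() if now is None else now
    cutoff = now - stack_health_age
    rows = conn.execute(
        "SELECT service_id FROM bad_services "
        "WHERE first_seen < ? ORDER BY service_id",
        (cutoff,),
    )
    return [row[0] for row in rows]
```

Under these assumptions, the stack would be reported unhealthy only when `stale_services` returns a non-empty list, and passing an empty set to `clear_healthy` would empty the table once Rancher reports the stack healthy.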

This PR also removes some old code.

Create sqlite3 file for storing list of bad services
Fix path to sqlite file
Assume that if Rancher reports the stack is healthy, all services are currently healthy, and delete the table rows.
Add a timestamp to bad service table.
Also add serviceId to sqlite3
If all services in stack are healthy, assume stack is now healthy
scan services if stack health bad
@bio-boris
Contributor

bio-boris commented Jul 22, 2024

Pull Request Review

General Feedback

This script checks the status of Rancher 1.x agents, stacks, and services in the specified environments and creates a dummy service in a given stack if required. The code is generally well-structured but would benefit from better error handling, modularity, and readability.

Specific Suggestions

1. Imports and Dependencies

  • Issue: Multiple imports are unused.
  • Suggestion: Remove unused imports to improve code clarity.

2. Argument Parsing

  • Issue: Argument parsing and configuration file loading are mixed with the main logic.
  • Suggestion: Encapsulate argument parsing and configuration loading in separate functions to enhance modularity.
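
One possible shape for this split is sketched below. The option names `--config` and `--section`, the default file name, and the assumption that the configuration is an INI file read with `configparser` are all illustrative and not taken from the script.

```python
import argparse
import configparser


def parse_args(argv=None):
    """Parse command-line options only; no other side effects."""
    parser = argparse.ArgumentParser(
        description="Check Rancher 1.x agent, stack, and service health"
    )
    parser.add_argument(
        "--config",
        default="rancher_check.ini",  # hypothetical default file name
        help="path to the configuration file",
    )
    parser.add_argument(
        "--section",
        action="append",
        default=None,
        help="config section to process (repeatable; default: all sections)",
    )
    return parser.parse_args(argv)


def load_config(path):
    """Read the configuration file, failing loudly if it cannot be read."""
    config = configparser.ConfigParser()
    if not config.read(path):
        raise FileNotFoundError(f"could not read config file: {path}")
    return config


def select_sections(args, config):
    """Return the section names the main logic should process."""
    return args.section or config.sections()
```

With this layout the main logic only receives an already-parsed `args` and `config`, so each piece can be tested without touching the others.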

3. Error Handling

  • Issue: Error handling is minimal, especially in HTTP requests and JSON parsing.
  • Suggestion: Add robust error handling to manage potential failures in network requests and JSON parsing.
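
A minimal sketch of this kind of handling follows, using only the standard library. The helper name `fetch_json` is hypothetical, the script may well use a different HTTP client, and authentication against the Rancher API is omitted here.

```python
import json
import urllib.error
import urllib.request


def fetch_json(url, timeout=10):
    """Return the decoded JSON document at url, or None on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except (urllib.error.URLError, ValueError, OSError) as err:
        # URLError covers HTTP errors and connection failures;
        # ValueError covers malformed URLs.
        print(f"request to {url} failed: {err}")
        return None

    try:
        return json.loads(body)
    except ValueError as err:
        # json.JSONDecodeError is a subclass of ValueError.
        print(f"response from {url} is not valid JSON: {err}")
        return None
```

Returning `None` keeps the caller simple, but it also hides the reason for the failure; raising a dedicated exception instead would be an equally reasonable choice.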

4. Code Modularity

  • Issue: The process_section function is lengthy and handles multiple responsibilities.
  • Suggestion: Break down process_section into smaller, single-responsibility functions (e.g., check_rancher_agents, monitor_services, test_stack_health, create_dummy_service).

5. Logging

  • Issue: The script prints directly to the console.
  • Suggestion: Use a logging framework to provide better control over logging levels and outputs.
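
With Python's standard `logging` module this could be as small as the sketch below. The logger name `rancher_check` and the `verbose` switch are hypothetical.

```python
import logging


def configure_logging(verbose=False):
    """Set up console logging and return the script's logger."""
    level = logging.DEBUG if verbose else logging.INFO
    logging.basicConfig(
        format="%(asctime)s %(levelname)s %(name)s: %(message)s"
    )
    logger = logging.getLogger("rancher_check")  # hypothetical logger name
    logger.setLevel(level)
    return logger


logger = configure_logging()
logger.info("checking stack %s", "core")
logger.debug("this line is hidden unless verbose=True")
```

Existing `print` calls would then become `logger.info`, `logger.warning`, or `logger.error` calls, depending on severity.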

6. Configuration Management

  • Issue: Configuration parameters are scattered and not validated.
  • Suggestion: Centralize configuration management and validate parameters before use.
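
As a rough sketch, validation could happen once per config section before any check runs. Only `stack_health_age` comes from this PR; the `rancher_url` key, the `SectionConfig` class, and the rule that the age must be a non-negative integer are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionConfig:
    rancher_url: str        # hypothetical key
    stack_health_age: int   # parameter named in this PR; unit assumed


REQUIRED_KEYS = ("rancher_url", "stack_health_age")


def validate_section(name, raw):
    """Turn one raw config section (any mapping of strings) into SectionConfig."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"section {name}: missing {', '.join(missing)}")

    try:
        age = int(raw["stack_health_age"])
    except ValueError:
        raise ValueError(
            f"section {name}: stack_health_age must be an integer"
        ) from None
    if age < 0:
        raise ValueError(f"section {name}: stack_health_age must be >= 0")

    return SectionConfig(
        rancher_url=raw["rancher_url"].rstrip("/"),
        stack_health_age=age,
    )
```

The rest of the script would then read settings from the returned object rather than from the raw section, so a typo in the config surfaces at startup instead of partway through a check.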

7. Code Style

  • Issue: Inconsistent code style (e.g., mix of single and double quotes, inconsistent indentation).
  • Suggestion: Follow PEP 8 style guidelines to ensure consistent and readable code.

@kkellerlbl kkellerlbl merged commit a2851bc into master Jul 23, 2024
@kkellerlbl kkellerlbl deleted the DEVOPS-1973 branch July 23, 2024 19:25