Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -18,13 +18,19 @@ env/
.vscode/
*.swp

# Data
# Data and cache
*.db
metrics.db
.features_cache

# Logs
*.log

# Old/backup files
*.old
*.bak
*.tmp

# OS
.DS_Store
Thumbs.db
119 changes: 119 additions & 0 deletions DISTRIBUTION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
# Distribution Setup Complete

## Summary
Cluster Health Monitor v1.0.0 is now ready for portable ZIP distribution.

## What Was Implemented

### 1. Code Cleanup
- Removed debug print statements from workloads.py
- No emojis or verbose logging in code
- Clean, concise comments throughout

### 2. Feature Detection & Caching
- `monitor/utils/features.py`: Runtime feature detection
- Detects: nvidia-smi, cupy, torch, gpu_benchmark availability
- Results cached in `.features_cache` JSON file
- Fast subsequent loads (no repeated checks)

### 3. Requirements Simplified
- Single `requirements.txt` file
- Core dependencies required
- GPU libraries (cupy/torch) commented as optional
- Setup script prompts for GPU library installation

### 4. PowerShell Setup Script
- `setup.ps1`: Automated Windows setup wizard
- Checks Python 3.8+
- Detects NVIDIA drivers and CUDA version
- Creates virtual environment
- Installs dependencies
- Prompts for CuPy or PyTorch based on CUDA version
- Runs feature detection and caching
- Verifies installation

### 5. Update Mechanism
- CLI: `python health_monitor.py --update`
- Web: "Check for Updates" button in header
- Checks GitHub releases API
- Downloads and applies updates automatically
- Preserves venv, config, and data

### 6. Feature Graying in UI
- `/api/features` endpoint returns cached feature flags
- JavaScript checks features on page load
- Disables benchmark controls if GPU libraries not available
- Visual feedback: opacity 0.5, cursor not-allowed
- Alert message explains missing libraries

### 7. Multi-GPU Support
- Already implemented in gpu.py collector
- Loops through all NVIDIA GPUs via NVML
- Web UI displays all GPUs in grid
- Benchmark supports any GPU (defaults to GPU 0)

### 8. Portable ZIP Distribution
- `package.ps1`: Creates distribution ZIP
- Includes: monitor/, health_monitor.py, config.yaml, requirements.txt, setup.ps1, README.md, LICENSE
- Excludes: venv, __pycache__, .features_cache, *.db
- ~50KB compressed size
- Ready for GitHub releases

### 9. Updated Documentation
- README.md rewritten for ZIP distribution
- Installation: Download → Extract → Run setup.ps1
- Troubleshooting section updated
- Simplified project structure
- Removed development-focused content

## Files Created/Modified

### New Files
- `monitor/utils/features.py` - Feature detection
- `monitor/utils/update.py` - Update mechanism
- `monitor/utils/__init__.py` - Utils module exports
- `setup.ps1` - Windows setup wizard
- `package.ps1` - Distribution packaging script

### Modified Files
- `health_monitor.py` - Added --update flag
- `monitor/api/server.py` - Added /api/features, /api/update/* endpoints
- `monitor/api/templates/index.html` - Update button, feature graying
- `monitor/benchmark/workloads.py` - Removed debug prints
- `requirements.txt` - Simplified to single file
- `README.md` - Complete rewrite for ZIP distribution

### Removed Files
- `requirements-base.txt` - Merged into requirements.txt
- `requirements-gpu.txt` - Merged into requirements.txt
- `setup.py` - No longer using pip package
- `MANIFEST.in` - No longer needed
- `BUILD.md` - Removed
- `CHECKLIST.md` - Removed
- `RELEASE_NOTES.md` - Removed

## Usage

### For End Users
1. Download `cluster-health-monitor-v1.0.0.zip` from releases
2. Extract to desired location
3. Run `setup.ps1` in PowerShell
4. Activate venv and run: `python health_monitor.py monitor --web`
5. Access dashboard at http://localhost:8090

### For Distribution
1. Run `.\package.ps1` to create ZIP
2. Upload `cluster-health-monitor-v1.0.0.zip` to GitHub releases
3. Users download and follow above steps

### For Updates
Users can update via:
- CLI: `python health_monitor.py --update`
- Web: Click "Check for Updates" button

## Next Steps (Future)
- Create GitHub Actions workflow for automated releases
- Add version check on startup (optional notification)
- Multi-platform support (Linux setup.sh)
- Configuration wizard in web UI
- Export/import settings
Loading