feat: priority queue implementation #99
Draft
alvin-reyes wants to merge 13 commits into main from feat/priority-queue
Commits
2a34723 feat: priority queue implementation (alvin-reyes)
cfec68f fix the queue test (alvin-reyes)
39d7274 add queue tests (alvin-reyes)
8094122 add docs (alvin-reyes)
e5f7c71 address copilot comments (alvin-reyes)
9968113 close DequeueBlocking properly (alvin-reyes)
895ab42 resolve nitpicks (alvin-reyes)
6f29d58 optimize and simplify DequeueBlocking (alvin-reyes)
61d4c90 clean up (alvin-reyes)
83a255f nit fixes (alvin-reyes)
6140d2b check EnqueueFast close (alvin-reyes)
03f3af9 add worker id list from an endpoint (alvin-reyes)
fdb0ef1 fix test cases (alvin-reyes)
@@ -0,0 +1,193 @@
# Priority Queue System Documentation

## Overview

The tee-worker now implements a priority queue system that processes jobs from specific worker IDs ahead of all other jobs. Jobs from priority worker IDs are served first, and jobs from regular worker IDs are processed whenever the fast queue is empty.

## Architecture

### Components

1. **Dual Queue System**
   - **Fast Queue**: For jobs from priority worker IDs
   - **Slow Queue**: For jobs from regular worker IDs

2. **Priority Manager**
   - Maintains a list of priority worker IDs
   - Fetches updates from an external endpoint
   - Refreshes the list periodically (default: 15 minutes)

3. **Job Router**
   - Routes incoming jobs to appropriate queues based on worker ID
   - Falls back to slow queue if fast queue is full

4. **Worker Processing**
   - Workers always check fast queue first
   - Only process slow queue jobs when fast queue is empty

## Configuration

Configure the priority queue system using these environment variables:

```bash
# Enable/disable priority queue system (default: true)
ENABLE_PRIORITY_QUEUE=true

# Queue sizes
FAST_QUEUE_SIZE=1000  # Max jobs in fast queue (default: 1000)
SLOW_QUEUE_SIZE=5000  # Max jobs in slow queue (default: 5000)

# External endpoint for priority worker list
EXTERNAL_WORKER_ID_PRIORITY_ENDPOINT=https://api.example.com/priority-workers

# Refresh interval in seconds (default: 900 = 15 minutes)
PRIORITY_REFRESH_INTERVAL_SECONDS=900
```

## External Endpoint Format

The external endpoint should return JSON in this format:

```json
{
  "workers": [
    "https://217.28.137.141:50035",
    "https://20.245.90.64:50001",
    "https://40.76.123.136:50042",
    "https://172.214.189.153:18080"
  ]
}
```

**Note**: The system currently uses the full URL as the worker ID. When submitting jobs, use the complete URL as the worker_id to match against the priority list.

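A response in this format could be decoded into a lookup set roughly as follows. The `priorityResponse` type and `parsePriorityList` function are illustrative names, not identifiers taken from the PR. The full URL is kept as the key, matching the note above.

```go
package main

import "encoding/json"

// priorityResponse matches the documented JSON shape.
type priorityResponse struct {
	Workers []string `json:"workers"`
}

// parsePriorityList decodes the endpoint body and returns the worker IDs
// as a set, so duplicates collapse and lookups are constant time.
func parsePriorityList(body []byte) (map[string]struct{}, error) {
	var resp priorityResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	set := make(map[string]struct{}, len(resp.Workers))
	for _, id := range resp.Workers {
		set[id] = struct{}{} // the full URL is the worker ID
	}
	return set, nil
}
```
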
## Job Flow

```
1. Job arrives from a submitter with their worker_id
2. System checks if the submitter's worker_id is in priority list
3. If priority submitter → Route to fast queue
4. If regular submitter → Route to slow queue
5. Tee-worker processes fast queue first, then slow queue
```

**Important**: The priority is based on the job submitter's worker ID, not the tee-worker's own ID. This allows certain job submitters to have their requests processed faster.

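Steps 2 to 4 amount to a set lookup on the submitter's worker ID. The sketch below assumes a hypothetical `PriorityList` type guarded by a read-write mutex, so the lookup stays safe while the list is replaced by a refresh; the PR's actual priority manager may be structured differently.

```go
package main

import "sync"

// PriorityList is a hypothetical thread-safe set of priority worker IDs.
// The zero value is an empty list and is ready to use.
type PriorityList struct {
	mu  sync.RWMutex
	ids map[string]struct{}
}

// Update replaces the whole list in one step.
func (p *PriorityList) Update(ids []string) {
	next := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		next[id] = struct{}{}
	}
	p.mu.Lock()
	p.ids = next
	p.mu.Unlock()
}

// IsPriority reports whether the submitter's worker ID is on the list.
func (p *PriorityList) IsPriority(workerID string) bool {
	p.mu.RLock()
	defer p.mu.RUnlock()
	_, ok := p.ids[workerID]
	return ok
}

// queueFor names the queue a job from submitterID would be routed to.
func queueFor(p *PriorityList, submitterID string) string {
	if p.IsPriority(submitterID) {
		return "fast"
	}
	return "slow"
}
```
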
## API Endpoints

### Queue Statistics
```bash
GET /job/queue/stats
```

Response:
```json
{
  "fast_queue_depth": 10,
  "slow_queue_depth": 45,
  "fast_processed": 1234,
  "slow_processed": 5678,
  "last_update": "2024-01-15T10:30:00Z"
}
```

## Development & Testing

### Using Real Endpoint

To use the actual TEE workers endpoint:
```bash
export EXTERNAL_WORKER_ID_PRIORITY_ENDPOINT="https://tee-api.masa.ai/list-tee-workers"
```

### Using Dummy Data

When no external endpoint is configured or if the endpoint fails, the system falls back to dummy priority worker IDs:
- `worker-001`, `worker-002`, `worker-005`
- `worker-priority-1`, `worker-priority-2`
- `worker-vip-1`
- `worker-high-priority-3`
- `worker-fast-lane-1`

### Disable Priority Queue

To run in legacy mode (single queue):
```bash
ENABLE_PRIORITY_QUEUE=false
```

## Implementation Details

### Files Added/Modified

1. **New Files**:
   - `internal/jobserver/priority_queue.go` - Dual queue implementation
   - `internal/jobserver/priority_manager.go` - Priority worker list management
   - `internal/jobserver/errors.go` - Error definitions

2. **Modified Files**:
   - `internal/jobserver/jobserver.go` - Integration with priority system
   - `internal/jobserver/worker.go` - Priority-aware job processing
   - `api/types/job.go` - Added GetBool helper method
   - `internal/api/routes.go` - Added queue stats endpoint
   - `internal/api/start.go` - Registered new endpoint

### Key Features

- **Non-breaking**: Falls back to legacy mode when disabled
- **Resilient**: Uses dummy data if external endpoint fails
- **Observable**: Queue statistics endpoint for monitoring
- **Configurable**: All parameters can be tuned via environment variables
- **Concurrent**: Thread-safe operations with proper locking

## Example Usage

### Start with Priority Queue
```bash
export ENABLE_PRIORITY_QUEUE=true
export EXTERNAL_WORKER_ID_PRIORITY_ENDPOINT="https://your-api.com/priority-workers"
export FAST_QUEUE_SIZE=2000
export SLOW_QUEUE_SIZE=10000
export PRIORITY_REFRESH_INTERVAL_SECONDS=300  # 5 minutes

./tee-worker
```

### Monitor Queue Performance
```bash
# Check queue statistics
curl http://localhost:8080/job/queue/stats

# Response shows queue depths and processing counts:
# {
#   "fast_queue_depth": 5,
#   "slow_queue_depth": 23,
#   "fast_processed": 1523,
#   "slow_processed": 4821,
#   "last_update": "2024-01-15T14:22:31Z"
# }
```

## Endpoint Integration Details

### Automatic Refresh
The priority list is automatically refreshed from the external endpoint:
- Initial fetch on startup
- Periodic refresh every 15 minutes (configurable via `PRIORITY_REFRESH_INTERVAL_SECONDS`)
- Continues using the last known good list if a refresh fails
- Errors are logged but do not stop the service

### Monitoring Endpoint Status
Check logs for endpoint status:
```
INFO[0000] Fetching initial priority list from external endpoint: https://tee-api.masa.ai/list-tee-workers
INFO[0000] Priority list updated with 179 workers from external endpoint
```

## Future Enhancements

1. **Dynamic Queue Sizing**: Adjust queue sizes based on load
2. **Priority Levels**: Multiple priority tiers (not just fast/slow)
3. **Metrics Export**: Prometheus/Grafana integration
4. **Queue Persistence**: Survive restarts without losing jobs
5. **Worker ID Extraction**: Extract worker ID from URL if needed (currently uses full URL)
@@ -0,0 +1,25 @@
// Package jobserver provides job queue management and processing functionality.
// This file defines common errors used throughout the jobserver package.
package jobserver

import "errors"

// Common errors returned by jobserver operations.
// These errors help distinguish between different failure scenarios
// and allow callers to handle specific error conditions appropriately.
var (
	// ErrQueueClosed is returned when attempting to use a closed queue
	ErrQueueClosed = errors.New("queue is closed")

	// ErrQueueFull is returned when attempting to enqueue to a full queue
	ErrQueueFull = errors.New("queue is full")

	// ErrQueueEmpty is returned when attempting to dequeue from empty queues
	ErrQueueEmpty = errors.New("all queues are empty")

	// ErrJobNotFound is returned when a job is not found
	ErrJobNotFound = errors.New("job not found")

	// ErrInvalidJobType is returned when job type is invalid
	ErrInvalidJobType = errors.New("invalid job type")
)