
fix: inconsistent and leaky retry delay logic in router #3002

Merged: 2 commits into master from fix.routerRetryDelay on Feb 21, 2023

Conversation

atzoum
Contributor

@atzoum atzoum commented Feb 20, 2023

Description

Addressing router job retry backoff inconsistencies, which may also cause a memory leak:

  • We don't delete entries from the map when jobs are drained or marked as aborted anywhere other than postStatusOnResponseQ.
  • When event ordering is disabled, the worker-specific map we currently use isn't really useful, since there is no guarantee that the same job will always be assigned to the same worker.

To fix these issues we no longer use an in-memory map per worker; instead, the job status' retry_time column stores the calculated backoff time. JobsDB queries for retrieving jobs deliberately don't apply any conditions on the retry_time column: the router's event-ordering algorithm keeps its state only in memory and doesn't persist it, so filtering on retry_time would cause jobs to be picked out of order after a server restart.
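
Below is a minimal sketch of the idea in Go. Everything here is illustrative: the struct, the helper names, and the 10s/5m bounds are assumptions rather than the PR's actual identifiers or defaults; only the retry_time column comes from the description above.

package router

import "time"

// JobStatus mirrors the relevant part of a job status row.
type JobStatus struct {
	JobID      int64
	AttemptNum int
	RetryTime  time.Time // persisted to the retry_time column
}

// nextRetryTime computes an exponential backoff capped at maxBackoff.
func nextRetryTime(attempt int, minBackoff, maxBackoff time.Duration) time.Time {
	shift := attempt
	if shift > 10 {
		shift = 10 // clamp the exponent to avoid overflow
	}
	delay := minBackoff << uint(shift)
	if delay > maxBackoff {
		delay = maxBackoff
	}
	return time.Now().Add(delay)
}

// markFailed stamps the backoff on the status row itself. Because the delay
// lives in the database rather than in a per-worker map, nothing leaks when a
// job is drained or aborted elsewhere, and it doesn't matter which worker
// picks the job up next.
func markFailed(status *JobStatus) {
	status.AttemptNum++
	status.RetryTime = nextRetryTime(status.AttemptNum, 10*time.Second, 5*time.Minute)
}

Presumably readiness is then checked against retry_time after jobs are fetched rather than filtered at query level, which is what keeps ordering intact across restarts.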

Notion Ticket

Link

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@codecov

codecov bot commented Feb 20, 2023

Codecov Report

Base: 53.01% // Head: 53.00% // Decreases project coverage by 0.01% ⚠️

Coverage data is based on head (8457cc5) compared to base (4b613ad).
Patch coverage: 92.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3002      +/-   ##
==========================================
- Coverage   53.01%   53.00%   -0.01%     
==========================================
  Files         334      334              
  Lines       51941    51885      -56     
==========================================
- Hits        27534    27504      -30     
+ Misses      22796    22779      -17     
+ Partials     1611     1602       -9     
Impacted Files                 Coverage            Δ
jobsdb/jobsdb_utils.go         77.48% <ø>          (-2.04%) ⬇️
router/router.go               78.29% <81.81%>     (+0.49%) ⬆️
jobsdb/jobsdb.go               73.91% <100.00%>    (-0.13%) ⬇️
jobsdb/unionQuery.go           87.98% <100.00%>    (+3.85%) ⬆️
processor/worker.go            82.81% <0.00%>      (-2.35%) ⬇️
services/rsources/handler.go   75.20% <0.00%>      (+1.37%) ⬆️
testhelper/log/log.go          11.53% <0.00%>      (+3.84%) ⬆️


@atzoum atzoum force-pushed the fix.routerRetryDelay branch 4 times, most recently from 5b1656c to 0af362b Compare February 20, 2023 15:01
@atzoum atzoum changed the title from "[WIP] fix: inconsistent retry delay logic in router" to "fix: inconsistent retry delay logic in router" Feb 20, 2023
@atzoum atzoum changed the title from "fix: inconsistent retry delay logic in router" to "fix: inconsistent and leaky retry delay logic in router" Feb 20, 2023
@atzoum atzoum marked this pull request as ready for review February 21, 2023 07:15
Two review threads on router/router.go (outdated, resolved)
@Sidddddarth
Member

Another shameless lo reference could be useful here:

if !rt.guaranteeUserEventOrder {
	// when guaranteeUserEventOrder is false, assign a worker randomly and return
	if rt.shouldThrottle(job, parameters, throttledUserMap) {
		return
	}
	toSendWorker = rt.workers[rand.Intn(rt.noOfWorkers)] // skipcq: GSC-G404
	return
}
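
If the "lo" hint refers to the samber/lo generics library (an assumption on my part), the random worker pick could shrink to a one-liner. A sketch, assuming rt.workers is a plain slice:

toSendWorker = lo.Sample(rt.workers) // github.com/samber/lo: returns a random element; same skipcq caveat applies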
