-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce redis recovery listener #6
Conversation
isDebug = LOG.isDebugEnabled() ? Boolean.TRUE.toString() : Boolean.FALSE.toString(); | ||
serverName = getServerName(); | ||
this.listener = listener; | ||
redisRestartRecoveryOn = (listener != null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can avoid this and look at the listener instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I went for best readability (IMO, of course). redisRestartRecoveryOn is calculated one time , but used many time. I don't like checking whether some variable is null, I prefer it to be done only on initialization. I would like to use Optional instead, but optional is recommended only for method return value
} | ||
} | ||
|
||
private String getRecoveryStatus() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nicer to return the enum and transform to String only when you actually need it.
It will cause you to write the .name()
part only once.
private Runnable listener; | ||
private String serverName; | ||
private String isDebug; | ||
private Future<?> recovering = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a better name here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once again agree :-)
@@ -41,28 +48,39 @@ | |||
Executors.newScheduledThreadPool(POOL_SIZE); | |||
private final AtomicReference<String> watchdogScriptHash = new AtomicReference<>(null); | |||
private Pool<Jedis> jedisPool; | |||
private boolean redisRestartRecoveryOn; | |||
private Runnable listener; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better naming...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, will change it
private boolean redisRestartRecoveryOn; | ||
private Runnable listener; | ||
private String serverName; | ||
private String isDebug; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we save this one?
If not - can we make it boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can , but then we will need to convert it to string every time we call lua script. I went for performance , not for readability , but it can be changed easily ...
@@ -124,6 +143,40 @@ private void activateIsAliveService(final String serverName) throws RuntimeExcep | |||
scheduler.scheduleAtFixedRate(isAlive, INITIAL_DELAY, IS_ALIVE_PERIOD, SECONDS); | |||
} | |||
|
|||
private void heartbeatIsAlive(final Jedis jedis) { | |||
String recoveringStatus = getRecoveryStatus(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand why we're passing this to the script.
When you come back - I'll catch you to discuss this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once jesque invokes listener, it needs to know whether the listener is running , finished or failed. This information is saved in recoveringStatus variable
…inishs/fails , remove this update from heart beat process
redisRestartRecoveryListener.run(); | ||
doWorkInPool(this.jedisPool, (PoolUtils.PoolWork<Jedis, Void>) jedis -> { | ||
jedis.evalsha(watchdogScriptHash.get(), WATCHDOG_SCRIPT_KEYS_NUMBER, serverName, IS_ALIVE, getCurrTime(), isDebug, RecoveryStatus.FINISHED.name(), String.valueOf(LIGHT_KEEPER_PERIOD)); | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this line with only ;?
try { | ||
doWorkInPool(this.jedisPool, (PoolUtils.PoolWork<Jedis, Void>) jedis -> { | ||
jedis.evalsha(watchdogScriptHash.get(), WATCHDOG_SCRIPT_KEYS_NUMBER, serverName, IS_ALIVE, getCurrTime(), isDebug, RecoveryStatus.FAILED.name(), String.valueOf(LIGHT_KEEPER_PERIOD)); | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
import java.util.concurrent.Executors; | ||
import java.util.concurrent.ScheduledExecutorService; | ||
import java.util.concurrent.atomic.AtomicReference; | ||
|
||
import static java.util.concurrent.TimeUnit.SECONDS; | ||
import static net.greghaines.jesque.utils.PoolUtils.doWorkInPool; | ||
|
||
enum RecoveryStatus { | ||
HEARTBEAT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a status.
@@ -124,6 +143,17 @@ private void activateIsAliveService(final String serverName) throws RuntimeExcep | |||
scheduler.scheduleAtFixedRate(isAlive, INITIAL_DELAY, IS_ALIVE_PERIOD, SECONDS); | |||
} | |||
|
|||
private void heartbeatIsAlive(final Jedis jedis) { | |||
|
|||
final String recoveryStatus = redisRestartRecoveryOn ? RecoveryStatus.HEARTBEAT.name() : RecoveryStatus.DISABLED.name(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't sending here the recovery-status but we're sending is the recovery mechanism is enabled or not.
Let's rename accordingly and avoid using the enum which is related to the status.
runRecovery = runRecoveryProcess(); | ||
end | ||
end | ||
elseif(recoveringStatus == "FINISHED") then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should extract this code from this context.
I.e. different job/script.
-- mark recovery as finished | ||
debugMessage("Redis restart recovery process has finished on server "..serverName) | ||
redis.call('SET', redisRecoveryKey,"READY") | ||
elseif (recoveringStatus== "FAILED") then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should extract this code from this context.
I.e. different job/script.
try { | ||
redisRestartRecoveryListener.run(); | ||
doWorkInPool(this.jedisPool, (PoolUtils.PoolWork<Jedis, Void>) jedis -> { | ||
jedis.evalsha(watchdogScriptHash.get(), WATCHDOG_SCRIPT_KEYS_NUMBER, serverName, IS_ALIVE, getCurrTime(), isDebug, RecoveryStatus.FINISHED.name(), String.valueOf(LIGHT_KEEPER_PERIOD)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call a different script/job. Calling the same exact script & job is weird and not helping to the readability.
We're not notifying is_alive here. We're just updating the status to FINISHED.
LOG.error("Recovery process has failed ", e); | ||
try { | ||
doWorkInPool(this.jedisPool, (PoolUtils.PoolWork<Jedis, Void>) jedis -> { | ||
jedis.evalsha(watchdogScriptHash.get(), WATCHDOG_SCRIPT_KEYS_NUMBER, serverName, IS_ALIVE, getCurrTime(), isDebug, RecoveryStatus.FAILED.name(), String.valueOf(LIGHT_KEEPER_PERIOD)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
different script/job
…ed and relaunch it if needed
@@ -541,7 +547,7 @@ protected void recoverFromException(final String curQueue, final Exception ex) { | |||
end(false); | |||
} else { | |||
authenticateAndSelectDB(); | |||
LOG.info("Reconnected to Redis"); | |||
LOG.info("Reconnected to Redis " + this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this have toString?
How will this print look like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way
18:52:24.922 [Worker-6 Jesque-DEVELOPMENT: RUNNING] INFO net.greghaines.jesque.worker.WorkerImpl - Reconnected to Redis resque:worker:dimav-e7470:23990-6:JAVA_DYNAMIC_QUEUES,highPriorityQueue,lowPriorityQueue
18:52:24.983 [Worker-10 Jesque-DEVELOPMENT: RUNNING] INFO net.greghaines.jesque.worker.WorkerImpl - Reconnected to Redis resque:worker:dimav-e7470:23990-10:JAVA_DYNAMIC_QUEUES,highPriorityQueue,lowPriorityQueue
instead of
18:52:24.922 [Worker-6 Jesque-DEVELOPMENT: RUNNING] INFO net.greghaines.jesque.worker.WorkerImpl - Reconnected to Redis
18:52:24.983 [Worker-10 Jesque-DEVELOPMENT: RUNNING] INFO net.greghaines.jesque.worker.WorkerImpl - Reconnected to Redis
Probably you are right , as we have specific worker related information in the thread name :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What bothered me is that the message says Redis and than we print the worker. Also believe that the worker is not really relevant so expected the Redis to be printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to improve original message and now I just want to revert my change and leave the message intact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW , worker is relevant ,as every worker contains dedicated jedis client and we are reconnecting this jedis to restarted redis
|
||
if(ex instanceof JedisNoScriptException){ | ||
try { | ||
loadScripts(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getRecoveryStrategy is loading scripts? This is weird and not expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, my fault
|
||
|
||
if jobName=='watchdog' then | ||
timeToRequeueJobsOnInactiveServer = getMillis(ARGV[5]) | ||
lightKeeperPeriod = getMillis(ARGV[6]) | ||
|
||
inspectLightKeeper() | ||
elseif jobName=='requeueJobs' then | ||
requeueJobs(serverName) | ||
elseif jobName=='isAlive' then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably use a verb here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've copied it from Thread.isAlive() method. I don't mind changing the name, but can't find better name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about reportAlive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok :-)
…ker's scripts after redis has been restarted; rename isAlive service to reportAlive
return recoveryStrategy; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we touch the WorkerImpl class as part of this PR? Doesn't look related to what we're trying to achieve...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without reloading script after redis restart workers will not work. And we are trying to achieve situation when listener will not only be called, but will have a chance to requeue aggregations :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, cool :)
No description provided.