Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce redis recovery listener #6

Merged
merged 6 commits into from
Sep 13, 2017

Conversation

dvizelman
Copy link

No description provided.

isDebug = LOG.isDebugEnabled() ? Boolean.TRUE.toString() : Boolean.FALSE.toString();
serverName = getServerName();
this.listener = listener;
redisRestartRecoveryOn = (listener != null);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can avoid this and look at the listener instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I went for best readability (IMO, of course). redisRestartRecoveryOn is calculated one time , but used many time. I don't like checking whether some variable is null, I prefer it to be done only on initialization. I would like to use Optional instead, but optional is recommended only for method return value

}
}

private String getRecoveryStatus() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nicer to return the enum and transform to String only when you actually need it.
It will cause you to write the .name() part only once.

private Runnable listener;
private String serverName;
private String isDebug;
private Future<?> recovering = null;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a better name here...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once again agree :-)

@@ -41,28 +48,39 @@
Executors.newScheduledThreadPool(POOL_SIZE);
private final AtomicReference<String> watchdogScriptHash = new AtomicReference<>(null);
private Pool<Jedis> jedisPool;
private boolean redisRestartRecoveryOn;
private Runnable listener;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better naming...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, will change it

private boolean redisRestartRecoveryOn;
private Runnable listener;
private String serverName;
private String isDebug;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we save this one?
If not - can we make it boolean?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can , but then we will need to convert it to string every time we call lua script. I went for performance , not for readability , but it can be changed easily ...

@@ -124,6 +143,40 @@ private void activateIsAliveService(final String serverName) throws RuntimeExcep
scheduler.scheduleAtFixedRate(isAlive, INITIAL_DELAY, IS_ALIVE_PERIOD, SECONDS);
}

private void heartbeatIsAlive(final Jedis jedis) {
String recoveringStatus = getRecoveryStatus();
Copy link

@nanimrod nanimrod Sep 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand why we're passing this to the script.
When you come back - I'll catch you to discuss this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once jesque invokes listener, it needs to know whether the listener is running , finished or failed. This information is saved in recoveringStatus variable

…inishs/fails , remove this update from heart beat process
@dvizelman
Copy link
Author

@ofirbnd

redisRestartRecoveryListener.run();
doWorkInPool(this.jedisPool, (PoolUtils.PoolWork<Jedis, Void>) jedis -> {
jedis.evalsha(watchdogScriptHash.get(), WATCHDOG_SCRIPT_KEYS_NUMBER, serverName, IS_ALIVE, getCurrTime(), isDebug, RecoveryStatus.FINISHED.name(), String.valueOf(LIGHT_KEEPER_PERIOD));
;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this line with only ;?

try {
doWorkInPool(this.jedisPool, (PoolUtils.PoolWork<Jedis, Void>) jedis -> {
jedis.evalsha(watchdogScriptHash.get(), WATCHDOG_SCRIPT_KEYS_NUMBER, serverName, IS_ALIVE, getCurrTime(), isDebug, RecoveryStatus.FAILED.name(), String.valueOf(LIGHT_KEEPER_PERIOD));
;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.atomic.AtomicReference;

import static java.util.concurrent.TimeUnit.SECONDS;
import static net.greghaines.jesque.utils.PoolUtils.doWorkInPool;

enum RecoveryStatus {
HEARTBEAT,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a status.

@@ -124,6 +143,17 @@ private void activateIsAliveService(final String serverName) throws RuntimeExcep
scheduler.scheduleAtFixedRate(isAlive, INITIAL_DELAY, IS_ALIVE_PERIOD, SECONDS);
}

private void heartbeatIsAlive(final Jedis jedis) {

final String recoveryStatus = redisRestartRecoveryOn ? RecoveryStatus.HEARTBEAT.name() : RecoveryStatus.DISABLED.name();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't sending here the recovery-status but we're sending is the recovery mechanism is enabled or not.
Let's rename accordingly and avoid using the enum which is related to the status.

runRecovery = runRecoveryProcess();
end
end
elseif(recoveringStatus == "FINISHED") then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should extract this code from this context.
I.e. different job/script.

-- mark recovery as finished
debugMessage("Redis restart recovery process has finished on server "..serverName)
redis.call('SET', redisRecoveryKey,"READY")
elseif (recoveringStatus== "FAILED") then
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should extract this code from this context.
I.e. different job/script.

try {
redisRestartRecoveryListener.run();
doWorkInPool(this.jedisPool, (PoolUtils.PoolWork<Jedis, Void>) jedis -> {
jedis.evalsha(watchdogScriptHash.get(), WATCHDOG_SCRIPT_KEYS_NUMBER, serverName, IS_ALIVE, getCurrTime(), isDebug, RecoveryStatus.FINISHED.name(), String.valueOf(LIGHT_KEEPER_PERIOD));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call a different script/job. Calling the same exact script & job is weird and not helping to the readability.
We're not notifying is_alive here. We're just updating the status to FINISHED.

LOG.error("Recovery process has failed ", e);
try {
doWorkInPool(this.jedisPool, (PoolUtils.PoolWork<Jedis, Void>) jedis -> {
jedis.evalsha(watchdogScriptHash.get(), WATCHDOG_SCRIPT_KEYS_NUMBER, serverName, IS_ALIVE, getCurrTime(), isDebug, RecoveryStatus.FAILED.name(), String.valueOf(LIGHT_KEEPER_PERIOD));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

different script/job

@@ -541,7 +547,7 @@ protected void recoverFromException(final String curQueue, final Exception ex) {
end(false);
} else {
authenticateAndSelectDB();
LOG.info("Reconnected to Redis");
LOG.info("Reconnected to Redis " + this);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this have toString?
How will this print look like?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way

18:52:24.922 [Worker-6 Jesque-DEVELOPMENT: RUNNING] INFO net.greghaines.jesque.worker.WorkerImpl - Reconnected to Redis resque:worker:dimav-e7470:23990-6:JAVA_DYNAMIC_QUEUES,highPriorityQueue,lowPriorityQueue
18:52:24.983 [Worker-10 Jesque-DEVELOPMENT: RUNNING] INFO net.greghaines.jesque.worker.WorkerImpl - Reconnected to Redis resque:worker:dimav-e7470:23990-10:JAVA_DYNAMIC_QUEUES,highPriorityQueue,lowPriorityQueue

instead of

18:52:24.922 [Worker-6 Jesque-DEVELOPMENT: RUNNING] INFO net.greghaines.jesque.worker.WorkerImpl - Reconnected to Redis
18:52:24.983 [Worker-10 Jesque-DEVELOPMENT: RUNNING] INFO net.greghaines.jesque.worker.WorkerImpl - Reconnected to Redis

Probably you are right , as we have specific worker related information in the thread name :-)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What bothered me is that the message says Redis and than we print the worker. Also believe that the worker is not really relevant so expected the Redis to be printed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to improve original message and now I just want to revert my change and leave the message intact.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW , worker is relevant ,as every worker contains dedicated jedis client and we are reconnecting this jedis to restarted redis


if(ex instanceof JedisNoScriptException){
try {
loadScripts();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getRecoveryStrategy is loading scripts? This is weird and not expected.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, my fault



if jobName=='watchdog' then
timeToRequeueJobsOnInactiveServer = getMillis(ARGV[5])
lightKeeperPeriod = getMillis(ARGV[6])

inspectLightKeeper()
elseif jobName=='requeueJobs' then
requeueJobs(serverName)
elseif jobName=='isAlive' then

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably use a verb here :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've copied it from Thread.isAlive() method. I don't mind changing the name, but can't find better name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about reportAlive?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok :-)

…ker's scripts after redis has been restarted; rename isAlive service to reportAlive
return recoveryStrategy;
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we touch the WorkerImpl class as part of this PR? Doesn't look related to what we're trying to achieve...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without reloading script after redis restart workers will not work. And we are trying to achieve situation when listener will not only be called, but will have a chance to requeue aggregations :-)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, cool :)

@dvizelman dvizelman merged commit 6066f01 into master Sep 13, 2017
@dvizelman dvizelman deleted the redis_restart_recovery_listener branch September 13, 2017 06:48
bp-FLN pushed a commit to uberall/jesque that referenced this pull request Dec 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants