
ft: ZENKO-147 Use Redis keys instead of hash #286

Merged: 1 commit into development/8.0, May 31, 2018

Conversation

@bennettbuchanan commented May 10, 2018

We want to be able to expire failed CRR entries after a configurable amount of time (default is 24 hours). Unfortunately, Redis does not support expiry of individual hash fields, so this PR changes the design to store failed CRR entries as keys.

Alternatives considered:

  • Setting the hash fields with a timestamp and then retroactively removing hashes at certain time intervals.
  • Setting hash keys with a timestamp, one for each expiry time interval, then retroactively deleting old hashes.

I consider this the simplest solution because it leverages Redis's built-in key expiry and obviates the need for a cron job to delete old keys.
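
For illustration, a minimal sketch of the key-per-entry approach, assuming an ioredis-style client (the key prefix and names here are illustrative, not the PR's actual identifiers):

const Redis = require('ioredis');

const redis = new Redis();
// Default expiry of 24 hours, expressed in seconds.
const CRR_FAILURE_EXPIRY = 60 * 60 * 24;

function setFailedCRREntry(bucket, objectKey, versionId, site, value) {
    // One Redis key per failed entry instead of one field in a shared
    // hash: SET with EX makes Redis expire the entry on its own, so no
    // cron job is needed to clean up old failures.
    const key = `failedCRR:${bucket}:${objectKey}:${versionId}:${site}`;
    return redis.set(key, value, 'EX', CRR_FAILURE_EXPIRY);
}

The trade-off of this layout is that listing failures now requires a SCAN over the key pattern rather than a single hash lookup, which is what the rest of this PR addresses.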

This PR also fixes a problem where, when >= 1000 keys are stored, the listing of specific version failures does not iterate beyond the first batch of results. The solution is to use a recursive scan listing.

@ironman-machine

PR has been updated. Reviewers, please be cautious.

@@ -50,6 +52,7 @@ const joiSchema = {
},
topic: joi.string().required(),
replicationStatusTopic: joi.string().required(),
monitorReplicationFailureExpiry: joi.number().default(CRR_FAILURE_EXPIRY),
Contributor:

I suggest adding an S at the end of the param name to indicate that we expect a number of seconds (which also implicitly signals that it's a time interval, which is not obvious otherwise). It could also be renamed to replicationFailureEntryExpiryTimeS; I'm not a fan of monitor in the name, I find it a bit confusing.

Author:

Okay, I was following the config field monitorReplicationFailure, which is forthcoming. I might suggest updating it to monitorReplicationFailureExpiryTimeS so it's clear these fields reference the same functionality, but I'm not especially wedded to any naming scheme.

Contributor:

I may have written this comment before we decided to use the monitorReplication... naming for the conf vars, and left it pending for some time, so I'm good with keeping consistency across the config.

Author:

monitorReplicationFailureExpiryTimeS it is!

`${bucket}:${objectKey}:${versionId}:${site}`;
const value = JSON.stringify(JSON.parse(kafkaEntry.value));
const expiry = this.repConfig.monitorReplicationFailureExpiry;
cmds.push(['set', key, value, 'EX', expiry]);
}
return undefined;
Contributor:

That line looks unnecessary

Author:

I believe this is to make the linter happy. 😄

Contributor:

The inner forEach function does not seem to return anything elsewhere, so I suspect this is a left-over from a previous version that required it, but I may be wrong.

Author:

This was removed during the previous refactor 🍾, so either way!

fields.push(field, JSON.stringify(value));
const key = `${redisKeys.failedCRR}:` +
`${bucket}:${objectKey}:${versionId}:${site}`;
const value = JSON.stringify(JSON.parse(kafkaEntry.value));
Contributor:

Is it necessary to copy the kafkaEntry.value field? It's no big issue, I just noticed it might be useless; when in doubt, we can leave it.

Author:

The subsequent PR will reduce this to just storing the source role to get the failed object's metadata.

if (allKeys.length >= 1000 || Number.parseInt(cursor, 10) === 0) {
return cb(null, cursor, allKeys);
}
return this._scanAllKeys(pattern, cursor, allKeys, cb);
Contributor:

I'm not familiar with the Redis API, but if asked for 1000 keys, can it return fewer than that even when there are 1000+ keys to return? In that case it looks correct to call _scanAllKeys again; otherwise it seems we could just return the results in the callback in all cases.

Author:

There isn't a guarantee on the number of keys returned by Redis during a SCAN operation. (I've added a point about this in the Operational Considerations section of the doc for this feature.) We do have a guarantee that the scan has completed when the returned cursor is 0, so in that case we can return whatever results are there, even if fewer than 1000.
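
A sketch of that cursor contract, assuming an ioredis-style client (this mirrors _scanAllKeys, but the names here are illustrative):

// SCAN may return fewer (or more) keys than COUNT on any single call;
// the only termination signal is a returned cursor of '0'.
function scanAllKeys(redis, pattern, cursor, allKeys, cb) {
    redis.scan(cursor, 'MATCH', pattern, 'COUNT', 1000, (err, res) => {
        if (err) {
            return cb(err);
        }
        const [nextCursor, keys] = res;
        allKeys.push(...keys);
        if (nextCursor === '0') {
            // Iteration complete: return everything collected so far,
            // even if it is fewer than 1000 keys.
            return cb(null, nextCursor, allKeys);
        }
        return scanAllKeys(redis, pattern, nextCursor, allKeys, cb);
    });
}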

@@ -378,11 +395,12 @@ class BackbeatAPI {
LastModified: queueEntry.getLastModified(),
ReplicationStatus: 'PENDING',
});
const field = `${Bucket}:${Key}:${VersionId}:${StorageClass}`;
return this._deleteFailedCRRField(field, err => {
const key = `${redisKeys.failedCRR}:` +
Contributor:

It would be ideal if the construction of the key happened in a shared helper function
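
For instance, a hypothetical shared helper along these lines (name and placement are illustrative):

// Single owner of the failed-CRR key layout; assumes the same
// redisKeys.failedCRR constant already used in the diffs above.
function getFailedCRRKey(bucket, objectKey, versionId, site) {
    return `${redisKeys.failedCRR}:${bucket}:${objectKey}:` +
        `${versionId}:${site}`;
}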

return ['hset', REDIS_KEY_FAILED_CRR, field, value];
const [bucket, objectKey, versionId, site] = key.split(':');
const value = getKafkaEntry(bucket, objectKey, site);
return ['set', `${REDIS_KEY_FAILED_CRR}:${key}`, value];
Contributor:

Is REDIS_KEY_FAILED_CRR the same as the redisKeys.failedCRR constant used elsewhere? If so, better to use the same one consistently and remove the other.

Author:

This is used because the test suite is not run with the TEST_SWITCH environment var. I updated the constant variable name to TEST_REDIS_KEY_FAILED_CRR to make that more apparent.

Author:

Maybe we want to update that in the CI? I'm fine either way.

const [bucket, key, versionId, site] = k.split(':');
assert(res.Versions.some(o => (
o.Bucket === bucket &&
o.Key === key &&
o.VersionId === versionId &&
o.VersionId === testVersionId &&
Contributor:

Why is it needed (or better) to change this test?

Author:

Because the response gets the version ID using the object MD instead of the key name (see the change here). I think it's better because it actually checks that the returned value is the correct version ID. I will go ahead and update the key names to use testVersionId just to be consistent.

@ironman-machine

PR has been updated. Reviewers, please be cautious.

philipyoo previously approved these changes May 29, 2018
return cb(null, this._getFailedCRRResponse(cursor, hashes));
const [cursor, keys] = collection;
allKeys.push(...keys);
if (allKeys.length >= 1000 || Number.parseInt(cursor, 10) === 0) {
Contributor:

allKeys.length >= 1000
If we get 1000 keys initially, won't this return before getting the next set of keys? Or was this intentional?

Author (@bennettbuchanan, May 29, 2018):

Yes, in effect it allows for a paginated response. I wanted to limit the API response listing to ~1000 keys, to model it on the default for version listings in S3. We cannot guarantee the exact number because Redis does not guarantee the number of keys returned during the scan. So if it's 1000 or more, the API will include a NextMarker value for subsequent listings.
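
A sketch of that pagination behavior (names are hypothetical; parseFailedCRRKey stands in for whatever maps a Redis key back to a version entry):

function getFailedCRRResponse(cursor, keys) {
    const isTruncated = Number.parseInt(cursor, 10) !== 0;
    const response = {
        IsTruncated: isTruncated,
        Versions: keys.map(parseFailedCRRKey), // hypothetical key parser
    };
    if (isTruncated) {
        // The Redis SCAN cursor doubles as the listing marker: clients
        // pass it back as NextMarker to resume the listing where the
        // previous scan stopped.
        response.NextMarker = Number.parseInt(cursor, 10);
    }
    return response;
}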

@ironman-machine dismissed philipyoo’s stale review May 29, 2018 22:53

Do it again human slave! :point_right: :runner: (Oh and the pull request has been updated, by the way.)

@ironman-machine

PR has been updated. Reviewers, please be cautious.

@ironman-machine

CONFLICT (add/add): Merge conflict in tests/unit/lifecycle/LifecycleTask.spec.js
CONFLICT (add/add): Merge conflict in tests/functional/api/BackbeatServer.js
CONFLICT (add/add): Merge conflict in lib/api/BackbeatAPI.js
CONFLICT (add/add): Merge conflict in extensions/replication/replicationStatusProcessor/ReplicationStatusProcessor.js
CONFLICT (add/add): Merge conflict in extensions/replication/ReplicationConfigValidator.js
CONFLICT (add/add): Merge conflict in extensions/lifecycle/tasks/LifecycleTask.js

@ironman-machine

PR has been updated. Reviewers, please be cautious.

@ironman-machine

CONFLICT (add/add): Merge conflict in tests/functional/api/BackbeatServer.js
CONFLICT (add/add): Merge conflict in lib/api/BackbeatAPI.js
CONFLICT (add/add): Merge conflict in extensions/replication/replicationStatusProcessor/ReplicationStatusProcessor.js
CONFLICT (add/add): Merge conflict in extensions/replication/ReplicationConfigValidator.js

@ironman-machine

PR has been updated. Reviewers, please be cautious.

@ironman-machine

CONFLICT (add/add): Merge conflict in package.json

@bennettbuchanan changed the base branch from z/1.0 to development/8.0 May 31, 2018 18:38
@jonathan-gramain merged commit f3f7a0d into development/8.0 May 31, 2018
@jonathan-gramain deleted the ft/ZENKO-147/useRedisKeys branch May 31, 2018 18:54