-
Notifications
You must be signed in to change notification settings - Fork 453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Namespace retention configurability and dynamic namespace registry #306
Conversation
363f619
to
c7155ae
Compare
9b6e711
to
37dd13b
Compare
Makefile
Outdated
@@ -111,7 +111,7 @@ lint: | |||
|
|||
test-internal: | |||
@which go-junit-report > /dev/null || go get -u github.com/sectioneight/go-junit-report | |||
@$(VENDOR_ENV) $(test) $(coverfile) | tee $(test_log) | |||
@$(VENDOR_ENV) NPROC=1 $(test) $(coverfile) | tee $(test_log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: undo this once builds start working
func TestCommitLogBootstrapMultipleNamespaces(t *testing.T) { | ||
if testing.Short() { | ||
t.SkipNow() // Just skip if we're doing a short run | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Superfluous endline here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
persist/fs/commitlog/options.go
Outdated
return fmt.Errorf("invalid commit log retention buffer future, expected 0, observed: %v", v) | ||
} | ||
|
||
if v := ropts.BufferPast(); v != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the expectation for buffer future and buffer past, wouldn't it be better to simply have BlockSize
as its own option instead of having RetentionOptions
be an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was avoiding that cause I needed both BlockSize
and RetentionPeriod
. But you're right, it's probably cleaner to have it that way. Will address in next patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
storage/cleanup.go
Outdated
flushStart = retention.FlushTimeStart(ropts, t) | ||
flushEnd = retention.FlushTimeEnd(ropts, t) | ||
) | ||
return flushStart.Add(-blockSize), flushEnd.Add(-blockSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sounds good, I guess its because you never want to cleanup a block before the current flush end potential time. I'm fine to leave this as is.
// If we cared about 100% correctness, we would optimize this by retaining a smarter data | ||
// structure (e.g. interval tree), but for our use-case, it's safe to assume that commit logs | ||
// are only retained for a period of 1-2 days (at most), after we which we'd live we with the | ||
// data loss. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sounds good.
The other thing is if you turn off the process for however long the commit log retention time is, then turn it back on, they'll never get cleaned up. Which is a bit poor.
storage/database.go
Outdated
d.mediator = mediator | ||
|
||
// wait till first value is received | ||
<-watch.C() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For debuggability perhaps log the start and the stop of this operation?
i.e.
log.Info("resolving namespaces with namespace watch")
<-watch.C()
for _, namespace := range watch.Get() {
log.Infof("resolved namespace: %s", namespace.ID().String())
}
Potentially we might want to do the same for the cluster/database.go
startup routine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add the line ~ log.Info("resolving namespaces with namespace watch")
, I don't think we need the loop/log ~ it's already done in UpdateOwnedNamespaces
.
Example log from startup of the node
Sep 7 22:33:22 m3-tsdb215-sjc1 statsdex_m3dbnode[26210]: {"level":"info","msg":"m3dbnode_main.go:815 bytes pool registering bucket capacity=8192, count=32768","time":"2017-09-07T22:33:22Z"}
Sep 7 22:33:56 m3-tsdb215-sjc1 statsdex_m3dbnode[26210]: {"level":"info","msg":"m3dbnode_main.go:492 creating config service client with m3kv","time":"2017-09-07T22:33:56Z"}
Sep 7 22:33:56 m3-tsdb215-sjc1 statsdex_m3dbnode[26210]: {"level":"info","msg":"log.go:50 m3db: adding a watch for service: statsdex_m3dbnode env: production zone: sjc1 includeUnhealthy: true","time":"2017-09-07T22:33:56Z"}
Sep 7 22:33:56 m3-tsdb215-sjc1 statsdex_m3dbnode[26210]: {"level":"info","msg":"log.go:50 m3db: successfully loaded cache from file /var/lib/m3kv/statsdex_m3dbnode_sjc1.json","time":"2017-09-07T22:33:56Z"}
Sep 7 22:33:56 m3-tsdb215-sjc1 statsdex_m3dbnode[26210]: {"level":"info","msg":"log.go:50 m3db: successfully loaded cache from file /var/lib/m3kv/_kv_production_statsdex_m3dbnode_sjc1.json","time":"2017-09-07T22:33:56Z"}
Sep 7 22:33:56 m3-tsdb215-sjc1 statsdex_m3dbnode[26210]: {"key":"m3db.node.bootstrapper","level":"info","msg":"m3db: value update success","time":"2017-09-07T22:33:56Z","value":["filesystem","peers","noop-all"],"version":81}
Sep 7 22:33:56 m3-tsdb215-sjc1 statsdex_m3dbnode[26210]: {"adds":"testmetrics, metrics","level":"info","msg":"m3db: updating database namespaces","removals":"[]","time":"2017-09-07T22:33:56Z","updates":"[]"}
Sep 7 22:33:56 m3-tsdb215-sjc1 statsdex_m3dbnode[26210]: {"level":"info","msg":"m3dbnode_main.go:624 node tchannelthrift: listening on 0.0.0.0:6155","time":"2017-09-07T22:33:56Z"}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added appropriate logs to storage/database.go
, storage/cluster/database.go
, and storage/namespace_watch.go
storage/database.go
Outdated
if len(t) == 0 { | ||
return "[]", nil | ||
} | ||
var buf bytes.Buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Considering you return []
if len(t) == 0
perhaps would be best to prefix and suffix if you have real values with [
and ]
? i.e. buf.WriteRune('[')
at beginning and buf.WriteRune(']')
at end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
storage/database.go
Outdated
if len(m) == 0 { | ||
return "[]", nil | ||
} | ||
var buf bytes.Buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here regarding prefix and postfix with [
and ]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
storage/flush.go
Outdated
if err != nil { | ||
return err | ||
} | ||
defer flush.Done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does flush.Done()
return an error? If so should it be checked? And if not shouldn't it return an error considering its an IO based operation that could encounter an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, this was following the original code. Will update to capture that error, and add a test case for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storage/namespace/dynamic/options.go
Outdated
} | ||
|
||
// NewOptions creates a new DynamicOptions | ||
func NewOptions() namespace.DynamicOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why DynamicOptions
is defined in the namespace
package and not directly as Options
in the dynamic
package - which would be more consistent with other packages?
I suppose to be as similar as possible to the topology
package all the contents of the dynamic
package would just be in namespace
and just prefixed with Dynamic
e.g. NewDynamicInitializer(...) namespace.Initializer
etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll move static/*
and dynamic/*
into the namespace
package
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
storage/namespace/map.go
Outdated
namespaces: ns, | ||
ids: ids, | ||
metadatas: nsMetadatas, | ||
}, multiErr.FinalError() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Hm just to be more Go-ish maybe better to if err := multiErr.FinalError(); err != nil { return }
before return the *nsMap
result? Otherwise its the not the typical one or the other return. I like that this avoids branching (for higher levels of test coverage) but I feel sticking to convention may be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
r.repairStatesByNs.setRepairState(n.ID(), t, repairState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't quite grokked the locking for the dbRepairer
but thought should check in, does this need to be locked when get/set the repair states? Doesn't seem like callers of this method necessarily lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep we don't lock because the databaseRepairer guarantees at most 1 repair is executing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean we can remove all the other locks? Perhaps we can document at the struct level which fields are protected by mutex and which ones protected by the run lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep can move the lock to only lock a single field, will make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Prior to this PR, RetentionOptions were global across namespaces and the commit log. This PR changes that to allow every namespace, and the database commit log to have independent retention options.
Major changes:
namespace.Options
has a new field:RetentionOptions
namespace.NamespaceRegistry
(maps namespace ID -> namespace Metadata)client.AdminClient
method,databaseFlushManager
,databaseCleanupManager
, etc)./cc @robskillington @xichen2020 @martin-mao @cw9
Pending chunks:
persist/fs/Options
usage ofRetentionOptions