Sorted key/value store (badger) backed storage plugin #760

burmanm · 2018-03-29T10:34:30Z

This is a storage plugin for Jaeger implemented against a sorted K/V store. Although this version targets Badger, it should work with relatively small changes against any lexicographically sorted K/V store (RocksDB is another possibility, but it would require cgo to compile).

This is WIP, pushed for early feedback. It is missing an implementation of the Duration index (range index scanning), the GetOperations & GetServices interfaces, and of course benchmarking/more tests. Some smaller TODOs obviously remain as well, some left for easier development and some just lacking optimization (not to mention splitting some parts into separate functions).

cc @pavolloffay

black-adder · 2018-03-29T12:51:20Z

Thanks for the contribution! Some preliminary thoughts: have you seen this ticket? #422 We've been considering supporting third-party storage plugins via a new plugin framework, so that the core jaeger library isn't overloaded with storage-specific implementations (new plugins will be supported by the contributor in a separate repo). We haven't settled on the exact framework yet, so we can review this PR, but in the future we'd expect to move it over to the plugin framework.

I'll take a look at the actual PR later today.

burmanm · 2018-03-29T13:54:20Z

Yeah, my intention was to solve issue #551. Theoretically, moving the code to an external plugin shouldn't be a problem: just add some API listeners in front (if that's the solution #422 ends up with). A separate repo could technically allow defining build flags to support multiple different K/V stores, I assume.

burmanm · 2018-04-04T13:02:50Z

There's a bug in the index seek process: it returns results in DESC order, but it does not actually fetch the TOP N using DESC ordering; it uses ASC instead. Will fix in the next commit along with some other updates (and add a test that catches the mistake).

pavolloffay · 2018-04-06T08:57:25Z

@burmanm thanks for the PR. I missed it.

I am wondering whether we could completely replace in-memory with local storage. I will have a closer look next week.

Also, Travis seems to be broken for the last build.

pavolloffay · 2018-04-12T07:48:45Z

There was a discussion about this PR on bi-weekly call.

Rather than introducing a new storage impl can this substitute current in-memory? What are the implications? Can it work without creating any files? If so we could use this instead of in-memory.

burmanm · 2018-04-12T08:51:04Z

By creating the files in tmpfs, it is in-memory storage, since tmpfs is not persisted to disk. We could add an option to clean up the tmpfs files on shutdown if we wanted (a pod reboot, machine reboot, etc. will of course do that as well).

pavolloffay · 2018-06-08T09:52:25Z

glide.yaml

@@ -55,3 +55,5 @@ import:
 - package: github.com/go-openapi/validate
 - package: github.com/go-openapi/loads
 - package: github.com/elazarl/go-bindata-assetfs
+- package: github.com/dgraph-io/badger


It needs glide update to update the lock file with this change.

Sadly, I'm not sure what the proper way to update this project's dependencies is. glide up will break Jaeger (for example, viper gets updated and no longer compiles), since it has no version locked in glide.yaml.

It also updates several other dependencies.

What I did in a previous PR was to apply this change in a separate branch and then manually update the lock file only for the dependency I'm updating. Sucks, but works.

This now needs to be switched to Gopkg.toml / dep.

pavolloffay · 2018-06-08T15:18:12Z

There are a lot of magic numbers in different places. Could you please define them as constants?

pavolloffay · 2018-06-08T15:18:57Z

Also, when you are creating prefixes, could you please extract that into functions?

pavolloffay · 2018-06-08T12:27:15Z

plugin/storage/badger/options.go

+// Options store storage plugin related configs
+type Options struct {
+	primary *NamespaceConfig
+	others  map[string]*NamespaceConfig


Why is this needed? Even the primary namespace?

To be consistent with the other storage engines. No other reason, really.

pavolloffay · 2018-06-08T12:54:49Z

plugin/storage/badger/options.go

+			SpanStoreTTL:   time.Hour * 72, // Default is 3 days
+			SyncWrites:     false,          // Performance over consistency
+			Ephemeral:      true,           // Default is ephemeral storage
+			ValueDirectory: "",


It would be good to use a reasonable default value, so users can just set ephemeral to false and it works.

I can make the default the application path + data/, for example. I don't think there's a sane default path for files.

~~How about something inside /tmp/? When ephemeral is true, /tmp is appropriate.~~

I see that ioutil.TempDir is used when this is empty, so, ignore my last comment.

pavolloffay · 2018-06-08T12:58:46Z

plugin/storage/badger/options.go

+)
+
+// NewOptions creates a new Options struct.
+func NewOptions(primaryNamespace string, otherNamespaces ...string) *Options {


Can you elaborate why we need namespaces? If not needed then remove or rename it to flag prefix :)

See previous comment to keep consistency with other storage engines.

pavolloffay · 2018-06-08T13:13:54Z

plugin/storage/badger/factory.go

+	opts := badger.DefaultOptions
+
+	if f.Options.primary.Ephemeral {
+		opts.SyncWrites = false


If set to false, is data available for query immediately?

Does it make sense for tmpfs?

Data is available immediately in either case. With tmpfs it makes no difference for durability whether sync is true or false (other than the extra overhead of an fsync call to tmpfs), and there is never a difference when it comes to visibility. Writes always go to the page cache; this option only controls whether fsync to the disk is called after every write.

pavolloffay · 2018-06-08T13:18:43Z

plugin/storage/badger/spanstore/cache.go

+
+		// Seek all the services first
+		for it.Seek(serviceKey); it.ValidForPrefix(serviceKey); it.Next() {
+			timestampStartIndex := len(it.Item().Key()) - 24


It's in the WriteSpan. 8 bytes for the timestamp + 16 bytes for the traceId -> 24

pavolloffay · 2018-06-08T14:52:57Z

plugin/storage/badger/spanstore/reader.go

+	if p == nil {
+		return ErrMalformedRequestObject
+	}
+	if p.ServiceName == "" && len(p.Tags) > 0 {


does the number of tags matter?

There's no restriction to the amount.

pavolloffay · 2018-06-08T14:55:16Z

plugin/storage/badger/spanstore/reader.go

+	}
+
+	ids := make([][][]byte, 0, len(indexSeeks)+1)
+	if len(indexSeeks) > 0 {


This seems redundant.

pavolloffay · 2018-06-08T14:57:31Z

plugin/storage/badger/spanstore/reader.go

+	ids := make([][][]byte, 0, len(indexSeeks)+1)
+	if len(indexSeeks) > 0 {
+		for i, s := range indexSeeks {
+			indexResults, _ := r.scanIndexKeys(s, query.StartTimeMin, query.StartTimeMax)


ignoring error?

pavolloffay · 2018-06-08T15:00:02Z

plugin/storage/badger/spanstore/reader.go

+			}
+			ids = append(ids, make([][]byte, 0, len(indexResults)))
+			for _, k := range indexResults {
+				ids[i] = append(ids[i], k[len(k)-16:])


16?

Could you please define these magic numbers as constants with some comments?

Size of TraceID is 16.

pavolloffay · 2018-06-08T15:19:47Z

plugin/storage/badger/spanstore/reader.go

+}
+
+// Close Implements io.Closer
+func (r *TraceReader) Close() error {


If not needed, please remove it.


jpkrohling · 2018-06-12T13:34:54Z

plugin/storage/badger/options.go

+	flagSet.Bool(
+		nsConfig.namespace+suffixSyncWrite,
+		nsConfig.SyncWrites,
+		"If all writes should be synced immediately. This will greatly reduce write performance and will require fast SSD drives. Default is false.",


will require fast SSD drives

If I'm not on SSD, are you stopping the process? If not, what does "require" mean here?

Mostly that this backend really wants SSDs, and if you're going to sync all the writes, you probably want something with fast write times. I've removed it for now, since technically you could also use a DRAM-backed disk array for writes (though reads would most likely be slow).

jpkrohling · 2018-06-12T13:41:16Z

plugin/storage/badger/options.go

+	flagSet.Bool(
+		nsConfig.namespace+suffixEphemeral,
+		nsConfig.Ephemeral,
+		"Mark this storage ephemeral, data is stored in tmpfs (in-memory). Default is true.",


The "default" part is added automatically by Viper. This is how it looks when you run SPAN_STORAGE_TYPE=badger go run ./cmd/collector/main.go --help:

--badger.ephemeral Mark this storage ephemeral, data is stored in tmpfs (in-memory). Default is true. (default true)

jpkrohling · 2018-06-12T13:51:40Z

plugin/storage/badger/options.go

+}
+
+func initFromViper(cfg *NamespaceConfig, v *viper.Viper) {
+	cfg.Ephemeral = v.GetBool(cfg.namespace + suffixEphemeral)


If this is also intended to replace the current in-memory storage, we need a max-traces option as well. See #845/#842

Do we need a max-traces option, or just a strategy for dealing with a full disk?

burmanm · 2018-06-29T11:57:00Z

Rebased. I also fixed a generic issue with the testing framework inside Jaeger which happens to test against pointer locations instead of real data.

SwarnimRaj · 2018-07-03T10:32:08Z

@burmanm I am trying to test your Badger storage backend for my use case #894 which requires persisting traces to some backend, and being able to send the storage files over a network when needed so that a Jaeger instance on a remote machine can use it to display the traces recorded so far.

I cloned your changes but am unable to figure out how to restore the Jaeger instance using previously persisted traces. It seems f.Options.primary.Ephemeral should be set to false so that the storage is not lost but I am unable to figure out how to restore traces and which files would be required for a remote Jaeger instance to read from.

Could you please list the steps? Long term, I think it would be good to have a README on how to use the badger storage and all its features.

SwarnimRaj · 2018-07-04T10:01:44Z

plugin/storage/badger/spanstore/reader.go

+
+// GetServices fetches the sorted service list that have not expired
+func (r *TraceReader) GetServices() ([]string, error) {
+	return r.cache.GetServices()


I noticed an issue with this implementation: when the badger factory is initialized by reloading from a pre-existing directory, the services list is not populated with services from pre-existing traces, even though the traces are accessible via a direct REST API call.
Probably the cause is that you are relying on the cache which is not pre-populated.

SwarnimRaj · 2018-07-04T10:02:05Z

plugin/storage/badger/spanstore/reader.go

+
+// GetOperations fetches operations in the service and empty slice if service does not exists
+func (r *TraceReader) GetOperations(service string) ([]string, error) {
+	return r.cache.GetOperations(service)


I noticed an issue with this implementation: when the badger factory is initialized by reloading from a pre-existing directory, the operations list is not populated with operations from pre-existing traces, even though the traces are accessible via a direct REST API call.
Probably the cause is that you are relying on the cache which is not pre-populated.

NewCacheStore() calls prefillCaches()

The unit test that does persisting (mentioned in other comment) also seems to indicate the caches are prefilled correctly.

I should add this type of unit test to the suite.

I took a backup of the Badger db and restored it to a location that was passed as the key and value dir to a standalone Jaeger run. I noticed that the loadServices() method does find the services present in traces from the backup, but finds their TTL as 0. This causes the services to be deleted during GetServices(), as your logic treats them as expired. I did look at your test, and somehow the TTL there is 72 hours ahead of the current time, so the test passes.
I am not sure why services loaded from a pre-existing Badger db have a 0 TTL. Do you know what might be causing this?

For now, commenting out the delete line in GetServices() makes the services show up in the UI, and similarly for GetOperations(). Is there a danger in this approach? Why do you perform the check (v > t) in GetServices?

PS: My use case demands that backed up data should be accessible always.

I can't replicate that behavior, in my testing I see:

Key: key="\x81service-p\x00\x05pSgE\x89\xbd\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01", version=1, meta=40, 1531137600, time now: 1530878401
Key: key="\x82service-poperation-p\x00\x05pSgE\x89\xbd\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01", version=1, meta=40, 1531137600, time now: 1530878401

That is, the services do expire later than the current time (as they should). The reason for that v > t is that we don't return ghost services or operations. That would be bad in terms of usability if one has a lot of services / operations that keep changing.

If you need backups that do not expire ever, then I guess TTL is the wrong approach for you. What you'd want is something closer to a snapshot + purge of snapshotted data (with no TTL, such that you would manually take care of retention policies).

Certainly doable, but maybe it is a bit of feature creep for this PR?

burmanm · 2018-07-04T12:47:26Z

@SwarnimRaj Here's a very rudimentary unit test that shows how to load an existing file: https://gist.github.com/burmanm/04021a32d8fb728792eb49ab0044dae8

Now, as for backup & restore, there's a separate tool for that (which should take care of LSM syncing). I have not considered such use case and for that reason there's no instructions for it either. But it can be done by following the instructions here: https://github.com/dgraph-io/badger#database-backup

Signed-off-by: Michael Burman <yak@iki.fi>

burmanm · 2019-02-13T20:46:15Z

Changed the dependencyreader to use spanstore instead of directly reading from the DB. The negative side is that there's increased memory usage as the []model.Trace must be kept in the memory while loading. Previously each model.Trace could have been thrown away by escape analysis.

In any case, no context was passed to the dependencyreader, unlike the SpanReader/SpanWriter, so I used context.Background() when fetching data from the spanstore in the dependencystore.

jpkrohling · 2019-04-02T13:52:24Z

Once the merge conflict is solved, can this be merged, or is there still something pending?

Signed-off-by: Yuri Shkuro <ys@uber.com>

yurishkuro · 2019-04-02T17:09:27Z

I think we mostly need to address #1389.

Signed-off-by: Yuri Shkuro <ys@uber.com>

codecov · 2019-04-02T17:32:41Z

Codecov Report

Merging #760 into master will decrease coverage by 0.17%.
The diff coverage is 97.79%.

@@            Coverage Diff             @@
##           master     #760      +/-   ##
==========================================
- Coverage     100%   99.82%   -0.18%     
==========================================
  Files         165      172       +7     
  Lines        7510     8146     +636     
==========================================
+ Hits         7510     8132     +622     
- Misses          0        7       +7     
- Partials        0        7       +7

Impacted Files	Coverage Δ
plugin/storage/badger/options.go	`100% <100%> (ø)`
plugin/storage/factory.go	`100% <100%> (ø)`	⬆️
model/sort.go	`100% <100%> (ø)`	⬆️
plugin/storage/badger/spanstore/cache.go	`100% <100%> (ø)`
plugin/storage/badger/stats_linux.go	`100% <100%> (ø)`
plugin/storage/badger/factory.go	`100% <100%> (ø)`
plugin/storage/badger/dependencystore/storage.go	`94.73% <94.73%> (ø)`
plugin/storage/badger/spanstore/reader.go	`96.58% <96.58%> (ø)`
plugin/storage/badger/spanstore/writer.go	`97.22% <97.22%> (ø)`
... and 6 more


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f47d66f...89e8522.

yurishkuro · 2019-04-02T17:49:41Z

So the build is green, and the code coverage decreases are only on error checks. I looked around for tools that allow excluding certain lines from code coverage, but didn't find any.

Signed-off-by: Yuri Shkuro <ys@uber.com>

manishrjain · 2019-04-03T23:05:41Z

Amazing! Can't wait to use it.

pd40 · 2019-04-22T18:00:07Z

I am looking forward to using this as well!

Forgive me if this has already been covered, but will the deployment instructions be updated to include badger as part of this change?

https://github.com/jaegertracing/documentation/blob/master/content/docs/1.11/deployment.md

yurishkuro · 2019-04-23T03:28:21Z

It can already be used with the latest jaegertracing/all-in-one. It won't work with any other deployment mode, though, because badger runs in-process and cannot be shared by the collector and query.

sherwoodzern · 2019-11-07T18:59:51Z

I read that this alternative storage option is available; however, the documentation still only mentions ElasticSearch, Cassandra, and Kafka. I'm trying to understand how I would configure this latest option. I want to have all traces written to a non-ephemeral storage. Does this option provide this capability without having to install ElasticSearch, Cassandra, or Kafka? Any examples of deployment using this approach discussed in this thread?

yurishkuro · 2019-11-07T20:29:20Z

Storage options are documented in the Deployment section of the docs.

sherwoodzern · 2019-11-07T22:10:28Z

Thank you for the response regarding the storage options. I want to confirm that within my Kubernetes deployment, I can create a PVC and specify that storage to be used by Badger. This will require that I deploy a Badger container and would assume that I will deploy as a daemonset; is this correct? I then specify the storage endpoint when configuring Jaeger with badger. In this manner, I should get all of the traces stored in a non-ephemeral storage.

yurishkuro · 2019-11-07T22:26:40Z

@sherwoodzern as explained in the documentation:

Badger is an embedded local storage, only available with all-in-one distribution.

You cannot use it with normal production (i.e. scalable) installation of Jaeger, because the storage is embedded in a single process that includes both collection and UI components.

burmanm requested review from black-adder, pavolloffay, vprithvi and yurishkuro as code owners March 29, 2018 10:34

burmanm force-pushed the local_storage branch from ceb53fe to ba524e7 April 16, 2018 11:39

burmanm mentioned this pull request May 3, 2018

WIP integration-test fixtures should use internal model #800

Closed

burmanm force-pushed the local_storage branch from 71aeaab to f5ea8e3 May 3, 2018 15:23

pavolloffay reviewed Jun 8, 2018


burmanm force-pushed the local_storage branch from f5ea8e3 to 362aef7 June 8, 2018 11:54

burmanm requested a review from jpkrohling as a code owner June 8, 2018 11:54

pavolloffay reviewed Jun 8, 2018


jpkrohling reviewed Jun 12, 2018


This was referenced Jun 29, 2018

Files based storage backend for Jaeger tracing #894

Closed

Additional storage backends #638

Open

jpkrohling added enhancement area/storage labels Jun 29, 2018

burmanm force-pushed the local_storage branch from 8c48c35 to f74905b June 29, 2018 11:33

SwarnimRaj reviewed Jul 4, 2018


burmanm added 5 commits February 12, 2019 15:28

Change cache interfaces and add new tests to reach higher coverage

24ec9c2

Signed-off-by: Michael Burman <yak@iki.fi>

Add more tests, including validation and encoding parsing tests

8d3408d

Signed-off-by: Michael Burman <yak@iki.fi>

Fix test refactoring to get factory coverage back to 100%

7b81610

Signed-off-by: Michael Burman <yak@iki.fi>

Change dependencyreader to use spanstore

8606253

Signed-off-by: Michael Burman <yak@iki.fi>

Remove redundant consts

e575e4c

Signed-off-by: Michael Burman <yak@iki.fi>

yurishkuro mentioned this pull request Feb 28, 2019

Lower code coverage requirements #1389

Closed

mumoshu mentioned this pull request Mar 31, 2019

High cardinality labels grafana/loki#91

Open

jpkrohling mentioned this pull request Apr 2, 2019

Can all-in-one be used in production? #551

Closed

Yuri Shkuro added 2 commits April 2, 2019 13:05

Merge branch 'master' into local_storage

bfb1b7d

Signed-off-by: Yuri Shkuro <ys@uber.com>

dep update

10705ba

Signed-off-by: Yuri Shkuro <ys@uber.com>

jaegertracing deleted a comment from codecov bot Apr 2, 2019

make fmt

8bf30ad

Signed-off-by: Yuri Shkuro <ys@uber.com>

jaegertracing deleted a comment from codecov bot Apr 2, 2019

regen proto files

10b4b57

Signed-off-by: Yuri Shkuro <ys@uber.com>

Yuri Shkuro added 3 commits April 3, 2019 17:08

Merge branch 'master' into local_storage

e053151

Signed-off-by: Yuri Shkuro <ys@uber.com>

dep --update

25d27cb

Signed-off-by: Yuri Shkuro <ys@uber.com>

make proto

89e8522

Signed-off-by: Yuri Shkuro <ys@uber.com>

yurishkuro merged commit 1703bae into jaegertracing:master Apr 3, 2019
