Sorted key/value store (badger) backed storage plugin #760

Merged
merged 32 commits into from Apr 3, 2019

Conversation

burmanm
Contributor

@burmanm burmanm commented Mar 29, 2018

This is a storage plugin for Jaeger implemented against a sorted K/V store. Although this version was done against Badger it should work with relatively small changes against any lexicographically sorted K/V store (RocksDB is such a possibility also - but it would require cgo for compiling).

This is a WIP, pushed for early feedback. It is still missing the Duration index (range index scanning), the GetOperations and GetServices interfaces, and of course benchmarking and more tests. Some smaller TODO items remain as well: some exist to ease development, and some are places that simply lack optimization (not to mention splitting some parts into separate functions).

cc @pavolloffay

@black-adder
Collaborator

Thanks for the contribution! Some preliminary thoughts: have you seen #422? We've been considering supporting third-party storage plugins via a new plugin framework, so that the core Jaeger library isn't inundated with storage-specific implementations (new plugins would be supported by the contributor in a separate repo). We haven't settled on the exact framework yet, so we can review this PR, but in the future we'd expect to move it over to the plugin framework.

I'll take a look at the actual PR later today.

@burmanm
Contributor Author

burmanm commented Mar 29, 2018

Yeah, my intention was to solve issue #551. In theory, moving the code to an external plugin shouldn't be a problem: just add some API listeners in front (if that's the solution #422 ends up with). A separate repo could also allow build flags for supporting multiple different K/V stores, I assume.

@burmanm
Contributor Author

burmanm commented Apr 4, 2018

There's a bug in the index seek process: it returns results in DESC order, but it selects the top N using ASC ordering instead of DESC. I will fix this in the next commit, along with some other updates and a test that catches the mistake.
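
The difference described above can be illustrated with a small sketch. It does not use Badger or the PR's reader code; the function names and the plain slice of timestamps are assumptions made for illustration only.

```go
package main

import "fmt"

// oldestReversed reproduces the described mistake: it keeps the first n items
// of an ascending scan and reverses them. The output is in DESC order, but it
// holds the n OLDEST timestamps.
func oldestReversed(ascending []uint64, n int) []uint64 {
	if n < 0 {
		n = 0
	}
	if n > len(ascending) {
		n = len(ascending)
	}
	out := make([]uint64, 0, n)
	for i := n - 1; i >= 0; i-- {
		out = append(out, ascending[i])
	}
	return out
}

// newestFirst walks the same sorted data from the end, so the limit is
// applied to the newest timestamps.
func newestFirst(ascending []uint64, n int) []uint64 {
	out := make([]uint64, 0)
	for i := len(ascending) - 1; i >= 0 && len(out) < n; i-- {
		out = append(out, ascending[i])
	}
	return out
}

func main() {
	timestamps := []uint64{10, 20, 30, 40, 50} // ascending, as a sorted index yields them
	fmt.Println(oldestReversed(timestamps, 2)) // [20 10]
	fmt.Println(newestFirst(timestamps, 2))    // [50 40]
}
```

Both results are sorted newest-first, which is why a test that only checks the ordering would not catch the mistake; a test has to check which items were selected.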

@pavolloffay
Member

@burmanm thanks for the PR. I missed it.

I am wondering whether we could completely replace in-memory with local storage. I will have a closer look next week.

Also, Travis seems to be broken for the last build.

@pavolloffay
Member

There was a discussion about this PR on the bi-weekly call.

Rather than introducing a new storage implementation, can this substitute for the current in-memory storage? What are the implications? Can it work without creating any files? If so, we could use this instead of in-memory.

@burmanm
Contributor Author

burmanm commented Apr 12, 2018

When the files are created in tmpfs, this is in-memory storage, since tmpfs is not persisted to disk. We could add an option to clean up the tmpfs files on shutdown if we wanted (a pod reboot, machine reboot, etc. will of course do that as well).

glide.yaml Outdated
@@ -55,3 +55,5 @@ import:
- package: github.com/go-openapi/validate
- package: github.com/go-openapi/loads
- package: github.com/elazarl/go-bindata-assetfs
- package: github.com/dgraph-io/badger

Member

It needs glide update to update the lock file with this change.

Contributor Author

Sadly, I'm not sure what the proper way to update this project's dependencies is. glide up will break Jaeger (for example, viper gets updated and Jaeger no longer compiles), because no version is locked in glide.yaml.

It also updates several other dependencies.

Contributor

What I did in a previous PR was to apply this change in a separate branch and then manually update the lock file only for the dependency I'm updating. Sucks, but works.

Member

This now needs to be switched to Gopkg.toml / dep.

@pavolloffay
Member

There are a lot of magic numbers in different places. Could you please define them as constants?

@pavolloffay
Member

Then when you are creating prefixes could you please extract those into functions?

// Options store storage plugin related configs
type Options struct {
    primary *NamespaceConfig
    others  map[string]*NamespaceConfig

Member

Why is this needed? Even the primary namespace?

Contributor Author

To be consistent with the other storage engines. No other reason really.

    SpanStoreTTL:   time.Hour * 72, // Default is 3 days
    SyncWrites:     false,          // Performance over consistency
    Ephemeral:      true,           // Default is ephemeral storage
    ValueDirectory: "",

Member

It would be good to use a reasonable default value, so that users can just set ephemeral to false and it works.

Contributor Author

I can make the default the application path + data/, for example. I don't think there's a sane default path for files.

Contributor

How about something inside /tmp/? When ephemeral is true, /tmp is appropriate.

I see that ioutil.TempDir is used when this is empty, so, ignore my last comment.
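
A minimal sketch of the fallback discussed in this thread follows. It is not the PR's factory code: resolveDataDir and its parameters are hypothetical, and os.MkdirTemp (Go 1.16+) stands in for the ioutil.TempDir call mentioned above. Whether the temporary directory actually lives on tmpfs depends on how the host mounts its temp directory.

```go
package main

import (
	"fmt"
	"os"
)

// resolveDataDir is a hypothetical helper. It returns the directory to store
// data in, plus a cleanup function to call on shutdown.
//   - a configured directory always wins and is never removed by cleanup;
//   - with ephemeral storage and no configured directory, a temporary
//     directory is created and cleanup removes it;
//   - persistent storage without a directory is reported as an error.
func resolveDataDir(ephemeral bool, configured string) (dir string, cleanup func() error, err error) {
	noop := func() error { return nil }
	if configured != "" {
		return configured, noop, nil
	}
	if !ephemeral {
		return "", noop, fmt.Errorf("a data directory is required when storage is not ephemeral")
	}
	dir, err = os.MkdirTemp("", "badger")
	if err != nil {
		return "", noop, err
	}
	return dir, func() error { return os.RemoveAll(dir) }, nil
}

func main() {
	dir, cleanup, err := resolveDataDir(true, "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer cleanup()
	fmt.Println("storing data in", dir)
}
```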

)

// NewOptions creates a new Options struct.
func NewOptions(primaryNamespace string, otherNamespaces ...string) *Options {

Member

Can you elaborate why we need namespaces? If not needed then remove or rename it to flag prefix :)

Contributor Author

See the previous comment: it is there to stay consistent with the other storage engines.

opts := badger.DefaultOptions

if f.Options.primary.Ephemeral {
    opts.SyncWrites = false

Member

If set to false, is data available for query immediately?

Does it make sense for tmpfs?

Contributor Author

Data is available immediately in either case. With tmpfs, whether sync is true or false makes no difference for durability (other than the extra overhead of an fsync call on tmpfs), and it never makes a difference for visibility. Writes always go to the page cache; this setting only controls whether fsync to disk is called after every write.
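
The visibility point can be shown at the plain file level. This sketch does not touch Badger or its SyncWrites option; writeAndReadBack and demo are hypothetical helpers that only illustrate that a write without fsync is still readable right away.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAndReadBack writes data to a new file in dir and reads it back through
// a separate open. When sync is false no fsync is issued, yet the read still
// sees the data: fsync affects durability, not visibility.
func writeAndReadBack(dir string, data []byte, sync bool) ([]byte, error) {
	path := filepath.Join(dir, "value.log")
	f, err := os.Create(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	if _, err := f.Write(data); err != nil {
		return nil, err
	}
	if sync {
		// Force the written data to stable storage before continuing.
		if err := f.Sync(); err != nil {
			return nil, err
		}
	}
	return os.ReadFile(path)
}

// demo runs the unsynced case in a throwaway directory.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "syncdemo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	got, err := writeAndReadBack(dir, []byte("span"), false)
	return string(got), err
}

func main() {
	got, err := demo()
	fmt.Println(got, err) // span <nil>
}
```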


// Seek all the services first
for it.Seek(serviceKey); it.ValidForPrefix(serviceKey); it.Next() {
    timestampStartIndex := len(it.Item().Key()) - 24

Member

Why 24?

Contributor Author

It comes from WriteSpan: 8 bytes for the timestamp + 16 bytes for the trace ID = 24.
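
One way to make these sizes self-documenting is sketched below. The 8 + 16 layout comes from the explanation above; the constant and function names are hypothetical, and the big-endian encoding is an assumption (it keeps byte-wise key order aligned with numeric timestamp order).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical names for the sizes explained above.
const (
	sizeOfTimestamp = 8                               // uint64 start time
	sizeOfTraceID   = 16                              // two uint64 halves: high, low
	sizeOfKeySuffix = sizeOfTimestamp + sizeOfTraceID // 24
)

// makeIndexKey appends the fixed-size suffix (timestamp + trace ID) to prefix.
func makeIndexKey(prefix []byte, startTime, traceHigh, traceLow uint64) []byte {
	key := make([]byte, len(prefix)+sizeOfKeySuffix)
	n := copy(key, prefix)
	binary.BigEndian.PutUint64(key[n:], startTime)
	binary.BigEndian.PutUint64(key[n+sizeOfTimestamp:], traceHigh)
	binary.BigEndian.PutUint64(key[n+sizeOfTimestamp+8:], traceLow)
	return key
}

// splitIndexKey recovers the timestamp and the raw trace ID bytes from the
// end of a key, rejecting keys that are too short to hold the suffix.
func splitIndexKey(key []byte) (startTime uint64, traceID []byte, err error) {
	if len(key) < sizeOfKeySuffix {
		return 0, nil, fmt.Errorf("key too short: %d bytes", len(key))
	}
	timestampStart := len(key) - sizeOfKeySuffix
	startTime = binary.BigEndian.Uint64(key[timestampStart : timestampStart+sizeOfTimestamp])
	traceID = key[len(key)-sizeOfTraceID:]
	return startTime, traceID, nil
}

func main() {
	key := makeIndexKey([]byte("service-p"), 1530878401, 0, 1)
	ts, id, err := splitIndexKey(key)
	fmt.Println(ts, len(id), err) // 1530878401 16 <nil>
}
```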

if p == nil {
    return ErrMalformedRequestObject
}
if p.ServiceName == "" && len(p.Tags) > 0 {

Member

does the number of tags matter?

Contributor Author

There's no restriction on the number of tags.

}

ids := make([][][]byte, 0, len(indexSeeks)+1)
if len(indexSeeks) > 0 {

Member

This seems redundant.

ids := make([][][]byte, 0, len(indexSeeks)+1)
if len(indexSeeks) > 0 {
    for i, s := range indexSeeks {
        indexResults, _ := r.scanIndexKeys(s, query.StartTimeMin, query.StartTimeMax)

Member

ignoring error?
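
A sketch of how the loop could surface scan errors instead of discarding them. This is not the PR's reader: scanFunc, collectTraceIDs, and the two stand-in scans are hypothetical, and the real scanIndexKeys also takes a time range that is left out here. The sketch also drops the outer length check (ranging over an empty slice is a no-op) and names the trace ID size.

```go
package main

import (
	"errors"
	"fmt"
)

const sizeOfTraceID = 16 // trace IDs occupy the last 16 bytes of an index key

// scanFunc is a hypothetical stand-in for an index scan over one key prefix.
type scanFunc func(prefix []byte) ([][]byte, error)

// collectTraceIDs runs one scan per prefix and returns the trace IDs found by
// each scan. The first scan error aborts the whole call and is returned with
// the failing prefix attached.
func collectTraceIDs(prefixes [][]byte, scan scanFunc) ([][][]byte, error) {
	ids := make([][][]byte, 0, len(prefixes))
	for _, p := range prefixes {
		keys, err := scan(p)
		if err != nil {
			return nil, fmt.Errorf("scanning index %q: %w", p, err)
		}
		perIndex := make([][]byte, 0, len(keys))
		for _, k := range keys {
			if len(k) < sizeOfTraceID {
				return nil, fmt.Errorf("index key too short: %d bytes", len(k))
			}
			perIndex = append(perIndex, k[len(k)-sizeOfTraceID:])
		}
		ids = append(ids, perIndex)
	}
	return ids, nil
}

// okScan returns a single key made of the prefix plus a zeroed trace ID.
func okScan(prefix []byte) ([][]byte, error) {
	key := append(append([]byte{}, prefix...), make([]byte, sizeOfTraceID)...)
	return [][]byte{key}, nil
}

// failingScan always fails, to show the error path.
func failingScan(prefix []byte) ([][]byte, error) {
	return nil, errors.New("iterator closed")
}

func main() {
	ids, err := collectTraceIDs([][]byte{[]byte("service-p")}, okScan)
	fmt.Println(len(ids), err)
	_, err = collectTraceIDs([][]byte{[]byte("service-p")}, failingScan)
	fmt.Println(err)
}
```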

}
ids = append(ids, make([][]byte, 0, len(indexResults)))
for _, k := range indexResults {
    ids[i] = append(ids[i], k[len(k)-16:])

Member

16?

Could you please define these magic numbers as constants with some comments?

Contributor Author

Size of TraceID is 16.

}

// Close Implements io.Closer
func (r *TraceReader) Close() error {

Member

If this is not needed, please remove it.

flagSet.Bool(
    nsConfig.namespace+suffixSyncWrite,
    nsConfig.SyncWrites,
    "If all writes should be synced immediately. This will greatly reduce write performance and will require fast SSD drives. Default is false.",

Contributor

> will require fast SSD drives

If I'm not on SSD, are you stopping the process? If not, what does "require" mean here?

Contributor Author

Mostly it means that this backend really wants SSDs, and if you're going to sync all writes, you probably want something with fast write times. I've removed the wording for now, since technically you could also use a DRAM-backed disk array for writes (though reads would most likely be slow).

flagSet.Bool(
    nsConfig.namespace+suffixEphemeral,
    nsConfig.Ephemeral,
    "Mark this storage ephemeral, data is stored in tmpfs (in-memory). Default is true.",

Contributor

The "default" part is added automatically by Viper. This is what it looks like when you run SPAN_STORAGE_TYPE=badger go run ./cmd/collector/main.go --help:

--badger.ephemeral                  Mark this storage ephemeral, data is stored in tmpfs (in-memory). Default is true. (default true)
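
The duplication can be reproduced in isolation. The sketch below uses only Go's standard flag package, which is an assumption for illustration and not Jaeger's actual CLI stack; the standard library also appends a "(default ...)" suffix for non-zero defaults, so a help string has no need to repeat it. usageFor and hasDefaultSuffix are hypothetical helpers.

```go
package main

import (
	"bytes"
	"flag"
	"fmt"
	"strings"
)

// usageFor registers one boolean flag with the given default and returns the
// help text that the flag package generates for it.
func usageFor(defaultValue bool) string {
	fs := flag.NewFlagSet("badger", flag.ContinueOnError)
	var buf bytes.Buffer
	fs.SetOutput(&buf)
	// The help string deliberately says nothing about the default value.
	fs.Bool("badger.ephemeral", defaultValue, "Mark this storage ephemeral, data is stored in tmpfs (in-memory).")
	fs.PrintDefaults()
	return buf.String()
}

// hasDefaultSuffix reports whether the generated help mentions a true default.
func hasDefaultSuffix(usage string) bool {
	return strings.Contains(usage, "(default true)")
}

func main() {
	fmt.Print(usageFor(true))
	fmt.Println(hasDefaultSuffix(usageFor(true)))  // true
	fmt.Println(hasDefaultSuffix(usageFor(false))) // false
}
```

With the standard library, a zero-value default such as false is not printed at all, which is another reason to leave default values out of hand-written help strings.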

}

func initFromViper(cfg *NamespaceConfig, v *viper.Viper) {
    cfg.Ephemeral = v.GetBool(cfg.namespace + suffixEphemeral)

Contributor

If this is also intended to replace the current in-memory storage, we need a max-traces option as well. See #845/#842

Contributor

Do we need a max-traces option? or just a strategy for dealing with a full disk?

@burmanm
Contributor Author

burmanm commented Jun 29, 2018

Rebased. I also fixed a generic issue in Jaeger's testing framework, which was comparing pointer addresses instead of the actual data.

@SwarnimRaj

@burmanm I am trying to test your Badger storage backend for my use case #894 which requires persisting traces to some backend, and being able to send the storage files over a network when needed so that a Jaeger instance on a remote machine can use it to display the traces recorded so far.

I cloned your changes but am unable to figure out how to restore the Jaeger instance using previously persisted traces. It seems f.Options.primary.Ephemeral should be set to false so that the storage is not lost but I am unable to figure out how to restore traces and which files would be required for a remote Jaeger instance to read from.

Could you please list the steps? Long term, I think it would be better if you could write a README on how to use the Badger storage and all its features.


// GetServices fetches the sorted service list that have not expired
func (r *TraceReader) GetServices() ([]string, error) {
    return r.cache.GetServices()

I noticed an issue with this implementation: when the Badger factory is initialized from a pre-existing directory, the services list is not populated with the services from the pre-existing traces, even though the traces are accessible through a direct REST API call.
The probable cause is that you rely on a cache that is not pre-populated.


// GetOperations fetches operations in the service and empty slice if service does not exists
func (r *TraceReader) GetOperations(service string) ([]string, error) {
    return r.cache.GetOperations(service)

I noticed an issue with this implementation: when the Badger factory is initialized from a pre-existing directory, the operations list is not populated with the operations from the pre-existing traces, even though the traces are accessible through a direct REST API call.
The probable cause is that you rely on a cache that is not pre-populated.

Contributor Author

NewCacheStore() calls prefillCaches()

Contributor Author

The unit test that does persisting (mentioned in other comment) also seems to indicate the caches are prefilled correctly.

I should add this type of unit test to the suite.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a backup of the Badger DB and restored it to a location that I passed as the key and value directory to the Jaeger standalone run. I noticed that the loadServices() method does find the services present in the traces of the backup, but finds their TTL to be 0. This causes the services to be deleted during GetServices(), because your logic treats them as expired. I did take a look at your test, and there the TTL is somehow 72 hours ahead of the current time, so the test passes.
I am not sure why services loaded from a pre-existing Badger DB have a TTL of 0. Do you know what might be causing this?

For now, commenting out the delete line in GetServices() helps populate the services in the UI; the same goes for GetOperations(). Is there a danger in this approach? Why do you perform the check (v > t) in GetServices?

PS: My use case demands that backed-up data always be accessible.

Contributor Author

I can't replicate that behavior. In my testing I see:

Key: key="\x81service-p\x00\x05pSgE\x89\xbd\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01", version=1, meta=40, 1531137600, time now: 1530878401
Key: key="\x82service-poperation-p\x00\x05pSgE\x89\xbd\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x01", version=1, meta=40, 1531137600, time now: 1530878401

That is, the services do expire later than the current time (as they should). The reason for the v > t check is that we don't want to return ghost services or operations. That would hurt usability if one has a lot of services / operations that keep changing.

If you need backups that never expire, then I guess TTL is the wrong approach for you. What you'd want is something closer to a snapshot + purge of the snapshotted data (with no TTL, so that you would manually take care of retention policies).

Certainly doable, but maybe it is a bit of feature creep for this PR?
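
The expiry check discussed in this thread can be sketched as follows. This is not the PR's cache.go: serviceCache and its methods are hypothetical, and times are plain Unix seconds. The example expiry and current time are the values from the key dump above.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// serviceCache maps a service name to the time at which its newest data expires.
type serviceCache struct {
	mu       sync.Mutex
	services map[string]uint64 // service name -> expiry, Unix seconds
}

func newServiceCache() *serviceCache {
	return &serviceCache{services: make(map[string]uint64)}
}

// update records a service, keeping the latest expiry seen for it.
func (c *serviceCache) update(service string, expiresAt uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if current, ok := c.services[service]; !ok || expiresAt > current {
		c.services[service] = expiresAt
	}
}

// getServices returns the sorted names that expire after now and deletes the
// rest, so services whose data has aged out stop showing up.
func (c *serviceCache) getServices(now uint64) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	names := make([]string, 0, len(c.services))
	for name, expiresAt := range c.services {
		if expiresAt > now {
			names = append(names, name)
		} else {
			delete(c.services, name) // deleting during range is allowed in Go
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	c := newServiceCache()
	c.update("frontend", 1531137600) // expiry from the key dump above
	c.update("restored", 0)          // an entry loaded with expiry 0
	fmt.Println(c.getServices(1530878401)) // [frontend]
}
```

Under this check, an entry loaded with an expiry of 0 is always treated as expired, which matches the behavior reported for the restored backup.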

@burmanm
Contributor Author

burmanm commented Jul 4, 2018

@SwarnimRaj Here's a very rudimentary unit test that shows how to load an existing file: https://gist.github.com/burmanm/04021a32d8fb728792eb49ab0044dae8

Now, as for backup & restore, there's a separate tool for that (which should take care of LSM syncing). I have not considered such a use case, and for that reason there are no instructions for it either. But it can be done by following the instructions here: https://github.com/dgraph-io/badger#database-backup

Signed-off-by: Michael Burman <yak@iki.fi>
Signed-off-by: Michael Burman <yak@iki.fi>
Signed-off-by: Michael Burman <yak@iki.fi>
Signed-off-by: Michael Burman <yak@iki.fi>
@burmanm
Contributor Author

burmanm commented Feb 13, 2019

Changed the dependencyreader to use the spanstore instead of reading directly from the DB. The downside is increased memory usage, since the []model.Trace must be kept in memory while loading. Previously each model.Trace could have been thrown away by escape analysis.

In any case, no context was passed to the dependencyreader, unlike the SpanReader/SpanWriter, so I used context.Background() to fetch data from the spanstore into the dependencystore.

@jpkrohling
Contributor

Once the merge conflict is solved, can this be merged, or is there still something pending?

Yuri Shkuro added 2 commits April 2, 2019 13:05
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
@yurishkuro
Member

I think we mostly need to address #1389.

@jaegertracing jaegertracing deleted a comment from codecov bot Apr 2, 2019
Signed-off-by: Yuri Shkuro <ys@uber.com>
@jaegertracing jaegertracing deleted a comment from codecov bot Apr 2, 2019
Signed-off-by: Yuri Shkuro <ys@uber.com>
@codecov

codecov bot commented Apr 2, 2019

Codecov Report

Merging #760 into master will decrease coverage by 0.17%.
The diff coverage is 97.79%.

@@            Coverage Diff             @@
##           master     #760      +/-   ##
==========================================
- Coverage     100%   99.82%   -0.18%     
==========================================
  Files         165      172       +7     
  Lines        7510     8146     +636     
==========================================
+ Hits         7510     8132     +622     
- Misses          0        7       +7     
- Partials        0        7       +7
| Impacted Files | Coverage Δ |
|---|---|
| plugin/storage/badger/options.go | 100% <100%> (ø) |
| plugin/storage/factory.go | 100% <100%> (ø) ⬆️ |
| model/sort.go | 100% <100%> (ø) ⬆️ |
| plugin/storage/badger/spanstore/cache.go | 100% <100%> (ø) |
| plugin/storage/badger/stats_linux.go | 100% <100%> (ø) |
| plugin/storage/badger/factory.go | 100% <100%> (ø) |
| plugin/storage/badger/dependencystore/storage.go | 94.73% <94.73%> (ø) |
| plugin/storage/badger/spanstore/reader.go | 96.58% <96.58%> (ø) |
| plugin/storage/badger/spanstore/writer.go | 97.22% <97.22%> (ø) |
| ... and 6 more | |

@yurishkuro
Member

So the build is green, and the code coverage decreases are only on error checks. I looked around for tools that allow excluding certain lines from code coverage, but didn't find any.

Yuri Shkuro added 3 commits April 3, 2019 17:08
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
@yurishkuro yurishkuro merged commit 1703bae into jaegertracing:master Apr 3, 2019
@manishrjain

Amazing! Can't wait to use it.

@pd40

pd40 commented Apr 22, 2019

I am looking forward to using this as well!

Forgive me if this has already been covered, but will the deployment instructions be updated to include badger as part of this change?

https://github.com/jaegertracing/documentation/blob/master/content/docs/1.11/deployment.md

@yurishkuro
Member

It can already be used with jaegertracing/all-in-one latest. It won't work with any other deployment mode though, because Badger runs in-process and cannot be shared by the collector and query services.

@sherwoodzern

I read that this alternative storage option is available; however, the documentation still only mentions ElasticSearch, Cassandra, and Kafka. I'm trying to understand how I would configure this latest option. I want to have all traces written to a non-ephemeral storage. Does this option provide this capability without having to install ElasticSearch, Cassandra, or Kafka? Any examples of deployment using this approach discussed in this thread?

@yurishkuro
Member

Storage options are documented in the Deployment section of the docs

@sherwoodzern

Thank you for the response regarding the storage options. I want to confirm that within my Kubernetes deployment, I can create a PVC and specify that storage to be used by Badger. This will require that I deploy a Badger container and would assume that I will deploy as a daemonset; is this correct? I then specify the storage endpoint when configuring Jaeger with badger. In this manner, I should get all of the traces stored in a non-ephemeral storage.

@yurishkuro
Member

@sherwoodzern as explained in the documentation:

> Badger is an embedded local storage, only available with all-in-one distribution.

You cannot use it with normal production (i.e. scalable) installation of Jaeger, because the storage is embedded in a single process that includes both collection and UI components.
