Additional storage backends #638

Open
yurishkuro opened this Issue Jan 8, 2018 · 11 comments

@yurishkuro
Member

yurishkuro commented Jan 8, 2018

Opening this issue to keep track of other related issues.

  • ScyllaDB #197
  • InfluxDB #272
  • Netflix Dynomite #331
  • Amazon DynamoDB #421
  • Local storage in all-in-one #551
  • Badger KV (memory/disk) #760
  • BigTable #1208

Relevant issue: plugin support #422.

@nbettiol

nbettiol commented Jan 9, 2018

Did you remove the flags for elasticsearch in jaeger-collector? I'm testing with the Docker image, whose version is:

{"gitCommit":"dbd5db721fc59431b1e64874cc7d6265d89ec917","GitVersion":"v1.1.0","BuildDate":"2018-01-08T21:56:21Z"}

and I cannot see the elasticsearch flags.

@black-adder

Collaborator

black-adder commented Jan 9, 2018

It looks like you're using latest instead of 1.1. We recently moved some of the flags around so that we can better support plugins (#625). With latest, you have to set the environment variable SPAN_STORAGE=elasticsearch to get the elasticsearch flags. I'd recommend that you use 1.1, since this change will be a part of 1.2 and will be documented at that time.

@nbettiol

nbettiol commented Jan 9, 2018

Thanks for the reply. Yes, I was using the latest version. I will use 1.1.

@fzakaria

fzakaria commented Jan 16, 2018

I would love to see a SQL option (whichever ANSI SQL subset gives the least vendor lock-in).
Setting up Cassandra / ElasticSearch might be too ambitious for projects that want distributed tracing but honestly don't have the TPS to warrant a distributed datastore.

@ringerc

ringerc commented Feb 3, 2018

Since I work with PostgreSQL, I sure wouldn't complain. But honestly I'm not sure a SQL db is an optimal store for largely free-form metrics of this nature. PostgreSQL at least offers the jsonb type for indexable free-form data. If you're trying to do this in a vendor neutral way you'll land up with your own json blobs, or doing EAV, and both of those are terrible. ANSI SQL is a poor fit for variable-structured or key/value form data and you'll need some vendor extensions to get usable performance.

But you inevitably land up with someone putting an ORM on top to "abstract" the DB. Then the ORM performs terribly, gobbles memory and everyone says "the SQL backend is slow, use instead".
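
For readers unfamiliar with the two vendor-neutral layouts mentioned above, the following sketch contrasts them for one span's tags: EAV produces one row per tag, while the blob approach stores a single opaque JSON string per span. The types and function names are hypothetical, and tags are simplified to string values for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// eavRow is one entity-attribute-value row: one row per tag per span.
type eavRow struct {
	SpanID string
	Key    string
	Value  string
}

// toEAV flattens a span's tags into rows, sorted by key for stable output.
func toEAV(spanID string, tags map[string]string) []eavRow {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	rows := make([]eavRow, 0, len(keys))
	for _, k := range keys {
		rows = append(rows, eavRow{SpanID: spanID, Key: k, Value: tags[k]})
	}
	return rows
}

// toBlob serializes all tags into a single JSON document that a plain
// ANSI SQL database would store as text, without being able to index into it.
func toBlob(tags map[string]string) (string, error) {
	b, err := json.Marshal(tags)
	return string(b), err
}

func main() {
	tags := map[string]string{"http.method": "GET", "error": "true"}

	// EAV: finding spans by tag means joining against this table per tag.
	for _, r := range toEAV("span-1", tags) {
		fmt.Printf("%s | %s | %s\n", r.SpanID, r.Key, r.Value)
	}

	// Blob: one column, but tag lookups need string matching or a vendor
	// extension such as PostgreSQL's jsonb.
	blob, err := toBlob(tags)
	if err != nil {
		panic(err)
	}
	fmt.Println(blob)
}
```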

@pavolloffay

Member

pavolloffay commented Feb 5, 2018

Related issue to this one is #551. Upvote if you are interested in it.

@SwarnimRaj

SwarnimRaj commented Jun 29, 2018

New related issue: Files #894

@wy100101

wy100101 commented Aug 1, 2018

We are looking at using BigQuery as a storage layer. Presumably this could work with a SQL storage option. SQL can be a generic way to deal with columnar data stores. I would complain about a BigQuery-specific solution, but I think there is a place for a generic SQL interface beyond RDBs.

@yurishkuro

Member

yurishkuro commented Aug 1, 2018

I assume that even if some database can be treated as SQL and accessed via the standard database/sql API, we still need to statically import the actual driver. Granted, this may be less maintenance than a dedicated SpanStorage implementation. However, now that the protobuf model has been merged, nothing is blocking us from moving on storage plugin development, e.g. using something like the HashiCorp gRPC plugin framework.

@isaachier

Contributor

isaachier commented Aug 1, 2018

Our model is sufficiently simple to warrant looking into using an ORM to support a large number of backends. I'll take a look at what's available. Reread above and understand what @yurishkuro means.

@bruth

bruth commented Aug 6, 2018

Giving my two cents: ANSI SQL could work for small workloads, so it may be useful for lower-throughput applications that still want to benefit from this tool.

I will also throw out there that Timescale (a Postgres extension) may be a good fit for the required high write throughput.
