Add storage support (ex table service or docdb) #8

codefromthecrypt · 2017-02-09T15:32:53Z

Folks have been expressing interest in storage, particularly because, while apps can transmit to EventHub, eventually that data needs to land somewhere :)

Ex. DocDB https://groups.google.com/d/msg/zipkin-dev/bTBKUEjLtKg/yLbvVkFYAQAJ
or possibly table service @kevinbayes

There are likely some pros and cons to discuss here

kevinbayes · 2017-02-09T19:36:59Z

@adriancole, yeah I need to go through Zipkin to see if Azure Table Storage would be a good fit for this. I suspect it might be.

The attractive aspect of Table Storage is it allows a user to run on azure with "out-of-box" services.

aliostad · 2017-02-10T09:38:16Z

A word of caution against Azure Table Storage: it can be painfully slow to return +50 items from the same partition key. I have reported it to Microsoft. It is basically more or less a BigTable impl but lacks the finesse of DynamoDB or Cassandra.

kevinbayes · 2017-02-11T00:02:38Z

@aliostad that is also my view on table storage, that is why I need to investigate what the data-structure would need to look like to support table storage.

DocumentDB can be used but is a bit pricey, so I have also reported the cost to Microsoft to see if there are any plans to reduce it in the future. Besides that, DocumentDB is a good fit (even with the watered-down feature set compared to other stores).

aliostad · 2017-02-11T10:04:05Z

@kevinbayes yeah need to know how the data gets queried to know what is the best to use. Good news is Elasticsearch is already used by many companies using Azure - for me too it is my storage of choice. I have started introducing Cassandra in our company but I lack hardcore experience using it in anger.

Also DocDB, from the little I know, more or less is a glorified key-value store, not sure if it is the right solution here - I could be wrong.

codefromthecrypt · 2017-02-11T10:26:53Z

the hardest part of the zipkin query api is an equals match on nested data. ex. http.method=GET would go against span.binaryAnnotations[].key=http.method value=GET

Here's the api: http://zipkin.io/zipkin-api/#/
And here's the actual type that describes in java what's needed: https://github.com/openzipkin/zipkin/blob/master/zipkin/src/main/java/zipkin/storage/QueryRequest.java

I think in the past people have looked at cloud storage (or other services) as a convenience, taking away the responsibility of knowing things like elasticsearch and how to prune data etc. I think that at least one Azure storage option will likely exist; it is just a matter of someone doing it. Making a storage driver is far more work than something like a collector, so indeed which of the two to pick is probably an important decision.
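To make the nested equals match concrete, here is a minimal in-memory sketch. The `Span` and `BinaryAnnotation` classes and the helper names are hypothetical stand-ins for illustration, not Zipkin's actual model or storage SPI; a real store would have to answer the same question with an index rather than a scan.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BinaryAnnotation:
    # Hypothetical stand-in for one entry of span.binaryAnnotations[]
    key: str
    value: str


@dataclass
class Span:
    # Hypothetical, heavily reduced span: only what the match needs
    trace_id: str
    name: str
    binary_annotations: List[BinaryAnnotation] = field(default_factory=list)


def parse_query(expr: str) -> Tuple[str, str]:
    """Split a query such as 'http.method=GET' into (key, value)."""
    key, _, value = expr.partition("=")
    return key, value


def span_matches(span: Span, key: str, value: str) -> bool:
    """True when any nested annotation has exactly this key and value."""
    return any(a.key == key and a.value == value for a in span.binary_annotations)


def find_trace_ids(spans: List[Span], key: str, value: str) -> List[str]:
    """Return distinct trace ids, in first-seen order, that contain a matching span."""
    found: List[str] = []
    for span in spans:
        if span_matches(span, key, value) and span.trace_id not in found:
            found.append(span.trace_id)
    return found


sample = [
    Span("a", "get /users", [BinaryAnnotation("http.method", "GET")]),
    Span("a", "db query", [BinaryAnnotation("sql.query", "select 1")]),
    Span("b", "post /users", [BinaryAnnotation("http.method", "POST")]),
]
matching = find_trace_ids(sample, *parse_query("http.method=GET"))
```

The point of the sketch is that the key and the value must match on the same nested element, which is why stores without nested or secondary indexing struggle with this query.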

aliostad · 2017-02-11T10:50:36Z

Do we know why DocDB impl stopped? Was it lack of interest or technical hindrance?

codefromthecrypt · 2017-02-11T12:39:43Z

> Do we know why DocDB impl stopped? Was it lack of interest or technical hindrance?

@prbarl my guess was this just fell off radar, right?

aliostad · 2017-02-13T09:27:27Z

> the hardest part of the zipkin query api is an equals match on nested data.

Well, that is good to know. But I am thinking Azure Table Storage will have problems even loading the root page that lists all recent spans. How does that work in Cassandra? Are we storing spanIds against perhaps a one-minute partition key and then looking up those spans?
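The one-minute partition key idea in the question above can be sketched as follows. This only illustrates the bucketing scheme being asked about; it is not a description of the actual Cassandra schema, and the `RecentSpanIndex` class, the in-memory dict, and the assumption that timestamps are epoch microseconds are all made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List

MICROS_PER_SECOND = 1_000_000
MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND


def minute_bucket(ts_micros: int) -> str:
    """Partition key for a timestamp: the UTC minute, e.g. '201702130926'."""
    dt = datetime.fromtimestamp(ts_micros // MICROS_PER_SECOND, tz=timezone.utc)
    return dt.strftime("%Y%m%d%H%M")


class RecentSpanIndex:
    """Hypothetical index: span ids grouped under a one-minute partition key."""

    def __init__(self) -> None:
        self._buckets: Dict[str, List[str]] = defaultdict(list)

    def add(self, ts_micros: int, span_id: str) -> None:
        self._buckets[minute_bucket(ts_micros)].append(span_id)

    def recent(self, end_ts_micros: int, lookback_minutes: int, limit: int) -> List[str]:
        """Walk buckets from newest to oldest until `limit` ids are collected."""
        ids: List[str] = []
        end_minute = end_ts_micros // MICROS_PER_MINUTE
        for minute in range(end_minute, end_minute - lookback_minutes, -1):
            key = minute_bucket(minute * MICROS_PER_MINUTE)
            ids.extend(self._buckets.get(key, []))
            if len(ids) >= limit:
                break
        return ids[:limit]
```

With this layout, listing recent spans costs one partition read per minute of lookback, so the concern about slow reads of many items from a single partition applies to busy minutes.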

codefromthecrypt · 2017-02-13T09:31:10Z

I've put a nag in my brain to answer completely the couple of ways things are done, which should help make the decision easier. As I'm displaced, it might be a few days. If you want to check earlier, then peek at the cassandra schema or the elasticsearch indexing template for hints.

praveenbarli · 2017-02-14T08:45:46Z

@adriancole Yes, got deviated but not completely off. I checked on Azure Search, DocDb and put efforts in using DocDb storage for trace storage and retrieval. Had to switch to different things in between but started looking into Dependencies last week.

praveenbarli · 2017-04-17T22:33:15Z

I have been working on a Zipkin plugin for Application Insights as storage. I wanted people on this forum to know about the reasons behind this work, based on our discussions internally and with customers.

Existing systems in Azure are already using Application Insights for telemetry (although without this layer, that telemetry doesn't include job-level tracing). Users don't want to add a new storage system when they can do with the one they are using.
AI is much cheaper than DocDB, Azure Search, and other cloud data stores we have looked at.
AI is MSFT's telemetry brand. When we offer other back-ends, MSFT shops say "I would be more comfortable if this built on top of the existing MSFT telemetry system".
For these reasons (and maybe others too), customers have specifically requested an AI-based system.
Note that none of these are dev-level, feature-level arguments. As a dev, for throughput, flexibility, and rich API, I would definitely pick DocDB over AI (which is why we originally did). But there are business, economic, and existing system constraints that pushed us to offer an AI store.
At a dev-level perspective, what you can say about AI is (i) auto-purging is nice and (ii) the AI system in Azure offers a rich web UI which the AI team can extend to present this data to existing Azure customers in a place they are used to looking at telemetry data; we are working with the AI team to make this happen.

codefromthecrypt · 2017-04-18T00:16:49Z

Thanks for the rundown, Praveen. Well done. Interested in feedback from others who use zipkin and azure, too. Are there any features AI does not or cannot support yet? You mentioned cost, how precisely is AI cheaper than DocDb? There are a lot of facets to cost including cost of ingress, query and retention. Finally, when you say customers, do you mean customers with specific interest in Zipkin? If so, are any on github to validate work done here works for them?

praveenkbarli · 2017-04-18T08:57:14Z

@adriancole you are welcome! Here are the answers to your questions
Feature-wise, I still have to check on Spark support.
Cost comparison:
I am putting the standard costs but would like to explore the links as I am not the best at pricing.
Storage for AI is 2.30 /GB

DocDB 0.25/GB
Reserved RUs /second (per 100 RUs, 400 RUs minimum) with 0.008 per hour - about 23.81 /month
RU is the unit of throughput.

https://azure.microsoft.com/en-us/pricing/details/application-insights/
https://azure.microsoft.com/en-us/pricing/calculator/

These customers are not on Github but were interested to know about Zipkin.
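The "about 23.81 /month" figure can be reproduced from the quoted rates. The sketch below assumes a 744-hour (31-day) month, which is an inference from the numbers rather than something stated in the comment, and the function name and parameters are hypothetical.

```python
HOURS_PER_MONTH = 744  # assumption: 31 days * 24 hours


def docdb_monthly_cost(
    stored_gb: float,
    reserved_rus: int,
    storage_per_gb: float = 0.25,     # quoted DocDB storage price per GB
    per_100_rus_hour: float = 0.008,  # quoted price per 100 RUs per hour
    min_rus: int = 400,               # quoted minimum reservation
) -> float:
    """Estimate a monthly DocDB bill from storage plus reserved throughput."""
    rus = max(reserved_rus, min_rus)
    units = -(-rus // 100)  # round up to whole blocks of 100 RUs
    throughput_cost = units * per_100_rus_hour * HOURS_PER_MONTH
    return stored_gb * storage_per_gb + throughput_cost


minimum_throughput_only = round(docdb_monthly_cost(0, 400), 2)  # 23.81
```

In other words, the 23.81 figure is the floor for reserved throughput alone (4 blocks of 100 RUs at 0.008 per hour), before any per-GB storage charge is added.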

aliostad · 2017-04-18T09:31:02Z

2 cents from me.

From someone who has been working in Azure for the last 5 years, I must say Application Insights is a failure. The UI is slow, clunky and frankly unusable. Microsoft have changed their M&A story many times now, and anyone doing serious work in Azure would need to build their own dashboards, as we have. But building support for something is different from using it: it would be good to support AI for entry-level usage and to boost usage in all scenarios.

Would be useful to mention caveats.

codefromthecrypt · 2017-04-18T10:09:07Z

thanks @aliostad for the feedback. keen insight that "building support for something is different from using it".

one thing I'm concerned about is the maintenance aspect.

In most clouds, deploying storage is very straightforward. Azure seems to have a cli where you can easily provision docdb, including choosing which of all regions you can use. https://docs.microsoft.com/en-us/azure/documentdb/documentdb-automation-resource-manager-cli This looks reasonably easy to set up with automation using uncomplicated commands. The fact that I can choose south-east asia means my write latency won't be terrible when testing.

It seems AI is limited both in regions and also by more complex configuration. It isn't clear how to change the retention policy which is important when people go prod as usually trace retention is days (don't want to multiply the cost for unneeded long retention).

Moreover, unlike documentdb, there's no spark driver in place, which means things like dependency graph linking will be custom or much more code to cover edge cases when navigating more data. documentdb frankly is a better fit.

Let's say we do AI anyway.. how would that impact us?

Right now, there's a separate repo for zipkin-stackdriver, run by the goog team who support reusing their APM's tracing service. This one is more straightforward as it only deals with traces and the format is very similar to zipkin's. They also directly support it, including releasing it. This all came from direct and visible customer requests (ex users literally on github asking and validating). Finally, the product is free. We are rarely impacted by stackdriver except for bug fixes.

What's different here is that the scope of the product is much wider (a full APM). There are costs involved and as yet no direct demand from someone who wants to use it. It also seems more limited and complex vs DocDB. Unlike stackdriver, we are talking about hosting this codebase here. This is kiting work and attention in a way we don't typically behave. Usually we wait for users to request something before adding a large amount of code to support it.

OTOH, it "seems" MS are going to directly support this. That could help mitigate some of the problems and maybe even address product gaps.

Things to think about

praveenbarli · 2017-04-21T00:47:52Z

@aliostad @adriancole Thanks for your feedback/comments.

praveenbarli · 2017-05-11T00:25:18Z

Here is the PR for supporting AppInsights storage. #27
I would like to thank @adriancole and team at MSFT for your support.

clehene · 2017-05-19T00:55:04Z

I think Cosmos/Doc db is the best (hosted) choice.

Here's an overview

CosmosDB (replaces DocumentDB and most others dbs..)

This is one of the most versatile and reliable (SLA covers availability > 99.99% and provides latency guarantees of 10/15ms for read / write) services in Azure.
It supports multiple APIs from KV to Table, Document and Graph. I've mostly looked at Document.

The actual cost will depend on actual data modeling and I suspect this would take most time for an implementation, even if only to figure out what would make most sense.

Limits

Can't find the limits page but it can be provisioned to over 250,000 RUs / second

SLA

availability, P99 latency and throughput:

< 99.99% - 10% discount
< 99% - 25% discount

availability and throughput are both calculated as error rates as a sum of errors per hour divided by hours in a month, with throughput violation as throttle errors that happen within provisioned IOPS
latency is defined as percentile for successful requests

I recommend reading the SLA as an example of a good SLA :)
https://azure.microsoft.com/en-us/support/legal/sla/cosmos-db/v1_0/
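As a rough illustration of how the availability figure and discount tiers above fit together, here is a sketch that averages hourly error rates over a month and maps the result onto the two thresholds listed. It is a simplification of the summary in this comment, not the SLA's legal definition, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def monthly_availability(hourly_error_rates: Sequence[float]) -> float:
    """Availability in percent: 100 minus the average hourly error rate.

    Each entry is the fraction of failed requests in one hour (0.0 to 1.0).
    """
    if not hourly_error_rates:
        raise ValueError("need at least one hour of data")
    average_error_rate = sum(hourly_error_rates) / len(hourly_error_rates)
    return 100.0 * (1.0 - average_error_rate)


def service_credit(
    availability_pct: float,
    tiers: Tuple[Tuple[float, int], ...] = ((99.0, 25), (99.99, 10)),
) -> int:
    """Discount percent for a month, checking the lowest threshold first."""
    for threshold, credit in tiers:
        if availability_pct < threshold:
            return credit
    return 0


# One fully failed hour in a 744-hour month drops availability below 99.99%.
one_bad_hour = monthly_availability([0.0] * 743 + [1.0])
```

Under these tiers a single fully failed hour in a month already lands in the 10% bracket, which shows how strict the 99.99% threshold is.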

Pricing

1 read of 1KB requires 1 RU
Storage $0.25 / GB / month
100RUs / second $0.008/h

Application Insights

I've done a very basic evaluation of AI for functionality, limits and pricing including an actual functional test.

My first subjective opinion was that it's half baked and clunky, but also a work in progress, so perhaps it could be revisited later on.
I don't believe AI is feasible as a tracing backend, and IMO not feasible for serious telemetry use-cases in general either.

Limits

relevant 32k events / second over a minute
https://github.com/Microsoft/azure-docs/blob/master/includes/application-insights-limits.md

Pricing

The cost structure is a bit weird too and the cost is extremely high https://azure.microsoft.com/en-us/pricing/details/application-insights/

Basically you get 1GB free / month and then it costs $2.3/GB. Not sure how the $15/node would apply.
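A small sketch of the data charge described above, using only the two numbers quoted (1 GB free per month, then $2.3/GB). The $15/node charge is left out because it is unclear how it would apply; the function name and parameters are hypothetical.

```python
def app_insights_monthly_data_cost(
    ingested_gb: float,
    free_gb: float = 1.0,  # quoted free allowance per month
    per_gb: float = 2.3,   # quoted price per GB beyond the allowance
) -> float:
    """Estimate the monthly data charge; ignores any per-node fee."""
    billable_gb = max(ingested_gb - free_gb, 0.0)
    return billable_gb * per_gb
```

For example, ingesting 101 GB of span data in a month would be billed for 100 GB under these assumptions, so the charge grows linearly with trace volume.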

SLA

First, it's not quite realtime, but I couldn't figure out a rule on how latent it is.
However the SLA specifies up to 2h delays
The availability SLO is 99.9% with 10% discount below that and 25% discount below 99%. But note that downtime is additional latency after the 2 allowed hours.
https://azure.microsoft.com/en-us/support/legal/sla/application-insights/v1_0/

praveenbarli · 2017-05-19T04:19:39Z

@clehene Thanks for your input.

FYI, the App Insights SLA for latency applies when you do not flush your writes immediately, which is recommended when you use AI just as an APM. But when using it as data storage (as the AI plugin does), we flush spans as they come, and I see latency in minutes (~2-5 minutes). However, I agree with you by and large that DocDB is a better alternative in terms of latency and availability, as I also mentioned in my previous comments, and AI support is not based on these alone.

Regarding the plugin for Cosmos/Doc DB, I am working on it and will keep you informed of updates. I would also like to hear from other users interested in this plugin.

clehene · 2017-05-19T13:48:01Z

@praveenbarli got it, thanks for clarifying. I created #30

SergeyKanzhelev · 2017-06-21T20:38:55Z

@aliostad thanks for the feedback on Application Insights. I hope it will surprise you with all the new features and the improved UI we are building. You should take a look at Analytics and Live Stream - a killer combination of a fast and powerful query language on historical data and a live unfiltered view of current service health. Rich curated experiences are coming soon.

I started working with @praveenbarli on Zipkin to Application Insights to match concepts, not use Application Insights as just a storage of indexed json blobs.

Created #33 to track and discuss this work further

aliostad · 2017-06-23T08:24:52Z

Thanks @SergeyKanzhelev I appreciate your comments.

But I would also appreciate it if you could refrain from marketing language (surprise you with new features, killing combination, rich curated experience) when it comes to a paid service, especially when it is stated by an employee of the vendor.

Application Insights has a way to go to impress people who have seen the light, but for those who don't have any monitoring it is certainly useful. I am sure Microsoft is working to improve it.

SergeyKanzhelev · 2017-06-23T15:50:43Z

@aliostad not meant to offend you with my emotional comment. I'm an engineer, not working for marketing. And yes, I am personally working to improve it. If (or when) you'll decide to try it again next time - I'll be happy to listen for more feedback and walk you thru my favorite features.

clehene mentioned this issue May 19, 2017

CosmosDB / DocDB storage support #30

Open
