This repository has been archived by the owner on Dec 10, 2018. It is now read-only.

Add storage support (ex table service or docdb) #8

Open
codefromthecrypt opened this issue Feb 9, 2017 · 23 comments

Comments

@codefromthecrypt

Folks have been expressing interest in storage, particularly because, while apps can transmit to EventHub, eventually that data needs to land somewhere :)

Ex. DocDB https://groups.google.com/d/msg/zipkin-dev/bTBKUEjLtKg/yLbvVkFYAQAJ
or possibly table service @kevinbayes

There are likely some pros and cons to discuss here

@kevinbayes
Collaborator

@adriancole, yeah I need to go through Zipkin to see if Azure Table Storage would be a good fit for this. I suspect it might be.

The attractive aspect of Table Storage is it allows a user to run on azure with "out-of-box" services.

@aliostad
Collaborator

A word of caution against Azure Table Storage: it can be painfully slow to return 50+ items from the same partition key. I have reported it to Microsoft. It is more or less a BigTable implementation but lacks the finesse of DynamoDB or Cassandra.

@kevinbayes
Collaborator

@aliostad that is also my view on table storage, that is why I need to investigate what the data-structure would need to look like to support table storage.

DocumentDB can be used but is a bit pricey, so I have also reported the cost to Microsoft to see if there are any plans to reduce it in the future. Besides that, DocumentDB is a good fit (even with the watered-down feature set compared to other stores).

@aliostad
Collaborator

aliostad commented Feb 11, 2017

@kevinbayes yeah, we need to know how the data gets queried to know what is best to use. Good news is Elasticsearch is already used by many companies using Azure - for me too it is my storage of choice. I have started introducing Cassandra in our company but I lack hardcore experience using it in anger.

Also DocDB, from the little I know, more or less is a glorified key-value store, not sure if it is the right solution here - I could be wrong.

@codefromthecrypt
Author

codefromthecrypt commented Feb 11, 2017 via email

@aliostad
Collaborator

Do we know why DocDB impl stopped? Was it lack of interest or technical hindrance?

@codefromthecrypt
Author

codefromthecrypt commented Feb 11, 2017 via email

@aliostad
Collaborator

the hardest part of the zipkin query api is an equals match on nested data.

Well that is good to know. But I am thinking Azure Table Storage will have a problem even loading the root page, which lists all recent spans. How does that work in Cassandra? Are we storing spanIds against perhaps a one-minute partition key and then looking up those spans?
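
To make the question above concrete, here is a minimal in-memory sketch of the "one-minute partition key" idea: trace IDs are grouped under the start of the minute they arrived in, and the "recent" listing walks those buckets newest-first. This is only an illustration of the pattern being asked about, not Zipkin's actual Cassandra schema; `BUCKET_SECONDS`, `bucket_for` and `RecentTraceIndex` are hypothetical names.

```python
from collections import defaultdict

BUCKET_SECONDS = 60  # hypothetical one-minute partition width


def bucket_for(timestamp_s: int) -> int:
    """Round a Unix timestamp (seconds) down to the start of its bucket."""
    return timestamp_s - (timestamp_s % BUCKET_SECONDS)


class RecentTraceIndex:
    """In-memory stand-in for a table partitioned by time bucket."""

    def __init__(self):
        # bucket start (partition key) -> trace IDs in arrival order
        self._partitions = defaultdict(list)

    def record(self, trace_id: str, timestamp_s: int) -> None:
        """Append the trace ID to the partition for its minute."""
        self._partitions[bucket_for(timestamp_s)].append(trace_id)

    def recent(self, now_s: int, lookback_minutes: int, limit: int) -> list:
        """Walk buckets newest-first, reading one partition per step."""
        found = []
        bucket = bucket_for(now_s)
        for _ in range(lookback_minutes):
            for trace_id in reversed(self._partitions.get(bucket, [])):
                if trace_id not in found:
                    found.append(trace_id)
                if len(found) == limit:
                    return found
            bucket -= BUCKET_SECONDS
        return found
```

Each step of `recent` touches exactly one partition, so a busy minute still means reading many rows from a single partition key, which is the access pattern flagged as slow for Table Storage earlier in this thread.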

@codefromthecrypt
Author

codefromthecrypt commented Feb 13, 2017 via email

@praveenbarli
Contributor

@adriancole Yes, I got sidetracked but not completely off track. I looked at Azure Search and DocDB, and put effort into using DocDB for trace storage and retrieval. I had to switch to different things in between but started looking into Dependencies last week.

@praveenbarli
Contributor

praveenbarli commented Apr 17, 2017

I have been working on a Zipkin plugin for Application Insights as storage. I wanted people on this forum to know the reasons behind this work, based on our internal discussions and our discussions with customers.

  1. Existing systems in Azure are already using Application Insights for telemetry (although without this layer, that telemetry doesn't include job-level tracing). Users don't want to add a new storage system when they can do with the one they are using.

  2. AI is much cheaper than DocDB, Azure Search, and other cloud data stores we have looked at.

  3. AI is MSFT's telemetry brand. When we offer other back-ends, MSFT shops say "I would be more comfortable if this built on top of the existing MSFT telemetry system".

  4. For these reasons (and maybe others too), customers have specifically requested an AI-based system.

  5. Note that none of these are dev-level, feature-level arguments. As a dev, for throughput, flexibility, and rich API, I would definitely pick DocDB over AI (which is why we originally did). But there are business, economic, and existing system constraints that pushed us to offer an AI store.

  6. From a dev-level perspective, what you can say about AI is (i) auto-purging is nice and (ii) the AI system in Azure offers a rich web UI which the AI team can extend to present this data to existing Azure customers in a place they are used to looking at telemetry data; we are working with the AI team to make this happen.

@codefromthecrypt
Author

codefromthecrypt commented Apr 18, 2017 via email

@praveenkbarli

praveenkbarli commented Apr 18, 2017

@adriancole you are welcome! Here are the answers to your questions.
Feature-wise, I still need to check on Spark support.
Cost comparison:
I am quoting the standard prices, but would suggest exploring the links below as I am not the best at pricing.
Storage for AI is 2.30/GB

DocDB storage is 0.25/GB
Reserved RUs/second (per 100 RUs, 400 RUs minimum) cost 0.008 per hour, about 23.81/month for the 400 RU minimum
An RU is the unit of throughput.
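
As a rough worked example of the figures quoted above, the sketch below recomputes the monthly throughput charge and estimates the data volume at which the two options would cost the same. It assumes a 744-hour (31-day) month, which reproduces the quoted 23.81, and it treats the AI per-GB price and the DocDB per-GB storage price as directly comparable, which is a simplification. The function and constant names are made up for illustration; prices are in the units quoted in this comment.

```python
HOURS_PER_MONTH = 744  # assumption: a 31-day month

AI_PER_GB = 2.30               # Application Insights, per GB (from this comment)
DOCDB_STORAGE_PER_GB = 0.25    # DocDB storage, per GB (from this comment)
DOCDB_PER_100_RU_HOUR = 0.008  # per 100 reserved RUs, per hour
DOCDB_MIN_RU = 400             # minimum reserved RUs


def docdb_throughput_monthly(reserved_ru: int = DOCDB_MIN_RU) -> float:
    """Monthly charge for reserved throughput, never below the minimum."""
    ru = max(reserved_ru, DOCDB_MIN_RU)
    return ru / 100 * DOCDB_PER_100_RU_HOUR * HOURS_PER_MONTH


def docdb_monthly(gb: float, reserved_ru: int = DOCDB_MIN_RU) -> float:
    """Storage plus reserved throughput for one month."""
    return gb * DOCDB_STORAGE_PER_GB + docdb_throughput_monthly(reserved_ru)


def ai_monthly(gb: float) -> float:
    """Per-GB charge only; ignores any free allowance or per-node fees."""
    return gb * AI_PER_GB


def break_even_gb(reserved_ru: int = DOCDB_MIN_RU) -> float:
    """Volume at which ai_monthly(gb) equals docdb_monthly(gb)."""
    return docdb_throughput_monthly(reserved_ru) / (AI_PER_GB - DOCDB_STORAGE_PER_GB)


if __name__ == "__main__":
    print(f"DocDB throughput at minimum RUs: {docdb_throughput_monthly():.2f}/month")
    print(f"Break-even volume: {break_even_gb():.1f} GB/month")
```

Under these assumptions the fixed throughput charge dominates at small volumes and the per-GB price dominates at large ones, so the comparison depends heavily on how much trace data is actually kept.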

https://azure.microsoft.com/en-us/pricing/details/application-insights/
https://azure.microsoft.com/en-us/pricing/calculator/

These customers are not on Github but were interested to know about Zipkin.

@aliostad
Collaborator

2 cents from me.

From someone who has been working in Azure for the last 5 years, I must say Application Insights is a failure. The UI is slow, clunky and frankly unusable. Microsoft have changed their M&A story many times now and for anyone doing some serious work in Azure, you would need to build your own dashboards as we have. But building support for something is different from using it: it would be good to support AI for entry level usage and to boost usage in all scenarios.

Would be useful to mention caveats.

@codefromthecrypt
Author

thanks @aliostad for the feedback. keen insight that "building support for something is different from using it".

one thing I'm concerned about is the maintenance aspect.

In most clouds, deploying storage is very straightforward. Azure seems to have a cli where you can easily provision docdb, including choosing which of all regions you can use. https://docs.microsoft.com/en-us/azure/documentdb/documentdb-automation-resource-manager-cli This looks reasonably easy to set up with automation, using uncomplicated commands. The fact that I can choose south-east asia means my write latency won't be terrible when testing.

It seems AI is limited both in regions and also by more complex configuration. It isn't clear how to change the retention policy which is important when people go prod as usually trace retention is days (don't want to multiply the cost for unneeded long retention).

Moreover, unlike documentdb, there's no spark driver in place, which means things like dependency graph linking will be custom or much more code to cover edge cases when navigating more data. documentdb frankly is a better fit.

Let's say we do AI anyway.. how would that impact us?

Right now, there's a separate repo for zipkin-stackdriver, run by the goog team who support reusing their APM's tracing service. This one is more straightforward as it only deals with traces and the format is very similar to zipkin's. Also, they directly support it, including releasing it. This all came from direct and visible customer requests (ex users literally on github asking and validating). Finally, the product is free. We are rarely impacted by stackdriver except for bug fixes.

What's different here is that the scope of the product is much wider (a full APM). There are costs involved and as yet no direct demand from someone who wants to use it. It also seems more limited and complex vs DocDB. Unlike stackdriver, we are talking about hosting this codebase here. This is kiting work and attention in a way we don't typically behave. Usually we wait for users to request something before adding a large amount of code to support it.

OTOH, it "seems" MS are going to directly support this. That could help mitigate some of the problems and maybe even address product gaps.

Things to think about

@praveenbarli
Contributor

@aliostad @adriancole Thanks for your feedback/comments.

@praveenbarli
Contributor

Here is the PR for supporting AppInsights storage. #27
I would like to thank @adriancole and team at MSFT for your support.

@clehene

clehene commented May 19, 2017

I think Cosmos/Doc db is the best (hosted) choice.

Here's an overview

CosmosDB (replaces DocumentDB and most other DBs..)

This is one of the most versatile and reliable (SLA covers availability > 99.99% and provides latency guarantees of 10/15ms for read / write) services in Azure.
It supports multiple APIs from KV to Table, Document and Graph. I've mostly looked at Document.

The actual cost will depend on actual data modeling and I suspect this would take most time for an implementation, even if only to figure out what would make most sense.

Limits

Can't find the limits page but it can be provisioned to over 250,000 RUs / second

SLA

availability, P99 latency and throughput:

  • < 99.99% - 10% discount
  • < 99% - 25% discount

Availability and throughput are both calculated as error rates: the sum of errors per hour divided by hours in a month, with throughput violations counted as throttle errors that happen within provisioned IOPS.
Latency is defined as a percentile for successful requests.
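
A small sketch of how the discount tiers and the hourly error-rate averaging described above could fit together. It assumes the monthly percentage is 100% minus the average hourly error rate, which is my reading of the summary here; the linked SLA text is the authority. All function names are hypothetical.

```python
def average_error_rate(hourly_error_rates):
    """Mean of per-hour error rates, each a fraction in [0, 1]."""
    return sum(hourly_error_rates) / len(hourly_error_rates)


def monthly_percentage(hourly_error_rates):
    """Assumed definition: 100% minus the average hourly error rate."""
    return 100.0 * (1.0 - average_error_rate(hourly_error_rates))


def service_credit(percentage):
    """Discount tier from the list above, as a fraction of the bill."""
    if percentage < 99.0:
        return 0.25
    if percentage < 99.99:
        return 0.10
    return 0.0


if __name__ == "__main__":
    # One fully failed hour in a 744-hour month.
    hours = [0.0] * 743 + [1.0]
    pct = monthly_percentage(hours)
    print(f"{pct:.4f}% -> credit {service_credit(pct):.0%}")
```

With this definition a single fully failed hour in a 744-hour month already drops the figure below 99.99%, so it lands in the 10% tier.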

I recommend reading the SLA as an example of a good SLA :)
https://azure.microsoft.com/en-us/support/legal/sla/cosmos-db/v1_0/

Pricing

1 read of 1KB requires 1 RU
Storage $0.25 / GB / month
100RUs / second $0.008/h

Application Insights

I've done a very basic evaluation of AI for functionality, limits and pricing including an actual functional test.

My first subjective opinion was that it's half-baked and clunky, but it is also a work in progress, so perhaps it could be revisited later on.
I don't believe AI is feasible as a tracing backend, and IMO not feasible for serious telemetry use-cases in general either.

Limits

The relevant limit is 32k events / second, measured over a minute
https://github.com/Microsoft/azure-docs/blob/master/includes/application-insights-limits.md

Pricing

The cost structure is a bit weird too and the cost is extremely high https://azure.microsoft.com/en-us/pricing/details/application-insights/

Basically you get 1GB free / month and then it costs $2.3/GB. Not sure how the $15/node would apply.

SLA

First, it's not quite realtime, but I couldn't figure out a rule on how latent it is.
However the SLA specifies up to 2h delays
The availability SLO is 99.9% with 10% discount below that and 25% discount below 99%. But note that downtime is additional latency after the 2 allowed hours.
https://azure.microsoft.com/en-us/support/legal/sla/application-insights/v1_0/

@praveenbarli
Contributor

@clehene Thanks for your input.

FYI, the App Insights SLA for latency applies when you do not flush your writes immediately, which is recommended when you use AI just as an APM. But when using it as data storage (as the AI plugin does) we flush spans as they come, and I see latency in minutes (~2-5 minutes). However, I agree with you by and large that DocDB is a better alternative in terms of latency and availability, as I also mentioned in my previous comments; AI support is not based on these alone.

Regarding the plugin for Cosmos/Doc DB, I am working on it and will keep you informed of updates. I would also like to hear from other users interested in this plugin.

@clehene

clehene commented May 19, 2017

@praveenbarli got it, thanks for clarifying. I created #30

@SergeyKanzhelev

@aliostad thanks for the feedback on Application Insights. I hope it will surprise you with all the new features and improved UI we are building. You should take a look at Analytics and Live Stream - killing combination of fast and powerful query language on historical data and live unfiltered view of current service health. Rich curated experiences are coming soon.

I started working with @praveenbarli on Zipkin to Application Insights to match concepts, rather than use Application Insights as just a store of indexed json blobs.

Created #33 to track and discuss this work further

@aliostad
Collaborator

Thanks @SergeyKanzhelev I appreciate your comments.

But I would also appreciate it if you could refrain from marketing language (surprise you with new features, killing combination, rich curated experience) when it comes to a paid service, especially when it is stated by an employee of the vendor.

Application Insights has a way to go to impress people who have seen the light, but for those who don't have any monitoring it is certainly useful. I am sure Microsoft is working to improve it.

@SergeyKanzhelev

@aliostad I did not mean to offend you with my emotional comment. I'm an engineer, not working for marketing. And yes, I am personally working to improve it. If (or when) you decide to try it again, I'll be happy to listen to more feedback and walk you thru my favorite features.


7 participants