First pass at the demo #258. It has issues on Y-Axis alignment and c… #259

Closed
wants to merge 1 commit into from


bluemonk3y

First pass at demo #258. It has issues on Y-Axis alignment and cursor hover whitespace.

@bluemonk3y
Author

@leeoniya pls review

@leeoniya
Owner

leeoniya commented Jun 12, 2020

hey @bluemonk3y thanks for the PR :)

first, please don't be offended if i don't end up merging this, as it might be a bit too specific for your use case. i'll help you figure out alignment issues though so you can use that feedback for yourself in either case.

i haven't reviewed in full yet, but some general initial comments:

  • you might want to use hsla instead of rgba for computing the heatmap color since it's much more suitable & logical for human-friendly hue, lightness and saturation transitions.
  • color range should probably not be part of data but passed to the plugin as config.
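A minimal sketch of the two suggestions above, assuming a plugin that colors each block by its count. The function name `heatColor`, the `cfg` option names and the blue-to-red hue range are illustrative assumptions, not part of uPlot or of this PR.

```javascript
// Hypothetical helper: maps a bucket count to an hsla() color string.
// The color range arrives as config (cfg) instead of being embedded in the data.
function heatColor(count, maxCount, cfg = {}) {
    const {
        hueCold = 240,    // assumed default: blue for sparse buckets
        hueHot = 0,       // assumed default: red for dense buckets
        saturation = 100,
        lightness = 50,
        alpha = 0.8,
    } = cfg;

    // Normalize to 0..1, guarding against a zero or missing maximum
    const t = maxCount > 0 ? Math.min(Math.max(count / maxCount, 0), 1) : 0;

    // Interpolate only the hue; saturation and lightness stay fixed
    const hue = Math.round(hueCold + (hueHot - hueCold) * t);

    return `hsla(${hue}, ${saturation}%, ${lightness}%, ${alpha})`;
}
```

Interpolating in hsla keeps the transition on the hue wheel, so intermediate counts pass through recognizable colors instead of the muddy midpoints a linear rgba blend can produce.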

i'm still a bit confused about how this data is presented based on your description in liquidlabsio/fluidity#65 (comment)

what this demo shows doesn't really look like latency buckets (unless the bucket size is 1), which in theory would not overlap as they do here, but looks more like the heatmap i mentioned in liquidlabsio/fluidity#65 (comment). the overlap + hover combo results in not-great UX, in my opinion, since it becomes impossible to isolate what you're hovering.

it feels like there needs to be some dynamic aggregation (from raw data) into buckets that would be determined from vertically fittable non-overlapping ranges during zooming.

if there is overlap, then it should be from the raw data (rather than aggregated) and each block should be identical in color to allow their densities to overlay and multiply. but at that point you don't want hover, since it becomes mostly unusable.

i think for a demo i'd like to generate a few hundred raw datapoints per timestamp using a realistic-looking normal or skewed normal distribution: https://stackoverflow.com/a/49434653/973988

function randn_bm(min, max, skew) {
    let u = 0, v = 0;
    while(u === 0) u = Math.random(); //Converting [0,1) to (0,1)
    while(v === 0) v = Math.random();
    let num = Math.sqrt( -2.0 * Math.log( u ) ) * Math.cos( 2.0 * Math.PI * v );

    num = num / 10.0 + 0.5; // Translate to 0 -> 1
    if (num > 1 || num < 0) return randn_bm(min, max, skew); // resample if out of range; return directly so the result is not skewed and stretched a second time
    num = Math.pow(num, skew); // Skew
    num *= max - min; // Stretch to fill range
    num += min; // offset to min
    return num;
}

we can then use this data to generate two types of heatmaps:

  1. one with the raw data without hover and same-colored translucent rects (or circles) that color-multiply over each other.
  2. another for data that's been aggregated (either statically or on-demand during zoom) into non-overlapping latency buckets of some fixed or configurable pixel height (let's say 5px). this version will be hoverable per latency bucket and will display a summary of how many values were aggregated into that rect. these buckets will be colored using an hsla progression of some kind based on the aggregation stats.
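As a rough sketch of the raw input both heatmap types would start from, the generator below produces a few hundred samples per timestamp. The name `genRawSamples`, the 0-500 latency range, the per-column skew of 1 to 3 and the `[xs, ys]` layout are all assumptions made for illustration. `randn_bm` is repeated (with the out-of-range branch returning the resampled value directly) only so the block runs on its own.

```javascript
// Skewed normal sampler, adapted from the Stack Overflow snippet quoted above
function randn_bm(min, max, skew) {
    let u = 0, v = 0;
    while (u === 0) u = Math.random(); // converting [0,1) to (0,1)
    while (v === 0) v = Math.random();
    let num = Math.sqrt(-2.0 * Math.log(u)) * Math.cos(2.0 * Math.PI * v);
    num = num / 10.0 + 0.5;                                  // translate to 0 -> 1
    if (num > 1 || num < 0) return randn_bm(min, max, skew); // resample if out of range
    num = Math.pow(num, skew);                               // skew
    return num * (max - min) + min;                          // stretch and offset
}

// Hypothetical generator: one column of raw latencies per timestamp.
// Returns [xs, ys] where ys[i] holds the samples for xs[i], sorted ascending.
function genRawSamples(numTimestamps, samplesPerTimestamp, startTs = 0, stepSec = 1) {
    const xs = [];
    const ys = [];
    for (let i = 0; i < numTimestamps; i++) {
        xs.push(startTs + i * stepSec);
        const skew = 1 + Math.random() * 2; // vary the distribution per column
        const col = [];
        for (let j = 0; j < samplesPerTimestamp; j++)
            col.push(randn_bm(0, 500, skew));
        col.sort((a, b) => a - b);
        ys.push(col);
    }
    return [xs, ys];
}
```

The raw variant would draw every value in `ys` as a translucent rect, while the aggregated variant would first reduce each column to bucket counts.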

this PR seems to be something in between these two and the result is a bit awkward to interpret and interact with for me at least (even if the positioning is fixed).

cc @mcrumm @davydog187

@bluemonk3y
Author

bluemonk3y commented Jun 12, 2020 via email

@leeoniya
Owner

leeoniya commented Jun 12, 2020

yeah that makes sense. in reality the data (if it represents e.g. sporadic network requests) has to be bucketed along the temporal/x axis too. still, i think it would be good for the demo to do some gen and aggregation of the random data on the client to create a kind of aggregation & heat style playground without tweaking the agg behavior on a server to get the exact look you want for your data density.

i think this demo should still contain 2 heatmap types - one with latency buckets / non-overlapping, hsla heat gradient and display aggregate info on hover. and another for raw data that's just a static rgba color with low alpha that visually aggregates by overlapping and color multiplying. both versions would still be x-bucketed to 1s, 1d or whatever.
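To get a feel for how a low alpha "visually aggregates", the sketch below computes the opacity of n identical translucent rects stacked on top of each other. It assumes plain source-over compositing (the canvas default) rather than a true multiply blend mode, and the name `stackedAlpha` is hypothetical.

```javascript
// Opacity of `n` identical layers, each with opacity `alpha`,
// drawn over one another with source-over compositing.
// Each layer lets (1 - alpha) of what is beneath it show through,
// so n layers let (1 - alpha)^n through in total.
function stackedAlpha(alpha, n) {
    return 1 - Math.pow(1 - alpha, n);
}
```

For example, 20 overlapping rects at alpha 0.1 end up at about 0.88 opacity, so the alpha has to be chosen with the expected maximum density in mind or dense regions will saturate early.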

@leeoniya
Owner

i've made a prototype today of the raw data heatmap with randomly skewed normal distributions.

the total dataset size in this test is ~20k samples. the wall time for initial render is ~100ms (60ms script of which 30ms is fake data gen, and 40ms compositor & paint).

image

also made an aggregation function that can bucket in a custom multiple (e.g. passing 10 will make buckets of 0-10, 11-20, etc.). i'll try to get the aggregated heatmap version prototyped tomorrow and push a working branch to further work on cursor, tooltips, and heat scaling behavior.
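The prototype's aggregation function is not shown in this thread, so the following is only a sketch of what bucketing in a custom multiple could look like. The name `aggregate` and the return shape are assumptions, and the buckets here are half-open ranges keyed by their lower bound (`[0, 10)`, `[10, 20)`, ...), which differs slightly from the 0-10, 11-20 wording above.

```javascript
// Hypothetical aggregation: counts values per bucket of width `size`.
// Returns [bucketLowerBounds, counts], with buckets sorted ascending.
function aggregate(values, size) {
    const counts = new Map();
    for (const v of values) {
        const lo = Math.floor(v / size) * size; // lower bound of v's bucket
        counts.set(lo, (counts.get(lo) || 0) + 1);
    }
    const buckets = [...counts.keys()].sort((a, b) => a - b);
    return [buckets, buckets.map(b => counts.get(b))];
}
```

Because each value maps to exactly one lower bound, the resulting buckets cannot overlap, which is what makes per-bucket hover workable.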

@bluemonk3y
Author

bluemonk3y commented Jun 13, 2020 via email

@leeoniya
Owner

here are two heatmaps of the same data. the first is raw and the second is aggregated into 10ms buckets.

Screen Shot 2020-06-13 at 10 27 22

[80, 50, 43],
[10, 12, 14, 18, 30, 50],
[70, 90, 13, 10, 12, 13],
[170, 190, 113, 110, 112, 113],
Owner

@leeoniya leeoniya Jun 14, 2020

@bluemonk3y i'm trying to plug your data into my prototype and found this bit confusing, since i'm assuming that the latency values are unique and their counts would appear summed together. is there a specific reason for having 113 repeat here? the original description [1] appears to outline what i ended up with in the prototype so far, which would not have repeated latencies (or buckets).

(also, latencies in my dataset are sorted ascending for easier hover testing, so that'll also be a requirement of the input format).

[1] liquidlabsio/fluidity#65 (comment)

@bluemonk3y
Author

bluemonk3y commented Jun 14, 2020 via email

@leeoniya
Owner

can you provide a more realistic [and probably larger] dataset so that i have a good idea of what the final result will look like?

@bluemonk3y
Author

bluemonk3y commented Jun 14, 2020 via email

@leeoniya
Owner

leeoniya commented Jun 14, 2020

Sure, I can generate simulated data

if it's simulated, then there's no need since i can generate that myself. it's difficult to simulate realistic data like below, but this is for a hard disk so in aggregate this type of useful detail would get washed out anyways.

image

what might actually be interesting to do is a few per-series (e.g. per route) heatmaps of different hues that exhibit different median latencies on the same chart. this would be useful for comparing A/B tests when trying to tweak perf. although to me heatmaps are not terribly more insightful than a box and whisker timeseries plot (#179). the most interesting stuff in a heatmap would be spotting outliers that are unusually high-density, but the outliers in real-world timeseries heatmaps will likely always be low density and not too interesting.

@bluemonk3y
Author

bluemonk3y commented Jun 14, 2020 via email

@leeoniya
Owner

leeoniya commented Jun 14, 2020

so, the question of drill-down is probably worth asking and addressing. if we load 20k raw events and aggregate on the client, we can track which specific event ids end up being part of the buckets and show that somewhere on hover so one can be clicked for further digging. this becomes problematic if your bucket has more than a handful of events in it.
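One way to picture the id tracking described above is sketched below. The event shape (`{ id, latency }`), the name `bucketWithIds` and the `maxIds` cap are assumptions for illustration. The cap exists because listing ids stops being useful once a bucket holds more than a handful of events.

```javascript
// Hypothetical client-side aggregation that remembers which events fell
// into each bucket so they can be offered for drill-down on hover.
function bucketWithIds(events, size, maxIds = 50) {
    const byLo = new Map();
    for (const { id, latency } of events) {
        const lo = Math.floor(latency / size) * size;
        let b = byLo.get(lo);
        if (!b) {
            b = { lo, hi: lo + size, count: 0, ids: [] };
            byLo.set(lo, b);
        }
        b.count++;                                 // always count the event
        if (b.ids.length < maxIds) b.ids.push(id); // but keep at most maxIds ids
    }
    return [...byLo.values()].sort((a, b) => a.lo - b.lo);
}
```

A bucket whose `count` exceeds `ids.length` has been truncated, which a hover handler could use to decide between listing events and showing only the count.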

however, i don't really want to over-complicate the demo since demos/plugins are not meant to be fully fleshed out but starting points for implementing the final needed things. even the bucketing/clustering can be much more sophisticated than simply doing incremental grouping.

once we can hover the buckets and display the count for that bucket, this would be a good stopping point. we can leave out the tooltips and probably the column highlight, since i don't want to copy/paste that stuff into every demo and would prefer for the demos to be less code to look at.

@bluemonk3y
Author

bluemonk3y commented Jun 17, 2020 via email

@leeoniya
Owner

so if you're doing server-side aggregation, you'd simply return bucket ranges and counts without details for rendering. then on zoom you would send the zoom range to the server and re-aggregate based on the new zoom range (still without individual event detail). only when you zoom in sufficiently would you return bucket ranges, weights, plus event details, and the buckets would become interactive to view the details on click or hover (assuming you're at your 50 event limit per bucket then)? or is your aggregator gonna know at every zoom request which buckets contain fewer than 50 datapoints and those would be interactive beyond displaying aggregation counts?

@bluemonk3y
Author

bluemonk3y commented Jun 17, 2020 via email

@bluemonk3y
Author

@leeoniya - would I be able to take a look at the prototype? I'm starting to hookup the front end and will get a feel for how the interaction is. (likely to take a few days)

@leeoniya
Owner

i will push a branch today with my test code

@leeoniya
Owner

@bluemonk3y have a look at the heatmap branch:

https://github.com/leeoniya/uPlot/tree/heatmap

@bluemonk3y
Author

@leeoniya - awesome thank you! Will let you know how the UX goes.

@bluemonk3y
Author

@leeoniya - making progress but hit a scaling snag. calling uPlot.setData(newData) - results in NaN when the plugin calls: u.valToPos(yVal, 'y', true).

setData() resets the scales, but I was only calculating them on initial creation, so afterwards the 'y' scale has min=Infinity and max=-Infinity. How can I correctly set the scale using my own minMax range function like I did upon initial creation?

@leeoniya
Owner

leeoniya commented Jul 2, 2020

How can I correctly set the scale using my own minMax range function like I did upon initial creation?

instead of (or in addition to) using scale.min/max for initial scale, provide a scale.range function which returns [min, max]. this function is called during all setScale [and setData] calls.
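A minimal sketch of that suggestion, assuming the heatmap keeps its per-timestamp latency arrays at `data[1]`. The helper `nestedMinMax` and the `[0, 1]` fallback for empty data are hypothetical; only the `scales.<key>.range` option itself comes from uPlot.

```javascript
// Hypothetical helper: scans nested per-timestamp arrays for [min, max].
function nestedMinMax(columns) {
    let min = Infinity, max = -Infinity;
    for (const col of columns)
        for (const v of col) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
    // Arbitrary fallback so an empty dataset never yields Infinity/-Infinity
    return min <= max ? [min, max] : [0, 1];
}

const opts = {
    scales: {
        y: {
            // uPlot calls range(self, dataMin, dataMax) whenever the scale is
            // (re)computed. The auto-detected dataMin/dataMax are ignored here
            // because the y values are nested arrays, not plain numbers.
            range: (u, dataMin, dataMax) => nestedMinMax(u.data[1]),
        },
    },
};
```

Since the function reads from `u.data` at call time, a later `setData()` with new arrays would produce a fresh range without any extra bookkeeping.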

@leeoniya
Owner

leeoniya commented Jul 2, 2020

of course there is also the resetScales param of .setData():

uPlot/dist/uPlot.d.ts

Lines 62 to 63 in ee065f6

/** sets the chart data & redraws. (default resetScales = true) */
setData(data: uPlot.AlignedData, resetScales?: boolean): void;

@bluemonk3y
Author

Thanks @leeoniya - that worked.
I did try reset scales but the Y data is an array so the function is required. Almost there, currently looking at making the 'X' render scale as zoom and generating more data to see what niggles exist.

@leeoniya
Owner

leeoniya commented Jul 26, 2020

@bluemonk3y i've merged my heatmap code into the demos. given that this PR is inactive and stale, i'm going to close it.

if you want to iterate on the heatmap demo code like adding a tool-tip, column highlight, box-hover, contribute some real data that isn't random, please open a new PR. thanks!

@leeoniya leeoniya closed this Jul 26, 2020