
Don't compress our sortkeys #43

Closed
rfk opened this issue Dec 16, 2016 · 3 comments

Comments


rfk commented Dec 16, 2016

There's an interesting note at [1] that says:

"""
We do not recommend applying runlength encoding on any column that is designated as a sort key. Range-restricted scans perform better when blocks contain similar numbers of rows. If sort key columns are compressed much more highly than other columns in the same query, range-restricted scans might perform poorly.
"""

We're not using run-length encoding, but we are compressing our sortkey. The presentation at [2] contains this advice:

"""
If your sort keys compress significantly more than your data columns, you may want to skip compression of sortkey column(s).

Check SVV_TABLE_INFO(skew_sortkey1)
"""

And indeed, SVV_TABLE_INFO tells me that skew_sortkey1 for our main tables is on the order of several hundred, which seems high. We might find there's a performance win if we disable compression of our sortkeys. It's probably worth an experiment.
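For reference, the check suggested in [2] looks roughly like this (column list trimmed; SVV_TABLE_INFO has many more fields):

```sql
-- skew_sortkey1 is the ratio of the size of the largest non-sort-key
-- column to the size of the first sort key column; large values mean
-- the sort key is compressed much more heavily than the data columns.
SELECT "table",
       sortkey1,       -- first sort key column
       sortkey1_enc,   -- its compression encoding
       skew_sortkey1
FROM svv_table_info
ORDER BY skew_sortkey1 DESC;
```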

[1] http://docs.aws.amazon.com/redshift/latest/dg/c_Runlength_encoding.html
[2] http://www.slideshare.net/AmazonWebServices/bdt401-amazon-redshift-deep-dive-tuning-and-best-practices

philbooth added a commit that referenced this issue Dec 16, 2016
This change addresses two issues:

1. `flow_metadata.flow_id` is promoted to `DISTKEY`,
   bringing it into line with `flow_events.flow_id`.
   Fixes #40.

2. Compression is disabled for every `SORTKEY`,
   to improve general query performance.
   Fixes #43.
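In Redshift DDL terms, disabling sort key compression means declaring the SORTKEY column with ENCODE RAW. A minimal sketch of the idea, where only flow_id comes from the actual schema and the other column names are hypothetical:

```sql
-- Sketch only: begin_time, duration and ua_browser are hypothetical columns.
CREATE TABLE flow_metadata (
    flow_id     VARCHAR(64) ENCODE LZO,  -- DISTKEY, matching flow_events.flow_id (#40)
    begin_time  TIMESTAMP   ENCODE RAW,  -- SORTKEY column left uncompressed (#43)
    duration    BIGINT      ENCODE LZO,  -- non-key columns keep their compression
    ua_browser  VARCHAR(40) ENCODE LZO
)
DISTKEY (flow_id)
SORTKEY (begin_time);
```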

philbooth commented Dec 20, 2016

I've finally managed to measure this change against a subset of our activity event data. The bottom line is that I detected no noticeable improvement.

I created two subsets of the activity event data for the period 1st November until 17th December, one with compressed SORTKEYs and the other uncompressed. These are available in the following tables:

  • daily_multi_device_users_test_compressed
  • daily_activity_per_device_test_compressed
  • daily_multi_device_users_test_uncompressed
  • daily_activity_per_device_test_uncompressed

I then ran modified versions of the engagement ratio / multi-device query against each pair of tables. Those queries are available here:

The timings showed a lot of variation when other queries were running, but converged around the 8-minute mark in both cases:

  • Compressed: 8'06", 8'33", 13'21", 8'34", 7'27"
  • Uncompressed: 8'03", 7'41", 23'02", 7'50", 7'58"

Perhaps with a bigger sample it would be possible to detect a more reliable difference, but it took over 2 days to set up the sample data for this small test, so I'm not sure it's worth the effort.

Fortunately, importing the flow data is faster than importing the activity event data, so I'm going to try a bigger experiment with that. But even if there is a positive result there, I'm not sure it would be worth changing the activity event schemata. Given how much data is in those tables, I suspect a much bigger win might come if we implement some form of data expiry (see #45).


philbooth commented Dec 21, 2016

I just finished testing this against the full set of flow data, using the time-to-device-connection query. As before, I used two separate datasets, one with compression and one without. Like the previous test, this showed no noticeable improvement. In fact, if anything, the compressed version seemed to be a bit quicker:

  • Uncompressed: 7'40", 7'37", 7'35"
  • Compressed: 7'35", 7'19", 6'58"

Given the above, I'll remove the SORTKEY change from #44. Closing this.


rfk commented Dec 21, 2016

👍 thanks for doing the necessary science here @philbooth!
