-
Notifications
You must be signed in to change notification settings - Fork 3
Conversation
I've also taken the liberty of adding the retrospective import script in a separate commit. Previously I thought that was a one-time gig but, if we're going to recreate these tables, we'll need to run that import again too. May as well keep it in the repo in case we need it again in the future. |
And I just pushed yet another commit, fixing the |
(you can squash all these btw) |
@@ -44,7 +44,7 @@ | |||
timestamp BIGINT NOT NULL SORTKEY, | |||
type VARCHAR(30) NOT NULL, | |||
flow_id VARCHAR(64) NOT NULL DISTKEY, | |||
flow_time BIGINT NOT NULL, | |||
flow_time INTEGER NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, may be missing context.
If flow_time
is to be storing the timestamp from JavaScript (mentioned in #16 (comment)), then 4 bytes is probably too small.
$ node -p 'Date.now()'
1477585871454
$ node -p 'Math.pow(2, 32)'
4294967296
4294967296 would be the biggest number of milliseconds you can fit in a 4-byte INTEGER
, if it is unsigned, half if it is signed. Assuming BIGINT
is 8 bytes, that seems better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a time difference, for which the maximum value should be four hours (based on current config). If it overflows to BIGINT
territory, it points at something wrong (which is how we spotted this fwiw, see #16 (comment)).
Having said that, maybe it's possible that in the future there will be no limits to flow_time
. In which case, switching to BIGINT
now would be prudent.
@rfk, what do you reckon?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so it captures how long the flow took? As you say, if the config is adjusted, it doesn't sound that impossible for a flow to overflow 4 bytes. It overflows after around 49 hours, so if a flow was able to take longer than 2 days, it won't fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right. Updated to BIGINT
, thanks for the feedback @seanmonstar!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I goofed. It overflows after 49 days, not 49 hours.
06f4adb
to
db90f8e
Compare
Fixes the discrepancy you pointed out in #16 (comment), @rfk.
I figured it was better to bring these two up in length, rather than bring the temporary table down, for consistency with the lengths specified on the
flow_metadata
table.And also bigger is better, right? 馃挭
r?