Skip to content
This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

fix(db): eliminate schema mismatches #17

Merged
merged 3 commits into from
Oct 27, 2016
Merged

fix(db): eliminate schema mismatches #17

merged 3 commits into from
Oct 27, 2016

Conversation

philbooth
Copy link
Contributor

Fixes the discrepancy you pointed out in #16 (comment), @rfk.

I figured it was better to bring these two up in length, rather than bring the temporary table down, for consistency with the lengths specified on the flow_metadata table.

And also bigger is better, right? 馃挭

r?

@philbooth
Copy link
Contributor Author

I've also taken the liberty of adding the retrospective import script in a separate commit.

Previously I thought that was a one-time gig but, if we're going to recreate these tables, we'll need to run that import again too. May as well keep it in the repo in case we need it again in the future.

@philbooth
Copy link
Contributor Author

And I just pushed yet another commit, fixing the flow_time mismatch mentioned in #16 (comment). Pretty sure that's all of them now.

@philbooth philbooth changed the title fix(db): eliminate mismatched VARCHAR lengths in schema fix(db): eliminate schema mismatches Oct 27, 2016
@philbooth
Copy link
Contributor Author

philbooth commented Oct 27, 2016

(you can squash all these btw)

@@ -44,7 +44,7 @@
timestamp BIGINT NOT NULL SORTKEY,
type VARCHAR(30) NOT NULL,
flow_id VARCHAR(64) NOT NULL DISTKEY,
flow_time BIGINT NOT NULL,
flow_time INTEGER NOT NULL,
Copy link

@seanmonstar seanmonstar Oct 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, may be missing context.

If flow_time is to be storing the timestamp from JavaScript (mentioned in #16 (comment)), then 4 bytes is probably too small.

$ node -p 'Date.now()'
1477585871454
$ node -p 'Math.pow(2, 32)'
4294967296

4294967296 would be the biggest number of milliseconds you can fit in a 4-byte INTEGER, if it is unsigned, half if it is signed. Assuming BIGINT is 8 bytes, that seems better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a time difference, for which the maximum value should be four hours (based on current config). If it overflows to BIGINT territory, it points at something wrong (which is how we spotted this fwiw, see #16 (comment)).

Having said that, maybe it's possible that in the future there will be no limits to flow_time. In which case, switching to BIGINT now would be prudent.

@rfk, what do you reckon?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so it captures how long the flow took? As you say, if the config is adjusted, it doesn't sound that impossible for a flow to overflow 4 bytes. It overflows after around 49 hours, so if a flow was able to take longer than 2 days, it won't fit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. Updated to BIGINT, thanks for the feedback @seanmonstar!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I goofed. It overflows after 49 days, not 49 hours.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants