"Value out of range for 4 bytes" error reported during flow event import #16
Oh, interesting. FWIW, those dates just come from whatever happens to be in S3.
This is the query that copies from the temporary table into the main table, so AFAICT it can't be a problem with loading the data from S3. The only column that isn't a direct copy is the timestamp, which we convert from a bigint to a timestamp. Perhaps this is caused by a busted timestamp value? I'm pretty sure those are 8 bytes though, not 4 as in the error message.
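To make the byte-width argument concrete, here is a minimal sketch. It is in Python rather than the thread's SQL, which is a choice made here only so the arithmetic is easy to run. It assumes the bigint holds milliseconds since the Unix epoch (the unit isn't stated above), and `ms_to_timestamp` and `fits_in_signed_bytes` are hypothetical helpers, not part of the import script.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_timestamp(ms: int) -> datetime:
    """Convert a bigint of milliseconds since the Unix epoch to an aware UTC datetime.

    Assumption: the source column is epoch milliseconds.
    """
    return EPOCH + timedelta(milliseconds=ms)


def fits_in_signed_bytes(value: int, n_bytes: int) -> bool:
    """True if value fits in an n-byte two's-complement signed integer."""
    limit = 2 ** (8 * n_bytes - 1)
    return -limit <= value < limit


if __name__ == "__main__":
    ms = 1476748800000  # midnight UTC on 18th Oct 2016, in milliseconds
    print(ms_to_timestamp(ms))
    print(fits_in_signed_bytes(ms, 4))  # False: needs more than 4 bytes
    print(fits_in_signed_bytes(ms, 8))  # True: fits in a bigint
```

Any present-day epoch-millisecond value already needs 8 bytes. If the timestamp conversion path were only 4 bytes wide, you would expect every row to fail rather than a handful.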
Although, the columns in the …
If I run:

```sql
SELECT DISTINCT begin_time::DATE
FROM flow_metadata
ORDER BY 1;
```

I can see that we have no flow events for the following dates: 17th Sep, 18th Sep, 20th Sep, 26th Sep, 29th Sep, 2nd Oct, 5th Oct, 6th Oct, 8th Oct, 12th Oct, or anything after 15th Oct.
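Spotting the gaps by eye gets error-prone over a long range, so here is a small sketch that lists them automatically. It is in Python (a choice made here, not the thread's SQL) and assumes the query's rows have already been loaded as `datetime.date` objects. The helper name `missing_dates` and the start/end bounds are hypothetical.

```python
from datetime import date, timedelta


def missing_dates(seen, start: date, end: date) -> list:
    """Return every date in [start, end] that does not appear in `seen`, in order."""
    seen = set(seen)
    gaps = []
    day = start
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


if __name__ == "__main__":
    # Hypothetical rows from the SELECT DISTINCT begin_time::DATE query.
    rows = [date(2016, 9, 16), date(2016, 9, 19)]
    print(missing_dates(rows, date(2016, 9, 16), date(2016, 9, 19)))
```

With the sample rows above, this reports 17th and 18th Sep as missing.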
I notice the title of this issue refers to activity events, but the error is actually from the flow event import. Updating the title for that reason.
I've opened bug 1313357 to investigate the curious case of the weird dates in S3.
Getting back to the main problem discussed in this issue, the out-of-range error: 4 bytes is the size of the …

If I then look at the …

However, the very fact that we are suddenly getting a …

I can think of three possibilities off the top of my head:
I may have just had a huge, huge, yuuuuuuge eureka moment on this. The error message includes the value of … That equates to a duration of 30 days. And the file this value comes from is our mysterious future date, …

Is it possible that the dates on our servers were screwed up for a while, thus causing both problems?
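Here is a quick check of why a 30-day duration could trip a 4-byte limit. It is in Python (chosen here, not taken from the thread) and assumes two things the thread doesn't state explicitly: the offending value is a duration in milliseconds, and it lands in a 4-byte signed integer column. The helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1             # 2,147,483,647: largest 4-byte signed integer
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def days_to_ms(days: int) -> int:
    """Convert a whole number of days to milliseconds."""
    return days * MS_PER_DAY


def fits_in_int32(value: int) -> bool:
    """True if value fits in a 4-byte signed integer."""
    return -(2**31) <= value <= INT32_MAX


def max_int32_ms_in_days() -> float:
    """Longest duration, in days, whose millisecond count still fits in 4 bytes."""
    return INT32_MAX / MS_PER_DAY


if __name__ == "__main__":
    print(days_to_ms(30))                    # 2592000000
    print(fits_in_int32(days_to_ms(30)))     # False
    print(round(max_int32_ms_in_days(), 2))  # 24.86
```

Under those assumptions, any duration longer than about 24.86 days overflows, so a 30-day gap caused by a bad clock would produce exactly this kind of error.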
I just realised the activity events would also be borked if that was the case, but they aren't.
The final outcome of bug 1313357 was that the two issues are indeed related and they're caused by bad timestamps on the content server flow events. With my earlier hunch in mind:
I thought it should be quite easy to reproduce the problem by skewing my local clock and then connecting to an fxa-dev box to check the flow events in the log. I tried this with skews of hours, days and years, in both directions, but actually the content server skew-correction code seemed to handle all cases correctly. Given this and the observations in Bugzilla, I've come to two fresh conclusions:
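The content server's skew-correction code isn't shown in this thread. For readers unfamiliar with the idea, here is a generic offset-based correction, sketched in Python as a choice made here. It is not the content server's actual implementation. It assumes the client records its own clock reading at the moment it learns the server's time, and it ignores network latency. `corrected_timestamp` and its parameters are hypothetical names.

```python
def corrected_timestamp(event_client_ms: int, client_now_ms: int, server_now_ms: int) -> int:
    """Shift an event time measured on the client clock onto the server clock.

    Assumptions: client_now_ms and server_now_ms were read at (roughly) the same
    instant, and network latency is negligible.
    """
    skew_ms = server_now_ms - client_now_ms  # positive if the client clock is behind
    return event_client_ms + skew_ms


if __name__ == "__main__":
    server_now = 1476748800000            # server clock, epoch milliseconds
    client_now = server_now + 86_400_000  # client clock running one day fast
    event = client_now - 5_000            # event 5 s before the sync point, client clock
    print(corrected_timestamp(event, client_now, server_now) == server_now - 5_000)  # True
```

Because the correction is a single additive offset, a skew of hours, days or years in either direction only changes the size and sign of `skew_ms`. That makes it plausible that a scheme like this handles all of the skews tried above.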
I'll raise separate issues in the content server repo to handle these two and point them back to this bug as the source of discussion. I'll try to get it all done as quickly as possible, but I fear it may be too late for train 73; we'll see.
The other thing we need to do to fix the problem in this repo is delete or ignore all the bad CSVs in S3 once the content server fix is deployed. That data is lost for good, sorry.
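Here is a minimal sketch of the "ignore the bad CSVs" step, in Python (chosen here, not taken from the thread). It works on a plain list of S3 key names, so it doesn't depend on any particular S3 client. It assumes each key embeds a `YYYY-MM-DD` date, which is a hypothetical naming convention. Keys dated after `today` are flagged, along with any key whose date is malformed. It only catches the future-dated subset. Bad files that happen to carry plausible past dates would slip through. `suspect_keys` is a hypothetical name.

```python
import re
from datetime import date

DATE_IN_KEY = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # assumed YYYY-MM-DD in key names


def suspect_keys(keys, today: date) -> list:
    """Return keys whose embedded date is later than `today` or is not a valid date."""
    bad = []
    for key in keys:
        match = DATE_IN_KEY.search(key)
        if match is None:
            continue  # no date in the name: leave it for manual review
        year, month, day = (int(part) for part in match.groups())
        try:
            if date(year, month, day) > today:
                bad.append(key)  # dated in the future
        except ValueError:
            bad.append(key)  # impossible date such as month 13
    return bad


if __name__ == "__main__":
    keys = ["flow-2016-10-14.csv", "flow-2017-06-01.csv", "flow-2016-13-01.csv", "README"]
    print(suspect_keys(keys, date(2016, 10, 27)))
```

Flagging keys rather than deleting them keeps the step reversible. A human can review the list before anything is removed from the bucket.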
Raised mozilla/fxa-content-server#4349 and mozilla/fxa-content-server#4350 to cover the two content server issues identified in #16 (comment).
Is there anything else to do in this issue, or is it covered by the linked follow-ups?

I think we're covered, good point.
Checking the mail spool for the ec2-user on our import box, I see a bunch of tracebacks like the following (this one was from today):
Not sure what to make of it, especially if it's trying to load a file from a date in the future. Recording it here for followup.