This repository has been archived by the owner on Mar 19, 2021. It is now read-only.

Suggested tweak to data re-import logic #39

Closed
wants to merge 1 commit into from

@rfk (Contributor) commented Dec 14, 2016

@philbooth, to make it more concrete, this is the sort of thing I was suggesting in a comment over on #34:

  • Tag the rows in flow_metadata with the export_date of their flow.begin event
  • Delete from that table whenever we're going to re-import that flow.begin event
  • Update the properties of flow_metadata based on all events within a time period, not just on the events we're importing

My hope is that the last point will help us ensure we have the correct properties for flows that cross a day boundary.
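A minimal sketch of the three steps above, in Python against an in-memory SQLite database rather than the real Redshift tables. The cut-down schema (flow_events, and a flow_metadata with only duration and completed properties), the import_day and metadata helpers, and the since_ts cutoff standing in for "a time period" are all hypothetical illustrations, not this repository's actual code.

```python
import sqlite3

# Hypothetical, cut-down schema for illustration only.
SCHEMA = """
CREATE TABLE flow_events (
    flow_id TEXT, export_date TEXT, type TEXT, ts INTEGER
);
CREATE TABLE flow_metadata (
    flow_id TEXT, export_date TEXT, begin_ts INTEGER,
    duration INTEGER, completed INTEGER
);
"""

def import_day(conn, export_date, events, since_ts):
    """(Re-)import one day of events; safe to run more than once."""
    with conn:  # commit all steps together, or roll back on error
        # Forget anything previously imported for this export_date,
        # including metadata rows tagged with it (bullet 2).
        conn.execute("DELETE FROM flow_events WHERE export_date = ?", (export_date,))
        conn.execute("DELETE FROM flow_metadata WHERE export_date = ?", (export_date,))
        conn.executemany(
            "INSERT INTO flow_events VALUES (?, ?, ?, ?)",
            [(e["flow_id"], export_date, e["type"], e["ts"]) for e in events],
        )
        # One metadata row per flow.begin event, tagged with that
        # event's export_date (bullet 1).
        conn.execute(
            """INSERT INTO flow_metadata (flow_id, export_date, begin_ts)
               SELECT flow_id, export_date, ts FROM flow_events
               WHERE export_date = ? AND type = 'flow.begin'""",
            (export_date,),
        )
        # Recompute properties from *all* stored events for flows that began
        # at or after since_ts, not just the events imported now (bullet 3).
        # since_ts is an assumed stand-in for "a time period".
        conn.execute(
            """UPDATE flow_metadata SET
                 duration = (SELECT MAX(e.ts) FROM flow_events e
                             WHERE e.flow_id = flow_metadata.flow_id) - begin_ts,
                 completed = EXISTS (SELECT 1 FROM flow_events e
                                     WHERE e.flow_id = flow_metadata.flow_id
                                       AND e.type = 'flow.complete')
               WHERE begin_ts >= ?""",
            (since_ts,),
        )

def metadata(conn):
    """Return (flow_id, duration, completed) rows, ordered by flow_id."""
    return conn.execute(
        "SELECT flow_id, duration, completed FROM flow_metadata ORDER BY flow_id"
    ).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    day1 = [{"flow_id": "A", "type": "flow.begin", "ts": 100}]
    day2 = [{"flow_id": "A", "type": "flow.complete", "ts": 250},
            {"flow_id": "B", "type": "flow.begin", "ts": 300}]
    import_day(conn, "2016-12-01", day1, since_ts=0)
    import_day(conn, "2016-12-02", day2, since_ts=0)
    import_day(conn, "2016-12-01", day1, since_ts=0)  # re-import day 1
    print(metadata(conn))  # [('A', 150, 1), ('B', 0, 0)]
```

In this sketch, re-importing 2016-12-01 after 2016-12-02 deletes and recreates flow A's metadata row. Because the final step reads every stored event rather than only the freshly imported ones, A still picks up the flow.complete event that arrived the following day.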

@philbooth (Contributor)

This looks good to me. If you like, I can kick off a (timed) clean import using these changes over the weekend, just so we have an accurate picture of any impact on import time before merging. But feel free to treat this comment as an official r+ if you want to merge it before the import.

@philbooth (Contributor) commented Dec 17, 2016

@rfk, just FYI, I'm about to drop the flow tables and then kick off a clean import using your changes here, along with your recently merged #37 and fixes for your recently opened #40 and #43. When that's done, I'll take some timings of the queries and play around with deleting-then-importing individual days to put the logic here through its paces.

And if/when that all looks good, I'll kick off a clean import of activity events without SORTKEY compression too, since presumably we want to correct that issue everywhere.

@philbooth (Contributor) commented Dec 17, 2016

Sorry, I only just noticed that this PR was closed because I deleted my remote phil/issue-33 branch, which it was based on. I'll push everything and reopen it if/when it all looks good.

@rfk (Contributor, Author) commented Dec 17, 2016

> without SORTKEY compression

FWIW, I don't think we should presume this will be a win. It may be worth experimenting with copies of the tables with and without SORTKEY compression to see whether we can measure a meaningful performance difference.
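A rough harness for that comparison: create two copies of the table, one with and one without SORTKEY compression, then time the same query against each. The execute callable (for example a DB-API cursor's execute method), the table names, and the median-of-N timing approach are assumptions for illustration; the DDL for creating the two copies is not shown.

```python
import statistics
import time

def time_query(execute, sql, repeats=5):
    """Return the median wall-clock time, in seconds, of running `sql` `repeats` times."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        execute(sql)  # assumed: any callable that runs the query to completion
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def compare_tables(execute, query_template, tables, repeats=5):
    """Run the same query against each table copy; map table name -> median seconds."""
    return {
        table: time_query(execute, query_template.format(table=table), repeats)
        for table in tables
    }
```

For example, compare_tables(cursor.execute, "SELECT COUNT(*) FROM {table}", ["flow_events_raw", "flow_events_compressed"]) would return median timings for two hypothetical table copies. Taking the median over several runs damps the effect of a slow first (cold) run on either copy.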
