Bulk data can and should be created incrementally #293

Closed
mlissner opened this Issue Oct 31, 2014 · 2 comments

Member

mlissner commented Oct 31, 2014

It _just_ dawned on me that our new bulk data system can be incremental. The new process should be something like:

  • Look for an old bulk file. If one is found:
      • Add/update any items that were modified since the date of that archive.
      • (Optionally) Delete any items that have been deleted since that archive was made.
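
The add/update step above could be sketched roughly as follows. This is only an illustration, not the project's code: `archive_date`, `items_to_write`, and the `modified` key are hypothetical names, and items are assumed to be plain dicts carrying a timezone-aware modification time.

```python
import os
from datetime import datetime, timezone


def archive_date(path):
    """Return the old archive's modification time, or None if it does not exist."""
    try:
        return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    except OSError:
        return None


def items_to_write(items, last_archive):
    """Pick the items that need to be added or updated in the bulk file.

    `items` is an iterable of dicts with a timezone-aware `modified` datetime
    (a hypothetical shape). With no previous archive, everything is written,
    which amounts to a full rebuild.
    """
    if last_archive is None:
        return list(items)
    return [item for item in items if item["modified"] > last_archive]
```

Falling back to a full rebuild when no archive is found keeps the first run and every later run on the same code path.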

The deletion step is currently pretty hard, but it would become easy if we added a "Deleted" flag to the database. I'm tempted to do that anyway, so that a delete just flips the flag instead of actually removing things.
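
A minimal sketch of what such a flag might look like, using the standard-library `sqlite3` module only so that the example is self-contained. The `item` table, its columns, and the helper names are assumptions for illustration, not the project's schema. Timestamps are assumed to be ISO 8601 strings in one consistent format, so they compare correctly as text.

```python
import sqlite3


def make_db():
    """Create an in-memory table with a `deleted` flag (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        " id INTEGER PRIMARY KEY,"
        " body TEXT NOT NULL,"
        " date_modified TEXT NOT NULL,"
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def soft_delete(conn, item_id, now):
    """Flip the flag instead of removing the row, and record when it happened."""
    conn.execute(
        "UPDATE item SET deleted = 1, date_modified = ? WHERE id = ?",
        (now, item_id),
    )


def deleted_since(conn, last_archive):
    """Return ids of items flagged as deleted after the last archive was made."""
    rows = conn.execute(
        "SELECT id FROM item WHERE deleted = 1 AND date_modified > ? ORDER BY id",
        (last_archive,),
    )
    return [row[0] for row in rows]
```

Because the row survives, the bulk generator can ask which items were deleted after a given date, which is exactly the information a hard delete throws away.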

Anyway, however this is implemented, we could _drastically_ speed up bulk file generation, which, despite my efforts, is currently taking way too long.

@mlissner mlissner referenced this issue Nov 5, 2014

Closed

Upgrade bugs and improvements remaining #295

20 of 20 tasks complete

@mlissner mlissner closed this in 9b8d95f Nov 5, 2014

Member

mlissner commented Nov 5, 2014

@brianwc, FYI, this will make bulk files much, much faster. There are some notes in the commit message as well.

Member

mlissner commented Nov 7, 2014

This was harder than anticipated, since gz files can't be appended to and the contents of tar files can't be updated in place.

More details on the blog, but I believe this is resolved.
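
One way to work within that limitation is to stream the old archive into a new one, swapping in changed members along the way. The sketch below uses Python's standard `tarfile` module and is an assumption about a possible approach, not a description of what the commit actually does; `rewrite_archive` and its parameters are hypothetical.

```python
import io
import os
import tarfile


def rewrite_archive(old_path, new_path, updates, deleted=()):
    """Write a new .tar.gz from an old one, swapping in changed members.

    `updates` maps member names to new bytes; `deleted` lists names to drop.
    Unchanged members are copied across from the old archive.
    """
    tmp_path = new_path + ".tmp"
    skip = set(updates) | set(deleted)
    with tarfile.open(tmp_path, "w:gz") as new_tar:
        if os.path.exists(old_path):
            with tarfile.open(old_path, "r:gz") as old_tar:
                for member in old_tar:
                    if member.name in skip:
                        continue
                    # Only regular files carry data; other entries are header-only.
                    fileobj = old_tar.extractfile(member) if member.isfile() else None
                    new_tar.addfile(member, fileobj)
        for name, data in updates.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            new_tar.addfile(info, io.BytesIO(data))
    # Swap the finished file into place only once it is complete.
    os.replace(tmp_path, new_path)
```

This still rewrites the whole file, so the savings would come from skipping the database and serialization work for unchanged items rather than from avoiding the copy.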
