Set up automated process to export pages/versions #45

Closed
Mr0grog opened this issue May 12, 2017 · 4 comments

Comments

@Mr0grog
Member

Mr0grog commented May 12, 2017

For people who want to do more complicated analyses offline, it might be helpful to dump just the "data" tables (i.e., not including permissions, users, etc.).
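A minimal sketch of what such a dump could look like, assuming a DB-API connection and an allow-list of table names. The names `DATA_TABLES` and `export_tables` are hypothetical, `sqlite3` stands in for whatever database the project actually uses, and the table names are taken only from the issue title:

```python
import csv
import io
import sqlite3

# Hypothetical allow-list: only the "data" tables named in the issue title.
# Anything else (users, permissions, ...) is never exported.
DATA_TABLES = ("pages", "versions")


def export_tables(conn, tables=DATA_TABLES):
    """Return {table_name: csv_text} for each allow-listed table."""
    dumps = {}
    for table in tables:
        if table not in DATA_TABLES:
            raise ValueError(f"refusing to export non-data table: {table}")
        # Safe to interpolate: the name was checked against the allow-list.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        # Header row comes from the cursor's column metadata.
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor)
        dumps[table] = buffer.getvalue()
    return dumps
```

Using an allow-list rather than a deny-list means a newly added sensitive table stays out of the export unless someone opts it in.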

@danielballan
Contributor

I see two use cases here:

  • Exporting anonymized tables for manual offline analysis and fiddling.
  • Notifying a processing service of new pages/versions so that the service can assign a priority / filtering status to each Change. (This involves first computing diffs. Whether the processing service then caches those diffs for a while or destroys them to save space is a separate question. Either seems plausible.)

@Mr0grog
Member Author

Mr0grog commented May 16, 2017

I was definitely focusing on the former.

I think the latter might be better handled by polling the API with a capture_time= query (or maybe this is a reason to add query support for created_at?). I guess we could alternatively have some sort of webhook for imports or have -db drop something onto a message queue of some sort that all the various services monitor (is that still a thing we’re planning?).
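A rough sketch of the polling idea, under stated assumptions: `fetch_versions` is a hypothetical callable that wraps whatever API request ends up being used (for example one filtered by `capture_time=`), and each version is assumed to be a dict with a timezone-aware `capture_time`. The exact query syntax is not specified here, so the sketch leaves the HTTP call out entirely:

```python
from datetime import datetime, timezone


def poll_once(fetch_versions, since):
    """Fetch versions captured after `since`.

    Returns (new_versions, new_cursor). The cursor is the latest
    capture_time seen, so the next poll can start from there.
    """
    # Filter again on the client side in case the server-side filter
    # is inclusive of the cursor timestamp.
    versions = [v for v in fetch_versions(since) if v["capture_time"] > since]
    versions.sort(key=lambda v: v["capture_time"])
    new_cursor = versions[-1]["capture_time"] if versions else since
    return versions, new_cursor
```

A caller would store `new_cursor` between runs and pass it back in as `since`. One caveat with this approach: versions imported late with an older `capture_time` would be missed, which may be the argument for querying on `created_at` instead.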

@danielballan
Contributor

Right... I'd say push is better than pull, but pull is easier to quickly hack together. I don't think we have to decide yet. The initial test can just grab everything and then we'll see if we want to tackle setting up a queue right away or not.
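For comparison, the push side could look roughly like the following in-process sketch. `ImportBroadcaster` is a hypothetical name, and the standard-library `queue.Queue` only stands in for a real message queue. Each subscriber gets its own queue so that every service sees every import:

```python
import queue


class ImportBroadcaster:
    """Fan out "new version imported" messages to every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self):
        """Register a service and return the queue it should read from."""
        inbox = queue.Queue()
        self._subscribers.append(inbox)
        return inbox

    def publish(self, version_id):
        """Deliver one message to every subscriber's queue."""
        for inbox in self._subscribers:
            inbox.put(version_id)
```

The trade-off matches the comment above: services react as soon as an import lands instead of waiting for the next poll, at the cost of running and monitoring a queue.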

@stale

stale bot commented Jan 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in seven days if no further activity occurs. If it should not be closed, please comment! Thank you for your contributions.


2 participants