Raw data available #4

Closed
rufuspollock opened this issue Mar 12, 2013 · 7 comments

Comments

@rufuspollock
Member

Establish a raw-data S3 bucket containing cleaned OpenSpending (OS) data, with the following structure:

{dataset-name}/datapackage.json
{dataset-name}/... data files e.g. file1.csv
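A minimal sketch of that layout, assuming each dataset gets a bare-bones `datapackage.json` with just a `name` and a list of `resources`. The helper names and the `example-dataset` name are hypothetical, and a real descriptor would likely carry more metadata:

```python
import json


def build_descriptor(dataset_name, data_files):
    """Return a minimal datapackage.json descriptor as a dict."""
    return {
        "name": dataset_name,
        "resources": [{"path": path} for path in data_files],
    }


def bucket_keys(dataset_name, data_files):
    """List the object keys one dataset would occupy in the bucket."""
    return [f"{dataset_name}/datapackage.json"] + [
        f"{dataset_name}/{path}" for path in data_files
    ]


# "example-dataset" and "file1.csv" are placeholder names for illustration.
descriptor = build_descriptor("example-dataset", ["file1.csv"])
print(json.dumps(descriptor, indent=2))
print(bucket_keys("example-dataset", ["file1.csv"]))
```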

Questions

bucket name / location?

Propose data.openspending.org

Nice index

Put in a directory index - https://github.com/rgrp/s3-bucket-listing

How do we get this out of OS at the moment?

Can we do this at the DB level (even just using Postgres COPY)? Going via the API is not feasible for large datasets: I imagine we can't stream 3 GB of data down through the web app ...
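A rough sketch of the DB-level route, assuming a psycopg2-style cursor (its `copy_expert` method streams `COPY ... TO STDOUT` into a file object). The table name used here is hypothetical; the actual OS schema is not described in this issue:

```python
def copy_statement(table):
    """Build a COPY statement that dumps one table as CSV with a header row."""
    # The table name is interpolated into SQL, so only accept plain identifiers.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return f'COPY "{table}" TO STDOUT WITH (FORMAT csv, HEADER true)'


def export_table(cursor, table, out_file):
    """Stream a table into out_file without going through the web app."""
    cursor.copy_expert(copy_statement(table), out_file)


# Usage, assuming psycopg2 is installed and "entry" is the table to dump:
#   import psycopg2
#   conn = psycopg2.connect("dbname=openspending")
#   with conn.cursor() as cur, open("file1.csv", "w") as f:
#       export_table(cur, "entry", f)
```

Because COPY streams rows straight from the database, the dump never has to pass through the web app.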

Why

I want to do analysis / queries on OS data that are not supported (or too "costly") by the API - cf #3 (e.g. what are the top recipients of UK government spending ...). To do this I need the raw CSV so I can load it into my local Postgres / Hadoop / Bigtable ...
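As an illustration of the kind of query this would enable, here is a small sketch that totals spending per recipient straight from a raw CSV. The column names `recipient` and `amount` and the sample rows are assumptions made up for the example, not the actual OS export format:

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def top_recipients(csv_file, limit=10, recipient_col="recipient", amount_col="amount"):
    """Sum amounts per recipient and return the largest totals first."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in csv.DictReader(csv_file):
        totals[row[recipient_col]] += Decimal(row[amount_col])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


# Made-up sample rows standing in for a real raw data file.
sample = io.StringIO(
    "recipient,amount\n"
    "Acme Ltd,100.50\n"
    "Beta plc,40\n"
    "Acme Ltd,9.50\n"
)
for name, total in top_recipients(sample, limit=2):
    print(name, total)
```

For multi-gigabyte files the same aggregation is probably better done after loading the CSV into a local database, which is the point of having the raw files.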

Aside: this in fact could be the import format - these could be the cleaned files we loaded into OS (which would move most of the ETL out of OS core but that's a completely separate discussion ...)

@trickvi

trickvi commented Mar 12, 2013

What would be the use case? Would Amazon Glacier work or is this a short retrieval time service?

@rufuspollock
Member Author

This would want live retrieval, so Glacier is out, but reduced redundancy storage would be fine. But that's a detail.

@rufuspollock
Member Author

@pudo does the 25k ETL spit out nice CSVs? Could they be auto-pushed to s3?
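If the ETL does write CSVs to a local directory, the auto-push could look roughly like the sketch below. It assumes a boto3-style S3 client (whose `upload_file` accepts a `StorageClass` entry in `ExtraArgs`) and uses the reduced redundancy class mentioned earlier. The function name and directory layout are hypothetical:

```python
import os


def push_dataset(client, bucket, dataset_name, directory):
    """Upload every file in directory under the {dataset-name}/ prefix."""
    uploaded = []
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if not os.path.isfile(path):
            continue  # skip subdirectories in this simple sketch
        key = f"{dataset_name}/{filename}"
        client.upload_file(
            path,
            bucket,
            key,
            # Live retrieval is needed, but reduced redundancy is acceptable.
            ExtraArgs={"StorageClass": "REDUCED_REDUNDANCY"},
        )
        uploaded.append(key)
    return uploaded


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   push_dataset(boto3.client("s3"), "data.openspending.org",
#                "example-dataset", "/path/to/etl/output")
```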

@rufuspollock
Member Author

OK, http://data.openspending.org has a nice index page :-)

@mk270

mk270 commented Mar 24, 2013

Is this linked from anywhere?

@rufuspollock
Member Author

@mk270 not yet ...

@pwalsh
Member

pwalsh commented Dec 28, 2015

Closing because:

  • data.openspending.org exists
  • This whole concept is the basis of OpenSpending Next anyway

@pwalsh pwalsh closed this as completed Dec 28, 2015