Standard data sets for CouchDB testing and prototyping
Cdata
=====

What
----

A collection of JSON datasets for various CouchDB dev purposes.

- ol1k.json: 1000 JSON records from Open Library Authors (approx 327K in size)
- ol100k.json: 100,000 JSON records from Open Library Authors (approx 32M in size)

Why
---

I've long felt the need for a set of standard databases that everyone could use for various development purposes such as

a) creating example views, CouchApps ...
b) doing performance tests that could be used to do apples/apples comparisons
c) creating training materials

Sybase has a standard "pubs" database that is shipped with the db distro. All tutorials, and most 3rd party training books, use this database to teach database basics, database design, performance tuning etc. The Java world has the PetStore app, which became a template "hello world" J2EE app.

My hope is that this collection of datasets and tools will fulfill a similar "well known dataset" function in the CouchDB world and the larger NoSQL world.

How
---

I've taken the Open Library (http://openlibrary.org) data sets, done some minor cleanup and sliced them into samples of 1K and 100K JSON objects. Each file is a serialised JSON array of JSON objects, so you'll need to do some minor parsing to pull the individual JSON objects into CouchDB.

There is a crude python script to load the data. I have not used bulk load - please change the script if it is taking too long for you on the 100K data set.

Now what
--------

I am going to use these data sets when creating sample CouchApps of any kind and also for creating Sproingg test messages (see http://nborwankar.github.com/sproingg).

More data sets from OpenLibrary and other sources will turn up here. If you have data sets to contribute please send a pull request or contact me at nitin at couch dot io.

The two data sets have also been uploaded to cdata.couchone.com; please feel free to replicate from databases ol1k and ol100k respectively. Be warned that the data in CouchDB is approx 500K and 50M respectively.
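
If the loader mentioned under "How" is too slow on the 100K data set, one option is to send the records in batches through CouchDB's `_bulk_docs` endpoint. The following is a minimal sketch of that idea, not the script shipped in this repository. The function names, the batch size of 1000 and the database URL are all assumptions made for illustration, and the sketch assumes the target database already exists and needs no authentication.

```python
import json
import urllib.request


def read_records(path):
    """Parse a file holding one serialised JSON array and return its objects."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array in %s" % path)
    return records


def batches(records, size=1000):
    """Yield successive slices of at most `size` records (size is an assumption)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def bulk_request(db_url, docs):
    """Build (but do not send) a POST to the database's _bulk_docs endpoint."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load(path, db_url, size=1000):
    """Send every record in `path` to the database at `db_url`, batch by batch."""
    total = 0
    for docs in batches(read_records(path), size):
        with urllib.request.urlopen(bulk_request(db_url, docs)) as response:
            response.read()  # per-document results; ignored in this sketch
        total += len(docs)
    return total
```

With a local CouchDB and an existing database named `ol100k` (both hypothetical here), a call such as `load("ol100k.json", "http://localhost:5984/ol100k")` would issue 100 requests instead of 100,000. The sketch does not inspect the per-document results that `_bulk_docs` returns, so a real loader should check them for conflicts or errors.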