standard data sets for couchdb testing and prototyping
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.




A collection of JSON datasets for various CouchDB dev purposes

ol1k.json 1000 JSON records from Open Library Authors (approx 327K in size)
ol100k.json 100,000 JSON records from Open Library Authors (approx 32M in size)

I've long felt the need for a set of standard databases that everyone could use for various development purposes 
such as

a) creating example views, CouchApps ...
b) doing performance tests that could be used to do apples/apples comparisons
c) create training materials

Sybase has a standard "pubs" database that is shipped with the db distro. All tutorials, and most 3rd party training 
books use this database to teach database basics, database design, performance tuning etc.  The Java world has the
 PetStore app which became a template "hello world" J2EE app.

My hope is that this collection of datasets and tools will fulfill a similar "well known dataset" function in the
 CouchDB world and the larger NoSQL world.


I've taken the  Open Library ( data sets, done some minor cleanup and sliced them into samples of
 1K and 100K JSON objects.
Each file is a serialised JSON array of JSON objects.   So you'll need to do some minor parsing to pull the individual
JSON objects into CouchDB.  There is a crude python script to load the data  - I have not used bulk load - 
please change if it is taking too long for you on the 100K data set.

Now what

I am going to use these data sets when creating sample CouchApps of any kind and also for creating Sproingg test
messages (see  More data sets from OpenLibrary and other sources will turn up
 here.  If you have data sets to contribute please send a pull request or contact me at nitin at couch dot io.

The two data sets have also been uploaded to, please feel free to replicate from databases ol1k and
 ol100k respectively.

Be warned that the data in couchdb is approx 500K and 50M resp.