Switch branches/tags
Nothing to show
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
49 lines (34 sloc) 1.96 KB
A collection of JSON datasets for various CouchDB dev purposes
ol1k.json 1000 JSON records from Open Library Authors (approx 327K in size)
ol100k.json 100,000 JSON records from Open Library Authors (approx 32M in size)
I've long felt the need for a set of standard databases that everyone could use for various development purposes
such as
a) creating example views, CouchApps ...
b) doing performance tests that could be used to do apples/apples comparisons
c) create training materials
Sybase has a standard "pubs" database that is shipped with the db distro. All tutorials, and most 3rd party training
books use this database to teach database basics, database design, performance tuning etc. The Java world has the
PetStore app which became a template "hello world" J2EE app.
My hope is that this collection of datasets and tools will fulfill a similar "well known dataset" function in the
CouchDB world and the larger NoSQL world.
I've taken the Open Library ( data sets, done some minor cleanup and sliced them into samples of
1K and 100K JSON objects.
Each file is a serialised JSON array of JSON objects. So you'll need to do some minor parsing to pull the individual
JSON objects into CouchDB. There is a crude python script to load the data - I have not used bulk load -
please change if it is taking too long for you on the 100K data set.
Now what
I am going to use these data sets when creating sample CouchApps of any kind and also for creating Sproingg test
messages (see More data sets from OpenLibrary and other sources will turn up
here. If you have data sets to contribute please send a pull request or contact me at nitin at couch dot io.
The two data sets have also been uploaded to, please feel free to replicate from databases ol1k and
ol100k respectively.
Be warned that the data in couchdb is approx 500K and 50M resp.