experimental SQLite database for MusicBrainz



This is a probably-doomed experiment that aims to import the entire MusicBrainz dataset into an SQLite database. The primary goal is to make the database quick and convenient to read. (Note that I have no illusions of being able to efficiently support writes.) Currently, if you want a local copy of the MusicBrainz database, you need a PostgreSQL server and a bunch of complicated Perl and Java software. By using SQLite, a local MB database should be much easier to set up: mblite needs only Python and SQLite, which are ubiquitous on modern OSes.

If all goes well, this infrastructure could even be used to quickly and easily set up read-only MusicBrainz mirrors. While its scalability is questionable, SQLite can be much less resource-intensive than "production" databases like PostgreSQL, which are designed to support frequent, concurrent writes.

Creating a Database

To create an SQLite MusicBrainz database, run these commands:

$ ./mblite.py --fetch-sql
$ ./mblite.py --fetch-data
$ ./mblite.py --init
$ ./mblite.py --import ./mbdump
$ ./mblite.py --index

Supply the path to the PostgreSQL data dump directory as the argument to --import (./mbdump in the example above). There's also a create.sh script that automates this process.

After a good long while, this will create a database called mblite.db containing a full copy of the MusicBrainz snapshot. When I ran this most recently on my aging Core 2 Duo, the import took about 30 minutes and created a 3 GB database. Indexing the DB took another 7 hours (!) and grew the database to about 5.6 GB.
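Once the import finishes, a quick sanity check is to open the file read-only and see which tables it contains. The following is a minimal sketch using only Python's standard sqlite3 module. The helper name list_tables is made up for illustration and is not part of mblite, and the sketch makes no assumptions about the schema beyond the file being a valid SQLite database.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path.

    Hypothetical helper for illustration; not part of mblite itself.
    """
    # Build a file: URI with mode=ro so the database is opened read-only
    # and any accidental write fails instead of modifying the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

Calling list_tables("mblite.db") from the directory containing the database should return the table names created by the import.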

The Future

Things to do next:

  • Explore using SQLite's full-text indexing to support Lucene-like queries.
  • Use replication packets to update the database.
  • Implement a library for querying the database.
  • Implement a clone of the MusicBrainz XML Web service that uses the SQLite database as a backend?
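To give a rough idea of what the full-text item could look like, here is a small sketch using SQLite's FTS5 extension on toy data. Everything in it is an assumption made for illustration: the artist_fts table, its single name column, and the helper functions are hypothetical, and it only works if the local SQLite build has FTS5 compiled in. It is not how mblite works today.

```python
import sqlite3


def build_artist_index(names):
    """Build an in-memory FTS5 index over artist names.

    The artist_fts table is hypothetical toy data, not mblite's schema.
    Requires an SQLite build with the FTS5 extension enabled.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE artist_fts USING fts5(name)")
    conn.executemany(
        "INSERT INTO artist_fts (name) VALUES (?)", [(n,) for n in names]
    )
    conn.commit()
    return conn


def search(conn, query):
    """Run an FTS5 MATCH query and return matching names in sorted order."""
    rows = conn.execute(
        "SELECT name FROM artist_fts WHERE artist_fts MATCH ? ORDER BY name",
        (query,),
    ).fetchall()
    return [name for (name,) in rows]
```

With an index built from a handful of artist names, search(conn, "radio*") runs a prefix query, search(conn, "beatles OR stones") a boolean one, and search(conn, '"talking heads"') a phrase query, which covers a fair share of what people usually mean by Lucene-like searching.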


Adrian Sampson is responsible for this abomination. The code is made available under the GPL.