Detexify data

This repository contains instructions on how to obtain and use the training data for Detexify. It does not contain the data itself.

Obtaining the data

Detexify's training data is stored in a CouchDB database hosted on Cloudant.

The database lives at:

NOTE: Please send me an email to get auth credentials. Leaving the database open to everyone caused too many expensive calls against the API.


This does not work right now because of #1

The best way to obtain the data is to set up your own CouchDB and replicate the database to it. This can be done easily via CouchDB's admin interface. Assuming you have a local CouchDB running, open its admin interface in your browser.

If you are familiar with CouchDB's HTTP interface you can instead use the following request to start replication:

POST /_replicate HTTP/1.1
Content-Type: application/json
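
As a rough illustration of what such a request could look like from a script, here is a minimal Python sketch. It assumes a local CouchDB at http://localhost:5984 and relies on the `source`, `target` and `create_target` fields of CouchDB's `_replicate` endpoint. The source URL, the credentials and the target database name `detexify` are placeholders, not the real values. A local CouchDB may also require admin credentials for this call.

```python
import json
import urllib.request


def build_replication_request(local_couch, source_db_url, target_db_url):
    """Build (but do not send) a POST /_replicate request for a local CouchDB."""
    body = {
        "source": source_db_url,  # remote database URL, including credentials
        "target": target_db_url,  # URL of the local database to replicate into
        "create_target": True,    # create the target database if it is missing
    }
    return urllib.request.Request(
        url=local_couch.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_replication_request(
        "http://localhost:5984",
        # Placeholder: substitute the real database URL and your credentials.
        "https://USER:PASSWORD@couch.example.invalid/DATABASE",
        # Hypothetical local database name.
        "http://localhost:5984/detexify",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually start the replication, send the request:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

The sketch only builds and prints the request, so it can be inspected without generating traffic against the remote database.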

Querying data & data format

You can query data via the view by_id. Have a look at for an example.

Please have a look at the CouchDB documentation on views for more information.
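
For orientation, CouchDB views are served under `/<database>/_design/<design-doc>/_view/<view>`, and the `key` query parameter must be JSON-encoded. The following sketch builds such a URL for the by_id view against your own replicated copy. The database URL, the design document name `tools` and the key value are hypothetical placeholders; check your replicated database for the actual design document name and for what the view emits as its key.

```python
import json
from urllib.parse import quote, urlencode


def build_view_url(db_url, design_doc, view, key=None, limit=None):
    """Return the URL for querying a CouchDB view, optionally filtered by key."""
    params = {}
    if key is not None:
        # CouchDB expects view keys as JSON, so strings keep their quotes.
        params["key"] = json.dumps(key)
    if limit is not None:
        params["limit"] = str(limit)
    url = "{}/_design/{}/_view/{}".format(
        db_url.rstrip("/"), quote(design_doc, safe=""), quote(view, safe="")
    )
    return url + ("?" + urlencode(params) if params else "")


if __name__ == "__main__":
    # "tools" and "some-symbol-id" are made-up values for illustration only.
    print(build_view_url(
        "http://localhost:5984/detexify", "tools", "by_id",
        key="some-symbol-id", limit=5,
    ))
```

Pointing this at a local replica rather than the hosted database keeps the query traffic on your own machine.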

A prettyprinted example document can be viewed in example.json. It was obtained via the request

The json objects contain two relevant keys. One is key and it identifies the LaTeX command this sample is for (see for details). The other one is data which contains an array of ink strokes. Ink strokes are represented as arrays of objects { x: x-coordinate, y: y-coordinate, t: timestamp }. This data is not preprocessed in any way.
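
To make the format concrete, here is a small Python sketch that reads one such document and summarizes its strokes. The sample document is invented for illustration (the key value and coordinates are not real data) and only mirrors the structure described above: `key` holds the symbol identifier and `data` holds a list of strokes, each a list of `{x, y, t}` points. The unit of `t` is not stated here, so the duration is reported in whatever unit the timestamps use.

```python
import json

# Invented sample following the structure described above; not real data.
sample_json = """
{
  "key": "hypothetical-symbol-id",
  "data": [
    [{"x": 10, "y": 20, "t": 1000}, {"x": 15, "y": 25, "t": 1016}],
    [{"x": 30, "y": 5, "t": 1400}]
  ]
}
"""


def summarize_sample(doc):
    """Summarize one training sample: stroke count, point count, extent."""
    strokes = doc["data"]
    # Flatten all strokes into a single list of points.
    points = [point for stroke in strokes for point in stroke]
    summary = {
        "symbol": doc["key"],
        "strokes": len(strokes),
        "points": len(points),
        "bounding_box": None,
        "duration": None,
    }
    if points:
        xs = [p["x"] for p in points]
        ys = [p["y"] for p in points]
        ts = [p["t"] for p in points]
        # Bounding box as (min_x, min_y, max_x, max_y).
        summary["bounding_box"] = (min(xs), min(ys), max(xs), max(ys))
        # Time between the first and last recorded point.
        summary["duration"] = max(ts) - min(ts)
    return summary


if __name__ == "__main__":
    print(summarize_sample(json.loads(sample_json)))
```

Because the data is not preprocessed, steps such as normalizing coordinates to a common bounding box are left to whoever consumes the samples.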


The database is licensed under the ODbL. This license is also used by OpenStreetMap. A human-readable form of the license can be found at I am no expert in database licensing but this looks reasonable to me. Feel free to contact me if you have questions or concerns.


Please be kind. I pay for database traffic. Replicate the data once and then use your own database.