An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.
desktop-publishing/InDesign Added desktop publishing/InDesign example files
ebooks Added more example CC0 files, this time from iBooks Author 2.0.
file-archive added office, statistica, pcraster, arj samples
filesys-trials Updated readme.md of PDF Horror Corpus
govdocs1-error-pdfs Add pdfs from Govdocs1 that are potentially broken
jp2k-formats added jpeg 2000 samples
jp2k-test fixed typo
knowledge-management Set of various files from @carusb, including personally validate migr…
office-examples MS Access files provide by David Clipsham under CC0.
office added another ODT
pcraster added office, statistica, pcraster, arj samples
pdfCabinetOfHorrors fixed problem with attachment annotation PDF
statistica added office, statistica, pcraster, arj samples
tiff-examples/NANETH_8bpp_grayscale added a digitally signed PDF 1.7 portfolio with multiple sheets, form…
tools Corrected task list markdown in README
variations Added link to coverage report, and renamed plain-text as variations, …
video/Quicktime Add sample Quicktime videos
.gitattributes Added more sensible gitattributes files.
.gitignore Tidied .gitignore
.opf.yml Added basic OPF Metadata file.
.project Removed project files and added sensible ignores for them.
.pydevproject Fixed pythonpath.
README.md Removed doubled-up text form README.
metadata-template.ext.md Stripped dummy data out of the template.

README.md

format-corpus

An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.

All items, apart from the source code under 'tools', are CC0 licensed unless otherwise stated. The source code is Apache 2.0 licensed unless otherwise stated.

A recent summary of the contents of the repository can be found here.

How to Contribute

See http://wiki.curatecamp.org/index.php/Collecting_format_ID_test_files for more information.

See metadata-template.ext.md for a simple per-file metadata template.

Pooled Signatures

As well as pooling example files, we also pool format signatures.

More details here: http://wiki.curatecamp.org/index.php/Improving_format_ID_coverage
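
For readers unfamiliar with the term, a format signature is typically a short byte pattern ("magic number") that identifies a file format. The sketch below is only an illustration of that idea, not the pooled signatures themselves: the `SIGNATURES` table and the `identify_bytes` and `identify_file` helpers are hypothetical names introduced here, and the table holds a few widely known leading-byte patterns rather than anything taken from this repository.

```ruby
# Hypothetical signature table: format name => leading "magic" bytes.
# The byte values are well known, but the table is illustrative only and
# is NOT drawn from the pooled signatures described above.
SIGNATURES = {
  "PDF" => "%PDF-".b,
  "PNG" => "\x89PNG\r\n\x1A\n".b,
  "ZIP" => "PK\x03\x04".b,
  "JP2" => "\x00\x00\x00\x0CjP  \r\n\x87\n".b
}.freeze

# Return the name of the first signature matching the start of the given
# bytes, or nil when nothing matches.
def identify_bytes(bytes)
  data = bytes.b # compare as raw bytes, ignoring text encodings
  SIGNATURES.each do |name, magic|
    return name if data.start_with?(magic)
  end
  nil
end

# Read only as many bytes as the longest signature needs, then identify.
def identify_file(path)
  longest = SIGNATURES.values.map(&:bytesize).max
  identify_bytes(File.binread(path, longest) || "")
end
```

For example, `identify_bytes("%PDF-1.7\n")` returns `"PDF"`, while unrecognised input returns `nil`. Real signature sets usually also allow offsets and patterns that are not at the start of the file, which this sketch leaves out.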
