An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.
desktop-publishing/InDesign Added desktop publishing/InDesign example files Nov 16, 2012
droid-jhove-datetest-files [documentation] Demonstrating a pull request. Oct 11, 2016
ebooks Added more example CC0 files, this time from iBooks Author 2.0. Nov 16, 2012
file-archive added office, statistica, pcraster, arj samples Nov 19, 2012
filesys-trials Updated readme.md of PDF Horror Corpus Jul 16, 2013
govdocs1-error-pdfs Add pdfs from Govdocs1 that are potentially broken Nov 5, 2013
jp2k-formats added jpeg 2000 samples Nov 19, 2012
jp2k-test fixed typo Sep 18, 2012
knowledge-management Set of various files from @carusb, including personally validate migr… Nov 17, 2012
office-examples MS Access files provide by David Clipsham under CC0. Apr 9, 2013
office removed xls file from wq2 dir Mar 7, 2016
pcraster added office, statistica, pcraster, arj samples Nov 19, 2012
pdfCabinetOfHorrors Added PDF with PDF-1.8 version string in header Sep 27, 2016
statistica added office, statistica, pcraster, arj samples Nov 19, 2012
tiff-examples added TIFF with old-style JPEG compression Feb 21, 2017
tools Corrected task list markdown in README Mar 14, 2013
variations Added link to coverage report, and renamed plain-text as variations, … Nov 22, 2012
video/Quicktime Add sample Quicktime videos Nov 16, 2012
.gitattributes Added more sensible gitattributes files. Nov 22, 2012
.gitignore Tidied .gitignore Mar 11, 2013
.opf.yml Added basic OPF Metadata file. Apr 11, 2013
.project Removed project files and added sensible ignores for them. Dec 5, 2011
.pydevproject Fixed pythonpath. Dec 27, 2010
README.md Removed doubled-up text form README. Nov 22, 2012
metadata-template.ext.md Stripped dummy data out of the template. Nov 16, 2012


format-corpus

An openly-licensed corpus of small example files, covering a wide range of formats and creation tools.

All items, apart from the source code under 'tools', are CC0 licensed unless otherwise stated. The source code is Apache 2.0 licensed unless otherwise stated.

A recent summary of the contents of the repository can be found here.

How to Contribute

See http://wiki.curatecamp.org/index.php/Collecting_format_ID_test_files for more information.

See metadata-template.ext.md for a simple per-file metadata template.

Pooled Signatures

As well as pooling example files, we pool format signatures.

More details here: http://wiki.curatecamp.org/index.php/Improving_format_ID_coverage
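
For readers new to the idea, a format signature is typically a short byte pattern that identifies a file format. The sketch below is only an illustration, not the signature format pooled here (the pooled signatures may be richer and structured differently). The `SIGNATURES` table and the `identify` and `identify_file` helpers are hypothetical names invented for this example; the byte patterns themselves are the well-known leading bytes of each format.

```python
from typing import Optional

# Hypothetical, hand-written table: format label -> leading "magic" bytes.
# Real signature collections also cover offsets, wildcards, and trailing patterns.
SIGNATURES = {
    "PDF": b"%PDF-",
    "PNG": b"\x89PNG\r\n\x1a\n",
    "TIFF (little-endian)": b"II*\x00",
    "TIFF (big-endian)": b"MM\x00*",
    "JP2": b"\x00\x00\x00\x0cjP  \r\n\x87\n",
}


def identify(data: bytes) -> Optional[str]:
    """Return the label of the first signature that matches the start of data."""
    for label, magic in SIGNATURES.items():
        if data.startswith(magic):
            return label
    return None


def identify_file(path: str) -> Optional[str]:
    """Read just enough of a file to compare it against every signature."""
    longest = max(len(magic) for magic in SIGNATURES.values())
    with open(path, "rb") as handle:
        return identify(handle.read(longest))
```

Running `identify_file` over a directory such as `tiff-examples` would be one quick way to spot files that a simple leading-bytes check cannot identify, which is the kind of gap that pooled example files and signatures are meant to close.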