Common corpora used for lossless compression testing and benchmarking.

isabella232/corpora

 
 

Corpora

This repository contains common corpora used for lossless compression testing and benchmarking.
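As a rough illustration of how files like these might be used in a benchmark, the sketch below compresses each file in a directory with Python's standard `zlib` module and records the size, ratio, and elapsed time. The function names `measure` and `benchmark_dir` and the directory layout are assumptions made for this example; they are not part of this repository.

```python
import time
import zlib
from pathlib import Path


def measure(data: bytes, level: int = 6) -> dict:
    """Compress data with zlib and report sizes, ratio, and elapsed time."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Lossless check: decompressing must give back the original bytes.
    assert zlib.decompress(compressed) == data
    return {
        "original": len(data),
        "compressed": len(compressed),
        # Ratio below 1.0 means the compressed form is smaller.
        "ratio": len(compressed) / len(data) if data else 0.0,
        "seconds": elapsed,
    }


def benchmark_dir(root: str, level: int = 6) -> dict:
    """Measure every regular file under root (a hypothetical local path)."""
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            key = str(path.relative_to(root))
            results[key] = measure(path.read_bytes(), level)
    return results
```

Calling `benchmark_dir` on a local copy of one corpus directory returns a dictionary keyed by relative file path. Timings from a single run are noisy, so a real benchmark would typically repeat each measurement and compare several compression levels or libraries.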

Sources

Detailed descriptions of the files in each corpus are available from the sources listed below.

| Corpus | URL | Notes |
| --- | --- | --- |
| Canterbury | https://corpus.canterbury.ac.nz/ | Includes the artificial, calgary, canterbury, large, and miscellaneous corpora. |
| Silesia | http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia | |
| Snappy | https://github.com/google/snappy | Test data, with duplicates already present in other corpora removed. |
| Neuro | https://github.com/neurolabusc/zlib-bench | NIfTI format brain images. |

License

All files are the works of their respective authors. Please see the sources above for any licensing information.
