lemire/CompressWikipediaAsIntegerVectors

CompressWikipediaAsIntegerVectors

This is an odd little project where we map Wikipedia documents to vectors of integers and then we compress them.

Sorry for the poor documentation. This is mostly a personal project.

Example:

java -cp target/CompressWikipediaAsIntegerVectors-0.0.1-SNAPSHOT.jar:target/lib/* me.lemire.wikipediacompress.Benchmark /home/dlemire/enwiki-20130102-pages-articles.xml.bz2 ../IndexWikipedia/crdict.txt
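To give a rough idea of what "map documents to vectors of integers and then compress them" can look like, here is a minimal sketch. It is not the code from this repository: the class `IntegerVectorSketch`, its method names, the on-the-fly dictionary, and the choice of delta coding followed by variable-byte coding are all assumptions made for illustration. The actual benchmark may tokenize differently and use other codecs.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from the project.
class IntegerVectorSketch {

    // Map a document to a sorted vector of distinct term ids.
    // Assumption: unknown words get the next free id in the dictionary.
    static int[] toSortedIds(String text, Map<String, Integer> dict) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue; // split may yield a leading empty token
            Integer id = dict.get(token);
            if (id == null) {
                id = dict.size();
                dict.put(token, id);
            }
            ids.add(id);
        }
        int[] out = new int[ids.size()];
        int i = 0;
        for (int id : ids) out[i++] = id;
        return out;
    }

    // Compress a sorted vector: store the gaps between successive values,
    // each gap as a variable-length sequence of 7-bit groups.
    static byte[] compress(int[] sorted) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int value : sorted) {
            int delta = value - prev;
            prev = value;
            while (delta >= 0x80) {
                out.write((delta & 0x7F) | 0x80); // high bit set: more bytes follow
                delta >>>= 7;
            }
            out.write(delta); // last byte of this gap, high bit clear
        }
        return out.toByteArray();
    }

    // Reverse of compress: decode each gap and add it to the running total.
    static int[] decompress(byte[] data) {
        List<Integer> values = new ArrayList<>();
        int prev = 0;
        int pos = 0;
        while (pos < data.length) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            values.add(prev);
        }
        int[] out = new int[values.size()];
        for (int i = 0; i < out.length; i++) out[i] = values.get(i);
        return out;
    }
}
```

In this sketch, sorting the ids before compression is what makes the gaps small, so most of them fit in a single byte. Calling `decompress(compress(ids))` should return the original sorted vector.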
