aks is a utility for extracting n-grams from texts
README.md
aks.c
compressmasters
createmasters
processmasters
sortmasters
trimmasters

aks

This program is a utility for extracting n-grams from texts. It extracts every contiguous string from a collection of texts, from length 1 up to a maximum length determined by the user. The included scripts then perform sorting routines on the n-gram files to determine which strings occur most frequently. This method is especially useful for texts composed in languages that do not feature orthographic spacing between individual words.
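The extraction step described above can be sketched in a few lines of C. This is not taken from aks.c; it is a minimal illustration that assumes the input is UTF-8 and that n-gram length is counted in characters (code points) rather than bytes. The function names `utf8_char_len`, `next_char`, and `print_ngrams`, and the tab-separated output format, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Byte length (1 to 4) of the UTF-8 character whose lead byte is c.
   Invalid lead bytes are treated as single-byte characters. */
static size_t utf8_char_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Advance past one character, stopping early if the string ends
   in the middle of a truncated multi-byte sequence. */
static const char *next_char(const char *p)
{
    size_t len = utf8_char_len((unsigned char)*p);
    size_t i = 0;
    while (i < len && p[i] != '\0')
        i++;
    return p + i;
}

/* Write every contiguous n-gram of 1..max_n characters to out,
   one per line as "<n>\t<n-gram>". Returns the number written. */
size_t print_ngrams(const char *text, size_t max_n, FILE *out)
{
    size_t count = 0;
    assert(text != NULL);

    for (const char *start = text; *start != '\0'; start = next_char(start)) {
        const char *end = start;
        /* Grow the window one character at a time from this start. */
        for (size_t n = 1; n <= max_n && *end != '\0'; n++) {
            end = next_char(end);
            fprintf(out, "%zu\t%.*s\n", n, (int)(end - start), start);
            count++;
        }
    }
    return count;
}
```

Calling `print_ngrams("abc", 2, stdout)` writes five n-grams (a, ab, b, bc, c), each preceded by its length. Emitting one n-gram per line keeps the output easy to sort and count afterwards, which is the kind of work the description above assigns to the included scripts.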

usage:

./aks [language] [maximum n value] [source directory]

./processmasters [maximum n value] [source directory]

examples:

aks tibetan_roman 32 /home/handyc/texts

aks tibetan_uchen 32 /home/handyc/texts

aks chinese 32 /home/handyc/texts

aks sanskrit_unicode 32 /home/handyc/texts

You may need to make the scripts executable before you can run them, for example with chmod +x processmasters.

For questions or comments, please write to handyc@sdf.org or search for my Leiden University address.