-
Notifications
You must be signed in to change notification settings - Fork 0
sinujohn/ngram
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Developer: Sinu John sinuvian at gmail dot com www.sinujohn.wordpress.com 1. USAGE Usage: java -jar ngram.jar sourcefile [--debug] To run ngram.jar : java -jar ngram.jar sourcefile Here 'sourcefile' is the text file from which trigrams are first obtained. To run in DEBUG mode (be careful if the trigrams are large in number it could printout loads of output): java -jar ngram.jar sourcefile --debug 2. OBJECTIVE Write a program which first collects all word-level trigrams in a novel-length story. Then, choosing any arbitrary trigram as the starting point, generate a short story which is between 200 to 300 words. The next trigram in the chain must always have the first two words same as that of the previous trigrams last two words. What is a trigram you ask? The sentence "The quick red fox jumps over the lazy brown dog" has the following trigrams - "The quick red", "quick red fox", "red fox jumps", "fox jumps over", "jumps over the", "over the lazy", "the lazy brown", "lazy brown dog". Hint: Project Gutenberg is a good place to find a lot of plain text books 3. DESIGN The trigrams are stored using HashMaps and ArrayLists. The storage can be viewed as 3 levels. The first 2 levels are implemented using HashMaps and the 3rd level is implemented using ArrayLists. The Level 1 has all the words once. Each of these words act as as Key to obtain its corresponding value which is another HashMap. By following the key (a word) in Level 1 we can obtain another HashMap which contains all the words which are preceded by the key word of Level 1. This HashMap forms the Level 2. In Level 2, the keys consist of all those words which are successors to the key word of Level 1. Each key of Level 2 has a corresponding value which is made up of by the ArrayList, and this is Level 3. Level 3 contains all the words which are successors of the key word in Level 2, which is in turn the successor of key word in Level 1. To illustrate a small portion of this trigram storage, consider the following trigrams (which is incomplete ofcourse): Trigrams: (a b c), (a c b), (a c e), (b a b) Level 1 Level 2 Level 3 a--------------b---------------c |------c---------------b |------e b--------------a---------------b For another example please see "block.jpg".
About
Creates a random story out of the input file containing a novel-sized story
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published