Commits on Feb 13, 2008
Commits on Feb 12, 2008
  1. wordProb prototype change

Commits on Feb 11, 2008
  1. train: Use closed vocab

Commits on Feb 10, 2008
  1. vspell-report: fix --detail

  2. Use ${O}.vocab2 for LM generation

    ${O}.vocab2 does contain special tokens like <opaque>
    while ${O}.vocab does not
Commits on Feb 5, 2008
  1. Echo running commands

Commits on Feb 3, 2008
Commits on Feb 21, 2007
  1. Updated train script

     - Include timing for long-running steps
     - Use sc-train --replay
     - Accept two parameters: the first one is PREFIX,
       the second one is a number
  2. Prevent overflows in sc2wngram

    The counts of <s> <digit>, <s> <opaque> and <s> <punct> may well
    exceed the int limit (about 2G), so long long int is used
Commits on Dec 2, 2006
  1. Fixed leaks in LM::operator[](const char *) and LM::clear_oov()

    LM::clear_oov() (also known as clear_rest) resized LM::oov without
    releasing the memory allocated for its strings.
    
    LM::operator[](const char *) abused LM::lm->HT to index LM::oov.
    This had the nasty effect that lm->HT kept growing no matter how
    often you called LM::clear_oov(). As lm->HT grew bigger and bigger,
    hash lookups slowed down significantly.
    
    The new implementation uses a separate hash to index oov and frees
    it when LM::clear_oov() is called
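
The idea in the last paragraph can be sketched as follows: out-of-vocabulary words get their own index, separate from the main vocabulary hash, and clearing them releases both the strings and the index. `OovTable`, `id_for` and `first_oov_id` are made-up names for illustration; the real LM class and lm->HT are not reproduced here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the OOV part of class LM.
class OovTable {
public:
    // first_oov_id: first id available after the in-vocabulary words (assumed).
    explicit OovTable(int first_oov_id) : first_oov_id_(first_oov_id) {}

    // Return the id of an OOV word, assigning a fresh one on first sight.
    int id_for(const std::string& word) {
        std::unordered_map<std::string, int>::const_iterator it = index_.find(word);
        if (it != index_.end()) return it->second;
        const int id = first_oov_id_ + static_cast<int>(words_.size());
        words_.push_back(word);
        index_.emplace(word, id);
        return id;
    }

    // Drop every OOV word and its index entry; std::string frees its own storage.
    void clear() {
        words_.clear();
        index_.clear();
    }

    std::size_t size() const { return index_.size(); }

private:
    int first_oov_id_;
    std::vector<std::string> words_;              // id - first_oov_id_ -> word
    std::unordered_map<std::string, int> index_;  // word -> id, OOV words only
};
```

Because the main vocabulary hash is never touched, its size stays constant however many OOV words pass through between calls to `clear()`.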
Commits on Dec 1, 2006
  1. Preserve \n in std-syllable output

    This greatly helps when preparing the wordlist
Commits on Nov 30, 2006
  1. Reworked WordArchive::load to use new wordlist format

    The new format is the same as CMU SLM's vocab format. Note that
    the words in the wordlist are the standardized ones.
    
    If WordArchive::load() is called with NULL as its argument, it
    uses the wordlist from struct lm_t inside class LM. So if you
    have already loaded an LM, call warch.load(NULL) to save I/O.
Commits on Nov 29, 2006
  1. Added record/replay mode to sc-train to speed up the process

    According to sysprof, a large amount of time was spent on
    input processing (operator >> Lattice& and friends). This mode
    tries to eliminate that work.
    
    Record mode runs as usual but without the real calculation. It
    outputs the steps needed to calculate the final results.
    The format is as follows (for bigrams only):
    <dag count> <node begin id> <node end id>
    <L|R> <v> <vv> <word1> <word2>
    <D> 0 0 none none
    
    Replay mode reads record mode's output and does the rest of the
    work. It allocates Sleft and Sright, fills them in and outputs
    the counts.
    
    Record mode may take as long as normal mode, but replay mode is
    much faster (about fifteen minutes, while normal mode may take
    ninety to one hundred and twenty minutes). Record mode's output,
    however, seems to be three times bigger than the lattice output
    (approx. 300MB gzipped)
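
A reader for the step lines of the record format might look like the sketch below. It rests on assumptions the log does not confirm: that the tag appears as a bare letter (`L`, `R` or `D`) and that fields are separated by whitespace. Since the meaning of `<v>` and `<vv>` is not stated, they are kept as raw text. `RecordStep` and `parse_step` are hypothetical names, not part of sc-train.

```cpp
#include <sstream>
#include <string>

// One step line: <L|R|D> <v> <vv> <word1> <word2> (field meanings assumed).
struct RecordStep {
    char kind;          // 'L', 'R' or 'D'
    std::string v;      // kept as text; the log does not say what it holds
    std::string vv;     // kept as text for the same reason
    std::string word1;
    std::string word2;
};

// Parse one step line into out. Returns false if the line has fewer than
// five fields or an unknown tag; out is left untouched in that case.
bool parse_step(const std::string& line, RecordStep& out) {
    std::istringstream in(line);
    std::string kind;
    RecordStep step;
    if (!(in >> kind >> step.v >> step.vv >> step.word1 >> step.word2)) return false;
    if (kind != "L" && kind != "R" && kind != "D") return false;
    step.kind = kind[0];
    out = step;
    return true;
}
```

The header line with the dag count and node ids would need its own reader; it is left out because its exact layout is just as uncertain.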
Commits on Nov 28, 2006
  1. Use double for softcounting as float is too small

    Some values reached 1e-47, which is too small for a float to
    represent. Also added some checks so that we notice if any
    values go out of range
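
To make the range problem concrete: on IEEE 754 hardware the smallest positive float is about 1.4e-45, so 1e-47 collapses to zero in a float but is an ordinary value in a double. The sketch below shows that, plus one possible shape for a range check. `survives_as_float` and `checked_mul` are hypothetical helpers; the checks actually added by this commit are not shown in the log.

```cpp
#include <cmath>
#include <stdexcept>

// True if x is zero or still non-zero after narrowing to float.
bool survives_as_float(double x) {
    return x == 0.0 || static_cast<float>(x) != 0.0f;
}

// Multiply two soft-count factors in double and complain loudly if the
// result is not finite, or underflowed although neither factor was zero.
double checked_mul(double acc, double p) {
    const double r = acc * p;
    const bool exact_zero = (acc == 0.0 || p == 0.0);
    if (!std::isfinite(r) || (!exact_zero && !std::isnormal(r))) {
        throw std::range_error("soft count left the normal double range");
    }
    return r;
}
```

Throwing is only one way to get noticed; logging and continuing would work too, depending on how the training run should react.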
Commits on Nov 21, 2006
  1. Replaced bare Sentence pointer in Lattice class with boost::shared_ptr

    This should fix the leak produced by commit
    c735117
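
The ownership change can be pictured with the minimal sketch below. It uses `std::shared_ptr` in place of `boost::shared_ptr` so that it compiles without Boost; the two behave the same for this purpose. The `Sentence` and `Lattice` types here are small stand-ins, not the project's real classes.

```cpp
#include <memory>
#include <string>
#include <utility>

// Stand-in for the real Sentence class.
struct Sentence {
    explicit Sentence(std::string t) : text(std::move(t)) {}
    std::string text;
};

// Stand-in for Lattice: it shares ownership of its Sentence instead of
// holding a bare pointer that somebody has to remember to delete.
class Lattice {
public:
    void set_sentence(std::shared_ptr<Sentence> s) { sentence_ = std::move(s); }
    const Sentence* sentence() const { return sentence_.get(); }

private:
    std::shared_ptr<Sentence> sentence_;
};
```

Copies of a `Lattice` share the same `Sentence`, and the object is destroyed exactly once, when the last owner goes away.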
Commits on Nov 13, 2006
  1. Avoid signed/unsigned char pitfalls when calling viet_is* functions

    Strings are char* by default, and char may be signed. That means
    some characters have negative values (such as 'đ' - 0xf0). If
    these values are passed to viet_is*, they index undefined memory
    because viet_is* does not check for negative values.
    
    This caused Sentence::tokenize() to ignore 'đủ'
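
The usual remedy is to convert to `unsigned char` before using the character as a table index. The sketch below assumes a 256-entry lookup table, which is a guess at how the viet_is* functions work; `make_letter_table` and `is_letter` are hypothetical names.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 256-entry class table; the real viet_is* tables are not shown.
std::array<bool, 256> make_letter_table() {
    std::array<bool, 256> table{};  // value-initialized: every entry false
    for (std::size_t c = 'a'; c <= 'z'; ++c) table[c] = true;
    for (std::size_t c = 'A'; c <= 'Z'; ++c) table[c] = true;
    table[0xf0] = true;  // the byte the commit message gives for 'đ'
    return table;
}

// Safe lookup: the cast maps a possibly negative char into 0..255.
bool is_letter(const std::array<bool, 256>& table, char c) {
    return table[static_cast<unsigned char>(c)];
}
```

Without the cast, a signed `char` holding 0xf0 has the value -16 on common platforms and would index before the start of the table.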
Commits on Nov 12, 2006
  1. Always ignore 0-weighted edges in PFS. It's too perfect to be true.

    This adds a trampoline to avoid nasty bugs such as
    caf35e6
  2. LM::wordProb should return a large value in bad cases

    Returning 0 means it's the best LogP out there, which is
    obviously wrong. The selected value is -9999.0
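
The reasoning is that a log-probability of 0 corresponds to probability 1, so returning 0 on failure makes the failed lookup look like the best possible word. A minimal sketch of the fallback follows; `safe_log_prob` is a hypothetical helper, and the use of base-10 logs is an assumption, not something the commit states.

```cpp
#include <cmath>

// Sentinel for "no usable probability"; -9999.0 is the value named in the commit.
const double kBadLogProb = -9999.0;

// Hypothetical helper: turn a probability into a log-probability, falling
// back to the sentinel for anything outside (0, 1], including NaN.
double safe_log_prob(double p) {
    if (!(p > 0.0) || p > 1.0) return kBadLogProb;
    return std::log10(p);
}
```

Any search that maximizes the summed LogP will then steer away from paths that hit a bad case instead of preferring them.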