Get raw ngram count in addition to logProb #3

GoogleCodeExporter · 2015-03-22T17:33:51Z

Feature request: also expose the raw count of an n-gram when Google n-gram 
data is used as the back end.

Original issue reported on code.google.com by torsten....@gmail.com on 14 Jul 2011 at 7:14


GoogleCodeExporter · 2015-03-22T17:33:51Z

Do you need this access to be fast? I have some functionality which you can 
access by doing:
 new NgramMapWrapper<W, LongRef>(lm.getNgramMap(), lm.getWordIndexer());

on a StupidBackoffLm. This gives a Map from List<W> to LongRefs. However, this 
interface is slow due to all the boxing/unboxing.

Original comment by adpa...@gmail.com on 14 Jul 2011 at 5:39

GoogleCodeExporter · 2015-03-22T17:33:51Z

Of course, fast is always better :)

However, it seems I have not fully understood the way the library works.
Two questions:
1) As the JavaDocs say that getLogProb() is slow, what is a fast way to get 
this information given a phrase?

2) How is this probability computed given the raw counts in the Google web1t 
corpus? It seems to me there should be an easy way to just invert the process.

thanks for your help,
Torsten

Original comment by torsten....@gmail.com on 15 Jul 2011 at 7:52

GoogleCodeExporter · 2015-03-22T17:33:52Z

1) NgramLanguageModel.getLogProb(List<W>) is "slow" because it has to turn the 
List<W> into an int[] first. Note that it is not actually "slow", just slow 
relative to the efficient accessors in 
ArrayEncodedNgramLanguageModel.getLogProb(int[]) and 
ContextEncodedNgramLanguageModel.getLogProb. I have added additional comments 
that direct you towards those calls so others are not confused by this. 

2) The probability is computed using Stupid Backoff. I have added a call to 
StupidBackoffLm that grabs the count, and will be releasing a new version of 
the code with this fix shortly.
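
For readers wondering how a Stupid Backoff score relates to the raw counts, here is a minimal, self-contained sketch. It does not use the library's classes: `StupidBackoffSketch`, its `count` and `score` methods, and the toy counts are all hypothetical, and the backoff factor of 0.4 is the value commonly used with Stupid Backoff, assumed here rather than taken from the library.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy Stupid Backoff scorer over an in-memory count table.
// None of these names come from the library discussed in this thread.
final class StupidBackoffSketch {
    // Assumed backoff factor (hypothetical default for this sketch).
    static final double ALPHA = 0.4;

    private final Map<List<String>, Long> counts;
    private final long totalTokens;

    StupidBackoffSketch(Map<List<String>, Long> counts, long totalTokens) {
        this.counts = counts;
        this.totalTokens = totalTokens;
    }

    // Raw count of an n-gram, or 0 if it was never observed.
    long count(List<String> ngram) {
        return counts.getOrDefault(ngram, 0L);
    }

    // Score of the last word given the preceding words.
    // Uses count(ngram) / count(context) when the n-gram was seen; otherwise
    // drops the first word and multiplies by ALPHA. Unigrams are scored as
    // count / totalTokens. Returns 0.0 if even the last word is unseen.
    double score(List<String> ngram) {
        double penalty = 1.0;
        for (int start = 0; start < ngram.size(); start++) {
            List<String> current = ngram.subList(start, ngram.size());
            long c = count(current);
            if (c > 0) {
                if (current.size() == 1) {
                    return penalty * c / totalTokens;
                }
                long contextCount = count(current.subList(0, current.size() - 1));
                if (contextCount > 0) {
                    return penalty * c / contextCount;
                }
            }
            penalty *= ALPHA;
        }
        return 0.0;
    }

    public static void main(String[] args) {
        Map<List<String>, Long> toy = new HashMap<>();
        toy.put(Arrays.asList("the"), 100L);
        toy.put(Arrays.asList("cat"), 10L);
        toy.put(Arrays.asList("the", "cat"), 5L);
        StupidBackoffSketch lm = new StupidBackoffSketch(toy, 1000L);

        // Seen bigram: 5 / 100.
        System.out.println(lm.score(Arrays.asList("the", "cat")));
        // Unseen bigram: backs off to ALPHA * (10 / 1000).
        System.out.println(lm.score(Arrays.asList("a", "cat")));
        // The raw count is a separate lookup, not derived from the score.
        System.out.println(lm.count(Arrays.asList("the", "cat")));
    }
}
```

Because the score is a ratio of two counts (possibly scaled by the backoff factor), the raw count cannot in general be recovered from the score alone; it has to be read from the count table directly, which is why a dedicated count accessor is useful.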

Original comment by adpa...@gmail.com on 15 Jul 2011 at 6:19

Changed state: Fixed

GoogleCodeExporter added Type-Defect Priority-Medium auto-migrated labels Mar 22, 2015

GoogleCodeExporter closed this as completed Mar 22, 2015
