# Topic Coherence example
This example shows how to use the topic coherence pipeline to score a set of topics obtained by running LDA on a small example corpus.

The pipeline implements the framework described in [Exploring the Space of Topic Coherence Measures](http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf) (Röder, Both and Hinneburg, WSDM 2015).

In [1]:
import logging
from gensim.corpora import Dictionary, MmCorpus
from gensim.models.ldamodel import LdaModel
from gensim.models.coherencemodel import CoherenceModel
from gensim.matutils import argsort

### Set up logging

In [2]:
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logging.debug("test")

### Set up example corpus

In [3]:
texts = [['human', 'interface', 'computer'],
 ['survey', 'user', 'computer', 'system', 'response', 'time'],
 ['eps', 'user', 'interface', 'system'],
 ['system', 'human', 'system', 'eps'],
 ['user', 'response', 'time'],
 ['trees'],
 ['graph', 'trees'],
 ['graph', 'minors', 'trees'],
 ['graph', 'minors', 'survey']] # Corpus has 12 unique words

In [4]:
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
# corpus is now a list of bag-of-words vectors: [[(word_id, count), (word_id, count)], [(word_id, count)], ...]
MmCorpus.serialize('/tmp/deerwester.mm', corpus)  # Save the corpus to disk in Matrix Market format
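
To make the bag-of-words format concrete, here is a minimal pure-Python sketch of the conversion that `Dictionary` and `doc2bow` perform above. The names `token2id`, `to_bow` and `bow_corpus` are illustrative, and the sketch assumes ids are assigned in first-seen order, which may not match the numbering gensim chooses.

```python
from collections import Counter

texts = [['human', 'interface', 'computer'],
         ['survey', 'user', 'computer', 'system', 'response', 'time'],
         ['eps', 'user', 'interface', 'system'],
         ['system', 'human', 'system', 'eps'],
         ['user', 'response', 'time'],
         ['trees'],
         ['graph', 'trees'],
         ['graph', 'minors', 'trees'],
         ['graph', 'minors', 'survey']]

# Assumption: ids are handed out in first-seen order; gensim may number them differently.
token2id = {}
for text in texts:
    for token in text:
        token2id.setdefault(token, len(token2id))


def to_bow(text):
    """Return a sorted list of (word_id, count) pairs for one document."""
    counts = Counter(token2id[token] for token in text)
    return sorted(counts.items())


bow_corpus = [to_bow(text) for text in texts]
print(len(token2id))  # number of unique words in the corpus
print(bow_corpus[3])  # 'system' occurs twice in the fourth document
```

Each document becomes a list of `(word_id, count)` pairs, so word order is discarded and only the counts survive.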

### Perform LDA on corpus

In [5]:
topics = []
str_topics = []
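lm = LdaModel(corpus=corpus, id2word=dictionary)  # num_topics defaults to 100
# lm.state.get_lambda() has shape (100, 12): one row of word weights per topic.
for topic in lm.state.get_lambda():
    topic = topic / topic.sum()  # Normalize the weights to a probability distribution
    bestn = argsort(topic, topn=3, reverse=True)  # Ids of the 3 most probable words
    topics.append(bestn)
    beststr = [(topic[word_id], lm.id2word[word_id]) for word_id in bestn]
    str_topics.append(beststr)
# topics now holds 100 arrays of 3 word ids each, i.e. shape (100, 3)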
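
The loop above normalizes each topic's weights and keeps the ids of the three most probable words. The sketch below does the same in plain Python for a single topic. `top_n_ids` and `example_weights` are hypothetical names and values chosen for illustration, not part of gensim.

```python
def top_n_ids(weights, topn=3):
    """Normalize one topic's word weights and return the ids and
    probabilities of the topn most probable words."""
    total = float(sum(weights))
    probs = [w / total for w in weights]
    # Rank word ids by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best = ranked[:topn]
    return best, [probs[i] for i in best]


# Hypothetical weights for a single topic over a 5-word vocabulary.
example_weights = [1.0, 4.0, 2.0, 0.5, 2.5]
ids, probs = top_n_ids(example_weights)
print(ids)
print(probs)
```

Only the word ids are passed on to the coherence model. The probabilities are kept alongside the word strings purely for inspection.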

### Set up coherence model

In [6]:
cm = CoherenceModel(corpus, topics, coherence='u_mass') # Let's use the U_Mass topic coherence

### View the different pipeline parameters of U_Mass coherence

In [7]:
print(cm)

CoherenceModel(segmentation=<function s_one_pre at 0x7fd51ceb32a8>, probability estimation=<function p_boolean_document at 0x7fd51ceb3500>, confirmation measure=<function log_conditional_probability at 0x7fd51ceb3578>, aggregation=<function arithmetic_mean at 0x7fd51ceb35f0>)


### Get U_Mass coherence for given topics on the given corpus

In [8]:
print(cm.get_coherence())

-1.11888554406
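
The four stages printed by the coherence model (segmentation, probability estimation, confirmation measure, aggregation) can be sketched in pure Python as follows. The function names mirror those in the printout, but the bodies are assumptions written for illustration rather than gensim's code. The sketch works on word strings instead of word ids, assumes a smoothing constant `EPSILON`, and is not claimed to reproduce the score above.

```python
import math

EPSILON = 1e-12  # Assumed smoothing constant that keeps log() away from zero

texts = [['human', 'interface', 'computer'],
         ['survey', 'user', 'computer', 'system', 'response', 'time'],
         ['eps', 'user', 'interface', 'system'],
         ['system', 'human', 'system', 'eps'],
         ['user', 'response', 'time'],
         ['trees'],
         ['graph', 'trees'],
         ['graph', 'minors', 'trees'],
         ['graph', 'minors', 'survey']]


def s_one_pre(topic):
    """Segmentation: pair every word with each word that precedes it."""
    return [(topic[i], topic[j]) for i in range(1, len(topic)) for j in range(i)]


def doc_count(documents, *words):
    """Boolean document counting: documents that contain all given words."""
    return sum(1 for doc in documents if all(w in doc for w in words))


def log_conditional_probability(documents, w_prime, w_star):
    """Confirmation: log P(w_prime | w_star), estimated from document counts.
    Assumes w_star occurs in at least one document."""
    num_docs = float(len(documents))
    joint = doc_count(documents, w_prime, w_star) / num_docs
    marginal = doc_count(documents, w_star) / num_docs
    return math.log((joint + EPSILON) / marginal)


def u_mass(documents, topics):
    """Aggregation: arithmetic mean of the confirmation over all word pairs."""
    docs = [set(doc) for doc in documents]
    scores = [log_conditional_probability(docs, w_prime, w_star)
              for topic in topics
              for w_prime, w_star in s_one_pre(topic)]
    return sum(scores) / len(scores)


# 'graph' appears in 3 documents and 'trees' joins it in 2 of them,
# so the single pair scores roughly log(2/3).
print(u_mass(texts, [['graph', 'trees']]))
```

Scores closer to zero indicate that a topic's words tend to occur in the same documents, while strongly negative scores indicate words that rarely co-occur.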
