# Building LDA Mallet Model

So far you have seen Gensim’s built-in implementation of the LDA algorithm. Mallet’s implementation, however, often produces better-quality topics.

Based on [Gensim Topic Modeling](https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/)

In [5]:
import os
import sys
import gensim
import pyLDAvis
import pyLDAvis.gensim

from gensim import corpora
from gensim import models
from gensim.models.coherencemodel import CoherenceModel
from gensim.models.wrappers import LdaMallet

print('Python Version: %s' % (sys.version))

Python Version: 2.7.15 | packaged by conda-forge | (default, Feb 28 2019, 04:00:11) 
[GCC 7.3.0]


In [3]:
dictionary = corpora.Dictionary.load('documents.dict')
corpus = corpora.MmCorpus('documents.mm')
lda_model = models.LdaModel.load('lda_model')
print(dictionary)
print(corpus)
print(lda_model)

Dictionary(7714 unique tokens: [u'francesco', u'csuci', u'univesidad', u'sation', u'efimenko']...)
MmCorpus(4 documents, 7714 features, 10760 non-zero entries)
LdaModel(num_terms=7714, num_topics=20, decay=0.5, chunksize=100)




In [11]:
import pickle
#with open('documents', 'wb') as f: #save
#    pickle.dump(mylist, f)

with open('documents', 'rb') as f: #load
    documents = pickle.load(f)
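The cell above reloads the tokenized documents that the coherence computation later receives as its texts argument. As a minimal sketch of the full save/load round trip, the snippet below pickles a small list and reads it back. The names sample_documents and restored_documents and the token lists themselves are hypothetical stand-ins for illustration; the only assumption carried over from the notebook is that the data is a list with one list of tokens per document.

```python
import os
import pickle
import shutil
import tempfile

# Hypothetical tokenized documents: one list of tokens per document.
sample_documents = [
    ['topic', 'model', 'corpus'],
    ['mallet', 'gibbs', 'sampling'],
]

# Work in a throwaway directory so the real 'documents' file is not overwritten.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'documents')
try:
    with open(path, 'wb') as f:  # save
        pickle.dump(sample_documents, f)
    with open(path, 'rb') as f:  # load
        restored_documents = pickle.load(f)
finally:
    shutil.rmtree(tmp_dir)

# The restored object should be equal to what was saved.
print(restored_documents == sample_documents)
```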

Gensim provides a wrapper that runs Mallet’s LDA from within Gensim itself. You only need to [download](http://mallet.cs.umass.edu/download.php) Mallet, unzip it, and pass the path to the mallet launcher inside the unzipped directory (bin/mallet) to gensim.models.wrappers.LdaMallet. See how I have done this below.

In [4]:
mallet_path = 'mallet-2.0.8/bin/mallet' # update this path
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=20, id2word=dictionary)
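Because the wrapper shells out to the Mallet launcher, a wrong path typically only surfaces once training starts. The sketch below is one possible pre-flight check using only the standard library. The helper name find_mallet_problem is hypothetical and is not part of Gensim or Mallet; it only tests that the file exists and is executable, not that Java or Mallet itself works.

```python
import os


def find_mallet_problem(mallet_path):
    """Return a description of what looks wrong with mallet_path, or None."""
    if not os.path.isfile(mallet_path):
        return 'no file at %s' % os.path.abspath(mallet_path)
    if not os.access(mallet_path, os.X_OK):
        return '%s exists but is not executable' % mallet_path
    return None


# Same relative path as in the cell above; update it for your setup.
problem = find_mallet_problem('mallet-2.0.8/bin/mallet')
print(problem or 'Mallet launcher found')
```

Running this before constructing LdaMallet gives a readable message instead of a failed subprocess call partway through training.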

In [10]:
# Show Topics
#print(ldamallet.show_topics(formatted=False))

# Compute Coherence Score
coherence_model_ldamallet = CoherenceModel(model=ldamallet, 
                                           texts=documents, 
                                           dictionary=dictionary, 
                                           coherence='c_v')
coherence_ldamallet = coherence_model_ldamallet.get_coherence()
print('\nCoherence Score: ', coherence_ldamallet)

('\nCoherence Score: ', 0.5411833430131268)


In [12]:
ldamallet.save('ldamallet')