# Building LDA Mallet Model

So far you have seen Gensim’s built-in implementation of the LDA algorithm. Mallet’s implementation, which uses Gibbs sampling rather than Gensim’s variational Bayes, often produces better-quality topics.

Based on [Gensim Topic Modeling](https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/)

In [1]:
!pip install gensim pyLDAvis

Collecting gensim
  Downloading gensim-4.1.2.tar.gz (23.2 MB)
Collecting pyLDAvis
  Using cached pyLDAvis-3.3.1.tar.gz (1.7 MB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
    Preparing wheel metadata: started
    Preparing wheel metadata: finished with status 'done'
Collecting smart_open>=1.8.1
  Using cached smart_open-5.2.1-py3-none-any.whl (58 kB)
Collecting funcy
  Using cached funcy-1.17-py2.py3-none-any.whl (33 kB)
Collecting numexpr
  Downloading numexpr-2.8.1-cp310-cp310-win_amd64.whl (88 kB)
Collecting future
  Using cached future-0.18.2.tar.gz (829 kB)
Building wheels for collected packages: gensim, pyLDAvis, future
  Building wheel for gensim (setup.py): started
  Building wheel for gensim (setup.py):

In [3]:
import os
import sys
import gensim
import pyLDAvis
#import pyLDAvis.gensim

from gensim import corpora
from gensim import models
from gensim.models.coherencemodel import CoherenceModel
from gensim.models.wrappers import LdaMallet

print('Python Version: %s' % (sys.version))

ModuleNotFoundError: No module named 'gensim.models.wrappers'
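
The `ModuleNotFoundError` above is what Gensim 4.x produces, because the `gensim.models.wrappers` module was removed in Gensim 4.0. A minimal sketch of a guarded import follows; `load_lda_mallet` is a hypothetical helper name introduced here for illustration, not part of Gensim.

```python
import importlib


def load_lda_mallet():
    """Return the LdaMallet class if this Gensim install ships it, else None.

    `load_lda_mallet` is a hypothetical helper name, not part of Gensim.
    """
    try:
        wrappers = importlib.import_module("gensim.models.wrappers")
    except ImportError:
        # Raised when Gensim is missing or is 4.x, which dropped the wrappers.
        return None
    return getattr(wrappers, "LdaMallet", None)


LdaMallet = load_lda_mallet()
if LdaMallet is None:
    print("LdaMallet is unavailable; try: pip install 'gensim<4'")
```

Failing early with a clear message like this is one option; pinning the Gensim version in the install cell is another.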

In [None]:
dictionary = corpora.Dictionary.load('documents.dict')
corpus = corpora.MmCorpus('documents.mm')
lda_model = models.LdaModel.load('lda_model')
print(dictionary)
print(corpus)
print(lda_model)

Dictionary(7714 unique tokens: [u'francesco', u'csuci', u'univesidad', u'sation', u'efimenko']...)
MmCorpus(4 documents, 7714 features, 10760 non-zero entries)
LdaModel(num_terms=7714, num_topics=20, decay=0.5, chunksize=100)


In [None]:
import pickle

# To save the documents (run once):
# with open('documents', 'wb') as f:
#     pickle.dump(mylist, f)

# Load the documents; they are passed as `texts` to CoherenceModel below
with open('documents', 'rb') as f:
    documents = pickle.load(f)
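
The `documents` object is later passed as `texts` to `CoherenceModel`, which expects tokenized texts. The sketch below assumes that `documents` is a list of token lists, and round-trips a toy stand-in through `pickle` to check that shape. The toy data and the temporary file are illustrative only.

```python
import os
import pickle
import tempfile

# Toy stand-in for the real corpus: one list of tokens per document.
toy_documents = [
    ["topic", "model", "mallet"],
    ["gensim", "lda", "coherence"],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "documents")
    with open(path, "wb") as f:
        pickle.dump(toy_documents, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Each restored document should still be a list of string tokens.
is_tokenized = all(
    isinstance(doc, list) and all(isinstance(tok, str) for tok in doc)
    for doc in restored
)
print(is_tokenized)
```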

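Gensim 3.x provides a wrapper for running Mallet’s LDA from within Gensim itself. You only need to [download](http://mallet.cs.umass.edu/download.php) Mallet, unzip it, and pass the path to the `mallet` executable in the unzipped directory to `gensim.models.wrappers.LdaMallet`, as shown below. Note that the `gensim.models.wrappers` module was removed in Gensim 4.0, which is why the import above fails with gensim 4.1.2; the cells below require a Gensim 3.x installation.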

In [None]:
mallet_path = 'mallet-2.0.8/bin/mallet' # update this path
ldamallet = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=20, id2word=dictionary)
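
Before training, it can help to confirm that the Mallet launcher exists at the given path and that a Java runtime is available, since Mallet is a Java program. The following is a rough sketch; `check_mallet_setup` is a hypothetical helper, not part of Gensim or Mallet, and the path is the placeholder used above.

```python
import os
import shutil


def check_mallet_setup(mallet_path):
    """Return a list of problems found; an empty list means nothing obvious is wrong.

    `check_mallet_setup` is a hypothetical helper, not part of Gensim or Mallet.
    """
    problems = []
    if not os.path.isfile(mallet_path):
        problems.append("Mallet launcher not found at: %s" % mallet_path)
    if shutil.which("java") is None:
        # Mallet is a Java program, so a Java runtime must be on PATH.
        problems.append("No 'java' executable found on PATH")
    return problems


for problem in check_mallet_setup("mallet-2.0.8/bin/mallet"):  # update this path
    print(problem)
```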

In [None]:
# Show Topics
#print(ldamallet.show_topics(formatted=False))

# Compute Coherence Score
coherence_model_ldamallet = CoherenceModel(model=ldamallet, 
                                           texts=documents, 
                                           dictionary=dictionary, 
                                           coherence='c_v')
coherence_ldamallet = coherence_model_ldamallet.get_coherence()
print('\nCoherence Score: ', coherence_ldamallet)

('\nCoherence Score: ', 0.5411833430131268)


In [None]:
ldamallet.save('ldamallet')
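
The saved model can later be restored with the `load` class method. The sketch below assumes a Gensim 3.x environment and the `'ldamallet'` file written above; `reload_mallet_model` is a hypothetical helper name, and the import is deferred into the function so that defining it does not fail under Gensim 4.x.

```python
def reload_mallet_model(path="ldamallet", num_words=10):
    """Load a saved LdaMallet model and return its topics.

    Hypothetical helper; requires Gensim 3.x, imported lazily so that the
    definition itself does not fail on Gensim 4.x.
    """
    from gensim.models.wrappers import LdaMallet

    model = LdaMallet.load(path)
    # formatted=False yields (topic_id, [(word, weight), ...]) pairs;
    # num_topics=-1 asks for all topics.
    return model.show_topics(num_topics=-1, num_words=num_words, formatted=False)
```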