# Metadata

```yaml
Course:   DS 5001
Module:   08a Lab
Topic:    Using Mazo
Author:   R.C. Alvarado
Date:     23 March 2023
```

**Purpose:** Demonstrate the utility of Mazo output by loading its tables into pandas and a SQLite database.

# Synopsis

1. Download and install `mazo` from https://github.com/ontoligent-design/mazo
2. Set up your project directory and add the files and directories it needs, as described in `README.md`.
3. Run `mazo` from the command line to generate topic models.

# Set Up

## Config

In [1]:
!ls -l output/

total 8
drwxr-xr-x@ 12 rca2t1  staff  384 Mar 23 13:44 demo-20-16795933002483132
drwxr-xr-x@ 12 rca2t1  staff  384 Mar 23 18:15 demo-20-16796095708890848
drwxr-xr-x@ 12 rca2t1  staff  384 Mar 23 18:20 demo-20-16796098938625019
drwxr-xr-x@ 12 rca2t1  staff  384 Mar 23 13:32 novels-20-16795927188712971
drwxr-xr-x@ 12 rca2t1  staff  384 Mar 23 16:10 novels-20-16796021757424011
-rw-r--r--@  1 rca2t1  staff   41 Mar 23 14:24 readme.txt
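
The `model_id` set in the next cell is copied by hand from this listing. A minimal sketch for picking it programmatically, assuming run directories follow the `<keyword>-<n_topics>-<run id>` pattern seen above (inferred from these names, not from Mazo's documentation). `list_model_runs` and `latest_run` are hypothetical helpers, and "latest" here means most recently modified.

```python
import re
from pathlib import Path

# Assumed run-directory pattern, inferred from the listing above:
# <keyword>-<n_topics>-<run id>, e.g. 'demo-20-16796098938625019'
RUN_PATTERN = re.compile(r"^(?P<keyword>.+)-(?P<n_topics>\d+)-(?P<run_id>\d+)$")

def list_model_runs(output_dir="./output"):
    """Return (model_id, keyword, n_topics, run_id) for each run directory."""
    runs = []
    for path in sorted(Path(output_dir).iterdir()):
        m = RUN_PATTERN.match(path.name)
        if m and path.is_dir():  # skips files such as readme.txt
            runs.append((path.name, m["keyword"], int(m["n_topics"]), m["run_id"]))
    return runs

def latest_run(output_dir="./output", keyword=None):
    """Return the model_id of the most recently modified run, or None."""
    runs = [r for r in list_model_runs(output_dir) if keyword in (None, r[1])]
    if not runs:
        return None
    newest = max(runs, key=lambda r: (Path(output_dir) / r[0]).stat().st_mtime)
    return newest[0]
```

For example, `latest_run("./output", keyword="demo")` should select the run modified at 18:20 in the listing above. Directory modification times are used rather than the numeric suffix because the suffix's meaning is not documented here.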


In [2]:
model_id = 'demo-20-16796098938625019'

In [3]:
mazo_tables = f"./output/{model_id}/tables/*.*"
db_file = f"./db/{model_id}.db"
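
`sqlite3.connect()` fails with "unable to open database file" if the `db/` directory does not exist. A small sketch, assuming the same `output/<model_id>/tables` and `db/<model_id>.db` layout used above, that builds both paths with `pathlib` and creates the database directory first. `model_paths` and `table_files` are hypothetical helper names.

```python
import sqlite3
from pathlib import Path

def model_paths(model_id, base="."):
    """Return (tables_dir, db_path) for a Mazo run, creating the db directory if needed."""
    base = Path(base)
    tables_dir = base / "output" / model_id / "tables"
    db_path = base / "db" / f"{model_id}.db"
    # sqlite3.connect() will not create missing parent directories
    db_path.parent.mkdir(parents=True, exist_ok=True)
    return tables_dir, db_path

def table_files(tables_dir):
    """Same files as the glob pattern "*.*", sorted for a stable load order."""
    return sorted(p for p in Path(tables_dir).glob("*.*") if p.is_file())

# Usage (sketch):
# tables_dir, db_path = model_paths("demo-20-16796098938625019")
# db = sqlite3.connect(db_path)
```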

## Imports

In [4]:
import pandas as pd
import numpy as np
from glob import glob
import sqlite3

In [26]:
# In this run, `import mazo.polite` executed mazo's command-line startup code:
# it reported the MALLET path and output directory, asked for a number of
# topics, and raised an AssertionError. Import the Polite module from the
# local `mazo_demo` package instead.
from mazo_demo.polite import polite

In [25]:
polite.Polite.schema['PHRASE'].index

['phrase_str']

# Import Model Tables

In [6]:
# Empty container: each loaded table becomes an attribute, e.g. model.DOC
class Model: pass
model = Model()

In [7]:
db = sqlite3.connect(db_file)

In [9]:
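from pathlib import Path

for tfile in glob(mazo_tables):
    tname = Path(tfile).stem                   # e.g. 'TOPICWORD' from '.../TOPICWORD.csv'
    print(tname)
    df = pd.read_csv(tfile)
    idx = polite.Polite.schema[tname].index    # index column(s) declared by Polite
    try:
        df = df.set_index(idx)
    except KeyError:
        # The CSV lacks the column(s) the schema names; keep the default index
        print(tname, "has no", idx)
    # Overwrite tables left over from an earlier run instead of silently skipping them
    df.to_sql(tname, db, if_exists='replace')
    setattr(model, tname, df)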

TOPIC
TOPICWORD_DIAGS
TOPICPHRASE
DOCTOPIC_NARROW
VOCAB
DOCTOPIC
TOPICWORD
PHRASE
PHRASE has no ['phrase_str']
TOPICWORD_NARROW
DOCWORD
DOC
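
The same loading logic can be packaged as a function that checks for the schema's index columns up front instead of catching an exception. A sketch, where `load_tables` is a hypothetical name and the plain `index_cols` dict stands in for `polite.Polite.schema[tname].index`; `types.SimpleNamespace` plays the role of the empty `Model` class.

```python
import sqlite3
from pathlib import Path
from types import SimpleNamespace

import pandas as pd

def load_tables(table_files, db, index_cols):
    """Load each CSV into a namespace attribute and a SQLite table.

    index_cols maps table name -> list of index columns; it is a
    hypothetical stand-in for polite.Polite.schema[tname].index.
    """
    model = SimpleNamespace()
    for tfile in table_files:
        tname = Path(tfile).stem
        df = pd.read_csv(tfile)
        idx = index_cols.get(tname, [])
        missing = [c for c in idx if c not in df.columns]
        if idx and not missing:
            df = df.set_index(list(idx))
        elif missing:
            print(tname, "has no", missing)
        # Replace any table left over from a previous run
        df.to_sql(tname, db, if_exists="replace")
        setattr(model, tname, df)
    return model
```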


In [10]:
model.PHRASE

| | topic_phrase | n_topics | n_words | topic_list | topic_weight_mean |
|---|---|---|---|---|---|
| 0 | aged french | 1 | 11 | 13 | 0.000514 |
| 1 | almond paste | 1 | 13 | 08 | 0.000826 |
| 2 | apple aromas | 1 | 11 | 03 | 0.000817 |
| 3 | apple flavors | 1 | 11 | 03 | 0.000817 |
| 4 | apple melon | 1 | 10 | 03 | 0.000743 |
| ... | ... | ... | ... | ... | ... |
| 140 | white pepper | 2 | 66 | 04 07 | 0.003429 |
| 141 | wild berry | 1 | 29 | 04 | 0.002683 |
| 142 | wood aging | 2 | 111 | 12 19 | 0.003025 |
| 143 | yellow fruit | 1 | 15 | 19 | 0.000986 |
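
This output also explains the message printed during loading: the phrase column is `topic_phrase`, while the schema expects `phrase_str`. A sketch that indexes on the column the file actually contains and reshapes `topic_list` (assumed, from the display above, to hold space-separated topic ids) into one row per phrase-topic pair. The rows are copied from the output above; `PHRASE_TOPIC` is a hypothetical name.

```python
import pandas as pd

# A few rows copied from the PHRASE output above
PHRASE = pd.DataFrame({
    "topic_phrase": ["white pepper", "wild berry", "wood aging"],
    "n_topics": [2, 1, 2],
    "n_words": [66, 29, 111],
    "topic_list": ["04 07", "04", "12 19"],
    "topic_weight_mean": [0.003429, 0.002683, 0.003025],
})

# Index on the column the file actually has
PHRASE = PHRASE.set_index("topic_phrase")

# One row per (phrase, topic): split the space-separated ids, then explode
PHRASE_TOPIC = (
    PHRASE["topic_list"]
    .str.split()
    .explode()
    .astype(int)
    .rename("topic_id")
    .reset_index()
)
```

The number of rows per phrase in `PHRASE_TOPIC` should match its `n_topics` value, which makes a quick consistency check on the reshaping.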


In [11]:
model.TOPICWORD_NARROW

| word_id | topic_id | word_count |
|---|---|---|
| 0 | 5 | 1233 |
| 0 | 3 | 672 |
| 0 | 19 | 407 |
| 0 | 2 | 357 |
| 0 | 17 | 346 |
| ... | ... | ... |
| 39012 | 5 | 1 |
| 39013 | 2 | 1 |
| 39014 | 2 | 1 |
| 39015 | 14 | 1 |
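
`TOPICWORD_NARROW` is a long table of word counts per (`word_id`, `topic_id`) pair. A sketch of two common views of it: the top words per topic, and a wide topic-by-word count matrix. The first three rows of the toy data echo the output above; the rest are invented for illustration. Mapping `word_id` to strings would need a join with `VOCAB`, whose columns are not shown here, so it is left out. `top_words` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-in for model.TOPICWORD_NARROW (index: word_id, topic_id)
TOPICWORD_NARROW = pd.DataFrame({
    "word_id":    [0, 0, 0, 1, 1, 2],
    "topic_id":   [5, 3, 19, 5, 3, 3],
    "word_count": [1233, 672, 407, 90, 800, 15],
}).set_index(["word_id", "topic_id"])

def top_words(narrow, n=10):
    """Return the n highest-count word_ids for each topic, as a long table."""
    s = narrow["word_count"].sort_values(ascending=False)
    return (
        s.groupby(level="topic_id").head(n)   # keeps the first n rows per topic
        .reset_index()
        .sort_values(["topic_id", "word_count"], ascending=[True, False],
                     ignore_index=True)
    )

# Wide topic-by-word matrix of counts (0 where a word never appears in a topic)
TOPICWORD_WIDE = TOPICWORD_NARROW["word_count"].unstack(level="word_id", fill_value=0)
```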


# Next

* Inspect the raw MALLET output files
* Open the database in [DBeaver](https://dbeaver.io/)
* See how Polo is built on top of a database like this one
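
Since every table is also written to the SQLite file, the same data can be queried with SQL, as a client such as DBeaver would do. A sketch using `pandas.read_sql_query`, assuming `TOPICWORD_NARROW` was stored with `topic_id` and `word_count` columns (`to_sql` writes index levels out as ordinary columns). `list_tables` and `topic_totals` are hypothetical helpers.

```python
import sqlite3

import pandas as pd

def list_tables(db):
    """Names of the tables stored in a SQLite database."""
    sql = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return pd.read_sql_query(sql, db)["name"].tolist()

def topic_totals(db):
    """Total word count per topic, computed in SQL from TOPICWORD_NARROW."""
    sql = """
        SELECT topic_id, SUM(word_count) AS n_words
        FROM TOPICWORD_NARROW
        GROUP BY topic_id
        ORDER BY n_words DESC
    """
    return pd.read_sql_query(sql, db, index_col="topic_id")

# Usage (sketch): db = sqlite3.connect(db_file); topic_totals(db).head()
```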

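Note: an attempt to run `!cd mazo-demo/` failed because the directory is `mazo_demo/` (underscore). In any case, `!cd` runs in a separate subshell and cannot change the notebook's working directory; IPython's `%cd` magic does that.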

In [13]:
ls

M08a_01_Gensim.ipynb                  db/
M08a_02_GensimRunCompareLDA.ipynb     mallet/
M08a_03_MALLET.ipynb                  mazo_demo/
M08a_04_Mazo.ipynb                    mimno-Mallet-39d1f92/
_HIDE/                                mimno-Mallet-v202108-44-g39d1f92.zip
config.ini                            output/
corpus/


In [14]:
from mazo_demo.polite import polite

In [15]:
polite.Polite.schema['DOC'].index

['doc_id']

