# LDA Output

>Now that the LDA model has been trained, we can extract the desired output.
>
>Specifically, we will obtain the following:
>* Assigned name for each Genre
>* Word-weighting within each Genre (Top 50 terms)
>* Genre-breakdown per Anime show
>
>These will be written to separate files in the JSON Lines format (one JSON object per line), for later consumption.
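>
>As a minimal sketch of the JSON Lines convention, the snippet below writes two records and reads them back. The sample records and the in-memory buffer are illustrative assumptions, not the notebook's real output.

```python
import io
import json

# Illustrative records only; the real ones are produced later in this notebook
records = [
    {'LDA Genre ID': 0, 'LDA Genre Name': 'Adventure'},
    {'LDA Genre ID': 1, 'LDA Genre Name': 'Sports'},
]

# Write: one JSON object per line (an in-memory buffer stands in for a file)
buffer = io.StringIO()
for record in records:
    buffer.write(json.dumps(record) + '\n')

# Read: parse each non-empty line independently
buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]
```

>Because each line is a complete JSON document, a consumer can process the file as a stream without loading it all at once.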

## Load LDA Model

In [1]:
from gensim.models import LdaModel

lda_model = LdaModel.load('lda_model/lda_model')

## Read Text Input

In [2]:
from gensim.corpora.dictionary import Dictionary
from lda_helpers import read_lda_input  # Package with helpers

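# Read (title, text) pairs; the titles are needed later for the per-anime output
title_texts = read_lda_input('lda_input/lda_input.jl', title=True)
texts = [title_text[1] for title_text in title_texts]
# Rebuild the dictionary and bag-of-words corpus from the same texts.
# This assumes the input is unchanged since training, so token IDs match lda_model.
id2word = Dictionary(texts)
corpus = [id2word.doc2bow(text) for text in texts]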

## Visualize LDA Genres

>Previously, we provided an *eta* matrix to specify prior assumptions about the genres.
>
>However, the genres produced by the LDA model may not be in the same order as initially specified.
>
>Thus we must observe the produced genres, and then re-assign the Genre names in the appropriate order.
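>
>Visual inspection can be cross-checked with a rough overlap count between each topic's top words and the seed words of each genre. The sketch below is only an illustration: `match_topics_to_genres`, the seed word lists, and the top-word lists are all hypothetical and are not taken from *lda_seed.py*.

```python
def match_topics_to_genres(topic_top_words, seed_words):
    """For each topic, suggest the seed genre sharing the most words with it."""
    suggestions = []
    for top_words in topic_top_words:
        overlaps = {
            genre: len(set(top_words) & set(words))
            for genre, words in seed_words.items()
        }
        # Ties resolve to the genre listed first in seed_words
        suggestions.append(max(overlaps, key=overlaps.get))
    return suggestions


# Hypothetical seed words per genre (assumption, for illustration only)
seed_words = {
    'Sports': ['team', 'match', 'tournament'],
    'Sci-Fi': ['space', 'robot', 'planet'],
}

# Hypothetical top words per topic, in the model's topic order
topic_top_words = [
    ['robot', 'space', 'war', 'pilot'],
    ['team', 'school', 'match', 'club'],
]

suggested = match_topics_to_genres(topic_top_words, seed_words)
```

>In this notebook, the top words for topic *i* could be taken from *lda_model.show_topic(i, topn=50)*. The suggestions are only a hint; the visualization remains the reference for the final assignment.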

In [3]:
import warnings

import pyLDAvis
import pyLDAvis.gensim

# Suppress deprecation warnings from pyLDAvis; the filter must be set before prepare() runs
warnings.filterwarnings('ignore', category=DeprecationWarning)

pyLDAvis.enable_notebook()
# sort_topics=False keeps the topics in the model's own order
LDAvis_display = pyLDAvis.gensim.prepare(lda_model, corpus, id2word, sort_topics=False)

LDAvis_display

## Assign Genre Names

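>The names below are listed in the order shown in the *pyLDAvis* visualization, which (with *sort_topics=False*) is also the topic order that *lda_model* holds. The numbered comments follow the visualization's 1-based labels; the Genre IDs written later are 0-based.
>
>Use the same genre names as provided in *lda_seed.py*.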

In [4]:
genre_names = [
    'Adventure',      # 1
    'Sports',         # 2
    'Sci-Fi',         # 3
    'Mystery',        # 4
    'Slice of Life',  # 5
    'School'          # 6
]

# Save LDA Results

In [5]:
import json
from os import makedirs

# exist_ok=True lets this cell be re-run without raising FileExistsError
makedirs('lda_output', exist_ok=True)

## Genre Names

In [6]:
with open('lda_output/genre_names.jl', 'w') as f:
    for i, genre_name in enumerate(genre_names):
        # Write each record as one JSON object per line
        record = {
            'LDA Genre ID': i,
            'LDA Genre Name': genre_name
        }
        line = json.dumps(record)
        f.write('{}\n'.format(line))

## Word Distribution of each Genre (Top 50 Words by Weight)

In [7]:
with open('lda_output/genre_word_weights.jl', 'w') as f:
    for i in range(len(genre_names)):
        genre = lda_model.show_topic(i, topn=50)
        for word, word_weight in genre:
            # Write each record as one JSON object per line
            record = {
                'LDA Genre ID': i,
                'Word': word,
                'Word Weight': float(word_weight)
            }
            line = json.dumps(record)
            f.write('{}\n'.format(line))

## Genre Breakdown of each Anime

In [8]:
with open('lda_output/anime_genre_weights.jl', 'w') as f:
    for i, bow in enumerate(corpus):
        title = title_texts[i][0]
        anime_genres = lda_model.get_document_topics(bow, minimum_probability=0)
        for genre_id, genre_weight in anime_genres:
            # Write each record as one JSON object per line
            record = {
                'Anime Title': title,
                'LDA Genre ID': genre_id,
                'LDA Genre Weight': float(genre_weight)
            }
            line = json.dumps(record)
            f.write('{}\n'.format(line))
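
>For later consumption, the genre names and per-anime weights can be joined on *LDA Genre ID*. The sketch below finds the highest-weighted genre per title; the sample lines are made up for illustration and only mirror the field names written above.

```python
import json

# Made-up sample lines in the same shape as the files written above
genre_name_lines = [
    '{"LDA Genre ID": 0, "LDA Genre Name": "Adventure"}',
    '{"LDA Genre ID": 1, "LDA Genre Name": "Sports"}',
]
anime_weight_lines = [
    '{"Anime Title": "Show A", "LDA Genre ID": 0, "LDA Genre Weight": 0.8}',
    '{"Anime Title": "Show A", "LDA Genre ID": 1, "LDA Genre Weight": 0.2}',
    '{"Anime Title": "Show B", "LDA Genre ID": 0, "LDA Genre Weight": 0.3}',
    '{"Anime Title": "Show B", "LDA Genre ID": 1, "LDA Genre Weight": 0.7}',
]

# Map each genre ID to its assigned name
id_to_name = {
    record['LDA Genre ID']: record['LDA Genre Name']
    for record in map(json.loads, genre_name_lines)
}

# Keep the (genre name, weight) pair with the largest weight for each title
dominant = {}
for record in map(json.loads, anime_weight_lines):
    title = record['Anime Title']
    weight = record['LDA Genre Weight']
    if title not in dominant or weight > dominant[title][1]:
        dominant[title] = (id_to_name[record['LDA Genre ID']], weight)
```

>With the real files, the two lists would be replaced by iterating over the lines of *lda_output/genre_names.jl* and *lda_output/anime_genre_weights.jl*.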