# Topic Exploration

This notebook is an interface to the LDA model. It uses several utility functions written specifically to explore what each topic contains.

Let's start by loading these auxiliary functions and the model:

In [1]:
import os
import sys

# Jupyter notebooks do not handle relative imports well.
# The simplest workaround (not great practice) is to add the
# project's root directory to sys.path.

module_path = os.path.abspath('..')
if module_path not in sys.path:
    sys.path.append(module_path)

In [2]:
from utils.exploration import get_articles_in_topic, get_titles_in_topic
from utils.exploration import get_keywords_in_topic, summary
from utils.exploration import summarize_topic, topic_top_n

And now, the LDA model:

In [3]:
from gensim.models.ldamodel import LdaModel
lda = LdaModel.load("LDA_gensim_90_final.model")

## Getting the top words in a topic

The function `topic_top_n(lda, topic_id, n=10, verbose=False)` takes the LDA model and a topic id, and returns the top `n` words of that topic together with their probabilities:

In [4]:
topic_top_n(lda, 17, n=20)

[('ser', '0.0119'),
 ('acción', '0.0117'),
 ('libertar', '0.00843'),
 ('moral', '0.00798'),
 ('práctico', '0.00769'),
 ('caso', '0.00632'),
 ('voluntad', '0.00498'),
 ('bien', '0.00467'),
 ('idea', '0.00459'),
 ('deber', '0.00445'),
 ('accionar', '0.00438'),
 ('ley', '0.00434'),
 ('razón', '0.00418'),
 ('agente', '0.00408'),
 ('natural', '0.00392'),
 ('teoría', '0.00379'),
 ('tipo', '0.00372'),
 ('concepto', '0.00368'),
 ('punto', '0.00363'),
 ('acto', '0.00357')]

## Getting all articles in a topic

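You can get all the `Article`s in a topic using `get_articles_in_topic(topic_id, min_prob=0.1, n=None)`. Use `min_prob` to set the probability threshold an article must reach to count as belonging to the topic, and set `n` to cap the number of results.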

In [5]:
top_5_articles = get_articles_in_topic(17, min_prob=0.5, n=5)
top_5_articles

[(<utils.Article.Article at 0x7fb3fc527d10>, 0.9507384300231934),
 (<utils.Article.Article at 0x7fb3fe8eac10>, 0.9426233768463135),
 (<utils.Article.Article at 0x7fb3f7392b90>, 0.8738116025924683),
 (<utils.Article.Article at 0x7fb3f74e3f90>, 0.8089929819107056),
 (<utils.Article.Article at 0x7fb3feb56c90>, 0.8071157336235046)]

Notice that each result is an `(article, probability)` pair. An example of how to use it:

In [6]:
for article, _ in top_5_articles:
    print(article.title)

Razón, acción y debilidad de la voluntad. Una lectura semántica*
Contrafácticos y lógica deóntica de la acción
El sentido de la libertad
Sobre el valor de la verdad: una crítica a Richard Rorty
Hobbes y la moral egoísta en el estado de naturaleza
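
The selection that `get_articles_in_topic` performs can be pictured as a filter followed by a descending sort. The sketch below is only an assumption about that logic: `articles_in_topic_sketch`, the `scored_articles` structure, and the sample probabilities are hypothetical, and the real function may use a strict threshold or a different data source.

```python
def articles_in_topic_sketch(scored_articles, topic_id, min_prob=0.1, n=None):
    """Filter and rank articles by their probability for one topic.

    `scored_articles` is a hypothetical list of (article, topic_probs)
    pairs, where topic_probs maps topic ids to probabilities.
    """
    hits = [
        (article, probs[topic_id])
        for article, probs in scored_articles
        if probs.get(topic_id, 0.0) >= min_prob
    ]
    # Most representative articles first.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits if n is None else hits[:n]


# Single letters stand in for Article objects; probabilities are made up.
scored = [
    ("A", {17: 0.95, 26: 0.03}),
    ("B", {17: 0.40}),
    ("C", {26: 0.90}),
    ("D", {17: 0.81, 51: 0.07}),
]
print(articles_in_topic_sketch(scored, 17, min_prob=0.5))
```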


## Getting titles and keywords

For this, we have the functions `get_titles_in_topic(topic_id, min_prob=0.1, n=None)` and `get_keywords_in_topic(topic_id, min_prob=0.1, n=None)`. Some examples:

In [7]:
get_titles_in_topic(17, n=5)

['Razón, acción y debilidad de la voluntad. Una lectura semántica*',
 'Contrafácticos y lógica deóntica de la acción',
 'El sentido de la libertad',
 'Sobre el valor de la verdad: una crítica a Richard Rorty',
 'Hobbes y la moral egoísta en el estado de naturaleza']

If an `Article` has no value for its `keyword` attribute, the default string `"NO KEYWORDS FOUND"` is put into the list instead.

In [8]:
get_keywords_in_topic(17, n=5)

['NO KEYWORDS FOUND',
 'contrafácticos; acción; modelos deónticos; teoría social; Wright; lógica modal temporal',
 'NO KEYWORDS FOUND',
 'NO KEYWORDS FOUND',
 'Hobbes; ley natural; estado natural; estado civil; egoísmo; respuesta al necio; dilema del prisionero']
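
The fallback behaviour described above can be sketched with `getattr` and a default value. This is an illustration under assumptions, not the project's implementation: `keywords_or_default` is a hypothetical name, and `SimpleNamespace` objects stand in for `Article` instances.

```python
from types import SimpleNamespace


def keywords_or_default(article, default="NO KEYWORDS FOUND"):
    """Return the article's keywords, or `default` when none are stored."""
    # getattr covers a missing attribute; `or` covers None and "".
    return getattr(article, "keyword", None) or default


# Hypothetical stand-ins for Article objects.
articles = [
    SimpleNamespace(keyword=None),
    SimpleNamespace(keyword="Hobbes; ley natural"),
    SimpleNamespace(),
]
print([keywords_or_default(a) for a in articles])
```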

## Summarize an article

If you have an `Article` object, you can use `summary(lda, article, probability=None, topics=False)` to get a quick summary of it. Pass `topics=True` to also list the topics assigned to the article.

In [9]:
# Let's pick the last one from the list we've created:
article, prob = top_5_articles[-1]
summary(lda, article, probability=prob, topics=True)

--------------------------------------------------
		 TITLE 		
Hobbes y la moral egoísta en el estado de naturaleza
(with prob. 0.8071)
		 KEYWORDS 		
Hobbes; ley natural; estado natural; estado civil; egoísmo; respuesta al necio; dilema del prisionero

		 ABSTRACT 		
No abstract stored

		 TOPICS 		
Topic 17 (w. probability 0.807)
Topic 26 (w. probability 0.089)
Topic 27 (w. probability 0.015)
Topic 51 (w. probability 0.074)
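
The TOPICS block lists topics by id rather than by weight. If a ranking is more convenient, the pairs can be sorted by probability, as in the sketch below. The values are copied from the output above; `rank_topics` is a hypothetical helper, not part of `utils.exploration`.

```python
# (topic_id, probability) pairs copied from the TOPICS output above.
topics = [(17, 0.807), (26, 0.089), (27, 0.015), (51, 0.074)]


def rank_topics(topic_probs):
    """Return the pairs sorted from most to least probable topic."""
    return sorted(topic_probs, key=lambda pair: pair[1], reverse=True)


ranked = rank_topics(topics)
for topic_id, prob in ranked:
    print(f"Topic {topic_id} (w. probability {prob:.3f})")

# Probability mass covered by the listed topics.
covered = sum(prob for _, prob in topics)
print(f"Covered: {covered:.3f}")
```

The listed probabilities add up to 0.985, so the remaining mass presumably belongs to topics that fall below whatever reporting threshold is in use.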


## Summarizing an entire topic

For this, we have implemented `summarize_topic(lda, topic_id, min_prob=0.1, n=None)`.

In [10]:
summarize_topic(lda, 17, n=5)

--------------------------------------------------
		 TITLE 		
Razón, acción y debilidad de la voluntad. Una lectura semántica*
		 KEYWORDS 		
No keywords stored

		 ABSTRACT 		
No abstract stored

--------------------------------------------------
		 TITLE 		
Contrafácticos y lógica deóntica de la acción
		 KEYWORDS 		
contrafácticos; acción; modelos deónticos; teoría social; Wright; lógica modal temporal

		 ABSTRACT 		
No abstract stored

--------------------------------------------------
		 TITLE 		
El sentido de la libertad
		 KEYWORDS 		
No keywords stored

		 ABSTRACT 		
No abstract stored

--------------------------------------------------
		 TITLE 		
Sobre el valor de la verdad: una crítica a Richard Rorty
		 KEYWORDS 		
No keywords stored

		 ABSTRACT 		
No abstract stored

--------------------------------------------------
		 TITLE 		
Hobbes y la moral egoísta en el estado de naturaleza
		 KEYWORDS 		
Hobbes; ley natural; estado natural; estado civil; egoísmo; respuesta al n

## Using pyLDAvis

In [11]:
from utils.loaders import loadCorpusList

corpusPath = '../data/corpus'
corpusList = loadCorpusList(corpusPath)
corpusList = [a for a in corpusList if a.lang == "es"]

In [12]:
from utils.exploration import prepare_bag_of_words
corpus = [lda.id2word.doc2bow(prepare_bag_of_words(a)) for a in corpusList]
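
For readers unfamiliar with the bag-of-words format: gensim's `doc2bow` turns a list of tokens into sparse `(token_id, count)` pairs and, by default, skips tokens that are not in the dictionary. The pure-Python sketch below mimics that behaviour for illustration only; `doc2bow_sketch`, the `token2id` mapping, and the tokens are made up and much smaller than the model's real vocabulary.

```python
from collections import Counter


def doc2bow_sketch(tokens, token2id):
    """Mimic the (token_id, count) output format of a bag of words.

    Tokens missing from `token2id` are ignored.
    """
    counts = Counter(token2id[t] for t in tokens if t in token2id)
    return sorted(counts.items())


# Hypothetical three-word vocabulary.
token2id = {"ley": 0, "moral": 1, "razón": 2}
tokens = ["moral", "ley", "moral", "desconocido"]
print(doc2bow_sketch(tokens, token2id))
```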

In [13]:
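# Note: recent pyLDAvis releases (3.3 and later) renamed this module to
# pyLDAvis.gensim_models; on those versions, import that module instead
# and call its prepare() function.
import pyLDAvis.gensim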
lda_display = pyLDAvis.gensim.prepare(lda, corpus, lda.id2word, sort_topics=False)
pyLDAvis.display(lda_display)