# Topic Exploration

This notebook is an interface to the LDA model. It relies on a few utility classes written specifically to explore what each topic contains.

Let's start by importing these utilities, loading the corpus, and setting up the LDA model:

In [1]:
from utils.model import Model
from utils.corpus import Corpus

# Build the corpus from the article registry, then set up a 50-topic model.
corpus = Corpus(registry_path='utils/article_registry.json')
model = Model(corpus, num_topics=50)
# model.load()

Loading corpus. Num. of articles: 771


In [2]:
model.train()

[(0, 16), (1, 1), (2, 3), (3, 1), (4, 1), (5, 1), (6, 3), (7, 1), ... (output truncated)

KeyboardInterrupt: 

In [6]:
len(model.get_orphans())

724

The `Model` object exposes its topics as `Topic` objects through `model.topics`. We will use methods of both classes to navigate the trained model.

## Getting the top words in a topic

The `Topic.get_top_words()` method returns the top `n` words and their probabilities:

In [25]:
model.topics[3].get_top_words()

[('merleauponty', '0.011'),
 ('marx', '0.010'),
 ('experiencia', '0.010'),
 ('arte', '0.009'),
 ('él', '0.008'),
 ('cuerpo', '0.007'),
 ('obra', '0.007'),
 ('forma', '0.007'),
 ('conciencia', '0.006'),
 ('capital', '0.005')]
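In the output above, the probabilities are printed as strings rather than floats. If you want to compare or sum them, a small conversion helps. The sketch below copies the printed pairs into a literal so that it runs without the model; the names `word_probs` and `covered_mass` are illustrative and not part of the `utils` package.

```python
# Pairs copied from the output above; with a trained model this would be
# top_words = model.topics[3].get_top_words()
top_words = [
    ('merleauponty', '0.011'),
    ('marx', '0.010'),
    ('experiencia', '0.010'),
    ('arte', '0.009'),
    ('él', '0.008'),
    ('cuerpo', '0.007'),
    ('obra', '0.007'),
    ('forma', '0.007'),
    ('conciencia', '0.006'),
    ('capital', '0.005'),
]

# Convert the string probabilities to floats so they can be compared or summed.
word_probs = {word: float(prob) for word, prob in top_words}

# Share of the topic's probability mass covered by these ten words.
covered_mass = sum(word_probs.values())
print(f"{len(word_probs)} words cover {covered_mass:.3f} of the topic")
```

With these rounded values, the ten words account for roughly 0.080 of the topic's probability mass, so the rest is spread over the remaining vocabulary.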

## Getting all articles in a topic

You can get the article IDs and probabilities for a topic with the `Topic.get_top_articles()` method. Pass `n` to cap the number of results; the default is `n=5`.

In [26]:
model.topics[3].get_top_articles()

[('66898', 0.8983381390571594),
 ('48173', 0.8065474629402161),
 ('57601', 0.39241376519203186),
 ('8831', 0.009229096584022045)]
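The last article in the output above has a probability below 0.01, so it barely belongs to the topic. A minimal sketch of filtering such weak matches follows; the helper `filter_by_probability` and its `min_prob` cutoff of 0.1 are assumptions made for illustration, not part of the `Topic` class.

```python
# Pairs copied from the output above; with a trained model this would be
# top_articles = model.topics[3].get_top_articles()
top_articles = [
    ('66898', 0.8983381390571594),
    ('48173', 0.8065474629402161),
    ('57601', 0.39241376519203186),
    ('8831', 0.009229096584022045),
]


def filter_by_probability(articles, min_prob=0.1):
    """Keep only the (article_id, probability) pairs at or above min_prob."""
    return [(article_id, prob) for article_id, prob in articles if prob >= min_prob]


# Hypothetical cutoff: drop articles that barely touch the topic.
strong_articles = filter_by_probability(top_articles)
for article_id, prob in strong_articles:
    print(f"{article_id}: {prob:.2f}")
```

The right cutoff depends on the corpus and the number of topics, so treat 0.1 as a starting point rather than a recommendation.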

We can also get the titles of the top articles in a topic with `Topic.get_top_titles()`.

In [27]:
model.topics[5].get_top_titles()

['Leonardo Ivarola (2019/09/01). Consecuencias alternativas y asimetría de resultados en la implementación de políticas socioeconómicas',
 'José Manuel Chillón (2019/01/01). Heidegger y la prudencia aristotélica como protofenomenología',
 'Matías Abeijón (2019/01/01). Historia, estructura y experiencia. Relaciones metodológicas entre Michel Foucault y Georges Dumézil',
 'Jorge Aurelio Díaz (2021/12/15). Impacto de las políticas gubernamentales en Ideas y Valores',
 'José H. Silveira De Brito (2012/01/01). HUMANIZAÇÃO DA SAÚDE: DA INTENÇÃO À INTELIGÊNCIA EMOTIVA PELAS IDEIAS']

## Summarize an article

If you have an `Article` object, you can pass it to the `Model.get_topics_in_article(Article)` method to get a summary of the topics the article most likely belongs to.

In [22]:
model.get_topics_in_article(model.articles[3])
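The cell above shows no output, so the exact return format of `Model.get_topics_in_article()` is not documented here. One way to think about it is as the reverse lookup of `Topic.get_top_articles()`: instead of listing the articles of a topic, we list the topics of an article. The sketch below illustrates that idea with toy data; the article IDs, probabilities, and the helper `invert_topic_index` are all made up and do not come from the model.

```python
from collections import defaultdict

# Toy data, NOT model output: topic id -> list of (article_id, probability).
# The ids and probabilities are made up for illustration.
articles_by_topic = {
    0: [('a1', 0.70), ('a2', 0.15)],
    1: [('a2', 0.80), ('a3', 0.55)],
    2: [('a1', 0.25), ('a3', 0.40)],
}


def invert_topic_index(articles_by_topic):
    """Build article_id -> [(topic_id, probability)], sorted by probability."""
    topics_by_article = defaultdict(list)
    for topic_id, articles in articles_by_topic.items():
        for article_id, prob in articles:
            topics_by_article[article_id].append((topic_id, prob))
    # Most likely topic first for every article.
    for topic_list in topics_by_article.values():
        topic_list.sort(key=lambda pair: pair[1], reverse=True)
    return dict(topics_by_article)


topics_by_article = invert_topic_index(articles_by_topic)
print(topics_by_article['a1'])
```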

## Summarizing an entire topic

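For this, we have implemented `Topic.summary()`. It returns a Markdown string with the topic's top words and top articles, which we render here with `IPython.display.Markdown`.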

In [6]:
from IPython.display import Markdown as md


md(model.topics[0].summary())

# Topic 0

## Top words:
|    | Word       |   Probability |
|---:|:-----------|--------------:|
|  0 | smith      |         0.01  |
|  1 | él         |         0.01  |
|  2 | moral      |         0.008 |
|  3 | naturaleza |         0.006 |
|  4 | objeto     |         0.006 |
|  5 | humano     |         0.005 |
|  6 | razón      |         0.005 |
|  7 | idea       |         0.005 |
|  8 | kant       |         0.005 |
|  9 | hombre     |         0.004 |

## Top articles:

* Rosa Colmenarejo (2016/01/01). Enfoque de capacidades y sostenibilidad. Aportaciones de Amartya Sen y Martha Nussbaum
* José de la Cruz Garrido (2015/09/01). El papel de la imaginación en la refutación de Adam Smith a la tesis del homo economicus
* Nicolás Novoa Artigas (2016/01/01). La problemática posición de Adam Smith acerca de la suerte moral
* Martín Fleitas González (2015/09/01). ¿Solo hay realismo o constructivismo moral dentro del neokantismo contemporáneo? Notas para una fundamentación moral kantiana con base en la idea de libertad
* Emilse Galvis (2016/01/01). La subjetivación política más allá de la esfera pública: Michel Foucault, Jacques Rancière y Simone Weil
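The summary above combines a heading, a table of top words, and a list of article titles. How `Topic.summary()` builds this string is not shown in the notebook, so the function below is only a sketch of one way to assemble a similar Markdown document from plain Python data. The name `topic_summary_markdown` and its parameters are hypothetical, and the table omits the index column seen in the real output.

```python
def topic_summary_markdown(topic_id, top_words, top_titles):
    """Return a Markdown summary: heading, word table, and article list."""
    lines = [
        f"# Topic {topic_id}",
        "",
        "## Top words:",
        "",
        "| Word | Probability |",
        "|:-----|------------:|",
    ]
    # One table row per (word, probability) pair.
    lines += [f"| {word} | {prob} |" for word, prob in top_words]
    lines += ["", "## Top articles:", ""]
    # One bullet per article title.
    lines += [f"* {title}" for title in top_titles]
    return "\n".join(lines)


# First rows of the output above, copied by hand.
words = [("smith", "0.01"), ("él", "0.01"), ("moral", "0.008")]
titles = [
    "José de la Cruz Garrido (2015/09/01). El papel de la imaginación en la "
    "refutación de Adam Smith a la tesis del homo economicus",
]
summary = topic_summary_markdown(0, words, titles)
print(summary)
```

In a notebook, the resulting string could be passed to `IPython.display.Markdown` in the same way as the output of `Topic.summary()`.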


We can export the summaries of all topics in the model to PDF with the `Model.export_summary()` method. The call below writes a `summary.pdf` file.

In [7]:
model.export_summary("summary.pdf")