## Text Analysis - Topic Modeling
### <span style='color: green'>SETUP </span> Prepare Notebook and Load Model <span style='float: right; color: red'>MANDATORY</span>

In [1]:

from typing import Callable

import __paths__  # pylint: disable=unused-import
import bokeh.plotting
import penelope.notebook.topic_modelling as ntm
from IPython.display import display
from penelope.utility import pandas_utils

from notebooks.source.courier import overload_state_on_loaded_handler

bokeh.plotting.output_notebook(hide_banner=True)
pandas_utils.set_default_options()

__paths__.data_folder = "/data/inidun"
__paths__.resources_folder = f"{__paths__.data_folder}/resources"

corpus_folder: str = __paths__.data_folder

current_state: Callable[[], ntm.TopicModelContainer] = ntm.TopicModelContainer.singleton
current_state().register(None, callback=overload_state_on_loaded_handler)

### <span style='color: green'>PREPARE</span> Load Topic Model <span style='float: right; color: red'>MANDATORY</span>

In [2]:
load_gui: ntm.LoadGUI = ntm.LoadGUI(data_folder=corpus_folder, state=current_state()).setup()
display(load_gui.layout())

VBox(children=(HBox(children=(Dropdown(description='Model', layout=Layout(width='40%'), options=('tm_courier_v…

### <span style='color: green;'>BROWSE</span> Find topics by token<span style='color: red; float: right'>TRY IT</span>

Displays the topics in which a given token is among the top-ranked (dominant) words.

In [3]:
fd_ui = ntm.WithPivotKeysText.FindTopicDocumentsGUI(
    current_state(), vertical=True, year_span=(1990, 1992), width='160px'
).setup()
display(fd_ui.layout())

VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Threshold</b> 0.20'), FloatSlider(value=0.2, conti…
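
The lookup described for this cell can be sketched with plain pandas. This is only an illustration of the idea, not the code behind `FindTopicDocumentsGUI`: the table layout (`topic_id`, `token`, `weight`), the function name `find_topics_by_token` and the toy data are all assumptions made for the example.

```python
from typing import List

import pandas as pd


def find_topics_by_token(topic_token_weights: pd.DataFrame, token: str, n_top: int = 3) -> List[int]:
    """Return ids of topics in which `token` is among the `n_top` highest-weighted words.

    Hypothetical helper: assumes a long-format table with columns
    `topic_id`, `token` and `weight` (not necessarily penelope's layout).
    """
    # Rank tokens within each topic; rank 1 is the highest weight.
    ranks = topic_token_weights.groupby("topic_id")["weight"].rank(method="first", ascending=False)
    top_words = topic_token_weights[ranks <= n_top]
    matches = top_words.loc[top_words["token"] == token, "topic_id"]
    return sorted(matches.unique().tolist())


# Toy data, invented for illustration only.
toy_weights = pd.DataFrame(
    {
        "topic_id": [0, 0, 0, 1, 1, 1],
        "token": ["peace", "war", "unesco", "peace", "education", "school"],
        "weight": [0.05, 0.30, 0.20, 0.40, 0.35, 0.10],
    }
)

top2_hits = find_topics_by_token(toy_weights, "peace", n_top=2)
top3_hits = find_topics_by_token(toy_weights, "peace", n_top=3)
print(top2_hits, top3_hits)
```

With a top list of two words, "peace" only qualifies for topic 1; widening the list to three words also brings in topic 0.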

### <span style='color: green;'>BROWSE</span> Browse Topic Documents<span style='color: red; float: right'>TRY IT</span>

Displays documents in which a topic occurs above a given threshold.

In [4]:
td_ui = ntm.WithPivotKeysText.BrowseTopicDocumentsGUI(
    current_state(), vertical=True, year_span=(1990, 1995), width='400px'
).setup()
display(td_ui.layout())

VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Threshold</b> 0.20'), FloatSlider(value=0.2, conti…
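
As a rough sketch of the filtering this browser performs, the snippet below selects documents whose weight for one topic reaches a threshold, optionally restricted to a year span. The column names (`document_id`, `topic_id`, `weight`, `year`), the helper `browse_topic_documents` and the toy data are assumptions for illustration; whether the GUI treats the threshold as inclusive is not stated in this notebook, so the sketch simply uses `>=`.

```python
from typing import Optional, Tuple

import pandas as pd


def browse_topic_documents(
    document_topic_weights: pd.DataFrame,
    topic_id: int,
    threshold: float = 0.20,
    year_span: Optional[Tuple[int, int]] = None,
) -> pd.DataFrame:
    """Return rows for `topic_id` with weight >= `threshold`, strongest first.

    Hypothetical helper: assumes columns `document_id`, `topic_id`,
    `weight` and `year` (an assumed layout, not penelope's API).
    """
    mask = (document_topic_weights["topic_id"] == topic_id) & (document_topic_weights["weight"] >= threshold)
    if year_span is not None:
        low, high = year_span
        # Both ends of the year span are inclusive in this sketch.
        mask &= document_topic_weights["year"].between(low, high)
    return document_topic_weights[mask].sort_values("weight", ascending=False).reset_index(drop=True)


# Toy data, invented for illustration only.
toy_dtw = pd.DataFrame(
    {
        "document_id": [1, 1, 2, 3, 4],
        "topic_id": [0, 1, 0, 0, 0],
        "weight": [0.50, 0.30, 0.10, 0.25, 0.90],
        "year": [1990, 1990, 1991, 1995, 1999],
    }
)

hits = browse_topic_documents(toy_dtw, topic_id=0, threshold=0.20, year_span=(1990, 1995))
print(hits)
```

Document 2 falls below the threshold and document 4 lies outside the year span, so only documents 1 and 3 remain.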

### <span style='color: green;'>VISUALIZE</span> Display Topic's Word Distribution as a Wordcloud<span style='color: red; float: right'>TRY IT</span>

In [3]:
ntm.display_topic_wordcloud_gui(current_state())

VBox(children=(HTML(value=''), HBox(children=(HBox(children=(Button(button_style='success', description='◀', l…

### <span style='color: green;'>VISUALIZE</span> Topic-Word Distribution<span style='color: red; float: right'>TRY IT</span>


In [4]:
ntm.display_topic_word_distribution_gui(current_state())

VBox(children=(HBox(children=(HBox(children=(Button(button_style='success', description='◀', layout=Layout(wid…

### <span style='color: green;'>VISUALIZE</span> Topic Trends over Time<span style='color: red; float: right'>RUN</span>

In [3]:
ntm.display_topic_trends_gui(current_state())

VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Years</b> 1947-1957'), IntRangeSlider(value=(1947,…

### <span style='color: green;'>VISUALIZE</span> Topic Trends Overview<span style='color: red; float: right'>TRY IT</span>

- Topic shares are displayed as a scattered heatmap, with a color gradient based on each topic's weight in the document.
- [Stanford’s Termite software](http://vis.stanford.edu/papers/termite) uses a similar visualization.

In [4]:
ntm.display_topic_trends_overview_gui(current_state())

AttributeError: 'super' object has no attribute 'observe'

### <span style='color: green;'>VISUALIZE</span> Topic-Topic Network<span style='color: red; float: right'>TRY IT</span>

Computes a weighted graph of topics that co-occur in the same document. Two topics are considered to co-occur in a document if both have a weight above the given threshold. Each edge weight is the number of documents in which the pair co-occurs (each document counts as a binary yes or no). Node size reflects each topic's proportion of the entire corpus, computed in accordance with LDAvis topic proportions.

In [5]:
ntm.display_topic_topic_network_gui(current_state())

VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Years</b> 1947-1957'), IntRangeSlider(value=(1947,…

<penelope.notebook.topic_modelling.topic_topic_network_gui.TopicTopicGUI at 0x7f706cf1bc90>
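
The co-occurrence rule described for this network can be sketched as follows. This is a minimal illustration under stated assumptions, not penelope's implementation: the table layout (`document_id`, `topic_id`, `weight`), the helper name `topic_cooccurrence_edges` and the toy data are hypothetical, and the node sizing based on LDAvis topic proportions is left out.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def topic_cooccurrence_edges(document_topic_weights: pd.DataFrame, threshold: float = 0.20) -> pd.DataFrame:
    """Count, for each topic pair, the documents in which both exceed `threshold`.

    Hypothetical helper: assumes columns `document_id`, `topic_id`
    and `weight` (an assumed layout, not penelope's API).
    """
    # Keep only topic assignments that are strictly above the threshold.
    strong = document_topic_weights[document_topic_weights["weight"] > threshold]
    counts = Counter()
    for _, topics in strong.groupby("document_id")["topic_id"]:
        # Each document contributes at most one count per topic pair.
        for pair in combinations(sorted(set(topics)), 2):
            counts[pair] += 1
    rows = [(source, target, n_docs) for (source, target), n_docs in sorted(counts.items())]
    return pd.DataFrame(rows, columns=["source", "target", "n_docs"])


# Toy data, invented for illustration only.
toy_dtw = pd.DataFrame(
    {
        "document_id": [1, 1, 1, 2, 2, 3, 3],
        "topic_id": [0, 1, 2, 0, 1, 1, 2],
        "weight": [0.50, 0.30, 0.05, 0.40, 0.40, 0.30, 0.60],
    }
)

edges = topic_cooccurrence_edges(toy_dtw, threshold=0.20)
print(edges)
```

Topics 0 and 1 are both above the threshold in documents 1 and 2, giving that edge a weight of 2; topics 1 and 2 co-occur only in document 3. Raising the threshold prunes edges, which is the effect the threshold slider has on the graph.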

### <span style='color: green;'>VISUALIZE</span> Document Topic Network<span style='color: red; float: right'>TRY IT</span>


In [6]:
dtdn_ui: ntm.TopicDocumentNetworkGui = ntm.DefaultTopicDocumentNetworkGui(
    state=current_state(), pivot_key_specs=None
).setup()
display(dtdn_ui.layout())

VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Years</b> 1947-1957'), IntRangeSlider(value=(1947,…

### <span style='color: green;'>VISUALIZE</span> Pivot-Topic Network<span style='color: red; float: right'>TRY IT</span>


In [7]:
ptn_ui: ntm.PivotTopicNetworkGUI = ntm.PivotTopicNetworkGUI(pivot_key_specs=None, state=current_state()).setup()
display(ptn_ui.layout())

VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Pivot Key</b>'), Dropdown(layout=Layout(width='140…

### <span style='color: green;'>VISUALIZE</span> Focus-Topic Document Network<span style='color: red; float: right'>TRY IT</span>


In [8]:
ftdn_ui: ntm.TopicDocumentNetworkGui = ntm.FocusTopicDocumentNetworkGui(
    state=current_state(), pivot_key_specs=None
).setup()
display(ftdn_ui.layout())

VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Years</b> 1947-1957'), IntRangeSlider(value=(1947,…

### <span style='color: green;'>VISUALIZE</span> Topic-Token Network<span style='color: red; float: right'>TRY IT</span>

In [9]:
custom_styles = {'edges': {'curve-style': 'haystack'}}
w = ntm.create_topics_token_network_gui(data_folder=corpus_folder, custom_styles=custom_styles)
display(w.layout())

VBox(children=(HBox(children=(VBox(children=(HTML(value='<b>Model</b>'), Dropdown(layout=Layout(width='200px')…