## The Culture of International Relations - Text Analysis
### <span style='color: green'>SETUP </span> Prepare and Setup Notebook <span style='float: right; color: red'>MANDATORY</span>

```python
# Setup
%load_ext autoreload
%autoreload 2

import sys, os, collections, zipfile
import re
import nltk, textacy, spacy
import pandas as pd
import ipywidgets as widgets

# Prepend the project folders to the module search path (only those not already on it)
sys.path = list(set(['..', '../3_text_analysis']) - set(sys.path)) + sys.path

import matplotlib.pyplot as plt
import common.utility as utility
import common.widgets_utility as widgets_utility
import common.widgets_config as widgets_config
import common.config as config
import common.treaty_utility as treaty_utility
import common.treaty_state as treaty_repository
import treaty_corpus
import textacy.keyterms
import textacy_corpus_utility as textacy_utility

from beakerx.object import beakerx
from beakerx import *
from IPython.display import display, set_matplotlib_formats

logger = utility.getLogger('corpus_text_analysis')

utility.setup_default_pd_display(pd)

treaty_repository.load_wti_index_with_gui(data_folder=config.DATA_FOLDER)

%matplotlib inline

# Accessors for the currently loaded corpus (populated by the PREPARE step below)
current_corpus_container = lambda: textacy_utility.CorpusContainer.container()
current_corpus = lambda: textacy_utility.CorpusContainer.corpus()
```


## <span style='color: green'>PREPARE </span> Load and Prepare Corpus <span style='float: right; color: red'>MANDATORY</span>


```python
import textacy_corpus_utility as textacy_utility
import textacy_corpus_gui

try:
    container = current_corpus_container()
    textacy_corpus_gui.display_corpus_load_gui(config.DATA_FOLDER, treaty_repository.current_wti_index(), container)
except Exception as ex:
    logger.error(ex)
```


### <span style='color: green;'>DESCRIBE</span> Most Discriminating Terms <span style='color: blue; float: right'>OPTIONAL</span>
References
King, Gary, Patrick Lam, and Margaret Roberts. “Computer-Assisted Keyword and Document Set Discovery from Unstructured Text.” (2014). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458.1445&rep=rep1&type=pdf.

Displays the *most discriminating words* between two sets of treaties. Each treaty group can be filtered by country and period (signed year). In this way, the same group of countries can be studied for different time periods, or different groups of countries can be studied for the same time period. If "Closed region" is checked, then **both** parties of a treaty must belong to the set of countries selected for that group. In this way, one can, for instance, compare treaties signed between countries within the WTI group "Communists" against treaties signed within "Western Europe".
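The group selection described above can be sketched in plain Python. This is a minimal illustration, not the notebook's own implementation: the `Treaty` record, its field names and `select_group` are hypothetical, and the sketch assumes that when "Closed region" is unchecked a treaty qualifies if at least one of its parties is in the selected set.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List


@dataclass(frozen=True)
class Treaty:
    # Hypothetical minimal record; the real WTI index has its own column names.
    treaty_id: str
    party1: str
    party2: str
    signed_year: int


def select_group(
    treaties: Iterable[Treaty],
    countries: AbstractSet[str],
    start_year: int,
    end_year: int,
    closed_region: bool = False,
) -> List[Treaty]:
    """Hypothetical helper: treaties signed in [start_year, end_year] involving `countries`."""
    selected = []
    for treaty in treaties:
        # Period filter on the signed year (inclusive bounds).
        if not start_year <= treaty.signed_year <= end_year:
            continue
        in_first = treaty.party1 in countries
        in_second = treaty.party2 in countries
        # Closed region: both parties must be in the set; otherwise one party suffices (assumption).
        keep = (in_first and in_second) if closed_region else (in_first or in_second)
        if keep:
            selected.append(treaty)
    return selected
```

Calling `select_group` twice with the same `countries` but different year ranges gives the "same countries, different periods" comparison; calling it with two country sets and the same years gives the other.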

<b>#terms</b> The number of most discriminating terms to return for each group.<br>
<b>#top</b> Only terms ranked among the #top most frequent terms overall are considered.<br>
<b>Closed region</b> If checked, then <u>both</u> treaty parties must be within the selected region.
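As a rough illustration of how #terms and #top interact, the sketch below restricts the vocabulary to the #top most frequent terms and then ranks each term by a smoothed log ratio of the share of documents containing it in each group. This is a simplified stand-in, not the estimator of King, Lam and Roberts and not the code behind the GUI; `most_discriminating_terms_sketch`, the add-one smoothing (`alpha`) and the alphabetical tie-breaking are assumptions. (The setup cell imports `textacy.keyterms`, which in older textacy releases offers a `most_discriminating_terms` function citing the same paper; whether the GUI relies on it is not shown here.)

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def most_discriminating_terms_sketch(
    group1: Sequence[Sequence[str]],
    group2: Sequence[Sequence[str]],
    n_terms: int = 25,
    top_n: int = 1000,
    alpha: float = 1.0,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (terms favouring group1, terms favouring group2)."""
    # Document frequency: a term counts at most once per document.
    df1 = Counter(term for doc in group1 for term in set(doc))
    df2 = Counter(term for doc in group2 for term in set(doc))

    # "#top": keep only the top_n most frequent terms over both groups
    # (ties broken alphabetically so the result is deterministic).
    total = df1 + df2
    vocabulary = sorted(total, key=lambda t: (-total[t], t))[:top_n]

    n1, n2 = len(group1), len(group2)

    def log_ratio(term: str) -> float:
        # Smoothed share of documents containing the term in each group;
        # a positive value means the term is relatively more common in group1.
        p1 = (df1[term] + alpha) / (n1 + 2 * alpha)
        p2 = (df2[term] + alpha) / (n2 + 2 * alpha)
        return math.log(p1 / p2)

    scores = {term: log_ratio(term) for term in vocabulary}

    # "#terms": the n_terms strongest terms in each direction.
    favours1 = sorted((t for t in vocabulary if scores[t] > 0), key=lambda t: (-scores[t], t))
    favours2 = sorted((t for t in vocabulary if scores[t] < 0), key=lambda t: (scores[t], t))
    return favours1[:n_terms], favours2[:n_terms]


group1 = [["trade", "tariff", "peace"], ["trade", "tariff"], ["tariff", "customs"]]
group2 = [["peace", "culture"], ["culture", "exchange"], ["peace", "culture", "trade"]]
print(most_discriminating_terms_sketch(group1, group2, n_terms=2))
# (['tariff', 'customs'], ['culture', 'exchange'])
```

Lowering `top_n` to 4 drops the rare terms "customs" and "exchange" before ranking, so the same call then returns `(['tariff', 'trade'], ['culture', 'peace'])`. This mirrors why a small #top favours frequent, and therefore more robust, discriminating terms.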

```python
import most_discriminating_terms_gui

try:
    most_discriminating_terms_gui.display_gui(treaty_repository.current_wti_index(), current_corpus())
except Exception as ex:
    logger.error(ex)
```
