# Topic Classification and EDA

Here we explore the outputs of our topic model.

We:

* Identify the arXiv categories that topics are related to
* Explore how those topics evolve over time
* Provide functions to chart the evolution of ad hoc topic groupings
* Output a topic-category mapping and a table of example topics by category

## Preamble

In [None]:
%run ../notebook_preamble.ipy

from itertools import combinations, product, chain
from narrowing_ai_research.utils.altair_utils import *
from narrowing_ai_research.paper.s2_topic_classification_eda import *
import random
import altair as alt

%config Completer.use_jedi = False

In [None]:
# Run this if you want to save charts
# driv = altair_visualisation_setup()

## Load data

In [None]:
papers, topic_mix, topic_long, cats, cat_sets, one_cat_ids, arxiv_cat_lookup = read_process_data()

## Analysis

### Calculate topic salience in different categories

In [None]:
topic_rca = make_topic_cat_specialisation(topic_mix,cat_sets,one_cat_ids)
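
The name `topic_rca` suggests a revealed comparative advantage (location quotient) style measure, but the body of `make_topic_cat_specialisation` is not shown in this notebook. The following is only a minimal sketch under that assumption. The helper `location_quotient` and the toy table are hypothetical and are not part of `narrowing_ai_research`.

```python
import pandas as pd


def location_quotient(weights: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: location quotient of a category x topic table.

    LQ[c, t] = (share of topic t within category c) / (share of topic t overall).
    Values above 1 mean the topic is over-represented in that category.
    """
    share_in_cat = weights.div(weights.sum(axis=1), axis=0)
    share_overall = weights.sum(axis=0) / weights.values.sum()
    return share_in_cat.div(share_overall, axis=1)


# Toy data invented for illustration: summed topic weights per category
toy_weights = pd.DataFrame(
    {"topic_a": [8.0, 2.0], "topic_b": [2.0, 8.0]},
    index=["cs.AI", "cs.CV"],
)
toy_lq = location_quotient(toy_weights)
```

In this toy table `topic_a` makes up 80% of `cs.AI` but 50% of the whole corpus, so its quotient in `cs.AI` is above 1, while in `cs.CV` it is below 1.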

In [None]:
topic_spec_chart = make_chart_topic_spec(topic_rca,topic_mix,arxiv_cat_lookup)

In [None]:
# Visualise
topic_spec_chart

In [None]:
# Map topics vs categories
topic_category_map = extract_topics(topic_rca,1.2)
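
The second argument to `extract_topics` looks like a cut-off on the specialisation score, although its exact meaning is not documented here. Assuming that reading, and assuming a category x topic table, the mapping step could be sketched as follows. `topics_above_threshold` and the toy scores are hypothetical.

```python
import pandas as pd


def topics_above_threshold(scores: pd.DataFrame, threshold: float) -> dict:
    """Hypothetical helper: map each category (row) to the topics (columns)
    whose specialisation score is strictly above the threshold."""
    return {
        cat: row[row > threshold].index.tolist()
        for cat, row in scores.iterrows()
    }


# Toy specialisation scores invented for illustration
toy_scores = pd.DataFrame(
    {"topic_a": [1.6, 0.4], "topic_b": [0.4, 1.6], "topic_c": [1.1, 1.3]},
    index=["cs.AI", "cs.CV"],
)
toy_map = topics_above_threshold(toy_scores, 1.2)
```

With a cut-off of 1.2, `topic_c` is assigned to `cs.CV` only, because its score in `cs.AI` (1.1) falls short. A higher cut-off yields fewer but more distinctive topics per category.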

### Visualise topic trends (by associated category)

In [None]:
topic_trends = extract_topic_trends(topic_long,topic_category_map,topic_value=0.1,window=15)
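
The parameters `topic_value` and `window` are not explained in this notebook. One plausible reading is that a paper counts as containing a topic when its weight exceeds `topic_value`, and that the resulting shares are smoothed with a rolling mean of length `window`. The sketch below illustrates that reading only. The function `rolling_topic_share`, the column names and the toy data are assumptions.

```python
import pandas as pd


def rolling_topic_share(long: pd.DataFrame, topic_value: float, window: int) -> pd.DataFrame:
    """Hypothetical helper.

    `long` has one row per (paper, topic) with columns 'date', 'topic', 'weight'.
    Returns, per date and topic, the rolling mean of the share of papers
    whose topic weight exceeds `topic_value`.
    """
    present = long.assign(present=long["weight"] > topic_value)
    share = present.groupby(["date", "topic"])["present"].mean().unstack("topic")
    return share.rolling(window, min_periods=1).mean()


# Toy data invented for illustration: two papers per year, one topic
toy_long = pd.DataFrame(
    {
        "date": [2019, 2019, 2020, 2020, 2021, 2021],
        "topic": ["t"] * 6,
        "weight": [0.2, 0.0, 0.2, 0.3, 0.0, 0.05],
    }
)
toy_trend = rolling_topic_share(toy_long, topic_value=0.1, window=2)
```

Here the raw yearly shares are 0.5, 1.0 and 0.0, and the two-period rolling mean flattens them. A longer window trades responsiveness for a smoother line.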

In [None]:
make_chart_topic_trends(topic_trends,arxiv_cat_lookup,save=False,year_sort=2006)

### Visualise microtrends

In [None]:
# Privacy topics
p = ['privacy_private_differential_privacy_differentially_private_privacy_preserving',
     'federated_learning_server_clients_client_federated']

# Ethics topics
e = ['law_society_legal_ethical_stakeholders', 'bias_biases_biased_exposure_gender']

# Explainability topics
i = ['explanations_explanation_trust_explaining_explainable',
     'interpretation_explain_interpret_interpreted_explained']

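# Build one `<name>_share` column per theme; `topics` is the list of topic
# names for that theme (distinct from the `cats` object loaded above)
ex_trends = pd.concat(
    [
        micro_trends(papers, topic_mix, topics, threshold=0.05, name=n)[f'{n}_share']
        for topics, n in zip([p, e, i], ['privacy', 'ethics', 'explainability'])
    ],
    axis=1,
)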
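
The `_share` suffix suggests that `micro_trends` returns, per period, the share of papers in which at least one of the listed topics exceeds `threshold`. Its implementation is not shown here, so this is a sketch of that interpretation. `yearly_share_with_any_topic` and the toy data are hypothetical.

```python
import pandas as pd


def yearly_share_with_any_topic(
    paper_years: pd.Series,
    topic_weights: pd.DataFrame,
    topics: list,
    threshold: float,
) -> pd.Series:
    """Hypothetical helper: share of papers per year in which at least one
    of `topics` has a weight above `threshold`.

    `paper_years` and `topic_weights` are both indexed by paper id.
    """
    has_topic = (topic_weights[topics] > threshold).any(axis=1)
    return has_topic.groupby(paper_years).mean()


# Toy data invented for illustration
toy_years = pd.Series([2019, 2019, 2020, 2020], index=["a", "b", "c", "d"])
toy_mix = pd.DataFrame(
    {"privacy": [0.1, 0.0, 0.0, 0.0], "federated": [0.0, 0.0, 0.2, 0.06]},
    index=["a", "b", "c", "d"],
)
toy_share = yearly_share_with_any_topic(
    toy_years, toy_mix, ["privacy", "federated"], threshold=0.05
)
```

Grouping several topics under one name in this way means a paper is counted once even when it scores above the threshold on more than one of them.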
    

In [None]:
plot_microtrends(ex_trends)