## TF-IDF results visualization

This is a visualization I made using results from
[Melanie Walsh's](https://melaniewalsh.org/)
excellent e-book,
[Introduction to Cultural Analytics](https://melaniewalsh.github.io/Intro-Cultural-Analytics/welcome.html).

The data comes from her analysis of US presidential inaugural addresses through history, in her chapter on
[TFIDF Text Analytics using Scikit Learn](https://melaniewalsh.github.io/Intro-Cultural-Analytics/05-Text-Analysis/03-TF-IDF-Scikit-Learn.html). I have added a small amount of randomness to the TF-IDF scores so that terms with tied scores end up with distinct ranks.
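A minimal sketch of how that kind of tie-breaking might be done, not necessarily how the data file here was produced. The helper `add_jitter`, the `scale` value, and the sample rows are all hypothetical; only the field names `document`, `term`, and `tfidf` come from the chart code in this notebook.

```python
import random

def add_jitter(rows, scale=1e-6, seed=0):
    """Return copies of rows with a tiny random offset added to each tfidf score.

    `scale` is an assumed value: it should be much smaller than the gaps
    between genuinely different scores, so only ties are affected.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    jittered = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        new_row["tfidf"] = row["tfidf"] + rng.uniform(0, scale)
        jittered.append(new_row)
    return jittered

# Made-up sample rows: the first two share the same score.
rows = [
    {"document": "doc_a", "term": "union", "tfidf": 0.20},
    {"document": "doc_a", "term": "republic", "tfidf": 0.20},
    {"document": "doc_a", "term": "nation", "tfidf": 0.30},
]

jittered = add_jitter(rows)
for row in jittered:
    print(row["term"], row["tfidf"])
```

The offsets are small enough that the ordering of untied terms is unchanged, while tied terms land in an arbitrary but fixed order.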

In [1]:
import altair as alt

#### We need "files" as the beginning of the URL for JupyterLab...

Any term in `term_list` is highlighted in the visualization with a red dot.

In [25]:
URL = '/files/data/top_tfidf_plusRand.json'

term_list = ['nation','national','republic','union']

In [26]:
base = alt.Chart(URL).encode(
    x = 'rank:O',
    y = 'document:N'
).transform_window(
    rank = "rank()",
    sort = [alt.SortField("tfidf", order="descending")],
    groupby = ["document"],
)

# heatmap specification
heatmap = base.mark_rect().encode(
    color = 'tfidf:Q'
)

# red circle over any term that appears in term_list
circle = base.mark_circle(size=100).encode(
    color = alt.condition(
        alt.FieldOneOfPredicate(field='term', oneOf=term_list),
        alt.value('red'),
        alt.value('#FFFFFF00')  # fully transparent: no visible dot for other terms
    )
)

# text labels, white for darker heatmap colors
text = base.mark_text(baseline='middle').encode(
    text = 'term:N',
    color = alt.condition(alt.datum.tfidf >= 0.23, alt.value('white'), alt.value('black'))
)

# display the three superimposed visualizations
(heatmap + circle + text).properties(width = 600)
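
To see what the window transform in `base` computes, here is a rough plain-Python approximation of ranking terms by descending `tfidf` within each document. It assumes tied scores share a rank and the next rank skips ahead, which is my understanding of how the `rank()` window operation behaves. The function name `rank_within_documents` and the sample rows are made up for illustration.

```python
from collections import defaultdict

def rank_within_documents(rows):
    """Rank rows by descending tfidf inside each document (ties share a rank)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["document"]].append(row)

    ranked = []
    for members in groups.values():
        ordered = sorted(members, key=lambda r: r["tfidf"], reverse=True)
        prev_score, prev_rank = None, 0
        for position, row in enumerate(ordered, start=1):
            # A tied score reuses the previous rank; otherwise use the position.
            rank = prev_rank if row["tfidf"] == prev_score else position
            ranked.append({**row, "rank": rank})
            prev_score, prev_rank = row["tfidf"], rank
    return ranked

# Made-up sample rows with one tie in doc_a.
rows = [
    {"document": "doc_a", "term": "nation", "tfidf": 0.30},
    {"document": "doc_a", "term": "union", "tfidf": 0.20},
    {"document": "doc_a", "term": "republic", "tfidf": 0.20},
    {"document": "doc_a", "term": "people", "tfidf": 0.10},
    {"document": "doc_b", "term": "national", "tfidf": 0.25},
]

ranked = rank_within_documents(rows)
for row in ranked:
    print(row["document"], row["rank"], row["term"])
```

Under this assumption, two tied terms would be drawn in the same heatmap cell and their labels would overlap, which is why the scores in the data file carry a small random offset.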