# AI for Humanists **Demo**: Measuring Word Similarity with BERT (English Language Public Domain Poems)

By [The AI for Humanists](https://melaniewalsh.github.io/BERT-for-Humanists/) Team (formerly the BERT for Humanists team)

<a href="https://www.aiforhumanists.com/"> <img src="https://www.aiforhumanists.com/assets/images/AI-for-Humanists-logo-tahoma-v7-no-outline.png" alt="logo" width="300"/></a>

One of the most powerful things about large language models (LLMs) is that they represent words in context: the same word receives a different vector depending on the words around it. This allows us to measure the similarity of different words in a collection of texts, and even the similarity of different uses of the same word.
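
One common way to measure how similar two contextual vectors are is cosine similarity, which compares their directions. The minimal sketch below is plain Python. The three example phrases and their 3-dimensional vectors are invented for illustration: they are not BERT output, and real bert-base vectors have 768 dimensions. `cosine_similarity` is a hypothetical helper name, not necessarily what the full notebook uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 3-d toy vectors for three invented uses of "ring"
# (NOT real BERT output; bert-base models produce 768-d vectors).
uses = {
    "a golden ring upon her hand": [0.9, 0.1, 0.2],
    "a diamond ring for the bride": [0.8, 0.2, 0.1],
    "the bells ring out at dawn": [0.1, 0.9, 0.3],
}

# Compare every pair of uses.
contexts = list(uses)
for i, first in enumerate(contexts):
    for second in contexts[i + 1:]:
        score = cosine_similarity(uses[first], uses[second])
        print(f"{score:.3f}  {first!r} vs {second!r}")
```

With these toy values, the two jewelry uses of "ring" score higher with each other than either does with the bell-ringing use, which is the kind of contrast contextual vectors are meant to capture.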

The interactive plots below show the results of running a collection of English-language poems (from the 16th to the 20th century) through a BERT model (you can [find the full Colab notebook where we do this here](https://colab.research.google.com/drive/1r_eoi8CMea_a3YjWC1M4EmTqKMGVMbzQ?usp=sharing)). This process gives us a vector for each use of a word in the entire collection, and we then map these contextual uses onto a two-dimensional plane, so that each point in a plot is one use of a word.
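
BERT vectors have hundreds of dimensions (768 for bert-base models), so the x and y coordinates in the plots presumably come from some dimensionality-reduction step. This section does not say which method the full notebook uses, so the sketch below swaps in principal component analysis (PCA), one common choice; UMAP and t-SNE are other popular options. It is written in plain Python with power iteration so it runs without extra libraries. `pca_2d` and `_top_eigenvector` are hypothetical helper names, and for real 768-dimensional data a library implementation such as scikit-learn's `PCA` would be the practical choice.

```python
import math
import random


def _top_eigenvector(matrix, exclude, iterations=500, seed=0):
    """Power iteration on a symmetric PSD matrix, kept orthogonal to `exclude`."""
    n = len(matrix)
    rng = random.Random(seed)

    def orthonormalize(vec):
        # Gram-Schmidt against components already found, then scale to length 1.
        for other in exclude:
            dot = sum(a * b for a, b in zip(vec, other))
            vec = [a - dot * b for a, b in zip(vec, other)]
        norm = math.sqrt(sum(a * a for a in vec))
        return [a / norm for a in vec] if norm > 1e-12 else None

    vec = orthonormalize([rng.uniform(-1.0, 1.0) for _ in range(n)])
    for _ in range(iterations):
        nxt = orthonormalize([sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)])
        if nxt is None:  # no variance left in the remaining directions
            break
        vec = nxt
    return vec


def pca_2d(vectors):
    """Project equal-length vectors onto their top two principal components."""
    n = len(vectors)
    if n < 2 or len(vectors[0]) < 2:
        raise ValueError("need at least two vectors with at least two dimensions")
    d = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    # Sample covariance matrix (d x d): fine for toy data, slow for 768-d vectors.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(2):
        components.append(_top_eigenvector(cov, exclude=list(components)))
    # Each use of a word becomes one (x, y) point.
    return [tuple(sum(r[k] * pc[k] for k in range(d)) for pc in components)
            for r in centered]


# Hypothetical 4-d toy "embeddings": two tight pairs of uses.
toy = [
    [1.0, 0.9, 0.0, 0.1],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.1, 0.0, 0.9, 1.0],
]
for x, y in pca_2d(toy):
    print(f"x={x:+.3f}  y={y:+.3f}")
```

Any 2D projection discards most of the original dimensions, so two points that sit close together in a plot are a hint, not proof, that the corresponding uses are close in the full embedding space.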

## Explore the plots below. What do you notice about the way BERT works? What does BERT capture well or not so well?

- *You can hover over each point to see that use of the word in its context*
- *If you press `Shift` and click on a point, you will be taken to the original poem on Public-Domain-Poetry.com*



In [None]:
#@title Art, Nature, Religion, Science { display-mode: "form" }
import pandas as pd
import altair as alt

url = "https://raw.githubusercontent.com/melaniewalsh/BERT-4-Humanists/main/data/bert-word-nature.csv"
df = pd.read_csv(url, encoding='utf-8')

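keywords = ['art', 'nature', 'religion', 'science']  # only used in the chart title
color_by = 'word'  # column that determines each point's color

# One point per use of a keyword, placed at its precomputed (x, y) coordinates.
# `href` links each point to its source poem; `tooltip` lists the fields shown on hover.
alt.Chart(df, title=f"Word Similarity: {', '.join(keywords).title()}").mark_circle(size=200).encode(
    x=alt.X('x', scale=alt.Scale(zero=False)),  # don't force the x-axis to include 0
    y='y',
    color=color_by,
    href='link',
    tooltip=['title', 'word', 'poem_title', 'author', 'period'],
).interactive().properties(  # .interactive() enables panning and zooming
    width=500,
    height=500,
)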

In [None]:
#@title Ring { display-mode: "form" }
import pandas as pd
import altair as alt

url = "https://raw.githubusercontent.com/melaniewalsh/BERT-4-Humanists/main/data/bert-word-ring.csv"
df = pd.read_csv(url, encoding='utf-8')

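keywords = ['ring']  # only used in the chart title
color_by = 'word'  # column that determines each point's color

# One point per use of the keyword, placed at its precomputed (x, y) coordinates.
# `href` links each point to its source poem; `tooltip` lists the fields shown on hover.
alt.Chart(df, title=f"Word Similarity: {', '.join(keywords).title()}").mark_circle(size=200).encode(
    x=alt.X('x', scale=alt.Scale(zero=False)),  # don't force the x-axis to include 0
    y='y',
    color=color_by,
    href='link',
    tooltip=['title', 'word', 'poem_title', 'author', 'period'],
).interactive().properties(  # .interactive() enables panning and zooming
    width=500,
    height=500,
)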


In [None]:
#@title Ring (Colored By Time Period) { display-mode: "form" }
import pandas as pd
import altair as alt

url = "https://raw.githubusercontent.com/melaniewalsh/BERT-4-Humanists/main/data/bert-word-ring.csv"
df = pd.read_csv(url, encoding='utf-8')

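keywords = ['ring']  # only used in the chart title
color_by = 'period'  # color points by the poem's time period

# One point per use of the keyword, placed at its precomputed (x, y) coordinates.
# `href` links each point to its source poem; `tooltip` lists the fields shown on hover.
alt.Chart(df, title=f"Word Similarity: {', '.join(keywords).title()}").mark_circle(size=200).encode(
    x=alt.X('x', scale=alt.Scale(zero=False)),  # don't force the x-axis to include 0
    y='y',
    color=color_by,
    href='link',
    tooltip=['title', 'word', 'poem_title', 'author', 'period'],
).interactive().properties(  # .interactive() enables panning and zooming
    width=500,
    height=500,
)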


In [None]:
#@title Body Parts { display-mode: "form" }
import pandas as pd
import altair as alt

url = "https://raw.githubusercontent.com/melaniewalsh/BERT-4-Humanists/main/data/bert-word-heart.csv"
df = pd.read_csv(url, encoding='utf-8')

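keywords = ['heart', 'head', 'eye', 'arm', 'leg']  # only used in the chart title
color_by = 'word'  # column that determines each point's color

# One point per use of a keyword, placed at its precomputed (x, y) coordinates.
# `href` links each point to its source poem; `tooltip` lists the fields shown on hover.
alt.Chart(df, title=f"Word Similarity: {', '.join(keywords).title()}").mark_circle(size=200).encode(
    x=alt.X('x', scale=alt.Scale(zero=False)),  # don't force the x-axis to include 0
    y='y',
    color=color_by,
    href='link',
    tooltip=['title', 'word', 'poem_title', 'author', 'period'],
).interactive().properties(  # .interactive() enables panning and zooming
    width=500,
    height=500,
)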

