---
title: "Bonus: Finding Semantic Directions"
format: 
  html:
    toc: true
    self-contained: true
jupyter: python3
---

## Discovering Semantic Directions

Can I find a directional representation of an abstract concept like "evil"?

```python
import numpy as np
from gensim.models import KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity
import warnings
warnings.filterwarnings('ignore')
```

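```python
# Load FastText embeddings (300-dimensional vectors).
# Assumes the model was previously saved locally with KeyedVectors.save(),
# e.g. after downloading it via gensim.downloader.load('fasttext-wiki-news-subwords-300')
embeddings = KeyedVectors.load('fasttext-wiki-news-subwords-300.model', mmap='r')
```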

## Finding Semantic Directions

We compute each direction by averaging the difference vectors of several word pairs and normalizing the result to unit length.
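For reference, this is what `find_direction` and `apply_direction` compute, written as formulas. The symbols are introduced here for illustration: $\mathbf{e}(\cdot)$ is the embedding lookup, $(a_i, b_i)$ are the `(w1, w2)` pairs whose words are both in the vocabulary, and $N$ is the number of such pairs.

$$
\bar{\mathbf{v}} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathbf{e}(b_i) - \mathbf{e}(a_i)\right),
\qquad
\hat{\mathbf{d}} = \frac{\bar{\mathbf{v}}}{\lVert \bar{\mathbf{v}} \rVert},
\qquad
\mathbf{q} = \mathbf{e}(w) + \hat{\mathbf{d}}
$$

The words returned for $w$ are then the vocabulary words $u \neq w$ with the highest cosine similarity $\cos(\mathbf{q}, \mathbf{e}(u))$, which is how gensim's `similar_by_vector` ranks its results.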

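```python
def find_direction(word_pairs, embeddings):
    """Average the difference vectors (w2 - w1) of the pairs and return it as a unit vector."""
    directions = []
    for w1, w2 in word_pairs:
        # Skip pairs where either word is missing from the vocabulary
        if w1 in embeddings and w2 in embeddings:
            direction = embeddings[w2] - embeddings[w1]
            directions.append(direction)
    if not directions:
        return None
    avg_direction = np.mean(directions, axis=0)
    # Normalize so the direction has unit length
    return avg_direction / np.linalg.norm(avg_direction)

def apply_direction(word, direction, embeddings, top_n=5):
    """Shift a word's vector along `direction` and return its nearest neighbors, excluding the word itself."""
    if word not in embeddings:
        return []
    # `direction` is unit-length but embeddings[word] is not normalized,
    # so the relative size of the shift depends on the word vector's norm
    result_vec = embeddings[word] + direction
    # Ask for one extra neighbor because the input word itself is often the closest match
    similar = embeddings.similar_by_vector(result_vec, topn=top_n + 1)
    return [(w, s) for w, s in similar if w != word][:top_n]
```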
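Why average several pairs instead of using one? A minimal sketch on synthetic data makes the reasoning concrete. It assumes a hypothetical hidden "corruption" offset shared by every pair, plus independent pair-specific noise; the names `true_offset`, `unit`, and `cos` are illustrative and nothing here uses the real FastText vectors. Averaging makes the independent noise partly cancel (its per-component spread shrinks roughly by a factor of $1/\sqrt{N}$), so the averaged direction should line up with the shared offset better than a single pair difference does.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs, noise = 50, 9, 0.3

# Hypothetical hidden "corruption" offset shared by every pair (scaled to norm 2)
true_offset = rng.normal(size=dim)
true_offset *= 2.0 / np.linalg.norm(true_offset)

# Synthetic pairs: corrupted word = base word + shared offset + pair-specific noise
base = rng.normal(size=(n_pairs, dim))
corrupted = base + true_offset + noise * rng.normal(size=(n_pairs, dim))
diffs = corrupted - base

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def cos(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(unit(u), unit(v)))

# Averaged direction (as in find_direction) vs. each single-pair direction
avg_dir = unit(diffs.mean(axis=0))
cos_avg = cos(avg_dir, true_offset)
cos_single = float(np.mean([cos(d, true_offset) for d in diffs]))

print(f"averaged direction vs. true offset:   {cos_avg:.3f}")
print(f"single-pair directions (mean cosine): {cos_single:.3f}")
```

Real word pairs are far noisier than this toy, and their "shared offset" is only approximate, so the sketch shows the mechanism rather than predicting how well the `evil_pairs` will agree.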

## The "Evil/Corruption" Direction

Here we look for a direction that transforms good or neutral concepts into their evil or corrupted counterparts.

```python
# Pairs (neutral/good word -> evil/corrupted counterpart) that represent the "evil/corruption" transformation
evil_pairs = [
    ('knight', 'warlord'),
    ('wizard', 'sorcerer'),
    ('priest', 'cultist'),
    ('medicine', 'poison'),
    ('truth', 'propaganda'),
    ('justice', 'revenge'),
    ('law', 'tyranny'),
    ('virtue', 'vice'),
    ('loyalty', 'betrayal'),
]

# Find the "evil" direction: the normalized average of the pair differences
evil_dir = find_direction(evil_pairs, embeddings)

# Test the direction on words outside the training pairs (teacher, scientist, leader)
# and on two training words (wizard, knight) as a sanity check
test_words = ['teacher', 'scientist', 'leader', 'wizard', 'knight']
for word in test_words:
    results = apply_direction(word, evil_dir, embeddings, top_n=3)
    if results:
        # Top neighbor with its cosine similarity, then the remaining neighbors
        print(f"{word:5}: {results[0][0]} (similarity: {results[0][1]:.3f})")
        print(f"Other versions: {', '.join([w for w, s in results[1:]])}")
        print()
```

```
teacher: ex-teacher (similarity: 0.481)
Other versions: schoolmate, tormentor

scientist: pseudo-scientist (similarity: 0.524)
Other versions: mad-scientist, pseudoscientist

leader: leader- (similarity: 0.538)
Other versions: co-leader, sub-leader

wizard: sorcerer (similarity: 0.566)
Other versions: warlock, wizards

knight: knight-errant (similarity: 0.536)
Other versions: marauder, goblin
```
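```python
# Compare the learned direction with the embedding of the word "evil" itself.
# cosine_similarity normalizes its inputs, so the explicit normalization is optional.
evil_word_vec = embeddings['evil']
evil_word_normalized = evil_word_vec / np.linalg.norm(evil_word_vec)
similarity = cosine_similarity([evil_dir], [evil_word_normalized])[0][0]
print(similarity)
```

```
0.10707768
```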
I attempted to find a single direction in embedding space that represents the concept of evil. The idea that abstract concepts like "evil" and "corruption" can be represented as directions rests on the distributional hypothesis: words that appear in similar contexts have similar meanings. Word embeddings are learned so that semantic relationships are encoded as geometric patterns, and training tends to place consistent word transformations along roughly the same direction.

For example, "knight" and "warlord" both appear in military contexts, but "warlord" co-occurs more frequently with words like "brutal," "conquest," and "tyranny." Similarly, "medicine" and "poison" both relate to health, but "poison" appears near "deadly," "toxic," and "harmful." The model absorbs these contextual associations into the embedding space, just as gender is encoded as a consistent direction in the classic analogy king - man + woman ≈ queen.
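The king/queen analogy can be reproduced in a tiny hand-made space to show the mechanics of "direction arithmetic plus nearest neighbor." This is a sketch under an explicit assumption: the 3-d vectors below are invented so that the dimensions roughly mean [royalty, male, female]; they are not real embeddings, and `nearest` is a hypothetical helper rather than a gensim function. As gensim's `most_similar` does, the input words are excluded from the candidates.

```python
import numpy as np

# Hand-made 3-d toy vectors (NOT real embeddings): dimensions roughly mean
# [royalty, male, female], so gender is one consistent direction
vocab = {
    "man":    np.array([0.0, 1.0, 0.0]),
    "woman":  np.array([0.0, 0.0, 1.0]),
    "king":   np.array([1.0, 1.0, 0.0]),
    "queen":  np.array([1.0, 0.0, 1.0]),
    "throne": np.array([1.0, 0.5, 0.5]),
}

def nearest(query, vocab, exclude=()):
    """Return the vocabulary word with the highest cosine similarity to `query`."""
    best_word, best_sim = None, -np.inf
    for word, vec in vocab.items():
        if word in exclude:
            continue
        sim = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word, float(best_sim)

# king - man + woman, excluding the three input words from the candidates
query = vocab["king"] - vocab["man"] + vocab["woman"]
word, sim = nearest(query, vocab, exclude={"king", "man", "woman"})
print(word, round(sim, 3))  # prints: queen 1.0
```

In real embeddings the offset between "man" and "woman" only approximately matches the one between "king" and "queen," so the match is a high similarity rather than an exact 1.0.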

I tried to extract a representation of evil or corruption by averaging nine word-pair differences. Although the pairs come from unrelated domains, the averaged direction still transfers to words outside the training set: applying it to new words tends to surface corrupted versions among their nearest neighbors, suggesting the direction captures at least part of the intended semantic shift. For example:

- Teacher transforms to "tormentor" (the third-ranked neighbor)
- Scientist transforms to "mad-scientist" (second-ranked, after "pseudo-scientist")
- Wizard transforms to "warlock" (second-ranked; "wizard" was itself a training word, and its training target "sorcerer" ranked first)
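One way to sharpen the claim that the direction generalizes is a leave-one-pair-out check: learn the direction from all pairs except one, apply it to the held-out source word, and see whether the held-out target shows up among the top neighbors. The sketch below is a hypothetical evaluation, not something run in this notebook; `unit`, `neighbors`, and `leave_one_out_hits` are illustrative helpers working on a plain dict of NumPy vectors, and the toy data is synthetic.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def neighbors(query, vocab, exclude=(), top_n=3):
    """Rank vocabulary words by cosine similarity to `query`, skipping `exclude`."""
    q = unit(query)
    scored = [(w, float(np.dot(q, unit(v)))) for w, v in vocab.items() if w not in exclude]
    return sorted(scored, key=lambda ws: ws[1], reverse=True)[:top_n]

def leave_one_out_hits(pairs, vocab, top_n=3):
    """For each pair, learn a direction from the OTHER pairs only, apply it to the
    held-out source word, and record whether the held-out target is in the top_n."""
    hits = {}
    for i, (src, tgt) in enumerate(pairs):
        others = [p for j, p in enumerate(pairs) if j != i]
        direction = unit(np.mean([vocab[b] - vocab[a] for a, b in others], axis=0))
        ranked = neighbors(vocab[src] + direction, vocab, exclude={src}, top_n=top_n)
        hits[(src, tgt)] = tgt in [w for w, _ in ranked]
    return hits

# Toy check on synthetic vectors (NOT real embeddings): every target is its source
# shifted by the same unit-length offset, so every held-out pair should be a hit
rng = np.random.default_rng(1)
dim = 20
offset = unit(rng.normal(size=dim))
toy_vocab, toy_pairs = {}, []
for k in range(5):
    src, tgt = f"word{k}", f"word{k}_corrupt"
    toy_vocab[src] = rng.normal(size=dim)
    toy_vocab[tgt] = toy_vocab[src] + offset
    toy_pairs.append((src, tgt))

hits = leave_one_out_hits(toy_pairs, toy_vocab)
print(hits)
```

To try this on the real model, one could (as an assumption, not tested here) build a candidate dict such as `{w: embeddings[w] for w in embeddings.index_to_key[:50000]}`, make sure every word in `evil_pairs` is included, and call `leave_one_out_hits(evil_pairs, candidates)`. The pure-Python ranking loop is slow on large vocabularies but keeps the logic easy to follow.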

This suggests that the embeddings encode not just topical meaning but also moral associations, which makes abstract concepts like "corruption" at least partly navigable through vector arithmetic. Interestingly, the direction learned from the word pairs is not very similar to the embedding of the word "evil" itself (cosine similarity ≈ 0.107). This points to a distinction between "evil" as a static noun or adjective and the learned direction, which represents a transformation: it encodes the change from good to evil, a dynamic relationship rather than a static one. The low similarity is therefore consistent with the goal of this exploration: I wasn't trying to find words similar to "evil," but rather the geometric operation that corrupts concepts, which is a different type of semantic relationship.
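To judge how small a cosine similarity of about 0.107 is, it helps to compare it with what unrelated directions produce in the same number of dimensions. The sketch below assumes isotropic random directions in 300 dimensions (the vector size of this FastText model); for such vectors the cosine similarity is centered at 0 with a standard deviation of about $1/\sqrt{300} \approx 0.058$. Real embedding spaces are not isotropic, so treat this only as a rough reference point, not a significance test.

```python
import numpy as np

rng = np.random.default_rng(42)
dim, n_samples = 300, 10_000

# Cosine similarity between independent random directions in 300-d
a = rng.normal(size=(n_samples, dim))
b = rng.normal(size=(n_samples, dim))
cosines = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

print(f"std of random cosines: {cosines.std():.4f} (1/sqrt(dim) = {1 / np.sqrt(dim):.4f})")
print(f"fraction of random pairs with |cos| >= 0.107: {np.mean(np.abs(cosines) >= 0.107):.3f}")
```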

However, limitations do exist. Often the most fitting word wasn't the top match but one slightly further down the similarity list. Some words, like "leader," didn't produce what I expected (such as "dictator") and instead returned variants of the word itself ("leader-", "co-leader", "sub-leader"). Several other top matches were also surface-form variants of the input, such as "ex-teacher" and "pseudo-scientist," plausibly because this FastText model is trained on subword information. So the quality depends on the training pairs chosen as well as on the vocabulary coverage of the embedding model. Overall, this suggests that embeddings capture abstract transformations geometrically: similar semantic shifts produce roughly parallel difference vectors, so an operation like "corruption" can be learned and applied with simple vector arithmetic.
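Since the result depends on which training pairs are chosen, a simple diagnostic is to measure how well each pair's own difference vector agrees with the averaged direction; pairs with low agreement are candidates to revise or drop. `pair_consistency` below is a hypothetical helper, not part of gensim. It only needs `in` and `[]` on the embeddings object, which both a dict of NumPy arrays and a gensim `KeyedVectors` support, so on the real model it could be called as `pair_consistency(evil_pairs, embeddings, evil_dir)`. The toy vectors are invented for illustration.

```python
import numpy as np

def pair_consistency(word_pairs, embeddings, direction):
    """Cosine similarity between each pair's difference vector (w2 - w1) and `direction`.
    Pairs with out-of-vocabulary words are skipped."""
    scores = []
    for w1, w2 in word_pairs:
        if w1 in embeddings and w2 in embeddings:
            diff = embeddings[w2] - embeddings[w1]
            cos = np.dot(diff, direction) / (np.linalg.norm(diff) * np.linalg.norm(direction))
            scores.append((w1, w2, float(cos)))
    # Least consistent pairs first
    return sorted(scores, key=lambda t: t[2])

# Toy usage with hand-made 2-d vectors (NOT real embeddings)
toy = {
    "a": np.array([0.0, 0.0]), "a_bad": np.array([1.0, 0.0]),
    "b": np.array([0.0, 1.0]), "b_bad": np.array([1.0, 1.0]),
    "c": np.array([0.0, 0.0]), "c_odd": np.array([0.0, 1.0]),
}
direction = np.array([1.0, 0.0])
scores = pair_consistency([("a", "a_bad"), ("b", "b_bad"), ("c", "c_odd")], toy, direction)
for w1, w2, score in scores:
    print(f"{w1} -> {w2}: {score:.2f}")
```

Here the pair ("c", "c_odd") moves perpendicular to the shared direction, so it scores 0.00 and is listed first, while the two consistent pairs score 1.00.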