# Valence in Space

In this project, we are interested in the association between valence and different spatial orientations (vertical, horizontal, towards-away).

In this notebook, we obtain semantic projection scores for words along a valence axis as well as the various spatial axes in the GloVe embedding space.

The words we project are ones for which we also have human ratings. These are from a previous study by Meteyard and Vigliocco (see link below). Later, we will use the projection scores to assess how far valence is associated with the different spatial axes.

https://doi.org/10.3758/BRM.41.2.565



# Load libraries

In [None]:
!pip install --upgrade gensim



In [None]:
import gensim
import numpy
import numpy as np
import pandas as pd
from google.colab import files


# Load pre-trained GloVe embeddings

In [None]:
#download and unpack the pre-trained GloVe vectors
!wget 'https://nlp.stanford.edu/data/glove.42B.300d.zip'
!unzip glove.42B.300d.zip

from gensim.models import KeyedVectors

#the GloVe text file has no word2vec header line, so we load it with no_header = True
glove_input = 'glove.42B.300d.txt'
glove = KeyedVectors.load_word2vec_format(glove_input, binary = False, no_header = True)

--2025-05-07 15:57:47--  https://nlp.stanford.edu/data/glove.42B.300d.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://downloads.cs.stanford.edu/nlp/data/glove.42B.300d.zip [following]
--2025-05-07 15:57:47--  https://downloads.cs.stanford.edu/nlp/data/glove.42B.300d.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1877800501 (1.7G) [application/zip]
Saving to: ‘glove.42B.300d.zip’


2025-05-07 16:03:41 (5.07 MB/s) - ‘glove.42B.300d.zip’ saved [1877800501/1877800501]

Archive:  glove.42B.300d.zip
  inflating: glove.42B.300d.txt      


# Projections with Antonym Axes


We create a semantic axis by "drawing a line" between antonyms representing that axis. For example, to represent a vertical axis, we draw a line between the word embeddings for "up" and "down".

We create four semantic axes:
- valence: good - bad
- vertical orientation: up - down
- horizontal orientation: right - left
- towards-away orientation: towards - away

<br>

**NB:** Each semantic axis is a direction from one pole to the other. The difference vector points from the subtracted embedding towards the embedding it is subtracted from, e.g. the valence axis (good - bad) points from "bad" to "good". This matters for interpreting the projections: the higher a word's projection score, the closer it projects to the "good" pole of the axis; the lower the score, the closer to the "bad" pole.
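As a minimal illustration of this direction convention, the sketch below uses made-up 2-dimensional vectors in place of the GloVe embeddings. The `toy` dictionary and all of its values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 2-d "embeddings"; the real GloVe vectors have 300 dimensions.
toy = {
    "good": np.array([1.0, 0.5]),
    "bad": np.array([-1.0, 0.5]),
    "happy": np.array([0.8, 0.1]),
    "sad": np.array([-0.7, 0.2]),
}

# Subtracting "bad" from "good" gives a vector pointing from "bad" to "good".
toy_valence_axis = toy["good"] - toy["bad"]

# The projection score is the dot product of a word vector with the axis.
happy_score = float(np.dot(toy["happy"], toy_valence_axis))
sad_score = float(np.dot(toy["sad"], toy_valence_axis))

# Reversing the subtraction flips the sign of every score.
reversed_axis = toy["bad"] - toy["good"]
happy_reversed = float(np.dot(toy["happy"], reversed_axis))

print(happy_score, sad_score, happy_reversed)
```

In this toy space "happy" receives a positive score and "sad" a negative one. Swapping the order of the subtraction only mirrors the scale, so any later correlation keeps its size but changes sign.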

### 1. Creating Antonym Axes


In [None]:
valence_axis = glove["good"] - glove["bad"]
vertical_axis = glove["up"] - glove["down"]
horizontal_axis = glove["right"] - glove["left"]
towards_away_axis = glove["towards"] - glove["away"]

### 2. Obtaining Projections for all Words in the GloVe Model

Get a list of all words with embeddings in the GloVe model

In [None]:
all_words = list(glove.key_to_index.keys())

Get projections for each axis. A word's projection score is the dot product of its embedding with the axis vector.

In [None]:
all_valence_projections = []
for i in all_words:
  dotproduct = numpy.dot(glove[i], valence_axis)
  all_valence_projections.append(dotproduct)

In [None]:
all_vertical_projections = []
for i in all_words:
  dotproduct = numpy.dot(glove[i], vertical_axis)
  all_vertical_projections.append(dotproduct)

In [None]:
all_horizontal_projections = []
for i in all_words:
  dotproduct = numpy.dot(glove[i], horizontal_axis)
  all_horizontal_projections.append(dotproduct)

In [None]:
all_towards_away_projections = []
for i in all_words:
  dotproduct = numpy.dot(glove[i], towards_away_axis)
  all_towards_away_projections.append(dotproduct)
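The loops above compute one dot product per word. The same scores can be obtained with a single matrix-vector product per axis, which is usually much faster for a vocabulary of this size. The sketch below shows the equivalence on a small random matrix; `toy_vectors` and `toy_axis` are made-up stand-ins. With the real model, the embedding matrix should be available as `glove.vectors`, with rows in the same order as `glove.key_to_index`, but treat that as an assumption to verify against the installed gensim version.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the embedding matrix: 1000 "words" x 300 dimensions.
toy_vectors = rng.normal(size=(1000, 300)).astype(np.float32)
# Stand-in for an axis such as glove["good"] - glove["bad"].
toy_axis = toy_vectors[0] - toy_vectors[1]

# Loop version, as in the cells above: one dot product per word.
loop_scores = [np.dot(vec, toy_axis) for vec in toy_vectors]

# Vectorised version: one matrix-vector product for all words at once.
vectorised_scores = toy_vectors @ toy_axis

# Both give the same scores up to floating-point rounding.
same = np.allclose(loop_scores, vectorised_scores, rtol=1e-4, atol=1e-3)
print(same)
```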

Combine all words and their projections along the different axes into a dataframe

In [None]:
glove_space_projections = pd.DataFrame(list(zip(all_words, all_valence_projections, all_vertical_projections, all_horizontal_projections,
                                      all_towards_away_projections)), columns = ['words', 'valence_proj', 'vertical_proj', 'horizontal_proj', 'towards_away_proj'])

Write dataset to file and download

In [None]:
glove_space_projections.to_csv('glove_space_projections_Feb25.csv')
files.download('glove_space_projections_Feb25.csv')


# Projections with Rating Centroid Axes

### Define a function for obtaining rating centroid axes and their projections

Some of the ratings we work with use a single scale, where a higher rating means closer to one end of the scale and a lower rating means closer to the other. Other ratings give each word a separate rating for each end, e.g. an up rating and a down rating. In the former case, we construct one pole from the highest-scoring words and the other from the lowest-scoring words. In the latter case, we construct the poles from the highest-scoring words on each of the two scales.
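A minimal sketch of both cases, using made-up ratings and 2-dimensional vectors. All words, ratings, and vectors below are invented for illustration; the function in the next cell handles the single-scale case on the real data.

```python
import numpy as np

# Hypothetical 2-d vectors for six words.
vectors = {
    "rise": np.array([0.1, 1.0]), "climb": np.array([0.3, 0.8]),
    "walk": np.array([0.9, 0.1]), "talk": np.array([0.8, -0.1]),
    "fall": np.array([0.2, -0.9]), "sink": np.array([0.0, -1.0]),
}

def centroid(words):
    """Average the vectors of the given words."""
    return np.mean([vectors[w] for w in words], axis=0)

n = 2  # number of words per pole

# Case 1: a single scale (higher = more "up", lower = more "down").
single = {"rise": 0.9, "climb": 0.8, "walk": 0.5, "talk": 0.5,
          "fall": 0.1, "sink": 0.0}
ranked = sorted(single, key=single.get, reverse=True)
axis_single = centroid(ranked[:n]) - centroid(ranked[-n:])

# Case 2: a separate rating for each pole.
up = {"rise": 6.0, "climb": 5.5, "walk": 1.0,
      "talk": 0.5, "fall": 0.2, "sink": 0.1}
down = {"rise": 0.1, "climb": 0.3, "walk": 1.0,
        "talk": 0.4, "fall": 5.8, "sink": 6.0}
top_up = sorted(up, key=up.get, reverse=True)[:n]
top_down = sorted(down, key=down.get, reverse=True)[:n]
axis_two = centroid(top_up) - centroid(top_down)

print(axis_single, axis_two)
```

In this toy data both procedures select the same pole words, so the two axes coincide. With real ratings they can differ, because a word may score low on one pole without scoring high on the other.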

In [None]:
def get_rating_proj_1scale(ratings, n_words, column, model):

  #get lists of the n highest- and lowest-rated words
  top = ratings.sort_values(by = str(column), ascending = False).iloc[0:n_words]
  top = top['word'].to_list()
  bottom = ratings.sort_values(by = str(column), ascending = True).iloc[0:n_words]
  bottom = bottom['word'].to_list()

  #calculate the centroid for both sets of words
  #this is given by the average of the vectors for those words
  top_centroid = np.average([model[w] for w in top if w in model.key_to_index], axis = 0)
  bottom_centroid = np.average([model[w] for w in bottom if w in model.key_to_index], axis = 0)


  #fit the axis between the centroids
  axis = top_centroid - bottom_centroid

  #we don't want to obtain projections for the words in the centroids in general
  #this would mess with the correlation results and regression later on
  #create a list of all rated words
  all_words = ratings['word'].to_list()

  #delete the centroid words from the complete list and make a new
  #list of words to be projected
  words_minus_top = [i for i in all_words if i not in top]
  projection_words = [i for i in words_minus_top if i not in bottom]

  #initialise empty lists for the words that were found in glove
  #and for their projection scores
  words_in_model = []
  projection_scores = []

  #go over the words to be projected (those in the ratings - centroid words)
  for i in projection_words:
    if i in model.key_to_index: #check if the word has a vector representation in glove
      dotproduct = numpy.dot(model[i], axis) #calculate the word's projection onto the axis
      words_in_model.append(i) #add word to words_in_model list
      projection_scores.append(dotproduct) #add the projection score to the list

  column_name = 'projection_' + str(n_words)

  #combine words in model and projection scores into a dataframe
  proj_data = pd.DataFrame(list(zip(words_in_model, projection_scores)),
                                 columns = ['word', column_name])

  #combine projection data with rating data by word
  full_proj_data = proj_data.merge(ratings, how = 'left', on = 'word')

  #return the data frame
  return full_proj_data

The different numbers of words per centroid that we test

In [None]:
n_words = [5, 10, 25, 50]

### Valence Warriner

In [None]:
war = pd.read_csv("warriner_et_al.csv", usecols = ['Word', 'V.Mean.Sum'])

Rename the Word column to word and the V.Mean.Sum column to Val

In [None]:
war = war.rename(columns = {'Word': 'word', 'V.Mean.Sum': 'Val'})

In [None]:
for i in n_words:
  war_proj = get_rating_proj_1scale(war, i, 'Val', glove)
  proj_col = [col for col in war_proj.columns if col not in ['word', 'Val']][0]
  war = war.merge(war_proj[['word', 'Val', proj_col]], how = 'left', on = ['word', 'Val'])

In [None]:
war.to_csv('war_proj.csv')
files.download('war_proj.csv')


### Vertical Orientation Goodhew and Kidd

In [None]:
gk = pd.read_csv("goodhew_kidd.csv")

In [None]:
gk

Unnamed: 0,Word,up_down
0,Lucky,1.000000
1,Funny,0.978022
2,Proud,0.934066
3,Loving,0.956044
4,Merry,0.978022
...,...,...
495,Note,0.181818
496,Speak,0.887640
497,Walk,0.460674
498,Business,0.191011


In [None]:
gk = gk.rename(columns = {'Word': 'word'})

In [None]:
gk['word'] = gk['word'].str.lower()

In [None]:
gk

Unnamed: 0,word,up_down
0,lucky,1.000000
1,funny,0.978022
2,proud,0.934066
3,loving,0.956044
4,merry,0.978022
...,...,...
495,note,0.181818
496,speak,0.887640
497,walk,0.460674
498,business,0.191011


In [None]:
for i in n_words:
  gk_proj = get_rating_proj_1scale(gk, i, 'up_down', glove)
  proj_col = [col for col in gk_proj.columns if col not in ['word', 'up_down']][0]
  gk = gk.merge(gk_proj[['word', 'up_down', proj_col]], how = 'left', on = ['word', 'up_down'])


In [None]:
gk.to_csv('gk_proj.csv')
files.download('gk_proj.csv')

### Vertical Motion, Horizontal Motion and Sagittal Motion Meteyard and Vigliocco

In [None]:
vm = pd.read_csv("meteyard_vigliocco.csv", usecols = ['verb', 'upwrd', 'dwnwrd', 'left', 'right', 'toward', 'away'])

In [None]:
vm = vm.rename(columns = {'verb': 'word'})

In [None]:
vm

Unnamed: 0,word,upwrd,dwnwrd,left,right,toward,away
0,abandon,1.44,1.97,2.03,2.06,4.44,3.06
1,abduct,2.00,1.10,2.27,2.33,3.47,4.23
2,accept,1.57,0.93,2.47,2.40,5.03,2.33
3,acquire,1.70,1.30,2.47,2.40,4.47,3.40
4,admire,3.20,0.53,1.80,1.33,0.10,6.77
...,...,...,...,...,...,...,...
294,wilt,0.00,4.13,0.67,0.87,2.97,2.70
295,wither,0.40,4.50,0.43,0.83,2.80,3.37
296,worry,0.61,1.71,2.36,2.25,1.00,5.93
297,worship,3.67,0.33,0.80,0.57,0.10,5.27


#### Calculate Single Scales for All 3 Dimensions

In [None]:
vm['vert'] = vm['upwrd'] - vm['dwnwrd']
vm['hor'] = vm['right'] - vm['left']
vm['sag'] = vm['toward'] - vm['away']
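As a quick check of these difference scores, the values for "abandon" can be recomputed by hand from the ratings shown in the table above. This is plain Python and assumes nothing beyond those six ratings.

```python
# Ratings for "abandon", copied from the table above.
abandon = {"upwrd": 1.44, "dwnwrd": 1.97, "left": 2.03,
           "right": 2.06, "toward": 4.44, "away": 3.06}

# Positive values lean towards the first-named pole (up, right, toward),
# negative values towards the second (down, left, away).
vert = round(abandon["upwrd"] - abandon["dwnwrd"], 2)
hor = round(abandon["right"] - abandon["left"], 2)
sag = round(abandon["toward"] - abandon["away"], 2)

print(vert, hor, sag)
```

The verb is therefore rated as moving slightly downward, almost neutral horizontally, and more toward than away.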

In [None]:
vm

Unnamed: 0,word,upwrd,dwnwrd,left,right,toward,away,vert,hor,sag
0,abandon,1.44,1.97,2.03,2.06,4.44,3.06,-0.53,0.03,1.38
1,abduct,2.00,1.10,2.27,2.33,3.47,4.23,0.90,0.06,-0.76
2,accept,1.57,0.93,2.47,2.40,5.03,2.33,0.64,-0.07,2.70
3,acquire,1.70,1.30,2.47,2.40,4.47,3.40,0.40,-0.07,1.07
4,admire,3.20,0.53,1.80,1.33,0.10,6.77,2.67,-0.47,-6.67
...,...,...,...,...,...,...,...,...,...,...
294,wilt,0.00,4.13,0.67,0.87,2.97,2.70,-4.13,0.20,0.27
295,wither,0.40,4.50,0.43,0.83,2.80,3.37,-4.10,0.40,-0.57
296,worry,0.61,1.71,2.36,2.25,1.00,5.93,-1.10,-0.11,-4.93
297,worship,3.67,0.33,0.80,0.57,0.10,5.27,3.34,-0.23,-5.17


#### Projections

up - down

In [None]:
for i in n_words:
  vm_proj = get_rating_proj_1scale(vm, i, 'vert', glove)
  proj_col = [col for col in vm_proj.columns if col not in ['word', 'vert']][0]
  vm = vm.merge(vm_proj[['word', 'vert', proj_col]], how = 'left', on = ['word', 'vert'])

In [None]:
vm.rename(columns = {'projection_5': 'projection_5_vert', 'projection_10': 'projection_10_vert', 'projection_25': 'projection_25_vert',
                     'projection_50' : 'projection_50_vert'}, inplace = True)


right - left

In [None]:
for i in n_words:
  vm_proj = get_rating_proj_1scale(vm, i, 'hor', glove)
  proj_col = [col for col in vm_proj.columns if col not in ['word', 'hor']][0]
  vm = vm.merge(vm_proj[['word', 'hor', proj_col]], how = 'left', on = ['word', 'hor'])

In [None]:
vm.rename(columns = {'projection_5': 'projection_5_hor', 'projection_10': 'projection_10_hor', 'projection_25': 'projection_25_hor',
                     'projection_50' : 'projection_50_hor'}, inplace = True)

toward - away

In [None]:
for i in n_words:
  vm_proj = get_rating_proj_1scale(vm, i, 'sag', glove)
  proj_col = [col for col in vm_proj.columns if col not in ['word', 'sag']][0]
  vm = vm.merge(vm_proj[['word', 'sag', proj_col]], how = 'left', on = ['word', 'sag'])

In [None]:
vm.rename(columns = {'projection_5': 'projection_5_sag', 'projection_10': 'projection_10_sag', 'projection_25': 'projection_25_sag',
                     'projection_50' : 'projection_50_sag'}, inplace = True)

In [None]:
vm

Unnamed: 0,word,upwrd,dwnwrd,left,right,toward,away,vert,hor,sag,...,projection_25_vert,projection_50_vert,projection_5_hor,projection_10_hor,projection_25_hor,projection_50_hor,projection_5_sag,projection_10_sag,projection_25_sag,projection_50_sag
0,abandon,1.44,1.97,2.03,2.06,4.44,3.06,-0.53,0.03,1.38,...,-0.224644,0.704726,-0.094538,0.459564,-1.966455,-1.145001,0.910517,-0.711537,-0.372333,
1,abduct,2.00,1.10,2.27,2.33,3.47,4.23,0.90,0.06,-0.76,...,0.452585,-0.045619,2.662090,3.958709,2.751269,0.853767,0.766552,-0.777276,-0.241214,-0.979298
2,accept,1.57,0.93,2.47,2.40,5.03,2.33,0.64,-0.07,2.70,...,1.148747,2.411886,1.552678,0.459734,-4.561045,-3.377840,-2.852697,-1.329933,-2.302672,
3,acquire,1.70,1.30,2.47,2.40,4.47,3.40,0.40,-0.07,1.07,...,2.458971,2.774814,4.671803,2.718266,-2.655052,-2.796312,-1.224734,-1.074666,-2.035921,
4,admire,3.20,0.53,1.80,1.33,0.10,6.77,2.67,-0.47,-6.67,...,2.350796,,2.434420,0.180487,-2.782614,-1.915436,-1.951549,-2.170799,-1.321160,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
294,wilt,0.00,4.13,0.67,0.87,2.97,2.70,-4.13,0.20,0.27,...,,,-3.281404,-1.960279,-2.117159,0.260891,1.045790,0.752539,1.548444,0.898749
295,wither,0.40,4.50,0.43,0.83,2.80,3.37,-4.10,0.40,-0.57,...,,,-1.814232,-0.024384,0.878748,,1.427422,0.840419,1.783522,2.037467
296,worry,0.61,1.71,2.36,2.25,1.00,5.93,-1.10,-0.11,-4.93,...,0.119391,0.383987,-1.395065,-1.369195,-4.863720,-3.268044,0.683886,-0.259807,-0.748371,-1.388759
297,worship,3.67,0.33,0.80,0.57,0.10,5.27,3.34,-0.23,-5.17,...,3.608669,,0.255721,-2.170650,-5.212754,-3.843907,0.041494,-0.826228,-0.982121,-3.308030


In [None]:
vm.to_csv('vm_proj.csv')
files.download('vm_proj.csv')


# Reduce GloVe model to 2 Dimensions

Finally, we create a dimensionality-reduced version of the GloVe embedding space for the purpose of creating visualisations.

In [None]:
from sklearn.decomposition import PCA

In [None]:
all_glove_embeddings = glove[all_words]
pca = PCA(n_components=2)
reduced_words = pca.fit_transform(all_glove_embeddings)
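For intuition about what this reduction does, PCA can be sketched with NumPy alone: centre the data, take a singular value decomposition, and keep the coordinates along the first two components. The toy matrix below is random stand-in data, not GloVe. The result may differ from scikit-learn's output in the sign of each component and, depending on the solver scikit-learn selects, in small numerical details.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the embedding matrix: 200 "words" x 50 dimensions.
toy_embeddings = rng.normal(size=(200, 50))

# 1. Centre each dimension on its mean.
centred = toy_embeddings - toy_embeddings.mean(axis=0)

# 2. Singular value decomposition; the rows of vt are the principal
#    directions, ordered by decreasing singular value.
u, s, vt = np.linalg.svd(centred, full_matrices=False)

# 3. Coordinates of every word on the first two principal components.
reduced = centred @ vt[:2].T

# Share of the total variance captured by each of the two components.
explained = (s[:2] ** 2) / np.sum(s ** 2)
print(reduced.shape, explained)
```

Two components of a 300-dimensional space typically retain only a small share of the variance, so the resulting 2D coordinates are suited to plotting rather than to further analysis.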

In [None]:
reduced_words_pca = pd.DataFrame(reduced_words, columns=['dimension1', 'dimension2'], index = all_words)

In [None]:
reduced_words_pca.to_csv('all_glove_embeddings_2dim.csv')
files.download('all_glove_embeddings_2dim.csv')
