### Emotional arousal and shape of words
Does the association between emotional arousal and the shape meaning of words differ between abstract and concrete concepts? Does it differ between human ratings and the Qwen3 embedding model? More broadly, can sound symbolism (specifically, shape sound symbolism) be extended to abstract concepts?

To test this, I used Muraki and Pexman's (2025) data: https://osf.io/3mywq/files/43skq. Each word has an arousal rating on a 9-point Likert scale ranging from 1 (calm) through 5 (neutral) to 9 (excited). These values already reflect the authors' reverse coding of the scale. The dataset also includes concreteness ratings, which serve as the moderator in the analyses below.
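For reference, the usual reverse-coding rule on a 1 to 9 scale maps a rating x to (1 + 9) - x = 10 - x. This swaps the endpoints and leaves the neutral midpoint 5 unchanged. The authors' exact procedure is not described here, so the sketch below is an assumption, and `reverse_code` is a hypothetical helper. Because the mapping is linear, it applies equally to individual ratings and to item means.

```python
def reverse_code(rating, scale_min=1, scale_max=9):
    """Reverse-code a Likert rating (assumed standard rule): endpoints swap, midpoint stays.

    Also works element-wise on NumPy arrays or pandas Series.
    """
    return (scale_min + scale_max) - rating

# Map every point of the 1-9 scale to its reverse-coded value
reversed_points = {x: reverse_code(x) for x in range(1, 10)}
```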

I used the Qwen3 embedding model from Zhang et al. (2025), Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models (https://arxiv.org/abs/2506.05176).

Specifically, I used the "Qwen3-Embedding-0.6B" model: https://huggingface.co/Qwen/Qwen3-Embedding-0.6B

In [None]:
#run this notebook on a GPU instead of the CPU to reduce processing time
#on Google Colab: "Runtime" > "Change runtime type" > select a GPU > "Save"

In [2]:
#install dependencies
!pip install transformers accelerate sentencepiece
import torch
from transformers import AutoModel, AutoTokenizer
import numpy as np
import pandas as pd
from google.colab import files
import io
from scipy.stats import pearsonr
import sys
import time



In [3]:
#load the model + tokenizer
#downloaded from Hugging Face: https://huggingface.co/Qwen/Qwen3-Embedding-0.6B
model_name = "Qwen/Qwen3-Embedding-0.6B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(
    model_name, dtype="auto", device_map="auto")  #`dtype` replaces the deprecated `torch_dtype` argument

In [4]:
#test it out
text = ["This is a test sentence."]

inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
inputs = {k: v.to(model.device) for k, v in inputs.items()}  #sends tensors to same device as model

with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state.mean(dim=1)  #simple mean pool

embeddings.shape  #shape of the embedding vector that the model produced. 1024 dimensions

torch.Size([1, 1024])

In [5]:
#see if notebook is operating on GPU
model.device  #GPU, because it's "cuda"

device(type='cuda', index=0)

In [6]:
#define anchor words for the shape dimension (low pole = spiky, high pole = round); these words are commonly used in sound symbolism research
shape_low  = ["spiky", "sharp", "pointy"]
shape_high = ["round", "curved", "smooth"]

#define anchor words for arousal dimension. these words are from Warriner et al. (2013)
arousal_low  = [   "relaxed",    "calm", "sluggish",    "dull", "sleepy"]
arousal_high = ["stimulated", "excited", "frenzied", "jittery",  "awake", "aroused"]

In [33]:
#loading data file
uploaded = files.upload() #choose "MurakiPexman2025VerbNorms" file

df = pd.read_csv(io.BytesIO(uploaded['MurakiPexman2025VerbNorms.csv']))
print(df.head())

Saving MurakiPexman2025VerbNorms.csv to MurakiPexman2025VerbNorms.csv
         Item  Arousal.Mean  Arousal.SD  Arousal.N  Arousal.Min  Arousal.Max  \
0       abase          4.70        1.06         10            2            6   
1       abate          4.43        1.34         14            2            7   
2  abbreviate          3.89        1.68         18            1            6   
3    abdicate          5.80        1.32         10            4            9   
4      abduct          7.74        1.48         19            5            9   

   Arousal.Unknown  Concreteness.Mean  Concreteness.SD  Concreteness.N  ...  \
0             11.0               2.62             0.92               8  ...   
1              7.0               2.00             1.00              11  ...   
2              NaN               2.24             1.33              25  ...   
3             11.0               2.00             1.17              17  ...   
4              1.0               4.35             0.65

In [34]:
#store the three columns in a new dataset
df_cleaned = df[["Item","Arousal.Mean", "Concreteness.Mean"]].copy()

df_cleaned.head()

         Item  Arousal.Mean  Concreteness.Mean
0       abase          4.70               2.62
1       abate          4.43               2.00
2  abbreviate          3.89               2.24
3    abdicate          5.80               2.00
4      abduct          7.74               4.35


In [25]:
#define a function that extracts Qwen3 embedding vectors for a list of words
def embed_words_batch(words, batch_size=64):
    """
    Embed a list of words using Qwen3 in GPU batches.
    Returns dict: word -> normalized numpy embedding.
    """
    embeddings = {}
    model.eval()

    for i in range(0, len(words), batch_size):
        batch = words[i:i+batch_size]

        # Tokenize (CPU) then move to GPU
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        inputs = {k: v.to(model.device) for k, v in inputs.items()}

        with torch.no_grad():
            outputs = model(**inputs)

            # (batch, hidden_dim)
            vecs = outputs.last_hidden_state.mean(dim=1)
            vecs = torch.nn.functional.normalize(vecs, p=2, dim=1)

        # Move batch embeddings to CPU
        vecs = vecs.float().cpu().numpy()

        # Store into dictionary
        for w, v in zip(batch, vecs):
            embeddings[w] = v

        # Progress bar
        print(f"\rEmbedded {min(i+batch_size, len(words))}/{len(words)} words...", end='')

    print("\nBatch embedding complete.")
    return embeddings
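The tokenizer call above uses `padding=True`, so within each batch of 64 words every shorter word is padded to the length of the longest one, and `last_hidden_state.mean(dim=1)` then averages over the pad positions too. A word's vector can therefore depend on which other words share its batch, and if the vocabulary order changes between sessions (as it can with an unsorted `set`), so do the batches. This may be one reason the ShapeScore values differ between runs in this notebook (e.g., 0.019607 vs. 0.030835 for "abase"). A common remedy is a masked mean that averages only the real tokens. The sketch below shows the arithmetic in NumPy on toy arrays; `masked_mean_pool` is a hypothetical helper, not part of transformers, and in the notebook the mask would come from `inputs["attention_mask"]`. Note also that the usage example on the Qwen3-Embedding model card pools the last token rather than averaging, so mean pooling is itself a design choice.

```python
import numpy as np

def masked_mean_pool(hidden, attention_mask):
    """Mean-pool token vectors while ignoring padding.

    hidden:         (batch, seq_len, dim) last-layer hidden states
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding
    Works for left or right padding, since only the mask matters.
    """
    mask = attention_mask[..., None].astype(hidden.dtype)  # (batch, seq_len, 1)
    summed = (hidden * mask).sum(axis=1)                    # sum of real-token vectors
    n_tokens = np.clip(mask.sum(axis=1), 1e-9, None)        # real-token count per row
    return summed / n_tokens

# Toy batch: row 0 is a 3-token word, row 1 is a 1-token word right-padded to length 3
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 3, 4))
attention_mask = np.array([[1, 1, 1],
                           [1, 0, 0]])

naive = hidden.mean(axis=1)                        # like .mean(dim=1): pad positions included
masked = masked_mean_pool(hidden, attention_mask)  # pad positions excluded

# PyTorch equivalent inside embed_words_batch (sketch, not run here):
#   m = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)
#   vecs = (outputs.last_hidden_state * m).sum(dim=1) / m.sum(dim=1).clamp(min=1e-9)
```

With the mask, the padded word's vector equals its single real token's vector; the plain mean does not.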

In [26]:
#collect all words that need embedding vectors
all_words = df_cleaned['Item'].tolist()
anchor_words = shape_low + shape_high + arousal_low + arousal_high

#unique vocabulary (no repeats), sorted so the order (and thus the batching) is the same in every session
vocab = sorted(set(all_words + anchor_words))
print("Total unique words to embed:", len(vocab))

#n = 2955: the 2938 words in df_cleaned plus the 6 shape anchors and 11 arousal anchors (2938 + 17 = 2955), so none of the anchor words is also an item

Total unique words to embed: 2955


In [27]:
#extracting embedding vectors for these words
emb_dict = embed_words_batch(vocab, batch_size=64)

Embedded 2955/2955 words...
Batch embedding complete.


In [28]:
#functions to calculate shape and arousal scores (dot product of a word's vector with a semantic axis)
def normalize(vec):
    return vec / np.linalg.norm(vec)

def distance_with_concepts_multi_pairs_fast(word, low_anchors, high_anchors, emb_dict):
    v_word = emb_dict[word]
    #note that the individual word's vector (v_word) is already normalized;
    #this happens within the embed_words_batch function, where
    #torch.nn.functional.normalize(vecs, p=2, dim=1) is applied before the
    #embeddings are stored in emb_dict

    #axis = normalized mean of all (high anchor - low anchor) difference vectors
    directions = []
    for h in high_anchors:
        for l in low_anchors:
            directions.append(emb_dict[h] - emb_dict[l])

    axis = normalize(np.mean(directions, axis=0))
    #positive scores lean toward the high anchors (e.g., "round"), negative toward the low anchors (e.g., "spiky")
    return np.dot(v_word, axis)
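Averaging every (high - low) pair difference gives the same vector as subtracting the mean low-anchor vector from the mean high-anchor vector, because the mean over all pairs of (h - l) equals mean(h) - mean(l). The function above therefore rebuilds the same axis for each of the 2,938 words. A sketch that builds each axis once and then scores words by a single dot product is below; `semantic_axis`, `project`, and the random toy vectors are illustrative assumptions, and the comparison at the end checks the identity. Because word vectors are L2-normalized and the axis has unit length, each score is the cosine between the word and the axis.

```python
import numpy as np

def semantic_axis(low_anchors, high_anchors, emb):
    """Unit vector pointing from the low-anchor pole toward the high-anchor pole."""
    high_mean = np.mean([emb[w] for w in high_anchors], axis=0)
    low_mean = np.mean([emb[w] for w in low_anchors], axis=0)
    axis = high_mean - low_mean      # equals the mean of all (high - low) pair differences
    return axis / np.linalg.norm(axis)

def project(words, axis, emb):
    """Dot product of each word vector with the axis; NaN for words without an embedding."""
    return np.array([emb[w] @ axis if w in emb else np.nan for w in words])

# Toy check with random unit vectors standing in for Qwen3 embeddings
rng = np.random.default_rng(42)
vocab_toy = ["spiky", "sharp", "round", "smooth", "curved", "word_a", "word_b"]
emb_toy = {w: v / np.linalg.norm(v)
           for w, v in zip(vocab_toy, rng.normal(size=(len(vocab_toy), 8)))}

low, high = ["spiky", "sharp"], ["round", "smooth", "curved"]
axis_fast = semantic_axis(low, high, emb_toy)

# Original pairwise construction, for comparison
pairwise = np.mean([emb_toy[h] - emb_toy[l] for h in high for l in low], axis=0)
axis_pairwise = pairwise / np.linalg.norm(pairwise)

scores = project(["word_a", "word_b", "not_in_vocab"], axis_fast, emb_toy)
```

In the notebook this would correspond to `shape_axis = semantic_axis(shape_low, shape_high, emb_dict)` followed by `project(df_cleaned['Item'], shape_axis, emb_dict)`.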

In [29]:
#calculating shape and arousal score using functions above
shape_scores = []
arousal_scores = []
total = len(df_cleaned)

for index, row in df_cleaned.iterrows():
    word = row['Item']

    progress_str = f'\rprocessing {index+1}/{total} words...'
    sys.stdout.write(progress_str)
    sys.stdout.flush()

    if word in emb_dict:
        shapescore = distance_with_concepts_multi_pairs_fast(
            word,
            shape_low,
            shape_high,
            emb_dict)
    else:
        shapescore = np.nan  #NaN rather than the string "na" keeps the score columns numeric

    if word in emb_dict:
        arousalscore = distance_with_concepts_multi_pairs_fast(
            word,
            arousal_low,
            arousal_high,
            emb_dict)
    else:
        arousalscore = np.nan

    shape_scores.append(shapescore)
    arousal_scores.append(arousalscore)

print()

df_scores = pd.DataFrame({
    'Item': df_cleaned['Item'],
    'ShapeScore': shape_scores,
    'ArousalScore': arousal_scores,
})

processing 2938/2938 words...


In [30]:
df_scores.head()

         Item  ShapeScore  ArousalScore
0       abase    0.019607      0.018808
1       abate    0.014308      0.013917
2  abbreviate   -0.031624     -0.005933
3    abdicate    0.025481      0.004760
4      abduct    0.002836      0.017618


In [35]:
#append arousal and shape scores back to df_cleaned
df_merge = pd.merge(df_cleaned, df_scores, on='Item')
print(df_merge.head())
print(len(df_merge))

         Item  Arousal.Mean  Concreteness.Mean  ShapeScore  ArousalScore
0       abase          4.70               2.62    0.019607      0.018808
1       abate          4.43               2.00    0.014308      0.013917
2  abbreviate          3.89               2.24   -0.031624     -0.005933
3    abdicate          5.80               2.00    0.025481      0.004760
4      abduct          7.74               4.35    0.002836      0.017618
2938


### Correlations between arousal, shape, and concreteness

Spearman correlations among arousal (human), arousal (embedding), shape (embedding), and concreteness (human)

In [36]:
from scipy.stats import spearmanr

# Columns to include in the combined matrix
cols = ['Arousal.Mean', 'ArousalScore', 'ShapeScore', 'Concreteness.Mean']

# Convert columns to numeric (coerce errors to NaN)
for col in cols:
    df_merge[col] = pd.to_numeric(df_merge[col], errors='coerce')

# Drop rows with ANY missing values in these columns
df_corr = df_merge.dropna(subset=cols)

# Create empty result DataFrame
combined_matrix = pd.DataFrame(index=cols, columns=cols, dtype=object)

# Compute Spearman correlations for all pairs
for i in cols:
    for j in cols:
        if pd.api.types.is_numeric_dtype(df_corr[i]) and pd.api.types.is_numeric_dtype(df_corr[j]):
            corr, pval = spearmanr(df_corr[i], df_corr[j])
            combined_matrix.loc[i, j] = f"{corr:.3f} ({pval:.3f})"
        else:
            combined_matrix.loc[i, j] = "N/A"

print("Combined Spearman Correlation Coefficient (p-value) Matrix:")
print(combined_matrix)

Combined Spearman Correlation Coefficient (p-value) Matrix:
                     Arousal.Mean    ArousalScore      ShapeScore  \
Arousal.Mean        1.000 (0.000)   0.272 (0.000)  -0.354 (0.000)   
ArousalScore        0.272 (0.000)   1.000 (0.000)  -0.151 (0.000)   
ShapeScore         -0.354 (0.000)  -0.151 (0.000)   1.000 (0.000)   
Concreteness.Mean   0.165 (0.000)  -0.076 (0.000)  -0.152 (0.000)   

                  Concreteness.Mean  
Arousal.Mean          0.165 (0.000)  
ArousalScore         -0.076 (0.000)  
ShapeScore           -0.152 (0.000)  
Concreteness.Mean     1.000 (0.000)  
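The matrix above reports Spearman coefficients, i.e., Pearson correlations computed on ranks, so they capture monotonic rather than strictly linear association, and rho rather than r is the conventional label. A minimal NumPy sketch of that definition follows, giving tied values their average rank as `scipy.stats.spearmanr` does; `average_ranks` and `spearman_rho` are hypothetical helpers for illustration only.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values sharing the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
rho_monotone = spearman_rho(x, [1, 8, 27, 64, 125])  # nonlinear but monotone
rho_reversed = spearman_rho(x, [5, 4, 3, 2, 1])      # perfectly decreasing
tied = average_ranks([10, 20, 20, 30])               # tied values share rank 2.5
```

As a design note, `spearmanr` can also take the whole `df_corr[cols]` frame at once and return the full coefficient and p-value matrices, which would replace the double loop.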


In [None]:
#human-rated arousal is positively related to embedding arousal (rho = .27, p < .001)
#as expected, more arousing words are spikier (lower ShapeScore)
  #(rho = -.35, p < .001 for human-rated arousal)
  #(rho = -.15, p < .001 for embedding arousal)
#more arousing words are also more concrete for human-rated arousal (rho = .17, p < .001);
  #for embedding arousal the association is weak and slightly negative (rho = -.08, p < .001)
#more concrete words are spikier (rho = -.15, p < .001)

### Interaction between arousal and concreteness
Model 1

x: **Arousal** (human), y: **ShapeScore** (embedding), modx: **Concreteness** (human)

Model 2

x: **Arousal** (embedding), y: **ShapeScore** (embedding), modx: **Concreteness** (human)

Model 3

x: **Arousal** (human and embedding scores stacked in long format), y: **ShapeScore** (embedding), modx: **Concreteness** (human), compared between **Group** (human vs. Qwen3)
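Written out, Models 1 and 2 are the same moderated regression with a different arousal predictor $A$ (human ratings in Model 1, Qwen3 scores in Model 2), standardized concreteness $C$, and the unstandardized ShapeScore $S$ as the outcome. Model 3 adds a Group dummy $G$. With the formula interface's default treatment coding, the reference level should be the alphabetically first label, which is "Qwen3" because uppercase letters sort before lowercase, so $G = 1$ would mark the human rows; the `Group[T.…]` term names in the summary confirm which level is coded 1. The last line is the simple slope of arousal, i.e., how strongly arousal predicts shape at a given concreteness level and group; in Model 3, $\beta_A$ and $\beta_C$ are therefore the reference group's slopes at average levels of the other predictor.

$$
\begin{aligned}
\text{Models 1, 2:}\quad S_i &= \beta_0 + \beta_C C_i + \beta_A A_i + \beta_{CA} C_i A_i + \varepsilon_i \\
\text{Model 3:}\quad S_{ig} &= \beta_0 + \beta_C C_i + \beta_A A_{ig} + \beta_G G_g + \beta_{CA} C_i A_{ig} + \beta_{CG} C_i G_g + \beta_{AG} A_{ig} G_g + \beta_{CAG} C_i A_{ig} G_g + \varepsilon_{ig} \\
\frac{\partial S_{ig}}{\partial A_{ig}} &= \beta_A + \beta_{CA} C_i + \beta_{AG} G_g + \beta_{CAG} C_i G_g
\end{aligned}
$$

The three and seven predictors match the `Df Model` values (3 and 7) in the regression summaries.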

In [None]:
#data processing for regression analyses
from sklearn.preprocessing import StandardScaler

#ensure variables are numeric
df_merge['Concreteness.Mean'] = pd.to_numeric(df_merge['Concreteness.Mean'], errors='coerce')
df_merge['Arousal.Mean'] = pd.to_numeric(df_merge['Arousal.Mean'], errors='coerce')
df_merge['ArousalScore'] = pd.to_numeric(df_merge['ArousalScore'], errors='coerce')
df_merge['ShapeScore'] = pd.to_numeric(df_merge['ShapeScore'], errors='coerce')

#standardize the predictors (mean 0, SD 1); ShapeScore stays on its original scale
df_merge[["Arousal.Mean_scaled", "ArousalScore_scaled", "Concreteness.Mean_scaled"]] = StandardScaler().fit_transform(
    df_merge[["Arousal.Mean", "ArousalScore", "Concreteness.Mean"]])

#rename columns to remove the dots: formula terms are evaluated as Python code,
#so "Arousal.Mean_scaled" would be read as an attribute lookup rather than a column name
df_merge = df_merge.rename(columns={
    "Arousal.Mean_scaled": "Arousal_Mean_scaled",
    "Concreteness.Mean_scaled": "Concreteness_Mean_scaled"})

#print the data out
print(df_merge.head())


         Item  Arousal.Mean  Concreteness.Mean  ShapeScore  ArousalScore  \
0       abase          4.70               2.62    0.030835      0.014363   
1       abate          4.43               2.00    0.019576      0.010386   
2  abbreviate          3.89               2.24   -0.010304     -0.012168   
3    abdicate          5.80               2.00    0.008252     -0.001922   
4      abduct          7.74               4.35    0.010391      0.014605   

   Arousal_Mean_scaled  ArousalScore_scaled  Concreteness_Mean_scaled  
0            -0.630922            -0.348099                 -0.522837  
1            -0.871653            -0.435651                 -1.256411  
2            -1.353113            -0.932260                 -0.972447  
3             0.349831            -0.706668                 -1.256411  
4             2.079522            -0.342767                  1.524074  
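`StandardScaler` converts each predictor to z-scores, z = (x - mean) / SD, using the population SD (ddof = 0) and disregarding missing values when fitting. Values can therefore differ very slightly from R's `scale()`, which divides by the sample SD; with n = 2938 the difference is negligible. A NumPy sketch of the same transformation follows; `zscore_nan` is a hypothetical helper.

```python
import numpy as np

def zscore_nan(x):
    """z-scores with the population SD (ddof=0); NaNs are skipped in the fit and kept as NaN."""
    x = np.asarray(x, dtype=float)
    return (x - np.nanmean(x)) / np.nanstd(x)   # np.nanstd defaults to ddof=0

# First five Arousal.Mean values from the head() output, plus one missing value.
# These z-scores are relative to these five values only, not the full dataset.
arousal_means = np.array([4.70, 4.43, 3.89, 5.80, 7.74, np.nan])
z = zscore_nan(arousal_means)
```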


In [None]:
#Model1: interaction between Arousal (human) and Concreteness (human)
import statsmodels.formula.api as smf

model1 = smf.ols("ShapeScore ~ Concreteness_Mean_scaled * Arousal_Mean_scaled", data=df_merge).fit()

print(model1.summary())

                            OLS Regression Results                            
Dep. Variable:             ShapeScore   R-squared:                       0.127
Model:                            OLS   Adj. R-squared:                  0.126
Method:                 Least Squares   F-statistic:                     141.7
Date:                Thu, 11 Dec 2025   Prob (F-statistic):           1.01e-85
Time:                        16:56:23   Log-Likelihood:                 5074.4
No. Observations:                2938   AIC:                        -1.014e+04
Df Residuals:                    2934   BIC:                        -1.012e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                                                   coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------

In [None]:
#coefficients are changes in ShapeScore per 1 SD of each (standardized) predictor
#main effect of Arousal (human) on Shape (embedding) (b = -0.02, p < .001)
  #more arousing -> spikier
#main effect of Concreteness (human) on Shape (embedding) (b = -0.01, p < .001)
  #more concrete -> spikier
#no significant interaction between Concreteness (human) and Arousal (human) (b = 0.00, p = .385)

#more arousing and more concrete words tend to be spikier
#with human-rated arousal, this pattern is similar across levels of concreteness
#i.e., the association between human-rated arousal and shape meaning appears consistent across levels of concreteness

In [None]:
#Model2: interaction between Arousal (embedding) and Concreteness (human)

model2 = smf.ols("ShapeScore ~ Concreteness_Mean_scaled * ArousalScore_scaled", data=df_merge).fit()

print(model2.summary())

                            OLS Regression Results                            
Dep. Variable:             ShapeScore   R-squared:                       0.051
Model:                            OLS   Adj. R-squared:                  0.050
Method:                 Least Squares   F-statistic:                     52.97
Date:                Thu, 11 Dec 2025   Prob (F-statistic):           2.44e-33
Time:                        17:02:39   Log-Likelihood:                 4953.2
No. Observations:                2938   AIC:                            -9898.
Df Residuals:                    2934   BIC:                            -9874.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                                                   coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------

In [None]:
#main effect of Arousal (embedding) on Shape (embedding) (b = -0.01, p < .001)
  #more arousing -> spikier
#main effect of Concreteness (human) on Shape (embedding) (b = -0.01, p < .001)
  #more concrete -> spikier
#no significant interaction between Concreteness (human) and Arousal (embedding)

#more arousing and more concrete words are spikier
#with embedding arousal scores, this pattern is similar across levels of concreteness
#i.e., the association between embedding arousal and shape meaning appears consistent across levels of concreteness
#model fit is lower than with human-rated arousal (R-squared = .051 vs. .127 in Model 1)

In [None]:
#data processing for Model3

#reshape to long format
df_long = df_merge.melt(
    id_vars=["Item", "Concreteness_Mean_scaled", "ShapeScore"],
    value_vars=["Arousal_Mean_scaled", "ArousalScore_scaled"],
    var_name="Group",
    value_name="Arousal")

#convert group names to match desired output
df_long["Group"] = df_long["Group"].map({
    "Arousal_Mean_scaled": "human",
    "ArousalScore_scaled": "Qwen3"
})

print(df_long.head())

         Item  Concreteness_Mean_scaled  ShapeScore  Group   Arousal
0       abase                 -0.522837    0.030835  human -0.630922
1       abate                 -1.256411    0.019576  human -0.871653
2  abbreviate                 -0.972447   -0.010304  human -1.353113
3    abdicate                 -1.256411    0.008252  human  0.349831
4      abduct                  1.524074    0.010391  human  2.079522


In [None]:
#Model3: interaction between Arousal (aggregated), Concreteness (human), and Group (human vs. Qwen3)

model3 = smf.ols("ShapeScore ~ Concreteness_Mean_scaled * Arousal * Group", data=df_long).fit()

print(model3.summary())

                            OLS Regression Results                            
Dep. Variable:             ShapeScore   R-squared:                       0.089
Model:                            OLS   Adj. R-squared:                  0.088
Method:                 Least Squares   F-statistic:                     81.85
Date:                Thu, 11 Dec 2025   Prob (F-statistic):          6.60e-114
Time:                        17:08:45   Log-Likelihood:                 10025.
No. Observations:                5876   AIC:                        -2.003e+04
Df Residuals:                    5868   BIC:                        -1.998e+04
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                                                      coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------

In [None]:
###interpreting key output

#no effect of Group (b = -0.000, p = .986). this makes sense because ShapeScore is identical in both groups
#(each word appears once per group) and both arousal measures are standardized

#main effect of concreteness (b = -0.008, p < .001): concrete words are spikier
#(with interaction terms in the model, main effects are slopes for the reference Group at average levels of the other predictors)

#main effect of arousal (b = -0.007, p < .001): arousing words are spikier

#no significant interaction between concreteness and arousal (b = -0.001, p = .119)
#i.e., no evidence that the arousal-shape association varies across concreteness levels

#no significant three-way interaction (b = 0.002, p = .086)
#i.e., no evidence that the concreteness x arousal interaction differs between human and Qwen3 arousal
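Because every term in Model 3 is interacted with Group, its point estimates should reproduce the two separate per-group regressions (Models 1 and 2). The non-Group terms are the reference group's coefficients, and each Group interaction is the other-minus-reference difference. In particular, the Arousal x Group term, not the three-way term, tests whether the arousal slope itself differs between human and Qwen3 arousal; the three-way term tests whether the Concreteness x Arousal interaction differs. The NumPy sketch below checks this equivalence on simulated data (all variables are toy stand-ins, and G = 1 marks the non-reference group). Point estimates match, but standard errors do not, and since each word appears twice in `df_long` the rows are not independent; cluster-robust standard errors grouped by Item (statsmodels supports `cov_type="cluster"` in `fit`) would be one way to account for that.

```python
import numpy as np

def ols(X, y):
    """Ordinary least-squares coefficients (X already contains an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def design(c, a):
    """Columns: intercept, concreteness, arousal, concreteness x arousal."""
    return np.column_stack([np.ones_like(c), c, a, c * a])

# Simulated stand-ins: one outcome, two arousal measures for the same words
rng = np.random.default_rng(7)
n = 300
conc = rng.normal(size=n)
arousal_ref = rng.normal(size=n)     # reference group (G = 0)
arousal_oth = rng.normal(size=n)     # other group (G = 1)
shape = -0.2 * conc - 0.3 * arousal_oth + rng.normal(scale=0.5, size=n)

# Separate regressions, one per arousal measure (like Models 1 and 2)
b_ref = ols(design(conc, arousal_ref), shape)
b_oth = ols(design(conc, arousal_oth), shape)

# Stacked long-format regression with full Group interactions (like Model 3)
c = np.concatenate([conc, conc])
a = np.concatenate([arousal_ref, arousal_oth])
g = np.concatenate([np.zeros(n), np.ones(n)])
y = np.concatenate([shape, shape])
X = np.column_stack([np.ones(2 * n), c, a, g, c * a, c * g, a * g, c * a * g])
b_long = ols(X, y)

base_terms = b_long[[0, 1, 2, 4]]    # intercept, C, A, C x A for the reference group
group_diffs = b_long[[3, 5, 6, 7]]   # G, C x G, A x G, C x A x G
```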