# Concreteness Experiments

This notebook attempts to correlate N/V percentage with the concreteness norms from Yao et al. (2017).

Yao, Zhao, et al. "Norms of valence, arousal, concreteness, familiarity, imageability, and context availability for 1,100 Chinese words." Behavior Research Methods 49.4 (2017): 1374-1385.
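
As a rough sketch of the comparison this notebook is working toward, the snippet below joins a per-character concreteness table with a per-character N/V table and computes a rank correlation. The N/V table, its `nv_percentage` column name, and its numbers are hypothetical placeholders and not real measurements; only the concreteness values are copied from the per-character table later in this notebook.

```python
import pandas as pd

# Per-character mean concreteness (values copied from the table later in this notebook).
concreteness = pd.DataFrame({
    'character': ['爱', '情', '安', '然'],
    'concreteness': [6.032857, 5.79125, 5.783333, 6.3],
})

# HYPOTHETICAL N/V table: the column name and the numbers are placeholders, not real data.
nv = pd.DataFrame({
    'character': ['爱', '情', '安', '然'],
    'nv_percentage': [0.2, 0.5, 0.4, 0.1],
})

# Keep only characters that appear in both tables.
merged = concreteness.merge(nv, on='character', how='inner')

# Spearman rank correlation, computed as the Pearson correlation of the ranks.
rho = merged['concreteness'].rank().corr(merged['nv_percentage'].rank())
print(f'Spearman rho over {len(merged)} characters: {rho:.3f}')
```

A rank correlation is used here on the assumption that the rating scale is closer to ordinal than interval; plain `Series.corr` (Pearson) on the raw columns would be the alternative. Characters whose mean rests on only one or two compounds are noisy, so filtering on the compound count before correlating may be worth trying.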

In [1]:
from collections import defaultdict
import numpy as np
import pandas as pd

In [2]:
# Extracted from Yao et al. (2017) PDF using tabula, with manual corrections
yao2017 = pd.read_csv('../data/yao2017_norms.csv')
yao2017.head()

| | Chinese words | English translations | VAL_M | VAL_SD | ARO_M | ARO_SD | CON_M | CON_SD | FAM_M | FAM_SD | IMA_M | IMA_SD | AVA_M | AVA_SD | RT_M | RT_SD | ACC_M | ACC_SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 爱慕 | admire | 4.89 | 1.65 | 3.39 | 0.79 | 4.5 | 1.97 | 5.07 | 0.76 | 4.01 | 1.55 | 5.39 | 1.24 | 543.88 | 33.32 | 0.935 | 0.032 |
| 1 | 爱情 | love | 6.97 | 1.45 | 7.43 | 1.45 | 5.26 | 0.84 | 5.0 | 0.68 | 4.92 | 1.99 | 5.43 | 1.36 | 524.32 | 38.03 | 0.91 | 0.037 |
| 2 | 安然 | comfortable | 6.81 | 0.75 | 6.36 | 1.08 | 7.58 | 1.33 | 7.69 | 0.59 | 6.72 | 1.44 | 7.05 | 0.45 | 520.45 | 53.82 | 0.943 | 0.046 |
| 3 | 安神 | soothing | 4.51 | 2.2 | 3.41 | 1.74 | 5.32 | 0.54 | 5.44 | 1.75 | 4.52 | 0.39 | 5.05 | 1.19 | 550.95 | 40.74 | 0.881 | 0.035 |
| 4 | 安详 | serene | 6.86 | 1.37 | 6.11 | 0.45 | 6.83 | 0.53 | 6.83 | 0.81 | 7.05 | 1.78 | 6.35 | 1.12 | 505.79 | 61.24 | 0.913 | 0.016 |


## Break down by character

In [3]:
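# Map each character to the concreteness ratings (CON_M) of the words it occurs in,
# one entry per occurrence of the character
D = defaultdict(list)
for _, row in yao2017.iterrows():
  for ch in row['Chinese words']:
    D[ch].append(row['CON_M'])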

In [4]:
# One row per character: how many ratings were collected and their mean concreteness
concreteness_df = pd.DataFrame([
  {'character': ch, 'compounds': len(ratings), 'concreteness': np.mean(ratings)}
  for ch, ratings in D.items()
])
concreteness_df.head()

| | character | compounds | concreteness |
|---|---|---|---|
| 0 | 爱 | 7 | 6.032857 |
| 1 | 慕 | 1 | 4.5 |
| 2 | 情 | 16 | 5.79125 |
| 3 | 安 | 6 | 5.783333 |
| 4 | 然 | 2 | 6.3 |
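
The same per-character breakdown can probably be expressed without explicit loops. The sketch below is an alternative formulation, not the code that produced the table above: it splits each word into characters, explodes to one row per (word, character) pair, and aggregates with `groupby`. It is self-contained, so it uses only the five words shown in the `head()` output earlier; the name `per_char` is introduced here for illustration.

```python
import pandas as pd

# The five words and CON_M values from the head() output above.
words = pd.DataFrame({
    'Chinese words': ['爱慕', '爱情', '安然', '安神', '安详'],
    'CON_M': [4.5, 5.26, 7.58, 5.32, 6.83],
})

per_char = (
    words
    # Turn each word into a list of its characters.
    .assign(character=words['Chinese words'].apply(list))
    # One row per (word, character) pair; repeated characters stay repeated.
    .explode('character')
    # sort=False keeps characters in order of first appearance.
    .groupby('character', sort=False)['CON_M']
    .agg(compounds='count', concreteness='mean')
    .reset_index()
)
print(per_char)
```

One difference to keep in mind: `'count'` skips missing ratings, whereas `len(ratings)` in the loop version counts them, so the two only agree when `CON_M` has no missing values.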
