# Speech Perception
Based on Ch. 5 of Johnson, Keith. (2012). _Acoustic and Auditory Phonetics_. 3rd Ed. [Wiley-Blackwell](https://www.wiley.com/en-us/Acoustic+and+Auditory+Phonetics%2C+3rd+Edition-p-9781444343083).

---

## Programming Environment

```python
import numpy as np
import pandas as pd
```

---

Speech perception is the active, intentional perception of speech sounds, as opposed to the meaning of speech.

Is the detection of mispronunciations and speech errors more likely:
* word-initially or word-medially?
* in vowels or in consonants?
* in nouns and verbs or in grammatical (function) words?

Speech perception is shaped by general properties of the auditory system, which determine:
* what can and cannot be heard
* what cues will be recoverable in particular segmental contexts
* how adjacent sounds will influence each other

The cochlea's nonlinear frequency scale probably underlies the fact that no language distinguishes fricatives on the basis of frequency components above 6000 Hz.
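The compression of the auditory frequency scale at high frequencies can be illustrated numerically. This is a minimal sketch that assumes the basic form of Traunmüller's (1990) Hz-to-Bark formula, z = 26.81f/(1960 + f) - 0.53, without its low- and high-end corrections; it is an illustrative choice, not necessarily the auditory scale Johnson uses. It compares how many Bark two equally wide (4000 Hz) bands occupy.

```python
def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Assumption: basic Traunmüller (1990) formula, without its end corrections.
    Works on plain floats or NumPy arrays.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Two bands of equal width (4000 Hz) at different places on the frequency axis
low_band = hz_to_bark(5000) - hz_to_bark(1000)
high_band = hz_to_bark(10000) - hz_to_bark(6000)
print(f"1000-5000 Hz:  {low_band:.1f} Bark")   # → 10.2 Bark
print(f"6000-10000 Hz: {high_band:.1f} Bark")  # → 2.2 Bark
```

On this scale the 4000 Hz above 6000 Hz span only about a fifth as many Bark as the 1000-5000 Hz range, so spectral differences up there are squeezed into few auditory channels.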

What affects what we hear?
1. The nature of the auditory system constrains our perception of speech sounds
    1. The nonlinearity of the cochlea's frequency scale: no language distinguishes fricatives on the basis of frequency components above 6 kHz
    2. VOT: aspirated vs. unaspirated stops
    3. Compensation for Coarticulation
2. Phonetic knowledge applied to speech sounds affects their perception
    1. Categorical Perception (Johnson and Ralston 1994; Remez et al. 1981: sine-wave analogs; Best 1995; Flege 1995)
        * Perceptual Magnet Effect (Kuhl et al. 1992)
    2. Coherence
        * Duplex Perception
        * McGurk Effect
3. Lexical (word and morpheme) knowledge applied to speech sounds affects their perception
    * Slips of the Ear (Bond 1999)
    * Word Magnets (Ganong 1980)
    * Phoneme Restoration (Warren 1970; Samuel 1991)
    * Compensation for coarticulation triggered by lexically restored phonemes (Elman and McClelland 1988)
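Categorical perception is often modeled by predicting discrimination from identification: if listeners can only compare category labels, two stimuli are discriminable only to the extent that they tend to receive different labels. A minimal sketch under stated assumptions: the logistic identification function and its parameters (a hypothetical /b/-/p/ boundary at 25 ms VOT with a 4 ms slope) are invented for illustration, and the closed form 0.5[1 + (p_A - p_B)^2] is the covert-labeling prediction for ABX trials, averaged over X = A and X = B.

```python
import numpy as np

def identify_p(vot_ms, boundary_ms=25.0, slope_ms=4.0):
    """Hypothetical logistic identification function:
    proportion of /p/ responses to a stimulus with the given VOT (ms)."""
    vot = np.asarray(vot_ms, dtype=float)
    return 1.0 / (1.0 + np.exp(-(vot - boundary_ms) / slope_ms))

def predicted_abx(p_a, p_b):
    """ABX proportion correct predicted from labeling alone
    (covert-labeling model, averaged over X = A and X = B trials)."""
    return 0.5 * (1.0 + (p_a - p_b) ** 2)

vot = np.arange(0, 70, 10)            # hypothetical 0-60 ms continuum, 10 ms steps
p = identify_p(vot)
disc = predicted_abx(p[:-1], p[1:])   # one-step pairs: (0,10), (10,20), ...
for lo, hi, d in zip(vot[:-1], vot[1:], disc):
    print(f"{int(lo):2d}-{int(hi):2d} ms: predicted ABX = {d:.2f}")
```

Predicted discrimination stays near chance (0.5) for pairs within a category and peaks for the pair that straddles the boundary, which is the signature pattern of categorical perception.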

---

## Measuring Perceptual Similarity via Multidimensional Scaling

Data from Miller, George & Patricia Nicely. (1955). "An analysis of perceptual confusions among some English consonants."

Mathematical approach from Shepard, Roger. (1972). "Psychological Representation of Speech Sounds".

```python
# Miller & Nicely (1955) confusion counts
# rows: consonant presented; columns: response (f, v, th, dh, s, z, d, other)
cm = np.array([
  [199,  0, 46,  1,  4,  0,  0,14],
  [  3,177,  1, 29,  0,  4,  0,22],
  [ 85,  2,114,  0, 10,  0,  0,21],
  [  0, 64,  0,105,  0, 18,  0,17],
  [  5,  0, 38,  0,170,  0,  0,15],
  [  0,  4,  0, 22,  0,132, 17,49],
  [  0,  0,  0,  4,  0,  8,189,59],
])
cm
```

```
array([[199,   0,  46,   1,   4,   0,   0,  14],
       [  3, 177,   1,  29,   0,   4,   0,  22],
       [ 85,   2, 114,   0,  10,   0,   0,  21],
       [  0,  64,   0, 105,   0,  18,   0,  17],
       [  5,   0,  38,   0, 170,   0,   0,  15],
       [  0,   4,   0,  22,   0, 132,  17,  49],
       [  0,   0,   0,   4,   0,   8, 189,  59]])
```

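```python
# labels for the presented consonants ('d' is a stop, not a fricative, despite the variable name)
fricatives=['f','v','th','dh','s','z','d']
cmdf = pd.DataFrame(
  data   =cm,
  index  =fricatives,
  columns=fricatives+['other'],
)
cmdf['total']=cmdf.sum(axis=1)   # number of presentations of each consonant
cmdf
```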

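| | f | v | th | dh | s | z | d | other | total |
|---|---|---|---|---|---|---|---|---|---|
| f | 199 | 0 | 46 | 1 | 4 | 0 | 0 | 14 | 264 |
| v | 3 | 177 | 1 | 29 | 0 | 4 | 0 | 22 | 236 |
| th | 85 | 2 | 114 | 0 | 10 | 0 | 0 | 21 | 232 |
| dh | 0 | 64 | 0 | 105 | 0 | 18 | 0 | 17 | 204 |
| s | 5 | 0 | 38 | 0 | 170 | 0 | 0 | 15 | 228 |
| z | 0 | 4 | 0 | 22 | 0 | 132 | 17 | 49 | 224 |
| d | 0 | 0 | 0 | 4 | 0 | 8 | 189 | 59 | 260 |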


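```python
# Optional (disabled): print the 2x2 row-proportion submatrix for every pair of consonants
# for i in range(7):
#   for j in range(7):
#     if i!=j:
#       print()
#       c=cmdf.iloc[[i,j],[i,j]].astype(float)  # float copy avoids assigning floats into int columns
#       c.iloc[0]=c.iloc[0]/cmdf.iloc[i,-1]
#       c.iloc[1]=c.iloc[1]/cmdf.iloc[j,-1]
#       print(c.round(2))
```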

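Let $p_{ij}$ be the proportion of presentations of category $i$ that were identified as category $j$ (the count in row $i$, column $j$ of the confusion matrix divided by the row total). The Shepard similarity between category $i$ and category $j$ is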

$
\begin{aligned}
\text{Shepard similarity}\,\,\,
S_{ij}=\frac{p_{ij}+p_{ji}}{p_{ii}+p_{jj}}
\end{aligned}
$

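Johnson's approximation replaces the denominator with 2. It equals the Shepard similarity scaled by $(p_{ii}+p_{jj})/2$, so the two agree closely only when both categories are nearly always identified correctly:

$
\begin{aligned}
\text{Johnson approximation of Shepard similarity}\,\,\,
S_{ij}=\frac{p_{ij}+p_{ji}}{2}
\end{aligned}
$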

Perceptual distance $d_{ij}$ follows from Shepard's law, which holds that similarity decays exponentially with perceptual distance:

$
\begin{aligned}
\text{perceptual distance}\,\,\,
d_{ij}=-\ln(S_{ij})
\iff
e^{-d_{ij}}=S_{ij}
\end{aligned}
$
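As a worked check, the formulas can be applied by hand to the pair /f/ and /θ/ (`f` and `th` in the tables), whose row totals are 264 and 232. The Shepard values agree with the `f`/`th` cells of the similarity and distance tables (0.434 and 0.834); the Johnson approximation is shown for comparison.

$
\begin{aligned}
S_{f\theta}&=\frac{46/264+85/232}{199/264+114/232}\approx\frac{0.54062}{1.24517}\approx 0.43417\\
S^{\text{Johnson}}_{f\theta}&=\frac{0.54062}{2}\approx 0.27031\\
d_{f\theta}&=-\ln(0.43417)\approx 0.834\\
d^{\text{Johnson}}_{f\theta}&=-\ln(0.27031)\approx 1.308
\end{aligned}
$

The Johnson version gives a lower similarity (larger distance) for this pair because /θ/ is identified correctly only about half the time ($p_{\theta\theta}\approx 0.49$), so $p_{ff}+p_{\theta\theta}$ is well below 2.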

```python
# proportions: divide each row by its total ('other' responses stay in the denominator)
ps=cmdf.drop(columns='total').div(cmdf.total,axis=0)
ps.round(2)
```

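| | f | v | th | dh | s | z | d | other |
|---|---|---|---|---|---|---|---|---|
| f | 0.75 | 0.0 | 0.17 | 0.0 | 0.02 | 0.0 | 0.0 | 0.05 |
| v | 0.01 | 0.75 | 0.0 | 0.12 | 0.0 | 0.02 | 0.0 | 0.09 |
| th | 0.37 | 0.01 | 0.49 | 0.0 | 0.04 | 0.0 | 0.0 | 0.09 |
| dh | 0.0 | 0.31 | 0.0 | 0.51 | 0.0 | 0.09 | 0.0 | 0.08 |
| s | 0.02 | 0.0 | 0.17 | 0.0 | 0.75 | 0.0 | 0.0 | 0.07 |
| z | 0.0 | 0.02 | 0.0 | 0.1 | 0.0 | 0.59 | 0.08 | 0.22 |
| d | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.03 | 0.73 | 0.23 |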


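```python
# Shepard similarities S_ij = (p_ij + p_ji) / (p_ii + p_jj)
Ss=np.array([
  (ps.iloc[i,j]+ps.iloc[j,i])/(ps.iloc[i,i]+ps.iloc[j,j])
  for i in range(7)
  for j in range(7)
]).reshape(7,7)
# floor zero similarities at 1e-10 so that -ln(S) stays finite in the next step
Ssdf=pd.DataFrame(data=Ss,index=fricatives,columns=fricatives).clip(lower=1e-10)
Ssdf.round(3)
```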

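| | f | v | th | dh | s | z | d |
|---|---|---|---|---|---|---|---|
| f | 1.0 | 0.008 | 0.434 | 0.003 | 0.025 | 0.0 | 0.0 |
| v | 0.008 | 1.0 | 0.01 | 0.345 | 0.0 | 0.026 | 0.0 |
| th | 0.434 | 0.01 | 1.0 | 0.0 | 0.17 | 0.0 | 0.0 |
| dh | 0.003 | 0.345 | 0.0 | 1.0 | 0.0 | 0.169 | 0.012 |
| s | 0.025 | 0.0 | 0.17 | 0.0 | 1.0 | 0.0 | 0.0 |
| z | 0.0 | 0.026 | 0.0 | 0.169 | 0.0 | 1.0 | 0.081 |
| d | 0.0 | 0.0 | 0.0 | 0.012 | 0.0 | 0.081 | 1.0 |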
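The list comprehension above visits every pair one at a time. The same matrix can be computed in one vectorized step: with $P$ the square block of proportions, the numerator is $P + P^\top$ and the denominator is the outer sum of the diagonal $p_{ii} + p_{jj}$. A minimal sketch; the function name `shepard_similarity` is hypothetical, the counts are copied from the cell above so the block runs on its own, and the `1e-10` floor used in the notebook is left out.

```python
import numpy as np

# Confusion counts (rows: presented; columns: f, v, th, dh, s, z, d, other)
counts = np.array([
  [199,  0, 46,  1,  4,  0,  0, 14],
  [  3,177,  1, 29,  0,  4,  0, 22],
  [ 85,  2,114,  0, 10,  0,  0, 21],
  [  0, 64,  0,105,  0, 18,  0, 17],
  [  5,  0, 38,  0,170,  0,  0, 15],
  [  0,  4,  0, 22,  0,132, 17, 49],
  [  0,  0,  0,  4,  0,  8,189, 59],
])

def shepard_similarity(counts, n_categories):
    """Shepard similarity matrix from confusion counts.

    Extra trailing columns (e.g. 'other') count toward the row totals
    but are dropped from the square similarity matrix.
    """
    P = counts / counts.sum(axis=1, keepdims=True)  # row proportions p_ij
    P = P[:, :n_categories]                         # square n x n block
    diag = np.diag(P)                               # p_ii
    return (P + P.T) / (diag[:, None] + diag[None, :])

S = shepard_similarity(counts, 7)
print(np.round(S, 3))
```

The `f`/`th` entry, `S[0, 2]`, comes out at about 0.434, matching the table above; the result is symmetric with ones on the diagonal by construction.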


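```python
# Shepard's law: d_ij = -ln(S_ij); pairs that were never confused get
# -ln(1e-10) ≈ 23.026, a stand-in for an infinite distance
ds=-np.log(Ssdf)
ds.clip(lower=1e-10).round(3)   # clip only tidies the -0.0 diagonal for display
```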

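| | f | v | th | dh | s | z | d |
|---|---|---|---|---|---|---|---|
| f | 0.0 | 4.773 | 0.834 | 5.814 | 3.7 | 23.026 | 23.026 |
| v | 4.773 | 0.0 | 4.57 | 1.064 | 23.026 | 3.65 | 23.026 |
| th | 0.834 | 4.57 | 0.0 | 23.026 | 1.774 | 23.026 | 23.026 |
| dh | 5.814 | 1.064 | 23.026 | 0.0 | 23.026 | 1.779 | 4.391 |
| s | 3.7 | 23.026 | 1.774 | 23.026 | 0.0 | 23.026 | 23.026 |
| z | 23.026 | 3.65 | 23.026 | 1.779 | 23.026 | 0.0 | 2.513 |
| d | 23.026 | 23.026 | 23.026 | 4.391 | 23.026 | 2.513 | 0.0 |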
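The section title refers to multidimensional scaling, but the notebook stops at the distance matrix. One way to take the last step is classical (Torgerson) metric MDS, sketched below with NumPy alone: double-center the squared distances and keep the top eigenvectors scaled by the square roots of their eigenvalues. This is a swapped-in technique that may differ from the scaling procedure Shepard (1972) or Johnson used, and the function name `classical_mds` is hypothetical. The sanity check uses points whose distances are exactly Euclidean, so the embedding should reproduce them.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed an n x n distance matrix in k dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)          # eigh: ascending order for symmetric B
    order = np.argsort(eigvals)[::-1][:k]         # indices of the k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0, None))  # negative eigenvalues -> 0
    return eigvecs[:, order] * scale              # n x k coordinates

# Sanity check: corners of a 3 x 4 rectangle
pts = np.array([[0, 0], [3, 0], [0, 4], [3, 4]], dtype=float)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = classical_mds(D)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.round(D_hat, 3))
```

In the notebook this could be applied as `coords = classical_mds(ds.to_numpy())` to place the consonants on a 2-D map. Because the 23.026 entries are artifacts of the `1e-10` floor rather than measured distances, they will dominate such a solution, so the resulting map should be read with caution.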


---

## Resources

Casey Connor
* [[Y](https://www.youtube.com/watch?v=qdTloDvvy10)] Casey Connor. (14 Jan 2022). "Part 7/5 of Psychoacoustics / Audio Illusions". YouTube.
* [[Y](https://www.youtube.com/watch?v=wLcqQOJmnio)] Casey Connor. (14 Jan 2022). "Part 6/5 of Psychoacoustics / Audio Illusions". YouTube.
* [[Y](https://www.youtube.com/watch?v=YQNsCg4z6L8)] Casey Connor. (18 Apr 2020). "42 Audio Illusions & Phenomena! - Part 5/5 of Psychoacoustics". YouTube.
* [[Y](https://www.youtube.com/watch?v=WMHyYCk7OqE)] Casey Connor. (12 Apr 2020). "42 Audio Illusions & Phenomena! - Part 4/5 of Psychoacoustics". YouTube.
* [[Y](https://www.youtube.com/watch?v=TVsMiSrlSSc)] Casey Connor. (08 Apr 2020). "42 Audio Illusions & Phenomena! - Part 3/5 of Psychoacoustics". YouTube.
* [[Y](https://www.youtube.com/watch?v=fBMli2YAR8k)] Casey Connor. (28 Mar 2020). "42 Audio Illusions & Phenomena! - Part 2/5 of Psychoacoustics". YouTube.
* [[Y](https://www.youtube.com/watch?v=OiW8gzBGz1A)] Casey Connor. (26 Mar 2020). "42 Audio Illusions & Phenomena! - Part 1/5 of Psychoacoustics". YouTube.

The Ling Space
* [[Y](https://www.youtube.com/watch?v=Czvgf-Xc-A4)] The Ling Space. (24 Jun 2015). "Phonological Illusions". YouTube.

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=067d412ef7ad42401bb89bb826f5aea31a26d40b

---

## Figures

* [[W](https://en.wikipedia.org/wiki/Diana_Deutsch)] Deutsch, Diana (1938-) [[Illusions](http://dianadeutsch.ucsd.edu/psychology/pages.php?i=201)]
* [[W](https://en.wikipedia.org/wiki/Harry_McGurk)] McGurk, Harry (1936-1998)
* [[W](https://en.wikipedia.org/wiki/Roger_Shepard)] Shepard, Roger (1929-2022)

---

## Terms

* [[W](https://en.wikipedia.org/wiki/Allophone)] Allophone
* [[W](https://en.wikipedia.org/wiki/Auditory_cortex)] Auditory Cortex
* [[W](https://en.wikipedia.org/wiki/Auditory_illusion)] Auditory Illusion
* [[W](https://en.wikipedia.org/wiki/Hearing)] Auditory Perception (Hearing)
* [[W](https://en.wikipedia.org/wiki/Broca%27s_area)] Broca's Area
* [[W](https://en.wikipedia.org/wiki/Categorical_perception)] Categorical Perception
* [[W](https://en.wikipedia.org/wiki/Coarticulation)] Coarticulation
* [[W](https://en.wikipedia.org/wiki/Common_coding_theory)] Common Coding Theory
* [[W](https://en.wikipedia.org/wiki/Confusion_matrix)] Confusion Matrix
* [[W](https://en.wikipedia.org/wiki/Dichotic_listening)] Dichotic Listening
* [[W](https://en.wikipedia.org/wiki/Duplex_perception)] Duplex Perception
* [[W](https://en.wikipedia.org/wiki/Exemplar_theory)] Exemplar Theory
* [W] Ganong Effect
* [[W](https://en.wikipedia.org/wiki/Haskins_Laboratories)] Haskins Laboratories
* [[W](https://en.wikipedia.org/wiki/Hierarchical_clustering)] Hierarchical Cluster Analysis
* [[W](https://en.wikipedia.org/wiki/Levenshtein_distance)] Levenshtein Distance
* [[W](https://en.wikipedia.org/wiki/McGurk_effect)] McGurk Effect
  * https://www.youtube.com/watch?v=2k8fHR9jKVM
  * https://www.youtube.com/watch?v=kzo45hWXRWU
* [[W](https://en.wikipedia.org/wiki/Motor_theory_of_speech_perception)] Motor Theory of Speech Perception
* [[W](https://en.wikipedia.org/wiki/Multidimensional_scaling)] Multidimensional Scaling (MDS)
* [[W](https://en.wikipedia.org/wiki/Multisensory_integration)] Multisensory Integration
* [W] Percept
* [[W](https://en.wikipedia.org/wiki/Perception)] Perception
* [W] Phantom Word Illusion
  * [Diana Deutsch](http://deutsch.ucsd.edu/psychology/pages.php?i=211)
  * https://www.youtube.com/watch?v=muCPjK4nGY4
* [[W](https://en.wikipedia.org/wiki/Phoneme)] Phoneme
* [[W](https://en.wikipedia.org/wiki/Phonemic_restoration_effect)] Phoneme Restoration Effect
  * https://www.youtube.com/watch?v=kbzL9PxtFf0
  * https://www.youtube.com/watch?v=ZyvyGMkzNQc
* [[W](https://en.wikipedia.org/wiki/Precedence_effect)] Precedence Effect
* [[W](https://en.wikipedia.org/wiki/Music_psychology)] Psychology of Music
* [[W](https://en.wikipedia.org/wiki/Linguistic_relativity)] Sapir-Whorf Hypothesis
* [[W](https://en.wikipedia.org/wiki/Shepard_tone)] Shepard Tone
  * https://vimeo.com/34749754
  * https://www.youtube.com/watch?v=BzNzgsAE4F0
  * https://www.youtube.com/watch?v=kzo45hWXRWU
* [[W](https://en.wikipedia.org/wiki/Sensory_cue)] Sensory Cue
* [[W](https://en.wikipedia.org/wiki/Signal-to-noise_ratio)] Signal-to-Noise Ratio (SNR)
* [[W](https://en.wikipedia.org/wiki/Sound_localization)] Sound Localization
* [[W](https://en.wikipedia.org/wiki/Speech-to-song_illusion)] Speech-to-Song Illusion
  * [Diana Deutsch](http://dianadeutsch.ucsd.edu/psychology/pages.php?i=212)
  * https://www.youtube.com/watch?v=kbzL9PxtFf0
* [[W](https://en.wikipedia.org/wiki/Speech_perception)] Speech Perception
* [[W](https://en.wikipedia.org/wiki/Speech_segmentation)] Speech Segmentation
* [[W](https://en.wikipedia.org/wiki/Speech_shadowing)] Speech Shadowing
* [[W](https://en.wikipedia.org/wiki/Speech_synthesis)] Speech Synthesis
* [[W](https://en.wikipedia.org/wiki/Stimulus_(physiology))] Stimulus
* [[W](https://en.wikipedia.org/wiki/Triangulation)] Triangulation
* [[W](https://en.wikipedia.org/wiki/Tritone)] Tritone
* [[W](https://en.wikipedia.org/wiki/Tritone_paradox)] Tritone Paradox
  * [Diana Deutsch](http://dianadeutsch.ucsd.edu/psychology/pages.php?i=206)
  * https://www.youtube.com/watch?v=kbzL9PxtFf0
  * https://www.youtube.com/watch?v=kzo45hWXRWU
* [[W](https://en.wikipedia.org/wiki/Voice_onset_time)] Voice Onset Time (VOT)
* [[W](https://en.wikipedia.org/wiki/Wernicke%27s_area)] Wernicke's Area

---

## Bibliography

Johnson, Keith. (2012). _Acoustic and Auditory Phonetics_. 3rd Ed. [Wiley-Blackwell](https://www.wiley.com/en-us/Acoustic+and+Auditory+Phonetics%2C+3rd+Edition-p-9781444343083).

---