## Sokuonbin (geminate sound change)

## ToDo

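Phonetic minimality can be represented explicitly.

* Can the pattern be reproduced without rules or constraints? The rate should rise with duration.
    -  kuwu + ta -> ku?ta
    -  kuwu + ta -> koota
* Is there an interaction between u_trinsic and duration? -> psycholinguistic prediction
    - Control the intrinsic time of /u/
    - Is it held as knowledge? The interaction is the crucial point.
* What kind of tableau can be constructed? -> candidates in phonology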

## Future Studies

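- Manipulate coarticulation and related factors during generation
    - Building in economy should also yield forms such as koota.
- Control /sd/ (first, until perception becomes accurate)
- Noise is added to the input (production)
- Set the parameter directory to chmod 777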

## Materials and Methods

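* Create multiple models
    - Data: the speech that is produced and perceived
    - Model: the knowledge that production and perception depend on
* Run one cycle
* Run 100 times to sample
* Repeat 50 times to collect statistics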

In [1]:
import sys
sys.path.append('..')

In [2]:
%load_ext autoreload
%autoreload 2

from hydra.experimental import initialize, compose
from src.agent import Agent
# https://hydra.cc/docs/next/experimental/compose_api

with initialize(config_path="../hyparam"):
    config = compose(config_name="config.yml")
agent_sample = Agent(config)
agent_sample.poisson_params

array([9, 9, 9, 3, 3, 3, 9], dtype=object)
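The notebook does not say how `poisson_params` is used inside `Agent`. As a rough illustration only, the sketch below assumes each value is the mean of a Poisson distribution over the duration (in frames) of one segment, and draws one duration per segment. The helper `sample_poisson` is hypothetical and is not part of `src.agent`.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

# Values copied from the output above. Reading them as per-segment
# mean durations is an assumption, not something the notebook states.
poisson_params = [9, 9, 9, 3, 3, 3, 9]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
durations = [sample_poisson(lam, rng) for lam in poisson_params]
```

Under this reading, lowering one mean (for example the one for /u/) makes short or zero-length realizations of that segment more likely.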

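### Experiment: manipulating the duration of u

- Is the effect a change in duration at production time?
- Or is it not learned as knowledge? If it is learned, what problems arise?
- Examine the proportions of koota, ka?ta, and kawuta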
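The proportions in the last item can be computed directly from a `Counter` of perceived forms. The following is a minimal sketch; the helper name `proportions` is hypothetical, and the counts are copied from the first `Counter` printed for trial 0 in the run below. The code labels the geminate form `kaQta`, which corresponds to ka?ta in the list above.

```python
from collections import Counter

def proportions(counter, forms):
    """Return the share of each requested form among all perceived forms."""
    total = sum(counter.values())
    if total == 0:
        return {form: 0.0 for form in forms}
    # Counter returns 0 for forms that never occurred.
    return {form: counter[form] / total for form in forms}

# One condition: 20 perceptions of the produced word "kawuta".
perceived = Counter({'kauta': 8, 'kawuta': 6, 'kaQta': 5, 'tauta': 1})
shares = proportions(perceived, ["koota", "kaQta", "kawuta"])
```

Forms outside the requested list (here `kauta` and `tauta`) still count toward the denominator, so the shares need not sum to 1.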

In [3]:
from itertools import product
import numpy as np
from collections import Counter
from joblib import Parallel, delayed

trial = list(range(20))
# u_durations = [0, 2, 4]
u_durations = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
switches = ["production", "update"]
n_iter = list(range(20))

# Speed-up
# Baseline timing: 808.2551259994507[sec]
## Timing method: https://qiita.com/fantm21/items/3dc7fbf4e935311488bc
# Parallelized: 437.15462827682495[sec]
# Running assert and umap at init time: 12.637521743774414[sec]

import time

def perception(prod, perc, symbol):
    # prod produces the word given by symbol; perc perceives the resulting observations
    phoneme, obs, states = prod.production(symbol)
    obs = np.array(obs).astype('double')
    phoneme_hat, obs, states_hat = perc.perception(obs)
    if phoneme_hat == "kota":  # TODO: the diphthong form is returned as "kota", so relabel it here. The proper fix belongs in the perception method
        phoneme_hat = "koota"
    return phoneme_hat

results_list = []
for t in trial:
    # start: one iteration takes X s; use this to estimate the finishing time
    start = time.time()
    # start
    if t % 5 == 0:
        print(f"trial: {t}")
    for u_duration, switch in product(u_durations, switches):
        # Control u_duration
        with initialize(config_path="../hyparam"):
            config_u_reduced = compose(config_name="config.yml", overrides=[f"u_duration={u_duration}"])
            
        a = Agent(config_u_reduced)
        if switch == "production":  # shorten u only at production time
            with initialize(config_path="../hyparam"):
                config = compose(config_name="config.yml")
            b = Agent(config)
        elif switch == "update":
            b = Agent(config_u_reduced)
        else:
            raise ValueError

        perceptions = Parallel(n_jobs=-1)(delayed(perception)(prod=a, perc=b, symbol="kawuta") for n in n_iter)

        settings =  {
            "trial": t,
            "u_duration": u_duration,
            "intrinsic": switch,
            "n_kawuta": perceptions.count("kawuta"),
            "n_kawta": perceptions.count("kawta"),
            "n_kaQta": perceptions.count("kaQta"),
            "n_kauta": perceptions.count("kauta"),
            "n_koota": perceptions.count("koota"),
            "counter": Counter(perceptions),
        }
        results_list.append(settings)
        print(Counter(perceptions))
        
    # end
    elapsed_time = time.time() - start
    print(f"elapsed_time:{elapsed_time}[sec]")
    # end

trial: 0
Counter({'kauta': 8, 'kawuta': 6, 'kaQta': 5, 'tauta': 1})
Counter({'kawuta': 19, 'kawutua': 1})
Counter({'kawuta': 12, 'kauta': 6, 'kawata': 2})
Counter({'kawuta': 18, 'kauwuta': 1, 'kawutua': 1})
Counter({'kawuta': 16, 'kauta': 3, 'kawata': 1})
Counter({'kawuta': 20})
Counter({'kawuta': 18, 'kawutaw': 1, 'kawuka': 1})
Counter({'kawuta': 20})
Counter({'kawuta': 16, 'kauta': 3, 'kawutau': 1})
Counter({'kawuta': 19, 'kauta': 1})
Counter({'kawuta': 18, 'kauta': 1, 'kawuka': 1})
Counter({'kawuta': 18, 'kauta': 2})
Counter({'kawuta': 20})
Counter({'kawuta': 20})
Counter({'kawuta': 20})
Counter({'kawuta': 20})
Counter({'kawuta': 19, 'tawuta': 1})
Counter({'kawuta': 19, 'kauta': 1})
Counter({'kawuta': 19, 'kautau': 1})
Counter({'kawuta': 16, 'kauta': 2, 'kawuka': 1, 'tawuta': 1})
elapsed_time:74.5317440032959[sec]
Counter({'kawuta': 11, 'kauta': 5, 'kaQta': 3, 'tauta': 1})
Counter({'kawuta': 16, 'kaQta': 3, 'kauwuta': 1})
Counter({'kawuta': 14, 'kawata': 4, 'kauta': 2})
Counter({'ka

Counter({'kawuta': 19, 'kaQta': 1})
Counter({'kawuta': 14, 'kauta': 3, 'kawata': 1, 'kawuka': 1, 'tawuta': 1})
Counter({'kawuta': 19, 'kawuka': 1})
Counter({'kawuta': 16, 'kauta': 2, 'kaowuta': 1, 'kawuka': 1})
Counter({'kawuta': 20})
Counter({'kawuta': 19, 'kauta': 1})
Counter({'kawuta': 20})
Counter({'kawuta': 20})
Counter({'kawuta': 20})
Counter({'kawuta': 19, 'kauta': 1})
Counter({'kawuta': 20})
Counter({'kawuta': 19, 'kauta': 1})
Counter({'kawuta': 19, 'kawutao': 1})
Counter({'kawuta': 18, 'kawutao': 1, 'kauta': 1})
Counter({'kawuta': 19, 'kawukau': 1})
Counter({'kawuta': 18, 'kauta': 1, 'kawuka': 1})
Counter({'kawuta': 19, 'kauta': 1})
Counter({'kawuta': 18, 'kauta': 1, 'tawuta': 1})
Counter({'kawuta': 19, 'kauta': 1})
elapsed_time:49.22221326828003[sec]
Counter({'kawuta': 12, 'kauta': 5, 'kaQta': 2, 'kauka': 1})
Counter({'kawuta': 17, 'kauQta': 2, 'kauwuta': 1})
Counter({'kawuta': 15, 'kauta': 3, 'kawata': 2})
Counter({'kawuta': 20})
Counter({'kawuta': 18, 'kauta': 1, 'kautao': 
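The comment in the loop mentions estimating the finishing time from the per-trial timing. A minimal sketch of that estimate is given here, assuming the mean of the completed trials is representative of the remaining ones. The function name `estimate_remaining` is hypothetical, and the two timings are the ones printed above.

```python
def estimate_remaining(elapsed_per_trial, n_trials):
    """Estimate remaining seconds from the mean time of completed trials."""
    done = len(elapsed_per_trial)
    if done == 0:
        raise ValueError("need at least one completed trial")
    mean_time = sum(elapsed_per_trial) / done
    return mean_time * (n_trials - done)

# Timings of the first two trials, copied from the output above.
elapsed = [74.5317440032959, 49.22221326828003]
remaining = estimate_remaining(elapsed, n_trials=20)
```

Because the first trial is noticeably slower than the second, the estimate is likely to be pessimistic early in the run.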

In [4]:
import pandas as pd
results = pd.DataFrame(results_list)

In [5]:
results

Unnamed: 0,trial,u_duration,intrinsic,n_kawuta,n_kawta,n_kaQta,n_kauta,n_koota,counter
0,0,0,production,6,0,5,8,0,"{'kauta': 8, 'tauta': 1, 'kawuta': 6, 'kaQta': 5}"
1,0,0,update,19,0,0,0,0,"{'kawuta': 19, 'kawutua': 1}"
2,0,1,production,12,0,0,6,0,"{'kawuta': 12, 'kauta': 6, 'kawata': 2}"
3,0,1,update,18,0,0,0,0,"{'kawuta': 18, 'kauwuta': 1, 'kawutua': 1}"
4,0,2,production,16,0,0,3,0,"{'kawuta': 16, 'kauta': 3, 'kawata': 1}"
...,...,...,...,...,...,...,...,...,...
395,19,7,update,20,0,0,0,0,{'kawuta': 20}
396,19,8,production,19,0,0,1,0,"{'kawuta': 19, 'kauta': 1}"
397,19,8,update,20,0,0,0,0,{'kawuta': 20}
398,19,9,production,19,0,0,0,0,"{'kawuta': 19, 'tawuta': 1}"


## Results

### Intrinsic

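This section only exports the data. See results.Rmd for formatting, plotting, and statistical analysis.

1. Comparison with other onbin sound changes
1. How often perception is correct in the first place
1. Frequency of sokuonbin
1. Differences between ways of representing intrinsic duration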

In [6]:
# For the analysis, see results.md
intrinsic = results[["trial", "u_duration", "intrinsic", "n_kawuta", "n_kawta","n_kaQta", "n_kauta", "n_koota"]]

intrinsic.to_csv('../data/intrinsic.csv', index=False)
intrinsic.head()

Unnamed: 0,trial,u_duration,intrinsic,n_kawuta,n_kawta,n_kaQta,n_kauta,n_koota
0,0,0,production,6,0,5,8,0
1,0,0,update,19,0,0,0,0
2,0,1,production,12,0,0,6,0
3,0,1,update,18,0,0,0,0
4,0,2,production,16,0,0,3,0
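The actual analysis lives in results.Rmd. As a quick sanity check before handing the CSV over, the rate of a given form per condition could be computed as below. This is a sketch only: `rate_by_condition` is a hypothetical helper, the rows are the five displayed above (restricted to the columns used), and the denominator of 20 comes from `n_iter = list(range(20))`.

```python
from collections import defaultdict

N_ITER = 20  # perceptions per row, from n_iter = list(range(20))

# First five rows of intrinsic.csv as displayed above.
rows = [
    {"u_duration": 0, "intrinsic": "production", "n_kaQta": 5},
    {"u_duration": 0, "intrinsic": "update", "n_kaQta": 0},
    {"u_duration": 1, "intrinsic": "production", "n_kaQta": 0},
    {"u_duration": 1, "intrinsic": "update", "n_kaQta": 0},
    {"u_duration": 2, "intrinsic": "production", "n_kaQta": 0},
]

def rate_by_condition(rows, column, n_iter=N_ITER):
    """Pool rows by (u_duration, intrinsic) and return count / perceptions."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        key = (row["u_duration"], row["intrinsic"])
        hits[key] += row[column]
        totals[key] += n_iter
    return {key: hits[key] / totals[key] for key in hits}

kaQta_rate = rate_by_condition(rows, "n_kaQta")
```

With all 20 trials loaded, each key would pool 400 perceptions rather than the 20 used in this excerpt.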


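## Tableau


This section only exports the data. See results.Rmd for formatting, plotting, and statistical analysis.

1. Candidates can be generated from a cognitively plausible model alone
1. In that case,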

In [7]:
def flatten_dict(count_i):
    """Expand a Counter into a flat list, repeating each key by its count."""
    flat = []
    count_dict_i = dict(count_i)
    for k, v in count_dict_i.items():
        flat += [k]*v
    return flat

In [8]:
recognized = []
for count_dict in results.counter.to_numpy():
    recognized += flatten_dict(count_dict)

len(recognized)  # 20 trials * 10 u_durations * 2 switches * 20 iterations = 8000

8000

In [9]:
# https://stackoverflow.com/questions/31111032/transform-a-counter-object-into-a-pandas-dataframe
d = dict(Counter(recognized))
df = pd.DataFrame.from_dict(d, orient='index').reset_index()
candidate = df.rename(columns={'index': 'candidate', 0:'count'})
candidate.to_csv('../data/candidate.csv', index=False)

## References

- https://www.researchgate.net/figure/Statistics-on-the-Vowel-Formant-Frequenc_tbl1_300346463
- https://slideplayer.com/slide/3359572/
- https://www.fon.hum.uva.nl/rob/Courses/InformationInSpeech/CDROM/Literature/LOTwinterschool2006/speech.bme.ogi.edu/tutordemos/SpectrogramReading/cse551html/cse551/node38.html