# Ghehlien

This notebook clusters the fanqie upper characters (反切上字) of the Guangyun (廣韻) as a study of Middle Chinese phonology.

## Playing with `FuzzyNum`

We define a new type `FuzzyNum` whose `+` is `max` and whose `*` is `min`. With these operations, matrix multiplication becomes the max-min composition of fuzzy relations.
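The source of `fuzzynum.jl` is not included here. Assuming it wraps a single `Float64` membership degree, a minimal sketch in Julia 1.x syntax might look like the block below. The module name `FuzzyNumSketch` and the field name `value` are hypothetical. Matrix products such as `a * b` then come from Julia's generic matrix multiplication, which needs `zero` in addition to `+` and `*`.

```julia
# Hypothetical sketch of what fuzzynum.jl might contain (the real file is not shown).
module FuzzyNumSketch

export FuzzyNum

# A membership degree, expected to lie in [0, 1] (not enforced here).
struct FuzzyNum
    value::Float64
end

# Fuzzy "addition" is max and fuzzy "multiplication" is min.
Base.:+(a::FuzzyNum, b::FuzzyNum) = FuzzyNum(max(a.value, b.value))
Base.:*(a::FuzzyNum, b::FuzzyNum) = FuzzyNum(min(a.value, b.value))

# 0.0 is the identity for max on [0, 1]; generic matrix multiplication uses zero().
Base.zero(::Type{FuzzyNum}) = FuzzyNum(0.0)
Base.zero(::FuzzyNum) = FuzzyNum(0.0)

# Support Float64(x), which later cells use to read the degree back.
Base.Float64(x::FuzzyNum) = x.value

# Print as a bare number, like the notebook output.
Base.show(io::IO, x::FuzzyNum) = print(io, x.value)

end # module

using .FuzzyNumSketch

a = [FuzzyNum(0.4) FuzzyNum(0.8); FuzzyNum(0.2) FuzzyNum(0.6)]
b = [FuzzyNum(0.2) FuzzyNum(0.7); FuzzyNum(0.5) FuzzyNum(0.1)]
println(FuzzyNum(0.5) + FuzzyNum(0.6))  # prints 0.6
c = a * b                               # max-min product of a and b
```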

In [1]:
include("fuzzynum.jl")
using fuzzynum

In [2]:
FuzzyNum(0.5) + FuzzyNum(0.6)

0.6

In [3]:
a = [FuzzyNum(0.4) FuzzyNum(0.8); FuzzyNum(0.2) FuzzyNum(0.6)]

2×2 Array{fuzzynum.FuzzyNum,2}:
 0.4  0.8
 0.2  0.6

In [4]:
b = [FuzzyNum(0.2) FuzzyNum(0.7); FuzzyNum(0.5) FuzzyNum(0.1)]

2×2 Array{fuzzynum.FuzzyNum,2}:
 0.2  0.7
 0.5  0.1

In [5]:
a * b

2×2 Array{fuzzynum.FuzzyNum,2}:
 0.5  0.4
 0.5  0.2
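With `+` as max and `*` as min, each entry of the product of two fuzzy matrices $A = (a_{ik})$ and $B = (b_{kj})$ is their max-min composition:

$$
(A \circ B)_{ij} = \max_{k} \min(a_{ik}, b_{kj})
$$

As a check, here is the top-left entry of `a * b` above:

$$
(a \circ b)_{11} = \max\bigl(\min(0.4, 0.2), \min(0.8, 0.5)\bigr) = \max(0.2, 0.5) = 0.5
$$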

## Reading data from file

In [6]:
using CSV

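Read the CSV file. The result is a `DataFrames.DataFrame`. The cells below access columns by position. Column 3 (上字) is the fanqie upper character, column 5 is the Middle Chinese romanization (Polyhedron's scheme), and column 6 (廣韻字頭) is the head character. `types=Dict(7=>String)` forces column 7 (小韻內字序, the order within the homophone group) to be read as strings.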

In [7]:
df = CSV.read("data.csv", types=Dict(7=>String))

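| Row | 廣韻韻部順序&廣韻韻部原貌(調整前) | 小韻序 | 上字 | 下字 | 中古拼音(polyhedron 版) | 廣韻字頭(覈校後) | 小韻內字序 |
|---|---|---|---|---|---|---|---|
| 1 | 上平01東 | 1 | 德 | 紅 | tung | 東 | 1 |
| 2 | 上平01東 | 1 | 德 | 紅 | tung | 菄 | 2 |
| 3 | 上平01東 | 1 | 德 | 紅 | tung | 鶇 | 3 |
| 4 | 上平01東 | 1 | 德 | 紅 | tung | 䍶 | 4 |
| 5 | 上平01東 | 1 | 德 | 紅 | tung | 𠍀 | 5 |
| 6 | 上平01東 | 1 | 德 | 紅 | tung | 倲 | 6 |
| 7 | 上平01東 | 1 | 德 | 紅 | tung | 𩜍 | 7 |
| 8 | 上平01東 | 1 | 德 | 紅 | tung | 𢘐 | 8 |
| 9 | 上平01東 | 1 | 德 | 紅 | tung | 涷 | 9 |
| 10 | 上平01東 | 1 | 德 | 紅 | tung | 蝀 | 10 |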


In [8]:
# Average length of the romanizations in column 5 (25333 is presumably the number of rows)
mapfoldl(length, +, 0, df[5])/25333

4.192949907235621

## 1. Create a set $S$ containing all upper characters (column 3)

In [9]:
s = Set(df[3])

Set(Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}[CategoricalArrays.CategoricalString{UInt32} "當", CategoricalArrays.CategoricalString{UInt32} "跪", CategoricalArrays.CategoricalString{UInt32} "女", CategoricalArrays.CategoricalString{UInt32} "握", CategoricalArrays.CategoricalString{UInt32} "羽", CategoricalArrays.CategoricalString{UInt32} "危", CategoricalArrays.CategoricalString{UInt32} "尼", CategoricalArrays.CategoricalString{UInt32} "羊", CategoricalArrays.CategoricalString{UInt32} "同", CategoricalArrays.CategoricalString{UInt32} "醋"  …  CategoricalArrays.CategoricalString{UInt32} "匹", CategoricalArrays.CategoricalString{UInt32} "連", CategoricalArrays.CategoricalString{UInt32} "征", CategoricalArrays.CategoricalString{UInt32} "并", CategoricalArrays.CategoricalString{UInt32} "下", CategoricalArrays.CategoricalString{UInt32} "辝", CategoricalArrays.CategoricalString{UInt32} "色", CategoricalArrays.CategoricalString{UInt32} "卑", CategoricalArrays.CategoricalString{UInt32

In [10]:
# Drop missing entries (rows without an upper character)
filter!(x -> typeof(x) != Missings.Missing, s)

Set(Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}[CategoricalArrays.CategoricalString{UInt32} "當", CategoricalArrays.CategoricalString{UInt32} "跪", CategoricalArrays.CategoricalString{UInt32} "女", CategoricalArrays.CategoricalString{UInt32} "握", CategoricalArrays.CategoricalString{UInt32} "羽", CategoricalArrays.CategoricalString{UInt32} "危", CategoricalArrays.CategoricalString{UInt32} "尼", CategoricalArrays.CategoricalString{UInt32} "羊", CategoricalArrays.CategoricalString{UInt32} "同", CategoricalArrays.CategoricalString{UInt32} "醋"  …  CategoricalArrays.CategoricalString{UInt32} "匹", CategoricalArrays.CategoricalString{UInt32} "連", CategoricalArrays.CategoricalString{UInt32} "征", CategoricalArrays.CategoricalString{UInt32} "并", CategoricalArrays.CategoricalString{UInt32} "下", CategoricalArrays.CategoricalString{UInt32} "辝", CategoricalArrays.CategoricalString{UInt32} "色", CategoricalArrays.CategoricalString{UInt32} "卑", CategoricalArrays.CategoricalString{UInt32
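In Julia 1.x (this notebook ran on 0.6), building the set and dropping missing values can be done in one step with `skipmissing`. The sketch below uses a made-up stand-in column and assumes that the column holds `String` or `missing` values.

```julia
# Toy stand-in for the upper-character column df[3]; the values are illustrative only.
col = Union{String, Missing}["德", "德", "都", missing, "德", "當"]

# skipmissing drops the missing entries, and the Set removes duplicates.
s = Set(skipmissing(col))

println(length(s))  # 3
```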

## 2. Let $n$ be the size of $S$ and construct an $n \times n$ zero matrix

In [11]:
n = length(s)

471

In [12]:
arr = zeros(Int, n, n)

471×471 Array{Int64,2}:
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0  

## 3. Build a list `xs` from the set $S$, so that each upper character gets a fixed index

In [13]:
dupS = copy(s)

Set(Union{CategoricalArrays.CategoricalString{UInt32}, Missings.Missing}[CategoricalArrays.CategoricalString{UInt32} "兹", CategoricalArrays.CategoricalString{UInt32} "鋤", CategoricalArrays.CategoricalString{UInt32} "爭", CategoricalArrays.CategoricalString{UInt32} "明", CategoricalArrays.CategoricalString{UInt32} "之", CategoricalArrays.CategoricalString{UInt32} "數", CategoricalArrays.CategoricalString{UInt32} "北", CategoricalArrays.CategoricalString{UInt32} "彼", CategoricalArrays.CategoricalString{UInt32} "衢", CategoricalArrays.CategoricalString{UInt32} "爲"  …  CategoricalArrays.CategoricalString{UInt32} "速", CategoricalArrays.CategoricalString{UInt32} "始", CategoricalArrays.CategoricalString{UInt32} "呵", CategoricalArrays.CategoricalString{UInt32} "部", CategoricalArrays.CategoricalString{UInt32} "諸", CategoricalArrays.CategoricalString{UInt32} "丕", CategoricalArrays.CategoricalString{UInt32} "榮", CategoricalArrays.CategoricalString{UInt32} "遵", CategoricalArrays.CategoricalString{UInt32

In [14]:
xs = []

0-element Array{Any,1}

In [15]:
# pop! empties dupS, which is why s was copied first
for i in 1:n
    push!(xs, pop!(dupS))
end

In [16]:
xs

471-element Array{Any,1}:
 "兹"
 "鋤"
 "爭"
 "明"
 "之"
 "數"
 "北"
 "彼"
 "衢"
 "爲"
 "匹"
 "愛"
 "傍"
 ⋮  
 "平"
 "區"
 "速"
 "始"
 "呵"
 "部"
 "諸"
 "丕"
 "榮"
 "遵"
 "除"
 "狂"

## 4. Set `count` to 0

In [17]:
count = 0

0

## 5. For each `x` in `xs`, find the upper character of `x`'s own fanqie (the row where `x` is the head character)

## Then let $i_1$ be the index of `x` in `xs`, and $i_2$ the index of the upper character of `x` in `xs`

## Increment `arr` at both $(i_1, i_2)$ and $(i_2, i_1)$

In [18]:
typeof(xs[1])

CategoricalArrays.CategoricalString{UInt32}

In [19]:
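# Return the upper character (column 3) of the first row whose head character (column 6) is `ch`.
# Falls through and returns `nothing` if `ch` never appears as a head character.
function findUpperChar(ch)
    for (wrd, ucOfWrd) in zip(df[6], df[3])
        if wrd == ch
            return ucOfWrd
        end
    end
end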

findUpperChar (generic function with 1 method)

In [20]:
findUpperChar(df[6][1])

CategoricalArrays.CategoricalString{UInt32} "德"

In [21]:
# Return the position of `ch` in `xs` by linear search (`nothing` if absent).
function getUpperCharIndex(ch :: typeof(xs[1]))
    for i in 1:n
        if ch == xs[i]
            return i
        end
    end
end

getUpperCharIndex (generic function with 1 method)

In [22]:
getUpperCharIndex(findUpperChar(df[6][1]))

33

In [23]:
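# Link each upper character with the upper character of its own fanqie, in both directions.
# A character that is its own upper character adds 2 on the diagonal.
for i in 1:n
    i1 = i
    i2 = getUpperCharIndex(findUpperChar(xs[i]))
    arr[i1, i2] += 1
    arr[i2, i1] += 1
end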
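`findUpperChar` scans the data column on every call, and `getUpperCharIndex` scans `xs` on every call. The sketch below performs the same step with dictionaries that are built once. It runs on a few toy rows. The names `heads`, `uppers`, `upper_of` and `index_of` are hypothetical. Here `xs` comes from `unique` rather than from popping a `Set`, so its order differs from the notebook's.

```julia
# Toy rows standing in for df[6] (head character) and df[3] (upper character).
heads  = ["東", "德", "多", "得"]
uppers = ["德", "多", "得", "多"]

# Upper characters in a fixed order (plays the role of xs).
xs = unique(uppers)
n = length(xs)

# First occurrence wins, mirroring findUpperChar, which returns the first match.
upper_of = Dict{String,String}()
for (h, u) in zip(heads, uppers)
    haskey(upper_of, h) || (upper_of[h] = u)
end
# Position of each upper character in xs (replaces the linear search).
index_of = Dict(c => i for (i, c) in enumerate(xs))

arr = zeros(Int, n, n)
for (i1, x) in enumerate(xs)
    # Skip characters that never occur as a head character
    # (findUpperChar would return `nothing` for them).
    haskey(upper_of, x) || continue
    i2 = index_of[upper_of[x]]
    arr[i1, i2] += 1
    arr[i2, i1] += 1
end
```

As in the notebook's loop, an entry reaches 2 when two characters each serve as the other's upper character (多 and 得 in these toy rows).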

## 6. Compute the transitive closure of `arr`, viewed as a fuzzy relation

In [24]:
# Scale the counts by n to obtain membership degrees
FuzzyArr = map(x -> FuzzyNum(x / n), arr)

471×471 Array{fuzzynum.FuzzyNum,2}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.

In [25]:
# Repeatedly square the relation (max-min composition) until the square equals its input.
function getTransitiveClosure(FuzzyArr)
    while true
        arr_new = FuzzyArr * FuzzyArr
        if arr_new == FuzzyArr
            return arr_new
        end
        FuzzyArr = arr_new
    end
end
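The loop above replaces $R$ by $R \circ R$ and never takes the union with $R$ itself. When the relation has no self-loops, it can therefore lose links. For two characters that spell each other, `[0 1; 1 0]` squares to the identity, which is already stable, so the pair ends up split. This may be why, for example, 多 forms a cluster of its own, separate from 德得, in the output below. The usual max-min transitive closure instead iterates $R \leftarrow \max(R, R \circ R)$ elementwise until nothing changes. Below is a sketch of both on plain real matrices. The helper names `maxmin`, `square_until_stable` and `maxmin_closure` are hypothetical.

```julia
# Max-min composition of two fuzzy relations stored as real matrices with
# entries in [0, 1] (the 0.0 starting value relies on that).
function maxmin(A::AbstractMatrix{<:Real}, B::AbstractMatrix{<:Real})
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("inner dimensions differ"))
    m, p = size(A, 1), size(B, 2)
    C = zeros(Float64, m, p)
    for i in 1:m, j in 1:p
        v = 0.0
        for k in 1:size(A, 2)
            v = max(v, min(A[i, k], B[k, j]))
        end
        C[i, j] = v
    end
    return C
end

# Squaring only, like getTransitiveClosure. Like the original, this can
# loop forever on some non-symmetric inputs (e.g. a 3-cycle).
function square_until_stable(R)
    while true
        R2 = maxmin(R, R)
        R2 == R && return R2
        R = R2
    end
end

# Max-min transitive closure: R <- max(R, R∘R) until nothing changes.
# Entries only grow and come from a finite set of values, so this terminates.
function maxmin_closure(R)
    while true
        Rnew = max.(R, maxmin(R, R))
        Rnew == R && return Rnew
        R = Rnew
    end
end

R = [0.0 1.0; 1.0 0.0]    # two characters that spell each other
square_until_stable(R)      # identity matrix: the 1-2 link is lost
maxmin_closure(R)           # all ones: 1 and 2 stay in one class
```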

getTransitiveClosure (generic function with 1 method)

In [26]:
FuzzyArr_TC = getTransitiveClosure(FuzzyArr)

471×471 Array{fuzzynum.FuzzyNum,2}:
 0.00212314  0.0         0.0         …  0.0         0.0         0.0       
 0.0         0.00212314  0.0            0.0         0.0         0.0       
 0.0         0.0         0.00212314     0.0         0.0         0.0       
 0.0         0.0         0.0            0.0         0.0         0.0       
 0.0         0.0         0.0            0.0         0.0         0.0       
 0.0         0.0         0.0         …  0.0         0.0         0.0       
 0.0         0.0         0.0            0.0         0.0         0.0       
 0.0         0.0         0.0            0.0         0.0         0.0       
 0.0         0.0         0.0            0.0         0.0         0.0       
 0.0         0.0         0.0            0.0         0.0         0.0       
 0.0         0.0         0.0         …  0.0         0.0         0.0       
 0.0         0.0         0.0            0.0         0.0         0.0       
 0.0         0.0         0.0            0.0         0.0         

In [27]:
Set(map(x -> Float64(x), FuzzyArr_TC))

Set([0.00212314, 0.00424628, 0.0])
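Here is a quick arithmetic check on these three values. Each entry of `arr` is 0, 1 or 2. An off-diagonal entry gains 1 for each of the two characters that uses the other as its upper character, and a character that is its own upper character gains 2 on the diagonal. `FuzzyArr` then divides by $n = 471$:

$$
\frac{1}{471} \approx 0.00212314, \qquad \frac{2}{471} \approx 0.00424628
$$

Max and min always return one of their arguments, so the closure can only contain values already present in `FuzzyArr`: $0$, $1/471$ or $2/471$.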

In [28]:
# Crisp version: every nonzero degree becomes 1.0, then take the closure
FuzzyArr_Regression_TC = getTransitiveClosure(map(x -> FuzzyNum(x == zero(x) ? 0.0 : 1.0), FuzzyArr))

471×471 Array{fuzzynum.FuzzyNum,2}:
 1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0     0.0  1.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  1.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  1.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  1.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.

In [29]:
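# Group indices into clusters: column i of the closure lists everything related to i.
function getCluster(arr)
    arr_new = map(x -> Float64(x), arr)
    m = size(arr_new, 1)

    ss = []
    for i in 1:m
        # skip i if an earlier cluster already contains it
        if all(c -> !in(i, c), ss)
            # indices j with a nonzero degree in column i
            s = [j for j in 1:m if arr_new[j, i] != 0]
            if !isempty(s)
                push!(ss, s)
            end
        end
    end
    ss
end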

getCluster (generic function with 1 method)
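Consider a symmetric 0/1 relation in which every character has at least one link. The classes of its full transitive closure are exactly the connected components of the graph whose edges are the nonzero entries. Union-find (disjoint sets) is a different technique from the notebook's matrix approach. It computes those components directly from `arr`, without repeated $n \times n$ matrix products. Below is a sketch under that assumption. The names `find_root!` and `components` are hypothetical.

```julia
# Find the representative of i's set, with path halving.
function find_root!(parent::Vector{Int}, i::Int)
    while parent[i] != i
        parent[i] = parent[parent[i]]   # point i at its grandparent
        i = parent[i]
    end
    return i
end

# Group 1:n into connected components of the graph whose edges are
# the nonzero entries of the square matrix A.
function components(A::AbstractMatrix)
    n = size(A, 1)
    parent = collect(1:n)
    for i in 1:n, j in 1:n
        if !iszero(A[i, j])
            ri, rj = find_root!(parent, i), find_root!(parent, j)
            # attach the larger root under the smaller one
            ri == rj || (parent[max(ri, rj)] = min(ri, rj))
        end
    end
    groups = Dict{Int,Vector{Int}}()
    for i in 1:n
        push!(get!(groups, find_root!(parent, i), Int[]), i)
    end
    # Order clusters by their smallest member, similar to getCluster's output.
    return sort!(collect(values(groups)); by = first)
end

# 1 and 2 spell each other, 3 is spelled by 1, and 4 spells itself.
A = [0 1 1 0;
     1 0 0 0;
     1 0 0 0;
     0 0 0 2]
components(A)   # [[1, 2, 3], [4]]
```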

In [30]:
c_regress = getCluster(FuzzyArr_Regression_TC)


88-element Array{Any,1}:
 [1, 30, 203, 222, 263, 421, 443]                                         
 [2, 74, 84, 118, 158, 360, 391, 397]                                     
 [3, 134, 230, 380, 438]                                                  
 [4, 17, 41, 48, 52, 90, 119, 258, 285, 298, 366, 426]                    
 [5, 83, 86, 243, 289, 335, 408, 466]                                     
 [6, 71, 115, 133, 138, 233, 244, 305]                                    
 [7, 40, 79, 320, 388, 407]                                               
 [8, 63, 65, 68, 82, 98, 177, 192, 193, 268, 347, 352, 355, 427, 436, 454]
 [9, 80, 319, 329, 435]                                                   
 [10, 95, 266, 280, 359, 468]                                             
 [11]                                                                     
 [12, 23, 145, 162]                                                       
 [13, 127, 146, 198, 317, 433]                                            


In [31]:
# Print each cluster as the string of its characters
for i in map(m -> join(map(k -> xs[k], m)), c_regress)
    println(i)
end

兹匠疾自情慈秦
鋤鶵查豺雛崱鉏牀
爭阻莊鄒簪
明靡文美武亡望眉巫彌綿無
之氏占旨煑識支諸
數山沙色疏疎生砂
北伯布補百晡
彼兵并防筆婢卑方皮縛畀分裴封附房
衢具渠臼巨
爲于有韋王榮
匹
愛哀安鷖
傍白捕薄蒲步
先胥須息素寫速
雨筠羽薳洧雲云永
胡侯乎懷
父甫必弼陂符扶毗馮浮府鄙便平部
弋台隨悅實營辝余似旬夷辭以羊乘食寺詳移翼徐祥予與餘神夕
堂唐特同陀度杜
憂謁握央依烟委衣一乙伊憶
驅傾跪袪乞丘欽詰羌卿窺墟區
側仄
博邊巴
獲下戶何黃
署是視市寔殖
德得
嘗承成常蜀殊時
力良離
借𩛠醉作即姊漸則遵
當冬
仕崇助士
廁創瘡初叉楚測芻
連縷里呂林
郎
章征止脂職正
治丈持佇植遟墜池場柱馳除
天吐土託通
蘇司辛雖斯桑思相私悉
虛香羲朽休興況許喜
姑乖各過兼楷公佳格
母模慕摸謨
洛勒落
如儒耳汝
火虎花馨荒海
宅臣直
豬追張竹丁卓徵陟珍迍知中褚猪
蒼取倉遷親
牛俄虞魚疑研愚吾
乃內諾㚷
康口謙枯恪空牽客
女
強俟求暨奇其狂
披拂芳峯丕
弃起曲綺去豈
前藏在徂才
挹於紆
徒
舉規居九俱紀几吉
資祖將子臧
敷孚撫妃
式釋書舒始
矢施詩試傷失湯賞商
麁采麤醋青七千
抽楮癡敕
滂
充處赤尺叱
拏尼穠
都
烏
兒人而仍
雌
危宜玉遇擬五語
奴那
莫
恥丑
古詭
魯練
苦可
所史
普
此
賴盧來
他
譬
多
呼呵
昌
昨
