# Ghehlien

**Ghehlien** (系聯) is a clustering method used in historical Chinese phonology.

## 1. Prerequisites

### 1.1 Playing with Fuzzy Numbers

We define a new type called `FuzzyFloat`, for which `*` computes `min` and `+` computes `max`.
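As a rough illustration of the idea, a type with these two operators could look like the sketch below. The name `SketchFuzzy` is hypothetical and this is not the code in `fuzzynum.jl`; it also assumes Julia 1.x syntax, whereas the notebook itself ran on v0.6.

```julia
# Hypothetical stand-in for the type defined in fuzzynum.jl (not the author's code).
struct SketchFuzzy
    val::Float64
end

# Fuzzy "addition" is max, fuzzy "multiplication" is min.
Base.:+(a::SketchFuzzy, b::SketchFuzzy) = SketchFuzzy(max(a.val, b.val))
Base.:*(a::SketchFuzzy, b::SketchFuzzy) = SketchFuzzy(min(a.val, b.val))

# Print only the underlying number, similar to the notebook output.
Base.show(io::IO, x::SketchFuzzy) = print(io, x.val)

x = SketchFuzzy(0.5) + SketchFuzzy(0.6)   # wraps 0.6
y = SketchFuzzy(0.5) * SketchFuzzy(0.6)   # wraps 0.5
```

Keeping the value in a wrapper type, rather than redefining `+` and `*` for `Float64`, leaves ordinary floating-point arithmetic untouched.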

In [1]:
include("fuzzynum.jl")
using fuzzynum

In [2]:
FuzzyFloat(0.5) + FuzzyFloat(0.6)

0.6

In [3]:
a = [FuzzyFloat(0.4) FuzzyFloat(0.8); FuzzyFloat(0.2) FuzzyFloat(0.6)]

2×2 Array{fuzzynum.FuzzyFloat,2}:
 0.4  0.8
 0.2  0.6

In [4]:
b = [FuzzyFloat(0.2) FuzzyFloat(0.7); FuzzyFloat(0.5) FuzzyFloat(0.1)]

2×2 Array{fuzzynum.FuzzyFloat,2}:
 0.2  0.7
 0.5  0.1

In [5]:
a * b

2×2 Array{fuzzynum.FuzzyFloat,2}:
 0.5  0.4
 0.5  0.2
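With `*` as `min` and `+` as `max`, the matrix product above is the max-min composition: entry $(i, j)$ is $\max_k \min(a_{ik}, b_{kj})$. The following sketch reproduces the result on plain `Float64` matrices. The function name `maxmin_product` is an assumption introduced here, and Julia 1.x is assumed.

```julia
# Max-min composition of two real matrices (illustrative helper, not from fuzzynum.jl).
function maxmin_product(A::AbstractMatrix{<:Real}, B::AbstractMatrix{<:Real})
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("inner dimensions must match"))
    C = zeros(Float64, size(A, 1), size(B, 2))
    for i in axes(A, 1), j in axes(B, 2)
        # "Sum" over k is max, "product" is min.
        C[i, j] = maximum(min(A[i, k], B[k, j]) for k in axes(A, 2))
    end
    return C
end

a = [0.4 0.8; 0.2 0.6]
b = [0.2 0.7; 0.5 0.1]
c = maxmin_product(a, b)   # [0.5 0.4; 0.5 0.2]
```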

### 1.2 Reading data from file

In [6]:
using CSV

`CSV.read` returns the contents of a CSV file as a `DataFrames.DataFrame`.

In [7]:
df = CSV.read("data.csv", types = Dict(7 => String))

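| Unnamed: 0 | 廣韻韻部順序&廣韻韻部原貌(調整前) | 小韻序 | 上字 | 下字 | 中古拼音(polyhedron 版) | 廣韻字頭(覈校後) | 小韻內字序 |
|---|---|---|---|---|---|---|---|
| 1 | 上平01東 | 1 | 德 | 紅 | tung | 東 | 1 |
| 2 | 上平01東 | 1 | 德 | 紅 | tung | 菄 | 2 |
| 3 | 上平01東 | 1 | 德 | 紅 | tung | 鶇 | 3 |
| 4 | 上平01東 | 1 | 德 | 紅 | tung | 䍶 | 4 |
| 5 | 上平01東 | 1 | 德 | 紅 | tung | 𠍀 | 5 |
| 6 | 上平01東 | 1 | 德 | 紅 | tung | 倲 | 6 |
| 7 | 上平01東 | 1 | 德 | 紅 | tung | 𩜍 | 7 |
| 8 | 上平01東 | 1 | 德 | 紅 | tung | 𢘐 | 8 |
| 9 | 上平01東 | 1 | 德 | 紅 | tung | 涷 | 9 |
| 10 | 上平01東 | 1 | 德 | 紅 | tung | 蝀 | 10 |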


In [8]:
mapfoldl(length, +, 0, df[5])/25333  # Mean string length of the entries in the column 中古拼音(polyhedron 版)

4.192949907235621
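The cell above uses the Julia v0.6 argument order for `mapfoldl`. As a hedged sketch, the same mean-length computation in Julia 1.x could be written as follows; the sample strings are made up for illustration and are not taken from `data.csv`.

```julia
# Made-up sample column with one missing entry (illustrative only).
sample = ["tung", "dung", "ka", missing]

# Drop missing entries before measuring string lengths.
present = collect(skipmissing(sample))

# Julia 1.x passes the initial value as the keyword argument `init`.
total_len = mapfoldl(length, +, present; init = 0)
mean_len = total_len / length(present)
```

Dividing by `length(present)` rather than a hard-coded row count keeps the average correct when rows are missing.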

## 2. Analyses of Kuangxyonh

This section applies ghehlien to the **pyanxchet upper characters** (反切上字) and **pyanxchet lower characters** (反切下字) of **Kuangxyonh** (廣韻).

### 2.1 Pyanxchet Upper Characters

**2.1.1. Create a new set $S$ and put all upper characters into it:**

In [9]:
s = Set(Array(df[:上字]))

Set(Union{Missings.Missing, String}["當", "跪", "女", "握", "羽", "危", "尼", "羊", "同", "醋"  …  "匹", "連", "征", "并", "下", "辝", "色", "卑", "視", "縛"])

In [10]:
filter!(x -> typeof(x) != Missings.Missing, s)  # Remove missing data: some small rhymes (小韻) have no pyanxchet

Set(Union{Missings.Missing, String}["當", "跪", "女", "握", "羽", "危", "尼", "羊", "同", "醋"  …  "匹", "連", "征", "并", "下", "辝", "色", "卑", "視", "縛"])
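In Julia 1.x, where `missing` is part of Base, collecting the distinct characters and dropping missing entries can be done in one step. The sketch below uses a small made-up column rather than the real `df[:上字]`.

```julia
# Made-up column of upper characters with a missing entry (illustrative only).
col = ["德", "德", missing, "都", "多"]

# skipmissing drops the missing entries; Set removes duplicates.
chars = Set(skipmissing(col))
```

Because `skipmissing` narrows the element type, `chars` is a `Set{String}` rather than a set over `Union{Missing, String}`.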

**2.1.2. Pair each pyanxchet upper character with the upper character of its own pyanxchet:**

In [11]:
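function getUCList()
    # Headword column and upper-character column of the table
    dfG = Array(df[Symbol("廣韻字頭(覈校後)")])
    dfS = Array(df[:上字])
    lst = setToList(s)  # the set of upper characters built above, as a list
    n = length(lst)
    ret = []
    for i in 1:n
        ch = lst[i]
        # Look up ch among the headwords and pair it with the upper character of its own pyanxchet
        push!(ret, (ch, dfS[getIndexInArr(dfG, ch)]))
    end
    ret
end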

getUCList (generic function with 1 method)

In [12]:
uclist = getUCList()

471-element Array{Any,1}:
 ("兹", "疾")
 ("鋤", "士")
 ("爭", "側")
 ("明", "武")
 ("之", "止")
 ("數", "所")
 ("北", "博")
 ("彼", "甫")
 ("衢", "其")
 ("爲", "薳")
 ("匹", "譬")
 ("愛", "烏")
 ("傍", "步")
 ⋮         
 ("平", "房")
 ("區", "豈")
 ("速", "桑")
 ("始", "詩")
 ("呵", "虎")
 ("部", "裴")
 ("諸", "章")
 ("丕", "敷")
 ("榮", "永")
 ("遵", "將")
 ("除", "直")
 ("狂", "巨")
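The pairing step can also be sketched with a dictionary lookup instead of a linear search per character. The version below is an alternative written for Julia 1.x, not the notebook's helper functions, and the four rows are a toy sample rather than data read from `data.csv`.

```julia
# Toy rows: headword and the upper character of its pyanxchet (illustrative only).
headwords = ["東", "德", "多", "得"]
uppers    = ["德", "多", "得", "多"]

# Map each headword to the upper character of its first occurrence.
first_upper = Dict{String,String}()
for (h, u) in zip(headwords, uppers)
    get!(first_upper, h, u)   # keeps the existing entry if h was seen before
end

# Pair every distinct upper character with the upper character of its own pyanxchet,
# skipping characters that never occur as a headword.
uc_pairs = [(ch, first_upper[ch]) for ch in sort(collect(Set(uppers))) if haskey(first_upper, ch)]
```

Sorting the set is only there to make the order of the pairs reproducible, since iteration order of a `Set` is unspecified.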

**2.1.3. Apply ghehlien:**

In [13]:
ghehlien(uclist)

Stacktrace:
 [1] depwarn(::String, ::Symbol) at .\deprecated.jl:70
 [2] filter(::Function, ::Base.Iterators.Zip2{SubArray{Float64,1,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true},Array{Int64,1}}) at .\deprecated.jl:57
 [3] getCluster(::Array{fuzzynum.FuzzyFloat,2}, ::Int64) at F:\Source\Repos\Ghehlien\fuzzynum.jl:61
 [4] ghehlien(::Array{Any,1}) at F:\Source\Repos\Ghehlien\fuzzynum.jl:85
 [5] include_string(::String, ::String) at .\loading.jl:522
 [6] include_string(::Module, ::String, ::String) at E:\julia-depot\v0.6\Compat\src\Compat.jl:88
 [7] execute_request(::ZMQ.Socket, ::IJulia.Msg) at …

兹匠疾自情慈秦
鋤仕崇鶵查豺助雛士崱鉏牀
爭側阻仄莊鄒簪
明靡文美武亡望眉巫彌綿無
之章征氏占止旨煑脂識職正支諸
數山沙色疏疎生砂所史
北博伯布邊補巴百晡
彼父甫必兵并防筆弼婢陂符卑方皮縛扶畀分毗裴馮浮府鄙便封附房平部
爲雨筠于羽薳洧雲云永有韋王榮
衢具強俟求暨渠臼奇其巨狂
匹譬
愛哀安鷖烏
傍白捕薄蒲步
先胥蘇須司息素寫辛雖斯桑思相私悉速
胡侯獲乎下戶懷何黃
弋台隨悅實營辝余似旬夷辭以羊乘食寺詳移翼徐祥予與餘神夕
堂徒唐特同陀度杜
驅傾跪弃起袪曲乞綺丘欽詰羌去卿窺豈墟區
憂謁握挹央依烟於委衣一乙紆伊憶
署嘗承是成視市常蜀殊寔時殖
德多得
力連縷里良呂離林
借𩛠醉資祖將作即子姊漸則臧遵
當都冬
廁創瘡初叉楚測芻
郎魯練
治宅丈持佇植臣遟直墜池場柱馳除
天吐土託他通
虛香羲朽休興況許喜
姑乖各過兼楷公古佳格詭
母模慕莫摸謨
洛勒落賴盧來
如兒儒人耳而仍汝
火虎花馨荒海呼呵
豬追張竹丁卓徵陟珍迍知中褚猪
蒼麁取采麤倉遷醋青七千親
牛俄虞危宜玉遇魚擬疑研愚吾五語
乃奴內諾那㚷
康口謙枯恪苦空牽可客
女拏尼穠
披敷孚拂撫芳峯妃丕
前藏在徂才昨
舉規居九俱紀几吉
式矢施詩釋試傷失湯書賞舒商始
抽楮癡恥敕丑
滂普
充處赤尺叱昌
雌此
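The source of `ghehlien` is not shown here. One plausible reading, given the fuzzy matrices from section 1.1, is that linked characters are grouped through the transitive closure of a relation matrix under max-min composition. The sketch below works under that assumption; `compose` and `link_clusters` are hypothetical names, the input pairs are a toy sample, and Julia 1.x is assumed.

```julia
# Max-min composition of two square relation matrices.
function compose(A::Matrix{Float64}, B::Matrix{Float64})
    n = size(A, 1)
    return [maximum(min(A[i, k], B[k, j]) for k in 1:n) for i in 1:n, j in 1:n]
end

# Group characters that are connected, directly or indirectly, by the given pairs.
function link_clusters(links::Vector{Tuple{String,String}})
    # Index every character in order of first appearance.
    chars = String[]
    index = Dict{String,Int}()
    for (a, b) in links, c in (a, b)
        if !haskey(index, c)
            push!(chars, c)
            index[c] = length(chars)
        end
    end
    n = length(chars)
    # Reflexive, symmetric relation: 1.0 where two characters are linked.
    R = zeros(n, n)
    for i in 1:n
        R[i, i] = 1.0
    end
    for (a, b) in links
        R[index[a], index[b]] = 1.0
        R[index[b], index[a]] = 1.0
    end
    # Compose R with itself until it stops changing (transitive closure).
    while true
        R2 = compose(R, R)
        R2 == R && break
        R = R2
    end
    # Each distinct row of the closure is one cluster.
    seen = falses(n)
    clusters = Vector{Vector{String}}()
    for i in 1:n
        seen[i] && continue
        members = [j for j in 1:n if R[i, j] == 1.0]
        seen[members] .= true
        push!(clusters, chars[members])
    end
    return clusters
end

toy = [("德", "多"), ("多", "得"), ("得", "多"), ("當", "都"), ("都", "當")]
clusters = link_clusters(toy)   # [["德", "多", "得"], ["當", "都"]]
```

For crisp 0/1 links like these, a union-find structure would produce the same groups with less work; the matrix form is used only because it mirrors the fuzzy arithmetic of section 1.1.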


### 2.2 Pyanxchet Lower Characters

**2.2.1. Create a new set $S$ and put all lower characters into it:**

In [14]:
s = Set(Array(df[:下字]))

Set(Union{Missings.Missing, String}["懈", "甾", "當", "甚", "法", "賄", "越", "俾", "運", "河"  …  "亞", "寸", "教", "戀", "畏", "位", "鄭", "醒", "贈", "圓"])

In [15]:
filter!(x -> typeof(x) != Missings.Missing, s)  # Remove missing data: some small rhymes (小韻) have no pyanxchet

Set(Union{Missings.Missing, String}["懈", "甾", "當", "甚", "法", "賄", "越", "俾", "運", "河"  …  "亞", "寸", "教", "戀", "畏", "位", "鄭", "醒", "贈", "圓"])

**2.2.2. Pair each pyanxchet lower character with the lower character of its own pyanxchet:**

In [16]:
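function getLCList()
    # Headword column and lower-character column of the table
    dfG = Array(df[Symbol("廣韻字頭(覈校後)")])
    dfS = Array(df[:下字])
    lst = setToList(s)  # the set of lower characters built above, as a list
    n = length(lst)
    ret = []
    for i in 1:n
        ch = lst[i]
        ind = getIndexInArr(dfG, ch)
        # Skip characters that are not found among the headwords (index -1)
        if ind != -1
            # Keep the pair only if neither side is missing
            if typeof(ch) == String && typeof(dfS[ind]) == String
                push!(ret, (ch, dfS[ind]))
            end
        end
    end
    ret
end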

getLCList (generic function with 1 method)
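The guard `ind != -1` suggests that `getIndexInArr` returns `-1` when the character is not found among the headwords. The helper's source is not shown here, so the following is only a guess at that behaviour, written for Julia 1.x under the hypothetical name `index_or_minus_one`.

```julia
# Hypothetical stand-in for getIndexInArr: index of the first match, or -1 if absent.
function index_or_minus_one(arr::AbstractVector, x)
    # isequal treats `missing` as an ordinary value, so missing entries never match a string.
    i = findfirst(y -> isequal(y, x), arr)
    return i === nothing ? -1 : i
end

heads = ["東", "德", missing, "多"]   # toy headword column (illustrative only)
found = index_or_minus_one(heads, "多")    # 4
absent = index_or_minus_one(heads, "紅")   # -1
```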

In [17]:
lclist = getLCList()

1185-element Array{Any,1}:
 ("婁", "朱")
 ("肌", "夷")
 ("焉", "言")
 ("肴", "茅")
 ("制", "例")
 ("懈", "隘")
 ("鍾", "容")
 ("預", "洳")
 ("孟", "更")
 ("綸", "迍")
 ("爲", "支")
 ("灼", "若")
 ("甾", "持")
 ⋮         
 ("杯", "回")
 ("佃", "年")
 ("贈", "亙")
 ("襃", "毛")
 ("拜", "怪")
 ("荏", "甚")
 ("允", "準")
 ("赧", "板")
 ("牒", "協")
 ("斗", "口")
 ("曹", "勞")
 ("圓", "權")

**2.2.3. Apply ghehlien:**

In [18]:
ghehlien(lclist)

婁于熱朱足別句滅輸列誅俞隅辥逾俱芻
肌夷尼糾資私脂飢黝
焉軒言
肴孝交嘲茅稍皃覺教
制訐罽蔽袂例憩祭弊
懈隘卦賣
鍾封用凶容頌庸恭
預灼甾姐遮若嗟居車魚奢藥諸邪其而爵洳雀賒與兹之勺略余野持
孟行當盲浪宕剛岡郎庚更
綸筠脣贇倫旬遵勻迍
爲隨倚毀吹危帋垂紙是規綺離移知支累詭彼靡此髓爾侈捶氏隋委豸
隱謹
焮靳
帶太大轄貝艾蓋
敢覽埯
合閤荅沓雜
彪烋幽虯
類醉遂萃
賜避益迹昔智寄積義亦恚易辟豉
佞徑定
哀來開
勞刀遭牢曹
冉廉漸淹炎染琰占斂鹽
政正成盛盈貞姓征并情鄭
貢弄鳳送
晏澗鴈按旰諫案旦贊
妙虐笑肖約
幸耿
蛙緺媧
鑒懺
文倦權員彥變囀攣眷云分戀卷
綏維遺隹追
記既溉志豙吏置
道抱晧老早浩
甚深淫枕針朕稔荏
法乏
贍豔
乎姑吾孤胡都烏吳
皛晈鳥了皎
賄猥罪
典峴殄繭
激弔嘯叫
妹輩昧佩
酉九有柳婦久
麵見電練甸
夥𠁥蟹買
幻幰偃蹇辨免堰
摘核革責厄戹
越拔伐八發黠
戶補賈魯杜古
俾婢企弭
或國
男陷𧸖含南韽
刮䫄
筆乙密
涬冷靈刑萌頂莖爭鼎宏迥丁挺剄耕打經醒
巷絳
界戒怪壞介拜
逼即側力極直
灰恢回杯
恕署
證蒸乘應庱矜冰升膺𩜁仍孕甑兢陵
運問
勒德得則
昆尊䰟渾奔
計詣戾
緣川泉全宣專
四質叱至寐畢必利自一二日悉栗冀器七吉
唾臥钁縛貨籰
朗黨
犯錽范
河何俄歌
令仙扇然連延
店念
六逐菊竹福匊
㢡掌网妄養放昉丈往兩
疋葅
恆滕崩增登棱朋
賀邏个箇佐
訪亮向況讓㨾
候奏漏豆遘
羽甫矩雨武禹
展演翦煙輦先前淺善
真振珍遴鄰印刃覲晉人賓
𩏩嚴
飽巧絞爪
割曷葛達
還鰥關班頑
月厥物勿弗
銜監鑑
牙加霞巴
敏殞
泛終梵中弓眾融戎仲宮
禾婆和過波戈
但寒乾安干
兼甜
檻𣊟暫唵瞰禫濫蹔黤感
里紀史理擬士己
墨北黑
張羊章莊陽良
輒葉攝接涉
丸潘貫喚官端筭
董摠孔動
華瓜花
話夬邁快
玉欲蜀曲錄
在紿改肯等亥愷乃宰
膎佳
哉才
戰膳
冬宗
任林心尋
郤逆戟劇
尾匪
后垢口厚苟斗
恩痕根
霸㕦嫁訝駕化亞
楷皆駭諧
遇注具戍
爇衛歲芮銳輟劣稅
公蠓空紅東凍
扃螢
䒦凡
緩伴管滿旱纂笴
瀌夭表嬌矯喬囂
滑屑忽結蔑骨
內報秏隊繢對
甲狎
閑閒山
謝夜炙
呪宿祐副溜富又救
板綰鯇赧
協愜頰牒
京驚卿
病命
冢奉踵宂勇隴
役隻石
奇羈宜
末撥
桂惠
广奄儉檢險
翼職
篆兗轉緬
迄訖乞
綜宋統
庾主
潁營傾頃䁝
昭遙招
湩𪁪
準尹允
擊狄歷
建阮怨願袁販煩晚元万遠
秋周尤由鳩求州流
外會最

