This project aims to collect and create accurate written Cantonese subtitles for educational purposes. Accurate subtitles are those that match the spoken dialogue. Written Cantonese is the written form of the Cantonese language, as opposed to Standard Written Chinese, which is what is typically used. Written Cantonese subtitles are seldom used but are very powerful learning resources.
If you would like to contribute to transcripts or subtitles, make a donation, find out about current projects, or simply learn more, please join our Discord server.
Important
Since many of the characters used in these subtitles fall outside the coverage of typical fonts, it is HIGHLY recommended that you install a Cantonese-specific font. We recommend installing one of the fonts from https://github.com/chiron-fonts/chiron-hei-hk.
This section details how the subtitles should look. In general, Traditional characters are used rather than Simplified characters, since Traditional text can always be converted to Simplified with relative ease.
The goal of these subtitles is to be as useful to learners as possible. The goal is NOT to be as faithful as possible to the literal utterances of the actors or voice actors. Put another way, we want to capture intended, correct speech, not slips of the tongue or ungrammatical speech. Furthermore, while the subtitles do aim at comprehensive coverage of what is said, grunts, yells, laughter, and miscellaneous expressive noises should in general be transcribed sparingly and, in some cases, not at all. Such subtitles, broadly speaking, don't contribute to building understanding of the language. To this end, it is recommended to transcribe most interjections only insofar as they are followed by or form part of a longer utterance.
- .srt format
- single line max length is 18 characters
The .srt subtitle format is chosen for its wide compatibility, especially with language learning tools such as pop-up dictionaries.
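The 18-character limit can be checked mechanically. The following is a minimal sketch rather than project tooling: `parse_srt`, `overlong_lines`, and `SAMPLE` are hypothetical names, and it assumes that every Unicode code point, punctuation included, counts as one character.

```python
import re

MAX_LINE_LENGTH = 18  # single line max length from the guideline


def parse_srt(text):
    """Split SRT text into (index, timing line, text lines) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.splitlines()
        if len(rows) < 3:
            continue  # skip malformed blocks or cues without text
        cues.append((int(rows[0]), rows[1], rows[2:]))
    return cues


def overlong_lines(text, limit=MAX_LINE_LENGTH):
    """Return (cue index, line) pairs for lines longer than the limit."""
    return [
        (index, line)
        for index, _timing, lines in parse_srt(text)
        for line in lines
        if len(line) > limit  # len() counts code points (assumption)
    ]


SAMPLE = """1
00:00:01,000 --> 00:00:02,500
好啦，我明喇

2
00:00:03,000 --> 00:00:05,000
呢句字幕真係好長好長好長好長好長好長好長
"""

for index, line in overlong_lines(SAMPLE):
    print(f"cue {index}: {len(line)} characters: {line}")
```

Counting code points keeps characters outside the Basic Multilingual Plane, such as 𠻺, at a length of one.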
- lines can appear 0-50ms before the start of the speech
- lines can trail slightly (~0-60ms) after the end of the speech
- lines that end within roughly 20ms of a scene change should be synced with the scene change
- lines with a length of 3 characters or more need a minimum duration of 750ms
  - the minimum duration can be shorter at a scene change or because of other factors, such as dense or interrupted speech
- lines with a length of 2 or fewer characters don't have to follow that minimum
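The minimum-duration rule is the one timing guideline that can be checked from the subtitle file alone. The sketch below is an illustration under stated assumptions: the function names are hypothetical, "length" is taken to mean the longest text line of a cue, and the permitted exceptions (scene changes, dense or interrupted speech) cannot be detected, so a flagged cue is only a candidate for manual review.

```python
import re

MIN_DURATION_MS = 750  # minimum for lines of 3 or more characters
SHORT_LINE_CHARS = 2   # lines this short are exempt from the minimum

TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert SRT timestamp fields to milliseconds."""
    total_seconds = (int(hours) * 60 + int(minutes)) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)


def duration_ms(timing_line):
    """Return the cue duration in milliseconds for an SRT timing line."""
    match = TIMING.match(timing_line.strip())
    if match is None:
        raise ValueError(f"not an SRT timing line: {timing_line!r}")
    parts = match.groups()
    return to_ms(*parts[4:]) - to_ms(*parts[:4])


def too_short(timing_line, text):
    """True when a cue with 3+ characters lasts less than the minimum."""
    # Assumption: the rule applies to the longest line of the cue.
    longest = max((len(line) for line in text.splitlines()), default=0)
    if longest <= SHORT_LINE_CHARS:
        return False
    return duration_ms(timing_line) < MIN_DURATION_MS


print(too_short("00:00:01,000 --> 00:00:01,500", "你係邊個啊"))
```

The lead-in and trailing offsets are not checked here because they depend on the audio, not on the subtitle file.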
Explanation | Examples |
---|---|
Written or background info is enclosed in Chinese parentheses. | （三年前） |
The titles (of episodes, works, etc.) are enclosed in Chinese double arrow brackets. | 《進擊的巨人》 |
Secondary titles are separated with a Chinese colon. | 《哈利波特：神秘的魔法石》 |
Episode titles are enclosed with Chinese square brackets. | ［戰士］ |
Miscellaneous titles, such as those in on-screen text, are enclosed in lenticular brackets. | 【Sub Topic】 |
A Chinese comma is placed after every sentence-final particle (SFP), except when the particle is followed by 你 without a pause. | ❌好啦我明喇。 ✅好啦，我明喇 ❌好春廢啊，你 ✅好春廢啊你 |
Dialogue between multiple speakers uses two lines, each beginning with a hyphen and no following space. | -speaker 1 -speaker 2 |
Direct speech uses a Chinese colon followed by the dialogue enclosed in left and right Chinese quotation marks. | 我媽媽話：「唔准去嗰度」 |
When a question is followed by the name of the person being addressed, the question mark acts as the separator; do not use a comma before the name and a question mark after it. | ❌你仲喺度，阿明？ ✅你仲喺度？阿明 |
Only 1 Chinese ellipsis character is used (never 2 as in ……). | ❌…… ✅… |
When an utterance is repeated, transcribe only 1 instance with a trailing Chinese ellipsis character. | ❌ 喂喂喂 ✅ 喂… |
In the case of interrupted speech, a Chinese ellipsis character is used to mark where the speaker is cut off and a new line begins with the new speech. | -點解你… -唔知啊 |
In the case of trailing speech, a Chinese ellipsis character is used. | ❌佢唔可以嘅話~~ ✅佢唔可以嘅話… |
In the case of stammering, the start is separated by a Chinese ellipsis, but this is only done once. | ❌只只不過 ❌只…只…只不過 ✅只…只不過 |
When listing with 同 or 同埋, a Chinese list comma separates the elements that are not joined by the conjunction. | A、B、C同埋D |
Subtitles never end in a period and Chinese period is never used. | ❌我個名叫Tom。 ✅我個名叫Tom |
The middle period is never used. | ❌哈利·波特 ✅哈利波特 |
Italics are never used. | |
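Several of the rules in the table above are mechanical enough to lint for. The sketch below covers only those rules and is an illustration, not a complete checker: `RULES` and `lint_line` are hypothetical names, the middle period is assumed to be U+00B7, and italics are assumed to appear as `<i>` tags, which is how .srt files commonly mark them.

```python
import re

# (pattern, message) pairs for rules that can be checked per line.
RULES = [
    (re.compile("。"), "the Chinese period is never used"),
    (re.compile(r"\.$"), "subtitles never end in a period"),
    (re.compile("…{2,}"), "only 1 ellipsis character is used"),
    (re.compile("\u00b7"), "the middle period is never used"),
    (re.compile("[~～]+$"), "trailing speech uses an ellipsis"),
    (re.compile("</?i>", re.IGNORECASE), "italics are never used"),
]


def lint_line(line):
    """Return the messages of all rules that a subtitle line breaks."""
    return [message for pattern, message in RULES if pattern.search(line)]


for sample in ["我個名叫Tom。", "我個名叫Tom", "佢唔可以嘅話~~", "只…只不過"]:
    print(sample, lint_line(sample))
```

Rules that depend on meaning, such as repeated utterances or comma placement after particles, are left out because legitimate reduplication (for example 行行下) would trigger false positives.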
In general, these subtitles are a learning resource. The goal is not to transcribe verbatim all utterances in their entirety. The goal is to have a complete subtitle that contains information useful to the learner. We do not want to include very minor, incidental speech or sounds, or unintentionally incorrect speech. Sentence-final particles are transcribed as accurately as possible to benefit the learner.
Speech | Example |
---|---|
The sound of hesitation, e.g. "uh" (a6 / e6), is only transcribed when drawn out and precedes a longer utterance. When directly following a word, it should not be transcribed and an ellipsis should be used instead. | ✅誒…你係邊個啊？ ❌你誒…係邊個啊？ ✅你…係邊個啊？ ❌佢…誒…佢係…誒…我唔知 ✅佢…佢係…我唔知 |
The Chinese exclamation point is used sparingly, for example for exceptionally loud or declarative yells, or for emphasis among quieter speech, such as when calling someone's name. Even if a character is yelling, ending every line with an exclamation point is discouraged. | |
Miscellaneous grunts, yells, screams, and the like are not transcribed. | |
Ah, oh, hmm, huh, mhmm, and other acknowledgement noises are transcribed sparingly, primarily when they form part of a longer utterance. | ❌吓？ ✅吓？你講咩啊？ |
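The hesitation rule in the first row of the table can be expressed as two substitutions. This is a sketch under stated assumptions: `drop_mid_utterance_hesitation` is a hypothetical helper, it handles a single subtitle line and only the filler 誒, and its output is a suggestion to be reviewed rather than an automatic fix.

```python
import re


def drop_mid_utterance_hesitation(line, filler="誒"):
    """Remove a hesitation sound that follows other speech in one line.

    A filler at the very start of the line is kept, since a drawn-out
    hesitation that precedes a longer utterance is transcribed.
    """
    f = re.escape(filler)
    # After an ellipsis, drop the filler together with its own ellipsis.
    line = re.sub(f"(?<=…){f}…", "", line)
    # Directly after a word, drop the filler and keep the ellipsis.
    return re.sub(f"(?<=[^…]){f}(?=…)", "", line)


for sample in ["誒…你係邊個啊", "你誒…係邊個啊", "佢…誒…佢係…誒…我唔知"]:
    print(sample, "->", drop_mid_utterance_hesitation(sample))
```

The lookbehind assertions require at least one preceding character, which is what leaves a line-initial filler untouched.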
The aim behind establishing conventions is to promote consistency that will enhance learning and further promote written Cantonese. These conventions go out of their way to promote unambiguous character usage in many instances.
Note
The conventions have been evolving over time and many of the existing subtitles have not been updated in accordance with the latest standards.
Syllable\Tone | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
aa | 吖 | 嗄 | 啊 | 呀 | 咓 | 𠻺 |
aak | 𡅅 | |||||
baa | 罷 | |||||
bo | 噃 | |||||
gaa | 𠺢 | 𠿪 | 㗎 | 嘎 | ||
gaak | 𠺝 | |||||
ge | 𠸏 | 嘅 | ||||
gwaa | 啩 | |||||
haa | 吓 | 下 | ||||
he | 嚱 | |||||
ho | 嗬 | |||||
laa | 啦 | 喇 | 嗱 | |||
laak | 嘞 | |||||
le | 呢 | 咧 | 哩 | |||
lo | 囖 | 咯 | 囉 | |||
lok | 嚛 | |||||
lu | 嚕 | |||||
maa | 嘛 | 嗎 | ||||
me | 咩 | |||||
tim | 𠻹 | |||||
waa | 哇 | |||||
wo | 喎 | 啝 | 𡁜 | |||
zaa | 吒 | 咋 | 喳 | 𠾵 | 咤 | |
ze | 啫 | |||||
zek | 唧 |
In the interest of clarity, when aa6 maa5 contracts to maa5, it should still be written out in full: 𠻺嗎. However, when aa6 maa5 contracts to aa5, the character 咓 can be used unambiguously.
Jyutping｜粵拼 | Honzi｜漢字 |
---|---|
a1 maa3 | 吖嘛 |
a1 naa4 | 吖嗱 |
a3 ho2 | 啊嗬 |
a3 haa2 | 啊吓 |
a6 maa5 | 𠻺嗎 |
baa2 laa1 | 罷啦 |
ga1 maa3 | 𠺢嘛 |
ga3 wo3 | 㗎喎 |
ge3 ze1 | 嘅啫 |
ge3 zek1 | 嘅唧 |
ha6 waa5 | 下哇 |
la1 maa3 | 啦嘛 |
la3 wo3 | 喇喎 |
za1 maa3 | 吒嘛 |
Jyutping｜粵拼 | Honzi｜漢字 | Examples｜例子 |
---|---|---|
aa3 | 阿 | 阿爸、阿伯、阿明 |
can1 | 親 | 親隻腳、跌親 |
dei2 | 地 | 嘛嘛地、悶悶地 |
di1 | 啲 | 細啲、靚啲 |
dou2 | 到 | 見到、做唔到 |
faan1 | 返 | 好返、畀返你 |
gam2 | 噉 | 噉樣、係噉 |
gam3 | 咁 | 咁多、咁耐 |
haa5 | 下 | 睇下、試下、行行下 |
kiu1 | Q | 痴Q線、做乜Q啊 |
maai4 | 埋 | 同埋、畀埋、交埋 |
saai3 | 晒 | 多謝晒、辛苦晒 |
Jyutping｜粵拼 | Honzi｜漢字 | Explanation｜解釋 |
---|---|---|
aai1, aai2 | 唉 | |
ai1 jaa3/5/6, ai1 jaak3 | 哎吔 | |
ai1 jo3 | 哎喲 | |
ce1 | 唓 | "tsk" |
e2, ei2 | 欸 | |
e4, e6 | 誒 | "uh" |
hei1 | 嘿 | as a greeting / shows satisfaction |
hei5 | 唏 | shows discontent |
hng6 | 哼 | "hmph" |
ji2 | 咦 | |
m2, m3, m6 | 嗯 | "hmm"; "um"; "mhmm" |
naa4 | 嗱 | "look"; call for attention |
o1 | 喔 | |
o2, o3, o4, o5, o6 | 哦 | |
oi2, oi3 | 噯 | variant of 喂 |
ou3 | 噢 | |
syu4 | 𭉝 | "shh" |
u1 | 嗚 | "ooo"; sound of interest/wonder |
waa1 | 哇 | "wah"; sound of crying |
waa3, waa4 | 嘩 | "wow" |
wai2, wai3 | 喂 |
Where applicable, the Hong Kong variant of characters is chosen.
✅ Selected Variant | ❌ Other Variants | Jyutping |
---|---|---|
為 | 爲 | wai4 |
揾 | 搵 | wan2 |
揀 | 㨂 | gaan2 |
説 | 說 | syut3 |
牀 | 床 | cong4 |
群 | 羣 | kwan4 |
裏 | 裡 | leoi5 |
麪 | 麵 | min6 |
教 | 敎 | gaau3 |
秘 | 祕 | bei3 |
市 | 巿 | si5 |
眾 | 衆 | zung3 |
濕 | 溼 | sap1 |
雞 | 鷄 | gai1 |
告 | 吿 | gou3 |
污 | 汙 | wu1 |
泄 | 洩 | sit3 |
罵 | 駡 | maa6 |
鏽 | 銹 | sau3 |
鈎 | 鉤 | ngau1 |
衞 | 衛 | wai6 |
葱 | 蔥 | cung1 |
豔 | 艷 | jim6 |
藥 | 葯 | joek6 |
匯 | 滙 | wui6 |
啓 | 啟 | kai2 |
獎 | 奬 | zoeng2 |
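Because the rows above are one-to-one character choices, they lend themselves to a simple translation table. The sketch below is a minimal illustration that copies only a handful of rows from the table; `HK_VARIANTS` and `normalize_variants` are hypothetical names, and blanket replacement is an assumption, so the result should still be proofread.

```python
# Subset of the table above: other variant -> selected variant.
HK_VARIANTS = {
    "爲": "為",
    "床": "牀",
    "裡": "裏",
    "麵": "麪",
    "鷄": "雞",
    "蔥": "葱",
}

# str.maketrans accepts a dict of single characters to replacements.
VARIANT_TABLE = str.maketrans(HK_VARIANTS)


def normalize_variants(text):
    """Replace non-selected variants with the selected Hong Kong form."""
    return text.translate(VARIANT_TABLE)


print(normalize_variants("張床裡面有碗麵"))
```

Text that already uses the selected variants passes through unchanged, so the function can be applied repeatedly.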
✅ Selected Variant | ❌ Other Variants | Jyutping | Explanation |
---|---|---|---|
畀 | 俾 | bei2 | |
搞 | 攪 | gaau2 | 指「做」;打搞晒 |
攰 | 癐 | gui6 | |
郁 | 喐 | juk1 | |
𦧷 | 舔、lem | lem2 | 用條脷輕輕力掃 |
唔單止 | 唔單只 | m4 daan1 zi2 | |
唔止 | 唔只 | m4 zi2 | |
而家 | 宜家 | ji4 gaa1 | |
唔使 | 唔駛 | m4 sai2 | |
即係 | 姐係、啫係、唧係 | zik1 hai6 | |
淨係 | 剩係 | zing6 hai6 | |
呢個 | 依個 | ni1 go3 | |
依個 | 𠵱個 | ji1 go3 | |
傾偈 | 傾計 | king1 gai2 | |
抌 | 丼、揼 | dam2 | |
𢱕 | 溚、揼 | dap6 | |
揼 | 耽 | dam1, dam3, dam6 | |
着 | 著 | zoek3, zoek6 | |
著 | 着 | zyu3 | |
兇 | 凶 | hung1 | 兇猛、兇手、兇某人 |
凶 | 兇 | hung1 | 泛指一啲不祥嘅嘢(凶兆) |
證 | 証 | zing3 | |
㧻 | 篤、督、厾 | duk1 | 係動詞,指「刺」、「戳」 |
涿 | 篤 | duk1 | 係量詞,指「一涿屎」或「一涿尿」 |
督 | 篤 | duk1 | 用於「監督」、「都督」等 |
篤 | 督 | duk1 | 用於「篤信」、「篤定」等 |
㞘 | 𡰪 | duk1 | 指最尾或末端,例如「行到㞘」 |
邊 | 便 | bin1, bin6 | 用於「邊度」、「入邊」 |
弊喇 | 𡃇喇 | bai6 laa3 | |
瀨屎、瀨尿 | 賴屎、賴尿 | laai6 si2, laai6 niu6 | |
鬥 | 鬭 | dau3 | 1. 對打 2. 分勝負 3. 花工夫去整一樣嘢 |
抖 | 鬥 | dau3 | 摸;掂 |
唞 | 抖 | tau2 | 休息;歇息(早唞、等等) |
渣 | 鮓、謯、苴 | zaa2 | |
撳 | 㩒 | gam6 | |
好嘢 | 好耶 | hou2 je5 | |
𢯎 | R、摳、撓、𢲷 | ngaau1 | |
𡃴 | 除 | ceoi4 | 臭味 |
枝 | 支 | zi1 | 指植物或木嘅嘢 |
不嬲 | 不溜、不留 | bat1 lau1, bat1 lau2 | 一直 |
𢫏 | 冚 | kam2 | 遮住 |
扻 | 冚 | kam2 | 掌摑 |
撼 | 扻 | ham2 | 撞到 |
冚 | | kam2 | 用嚟遮住底下嘅嘢(量詞:個) |
冚 | | ham6 | 全部; 接口閂得實 |
撼 | 冚 | ham6 | 引起強烈感受 |
爹哋 | 爹地、爹啲 | de1 di4 | |
BB | 啤啤 | bi4 bi1 | |
錔 | 搭、塔 | taap3 | 用手銬;鎖 |
𠹷 | 哦 | ngo4 | 好煩噉樣批評或者抱怨 |
髹 | 油 | jau4 | 用油漆或顏料填上顏色、覆蓋表面 |
𨈇 | 𨂾、揇、檻 | laam3 | |
讕 | 懶 | laan2 | 扮做;自命 |
大部份 | 大部分 | daai6 bou6 fan6 | |
過份 | 過分 | gwo3 fan6 | |
咭 | 卡 | kaat1 | 例如:信用卡 |
卡 | car, carat, 黐住 | kaat1 | |
朦 | 矇、蒙 | mung4 | 朦朧;模糊 |
賜予 | 賜與 | ci3 jyu5 | |
抰 | 揚 | joeng2 | 揮動一件軟軟地嘅物件 |
揚 | | joeng4 | 傳揚;張揚 |
哽 | 啃、鯁、骾 | kang2 | 夾硬吞落喉嚨;有啲嘢食卡咗喺喉嚨 |
𬒔 | 哽 | ang2 | 一啲突起嘅嘢頂住,令人唔舒服或痛 |
濕𣲷𣲷 | 濕立立 | sap1 nap6 nap6 | |
嗱嗱聲 | 拿拿聲、啦啦聲 | laa4 laa2 seng1 | |
倔 | 掘 | gwat6 | 執著;鈍 |
籮柚 | 囉柚 | lo1 jau2 | |
㓟 | 批、𠜱 | pai1 | 1. 刀法 2. 削走啲嘢 |
唯有 | 惟有 | wai4 jau5 | |
惜 | 錫 | sek3 | 1. 愛、關心、緊張 2. 用嘴唇掂另一個人嘅身體 |
錫 | | sek6 | 一種金 |
冧 | 㨆 | lam1 | 1. 甜蜜、氹人 2. 花植物嘅一部分 3. 冧歌 |
㨆 | 冧 | lam3, lam6 | 1. 跌倒 2. 堆起 3. 連續 |
嘺 | 橋、蹺、巧 | kiu2 | 表示咁啱 |
騎呢怪 | 奇離怪 | ke4 le4 gwaai3, ke4 le4 gwaai2 | |
淝 | fea、啡、fe | fe4 | |
林沈 | 淋糝、林審 | lam4 sam2 | |
拮 | 㓤 | gat1 | 用尖而幼細嘅嘢插入 |
咖哩雞 | 咖喱雞 | gaa3 lei1 gai1 | |
𦧲 | lur | loe1*2 | |
掹 | 擝 | mang1 | |
拈 | lim、令、捻 | lim1 | 紙嘅單位,通常指500張 |
捻 | 掐 | nin2 | 雙手或者多隻手指夾住一嚿嘢 |
吼住 | 睺住、喉住 | hau1 zyu6, hau4 zyu6 | 望住 |
飆 | 標 | biu1 | |
故仔、故事 | 古仔、古事 | gu3 zai2, gu3 si6 | |
囈 | 𠼮、誽、𠱓 | ngai1, ai1 | 央求 |
氹 | 𠱁、𧨾 | tam3 | 1. 令人開心 2. 哄騙 |
凼 | 氹 | tam5 | 1. 水喺凹陷地方 2. 陷阱 |
蓆 | 席 | zek6 | 用竹片等材料製成嘅墊 |
席 | | zik6 | |
𥄫 | gup | gap6 | 1. 偷窺 2. 凝視 |
㨃 | 隊 | deoi2 | 1. 捅 2. 短時間內攝取好多嘢 |
盟塞 | 盲塞、萌塞 | mang4 sik1, mang4 sak1 | |
軟腍腍 | 軟淋淋 | jyun5 nam4 nam4 | |
倔頭路 | 掘頭路 | gwat6 tau4 lou6 | |
𣲷懦 | 𥹉懦 | nap6 no6 | |
䁓 | 裝、𥅾、𥊙 | zong1 | 偷窺 |
韞 | 困 | wan3 | 局限喺一個地方之內,唔出嚟 |
係咁歹 | 係咁大 | hai6 gam3 daai2 | |
係噉咦 | 係咁意 | hai6 gam2 ji2 | |
urk | 嗝 | oet4, oet6, oek4 | |
嘍 | 摟 | lau3 | |
篋 | gip、喼 | gip1 | |
鋅盤 | sink盤、星盤、等等 | sing1 pun2 |
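Many rows in the table above depend on meaning (for example 兇 versus 凶, or the several characters read duk1), so they cannot be replaced blindly. Multi-character spellings are safer to flag. The sketch below is an illustration only: `PREFERRED` holds a few rows copied from the table, `suggest_spellings` is a hypothetical helper, and it reports suggestions instead of editing the text.

```python
# Subset of the table above: other spelling -> selected spelling.
PREFERRED = {
    "宜家": "而家",
    "唔駛": "唔使",
    "傾計": "傾偈",
    "剩係": "淨係",
    "好耶": "好嘢",
    "唔單只": "唔單止",
}


def suggest_spellings(line):
    """Return (position, found, preferred) tuples sorted by position."""
    hits = []
    for found, preferred in PREFERRED.items():
        start = line.find(found)
        while start != -1:
            hits.append((start, found, preferred))
            start = line.find(found, start + len(found))
    return sorted(hits)


for position, found, preferred in suggest_spellings("宜家唔駛傾計"):
    print(f"column {position}: {found} -> {preferred}")
```

Reporting positions rather than rewriting keeps the final decision with the transcriber, which matters for entries whose correct form depends on context.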