This is a project that aims to collect and create accurate written Cantonese subtitles for educational purposes. Accurate subtitles are those that match the spoken dialogue. Written Cantonese is the written form of the Cantonese language, as opposed to Standard Written Chinese, which is what is typically used. Written Cantonese subtitles are seldom used but are very powerful learning resources.
If you would like to contribute to transcripts or subtitles, make a donation, find out about current projects, or simply learn more, please join our Discord server.
Important
Since many of the characters used in these subtitles fall outside the coverage of typical fonts, it is HIGHLY recommended that you install a Cantonese-specific font. We recommend installing one of the fonts from https://github.com/chiron-fonts/chiron-hei-hk.
Written Vernacular Cantonese has no accepted standards; however, establishing our own conventions will make this resource even more useful to learners. With these conventions we can be more specific than ordinary written usage normally is. For example, we have chosen a set of sentence-final particles (SFP) such that each character represents a certain syllable/tone pair. In this way, learners can build a deeper understanding of the language.
There are 3 main resources that served as a starting point.
- https://www.cantonese.com.hk/cantonese/sfp/ - The sentence-final particles are largely based on the table used here with some modifications. 𠵝 is dropped in favor of 呀 due to the former being unsupported by almost all fonts, as well as 呀 being far more common. 可 is dropped in favor of 嗬 for disambiguation. Aside from those exceptions, there are some additional particles (gaa5, laa2, laa6, and zaa6) which exist but which were not mentioned in their table, so we devised our own conventions.
- https://jyutping.org/en/blog/typo/ - Many characters for disambiguation are taken directly from the list here.
- https://words.hk/ - A guiding principle behind the conventions is that they are searchable in words.hk, which is the most comprehensive and accessible Cantonese dictionary. There are a few exceptions for rare SFP, but nearly all selected characters must be searchable. Character variants are also taken directly from what words.hk considers to be the correct Hong Kong variants.
Note
The conventions have been evolving over time and many of the existing subtitles have not been updated in accordance with the latest standards.
| Syllable\Tone | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| aa | 吖 | 嗄 | 啊 | 呀 | 咓 | 𠻺 |
| aak | 𡅅 | |||||
| baa | 罷 | |||||
| bo | 噃 | |||||
| gaa | 𠺢 | 𠿪 | 㗎 | 嘎 | 㗇 | |
| gaak | 𠺝 | |||||
| ge | 𠸏 | 嘅 | ||||
| gwaa | 啩 | |||||
| haa | 吓 | 下 | ||||
| he | 嚱 | |||||
| ho | 嗬 | |||||
| laa | 啦 | 𠸎 | 喇 | 嗱 | 嚹 | |
| laak | 嘞 | |||||
| le | 呢 | 咧 | 哩 | |||
| lo | 囖 | 咯 | 囉 | |||
| lok | 嚛 | |||||
| lu | 嚕 | |||||
| maa | 嘛 | 嗎 | ||||
| me | 咩 | |||||
| tim | 𠻹 | |||||
| waa | 哇 | |||||
| wo | 喎 | 啝 | 𡁜 | |||
| zaa | 吒 | 咋 | 喳 | 𠾵 | 咤 | |
| ze | 啫 | |||||
| zek | 唧 |
| Jyutping / 粵拼 | Honzi / 漢字 |
|---|---|
| a1 maa3 | 吖嘛 |
| a1 naa4 | 吖嗱 |
| a3 ho2 | 啊嗬 |
| a3 haa2 | 啊吓 |
| a6 maa5 | 𠻺嗎 |
| a6 le5 | 𠻺哩 |
| baa2 laa1 | 罷啦 |
| ding2 laa1 | 定啦 |
| ga1 maa3 | 𠺢嘛 |
| ga3 wo3 | 㗎喎 |
| ge3 ne1 | 嘅呢 |
| ge3 ze1 | 嘅啫 |
| ge3 zek1 | 嘅唧 |
| ha6 waa5 | 下哇 |
| la1 maa3 | 啦嘛 |
| la3 wo3 | 喇喎 |
| la6 maa5 | 嚹嗎 |
| za1 maa3 | 吒嘛 |
| za6 maa5 | 咤嗎 |
| Jyutping / 粵拼 | Honzi / 漢字 | Examples / 例子 |
|---|---|---|
| aa3 | 阿 | 阿爸、阿伯、阿明 |
| can1 | 親 | 親隻腳、跌親 |
| dei2 | 地 | 嘛嘛地、悶悶地 |
| di1 | 啲 | 細啲、靚啲 |
| dou2 | 到 | 見到、做唔到 |
| faan1 | 返 | 好返、畀返你 |
| gam2 | 噉 | 噉樣、係噉 |
| gam3 | 咁 | 咁多、咁耐 |
| haa5 | 下 | 睇下、試下、行行下 |
| kiu1 | Q | 痴Q線、做乜Q啊 |
| maai4 | 埋 | 同埋、畀埋、交埋 |
| saai3 | 晒 | 多謝晒、辛苦晒 |
| Jyutping / 粵拼 | Honzi / 漢字 | Explanation / 解釋 |
|---|---|---|
| aai1, aai2 | 唉 | sigh of exasperation |
| ai1 jaa3/5/6, ai1 jaak3 | 哎吔 | |
| ai1 jo3 | 哎喲 | |
| bai6 laa3 | 弊喇 | "oh no" |
| ce1 | 唓 | "tsk" |
| e2, ei2 | 欸 | |
| e4, e6 | 誒 | "uh" |
| hei1 | 嘿 | as a greeting / shows satisfaction |
| hei5 | 唏 | shows discontent |
| hng6 | 哼 | "hmph" |
| hou2 je5 | 好嘢 | woohoo; yeah |
| ji2 | 咦 | |
| m2 | 呣 | "mmm"; sound of enjoyment of food |
| m2, m3, m6 | 嗯 | "hmm"; "um"; "mhmm" |
| naa4 | 嗱 | "look"; call for attention |
| o1 | 喔 | |
| o2, o3, o4, o5, o6 | 哦 | |
| oi2, oi3 | 噯 | variant of 喂 |
| ou3 | 噢 | |
| syu4 | 𭉝 | "shh" |
| u1 | 嗚 | "ooo"; sound of interest/wonder |
| waa1 | 哇 | "wah"; sound of crying |
| waa3, waa4 | 嘩 | "wow" |
| wai2, wai3 | 喂 | |
Where applicable, these Hong Kong variants are used. These map 1:1.
| ✅ Selected Variant | ❌ Other Variants | Jyutping |
|---|---|---|
| 為 | 爲 | wai4 |
| 揾 | 搵 | wan2 |
| 揀 | 㨂 | gaan2 |
| 説 | 說 | syut3 |
| 牀 | 床 | cong4 |
| 群 | 羣 | kwan4 |
| 裏 | 裡 | leoi5 |
| 麪 | 麵 | min6 |
| 教 | 敎 | gaau3 |
| 秘 | 祕 | bei3 |
| 市 | 巿 | si5 |
| 眾 | 衆 | zung3 |
| 濕 | 溼 | sap1 |
| 雞 | 鷄 | gai1 |
| 告 | 吿 | gou3 |
| 污 | 汙 | wu1 |
| 泄 | 洩 | sit3 |
| 罵 | 駡 | maa6 |
| 鏽 | 銹 | sau3 |
| 鈎 | 鉤 | ngau1 |
| 衞 | 衛 | wai6 |
| 葱 | 蔥 | cung1 |
| 豔 | 艷 | jim6 |
| 藥 | 葯 | joek6 |
| 匯 | 滙 | wui6 |
| 啓 | 啟 | kai2 |
| 獎 | 奬 | zoeng2 |
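Because these variants map 1:1, normalizing existing text to the selected forms can be done with a plain character translation. The following is a minimal sketch, assuming Python; `VARIANT_PAIRS` and `normalize_variants` are hypothetical names and the dictionary holds only a sample of the rows above.

```python
# A small sample of the 1:1 pairs from the table above,
# keyed as non-selected variant -> selected variant.
VARIANT_PAIRS = {
    "搵": "揾",
    "床": "牀",
    "羣": "群",
    "裡": "裏",
    "麵": "麪",
    "衆": "眾",
    "溼": "濕",
    "鷄": "雞",
}

# str.maketrans accepts a dict with single-character keys,
# which fits a strict one-to-one mapping.
VARIANT_TABLE = str.maketrans(VARIANT_PAIRS)


def normalize_variants(text: str) -> str:
    """Replace each non-selected variant with the selected Hong Kong variant."""
    return text.translate(VARIANT_TABLE)


if __name__ == "__main__":
    print(normalize_variants("入裡面食麵"))
```

This approach only suits the 1:1 table. Context-dependent choices, where the right character depends on meaning or reading, cannot be handled by a per-character substitution.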
| ✅ Selected Variant | ❌ Other Variants | Jyutping | Explanation |
|---|---|---|---|
| 畀 | 俾 | bei2 | |
| 搞 | 攪 | gaau2 | 指「做」;搞錯 |
| 打攪晒 | 打搞晒 | daa2 gaau2 saai3 | |
| 攰 | 癐 | gui6 | |
| 郁 | 喐 | juk1 | |
| 𦧷 | 舔、lem | lem2 | 用條脷輕輕力掃 |
| 𦧲 | lur | loe1*2 | |
| 唯有 | 惟有 | wai4 jau5 | |
| 只係 | 衹係 | zi2 hai6 | |
| 之不過 | 只不過 | zi1 bat1 gwo3 | |
| 只不過 | 之不過 | zi2 bat1 gwo3 | |
| 唔止 | 唔只 | m4 zi2 | |
| 唔單止 | 唔單只 | m4 daan1 zi2 | |
| 而家 | 宜家 | ji4 gaa1 | |
| 唔使 | 唔駛 | m4 sai2 | |
| 即係 | 姐係、啫係、唧係 | zik1 hai6 | |
| 淨係 | 剩係 | zing6 hai6 | |
| 呢個 | 依個 | ni1 go3 | |
| 依個 | 𠵱個 | ji1 go3 | |
| 傾偈 | 傾計 | king1 gai2 | |
| 抌 | 丼、揼 | dam2 | |
| 𢱕 | 溚、揼 | dap6 | |
| 揼 | 耽 | dam1, dam3, dam6 | |
| 着 | 著 | zoek3, zoek6 | |
| 著 | 着 | zyu3 | |
| 證 | 証 | zing3 | |
| 㧻 | 篤、督、厾 | duk1 | 係動詞,指「刺」、「戳」 |
| 涿 | 篤 | duk1 | 係量詞,指「一涿屎」或「一涿尿」 |
| 督 | 篤 | duk1 | 用於「監督」、「都督」等 |
| 篤 | 督 | duk1 | 用於「篤信」、「篤定」等 |
| 㞘 | 𡰪 | duk1 | 指最尾或末端,例如「行到㞘」 |
| 邊 | 便 | bin1, bin6 | 用於「邊度」、「入邊」 |
| 唞 | 抖 | tau2 | 休息;歇息(早唞、等等) |
| 撳 | 㩒 | gam6 | |
| 爹哋 | 爹地、爹啲 | de1 di4 | |
| BB | 啤啤 | bi4 bi1 | |
| 𠹷 | 哦 | ngo4 | 好煩噉樣批評或者抱怨 |
| 韞 | 困 | wan3 | 局限喺一個地方之內,唔出嚟 |
| ✅ Selected Variant | ❌ Other Variants | Jyutping | Explanation |
|---|---|---|---|
| 𠹻 | 陣 | zam6 | 氣味、風嘅量詞 |
| 𡃴 | 除 | ceoi4 | 臭味 |
| 瀨屎、瀨尿 | 賴屎、賴尿 | laai6 si2, laai6 niu6 | |
| 鬥 | 鬭 | dau3 | 1. 對打 2. 分勝負 3. 花工夫去整一樣嘢 |
| 抖 | 鬥 | dau3 | 摸;掂 |
| 𢯎 | R、摳、撓、𢲷 | ngaau1 | |
| 渣 | 鮓、謯、苴 | zaa2 | |
| 枝 | 支 | zi1 | 指植物或木嘅嘢 |
| 不嬲 | 不溜、不留 | bat1 lau1, bat1 lau2 | 一直 |
| 𢫏 | 冚 | kam2 | 遮住 |
| 扻 | 冚 | kam2 | 掌摑 |
| 撼 | 扻 | ham2 | 撞到 |
| 冚 | | kam2 | 用嚟遮住底下嘅嘢(量詞:個) |
| 冚 | | ham6 | 全部; 接口閂得實 |
| 撼 | 冚 | ham6 | 引起強烈感受 |
| 兇 | 凶 | hung1 | 兇猛、兇手、兇某人 |
| 凶 | 兇 | hung1 | 泛指一啲不祥嘅嘢(凶兆) |
| 錔 | 搭、塔 | taap3 | 用手銬;鎖 |
| 髹 | 油 | jau4 | 用油漆或顏料填上顏色、覆蓋表面 |
| 𨈇 | 𨂾、揇、檻 | laam3 | |
| 讕 | 懶 | laan2 | 扮做;自命 |
| 大部份 | 大部分 | daai6 bou6 fan6 | |
| 過份 | 過分 | gwo3 fan6 | |
| 咭 | 卡 | kaat1 | 例如:信用卡 |
| 卡 | car, carat, 黐住 | kaat1 | |
| 朦 | 矇、蒙 | mung4 | 朦朧;模糊 |
| 賜予 | 賜與 | ci3 jyu5 | |
| 抰 | 揚 | joeng2 | 揮動一件軟軟地嘅物件 |
| 揚 | | joeng4 | 傳揚;張揚 |
| 哽 | 啃、鯁、骾 | kang2 | 夾硬吞落喉嚨;有啲嘢食卡咗喺喉嚨 |
| 𬒔 | 哽 | ang2 | 一啲突起嘅嘢頂住,令人唔舒服或痛 |
| 濕𣲷𣲷 | 濕立立 | sap1 nap6 nap6 | |
| 嗱嗱聲 | 拿拿聲、啦啦聲 | laa4 laa2 seng1 | |
| 倔 | 掘 | gwat6 | 執著;鈍 |
| 籮柚 | 囉柚 | lo1 jau2 | |
| 㓟 | 批、𠜱 | pai1 | 1. 刀法 2. 削走啲嘢 |
| 冧 | 㨆 | lam1 | 1. 甜蜜、氹人 2. 花植物嘅一部分 3. 冧歌 |
| 㨆 | 冧 | lam3, lam6 | 1. 跌倒 2. 堆起 3. 連續 |
| 嘺 | 橋、蹺、巧 | kiu2 | 表示咁啱 |
| 騎呢怪 | 奇離怪 | ke4 le4 gwaai3, ke4 le4 gwaai2 | |
| 淝 | fea、啡、fe | fe4 | |
| 拮 | 㓤 | gat1 | 用尖而幼細嘅嘢插入 |
| 咖哩雞 | 咖喱雞 | gaa3 lei1 gai1 | |
| 掹 | 擝 | mang1 | |
| 拈 | lim、令、捻 | lim1 | 紙嘅單位,通常指500張 |
| 捻 | 掐 | nin2 | 雙手或者多隻手指夾住一嚿嘢 |
| 吼住 | 睺住、喉住 | hau1 zyu6, hau4 zyu6 | 望住 |
| 飆 | 標 | biu1 | |
| 故仔、故事 | 古仔、古事 | gu3 zai2, gu3 si6 | |
| 囈 | 𠼮、誽、𠱓 | ngai1, ai1 | 央求 |
| 氹 | 𠱁、𧨾 | tam3 | 1. 令人開心 2. 哄騙 |
| 凼 | 氹 | tam5 | 1. 水喺凹陷地方 2. 陷阱 |
| 蓆 | 席 | zek6 | 用竹片等材料製成嘅墊 |
| 席 | | zik6 | |
| 𥄫 | gup | gap6 | 1. 偷窺 2. 凝視 |
| 㨃 | 隊 | deoi2 | 1. 捅 2. 短時間內攝取好多嘢 |
| 盟塞 | 盲塞、萌塞 | mang4 sik1, mang4 sak1 | |
| 軟腍腍 | 軟淋淋 | jyun5 nam4 nam4 | |
| 倔頭路 | 掘頭路 | gwat6 tau4 lou6 | |
| 𣲷懦 | 𥹉懦 | nap6 no6 | |
| 䁓 | 裝、𥅾、𥊙 | zong1 | 偷窺 |
| 係咁歹 | 係咁大 | hai6 gam3 daai2 | |
| 係噉咦 | 係咁意 | hai6 gam2 ji2 | |
| urk | 嗝 | oet4, oet6, oek4 | |
| 嘍 | 摟 | lau3 | |
| 篋 | gip、喼 | gip1 | |
| 鋅盤 | sink盤、星盤、等等 | sing1 pun2 |
This section details how the subtitles should look. In general, Traditional characters are used rather than Simplified characters, since the former can always be converted to the latter with relative ease.
The goal of these subtitles is to be as useful to learners as possible. The goal is NOT to be as faithful as possible to the literal utterances of the actors or voice actors. Put another way, we want to capture intended, correct speech, not slips of the tongue or ungrammatical speech. Furthermore, while the subtitles aim for comprehensive coverage of what is said, grunts, yells, laughter, and miscellaneous expressive noises should in general be transcribed sparingly and, in some cases, not at all. Broadly speaking, such subtitles don't contribute to building understanding of the language. To this end, it is recommended to transcribe most interjections only insofar as they are followed by or form part of a longer utterance.
- .srt format
- single line max length is 17.5 characters
The .srt subtitle format is chosen because of its wide-ranging compatibility especially with language learning tools such as pop-up dictionaries.
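The line-length limit can be checked mechanically. The sketch below assumes Python and, importantly, assumes that the fractional limit arises from counting wide (CJK) characters as 1 and all other characters as 0.5. That weighting is a guess, not something this document defines, and `display_length` and `is_within_limit` are hypothetical names.

```python
import unicodedata

MAX_LINE_LENGTH = 17.5  # limit stated in the list above


def display_length(line: str) -> float:
    """Count wide/full-width characters as 1 and everything else as 0.5.

    ASSUMPTION: the half-unit weighting for non-wide characters is a guess
    at how a fractional limit could arise; the conventions do not define it.
    """
    total = 0.0
    for ch in line:
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 1.0
        else:
            total += 0.5
    return total


def is_within_limit(line: str) -> bool:
    """Return True when the line fits within the maximum length."""
    return display_length(line) <= MAX_LINE_LENGTH


if __name__ == "__main__":
    print(display_length("我個名叫Tom"), is_within_limit("我個名叫Tom"))
```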
- lines can appear 0-50ms before the start of the speech
- lines should slightly trail the end of speech (50-100ms) when possible (e.g. no scene change or interruption)
- lines that end within roughly 50ms of a scene change should be synced with the scene change
- lines with a length of 3 characters or more need a minimum duration of 750ms
- this can be shorter in the case of a scene change or based on other factors such as lots of speech or interrupted speech
- lines with a length of 2 or fewer characters don't have to follow that minimum
- background dialogue does not have to be subtitled
- if subtitled, use {\an8} tag to put speech on top
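The minimum-duration rule above can be checked against the timing lines of an .srt file. This is a minimal sketch, assuming Python and single-line cue text; `cue_duration_ms` and `is_too_short` are hypothetical helper names. It only flags candidates, since the scene-change and interrupted-speech exceptions still need human judgment.

```python
import re

MIN_DURATION_MS = 750      # minimum for lines of 3 or more characters
SHORT_LINE_MAX_CHARS = 2   # lines this short are exempt from the minimum

# Standard SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm
TIMING_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def cue_duration_ms(timing_line: str) -> int:
    """Return end minus start for one SRT timing line."""
    match = TIMING_RE.match(timing_line.strip())
    if match is None:
        raise ValueError(f"not an SRT timing line: {timing_line!r}")
    parts = match.groups()
    return to_ms(*parts[4:]) - to_ms(*parts[:4])


def is_too_short(timing_line: str, text: str) -> bool:
    """Flag cues of 3+ characters that are shorter than the minimum duration."""
    if len(text) <= SHORT_LINE_MAX_CHARS:
        return False
    return cue_duration_ms(timing_line) < MIN_DURATION_MS


if __name__ == "__main__":
    print(is_too_short("00:00:01,000 --> 00:00:01,500", "你好嗎"))
```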
| Explanation | Examples |
|---|---|
| Written or background info is enclosed in Chinese parentheses. | (三年前) |
| The titles (of episodes, works, etc.) are enclosed in Chinese double arrow brackets. | 《進擊的巨人》 |
| Secondary titles are separated with a Chinese colon. | 《哈利波特:神秘的魔法石》 |
| Episode titles are enclosed with Chinese square brackets. | [戰士] |
| Miscellaneous titles, such as in on-screen text are enclosed with lenticular brackets. | 【Sub Topic】 |
| A Chinese comma is placed after all SFP, except when followed by 你 without a pause. | ❌好啦我明喇。 ✅好啦,我明喇 ❌好春廢啊,你 ✅好春廢啊你 |
| Dialogue from multiple speakers is split across two lines, each beginning with a hyphen and no following space. | -speaker 1 -speaker 2 |
| Direct speech styling uses Chinese colon followed by dialogue enclosed in left and right Chinese quotation characters. | 我媽媽話:「唔准去嗰度」 |
| When a question is followed by the name of the person being addressed, the question mark is used as the separator instead of a comma, and no question mark follows the name. | ❌你仲喺度,阿明? ✅你仲喺度?阿明 |
| Only 1 Chinese ellipsis character is used (never 2 as in ……). | ❌…… ✅… |
| When an utterance is repeated, transcribe only 1 instance with a trailing Chinese ellipsis character. | ❌ 喂喂喂 ✅ 喂… |
| In the case of interrupted speech, a Chinese ellipsis character is used to mark where the speaker is cut off and a new line begins with the new speech. | -點解你… -唔知啊 |
| In the case of trailing speech, a Chinese ellipsis character is used. | ❌佢唔可以嘅話~~ ✅佢唔可以嘅話… |
| In the case of stammering, the start is separated by a Chinese ellipsis, but this is only done once. | ❌只只不過 ❌只…只…只不過 ✅只…只不過 |
| When listing with 同 or 同埋, Chinese list comma is used on the elements that are not connected with the conjunction. | A、B、C同埋D |
| Subtitles never end in a period and Chinese period is never used. | ❌我個名叫Tom。 ✅我個名叫Tom |
| The middle period is never used. | ❌哈利·波特 ✅哈利波特 |
| Italics are never used. | |
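Several of the punctuation rules in this table are mechanical enough to lint automatically. The following is a rough sketch, assuming Python 3.9+; `RULES` and `lint_line` are hypothetical names, it covers only a handful of the rules, and the choice of which code points count as a "middle period" (U+00B7 and U+2027) and of `<i>` as the italics tag is an assumption.

```python
import re

# Each rule pairs a compiled pattern with a short message.
RULES = [
    (re.compile(r"\u2026{2,}"), "use a single ellipsis character"),
    (re.compile(r"\u3002"), "Chinese period is never used"),
    (re.compile(r"\.$"), "subtitles never end in a period"),
    # ASSUMPTION: U+00B7 and U+2027 are treated as the middle period.
    (re.compile(r"[\u00b7\u2027]"), "middle period is never used"),
    # ASSUMPTION: italics appear as <i>...</i> tags in the .srt file.
    (re.compile(r"</?i>", re.IGNORECASE), "italics are never used"),
    (re.compile(r"^-\s"), "no space after the speaker hyphen"),
]


def lint_line(line: str) -> list[str]:
    """Return the message of every rule that the subtitle line violates."""
    return [message for pattern, message in RULES if pattern.search(line)]


if __name__ == "__main__":
    for sample in ["我個名叫Tom\u3002", "-speaker 1", "- speaker 2"]:
        print(repr(sample), lint_line(sample))
```

Rules that depend on meaning, such as comma placement after SFP or how repeated utterances are collapsed, are left to the transcriber.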
In general, these subtitles are a learning resource. The goal is not to transcribe verbatim all utterances in their entirety. The goal is to have a complete subtitle that contains information useful to the learner. We do not want to include very minor, incidental speech or sounds, or unintentionally incorrect speech. Sentence-final particles are transcribed as accurately as possible to benefit the learner.
| Speech | Example |
|---|---|
| The sound of hesitation, e.g. "uh" (a6 / e6), is only transcribed when drawn out and precedes a longer utterance. When directly following a word, it should not be transcribed and an ellipsis should be used instead. | ✅誒…你係邊個啊? ❌你誒…係邊個啊? ✅你…係邊個啊? ❌佢…誒…佢係…誒…我唔知 ✅佢…佢係…我唔知 |
| The Chinese exclamation point is used sparingly, for example for exceptionally loud or declarative yells, or for emphasis among quieter speech such as when calling someone's name. Even if a character is yelling, ending every line with an exclamation point is discouraged. | |
| Miscellaneous grunts, yells, screams, and the like are not transcribed. | |
| Ah, oh, hmm, huh, mhmm and other acknowledgement noises are transcribed sparingly and primarily in the case that they form part of other utterances. | ❌吓? ✅吓?你講咩啊? |