This repository was archived by the owner on Jun 1, 2023. It is now read-only.
Unicode mixed script confusables #229
Closed
To guard against the confusable attacks described in Unicode TR39, we add the following Unicode script rules for identifiers and literals:
- The first Unicode script found in an identifier that is neither Latin nor Common is the only one allowed. Any other script leads to a parser error.
- Additional Unicode scripts can and should be declared via `use utf8 'Greek', 'script-name2'...` to prevent mixed-script errors. This allows more scripts than rule 1 does. The declaration can be scoped to a block.
- The 'Common' and 'Latin' scripts are always enabled and don't need to be declared.
See http://www.unicode.org/reports/tr39/#Mixed_Script_Detection
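The rules above can be sketched in plain Perl. This is only an illustration, not the parser implementation: `mixed_script_violation` is a hypothetical helper, its `@declared` argument stands in for the scripts named in a `use utf8 ...` declaration, and it assumes that rule 1 still grants one implicit script on top of the declared ones. Treating `Inherited` like `Common` is also an assumption, borrowed from TR39.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Hypothetical helper, not the actual parser code.
# Returns the name of the first offending script in $name,
# or nothing (undef in scalar context) if $name passes the rules.
sub mixed_script_violation {
    my ($name, @declared) = @_;

    # Rule 3: Common and Latin are always enabled.
    # Assumption: Inherited (combining marks) is treated like Common.
    # Rule 2: declared scripts are enabled as well.
    my %allowed = map { $_ => 1 } 'Common', 'Latin', 'Inherited', @declared;

    my $first;    # rule 1: the first script outside %allowed
    for my $ch (split //, $name) {
        my $script = charscript(ord $ch);
        next if !defined $script || $allowed{$script};
        $first //= $script;
        return $script if $script ne $first;    # a second script: error
    }
    return;
}

my $alpha = "\x{3b1}";    # GREEK SMALL LETTER ALPHA
my $a_cyr = "\x{430}";    # CYRILLIC SMALL LETTER A

my $latin_greek    = mixed_script_violation("x$alpha");
my $greek_cyrillic = mixed_script_violation("$alpha$a_cyr");
my $declared       = mixed_script_violation("$alpha$a_cyr", 'Cyrillic');

print "Latin+Greek: ",    $latin_greek    // 'ok', "\n";    # ok
print "Greek+Cyrillic: ", $greek_cyrillic // 'ok', "\n";    # Cyrillic
print "with Cyrillic declared: ", $declared // 'ok', "\n";  # ok
```

A real implementation would raise the error at parse time and track the declared scripts per lexical scope. The helper only shows the decision logic.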
These rules apply to all identifiers (all names: packages, GVs, subs, variables) and to numeric literals.
The script name is returned by `Unicode::UCD::charscript($codepoint_as_uv)`.
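As a quick illustration of that lookup, the snippet below prints the script of a few code points. The `script_of` wrapper is a hypothetical convenience, not part of `Unicode::UCD`. Latin "a" (U+0061) and Cyrillic "а" (U+0430) look alike but report different scripts, which is what the mixed-script check relies on.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript);

# Hypothetical wrapper: script name for a numeric code point,
# or the string 'undef' if charscript() has no answer.
sub script_of {
    my ($cp) = @_;
    return charscript($cp) // 'undef';
}

# Latin a, Cyrillic a, Greek alpha, digit zero
for my $cp (0x61, 0x430, 0x3b1, 0x30) {
    printf "U+%04X %s\n", $cp, script_of($cp);
}
# U+0061 Latin
# U+0430 Cyrillic
# U+03B1 Greek
# U+0030 Common
```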
Currently there exist 131 scripts:
`perl -alne'/; (\w+) #/ && print $1' lib/unicore/Scripts.txt | sort -u > scripts.lst`
Ahom
Anatolian_Hieroglyphs
Arabic
Armenian
Avestan
Balinese
Bamum
Bassa_Vah
Batak
Bengali
Bopomofo
Brahmi
Braille
Buginese
Buhid
Canadian_Aboriginal
Carian
Caucasian_Albanian
Chakma
Cham
Cherokee
Common
Coptic
Cuneiform
Cypriot
Cyrillic
Deseret
Devanagari
Duployan
Egyptian_Hieroglyphs
Elbasan
Ethiopic
Georgian
Glagolitic
Gothic
Grantha
Greek
Gujarati
Gurmukhi
Han
Hangul
Hanunoo
Hatran
Hebrew
Hiragana
Imperial_Aramaic
Inherited
Inscriptional_Pahlavi
Inscriptional_Parthian
Javanese
Kaithi
Kannada
Katakana
Kayah_Li
Kharoshthi
Khmer
Khojki
Khudawadi
Lao
Latin
Lepcha
Limbu
Linear_A
Linear_B
Lisu
Lycian
Lydian
Mahajani
Malayalam
Mandaic
Manichaean
Meetei_Mayek
Mende_Kikakui
Meroitic_Cursive
Meroitic_Hieroglyphs
Miao
Modi
Mongolian
Mro
Multani
Myanmar
Nabataean
New_Tai_Lue
Nko
Ogham
Ol_Chiki
Old_Hungarian
Old_Italic
Old_North_Arabian
Old_Permic
Old_Persian
Old_South_Arabian
Old_Turkic
Oriya
Osmanya
Pahawh_Hmong
Palmyrene
Pau_Cin_Hau
Phags_Pa
Phoenician
Psalter_Pahlavi
Rejang
Runic
Samaritan
Saurashtra
Sharada
Shavian
Siddham
SignWriting
Sinhala
Sora_Sompeng
Sundanese
Syloti_Nagri
Syriac
Tagalog
Tagbanwa
Tai_Le
Tai_Tham
Tai_Viet
Takri
Tamil
Telugu
Thaana
Thai
Tibetan
Tifinagh
Tirhuta
Ugaritic
Vai
Warang_Citi
Yi
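A similar list can be obtained without parsing `Scripts.txt`, using `Unicode::UCD::charscripts()`, which returns a hash reference keyed by script name. This is a sketch rather than a replacement for the one-liner above: the count depends on the Unicode version bundled with the running perl, so it need not be 131, and the exact key set may differ slightly from the list above.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscripts);

# Hash ref: script name => list of code point ranges
my $scripts = charscripts();

my @names = sort keys %$scripts;
printf "%d scripts known to this perl (Unicode %s)\n",
    scalar @names, Unicode::UCD::UnicodeVersion();
print "$_\n" for @names;
```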