uniprop needs tests for all unicode properties #195

Open
samcv opened this issue Dec 13, 2016 · 7 comments

@samcv (Contributor) commented Dec 13, 2016

Emoji:

See http://unicode.org/reports/tr51/#Data_Files for how these are determined (they are Boolean).
The values are stored in http://unicode.org/Public/emoji/latest/emoji-data.txt.
If you haven't seen it before, this is a really great resource for checking which characters have a given property:

http://unicode.org/cldr/utility/properties.html

  • Emoji
  • Emoji_Presentation
  • Emoji_Modifier
  • Emoji_Modifier_Base
  • Emoji_All
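Because these emoji properties are Boolean and emoji-data.txt lists them as codepoint ranges, expected test values can be read straight from the file. The Python sketch below assumes the usual UCD line layout (`XXXX..YYYY ; Property # comment`); `parse_boolean_ranges`, `has_property` and the sample lines are hypothetical illustrations, not part of any existing test suite.

```python
def parse_boolean_ranges(lines):
    """Parse UCD-style 'XXXX..YYYY ; Property # comment' lines.

    Returns {property_name: set_of_codepoints}. Every listed codepoint has
    the property (True); anything not listed defaults to False.
    """
    props = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        cps, prop = (field.strip() for field in data.split(";", 1))
        lo, _, hi = cps.partition("..")  # single codepoint or a range
        first, last = int(lo, 16), int(hi or lo, 16)
        props.setdefault(prop, set()).update(range(first, last + 1))
    return props


def has_property(props, prop, cp):
    """Boolean lookup for one codepoint."""
    return cp in props.get(prop, set())


# Hypothetical sample lines in the emoji-data.txt layout (not a copy of the file).
SAMPLE = [
    "# comment lines and blank lines are skipped",
    "",
    "0023          ; Emoji            # number sign",
    "1F3FB..1F3FF  ; Emoji_Modifier   # skin tone modifiers",
]
EMOJI = parse_boolean_ranges(SAMPLE)
```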

Emoji_Zwj_Sequences applies to sequences of codepoints rather than to single codepoints, so we may need a separate routine to test it.

  • Emoji_Zwj_Sequences
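A ZWJ sequence is a string, not a codepoint, so a test needs both the sequence and its listed type. A minimal sketch, assuming lines in the emoji-zwj-sequences.txt style (`CP CP CP ; Type ; name # comment`); the helper names are hypothetical, and `is_zwj_shaped` only checks structure, not whether the sequence is actually listed in the data file.

```python
ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER


def parse_sequence_line(line):
    """Parse 'CP CP CP ; Type ; name # comment' into (sequence, type)."""
    data = line.split("#", 1)[0]
    fields = [f.strip() for f in data.split(";")]
    # The first field is a space-separated list of hex codepoints.
    seq = "".join(chr(int(h, 16)) for h in fields[0].split())
    return seq, fields[1]


def is_zwj_shaped(seq):
    """Structural check only: two or more non-empty pieces joined by ZWJ."""
    parts = seq.split(ZWJ)
    return len(parts) >= 2 and all(parts)
```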

Numeric Properties

  • cjkAccountingNumeric ; kAccountingNumeric
  • cjkOtherNumeric ; kOtherNumeric
  • cjkPrimaryNumeric ; kPrimaryNumeric
  • nv ; Numeric_Value
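For nv, reference values can be cross-checked against another Unicode implementation. This sketch uses Python's `unicodedata.numeric`, which returns a float, and converts the result back to an exact fraction, since the UCD writes Numeric_Value as a rational. Assumptions: `numeric_value` is a hypothetical helper, every denominator is below 1000, and Python's bundled Unicode version may differ from the one Rakudo uses, so recently added characters can disagree.

```python
import unicodedata
from fractions import Fraction


def numeric_value(ch):
    """Numeric_Value (nv) of one character as an exact Fraction.

    unicodedata.numeric() returns a float; limiting the denominator to
    1000 (an assumption about UCD data) recovers values such as 1/3.
    Returns None where the character has no numeric value (nv=NaN).
    """
    value = unicodedata.numeric(ch, None)
    if value is None:
        return None
    return Fraction(value).limit_denominator(1000)
```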

String Properties

  • cf ; Case_Folding
  • cjkCompatibilityVariant ; kCompatibilityVariant
  • dm ; Decomposition_Mapping
  • FC_NFKC ; FC_NFKC_Closure
  • lc ; Lowercase_Mapping
  • NFKC_CF ; NFKC_Casefold
  • scf ; Simple_Case_Folding ; sfc
  • slc ; Simple_Lowercase_Mapping
  • stc ; Simple_Titlecase_Mapping
  • suc ; Simple_Uppercase_Mapping
  • tc ; Titlecase_Mapping
  • uc ; Uppercase_Mapping
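Python's `str` case methods apply the full (not simple) case mappings, and `unicodedata.decomposition` returns the raw dm field, so together they give quick reference values for uc, lc, tc, cf and dm. A sketch under that assumption; `string_props` is hypothetical. Python exposes no simple mappings (slc, suc, stc, scf), so those would still need values taken from UnicodeData.txt.

```python
import unicodedata


def string_props(ch):
    """Reference values for some string-valued properties of one character."""
    return {
        "uc": ch.upper(),      # full Uppercase_Mapping (may lengthen the string)
        "lc": ch.lower(),      # full Lowercase_Mapping
        "tc": ch.title(),      # on a single character: full Titlecase_Mapping
        "cf": ch.casefold(),   # full Case_Folding
        "dm": unicodedata.decomposition(ch),  # raw UCD field, '' if none
    }
```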

Miscellaneous Properties

  • bmg ; Bidi_Mirroring_Glyph
  • bpb ; Bidi_Paired_Bracket
  • cjkIICore ; kIICore
  • cjkIRG_GSource ; kIRG_GSource
  • cjkIRG_HSource ; kIRG_HSource
  • cjkIRG_JSource ; kIRG_JSource
  • cjkIRG_KPSource ; kIRG_KPSource
  • cjkIRG_KSource ; kIRG_KSource
  • cjkIRG_MSource ; kIRG_MSource
  • cjkIRG_TSource ; kIRG_TSource
  • cjkIRG_USource ; kIRG_USource
  • cjkIRG_VSource ; kIRG_VSource
  • cjkRSUnicode ; kRSUnicode ; Unicode_Radical_Stroke ; URS
  • isc ; ISO_Comment
  • JSN ; Jamo_Short_Name
  • na ; Name
  • na1 ; Unicode_1_Name
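Some Name values are derived rather than stored per character. For precomposed Hangul syllables, Name is built from the Jamo_Short_Name values of the syllable's leading consonant, vowel and optional trailing consonant, following the Hangul syllable algorithm in chapter 3 of the Unicode Standard. A sketch of that derivation, which also offers a way to test JSN and Name together; the function names are hypothetical, and the cross-check relies on Python's own name table.

```python
import unicodedata

# Jamo_Short_Name (JSN) values for leading consonants, vowels and trailing
# consonants, in the order used by the Hangul syllable algorithm.
JSN_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S", "SS", "",
         "J", "JJ", "C", "K", "T", "P", "H"]
JSN_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA", "WAE",
         "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JSN_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG", "LM", "LB",
         "LS", "LT", "LP", "LH", "M", "B", "BS", "S", "SS", "NG", "J", "C",
         "K", "T", "P", "H"]

S_BASE = 0xAC00
T_COUNT = len(JSN_T)                 # 28 (including "no trailing consonant")
N_COUNT = len(JSN_V) * T_COUNT       # 588 syllables per leading consonant
S_COUNT = len(JSN_L) * N_COUNT       # 11172 precomposed syllables


def hangul_syllable_name(cp):
    """Derive Name for a precomposed Hangul syllable from Jamo short names."""
    s = cp - S_BASE
    if not 0 <= s < S_COUNT:
        raise ValueError(f"U+{cp:04X} is not a precomposed Hangul syllable")
    l, v, t = s // N_COUNT, (s % N_COUNT) // T_COUNT, s % T_COUNT
    return "HANGUL SYLLABLE " + JSN_L[l] + JSN_V[v] + JSN_T[t]


def matches_unicodedata():
    """Cross-check every syllable against Python's own name table."""
    return all(hangul_syllable_name(cp) == unicodedata.name(chr(cp))
               for cp in range(S_BASE, S_BASE + S_COUNT))
```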

Name_Alias and Script_Extensions can hold multiple values. It is not yet determined how we will access them once they are added to a backend.

  • Name_Alias ; Name_Alias
  • scx ; Script_Extensions
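Whatever access API is chosen, the data itself is multi-valued: in NameAliases.txt a codepoint can appear on several lines, each with its own alias and type. A minimal Python sketch that keeps every value instead of overwriting earlier ones; `parse_name_aliases` is hypothetical, and the same collect-into-a-list approach would apply to the space-separated value field of ScriptExtensions.txt.

```python
from collections import defaultdict


def parse_name_aliases(lines):
    """Parse NameAliases.txt-style 'CODE;ALIAS;TYPE' lines.

    A codepoint may appear on several lines, so each one maps to a list of
    (alias, type) pairs in file order rather than to a single value.
    """
    aliases = defaultdict(list)
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        code, alias, kind = (f.strip() for f in data.split(";"))
        aliases[int(code, 16)].append((alias, kind))
    return dict(aliases)
```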

Catalog Properties

  • age ; Age
  • blk ; Block
  • sc ; Script
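Block is stored in Blocks.txt as non-overlapping ranges (`XXXX..YYYY; Block Name`), so a lookup can be a binary search over the range starts. A sketch with hypothetical helper names; `No_Block` is the UCD default value for codepoints outside every block.

```python
from bisect import bisect_right


def parse_blocks(lines):
    """Parse Blocks.txt-style 'XXXX..YYYY; Block Name' lines into a sorted list."""
    blocks = []
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        cps, name = (f.strip() for f in data.split(";", 1))
        lo, hi = cps.split("..")
        blocks.append((int(lo, 16), int(hi, 16), name))
    blocks.sort()
    return blocks


def block_of(blocks, cp):
    """Return the Block (blk) value for cp."""
    # (cp, inf) sorts after every block starting at or before cp, so the
    # entry just before the insertion point is the only candidate.
    i = bisect_right(blocks, (cp, float("inf"))) - 1
    if i >= 0 and cp <= blocks[i][1]:
        return blocks[i][2]
    return "No_Block"
```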

Enumerated Properties

  • bc ; Bidi_Class
  • bpt ; Bidi_Paired_Bracket_Type
  • ccc ; Canonical_Combining_Class
  • dt ; Decomposition_Type
  • ea ; East_Asian_Width
  • gc ; General_Category
  • GCB ; Grapheme_Cluster_Break
  • hst ; Hangul_Syllable_Type
  • InPC ; Indic_Positional_Category
  • InSC ; Indic_Syllabic_Category
  • jg ; Joining_Group
  • jt ; Joining_Type
  • lb ; Line_Break
  • NFC_QC ; NFC_Quick_Check
  • NFD_QC ; NFD_Quick_Check
  • NFKC_QC ; NFKC_Quick_Check
  • NFKD_QC ; NFKD_Quick_Check
  • nt ; Numeric_Type
  • SB ; Sentence_Break
  • WB ; Word_Break
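Several of these enumerated properties are exposed directly by Python's `unicodedata`, which makes it easy to collect reference values for hand-written tests. A sketch mapping short property names to those functions; the `ENUMERATED` table is hypothetical, Python returns ccc as an integer, and its Unicode version may not match Rakudo's.

```python
import unicodedata

# Short UCD property name -> unicodedata function returning its value.
ENUMERATED = {
    "gc": unicodedata.category,
    "bc": unicodedata.bidirectional,
    "ccc": unicodedata.combining,
    "ea": unicodedata.east_asian_width,
}


def expected_values(ch):
    """Reference values for the properties in ENUMERATED."""
    return {prop: fn(ch) for prop, fn in ENUMERATED.items()}
```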

Binary Properties

  • AHex ; ASCII_Hex_Digit
  • Alpha ; Alphabetic
  • Bidi_C ; Bidi_Control
  • Bidi_M ; Bidi_Mirrored
  • Cased ; Cased
  • CE ; Composition_Exclusion
  • CI ; Case_Ignorable
  • Comp_Ex ; Full_Composition_Exclusion
  • CWCF ; Changes_When_Casefolded
  • CWCM ; Changes_When_Casemapped
  • CWKCF ; Changes_When_NFKC_Casefolded
  • CWL ; Changes_When_Lowercased
  • CWT ; Changes_When_Titlecased
  • CWU ; Changes_When_Uppercased
  • Dash ; Dash
  • Dep ; Deprecated
  • DI ; Default_Ignorable_Code_Point
  • Dia ; Diacritic
  • Ext ; Extender
  • Gr_Base ; Grapheme_Base
  • Gr_Ext ; Grapheme_Extend
  • Gr_Link ; Grapheme_Link
  • Hex ; Hex_Digit
  • Hyphen ; Hyphen
  • IDC ; ID_Continue
  • Ideo ; Ideographic
  • IDS ; ID_Start
  • IDSB ; IDS_Binary_Operator
  • IDST ; IDS_Trinary_Operator
  • Join_C ; Join_Control
  • LOE ; Logical_Order_Exception
  • Lower ; Lowercase
  • Math ; Math
  • NChar ; Noncharacter_Code_Point
  • OAlpha ; Other_Alphabetic
  • ODI ; Other_Default_Ignorable_Code_Point
  • OGr_Ext ; Other_Grapheme_Extend
  • OIDC ; Other_ID_Continue
  • OIDS ; Other_ID_Start
  • OLower ; Other_Lowercase
  • OMath ; Other_Math
  • OUpper ; Other_Uppercase
  • Pat_Syn ; Pattern_Syntax
  • Pat_WS ; Pattern_White_Space
  • PCM ; Prepended_Concatenation_Mark
  • QMark ; Quotation_Mark
  • Radical ; Radical
  • SD ; Soft_Dotted
  • STerm ; Sentence_Terminal
  • Term ; Terminal_Punctuation
  • UIdeo ; Unified_Ideograph
  • Upper ; Uppercase
  • VS ; Variation_Selector
  • WSpace ; White_Space ; space
  • XIDC ; XID_Continue
  • XIDS ; XID_Start
  • XO_NFC ; Expands_On_NFC
  • XO_NFD ; Expands_On_NFD
  • XO_NFKC ; Expands_On_NFKC
  • XO_NFKD ; Expands_On_NFKD
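Many of the binary properties are derived. DerivedCoreProperties.txt defines, for example, Changes_When_Lowercased as `toLowercase(toNFD(X)) != toNFD(X)`, with CWU and CWCF defined analogously. A sketch of those definitions in Python, assuming `str.lower`, `str.upper` and `str.casefold` stand in for the full case mappings (they ignore language-specific tailoring); the function names are hypothetical.

```python
import unicodedata


def _changes(ch, mapping):
    """True if applying the mapping to the NFD form changes it."""
    nfd = unicodedata.normalize("NFD", ch)
    return mapping(nfd) != nfd


def cwl(ch):
    """Changes_When_Lowercased: toLowercase(toNFD(X)) != toNFD(X)."""
    return _changes(ch, str.lower)


def cwu(ch):
    """Changes_When_Uppercased: toUppercase(toNFD(X)) != toNFD(X)."""
    return _changes(ch, str.upper)


def cwcf(ch):
    """Changes_When_Casefolded: toCasefold(toNFD(X)) != toNFD(X)."""
    return _changes(ch, str.casefold)
```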

Total: 118 properties, plus 6 Emoji properties

Implementation-specific properties

These are not official Unicode properties and should not have tests written for them. They are listed here for completeness.

  • MVM_COLLATION_PRIMARY
  • MVM_COLLATION_SECONDARY
  • MVM_COLLATION_TERTIARY
  • MVM_COLLATION_QC
  • Numeric_Value_Numerator
  • Numeric_Value_Denominator
  • NFG_QC
@zoffixznet (Contributor)

Can this be generated somehow instead of being done manually?

@samcv (Contributor, Author) commented Dec 24, 2016

To generate them, I would probably have to write a script that reads the UNIDATA files. It may make more sense to generate them once some tests are already in place; otherwise, how would we test the generator?

More importantly, many of these are derived properties whose values depend on how a number of other properties are set, so generating them would be quite hard.

Once we have tests for all of them, it may be a good idea to add generation for the properties that are easy to generate (those that don't depend on many other properties).
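One way to keep a generator testable is to make its output simple and checkable: for each range in a Boolean data file, emit tests at both edges and at the codepoints just outside. A sketch, assuming ranges have already been parsed into `(lo, hi)` pairs; `boundary_cases` and `emit_tests` are hypothetical, and the emitted `is uniprop(...)` line format is an assumption that would need to match the style actually used in S15-unicode-information/uniprop.t. Generated tests like these are only as reliable as the data file they come from.

```python
def boundary_cases(ranges):
    """Return sorted (codepoint, expected) pairs at and just outside each range edge."""
    def inside(cp):
        return any(lo <= cp <= hi for lo, hi in ranges)

    cases = set()
    for lo, hi in ranges:
        for cp in (lo - 1, lo, hi, hi + 1):
            if 0 <= cp <= 0x10FFFF:  # stay within the Unicode codespace
                cases.add((cp, inside(cp)))
    return sorted(cases)


def emit_tests(prop, ranges):
    """Emit one hypothetical Raku test line per boundary case."""
    return [
        f"is uniprop(0x{cp:04X}, '{prop}'), {expected}, 'U+{cp:04X} {prop}';"
        for cp, expected in boundary_cases(ranges)
    ]
```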

samcv self-assigned this Dec 24, 2016
@flexibeast (Contributor)

I'd like to help out with this; can I just start adding tests for properties that haven't been checked off yet?

@samcv (Contributor, Author) commented Jan 16, 2017

Yep! See S15-unicode-information/uniprop.t for the properties

@flexibeast (Contributor)

Great, thanks! Already looking at it. :-)

@flexibeast (Contributor)

Okay, I've now made PR #222. It only changes five properties, so you can check that I'm on the right track.

@samcv (Contributor, Author) commented Jan 16, 2017

Good, thank you :)
