/
README
230 lines (173 loc) · 9.88 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
U
U extends Ruby’s Unicode support. It provides a string class called
U::String with an interface that mimics that of the String class in Ruby 2.0,
but that can also be used from both Ruby 1.8. This interface also has more
complete Unicode support and never modifies the receiver. Thus, a U::String
is an immutable value object.
U comes with complete and very accurate documentation¹. The documentation can
realistically also be used as a reference to the Ruby String API and may
actually be preferable, as it’s a lot more explicit and complete than the
documentation that comes with Ruby.
¹ See http://disu.se/software/u-1.0/api/
§ Installation
Install u with
% gem install u
§ Usage
Usage is basically the following:
require 'u-1.0'
a = 'äbc'
a.upcase # ⇒ 'äBC'
a.u.upcase # ⇒ 'ÄBC'
That is, you require the library, then you invoke #u on a String. This’ll
give you a U::String that has much better Unicode support than a normal
String. It’s important to note that U only uses UTF-8, which means that #u
will try to #encode the String as such. This shouldn’t be an issue in most
cases, as UTF-8 is now more or less the universal encoding – and rightfully
so.
As U::Strings¹ are immutable value objects, there’s also a U::Buffer²
available for building U::Strings efficiently.
See the API³ for more complete usage information. The following sections
will only cover the extensions and differences that U::String exhibit from
Ruby’s built-in String class.
¹ See http://disu.se/software/u-1.0/api/U/String/
² See http://disu.se/software/u-1.0/api/U/Buffer/
³ See http://disu.se/software/u-1.0/api/
§ Unicode Properties
There are quite a few property-checking interrogators that let you check
if all characters in a U::String have the given Unicode property:
• #alnum?¹
• #alpha?²
• #assigned?³
• #case_ignorable?⁴
• #cased?⁵
• #cntrl?⁶
• #defined?⁷
• #digit?⁸
• #graph?⁹
• #newline?¹⁰
• #print?¹¹
• #punct?¹²
• #soft_dotted?¹³
• #space?¹⁴
• #title?¹⁵
• #valid?¹⁶
• #wide?¹⁷
• #wide_cjk?¹⁸
• #xdigit?¹⁹
• #zero_width?²⁰
¹ See http://disu.se/software/u-1.0/api/U/String/#alnum-p-instance-method
² See http://disu.se/software/u-1.0/api/U/String/#alpha-p-instance-method
³ See http://disu.se/software/u-1.0/api/U/String/#assigned-p-instance-method
⁴ See http://disu.se/software/u-1.0/api/U/String/#case_ignorable-p-instance-method
⁵ See http://disu.se/software/u-1.0/api/U/String/#cased-p-instance-method
⁶ See http://disu.se/software/u-1.0/api/U/String/#cntrl-p-instance-method
⁷ See http://disu.se/software/u-1.0/api/U/String/#defined-p-instance-method
⁸ See http://disu.se/software/u-1.0/api/U/String/#digit-p-instance-method
⁹ See http://disu.se/software/u-1.0/api/U/String/#graph-p-instance-method
¹⁰ See http://disu.se/software/u-1.0/api/U/String/#newline-p-instance-method
¹¹ See http://disu.se/software/u-1.0/api/U/String/#print-p-instance-method
¹² See http://disu.se/software/u-1.0/api/U/String/#punct-p-instance-method
¹³ See http://disu.se/software/u-1.0/api/U/String/#soft_dotted-p-instance-method
¹⁴ See http://disu.se/software/u-1.0/api/U/String/#space-p-instance-method
¹⁵ See http://disu.se/software/u-1.0/api/U/String/#title-p-instance-method
¹⁶ See http://disu.se/software/u-1.0/api/U/String/#valid-p-instance-method
¹⁷ See http://disu.se/software/u-1.0/api/U/String/#wide-p-instance-method
¹⁸ See http://disu.se/software/u-1.0/api/U/String/#wide_cjk-p-instance-method
¹⁹ See http://disu.se/software/u-1.0/api/U/String/#xdigit-p-instance-method
²⁰ See http://disu.se/software/u-1.0/api/U/String/#zero_width-p-instance-method
Similar to these methods are
• #folded?¹
• #lower?²
• #upper?³
which check whether a ‹U::String› has been cased in a given manner.
¹ See http://disu.se/software/u-1.0/api/U/String/#folded-p-instance-method
² See http://disu.se/software/u-1.0/api/U/String/#lower-p-instance-method
³ See http://disu.se/software/u-1.0/api/U/String/#upper-p-instance-method
There’s also a #normalized?¹ method that checks whether a ‹U::String› has
been normalized on a given form.
¹ See http://disu.se/software/u-1.0/api/U/String/#normalized-p-instance-method
You can also access certain Unicode properties of the characters of a
U::String:
• #canonical_combining_class¹
• #general_category²
• #grapheme_break³
• #line_break⁴
• #script⁵
• #word_break⁶
¹ See http://disu.se/software/u-1.0/api/U/String/#canonical_combining_class-instance-method
² See http://disu.se/software/u-1.0/api/U/String/#general_category-instance-method
³ See http://disu.se/software/u-1.0/api/U/String/#grapheme_break-instance-method
⁴ See http://disu.se/software/u-1.0/api/U/String/#line_break-instance-method
⁵ See http://disu.se/software/u-1.0/api/U/String/#script-instance-method
⁶ See http://disu.se/software/u-1.0/api/U/String/#word_break-instance-method
§ Locale-specific Comparisons
Comparisons of U::Strings respect the current locale (and also allow you
to specify a locale to use): ‹#<=>›¹, #casecmp², and #collation_key³.
¹ See http://disu.se/software/u-1.0/api/U/String/#comparison-operator
² See http://disu.se/software/u-1.0/api/U/String/#casecmp-instance-method
³ See http://disu.se/software/u-1.0/api/U/String/#collation_key-instance-method
§ Additional Enumerators
There are a couple of additional enumerators in #each_grapheme_cluster¹
and #each_word² (along with aliases #grapheme_clusters³ and #words⁴).
¹ See http://disu.se/software/u-1.0/api/U/String/#each_grapheme_cluster-instance-method
² See http://disu.se/software/u-1.0/api/U/String/#each_word-instance-method
³ See http://disu.se/software/u-1.0/api/U/String/#grapheme_clusters-instance-method
⁴ See http://disu.se/software/u-1.0/api/U/String/#words-instance-method
§ Unicode-aware Sub-sequence Removal
#Chomp¹, #chop², #lstrip³, #rstrip⁴, and #strip⁵ all look for Unicode
newline and space characters, rather than only ASCII ones.
¹ See http://disu.se/software/u-1.0/api/U/String/#chomp-instance-method
² See http://disu.se/software/u-1.0/api/U/String/#chop-instance-method
³ See http://disu.se/software/u-1.0/api/U/String/#lstrip-instance-method
⁴ See http://disu.se/software/u-1.0/api/U/String/#rstrip-instance-method
⁵ See http://disu.se/software/u-1.0/api/U/String/#strip-instance-method
§ Unicode-aware Conversions
Case-shifting methods #downcase¹ and #upcase² do proper Unicode casing
and the interface is further augmented by #foldcase³ and #titlecase⁴.
#Mirror⁵ and #normalize⁶ do conversions similar in nature to the
case-shifting methods.
¹ See http://disu.se/software/u-1.0/api/U/String/#downcase-instance-method
² See http://disu.se/software/u-1.0/api/U/String/#upcase-instance-method
³ See http://disu.se/software/u-1.0/api/U/String/#foldcase-instance-method
⁴ See http://disu.se/software/u-1.0/api/U/String/#titlecase-instance-method
⁵ See http://disu.se/software/u-1.0/api/U/String/#mirror-instance-method
⁶ See http://disu.se/software/u-1.0/api/U/String/#normalize-instance-method
§ Width Calculations
#Width¹ will return the number of cells on a terminal that a U::String
will occupy.
#Center², #ljust³, and #rjust⁴ deal in width rather than length, making
them much more useful for generating terminal output. #%⁵ (and its alias
#format⁶) similarly deal in width.
¹ See http://disu.se/software/u-1.0/api/U/String/#width-instance-method
² See http://disu.se/software/u-1.0/api/U/String/#center-instance-method
³ See http://disu.se/software/u-1.0/api/U/String/#ljust-instance-method
⁴ See http://disu.se/software/u-1.0/api/U/String/#rjust-instance-method
⁵ See http://disu.se/software/u-1.0/api/U/String/#modulo-operator
⁶ See http://disu.se/software/u-1.0/api/U/String/#format-instance-method
§ Extended Type Conversions
Finally, #hex¹, #oct², and #to_i³ use Unicode alpha-numerics for their
respective conversions.
¹ See http://disu.se/software/u-1.0/api/U/String/#hex-instance-method
² See http://disu.se/software/u-1.0/api/U/String/#oct-instance-method
³ See http://disu.se/software/u-1.0/api/U/String/#to_i-instance-method
§ News
§ 1.0.0
Initial public release!
§ Financing
Currently, most of my time is spent at my day job and in my rather busy
private life. Please motivate me to spend time on this piece of software
by donating some of your money to this project. Yeah, I realize that
requesting money to develop software is a bit, well, capitalistic of me.
But please realize that I live in a capitalistic society and I need money
to have other people give me the things that I need to continue living
under the rules of said society. So, if you feel that this piece of
software has helped you out enough to warrant a reward, please PayPal a
donation to now@disu.se¹. Thanks! Your support won’t go unnoticed!
¹ Send a donation:
https://www.paypal.com/cgi-bin/webscr?cmd=_donations&business=now@disu.se&item_name=U
§ Reporting Bugs
Please report any bugs that you encounter to the {issue tracker}¹.
¹ See https://github.com/now/u/issues
§ Authors
Nikolai Weibull wrote the code, the tests, the documentation, and this
README.