Add chr and ord. They seem to fit better here than in my clj-perls namespace
commit b620ede217925c49ebf761616c3dee84cb06dea3 1 parent 8dd41f5
@jafingerhut authored
25 src/com/fingerhutpress/text/unicode.clj
@@ -5,6 +5,31 @@
(set! *warn-on-reflection* true)
+(defn ^String chr
+ "Return a string containing only the one specified Unicode code point,
+ although it may contain 1 or 2 UTF-16 code units.
+
+ Warning: Will return an invalid UTF-16 string containing only a
+ leading or trailing surrogate if you give it a codepoint in the
+ surrogate range, 0xD800 through 0xDFFF."
+ [codepoint]
+ (String. (Character/toChars codepoint)))
+
+
+(defn ord
+ "Return the Unicode code point of the first character in string s.
+ If the first character is a UTF-16 surrogate pair, the code point
+ returned is that of the pair, not of the leading surrogate. Return
+ 0 if the string is empty.
+
+ The behavior is undefined if the string is not valid UTF-16."
+ [^CharSequence s]
+ (let [s (.toString s)]
+ (if (= s "") ; special case for Perl compatibility
+ 0
+ (.codePointAt s 0))))
+
+
(defmacro bmp-codepoint? [c]
`(let [cp# ~c]
(and (<= 0 cp#) (< cp# Character/MIN_SUPPLEMENTARY_CODE_POINT))))
28 test/com/fingerhutpress/text/unicode/test.clj
@@ -18,34 +18,6 @@
{:major major, :minor minor, :patch patch}))
-;; TBD: See if I can use com.fingerhutpress.clj-perls where chr and
-;; ord are defined, instead of redefining it here. DRY.
-
-(defn ^String chr
- "Return a string containing only the one specified Unicode code point,
- although it may contain 1 or 2 UTF-16 code units.
-
- Warning: Will return an invalid UTF-16 string containing only a
- leading or trailing surrogate if you give it a codepoint in the
- surrogate range, 0xD800 through 0xDFFF."
- [codepoint]
- (String. (Character/toChars codepoint)))
-
-
-(defn ord
- "Return the Unicode code point of the first character in string s.
- If the first character is a UTF-16 surrogate pair, the code point
- returned is that of the pair, not of the leading surrogate. Return
- 0 if the string is empty.
-
- The behavior is undefined if the string is not valid UTF-16."
- [^CharSequence s]
- (let [s (.toString s)]
- (if (= s "") ; special case for Perl compatibility
- 0
- (.codePointAt s 0))))
-
-
;; Some interesting boundary values, as strings.
(def MIN_CODE_POINT_STR
(chr Character/MIN_CODE_POINT)) ; U+0000
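
The docstrings above describe how chr and ord treat supplementary code points, the empty string, and surrogates. A minimal REPL sketch of those cases follows. The two definitions are copied from the diff with docstrings dropped so that the block stands alone; the sample code points (0x41, 0x1F600, 0xD800) are chosen here for illustration and do not come from the commit.

```clojure
;; Definitions as in the diff, docstrings omitted for brevity.
(defn ^String chr
  [codepoint]
  (String. (Character/toChars codepoint)))

(defn ord
  [^CharSequence s]
  (let [s (.toString s)]
    (if (= s "")
      0
      (.codePointAt s 0))))

;; BMP code point: one UTF-16 code unit.
(assert (= "A" (chr 0x41)))
(assert (= 65 (ord "A")))

;; Supplementary code point: chr yields a surrogate pair (2 code units),
;; and ord reads the pair back as a single code point.
(assert (= 2 (count (chr 0x1F600))))
(assert (= 0x1F600 (ord (chr 0x1F600))))

;; Empty string: 0, the Perl-compatible special case.
(assert (= 0 (ord "")))

;; Surrogate range: chr returns a one-unit string holding a lone
;; surrogate, which is not valid UTF-16 (see the docstring warning).
(assert (= 1 (count (chr 0xD800))))
```

If every assertion passes, chr and ord round-trip for both BMP and supplementary code points. The final case shows why the warning in the chr docstring matters: no error is signaled for a surrogate-range argument.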