Add chr and ord. They seem to fit better here than in my clj-perls namespace
1 parent 8dd41f5 commit b620ede217925c49ebf761616c3dee84cb06dea3 @jafingerhut committed Jan 26, 2012
Showing with 25 additions and 28 deletions.
  1. +25 −0 src/com/fingerhutpress/text/unicode.clj
  2. +0 −28 test/com/fingerhutpress/text/unicode/test.clj
25 src/com/fingerhutpress/text/unicode.clj
@@ -5,6 +5,31 @@
(set! *warn-on-reflection* true)
+(defn ^String chr
+  "Return a string containing only the one specified Unicode code point,
+  although it may contain 1 or 2 UTF-16 code units.
+
+  Warning: Will return an invalid UTF-16 string containing only a
+  leading or trailing surrogate if you give it a codepoint in the
+  surrogate range, 0xD800 through 0xDFFF."
+  [codepoint]
+  (String. (Character/toChars codepoint)))
+
+
+(defn ord
+  "Return the Unicode code point of the first character in string s.
+  If the first character is a UTF-16 surrogate pair, the code point
+  returned is that of the pair, not of the leading surrogate. Return
+  0 if the string is empty.
+
+  The behavior is undefined if the string is not valid UTF-16."
+  [^CharSequence s]
+  (let [s (.toString s)]
+    (if (= s "") ; special case for Perl compatibility
+      0
+      (.codePointAt s 0))))
+
+
(defmacro bmp-codepoint? [c]
  `(let [cp# ~c]
     (and (<= 0 cp#) (< cp# Character/MIN_SUPPLEMENTARY_CODE_POINT))))
28 test/com/fingerhutpress/text/unicode/test.clj
@@ -18,34 +18,6 @@
{:major major, :minor minor, :patch patch}))
-;; TBD: See if I can use com.fingerhutpress.clj-perls where chr and
-;; ord are defined, instead of redefining it here. DRY.
-
-(defn ^String chr
-  "Return a string containing only the one specified Unicode code point,
-  although it may contain 1 or 2 UTF-16 code units.
-
-  Warning: Will return an invalid UTF-16 string containing only a
-  leading or trailing surrogate if you give it a codepoint in the
-  surrogate range, 0xD800 through 0xDFFF."
-  [codepoint]
-  (String. (Character/toChars codepoint)))
-
-
-(defn ord
-  "Return the Unicode code point of the first character in string s.
-  If the first character is a UTF-16 surrogate pair, the code point
-  returned is that of the pair, not of the leading surrogate. Return
-  0 if the string is empty.
-
-  The behavior is undefined if the string is not valid UTF-16."
-  [^CharSequence s]
-  (let [s (.toString s)]
-    (if (= s "") ; special case for Perl compatibility
-      0
-      (.codePointAt s 0))))
-
-
;; Some interesting boundary values, as strings.
(def MIN_CODE_POINT_STR
  (chr Character/MIN_CODE_POINT)) ; U+0000
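
The docstrings for chr and ord describe three behaviors that are easy to check at a REPL: the round trip for a supplementary code point, the empty-string case, and the surrogate-range warning. The sketch below is only an illustration. The two definitions are repeated from this commit (docstrings omitted) so the block runs on its own, and the sample code points U+0041 and U+1F600 are my own picks rather than values taken from the test file.

```clojure
;; Definitions repeated from the commit so the example is self-contained.
(defn ^String chr
  [codepoint]
  (String. (Character/toChars codepoint)))

(defn ord
  [^CharSequence s]
  (let [s (.toString s)]
    (if (= s "")
      0
      (.codePointAt s 0))))

;; A BMP code point fits in a single UTF-16 code unit.
(chr 0x41)                ; => "A"
(ord "A")                 ; => 65

;; A supplementary code point (sample value, assumed for illustration)
;; needs two UTF-16 code units, and ord returns the code point of the
;; whole pair, not of the leading surrogate.
(count (chr 0x1F600))     ; => 2
(ord (chr 0x1F600))       ; => 128512, i.e. 0x1F600

;; Perl-compatible special case: the empty string maps to 0.
(ord "")                  ; => 0

;; The case the chr docstring warns about: a code point in the surrogate
;; range yields a one-unit string holding a lone surrogate.
(count (chr 0xD800))      ; => 1
```

Note that count reports UTF-16 code units here, not code points, which is why the supplementary example yields 2.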
