Japanese edge cases, meet your match.
mzsanford committed Jul 13, 2011
1 parent d672eb6 commit fd5d855
Showing 3 changed files with 33 additions and 2 deletions.
2 changes: 2 additions & 0 deletions README.md
@@ -45,6 +45,8 @@ If you are creating a new twitter-text library in a different programming langua
* [FIX] Japanese autolink including long vowel mark (chouon)
* [FIX] Japanese autolink after a full-width exclamation point
* [FIX] Japanese autolink including ideographic iteration mark
* [FIX] Add hashtag extraction with indices test for new language hashtags
* [FIX] Add hashtag extraction with indices test for multiple latin hashtags


* v1.4.2 - 2011-07-08 [ Git tag v1.4.2 ]
* [FIX] Additional Japanese hashtag autolinking tests
5 changes: 4 additions & 1 deletion autolink.yml
@@ -347,11 +347,14 @@ tests:
text: "できましたよー!#日本語ハッシュタグ。"
expected: "できましたよー!<a href=\"http://twitter.com/search?q=%23日本語ハッシュタグ\" title=\"#日本語ハッシュタグ\" class=\"tweet-url hashtag\">#日本語ハッシュタグ</a>。"



- description: "Autolink a hashtag containing ideographic iteration mark"
text: "#云々"
expected: "<a href=\"http://twitter.com/search?q=%23云々\" title=\"#云々\" class=\"tweet-url hashtag\">#云々</a>"


- description: "Autolink multiple hashtags in multiple languages"
text: "Hashtags in #中文, #日本語, #한국말, and #русский! Try it out!"
expected: "Hashtags in <a href=\"http://twitter.com/search?q=%23中文\" title=\"#中文\" class=\"tweet-url hashtag\">#中文</a>, <a href=\"http://twitter.com/search?q=%23日本語\" title=\"#日本語\" class=\"tweet-url hashtag\">#日本語</a>, <a href=\"http://twitter.com/search?q=%23한국말\" title=\"#한국말\" class=\"tweet-url hashtag\">#한국말</a>, and <a href=\"http://twitter.com/search?q=%23русский\" title=\"#русский\" class=\"tweet-url hashtag\">#русский</a>! Try it out!"

urls:
- description: "Autolink URL with pipe character"
text: "text http://example.com/pipe|character?yes|pipe|character"
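
For readers porting these fixtures to another language, the multi-language autolink test added in this hunk can be approximated with the sketch below. It is a minimal illustration, not twitter-text's implementation: the pattern `#(\w+)` and the function name `autolink_hashtags` are assumptions, and the sketch does no HTML escaping and ignores the library's rules about which characters may precede a hashtag.

```python
import re

# Assumed, simplified pattern: '#' followed by Unicode word characters.
# The real twitter-text hashtag regex is considerably more restrictive.
HASHTAG_RE = re.compile(r"#(\w+)")


def autolink_hashtags(text):
    """Wrap each hashtag in an anchor shaped like the fixtures' expected HTML."""

    def link(match):
        tag = match.group(0)   # hashtag including the leading '#'
        name = match.group(1)  # hashtag text without '#'
        return (
            '<a href="http://twitter.com/search?q=%23{0}" title="{1}" '
            'class="tweet-url hashtag">{1}</a>'
        ).format(name, tag)

    return HASHTAG_RE.sub(link, text)


if __name__ == "__main__":
    print(autolink_hashtags("Hashtags in #中文, #日本語, #한국말, and #русский! Try it out!"))
```

Because Python 3 string patterns are Unicode-aware by default, `\w` covers the CJK, Hangul, and Cyrillic letters used in the fixture without extra flags.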
28 changes: 27 additions & 1 deletion extract.yml
@@ -445,9 +445,35 @@ tests:
- hashtag: "hashtag"
indices: [7, 15]


- description: "Extract a hashtag in a string of multi-byte characters"
text: "会議中 #hashtag 会議中"
expected:
- hashtag: "hashtag"
indices: [4, 12]


- description: "Extract multiple valid hashtags"
text: "One #two three #four"
expected:
- hashtag: "two"
indices: [4, 8]
- hashtag: "four"
indices: [15, 20]

- description: "Extract a non-latin hashtag"
text: "Hashtags in #русский!"
expected:
- hashtag: "русский"
indices: [12, 20]

- description: "Extract multiple non-latin hashtags"
text: "Hashtags in #中文, #日本語, #한국말, and #русский! Try it out!"
expected:
- hashtag: "中文"
indices: [12, 15]
- hashtag: "日本語"
indices: [17, 21]
- hashtag: "한국말"
indices: [23, 27]
- hashtag: "русский"
indices: [33, 41]
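
The `indices` values in these fixtures appear to count characters rather than bytes: the start is the offset of `#` and the end is one past the last character of the hashtag. The sketch below reproduces them under that reading. It is only an illustration: the pattern `#(\w+)` and the name `extract_hashtags_with_indices` are assumptions, not twitter-text's actual regex or API, and offsets here are Python code point offsets.

```python
import re

# Assumed, simplified pattern: '#' followed by Unicode word characters.
# twitter-text's real hashtag regex has more rules than this.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags_with_indices(text):
    """Return [{'hashtag': ..., 'indices': [start, end]}, ...] for each hashtag.

    start is the code point offset of '#'; end is exclusive.
    """
    results = []
    for match in HASHTAG_RE.finditer(text):
        results.append({
            "hashtag": match.group(1),
            "indices": [match.start(), match.end()],
        })
    return results


if __name__ == "__main__":
    sample = "Hashtags in #中文, #日本語, #한국말, and #русский! Try it out!"
    for entry in extract_hashtags_with_indices(sample):
        print(entry["hashtag"], entry["indices"])
```

Ports to languages whose strings are indexed by bytes or UTF-16 code units may need to convert offsets to match these fixtures.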
