PLT-2077 Support CJK hashtags #4555
Thanks @cometkim for the pull request! Per the CONTRIBUTING.md file displayed when you created this pull request, we need to add you to the list of approved contributors for the Mattermost project. Please help complete the Mattermost contribution license agreement? This is a standard procedure for many open source projects. Your form should be processed within 24 hours and reviewers for your pull request will be able to proceed. Please let us know if you have any questions. We are very happy to have you join our growing community! If you're not yet a member, please consider joining our Contributors community channel to meet other contributors and discuss new opportunities with the core team. |
Because most keywords in CJK are two characters
Hi @cometkim, thanks for the PR! Looks like you have a client unit test failing:
Let me know if you need help fixing it |
Hello, @jwilander. May I ask what this test means?
// Known issue, trailing underscore is captured by the client-side regex but not the server-side one
assert.equal(
TextFormatting.formatText('#test_').trim(),
"<p><a class='mention-link' href='#' data-hashtag='#test_'>#test_</a></p>"
)
Must this test pass? I think it depends on which regex is the right one for hashtags. |
Hi @cometkim, you can remove that test if your regex fixes it. I just left that there as a reminder to myself when I started working on that ticket. |
`test_` shouldn't be a hashtag
Hi @hmhealey, I removed the test, and all tests pass now. But I think we still need to consider further which hashtags should be allowed. |
I don't remember if we originally intended dots to be allowed in hashtags, but we have used it in the past for version numbers like I also looked into how Twitter does their hashtags, and while they don't allow |
I've tested some Japanese words on the Spinmint test server. A word containing a full-width space is detected as a single hashtag (see below). I expected "#鰻" to be detected as the hashtag, but "#鰻 他" was detected instead. I think a hashtag should end at a space regardless of whether the space is full-width or half-width. |
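To illustrate the expectation in the comment above, here is a minimal Go sketch that treats the full-width (ideographic) space U+3000 as a word boundary. It is illustrative only: the thread does not say whether the reported detection happened in client-side or server-side code, and `splitWords` is a hypothetical helper, not a function from the Mattermost code base. It relies on `strings.Fields`, which splits on any character for which `unicode.IsSpace` is true.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWords is a hypothetical helper: it splits text on any Unicode
// whitespace, which includes both the ASCII space and U+3000.
func splitWords(text string) []string {
	return strings.Fields(text)
}

func main() {
	// "\u3000" is the full-width (ideographic) space.
	words := splitWords("#鰻\u3000他")
	for i, w := range words {
		fmt.Printf("word %d: %q\n", i, w)
	}
}
```

With this kind of tokenization, "#鰻" and "他" become separate words before any hashtag pattern is applied.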
Thanks @cometkim. I can test Japanese only, sorry... :( But, I prefer using the range of Japanese-style punctuation except for full-width space (\u3000). (i.e. |
@cometkim Have you tried using this regex that I added to the ticket |
@hmhealey I've also tested that regex, but it would enforce the minimum hashtag length on the regex side. Were any policies on hashtag pattern and length decided at the last meeting? |
No worries. I thought the meeting was in the middle of the night for you, but I wanted to offer it in case you worked unusual hours. We decided to keep it as is for now since we use hashtags including dots (like Regarding the minimum length, I'm not too familiar with CJK, but would 2 character hashtags be common for them? The regex I posted would support 2 character hashtags if we change it to Instead of adding the special case for CJK hashtags on Postgres, we could consider adding a MinimumHashtagLength to the ServiceSettings section of config.json that defaults to 3. That way, users could lower it to 2 if their database is set up to support it. If you're interested in adding something like that, you could do it as part of this PR. If not, you can just leave the minimum length as 3, and I'll file a separate ticket to add it. |
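As a rough sketch of the `MinimumHashtagLength` idea described above: the struct, the `isAcceptedHashtag` helper, and the way the length is counted are all assumptions made for illustration, not the project's actual implementation. Here the length is counted in Unicode characters after the leading `#`, and the pattern is the one suggested later in this review.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// ServiceSettings is a hypothetical stand-in for the ServiceSettings
// section of config.json; only the proposed field is modeled.
type ServiceSettings struct {
	MinimumHashtagLength int // proposed default in the thread: 3
}

var validHashtag = regexp.MustCompile(`^(#\pL[\pL\d\-_.]*[\pL\d])$`)

// isAcceptedHashtag reports whether word matches the hashtag pattern and
// has at least MinimumHashtagLength characters after the '#'.
// Assumption: the '#' itself is not counted toward the length.
func isAcceptedHashtag(word string, settings ServiceSettings) bool {
	if !validHashtag.MatchString(word) {
		return false
	}
	return utf8.RuneCountInString(word)-1 >= settings.MinimumHashtagLength
}

func main() {
	defaults := ServiceSettings{MinimumHashtagLength: 3}
	lowered := ServiceSettings{MinimumHashtagLength: 2}
	fmt.Println(isAcceptedHashtag("#한글", defaults))
	fmt.Println(isAcceptedHashtag("#한글", lowered))
}
```

Counting runes rather than bytes matters here, since a two-character CJK hashtag occupies several bytes in UTF-8.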
I've fixed the regex to
I think it should be. Because :
Removed back the |
@@ -304,7 +304,7 @@ func Etag(parts ...interface{}) string {
 	return etag
 }

-var validHashtag = regexp.MustCompile(`^(#[A-Za-zäöüÄÖÜß]+[A-Za-z0-9äöüÄÖÜß_\-]*[A-Za-z0-9äöüÄÖÜß])$`)
+var validHashtag = regexp.MustCompile(`^(#\\pL[\\pL\\d\\-_.]*[\\pL\\d])$`)
Your unit tests are failing. I think it's because of the extra backslashes causing the regex to be incorrect. Since they're surrounded by backquotes (like `), you don't need to escape them. This should just be
var validHashtag = regexp.MustCompile(`^(#\pL[\pL\d\-_.]*[\pL\d])$`)
Removed the invalid escape characters. It was a mistake made while copy-pasting the regex.
|
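The escaping issue discussed in this review thread can be reproduced with a small standalone program. This is only a sketch: the variable names `doubledEscapes` and `singleEscapes` are made up for the comparison, and the sample hashtags are illustrative. Inside a Go raw (backquoted) string, `\\` reaches the regex engine as an escaped literal backslash, so `\\pL` no longer means "any Unicode letter".

```go
package main

import (
	"fmt"
	"regexp"
)

// doubledEscapes reproduces the pattern from the failing revision.
// In a raw string, `\\pL` is a literal backslash followed by "pL".
var doubledEscapes = regexp.MustCompile(`^(#\\pL[\\pL\\d\\-_.]*[\\pL\\d])$`)

// singleEscapes is the pattern suggested in the review, where `\pL`
// is the Unicode "letter" class.
var singleEscapes = regexp.MustCompile(`^(#\pL[\pL\d\-_.]*[\pL\d])$`)

func main() {
	for _, tag := range []string{"#hello", "#한글", "#v1.2", "#test_"} {
		fmt.Printf("%s doubled=%v single=%v\n", tag,
			doubledEscapes.MatchString(tag), singleEscapes.MatchString(tag))
	}
}
```

The doubled-backslash pattern still compiles, which is why the mistake only shows up as failing unit tests rather than as a startup error.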
Spinmint test server created at: http://i-3f7712ab.spinmint.com:8065/pr4555 Test Account 1: Email: Test Account 2: Email: Instance ID: i-3f7712ab |
Just following up on this, here's the ticket to add the minimum hashtag length setting @cometkim, if you're interested in working on that: https://mattermost.atlassian.net/browse/PLT-4793 I've asked @kaakaa to test if the full width hashtag issue is still happening. |
Now I tested. Thanks @cometkim! |
+1 |
Spinmint test server destroyed |
Awesome work @cometkim ! |
Support CJK hashtags
Handle CJK characters when hashtags are parsed,
using regex patterns with the character ranges below.
Existing ranges
Added a new range
Ticket Link
PLT-2077 Support all unicode letters in hashtags
Checklist
Note:
CJK hashtags may require changes to the database.
Please refer to the related issue.