Parsing of text with non-breaking space has incorrect results. #2

Open
kostrahb opened this issue Jun 12, 2020 · 4 comments

@kostrahb

kostrahb commented Jun 12, 2020

I have written a Slack bot using slacker, but during testing we ran into strange issues when parsing text that contains URLs. After some tinkering I pinpointed the cause: in command text, Slack automatically replaces the normal space right before a URL with a non-breaking space (U+00A0). I raised this with Slack support, and they replied that Slack expects applications to be able to handle UTF-8 characters. Would it therefore be possible to replace these characters with normal spaces, either in this library just before command parsing or in slacker before it sends the text to this library?

An example which will result in incorrect parsing:

package main

import (
	"fmt"
	"github.com/shomali11/commander"
)

func main() {
	// The space before the URL is a non-breaking space (U+00A0), as Slack sends it.
	properties, isMatch := commander.NewCommand("set <component> <environment> <xpath> <value>").Match("set be approval xpath-expression\u00A0https://some-url/")
	fmt.Println(isMatch)
	fmt.Println(properties.StringParam("xpath", ""))
}
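
Until this is handled in the library, the workaround proposed above can be sketched as a small pre-processing step on the caller's side. This is only a sketch: `normalizeNBSP` is a hypothetical helper, not part of commander or slacker, and splitting on a plain space is used here purely to illustrate the effect, since commander's own tokenizer may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeNBSP is a hypothetical helper (not part of commander or slacker).
// It replaces every non-breaking space (U+00A0) with a regular space.
func normalizeNBSP(text string) string {
	return strings.ReplaceAll(text, "\u00A0", " ")
}

func main() {
	raw := "set be approval xpath-expression\u00A0https://some-url/"
	clean := normalizeNBSP(raw)

	// Splitting on a plain space only illustrates the difference;
	// it is an assumption, not how commander necessarily tokenizes.
	fmt.Println(len(strings.Split(raw, " ")))   // URL stays glued to the previous word
	fmt.Println(len(strings.Split(clean, " "))) // URL becomes its own token
}
```

The cleaned string would then be passed to `Match` in place of the raw text.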
@shomali11
Owner

Good question. I think that this is something that slacker should be handling but I am open to learning your thoughts.

My only concern is deciding when UTF-8 characters need to be handled and when they should not be, for example when the user enters a UTF-8 character in the message.

@kostrahb
Author

I am not sure how widely this library is used or how much trouble the change would cause. However, IMHO this is a backward-compatible change that might even benefit the library by extending its possible uses to the UTF-8 world. In any case, the decision is up to you, but this Stack Overflow answer might help you get the list of whitespace characters: https://stackoverflow.com/a/46637343/1869278
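
One possible way to cover all whitespace characters without maintaining a list by hand is Go's `unicode.IsSpace`, which reports U+00A0 and the other Unicode white-space code points. The sketch below assumes that converting every such rune to a plain space is acceptable (it also turns tabs and newlines into spaces); `normalizeWhitespace` is a hypothetical name, not an existing function in commander or slacker. Non-whitespace UTF-8 characters typed by the user are left untouched.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeWhitespace is a hypothetical helper. It maps every rune that
// unicode.IsSpace reports as whitespace to a plain ASCII space and leaves
// all other runes, including non-ASCII letters, unchanged.
func normalizeWhitespace(text string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsSpace(r) {
			return ' '
		}
		return r
	}, text)
}

func main() {
	// Non-breaking space (U+00A0) before a Slack-formatted link.
	fmt.Println(normalizeWhitespace("remote\u00a0<http://www.domain.com|www.domain.com>"))
	// Em space (U+2003) is replaced, the accented letter is kept.
	fmt.Println(normalizeWhitespace("set café\u2003value"))
}
```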

@dexxtreme

I'm just now seeing this issue when copying text containing an HTML link from a Slack channel and pasting it back into Slack as a message to the bot. If there is a hostname in the copied text, Slack applies HTML logic to it. According to the "debug" output, this results in normal spaces being converted to non-breaking spaces, e.g.:

remote\u00a0<http:\/\/www.domain.com|www.domain.com>

... instead of what comes up when you manually type it...

remote <http:\/\/www.domain.com|www.domain.com>

I'd be happy with the ability to modify/filter the incoming text before it is parsed to manually rip out and replace the spaces.
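
A filter hook like the one requested here could look roughly as follows. This is a sketch of the idea only: `TextFilter` and `applyFilters` are hypothetical names, and neither commander nor slacker is known to expose such an API.

```go
package main

import (
	"fmt"
	"strings"
)

// TextFilter is a hypothetical hook type: it receives the raw message text
// and returns the text that should be handed to the parser.
type TextFilter func(string) string

// applyFilters runs the filters in order. With no filters the text is
// returned unchanged.
func applyFilters(text string, filters ...TextFilter) string {
	for _, f := range filters {
		text = f(text)
	}
	return text
}

func main() {
	// User-supplied filter that swaps non-breaking spaces for normal ones.
	replaceNBSP := func(s string) string {
		return strings.ReplaceAll(s, "\u00a0", " ")
	}

	incoming := "remote\u00a0<http://www.domain.com|www.domain.com>"
	fmt.Println(applyFilters(incoming, replaceNBSP, strings.TrimSpace))
}
```

Keeping the replacement in a user-supplied filter would leave the decision about which characters to rewrite with the bot author rather than the library.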

@themarcelor

themarcelor commented Dec 12, 2021

Yeah. Same issue here:
shomali11/slacker#94

shomali11/slacker#94 (comment)
