Parsing of text with non-breaking space has incorrect results. #2

kostrahb · 2020-06-12T15:19:08Z

I have written a Slack bot using slacker. During testing we ran into strange issues when parsing text that contains URLs. After a bit of tinkering, I pinpointed the cause: Slack automatically replaces the normal space right before a URL with a non-breaking space (U+00A0) in the command text. I raised this with Slack support, and they replied that Slack expects applications to be able to handle UTF-8 characters. Would it therefore be possible to replace these characters, either in this library just before command parsing or in slacker before the text is passed to this library?

An example which will result in incorrect parsing:

package main

import (
	"fmt"
	"github.com/shomali11/commander"
)

func main() {
	// The text has a non-breaking space (U+00A0) right before the URL, as Slack sends it.
	properties, isMatch := commander.NewCommand("set <component> <environment> <xpath> <value>").Match("set be approval xpath-expression\u00A0https://some-url/")
	fmt.Println(isMatch)
	fmt.Println(properties.StringParam("xpath", ""))
}


shomali11 · 2020-06-12T15:26:24Z

Good question. I think this is something that slacker should be handling, but I am open to hearing your thoughts.

My only concern is deciding when UTF-8 characters need to be handled and when they should be left alone, for example when the user deliberately enters a UTF-8 character in the message.

kostrahb · 2020-06-12T15:35:25Z

I am not sure how widely used this library is or how much trouble the change would cause. However, IMHO this is a backward-compatible change that might even benefit the library, since it extends its possible uses to the UTF-8 world. In any case, the decision is up to you, but this Stack Overflow answer might help with identifying the whitespace characters: https://stackoverflow.com/a/46637343/1869278
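
One possible way to address the concern about user-entered UTF-8 characters is to touch only whitespace and leave everything else alone. The following is a minimal sketch, not code from commander or slacker: `normalizeSpaces` is a hypothetical helper name, and it assumes that mapping every rune in the Unicode space-separator category (Zs, which includes U+00A0) to a plain ASCII space is acceptable for command text.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeSpaces (hypothetical helper) maps every rune in the Unicode
// "space separator" category (Zs), which includes U+00A0, to a plain
// ASCII space. All other runes, including non-ASCII letters, tabs and
// newlines, pass through unchanged.
func normalizeSpaces(text string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Zs, r) {
			return ' '
		}
		return r
	}, text)
}

func main() {
	in := "set be approval xpath-expression\u00A0https://some-url/"
	// Print the quoted form so the kind of space is visible.
	fmt.Printf("%q\n", normalizeSpaces(in))
}
```

The normalized string could then be passed to `Match` in place of the raw text. Whether this belongs in commander or in slacker is the open question in this thread.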

dexxtreme · 2020-10-19T15:21:41Z

I'm just now seeing this issue when text containing an HTML link is copied from a Slack channel and pasted back into Slack as a message to the bot. If there is a hostname in the copied text, Slack applies HTML logic to it. According to the "debug" output, this results in normal spaces being converted to non-breaking spaces, e.g.:

remote\u00a0<http:\/\/www.domain.com|www.domain.com>

... instead of what comes up when you manually type it...

remote <http:\/\/www.domain.com|www.domain.com>

I'd be happy with the ability to modify/filter the incoming text before it is parsed to manually rip out and replace the spaces.

themarcelor · 2021-12-12T20:03:24Z

Yeah. Same issue here:
shomali11/slacker#94

shomali11/slacker#94 (comment)
