Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse posted URLs for action #109

Closed
beardpapa opened this issue Dec 10, 2014 · 13 comments
Closed

Parse posted URLs for action #109

beardpapa opened this issue Dec 10, 2014 · 13 comments
Labels
bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented

Comments

@beardpapa
Copy link

So this probably isn't the right forum to ask - but hopefully there are some elegant solutions that people can discuss!
Previously I captured URL's mentioned in chat to pull DNS information for digestion but that stopped working after this most recent release.

For example - I would grab the url and run it through some basic regex to clean it up/remove the http, www, etc:
domain = (msg.match[1].replace /https?:\/\/|www\./g, "")
Later using $domain in the url: http://api.statdns.com/google.com/ANY to collect the DNS data needed.

After this last update, I noticed that slack provides the URLs as either:
<http://www.google.com|www.google.com> or just
<http://www.google.com> depending on how they were entered.

Do people generally remove the brackets and split the string on the | character if it exists? Or is there a better method all together?

@paulhammond
Copy link
Contributor

The formatting of URLs inside slack is documented on our API site - but in short, you can remove <> then the url is the bit before the | (if it exists).

I wonder if we should change the default behavior to handle this inside the adapter - your scripts might not be the only ones that don't expect our formatted URL structure...

@beardpapa
Copy link
Author

Thanks! At the moment I am testing with (msg.match[1].split('|')[0].replace /\<|\>|https?:\/\/|www\./g, "") and it seems to be working.

As always though I assume there are 20+ use cases that I haven't thought up :P

@rufo
Copy link

rufo commented Dec 10, 2014

Yeah, at least one script from hubot-scripts is broken for us, I'm guessing there are more :/

I feel like handling this in the adapter is the cleaner/more correct approach, since otherwise every script would have to have this in mind when used for Slack. (Maybe with some way to opt-out if the formatted behavior is useful?)

@paulhammond paulhammond added the bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented label Dec 16, 2014
@paulhammond
Copy link
Contributor

I've just merged #122, which converts <http://www.example.com|label> into label http://www.example.com.

I think this fixes the immediate bug, but I'm going to leave this issue open, because some scripts might want access to the original slack-formatted message. That's going to need a bit more work.

@technicalpickles
Copy link
Contributor

@lilyball
Copy link
Contributor

PR #129 stashes the original text on the Message object (on a new subclass of TextMessage called SlackTextMessage).

@paulhammond
Copy link
Contributor

Pull #131 updated the link conversion code slightly:

  • Most links on slack.com have a label which is a cleaner version of the url (eg: <http://www.example.com|example.com>. We now turn these into just the URL and not the URL and label
  • If there's a label we now put the URL in brackets, which looks neater and matches other places we convert our markup into plain text.

Pull #133 provides a way for scripts to access the original message - and fixes the rest of this issue.

@jimmycuadra
Copy link

This has come up in Lita too. I think what should happen is that Slack should be including the raw, unformatted version of the message as part of the payload. Without this, every chat bot (and really, any program at all that's consuming the API) is going to have duplicate logic that tries to undo the formatting Slack has applied to what the user originally typed. This is error prone and fragile.

A big downside of that approach is that the size of each message that is sent from Slack would double, since it included both the formatted and unformatted version of the message. Perhaps there could be an option (maybe to rtm.start?) that allowed the consumer to specify which version of the message they want. Then the payload could include only one copy of the message.

@lilyball
Copy link
Contributor

There's no reason for Slack to include the "raw" message in the payload. Not only is it unnecessary data, but clients may have chosen to post the data already formatted instead of letting the server format it, meaning the server doesn't even have the "raw" text in that case.

The message formatting is fairly straightforward and completely unambiguous, meaning it's not hard to parse it. As of #131, hubot-slack is now parsing it and producing text that is as close as we can reasonably get to something that, were you to submit it back to Slack as an unformatted message, would produce the same formatted message (attachments and link labels won't be reproduced correctly, but there's no unformatted message syntax to produce those).

You're also making the assumption that all consumers of the API want to undo the formatting. That's almost certainly not the case. Hubot wants to undo the formatting because it's not designed around Slack and has no concept of a formatted message, and any scripts written for Hubot are expecting plain text. But any bot written specifically for Slack, or any Slack client, will want the formatted text so they can parse out the semantic data.

PR #133 goes a step further and provides the "raw" (i.e. formatted) text in addition to the parsed text, for scripts that want to specially handle the extra data Slack provides. It also provides the entire slack-client Message object if scripts want to get at attachments or other data, or want to distinguish attachments from the message text. This should all be sufficient to handle the needs of any script.

@paulhammond
Copy link
Contributor

Thanks for the comments!

In addition to what Kevin has said, unfortunately the proposal to send plain text messages is a huge task on our side. Usually that's not an excuse, we do the hard work to make our product good, but in this case it's not going to gain us much. I hate to say it, but it's not something we're planning on doing.

@jimmycuadra: I'll follow up with you in the Lita bug and do everything I can to help you get the text formatting code right. I know our text formatting documentation needs some work, we'll be using the feedback here and there to help make it better.

@beardpapa
Copy link
Author

Did this change again? As of this week I am now getting different results:

robot.respond /TEST (.*)/i, (msg) -> domain = msg.match[1] msg.send "#{domain}"

Now returns:
google.com http://google.com
when passing it just google.com

@paulhammond
Copy link
Contributor

@beardpapa: It's possible #133 might have had an effect, but the tests of this code didn't change significantly. What's the exact version of hubot-slack you're running?

@beardpapa
Copy link
Author

Sorry, I wasn't keeping my dependency list up to date :|
Updated hubot-slack last week and it returned just the URL as expected.

Closing the issue :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented
Projects
None yet
Development

No branches or pull requests

6 participants