Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show code block preview of Github code line permalink #128

Closed
mattermod opened this issue Sep 28, 2019 · 11 comments · Fixed by #137
Closed

Show code block preview of Github code line permalink #128

mattermod opened this issue Sep 28, 2019 · 11 comments · Fixed by #137
Assignees
Labels
Help Wanted Community help wanted

Comments

@mattermod
Copy link
Contributor

When a user post contains a Github code line permalink, it should attach a markdown code block that will contain the preview of that section of the code with possibly 3 lines before and after (much like the display shown in code compares in Github:

Screen Shot 2019-09-28 at 5 03 43 PM

The Github plugin should subscribe to the MessageWillBePosted hook and also include a setting to enable/disable this feature.


If you're interested please comment here and come join our "Contributors" community channel on our daily build server, where you can discuss questions with community members and the Mattermost core team. For technical advice or questions, please join our "Developers" community channel.

New contributors please see our Developer's Guide.

JIRA: https://mattermost.atlassian.net/browse/MM-18880

@agnivade
Copy link
Member

agnivade commented Oct 1, 2019

I'll take this one.

@hanzei hanzei transferred this issue from mattermost/mattermost Oct 2, 2019
@agnivade
Copy link
Member

agnivade commented Oct 3, 2019

A rough design sketch:

Step 1: Extract the permalinks from the message

I think a regex is the best way to easily capture all the links without creating a hand-written link parser. This is what I have now https?://(?P<haswww>www.)?github.com/(?P<user>\w+)/(?P<repo>\w+)/blob/(?P<commit>\w+)/(?P<path>[\w/.]+)#(?P<line>[\w-]+)? and it does the job. The size is mainly due to the named capture groups; otherwise it's pretty simple.

Notes: the link must be a permalink to a file having a line number or a line number range. Anything else will not work. Also, more importantly, permalinks inside hyperlinks should not be extracted (Need to see how best to do this).

Step 2: Get the lines

Github does not provide any API to just get the line(s) from a permalink. From the browser it makes a call to https://github.com/preview with some query params and it gets an HTML response. So I think internally, github server does something. So we need to rely on the contents API to give us the file contents and filter out the lines by ourselves.

Notes: The API only supports files of 1MB in size, which I think is a good thing. We don't want to fetch huge files just to show a code preview. Secondly, what if somebody posts 500 permalinks in a single message ? Some thought needs to be given on limiting the no. of requests.

Step 3: Construct the markdown

For now, I have a simple version where I create a link to the given file with a markdown code block showing the desired lines. We can iterate further on the design. I also check the file extension and add the appropriate language tag in the markdown.

I have a rough prototype ready:

Before:
before

After:
after

@marianunez / @hanzei - Does this approach sound reasonable to you ?

Side note: I see that for the /slash commands, the github client is constructed anew every time from the userID in the header. I think perhaps, it is better to wrap all usages of that in a sync.Once, so that the client is created only once and reused for future API calls.

@marianunez
Copy link
Member

This looks great @agnivade! Only a couple of observations:

  1. The contents API seems to return a link to the source of the file where you would have to parse it in order to find the needed lines. One thing that I noticed is that within the permalink it does contain within the DOM the source code lines that Github uses to display. You could just look for the html items of the given lines - you would still need to do some parsing on them but may save us a Github API call. Just another option to explore:

Screen Shot 2019-10-03 at 2 23 02 PM

  1. Looking good 👌

@agnivade
Copy link
Member

agnivade commented Oct 3, 2019

The contents API seems to return a link to the source of the file where you would have to parse it in order to find the needed lines.

Prima facie, that's what it looked like to me too. But it also returns a base64 encoded version of the file contents itself. See the "content" field. So it's just one Github call.

One thing that I noticed is that within the permalink it does contain within the DOM the source code lines that Github uses to display.

Didn't get this. Did you mean to use the github internal API (https://github.com/preview) to get the html ? That uses some undocumented query parameters like markdown_unsupported=false&repository=140969738&subject=8&subject_type=Issue and is subject to change any time from Github.

@marianunez
Copy link
Member

Didn't get this. Did you mean to use the github internal API (https://github.com/preview) to get the html ?

I meant an actual GET Http Request to get the permalink HTML contents but you are right, it's one request either way.

@agnivade
Copy link
Member

agnivade commented Oct 4, 2019

Right, and even then the underlying HTML structure can change anytime. I think it is safer and cleaner to use the API.

@hanzei hanzei changed the title Github Plugin: Show code block preview of Github code line permalink Show code block preview of Github code line permalink Oct 4, 2019
@marianunez
Copy link
Member

Right, and even then the underlying HTML structure can change anytime. I think it is safer and cleaner to use the API.

Agreed!

Also, we should add a #4 to your list for having a setting to enable/disable this feature on the plugin.

@agnivade
Copy link
Member

agnivade commented Oct 4, 2019

Also, we should add a #4 to your list for having a setting to enable/disable this feature on the plugin.

Yes absolutely.

Nearly done with it. Tested all scenarios. Seems to work. Remaining work is to add the config switch, write test cases and add more polish.

Btw: here's a nice visualization of the regex I am using - https://jex.im/regulex/#!cmd=export&flags=&re=https%3F%3A%2F%2F(www%5C.)%3Fgithub%5C.com%2F(%5Cw%2B)%2F(%5Cw%2B)%2Fblob%2F(%5Cw%2B)%2F(%5B%5Cw%2F.%5D%2B)%23(%5B%5Cw-%5D%2B)%3F.

One last note:
The current logic will expand links inside backtick blocks too. To get around this, we would need a full-blown markdown parser to parse every single message. That seems like a lot of work for messages which don't have any permalinks at all. And after that, we need a regex pass again to extract the links.

A better way might be to somehow achieve this on the client side, because the client is already doing the markdown parsing. But that would need passing the github token to the client (which might be a security issue). Github can do that because they own the pipeline end to end.

For now, I think we can go ahead with the current flow which is fast and simple; and circle back on further improvements depending on what the feedback is. At worst case, it is anyways killable with a switch.

@marianunez
Copy link
Member

One last note:
The current logic will expand links inside backtick blocks too. To get around this, we would need a full-blown markdown parser to parse every single message. That seems like a lot of work for messages which don't have any permalinks at all. And after that, we need a regex pass again to extract the links.

A better way might be to somehow achieve this on the client side, because the client is already doing the markdown parsing. But that would need passing the github token to the client (which might be a security issue). Github can do that because they own the pipeline end to end.

For now, I think we can go ahead with the current flow which is fast and simple; and circle back on further improvements depending on what the feedback is. At worst case, it is anyways killable with a switch.

I agree. This border case can be handled as a separate PR as a fix/enhancement in the future.

@ThiefMaster
Copy link

To get around this, we would need a full-blown markdown parser to parse every single message. That seems like a lot of work for messages which don't have any permalinks at all.

Is this really a problem? The autolink plugin already does this for example to avoid the - IMHO broken - problem of replacing stuff inside code blocks.

@agnivade
Copy link
Member

There is certainly an overhead here to pass every single message through a markdown parser. Whether that overhead is acceptable or not is a different question, given that we don't have any SLAs around plugin response latency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Help Wanted Community help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants