Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for manpages via mandoc #1196

Closed
wants to merge 22 commits into from
Closed

Add support for manpages via mandoc #1196

wants to merge 22 commits into from

Conversation

jordemort
Copy link
Contributor

mandoc is small and simple enough that we decided not to sandbox it

In support of https://github.com/github/enterprise2/pull/14408

cc @anth1y @dctucker

lib/github/markups.rb Outdated Show resolved Hide resolved
@jordemort
Copy link
Contributor Author

Hey @kivikakk - what do you think of this? With some additional polish, would it be considered for merging?

@kivikakk
Copy link
Contributor

kivikakk commented May 30, 2018

👋 My responsibilities have shifted far away from linguist and markup this year; @bkeepers and @lildude are probably better people to ask.

@Alhadis
Copy link

Alhadis commented Aug 16, 2018

I'm so grateful this is being considered. Seriously.

I'm not sure if that's a separate language / sub language / same syntax highlighting or different from a linguist perspective; from a cursory glance, linguist seems to have no existing knowledge of it.

It's the same language. Authors don't typically use .mdoc as an extension because the section number is conventionally used instead. For the same reason, .man doesn't see much use as an extension either.

However, there is enough in-the-wild usage of the .mdoc extension on GitHub, so I'll submit a PR later to have it recognised by Linguist.

As for syntax highlighting, well, you don't have anything to worry about. 😉

.Sh SYNOPSIS
.Nm grep
.Bk -words
.Op Fl A Ar num
.Op Fl C Ns Op Ar num
.Op Fl Fl binary-files Ns = Ns Ar value
.Op Fl Fl color Ns Op = Ns Ar when

Yes, you should update regex to support also .tmac, .roff ideally also .1.in, .2.in, .3.in, ... , .8.in

For .roff and .?.in, yes. For .tmac, no. Those files are supposed to be used like includes, so they'll generate little to no output on their own. Attempting to render .tmac files on GitHub will result in lots of blank pages.

@jordemort
Copy link
Contributor Author

There has been another release of mandoc with further improvements to its HTML output since we started on this, so we'll probably want to build that and revisit the stylesheet if we pick this back up.

@Alhadis
Copy link

Alhadis commented Aug 16, 2018

with further improvements to its HTML output

Yeah, that'd be my doing. Sorry about that. 😜

@jordemort
Copy link
Contributor Author

Sorry about that. 😜

@Alhadis No apologies necessary, especially if the changes mean we can toss the XSLT in the trash ;)

@pali
Copy link
Contributor

pali commented Aug 16, 2018

For .roff and .?.in, yes. For .tmac, no. Those files are supposed to be used like includes, so they'll generate little to no output on their own. Attempting to render .tmac files on GitHub will result in lots of
blank pages.

Yea, that is truth, *.tmac is just file with troff macros, so it really should not be rendered as real manpage. So for manpages, at least .man, .man.in, .[0-9] and .[0-9].in are needed.

@Alhadis
Copy link

Alhadis commented Aug 17, 2018

Support for the .mdoc extension landed in github-linguist/linguist@55bf480.

Changes will become live on GitHub after the next release of Linguist.

@Alhadis
Copy link

Alhadis commented Aug 20, 2018

Question: How will eqn(1) markup be handled in rendered output? Its use in manpages is uncommon, but mandoc(1) still supports it using MathML.

However, MathML support is far from universal, with only Firefox having full support. A client-side library like MathJax would therefore be needed to guarantee consistent handling across browsers. This might complicate things unless MathJax is already in use as a production dependency.

In any case, eqn(1) needs consideration, because documents marked-up with equations will almost certainly lose meaning if the eqn blocks are stripped or imprecisely rendered.

Example:

Input:

.Dt Document title
.Dd 2018-08-20
.Ss SYNOPSIS
.EQ
Z ~=~~ U over V
 ~~=~~ {a + b X} over {c + d X}
.EN

Output:

Figure 1
<math class="eqn">
	<mrow><mi>Z</mi><mo>=</mo><mfrac><mi>U</mi><mi>V</mi></mfrac><mo>=</mo><mfrac><mrow><mi>a</mi><mo>+</mo><mi>b</mi><mi>X</mi></mrow><mrow><mi>c</mi><mo>+</mo><mi>d</mi><mi>X</mi></mrow></mfrac></mrow>
</math>

@pali
Copy link
Contributor

pali commented Aug 20, 2018

Question: How will eqn(1) markup be handled in rendered output?

It is really uncommon and I think this can be postponed after basic support will be there.

@Alhadis
Copy link

Alhadis commented Aug 20, 2018

That works for me then. 👍

@pali
Copy link
Contributor

pali commented Aug 28, 2018

@Alhadis: and file extensions .man.in, .mdoc.in and .1.in, .2.in, .3.in, .4.in, .5.in, .6.in, .7.in, .8.in, .9.in would be added too?

@Alhadis
Copy link

Alhadis commented Aug 28, 2018

I'm not sure. I think so, but care would be needed to expand .so requests gracefully so inaccessible or non-existent resources don't mangle output.

I may be a little apprehensive because of the widespread misclassification of files using .[1-9] or .in file extensions, and I don't really have much familiarity with GNU Autotools. All things considered, I'm probably not the best person to ask concerning the suitability of rendering .in documents.

@RalphCorderoy Could you add your thoughts here?

@pali
Copy link
Contributor

pali commented Aug 28, 2018

@Alhadis: about .in files, basically you should strip .in suffix and then look at extension. So something.xyz.in has extension .xyz. In GNU autotools, files with .in means as input for processing. Autotools take something.xyz.in file and generates from it something.xyz by replacing autotools text macros (expand @something@ in it). Most common are file.h.in (as C header file) or man.1.in (as manpage). But any text file can be used.

@Alhadis
Copy link

Alhadis commented Aug 28, 2018

Ah, alright. :) That's what I thought, but I didn't want to assert an assumption.

Cheers!

@pali
Copy link
Contributor

pali commented Oct 19, 2018

Hi! Any progress on this pull request? It is still marked as WIP and lot of people would love to see support for rendering manpages.

@Alhadis
Copy link

Alhadis commented Oct 19, 2018

Seconding this request for a status report. I'd love to start documenting projects with a real formatting language.

Forever sick of markdown.

@jordemort
Copy link
Contributor Author

Hi, I plan to start working again on this on Monday.

@jordemort
Copy link
Contributor Author

Here's my plan for this:

  • Alter the regex to be something like /[1-9][a-z]*(\.in)?|man|mdoc/ - need to check and see what exactly I'm matching against here, is it just going to give me in for a foo.1.in?
  • Return this particular "useless use of cat award"
  • Switch to using fragment output, see if that's fine to pass onwards without further XSLT fiddling
  • Need to find out if it's ok to just pass through/serve MathJAX unaltered for now or investigate how we do equation rendering elsewhere

Please ping me if any of those sound off

@ischwarze
Copy link

All parts of your plan sound reasonable, let us know how it goes and whether you encounter any issues with mandoc. Two minor remarks:

something like /[1-9][a-z]*(\.in)?|man|mdoc/ - need to check and see what exactly I'm matching against here, is it just going to give me in for a foo.1.in?

You probably also want files with the extension "*.n", which is customary for Tcl/Tk library manuals. For example, i have these manual pages on my system:

/usr/local/lib/tcl/tcl8.6/man/mann/Tcl.n
/usr/local/lib/tcl/tcl8.6/man/mann/append.n
/usr/local/lib/tcl/tcl8.6/man/mann/argc.n
...

Need to find out if it's ok to just pass through/serve MathJAX unaltered for now or investigate how we do equation rendering elsewhere

Not sure why you mention MathJAX; mandoc does not emit MathJax (or any JavaScript whatsoever). Mandoc merely converts eqn(7) input embedded in manual pages into MathML as defined by HTML5.

@jordemort
Copy link
Contributor Author

@ischwarze I meant MathML; what I was wondering is if we were already using MathJAX elsewhere on GitHub to solve the cross-browser rendering problems, but it appears that we are unable to do so because it ends up causing problems with some other parts of our frontend.

Right now I am considering the following options, with regard to equation support:

  • Pass the MathML through unaltered, and if the user's browser doesn't render it, it doesn't render it
  • Strip the MathML from the output and/or replace it with something like "(equation not rendered)"
  • It seems like it may be possible to convert the MathML to inline SVG, but that's going to necessitate throwing a whole lot of tooling at it

@pali
Copy link
Contributor

pali commented Oct 24, 2018

I think you should really ignore MathML problem right now. Mathematic formulas are not so common in Linux manpages... So the easiest thing is to pass MathML output as is.

@Alhadis
Copy link

Alhadis commented Oct 24, 2018

Agreed, just pass it through unaltered. It's Chrome's bloody fault for not bothering to support a published standard.

We can worry about fine-tuning later.

dctucker and others added 3 commits October 24, 2018 13:47
Co-authored-by: Jordan Webb <jordemort@github.com>
Co-authored-by: Anthony Riley <anth1y@github.com>
Co-authored-by: Jordan Webb <jordemort@github.com>
Co-authored-by: Anthony Riley <anth1y@github.com>
Co-authored-by: Jordan Webb <jordemort@github.com>
Co-authored-by: Anthony Riley <anth1y@github.com>
@jordemort
Copy link
Contributor Author

@Alhadis I do not think there would be any internal barrier to getting this merged, if we could get it cleaned up. I definitely won't have time to work on it myself any time soon, though.

@pali
Copy link
Contributor

pali commented Oct 14, 2020

@jordemort: what is needed to be done? @Alhadis already wrote that is willing to finish it. Can we move forward?

@Alhadis
Copy link

Alhadis commented Oct 23, 2020

@jordemort Be on the lookout for a @mention from me in the next few days. I'll fork from your branch, so you'll need to merge my changes manually (I could submit a separate PR, if you'd prefer—whichever gets this landed sooner).

I've tested it. It's working. I can see HTML markup printed to STDOUT by
running `bundle exec bin/github-markup /usr/share/man/man1/mandoc.1`. So
`test_each_render_has_a_name` can take its cryptic obsession with `name`
properties and shove it up its khyber.
@Alhadis
Copy link

Alhadis commented Oct 24, 2020

Okay, done. That was;

  • 25 minutes of coding,
  • 5 hours struggling to wrap my head around XSLT,
  • Another 4 hours debugging inexplicable test failures,
  • And me finally cracking the shits and nuking the test I couldn't fix, because everything else is working fine

Your move, @jordemort.

@pali
Copy link
Contributor

pali commented Oct 25, 2020

@Alhadis: Some of tests on Travis are failing...

@Alhadis
Copy link

Alhadis commented Oct 26, 2020

Some of tests on Travis are failing...

I wasn't even aware that I could trigger a build re-run by pushing a fork of a PR branch…

Anyway, two of those failures are caused by Ruby 2.1.* and Ruby 2.2.* having no support for <<~squiggly heredocs (which I've removed, even though I doubt very much that GitHub are still supporting two EOL versions of Ruby). The other two failures, though… eh, I've got no explanation. The tests are passing perfectly fine when run locally. I can only assume it's environment-related (possibly something to do with an old version of mandoc(1) being used?).

I'll keep testing, though @jordemort may need to throw me a bone here.

@jordemort
Copy link
Contributor Author

@Alhadis I triggered them, by pushing your commits from your branch to this one - you can also trigger them yourself by opening a new PR and pushing commits to it.

I will need to get back to you re: support for older Ruby but it might be fine to drop it? We certainly don't run anything that old internally.

It looks to me like the test failures are a side effect of the HTML output of mandoc varying by version. We might need an additional safeguard here to ensure that the correct version of mandoc is available, and some additional gymnastics to ensure that that's the version installed in the test environment.

@Alhadis
Copy link

Alhadis commented Oct 26, 2020

It looks to me like the test failures are a side effect of the HTML output of mandoc varying by version. We might need an additional safeguard here to ensure that the correct version of mandoc is available, and some additional gymnastics to ensure that that's the version installed in the test environment.

We have two major problems here:

  1. HTML output will inevitably vary between versions. Even if the variance is as trivial as changes to whitespace, it's not really something we can (or should) try to enforce.

  2. mandoc provides no means of interrogating its release version:

    $ mandoc --version
    mandoc: illegal option -- -
    mandoc: illegal option -- v
    mandoc: illegal option -- e
    mandoc: illegal option -- r
    usage: mandoc [-ac] [-I os=name] [-K encoding] [-mdoc | -man] [-O options]
    	      [-T output] [-W level] [file ...]
    $ mandoc -v
    mandoc: illegal option -- v
    usage: mandoc [-ac] [-I os=name] [-K encoding] [-mdoc | -man] [-O options]
    	      [-T output] [-W level] [file ...]

    From @ischwarze's earlier feedback, it seems unlikely that mandoc will introduce a --version switch, or something similar.

IMHO, a more reliable test would involve fuzzy-matching portions of a rendered HTML document.

@jordemort
Copy link
Contributor Author

@Alhadis Yes, the tests could be fuzzier, or the XSLT could be more complex, or maybe we could just give up on the XSLT altogether and say "I'll accept whatever mandoc outputs as the body"

@ischwarze
Copy link

1. HTML output will inevitably vary between versions.

True. The exact format of HTML output is expected to become more stable over time, but some changes will likely happen for quite some time: the HTML output is not yet perfect.

I'm a bit confused what you are trying to test here. It can't really be whether mandoc produces the expected HTML output, right? Because you are not testing mandoc here as far as i understand. Besides, whether mandoc produces the expected HTML output is already tested extensively in the mandoc regression suite. That seems the proper place to do such tests. Doing the tests there also implies that they are always done for the right version of mandoc, without doing any version detection, simply because the mandoc regression suite is part of the mandoc repository itself.

2. `mandoc` provides no means of interrogating its release version:

In the test suite for some other program, are you considering to provide different test suites (or at least varying tests) depending on the mandoc version? That would seem surprising to me. You want to test the other program, not mandoc, or what am i missing?

IMHO, a more reliable test would involve fuzzy-matching portions of a rendered HTML document.

That sounds reasonable. The short .Dx example you provided is less likely to change than a larger chunk of output. Alternatively, just looking for the string

<div class="manual-text">

might be even more robust. That is very likely to be contained in the HTML output of any version of mandoc for any input document, and it is not likely to change in the future.

@Alhadis
Copy link

Alhadis commented Oct 26, 2020

@jordemort i've updated the tests to use Nokogiri's CSS selectors to assert the existence of <h1> and <table> elements in rendered output. I'm also puzzled as to why we're using XMLStarlet instead of Nokogiri (which would make more sense, since it's a Ruby library).

I'm a bit confused what you are trying to test here.

We're testing this library's ability to shell out to mandoc(1) and filter its -Thtml output for optimal display on GitHub.

@jordemort
Copy link
Contributor Author

@Alhadis I think at this point you're going to be better off opening your own PR, so you don't have to wait for me to push things to the branch to get the tests to run.

Re: "why is this this way / why is this not finished" - this was a project from an internal hack week in 2018. I was not familiar with this codebase at the time; indeed, I barely knew Ruby. I do not have any ownership or authority over this repository, I do not have any time on my schedule allocated to work on it any further, and as far as I know it hasn't made it to anyone else's roadmap either. I would have really liked to see manpage support added to this gem, but I am not the person that is going to be able to make it happen.

@jordemort jordemort closed this Oct 26, 2020
@RalphCorderoy
Copy link

Hi @Alhadis, If you do open your own PR then please state its URL here. Thanks.

@Alhadis
Copy link

Alhadis commented Oct 28, 2020

Already on it. You'll all be @cc'd in the new PR. 👍

@pali
Copy link
Contributor

pali commented Nov 23, 2020

@Alhadis: Do you have some updates? Or do you need some help?

@Alhadis
Copy link

Alhadis commented Nov 24, 2020

@pali It's mostly finished; I've written a gem that handles the HTML filtering and refinement (in a manner optimised for display on GitHub). The only things left are regression tests, coverage reporting, and CI.

You see, nobody at GitHub is currently maintaining github/markup at the moment. From what I've been told, @lildude is the most likely staff member who'd be in a position to review/approve a PR for adding man page support. I don't know how willing he'd be to add markup maintenance to his list of responsibilities, so I'm doing everything I can to make this a frictionless process.

With all that said, I still have no idea what the chances are of getting this PR approved by GitHub. 😕

@Alhadis
Copy link

Alhadis commented May 9, 2021

Pinging @lildude to confirm whether he'd be willing to step in as this gem's maintainer/"owner".

I've a lot on my plate at the moment, so it makes no sense to continue working on something that's not going to be used anyway…

(Note that I'll take care of all the hard work if this gem does get "adopted"… 😉)

@lildude
Copy link
Member

lildude commented May 24, 2021

Pinging @lildude to confirm whether he'd be willing to step in as this gem's maintainer/"owner".

Unfortunately, I don't have the bandwidth right now. I'll keep it in mind for when my load lightens though.

@pali
Copy link
Contributor

pali commented Mar 25, 2023

Any news on manpage/mandoc support? Cc: @lildude or @aharpole ?
It would be nice to get that gem adopted to have manpage support on github.

@lildude
Copy link
Member

lildude commented Mar 27, 2023

Nope. No news. I'm not actively involved in this project and don't have the bandwidth to become active.

@Artoria2e5
Copy link

Here's to hoping something happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet