Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Scaladoc tables #6043

Merged
merged 1 commit into from Sep 14, 2018
Merged

Add Scaladoc tables #6043

merged 1 commit into from Sep 14, 2018

Conversation

janekdb
Copy link
Member

@janekdb janekdb commented Aug 20, 2017

Parse a subset of GitHub flavoured markdown to define simple tables which can include inline wiki markdown. The specification for the table markdown is based on GitHub Flavored Markdown Spec, Version 0.28-gfm (2017-08-01).

The markdown used to structure the table follows GitHub Flavoured Markdown Table Extension with the restriction that pipe symbols at the start and end of row are mandatory and the clarification that delimiter row cells must have a minimum of three hyphens.

Note: Most of the following is irrelevant now the markdown is GHFM based..

  |     Define table cells
  ---   Default text alignment
  :---  Left text alignment
  ---:  Right text alignment
  :---: Center text alignment

Example.

A simple 2x2 table with a caption and header row,

  {|
  |+ Caption: Example Table
  ! header 1        ! header 2        !
  | row 1, column 1 | row 1, column 2 |
  | row 2, column 1 | row 2, column 2 |
  |}

Header and cell markdown can be repeated at the start of the heaer or row to allow | and ! to be included in content,

  {|
  !!  Disjunction!    !!  Conjunction!     !!
  ||| Logical OR = || ||| Logical AND = && |||
  |}

Fuller example including longer cell marks,

  {|
  |+ Caption: Example Table
  ! Header cell 1 ! Header cell 2 !
  | Row 1, column 1 | Row 1, column 2 |
  || Content with pipe symbol:  |A| := det(A) || final cell over
  two lines||
  |}

Caption, header and cell content can be broken over multiple lines and
include paragraphs and inline markdown,

    |Multiline
    cell
    content with __inline__ markdown

    Paragraph two.

Block level elements other than paragraphs are not yet parsed. Specifically these types are not yet supported,

  • Code blocks
  • Horizontal rules
  • Titles
  • Lists

A warning is issued when the rows and optional header do not all have the same number of cells.
This reserves an option to do something "wiki-ish" when cell counts differ.

Inspired by MediaWiki markdown with new row conciseness following GitHub flavoured markdown.

GitHub Project with subtasks: https://github.com/janekdb/scala/projects/1

Live Demo

http://janekdb.github.io/scala/PR-6043/scala/test/scaladoc/tables/code/index.html

This page contains the unit test tables with the table markdown duplicated into a code block,

image

I'm hoping this makes it easier to review the PR and also furnishes a good idea of what you get for what you give.

PR Status

WIP

Review Comments

  • More compact way of introducing new rows

TODO: Move to GHFM

  • Make GHFM compliant
  • Add GHFM column alignment options
  • Update commit message
  • Document requirement for delimiting pipes on rows per @lrytz comment
  • Add negative test for missing pipes

TODO: Support multiline content cell with GHFM extension

  • Understand dotty 3rd party markdown parser
  • Explore conversion of pre-dotty markdown to allow dotty markdown parser to be used
  • Retain multiline support subject to review of dotty approach

TODO

  • Include table start in check for new paragraph
  • Fix lines numbers in table-warnings.check
  • Add progress check in parseCells0
  • Parse Caption as Seq[Block] matching Cell.
  • Fix missing paragraphs in cell content starting with CellsParagraphsB and friends
  • Remove new row markdown |- because final cell termination is sufficient
  • Consolidate or remove identically structured test traits
  • CSS style table to blend into current Scaladoc themes not including row alternation styles
  • Consider review comment regarding paragraph element remembering CSS is the correct way to address this
  • Decide which other block elements other than Paragraph to parse: Code, Lists, Title, HRule.
  • Review commit message
  • Update grammar on table() method comment
  • Remove isNewlineAnyTableMarks or the duplicate
  • Run static analysis
  • Squash commits

WONT

Item Reason Comment
Be more lenient toward leading whitespace below table marks Deliver current tables sooner Possible future improvement
Support code blocks in cells in first merge Deliver current tables sooner Parsing code blocks will be the next improvement. I'd rather separate this off into new work because parsing code block while avoid code duplication could be quite a significant transformation requiring some recursion to be added at the top level of the WikiParser.
Drop requirement for table start and end markers Deliver current tables sooner Without the start and end markers detecting transitions into and out of table parsing mode would be more complex and invasive. If ever we want some table level options around table formatting etc there would be no where to locate this information. Main reason is the implementation complexity.

Acceptance Testing

  • Check HTML rendering of test traits
  • No regressions on existing non-table markdown in Scala library

Review

  • Create version of tables.scala which includes the table markdown as a code block and share it

Target Outcome Over the Course of a few PRs :)

  • Tables as seen in ScalaTest docs can be created without requiring non-markdown work

Follow Ons

  • Support code blocks.
  • Support other block elements: Horizontal Rules, Titles, List
  • Make table css responsive
  • Make styling in CaptionsB, CaptionsMultilineB effective instead of code style being bold and indistinguisable for other text
  • Consider applying a minimum width for table to avoid caption wrapping as seen in CaptionsC
  • Make styling of "and" in CellsC style the word differently to surrounding text.
  • Left/right/centre cell alignment

@scala-jenkins scala-jenkins added this to the 2.12.4 milestone Aug 20, 2017
@janekdb
Copy link
Member Author

janekdb commented Aug 20, 2017

This is an example of table markdown which includes all types of supported marks,

/** A variety of decorators that enable converting between
 *  Scala and Java collections using extension methods, `asScala` and `asJava`.
 *
 *  The extension methods return adapters for the corresponding API.
 *
 * {|
 * |+ Conversions are supported via `asScala` and `asJava`
 * ! Scala                           ! Direction ! Java !
 * | scala.collection.Iterable       | <=>       | java.lang.Iterable |
 * | scala.collection.Iterator       | <=>       | java.util.Iterator |
 * | scala.collection.mutable.Buffer | <=>       | java.util.List |
 * | scala.collection.mutable.Set    | <=>       | java.util.Set |
 * | scala.collection.mutable.Map    | <=>       | java.util.Map |
 * | scala.collection.concurrent.Map | <=>       | java.util.concurrent.ConcurrentMap |
 * |}
 *

image

The unit test contains additional examples which include examples of multiline caption, header and cell content using inline wiki styling. Tables cannot contain other block elements.

At this point I've dealt with all the defects I could anticipate with the exception of one described below in Known Issues. I'l like to get this out to a wider audience now so the next set of defects can be found.

Known Issues

Although it could be charitably regarded as adherent to the "spec" the markdown parsing does not tolerate leading whitespace before marks. To avoid wasting user time a subsequent PR will allow this to parse.

  // Should parse to table with a caption, header and a row
  /**
    * {|
    *  |+Leading
    *   whitespace before marks
    *    !!Not Yet Skipped!!TODO
    *    |-
    *    |A
    *       |B
    * |}
    */
  trait LeadingWhitespaceNotYetSkipped

Supporting this may require an extension to the check* method which so far I have managed to avoid. It would also require a few more unit tests and I feel since this limitation is well understood I'd rather get this in use and get some unanticipated cases reported.

Backlogged as a fast follower: scala/bug#10475

CSS Styling

The styling is a call to action for anyone with a passion for css. I will submit a PR if there is no interest from other quarters.

Follow on actions include

  • Converting some library Scaladoc to use tables where it makes an improvement
  • Updating the Scaladoc guide
  • Writing a blog post

@adriaanm
Copy link
Contributor

Cool! I'd prefer using GitHub-flavored markdown, as we do everywhere else. What do you think?

From https://guides.github.com/features/mastering-markdown/

You can create tables by assembling a list of words and dividing them with hyphens - (for the first row), and then separating each column with a pipe |:

First Header Second Header
Content from cell 1 Content from cell 2
Content in the first column Content in the second column

@janekdb
Copy link
Member Author

janekdb commented Aug 25, 2017

I surveyed a few markdown systems here: https://github.com/janekdb/scala/wiki/Table-Markdown-Survey

I believe the MediaWiki style markdown is easier to work with having used both GH and MW. It requires less attention to syntax than GH flavoured markdown and has the benefit of supporting vertical layout of cells in addition to horizontal layout which will work well with larger runs of inline markdown. MW can do everything GH can do AFAIK with less fiddling so I think it should be preferred.

Putting the choice into context here are the requirements I evaluated the markdown options against: https://github.com/janekdb/scala/wiki/Scaladoc-Tables-Requirements

@adriaanm
Copy link
Contributor

My opinion is that there's a strong case for standardization here, with feature completeness being a bit less urgent of a concern. Using GitHub markdown also makes it easier to preview

@lrytz
Copy link
Member

lrytz commented Sep 7, 2017

In general I agree to bias towards GFM, but here maybe the limitations are too severe?

  • GFM requires a header row
  • Cell content has to be on a single line
  • I like the option for a caption

Would it be OK adapt GFM to our needs?

On the other hand, the current proposal doesn't have functionality for left/center/right alignment.

I played a bit with the current state. One thing that's annoying for small tables is that each row requires a |- and a |. Example:

 * {|
 * |- | One | `1`
 * |- | Four | `4`
 * |}

It actually works without the first |-. I think it would be good if the | after |- was optional.

A detail in presentation: if two tables follow each other, the are sticked together. Maybe every table should be wrapped in a <p>?

The caption and header colors cold be a little less flashy :-)

I didn't get code blocks ({{{ ... }}}) to work within table cells - is that expected?

@janekdb
Copy link
Member Author

janekdb commented Sep 7, 2017

@lrytz

  • Cell content is inline markdown only at this point — no block level elements.

  • The styling is deferred to a later commit.

  • Left/right/centre alignment and other styling options I was intending to address later after a first iteration that results in usable tables.

I have some views to share regarding GHFM or not, just need a bit of time to write them down.

Thanks for the feedback. I am surprised the example was parsed but getting some feedback was the purpose of the PR!

@lrytz
Copy link
Member

lrytz commented Sep 7, 2017

OK, thanks for the clarifications!

I definitely think that allowing to use multiple lines for a cell's content (or even a row's) is very important. Otherwise the usefulness of this new feature would be severely limited.

I guess a good source of real-world examples is scalatest's scaladoc, searching for <table> in its checkout I find 42 results in 36 files. Ideally all of these should be doable in the new syntax.

Example: http://doc.scalatest.org/3.0.0/index.html#org.scalatest.AsyncFunSuite

@janekdb
Copy link
Member Author

janekdb commented Sep 7, 2017

@lrytz — The identification of these pre-existing table is super useful. This will allow for some concrete feature validation. I will take a look over the next few days.

@adriaanm
Copy link
Contributor

@janekdb, which version would you like to schedule this for? For 2.12.4, the PR must be mergeable by Wed Sep 27

@janekdb
Copy link
Member Author

janekdb commented Sep 21, 2017

@adriaanm — 2.12.5 as it needs rethinking to accommodate,

  • More compact way of introducing new rows
  • Tables as seen in ScalaTest docs

@adriaanm adriaanm modified the milestones: 2.12.4, 2.12.5 Sep 21, 2017
@janekdb janekdb force-pushed the topic/2.12/scaladoc-tables branch 2 times, most recently from a7d5962 to 11312cb Compare October 6, 2017 10:19
@janekdb janekdb added the WIP label Oct 6, 2017
@janekdb janekdb force-pushed the topic/2.12/scaladoc-tables branch 2 times, most recently from 8668b21 to 8d27dfb Compare October 20, 2017 10:40
@janekdb
Copy link
Member Author

janekdb commented Oct 20, 2017

Not forgotten. Making gradual progress on more compact row markdown.

Example of a three row table not using the |+ new row markdown,

  /**
    * {|
    * |Orange|Apple|
    * |Bread|Pie|
    * |Butter|Ice cream|
    * |}
    */

84/84 tests enabled and passing with new more-GHFM compact row markdown using cleaner parsing approach.

☺️

@janekdb janekdb force-pushed the topic/2.12/scaladoc-tables branch 5 times, most recently from 7d72ba9 to 440bb46 Compare November 3, 2017 11:26
@janekdb janekdb force-pushed the topic/2.12/scaladoc-tables branch 4 times, most recently from 1423ff5 to 4be8d29 Compare November 8, 2017 11:05
@lrytz
Copy link
Member

lrytz commented Sep 13, 2018

@janekdb

[test] [error]   - reflect/compile:doc failed: scala.NotImplementedError: an implementation is missing
[test] [error]   - compiler/compile:doc failed: scala.NotImplementedError: an implementation is missing

and

[test] !! 66 - run/tables-warnings.scala                 [compilation failed]

@janekdb
Copy link
Member Author

janekdb commented Sep 13, 2018

All positive and negative tests passing locally,

The parsing outcomes known to be incorrect are captured as-is in these test cases,

  trait CellMarkerEscaped
  trait CellMarkerEscapedTwice
  trait MissingInitialCellMark
  trait SplitCellContent
  trait MixedContentUnspaced
  trait LeadingWhitespaceNotSkipped

I'm intending to address these as fast follower bug fixes after this PR is merged.

Outstanding work,

  • Remove dead code from CommentFactoryBase
  • Update HtmlPage.tableToHtml

I'll need to do justice to those points tonight.

Update: The CI build found two cases of incompatibility with the Scaladoc in this project,

TreeInfo.scala

[test] peek: 6: newline: '
[test]           | [id ’.’] ‘super’ [‘[’ id ‘]’] ‘.’ id
[test] '

MarkupParsers.scala

[test] peek: 6: newline: '
[test]                      | `{` scalablock `}`
[test] '

Tables should only be recognized when the initial pipe is blocked to the left. There might be some whitespace trimming occurring that is causing this. For now I've removed the line breaks which I realise simply hides the problem.

Update: TreeInfo and MarkupParsers have been restored and initial ws trimming removed,

    def block(): Block = {
      if (checkSkipInitWhitespace("{{{"))
        code()
      else if (checkSkipInitWhitespace('='))
        title()
      else if (checkSkipInitWhitespace("----"))
        hrule()
      else if (checkList)
        listBlock
      else if (check(TableCellStart)) <- was checkSkipInitWhitespace
        table()
      else {
        para()
      }
    }

This implementation of tables assumed blocking to the left, which reduces the chances of parsing non-table markdown. Leading whitespace being allowed on the first row was inconsistent with that. Allowing leading whitespaces on any row appears to be allowed by the GHFM table spec but will cause incompatibilities as we've seen.

I will take this one as bug fix b/c it will require some thought and testing. One approach is to lookahead and require the line following the initial table row to be a delimiter row. This will head off any incompatibilities with Scaladoc found in the community I'm sure.

@janekdb janekdb force-pushed the topic/2.12/scaladoc-tables branch 3 times, most recently from b2e2794 to d931e23 Compare September 13, 2018 12:28
@janekdb
Copy link
Member Author

janekdb commented Sep 13, 2018

Seeing some encouraging HTML now...
image

Column alignment options looking good,
image

Status: All planned GHFM parsing done, initial end-to-end tests are good. Remaining: dead code removal, squash commits. I'll need to pick up that tomorrow morning and given that this should be good for merge decision.

@janekdb janekdb force-pushed the topic/2.12/scaladoc-tables branch 2 times, most recently from 66d5edb to 7f070e3 Compare September 13, 2018 20:20
Copy link
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! This is good to go once you're done with squashing and cleaning up.


val inlineTerminators = makeInlineTerminators(startMark)

val content = Paragraph(inline(isInlineEnd = checkAny(inlineTerminators)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that all terminators are single chars? Then this could be made more efficient by just checking the buffer's current char instead of calling check many times. (We can do this in a follow-up PR). A first simple improvement would be to test in check if the chars string has length 1, and avoid calling jump in that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doing the clean up now so will look at this.

@janekdb janekdb force-pushed the topic/2.12/scaladoc-tables branch 3 times, most recently from 6c5675c to ac6371b Compare September 14, 2018 11:17
Based on GitHub Flavored Markdown Spec,

    https://github.github.com/gfm/#tables-extension-
    Version 0.28-gfm (2017-08-01)

A table is a block element consisting of,

  * A header row
  * A delimiter row separating the header from the data
  * Zero or more data rows

Restrictions,

    Rows must begin and end with pipe symbols
    A blank line required after table

Limitations,

    Escaping of pipe symbols is not yet supported

Inline markdown can be used in header and data cells, block markdown cannot be used.

Example,

    /**
      * |Nibbles|Main|Desert|
      * |:--:|:---:|----|
      * |Bread|Yak|Vodka|
      * |Figs|Cheese on toast^three ways^|Coffee|
      */
  trait RepastOptions

The accepted markdown is intended to be a strict subset of all possible GHFM tables.
@janekdb
Copy link
Member Author

janekdb commented Sep 14, 2018

@lrytz - This is the merge candidate.

Last actions,

  • Removed most dead code
  • Actioned some TODOs, consolidated remaining TODO comments
  • Fixed incorrect parsing of empty leading cell
  • Investigated your optimization suggestion to use Char checks now that the only delimiter is the pipe. It will work but not with the time I have today.
  • Squashed commits
  • Ran end-to-end test

Not done,

  • Make scala project Scaladoc
  • Make community Scaladoc

Every known instance of suboptimal parsing has a test case. This will make it easy to continue with the edge cases following merge.

Compared to the pre-GHFM version this had only light testing, however in balance the parsing code is much simpler thanks to the stricter requirements. The community build @SethTisue will give us more information about how well this copes with existing markdown.

I lay my offering down at the side of the lake and hope when the morning comes to see it gone.

@janekdb janekdb added release-notes worth highlighting in next release notes and removed WIP labels Sep 14, 2018
@adriaanm
Copy link
Contributor

We gratefully and humbly accept your offerings! I have stood (by the sidelines) in awe at your diligence.

Copy link
Member

@lrytz lrytz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@lrytz lrytz merged commit a3542af into scala:2.12.x Sep 14, 2018
@janekdb
Copy link
Member Author

janekdb commented Sep 14, 2018

Phew! Thanks for everyone's help and patience. I'll remind myself to get early community feedback before I next plot a course.

@SethTisue
Copy link
Member

scala/community-build@69bca5e moves to a Scala SHA that includes this PR, so we'll know by tomorrow if anything blows up

@SethTisue
Copy link
Member

@janekdb I would like to tweet about this feature from https://twitter.com/scala_lang (and credit you by name when we do so). do you have a tweet about it I could quote-tweet? (and/or could you propose a screenshot to include?)

@janekdb
Copy link
Member Author

janekdb commented Sep 28, 2018

@SethTisue How about this screenshot lifted from the forthcoming blog post

image

You can crop off the intitial text leaving just the markdown and the table if you like.

My tweet: https://twitter.com/janekdb/status/1045742653822955520

@janekdb
Copy link
Member Author

janekdb commented Oct 1, 2018

@SethTisue Here's a couple of alternative images,

image

image

@SethTisue
Copy link
Member

@janekdb I forgot about your blog post draft. rather than tweeting now, how about we publish the blog post and publicize the feature that way (including on Twitter)?

@janekdb
Copy link
Member Author

janekdb commented Oct 3, 2018

@SethTisue Blog post is done pending review: scala/scala-lang#959

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes worth highlighting in next release notes
Projects
None yet
6 participants