Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Lua representation, behavior of some types #7718

Closed
tarleb opened this issue Nov 26, 2021 · 25 comments
Closed

Change Lua representation, behavior of some types #7718

tarleb opened this issue Nov 26, 2021 · 25 comments

Comments

@tarleb
Copy link
Collaborator

tarleb commented Nov 26, 2021

Proposed changes:

  • Block and Inline lists ([Block] / [Inline]) each get a separate list type. This should improve error messages slightly. It would also allow the addition of type-specific methods, enabling code like doc.blocks:walk{my-lua-filter}.

    The concepts of Inlines and Blocks must be defined in the documentation either way, so having actual Lua types to match seems consistent. Explicit Inlines and Blocks constructors could be added, too.

    Side note: a pandoc.Inlines constructor could replace the pandoc.utils.text function.

  • Remove .t/.tag from MetaValue objects. The current behavior is inconsistent, as MetaString and MetaBool values are pushed as strings and booleans, respectively; they don't have a tag. MetaBlocks and MetaInlines will be treated as [Block] and [Inline] values. This change requires the previous point to be implemented, too.

  • Revisit tables. Tables are currently awkward to work with. E.g., Caption values must always be given as {long = ..., short = ...}, which is inconvenient. It should be possible to just pass Blocks or Inlines in place of a Caption element.

@tarleb tarleb changed the title Change representation, behavior of some types Change Lua representation, behavior of some types Nov 26, 2021
@kysko
Copy link

kysko commented Nov 26, 2021

  • Revisit tables. Tables are currently awkward to work with.

That's could be a whole subject on its own!

  • consistent attribute/Attribute access (and their constituents) for elements1
  • traversal of elements; a walk_table_elements, with filters for Cell, Row, etc...
    can already be written/simulated with callback hooks placed here and there in a function that traverses the table, but might will be useful if officially implemented;
    (unless of course it already exists and I'm too blind to find it)

...

Footnotes

  1. /edit: I had not thoroughly checked the latest developments in this regard, and it seems now this particular detail is much better. e.g. I can access Row r's classes through r[1].classes, whereas I had to use r[1][2] before.2

  2. footnote to the footnote: oops... I was wrong about being wrong... sort of... still have to use r[1][2] in some circumstances... weird... which brings us back to my first item.3

  3. me again... sorry for the noise: I was testing 2 versions of pandoc, and footnote 2 was done in older 2.14.2... but latest 2.16.x does give the better result stated in footnote 1, with classes accessed through .classes.

@tarleb
Copy link
Collaborator Author

tarleb commented Nov 26, 2021

I created a draft PR for the first two points here: #7719. It's WIP.

@tarleb
Copy link
Collaborator Author

tarleb commented Nov 28, 2021

The first two points above have been implemented, which leaves the third.

That's could be a whole subject on its own!

Let's turn it into a separate issue! Please open one, idealy with a wishlist for features and changes to current objects. Consistent handling of attr is already a really good point. I'm looking to turn all types with Attr into userdata objects that provide better functions, but which behave in a way that ensures backwards compatibility.

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 3, 2021

Suggestion for the representation of Table{Head,Foot,Body}:

  • Provide fields attr and rows; turn the numerical fields [1] and [2] into aliases for these fields. This would add convenience while being fully backwards compatible.

Suggestions for Row values:

  • The cells would become accessible via numerical indices. The list would also get an attr field plus the usual aliases identifier, classes, and attributes.

    Usage example:

    if row.classes:includes 'important' then
      row.attr = pandoc.Attr{color="red"}
      for i, cell in ipairs(row) do
        row[i] = modify(cell)
      end
    end

    This would not be backwards compatible, the integer fields 1 and 2 currently contain the Attr and list of Cells, respectively.

  • For a backwards compatible alternative, we could put the list row cells into a field cells. The above example would then turn into

    for i, cell in ipairs(row.cells) do
      row.cells[i] = modify(cell)
    end

I think it makes sense to treat Row values as sequences, so I'd prefer the breaking change.

@jgm
Copy link
Owner

jgm commented Dec 3, 2021

I think I'd be okay with a breaking change here (as long as we telegraph it clearly).

@jgm
Copy link
Owner

jgm commented Dec 3, 2021

On the other hand, using cells is pretty easy. But a possible drawback to that backwards-compatible solution is that people might think they can just iterate directly, and this WILL work, but it will give them the wrong things.

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 4, 2021

Maybe I'm approaching this wrong and we should even discourage manual traversal of tables. I like the proposal by @kysko, perhaps we should put the focus on walk methods for tables parts. Modifying rows and cells would become much easier. Pseudo-code:

function Table (tbl)
  return tbl:walk{
    Row = function (row) 
      if row.classes:contains 'important' then
        return row:walk { Cell = modify_important_cell }
      end
    end
  }
end

@jgm
Copy link
Owner

jgm commented Dec 4, 2021

Would the traversal idea allow one to modify e.g. the number of cells in a row?

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 4, 2021

I believe it would be possible to allow splicing the same way we do for Block and Inline values.

@jgm
Copy link
Owner

jgm commented Dec 4, 2021

This does seem like a nice interface.

I was struck by tbl:walk -- can we currently do block:walk instead of walk_block or walk_inline? If not, should we? Should walk be context-sensitive in this way, so that if what is walked is a Table, we get the extra ability to define a transformer for Row and Cell, while if it is another block-level elemnet, we don't?

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 4, 2021

Having walk methods on all elements would be really nice. I'll give it a go.

Treating all Block elements the same would be the simplest approach. Not sure if it's also the best?

@jgm
Copy link
Owner

jgm commented Dec 4, 2021

Yes, simplest -- but what about the tbl:walk illustrated above?

@kysko
Copy link

kysko commented Dec 5, 2021

Would the traversal idea allow one to modify e.g. the number of cells in a row?

That can be tricky with row/col spans more than 1, even now. When you do this in Haskell, you have the "Annotated Table" (iirc) to help you, whereas in Lua script I have to build a similar "complementary" grid from scratch for such kind of things.

So I have my way of coping, I don't want to turn this into an issue, but if e.g. cell deletion/creation is to be made easier, perhaps consider "sharing" the "Annotated Table" with Lua filter users, if that's even possible (or useful).
(This reminds me of this pandoc-types issue at end of last year... seems this period of the year is fertile for musings on tables!)

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 6, 2021

Question about the walk method: should the filter also be applied to the element itself, or only its subtree? The pandoc.utils.walk_{inline,block} functions walk the element without applying any filter function to the element. Is that good behavior? If we apply the filter to the element as well, then the result must be a list instead of a single element.

To give an example

pandoc.SmallCaps('Hello'):walk{
  SmallCaps = function (sc) return pandoc.Emph(sc.content) end,
  Str = function (_) return pandoc.Str 'Bye' end
}

The above could return either pandoc.SmallCaps 'Bye', or pandoc.Inlines{pandoc.Emph 'Bye'}. Not sure which of these is more intuitive.

The current work in progress can be found in this branch.

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 6, 2021

Using an AnnotatedTable would certainly be nice, but we'd probably support it in a way similar to SimpleTables, i.e., by offering conversion functions from and to stock Table values. But to be honest, I don't think I'll have time for that during this year's table-fixing season.

@jgm
Copy link
Owner

jgm commented Dec 6, 2021

In favor of applying walk to the element itself: that is how walk in T.P.Walk works.

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 6, 2021

OK, I've opened #7732; adding walk to Row and values would be a bit of boilerplate code, but nothing difficult. Same for accepting functions named Row in filters.

@jgm
Copy link
Owner

jgm commented Dec 7, 2021

The more I think about it, the more I'm changing my mind about how walk should work.

Applying walk to the element itself (and not just its subtree) means returning a list value, and that's both harder to work with and a source of potential bugs (if someone assumes that a single Block rather than a [Block] list is being returned). Thinking about how walk is actually used, it's hard to see a compelling reason for having it operate on the element itself.

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 7, 2021

One remaining question I have is which objects should get a walk method. Pandoc is an obvious candidate, but list types like Inlines and Blocks could get one as well. How should walk behave for lists? Should the filter still only be applied to the list elements, or also to the list itself?

The Meta type should probably not have any methods at all, as setting a metadata value with a method name would shadow the method.

@jgm
Copy link
Owner

jgm commented Dec 7, 2021

I definitely think it would be nice to allow walk to be applied to lists. With lists, it certainly seems to make more sense if walk applies to the list itself. If we did it that way for lists, but for non-lists just walked the contents, then it would effectively give us both behaviors. You could do

element:walk

to walk the single element (subtree only), returning an element of the same type, or

{element}:walk

to walk the subtree and the element itself, returning a list.

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 7, 2021 via email

@jgm
Copy link
Owner

jgm commented Dec 8, 2021

That sounds great!

@albfan
Copy link

albfan commented Dec 15, 2021

As a working example that might help to see how to make table model human readable:

#7748

It shows a conversion from csv to html (tables). To make result dynamic (filter, pagination) it uses javascript. Adds javascript stuff with --template (great as this is not expected to be handled with pandoc engine) but javascript needs some metadata on html table code to work. Some attributes that enables pagination, filter on <table> tag, and some fields on <th> to enable filter controls (select/input filter, sort)

A first solution was created on beautifulsoap python library, but with help of @mb21 was able to turn into a lua filter (so all works inside pandoc). But result is pretty cryptic:

#7748 (comment)

I will put here for ease reading:

working filter:

local stringify=pandoc.utils.stringify

function Table(table)
  table.attr = {
    ['data-filter-control']="true",
    ['data-page-list']="[10, 25, 50, 100, all]",
    ['data-pagination']="true",
    ['data-show-columns']="true",
    ['data-show-search-clear-button']="true",
    ['data-sortable']="true",
    ['data-toggle']="table"
  }

  for k, th in pairs(table.head[2][1][2]) do
    th.attr = pandoc.Attr("", {}, {
      {'data-field',stringify(th.contents):gsub(" ", "-")},
      {'data-sortable','true'},
      {'data-filter-control','select'}
    })
  end
  return table
end

Notice the part of manipulate th tags (Cells of Rows in TableHead)

This pseudo code is what I would expect to be available:

for k, th in pairs(table.head.rows) do
    th.cell.attr = {
      ['data-field']=stringify(th.contents):gsub(" ", "-"),
      ['data-sortable']='true',
      ['data-filter-control']='select',
    }

to turn:

<thead>
<th>Username</th>
<th>name</name>
</thead>

into

<thead>
<th data-field="Username" data-sortable="true" data-filter-control="true">Username</th>
<th data-field="Username" data-sortable="true" data-filter-control="true">name</name>
</thead>

Hope this looks like something useful and a clear example on how ease to customize results from this amazing pandoc engine

@tarleb
Copy link
Collaborator Author

tarleb commented Dec 17, 2021

I'm in favor of adding proper fields to TableHead, Row, and TableFoot, I have a branch for this in the works. The downside is that we'd loose backwards compatibility.

I added code to hslua-2.1 that would allow to keep things backwards compatible, but there are other changes that would be good to get into that release, and it might take a few more weeks before I publish it.

@jgm
Copy link
Owner

jgm commented Dec 17, 2021

I'm in favor of adding proper fields to TableHead, Row, and TableFoot, I have a branch for this in the works. The downside is that we'd loose backwards compatibility.

I think it would be a good change nonetheless. Presently it's pretty mysterious how to modify tables.
Of course if there's a way to preserve backwards compatibility, that would be worth doing.

@tarleb tarleb closed this as completed in a0af1b5 Dec 19, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants