Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datatable extractor causes duplicated table headers #78

Closed
fabiodrg opened this issue Oct 15, 2020 · 0 comments · Fixed by #81
Closed

Datatable extractor causes duplicated table headers #78

fabiodrg opened this issue Oct 15, 2020 · 0 comments · Fixed by #81
Assignees
Labels
bug Something isn't working extractor Extractor related issues good first issue Good for newcomers

Comments

@fabiodrg
Copy link
Collaborator

fabiodrg commented Oct 15, 2020

In some tables, the Datatable extractor fails to remove the original header before appending a new <thead>

image

The problem is that tables in Sigarra are not consistent from HTML point of view. Some use the tr > th elements, others use the more structural tags thead, tbody, etc.

Note: If I understand it correctly, we need to ensure the presence of a <thead> otherwise the DataTables library won't work properly. The remaining of the table is not relevant. I.e., the body of the table can either be wrapped around a tbody or be listed in multiple tr elements.

So, I think we just need to handle the following cases:

Example with <thead>

<table>
  ... // <caption>, <colgroup> can appear here
  <thead>
    <tr>
    ... // one or more table rows
  </thead>
  ... // one or multiple <tr> or <tbody> and lastly an optional <tfooter>
</table>

Interpretation: If we have a table > thead, then we don't have to do anything! The library should work just fine. We don't care about the table body structure.

Example with <tr>s only

<table>
  ... // <caption>, <colgroup> can appear here
  <tr>
    <th>...</th>
    <th>...</th>
    ...
  </tr>
</table>

Interpretation: If we happen to have table > tr > th, then we need to transform the table. We replace tr by a theader tag and that's it. To be more precise, we need to ensure that all direct childrens of the table row are table headers. Table headers can also be used for rows. I think it's enough to see if the first table row only has th childs, just to avoid iterating over all rows. I mean, does it make sense that a header appears in the middle of the table? Technically it works and browsers render it, but I can't imagine a use-case.

References:

@fabiodrg fabiodrg added the bug Something isn't working label Oct 15, 2020
@fabiodrg fabiodrg self-assigned this Oct 15, 2020
@fabiodrg fabiodrg added extractor Extractor related issues good first issue Good for newcomers labels Oct 15, 2020
fabiodrg added a commit that referenced this issue Oct 17, 2020
The previous code would ALWAYS insert a <thead> tag. However, some tables already have it and the final was duplicated headers. This fix checks wether a <thead> is present or not, and in case it's missing it makes a more robust search for a possible table header (which is not mandatory). If one is found, then is wrapped around a <thead> and inserted in the original table

Closes #78
fabiodrg added a commit that referenced this issue Mar 6, 2021
The previous code would ALWAYS insert a <thead> tag. However, some tables already have it and the final was duplicated headers. This fix checks wether a <thead> is present or not, and in case it's missing it makes a more robust search for a possible table header (which is not mandatory). If one is found, then is wrapped around a <thead> and inserted in the original table

Closes #78
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working extractor Extractor related issues good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant