Description
Explain the problem.
I have a Lua filter that prints the identifier of most of the element types found in an HTML file:
`parse-identifiers-issue.lua`
function parse_identifier(elem, elem_type)
  -- Discard elements with no id
  if elem.identifier == '' or elem.identifier == nil then
    return elem
  end
  print('[DEBUG] elem.identifier : ' .. elem.identifier) -- DEBUGGING
  print('[DEBUG] elem_type : ' .. elem_type) -- DEBUGGING
  return elem
end
return {
  { CodeBlock = function(e) return parse_identifier(e, "CodeBlock") end },
  { Div = function(e) return parse_identifier(e, "Div") end },
  { Figure = function(e) return parse_identifier(e, "Figure") end },
  { Header = function(e) return parse_identifier(e, "Header") end },
  { Table = function(e) return parse_identifier(e, "Table") end },
  { Code = function(e) return parse_identifier(e, "Code") end },
  { Image = function(e) return parse_identifier(e, "Image") end },
  { Link = function(e) return parse_identifier(e, "Link") end },
  { Span = function(e) return parse_identifier(e, "Span") end },
  { Cell = function(e) return parse_identifier(e, "Cell") end },
  { TableFoot = function(e) return parse_identifier(e, "TableFoot") end },
  { TableHead = function(e) return parse_identifier(e, "TableHead") end },
  { Para = function(e) return parse_identifier(e, "Para") end },
  { BlockQuote = function(e) return parse_identifier(e, "BlockQuote") end },
  { BulletList = function(e) return parse_identifier(e, "BulletList") end },
  { OrderedList = function(e) return parse_identifier(e, "OrderedList") end }
}
I ran it against the following web page:
curl -s https://www.man7.org/linux/man-pages/man0/aio.h.0p.html | pandoc -f html -t plain -o /dev/null -L ./parse-identifiers-issue.lua
It produces the following output:
[DEBUG] elem.identifier : aio.h0p-linux-manual-page
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : prolog-top
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : name-top
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : synopsis-top
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : description-top
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : application-usage-top
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : rationale-top
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : future-directions-top
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : see-also-top
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : copyright-top
[DEBUG] elem_type : Header
[DEBUG] elem.identifier : PROLOG
[DEBUG] elem_type : Link
[DEBUG] elem.identifier : NAME
[DEBUG] elem_type : Link
[DEBUG] elem.identifier : SYNOPSIS
[DEBUG] elem_type : Link
[DEBUG] elem.identifier : DESCRIPTION
[DEBUG] elem_type : Link
[DEBUG] elem.identifier : APPLICATION_USAGE
[DEBUG] elem_type : Link
[DEBUG] elem.identifier : RATIONALE
[DEBUG] elem_type : Link
[DEBUG] elem.identifier : FUTURE_DIRECTIONS
[DEBUG] elem_type : Link
[DEBUG] elem.identifier : SEE_ALSO
[DEBUG] elem_type : Link
[DEBUG] elem.identifier : COPYRIGHT
[DEBUG] elem_type : Link
[DEBUG] elem.identifier : top_of_page
[DEBUG] elem_type : Span
Where is pandoc actually getting identifiers such as prolog-top, name-top, etc. from?
All I can see in the .html file is the following:
<h2><a id="PROLOG" href="#PROLOG"></a>PROLOG <a href="#top_of_page"><span class="top-link">top</span></a></h2><pre>
<h2><a id="NAME" href="#NAME"></a>NAME <a href="#top_of_page"><span class="top-link">top</span></a></h2><pre>
What is the point of generating them that way? Can someone break down how pandoc computed these identifiers?
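For reference, the pandoc manual documents an auto_identifiers extension that gives a heading without an explicit id an identifier derived from its text. If that extension is what produces the values above (an assumption, not something confirmed in this report), the derivation can be sketched roughly as below. The helper `auto_identifier` is hypothetical, handles ASCII text only, and is not pandoc's actual implementation.

```lua
-- Hypothetical helper: approximates the auto_identifiers rules described
-- in the pandoc manual, for ASCII text only. Not pandoc's real code.
local function auto_identifier(text)
  local s = text:lower()
  -- Keep alphanumerics, whitespace, underscores, hyphens, and periods.
  s = s:gsub("[^%w%s_%-%.]", "")
  -- Join the whitespace-separated words with hyphens.
  local words = {}
  for word in s:gmatch("%S+") do
    words[#words + 1] = word
  end
  s = table.concat(words, "-")
  -- Drop everything before the first letter.
  s = s:gsub("^[^%a]+", "")
  -- Fall back to "section" when nothing is left.
  if s == "" then
    return "section"
  end
  return s
end

print(auto_identifier("PROLOG top"))            -- prolog-top
print(auto_identifier("APPLICATION USAGE top")) -- application-usage-top
```

Under this reading, the `<h2>` has no id of its own, so its full text `PROLOG top` (the section name plus the text of the "top" link) becomes `prolog-top`, while `PROLOG` is the id of the inner `<a>` element and is therefore reported as a Link.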
Pandoc version?
$ pandoc --version
pandoc 3.1.11.1
Features: -server +lua
Scripting engine: Lua 5.4
User data directory: /home/fakuve/.local/share/pandoc
Copyright (C) 2006-2023 John MacFarlane. Web: https://pandoc.org
This is free software; see the source for copying conditions. There is no
warranty, not even for merchantability or fitness for a particular purpose.
I am using Debian Trixie, with pandoc compiled from source on an x64 machine.