DocX Reader Bug: <w:instrText>HYPERLINK \l "bm1"</w:instrText> parsed incorrectly #9246

Closed
Aspvik opened this issue Dec 11, 2023 · 7 comments

Aspvik commented Dec 11, 2023

Hi!
I have been trying to figure out why some links are converted to <a href="\l">... when converting from docx (legacy MS Word documents) to EPUB3. I have narrowed it down to the following:

Does not work:
<w:instrText>HYPERLINK \l "bm1"</w:instrText>

Works:
<w:instrText>HYPERLINK "#bm1"</w:instrText>

The <w:hyperlink> element also works.

Relevant code:

hyperlink :: Parser URL

Best regards
Rune

Owner

jgm commented Dec 11, 2023

What does

<w:instrText>HYPERLINK  \l "bm1"</w:instrText>

mean? Can you point to documentation? Does \l mean "this is an internal link"?

Author

Aspvik commented Dec 11, 2023

Yes, <w:instrText>HYPERLINK \l "bm1"</w:instrText> is an internal link to a bookmark named bm1.

\l means location. It can appear by itself, as in my example (a link within the same file), but I believe it can also follow a remote target as a second argument, like so: <w:instrText>HYPERLINK "http://example.com" \l "hash"</w:instrText>, where \l takes the place of #.
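
The reading of \l described above can be written down as a small function. This is a minimal sketch in Haskell (pandoc's implementation language), not pandoc's actual code; the name `hyperlinkUrl` and the use of `Maybe` for both arguments are assumptions made for illustration.

```haskell
-- Hypothetical helper, not taken from pandoc.
-- First argument: the optional quoted target of the HYPERLINK field.
-- Second argument: the optional argument of the \l (location) switch.
hyperlinkUrl :: Maybe String -> Maybe String -> String
hyperlinkUrl target location =
  maybe "" id target ++ maybe "" ('#' :) location
```

Under these assumptions, `hyperlinkUrl Nothing (Just "bm1")` evaluates to `"#bm1"`, and `hyperlinkUrl (Just "http://example.com") (Just "hash")` evaluates to `"http://example.com#hash"`.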

I have not been able to find documentation in any official source.

Snippet from a docx file in which the link works fine in MS Word:
Link paragraph
<w:p w14:paraId="1582B871" w14:textId="70888DC0" w:rsidR="00BB31C9" w:rsidRPr="0097371A" w:rsidRDefault="00C33FD4">
  <w:pPr>
    <w:pStyle w:val="NormalIndent"/>
    <w:spacing w:after="240"/>
    <w:rPr>
      <w:lang w:val="nb-NO"/>
    </w:rPr>
  </w:pPr>
  <w:bookmarkStart w:id="71" w:name="COPNO-0245maaling-93"/>
  <w:bookmarkStart w:id="72" w:name="COPNO-0245maaling-94"/>
  <w:bookmarkEnd w:id="71"/>
  <w:bookmarkEnd w:id="72"/>
  <w:r w:rsidRPr="0097371A">
    <w:rPr>
      <w:lang w:val="nb-NO"/>
    </w:rPr>
    <w:t xml:space="preserve">De forskjellige kalibreringsfrekvensene er oppgitt under </w:t>
  </w:r>
  <w:r>
    <w:fldChar w:fldCharType="begin"/>
  </w:r>
  <w:r w:rsidRPr="00901C92">
    <w:rPr>
      <w:lang w:val="nb-NO"/>
    </w:rPr>
    <w:instrText>HYPERLINK \l "CEGEIDCF"</w:instrText>
  </w:r>
  <w:r>
    <w:fldChar w:fldCharType="separate"/>
  </w:r>
  <w:r w:rsidRPr="0097371A">
    <w:rPr>
      <w:rStyle w:val="Hyperlink"/>
      <w:color w:val="FF0000"/>
      <w:lang w:val="nb-NO"/>
    </w:rPr>
    <w:t>Vedlegg N - Kalibreringsintervall</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rStyle w:val="Hyperlink"/>
      <w:color w:val="FF0000"/>
      <w:lang w:val="nb-NO"/>
    </w:rPr>
    <w:fldChar w:fldCharType="end"/>
  </w:r>
  <w:r w:rsidRPr="0097371A">
    <w:rPr>
      <w:color w:val="000000"/>
      <w:lang w:val="nb-NO"/>
    </w:rPr>
    <w:t>.</w:t>
  </w:r>
</w:p>

This should have created an internal link to #CEGEIDCF; instead, it creates a link to "\l".

Pandoc v. 3.1.9

Owner

jgm commented Dec 12, 2023

Can you upload a small docx that uses this, so I can test?

Author

Aspvik commented Dec 12, 2023

I just tested and confirmed that the issue occurs when there is no argument in front of the location switch (\l).

This works and is parsed as "http://example.com#hash":
<w:instrText>HYPERLINK "http://example.com" \l "hash"</w:instrText>

This does not work, because the expected first parameter ("http://example.com") is missing:
<w:instrText>HYPERLINK \l "hash"</w:instrText>

I think the first argument should be optional. However, I have not been able to find any official documentation that supports this.
Please find attached a docx with an example link from page 3 to Heading 1.
doc.docx

An additional gotcha: I have seen cases with multiple (two) whitespace characters in front of the \l switch. Example: <w:instrText>HYPERLINK  \l "hash"</w:instrText>
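
The cases in this comment (a target followed by \l, \l on its own, and extra whitespace before \l) can all be handled by making the target optional and skipping whitespace between tokens. What follows is only a sketch under stated assumptions: it uses `ReadP` from `base` rather than the parser in pandoc's docx reader, `quoted`, `hyperlinkField` and `parseHyperlink` are hypothetical names, quoted arguments are assumed to contain no escaped quotes, and field switches other than \l are not handled.

```haskell
import Text.ParserCombinators.ReadP

-- A double-quoted argument such as "bm1" (no escape handling).
quoted :: ReadP String
quoted = between (char '"') (char '"') (munch (/= '"'))

-- HYPERLINK ["target"] [\l "location"], with any amount of whitespace
-- between the tokens. Both the target and the \l switch are optional.
hyperlinkField :: ReadP String
hyperlinkField = do
  skipSpaces
  _ <- string "HYPERLINK"
  target <- option "" (skipSpaces *> quoted)
  location <- option "" (skipSpaces *> string "\\l" *> skipSpaces *> fmap ('#' :) quoted)
  skipSpaces
  eof
  return (target ++ location)

-- Accept the instruction only if there is exactly one complete parse.
parseHyperlink :: String -> Maybe String
parseHyperlink s = case readP_to_S hyperlinkField s of
  [(url, "")] -> Just url
  _           -> Nothing
```

With this sketch, `parseHyperlink "HYPERLINK  \\l \"hash\""` gives `Just "#hash"`, and `parseHyperlink "HYPERLINK \"http://example.com\" \\l \"hash\""` gives `Just "http://example.com#hash"`. Because `option` in `ReadP` explores both branches, the final `eof` is what rules out partial parses.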

Author

Aspvik commented Dec 18, 2023

Can I please get an update on the status of this issue? Will it be fixed or ignored?
Thanks

Owner

jgm commented Dec 19, 2023

Just a friendly remark: this is the kind of comment that can demoralize open-source maintainers, who are working on a volunteer basis. We are not the service desk of a company whose product you have purchased. If there were an update on this issue, it would have been posted here, on the issue tracker. Will the issue be ignored? No, it is not being ignored. Will it have as high a priority for your volunteer maintainer as it does for you? That's unlikely.

jgm closed this as completed in 2f6a66f Dec 19, 2023

Author

Aspvik commented Dec 19, 2023

Thank you, and sorry if I came across as rude. I am fully aware of the situation you are describing.
