Zero Width Space in input text crashes the program #552

Closed
LdBeth opened this issue Jan 1, 2024 · 11 comments

@LdBeth

LdBeth commented Jan 1, 2024

A zero-width space (ZWSP, U+200B), whether it occurs as a literal UTF-8 character or as an XML entity such as &#8203; in the input, crashes Speedata when run with default flags, leaving a cryptic error message:

$ sp
...
> Shipout page 1
Page of type "page" created (2)
Number of rows: 28, number of columns = 19
PlaceObject: Textblock at (1,1) wd/ht: 1/0 in "_page" (p. 2)
PlaceObject: Textblock at (10,1) wd/ht: 10/1 in "_page" (p. 2)
PlaceObject: Textblock at (1,2) wd/ht: 19/2 in "_page" (p. 2)
Selecting node: "entry", mode="", pos=14
PlaceObject: Textblock at (1,4) wd/ht: 19/2 in "_page" (p. 2)
Selecting node: "entry", mode="", pos=15
Total run time: 765.678525ms
signal: abort trap

Tested with the 4.14.0 release and the development version 4.15.19 on Intel-based macOS.

I noticed that the Speedata manual lists the zero-width space as one of the space characters it interprets, and other Unicode space characters do not seem to cause the problem, so this is likely a bug.

@pgundlach
Member

Thank you very much!

For me: here is a layout that crashes

<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en">

    <Record element="data">
        <PlaceObject>
            <Textblock>
                <Paragraph>
                    <Value>&#8203; text</Value>
                </Paragraph>
            </Textblock>
        </PlaceObject>
    </Record>
</Layout>

(4.15.19, sp --dummy)

@pgundlach pgundlach added the Bug label Jan 1, 2024
@pgundlach pgundlach self-assigned this Jan 1, 2024
@pgundlach
Member

@LdBeth I am not sure that I am able to fix the error without more help from you.

  1. Do you use harfbuzz mode when loading the fonts (a global switch in the configuration file, or mode="harfbuzz" with <LoadFont...>)?
  2. Could you provide a small layout file that shows the problem?

I have a fix for the problem I constructed above, but since the error message is different, I am not sure that it will also fix your error.

@LdBeth
Author

LdBeth commented Jan 2, 2024

do you use harfbuzz mode when loading the fonts (a global switch in the configuration file or mode="harfbuzz" with <LoadFont...>?

No. It is a separate issue from the problem here, but it seems harfbuzz mode cannot be used together with font fallback, and the layout file I was using when I discovered this issue relies on font fallback to handle English text mixed with Japanese.

  <LoadFontfile name="Sans" filename="IBMPlexSerif-Regular.ttf">
    <Fallback filename="KleeOne-Regular.ttf" />
  </LoadFontfile>

Actually, the problem cannot be reproduced with harfbuzz mode on.

<Layout
    xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:db='http://docbook.org/ns/docbook'
    xmlns:sd="urn:speedata:2009/publisher/functions/en">
  <Options mainlanguage="en"/>
  <LoadFontfile name="Sans" filename="KleeOne-Regular.ttf"/> <!-- with mode="harfbuzz" the crash won't happen -->
  <DefineFontfamily name="sans" fontsize="9" leading="11">
    <Regular fontface="Sans"/>
  </DefineFontfamily>
  <Hyphenation>Gun-dam</Hyphenation>
  <Hyphenation>as–sas–sin</Hyphenation>
  <DefineTextformat name="title" break-below="no"
                    alignment="leftaligned"/>
  <DefineTextformat name="yr" break-below="no"
                    alignment="rightaligned"/>
  <DefineTextformat name="desc" alignment="leftaligned"
                    indentation="0.2cm"/>
  <Pagetype name="page" test="true()">
    <Margin left="1cm" right="1cm" top="1cm" bottom="1cm"/>
    <AtPageCreation>
      <PlaceObject column="1" row="1">
        <Textblock><Copy-of select="$header"/></Textblock>
      </PlaceObject>
      <PlaceObject column="10" row="1">
        <Textblock>
          <Paragraph>
            <Value>Page: </Value>
            <Value select="sd:current-page()"/>
          </Paragraph>
        </Textblock>
      </PlaceObject>
    </AtPageCreation>
    <PositioningArea name="text">
      <PositioningFrame
          width="9"
          height="{(sd:number-of-rows() div 2) - 2}"
          row="2"
          column="1"/>
      <PositioningFrame
          width="9"
          height="{(sd:number-of-rows() div 2) - 2}"
          row="2"
          column="11"/>
      <PositioningFrame
          width="9"
          height="{(sd:number-of-rows() div 2) - 1}"
          row="{(sd:number-of-rows() div 2) + 2}"
          column="1"/>
      <PositioningFrame
          width="9"
          height="{(sd:number-of-rows() div 2) - 1}"
          row="{(sd:number-of-rows() div 2) + 1}"
          column="11"/>
    </PositioningArea>
  </Pagetype>

  <Record element="document">
    <SetVariable variable="header">
      <Paragraph><Value select="header"/></Paragraph>
    </SetVariable>
    <ProcessNode select="*"/>
  </Record>

  <Record element="entry">
    <Output area="text">
      <Text>
        <Paragraph language="--"
                   fontfamily="sans"
                   textformat="title"><Value select="title"/></Paragraph>
        <Paragraph textformat="yr"><Value>(</Value>
        <Value select="year"/><Value>)</Value></Paragraph>
      </Text>
    </Output>
  </Record>

</Layout>

and

data.xml

<document>
   <entry>
      <title>Chainsaw&#8203; Man</title>
      <year>2022</year>
   </entry>
</document>

Also, the font files seem unrelated to the problem, so you can replace them with fonts available on your system.
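
For comparison, the comment inside the layout above notes that the crash does not happen with mode="harfbuzz". A sketch of that variant of the font-loading line, changing nothing else (and bearing in mind that harfbuzz mode reportedly does not combine with font fallback):

```xml
<!-- Same font as in the layout above; only mode="harfbuzz" is added. -->
<LoadFontfile name="Sans" filename="KleeOne-Regular.ttf" mode="harfbuzz"/>
```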

@LdBeth
Author

LdBeth commented Jan 2, 2024

Thank you very much!

For me: here is a layout that crashes

<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en">

    <Record element="data">
        <PlaceObject>
            <Textblock>
                <Paragraph>
                    <Value>&#8203; text</Value>
                </Paragraph>
            </Textblock>
        </PlaceObject>
    </Record>
</Layout>

(4.15.19, sp --dummy)

I cannot reproduce the crash with this example on 4.15.19. However, I found a related problem: with sp --dummy and the following layout file, the program exits without any indication of an error, but the output PDF contains only the "text" after the ZWSP; the "my a" before it is missing.

<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en">
  <Options mainlanguage="en"/>
  
  <Record element="data">
        <PlaceObject>
            <Textblock>
                <Paragraph>
                    <Value>my a&#8203;text</Value>
                </Paragraph>
            </Textblock>
        </PlaceObject>
    </Record>
</Layout>

@LdBeth
Author

LdBeth commented Jan 2, 2024

Also, I would like to confirm an unexpected behavior. Suppose the directory looks like this:

layout.xml
data.xml
foo/data.xml

The file foo/data.xml is loaded instead of data.xml, which I believe is an edge case not handled in the code.

@pgundlach
Member

do you use harfbuzz mode when loading the fonts (a global switch in the configuration file or mode="harfbuzz" with <LoadFont...>?

No, it is a different issue from the problem here but It seems harfbuzz mode cannot be use together with font fallback.

... thank you very much, I can reproduce the problem and I will provide a fix.

@pgundlach
Member

Also I would like to confirm an unexpected behavior

...

It would help me keep things organized if this were opened as a separate issue. That said, this is "expected", although not well documented: https://doc.speedata.de/publisher/en/basics/fileorganization/#ch-fileorganization

Duplicate entries should give a better warning.

@pgundlach
Member

Minimal layout:

<Layout
  xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">

  <Record element="data">
    <PlaceObject>
      <Textblock>
        <Paragraph>
          <Value select="title" />
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>

data:

<data>
    <title>a&#8203; b</title>
</data>
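
A note on the input: the decimal reference &#8203; and the hexadecimal reference &#x200B; both denote U+200B, so a data file written as below is equivalent XML. It would presumably exercise the same code path, although that variant was not separately tested in this thread:

```xml
<data>
    <!-- &#x200B; is the hexadecimal form of &#8203; (zero-width space, U+200B) -->
    <title>a&#x200B; b</title>
</data>
```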

@LdBeth
Author

LdBeth commented Jan 2, 2024

Yes, I can confirm the minimal layout reproduces the same problem I have.

@pgundlach
Member

A workaround (until I provide a fix) is to set html="off" on Paragraph:

    <Paragraph html="off">
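
Applied to the minimal layout above, the workaround would look roughly like this. It is a sketch in which only the html attribute is added and everything else is copied unchanged:

```xml
<Layout
  xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">

  <Record element="data">
    <PlaceObject>
      <Textblock>
        <!-- html="off" is the only change to the minimal layout -->
        <Paragraph html="off">
          <Value select="title" />
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>
```

Once the fixed version is installed, the attribute should no longer be needed for this purpose.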

@pgundlach
Member

This should be fixed in version 4.15.20 (now online). Thank you very much for your bug report and your patience!

@pgundlach pgundlach mentioned this issue Jan 2, 2024
3 tasks