Problem with text offset and Linebreak #52

Closed
kosloot opened this issue Jul 3, 2018 · 3 comments

kosloot (Collaborator) commented Jul 3, 2018

example:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia2html.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="WR-P-E-J-0000000001" generator="libfolia-v1.14" version="1.5">
  <text xml:id="WR-P-E-J-0000000001.text">
    <div>
      <head xml:id="sandbox.3">
        <t>De <br/><br/><br/><br/>FoLiA developers zijn:</t>
        <str xml:id="sandbox.3.str">
          <t offset="7">FoLiA</t>
        </str>
      </head>
    </div>
  </text>
</FoLiA>

C++'s libfolia accepts this: it counts every <br/> as 1 character, so the offset of FoLiA is 7.

Python's folia.py rejects this: it ignores all <br/> elements and requires an offset of 3.

I think libfolia is right here, but this is very tricky indeed.
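
To make the two readings concrete, here is a minimal sketch using only Python's standard `xml.etree.ElementTree`. It is not how libfolia or folia.py compute text internally; the helper `flatten` and its `br_as` parameter are hypothetical names chosen for illustration, and the `<t>` fragment is parsed without the FoLiA namespace to keep things short.

```python
import xml.etree.ElementTree as ET

def flatten(t_elem, br_as):
    """Concatenate the text of a <t> element, replacing each <br/> by br_as."""
    parts = [t_elem.text or ""]
    for child in t_elem:
        if child.tag == "br":
            parts.append(br_as)
        # Text that follows a child element is stored in its .tail
        parts.append(child.tail or "")
    return "".join(parts)

t = ET.fromstring("<t>De <br/><br/><br/><br/>FoLiA developers zijn:</t>")

counted = flatten(t, "\n")  # reading 1: every <br/> counts as one character
ignored = flatten(t, "")    # reading 2: every <br/> is dropped

print(counted.find("FoLiA"))  # 7
print(ignored.find("FoLiA"))  # 3
```

The same input yields offset 7 or 3 depending only on how `<br/>` is flattened, which is exactly the disagreement described above.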

proycon (Owner) commented Jul 3, 2018

I'd like to attach some related issues to this one, as I've seen them in practice:

<t>De
FoLiA developers zijn:</t>

So there is a newline in the XML, but not an explicit linebreak, meaning there is no newline as far as FoLiA is concerned. It is still whitespace, though. So I think this is:

  • <t>De FoLiA developers zijn:</t> (offset 3)

And not (I've seen this happen):

  • <t>DeFoLiA developers zijn:</t> (offset 2)

And what about?

<t>De\s\s\s\s\s\s\s
FoLiA developers zijn:</t>

I'm not entirely sure how we handle that currently, but I'd say it's still offset 3.

I do agree there is a good argument to consider the offset to be 4 in your above case of an explicit linebreak.
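
The whitespace cases above can be sketched in a few lines of Python. This only illustrates the two normalisation strategies being contrasted and is not taken from either library; `collapse` and `drop_newlines` are hypothetical helper names, and the padded string assumes the 7 spaces written as `\s` in the example.

```python
import re

def collapse(text):
    """Replace every run of whitespace (spaces, newlines) by a single space."""
    return re.sub(r"\s+", " ", text)

def drop_newlines(text):
    """Remove newlines outright, without leaving a space behind."""
    return text.replace("\n", "")

raw = "De\nFoLiA developers zijn:"
padded = "De" + " " * 7 + "\nFoLiA developers zijn:"

print(collapse(raw).find("FoLiA"))       # 3
print(collapse(padded).find("FoLiA"))    # 3
print(drop_newlines(raw).find("FoLiA"))  # 2
```

Collapsing whitespace gives offset 3 in both cases, while simply dropping the newline glues the words together and gives the offset 2 reading.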

proycon added the bug label Jul 3, 2018

kosloot (Collaborator, Author) commented Jul 3, 2018

Well, libfolia's folialint happily accepts this:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia2html.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="WR-P-E-J-0000000001" generator="libfolia-v1.14" version="1.5">
  <metadata type="native">
    <annotations/>
  </metadata>
  <text xml:id="WR-P-E-J-0000000001.text">
    <div>
      <head xml:id="sandbox.3">
        <t>De
FoLiA developers zijn:</t>
        <str xml:id="sandbox.3.str">
          <t offset="3">FoLiA</t>
        </str>
      </head>
    </div>
  </text>
</FoLiA>

Also with offset 3.

Considering:

And what about?

<t>De\s\s\s\s\s\s\s
FoLiA developers zijn:</t>

Regarding this (where there are 6 spaces after 'De'):

<?xml-stylesheet type="text/xsl" href="folia2html.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="WR-P-E-J-0000000001" generator="libfolia-v1.14" version="1.5">
  <metadata type="native">
    <annotations/>
  </metadata>
  <text xml:id="WR-P-E-J-0000000001.text">
    <div>
      <head xml:id="sandbox.3">
        <t>De      
FoLiA developers zijn:</t>
        <str xml:id="sandbox.3.str">
          <t offset="9">FoLiA</t>
        </str>
      </head>
    </div>
  </text>
</FoLiA>

folialint is really happy ...
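
Both accepted offsets are what you get when every raw character is counted, including each space and the newline. The check below is a small Python sketch of that reading, not folialint's actual code; `offset_matches` is a hypothetical helper.

```python
def offset_matches(parent_text, substring, offset):
    """True if substring occurs in parent_text exactly at the given offset."""
    return parent_text[offset:offset + len(substring)] == substring

# Raw text of the two <t> elements, with no whitespace normalisation at all
single = "De\nFoLiA developers zijn:"
six_spaces = "De" + " " * 6 + "\nFoLiA developers zijn:"

print(offset_matches(single, "FoLiA", 3))      # True  (2 letters + newline)
print(offset_matches(six_spaces, "FoLiA", 9))  # True  (2 letters + 6 spaces + newline)
print(offset_matches(six_spaces, "FoLiA", 3))  # False (the normalised offset no longer fits)
```

So the results are consistent with raw character counting, although that alone does not prove how folialint arrived at them.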

kosloot (Collaborator, Author) commented Feb 24, 2020

So this seems to have been solved a long time ago.

kosloot closed this as completed Feb 24, 2020