TeX's non-breaking space ~ breaking samewords #29

Closed
floriandk opened this issue May 7, 2018 · 5 comments
Comments

@floriandk

floriandk commented May 7, 2018

Hi again, it's me, your nemesis… ;)

I am now running samewords on my huge project, so I have dug up some more problems. I'd really appreciate it if you could iron out the last few glitches.

First:

A ~ inside \edtext makes samewords crash:

\documentclass{article}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
2~dollars and
\edtext{2~\edtext{cent}{%
	\Afootnote{dimes}}
and
even
2
more}{%
	\lemma{2–more}%
	\Afootnote{del.}},
and
some
more.
\pend
\endnumbering

\end{document}

Running samewords on this file fails with:

  File "/usr/local/bin/samewords", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/site-packages/samewords/cli.py", line 107, in main
    print(samewords.core.process_document(filename, procedure))
  File "/usr/local/lib/python3.6/site-packages/samewords/core.py", line 32, in process_document
    for par in chunk_pars(chunk)])
  File "/usr/local/lib/python3.6/site-packages/samewords/core.py", line 32, in <listcomp>
    for par in chunk_pars(chunk)])
  File "/usr/local/lib/python3.6/site-packages/samewords/core.py", line 13, in run_annotation
    words = matcher.annotate()
  File "/usr/local/lib/python3.6/site-packages/samewords/matcher.py", line 31, in annotate
    edtext_end = entry['data'][1] + 1
IndexError: list index out of range

With \edtext{2 \edtext{cent} (a regular space instead of ~) it runs through, but it doesn't catch all occurrences of "2". That is not surprising: I assume the whole string "2~dollars" is handled as one word and is thus never matched against the word "2" of the edtext.
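The assumption above can be illustrated with a minimal Python sketch. It is not samewords' actual tokenizer; the function names `split_on_whitespace` and `split_on_whitespace_and_tie` are made up for this example. It only shows why "2" would go unmatched if ~ is not treated as a word boundary.

```python
import re


def split_on_whitespace(text):
    """Naive tokenization: only whitespace separates words."""
    return text.split()


def split_on_whitespace_and_tie(text):
    """Also treat TeX's tie (~) as a word boundary."""
    return [token for token in re.split(r"[\s~]+", text) if token]


sample = "2~dollars and 2~cent"

# With whitespace-only splitting, "2" never appears as a word of its own.
naive = split_on_whitespace(sample)

# With ~ as an additional boundary, both occurrences of "2" become visible.
tie_aware = split_on_whitespace_and_tie(sample)
```

Under the naive split, "2~dollars" stays one token, so a comparison against the edtext word "2" cannot succeed.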

@stenskjaer
Owner

Don't apologize for contributing to the project! Your feedback is really great.

I'm looking into this now.

@stenskjaer
Owner

stenskjaer commented May 7, 2018

I have pushed a suggested fix for this to the branch "issue-29".
With it, your example gives me this:

\sameword{2}~dollars and
\edtext{\sameword[1]{2}~\edtext{cent}{%
    \Afootnote{dimes}}
and
even
2
\sameword[1]{more}}{%
    \lemma{\sameword{2}–\sameword{more}}%
    \Afootnote{del.}},
and
some
\sameword{more}.

Does that fit your expectation?

Edit: There are no other one-character space characters in TeX, right (aside from " ", of course)? So that excludes, for example, \:, \, etc., as they are more than one character.
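To make the one-character restriction concrete, here is a rough Python sketch that splits on whitespace and ~ while keeping the separators, so the ~ survives in the output. It is an illustration only, not the code on the "issue-29" branch; `SEPARATOR` and `wrap_word` are hypothetical names, and multi-character spacing macros such as \, or \: are deliberately left out.

```python
import re

# One-character separators only: whitespace and the tie "~".
# The capturing group makes re.split() return the separators as well,
# so the text can be reassembled unchanged around the wrapped words.
SEPARATOR = re.compile(r"([\s~]+)")


def wrap_word(text, word):
    """Wrap every standalone occurrence of `word` in \\sameword{...}."""
    parts = SEPARATOR.split(text)
    return "".join(
        "\\sameword{" + part + "}" if part == word else part
        for part in parts
    )


wrapped = wrap_word("2~dollars and 2 more", "2")
```

Because the separators are kept as their own list items, a tie stays a tie and a space stays a space after wrapping.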

@floriandk
Author

Yes, this is the result I'd expect.

But I found something else while trying the branch (haven't checked for other branches yet). Try:

\documentclass{scrartcl}

\usepackage[series={A},nofamiliar,noeledsec,noledgroup,draft]{reledmac}

\begin{document}

\beginnumbering
\pstart
TWO~dollars and
\edtext{TWO \edtext{cent}{%
	\Afootnote{dimes}}
and
even
%
\edtext{TWO}{%
	\Afootnote{4}}
 more}{%
	\lemma{TWO–more}%
	\Afootnote{del.}},
and
%
\edtext{TWO~%
\edtext{dollars}{%
	\Afootnote{cents}}
}{%
	\Afootnote{del.}}
some
more.
\pend
\endnumbering

\end{document}

The result contains some doubled \sameword{\sameword{ wrappers and will not compile with reledmac:

./orig1-SWtestSW.tex:30: Undefined control sequence.
<argument> ... @\this@absline @\the \section@numR 
                                                  @R
l.30 \endnumbering

Removing the doubled \sameword wrappers makes it compile with a correct result.
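For anyone wanting to check a processed file for this pattern, a small Python sketch along these lines might help. It is a standalone check, not part of samewords; `NESTED` and `find_nested_samewords` are hypothetical names, and the example string is constructed for illustration rather than copied from the actual output.

```python
import re

# Matches a \sameword (with an optional [level] argument) whose argument
# starts directly with another \sameword.
NESTED = re.compile(r"\\sameword(?:\[[^\]]*\])?\{\s*\\sameword\b")


def find_nested_samewords(tex):
    """Return the 1-based numbers of lines where a nested \\sameword starts."""
    return [
        number
        for number, line in enumerate(tex.splitlines(), start=1)
        if NESTED.search(line)
    ]


# Constructed example: line 1 has a doubled wrapper, line 2 does not.
example = "\\sameword{\\sameword[1]{TWO}}~dollars\n\\sameword{more}."
hits = find_nested_samewords(example)
```

The check only reports where the doubling occurs; removing the extra wrapper safely would need proper brace matching, which this sketch does not attempt.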

On the other hand

1 TWO⁴ dollars² ] del.

is a bit strange. I'd expect to find

1 TWO dollars² ] del.

in an edition.
(But I keep wondering whether one could construct cases where this would be ambiguous. Can you come up with one?)

Considering this, your markup seems to make sense, and it is reledmac's handling of the markup that should be different.
What do you think?

@floriandk
Author

floriandk commented May 8, 2018

Trying again without the ~s and with version 0.4.2, I see that the new problem is unrelated to ~ and to the new branch. Perhaps you want to move it to a new issue?

@stenskjaer
Owner

Yes, I'm moving this and then I'll close the original issue when the update is pushed.
