Merge pull request #153 from osamuaoki/master
enhance debug output, fix docbook behavior, allow separate input for POT
mquinson committed Oct 17, 2018
2 parents 062957b + efb9a87 commit 7f3ff73
Showing 29 changed files with 1,218 additions and 185 deletions.
26 changes: 26 additions & 0 deletions NEWS
@@ -14,6 +14,32 @@ AsciiDoc:

Translations:
* Updated: German, thanks Helge Kreutzmann.

TransTractor:
* Ensure that lines are split before the addendum operation without losing or
adding newlines. With this change, addendum behavior is more intuitive.
(Debian's #518218, GitHub's #147, #153)

Xml, Docbook:
* Document XML tag behavior customization, with an example to help with
use-case-specific customization. (Debian's #515763)
* Enhance the debug output to help people understand what exactly is happening
inside po4a.
* Extensive POD and code comment additions and updates.

Sgml:
* Avoid deprecated unescaped left brace in regex to get ready for Perl 5.32.
(Debian's #903735)

po4a tool:
* Add the pot_in feature to support a secondary master file as the base of
POT/PO file generation.

Test:
* Add XML test cases with po4a including addendum, tag customization, and
pot_in feature.

=======================================================================
___ ____ _ _
__ __/ _ \ | ___|| || |
\ \ / / | | ||___ \| || |_ Back on track. Again.
108 changes: 65 additions & 43 deletions doc/po4a.7.pod
@@ -552,10 +552,21 @@ header indicating where in the produced document they should be placed. The
rest of the addendum file will be added verbatim at the determined position of
the resulting document.

The header has a pretty rigid syntax: It must begin with the string
B<PO4A-HEADER:>, followed by a semi-colon (B<;>) separated list of
I<key>B<=>I<value> fields. White spaces ARE important. Note that you cannot use
the semi-colon char (B<;>) in the value, and that quoting it doesn't help.
The header line, which specifies the context, has a pretty rigid syntax: It must
begin with the string B<PO4A-HEADER:>, followed by a semi-colon (B<;>) separated
list of I<key>B<=>I<value> fields. White spaces ARE important. Note that you
cannot use the semi-colon char (B<;>) in the value, and that quoting it doesn't
help. Optionally, spaces (B< >) may be inserted before I<key> for readability.

Although this context search may be considered to operate roughly on each line
of the translated document, it actually operates on the internal data string of
the translated document. This internal data string may be a text spanning a
paragraph containing multiple lines or may be an XML tag by itself. The
exact I<insertion point> of the addendum must be before or after the internal
data string and cannot be within the internal data string.

The actual internal data string of the translated document can be visualized by
executing po4a in debug mode.

Again, it sounds scary, but the examples given below should help you to find
how to write the header line you need. To illustrate the discussion, assume
@@ -566,53 +577,56 @@ Here are the possible header keys:

=over 4

=item B<position> (mandatory)
=item B<mode> (mandatory)

a Perl regexp. The addendum will be placed near the line matching this regexp.
Note that we're speaking about the translated document here, not the
original. If more than a line match this expression (or none), the addition
will fail. It is indeed better to report an error than inserting the
addendum at the wrong location.
It can be either the string B<before> or B<after>.

This line is called I<position point> in the following. The point where the
addendum is added is called I<insertion point>. Those two points are near one
from another, but not equal. For example, if you want to insert a new section,
it is easier to put the I<position point> on the title of the preceding section
and explain po4a where the section ends (remember that I<position point> is
given by a regexp which should match a unique line).
If B<mode=before>, the I<insertion point> is determined by a one-step regex
match specified by the B<position> argument regex. The I<insertion point> is
immediately before the uniquely matched internal data string of the translated
document.

The localization of the I<insertion point> with regard to the I<position point>
is controlled by the B<mode>, B<beginboundary> and B<endboundary> fields, as
explained below.
If B<mode=after>, the I<insertion point> is determined by a two-step regex
match: first by the B<position> argument regex, and then by the
B<beginboundary> or B<endboundary> argument regex.

In our case, we would have:
Since the assumed case may contain multiple sections, let's use the two-step
approach.

position=<title>About this document</title>
mode=after

=item B<position> (mandatory)

=item B<mode> (mandatory)
A Perl regexp for specifying the context.

It can be either the string B<before> or B<after>, specifying the position of
the addendum, relative to the I<position point>. In case B<before> is given
the I<insertion point> will be placed exactly before the I<position point>. The
B<after> behaviour is detailed below.
If more than one internal data string matches this expression (or none does),
the search for the I<insertion point> and the addition of the addendum will
fail. It is indeed better to report an error than to insert the addendum at the
wrong location.

Since we want the new section to be placed below the one we are matching, we
have:
If B<mode=before>, the I<insertion point> is specified to be immediately before
the internal data string uniquely matching the B<position> argument regex.

mode=after
If B<mode=after>, the search for the I<insertion point> is narrowed down to the
data after the internal data string uniquely matching the B<position> argument
regex. The exact I<insertion point> is further specified by the
B<beginboundary> or B<endboundary>.

In our case, we need to skip several preceding sections by narrowing down the
search using the section title string.

position=About this document

(In reality, you need to use the translated section title string here,
instead.)

=item B<beginboundary> (used only when B<mode=after>, and mandatory in that case)

=item B<endboundary> (idem)

regexp matching the end of the section after which the addendum goes.

When B<mode=after>, the I<insertion point> is after the I<position point>, but
not directly after! It is placed at the end of the section beginning at the
I<position point>, i.e., after or before the line matched by the
I<???>B<boundary> argument, depending on whether you used B<beginboundary> or
B<endboundary>.
A second Perl regexp required only when B<mode=after>. The addendum will be
placed immediately before or after the first internal data string matching the
B<beginboundary> or B<endboundary> argument regexp, respectively.

In our case, we can choose to indicate the end of the section we match by
adding:
@@ -650,16 +664,15 @@ document, you can use either of those header lines:
PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>


=item
If you want to add something after the following nroff section:

.SH "AUTHORS"

you should put a B<position> matching this line, and a B<beginboundary>
matching the beginning of the next section (i.e., B<^\.SH>). The addendum will
then be added B<after> the I<position point> and immediately B<before> the
first line matching the B<beginboundary>. That is to say:
You should select the two-step approach by setting B<mode=after>. Then you
should narrow down the search to the lines after B<AUTHORS> with the
B<position> argument regex. Then, you should match the beginning of the next
section (i.e., B<^\.SH>) with the B<beginboundary> argument regex. That is to say:

PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=\.SH

@@ -677,7 +690,7 @@ it's not unique), and give an B<endboundary> matching nothing. Don't use simple
strings here like B<"EOF">, but prefer those which have less chance to be in
your document.

PO4A-HEADER:mode=after;position=<title>About</title>;beginboundary=FakePo4aBoundary
PO4A-HEADER:mode=after;position=About this document;beginboundary=FakePo4aBoundary

=back
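
The header keys described in this hunk can be sketched in a few lines of Perl. This is a minimal illustration under stated assumptions, not po4a's actual implementation: the helper names parse_header and insertion_index and the sample @doc array are hypothetical, and appending at the end when no boundary matches is an assumption modelled on the FakePo4aBoundary trick described above. Each element of @doc stands for one internal data string.

```perl
use strict;
use warnings;

# Hypothetical helper: split a PO4A-HEADER line into a key => value hash.
sub parse_header {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^PO4A-HEADER:// or die "not a PO4A-HEADER line\n";
    my %field;
    for my $pair (split /;/, $line) {
        $pair =~ s/^ +//;                        # optional spaces before the key
        my ($key, $value) = split /=/, $pair, 2; # the value may itself contain '='
        die "bad field '$pair'\n" unless defined $value;
        $field{$key} = $value;
    }
    return %field;
}

# Hypothetical helper: return the index before which the addendum is spliced.
sub insertion_index {
    my ($chunks, %h) = @_;

    # Step one: the position regex must match exactly one data string.
    my @hits = grep { $chunks->[$_] =~ /$h{position}/m } 0 .. $#$chunks;
    die "position must match exactly one data string\n" unless @hits == 1;
    return $hits[0] if $h{mode} eq 'before';

    # Step two (mode=after): look for the first boundary match after the hit.
    my ($kind) = grep { defined $h{$_} } qw(beginboundary endboundary);
    die "mode=after needs beginboundary or endboundary\n" unless defined $kind;
    for my $i ($hits[0] + 1 .. $#$chunks) {
        next unless $chunks->[$i] =~ /$h{$kind}/m;
        # beginboundary: insert before the match; endboundary: after it.
        return $kind eq 'beginboundary' ? $i : $i + 1;
    }
    return scalar @$chunks;    # assumption: no boundary match means "append"
}

my @doc = (
    '<section>', '<title>About this document</title>', '<para>text</para>',
    '</section>', '<section>', '<title>Other</title>', '</section>',
);
my %h = parse_header(
    'PO4A-HEADER: mode=after; position=About this document; endboundary=</section>');
my $idx = insertion_index(\@doc, %h);
splice @doc, $idx, 0, '<section>ADDENDUM</section>';
```

With this sample data, the endboundary=</section> and beginboundary=<section> variants of the header select the same insertion point, which is why the documentation offers either line for this case.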

@@ -796,7 +809,9 @@ the input PO). Here is a graphical representation of this:

This little bone is the core of all the po4a architecture. If you omit the
input PO and the output document, you get B<po4a-gettextize>. If you provide
both input and disregard the output PO, you get B<po4a-translate>.
both input and disregard the output PO, you get B<po4a-translate>. B<po4a>
calls TransTractor twice and calls B<msgmerge -U> between these TransTractor
invocations to provide a one-stop solution with a single configuration file.

TransTractor::parse() is a virtual function implemented by each module. Here
is a little example to show you how it works. It parses a list of paragraphs,
@@ -823,6 +838,13 @@ each of them beginning with B<E<lt>pE<gt>>.
19 }
20 }

On lines 6 and 7, we encounter C<shiftline()> and C<unshiftline()>. These let
you read a line from, and push a line back onto, the head of the internal input
data stream of the master document, together with its reference. Here, the
reference is a string of the form C<< $filename:$linenum >>. Remember that Perl
arrays are one-dimensional, so the code handling the lines of the internal
input data stream is a bit cryptic.
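
The read and unread behavior described here can be sketched with a plain flat array. This is a simplified stand-in written for illustration, not the TransTractor code itself: the @doc_in queue, the free-standing shiftline and unshiftline functions, and the demo.txt references are assumptions. Lines and their references alternate in one array, which is what makes the handling look cryptic.

```perl
use strict;
use warnings;

# Flat queue of the master document: line, reference, line, reference, ...
my @doc_in = (
    '<p>first',  'demo.txt:1',
    'more text', 'demo.txt:2',
    '<p>second', 'demo.txt:3',
);

# Take the next (line, reference) pair off the head of the queue.
sub shiftline {
    my $line = shift @doc_in;
    my $ref  = shift @doc_in;
    return ($line, $ref);
}

# Put a pair back at the head so that the next shiftline() returns it again.
sub unshiftline {
    my ($line, $ref) = @_;
    unshift @doc_in, $line, $ref;
    return;
}

my @paragraphs;                      # each entry: [ text, reference of first line ]
my ($paragraph, $pref) = ('', '');
while (1) {
    my ($line, $lref) = shiftline();
    last unless defined $line;       # queue exhausted
    if ($line =~ /<p>/ && length $paragraph) {
        unshiftline($line, $lref);   # this line belongs to the next paragraph
        push @paragraphs, [ $paragraph, $pref ];
        ($paragraph, $pref) = ('', '');
        next;
    }
    $pref = $lref unless length $paragraph;   # remember where the paragraph starts
    $paragraph .= "$line\n";
}
push @paragraphs, [ $paragraph, $pref ] if length $paragraph;
```

Pushing the line back rather than carrying it in a separate variable keeps the loop body uniform: every paragraph starts from a fresh shiftline() call.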

On line 6, we encounter B<E<lt>pE<gt>> for the second time. That's the signal
of the next paragraph. We should thus put the just obtained line back into
the original document (line 7) and push the paragraph built so far into the
2 changes: 2 additions & 0 deletions lib/Locale/Po4a/Dia.pm
@@ -83,6 +83,7 @@ use 5.006;
use strict;
use warnings;

use Locale::Po4a::Common;
use Locale::Po4a::Xml;

use vars qw(@ISA);
@@ -95,6 +96,7 @@ sub initialize {
$self->SUPER::initialize(%options);
$self->{options}{'nostrip'}=1;
$self->{options}{'_default_translated'}.=' <dia:string>';
print wrap_mod("po4a::dia", dgettext("po4a", "Call treat_options")) if $self->{options}{'debug'};
$self->treat_options;
}

45 changes: 29 additions & 16 deletions lib/Locale/Po4a/Docbook.pm
@@ -53,6 +53,28 @@ the file inclusion entities, but you can translate most of those files alone
(except the typical entities files), and it's usually better to maintain them
separated.
=head1 OVERRIDE THE DEFAULT BEHAVIOR WITH COMMAND LINE OPTIONS
The default behavior of the system-provided modules is set to be on the safe side.
For example, the default for the B<< <author> >> tag assumes that it appears under
B<< <para> >>. But you may be using it only under B<< <bookinfo> >>. In this
case, you may want to translate it independently for each author.
If you don't like the default behavior of the xml module and its derivative
modules, you can provide command line options to change their behavior. For
example, you can add the following to the po4a configuration file:
opt:"-k 0 -o nodefault=\"<bookinfo> <author>\" \
-o break=\"<bookinfo> <author>\" \
-o untranslated=\"<bookinfo>\" \
-o translated=\"<author>\""
This overrides the default behavior for B<< <bookinfo> >> and B<< <author> >>,
sets B<< <bookinfo> >> and B<< <author> >> to break the input data stream on these
tags, sets B<< <bookinfo> >> not to translate its tagged content, and sets B<<
<author> >> to translate its tagged content.
=head1 SEE ALSO
L<Locale::Po4a::TransTractor(3pm)>, L<Locale::Po4a::Xml(3pm)>, L<po4a(7)|po4a.7>
@@ -77,6 +99,7 @@ use 5.006;
use strict;
use warnings;

use Locale::Po4a::Common;
use Locale::Po4a::Xml;

use vars qw(@ISA);
@@ -393,9 +416,7 @@ sub initialize {
# classsynopsis; does not contain text; may be in a para
# NOTE: It may contain a classsynopsisinfo, which should be
# verbatim
# XXX: since it is in untranslated class, does the W flag takes
# effect?
$self->{options}{'_default_untranslated'} .= " W<classsynopsis>";
$self->{options}{'_default_untranslated'} .= " <classsynopsis>";
$self->{options}{'_default_placeholder'} .= " <classsynopsis>";

# classsynopsisinfo; contains text;
@@ -404,10 +425,7 @@ sub initialize {
$self->{options}{'_default_inline'} .= " <classsynopsisinfo>";

# cmdsynopsis; does not contain text; may be in a para
# NOTE: It may be clearer as a verbatim block
# XXX: since it is in untranslated class, does the W flag takes
# effect? => not completely. Rewrap afterward?
$self->{options}{'_default_untranslated'} .= " W<cmdsynopsis>";
$self->{options}{'_default_untranslated'} .= " <cmdsynopsis>";
$self->{options}{'_default_placeholder'} .= " <cmdsynopsis>";

# co; does not contain text; Formatted inline
@@ -507,10 +525,7 @@ sub initialize {
$self->{options}{'_default_break'} .= " <constraintdef>";

# constructorsynopsis; does not contain text; may be in a para
# NOTE: It may be clearer as a verbatim block
# XXX: since it is in untranslated class, does the W flag takes
# effect?
$self->{options}{'_default_untranslated'} .= " W<constructorsynopsis>";
$self->{options}{'_default_untranslated'} .= " <constructorsynopsis>";
$self->{options}{'_default_placeholder'} .= " <constructorsynopsis>";

# contractnum; contains text; Formatted inline or as a displayed block
@@ -575,10 +590,7 @@ sub initialize {
$self->{options}{'_default_break'} .= " <dedication>";

# destructorsynopsis; does not contain text; may be in a para
# NOTE: It may be clearer as a verbatim block
# XXX: since it is in untranslated class, does the W flag takes
# effect?
$self->{options}{'_default_untranslated'} .= " W<destructorsynopsis>";
$self->{options}{'_default_untranslated'} .= " <destructorsynopsis>";
$self->{options}{'_default_placeholder'} .= " <destructorsynopsis>";

# docinfo; does not contain text; removed in v4.0
@@ -775,7 +787,7 @@ sub initialize {
$self->{options}{'_default_placeholder'} .= " <graphicco>";

# group; does not contain text; Formatted inline
$self->{options}{'_default_untranslated'} .= " W<group>";
$self->{options}{'_default_untranslated'} .= " <group>";
$self->{options}{'_default_inline'} .= " <group>";

# guibutton; contains text; Formatted inline
@@ -2038,5 +2050,6 @@ sub initialize {
lang
xml:lang';

print wrap_mod("po4a::docbook::initialize", dgettext("po4a", "Call treat_options")) if $self->{options}{'debug'};
$self->treat_options;
}
2 changes: 2 additions & 0 deletions lib/Locale/Po4a/Guide.pm
@@ -75,6 +75,7 @@ use 5.006;
use strict;
use warnings;

use Locale::Po4a::Common;
use Locale::Po4a::Xml;

use vars qw(@ISA);
@@ -147,5 +148,6 @@ sub initialize {
<sup>
<uri>
<var>';
print wrap_mod("po4a::guide", dgettext("po4a", "Call treat_options")) if $self->{options}{'debug'};
$self->treat_options;
}
6 changes: 3 additions & 3 deletions lib/Locale/Po4a/InProgress/Debconf.pm
@@ -121,9 +121,9 @@ sub parse {
}

$eval .= ")\n";
print STDERR $eval if $self->debug();
print STDERR $eval if $self->{options}{'debug'};
eval $eval;
print STDERR "XXXXXXXXXXXXXXXXX\n" if $self->debug();
print STDERR "XXXXXXXXXXXXXXXXX\n" if $self->{options}{'debug'};

# two leading _: split on comma and multi-translate each part. No extended value.
} elsif ($undercount == 2) {
@@ -140,7 +140,7 @@ sub parse {
}
$eval .= ")\n";

print $eval if $self->debug();
print $eval if $self->{options}{'debug'};
eval $eval;

# no leading _: don't touch it
6 changes: 3 additions & 3 deletions lib/Locale/Po4a/InProgress/NewsDebian.pm
@@ -81,7 +81,7 @@ sub parse {

# main loop
($line,$lref)=$self->shiftline();
print "seen >>$line<<\n" if $self->debug();
print "seen >>$line<<\n" if $self->{options}{'debug'};
while (defined($line)) {

# Beginning of an entry
@@ -99,7 +99,7 @@ sub parse {
# eat all leading empty lines
($line,$lref)=$self->shiftline();
while (defined($line) && $line =~ m/^\s*$/) {
print "Eat >>$line<<\n" if $self->debug();
print "Eat >>$line<<\n" if $self->{options}{'debug'};
($line,$lref)=$self->shiftline();
}
# oops, ate one line too many. Put it back.
@@ -128,7 +128,7 @@ sub parse {
}

($line,$lref)=$self->shiftline();
print "seen >>".($line || '')."<<\n" if $self->debug();
print "seen >>".($line || '')."<<\n" if $self->{options}{'debug'};
}
}

2 changes: 1 addition & 1 deletion lib/Locale/Po4a/Sgml.pm
@@ -563,7 +563,7 @@ sub parse_file {
# Remove <![ IGNORE [ sections
# FIXME: we don't support included PO4A-beg-
my $tmp1 = $origfile;
while ($tmp1 =~ m/^(.*?)({PO4A-beg-\s*IGNORE\s*}(?:.+?)<po4aend>)(.*)$/s)
while ($tmp1 =~ m/^(.*?)(\{PO4A-beg-\s*IGNORE\s*}(?:.+?)<po4aend>)(.*)$/s)
{
my ($begin,$ignored,$end) = ($1, $2, $3);
my @begin = split(/\n/, $begin);

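The Sgml.pm change only escapes the literal left brace; the pattern still matches the same text. The following stand-alone sketch exercises the new regex from the hunk above on a made-up input string, so the sample content of $origfile is an assumption for illustration and does not come from a real SGML document.

```perl
use strict;
use warnings;

# Made-up input containing one marked IGNORE section.
my $origfile = "keep this\n{PO4A-beg- IGNORE }hidden text<po4aend>\nkeep that\n";

# Same pattern as the new line in Sgml.pm: the literal '{' is written as '\{'
# so that Perl does not try to read it as the start of a quantifier.
my ($begin, $ignored, $end) =
    $origfile =~ m/^(.*?)(\{PO4A-beg-\s*IGNORE\s*}(?:.+?)<po4aend>)(.*)$/s;

# Drop the ignored section and keep the text around it.
my $kept = $begin . $end;
```

The closing brace needs no escape because it has no special meaning on its own outside a quantifier.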