Merge pull request #153 from osamuaoki/master

enhance debug output, fix docbook behavior, allow separate input for POT
mquinson · Oct 17, 2018 · 7f3ff73
2 parents 062957b + efb9a87
commit 7f3ff73
Showing 29 changed files with 1,218 additions and 185 deletions.
diff --git a/NEWS b/NEWS
@@ -14,6 +14,32 @@ AsciiDoc:
 
 Translations:
  * Updated: German, thanks Helge Kreutzmann.
+
+TransTractor:
+ * Split lines before the addendum operation without losing or adding
+   newlines.  With this change, addendum behavior is more intuitive.
+   (Debian's #518218, GitHub's #147, #153)
+
+Xml, Docbook:
+ * Document XML tag behavior customization with an example to help with
+   use-case-specific customization.  (Debian's #515763)
+ * Enhance debug output to help people understand what exactly is happening
+   inside po4a.
+ * Extensive POD and code comment additions and updates.
+
+Sgml:
+ * Avoid deprecated unescaped left brace in regex to get ready for Perl 5.32.
+   (Debian's #903735)
+
+po4a tool:
+ * Add pot_in feature to support a secondary master file as the base of
+   POT/PO file generation.
+
+Test:
+ * Add XML test cases with po4a including addendum, tag customization, and
+   pot_in feature.
+
+=======================================================================
         ___   ____  _  _
 __   __/ _ \ | ___|| || |
 \ \ / / | | ||___ \| || |_   Back on track. Again.

diff --git a/doc/po4a.7.pod b/doc/po4a.7.pod
@@ -552,10 +552,21 @@ header indicating where in the produced document they should be placed. The
 rest of the addendum file will be added verbatim at the determined position of
 the resulting document.
 
-The header has a pretty rigid syntax: It must begin with the string
-B<PO4A-HEADER:>, followed by a semi-colon (B<;>) separated list of
-I<key>B<=>I<value> fields. White spaces ARE important. Note that you cannot use
-the semi-colon char (B<;>) in the value, and that quoting it doesn't help.
+The header line which specifies context has a pretty rigid syntax: It must begin
+with the string B<PO4A-HEADER:>, followed by a semi-colon (B<;>) separated list
+of I<key>B<=>I<value> fields. White spaces ARE important. Note that you cannot
+use the semi-colon char (B<;>) in the value, and that quoting it doesn't help.
+Optionally, spaces (B< >) may be inserted before I<key> for readability.
+
+Although this context search may be considered to operate roughly on each line
+of the translated document, it actually operates on the internal data string of
+the translated document.  This internal data string may be a text spanning a
+paragraph containing multiple lines or may be an XML tag by itself.  The
+exact I<insertion point> of the addendum must be before or after the internal
+data string and cannot be within the internal data string.
+
+The actual internal data string of the translated document can be visualized by
+executing po4a in debug mode.
 
 Again, it sounds scary, but the examples given below should help you to find
 how to write the header line you need. To illustrate the discussion, assume
@@ -566,53 +577,56 @@ Here are the possible header keys:
 
 =over 4
 
-=item B<position> (mandatory)
+=item B<mode> (mandatory)
 
-a Perl regexp. The addendum will be placed near the line matching this regexp.
-Note that we're speaking about the translated document here, not the
-original. If more than a line match this expression (or none), the addition
-will fail. It is indeed better to report an error than inserting the
-addendum at the wrong location.
+It can be either the string B<before> or B<after>.
 
-This line is called I<position point> in the following. The point where the
-addendum is added is called I<insertion point>. Those two points are near one
-from another, but not equal. For example, if you want to insert a new section,
-it is easier to put the I<position point> on the title of the preceding section
-and explain po4a where the section ends (remember that I<position point> is
-given by a regexp which should match a unique line).
+If B<mode=before>, the I<insertion point> is determined by a one-step regex
+match specified by the B<position> argument regex.  The I<insertion point> is
+immediately before the uniquely matched internal data string of the translated
+document.
 
-The localization of the I<insertion point> with regard to the I<position point>
-is controlled by the B<mode>, B<beginboundary> and B<endboundary> fields, as
-explained below.
+If B<mode=after>, the I<insertion point> is determined by a two-step regex
+match: first by the B<position> argument regex, then by the
+B<beginboundary> or B<endboundary> argument regex.
 
-In our case, we would have:
+Since there may be multiple sections for the assumed case, let's use the
+two-step approach.
 
-     position=<title>About this document</title>
+     mode=after
 
+=item B<position> (mandatory)
 
-=item B<mode> (mandatory)
+A Perl regexp for specifying the context.
 
-It can be either the string B<before> or B<after>, specifying the position of
-the addendum, relative to the I<position point>. In case B<before> is given
-the I<insertion point> will placed exactly before the I<position point>. The
-B<after> behaviour is detailed bellow.
+If more than one internal data string matches this expression (or none), the
+search for the I<insertion point> and addition of the addendum will fail. It is
+indeed better to report an error than to insert the addendum at the wrong
+location.
 
-Since we want the new section to be placed below the one we are matching, we
-have:
+If B<mode=before>, the I<insertion point> is specified to be immediately before
+the internal data string uniquely matching the B<position> argument regex.
 
-     mode=after
+If B<mode=after>, the search for the I<insertion point> is narrowed down to the
+data after the internal data string uniquely matching the B<position> argument
+regex.  The exact I<insertion point> is further specified by the
+B<beginboundary> or B<endboundary>.
+
+In our case, we need to skip several preceding sections by narrowing down
+the search using the section title string.
+
+     position=About this document
+
+(In reality, you need to use the translated section title string here,
+instead.)
 
 =item B<beginboundary> (used only when B<mode=after>, and mandatory in that case)
 
 =item B<endboundary> (idem)
 
-regexp matching the end of the section after which the addendum goes.
-
-When B<mode=after>, the I<insertion point> is after the I<position point>, but
-not directly after! It is placed at the end of the section beginning at the
-I<position point>, i.e., after or before the line matched by the
-I<???>B<boundary> argument, depending on whether you used B<beginboundary> or
-B<endboundary>.
+A second Perl regexp required only when B<mode=after>. The addendum will be
+placed immediately before or after the first internal data string matching the
+B<beginboundary> or B<endboundary> argument regexp, respectively.
 
 In our case, we can choose to indicate the end of the section we match by
 adding:
@@ -650,16 +664,15 @@ document, you can use either of those header lines:
  PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
  PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>
 
-
 =item
 If you want to add something after the following nroff section:
 
   .SH "AUTHORS"
 
-you should put a B<position> matching this line, and a B<beginboundary>
-matching the beginning of the next section (i.e., B<^\.SH>). The addendum will
-then be added B<after> the I<position point> and immediately B<before> the
-first line matching the B<beginboundary>. That is to say:
+You should select the two-step approach by setting B<mode=after>.  Then narrow
+down the search to the line after B<AUTHORS> with the B<position> argument
+regex.  Then, match the beginning of the next section (i.e., B<^\.SH>) with
+the B<beginboundary> argument regex. That is to say:
 
  PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=\.SH
 
@@ -677,7 +690,7 @@ it's not unique), and give an B<endboundary> matching nothing. Don't use simple
 strings here like B<"EOF">, but prefer those which have less chance to be in
 your document.
 
- PO4A-HEADER:mode=after;position=<title>About</title>;beginboundary=FakePo4aBoundary
+ PO4A-HEADER:mode=after;position=About this document;beginboundary=FakePo4aBoundary
 
 =back
 
@@ -796,7 +809,9 @@ the input PO). Here is a graphical representation of this:
 
 This little bone is the core of all the po4a architecture. If you omit the
 input PO and the output document, you get B<po4a-gettextize>. If you provide
-both input and disregard the output PO, you get B<po4a-translate>.
+both input and disregard the output PO, you get B<po4a-translate>.  B<po4a>
+calls TransTractor twice and calls B<msgmerge -U> between these TransTractor
+invocations to provide a one-stop solution with a single configuration file.
 
 TransTractor::parse() is a virtual function implemented by each module. Here
 is a little example to show you how it works. It parses a list of paragraphs,
@@ -823,6 +838,13 @@ each of them beginning with B<E<lt>pE<gt>>.
  19   }
  20 }
 
+On lines 6 and 7, we encounter C<shiftline()> and C<unshiftline()>.  These let
+you read and unread the head of the internal input data stream of the master
+document as a line string and its reference.  Here, the reference is
+provided by a string C<< $filename:$linenum >>.  Please remember that Perl only
+has one-dimensional arrays, so the code handling the internal input data
+stream is a bit cryptic.
+
 On line 6, we encounter B<E<lt>pE<gt>> for the second time. That's the signal
 of the next paragraph. We should thus put the just obtained line back into
 the original document (line 7) and push the paragraph built so far into the

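Reviewer sketch (not part of the commit): the addendum placement rules documented in the doc/po4a.7.pod hunks above can be modelled roughly as follows. This is only an illustration under stated assumptions: the translated document is treated as a plain list of internal data strings, `insertion_index` is a hypothetical helper and not a po4a function, and the fall-back to the end of the document when no boundary matches is inferred from the "boundary matching nothing" advice in the POD, not from the TransTractor source.

```perl
use strict;
use warnings;

# Hypothetical helper (not po4a code): return the index in @$chunks at which
# an addendum would be spliced in, following the mode/position/boundary rules.
sub insertion_index {
    my ($chunks, %h) = @_;

    # Step 1: the position regex must match exactly one internal data string.
    my $pos_re = qr/$h{position}/;
    my @hits = grep { $chunks->[$_] =~ $pos_re } 0 .. $#$chunks;
    die "position must match exactly one chunk, got " . scalar(@hits) . "\n"
        unless @hits == 1;
    my $pos = $hits[0];

    # mode=before: insert immediately before the matched string.
    return $pos if $h{mode} eq 'before';
    die "unknown mode '$h{mode}'\n" unless $h{mode} eq 'after';

    # Step 2 (mode=after only): look for the first boundary after the match.
    my ($key) = grep { defined $h{$_} } qw(beginboundary endboundary);
    die "mode=after needs beginboundary or endboundary\n" unless defined $key;
    my $bound_re = qr/$h{$key}/;
    for my $i ($pos + 1 .. $#$chunks) {
        next unless $chunks->[$i] =~ $bound_re;
        # beginboundary: insert before the match; endboundary: after it.
        return $key eq 'beginboundary' ? $i : $i + 1;
    }
    return scalar @$chunks;    # assumption: no boundary found, append at end
}

# Toy "translated document", one internal data string per element.
my @doc = (
    '<section>', '<title>Intro</title>', '<para>Hello</para>', '</section>',
    '<section>', '<title>About this document</title>', '<para>Text</para>',
    '</section>',
    '<section>', '<title>Index</title>', '</section>',
);

my $at = insertion_index(
    \@doc,
    mode        => 'after',
    position    => 'About this document',
    endboundary => '</section>',
);
splice @doc, $at, 0, '<section><title>About this translation</title></section>';
print "$_\n" for @doc;
```

In this toy document, `endboundary=</section>` and `beginboundary=<section>` resolve to the same index, which mirrors the POD's statement that either header line can be used for this case.
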
diff --git a/lib/Locale/Po4a/Dia.pm b/lib/Locale/Po4a/Dia.pm
@@ -83,6 +83,7 @@ use 5.006;
 use strict;
 use warnings;
 
+use Locale::Po4a::Common;
 use Locale::Po4a::Xml;
 
 use vars qw(@ISA);
@@ -95,6 +96,7 @@ sub initialize {
     $self->SUPER::initialize(%options);
     $self->{options}{'nostrip'}=1;
     $self->{options}{'_default_translated'}.=' <dia:string>';
+    print wrap_mod("po4a::dia", dgettext("po4a", "Call treat_options")) if $self->{options}{'debug'};
     $self->treat_options;
 }
 

diff --git a/lib/Locale/Po4a/Docbook.pm b/lib/Locale/Po4a/Docbook.pm
@@ -53,6 +53,28 @@ the file inclusion entities, but you can translate most of those files alone
 (except the typical entities files), and it's usually better to maintain them
 separated.
 
+=head1 OVERRIDE THE DEFAULT BEHAVIOR WITH COMMAND LINE OPTIONS
+
+The default behavior of system-provided modules is set to be on the safe side.
+
+For example, the default for the B<< <author> >> tag assumes it appears under
+B<< <para> >>.  But you may be using it only under B<< <bookinfo> >>.  In this
+case, you may want to translate it independently for each author.
+
+If you don't like the default behavior of the xml module and its derivative
+modules, you can provide command line options to change their behavior.  For
+example, you can add the following to the po4a configuration file:
+
+  opt:"-k 0 -o nodefault=\"<bookinfo> <author>\" \
+            -o break=\"<bookinfo> <author>\" \
+            -o untranslated=\"<bookinfo>\" \
+            -o translated=\"<author>\""
+
+This overrides the default behavior for B<< <bookinfo> >> and B<< <author> >>,
+sets B<< <bookinfo> >> and B<< <author> >> to break the input data stream on
+these tags, sets B<< <bookinfo> >> not to translate its tagged content, and
+sets B<< <author> >> to translate its tagged content.
+
 =head1 SEE ALSO
 
 L<Locale::Po4a::TransTractor(3pm)>, L<Locale::Po4a::Xml(3pm)>, L<po4a(7)|po4a.7>
@@ -77,6 +99,7 @@ use 5.006;
 use strict;
 use warnings;
 
+use Locale::Po4a::Common;
 use Locale::Po4a::Xml;
 
 use vars qw(@ISA);
@@ -393,9 +416,7 @@ sub initialize {
     # classsynopsis; does not contain text; may be in a para
     # NOTE: It may contain a classsynopsisinfo, which should be
     #       verbatim
-    # XXX: since it is in untranslated class, does the W flag takes
-    #      effect?
-    $self->{options}{'_default_untranslated'} .= " W<classsynopsis>";
+    $self->{options}{'_default_untranslated'} .= " <classsynopsis>";
     $self->{options}{'_default_placeholder'} .= " <classsynopsis>";
 
     # classsynopsisinfo; contains text;
@@ -404,10 +425,7 @@ sub initialize {
     $self->{options}{'_default_inline'} .= " <classsynopsisinfo>";
 
     # cmdsynopsis; does not contain text; may be in a para
-    # NOTE: It may be clearer as a verbatim block
-    # XXX: since it is in untranslated class, does the W flag takes
-    #      effect? => not completely. Rewrap afterward?
-    $self->{options}{'_default_untranslated'} .= " W<cmdsynopsis>";
+    $self->{options}{'_default_untranslated'} .= " <cmdsynopsis>";
     $self->{options}{'_default_placeholder'} .= " <cmdsynopsis>";
 
     # co; does not contain text; Formatted inline
@@ -507,10 +525,7 @@ sub initialize {
     $self->{options}{'_default_break'} .= " <constraintdef>";
 
     # constructorsynopsis; does not contain text; may be in a para
-    # NOTE: It may be clearer as a verbatim block
-    # XXX: since it is in untranslated class, does the W flag takes
-    #      effect?
-    $self->{options}{'_default_untranslated'} .= " W<constructorsynopsis>";
+    $self->{options}{'_default_untranslated'} .= " <constructorsynopsis>";
     $self->{options}{'_default_placeholder'} .= " <constructorsynopsis>";
 
     # contractnum; contains text; Formatted inline or as a displayed block
@@ -575,10 +590,7 @@ sub initialize {
     $self->{options}{'_default_break'} .= " <dedication>";
 
     # destructorsynopsis; does not contain text; may be in a para
-    # NOTE: It may be clearer as a verbatim block
-    # XXX: since it is in untranslated class, does the W flag takes
-    #      effect?
-    $self->{options}{'_default_untranslated'} .= " W<destructorsynopsis>";
+    $self->{options}{'_default_untranslated'} .= " <destructorsynopsis>";
     $self->{options}{'_default_placeholder'} .= " <destructorsynopsis>";
 
     # docinfo; does not contain text; removed in v4.0
@@ -775,7 +787,7 @@ sub initialize {
     $self->{options}{'_default_placeholder'} .= " <graphicco>";
 
     # group; does not contain text; Formatted inline
-    $self->{options}{'_default_untranslated'} .= " W<group>";
+    $self->{options}{'_default_untranslated'} .= " <group>";
     $self->{options}{'_default_inline'} .= " <group>";
 
     # guibutton; contains text; Formatted inline
@@ -2038,5 +2050,6 @@ sub initialize {
         lang
         xml:lang';
 
+    print wrap_mod("po4a::docbook::initialize", dgettext("po4a", "Call treat_options")) if $self->{options}{'debug'};
     $self->treat_options;
 }
diff --git a/lib/Locale/Po4a/Guide.pm b/lib/Locale/Po4a/Guide.pm
@@ -75,6 +75,7 @@ use 5.006;
 use strict;
 use warnings;
 
+use Locale::Po4a::Common;
 use Locale::Po4a::Xml;
 
 use vars qw(@ISA);
@@ -147,5 +148,6 @@ sub initialize {
         <sup>
         <uri>
         <var>';
+    print wrap_mod("po4a::guide", dgettext("po4a", "Call treat_options")) if $self->{options}{'debug'};
     $self->treat_options;
 }
diff --git a/lib/Locale/Po4a/InProgress/Debconf.pm b/lib/Locale/Po4a/InProgress/Debconf.pm
@@ -121,9 +121,9 @@ sub parse {
             }
 
             $eval .= ")\n";
-            print STDERR $eval if $self->debug();
+            print STDERR $eval if $self->{options}{'debug'};
             eval $eval;
-            print STDERR "XXXXXXXXXXXXXXXXX\n" if $self->debug();
+            print STDERR "XXXXXXXXXXXXXXXXX\n" if $self->{options}{'debug'};
 
         # two leading _: split on coma and multi-translate each part. No extended value.
         } elsif ($undercount == 2) {
@@ -140,7 +140,7 @@ sub parse {
             }
             $eval .= ")\n";
 
-            print $eval if $self->debug();
+            print $eval if $self->{options}{'debug'};
             eval $eval;
 
         # no leading _: don't touch it

diff --git a/lib/Locale/Po4a/InProgress/NewsDebian.pm b/lib/Locale/Po4a/InProgress/NewsDebian.pm
@@ -81,7 +81,7 @@ sub parse {
 
     # main loop
     ($line,$lref)=$self->shiftline();
-    print "seen >>$line<<\n" if $self->debug();
+    print "seen >>$line<<\n" if $self->{options}{'debug'};
     while (defined($line)) {
 
         # Begining of an entry
@@ -99,7 +99,7 @@ sub parse {
             # eat all leading empty lines
             ($line,$lref)=$self->shiftline();
             while (defined($line) && $line =~ m/^\s*$/) {
-                print "Eat >>$line<<\n" if $self->debug();
+                print "Eat >>$line<<\n" if $self->{options}{'debug'};
                 ($line,$lref)=$self->shiftline();
             }
             # ups, ate one line too much. Put it back.
@@ -128,7 +128,7 @@ sub parse {
         }
 
         ($line,$lref)=$self->shiftline();
-        print "seen >>".($line || '')."<<\n" if $self->debug();
+        print "seen >>".($line || '')."<<\n" if $self->{options}{'debug'};
     }
 }
 

diff --git a/lib/Locale/Po4a/Sgml.pm b/lib/Locale/Po4a/Sgml.pm
@@ -563,7 +563,7 @@ sub parse_file {
     # Remove <![ IGNORE [ sections
     # FIXME: we don't support included PO4A-beg-
     my $tmp1 = $origfile;
-    while ($tmp1 =~ m/^(.*?)({PO4A-beg-\s*IGNORE\s*}(?:.+?)<po4aend>)(.*)$/s)
+    while ($tmp1 =~ m/^(.*?)(\{PO4A-beg-\s*IGNORE\s*}(?:.+?)<po4aend>)(.*)$/s)
     {
         my ($begin,$ignored,$end) = ($1, $2, $3);
         my @begin   = split(/\n/, $begin);
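
Reviewer sketch (not part of the commit): the escaped-brace pattern from the Sgml.pm hunk can be exercised on its own as below. The input string is made up for illustration, and the loop body is a simplified stand-in that only drops the ignored span; the real `parse_file` does more work with `$begin`, `$ignored` and `$end`. The unescaped right brace is left as in the commit, since only the literal left brace is the deprecated form mentioned in NEWS.

```perl
use strict;
use warnings;

# Made-up input containing one marked IGNORE section.
my $origfile = "keep {PO4A-beg- IGNORE }hidden<po4aend> tail\n";

my $tmp1 = $origfile;
my @ignored;

# Same regex as the "+" line in the hunk: the literal left brace is escaped.
while ($tmp1 =~ m/^(.*?)(\{PO4A-beg-\s*IGNORE\s*}(?:.+?)<po4aend>)(.*)$/s) {
    my ($begin, $ignored, $end) = ($1, $2, $3);
    push @ignored, $ignored;    # remember what was cut out
    $tmp1 = $begin . $end;      # simplified: drop the ignored span
}

print "ignored: $_\n" for @ignored;
print "rest: $tmp1";
```
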