What do we mean by "paragraph mode"? #16787

Closed
p5pRT opened this issue Dec 12, 2018 · 15 comments

Comments

@p5pRT

commented Dec 12, 2018

Migrated from rt.perl.org#133722 (status was 'resolved')

Searchable as RT133722$

@p5pRT

commented Dec 12, 2018

From @jkeenan

Summary: This ticket replaces RT 133703 and proposes improved
documentation and testing for the so-called "paragraph mode", i.e.,
processing of records in a file while $/ is set to the empty string.

I. Background:

In https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133703, I submitted a
patch to sv.c intended to address a "Comparison result is always the
same" warning reported by LGTM.com analysis of the Perl 5 core
distribution. Dave Mitchell pointed out where the patch was wrong and,
in the course of his discussion, mentioned that the section of sv.c at
issue governed so-called "paragraph mode", that is, what happens when
you read a file when the input record separator has been set to an empty
string:

#####
$/ = '';
#####
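
As a minimal illustration of that setting (a sketch only, not code from sv.c or the test suite; the sample text and variable names are made up for this example), reading from an in-memory filehandle with $/ set to the empty string returns one record per paragraph:

```perl
use strict;
use warnings;

# Sample input (hypothetical): two paragraphs separated by three newlines.
my $text = "alpha\nbeta\n\n\ngamma\n";

# An in-memory filehandle keeps the example free of temporary files.
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = '';    # paragraph mode; the old value returns at block exit
    @records = <$fh>;
}
close $fh or die "Cannot close in-memory handle: $!";

printf "%d records\n", scalar @records;
```

Here the single newline inside the first paragraph does not end the record; only the run of consecutive newlines does. Using local confines the change to $/ to the enclosing block.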

While poking around in the code and the test suite, I became convinced:

(a) that I didn't understand "paragraph mode" very well;

(b) that "paragraph mode" wasn't well documented;

(c) that "paragraph mode" wasn't thoroughly tested in the core
distribution's test suite; and

(d) that as a consequence of (b) and (c), it might contain bugs.

I spent several days working on this. I no longer think "paragraph
mode" has bugs, but I'm more convinced that it is under-documented and
under-tested. In this RT I propose better documentation of paragraph
mode and additional tests.

II. Paragraph Mode as Currently Found in the Core Distribution

From this point forward I'll treat "paragraph mode" and "setting $/ to
an empty string" as equivalent.

In the core distribution $/ (or $INPUT_RECORD_SEPARATOR) is defined in
pod/perlvar.pod and is mainly discussed in perlfaq5 and perlfaq6.
Please see this attachment:

#####
paragraph_mode_in_core_distribution.pod
#####

... for a thorough discussion of paragraph mode in the core
distribution, including instances in code found in cpan/, ext/, dist/
and lib/. You can also find this discussion on the web at:

#####
http://thenceforward.net/perl/misc/paragraph_mode_in_core_distribution.html
#####

III. Proposed Additional Documentation

I believe that the best place to put additional discussion of paragraph
mode is perlfaq5. Please review the attached patch:

#####
0001-More-detailed-explanation-of-paragraph-mode.patch
#####

Note that perlfaq5 is maintained upstream on CPAN, so once P5P is
on board with the patch, I will submit it as a GitHub issue at
https://github.com/perl-doc-cats/perlfaq/issues.

IV. Proposed Additional Testing

Please see the attached program:

#####
paragraph_mode.t
#####

To facilitate discussion, I've written this test program in a modern
style using Test::More, File::Temp, CPAN module Data::Dump and
subroutines. When we bring this into the core distribution, I'll adapt
it for inclusion under, say, t/op/, where I'll have to use only the
testing functions provided by t/test.pl.

What is important for discussion now is: Do these tests thoroughly
cover what we mean by "paragraph mode"?

Thank you very much.
Jim Keenan

@p5pRT

commented Dec 12, 2018

From @jkeenan

0001-More-detailed-explanation-of-paragraph-mode.patch
From dc1d6b22a64e9ddbf003204beef033bf320cbe81 Mon Sep 17 00:00:00 2001
From: James E Keenan <jkeenan@cpan.org>
Date: Wed, 12 Dec 2018 16:52:00 -0500
Subject: [PATCH] More detailed explanation of "paragraph mode"

---
 lib/perlfaq5.pod | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/lib/perlfaq5.pod b/lib/perlfaq5.pod
index 7464b26..96470e7 100644
--- a/lib/perlfaq5.pod
+++ b/lib/perlfaq5.pod
@@ -1179,6 +1179,16 @@ C<"\n\n"> to accept empty paragraphs.
 Note that a blank line must have no blanks in it. Thus
 S<C<"fred\n \nstuff\n\n">> is one paragraph, but C<"fred\n\nstuff\n\n"> is two.
 
+When C<$/> is set to C<""> -- so-called I<paragraph mode> -- and the entire
+file is read in with that setting, any sequence of consecutive newlines
+C<"\n\n"> at the beginning of the file is discarded.  With the exception of
+the final record in the file, each sequence of characters ending in two or
+more newlines is treated as one record and is read in to end in exactly two
+newlines.  If the last record in the file ends in zero or one consecutive
+newlines, that record is read in with that number of newlines.  If the last
+record ends in two or more consecutive newlines, it is read in with two
+newlines like all preceding records.
+
 =head2 How can I read a single character from a file? From the keyboard?
 X<getc> X<file, reading one character at a time>
 
-- 
2.17.1
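
The rules stated in the patch text can be checked with a short script. This is a sketch under assumptions: paragraphs() is a hypothetical helper written for this example, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records that paragraph mode
# yields for the given string.
sub paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';    # paragraph mode for the duration of this sub
    my @records = <$fh>;
    close $fh or die "Cannot close in-memory handle: $!";
    return @records;
}

# Leading newlines are discarded; a last record with no trailing
# newline is returned as is.
my @leading = paragraphs("\n\n\nfred\n\nstuff");

# Runs of more than two newlines are read in as exactly two,
# including after the last record.
my @trailing = paragraphs("fred\n\n\n\nstuff\n\n\n");

# A line holding only a space is not a blank line, so this is one record.
my @blanks = paragraphs("fred\n \nstuff\n\n");

printf "%d %d %d\n", scalar @leading, scalar @trailing, scalar @blanks;
```

The first two inputs each yield two records and the third yields one, which matches the "fred"/"stuff" example already present in perlfaq5.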

@p5pRT

commented Dec 12, 2018

From @jkeenan

use strict;
use warnings;
use Test::More;
use File::Temp qw( tempfile );
use Data::Dumper;
use Data::Dump qw( dd pp );

# Test paragraph mode in two ways:
# first, in the style of t/base/rs.t, by reading one record at a time
# and comparing it to an expected value,
# then, after seek-ing back to start of file, by reading all the records
# into an array and then comparing the array to an expected value.

my ($OUT, $filename, @chunks, @expected, $msg);
my $testcount = 0;

{
  note("'Well behaved' files");
  # We start with files whose "paragraphs" contain no internal newlines.
  @chunks = (
  join('' => ( 1..3 )),
  join('' => ( 4..6 )),
  join('' => ( 7..9 )),
  10
  );

  {
  $msg = "Case " . ++$testcount . ": 'Well behaved' file: >= 2 newlines between text blocks\n";
  $msg .= " no internal newlines; no final newline";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  );
  print $OUT $chunks[3];
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  $chunks[3],
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Well behaved' file: >= 2 newlines between text blocks\n";
  $msg .= " no internal newlines; 1 final newline";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Well behaved' file: >= 2 newlines between text blocks\n";
  $msg .= " no internal newlines; 2 final newlines";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  ("") x 1,
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Well behaved' file: >= 2 newlines between text blocks\n";
  $msg .= " no internal newlines; 3 final newlines";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  ("") x 2,
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }
}

{
  note("'Misbehaving' files");
  # We continue with files whose "paragraphs" contain internal newlines.
  @chunks = (
  join('' => ( 1, 2, "\n", 3 )),
  join('' => ( 4, 5, " \n", 6 )),
  join('' => ( 7, 8, " \t\n", 9 )),
  10
  );

  {
  $msg = "Case " . ++$testcount . ": 'Misbehaving' file: >= 2 newlines between text blocks\n";
  $msg .= " internal newlines; no final newline";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  );
  print $OUT $chunks[3];
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  $chunks[3],
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Misbehaving' file: >= 2 newlines between text blocks\n";
  $msg .= " internal newlines; 1 final newline";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Misbehaving' file: >= 2 newlines between text blocks\n";
  $msg .= " internal newlines; 2 final newlines";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  ("") x 1,
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Misbehaving' file: >= 2 newlines between text blocks\n";
  $msg .= " internal newlines; 3 final newlines";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  ("") x 2,
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }
}

{
  note("'Badly behaved' files starting with newlines");
  # We continue with files which start with newlines
  # but whose "paragraphs" contain no internal newlines.
  # We'll set our expectation that the leading newlines will get trimmed off
  # and everything else will proceed normally.

  @chunks = (
  join('' => ( 1..3 )),
  join('' => ( 4..6 )),
  join('' => ( 7..9 )),
  10
  );

  {
  $msg = "Case " . ++$testcount . ": 'Badly behaved' file: leading newlines; no final newline";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "\n\n\n";
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  );
  print $OUT $chunks[3];
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  $chunks[3],
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Badly behaved' file: leading newlines; 1 final newline";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "\n\n\n";
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Badly behaved' file: leading newlines; 2 final newlines";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "\n\n\n";
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  ("") x 1,
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Badly behaved' file: leading newlines; 3 final newlines";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "\n\n\n";
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  ("") x 2,
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }
}

{
  note("'Very badly behaved' files starting with newlines");
  # We continue with files which start with newlines
  # and whose "paragraphs" contain internal newlines.
  # We'll set our expectation that the leading newlines will get trimmed off
  # and everything else will proceed normally.

  @chunks = (
  join('' => ( 1, 2, "\n", 3 )),
  join('' => ( 4, 5, " \n", 6 )),
  join('' => ( 7, 8, " \t\n", 9 )),
  10
  );

  {
  $msg = "Case " . ++$testcount . ": 'Very badly behaved' file: leading newlines;";
  $msg .= " internal newlines; no final newline";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "\n\n\n";
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  );
  print $OUT $chunks[3];
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  $chunks[3],
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Very badly behaved' file: leading newlines;";
  $msg .= " internal newlines; 1 final newline";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "\n\n\n";
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Very badly behaved' file: leading newlines;";
  $msg .= " internal newlines; 2 final newlines";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "\n\n\n";
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  ("") x 1,
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }

  {
  $msg = "Case " . ++$testcount . ": 'Very badly behaved' file: leading newlines;";
  $msg .= " internal newlines; 3 final newlines";
  note($msg);

  ($OUT, $filename) = open_tempfile();
  print $OUT "\n\n\n";
  print $OUT "$_\n" for (
  $chunks[0],
  ("") x 1,
  $chunks[1],
  ("") x 2,
  $chunks[2],
  ("") x 3,
  $chunks[3],
  ("") x 2,
  );
  close $OUT or die;

  @expected = (
  "$chunks[0]\n\n",
  "$chunks[1]\n\n",
  "$chunks[2]\n\n",
  "$chunks[3]\n\n",
  );
  local $/ = '';
  perform_tests($filename, \@expected);
  }
}

done_testing();

sub open_tempfile {
  my ($fh, $filename) = tempfile();
  open my $OUT, '>', $filename or die "Cannot open $filename for writing: $!";
  binmode $OUT;
  return ($OUT, $filename);
}

sub perform_tests {
  my ($filename, $expected) = @_;
  open my $IN, '<', $filename or die "Cannot open $filename for reading: $!";
  # First pass: read one record at a time.
  for (my $i=0; $i<=$#{$expected}; $i++) {
    is(<$IN>, $expected->[$i], "Got expected record $i");
  }

  # Second pass: rewind and read all the records at once.
  seek $IN, 0, 0;
  my @got = <$IN>;
  is_deeply(\@got, $expected, "Got all expected records");
  close $IN or die "Cannot close $filename: $!";
}
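
The sixteen cases above share one pattern: two families of text blocks, with or without leading newlines, and zero to three final newlines. That pattern can be written as a loop. What follows is a hedged sketch, not the test file itself: it reads from in-memory filehandles instead of temporary files (an assumption made to keep it self-contained), the %families hash and the test names are invented for this example, and it checks only the read-all-records path.

```perl
use strict;
use warnings;
use Test::More;

# Two families of text blocks: without and with internal newlines.
my %families = (
    'no internal newlines' => [ '123', '456', '789', '10' ],
    'internal newlines'    => [ "12\n3", "45 \n6", "78 \t\n9", '10' ],
);

for my $family (sort keys %families) {
    my @chunks = @{ $families{$family} };
    for my $leading (0, 3) {
        for my $final (0 .. 3) {
            # Blocks are separated by 2, 3 and 4 newlines, as in the cases above.
            my $content = ("\n" x $leading)
                . "$chunks[0]\n\n"
                . "$chunks[1]\n\n\n"
                . "$chunks[2]\n\n\n\n"
                . $chunks[3] . ("\n" x $final);

            # The last record keeps at most two of its trailing newlines.
            my $kept = $final > 2 ? 2 : $final;
            my @expected = (
                (map { "$_\n\n" } @chunks[0 .. 2]),
                $chunks[3] . ("\n" x $kept),
            );

            open my $IN, '<', \$content
                or die "Cannot open in-memory handle: $!";
            my @got = do { local $/ = ''; <$IN> };
            close $IN or die "Cannot close in-memory handle: $!";

            is_deeply(\@got, \@expected,
                "$family; $leading leading newlines; $final final newlines");
        }
    }
}

done_testing();
```

Building each test name from the loop variables means a failure report identifies the exact combination that failed.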

@p5pRT

commented Dec 13, 2018

From @tonycoz

On Wed, 12 Dec 2018 14:41:11 -0800, jkeenan@pobox.com wrote:

Summary: This ticket replaces RT 133703 and proposes improved
documentation and testing for the so-called "paragraph mode", i.e.,
processing of records in a file while $/ is set to the empty string.

I. Background:

In https://rt-archive.perl.org/perl5/Ticket/Display.html?id=133703, I submitted a
patch to sv.c intended to address a "Comparison result is always the
same" warning reported by LGTM.com analysis of the Perl 5 core
distribution. Dave Mitchell pointed out where the patch was wrong and,
in the course of his discussion, mentioned that the section of sv.c at
issue governed so-called "paragraph mode", that is, what happens when
you read a file when the input record separator has been set to an empty
string:

#####
$/ = '';
#####

While poking around in the code and the test suite, I became convinced:

(a) that I didn't understand "paragraph mode" very well;

(b) that "paragraph mode" wasn't well documented;

(c) that "paragraph mode" wasn't thoroughly tested in the core
distribution's test suite; and

(d) that as a consequence of (b) and (c), it might contain bugs.

I spent several days working on this. I no longer think "paragraph
mode" has bugs, but I'm more convinced that it is under-documented and
under-tested. In this RT I propose better documentation of paragraph
mode and additional tests.

If the behaviour of <> with $/ = "" is unclear, any improvements to the documentation belong in either the documentation for <> (aka readline in pod/perlfunc.pod) or for $/ ($/ in pod/perlvar.pod), not in perlfaq. Given the other reference documentation of readline/$/ combined behaviour is in perlvar, it probably belongs there.

To facilitate discussion, I've written this test program in a modern
style using Test::More, File::Temp, CPAN module Data::Dump and
subroutines. When we bring this into the core distribution, I'll adapt
it for inclusion under, say, t/op/, where I'll have to use only the
testing functions provided by t/test.pl.

What is important for discussion now is: Do these tests thoroughly
cover what we mean by "paragraph mode"?

One suggestion I'd make for the tests is to include a brief description of the test case in the is()/is_deeply() calls, since the default test failure output includes the name of the test but not the note() output.

In your case you might pass a test name prefix to perform_tests() and include that as part of the name supplied to is()/is_deeply().
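
A minimal sketch of that suggestion, assuming a third argument named $desc (a hypothetical name) and a small made-up sample file; it is written against Test::More rather than t/test.pl:

```perl
use strict;
use warnings;
use Test::More;
use File::Temp qw( tempfile );

# perform_tests() with a description prefix, so that every test name
# identifies the case it belongs to.
sub perform_tests {
    my ($filename, $expected, $desc) = @_;
    open my $IN, '<', $filename or die "Cannot open $filename: $!";
    for my $i (0 .. $#{$expected}) {
        is(scalar <$IN>, $expected->[$i], "$desc: got expected record $i");
    }

    seek $IN, 0, 0 or die "Cannot rewind $filename: $!";
    my @got = <$IN>;
    is_deeply(\@got, $expected, "$desc: got all expected records");
    close $IN or die "Cannot close $filename: $!";
}

# Sample data (made up for this example): two paragraphs, 1 final newline.
my ($OUT, $filename) = tempfile(UNLINK => 1);
binmode $OUT;
print $OUT "123\n\n456\n";
close $OUT or die "Cannot close $filename: $!";

{
    local $/ = '';
    perform_tests($filename, [ "123\n\n", "456\n" ],
        "two paragraphs; 1 final newline");
}

done_testing();
```

With the prefix in place, a failing is() reports which case failed without relying on the preceding note() line.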

Tony

@p5pRT

commented Dec 13, 2018

The RT System itself - Status changed from 'new' to 'open'

@p5pRT

commented Dec 13, 2018

From @jkeenan

Please review the two new patches attached.

0001-More-specific-documentation-of-paragraph-mode.patch
0002-Thoroughly-test-paragraph-mode.patch

Thank you very much.
Jim Keenan

@p5pRT

commented Dec 13, 2018

From @jkeenan

0002-Thoroughly-test-paragraph-mode.patch
From 5a2ed1015aa3f39bf3a320962d519e57c22a8771 Mon Sep 17 00:00:00 2001
From: James E Keenan <jkeenan@cpan.org>
Date: Thu, 13 Dec 2018 18:29:29 -0500
Subject: [PATCH 2/2] Thoroughly test paragraph mode

For: RT # 133722
---
 MANIFEST              |   1 +
 t/io/paragraph_mode.t | 504 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 505 insertions(+)
 create mode 100644 t/io/paragraph_mode.t

diff --git a/MANIFEST b/MANIFEST
index 4276316980..ca5f78cdf3 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -5404,6 +5404,7 @@ t/io/layers.t			See if PerlIO layers work
 t/io/nargv.t			See if nested ARGV stuff works
 t/io/open.t			See if open works
 t/io/openpid.t			See if open works for subprocesses
+t/io/paragraph_mode.t			See if paragraph mode works
 t/io/perlio.t			See if PerlIO works
 t/io/perlio_fail.t		See if bad layers fail
 t/io/perlio_leaks.t		See if PerlIO layers are leaking
diff --git a/t/io/paragraph_mode.t b/t/io/paragraph_mode.t
new file mode 100644
index 0000000000..edbb4cb196
--- /dev/null
+++ b/t/io/paragraph_mode.t
@@ -0,0 +1,504 @@
+#!./perl
+
+BEGIN {
+    chdir 't' if -d 't';
+    require './test.pl';
+    set_up_inc('../lib');
+}
+
+plan tests =>  80;
+
+my ($OUT, $filename, @chunks, @expected, $msg);
+
+{
+    # We start with files whose "paragraphs" contain no internal newlines.
+    @chunks = (
+        join('' => ( 1..3 )),
+        join('' => ( 4..6 )),
+        join('' => ( 7..9 )),
+        10
+    );
+
+    {
+        $msg = "'Well behaved' file: >= 2 newlines between text blocks; no internal newlines; 0 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+        );
+        print $OUT $chunks[3];
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            $chunks[3],
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Well behaved' file: >= 2 newlines between text blocks; no internal newlines; 1 final newline";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Well behaved' file: >= 2 newlines between text blocks; no internal newlines; 2 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+            ("") x 1,
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Well behaved' file: >= 2 newlines between text blocks; no internal newlines; 3 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+            ("") x 2,
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+}
+
+{
+    # We continue with files whose "paragraphs" contain internal newlines.
+    @chunks = (
+        join('' => ( 1, 2, "\n", 3 )),
+        join('' => ( 4, 5, "   \n", 6 )),
+        join('' => ( 7, 8, " \t\n", 9 )),
+        10
+    );
+
+    {
+        $msg = "'Misbehaving' file: >= 2 newlines between text blocks; internal newlines; 0 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+        );
+        print $OUT $chunks[3];
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            $chunks[3],
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Misbehaving' file: >= 2 newlines between text blocks; internal newlines; 1 final newline";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Misbehaving' file: >= 2 newlines between text blocks; internal newlines; 2 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+            ("") x 1,
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Misbehaving' file: >= 2 newlines between text blocks; internal newlines; 3 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+            ("") x 2,
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+}
+
+{
+    # We continue with files which start with newlines
+    # but whose "paragraphs" contain no internal newlines.
+    # We'll set our expectation that the leading newlines will get trimmed off
+    # and everything else will proceed normally.
+
+    @chunks = (
+        join('' => ( 1..3 )),
+        join('' => ( 4..6 )),
+        join('' => ( 7..9 )),
+        10
+    );
+
+    {
+        $msg = "'Badly behaved' file: leading newlines; 0 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "\n\n\n";
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+        );
+        print $OUT $chunks[3];
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            $chunks[3],
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Badly behaved' file: leading newlines; 1 final newline";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "\n\n\n";
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Badly behaved' file: leading newlines; 2 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "\n\n\n";
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+            ("") x 1,
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Badly behaved' file: leading newlines; 3 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "\n\n\n";
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+            ("") x 2,
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+}
+
+{
+    # We continue with files which start with newlines
+    # and whose "paragraphs" contain internal newlines.
+    # We'll set our expectation that the leading newlines will get trimmed off
+    # and everything else will proceed normally.
+
+    @chunks = (
+        join('' => ( 1, 2, "\n", 3 )),
+        join('' => ( 4, 5, "   \n", 6 )),
+        join('' => ( 7, 8, " \t\n", 9 )),
+        10
+    );
+
+    {
+        $msg = "'Very badly behaved' file: leading newlines; internal newlines; no final newline";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "\n\n\n";
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+        );
+        print $OUT $chunks[3];
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            $chunks[3],
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Very badly behaved' file: leading newlines; internal newlines; 0 final newline";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "\n\n\n";
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Very badly behaved' file: leading newlines; internal newlines; 1 final newline";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "\n\n\n";
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+            ("") x 1,
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+
+    {
+        $msg = "'Very badly behaved' file: leading newlines; internal newlines; 2 final newlines";
+
+        ($OUT, $filename) = open_tempfile();
+        print $OUT "\n\n\n";
+        print $OUT "$_\n" for (
+            $chunks[0],
+            ("") x 1,
+            $chunks[1],
+            ("") x 2,
+            $chunks[2],
+            ("") x 3,
+            $chunks[3],
+            ("") x 2,
+        );
+        close $OUT or die;
+
+        @expected = (
+            "$chunks[0]\n\n",
+            "$chunks[1]\n\n",
+            "$chunks[2]\n\n",
+            "$chunks[3]\n\n",
+        );
+        local $/ = '';
+        perform_tests($filename, \@expected, $msg);
+    }
+}
+
+########## SUBROUTINES ##########
+
+sub open_tempfile {
+    my $filename = tempfile();
+    open my $OUT, '>', $filename or die;
+    binmode $OUT;
+    return ($OUT, $filename);
+}
+
+sub perform_tests {
+    my ($filename, $expected, $msg) = @_;
+    open my $IN, '<', $filename or die;
+    my @got = <$IN>;
+    my $success = 1;
+    for (my $i=0; $i<=$#${expected}; $i++) {
+        if ($got[$i] ne $expected->[$i]) {
+            $success = 0;
+            last;
+        }
+    }
+    ok($success, $msg);
+
+    seek $IN, 0, 0;
+    for (my $i=0; $i<=$#${expected}; $i++) {
+        is(<$IN>, $expected->[$i], "Got expected record $i");
+    }
+    close $IN or die;
+}
-- 
2.17.1
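
The fixtures in the test file above are built by printing each list element followed by "\n", so an entry such as ("") x 2 contributes two blank lines after the preceding chunk. As a minimal sketch, separate from the patch, the following shows that idiom on its own. It assumes an in-memory filehandle is an acceptable stand-in for the temporary files used in the patch, and the two chunk values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real test file defines its own @chunks.
my @chunks = ('alpha', 'beta');

# Each element is written followed by "\n", so ("") x N adds N blank lines.
my $content = join '', map { $_ . "\n" } (
    $chunks[0],
    ("") x 2,
    $chunks[1],
    ("") x 1,
);

my @got;
{
    local $/ = '';    # paragraph mode, restored when the block exits
    open my $IN, '<', \$content or die "Cannot open in-memory handle: $!";
    @got = <$IN>;
    close $IN or die "Cannot close in-memory handle: $!";
}

# Make the newlines visible when printing.
for my $record ($content, @got) {
    (my $visible = $record) =~ s/\n/\\n/g;
    print "$visible\n";
}
```

Here $content holds "alpha\n\n\nbeta\n\n", and reading it in paragraph mode yields two records, each ending in exactly two newlines.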

@p5pRT

commented Dec 13, 2018

From @jkeenan

0001-More-specific-documentation-of-paragraph-mode.patch
From efd60cd2d95b880edd52d8f2402154d6e9423665 Mon Sep 17 00:00:00 2001
From: James E Keenan <jkeenan@cpan.org>
Date: Thu, 13 Dec 2018 17:42:42 -0500
Subject: [PATCH 1/2] More specific documentation of paragraph mode.

For: RT # 133722
---
 pod/perlvar.pod | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/pod/perlvar.pod b/pod/perlvar.pod
index 5faea28062..03b2215b66 100644
--- a/pod/perlvar.pod
+++ b/pod/perlvar.pod
@@ -1487,6 +1487,44 @@ the next paragraph, even if it's a newline.
 Remember: the value of C<$/> is a string, not a regex.  B<awk> has to
 be better for something. :-)
 
+Setting C<$/> to an empty string -- the so-called I<paragraph mode> -- merits
+special attention.  When C<$/> is set to C<""> and the entire file is read in
+with that setting, any sequence of consecutive newlines C<"\n\n"> at the
+beginning of the file is discarded.  With the exception of the final record in
+the file, each sequence of characters ending in two or more newlines is
+treated as one record and is read in to end in exactly two newlines.  If the
+last record in the file ends in zero or one consecutive newlines, that record
+is read in with that number of newlines.  If the last record ends in two or
+more consecutive newlines, it is read in with two newlines like all preceding
+records.
+
+Suppose we wrote the following string to a file:
+
+    my $string = "\n\n\n";
+    $string .= "alpha beta\ngamma delta\n\n\n";
+    $string .= "epsilon zeta eta\n\n";
+    $string .= "theta\n";
+
+    my $file = 'simple_file.txt';
+    open my $OUT, '>', $file or die;
+    print $OUT $string;
+    close $OUT or die;
+
+Now we read that file in paragraph mode:
+
+    local $/ = ""; # paragraph mode
+    open my $IN, '<', $file or die;
+    my @records = <$IN>;
+    close $IN or die;
+
+C<@records> will consist of these 3 strings:
+
+    (
+      "alpha beta\ngamma delta\n\n",
+      "epsilon zeta eta\n\n",
+      "theta\n",
+    )
+
 Setting C<$/> to a reference to an integer, scalar containing an
 integer, or scalar that's convertible to an integer will attempt to
 read records instead of lines, with the maximum record size being the
-- 
2.17.1
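
For readers who want to try the documented behaviour without creating a file, here is a minimal sketch that is not part of the patch. It reuses the string from the POD example and assumes an in-memory filehandle behaves like a file on disk for this purpose. The @trimmed array is a name introduced only for this illustration; it relies on the documented behaviour of chomp in paragraph mode, which removes all trailing newlines from each record.

```perl
use strict;
use warnings;

# Same content as the example in the patch, held in memory instead of
# being written to simple_file.txt.
my $string = "\n\n\n"
           . "alpha beta\ngamma delta\n\n\n"
           . "epsilon zeta eta\n\n"
           . "theta\n";

my @records;
{
    local $/ = "";    # paragraph mode, limited to this block
    open my $IN, '<', \$string or die "Cannot open in-memory handle: $!";
    @records = <$IN>;
    close $IN or die "Cannot close in-memory handle: $!";
}

# In paragraph mode, chomp strips every trailing newline from a record.
my @trimmed = @records;
{
    local $/ = "";
    chomp @trimmed;
}

printf "%d records\n", scalar @records;
print "[$_]\n" for @trimmed;
```

Using local inside a bare block keeps the change to $/ from leaking into the rest of the program, which is the same pattern the test file follows.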

@p5pRT

commented Dec 19, 2018

From @tonycoz

On Thu, 13 Dec 2018 15:44:48 -0800, jkeenan@pobox.com wrote:

> Please review the two new patches attached.
>
> 0001-More-specific-documentation-of-paragraph-mode.patch
> 0002-Thoroughly-test-paragraph-mode.patch

That's fine.

Tony

@p5pRT

commented Dec 19, 2018

From @jkeenan

On Wed, 19 Dec 2018 03:09:34 GMT, tonyc wrote:

> On Thu, 13 Dec 2018 15:44:48 -0800, jkeenan@pobox.com wrote:
>
> > Please review the two new patches attached.
> >
> > 0001-More-specific-documentation-of-paragraph-mode.patch
> > 0002-Thoroughly-test-paragraph-mode.patch
>
> That's fine.
>
> Tony

Merged to blead in commits 440af01 and bf8c368.

Will monitor for a few days.

--
James E Keenan (jkeenan@cpan.org)

@p5pRT

commented Dec 30, 2018

From @jkeenan

On Wed, 19 Dec 2018 14:46:07 GMT, jkeenan wrote:

> On Wed, 19 Dec 2018 03:09:34 GMT, tonyc wrote:
>
> > On Thu, 13 Dec 2018 15:44:48 -0800, jkeenan@pobox.com wrote:
> >
> > > Please review the two new patches attached.
> > >
> > > 0001-More-specific-documentation-of-paragraph-mode.patch
> > > 0002-Thoroughly-test-paragraph-mode.patch
> >
> > That's fine.
> >
> > Tony
>
> Merged to blead in commits 440af01 and bf8c368.
>
> Will monitor for a few days.

No failures observed; resolving ticket.

--
James E Keenan (jkeenan@cpan.org)

@p5pRT

commented Dec 30, 2018

@jkeenan - Status changed from 'open' to 'pending release'

@p5pRT

commented May 22, 2019

From @khwilliamson

Thank you for filing this report. You have helped make Perl better.

With the release today of Perl 5.30.0, this and 160 other issues have been
resolved.

Perl 5.30.0 may be downloaded via:
https://metacpan.org/release/XSAWYERX/perl-5.30.0

If you find that the problem persists, feel free to reopen this ticket.

@p5pRT

commented May 22, 2019

@khwilliamson - Status changed from 'pending release' to 'resolved'
