Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix improper reinitialization of $v_ref #68

Closed
wants to merge 1 commit into from
Closed

fix improper reinitialization of $v_ref #68

wants to merge 1 commit into from

Conversation

XSven
Copy link

@XSven XSven commented Mar 27, 2024

When the comment_str is set and the strict mode is enabled too the first row after the comment row fails with

# CSV_PP ERROR: 2014 - ENF - Inconsistent number of fields @ rec 2 pos 10 field 1

The reason is an improper fld_idx count which is caused by an improper reinitialiazion of the $v_ref variable.

@Tux
Copy link

Tux commented Mar 27, 2024

Thanks for spotting.

IMHO the more correct change would be

diff --git a/lib/Text/CSV_PP.pm b/lib/Text/CSV_PP.pm
index d1873f6..a41c6dd 100644
--- a/lib/Text/CSV_PP.pm
+++ b/lib/Text/CSV_PP.pm
@@ -2320,6 +2320,7 @@ LOOP:
                     if (!$spl && $ctx->{comment_str} && $ctx->{tmp} =~ /\A\Q$ctx->{comment_str}/) {
                         $ctx->{used} = $ctx->{size};
                         $ctx->{fld_idx} = 0;
+                        $ctx->{strict_n} = 0;
                         $seenSomething = 0;
                         next LOOP;
                     }
diff --git a/t/47_comment.t b/t/47_comment.t
index b2a4169..4f86660 100644
--- a/t/47_comment.t
+++ b/t/47_comment.t
@@ -13,7 +13,7 @@ BEGIN {
        }
     else {
        require Encode;
-       plan tests => 61;
+       plan tests => 62;
        }
     require "./t/util.pl";
     }
@@ -57,17 +57,29 @@ foreach my $cstr ("#", "//", "Comment", "\xe2\x98\x83") {
        }
     }

+my $data = <<"EOC";
+id | name
+#
+42 | foo
+#
+EOC
+
 is_deeply (csv (
-    in               => *DATA,
+    in               => \$data,
     sep_char         => "|",
     headers          => "auto",
     allow_whitespace => 1,
-    comment_str      => "#"
+    comment_str      => "#",
+    strict           => 0,
+    ), [{ id => 42, name => "foo" }], "Last record is comment");
+
+is_deeply (csv (
+    in               => \$data,
+    sep_char         => "|",
+    headers          => "auto",
+    allow_whitespace => 1,
+    comment_str      => "#",
+    strict           => 1,
     ), [{ id => 42, name => "foo" }], "Last record is comment");

 1;
-__END__
-id | name
-#
-42 | foo
-#

@XSven
Copy link
Author

XSven commented Mar 28, 2024

I doubt that. Although this description

If this attribute is set to 1, any row that parses to a different number of fields than the previous row will cause the parser to throw error 2014.

of the strict attribute refers only to the previous row and a comment row could be such a kind of previous row that reinitializes the number of fields back to 0 my understanding is different: strict means to me that all non-comment rows should have the the same number of fields. If we would reinitialize strict_n back to 0 after each comment-row this file

id | name
43 | bar
#
42 | foo | arrgh
#

would pass. That is weird. It should fail and my change guarantees that.

@XSven
Copy link
Author

XSven commented Mar 28, 2024

If we could at least agree on the same definition with respect to the strict attribute, I would appreciate that the issue could be solved short term in both the PP and the XS implementation of Text::CSV. I want to use this in production code and I am somehow blocked now by this bug. Thx.

@Tux
Copy link

Tux commented Mar 28, 2024

The meaning of strict true is to stop parsing on the first non-skipped syntactically valid row where the number of fields does not match the number of fields in the first valid parsed rom of the file/stream.

@XSven
Copy link
Author

XSven commented Mar 28, 2024

Ok that means a comment row that is a skipped row should not reinitialize strict_n.

@Tux
Copy link

Tux commented Mar 29, 2024

How about this one? Tux/Text-CSV_XS@6ed8c31

@XSven
Copy link
Author

XSven commented Apr 2, 2024

The commit Tux/Text-CSV_XS@6ed8c31 does fix the issue. Thanks. Some possible tidying improvements:

  1. CSV_XS.exp could be added to the .gitignore file.
  2. My vim does not like the local .exrc file
    Error detected while processing ./Text-CSV_XS/.exrc:
    line    1:
    E518: Unknown option: gw=75,5
    line    2:
    E185: Cannot find color scheme 'guide #0000a0'

I am asking myself if the error message could/should be improved

CSV_XS ERROR: 2014 - ENF - Inconsistent number of fields @ rec 3 pos 3 field 1

Why not mentioning the number of expected fields and the number of fields got?

@Tux
Copy link

Tux commented Apr 2, 2024

.exp add to .gitignore
gw might be elvis-specific. Not going to remove that
`The error message is generated based on the error code with additional generic state fields. I don't see (at tis moment) how that could simply be improved without cluttering all other messages with information that adds no value. Point taken though. I'll think about it.

@XSven
Copy link
Author

XSven commented Apr 2, 2024

Ok thanks, any idea when you will release the fix?

@Tux
Copy link

Tux commented Apr 2, 2024

Any time soon when I have time (and currently I have none).
A Text::CSV_XS release takes 6 hours to complete all steps

@XSven XSven closed this by deleting the head repository Apr 2, 2024
@XSven
Copy link
Author

XSven commented Apr 3, 2024

Initially this pull request was related to Text::CSV which contains the Test::CSV_PP implementation which is still broken!

@XSven
Copy link
Author

XSven commented Apr 4, 2024

I still cannot find the new version 1.54 of Text::CSV_XS in metacpan?!

@XSven
Copy link
Author

XSven commented Apr 4, 2024

Sorry I have overlooked the statement

Any time soon when I have time (and currently I have none).

We will try to install from github directly.

@XSven
Copy link
Author

XSven commented Apr 10, 2024

This test

use Test::More tests => 1;
use Text::CSV_XS qw(csv);

 my $data = <<EOC;
 # some comment
 42 | foo
 43 | bar
 EOC

 is_deeply(
   csv( in => \$data, allow_whitespace => 1, auto_diag => 2, strict => 1, comment_str => '#', sep_char => '|' ),
   [ [ 42, 'foo' ], [ 43, 'bar' ] ],
   'Two records read successfully'
 );

is failing

CSV_XS ERROR: 2014 - ENF - Inconsistent number of fields @ rec 2 pos 10 field 2

although it should not. The problem is that the comment line is the first line. If we move the comment line to the second line the test is successful.

@Tux
Copy link

Tux commented Apr 11, 2024

Could you try Tux/Text-CSV_XS@8cdf92e
Thanks for the new corner cases

@XSven
Copy link
Author

XSven commented Apr 18, 2024

Sorry for the late reply. Everything works as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants