Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional validation of CSV Dialect parameters #113796

Closed
serhiy-storchaka opened this issue Jan 7, 2024 · 2 comments
Closed

Additional validation of CSV Dialect parameters #113796

serhiy-storchaka opened this issue Jan 7, 2024 · 2 comments
Assignees
Labels
type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

serhiy-storchaka commented Jan 7, 2024

Feature or enhancement

CSV writer and reader allow to configure parameters of the CSV dialect. They check for some incompatible combination, for example quotechar must be set if quoting enabled. But it is still possible to specify values that produce ambiguous output which cannot be parsed neither by the CSV reader not by any other CSV parser. For example if any two of delimiter, quotechar or escapechar match. The error can be only noticed when try to read the CSV file, and it can simply produce wrong result instead of failing.

I propose to add more validation checks in addition to existing checks in the Dialect constructor.

Linked PRs

@serhiy-storchaka serhiy-storchaka added the type-feature A feature request or enhancement label Jan 7, 2024
@serhiy-storchaka serhiy-storchaka self-assigned this Jan 7, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Jan 7, 2024
…uctor

ValueError is now raised if the same character is used in different roles.
serhiy-storchaka added a commit that referenced this issue Jan 22, 2024
…H-113797)

ValueError is now raised if the same character is used in different roles.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…uctor (pythonGH-113797)

ValueError is now raised if the same character is used in different roles.
@befeleme
Copy link
Contributor

This changed possibly broke docutils - it was built successfully with Python 3.13.0a1 to a3, started failing in a4 in Fedora's Copr environment: https://copr-be.cloud.fedoraproject.org/results/@python/python3.13/fedora-rawhide-x86_64/07033758-python-docutils/builder-live.log.gz

======================================================================
ERROR: test_parser (test_parsers.test_rst.test_directives.test_tables.ParserTestCase.test_parser) (id="totest['csv_table'][17]")
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builddir/build/BUILD/docutils-0.20.1/test/test_parsers/test_rst/test_directives/test_tables.py", line 41, in test_parser
    parser.parse(case_input, document)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/__init__.py", line 184, in parse
    self.statemachine.run(inputlines, document, inliner=self.inliner)
    ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/states.py", line 169, in run
    results = StateMachineWS.run(self, input_lines, input_offset,
              ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                 input_source=document['source'])
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/statemachine.py", line 233, in run
    context, next_state, result = self.check_line(
                                  ~~~~~~~~~~~~~~~^
        context, state, transitions)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/statemachine.py", line 445, in check_line
    return method(match, context, next_state)
           ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/states.py", line 2357, in explicit_markup
    self.explicit_list(blank_finish)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/states.py", line 2382, in explicit_list
    newline_offset, blank_finish = self.nested_list_parse(
                                   ~~~~~~~~~~~~~~~~~~~~~~^
          self.state_machine.input_lines[offset:],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
          blank_finish=blank_finish,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
          match_titles=self.state_machine.match_titles)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/states.py", line 316, in nested_list_parse
    state_machine.run(block, input_offset, memo=self.memo,
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                      node=node, match_titles=match_titles)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/states.py", line 195, in run
    results = StateMachineWS.run(self, input_lines, input_offset)
              ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/statemachine.py", line 233, in run
    context, next_state, result = self.check_line(
                                  ~~~~~~~~~~~~~~~^
        context, state, transitions)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/statemachine.py", line 445, in check_line
    return method(match, context, next_state)
           ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/states.py", line 2660, in explicit_markup
    nodelist, blank_finish = self.explicit_construct(match)
                             ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/states.py", line 2367, in explicit_construct
    return method(self, expmatch)
           ~~~~~~^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/states.py", line 2104, in directive
    return self.run_directive(
           ~~~~~~~~~~~~~~~~~~^
        directive_class, match, type_name, option_presets)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/states.py", line 2154, in run_directive
    result = directive_instance.run()
             ~~~~~~~~~~~~~~~~~~~~~~^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/directives/tables.py", line 295, in run
    csv_data, self.DocutilsDialect(self.options), source)
              ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/docutils-0.20.1/docutils/parsers/rst/directives/tables.py", line 237, in __init__
    super().__init__()
    ~~~~~~~~~~~~~~~~^^
  File "/usr/lib64/python3.13/csv.py", line 110, in __init__
    self._validate()
    ~~~~~~~~~~~~~~^^
  File "/usr/lib64/python3.13/csv.py", line 114, in _validate
    _Dialect(self)
    ~~~~~~~~^^^^^^
ValueError: bad delimiter value

@serhiy-storchaka
Copy link
Member Author

Thank you for your report. It fails if the delimiter is '\r', '\n' or a space when skipinitialspace is true. The failing test case is:

.. csv-table:: good delimiter
   :delim: /

   some/csv/data

.. csv-table:: good delimiter
   :delim: \

   some\csv\data

.. csv-table:: good delimiter
   :delim: 0x5c

   some\csv\data

.. csv-table:: good delimiter
   :delim: space

   some csv data

I guess that the problem is caused by :delim: space. skipinitialspace is set to True in DocutilsDialect.

It is perhaps a Python failure. I see now a use case for such combination. Opened #115712.

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Feb 20, 2024
…ialspace=True

Restore support of such combination, disabled in pythongh-113796.

csv.writer() now quotes empty fields if delimiter is a space and
skipinitialspace is true and raises exception if quoting is not possible.
serhiy-storchaka added a commit that referenced this issue Feb 20, 2024
…ce=True (GH-115721)

Restore support of such combination, disabled in gh-113796.

csv.writer() now quotes empty fields if delimiter is a space and
skipinitialspace is true and raises exception if quoting is not possible.
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Feb 20, 2024
…kipinitialspace=True (pythonGH-115721)

Restore support of such combination, disabled in pythongh-113796.

csv.writer() now quotes empty fields if delimiter is a space and
skipinitialspace is true and raises exception if quoting is not possible.
(cherry picked from commit 937d282)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
woodruffw pushed a commit to woodruffw-forks/cpython that referenced this issue Mar 4, 2024
…ialspace=True (pythonGH-115721)

Restore support of such combination, disabled in pythongh-113796.

csv.writer() now quotes empty fields if delimiter is a space and
skipinitialspace is true and raises exception if quoting is not possible.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…ialspace=True (pythonGH-115721)

Restore support of such combination, disabled in pythongh-113796.

csv.writer() now quotes empty fields if delimiter is a space and
skipinitialspace is true and raises exception if quoting is not possible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature A feature request or enhancement
Projects
Status: Done
Development

No branches or pull requests

2 participants