raise_if_empty=False does not suppress NoDataError if skip_rows_after_header is passed and no data rows remain #18362

ts826848 · 2024-08-25T19:57:09Z

Checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Reproducible example

polars-test % cat test.csv
col1,col2,col3
u1,u2,u3

[ins] In [1]: import polars as pl

[ins] In [2]: pl.read_csv('test.csv')
Out[2]:
shape: (1, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╡
│ u1   ┆ u2   ┆ u3   │
└──────┴──────┴──────┘

[ins] In [3]: pl.read_csv('test.csv', skip_rows_after_header=1, raise_if_empty=False)
---------------------------------------------------------------------------
NoDataError                               Traceback (most recent call last)
Cell In[3], line 1
----> 1 pl.read_csv('test.csv', skip_rows_after_header=1, raise_if_empty=False)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/_utils/deprecation.py:91, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     86 @wraps(function)
     87 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     88     _rename_keyword_argument(
     89         old_name, new_name, kwargs, function.__qualname__, version
     90     )
---> 91     return function(*args, **kwargs)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/_utils/deprecation.py:91, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     86 @wraps(function)
     87 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     88     _rename_keyword_argument(
     89         old_name, new_name, kwargs, function.__qualname__, version
     90     )
---> 91     return function(*args, **kwargs)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/_utils/deprecation.py:91, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     86 @wraps(function)
     87 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     88     _rename_keyword_argument(
     89         old_name, new_name, kwargs, function.__qualname__, version
     90     )
---> 91     return function(*args, **kwargs)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/io/csv/functions.py:496, in read_csv(source, has_header, columns, new_columns, separator, comment_prefix, quote_char, skip_rows, schema, schema_overrides, null_values, missing_utf8_is_empty_string, ignore_errors, try_parse_dates, n_threads, infer_schema, infer_schema_length, batch_size, n_rows, encoding, low_memory, rechunk, use_pyarrow, storage_options, skip_rows_after_header, row_index_name, row_index_offset, sample_size, eol_char, raise_if_empty, truncate_ragged_lines, decimal_comma, glob)
    488 else:
    489     with prepare_file_arg(
    490         source,
    491         encoding=encoding,
   (...)
    494         storage_options=storage_options,
    495     ) as data:
--> 496         df = _read_csv_impl(
    497             data,
    498             has_header=has_header,
    499             columns=columns if columns else projection,
    500             separator=separator,
    501             comment_prefix=comment_prefix,
    502             quote_char=quote_char,
    503             skip_rows=skip_rows,
    504             schema_overrides=schema_overrides,
    505             schema=schema,
    506             null_values=null_values,
    507             missing_utf8_is_empty_string=missing_utf8_is_empty_string,
    508             ignore_errors=ignore_errors,
    509             try_parse_dates=try_parse_dates,
    510             n_threads=n_threads,
    511             infer_schema_length=infer_schema_length,
    512             batch_size=batch_size,
    513             n_rows=n_rows,
    514             encoding=encoding if encoding == "utf8-lossy" else "utf8",
    515             low_memory=low_memory,
    516             rechunk=rechunk,
    517             skip_rows_after_header=skip_rows_after_header,
    518             row_index_name=row_index_name,
    519             row_index_offset=row_index_offset,
    520             sample_size=sample_size,
    521             eol_char=eol_char,
    522             raise_if_empty=raise_if_empty,
    523             truncate_ragged_lines=truncate_ragged_lines,
    524             decimal_comma=decimal_comma,
    525             glob=glob,
    526         )
    528 if new_columns:
    529     return _update_columns(df, new_columns)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/io/csv/functions.py:642, in _read_csv_impl(source, has_header, columns, separator, comment_prefix, quote_char, skip_rows, schema, schema_overrides, null_values, missing_utf8_is_empty_string, ignore_errors, try_parse_dates, n_threads, infer_schema_length, batch_size, n_rows, encoding, low_memory, rechunk, skip_rows_after_header, row_index_name, row_index_offset, sample_size, eol_char, raise_if_empty, truncate_ragged_lines, decimal_comma, glob)
    638         raise ValueError(msg)
    640 projection, columns = parse_columns_arg(columns)
--> 642 pydf = PyDataFrame.read_csv(
    643     source,
    644     infer_schema_length,
    645     batch_size,
    646     has_header,
    647     ignore_errors,
    648     n_rows,
    649     skip_rows,
    650     projection,
    651     separator,
    652     rechunk,
    653     columns,
    654     encoding,
    655     n_threads,
    656     path,
    657     dtype_list,
    658     dtype_slice,
    659     low_memory,
    660     comment_prefix,
    661     quote_char,
    662     processed_null_values,
    663     missing_utf8_is_empty_string,
    664     try_parse_dates,
    665     skip_rows_after_header,
    666     parse_row_index_args(row_index_name, row_index_offset),
    667     sample_size=sample_size,
    668     eol_char=eol_char,
    669     raise_if_empty=raise_if_empty,
    670     truncate_ragged_lines=truncate_ragged_lines,
    671     decimal_comma=decimal_comma,
    672     schema=schema,
    673 )
    674 return wrap_df(pydf)

NoDataError: not enough lines to skip

Log output

No response

Issue description

I was trying to read only the first header row of a CSV that has two header rows and no data rows. My first attempt used skip_rows_after_header=1, which raised a NoDataError as expected, but passing raise_if_empty=False did not suppress that error.

After taking another look at the docs, I found that passing n_rows=0 works, so this is not a blocker, but the behavior in my first attempt was a bit surprising.

Expected behavior

I would expect an output similar to using n_rows=0:

[ins] In [4]: pl.read_csv('test.csv', n_rows=0)
Out[4]:
shape: (0, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╡
└──────┴──────┴──────┘

Installed versions

[ins] In [5]: pl.show_versions()
--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             macOS-12.7.6-x86_64-i386-64bit
Python:               3.12.5 (main, Aug  9 2024, 15:04:47) [Clang 14.0.0 (clang-1400.0.29.202)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                <not installed>
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>


ts826848 added the bug, needs triage, and python labels on Aug 25, 2024
