raise_if_empty=False does not suppress NoDataError if skip_rows_after_header is passed and no data rows remain #18362

Open
ts826848 opened this issue Aug 25, 2024 · 0 comments
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

polars-test % cat test.csv
col1,col2,col3
u1,u2,u3
[ins] In [1]: import polars as pl

[ins] In [2]: pl.read_csv('test.csv')
Out[2]:
shape: (1, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╡
│ u1   ┆ u2   ┆ u3   │
└──────┴──────┴──────┘

[ins] In [3]: pl.read_csv('test.csv', skip_rows_after_header=1, raise_if_empty=False)
---------------------------------------------------------------------------
NoDataError                               Traceback (most recent call last)
Cell In[3], line 1
----> 1 pl.read_csv('test.csv', skip_rows_after_header=1, raise_if_empty=False)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/_utils/deprecation.py:91, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     86 @wraps(function)
     87 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     88     _rename_keyword_argument(
     89         old_name, new_name, kwargs, function.__qualname__, version
     90     )
---> 91     return function(*args, **kwargs)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/_utils/deprecation.py:91, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     86 @wraps(function)
     87 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     88     _rename_keyword_argument(
     89         old_name, new_name, kwargs, function.__qualname__, version
     90     )
---> 91     return function(*args, **kwargs)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/_utils/deprecation.py:91, in deprecate_renamed_parameter.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
     86 @wraps(function)
     87 def wrapper(*args: P.args, **kwargs: P.kwargs) -> T:
     88     _rename_keyword_argument(
     89         old_name, new_name, kwargs, function.__qualname__, version
     90     )
---> 91     return function(*args, **kwargs)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/io/csv/functions.py:496, in read_csv(source, has_header, columns, new_columns, separator, comment_prefix, quote_char, skip_rows, schema, schema_overrides, null_values, missing_utf8_is_empty_string, ignore_errors, try_parse_dates, n_threads, infer_schema, infer_schema_length, batch_size, n_rows, encoding, low_memory, rechunk, use_pyarrow, storage_options, skip_rows_after_header, row_index_name, row_index_offset, sample_size, eol_char, raise_if_empty, truncate_ragged_lines, decimal_comma, glob)
    488 else:
    489     with prepare_file_arg(
    490         source,
    491         encoding=encoding,
   (...)
    494         storage_options=storage_options,
    495     ) as data:
--> 496         df = _read_csv_impl(
    497             data,
    498             has_header=has_header,
    499             columns=columns if columns else projection,
    500             separator=separator,
    501             comment_prefix=comment_prefix,
    502             quote_char=quote_char,
    503             skip_rows=skip_rows,
    504             schema_overrides=schema_overrides,
    505             schema=schema,
    506             null_values=null_values,
    507             missing_utf8_is_empty_string=missing_utf8_is_empty_string,
    508             ignore_errors=ignore_errors,
    509             try_parse_dates=try_parse_dates,
    510             n_threads=n_threads,
    511             infer_schema_length=infer_schema_length,
    512             batch_size=batch_size,
    513             n_rows=n_rows,
    514             encoding=encoding if encoding == "utf8-lossy" else "utf8",
    515             low_memory=low_memory,
    516             rechunk=rechunk,
    517             skip_rows_after_header=skip_rows_after_header,
    518             row_index_name=row_index_name,
    519             row_index_offset=row_index_offset,
    520             sample_size=sample_size,
    521             eol_char=eol_char,
    522             raise_if_empty=raise_if_empty,
    523             truncate_ragged_lines=truncate_ragged_lines,
    524             decimal_comma=decimal_comma,
    525             glob=glob,
    526         )
    528 if new_columns:
    529     return _update_columns(df, new_columns)

File /private/tmp/polars-test/.venv/lib/python3.12/site-packages/polars/io/csv/functions.py:642, in _read_csv_impl(source, has_header, columns, separator, comment_prefix, quote_char, skip_rows, schema, schema_overrides, null_values, missing_utf8_is_empty_string, ignore_errors, try_parse_dates, n_threads, infer_schema_length, batch_size, n_rows, encoding, low_memory, rechunk, skip_rows_after_header, row_index_name, row_index_offset, sample_size, eol_char, raise_if_empty, truncate_ragged_lines, decimal_comma, glob)
    638         raise ValueError(msg)
    640 projection, columns = parse_columns_arg(columns)
--> 642 pydf = PyDataFrame.read_csv(
    643     source,
    644     infer_schema_length,
    645     batch_size,
    646     has_header,
    647     ignore_errors,
    648     n_rows,
    649     skip_rows,
    650     projection,
    651     separator,
    652     rechunk,
    653     columns,
    654     encoding,
    655     n_threads,
    656     path,
    657     dtype_list,
    658     dtype_slice,
    659     low_memory,
    660     comment_prefix,
    661     quote_char,
    662     processed_null_values,
    663     missing_utf8_is_empty_string,
    664     try_parse_dates,
    665     skip_rows_after_header,
    666     parse_row_index_args(row_index_name, row_index_offset),
    667     sample_size=sample_size,
    668     eol_char=eol_char,
    669     raise_if_empty=raise_if_empty,
    670     truncate_ragged_lines=truncate_ragged_lines,
    671     decimal_comma=decimal_comma,
    672     schema=schema,
    673 )
    674 return wrap_df(pydf)

NoDataError: not enough lines to skip

Log output

No response

Issue description

I was trying to read just the column names from the first row of a CSV file that has two header rows and no data rows. My first attempt used skip_rows_after_header=1, which raised a NoDataError. That error on its own was expected, but passing raise_if_empty=False did not suppress it.

After taking another look at the docs, I found that passing n_rows=0 works, so this is not a blocker, but I found the behavior of my first attempt surprising.
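If all that is needed is the column names from the first header row, one alternative that avoids the CSV reader's skip logic entirely is Python's standard-library `csv` module. This is a swapped-in technique, not a Polars API, and the helper name `read_header` is hypothetical. It is a minimal sketch that parses only the first row, so a second header row (like `u1,u2,u3` here) is never touched.

```python
import csv
import io
from typing import TextIO


def read_header(stream: TextIO) -> list[str]:
    """Return the fields of the first CSV row, or [] for empty input.

    Hypothetical helper (not part of Polars): only the first row is
    parsed, so a second header row or any data rows are ignored.
    """
    return next(csv.reader(stream), [])


# In-memory copy of test.csv from the reproducible example. With a real file:
#     with open("test.csv", newline="") as f:
#         read_header(f)
sample = io.StringIO("col1,col2,col3\nu1,u2,u3\n")
print(read_header(sample))  # ['col1', 'col2', 'col3']
```

Unlike `n_rows=0`, this yields plain strings rather than a DataFrame with a schema, so it only fits when the names alone are enough.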

Expected behavior

I would expect output similar to what n_rows=0 produces:

[ins] In [4]: pl.read_csv('test.csv', n_rows=0)
Out[4]:
shape: (0, 3)
┌──────┬──────┬──────┐
│ col1 ┆ col2 ┆ col3 │
│ ---  ┆ ---  ┆ ---  │
│ str  ┆ str  ┆ str  │
╞══════╪══════╪══════╡
└──────┴──────┴──────┘
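The behavior requested above can be modeled in a few lines of pure Python. This is a hypothetical sketch of the expected semantics, not how Polars implements `read_csv`: `read_rows`, its `(column_names, data_rows)` return shape, and `NoDataSketchError` (a stand-in for Polars' `NoDataError`) are all invented for illustration. The point at which an error is raised (skipping as many rows as exist, or more) is an assumption based only on the single case in this report.

```python
import csv
import io
from typing import TextIO


class NoDataSketchError(ValueError):
    """Stand-in for polars.exceptions.NoDataError in this sketch."""


def read_rows(
    stream: TextIO,
    skip_rows_after_header: int = 0,
    raise_if_empty: bool = True,
) -> tuple[list[str], list[list[str]]]:
    """Hypothetical model of the behavior requested in this issue.

    Returns (column_names, data_rows). When skip_rows_after_header uses up
    every data row, raise only if raise_if_empty is True; otherwise return
    the header with zero rows, mirroring read_csv(..., n_rows=0).
    """
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:
        # Completely empty input: there is no header to fall back on.
        if raise_if_empty:
            raise NoDataSketchError("empty input")
        return [], []
    rows = list(reader)
    if skip_rows_after_header and skip_rows_after_header >= len(rows):
        # Assumed threshold: in this report, skipping 1 of 1 data rows
        # raised "not enough lines to skip".
        if raise_if_empty:
            raise NoDataSketchError("not enough lines to skip")
        return header, []
    return header, rows[skip_rows_after_header:]


text = "col1,col2,col3\nu1,u2,u3\n"
print(read_rows(io.StringIO(text), skip_rows_after_header=1, raise_if_empty=False))
# (['col1', 'col2', 'col3'], [])
```

With the default `raise_if_empty=True`, the same call raises, matching the error the report describes as expected; only the `False` case changes.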

Installed versions

[ins] In [5]: pl.show_versions()
--------Version info---------
Polars:               1.5.0
Index type:           UInt32
Platform:             macOS-12.7.6-x86_64-i386-64bit
Python:               3.12.5 (main, Aug  9 2024, 15:04:47) [Clang 14.0.0 (clang-1400.0.29.202)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                <not installed>
openpyxl:             <not installed>
pandas:               <not installed>
pyarrow:              <not installed>
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
ts826848 added the bug, needs triage, and python labels Aug 25, 2024