bug: ibis.read_parquet fails AttributeError: 'NoneType' object has no attribute 'read_parquet' if the DuckDB backend is not installed #5420

Closed
10 tasks done
ogrisel opened this issue Feb 2, 2023 · 3 comments · Fixed by #5466
Labels
bug Incorrect behavior inside of ibis
Milestone

Comments

@ogrisel
Contributor

ogrisel commented Feb 2, 2023

What happened?

This is mostly a UX problem, I am not sure what fixes are best. Let's start with the reproducer to highlight how a naive user (e.g. myself :) could get confused by the current state of the online doc and public API:

pip install ibis-framework

or alternatively, install ibis with a backend that would be expected to naturally support parquet file reading (but not duckdb):

pip install ibis-framework[dask]
Then naively call ibis.read_parquet:
>>> import ibis
>>> data_url = "https://storage.googleapis.com/ibis-tutorial-data/wowah_data/wowah_data_raw.parquet"
>>> ibis.read_parquet(data_url)
Traceback (most recent call last):
  Cell In[7], line 2
    ibis.read_parquet(data_url)
  File ~/mambaforge/envs/tmp/lib/python3.11/site-packages/ibis/expr/api.py:902 in read_parquet
    return con.read_parquet(sources, **kwargs)
AttributeError: 'NoneType' object has no attribute 'read_parquet'

At least the error message should be improved to state that no default backend has been configured and point to the documentation on how to do so.
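One possible shape for such a guard, as a minimal sketch: the function below is a hypothetical stand-in for the top-level `ibis.read_parquet`, and `get_backend` is passed in as a parameter so that the example runs without Ibis installed. The error wording and the `_FakeBackend` helper are assumptions made for illustration, not the actual Ibis implementation.

```python
def read_parquet(sources, get_backend, **kwargs):
    """Hypothetical wrapper: delegate to the default backend, failing loudly if none is set."""
    con = get_backend()
    if con is None:
        # Explicit, actionable message instead of an AttributeError on None.
        raise RuntimeError(
            "No default backend is configured. Install one, for example with "
            "`pip install 'ibis-framework[duckdb]'`, or set "
            "`ibis.options.default_backend` to a backend that supports read_parquet."
        )
    return con.read_parquet(sources, **kwargs)


class _FakeBackend:
    """Illustrative stand-in for a backend that supports read_parquet."""

    def read_parquet(self, sources, **kwargs):
        return f"table from {sources}"


# With a backend available, the call is delegated as before.
result = read_parquet("data.parquet", get_backend=_FakeBackend)

# Without one, the user gets an explanatory message.
try:
    read_parquet("data.parquet", get_backend=lambda: None)
except RuntimeError as exc:
    message = str(exc)
```

The point of the sketch is only that the `None` case is checked before the attribute lookup, so the failure names the missing configuration rather than an internal object.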

I googled and found:

But then:

>>> ibis.options.default_backend = 'dask'
>>> ibis.read_parquet(data_url)
Traceback (most recent call last):
  Cell In[10], line 1
    ibis.read_parquet(data_url)
  File ~/mambaforge/envs/tmp/lib/python3.11/site-packages/ibis/expr/api.py:902 in read_parquet
    return con.read_parquet(sources, **kwargs)
AttributeError: 'str' object has no attribute 'read_parquet'

But even configuring it as follows does not work:

>>> ibis.options.default_backend = ibis.dask
>>> ibis.read_parquet(data_url)
Traceback (most recent call last):
  Cell In[16], line 1
    ibis.read_parquet(data_url)
  File ~/mambaforge/envs/tmp/lib/python3.11/site-packages/ibis/expr/api.py:902 in read_parquet
    return con.read_parquet(sources, **kwargs)
AttributeError: 'Backend' object has no attribute 'read_parquet'

Let's try again with a backend that exposes the read_parquet method explicitly:

>>> %pip install "ibis-framework[datafusion]"
>>> ibis.options.default_backend = ibis.datafusion
>>> ibis.read_parquet(data_url)
Traceback (most recent call last):
  Cell In[11], line 1
    ibis.read_parquet(data_url)
  File ~/mambaforge/envs/tmp/lib/python3.11/site-packages/ibis/expr/api.py:902 in read_parquet
    return con.read_parquet(sources, **kwargs)
  File ~/mambaforge/envs/tmp/lib/python3.11/site-packages/ibis/backends/datafusion/__init__.py:201 in read_parquet
    self._context.deregister_table(table_name)
AttributeError: 'Backend' object has no attribute '_context'

Is this expected to work? If so I can open a dedicated issue.

EDIT: done at #5436.

Then I tried with polars, and it works:

>>> %pip install "ibis-framework[polars]"
>>> ibis.options.default_backend = ibis.polars
>>> ibis.read_parquet(data_url)
DatabaseTable: ibis_read_parquet_1
  char      int32
  level     int32
  race      string
  charclass string
  zone      string
  guild     int32
  timestamp timestamp

So from that experiment here is a summary of the UX / doc fixes that could improve the Ibis onboarding experience:

  • improve the docstrings of top level functions such as ibis.read_parquet to mention that they delegate to methods with the same name on the backend configured as default in ibis.config.default_backend;
  • improve the install doc to encourage the user to install at least one backend and then link to the doc on how to configure ibis.config.default_backend accordingly. If they are unsure, recommend installing the duckdb backend and explain why (no server install required, efficient query processing and extensive coverage for all Ibis operations);
  • document the default backend configuration and what it impacts in https://ibis-project.org/docs/dev/user_guide/configuration/
  • link to that doc from https://ibis-project.org/docs/dev/backends/
  • fix the API doc of ibis.config.default_backend so that it no longer advertises the str type and instead states that a backend instance must be passed;
  • or alternatively make it possible to pass a backend name instead of an instance;
  • ibis.read_parquet should probably fail with an explicit error message when ibis.get_backend() is None and suggest installing the default duckdb backend or configuring an alternative backend that has direct support for read_parquet;
  • or maybe ibis.get_backend() / ibis.config._default_backend() should directly raise a warning or an exception when duckdb is not installed and no valid ibis.config.default_backend has been configured?
  • add the read_parquet method on all backend classes, raising NotImplementedError when not easy to support natively and potentially pointing to the contributors guide :)
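To make the "backend name instead of an instance" option in the list above concrete, here is a minimal sketch. The `Backend` class, the `_REGISTRY` mapping, and `resolve_backend` are all hypothetical names introduced for illustration; they do not describe how Ibis actually stores or looks up its backends.

```python
class Backend:
    """Hypothetical minimal backend; real Ibis backends are far richer."""

    def __init__(self, name):
        self.name = name


# Hypothetical registry mapping backend names to backend instances.
_REGISTRY = {"dask": Backend("dask"), "polars": Backend("polars")}


def resolve_backend(value):
    """Accept either a backend name (str) or a backend instance."""
    if isinstance(value, Backend):
        return value
    if isinstance(value, str):
        try:
            return _REGISTRY[value]
        except KeyError:
            raise ValueError(
                f"Unknown backend {value!r}; known backends: {sorted(_REGISTRY)}"
            ) from None
    raise TypeError(
        f"default_backend must be a backend name or instance, got {type(value).__name__}"
    )


by_name = resolve_backend("dask")
by_instance = resolve_backend(by_name)
```

Resolving the value once, at assignment time, would mean that a setting such as `'dask'` either yields a usable backend object or fails immediately with a message listing the valid names.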

What version of ibis are you using?

4.1.0

What backend(s) are you using, if any?

dask

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@ogrisel ogrisel added the bug Incorrect behavior inside of ibis label Feb 2, 2023
@ogrisel
Contributor Author

ogrisel commented Feb 2, 2023

Maybe the install guide should not even display the naked install command:

pip install ibis-framework

and instead replace it with:

pip install ibis-framework[duckdb]

(same for conda/mamba).

It would be possible to add a paragraph at the end explaining how to do a naked install for development purposes, but this should not be the first command shown in the doc.

@cpcloud
Member

cpcloud commented Feb 2, 2023

@ogrisel Wonderful feedback ❤️ as usual.

I think most or all of these are well within scope for 5.0.

@cpcloud cpcloud added this to the 5.0 milestone Feb 2, 2023
gforsyth added a commit to gforsyth/ibis that referenced this issue Feb 2, 2023
xref ibis-project#5420

Default to `duckdb` in the install doc, fix the API doc of
`default_backend`, and note that the `read_*` functions passthrough to
the default backend.
gforsyth added a commit to gforsyth/ibis that referenced this issue Feb 2, 2023
xref ibis-project#5420

Default to `duckdb` in the install doc, fix the API doc of
`default_backend`, and note that the `read_*` functions passthrough to
the default backend.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
gforsyth added a commit that referenced this issue Feb 2, 2023
xref #5420

Default to `duckdb` in the install doc, fix the API doc of
`default_backend`, and note that the `read_*` functions passthrough to
the default backend.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
gforsyth added a commit to gforsyth/ibis that referenced this issue Feb 2, 2023
xref ibis-project#5420

This adds `read_parquet` and `read_csv` as methods to the
`FileIOHandler` mixin (renamed from `ResultHandler`).

Now every backend will have these two `read_` methods and will raise a
`NotImplementedError`.
ogrisel added a commit to ogrisel/ibis that referenced this issue Feb 3, 2023
…backend

Follow up on ibis-project#5423 
xref ibis-project#5420

This also adds single quotes around bracketed pip install commands to make sure they work with zsh which is the default shell on macOS.
cpcloud pushed a commit to ogrisel/ibis that referenced this issue Feb 3, 2023
…backend

Follow up on ibis-project#5423
xref ibis-project#5420

This also adds single quotes around bracketed pip install commands to make sure they work with zsh which is the default shell on macOS.
cpcloud pushed a commit that referenced this issue Feb 3, 2023
…backend

Follow up on #5423
xref #5420

This also adds single quotes around bracketed pip install commands to make sure they work with zsh which is the default shell on macOS.
gforsyth added a commit to gforsyth/ibis that referenced this issue Feb 3, 2023
xref ibis-project#5420

This adds `read_parquet` and `read_csv` as methods to the
`FileIOHandler` mixin (renamed from `ResultHandler`).

Now every backend will have these two `read_` methods and will raise a
`NotImplementedError`.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
cpcloud pushed a commit that referenced this issue Feb 3, 2023
xref #5420

This adds `read_parquet` and `read_csv` as methods to the
`FileIOHandler` mixin (renamed from `ResultHandler`).

Now every backend will have these two `read_` methods and will raise a
`NotImplementedError`.

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
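
The mixin approach described in the commit message above can be illustrated roughly as follows. This is a sketch under assumptions: only the names `FileIOHandler`, `read_parquet`, and `read_csv` come from the commit message; the method signatures, the error text, and the two example backend classes are hypothetical.

```python
class FileIOHandler:
    """Sketch of a mixin that gives every backend read_* methods by default."""

    name = "unknown"  # hypothetical attribute used only for the error message

    def read_parquet(self, path, **kwargs):
        raise NotImplementedError(
            f"The {self.name} backend does not support reading parquet files directly."
        )

    def read_csv(self, path, **kwargs):
        raise NotImplementedError(
            f"The {self.name} backend does not support reading CSV files directly."
        )


class DaskLikeBackend(FileIOHandler):
    """Hypothetical backend that inherits the default, unimplemented methods."""

    name = "dask"


class PolarsLikeBackend(FileIOHandler):
    """Hypothetical backend that overrides read_parquet with a real implementation."""

    name = "polars"

    def read_parquet(self, path, **kwargs):
        return f"table registered from {path}"


supported = PolarsLikeBackend().read_parquet("data.parquet")

try:
    DaskLikeBackend().read_parquet("data.parquet")
except NotImplementedError as exc:
    unsupported_message = str(exc)
```

With defaults like these, a backend without parquet support fails with a `NotImplementedError` that names the backend, rather than an `AttributeError` about a missing method.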
@ogrisel
Contributor Author

ogrisel commented Feb 3, 2023

I removed:

- [ ] fix `ibis.datafusion.read_parquet`.

from the todo list and instead opened a dedicated issue #5436 as it's quite unrelated to the DOC/UX focus of this report.
