docs: what is Ibis? #8251

Closed
1 task done
lostmygithubaccount opened this issue Feb 6, 2024 · 0 comments · Fixed by #8490
Labels: docs (Documentation related issues or PRs)

Please describe the issue

We should have a conceptual document that clearly explains what Ibis is.


from Will on Zulip:

Where can you find a clear guide that states what Ibis is? It is very unclear to me what it can and can't do. The documentation tells you how to install it and run a few examples, but I see something like .filter(_.columnname == 3) and I have no idea what is supposed to be going on.

In particular:

  1. Can you use Ibis to create tables in e.g. a Snowflake warehouse?
  2. Does Ibis work for data workloads that are larger than memory?
  3. How does it actually work? I picture it converting things into SQL, but I'm not clear at what point it evaluates things.
  4. E.g., if I've built a table and I'm looking at the first 10 values, and then I add a new column, will it recalculate everything, or only the new column?

I answered:

hi! we have a "Why Ibis" article here: https://ibis-project.org/why

I had drafted a "What is Ibis" article at one point. Ibis is a Python dataframe library that decouples the API from the execution engine. Most Python dataframes (pandas, Polars, PySpark, Snowpark, etc.) tightly couple these -- resulting in slight differences in API and a lot of overhead in converting between them. Ibis instead uses an intermediate representation of its API to convert to backend-native code. For most backends, this ends up being SQL.

Can you use Ibis to create tables in e.g. a Snowflake warehouse?
Yes -- Ibis has a create_table method that, when used with the Snowflake backend, can be used to create tables in Snowflake (or other backends).

Does Ibis work for data workloads that are larger than memory?
Yes -- Ibis will perform as well as the backend allows, because the backend does the actual computation. DuckDB and Polars, for example, generally support larger-than-memory operation.

How does it actually work? I picture it converting things into SQL, but I'm not clear at what point it evaluates things.
Ibis expressions are compiled into an intermediate representation (IR) and then converted into backend-native code, usually SQL. Nothing runs until you ask for a result. There's some explanation here but it's a bit out of date: https://ibis-project.org/concepts/internals

E.g., if I've built a table and I'm looking at the first 10 values, and then I add a new column, will it recalculate everything, or only the new column?
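Generally only the new column, though I guess it could depend on whether you're using a view instead of a table. Ibis is lazily evaluated. You can call .cache() on a table expression to persist an intermediate result, or turn on interactive mode so expressions execute when they are displayed.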

Code of Conduct

  • I agree to follow this project's Code of Conduct
@lostmygithubaccount lostmygithubaccount added the docs Documentation related issues or PRs label Feb 6, 2024
lostmygithubaccount added a commit that referenced this issue Mar 1, 2024
…es (#8490)

## Description of changes

- add link to the Why VoDa blog post from who concept article
- makes some formatting edits in the who concept article per guidelines
- major rework of Why Ibis concept article, including what Ibis is
- add RisingWave to install tabset


## Issues closed

closes #8251
closes #8488