| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,16 @@ | ||
| # Configuration Options | ||
|
|
||
| <!-- prettier-ignore-start --> | ||
| ::: ibis.config.Options | ||
| options: | ||
| show_bases: false | ||
| ::: ibis.config.Repr | ||
| options: | ||
| show_bases: false | ||
| ::: ibis.config.SQL | ||
| options: | ||
| show_bases: false | ||
| ::: ibis.config.ContextAdjustment | ||
| options: | ||
| show_bases: false | ||
| <!-- prettier-ignore-end --> |
| @@ -0,0 +1,9 @@ | ||
| --- | ||
| backend_name: MS SQL Server | ||
| backend_url: https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2022 | ||
| backend_module: mssql | ||
| backend_param_style: connection parameters | ||
| version_added: "4.0" | ||
| --- | ||
|
|
||
| {% include 'backends/template.md' %} |
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| backend_name: Polars | ||
| backend_url: https://pola-rs.github.io/polars-book/user-guide/index.html | ||
| backend_module: polars | ||
| backend_param_style: connection parameters | ||
| is_experimental: true | ||
| version_added: "4.0" | ||
| --- | ||
|
|
||
| {% include 'backends/template.md' %} |
| @@ -0,0 +1,11 @@ | ||
| --- | ||
| backend_name: Snowflake | ||
| backend_url: https://snowflake.com/ | ||
| backend_module: snowflake | ||
| backend_param_style: a SQLAlchemy connection string | ||
| backend_connection_example: ibis.connect("snowflake://user:pass@locator/database/schema") | ||
| is_experimental: true | ||
| version_added: "4.0" | ||
| --- | ||
|
|
||
| {% include 'backends/template.md' %} |
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| backend_name: Trino | ||
| backend_url: https://trino.io | ||
| backend_module: trino | ||
| backend_param_style: a SQLAlchemy connection string | ||
| is_experimental: true | ||
| version_added: "4.0" | ||
| --- | ||
|
|
||
| {% include 'backends/template.md' %} |
| @@ -0,0 +1,41 @@ | ||
| # Ibis v4.0.0 | ||
|
|
||
| **by Patrick Clarke** | ||
|
|
||
| 09 January 2023 | ||
|
|
||
| ## Introduction | ||
|
|
||
| Ibis 4.0 has officially been released as the latest version of the package. | ||
| This release includes several new backends, improved functionality, and some major internal refactors. | ||
| A full list of the changes can be found in the [project release notes](../release_notes.md). | ||
| Let’s talk about some of the new changes 4.0 brings for Ibis users. | ||
|
|
||
| ## Backends | ||
|
|
||
| Ibis 4.0 brings [Polars](https://ibis-project.org/docs/latest/backends/Polars/), [Snowflake](https://ibis-project.org/docs/dev/backends/Snowflake/), and [Trino](https://ibis-project.org/docs/dev/backends/Trino/) into an already-impressive stock of supported backends. | ||
| The [Polars](https://www.pola.rs/) backend adds another way for users to work locally with DataFrames. | ||
| The [Snowflake](https://www.snowflake.com/en/) and [Trino](https://trino.io/) backends add a free and familiar Python API to popular data warehouses. | ||
|
|
||
| Alongside these new backends, the Google BigQuery and Microsoft SQL Server backends have been moved into the main repository and updated. | ||
|
|
||
| ## Functionality | ||
|
|
||
| This release includes many improvements; some notable changes are: | ||
|
|
||
| - [read API](https://github.com/ibis-project/ibis/pull/5005): allows users to read various file formats directly into their [configured `default_backend`](https://ibis-project.org/docs/dev/api/config/?h=default#ibis.config.Options) (default DuckDB) through `read_*` functions, which makes working with local files easier than ever. | ||
| - [to_pyarrow and to_pyarrow_batches](https://github.com/ibis-project/ibis/pull/4454#issuecomment-1262640204): users can now return PyArrow objects (Tables, Arrays, Scalars, RecordBatchReaders), granting access to all of the functionality that PyArrow provides | ||
| - [JSON getitem](https://github.com/ibis-project/ibis/pull/4525): users can now index into JSON values with `[]` in Ibis expressions on supported backends | ||
| - [Plotting support through `__array__`](https://github.com/ibis-project/ibis/pull/4547): allows users to plot Ibis expressions out of the box | ||
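The `__array__` protocol that powers the plotting support can be sketched in isolation. The class below is a toy stand-in for an Ibis expression, not Ibis's actual implementation: any object that defines `__array__` is accepted by NumPy functions (and therefore by plotting libraries, which call `np.asarray` on their inputs).

```python
import numpy as np


class ToyExpr:
    """Toy stand-in for an expression that materializes on demand."""

    def __init__(self, data):
        self._data = data

    def __array__(self, dtype=None):
        # NumPy (and matplotlib, via np.asarray) calls this to get concrete values
        return np.asarray(self._data, dtype=dtype)


e = ToyExpr([1.0, 2.0, 3.0])
print(np.mean(e))  # NumPy treats the expression like an array: 2.0
```

Because matplotlib coerces its inputs through `np.asarray`, an object like this can be passed straight to `plt.plot`.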
|
|
||
| ## Refactors | ||
|
|
||
| This won't be visible to most users, but the project underwent a series of refactors that spans [multiple PRs](https://github.com/ibis-project/ibis/pulls?q=is%3Apr+is%3Amerged+%22refactor%3A%22+milestone%3A4.0.0). | ||
| Notable changes include removing intermediate expressions, improving the testing framework, and UX updates. | ||
|
|
||
| ## Additional Changes | ||
|
|
||
| As mentioned previously, additional functionality, bugfixes, and more have been included in the latest 4.0 release. | ||
| To stay up to date and learn more about recent changes: check out the project's homepage at [ibis-project.org](https://ibis-project.org/docs/latest/), follow [@IbisData](https://twitter.com/IbisData) on Twitter, find the source code and community on [GitHub](https://github.com/ibis-project/ibis), and join the discussion on [Gitter](https://gitter.im/ibis-dev/Lobby). | ||
|
|
||
| As always, try Ibis by [installing](https://ibis-project.org/docs/latest/install/) it today. |
| @@ -0,0 +1,81 @@ | ||
| # Contribute to the Ibis Codebase | ||
|
|
||
| ## Getting started | ||
|
|
||
| First, set up a [development environment](01_environment.md). | ||
|
|
||
| ## Taking Issues | ||
|
|
||
| If you find an issue you want to work on, write a comment with the text | ||
| `/take` on the issue. GitHub will then assign the issue to you. | ||
|
|
||
| ## Running the test suite | ||
|
|
||
| To run tests that do not require a backend: | ||
|
|
||
| ```sh | ||
| pytest -m core | ||
| ``` | ||
|
|
||
| ### Backend Test Suites | ||
|
|
||
| !!! info "You may be able to skip this section" | ||
|
|
||
| If you haven't made changes to the core of ibis (e.g., `ibis/expr`) | ||
| or to any specific backends (`ibis/backends`), you can skip this section | ||
| when making a pull request. | ||
|
|
||
| To run the tests for a specific backend (e.g. sqlite): | ||
|
|
||
| ```sh | ||
| pytest -m sqlite | ||
| ``` | ||
|
|
||
| ## Setting up non-trivial backends | ||
|
|
||
| These client-server backends need to be started before testing them. | ||
| They can be started with `docker-compose` directly, or using the `just` tool. | ||
|
|
||
| - ClickHouse: `just up clickhouse` | ||
| - PostgreSQL: `just up postgres` | ||
| - MySQL: `just up mysql` | ||
| - Impala: `just up impala` | ||
|
|
||
| ### Test the backend locally | ||
|
|
||
| If anything seems amiss with a backend, you can test it directly; for example, for PostgreSQL: | ||
|
|
||
| ```sh | ||
| export PGPASSWORD=postgres | ||
| psql -t -A -h localhost -U postgres -d ibis_testing -c "select 'success'" | ||
| ``` | ||
|
|
||
| ## Download Test Data | ||
|
|
||
| Backends need to be populated with test data to run the tests successfully: | ||
|
|
||
| ```sh | ||
| just download-data | ||
| ``` | ||
|
|
||
| ## Writing the commit | ||
|
|
||
| Ibis follows the [Conventional Commits](https://www.conventionalcommits.org/) structure. | ||
| In brief, the commit summary should look like: | ||
|
|
||
| fix(types): make all floats doubles | ||
|
|
||
| The type (e.g. `fix`) can be: | ||
|
|
||
| - `fix`: A bug fix. Correlates with PATCH in SemVer | ||
| - `feat`: A new feature. Correlates with MINOR in SemVer | ||
| - `docs`: Documentation only changes | ||
| - `style`: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.) | ||
||
| If the commit fixes a GitHub issue, add something like this to the bottom of the description: | ||
|
|
||
| fixes #4242 | ||
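Put together, a commit summary of this shape can be checked against a simple pattern. This is an illustrative sketch, not part of ibis's tooling, and the hypothetical `COMMIT_RE` below covers only the common types:

```python
import re

# Hypothetical checker for "type(scope): summary" -- not ibis's actual tooling
COMMIT_RE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|chore)"  # commit type
    r"(\([a-z0-9_-]+\))?"                               # optional scope
    r"!?"                                               # optional breaking-change marker
    r": \S.*$"                                          # summary text
)

print(bool(COMMIT_RE.match("fix(types): make all floats doubles")))  # True
print(bool(COMMIT_RE.match("Fixed the float types")))                # False
```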
|
|
||
| ## Submit a PR | ||
|
|
||
| Ibis follows the standard Git Pull Request process. The team will review the PR and merge when it's ready. |
| @@ -0,0 +1,17 @@ | ||
| # Style and Formatting | ||
|
|
||
| ## Code Style | ||
|
|
||
| - [`black`](https://github.com/psf/black): Formatting Python code | ||
| - [`ruff`](https://github.com/charliermarsh/ruff): Formatting and sorting `import` statements | ||
| - [`shellcheck`](https://github.com/koalaman/shellcheck): Linting shell scripts | ||
| - [`shfmt`](https://github.com/mvdan/sh): Formatting shell scripts | ||
| - [`statix`](https://github.com/nerdypepper/statix): Linting nix files | ||
| - [`nixpkgs-fmt`](https://github.com/nix-community/nixpkgs-fmt): Formatting nix files | ||
|
|
||
| !!! tip | ||
|
|
||
| If you use `nix-shell`, all of these tools are already set up and ready to use; you don't need to install anything else. | ||
|
|
||
| We use [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html) as our | ||
| standard format for docstrings. |
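For example, a minimal numpydoc-style docstring looks like this (an illustrative function, not from the codebase):

```python
def double(x):
    """Double a number.

    Parameters
    ----------
    x : int or float
        The value to double.

    Returns
    -------
    int or float
        Twice the input value.
    """
    return x * 2
```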
| @@ -0,0 +1,39 @@ | ||
| # Maintaining the Codebase | ||
|
|
||
| Ibis maintainers are expected to handle the following tasks as they arise: | ||
|
|
||
| - Reviewing and merging pull requests | ||
| - Triaging new issues | ||
|
|
||
| ## Dependencies | ||
|
|
||
| A number of maintenance tasks are partially or fully automated: | ||
|
|
||
| - [WhiteSource Renovate](https://www.whitesourcesoftware.com/free-developer-tools/renovate/) (Python library dependencies and GitHub Actions) | ||
| - [Custom GitHub Action](https://github.com/ibis-project/ibis/actions/workflows/update-deps.yml) (Nix dependencies) | ||
|
|
||
| ### poetry | ||
|
|
||
| Occasionally you may need to lock [`poetry`](https://python-poetry.org) dependencies. Edit `pyproject.toml` as needed, then run: | ||
|
|
||
| ```sh | ||
| poetry lock --no-update | ||
| ``` | ||
|
|
||
| ## Release | ||
|
|
||
| Ibis is released on [PyPI](https://pypi.org/project/ibis-framework/) and [Conda Forge](https://github.com/conda-forge/ibis-framework-feedstock). | ||
|
|
||
| === "PyPI" | ||
|
|
||
| Releases to PyPI are handled automatically using [semantic | ||
| release](https://egghead.io/lessons/javascript-automating-releases-with-semantic-release). | ||
|
|
||
| To trigger a release use the [Release GitHub Action](https://github.com/ibis-project/ibis/actions/workflows/release.yml). | ||
|
|
||
| === "`conda-forge`" | ||
|
|
||
| The conda-forge package is maintained as a [conda-forge feedstock](https://github.com/conda-forge/ibis-framework-feedstock). | ||
|
|
||
| After a release to PyPI, the conda-forge bot automatically updates the ibis | ||
| package. |
| @@ -0,0 +1,15 @@ | ||
| # Contribute to Ibis | ||
|
|
||
| {{ config.extra.project_name }} is developed and maintained by a [community of | ||
| volunteer contributors]({{ config.repo_url }}/graphs/contributors). | ||
|
|
||
| {% for group in config.extra.team %} | ||
|
|
||
| ## {{ group.name }} | ||
|
|
||
| {% for person in group.members %} | ||
|
|
||
| - https://github.com/{{ person }} | ||
| {% endfor %} | ||
|
|
||
| {% endfor %} |
| @@ -0,0 +1,6 @@ | ||
| # Community | ||
|
|
||
| Ibis aims to be a welcoming, friendly, diverse and | ||
| inclusive community. Everybody is welcome, regardless of gender, sexual | ||
| orientation, gender identity, and expression, disability, physical appearance, | ||
| body size, race, or religion. |
| @@ -0,0 +1,9 @@ | ||
| # Ibis Documentation | ||
|
|
||
| Welcome to the ibis documentation! | ||
|
|
||
| - **Coming from Pandas?**: Check out [ibis for pandas users](../ibis-for-pandas-users.ipynb)! | ||
| - **Coming from SQL?**: Take a look at [ibis for SQL programmers](../ibis-for-sql-programmers.ipynb)! | ||
| - **Want to see some more examples?**: We've got [a set of tutorial notebooks](../tutorial/index.md) for that! | ||
| - **Looking for API docs?**: Start [here](../api/expressions/top_level.md)! | ||
| - **Interested in contributing?**: Our [contribution section](../community/contribute/index.md) has what you need! |
| @@ -0,0 +1,68 @@ | ||
| # How to Chain Expressions with Underscore | ||
|
|
||
| Expressions can easily be chained using the deferred expression API, also known as the Underscore (`_`) API. | ||
|
|
||
| In this guide, we use the `_` API to concisely create column expressions and then chain table expressions. | ||
|
|
||
| ## Setup | ||
|
|
||
| To get started, import `_` from ibis: | ||
|
|
||
| ```python | ||
| import ibis | ||
| from ibis import _ | ||
|
|
||
| import pandas as pd | ||
| ``` | ||
|
|
||
| Let's create two in-memory tables using [`ibis.memtable`](memtable-join.md), an API introduced in 3.2: | ||
|
|
||
| ```python | ||
| t1 = ibis.memtable(pd.DataFrame({'x': range(5), 'y': list('ab')*2 + list('e')})) | ||
| t2 = ibis.memtable(pd.DataFrame({'x': range(10), 'z': list(reversed(list('ab')*2 + list('e')))*2})) | ||
| ``` | ||
|
|
||
| ## Creating ColumnExpressions | ||
|
|
||
| We can use `_` to create new column expressions without explicit reference to the previous table expression: | ||
|
|
||
| ```python | ||
| # We can pass a deferred expression into a function: | ||
| def modf(t): | ||
| return t.x % 3 | ||
|
|
||
| xmod = modf(_) | ||
|
|
||
| # We can create ColumnExprs like aggregate expressions: | ||
| ymax = _.y.max() | ||
| zmax = _.z.max() | ||
| zct = _.z.count() | ||
| ``` | ||
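Conceptually, `_` is just an object that records operations and replays them later against a concrete table. A toy sketch of that deferral mechanism (not Ibis's actual implementation):

```python
class Deferred:
    """Toy deferred object: records attribute access and calls, replays them later."""

    def __init__(self, ops=()):
        self._ops = ops

    def __getattr__(self, name):
        # record attribute access (e.g. _.y)
        return Deferred(self._ops + (("attr", name),))

    def __call__(self, *args, **kwargs):
        # record a call (e.g. _.y.max())
        return Deferred(self._ops + (("call", args, kwargs),))

    def resolve(self, obj):
        # replay the recorded operations against a concrete object
        for op in self._ops:
            if op[0] == "attr":
                obj = getattr(obj, op[1])
            else:
                obj = obj(*op[1], **op[2])
        return obj


d = Deferred()
expr = d.upper()            # nothing is computed yet
print(expr.resolve("abc"))  # replays .upper() on the concrete value: ABC
```

In real Ibis, the chained methods each resolve `_` against the current intermediate table expression in the same replay-later spirit.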
|
|
||
| ## Chaining Ibis Expressions | ||
|
|
||
| We can also use it to chain Ibis expressions in one Python expression: | ||
|
|
||
| ```python | ||
| join = ( | ||
| t1 | ||
| # _ is t1 | ||
| .join(t2, _.x == t2.x) | ||
| # _ is the join result: | ||
| .mutate(xmod=xmod) | ||
| # _ is the TableExpression after mutate: | ||
| .group_by(_.xmod) | ||
| # `ct` is a ColumnExpression derived from a deferred expression: | ||
| .aggregate(ymax=ymax, zmax=zmax) | ||
| # _ is the aggregation result: | ||
| .filter(_.ymax == _.zmax) | ||
| # _ is the filtered result, and re-create xmod in t2 using modf: | ||
| .join(t2, _.xmod == modf(t2)) | ||
| # _ is the second join result: | ||
| .join(t1, _.xmod == modf(t1), suffixes=('', '_x')) | ||
| # _ is the third join result: | ||
| .select(_.x, _.y, _.z) | ||
| # Finally, _ is the selection result: | ||
| .order_by(_.x) | ||
| ) | ||
| ``` |
| @@ -0,0 +1,110 @@ | ||
| # How to join an in-memory DataFrame to a TableExpression | ||
|
|
||
| You might have an in-memory DataFrame that you want to join to a TableExpression. | ||
| For example, you might have a file on your local machine that you don't want to upload to | ||
| your backend, but you need to join it to a table in that backend. | ||
|
|
||
| With Ibis MemTables, you can easily join local data to TableExpressions from your backend. | ||
|
|
||
| In this guide, you will learn how to join a pandas DataFrame to a TableExpression. | ||
|
|
||
| ## Data Setup | ||
|
|
||
| In this example, we will create two DataFrames: one containing events and one containing event names. | ||
| We will save the events to a parquet file and read that as a TableExpression in the DuckDB backend. | ||
| We will then convert the event names DataFrame to a PandasInMemoryTable (MemTable), which | ||
| wraps a pandas DataFrame as a TableExpression, and join the two expressions together just as | ||
| we would join two TableExpressions in a backend. | ||
|
|
||
| ```python | ||
| In [1]: import ibis | ||
|
|
||
| In [2]: import pandas as pd | ||
| ...: from datetime import date | ||
|
|
||
| In [3]: # create a pandas DataFrame that we will convert to a | ||
| ...: # PandasInMemoryTable (Ibis MemTable) | ||
| ...: events = pd.DataFrame( | ||
| ...: { | ||
| ...: 'event_id': range(4), | ||
| ...: 'event_name': [f'e{k}' for k in range(4)], | ||
| ...: } | ||
| ...: ) | ||
|
|
||
| In [4]: # Create a parquet file that we will read in using the DuckDB backend | ||
| ...: # as a TableExpression | ||
| ...: measures = pd.DataFrame({ | ||
| ...: "event_id": [0] * 2 + [1] * 3 + [2] * 5 + [3] * 2 | ||
| ...: ,"measured_on": map( | ||
| ...: date | ||
| ...: ,[2021] * 12, [6] * 4 + [5] * 6 + [7] * 2 | ||
| ...: ,range(1, 13) | ||
| ...: ) | ||
| ...: ,"measurement": None | ||
| ...: }) | ||
|
|
||
| In [5]: measures.at[1, "measurement"] = 5. | ||
| ...: measures.at[4, "measurement"] = 42. | ||
| ...: measures.at[5, "measurement"] = 42. | ||
| ...: measures.at[7, "measurement"] = 11. | ||
|
|
||
| In [6]: # Save measures to parquet: | ||
| ...: measures.to_parquet('measures.parquet') | ||
|
|
||
| In [7]: # connect to a DuckDB backend | ||
| ...: conn = ibis.connect('duckdb://:memory:') | ||
| ...: measures = conn.register('measures.parquet', 'measures') | ||
|
|
||
| In [8]: # `measures` is a TableExpression in a DuckDB backend connection: | ||
| ...: measures | ||
| Out[8]: | ||
| AlchemyTable: measures | ||
| event_id int64 | ||
| measured_on date | ||
| measurement float64 | ||
| ``` | ||
|
|
||
| Converting a pandas DataFrame to a MemTable is as simple as feeding it to `ibis.memtable`: | ||
|
|
||
| ```python | ||
| In [9]: # To join, convert your DataFrame to a memtable | ||
| ...: mem_events = ibis.memtable(events) | ||
|
|
||
| In [10]: mem_events | ||
| Out[10]: | ||
| PandasInMemoryTable | ||
| data: | ||
| DataFrameProxy: | ||
| event_id event_name | ||
| 0 0 e0 | ||
| 1 1 e1 | ||
| 2 2 e2 | ||
| 3 3 e3 | ||
| ``` | ||
|
|
||
| and joining is the same as joining any two TableExpressions: | ||
|
|
||
| ```python | ||
| In [11]: # Join as you would two table expressions | ||
| ...: measures.join( | ||
| ...: mem_events | ||
| ...: ,measures['event_id'] == mem_events['event_id'] | ||
| ...: ,suffixes=('', '__x') | ||
| ...: ).execute() | ||
| Out[11]: | ||
| event_id measured_on measurement event_id__x event_name | ||
| 0 0 2021-06-01 NaN 0 e0 | ||
| 1 0 2021-06-02 5.0 0 e0 | ||
| 2 1 2021-06-03 NaN 1 e1 | ||
| 3 1 2021-06-04 NaN 1 e1 | ||
| 4 1 2021-05-05 42.0 1 e1 | ||
| 5 2 2021-05-06 42.0 2 e2 | ||
| 6 2 2021-05-07 NaN 2 e2 | ||
| 7 2 2021-05-08 11.0 2 e2 | ||
| 8 2 2021-05-09 NaN 2 e2 | ||
| 9 2 2021-05-10 NaN 2 e2 | ||
| 10 3 2021-07-11 NaN 3 e3 | ||
| 11 3 2021-07-12 NaN 3 e3 | ||
| ``` | ||
|
|
||
| Note that the return result of the `join` is a TableExpression and that `execute` returns a pandas DataFrame. |
| @@ -0,0 +1,74 @@ | ||
| # How to Sessionize a Log of Events | ||
|
|
||
| Suppose you have entities (users, objects, actions, etc) that have event logs through polling or event triggers. | ||
|
|
||
| You might be interested in partitioning these logs into **sessions**: contiguous runs of activity by a single entity. | ||
|
|
||
| In the case of a user portal, it might be the time spent completing a task or navigating an app. | ||
| For games, it might be a time spent playing the game or remaining logged in. | ||
| For retail, it might be checking out or walking the premises. | ||
|
|
||
| This guide on sessionization is inspired by [_The Expressions API in Polars is Amazing_](https://www.pola.rs/posts/the-expressions-api-in-polars-is-amazing/), | ||
| a blog post in the [Polars](https://www.pola.rs/) community demonstrating the strength of Polars expressions. | ||
|
|
||
| ## Sessionizing Logs on a Cadence | ||
|
|
||
| For this example, we have a dataset that contains entities polled on a cadence. | ||
| The data used here can be found at `https://storage.googleapis.com/ibis-tutorial-data/wowah_data/wowah_data.csv`. | ||
| You can use `ibis.read("https://storage.googleapis.com/ibis-tutorial-data/wowah_data/wowah_data.csv")` to quickly get it into a table expression. | ||
|
|
||
| Our data contains the following: | ||
|
|
||
| - `char` : a unique identifier for a character (or a player). This is our entity column | ||
| - `timestamp`: a timestamp denoting when a `char` was polled. This occurs every ~10 minutes | ||
|
|
||
| We can take this information, along with a definition of what separates two sessions for an entity, and break our dataset up into sessions **without using any joins**: | ||
|
|
||
| ```python | ||
| # Imports | ||
| import ibis | ||
| from ibis import _ as c | ||
|
|
||
| # Read files into table expressions with ibis.read: | ||
| data = ibis.read("https://storage.googleapis.com/ibis-tutorial-data/wowah_data/wowah_data_raw.parquet") | ||
|
|
||
| # Gap in seconds beyond which a row no longer belongs to the previous session | ||
| session_boundary_threshold = 30 * 60 | ||
|
|
||
| # Window for finding session ids per character | ||
| entity_window = ibis.cumulative_window(group_by=c.char, order_by=c.timestamp) | ||
|
|
||
| # Take the previous timestamp within a window (by character ordered by timestamp): | ||
| # Note: the first value in a window will be null | ||
| ts_lag = c.timestamp.lag().over(entity_window) | ||
|
|
||
| # Subtract the lag from the current timestamp to get a timedelta | ||
| ts_delta = c.timestamp - ts_lag | ||
|
|
||
| # Compare timedelta to our session delay in seconds to determine if the | ||
| # current timestamp falls outside of the session. | ||
| # The boolean flag is summed directly in the window aggregation below | ||
| is_new_session = (ts_delta > ibis.interval(seconds=session_boundary_threshold)) | ||
|
|
||
| # Window for finding session min/max | ||
| session_window = ibis.window(group_by=[c.char, c.session_id]) | ||
|
|
||
| # Generate all of the data we need to analyze sessions: | ||
| sessionized = ( | ||
| data | ||
| # Flag the start of each new session; the first row per character | ||
| # (where the lag is null) always starts a session | ||
| .mutate(new_session=is_new_session.fillna(True)) | ||
| # Create a session id for each character by using a cumulative sum | ||
| # over the `new_session` column | ||
| .mutate(session_id=c.new_session.sum().over(entity_window)) | ||
| # Drop `new_session` because it is no longer needed | ||
| .drop("new_session") | ||
| .mutate( | ||
| # Get session duration using max(timestamp) - min(timestamp) over our window | ||
| session_duration=c.timestamp.max().over(session_window) - c.timestamp.min().over(session_window) | ||
| ) | ||
| # Sort for convenience | ||
| .order_by([c.char, c.timestamp]) | ||
| ) | ||
| ``` |
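The same cumulative-sum trick can be verified on a tiny pandas frame (an illustrative example with made-up timestamps, independent of the dataset above):

```python
import pandas as pd

df = pd.DataFrame({
    "char": ["a", "a", "a", "a"],
    "timestamp": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:10",
        "2021-01-01 01:00", "2021-01-01 01:05",
    ]),
})
threshold = pd.Timedelta(minutes=30)

# Lag within each character, then compare the gap to the threshold
delta = df.groupby("char")["timestamp"].diff()
new_session = delta.isna() | (delta > threshold)  # the first row starts a session

# A cumulative sum of the boolean flags yields a session id per row
df["session_id"] = new_session.astype(int).groupby(df["char"]).cumsum()
print(df["session_id"].tolist())  # [1, 1, 2, 2]
```

The 50-minute gap between the second and third rows exceeds the threshold, so those rows land in separate sessions.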
| @@ -1,215 +1,57 @@ | ||
| --- | ||
| hide: | ||
| - toc | ||
| - navigation | ||
| - footer | ||
| --- | ||
|
|
||
| # <span style="font-size: 1.5em; margin: 0">:ibis-logo: The Ibis Project</span> | ||
|
|
||
| ## The flexibility of Python analytics with the scale and performance of modern SQL. | ||
|
|
||
| --- | ||
|
|
||
| <div class="install-tutorial-button" markdown> | ||
| [Install](./install.md){ .md-button .md-button--primary } | ||
| [Tutorial](./tutorial/index.md){ .md-button } | ||
| </div> | ||
|
|
||
| --- | ||
|
|
||
| ```python title="Write high-level Python code" | ||
| >>> import ibis | ||
| >>> con = ibis.connect('movielens.sqlite') | ||
| >>> movies = con.tables.movies | ||
| >>> rating_by_year = movies.group_by('year').avg_rating.mean() | ||
| >>> q = rating_by_year.order_by(rating_by_year.year.desc()) | ||
| ``` | ||
|
|
||
| ```py title="Compile to SQL" | ||
| >>> con.compile(q) | ||
|
|
||
| SELECT year, avg(avg_rating) | ||
| FROM movies t1 | ||
| GROUP BY t1.year | ||
| ORDER BY t1.year DESC | ||
| ``` | ||
|
|
||
| ```py title="Execute on multiple backends" | ||
| >>> con.execute(q) | ||
|
|
||
| year mean(avg_rating) | ||
| 0 2021 2.586362 | ||
| 1 2020 2.719994 | ||
| 2 2019 2.932275 | ||
| 3 2018 3.005046 | ||
| 4 2017 3.071669 | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Features | ||
|
|
||
| - **Consistent syntax across backends**: Enjoy a uniform Python API, whether using [DuckDB](https://duckdb.org), [PostgreSQL](https://postgresql.org), [PySpark](https://spark.apache.org/docs/latest/api/python/index.html), [BigQuery](https://cloud.google.com/bigquery/), or [any other supported backend](./backends/index.md). | ||
| - **Performant**: Execute queries as fast as the database engine itself. | ||
| - **Interactive**: Explore data in a notebook or REPL. | ||
| - **Extensible**: Add new operations, optimizations, and custom APIs. | ||
| - **Free and open-source**: licensed under Apache 2.0, [available on GitHub](https://github.com/ibis-project/ibis/blob/master/README.md) |
| @@ -0,0 +1,44 @@ | ||
| --- | ||
| hide: | ||
| - toc | ||
| - navigation | ||
| - footer | ||
| --- | ||
|
|
||
| # Install Ibis | ||
|
|
||
| === "pip" | ||
|
|
||
| ```sh | ||
| pip install ibis-framework # (1) | ||
| ``` | ||
|
|
||
| 1. Note that the `ibis-framework` package is *not* the same as the `ibis` package on PyPI. These two libraries cannot coexist in the same Python environment, as they are both imported with the `ibis` module name. | ||
|
|
||
| {% for mgr in ["conda", "mamba"] %} | ||
| === "{{ mgr }}" | ||
|
|
||
| ```sh | ||
| {{ mgr }} install -c conda-forge ibis-framework | ||
| ``` | ||
|
|
||
| {% endfor %} | ||
|
|
||
| ## Install backend dependencies | ||
|
|
||
| {% for backend in sorted(ibis.backends.base._get_backend_names()) %} | ||
| === "{{ backend }}" | ||
|
|
||
| ```sh | ||
| pip install 'ibis-framework[{{ backend }}]' | ||
| ``` | ||
|
|
||
| {% endfor %} | ||
|
|
||
| --- | ||
|
|
||
| After you've successfully installed Ibis, try going through the tutorial: | ||
|
|
||
| <div class="install-tutorial-button" markdown> | ||
| [Go to the Tutorial](./tutorial/index.md){ .md-button .md-button--primary } | ||
| </div> |