| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,344 @@ | ||
| --- | ||
| title: "Using one Python dataframe API to take the billion row challenge with DuckDB, Polars, and DataFusion" | ||
| author: "Cody" | ||
| date: "2024-01-22" | ||
| categories: | ||
| - blog | ||
| - duckdb | ||
| - polars | ||
| - datafusion | ||
| - portability | ||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| This is an implementation of the [The One Billion Row | ||
| Challenge](https://www.morling.dev/blog/one-billion-row-challenge/): | ||
|
|
||
| > Let’s kick off 2024 true coder style—I’m excited to announce the One Billion | ||
| > Row Challenge (1BRC), running from Jan 1 until Jan 31. | ||
| > Your mission, should you decide to accept it, is deceptively simple: write a | ||
| > Java program for retrieving temperature measurement values from a text file and | ||
| > calculating the min, mean, and max temperature per weather station. There’s just | ||
| > one caveat: the file has 1,000,000,000 rows! | ||
|
|
||
| I haven't written Java since dropping a computer science course in my second year | ||
| of college that forced us to do functional programming exclusively in Java. | ||
| However, I'll gladly take the challenge in Python using Ibis! In fact, I did | ||
| something like this (generating a billion rows with 26 columns of random numbers | ||
| and doing basic aggregations) to test out DuckDB and Polars. | ||
|
|
||
| In this blog, we'll demonstrate how Ibis provides a single Python dataframe API | ||
| to take the billion row challenge with DuckDB, Polars, and DataFusion. | ||
|
|
||
| ## Setup | ||
|
|
||
| We need to generate the data from the challenge. First, clone the | ||
| [repo](https://github.com/gunnarmorling/1brc): | ||
|
|
||
| ```{.bash} | ||
| gh repo clone gunnarmorling/1brc | ||
| ``` | ||
|
|
||
| Then change into the Python directory and run the generation script with the | ||
| number of rows you want to generate: | ||
|
|
||
| ```{.bash} | ||
| cd 1brc/src/main/python | ||
| python create_measurements.py 1_000_000_000 | ||
| ``` | ||
|
|
||
| This will generate a file called `measurements.txt` in the `data` directory at | ||
| the root of the repo. It is 15GB on disk: | ||
|
|
||
| ```{.bash} | ||
| (venv) cody@voda 1brc % du 1brc/data/* | ||
| 15G 1brc/data/measurements.txt | ||
| 808K 1brc/data/weather_stations.csv | ||
| ``` | ||
|
|
||
| The file consists of one billion rows with two columns separated by a semicolon: | ||
|
|
||
| ```{.bash} | ||
| (venv) cody@voda 1brc % head 1brc/data/measurements.txt | ||
| Kusugal;-67.2 | ||
| Ipil;-88.6 | ||
| Sohna;-31.2 | ||
| Lubuagan;-2.3 | ||
| Szentes;29.2 | ||
| Sylvan Lake;-70.7 | ||
| Ambato;-35.2 | ||
| Berkine;97.0 | ||
| Wernau;73.4 | ||
| Kennewick;-19.9 | ||
| ``` | ||
|
|
||
| Also, you'll need to install Ibis with the three backends we'll use: | ||
|
|
||
| ```{.bash} | ||
| pip install 'ibis-framework[duckdb,polars,datafusion]' | ||
| ``` | ||
|
|
||
| ## Understanding Ibis | ||
|
|
||
| Ibis provides a standard dataframe API decoupled from the execution engine. It | ||
| compiles Ibis expressions to a form of intermediate representation (often SQL) | ||
| that can be executed by different backends. | ||
|
|
||
| This allows us to write a single Ibis expression to complete the challenge with | ||
| many different execution engine backends. | ||
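|
|
||
| To make this concrete, here's a minimal sketch (separate from the challenge code | ||
| below) showing how an expression compiles to SQL; the table and column names are | ||
| hypothetical stand-ins for the challenge data: | ||
|
|
||
| ```{.python} | ||
| import ibis | ||
| # a hypothetical table with the challenge's schema | ||
| t = ibis.table({"station": "string", "temperature": "float64"}, name="measurements") | ||
| expr = t.group_by("station").agg(max_temp=t.temperature.max()) | ||
| # inspect the SQL a SQL-generating backend (e.g. DuckDB) would run | ||
| print(ibis.to_sql(expr)) | ||
| ``` | ||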
|
|
||
| :::{.callout-warning} | ||
| While Ibis does its best to abstract away the differences between backends, this | ||
| isn't possible in some areas, such as data input and output. For example, the | ||
| `read_csv` function takes different parameters across backends (in their SQL and | ||
| Python forms). We'll handle that with a separate `kwargs` dictionary per backend | ||
| in this post. | ||
|
|
||
| In general, besides creating a connection and data input/output, the Ibis API is | ||
| the same across backends. | ||
| ::: | ||
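|
|
||
| For example, explicitly creating a connection is a one-liner per backend, and the | ||
| table API is the same after that point. A minimal sketch (in this post we'll use | ||
| `ibis.set_backend` instead of explicit connections): | ||
|
|
||
| ```{.python} | ||
| import ibis | ||
| duckdb_con = ibis.duckdb.connect()          # in-memory DuckDB | ||
| polars_con = ibis.polars.connect()          # Polars | ||
| datafusion_con = ibis.datafusion.connect()  # DataFusion | ||
| ``` | ||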
|
|
||
| ## Completing the challenge thrice | ||
|
|
||
| We'll use three great options for local backends -- DuckDB, Polars, and | ||
| DataFusion -- to complete the challenge. | ||
|
|
||
| ### Setup | ||
|
|
||
| Before we get started, we'll make some imports, turn on interactive mode, and | ||
| define the `kwargs` dictionary for the backends corresponding to their | ||
| `read_csv` function: | ||
|
|
||
| ```{python} | ||
| import ibis | ||
| import polars as pl | ||
| import pyarrow as pa | ||
| ibis.options.interactive = True | ||
| duckdb_kwargs = { | ||
|     "delim": ";", | ||
|     "header": False, | ||
|     "columns": {"station": "VARCHAR", "temperature": "DOUBLE"}, | ||
| } | ||
| polars_kwargs = { | ||
|     "separator": ";", | ||
|     "has_header": False, | ||
|     "new_columns": ["station", "temperature"], | ||
|     "schema": {"station": pl.Utf8, "temperature": pl.Float64}, | ||
| } | ||
| datafusion_kwargs = { | ||
|     "delimiter": ";", | ||
|     "has_header": False, | ||
|     "schema": pa.schema( | ||
|         [ | ||
|             ("station", pa.string()), | ||
|             ("temperature", pa.float64()), | ||
|         ] | ||
|     ), | ||
|     "file_extension": ".txt", | ||
| } | ||
| ``` | ||
|
|
||
| Let's define a function to run the same code with each backend to complete the challenge: | ||
|
|
||
| ```{python} | ||
| def run_challenge(t): | ||
|     """Aggregate the min, mean, and max temperature per station, sorted by station name.""" | ||
|     res = ( | ||
|         t.group_by(ibis._.station) | ||
|         .agg( | ||
|             min_temp=ibis._.temperature.min(), | ||
|             mean_temp=ibis._.temperature.mean(), | ||
|             max_temp=ibis._.temperature.max(), | ||
|         ) | ||
|         .order_by(ibis._.station.desc()) | ||
|     ) | ||
|     return res | ||
| ``` | ||
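|
|
||
| Note that `run_challenge` only builds an expression; nothing executes until we | ||
| display the result (interactive mode) or explicitly request it. A minimal sketch, | ||
| assuming `t` has been read with `ibis.read_csv` as shown below: | ||
|
|
||
| ```{.python} | ||
| res = run_challenge(t)  # builds the expression; no work happens yet | ||
| df = res.to_pandas()    # executes on the current backend, returns a pandas DataFrame | ||
| sql = ibis.to_sql(res)  # or inspect the generated SQL (for SQL-generating backends) | ||
| ``` | ||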
|
|
||
| ### Completing the challenge | ||
|
|
||
| Let's complete the challenge with each backend. | ||
|
|
||
| :::{.callout-note} | ||
| The results are the same across backends but look suspicious. The repository | ||
| notes that the Python generation code is "unofficial", so it may have some | ||
| problems. Given this is a contrived example with generated data, I'm not going to | ||
| worry about it. | ||
|
|
||
| The point is that we can easily complete the challenge with the same code across | ||
| many backends, letting them worry about the details of execution. For this | ||
| reason, I'm also not providing execution times. Try it out yourself! | ||
| ::: | ||
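|
|
||
| If you want to measure execution time yourself, a minimal sketch (assuming `t` is | ||
| the table read in the tabs below; results will depend heavily on your machine and | ||
| backend): | ||
|
|
||
| ```{.python} | ||
| import time | ||
| start = time.time() | ||
| df = run_challenge(t).to_pandas()  # force full execution | ||
| print(f"{time.time() - start:.1f}s on the current backend") | ||
| ``` | ||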
|
|
||
| ::: {.panel-tabset} | ||
|
|
||
| ## DuckDB | ||
|
|
||
| First let's set the backend to DuckDB (redundantly since it's the default) and | ||
| the `kwargs` dictionary: | ||
|
|
||
| ```{python} | ||
| ibis.set_backend("duckdb") # <1> | ||
| kwargs = duckdb_kwargs | ||
| ``` | ||
|
|
||
| ```{python} | ||
| #| code-fold: true | ||
| #| echo: false | ||
| _ = ibis.get_backend().raw_sql("set enable_progress_bar = false") | ||
| ``` | ||
|
|
||
| 1. Redundant given DuckDB is the default | ||
|
|
||
| Next, we'll read in the data and take a look at the table: | ||
|
|
||
| ```{python} | ||
| t = ibis.read_csv("1brc/data/measurements.txt", **kwargs) | ||
| t.limit(3) | ||
| ``` | ||
|
|
||
| Then let's confirm it's **a billion** rows: | ||
|
|
||
| ```{python} | ||
| f"{t.count().to_pandas():,}" | ||
| ``` | ||
|
|
||
| Finally, we'll compute the min, mean, and max temperature per weather station: | ||
|
|
||
| ```{python} | ||
| res = run_challenge(t) | ||
| res | ||
| ``` | ||
|
|
||
| ## Polars | ||
|
|
||
| First let's set the backend to Polars and the `kwargs` dictionary: | ||
|
|
||
| ```{python} | ||
| ibis.set_backend("polars") # <1> | ||
| kwargs = polars_kwargs | ||
| ``` | ||
|
|
||
| 1. Set Polars as the default backend used | ||
|
|
||
| Next, we'll read in the data and take a look at the table: | ||
|
|
||
| ```{python} | ||
| t = ibis.read_csv("1brc/data/measurements.txt", **kwargs) | ||
| t.limit(3) | ||
| ``` | ||
|
|
||
| Then let's confirm it's **a billion** rows: | ||
|
|
||
| ```{python} | ||
| f"{t.count().to_pandas():,}" | ||
| ``` | ||
|
|
||
| Finally, we'll compute the min, mean, and max temperature per weather station: | ||
|
|
||
| ```{python} | ||
| res = run_challenge(t) | ||
| res | ||
| ``` | ||
|
|
||
| ## DataFusion | ||
|
|
||
| First let's set the backend to DataFusion and the `kwargs` dictionary: | ||
|
|
||
| ```{python} | ||
| ibis.set_backend("datafusion") # <1> | ||
| kwargs = datafusion_kwargs | ||
| ``` | ||
|
|
||
| 1. Set DataFusion as the default backend used | ||
|
|
||
| Next, we'll read in the data and take a look at the table: | ||
|
|
||
| ```{python} | ||
| t = ibis.read_csv("1brc/data/measurements.txt", **kwargs) | ||
| t.limit(3) | ||
| ``` | ||
|
|
||
| Then let's confirm it's **a billion** rows: | ||
|
|
||
| ```{python} | ||
| f"{t.count().to_pandas():,}" | ||
| ``` | ||
|
|
||
| Finally, we'll compute the min, mean, and max temperature per weather station: | ||
|
|
||
| ```{python} | ||
| res = run_challenge(t) | ||
| res | ||
| ``` | ||
|
|
||
| ::: | ||
|
|
||
| ## Conclusion | ||
|
|
||
| While the one billion row challenge isn't a great benchmark, it's a fun way to | ||
| demonstrate how Ibis provides a single Python dataframe API that works across | ||
| DuckDB, Polars, and DataFusion. Feel free to try it out with other backends! | ||
|
|
||
| Happy coding! | ||
|
|
||
| ## Bonus: more billion row data generation | ||
|
|
||
| While we're here, I'll share the code I've used in the past to generate a | ||
| billion rows of random data: | ||
|
|
||
| ```{.python} | ||
| import ibis | ||
| con = ibis.connect("duckdb://data.ddb") | ||
| ROWS = 1_000_000_000 | ||
| # build a SELECT with 26 columns (a through z) of random numbers | ||
| sql_str = "" | ||
| sql_str += "select\n" | ||
| for c in list(map(chr, range(ord("a"), ord("z") + 1))): | ||
|     sql_str += f"  random() as {c},\n" | ||
| sql_str += f"from generate_series(1, {ROWS})" | ||
| t = con.sql(sql_str) | ||
| con.create_table("billion", t, overwrite=True) | ||
| ``` | ||
|
|
||
| Nowadays I'd convert that to an Ibis expression: | ||
|
|
||
| :::{.callout-note} | ||
| This is a slightly different result with a monotonic index column, but I prefer | ||
| it anyway. You could drop that column or adjust the expression. | ||
| ::: | ||
|
|
||
| ```{.python} | ||
| import ibis | ||
| con = ibis.connect("duckdb://data.ddb") | ||
| ROWS = 1_000_000_000 | ||
| t = ( | ||
|     ibis.range(ROWS) | ||
|     .unnest() | ||
|     .name("index") | ||
|     .as_table() | ||
|     .mutate(**{c: ibis.random() for c in list(map(chr, range(ord("a"), ord("z") + 1)))}) | ||
| ) | ||
| con.create_table("billion", t, overwrite=True) | ||
| ``` | ||
|
|
||
| But if you do need to construct a programmatic SQL string, it's cool that you | ||
| can! |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,201 @@ | ||
| --- | ||
| title: "Geospatial analysis with Ibis and DuckDB (redux)" | ||
| author: Naty Clementi and Gil Forsyth | ||
| date: 2024-01-16 | ||
| categories: | ||
| - blog | ||
| - duckdb | ||
| - geospatial | ||
| execute: | ||
| freeze: false | ||
| --- | ||
|
|
||
| Spatial Dev Guru wrote a great [tutorial](https://spatial-dev.guru/2023/12/09/geospatial-analysis-using-duckdb/) | ||
| that walks you through a step-by-step geospatial analysis of bike sharing data using DuckDB. | ||
|
|
||
| Ibis has support for all the geospatial functions used in the tutorial, so we | ||
| decided to replicate it and share it with you. | ||
|
|
||
| ## Installation | ||
|
|
||
| Install Ibis with the dependencies needed to work with geospatial data using DuckDB: | ||
|
|
||
| ```bash | ||
| $ pip install 'ibis-framework[duckdb,geospatial]' | ||
| ``` | ||
|
|
||
|
|
||
| ## Data | ||
|
|
||
| The parquet file used in the original tutorial is available at | ||
| https://github.com/iamgeoknight/common_datasets/tree/main/parquet. The original | ||
| data is also available from the citibike | ||
| [source](https://s3.amazonaws.com/tripdata/index.html) but as a `.csv` file. | ||
|
|
||
| ```{python} | ||
| from pathlib import Path | ||
| import tarfile | ||
| from urllib.request import urlretrieve | ||
| # Download data | ||
| url = "https://github.com/iamgeoknight/common_datasets/raw/main/parquet/202003-citibike-tripdata.tar.xz" | ||
| tar_path = Path("202003-citibike-tripdata.tar.xz") | ||
| parquet_path = Path("202003-citibike-tripdata.parquet") | ||
| if not tar_path.exists(): | ||
|     urlretrieve(url, tar_path) | ||
| if not parquet_path.exists(): | ||
|     with tarfile.open(tar_path, "r:xz") as t: | ||
|         t.extract("202003-citibike-tripdata.parquet") | ||
| ``` | ||
|
|
||
| Now that we have the data, we import Ibis and turn on interactive mode to easily | ||
| explore the output of our queries. | ||
|
|
||
| ```{python} | ||
| from ibis.interactive import * | ||
| ``` | ||
|
|
||
| ## Let's get started | ||
|
|
||
| Because this dataset does not contain any geometries, we have to load the spatial | ||
| extension ourselves. If the dataset included any geometry columns, Ibis would be | ||
| smart enough to load the extension for us when reading the data. | ||
|
|
||
| ```{python} | ||
| con = ibis.duckdb.connect("biketrip.ddb") | ||
| con.load_extension("spatial") | ||
| # read data and rename columns to use snake case | ||
| biketrip = con.read_parquet("202003-citibike-tripdata.parquet").rename("snake_case") | ||
| biketrip | ||
| ``` | ||
|
|
||
| We have the longitude and latitude of the start and end stations, so we can | ||
| create geometry points and put the spatial features to use. | ||
|
|
||
| ## Create bike trip table | ||
|
|
||
| In the original tutorial, Spatial Dev Guru creates a table with transformed | ||
| "Pickup" and "Dropoff" points. In DuckDB, the `st_transform` function takes | ||
| points as `YX` (latitude/longitude) by default, while in Ibis we assume data is | ||
| in the form `XY` (longitude/latitude) to be consistent with PostGIS and GeoPandas. | ||
|
|
||
| ```{python} | ||
| # Notice the longitude/latitude order | ||
| pickup = _.start_station_longitude.point(_.start_station_latitude) | ||
| dropoff = _.end_station_longitude.point(_.end_station_latitude) | ||
| # `convert` is the equivalent of `st_transform` | ||
| biketrip = biketrip.mutate( | ||
|     pickup_point=pickup.convert("EPSG:4326", "EPSG:3857"), | ||
|     dropoff_point=dropoff.convert("EPSG:4326", "EPSG:3857"), | ||
| ) | ||
| biketrip[["pickup_point", "dropoff_point"]] | ||
| ``` | ||
|
|
||
| Using `mutate`, we add two new columns to our `biketrip` table with the | ||
| transformed pickup and dropoff points, which are in the Web Mercator projection ([EPSG:3857](https://epsg.io/3857)). | ||
|
|
||
| ## Identify popular starts and end stations | ||
|
|
||
| The following queries retrieve a list of bike start and end stations with their respective trip count in descending order. | ||
|
|
||
| **Top 10 start stations by trip count** | ||
|
|
||
| ```{python} | ||
| biketrip.group_by(biketrip.start_station_name).agg(trips=ibis._.count()).order_by( | ||
|     ibis.desc("trips") | ||
| ) | ||
| ``` | ||
|
|
||
| Similarly, in Ibis you can use the [`topk`](https://ibis-project.org/tutorials/ibis-for-sql-users#top-k-operations) operation: | ||
|
|
||
| ```{python} | ||
| biketrip.start_station_name.topk(10) | ||
| ``` | ||
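|
|
||
| The `topk(10)` call is roughly equivalent to the group-by/count/order-by pattern | ||
| above with a limit applied; a sketch of the expanded form (the `trips` column | ||
| name is our own label): | ||
|
|
||
| ```{.python} | ||
| ( | ||
|     biketrip.group_by(_.start_station_name) | ||
|     .agg(trips=_.count()) | ||
|     .order_by(ibis.desc("trips")) | ||
|     .limit(10) | ||
| ) | ||
| ``` | ||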
|
|
||
| **Top 10 end stations by trip count** | ||
|
|
||
| ```{python} | ||
| biketrip.end_station_name.topk(10) | ||
| ``` | ||
|
|
||
| ## Explore trip patterns by user type | ||
|
|
||
| We can also calculate the average trip duration and distance traveled for each | ||
| user type. According to the [data dictionary](https://ride.citibikenyc.com/system-data), user type can be "customer" or "subscriber" where: | ||
|
|
||
| - Customer = 24-hour pass or 3-day pass user | ||
| - Subscriber = Annual Member | ||
|
|
||
| ```{python} | ||
| biketrip.group_by(_.usertype).aggregate( | ||
|     avg_duration=_.tripduration.mean(), | ||
|     avg_distance=_.pickup_point.distance(_.dropoff_point).mean(), | ||
| ) | ||
| ``` | ||
|
|
||
| ## Analyzing efficiency: trip duration vs linear distance | ||
|
|
||
| The original tutorial defines the `efficiency_ratio` as | ||
| `trip_duration / linear_distance`, where a higher efficiency ratio could mean a | ||
| more direct route or faster travel times. | ||
|
|
||
| ```{python} | ||
| # linear distance | ||
| trip_distance = biketrip.pickup_point.distance(biketrip.dropoff_point) | ||
| biketrip = biketrip.mutate( | ||
|     linear_distance=trip_distance, | ||
|     efficiency_ratio=_.tripduration / trip_distance, | ||
| ) | ||
| biketrip[["pickup_point", "dropoff_point", "linear_distance", "efficiency_ratio"]] | ||
| ``` | ||
|
|
||
| Let's take a look at the table in descending order of `linear_distance`, for trips that are longer than 0 meters. | ||
|
|
||
| ```{python} | ||
| biketrip.filter(_.linear_distance > 0).order_by(ibis.desc("linear_distance")) | ||
| ``` | ||
|
|
||
| ## Analyzing bike trips within a 500 meters radius | ||
|
|
||
| In the original tutorial, the author chooses a point (the first point in the | ||
| table) and creates a buffer with a 500 m radius around it. In our table we | ||
| already have the points in meters, since we converted them in a previous query. | ||
|
|
||
| The following query shows all the bike trips whose pickup point falls within a | ||
| 500 meter radius of the first point of the table, which has `long=-74.00552427` | ||
| and `lat=40.71146364`. | ||
|
|
||
| ```{python} | ||
| # grab the first row of the data | ||
| first_point = biketrip.limit(1) | ||
| trips_within_500 = biketrip.filter( | ||
|     _.pickup_point.within(first_point.select(_.pickup_point.buffer(500)).to_array()) | ||
| ) | ||
| trips_within_500 | ||
| ``` | ||
|
|
||
| ## Acknowledgements and resources | ||
|
|
||
| Thank you to [Spatial Dev Guru](https://spatial-dev.guru/) for the amazing | ||
| tutorial showcasing DuckDB's spatial features. It was fun to replicate the | ||
| tutorial using Ibis. | ||
|
|
||
| If you are interested in learning more about Ibis-DuckDB geospatial support, | ||
| here is another blog post: [Ibis + DuckDB geospatial: a match made on Earth](https://ibis-project.org/posts/ibis-duckdb-geospatial/). | ||
|
|
||
| Here are some resources to learn more about Ibis: | ||
|
|
||
| - [Ibis Docs](https://ibis-project.org/) | ||
| - [Ibis GitHub](https://github.com/ibis-project/ibis) | ||
|
|
||
| Chat with us on Zulip: | ||
|
|
||
| - [Ibis Zulip Chat](https://ibis-project.zulipchat.com/) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| --- | ||
| title: "Building scalable data pipelines with Kedro" | ||
| author: "Cody" | ||
| date: "2024-01-31" | ||
| categories: | ||
| - blog | ||
| - kedro | ||
| - data engineering | ||
| --- | ||
|
|
||
| # Overview | ||
|
|
||
| [Kedro](https://kedro.org) is a toolbox for production-ready data science. It is | ||
| an open-source Python framework like Ibis, and together you can bring the | ||
| portability and scale of Ibis to the production-ready pipelines of Kedro. | ||
|
|
||
| > In your ~~Kedro~~ data journey, have you ever... | ||
| > | ||
| > ...slurped up large amounts of data into memory, instead of pushing execution down to the source database/engine? | ||
| > | ||
| > ...prototyped a node in pandas, and then rewritten it in PySpark/Snowpark/some other native dataframe API? | ||
| > | ||
| > ...implemented a proof-of-concept solution in 3-4 months on data extracts, and then struggled massively when you needed to move to running against the production databases and scale out? | ||
| > ... | ||
|
|
||
| If so, [read the full article on the Kedro | ||
| blog](https://kedro.org/blog/building-scalable-data-pipelines-with-kedro-and-ibis)! |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| --- | ||
| title: "Announcing Zulip for Ibis community chat" | ||
| author: "Ibis team" | ||
| date: "2024-01-04" | ||
| categories: | ||
| - blog | ||
| - chat | ||
| - community | ||
| --- | ||
|
|
||
| ## Overview | ||
|
|
||
| The Ibis project has moved to Zulip for its community chat! We've been testing | ||
| it out for a few months and are happy with the results. From the [Zulip | ||
| repository's README](https://github.com/zulip/zulip): | ||
|
|
||
| > Zulip is an open-source team collaboration tool with unique topic-based | ||
| > threading that combines the best of email and chat to make remote work | ||
| > productive and delightful. Fortune 500 companies, leading open source projects, | ||
| > and thousands of other organizations use Zulip every day. Zulip is the only | ||
| > modern team chat app that is designed for both live and asynchronous | ||
| > conversations. | ||
| > | ||
| > Zulip is built by a distributed community of developers from | ||
| > all around the world, with 74+ people who have each contributed 100+ commits. | ||
| > With over 1000 contributors merging over 500 commits a month, Zulip is the | ||
| > largest and fastest growing open source team chat project. | ||
|
|
||
| ## Benefits for Ibis users | ||
|
|
||
| GitHub issues remain the source of truth for work item tracking and bug reports, | ||
| while Zulip offers a more interactive chat experience. This is useful when | ||
| you're not sure if you've found a bug or just need help with something. It's | ||
| also a great place to ask questions about Ibis or get help with your code. | ||
|
|
||
| Zulip splits conversations into streams (like channels in Slack or Teams), but | ||
| uniquely requires each individual conversation to also have a topic. This makes | ||
| it easy to follow along with conversations that are relevant to you, and to find | ||
| conversations that you've participated in. | ||
|
|
||
| ## Next steps | ||
|
|
||
| [Join us on Zulip and introduce yourself!](https://ibis-project.zulipchat.com/) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,57 +1,91 @@ | ||
| --- | ||
| title: "Operation support matrix" | ||
| format: dashboard | ||
| hide: | ||
| - toc | ||
| --- | ||
|
|
||
| ```{python} | ||
| #| echo: false | ||
| !python ../gen_matrix.py | ||
| ``` | ||
|
|
||
| ```{python} | ||
| #| echo: false | ||
| import pandas as pd | ||
| support_matrix = pd.read_csv("./backends/raw_support_matrix.csv") | ||
| support_matrix = support_matrix.assign( | ||
|     Category=support_matrix.Operation.map(lambda op: op.rsplit(".", 1)[0].rsplit(".", 1)[-1]), | ||
|     Operation=support_matrix.Operation.map(lambda op: op.rsplit(".", 1)[-1]), | ||
| ).set_index(["Category", "Operation"]) | ||
| all_visible_ops_count = len(support_matrix) | ||
| coverage = pd.Index( | ||
|     support_matrix.sum() | ||
|     .map(lambda n: f"{n} ({round(100 * n / all_visible_ops_count)}%)") | ||
|     .T | ||
| ) | ||
| support_matrix.columns = pd.MultiIndex.from_tuples( | ||
|     list(zip(support_matrix.columns, coverage)), names=("Backend", "API coverage") | ||
| ) | ||
| support_matrix = support_matrix.replace({True: "✔", False: "🚫"}) | ||
| ``` | ||
|
|
||
| ## {height=25%} | ||
|
|
||
| ::: {.card title="Welcome to the operation support matrix!"} | ||
|
|
||
| This is a [Quarto dashboard](https://quarto.org/docs/dashboards/) that shows | ||
| the operations each backend supports. | ||
|
|
||
| Due to differences in SQL dialects and upstream support for different | ||
| operations in different backends, support for the full breadth of the Ibis API | ||
| varies. | ||
|
|
||
| ::: {.callout-tip} | ||
| Backends with low coverage are good places to start contributing! | ||
|
|
||
| Each backend implements operations differently, but the implementations are usually very similar across backends. If you want to start contributing to Ibis, adding missing operations to backends with low operation coverage is a good place to start. | ||
| ::: | ||
|
|
||
| ::: | ||
|
|
||
| ### {width=25%} | ||
|
|
||
| ```{python} | ||
| #| content: valuebox | ||
| #| title: "Number of backends" | ||
| import ibis | ||
| dict( | ||
|     value=len(ibis.util.backend_entry_points()), | ||
|     color="info", | ||
|     icon="signpost-split-fill", | ||
| ) | ||
| ``` | ||
|
|
||
| ### {width=25%} | ||
|
|
||
| ```{python} | ||
| #| content: valuebox | ||
| #| title: "Number of SQL backends" | ||
| import importlib | ||
| from ibis.backends.base.sql import BaseSQLBackend | ||
| sql_backends = sum( | ||
|     issubclass( | ||
|         importlib.import_module(f"ibis.backends.{entry_point.name}").Backend, | ||
|         BaseSQLBackend, | ||
|     ) | ||
|     for entry_point in ibis.util.backend_entry_points() | ||
| ) | ||
| dict(value=sql_backends, color="green", icon="database") | ||
| ``` | ||
|
|
||
| ## {height=70%} | ||
|
|
||
| ```{python} | ||
| from itables import show | ||
| show(support_matrix, ordering=False, paging=False, buttons=["copy", "excel", "csv"]) | ||
| ``` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,7 @@ | ||
| /*-- scss:defaults --*/ | ||
| $code-color: #c2d94c; | ||
| $code-bg: #2b2b2b; | ||
|
|
||
| thead.tableFloatingHeaderOriginal { | ||
| background-color: rgb(47, 47, 47); | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| toDate(parseDateTimeBestEffort('2009-05-17T12:34:56')) AS "TimestampTruncate(datetime.datetime(2009, 5, 17, 12, 34, 56))" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| toStartOfHour(parseDateTimeBestEffort('2009-05-17T12:34:56')) AS "TimestampTruncate(datetime.datetime(2009, 5, 17, 12, 34, 56))" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| toStartOfMinute(parseDateTimeBestEffort('2009-05-17T12:34:56')) AS "TimestampTruncate(datetime.datetime(2009, 5, 17, 12, 34, 56))" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| toStartOfMinute(parseDateTimeBestEffort('2009-05-17T12:34:56')) AS "TimestampTruncate(datetime.datetime(2009, 5, 17, 12, 34, 56))" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| toMonday(parseDateTimeBestEffort('2009-05-17T12:34:56')) AS "TimestampTruncate(datetime.datetime(2009, 5, 17, 12, 34, 56))" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| toStartOfYear(parseDateTimeBestEffort('2009-05-17T12:34:56')) AS "TimestampTruncate(datetime.datetime(2009, 5, 17, 12, 34, 56))" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| parseDateTime64BestEffort('2015-01-01T12:34:56.789321', 6) AS "datetime.datetime(2015, 1, 1, 12, 34, 56, 789321)" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| parseDateTime64BestEffort('2015-01-01T12:34:56.789321+00:00', 6, 'UTC') AS "datetime.datetime(2015, 1, 1, 12, 34, 56, 789321, tzinfo=tzutc())" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| parseDateTime64BestEffort('2015-01-01T12:34:56.789000', 3) AS "datetime.datetime(2015, 1, 1, 12, 34, 56, 789000)" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| parseDateTime64BestEffort('2015-01-01T12:34:56.789000+00:00', 3, 'UTC') AS "datetime.datetime(2015, 1, 1, 12, 34, 56, 789000, tzinfo=tzutc())" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| parseDateTimeBestEffort('2015-01-01T12:34:56') AS "datetime.datetime(2015, 1, 1, 12, 34, 56)" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| parseDateTimeBestEffort('2015-01-01T12:34:56') AS "datetime.datetime(2015, 1, 1, 12, 34, 56)" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| SELECT | ||
| parseDateTimeBestEffort('2015-01-01T12:34:56') AS "datetime.datetime(2015, 1, 1, 12, 34, 56)" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| SELECT | ||
| CAST('1.0' AS REAL) AS "Cast('1.0', float32)" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| SELECT | ||
| CAST('1.0' AS DOUBLE) AS "Cast('1.0', float64)" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,3 @@ | ||
| SELECT | ||
| ST_DWITHIN(t0.geom, t0.geom, CAST(3.0 AS DOUBLE)) AS tmp | ||
| FROM t AS t0 |