hadarsharon/grizzlys

grizzlys


Code style: Ruff Linting: Ruff pre-commit

grizzlys: User-friendly Python DataFrames powered by Julia

grizzlys is a Python package that provides a native interface on top of Julia's popular DataFrames.jl package.

As a user-friendly alternative to existing Python packages such as pandas and polars, it is designed to be a convenient, easy-to-use DataFrame tool for data analysts, data engineers and data scientists alike, while still providing high performance and convenient abstractions, thanks to Julia's high-performance computing capabilities.

Why you might consider using grizzlys

✅ You are transitioning into Python from a Julia or R programming background

✅ You are accustomed to working with Jupyter notebooks (or a REPL) and performing exploratory data analysis (EDA) on-the-fly

✅ You need a quick-and-dirty data wrangling tool that provides readymade macros and convenience functions out of the box

✅ You work with statistics or linear algebra often and require a wide range of statistical/algebraic functions to be well-integrated with your DataFrames

What is grizzlys (currently) NOT well-suited for

❌ Larger-than-memory datasets - grizzlys' current implementation relies on data being stored in memory, so it is not a good choice if you work with datasets that don't fit in your machine's RAM.

For such cases, using Polars or Dask DataFrames would be a much better choice as of now.

❌ Lazy Evaluation - Similar to the above, grizzlys is currently designed to be fully eager, which means it executes your code immediately rather than building a task/computation graph and deferring execution until the result is needed.

❌ Backwards compatibility - grizzlys is built on Julia, a relatively new programming language, and is developed against a recent version of Python, with little regard for end-of-life versions or compatibility with Python 2.7, for example.

You should therefore not rely on grizzlys for integrations with very old code or any other legacy/deprecated tools and implementations.

❌ Best-in-class Performance - Though Julia is widely considered a very high-performance language (it is actually a major reason why it's used under the hood here), grizzlys is still a work-in-progress (WIP) and therefore does not currently aim to compete with, or outperform, other high-performance DataFrame libraries, such as Polars (written in Rust) or Modin (Multi-threaded pandas).

This, of course, may no longer be a limitation in the future, once grizzlys has matured and undergone further optimization.
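
To make the eager-versus-lazy distinction from the list above more concrete, here is a minimal sketch in plain Python. It does not use the grizzlys API; `eager_pipeline` and `LazyPipeline` are hypothetical names invented purely for illustration. The eager function runs every step as soon as it is written, while the lazy class only records steps and runs them when `collect()` is called.

```python
# Illustrative only: plain Python, NOT the grizzlys API.
# `eager_pipeline` and `LazyPipeline` are hypothetical names.
from typing import Callable, Iterable


def eager_pipeline(values: Iterable[int]) -> list[int]:
    """Eager: each step runs immediately and materializes its result."""
    doubled = [v * 2 for v in values]      # computed right away
    return [v for v in doubled if v > 4]   # computed right away


class LazyPipeline:
    """Lazy: steps are recorded and only executed by collect()."""

    def __init__(self, values: Iterable[int]) -> None:
        self._values = values
        self._steps: list[Callable[[Iterable[int]], Iterable[int]]] = []

    def map(self, fn: Callable[[int], int]) -> "LazyPipeline":
        # Record the step; the generator is not built until collect().
        self._steps.append(lambda it: (fn(v) for v in it))
        return self

    def filter(self, pred: Callable[[int], bool]) -> "LazyPipeline":
        self._steps.append(lambda it: (v for v in it if pred(v)))
        return self

    def collect(self) -> list[int]:
        # Chain the recorded steps, then pull all values through them.
        result: Iterable[int] = self._values
        for step in self._steps:
            result = step(result)
        return list(result)


eager_result = eager_pipeline([1, 2, 3, 4])
lazy_result = (
    LazyPipeline([1, 2, 3, 4])
    .map(lambda v: v * 2)
    .filter(lambda v: v > 4)
    .collect()
)
```

Both pipelines produce the same values; the difference is when the work happens. Libraries with a lazy mode typically use that delay to optimize or stream the recorded plan, which is the capability grizzlys does not currently offer.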
