View sspaeti's full-sized avatar
🖲️
tinkering

Sponsoring

@neovim

Organizations

@ssp-data

Stars

datanengineering

61 repositories

DuckDB is an analytical in-process SQL database management system

C++ 36,630 3,006 Updated Mar 13, 2026

The Data Engineering Cookbook

Python 14,989 2,697 Updated Jan 17, 2026

The Auron accelerator for distributed computing frameworks (e.g., Spark) leverages native vectorized execution to accelerate query processing

Rust 1,727 210 Updated Mar 13, 2026

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Python 8,492 603 Updated Mar 1, 2026

Modin: Scale your Pandas workflows by changing a single line of code

Python 10,365 676 Updated Feb 10, 2026

Database connectivity API standard and libraries for Apache Arrow

C# 569 181 Updated Mar 13, 2026

Zero-ETL, infinite possibilities. Live query APIs, code & more with SQL. No DB required.

Go 7,728 327 Updated Mar 10, 2026

data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

Python 5,038 468 Updated Mar 13, 2026

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.

Go 814 60 Updated Mar 12, 2026

the portable Python dataframe library

Python 6,451 708 Updated Mar 13, 2026

Dagster Labs' open-source data platform, built with Dagster.

Python 444 41 Updated Mar 10, 2026

The SQL IDE for Your Terminal.

Python 5,857 138 Updated Mar 12, 2026

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

Java 1,168 198 Updated Feb 23, 2026

This is a list of links to different freely available learning resources about computer programming, math, and science.

1,867 136 Updated Dec 11, 2025

This is a repo with links to everything you'd ever want to learn about data engineering

Jupyter Notebook 40,488 7,705 Updated Feb 26, 2026

Fastest library to load data from DB to DataFrames in Rust and Python

Rust 2,572 206 Updated Mar 11, 2026

PRQL as a DuckDB extension

C++ 319 9 Updated Sep 22, 2025

This is the starter workspace for HelloDATA BE.

Python 9 2 Updated Jun 19, 2024

Firefox extension that shows the Parquet schema when going over GCP Cloud Storage. Uses DuckDB WASM.

JavaScript 12 1 Updated Jan 19, 2024

Support for generating modern platforms dynamically with services such as Kafka, Spark, Streamsets, HDFS, ....

Jinja 80 17 Updated Mar 5, 2026

All things awesome related to Dagster!

142 19 Updated Jan 20, 2026

Python SQL Parser and Transpiler

Python 9,021 1,085 Updated Mar 13, 2026

Dagster scikit-learn pipeline example.

Python 46 9 Updated Mar 18, 2023

End-to-end data engineering project to get insights from PyPI using Python, DuckDB, MotherDuck & Evidence

TypeScript 234 36 Updated Mar 12, 2026

Free, simple, and intuitive online database diagram editor and SQL generator.

JavaScript 36,879 2,946 Updated Mar 12, 2026

The property-based testing library for Python

Python 8,495 635 Updated Mar 9, 2026

The best place to learn data engineering. Built and maintained by the data engineering community.

CSS 1,905 230 Updated Jan 31, 2026

Turning PySpark Into a Universal DataFrame API

Python 494 24 Updated Mar 9, 2026