Stars

DSci@cli #ddj

data driven journos
270 repositories

Extremely fast Query Engine for DataFrames, written in Rust

Rust 37,682 2,676 Updated Mar 11, 2026

Open-source scientific and technical publishing system built on Pandoc.

JavaScript 5,382 416 Updated Mar 10, 2026

A self-improving embodied conversational agent seamlessly integrated into the operating system to automate our daily tasks.

Python 1,756 192 Updated Sep 9, 2024

Search and browse documents and data; find the people and companies you look for.

JavaScript 4 Updated Jun 12, 2023

All files of thecleverprogrammer.com

Jupyter Notebook 126 188 Updated Jul 7, 2025

A curated list of Polars talks, tools, examples & articles. Contributions welcome!

1,066 52 Updated Mar 9, 2026

Data validation using Python type hints

Python 27,222 2,487 Updated Mar 10, 2026

🎯 Personal data science and machine learning toolbox

Python 366 74 Updated Feb 4, 2020

Breaking Into Data Handbook

367 55 Updated Jun 29, 2024

Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here 👇🏼

Jupyter Notebook 39,007 7,839 Updated Mar 5, 2026

qpdf: A content-preserving PDF document transformer

C++ 4,819 361 Updated Mar 9, 2026

🌊 Online machine learning in Python

Python 5,746 609 Updated Mar 9, 2026

Financial datasets for LLMs 🧪

Python 409 66 Updated May 27, 2024

Simple VTXXX-compatible Linux terminal emulator

Python 721 117 Updated Sep 2, 2025

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 19,415 1,332 Updated Mar 1, 2026

ACLED v5 (1997-2014) Conflict Dataset (http://www.acleddata.com/data/version-5-data-1997-2014) Visualization

JavaScript 1 Updated Apr 12, 2023

An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.

Go 1,308 168 Updated Mar 9, 2026

🚀 💸 Easily build, backtest and deploy your algo in just a few lines of code. Trade stocks, cryptos, and forex across exchanges w/ one package.

Python 2,417 309 Updated Dec 30, 2024

Introduction to the Command Line for Genomics

67 197 Updated Mar 10, 2026

Data Cleaning with OpenRefine for Ecologists

29 111 Updated Mar 10, 2026

OpenRefine for Social Science Data

25 47 Updated Mar 10, 2026

Label, clean and enrich text datasets with LLMs.

Python 2,303 159 Updated Mar 5, 2025

TerminusDB is a distributed, collaborative database designed for building, sharing, versioning, and reasoning on structured data.

Prolog 3,213 130 Updated Mar 9, 2026

An Open-Source Package for Textual Adversarial Attack.

Python 773 130 Updated Jul 20, 2023

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Python 44,571 16,638 Updated Mar 10, 2026

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,939 247 Updated Mar 4, 2026

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

Jupyter Notebook 2,425 175 Updated Mar 8, 2026

Python library for building highly effective data science workflows

Python 947 71 Updated Jul 20, 2023

Synthetic data generators for tabular and time-series data

Jupyter Notebook 1,613 257 Updated Mar 2, 2026

Practical active learning in Python

Python 192 18 Updated Sep 22, 2022