- Hong Kong
- https://x.com/Andrew_WXY
DSci@cli #ddj
Extremely fast Query Engine for DataFrames, written in Rust
Open-source scientific and technical publishing system built on Pandoc.
A self-improving embodied conversational agent, seamlessly integrated into the operating system, that automates our daily tasks.
Search and browse documents and data; find the people and companies you look for.
All files of thecleverprogrammer.com
A curated list of Polars talks, tools, examples & articles. Contributions welcome!
Data validation using Python type hints
Personal data science and machine learning toolbox
Data Engineering Zoomcamp is a free 9-week course on building production-ready data pipelines. The next cohort starts in January 2026. Join the course here
OCR, layout analysis, reading order, table recognition in 90+ languages
ACLED v5 (1997-2014) Conflict Dataset (http://www.acleddata.com/data/version-5-data-1997-2014) Visualization
An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.
Easily build, backtest and deploy your algo in just a few lines of code. Trade stocks, cryptos, and forex across exchanges w/ one package.
Introduction to the Command Line for Genomics
Data Cleaning with OpenRefine for Ecologists
Label, clean and enrich text datasets with LLMs.
TerminusDB is a distributed, collaborative database designed for building, sharing, versioning, and reasoning on structured data.
An Open-Source Package for Textual Adversarial Attack.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Python library for building highly effective data science workflows
Synthetic data generators for tabular and time-series data