
Organizations

@techforelissa @slanglab @jupyterlab @xnd-project @webview-crypto @egraphs-good
saulshanabrook/README.md

👋 Hi, my name is Saul Shanabrook. 👋

πŸ’ Welcome to my website! πŸ’
🔗 Here lies a collection of "internet links." 🔗
🗂 I have helpfully arranged them into categories. 🗂
🍾 I hope you enjoy! 🍾

📄 Oh also, if you are looking for my resume, here it is 📄

💌 Contact 💌

Feel free to reach out to me via email, google meet, twitter, mastodon or github.

πŸ’ Nice Things πŸ’

a database of edible plants

nonprofit helping create public access food forests in western MA

an inspiring model of using community control to prevent gentrification and create affordable resident controlled housing

a v. fun video on some great alternative options on community stewardship

a cool tool to help build replacement systems without having to worry about rule order 😱.

🏑 Things I have worked on 🏑

a library to use e-graphs in Python for building expressive DSLs and optimizing code

a project with some friends to find a place to live, do fun things, and try something out

an iOS app I started to help people become better friends with plants near them

a library to use pattern matching and type analysis to build safe DSLs in Python, in order to allow scientific computing libraries to better collaborate and share key abstractions.

provides a friendly isomorphic representation of Python's bytecode objects

an open source data science IDE in your browser. I was a core maintainer for a while and helped on a variety of extensions as well

a Python code analysis tool, which helps productionize data science code by building a DAG of Python code

🔄 Links to Links 🔄

my new blog posts on Github Discussions

my old blog posts on my previous statically generated website

🎭 Talks 🎭

Now that I have this great e-graph library in Python, what extra mechanisms do I need to make it useful in existing Python code?

This talk will go through a few of the techniques developed and also point to how bringing in use cases from scientific Python can help drive further theoretical research.

EGRAPHS Community - Lightning Talks

November 3rd, 2023: egglog: e-graphs in Python

PyData NYC '23 Lightning Talk

August 1st, 2023: egglog: E-Graphs in Python

The PyData ecosystem is home to one of the largest and most successful open source communities. It's both where most newcomers to data science start and also where cutting edge research takes place. It has been able to support the diverse needs of its users through its decentralized nature, promoting creativity and collaboration.

As the size of data has increased and our compute has moved off of our single CPUs, the nature of libraries has evolved. Whereas in the past client code would generally call out to fast pre-compiled libraries (SciPy, NumPy, etc.), now it often works via calls to a variety of distributed, out-of-core, and specialized compilation and computation backends (PyTorch, Dask, Numba, Ibis, etc.). This means a growing number of libraries do not eagerly execute a computation in the CPython interpreter, but instead optimize and translate it to some other target.

At a high level, we can see this ecosystem as a large decentralized, embedded, domain-specific compiler, translating from high-level user expressions to different low-level primitives. This calls for an exploration of tooling to help enable this translation of programs between different representations, to facilitate the efficient use of code across this distributed ecosystem.

One approach to automating this translation among different representations is the rewriting technique called “equality saturation.” This allows us to construct a data structure of equivalent programs (an ‘e-graph’), and then search that space for a functionally-equivalent program that has desirable characteristics such as improved performance or memory efficiency. Building this translation tooling once can enhance sharing and collaboration between the libraries which use it.

In this talk, Saul Shanabrook goes over how e-graphs work, how they were developed, and different ways they can be used in the PyData ecosystem. Saul also surveys the egglog library, which is one specific tool for using e-graphs in Python.
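The equality-saturation idea in the abstract above can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: it does not use the egglog API, and instead of a real e-graph (which shares equivalent sub-terms compactly) it naively stores every equivalent term in a set. The function names (`rewrites`, `neighbors`, `saturate`, `extract`), the tuple encoding of terms, and the four rewrite rules are all hypothetical choices made for illustration.

```python
# Terms are nested tuples such as ("mul", x, y), or leaves (a str name or an int).

def rewrites(term):
    """Yield terms equal to `term` by applying one rule at its root."""
    if not isinstance(term, tuple):
        return
    op, a, b = term
    if op == "mul" and b == 2:
        yield ("shl", a, 1)                        # x * 2 -> x << 1
    if op == "mul" and b == 1:
        yield a                                    # x * 1 -> x
    if op == "div" and isinstance(a, tuple) and a[0] == "mul":
        yield ("mul", a[1], ("div", a[2], b))      # (x * y) / z -> x * (y / z)
    if op == "div" and isinstance(a, int) and a == b and a != 0:
        yield 1                                    # c / c -> 1 for a nonzero constant c


def neighbors(term):
    """Yield every term reachable by one rewrite anywhere inside `term`."""
    yield from rewrites(term)
    if isinstance(term, tuple):
        op, a, b = term
        for a2 in neighbors(a):
            yield (op, a2, b)
        for b2 in neighbors(b):
            yield (op, a, b2)


def saturate(start, max_iters=10):
    """Grow the set of equivalent terms until no rule adds anything new."""
    seen = {start}
    frontier = {start}
    for _ in range(max_iters):
        new = {t for f in frontier for t in neighbors(f)} - seen
        if not new:
            break
        seen |= new
        frontier = new
    return seen


def cost(term):
    """Count the nodes in a term; smaller is treated as better."""
    if isinstance(term, tuple):
        return 1 + cost(term[1]) + cost(term[2])
    return 1


def extract(terms):
    """Pick the cheapest term from a set of equivalent terms."""
    return min(terms, key=cost)


start = ("div", ("mul", "a", 2), 2)    # (a * 2) / 2
equivalent = saturate(start)
best = extract(equivalent)
```

Because every rewrite is kept rather than applied destructively, turning `a * 2` into `a << 1` does not hide the later simplification to `a`. Choosing among the equivalent terms is left to the cost function at the end.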

OpenTeams Technical Talk

Altair is a lovely tool that lets you build up complex interactive charts in Python. Ibis is also a lovely tool that lets you use a Pandas-like API to compose SQL expressions in OmniSci and other backends. By tying them together you can use the familiar syntax of Pandas, combined with the expressive power of Vega and Vega Lite, to visualize large amounts of data stored in OmniSci. This talk will walk through a number of examples of using this pipeline and then go through how it works.

The OmniSci summer sessions

metadsl is a Python framework for writing APIs that are detached from how they are executed. With it we can write framework-agnostic definitions of concepts like "arrays" and compile them to backends like Tensorflow or LLVM. In this talk, we will use metadsl to build high performance scientific computing libraries.

PyData Austin 2019

Can the Python data science and scientific computing ecosystem remain in the hands of community open source projects? Or will increasingly complex performance and hardware requirements leave room only for vertically integrated corporate sponsored projects?

PyData New York 2019

Efficient array computing is required to continue advances in fields like IoT and AI. We demonstrate a system, uarray, that does array computation generically and targets different backends. We rely on a Mathematics of Arrays, a theory of shapes and indexing, to reduce array expressions. As a result, temporary arrays and unneeded calculations are eliminated leading to minimal memory and CPU usage.

PyData Washington DC 2018

source @ github.com/saulshanabrook/saulshanabrook

Pinned

  1. jupyterlab/jupyterlab Public

    JupyterLab computational environment.

    TypeScript 14k 3.2k

  2. Quansight/ibis-vega-transform Public

    @vega transforms with @ibis-project expressions

    Python 29 7

  3. LineaLabs/lineapy Public

    Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.

    Jupyter Notebook 663 57

  4. saulshanabrook Public

    personal website and discussions

    HTML 6

  5. data-apis/python-record-api Public

    Inferring Python API signatures from tracing usage.

    Python 75 6