👋 I'm Simon.
Currently, I'm a PhD student at the Berkeley Sky Computing Lab, working on machine learning systems and cloud infrastructure. I am advised by Prof. Joseph Gonzalez.
My latest focus is building an end-to-end stack for LLM inference on your own infrastructure. This work includes:
- vLLM runs LLM inference efficiently.
- Conex builds, pushes, and pulls containers fast.
- SkyATC orchestrates LLMs across multiple clouds and scales them to zero.
I previously worked on model serving systems @anyscale.
- Ray takes your Python code and scales it to thousands of cores.
- Ray Serve empowers data scientists to own their end-to-end inference APIs.
Before Anyscale, I was an undergraduate researcher @ucbrise:
- VLDB 2024: RALF: Accuracy-Aware Scheduling for Feature Store Maintenance proposes that feature updates in feature stores can be made much more efficient.
- SoCC 2020: InferLine: ML Inference Pipeline Composition Framework studies how to optimize model serving pipelines.
- VLDB 2020: Towards Scalable Dataframe Systems formalizes Pandas DataFrame.
- SysML Workshop @ NeurIPS 2018: The OoO VLIW JIT Compiler for GPU Inference multiplexes many kernels on the same GPU.
Reach out to me: simon.mo at hey.com