Currently, I'm a PhD student at the Berkeley Sky Computing Lab, working on machine learning systems and cloud infrastructure. I am advised by Prof. Joseph Gonzalez.
My latest focus is building an end-to-end stack for LLM inference on your own infrastructure.
I previously worked on the Model Serving System @anyscale.
- Ray takes your Python code and scales it to thousands of cores.
- Ray Serve empowers data scientists to own their end-to-end inference APIs.
Before Anyscale, I was a student researcher @ucbrise:
- SoCC 2020: InferLine: ML Inference Pipeline Composition Framework, which studies how to optimize model serving pipelines.
- VLDB 2020: Towards Scalable Dataframe Systems, which formalizes the Pandas DataFrame.
- The OoO VLIW JIT Compiler for GPU Inference, which explores multiplexing many kernels on the same GPU.
Reach out to me: simon.mo at hey.com