# **Jacob Nelson**

PO Box 95415, Seattle, WA 98145 | (206) 659-9683 | jacob@jfet.net
Full CV and research statement available at:
 http://nelsonje.github.io

### **EMPLOYMENT**

- Computer Science and Engineering, University of Washington, Seattle, WA. Postdoctoral Research Associate, January 2015–present. Research Assistant, September 2006–December 2014.
  - Led team of 3 grad students in building open-source distributed computing system, Grappa (my thesis work).
    - C++ library that presents a latency-tolerant distributed shared memory abstraction to the programmer.
    - Aggregating network layer provides >10x small-message performance vs. native RDMA network.
    - Memory-efficient user-level threading layer supports thousands of concurrent threads per core in cluster.
    - Beat Spark performance by 10x on some benchmarks.
    - Users include researchers at Intel, Cray, Pacific Northwest National Laboratory, UW-Tacoma, CU-Boulder.
    - Won best paper award at USENIX ATC 2015.
  - Co-advised grad student on FPGA-based embedded neural network accelerator project, SNNAP.
    - Idea was to save energy executing functions that don't require perfect correctness.
    - We built a coherent shared-memory interface for communication between ARM CPU and accelerator.
    - 3.8x performance improvement, 2.8x energy savings compared to CPU-only benchmarks.
  - Currently co-advising two grad students on in-network computation project.
    - Idea is to augment datacenter switches with restricted compute capability for stateful user code.
    - Reduces network latency, trip count for apps like key/value stores, machine learning parameter servers.
    - Prototype implementation uses Cavium Octeon II network processors, XPliant programmable switch.
  - Other recent projects include a new distributed sparse linear solver using Grappa, and a patent analytics framework using Hadoop and Grappa.
  - Mentored a total of 7 undergraduates and 5 graduate students, leading to publications.
  - Built and supported 40-node InfiniBand cluster for multiple research groups.
- Cray, Seattle, WA.

Consultant, September 2015–present.

- Porting distributed graph database to run on Grappa for portability, enabling new product.

Intern, Summer 2009.

- Runtime work for the Chapel parallel programming language.
  - Fixed signal safety and termination bugs in task scheduling layer.
  - This enabled standard performance analysis tools to be used with this new language.
- Konvac, Seattle, WA.

Co-founder, January 2013–January 2014.

- Startup exploring commercial applications of Grappa. Led to open-source effort at grappa.io.
- Google, Mountain View, CA and Seattle, WA.

Intern, Summer 2010, October 2010–September 2011 (part time).

- Analyzed performance of early version of Google Compute Engine.
  - Explored multithreaded, shared memory performance using Parsec benchmarks and sampling profiler.
  - Compared performance of KVM/QEMU-based virtual machine monitor with existing container-based environment.

Intern, Summer 2007, Summer 2008.

- Designed and built FPGA-based accelerator hardware.
  - Designed microarchitecture, prototype for FPGA using Bluespec and Verilog.
  - Supervised work of PCB/logic design contract firm, ensuring interfaces matched specification and were debuggable.
  - Goal was ~10x better performance per watt per dollar, or performance per square foot per dollar.
- Amazon, Seattle, WA.

Software Development Engineer, 2005–2006.

- Implemented new supply chain optimization algorithms, reducing cost to ship orders.
- Some web frontend work.
- XKL, Redmond, WA.

Member Technical Staff, Hardware, 2001–2004.

- Designed control processors for high-end networking hardware, including large core Internet routers and dense wavelength-division multiplexers.
  - Spearheaded new processor design, enabling new product line that is still being sold.
  - Led CPU team, systems and network administration team.

### **EDUCATION**

University of Washington, Seattle, WA.

Ph.D. in Computer Science, December 2014.

Thesis: Latency-Tolerant Distributed Shared Memory For Data-Intensive Applications.

M.S. in Computer Science, June 2009.

Pacific Lutheran University, Tacoma, WA.

B.S. in Computer Engineering and Math, May 2000.

### **AWARDS AND HONORS**

- 2015 USENIX ATC Best Paper Award
- Paper from MICRO 2013 invited for fast-track inclusion in ACM TOCS
- 2011 HPC Advisory Council University Award
- 2008 Bob Bandes Memorial Award for Excellence in Teaching

### **SELECTED PUBLICATIONS**

- 1. Jacob Nelson, Brandon Holt, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, Mark Oskin. Latency-Tolerant Software Distributed Shared Memory. *USENIX Annual Technical Conference*, July 2015. Best Paper award.
- 2. Rob F. Van Der Wijngaart, Abdullah Kayi, Jeff R. Hammond, Timothy G. Mattson, Gabriele Jost, Tom St. John, Srinivas Sridharan, John Abercrombie, Jacob Nelson. Comparing Runtime Systems with Exascale Ambitions Using the Parallel Research Kernels. To appear in *International Conference on Supercomputing (ISC)*, June 2016.
- 3. Golnoosh Farnadi, Zeinab Mahdavifar, Ivan Keller, Jacob Nelson, Ankur Teredesai, Marie-Francine Moens, Martine De Cock. Scalable Adaptive Label Propagation in Grappa. Special Session on Intelligent Mining, IEEE Big Data 2015, October 2015.
- 4. Rob F. Van der Wijngaart, Srinivas Sridharan, Abdullah Kayi, Gabriele Jost, Jeff R. Hammond, Timothy G. Mattson, Jacob Nelson. Using the Parallel Research Kernels to Study PGAS Runtimes. *International Conference on PGAS Programming Models (PGAS)*, September 2015.
- 5. Vincent T. Lee, Jacob Nelson, Mark Oskin, Luis Ceze. A 10G NetFPGA Prototype for In-Network Aggregation. Workshop on Architectural Research Prototyping (WARP w/ISCA), June 2015.
- 6. Thierry Moreau, Mark Wyse, Jacob Nelson, Adrian Sampson, Hadi Esmaeilzadeh, Luis Ceze, Mark Oskin. SNNAP: Approximate Computing on Programmable SoCs via Neural Acceleration. *International Symposium on High-Performance Computer Architecture (HPCA)*, February 2015.
- 7. Brandon Myers, Dan Halperin, Jacob Nelson, Mark Oskin, and Bill Howe. Radish: Compiling Efficient Query Plans for Distributed Shared Memory. UW CSE Tech Report 14-10-01, 2014.
- 8. Jacob Nelson, Brandon Holt, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, Mark Oskin. Grappa: A Latency-Tolerant Runtime for Large-Scale Irregular Applications. *International Workshop on Rack-Scale Computing (WRSC w/EuroSys)*, April 2014.
- 9. Adrian Sampson, Jacob Nelson, Karin Strauss, Luis Ceze. Approximate Storage in Solid-State Memories. *International Symposium on Microarchitecture (MICRO)*, December 2013. Selected to appear as an expanded version in ACM TOCS.
- 10. Brandon Holt, Jacob Nelson, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, Mark Oskin. Flat Combining Synchronized Global Data Structures. *International Conference on PGAS Programming Models (PGAS)*, October 2013.
- 11. Jacob Nelson, Brandon Holt, Brandon Myers, Preston Briggs, Luis Ceze, Simon Kahan, Mark Oskin. Pomace: A Grappa for Non-Volatile Memory. *Non-Volatile Memories Workshop*, March 2013.
- 12. Jacob Nelson, Brandon Myers, A. H. Hunter, Preston Briggs, Luis Ceze, Carl Ebeling, Dan Grossman, Simon Kahan, Mark Oskin. Crunching Large Graphs With Commodity Processors. *USENIX Hot Topics in Parallelism (HotPar)*, June 2011.
- 13. Jacob Nelson, Adrian Sampson, and Luis Ceze. Dense Approximate Storage in Phase-Change Memory. Ideas and Perspectives session, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2011.
- 14. Joseph Devietti, Jacob Nelson, Tom Bergan, Luis Ceze, Dan Grossman. RCDC: A Relaxed Consistency Deterministic Computer. *International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)*, March 2011.
- 15. Jacob Nelson, Luis Ceze. Dynamic Concurrency Discovery for Very Large Windows of Execution. Workshop on Parallel Execution of Sequential Programs on Multi-core Architectures (PESPMA w/ ISCA), June 2009.

### **PATENTS**

- Luis Ceze, Tom Bergan, Joseph Devietti, Dan Grossman, Jacob Nelson. Systems and Methods for Providing Deterministic Execution. US9146746, issued September 2015.

### **OTHER**

- Taught and TAed grad and undergrad classes in computer architecture, digital logic design.
- Program committee member, Workshop on Irregular Applications: Architectures and Algorithms, 2012 and 2015.
- Member of Industrial Advisory Board for Pacific Lutheran University's Department of Computer Science and Computer Engineering.
- Member of ACM, IEEE, USENIX.