# Xiaowei Ren

Senior Deep Learning Architect, NVIDIA Corporation

2788 San Tomas Expy, Santa Clara, CA 95051

Cell: (+1)408-628-8965

Homepage: https://ericrxw.github.io/xiaoweiren/

Email: xren@nvidia.com, renxiaowei66@gmail.com

## Education

Sept. 2015 – Oct. 2020 PhD in Computer Engineering, University of British Columbia, Canada

Dissertation: Efficient Synchronization Mechanisms for Scalable GPU Architectures

Sept. 2012 – Jun. 2015 M.Sc. in Computer Engineering, Xi'an Jiaotong University, China

Thesis: Parallel Acceleration Algorithms and FPGA Implementation for KLMS and KAP

Sept. 2008 – Jun. 2012 B.Sc. in Electronic Engineering, Xi'an Jiaotong University, China

## Professional Experience

| Period                 | Position                | Organization                                       |
|------------------------|-------------------------|----------------------------------------------------|
| Jun. 2021 – Present    | Senior Architect        | NVIDIA, Canada and USA                             |
| Nov. 2020 – Jun. 2021  | Postdoc Research Fellow | University of British Columbia, Canada             |
| Sept. 2015 – Oct. 2020 | Research Assistant      | University of British Columbia, Canada             |
| Sept. 2019 – Nov. 2019 | Research Intern         | Max Planck Institute for Software Systems, Germany |
| Aug. 2018 – Nov. 2018  | Research Intern         | NVIDIA, USA                                        |
| May 2017 – Aug. 2017   | Research Intern         | NVIDIA, USA                                        |
| Sept. 2012 – Jun. 2015 | Research Assistant      | Xi'an Jiaotong University, China                   |
| Jul. 2011 – Sept. 2011 | Undergraduate Intern    | ICT, Chinese Academy of Sciences, China            |

## Publications

- Michalis Kokologiannakis, **Xiaowei Ren**, and Viktor Vafeiadis. "Dynamic Partial Order Reductions for Spinloops", *Annual International Conference of Formal Methods in Computer-Aided Design (FMCAD)*, Yale University, Connecticut, USA, October 2021.
- **Xiaowei Ren** and Mieszko Lis. "CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition", *27th International Symposium on High Performance Computer Architecture (HPCA)*, Seoul, South Korea, February 2021. (acceptance rate: 63/258 = 24.4%)
- Dingqing Yang, Amin Ghasemazar\*, **Xiaowei Ren**\*, Maximilian Golub, Guy Lemieux, and Mieszko Lis. "Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training", *53rd International Symposium on Microarchitecture (MICRO)*, Athens, Greece, October 2020. (acceptance rate: 82/424 = 19.3%, \*equal contribution)
- **Xiaowei Ren**, Daniel Lustig, Evgeny Bolotin, Aamer Jaleel, Oreste Villa, and David Nellans. "HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems", *26th International Symposium on High Performance Computer Architecture (HPCA)*, San Diego, USA, February 2020. (acceptance rate: 48/248 = 19.4%)
- **Xiaowei Ren** and Mieszko Lis. "High-Performance GPU Transactional Memory via Eager Conflict Detection", *24th International Symposium on High Performance Computer Architecture (HPCA)*, Vienna, Austria, February 2018. (acceptance rate: 54/260 = 20.8%)
- **Xiaowei Ren** and Mieszko Lis. "Efficient Sequential Consistency in GPUs via Relativistic Cache Coherence", *23rd International Symposium on High Performance Computer Architecture (HPCA)*, Austin, USA, February 2017. (acceptance rate: 50/224 = 22.3%)
- Pengju Ren, **Xiaowei Ren**, Sudhanshu Sane, Michel A. Kinsy, and Nanning Zheng. "A Deadlock-Free and Connectivity-Guaranteed Methodology for Achieving Fault-tolerance in On-Chip Networks", *IEEE Transactions on Computers (TC)*, 2016.
- **Xiaowei Ren**, Qihang Yu, Badong Chen, Nanning Zheng, and Pengju Ren. "A Reconfigurable Parallel Accelerator for the Kernel Affine Projection Algorithm", *IEEE International Conference on Digital Signal Processing (DSP)*, Singapore, July 2015.
- **Xiaowei Ren**, Qihang Yu, Badong Chen, Nanning Zheng, and Pengju Ren. "A 128-way FPGA Platform for the Acceleration of KLMS Algorithm", *Asia and South Pacific Design Automation Conference (ASP-DAC)*, Tokyo, Japan, January 2015. (University LSI Design Contest)
- **Xiaowei Ren**, Qihang Yu, Badong Chen, Nanning Zheng, and Pengju Ren. "A Real-time Permutation Entropy Computation for EEG Signals", *Asia and South Pacific Design Automation Conference (ASP-DAC)*, Tokyo, Japan, January 2015. (University LSI Design Contest)
- **Xiaowei Ren**, Pengju Ren, Badong Chen, Jose C. Principe, and Nanning Zheng. "A Reconfigurable Parallel Acceleration Platform for Evaluation of Permutation Entropy", *36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)*, Chicago, USA, August 2014.
- **Xiaowei Ren**, Pengju Ren, Badong Chen, Tai Min, and Nanning Zheng. "Hardware implementation of KLMS Algorithm using FPGA", *International Joint Conference on Neural Networks (IJCNN)*, Beijing, China, July 2014.
- Pengju Ren, Qingxin Meng, **Xiaowei Ren**, and Nanning Zheng. "Fault-tolerant Routing for On-chip Network without Using Virtual Channel", *ACM/EDAC/IEEE Design Automation Conference (DAC)*, San Francisco, USA, June 2014. (acceptance rate: 3150/10963 = 29%)

## Talks & Presentations

- Oral, "CHOPIN: Scalable Graphics Rendering in Multi-GPU Systems via Parallel Image Composition", *HPCA*, Global Virtual Event, February 2021.
- Oral, "HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems", *HPCA*, San Diego, USA, February 2020.
- Oral, "High-Performance GPU Transactional Memory via Eager Conflict Detection", *HPCA*, Vienna, Austria, February 2018.
- Oral, "Efficient Sequential Consistency in GPUs via Relativistic Cache Coherence", *HPCA*, Austin, USA, February 2017.
- Oral and Poster, "A 128-way FPGA Platform for the Acceleration of KLMS Algorithm", *ASP-DAC*, Tokyo, Japan, January 2015.
- Oral and Poster, "A Real-time Permutation Entropy Computation for EEG Signals", *ASP-DAC*, Tokyo, Japan, January 2015.
- Poster, "A Reconfigurable Parallel Acceleration Platform for Evaluation of Permutation Entropy", *EMBC*, Chicago, USA, August 2014.
- Poster, "Hardware implementation of KLMS Algorithm using FPGA", *IJCNN*, Beijing, China, July 2014.

## Professional Service

- External Reviewer Committee, MICRO, 2021
- External Reviewer Committee, ISCA, 2021, 2022
- Artifact Evaluation Committee, ASPLOS, 2021
- Reviewer, ACM TECS, 2020
- Reviewer, ACM TACO, 2020, 2021
- Reviewer, IEEE TPDS, 2021
- Reviewer, IEEE TC, 2020
- Reviewer, IEEE TCAD, 2020, 2021
- Reviewer, IEEE CAL, 2020, 2021
- Reviewer, JPDC, 2020
- Subreviewer, HPCA, 2021
- Shadow Program Committee, EuroSys, 2021

## Awards

| Period      | Award                                                       |
|-------------|-------------------------------------------------------------|
| 2016 – 2020 | UBC Graduate Support Initiative (GSI) Awards                |
| 2012 – 2015 | National Master Scholarship (honors top 5% students)        |
| 2013 – 2014 | Suzhou Industrial Park Scholarship                          |
| 2010 – 2011 | CASC Secondary Class Scholarship                            |
| 2009 – 2010 | National Encouragement Scholarship (honors top 3% students) |
| 2008 – 2009 | Siyuan Scholarship                                          |

## Teaching Experience

Jan. 2017 – Apr. 2017 Teaching Assistant, University of British Columbia, Canada

EECE527: Advanced Computer Architecture (Instructor: Mieszko Lis)

Sept. 2016 – Dec. 2016 Teaching Assistant, University of British Columbia, Canada

CPEN411: Computer Architecture (Instructor: Mieszko Lis)

Sept. 2015 – Dec. 2015 Teaching Assistant, University of British Columbia, Canada

CPEN211: Introduction to Microcomputers (Instructor: Tor Aamodt)