# KYLE C. HALE

March 9, 2020

#### PERSONAL INFORMATION

email khale@cs.iit.edu

website http://halek.co

phone +1 (361) 563 4196

address Department of Computer Science, Illinois Institute of Technology

Stuart Building, Room 229C

10 West 31<sup>st</sup> Street, Chicago, IL 60616-3717

#### RESEARCH INTERESTS

Unconventional and experimental computer systems, especially relating to operating systems, parallel computing, high-performance computing, resource virtualization and virtual machine monitors, computer architecture, and network and software security.

#### **EDUCATION**

August 2016 Northwestern University

Ph.D. in Computer Science, Department of Electrical Engineering and Computer Science

Thesis: Hybrid Runtime Systems
Advisor: Prof. Peter A. Dinda

March 2013 Northwestern University

M.S. in Computer Science, Department of Electrical Engineering and Computer Science

May 2010 The University of Texas at Austin

B.S. in Computer Science, Department of Computer Science

Honors Thesis: Segment Gating for Static Energy Reduction with Introspective Networks-on-Chip

Advisor: Prof. Stephen W. Keckler

Sept. 2007 Sophia University, Tokyo, Japan

Intensive Japanese Language Program

#### **EMPLOYMENT**

2016-Present Assistant Professor, Illinois Institute of Technology, Department of Computer Science

Chicago, IL

Illinois Institute of Technology

Assistant Professor in the Department of Computer Science.

2010-2016 Ph.D. Student, Northwestern University, Department of Electrical Engineering and Computer Science

Evanston, IL

Northwestern University

Conducted research in unconventional and experimental computer systems, with an emphasis on operating systems and high-performance computing.

Summer 2013 Research Intern, VMware, Inc., Proactive Distributed Resource Management Team

Palo Alto, CA

VMware, Inc.

Investigated how application communication patterns in parallel codes can be leveraged to implement proactive resiliency in a virtualized environment (particularly for VMware vSphere).

Reference: Rean Griffith rean@caa.columbia.edu

Aug-Sep Technical Computing Intern, Fujitsu Ltd., Technical Computing Solutions Unit

Chiba, Japan

Fujitsu Ltd.

Tested, packaged, and installed the Fujitsu cross-compiler toolkit for the PRIMEHPC FX10 Supercomputer on access nodes. Developed a test suite of hybrid parallel applications (MPI/OpenMP/FFTW) aimed at customers developing cross-compiled programs for the PRIMEHPC FX10.

Reference: Shinya Fukumoto fukumoto.shinya@jp.fujitsu.com

Summer 2012 Graduate Technical Research Intern, Sandia Labs, Scalable Systems Software Unit

Albuquerque, NM

Sandia National Laboratories

Ported the Palacios Virtual Machine Monitor to the Cray XK6. Developed a novel, RDMA-based high-performance networking component within the Palacios VMM to mitigate network virtualization overhead in HPC applications.

Reference: Kevin Pedretti ktpedre@sandia.gov

Summer 2008 Database Support Intern, USAA, Database Group

San Antonio, TX

**USAA** 

Helped consolidate data from several independent legacy data warehouses (Oracle, DB2, SQL Server, MySQL). Migrated data reporting mechanisms from hard-coded scripts and Excel sheets to a fully customizable reporting system using Microsoft products, allowing management to quickly and efficiently drill down on important data-usage statistics.

Reference: Ranjith Raghunath ranjith\_nath@hotmail.com

# **PUBLICATIONS**

# Refereed Conference Papers

| Venue | Citation |
|--------------|----------|
| MASCOTS 2019 | B. Tauro, C. Liu, and <b>K.C. Hale</b>. Modeling Speedup in Multi-OS Environments. <i>Proceedings of the 27<sup>th</sup> IEEE International Conference on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems</i>, October, 2019. |
| MASCOTS 2019 | C. Hetland, G. Tziantzioulis, B. Suchy, <b>K.C. Hale</b>, N. Hardavellas, and P. Dinda. Prospects for Functional Address Translation. <i>Proceedings of the 27<sup>th</sup> IEEE International Conference on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems</i>, October, 2019. |
| ROSS 2019 | C. Liu and <b>K.C. Hale</b>. Towards a Practical Ecosystem for Specialized Operating Systems. <i>Proceedings of the 9<sup>th</sup> International Workshop on Runtime and Operating Systems for Supercomputers</i>, June, 2019. |
| MASCOTS 2018 | <b>K.C. Hale</b> and P. Dinda. An Evaluation of Asynchronous Software Events on Modern Hardware. <i>Proceedings of the 26<sup>th</sup> IEEE International Symposium on the Modeling, Analysis, and Simulation of Computer and Telecommunication Systems</i>, September, 2018. |
| ICAC 2017 | <b>K.C. Hale</b>, C. Hetland, and P. Dinda. Multiverse: Easy Conversion of Runtime Systems into OS Kernels via Automatic Hybridization. <i>Proceedings of the 14<sup>th</sup> International Conference on Autonomic Computing</i>, July, 2017. |
| HPDC 2016 | <b>K.C. Hale</b>, C. Hetland, and P. Dinda. Automatic Hybridization of Runtime Systems. <i>Proceedings of the 25<sup>th</sup> International ACM Symposium on High-performance Parallel and Distributed Computing</i>, June, 2016. |
| VEE 2016 | <b>K.C. Hale</b> and P. Dinda. Enabling Hybrid Parallel Runtimes Through Kernel and Virtualization Support. <i>Proceedings of the 12<sup>th</sup> ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments</i>, April, 2016. |
| HPDC 2015 | <b>K.C. Hale</b> and P. Dinda. A Case for Transforming Parallel Runtimes into Operating System Kernels. <i>Proceedings of the 24<sup>th</sup> International ACM Symposium on High-performance Parallel and Distributed Computing</i>, June, 2015. |
| ROSS 2014 | M. Swiech, <b>K.C. Hale</b>, and P. Dinda. VMM Emulation of Intel Hardware Transactional Memory. <i>Proceedings of the 4<sup>th</sup> International Workshop on Runtime and Operating Systems for Supercomputers</i>, June, 2014. |
| HPDC 2014 | L. Xia, <b>K.C. Hale</b>, and P. Dinda. ConCORD: Easily Exploiting Memory Content Redundancy Through the Content-aware Service Command. <i>Proceedings of the 23<sup>rd</sup> ACM Symposium on High-performance Parallel and Distributed Computing</i>, June, 2014. |
| ICAC 2014 | <b>K.C. Hale</b> and P. Dinda. Guarded Modules: Adaptively Extending the VMM's Privilege Into the Guest. <i>Proceedings of the 11<sup>th</sup> International Conference on Autonomic Computing</i>, June, 2014. |
| ICAC 2012 | <b>K.C. Hale</b>, L. Xia, and P. Dinda. Shifting GEARS to Enable Guest-context Virtual Services. <i>Proceedings of the 9<sup>th</sup> International Conference on Autonomic Computing</i>, September, 2012. |
| NoCArc 2009 | <b>K.C. Hale</b>, B. Grot, and S. Keckler. Segment Gating for Static Energy Reduction in Networks-On-Chip. <i>Proceedings of the 2<sup>nd</sup> International Workshop on Network-on-Chip Architectures</i>, December, 2009. |

# Non-overlapping Technical Reports

April 2014

**K.C. Hale** and P. Dinda. Details of the Case for Transforming Parallel Runtimes into Operating System Kernels. Technical Report NU-EECS-15-01, Department of Electrical Engineering and Computer Science, Northwestern University, April, 2014.

November 2011

J. Lange, P. Dinda, **K.C. Hale**, and L. Xia. An Introduction to the Palacios Virtual Machine Monitor—Version 1.3. Technical Report NU-EECS11-10, Department of Electrical Engineering and Computer Science, Northwestern University, November, 2011.

#### Miscellaneous Posters and Talks

GCASR 2019

P. Nookala, P. Dinda, **K.C. Hale**, and I. Raicu. XQueue: Extreme Fine-grained Concurrent Lock-less Queue. Poster at *the* 8<sup>th</sup> *Annual Greater Chicago Area Systems Research Workshop*, May, 2019.

GCASR 2019

B. Tauro, C. Liu, and **K.C. Hale**. Modeling Speedup in Multi-OS Environments. Poster at the 8<sup>th</sup> Annual Greater Chicago Area Systems Research Workshop, May, 2019.

GCASR 2018

A. Rizvi and **K.C. Hale**. Evaluating Julia as a Vehicle for High-performance Parallel Runtime Construction. Poster at the 7<sup>th</sup> Annual Greater Chicago Area Systems Research Workshop, April, 2018.

GCASR 2017

P. Nookala, I. Raicu, P. Dinda, and **K.C. Hale**. Performance Analysis of Queue-based Data Structures. Poster at *the* 6<sup>th</sup> *Annual Greater Chicago Area Systems Research Workshop*, April, 2017.

ROSS 2016

**K.C. Hale** and P. Dinda. Accelerating Asynchronous Events for Hybrid Parallel Runtimes. Invited talk at the 6<sup>th</sup> International Workshop on Runtime and Operating Systems for Supercomputers, June, 2016.

GCASR 2016

**K.C. Hale** and P. Dinda. Multiverse: Automatic Hybridization of Parallel Runtime Systems. At the 5<sup>th</sup> Annual Greater Chicago Area Systems Research Workshop, April, 2016.

HPDC 2015

**K.C. Hale** and P. Dinda. A Case for Transforming Parallel Runtimes into Operating System Kernels. At the 24<sup>th</sup> International ACM Symposium on High-performance Parallel and Distributed Computing, June, 2015.

GCASR 2015

G. Tziantzioulis, **K.C. Hale**, B. Pashaj, N. Hardavellas, and P. Dinda. SeaFire: Specialized Computing on Dark Silicon with Heterogeneous Hardware Multi-Pipelining. At *the* 4<sup>th</sup> *Annual Greater Chicago Area Systems Research Workshop*, April, 2015.

GCASR 2015

**K.C. Hale** and P. Dinda. A Case for Transforming Parallel Runtimes into Operating System Kernels. At the 4<sup>th</sup> Annual Greater Chicago Area Systems Research Workshop, April, 2015.

GCASR 2014

**K.C. Hale** and P. Dinda. Guarded Modules: Adaptively Extending the VMM's Privilege Into the Guest. At *the* 3<sup>rd</sup> *Annual Greater Chicago Area Systems Research Workshop*, May, 2014.

February 2013

**K.C. Hale.** Dynamic Linking Considered Harmful. Talk given at the NU Computer Systems Reading Group, February, 2013.

ICAC 2012

**K.C. Hale**, L. Xia, and P. Dinda. Shifting GEARS to Enable Guest-context Virtual Services. At *the* 9<sup>th</sup> *ACM International Conference on Autonomic Computing*, September, 2012.

#### RESEARCH FUNDING

NSF CSR Medium

"CSR: Medium: Collaborative Research: Interweaving the Parallel Software/Hardware Stack," NSF CNS 1763612, \$305,578, September 2018 through August 2021, Principal Investigator. This project is in collaboration with Peter Dinda, Nikos Hardavellas, and Simone Campanoni at Northwestern University.

NSF REU Site

"REU Site: Collaborative Research: BigDataX: From theory to practice in Big Data computing at eXtreme scales," NSF CNS 1757964, \$325,000, March 2018 through February 2021, Co-PI. This project is in collaboration with Ioan Raicu at IIT (lead PI) and Kyle Chard at the University of Chicago.

Intel Hardware Grant

"Exploring the Integration of FPGA-based Reconfigurable Hardware with Specialized OS Environments," Intel Hardware Accelerator Research Program (HARP), May, 2017, prototype hardware access. Principal Investigator (in collaboration with Peter Dinda, Northwestern University).

NSF CRI II-NEW

"CRI: II-NEW: MYSTIC: prograMmable sYstems reSearch Testbed to explore a stack-wIde adaptive system fabriC," NSF CNS 1730689, \$1,000,000, July 2017 through June 2020, Co-PI. This project is in collaboration with Ioan Raicu (lead PI) and Xian-He Sun at IIT.

NSF CSR Small

"CSR: Small: Collaborative Research: Flexible Resource Management and Coordination Schemes for Lightweight, Rapidly Deployable OS/Rs," NSF CNS 1718252, \$249,771, August 2017 through July 2020, Principal Investigator. This project is in collaboration with Jack Lange at the University of Pittsburgh.

#### SOFTWARE

Diver

Diver is a tool to boot specialized OS kernels such as unikernels, library OSes, and lightweight kernels on virtual and physical hardware in a similar manner to how containers are launched.

https://github.com/hexsa-lab/diver

Nautilus Aerokernel

Nautilus is an extremely lightweight kernel layer designed as a *privileged* library operating system (called an Aerokernel) that demonstrates the Hybrid Runtime model, wherein the combined parallel runtime and Aerokernel are transformed into a specialized OS kernel. It is a many-core capable OS layer that runs on commodity x64 hardware and the Intel Xeon Phi. I am the primary developer of Nautilus. Descriptions of related systems appear below.

http://nautilus.halek.co

Estimated Lines of Code: 35,000

• Nemo: Nemo is an event system in Nautilus for Hybrid Runtimes. Nemo accelerates asynchronous software event delivery by several orders of magnitude by leveraging hardware features typically only available to the OS (but not the runtime) in kernel-mode.

Estimated Lines of Code: 200

• Multiverse Runtime: Multiverse is a runtime system aimed at alleviating the effort required to build and port Hybrid Runtimes. It allows users to explore the benefits of HRTs by automatically running their legacy Linux programs in the Nautilus Aerokernel without any porting effort.

Estimated Lines of Code: 4,000

Philix

Philix is a tool that I designed for booting third-party OS kernels on the Intel Xeon Phi. It allows kernel developers to rapidly prototype new kernel mechanisms on the Phi without implementing Intel's SCIF protocol.

http://philix.halek.co

Estimated Lines of Code: 4,000

Palacios VMM

Palacios is an open-source, embeddable Virtual Machine Monitor actively developed by researchers at several institutions. I have built several systems within the context of Palacios and have made many contributions to the codebase, a subset of which are described below. The code for all of these systems can be found in the development branch of the Palacios repository at the website listed below.

http://v3vee.org/palacios

• **GEARS**: Guest Examination and Revision Services (GEARS) is a set of tools that allows developers to create *guest-context virtual services*, VMM-based services that extend *into* the guest.

Estimated Lines of Code: 2,500

• **Guarded Modules**: Guarded Modules extend the concept of guest-context virtual services by granting them *privileged* access to hardware and VMM state. Guarded Modules protect this privilege from the rest of the guest by maintaining a software border with compile-time and run-time techniques.

Estimated Lines of Code: 1,000

• **Virtualized DVFS**: This system allows fine-grained control of the Dynamic Voltage and Frequency Scaling (DVFS) hardware during VM exits, leveraging inferred information about guests to make informed power management decisions.

Estimated Lines of Code: 500

• **Virtual HPET**: This is a virtual implementation of the High-Precision Event Timer, a fine-grained platform timer present on most contemporary high-performance hardware. I added support for the HPET to allow us to run experimental systems like OSv on Palacios.

Estimated Lines of Code: 1,000

• **QEMU backend**: QEMU provides a rich diversity of virtual devices. This is one contributor to the simplicity of, e.g., the KVM codebase, as it leverages these device implementations. I wanted to similarly be able to leverage these devices for Palacios. This system implements that functionality with a software bridge between Palacios and QEMU.

Estimated Lines of Code: 2,000

• **VMM-emulated RTM**: This was the first VMM-emulated implementation of the Restricted Transactional Memory (RTM) component of the Intel Transactional Synchronization Extensions (TSX). Its performance is roughly 60x relative to Intel's emulator.

Estimated Lines of Code: 1,300

• **Palacios on the Cray XK6**: I ported the Palacios VMM to run on the Cray XK6 series of supercomputer nodes. This comprised several bug fixes and enhancements to the Palacios codebase.

• **Other contributions**: I have also participated in regular development and maintenance of the Palacios codebase. This includes bug fixes, enhancements to the extension architecture, guest configuration and loading, software interrupt and system call interception, and others.

Estimated Lines of Code: 12,000

SETI Lab

For our Introduction to Computer Systems Course (EECS 213) at Northwestern, we wanted a new lab to give students an earlier, practical introduction to parallel programming. To accomplish this, I designed and implemented SETI Lab, which is a lab that draws inspiration from SETI@Home. Students compete to parallelize signal analysis code and find alien signals in synthetic radio telescope data.

Estimated Lines of Code: 5,500

#### AWARDS AND HONORS

- 2017 · Best Computer Science PhD Dissertation Award · Northwestern University EECS Department
- 2016 · Invitee, the 6<sup>th</sup> International Workshop on Runtime and Operating Systems for Supercomputers (ROSS), June, 2016.
- 2015 · Best Short Presentation Award · A Case for Transforming Parallel Runtimes into Operating System Kernels · HPDC 2015
- 2005-2010 · Member of Turing Scholars Honors Computer Science Degree Program
- 2008-2010 · Member of Ronald E. McNair Post-Baccalaureate Achievement Program
- 2010-2011 · Murphy Graduate Fellowship Recipient

#### TEACHING AND ADVISING

#### PhD Students

- Amal Rizvi, 4<sup>th</sup> year
- Conghao Liu, 3<sup>rd</sup> year
- Brian Tauro, 1<sup>st</sup> year

#### Masters Students Advised

- Nanda Velugoti, Compiler-based blending and debugging
- Ganesh Mahesh, Measuring address space dynamics
- Piyush Nath, Nautilus InfiniBand driver
- Goutham Kannan, Lua in Nautilus kernel
- Imran Ali-Usmani, Lua in Nautilus kernel
- Suraj Chafle, Dune threads in Nautilus

# Undergraduate Students Advised

- Josh Bowden (Spring '20) Language abstractions for virtualized co-routines
- Nicholas Wanninger (REU, Summer '19) Virtualized co-routines
- Iris Uwizeyimana (Summer '19) AI-accelerated hearing aid architecture
- Justin Orr (Summer '19) Multiverse and HVM
- Andrew Neth (Summer '19) Multiverse and HVM
- Justin Goodman (BigDataX REU, Summer '19) Address space dynamics
- Hussain Khajanchi (BigDataX REU, Summer '19) AI-accelerated hearing aid architecture
- Gyucheon (Jake) Heo (Summer '19) Investigating new OS abstractions for high-performance I/O
- Samuel Grayson (BigDataX REU, Summer '18) building a customized kernel for high-performance data processing; now PhD student at UIUC
- Jagruti Depan (BigDataX REU, Summer '18) FPGA-based implementation of network science algorithms
- Lucas Myers (Summer '18) development of NES emulator for CS 562 class
- Josué Rodríguez Nieves (BigDataX REU, Summer '17) programmable on-chip network architectures
- Zachary McKee (Summer '17) development of CFG language generation system

### Courses Created

- CS 562 Virtual Machines
- CS 595-03 OS and Runtime System Design for Supercomputing
- CSP 544 System and Network Security
- CS 450 Operating Systems (course redesign)

# PhD Committees

- Xin Wang (Advisor: Zhiling Lan)
- Hariharan Devarajan (Advisor: Xian-He Sun)
- Anthony Kougkas (Advisor: Xian-He Sun)
- Christopher Hannon (Advisor: Dong (Kevin) Jin)
- Baharet Sadat Arab (Advisor: Boris Glavic)
- Seokki Lee (Advisor: Boris Glavic)
- Maral Mesmakhosroshahi (ECE, Advisor: Joohee Kim)

#### Miscellaneous

TA for Introduction to Databases (NU EECS 339)

TA for Introduction to Computer Systems (NU EECS 213), 2 quarters

Designed a new parallel computing lab called SETI Lab for the NU Introduction to Computer Systems (EECS 213) course. Students are tasked with parallelizing signal analysis in the search for synthetic "alien" signals.

Co-advised masters student Shiva Rao

Topic: Feasibility of Making DVFS Decisions in the VMM. Now Senior Software Engineer at Altera.

Co-advised masters student Madhav Suresh

Topic: Parallel language synchronization techniques; deterministic and stochastic barrier synchronization

Guided and assisted undergraduate students in independent study projects:

Conor Hetland & Jonathan Ford

Topic: Prototype port of the Nautilus Aerokernel to the Intel Xeon Phi

Akhil Guliani, Billy Gross, and Panitan Wongse-ammat

Topic: Device file virtualization in the Palacios VMM

# SERVICE TO DISCIPLINE

# Technical Program Committee Memberships

IPDPS 2020

ICCD 2019

MASCOTS 2019

VIRT 2018, 2019, 2020

MCHPC 2018, 2019

SC 2018

VHPC 2015, 2016, 2017, 2018, 2019, 2020

FiCloud 2016

CloudCom 2016, 2017, 2018, 2019, 2020

ICS 2017 (External Review Committee)

Local Chair, ICS 2017

# External Reviewing

DATE 2012

ISPASS 2012, 2017

HPDC 2012, 2013, 2014, 2015

SC 2012, 2016

ICAC 2013

ICDCS 2015

OOPSLA 2016

HPCA 2017

CGO 2017

# Journal Reviewing

JPDC (2019)

CCPE (2018)

SPE (2018, 2019)

TPDS (2016, 2018)

Parallel Computing (2014)

# Miscellaneous

Panel Member for NSF CSR Small Proposals (February 2019)

Member of ACM (SIGARCH, SIGOPS, SIGHPC)

# SERVICE TO INSTITUTION

IIT CS Department committees: undergraduate studies, graduate admissions, faculty search, ad hoc TA selection

Helped lead creation of CS Honors degree program

Faculty advisor for Computer Science Graduate Student Association

# OTHER INFORMATION

Languages English · Native

Japanese · Advanced (conversationally fluent, reading and writing)

Spanish · Basic (simple words and phrases only)

# REFERENCES

Available upon request.