GSM: +32-488-533-196

email: kartik.lakshminarasimhan@ugent.be

## Skills

Languages: C++, Python, Bash, Chisel

Hardware: Verilog, VHDL, Cadence Virtuoso

Libraries: STL, pthreads

Tools: Intel Pin, RISC-V Spike-PK, Xilinx

Platforms: Xilinx Alveo U250, Zedboard

Operating Systems: Ubuntu, Microsoft Windows

Simulators: Sniper, Chipyard, FireSim

## Relevant Experience

#### Graduate Research Assistant

Fall'17 - Present

Performance Lab, Ghent University

#### Graduate Technical Intern

May'16 - December'16

Microarchitecture Research Labs, Intel, Bangalore, India

Developed a visualization tool (in Python and C++) to help analyze performance bottlenecks and improve IPC. Performed workload characterization using VTune Amplifier.

#### Graduate Research Assistant

Fall'14 - Spring'16

Computer Architecture Group, University of Connecticut

## Education

#### Ghent University

Fall'17 - Summer'23 (expected)

Doctor of Philosophy, Computer Science and Engineering

#### University of Connecticut

August'14 - December'16

Master of Science, Electrical and Computer Engineering

GPA: 3.4/4

Graduate courses: Applied Probability and Stochastic Processes, Advanced Storage Systems, Neural Computing, Computer Architecture, Advanced Computer Architecture, Machine Learning (Coursera)

Thesis: WCET Analysis for Concurrent Execution of Multiple Applications on Safety Critical Embedded Multicores

#### Anna University, Chennai

August'10 - May'14

Bachelor of Engineering, Electronics and Communication

First Class

Relevant coursework: VLSI Design (Theory and Laboratory), Digital Design (Theory and Laboratory), Data Structures and Object Oriented Design, Microprocessors and Microcontrollers

## Patents

CPU with multiple instruction queues, L. Eeckhout, K. Lakshminarasimhan, and A. Naithani, filed at the European Patent Office (EPO); WO2022069374A1, WIPO (PCT)

## Publications

#### Co-authored

The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture. **Kartik Lakshminarasimhan**, Ajeya Naithani, Josu Feliu, and Lieven Eeckhout. 2022. ACM Trans. Archit. Code Optim. 19, 2, Article 17 (June 2022), 25 pages. https://doi.org/10.1145/3499424

The Forward Slice Core Microarchitecture, **K. Lakshminarasimhan**, A. Naithani, J. Feliu Perez, and L. Eeckhout, International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct 2020

A Lightweight Spatio-temporally Partitioned Multicore Architecture for Concurrent Execution of Safety Critical Workloads, Q. Shi, **K. Lakshminarasimhan**, C. Noll, E. Scholte, O. Khan, SAE 2016 Aerospace Systems and Technology Conference (ASTC), September 2016

Efficient Parallelization of Path Planning Workload on Single-chip Shared-memory Multicores, M. Ahmad, **K. Lakshminarasimhan**, O. Khan, to appear in IEEE High Performance Extreme Computing Conference (HPEC'15), September 2015

Performance and Energy Efficient Cache System Design: Simultaneous Execution of Multiple Applications across Heterogeneous Cores, Venkateswaran Nagarajan, **K. Lakshminarasimhan**, et al., presented at IEEE Symposium on VLSI (ISVLSI'13), Natal, Brazil

## Projects

#### TinyMLPerf Benchmark suite

Summer'20 - Fall'20

Member of the TinyMLPerf working group as a benchmark developer. Contributed code to the Keyword Spotting benchmark in TF 2.0 using DS-CNN (Python/TF2/Keras).

#### Complexity-effective microarchitectures

Fall'17 - Present

Exploring the performance gap between in-order and out-of-order (OOO) cores by adding simple structures on top of an in-order core (simulators used: Sniper, Chipyard, FireSim).

#### Multiprogram support for Graphite Many-core Simulator

Fall'14, Summer'15

Part of a team that implemented multiprogram support in the lite (no memory/system call emulation) mode of the Graphite simulator. Studied the multiprogramming methodology in Dynamic Binary Translation (DBT) based simulators: ZSim (uses Pin and system calls) and Sniper (uses PinPoints, PinPlay, and Unix pipes).

#### Cache sensitivity of Loop-Tiled Matrix Algorithms

Spring'15

Conducted cache sensitivity studies of loop-tiled matrix algorithms in Graphite and ZSim (with state-of-the-art cache partitioning schemes).

#### Partitioning Shared Resources in a Multicore

Summer'15, Fall'15

Implemented way-partitioning in the shared last-level cache, and spatial and temporal partitioning of shared memory controllers.

#### Code Optimization for Path Planning Algorithms

Fall'14

Optimized the data structures in path-planning algorithms (Dijkstra, $A^*$, $D^*$) to reduce completion time in a simulator (Java).

#### Computation and Learning in Biological Neuron Models

Spring'15, Summer'15

Studied computation and learning mechanisms in the Hodgkin-Huxley and Izhikevich biological neuron models.

#### Parallel Support Vector Machines Training using Pthreads

Spring'16

Implemented scalable serial and parallel versions of the kernel trick in SVM and of a simplified version of the Sequential Minimal Optimization (SMO) algorithm.

#### ARM Bus Architecture Design

December'13 - April'14

Bachelor's project: FPGA implementation of an AXI-APB bridge architecture in AMBA 3.0 using Bluespec SystemVerilog.

## References

Available upon request