GSM: +32-488-533-196/+91-9791595369

email: kartik.cciprep@gmail.com, kartik.lakshminarasimhan@imec.be

## Skills

Languages: C++, Python, Bash, Chisel

Hardware: Verilog, VHDL, Cadence Virtuoso

Libraries: STL, pthreads

Tools: Intel Pin, RISC-V Spike, LLVM (TableGen, libc)

Platforms: Xilinx Alveo U250, ZedBoard

Simulators: SST, Sparta, Sniper, Chipyard, FireSim

## Relevant Experience

### Researcher

Sep'23 - Present

Compute System Architecture unit, imec

Modeling Tensor Processing Units in cycle-accurate simulators; analytical modeling of systolic-array-style architectures for superconducting technology; GPU system-level power modeling and calibration with AMD and NVIDIA GPUs.

### Research Intern

August'22 - February'23

Compute System Architecture unit, imec

Worked on adapting systolic-array-based accelerators for superconducting technologies targeted at AI/HPC applications.

### Graduate Research Assistant

Fall'17 - August'23

Performance Lab, Ghent University

### Graduate Technical Intern

May'16 - December'16

Microarchitecture Research Labs, Intel, Bangalore, India

Developed a visualization tool (in Python and C++) to help identify performance bottlenecks and improve IPC. Performed workload characterization using VTune Amplifier.

### Graduate Research Assistant

Fall'14 - Spring'16

Computer Architecture Group, University of Connecticut

## Education

### Ghent University

Fall'17 - Summer'23

Doctor of Philosophy, Computer Science and Engineering

Graduate Courses: VLSI Technology, FPGA Design, Physics of MRI Imaging, Statistics using Python

### University of Connecticut

August'14 - December'16

Master of Science, Electrical and Computer Engineering

GPA: 3.4/4

Graduate Courses: Applied Probability and Stochastic Processes, Advanced Storage Systems, Neural Computing, Computer Architecture, Advanced Computer Architecture, Machine Learning (Coursera)

Thesis: WCET Analysis for Concurrent Execution of Multiple Applications on Safety Critical Embedded Multicores

### Anna University, Chennai

August'10 - May'14

Bachelor of Engineering, Electronics and Communication

First Class

Relevant coursework: VLSI Design (Theory and Laboratory), Digital Design (Theory and Laboratory), Data Structures and Object Oriented Design, Microprocessors and Microcontrollers

## Patents

CPU with multiple instruction queues, Lieven Eeckhout, **Kartik Lakshminarasimhan**, and Ajeya Naithani. U.S. Patent Application 18/029,232, filed November 9, 2023.

## Publications

Superconducting Array of Arrays for Acceleration of Transformers, Manu Perumkunnil, **Kartik Lakshminarasimhan**, Quentin Herr, Anna Herr, et al. 16th Workshop on Low Temperature Electronics (IEEE WOLTE), 2024

The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture. **Kartik Lakshminarasimhan**, Ajeya Naithani, Josu Feliu, and Lieven Eeckhout. ACM Trans. Archit. Code Optim. 19, 2, Article 17 (June 2022), 25 pages. https://doi.org/10.1145/3499424 (TACO'22)

The Forward Slice Core Microarchitecture, **K. Lakshminarasimhan**, A. Naithani, J. Feliu Perez, and L. Eeckhout, International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct 2020

A Lightweight Spatio-temporally Partitioned Multicore Architecture for Concurrent Execution of Safety Critical Workloads, Q. Shi, **K. Lakshminarasimhan**, C. Noll, E. Scholte, O. Khan, SAE 2016 Aerospace Systems and Technology Conference (ASTC), September 2016

Efficient Parallelization of Path Planning Workload on Single-chip Shared-memory Multicores, M. Ahmad, **K. Lakshminarasimhan**, O. Khan, IEEE High Performance Extreme Computing Conference (HPEC'15), September 2015

Performance and Energy Efficient Cache System Design: Simultaneous Execution of Multiple Applications across Heterogeneous Cores, Venkateswaran Nagarajan, **K. Lakshminarasimhan**, et al., presented at IEEE Symposium on VLSI (ISVLSI'13), Natal, Brazil

## Projects

### Shift Register Design for LCoS micro-display

Spring'22

Schematic and layout design of a 5-bit shift register in the ON Semiconductor 0.35 µm process using Cadence Virtuoso.

### Optimizing CNN Kernels on FPGA

Fall'21

FPGA Synthesis and timing simulation of CNN kernels (written in VHDL)

### TinyMLPerf Benchmark suite

Summer, Fall'20

Member of the TinyMLPerf working group as a benchmark developer. Contributed code to the Keyword Spotting benchmark in TF 2.0 using DS-CNN (Python/TF2/Keras).

### Complexity-effective microarchitectures

Fall'17 - Present

Exploring the performance gap between in-order and out-of-order (OoO) cores by adding simple structures on top of an in-order core (simulators used: Sniper, Chipyard, FireSim).

### Multiprogram support for Graphite Many-core Simulator

Fall'14, Summer'15

Part of a team that implemented multiprogram support in the lite (no memory/system call emulation) mode of the Graphite simulator. Studied the multiprogramming methodology of two dynamic binary translation (DBT) based simulators: ZSim (uses Pin and system calls) and Sniper (uses PinPoints, PinPlay, and Unix pipes).

### Partitioning Shared Resources in a Multicore

Summer'15, Fall'15

Implemented way-partitioning in the shared last-level cache, and spatial and temporal partitioning of shared memory controllers.

### Parallel Support Vector Machines Training using Pthreads

Spring'16

Implemented scalable serial and parallel versions of the kernel trick in SVM and of a simplified version of the Sequential Minimal Optimization (SMO) algorithm.

### ARM Bus Architecture Design

December'13 - April'14

Bachelor's project: FPGA implementation of an AXI-APB bridge architecture in AMBA 3.0 using Bluespec SystemVerilog.

## References

Prof. Lieven Eeckhout

Professor, Department of Electronics and Information Systems, Ghent University, lieven.eeckhout@ugent.be