# Srijeet Guha

Pune, Maharashtra, India

(+91) 7294929088 | srijeet2310@gmail.com | linkedin.com/in/srijeet-guha | https://srijeet2310.github.io/

#### Education

#### Birla Institute of Technology and Science - Pilani Campus

Aug. 2019 - May 2023

B.E. (Hons.) in Electrical and Electronics Engineering: CGPA - 8.18 / 10

Pilani, India

#### Technical Skills and Courses

**EE Courses**: Embedded Systems, Computer Architecture, Microprocessor Interfacing, Digital Design, Internet of Things

**CS Courses**: Data Structures and Algorithms, Object Oriented Programming, Operating Systems, Discrete Mathematics

**Programming Languages**: C/C++, Java, Python, CUDA, Assembly, Bash, Verilog

**Developer Tools**: Vivado, Quartus Prime, Vitis HLS, Intel HLS, Icarus Verilog, Git, RapidWright, MATLAB, Gem5

# Research Experiences

#### Independent Researcher, HES-SO Valais-Wallis

Jan. 2024 – Dec. 2024

Research Assistant | Project Advisor: Prof. Andrea Guerrieri

Sion, Switzerland

- Developed an **iterative frequency tuning framework** that extracts the highest-quality energy-efficient design for FPGA-based post-quantum cryptographic cores, converging **2.89×** faster than the classical approach
- Developed **Precision Unwound**, which explores and fine-tunes high-level design parameters such as loop unrolling, pipelining, and dataflow optimization to reduce the latency, area, and energy consumption of FPGA-based PQC accelerators

#### Processor Architecture Laboratory, École polytechnique fédérale de Lausanne

Bachelor's Thesis | Project Advisor: Prof. Paolo Ienne and Prof. Andrea Guerrieri

Lausanne, Switzerland

- Developed **DynaRapid** to decrease the compilation time of fully legal placed-and-routed kernel designs from high-level C/C++ code for commercial FPGAs, with minimal degradation in operating frequency and minimal increase in FPGA resource usage
- Reduced the C-to-FPGA implementation time by 33× at the cost of only a 20% degradation in operating frequency
- Generated a library of pre-synthesized, placed and routed building blocks to be stitched at runtime using RapidWright

#### Edinburgh Architecture and Systems Lab, The University of Edinburgh

Sept. 2022 – Dec. 2022

Project Intern | Project Advisor: Prof. Boris Grot and Dr. David Schall

Edinburgh, United Kingdom

- Improved T2 by expanding its stride-prefetcher table to hold up to 3 stable strides for all loop instructions, raising prefetch coverage by **56**% while reducing accuracy by only **7**% when tested on ARM processors in **Gem5**
- Developed benchmarks to measure accuracy, performance and power consumed by prefetchers implemented on Gem5
- Evaluated the appropriate warm-up interval of idle instances for large caches with high accuracy in serverless systems

#### On-Board Computing Subsystem, Team Anant

Feb. 2020 - Nov. 2022

On-board Computer Engineer | Team Advisor: Prof. Meetha V. Shenoy

Pilani, India

- Designed architectures and protocols to facilitate inter-processor communication and data transfer on the satellite
- Implemented the CCSDS 122 2-D compression algorithm on the Zynq-7000 FPGA to achieve a compression ratio of 67%
- Developed **FSM-based** mode-switching algorithms with advanced graph-traversal methodologies to reduce latency and energy consumption by **2.7**% and **11**% respectively, while switching between different modes of operation of the satellite

#### **Publications**

- Srijeet Guha, Andrea Guerrieri, "Precision Unwound: Fine-Tuning Loop Unrolling for Energy-efficient FPGA-based PQC using HLS". In proceedings of the 26th International Symposium on Quality Electronic Design, San Francisco, California, March 2025
  [Publication], [Presentation], [Slides]
- Srijeet Guha, Andrea Guerrieri, "Iterative Frequency Tuning Targeting Energy Efficiency Ratio for FPGA-based Post-Quantum Cryptographic Cores". In proceedings of the 31st IEEE International Conference on Electronics, Circuits and Systems, Nancy, France, November 2024, Poster Presentation [Publication], [Poster]
- Andrea Guerrieri, **Srijeet Guha**, Chris Lavin, Eddie Hung, Lana Josipović, and Paolo Ienne, "**DynaRapid: Fast-tracking from C to routed circuits**". In proceedings of the 34th IEEE International Conference on Field-Programmable Logic and Applications, Torino, Italy, **September 2024**, **Best Paper Award** [Publication], [Code], [Website], [Demonstration], [Michal Servit Best Paper Award]

- Andrea Guerrieri, Srijeet Guha, Lana Josipović, and Paolo Ienne, "DynaRapid: From C to FPGA in a few seconds". In proceedings of the 32nd ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, California, March 2024, Abstract Only [Poster], [Code], [Website], [Demonstration]

# **Industrial Experiences**

#### Devtools Perfworks Team, NVIDIA Graphics Private Ltd.

Jan. 2024 - Present

System Software Engineer | Employment Letter

Pune, India

- Improved data eviction policies of L1 caches for global memory loads to reduce cache miss rates by 3-4%
- Developed profiling tools to monitor the performance of data (de)compaction units between caches in Blackwell GPUs
- Added support for basic block optimization in Blackwell GPUs to reduce profiling latency of software metrics by 37%
- Introduced new software counter libraries and metrics to reduce latency in monitoring data bank conflicts in GPUs

#### IOSXE Polaris Team, Cisco Systems Private Ltd.

Aug. 2023 - Dec. 2023

Software Engineer | Employment Letter

Bangalore, India

- Improved the IOSd build system to reduce build time of bootable images by 14%, thus improving turnaround time
- Implemented novel message-passing protocols on stack interconnect to reduce congestion in multi-processor router stacks
- Developed low-level driver software to improve communication between IOSd SHIM layers and the router hardware
- Introduced tests to verify the reason and timing of previous unexpected reloads in multi-processor router stacks

#### SWIFT Automatic Unit Testing Team, Cisco Systems Private Ltd.

May 2022 - Aug. 2022

Technical Undergraduate Intern | Project Documentation

Bangalore, India

- Used the Software Integration and Functional Test Framework and Orchestrator Infrastructure to execute Automatic Unit Test cases and remove memory leaks from the Cisco Internetwork Operating System (IOS)
- Simplified IOS debugging by integrating the Undo live-recorder into the **host** unit-test **pre-commit** build system of the codebase
- Received a pre-placement offer (PPO) at the end of the internship for outstanding performance

#### On-board Computing Team, Team Robocon

Nov. 2019 - March 2020

Computer Engineer

Pilani, India

- Built a robot mouse that finds the shortest path using Dijkstra's algorithm and traverses it using PID control algorithms
- Developed a **Bluetooth-based communication system** to transfer real-time position, housekeeping data, attitude, and images from drones to the user or other drones of the drone cluster for coordination

#### **Projects**

#### Cold Storage Management System | Project Advisor: Prof. Meetha Shenoy

May 2022

- Interfaced DHT11 (humidity sensor), infrared flame sensors, MQ-3 (gas sensor) and L293D (motor driver), with a STM32F407IG processor to measure, maintain, and relay vital health parameters of the cold-storage
- Used 2 STM32F407IG processors connected using Zigbee modules to relay real-time information amongst each other

#### 16-bit RISC Processor with Floating Point ALU | Project Advisor: Prof. Karri Babu Ravi Teja

May 2022

- Implemented a 5-stage multi-cycle data path and a 21-stage FSM-based control path with multiple addressing modes
- ALU included ripple-carry adders, barrel shifters, a radix-4 Booth multiplier, and logic for underflow/overflow detection

#### Using Memory Traces to draw Insights | Project Advisor: Prof. Vinay Chamola

Dec. 2021

- Developed applications to dump the **SRAM traces** on the serial monitor without altering the SRAM, thus remaining **undetected** by other applications running on the processor

#### MOS Circuits using LTSpice and Microwind | Project Advisor: Prof. Anu Gupta

Dec. 2021

- Designed a 3-input XOR/XNOR gate using complementary pass-transistor logic with a 1 GHz frequency and a 500 pF load
- Designed a two-stage single-ended op-amp (folded cascode [folded amplifier + common gate stage] + gain stage) with a gain > 90 dB, unity-gain bandwidth > 1 MHz, and phase margin ≈ 60 degrees

## **Test Scores**

GRE General Test: 322/340 [Verbal Reasoning: 152/170, Quantitative Reasoning: 170/170, Analytical Writing: 4.0/6.0]

TOEFL iBT: 105/120 [Reading: 26/30, Listening: 23/30, Speaking: 28/30, Writing: 28/30]

#### Leadership

#### System Engineer | Team Anant

Aug. 2021 - Aug. 2022

- Coordinated between subsystems and the Indian Space Research Organisation (ISRO) to resolve inter-subsystem dependencies and issues
- Collaborated with the subsystems, stakeholders, and the institute to finalize the annual team budget for FY 2022-23