# Lecture 1: Introduction & Basics
*Notes*
<hr>

**Leading goal of the course:** learning to build fundamentally better architectures

<hr>

### Key current directions

Computer architecture is currently looking into producing:
- Fundamentally **secure/reliable/safe** architectures
- Fundamentally **energy-efficient** architectures (memory-centric arch.)
    - <u>example:</u> under the von Neumann paradigm, processing and memory units are separate, so data is constantly moved back and forth between them, which wastes energy and generates heat
- Fundamentally **low-latency and predictable** architectures
- Specialized architectures (**AI/ML**, Genomics, Medicine, health)

## 1 - The transformation hierarchy

The **transformation hierarchy** extends the traditional view of computer architecture, which is restricted to the software/hardware interface and the micro-architecture. The expanded view of computer architecture covers every layer of the hierarchy, from the problem down to the electrons:

*Problem*
- **Algorithm**
- **Program/Language**
- **System Software**
- **SW/HW interface**
- **Micro-Architecture**
- **Logic**
- **Devices**

*Electrons*

The expanded view helps to understand how machines work seamlessly across layers. The goal is to **co-design** architectures by integrating the layers of the hierarchy.

<u>To read:</u> "You and Your Research," Richard Hamming

## 2 - Why study computer architecture?

### Computer Architecture

It is the science and art of **designing computing platforms**, including the hardware, interface, system software and programming model.

It is **hard to evaluate** the whole technology stack of a machine scientifically, especially when anticipating future developments. This is why computer architecture is in part an art.

Computer architecture aims **to achieve a set of design goals** (e.g. highest performance on a workload, longest battery life, etc.).

Computer architectures/platforms can be **tailored to the task** or **generalist**.

<u>Studying it helps to:</u>

- Enable better systems (faster, cheaper, smaller, more reliable)
- Enable new applications
- Enable better solutions to problems
- Understand why computers work the way they do

"*The goal is to optimize the top and the bottom, but also the communication between and across both top and bottom.*"

## 3 - Some cross-layer design examples

#### EDEN: Data-Aware Efficient DNN Inference

- relies on approximate DRAM

#### SMASH: SW/HW Indexing Acceleration

- Efficient sparse matrix operations
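
To see why indexing matters for sparse matrices, here is a minimal sketch of sparse matrix-vector multiplication in the common CSR (compressed sparse row) format. CSR is used here only as a familiar baseline; it is an assumption for illustration and not necessarily the encoding SMASH itself uses. The function names are hypothetical.

```python
def dense_to_csr(matrix):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)    # non-zero value
                col_idx.append(j)   # its column index
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # The indirect access x[col_idx[k]] is the indexing overhead
            # that sparse formats pay for skipping zeros.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


values, col_idx, row_ptr = dense_to_csr([[0, 2, 0], [1, 0, 3], [0, 0, 0]])
print(csr_matvec(values, col_idx, row_ptr, [1, 2, 3]))  # [4, 10, 0]
```

Every multiply needs an extra index lookup, which is the kind of cost that SW/HW indexing acceleration targets.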

#### GenASM

- low-power approximate string matching acceleration framework
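
As a reminder of what approximate string matching computes, here is a minimal software sketch using plain dynamic programming. It is not GenASM's hardware algorithm, only an illustration of the problem; the function name `best_match_distance` is hypothetical.

```python
def best_match_distance(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text`.

    Edits are single-character substitutions, insertions and deletions.
    """
    # Row for the empty pattern: a match may start anywhere in the text for free.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        # Column 0: matching i pattern characters against nothing costs i.
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            curr[j] = min(
                prev[j - 1] + (pc != tc),  # match or substitution
                prev[j] + 1,               # pattern character left unmatched
                curr[j - 1] + 1,           # extra text character
            )
        prev = curr
    # A match may also end anywhere in the text.
    return min(prev)


print(best_match_distance("ACGT", "TTACGTTT"))  # 0 (exact occurrence)
print(best_match_distance("ACGT", "TTACCTTT"))  # 1 (one substitution)
```

The inner loop touches every (pattern, text) character pair, which is why this workload is a natural target for hardware acceleration.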

#### NERO

- Stencil acceleration for weather prediction modeling
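
For reference, a stencil updates each grid point from a fixed pattern of its neighbors. The sketch below is a generic 5-point averaging stencil, assumed purely for illustration; it is not one of the actual weather-model kernels, and `stencil_step` is a hypothetical name.

```python
def stencil_step(grid):
    """Apply one 5-point averaging step; boundary cells are copied unchanged."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Each output needs five reads but only a handful of arithmetic
            # operations, so the kernel tends to be limited by memory traffic.
            out[i][j] = (
                grid[i][j]
                + grid[i - 1][j] + grid[i + 1][j]
                + grid[i][j - 1] + grid[i][j + 1]
            ) / 5
    return out


print(stencil_step([[0, 0, 0], [0, 5, 0], [0, 0, 0]])[1][1])  # 1.0
```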

#### NATSA

- Near-data processing acceleration for time series analysis

## 4 - One Problem: Limited SW/HW communication

Usually the higher-level information is not visible to hardware.

![codesign](images/codesign.png)

Solution: **more expressive interfaces**

![codsol](images/codesignsol.png)

Expressive Memory (e.g. X-MeM) aids many optimizations, such as compression acceleration. Without it, information about the data (metadata) is usually lost at the SW/HW interface.

Communicating data types and data structures to the hardware is another instance of this problem.
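
The idea of a more expressive interface can be sketched as a table in which software annotates memory regions with metadata that a hardware policy can later consult. Everything below is hypothetical and made up for illustration (`RegionHint`, `HintTable`, `choose_compression`, and the policy strings); it is not the actual X-MeM interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionHint:
    """Hypothetical metadata that software attaches to a memory region."""
    data_type: str       # e.g. "int32", "float32", "pointer"
    access_pattern: str  # e.g. "streaming", "random"


class HintTable:
    """Toy stand-in for the structure that carries hints across the interface."""

    def __init__(self):
        self._entries = []  # list of (start, end, hint)

    def annotate(self, start, size, hint):
        """Software side: describe the region [start, start + size)."""
        self._entries.append((start, start + size, hint))

    def lookup(self, addr):
        """Hardware side: fetch the hint covering addr, if any."""
        for start, end, hint in self._entries:
            if start <= addr < end:
                return hint
        return None


def choose_compression(hint):
    """Toy hardware policy: pick a compression scheme from the hint."""
    if hint is None:
        return "generic"  # no metadata survived the interface
    if hint.data_type == "pointer":
        return "base-delta"
    if hint.data_type.startswith("float"):
        return "float-specific"
    return "generic"


table = HintTable()
table.annotate(0x1000, 256, RegionHint("pointer", "random"))
print(choose_compression(table.lookup(0x1080)))  # base-delta
print(choose_compression(table.lookup(0x2000)))  # generic
```

Without the annotation, the hardware falls back to the generic choice, which is the "lost metadata" situation described above.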

Another example: memory error tolerance can be exploited in hybrid memory systems to make processes more reliable. Being able to distinguish between vulnerable and error-tolerant data could therefore help design better memory management (**Heterogeneous-Reliability Memory**, 2014).

![hrm](images/HRM.png)
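
A minimal sketch of the placement idea, assuming software can label each data region as error-tolerant or not. The two tier names, the region names and both functions are hypothetical; a real system would also have to handle capacity limits and the cost of each tier.

```python
def place_regions(regions):
    """Map each (name, size_bytes, error_tolerant) region to a memory tier."""
    placement = {"reliable": [], "low_cost": []}
    for name, size_bytes, error_tolerant in regions:
        # Vulnerable data goes to the more reliable (and costlier) memory.
        tier = "low_cost" if error_tolerant else "reliable"
        placement[tier].append((name, size_bytes))
    return placement


def reliable_fraction(placement):
    """Fraction of all bytes that ended up in the reliable tier."""
    reliable = sum(size for _, size in placement["reliable"])
    total = reliable + sum(size for _, size in placement["low_cost"])
    return reliable / total if total else 0.0


placement = place_regions([
    ("page_table", 100, False),   # an error here could crash the system
    ("image_buffer", 300, True),  # a flipped bit changes one pixel
])
print(reliable_fraction(placement))  # 0.25
```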

## 5 - Ongoing Developments

With regard to Moore's law, DRAM scaling is more problematic than logic (CPU) scaling.

### 1st reason why: Performance and Energy Efficiency

- **Non-volatile main memory** (persistent memory devices, e.g. Intel Optane Persistent Memory, based on 3D XPoint technology).

- **Cerebras' Wafer Scale Engine (-2)** (largest chip on the market, specialized in ML acceleration; 2.6 trillion transistors vs. 54.2 billion in the largest GPU).

- **UPMEM Processing-in-DRAM memory modules** (processor in the DRAM stick that allows computing in memory or near memory).

- **Samsung Function-in-Memory DRAM**

### Processing in-, near- and using memory

Processing using memory relies on the memory bank itself for computing, i.e. computation occurs as the data is accessed, by exploiting the fundamental properties of the memory device.
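
One way this has been proposed in the literature (not detailed in these notes, so treat it as an assumption) is to activate three DRAM rows at once, so that each shared bitline settles to the majority of the three stored bits. Fixing the third row to all zeros or all ones then yields a bulk AND or OR. The sketch below only models that logic in software; the function names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority of three integers, one vote per bit position."""
    return (a & b) | (b & c) | (a & c)


def in_memory_and(a, b):
    """AND via majority: the control row is all zeros."""
    return majority(a, b, 0)


def in_memory_or(a, b, width=8):
    """OR via majority: the control row is all ones (width bits wide)."""
    return majority(a, b, (1 << width) - 1)


a, b = 0b1100, 0b1010
print(bin(in_memory_and(a, b)))  # 0b1000
print(bin(in_memory_or(a, b)))   # 0b1110
```

In the real device, the "integers" would be entire memory rows, so a single operation processes thousands of bits without moving them to the CPU.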

### Systolic Array

- Computes a matrix multiplication almost at once by streaming operands through a grid of processing elements (used in the TPU for ML acceleration)
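
A minimal cycle-level sketch of a systolic array, assuming an output-stationary dataflow: each processing element (PE) at position (i, j) keeps its own accumulator for `C[i][j]`, rows of `A` flow in from the left and columns of `B` from the top, each skewed by one cycle per row or column. This dataflow is chosen for illustration and is not necessarily the one the TPU uses; `systolic_matmul` is a hypothetical name.

```python
def systolic_matmul(A, B):
    """Simulate an n x p grid of PEs computing C = A @ B.

    A is n x m, B is m x p. Returns (C, number_of_cycles).
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    # The last PE (n-1, p-1) consumes its last operand pair (k = m-1)
    # at cycle n + p + m - 3, so n + p + m - 2 cycles are needed in total.
    total_cycles = n + p + m - 2
    for t in range(total_cycles):
        for i in range(n):
            for j in range(p):
                # Because of the input skew, PE (i, j) sees A[i][k] and
                # B[k][j] at cycle t = i + j + k.
                k = t - i - j
                if 0 <= k < m:
                    C[i][j] += A[i][k] * B[k][j]
    return C, total_cycles


C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C)       # [[19, 22], [43, 50]]
print(cycles)  # 4
```

All PEs work in parallel within a cycle, so the cycle count grows linearly with the matrix dimensions, while the number of multiply-accumulate operations grows cubically for square matrices.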