# PYNQ‑Z2 Lab Protocol: Accelerating Conway’s Game of Life with Vitis HLS + Vivado

**Audience:** Embedded Systems students
**Board:** PYNQ‑Z2 (Zynq‑7020)
**Tools:** Vitis HLS, Vivado, Jupyter on PYNQ
**Goal:** Implement a hardware accelerator for Conway’s Game of Life (GoL) and compare performance against a software baseline on the PYNQ‑Z2.

---

## Learning outcomes

* Understand when and why bit‑level algorithms (like GoL) benefit from FPGA acceleration.
* Use Vitis HLS to synthesize C/C++ into an AXI‑compliant RTL IP core.
* Integrate custom HLS IP into a Vivado block design targeting PYNQ‑Z2.
* Generate bitstream and hardware handoff (.hwh) suitable for PYNQ overlays.
* Control the accelerator from Python in a Jupyter notebook and measure speedup.

## Prerequisites

* PYNQ‑Z2 board imaged and reachable over network (Jupyter working).
* Vivado and Vitis HLS installed on the host (matching versions), and a PYNQ image compatible with the board.
* Provided files:

  * `gameoflife.cpp` (HLS source with AXI pragmas)
  * `gameoflife_tb.cpp` (HLS testbench)
  * `gameoflife_soft.ipynb` (PYNQ notebook for CPU implementation)
  * `gameoflife_hard.ipynb` (PYNQ notebook for FPGA implementation)

> **Note:** Keep the accelerator clock at 100 MHz so that it matches the PS‑generated FCLK0, unless instructed otherwise.

---

## Part A — Software baseline on the board

1. **Boot the board** and connect to Jupyter (default user `xilinx`).
2. Open the provided **software‑only GoL** notebook, `gameoflife_soft.ipynb`.
3. **Run one iteration** on a reasonably sized grid (e.g., 1024×1024 or as specified in the notebook).

   * Record the **time per iteration** (ms) and any configuration (grid size, number of iterations, seed).
4. Save results (you’ll compare against the hardware design later).
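
For a reference point independent of the provided notebook, the measurement in step 3 can be sketched in a few lines of NumPy. This is a minimal sketch, not the notebook's implementation: the function names, the wrap-around edge rule, the 256×256 grid, and the seed are all assumptions chosen for illustration.

```python
import time
import numpy as np


def gol_step(grid):
    """One generation on a 2-D uint8 grid of 0/1 cells (assumed wrap-around edges)."""
    # Count the eight neighbours by shifting the grid in every direction.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next if it has 3 neighbours, or is alive with 2.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(np.uint8)


def time_per_iteration_ms(grid, iterations=1):
    """Run `iterations` generations; return the final grid and mean ms per iteration."""
    start = time.perf_counter()
    for _ in range(iterations):
        grid = gol_step(grid)
    elapsed = time.perf_counter() - start
    return grid, 1000.0 * elapsed / iterations


# Record the configuration together with the timing (size and seed are placeholders).
seed, shape, iterations = 42, (256, 256), 3
grid = np.random.RandomState(seed).randint(0, 2, size=shape, dtype=np.uint8)
final, ms = time_per_iteration_ms(grid, iterations)
print(f"grid={shape}, iterations={iterations}, seed={seed}, time/iteration={ms:.2f} ms")
```

Change `shape` to the grid size you actually measure (e.g., 1024×1024) and keep the seed, so the same initial grid can be replayed on the accelerator later.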

---

## Part B — Build the accelerator IP in Vitis HLS

1. **Create project** → *HLS Component* (C/C++). Name it `gol_hls`. Set **Part/Board** to match **PYNQ‑Z2** (or the same Zynq‑7020 part Vivado will use).
2. **Add sources:** `gameoflife.cpp`.
   Ensure the top‑level function has these interfaces (already in the template):

   * AXI4‑Lite for control (start/done, args)
   * AXI4 (m_axi) for frame buffers in DDR
3. **Add testbench**: `gameoflife_tb.cpp`.
4. **C Simulation**: run to verify functional correctness (expected testbench pass).
5. **C Synthesis**: target **100 MHz** solution clock. Inspect utilization/timing estimates.
6. *(Optional)* **C/RTL Co‑Simulation** to confirm RTL matches C.
7. **Package/Export RTL**: *Solution → Export RTL* (package IP).
   This creates a reusable IP repository folder (note its path).

---

## Part C — Integrate IP in Vivado (Block Design)

1. **New Vivado Project** → RTL project without default sources. Select **PYNQ‑Z2** as the target **board**.
2. **Create Block Design** (e.g., name `system`).
3. **Add IP** → **ZYNQ7 Processing System**.
4. **Run Block Automation** to configure Zynq for PYNQ‑Z2 (clocks, DDR, MIO). FCLK0 will be available to drive the PL at 100 MHz.
5. **Add your HLS IP repository**: *Settings → IP → Repository* → **Add** the folder exported by Vitis HLS.
6. **Insert your GoL IP** into the block design.
7. **Connect control (AXI‑Lite) interface**:

   * Use **Run Connection Automation** to connect the IP’s `s_axi_control` (AXI‑Lite slave) to **PS M_AXI_GP0** via an **AXI Interconnect/SmartConnect**.
8. **Enable a PS HP port and connect the IP master (m_axi)**:

   * In the **ZYNQ7 PS** configuration, enable one high‑performance **slave** port, e.g. **S_AXI_HP0**. (The Zynq‑7020 provides HP ports; HPC ports exist only on Zynq UltraScale+ devices.)
     *(This allows **PL masters** to access DDR through the PS.)*
   * Select your IP’s `m_axi` interface and **Run Connection Automation** to attach it to **S_AXI_HP0** (via AXI Interconnect/SmartConnect).
9. **Clocks and resets**: Ensure the IP’s `ap_clk` is driven by **FCLK_CLK0 (100 MHz)** and the reset is correctly connected (*Processor System Reset* may be inserted automatically).
10. **Address Editor**: Click **Assign All** so the AXI‑Lite control space gets a valid base address.
11. **Validate Design** (green check icon). Resolve any DRCs before continuing.
12. **Generate HDL Wrapper** for the block design (use **Let Vivado manage wrapper**).
13. **Run Synthesis → Implementation → Generate Bitstream**.
    When complete, open the implemented design and check the timing summary for failing paths (negative slack) before using the bitstream.

**Artifacts to locate** (names will reflect your BD name, often `system`):

* Bitstream: `.../project.runs/impl_1/system_wrapper.bit`
* Hardware handoff: `system.hwh` (commonly in a `hw_handoff` subfolder under `.../project.srcs/sources_1/bd/system/` or `.../project.gen/sources_1/bd/system/`, depending on the Vivado version).

> If unsure, search the project directory for `*.hwh`.
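
The search can also be scripted on the host. The sketch below walks a project directory, picks the most recently modified `.bit` and `.hwh`, and copies them to a staging folder under one shared base name. The function name, the "newest file wins" rule, and the example paths are assumptions for illustration, not part of the Vivado flow.

```python
import shutil
from pathlib import Path


def collect_overlay_files(project_dir, out_dir, base_name="gol"):
    """Copy the newest .bit and .hwh under project_dir to out_dir/<base_name>.<ext>."""
    project_dir, out_dir = Path(project_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    staged = {}
    for suffix in (".bit", ".hwh"):
        matches = list(project_dir.rglob(f"*{suffix}"))
        if not matches:
            raise FileNotFoundError(f"no *{suffix} file found under {project_dir}")
        # Assumption: if several candidates exist, the most recent build is wanted.
        newest = max(matches, key=lambda p: p.stat().st_mtime)
        target = out_dir / f"{base_name}{suffix}"
        shutil.copy2(newest, target)
        staged[suffix] = (newest, target)
    return staged


# Example call (hypothetical paths):
# collect_overlay_files("path/to/vivado_project", "overlay_out", base_name="gol")
```

Keep the staging folder outside the Vivado project directory so a later search does not pick up the copies.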

---

## Part D — Deploy overlay files to PYNQ

1. Choose a base name (e.g., `gol`). **Rename** the bitstream and hardware handoff from Part C (e.g., `system_wrapper.bit` and `system.hwh`) so their base names **match**:

   * `gol.bit`
   * `gol.hwh`
2. Copy files and notebook to the board (example using `scp`):

   ```bash
   scp gol.bit gol.hwh gameoflife_hard.ipynb xilinx@<board_ip>:/home/xilinx/jupyter_notebooks/gol/
   ```
3. On the board, confirm the files are in the destination folder and names match exactly.
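
Step 3 can be checked from a notebook cell rather than by eye. The helper below is a small sketch (its name is hypothetical) that lists every base name in a folder that has a `.bit` without a `.hwh`, or the reverse. The comparison is case-sensitive, so `Gol.bit` and `gol.hwh` count as a mismatch.

```python
from pathlib import Path


def unmatched_overlay_files(folder):
    """Return base names that appear as .bit or .hwh in `folder`, but not both."""
    folder = Path(folder)
    bit_names = {p.stem for p in folder.glob("*.bit")}
    hwh_names = {p.stem for p in folder.glob("*.hwh")}
    # Symmetric difference: names present in exactly one of the two sets.
    return sorted(bit_names ^ hwh_names)


# Example call on the board, using the destination folder from step 2:
# unmatched_overlay_files("/home/xilinx/jupyter_notebooks/gol")
```

An empty list means every bitstream in the folder has a matching hardware handoff file.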

---

## Part E — Run the hardware‑accelerated notebook

1. Open `gameoflife_hard.ipynb` on the board.
2. The notebook will:

   * Load the overlay: `Overlay('gol.bit')`
   * Set up buffers and parameters
   * Launch the accelerator and collect timings
3. **Measure and record**:

   * Time/iteration (hardware)
   * Speedup vs. Part A baseline
   * Correctness: the final grid state matches the CPU reference for the same initial grid and iteration count
4. Save results.
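
The three notebook actions in step 2 can be sketched as follows; treat this as an illustration under stated assumptions rather than the contents of `gameoflife_hard.ipynb`. The control-register bits (`ap_start` in bit 0 and `ap_done` in bit 1 of offset `0x00`) follow the usual HLS AXI4‑Lite layout. The argument offsets `0x10` and `0x1C`, the IP instance name `gameoflife_0`, and the two-pointer interface are hypothetical: check the driver header generated by Vitis HLS or `ip.register_map` for the real values, and note that the actual core may take further arguments such as grid dimensions.

```python
import time

CTRL = 0x00      # HLS block-level control register
AP_START = 0x01  # bit 0: start the core
AP_DONE = 0x02   # bit 1: core finished


def run_accelerator(ip, src_addr, dst_addr, src_reg=0x10, dst_reg=0x1C, timeout_s=5.0):
    """Write buffer addresses, start the core, poll ap_done; return elapsed seconds.

    `ip` is any object with write(offset, value) and read(offset), such as a PYNQ
    DefaultIP. src_reg/dst_reg are ASSUMED offsets; only the low 32 address bits
    are written, which suffices for the DDR range of a Zynq-7020.
    """
    ip.write(src_reg, src_addr & 0xFFFFFFFF)
    ip.write(dst_reg, dst_addr & 0xFFFFFFFF)
    start = time.perf_counter()
    ip.write(CTRL, AP_START)
    while not (ip.read(CTRL) & AP_DONE):
        if time.perf_counter() - start > timeout_s:
            raise TimeoutError("accelerator did not assert ap_done")
    return time.perf_counter() - start


def run_on_board(bitfile="gol.bit", ip_name="gameoflife_0", shape=(1024, 1024)):
    """Board-only part; pynq is imported here so the helper above runs anywhere."""
    import numpy as np
    from pynq import Overlay, allocate

    overlay = Overlay(bitfile)
    ip = getattr(overlay, ip_name)  # hypothetical instance name from the block design
    src = allocate(shape=shape, dtype=np.uint8)  # assumes one byte per cell
    dst = allocate(shape=shape, dtype=np.uint8)
    src[:] = np.random.RandomState(42).randint(0, 2, size=shape, dtype=np.uint8)
    src.flush()  # push CPU writes out to DDR before the PL reads them
    seconds = run_accelerator(ip, src.device_address, dst.device_address)
    dst.invalidate()  # discard stale cache lines before the CPU reads the result
    return np.array(dst), seconds
```

Timing only the span between `ap_start` and `ap_done` excludes buffer setup, which keeps the figure comparable with the per-iteration CPU time from Part A.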

---

## Deliverables (per student/team)

* Baseline CPU time/iteration and configuration used.
* FPGA time/iteration, configuration, and calculated speedup.
* Screenshot of Vivado **Block Design** showing connections.
* Short note on any design decisions (clock, data widths) and issues encountered.
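
The speedup figure is the ratio of the two per-iteration times, and cell throughput is a convenient companion number when grid sizes differ between runs. A minimal sketch follows; the function names are illustrative only.

```python
def speedup(cpu_ms, fpga_ms):
    """Speedup of the accelerator over the CPU baseline (both in ms per iteration)."""
    if cpu_ms <= 0 or fpga_ms <= 0:
        raise ValueError("times must be positive")
    return cpu_ms / fpga_ms


def cells_per_second(rows, cols, ms_per_iteration):
    """Cell updates per second for one generation of a rows x cols grid."""
    if ms_per_iteration <= 0:
        raise ValueError("time must be positive")
    return rows * cols * 1000.0 / ms_per_iteration


# Usage with your own measurements:
# print(speedup(cpu_ms, fpga_ms), cells_per_second(1024, 1024, fpga_ms))
```

Report both times with the same grid size and iteration count; otherwise the ratio is not meaningful.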

---

## Troubleshooting & tips

* **IP not visible in Vivado**: Confirm you added the **correct repository path** (the folder containing `component.xml`). Click *Refresh*.
* **m_axi connection fails**: Ensure **S_AXI_HP0** is **enabled** in the ZYNQ PS and use **Connection Automation** to insert the proper interconnect.
* **Address conflicts**: Open **Address Editor** and use **Assign All** or manually choose a free range.
* **Timing warnings in HLS**: Re‑synthesize at 100 MHz or adjust pragmas (unroll/partition) after the basic flow is working.
* **Naming for PYNQ**: `.bit` and `.hwh` **must share the exact same base name** (e.g., `gol.bit` & `gol.hwh`).
* **Clock domain**: Keep the accelerator on **FCLK0 @ 100 MHz** unless you’ve planned constraints across domains.
* **Correctness checks**: Compare a few iterations (including edge cases and wrap/no‑wrap rules matching your implementation) against the software version.
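
Edge handling is the most common source of mismatches between hardware and software results. The sketch below is a reference step with a switchable boundary rule plus a helper that reports the first differing cell. Both function names are hypothetical, and which rule your accelerator implements is something to confirm in `gameoflife.cpp`.

```python
import numpy as np


def gol_step_boundary(grid, wrap=True):
    """One generation; wrap=True joins opposite edges, wrap=False treats outside as dead."""
    padded = np.pad(grid, 1, mode="wrap" if wrap else "constant")
    rows, cols = grid.shape
    neighbours = np.zeros(grid.shape, dtype=np.uint8)
    # Add the eight shifted views of the padded grid (skip the centre cell).
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            if (dy, dx) != (1, 1):
                neighbours += padded[dy:dy + rows, dx:dx + cols]
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(np.uint8)


def first_mismatch(reference, candidate):
    """Return (row, col) of the first differing cell, or None if the grids match."""
    diff = np.argwhere(reference != candidate)
    return tuple(int(i) for i in diff[0]) if len(diff) else None


# A blinker on the top edge separates the two rules after one step.
edge = np.zeros((5, 5), dtype=np.uint8)
edge[0, 1:4] = 1
print(first_mismatch(gol_step_boundary(edge, wrap=False), gol_step_boundary(edge, wrap=True)))
```

With wrap-around, the top-edge blinker gives birth to a cell in the bottom row; with dead borders it does not, so a pattern like this quickly reveals a boundary-rule mismatch.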

---

## Optional extensions (time permitting)

* Try different **grid sizes** and measure scaling.
* Experiment with **data packing** (e.g., bit‑packing 32 or 64 cells per word) and observe bandwidth effects.
* Enable a **second FCLK** and explore higher clock rates after validating at 100 MHz.
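
To get a feel for the data-packing extension before changing the HLS code, the packing itself can be prototyped in NumPy. The sketch below packs 32 cells into each 32-bit word, with cell `k` of every group stored in bit `k`. That bit order, the one-byte-per-cell starting format, and the function names are assumptions; the accelerator would have to use the same convention.

```python
import numpy as np

BITS = np.arange(32, dtype=np.uint32)  # bit position of each cell within a word


def pack_cells(grid):
    """Pack a 0/1 uint8 grid into uint32 words, 32 cells per word (cell k -> bit k)."""
    rows, cols = grid.shape
    if cols % 32:
        raise ValueError("row width must be a multiple of 32")
    groups = grid.reshape(rows, cols // 32, 32).astype(np.uint32)
    # Shift every cell to its bit position, then OR the 32 bits of each group.
    return np.bitwise_or.reduce(groups << BITS, axis=2)


def unpack_cells(words):
    """Inverse of pack_cells: expand uint32 words back to a 0/1 uint8 grid."""
    bits = (words[:, :, None] >> BITS) & np.uint32(1)
    return bits.astype(np.uint8).reshape(words.shape[0], -1)


grid = np.random.RandomState(0).randint(0, 2, size=(64, 64), dtype=np.uint8)
words = pack_cells(grid)
print(f"unpacked: {grid.nbytes} bytes, packed: {words.nbytes} bytes")
```

At one byte per cell, packing reduces the data moved over the HP port by a factor of eight per generation, which is the bandwidth effect this extension asks you to observe.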
