# DDR3 Data DQ groups
Data lines need to be length-matched separately per DQ group, and they have more stringent requirements than the other pins:
- DQ group 0..7: bits should be length-matched together with their strobe
- DQ group 8..15: bits should be length-matched together with their strobe
- Command/control bus: its signals should be matched together.

![](images/BGA96.png)

## DRAM type
- 96-Ball FBGA, DDR3-SDRAM
- Datasheet: https://media.digikey.com/pdf/Data%20Sheets/Micron%20Technology%20Inc%20PDFs/MT41J256M4,128M8,64M16.pdf

# Impedance
The impedance of the DDR3 traces should be:
- single-ended signals: 50 ohms
- differential signals: 100 ohms

Whenever a via is used, route all traces of that bus through the same number of vias, so that every trace sees the same via delay.

On the acoustic carrier board most traces go through a via; check whether all of them do.

Any remaining via delay difference must be compensated for in the length matching.
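
A via adds a delay that a via-less trace does not see, so for length matching it can be expressed as an equivalent extra trace length. The sketch below assumes a hypothetical via delay of 10 ps (to be replaced by the result of the via delay simulation) and the 5.5 ps/mm trace delay assumed in the CMD/CTRL calculation further down; the function names are illustrative only.

```python
def via_delay_to_length_mm(via_delay_s: float, trace_delay_s_per_mm: float) -> float:
    """Convert the extra delay of one via into an equivalent trace length (mm)."""
    return via_delay_s / trace_delay_s_per_mm


def compensated_length_mm(routed_length_mm: float, n_vias: int,
                          via_delay_s: float, trace_delay_s_per_mm: float) -> float:
    """Electrical length of a trace, in mm of trace, including its vias."""
    return routed_length_mm + n_vias * via_delay_to_length_mm(via_delay_s, trace_delay_s_per_mm)


# Hypothetical values: 10 ps per via (replace with the simulated value),
# 5.5 ps/mm trace delay (the value assumed in the CMD/CTRL cell).
VIA_DELAY_S = 10e-12
TRACE_DELAY_S_PER_MM = 5.5e-12

# Two traces with the same 18 mm routed length, one with a via and one without
with_via = compensated_length_mm(18.0, 1, VIA_DELAY_S, TRACE_DELAY_S_PER_MM)
without_via = compensated_length_mm(18.0, 0, VIA_DELAY_S, TRACE_DELAY_S_PER_MM)
print(f"Extra length to add to the via-less trace: {with_via - without_via:.2f} mm")
```

The same correction applies per via, so a trace with two more vias than its neighbours needs twice the extra length on the neighbours.
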
## Via delay simulation
- Via width: 0.35 / 0.2 mm
- Material: FR4

## Trace coupling / impedance simulation
- Trace width: 0.1565 mm
- Target single-ended impedance: 50 ohms
- Target differential impedance: 100 ohms

# Definitions
## Write-leveling (WRITE TIMING ONLY)
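One-time calibration process (performed during initialization).
- Deskews the DQS strobe to CK relationship.
- In this mode DQS is an input and DQ is an output (seen from the DRAM): the DRAM samples CK with DQS and reports the result on DQ.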

## tDQSCK (READ TIMING ONLY)
Clock-to-data strobe relationship: time difference between DQS and CK, CK#.

This is a DRAM-internal timing parameter: the delay between CK/CK# and DQS for reads. It needs to be compensated for by layout.

# Data Setup / Hold / Derating
Setup and hold times are strongly related to the slew rate: a larger slew rate means lower setup and hold times.

The slew rate is accounted for by the additional derating terms $\Delta t_{DS}$ and $\Delta t_{DH}$, which take into account the time the signal needs to reach the AC175 thresholds.


## AC175-threshold

$$V_{IH}(AC) = V_{REF}(DC) + 175\ \text{mV}$$
$$V_{IL}(AC) = V_{REF}(DC) - 175\ \text{mV}$$
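
A small helper to evaluate the AC175 thresholds. It assumes $V_{REF}(DC) = V_{DD}/2$ with the nominal DDR3 supply $V_{DD} = 1.5$ V; this is an assumption, the actual reference comes from the board's VREF network.

```python
AC175_V = 0.175  # AC175 threshold offset from VREF(DC), in volts


def ac175_thresholds(vref_dc_v: float) -> tuple[float, float]:
    """Return (VIH(AC), VIL(AC)) in volts for the AC175 input levels."""
    return vref_dc_v + AC175_V, vref_dc_v - AC175_V


# Assumption: VREF(DC) = VDD / 2 with VDD = 1.5 V (nominal DDR3 supply)
vih, vil = ac175_thresholds(1.5 / 2)
print(f"VIH(AC) = {vih:.3f} V, VIL(AC) = {vil:.3f} V")
```
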

## Setup time
Minimum time the signal must be stable before the sampling clock edge.

For the DDR3 IC, the total setup time is the sum of:
### $t_{ds, base}$ base setup time
- CMD-CTRL-ADDR bus: 200 ps
- Data bus: 75 ps

### $\Delta t_{DS}$
- CMD-CTRL-ADDR bus: varies between (-62 ps, 120 ps)

## Hold time
Minimum time the signal must remain stable after the sampling clock edge.

For the DDR3 IC, the total hold time is the sum of:

### $t_{dh, base}$ base hold time
- CMD-CTRL-ADDR bus: 275 ps
- Data bus: 150 ps

### $\Delta t_{DH}$
- CMD-CTRL-ADDR bus: varies between (-62 ps, 120 ps)
- Data bus: 
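
The total requirement is the base value plus the slew-rate derating term. A minimal sketch using the 75 ps data-bus base setup time listed above; the derating values swept here are hypothetical placeholders, the real ones have to be read from the datasheet derating table for the actual DQ/DQS slew rates.

```python
def total_time_ps(base_ps: float, derating_ps: float) -> float:
    """Total setup (or hold) requirement = base value + slew-rate derating, in ps."""
    return base_ps + derating_ps


T_DS_BASE_PS = 75.0  # data bus base setup time (from the list above)

# Hypothetical derating values, NOT taken from the datasheet
for delta_ps in (-50.0, 0.0, 50.0):
    total = total_time_ps(T_DS_BASE_PS, delta_ps)
    print(f"delta = {delta_ps:+.0f} ps -> total setup = {total:.0f} ps")
```
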

## Slew rate
In order to calculate the actual hold and setup times, one must have an idea of the slew rate.
Slew rates lie in the following ranges:
- (0.4, 2) V/ns for DQ-signals
- (1, 4) V/ns for DQS-differential signals
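
For a nominal (linear) edge, the time needed to go from $V_{REF}(DC)$ to the AC175 threshold is 175 mV divided by the slew rate. The sketch below evaluates this over the DQ slew-rate range above. It is a simplified linear-ramp model, and it is deliberately not applied to DQS, whose differential input levels are defined differently.

```python
AC175_V = 0.175  # threshold offset from VREF(DC), in volts


def time_to_ac175_ps(slew_v_per_ns: float) -> float:
    """Time (ps) for a linear edge starting at VREF(DC) to reach VREF(DC) +/- 175 mV."""
    return AC175_V / slew_v_per_ns * 1000.0


# DQ slew-rate range from the list above (V/ns)
for slew in (0.4, 1.0, 2.0):
    print(f"DQ @ {slew:.1f} V/ns: {time_to_ac175_ps(slew):.1f} ps to reach the AC175 level")
```

A slow 0.4 V/ns edge spends several hundred picoseconds just reaching the threshold, which is why the derating terms matter at the low end of the range.
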

## Routing topology
### Fly-by Topology
Routing topology in which the command/address bus is daisy-chained from one DRAM to the next, with very short or no stubs.

Each chip has its own DQS groups.

![](images/ddr_flyby2.png)

### Double-T Topology


![](images/double_T.png)


# Alignment
## Alignment types
- UI = 1 / Data Rate (Unit Interval), time duration for a single bit of data.
	- Half CLK period in DQ case
	- Full CLK period in CMD/CTRL case
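
The UI definitions can be checked numerically. A minimal sketch, assuming DDR3-800 (800 MT/s on DQ, CK = 400 MHz); the center-alignment offset is taken as half a DQ UI, i.e. the 90 degree shift of the centered case.

```python
def unit_intervals(f_clk_hz: float) -> dict[str, float]:
    """UI for DQ (two bits per CK period) and CMD/CTRL (one bit per CK period), in s."""
    t_ck = 1.0 / f_clk_hz
    ui_dq = t_ck / 2
    return {
        "t_CK": t_ck,
        "UI_DQ": ui_dq,
        "UI_CMD": t_ck,
        "center_offset_DQ": ui_dq / 2,  # edge placed in the middle of a DQ bit
    }


# Assumption: DDR3-800, so CK = 400 MHz
for name, t in unit_intervals(400e6).items():
    print(f"{name}: {t * 1e9:.3f} ns")
```
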

![](images/data_alignment.png)

### CLK and DATA centered at pin
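At the clock edge, the data is centered: the clock edge sits in the middle of the data eye (a 90 degree shift for DQ, where one UI is half a clock period; 180 degrees for CMD/CTRL, where one UI is a full clock period).
- So there is a positive setup time before the CLK edge and a positive hold time after it.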

### CLK and DATA aligned at pin
The data transition is aligned with the clock transition, so at the clock edge the data is invalid. It only becomes valid about 0.5 x UI later.
- Seen from the clock edge, the setup time is negative and the hold time is larger than the setup time.
- The receiver's setup and hold requirements still have to be respected, so you must make sure the data arrives slightly earlier than the clock.

### Write
- Controller (FPGA) sends data to DRAM
- (LPDDR3) DQS is center-aligned for write data.

#### Why?
- The DDR3 DRAM uses the DQS edges directly to capture incoming data, so those edges must sit in the middle of the data eye.
- The controller has the ability to delay the data signals with respect to DQS.

### Read
- DRAM sends data to controller (FPGA)
- (LPDDR3) DQS is edge-aligned for read-data

#### Why?
- The DDR3-DRAM doesn't have the ability to delay DQ/DQS like the controller.

### Command-control bus
- Edge-aligned

#### Why?
- Center alignment is needed when data is sampled on both edges, and it brings extra complexity in hardware.
- That is unnecessary here: the command/control bus is only sampled on the rising CK edge.

# DLL
## DLL mode vs normal mode
DLL: delay-locked loop, a circuit that delays the clock.
- It locks its output to an input reference using a variable delay line, without an oscillator.
- It is often used to improve the clock-rise-to-data output timing.

### DLL disabled mode

Affects:
- tDQSCK (read data clock-to-data strobe)

Does not affect:
- tDQSQ, tQH (read data-to-data strobe)

The read data still needs to be lined up with the controller's time domain.

## LPDDR3 vs DDR3
LPDDR3 does NOT have a DLL mode (to conserve power), whereas DDR3 does. So on LPDDR3 the clock-to-data output delay of the memory device is not compensated.

# WARNING: THE FOLLOWING CALCULATIONS ARE INCOMPLETE

# CTRL / CMD / ADDR Bus (DDR3-800)
- DDRX1 on the FPGA-side, one bit per cycle
- CK, CK# differential clock inputs
- Inputs are sampled on the crossing of the rising CK edge and the falling CK# edge.

![](images/micron_cmd_ctrl.png)


## Functionality
- Indicates the type of command (read / write)
- Carries the address
- RAM reset command
- Refresh
- ... (see page 91 of the datasheet)

### Input
- CLK and DATA aligned at pin (the case here)

```python
# * 350 MHz cycle time (CMD-CTRL bus)
t_cycle = 1 / 350e6

# * Trace delay
t_trace_delay_s_per_mm = 5.5e-12  # 5.5 ps per mm

# * T_skew
# TODO: replace with the actual routed CLK and data trace delays.
# For now, choose the CLK trace e.g. 6 mm longer than the data trace.
t_cmdctrl_clk_trace = t_trace_delay_s_per_mm * 6
t_cmdctrl_data_trace = 0

# Names used by the setup/hold margin cells below
t_clk_trace = t_cmdctrl_clk_trace
t_data_trace = t_cmdctrl_data_trace
```


### FPGA -> LPDDR3 PHY
Setup-time margin:
* Margin = cycle time - FPGA clock-to-output delay - PHY setup time - skew
* $T_{margin,su} = T_{cycle} - T_{CO(max)} - T_{su} - T_{skew}$, with $T_{skew} = T_{data} - T_{clk}$ (data and clock trace delays)
* Time it takes for the data to get to the receiver: $T_{CO(max)} + T_{data}$

Hold-time margin (positive means the hold time is met):
* Margin = minimum data-path delay - hold-time requirement
* $T_{margin,h} = T_{CO(min)} + T_{skew} - T_{h}$ (the PCB trace delay enters as the data/clock skew $T_{skew}$)
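
The two margin formulas can be written as functions. A sketch only: the $T_{CO}$ values in the example call are hypothetical, while the 200 ps / 275 ps requirements and the 6 mm (33 ps at 5.5 ps/mm) clock lengthening mirror the numbers used in the cells below.

```python
def setup_margin(t_cycle: float, t_co_max: float, t_su: float,
                 t_data: float, t_clk: float) -> float:
    """T_cycle - T_CO(max) - T_su - T_skew, with T_skew = t_data - t_clk (seconds)."""
    t_skew = t_data - t_clk
    return t_cycle - t_co_max - t_su - t_skew


def hold_margin(t_co_min: float, t_h: float, t_data: float, t_clk: float) -> float:
    """T_CO(min) + T_skew - T_h; a positive result means hold is met (seconds)."""
    t_skew = t_data - t_clk
    return t_co_min + t_skew - t_h


# Illustrative values only: hypothetical T_CO numbers, 350 MHz clock,
# clock trace 33 ps longer than the data trace.
t_cycle_example = 1 / 350e6
su = setup_margin(t_cycle_example, t_co_max=300e-12, t_su=200e-12, t_data=0.0, t_clk=33e-12)
h = hold_margin(t_co_min=400e-12, t_h=275e-12, t_data=0.0, t_clk=33e-12)
print(f"setup margin: {su * 1e9:.3f} ns, hold margin: {h * 1e9:.3f} ns")
```

Delaying the clock (negative $T_{skew}$) adds to the setup margin and takes away from the hold margin, which is the trade-off explored in the cells below.
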

#### Setup-time margin calculation

```python
#! FPGA->LPDDR3 SETUP TIME MARGIN
# * DDR3 PHY setup time
t_setup_base = 200e-12
t_setup_slew = 0e-12  # (Make sure to fill in based on slew rate)
t_setup = t_setup_base + t_setup_slew

# * Clock to output delay (MAXIMUM)
# - Data output invalid before/after CLK output (CLK+/-)
t_dco_before_clk = 300e-12

#! CALCULATION (t_clk_trace - t_data_trace)
t_setup_margin = (t_dco_before_clk + t_data_trace) + t_setup - t_clk_trace
t_setup_budget = t_setup_margin / 10
print(f"t_skew_max: {t_setup_margin * 1e9:.4f} ns")
print(f"Budget: {t_setup_budget * 1e9:.4f} ns")  # convert seconds to ns before printing
print(f"Critical length: {t_setup_budget / t_trace_delay_s_per_mm:.4f} mm")

# Note: don't add the period to the calculation.
#       Data is sampled at the clock edge, so adding it would essentially
#       delay the sampled data to the next edge.
#       The clock traces should in fact have a delay compared to the data traces.
```

```
t_skew_max: 0.4670 ns
Budget: 0.0467 ns
Critical length: 8.4909 mm
```


#### Hold-time margin calculation

```python
#! FPGA->LPDDR3 HOLD TIME MARGIN
# * DDR3 PHY hold time (time after the CLK toggle that the data must remain stable)
t_hold_base = 275e-12
t_hold_slew = 0e-12  # (Make sure to fill in based on slew rate)
t_hold = t_hold_base + t_hold_slew

# * Time output data is valid
# - Shortest possible delay from FPGA internal update to signal change on FPGA output pin
# Should be tDIB (min/max time between clock edge and data validity)
tDIB = 0.3e-9
t_UI = t_cycle / 2
t_do_FPGA_valid = t_UI - tDIB

# * Clock to output delay
t_co_min_FPGA = 0  #? Assume 0 t_co delay due to compensation???

#! CALCULATION (t_clk_trace - t_data_trace)
t_hold_margin = (t_data_trace + t_do_FPGA_valid + t_co_min_FPGA - t_hold) - t_clk_trace
print(f"t_skew_max: {t_hold_margin*1e9} ns")
```

```
t_skew_max: 0.8205714285714286 ns
```


### Data / DQS-Bus (DDR3-800) TX (FPGA -> LPDDR3)

- (FPGA-side) Instantiated using: ODDRX2F-MACRO in ECP5U: so 2 data-bits per clock cycle.
- Center-aligned data sending (the data transitions happen half a UI before and after the clock edge)


```python
# * 350 MHz x 2 (DDR) bit time (DQ/DQS bus)
t_cycle = 1 / (350e6 * 2)
t_UI = t_cycle  # Half the clock period (time to transmit a single bit)

# * T_skew
t_clk_trace = 1e-9  #? GET CORRECT CLK TRACE LENGTH
t_data_trace = 1e-9  #? GET CORRECT DATA TRACE LENGTH
```


#### Setup-time calculation

```python
#! FPGA->LPDDR3 SETUP TIME MARGIN
# * DDR3 PHY setup time
# Data setup time to DQS, DQS#
t_setup_base = 75e-12
t_setup_slew = 0e-12  # (Make sure to fill in based on slew rate, 128..-62)
t_setup = t_setup_base + t_setup_slew

# * Clock to output delay (MAXIMUM)
# - Data output invalid before/after CLK output (CLK+/-)
# NOTE: took DDRX values, not LPDDR values
t_dco_before_clk = 0.442e-9 + 0.5 * t_UI

#! CALCULATION (t_clk_trace - t_data_trace)
t_skew_max = t_cycle - t_dco_before_clk - t_setup
print(f"t_skew_max: {t_skew_max*1e9} ns")
```

```
t_skew_max: 0.19728571428571437 ns
```


#### Hold-time calculation

```python
#! FPGA->LPDDR3 HOLD TIME MARGIN
# * Clock to output delay (MAXIMUM)
# - Data output valid before next CLK output (CLK+/-)
t_dco_before_next_clk = 0.442e-9 + 0.5 * t_UI

# * DDR3 PHY hold time
t_hold_base = 150e-12
t_hold_slew = 0e-12  # (Make sure to fill in based on slew rate, 88..-30)
t_hold = t_hold_base + t_hold_slew

# * T_skew
t_clk_trace = 1e-9  #? GET CORRECT CLK TRACE LENGTH
t_data_trace = 1e-9  #? GET CORRECT DATA TRACE LENGTH

#! CALCULATION (t_clk_trace - t_data_trace)
t_skew_max = (t_cycle - t_dco_before_next_clk) - t_hold
print(f"t_skew_max: {t_skew_max*1e9} ns")
```

```
t_skew_max: 0.12228571428571439 ns
```


### Data / DQS-Bus (DDR3-800) RX (LPDDR3 -> FPGA)
- Edge-aligned: the LPDDR3 device typically doesn't have the resources to center-align data.

```python
# * 350 MHz x 2 (DDR) bit time (DQ/DQS bus)
t_cycle = 1 / (350e6 * 2)
t_UI = t_cycle  # Half the clock period (time to transmit a single bit)

# * T_skew
t_clk_trace = 1e-9  #? GET CORRECT CLK TRACE LENGTH
t_data_trace = 1e-9  #? GET CORRECT DATA TRACE LENGTH
```

#### Setup-time calculation

```python
#! LPDDR3->FPGA SETUP TIME MARGIN
# * DDR3 PHY setup time
# Data setup time to DQS, DQS#
t_setup_base = -344e-12 + 0.5 * t_UI
t_setup_slew = 0e-12  # (Make sure to fill in based on slew rate, 128..-62)
t_setup = t_setup_base + t_setup_slew

# * Clock to output delay (MAXIMUM)
# - Data output invalid before/after CLK output (CLK+/-)
# NOTE: took DDRX values, not LPDDR values
t_dco_before_clk = 0.442e-9 + 0.5 * t_UI

#! CALCULATION (t_clk_trace - t_data_trace)
t_skew_max = t_cycle - t_dco_before_clk - t_setup
print(f"t_skew_max: {t_skew_max*1e9} ns")
```

```
t_skew_max: -0.09799999999999992 ns
```


#### Hold-time calculation
$t_{QH}$ is the guaranteed output hold time from DQS, DQS#: the time the LPDDR3 device keeps the data stable after the DQS edge.

```python
#! LPDDR3->FPGA HOLD TIME MARGIN
# * Clock to output delay (MAXIMUM)
# - Data output valid before next CLK output (CLK+/-)
t_dco_before_next_clk = 0.442e-9 + 0.5 * t_UI

# * DDR3 PHY hold time
t_hold_base = 150e-12
t_hold_slew = 0e-12  # (Make sure to fill in based on slew rate, 88..-30)
t_hold = t_hold_base + t_hold_slew

# * T_skew
t_clk_trace = 1e-9  #? GET CORRECT CLK TRACE LENGTH
t_data_trace = 1e-9  #? GET CORRECT DATA TRACE LENGTH

#! CALCULATION (t_clk_trace - t_data_trace)
t_skew_max = (t_cycle - t_dco_before_next_clk) - t_hold
print(f"t_skew_max: {t_skew_max*1e9} ns")
```

```
t_skew_max: 0.12228571428571439 ns
```


## CK/CK# and DQS-offset calibration
- $t_{DQSCK}$: maximum offset that write leveling can compensate for on the LPDDR3 side (the total time budget here).
- $t_{CO,fpga}$: maximum delay between CK/CK# and DQS on the FPGA side.
- $t_{CO,dqs}$: maximum delay between CK/CK# and DQS on the DQS side.

The maximum time offset, for both data groups, is: $$t_{DQSCK} - t_{CO} - \lvert t_{clk} - t_{dqs} \rvert$$
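
The offset expression can be evaluated for both data groups. In this sketch the trace lengths are the TrellisBoard CK and DQS lengths listed further down, converted with the 5.5 ps/mm trace delay assumed elsewhere in these notes; the $t_{DQSCK}$ and $t_{CO}$ numbers are hypothetical placeholders, not datasheet values.

```python
TRACE_DELAY_S_PER_MM = 5.5e-12  # assumption: same trace delay as the CMD/CTRL cell


def max_ck_dqs_offset(t_dqsck: float, t_co: float, t_clk: float, t_dqs: float) -> float:
    """Remaining CK-to-DQS offset budget: t_DQSCK - t_CO - |t_clk - t_dqs| (seconds)."""
    return t_dqsck - t_co - abs(t_clk - t_dqs)


# Hypothetical budget numbers, NOT datasheet values: replace before use
T_DQSCK = 400e-12
T_CO = 100e-12

# Routed lengths from the TrellisBoard measurements (mm)
ck_mm = 77.5
dqs_mm = {"DQS0 (DQL)": 40.5, "DQS1 (DQU)": 36.833}

for group, length_mm in dqs_mm.items():
    offset = max_ck_dqs_offset(T_DQSCK, T_CO,
                               ck_mm * TRACE_DELAY_S_PER_MM,
                               length_mm * TRACE_DELAY_S_PER_MM)
    print(f"{group}: remaining offset {offset * 1e12:.1f} ps")
```

A negative result for a group would mean the CK/DQS length difference alone exceeds what is left of the budget.
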

```python
#! CK/CK# and DQS offset calibration

# * tDQSQ
tDQSQ = 200e-12  # DQS, DQS# to DQ skew, per access
```


# Sources
- ECP5 and ECP5-5G High-Speed I/O Interface (FPGA-TN-02035-1-3)


## OrangeCrab
### Command Control bus
Distances:
- CK, CK#: 24.79 x 2 (adding setup time margin)
- CS#: 18.13 mm
- BA0: 18.1 mm
- A1: 18.16 mm

### DQ[0..7]
- D3: 18.44 mm
- D4: 18.4468 mm
- LDQS+ / LDQS-: 18.9 mm

### DQ[8..15]
- D13: 18.935 mm
- DQU/#: 18.4852 mm
- D14: 18.94 mm

## TrellisBoard
### Command Control bus
Distances:
- CK, CK#: 77.5 mm
- CKE: 88 mm
- CAS: 87 mm

### DQU
- DQ10: 36.9 mm
- DQ14: 36.83 mm
- DQS1+-: 36.833 mm

### DQL
- DQ3: 40.5 mm
- DM0: 40.5262 mm
- DQ1: 40.5173 mm
- DQS0+-: 40.5 mm
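
The measured lengths can be turned into a worst-case skew per group (longest minus shortest trace, times the trace delay). A sketch assuming the 5.5 ps/mm trace delay used in the CMD/CTRL calculation and no via correction; only the subset of signals listed above is included.

```python
TRACE_DELAY_PS_PER_MM = 5.5  # assumption: same value as the CMD/CTRL cell

groups = {
    "OrangeCrab DQ[0..7]": {"D3": 18.44, "D4": 18.4468, "LDQS": 18.9},
    "OrangeCrab DQ[8..15]": {"D13": 18.935, "DQU/#": 18.4852, "D14": 18.94},
    "TrellisBoard DQU": {"DQ10": 36.9, "DQ14": 36.83, "DQS1": 36.833},
    "TrellisBoard DQL": {"DQ3": 40.5, "DM0": 40.5262, "DQ1": 40.5173, "DQS0": 40.5},
}


def group_skew_ps(lengths_mm: dict[str, float]) -> float:
    """Length spread inside one group, converted to delay in ps (no via correction)."""
    spread_mm = max(lengths_mm.values()) - min(lengths_mm.values())
    return spread_mm * TRACE_DELAY_PS_PER_MM


for name, lengths in groups.items():
    print(f"{name}: {group_skew_ps(lengths):.2f} ps")
```

The result can be compared against the intra-group budgets in the timing notes that follow.
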

## Rudimentary Timing 

- Table 58: DQS, DQS# to DQ skew, per access - tDQSQ: 200 ps max (DDR3L-800) (total time budget)
- Period is 2 / 800e6 = 2.5 ns
- Between clock pairs
	- 5 ps
	- (300e6 m/s * 50e-12 s) / ((sqrt(4.1) + 1) / 2) = 9.9 mm (6.635 mm in )

- Between DQS+ and DQS-: ±2 ps
	- 0.4 mm
- Between DQ-signals: 10 ps
	- 2 mm
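
The length figures above follow $L = v \cdot t$ with $v = c / ((\sqrt{\varepsilon_r} + 1)/2)$, $\varepsilon_r = 4.1$ and $c \approx 300 \times 10^6$ m/s. A sketch reproducing that arithmetic; note that the more common microstrip approximation $v = c/\sqrt{(\varepsilon_r + 1)/2}$ is a different model and gives somewhat shorter lengths.

```python
import math

C_M_PER_S = 300e6  # speed of light, rounded as in the notes


def skew_to_length_mm(skew_s: float, er: float = 4.1) -> float:
    """Trace length (mm) matching a time skew, using v = c / ((sqrt(er) + 1) / 2)."""
    v = C_M_PER_S / ((math.sqrt(er) + 1) / 2)
    return v * skew_s * 1e3


for label, skew in (("DQS+ vs DQS-", 2e-12), ("DQ vs DQ", 10e-12), ("50 ps", 50e-12)):
    print(f"{label}: {skew_to_length_mm(skew):.2f} mm")
```

This reproduces the 0.4 mm, 2 mm and 9.9 mm figures in the list.
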


# Testpoints
- Add small testpoints to DDR3-interface (0.5 mm diameter)
