/
turbor-vdp-io.txt
427 lines (337 loc) · 13.7 KB
/
turbor-vdp-io.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
Goal of the experiment
======================
On a MSX2+ machine or a turboR machine in Z80 mode, there are extra wait cycles
when accessing a VDP IO-port (0x98-0x9b) compared to other IO ports (these are
introduced by the T9769{A,B,C} chip).
In openMSX we also added those wait cycles in R800 mode. That's likely wrong.
We verify this hypothesis in this experiment. Spoiler: in R800 mode there are
indeed no extra wait cycles.
Sub-goals
=========
I don't have my real MSX computers permanently connected anymore. When I do set
them up, it's not too much extra effort to run a few more tests.
Emulation of Z80 timing is pretty straight forward. R800 is more complex. We
know about the following peculiarities:
- Align to even cycle on IO instructions (because R800, which runs at 7MHz,
needs to access the IO bus, which runs at 3.5MHz).
- Doing two accesses to VDP IO-ports introduces a delay when those accesses are
"too close together".
- Refresh behavior. Every so many cycles (current estimate: 210 cycles @7MHz),
normal R800 instruction execution gets interrupted for a refresh period
(current estimate: this period lasts 25.5 cycles. Best guess for that half
cycle: because 50% chance it needs to align to an even cycle).
Design of the experiment
========================
I (repeatedly) run test programs with the following structure:
DI
OUT (#20),A ; reset counter
<body>
IN A,(#20) ; latch counter, and read 32-bit value in [DE:HL]
LD L,A
IN A,(#21)
LD H,A
IN A,(#22)
LD E,A
IN A,(#23)
LD D,A
EI
RET
Where <body> varies from test-to-test.
Alex Wulms gave me a FPGA test board that implements a 32-bit counter which
runs at 3.5MHz and is accessed via IO-ports 0x20-0x23. In openMSX this device
is emulated as the 'hirestimer' extension.
- Writing (any) value to port 0x20 resets the counter to zero.
- That counter increments at a frequency of 3.5MHz (because the MSX IO-bus runs
at this frequency).
- Reading IO port 0x20 latches the value of the counter and returns the 8 LSB
bits.
- Reading IO ports 0x21-0x23 retrieves the other bits of the latched value.
So in Z80 mode (runs at 3.5MHz) this board allows to make cycle-accurate
measurements. But the R800 runs at 7MHz, so we can only measure up-to 2 cycles
accurate.
The test code is located at the start of a 256-byte block (specifically at
address 0x0100 in my case). This ensures that none of the instruction fetches
take an extra cycle for the "page-break".
I ran all tests on a turbor-GT machine, both in Z80 and R800 mode.
(Not important) I ran all tests in compass, because that allows me to easily:
- Edit the test program
- Re-assemble (CTRL-A)
- Execute (CTRL-G)
- Look at the result: the value in registers DE:HL (F4)
The actual tests
================
In this section I'll present _processed_ test results. Because often the
details are not important for our main goal and too detailed results would only
add confusion. However these details could be important to (in the future)
investigate the sub-goals. Therefor I still included the raw results in an
appendix.
1) Establish a base-line
------------------------
Replace <body> with a sequence of n NOP instructions.
n | Z80 | R800
---+-----+-----
0 | 12 | 5
1 | 17 | 5
2 | 22 | 6
3 | 27 | 6
4 | 32 | 7
5 | 37 | 7
The case n=0 (an empty <body>) is important. It measures the time between
resetting the counter and immediately after reading it back. Thus
OUT (#20),A ; reset counter
IN A,(#20) ; read it back
This means that for the actual duration of the <body>, we must subtract this
n=0 result.
On a Z80 both an 'OUT (n),A' and 'IN A,(n)' instruction take 12 cycles. So the
value 12 in the above table makes sense. (Actually it also depends where
exactly within the OUT instruction the counter gets reset and where exactly the
counter gets latched, but apparently that happens at the same relative offsets
in the OUT/IN instructions).
On Z80, for the n>=1 cases we then get the, totally as expected, result that a
NOP instruction takes 5 cycles.
For R800 n=0 we measure 5 cycles at 3.5MHz, this could be either 9 or 10 cycles
at 7MHz. In our current R800 emulation model an IN or OUT instruction takes 9
cycles plus an optional cycle to 'align' the 7MHz R800-bus with the 3.5MHz
IO-bus. So that would be 10 cycles @7MHz in this case which matches the
measured value.
The case R800 n=1 also measures 5, that's because the cost of the NOP
instruction is "absorbed" by no longer having to wait for 1 cycle to align both
buses. And also for n>=2, the measured value only goes up every 2nd increment
of n.
2a) Measure a single OUT instruction, non-VDP port
--------------------------------------------------
In these group of tests the body of our test program is
NOP, repeated m times
OUT (#30),A
NOP, repeated n times
I arbitrarily picked IO port 0x30 because it has no function and because (I
hoped) it introduces no extra wait cycles when accessing it. I put a variable
amount (possibly zero) of NOP instruction both in front and after the OUT
instruction.
Here are the results:
m | n | Z80 | R800
---+---+-----+------------
0 | 0 | 24 | 10 (sometimes 22)
0 | 1 | 29 | 10 (sometimes 23)
0 | 2 | 34 | 11 (sometimes 23)
0 | 3 | 39 | 11 (sometimes 24)
0 | 4 | 44 | 12 (sometimes 24)
0 | 5 | 49 | 12 (sometimes 25)
--+---+-----+------------
1 | 0 | 29 | 10 (*)
1 | 1 | 34 | 10 (*)
1 | 2 | 39 | 11 (sometimes 23,24,25)
--+---+-----+------------
2 | 0 | 34 | 11 (sometimes 23)
2 | 1 | 39 | 11 (sometimes 24)
2 | 2 | 44 | 12 (sometimes 24,25)
(*) The lack of a 'sometimes' annotation does not mean it never occurs,
instead it probably means I didn't measure long enough to observe it.
Z80 is completely as expected. Here's the formula that predicts the result:
12 + 12 + 5*(m+n)
The 'normal' R800 values (ignoring the 'sometimes' annotations) can also be
explained. There are now two IO instructions that can 'absorb' the cost of a
NOP. Thus again the measured value only goes up by 1 for every 2nd increment of
'm' or 'n' independently.
Now for the 'sometimes' values. For example for R800 n=0 we mostly measured
value 11, but from time time instead we measured the value 23. Very roughly
speaking, the chance of measuring the alternative value went up with the
duration of the test (so mostly visible in test 3b with high 'n' values).
This phenomenon can be explained by the R800 refresh stuff. At regular
intervals the R800 gets interrupted from normal instruction execution for
refresh. If such an interruption falls in the test then we'll measure a higher
duration. The value of this extra duration is 12 or 13 (glossing over the m=1,
n=2 result). This corresponds with ~25 R800 cycles. (This matches the current
emulation model).
2b) Measure a single OUT instruction, VDP port
----------------------------------------------
This is the most important test in this experiment (from the point of view of
the original goal). It's the same test as in 2a) but with port 0x30 replaced
with 0x98.
m | n | Z80 | R800
---|---+-----+------------
0 | 0 | 25 | 10 (sometimes 22)
0 | 1 | 30 | 10 (sometimes 23)
0 | 2 | 35 | 11 (sometimes 23)
0 | 3 | 40 | 11 (sometimes 24)
0 | 4 | 45 | 12 (sometimes 24)
0 | 5 | 50 | 12 (sometimes 25)
--+---+-----+------------
1 | 0 | 30 | 10 (sometimes 22)
1 | 1 | 35 | 10 (sometimes 23)
1 | 2 | 40 | 11 (sometimes 23,25)
--+---+-----+------------
2 | 0 | 35 | 11 (sometimes 23)
2 | 1 | 40 | 11 (*)
2 | 2 | 45 | 12 (*)
Comparing 2a) with 2b) gives:
- Identical results for R800.
- For Z80 2b) is exactly 1 cycle higher than 2a).
Thus we can confirm our hypothesis:
- On a turboR in Z80 mode, accessing IO-port 0x98-9b introduces 1 wait cycle.
- But doing the same in R800 mode does not.
3a) Two OUT instructions, VDP-port followed by non-VDP port
-----------------------------------------------------------
Ok, so in the previous test we've already accomplished our goal. But we know
that there's something else going on in R800 (but not in Z80) mode, namely wait
cycles between VDP accesses that are "too close together". While we're add it,
let's also do some measurements on those.
To get a new base-line, we first measure access to a VDP-port followed by
access to a non-VDP port. More specifically, the new test body is:
OUT (#98),A
NOP, repeated n times
OUT (#30),A
Which results in:
n | Z80 | R800
---+-----+------------------
0 | 37 | 15 (sometimes 27)
1 | 42 | 15 (sometimes 28)
2 | 47 | 16 (sometimes 28)
3b) Two VDP-OUT instructions
----------------------------
Now the more interesting test, we make the test body:
OUT (#98),A
NOP, repeated n times
OUT (#98),A
And we get:
n | Z80 | R800
---+-----+-------------------------
0 | 38 | 41 (sometimes 53)
1 | 43 | 41 (sometimes 53)
2 | 48 | 41 (sometimes 53)
5 | 63 | 41 (sometimes 53)
10 | 88 | 41 (sometimes 53)
20 | 138 | 41 (sometimes 53)
30 | 188 | 41 (sometimes 42,53)
40 | 238 | 41 (sometimes 47,48,53)
50 | 288 | 41 (sometimes 52,53)
51 | 293 | 41 (sometimes 53)
52 | 298 | 41 (sometimes 53,54)
53 | 303 | 41 (sometimes 54,55)
54 | 308 | 42 (sometimes 54,55)
55 | 313 | 42 (sometimes 55,56)
56 | 318 | 43 (sometimes 55,56)
57 | 323 | 43 (sometimes 56,57)
58 | 328 | 44 (sometimes 56,57)
59 | 333 | 44 (sometimes 56,57,58)
60 | 338 | 45 (sometimes 57,58)
65 | 363 | 47 (sometimes 59,60,61)
70 | 388 | 50 (sometimes 62,63)
(*) for higher values of 'n' the qualification 'sometimes' becomes more and
more frequent
The trend is clear. For n<=53 the output remains constant. For n>53 the output
is again linearly rising, and in fact following the same curve as for non-VDP
ports.
If you work out the details of how many wait cycles are added, you'll find that
it matches pretty well with the current openMSX emulation model: openMSX
requires at least 62 cycles @7MHz between two consecutive VDP IO access. (TODO
see next chapter.
TODO Comparison with openMSX
============================
TODO repeat these experiments in openMSX and compare.
[I plan to work on this soon, but need to take a break and want already to
publish this.]
Appendix: raw test measurements
===============================
Notes:
- Test results are written as hex values (all other values are decimal)
- Comma separated values within one cell are repeated measurements of the same
test.
- In some cases I did not repeated a test, in other cases many times. This is
based on when I 'guessed' the variation in the result was interesting. Or
when I was expecting some variation, but didn't get it in the initial
measurements, I kept measuring until I did see a different result.
- Though that does mean there is some bias in the results. E.g. you cannot
use these results to get an accurate estimate of the frequency at which the
different values occur.
1) No (extra) OUT instructions
<body> =
NOP, repeated n times
n | Z80 | R800
--+-------+-----
0 | c | 5
1 | 11,11 | 5,5
2 | 16,16 | 6,6
3 | 1b,1b | 6,6
4 | 20,20 | 7,7
5 | 25,25 | 7,7
2a) Single OUT instruction, non-VDP port
<body> =
NOP, repeated m times
OUT (#30),A
NOP, repeated n times
m | n | Z80 | R800
--+---+-------+------------
0 | 0 | 18,18 | 16,a,a,a,16
0 | 1 | 1d | a,a,17,a,a
0 | 2 | 22 | b,b,b,b,b,17
0 | 3 | 27,27 | b,b,b,b,18
0 | 4 | 2c | 18,c,c,c
0 | 5 | 31,31 | c,19,19,c,c
--+---+-------+------------
1 | 0 | 1d,1d | a,a,a,a,a,a,a ... (46 measurements)
1 | 1 | 22 | a,a,a,a,a,a,a ... (11 measurements)
1 | 2 | 27 | b,b,b,b,b,19,18,17,17
--+---+-------+------------
2 | 0 | 22 | b,b,b,b,17
2 | 1 | 27 | b,b,b,b,b,b,b,18
2 | 2 | 2c | 19,c,c,c,18,c
2b) Single OUT instruction, VDP port
<body> =
NOP, repeated m times
OUT (#98),A
NOP, repeated n times
m | n | Z80 | R800
--|---+-------+------------
0 | 0 | 19,19 | a,a,a,a,16
0 | 1 | 1e | a,a,a,17
0 | 2 | 23 | b,b,b,17
0 | 3 | 28 | b,b,b,b,b,18
0 | 4 | 2d | c,18,c
0 | 5 | 32 | c,19,c,c
--+---+-------+------------
1 | 0 | 1e | a,a,a,16,a,a
1 | 1 | 23 | a,17,a,a,17
1 | 2 | 28 | b,b,b,b,b,b,b,b,19,17,b,b
--+---+-------+------------
2 | 0 | 23 | b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,b,17,17,b
2 | 1 | 28 | b,b,b,b,b,b,b
2 | 2 | 2d | c,c,c,c,c,c,c,c,c,c
3a) Two OUT instructions, VDP-port followed by non-VDP port
<body> =
OUT (#98),A
NOP, repeated n times
OUT (#30),A
n | Z80 | R800
--+-------+------------------
0 | 25,25 | f,1b,f,f,f,f
1 | 2a | 1c,f,f,f,f
2 | 2f | 1c,10,10,1c,10,10
3b) Two OUT instructions, VDP-port followed by another VDP port
<body> =
OUT (#98),A
NOP, repeated n times
OUT (#98),A
n | Z80 | R800
---+--------+-------------------------
0 | 26,26 | 29,29,29,35,29,29,29
1 | 2b | 29,29,35,29,29,29,35
2 | 30 | 29,35,29,35,29,29,29
5 | 3f | 35,29,29,29,29
10 | 58 | 29,29,35,29,29,29,35
20 | 8a | 29,29,29,29,35
30 | bc,bc | 2a,29,35,29,29,29,29,29
40 | ee,ee | 29,29,30,35,29,29,2f,30,29,29,29,29,2f (30 not a typo!)
50 | 120,120| 35,29,34,29,29,29,29,34,29,29,29,25 (34 not a typo!)
51 | 125 | 29,35,29,29,29,29,29,35,35,29,29
52 | 12a | 36,29,35,29,29,29,36,29,36,29,36,29,36,35
53 | 12f,12f| 29,29,29,36,29,37,29,29,29,29,36,36,29,29
54 | 134 | 2a,2a,2a,2a,2a,2a,37,2a,2a,2a,36,2a,2a,2a,37,2a,2a,2a,36,2a,37
55 | 139 | 2a,38,36,38,2a,38,37,2a,37,2a,38,36,2a,37,2a,38
56 | 13e | 2b,2b,2b,2b,2b,38,37,2b,2b,37,2b,2b,38
57 | 143 | 39,37,37,38,2b,39,38,2b,2b,2b,38,2b,2b,2b,38
58 | 148 | 38,39,2c,38,2c,2c,2c,2c,39,38,38,2c,2c,2c,2c,39
59 | 14d | 39,2c,39,39,39,2c,38,3a,39,2c,2c,39,2c,2c,2c
60 | 152,152| 39,2d,39,2d,3a,2d,3a,39,2d,39,2d
65 | 16b | 3d,3b,3c,3b,2f,2f,3d,3c,3b,2f,2f,2f,3c,2f
70 | 184,184| 3e,3e,3f,32,3e,3e,32,32,3f,32,3f,32,3e,3f,32