
Query about contention delays #10

Closed
simonowen opened this issue Jun 6, 2021 · 9 comments

Comments

@simonowen
Collaborator

In the zx project, memory contention is applied in places such as on_fetch_cycle and on_read_cycle before any base operations are performed. That would imply that wait states happen even before the address has been placed on the bus. Is that correct?

I would expect T1 to always execute first to place the address on the bus, before device contention is considered. Applying contention relies on the current tick count too, so it seems important to have applied the initial cycle(s) before any rounding up to the next uncontended position. The Zaks book doesn't cover it in detail, but other online sources suggest that T2 might effectively repeat as long as WAIT is held low. I'd guess that means the T2 tick is added after the contention.

For example, if contention only allows memory reads at multiples of 4 ticks (0, 4, 8, ...):

#include <cstdio>

#include "z80.h"

class my_emulator : public z80::z80_cpu<my_emulator> {
public:
    typedef z80::z80_cpu<my_emulator> base;
    unsigned cycles = 0;

    my_emulator() {}

    void on_tick(unsigned t) {
        cycles += t;
    }

    // Insert wait ticks to round the current tick count up to the next
    // multiple of 4.
    void handle_memory_contention(z80::fast_u16 /*addr*/) {
        on_tick(((cycles + 3) & ~3) - cycles);
    }

    z80::fast_u8 on_fetch_cycle() {
        handle_memory_contention(get_pc());
        return base::on_fetch_cycle();
    }

    z80::fast_u8 on_read_cycle(z80::fast_u16 addr) {
        handle_memory_contention(addr);
        return base::on_read_cycle(addr);
    }
};

int main() {
    my_emulator e;
    e.on_step();  // step through the two NOPs discussed below
    e.on_step();
    std::printf("cycles = 0x%04x\n", e.cycles);
}

Executing two NOPs with the current code, starting from T=0, takes 8 cycles, even though the memory access points within the instructions aren't aligned to the access boundaries. I feel the timing should perhaps be 11 cycles: T1 C C C T2 T3 T4 T1 T2 T3 T4.
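For reference, here's a standalone sketch of the arithmetic behind those two figures (independent of the emulator, assuming contended reads may only start at multiples of 4 ticks and a NOP fetch is 4 T-states):

#include <cstdio>

int main() {
    // Current behaviour: round the *start* of each fetch cycle up to a slot.
    unsigned cycle_start = 0;
    // Suggested behaviour: round the *access point*, one T-state after the
    // cycle starts (i.e. T2), up to a slot.
    unsigned access_point = 0;

    for (int i = 0; i != 2; ++i) {
        cycle_start = ((cycle_start + 3) & ~3u) + 4;    // no waits at ticks 0 or 4

        unsigned access = access_point + 1;             // T1 has already happened
        unsigned waits = ((access + 3) & ~3u) - access;
        access_point += 4 + waits;                      // T1 + waits + T2 T3 T4
    }

    std::printf("cycle-start alignment:  %u ticks\n", cycle_start);   // prints 8
    std::printf("access-point alignment: %u ticks\n", access_point);  // prints 11
}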

I may be missing something, but does what I'm asking make sense?

@kosarev
Owner

kosarev commented Jun 6, 2021

In the zx project, memory contention is applied in places such as on_fetch_cycle and on_read_cycle before any base operations are performed. That would imply that wait states happen even before the address has been placed on the bus. Is that correct?

I think what happens here is that we implement the zx contention model, defined in terms of memory and port cycles rather than T-states, primarily because this is how we usually see it described in the Spectrum world. For example, the timing paper of the zx project, https://github.com/kosarev/zx/blob/master/test/screen_timing/SCREEN_TIMING.md, gives the start ticks for whole cycles and at the same time takes into account that the actual access takes place one T-state later.

I would expect T1 to always execute first to place the address on the bus, before device contention is considered. Applying contention relies on the current tick count too, so it seems important to have applied the initial cycle(s) before any rounding up to the next uncontended position. The Zaks book doesn't cover it in detail, but other online sources suggest that T2 might effectively repeat as long as WAIT is held low. I'd guess that means the T2 tick is added after the contention.

Yes, that's my reading of the timing diagrams in the Z80 User Manual as well. But this z80 emulator doesn't assume any specific approach to implementing contention. If for your needs it works better to apply contention ticks on T2 of the cycle, then you can just say so in your definitions of the cycle handlers. Or maybe it's me who's missing something?
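For instance (just a sketch, reusing the rounding from your example and assuming the contended access effectively happens one T-state into the cycle), the handler could align the access point rather than the start of the cycle:

// Sketch: align the T2 access point (one tick after the cycle starts) to the
// next 4-tick slot. The wait ticks are still inserted before the base cycle
// runs, but the totals come out as in your 11-cycle example.
void handle_memory_contention(z80::fast_u16 /*addr*/) {
    unsigned access = cycles + 1;           // T1 places the address first
    on_tick(((access + 3) & ~3) - access);  // waits before the access point
}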

@kosarev kosarev self-assigned this Jun 6, 2021
@simonowen
Collaborator Author

Ah, that explains it! I think it also explains the curious -1 offset I mentioned, which was part of 'zx' but not my implementation.

I was finding that once the timings locked on to a contention point they remained correct. However, at boundaries with events of a different alignment there was a timing mismatch. Interrupts are one of the unaligned events on my target machine, and this resulted in the interrupt duration sometimes being too long or too short. Test programs that rely on cycle-accurate timings were failing and I couldn't explain why, until now.

I could probably copy and modify the base implementation for some extra cases, as I have for the fetch/read/write/input/output cycle handlers. Though that would still require correcting for the offset wherever the tick counter is used, such as for screen drawing, raster position registers, etc. It feels like a Spectrum-specific optimisation has crept into the generic CPU core implementation, affecting the tick count value maintained by the derived class, which the base class doesn't control.

As a possible alternative, could the base implementation use on_mreq and on_iorq handlers to give the derived class the opportunity to add the appropriate number of wait states where needed? It would need the base implementation to be changed to call them as part of T2 processing, where appropriate. It would simplify the derived classes that want to add contention, and they may no longer need to override the five access-cycle handlers mentioned above.
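Roughly what I have in mind from the derived class's side, reusing cycles/on_tick from my example above (sketch only: the on_mreq/on_iorq names are just the proposal, and the helpers are placeholders for the machine-specific rules):

// Sketch only: these hooks don't exist in the library yet; the helpers below
// are trivial placeholders for the machine-specific contention rules.
static bool is_contended(z80::fast_u16) { return true; }      // placeholder rule
static unsigned io_wait_states(z80::fast_u16) { return 3; }   // placeholder rule

void on_mreq(z80::fast_u16 addr) {
    if (is_contended(addr))
        on_tick(((cycles + 3) & ~3) - cycles);  // round up to the next slot
}

void on_iorq(z80::fast_u16 port) {
    on_tick(io_wait_states(port));  // I/O may follow different wait-state rules
}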

Having a pure tick counter benefits unit testing, where tests can be run from specific time points. That's more difficult to do at the moment without applying the offset of 1 at various places, which could be error-prone.

@simonowen
Collaborator Author

I've created a small patch to implement the suggested changes for testing. To apply contention I just provide on_mreq and on_iorq handlers that examine the addr/port and call on_ticks to apply the appropriate number of wait states.

The contention hooks are called after T2 for MREQ and T4 for IORQ, and currently only for the Z80 core, not the 8080. It now passes my timing test suite, so it fixes the last of the differences I was seeing. These positions need checking in case there's an offset bias in my own code that makes them a cycle too late.

I think the mechanism should be general enough to be in the base code for any system wanting contention. The extra ULA contention that depends on the address bus value may be fairly Spectrum-specific, though, and might still need the extra cycle handlers to be overridden? But if it's general enough to justify more hooks with empty base handlers, that would further simplify zx too.

@simonowen
Collaborator Author

I've just noticed my patch causes a test mismatch due to some re-ordering of operations. Zaks page 70 shows T1: PC OUT, T2: PC = PC + 1, T3: INST into IR. Incrementing PC was being performed last, so it was moved up to before the IR assignment. The tests will need to be updated to match if the other changes remain.

@kosarev
Owner

kosarev commented Jun 12, 2021

I could probably copy and modify the base implementation for some extra cases, as I have for the fetch/read/write/input/output cycle handlers. Though that would still require correcting for the offset wherever the tick counter is used, such as for screen drawing, raster position registers, etc. It feels like a Spectrum-specific optimisation has crept into the generic CPU core implementation, affecting the tick count value maintained by the derived class, which the base class doesn't control.

I see. So if the SAM Coupé relies on the CPU's wait states, then implementing the cycle handlers would indeed require applying offsets to the current tick. It's not the same for the ZX, as its ULA just stops the CPU when a clash on the address bus is possible, so an implementation of that wouldn't benefit from a lower-level interface, it seems. Revisiting the situation with the existing tick offsets is on my TODO list, though.

As a possible alternative, could the base implementation use on_mreq and on_iorq handlers to give the derived class the opportunity to add the appropriate number of wait states where needed?

The Z80 emulator was actually written with the idea of providing a simulator interface in mind, which is why in early commits it was referred to as a simulator. So the idea is that by giving the cycle handlers custom implementations, it is possible to implement a pin-level interface and, by queuing operations like on_set_addr_bus(), even support tick-stepping instead of instruction-stepping -- all without a performance penalty.
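To make the tick-stepping idea concrete, a rough sketch (the event struct and queue are purely illustrative, and on_set_addr_bus() is just one of the handlers that could be recorded this way):

#include <cstdint>
#include <deque>

// Record bus operations with their tick so an outer loop can replay them
// one tick at a time. Assumes a tick counter like 'cycles' in the earlier
// example; the inheritance is omitted to keep the sketch standalone.
struct bus_event {
    unsigned tick;       // tick at which the value appeared on the bus
    std::uint16_t addr;  // value placed on the address bus
};

class queued_emulator /* : public z80::z80_cpu<queued_emulator> */ {
public:
    unsigned cycles = 0;
    std::deque<bus_event> pending;

    void on_tick(unsigned t) { cycles += t; }

    void on_set_addr_bus(std::uint16_t addr) {
        // Remember what happened and when instead of acting immediately;
        // the surrounding code can then advance the simulation per tick.
        pending.push_back({cycles, addr});
    }
};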

So yes, I absolutely agree on both points -- adding lower-level handlers and getting rid of tick offsets is the way to go.

Having a pure tick counter benefits unit testing, where tests can be run from specific time points. That's more difficult to do at the moment without applying the offset of 1 at various places, which could be error-prone.

+1

I've created a small patch to implement the suggested changes for testing. To apply contention I just provide on_mreq and on_iorq handlers that examine the addr/port and call on_ticks to apply the appropriate number of wait states.

Looks great! If you don't mind, I will borrow this and re-check on my side that it matches the diagrams.

I've just noticed my patch causes a test mismatch due to some re-ordering of operations. Zaks page 70 shows T1: PC OUT, T2: PC = PC + 1, T3: INST into IR. Incrementing PC was being performed last, so it was moved up to before the IR assignment. The tests will need to be updated to match if the other changes remain.

No worries. I will take care of this. (BTW, can you give a link to that Zaks page? I'm not sure I have it.)

Thanks!

@kosarev
Owner

kosarev commented Jun 12, 2021

Hmm, it seems ~MREQ goes active on T1 and for ~IORQ it's T2, so it doesn't look like the right time for wait states?

@simonowen
Collaborator Author

I do suspect the offsets might be wrong. When integrating your CPU core I experimented with my memory/IO wait state alignment until all the tests passed. That might mean my figures are skewed by the same error. You're welcome to use any parts of the patch that still apply.

Someone sent me an analyzer trace showing OUT (C),B [ED 41] being executed on real SAM hardware to change the border colour. All memory accesses are contended, and the I/O access is also contended (but with different wait-state rules):

[logic analyzer trace of OUT (C),B on SAM hardware]

MREQ does go low during T1, and WAIT is pulled low during the same tick if memory contention applies. It looks like T1 completes, but since WAIT is still low at the start of the next rising CPU clock edge, memory wait states are applied there (just one in this case). I think that means T2 should start at the next rising CPU edge at which WAIT is also high. The 'ED' opcode seems to stabilise on the data bus during the 3rd tick, which should be T2, when PC is incremented.

This trace actually shows MREQ getting pulled low 4 times, with a memory refresh after each instruction fetch. I'm not taking refresh into account for contention, despite the trace showing one wait cycle on 3 of the 4 accesses. I don't know whether that means contention only applies to RD/WR cases or whether the wait-state rounding rules happen to cover it. Does the trace make sense for your understanding of executing ED 41?

Pages 70-71 of this book show the details I was looking at:
https://archive.org/details/Programming_the_Z-80_2nd_Edition_1980_Rodnay_Zaks/page/n67/mode/2up

@kosarev
Owner

kosarev commented Jun 13, 2021

Oh, that's perfect. So my reading of this is, here we have:

[trace image]

fetch ED: T1 T2 Tw T3 T4
fetch 41: T1 T2 T3 T4
out 21:   T1 T2 TW Tw Tw Tw T3

Then if your aim is to catch the moment of sampling ~WAIT, shouldn't we just call something like on_wait_sample() on T2 of fetch, read and write cycles, and also call it on TW of input and output cycles, which is when the Z80 UM says ~WAIT is sampled? ~MREQ and ~IORQ don't seem to be what really matters here?

@kosarev
Owner

kosarev commented Jun 13, 2021

Or maybe it should be just on_wait(), since sampling of the ~WAIT line and applying wait states is the same point in time in our model.
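From the derived class's side that might look something like this (a sketch only; the exact name and signature of the hook weren't settled at this point, and the wait-state helpers are hypothetical):

// Sketch only: a hook called by the core at the ~WAIT sample point
// (T2 of memory cycles, TW of I/O cycles). Names and signature are
// illustrative, not the library's final interface.
void on_wait(z80::fast_u16 addr, bool is_io_cycle) {
    unsigned waits = is_io_cycle ? io_wait_states(addr)    // hypothetical helper
                                 : mem_wait_states(addr);  // hypothetical helper
    on_tick(waits);  // apply the wait states at the sample point
}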

kosarev added several commits that referenced this issue on Jun 13, 2021, including:
[#10] Introduce a handler for sampling the ~WAIT line and applying wait states.