Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SW SPI speedup for Due (was: u8g2 in repetier-firmware v2) #749

Closed
repetier opened this issue Nov 18, 2018 · 24 comments
Closed

SW SPI speedup for Due (was: u8g2 in repetier-firmware v2) #749

repetier opened this issue Nov 18, 2018 · 24 comments
Milestone

Comments

@repetier
Copy link

I'm in the progress of using u8g2 instead of old u8g in next major release V2.

So far it is working form the functional side, but one of the most used displays are ST7920 based with software SPI. Till now V2 only supports due so I was quite astonished to see it taking more then 500ms to show a screen. It is out of question that this is not acceptable and would prevent usage. So i started investigating the software SPI. I saw you did already optimize it for 8 bit, but for due it used a generic solution, which explained the speed difference.

So I copied the 8 bit part and rewrote it to use WRITE from the firmware fastio solution using the hardcoded pin numbers (not friendly for your library but easy for me to do while being fast). Now refresh time was 28ms which would be quite usefull - except the display had no content. After some googling and reading the datasheet it was clear that my solution was much too fast for that controller.

So the next days I did many test. First I noticed that you have only 2 spi modes for software spi. YOu only use clock polarity and always send data on the first clock edge. ST7920 and the defined mode 3 let me assume that the rising edge is correct but it should be the second one.

Here a working version I came up with (taking 130ms in this version)

#elif defined(__SAM3X8E__)

inline void u8g2_spi_wait_short() {
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns

    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns

    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns  261.8ns till here
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns  321.3ns
}

inline void u8g2_spi_wait() {
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns 249.9 ns

    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
    asm volatile("nop" ::); // 11.9ns
}
// Special hack for fast software SPI on due with repetier
extern "C" uint8_t u8x8_byte_arduino_4wire_sw_spi(u8x8_t* u8x8, uint8_t msg, uint8_t arg_int, void* arg_ptr) {
    uint8_t i, b;
    uint8_t* data;
    uint8_t takeover_edge = u8x8_GetSPIClockPhase(u8x8);
    uint8_t startLevel = u8x8_GetSPIClockDefaultLevel(u8x8); // polarity

    switch (msg) {
    case U8X8_MSG_BYTE_SEND:
        // return u8x8_byte_4wire_sw_spi(u8x8, msg, arg_int, arg_ptr);
        data = (uint8_t*)arg_ptr;
        while (arg_int > 0) {
            b = *data;
            data++;
            arg_int--;
            if (takeover_edge) {
                for (i = 0; i < 8; i++) {
                    WRITE(UI_SPI_SCK, !startLevel);
                    u8g2_spi_wait();
                    if (b & 128) {
                        WRITE(UI_SPI_MOSI, 1);
                    } else {
                        WRITE(UI_SPI_MOSI, 0);
                    }
                    b <<= 1;
                    u8g2_spi_wait();
                    WRITE(UI_SPI_SCK, startLevel); // Takeover
                    u8g2_spi_wait();
                    // u8g2_spi_wait();
                }
            } else { // takeover at first edge
                for (i = 0; i < 8; i++) {
                    if (b & 128) {
                        WRITE(UI_SPI_MOSI, 1);
                    } else {
                        WRITE(UI_SPI_MOSI, 0);
                    }
                    b <<= 1;
                    u8g2_spi_wait();
                    u8g2_spi_wait();
                    WRITE(UI_SPI_SCK, !startLevel); // Takeover
                    u8g2_spi_wait();
                    u8g2_spi_wait();
                    WRITE(UI_SPI_SCK, startLevel);
                }
            }
            u8g2_spi_wait();
            u8g2_spi_wait();
            u8g2_spi_wait();
            u8g2_spi_wait();
        }
        break;

    case U8X8_MSG_BYTE_INIT:
        // return u8x8_byte_4wire_sw_spi(u8x8, msg, arg_int, arg_ptr);
        /* disable chipselect */
        /* for SPI: setup correct level of the clock signal */
        // WRITE(UI_SPI_SCK, u8x8_GetSPIClockPhase(u8x8));
        WRITE(UI_SPI_SCK, !u8x8_GetSPIClockDefaultLevel(u8x8));
        //u8x8_gpio_SetCS(u8x8, u8x8->display_info->chip_disable_level);
        WRITE(UI_SPI_CS, u8x8->display_info->chip_disable_level);
        /* no wait required here */

        break;
    case U8X8_MSG_BYTE_SET_DC:
        // return u8x8_byte_4wire_sw_spi(u8x8, msg, arg_int, arg_ptr);
#if UI_SPI_DC >= 0
        WRITE(UI_SPI_DC, arg_int);
#endif
        break;
    case U8X8_MSG_BYTE_START_TRANSFER:
        // return u8x8_byte_4wire_sw_spi(u8x8, msg, arg_int, arg_ptr);
        WRITE(UI_SPI_SCK, !u8x8_GetSPIClockDefaultLevel(u8x8));
        WRITE(UI_SPI_CS, u8x8->display_info->chip_enable_level);
        // u8x8_gpio_SetCS(u8x8, u8x8->display_info->chip_enable_level);
        u8x8->gpio_and_delay_cb(u8x8, U8X8_MSG_DELAY_NANO, u8x8->display_info->post_chip_enable_wait_ns, NULL);
        // u8g2_spi_wait();

        break;
    case U8X8_MSG_BYTE_END_TRANSFER:
        // return u8x8_byte_4wire_sw_spi(u8x8, msg, arg_int, arg_ptr);
        u8x8->gpio_and_delay_cb(u8x8, U8X8_MSG_DELAY_NANO, u8x8->display_info->pre_chip_disable_wait_ns, NULL);
        // u8x8_gpio_SetCS(u8x8, u8x8->display_info->chip_disable_level);
        WRITE(UI_SPI_CS, u8x8->display_info->chip_disable_level);
        WRITE(UI_SPI_SCK, !u8x8_GetSPIClockDefaultLevel(u8x8));
        break;
    default:
        return 0;
    }
    return 1;
}

Also this works I'm not really satisfied. This is much slower then 1MHz hardware SPI I guess. Critical section is this

            if (takeover_edge) {
                for (i = 0; i < 8; i++) {
                    WRITE(UI_SPI_SCK, !startLevel);
                    u8g2_spi_wait();
                    if (b & 128) {
                        WRITE(UI_SPI_MOSI, 1);
                    } else {
                        WRITE(UI_SPI_MOSI, 0);
                    }
                    b <<= 1;
                    u8g2_spi_wait();
                    WRITE(UI_SPI_SCK, startLevel); // Takeover
                    u8g2_spi_wait();
                    // u8g2_spi_wait();
                }

u8g2_spi_wait inserts 500ns wait. What makes no sense is the first wait I inserted. Since it should take value on rising edge a wait there should have no function, but without I get garbage.

If you read the datasheet a 5v powered ST7920 requires 200ns on each side of the rising edge, but 200ns does not work here. Further reading the specs gives 72us per command. I have seen you had some delays in your code and commented them later again. So you seem to be aware of timing issues here. I find the datasheet I have found (http://www.hpinfotech.ro/ST7920.pdf) not so clear about the timings. The sample serial code has no waits - guess 8081 is not that fast.

If you check the sample code on page 27 you see they always end with SCK 0 so start
SCK 0 // from last command
CS 1
Set bit data
SCK 0
SCK 1
Set bit data
SCK 0

Starting and ending with clock low mean CPOL 0.
Seeing sample code
MOVBIT SID, A.0 ; SID = A.0
SETB SCLK ; READ DATA FROM SID
CLR SCLK

I read that data is transferred when switching SCK to high. So the SPI mode would be 0 form this. But you are using 3 ( which also copies at rising edge just with different timing).

So you see I'm currently a bit confused on where I need waits for command and inside software spi and which SPI mode is the correct one. Ideally there would be fast solution for due software spi as well in your library.

Any ideas are welcome to improve on this.

@olikraus
Copy link
Owner

Thanks for putting so much effort into this topic.

So you say, that SPI mode 3 (https://github.com/olikraus/u8g2/blob/master/csrc/u8x8_d_st7920.c#L167) is wrong?

Actually, the ST7920 is one of the most difficult displays with respect to timing. I would say it like this: At a certain point of time, I just gave up further research here.

If you have a better solution including a mode change i am definitly willing to test this with my own displays here.

@repetier
Copy link
Author

Regarding timing you are totally correct. And it is one of the slowest devices I guess. But I can not choose what users have.

The problem is I'm not sure where the problem lies. Have just tested with mode 0 and if spi has enough delay it is working. If i reduce it a bit the top 1/4 is randomly correct or wrong.

My hope was that you has some more insight into timings. mode 0 and 3 are to some extend the same for the controller as both transfer on rising edge. So as long as data is present there is makes not difference I guess. It will either take it on first edge or ignore first edge as it is falling. That will be the reason mode 3 is also working.

One thing I also have is that signal lines have 3.3V while display has 5V I guess. might make signalling also slower on my test device. Guess I will tweak the timings a bit more and then leave it with best I get working.

@repetier
Copy link
Author

BTW: Could you test if spi mode 0 with hardware SPI would work? Would at least show that I'm on the right way.

Also does it really work with 2MHz on AVR? Due might have the 3.3V problem that it is slower. Timing diagram also shows that 3.3V needs longer delays.

@olikraus
Copy link
Owner

olikraus commented Nov 18, 2018

I try to collect the history of the u8g integration into Marlin:

The work on the u8g/st7920 started in 2013, mainly driven by maik@stohn.de
See the imported issue 175 from google code here: olikraus/u8glib#175

The original issue is still there:
https://code.google.com/archive/p/u8glib/issues/175

But it looks like, that the names and some comments are corrupted.

Nevertheless, I think maik@stohn.de (@MaikStohn) is the person with the most knowledge on this topic.

As soon as I have some time, I will do some testing here.

@olikraus
Copy link
Owner

One more comment: During development of u8g2 I rewrote the complete ST7920 interface. Although I tried to takeover some ideas of his work, I think many details of his implementation and improvements for u8glib are not part of u8g2.

@olikraus
Copy link
Owner

olikraus commented Nov 18, 2018

Here are some tests with my Uno and Hardware SPI:
2MHz does NOT work:
Mode 0 and 3: Glitches and artefacts on the screen
Mode 1 and 2: No display at all
It will work with u8g2.setBusClock(1800000); but results are the same as u8g2.setBusClock(1000000);. I assume, AVR can not control SPI speed more precisely.

  Uno U8G2_ST7920_128X64_F_HW_SPI			HW SPI 0.1MHz FPS: Clip=5.4 Box=5.4  @=3.6 Pix=4.2
  Uno U8G2_ST7920_128X64_F_HW_SPI			HW SPI 1.0MHz	 FPS: Clip=23.0 Box=22.8  @=7.4 Pix=10.4
  Uno U8G2_ST7920_128X64_F_HW_SPI			HW SPI 1.8MHz FPS: Clip=23.0 Box=22.8  @=7.4 Pix=10.4

@olikraus
Copy link
Owner

olikraus commented Nov 18, 2018

This is the result for software SPI:

  Uno U8G2_ST7920_128X64_F_SW_SPI			SW SPI		FPS: Clip=22.7 Box=21.0  @=7.1 Pix=9.6

So the results are almost the same.

Edit: The code is different between AVR and other uC: Actually the AVR code for SW SPI is very much optimized.

@olikraus
Copy link
Owner

ok, I see the Problem:

  Uno U8G2_ST7920_128X64_F_SW_SPI			SW SPI		FPS: Clip=22.7 Box=21.0  @=7.1 Pix=9.6
  Due U8G2_ST7920_128X64_F_SW_SPI			SW SPI		FPS: Clip=3.3 Box=3.2  @=3.1 Pix=3.1

Due defaults to the standard SPI procedure, which is indeed really slow.

@olikraus
Copy link
Owner

ok, I have created a special code for Due:

  Due U8G2_ST7920_128X64_F_SW_SPI			SW SPI		FPS: Clip=3.3 Box=3.2  @=3.1 Pix=3.1
  Due U8G2_ST7920_128X64_F_SW_SPI			SW SPI		FPS: Clip=18.0 Box=17.8  @=13.5 Pix=14.6		/* Due optimized, 1000ns */

It has a fixed delay of 1000us before and after the takeover edge.
Maybe it can be optimized even further.

@olikraus
Copy link
Owner

Ok, I think I have a version, which is even faster than Arduino HW SPI:

  Due U8G2_ST7920_128X64_F_SW_SPI			SW SPI		FPS: Clip=3.3 Box=3.2  @=3.1 Pix=3.1
  Due U8G2_ST7920_128X64_F_SW_SPI			SW SPI		FPS: Clip=18.0 Box=17.8  @=13.5 Pix=14.6		/* Due optimized, 1000ns pre&post edge*/
  Due U8G2_ST7920_128X64_F_SW_SPI			SW SPI		FPS: Clip=27.2 Box=26.6  @=17.9 Pix=20.1		/* Due optimized, 1000ns post edge*/

Will checkin the code...

olikraus added a commit that referenced this issue Nov 18, 2018
@olikraus
Copy link
Owner

You can just copy
https://github.com/olikraus/u8g2/blob/master/cppsrc/U8x8lib.cpp
to your local version of u8g2.
Will this solve you problem?

@olikraus olikraus changed the title u8g2 in repetier-firmware v2 SW SPI speedup for Due (was: u8g2 in repetier-firmware v2) Nov 18, 2018
@olikraus olikraus added this to the 2.25 milestone Nov 18, 2018
@repetier
Copy link
Author

Ok, tested your version but it did not work. So I modified it into this version:

// coarse 100ns timer
inline void u8g2_delay_x100ns(int x) {
    while (x--) {               // adds at last 24ns
        asm volatile("nop" ::); // 11.9ns
        asm volatile("nop" ::); // 11.9ns
        asm volatile("nop" ::); // 11.9ns
        asm volatile("nop" ::); // 11.9ns
        asm volatile("nop" ::); // 11.9ns
        asm volatile("nop" ::); // 11.9ns
        asm volatile("nop" ::); // 11.9ns  83.3ns total
    }
}
/* this function completly replaces u8x8_byte_4wire_sw_spi*/
extern "C" uint8_t u8x8_byte_arduino_4wire_sw_spi(u8x8_t* u8x8, uint8_t msg, uint8_t arg_int, void* arg_ptr) {
    uint8_t i, b;
    // uint16_t us = ((u8x8->display_info->sck_pulse_width_ns + 999) / 1000);
    uint8_t* data;
    uint8_t takeover_edge = u8x8_GetSPIClockPhase(u8x8);
    //uint8_t not_takeover_edge = 1 - takeover_edge;

    /* the following static vars are recalculated in U8X8_MSG_BYTE_START_TRANSFER */
    /* so, it should be possible to use multiple displays with different pins */

    static volatile uint32_t* arduino_clock_port;

    static uint32_t arduino_clock_mask;
    static uint32_t arduino_clock_n_mask;

    static volatile uint32_t* arduino_data_port;
    static uint32_t arduino_data_mask;
    static uint32_t arduino_data_n_mask;

    switch (msg) {
    case U8X8_MSG_BYTE_SEND:

        data = (uint8_t*)arg_ptr;
        if (takeover_edge == 0) {
            while (arg_int > 0) {
                b = *data;
                data++;
                arg_int--;
                {
                    for (i = 0; i < 8; i++) {
                        if (b & 128)
                            *arduino_data_port |= arduino_data_mask;
                        else
                            *arduino_data_port &= arduino_data_n_mask;

                        u8g2_delay_x100ns(3);
                        *arduino_clock_port |= arduino_clock_mask;
                        b <<= 1;
                        u8g2_delay_x100ns(3);
                        *arduino_clock_port &= arduino_clock_n_mask;
                    }
                }
            }
        } else {
            while (arg_int > 0) {
                b = *data;
                data++;
                arg_int--;
                {
                    for (i = 0; i < 8; i++) {
                        if (b & 128)
                            *arduino_data_port |= arduino_data_mask;
                        else
                            *arduino_data_port &= arduino_data_n_mask;

                        u8g2_delay_x100ns(3);
                        *arduino_clock_port &= arduino_clock_n_mask;
                        b <<= 1;
                        u8g2_delay_x100ns(3);
                        *arduino_clock_port |= arduino_clock_mask;
                    }
                }
            }
        }
        break;

    case U8X8_MSG_BYTE_INIT:
        /* disable chipselect */
        u8x8_gpio_SetCS(u8x8, u8x8->display_info->chip_disable_level);
        /* no wait required here */

        /* for SPI: setup correct level of the clock signal */
        u8x8_gpio_SetSPIClock(u8x8, u8x8_GetSPIClockPhase(u8x8));
        break;
    case U8X8_MSG_BYTE_SET_DC:
        u8x8_gpio_SetDC(u8x8, arg_int);
        break;
    case U8X8_MSG_BYTE_START_TRANSFER:
        u8x8_gpio_SetCS(u8x8, u8x8->display_info->chip_enable_level);
        u8x8->gpio_and_delay_cb(u8x8, U8X8_MSG_DELAY_NANO, u8x8->display_info->post_chip_enable_wait_ns, NULL);

        /* there is no consistency checking for u8x8->pins[U8X8_PIN_SPI_CLOCK] */

        arduino_clock_port = portOutputRegister(digitalPinToPort(u8x8->pins[U8X8_PIN_SPI_CLOCK]));
        arduino_clock_mask = digitalPinToBitMask(u8x8->pins[U8X8_PIN_SPI_CLOCK]);
        arduino_clock_n_mask = ~arduino_clock_mask;

        /* there is no consistency checking for u8x8->pins[U8X8_PIN_SPI_DATA] */

        arduino_data_port = portOutputRegister(digitalPinToPort(u8x8->pins[U8X8_PIN_SPI_DATA]));
        arduino_data_mask = digitalPinToBitMask(u8x8->pins[U8X8_PIN_SPI_DATA]);
        arduino_data_n_mask = ~arduino_data_mask;

        break;
    case U8X8_MSG_BYTE_END_TRANSFER:
        u8x8->gpio_and_delay_cb(u8x8, U8X8_MSG_DELAY_NANO, u8x8->display_info->pre_chip_disable_wait_ns, NULL);
        u8x8_gpio_SetCS(u8x8, u8x8->display_info->chip_disable_level);
        break;
    default:
        return 0;
    }
    return 1;
}

Here some test results:
Your optimized version 76ms (no image for me)
600/600 142
500/500 128
400/400 117
300/300 100
200/200 defect
200/300 defect

as you see I wrote my own delay version with approx. 100ns resolution. I guess a bit more.
Not sure what happened with your 1us delay. Since my 300+300ns = 600ns is slower I do not think my arduino library did a real delay here.

BTW: Your ST7920 definition has 140us pulse width, but datasheet show "TSCYC Serial clock cycle Pin E 600" for 2.7V which fits also the measured results until where it was working.

I think that the printer I'm currently testing is a bit slower then the other one. When I'm back to the other printer I test there as well. But I'm not sure I would go the risk of having all the complains for 20ms difference I could gain. Better take the save 600ns split on each side. That is 1.6MHz.
So maybe insert your ns trick back just with 100ns resolution and increase the value in st7920 definition so you get 3/3 as delay. That is at least what worked now for me best.

@olikraus
Copy link
Owner

Ok, tested your version but it did not work.

strange, I tested everything with my Due/ST7920 combination.

Can you generate a pull request with your modification?

@repetier
Copy link
Author

Can't you not just copy my code I posted. I've embedded your code inside firmware and have no fork of this. So it would be quite some work compared with a a copy/paste done in seconds.

@olikraus
Copy link
Owner

Can't you not just copy my code I posted. I've embedded your code inside firmware and have no fork of this. So it would be quite some work compared with a a copy/paste done in seconds.

ok

@repetier
Copy link
Author

repetier commented Dec 1, 2018

Just found out that your solution to set/delete a bit is very bad:-( It is not thread safe so any interrupt setting/clearing bits on the same port can cause wrong signals plus the display driver sets these bits wrong as well. Please find the new thread safe version for replacement:

/* this function completly replaces u8x8_byte_4wire_sw_spi*/
extern "C" uint8_t u8x8_byte_arduino_4wire_sw_spi(u8x8_t* u8x8, uint8_t msg, uint8_t arg_int, void* arg_ptr) {
    uint8_t i, b;
    // uint16_t us = ((u8x8->display_info->sck_pulse_width_ns + 999) / 1000);
    uint8_t* data;
    uint8_t takeover_edge = u8x8_GetSPIClockPhase(u8x8);
    //uint8_t not_takeover_edge = 1 - takeover_edge;

    /* the following static vars are recalculated in U8X8_MSG_BYTE_START_TRANSFER */
    /* so, it should be possible to use multiple displays with different pins */

    static WoReg *arduinoSetClockPort, *arduinoUnsetClockPort;
    static uint32_t arduino_clock_mask;

    static WoReg *arduinoSetDataPort, *arduinoUnsetDataPort;
    static uint32_t arduino_data_mask;

    switch (msg) {
    case U8X8_MSG_BYTE_SEND:

        data = (uint8_t*)arg_ptr;
        if (takeover_edge == 0) {
            while (arg_int > 0) {
                b = *data;
                data++;
                arg_int--;
                {
                    for (i = 0; i < 8; i++) {
                        if (b & 128)
                            *arduinoSetDataPort = arduino_data_mask;
                        else
                            *arduinoUnsetDataPort = arduino_data_mask;

                        u8g2_delay_x100ns(3);
                        *arduinoSetClockPort = arduino_clock_mask;
                        b <<= 1;
                        u8g2_delay_x100ns(3);
                        *arduinoUnsetClockPort = arduino_clock_mask;
                    }
                }
            }
        } else {
            while (arg_int > 0) {
                b = *data;
                data++;
                arg_int--;
                {
                    for (i = 0; i < 8; i++) {
                        if (b & 128)
                            *arduinoSetDataPort = arduino_data_mask;
                        else
                            *arduinoUnsetDataPort = arduino_data_mask;

                        u8g2_delay_x100ns(3);
                        *arduinoUnsetClockPort = arduino_clock_mask;
                        b <<= 1;
                        u8g2_delay_x100ns(3);
                        *arduinoSetClockPort = arduino_clock_mask;
                    }
                }
            }
        }
        break;

    case U8X8_MSG_BYTE_INIT:
        /* disable chipselect */
        u8x8_gpio_SetCS(u8x8, u8x8->display_info->chip_disable_level);
        /* no wait required here */

        /* for SPI: setup correct level of the clock signal */
        u8x8_gpio_SetSPIClock(u8x8, u8x8_GetSPIClockPhase(u8x8));
        break;
    case U8X8_MSG_BYTE_SET_DC:
        u8x8_gpio_SetDC(u8x8, arg_int);
        break;
    case U8X8_MSG_BYTE_START_TRANSFER:
        u8x8_gpio_SetCS(u8x8, u8x8->display_info->chip_enable_level);
        u8x8->gpio_and_delay_cb(u8x8, U8X8_MSG_DELAY_NANO, u8x8->display_info->post_chip_enable_wait_ns, NULL);

        /* there is no consistency checking for u8x8->pins[U8X8_PIN_SPI_CLOCK] */

        arduinoSetClockPort = &digitalPinToPort(u8x8->pins[U8X8_PIN_SPI_CLOCK])->PIO_SODR;
        arduinoUnsetClockPort = &digitalPinToPort(u8x8->pins[U8X8_PIN_SPI_CLOCK])->PIO_CODR;

        arduino_clock_mask = digitalPinToBitMask(u8x8->pins[U8X8_PIN_SPI_CLOCK]);

        /* there is no consistency checking for u8x8->pins[U8X8_PIN_SPI_DATA] */

        arduinoSetDataPort = &digitalPinToPort(u8x8->pins[U8X8_PIN_SPI_DATA])->PIO_SODR;
        arduinoUnsetDataPort = &digitalPinToPort(u8x8->pins[U8X8_PIN_SPI_DATA])->PIO_CODR;
        arduino_data_mask = digitalPinToBitMask(u8x8->pins[U8X8_PIN_SPI_DATA]);

        break;
    case U8X8_MSG_BYTE_END_TRANSFER:
        u8x8->gpio_and_delay_cb(u8x8, U8X8_MSG_DELAY_NANO, u8x8->display_info->pre_chip_disable_wait_ns, NULL);
        u8x8_gpio_SetCS(u8x8, u8x8->display_info->chip_disable_level);
        break;
    default:
        return 0;
    }
    return 1;
}

olikraus added a commit that referenced this issue Dec 2, 2018
@olikraus
Copy link
Owner

olikraus commented Dec 2, 2018

I have update the code. Test with Due & DOGXL160 was successful.

@repetier
Copy link
Author

repetier commented Dec 3, 2018

Great. I had also a look into the AVR equivalent. I fear there you have the same problem. Read
https://www.mikrocontroller.net/topic/275811

The problem is you write something like this:
*arduino_clock_port |= arduino_clock_mask;

Which gets normally translated to

in r16,arduino_clock_port
ori r16,arduino_clock_mask
out arduino_clock_port,r16

as you see the operation is not atomic and hence not thread safe. If after the in an interrupt occurs that modifies the same port you will not notice and the bit is set wrong. That is what happened with the due case causing wrong stepper signals that blocked the motor.

What you really want is
sbi arduino_clock_port,arduino_clock_mask
which is a atomic command. But sbi only works for the first 32 registers so giving a simple pointer will not force the compiler to optimize this.

Currently the new firmware using the library does not have avr support so I can not really test, but I'm quite sure the compiler will not do what you want here. I think you need to force him with some assembler code to use sbi/cbi to make it atomic as you of course know that the address will be a register.

@olikraus
Copy link
Owner

olikraus commented Dec 3, 2018

hmm... probably true

olikraus added a commit that referenced this issue Dec 3, 2018
@repetier
Copy link
Author

repetier commented Dec 4, 2018

I'm quite sure it is true. Here the part how arduino does it:

uint8_t oldSREG = SREG;
	cli();

	if (val == LOW) {
		*out &= ~bit;
	} else {
		*out |= bit;
	}

	SREG = oldSREG;

Of course this solution is quite slow due to interrupt prevention. On the other side you can not prevent too long or serial or setpper driver interrupts get into trouble. But maybe it is ok to put it around one bit being send. That are at least 4 bit changes.

Still better is of course the SBI/CBI solution whcih will be faster instead. Just have no idea how to get the port address to the asm command.

@olikraus
Copy link
Owner

olikraus commented Dec 4, 2018

So you say, I block it for a too long time?

https://github.com/olikraus/u8g2/blob/master/cppsrc/U8x8lib.cpp#L416

@repetier
Copy link
Author

repetier commented Dec 4, 2018

Ok, haven't seen your latest commit. So you have now added protection. As I see you have it per send byte and no delays so it should be fairly quick. If it is too coarse or not I can not say at the moment. For avr we normally assume 40KHz as maximum which would be every 400 cycles. I guess sending a byte takes a bit shorter so should allow enough interrupts to happen. So I guess it is good to leave it like that until someone complains. As I said currently I have no avr support with new firmware so can not say how bad it is. If I find it too bad I might check assembler solution as suggested. But for now I consider it fixed.

@olikraus
Copy link
Owner

olikraus commented Dec 4, 2018

Ok :)

@olikraus
Copy link
Owner

olikraus commented Feb 9, 2019

hmmm i assume, we can close this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants