SW SPI speedup for Due (was: u8g2 in repetier-firmware v2) #749
Comments
Thanks for putting so much effort into this topic. So you say that SPI mode 3 (https://github.com/olikraus/u8g2/blob/master/csrc/u8x8_d_st7920.c#L167) is wrong? Actually, the ST7920 is one of the most difficult displays with respect to timing. I would put it like this: at a certain point, I just gave up further research here. If you have a better solution, including a mode change, I am definitely willing to test it with my own displays here. |
Regarding timing you are totally correct, and I guess it is one of the slowest devices. But I cannot choose what users have. The problem is that I'm not sure where the problem lies. I have just tested with mode 0, and if the SPI has enough delay it works. If I reduce the delay a bit, the top 1/4 of the display is randomly correct or wrong. My hope was that you had some more insight into the timings. Modes 0 and 3 are to some extent the same for the controller, as both transfer on the rising edge. So as long as the data is present there, it makes no difference, I guess. The controller will either take the data on the first edge, or ignore the first edge because it is falling. That will be the reason mode 3 also works. One more thing: my signal lines are at 3.3V while the display runs at 5V, I guess, which might also make signalling slower on my test device. I guess I will tweak the timings a bit more and then leave it at the best I get working. |
BTW: Could you test whether SPI mode 0 works with hardware SPI? That would at least show that I'm on the right track. Also, does it really work with 2MHz on AVR? The Due might be slower because of the 3.3V problem. The timing diagram also shows that 3.3V needs longer delays. |
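The point about modes 0 and 3 can be checked with a small host-side simulation. This is only a sketch under assumptions: `send_byte`, `receive_rising_edge` and the `bus_state` trace are hypothetical names made up for illustration, not u8g2 or firmware code, and the receiver is an idealised model that latches the data line on every rising clock edge.

```c
#include <assert.h>
#include <stdint.h>

/* One snapshot of the two bus lines, recorded after each pin change. */
typedef struct { uint8_t sck; uint8_t data; } bus_state;

/* Append one snapshot to the trace and return the new length. */
static int record(bus_state *trace, int n, uint8_t sck, uint8_t data)
{
    trace[n].sck = sck;
    trace[n].data = data;
    return n + 1;
}

/* Hypothetical bit-banged sender (MSB first).
   cpol = 0 models SPI mode 0 (clock idles low),
   cpol = 1 models SPI mode 3 (clock idles high).
   Returns the number of recorded snapshots. */
static int send_byte(uint8_t value, int cpol, bus_state *trace, int max_len)
{
    int n = 0;
    uint8_t data = 0;
    assert(max_len >= 1 + 8 * 3);
    n = record(trace, n, cpol ? 1 : 0, data);        /* idle state */
    for (int bit = 7; bit >= 0; bit--) {
        if (cpol) {
            n = record(trace, n, 0, data);           /* falling edge, ignored by receiver */
            data = (uint8_t)((value >> bit) & 1);
            n = record(trace, n, 0, data);           /* data set up while clock is low */
            n = record(trace, n, 1, data);           /* rising edge: receiver samples */
        } else {
            data = (uint8_t)((value >> bit) & 1);
            n = record(trace, n, 0, data);           /* data set up while clock is low */
            n = record(trace, n, 1, data);           /* rising edge: receiver samples */
            n = record(trace, n, 0, data);           /* back to idle level */
        }
    }
    return n;
}

/* Idealised receiver: shifts in the data line on every 0 -> 1 clock transition. */
static uint8_t receive_rising_edge(const bus_state *trace, int len)
{
    uint8_t value = 0;
    for (int i = 1; i < len; i++) {
        if (trace[i - 1].sck == 0 && trace[i].sck == 1) {
            value = (uint8_t)((value << 1) | trace[i].data);
        }
    }
    return value;
}
```

In this model both modes produce eight rising edges with valid data on each, so the receiver decodes the same byte either way. It says nothing about real setup and hold times, which is where the actual display seems to be sensitive.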
I tried to collect the history of the u8g integration into Marlin: The work on u8g/ST7920 started in 2013, mainly driven by maik@stohn.de. The original issue is still there: But it looks like the names and some comments are corrupted. Nevertheless, I think maik@stohn.de (@MaikStohn) is the person with the most knowledge on this topic. As soon as I have some time, I will do some testing here. |
One more comment: During development of u8g2 I rewrote the complete ST7920 interface. Although I tried to take over some ideas from his work, I think many details of his implementation and his improvements for u8glib are not part of u8g2. |
Here are some tests with my Uno and Hardware SPI:
|
This is the result for software SPI:
So the results are almost the same. Edit: The code is different between AVR and other uC: Actually the AVR code for SW SPI is very much optimized. |
ok, I see the Problem:
Due defaults to the standard SPI procedure, which is indeed really slow. |
ok, I have created a special code for Due:
It has a fixed delay of 1000us before and after the takeover edge. |
Ok, I think I have a version, which is even faster than Arduino HW SPI:
Will check in the code... |
You can just copy |
Ok, tested your version but it did not work. So I modified it into this version:
Here are some test results: as you see, I wrote my own delay version with approx. 100ns resolution, I guess a bit more. BTW: Your ST7920 definition has 140us pulse width, but the datasheet shows "TSCYC Serial clock cycle Pin E 600" for 2.7V, which also fits the measured results up to the point where it was still working. I think the printer I'm currently testing is a bit slower than the other one. When I'm back at the other printer I will test there as well. But I'm not sure I would take the risk of getting all the complaints for the 20ms difference I could gain. Better to take the safe 600ns cycle, split evenly across both sides of the edge. That is about 1.6MHz. |
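The arithmetic behind the 600ns figure can be written down as a short sketch. The helper names below are hypothetical and only illustrate the conversion: a 600ns clock cycle split evenly gives 300ns on each side of the edge and a clock of roughly 1.67MHz, and a delay routine with 100ns resolution needs 3 ticks per half cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Clock frequency in kHz for a full clock cycle given in nanoseconds.
   1 / (cycle_ns * 1e-9 s) Hz equals 1e6 / cycle_ns kHz (integer division rounds down). */
static uint32_t cycle_ns_to_khz(uint32_t cycle_ns)
{
    assert(cycle_ns > 0);
    return 1000000u / cycle_ns;
}

/* Wait on each side of the clock edge when the cycle is split evenly.
   Rounds up so the full cycle never gets shorter than requested. */
static uint32_t half_cycle_ns(uint32_t cycle_ns)
{
    return (cycle_ns + 1u) / 2u;
}

/* Number of delay ticks needed for a wait, given the tick resolution.
   Rounds up so the wait is never shorter than requested. */
static uint32_t delay_ticks(uint32_t wait_ns, uint32_t resolution_ns)
{
    assert(resolution_ns > 0);
    return (wait_ns + resolution_ns - 1u) / resolution_ns;
}
```

For example, `cycle_ns_to_khz(600)` gives 1666, `half_cycle_ns(600)` gives 300, and `delay_ticks(300, 100)` gives 3. Rounding up is a deliberate choice here: a wait that is slightly too long only costs speed, while one that is too short risks a corrupted display.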
strange, I tested everything with my Due/ST7920 combination. Can you generate a pull request with your modification? |
Can't you just copy the code I posted? I've embedded your code inside the firmware and have no fork of it. So a pull request would be quite some work compared with a copy/paste done in seconds. |
ok |
Just found out that your solution to set/clear a bit is very bad :-( It is not thread safe, so any interrupt setting/clearing bits on the same port can cause wrong signals, and the display driver can set those bits wrongly as well. Please find the new thread-safe version as a replacement:
|
I have updated the code. Test with Due & DOGXL160 was successful. |
Great. I also had a look at the AVR equivalent. I fear you have the same problem there. Read The problem is that you write something like this: Which normally gets translated to
As you see, the operation is not atomic and hence not thread safe. If an interrupt that modifies the same port occurs after the in instruction, you will not notice, and the bit is set wrong. That is what happened in the Due case, causing wrong stepper signals that blocked the motor. What you really want is Currently the new firmware using the library does not have AVR support, so I cannot really test it, but I'm quite sure the compiler will not do what you want here. I think you need to force it with some assembler code to use sbi/cbi to make it atomic, as you of course know that the address will be a register. |
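The lost update described here can be reproduced in a host-side model. This is a sketch under assumptions, not the library or firmware code: `port`, `fake_interrupt` and both setter functions are hypothetical stand-ins, and the "interrupt" is simply called at the point where a real one could strike.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated output port; on real hardware this would be a memory-mapped register. */
static uint8_t port;

/* Stand-in for an interrupt handler that sets another pin on the same port,
   for example a stepper step pin on bit 1. */
static void fake_interrupt(void)
{
    port |= 0x02;
}

/* Non-atomic set: read, modify, write as three separate steps.
   The simulated interrupt lands between the read and the write. */
static void set_bit_rmw_interrupted(uint8_t mask)
{
    uint8_t copy;
    assert(mask != 0);
    copy = port;                 /* read the port */
    fake_interrupt();            /* interrupt changes bit 1 behind our back */
    copy = (uint8_t)(copy | mask);
    port = copy;                 /* write back the stale copy: bit 1 is lost */
}

/* Atomic set, modelled as one indivisible step. An interrupt can only run
   before or after it, never in the middle. This mirrors dedicated set/clear
   registers (PIO_SODR/PIO_CODR on the Due's SAM3X) or sbi/cbi on AVR. */
static void set_bit_atomic_interrupted(uint8_t mask)
{
    assert(mask != 0);
    fake_interrupt();            /* interrupt runs before the atomic step */
    port = (uint8_t)(port | mask);
}
```

Starting from `port = 0`, `set_bit_rmw_interrupted(0x01)` leaves the port at 0x01, so the interrupt's bit is gone, while `set_bit_atomic_interrupted(0x01)` leaves it at 0x03. On real hardware the failure is intermittent because the interrupt only occasionally falls into the short window, which makes it hard to spot.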
hmm... probably true |
I'm quite sure it is true. Here the part how arduino does it:
Of course this solution is quite slow due to the interrupt prevention. On the other hand, you cannot block interrupts for too long, or the serial or stepper driver interrupts get into trouble. But maybe it is OK to put it around one bit being sent. That is at least 4 bit changes. Still better, of course, is the SBI/CBI solution, which will be faster. I just have no idea how to get the port address into the asm command. |
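The save, disable, modify, restore pattern discussed above can be modelled on a host machine as follows. This is an illustrative sketch, not the Arduino core source: `interrupts_enabled` plays the role of the global interrupt flag, and `raise_interrupt`, `restore_interrupts` and `set_bit_guarded` are hypothetical names. A request that arrives while interrupts are blocked is held pending and runs once they are restored.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t port;                 /* simulated output port */
static int interrupts_enabled = 1;   /* stand-in for the global interrupt flag */
static int interrupt_pending = 0;    /* request that arrived while blocked */

/* Simulated handler that sets bit 1 on the same port. */
static void isr(void)
{
    port |= 0x02;
}

/* Simulated interrupt request: runs at once if enabled, otherwise stays pending. */
static void raise_interrupt(void)
{
    if (interrupts_enabled) {
        isr();
    } else {
        interrupt_pending = 1;
    }
}

/* Restore the saved flag and run a deferred handler if one is waiting. */
static void restore_interrupts(int saved)
{
    interrupts_enabled = saved;
    if (interrupts_enabled && interrupt_pending) {
        interrupt_pending = 0;
        isr();
    }
}

/* Read-modify-write wrapped in a critical section. */
static void set_bit_guarded(uint8_t mask)
{
    int saved = interrupts_enabled;  /* like saving the status register */
    uint8_t copy;
    assert(mask != 0);
    interrupts_enabled = 0;          /* like disabling interrupts */
    copy = port;
    raise_interrupt();               /* request arrives mid-sequence and is deferred */
    copy = (uint8_t)(copy | mask);
    port = copy;
    restore_interrupts(saved);       /* deferred handler runs on the updated port */
}
```

With `port = 0`, calling `set_bit_guarded(0x01)` ends with 0x03: nothing is lost, but the handler ran late. That latency is the cost being weighed in the comment above, and it grows with the length of the guarded section.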
So you say, I block it for a too long time? https://github.com/olikraus/u8g2/blob/master/cppsrc/U8x8lib.cpp#L416 |
Ok, I hadn't seen your latest commit. So you have now added protection. As far as I can see you have it per sent byte and without delays, so it should be fairly quick. Whether it is too coarse or not I cannot say at the moment. For AVR we normally assume 40KHz as the maximum interrupt rate, which would be one interrupt every 400 cycles. I guess sending a byte takes a bit less than that, so it should leave enough room for interrupts to happen. So I guess it is good to leave it like that until someone complains. As I said, I currently have no AVR support in the new firmware, so I cannot say how bad it is. If I find it too bad I might check the assembler solution as suggested. But for now I consider it fixed. |
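The 400-cycle figure follows from dividing the CPU clock by the interrupt rate. A minimal sketch, assuming a 16MHz AVR clock (the clock is not stated in the thread, but 40KHz and 400 cycles imply it); the function names and any blocked-cycle counts passed in are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available between two interrupts at the given rate. */
static uint32_t cycles_between_interrupts(uint32_t cpu_hz, uint32_t irq_hz)
{
    assert(irq_hz > 0);
    return cpu_hz / irq_hz;
}

/* Returns 1 if a section that blocks interrupts for blocked_cycles
   is shorter than the gap between two interrupts, 0 otherwise. */
static int fits_in_gap(uint32_t blocked_cycles, uint32_t cpu_hz, uint32_t irq_hz)
{
    return blocked_cycles < cycles_between_interrupts(cpu_hz, irq_hz);
}
```

`cycles_between_interrupts(16000000u, 40000u)` gives 400. How many cycles a guarded byte transfer really takes would have to be measured or counted from the generated assembly; the check only shows how such a number compares against the budget.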
Ok :) |
hmmm i assume, we can close this |
I'm in the process of switching from the old u8g to u8g2 in the next major release, V2.
So far it is working from the functional side, but some of the most used displays are ST7920 based with software SPI. So far V2 only supports the Due, so I was quite astonished to see it take more than 500ms to show a screen. This is clearly not acceptable and would prevent usage. So I started investigating the software SPI. I saw that you had already optimized it for 8 bit, but for the Due it used a generic solution, which explained the speed difference.
So I copied the 8-bit part and rewrote it to use WRITE from the firmware's fastio solution with hardcoded pin numbers (not friendly for your library, but easy for me to do while being fast). Now the refresh time was 28ms, which would be quite useful, except that the display had no content. After some googling and reading the datasheet it was clear that my solution was much too fast for that controller.
So over the next days I did many tests. First I noticed that you have only 2 SPI modes for software SPI. You only use the clock polarity and always send data on the first clock edge. The ST7920 and the defined mode 3 let me assume that the rising edge is correct, but it should be the second edge.
Here is a working version I came up with (taking 130ms in this version)
Although this works, I'm not really satisfied. This is much slower than 1MHz hardware SPI, I guess. The critical section is this
u8g2_spi_wait inserts a 500ns wait. What makes no sense is the first wait I inserted. Since the controller should take the value on the rising edge, a wait there should have no effect, but without it I get garbage.
If you read the datasheet, a 5V-powered ST7920 requires 200ns on each side of the rising edge, but 200ns does not work here. Further reading of the specs gives 72us per command. I have seen that you had some delays in your code and commented them out later. So you seem to be aware of the timing issues here. I find the datasheet I found (http://www.hpinfotech.ro/ST7920.pdf) not so clear about the timings. The sample serial code has no waits; I guess the 8051 is not that fast.
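To make the placement of the waits concrete, here is a host-side sketch of a mode 0 send loop with one wait before and one after the rising edge. It is not the loop from the firmware: the pins are plain variables, `wait_ns` only adds up the requested time instead of delaying, and `send_byte_mode0` with its `setup_ns`/`hold_ns` parameters is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t waited_ns;   /* total simulated wait time */
static uint8_t  received;    /* what a rising-edge receiver would have latched */
static uint8_t  sck_pin;     /* simulated clock pin, starts low */
static uint8_t  data_pin;    /* simulated data pin */

/* Simulated delay: only accumulates the requested time. */
static void wait_ns(uint32_t ns)
{
    waited_ns += ns;
}

/* Drive the clock pin; an idealised receiver latches data on a 0 -> 1 change. */
static void write_sck(uint8_t level)
{
    if (sck_pin == 0 && level == 1) {
        received = (uint8_t)((received << 1) | data_pin);
    }
    sck_pin = level;
}

/* Send one byte MSB first, clock idling low, with a wait on each side
   of the rising edge. */
static void send_byte_mode0(uint8_t value, uint32_t setup_ns, uint32_t hold_ns)
{
    assert(sck_pin == 0);            /* mode 0 starts from clock low */
    for (int bit = 7; bit >= 0; bit--) {
        data_pin = (uint8_t)((value >> bit) & 1);
        wait_ns(setup_ns);           /* data stable before the rising edge */
        write_sck(1);                /* rising edge: receiver samples */
        wait_ns(hold_ns);            /* keep clock high and data stable */
        write_sck(0);                /* back to idle */
    }
}
```

With 500ns on each side, one byte spends 8 × (500 + 500) = 8000ns in waits alone, before counting any pin-write overhead. In this idealised model the wait before the edge is what guarantees data setup time, which may be why removing it produces garbage on a real display even though sampling happens on the edge itself.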
If you check the sample code on page 27 you see they always end with SCK 0 so start
SCK 0 // from last command
CS 1
Set bit data
SCK 0
SCK 1
Set bit data
SCK 0
Starting and ending with the clock low means CPOL 0.
Seeing sample code
MOVBIT SID, A.0 ; SID = A.0
SETB SCLK ; READ DATA FROM SID
CLR SCLK
I read this as data being transferred when SCK switches to high. So the SPI mode would be 0 from this. But you are using 3 (which also copies at the rising edge, just with different timing).
So you see I'm currently a bit confused on where I need waits for command and inside software spi and which SPI mode is the correct one. Ideally there would be fast solution for due software spi as well in your library.
Any ideas are welcome to improve on this.